Language Design Notes

About

This page lists the reasons behind many of the design decisions made for THT.

Ultimately, every decision is a trade-off, trying to balance many factors like familiarity, simplicity, safety, etc.

Not every decision will appeal to everyone, but I hope this page will show that a lot of thought was put into every part of the language.

Design Principles

Here are some of the higher-level principles that helped guide these decisions:

Favor human performance over machine performance
Reduce cognitive load by minimizing micro-decisions & visual noise
Optimize for the most common cases (the 80/20 rule)
Make the right way (best practices) easier than the wrong way
Favor 1-to-1 shortcuts over invisible magic
Be familiar to PHP programmers and (secondarily) JavaScript programmers

Confidence Score

The “V1 Confidence” percentage for each feature shows the likelihood that it will remain unchanged in Version 1.0.

Nothing is etched in stone, so feedback from Beta users could change things.

Comparisons

In a couple of cases, I refer to the Laravel framework, because it is a larger codebase and an example of well-written, modern PHP code.

Dot Methods
JSON Maps
Prefixed Binary Operators
No Semicolons
No Outer Parens
Dollar Variables
Single-Quoted Strings
No for Loop
No Unary Increment ++ and --
No while Loop
CamelCase Names
1-Based Indexing

Dot Methods

// PHP
Module->method()

// THT
Module.method()

THT replaces PHP's arrow notation for method calls with dots, because dots are simply easier to type and create less visual noise.

This approach is mainstream, and hopefully non-controversial.

Other languages that use dot method calls: JavaScript, Java, Python, Ruby, Swift

V1 Confidence: 100%

JSON Maps

// PHP
[ 'key' => 'value', 'num' => 123 ]

// THT
{ key: 'value', num: 123 }

THT replaces arrow/bracket notation with JS object-literal notation.

This is easier to type and contains less visual noise, and should be familiar to all web developers.

V1 Confidence: 100%

Prefixed Binary Operators

// PHP
$result = $op1 | $op2;

// THT
$result = $op1 +| $op2

Bitwise (binary) operators are almost never used in web development, and they are too easily mistaken for their logical counterparts (e.g. | vs ||).

If you want a bitmask, most of the time a keyword Map will be easier to work with.

This approach is borrowed from Raku (Perl 6).
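
To illustrate the bitmask-versus-Map point above, here is a minimal sketch in Python (not THT) that compares bit flags with a map of named booleans. The permission names and helper functions are hypothetical, chosen only for illustration.

```python
# Bitmask style: each flag is a power of two, combined with bitwise operators.
CAN_READ = 0b001
CAN_WRITE = 0b010
CAN_DELETE = 0b100


def bitmask_can_write(perms: int) -> bool:
    # Easy to mistype: `perms and CAN_WRITE` (logical) is truthy
    # for ANY non-zero perms, whether or not the write bit is set.
    return (perms & CAN_WRITE) != 0


# Map style: each flag is a named boolean, so no bitwise operators are needed.
def map_can_write(perms: dict) -> bool:
    return perms.get("write", False)


bitmask_perms = CAN_READ | CAN_DELETE
map_perms = {"read": True, "write": False, "delete": True}
```

Both versions answer the same question, but the map version reads as plain names and leaves no room for mixing up a bitwise operator with a logical one.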

V1 Confidence: 100%

No Semicolons

// PHP
$a += 1;
print($a);

// THT
$a += 1
print($a)

This reduces visual noise, and it reinforces the good practice of having only one statement per line.

It will sometimes be inconvenient for those of us with semicolons ingrained in our muscle memory, but THT gives super clear feedback when a mistake is made, so it’s easy to fix.

Other languages that don’t use/require semicolons: Python, Ruby, Swift, Go, Lua

V1 Confidence: 90%

No Outer Parens

// PHP
if ($condition) {
    ...
}

// THT
if $condition {
    ...
}

This reduces visual noise, and makes it easier to balance parens that are within the condition itself.

Other languages that don’t require parens: Python, Ruby, Rust, Swift, Go

V1 Confidence: 90%

Dollar Variables

// THT & PHP
$myVar = 123

THT keeps the dollar “sigils” in variables to retain its identity as a PHP-based programming language.

This admittedly goes against the idea of removing visual noise, but in this case, familiarity is more important.

V1 Confidence: 90%

Single-Quoted Strings

// PHP
$myString = 'Hello ' . "World!";

// THT
$myString = 'Hello ' ~ 'World!'

In PHP, the ability to choose between single or double-quotes is sometimes useful.

However, because string literals are extremely common, this leads to hundreds (maybe thousands) of micro-decisions per project.

Single quotes were chosen for THT because they are a little easier to type (no SHIFT key), and they create a little less visual noise.

Note: Interpolation is TBD, but THT currently has multiple ways to insert text via .fill(), the ~ stringy operator, and template functions.

V1 Confidence: 90%

No `for` Loop

“For what it’s worth, we don’t have a single C style for loop in the Lyft codebase.” — Keith Smiley, Lyft

// PHP
for ($i = 1; $i <= 10; $i++) { ... }

// THT
foreach range(1, 10) as $i { ... }

In any language with a foreach (or for in) construct, the C-style for loop is mostly unnecessary because the vast majority of loops iterate over a collection or a range of numbers.

For example, the Laravel project uses foreach 638 times and for 9 times, and 7 of those 9 could be written as a foreach/range. Counting only the 2 that genuinely need a C-style for, that’s about 300-to-1.

Languages that follow this pattern: Python, Ruby, Swift, Rust

V1 Confidence: 90%

No Unary Increment `++` and `--`

// PHP
if (++$myVar) { ... }

// THT
$myVar += 1
if $myVar { ... }

For such a simple operation (adding 1), this operator is quite complicated.

It often tempts programmers to write “clever” code that mixes mutation with evaluation, and is further complicated by behaving differently when it appears before or after the subject.

You can simply use += 1 instead.

Languages that don’t use ++: Python, Ruby, Swift, Rust

V1 Confidence: 90%

No `while` Loop

// PHP
$status = true;
while ($status) {
    $status = doSomething();
    if (!$status) { break; }
}

// THT
loop {
    $status = doSomething()
    if !$status: break
}

The while statement often leads to off-by-one errors and redundant initialization.

The do/while construct complicates things further, as the only language feature that turns the conventional (predicate) { block } order upside down.

THT's loop codifies the convention of a while (true) loop, giving you total control over the order of operations and where the loop should break.
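
The redundant-initialization point can be sketched in Python (not THT), where `while True` plus `break` is the closest analogue to loop. The `make_reader` and `read_chunk` helpers are hypothetical and exist only to make the example self-contained.

```python
def make_reader(chunks):
    """Return a hypothetical read_chunk() that yields each chunk, then None."""
    it = iter(chunks)
    return lambda: next(it, None)


def collect_with_while(read_chunk):
    # Conventional while: the read call appears twice,
    # once to initialize and once inside the loop body.
    out = []
    chunk = read_chunk()
    while chunk is not None:
        out.append(chunk)
        chunk = read_chunk()
    return out


def collect_with_loop(read_chunk):
    # Loop-with-break: a single read call, and the exit point is explicit.
    out = []
    while True:
        chunk = read_chunk()
        if chunk is None:
            break
        out.append(chunk)
    return out
```

Both functions return the same list; the second one simply has one fewer place where the read step can drift out of sync.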

Keep in mind that while isn’t used very often, so this isn’t a high impact change. In the Laravel project, while appears in 1 out of every 680 logical lines of code.

Languages with loop: Rust

V1 Confidence: 80%

CamelCase Names

// PHP
$my_variable = HTTPClass::myFunction();

// THT
$myVariable = HttpClass.myFunction()

In THT, everything is camelCase.

I realize this might be a deal-breaker for some people. I honestly don’t have a strong preference, but the consensus among professional programmers is that having a single consistent style is important, regardless of what is used.

Benefits of a single language-level style:

More consistency for novice programmers
No extra complexity around mixing styles to denote different things (e.g. ClassNames vs variable_names)
Less bikeshedding among teams about cosmetic concerns
All code shared online is consistent

CamelCase was chosen for a few reasons:

It is familiar and mainstream, used by JavaScript and Java
It requires a little less typing
It creates a little less visual noise
Names appear as one cohesive visual token
Double-clicking selects the entire token (vs hyphen-case)
It allows for lowerCamelCase and UpperCamelCase
Prefixes are still possible with a single letter (e.g. $xMyVariable vs $_my_variable)

It also allows the THT compiler to do some things like:

Use underscores internally to implement case sensitivity
Use UpperCamelCase to recognize module names

Languages that use camelCase convention: JavaScript, Java, Swift

V1 Confidence: 80%

1-Based Indexing

$list = ['a', 'b', 'c']

// PHP
$list[0] //= 'a'

// THT
$list[1] //= 'a'

Unlike PHP and many other C-derived languages, THT starts at one when counting indexes, instead of zero.

Zero-indexing is a vestige of early low-level languages, where it was more efficient to calculate memory addresses that way.

However, the tiny performance benefits that were needed 50 years ago are irrelevant on today’s machines (the original PDP-11 ran at 1.2 MHz with 4KB of RAM).

Zero-indexing requires an extra mental step, even for many of us who have been programming for a long time. Modern high-level languages can and should remove this cognitive load from the programmer.

Here are a number of practical benefits:

It’s more natural to humans. (Hence the phrase, “Easy as 1-2-3”.) It’s one of the first things every child learns.

Mathematics defines counting as the set of {1, 2, ..., n}.

It maps to ordinal numerals “first”, “second”, “third”, etc.

1: 1st
2: 2nd
3: 3rd

Code editors and programming languages all use line numbers that start at 1.

It is applicable to math and statistics. Languages like Mathematica and Julia use 1-indexing.

It reduces the need to litter -1 and +1 in our code to manually bridge the gap between human-counting and machine-counting, and reduces the chance of off-by-one errors.

Index-related functions can return 0 to indicate a missing item, which is conveniently falsey, and also makes semantic sense.

// 1-Index
if !$list.indexOf('X'): return

// 0-Index
if $list.indexOf('X') == -1: return

Other languages use out-of-band values like false or -1 to indicate a missing item. This forces the programmer to use extra caution to make sure it isn’t interpreted as a valid index.

$badIndex = $list.indexOf('missing')

// THT (0)
$list[$badIndex]  // ✓ Immediate error (safe)

// PHP (false)
$list[$badIndex]  // ✕ BUG! Gets the 1st element (false == 0)
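
The same hazard exists outside PHP. As a small illustration in Python (not THT): `str.find` returns -1 for a missing substring, and -1 is also a valid index meaning "last element". The variable and function names here are for illustration only.

```python
letters = "abc"

# str.find returns -1 when the substring is missing.
bad_index = letters.find("X")

# -1 is a valid index in Python, so this silently returns the LAST
# character instead of raising an error.
wrong_item = letters[bad_index]


def safe_lookup(text: str, needle: str):
    # The programmer has to remember the explicit out-of-band check.
    index = text.find(needle)
    if index == -1:
        return None
    return text[index]
```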

The index of the last element is the same as the collection’s length. Other languages need to offset 1.

// 0-Index
$isLastItem = $index == $list.length() - 1
                                       ^^^
// 1-Index
$isLastItem = $index == $list.length()

It can provide a 1-to-1 mapping between the integer index and its value.

$division = [
    'Division I',
    'Division II',
    'Division III',
]

// 0-Index
$division[1] //= 'Division II'

// 1-Index
$division[2] //= 'Division II'

It simplifies ranges, with less confusion around inclusive vs exclusive ranges.

Inclusive ranges are more natural than exclusive ranges. For example, ask any normal person to “count from 1 to 10”, and they will all include the 10.

// 0-Index
foreach range(0, $list.length() - 1) as $i { ... }

// 1-Index
foreach range(1, $list.length()) as $i { ... }

Negative and positive indexes are symmetrical. In other languages, each direction has a different offset.

// 0-Index
$thirdFromLeft  = $list[2]
$thirdFromRight = $list[-3]

// 1-Index
$thirdFromLeft  = $list[3]
$thirdFromRight = $list[-3]
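
One way to picture this symmetry is a small wrapper that translates 1-based indexes onto 0-based storage. This is a sketch in Python under my own assumptions: the `OneBasedList` class and its methods are hypothetical and do not describe how THT is actually implemented.

```python
class OneBasedList:
    """Hypothetical 1-indexed view over a regular Python list."""

    def __init__(self, items):
        self._items = list(items)

    def length(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int):
        n = len(self._items)
        if index == 0 or abs(index) > n:
            # 0 is never a valid position, so a "not found" result fails loudly.
            raise IndexError(f"index {index} out of range 1..{n}")
        # Positive: count from the left. Negative: count from the right.
        return self._items[index - 1] if index > 0 else self._items[n + index]

    def index_of(self, value) -> int:
        # Returns the 1-based position, or 0 (falsy) when the value is missing.
        for position, item in enumerate(self._items, start=1):
            if item == value:
                return position
        return 0


letters = OneBasedList(["a", "b", "c", "d", "e"])
third_from_left = letters[3]    # "c"
third_from_right = letters[-3]  # "c"
```

In this sketch the last element is `letters[letters.length()]`, and indexing with the 0 returned by a failed `index_of` raises an error rather than returning an unrelated element.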

On Dijkstra, who argued in 1982 that numbering should start at zero:

... conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence...

When starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N

The only practical benefit he offers is that you can subtract the min from the max to get the length.

How often does this come up for modern programmers? Not very often.

Regarding his aesthetic point, why does he think the second one is “nicer”?

His first example could be written as “1 ≤ i ≤ N”, which is arguably the nicest of the three, because it is more symmetrical.

1 ≤ i < N+1  <-- has offset
0 ≤ i < N    <-- no offset (Dijkstra)
1 ≤ i ≤ N    <-- no offset & same comparison (symmetrical)
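
For reference, the length arithmetic behind these three notations can be checked with a few lines of Python (not THT). The helper names are hypothetical.

```python
def half_open_length(lower: int, upper: int) -> int:
    # lower <= i < upper: the length is simply upper - lower.
    return upper - lower


def closed_length(lower: int, upper: int) -> int:
    # lower <= i <= upper: the length needs a +1 in general.
    return upper - lower + 1


N = 10
assert half_open_length(1, N + 1) == N  # 1 <= i < N+1
assert half_open_length(0, N) == N      # 0 <= i < N
assert closed_length(1, N) == N         # 1 <= i <= N
```

When the closed range starts at 1, the upper bound is itself the length, so no subtraction is needed for the most common case.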

Finally, any argument around ranges is far less relevant nowadays, since we use the foreach construct to iterate over collections, versus working with ranges directly. (For example, the Laravel project only uses ranges 7 times in 20,000 logical lines of code.)

// Just iterate
foreach ($items as $item) {
    ...
}

Other languages that use 1-based indexing: Smalltalk, Mathematica, R, Lua, Julia.

V1 Confidence: 80%
