Language Design Notes
About
This page lists the reasoning behind many of the design decisions made for THT.
Ultimately, every decision is a trade-off that tries to balance factors such as familiarity, simplicity, and safety.
Not every decision will appeal to everyone, but I hope this page will show that a lot of thought was put into every part of the language.
Design Principles
Here are some of the higher-level principles that helped guide these decisions:
- Favor human performance over machine performance
- Reduce cognitive load by minimizing micro-decisions & visual noise
- Optimize for the most common cases (the 80/20 rule)
- Make the right way (best practices) easier than the wrong way
- Favor 1-to-1 shortcuts over invisible magic
- Be familiar to PHP programmers and (secondarily) JavaScript programmers
Confidence Score
The “V1 Confidence” percentage for each feature is the estimated likelihood that the feature will reach Version 1.0 unchanged.
Nothing is etched in stone, so feedback from Beta users could change things.
Comparisons
In a couple of cases, I refer to the Laravel framework, because it is a large codebase and an example of well-written, modern PHP code.
Contents
- Dot Methods
- JSON Maps
- Prefixed Binary Operators
- No Semicolons
- No Outer Parens
- Dollar Variables
- Single-Quoted Strings
- No for Loop
- No Unary Increment ++ and --
- No while Loop
- CamelCase Names
- 1-Based Indexing
Dot Methods
// PHP
Module->method()

// THT
Module.method()

THT replaces the arrow notation of PHP's method calls with dots, because dots are simply easier to type and create less visual noise.
This approach is mainstream, and hopefully non-controversial.
Other languages that use dot method calls: JavaScript, Java, Python, Ruby, Swift
V1 Confidence: 100%
JSON Maps
// PHP
[ 'key' => 'value', 'num' => 123 ]

// THT
{ key: 'value', num: 123 }
THT replaces arrow/bracket notation with JS object-literal notation.
This is easier to type and contains less visual noise, and should be familiar to all web developers.
V1 Confidence: 100%
Prefixed Binary Operators
// PHP
$result = $op1 | $op2

// THT
$result = $op1 +| $op2

Binary (bitwise) operators are almost never used in web development, and they are too easily mistaken for their logical counterparts (| vs ||, & vs &&).
If you want a bitmask, most of the time a keyword Map will be easier to work with.
This approach is borrowed from Raku (Perl 6).
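The confusion described above can be reproduced in plain PHP. This is a minimal sketch: the constant and variable names (`CAN_READ`, `CAN_WRITE`, `$permMap`) are invented for illustration and do not come from THT or any library.

```php
<?php
// Hypothetical permission flags, invented for this illustration.
const CAN_READ  = 1;
const CAN_WRITE = 2;

// Bitwise OR combines the two flags into a single integer.
$perms = CAN_READ | CAN_WRITE;

// One extra character gives logical OR, which collapses to a boolean.
$oops = CAN_READ || CAN_WRITE;

// Testing a flag needs a bitwise AND plus a comparison.
$canWrite = ($perms & CAN_WRITE) !== 0;

// The same settings as a keyed array: no operators to mix up.
$permMap = ['read' => true, 'write' => true];
$canWriteMap = $permMap['write'];
```

Both lines are valid PHP, so a mistyped operator produces no error, which is the hazard a distinct prefix is meant to avoid.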
V1 Confidence: 100%
No Semicolons
// PHP
$a += 1;
print($a);

// THT
$a += 1
print($a)
This reduces visual noise, and it reinforces the good practice of having only one statement per line.
It will sometimes be inconvenient for those of us with semicolons engrained in our muscle memory, but THT gives super clear feedback when a mistake is made, so it’s easy to fix.
Other languages that don’t use/require semicolons: Python, Ruby, Swift, Go, Lua
V1 Confidence: 90%
No Outer Parens
// PHP
if ($condition) { ... }

// THT
if $condition { ... }
This reduces visual noise, and makes it easier to balance parens that are within the condition itself.
Other languages that don’t require parens: Python, Ruby, Rust, Swift, Go
V1 Confidence: 90%
Dollar Variables
// THT & PHP
$myVar = 123
THT keeps the dollar “sigils” in variables to retain its identity as a PHP-based programming language.
This admittedly goes against the idea of removing visual noise, but in this case, familiarity is more important.
V1 Confidence: 90%
Single-Quoted Strings
// PHP
$myString = 'Hello ' . "World!"

// THT
$myString = 'Hello ' ~ 'World!'
In PHP, the ability to choose between single or double-quotes is sometimes useful.
However, because string literals are extremely common, this leads to hundreds (maybe thousands) of micro-decisions per project.
Single quotes were chosen for THT because they are a little easier to type (no SHIFT key), and they create a little less visual noise.
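For readers less familiar with PHP, the choice between quote styles changes behavior as well as appearance. A short sketch in plain PHP, with illustrative variable names:

```php
<?php
$name = 'World';

// Single quotes: the text is taken literally, no interpolation.
$single = 'Hello $name';

// Double quotes: $name is interpolated into the string.
$double = "Hello $name";

// Concatenation gives the same result with single quotes only.
$concat = 'Hello ' . $name;
```

With a single quote style, a literal always means what it shows, and inserting a value is always an explicit operation.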
Note: Interpolation is TBD, but THT currently has multiple ways to insert text via .fill(), the ~ stringy operator, and template functions.
V1 Confidence: 90%
No for Loop
“For what it’s worth, we don’t have a single C style for loop in the Lyft codebase.” — Keith Smiley, Lyft
// PHP
for ($i = 1; $i <= 10; $i++) { ... }

// THT
foreach range(1, 10) as $i { ... }

In any language with a foreach (or for in) construct, the C-style for loop is mostly unnecessary because the vast majority of loops iterate over a collection or a range of numbers.
For example, the Laravel project uses foreach 638 times and for 9 times (7 of those could be written as a foreach/range). Counting only the 2 that genuinely need a C-style loop, that’s about 300-to-1.
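The rewrite mentioned above already works in PHP itself. A minimal sketch, with illustrative variable names, showing that both loops visit the same values:

```php
<?php
// C-style loop: initializer, condition and increment must all be right.
$sumFor = 0;
for ($i = 1; $i <= 10; $i++) {
    $sumFor += $i;
}

// foreach over an inclusive range: only the two bounds are stated.
$sumForeach = 0;
foreach (range(1, 10) as $i) {
    $sumForeach += $i;
}
```

Note that PHP's `range()` builds the whole array up front, so this sketch is about readability, not memory use on very large ranges.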
Languages that follow this pattern: Python, Ruby, Swift, Rust
V1 Confidence: 90%
No Unary Increment ++ and --
// PHP
if (++$myVar) { ... }

// THT
$myVar += 1
if $myVar { ... }
For such a simple operation (adding 1), this operator is quite complicated.
It often tempts programmers to write “clever” code that mixes mutation with evaluation, and is further complicated by behaving differently when it appears before or after the subject.
You can simply use += 1 instead.
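The before/after difference is easy to see in plain PHP. A small sketch with illustrative variable names:

```php
<?php
// Post-increment: the expression yields the OLD value, then mutates.
$a = 5;
$b = $a++;

// Pre-increment: mutates first, then yields the NEW value.
$c = 5;
$d = ++$c;

// The explicit form keeps mutation and evaluation on separate lines.
$e = 5;
$e += 1;
$f = $e;
```

After this runs, `$a`, `$c` and `$e` all hold 6, but `$b` holds 5 while `$d` and `$f` hold 6.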
Languages that don’t use ++: Python, Ruby, Swift, Rust
V1 Confidence: 90%
No while Loop
// PHP
$status = true;
while ($status) {
    $status = doSomething();
    if (!$status) {
        break;
    }
}

// THT
loop {
    $status = doSomething()
    if !$status: break
}
The while loop often leads to off-by-one errors and redundant initialization.
The do/while construct complicates things further, as the only language feature that turns the usual (predicate) { block } order upside down.
THT's loop codifies the convention of a while (true) loop, giving you total control over the order of operations and where the loop should break.
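The while (true) convention referred to above looks like this in PHP. It is only a sketch: the queue and its contents are made up for illustration.

```php
<?php
// Hypothetical work queue, invented for this illustration.
$queue = ['a', 'b', 'c'];
$seen = [];

while (true) {
    // Fetch first; array_shift() returns null once the array is empty.
    $item = array_shift($queue);

    // The exit test sits exactly where the decision is made.
    if ($item === null) {
        break;
    }

    $seen[] = $item;
}
```

No variable has to be initialized before the loop just to satisfy the first condition check, which is the redundancy the section describes.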
Keep in mind that while isn’t used very often, so this isn’t a high-impact change. In the Laravel project, while appears in 1 out of every 680 logical lines of code.
Languages with loop: Rust
V1 Confidence: 80%
CamelCase Names
// PHP
$my_variable = HTTPClass::myFunction()

// THT
$myVariable = HttpClass.myFunction()
In THT, everything is camelCase.
I realize this might be a deal-breaker for some people. I honestly don’t have a strong preference, but the consensus among professional programmers is that having a single consistent style is important, regardless of what is used.
Benefits of a single language-level style:
- More consistency for novice programmers
- No extra complexity around mixing styles to denote different things (e.g. ClassNames vs variable_names)
- Less bikeshedding among teams about cosmetic concerns
- All code shared online is consistent
CamelCase was chosen for a few reasons:
- It is familiar and mainstream, used by JavaScript and Java
- It requires a little less typing
- It creates a little less visual noise
- Names appear as one cohesive visual token
- Double-clicking selects the entire token (vs hyphen-case)
- It allows for lowerCamelCase and UpperCamelCase
- Prefixes are still possible with a single letter (e.g. $xMyVariable vs $_my_variable)
It also allows the THT compiler to do some things like:
- Use underscores internally to implement case sensitivity
- Use UpperCamelCase to recognize module names
Languages that use camelCase convention: JavaScript, Java, Swift
V1 Confidence: 80%
1-Based Indexing
$list = ['a', 'b', 'c']

// PHP
$list[0]  //= 'a'

// THT
$list[1]  //= 'a'
Unlike PHP and many other C-derived languages, THT starts at one when counting indexes, instead of zero.
Zero-indexing is a vestige of early low-level languages, where it was more efficient to calculate memory addresses that way.
However, the tiny performance benefits that were needed 50 years ago are irrelevant on today’s machines (the original PDP-11 ran at 1.2 MHz with 4KB of RAM).
Zero-indexing requires an extra mental step, even for many of us who have been programming for a long time. Modern high-level languages can and should remove this cognitive load from the programmer.
Here are a number of practical benefits:
- It’s more natural to humans. (Hence the term, “Easy as 1-2-3”.) It’s one of the first things every child learns.
- Mathematics defines counting as the set of {1, 2, ..., n}.
- It maps to ordinal numerals “first”, “second”, “third”, etc.
1: 1st
2: 2nd
3: 3rd
- Code editors and programming languages all use line numbers that start at 1.
- It is applicable to math and statistics. Languages like Mathematica and Julia use 1-indexing.
- It reduces the need to litter -1 and +1 in our code to manually bridge the gap between human-counting and machine-counting, and reduces the chance of off-by-one errors.
- Index-related functions can return 0 to indicate a missing item, which is conveniently falsey, and also makes semantic sense.
// 1-Index
if !$list.indexOf('X'): return

// 0-Index
if $list.indexOf('X') == -1: return

Other languages use out-of-band values like false or -1 to indicate a missing item. This forces the programmer to use extra caution to make sure it isn’t interpreted as a valid index.
$badIndex = $list.indexOf('missing')

// THT (0)
$list[$badIndex]  // ✓ Immediate error (safe)

// PHP (false)
$list[$badIndex]  // ✕ BUG! Gets the 1st element (false == 0)
- The index of the last element is the same as the collection’s length. Zero-indexed languages need to subtract 1.
// 0-Index
$isLastItem = $index == $list.length() - 1
                                       ^^^
// 1-Index
$isLastItem = $index == $list.length()
- It can provide a 1-to-1 mapping between the integer index and its value.
$division = [
    'Division I',
    'Division II',
    'Division III',
]

// 0-Index
$division[1]  //= 'Division II'

// 1-Index
$division[2]  //= 'Division II'
- It simplifies ranges, with less confusion around inclusive vs exclusive ranges.
Inclusive ranges are more natural than exclusive ranges. For example, ask anyone to “count from 1 to 10”, and they will include the 10.
// 0-Index
foreach range(0, $list.length() - 1) as $i { ... }

// 1-Index
foreach range(1, $list.length()) as $i { ... }
- Negative and positive indexes are symmetrical. In other languages, each direction has a different offset.
// 0-Index
$thirdFromLeft = $list[2]
$thirdFromRight = $list[-3]

// 1-Index
$thirdFromLeft = $list[3]
$thirdFromRight = $list[-3]

- On Dijkstra, who argued in 1982 that numbering should start at zero:
... conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence...
When starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N
The only practical benefit he offers is that you can subtract the min from the max to get the length.
How often does this come up for modern programmers? Not very often.
Regarding his aesthetic point, why does he think the second one is “nicer”?
His first example could be written as the third form below:
1 ≤ i < N+1   <-- has offset
0 ≤ i < N     <-- no offset (Dijkstra)
1 ≤ i ≤ N     <-- no offset & same comparison (symmetrical)
- Finally, any argument around ranges is far less relevant nowadays, since we use the foreach construct to iterate over collections, versus working with ranges directly. (For example, the Laravel project only uses ranges 7 times in 20,000 logical lines of code.)
// Just iterate
foreach ($items as $item) { ... }
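The missing-item hazard described in the list above can be reproduced in plain PHP with `array_search()`, which returns `false` when nothing matches. The sketch below uses illustrative variable names and shows PHP's zero-indexed behavior, not THT's.

```php
<?php
$list = ['a', 'b', 'c'];

// array_search() returns false when the value is not found.
$badIndex = array_search('missing', $list);

// PHP casts the false key to 0, so this silently reads the first element.
$value = $list[$badIndex];

// A match at position 0 is also falsy, so only a strict check is safe.
$firstIndex = array_search('a', $list);
$looksMissing = !$firstIndex;          // true, although 'a' was found
$isMissing = ($firstIndex === false);  // false
```

With 1-based indexes, 0 is never a valid position, so a plain falsy check and a "not found" check would mean the same thing.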
Other languages that use 1-based indexing: Smalltalk, Mathematica, R, Lua, Julia.
V1 Confidence: 80%