Version: v0.8.0 - Beta.  We welcome contributors & feedback.

Language Design Notes

About

This page contains a list of the reasons behind many of the design decisions made for THT.

Ultimately, every decision is a trade-off, trying to balance many factors like familiarity, simplicity, safety, etc.

Not every decision will appeal to everyone, but I hope this page will show that a lot of thought was put into every part of the language.

Design Principles

Here are some of the higher-level principles that helped guide these decisions:

Confidence Score

The “V1 Confidence” percentage for each feature shows the likelihood that it will remain unchanged by Version 1.0.

Nothing is etched in stone, so feedback from Beta users could change things.

Comparisons

In a couple of cases, I refer to the Laravel framework, because it is a larger codebase and an example of well-written, modern PHP code.

Dot Methods 

// PHP
Module->method()

// THT
Module.method()

THT replaces PHP's arrow notation for method calls with dots, because dots are simply easier to type and create less visual noise.

This approach is mainstream, and hopefully non-controversial.

Other languages that use dot method calls: JavaScript, Java, Python, Ruby, Swift

V1 Confidence: 100%

JSON Maps 

// PHP
[ 'key' => 'value', 'num' => 123 ]

// THT
{ key: 'value', num: 123 }

THT replaces arrow/bracket notation with JS object-literal notation.

This is easier to type and contains less visual noise, and should be familiar to all web developers.

V1 Confidence: 100%

Prefixed Binary Operators 

// PHP
$result = $op1 | $op2;

// THT
$result = $op1 +| $op2

Bitwise (binary) operators are almost never used in web development, and the unprefixed forms are too easily mistaken for their logical counterparts.

If you want a bitmask, most of the time a keyword Map will be easier to work with.
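To make the bitmask comparison concrete, here is a minimal sketch in Python (not THT) that stores the same settings once as a bitmask and once as a map of named flags. The permission names are hypothetical and exist only for illustration.

```python
# Hypothetical permission flags, for illustration only.
READ, WRITE, DELETE = 0b001, 0b010, 0b100

# Bitmask style: compact, but `|` and `&` are easy to confuse with `or` / `and`.
mask = READ | WRITE
can_write_mask = bool(mask & WRITE)
can_delete_mask = bool(mask & DELETE)

# Map style: the same information stored under readable keys.
perms = {"read": True, "write": True, "delete": False}
can_write_map = perms["write"]
can_delete_map = perms["delete"]

print(can_write_mask, can_delete_mask)  # True False
print(can_write_map, can_delete_map)    # True False
```

Both forms answer the same questions; the map version simply needs no bit arithmetic to read or update.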

This approach is borrowed from Raku (Perl 6).

V1 Confidence: 100%

No Semicolons 

// PHP
$a += 1;
print($a);

// THT
$a += 1
print($a)

This reduces visual noise, and it reinforces the good practice of having only one statement per line.

It will sometimes be inconvenient for those of us with semicolons engrained in our muscle memory, but THT gives super clear feedback when a mistake is made, so it’s easy to fix.

Other languages that don’t use/require semicolons: Python, Ruby, Swift, Go, Lua

V1 Confidence: 90%

No Outer Parens 

// PHP
if ($condition) {
    ...
}

// THT
if $condition {
    ...
}

This reduces visual noise, and makes it easier to balance parens that are within the condition itself.

Other languages that don’t require parens: Python, Ruby, Rust, Swift, Go

V1 Confidence: 90%

Dollar Variables 

// THT & PHP
$myVar = 123

THT keeps the dollar “sigils” in variables to retain its identity as a PHP-based programming language.

This admittedly goes against the idea of removing visual noise, but in this case, familiarity is more important.

V1 Confidence: 90%

Single-Quoted Strings 

// PHP
$myString = 'Hello ' . "World!";

// THT
$myString = 'Hello ' ~ 'World!'

In PHP, the ability to choose between single or double-quotes is sometimes useful.

However, because string literals are extremely common, this leads to hundreds (maybe thousands) of micro-decisions per project.

Single quotes were chosen for THT because they are a little easier to type (no SHIFT key), and they create a little less visual noise.

Note: Interpolation is TBD, but THT currently has multiple ways to insert text via .fill(), the ~ stringy operator, and template functions.

V1 Confidence: 90%

No for Loop 

“For what it’s worth, we don’t have a single C style for loop in the Lyft codebase.” — Keith Smiley, Lyft

// PHP
for ($i = 1; $i <= 10; $i++) { ... }

// THT
foreach range(1, 10) as $i { ... }

In any language with a foreach (or for in) construct, the C-style for loop is mostly unnecessary because the vast majority of loops iterate over a collection or a range of numbers.
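As a rough illustration in Python (one of the languages listed below, not THT), the sketch here covers the usual jobs of a C-style for loop with range and collection iteration alone. The variable names are made up for the example.

```python
# A C-style `for (i = 1; i <= 10; i++)` counts from 1 to 10 inclusive.
# Python's range() excludes its end value, so the upper bound is 11.
total = 0
for i in range(1, 11):
    total += i

# Iterating over a collection needs no counter at all.
names = ["a", "b", "c"]
joined = ""
for name in names:
    joined += name

# When the position is also needed, enumerate() supplies it.
numbered = [f"{pos}:{name}" for pos, name in enumerate(names, start=1)]
```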

For example, the Laravel project uses foreach 638 times and for 9 times. Because 7 of those 9 could be written as a foreach over a range, only 2 truly need a C-style loop, a ratio of about 300-to-1.

Languages that follow this pattern: Python, Ruby, Swift, Rust

V1 Confidence: 90%

No Unary Increment ++ and -- 

// PHP
if (++$myVar) { ... }

// THT
$myVar += 1
if $myVar { ... }

For such a simple operation (adding 1), this operator is quite complicated.

It often tempts programmers to write “clever” code that mixes mutation with evaluation, and is further complicated by behaving differently when it appears before or after the subject.

You can simply use += 1 instead.

Languages that don’t use ++: Python, Ruby, Swift, Rust

V1 Confidence: 90%

No while Loop 

// PHP
$status = true;
while ($status) {
    $status = doSomething();
    if (!$status) { break; }
}

// THT
loop {
    $status = doSomething()
    if !$status: break
}

The while statement often leads to off-by-one errors and redundant initialization.

The do/while construct complicates things further, as the only language feature that turns the conventional (predicate) { block } order upside down.

THT's loop codifies the convention of a while (true) loop, giving you total control over the order of operations and where the loop should break.
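The same pattern can be sketched in Python, where `while True` stands in for THT's loop. This is only an illustration under that assumption; the function name and the sample data are hypothetical.

```python
def read_until_blank(lines):
    """Collect lines until the first blank one (the classic 'loop-and-a-half')."""
    source = iter(lines)
    collected = []
    while True:                      # plays the role of THT's `loop { ... }`
        line = next(source, "")      # fetch first; "" also marks end of input
        if line == "":               # then decide whether to stop
            break
        collected.append(line)
    return collected

result = read_until_blank(["a", "b", "", "c"])
```

Because the exit test sits in the middle of the body, there is no need to initialize `line` before the loop or to repeat the fetch at the bottom.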

Keep in mind that while isn’t used very often, so this isn’t a high-impact change. In the Laravel project, while appears in 1 out of every 680 logical lines of code.

Languages with loop: Rust

V1 Confidence: 80%

CamelCase Names 

// PHP
$my_variable = HTTPClass::myFunction();

// THT
$myVariable = HttpClass.myFunction()

In THT, everything is camelCase.

I realize this might be a deal-breaker for some people. I honestly don’t have a strong preference, but the consensus among professional programmers is that having a single consistent style is important, regardless of what is used.

Benefits of a single language-level style:

CamelCase was chosen for a few reasons:

It also allows the THT compiler to do some things like:

Languages that use camelCase convention: JavaScript, Java, Swift

V1 Confidence: 80%

1-Based Indexing 

$list = ['a', 'b', 'c']

// PHP
$list[0] //= 'a'

// THT
$list[1] //= 'a'

Unlike PHP and many other C-derived languages, THT starts at one when counting indexes, instead of zero.

Zero-indexing is a vestige of early low-level languages, where it was more efficient to calculate memory addresses that way.

However, the tiny performance benefits that were needed 50 years ago are irrelevant on today’s machines (the original PDP-11 ran at 1.2 MHz with 4KB of RAM).

Zero indexing requires an extra mental step, even for many of us who have been programming for a long time. Modern high-level languages can and should remove this cognitive load from the programmer.

Here are a number of practical benefits:

1: 1st
2: 2nd
3: 3rd

// 1-Index
if !$list.indexOf('X'): return

// 0-Index
if $list.indexOf('X') == -1: return

Other languages use out-of-band values like false or -1 to indicate a missing item. This forces the programmer to use extra caution to make sure it isn’t interpreted as a valid index.
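A minimal Python sketch of the idea, assuming two hypothetical helpers (`index_of` and `at`) that imitate 1-based lookup. They are not THT's implementation; they only show why 0 works well as the "missing" value, and why -1 does not.

```python
def index_of(items, target):
    """Return the 1-based position of target, or 0 if it is missing."""
    for position, item in enumerate(items, start=1):
        if item == target:
            return position
    return 0  # 0 is never a valid 1-based index, and it is falsy

def at(items, position):
    """Fetch by 1-based position; 0 is rejected outright.
    Negative positions are not handled in this sketch."""
    if not 1 <= position <= len(items):
        raise IndexError(f"invalid 1-based index: {position}")
    return items[position - 1]

letters = ["a", "b", "c"]

found = index_of(letters, "b")        # 2
missing = index_of(letters, "X")      # 0, so `if not missing:` reads naturally

# Contrast: Python's own str.find() returns -1 for a miss, and -1 is a valid
# index, so the mistake silently yields the last character.
text = "abc"
silent_bug = text[text.find("X")]     # "c"
```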

$badIndex = $list.indexOf('missing')

// THT (0)
$list[$badIndex]  // ✓ Immediate error (safe)

// PHP (false)
$list[$badIndex]  // ✕ BUG! Gets the 1st element (false == 0)

// 0-Index
$isLastItem = $index == $list.length() - 1
                                       ^^^
// 1-Index
$isLastItem = $index == $list.length()

$division = [
    'Division I',
    'Division II',
    'Division III',
]

// 0-Index
$division[1] //= 'Division II'

// 1-Index
$division[2] //= 'Division II'

Inclusive ranges are more natural than exclusive ranges. For example, ask any normal person to “count from 1 to 10”, and they will all include the 10.
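For comparison, Python's built-in range() is exclusive at the top. The sketch below wraps it in a hypothetical `inclusive_range` helper to show how inclusive bounds line up with 1-based positions.

```python
def inclusive_range(first, last):
    """Return first..last with both ends included (hypothetical helper)."""
    return list(range(first, last + 1))

items = ["a", "b", "c"]

# 1-based positions: the bounds are simply 1 and the length.
one_based = inclusive_range(1, len(items))        # [1, 2, 3]

# 0-based positions with the same inclusive helper need the "- 1" offset.
zero_based = inclusive_range(0, len(items) - 1)   # [0, 1, 2]

count_to_ten = inclusive_range(1, 10)
```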

// 0-Index
foreach range(0, $list.length() - 1) as $i { ... }

// 1-Index
foreach range(1, $list.length()) as $i { ... }

// 0-Index
$thirdFromLeft  = $list[2]
$thirdFromRight = $list[-3]

// 1-Index
$thirdFromLeft  = $list[3]
$thirdFromRight = $list[-3]

“... conventions a) and b) have the advantage that the difference between the bounds as mentioned equals the length of the subsequence...”

“When starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N”

The only practical benefit Dijkstra offers in these passages (from his note “Why numbering should start at zero”) is that you can subtract the min from the max to get the length.

How often does this come up for modern programmers? Not very often.

Regarding his aesthetic point, why does he think the second one is “nicer”?

His first example could be written as “1 ≤ i ≤ N”, which is arguably the nicest of the three, because it is more symmetrical.

1 ≤ i < N+1  <-- has offset
0 ≤ i < N    <-- no offset (Dijkstra)
1 ≤ i ≤ N    <-- no offset & same comparison (symmetrical)

// Just iterate
foreach ($items as $item) {
    ...
}

Other languages that use 1-based indexing: Smalltalk, Mathematica, R, Lua, Julia.

V1 Confidence: 80%