Version: v0.7.1 - Beta.  We welcome contributors & feedback.

Regex Syntax

See also Language Tour - Regexes

Basics

Literal characters within a pattern are matched as-is, and are case-sensitive.

Pattern Matches
xyz abcxyz
ran orange

Quantifiers

You can match a variable number of repetitions of the preceding character with a quantifier symbol.

Symbol Match
* Zero or more times
+ One or more times
? Zero or one time
{n} Exact number of times
{n,} N or more times
{n1,n2} Between n1 and n2 times

Examples:

Pattern Matches
ab*z az, abz, abbbbz
ab+z abz, abbbbz
ab?z az, abz
ab{3}z abbbz
ab{2,}z abbz, abbbbbbbz
ab{1,2}z abz, abbz
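
As a rough check of the quantifier rules above, here is a minimal sketch using Python's `re` module as a stand-in engine. The choice of Python, the `matching` helper, and the extra non-matching candidates are assumptions added for illustration; they are not part of this language's API.

```python
import re

# Patterns from the table above, each paired with candidate strings.
# Some candidates are deliberately chosen so that they should NOT match.
cases = {
    r"ab*z": ["az", "abz", "abbbbz"],
    r"ab+z": ["az", "abz", "abbbbz"],
    r"ab?z": ["az", "abz", "abbz"],
    r"ab{3}z": ["abbz", "abbbz"],
    r"ab{2,}z": ["abz", "abbz", "abbbbbbbz"],
    r"ab{1,2}z": ["abz", "abbz", "abbbz"],
}

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

results = {p: matching(p, c) for p, c in cases.items()}
for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Each quantifier applies only to the `b` directly before it, which is why `ab+z` rejects `az` while `ab*z` accepts it.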

Meta Characters

Meta-characters let you match characters by their type.

You can also escape special characters using backslash \.

Symbol Match
. Any single character
\d Digit [0-9]
\D Non-digit
\w Alphanumeric character
\W Non-alphanumeric character
\s Whitespace character
\S Non-whitespace character
\n Newline
\ Escape a special character

Examples:

Pattern Matches
a.z abz, a2z, a z, a#z
a\d+z a1z, a2345z
123\D 123x, 123!
\w+ apple, c64
\w+\W hello!, correct.
\w+\s\w+ red fish, abc 123
\d+ \+ \d 2 + 3
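
The meta-character examples can be tried out with a short sketch like the one below. It uses Python's `re` module, which is assumed here to treat these symbols the same way; `is_match` is a hypothetical helper, and the last two pairs are added counter-examples.

```python
import re

def is_match(pattern, text):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

pairs = [
    (r"a.z", "a#z"),          # . matches any single character
    (r"a\d+z", "a2345z"),     # \d+ matches one or more digits
    (r"123\D", "123!"),       # \D matches a non-digit
    (r"\w+\W", "hello!"),     # word characters, then one non-word character
    (r"\d+ \+ \d", "2 + 3"),  # \+ is an escaped, literal plus sign
    (r"a\d+z", "abz"),        # counter-example: 'b' is not a digit
    (r"123\D", "1234"),       # counter-example: '4' is a digit
]

results = [(p, t, is_match(p, t)) for p, t in pairs]
for pattern, text, ok in results:
    print(f"{pattern!r} vs {text!r}: {ok}")
```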

Character Classes

You can specify a list of characters to match by using a character class [...].

To match all characters that are NOT in the list, start the class with a caret ^.

Symbol Match
[abc] Any character that is 'a', 'b', or 'c'
[a-z] Any character between 'a' and 'z' (ASCII order)
[^abc] Any character that is NOT 'a', 'b', or 'c'
[+*?.] Special characters are treated as literals
[0-9a-f] Any character between 0 and 9, or 'a' through 'f'

Examples:

Pattern Matches
[tdl]ime time, dime, lime
[a-z]+zzle fizzle, drizzle, muzzle
fla[^w] flag, flat, fla2
\w+[?!.] Whoa!, Huh?, okay.
[A-Z0-9]+ C3P0, TRS80, AW3SOM3
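
A small sketch of character classes in action, again using Python's `re` module as an assumed stand-in. The sample sentences and variable names are made up for illustration.

```python
import re

# [tdl] matches exactly one of 't', 'd', or 'l'.
rhymes = re.findall(r"[tdl]ime", "It takes time to earn a dime or find a lime.")

# [^w] matches any single character except 'w', so 'flaw' is skipped.
not_w = re.findall(r"fla[^w]", "flag flat flaw fla2")

# Ranges are case-sensitive: [0-9a-f] does not include 'A' through 'F'.
lower_hex = re.fullmatch(r"[0-9a-f]+", "c0ffee") is not None
upper_hex = re.fullmatch(r"[0-9a-f]+", "C0FFEE") is not None

print(rhymes)
print(not_w)
print(lower_hex, upper_hex)
```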

Non-Greedy Match

By default, quantifiers slurp in as many characters as possible.

Add a `?` to a quantifier to make it non-greedy.

Examples:

Pattern  Input        Matches  Without '?'
.*?/     abc/def/xyz  abc/     abc/def/
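
The difference between greedy and non-greedy matching is easiest to see side by side. This is a minimal sketch in Python's `re` module (an assumed stand-in engine), run against the input string `abc/def/xyz`:

```python
import re

text = "abc/def/xyz"

# Greedy: .* takes as much as it can, so the match ends at the LAST slash.
greedy = re.match(r".*/", text).group(0)

# Non-greedy: .*? takes as little as it can, so the match ends at the FIRST slash.
lazy = re.match(r".*?/", text).group(0)

print(greedy)  # abc/def/
print(lazy)    # abc/
```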

Anchor Symbols

Anchor symbols are not characters, but represent positions within the string (between characters).

Symbol Match
^ Beginning of string
$ End of string
\b Word boundary
\B Not a word boundary

Examples:

Pattern  Matches               Not Match
^gr      great, green grape    agriculture
ine$     this is fine          is this fine?
s+\b     less is more, hissss  finesse
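
The anchor examples can be checked with a sketch like the following. It assumes Python's `re` module, and `found` is a hypothetical helper. One known difference: Python's `$` also matches just before a trailing newline, which does not affect these inputs.

```python
import re

def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

checks = {
    "start_hit": found(r"^gr", "green grape"),      # 'gr' at the very start
    "start_miss": found(r"^gr", "agriculture"),     # 'gr' is not at the start
    "end_hit": found(r"ine$", "this is fine"),      # 'ine' at the very end
    "end_miss": found(r"ine$", "is this fine?"),    # '?' follows 'ine'
    "boundary_hit": found(r"s+\b", "hissss"),       # run of 's' ends the word
    "boundary_miss": found(r"s+\b", "finesse"),     # 'ss' is followed by 'e'
}

for name, result in checks.items():
    print(name, result)
```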

Match Groups

You can group together subpatterns with parens (...).

This allows you to do two things:

  1. Capture the inner match for later use
  2. Provide "OR" logic via the | separator

Symbol Match
(\w+) Capture word match
(ab|cd) Capture 'ab' or 'cd'
(ab|cd|xy) Capture 'ab' or 'cd' or 'xy'
(abc|[0-9]*) Capture 'abc' or any number of digits (including none)

Example:

$text = 'Product: Tomatoes, Count: 33'
$pattern = r'Product: (\w+), Count: (\d+|none)'

$matches = $text.match($pattern)

print($matches[1])
//= 'Tomatoes'

print($matches[2])
//= '33'

Modifiers

Modifiers are flags that change the overall behavior of the pattern.

Suffix  Name         Effect
i       Ignore case  Case-insensitive match.
m       Multiline    ^ and $ match the start/end of each line, not just the full string.
s       Single line  Dot . also matches newlines.
x       Extended     Ignore whitespace in the pattern (allows manual formatting).
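
To illustrate what each modifier does, here is a hedged sketch in Python's `re` module. Python passes modifiers as flag constants rather than suffix letters, so the mapping used here (`i` to `re.IGNORECASE`, `m` to `re.MULTILINE`, `s` to `re.DOTALL`, `x` to `re.VERBOSE`) is an assumption about equivalent behavior, not this language's own syntax. The sample text is made up.

```python
import re

text = "Apple\nbanana\nAPPLE pie"

# i: letter case is ignored.
insensitive = re.findall(r"apple", text, flags=re.IGNORECASE)

# m: without it, ^ only matches at the start of the whole string.
first_line_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: without it, the dot refuses to match the newline between the lines.
dot_default = re.search(r"Apple.banana", text)
dot_all = re.search(r"Apple.banana", text, flags=re.DOTALL)

# x: whitespace inside the pattern is ignored, so it can be spaced out.
spaced = re.fullmatch(r"\d+ \s* \+ \s* \d+", "2 + 3", flags=re.VERBOSE)

print(insensitive)
print(first_line_only, each_line)
print(dot_default is None, dot_all is not None)
print(spaced is not None)
```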

TODO: examples