Version: v0.8.1 - Beta.  We welcome contributors & feedback.

Regex Syntax

See also Language Tour - Regexes

Basics

Literal characters within a pattern are matched as-is, and are case sensitive.

Pattern Matches
xyz abcxyz
ran orange

Quantifiers

You can match a variable number of repetitions of the preceding character with a quantifier symbol.

Symbol Match
* Zero or more times
+ One or more times
? Zero or one time
{n} Exactly n times
{n,} n or more times
{n1,n2} Between n1 and n2 times (inclusive)

Examples:

Pattern Matches
ab*z az, abz, abbbbz
ab+z abz, abbbbz
ab?z az, abz
ab{3}z abbbz
ab{2,}z abbz, abbbbbbbz
ab{1,2}z abz, abbz
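
The quantifier examples above can be checked mechanically. The sketch below uses Python's re module as a stand-in rather than this language's own rx'...' syntax, on the assumption that these basic quantifiers behave the same way in both. The names examples and all_match are made up for illustration.

```python
import re

# Pattern -> strings that should match in full (taken from the table above).
examples = {
    r"ab*z": ["az", "abz", "abbbbz"],
    r"ab+z": ["abz", "abbbbz"],
    r"ab?z": ["az", "abz"],
    r"ab{3}z": ["abbbz"],
    r"ab{2,}z": ["abbz", "abbbbbbbz"],
    r"ab{1,2}z": ["abz", "abbz"],
}


def all_match(table):
    """Return True if every string fully matches its pattern."""
    return all(
        re.fullmatch(pattern, text) is not None
        for pattern, texts in table.items()
        for text in texts
    )


print(all_match(examples))                 # True
print(re.fullmatch(r"ab+z", "az"))         # None: '+' needs at least one 'b'
print(re.fullmatch(r"ab{1,2}z", "abbbz"))  # None: three 'b's exceed {1,2}
```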

Meta Characters

Meta-characters let you match characters by their type.

You can also escape special characters using backslash \.

Symbol Match
. Any single character
\d Digit [0-9]
\D Non-digit
\w Word character (alphanumeric or underscore)
\W Non-word character
\s Whitespace character
\S Non-whitespace character
\n Newline
\ Escape a special character

Examples:

Pattern Matches
a.z abz, a2z, a z, a#z
a\d+z a1z, a2345z
123\D 123x, 123!
\w+ apple, c64
\w+\W hello!, correct.
\w+\s\w+ red fish, abc 123
\d+ \+ \d 2 + 3
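
As a rough check of the meta-character examples above, the following sketch again uses Python's re module as a stand-in (an assumption, since the engine behind rx'...' may differ in details). The variable names are illustrative only.

```python
import re

# (pattern, text) pairs taken from the examples table above.
checks = [
    (r"a.z", "a#z"),
    (r"a\d+z", "a2345z"),
    (r"123\D", "123!"),
    (r"\w+\W", "hello!"),
    (r"\w+\s\w+", "red fish"),
    (r"\d+ \+ \d", "2 + 3"),  # '\+' is a literal plus sign
]

meta_results = [re.fullmatch(p, s) is not None for p, s in checks]
print(meta_results)  # [True, True, True, True, True, True]

# Without the backslash, '+' acts as a quantifier on the preceding space,
# so the literal '+' in the text is never matched.
unescaped = re.fullmatch(r"\d+ + \d", "2 + 3")
print(unescaped)  # None
```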

Character Classes

You can specify a list of characters to match by using a character class [...].

To match all characters that are NOT in the list, start the class with a caret ^.

Symbol Match
[abc] Any character that is 'a', 'b', or 'c'
[a-z] Any character between 'a' and 'z' (ASCII order)
[^abc] Any character that is NOT 'a', 'b', or 'c'
[+*?.] Special characters are treated as literals
[0-9a-f] Any character between '0' and '9', or between 'a' and 'f'

Examples:

Pattern Matches
[tdl]ime time, dime, lime
[a-z]+zzle fizzle, drizzle, muzzle
fla[^w] flag, flat, fla2
\w+[?!.] Whoa!, Huh?, okay.
[A-Z0-9]+ C3P0, TRS80, AW3SOM3
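
A short sketch of how character classes filter input, written with Python's re module as a stand-in for rx'...' (assuming the two treat these classes alike). The extra words 'mime' and 'flaw' are added here only to show a non-match.

```python
import re

# Only the listed letters are accepted in the first position.
words = ["time", "dime", "lime", "mime"]
ime_matches = [w for w in words if re.fullmatch(r"[tdl]ime", w)]
print(ime_matches)  # ['time', 'dime', 'lime']

# A leading caret negates the class: anything except 'w' is accepted.
candidates = ["flag", "flat", "fla2", "flaw"]
negated = [w for w in candidates if re.fullmatch(r"fla[^w]", w)]
print(negated)  # ['flag', 'flat', 'fla2']

# Inside a class, '+', '*', '?' and '.' are plain literals.
specials = re.findall(r"[+*?.]", "1+2*3? ok.")
print(specials)  # ['+', '*', '?', '.']
```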

Non-Greedy Match

By default, quantifiers slurp in as many characters as possible.

Add a `?` to a quantifier to make it non-greedy.

Examples:

Pattern  Matches                  Without '?'
.*?/     abc/ (in abc/def/xyz)    abc/def/ (in abc/def/xyz)
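
The difference between greedy and non-greedy matching is easiest to see side by side. This sketch uses Python's re module as a stand-in for rx'...', assuming both follow the usual greedy and lazy quantifier rules.

```python
import re

text = "abc/def/xyz"

# Non-greedy: stop at the first '/' that allows a match.
lazy = re.match(r".*?/", text).group()

# Greedy (no '?'): take as much as possible, ending at the last '/'.
greedy = re.match(r".*/", text).group()

print(lazy)    # abc/
print(greedy)  # abc/def/
```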

Unicode Characters

To match a specific Unicode code point, use \x{number}, where number is the code point in hexadecimal.

To match a built-in Unicode character class (see below), use \p{...}.

To match characters that are not in the character class, use uppercase \P{...}.

For more information, see this Unicode Regex Reference.

Code Matches
\x{1234} Unicode code point U+1234
\p{L} Any kind of letter from any language.
\p{Z} Any kind of whitespace or invisible separator.
\p{N} Any kind of numeric character in any script.
\p{P} Any kind of punctuation character.

Anchor Symbols

Anchor symbols are not characters, but represent positions within the string (between characters).

Symbol Match
^ Beginning of string
$ End of string
\b Word boundary
\B Not a word boundary

Examples:

Pattern  Matches                Does Not Match
^gr      great, green grape     agriculture
ine$     this is fine           is this fine?
s+\b     less is more, hissss   finesse
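
The anchor examples above can be reproduced with a small search helper. The sketch uses Python's re module as a stand-in for rx'...' (an assumption about equivalent anchor behavior). The helper name found is hypothetical.

```python
import re


def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


print(found(r"^gr", "green grape"))     # True: 'gr' is at the start
print(found(r"^gr", "agriculture"))     # False: 'gr' is not at the start
print(found(r"ine$", "this is fine"))   # True: 'ine' is at the end
print(found(r"ine$", "is this fine?"))  # False: '?' follows 'ine'
print(found(r"s+\b", "hissss"))         # True: the run of 's' ends the word
print(found(r"s+\b", "finesse"))        # False: 'ss' is followed by 'e'
```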

Match Groups

You can group together subpatterns with parens (...).

This allows you to do two things:

  1. Capture the inner match for later use
  2. Provide "OR" logic via the | separator

Symbol Match
(\w+) Capture word match
(ab|cd) Capture 'ab' or 'cd'
(ab|cd|xy) Capture 'ab' or 'cd' or 'xy'
(abc|[0-9]*) Capture 'abc' or any number of digits (including none)

Example:

$text = 'Product: Tomatoes, Count: 33'
$pattern = rx'Product: (\w+), Count: (\d+|none)'

$matches = $text.match($pattern)

print($matches[1])
//= 'Tomatoes'

print($matches[2])
//= '33'
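
For comparison, here is the same capture pattern in Python's re module. This is only an illustrative equivalent, not this language's API. The second input string is made up to exercise the 'none' branch of the alternation.

```python
import re

text = "Product: Tomatoes, Count: 33"
pattern = re.compile(r"Product: (\w+), Count: (\d+|none)")

m = pattern.search(text)
product, count = m.group(1), m.group(2)
print(product)  # Tomatoes
print(count)    # 33

# The second group also accepts the literal word 'none'.
none_count = pattern.search("Product: Kale, Count: none").group(2)
print(none_count)  # none
```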

Modifiers

Modifiers are flags appended to the end of a regex string that change the overall behavior of the pattern.

Suffix Name Effect
i Ignore Case Case-insensitive match.
m Multiline ^ and $ match start/end of a line, not the full string.
s Single Line Dot . also matches newlines.
x Extended Ignore literal spaces between pattern symbols (to provide more clarity in complex patterns).

Example:

$poem = '''
    Roses are Red
    Violets are Blue
'''

$poem.contains(rx'rose'i)
//= true

$poem.contains(rx'red$'mi)
//= true (end of line) ^

$poem.contains(rx'red.*blue'si)
//= true (across lines)     ^

$line = 'ID:ABC-123'
$line.match(rx'\w+ : (\w+ - \d+)'x)
//= { 1: 'ABC-123' }
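
To show what each modifier changes, the sketch below runs the same patterns with and without the flag. It uses Python's re module, where the flags are passed as re.I, re.M, re.S and re.X instead of a suffix. This is a stand-in for illustration and assumes the suffixes map onto the same behavior.

```python
import re

poem = "Roses are Red\nViolets are Blue\n"

# i: case-insensitive match.
ignore_case = re.search(r"rose", poem, re.I) is not None

# m: '$' matches at the end of each line, not only the end of the string.
with_m = re.search(r"red$", poem, re.M | re.I) is not None
without_m = re.search(r"red$", poem, re.I) is not None

# s: '.' also matches newlines, so the match can span lines.
with_s = re.search(r"red.*blue", poem, re.S | re.I) is not None
without_s = re.search(r"red.*blue", poem, re.I) is not None

# x: literal spaces in the pattern are ignored.
extended = re.search(r"\w+ : (\w+ - \d+)", "ID:ABC-123", re.X).group(1)

print(ignore_case)        # True
print(with_m, without_m)  # True False
print(with_s, without_s)  # True False
print(extended)           # ABC-123
```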