Regex Syntax

Basics

Literal characters within a pattern are matched as-is, and are case sensitive.

Pattern	Matches
xyz	abcxyz
ran	orange
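
As a quick check of the rule above, here is a minimal sketch using Python's `re` module as a stand-in engine (an assumption; it is not the `rx'...'` syntax used in this manual, though literal matching should behave the same way). The variable names are illustrative only.

```python
import re

# A literal pattern matches the same characters anywhere in the string.
found = re.search(r"xyz", "abcxyz") is not None

# Matching is case sensitive by default, so 'RAN' does not match 'orange'.
missed = re.search(r"RAN", "orange") is None

print(found, missed)  # True True
```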

Quantifiers

You can match a variable number of repetitions of the preceding character with a quantifier symbol.

Symbol	Match
*	Zero or more times
+	One or more times
?	Zero or one time
{n}	Exactly n times
{n,}	n or more times
{n1,n2}	Between n1 and n2 times (inclusive)

Examples:

Pattern	Matches
ab*z	az, abz, abbbbz
ab+z	abz, abbbbz
ab?z	az, abz
ab{3}z	abbbz
ab{2,}z	abbz, abbbbbbbz
ab{1,2}z	abz, abbz
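
The quantifier table can be verified with a short sketch. It assumes Python's `re` module as a stand-in for the regex engine (the quantifier syntax is the same), and the helper `matching` is a hypothetical name introduced here for illustration.

```python
import re

samples = ["az", "abz", "abbz", "abbbz", "abbbbz"]

def matching(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ab*z")        # zero or more 'b'
plus = matching(r"ab+z")        # one or more 'b'
optional = matching(r"ab?z")    # zero or one 'b'
exact = matching(r"ab{3}z")     # exactly three 'b'
at_least = matching(r"ab{2,}z") # two or more 'b'
ranged = matching(r"ab{1,2}z")  # one or two 'b'

print(optional)  # ['az', 'abz']
print(exact)     # ['abbbz']
```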

Meta Characters

Meta characters let you match characters by their type.

You can also escape special characters using backslash \.

Symbol	Match
.	Any single character
\d	Digit [0-9]
\D	Non-digit
\w	Alphanumeric character or underscore
\W	Non-alphanumeric character
\s	Whitespace character
\S	Non-whitespace character
\n	Newline
\	Escape a special character

Examples:

Pattern	Matches
a.z	abz, a2z, a z, a#z
a\d+z	a1z, a2345z
123\D	123x, 123!
\w+	apple, c64
\w+\W	hello!, correct.
\w+\s\w+	red fish, abc 123
\d+ \+ \d	2 + 3
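
The sketch below runs a few rows of the table and shows why the backslash escape matters. It assumes Python's `re` module in place of the manual's own regex strings; these meta characters behave the same way there for ASCII input.

```python
import re

# Pattern -> sample string, taken from the examples table.
checks = {
    r"a.z": "a#z",
    r"a\d+z": "a2345z",
    r"123\D": "123!",
    r"\w+\W": "hello!",
    r"\w+\s\w+": "red fish",
    r"\d+ \+ \d": "2 + 3",
}
results = {p: re.fullmatch(p, s) is not None for p, s in checks.items()}
print(all(results.values()))  # True

# An unescaped '+' is a quantifier; escaping it matches a literal plus sign.
unescaped = re.fullmatch(r"2+3", "2+3")  # no match: means "one or more 2s, then 3"
escaped = re.fullmatch(r"2\+3", "2+3")   # matches the literal text
```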

Character Classes

You can specify a list of characters to match by using a character class [...].

To match all characters that are NOT in the list, start the class with a caret ^.

Symbol	Match
[abc]	Any character that is 'a', 'b', or 'c'
[a-z]	Any character between 'a' and 'z' (ASCII order)
[^abc]	Any character that is NOT 'a', 'b', or 'c'
[+*?.]	Special characters are treated as literals
[0-9a-f]	Any character '0' through '9', or 'a' through 'f'

Examples:

Pattern	Matches
[tdl]ime	time, dime, lime
[a-z]+zzle	fizzle, drizzle, muzzle
fla[^w]	flag, flat, fla2
\w+[?!.]	Whoa!, Huh?, okay.
[A-Z0-9]+	C3P0, TRS80, AW3SOM3
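
A small sketch of lists, negation, ranges, and literal special characters inside a class. Python's `re` module is assumed here as a stand-in engine; the sample strings are made up for illustration.

```python
import re

# A list of allowed first letters.
words = ["time", "dime", "lime", "mime"]
rhymes = [w for w in words if re.fullmatch(r"[tdl]ime", w)]
print(rhymes)  # ['time', 'dime', 'lime']

# A negated class: anything except 'w' after 'fla'.
negated = re.findall(r"fla[^w]", "flag flat flaw fla2")
print(negated)  # ['flag', 'flat', 'fla2']

# Ranges are case sensitive, like literals.
lower_hex = re.fullmatch(r"[0-9a-f]+", "c0ffee") is not None
upper_hex = re.fullmatch(r"[0-9a-f]+", "C0FFEE") is not None

# Inside a class, '+', '*', '?' and '.' need no escaping.
symbols = re.findall(r"[+*?.]", "1+2*3?")
print(symbols)  # ['+', '*', '?']
```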

Non-Greedy Match

By default, quantifiers slurp in as many characters as possible.

Add a `?` to a quantifier to make it non-greedy.

Examples:

Pattern	Input	Match	Match without '?'
.*?/	abc/def/xyz	abc/	abc/def/
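
To see the difference directly, here is a minimal sketch assuming Python's `re` module as the engine (greedy and non-greedy quantifiers work the same way there):

```python
import re

path = "abc/def/xyz"

# Greedy: '.*' takes as much as it can, then backs up to the last slash.
greedy = re.match(r".*/", path).group()

# Non-greedy: '.*?' takes as little as it can, stopping at the first slash.
lazy = re.match(r".*?/", path).group()

print(greedy)  # abc/def/
print(lazy)    # abc/
```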

Unicode Characters

To match a specific Unicode code point, use \x{number}, where the number is the code point in hexadecimal.

To match a built-in Unicode character class (see below), use \p{...}.

To match characters that are not in the character class, use uppercase \P{...}.

For more information, see this Unicode Regex Reference.

Code	Matches
\x{1234}	Unicode code point U+1234
\p{L}	Any kind of letter from any language.
\p{Z}	Any kind of whitespace or invisible separator.
\p{N}	Any kind of numeric character in any script.
\p{P}	Any kind of punctuation character.

Anchor Symbols

Anchor symbols are not characters, but represent positions within the string (between characters).

Symbol	Match
^	Beginning of string
$	End of string
\b	Word boundary
\B	Not a word boundary

Pattern	Matches	Does Not Match
^gr	great, green grape	agriculture
ine$	this is fine	is this fine?
s+\b	less is more, hissss	finesse
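
The anchor examples can be checked with a short sketch. It assumes Python's `re` module as a stand-in engine; `re.search` is used so that only the anchors, not the function, pin the match to a position.

```python
import re

# '^' only matches at the beginning of the string.
starts = [s for s in ["great", "green grape", "agriculture"]
          if re.search(r"^gr", s)]
print(starts)  # ['great', 'green grape']

# '$' only matches at the end of the string.
ends = [s for s in ["this is fine", "is this fine?"]
        if re.search(r"ine$", s)]
print(ends)  # ['this is fine']

# '\b' requires the run of 's' to end where a word ends.
runs = re.findall(r"s+\b", "less is more, hissss")
print(runs)  # ['ss', 's', 'ssss']

# In 'finesse' every 's' is followed by another word character.
inside_word = re.search(r"s+\b", "finesse")  # None
```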

Match Groups

You can group together subpatterns with parens (...).

This allows you to do two things:

Capture the inner match for later use
Provide "OR" logic via the | separator

Symbol	Match
(\w+)	Capture word match
(ab|cd)	Capture 'ab' or 'cd'
(ab|cd|xy)	Capture 'ab' or 'cd' or 'xy'
(abc|[0-9]*)	Capture 'abc' or any number of digits

Example:

$text = 'Product: Tomatoes, Count: 33'
$pattern = rx'Product: (\w+), Count: (\d+|none)'

$matches = $text.match($pattern)

print($matches[1])
//= 'Tomatoes'

print($matches[2])
//= '33'
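
For comparison, the same pattern can be sketched in Python's `re` module (an assumption made for illustration; it is not this manual's API). The second input is a made-up string that exercises the `none` branch of the "OR" group.

```python
import re

pattern = r"Product: (\w+), Count: (\d+|none)"

# Group 1 captures the product name, group 2 the count.
m = re.search(pattern, "Product: Tomatoes, Count: 33")
product, count = m.group(1), m.group(2)
print(product, count)  # Tomatoes 33

# The '|' inside the second group also accepts the literal word 'none'.
m2 = re.search(pattern, "Product: Basil, Count: none")
empty_count = m2.group(2)
print(empty_count)  # none
```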

Modifiers

Modifiers are flags appended to the end of a regex string that change the overall behavior of the pattern.

Suffix	Name	Effect
i	Ignore Case	Case-insensitive match.
m	Multiline	`^` and `$` match start/end of a line, not the full string.
s	Single Line	Dot `.` also matches newlines.
x	Extended	Ignore literal spaces between pattern symbols. (To provide more clarity in complex patterns.)

$poem = '''
    Roses are Red
    Violets are Blue
'''

$poem.contains(rx'rose'i)
//= true

$poem.contains(rx'red$'mi)
//= true (end of line) ^

$poem.contains(rx'red.*blue'si)
//= true (across lines)     ^

$line = 'ID:ABC-123'
$line.match(rx'\w+ : (\w+ - \d+)'x)
//= { 1: 'ABC-123' }
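
The effect of each modifier is easiest to see by running a pattern with and without it. The sketch below assumes Python's `re` module, where the flags are passed as arguments (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, `re.VERBOSE`) rather than appended as suffix letters; the poem string is written without indentation for simplicity.

```python
import re

poem = "Roses are Red\nViolets are Blue\n"

# i: ignore case.
no_i = re.search(r"rose", poem)                    # None ('Roses' is capitalized)
with_i = re.search(r"rose", poem, re.IGNORECASE)   # match

# m: '$' also matches at the end of each line.
no_m = re.search(r"red$", poem, re.IGNORECASE)                    # None
with_m = re.search(r"red$", poem, re.IGNORECASE | re.MULTILINE)   # match

# s: '.' also matches newlines, so the pattern can span lines.
no_s = re.search(r"red.*blue", poem, re.IGNORECASE)               # None
with_s = re.search(r"red.*blue", poem, re.IGNORECASE | re.DOTALL) # match

# x: spaces in the pattern are ignored.
ident = re.search(r"\w+ : (\w+ - \d+)", "ID:ABC-123", re.VERBOSE).group(1)
print(ident)  # ABC-123
```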
