Regex Syntax
See also Language Tour - Regexes
Basics
Literal characters within a pattern are matched as-is, and are case sensitive.
Pattern | Matches |
xyz | abcxyz |
ran | orange |
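As a rough illustration of the two rows above, here is a minimal sketch using Python's `re` module as a stand-in (an assumption: this is not THT syntax, and the variable names are hypothetical). It shows that a literal pattern matches wherever it occurs in the input, and that case matters.

```python
import re

# Literal characters match themselves wherever they occur in the input.
xyz_match = re.search(r"xyz", "abcxyz")
ran_match = re.search(r"ran", "orange")
print(xyz_match.group())  # xyz
print(ran_match.group())  # ran

# Matching is case sensitive: an uppercase input does not match a lowercase pattern.
upper_match = re.search(r"ran", "ORANGE")
print(upper_match)  # None
```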
Quantifiers
You can match a variable number of repetitions of the preceding character (or group) with a quantifier symbol.
Symbol | Match |
* | Zero or more times |
+ | One or more times |
? | Zero or one time |
{n} | Exact number of times |
{n,} | N or more times |
{n1,n2} | Between n1 and n2 times |
Examples:
Pattern | Matches |
ab*z | az, abz, abbbbz |
ab+z | abz, abbbbz |
ab?z | az, abz |
ab{3}z | abbbz |
ab{2,}z | abbz, abbbbbbbz |
ab{1,2}z | abz, abbz |
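The quantifier table can be checked mechanically. The sketch below uses Python's `re` module as a stand-in (an assumption: it is not THT code, and `full_matches` is a hypothetical helper). Each pattern is tested against a few candidates, including some that should fail.

```python
import re

# Each pattern from the table, paired with candidate strings to test.
cases = {
    r"ab*z": ["az", "abz", "abbbbz"],
    r"ab+z": ["az", "abz", "abbbbz"],
    r"ab?z": ["az", "abz", "abbz"],
    r"ab{3}z": ["abbz", "abbbz"],
    r"ab{2,}z": ["abz", "abbz", "abbbbbbbz"],
    r"ab{1,2}z": ["abz", "abbz", "abbbz"],
}

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

for pattern, candidates in cases.items():
    print(pattern, "->", full_matches(pattern, candidates))
```

Using a full-string match here keeps the quantifier from being satisfied by only part of a longer candidate.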
Meta Characters
Meta-characters let you match characters by their type.
You can also escape special characters using a backslash `\`.
Symbol | Match |
. | Any single character |
\d | Digit [0-9] |
\D | Non-digit |
\w | Alphanumeric character or underscore |
\W | Non-alphanumeric character |
\s | Whitespace character |
\S | Non-whitespace character |
\n | Newline |
\ | Escape a special character |
Examples:
Pattern | Matches |
a.z | abz, a2z, a z, a#z |
a\d+z | a1z, a2345z |
123\D | 123x, 123! |
\w+ | apple, c64 |
\w+\W | hello!, correct. |
\w+\s\w+ | red fish, abc 123 |
\d+ \+ \d | 2 + 3 |
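A few of these patterns, sketched with Python's `re` module as a stand-in (an assumption: this is not THT code, `first_match` is a hypothetical helper, and Python's `\d` and `\w` also accept non-ASCII digits and letters by default). The last line shows that the dot does not match a newline unless a modifier says otherwise.

```python
import re

def first_match(pattern, text):
    """Return the first substring matched by pattern, or None if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r"a\d+z", "xx a2345z xx"))   # a2345z
print(first_match(r"123\D", "123!"))           # 123!
print(first_match(r"\w+\s\w+", "red fish"))    # red fish
print(first_match(r"\d+ \+ \d", "sum: 2 + 3")) # 2 + 3 (the escaped \+ is a literal plus)
print(first_match(r"a.z", "a\nz"))             # None
```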
Character Classes
You can specify a list of characters to match by using a character class `[...]`.
To match all characters that are NOT in the list, start the class with a caret `^`.
Symbol | Match |
[abc] | Any character that is 'a', 'b', or 'c' |
[a-z] | Any character between 'a' and 'z' (ASCII order) |
[^abc] | Any character that is NOT 'a', 'b', or 'c' |
[+*?.] | Special characters are treated as literals |
[0-9a-f] | Any character between 0 and 9, or 'a' through 'f' |
Examples:
Pattern | Matches |
[tdl]ime | time, dime, lime |
[a-z]+zzle | fizzle, drizzle, muzzle |
fla[^w] | flag, flat, fla2 |
\w+[?!.] | Whoa!, Huh?, okay. |
[A-Z0-9]+ | C3P0, TRS80, AW3SOM3 |
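The following sketch exercises the character classes above using Python's `re` module as a stand-in (an assumption: this is not THT code, and the sample strings are made up for illustration). It covers a simple list, a negated class, special characters inside a class, and ranges.

```python
import re

# A class matches exactly one character from its list.
words = ["time", "dime", "lime", "mime"]
ime_words = [w for w in words if re.fullmatch(r"[tdl]ime", w)]
print(ime_words)  # ['time', 'dime', 'lime']

# A leading caret negates the class: 'flaw' is skipped.
not_w = re.findall(r"fla[^w]", "flag flaw flat fla2")
print(not_w)  # ['flag', 'flat', 'fla2']

# Inside a class, + * ? . are plain literals.
specials = re.findall(r"[+*?.]", "1+2*3?.")
print(specials)  # ['+', '*', '?', '.']

# Ranges can be combined in one class.
codes = re.findall(r"[A-Z0-9]+", "C3P0 and TRS80")
hex_runs = re.findall(r"[0-9a-f]+", "ff00 zz 7e")
print(codes)     # ['C3P0', 'TRS80']
print(hex_runs)  # ['ff00', '7e']
```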
Non-Greedy Match
By default, quantifiers slurp in as many characters as possible.
Add a `?` to a quantifier to make it non-greedy.
Examples:
Pattern | Input | Matches | Without '?' |
.*?/ | abc/def/xyz | abc/ | abc/def/ |
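To make the difference concrete, here is a minimal sketch using Python's `re` module as a stand-in (an assumption: this is not THT code, and the variable names are hypothetical). The greedy form runs to the last slash, while the non-greedy form stops at the first one.

```python
import re

text = "abc/def/xyz"

# Greedy: .* takes as much as it can, then backs up to the last slash.
greedy = re.match(r".*/", text).group()
print(greedy)  # abc/def/

# Non-greedy: .*? takes as little as it can, stopping at the first slash.
lazy = re.match(r".*?/", text).group()
print(lazy)  # abc/
```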
Unicode Characters
To match a specific Unicode code point, use `\x{number}`, where the number is hexadecimal.
To match a built-in Unicode character class (see below), use `\p{...}`.
To match characters that are not in the character class, use uppercase `\P{...}`.
For more information, see this Unicode Regex Reference.
Code | Matches |
\x{1234} | Unicode code point U+1234 |
\p{L} | Any kind of letter from any language. |
\p{Z} | Any kind of whitespace or invisible separator. |
\p{N} | Any kind of numeric character in any script. |
\p{P} | Any kind of punctuation character. |
Anchor Symbols
Anchor symbols are not characters, but represent positions within the string (between characters).
Symbol | Match |
^ | Beginning of string |
$ | End of string |
\b | Word boundary |
\B | Not a word boundary |
Examples:
Pattern | Matches | Does Not Match |
^gr | great, green grape | agriculture |
ine$ | this is fine | is this fine? |
s+\b | less is more, hissss | finesse |
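The anchor rows can be checked with a small sketch using Python's `re` module as a stand-in (an assumption: this is not THT code, and `found` is a hypothetical helper). Because anchors match positions rather than characters, the same letters can match in one string and fail in another.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(found(r"^gr", "green grape"))     # True: 'gr' is at the start
print(found(r"^gr", "agriculture"))     # False: 'gr' is not at the start
print(found(r"ine$", "this is fine"))   # True: 'ine' is at the end
print(found(r"ine$", "is this fine?"))  # False: '?' follows 'ine'
print(found(r"s+\b", "less is more"))   # True: 'ss' ends a word
print(found(r"s+\b", "finesse"))        # False: every 's' is followed by a letter
```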
Match Groups
You can group together subpatterns with parens `(...)`.
This allows you to do two things:
- Capture the inner match for later use
- Provide "OR" logic via the `|` separator
Symbol | Match |
(\w+) | Capture word match |
(ab|cd) | Capture 'ab' or 'cd' |
(ab|cd|xy) | Capture 'ab' or 'cd' or 'xy' |
(abc|[0-9]*) | Capture 'abc' or any number of digits |
Example:
    $text = 'Product: Tomatoes, Count: 33'
    $pattern = rx'Product: (\w+), Count: (\d+|none)'
    $matches = $text.match($pattern)

    print($matches[1])  //= 'Tomatoes'
    print($matches[2])  //= '33'
Modifiers
Modifiers are flags appended to the end of a regex string that change the overall behavior of the pattern.
Suffix | Name | Effect |
i | Ignore Case | Case-insensitive match. |
m | Multiline | ^ and $ match start/end of a line, not the full string. |
s | Single Line | Dot . also matches newlines. |
x | Extended | Ignore literal spaces between pattern symbols. (To provide more clarity in complex patterns.) |
Example:
    $poem = '''
        Roses are Red
        Violets are Blue
    '''

    $poem.contains(rx'rose'i)          //= true
    $poem.contains(rx'red$'mi)         //= true  (end of line)
    $poem.contains(rx'red.*blue'si)    //= true  (across lines)

    $line = 'ID:ABC-123'
    $line.match(rx'\w+ : (\w+ - \d+)'x)  //= { 1: 'ABC-123' }