Regex Syntax
See also Language Tour - Regexes
Basics
Literal characters within a pattern are matched as-is, and are case sensitive.
Pattern | Matches |
xyz | abcxyz |
ran | orange |
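As a rough illustration of the table above, here is a minimal sketch using Python's `re` module as a stand-in engine. This is an assumption: the language documented here has its own regex API, which may behave differently in its details.

```python
import re

# Literal patterns match anywhere in the input and are case sensitive.
found_xyz = re.search(r"xyz", "abcxyz")    # matches the trailing "xyz"
found_ran = re.search(r"ran", "orange")    # matches "ran" inside "orange"
found_upper = re.search(r"ran", "ORANGE")  # no match: the case differs

print(found_xyz.group(0))
print(found_ran.span())
print(found_upper)
```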
Quantifiers
A quantifier symbol after a character (or group) matches a variable number of repetitions of it.
Symbol | Match |
* | Zero or more times |
+ | One or more times |
? | Zero or one time |
{n} | Exactly n times |
{n,} | n or more times |
{n1,n2} | Between n1 and n2 times (inclusive) |
Examples:
Pattern | Matches |
ab*z | az, abz, abbbbz |
ab+z | abz, abbbbz |
ab?z | az, abz |
ab{3}z | abbbz |
ab{2,}z | abbz, abbbbbbbz |
ab{1,2}z | abz, abbz |
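The quantifier examples above can be checked with a short sketch. It assumes Python's `re` module as a stand-in engine and uses `re.fullmatch` so that each input must match the pattern in full. The `cases` and `rejected` names are only for illustration.

```python
import re

# Each pattern from the table, with inputs that should match in full.
cases = {
    r"ab*z": ["az", "abz", "abbbbz"],
    r"ab+z": ["abz", "abbbbz"],
    r"ab?z": ["az", "abz"],
    r"ab{3}z": ["abbbz"],
    r"ab{2,}z": ["abbz", "abbbbbbbz"],
    r"ab{1,2}z": ["abz", "abbz"],
}
results = {
    pattern: [re.fullmatch(pattern, s) is not None for s in inputs]
    for pattern, inputs in cases.items()
}

# A few inputs that fall outside the allowed repetition counts.
rejected = [
    re.fullmatch(r"ab+z", "az"),        # '+' needs at least one 'b'
    re.fullmatch(r"ab?z", "abbz"),      # '?' allows at most one 'b'
    re.fullmatch(r"ab{3}z", "abbz"),    # exactly three 'b's required
    re.fullmatch(r"ab{1,2}z", "abbbz"), # at most two 'b's allowed
]

print(results)
```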
Meta Characters
Meta-characters let you match characters by their type.
You can also escape special characters using a backslash `\`.
Symbol | Match |
. | Any single character |
\d | Digit [0-9] |
\D | Non-digit |
\w | Alphanumeric character or underscore |
\W | Non-alphanumeric character |
\s | Whitespace character |
\S | Non-whitespace character |
\n | Newline |
\ | Escape a special character |
Examples:
Pattern | Matches |
a.z | abz, a2z, a z, a#z |
a\d+z | a1z, a2345z |
123\D | 123x, 123! |
\w+ | apple, c64 |
\w+\W | hello!, correct. |
\w+\s\w+ | red fish, abc 123 |
\d+ \+ \d | 2 + 3 |
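A minimal sketch of the meta-character examples, again assuming Python's `re` module as a stand-in (the engine documented here may differ in edge cases). It also illustrates two details that are easy to miss: `\w` accepts an underscore, and `.` does not match a newline by default.

```python
import re

# Patterns from the table, each with inputs that should match in full.
examples = {
    r"a.z": ["abz", "a2z", "a z", "a#z"],
    r"a\d+z": ["a1z", "a2345z"],
    r"123\D": ["123x", "123!"],
    r"\w+": ["apple", "c64"],
    r"\w+\W": ["hello!", "correct."],
    r"\w+\s\w+": ["red fish", "abc 123"],
    r"\d+ \+ \d": ["2 + 3"],
}
all_ok = all(
    re.fullmatch(pattern, s)
    for pattern, inputs in examples.items()
    for s in inputs
)

underscore_is_word = re.fullmatch(r"\w+", "snake_case") is not None
dot_skips_newline = re.fullmatch(r"a.z", "a\nz") is None

# The backslash turns the special '+' into a literal plus sign.
escaped_plus = re.findall(r"\+", "1+1=2")

print(all_ok, underscore_is_word, dot_skips_newline, escaped_plus)
```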
Character Classes
You can specify a list of characters to match by using a character class `[...]`.
To match all characters that are NOT in the list, start the class with a caret `^`.
Symbol | Match |
[abc] | Any character that is 'a', 'b', or 'c' |
[a-z] | Any character between 'a' and 'z' (ASCII order) |
[^abc] | Any character that is NOT 'a', 'b', or 'c' |
[+*?.] | Special characters are treated as literals |
[0-9a-f] | Any character between 0 and 9, or 'a' through 'f' |
Examples:
Pattern | Matches |
[tdl]ime | time, dime, lime |
[a-z]+zzle | fizzle, drizzle, muzzle |
fla[^w] | flag, flat, fla2 |
\w+[?!.] | Whoa!, Huh?, okay. |
[A-Z0-9]+ | C3P0, TRS80, AW3SOM3 |
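The character class examples can be sketched as follows, assuming Python's `re` module as a stand-in engine. `re.findall` returns every non-overlapping match in the input, which makes it easy to see what a class accepts and rejects.

```python
import re

# A class matches exactly one character from the listed set.
rhymes = re.findall(r"[tdl]ime", "time dime lime mime")

# A leading caret negates the class: anything except 'w'.
not_w = re.findall(r"fla[^w]", "flag flat flaw fla2")

# Ranges can be combined inside one class.
codes = re.findall(r"[A-Z0-9]+", "C3P0 and TRS80")
hex_runs = re.findall(r"[0-9a-f]+", "ff00 zz 7e")

# Special characters lose their meaning inside a class.
symbols = re.findall(r"[+*?.]", "1+2*3?.")

print(rhymes, not_w, codes, hex_runs, symbols)
```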
Non-Greedy Match
By default, quantifiers slurp in as many characters as possible.
Add a `?` to a quantifier to make it non-greedy.
Examples:
Pattern | Input | Matches | Without '?' |
.*?/ | abc/def/xyz | abc/ | abc/def/ |
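To make the difference concrete, here is a small sketch that assumes Python's `re` module as a stand-in engine. The non-greedy form stops at the first `/`, while the greedy form runs to the last one.

```python
import re

text = "abc/def/xyz"

lazy = re.match(r".*?/", text).group(0)   # stops at the first slash
greedy = re.match(r".*/", text).group(0)  # extends to the last slash

print(lazy)
print(greedy)
```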
Anchor Symbols
Anchor symbols are not characters, but represent positions within the string (between characters).
Symbol | Match |
^ | Beginning of string |
$ | End of string |
\b | Word boundary |
\B | Not a word boundary |
Examples:
Pattern | Matches | Does Not Match |
^gr | great, green grape | agriculture |
ine$ | this is fine | is this fine? |
s+\b | less is more, hissss | finesse |
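The anchor examples can be sketched like this, assuming Python's `re` module as a stand-in engine. The `\B` case uses the hypothetical inputs "concat" and "cat", which are not taken from the table.

```python
import re

# '^' only matches at the very start of the string.
starts_gr = re.search(r"^gr", "green grape") is not None
inside_gr = re.search(r"^gr", "agriculture") is not None

# '$' only matches at the very end of the string.
ends_ine = re.search(r"ine$", "this is fine") is not None
ends_qmark = re.search(r"ine$", "is this fine?") is not None

# '\b' requires the run of 's' to end at a word boundary.
s_runs = re.findall(r"s+\b", "less is more")
s_hiss = re.findall(r"s+\b", "hissss")
s_finesse = re.findall(r"s+\b", "finesse")

# '\B' is the opposite: the position must be inside a word.
mid_word = re.search(r"\Bcat", "concat") is not None
word_start = re.search(r"\Bcat", "cat") is not None

print(starts_gr, inside_gr, ends_ine, ends_qmark)
print(s_runs, s_hiss, s_finesse, mid_word, word_start)
```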
Match Groups
You can group together subpatterns with parens `(...)`.
This allows you to do two things:
- Capture the inner match for later use
- Provide "OR" logic via the `|` separator
Symbol | Match |
(\w+) | Capture word match |
(ab|cd) | Capture 'ab' or 'cd' |
(ab|cd|xy) | Capture 'ab' or 'cd' or 'xy' |
(abc|[0-9]*) | Capture 'abc' or any amount of digits |
Example:
    $text = 'Product: Tomatoes, Count: 33'
    $pattern = r'Product: (\w+), Count: (\d+|none)'
    $matches = $text.match($pattern)
    print($matches[1])  //= 'Tomatoes'
    print($matches[2])  //= '33'
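For comparison, the same capture can be sketched in Python's `re` module. This is a stand-in, not the API of the language documented here. The second input, with `Count: none`, is a hypothetical addition that exercises the other branch of the `|` alternative.

```python
import re

pattern = r"Product: (\w+), Count: (\d+|none)"

# Group 0 is the whole match; groups 1 and 2 are the parenthesized parts.
m = re.search(pattern, "Product: Tomatoes, Count: 33")
product, count = m.group(1), m.group(2)

# The second group accepts either digits or the literal word 'none'.
m_none = re.search(pattern, "Product: Apples, Count: none")
count_none = m_none.group(2)

# Alternation on its own: one of several fixed options.
choice = re.fullmatch(r"(ab|cd|xy)", "cd").group(1)

print(product, count, count_none, choice)
```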
Modifiers
Modifiers are flags that change the overall behavior of the pattern.
Suffix | Name | Effect |
i | Ignore case | Case-insensitive match. |
m | Multiline | ^ and $ match start/end of a line, not the full string. |
s | Single line | Dot . also matches newlines. |
x | Extended | Ignore whitespace in the pattern (allows manual formatting). |
TODO: examples
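Until examples in this language's own syntax are written, the effect of each modifier can be sketched with Python's `re` module as a stand-in. Note the assumption: Python passes modifiers as flag arguments (`re.I`, `re.M`, `re.S`, `re.X`) rather than as a pattern suffix, so only the behavior carries over, not the notation.

```python
import re

# i: ignore case
ignore_case = re.search(r"hello", "HELLO", re.I) is not None

# m: '^' and '$' match at the start/end of every line
lines = "one\ntwo\nthree"
first_words_plain = re.findall(r"^\w+", lines)
first_words_multi = re.findall(r"^\w+", lines, re.M)

# s: '.' also matches a newline
dot_plain = re.search(r"a.z", "a\nz") is not None
dot_single = re.search(r"a.z", "a\nz", re.S) is not None

# x: whitespace in the pattern is ignored, so it can be spaced out freely
spaced = re.fullmatch(r"\d+ \s \+ \s \d+", "2 + 3", re.X) is not None

print(ignore_case, first_words_plain, first_words_multi)
print(dot_plain, dot_single, spaced)
```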