StarAgile
Mar 10, 2023
A regular expression (regex or regexp) is a specialised pattern that describes a certain amount of text. A regex cheat sheet, or regular expressions cheat sheet, collects these patterns in one place for quick reference. Regexes are used in natural language processing and in other common text data manipulation tasks.
There are several regex engines that can be used to process regexes. Engines behave differently depending on the syntax they support, and you can get a list of the most popular engines in this blog. Two of the common languages, Python and R, have their own engines.
Because a regex describes a pattern of text, it can be used to extract substrings from longer strings, check whether a pattern occurs in a text, and modify text. There are many different types of regex, from simple ones that describe specific words to more complex ones that find general patterns of characters, such as the top-level domain at the end of a URL.
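As a rough illustration of those three uses (checking, extracting, and modifying), here is a minimal sketch using Python's built-in `re` module. The sample sentence and the variable names are made up for this example.

```python
import re

text = "Visit https://example.com or https://example.org for details."

# Check: does the pattern occur anywhere in the text?
has_url = re.search(r"https://", text) is not None

# Extract: capture the top-level domain at the end of each URL.
tlds = re.findall(r"https://\w+\.(\w+)", text)

# Modify: replace every URL with a placeholder.
redacted = re.sub(r"https://\S+", "<url>", text)

print(has_url, tlds, redacted)
```

`re.search` answers the yes/no question, `re.findall` returns the captured substrings, and `re.sub` rewrites the text.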
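Regular expressions are a powerful tool for manipulating text and searching for patterns. Here's a cheat sheet of some of the most commonly used regex syntax. The constructs are shown in a general form, and a few of them are written differently in Python's built-in re module or are not available there.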
1. Anchors
These match a position just before or after any other character:
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
^, \A | Matches the start of line | ^r, \Ar | run recess | dog cat |
$, \Z | Match the end of line | n$, n\Z | corn fun | mobile speaker |
\b | Match a word boundary (the start or end of a word) | \brat\b | the rat ran the rat ate | ratskin ratflow |
\B | Match a position that is not a word boundary (inside a word) | \Boo\B | book shook | shoe flue |
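The anchors above can be tried out with a short sketch in Python's `re` module. The word list is made up for illustration. Note that in Python, `^` and `$` only match at line breaks when `re.MULTILINE` is set, and `\A` and `\Z` always refer to the whole string.

```python
import re

words = ["run", "recess", "dog", "corn", "ratskin", "the rat ran", "book", "shoe"]

# ^ anchors the pattern to the start of the string.
starts_with_r = [w for w in words if re.search(r"^r", w)]

# $ anchors the pattern to the end of the string.
ends_with_n = [w for w in words if re.search(r"n$", w)]

# \b requires a word boundary on both sides, so "ratskin" is rejected.
whole_word_rat = [w for w in words if re.search(r"\brat\b", w)]

# \B requires a non-boundary, so "oo" must sit inside a word.
inner_oo = [w for w in words if re.search(r"\Boo\B", w)]

print(starts_with_r, ends_with_n, whole_word_rat, inner_oo)
```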
2. Matching types of character
Rather than matching one specific character, you can match a type of character, such as a letter, a digit, and more.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
. | Match anything except a line break | c.e | cheap clean | Acert cent |
\d | Match a digit | \d | 6060-896 2b|^2b | Tw **___ |
\D | Match a non-digit | \D | The 5 cats ate 12 angry rats | 52 10032 |
\w | Match word characters | \wee\w | Trees bees | The bee Eels eat skin |
\W | Match non-word characters | \Wbat\W | At bat Swing the bat slow | Wombat bat53 |
\s | Match whitespace | \sfox\s | The fox ate The fox ran | It’s the fox fox-fur |
\S | Match non-whitespace | \See\S | Trees beef | The bee stung The tall tree |
\metacharacter | Escape a metacharacter to match on the metacharacter | \. \^ | The cat ate. 2^3 | The cat ate 23 |
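A few of these character types in action, as a minimal Python sketch. The sample strings are borrowed from the table or made up, and the variable names are only for illustration.

```python
import re

# \d picks out each digit individually.
digits = re.findall(r"\d", "The 5 cats ate 12 angry rats")

# \w needs a word character on both sides of "ee".
word_ee = re.findall(r"\wee\w", "Trees and bees")

# \s needs whitespace on both sides of "fox".
fox_found = re.search(r"\sfox\s", "The fox ate") is not None
fox_missing = re.search(r"\sfox\s", "fox-fur") is None

# Escaping turns a metacharacter into a literal character.
dots = re.findall(r"\.", "The cat ate. 2^3")
carets = re.findall(r"\^", "The cat ate. 2^3")

print(digits, word_ee, fox_found, fox_missing, dots, carets)
```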
3. Character classes
These are sets or ranges of characters
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
[xy] | Match any one of several characters | gr[ea]y | gray grey | green greek |
[x-y] | Match a range of characters | [a-e] | Amber brand | Fox join |
[^xy] | Match any character except the listed ones | gre[^ay] | green greek | gray grey |
[\^\-] | Match metacharacters inside the character class | 4[\^.\-+*/]\d | 4^3 4.2 | 44 23 |
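Here is a small Python sketch of character classes. Inside a class, a hyphen between two characters defines a range, so the sketch escapes it as `\-` to keep it literal. The test strings come from the table or are made up.

```python
import re

# [ea] accepts exactly one of the listed characters.
colors = [w for w in ["gray", "grey", "green", "greek"] if re.search(r"gr[ea]y", w)]

# [a-e] accepts any single character in the range a to e.
in_range = re.findall(r"[a-e]", "brand")

# [^...] accepts any single character that is NOT listed.
consonants = re.findall(r"[^aeiou\s]", "fox join")

# Inside a class, escape ^ and - so they are taken literally.
operators = [s for s in ["4^3", "4.2", "44", "23"] if re.search(r"4[\^.\-+*/]\d", s)]

print(colors, in_range, consonants, operators)
```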
4. Repetition
Instead of matching a single instance of a character, you can match repeated occurrences of it.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
x* | Matches zero or more times | ar*o | Cacao carrot | Arugula artichoke |
x+ | Matches one or more times | re+ | Green tree | Trap ruined |
x? | Matches zero or one times | ro?a | roast rant | Root rear |
x{m} | Match m times | \we{2}\w | deer seer | Red enter |
x{m,} | Match m or more times | 2{3,}4 | 671-2224 22224 | 224 123 |
x{m,n} | Match between m and n times | 12{1,3}3 | 1234 1222389 | 15335 122223 |
x*?, x+?, etc. | Match the minimum number of times (known as a lazy quantifier) | re+? | Tree free | Trout roasted |
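The difference between greedy and lazy quantifiers is easiest to see side by side. The following is a minimal Python sketch that reuses patterns from the table. The input strings are assembled for illustration.

```python
import re

# * allows zero or more "r" between "a" and "o".
star = re.findall(r"ar*o", "cacao carrot")

# {3,} needs at least three 2s before the 4.
at_least_three = re.findall(r"2{3,}4", "671-2224 22224 224")

# {1,3} allows one to three 2s, so four 2s in a row do not match.
one_to_three = re.findall(r"12{1,3}3", "1234 1222389 122223")

# Greedy + takes as many "e" as possible; lazy +? takes as few as possible.
greedy = re.search(r"re+", "tree").group()
lazy = re.search(r"re+?", "tree").group()

print(star, at_least_three, one_to_three, greedy, lazy)
```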
5. Capturing, alternation & backreferences
Capture groups let you identify the parts of a string that you want to extract.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
(x) | Capturing a pattern | (iss)+ | Mississippi missed | Mist persist |
(?:x) | Create a group without capturing | (?:ab)(cd) | Match: abcd Group 1: cd | acbd |
(?<name>x) | Create a named capture group | (?<first>\d)(?<second>\d)\d* | Match: 1325 first: 1 second: 3 | 2 hello |
(x|y) | Match several alternative patterns | (re|ba) | red banter | Rant bear |
\n | reference previous captures where n is the group index starting at 1 | (b)(\w*)\1 | blob bribe | Bear bring |
\k<name> | Reference named captures | (?<first>5)(\d*)\k<first> | 51245 55 | 523 51 |
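In Python's built-in `re` module, named groups are written `(?P<name>...)` and named backreferences `(?P=name)` rather than the `(?<name>...)` and `\k<name>` forms shown in the table. With that adjustment, the examples can be sketched as follows (variable names are illustrative):

```python
import re

# Non-capturing group: "ab" is matched but only "cd" becomes group 1.
m = re.search(r"(?:ab)(cd)", "abcd")
whole, group_one = m.group(0), m.group(1)

# Named groups, using Python's (?P<name>...) spelling.
named = re.search(r"(?P<first>\d)(?P<second>\d)\d*", "1325")
first, second = named.group("first"), named.group("second")

# Alternation: match either "re" or "ba".
alternatives = re.findall(r"re|ba", "red banter")

# Numbered backreference: \1 must repeat what group 1 captured.
backref = [w for w in ["blob", "bribe", "bear", "bring"]
           if re.search(r"(b)(\w*)\1", w)]

# Named backreference, using Python's (?P=name) spelling.
named_backref = [s for s in ["51245", "55", "523", "51"]
                 if re.search(r"(?P<first>5)(\d*)(?P=first)", s)]

print(whole, group_one, first, second, alternatives, backref, named_backref)
```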
6. Lookahead and lookbehind
You can require or forbid certain characters before or after a match without including those characters in the match itself.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
(?=x) | looks ahead at the next characters without using them in the match | an(?=an) iss(?=ipp) | banana Mississippi | band missed |
(?!x) | looks ahead at next characters to not match on | ai(?!n) | fail brail | faint train |
(?<=x) | looks at previous characters for a match without using those in the match | (?<=tr)a | trail translate | bear streak |
(?<!x) | looks at previous characters to not match on | (?<!tr)a | bear translate | trail strained |
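A short Python sketch of all four lookaround forms follows. The price example at the end is made up to show a typical use. In every case, the text inside the lookaround is checked but left out of the result.

```python
import re

# Positive lookahead: "an" only when followed by another "an".
ahead = re.findall(r"an(?=an)", "banana")

# Negative lookahead: "ai" only when NOT followed by "n".
not_ahead = [w for w in ["fail", "brail", "faint", "train"]
             if re.search(r"ai(?!n)", w)]

# Positive lookbehind: "a" only when preceded by "tr".
behind = [w for w in ["trail", "translate", "bear", "streak"]
          if re.search(r"(?<=tr)a", w)]

# Negative lookbehind: "a" only when NOT preceded by "tr".
not_behind = [w for w in ["bear", "translate", "trail", "strained"]
              if re.search(r"(?<!tr)a", w)]

# Typical use: grab amounts that follow a dollar sign, without the sign.
amounts = re.findall(r"(?<=\$)\d+", "cost $40 and $7")

print(ahead, not_ahead, behind, not_behind, amounts)
```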
7. Literal matches and modifiers
Modifiers change the rules the engine uses when matching.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
\Qx\E | Match x literally, so metacharacters between \Q and \E lose their special meaning | \Qtell\E \Q\d\E | tell \d | I have 5 coins |
(?i)x(?-i) | set the regex string to case-insensitive | (?i)te(?-i) | sTep tEach | Trench bear |
(?x)x(?-x) | regex ignores whitespace | (?x)t a p(?-x) | tap tapdance | c a t rot a potato |
(?s)x(?-s) | Turns on single-line/DOTALL mode, which makes the “.” match new-line symbols (\n) in addition to everything else | (?s)first and.second(?-s) and.third | first and\nsecond and third | first and second and\nthird |
(?m)x(?-m) | Changes ^ and $ to match at the start and end of each line rather than of the whole string | (?m)^eat and sleep$ | eat and sleep eat and sleep | treat and sleep eat and sleep. |
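Python's `re` module handles these modifiers a little differently from the table: it has no `\Q...\E` (the `re.escape` function does the same job), and flags are usually passed as arguments such as `re.DOTALL` or scoped with a group like `(?i:...)`. A minimal sketch under those assumptions, with made-up sample strings:

```python
import re

# Scoped flag group: only "te" is matched case-insensitively.
case_insensitive = re.findall(r"(?i:te)", "sTep tEach")

# re.escape plays the role of \Q...\E by escaping every metacharacter.
literal = re.escape(r"\d")
literal_hit = re.search(literal, r"match \d literally")
literal_miss = re.search(literal, "I have 5 coins")

# re.VERBOSE (the x modifier) ignores whitespace inside the pattern.
verbose_hit = re.search(r"t a p", "tapdance", re.VERBOSE)

# re.DOTALL (the s modifier) lets "." match a newline.
without_dotall = re.search(r"first.second", "first\nsecond")
with_dotall = re.search(r"first.second", "first\nsecond", re.DOTALL)

# re.MULTILINE (the m modifier) makes ^ and $ match at every line.
lines = "eat and sleep\neat and sleep"
single_line = re.findall(r"^eat and sleep$", lines)
multi_line = re.findall(r"^eat and sleep$", lines, re.MULTILINE)

print(case_insensitive, verbose_hit.group(), len(multi_line))
```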
8. Unicode
Regular expressions also work beyond the Roman alphabet, for example with Chinese characters and emojis.
Syntax | Description | Example Pattern | Example Matches | Example non-matches |
\X | Match graphemes | \u0040gmail | @gmail www.email@gmail | gmail @aol |
\X\X | Match special characters like ones with an accent | \u00e8 or \u0065\u0300 | è | e |
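Python's built-in `re` module does not support `\X`, so the sketch below takes a different route to the accented-character case: it normalizes the text with the standard `unicodedata` module before comparing. It also shows that `\uXXXX` escapes and `\w` work on non-Roman text. The sample strings are made up.

```python
import re
import unicodedata

composed = "\u00e8"          # è as a single code point
decomposed = "\u0065\u0300"  # e followed by a combining grave accent

# The two spellings look identical but are different strings...
same_raw = composed == decomposed

# ...until both are brought to the same normalization form (NFC).
same_nfc = unicodedata.normalize("NFC", decomposed) == composed

# \u0040 is the code point for "@".
at_gmail = re.findall(r"\u0040gmail", "www.email@gmail")

# For str patterns, \w is Unicode-aware and matches Chinese characters.
cjk_words = re.findall(r"\w+", "正则 表达式")

print(same_raw, same_nfc, at_gmail, cjk_words)
```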
Regular expressions can be complex and powerful, but also difficult to master. This regex cheat sheet covers some of the most commonly used patterns and syntax, but there are many more possibilities and combinations.
In conclusion, a regex cheatsheet is a quick reference guide that provides a comprehensive list of regular expressions and their corresponding meanings. It is a helpful tool for developers, data analysts, and anyone who works with text data and needs to extract, search or replace specific patterns.
A regex cheat sheet typically includes a variety of regular expression syntaxes, such as anchors, quantifiers, character classes, groups, and assertions. It also covers special characters and metacharacters that have specific meanings within regular expressions, such as the dot (.), caret (^), dollar sign ($), and backslash (\).
Using a regex cheatsheet can save time and increase productivity by allowing users to quickly and easily identify the appropriate regular expression pattern for a particular task. However, it is important to keep in mind that regular expressions can be complex and may require additional practice and experimentation to master.
If you want a hassle-free experience in your career and want to reap all the benefits of the programming language or data science collectively then StarAgile provides a course for Data Science. Data Science is a growing field in the current era. If you complete the training provided by us, you will not only get Data Science Certification but it will also develop your skills to the next level.