How to learn RegEx
Online Tool: RegExr: Learn, Build, & Test RegEx or https://regex101.com/
If you directly want to see Regex Examples: RegEx Examples
The terms used in this article may not be the exact standard terms; I have used the terms and names I am comfortable with. For the exact terms, refer to the RegEx documentation.
RegEx (Regular Expressions) are used to find a particular pattern in a string. RegEx processing can be slow: with small inputs this is not a problem, but with very large inputs a RegEx operation can take noticeable time.
A regex cannot replace a string or data on its own; it only searches. You can use a RegEx inside a tool or command that performs the replacement. Below is a Python script with its input and output strings.
input = Jessa knows testing and machine learning
Output = Jessa_knows_testing_and_machine_learning
import re
target_str = "Jessa knows testing and machine learning"
res_str = re.sub(r"\s", "_", target_str)
print(res_str)  # Output: Jessa_knows_testing_and_machine_learning
The above command finds every whitespace character (here, the spaces) in target_str and replaces all occurrences with _.
Below is the text that will be used throughout the article to test the regex searches; you may see some modifications to it in some cases:
RegExr was created by gski!@#$%^&*()_+nner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matc!@#$%^&*()/_+hes. Roll over matches or the expression for details. PCRE & JavaScript / flavors of RegEx are suppoRegrted. Validate your expression with Tests mode.
01. First 05. Fifth
02. Second 06. Sixth
03. Third 07. Seventh
04. Fourth 08. Eight
1. RegEx Syntax
1.1 Character Set
Character set shortcuts are just shorthand for a set of characters. You can either use a shortcut or write the character set out directly.
Character set = [a-zA-Z0-9_], for all letters (lower and upper case), digits, and _.
Shortcut = \w, for the same set: all letters (lower and upper case), digits, and _.
Below are some more shortcuts. Which form you use is a matter of personal preference. I like to use the explicit character set to keep the regex readable.
. => Matches any character (excluding a new line), including special characters, letters, digits, etc.
\d => Matches any digit. Character set = [0-9]
\w => Matches any word character. Character set = [a-zA-Z0-9_]
\s => Matches any whitespace (space, tab, new line, etc.). Character set = [ \t\n\r\f\v]
For the rest of the sets, check the documentation.
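As a rough illustration of the shortcuts above, the following Python sketch runs each shortcut next to an explicit character set and compares the results. The sample string and variable names are made up for this example, and the equivalence shown assumes plain ASCII input (on Unicode text, Python's \d and \w match more characters than the explicit sets).

```python
import re

# Hypothetical sample text: letters, digits, an underscore, spaces and a tab.
text = "Room 42, floor_3\tEast wing"

# Digits: shortcut vs. explicit character set.
digits_shortcut = re.findall(r"\d", text)
digits_explicit = re.findall(r"[0-9]", text)

# Word characters: letters, digits and underscore.
words_shortcut = re.findall(r"\w+", text)
words_explicit = re.findall(r"[a-zA-Z0-9_]+", text)

# Whitespace: space, tab, new line and a few others.
spaces_shortcut = re.findall(r"\s", text)
spaces_explicit = re.findall(r"[ \t\n\r\f\v]", text)

print(digits_shortcut)   # ['4', '2', '3']
print(words_shortcut)    # ['Room', '42', 'floor_3', 'East', 'wing']
print(spaces_shortcut)   # [' ', ' ', '\t', ' ']
```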
1.2 Count of findings to Search in character set
We use {count} to specify the count of findings, i.e. how many consecutive characters from the given character set must match. For example:
Search = Find 2 digits next to each other.
Character set = [0-9]
Count of findings to search = 2
Using character set method:
[0-9]{2}/g
Using character set shortcut method:
\d{2}/g
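The same search can be tried in Python. This is a minimal sketch with a made-up sample string; re.findall is used here as the closest equivalent of the global flag, since Python has no /g suffix.

```python
import re

# Hypothetical sample: two numbered items plus a longer run of digits.
text = "01. First 05. Fifth, order 12345"

# {2} asks for exactly two consecutive characters from the set.
pairs_from_set = re.findall(r"[0-9]{2}", text)
pairs_from_shortcut = re.findall(r"\d{2}", text)

print(pairs_from_set)       # ['01', '05', '12', '34']
print(pairs_from_shortcut)  # ['01', '05', '12', '34']
```

Note that the trailing 5 in 12345 is left over: matches do not overlap, and a single digit cannot satisfy {2}.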
1.3 Prevent Character collision / confusion
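We use \ (backslash) to prevent a collision between regex syntax and the character being searched for. For example, if you search for / on its own in the online tool you will get an error, because / is also used to mark the start and end of the expression.
Instead use \/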
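Escaping works the same way for any character that has a meaning in regex syntax. The sketch below uses Python and a made-up sample string; note that in Python / is not a delimiter, so escaping it is allowed but not required. re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

# Hypothetical sample text containing '.', '$' and '/'.
text = "Price: $5.00 or 5x00 (PCRE / JavaScript)"

# Unescaped '.' means "any character", so it also matches the 'x'.
unescaped_dot = re.findall(r"5.00", text)
# Escaped '\.' matches only a literal dot.
escaped_dot = re.findall(r"5\.00", text)
# An escaped slash matches a literal '/'.
slashes = re.findall(r"\/", text)
# re.escape builds a safe pattern from a literal string.
literal_price = re.findall(re.escape("$5.00"), text)

print(unescaped_dot)  # ['5.00', '5x00']
print(escaped_dot)    # ['5.00']
print(slashes)        # ['/']
print(literal_price)  # ['$5.00']
```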
2. Flags in Regex
Flags are optional parameters that we can add to a plain expression to change the behavior of the search.
For more details check: Flags in RegEx
Note: In the RegExr online tool, flags are set from the Flags drop-down on the right-hand side; you cannot add them by typing in the expression box.
The beginning / just marks the start of the regular expression; the tool sets it automatically and users cannot control it.
2.1 Global Flag
Retain the index of the last match, allowing subsequent searches to start from the end of the previous match. Without the global flag, subsequent searches will return the same match.
RegExr online tool only searches for a single match when the global flag is disabled to avoid infinite match errors.
Select all occurrences vs Select First occurrence:
Given regex: .*
Select first occurrence: .*
Select all occurrences: .*/g
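Python's re module has no global flag; the choice between "first occurrence" and "all occurrences" is made by picking a function instead. The sketch below, with a made-up sample string, uses re.search for the first match and re.findall for all matches.

```python
import re

# Hypothetical sample text with two occurrences of "Reg".
text = "RegExr supports PCRE & JavaScript flavors of RegEx"

# Roughly "no global flag": stop at the first match.
first = re.search(r"Reg", text)
# Roughly "global flag": collect every non-overlapping match.
every = re.findall(r"Reg", text)

print(first.group(), first.start())  # Reg 0
print(every)                         # ['Reg', 'Reg']
```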
2.2 Multiline Flag
When the multiline flag is enabled, beginning and end anchors (^ and $) will match the start and end of a line, instead of the start and end of the whole string.
Note that patterns such as /^[\s\S]+$/m may return matches that span multiple lines because the anchors will match the start/end of any line.
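To see the effect of the flag on $, and the multi-line span mentioned in the note, here is a small Python sketch. The sample string is invented for the example; re.MULTILINE is Python's name for the m flag.

```python
import re

# Hypothetical three-line sample.
text = "alpha one\nbeta two\ngamma three"

# Without the flag, $ only matches at the end of the whole string.
last_words_default = re.findall(r"\w+$", text)
# With the flag, $ matches at the end of every line.
last_words_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# '.' stops at a new line, so this stays on the first line.
one_line = re.search(r"^.+$", text, re.MULTILINE).group()
# [\s\S] also matches new lines, so this spans all three lines.
spanning = re.search(r"^[\s\S]+$", text, re.MULTILINE).group()

print(last_words_default)    # ['three']
print(last_words_multiline)  # ['one', 'two', 'three']
print(one_line)              # alpha one
print(spanning == text)      # True
```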
3. Anchors
\b : Word boundary, used here to match the beginning of a word
\B : Not a word boundary, i.e. inside a word
^ : Start of a string
$ : End of a string
3.1 Beginning of a word
\bReg
3.2 Beginning of a string
^Reg
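The difference between these anchors can be checked with a short Python sketch. The sample string is a shortened, made-up variant of the article's test text, with one "Reg" buried inside a word.

```python
import re

# Hypothetical sample: "Reg" at the string start, at a word start, and inside a word.
text = "RegExr tests: PCRE flavors of RegEx are suppoRegrted."

anywhere = re.findall(r"Reg", text)        # no anchor at all
word_start = re.findall(r"\bReg", text)    # only at the beginning of a word
inside_word = re.findall(r"\BReg", text)   # only inside a word
string_start = re.findall(r"^Reg", text)   # only at the start of the string

print(len(anywhere))      # 3
print(len(word_start))    # 2
print(len(inside_word))   # 1
print(len(string_start))  # 1
```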
Note the difference when ^ is placed inside a character set. The expression below was intended to find strings that start with an alphanumeric character, but inside square brackets ^ negates the set, so it matches every non-alphanumeric character instead:
[^a-z0-9A-Z]
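A quick Python sketch makes the two meanings of ^ visible side by side. The sample strings are made up for this example.

```python
import re

# Hypothetical sample: letters, digits, spaces and symbols.
text = "abc 123 !@#"

# ^ inside the brackets negates the set: everything that is NOT alphanumeric.
non_alnum = re.findall(r"[^a-z0-9A-Z]", text)
# ^ outside the brackets is an anchor: an alphanumeric character at the start.
starts_alnum = re.findall(r"^[a-z0-9A-Z]", text)
# The anchored form finds nothing when the string starts with a symbol.
starts_symbol = re.findall(r"^[a-z0-9A-Z]", "!abc")

print(non_alnum)      # [' ', ' ', '!', '@', '#']
print(starts_alnum)   # ['a']
print(starts_symbol)  # []
```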
3.3 Incorrect Searching (multiline)
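The searches below will not work on the sample text: without the multiline flag, ^ only matches at the very start of the whole string, not after each new line.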
^\d{2}
^\$\.
3.4 Correct Searching (multiline)
Special characters complicate the search, as they are not part of a word. They are part of the string, but by default ^ only matches at the start of the whole string, so anything at the start of a later line is not found. You need to set the Multiline flag.
The correct method is:
^\$\./gm
^\d{2}/gm
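The incorrect and correct searches can be compared in Python. This is a sketch using a shortened, made-up version of the sample text; re.findall stands in for the g flag and re.MULTILINE for the m flag.

```python
import re

# Hypothetical sample: one sentence followed by numbered lines.
text = "Validate your expression with Tests mode.\n01. First\n02. Second\n03. Third"

# Without the multiline flag, ^ only matches at the start of the whole string,
# and the string does not start with digits.
without_flag = re.findall(r"^\d{2}", text)
# With the multiline flag, ^ matches at the start of every line.
with_flag = re.findall(r"^\d{2}", text, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['01', '02', '03']
```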
4. Direct Search
If you want to search for a string such as RegEx are supported and find all occurrences of it, you can search for the string directly without worrying about regex pattern syntax, as long as it contains no special regex characters. Of course, for “all occurrences” you need to use the global flag.
RegEx are supported
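In Python, a direct search looks like the sketch below. The sample text is made up, and re.findall plays the role of the global flag. For a plain string like this one, the built-in str.count gives the same number without using regex at all.

```python
import re

# Hypothetical sample text containing the phrase twice.
text = ("PCRE & JavaScript flavors of RegEx are supported. "
        "Most other flavors of RegEx are supported too.")

# The phrase has no special regex characters, so it can be used as the pattern.
needle = "RegEx are supported"
occurrences = re.findall(needle, text)

print(len(occurrences))    # 2
print(text.count(needle))  # 2
```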
Using regex in python3
To implement the pattern \b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b in python3 and search for all occurrences of it in the contents of “file1.txt”, you can use the template below, which prints each result on a new line.
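import re
FILEPATH = "file1.txt"
# Read the whole file into memory.
with open(FILEPATH) as file1:
    data = file1.read()
# IPv4-style pattern; the trailing [^\.] also consumes one non-dot character after the address.
pattern = r"\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b[^\.]"
matches = re.finditer(pattern, data, re.MULTILINE)
# Print each full match on its own line.
for matchNum, match in enumerate(matches, start=1):
    print(match.group())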
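To try the pattern without creating a file, the sketch below runs it against an in-memory string. The sample addresses and the names IPV4 and sample are made up for this example, and it uses the pattern exactly as quoted in the text above, without the trailing [^\.].

```python
import re

# The pattern as quoted in the text: four dot-separated numbers, each 0-255.
IPV4 = re.compile(
    r"\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
    r"(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b"
)

# Hypothetical input: two valid addresses and one with an out-of-range number.
sample = "gateway 192.168.1.1, dns 8.8.8.8, bad 999.1.1.1"

# finditer + group() returns the full match text for each hit.
found = [m.group() for m in IPV4.finditer(sample)]

print(found)  # ['192.168.1.1', '8.8.8.8']
```

finditer with match.group() is used rather than re.findall because the pattern contains capturing groups, and re.findall would return the group contents instead of the full matched address.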
Embedded flagged RegEx
Using embedded flags, we can define the regex flags within the regular expression itself. Use https://regex101.com/ for this, as it is a superior online tool.
The RegEx below does the following:
- Searches for 2 occurrences of the “Set-Cookie” string in a single regex.
- (?sm) sets the single line modifier flag (s) and the multiline modifier (m). This effectively means ^ can match at the start of any line and new lines are included in the `.` matches.
(?sm)^Set-Cookie.*?Set-Cookie
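Python's re module also accepts embedded flags at the start of a pattern. The sketch below runs the expression above against a made-up HTTP response header block; the header values are invented for the example.

```python
import re

# Hypothetical response headers with two Set-Cookie lines.
headers = (
    "HTTP/1.1 200 OK\n"
    "Set-Cookie: a=1\n"
    "Content-Type: text/html\n"
    "Set-Cookie: b=2\n"
)

# (?sm): 's' lets '.' match new lines, 'm' lets '^' match at each line start.
with_flags = re.search(r"(?sm)^Set-Cookie.*?Set-Cookie", headers)
# Without the flags, '^' only matches at the start of the string ("HTTP...").
without_flags = re.search(r"^Set-Cookie.*?Set-Cookie", headers)

print(repr(with_flags.group()))
# 'Set-Cookie: a=1\nContent-Type: text/html\nSet-Cookie'
print(without_flags)  # None
```

The lazy .*? keeps the match as short as possible, so it stops at the second Set-Cookie rather than running on to a later one.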
Next - RegEx Examples