How to learn RegEx

Takshil Patil
6 min read · Feb 1, 2022


Online Tool: RegExr: Learn, Build, & Test RegEx or https://regex101.com/

If you want to jump straight to the examples, see: RegEx Examples

The terms used in this article may not be the exact standard terms; I have used the terms and names I feel comfortable with. If you want the exact terms, refer to the RegEx documentation.

RegEx (Regular Expressions) are used to find a particular pattern in a string. RegEx processing takes time: for small inputs this is not a problem, but for very large inputs a RegEx operation can be slow.

RegEx cannot replace a string or data on its own; it is only used for searching. You can, however, use RegEx inside a tool or command that performs the replacement. Below is a Python script with its input and output strings.

input = Jessa knows testing and machine learning

Output = Jessa_knows_testing_and_machine_learning

import re
target_str = "Jessa knows testing and machine learning"
res_str = re.sub(r"\s", "_", target_str)
print(res_str) # Output 'Jessa_knows_testing_and_machine_learning'

The above command finds every whitespace character (\s) in target_str and replaces each occurrence with _.

Below is the text that will be searched throughout the article. You may see some modifications to it in some cases:

RegExr was created by gski!@#$%^&*()_+nner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matc!@#$%^&*()/_+hes. Roll over matches or the expression for details. PCRE & JavaScript / flavors of RegEx are suppoRegrted. Validate your expression with Tests mode.
01. First 05. Fifth
02. Second 06. Sixth
03. Third 07. Seventh
04. Fourth 08. Eight

1. RegEx Syntax

1.1 Character Set

Character set shortcuts are just shorthand for a set of characters. You can either use these shortcuts or write the character set out directly.

Character set = [a-zA-Z0-9_] , for all letters (lower and upper case), digits, and _.

Shortcut = \w , for the same characters: all letters (lower and upper case), digits, and _ .

Below are some more shortcuts. Which form you use is a matter of personal preference. I like to use character sets to keep the regex readable.

. => Matches any character except a new line, including special characters, letters, and digits.
\d => Matches any digit. Character set = [0-9]
\w => Matches any word character (letter, digit, or _). Character set = [a-zA-Z0-9_]
\s => Matches any whitespace character (space, tab, new line). Character set = roughly [ \t\r\n\f\v]
.....for the rest of the sets check the documentation.....
RegExr online tool interface and character set
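
To see that a shortcut and its written-out character set select the same characters, here is a small sketch using Python's re module. The sample string is made up for illustration, and the equivalence shown holds for plain ASCII text like this one (Python's \d and \w can also match non-ASCII digits and letters).

```python
import re

# Made-up sample string, ASCII only
text = "Order_42 shipped to Room 7B"

# \d and [0-9] pick out the same characters here
digits_shortcut = re.findall(r"\d", text)
digits_set = re.findall(r"[0-9]", text)

# \w also matches digits, so the written-out set is [a-zA-Z0-9_]
words_shortcut = re.findall(r"\w+", text)
words_set = re.findall(r"[a-zA-Z0-9_]+", text)

print(digits_shortcut)  # ['4', '2', '7']
print(words_shortcut)   # ['Order_42', 'shipped', 'to', 'Room', '7B']
```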

1.2 Count of findings to Search in character set

We use {count} to specify how many consecutive characters from the character set must match. For example, to select 2 characters next to each other that both match the given character set:

Search = Find 2 digits next to each other.

Character set = [0-9]

Count of findings to search = 2

Using character set method :

[0-9]{2}/g
method 1

Using character set shortcut method :

\d{2}/g
method 2
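
The same two methods can be tried in Python as a rough sketch. Python has no /g flag, so re.findall is used here as the assumed equivalent of a global search; the sample string is made up.

```python
import re

# Made-up sample: two-digit prefixes plus a three-digit number
text = "01. First 05. Fifth 123"

# Method 1: written-out character set
pairs_set = re.findall(r"[0-9]{2}", text)

# Method 2: shortcut
pairs_shortcut = re.findall(r"\d{2}", text)

# "123" contributes only "12"; the leftover "3" is a single digit
print(pairs_set)  # ['01', '05', '12']
```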

1.3 Prevent Character collision / confusion

We use \ (backslash) to escape a character so that it is treated as a literal character to search for rather than as regex syntax. For example, if you search for / directly, you will get an error.

regex collision / confusion

Instead use \/
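
In Python the / character is not a delimiter, so it does not have to be escaped there (escaping it is still accepted). The same collision does show up with characters such as . and $, which is what this sketch illustrates. The sample strings are made up, and re.escape is used as one possible way to escape a whole literal string at once.

```python
import re

# An unescaped . is regex syntax: it matches any character
unescaped = re.findall(r"a.b", "a.b axb a/b")

# An escaped \. only matches a literal dot
escaped = re.findall(r"a\.b", "a.b axb a/b")

# \/ matches a literal slash, as in the RegExr example
slash = re.findall(r"\/", "PCRE & JavaScript / flavors")

# re.escape adds the backslashes for you when the search text is a literal
literal = re.findall(re.escape("$5.00"), "cost $5.00 not $5x00")

print(unescaped)  # ['a.b', 'axb', 'a/b']
print(escaped)    # ['a.b']
```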

2. Flags in Regex

Flags are optional parameters that we can add to a plain expression to change the behavior of the search.

For more details check: Flags in RegEx

Note: In the RegExr online tool, flags are set from the Flags drop-down on the right-hand side; you cannot add them by typing in the expression field.

The beginning / is just there to specify the start of regular expression, which is set automatically and users cannot control it.

2.1 Global Flag

Retain the index of the last match, allowing subsequent searches to start from the end of the previous match. Without the global flag, subsequent searches will return the same match.

RegExr online tool only searches for a single match when the global flag is disabled to avoid infinite match errors.

Select all occurrences vs Select First occurrence:

Given regex: .*

Select first occurrence: .*

Select all occurrences: .*/g
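
Python's re module has no global flag. As a rough analogy (an assumption of this sketch, not something RegExr does), re.search behaves like a search without /g and returns only the first match, while re.findall behaves like a search with /g and returns all of them.

```python
import re

# Made-up sample based on the numbered list in the article text
text = "01. First 02. Second 03. Third"

# Like a search without the global flag: first match only
first = re.search(r"\d{2}", text).group()

# Like a search with the global flag: every match
all_matches = re.findall(r"\d{2}", text)

print(first)        # 01
print(all_matches)  # ['01', '02', '03']
```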

2.2 Multiline Flag

When the multiline flag is enabled, beginning and end anchors (^ and $) will match the start and end of a line, instead of the start and end of the whole string.

Note that patterns such as /^[\s\S]+$/m may return matches that span multiple lines because the anchors will match the start/end of any line.
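
The note about /^[\s\S]+$/m can be checked with a short Python sketch, where re.MULTILINE plays the role of the m flag. The three-line sample string is made up.

```python
import re

text = "first line\nsecond line\nthird line"

# With the multiline flag, ^ and $ match at the start and end of every line
per_line = re.findall(r"^.+$", text, re.MULTILINE)

# [\s\S] also matches "\n", so the greedy + runs to the end of the string
# and the single match spans all three lines
spanning = re.findall(r"^[\s\S]+$", text, re.MULTILINE)

# Without the flag, ^ and $ only match at the start and end of the whole
# string, and . cannot cross a new line, so nothing matches
no_flag = re.findall(r"^.+$", text)

print(per_line)  # ['first line', 'second line', 'third line']
print(no_flag)   # []
```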

3. Anchors

  1. \b : Word boundary (for example, the beginning of a word)
  2. \B : Not a word boundary (for example, inside a word)
  3. ^ : Start of a string
  4. $ : End of a string

3.1 Beginning of a word

\bReg
example 1
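
To make the difference between \b and \B concrete, here is a sketch in Python that records where each pattern matches. The sample string is a shortened, made-up variant of the article text that keeps the "suppoRegrted" modification.

```python
import re

text = "RegExr supports RegEx; flavors are suppoRegrted"

# Plain search: every "Reg", wherever it sits
anywhere = [m.start() for m in re.finditer(r"Reg", text)]

# \bReg: only where "Reg" begins a word
word_start = [m.start() for m in re.finditer(r"\bReg", text)]

# \BReg: only where "Reg" sits inside a word
inside_word = [m.start() for m in re.finditer(r"\BReg", text)]

print(anywhere)     # [0, 16, 40]
print(word_start)   # [0, 16]
print(inside_word)  # [40]
```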

3.2 Beginning of a string

^Reg

This is another example, with an incorrect implementation: it was intended to match strings that start with an alphanumeric character, but because ^ sits inside the square brackets it negates the set and instead matches every non-alphanumeric character.

[^a-z0-9A-Z]
faulty implementation of beginning of string
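
The two meanings of ^ can be compared side by side in Python. This is a small sketch with made-up sample strings: inside the brackets ^ negates the set, and outside the brackets it anchors the match to the start of the string.

```python
import re

# ^ inside the brackets: match every character that is NOT alphanumeric
negated = re.findall(r"[^a-z0-9A-Z]", "#tag one")

# ^ outside the brackets: match an alphanumeric character at the start
starts_alnum = re.findall(r"^[a-z0-9A-Z]", "tag one")
starts_symbol = re.findall(r"^[a-z0-9A-Z]", "#tag one")

print(negated)        # ['#', ' ']
print(starts_alnum)   # ['t']
print(starts_symbol)  # []
```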

3.3 Incorrect Searching (multiline)

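The searches below will not work as intended: without the multiline flag, ^ matches only at the start of the whole string, so text at the start of later lines is never found.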

^\d{2}
^\$\.

3.4 Correct Searching (multiline)

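Special characters complicate searching because they are not word characters, so a word anchor such as \b does not help with them. The ^ anchor does apply to them, but by default it only matches at the very start of the whole string, not after each new line. To match at the start of every line, you need to set the multiline flag.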

The correct method is:

^\$\./gm
For special symbols
^\d{2}/gm
for digits and numeric
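
The incorrect and correct searches for digits can be reproduced in Python, with re.MULTILINE standing in for the m flag and re.findall for the g flag. The sample is a shortened copy of the article text, so that the numbered lines do not start the string.

```python
import re

text = (
    "Validate your expression with Tests mode.\n"
    "01. First 05. Fifth\n"
    "02. Second 06. Sixth\n"
    "03. Third 07. Seventh\n"
    "04. Fourth 08. Eight"
)

# Without the multiline flag, ^ only matches before "Validate"
without_flag = re.findall(r"^\d{2}", text)

# With the multiline flag, ^ matches at the start of every line
with_flag = re.findall(r"^\d{2}", text, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['01', '02', '03', '04']
```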

4. Direct Search

If you want to search for the literal string RegEx are supported and find all occurrences of it, you can search for the string directly without worrying about regex pattern syntax (as long as it contains no special characters). Of course, for “all occurrences” you need to use the global flag.

RegEx are supported
direct search

Using regex in python3

To implement the pattern \b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b in python3 and search for all occurrences of it in the contents of “file1.txt”, you can use the template below, which prints each result on a new line.

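import re

FILEPATH = "file1.txt"

# Read the whole file into one string
with open(FILEPATH) as file1:
    data = file1.read()

# Four dot-separated numbers from 0 to 255. The trailing [^\.] also consumes
# one non-dot character after the address, so it is part of each match.
pattern = r"\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b[^\.]"

matches = re.finditer(pattern, data, re.MULTILINE)
for match in matches:
    print(match.group())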
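
If you want to try the pattern without creating a file, here is a minimal sketch that runs it against an in-memory string. It uses the pattern as written in the text (without the trailing [^\.]), and the sample addresses are made up. re.finditer with match.group() is used because the pattern contains capture groups, which would make re.findall return tuples of groups instead of whole matches.

```python
import re

# Pattern from the text: four dot-separated numbers, each 0 to 255
pattern = r"\b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b"

# Made-up sample data; 999 is out of range for an address part
data = "hosts: 192.168.1.10, 10.0.0.1 and 999.1.1.1"

# group() with no argument returns the whole match
found = [match.group() for match in re.finditer(pattern, data)]

for address in found:
    print(address)
```

With this sample, the two valid addresses are printed and 999.1.1.1 is skipped.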

Embedded flagged RegEx

Using embedded flags, we can define the regex flags within the regular expression itself. Use https://regex101.com/ for this, as it is a superior online tool.

The RegEx below does the following:

  1. Searches for 2 occurrences of the “Set-Cookie” string in a single regex.
  2. (?sm) sets the single line modifier flag (s) and the multiline modifier (m). This effectively means that ^ matches at the start of every line and that `.` also matches newlines.
(?sm)^Set-Cookie.*?Set-Cookie
regex101 tool gives detailed information about the RegEx
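
Python's re module also accepts embedded flags at the start of a pattern, so the same expression can be tried there. The sketch below uses a made-up block of response headers as input, and compares the embedded form with the flag constants re.DOTALL and re.MULTILINE.

```python
import re

# Made-up response headers for illustration
headers = (
    "HTTP/1.1 200 OK\n"
    "Set-Cookie: a=1\n"
    "Content-Type: text/html\n"
    "Set-Cookie: b=2\n"
)

# Embedded flags: s lets . cross new lines, m lets ^ match at each line start
embedded = re.search(r"(?sm)^Set-Cookie.*?Set-Cookie", headers)

# The same flags passed as arguments instead
with_args = re.search(r"^Set-Cookie.*?Set-Cookie", headers, re.DOTALL | re.MULTILINE)

# Without s, . stops at the end of the line and no match is found
without_s = re.search(r"(?m)^Set-Cookie.*?Set-Cookie", headers)

print(repr(embedded.group()))
print(without_s)  # None
```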

Next - RegEx Examples
