This article is a cheat sheet for regular expressions (regex) in Python. It covers the basic concepts and patterns, then gives a rundown of the most common functions and features.
Special Characters
Regular expressions use a combination of special characters and literal characters to define search patterns. Here are some of the most common special characters:
.: Matches any single character except a newline character.
^: Matches the start of the string.
$: Matches the end of the string (or just before a newline at the very end of the string).
*: Matches zero or more repetitions of the preceding element (a character, set, or group).
+: Matches one or more repetitions of the preceding element.
?: Matches zero or one repetition of the preceding element.
{m,n}: Matches the preceding element between m and n times.
[...]: A character set, matching any one of the characters inside the brackets.
[^...]: A negated character set, matching any character not inside the brackets.
|: Alternation; matches either the expression before or the expression after the |.
(...): Defines a capturing group.
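The special characters above are easier to remember once you see each one in action. The following is a small sketch using made-up sample strings (none of them come from a real data set); each line exercises one or two of the listed characters.

```python
import re

# . matches any single character except a newline, so "c\nt" is skipped
dot = re.findall(r"c.t", "cat cot c\nt cut")

# ^ and $ anchor the match to the start and end of the string
starts = re.search(r"^Hello", "Hello world") is not None
ends = re.search(r"world$", "Hello world") is not None

# *, + and ? control how often the preceding element may repeat
star = re.findall(r"ab*", "a ab abbb")            # zero or more b
plus = re.findall(r"ab+", "a ab abbb")            # one or more b
optional = re.findall(r"colou?r", "color colour")  # zero or one u

# {m,n} bounds the repetition count
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

# [...] and [^...] match one character from (or not from) a set
vowels = re.findall(r"[aeiou]", "regex")
consonants = re.findall(r"[^aeiou]", "regex")

# | chooses between alternatives; (...) groups so + applies to "ab" as a unit
either = re.findall(r"cat|dog", "cat, dog, bird")
repeated = re.search(r"(ab)+", "ababab").group()

print(dot, star, plus, optional, bounded, vowels, consonants, either, repeated)
```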
Basic Patterns
Here are some basic patterns used in regular expressions:
\d: Matches any digit (0-9; for str patterns this also includes other Unicode decimal digits unless re.ASCII is set).
\D: Matches any non-digit character.
\s: Matches any whitespace character (space, tab, newline, etc.).
\S: Matches any non-whitespace character.
\w: Matches any word character (letters, digits, or underscores).
\W: Matches any non-word character.
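To see how these shorthand classes behave, here is a short sketch run against an invented sample string. The last two lines illustrate that, for str patterns, \d also accepts non-ASCII decimal digits unless the re.ASCII flag is passed.

```python
import re

text = "Order 66 shipped\tto user_42!"

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # everything between the digit runs
spaces = re.findall(r"\s", text)       # each whitespace character
words = re.findall(r"\w+", text)       # letters, digits and underscores
tokens = re.findall(r"\S+", text)      # runs of non-whitespace
symbols = re.findall(r"\W", text)      # whitespace and punctuation

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)

print(digits, words, tokens)
```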
Python regex Library
Python's built-in regex library, the re module, provides a variety of functions for working with regular expressions. Here are some of the most commonly used functions:
re.compile()
This function compiles a regular expression pattern into a regex object. This can help improve performance when using the same pattern multiple times.
import re
pattern = re.compile(r'\d+')
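As a rough illustration of reuse, the sketch below compiles a pattern once and applies it to several invented input lines. The names number and lines are placeholders chosen for this example. The module-level functions such as re.findall() accept the same pattern string and produce the same result, so compiling is mostly a convenience when a pattern is used repeatedly.

```python
import re

# Compile once, then reuse the pattern object for every input line
number = re.compile(r"\d+")

lines = ["room 101", "no digits here", "gate 7, seat 23"]
found = [number.findall(line) for line in lines]
print(found)

# The module-level function gives the same result for a one-off call
same = re.findall(r"\d+", lines[2]) == number.findall(lines[2])
print(same)
```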
re.search()
The re.search() function scans the string for the first location where the pattern matches and returns a match object if one is found. If no match is found, it returns None.
import re
pattern = re.compile(r'\d+')
text = "The year is 2023."
result = pattern.search(text)
if result:
    print("Match found:", result.group())
else:
    print("No match found.")
re.match()
The re.match() function checks whether the regular expression pattern matches at the beginning of the string. It returns a match object if a match is found, and None otherwise.
import re
pattern = re.compile(r'\d+')
text = "2023 is the current year."
result = pattern.match(text)
if result:
    print("Match found:", result.group())
else:
    print("No match found.")
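The difference between matching at the start and searching anywhere is easiest to see on one string. This sketch applies both to the same text, and also shows fullmatch(), which is not covered above and only succeeds when the whole string matches the pattern.

```python
import re

pattern = re.compile(r"\d+")
text = "The year is 2023."

at_start = pattern.match(text)    # None: the string starts with "The"
anywhere = pattern.search(text)   # finds "2023" later in the string

whole = pattern.fullmatch("2023")     # the entire string is digits
partial = pattern.fullmatch("2023.")  # None: the trailing "." is not a digit

print(at_start, anywhere.group(), anywhere.span())
```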
re.findall()
The re.findall() function returns all non-overlapping matches of the pattern in the string as a list.
import re
pattern = re.compile(r'\d+')
text = "There are 3 cats and 2 dogs."
result = pattern.findall(text)
print("Matches found:", result)
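One detail worth knowing is that the shape of the list returned by findall() depends on how many capturing groups the pattern contains. A small sketch on an invented key=value string:

```python
import re

text = "alice=3, bob=12, carol=7"

# No groups: each item is the whole match
no_groups = re.findall(r"\w+=\d+", text)

# One group: each item is just that group's text
one_group = re.findall(r"\w+=(\d+)", text)

# Two or more groups: each item is a tuple of the groups
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_groups, one_group, two_groups)
```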
re.finditer()
The re.finditer() function returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.
import re
pattern = re.compile(r'\d+')
text = "There are 3 cats and 2 dogs."
result = pattern.finditer(text)
for match in result:
    print("Match found:", match.group())
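Because finditer() yields match objects rather than plain strings, it is the natural choice when you also need to know where each match sits. A minimal sketch that collects the matched text together with its start and end offsets:

```python
import re

text = "There are 3 cats and 2 dogs."

# Each match object carries the matched text and its position in the string
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(positions)
```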
re.sub()
The re.sub() function replaces all occurrences of the pattern in the string with the specified replacement string.
import re
pattern = re.compile(r'\d+')
text = "There are 3 cats and 2 dogs."
result = pattern.sub("X", text)
print("Modified text:", result)
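Beyond a fixed replacement string, re.sub() also accepts a count limit, backreferences to groups, and a function that computes each replacement. The sketch below shows each variant on the same sample sentence, along with re.subn(), which additionally reports how many substitutions were made.

```python
import re

text = "There are 3 cats and 2 dogs."

# count=1 replaces only the first occurrence
first_only = re.sub(r"\d+", "X", text, count=1)

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

# \1 and \2 refer back to the captured groups
swapped = re.sub(r"(\d+) (\w+)", r"\2: \1", text)

# subn() returns the new string and the number of replacements
replaced, n = re.subn(r"\d+", "X", text)

print(first_only, doubled, swapped, replaced, n, sep="\n")
```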
re.split()
The re.split() function splits the string by occurrences of the pattern.
import re
pattern = re.compile(r'\d+')
text = "There are 3 cats and 2 dogs."
result = pattern.split(text)
print("Split text:", result)
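A few variations on splitting are sketched here with invented sample strings: splitting on a mixed delimiter, limiting the number of splits with maxsplit, and keeping the delimiters by wrapping the pattern in a capturing group.

```python
import re

# Split on commas or semicolons, swallowing surrounding whitespace
parts = re.split(r"\s*[,;]\s*", "a, b;c ,d")

text = "There are 3 cats and 2 dogs."

# maxsplit=1 stops after the first split
limited = re.split(r"\d+", text, maxsplit=1)

# A capturing group keeps the matched delimiters in the result
kept = re.split(r"(\d+)", text)

print(parts, limited, kept, sep="\n")
```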
Regex Flags
Regex flags modify the behavior of the regex functions. Some commonly used flags include:
re.IGNORECASE (or re.I): Performs case-insensitive matching.
re.MULTILINE (or re.M): Allows ^ and $ to match the start and end of each line in the string, rather than only the start and end of the entire string.
re.DOTALL (or re.S): Makes the . special character match any character, including newline characters.
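The effect of each flag is clearest when the same pattern is run with and without it. The following sketch uses an invented three-line string; flags can be combined with the | operator, as the last example shows.

```python
import re

text = "First line\nsecond LINE\nthird line"

# re.IGNORECASE: "LINE" is only found when case is ignored
case_sensitive = re.findall(r"line", text)
case_insensitive = re.findall(r"line", text, re.IGNORECASE)

# re.MULTILINE: ^ matches at the start of every line, not just the string
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# re.DOTALL: . can cross newlines
dot_default = re.search(r"First.*third", text)
dot_all = re.search(r"First.*third", text, re.DOTALL)

# Flags combine with |
combined = re.findall(r"^\w+ line$", text, re.I | re.M)

print(case_insensitive, starts_multiline, combined, sep="\n")
```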
Groups and Capturing
Named Groups
Named groups allow you to reference matched text by name instead of by position. You can create named groups using the (?P<name>...) syntax.
import re
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
text = "The date is 2023-04-30."
result = pattern.search(text)
if result:
    print("Year:", result.group("year"))
    print("Month:", result.group("month"))
    print("Day:", result.group("day"))
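Named groups can also be read all at once or reused in a replacement. This sketch takes the same date pattern and shows groupdict(), positional access to the same groups, and the \g<name> syntax inside a replacement string.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
text = "The date is 2023-04-30."

match = pattern.search(text)

# All named groups as a dictionary
parts = match.groupdict()

# Named groups are still numbered, so positional access works too
by_index = match.group(1, 2, 3)

# \g<name> refers to a named group in the replacement string
reordered = pattern.sub(r"\g<day>/\g<month>/\g<year>", text)

print(parts, by_index, reordered, sep="\n")
```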
Lookaround Assertions
Lookaround assertions are a powerful feature in regular expressions that allow you to check for a pattern without consuming any characters.
Lookahead
Positive lookahead (?=...) asserts that the pattern inside the lookahead is matched, but does not consume any characters. Negative lookahead (?!...) asserts that the pattern inside the lookahead is not matched.
import re
pattern = re.compile(r'\d+(?=\D)')
text = "There are 3 cats and 2 dogs."
result = pattern.findall(text)
print("Matches found:", result)
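To contrast the two kinds of lookahead, here is a sketch on an invented string. It also illustrates a common pitfall with negative lookahead: without word boundaries, the engine can backtrack to a shorter match that technically satisfies the assertion.

```python
import re

text = "3 cats, 2 dogs, 10 catfish"

# Positive lookahead: numbers followed by the word "cats"
before_cats = re.findall(r"\d+(?= cats\b)", text)

# Negative lookahead: whole numbers not followed by " dogs"
not_before_dogs = re.findall(r"\b\d+\b(?! dogs)", text)

# Pitfall: \d+ backtracks from "10" to "1", which is followed by "0", not " dogs"
naive = re.findall(r"\d+(?! dogs)", "10 dogs")
guarded = re.findall(r"\b\d+\b(?! dogs)", "10 dogs")

print(before_cats, not_before_dogs, naive, guarded)
```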
Lookbehind
Positive lookbehind (?<=...) asserts that the pattern inside the lookbehind is matched immediately before the current position, without consuming any characters. Negative lookbehind (?<!...) asserts that the pattern inside the lookbehind is not matched.
import re
pattern = re.compile(r'(?<=\D)\d+')
text = "There are 3 cats and 2 dogs."
result = pattern.findall(text)
print("Matches found:", result)
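A short sketch of both lookbehind forms on an invented price string follows. It also demonstrates a restriction of the re module: a lookbehind pattern must have a fixed width, so a quantifier such as + inside it is rejected at compile time.

```python
import re

text = "cost $30 for 40 items"

# Positive lookbehind: numbers immediately preceded by a dollar sign
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers not preceded by a dollar sign
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

# Variable-width lookbehind is not supported by the re module
try:
    re.compile(r"(?<=a+)b")
    fixed_width_required = False
except re.error:
    fixed_width_required = True

print(after_dollar, not_after_dollar, fixed_width_required)
```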
Conclusion
Regular expressions are a powerful tool for working with text in Python. This regex Python cheat sheet covers the basics of regex syntax, the most commonly used functions from the re module, and some advanced techniques, such as groups and lookaround assertions.
With this knowledge, you can now write more efficient and powerful code when working with text data in Python.