Python Regular Expressions

Regular expressions are a useful technique for extracting information from text such as code, spreadsheets, documents, or log files. The first thing to keep in mind when working with regular expressions is that everything is treated as a character: programmers write patterns to match specific sequences of characters (strings).

Defining Regular Expression

A regular expression is a sequence of characters, written in a specialized syntax, that forms a pattern for finding other strings or sets of strings. Python supports regular expressions through the 're' module in its standard library, which ships with every Python installation.

Here, we will learn about the main functions used to handle regular expressions. Many characters take on a special meaning when they are used inside a regular expression; these are called meta-characters. Regular expressions are also heavily used in UNIX tools.
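
As a small illustration of a character with a special meaning, the sketch below contrasts the dot used as a meta-character with an escaped, literal dot. The sample strings are made up for this example, and re.findall() and re.escape() are standard 're' functions that this tutorial does not otherwise cover. The r prefix marks a raw string, described in the next section.

```python
import re

# "." is a meta-character that matches any character except a newline.
print(re.findall(r"a.c", "abc a.c a-c"))    # ['abc', 'a.c', 'a-c']

# Escaping it with a backslash matches a literal dot only.
print(re.findall(r"a\.c", "abc a.c a-c"))   # ['a.c']

# re.escape() escapes the meta-characters in a string for literal matching.
print(re.escape("a.c"))                      # a\.c
```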

Raw Strings in Python

It is recommended to use raw strings instead of regular strings for patterns. A raw string begins with the prefix 'r'. Inside it, Python does not treat backslashes as escape characters, so backslashes and meta-characters pass through to the regular-expression engine unchanged.
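
A minimal sketch of why this matters, using the '\b' sequence as an example (the sample text is an assumption made for illustration): in a regular string Python turns "\b" into a backspace character before the regex engine ever sees it, while a raw string keeps the backslash so the engine can read it as a word boundary.

```python
import re

# In a regular string, "\b" is the backspace character (one character).
regular = "\bcat\b"
# In a raw string, r"\b" is a backslash followed by "b" (two characters),
# which the regex engine reads as a word boundary.
raw = r"\bcat\b"

text = "the cat sat"

print(len("\b"), len(r"\b"))          # 1 2
print(re.search(regular, text))       # None: it looks for literal backspaces
print(re.search(raw, text).group())   # cat
```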

match Function

This function tests whether a regular expression matches at the beginning of a string. re.match() returns None if the pattern doesn't match; otherwise it returns a match object with information about which part of the string was matched.

Syntax:
re.match(pattern, string, flags=0)

Here, the parts are explained below:

  • match(): the function in the 're' module that performs the match.
  • pattern: the regular expression, which uses meta-characters to describe what strings can be matched.
  • string: the string to examine; the pattern is matched at the string's beginning.
  • flags: optional modifiers; programmers can combine different flags using the bitwise operator '|' (OR).
Example:
import re
# Simple structure of re.match()
pattern, input_str = r"\w+", "hello world"  # sample values
matchObject = re.match(pattern, input_str, flags=0)
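
The flags parameter and the returned match object can be sketched as follows. The input text is a made-up example; re.IGNORECASE and re.DOTALL are two of the standard flags in the 're' module, picked here only to show how flags are combined with '|'.

```python
import re

# Hypothetical input chosen for illustration.
text = "Hello\nWORLD"

# Without flags, the lowercase pattern does not match "Hello".
no_flags = re.match(r"hello", text)

# re.IGNORECASE makes matching case-insensitive; re.DOTALL lets "." match
# a newline. The two flags are combined with the bitwise OR operator.
with_flags = re.match(r"hello.world", text, flags=re.IGNORECASE | re.DOTALL)

print(no_flags)            # None
print(with_flags.span())   # (0, 11)
```

The span() method of the match object reports the start and end positions of the match, and group() returns the matched text.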

A program using re.match:

Example:
import re
words = ["mouse", "cat", "dog", "dog dot", "no-match"]
# Loop starts here
for element in words:
    m = re.match(r"(d\w+)\W(d\w+)", element)
    # Check for matching; only "dog dot" has two d-words split by a non-word character
    if m:
        print(m.groups())  # ('dog', 'dot')

In the above example, the pattern uses meta-characters to describe what strings it can match. Here '\w' means a word character, '\W' means a non-word character, and the + (plus) symbol denotes one or more.
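
To see these pieces in isolation, here is a short sketch on a made-up sample string. It uses re.findall(), a standard 're' function not otherwise covered in this tutorial, to list every match of a pattern.

```python
import re

sample = "dog dot, do-not"

# \w+ matches a run of one or more word characters (letters, digits, underscore).
print(re.findall(r"\w+", sample))   # ['dog', 'dot', 'do', 'not']

# \W matches a single character that is not a word character.
print(re.findall(r"\W", sample))    # [' ', ',', ' ', '-']

# Parentheses capture the text matched inside them; groups() returns the captures.
m = re.match(r"(d\w+)\W(d\w+)", "dog dot")
print(m.groups())                   # ('dog', 'dot')
```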

Most of the power of regular expressions comes from the patterns themselves.

search Function

It works differently from match. Both use a pattern, but match only tries the pattern at the beginning of the string, while search attempts it at all possible starting points. It scans through the input string and returns the first location where the pattern matches.

Syntax:
re.search(pattern, string, flags=0)

Program to show how it is used:

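import re
value = "cyberdyne"
# search finds "dy" anywhere in the string
g = re.search(r"(dy.*)", value)
if g:
    print("search:", g.group(1))
# match only tries at the start of the string, so "vi" is not found
s = re.match(r"(vi.*)", value)
if s:
    print("match:", s.group(1))
Output:
search: dyne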

split Function

re.split() accepts a pattern that specifies the delimiter. Using this, we can match a pattern and separate text data around it. A split() method is also available directly on strings, but it does not handle regular expressions.

Program to show how to use split():

Example:
import re
value = "two 2  four 4  six 6"
# Split on runs of non-digit characters
res = re.split(r"\D+", value)
# Print the result, skipping the empty string produced by the leading "two "
for element in res:
    if element:
        print(element)
Output:
2
4
6

In the above program, \D+ represents one or more non-digit characters.
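
The difference between split() on a string and re.split() can be sketched on the same input. This is an illustrative comparison, not part of the original program; maxsplit is an optional parameter of re.split() that limits the number of splits.

```python
import re

value = "two 2  four 4  six 6"

# str.split() with no argument splits on runs of whitespace only.
print(value.split())                        # ['two', '2', 'four', '4', 'six', '6']

# re.split() can use any pattern as the delimiter; here, runs of non-digits.
# The leading "two " is itself a delimiter, so the first item is an empty string.
print(re.split(r"\D+", value))              # ['', '2', '4', '6']

# maxsplit limits the number of splits performed.
print(re.split(r"\s+", value, maxsplit=2))  # ['two', '2', 'four 4  six 6']
```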

