Regular expressions are a very useful technique for extracting information from text such as code, spreadsheets, documents or log files. The first thing to keep in mind when working with regular expressions is that everything is treated as characters: programmers write patterns that match a specific sequence of characters or strings.
Defining Regular Expression
A regular expression is a sequence of characters, written in a specialized pattern syntax, that helps programmers find other characters, strings or sets of strings. Python supports regular expressions through the standard library's re module, which ships with every Python installation.
Here, we will learn about the most important functions for working with regular expressions. Many characters have a special meaning (they are called metacharacters) when used inside a regular expression. Regular expressions of this kind are also widely used in UNIX tools.
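To make "special meaning" concrete, here is a small sketch of a few common metacharacters. It borrows re.search(), which is covered further down, and the sample strings are made up purely for illustration.

```python
import re

# "^" anchors the start, "$" anchors the end, and "." matches any single
# character except a newline.
three_letters = re.search(r"^c.t$", "cat")
four_letters = re.search(r"^c.t$", "coat")  # "." matches exactly one character

# "\d" matches a digit and "+" repeats the previous item one or more times.
digits = re.search(r"\d+", "log-2024.txt")

print(three_letters is not None)  # True
print(four_letters is not None)   # False
print(digits.group(0))            # 2024
```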
Raw Strings in Python
It is recommended to write regular expressions as raw strings rather than regular strings. A raw string begins with the prefix r, and Python does not treat its backslashes as escape characters, so backslashes and other metacharacters are passed through unchanged to the regular expression engine.
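The difference matters most for escapes that mean something to Python itself. A minimal sketch, assuming a made-up sample sentence: in a normal string "\b" becomes a backspace character, while in a raw string it stays as backslash plus "b", which the regex engine reads as a word boundary.

```python
import re

# In a normal string literal, Python turns "\b" into a backspace character
# before the regex engine ever sees it.
print("\b" == "\x08")   # True

# In a raw string the backslash is kept, so the engine receives "\b",
# which it reads as a word boundary.
print(r"\b" == "\\b")   # True

text = "a cat sat"
raw_result = re.search(r"\bcat\b", text)   # finds "cat" as a whole word
plain_result = re.search("\bcat\b", text)  # looks for backspace + cat + backspace

print(raw_result is not None)    # True
print(plain_result is not None)  # False
```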
re.match()
The re.match() function tests whether a regular expression matches at the beginning of a string. It returns None if the pattern doesn't match; otherwise it returns a match object that holds information about which part of the string matched.
```python
re.match(pattern, string, flags=0)
```
The parts are:
- match(): the function itself, provided by the re module.
- pattern: the regular expression, which uses metacharacters to describe which strings can be matched.
- string: the string to test; the pattern is matched only at its beginning.
- flags: optional modifiers such as re.IGNORECASE. Several flags can be combined with the bitwise OR operator |.
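```python
import re

# Simple structure of re.match(); pattern and input_str are sample values
pattern = r"\w+"
input_str = "hello world"
match_object = re.match(pattern, input_str, flags=0)
```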
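Two behaviours described above are easy to check directly: re.match() returns None when the pattern does not match at the start of the string, and flags are combined with |. This sketch uses the standard re.IGNORECASE and re.DOTALL flags; the sample strings are invented for illustration.

```python
import re

# re.match() only tries the pattern at the very start of the string.
at_start = re.match(r"cat", "cat and dog")
not_at_start = re.match(r"cat", "dog and cat")
print(at_start.group(0))  # cat
print(not_at_start)       # None

# Flags are combined with the bitwise OR operator |:
# IGNORECASE ignores letter case, DOTALL lets "." match a newline too.
text = "HELLO\nWORLD"
no_flags = re.match(r"hello.world", text)
with_flags = re.match(r"hello.world", text, flags=re.IGNORECASE | re.DOTALL)
print(no_flags)                     # None
print(with_flags.group(0) == text)  # True
```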
A program using re.match():
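```python
import re

# Sample strings: the first three should match, the last one should not
words = ["dog dot", "do don't", "dumb-dumb", "no match"]

# Loop over each string in the list
for element in words:
    # Two words starting with "d", separated by one non-word character
    m = re.match(r"(d\w+)\W(d\w+)", element)
    # Check whether the pattern matched
    if m:
        print(m.groups())
```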
In the above example, the pattern uses metacharacters to describe which strings it can match: \w means a word character, \W means a non-word character, and the + (plus) symbol means one or more of the preceding item.
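A quick way to see what \w and \W cover is to run them on a string that mixes letters, digits, an underscore and punctuation. This is a sketch with a made-up input; note that for str patterns, Python's \w is Unicode-aware, so it also matches letters outside ASCII.

```python
import re

sample = "user_42! logged in"

# \w matches letters, digits and the underscore; + takes as many as possible.
word = re.match(r"\w+", sample)
print(word.group(0))       # user_42

# \W matches any character that is NOT a word character.
separator = re.search(r"\W", sample)
print(separator.group(0))  # !
```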
Most of the power of regular expressions comes from the patterns themselves.
re.search()
re.search() works differently from re.match(). Both take a pattern, but re.search() tries it at every possible starting position: it scans through the input string and returns a match for the first location where the pattern matches.
```python
re.search(pattern, string, flags=0)
```
Program to show how it is used:
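```python
import re

value = "cyberdyne"

# search() scans the whole string and finds "dy" in the middle
g = re.search(r"(dy.*)", value)
if g:
    print("search:", g.group(1))

# match() only tries position 0, so the same pattern fails here
s = re.match(r"(dy.*)", value)
if s:
    print("match:", s.group(1))
else:
    print("match: no match at the start of the string")
```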
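The match object returned by re.search() (and re.match()) also records where the match was found. A small sketch reusing the "cyberdyne" string from the program above, with the standard match-object methods start(), end() and span():

```python
import re

value = "cyberdyne"
m = re.search(r"dy", value)

# The match object records where in the string the match was found.
print(m.group(0))                # dy
print(m.start())                 # 5
print(m.end())                   # 7 (index just past the match)
print(m.span())                  # (5, 7)
print(value[m.start():m.end()])  # dy
```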
re.split()
re.split() accepts a pattern that specifies the delimiter, so text can be split wherever the pattern matches. A split() method is also available directly on strings, but it splits on a fixed separator and does not handle regular expressions.
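The contrast between the string method and re.split() shows up as soon as the delimiters vary. A sketch with an invented input that mixes commas, semicolons and extra spaces:

```python
import re

text = "a,b;  c"

# str.split() splits on one fixed separator only.
by_comma = text.split(",")
print(by_comma)    # ['a', 'b;  c']

# re.split() splits wherever the pattern matches: a comma or semicolon
# followed by any amount of whitespace.
by_pattern = re.split(r"[,;]\s*", text)
print(by_pattern)  # ['a', 'b', 'c']
```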
Program to show how to use split():
```python
import re

value = "two 2 four 4 six 6"

# Split on runs of non-digit characters
res = re.split(r"\D+", value)

# The string starts with a non-digit, so res[0] is an empty string; skip it
for element in res:
    if element:
        print(element)
```
```
2
4
6
```
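In the above program, \D+ represents one or more non-digit characters. Because value begins with non-digit characters, the first element of the list returned by re.split() is an empty string.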
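The + in \D+ is what makes each run of letters and spaces count as a single delimiter. This sketch, on a short made-up string, compares \D with \D+ and then drops the empty pieces with a list comprehension:

```python
import re

text = "ab1cd2"

# \D splits at every single non-digit, leaving empty strings between
# adjacent letters.
single = re.split(r"\D", text)
print(single)  # ['', '', '1', '', '2']

# \D+ treats each run of non-digits as one delimiter.
runs = re.split(r"\D+", text)
print(runs)    # ['', '1', '2']

# The leading '' appears because the string starts with a delimiter;
# a list comprehension can drop empty pieces.
digits_only = [piece for piece in runs if piece]
print(digits_only)  # ['1', '2']
```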