Python also supports regular expressions. This section covers the regular expression module, re.
    >>> import re
    >>> dir(re)
    ['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M',
     'MULTILINE', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X',
     '_MAXCACHE', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
     '__name__', '__package__', '__spec__', '__version__', '_alphanum_bytes', '_alphanum_str', '_cache',
     '_cache_repl', '_compile', '_compile_repl', '_expand', '_locale', '_pattern_type', '_pickle', '_subx',
     'compile', 'copyreg', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'match', 'purge', 'search',
     'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'sys', 'template']
To use the regular expression module, import it first. The most important functions in re are findall, match, search, and compile. The examples below explain each one:
    # findall
    >>> help(re.findall)
    Help on function findall in module re:

    findall(pattern, string, flags=0)
        Return a list of all non-overlapping matches in the string.

        If one or more capturing groups are present in the pattern, return
        a list of groups; this will be a list of tuples if the pattern
        has more than one group.

        Empty matches are included in the result.

The flags argument accepts, among others:
re.I: ignore case
re.L: make the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode
re.S: make '.' match any character, including a newline (by default '.' does not match a newline)
re.U: make the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database

    >>> a = 'this is a test string'
    >>> re.findall(r'a', a)       # findall needs both a pattern and a string
    ['a']
    >>> re.findall(r'test', a)    # find the word 'test'
    ['test']
    >>> a = 'this is a { {test}} string'
    >>> pattern = r'{ {(.*?)}}'   # a slightly more complex pattern
    >>> re.findall(pattern, a)
    ['test']
    >>> pattern = r'a.c'
    >>> test_str = 'abc a1c a*c a|c abd aed a\nc'
    >>> re.findall(pattern, test_str)
    ['abc', 'a1c', 'a*c', 'a|c']
    >>> re.findall(pattern, test_str, re.S)
    ['abc', 'a1c', 'a*c', 'a|c', 'a\nc']
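The help text for findall mentions that the return type depends on the number of capturing groups, and the flag list mentions re.I. The sketch below illustrates both. The sample string and the variable names are made up for illustration and do not come from the original examples.

```python
import re

# Hypothetical sample data, chosen only to show the return shapes.
text = "Alice=30, bob=25, CAROL=41"

# One capturing group: findall returns a list of that group's text.
names = re.findall(r"(\w+)=\d+", text)
print(names)

# Two capturing groups: findall returns a list of tuples.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)

# Without a flag, [a-z]+ only matches lowercase runs.
lower_only = re.findall(r"[a-z]+", text)
print(lower_only)

# With re.I the same pattern ignores case.
any_case = re.findall(r"[a-z]+", text, re.I)
print(any_case)
```

When a pattern has groups, findall drops the rest of the match, so use a pattern without groups (or a non-capturing group) if the whole match is needed.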
    # re.search scans the string and returns as soon as it finds the first match
    >>> help(re.search)
    Help on function search in module re:

    search(pattern, string, flags=0)
        Scan through string looking for a match to the pattern, returning
        a match object, or None if no match was found.

    >>> test_str = 'this is a test ,is good,is bad'
    >>> pattern = r'is'
    >>> b = re.search(pattern, test_str)
    >>> b
    <_sre.SRE_Match object; span=(2, 4), match='is'>
    >>> dir(b)
    ['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__',
     '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__',
     '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos',
     'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
    >>> b.group()
    'is'
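A match object carries more than the matched text. The following is a minimal sketch, reusing the sample string from the search example, that reads the match position and guards against the None that search returns when nothing matches. The finditer call is an addition not covered in the text; it is included only to show how to reach the later occurrences that search skips.

```python
import re

# Sample string from the search example.
test_str = 'this is a test ,is good,is bad'

m = re.search(r'is', test_str)
if m is not None:
    found = m.group()   # the matched text
    span = m.span()     # (start, end) indices of the first match
    print(found, span)

# search returns None when nothing matches, so check before calling group().
missing = re.search(r'xyz', test_str)
print(missing)

# finditer yields one match object per occurrence, not just the first.
starts = [hit.start() for hit in re.finditer(r'is', test_str)]
print(starts)
```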
    # re.match matches only at the start of the string
    >>> help(re.match)
    Help on function match in module re:

    match(pattern, string, flags=0)
        Try to apply the pattern at the start of the string, returning
        a match object, or None if no match was found.

    >>> test_str
    'this is a test ,is good,is bad'
    >>> pattern
    'is'
    >>> b = re.match(pattern, test_str)
    >>> b
    >>> print(b)
    None    # no match: the string does not start with 'is'
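The difference between match and search is easiest to see side by side. This short sketch runs both against the same sample string; fullmatch, which appears in the dir(re) listing but is not discussed in the text, is included for contrast.

```python
import re

test_str = 'this is a test ,is good,is bad'

# match succeeds only if the pattern fits at index 0.
at_start = re.match(r'this', test_str)
print(at_start.span())

# 'is' occurs in the string, but not at the start, so match returns None.
not_at_start = re.match(r'is', test_str)
print(not_at_start)

# search scans forward and finds the first 'is' inside 'this'.
anywhere = re.search(r'is', test_str)
print(anywhere.span())

# fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r'this', test_str)
print(whole)
```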
re.compile converts a regular expression into a pattern object, which allows more efficient matching when the same pattern is reused:
    >>> help(re.compile)
    Help on function compile in module re:

    compile(pattern, flags=0)
        Compile a regular expression pattern, returning a pattern object.

    >>> pattern = r'{ {(.*?)}}'
    >>> pat_obj = re.compile(pattern)    # compile into a pattern object
    >>> pat_obj
    re.compile('{ {(.*?)}}')
    >>> test_str = 'this is a { {test}} str, { {}}'
    >>> pat_obj.findall(test_str)
    ['test', '']    # the pattern object also returns every match found
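A compiled pattern pays off when the same expression is applied to many strings. The sketch below reuses the pattern from the example above on a small list of made-up lines (the list contents and the replacement text are assumptions for illustration) and also calls sub on the pattern object, which works the same way as the module-level re.sub.

```python
import re

# Compile once, then reuse the pattern object for every line.
pat_obj = re.compile(r'{ {(.*?)}}')

# Hypothetical input lines, not taken from the original text.
lines = [
    'hello { {name}}',
    'no placeholder here',
    '{ {a}} and { {b}}',
]

# Pattern objects expose the same methods as the re module functions.
results = [pat_obj.findall(line) for line in lines]
print(results)

# sub replaces every match with the given text.
replaced = pat_obj.sub('X', lines[2])
print(replaced)
```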