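Given a string, you sometimes want to keep only the meaningful parts, such as Chinese characters, letters, and digits. To strip out special symbols, or to split the string on some pattern, you can use re.split. For example: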
=============================== RESTART: Shell ===============================
>>> s = '''Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.'''
>>> s
'Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32\nType "copyright", "credits" or "license()" for more information.'
>>> # Now split s, dropping the extra characters and keeping only the useful ones such as digits and letters.
>>> 
>>> import re
>>> x = re.split(r'[.(:,[)" ]', s)  # use the special symbols and the space as delimiters
>>> x
['Python', '3', '6', '1', '', 'v3', '6', '1', '69c0db5', '', 'Mar', '21', '2017', '', '18', '41', '36', '', '', 'MSC', 'v', '1900', '64', 'bit', '', 'AMD64', ']', 'on', 'win32\nType', '', 'copyright', '', '', '', 'credits', '', 'or', '', 'license', '', '', '', 'for', 'more', 'information', '']
>>> 
>>> words = [i for i in x if i]  # drop the empty strings left between adjacent delimiters
>>> words
['Python', '3', '6', '1', 'v3', '6', '1', '69c0db5', 'Mar', '21', '2017', '18', '41', '36', 'MSC', 'v', '1900', '64', 'bit', 'AMD64', ']', 'on', 'win32\nType', 'copyright', 'credits', 'or', 'license', 'for', 'more', 'information']
>>> 
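Note that the result above still contains ']' and 'win32\nType', because neither ']' nor the newline is listed in the character class. One possible variation, shown here only as a sketch, is to split on every run of characters that is not a letter or digit instead of listing each delimiter. The names parts, words_found, and mixed are illustrative and not part of the original session, and the pattern assumes ASCII letters and digits only; the last line assumes the basic CJK range \u4e00-\u9fff is enough when Chinese characters should be kept as well.

```python
import re

s = ('Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) '
     '[MSC v.1900 64 bit (AMD64)] on win32\n'
     'Type "copyright", "credits" or "license()" for more information.')

# Split on any run of characters that is NOT an ASCII letter or digit.
# The trailing '+' merges adjacent delimiters, so far fewer empty strings appear.
parts = re.split(r'[^0-9A-Za-z]+', s)
words = [p for p in parts if p]  # the final '.' still leaves one empty string

# Equivalent approach: collect the runs of wanted characters directly.
words_found = re.findall(r'[0-9A-Za-z]+', s)

# Assumption: to keep Chinese characters too, add the basic CJK range.
mixed = re.findall(r'[0-9A-Za-z\u4e00-\u9fff]+', '你好, world! 123')

print(words)
print(mixed)
```

With this pattern ']' disappears and 'win32' and 'Type' become separate items, since the newline now counts as a delimiter.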
To concatenate strings, use the S.join() method:
>>> # String concatenation
>>> 
>>> help(str.join)
Help on method_descriptor:

join(...)
    S.join(iterable) -> str

    Return a string which is the concatenation of the strings in the
    iterable.  The separator between elements is S.

>>> l = list(range(1,9))
>>> 
>>> s = "".join([str(i) for i in l])
>>> s
'12345678'
>>> s = "".join(str(i) for i in l)
>>> s
'12345678'
>>> 
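As the help text says, the string that join is called on becomes the separator, and every item in the iterable must already be a string. The short sketch below illustrates both points; the variable names nums, digits, dashed, and error are chosen here for illustration only.

```python
nums = list(range(1, 9))

# join accepts only strings, so each number is converted first.
digits = "".join(map(str, nums))

# The string join is called on is placed between the elements.
dashed = "-".join(str(n) for n in nums)

# Passing the integers directly raises TypeError.
try:
    "".join(nums)
except TypeError as exc:
    error = str(exc)

print(digits)  # 12345678
print(dashed)  # 1-2-3-4-5-6-7-8
print(error)
```

Combined with the split example earlier, " ".join(words) would rebuild a cleaned, space-separated string from the extracted tokens.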