1
scarlex 2013-02-02 23:04:00 +08:00 1
With Python, you can just use re.search(pattern, string).
python code:
--------------------------------------------------------------------
import re

f = open('test.txt', 'r')
s = open('after_re.txt', 'w')
pattern = r'<td rowspan="3" valign="top">.*'

for i in f.readlines():
    print(i)
    match = re.search(pattern, i)
    try:
        mystring = match.group(0)
        s.write(mystring + '\n')  # '.*' stops before the newline, so add it back
    except AttributeError:  # match is None when the line does not match
        pass

f.close()
s.close()  # make sure the output is flushed to disk
--------------------------------------------------------------------
2
ooof OP
import re

f = open('c:\\t.html', 'r')
s = open('c:\\after_re.txt', 'w')
pattern = r'<td>.*'

for i in f.readlines():
    print(i)
    match = re.search(pattern, i)
    try:
        mystring = match.group(0)
        s.write(mystring)
    except:
        pass

I tweaked it a bit, but after running it, after_re.txt is still empty. Why?
3
Channing 2013-02-03 04:23:05 +08:00 2
What a hassle...
In Notepad++:
1. Press Ctrl + F, open the "Mark" tab (the fourth one), enter <td rowspan="3" valign="top">, and check "Bookmark line".
2. From the menu bar, choose Search - Bookmark - Remove Unmarked Lines.
3. Done.
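For comparison, the same "keep only the marked lines" step could be sketched in Python with a plain substring test, no regex needed. This is only a rough sketch: the function name keep_marked_lines and the sample lines are made up for illustration, and the test here is case-sensitive, which may differ from how the Notepad++ search is configured.

```python
MARKER = '<td rowspan="3" valign="top">'

def keep_marked_lines(lines, marker=MARKER):
    """Return only the lines that contain the marker text (hypothetical helper)."""
    return [line for line in lines if marker in line]

# Hypothetical sample input; the real file contents are not shown in the thread.
sample = [
    '<tr>',
    '<td rowspan="3" valign="top">first</td>',
    '<td>second</td>',
    '  <td rowspan="3" valign="top">third</td>',
]

kept = keep_marked_lines(sample)
for line in kept:
    print(line)
```

Unlike a pattern anchored at the start of the line, the substring test also keeps indented lines that contain the tag.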
4
scarlex 2013-02-03 09:00:32 +08:00
@ooof
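Your file probably doesn't contain a bare <td> tag...
Our pattern is pattern = r'<td>.*', which matches from a literal <td> to the end of the line.
If your <td> tags carry other attributes, adjust the pattern accordingly.
Also, you can add a print(match) line right after match = re.search(pattern, i) to see whether a Match object comes back. If every result is None, then your file has no <td>....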
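To make the point above concrete, here is a small sketch of what re.search returns for a bare <td> pattern versus a pattern that tolerates attributes. The sample lines, the helper matched_text, and the attribute-tolerant pattern are all assumptions for illustration; the actual contents of t.html are not shown in the thread.

```python
import re

# Hypothetical sample lines standing in for the real file.
lines = [
    '<td rowspan="3" valign="top">cell with attributes</td>',
    '<td>bare cell</td>',
    '<p>no table cell here</p>',
]

bare = r'<td>.*'              # only matches the literal tag <td>
flexible = r'<td\b[^>]*>.*'   # assumption: also accept any attributes inside the tag

def matched_text(pattern, line):
    """Return the matched text, or None when re.search finds nothing."""
    match = re.search(pattern, line)
    return match.group(0) if match else None

bare_results = [matched_text(bare, line) for line in lines]
flexible_results = [matched_text(flexible, line) for line in lines]

for line, b, f in zip(lines, bare_results, flexible_results):
    print(repr(line), '->', repr(b), '|', repr(f))
```

With the bare pattern, the first line yields None because the tag carries attributes, which would leave the output file empty if every <td> in the input looks like that.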
5
weizhenye 2013-02-03 11:33:55 +08:00
^(?!<td rowspan="3" valign="top">).*$
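The negative lookahead above matches every line that does not start with the tag, so replacing those matches with nothing leaves only the wanted lines. A minimal Python sketch of that idea follows; the sample text is made up, and the trailing \n? is an addition (not part of the pattern as posted) so that the line break is removed along with the line.

```python
import re

# Hypothetical sample text; the real file is not shown in the thread.
text = (
    '<tr>\n'
    '<td rowspan="3" valign="top">keep me</td>\n'
    '<td>drop me</td>\n'
    '</tr>\n'
)

# re.MULTILINE makes ^ and $ match at every line boundary.
# The trailing \n? is an assumption added here to swallow the line break too.
pattern = re.compile(r'^(?!<td rowspan="3" valign="top">).*$\n?', re.MULTILINE)

kept = pattern.sub('', text)
print(kept)
```

Because the lookahead is anchored at ^, a line where the tag is preceded by whitespace would also be removed.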