1
INT21H 2012-06-14 19:47:46 +08:00 1
>>> from BeautifulSoup import BeautifulSoup
>>> html="""<html>
... <head>
... <title>Test</title>
... </head>
... <body>
... <p>输出我</p>
... <p>我来捣乱</p>
... </body>
... </html>"""
>>> bs = BeautifulSoup(html)
>>> bs.p
<p>输出我</p>
>>> bs.p.contents
[u'\u8f93\u51fa\u6211']
|
2
vfasky 2012-06-14 20:56:33 +08:00
<code>
html = '''<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>'''
for t in html.split('</p>'):
    # Keep only the text after the last <p>; a plain replace() would also print the markup before it
    print t.split('<p>')[-1]
    break
</code>
|
3
vfasky 2012-06-14 20:58:41 +08:00 1
|
4
muzuiget 2012-06-14 21:03:12 +08:00
Keywords: regular expressions, DOM.
|
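To make the DOM keyword concrete without any third-party package, here is a minimal sketch using the standard library's `html.parser` (Python 3, unlike the Python 2 snippets in this thread). The class name `FirstParagraphParser` and the helper `first_paragraph_text` are hypothetical names made up for illustration, and the sketch assumes the `<p>` tags are properly closed.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first <p> element only (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False   # True while we are inside the first <p>
        self.done = False   # True once the first <p> has been closed
        self.chunks = []    # text fragments seen inside the first <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def first_paragraph_text(html):
    """Return the text of the first closed <p>, or None if there is none."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.done else None


html = """<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>"""

print(first_paragraph_text(html))
```

This is more code than the regex or `find` one-liners, but it does not depend on how the markup is laid out across lines.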
5
goofansu 2012-06-14 21:05:13 +08:00
I've been playing with this lately too. BeautifulSoup is great.
|
6
yibin001 2012-06-14 21:16:34 +08:00
BeautifulSoup really is a fantastic tool.
|
7
likuku 2012-06-14 21:29:06 +08:00 1
#!/usr/bin/env python
# encoding: utf-8
"""
html.py

Created by likuku on 2012-06-14.
Copyright (c) 2012 __MyCompanyName__. All rights reserved.
"""

import sys
import os

html="""
<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>
"""

def main():
    for text in html.split('\n'):
        if text.find('<p>') != -1:
            tmp = text.replace('</p>','').replace('<p>','')
            print tmp
            break

if __name__ == '__main__':
    main()
|
8
aa88kk 2012-06-14 21:51:15 +08:00 1
Use a regex:
import re
m = re.search(r'<p>(.*?)</p>', s, re.S)
|
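For completeness, a small sketch of how the match object from the regex approach might be used (written for Python 3; the sample string `s` is an assumption modeled on the HTML from the other replies). The non-greedy `(.*?)` stops at the first `</p>`, and `re.S` lets `.` match newlines in case the paragraph spans several lines.

```python
import re

# Sample input assumed for illustration, modeled on the HTML used in this thread.
s = """<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>"""

# Non-greedy match: capture everything between the first <p> and the first </p>.
m = re.search(r"<p>(.*?)</p>", s, re.S)

# re.search returns None when nothing matches, so guard before calling group().
first = m.group(1) if m else None
print(first)

# A greedy (.*) would instead run to the LAST </p> and swallow both paragraphs.
greedy = re.search(r"<p>(.*)</p>", s, re.S).group(1)
```

This works well for simple, predictable markup like the sample; for arbitrary HTML a real parser is the safer choice.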
9
cute 2012-06-14 21:57:50 +08:00 1
start = s.find('<p>') + len('<p>')
end = s.find('</p>', start)
print s[start:end]
|
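One caveat with the `find` approach: `str.find` returns -1 when the tag is missing, so the slice silently yields a wrong substring instead of failing. A minimal guarded sketch (Python 3; the function name `between` and its default arguments are hypothetical, chosen only for illustration):

```python
def between(s, open_tag="<p>", close_tag="</p>"):
    """Return the text between the first open_tag and the next close_tag, or None."""
    start = s.find(open_tag)
    if start == -1:            # opening tag not present
        return None
    start += len(open_tag)     # move past the opening tag itself
    end = s.find(close_tag, start)
    if end == -1:              # opening tag was never closed
        return None
    return s[start:end]


print(between("<body><p>输出我</p><p>我来捣乱</p></body>"))
```

Returning `None` for both failure cases keeps the caller's check simple; raising an exception would be an equally reasonable design.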
10
ihciah OP
Thanks, everyone!!
|
11
ling0322 2012-06-18 21:46:38 +08:00
Actually, there's something even more powerful than BeautifulSoup: it's called pyQuery.
|
12
binux 2012-06-18 21:49:04 +08:00
BeautifulSoup uses too much memory.
|
13
chairo 2012-06-18 22:25:00 +08:00
Just passing by to mention libxml.
|