A simple question about extracting text with lxml

I won't waste anyone's time; here's the code:

from lxml import etree

# raw string (r'...') keeps the literal "\" separators without an invalid-escape warning
a = etree.HTML(r'<div class="ash1"><span class="mark">Vectors</span> \ <span class="mark">Background</span> \ 19.080 results </div>')

Now I want to get the "19.080" part. How should I do that?
a.xpath('//div')[0].text comes back empty (None), which makes no sense!


6 replies • 2014-07-11 10:31:56 +08:00

imn1

2014-07-11 00:55:08 +08:00

Try a.xpath('//div/text()')[0]
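A quick check of what text() returns on this exact snippet. It gives a list of the text nodes that are direct children of the div. Because the div has no text before its first <span>, those nodes are the two pieces of text after the spans, so index [0] is the first " \ " separator and the number is in the last node. The string cleanup below is only an assumption that fits this snippet's layout.

```python
from lxml import etree

# Same markup as in the question; raw string keeps the "\" separators literal
a = etree.HTML(r'<div class="ash1"><span class="mark">Vectors</span> \ <span class="mark">Background</span> \ 19.080 results </div>')

# text() selects every text node directly under the div:
# here ' \ ' (after the first span) and ' \ 19.080 results ' (after the second)
nodes = a.xpath('//div/text()')
print(len(nodes))  # 2

# The count lives in the last text node; drop the "\" and take the first word
count = nodes[-1].replace('\\', '').split()[0]
print(count)  # 19.080
```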

ggarlic

2014-07-11 01:22:48 +08:00

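I've been bitten by this too.
text is empty because .text doesn't mean what you'd expect (the whole text content of a tag). The documentation defines text as: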
Text before the first subelement. This is either a string or the value None, if there was no text.

Besides the method above, you can also walk the text with itertext().
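A small sketch of the .text definition quoted above, using the question's markup: the div's .text is None because a <span> comes first, the text after each span is stored on that span's .tail, and itertext() yields element text and tails in document order.

```python
from lxml import etree

a = etree.HTML(r'<div class="ash1"><span class="mark">Vectors</span> \ <span class="mark">Background</span> \ 19.080 results </div>')
div = a.xpath('//div')[0]

# .text is only the text before the first child element; there is none here
print(div.text)  # None

# Text that follows a child element is stored on that child's .tail
spans = div.findall('span')
print(repr(spans[-1].tail))  # ' \\ 19.080 results '

# itertext() walks all text in order: the spans' text plus their tails
full = ''.join(div.itertext())
print(full.split())  # ['Vectors', '\\', 'Background', '\\', '19.080', 'results']
```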

binux

2014-07-11 02:18:11 +08:00

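a.xpath('//div')[0].text_content()
(text_content() is an lxml.html method, so `a` has to be parsed with lxml.html.fromstring() rather than etree.HTML())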
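A minimal sketch of the text_content() approach. The element has to come from lxml.html; an element produced by lxml.etree.HTML() is a plain etree element and raises AttributeError, which the second part checks.

```python
import lxml.html
from lxml import etree

html = r'<div class="ash1"><span class="mark">Vectors</span> \ <span class="mark">Background</span> \ 19.080 results </div>'

# lxml.html elements provide text_content(): all descendant text joined together
div = lxml.html.fromstring(html).xpath('//div')[0]
text = div.text_content()
print(text.split())  # ['Vectors', '\\', 'Background', '\\', '19.080', 'results']

# Elements from lxml.etree.HTML() have no text_content()
try:
    etree.HTML(html).xpath('//div')[0].text_content()
    etree_has_text_content = True
except AttributeError:
    etree_has_text_content = False
print(etree_has_text_content)  # False
```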

2014-07-11 02:54:23 +08:00

Thanks, everyone, it's solved.

pc10201

2014-07-11 09:38:06 +08:00

Why do I always feel that regex extraction is better than xpath?

dingyaguang117

2014-07-11 10:31:56 +08:00

@pc10201 xpath may be a bit slower, but it's concise to write and more accurate.
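A side-by-side sketch of the two approaches on the question's snippet (not a benchmark). The regex pattern is an assumption that matches the raw string only while the number is followed by whitespace and "results"; the xpath version selects by structure (the div's class) and then takes the last text node.

```python
import re
from lxml import etree

html = r'<div class="ash1"><span class="mark">Vectors</span> \ <span class="mark">Background</span> \ 19.080 results </div>'

# Regex: matches the raw markup text, so it depends on the exact wording and spacing
m = re.search(r'([\d.,]+)\s+results', html)
by_regex = m.group(1) if m else None

# xpath: locate the div by its class, then take its last direct text node
tail = etree.HTML(html).xpath('//div[@class="ash1"]/text()')[-1]
by_xpath = tail.replace('\\', '').split()[0]

print(by_regex, by_xpath)  # 19.080 19.080
```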