Scrapy: help with XPath parsing - V2EX


This topic was created 4076 days ago; the information in it may have changed since.

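I want to scrape every a under a particular div on a page. The first a is a heading and is structured differently from the rest, which causes the following problem:
What I expect to get is:
{"province": ["a", "b", "c"], "city": ["d", "e", "f"], "district": ["g", "h", "i"]}, but instead I get:
{"province": ["a", "b", "c"], "city": ["d", "e", "f"], "district": ["Region", "g", "h"]}
What should I do? How can I start scraping from the second a? I tried changing the definition of sites to //div/a[2], but that didn't work.
Scrapy newbie here, any help appreciated!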

6 replies • 2015-04-07 17:57:14 +08:00

1

Septembers

Apr 7, 2015 via Android

Without a sample of the HTML, how is anyone supposed to answer this?

2

willdatascience

OP

Apr 7, 2015

@Septembers Er... if I could post screenshots, I would have shared the HTML.

3

Septembers

Apr 7, 2015 via Android

@willdatascience gist

4

aaaa007cn

Apr 7, 2015

//div/a[position()>1]
//div/a/following-sibling::a
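Both expressions in this reply can be checked against a small page. The sketch below assumes lxml is installed (Scrapy's selectors are built on it), and the markup is a hypothetical stand-in, since the OP never posted the real HTML.

```python
# Minimal sketch: PAGE is hypothetical markup in which the first <a> inside
# the <div> is a heading ("Region") and the rest are the wanted links.
from lxml import html

PAGE = """
<html><body>
  <div><a>Region</a><a>g</a><a>h</a><a>i</a></div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Every <a> child of the <div>, heading included (the OP's original result).
all_links = tree.xpath("//div/a/text()")

# First form: keep only <a> children whose position among the div's <a>
# children is greater than 1, i.e. skip the heading.
by_position = tree.xpath("//div/a[position()>1]/text()")

# Second form: every <a> that has an earlier <a> sibling; the resulting
# node-set is deduplicated, so each link appears once and the first <a>
# in the div is excluded.
by_sibling = tree.xpath("//div/a/following-sibling::a/text()")

print(all_links)
print(by_position)
print(by_sibling)
```

Both forms work per parent, so if a page has several such divs, each with its own heading, every heading is dropped. Inside a Scrapy callback the same expression strings can be passed to response.xpath(...).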

5

zjuster

Apr 7, 2015

//div/a[2] selects only the second a node; try /a[position()>1] instead.

For common XPath patterns, take a look at w3school; it covers all of them.
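The difference between a[2] and "everything after the first" is easy to see with the standard library alone. The markup is again a hypothetical stand-in. ElementTree's limited XPath has no position()>1, so this sketch swaps in a different technique, plain Python list slicing, to drop the heading.

```python
# Sketch using only the standard library; PAGE is hypothetical markup.
import xml.etree.ElementTree as ET

PAGE = "<root><div><a>Region</a><a>g</a><a>h</a><a>i</a></div></root>"
root = ET.fromstring(PAGE)

# a[2] is a positional predicate: it matches only the second <a> in the div,
# which is why //div/a[2] returned a single link.
second_only = [a.text for a in root.findall(".//div/a[2]")]

# Select all <a> elements, then slice off the first one (the heading).
after_heading = [a.text for a in root.findall(".//div/a")[1:]]

print(second_only)    # ['g']
print(after_heading)  # ['g', 'h', 'i']
```

In Scrapy, response.xpath(...) returns a SelectorList, which is a list subclass, so the same [1:] slice works there. Note that slicing removes only the first match of the whole query, whereas a[position()>1] skips the first a in every div.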

6

oseau

Apr 7, 2015

http://zvon.org/comp/r/tut-XPath_1.html
