
Another Scrapy question: how do I skip pages where some elements are hidden?

  •   xiaoyu9527 · Aug 23, 2016 · 1015 views

    I'm scraping posts, and some of them are hidden. How can I skip these hidden posts?

    def parse_item(self, response):
        l = ItemLoader(item=MeizituItem(), response=response)
        l.add_xpath('name', '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/header/div[1]/h1/text()')
        l.add_xpath('tags', '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/header/div[2]/div[2]/span/ul/li/a/div/text()')
        # Note: add_xpath() returns None, so image_url is always None here
        image_url = l.add_xpath('image_urls', '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/div[1]/div/div/img/@src', Identity())
        print(image_url)
        l.add_value('url', response.url)
        yield l.load_item()
    

    This is how I'm scraping at the moment. If image_url can't be extracted, it seems to raise an error.
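One possible approach, sketched here under a few assumptions: check whether the image XPath matches anything before building the item, and return early when it does not. The post does not include the traceback, so this assumes a "hidden" post is simply one whose image XPath yields no results. The helper name `extract_post` and the `*_XPATH` constants are hypothetical; the XPath strings themselves are copied from the code above. The helper only relies on `response.xpath(...).extract()`, which Scrapy responses provide.

```python
# XPath expressions copied from the question's parse_item().
NAME_XPATH = '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/header/div[1]/h1/text()'
TAGS_XPATH = '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/header/div[2]/div[2]/span/ul/li/a/div/text()'
IMAGE_XPATH = '//html/body/div[1]/div[2]/div/div[2]/div[1]/div[1]/article/div[1]/div/div/img/@src'


def extract_post(response):
    """Return a dict of fields, or None when no image URLs are found.

    Hypothetical helper: `response` is anything exposing .xpath(query).extract()
    and .url, such as a Scrapy response.
    """
    image_urls = response.xpath(IMAGE_XPATH).extract()
    if not image_urls:
        # Assumption: a hidden post is one whose image XPath matches nothing.
        return None
    return {
        'name': response.xpath(NAME_XPATH).extract(),
        'tags': response.xpath(TAGS_XPATH).extract(),
        'image_urls': image_urls,
        'url': response.url,
    }


# Inside the spider, the callback could then look like this:
#
#     def parse_item(self, response):
#         post = extract_post(response)
#         if post is None:
#             self.logger.info('Skipping hidden post: %s', response.url)
#             return  # yield nothing for this page
#         yield post  # Scrapy callbacks may yield plain dicts (Scrapy 1.0+)
```

Returning without yielding means the page produces no item at all, so nothing downstream ever sees a post without images. If the check should live in a pipeline instead, raising `scrapy.exceptions.DropItem` from `process_item` is the usual alternative.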
