
Hi everyone, I'd like to ask a small question about a Python crawler ^_^

    StackGao · Feb 24, 2014 · 3802 views
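    I want to write a small crawler that fetches the cumulative play count of a video on a given video site.

    Here is a video page: http://www.iqiyi.com/v_19rrh6k4pk.html

    Here is the HTML fragment that holds its play count (105万 means 1.05 million): <span id="widget-playcount" data-vi-elem="playCount">105万</span>

    Here is my code: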

    # encoding: utf-8
    # Python 2 code (urllib2, print statement), as originally posted.
    import urllib2, re


    def getInfo(url, keyword):
        # Download the raw HTML and print every play-count <span> found in it.
        print 'getting information from :' + url + ' ...'
        myPage = urllib2.urlopen(url).read()
        myItems = re.findall(r'<span\sid="widget-playcount"\sdata-vi-elem="playCount">.*?<\/span>', myPage, re.S)

        for item in myItems:
            print item

    getInfo('http://www.iqiyi.com/v_19rrh6k4pk.html', '')



    Why doesn't it find anything? Is something wrong with the regular expression? Thanks for taking a look...
    6 replies
    #1 · clino · Feb 24, 2014
    I'd suggest debugging the regular expression with Kodos.
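    Without Kodos, the expression can also be checked directly against the fragment quoted in the post. The snippet below is a minimal sketch in Python 3 (the original post uses Python 2); the capturing group around `.*?` is an addition for illustration, so that only the count is returned.

```python
import re

# Pattern from the original post, with a capturing group added around the count
# and the unnecessary backslash in <\/span> dropped.
PLAYCOUNT_RE = re.compile(
    r'<span\sid="widget-playcount"\sdata-vi-elem="playCount">(.*?)</span>',
    re.S,
)

# HTML fragment quoted in the post, as seen in the browser's developer tools.
fragment = '<span id="widget-playcount" data-vi-elem="playCount">105万</span>'

matches = PLAYCOUNT_RE.findall(fragment)
print(matches)  # ['105万']
```

    The pattern matches the fragment, so if nothing is found on the live page, the downloaded HTML most likely differs from what the developer tools display.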
    #2 · glongzh · Feb 24, 2014
    The data is probably loaded dynamically.
    #3 · yakczh · Feb 24, 2014
    from pyquery import PyQuery as pyq


    url=r'http://www.iqiyi.com/v_19rrh6k4pk.html'

    doc=pyq(url)

    legend=doc("#widget-playcount")

    print(legend.text())
    #4 · yangg · Feb 24, 2014
    Go by what "View Source" shows, not by what you see in the developer tools.
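    "View Source" shows the HTML as the server sent it, which is also what a plain HTTP download receives; the developer tools show the DOM after scripts have run. A rough Python 3 sketch of that check follows. The helper names, the `User-Agent` header, and the two sample strings are assumptions made for illustration, not real responses from the site. If the span turns out to be present but empty in the source, search for the value instead of the id.

```python
import urllib.request


def fetch_source(url, timeout=10):
    """Return the HTML exactly as the server sends it (no JavaScript is executed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def in_static_source(html, marker='id="widget-playcount"'):
    """Report whether the marker text is present in the given HTML string."""
    return marker in html


# Offline demonstration with two hand-written samples (hypothetical, not site output).
static_html = '<div id="player"></div><script src="app.js"></script>'
rendered_dom = (
    '<div id="player">'
    '<span id="widget-playcount" data-vi-elem="playCount">105万</span>'
    '</div>'
)

print(in_static_source(static_html))   # False
print(in_static_source(rendered_dom))  # True
```

    Against the real page, `in_static_source(fetch_source(url))` would indicate whether a regular expression run on the downloaded HTML has any chance of matching.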
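    #5 · StackGao (OP) · Feb 24, 2014
    @yangg You're right, the page source really doesn't contain the play count... Why don't the developer tools match it? How do I scrape a specific piece of information in this situation?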
    #6 · AlloVince · Feb 24, 2014
    casperjs
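    CasperJS drives a headless browser from JavaScript, so the page's scripts run before the element is read. For staying in Python, one comparable option is Selenium, which is a substitute for what this reply suggests and not the same tool. The sketch below assumes Selenium 4 and a local Chrome installation, and that the page still fills in `#widget-playcount` after loading; `parse_play_count` and `fetch_rendered_play_count` are hypothetical helper names.

```python
from decimal import Decimal

# Chinese magnitude suffixes commonly used in view counters.
UNITS = {"万": 10_000, "亿": 100_000_000}


def parse_play_count(text):
    """Convert a display string such as '105万' into an integer."""
    text = text.strip().replace(",", "")
    multiplier = 1
    if text and text[-1] in UNITS:
        multiplier = UNITS[text[-1]]
        text = text[:-1]
    # Decimal avoids float rounding for values such as '1.2亿'.
    return int(Decimal(text) * multiplier)


def fetch_rendered_play_count(url, timeout=10):
    """Load the page in headless Chrome and read the count once scripts fill it in."""
    # Third-party dependency, imported lazily so the parser above works without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the element exists and contains non-empty text.
        text = WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.ID, "widget-playcount").text.strip()
        )
        return parse_play_count(text)
    finally:
        driver.quit()


print(parse_play_count("105万"))  # 1050000
```

    A lighter alternative, when it is available, is to find the background request that supplies the number in the browser's network panel and call that URL directly; no such URL is given in this thread.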