
How does fetching an image captcha in Python work?

    whamjane · Jan 20, 2016 · 4184 views
    Senior engineers, in the code below, what exactly happens in the part that fetches the captcha, and how does it end up with the captcha?


    # -*- coding: utf-8 -*-
    # Python 2 code; username and password must be defined before running.
    import requests
    from bs4 import BeautifulSoup
    import urllib
    import re

    loginUrl = 'http://accounts.douban.com/login'
    formData = {
        "redir": "http://movie.douban.com/mine?status=collect",
        "form_email": username,
        "form_password": password,
        "login": u'登录'
    }
    headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.1) '
               'AppleWebKit/537.36 (KHTML, like Gecko) '
               'Chrome/43.0.2357.134 Safari/537.36'}
    r = requests.post(loginUrl, data=formData, headers=headers)
    page = r.text
    #print r.url

    # Fetch the captcha image
    # Use bs4 to find the captcha image URL
    soup = BeautifulSoup(page, "html.parser")
    captchaAddr = soup.find('img', id='captcha_image')['src']
    # Use a regular expression to get the captcha ID
    reCaptchaID = r'<input type="hidden" name="captcha-id" value="(.*?)"/'
    captchaID = re.findall(reCaptchaID, page)
    #print captchaID
    # Save the image locally
    urllib.urlretrieve(captchaAddr, "captcha.jpg")
    captcha = raw_input('please input the captcha:')

    formData['captcha-solution'] = captcha
    formData['captcha-id'] = captchaID

    r = requests.post(loginUrl, data=formData, headers=headers)
    page = r.text
    if r.url == 'http://movie.douban.com/mine?status=collect':
        print 'Login successfully!!!'
        print 'Movies I have watched', '-'*60
        # Get the watched movies
        soup = BeautifulSoup(page, "html.parser")
        result = soup.findAll('li', attrs={"class": "title"})
        #print result
        for item in result:
            print item.find('a').get_text()
    else:
        print "failed!"
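    The captcha part of the code above does not read the characters in the image. It only looks up two things in the login page's HTML: the URL of the captcha image, and a hidden captcha-id value that presumably tells the server which captcha the typed answer belongs to. Below is a minimal sketch of that lookup, written for Python 3 and using only the standard library instead of bs4. The sample HTML, its URL, and the id value are made up for illustration and are not Douban's real markup; `CaptchaFinder` and `find_captcha` are hypothetical names.

```python
from html.parser import HTMLParser


class CaptchaFinder(HTMLParser):
    """Collects the captcha image URL and the hidden captcha-id field."""

    def __init__(self):
        super().__init__()
        self.image_url = None
        self.captcha_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == "captcha_image":
            # Same element the post finds with soup.find('img', id='captcha_image')
            self.image_url = attrs.get("src")
        elif tag == "input" and attrs.get("name") == "captcha-id":
            # Same field the post extracts with a regular expression
            self.captcha_id = attrs.get("value")


def find_captcha(page):
    """Return (image_url, captcha_id); either is None if not present."""
    finder = CaptchaFinder()
    finder.feed(page)
    return finder.image_url, finder.captcha_id


# Hypothetical login page fragment, for illustration only.
SAMPLE_PAGE = """
<form>
  <img id="captcha_image" src="http://example.com/captcha?id=abc123" alt="captcha"/>
  <input type="hidden" name="captcha-id" value="abc123"/>
</form>
"""

image_url, captcha_id = find_captcha(SAMPLE_PAGE)
print(image_url)
print(captcha_id)
```

    Both values are then sent back with the login form: the id unchanged, and the text a person reads from the downloaded image as the solution.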
    3 replies    2016-01-20 21:37:56 +08:00
    #1 · JoeShu · Jan 20, 2016
    It cannot get the captcha text; you have to enter it by hand.
    #2 · CheungKe · Jan 20, 2016
    urllib.urlretrieve(captchaAddr,"captcha.jpg")

    First it gets the image URL and saves the image; then you open the file and look at it yourself.

    captcha = raw_input('please input the captcha:')

    Then you type it in.
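    The two steps described above (save the image, then type what you see) can be sketched in Python 3 as follows. This is an illustration, not the poster's code: `solve_captcha_by_hand` and its `ask` parameter are hypothetical names, and the image bytes are passed in directly so the sketch runs without a network connection. In a real script they would come from downloading the captcha URL.

```python
import os
import tempfile


def solve_captcha_by_hand(image_bytes, path, ask=input):
    """Save the captcha image to `path`, then ask a person for its text."""
    with open(path, "wb") as f:
        f.write(image_bytes)
    # The program never reads the image; a person opens the file and types the answer.
    return ask("Open %s and type the captcha: " % path).strip()


# Placeholder bytes standing in for a downloaded image (not a real JPEG).
fake_image = b"not a real image"
target = os.path.join(tempfile.mkdtemp(), "captcha.jpg")

# A canned answer replaces the keyboard so the demo runs unattended;
# leave out `ask` to be prompted interactively instead.
answer = solve_captcha_by_hand(fake_image, target, ask=lambda prompt: " x7k2 ")
print(answer)
```

    Passing the prompt function as a parameter keeps the human step explicit and lets the rest of the script be exercised without typing anything.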
    #3 · whamjane (OP) · Jan 20, 2016
    @CheungKe @JoeShu Thanks