Is there a good way to solve this kind of Liepin (猎聘网) captcha programmatically?

V2EX = way to explore


This topic was created 3475 days ago; the information in it may have since changed.

I searched for a long time but couldn't find anyone who has managed to do it with Python....

Liepin (猎聘网)

Python

Simulation

Captcha

11 replies • 2016-06-06 18:23:10 +08:00

realpg

PRO

2016-06-05 15:26:33 +08:00

Anyone skilled enough to crack this kind of captcha wouldn't bother with a crappy site like this.

ipconfiger

2016-06-05 15:28:09 +08:00

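You get two images. First run OCR on the first image to get the 4 characters, then find the coordinates of each character, in order, in the second image and simulate the mouse clicks.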
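A minimal sketch of the ordering step in that flow, assuming the OCR result for the first image is already available as a sequence of characters and that some locator (here a hypothetical `locate` callable, which could be built on template matching) returns each character's bounding box `(x, y, width, height)` in the second image. It only turns boxes into click points in prompt order; OCR and the actual clicking are not shown.

```python
from typing import Callable, List, Sequence, Tuple

# (x, y, width, height) of a character found in the second image
Box = Tuple[int, int, int, int]


def plan_clicks(prompt_chars: Sequence[str],
                locate: Callable[[str], Box]) -> List[Tuple[float, float]]:
    """Return one click point per prompt character, in the order to click them.

    `locate` is a hypothetical helper that finds a character in the second
    image; each click is aimed at the centre of the returned box.
    """
    clicks = []
    for ch in prompt_chars:
        x, y, w, h = locate(ch)
        clicks.append((x + w / 2, y + h / 2))
    return clicks
```

Keeping the locator as a parameter means the same ordering code works whether characters are found by OCR on crops or by direct matching.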

ProfFan

2016-06-05 15:57:38 +08:00

This captcha isn't that hard, is it?

ClutchBear

2016-06-05 16:03:31 +08:00

@ProfFan Any ideas on how to approach it?

holyghost

2016-06-05 16:09:49 +08:00

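I just had a look: requesting the captcha directly gives you a scrambled image (it seems the front end reassembles it with CSS?), and then the click coordinates are submitted for verification.

Interesting.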
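If the tiles really are reassembled on the front end, the raw image can be rebuilt once the tile order is known. The sketch below assumes a regular `rows x cols` grid of equal-size tiles and an `order` list (hypothetical; presumably recoverable from the CSS offsets) where `order[i]` is the index of the scrambled tile that belongs at position `i`, counting row-major.

```python
import numpy as np


def unscramble(img: np.ndarray, rows: int, cols: int, order: list) -> np.ndarray:
    """Rebuild an image whose equal-size tiles were shuffled.

    Assumes a rows x cols grid; order[i] is the row-major index of the
    scrambled tile that belongs at position i. Works for gray or RGB arrays.
    """
    h, w = img.shape[0] // rows, img.shape[1] // cols
    # Cut the scrambled image into tiles, row-major
    tiles = [img[r * h:(r + 1) * h, c * w:(c + 1) * w]
             for r in range(rows) for c in range(cols)]
    out = np.zeros_like(img[:rows * h, :cols * w])
    # Paste each tile into its correct grid position
    for pos, src in enumerate(order):
        r, c = divmod(pos, cols)
        out[r * h:(r + 1) * h, c * w:(c + 1) * w] = tiles[src]
    return out
```

The fixed result can then be thresholded and matched like any other captcha image.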

ProfFan

2016-06-05 16:22:35 +08:00

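Going by this screenshot, the foreground and background differ so much that the background can simply be discarded. The characters are also spaced far enough apart that you don't need OCR: just match the characters in the top image directly against the bottom one.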
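Instead of hand-picking a column range for each character, the wide gaps can be used to split the thresholded prompt strip automatically. A minimal sketch, assuming characters never touch horizontally, so at least one completely empty column separates neighbours:

```python
import numpy as np


def split_columns(mask: np.ndarray, min_width: int = 1) -> list:
    """Return (start, stop) column ranges of the character blobs in a binary mask.

    `mask` is True where a pixel belongs to a character (e.g. gray < 0.37).
    Runs narrower than `min_width` columns are dropped as noise.
    """
    filled = mask.any(axis=0)  # True for every column that contains ink
    spans, start = [], None
    for x, on in enumerate(filled):
        if on and start is None:
            start = x  # a new blob begins
        elif not on and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    # A blob that touches the right edge
    if start is not None and len(filled) - start >= min_width:
        spans.append((start, len(filled)))
    return spans
```

Each `(start, stop)` pair can then slice one character template out of the prompt mask, e.g. `mask[:, start:stop]`; a small `min_width` keeps isolated specks from being treated as characters.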

ProfFan

2016-06-05 16:29:43 +08:00

Roughly like this:

ProfFan

2016-06-05 18:31:15 +08:00

POC

```
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
from skimage.feature import match_template
from skimage.util import img_as_float


def rotate(image, angle, center=None, scale=1.0):
    """Rotate `image` by `angle` degrees (positive = counter-clockwise)."""
    (h, w) = image.shape[:2]

    if center is None:
        center = (w / 2, h / 2)

    # Perform the rotation, keeping the original canvas size
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))

    return rotated


# scipy.misc.imread no longer exists in current SciPy; load with skimage instead.
# img_as_float guarantees values in [0, 1] even if the file is already grayscale uint8
captcha_q = img_as_float(io.imread("captcha_q.jpg", as_gray=True))  # prompt strip
captcha_a = img_as_float(io.imread("captcha_a.jpg", as_gray=True))  # image to click on

# Keep only the dark strokes; the background falls away
image = (captcha_a < 0.37)

# Hand-picked column ranges, one per prompt character;
# uncomment one at a time to locate each character
#coin_raw = (captcha_q<0.37)[15:,55:79]
#coin_raw = (captcha_q<0.37)[15:,12:35]
#coin_raw = (captcha_q<0.37)[15:,79:101]
coin_raw = (captcha_q < 0.37)[15:, 36:55]

# Rotate the template by -20 degrees and enlarge it 1.5x so it resembles the
# character as drawn in the answer image (values tuned by hand for this sample)
coin_rot = rotate(np.asarray(coin_raw * 255, dtype=np.uint8), -20)
coin = cv2.resize(coin_rot,
                  tuple(int(1.5 * x) for x in coin_rot.T.shape),  # dsize is (width, height)
                  interpolation=cv2.INTER_AREA)

# Normalized cross-correlation; the peak is the top-left corner of the best match
result = match_template(image, coin)
ij = np.unravel_index(np.argmax(result), result.shape)
x, y = ij[::-1]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, gridspec_kw={'width_ratios': [1, 4, 3]}, figsize=(12, 6))

ax1.imshow(coin)
ax1.set_axis_off()
ax1.set_title('char')

ax2.imshow(image)
ax2.set_axis_off()
ax2.set_title('image')
# highlight matched region
hcoin, wcoin = coin.shape
rect = plt.Rectangle((x, y), wcoin, hcoin, edgecolor='r', facecolor='none')
ax2.add_patch(rect)

ax3.imshow(result)
ax3.set_axis_off()
ax3.set_title('matched')
# mark the correlation peak
ax3.autoscale(False)
ax3.plot(x, y, 'o', markeredgecolor='r', markerfacecolor='none', markersize=10)

plt.show()
```

timeship

2016-06-05 23:03:42 +08:00

Zhihu's captcha is sometimes click-based too: download the image, then POST the coordinates back.

nivan

2016-06-06 09:33:14 +08:00

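This captcha is validated by coordinate range, and every character is the same size, so you can cut both the prompt and the captcha image into equal pieces of one character each and then run OCR on them.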
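A sketch of that equal-size split, assuming the character size is fixed and known in advance (`cell_h` and `cell_w` are hypothetical parameters) and that the characters sit on a regular grid. Each crop would go to OCR (not shown); because, per this reply, the server only checks that a click falls inside a range, the cell centre serves as the click point.

```python
import numpy as np


def grid_cells(img: np.ndarray, cell_h: int, cell_w: int):
    """Cut an image into equal cell_h x cell_w cells, row by row.

    Yields ((cx, cy), crop): the cell centre as a click point and the cell's
    pixels for OCR. Partial cells at the right/bottom edges are skipped.
    """
    for top in range(0, img.shape[0] - cell_h + 1, cell_h):
        for left in range(0, img.shape[1] - cell_w + 1, cell_w):
            crop = img[top:top + cell_h, left:left + cell_w]
            yield (left + cell_w / 2, top + cell_h / 2), crop
```

Matching the OCR text of each prompt cell to the OCR text of each image cell then gives the click order directly.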

ClutchBear

2016-06-06 18:23:10 +08:00

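Thanks to all the experts in this thread.
In the end I logged in manually through Selenium,
saved the cookies, and then had requests reuse those cookies to stay logged in.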
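A minimal sketch of that workaround, assuming a recent Selenium 4 with Chrome available and the `requests` package installed. `LOGIN_URL`, the cookie file name and the helper names are placeholders, not taken from the thread. A human solves the captcha in the real browser; only the resulting cookies are reused.

```python
import json
from pathlib import Path

COOKIE_FILE = Path("cookies.json")  # placeholder file name
# Assumption: start page for the manual login; adjust to the real login page
LOGIN_URL = "https://www.liepin.com/"


def cookies_to_dict(cookies):
    """Turn Selenium's get_cookies() output (a list of dicts) into {name: value}.

    Domain, path and expiry are dropped, which is fine when every request
    goes to the same site.
    """
    return {c["name"]: c["value"] for c in cookies}


def login_and_save(path=COOKIE_FILE):
    """Open a real browser, let a human log in (captcha included), save cookies."""
    from selenium import webdriver  # imported lazily: only this step needs Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(LOGIN_URL)
        input("Log in in the browser window, then press Enter here...")
        path.write_text(json.dumps(driver.get_cookies()), encoding="utf-8")
    finally:
        driver.quit()


def make_session(path=COOKIE_FILE):
    """Build a requests.Session that reuses the saved login cookies."""
    import requests

    session = requests.Session()
    cookies = json.loads(path.read_text(encoding="utf-8"))
    session.cookies.update(cookies_to_dict(cookies))
    return session
```

Run `login_and_save()` once, then use `make_session().get(...)` for later requests; when the site expires the login, repeat the manual step.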