1
realpg 2016-06-05 15:26:33 +08:00
Anyone skilled enough to crack this kind of captcha wouldn't bother with a junk site like this.
|
2
ipconfiger 2016-06-05 15:28:09 +08:00
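You get two images. Run OCR on the first one to read the 4 characters, then locate each of those characters in the second image, in order, and simulate mouse clicks at their coordinates.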
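The last step (turning located characters into an ordered list of clicks) can be sketched as below. This is only an illustration: `click_points`, the sample characters, and the box values are made up, and it assumes OCR and character location have already produced the `order` list and the `boxes` mapping.

```python
def click_points(order, boxes):
    """Return the centre of each character's box, in the order given.

    order: characters read from the first image, e.g. by OCR
    boxes: {character: (left, top, width, height)} found in the second image
    """
    points = []
    for ch in order:
        left, top, width, height = boxes[ch]
        points.append((left + width // 2, top + height // 2))
    return points


# Made-up characters and boxes, for illustration only.
order = ["天", "地", "人", "和"]
boxes = {
    "人": (10, 20, 30, 30),
    "和": (120, 60, 30, 30),
    "天": (200, 15, 30, 30),
    "地": (60, 90, 30, 30),
}
print(click_points(order, boxes))
```

Clicking the centre of each box leaves the most margin if the site accepts any point inside a character's region.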
|
3
ProfFan 2016-06-05 15:57:38 +08:00
This captcha isn't that hard, is it?
|
4
ClutchBear OP @ProfFan Any ideas on how to approach it?
|
5
holyghost 2016-06-05 16:09:49 +08:00
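Just took a look: the captcha image you get by requesting it directly is still scrambled (the front end seems to reassemble it with CSS?), and the click coordinates are then submitted for verification.
Interesting. |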
6
ProfFan 2016-06-05 16:22:35 +08:00 1
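Going by this screenshot, the foreground and background differ so much that the background does nothing to hide the characters; a simple threshold removes it. The character spacing is also wide enough that you don't need OCR: just match the characters in the top image directly against the bottom image.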
|
7
ProfFan 2016-06-05 16:29:43 +08:00 2
|
8
ProfFan 2016-06-05 18:31:15 +08:00 1
POC
```
import cv2
import numpy as np
import matplotlib.pyplot as plt
import skimage.color
from skimage.io import imread  # scipy.misc.imread has been removed from SciPy
from skimage.feature import match_template


def rotate(image, angle, center=None, scale=1.0):
    """Rotate `image` by `angle` degrees about `center` (default: image centre)."""
    (h, w) = image.shape[:2]
    if center is None:
        center = (w / 2, h / 2)
    # Perform the rotation
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))
    return rotated


# captcha_q: the strip showing the characters to click
# captcha_a: the large image to click on
captcha_q = skimage.color.rgb2gray(imread("captcha_q.jpg"))
captcha_a = skimage.color.rgb2gray(imread("captcha_a.jpg"))

# Threshold away the light background; dark pixels become foreground
image = (captcha_a < 0.37).astype(float)

# Crop one character out of the strip (the other three crops are commented out)
#coin_raw = (captcha_q<0.37)[15:,55:79]
#coin_raw = (captcha_q<0.37)[15:,12:35]
#coin_raw = (captcha_q<0.37)[15:,79:101]
coin_raw = (captcha_q < 0.37)[15:, 36:55]

# Hand-tuned rotation (-20 degrees) and 1.5x scale so the template resembles
# the character as drawn in the large image
coin_rot = rotate(np.asarray(coin_raw * 255, dtype=np.uint8), -20)
coin = cv2.resize(coin_rot, tuple(int(1.5 * x) for x in coin_rot.T.shape),
                  interpolation=cv2.INTER_AREA)

# Normalised cross-correlation; the peak is the top-left corner of the best match
result = match_template(image, coin)
ij = np.unravel_index(np.argmax(result), result.shape)
x, y = ij[::-1]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, gridspec_kw={'width_ratios': [1, 4, 3]},
                                    figsize=(12, 6))

ax1.imshow(coin)
ax1.set_axis_off()
ax1.set_title('char')

ax2.imshow(image)
ax2.set_axis_off()
ax2.set_title('image')
# highlight matched region
hcoin, wcoin = coin.shape
rect = plt.Rectangle((x, y), wcoin, hcoin, edgecolor='r', facecolor='none')
ax2.add_patch(rect)

ax3.imshow(result)
ax3.set_axis_off()
ax3.set_title('matched')
# highlight the correlation peak
ax3.autoscale(False)
ax3.plot(x, y, 'o', markeredgecolor='r', markerfacecolor='none', markersize=10)

plt.show()
```
|
9
timeship 2016-06-05 23:03:42 +08:00
Zhihu's captcha is sometimes click-based too; grab the image, then POST the coordinates back.
|
10
nivan 2016-06-06 09:33:14 +08:00
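This captcha validates by coordinate range, and every character is the same size, so you can cut both the prompt strip and the captcha image into equal character-sized tiles and then run OCR on each tile.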
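The equal-size cutting could look roughly like the sketch below. `split_into_tiles` and the 20 x 20 cell size are assumptions for illustration; the real character size would have to be measured from the actual images, and the OCR step is left out.

```python
import numpy as np


def split_into_tiles(image, tile_h, tile_w):
    """Cut an H x W (or H x W x C) array into equal tiles, row by row.

    Returns a list of ((top, left), tile) pairs. Partial tiles at the
    right and bottom edges are dropped.
    """
    rows = image.shape[0] // tile_h
    cols = image.shape[1] // tile_w
    tiles = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * tile_h, c * tile_w
            tiles.append(((top, left), image[top:top + tile_h, left:left + tile_w]))
    return tiles


# Hypothetical sizes: a 40 x 80 image cut into 20 x 20 character cells.
demo = np.arange(40 * 80).reshape(40, 80)
tiles = split_into_tiles(demo, 20, 20)
print(len(tiles), tiles[5][0])
```

Keeping each tile's `(top, left)` offset means a recognised character can be mapped straight back to a click position inside the full image.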
|
11
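ClutchBear OP Thanks, everyone, for all the expert advice.
In the end I logged in manually through selenium, saved the cookies, and then had requests reuse those cookies to stay logged in. |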
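A minimal sketch of that cookie hand-off, for anyone landing here later. The helper names and the `cookies.json` path are made up; it assumes the cookies come from selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys, and that a plain name/value mapping is enough for the site.

```python
import json


def save_cookies(cookies, path):
    """Write the list of cookie dicts (as returned by driver.get_cookies()) to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookies, f)


def load_cookie_dict(path):
    """Read the saved cookies back as a {name: value} dict."""
    with open(path, encoding="utf-8") as f:
        return {c["name"]: c["value"] for c in json.load(f)}


# After logging in by hand in the selenium-driven browser:
#     save_cookies(driver.get_cookies(), "cookies.json")
# Later, in the scraper:
#     requests.get(url, cookies=load_cookie_dict("cookies.json"))
```

The saved cookies stop working once the session expires, at which point the manual login has to be repeated.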