Changing just one line of code made my crawler dozens of times slower. Any ideas? - V2EX


This topic was created 3564 days ago; the information mentioned may have changed or developed since.

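(Earlier code omitted)
s = k.decode("unicode_escape")
ll = re.findall('(.aaaa.)', s)
(Later code omitted)

This looks for a specific English string in the titles. One loop takes about 12s, and 40 loops take about 8 minutes.

(Earlier code omitted)
s = str(k.decode("unicode_escape"))
ll = re.findall('(.汉字 XXX.)', s)
(Later code omitted)

This looks for a specific Chinese string in the titles. The first loop takes about 10s, each later loop is slower, and the last one takes 200s. 10 loops take about half an hour.

What I want to know is why searching for English and searching for Chinese differ so much in time. The rest of the code is exactly the same, and both run on the same server.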
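One way to narrow this down is to time the decode step and the regex step separately for every title, so it becomes visible which of the two grows from loop to loop. The sketch below is only an illustration in Python 3: `find_in_titles`, `raw_titles`, and the sample pattern are made-up names, since the omitted parts of the original code are unknown.

```python
import re
import time

# Hypothetical pattern, compiled once outside the loop.
PATTERN = re.compile("(.汉字.)")


def find_in_titles(raw_titles):
    """Decode each raw title and search it, timing both steps separately."""
    matches = []
    timings = []  # one (decode_seconds, regex_seconds) pair per title
    for k in raw_titles:
        t0 = time.perf_counter()
        s = k.decode("unicode_escape")  # bytes such as b"\\u6c49" become text
        t1 = time.perf_counter()
        matches.extend(PATTERN.findall(s))
        t2 = time.perf_counter()
        timings.append((t1 - t0, t2 - t1))
    return matches, timings


if __name__ == "__main__":
    found, times = find_in_titles([b"a\\u6c49\\u5b57b", b"no match here"])
    print(found)
    print(times)
```

If neither number grows while the whole loop still slows down, the cause is more likely in the omitted code (for example a list that keeps growing) than in these two lines.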

4 replies • 2016-09-01 08:12:18 +08:00

1

huntzhan

Aug 31, 2016

My guess is that your regex is badly written. Check whether there is catastrophic backtracking.

2

soulmine

OP

Aug 31, 2016

@huntzhan Uh, how do I check that.....?
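A rough way to check is to time the same pattern against inputs of growing length and see whether the time explodes. The sketch below uses a textbook pathological pattern, `(a+)+$`, which is not the pattern from this thread; `time_search` is a hypothetical helper name.

```python
import re
import time


def time_search(pattern, text):
    """Return the seconds taken by a single re.search call."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.search(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Nested quantifiers plus an input that can never match (it ends in "b")
    # force the engine to try a huge number of ways to split the run of "a".
    bad_pattern = r"(a+)+$"
    for n in (10, 12, 14, 16, 18):
        elapsed = time_search(bad_pattern, "a" * n + "b")
        print(n, round(elapsed, 4))
```

If the time roughly doubles with every extra character, the pattern is backtracking catastrophically. If it grows in step with the input length, the regex is probably not the bottleneck.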

3

Tyanboot

PRO

Aug 31, 2016

Maybe it's because a Chinese character takes two bytes, so the matching time blows up?
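How many bytes a Chinese character takes depends on the encoding, and a regex over decoded text works on characters rather than bytes. A small Python 3 sketch, where `sizes` is a made-up helper:

```python
import re


def sizes(text):
    """Return (code points, UTF-8 bytes, GBK bytes) for the given text."""
    return (len(text), len(text.encode("utf-8")), len(text.encode("gbk")))


if __name__ == "__main__":
    print(sizes("汉字"))  # (2, 6, 4): 3 bytes each in UTF-8, 2 each in GBK
    # On decoded text, "." consumes one whole character at a time.
    print(re.findall(".", "汉字"))
    # On raw bytes, "." consumes one byte at a time.
    print(len(re.findall(b".", "汉字".encode("utf-8"))))
```

A two- or three-fold difference in width would not by itself explain loops that slow from 10s to 200s, so this is at most part of the picture.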

4

simple2025

Sep 1, 2016

Are you sure you can use Chinese in a regex like that? As far as I remember, you can't..
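For what it's worth, Python 3's `re` module does accept Chinese characters in a pattern when both the pattern and the text are `str`. The sketch below shows only that much; the original code looks like Python 2, where byte strings and unicode strings behave differently, so it may not carry over directly. `find_fragments` is a hypothetical name.

```python
import re


def find_fragments(pattern, title):
    """Return every non-overlapping match of pattern in title."""
    return re.findall(pattern, title)


if __name__ == "__main__":
    # Same shape as the patterns in the post: one character, a literal, one character.
    print(find_fragments("(.aaaa.)", "xaaaay"))  # ['xaaaay']
    print(find_fragments("(.汉字.)", "x汉字y"))  # ['x汉字y']
```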
