A problem a beginner ran into while practicing regular expressions - V2EX

This topic was created 2834 days ago; the information mentioned may have changed or developed since.

Python version 3.6.5. The code is as follows:

import requests
import re

# Fetch the page source
content = requests.get('http://book.douban.com/').text
# Two capture groups: the href value and the title value
pattern = re.compile('<li.?href="(.?)".?title="(.?)".*?', re.S)
results = re.findall(pattern, content)
print(results)

The code hangs at results = re.findall(pattern, content). If I take

pattern = re.compile('<li.?href="(.?)".?title="(.?)".*?', re.S)

and remove one of the () groups,

pattern = re.compile('<li.?title="(.?)".*?', re.S)

it runs correctly. What am I doing wrong? Any advice would be appreciated.

7 replies • 2018-10-23 15:01:01 +08:00

1

summerwar

Oct 21, 2018

The regex is wrong. ? means zero or one repetition, and URLs and titles are never that short.
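To make the point about `?` concrete, here is a minimal sketch that compares the pattern as it is displayed in the post with a variant that uses lazy `.*?` between the fixed parts of the tag. The `sample` string is hypothetical markup, not taken from the real page. The trailing `.*?` in the posted pattern suggests the asterisks may have been lost when the forum rendered the post, but that is an assumption.

```python
import re

# Hypothetical sample markup for illustration only.
sample = (
    '<li><a href="https://example.com/book/1" '
    'title="Example Book">Example Book</a></li>'
)

# As displayed in the post: each ".?" matches at most one character,
# so the gap between "<li" and 'href="' cannot be bridged.
short = re.compile(r'<li.?href="(.?)".?title="(.?)"', re.S)

# Variant with lazy ".*?", which matches as few characters as needed.
lazy = re.compile(r'<li.*?href="(.*?)".*?title="(.*?)"', re.S)

short_results = short.findall(sample)
lazy_results = lazy.findall(sample)

print(short_results)  # []
print(lazy_results)   # [('https://example.com/book/1', 'Example Book')]
```

With two capture groups, `findall` returns a list of tuples, one tuple per match.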

2

PulpFunction

Oct 21, 2018

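You can't write code by trial and error…

1. Parsing a web page directly with regular expressions is not a good idea.
2. Sending requests without headers can easily get you blocked.

If you insist on using regex, you have mixed up the arguments to findall…

I suggest rereading the documentation: http://www.runoob.com/python/python-reg-expressions.html

You could also look into Beautiful Soup and similar tools.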
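For reference on the `findall` arguments, the sketch below shows the calling forms the standard `re` module accepts: the module-level function takes the pattern first and the string second, and a compiled pattern also has its own `findall` method. The `text` string is a made-up example. By this ordering, the call in the question appears to pass its arguments in the expected order.

```python
import re

# Made-up input for illustration.
text = 'title="A" title="B"'
pattern = re.compile(r'title="(.*?)"')

# Module-level function: pattern first, then the string to search.
via_module = re.findall(pattern, text)

# Method on the compiled pattern: only the string is needed.
via_method = pattern.findall(text)

# Uncompiled pattern string; flags go in the third position.
via_string = re.findall(r'title="(.*?)"', text, re.S)

print(via_module, via_method, via_string)
```

All three calls return the same list here, so the choice between them is mostly a matter of style and of whether the pattern is reused.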

3

SpiderXiantang

Oct 21, 2018

Take a look at XPath, though you still need to learn regex.
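As a rough illustration of the XPath idea, the sketch below uses the limited XPath subset in the standard library's `xml.etree.ElementTree`. It assumes a small, well-formed snippet (hypothetical markup); real pages are often not well-formed XML, so a dedicated HTML parser with XPath support would normally be needed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for illustration only.
snippet = """
<ul>
  <li><a href="https://example.com/book/1" title="Example Book One">One</a></li>
  <li><a href="https://example.com/book/2" title="Example Book Two">Two</a></li>
  <li><span>no link here</span></li>
</ul>
""".strip()

root = ET.fromstring(snippet)

# ".//li/a[@title]" selects <a> elements that are children of any <li>
# and that carry a title attribute.
links = [(a.get("href"), a.get("title")) for a in root.findall(".//li/a[@title]")]

print(links)
```

The path expression states which elements are wanted by structure, so no wildcard matching over raw text is involved.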

4

GreatTony

Oct 21, 2018

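Parsing HTML with regex is inefficient and error-prone. Use this library instead: https://github.com/kennethreitz/requests-html. It is another library by the author of requests and is very pleasant to use.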
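To show what parser-based extraction looks like without assuming anything about the requests-html API, here is a minimal sketch that swaps in the standard library's `html.parser` instead. The `LinkCollector` class and the `html` string are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, title) pairs from <a> tags that have both attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if "href" in attr_map and "title" in attr_map:
            self.links.append((attr_map["href"], attr_map["title"]))


# Hypothetical markup with unclosed tags, which the parser tolerates.
html = (
    '<ul><li><a href="/book/1" title="Example Book">cover'
    '<li><a href="/book/2">untitled</a></ul>'
)

collector = LinkCollector()
collector.feed(html)
collector.close()
collected = collector.links

print(collected)  # [('/book/1', 'Example Book')]
```

The parser reports each start tag with its attributes already split out, so attribute order and spacing inside the tag no longer matter.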

5

frostming

Oct 22, 2018

When learning regex, verify your expressions as you go:
http://tool.oschina.net/regex

6

wersonliu9527

Oct 22, 2018

If you don't want regex to make your head spin, just use Chrome's "Copy XPath" feature together with the XPath Helper extension.

7

canwushuang

Oct 23, 2018

The problem is the question mark "?": a question mark means non-greedy matching.
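The two readings of `?` in this thread both exist in Python's `re` syntax, and which one applies depends on what precedes it. A small sketch on a made-up string: after a quantifier such as `*`, the `?` makes it non-greedy; directly after `.`, it means "zero or one character".

```python
import re

# Made-up input for illustration.
text = 'title="First" class="x" title="Second"'

# Greedy: ".*" runs to the last closing quote on the line.
greedy = re.findall(r'title="(.*)"', text)

# Non-greedy: ".*?" stops at the first closing quote after each title.
lazy = re.findall(r'title="(.*?)"', text)

# Optional: ".?" allows at most one character between the quotes.
optional = re.findall(r'title="(.?)"', text)

print(greedy)    # ['First" class="x" title="Second']
print(lazy)      # ['First', 'Second']
print(optional)  # []
```

On a full page with `re.S`, several lazy wildcards in a row can still be slow when a match fails, because the engine retries many split points; this may be related to the hang described in the question, though that is a guess.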
