Help getting started with regular expressions in Python

I'm new to Python and have run into trouble with regular expressions. Could anyone recommend a good way to get started?

http://*259

URLs like this one end in a number. How do I extract it from a web page?

Addendum 1 · 2013-11-19 11:41:18 +08:00

Thanks for the help, everyone. I'm starting to get the hang of it.

Python

Regular expressions

Getting started

10 replies

zhy0216

2013-11-18 22:42:09 +08:00

http://www.amazon.cn/%E6%AD%A3%E5%88%99%E6%8C%87%E5%BC%95-%E4%BD%99%E6%99%9F/dp/B007X6O6J0/ref=sr_1_3?ie=UTF8&qid=1384785712&sr=8-3&keywords=%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F

yxjxx

2013-11-18 23:21:40 +08:00

I only started learning Python recently too, and wrote up some notes: http://yxjxx.me/regular-expression

mengzhuo

2013-11-18 23:47:07 +08:00

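First, don't use regular expressions to extract content from a web page; BS4 (Beautiful Soup 4) is your friend.
Then run a regular expression over all the links you extracted: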

https?:\/\/([\d\.]+)\/
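A minimal sketch of the two-step approach described above: collect the links with an HTML parser first, then apply a regular expression to each link. To stay runnable without extra installs it uses the standard library's html.parser rather than BS4; with Beautiful Soup 4 the collection step would be roughly soup.find_all("a", href=True). The sample HTML, the names LinkCollector and links_ending_in_digits, and the trailing-digits pattern are all assumptions for illustration. The pattern here targets the original question (URLs that end in digits), not the pattern quoted above.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical sample page; in practice this would be the downloaded HTML.
SAMPLE_HTML = """
<a href="http://example.com/item/259">first</a>
<a href="http://example.com/about">about</a>
<a href="http://example.com/item/1024">second</a>
"""

# Assumed pattern: an http(s) URL whose last characters are digits.
TRAILING_NUMBER = re.compile(r"^https?://.*?(\d+)$")


def links_ending_in_digits(html):
    """Return (url, digits) pairs for every link that ends in digits."""
    parser = LinkCollector()
    parser.feed(html)
    result = []
    for link in parser.links:
        match = TRAILING_NUMBER.match(link)
        if match:
            result.append((link, match.group(1)))
    return result


print(links_ending_in_digits(SAMPLE_HTML))
```

Keeping the parsing and the matching separate means the regular expression only ever sees a single URL, which is much easier to get right than matching against raw HTML.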

Perry

2013-11-19 00:29:30 +08:00

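On getting started:
You don't need a book to get started with regular expressions.
A few-minute introduction: http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/
Cheat sheet: http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
Then use your imagination, write your own patterns, and test them: http://rubular.com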

LetFoxRun

2013-11-19 01:01:42 +08:00 via Android

@yxjxx

Is "intersting" in your blog a typo, or did you write it that way on purpose?

sandtears

2013-11-19 03:09:35 +08:00

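import re
url = "http://example.com/259"  # hypothetical example URL, substitute your own
tmpRe = re.compile(r"^http://.*?(\d+)$")  # capture the digits at the very end of the URL
tmpNum = tmpRe.match(url).group(1)  # note: match() returns None if the URL does not fit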

At this point tmpNum holds the number as a str.
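The snippet above raises AttributeError when the URL does not end in digits, because match() returns None in that case. A slightly more defensive sketch, assuming you want an int back and None when there is nothing to extract (the function name trailing_number and the https? extension are assumptions, not part of the reply above):

```python
import re
from typing import Optional

# Same idea as the pattern above, extended to accept https as well.
TRAILING_DIGITS = re.compile(r"^https?://.*?(\d+)$")


def trailing_number(url: str) -> Optional[int]:
    """Return the number at the end of the URL, or None if there is none."""
    match = TRAILING_DIGITS.match(url)
    if match is None:
        return None
    return int(match.group(1))


print(trailing_number("http://example.com/259"))
print(trailing_number("http://example.com/about"))
```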

clino

2013-11-19 08:54:35 +08:00

I'd suggest installing Kodos, an integrated environment for debugging regular expressions.

lixm

2013-11-19 09:23:22 +08:00

Why use regular expressions on an HTML page instead of parsing it with an XML parser?

yxjxx

2013-11-19 10:01:15 +08:00

@LetFoxRun Oops, that was a typo! Thanks for pointing it out!

xavierskip

2013-11-19 11:21:39 +08:00

http://images.cnblogs.com/cnblogs_com/huxi/Windows-Live-Writer/Python_10A67/pyre_ebb9ce1c-e5e8-4219-a8ae-7ee620d5f9f1.png