Question: filtering HTML tags with preg_replace() - V2EX


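My article bodies contain a lot of HTML tags, like this:

<h1>Title</h1>
<p>Some text</p>
<img xxxx />

I need to extract a summary from the article body. My current approach is to strip the HTML tags first and then take the first n characters.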

$patterns = ['/<.+>/', '/&nbsp;/'];
$replacements = ['', ''];

$text = preg_replace($patterns, $replacements, $html);

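But the result of the replacement is an empty string. Is the pattern matching from the first < all the way to the last >?

How should the regular expression be written?

Or is there another way to extract a summary from the body?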

4 replies • 2016-09-20 23:53:16 +08:00

1

fahai

Sep 19, 2016

1

You can change <.+> to <.*?>, or use strip_tags.
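
To illustrate the difference between the greedy pattern from the question and the lazy pattern suggested in this reply, here is a minimal sketch. The sample string is an assumption modeled on the markup in the question, and the negated character class is an extra variant that nobody in the thread mentioned.

```php
<?php
// Sample input modeled on the markup in the question (hypothetical content).
$html = '<h1>Title</h1><p>Some&nbsp;text</p><img src="x.png" />';

// Greedy: .+ runs from the first < to the last > on the line,
// so the whole string is matched and removed.
$greedy = preg_replace('/<.+>/', '', $html);

// Lazy: .*? stops at the nearest >, so each tag is removed on its own.
$lazy = preg_replace('/<.*?>/', '', $html);

// Variant (not from the thread): a negated character class also stops
// at the nearest > and, unlike the dot, matches across newlines.
$negated = preg_replace('/<[^>]*>/', '', $html);

var_dump($greedy); // ""
var_dump($lazy);   // "TitleSome&nbsp;text"
```

Note that the lazy version leaves &nbsp; in place; that entity still has to be handled separately.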

2

lissome

OP

Sep 19, 2016

strip_tags
http://php.net/manual/en/function.strip-tags.php

I just found that too, thanks!
@fahai

3

Herobs

Sep 19, 2016

- Turn off greedy matching.
- Don't use a regular expression to replace &nbsp;.
- Just use the built-in strip_tags.
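
The three points above could be combined into a summary helper roughly like the following. This is only a sketch: make_excerpt() is a hypothetical name, the input is assumed to be UTF-8, the mbstring extension is assumed to be available, and the whitespace cleanup step is an addition that the thread does not ask for.

```php
<?php
// make_excerpt() is a hypothetical helper name, not something from the thread.
function make_excerpt(string $html, int $length): string
{
    // Remove the tags with the built-in function instead of a regex.
    $text = strip_tags($html);

    // &nbsp; is a fixed string, so a plain string replacement is enough.
    // A space is used here; the original code replaced it with ''.
    $text = str_replace('&nbsp;', ' ', $text);

    // Collapse whitespace left behind by the removed tags.
    // preg_replace() returns null on invalid UTF-8, so fall back to $text.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    // Cut by characters, not bytes, so multibyte text is not split.
    return mb_substr($text, 0, $length, 'UTF-8');
}

$html = "<h1>Title</h1>\n<p>Some&nbsp;text here</p>\n<img src=\"x.png\" />";
echo make_excerpt($html, 10), "\n"; // Title Some
```

Using mb_substr rather than substr matters for Chinese text, where each character takes several bytes in UTF-8.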

4

gouchaoer

Sep 20, 2016 via Android

Use a DOM library for this.
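
The reply does not say which DOM library is meant. One possible reading is PHP's bundled DOMDocument class, sketched below under that assumption. The sample markup is hypothetical, and the prepended meta tag is a common way to tell the parser the input is UTF-8, not something from the thread.

```php
<?php
// Sample input modeled on the markup in the question (hypothetical content).
$html = '<h1>Title</h1><p>Some&nbsp;text</p><img src="x.png" />';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// The meta tag declares the charset so non-ASCII text is decoded as UTF-8.
$doc->loadHTML(
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
);
libxml_clear_errors();

// textContent of <body> is the concatenated text of all nodes inside it,
// with entities such as &nbsp; already decoded.
$body = $doc->getElementsByTagName('body')->item(0);
$text = $body !== null ? $body->textContent : '';

// &nbsp; decodes to U+00A0 (no-break space); turn it into a normal space.
$text = str_replace("\u{00A0}", ' ', $text);

echo $text, "\n"; // TitleSome text
```

Text from adjacent block elements is joined without a separator, so a real summary may need to walk the nodes and insert spaces between them.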
