I want to match only the first occurrence of each of several specific words. Take the following passage (copied from a random source):
Mr. Obama also used his own remarks to try to drive a wedge between Mr. Trump’s campaign and Republican voters. “It wasn’t particularly Republican and it sure wasn’t conservative,” he said of last week’s Republican convention. “There were no serious solutions to pressing problems. Just the fanning of resentments and blame and hate and anger.” The president’s contempt for Mr. Trump took on a personal dimension as well when he recalled his grandparents from Kansas and said, “I don’t know if they had their birth certificates” — a reference to Mr. Trump’s leadership of the so-called birther movement that raised questions about Mr. Obama’s citizenship.
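The regex (Obama|Trump)
matches every Obama and every Trump (the ones in bold). I now want to match only the first occurrence of each word, i.e. the two in bold italics. I spent a long time experimenting with lookbehind and back references but could not work out how to write it. Any guidance would be appreciated.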
1
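old9 OP With a lookahead, for example (Obama|Trump)(?!.*\1), I can match the last occurrence of each word, so I assumed the same idea would let me select the first occurrences. But after switching to lookbehind I found that the back reference cannot be used in the reverse direction, and that is where I am stuck...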
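
For readers who want to reproduce this, here is a minimal sketch in Python (an assumption, since the OP does not name the environment). The sample string is made up for illustration. It shows the lookahead pattern selecting the last occurrence of each word, and that Python's `re` module rejects a variable-width lookbehind, which is why simply mirroring the pattern fails there.

```python
import re

# Made-up sample text, not the passage from the question.
text = "Obama met Trump. Later Trump replied to Obama, and Trump left."

# (?!.*\1) rejects a match when the same word appears again later on,
# so only the last occurrence of each word survives.
last_pattern = re.compile(r"(Obama|Trump)(?!.*\1)", re.S)
last_hits = [(m.group(1), m.start()) for m in last_pattern.finditer(text)]

# The mirrored idea needs a lookbehind of unknown length, which the
# standard re module does not accept.
try:
    re.compile(r"(?<!Obama.*)Obama")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

print(last_hits)
print(lookbehind_supported)
```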
2
imn1 2016-07-28 15:54:20 +08:00
With non-greedy matching you get the first one. Are you using findall?
3
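old9 OP There is no quantifier such as * or + here, so do you mean the global flag? What I need is two matches (the first occurrence of each word), not just the first match overall.
Without the global flag, only the first Obama is matched, and the first Trump cannot be matched.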
4
chairuosen 2016-07-28 16:22:50 +08:00
5
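old9 OP @chairuosen Oh, I see. My environment is not JS, but it is also not .NET, which the post mentions as supporting this feature.
Is there any way to work around it? Can this be done without lookbehind?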
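
One possible workaround that avoids lookbehind entirely, sketched in Python as an assumption about the environment: anchor at the start of the text and use one lookahead per word, each containing a lazy `.*?` followed by a capture group. The lazy scan stops at the first occurrence, and the capture group records its position. The helper name `first_occurrences` is hypothetical.

```python
import re

def first_occurrences(text, words):
    """Return {word: (start, end)} for the first occurrence of each word."""
    # One optional lookahead per word. Each scans lazily from position 0,
    # so its capture group lands on the earliest occurrence of that word.
    lookaheads = "".join(
        f"(?=(?:.*?({re.escape(w)}))?)" for w in words
    )
    m = re.match(lookaheads, text, re.S)
    # Groups for words that never occur stay unset and are skipped.
    return {
        w: m.span(i)
        for i, w in enumerate(words, 1)
        if m.group(i) is not None
    }

sample = "a Trump b Obama c Trump d Obama"
print(first_occurrences(sample, ["Obama", "Trump"]))
```

The match itself is empty; only the capture group spans matter. This relies on the engine keeping captures made inside a positive lookahead, which Python's `re` does, but other engines should be checked before reuse.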
6
arnofeng 2016-07-28 17:14:10 +08:00
Use Python's findall, then take the first item in the list.
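
A minimal sketch of this suggestion, adapted so that it keeps the first hit per word instead of the first hit overall. It uses `re.finditer` in place of `findall` in order to keep positions; the helper name `first_matches` is hypothetical.

```python
import re

def first_matches(text, pattern):
    """Map each distinct matched string to the index of its first match."""
    seen = {}
    for m in re.finditer(pattern, text):
        # setdefault keeps the earliest position and ignores later repeats.
        seen.setdefault(m.group(0), m.start())
    return seen

print(first_matches("Obama Trump Obama Trump", r"Obama|Trump"))
```

The regex stays as simple as the original (Obama|Trump); the "first only" logic moves into a few lines of ordinary code.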
7
chairuosen 2016-07-28 17:16:09 +08:00
@old9 Then just don't use regex...