For example, I have the following URLs (input):
https://www.showcase.com/user/home
https://www.showcase.com/bill/BlKLSJDFLJERSDF
https://www.showcase.com/bill/BSERlKLSSDFEJSDF
https://www.showcase.com/bill/BSDREWRDF
https://www.showcase.com/bill/BSERDWEDFEJSDF # there may be 100+ URLs like this
https://www.showcase.com/bill/BlKLSJDFLJERSDF/detail
https://www.showcase.com/bill/BSERlKLSSDFEJSDF/detail
https://www.showcase.com/bill/BSDREWRDF/detail
https://www.showcase.com/bill/BSERDWEDFEJSDF/detail # there may be 100+ URLs like this
https://www.showcase.com/topic/234566833245234566
https://www.showcase.com/topic/200000234523456683
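https://www.showcase.com/topic/2586683567243w56324 # there may be 100+ URLs like this
# Plus a large number of other URLs. Their patterns are not fixed, so the rules can only be found through statistical analysis.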
They should be classified as (output):
https://www.showcase.com/user/home
https://www.showcase.com/bill/{param}
https://www.showcase.com/bill/{param}/detail
https://www.showcase.com/topic/{param}
So far the only approach I can think of is pattern recognition. Does anyone know of other methods?
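
To make the "statistical analysis" idea concrete, here is a minimal sketch and not a definitive solution. It walks the path position by position and, among URLs that share the same already-templated prefix, replaces a segment with {param} when too many distinct values appear there. The function name `templatize` and the threshold `max_distinct` are hypothetical names chosen for illustration. The sketch assumes each parameterized route really does have many distinct values (the 100+ mentioned above); on a tiny sample a count threshold cannot tell ids apart from route names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def templatize(urls, max_distinct=10):
    """Collapse high-cardinality path segments into {param}.

    max_distinct is an assumed tuning knob: a position with more distinct
    values than this (under the same templated prefix) is treated as a
    parameter.
    """
    rows = []
    for u in urls:
        parts = urlsplit(u)
        host = f"{parts.scheme}://{parts.netloc}"
        rows.append((host, [s for s in parts.path.split("/") if s]))

    # One templated prefix per URL, grown one path position at a time.
    templated = [[] for _ in rows]
    depth = max((len(segs) for _, segs in rows), default=0)

    for i in range(depth):
        # Distinct values seen at position i, keyed by (host, templated prefix).
        seen = defaultdict(set)
        for (host, segs), prefix in zip(rows, templated):
            if i < len(segs):
                seen[(host, tuple(prefix))].add(segs[i])

        for (host, segs), prefix in zip(rows, templated):
            if i < len(segs):
                values = seen[(host, tuple(prefix))]
                prefix.append("{param}" if len(values) > max_distinct else segs[i])

    return sorted({"/".join([host, *prefix])
                   for (host, _), prefix in zip(rows, templated)})
```

Because the grouping key is the templated prefix, all the `/bill/<id>/detail` URLs end up in one group once `<id>` has been collapsed, so `detail` is kept as a literal. The threshold has to be tuned to the data: set it too low and real route names get collapsed, too high and rare ids survive as separate "routes".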
1
Coderuancun 2023-02-03 09:20:11 +08:00
Tokenize them. There are tokenization algorithms for this kind of thing.
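
One possible reading of this suggestion, as a rough sketch only: treat each path segment as a token and decide from its shape whether it looks like a fixed route word or an id. The rule below (short, lowercase, purely alphabetic means literal) is an assumption made for illustration, and `LITERAL` and `template_by_shape` are hypothetical names. It is not a specific tokenization algorithm.

```python
import re
from urllib.parse import urlsplit

# Assumption: fixed route words are short, lowercase, and alphabetic
# (optionally with "_" or "-"); anything else is treated as a parameter.
LITERAL = re.compile(r"[a-z][a-z_-]{0,15}")


def template_by_shape(url):
    """Replace id-looking path tokens with {param}, based on token shape only."""
    parts = urlsplit(url)
    tokens = [t for t in parts.path.split("/") if t]
    shaped = [t if LITERAL.fullmatch(t) else "{param}" for t in tokens]
    return "/".join([f"{parts.scheme}://{parts.netloc}", *shaped])
```

This needs no statistics and works on a single URL, but it misclassifies lowercase ids and route names that contain digits or capitals, so it is probably best used together with frequency counts.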
|
2
acmerliu 2023-02-03 09:21:29 +08:00
Hidden Markov model.
|
3
Jooooooooo 2023-02-03 10:39:28 +08:00
Isn't this just regex?
|
4
34127chi 2023-02-03 13:43:41 +08:00
Isn't this just regex?
|