
Elasticsearch: searching documents that mix several languages

    herosbd1 · Nov 27, 2021 · 4147 views
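    The documents I need to search contain Chinese, English, and a small number of words with diacritics (as in French, German, and other European languages). I want search to apply Chinese word segmentation and to strip both English inflections and diacritics (for example, a query for abandon should match abandoned and Äbandonéd).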

    For Chinese plus English only, an analyzer like this works:
    "analyzer": {
      "optimizeIK": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": [ "stemmer" ]
      }
    }

    For English plus words with diacritics only, an analyzer like this works:
    "analyzer": {
      "optimizeIK": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [ "stemmer", "asciifolding" ]
      }
    }
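
    Token filters run in the order they are listed, and the default English stemmer (Porter-based) expects lowercased input, so a variant that lowercases and folds diacritics before stemming may be worth comparing against the analyzer above. A minimal sketch of index settings, assuming the analyzer name fold_then_stem (hypothetical, not from the original post):

    ```json
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "fold_then_stem": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": ["lowercase", "asciifolding", "stemmer"]
            }
          }
        }
      }
    }
    ```

    With this order, Äbandonéd should become äbandonéd, then abandoned, then abandon, which is the same term a query for abandon produces.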

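    But when all three kinds of text are present, I don't know what to do. I tried the configuration below and found that the asciifolding filter had no effect. Could it be conflicting with ik?
    "analyzer": {
      "optimizeIK": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": [ "stemmer", "asciifolding" ]
      }
    }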
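
    One possible explanation, not confirmed in this thread: token filters such as asciifolding only see what the tokenizer emits, so if ik_max_word splits on or discards accented Latin letters, nothing is left to fold. Character filters run before the tokenizer, so folding at that stage would sidestep the problem. A minimal sketch using the built-in mapping character filter; the names fold_diacritics and mixed_text are hypothetical, and the mapping list is deliberately incomplete and would need one entry per accented character in the corpus:

    ```json
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "fold_diacritics": {
              "type": "mapping",
              "mappings": ["Ä => A", "ä => a", "É => E", "é => e"]
            }
          },
          "analyzer": {
            "mixed_text": {
              "type": "custom",
              "char_filter": ["fold_diacritics"],
              "tokenizer": "ik_max_word",
              "filter": ["lowercase", "stemmer"]
            }
          }
        }
      }
    }
    ```

    Calling the _analyze API on the index with "explain": true and a sample such as "abandoned Äbandonéd 中文分词" lists the output of each stage, which is a quick way to check whether the diacritics survive tokenization.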