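How do I do this aggregation query in ES: group by a key, find the most recent record in each group, and keep only the groups whose "most recent record" has a specific value in its status field?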


The raw data looks roughly like this:

[
  {
    "doc_key": "a",
    "startsAt": "2024-01-29",
    "status": "a"
  },
  {
    "doc_key": "a",
    "startsAt": "2024-01-30",
    "status": "b"
  }
]

For the sample above, the query should return the record with doc_key=a, status=b.

After digging through the docs, this is what I have so far:

{
  "aggs": {
    "unique_doc": {
      "terms": {
        "field": "doc_key" // each doc_key has multiple records
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "pagination": {
          "bucket_sort": {
            "size": 10,
            "from": 0
          }
        }
      }
    }
  }
}

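Problem: the last step, filtering by status, is still missing. Ideally top_hits would accept a filter on the status field, and I could then combine that with a bucket_selector to drop the buckets whose hit count is 0.
Could anyone advise how to do this?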

3 replies • 2024-01-30 10:05:15 +08:00

chana71

Jan 29, 2024

Same as this question: https://stackoverflow.com/questions/36587083/elasticsearch-filter-top-hits-aggregation
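A minimal sketch of one way to get this kind of filtering, not taken from the linked answer: inside each doc_key bucket, compute the newest startsAt over all records and the newest startsAt over status=b records only, then let a bucket_selector keep the bucket only when the two match. The aggregation names (latest_any, only_b, latest_b, keep_if_latest_is_b) are made up for illustration, and the sketch assumes startsAt is mapped as a date and doc_key and status as keyword fields.

```json
{
  "size": 0,
  "aggs": {
    "unique_doc": {
      "terms": { "field": "doc_key", "size": 10 },
      "aggs": {
        // newest startsAt among all records of this doc_key (hypothetical name)
        "latest_any": { "max": { "field": "startsAt" } },
        // newest startsAt among the status=b records only (hypothetical names)
        "only_b": {
          "filter": { "term": { "status": "b" } },
          "aggs": {
            "latest_b": { "max": { "field": "startsAt" } }
          }
        },
        // the latest record itself, returned for the buckets that survive
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "startsAt": { "order": "desc" } } ]
          }
        },
        // keep the bucket only if its newest record is a status=b record
        "keep_if_latest_is_b": {
          "bucket_selector": {
            "buckets_path": {
              "latestAny": "latest_any",
              "latestB": "only_b>latest_b"
            },
            "script": "params.latestAny == params.latestB"
          }
        }
      }
    }
  }
}
```

Two caveats with this sketch: if two records of the same doc_key share the same startsAt but differ in status, the comparison cannot tell them apart; and bucket_selector only prunes the buckets that terms already returned, so the response may contain fewer than "size" buckets.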

akinoowari

Jan 29, 2024

For a single-field case like this, just collapse a and then query status=b
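A rough sketch of what that could look like as a search request, reading "collapse a" as collapsing on the doc_key field (an assumption), and assuming doc_key is a keyword field and startsAt a date:

```json
{
  // assumption: "status=b" is expressed as a term query on a keyword field
  "query": { "term": { "status": "b" } },
  // one hit per doc_key; with the sort below, that hit is the newest one
  "collapse": { "field": "doc_key" },
  "sort": [ { "startsAt": { "order": "desc" } } ],
  "size": 10
}
```

Note that the query filters documents before they are collapsed, so this returns the newest status=b record per doc_key, which is not necessarily that doc_key's newest record overall.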

ijyuqi

Jan 30, 2024

Get the latest record in each group by sorting:
{
  "aggs": {
    "group_by_category": {
      "terms": {
        "field": "doc_key",
        "size": 10
      },
      "aggs": {
        "top_records": {
          "top_hits": {
            "sort": [
              {
                "doc_key": {
                  "order": "desc"
                }
              },
              {
                "startsAt": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}