Distinct count is greater than doc_count in Elasticsearch aggs
I wrote an aggs query for the total (sum) and the unique count, and the result is a little confusing: the unique value is greater than doc_count.
Is that possible?
I know the cardinality aggregation is experimental and can only approximate the count of distinct values.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
But this is a bad result. You can see there are many buckets where unique is larger than doc_count.
Is the problem my request format, or the limits of the cardinality aggregation?
About half a million documents are indexed, with 15 types of eventid. I am using ES 1.4.
Request:
{
  "size": 0,
  "_source": false,
  "aggs": {
    "eventids": {
      "terms": { "field": "_eventid_", "size": 0 },
      "aggs": {
        "unique": { "cardinality": { "field": "uuid" } }
      }
    }
  }
}
Response:
{
  "took": 383,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 550971, "max_score": 0, "hits": [ ] },
  "aggregations": {
    "eventids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "red", "doc_count": 165110, "unique": { "value": 27423 } },
        { "key": "blue", "doc_count": 108376, "unique": { "value": 94775 } },
        { "key": "yellow", "doc_count": 78919, "unique": { "value": 70094 } },
        { "key": "green", "doc_count": 60580, "unique": { "value": 78945 } },
        { "key": "black", "doc_count": 49923, "unique": { "value": 56200 } },
        { "key": "white", "doc_count": 38744, "unique": { "value": 45229 } },
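To make the problem concrete, here is a small Python check over the buckets shown in the response (only the ones visible above). It assumes each document has at most one uuid value; if uuid were multi-valued, a bucket could legitimately have more distinct values than documents.

```python
# Bucket values copied from the response above (only the buckets shown there).
buckets = [
    {"key": "red", "doc_count": 165110, "unique": {"value": 27423}},
    {"key": "blue", "doc_count": 108376, "unique": {"value": 94775}},
    {"key": "yellow", "doc_count": 78919, "unique": {"value": 70094}},
    {"key": "green", "doc_count": 60580, "unique": {"value": 78945}},
    {"key": "black", "doc_count": 49923, "unique": {"value": 56200}},
    {"key": "white", "doc_count": 38744, "unique": {"value": 45229}},
]


def impossible_buckets(buckets):
    """Return (key, doc_count, unique) for buckets whose distinct count exceeds doc_count.

    Assuming uuid is single-valued, a bucket cannot hold more distinct
    uuids than documents, so every hit here is an estimation error.
    """
    return [
        (b["key"], b["doc_count"], b["unique"]["value"])
        for b in buckets
        if b["unique"]["value"] > b["doc_count"]
    ]


for key, docs, unique in impossible_buckets(buckets):
    print(f"{key}: unique={unique} > doc_count={docs} (+{unique - docs})")
```

Three of the six visible buckets (green, black and white) overshoot their doc_count, green by more than 18,000.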
Edit: more tests.
I tried again with a precision_threshold of 10,000, filtering on a single eventid.
The error is the same. I expected a cardinality below 30,000, but got about 66,000 (greater than the total number of documents).
doc_count: 65,672 (no problem, this is right)
cardinality: 66,037 (greater than doc_count)
actual cardinality: 23,000 (calculated with RDBMS scripts)
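To put the single-eventid figures in perspective, a short Python calculation using only the numbers reported above (the 23,000 reference count is the one from the RDBMS scripts):

```python
# Figures reported above for the single-eventid test ("packdownload").
doc_count = 65_672
estimated = 66_037   # cardinality aggregation result
actual = 23_000      # reference count from the RDBMS scripts


def relative_error(estimate, truth):
    """Signed relative error of an estimate against a reference value."""
    return (estimate - truth) / truth


print(f"estimate exceeds doc_count by {estimated - doc_count}")
print(f"relative error vs. actual: {relative_error(estimated, actual):+.0%}")
```

The estimate is only 365 above doc_count, but it is roughly 187% above the actual distinct count, i.e. almost three times too high.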
Request:
{
  "size": 0,
  "_source": false,
  "query": {
    "term": { "_eventid_": "packdownload" }
  },
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "uuid",
        "precision_threshold": 10000
      }
    }
  }
}
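If the filtered request is built from code, it may help to cap precision_threshold yourself, since the Elasticsearch documentation lists 40,000 as the maximum supported value (higher values behave like 40,000). A minimal Python sketch; `cardinality_request` is a hypothetical helper that only builds the JSON body shown above and does not send it anywhere:

```python
import json

# Documented maximum for precision_threshold; larger values act like 40,000.
MAX_PRECISION_THRESHOLD = 40_000


def cardinality_request(event_id, field="uuid", precision_threshold=MAX_PRECISION_THRESHOLD):
    """Hypothetical helper: build the filtered cardinality request body."""
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,
        "_source": False,
        "query": {"term": {"_eventid_": event_id}},
        "aggs": {
            "unique": {
                "cardinality": {"field": field, "precision_threshold": threshold}
            }
        },
    }


# Asking for 100,000 yields a body with the capped value of 40,000.
print(json.dumps(cardinality_request("packdownload", precision_threshold=100_000), indent=2))
```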
Response:
{
  "took": 28,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 65672, "max_score": 0, "hits": [] },
  "aggregations": {
    "unique": { "value": 66037 }
  }
}
The highest value for precision_threshold is 40,000. Raising it should improve the results, but with a large number of distinct values there can still be an error of about ±20%. With fewer distinct values it happens less.
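The Elasticsearch documentation describes the cardinality aggregation as based on the HyperLogLog++ algorithm. To show why such a distinct count is an estimate rather than an exact number, here is a minimal plain HyperLogLog sketch in Python. It is a simplified illustration, not Elasticsearch's implementation: the blake2b hash, the register count `p=14` and the `uuid-{i}` test values are all assumptions chosen for the example.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    # 64-bit hash via blake2b, chosen only for convenience in this sketch.
    return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=8).digest(), "big")


class HyperLogLog:
    """Plain HyperLogLog with linear-counting correction for small ranges."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                       # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1) # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting
        return raw


# Usage: 23,000 distinct values, each added twice (duplicates do not change the sketch).
true_count = 23_000
hll = HyperLogLog(p=14)
for i in range(true_count):
    hll.add(f"uuid-{i}")
    hll.add(f"uuid-{i}")
est = hll.estimate()
print(round(est), f"{(est - true_count) / true_count:+.2%}")
```

With 2^14 registers the typical relative error of plain HyperLogLog is about 1.04/√m ≈ 0.8%, and the estimate can land on either side of the true count. That explains how a bucket whose values are almost all distinct can show a unique value slightly above doc_count, but an error of that typical size would not explain an estimate nearly three times the actual count.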