Distinct count is greater than doc_count in Elasticsearch aggs


I wrote an aggs query that returns a total (sum) and a unique count. The result confused me a little.

The unique value is greater than doc_count.
Is that possible?

I know the cardinality aggregation is experimental and only gives an approximate count of distinct values.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

But this is a bad result. As you can see, there are many buckets where unique is larger than doc_count.
Is the problem in my request format, or is it a limit of the cardinality aggregation?

About half a million documents are indexed, and there are 15 types of eventid.
I am using ES 1.4.

request

{
    "size": 0,
    "_source": false,
    "aggs": {
        "eventids": {
            "terms": {
                "field": "_eventid_",
                "size": 0
            },
            "aggs": {
                "unique": {
                    "cardinality": {
                        "field": "uuid"
                    }
                }
            }
        }
    }
}

response

{
    "took": 383,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 550971,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "eventids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 165110,
                    "unique": {
                        "value": 27423
                    }
                },
                {
                    "key": "blue",
                    "doc_count": 108376,
                    "unique": {
                        "value": 94775
                    }
                },
                {
                    "key": "yellow",
                    "doc_count": 78919,
                    "unique": {
                        "value": 70094
                    }
                },
                {
                    "key": "green",
                    "doc_count": 60580,
                    "unique": {
                        "value": 78945
                    }
                },
                {
                    "key": "black",
                    "doc_count": 49923,
                    "unique": {
                        "value": 56200
                    }
                },
                {
                    "key": "white",
                    "doc_count": 38744,
                    "unique": {
                        "value": 45229
                    }
                },
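To see how many of the returned buckets are affected, each bucket's unique value can be compared with its doc_count. The following is only a rough sketch in Python using the six buckets visible in the response above; the helper name suspicious_buckets is made up for this illustration, and the comparison is only meaningful if uuid holds one non-analyzed value per document, which is an assumption and not something the post confirms.

```python
# Buckets copied from the response above (only the six that are shown).
buckets = [
    {"key": "red", "doc_count": 165110, "unique": {"value": 27423}},
    {"key": "blue", "doc_count": 108376, "unique": {"value": 94775}},
    {"key": "yellow", "doc_count": 78919, "unique": {"value": 70094}},
    {"key": "green", "doc_count": 60580, "unique": {"value": 78945}},
    {"key": "black", "doc_count": 49923, "unique": {"value": 56200}},
    {"key": "white", "doc_count": 38744, "unique": {"value": 45229}},
]


def suspicious_buckets(buckets):
    """Return the keys of buckets whose distinct estimate exceeds doc_count.

    Hypothetical helper: assumes one uuid value per document, so the true
    distinct count can never be larger than the number of documents.
    """
    return [b["key"] for b in buckets if b["unique"]["value"] > b["doc_count"]]


flagged = suspicious_buckets(buckets)
print(flagged)
```

With these six buckets, green, black and white are the ones where the estimate is larger than the document count.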

Edit: more tests

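I tried once again with a precision_threshold of 10,000 (the value in the request below), filtered to 1 eventid.
The error in the result is the same. The cardinality was expected to be less than 30,000, but it came out at about 66,000 (greater than the total number of documents).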

doc_count: 65,672 (no problem, this is right)
cardinality: 66,037 (greater than doc_count)
actual cardinality: 23,000 (calculated with RDBMS scripts...)
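To put a number on how far off the estimate is, the relative error can be worked out from the three figures above. This is just a small Python sketch of the arithmetic; the function name relative_error is made up here, and 23,000 is the rounded figure from the RDBMS calculation, so the result is approximate.

```python
def relative_error(estimate, actual):
    """Relative error of an estimate against the true value, as a fraction."""
    return (estimate - actual) / actual


doc_count = 65672  # documents matching the filter
estimate = 66037   # value returned by the cardinality aggregation
actual = 23000     # rounded figure from the RDBMS calculation

error = relative_error(estimate, actual)
print(f"relative error: {error:.0%}")
print("estimate exceeds doc_count:", estimate > doc_count)
```

The estimate is about 187% above the actual value, and it is also 365 higher than the number of matching documents.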

request

{
    "size": 0,
    "_source": false,
    "query": {
        "term": {
            "_eventid_": "packdownload"
        }
    },
    "aggs": {
        "unique": {
            "cardinality": {
                "field": "uuid",
                "precision_threshold": 10000
            }
        }
    }
}

response

{
    "took": 28,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 65672,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "unique": {
            "value": 66037
        }
    }
}

The highest value for precision_threshold is 40,000. Raising it should improve your results, but with a big count of distinct values there might still be an error of plus or minus 20%. This also happens with lesser values.
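As a rough illustration of trying the highest threshold, the filtered request from the question can be built with precision_threshold capped at 40,000. This is a sketch only: the names build_cardinality_request and MAX_PRECISION_THRESHOLD are made up for this example, the cap is taken from the statement above, and the body is only printed as JSON here rather than sent to a cluster.

```python
import json

# Upper limit as stated in the text above; treated here as an assumption.
MAX_PRECISION_THRESHOLD = 40000


def build_cardinality_request(event_id, threshold):
    """Build the filtered cardinality request body with a capped threshold.

    Hypothetical helper: mirrors the request shown in the question and
    only changes the precision_threshold value.
    """
    return {
        "size": 0,
        "_source": False,
        "query": {"term": {"_eventid_": event_id}},
        "aggs": {
            "unique": {
                "cardinality": {
                    "field": "uuid",
                    "precision_threshold": min(threshold, MAX_PRECISION_THRESHOLD),
                }
            }
        },
    }


# Ask for more than the limit to show that the value is capped.
body = build_cardinality_request("packdownload", 100000)
print(json.dumps(body, indent=4))
```

The printed body can be posted to the same search endpoint as the original request, and the returned value compared against doc_count again.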

