Elasticsearch query performance
I'm using Elasticsearch to index two types of objects.
Data details
Contract object: ~60 properties (object size: 120 bytes)
Risk item object: ~125 properties (object size: 250 bytes)
Contract is the parent of risk item (_parent).
I'm storing 240 million such objects in a single index (210 million risk items, 30 million contracts).
Index size: 322 GB
Cluster details
11 m2.4xlarge EC2 boxes [68 GB memory, 1.6 TB storage, 8 cores] (1 box is a load balancer node with node.data = false)
50 shards
1 replica
elasticsearch.yml
node.data: true
http.enabled: false
index.number_of_shards: 50
index.number_of_replicas: 1
index.translog.flush_threshold_ops: 10000
index.merge.policy.use_compound_files: false
indices.memory.index_buffer_size: 30%
index.refresh_interval: 30s
index.store.type: mmapfs
path.data: /data-xvdf,/data-xvdg
I'm starting the Elasticsearch nodes with the following command:
/home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g
My problem: I'm running the following query on the risk item type, and it takes 10-15 seconds to return data for 20 records.
I'm running a load of 50 concurrent users, with a bulk index load of 5000 risk items happening in parallel.
Query (with parent/child join)
http://:9200/contractindex/riskitem/_search
{
  "query": {
    "has_parent": {
      "parent_type": "contract",
      "query": {
        "range": {
          "contractdate": { "gte": "2010-01-01" }
        }
      }
    }
  },
  "filter": {
    "and": [
      {
        "query": {
          "bool": {
            "must": [
              { "query_string": { "fields": ["riskitemproperty1"], "query": "abc" } },
              { "query_string": { "fields": ["riskitemproperty2"], "query": "xyz" } }
            ]
          }
        }
      }
    ]
  }
}
Queries on one table
Query1 (this query takes around 8 seconds)
{
  "query": {
    "constant_score": {
      "filter": {
        "and": [
          { "term": { "commoncharacteristic_buildingscheme": "buildingscheme1" } },
          { "term": { "address_admin2name": "admin2name1" } }
        ]
      }
    }
  }
}
Query2 (this query takes around 6.5 seconds for the top 10 records; it has a sort on top of it)
{
  "query": {
    "constant_score": {
      "filter": {
        "and": [
          { "term": { "insurer": "insurer1" } },
          { "term": { "status": "status1" } }
        ]
      }
    }
  }
}
Can someone please tell me how I can improve query performance?
Have you tried custom routing? Without custom routing, your query has to look in all 50 shards to serve a request. With custom routing, the query knows which shards to search, which makes queries more performant.
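As a rough illustration of the search side, the sketch below builds a `_search` URL that passes a `routing` query parameter, so that only the shards holding that routing value are searched. The host `localhost` and the routing value `insurer1` are assumptions made for the example (the question does not say which field would make a good routing key); the index and type names are taken from the question.

```python
from urllib.parse import urlencode


def routed_search_url(host, index, doc_type, routing_values):
    """Build a _search URL limited to the shards for the given routing values.

    Multiple routing values are sent as one comma-separated parameter.
    """
    params = urlencode({"routing": ",".join(routing_values)})
    return "http://{}:9200/{}/{}/_search?{}".format(host, index, doc_type, params)


# Hypothetical host and routing value; index/type names come from the question.
url = routed_search_url("localhost", "contractindex", "riskitem", ["insurer1"])
print(url)
```

The same routing value has to be used at index time, otherwise the search will look in the wrong shard. With parent/child documents, a risk item must also end up on the same shard as its contract, so the routing key would need to be shared by both.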
You can assign custom routing to each bulk item by including the routing value in the _routing field, as described in the bulk API docs.
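A minimal sketch of what such a bulk body could look like, assuming each risk item is indexed with its parent contract ID and a routing value. The document IDs (`r1`, `c1`), the routing value `insurer1`, and the helper name `bulk_lines` are hypothetical; only the `_routing` and `_parent` metadata fields and the index/type names come from the discussion above.

```python
import json


def bulk_lines(index, doc_type, items):
    """Yield newline-delimited JSON for the bulk API.

    Each item is (doc_id, parent_id, routing, source) and produces
    one action line followed by one source line.
    """
    for doc_id, parent_id, routing, source in items:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": doc_id,
                "_parent": parent_id,   # parent contract ID
                "_routing": routing,    # custom routing value
            }
        }
        yield json.dumps(action, sort_keys=True)
        yield json.dumps(source, sort_keys=True)


# Hypothetical IDs and routing value, for illustration only.
items = [("r1", "c1", "insurer1", {"insurer": "insurer1", "status": "status1"})]
# The bulk API expects the body to end with a newline.
body = "\n".join(bulk_lines("contractindex", "riskitem", items)) + "\n"
print(body)
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint; the HTTP call itself is left out to keep the sketch self-contained.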