Elasticsearch query performance -


i'm using elasticsearch index 2 types of objects -

data details

contract object ~ 60 properties (object size - 120 bytes) risk item object ~ 125 properties (object size - 250 bytes)

contract parent of risk item (_parent)

i'm storing 240 million such objects in single index (210 million risk items, 30 million contracts)

index size - 322 gb

cluster details

11 m2.4x.large ec2 boxes [68 gb memory, 1.6 tb storage, 8 cores](1 box load balancer node node.data = false) 50 shards 1 replica

elasticsearch.yml

node.data: true http.enabled: false index.number_of_shards: 50 index.number_of_replicas: 1 index.translog.flush_threshold_ops: 10000 index.merge.policy.use_compound_files: false indices.memory.index_buffer_size: 30% index.refresh_interval: 30s index.store.type: mmapfs path.data: /data-xvdf,/data-xvdg 

i'm starting elasticsearch nodes following command - /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -xms30g -xmx30g

my problem i'm running following query on risk item type , taking 10-15 seconds return data, 20 records.

i'm running load of 50 concurrent users , bulk index load of 5000 risk items happening in parallel.

query (with join parent child)

http://:9200/contractindex/riskitem/_search*

{     "query": {         "has_parent": {             "parent_type": "contract",             "query": {                 "range": {                     "contractdate": {                         "gte": "2010-01-01"                     }                 }             }         }     },     "filter": {         "and": [{             "query": {                 "bool": {                     "must": [{                         "query_string": {                             "fields": ["riskitemproperty1"],                             "query": "abc"                         }                     },                     {                         "query_string": {                             "fields": ["riskitemproperty2"],                             "query": "xyz"                         }                     }]                 }             }         }]     } } 

queries 1 table

query1 (this query takes around 8 seconds.)

 <!-- language: lang-json -->      {         "query": {             "constant_score": {                 "filter": {                     "and": [{                         "term": {                             "commoncharacteristic_buildingscheme": "buildingscheme1"                         }                     },                     {                         "term": {                             "address_admin2name": "admin2name1"                         }                     }]                 }             }         }     }    **query2** (this query takes around 6.5 seconds top 10 records ( has sort on top of it)   <!-- language: lang-json -->      {         "query": {             "constant_score": {                 "filter": {                     "and": [{                         "term": {                             "insurer": "insurer1"                         }                     },                     {                         "term": {                             "status": "status1"                         }                     }]                 }             }         }     } 

can please me how can improve query performance ?

have tried custom routing? without custom routing, query needs in 50 shards request. custom routing, query knows shards search, making queries more performant. more here.

you can assign custom routing each bulk item including routing value _routing field, described in bulk api docs.


Comments

Popular posts from this blog

c# - Send Image in Json : 400 Bad request -

jquery - Fancybox - apply a function to several elements -

An easy way to program an Android keyboard layout app -