Elasticsearch Cheat Sheet

1. Introduction

  • Data is stored in a schema-less JSON docmuents -> You do not need to define fieds and types before your insert data
  • Near-real-time since it’s a cluster. An update or an insert must be propagated througout the cluster
  • Written in Java -> cross platform

We communicate with ES via its REST API (curl)

curl -X GET http://localhost:9200/person/employee/123

2. Terminology

2.1 Index

  • A collection of documents (product, account, movie). Each of these elements are a type .
  • Similar to a database within a RDBMS
  • Identified by names (lowercase)
  • Can define as many indexes you want (but most peoaple will have a few of them)

2.2 Type

  • Represents a class/category of simlar documents (product, account, movie)
  • Consists of a name and a mapping (explained later)
  • Similar to a table within a RDBMS
  • An index can have one or more types defined, each with their own mapping
  • stored within a metadata field named _type -> Searching for specific documents types applies a filter on this field

2.3 Mapping

  • Similar to a database schema for a table in RDBMS
  • Describes the data type of fields that a document of a given type may have + information on how fields should be indexed and stored
  • Defining a mapping is optional (Dynamic mapping)

2.4 Document

  • A basic unit of information that can be indexed
  • Consists of fields, which are key/value pairs. A value can be a string, date, object…
  • Corresponds to an object in OOP
  • Documents are expressed in JSON
  • Similar to a raw in RDBMS

An index contains documents which have types. Types are defined by mappings.

2.5 Shards

  • An index can be devided intpo multiple pieces called shards -> Useful when an index contains more data than the hardwware of a node can store
  • A shard is a fully functional and independant index
  • The number of shards can be specified when creating an index (default = 5)
  • Allows to scale horizontally
  • Allows to distribute and parallelize operations across shards -> Increaes performance

2.6 Replicas

  • A replica is a copy of a shard (default = 1 )
  • Provides High Availability in case a shard or node fails
  • Allows scaling search volume, because search queries can be executed on all replicas

3. Creating/Deleting an Index

List all indexes

curl -X GET  http://localhost:9200/_cat/indices?v

Create in index

curl -X PUT http://localhost:9200/ecommerce -d '{ }'

Note: If you insert a document without defining an index, ES will automatically create an index.

Delete in index”

curl -X DELETE http://localhost:9200/ecommerce

Adding mappings

curl -X PUT http://localhost:9200/ecommerce -d '
{
	"mappings": {
		"product": {
			"properties": {
				"name": {
					"type": "string"
				},
				"price": {
					"type": "double"
				},
				"description": {
					"type": "string"	
				},
				"status": {
					"type": "string" 
				},
				"quantity": {
					"type": "integer"
				},
				"categories": {
					"type": "nested",
					"properties": {
						"name": {
							"type": "string"
						}
					}
				},
				"tags": {
					"type": "string"
				}
			}
		}
		
	}
}'

Note: We cannot add mappings to existing Data.

4. Documents

4.1 Adding documents

curl -X POST http://localhost:9200/ecommerce/product/1001 -d '
{
	"name": "Zend framework",
	"price": 30.00,
	"description": "Learn Zend framwork infew hours",
	"status": "active",
	"quantity": 1,
	"categories": [
		{ "name": "Software"}
	],
	"tags": ["zendframework", "php", "progeamming", "zd2"]
}' 

Note: Providing an ID is optional. If not provided ES will generate an ID

4.2 Replacing documents

Replacing documents is done with the same request as adding a document when specifying the ID.

curl -X POST http://localhost:9200/ecommerce/product/1001 -d '
{
	"name": "Zend framework 2",
	"price": **40.00**,
	"description": "Learn Zend framwork infew hours",
	"status": "active",
	"quantity": 1,
	"categories": [
		{ "name": "Software"}
	],
	"tags": ["zendframework", "php", "progeamming", "zd2"]
}' 

4.3 Updating documents

Updating a doucment lets you add/remove or modify a single field without providing all the information as when we replaced the document.

curl -X POST http://localhost:9200/ecommerce/product/1001/_update -d '
{
	"doc": {
		"price": 50.00
	}
}' 

4.4 Deleting documents

curl -X DELETE http://localhost:9200/ecommerce/product/1001

Note: Basically, we may only delete documents by ID but there is plugin “DeleteByQuery” that lets you delete by query.

4.5 Requesting a document

curl http://localhost:9200/ecommerce/product/1003

5. Batch processing

Batch processing with bulk limits the amount of network overhead as it will need a unique network round trip.

You need to edit a file with the content of your bulk : vi ./requests

{"index":{"_id":"1002"}}
{"name": "Zendtest framework","price": 40.00,"description": "Leran Zend framwork infew hours","status": "active","quantity": 1,"categories": [{ "name": "Software"}],"tags": ["zendframework", "php", "progeamming", "zd2"]}
{"index":{"_id":"1003"}}
{"name": "Zendtest2 framework","price": 40.00,"description": "Leran Zend framwork infew hours","status": "active","quantity": 1,"categories": [{ "name": "Software"}],"tags": ["zendframework", "php", "progeamming", "zd2"]}

And then call the _bulk API with reference to your file

curl -X POST http://localhost:9200/ecommerce/product/_bulk --data-binary "@requests"

You may also DELETE or UPDATE documents using the _bulk API

{ "delete":{"_id":"1002" } }
{ "update":{"_id":"1003" } }
{ "doc": { "quantity" : 33 } }

Note: If you need to bulk actions on several type or indexes you may omit them in the API call and specify them in your jsons

curl -X POST http://localhost:9200/_bulk --data-binary "@requests"
{ "update":{"_id":"1003", "_index" : "ecommerce", "_type" : "product" } }
{ "doc": { "quantity" : 33 } }

Note: If an action fails, the remaining actions will still be executed. We have an action in the returned json wich lets us identify the errors.

6. Searching with Elastic

6.1. Relevancy & Scoring

  • A score is calculated for each documents that matches a query (This higher the score, the more relevant the document is)
  • Queries in query context affect the scores of matching documents (How well does the document match ?)
  • Queries in filter context do not affect the scores of matching documents (Does the docuiment match ?)

6.2. Query String

  • Used for simple queries.

All fields are searched.

curl -X POST http://localhost:9200/ecommerce/product/_search?q=pasta

You may specify a field

curl -X POST http://localhost:9200/ecommerce/product/_search?q=name:pasta

6.3. Query DSL

  • Search by defining queries within the request body JSON
  • Supports more features than the query string approach
curl -X POST http://localhost:9200/ecommerce/product/_search -d '
{
	"query": {
    	"match": {
        	"name": zend
        }
    }    
}' 

You may add some logical operators

curl -X POST http://localhost:9200/ecommerce/product/_search -d '
{
	"query": {
    	"query_string": {
        	"query": "(name:Zendtest2 AND description:Learn)"
            }
        }
}' 

Adding a “+” or a “-“ before a statement means that the word must or must not be present

curl -X POST http://localhost:9200/ecommerce/product/_search -d '
{
	"query": {
    	"query_string": {
        	"query": "+name:Zendtest2 +name:Zendtest2"
            }
    }
}'

More query types : https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

comments powered by Disqus