Elasticsearch - Basic Setup, indexing and searching

Elasticsearch

Elasticsearch is an open-source search and analytics engine built on Apache Lucene. It is the centre of the Elastic Stack, which also includes Kibana, Logstash, and Beats such as Metricbeat and Filebeat.

Installation

You can install Elasticsearch in several ways:

  • Using a package manager - deb, rpm
  • Using binaries, exe or msi (Windows), brew (macOS)

Please go through this link and configure as per your need.

I recommend Docker for local development and learning. I find it cleaner and easier to maintain.

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.1

Docker pulls the Elasticsearch 7.10.1 image from Elastic's registry (docker.elastic.co) and starts a single-node cluster. This setup is not recommended for production: a production cluster needs multiple nodes so that shards and their replicas can be spread across machines.
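
To confirm the container is up, you can request the node's root URL, which returns a small JSON document with the cluster name and version. The sketch below uses only the Python standard library. The function name `cluster_info` and its `opener` parameter are my own additions for illustration, and it assumes the node listens on http://localhost:9200 without authentication, as in the docker command above.

```python
import json
import urllib.request


def cluster_info(base_url="http://localhost:9200", opener=urllib.request.urlopen):
    """Fetch and decode the JSON document served at the node's root URL.

    `opener` is a hypothetical hook so the function can be exercised
    without a running node; by default it is urllib's urlopen.
    """
    with opener(base_url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the container has finished starting, `cluster_info()["version"]["number"]` gives the version of the running node.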

Elasticsearch Terminology

  • Indexing : storing (persisting) documents in Elasticsearch so that they become searchable.
  • Replication : keeping duplicate copies (replicas) of shards on other nodes, which makes ES fault tolerant.
  • Sharding : splitting a large dataset (index) into smaller pieces (shards), which speeds up indexing and lets searches run in parallel.
  • Nodes : data nodes store the data, and master nodes manage the cluster. Use compute-optimised instances for master nodes and memory/IO-optimised instances for data nodes.
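
To make sharding and replication more concrete, here is a simplified sketch of how a document id can be mapped to a shard and how many shard copies a cluster ends up holding. It is an illustration only: Elasticsearch hashes the routing value (the `_id` by default) with Murmur3, while this sketch uses `zlib.crc32` as a stand-in hash, and the function names are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document id to a primary shard number (simplified).

    crc32 is only a stand-in for the hash Elasticsearch really uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus one full copy of every primary per replica."""
    return number_of_shards * (1 + number_of_replicas)


for doc_id in ("1", "2", "3"):
    # The same id always lands on the same shard, which is how a get-by-id
    # request knows where to look.
    print(doc_id, "-> shard", pick_shard(doc_id, 3))

print(total_shard_copies(number_of_shards=3, number_of_replicas=1))  # 6
```

Because the shard number depends on the number of primary shards, that setting cannot simply be changed on an existing index without reindexing or using the dedicated split/shrink operations.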

Index Creation

An index in Elasticsearch is a logical storage unit, roughly comparable to a database in an RDBMS.

For more info : elastic.co/blog/what-is-an-elasticsearch-in..

{
  "settings": {
    "index": {
      "number_of_shards": 1, // rule of thumb: keep this less than or equal to the number of data nodes
      "number_of_replicas": 0  // single-node ES: a replica is never placed on the same node as its primary, so use 0
    }
  }
}
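
The same settings can be produced from Python. The sketch below follows the reasoning in the comments above: no replicas on a single node, one replica otherwise. The helper name `build_index_settings` and the "one replica when there is more than one data node" default are my own assumptions, not an official recommendation.

```python
import json


def build_index_settings(data_nodes: int = 1, number_of_shards: int = 1) -> dict:
    """Build an index-creation body for a cluster with `data_nodes` data nodes."""
    if data_nodes < 1 or number_of_shards < 1:
        raise ValueError("data_nodes and number_of_shards must be at least 1")
    # A replica cannot be allocated on the node that holds its primary,
    # so a single-node cluster gets zero replicas.
    number_of_replicas = 1 if data_nodes > 1 else 0
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


print(json.dumps(build_index_settings(data_nodes=1), indent=2))
```

The returned dictionary has the same shape as the JSON above and can be passed as the `body` argument when creating the index.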

Check out this project in my GitHub repo : github.com/sagar-rout/introduction-elastics..

# If the index already exists and you try to create one with the same name,
# Elasticsearch rejects the request and the client raises:
# raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
# elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [favourite-foods/379Q_MsQS1yqPohqXylDPg] already exists')

from elasticsearch.exceptions import RequestError

try:
    es.indices.create(index=INDEX_NAME, body=index_settings)
except RequestError:
    # RequestError covers HTTP 400 responses such as resource_already_exists_exception
    print('Indices is already there.')
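
An alternative to catching the exception is to check for the index first. The sketch below assumes the 7.x Python client, where `es.indices.exists(index=...)` and `es.indices.create(index=..., body=...)` are available; the helper name `ensure_index` is hypothetical. Note that check-then-create is not atomic, so two processes racing each other can still hit the 400 error.

```python
def ensure_index(es, index_name: str, index_settings: dict) -> bool:
    """Create the index only if it is missing.

    `es` is expected to be an Elasticsearch client instance.
    Returns True when the index was created, False when it already existed.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, body=index_settings)
    return True
```

With a client created as `es = Elasticsearch()`, calling `ensure_index(es, 'favourite-foods', index_settings)` can be repeated safely on every run of a script.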

Mapping

  • Dynamic mapping : Elasticsearch infers the mapping (the field types) from the first document in which each field appears.

  • Custom/static mapping : the user defines the mapping, and Elasticsearch stores it before any data is indexed.
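
As an illustration of a custom mapping, here is a sketch of an index-creation body for documents with `user_name`, `text` and `timestamp` fields. The field types chosen here (`keyword`, `text`, `date`) are my assumption about what suits such data; the body uses the typeless `mappings.properties` layout of Elasticsearch 7.x.

```python
import json

# Index-creation body with an explicit mapping (field types are illustrative).
index_body = {
    "settings": {"index": {"number_of_shards": 1, "number_of_replicas": 0}},
    "mappings": {
        "properties": {
            "user_name": {"type": "keyword"},  # exact-match values, not analysed
            "text": {"type": "text"},          # analysed for full-text search
            "timestamp": {"type": "date"},     # ISO 8601 strings are parsed as dates
        }
    },
}

print(json.dumps(index_body, indent=2))
```

Passing a body like this to `es.indices.create` instead of a settings-only body fixes the field types up front. Under dynamic mapping, a string field would instead become a `text` field with a `keyword` sub-field.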

from datetime import datetime
import json

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import RequestError

# With no arguments the client connects to the default address, http://localhost:9200
es = Elasticsearch()

INDEX_NAME = 'favourite-foods'

# Settings used to create the index in Elasticsearch

index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
}

# If the index already exists and you try to create one with the same name,
# Elasticsearch rejects the request and the client raises:
# raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
# elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [favourite-foods/379Q_MsQS1yqPohqXylDPg] already exists')

try:
    es.indices.create(index=INDEX_NAME, body=index_settings)
except RequestError:
    # RequestError covers HTTP 400 responses such as resource_already_exists_exception
    print('Indices is already there.')

# Documents to index (store)
sagar_doc = {
    'user_name': 'Sagar',
    'text': 'banana bread and espresso',
    'timestamp': datetime.now()
}

divya_doc = {
    'user_name': 'Divya',
    'text': 'Cheese cake and cappuccino',
    'timestamp': datetime.now()
}

geetika_doc = {
    'user_name': 'Geetika',
    'text': 'Tea',
    'timestamp': datetime.now()
}

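# Index the documents in Elasticsearch.
# Avoid custom ids unless you need to look documents up by id: with an explicit id,
# Elasticsearch must first check whether that id already exists, which slows down indexing.
sagar_res = es.index(index=INDEX_NAME, id=1, body=sagar_doc)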
divya_res = es.index(index=INDEX_NAME, id=2, body=divya_doc)
geetika_res = es.index(index=INDEX_NAME, id=3, body=geetika_doc)

# Search the documents with a full-text match query on the 'text' field
print('Search the document using filter')
body = {"query": {"match": {"text": "banana"}}}
print('Who likes banana bread ? ')

searched_data = es.search(index=INDEX_NAME, body=body)
print(json.dumps(searched_data))

# Search the document using _id
print('Search the document using _id')
document_by_id = es.get(index=INDEX_NAME, id=2)
print(json.dumps(document_by_id))

The script creates the favourite-foods index, indexes three documents, and then searches them by text and by _id.
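
The search in the script is a scored `match` query. To combine it with a real filter (an exact, unscored condition), a `bool` query can be used. The sketch below is a hypothetical helper, `build_food_query`, and it assumes the index was created by the script with dynamic mapping, in which case string fields such as `user_name` get a `user_name.keyword` sub-field for exact matching.

```python
from typing import Optional


def build_food_query(food: str, user_name: Optional[str] = None) -> dict:
    """Build a search body: full-text match on 'text', optional exact filter on user."""
    bool_query = {"must": [{"match": {"text": food}}]}
    if user_name is not None:
        # Filter clauses restrict the results without affecting the score.
        bool_query["filter"] = [{"term": {"user_name.keyword": user_name}}]
    return {"query": {"bool": bool_query}}


print(build_food_query("banana", user_name="Sagar"))
```

The result can be passed to the search call in the same way as the `body` dictionary in the script, for example `es.search(index=INDEX_NAME, body=build_food_query("banana", "Sagar"))`.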

Execution Result :

➜  introduction-elasticsearch git:(main) source venv/bin/activate
(venv) ➜  introduction-elasticsearch git:(main) python demo-elasticsearch.py 
Indices is already there.
Search the document using filter
Who likes banana bread ? 
{"took": 2, "timed_out": false, "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}, "hits": {"total": {"value": 1, "relation": "eq"}, "max_score": 0.93969977, "hits": [{"_index": "favourite-foods", "_type": "_doc", "_id": "1", "_score": 0.93969977, "_source": {"user_name": "Sagar", "text": "banana bread and espresso", "timestamp": "2020-12-13T21:12:07.470811"}}]}}
Search the document using _id
{"_index": "favourite-foods", "_type": "_doc", "_id": "2", "_version": 9, "_seq_no": 24, "_primary_term": 1, "found": true, "_source": {"user_name": "Divya", "text": "Cheese cake and cappuccino", "timestamp": "2020-12-13T21:15:39.779707"}}
(venv) ➜  introduction-elasticsearch git:(main)

Querying the index directly over HTTP (here with HTTPie) returns all three documents:

➜  ~ http http://localhost:9200/favourite-foods/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 311
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 1,
        "total": 1
    },
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_index": "favourite-foods",
                "_score": 1.0,
                "_source": {
                    "text": "banana bread and espresso",
                    "timestamp": "2020-12-13T21:00:54.855742",
                    "user_name": "Sagar"
                },
                "_type": "_doc"
            },
            {
                "_id": "2",
                "_index": "favourite-foods",
                "_score": 1.0,
                "_source": {
                    "text": "Cheese cake and cappuccino",
                    "timestamp": "2020-12-13T21:00:54.855751",
                    "user_name": "Divya"
                },
                "_type": "_doc"
            },
            {
                "_id": "3",
                "_index": "favourite-foods",
                "_score": 1.0,
                "_source": {
                    "text": "Tea",
                    "timestamp": "2020-12-13T21:00:54.855754",
                    "user_name": "Geetika"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": {
            "relation": "eq",
            "value": 3
        }
    },
    "timed_out": false,
    "took": 1
}
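
In a response like the one above, the stored documents sit under `hits.hits[]._source` and the match count under `hits.total.value`. Here is a small sketch for pulling them out in Python; the helper names are hypothetical, and the sample response is a trimmed copy of the output above.

```python
def extract_sources(response: dict) -> list:
    """Return the original documents (_source) from a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def total_hits(response: dict) -> int:
    """Return the number of matching documents reported by Elasticsearch 7.x."""
    return response["hits"]["total"]["value"]


# Trimmed copy of the search response shown above.
sample_response = {
    "hits": {
        "total": {"relation": "eq", "value": 3},
        "hits": [
            {"_id": "1", "_source": {"user_name": "Sagar", "text": "banana bread and espresso"}},
            {"_id": "2", "_source": {"user_name": "Divya", "text": "Cheese cake and cappuccino"}},
            {"_id": "3", "_source": {"user_name": "Geetika", "text": "Tea"}},
        ],
    }
}

print(total_hits(sample_response))  # 3
print([doc["user_name"] for doc in extract_sources(sample_response)])
# ['Sagar', 'Divya', 'Geetika']
```

The same helpers work on the dictionary returned by `es.search(...)` in the script, since the client returns the decoded JSON response.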