Elasticsearch with Python — Part 1 [Installation and basics]

In this article, I cover how to install Elasticsearch, its basic concepts, and how to integrate it with Python. So let’s get started…

Elasticsearch:

Elasticsearch [ES] is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. Elasticsearch is commonly used for
- log analytics
- full-text search
- security intelligence
- business analytics
- operational intelligence use cases.

ES Installation:

Installing ES requires a recent version of Java. If Java is not installed on your system, install it first.
1. Download the archive file from this link.
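2. Extract the downloaded archive file and run bin\elasticsearch.bat (on Windows; on Linux or macOS, run bin/elasticsearch).
3. Test it by opening localhost:9200 in the browser. You should get a JSON response with the node name, the cluster name, and version details.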

How ES works:

You can send data in the form of JSON documents to ES using its REST API. Elasticsearch automatically stores the original document and adds a searchable reference to the document in the cluster’s index. You can then search and retrieve the document using the ES API. You can also use Kibana, Elastic's visualization tool, to explore the data.
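
To make the "JSON document over a REST API" idea concrete, here is a minimal sketch that builds (but does not send) the HTTP request for indexing one document, using only the standard library. The node address, the helper name build_index_request, and the sample document are assumptions for illustration. The `_doc` path segment is the typeless form used by Elasticsearch 7 and later; older versions put the type name in that position.

```python
import json
from urllib.request import Request

# Assumed address of a local node; adjust to your setup.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that would index `document`."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request("school", 1, {"f_name": "shailesh", "age": 25})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/school/_doc/1
print(req.data.decode("utf-8"))        # {"f_name": "shailesh", "age": 25}
```

With a node running, passing the request to urllib.request.urlopen would send it. The Python client introduced later wraps this kind of call for you.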

Mapping concepts across SQL and Elasticsearch:

Table — Index
Row — Document
Column — Field
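
As a small illustration of this mapping, the sketch below reads rows from an in-memory SQLite table and turns each row into a dictionary, which is the shape a document takes before it is sent to ES as JSON. The table name and columns are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical "student" table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, f_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "shailesh", 25), (2, "amit", 19)],
)

# Each row becomes one document; each column becomes one field.
documents = [dict(row) for row in conn.execute("SELECT * FROM student ORDER BY id")]
print(documents)
# [{'id': 1, 'f_name': 'shailesh', 'age': 25}, {'id': 2, 'f_name': 'amit', 'age': 19}]
```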

Index:

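An index is like a database in the relational world: it is the place to store related documents. To fetch any document we need three things: the index (the database), the type (the type of the document), and the ID of the document. The examples in this article use mapping types as in older Elasticsearch releases. Types were deprecated in Elasticsearch 7 and removed in 8, where the index and the ID are enough.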

Integrating ES with python:

  • Install the ES package for Python using the link (typically: pip install elasticsearch)

  • Import the package and connect to ES in Python

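# Import the ES package
from elasticsearch import Elasticsearch
# Connect to the Elasticsearch cluster running locally
# (8.x versions of the client also require a 'scheme' key, e.g. 'scheme': 'http')
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])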

ES stores entire objects, called documents, and uses JSON as their serialization format. ES also indexes the content of each document in order to make it searchable. In Elasticsearch you can index, search, sort, and filter documents.

Inserting Document:

Storing data in Elasticsearch is called indexing. An Elasticsearch cluster contains multiple indices, which (in older versions) contain multiple types. These types hold multiple documents, and each document has multiple fields. To insert documents:

student1 = {
    "f_name": "shailesh",
    "l_name": "jadhav",
    "age": 25,
    "subjects": ["English", "Science", "Maths"],
    "hobbies": ["sports", "movies"],
    "about": "Like to play cricket",
    "subject_marks": [
        {"english": 79, "out_of": 100},
        {"maths": 68, "out_of": 100},
        {"science": 85, "out_of": 100},
    ]
}
student2 = {
    "f_name": "amit",
    "l_name": "rathod",
    "age": 19,
    "subjects": ["English", "Science", "Maths"],
    "hobbies": ["painting", "movies"],
    "about": "Like to paint",
    "subject_marks": [
        {"english": 69, "out_of": 100},
        {"maths": 72, "out_of": 100},
        {"science": 79, "out_of": 100},
    ]
}
student3 = {
    "f_name": "ajay",
    "l_name": "mishra",
    "age": 20,
    "subjects": ["English", "Science", "Maths"],
    "hobbies": ["sports", "movies"],
    "about": "Like to play football",
    "subject_marks": [
        {"english": 52, "out_of": 100},
        {"maths": 82, "out_of": 100},
        {"science": 67, "out_of": 100},
    ]
}
# Index each student under an explicit ID
res = es.index(index='school', doc_type='student', id=1, body=student1)
res = es.index(index='school', doc_type='student', id=2, body=student2)
res = es.index(index='school', doc_type='student', id=3, body=student3)
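
Indexing one document per call is fine for a handful of records. For larger batches, the Python client ships a bulk helper (elasticsearch.helpers.bulk) that accepts an iterable of action dictionaries. The sketch below only builds those actions so it can run without a cluster. The function name bulk_actions and the shortened student records are assumptions for illustration, and older versions that use mapping types would also need a `_type` key in each action.

```python
def bulk_actions(index, students):
    """Yield one bulk action per (doc_id, document) pair."""
    for doc_id, doc in students:
        yield {"_index": index, "_id": doc_id, "_source": doc}


# Shortened versions of the student records, for illustration only.
students = [
    (1, {"f_name": "shailesh", "l_name": "jadhav", "age": 25}),
    (2, {"f_name": "amit", "l_name": "rathod", "age": 19}),
    (3, {"f_name": "ajay", "l_name": "mishra", "age": 20}),
]

actions = list(bulk_actions("school", students))
print(len(actions))       # 3
print(actions[0]["_id"])  # 1

# With a client connected as `es`, the actions could be sent in one request:
# from elasticsearch import helpers
# helpers.bulk(es, actions)
```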

Fetching Documents by ID:

Fetching documents by ID is very simple. For this, we simply execute a GET request and specify the index, type, and ID. Using those three things, we can return the JSON document.

res = es.get(index='school', doc_type='student', id=1)
print(res)
# This will print
{
    '_index': 'school',
    '_type': 'student',
    '_id': '1',
    '_version': 1,
    'found': True,
    '_source': {
        'f_name': 'shailesh',
        'l_name': 'jadhav',
        'age': 25,
        'subjects': ['English', 'Science', 'Maths'],
        'hobbies': ['sports', 'movies'],
        'about': 'Like to play cricket',
        'subject_marks': [
            {'english': 79, 'out_of': 100},
            {'maths': 68, 'out_of': 100},
            {'science': 85, 'out_of': 100}
        ]
    }
}
# NOTE: The actual document is in the _source field
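
Since the stored document sits under `_source`, a small helper can separate it from the metadata. This is a minimal sketch that works on a plain dictionary shaped like the response above. The helper name extract_source and the shortened sample response are assumptions for illustration.

```python
def extract_source(response):
    """Return the stored document from a get-style response, or None if not found."""
    if not response.get("found", False):
        return None
    return response["_source"]


# Shortened sample shaped like the response shown above.
sample = {
    "_index": "school",
    "_type": "student",
    "_id": "1",
    "_version": 1,
    "found": True,
    "_source": {"f_name": "shailesh", "l_name": "jadhav", "age": 25},
}

doc = extract_source(sample)
print(doc["f_name"], doc["age"])         # shailesh 25
print(extract_source({"found": False}))  # None
```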

Removing Document by ID:

Deleting a document is just as easy. We execute a DELETE request, which again requires the index, type, and ID.

res = es.delete(index='school', doc_type='student', id=3)
print(res['result'])
# this will print
deleted

# To check whether the document was deleted, count the remaining documents
res = es.search(
    index="school",
    body={
        "query": {
            "match_all": {}
        }
    }
)
print("Total count:", res["hits"]["total"])
# this will print
Total count: 2
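
One caveat when reading the count: in Elasticsearch 7 and later, `hits.total` is an object such as {"value": 2, "relation": "eq"} rather than a plain integer. The helper below is a small sketch that handles both shapes. The name total_hits and the two sample responses are made up for illustration.

```python
def total_hits(response):
    """Return the hit count from a search response, for old and new response shapes."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # Elasticsearch 7+: {"value": N, "relation": "eq"}
        return total["value"]
    return total                 # older versions: a plain integer


old_style = {"hits": {"total": 2, "hits": []}}
new_style = {"hits": {"total": {"value": 2, "relation": "eq"}, "hits": []}}

print("Total count:", total_hits(old_style))  # Total count: 2
print("Total count:", total_hits(new_style))  # Total count: 2
```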

Text Search and Search Operators

match operator:

Now let’s find the student whose first name is ‘ajay’.

# Note: if you deleted the document with ID 3 in the previous step,
# index student3 again before running this search
res = es.search(
    index='school',
    body={
        'query': {
            'match': {
                'f_name': 'ajay'
            }
        }
    }
)
print(res['hits']['hits'])
# this will print
[
    {
        '_index': 'school',
        '_type': 'student',
        '_id': '3',
        '_score': 0.2876821,
        '_source': {
            'f_name': 'ajay',
            'l_name': 'mishra',
            'age': 20,
            'subjects': ['English', 'Science', 'Maths'],
            'hobbies': ['sports', 'movies'],
            'about': 'Like to play football',
            'subject_marks': [
                {'english': 52, 'out_of': 100},
                {'maths': 82, 'out_of': 100},
                {'science': 67, 'out_of': 100}
            ]
        }
    }
]

Filter Operator:

We have searched for students with ‘ajay’ as the first name. Now suppose you want all students with the first name ‘ajay’, but only those whose age is greater than 18. A filter clause inside a bool query handles this:

res = es.search(index='school', body={
        'query': {
            'bool': {
                # 'must' contributes to the relevance score
                'must': {
                    'match': {
                        'f_name': 'ajay'
                    }
                },
                # 'filter' only includes or excludes documents
                'filter': {
                    'range': {
                        'age': {
                            'gt': 18
                        }
                    }
                }
            }
        }
    })
print(res['hits']['hits'])
# this will print
[
    {
        '_index': 'school',
        '_type': 'student',
        '_id': '3',
        '_score': 0.2876821,
        '_source': {
            'f_name': 'ajay',
            'l_name': 'mishra',
            'age': 20,
            'subjects': ['English', 'Science', 'Maths'],
            'hobbies': ['sports', 'movies'],
            'about': 'Like to play football',
            'subject_marks': [
                {'english': 52, 'out_of': 100},
                {'maths': 82, 'out_of': 100},
                {'science': 67, 'out_of': 100}
            ]
        }
    }
]
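
When the name and the age limit come from user input, it can help to build the query body in a function instead of writing the nested dictionary by hand each time. This is only a sketch. The function name name_and_min_age_query is made up, and the body follows the same match plus range structure used above.

```python
import json


def name_and_min_age_query(first_name, min_age):
    """Build a search body: match on f_name, filtered to age > min_age."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"f_name": first_name}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


body = name_and_min_age_query("ajay", 18)
print(json.dumps(body, indent=2))

# With a connected client, the body would be passed as before:
# res = es.search(index='school', body=body)
```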

bool operator:

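The bool query takes a dictionary containing at least one of must, should, must_not, and filter. Each of these takes a single query clause or a list of clauses, such as match or other search operators. Clauses under must have to match, clauses under must_not must not match, and clauses under should raise the score of documents that match them.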

res= es.search(index='school',body={
        'query':{
            'bool':{
                'must':[{
                        'match':{
                            'f_name':'ajay'
                        }
                    }]
            }
        }
    })
print(res['hits']['hits'])
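
To show how must, should, and must_not combine, here is a toy evaluator that applies a bool query to plain Python dictionaries. It is a deliberately simplified model, not how Elasticsearch works internally: "match" is reduced to lowercase whole-word overlap, and the score is just the number of matching should clauses. All names and the shortened student records are assumptions for illustration.

```python
def matches(clause, doc):
    """Toy check of a single 'match' or 'range' clause against a dict."""
    kind, spec = next(iter(clause.items()))
    field, cond = next(iter(spec.items()))
    value = doc.get(field)
    if kind == "match":
        # Lowercase whole-word overlap; list fields are joined into one string.
        text = " ".join(value) if isinstance(value, list) else str(value)
        words = set(text.lower().split())
        return any(w in words for w in str(cond).lower().split())
    if kind == "range":
        checks = {
            "gt": lambda v, b: v > b,
            "gte": lambda v, b: v >= b,
            "lt": lambda v, b: v < b,
            "lte": lambda v, b: v <= b,
        }
        return all(checks[op](value, bound) for op, bound in cond.items())
    raise ValueError(f"unsupported clause: {kind}")


def bool_filter(bool_query, docs):
    """Keep docs passing every 'must' and no 'must_not'; rank by 'should' matches."""
    kept = []
    for doc in docs:
        if not all(matches(c, doc) for c in bool_query.get("must", [])):
            continue
        if any(matches(c, doc) for c in bool_query.get("must_not", [])):
            continue
        bonus = sum(matches(c, doc) for c in bool_query.get("should", []))
        kept.append((bonus, doc["f_name"]))
    return sorted(kept, reverse=True)


docs = [
    {"f_name": "shailesh", "age": 25, "hobbies": ["sports", "movies"], "about": "Like to play cricket"},
    {"f_name": "amit", "age": 19, "hobbies": ["painting", "movies"], "about": "Like to paint"},
    {"f_name": "ajay", "age": 20, "hobbies": ["sports", "movies"], "about": "Like to play football"},
]

bool_query = {
    "must": [{"match": {"hobbies": "movies"}}],
    "should": [{"match": {"about": "cricket"}}],
    "must_not": [{"range": {"age": {"lt": 20}}}],
}

print(bool_filter(bool_query, docs))  # [(1, 'shailesh'), (0, 'ajay')]
```

Here amit is dropped by must_not, and shailesh ranks above ajay because of the should clause. Real relevance scores come from Elasticsearch's own scoring and will look different.
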

The match operator also performs full-text search on longer text fields, and each hit gets a relevance score:

res = es.search(index='school', doc_type='student', body={
        'query': {
            'match': {
                'about': 'play cricket'
            }
        }
    })
for hit in res['hits']['hits']:
    print("About:", hit['_source']['about'])
    print("Score:", hit['_score'])
# this will print
About: Like to play cricket
Score: 0.7549128
About: Like to play football
Score: 0.5753642
# This returns two documents, but their scores differ depending on how well each one matches

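Sometimes we need documents that contain an exact sequence of words, that is, a phrase. The match_phrase operator only returns documents in which the words appear together and in the given order.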

res = es.search(index='school', doc_type='student', body={
        'query': {
            'match_phrase': {
                'about': 'play cricket'
            }
        }
    })
for hit in res['hits']['hits']:
    print("About:", hit['_source']['about'])
    print("Score:", hit['_score'])
# this will print
About: Like to play cricket
Score: 0.5753642
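
The difference between the two operators can be mimicked in a few lines of plain Python. This is a toy model under simplifying assumptions (whitespace tokenization, lowercase comparison, no scoring), not Elasticsearch's actual analysis. The function names are made up for illustration.

```python
def tokens(text):
    """Split text into lowercase words (a stand-in for real text analysis)."""
    return text.lower().split()


def toy_match(text, query):
    """Toy 'match': true if any query word occurs in the text."""
    words = set(tokens(text))
    return any(word in words for word in tokens(query))


def toy_match_phrase(text, query):
    """Toy 'match_phrase': true if the query words occur consecutively, in order."""
    t, q = tokens(text), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


abouts = ["Like to play cricket", "Like to play football", "Like to paint"]
for about in abouts:
    print(about, toy_match(about, "play cricket"), toy_match_phrase(about, "play cricket"))
# Like to play cricket True True
# Like to play football True False
# Like to paint False False
```

This mirrors the results above: the match query returns both the cricket and the football document, while the phrase query returns only the first.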

I hope this article helps you to start learning about ES. Please comment and share your feedback.