Beginner’s Guide to Elasticsearch

30 September 2020 at 10:00 by ParTech Media

In today’s fast-growing technological era, search and analytics have become two of the most important features of any software application. Web and mobile applications of all kinds are now expected to handle large volumes of data in real time, using scalable and efficient solutions.

On top of this, usability features such as instant search, search suggestions, and auto-completion have become standard in modern software applications. Elasticsearch is a popular choice for implementing these features.

In this post, we look at what Elasticsearch is and explore its key features. A practical section then shows how Elasticsearch works and how you can use it in your application.

Table of Contents:

  1. What is Elasticsearch?
  2. Use cases of Elasticsearch
  3. How Elasticsearch works
  4. What are the advantages of Elasticsearch?
  5. Working with Elasticsearch: Practical Implementation
  6. Wrapping Up

What is Elasticsearch?

Elasticsearch is a Java-based search and analytics engine that is mostly used for information retrieval. It is built on top of the full-text search library Apache Lucene and, at the time of writing, is released under the Apache License 2.0. Because it is distributed, it can query very large data sets, and it provides near-real-time search and analytics for almost all types of data, including semi-structured data.

Because it is open source and distributed, Elasticsearch is well suited to running searches and aggregations over large data sets such as system logs and network traffic. It can also be described as a NoSQL JSON document store: it holds JSON documents without requiring a schema to be defined up front, and those documents can be queried easily.
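Because no schema has to be declared in advance, Elasticsearch infers a mapping for each new field from the JSON value it first sees, a feature called dynamic mapping. The sketch below approximates the default rules for common JSON types. It is a simplification, assuming no date detection, arrays, or nested objects, and it is not Elasticsearch’s own code; the sample document is hypothetical.

```python
import json


def infer_field_type(value):
    """Approximate Elasticsearch's default dynamic mapping for one JSON value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become full-text "text" fields with a "keyword" sub-field
        # that supports exact matching, sorting, and aggregations.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(document_json):
    """Infer a mapping for every top-level field of a JSON document."""
    document = json.loads(document_json)
    return {"properties": {field: infer_field_type(value) for field, value in document.items()}}


# Hypothetical student document.
doc = '{"name": "John", "marks": 85, "average": 8.5, "passed": true}'
mapping = infer_mapping(doc)
print(json.dumps(mapping, indent=2))
```

In practice you can also define a mapping explicitly when creating an index, which gives you control over field types instead of relying on inference.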

Use cases of Elasticsearch

  1. Text Search Engine – Since Elasticsearch is built on top of Lucene, it provides full-text search and lets users search text data quickly and in near real time.
  2. Analytics Engine – Analytics is arguably an even more common use case than text search. Elasticsearch is widely used for log analysis and for analyzing other important performance metrics.
  3. Distributed and Scalable – Because Elasticsearch is distributed by design, scaling it is straightforward. It helps prevent data loss by keeping replica copies of data on other nodes, so the data remains available if a node fails.

How Elasticsearch works

Elasticsearch stores data as documents. A document is a unit of information made up of fields. A document corresponds to a row in a relational database, and a field corresponds to a column. Because a document is simply a JSON object, adding a document to Elasticsearch just means sending it a JSON object.
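To make the row-to-document analogy concrete, here is a small sketch that turns a row from a hypothetical relational students table into the JSON document you would send to Elasticsearch. The column names and values are illustrative assumptions.

```python
import json

# A row from a hypothetical relational "students" table.
columns = ("name", "subject", "marks")
row = ("John", "Mathematics", 85)

# Each column becomes a field, and the whole row becomes one document.
document = dict(zip(columns, row))

# This JSON text is what you would send to Elasticsearch to add the document.
body = json.dumps(document)
print(body)  # {"name": "John", "subject": "Mathematics", "marks": 85}
```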

An index is a collection of logically related documents and is the highest-level entity you can query in Elasticsearch. Under the hood, Elasticsearch uses an inverted index: a data structure that maps each term to the documents (and positions within them) where it appears. This lets a query look up matching documents directly instead of scanning every document.
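A minimal sketch of an inverted index follows, assuming a simple lowercase-and-split tokenizer in place of Elasticsearch’s real analyzers. The search function uses AND semantics (every query term must appear), which is a simplification; Elasticsearch’s match query combines terms with OR by default. The documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters: a rough stand-in
    # for Elasticsearch's standard analyzer.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up each term's posting list instead of scanning the documents.
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


docs = {
    1: "John scored 85 in Mathematics",
    2: "Mary scored 92 in Physics",
    3: "John scored 78 in Physics",
}
index = build_inverted_index(docs)
print(search(index, "john physics"))  # {3}
```

The stored positions are what make phrase queries possible: a phrase matches when its terms appear at consecutive positions in the same document.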

Each index is split into shards, and each shard is a self-contained instance of Lucene. Shards are spread across the machines (nodes) that make up a cluster. If we need more capacity, we can add more machines, and Elasticsearch rebalances the shards across them. Note that the number of primary shards is fixed when an index is created; changing it later requires the split or shrink APIs or reindexing, whereas the number of replica shards can be changed at any time. When a document is indexed or retrieved by ID, the node that receives the request hashes the document’s routing value (its ID by default) and uses the result to work out which shard owns the document. It then forwards the request to that shard, so the document is found quickly without searching every shard. The basic idea is to distribute an index across multiple shards so that different shards can live on different computers within the cluster.
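The core of the routing step can be sketched as shard = hash(routing value) % number of primary shards. Elasticsearch’s documented formula is more involved (it uses a Murmur3 hash and a routing factor), so this sketch uses Python’s zlib.crc32 purely as a stable stand-in hash. It shows why the same ID always maps to the same shard, and why changing the number of primary shards would move documents around.

```python
import zlib


def shard_for(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a shard for a document ID: hash(routing value) % number of primary shards."""
    # crc32 is a stable stand-in for the Murmur3 hash Elasticsearch actually uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(1, 101)]

# The same ID always maps to the same shard, so a lookup by ID goes straight to one shard.
placement_3 = {doc_id: shard_for(doc_id, 3) for doc_id in ids}

# Computing placements again with a different number of primary shards moves many documents.
placement_4 = {doc_id: shard_for(doc_id, 4) for doc_id in ids}
moved = sum(placement_3[d] != placement_4[d] for d in ids)
print(f"{moved} of {len(ids)} documents would land on a different shard")
```

Because so many documents would change shards, the primary shard count cannot simply be edited on a live index.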

What are the advantages of Elasticsearch?

Here are some of the advantages of Elasticsearch:

  • Store data across all supported types without needing to flatten it to conform to a rigid RDBMS schema.
  • Shard and replicate your data across partitions and nodes as you’d do with any distributed database, to balance the load and provide fault-tolerance.
  • Scale your datastore accordingly: seamlessly add and integrate new nodes as you need to resize your cluster.
  • Search flexibly: match a single word or token, a combination of words, wildcards, regular expressions, or any combination of these.
  • Combine full-text searches, analytical queries, fuzzy searches, and boosted queries into a single, focused request. Traditional relational (ACID) databases rarely offer this kind of flexibility.
  • Do all of the above quickly and with low latency, even on large data sets.
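As a rough illustration of combining several query types in one request, here is a Query DSL body built as a Python dictionary. The field names (name, subject, marks) are hypothetical, and the body would be sent to a search endpoint such as localhost:9200/marks/_search. Treat it as a sketch, not a tuned query.

```python
import json

query = {
    "query": {
        "bool": {
            "should": [
                # Full-text match on "name", boosted so it counts twice as much.
                {"match": {"name": {"query": "John", "boost": 2}}},
                # Fuzzy match that tolerates typos such as "jhon".
                {"fuzzy": {"name": {"value": "jhon", "fuzziness": "AUTO"}}},
                # Wildcard match on another field.
                {"wildcard": {"subject": {"value": "math*"}}},
            ],
            # At least one of the clauses above must match.
            "minimum_should_match": 1,
        }
    },
    # An aggregation computed in the same request as the search.
    "aggs": {"average_marks": {"avg": {"field": "marks"}}},
}

body = json.dumps(query, indent=2)
print(body)
```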

Working with Elasticsearch: Practical Implementation

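First, download and install Elasticsearch on your machine by following the installation steps in the official Elasticsearch documentation. Versions 7.0 and later ship with a bundled Java runtime, so a separate Java installation is usually not required. Once it is installed, you can start Elasticsearch with its default settings by running bin/elasticsearch from the installation directory. The requests in this section assume a local node with security disabled, as was the default in 7.x; from version 8.0, security is enabled by default, so requests need HTTPS and credentials.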

Here, we walk through a basic example: adding a document, retrieving it, searching for it with a query, and finally deleting it.

Adding a document in Elasticsearch

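Elasticsearch represents documents in JSON format, and it listens on port 9200 by default. Here, we add a document of type student with ID 1 to an index named marks. Because the index does not exist yet, Elasticsearch creates it automatically when the document is added. Note that mapping types such as student are deprecated in Elasticsearch 7.x and removed in 8.0; on current versions, use the generic _doc endpoint instead (for example, localhost:9200/marks/_doc/1).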
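One way to send this request from code is sketched below with Python’s standard library. It assumes a local 7.x node at http://localhost:9200 with security disabled, and the document fields are hypothetical. The request is only built here; the send helper is what would actually contact the server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, security disabled


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that stores `document` under the given ID."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # Elasticsearch requires this for bodies
        method="PUT",
    )


def send(request):
    """Send a request and decode Elasticsearch's JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical document; the field names are illustrative only.
request = build_index_request(
    "marks", "student", "1", {"name": "John", "subject": "Mathematics", "marks": 85}
)
# send(request)  # needs a running node; on 8.x use doc_type="_doc" plus HTTPS and credentials
```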

Retrieving a document in Elasticsearch

You can retrieve a stored document by sending a GET request to its URL. The syntax is:

GET localhost:9200/marks/student/1

The response is a JSON object containing the document’s metadata (such as _index, _id, and _version) together with the original document itself under _source.
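The sketch below builds the same GET request in Python and pulls the stored document out of a response. It assumes a local 7.x node with security disabled; the sample response only illustrates the shape of a 7.x GET response, and its values are made up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, security disabled


def build_get_request(index, doc_type, doc_id):
    """Build (but do not send) a GET request for a single document."""
    return urllib.request.Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


def document_source(response):
    """Return the stored document from a GET response, or None if it was not found."""
    return response["_source"] if response.get("found") else None


# Illustrative response in the shape returned by Elasticsearch 7.x (values are made up).
sample_response = json.loads("""
{
  "_index": "marks", "_type": "student", "_id": "1", "_version": 1,
  "_seq_no": 0, "_primary_term": 1, "found": true,
  "_source": {"name": "John", "subject": "Mathematics", "marks": 85}
}
""")
print(document_source(sample_response))
```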

Searching in Elasticsearch

We can also retrieve only the documents that match a particular search query. For instance, to find the document containing John’s details, run the following request:

GET localhost:9200/_search?q=John
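The request above searches every index on the node. The sketch below builds the same URI search in Python, optionally restricted to one index, and extracts the matching documents from a response. It assumes a local node with security disabled; the sample response mirrors the general shape of a 7.x search response, and its values, including the score, are made up.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, security disabled


def build_search_request(query, index=None):
    """Build a URI search request; without an index it searches all indices."""
    path = f"/{index}/_search" if index else "/_search"
    return urllib.request.Request(
        f"{BASE_URL}{path}?{urllib.parse.urlencode({'q': query})}", method="GET"
    )


def hit_sources(response):
    """Extract (id, score, document) tuples from a search response."""
    return [(hit["_id"], hit["_score"], hit["_source"]) for hit in response["hits"]["hits"]]


# Illustrative response shape (values are made up).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "marks", "_id": "1", "_score": 0.28,
             "_source": {"name": "John", "subject": "Mathematics", "marks": 85}},
        ],
    }
}
print(hit_sources(sample_response))
```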

Deleting a Document in Elasticsearch

Let us delete the student document that we created. To do that, send a DELETE request to the document’s URL:

DELETE localhost:9200/marks/student/1
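A matching Python sketch for the delete step follows, under the same assumption of a local 7.x node with security disabled. Elasticsearch reports "result": "deleted" on success and "result": "not_found" (with HTTP status 404) when the document does not exist; note that urllib.request.urlopen raises HTTPError for a 404, so a real caller would need to handle that.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, security disabled


def build_delete_request(index, doc_type, doc_id):
    """Build (but do not send) a DELETE request for a single document."""
    return urllib.request.Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="DELETE")


def was_deleted(response):
    """True if Elasticsearch's JSON response says the document was deleted."""
    return response.get("result") == "deleted"


request = build_delete_request("marks", "student", "1")
print(request.get_method(), request.full_url)
```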

Wrapping Up

In this article, we looked at what Elasticsearch is, how it stores and distributes data, and how to add, retrieve, search for, and delete documents. Features such as full-text search, near-real-time analytics, and straightforward horizontal scaling make it an effective way to improve the search experience in an application, and many organizations around the world use it today. Try it in your own code to see the benefits for yourself.