40+ TOP Elasticsearch Interview Questions and Answers
1) What is Elasticsearch?
2) How does Elasticsearch work?
Elasticsearch works by indexing documents. During an indexing operation, Elasticsearch converts raw data such as log files or message files into internal documents and stores them in a basic data structure similar to a JSON object. To index a document, send an HTTP POST request that carries the document as a simple JSON object.
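As a minimal sketch of that POST, the snippet below builds (but does not send) an indexing request with Python's standard library. The host, the index name "logs", and the sample document are assumptions made for illustration only.

```python
import json
import urllib.request

# Hypothetical host and document, chosen only for illustration.
BASE_URL = "http://localhost:9200"
doc = {"user": "alice", "message": "disk usage at 91%", "level": "WARN"}


def build_index_request(index, document):
    """Build (but do not send) an HTTP POST that indexes one JSON document."""
    url = f"{BASE_URL}/{index}/_doc"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_index_request("logs", doc)
print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```

Because no ID is given in the URL, Elasticsearch would generate one for the document when the request is sent.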
3) What is Amazon Elasticsearch?
Amazon Elasticsearch Service makes it easy to deploy, secure, operate and scale Elasticsearch for log analytics, full-text search, application monitoring, and more.
You can set up and configure petabyte-scale Amazon Elasticsearch Service domains in minutes from the AWS Management Console.
4) What is the functionality of Elasticsearch?
Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
5) What is Kibana and how does it relate to Elasticsearch?
Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster.
Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data.
6) What is Apache Lucene?
Apache Lucene is a free and open-source information retrieval software library, originally written completely in Java.
7) What is NRT in Elasticsearch?
In Elasticsearch, NRT stands for Near Real Time. Elasticsearch is a near real-time search platform: there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.
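The effect of that latency can be pictured with a toy model. This is not how Elasticsearch is implemented; the class and method names are hypothetical, and the model only shows that a document is invisible to search until the next refresh.

```python
class ToyIndex:
    """Toy model of near real-time search: writes are buffered until a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Elasticsearch refreshes periodically (every second by default);
        # here the refresh is triggered by hand.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term):
        return [d for d in self._searchable if term in d]


idx = ToyIndex()
idx.index("hello world")
before = idx.search("hello")  # nothing yet: the document is still buffered
idx.refresh()
after = idx.search("hello")   # now the document is searchable
print(before, after)
```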
8) What is a Cluster in Elasticsearch?
A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
A cluster is identified by a unique name which by default is “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
9) What is Node in Elasticsearch?
A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup.
10) What is Index in Elasticsearch?
Index – An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
11) What is Document in Elasticsearch?
12) What are Shards in Elasticsearch? Explain the concept.
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster.
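Defining the number of shards at index creation can be sketched as the settings body sent with the create-index request. The index name and the counts below are hypothetical; the helper only builds the JSON and does not contact a cluster.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Build the settings body for a create-index (PUT /<index>) request."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one primary shard and no negative replicas")
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


# Hypothetical example: an index split into 5 primary shards, 1 replica each.
body = create_index_body(5, 1)
print("PUT /orders")
print(json.dumps(body, indent=2))
```

The number of primary shards is chosen when the index is created, which is why it appears in the creation request; the replica count can be changed later.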
13) What are the benefits of Sharding in Elasticsearch?
Sharding is important for two primary reasons:
It allows you to horizontally split/scale your content volume
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
14) What are Replicas in Elasticsearch? Explain the concept.
In a network/cloud environment where failures can be expected any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason.
To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
15) What are the benefits of Replicas in Elasticsearch?
Replication is important for two primary reasons:
It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
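The rule that a replica never shares a node with its primary can be illustrated with a toy allocator. This is a simplified sketch, not Elasticsearch's real allocation algorithm; the function and node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: no two copies of the same shard land on the same node.

    Returns (assigned, unassigned), where copy 0 of each shard is the primary.
    """
    assigned, unassigned = {}, []
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            if copy < len(nodes):
                # Offsetting by the copy number puts each copy on a distinct node.
                assigned[(shard, copy)] = nodes[(shard + copy) % len(nodes)]
            else:
                # Not enough nodes to keep the copies apart.
                unassigned.append((shard, copy))
    return assigned, unassigned


two_nodes, _ = allocate(3, 1, ["node-1", "node-2"])
_, leftover = allocate(1, 1, ["node-1"])
print(two_nodes)
print(leftover)
```

With a single node, the replica in the second call stays unassigned, which matches the idea that a copy on the same node as its primary would add no availability.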
16) What is the minimum Java version required to install Elasticsearch?
Elasticsearch 6.x requires at least Java 8 on the machine. From version 7.0 onward, the distributions bundle their own JDK, so a separate Java installation is optional.
17) How do you interact with Cluster in Elasticsearch?
Elasticsearch provides a very comprehensive and powerful REST API that you can use to interact with your cluster.
18) What are the benefits of REST APIs in Elasticsearch?
The REST APIs in Elasticsearch let you do the following:
Check your cluster, node, and index health, status, and statistics
Administer your cluster, node, and index data and metadata
Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others
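As one example of the advanced search operations above, paging and sorting are expressed in the search request body. The sketch below builds such a body; the field name "account_number" and the helper name are hypothetical.

```python
def search_page(page, page_size, sort_field, order="asc"):
    """Build a search body for a 1-based page number, sorted by one field."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # offset of the first hit to return
        "size": page_size,               # number of hits per page
        "sort": [{sort_field: {"order": order}}],
    }


# Third page of ten hits, sorted by a hypothetical numeric field.
body = search_page(3, 10, "account_number")
print(body)
```

The body would be sent to the _search endpoint of an index, for example with a GET or POST to /<index>/_search.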
19) How do you create an Index in Elasticsearch?
Now let’s create an index named “customer” and then list all the indexes again:
The first command creates the index named “customer” using the PUT verb. We simply append pretty to the end of the call to tell it to pretty-print the JSON response (if any).
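The two calls described here can be sketched as request objects built with Python's standard library. The host is an assumption, and the requests are constructed but not sent.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node

# PUT /customer?pretty creates the index and pretty-prints the JSON response.
create_index = urllib.request.Request(f"{BASE_URL}/customer?pretty", method="PUT")

# GET /_cat/indices?v lists all indexes, with column headers.
list_indices = urllib.request.Request(f"{BASE_URL}/_cat/indices?v", method="GET")

for req in (create_index, list_indices):
    print(req.get_method(), req.full_url)
# To run them against a live node: urllib.request.urlopen(create_index)
```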
20) How do you delete an Index?
Now let’s delete the index that we just created and then list all the indexes again:
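Deleting an index uses the DELETE verb on the index name. A minimal sketch, assuming a local node and building the request without sending it:

```python
import urllib.request


def delete_index_request(index, base_url="http://localhost:9200"):
    """Build (but do not send) a DELETE request for one index."""
    return urllib.request.Request(f"{base_url}/{index}?pretty", method="DELETE")


delete_customer = delete_index_request("customer")
print(delete_customer.get_method(), delete_customer.full_url)
```

Listing the indexes afterwards (GET /_cat/indices?v) should no longer show “customer”.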
Elasticsearch Frequently Asked Interview Questions
21) What are the different packages available for installing Elasticsearch?
zip/tar.gz – The zip and tar.gz packages are suitable for installation on any system and are the easiest choice for getting started with Elasticsearch on most systems.
deb – The deb package is suitable for Debian, Ubuntu, and other Debian-based systems.
rpm – The rpm package is suitable for installation on Red Hat, CentOS, SLES, openSUSE and other RPM-based systems.
msi – The msi package is suitable for installation on Windows 64-bit systems with at least .NET 4.5 framework installed.
docker – Images are available for running Elasticsearch as Docker containers.
22) What are the configuration management tools supported by Elasticsearch?
Elasticsearch supports the following configuration management tools to help with large deployments:
Puppet – puppet-elasticsearch
Chef – cookbook-elasticsearch
Ansible – ansible-elasticsearch
23) How many types of Configuration files are there in Elasticsearch?
Elasticsearch has three configuration files:
elasticsearch.yml for configuring Elasticsearch
jvm.options for configuring Elasticsearch JVM settings
log4j2.properties for configuring Elasticsearch logging
These files are located in the config directory, whose default location depends on whether or not the installation is from an archive distribution (tar.gz or zip) or a package distribution (Debian or RPM packages).
24) What is X-Pack in Elasticsearch?
X-Pack is an Elastic Stack extension that bundles security, alerting, monitoring, reporting, machine learning, and graph capabilities into one easy-to-install package.
In versions before 6.3, you must install X-Pack into Elasticsearch to access this functionality; from 6.3 onward, the X-Pack features ship with the default distribution.
25) Where do you configure settings for X-Pack?
X-Pack Settings in Elasticsearch – You configure settings for X-Pack features in the elasticsearch.yml, kibana.yml, and logstash.yml configuration files.
26) What are breaking changes in Elasticsearch?
Breaking changes are the changes you need to be aware of when migrating your application from one version of Elasticsearch to another.
As a general rule:
Migration between minor versions – e.g. 6.x to 6.y – can be performed by upgrading one node at a time.
Migration between consecutive major versions – e.g. 5.x to 6.x – requires a full cluster restart.
Migration between non-consecutive major versions – e.g. 2.x to 6.x – is not supported.
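The general rule above can be written as a small decision function. This sketch encodes only the three cases listed here; the function names are hypothetical, and real upgrades should follow the release-specific upgrade notes.

```python
def parse_version(version):
    """Return (major, minor) from a version string such as '6.8' or '6.8.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_strategy(current, target):
    """Classify an upgrade according to the general rule described above."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError("target version must be newer than the current version")
    if tgt[0] == cur[0]:
        return "rolling upgrade"        # minor versions: one node at a time
    if tgt[0] == cur[0] + 1:
        return "full cluster restart"   # consecutive major versions
    return "not supported"              # non-consecutive major versions


for pair in [("6.2", "6.8"), ("5.6", "6.0"), ("2.4", "6.0")]:
    print(pair, "->", upgrade_strategy(*pair))
```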
27) What are the Single document APIs in Elasticsearch?
Index API, Get API, Delete API, Update API
28) What are Multi-document APIs?
Multi Get API, Bulk API, Delete By Query API, Update By Query API, Reindex API.
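Of these, the Bulk API has the most distinctive request format: newline-delimited JSON, with an action line followed by the document source, and a trailing newline at the end. A minimal sketch with a hypothetical index and documents:

```python
import json


def bulk_body(index, docs):
    """Build a newline-delimited Bulk API body that indexes each document.

    docs maps document IDs to their JSON source.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: what to do and where. On 6.x a document type must also
        # be given, either in the action metadata or in the request URL.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body("customer", {"1": {"name": "John Doe"}, "2": {"name": "Jane Doe"}})
print(body)
```

The body would be sent with POST to the _bulk endpoint, using the content type application/x-ndjson.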
29) What is Routing in Elasticsearch?
When executing a search, it will be broadcast to all the index/indices shards (round robin between replicas). Which shards will be searched on can be controlled by providing the routing parameter.
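Routing works because a routing value maps deterministically to a shard, in simplified form as hash(routing) modulo the number of primary shards. The sketch below uses CRC32 from the standard library as a stand-in hash; Elasticsearch uses a different hash function (Murmur3) internally, so the shard numbers here will not match a real cluster.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Map a routing value to a shard number (stand-in hash, see note above)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Documents indexed with the same routing value land on the same shard,
# so a search that supplies that routing value only needs to visit one shard.
first = shard_for("user42", 5)
second = shard_for("user42", 5)
print(first == second)
```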
30) What are Aggregations?
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, which can be composed in order to build complex summaries of the data.
An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
31) What are the different types of aggregations in Elasticsearch?
There are several families of aggregations, each with its own purpose and output:
Bucketing – A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion.
Metric – Aggregations that keep track of and compute metrics over a set of documents.
Matrix – A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.
Pipeline – Aggregations that aggregate the output of other aggregations and their associated metrics.
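Aggregations compose by nesting. As a sketch, the request body below nests a metric aggregation (avg) inside a bucket aggregation (terms), so that an average is computed per bucket. The field names and aggregation names are hypothetical.

```python
def terms_with_avg(group_field, metric_field):
    """Build a search body: bucket by one field, average another per bucket."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    "avg_value": {"avg": {"field": metric_field}},
                },
            }
        },
    }


# Hypothetical example: average balance per state.
body = terms_with_avg("state.keyword", "balance")
print(body)
```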
32) What are Indices APIs?
The indices APIs are used to manage individual indices, index settings, aliases, mappings, and index templates.
33) What is cat API in Elasticsearch?
The cat APIs return compact, human-readable text output instead of JSON. All the cat commands accept the query string parameter help, which lists the headers and information they provide, and the /_cat command alone lists all the available commands.
34) What are the different cat commands available in Elasticsearch cat API?
The different commands available in cat APIs are:
cat aliases, cat allocation, cat count, cat fielddata
cat health, cat indices, cat master, cat nodeattrs
cat nodes, cat pending tasks, cat plugins, cat recovery
cat repositories, cat thread pool, cat shards, cat segments
cat snapshots, cat templates
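Each command name above maps to a path under /_cat, with spaces replaced by underscores (for example, cat pending tasks becomes /_cat/pending_tasks). A small sketch of that mapping, with a hypothetical helper name:

```python
def cat_path(command, verbose=False, show_help=False):
    """Build the request path for a cat command such as 'indices'."""
    path = "/_cat/" + command.strip().replace(" ", "_")
    flags = []
    if verbose:
        flags.append("v")     # include column headers in the output
    if show_help:
        flags.append("help")  # describe the available columns
    return path + ("?" + "&".join(flags) if flags else "")


print(cat_path("indices", verbose=True))
print(cat_path("pending tasks"))
print(cat_path("health", show_help=True))
```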
35) What is Query DSL in Elasticsearch?
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:
Leaf query clauses – Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
Compound query clauses – Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).
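The tree structure can be made concrete by walking a query and collecting its leaf clauses. The sketch below treats only bool, dis_max and constant_score as compound clauses, and the field names in the sample query are hypothetical.

```python
# Compound clauses this sketch knows about; others are treated as leaves.
COMPOUND = {"bool", "dis_max", "constant_score"}


def leaf_clauses(clause):
    """Return the names of the leaf query clauses in a query clause tree."""
    leaves = []
    for name, body in clause.items():
        if name in COMPOUND:
            for child in body.values():
                children = child if isinstance(child, list) else [child]
                for item in children:
                    if isinstance(item, dict):  # skip options such as tie_breaker
                        leaves.extend(leaf_clauses(item))
        else:
            leaves.append(name)
    return leaves


query = {
    "bool": {
        "must": [{"match": {"title": "search"}}],
        "filter": [{"range": {"price": {"gte": 10, "lte": 50}}}],
        "should": [
            {
                "dis_max": {
                    "queries": [
                        {"term": {"tag": "new"}},
                        {"match": {"body": "elastic"}},
                    ],
                    "tie_breaker": 0.3,
                }
            }
        ],
    }
}

found = leaf_clauses(query)
print(found)
```

In a search request, this clause tree would appear under the top-level "query" key of the body.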
36) What is Ingest Node?
Use an ingest node to pre-process documents before the actual document indexing happens. The ingest node intercepts bulk and index requests, applies transformations, and then passes the documents back to the index or bulk APIs.
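The transformations are defined as a pipeline: an ordered list of processors. The sketch below shows a pipeline definition using the set and lowercase processors, plus a toy function that mimics what those two processors do to a document. The simulation is an illustration only and is not Elasticsearch code; the field names are hypothetical.

```python
# Pipeline definition, in the shape sent to PUT /_ingest/pipeline/<id>.
pipeline = {
    "description": "Tag and normalise log events (hypothetical example)",
    "processors": [
        {"set": {"field": "env", "value": "prod"}},
        {"lowercase": {"field": "level"}},
    ],
}


def simulate(pipeline, document):
    """Toy model of running a document through the processors in order."""
    doc = dict(document)  # work on a copy, leave the input untouched
    for processor in pipeline["processors"]:
        (kind, options), = processor.items()
        if kind == "set":
            doc[options["field"]] = options["value"]
        elif kind == "lowercase":
            doc[options["field"]] = doc[options["field"]].lower()
        else:
            raise ValueError(f"processor not modelled in this sketch: {kind}")
    return doc


result = simulate(pipeline, {"level": "WARN", "message": "disk almost full"})
print(result)
```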
37) What are the different types of X-Pack APIs?
X-Pack APIs – X-Pack exposes REST APIs that are used by the UI components and can be called directly to configure and access X-Pack features.
Graph Explore API
Machine Learning APIs
38) What are the different types of X-Pack Commands?
X-Pack includes commands that help you configure security:
39) What is the Graph Explore API in Elasticsearch?
The Graph explore API enables you to extract and summarize information about the documents and terms in your Elasticsearch index.
40) What are the Migration APIs in Elasticsearch?
The migration APIs simplify upgrading X-Pack indices from one version to another.
Migration Assistance API
Migration Upgrade API
Deprecation Info APIs