Review: Elasticsearch Exporter

In this edition of our exporter review series, we introduce the Elasticsearch exporter, one of the best-fit exporters for monitoring metrics used by NexClipper. Read on to find out the exporter’s most important metrics, recommended alert rules, as well as the related Grafana dashboard and Helm Chart.

About Elasticsearch

Elasticsearch is a RESTful search engine, data store, and analytics solution. It is developed in Java and based on Apache Lucene. Elasticsearch is mainly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

Elasticsearch is a NoSQL database, which means it stores data in an unstructured way. You can send data in the form of JSON documents using the API or ingestion tools like Logstash. Elasticsearch will store the data and add searchable references to it. You can then search and retrieve the document using the Elasticsearch API or a visualization tool like Kibana. 

Elasticsearch was open source under the Apache License 2.0 until 2021, when Elastic NV announced a change in its software licensing strategy and began offering it under the SSPL and the Elastic License.

Since Elasticsearch, like any other database, is a critical resource, downtime can cause significant financial and reputational losses, so monitoring is a must. The Elasticsearch exporter is required to monitor and expose Elasticsearch metrics. It queries Elasticsearch, scrapes the data, and exposes the metrics on a Kubernetes service endpoint that Prometheus can in turn scrape to ingest time series data. For monitoring Elasticsearch, an external Prometheus exporter maintained by the Prometheus Community is used. On deployment, the Elasticsearch exporter scrapes a sizable set of metrics from Elasticsearch and gives users crucial, continuous information that would be difficult and time-consuming to extract from Elasticsearch directly.

For this setup, we are using Elastic/Elasticsearch Helm charts to start the Elasticsearch cluster.

How do you set up an exporter for Prometheus?

With the latest version of Prometheus (2.33 as of February 2022), these are the ways to set up a Prometheus exporter: 

Method 1 – Basic

Supported by Prometheus since the beginning
To set up an exporter the native way, the Prometheus config needs to be updated to add the target.
A sample configuration:

# scrape_config job
scrape_configs:
  - job_name: elasticsearch
    scrape_interval: 45s
    scrape_timeout:  30s
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - <elasticsearch exporter endpoint>
Method 2 – Service Discovery

This method is applicable to Kubernetes deployments only.
A default scrape config can be added to the prometheus.yaml file, and annotations can be added to the exporter service. With this, Prometheus will automatically start scraping the data from the services at the specified path.

prometheus.yaml

      - job_name: kubernetes-services
        scrape_interval: 15s
        scrape_timeout: 10s
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        # Example relabel to scrape only endpoints that have
        # prometheus.io/scrape: "true" annotation.
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        #  prometheus.io/path: "/scrape/path" annotation.
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        #  prometheus.io/port: "80" annotation.
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: (.+)(?::\d+);(\d+)
          replacement: $1:$2

Exporter service annotations:

  annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"
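
For illustration, here is a minimal sketch of what the exporter Service could look like with these annotations in place. The service name, namespace, labels, and port below are assumptions and will differ depending on how the exporter was deployed:

apiVersion: v1
kind: Service
metadata:
  name: prometheus-elasticsearch-exporter   # assumed name
  namespace: monitor                         # assumed namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9108"               # assumed exporter port
spec:
  ports:
  - name: http
    port: 9108
    targetPort: 9108
  selector:
    app: prometheus-elasticsearch-exporter   # assumed pod label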
Method 3 – Prometheus Operator

Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from the exporters by setting up a service monitor Kubernetes object. For reference, a sample service monitor for Redis can be found here.
These are the necessary steps:

Step 1

Add/update the Prometheus operator’s selectors. By default, the Prometheus operator comes with empty selectors, which will select every service monitor available in the cluster for scraping the data.

To check your Prometheus configuration:

kubectl get prometheus -n <namespace> -o yaml

A sample output will look like this.

    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: kps
    scrapeInterval: 1m
    scrapeTimeout: 10s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: kps-kube-prometheus-stack-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: kps

Here you can see that this Prometheus configuration selects all service monitors with the label release = kps.

So, if you are modifying the default Prometheus operator configuration for service monitor scraping, make sure you use the matching labels in your service monitor as well.

Step 2

Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).

Sample configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: elasticsearch-exporter
    meta.helm.sh/release-namespace: monitor
  labels:
    app: prometheus-elasticsearch-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-elasticsearch-exporter-1.1.0
    heritage: Helm
    release: kps
  name: prometheus-elasticsearch-exporter
  namespace: monitor
spec:
  endpoints:
  - interval: 15s
    port: elasticsearch-exporter
  selector:
    matchLabels:
      app: prometheus-elasticsearch-exporter
      release: elasticsearch-exporter

As you can see, the service monitor carries the matching label release = kps that is specified in the Prometheus operator scraping configuration.

Metrics

The following handpicked metrics for the Elasticsearch exporter will provide insights into Elasticsearch.

  1. Elasticsearch is up
    This shows whether the last scrape of metrics from Elasticsearch was able to connect to the server.
    ➡ The key of the exporter metric is “elasticsearch_cluster_health_up”
    ➡ The value of the metric is a boolean – 1 or 0, which symbolizes whether Elasticsearch is up or down respectively (1 for yes, 0 for no)
  2. Elasticsearch health status
    This reflects the cluster health status as green, yellow, or red. Red indicates that at least one primary shard is not allocated in the cluster, yellow means that the primary shards are allocated but one or more replicas are not, and green means that all shards are allocated.
    ➡ The metric key is “elasticsearch_cluster_health_status”
    ➡ The value will be 1 or 0 based on the color label
  3. Memory usage
    High memory pressure reduces performance and results in Out-Of-Memory errors. It is mainly caused by a high number of shards on the node or heavy queries. You may need to increase the memory if usage is consistently high.
    ➡ The metric key is “elasticsearch_jvm_memory_used_bytes”
    ➡ JVM memory currently used by area – the percentage can be calculated based on elasticsearch_jvm_memory_max_bytes (see the sample expressions after this list)
  4. Elasticsearch disk size
    As the name suggests, this metric gives the size of the disk available for the database.
    ➡ The metric “elasticsearch_filesystem_data_available_bytes” shows the storage size available on the block device used to host Elasticsearch
    ➡ The value of this metric is a number in bytes; the percentage can be calculated based on the total disk space metric – “elasticsearch_filesystem_data_size_bytes”
  5. Elasticsearch unassigned shards
    Unassigned shards indicate that Elasticsearch is running out of capacity or has some other issue, such as node failures or disk space problems.
    ➡ The metric “elasticsearch_cluster_health_unassigned_shards” exposes the number of shards that are not assigned
    ➡ The value of this metric is a number; an alert should fire when it is greater than 0
  6. Elasticsearch documents
    This metric gives the number of new documents inserted into Elasticsearch in a particular time frame. If the number is 0 or below expectations, an alert can be generated.
    ➡ The metric “elasticsearch_indices_docs” provides the number of documents
    ➡ The value of this metric is a number
  7. Number of nodes
    This metric provides the number of nodes in the Elasticsearch cluster. It is an informative metric and can be used to detect missing nodes in the cluster.
    ➡ The metric “elasticsearch_cluster_health_number_of_nodes” delivers the number of healthy nodes in the cluster
    ➡ The value of this metric is a number and can be used to calculate the missing nodes in the cluster
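
As a minimal sketch of how these raw metrics can be turned into the percentages mentioned above, the following Prometheus recording rules derive JVM heap usage and disk usage from the exporter metrics. The group and record names are illustrative only; the expressions mirror the alert expressions used later in this article.

groups:
- name: elasticsearch-derived.rules     # example group name
  rules:
  # JVM heap usage as a percentage per node
  - record: elasticsearch:jvm_heap_usage:percent
    expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100
  # Disk usage as a percentage of the data filesystem
  - record: elasticsearch:filesystem_usage:percent
    expr: 100 - (elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100)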

Alerting

After digging into all the valuable metrics, this section explains in detail how we can get critical alerts with the Elasticsearch exporter.

PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and shares little common ground with other query languages used in time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.

Alerting in Prometheus is handled by Alertmanager, which is responsible for sending alerts (via email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped by the exporter. Click here for a good source of community-defined alerts.

A general alert looks as follows:

- alert: (Alert Name)
  expr: (metric exported from the exporter) >/</==/<=/>= (value)
  for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
  labels: (allows specifying a set of additional labels to be attached to the alert)
  annotations: (specifies a set of informational labels that can be used to store longer additional information)
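
For context, alert rules live inside a rule group in a Prometheus rules file (or, when the Prometheus operator is used, inside a PrometheusRule resource). A minimal skeleton looks like this; the group name is an example and the expression is a placeholder to be replaced with one of the exporter metrics:

groups:
- name: elasticsearch.alerts      # example group name
  rules:
  - alert: ExampleAlert
    expr: up == 0                 # placeholder expression
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Example alert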

Some of the recommended Elasticsearch exporter alerts are:

Alert – Cluster down

  - alert: ElasticsearchClusterDown
    expr: elasticsearch_cluster_health_up == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Elasticsearch is Down
      description: "Elasticsearch is down for 5 min\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Health status “yellow”

  - alert: ElasticsearchClusterYellow
    expr: elasticsearch_cluster_health_status{color="yellow"} == 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Elasticsearch Cluster Yellow (instance {{ $labels.instance }})
      description: "Elastic Cluster Yellow status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Health status “red”

  - alert: ElasticsearchClusterRed
    expr: elasticsearch_cluster_health_status{color="red"} == 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Elasticsearch Cluster Red (instance {{ $labels.instance }})
      description: "Elastic Cluster Red status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Elasticsearch heap usage too high

  - alert: ElasticsearchHeapUsageTooHigh
    expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Elasticsearch Heap Usage Too High (instance {{ $labels.instance }})
      description: "The heap usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Disk out of space

  - alert: ElasticsearchDiskOutOfSpace
    expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 10
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
      description: "The disk usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Unassigned shards

  - alert: ElasticsearchUnassignedShards
    expr: elasticsearch_cluster_health_unassigned_shards > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Elasticsearch unassigned shards (instance {{ $labels.instance }})
      description: "Elasticsearch has unassigned shards\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Elasticsearch no new documents

  - alert: ElasticsearchNoNewDocuments
    expr: increase(elasticsearch_indices_docs{es_data_node="true"}[10m]) < 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Elasticsearch no new documents (instance {{ $labels.instance }})
      description: "No new documents for 10 min!\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Alert – Elasticsearch missing node

  # modify the value with the number of nodes you have in the cluster
  - alert: ElasticsearchHealthyNodes
    expr: elasticsearch_cluster_health_number_of_nodes < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
      description: "Missing node in Elasticsearch cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Dashboard

Graphs are easier to understand and more user-friendly than rows of numbers. For this purpose, users can plot their time series data in a visual format using Grafana.

Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
Grafana supports community-driven dashboards for most widely used software, which can be imported directly from the Grafana community dashboards site.

NexClipper uses the Elasticsearch exporter dashboard by dcwangmit01, which is widely adopted and has a lot of useful panels.
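
If Grafana is deployed through the kube-prometheus-stack or the Grafana Helm chart with the dashboard sidecar enabled, one way to load such a dashboard is to ship its JSON in a ConfigMap. This is a sketch under those assumptions; the ConfigMap name, namespace, file name, and the sidecar label (grafana_dashboard) must match your own Grafana configuration, and the dashboard JSON is downloaded beforehand from the Grafana community site:

apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-exporter-dashboard   # assumed name
  namespace: monitor                        # assumed namespace
  labels:
    grafana_dashboard: "1"                  # label the Grafana sidecar is configured to watch
data:
  elasticsearch-dashboard.json: |-
    { ... dashboard JSON exported from the Grafana community site ... }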

What is a Panel?

Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range. 
The values on the panel are queried using PromQL, which is Prometheus Query Language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate and apply arithmetic functions to the metrics, and then further visualize them on panels.

Here are some examples of panels for metrics from the Elasticsearch exporter:

Helm Chart

The Elasticsearch exporter, alert rule, and dashboard can be deployed in Kubernetes using the Helm chart. The Helm chart used for deployment is taken from the Prometheus community, which can be found here.

Installing Elasticsearch server

If your Elasticsearch server is not up and ready yet, you can start it using Helm:

$ helm repo add elastic https://helm.elastic.co
$ helm install elasticsearch elastic/elasticsearch

Installing Elasticsearch exporter

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install my-release prometheus-community/prometheus-elasticsearch-exporter --set es.uri=http://<elasticsearch>:9200

Some of the common parameters that may need to be changed in the values file include: 

es:
  ## Address (host and port) of the Elasticsearch node we should connect to.
  ## This could be a local node (localhost:9200, for instance), or the address
  ## of a remote Elasticsearch server. When basic auth is needed,
  ## specify as: <proto>://<user>:<password>@<host>:<port>. e.g., http://admin:pass@localhost:9200.
  ##
  uri: http://localhost:9200

  ## If true, query stats for all nodes in the cluster, rather than just the
  ## node we connect to.
  ##
  all: true

  ## If true, query stats for all indices in the cluster.
  ##
  indices: true

  ## If true, query settings stats for all indices in the cluster.
  ##
  indices_settings: true

  ## If true, query mapping stats for all indices in the cluster.
  ##
  indices_mappings: true

  ## If true, query stats for shards in the cluster.
  ##
  shards: true

  ## If true, query stats for snapshots in the cluster.
  ##
  snapshots: true

  ## If true, query stats for cluster settings.
  ##
  cluster_settings: false

All these parameters can be tuned via the values.yaml file here.
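
As a sketch of how such overrides are typically combined, a custom values file might look like the following; the file name and the in-cluster Elasticsearch address are just examples, and serviceMonitor settings only apply if the Prometheus operator is used:

# my-values.yaml (example overrides; adjust to your environment)
es:
  uri: http://elasticsearch-master.default.svc:9200   # assumed in-cluster service address
  all: true
  indices: true

serviceMonitor:
  enabled: true
  labels:
    release: kps    # must match the operator's serviceMonitorSelector, as discussed above

It can then be applied with, for example, helm upgrade --install my-release prometheus-community/prometheus-elasticsearch-exporter -f my-values.yaml.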

Scrape the metrics

There are multiple ways to scrape the metrics, as discussed above. In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if the Prometheus operator is being used) to scrape the data from the Elasticsearch exporter. With this approach, multiple Elasticsearch servers can be scraped without altering the Prometheus configuration. Every Elasticsearch exporter comes with its own service monitor.

In the above-mentioned chart, a service monitor can be deployed by turning it on from the values.yaml file here.

serviceMonitor:
  ## If true, a ServiceMonitor CRD is created for a prometheus operator
  ## https://github.com/coreos/prometheus-operator
  ##
  enabled: false
  #  namespace: monitoring
  labels: {}
  interval: 10s
  scrapeTimeout: 10s
  scheme: http
  relabelings: []
  targetLabels: []
  metricRelabelings: []
  sampleLimit: 0

Update the annotation section here in case you are not using the Prometheus Operator.

service: 
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"

This concludes our review of the exporter for Elasticsearch! If you would like to discuss this or any other topic, please contact us via email to support@nexclipper.io. We will be back with more helpful exporter reviews and other tips very soon.