In this edition of our exporter review series, we will be introducing the RabbitMQ exporter, one of the best-fit exporters for monitoring metrics used by NexClipper. Read on to find out the most important metrics, recommended alert rules, as well as the related Grafana dashboard and Helm Chart for the RabbitMQ exporter.
About RabbitMQ
RabbitMQ is a widely adopted open source message broker. A message broker is software that enables applications, systems, and services to communicate with each other and exchange information.
RabbitMQ is lightweight, easy to deploy on premises and in the cloud, and able to handle millions of users and transactions. It can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements. It supports multiple messaging protocols – AMQP 1.0, MQTT, STOMP.
Since it is a mission-critical piece of software that binds applications together, monitoring it is a must. A RabbitMQ exporter is required to monitor and expose RabbitMQ metrics. The exporter queries RabbitMQ, scrapes the data, and exposes the metrics on a Kubernetes service endpoint that Prometheus can in turn scrape to ingest the time series data. For monitoring RabbitMQ we use an external Prometheus exporter, which is maintained by the Prometheus community. Once deployed, the RabbitMQ exporter scrapes a sizable set of metrics from RabbitMQ and gives users crucial information about the message broker that is difficult to obtain from RabbitMQ directly.
For this setup, we are using the Bitnami RabbitMQ Helm chart to start the cluster.
RabbitMQ has a built-in Prometheus plugin as well as an official Prometheus exporter – below we are explaining the setup of both.
RabbitMQ with Prometheus Exporter
How do you set up a RabbitMQ exporter for Prometheus?
With the latest version of Prometheus (2.33 as of February 2022), there are three ways to set up a Prometheus exporter:
Method 1 – Native
Supported by Prometheus since the beginning
To set up an exporter the native way, the Prometheus config needs to be updated to add the target.
A sample configuration:
# scrape_config job
- job_name: rabbitmq-staging
  scrape_interval: 45s
  scrape_timeout: 30s
  metrics_path: "/metrics"
  static_configs:
    - targets:
        - <RabbitMQ endpoint>
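For context, this job block sits under the scrape_configs key of the main Prometheus configuration file; here is a minimal sketch, with the exporter endpoint as a placeholder (the default exporter port 9419 is assumed):

global:
  scrape_interval: 1m
scrape_configs:
  # scrape_config job for the RabbitMQ exporter
  - job_name: rabbitmq-staging
    scrape_interval: 45s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - rabbitmq-exporter.monitor.svc:9419   # placeholder: exporter service endpoint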
Method 2 – Service Discovery
This method is applicable for Kubernetes deployments only
With this, a default scrape config can be added to the prometheus.yaml file and an annotation can be added to the exporter service. Prometheus will then automatically start scraping the data from services that carry the mentioned annotations and path.
prometheus.yaml
- job_name: kubernetes-services
  scrape_interval: 15s
  scrape_timeout: 10s
  kubernetes_sd_configs:
    - role: service
  relabel_configs:
    # Example relabel to scrape only endpoints that have
    # prometheus.io/scrape: "true" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # prometheus.io/path: "/scrape/path" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # prometheus.io/port: "80" annotation.
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
Exporter service:
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
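For clarity, here is a minimal sketch of how these annotations might sit on the exporter's Kubernetes Service; the name, namespace, and port are illustrative (9419 is the exporter's default port):

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter          # illustrative name
  namespace: monitor               # illustrative namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9419"
spec:
  selector:
    app: prometheus-rabbitmq-exporter
  ports:
    - name: rabbitmq-exporter
      port: 9419
      targetPort: 9419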
Method 3 – Prometheus Operator
Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from the exporters by setting up a service monitor Kubernetes object. A sample service monitor for RabbitMQ can be found here. These are the necessary steps:
Step 1
Add/update the Prometheus operator’s selectors. By default, the Prometheus operator comes with empty selectors, which will select every service monitor available in the cluster for scraping data.
To check your Prometheus configuration:
kubectl get prometheus -n <namespace> -o yaml
A sample output will look like this.
ruleNamespaceSelector: {}
ruleSelector:
  matchLabels:
    app: kube-prometheus-stack
    release: kps
scrapeInterval: 1m
scrapeTimeout: 10s
securityContext:
  fsGroup: 2000
  runAsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
serviceAccountName: kps-kube-prometheus-stack-prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
  matchLabels:
    release: kps
Here you can see that this Prometheus configuration selects all service monitors with the label release = kps.
So, if you are modifying the default Prometheus operator configuration for service monitor scraping, make sure you use the right labels in your service monitors as well.
Step 2
Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).
Sample configuration:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: rabbitmq-exporter
    meta.helm.sh/release-namespace: monitor
  creationTimestamp: "2022-04-04T10:22:52Z"
  generation: 1
  labels:
    app: prometheus-rabbitmq-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-rabbitmq-exporter-1.1.0
    heritage: Helm
    release: kps
  name: rabbitmq-exporter-prometheus-rabbitmq-exporter
  namespace: monitor
  resourceVersion: "86677099"
  uid: 55943299-a8ed-4553-9cdb-cc784176aea8
spec:
  endpoints:
    - interval: 15s
      port: rabbitmq-exporter
  selector:
    matchLabels:
      app: prometheus-rabbitmq-exporter
      release: rabbitmq-exporter
Here you can see that the service monitor carries the matching label release = kps that we specified in the Prometheus operator scraping configuration.
Metrics
The following are handpicked metrics that give insight into RabbitMQ operations with the RabbitMQ exporter.
- Server is up
As the name suggests, this metric will expose the state of the RabbitMQ process and whether it is up or down.
➡ The key of the exporter metric is “rabbitmq_up”.
➡ The value of the metric is a boolean – 1 or 0, where 1 means RabbitMQ is up and 0 means it is down.
- Overflowing queue
Queues are a fundamental component of any message broker. All messages that are getting pushed or read by RabbitMQ must belong to one of the queues.
Users would never want to choke the queue. If the queue is filled up to the maximum capacity, it can no longer accept new messages.
To get the total number of ready messages in the queues:
➡ The metric Key is “rabbitmq_queue_messages_ready_total”
➡ The value will be the number of messages, e.g. “rabbitmq_queue_messages_ready_total 157”
- Too many connections
RabbitMQ acts as a broker between a publisher and a subscriber. Every client to the queue opens a connection with RabbitMQ. Each new connection requires resources from the underlying machine and puts a burden on both hardware and software. Therefore, the number of connections to RabbitMQ should be limited to avoid any disruption of the service.
➡ The metric “rabbitmq_connectionsTotal” gives the total number of active connections on RabbitMQ
➡ The number should be calculated based on the resources allocated to the RabbitMQ service
- Active queue
As the name suggests, this metric gives insight into how many active queues are present in RabbitMQ and handling data.
A message can be enqueued (added) and dequeued (removed). It is important to monitor the active queues.
➡ The metric “rabbitmq_queuesTotal” exposes the number of active queues
- Total number of consumers
As the name suggests, this metric will provide insight into how many consumers a queue has. Consumers in RabbitMQ are those targets which consume the message from the queue.
➡ The metric “rabbitmq_consumersTotal” exposes the total number of active consumers on a queue
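As a quick illustration of how these metrics can be combined with PromQL, here is a sketch of a Prometheus recording-rule group built on the metrics above; the group and rule names are our own, not part of the exporter:

groups:
  - name: rabbitmq-exporter.recordings   # illustrative group name
    rules:
      # Memory used by a node as a percentage of its limit
      - record: rabbitmq:node_memory_used:percent
        expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100
      # Ready messages per active consumer (a rough backlog indicator;
      # only meaningful while at least one consumer is connected)
      - record: rabbitmq:messages_ready_per_consumer
        expr: rabbitmq_queue_messages_ready_total / rabbitmq_consumersTotal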
Alerting
After digging into all the valuable metrics, this section explains in detail how we can get critical alerts with the RabbitMQ exporter.
PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and shares little common ground with query languages used in other time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.
Prometheus comes with a built-in Alert Manager that is responsible for sending alerts (could be email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped by the exporter. Click here for a good source for community-defined alerts.
A general alert looks as follows:
- alert: <Alert Name>
  expr: <metric exported from the exporter> >/</==/<=/>= <value>
  for: <how long the condition must hold before the alert counts as firing>
  labels: <additional labels to be attached to the alert>
  annotations: <informational labels that can be used to store longer additional information>
Some of the recommended alerts for the RabbitMQ exporter:
- Alert – RabbitMQ is Down
- alert: RabbitmqDown
  expr: rabbitmq_up == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Rabbitmq down (instance {{ $labels.instance }})
    description: "RabbitMQ node down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- Alert – too many messages in the queue
- alert: RabbitmqTooManyMessagesInQueue
  expr: rabbitmq_queue_messages_ready_total > 1000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Rabbitmq too many messages in queue (instance {{ $labels.instance }})
    description: "Queue is filling up (> 1000 msgs)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- Alert – RabbitMQ running out of memory
- alert: RabbitmqOutOfMemory
  expr: rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100 > 90
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Rabbitmq out of memory (instance {{ $labels.instance }})
    description: "Memory available for RabbitMQ is low (< 10%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- Alert – Too many connections
- alert: RabbitmqTooManyConnections
  expr: rabbitmq_connections > 1000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Rabbitmq too many connections (instance {{ $labels.instance }})
    description: "The total connections of a node is too high\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- Alert – Too many consumers
- alert: RabbitmqTooManyConsumers
  expr: rabbitmq_consumersTotal > 1000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: High number of consumers (instance {{ $labels.instance }})
    description: "Number of consumers is too high (> 1000)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Dashboard
Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.
Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
Grafana supports community-driven dashboards for most of the widely used software, which can be imported directly from the Grafana community.
NexClipper uses a widely adopted community dashboard for the RabbitMQ exporter, which has a lot of useful panels.
What is a Panel?
Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range.
The values on the panel are queried using PromQL, which is Prometheus Query Language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate and apply arithmetic functions to the metrics, and then further visualize them on panels.
Here is an example panel for the RabbitMQ exporter:
Showing system up/down with other consumer-related information
Helm Chart
The RabbitMQ exporter, alert rules, and dashboard can be deployed in Kubernetes using a Helm chart. The Helm chart used for the deployment of the RabbitMQ exporter is taken from the Prometheus community and can be found here. To deploy this Helm chart for the RabbitMQ exporter, users can either follow the steps in the above link or the ones outlined below:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus-rabbitmq-exporter
Some of the common parameters that must be changed in the values file include:
rabbitmq.url: defines the RabbitMQ listening URL.
rabbitmq.user: the RabbitMQ connection user.
rabbitmq.password: the RabbitMQ password.
Additional parameters can be changed based on individual needs, such as include_queues, skip_queues, output format, timeouts, etc. All of these parameters can be tuned via the values.yaml file here.
capabilities: bert,no_sort
include_queues: ".*"
include_vhost: ".*"
skip_queues: "^$"
skip_verify: "false"
skip_vhost: "^$"
exporters: "exchange,node,overview,queue"
output_format: "TTY"
timeout: 30
max_queues: 0
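For example, to limit monitoring to a single vhost and skip auto-generated temporary queues, these values could be overridden as follows (the vhost name is purely illustrative):

rabbitmq:
  include_vhost: "^/orders$"      # hypothetical vhost to monitor
  include_queues: ".*"
  skip_queues: "^amq\\.gen-.*"    # skip auto-generated (temporary) queues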
In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if the Prometheus operator is being used) to tell Prometheus where to scrape the RabbitMQ exporter. With this approach, multiple RabbitMQ instances can be scraped without altering the Prometheus configuration; each RabbitMQ instance comes with its own service monitor.
In the above-mentioned chart, a service monitor can be deployed by turning it on from the values.yaml file here.
# or use the service monitor
prometheus:
  monitor:
    enabled: true
    additionalLabels:
      release: kps
    interval: 15s
    namespace: []
  rules:
    enabled: true
    additionalLabels:
      release: kps
      app: kube-prometheus-stack
A sample reference values file:
rabbitmq:
  url: http://ncmq-rabbitmq-hana.nc.svc.cluster.local:15672
  user: guest
  password: guest
  # If existingPasswordSecret is set then password is ignored
  existingPasswordSecret: ~
  existingPasswordSecretKey: password
  capabilities: bert,no_sort
  include_queues: ".*"
  include_vhost: ".*"
  skip_queues: "^$"
  skip_verify: "false"
  skip_vhost: "^$"
  exporters: "exchange,node,overview,queue"
  output_format: "TTY"
  timeout: 30
  max_queues: 0
## Additional labels to set in the Deployment object. Together with standard labels from
## the chart
additionalLabels: {}
podLabels: {}
# Either use Annotation
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9419"
# or use the service monitor
prometheus:
  monitor:
    enabled: true
    additionalLabels:
      release: kps
    interval: 15s
    namespace: []
  rules:
    enabled: true
    additionalLabels:
      release: kps
      app: kube-prometheus-stack
Update the annotations section here if not using the Prometheus operator.
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
Now let’s move on to the setup of the second way.
RabbitMQ with built-in Prometheus plugin
(no exporter needed)
Additionally, there is a solution to monitor RabbitMQ by using the built-in Prometheus plugin from RabbitMQ. Our recommendation is to use both options.
How to install plugin, choose official metrics, and set alerts
RabbitMQ version 3.8.0 and above ships with a built-in Prometheus metrics plugin that exposes all RabbitMQ metrics in Prometheus format at an endpoint Prometheus can scrape, either through auto-discovery or by creating a service monitor. To enable the RabbitMQ plugin via the Helm chart, set metrics.enabled to “true”.
helm install <release name> bitnami/rabbitmq --set metrics.enabled=true
More details about the plugin can be found here.
In the case of a standard Prometheus installation, once the plugin is enabled in RabbitMQ, annotations need to be added to RabbitMQ (if you are using the RabbitMQ chart they will be added automatically). Here are the annotations:
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
These annotations should be added on the pod level. Now Prometheus will automatically start scraping the data if the pod discovery is enabled.
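“On the pod level” means under the workload's pod template metadata, for example in the RabbitMQ StatefulSet; a sketch, assuming the built-in plugin's default port 15692:

# Pod-level annotations live under the pod template of the workload (e.g. the StatefulSet)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"
        prometheus.io/port: "15692"   # assumption: default port of the built-in Prometheus plugin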
Prometheus configuration for pod discovery:
- job_name: "kubernetes-pods"
kubernetes_sd_configs:
- role: pod
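This minimal job discovers every pod in the cluster; in practice, relabeling is usually added so that only annotated pods are scraped, mirroring the service-level configuration shown earlier. A sketch:

- job_name: "kubernetes-pods"
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Honour a custom metrics path from the prometheus.io/path annotation
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Honour a custom port from the prometheus.io/port annotation
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2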
In the case of the Prometheus operator, once the plugin is enabled in RabbitMQ, the service monitor needs to be enabled. For this, run the following command:
helm upgrade --install <release name> bitnami/rabbitmq --set metrics.enabled=true --set metrics.serviceMonitor.enabled=true
Once the service monitor is created, the Prometheus operator will start scraping the metrics automatically with the default configuration.
Some important metrics
- Server is up
As the name suggests, this metric will expose the state of the RabbitMQ process and whether it is up or down.
➡ The key of the exporter metric is “rabbitmq_up”.
➡ The value of the metric is a boolean – 1 or 0, where 1 means RabbitMQ is up and 0 means it is down.
- Cluster down
This metric exposes the state of the RabbitMQ cluster.
➡ The key of the exporter metric is “rabbitmq_running”
➡ The value of the metric is a number that symbolizes the number of nodes in the RabbitMQ cluster.
- Out of memory
The memory status of RabbitMQ is exposed through this metric.
➡ The keys of the exporter metrics are “rabbitmq_node_mem_used” and “rabbitmq_node_mem_limit”
➡ The values are numbers representing the memory used by the node and its memory limit
- Too many connections
RabbitMQ acts as a broker between a publisher and a subscriber. Every client to the queue opens a connection with RabbitMQ. Each new connection requires resources from the underlying machine and puts a burden on both hardware and software. Therefore, the number of connections to RabbitMQ should be limited to avoid any disruption of the service.
➡ The metric “rabbitmq_connectionsTotal” gives the total number of active connections on RabbitMQ
➡ The number should be calculated based on the resources allocated to the RabbitMQ service
- Cluster partitions down
This metric exposes the RabbitMQ partition status.
➡ The key of the exporter metric is “rabbitmq_partitions”
➡ The value of the metric is a number that represents the number of network partitions detected
Some critical alerts
- Alert – RabbitMQ Down
- alert: RabbitmqDown
  expr: rabbitmq_up{service="{{ template "rabbitmq.fullname" . }}"} == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Rabbitmq down (instance {{ "{{ $labels.instance }}" }})
    description: RabbitMQ node down
- Alert – RabbitMQ Cluster Down
- alert: ClusterDown
  expr: |
    sum(rabbitmq_running{service="{{ template "rabbitmq.fullname" . }}"})
    < {{ .Values.replicaCount }}
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster down (instance {{ "{{ $labels.instance }}" }})
    description: |
      Less than {{ .Values.replicaCount }} nodes running in RabbitMQ cluster
      VALUE = {{ "{{ $value }}" }}
- Alert – RabbitMQ Partition
- alert: ClusterPartition
  expr: rabbitmq_partitions{service="{{ template "rabbitmq.fullname" . }}"} > 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster partition (instance {{ "{{ $labels.instance }}" }})
    description: |
      Cluster partition
      VALUE = {{ "{{ $value }}" }}
- Alert – RabbitMQ is out of memory
- alert: OutOfMemory
  expr: |
    rabbitmq_node_mem_used{service="{{ template "rabbitmq.fullname" . }}"}
    / rabbitmq_node_mem_limit{service="{{ template "rabbitmq.fullname" . }}"}
    * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Out of memory (instance {{ "{{ $labels.instance }}" }})
    description: |
      Memory available for RabbitMQ is low (< 10%)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}
- Alert – Too many connections
- alert: TooManyConnections
  expr: rabbitmq_connectionsTotal{service="{{ template "rabbitmq.fullname" . }}"} > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Too many connections (instance {{ "{{ $labels.instance }}" }})
    description: |
      RabbitMQ instance has too many connections (> 1000)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}
Alerts can be enabled, disabled, altered, or added using the Helm chart here.
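As a sketch, assuming the Bitnami chart exposes the usual metrics.prometheusRule values, custom rules could be switched on and defined like this (the labels and the rule shown are illustrative):

metrics:
  prometheusRule:
    enabled: true
    additionalLabels:
      release: kps                 # match your operator's ruleSelector
    rules:
      - alert: RabbitmqDown
        expr: rabbitmq_up == 0
        for: 5m
        labels:
          severity: error
        annotations:
          summary: RabbitMQ node down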
Dashboard
This is the dashboard that has been used.
This concludes our discussion of the RabbitMQ exporter! If you have any questions, you can reach our team via support@nexclipper.io and stay tuned for further exporter reviews and tips coming soon.