All about NexClipper’s observability architecture

What is NexClipper?

The NexClipper observability architecture is an OSS-based solution (built on Prometheus, among other projects) that provides a metric dashboard and log/trace exploration as its main functions to support fast incident resolution. Easy installation, operation automation, and a continuously growing set of free exporters keep operating costs low.

Let’s take a deep dive into NexClipper’s application architecture below!

Figure 1. Application architecture of NexClipper

NexClipper’s server consists of the following intuitive components: Guided Dashboard, Alert Hub, Incident Management, Group/User/Channel Management, Operation Management & Automation, and Billing & Payment Management.

In addition, NexClipper’s OSS tool sudoRy comes with a Helm script that installs the sudoRy client in a target Kubernetes cluster. Once the sudoRy client is installed on the customer’s cluster, users can run the dashboard immediately by installing and configuring various OSS projects, including Prometheus and Grafana.

ExporterHub, NexClipper’s exporter-aid platform, helps customers install additional exporters for their services. It provides not only recommended metrics but also curated alert rules and Grafana dashboards for monitoring those metrics.

Key application stacks of the NexClipper Observability Architecture

Proven OSS projects

The NexClipper observability architecture consists of best-in-class OSS projects that have been proven in their respective field:

  • Prometheus is an open-source monitoring solution and a graduated CNCF project that is widely used in the cloud-native industry and the Kubernetes ecosystem.
  • Grafana is the de facto standard for open-source monitoring, offering customizable dashboards with visualization tools as well as support for a wide range of databases.
  • Grafana Loki and Grafana Tempo are proven tools for storing and managing logs and traces at scale.
  • OpenTelemetry provides a single, open-source standard and set of technologies to capture and export traces from the cloud-native applications and infrastructure of users.

In addition, NexClipper’s open-source tools sudoRy (distributed resource management), NexClipper Cron (scheduler tool for APIs based on node-scheduler), and DS Switch (high-availability aid-tool for Prometheus) make up for shortcomings of these open-source projects by improving user experience and efficiency.

In this architecture, all OSS components that are installed on the user’s site are configured so that the system can continue to operate even if the NexClipper subscription is cancelled at a later point. So no worries: we don’t lock you in!

Long-term storage

The NexClipper observability architecture uses Cortex for the long-term storage of Prometheus. Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and will be installed on the user’s Kubernetes cluster. In the near future, NexClipper will also offer the option to choose cloud platforms in order to install Cortex outside of the user’s clusters.
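
As a rough illustration of how a Prometheus instance is usually pointed at Cortex, the sketch below builds a remote_write section as a plain Python dictionary. The push path /api/v1/push and the X-Scope-OrgID tenant header are standard Cortex conventions; the service address, the tenant name, and the helper function itself are assumptions made for this example, not NexClipper’s actual configuration.

```python
import json


def remote_write_config(cortex_url: str, tenant_id: str) -> dict:
    """Build the remote_write section of a Prometheus config as a plain dict."""
    return {
        "remote_write": [
            {
                # Cortex accepts Prometheus remote-write samples on /api/v1/push.
                "url": cortex_url.rstrip("/") + "/api/v1/push",
                # Cortex identifies the tenant through the X-Scope-OrgID header.
                "headers": {"X-Scope-OrgID": tenant_id},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical in-cluster service address and tenant name.
    config = remote_write_config("http://cortex.monitoring.svc:8080", "team-a")
    # Printed as JSON for inspection; the same keys map onto prometheus.yml.
    print(json.dumps(config, indent=2))
```

Keeping the tenant in a header rather than in the URL is what lets several teams or clusters share one Cortex installation while their series stay separated.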

Log and trace

To identify the root cause of incidents, which is the core goal of observability, the NexClipper observability architecture helps users collect and analyze application logs using Loki. The service further utilizes OpenTelemetry and Tempo to enable trace collection and analysis across distributed microservices. For log collection, users install Promtail, a log agent that feeds the logs to Loki.

To collect traces, users create and feed trace data through the OpenTelemetry library in their microservices. The Trace_id that links logs and traces is already set up by NexClipper, and with the “Explore” option in Grafana, users can easily analyze the correlation between logs and traces.

Figure 2. Detailed architecture of log & trace in NexClipper
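
To make the log-to-trace link more concrete, here is a minimal, standard-library-only sketch of the idea: a log line carries the trace ID as a key=value pair, and a regular expression recovers it so that a viewer can jump from the log to the matching trace. The ID format (32 lowercase hex characters) follows the W3C Trace Context convention used by OpenTelemetry; the field name trace_id, the log layout, and all function names are assumptions for illustration, and a real service would take the ID from the OpenTelemetry SDK rather than generate it by hand.

```python
import re
import secrets


def new_trace_id() -> str:
    """Return a random 16-byte trace ID as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def format_log(level: str, message: str, trace_id: str) -> str:
    """Render a logfmt-style line that carries the trace ID as key=value."""
    return f'level={level} msg="{message}" trace_id={trace_id}'


# Pattern a log viewer could use to pull the trace ID back out of a line.
TRACE_ID_PATTERN = re.compile(r"trace_id=([0-9a-f]{32})")


def extract_trace_id(line: str):
    """Return the trace ID found in a log line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    trace_id = new_trace_id()
    line = format_log("error", "payment failed", trace_id)
    print(line)
    print(extract_trace_id(line) == trace_id)
```

A pattern of this kind is what Grafana’s derived fields for a Loki data source rely on to turn a matched ID into a link to the corresponding trace in Tempo.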

MetricOps for Alert Hub, incident management, group/user/channel management and anomaly management

The NexClipper observability architecture provides sophisticated and intelligent notifications that offer practical help with problem solving. This includes a 360-degree view of incidents, and ultimately will include suggestions for solutions.

NexClipper collects alerts every minute and evaluates the alerts for selected targets using a Bayesian network model to forecast anomalies and notify users, so that they can take appropriate action to resolve possible incidents. The solution also suggests proactive actions that use NexClipper’s Kubernetes executor sudoRy to automate the deployment of resolution actions. The official release of MetricOps is scheduled for fall 2022.

Figure 3. Anomaly evaluation with alert rules
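
The article does not describe the actual model, so the following is only a toy sketch of the general idea behind Bayesian alert evaluation. It uses the simplest possible Bayesian network (one hidden "anomaly" node with conditionally independent alert nodes, i.e. naive Bayes) and made-up probabilities; the alert names, the numbers, and the function are all hypothetical and not NexClipper’s implementation.

```python
def anomaly_posterior(prior, likelihoods, observed):
    """Return P(anomaly | observed alerts) for a naive-Bayes style network.

    prior:       P(anomaly) before looking at any alert
    likelihoods: {alert: (P(fires | anomaly), P(fires | normal))}
    observed:    {alert: True if the alert fired in this window, else False}
    """
    p_anomaly = prior
    p_normal = 1.0 - prior
    for name, fired in observed.items():
        p_fire_anomaly, p_fire_normal = likelihoods[name]
        # Multiply in the probability of what was actually observed.
        p_anomaly *= p_fire_anomaly if fired else 1.0 - p_fire_anomaly
        p_normal *= p_fire_normal if fired else 1.0 - p_fire_normal
    total = p_anomaly + p_normal
    # If the evidence is impossible under both hypotheses, fall back to the prior.
    return prior if total == 0 else p_anomaly / total


if __name__ == "__main__":
    # Hypothetical alert rules with illustrative probabilities.
    likelihoods = {"HighCPU": (0.8, 0.1), "HighLatency": (0.7, 0.05)}
    both = anomaly_posterior(0.05, likelihoods, {"HighCPU": True, "HighLatency": True})
    none = anomaly_posterior(0.05, likelihoods, {"HighCPU": False, "HighLatency": False})
    print(f"both alerts fired: {both:.3f}")
    print(f"no alert fired:    {none:.3f}")
```

The point of the sketch is that several weak signals observed together can push the estimated anomaly probability far above the prior, which is what makes a combined evaluation more useful than notifying on each alert in isolation.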

ExporterHub

ExporterHub, NexClipper’s exporter-aid platform, has been developed to provide information about best-practice exporters to both customers and the wider community. From the more than 10,000 exporters on GitHub, ExporterHub selects qualified exporters and, through continuous curation, provides them together with an introduction to key metrics, alert rules, Grafana dashboards, and values for the Helm chart used to install them. NexClipper Observability users can automate the installation of the corresponding exporters, the alert configurations, and the Grafana dashboards directly via the user interface. NexClipper aims to continuously review and provide qualified exporters with best-practice alert configurations and dashboards so that they always stay up to date.

Guided Dashboard with Grafana

NexClipper provides observability dashboards that connect to Grafana’s dashboards, so that users can work through them in the manner of a guided tour while keeping the OSS advantages of Grafana.

NexClipper’s guided dashboards provide a bird’s-eye view of the system topology with health status information so that users can see everything that is happening at a glance. Further, a hierarchical list and the status of nodes and microservices under a cluster are displayed. Detailed monitoring can then be done by following a link that opens the Grafana dashboard.

Figure 4. Guided Dashboard
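
One simple way to picture the hierarchical status list is a tree in which every parent shows the worst status found anywhere below it. The sketch below illustrates that roll-up; the status names, the severity ordering, and the Component class are assumptions for this example and do not describe how the Guided Dashboard is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity ordering: a higher number means a worse state.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}


@dataclass
class Component:
    """A cluster, node, or microservice in the topology tree."""
    name: str
    status: str = "healthy"
    children: List["Component"] = field(default_factory=list)

    def rolled_up_status(self) -> str:
        """Return the worst status among this component and its descendants."""
        statuses = [self.status] + [c.rolled_up_status() for c in self.children]
        return max(statuses, key=SEVERITY.__getitem__)


def render(component: Component, depth: int = 0) -> List[str]:
    """Render the tree as an indented list of 'name: status' lines."""
    lines = [f"{'  ' * depth}{component.name}: {component.rolled_up_status()}"]
    for child in component.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    cluster = Component("cluster-a", children=[
        Component("node-1", children=[Component("checkout", "critical")]),
        Component("node-2", "warning"),
    ])
    print("\n".join(render(cluster)))
```

Rolling the status up this way lets the top-level view flag a cluster as soon as any microservice beneath it degrades, and the indented list shows where to drill down.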

Distributed Kubernetes Executor – sudoRy

sudoRy is responsible for remotely managing distributed Kubernetes clusters. This function is essential for low-cost and error-free management of distributed IT resources in a cloud-native environment. Installation, upgrade, and continuous operation of NexClipper are performed automatically through a predefined service catalog and can be executed regardless of the type and size of the target.

Figure 5. sudoRy application architecture

This concludes our introduction to NexClipper’s observability architecture. If you would like to discuss this topic further or see a demo, feel free to contact us. We are always happy to assist with any questions you may have.