Enhancing Kubernetes Monitoring with Kuberhealthy
Chapter 1: Introduction to Kuberhealthy
Kuberhealthy offers a powerful way to implement synthetic monitoring directly within your Kubernetes clusters, eliminating the need for costly third-party solutions.
Synthetic monitoring is an essential tool for detecting performance issues, verifying server availability, and checking DNS resolution, among other things. While many engineers rely on platforms like Datadog or New Relic for synthetic testing, Kuberhealthy lets you run the same kind of synthetic tests from inside your own Kubernetes environment.
This guide will explore how to deploy Kuberhealthy, configure it for use, create synthetic checks, and set up monitoring and alerting directly within your cluster.
Section 1.1: What is Kuberhealthy?
Kuberhealthy is an incubator project under the CNCF (Cloud Native Computing Foundation). This Kubernetes operator offers KuberhealthyCheck custom resources, enabling users to create both built-in and custom synthetic checks. These checks assess the performance of your cluster, its components, or even external services.
Before we can proceed with deployment, let’s ensure we meet the necessary prerequisites. Since Kuberhealthy exposes synthetic check outcomes as Prometheus metrics, a Prometheus stack must be operational within your cluster. If you already run applications on Kubernetes, you likely have a Prometheus stack in place. If not, or if you wish to follow along in a test environment, you can initiate a new Minikube cluster with the following commands:
minikube delete && minikube start \
  --kubernetes-version=v1.26.1 \
  --memory=6g \
  --bootstrapper=kubeadm \
  --extra-config=kubelet.authentication-token-webhook=true \
  --extra-config=kubelet.authorization-mode=Webhook \
  --extra-config=scheduler.bind-address=0.0.0.0 \
  --extra-config=controller-manager.bind-address=0.0.0.0

minikube addons disable metrics-server

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -f values.yaml
These commands create a new Minikube cluster with the flags required by the kube-prometheus-stack (Prometheus Operator) and then install the stack with Helm. During installation we also provide a values.yaml file containing the Alertmanager configuration that will be used to send Slack notifications for failing Kuberhealthy checks. The values.yaml file is available in this gist.
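If you just want the shape of that configuration, the Alertmanager part of values.yaml looks roughly like the following minimal sketch; the webhook URL and channel are placeholders you would replace with your own:

alertmanager:
  config:
    route:
      receiver: "slack"
    receivers:
      - name: "slack"
        slack_configs:
          - api_url: "https://hooks.slack.com/services/..."  # placeholder Slack incoming-webhook URL
            channel: "#alerts"                               # placeholder channel
            send_resolved: true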
To enhance your monitoring, you can also deploy a custom Grafana dashboard for Kuberhealthy, which can be found in the Kuberhealthy repository.
With these components in place, access Prometheus, Grafana, and Alertmanager using the following commands:
kubectl port-forward -n default svc/monitoring-kube-prometheus-prometheus 9090
kubectl port-forward -n default svc/monitoring-grafana 3000:80 # User: admin, Password: prom-operator
kubectl port-forward -n default svc/monitoring-kube-prometheus-alertmanager 9093
Now that we have Prometheus set up, let's proceed with deploying Kuberhealthy.
Section 1.2: Deploying Kuberhealthy
Kuberhealthy is also distributed as a Helm chart, which simplifies the deployment process. Add its chart repository and install it into its own namespace:

helm repo add kuberhealthy https://kuberhealthy.github.io/kuberhealthy/helm-repos
helm repo update
helm install -n kuberhealthy kuberhealthy kuberhealthy/kuberhealthy --create-namespace --values values.yaml

Here the values.yaml file adjusts the chart's configuration, including enabling the Prometheus integration and creating the associated ServiceMonitor and PrometheusRule resources.
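As a rough sketch, the Prometheus-related values might look like this; the exact key names can differ between chart versions, so verify them against the chart's default values before relying on them:

prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
    release: monitoring   # must match the release label your Prometheus instance selects
  prometheusRule:
    enabled: true
    release: monitoring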
After the deployment, you can check that Kuberhealthy is running by exposing its service locally:
kubectl port-forward -n kuberhealthy svc/kuberhealthy 8080:80
Then query its status endpoint:
curl localhost:8080 | jq .
If everything is working correctly, you should see a JSON response indicating the status of the Kuberhealthy cluster.
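Abridged, the response looks something like this (the field names follow Kuberhealthy's status page; the values are illustrative, and CheckDetails fills in as you add checks):

{
  "OK": true,
  "Errors": [],
  "CheckDetails": {},
  "CurrentMaster": "kuberhealthy-xxxxxxxxxx-xxxxx"
}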
In the video titled "What is KuberHealthy," you can gain insights into the core functionalities and benefits of using Kuberhealthy for monitoring your Kubernetes environments.
Section 1.3: Configuring Checks
Once Kuberhealthy is deployed and configured, you can start implementing your checks. Here’s an example of a basic ping check:
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: ping-check
  namespace: kuberhealthy
spec:
  runInterval: 30m
  timeout: 10m
  podSpec:
    containers:
      - env:
          - name: CONNECTION_TIMEOUT
            value: "10s"
          - name: CONNECTION_TARGET
            value: "tcp://google.com:443"
        image: kuberhealthy/network-connection-check:v0.2.0
        name: main
Each check is an instance of the KuberhealthyCheck custom resource. The key parameters are runInterval (how often the check runs), timeout (how long the check pod may take before it is considered failed), and podSpec (the pod that actually performs the check).
After deploying this check, you can inspect the logs of its check pod:
kubectl logs -n kuberhealthy ping-check-1679831147
You will see logs confirming the success of the check.
Next, you can deploy additional checks such as HTTP checks or SSL expiry checks. Here’s an example of an HTTP content check:
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: http-content-check
  namespace: kuberhealthy
spec:
  runInterval: 60s
  timeout: 2m
  podSpec:
    containers:
      - image: kuberhealthy/http-content-check:v1.5.0
        imagePullPolicy: IfNotPresent
        name: main
        env:
          - name: "TARGET_URL"
            value: "https://example.com"  # placeholder; point this at the page you want to fetch
          - name: "TARGET_STRING"
            value: "whatever"
          - name: "TIMEOUT_DURATION"
            value: "30s"
You can explore more built-in checks suitable for your needs, including checks for SSL certificate expiration.
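For example, an SSL expiry check could look roughly like the following. This is a sketch based on the kuberhealthy/ssl-expiry-check image from the Kuberhealthy repository; treat the image tag and environment variable names as assumptions to verify against that check's README:

apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: ssl-expiry-check
  namespace: kuberhealthy
spec:
  runInterval: 24h
  timeout: 15m
  podSpec:
    containers:
      - image: kuberhealthy/ssl-expiry-check:v3.2.0  # assumed tag; check the registry
        imagePullPolicy: IfNotPresent
        name: main
        env:
          - name: DOMAIN_NAME   # assumed variable names, per the check's docs
            value: "google.com"
          - name: PORT
            value: "443"
          - name: DAYS          # fail if the certificate expires within this many days
            value: "30"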
The video "Don't Catch Feelings, Catch Issues With Kuberhealthy" featuring Joshulyne Park & Shilla Saebi from Comcast dives deeper into practical implementations and strategies for effective monitoring.
Chapter 2: Monitoring and Alerts
With all checks operational, the next step is to monitor their outcomes using PromQL queries. In addition to the kuberhealthy_cluster_state and kuberhealthy_running metrics, Kuberhealthy provides metrics specifically for each check.
To ensure reliability, Kuberhealthy runs multiple replicas of its operator, so your queries may return multiple series. You can refine the queries to consider only the current master Pod of the Kuberhealthy deployment.
For example, the following query focuses on the ping check:
label_replace(kuberhealthy_check{check="kuberhealthy/ping-check"}, "current_master", "$1", "pod", "(.+)")
- on (current_master) group_left() topk(1, kuberhealthy_running{}) < 1
This query ties the result of the ping check to the current Kuberhealthy master Pod, giving you a single series to alert on. To be notified when the check fails, wrap it in a PrometheusRule:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: synthetics
  namespace: default
  labels:
    prometheus: prometheus
    release: monitoring
spec:
  groups:
    - name: synthetics
      rules:
        - alert: PingFailed
          expr: >
            label_replace(kuberhealthy_check{check="kuberhealthy/ping-check"}, "current_master", "$1", "pod", "(.+)")
            - on (current_master) group_left() topk(1, kuberhealthy_running{}) < 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: HTTP Ping failed
            description: "Kuberhealthy was not able to reach tcp://google.com:443"
These rules will trigger alerts when checks fail, allowing you to maintain oversight over your Kubernetes environment.
In conclusion, Kuberhealthy is a valuable tool for enhancing your Kubernetes monitoring capabilities, allowing you to implement synthetic checks that fill gaps left by traditional metrics. It provides a familiar interface via Kubernetes and Prometheus, ensuring you can monitor both your cluster and external services effectively.
If you're interested in creating your custom checks, further documentation is available to guide you through the process.
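At a high level, a custom check is just a container that runs your test logic and reports the result back to Kuberhealthy over HTTP; Kuberhealthy injects a reporting URL (the KH_REPORTING_URL environment variable) into every check pod. A minimal sketch of wiring up such a check, assuming a hypothetical my-registry/my-custom-check image you have published:

apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: my-custom-check
  namespace: kuberhealthy
spec:
  runInterval: 10m
  timeout: 5m
  podSpec:
    containers:
      - image: my-registry/my-custom-check:v1.0.0  # hypothetical image containing your check logic
        name: main
        # The container reads KH_REPORTING_URL (injected by Kuberhealthy) and POSTs
        # a JSON result of the form {"OK": true, "Errors": []} to it before the timeout.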
Want to Connect?
This article was originally posted at martinheinz.dev.