System components monitoring
Controller nodes are isolated by default, which means that cluster users cannot schedule workloads onto them.
k0s provides a mechanism to expose system components for monitoring. System component metrics give better insight into what is happening inside those components, and they are particularly useful for building dashboards and alerts. You can read more about metrics for Kubernetes system components in the Kubernetes documentation.
Note: This mechanism is opt-in. You can enable it at installation:
sudo k0s install controller --enable-metrics-scraper
Once enabled, a new set of objects will appear in the cluster:
❯ ~ kubectl get all -n k0s-system
NAME                                   READY   STATUS    RESTARTS   AGE
pod/k0s-pushgateway-6c5d8c54cf-bh8sb   1/1     Running   0          43h

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/k0s-pushgateway   ClusterIP   10.100.11.116   <none>        9091/TCP   43h

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/k0s-pushgateway   1/1     1            1           43h

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/k0s-pushgateway-6c5d8c54cf   1         1         1       43h
That alone is not enough to start scraping these additional metrics. For [Prometheus Operator](https://prometheus-operator.dev/) based solutions, you can create a ServiceMonitor for it like this:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: k0s
  namespace: k0s-system
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: k0s-observability
      component: pushgateway
      k0s.k0sproject.io/stack: metrics
Note that this won't clear alerts such as "KubeControllerManagerDown" or "KubeSchedulerDown", because they are based on Prometheus' internal "up" metric. You can get rid of these alerts by modifying them to detect a working component instead, like this:
absent(apiserver_audit_event_total{job="kube-scheduler"})
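For Prometheus Operator setups, the modified expression could be packaged as a PrometheusRule. The following is a minimal sketch, not a rule shipped by k0s: the object name, group name, `for` duration, labels, and annotation text are placeholder assumptions, and the namespace must be one that your Prometheus instance selects rules from. Only the expression is taken from the text above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: k0s-system-components      # hypothetical name
  namespace: k0s-system            # assumption: adjust to a namespace your Prometheus watches
spec:
  groups:
  - name: k0s-system-components    # hypothetical group name
    rules:
    - alert: KubeSchedulerDown
      # Fires when no scheduler metrics are present, instead of relying on "up"
      expr: absent(apiserver_audit_event_total{job="kube-scheduler"})
      for: 15m                     # assumed duration, tune to your needs
      labels:
        severity: critical         # assumed label
      annotations:
        summary: No kube-scheduler metrics have been received.
```

If your monitoring stack already ships a rule with the same alert name, you would typically disable or override that rule rather than run both side by side.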
Jobs
k0s scrapes the following components:
- kube-scheduler
- kube-controller-manager
- etcd
- kine
Note: kube-apiserver metrics are not scraped, since they are accessible via the kubernetes endpoint within the cluster.
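The alert expression shown earlier matches on a `job` label that carries the component name. When Prometheus scrapes a Pushgateway, a pushed `job` label that conflicts with the scrape job's own label is renamed to `exported_job` unless label honoring is enabled. If you see that renaming in your setup, one option is a variant of the ServiceMonitor with `honorLabels` set. This is a sketch under that assumption, not a configuration prescribed by k0s:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: k0s
  namespace: k0s-system
spec:
  endpoints:
  - port: http
    # Assumption: keep the job label pushed with the metrics
    # instead of letting Prometheus rename it to exported_job.
    honorLabels: true
  selector:
    matchLabels:
      app: k0s-observability
      component: pushgateway
      k0s.k0sproject.io/stack: metrics
```

With this in place, a selector such as `{job="etcd"}` would match the pushed series directly. Without it, the equivalent selector would use `exported_job`.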
Architecture
k0s uses a Pushgateway with a TTL so that problems with metrics delivery can be detected: if a component's metrics stop arriving, they expire instead of lingering as stale values. The default TTL is 2 minutes.