# Kubernetes AI conformance
This document summarizes how k0s satisfies the Kubernetes AI Conformance requirements for Kubernetes v1.35. It is a reviewer-facing evidence page built from captured test runs on k0s v1.35.3+k0s.0.
## Test environment
The evidence below was captured on a two-node k0s cluster running on Azure:
- k0s v1.35.3+k0s.0
- Kubernetes v1.35.3
- One controller node and one NVIDIA T4 worker node
- GPU Operator v26.3.1 for NVIDIA runtime, device plugin, and DCGM Exporter
## DRA support
Requirement: Dynamic Resource Allocation APIs must be available and usable for accelerator allocation.
Status on k0s: Implemented.
k0s v1.35.3+k0s.0 ships Kubernetes v1.35.3, where DRA is available at resource.k8s.io/v1 and enabled by default.
We validated the full DRA flow with the upstream dra-example-driver: a ResourceClaimTemplate was created, the scheduler allocated a simulated GPU device, and the consuming Pod received the allocated device metadata inside the container.
This shows not only that the API surface exists, but that the allocation path is functional end to end on k0s.
Minimal example:

```console
$ kubectl api-resources --api-group=resource.k8s.io
NAME                     APIVERSION           KIND
deviceclasses            resource.k8s.io/v1   DeviceClass
resourceclaims           resource.k8s.io/v1   ResourceClaim
resourceclaimtemplates   resource.k8s.io/v1   ResourceClaimTemplate
resourceslices           resource.k8s.io/v1   ResourceSlice

$ kubectl -n dra-demo get resourceclaim
NAME                     STATE
dra-demo-pod-gpu-nd7zh   allocated,reserved

$ kubectl logs -n dra-demo dra-demo-pod | grep GPU_DEVICE_0
GPU_DEVICE_0="gpu-0"
```
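The claim flow above can be sketched as a manifest pair. This is an illustrative sketch, not the exact manifests from the test run: the namespace, object names, and the `gpu.example.com` DeviceClass follow the upstream dra-example-driver demo and may differ in your setup.

```yaml
# Sketch only: names and the DeviceClass are assumptions based on
# the upstream dra-example-driver demo.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
  namespace: dra-demo
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo-pod
  namespace: dra-demo
spec:
  containers:
  - name: ctr
    image: ubuntu:24.04
    command: ["bash", "-c", "env | grep -i gpu && sleep 3600"]
    resources:
      claims:
      - name: gpu          # consumes the claim declared below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

The scheduler instantiates a ResourceClaim from the template, allocates a device from a matching ResourceSlice, and the driver injects the device metadata into the consuming container.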
Further reading:
## Driver runtime management
Requirement: The platform should support installation and management of accelerator drivers and runtime components.
Status on k0s: Implemented.
k0s supports NVIDIA driver and runtime management through the NVIDIA GPU Operator.
On the test cluster, the operator installed the NVIDIA driver, configured the NVIDIA runtime for containerd, and enabled GPU access for workloads that opt into the nvidia RuntimeClass.
This is the standard upstream integration path for NVIDIA-backed Kubernetes clusters and works on k0s with containerd runtime configuration.
Minimal example:

```console
$ kubectl -n gpu-operator exec ds/nvidia-driver-daemonset -- \
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
580.126.20

$ kubectl logs cuda-smoketest
NVIDIA-SMI 580.126.20    Driver Version: 580.126.20

$ kubectl get runtimeclass
NAME         HANDLER
nvidia       nvidia
nvidia-cdi   nvidia-cdi
```
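A workload opts into the NVIDIA runtime by naming the RuntimeClass and requesting a GPU. The sketch below is illustrative; the Pod name matches the smoke test above, but the image tag is an assumption, not the exact manifest from the test run.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoketest
spec:
  runtimeClassName: nvidia          # containerd handler installed by the GPU Operator
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1           # advertised by the NVIDIA device plugin
```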
Further reading:
## GPU sharing
Requirement: The platform should support mechanisms to share a single physical accelerator among multiple workloads.
Status on k0s: Implemented.
k0s supports GPU sharing through NVIDIA device plugin time-slicing.
On the test cluster, a single Tesla T4 was reconfigured to advertise four schedulable nvidia.com/gpu replicas, and four GPU-requesting Pods ran concurrently on the same physical GPU.
T4 hardware does not support MIG, so time-slicing is the applicable sharing mechanism on this setup.
This demonstrates a real sharing mode on hardware commonly used for inference and smaller training workloads.
Minimal example:

```console
$ kubectl get node k0s-gpu-0 -o jsonpath='{.status.capacity.nvidia\.com/gpu}{"\n"}'
4

$ kubectl get pods -l app=shared-gpu -o wide
NAME           READY   STATUS    NODE
shared-gpu-1   1/1     Running   k0s-gpu-0
shared-gpu-2   1/1     Running   k0s-gpu-0
shared-gpu-3   1/1     Running   k0s-gpu-0
shared-gpu-4   1/1     Running   k0s-gpu-0

$ kubectl get node k0s-gpu-0 -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.sharing-strategy}{"\n"}'
time-slicing
```
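Time-slicing is configured through the device plugin. The following is a minimal sketch of the pattern documented for the NVIDIA GPU Operator, with a hypothetical ConfigMap name; the operator's ClusterPolicy must then reference this config via `devicePlugin.config`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config      # hypothetical name
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4            # one physical GPU advertised as 4 schedulable units
```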
Further reading:
## Virtualized accelerator
Requirement: The platform should support virtualized accelerators where available.
Status on k0s: Not Implemented.
k0s does not ship a packaged vGPU integration. Virtualized GPU support depends on NVIDIA vGPU licensing and hardware-specific enablement outside the scope of the k0s distribution. Users can integrate upstream NVIDIA vGPU components on suitable infrastructure, but that path was not demonstrated in this submission.
Further reading:
## AI inference
Requirement: A working Kubernetes Gateway API implementation must support inference traffic management.
Status on k0s: Implemented.
k0s supports Gateway API resources and standard Gateway implementations.
We installed Gateway API v1.5.1 CRDs with Envoy Gateway v1.7.2, then configured a GatewayClass, Gateway, and HTTPRoute to route /predict traffic to a mock inference backend.
Requests matching the configured prefix reached the backend, while non-matching requests were rejected by Envoy.
This provides concrete evidence that k0s can host an inference ingress stack with route attachment and path-based traffic management.
Minimal example:

```console
$ kubectl get gatewayclass eg
NAME   CONTROLLER                                      ACCEPTED
eg     gateway.envoyproxy.io/gatewayclass-controller   True

$ curl -si http://localhost:8888/predict
HTTP/1.1 200 OK
x-app-name: http-echo
{"prediction":0.87}

$ curl -si http://localhost:8888/
HTTP/1.1 404 Not Found
```
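The path-based routing above can be expressed with an HTTPRoute along these lines. The route name, the Gateway it attaches to, and the backend Service name and port are illustrative assumptions, not the exact manifests from the test run.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route          # hypothetical name
spec:
  parentRefs:
  - name: inference-gateway      # hypothetical Gateway name
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /predict          # only matching requests reach the backend
    backendRefs:
    - name: http-echo            # mock inference backend Service
      port: 8080                 # assumed Service port
```

Requests outside the `/predict` prefix fall through to no route and are rejected by Envoy with 404, as shown above.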
Further reading:
## Gang scheduling
Requirement: The platform must support all-or-nothing scheduling for distributed workloads.
Status on k0s: Implemented.
k0s supports gang scheduling with Volcano. We installed Volcano v1.14.1 and demonstrated both sides of gang semantics on k0s: a two-task job that fit was bound atomically, while a five-task job that could not fit was kept pending with no partial placement.
Minimal example:

```console
$ kubectl get vcjob gang-demo
NAME        STATUS      MINAVAILABLE
gang-demo   Completed   2

$ kubectl get vcjob gang-overflow
NAME            STATUS    MINAVAILABLE
gang-overflow   Pending   5

$ kubectl get pods -l volcano.sh/job-name=gang-overflow
No resources found in default namespace.
```
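A gang-scheduled Volcano Job has roughly this shape. The image and task command are illustrative; `minAvailable` is what enforces all-or-nothing placement.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: gang-demo
spec:
  schedulerName: volcano
  minAvailable: 2            # gang size: bind all 2 tasks or none
  tasks:
  - name: worker
    replicas: 2
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: worker
          image: busybox:1.36          # assumed image
          command: ["sh", "-c", "echo ready && sleep 10"]
```

With `minAvailable` larger than available capacity (as in `gang-overflow`), Volcano keeps the whole job pending rather than binding a subset of its tasks.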
Further reading:
## Cluster autoscaling
Requirement: The platform must support scaling accelerator-capable node groups through Kubernetes cluster autoscaling.
Status on k0s: Implemented.
k0s is a Kubernetes distribution and does not bundle its own autoscaler.
Instead, k0s works with the upstream Kubernetes Cluster Autoscaler and the infrastructure-specific provisioning layer behind the cluster, such as Azure VMSS, AWS Auto Scaling Groups, GCP Managed Instance Groups, or Cluster API.
Accelerator-aware scale-up is driven by pending Pods that request nvidia.com/gpu and by the GPU resources advertised through the NVIDIA device plugin.
This is the same model used by other conformant distributions: k0s provides the standard Kubernetes substrate, while autoscaling is supplied by the provisioning layer chosen for the cluster.
Reference configuration:

```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set cloudProvider=azure
```
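Accelerator-aware scale-up is ultimately driven by unschedulable Pods. A hypothetical sketch of such a Pod follows; the image is an assumption, and `nvidia.com/gpu.present` is a node label applied by GPU feature discovery in GPU Operator setups.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-trigger            # hypothetical name
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
  - name: train
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1      # pending request that triggers scale-up
```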
Further reading:
## Pod autoscaling
Requirement: Horizontal Pod Autoscaler must function correctly for Pods that use accelerators.
Status on k0s: Implemented.
k0s bundles metrics-server, so HPA works out of the box on a default cluster.
We deployed a GPU-requesting Deployment and an autoscaling/v2 HPA targeting CPU utilization.
Under load, HPA increased the Deployment replica count from one to two; the second Pod remained pending because the scheduler correctly enforced single-GPU node capacity.
This is the expected and correct interaction: autoscaling decisions are made from workload metrics, while accelerator capacity is still enforced independently by the scheduler.
Minimal example:

```console
$ kubectl get hpa gpu-burner
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS
gpu-burner   Deployment/gpu-burner   cpu: 497%/50%   1         2         2

$ kubectl get pods -l app=gpu-burner
NAME                          READY   STATUS
gpu-burner-759c8596bf-7gzvr   1/1     Running
gpu-burner-759c8596bf-k8d4x   0/1     Pending
```
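An autoscaling/v2 HPA consistent with the output above can be sketched as follows; the target Deployment name and thresholds are taken from the example output.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-burner
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-burner
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale out when average CPU exceeds 50%
```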
Further reading:
## Accelerator metrics
Requirement: The platform must expose fine-grained accelerator metrics through a standard, machine-readable endpoint.
Status on k0s: Implemented.
k0s supports accelerator metrics through NVIDIA DCGM Exporter, deployed by the GPU Operator.
We integrated DCGM Exporter with kube-prometheus-stack using a ServiceMonitor and verified that Prometheus scraped per-GPU metrics with labels such as UUID, model, PCI bus ID, node name, and driver version.
The important part here is not just that metrics exist, but that they are queryable through the standard Prometheus API with enough labels to identify a specific GPU on a specific node.
Minimal example:

```console
$ curl -s 'http://localhost:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL'
"__name__":"DCGM_FI_DEV_GPU_UTIL"
"UUID":"GPU-df2bfa0a-183e-d378-e4ab-1042a1736a51"
"modelName":"Tesla T4"
"Hostname":"k0s-gpu-0"
"pci_bus_id":"00000001:00:00.0"
```
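The scrape path from DCGM Exporter into Prometheus is a ServiceMonitor. In the following sketch the `release` label, namespace, selector, and port name are assumptions that depend on how the GPU Operator and kube-prometheus-stack were installed.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: gpu-operator
  labels:
    release: kube-prometheus-stack   # must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter      # assumed Service label
  namespaceSelector:
    matchNames: [gpu-operator]
  endpoints:
  - port: gpu-metrics                # assumed port name
    interval: 30s
```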
Further reading:
## AI service metrics
Requirement: The platform must discover and collect workload metrics in a standard format.
Status on k0s: Implemented.
k0s supports workload metrics collection through Prometheus Operator compatible monitoring stacks.
We installed kube-prometheus-stack, created a ServiceMonitor for a sample inference-shaped workload, and verified that Prometheus discovered the target and stored its metrics with workload-identifying labels.
This demonstrates the standard collection path most AI services use in practice: workload metrics exposed on /metrics, discovered by ServiceMonitor, and stored in Prometheus.
Minimal example:

```console
$ curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=up{job="sample-inference"}'
"job":"sample-inference"
"namespace":"default"
"service":"sample-inference"
"value":[..., "1"]

$ curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=node_cpu_seconds_total{job="sample-inference"}'
"__name__":"node_cpu_seconds_total"
```
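The ServiceMonitor for the sample workload looks roughly like this; the `release` label and port name are assumptions that must match your Prometheus instance's selector and the workload's Service.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-inference
  labels:
    release: kube-prometheus-stack   # assumed Prometheus label selector
spec:
  selector:
    matchLabels:
      app: sample-inference          # assumed Service label
  endpoints:
  - port: metrics                    # Service port exposing /metrics
    path: /metrics
```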
Further reading:
## Secure accelerator access
Requirement: Accelerator access must be isolated so that only authorized workloads can use the device.
Status on k0s: Implemented.
k0s relies on the standard Kubernetes device plugin model for GPU isolation.
On the test cluster, a Pod without a GPU request and without the nvidia RuntimeClass had no access to NVIDIA tooling or device nodes.
When two Pods requested one GPU each on a single-GPU node, the scheduler admitted one and kept the second pending with Insufficient nvidia.com/gpu.
Together, these checks show both isolation-by-default and capacity enforcement for accelerator access on k0s.
Minimal example:

```console
$ kubectl logs no-gpu-request
sh: 1: nvidia-smi: not found
NO_GPU_ACCESS

$ kubectl get pods gpu-hog-1 gpu-hog-2
NAME        READY   STATUS
gpu-hog-1   1/1     Running
gpu-hog-2   0/1     Pending
```
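The isolation probe can be sketched as a Pod with no GPU request and no `runtimeClassName`; the image and command below are assumptions consistent with the log output above, not the exact manifest from the test run.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-gpu-request
spec:
  # No runtimeClassName and no nvidia.com/gpu request: the container
  # should see neither NVIDIA tooling nor GPU device nodes.
  restartPolicy: Never
  containers:
  - name: probe
    image: ubuntu:24.04          # assumed image
    command: ["sh", "-c", "nvidia-smi || echo NO_GPU_ACCESS"]
```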
Further reading:
## Robust controller
Requirement: The platform must be able to run at least one complex AI operator with CRDs and reliable reconciliation behavior.
Status on k0s: Implemented.
k0s supports complex AI operators such as KubeRay.
We installed KubeRay v1.6.1, reconciled a RayCluster into head and worker Pods plus supporting Services, ran a distributed Ray task successfully, and then deleted the head Pod to verify that the operator recreated it and restored the cluster to a ready state.
This goes beyond a basic install check: it shows CRD registration, reconciliation of child resources, a successful workload, and recovery after disruption.
Minimal example:

```console
$ kubectl get raycluster raycluster-demo -o wide
NAME              DESIRED WORKERS   AVAILABLE WORKERS   STATUS
raycluster-demo   1                 1                   ready

$ kubectl get events --field-selector involvedObject.kind=RayCluster \
    --sort-by=.lastTimestamp
CreatedHeadPod
CreatedWorkerPod
DeletedHeadPod
CreatedHeadPod

$ kubectl exec -i "$HEAD_POD" -- python -
Result: [0, 1, 4, 9, 16]
Cluster resources: {'CPU': 2.0, ...}
```
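A minimal RayCluster of the shape reconciled above can be sketched as follows; the image tag and group sizing are illustrative assumptions, not the exact manifest from the test run.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-demo
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.46.0   # assumed tag
  workerGroupSpecs:
  - groupName: workers
    replicas: 1
    minReplicas: 1
    maxReplicas: 1
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.46.0   # assumed tag
```

The KubeRay operator reconciles this single resource into the head Pod, worker Pod, and Services observed in the events above, and recreates the head Pod if it is deleted.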
Further reading: