How GenAI agents can help with multi-step cluster diagnostics on Kubernetes
This article was co-authored with Shankar Ganeshan. You can find him on LinkedIn here.
k8x.

Symptom | Usual manual steps |
---|---|
Pods in CrashLoopBackOff | • kubectl get pods -A → copy failing names • kubectl logs … (repeat per pod & container) • grep for hints, cross-check ConfigMaps |
503s on an Ingress | • Check Service endpoints → kube-proxy rules → NetworkPolicy • Compare readiness probes & resource pressure |
“myservice works on most nodes” | • Inspect taints/labels, daemonsets, CNI logs • Describe nodes for kernel versions & allocatable |
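
The CrashLoopBackOff row above can be sketched as a short Python script. This is only an illustration of the manual steps, not how k8x is implemented: the function names (`fetch_pods`, `crashlooping_containers`, `log_commands`) and the `SAMPLE` data are hypothetical. It relies on the `containerStatuses[].state.waiting.reason` field of the pod status and on the standard `kubectl logs` flags `-n`, `-c`, `--previous`, and `--tail`.

```python
import json
import subprocess


def fetch_pods():
    """Return the parsed output of `kubectl get pods -A -o json` (needs cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def crashlooping_containers(pod_list):
    """Yield (namespace, pod, container) for containers waiting in CrashLoopBackOff."""
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                yield meta["namespace"], meta["name"], cs["name"]


def log_commands(pod_list, tail=50):
    """Build one `kubectl logs` argv per crash-looping container."""
    return [
        ["kubectl", "logs", "-n", ns, pod, "-c", container,
         "--previous", f"--tail={tail}"]
        for ns, pod, container in crashlooping_containers(pod_list)
    ]


# Hand-written sample in the shape of a PodList, so the sketch runs without a cluster.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "cart-7d9f"},
            "status": {"containerStatuses": [
                {"name": "api",
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
                {"name": "sidecar", "state": {"running": {}}},
            ]},
        },
        {
            "metadata": {"namespace": "shop", "name": "web-5c2a"},
            "status": {"containerStatuses": [
                {"name": "web", "state": {"running": {}}},
            ]},
        },
    ]
}

for argv in log_commands(SAMPLE):
    print(" ".join(argv))
```

Against a real cluster, `log_commands(fetch_pods())` would produce the per-container commands that the table describes copying by hand.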
kubectl get …, kubectl describe …, maybe kubectl top ….

k8x works in your console with your current kubectl configuration to run autonomous, multi-step workflows that detect and troubleshoot Kubernetes issues using your credentials.
Principle | Experience | Why it builds trust |
---|---|---|
Read-only by default (v0.1) | Zero risk of deletions; mirrors commands you could type | Safely trial AI before granting mutate rights ([GitHub][3]) |
Plain kubectl under the hood | Familiar audit trail; works anywhere your kubeconfig does | No proprietary sidecars or admission webhooks |
Multi-LLM back-end | Select OpenAI, Claude, or Gemini with k8x configure | Avoid vendor lock-in; keep traffic in-house |
Command history & undo | k8x history list shows past sessions | Auditors see exactly what ran; SREs replay in staging |
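
To make the "read-only by default" row concrete, here is a minimal sketch of an allowlist check that an agent could apply before running a proposed command. It is an assumption about how such a guard might look, not k8x's actual implementation: `READ_ONLY_VERBS`, `VALUE_FLAGS`, `kubectl_verb`, and `is_read_only` are hypothetical names, and the verb list is deliberately short.

```python
import shlex

# kubectl subcommands that only read cluster state (an illustrative subset).
READ_ONLY_VERBS = {
    "get", "describe", "logs", "top", "explain",
    "version", "api-resources", "api-versions", "cluster-info",
}

# Global flags that consume the following token when written without "=".
VALUE_FLAGS = {"-n", "--namespace", "--context", "--kubeconfig", "--cluster", "--user"}


def kubectl_verb(command):
    """Return the kubectl subcommand in `command`, or None if it is not a kubectl call."""
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return None
    if not argv or argv[0] != "kubectl":
        return None
    i = 1
    while i < len(argv):
        token = argv[i]
        if not token.startswith("-"):
            return token  # first non-flag token is the verb
        # Skip the flag, plus its value when the flag takes one.
        i += 2 if token in VALUE_FLAGS else 1
    return None


def is_read_only(command):
    """True only for kubectl commands whose verb is on the allowlist."""
    return kubectl_verb(command) in READ_ONLY_VERBS


for cmd in ["kubectl get pods -A", "kubectl -n prod delete pod web-1", "rm -rf /tmp/x"]:
    print(cmd, "->", "allow" if is_read_only(cmd) else "deny")
```

An allowlist fails closed: any verb the guard does not recognize is denied, which suits a tool that has to earn trust before it is given write access. The accepted command should then be executed as an argument list rather than through a shell, so that characters such as `;` are never interpreted.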
kubectl queries.

Tool | Domain | Autonomy | Local-first? | Write access |
---|---|---|---|---|
GitHub Copilot Chat | Code / CI | Suggests fixes, runs queries in UI | No | Optional PR commits |
Claude Code | Code & CLI automation | Plans & edits files | Yes | Yes (file edits) |
Goose | Multi-agent dev tasks | Runs terminal commands | Yes | Yes |
k8x (v0.1) | Kubernetes operations | Plans & executes kubectl reads | Yes | No (read-only) |
kubectl diff) and let humans --approve.
terraform plan or aws eks update-kubeconfig as sub-steps.
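
The pairing of kubectl diff with a human --approve step can be sketched as a small gate. This is a guess at the shape of such a feature, not a description of k8x: `next_step` and its `approved` parameter are hypothetical, while `kubectl diff -f` and `kubectl apply -f` are standard kubectl commands.

```python
def next_step(manifest_path, approved=False):
    """Return the kubectl argv to run next for a proposed change.

    Without approval the agent may only preview the change with `kubectl diff`;
    the mutating `kubectl apply` is returned only after a human has approved.
    """
    if approved:
        return ["kubectl", "apply", "-f", manifest_path]
    return ["kubectl", "diff", "-f", manifest_path]


print(" ".join(next_step("deploy.yaml")))
print(" ".join(next_step("deploy.yaml", approved=True)))
```

When running the preview, note that `kubectl diff` exits with status 1 when it finds differences, so a wrapper should not treat that exit code as a failure.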