This article was co-authored with Shankar Ganeshan.

When setting up a Kubernetes cluster, DevOps and platform engineers can feel like they’re navigating a maze. Take a simple service deployment: you might have to check the deployment (`kubectl get deployment`) and the service (`kubectl get service`), inspect the pod’s events (`kubectl describe pod`), read the logs (`kubectl logs pod-name`), and even review Ingress rules (`kubectl get ingress`). Each step requires context and copy-pasting names and IDs from one command into the next. A single typo can leave you scratching your head. Worse, incomplete information can lead you down the wrong path and waste hours. The complexity of multi-step diagnostics, especially when things go wrong, can be overwhelming.
Why Kubernetes troubleshooting still hurts and why we built k8x
Even veteran SREs know the familiar pain cycle:
Symptom | Usual manual steps |
---|---|
Pods in CrashLoopBackOff | • kubectl get pods -A → copy failing names • kubectl logs … (repeat per pod & container) • grep for hints, cross-check ConfigMaps |
503s on an Ingress | • Check Service endpoints → kube-proxy rules → NetworkPolicy • Compare readiness probes & resource pressure |
“myservice works on most nodes” | • Inspect taints/labels, DaemonSets, CNI logs • Describe nodes for kernel versions & allocatable resources |
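The first row of that table is mostly mechanical. As a minimal sketch in plain Python (not k8x code), the “copy failing names” step can be scripted against the JSON that `kubectl get pods -A -o json` returns: it walks each pod’s `status.containerStatuses` and builds one read-only `kubectl logs --previous` command per container stuck in `CrashLoopBackOff`. The function names are hypothetical; the field names follow the Kubernetes Pod status schema.

```python
import json
import subprocess
from typing import Any


def fetch_all_pods() -> dict[str, Any]:
    """Read every pod in every namespace using the current kubeconfig context."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def crashlooping_containers(pod_list: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (namespace, pod, container) for each container currently
    waiting with reason CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # containerStatuses can be absent (e.g. for Pending pods), so default to [].
        for status in pod.get("status", {}).get("containerStatuses") or []:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append((meta.get("namespace", "default"),
                             meta.get("name", ""),
                             status.get("name", "")))
    return hits


def logs_commands(hits: list[tuple[str, str, str]]) -> list[list[str]]:
    """Build one read-only `kubectl logs` per failing container; --previous
    shows output from the last terminated instance, where the crash happened."""
    return [
        ["kubectl", "logs", "-n", ns, pod, "-c", container, "--previous"]
        for ns, pod, container in hits
    ]
```

Running `for cmd in logs_commands(crashlooping_containers(fetch_all_pods())): print(" ".join(cmd))` prints the commands you would otherwise assemble by hand, one per pod and container.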
The agentic leap: from suggestion to orchestrated review
Recent agents like GitHub Copilot Chat (in VS Code with terminal access), Claude Code (terminal-native edits), and Goose showed a new pattern: the LLM drives an interactive loop, executing safe commands autonomously and then narrating the findings. General-purpose LLM helpers (ChatGPT, Claude, Copilot Chat) can go beyond suggesting commands: they can take over the copy-pasting, re-run commands, and stitch the results together. k8x applies this agentic idea to Kubernetes:

- Natural-language prompts, e.g. “Find pods that aren’t ready and tell me why.”
- The agent plans a sequence: `kubectl get …`, `kubectl describe …`, maybe `kubectl top …`.
- It executes those read-only commands, parses the output, and reasons about root causes.
- Results appear as an explanation first, with raw command logs one keystroke away.
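The loop described in that list can be reduced to a few lines. This is a hedged sketch, not k8x’s source: `plan_next` stands in for the LLM planner and returns the next command or `None` to stop, `run` executes a command, and `summarize` turns the transcript into the explanation. All three names, and the `Session` type, are hypothetical; passing them in as functions keeps the loop testable without a cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Command = List[str]
Transcript = List[Tuple[str, str]]  # (command as typed, raw output)


@dataclass
class Session:
    """Hypothetical result of one review: explanation first, raw log kept alongside."""
    summary: str = ""
    log: Transcript = field(default_factory=list)


def run_review(
    plan_next: Callable[[Transcript], Optional[Command]],
    run: Callable[[Command], str],
    summarize: Callable[[Transcript], str],
    max_steps: int = 10,
) -> Session:
    session = Session()
    for _ in range(max_steps):
        # The planner (an LLM in an agent like k8x) sees everything observed so
        # far and either picks the next read-only command or returns None.
        cmd = plan_next(session.log)
        if cmd is None:
            break
        session.log.append((" ".join(cmd), run(cmd)))
    # Summarize once at the end; the raw log stays available in session.log.
    session.summary = summarize(session.log)
    return session
```

The `max_steps` cap is one simple way to stop an over-eager planner from looping indefinitely; a real `run` would also pass every command through a read-only check before executing it.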
Design choices that matter to operators
k8x works in your console with your current kubectl configuration and credentials, running autonomous, multi-step workflows to detect and troubleshoot Kubernetes issues.
Principle | Experience | Why it builds trust |
---|---|---|
Read-only by default (v0.1) | Zero risk of deletions; mirrors commands you could type | Safely trial AI before granting mutate rights ([GitHub][3]) |
Plain kubectl under the hood | Familiar audit trail; works anywhere your kubeconfig does | No proprietary sidecars or admission webhooks |
Multi-LLM backend | Select OpenAI, Claude, or Gemini via k8x configure | Avoid vendor lock-in; keep traffic in-house |
Command history & undo | k8x history list shows past sessions | Auditors see exactly what ran; SREs replay in staging |
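One way to enforce the read-only principle is an allowlist check before anything runs. The sketch below is an assumption about how such a guard could look, not k8x’s implementation: it tokenizes the command with `shlex`, skips kubectl’s global flags to find the subcommand, and fails closed on anything outside a hypothetical `READ_ONLY_VERBS` set or containing shell control characters.

```python
import shlex
from typing import List, Optional

# Illustrative allowlist (an assumption, not k8x's actual list): kubectl
# subcommands that only read cluster state.
READ_ONLY_VERBS = {
    "get", "describe", "logs", "top", "explain",
    "api-resources", "api-versions", "version",
}

# Global flags that take their value as a separate argument, e.g. `-n shop`.
FLAGS_WITH_VALUE = {"-n", "--namespace", "--context", "--kubeconfig",
                    "--cluster", "--user", "-s", "--server"}

# Characters that could chain or redirect commands if a shell were involved.
SHELL_METACHARACTERS = set(";|&<>`$\n")


def subcommand(args: List[str]) -> Optional[str]:
    """Return the first positional argument after `kubectl`, skipping flags."""
    i = 1  # args[0] is "kubectl"
    while i < len(args):
        arg = args[i]
        if arg in FLAGS_WITH_VALUE:
            i += 2  # skip the flag and its value
        elif arg.startswith("-"):
            i += 1  # boolean flag or --flag=value form
        else:
            return arg
    return None


def is_read_only(command: str) -> bool:
    """Fail closed: only allowlisted kubectl subcommands pass."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        args = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not args or args[0] != "kubectl":
        return False
    return subcommand(args) in READ_ONLY_VERBS
```

Commands that pass should still be executed as argument lists (for example `subprocess.run(shlex.split(command))`) rather than through a shell.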
A day in the life with k8x
How a multi-step review actually works
- Intent parsing - Translates English prompts into an internal diagnostic goal.
- Planning - The LLM selects a safe chain of read-only `kubectl` queries.
- Adaptive execution - After each command, it decides whether deeper queries are needed.
- Reasoning & templated explanations - Maps results to known issue patterns for a deterministic, auditable summary.
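The last step is what makes a summary deterministic: the same status fields always produce the same sentence. A minimal sketch, assuming a hypothetical `TEMPLATES` table keyed by real Kubernetes container reasons (`CrashLoopBackOff`, `ImagePullBackOff`, `OOMKilled`, and so on); the wording of each template and the `explain` helper are illustrative, not k8x’s.

```python
from typing import Any

# Hypothetical pattern table. Keys are container status reasons reported by
# Kubernetes; the explanation wording is illustrative only.
TEMPLATES = {
    "CrashLoopBackOff": (
        "Container {container} in {ns}/{pod} keeps crashing after start; inspect the "
        "previous run with `kubectl logs -n {ns} {pod} -c {container} --previous`."
    ),
    "ImagePullBackOff": (
        "Container {container} in {ns}/{pod} cannot pull its image; check the image "
        "name, tag, and registry credentials."
    ),
    "OOMKilled": (
        "Container {container} in {ns}/{pod} was last killed for exceeding its memory "
        "limit; compare actual usage with resources.limits.memory."
    ),
    "CreateContainerConfigError": (
        "Container {container} in {ns}/{pod} could not be configured; this often "
        "points to a missing ConfigMap or Secret."
    ),
}
TEMPLATES["ErrImagePull"] = TEMPLATES["ImagePullBackOff"]

UNKNOWN = ("Container {container} in {ns}/{pod} reports {reason}; no known pattern "
           "matches, so review the raw command log.")


def explain(pod: dict[str, Any]) -> list[str]:
    """Map a pod's container reasons to fixed sentences, in a stable order."""
    meta = pod.get("metadata", {})
    ns, name = meta.get("namespace", "default"), meta.get("name", "")
    findings = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        # Current waiting reason first, then why the previous instance ended.
        reasons = [
            (status.get("state", {}).get("waiting") or {}).get("reason"),
            (status.get("lastState", {}).get("terminated") or {}).get("reason"),
        ]
        for reason in reasons:
            if reason:
                template = TEMPLATES.get(reason, UNKNOWN)
                findings.append(template.format(
                    reason=reason, ns=ns, pod=name, container=status.get("name", "")))
    return findings
```

Because the lookup is a plain table, every sentence in the summary can be traced back to a specific status field, which is what makes it auditable.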
Where k8x stands in the AI-ops landscape
Tool | Domain | Autonomy | Local-first? | Write access |
---|---|---|---|---|
GitHub Copilot Chat | Code / CI | Suggests fixes, runs queries in UI | No | Optional PR commits |
Claude Code | Code & CLI automation | Plans & edits files | Yes | Yes (file edits) |
Goose | Multi-agent dev tasks | Runs terminal commands | Yes | Yes |
k8x (v0.1) | Kubernetes operations | Plans & executes kubectl reads | Yes | No (read-only) |
Getting started in 60 seconds
Open Source
k8x is Apache 2.0-licensed and available on GitHub. We’re looking for contributors to help build out the next features, including:

- v0.2 - Declarative fixes
  - Generate a patch plan (`kubectl diff`) and let humans `--approve` it.
- Support ArgoCD and other k8s tools
  - Integrate with ArgoCD for GitOps workflows.
  - Use `kubectl apply` to update resources based on agent suggestions.
- Terraform & cloud-CLI mode
  - Run `terraform plan` or `aws eks update-kubeconfig` as sub-steps.
- Cluster runbooks as code
  - Store successful sessions as YAML recipes to auto-trigger on alerts.