This article was co-authored with Shankar Ganeshan.
Setting up a Kubernetes cluster can make DevOps and platform engineers feel like they’re navigating a maze. Take a simple service deployment: you might have to check the deployment (kubectl get deployment), the service (kubectl get service), inspect events (kubectl describe pod), the logs (kubectl logs pod-name), and even Ingress rules (kubectl get ingress). Each step requires context, and you end up copy-pasting names and IDs from one place into another. A single typo can leave you scratching your head. Worse, incomplete information can lead you down the wrong path and waste hours. The complexity of multi-step diagnostics, especially when things go wrong, can be overwhelming.

Why Kubernetes troubleshooting still hurts and why we built k8x

Even veteran SREs know the familiar pain cycle:
| Symptom | Usual manual steps |
| --- | --- |
| Pods in CrashLoopBackOff | kubectl get pods -A → copy failing names; kubectl logs … (repeat per pod & container); grep for hints, cross-check ConfigMaps |
| 503s on an Ingress | Check Service endpoints → kube-proxy rules → NetworkPolicy; compare readiness probes & resource pressure |
| “myservice works on most nodes” | Inspect taints/labels, daemonsets, CNI logs; describe nodes for kernel versions & allocatable |
Each scenario is multi-step and context-heavy: you have to carry state across multiple Kubernetes commands, noting resource IDs from one command’s output so you can plug them into the next describe or logs call to understand what’s going on.
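To make the copy-paste burden concrete, here is a minimal Python sketch of the first row of that table: it reads the JSON that `kubectl get pods -A -o json` would produce, finds containers waiting in CrashLoopBackOff, and builds the follow-up `kubectl logs` command for each. The function names and the sample pod data are hypothetical and for illustration only; this is not taken from k8x’s source.

```python
import json
from typing import List, Tuple


def crashlooping_containers(pods_json: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, pod, container) for containers waiting in CrashLoopBackOff."""
    found = []
    for pod in json.loads(pods_json).get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Init containers can crash-loop too, so check both status lists.
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append(
                    (meta.get("namespace", ""), meta.get("name", ""), cs.get("name", ""))
                )
    return found


def logs_command(namespace: str, pod: str, container: str) -> List[str]:
    """Build the follow-up command; --previous asks for the last crashed run's logs."""
    return ["kubectl", "logs", "-n", namespace, pod, "-c", container, "--previous"]


# Hypothetical sample shaped like `kubectl get pods -A -o json` output.
SAMPLE_PODS = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "checkout-7d9f"},
            "status": {
                "initContainerStatuses": [
                    {"name": "db-migrations",
                     "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
                ],
                "containerStatuses": [
                    {"name": "app", "state": {"waiting": {"reason": "PodInitializing"}}}
                ],
            },
        },
        {
            "metadata": {"namespace": "shop", "name": "cart-5c2a"},
            "status": {"containerStatuses": [{"name": "app", "state": {"running": {}}}]},
        },
    ]
})

for ns, pod, container in crashlooping_containers(SAMPLE_PODS):
    print(" ".join(logs_command(ns, pod, container)))
```

Even this small script has to thread three identifiers (namespace, pod, container) from one command into the next, which is exactly the bookkeeping that gets error-prone when done by hand.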

The agentic leap: from suggestion to orchestrated review

Recent agents like GitHub Copilot Chat (in VS Code with terminal access), Claude Code (terminal-native edits), and Goose showed a new pattern: the LLM drives an interactive loop, executing safe commands autonomously and then narrating the findings. General-purpose LLM helpers (ChatGPT, Claude, Copilot Chat) can go beyond suggesting commands: they can copy-paste, re-run, and stitch results together. k8x applies this agentic idea to Kubernetes:
  • Natural-language prompts → e.g. “Find pods that aren’t ready and tell me why.”
  • The agent plans a sequence: kubectl get …, kubectl describe …, maybe kubectl top ….
  • It executes those read-only commands, parses output, and reasons about root causes.
  • Results appear as an explanation first, with raw command logs one keystroke away.
Unlike code-centric tools, k8x is infra-native. It understands resource kinds, status fields, events, and failure taxonomies (image pull, scheduling, OOMKilled).
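As a rough illustration of what a failure taxonomy can look like, the Python sketch below maps fields of a Pod’s status (as returned by `kubectl get pod -o json`) to the three categories named above. The `classify_pod` function, the category labels, and the rule order are assumptions made for this example; k8x’s actual reasoning may work quite differently.

```python
# Container waiting reasons the kubelet reports for image pull failures.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def classify_pod(pod: dict) -> str:
    """Map a Pod object (parsed JSON) to a coarse failure category."""
    status = pod.get("status", {})

    # Scheduling failures show up as a PodScheduled=False condition.
    for cond in status.get("conditions") or []:
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return "scheduling"

    for cs in status.get("containerStatuses") or []:
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in IMAGE_PULL_REASONS:
            return "image pull"
        # A restarted container keeps its previous exit under lastState.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                return "OOMKilled"

    return "unknown"


# Hypothetical pods, trimmed to the fields the classifier reads.
pending_pod = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}}
bad_image_pod = {"status": {"containerStatuses": [
    {"name": "app", "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}}
oom_pod = {"status": {"containerStatuses": [
    {"name": "app", "state": {"running": {}},
     "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}}

for name, pod in [("pending", pending_pod), ("bad-image", bad_image_pod), ("oom", oom_pod)]:
    print(name, "->", classify_pod(pod))
```

Rules like these are deterministic, so the same status always yields the same label; the LLM’s job is then to decide which resources to look at and to explain the result.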

Design choices that matter to operators

k8x runs in your console with your current kubectl configuration and credentials, performing autonomous, multi-step workflows to detect and troubleshoot Kubernetes issues.
| Principle | Experience | Why it builds trust |
| --- | --- | --- |
| Read-only by default (v0.1) | Zero risk of deletions; mirrors commands you could type | Safely trial AI before granting mutate rights ([GitHub][3]) |
| Plain kubectl under the hood | Familiar audit trail; works anywhere your kubeconfig does | No proprietary sidecars or admission webhooks |
| Multi-LLM back-end | Select OpenAI, Claude, or Gemini at k8x configure | Avoid vendor lock-in; keep traffic in-house |
| Command history & undo | k8x history list shows past sessions | Auditors see exactly what ran; SREs replay in staging |
There’s more to come, including write permissions, parallelism, and more. Contributions are welcome.
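One plausible way to enforce the read-only principle is a verb allowlist applied before any command runs. The Python sketch below is an assumption about how such a guard could look, not k8x’s actual implementation; `READ_ONLY_VERBS` and `is_read_only` are hypothetical names, and the check is deliberately conservative.

```python
import shlex

# kubectl subcommands that only read cluster state (illustrative, not exhaustive).
READ_ONLY_VERBS = frozenset(
    {"get", "describe", "logs", "top", "explain", "events", "version", "api-resources"}
)

# Characters that would let a shell chain or substitute extra commands.
SHELL_METACHARS = set(";|&`$<>\n")


def is_read_only(command: str) -> bool:
    """Return True only for a plain `kubectl <read-only verb> ...` command."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return False
    if len(argv) < 2 or argv[0] != "kubectl":
        return False
    if any(ch in SHELL_METACHARS for token in argv for ch in token):
        return False
    # Conservative: the verb must come right after `kubectl`, so commands
    # with leading global flags are rejected instead of being parsed.
    return argv[1] in READ_ONLY_VERBS


for cmd in [
    "kubectl get pods -A",
    "kubectl describe pod checkout-7d9f -n shop",
    "kubectl delete pod checkout-7d9f",
    "kubectl get pods; kubectl delete ns shop",
]:
    print(f"{cmd!r}: {'allow' if is_read_only(cmd) else 'block'}")
```

A guard like this is only one layer. Running the agent under a Kubernetes identity whose RBAC role grants only get, list, and watch gives the same guarantee on the server side.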

A day in the life with k8x

# 1️⃣ Something is off
default$ k8x -c "my checkout service is returning 502s"

# 2️⃣ Agent plan (condensed)
 Check Ingress status
 Verify Service endpoints
 Scan pod readiness & logs
 Examine recent HPA events

# 3️⃣ Summary
 2/5 endpoints unhealthy
 Pods stuck in Init:CrashLoopBackOff (db-migrations)
 Migration container fails on `ALTER TABLE …` (lock timeout)
Suggested fix: run `kubectl exec` into db-migration-pod or scale replicas to 0/1 to release lock
In ~30 seconds, you get an actionable story instead of fifteen manual commands.
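A summary line such as “2/5 endpoints unhealthy” can be derived from a Service’s Endpoints object, which lists ready and not-ready addresses separately. The Python sketch below shows that arithmetic on hypothetical data shaped like `kubectl get endpoints <service> -o json`; whether k8x computes the figure this way is an assumption.

```python
from typing import Tuple


def endpoint_health(endpoints: dict) -> Tuple[int, int]:
    """Return (unhealthy, total) address counts for an Endpoints object."""
    ready = 0
    not_ready = 0
    for subset in endpoints.get("subsets") or []:
        ready += len(subset.get("addresses") or [])
        not_ready += len(subset.get("notReadyAddresses") or [])
    return not_ready, ready + not_ready


# Hypothetical Endpoints object for the checkout service.
SAMPLE_ENDPOINTS = {
    "kind": "Endpoints",
    "metadata": {"name": "checkout", "namespace": "shop"},
    "subsets": [
        {
            "addresses": [{"ip": "10.0.1.4"}, {"ip": "10.0.1.5"}, {"ip": "10.0.2.7"}],
            "notReadyAddresses": [{"ip": "10.0.2.8"}, {"ip": "10.0.3.2"}],
            "ports": [{"port": 8080, "protocol": "TCP"}],
        }
    ],
}

unhealthy, total = endpoint_health(SAMPLE_ENDPOINTS)
print(f"{unhealthy}/{total} endpoints unhealthy")
```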

How a multi-step review actually works

  1. Intent parsing - Translates English prompts into an internal diagnostic goal.
  2. Planning - LLM selects a safe chain of read-only kubectl queries.
  3. Adaptive execution - After each command, it decides if deeper queries are needed.
  4. Reasoning & templated explanations - Maps results to known issue patterns for a deterministic, auditable summary.
Only redacted command output reaches the LLM—no full application logs—another trust measure.
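The four steps above, plus the redaction rule, can be sketched as a small loop. In this Python sketch the command runner and the “decide the next step” function are injected, so a fake runner stands in for kubectl and a hand-written rule stands in for the LLM. All names (`run_review`, `redact`, `SECRET_PATTERN`) and the redaction regex are hypothetical and chosen for illustration; they are not k8x’s real internals.

```python
import re
from typing import Callable, List, Optional, Tuple

Transcript = List[Tuple[str, str]]

# Illustrative pattern: mask values that follow secret-looking keys.
SECRET_PATTERN = re.compile(r"(?i)(password|token|secret|api[_-]?key)(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    """Replace secret-looking values before the text is shown to a model."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


def run_review(
    first_command: str,
    run: Callable[[str], str],
    next_step: Callable[[Transcript], Optional[str]],
    max_steps: int = 5,
) -> Transcript:
    """Run commands one at a time, letting next_step decide whether to dig deeper."""
    transcript: Transcript = []
    command: Optional[str] = first_command
    while command is not None and len(transcript) < max_steps:
        output = redact(run(command))  # only redacted output is kept
        transcript.append((command, output))
        command = next_step(transcript)  # in a real agent, the LLM decides here
    return transcript


# Fake cluster: canned outputs instead of real kubectl calls.
FAKE_OUTPUTS = {
    "kubectl get pods -n shop": "checkout-7d9f 0/1 Init:CrashLoopBackOff",
    "kubectl logs checkout-7d9f -n shop -c db-migrations":
        "DB_PASSWORD=hunter2 ALTER TABLE failed: lock timeout",
}


def demo_next_step(transcript: Transcript) -> Optional[str]:
    """Stand-in for the planner: fetch logs once if a crash loop was seen."""
    last_command, last_output = transcript[-1]
    if "CrashLoopBackOff" in last_output and " logs " not in last_command:
        return "kubectl logs checkout-7d9f -n shop -c db-migrations"
    return None


result = run_review("kubectl get pods -n shop", FAKE_OUTPUTS.__getitem__, demo_next_step)
for command, output in result:
    print(f"$ {command}\n{output}")
```

The `max_steps` cap keeps a confused planner from looping forever, and because the transcript holds every command with its redacted output, it doubles as the audit trail.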

Where k8x stands in the AI-ops landscape

| Tool | Domain | Autonomy | Local-first? | Write access |
| --- | --- | --- | --- | --- |
| GitHub Copilot Chat | Code / CI | Suggests fixes, runs queries in UI | No | Optional PR commits |
| Claude Code | Code & CLI automation | Plans & edits files | Yes | Yes (file edits) |
| Goose | Multi-agent dev tasks | Runs terminal commands | Yes | Yes |
| k8x (v0.1) | Kubernetes operations | Plans & executes kubectl reads | Yes | No (read-only) |
k8x fills the gap for platform and DevOps engineers looking for Copilot-level assistance after deployment, not just in CI/CD.

Getting started in 60 seconds

brew tap aihero/k8x
brew install k8x         # installs v0.1.1
k8x configure            # choose LLM & set API key
k8x -c "Are all pods running?"
You’ll never look at a 3-screen tmux layout the same way again.

Open Source

k8x is Apache 2.0-licensed and available on GitHub. We’re looking for contributors to help build out the next features, including:
  • v0.2 - Declarative fixes
    • Generate a patch plan (kubectl diff) and let humans --approve.
  • Support ArgoCD and other k8s tools
    • Integrate with ArgoCD for GitOps workflows.
    • Use kubectl apply to update resources based on agent suggestions.
  • Terraform & cloud-CLI mode
    • Run terraform plan or aws eks update-kubeconfig as sub-steps.
  • Cluster runbooks as code
    • Store successful sessions as YAML recipes to auto-trigger on alerts.

Final thoughts

Generative-AI agents are moving from IDEs into production infrastructure. By combining LLM planning, policy-guarded execution, and domain-specific reasoning, k8x transforms Kubernetes troubleshooting from a scavenger hunt into a guided review. Start with read-only diagnostics today; when you’re ready, the agent will apply fixes—one audited pull request at a time.
