Practical, day-2 k8s content. Concepts explain the what and the why; Guides explain the how: using the tools, recovering from breakage, operating against non-functional requirements, and shipping code to production.
If you’re new to k8s, read Concepts first.
The five sections
| Section | What it covers | Status |
|---|---|---|
| tools | CLI / TUI / debugging UIs (kubectl, k9s, Lens, multi-cluster workflows) | 🟡 Partial |
| troubleshooting | Issue → diagnosis → fix playbooks for the most common cluster problems | 🟡 Partial |
| non-functional | NFRs: scale, cost, HA, performance, security baseline, backup, upgrades, multi-tenancy | 🟠 Stub phase |
| delivery | How code reaches prod: GitOps, Helm/Kustomize, CI/CD pipelines, progressive delivery | ✅ Done (helm), 🟠 Stub phase (rest) |
| networking | Ingress, Gateway API, service mesh (the practical/network side, not the L04 concepts) | 🟡 Partial |
Section summaries
tools/
Working with k8s day-to-day. The CLI, the TUIs, the multi-cluster context switches.
- k9s — terminal UI, the depth benchmark ✅
- kubectl — reference for the CLI
- context-switching — kubeconfig management
- multi-cluster — multi-cluster strategies
troubleshooting/
Issue-driven playbooks. “My pod is stuck in CrashLoopBackOff” → “check X, then Y, then Z”. Each note follows the same shape: symptom → diagnosis → fix → gotchas.
- crashloop-backoff ✅
- pod-eviction — Pending pods that won’t schedule
- networking — Service unreachable, DNS resolution failures
- image-pull — ImagePullBackOff, registry auth
- node-not-ready — Node conditions, kubelet logs
- storage — PVC stuck, RWX issues
- helm — release failures, hooks, drift
- gitops — Argo CD sync errors, drift, missing apps
- istio-linkerd — mesh-specific failures (sidecar injection, mTLS)
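As an illustration of the symptom → diagnosis → fix shape, here is a minimal Python sketch of a first-pass triage for a pod stuck in CrashLoopBackOff. It only builds the `kubectl` commands in the order you might run them; it does not contact a cluster. The function name `crashloop_triage_commands`, the pod name, and the namespace are hypothetical, and the actual steps in the crashloop-backoff note may differ.

```python
import shlex
from typing import List


def crashloop_triage_commands(pod: str, namespace: str = "default") -> List[List[str]]:
    """Return kubectl commands, in order, for a first look at a crash-looping pod.

    Hypothetical helper for illustration; it builds argument lists only
    and never executes anything.
    """
    return [
        # Symptom: confirm the pod status and restart count.
        ["kubectl", "get", "pod", pod, "-n", namespace, "-o", "wide"],
        # Diagnosis: last container state, exit code, and pod events.
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Diagnosis: logs from the previous (crashed) container instance.
        ["kubectl", "logs", pod, "-n", namespace, "--previous"],
        # Diagnosis: events in the namespace that reference this pod.
        ["kubectl", "get", "events", "-n", namespace,
         "--field-selector", f"involvedObject.name={pod}"],
    ]


if __name__ == "__main__":
    # Placeholder pod and namespace; substitute your own.
    for cmd in crashloop_triage_commands("my-app-abc123", "staging"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try anywhere; the fix step depends on what the exit code and logs show, so it is left out here.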
non-functional/
NFRs as standalone deep-dives. Each note is a practical operating guide for one axis of cluster quality.
- auto-scaling — HPA / VPA / CA / Karpenter / KEDA
- cost-optimization — rightsizing, spot, cluster autoscaler
- high-availability — control plane, multi-AZ, PDBs
- performance-tuning — resource limits, QoS, JVM/GC, kernel tuning
- security-baseline — PSA, NetworkPolicy default-deny, image policy, Kyverno, OPA, Checkov
- backup-restore ✅ (Velero, etcd, managed-service backup)
- disaster-recovery — RTO/RPO, multi-region
- multi-tenancy — namespaces, Projects, virtual clusters
- chaos-engineering — Chaos Mesh, Litmus, steady-state hypothesis
- upgrade-strategy — kubeadm, EKS, GKE version paths
- deprecations — k8s 1.29+ removals, what to watch
- oidc-integration — Dex, Keycloak, Pinniped
delivery/
How code reaches production.
- basics — what GitOps is and isn’t
- argo-cd — Argo CD: best-practices, image-updater, app-of-apps, multi-tenancy (Projects/AppSets), troubleshooting
- helm — package & deploy ✅ (all 10 notes solid)
- kustomize — overlay/patch model
- argo-workflows — K8s-native pipelines
- argo-rollouts — canary, blue/green
- ci-cd-integration — GitHub Actions / GitLab CI / buildkit / kaniko, image signing, scanning
networking/
Practical / network-side notes. Complements L04 concepts with hands-on controller configuration.
- envoy-gateway — Gateway API implementation ✅
- traefik — Traefik ingress controller
- nginx — NGINX ingress controller
- gateway-api — overview, points to envoy-gateway
- service-mesh — overview
Status legend
- ✅ Done — comprehensive content (200+ lines)
- 🟡 Partial — substantial but not deep (100-200 lines), or thinner with a growth plan
- 🟠 Stub phase — placeholder, expansion planned
- ⚪ Empty — placeholder