Practical, day-2 k8s content. Concepts explain what and why. Guides explain how: how to use the tools, how to recover from breakage, how to operate against non-functional requirements, and how to ship code to production.

If you’re new to k8s, read Concepts first.

The five sections

| Section | What it covers | Status |
| --- | --- | --- |
| tools | CLI / TUI / debugging UIs (kubectl, k9s, Lens, multi-cluster workflows) | 🟡 Partial |
| troubleshooting | Issue → diagnosis → fix playbooks for the most common cluster problems | 🟡 Partial |
| non-functional | NFRs: scale, cost, HA, performance, security baseline, backup, upgrades, multi-tenancy | 🟠 Stub phase |
| delivery | How code reaches prod: GitOps, Helm/Kustomize, CI/CD pipelines, progressive delivery | 🟢 Solid (helm), 🟠 rest stub |
| networking | Ingress, Gateway API, service mesh (the practical/network side, not the L04 concepts) | 🟡 Partial |

Section summaries

tools/

Working with k8s day-to-day. The CLI, the TUIs, the multi-cluster context switches.
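
As a rough illustration of a multi-cluster workflow, the sketch below builds the same read-only kubectl command for several contexts using kubectl's global `--context` flag, so nothing depends on which context is currently active. The helper name `kubectl_per_context` and the context names `dev` and `staging` are hypothetical. The code only assembles argument lists and does not run anything.

```python
from typing import Dict, List


def kubectl_per_context(contexts: List[str], args: List[str]) -> Dict[str, List[str]]:
    """Build one kubectl argv per context, using the global --context flag."""
    return {ctx: ["kubectl", "--context", ctx, *args] for ctx in contexts}


if __name__ == "__main__":
    # Hypothetical context names; list real ones with `kubectl config get-contexts`.
    commands = kubectl_per_context(["dev", "staging"], ["get", "nodes"])
    for ctx, argv in commands.items():
        print(ctx, "->", " ".join(argv))
```

Each argument list can be handed to `subprocess.run` as-is. Passing `--context` explicitly avoids mutating the shared kubeconfig the way `kubectl config use-context` does, which may matter when several terminals are open against different clusters.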

troubleshooting/

Issue-driven playbooks. “My pod is stuck in CrashLoopBackOff” → “check X, then Y, then Z”. Each note follows the same shape: symptom → diagnosis → fix → gotchas.

  • crashloop-backoff
  • pod-eviction — Pending pods that won’t schedule
  • networking — Service unreachable, DNS resolution failures
  • image-pull — ImagePullBackOff, registry auth
  • node-not-ready — Node conditions, kubelet logs
  • storage — PVC stuck, RWX issues
  • helm — release failures, hooks, drift
  • gitops — Argo CD sync errors, drift, missing apps
  • istio-linkerd — mesh-specific failures (sidecar injection, mTLS)
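
The shared shape of these notes (symptom → diagnosis → fix → gotchas) could be checked mechanically. The sketch below is one possible lint, assuming each note marks its sections with Markdown-style headings such as `## Symptom`. That heading convention and the function name `missing_or_misordered` are assumptions for illustration, not something the notes are known to follow.

```python
import re
from typing import List

# The four sections named above, in the order a note is expected to present them.
EXPECTED_SECTIONS = ["symptom", "diagnosis", "fix", "gotchas"]


def missing_or_misordered(note_text: str) -> List[str]:
    """Return the expected sections that do not appear, in order, as headings.

    Assumption: headings are Markdown-style lines such as "## Symptom".
    """
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#+\s*(.+)$", note_text, flags=re.MULTILINE)
    ]
    problems = []
    position = 0  # only search after the previously matched heading
    for section in EXPECTED_SECTIONS:
        try:
            position = headings.index(section, position) + 1
        except ValueError:
            problems.append(section)
    return problems
```

An empty result means all four sections are present in the expected order. A section that exists but appears too early is reported, because the search only moves forward.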

non-functional/

NFRs as standalone deep-dives. Each note is a practical operating guide for one axis of cluster quality.

  • auto-scaling — HPA / VPA / CA / Karpenter / KEDA
  • cost-optimization — rightsizing, spot, cluster autoscaler
  • high-availability — control plane, multi-AZ, PDBs
  • performance-tuning — resource limits, QoS, JVM/GC, kernel tuning
  • security-baseline — PSA, NetworkPolicy default-deny, image policy, Kyverno, OPA, Checkov
  • backup-restore ✅ (Velero, etcd, managed-service backup)
  • disaster-recovery — RTO/RPO, multi-region
  • multi-tenancy — namespaces, Projects, virtual clusters
  • chaos-engineering — Chaos Mesh, Litmus, steady-state hypothesis
  • upgrade-strategy — kubeadm, EKS, GKE version paths
  • deprecations — k8s 1.29+ removals, what to watch
  • oidc-integration — Dex, Keycloak, Pinniped

delivery/

How code reaches production.

  • basics — what GitOps is and isn’t
  • argo-cd — Argo CD
    • best-practices, image-updater, app-of-apps, multi-tenancy (Projects/AppSets), troubleshooting
  • helm — package & deploy ✅ (all 10 notes solid)
  • kustomize — overlay/patch model
  • argo-workflows — K8s-native pipelines
  • argo-rollouts — canary, blue/green
  • ci-cd-integration — GitHub Actions / GitLab CI / buildkit / kaniko, image signing, scanning

networking/

Practical / network-side notes. Complements L04 concepts with hands-on controller configuration.

  • envoy-gateway — Gateway API implementation ✅
  • traefik — Traefik ingress controller
  • nginx — NGINX ingress controller
  • gateway-api — overview, points to envoy-gateway
  • service-mesh — overview
    • istio
    • linkerd
    • comparison — istio vs linkerd vs cilium service mesh

Status legend

  • ✅ Done — comprehensive content (200+ lines)
  • 🟡 Partial — substantial but not deep (100-200 lines) or mid-thin with growth plan
  • 🟠 Stub phase — placeholder, expansion planned
  • ⚪ Empty — placeholder
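
The line-count thresholds in the legend can be written as a small classifier. This is a sketch under stated assumptions: the legend does not say where Stub ends and Empty begins, so the code treats zero lines as Empty and 1 to 99 lines as Stub phase, and it counts exactly 200 lines as Done. The function name `status_for` is hypothetical, and line count alone ignores the "growth plan" qualifier on Partial.

```python
def status_for(line_count: int) -> str:
    """Map a note's line count to a legend status (thresholds partly assumed)."""
    if line_count < 0:
        raise ValueError("line_count must be non-negative")
    if line_count >= 200:
        return "Done"        # legend: 200+ lines
    if line_count >= 100:
        return "Partial"     # legend: 100-200 lines
    if line_count > 0:
        return "Stub phase"  # assumption: any non-empty note under 100 lines
    return "Empty"           # assumption: a file with no lines


if __name__ == "__main__":
    for n in (0, 40, 150, 320):
        print(n, status_for(n))
```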