L08 — Operations

Day-2 operations: things are running, and now you have to keep them running. This level covers the troubleshooting flow and the data sources (logs and metrics) you need to operate a cluster at scale.

What you’ll understand after this level

A systematic troubleshooting flow — from “my pod isn’t working” to “the cluster is down”
The standard set of kubectl debug commands and when to use each
Where logs come from (container stdout, kubelet, control plane)
Where metrics come from (cAdvisor, kubelet, metrics-server, kube-state-metrics)
The most common failure modes and how to recognize them
When to drop down to the node (crictl, journalctl, /var/log)

Notes in this level

| Note | Status | What’s in it |
|------|--------|--------------|
| Troubleshooting | ✅ | Decision tree for “my pod isn’t working”: the quick reference |
| kubectl Debug Toolkit | ✅ | describe, logs, exec, debug, ephemeral containers: the commands you reach for |
| Common Failure Modes | ✅ | Stage-by-stage triage guide, exit codes, escalation checklists |
| Metrics Sources | ✅ | Where metrics come from: cAdvisor, kubelet, metrics-server, kube-state-metrics, full stack |

Troubleshooting flow (the 30-second version)

Pod not working?
  ├── Is it scheduled?
  │   └── Pending → resources? taints? affinity? PVC?
  ├── Is it creating?
  │   └── ContainerCreating → image pull? volume? CNI?
  ├── Is it running?
  │   └── Running but app broken → logs, readiness probe, Service, NetworkPolicy
  └── Is it crashing?
      └── CrashLoopBackOff → exit code, previous logs, OOM, probe too aggressive
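
The tree above can also be read as a small classifier over a pod's status. The following is a minimal sketch, not part of the wiki's tooling: `triage` and `CHECKS` are hypothetical names, and the input is assumed to be the parsed JSON of a single pod (the shape printed by `kubectl get pod <name> -o json`). It looks only at `status.phase`, the `PodScheduled` condition, and the `waiting` reason in `containerStatuses`; init containers are ignored for brevity.

```python
# Checks per stage, copied from the decision tree above.
CHECKS = {
    "scheduling": ["resources", "taints", "affinity", "PVC"],
    "creating": ["image pull", "volume", "CNI"],
    "running": ["logs", "readiness probe", "Service", "NetworkPolicy"],
    "crashing": ["exit code", "previous logs", "OOM", "probe too aggressive"],
}


def triage(pod: dict) -> tuple[str, list[str]]:
    """Map a pod object (parsed JSON) to a stage of the tree and its checks."""
    status = pod.get("status", {})
    phase = status.get("phase")
    # Index conditions by type, e.g. {"PodScheduled": "True"}.
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    # Collect the reason of every container currently in the waiting state.
    waiting_reasons = [
        cs["state"]["waiting"].get("reason", "")
        for cs in status.get("containerStatuses", [])
        if "waiting" in cs.get("state", {})
    ]

    if phase == "Pending" and conditions.get("PodScheduled") != "True":
        stage = "scheduling"  # not yet placed on a node
    elif "CrashLoopBackOff" in waiting_reasons:
        stage = "crashing"  # container keeps restarting
    elif phase == "Pending":
        stage = "creating"  # scheduled, containers not started yet
    elif phase == "Running":
        stage = "running"  # up, so any problem is in the app or its networking
    else:
        return "unknown", []
    return stage, CHECKS[stage]


# Example input: a scheduled pod whose only container is in CrashLoopBackOff.
sample = {
    "status": {
        "phase": "Running",
        "conditions": [{"type": "PodScheduled", "status": "True"}],
        "containerStatuses": [
            {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
        ],
    }
}
stage, checks = triage(sample)
print(stage, checks)
```

The order of the branches mirrors the tree: scheduling is ruled out first, and the crash check comes before the phase checks because a pod in CrashLoopBackOff usually still reports the `Running` phase.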

Where to go next

→ L09 — Advanced: how Kubernetes itself is built — controllers, operators, etcd, internals.

Tooling for observability and log routing (Prometheus, Grafana, Loki, Fluent Bit) lives in Guides — this level is about understanding the data sources, not deploying the stack.
