PodDisruptionBudget (PDB)
“https://kubernetes.io/docs/concepts/workloads/pods/disruptions/”
A PodDisruptionBudget limits how many Pods in a set can be simultaneously unavailable during voluntary disruption. Voluntary disruption = initiated by a person or a controller (a kubectl drain, a cluster autoscaler removing a node, an operator rolling out a change). It does NOT cover involuntary disruption (node failure, OOM, network partition).
Table of Contents
- What PDBs Protect Against
- The Two Specifications: minAvailable vs maxUnavailable
- The Math in Depth
- The Eviction API
- PDB Status and Conditions
- The Pod Eviction Policy (k8s 1.26+)
- PDB and HPA: The Scale-Down Interaction
- PDB and Cluster Autoscaler / Karpenter
- PDB Selector Matching
- PDB Status Fields in Depth
- AlwaysAllow and Other Edge Cases
- Common Patterns
- Operations and Debugging
- Gotchas and Common Mistakes
1. What PDBs Protect Against
Voluntary disruption is any disruption initiated by a person or a controller. The eviction API is what enforces it, and the eviction API respects PDBs.
| Action | Voluntary? | PDB respected? |
|---|---|---|
kubectl drain | Yes | Yes |
| Cluster Autoscaler removing a node | Yes | Yes |
| Karpenter consolidation | Yes | Yes |
| HPA scale-down | Yes | Yes (via eviction API) |
kubectl delete pod | Yes | Yes (via eviction API) |
kubectl delete pod --force --grace-period=0 | Yes | No — bypasses PDB |
| Node failure | No (involuntary) | No |
| Pod OOM-kill | No | No |
| kubelet kills a stuck Pod | No | No |
| Network partition | No | No |
The eviction API is the gate. The kubelet doesn’t use it for involuntary disruption — when a node fails, the Pods die regardless of PDB. PDB only constrains what the eviction API allows.
2. The Two Specifications: minAvailable vs maxUnavailable
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web
spec:
minAvailable: 2 # or
# maxUnavailable: 1 # pick one
selector:
matchLabels:
app: web2.1 minAvailable
At least N Pods must be available at all times during disruption. The PDB is satisfied as long as currentAvailable >= minAvailable.
2.2 maxUnavailable
At most N Pods can be unavailable at a time. The PDB is satisfied as long as currentUnavailable <= maxUnavailable.
2.3 Which to use
minAvailablefor “I always need N available” services (e.g. quorum-based, databases with replicas).maxUnavailablefor “I can tolerate some being gone” services (e.g. stateless HTTP).
They are equivalent for simple cases (e.g. minAvailable: 2 on a Deployment with replicas: 3 is the same as maxUnavailable: 1).
The difference shows up with percentages:
minAvailable: 50%onreplicas: 3rounds up to 2.maxUnavailable: 50%onreplicas: 3rounds down to 1 (but can be at most 1; the same asminAvailable: 2).
2.4 Percentages
spec:
minAvailable: 75% # round up
# maxUnavailable: 25%Percentages are of the Pods matching the selector. If the selector matches 4 Pods, minAvailable: 75% rounds up to 3.
For maxUnavailable: 25%, that rounds down to 1.
3. The Math in Depth
The PDB’s status field shows the math:
status:
currentHealthy: 5
desiredHealthy: 4
expectedPods: 5
disruptionsAllowed: 1expectedPods— Pods matching the selector that exist.currentHealthy— Pods matching the selector that are currently Ready.desiredHealthy— the floor (or ceiling) from the spec.disruptionsAllowed— how many more Pods can be voluntarily disrupted.
The eviction API checks: currentHealthy - 1 >= desiredHealthy for each eviction. If yes, allow. If no, reject.
3.1 The math for minAvailable: 2 on replicas: 3
expectedPods = 3
currentHealthy = 3 (initially)
desiredHealthy = 2
disruptionsAllowed = currentHealthy - desiredHealthy = 1
Eviction of Pod 1:
currentHealthy = 2
2 - 1 = 1, which is < 2 (desiredHealthy)
→ reject
Result: no evictions allowed while all 3 are healthy.
Wait, that doesn’t make sense. Let me re-check.
Actually, the math is:
disruptionsAllowed= how many Pods can be disrupted while still satisfying the PDB.- For
minAvailable: 2onreplicas: 3: you can disrupt 1 (currentHealthy 3, after disruption 2 = minAvailable). - For
minAvailable: 3onreplicas: 3: you can disrupt 0.
Let me redo:
expectedPods = 3
minAvailable = 2
desiredHealthy = 2
disruptionsAllowed = expectedPods - minAvailable = 1
If we evict 1 Pod: expectedPods = 2 (because the Pod is gone), minAvailable still 2.
1 disruption allowed = we can do 1 eviction. After the eviction, the math resets.
Actually the math in the apiserver is:
disruptionsAllowed = max(0, currentHealthy - desiredHealthy)
Hmm, the exact algorithm depends on the k8s version. The practical effect:
- For
minAvailable: 2and 3 healthy Pods: 1 disruption allowed. - For
minAvailable: 2and 2 healthy Pods: 0 disruptions allowed.
3.2 The math for maxUnavailable: 1 on replicas: 3
expectedPods = 3
maxUnavailable = 1
desiredHealthy = 3 - 1 = 2
disruptionsAllowed = expectedPods - desiredHealthy = 1
Same as minAvailable: 2 in this case.
3.3 The “0 disruptions allowed” deadlock
If minAvailable: 3 on replicas: 3, the PDB says “all 3 must be available”. disruptionsAllowed = 0. No voluntary eviction is allowed.
This is a common deadlock. A single-replica Deployment with minAvailable: 1 can’t be drained.
4. The Eviction API
The eviction API is the mechanism that respects PDBs. It’s a special endpoint:
POST /api/v1/namespaces/<ns>/pods/<pod>/evictionOr via kubectl:
kubectl drain node-1
# internally calls the eviction API for each Pod
kubectl delete pod <pod> --grace-period=30
# also uses the eviction APIThe eviction API:
- Checks the PDB for the Pod.
- If the eviction would violate the PDB, returns 403 Forbidden.
- The caller (drain, autoscaler) backs off and retries.
4.1 Bypassing the PDB
# bypass the eviction API; force-delete the Pod
kubectl delete pod <pod> --force --grace-period=0The --force flag bypasses the eviction API. The Pod is deleted regardless of PDB. This is the only way to evict a Pod when the PDB would block.
Use --force only when:
- A node is truly gone and you need to evict stuck Pods.
- The cluster autoscaler is stuck because of a deadlock.
- You accept the disruption.
4.2 The grace period
The eviction API supports a gracePeriodSeconds parameter. The Pod’s terminationGracePeriodSeconds is the upper bound; the eviction can be more aggressive.
# evict with a 0-second grace period
kubectl delete pod <pod> --grace-period=0
# (but the PDB is still respected)--force --grace-period=0 is the only way to bypass the PDB.
5. PDB Status and Conditions
The PDB has a status field:
status:
observedGeneration: 1
disruptionsAllowed: 1
currentHealthy: 5
desiredHealthy: 4
expectedPods: 5
conditions:
- type: SufficientPods
status: "True"
reason: ""
message: ""
lastTransitionTime: "2024-01-15T12:00:00Z"
- type: DisruptionAllowed
status: "True"
reason: ""
message: ""
lastTransitionTime: "2024-01-15T12:00:00Z"SufficientPods— does the current state satisfy the PDB?DisruptionAllowed— is a disruption allowed right now?
If DisruptionAllowed: False, no new voluntary disruption is allowed.
5.1 The “PDB blocking” indicator
kubectl get pdb -A
# NAME MIN MAX ALLOWED DISRUPTIONS AGE
# web 2 - 1 1 30d
# db - 1 0 0 30d
# ALLOWED DISRUPTIONS = 0 means next drain will blockThis is the key field to check before a drain.
6. The Pod Eviction Policy (k8s 1.26+)
A Pod Eviction Policy is set in the PDB’s status.conditions and lets you control how unhealthy Pods are counted:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web }
spec:
minAvailable: 2
selector:
matchLabels: { app: web }
status:
observedGeneration: 1
disruptionsAllowed: 1
currentHealthy: 5
desiredHealthy: 4
expectedPods: 5
conditions:
- type: DisruptionAllowed
status: "True"But by default, unhealthy Pods are still counted as expected. If a Pod is CrashLoopBackOff, it’s in expectedPods but not in currentHealthy. The PDB may block eviction because the math is “off”.
The unhealthyPodEvictionPolicy field (in policy/v1 in 1.26+) controls this:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web }
spec:
minAvailable: 2
unhealthyPodEvictionPolicy: IfHealthyBudget # default
selector:
matchLabels: { app: web }Two values:
IfHealthyBudget— the PDB only blocks if the Pod is healthy. If the Pod is already unhealthy (CrashLoopBackOff, NotReady), the eviction is allowed.AlwaysAllow— the PDB never blocks the eviction of an unhealthy Pod. The PDB only protects healthy Pods.
6.1 The use case
A Pod is CrashLoopBackOff. You want to drain the node. Without IfHealthyBudget, the PDB blocks the eviction (the Pod is in expectedPods but not in currentHealthy).
With IfHealthyBudget, the eviction is allowed because the Pod is already unhealthy. The PDB’s protection is only for the healthy Pods.
This is a critical fix for stuck drains.
7. PDB and HPA: The Scale-Down Interaction
HPA scale-down is a voluntary disruption. The eviction API is called, and the PDB is checked.
HPA: 5 → 3 Pods
HPA calls the eviction API for 2 Pods
Eviction API checks PDB:
- If 3 Pods are still satisfying the PDB, the evictions are allowed
- If not, the evictions are blocked
7.1 The deadlock
Deployment: 3 replicas
PDB: minAvailable: 2
HPA: target CPU 60%, current CPU 50%
HPA: scale to 2 replicas
HPA wants to scale to 2. The eviction API says “2 evictions would leave 1 healthy, but PDB says minAvailable 2”. The HPA is blocked.
The HPA controller retries, fails, and the Pods stay at 3.
This is a common deadlock. Fix: reduce minAvailable (e.g. to 1) or increase HPA’s minReplicas.
8. PDB and Cluster Autoscaler / Karpenter
Cluster Autoscaler and Karpenter both call the eviction API when removing a node. PDBs are respected.
8.1 The CA scale-down deadlock
PDB: minAvailable: 2 on a Deployment with 3 replicas
CA wants to scale down a node that has 1 of those Pods
CA evicts the Pod → PDB says no → CA fails
CA retries with backoff. If the PDB stays unsatisfiable, CA leaves the node behind. The cluster is over-provisioned.
The fix: ensure your PDBs are achievable. minAvailable: N-1 on a Deployment with N replicas is safe. minAvailable: N is a deadlock.
8.2 Karpenter and consolidation
Karpenter’s consolidation tries to drain nodes. The drain respects PDBs. If a drain is blocked by a PDB, Karpenter skips the consolidation and tries a different node.
Karpenter has disruption.budgets (in the NodePool) to rate-limit its own disruptions. The PDB is in addition to that.
9. PDB Selector Matching
A PDB’s selector must match the Pods you want to protect:
spec:
selector:
matchLabels: { app: web } # exact match
# OR
selector:
matchExpressions:
- key: tier
operator: In
values: [production]If the selector matches no Pods, the PDB is a no-op (it protects nothing).
If the selector matches Pods but the Pods aren’t Ready, the PDB counts them as not healthy. The math may still work — depending on unhealthyPodEvictionPolicy.
10. PDB Status Fields in Depth
10.1 disruptionsAllowed
The number of Pods that can be voluntarily disrupted right now. Computed by the apiserver.
For minAvailable: N: disruptionsAllowed = currentHealthy - N (or 0 if negative).
For maxUnavailable: N: disruptionsAllowed = N - currentUnavailable (or 0 if negative).
10.2 currentHealthy
The Pods matching the selector that are Ready (passing readiness probes).
A Pod that doesn’t have a readiness probe is always Ready (no probe = always passes).
A Pod that has a readiness probe and is failing is not Ready → not in currentHealthy.
10.3 expectedPods
The total number of Pods matching the selector. Includes unhealthy ones (without unhealthyPodEvictionPolicy: IfHealthyBudget or AlwaysAllow).
10.4 desiredHealthy
The minimum (for minAvailable) or maximum (for maxUnavailable) number of healthy Pods.
For minAvailable: 2: desiredHealthy = 2.
For maxUnavailable: 1 on expectedPods: 3: desiredHealthy = 2 (3 - 1).
11. AlwaysAllow and Other Edge Cases
11.1 AlwaysAllow
A PDB with no minAvailable and no maxUnavailable (or set to 0) is AlwaysAllow — it allows all voluntary disruption.
Don’t do this. It’s the same as having no PDB at all.
11.2 minAvailable: 0
A PDB with minAvailable: 0 is AlwaysAllow. No Pods need to be available. Same as no PDB.
11.3 maxUnavailable: 100%
A PDB with maxUnavailable: 100% allows all Pods to be unavailable. Same as no PDB.
11.4 Empty selector
A PDB with no selector (or matchLabels: {}) matches all Pods in the namespace. This is rarely what you want. A single PDB that covers all Pods is almost always wrong.
12. Common Patterns
12.1 Stateful workload
# StatefulSet
spec:
replicas: 3
template: {...}
# PDB
spec:
minAvailable: 2 # always have 2 of 3
selector:
matchLabels: { app: db }Loses 1 of 3, still has 2. Loses 2 of 3, PDB blocks further disruption.
12.2 Stateless HTTP service
spec:
minAvailable: 50% # always have majority
selector:
matchLabels: { app: web }For a 6-replica Deployment, 3 must be available. For a 4-replica, 2 must be available (rounding).
12.3 Critical singleton
spec:
minAvailable: 1
selector:
matchLabels: { app: critical }A single-replica Deployment. PDB says “the 1 Pod can’t be evicted”. Drain blocks. Use only for truly critical singletons.
12.4 The “PDB + HPA” pattern
# HPA: 3-20 replicas
spec:
minReplicas: 3
maxReplicas: 20
# PDB: always have 2
spec:
minAvailable: 2
selector:
matchLabels: { app: web }HPA can scale from 3 to 20 freely. PDB says at least 2 must be available. HPA scale-down stops at 2 (PDB blocks scale below 2).
If the cluster needs to evict more aggressively (e.g. for a drain), PDB blocks it. This is by design.
13. Operations and Debugging
13.1 Common commands
# list PDBs
kubectl get pdb -A
# shows NAME, MIN, MAX, ALLOWED DISRUPTIONS, AGE
# describe
kubectl describe pdb <name>
# shows selector, current state, conditions
# check if a drain is allowed
kubectl get pdb -A -o custom-columns='NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed,DESIRED:.status.desiredHealthy,CURRENT:.status.currentHealthy'13.2 The “drain stuck” checklist
# 1. Is the PDB allowing disruptions?
kubectl get pdb -A
# ALLOWED DISRUPTIONS = 0 means the next drain will block
# 2. Are the Pods Ready?
kubectl get pods -l <selector>
# all should be 1/1 Ready
# 3. Are the Pods on the node being drained?
kubectl get pods -A --field-selector spec.nodeName=<node>
# 4. Are the PDB conditions showing "DisruptionAllowed: False"?
kubectl describe pdb <name>
# 5. Is the PDB satisfied?
# If minAvailable: 2 and 1 Pod is Ready, the PDB is unsatisfied
# → drain will block13.3 The “PDB not blocking” case
If a Pod is being evicted despite a PDB, the cause is usually --force --grace-period=0. Check the eviction call:
# was the eviction forced?
kubectl get events --field-selector reason=Evicted -A
# look for the pod's events14. Gotchas and Common Mistakes
14.1 The 25+ common mistakes
-
PDB requires a matching Pod selector. If your selector doesn’t match any Pods, the PDB has no effect. Always check
kubectl get pdbshows non-zero currentHealthy. -
PDB is enforced by the eviction API, not by the scheduler. The scheduler doesn’t know about PDBs. It places Pods based on resources and constraints, regardless of PDBs.
-
maxUnavailable: 0andminAvailable: replicasare the same thing and equally dangerous. The node is unevictable; the autoscaler will leave the node behind. -
PDB is not a substitute for HPA. PDB is about availability during planned disruption, not about handling load. You can have 100% availability during a drain and still get killed by traffic spikes.
-
PDB does not block voluntary disruption completely. It controls the rate, not the possibility. A
kubectl delete podstill works — it just makes the next drain fail. -
PDB status is reflected in the PDB object itself —
kubectl get pdbshowsALLOWED DISRUPTIONS. If that’s 0, the next drain will block. -
Eviction API is what drain / autoscaler call. A forced Pod deletion via
kubectl delete pod --force --grace-period=0bypasses the PDB. Use with caution. -
PDB with
minAvailable: 100%is the same asminAvailable: replicas. Don’t set 100% — it’s the deadlock. -
PDB with
minAvailable: 0is the same as no PDB. Useless. -
PDB on a Deployment with
replicas: 1means the Pod can’t be drained. The drain will block. -
PDB with
selector: {}matches all Pods in the namespace. Rarely correct. Be specific. -
PDB with
selectormatching multiple Deployments is a single PDB for all of them. The math is across all Pods. Easy to misconfigure. -
PDB is
policy/v1since k8s 1.21. Olderpolicy/v1beta1is removed. Make sure manifests are onv1. -
PDB’s
unhealthyPodEvictionPolicy: IfHealthyBudget(default in v1, k8s 1.26+) lets you evict unhealthy Pods even when the PDB math would block. Critical for stuck drains. -
PDB doesn’t know about Pods that are about to be created. A HPA scale-up to satisfy
minAvailable: 2works (the new Pod is created before the old one is evicted). But if the scale-up fails, the eviction is blocked. -
PDB + HPA scale-down deadlock is common. PDB says minAvailable 2, HPA wants to scale to 1. The eviction API blocks. Fix: relax the PDB or set higher minReplicas on HPA.
-
PDB + Cluster Autoscaler scale-down deadlock is the same. CA wants to remove a node, PDB blocks. The CA retries, fails, leaves the node.
-
PDB +
kubectl delete poduses the eviction API. PDB is checked. Use--forceto bypass. -
PDB +
kubectl delete deploymentdeletes all Pods. The PDB is checked for each eviction. The deployment is deleted regardless, but the Pod deletions may block briefly. Final state: Pods are gone. -
PDB +
kubectl delete namespacedeletes all Pods in the namespace. PDBs in the namespace are also deleted (with the namespace). The Pod deletions may block briefly until the namespace is gone. Then the Pods are force-deleted. -
PDB with
selector: { app: foo }and a Deployment withapp: foomatches. With a Deploymentapp: foo, tier: web(more labels), the selector still matches. Match is by the selector’s labels, not exact match. -
A PDB can have multiple selectors (matchLabels + matchExpressions). All must match.
-
PDB’s
disruptionsAllowedis computed by the apiserver. It updates on Pod state changes. There’s a small lag. -
A PDB with
maxUnavailable: 1on a Deployment withreplicas: 1is the same asminAvailable: 1. Single-replica, undrainable. -
PDB’s conditions can be
DisruptionAllowed: Falseeven when disruptionsAllowed > 0. This is rare and usually means a transient issue (selector doesn’t match, etc.).
See also
- HPA — PDB interaction with HPA scale-down
- Karpenter — PDB interaction with consolidation
- CA — PDB interaction with CA scale-down
- Priority & Preemption — PDBs can block preemption
- Resource Requests & Limits — readiness probes for currentHealthy