The kubelet can’t pull the container image. The pod sits in ImagePullBackOff (or ErrImagePull right after a failed attempt) while the kubelet backs off between retries. The cause is the image reference, the registry, the credentials, or the node, not the application.
Symptoms
```console
$ kubectl get pods
NAME     READY   STATUS             RESTARTS   AGE
web-1    0/1     ImagePullBackOff   0          5m
api-2    0/1     ErrImagePull       0          30s
worker   0/1     ImagePullBackOff   0          8m
```
RESTARTS = 0 — the container has never started. Compare to CrashLoopBackOff (container started and crashed) or Pending (pod never got scheduled).
ErrImagePull is the immediate failure of a pull attempt. ImagePullBackOff means the kubelet is waiting out a backoff delay before its next attempt; it does not give up permanently.
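To make “waiting out a backoff” concrete, the sketch below prints the delay before each retry. It assumes the kubelet’s image-pull backoff starts at 10 seconds, doubles after every failure, and is capped at 300 seconds (the upstream kubelet defaults; check your version). `pull_backoff_schedule` is a hypothetical helper, not a kubelet tool. Under those assumptions the first six retries span about ten minutes, after which the kubelet retries every five minutes.

```shell
#!/bin/sh
# pull_backoff_schedule N: print the delay (in seconds) before each of N retries.
# Assumed defaults: 10s initial delay, doubling per failure, capped at 300s.
pull_backoff_schedule() {
  local attempts="$1" delay=10 max=300 i=1
  while [ "$i" -le "$attempts" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max" ]; then
      delay=$max
    fi
    i=$((i + 1))
  done
}

pull_backoff_schedule 7   # prints 10 20 40 80 160 300 300, one per line
```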
The 30-second diagnosis
```shell
# 1. describe: the events at the bottom tell you why
kubectl describe pod web-1 | tail -30

# 2. which image is it trying to pull?
kubectl get pod web-1 -o jsonpath='{.spec.containers[*].image}'

# 3. can a pod reach the registry? (the pull itself runs from the node, not a pod)
kubectl run debug --rm -it --image=busybox --restart=Never -- \
  wget -qO- https://my-registry.example.com/v2/

# 4. check the imagePullSecrets
kubectl get pod web-1 -o jsonpath='{.spec.imagePullSecrets}'
```
The taxonomy of causes
```text
┌─────────────────────────────────────────────────────────────┐
│ ImagePullBackOff                                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Image name typo          (myorg/web vs myorg/wed)        │
│ 2. Image tag doesn't exist  (v2 vs v2.0.0-rc1)              │
│ 3. Private registry auth    (no imagePullSecrets)           │
│ 4. Wrong registry endpoint  (typo in image prefix)          │
│ 5. Network can't reach      (DNS, proxy, firewall, TLS)     │
│ 6. Architecture mismatch    (arm64 vs amd64)                │
│ 7. Registry rate-limited    (Docker Hub, GHCR limits)       │
│ 8. Image too large          (registry timeout, OOM in pull) │
│ 9. Storage limit            (no room to extract layers)     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
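Several causes leave distinctive text in the Failed event. As a first-pass triage aid, the hypothetical `classify_pull_error` function below matches substrings taken from the example signatures in the sections that follow. It is a sketch, not an exhaustive parser: real messages vary by runtime and version. Note that “pull access denied” maps to three causes at once, because registries give the same answer for a repository that doesn’t exist and one you aren’t allowed to see.

```shell
#!/bin/sh
# classify_pull_error MESSAGE: map a kubelet "Failed to pull image" event message
# to the numbered causes. Patterns come from this article's example signatures only.
classify_pull_error() {
  case "$1" in
    *"no matching manifest for"*)  echo "6 architecture mismatch" ;;
    *toomanyrequests*)             echo "7 rate-limited" ;;
    *"no space left on device"*)   echo "9 storage limit" ;;
    *x509:*|*"no such host"*|*"dial tcp"*)
                                   echo "5 network" ;;
    *"could not find reference"*)  echo "2 tag doesn't exist" ;;
    *"pull access denied"*)        echo "1, 3 or 4 (name typo, auth, or wrong registry)" ;;
    *)                             echo "unknown" ;;
  esac
}

classify_pull_error 'Failed to pull image "myorg/web:v2": no matching manifest for linux/amd64 in the manifest list entries'
# prints: 6 architecture mismatch
```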
1. Image name typo
Signatures:
```console
$ kubectl describe pod web-1 | tail -10
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "myorg/wed:v2": failed to pull and unpack image "docker.io/myorg/wed:v2": failed to resolve reference "docker.io/myorg/wed:v2": pull access denied, repository does not exist or may require authorization
```
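The image name is wrong: myorg/wed doesn’t exist; you meant myorg/web. The registry can’t tell you that, though. It returns the same “repository does not exist or may require authorization” for a missing repository and for one you can’t access, so this signature looks exactly like causes 3 and 4.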
Diagnosis:
```shell
# 1. confirm the image exists in the registry
docker manifest inspect myorg/web:v2
# or
curl -s https://my-registry/v2/myorg/web/manifests/v2 | jq .

# 2. is the typo in the deployment or the pod?
kubectl get deploy web -o jsonpath='{.spec.template.spec.containers[0].image}'
```
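Fix: correct the image name in the Deployment’s pod template, so every new pod gets it. kubectl set image is the right tool (the web before the = is the container name):
```shell
kubectl set image deployment/web web=myorg/web:v2
```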
2. Image tag doesn’t exist
Signatures:
```console
$ kubectl describe pod web-1 | tail -10
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "myorg/web:v2.0.0-rc1": [errno 2] could not find reference "v2.0.0-rc1" in repository "myorg/web"
```
The repository exists; the tag doesn’t.
Diagnosis:
```shell
# 1. list tags in the registry
crane ls myorg/web

# 2. check the tag you tried
docker manifest inspect myorg/web:v2.0.0-rc1

# 3. (Docker Hub)
# https://hub.docker.com/v2/repositories/myorg/web/tags/?page_size=100
```
Common sub-causes:
Pushed wrong tag. You built myorg/web:v2 but the deployment says :v2.0.0-rc1. The tag never existed.
Tag was deleted. Registry policies can delete tags. Garbage collection on Docker Hub, lifecycle policies on ECR.
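Multi-arch manifest doesn’t include your platform. The tag exists, but only has arm64 manifests, and your node is amd64. This usually shows up as “no matching manifest” (see 6. Architecture mismatch).
Fix: build for both platforms with docker buildx build --platform linux/amd64,linux/arm64.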
3. Private registry auth
The image is in a private registry, the kubelet needs creds, and there are no imagePullSecrets (or the wrong ones).
Signatures:
```console
$ kubectl describe pod web-1 | tail -10
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "registry.example.com/myorg/web:v2": Error response from daemon: pull access denied for registry.example.com/myorg/web, repository does not exist or may require 'docker login'
```
Diagnosis:
```shell
# 1. does the pod have imagePullSecrets?
kubectl get pod web-1 -o jsonpath='{.spec.imagePullSecrets}' | jq .
# []   <-- no secrets = no auth

# 2. does the deployment have imagePullSecrets?
kubectl get deploy web -o jsonpath='{.spec.template.spec.imagePullSecrets}' | jq .

# 3. does the namespace's service account have the right secrets?
kubectl get sa default -n my-ns -o jsonpath='{.imagePullSecrets}' | jq .
# []   <-- no secrets here either
```
Fix, approach 1: Pod-level (imagePullSecrets in the pod template):
```shell
# 1. create the docker-registry secret in the pod's namespace
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=alice \
  --docker-password=xxx \
  --docker-email=alice@example.com \
  -n my-ns

# 2. add it to the pod template
kubectl patch deploy web -n my-ns -p '{
  "spec": { "template": { "spec": {
    "imagePullSecrets": [{"name": "regcred"}]
  } } }
}'
```
Approach 2: ServiceAccount-level (cleaner for namespaces):
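```shell
# 1. add the secret to the namespace's default ServiceAccount
kubectl patch sa default -n my-ns -p '{"imagePullSecrets": [{"name": "regcred"}]}'

# 2. new pods using the "default" SA get the secret injected at creation;
#    existing pods keep their old spec until they are recreated
```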
Approach 3: Node-level (the container runtime uses the credentials for every pull on the node):
```toml
# /etc/containerd/config.toml on each node; restart containerd afterwards
[plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".auth]
  username = "alice"
  password = "xxx"
```
Common gotchas:
Secret is in the wrong namespace. imagePullSecrets can only reference secrets in the pod’s own namespace. A secret in kube-system doesn’t help a pod in my-ns.
Secret was deleted. Someone ran kubectl delete secret regcred. Pods that were already running keep their images cached; new pods fail to pull.
The registry requires a different auth method. AWS ECR tokens expire after 12 hours, so a static secret stops working unless something refreshes it. Azure ACR typically uses a service principal or a managed identity. GCR uses JSON keys or the node’s cloud identity.
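The secret was created from a working ~/.docker/config.json, but the JSON isn’t in the shape the kubelet reads. It expects auths.<server>.auth (base64 of username:password) or auths.<server>.username and password. A config.json from a machine that uses a credential helper (credsStore) often has empty auths entries, so the secret carries no credentials at all.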
The secret has the wrong type. The kubelet only reads pull secrets of type kubernetes.io/dockerconfigjson (or the legacy kubernetes.io/dockercfg):
```shell
kubectl get secret regcred -o jsonpath='{.type}'
# should be: kubernetes.io/dockerconfigjson
# if it's Opaque, the kubelet ignores it
```
Re-create with kubectl create secret docker-registry (which sets the right type).
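When checking a hand-built secret, it helps to know the shape kubectl create secret docker-registry produces. The hypothetical `make_dockerconfigjson` function below builds a minimal .dockerconfigjson for one registry: an `auths` map keyed by server, whose `auth` field is base64 of `username:password`. It assumes the values contain no quotes or backslashes (it does no JSON escaping) and leaves out the optional email field.

```shell
#!/bin/sh
# make_dockerconfigjson SERVER USER PASSWORD
# Prints a minimal .dockerconfigjson document for a single registry.
# Assumes SERVER, USER and PASSWORD contain no '"' or '\' (no JSON escaping).
make_dockerconfigjson() {
  local server="$1" user="$2" pass="$3" auth
  # auth = base64("user:password"); tr drops the line breaks some base64 builds add
  auth=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$pass" "$auth"
}

make_dockerconfigjson registry.example.com alice xxx
# {"auths":{"registry.example.com":{"username":"alice","password":"xxx","auth":"YWxpY2U6eHh4"}}}
```

Saved to a file, the output can be loaded with kubectl create secret generic regcred --type=kubernetes.io/dockerconfigjson --from-file=.dockerconfigjson=config.json, which sets the type the kubelet requires.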
4. Wrong registry endpoint
The image reference resolves to a different registry than you intended. Without a registry prefix, the runtime assumes Docker Hub (docker.io). Common cases:
| Image | Resolves to |
|---|---|
| nginx | docker.io/library/nginx |
| myorg/web | docker.io/myorg/web |
| registry.example.com/myorg/web | registry.example.com/myorg/web |
| gcr.io/myproj/web | gcr.io/myproj/web |
| 1234.dkr.ecr.us-east-1.amazonaws.com/web | ECR registry |
| quay.io/myorg/web | quay.io/myorg/web |
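The table follows a small set of rules, sketched below as the hypothetical `normalize_image` function. It assumes the Docker reference conventions: the first path component counts as a registry only if it contains a `.` or `:` or is `localhost`; otherwise the image goes to docker.io, single-name images get `library/`, and a reference with neither tag nor digest gets `:latest`. It skips validation and special cases such as index.docker.io.

```shell
#!/bin/sh
# normalize_image REF: print the fully qualified form of an image reference.
normalize_image() {
  local ref="$1" digest="" tag="" last first domain rest
  # 1. split off a digest (@sha256:...)
  case "$ref" in
    *@*) digest="@${ref#*@}"; ref="${ref%%@*}" ;;
  esac
  # 2. a tag is a ':' in the last path component (an earlier ':' is a registry port)
  last="${ref##*/}"
  case "$last" in
    *:*) tag=":${last#*:}"; ref="${ref%:*}" ;;
  esac
  # 3. the first component is a registry only if it looks like a host name
  case "$ref" in
    */*) first="${ref%%/*}" ;;
    *)   first="" ;;
  esac
  case "$first" in
    *.*|*:*|localhost) domain="$first"; rest="${ref#*/}" ;;
    *)                 domain="docker.io"; rest="$ref" ;;
  esac
  # 4. Docker Hub official images live under library/
  if [ "$domain" = "docker.io" ]; then
    case "$rest" in
      */*) ;;
      *)   rest="library/$rest" ;;
    esac
  fi
  # 5. no tag and no digest means :latest
  if [ -z "$tag" ] && [ -z "$digest" ]; then
    tag=":latest"
  fi
  echo "$domain/$rest$tag$digest"
}

normalize_image nginx            # docker.io/library/nginx:latest
normalize_image myorg/web:v2     # docker.io/myorg/web:v2
```

Running a failing image name through it shows at a glance whether the prefix you meant actually made it into the reference.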
Signatures:
```console
# if you used a private registry but omitted the prefix
$ kubectl describe pod web-1
Failed to pull image "myorg/web:v2": pull access denied, repository does not exist or may require authorization
# because Docker Hub has no "myorg/web" (you meant your private registry)
```
Fix: include the registry prefix in the image name.
5. Network can’t reach the registry
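The registry can’t be resolved or reached from the node. Image pulls are made by the container runtime on the node, using the node’s DNS and network, not the pod’s.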
Signatures:
```console
$ kubectl describe pod web-1 | tail -10
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "registry.example.com/myorg/web:v2": failed to do request: Head "https://registry.example.com/v2/myorg/web/manifests/v2": dial tcp: lookup registry.example.com on 10.96.0.10:53: no such host
```
Diagnosis:
```shell
# 1. can a pod resolve the registry's DNS name?
kubectl run debug --rm -it --image=busybox --restart=Never -- \
  nslookup registry.example.com

# 2. can a pod reach the registry?
kubectl run debug --rm -it --image=busybox --restart=Never -- \
  wget -qO- https://registry.example.com/v2/ ; echo

# 3. from the node, which is where the pull actually happens (ssh node-1)
curl -sS https://registry.example.com/v2/ ; echo
nslookup registry.example.com
```
Common sub-causes:
DNS not configured for the registry’s domain. Especially for private registries on internal domains.
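Fix: make the name resolvable on the node itself (the node’s resolver configuration, /etc/hosts, or the internal DNS zone). A pod’s dnsConfig or hostAliases only changes DNS inside that pod, not for the runtime that pulls the image.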
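HTTP proxy required. The cluster is behind a corporate proxy, and the container runtime isn’t using it.
Fix: set HTTP_PROXY, HTTPS_PROXY, and NO_PROXY in the container runtime’s environment (for example, a systemd drop-in for containerd.service or crio.service), then restart the runtime. The kubelet has no proxy flag of its own.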
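NetworkPolicy blocks egress from your debug pods. Pod NetworkPolicies don’t affect image pulls, because the runtime pulls over the node’s network, not the pod’s. But a default-deny policy like this one makes the in-cluster nslookup and wget checks above fail, so a reachable registry can look unreachable:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: my-ns
spec:
  podSelector: {}
  policyTypes: [Egress, Ingress]
  # no egress rules = no egress allowed
```
Fix: for the debug checks, allow the debug pod egress to DNS and the registry. If the pulls themselves are blocked, look at node-level egress controls (host firewalls, cloud security groups) instead.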
Registry is on a private network the cluster can’t reach. Common with on-prem or hybrid setups.
Fix: VPC peering, VPN, or proxy.
TLS cert issue. The registry uses a private CA, and the node’s container runtime doesn’t trust it.
```console
$ kubectl describe pod web-1 | tail -5
x509: certificate signed by unknown authority
```
Fix: add the CA cert to the node’s trust store, or configure the containerd registry config with tls_config.
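For the proxy case, a systemd drop-in is a common way to give containerd the proxy environment. The hypothetical `write_proxy_dropin` helper below writes one. The proxy URL and NO_PROXY list in the usage comment are placeholders: NO_PROXY should cover your cluster’s own service and pod ranges, plus any registry the nodes reach directly.

```shell
#!/bin/sh
# write_proxy_dropin DIR PROXY_URL NO_PROXY_LIST
# Writes DIR/http-proxy.conf, a systemd drop-in that sets proxy variables
# for the service that owns the drop-in directory DIR.
write_proxy_dropin() {
  local dir="$1" proxy="$2" no_proxy="$3"
  mkdir -p "$dir"
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy"
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=$no_proxy"
EOF
}

# usage on a node (placeholder values), then reload systemd and restart the runtime:
#   write_proxy_dropin /etc/systemd/system/containerd.service.d \
#     http://proxy.corp.example:3128 localhost,127.0.0.1,10.96.0.0/12,.svc,.cluster.local
#   systemctl daemon-reload && systemctl restart containerd
```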
6. Architecture mismatch
The image was built for a different CPU architecture than the node.
Signatures:
```console
$ kubectl describe pod web-1 | tail -5
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "myorg/web:v2": no matching manifest for linux/amd64 in the manifest list entries
```
The image only has linux/arm64 (for example, built on an Apple Silicon Mac) and the node is linux/amd64.
```shell
# rebuild for the target platform
docker buildx build --platform linux/amd64 -t myorg/web:v2 .

# or build for both
docker buildx build --platform linux/amd64,linux/arm64 -t myorg/web:v2 --push .
```
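Before rebuilding, you can confirm the mismatch by comparing the node’s architecture with the platforms the image offers. kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}' reports node architectures directly; on the node itself, uname -m uses kernel names (x86_64, aarch64) that differ from the names in image manifests. The hypothetical helpers below translate those names and check a space-separated list of os/architecture pairs, such as the platforms listed by docker manifest inspect. They ignore CPU variants (e.g. arm/v7).

```shell
#!/bin/sh
# oci_arch UNAME_M: translate `uname -m` output to the architecture names used in manifests
oci_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;   # e.g. ppc64le and s390x already match
  esac
}

# platform_listed ARCH "linux/amd64 linux/arm64 ...": succeed if linux/ARCH is listed
platform_listed() {
  local want="linux/$1" p
  for p in $2; do
    if [ "$p" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# example: an arm64-only image checked against an x86_64 node
if platform_listed "$(oci_arch x86_64)" "linux/arm64"; then
  echo "image has a manifest for this node"
else
  echo "no matching manifest: rebuild with --platform"
fi
```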
7. Registry rate-limited
Docker Hub (free tier): 100 pulls per 6 hours for anonymous users, counted per source IP, and 200 per 6 hours for authenticated users.
GHCR: 5000 / hour with auth.
Signatures:
```console
$ kubectl describe pod web-1 | tail -5
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "library/nginx:latest": toomanyrequests: You have reached your pull rate limit
```
Common in clusters pulling from Docker Hub directly.
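Why clusters hit this so easily: the anonymous limit is counted per source IP, and nodes behind a shared NAT gateway all pull from the same IP. The hypothetical `pulls_vs_limit` helper below does the arithmetic for a rollout. It assumes roughly one counted pull per image per node with a cold cache, and takes the limit as a parameter because registries change their quotas.

```shell
#!/bin/sh
# pulls_vs_limit NODES IMAGES_PER_NODE LIMIT
# Assumes one counted pull per image per node (cold caches) and one shared
# egress IP, so all nodes draw from the same anonymous quota.
pulls_vs_limit() {
  local nodes="$1" images="$2" limit="$3" pulls
  pulls=$((nodes * images))
  if [ "$pulls" -le "$limit" ]; then
    echo "ok: $pulls of $limit pulls"
  else
    echo "over: $pulls pulls against a limit of $limit ($((pulls - limit)) over)"
  fi
}

# 30 nodes behind one NAT, each pulling 4 Docker Hub images, against 100 anonymous pulls
pulls_vs_limit 30 4 100   # over: 120 pulls against a limit of 100 (20 over)
```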
Fix:
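Authenticate pulls. Give the nodes Docker Hub account credentials (an imagePullSecret, or node-level credentials as in approach 3 of section 3), so pulls count against that account’s higher limit instead of the shared anonymous one.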
Use a registry mirror — configure containerd/CRI-O to pull from a mirror (e.g., mirror.gcr.io for Docker Hub).
Cache locally — run a Harbor / ECR / GCR mirror in your own infrastructure, pull from there.
Pre-pull images. Use a DaemonSet, or bake images into the node image, to pre-populate node caches.
8. Image too large
The image is hundreds of MB or several GB. On a slow link, pulling it can take long enough to hit a pull timeout; which timeout applies, and its default, depends on the container runtime and its version.
```shell
# 1. image size, in bytes
docker inspect myorg/web:v2 --format='{{.Size}}'

# 2. is the image multi-GB?
# e.g. 1.2 GB, mostly from a fat base image (ubuntu + node + npm install)
```
Fix: use smaller base images:
```dockerfile
# each variant below is a separate Dockerfile

# bad: 1.2 GB
FROM ubuntu:22.04
WORKDIR /app
RUN apt-get update && apt-get install -y nodejs npm
COPY . .
RUN npm install
CMD ["node", "server.js"]

# better: 200 MB
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]

# best: 80 MB
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]

# or even smaller with a multi-stage build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
CMD ["node", "dist/server.js"]
```
9. Storage limit (no room to extract)
The node’s ephemeral storage is full, and the kubelet can’t extract the image layers.
Signatures:
```console
$ kubectl describe pod web-1 | tail -5
Events:
  Warning  Failed  5m  kubelet  Failed to pull image "myorg/web:v2": failed to extract layer sha256:... write /var/lib/containerd/.../layer.tar: no space left on device
```
```console
$ kubectl describe node node-1 | grep -A 3 "Conditions:"
Conditions:
  Type          Status  Reason
  DiskPressure  True    LowDisk
```
Diagnosis:
```shell
# 1. node disk
kubectl describe node node-1 | grep -E "DiskPressure|ephemeral-storage"

# 2. on the node (ssh node-1)
df -h /var/lib/containerd /var/lib/kubelet
du -sh /var/lib/containerd /var/lib/kubelet
```
Fix: remove unused images (crictl rmi --prune on the node), expand the disk, or move image storage to a larger volume.
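Before the disk fills, the kubelet’s own image garbage collection should step in: once image-filesystem usage reaches imageGCHighThresholdPercent (default 85), it deletes unused images until usage drops to imageGCLowThresholdPercent (default 80). The hypothetical `image_gc_plan` sketch below does that arithmetic in whole GB; it is a rough model, not the kubelet’s exact accounting. GC can only reclaim images no container is using, so a disk full of in-use images or non-image data still needs one of the fixes above.

```shell
#!/bin/sh
# image_gc_plan TOTAL_GB USED_GB [HIGH_PCT] [LOW_PCT]
# Models the kubelet's image GC thresholds (defaults 85% and 80%) with integer math.
image_gc_plan() {
  local total="$1" used="$2" high="${3:-85}" low="${4:-80}" pct target
  pct=$((used * 100 / total))
  if [ "$pct" -lt "$high" ]; then
    echo "usage ${pct}%: below ${high}%, no image GC"
    return 0
  fi
  # GC frees unused images until usage is back down to the low threshold
  target=$((total * low / 100))
  echo "usage ${pct}%: GC tries to free $((used - target)) GB of unused images"
}

image_gc_plan 100 90   # usage 90%: GC tries to free 10 GB of unused images
image_gc_plan 100 70   # usage 70%: below 85%, no image GC
```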
The “is it the registry or the cluster?” test
```shell
# 1. can *you* pull the image from outside the cluster?
docker pull myorg/web:v2

# 2. can a *pod* in the cluster pull any image?
kubectl run debug --rm -it --image=busybox --restart=Never -- echo "pulled busybox"
# if THIS fails, it's a network / kubelet / registry config issue
# if THIS works but the real image fails, it's specific to that image/credentials
```
The “is it the secret?” test
```shell
# decode a dockerconfigjson secret
kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
```
If the server is wrong, the username is wrong, or the auth doesn’t base64-decode to username:password, the secret is the problem.
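The last of those checks can be scripted. The hypothetical `check_auth` function below takes one `auth` value (for example from jq -r '.auths["registry.example.com"].auth' on the decoded JSON) and the username you expect, and reports what is wrong. It assumes a base64 that accepts -d, as GNU coreutils’ does.

```shell
#!/bin/sh
# check_auth AUTH EXPECTED_USER
# AUTH is the "auth" field of one auths.<server> entry: base64("user:password").
check_auth() {
  local auth="$1" want_user="$2" decoded user
  if ! decoded=$(printf '%s' "$auth" | base64 -d 2>/dev/null); then
    echo "auth is not valid base64"
    return 1
  fi
  case "$decoded" in
    *:*) ;;
    *) echo "auth does not decode to user:password"; return 1 ;;
  esac
  user="${decoded%%:*}"
  if [ "$user" != "$want_user" ]; then
    echo "auth is for user '$user', expected '$want_user'"
    return 1
  fi
  echo "ok: auth decodes to $want_user:<password>"
}

check_auth YWxpY2U6eHh4 alice   # ok: auth decodes to alice:<password>
```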
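Pulling from a cloud registry without imagePullSecrets
Managed Kubernetes services can authenticate pulls from their own registry with the node’s cloud identity: EKS nodes pull from ECR with the node’s IAM role, GKE nodes pull from Artifact Registry/GCR with the node’s service account, and AKS pulls from an attached ACR with the kubelet’s managed identity. If that’s set up, don’t create a docker-registry secret; the kubelet obtains credentials itself through a credential provider.
```shell
# EKS: the node IAM role needs ECR read access; no imagePullSecrets needed
# GKE: the node service account needs read access to the registry
# AKS: attach the registry to the cluster (az aks update ... --attach-acr <registry>)
# Pod-level identity (IRSA, GKE Workload Identity) does not apply to image pulls
```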
If you have a working cloud-native setup but you’ve also created an imagePullSecrets, the kubelet will use the secrets first, and may fail if those secrets are stale or wrong.
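You don’t have to wait out the backoff after a fix. If the fix changed the pod template (a corrected image name in the Deployment), the rollout replaces the pods. If it was outside the pod spec (a secret, DNS, a proxy), kubectl delete pod <name> forces an immediate re-pull.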
The “latest” tag is a liar. image: myorg/web:latest doesn’t mean “the newest stable version”; it means “whatever was tagged latest when this node pulled it,” so nodes that pulled at different times can run different images. Use specific tags (e.g., v2.1.4) or a digest (myorg/web@sha256:abc123...).
Multi-arch images need a manifest list. If you only built for one platform, the image won’t pull on the other.
The default service account has no imagePullSecrets by default. You have to add them.
Don’t put credentials in your image name. image: myorg/web:v2?token=xxx isn’t a valid image reference, so the pod fails with InvalidImageName. Use imagePullSecrets.
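Pull policies. imagePullPolicy: IfNotPresent skips the pull if the image is already cached on the node, Always checks the registry every time a container starts, and Never never pulls (the image must already be on the node). If you omit the field, Kubernetes defaults it to Always for :latest or untagged images and to IfNotPresent otherwise.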
```yaml
containers:
- name: web
  image: myorg/web:v2
  imagePullPolicy: Always   # useful for `:latest` to ensure freshness
```
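The defaulting rule is easy to misremember, so here it is as a hypothetical `default_pull_policy` sketch, following the rules in the Kubernetes images documentation: a digest gives IfNotPresent, `:latest` or no tag gives Always, and any other tag gives IfNotPresent. It only inspects the string; the API server applies the real default when the object is created.

```shell
#!/bin/sh
# default_pull_policy IMAGE: the imagePullPolicy Kubernetes fills in when the field is omitted
default_pull_policy() {
  local ref="$1" last
  case "$ref" in
    *@*) echo IfNotPresent; return 0 ;;   # pinned by digest
  esac
  last="${ref##*/}"   # a ':' before the last '/' is a registry port, not a tag
  case "$last" in
    *:latest) echo Always ;;
    *:*)      echo IfNotPresent ;;
    *)        echo Always ;;              # no tag means an implicit :latest
  esac
}

default_pull_policy myorg/web:v2         # IfNotPresent
default_pull_policy myorg/web            # Always
default_pull_policy localhost:5000/web   # Always (5000 is a port, not a tag)
```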
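Cached images aren’t cleaned up until the disk fills. The kubelet only garbage-collects unused images once disk usage crosses imageGCHighThresholdPercent (default 85%), so nodes accumulate old images until then. Use crictl rmi to clean up sooner.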
Pulling from one registry, pushing to another. Multi-cluster setups often have a local mirror. Make sure image references match the local mirror’s path, not the source registry’s.
Pod sandbox image. Even if your container image pulls fine, every pod also needs the sandbox (“pause”) image (e.g., registry.k8s.io/pause:3.9). If that pull is blocked, sandbox creation fails (FailedCreatePodSandBox events) before your image is even tried.
Long gaps between events are the backoff. The kubelet retries a failing pull with exponential backoff capped at five minutes and never gives up on its own, so a pod can sit in ImagePullBackOff for hours with only occasional new events.