CronJobs — Time-Scheduled Workloads
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
A CronJob runs Jobs on a time-based schedule. Think of it as the k8s equivalent of crontab(5): a schedule expression that says “create a Job at this time, with this template.”
CronJob is a thin layer over Job:
CronJob
│
│ "every hour"
▼
Creates Job ──── creates Pods ──── runs to completion
│
│ Job is kept around for history
▼
After ttlSecondsAfterFinished, Job is GC'd
For simple “run X every Y” patterns, CronJob is the right answer. For DAG dependencies, backfills, retries across days, or SLA-based scheduling, reach for a workflow engine like Argo Workflows or Airflow.
Table of Contents
- The CronJob Mental Model
- Manifest Anatomy
- The Schedule Field — Cron Syntax
- Time Zones
- Concurrency Policies
- Starting Deadline Seconds
- Suspend, Resume, and History Limits
- CronJob Lifecycle (Tick by Tick)
- Patterns
- CronJob vs External Schedulers
- Operational Recipes
- Troubleshooting
- Gotchas and Common Mistakes
- Related Notes
1. The CronJob Mental Model
The contract
“At every scheduled time, create a Job from this template. Don’t create overlapping Jobs (depending on the concurrency policy). Keep history of the last N Jobs. Allow the schedule to be paused.”
A CronJob does not run Pods directly. It creates Jobs. The Jobs create Pods. The Pods run the workload.
The clock
The schedule is evaluated by the cronjob controller, which runs in kube-controller-manager (not on the kubelet, not on a node). One controller instance is the leader at any time; if it crashes, another takes over.
Important: when timeZone is not set, the controller interprets the schedule in the local time zone of the kube-controller-manager process, which is UTC on most clusters. The controller’s clock, not any node’s clock, determines when Jobs are created.
The state machine
┌────────────────────────────────────────┐
│ │
▼ │
┌──────────┐ │
│ Created │ │
└────┬─────┘ │
│ │
│ (next scheduled time) │
▼ │
┌──────────┐ │
│ Schedule │──tick──▶ Create Job │
│ Triggered│ │
└────┬─────┘ │
│ │
│ (concurrency policy applies) │
├─ Allow ──▶ Job is created ──▶ next tick │
│ │
├─ Forbid ──▶ if previous still active, skip this tick
│ │
└─ Replace ──▶ kill previous, start new
What a CronJob does NOT do
| Capability | CronJob | Argo Workflows / Airflow |
|---|---|---|
| Run a Job on a schedule | ✅ | ✅ |
| DAG dependencies (B after A) | ❌ | ✅ |
| Backfills across missed days | ❌ | ✅ |
| Conditional execution | ❌ | ✅ |
| SLA-based scheduling | ❌ | ✅ |
| Cross-cluster scheduling | ❌ | ✅ (with effort) |
| Pause / resume schedule | ✅ (suspend) | ✅ |
| Per-run history | ✅ (configurable) | ✅ (richer) |
2. Manifest Anatomy
A minimum-viable CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"   # every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.36
            args: ["echo", "hello from cron"]
          restartPolicy: OnFailure
Full anatomy:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
  namespace: batch
  labels:
    app: backup
    tier: data
spec:
  schedule: "0 2 * * *"              # 02:00 every day
  timeZone: "Etc/UTC"                # k8s 1.25+
  startingDeadlineSeconds: 200       # see section 6
  concurrencyPolicy: Forbid          # see section 5
  suspend: false                     # see section 7
  successfulJobsHistoryLimit: 3      # see section 7
  failedJobsHistoryLimit: 1          # see section 7
  jobTemplate:                       # full Job spec
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: backup-runner
          containers:
          - name: backup
            image: myorg/backup:2.1
            command: ["./backup.sh"]
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            resources:
              requests:
                cpu: 200m
                memory: 256Mi
              limits:
                cpu: 1
                memory: 1Gi
status:
  active:                            # currently-running Jobs created by this CronJob
  - apiVersion: batch/v1
    kind: Job
    name: db-backup-28532020
    namespace: batch
    resourceVersion: "12345"
    uid: abc-123
  lastScheduleTime: "2025-05-24T02:00:00Z"
  lastSuccessfulTime: "2025-05-24T02:00:30Z"
Required fields
| Field | Required | Why |
|---|---|---|
| `apiVersion` | yes | Always batch/v1 |
| `kind` | yes | Must be CronJob |
| `metadata.name` | yes | DNS-1123 label |
| `spec.schedule` | yes | Cron expression |
| `spec.jobTemplate` | yes | The Job template to instantiate |
| `spec.jobTemplate.spec.template.spec.restartPolicy` | yes (in template) | OnFailure or Never |
schedule is required, and it can be changed
You can change the schedule field on an existing CronJob with kubectl edit, kubectl patch, or kubectl apply. There is no need to delete and recreate the object. The new schedule applies to Jobs created after the change.
Jobs that have already started keep running under the spec they were created from, so a schedule change never affects an in-flight run.
3. The Schedule Field — Cron Syntax
The schedule field uses standard cron syntax, with extensions for @hourly etc.
The standard cron format
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday = 0)
│ │ │ │ │
* * * * *
Common expressions
| Expression | When |
|---|---|
| `* * * * *` | Every minute |
| `0 * * * *` | Every hour, on the hour |
| `0 0 * * *` | Every day at midnight |
| `0 2 * * *` | Every day at 02:00 |
| `0 0 * * 0` | Every Sunday at midnight |
| `0 0 1 * *` | First of the month at midnight |
| `*/5 * * * *` | Every 5 minutes |
| `0 9-17 * * 1-5` | Hourly from 9 AM to 5 PM, Mon-Fri |
| `0 0,12 * * *` | Daily at midnight and noon |
| `30 4 1,15 * *` | 04:30 on the 1st and 15th of every month |
The @ shortcuts
| Shortcut | Equivalent |
|---|---|
| `@hourly` | `0 * * * *` |
| `@daily` | `0 0 * * *` |
| `@midnight` | `0 0 * * *` |
| `@weekly` | `0 0 * * 0` |
| `@monthly` | `0 0 1 * *` |
| `@yearly` | `0 0 1 1 *` |
| `@annually` | `0 0 1 1 *` |
Special characters
| Char | Meaning | Example |
|---|---|---|
| `*` | Any value | `*` in hour = any hour |
| `,` | Value list separator | `1,3,5` in day = 1st, 3rd, 5th |
| `-` | Range | `9-17` in hour = 9 AM to 5 PM |
| `/` | Step | `*/15` in minute = every 15 min |
| `?` | No specific value (some implementations) | The Kubernetes parser treats it like `*`; prefer `*` for portability |
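To make the special characters concrete, here is a minimal sketch of how a single numeric cron field could be expanded into the set of values it matches. The function name `expand_field` is hypothetical, and the sketch deliberately ignores month and weekday names, `?`, and input validation; it is not the parser Kubernetes uses.

```python
def expand_field(field, lo, hi):
    """Expand one numeric cron field (e.g. '*/15', '9-17', '1,3,5') into a set of ints.

    lo and hi are the inclusive bounds of the field (0-59 for minutes, 0-23 for hours, ...).
    """
    values = set()
    for part in field.split(","):            # ',' separates alternatives
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":                        # '*' covers the whole field
            start, end = lo, hi
        elif "-" in rng:                      # 'a-b' is an inclusive range
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:                                 # a single value
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


print(sorted(expand_field("*/15", 0, 59)))
print(sorted(expand_field("9-17", 0, 23)))
```

A full five-field expression matches a minute when each field's set contains the corresponding part of the timestamp, subject to the day-of-month / day-of-week rule described under the gotchas.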
Two gotchas in cron syntax
- Day-of-month and day-of-week are OR’d if both are restricted. If you set both, the Job runs when either matches. To get “the 1st of the month, but only when it is a Monday,” you need a workaround (check the date inside the script and exit early, or use a workflow engine). Splitting into two CronJobs does not help, because that still produces an OR.
- “Every minute” is * * * * *, not */1 * * * *. Both work, but the former is simpler.
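The OR rule for the two day fields is easy to get wrong, so here is a small sketch of the matching logic, assuming the classic cron semantics described above. The function `day_matches` and its `None`-means-`*` convention are illustrative choices, not part of any Kubernetes API.

```python
from datetime import date
from typing import Optional


def day_matches(d: date, dom: Optional[set], dow: Optional[set]) -> bool:
    """Decide whether a date satisfies the day-of-month and day-of-week fields.

    None stands for '*' (unrestricted). Weekdays use cron numbering: Sunday = 0.
    """
    cron_dow = (d.weekday() + 1) % 7          # Python has Monday = 0; cron has Sunday = 0
    if dom is not None and dow is not None:
        # Both fields restricted: the day matches if EITHER field matches.
        return d.day in dom or cron_dow in dow
    dom_ok = dom is None or d.day in dom
    dow_ok = dow is None or cron_dow in dow
    return dom_ok and dow_ok


# "0 0 1 * 1": the 1st of the month OR any Monday
print(day_matches(date(2025, 6, 1), {1}, {1}))   # Sunday the 1st
print(day_matches(date(2025, 6, 2), {1}, {1}))   # Monday the 2nd
print(day_matches(date(2025, 6, 3), {1}, {1}))   # Tuesday the 3rd
```

With both fields restricted, the first two dates match and the third does not, which is why an AND condition has to be enforced inside the workload itself.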
Validation
The schedule is validated at creation time. Invalid expressions are rejected with a clear error. You can’t create a CronJob with a bad schedule.
To check the next 5 fire times of a cron expression without creating a CronJob:
# Using a Python one-liner (requires the third-party croniter package: pip install croniter)
python3 -c "
from datetime import datetime
from croniter import croniter
c = croniter('0 2 * * *', datetime.now())
for _ in range(5):
    print(c.get_next(datetime))
"
Or use a tool like crontab.guru for human-readable explanations.
4. Time Zones
spec.timeZone (k8s 1.25+) lets you specify a timezone:
spec:
  schedule: "0 9 * * *"   # 09:00
  timeZone: "America/New_York"
This means: 09:00 New York time, every day. The schedule is evaluated in the specified timezone, not in the controller’s timezone.
Before k8s 1.25
The schedule was always interpreted in the controller’s timezone, which is typically UTC. To get “9 AM Eastern Time,” you’d compute the UTC equivalent:
- 9 AM ET (EST, UTC-5) = 14:00 UTC
- 9 AM ET (EDT, UTC-4) = 13:00 UTC
This breaks during DST transitions. Use timeZone to avoid the headache.
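The two UTC equivalents above can be checked with Python's standard `zoneinfo` module. This is only a sketch for verifying the arithmetic; it assumes the system has an IANA timezone database installed, and the helper name `utc_hour_for_local` and the sample dates are arbitrary.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def utc_hour_for_local(year, month, day, hour, tz=NEW_YORK):
    """Return the UTC hour that corresponds to a local wall-clock hour on a given date."""
    local = datetime(year, month, day, hour, tzinfo=tz)
    return local.astimezone(timezone.utc).hour


winter = utc_hour_for_local(2025, 1, 15, 9)   # EST is in effect
summer = utc_hour_for_local(2025, 7, 15, 9)   # EDT is in effect
print(f"9 AM New York is {winter}:00 UTC in January and {summer}:00 UTC in July")
```

A fixed UTC schedule can only be right for one of the two offsets, which is the problem timeZone solves.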
The IANA timezone database
timeZone accepts any IANA timezone name:
| Region | Timezones |
|---|---|
| Americas | America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Sao_Paulo |
| Europe | Europe/London, Europe/Paris, Europe/Berlin, Europe/Moscow |
| Asia | Asia/Tokyo, Asia/Shanghai, Asia/Kolkata, Asia/Dubai |
| Pacific | Pacific/Auckland, Australia/Sydney |
| UTC | Etc/UTC (alias for UTC, no DST) |
For full list, see the IANA timezone database.
DST and CronJob
CronJob does not handle DST transitions intelligently. From the official docs:
“The controller does not synchronize the time zone, e.g. when a region permanently changes its time zone. To handle this, the user is expected to recreate the CronJob object.”
This means: if Brazil ends DST and the country stays at UTC-3 forever, your CronJob will keep firing at the “old” UTC-3 time. You need to recreate it with the new offset.
In practice: pick Etc/UTC if you don’t have a strong reason for local time, and let your application handle the conversion.
5. Concurrency Policies
What happens if a Job is still running when the next scheduled time arrives:
spec:
  concurrencyPolicy: Forbid   # or Allow or Replace
| Policy | Behavior |
|---|---|
| `Allow` (default) | Overlapping Jobs are allowed. Multiple instances can run concurrently. |
| `Forbid` | Skip the new run if the previous one is still active. |
| `Replace` | Kill the previous Job’s Pods and start the new one. |
The decision matrix
| Use case | Recommended policy |
|---|---|
| Database backups | Forbid — don’t run two backups against the same DB at once |
| Log rotation | Forbid — don’t double-rotate |
| Cache warmer | Replace — you want the latest run |
| Metric scrapers | Allow — multiple instances are fine |
| Data pipeline (independent runs) | Allow — each run is independent |
| Distributed ML training (resume from checkpoint) | Forbid — don’t step on the running training |
Forbid and missed runs
With Forbid, if a scheduled time is skipped, the run is lost. The next scheduled time is the next slot. There is no catch-up.
Example: schedule is 0 * * * * (every hour), concurrencyPolicy: Forbid. A run starts at 14:00 and runs until 15:30. The 15:00 run is skipped. The 16:00 run starts on time. The 15:00 run is not made up.
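The example above can be replayed with a small simulation. This is a simplified model of Forbid, not the controller's code: `simulate_forbid` is a hypothetical helper that assumes each Job's duration is known in advance and that a Job is "active" until that duration has elapsed.

```python
from datetime import datetime, timedelta


def simulate_forbid(ticks, durations, default=timedelta(minutes=1)):
    """Return (started, skipped) schedule times under concurrencyPolicy: Forbid.

    durations maps a schedule time to how long the Job started at that time runs.
    """
    started, skipped = [], []
    busy_until = None                      # when the currently running Job ends, if any
    for tick in ticks:
        if busy_until is not None and tick < busy_until:
            skipped.append(tick)           # previous Job still active: this slot is lost
            continue
        started.append(tick)
        busy_until = tick + durations.get(tick, default)
    return started, skipped


t14 = datetime(2025, 5, 24, 14, 0)
ticks = [t14 + timedelta(hours=h) for h in range(3)]          # 14:00, 15:00, 16:00
started, skipped = simulate_forbid(ticks, {t14: timedelta(minutes=90)})
print("started:", [t.strftime("%H:%M") for t in started])
print("skipped:", [t.strftime("%H:%M") for t in skipped])
```

The 14:00 run blocks the 15:00 slot, and nothing later makes up for it.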
Replace and Forbid are not the same
| | Forbid | Replace |
|---|---|---|
| Previous Job still running? | Skip new run | Kill previous, start new |
| Previous run is preserved? | Yes (continues) | No (terminated) |
| Use for | Sequential safety | Latest-wins |
Replace is the right policy for “I always want the most recent run” patterns. Forbid is the right policy for “don’t run two of me at once.”
What “still active” means
A Job is “active” if it has any non-terminated Pod. Once all Pods have terminated (success or failure), the next scheduled time triggers a new Job.
For a Job with restartPolicy: OnFailure and a script that keeps failing and restarting, the Job can stay “active” indefinitely. Forbid will keep skipping new runs until the Job is deleted or completes.
6. Starting Deadline Seconds
startingDeadlineSeconds bounds the time between the scheduled time and the actual Job creation:
spec:
  startingDeadlineSeconds: 200
If the controller can’t create the Job within 200 seconds of the scheduled time, the run is skipped. The schedule continues normally from there.
Why this matters
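The controller-manager might be down (a leader election issue, a crash, a rolling update) and come back minutes or hours later. The controller does not replay every missed run: it starts at most one Job, for the most recent missed schedule time. Without startingDeadlineSeconds, that single late run can start long after its slot, and the official documentation states that if more than 100 schedule times were missed, the controller logs an error and does not start the Job.
With startingDeadlineSeconds: 200, the controller only creates a Job if it’s “still relevant” (within the deadline), and only schedule times inside that window count as missed. Older missed runs are dropped.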
Choosing the value
| Pattern | Recommended value |
|---|---|
| Every minute | 60 (only create if within 1 minute) |
| Every hour | 300 (5 minutes) |
| Daily at 02:00 | 3600 (1 hour) or more |
| Once a month | 86400 (1 day) |
| Critical hourly backups | 600 (10 minutes — you want a tighter bound) |
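If you set it too high, a run can start long after its slot, when it may no longer be useful. If too low, brief controller-manager restarts will drop runs. The official documentation also warns that values under 10 seconds may prevent the CronJob from being scheduled at all, because the controller checks roughly every 10 seconds.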
The behavior
Time 14:00:00 - scheduled
Time 14:00:05 - controller creates Job ✓ (within deadline)
Time 14:00:00 - scheduled
Time 14:00:30 - controller-manager is down
Time 14:05:00 - controller-manager comes back
- startingDeadlineSeconds: 200
- now > scheduled + 200s, so this run is SKIPPED
Time 15:00:00 - next scheduled time
Time 15:00:05 - controller creates Job ✓
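The deadline check in the timeline above boils down to one comparison. The sketch below uses a hypothetical helper, `within_deadline`, to replay both cases; it models only the deadline rule and nothing else the controller does.

```python
from datetime import datetime, timedelta
from typing import Optional


def within_deadline(scheduled: datetime, now: datetime, deadline_seconds: Optional[int]) -> bool:
    """True if a run scheduled at `scheduled` may still be started at `now`."""
    if deadline_seconds is None:
        return True                        # no startingDeadlineSeconds configured
    return now <= scheduled + timedelta(seconds=deadline_seconds)


scheduled = datetime(2025, 5, 24, 14, 0, 0)
on_time = within_deadline(scheduled, scheduled + timedelta(seconds=5), 200)
too_late = within_deadline(scheduled, scheduled + timedelta(minutes=5), 200)
print(on_time, too_late)
```

Five seconds late is inside the 200-second window; five minutes late (300 seconds) is outside it, so that run is dropped.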
Important: no catch-up
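Unlike some external schedulers, CronJob does not compensate for missed runs. If the controller was down for 3 hours and the schedule is hourly, it does not create three Jobs on recovery: at most the most recent missed run is started, and only if it is still within startingDeadlineSeconds when that is set. The earlier runs are lost.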
If you need reliable catch-up, use a workflow engine (Argo Workflows, Airflow) with explicit backfill support.
7. Suspend, Resume, and History Limits
Suspend
Pause a CronJob from creating new Jobs:
spec:
  suspend: true
The CronJob is “paused.” No new Jobs are created. Existing Jobs are not affected. You can edit the CronJob to change its spec while suspended (e.g., update the image, change the schedule).
kubectl patch cronjob <name> -p '{"spec":{"suspend":true}}' --type=merge
This is the right way to:
- Disable a CronJob temporarily without deleting it (keeps history)
- Edit the CronJob’s template safely
- Hold the schedule during maintenance
To resume:
kubectl patch cronjob <name> -p '{"spec":{"suspend":false}}' --type=merge
The next scheduled time after resume will trigger a Job. Runs whose scheduled time fell inside the suspension count as missed. According to the official documentation, if no startingDeadlineSeconds is set, a missed run is scheduled immediately on resume, so set a deadline if a late run would be harmful.
History limits
successfulJobsHistoryLimit and failedJobsHistoryLimit control how many finished Jobs are kept:
spec:
  successfulJobsHistoryLimit: 3   # keep 3 most recent successful Jobs
  failedJobsHistoryLimit: 1       # keep 1 most recent failed Job
Default: 3 and 1.
When a new Job is created, the controller counts the existing finished Jobs (matching the CronJob’s ownerReference). If the count exceeds the limit, the oldest are deleted.
Set these to 0 if you don’t want any history (saves etcd space). For high-volume CronJobs (every minute, keeping 3 = 3 minutes of history), this matters.
The etcd cost
Each Job creates Pods, which have status, events, and logs. With 1000 CronJobs each keeping 3 Jobs, you have 3000 Job objects in etcd. Each Job has 1+ Pod. That’s thousands of objects, all with events.
For cost-sensitive clusters, set successfulJobsHistoryLimit: 1 or 0 for high-frequency CronJobs.
The cleanup mechanism
The CronJob controller deletes surplus Job objects itself. The Pods are then removed by the garbage collector through owner references:
- The CronJob is the owner of each Job it creates
- When the count of finished owned Jobs exceeds the limit, the controller deletes the oldest
- The deleted Job’s Pods are garbage-collected via their owner references to the Job
This is why the limits are enforced cleanly without a separate cleanup job.
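As a rough illustration of the pruning rule, the sketch below picks which finished Jobs fall outside a history limit. The function `jobs_to_delete` and the tuple layout are assumptions made for this example; the real controller applies the rule separately to successful and failed Jobs and may order them by a different timestamp.

```python
def jobs_to_delete(finished_jobs, limit):
    """Return the names of finished Jobs beyond the newest `limit`.

    finished_jobs is a list of (name, timestamp) tuples; larger timestamps are newer.
    """
    newest_first = sorted(finished_jobs, key=lambda job: job[1], reverse=True)
    return [name for name, _ in newest_first[limit:]]


successful = [("backup-a", 1), ("backup-b", 2), ("backup-c", 3), ("backup-d", 4)]
print(jobs_to_delete(successful, 3))   # only the oldest Job exceeds the limit
print(jobs_to_delete(successful, 0))   # a limit of 0 keeps nothing
```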
8. CronJob Lifecycle (Tick by Tick)
What happens at every scheduled tick:
1. CronJob controller's sync loop runs (roughly every 10 seconds)
2. Controller checks all CronJobs for due times
3. For each due CronJob:
   a. Check if suspended → if yes, skip
   b. Check if a previous Job is still active
      - If concurrencyPolicy=Forbid and previous active → skip
      - If concurrencyPolicy=Replace and previous active → delete previous
      - Otherwise → proceed
   c. Check if within startingDeadlineSeconds
      - If past deadline → skip (drop the missed run)
   d. Create the Job from jobTemplate
   e. Update CronJob status (lastScheduleTime, active)
4. Garbage-collect old Jobs based on history limits
5. Wait for the next sync
The “tick” is the controller’s reconciliation loop, which runs roughly every 10 seconds by default. So a CronJob with * * * * * (every minute) might fire up to 10 seconds after the scheduled time, depending on the controller’s load.
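The per-tick checks can be condensed into a single decision function. This is a simplified sketch, not the controller's implementation: the name `decide` and its boolean inputs are hypothetical, and it checks the deadline before the concurrency policy so that a run that is already too late never causes a running Job to be replaced (an ordering assumption on my part).

```python
def decide(suspended, policy, previous_active, past_deadline):
    """Return the action for one due schedule time: 'skip', 'replace', or 'create'."""
    if suspended:
        return "skip"                      # spec.suspend: true
    if past_deadline:
        return "skip"                      # startingDeadlineSeconds exceeded
    if previous_active and policy == "Forbid":
        return "skip"                      # never overlap
    if previous_active and policy == "Replace":
        return "replace"                   # delete the previous Job, then create the new one
    return "create"                        # Allow, or nothing is running


for policy in ("Allow", "Forbid", "Replace"):
    print(policy, "->", decide(False, policy, previous_active=True, past_deadline=False))
```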
The naming convention
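Each Job created by a CronJob is named:
<cronjob-name>-<scheduled-time-in-minutes-since-unix-epoch>
For example, a db-backup CronJob scheduled for 2025-05-24T02:00:00Z (Unix time 1748052000) creates db-backup-29134200, because 1748052000 / 60 = 29134200. The suffix is derived from the scheduled time, not the creation time, so the same slot always maps to the same Job name.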
The Job’s ownerReference points to the CronJob, so when the CronJob is deleted, all its Jobs are deleted too.
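To map a Job name back to its slot, or predict the name for a given slot, the suffix can be computed directly. The sketch below assumes the suffix is the scheduled time expressed in whole minutes since the Unix epoch, which is how recent controller versions name Jobs as far as I know; `cronjob_job_name` is a hypothetical helper, so verify against the names in your own cluster.

```python
from datetime import datetime, timezone


def cronjob_job_name(cronjob_name: str, scheduled: datetime) -> str:
    """Build the Job name for a schedule time (suffix = minutes since the Unix epoch)."""
    minutes = int(scheduled.timestamp()) // 60
    return f"{cronjob_name}-{minutes}"


def scheduled_time_from_name(job_name: str) -> datetime:
    """Recover the schedule time (UTC) from a Job name built by the rule above."""
    minutes = int(job_name.rsplit("-", 1)[1])
    return datetime.fromtimestamp(minutes * 60, tz=timezone.utc)


slot = datetime(2025, 5, 24, 2, 0, tzinfo=timezone.utc)
name = cronjob_job_name("db-backup", slot)
print(name, "->", scheduled_time_from_name(name).isoformat())
```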
The status
status:
  active:
  - apiVersion: batch/v1
    kind: Job
    name: db-backup-29134200
    namespace: batch
  lastScheduleTime: "2025-05-24T02:00:00Z"
  lastSuccessfulTime: "2025-05-24T02:00:30Z"
lastScheduleTime is when the controller last created a Job. lastSuccessfulTime is when the most recent Job reached Complete.
If lastScheduleTime is updating but lastSuccessfulTime is not, the Jobs are being created but failing. Investigate.
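The two timestamps are enough for a basic health check. The sketch below classifies a CronJob from its status fields; `diagnose`, its return labels, and the `max_age` threshold are all choices made for this example, and the status dict is assumed to have been fetched separately (for instance from kubectl get cronjob -o json).

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse the UTC timestamps used in CronJob status (e.g. 2025-05-24T02:00:00Z)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def diagnose(status: dict, now: datetime, max_age: timedelta) -> str:
    """Classify a CronJob as 'not-firing', 'firing-but-failing', or 'healthy'."""
    last_scheduled = status.get("lastScheduleTime")
    last_success = status.get("lastSuccessfulTime")
    if last_scheduled is None or now - parse_k8s_time(last_scheduled) > max_age:
        return "not-firing"                # no recent Job creation
    if last_success is None or now - parse_k8s_time(last_success) > max_age:
        return "firing-but-failing"        # Jobs are created but none completes
    return "healthy"


status = {
    "lastScheduleTime": "2025-05-24T02:00:00Z",
    "lastSuccessfulTime": "2025-05-24T02:00:30Z",
}
now = parse_k8s_time("2025-05-24T03:00:00Z")
print(diagnose(status, now, max_age=timedelta(hours=25)))
```

For a daily schedule, a `max_age` slightly longer than one period (25 hours here) avoids false alarms from runs that start a few minutes late.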
9. Patterns
Pattern 1: Database backup every night
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"
  timeZone: "Etc/UTC"
  startingDeadlineSeconds: 600
  concurrencyPolicy: Forbid           # don't run two backups at once
  successfulJobsHistoryLimit: 7       # keep a week of successful backups
  failedJobsHistoryLimit: 3           # keep 3 failures for debugging
  jobTemplate:
    spec:
      backoffLimit: 1                 # allow at most one retry (use 0 for none)
      activeDeadlineSeconds: 3600     # 1 hour max
      ttlSecondsAfterFinished: 604800 # delete Job after 7 days
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: backup-runner
          containers:
          - name: backup
            image: myorg/backup:2.1
            command: ["./backup.sh"]
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: BACKUP_BUCKET
              value: s3://myorg-db-backups/
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
Pattern 2: Cleanup job every hour
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-stale-data
spec:
  schedule: "0 * * * *"   # top of every hour
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: myorg/cleanup:1.0
            command: ["./cleanup", "--older-than=24h"]
Pattern 3: ML training every Sunday at midnight
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekly-train
spec:
  schedule: "0 0 * * 0"   # Sunday midnight UTC
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 3600
  jobTemplate:
    spec:
      backoffLimit: 0               # no retries
      activeDeadlineSeconds: 21600  # 6 hours
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: train
            image: myorg/trainer:3.0
            command: ["./train", "--epochs=100"]
            resources:
              requests:
                nvidia.com/gpu: 1
                cpu: 4
                memory: 16Gi
              limits:
                nvidia.com/gpu: 1
                cpu: 8
                memory: 32Gi
Pattern 4: Cache warmer every 10 minutes (Replace)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-warmer
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace   # always want the latest run
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: warmer
            image: myorg/warmer:1.0
            command: ["./warm"]
Pattern 5: Heartbeat / canary (every minute, Allow)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: heartbeat
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Allow        # many can run concurrently; cheap
  successfulJobsHistoryLimit: 0   # no history needed
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 60
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 60
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: heartbeat
            image: myorg/heartbeat:1.0
            command: ["./send-heartbeat"]
            resources:
              requests:
                cpu: 10m
                memory: 32Mi
              limits:
                cpu: 100m
                memory: 64Mi
Pattern 6: Email digest every weekday at 8 AM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-digest
spec:
  schedule: "0 8 * * 1-5"   # 8 AM, Mon-Fri
  timeZone: "America/New_York"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: digest
            image: myorg/digest:1.0
            command: ["./send-digest"]
10. CronJob vs External Schedulers
When CronJob is enough
- A single, time-based trigger
- No dependencies between Jobs
- Tolerable to miss runs during outages
- Small number of CronJobs (tens, not thousands)
- No need for backfill or catch-up
When you need Argo Workflows
- DAG dependencies: “B runs after A succeeds, C runs after A and B”
- Conditional execution: “Run D only if X is true”
- Per-step resource allocation and parallelism
- Built-in retry, error handling, artifact passing
- Visual workflow UI
When you need Airflow
- Backfills across days/weeks
- SLA-based scheduling with deadlines and alerts
- Cross-system dependencies (e.g., wait for an SFTP upload before processing)
- Mature operator ecosystem (S3, BigQuery, Snowflake, etc.)
- Compliance and audit requirements
When you need neither
- “Run X every Y” with no deps → CronJob ✅
- “Trigger from an event” → use a webhook + Job, or a message queue + worker Deployment
- “Continuous processing” → use a Deployment, not a CronJob
The decision tree
Need to run work on a schedule?
│
├── Single trigger, no deps? ──▶ CronJob
│
├── DAG dependencies? ──▶ Argo Workflows
│
├── Backfill / cross-system deps? ──▶ Airflow
│
├── Event-triggered? ──▶ webhook + Job, or message queue + worker
│
└── Continuous? ──▶ Deployment with worker pattern
11. Operational Recipes
Recipe 1: Manually trigger a CronJob
# Create a Job from the CronJob's template, immediately
kubectl create job --from=cronjob/<cronjob-name> <manual-job-name>
This creates a one-off Job with the same template. The CronJob’s schedule is not affected; the next scheduled time still runs.
Recipe 2: List all Jobs created by a CronJob
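# Filter on the ownerReference the controller sets on each Job (requires jq)
kubectl get jobs -n <namespace> -o json | \
  jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .kind == "CronJob" and .name == "<cronjob-name>")) | .metadata.name'
The controller does not label Jobs with the CronJob’s name. The job-name label is set on Pods and holds the Job’s name, so it cannot select a CronJob’s Jobs. To select by label instead, add your own label under jobTemplate.metadata.labels and filter on that.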
Recipe 3: Get the last successful run
kubectl get cronjob <name> -o jsonpath='{.status.lastSuccessfulTime}'
Recipe 4: Suspend all CronJobs in a namespace
kubectl get cronjob -n <namespace> -o name | \
  xargs -I {} kubectl patch {} -n <namespace> -p '{"spec":{"suspend":true}}' --type=merge
Recipe 5: Check if a CronJob is firing
# Recent Job creations
kubectl get events -n <namespace> --field-selector involvedObject.kind=CronJob
# Or directly:
kubectl get cronjob <name> -o jsonpath='{.status.lastScheduleTime}'
If lastScheduleTime is recent, the controller is firing Jobs. If it’s stale, the controller isn’t firing; check controller-manager health.
Recipe 6: Bulk-update image
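# Update image on all CronJobs matching a label in one namespace
kubectl get cronjob -n <namespace> -l app=backup -o name | \
  xargs -I {} kubectl patch {} -n <namespace> --type=strategic -p '{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"containers":[{"name":"backup","image":"myorg/backup:2.2"}]}}}}}}'
(Note: this is a deep patch. Verify the structure matches your CronJob. Use a strategic merge patch, which merges the containers list by name; a JSON merge patch (--type=merge) replaces the whole list and would drop every other container field. The -o name output carries no namespace, so run this per namespace rather than with -A.)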
Recipe 7: Disable a CronJob without deleting history
kubectl patch cronjob <name> -p '{"spec":{"suspend":true}}' --type=merge
The CronJob stays around (with its history). To re-enable:
kubectl patch cronjob <name> -p '{"spec":{"suspend":false}}' --type=merge
12. Troubleshooting
Symptom: CronJob is not firing
Check 1: Is it suspended?
kubectl get cronjob <name> -o jsonpath='{.spec.suspend}'
# "true" = suspended
Check 2: Is the schedule valid?
The schedule was validated at creation, but if you copy-paste, double-check the syntax.
Check 3: Is the controller-manager healthy?
kubectl get pods -n kube-system -l component=kube-controller-manager
# All instances should be Running and Ready
Check 4: Are there events for the CronJob?
kubectl describe cronjob <name>
Look for events at the bottom. If the controller is firing, you’ll see Job-creation events.
Check 5: Is lastScheduleTime updating?
kubectl get cronjob <name> -o jsonpath='{.status.lastScheduleTime}'
# Or watch the LAST SCHEDULE column refresh every 30 seconds
watch -n 30 kubectl get cronjob <name>
If it’s not updating, the controller isn’t firing.
Symptom: CronJob fires but Jobs are failing
Check 1: Look at the most recent Job
# Find the latest Job (CronJob-created Jobs are named <name>-<suffix>; this matches on that prefix)
kubectl get jobs --sort-by=.metadata.creationTimestamp -o name | grep "^job.batch/<name>-" | tail -1
# Check its status
kubectl describe job <job-name>
kubectl logs -l job-name=<job-name> --tail=100
Check 2: Are resources insufficient?
Check if the Job’s Pods are Pending due to resource pressure.
Check 3: Is backoffLimit too low?
A flaky script with backoffLimit: 1 will fail fast. Raise it.
Symptom: CronJob fires multiple times when it shouldn’t
This usually means one of:
- concurrencyPolicy: Allow (default), so overlapping runs are intended
- The previous Job is failing silently and never reaches “not active” state
Check:
kubectl get jobs -n <namespace> | grep "^<name>-"
If you see many Jobs running concurrently, change to Forbid or Replace.
Symptom: CronJob skipped a run
This is normal if:
- concurrencyPolicy: Forbid is set and the previous run was still active
- startingDeadlineSeconds was exceeded (controller was late)
To recover, manually trigger:
kubectl create job --from=cronjob/<name> <manual-name>
Symptom: Time zone confusion
The CronJob fires at the wrong local time. Check:
- Is timeZone set? If yes, that’s your reference. If no, the controller’s timezone is used.
- Is the controller-manager’s timezone UTC? (It usually is.)
- Is the schedule interpreted correctly?
Convert your expected local time to UTC and verify against the actual fire times.
Symptom: CronJob with timeZone doesn’t work in older clusters
spec.timeZone was added in k8s 1.25. In older clusters, the field is silently ignored. Check the cluster version:
kubectl version
If < 1.25, use UTC and convert in your application or the schedule.
Symptom: Jobs are not being garbage-collected
The history limits are not being enforced. Check:
- Is the CronJob’s successfulJobsHistoryLimit / failedJobsHistoryLimit set?
- Is the controller running?
- Is the Job owned by this CronJob? (Check ownerReference)
If the Job was created manually (not by the CronJob), the GC won’t touch it.
13. Gotchas and Common Mistakes
Schedule gotchas
- schedule is mutable, but a change only affects Jobs created after the edit. Jobs already running are untouched.
- CronJob does not catch up on missed runs. If the controller is down, at most the most recent missed run is started on recovery; the rest are lost.
- Day-of-month and day-of-week are OR’d. You can’t say “1st of the month AND Monday” in one expression.
- timeZone is silently ignored on clusters < 1.25.
Concurrency gotchas
- Default concurrencyPolicy: Allow allows overlapping runs. Often not what you want.
- Forbid skips silently. If you don’t monitor lastScheduleTime, you won’t notice missed runs.
- Replace kills the previous run. If the previous run had side effects (e.g., wrote to a database), those side effects are interrupted.
Deadline gotchas
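- startingDeadlineSeconds unset (the default) means no deadline. After a long outage the controller can start the most recent missed run arbitrarily late, and the official documentation states that with more than 100 missed schedule times it logs an error and does not start the Job.
- Always set startingDeadlineSeconds to bound how late a run may start.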
History gotchas
- Default successfulJobsHistoryLimit: 3 may not be enough for high-frequency CronJobs.
- Each retained Job keeps its Pods and events. This adds up fast.
- For high-frequency CronJobs (every minute), set limits to 0 or 1.
Suspend gotchas
- Suspending does not delete existing Jobs. If a Job is running and you suspend the CronJob, the Job continues.
- Suspending is not a “pause” for the next tick. The schedule is paused; existing activity is unaffected.
Naming gotchas
- Job names are <cronjob-name>-<scheduled-time-in-minutes-since-epoch>. If you create a Job manually with the same name as one the CronJob would create, the CronJob’s creation will fail (duplicate name).
- CronJob names must be DNS-1123 compatible. Lowercase, ≤52 chars, no underscores.
Concurrency vs. startup ordering
When the controller creates a Job, the Job creates a Pod. The Pod takes time to start. If the next scheduled time arrives before the previous Job’s Pod has started or finished, two Jobs can exist at the same time.
This is fine if concurrencyPolicy: Allow. With Forbid or Replace, the controller checks “is there an active Job” before creating a new one. The check looks at whether the Job itself has finished, not at Pod phase, so a Job with restartPolicy: OnFailure and many in-place container restarts is still “active.”
Other gotchas
- The controller evaluates the schedule in timeZone (or UTC if not set). It does not use the cluster’s local time.
- CronJob does not respect NTP slew or DST. Time changes are not handled gracefully.
- A CronJob with suspend: true from creation is never started. Setting it later in the lifecycle has no effect on already-created Jobs.
- CronJob is namespace-scoped. To run cluster-wide, you need a CronJob in each namespace (or use a privileged operator).
- No built-in alerting. Use the lastScheduleTime / lastSuccessfulTime status with your monitoring system to alert on missed runs.
14. Related Notes
| Topic | Note |
|---|---|
| Jobs (what CronJob creates) | 06 — Job |
| Pods (what Jobs run) | 01 — Pods |
| Deployment (long-running) | 03 — Deployments |
| Init containers (run before app) | 08 — Init Containers |
| Resource requests and limits | L06 — Resource Requests and Limits |
| Taints and tolerations | L06 — Scheduling and Scaling |
| Garbage collection (TTL, history) | L09 — Garbage Collection |