Specialized Workloads: DaemonSets, Jobs, and CronJobs
While Deployments manage long-running, stateless applications, Kubernetes provides specialized controllers for node-level infrastructure services (DaemonSets) and batch processing (Jobs and CronJobs).
1. DAEMONSETS (Infrastructure Workloads)
A DaemonSet (DS) ensures that a copy of a specific Pod is running on all (or a filtered subset of) Nodes in the cluster.
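To illustrate the "filtered subset" case, here is a minimal sketch of a DaemonSet restricted by a nodeSelector. The label disktype=ssd, the object names, and the image are hypothetical and only serve the example.

```yaml
# Sketch: restrict a DaemonSet to a subset of nodes.
# The label, names, and image below are hypothetical placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:
        disktype: ssd   # only nodes carrying this label receive a Pod
      containers:
      - name: monitor
        image: example.com/ssd-monitor:1.0
```

Nodes without the label get no Pod; labeling a node later should cause the controller to add one there.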
1.1 Scheduling Internals
Historically, DaemonSets were scheduled by the DaemonSet Controller. In modern Kubernetes (v1.12+), they are scheduled by the default scheduler using Node Affinity automatically injected by the controller.
- Logic: The DS Controller adds nodeAffinity to each Pod so that it matches exactly one target node.
- Taints: The controller automatically adds a toleration for the node.kubernetes.io/unschedulable:NoSchedule taint, so DaemonSet Pods run even on cordoned nodes. It also adds tolerations for the not-ready, unreachable, and resource-pressure taints.
- Update Strategy: Supports RollingUpdate (the default) and OnDelete. Use maxUnavailable to control how many nodes lose the daemon at once during a version swap.
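The injected scheduling constraints can be seen by inspecting a running DaemonSet Pod (for example with kubectl get pod -o yaml). The fragment below is a sketch of what to expect, not a verbatim dump; worker-node-1 is a hypothetical node name, and a real Pod carries additional tolerations.

```yaml
# Fragment of a DaemonSet Pod spec after the controller has filled it in.
# worker-node-1 is a hypothetical node name.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name      # pins the Pod to exactly one node
            operator: In
            values:
            - worker-node-1
  tolerations:
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule              # lets the Pod land on a cordoned node
```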
1.2 Production Use Cases
- Networking: CNI plugins (Cilium, Calico).
- Observability: Node-level log collectors (Fluent-Bit, Promtail) and metrics exporters (Node Exporter).
- Storage: CSI drivers providing local storage.
1.3 Bible-Grade YAML: Fluent-Bit Log Shipper
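This manifest demonstrates a production-style setup with control-plane tolerations, resource limits, and hostPath mounts. The /var/lib/docker/containers mount assumes Docker-based nodes; nodes running containerd or CRI-O keep container logs under /var/log/pods instead.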
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
spec:
  selector:
    matchLabels:
      name: fluent-bit
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # Ensure only one node is missing logs during updates
  template:
    metadata:
      labels:
        name: fluent-bit
    spec:
      # Critical: DS usually needs to run on Control Plane nodes too
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.1.4
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
2. JOBS (Run-to-Completion)
A Job creates one or more Pods and ensures that a specified number of them successfully terminate (Exit Code 0).
2.1 The Restart Logic Paradox
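- spec.template.spec.restartPolicy: Must be OnFailure or Never. (You cannot use Always in a Job.) With OnFailure, the kubelet restarts the failed container inside the same Pod; with Never, the Job Controller creates a replacement Pod.
- backoffLimit: The number of retries allowed before the Job stops and is marked as Failed (default: 6). Retries are spaced with an exponential back-off.
- activeDeadlineSeconds: Hard limit on how long a Job can run (including retries). If exceeded, K8s kills all active Pods and fails the Job, even if backoffLimit has not been reached.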
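As a rough illustration of how these three fields interact, the sketch below uses restartPolicy: Never so that every failed attempt leaves its own Pod behind for inspection. The Job name and the command are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-once            # hypothetical name
spec:
  backoffLimit: 3              # give up after 3 retries...
  activeDeadlineSeconds: 600   # ...or after 10 minutes in total, whichever comes first
  template:
    spec:
      restartPolicy: Never     # each failure yields a new Pod; failed Pods remain for debugging
      containers:
      - name: report
        image: busybox:1.36
        command: ["sh", "-c", "echo generating report && exit 0"]
```

Choosing Never over OnFailure trades a few extra Pod objects for easier access to the logs of each failed attempt.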
2.2 Completion Modes
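- Non-Parallel Jobs: One Pod starts, and the Job finishes when that Pod succeeds.
- Fixed Completion Count (completions): The Job is complete when N Pods finish successfully.
- Work Queue (parallelism): Multiple Pods run concurrently to process a queue. completions is left unset, and the Job is complete once at least one Pod has succeeded and all Pods have terminated.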
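The difference between the last two modes comes down to whether completions is set. The two manifests below are a minimal sketch; the names and images are hypothetical, and the work-queue variant assumes the worker itself detects when the queue is empty and exits with code 0.

```yaml
# Fixed completion count: 10 successful Pods required, at most 3 at a time.
apiVersion: batch/v1
kind: Job
metadata:
  name: render-frames              # hypothetical name
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: renderer
        image: example.com/renderer:1.0      # hypothetical image
---
# Work queue: completions is omitted; 3 workers drain a shared queue.
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-workers              # hypothetical name
spec:
  parallelism: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: example.com/queue-worker:1.0  # hypothetical image
```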
2.3 Bible-Grade YAML: Database Migration
apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migration-v42
spec:
  # RECOVERY: Fail after 4 retries or 100s of total runtime, whichever comes first
  backoffLimit: 4
  activeDeadlineSeconds: 100
  # CLEANUP: Automatically delete the Job object from etcd 1 hour after it finishes
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      containers:
      - name: migration-tool
        image: company/db-migrator:latest
        env:
        - name: DB_URL
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: url
      restartPolicy: OnFailure # kubelet restarts the failed container in place; restarts count toward backoffLimit
3. CRONJOBS (Scheduled Tasks)
A CronJob is a manager for Jobs. It creates a Job object based on a Cron schedule.
3.1 Architecture & Concurrency
CronJobs must handle the scenario where a Job takes longer to finish than the next scheduled interval.
concurrencyPolicy:
- Allow (Default): Multiple Jobs can run simultaneously.
- Forbid: Skips the next job if the current one is still running.
- Replace: Kills the current running job and starts a new one.
3.2 The "Missing" Job Problem
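If the Control Plane is down during a scheduled window, the CronJob controller counts how many schedules were missed once it is running again. If startingDeadlineSeconds is not set, it counts from the last scheduled time until now; if it is set, it counts only the schedules missed within the last startingDeadlineSeconds. When more than 100 schedules have been missed in that window, the controller does not start the Job and logs an error, so the CronJob may need manual intervention.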
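A frequent schedule is where the 100-missed-schedules limit bites soonest: a per-minute CronJob crosses it after well under two hours of downtime. The sketch below, with a hypothetical name and a trivial command, bounds the look-back window to 200 seconds so that only about three missed runs are ever counted.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: heartbeat                  # hypothetical name
spec:
  schedule: "* * * * *"            # every minute
  startingDeadlineSeconds: 200     # count missed runs only within the last 200s
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: heartbeat
            image: busybox:1.36
            command: ["sh", "-c", "date"]
```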
3.3 Bible-Grade YAML: Daily Database Backup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *" # 02:00 every day, in the kube-controller-manager's time zone (often UTC) unless spec.timeZone is set
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 200 # Allow 200s delay if controller is busy
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup-agent
            image: rclone/rclone:latest
            args: ["sync", "/data", "s3:my-backup-bucket"]
          restartPolicy: OnFailure
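If the backup must fire at 02:00 in a specific zone regardless of how the control plane is configured, recent Kubernetes releases offer a spec.timeZone field (stable as of v1.27, to the best of my knowledge). The fragment below is a sketch showing only the two relevant fields; the rest of the manifest is assumed unchanged.

```yaml
# Fragment: pin the schedule to an explicit time zone.
# Requires a cluster version that supports spec.timeZone.
spec:
  schedule: "0 2 * * *"
  timeZone: "Etc/UTC"   # any IANA time zone name, e.g. "Europe/Berlin"
```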
4. ARCHITECT'S COMPARISON SUMMARY
| Feature | DaemonSet | Job | CronJob |
|---|---|---|---|
| Termination | Never (Running as long as node exists) | Once the task succeeds (or the Job fails) | Each spawned Job ends with its task; the CronJob itself persists |
| Scaling | Implicit (1 per eligible Node) | parallelism field | One Job per scheduled run; parallelism set in jobTemplate |
| Self-Healing | Recreates Pod if deleted | Retries failed Pods up to backoffLimit | Each spawned Job retries up to backoffLimit |
| Cleanup | Manual or DS deletion | ttlSecondsAfterFinished | successfulJobsHistoryLimit / failedJobsHistoryLimit |
| Primary Use Case | Log/Metric agents, Proxies | Migrations, Batch processing | Backups, Cleanup, Reports |
5. TROUBLESHOOTING & NINJA COMMANDS
5.1 Debugging DaemonSet Rollouts
# Check the status of a rolling update
kubectl rollout status ds/fluent-bit -n kube-system
# List the DS Pods and the node each runs on; compare with "kubectl get nodes" to find nodes that do NOT have a Pod (Troubleshooting taints/affinity)
kubectl get pods -n kube-system -l name=fluent-bit -o wide
5.2 Cleaning up "Zombies"
Completed Jobs stay in the system indefinitely (consuming etcd space) unless ttlSecondsAfterFinished is set or a CronJob's history limits prune them.
# Manually delete Jobs in the current namespace that finished with exactly one successful Pod
kubectl delete jobs --field-selector status.successful=1
5.3 Triggering a CronJob Manually
Useful for testing a backup script without waiting for 2 AM.
kubectl create job --from=cronjob/nightly-backup manual-backup-test
5.4 Watching CronJob Logic
# See the last time the job ran and if it's currently active
kubectl get cronjob
Sample Output:
NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        18h             5d