
Specialized Workloads: DaemonSets, Jobs, and CronJobs

While Deployments manage long-running, stateless applications, Kubernetes provides specialized controllers for node-local infrastructure services (DaemonSets) and batch processing (Jobs and CronJobs).


1. DAEMONSETS (Infrastructure Workloads)

A DaemonSet (DS) ensures that a copy of a specific Pod is running on all (or a filtered subset of) Nodes in the cluster.

1.1 Scheduling Internals

Historically, DaemonSets were scheduled by the DaemonSet Controller. In modern Kubernetes (v1.12+), they are scheduled by the default scheduler using Node Affinity automatically injected by the controller.

  • Logic: The DS Controller adds nodeAffinity to the Pods to match the target nodes.
  • Taints: DaemonSets automatically tolerate the node.kubernetes.io/unschedulable:NoSchedule taint to ensure they run even on cordoned nodes.
  • Update Strategy: Supports RollingUpdate. Use maxUnavailable to control how many nodes lose the daemon during a version swap.
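As a sketch of the mechanism described above, the affinity the DS Controller injects into each Pod looks roughly like this (the node name is illustrative); it is what pins one Pod to exactly one node while letting the default scheduler do the placement:

```yaml
# Sketch: nodeAffinity injected by the DaemonSet controller (node name illustrative).
# One such Pod is generated per matching node, each pinned by metadata.name.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - worker-node-01
```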

1.2 Production Use Cases

  • Networking: CNI plugins (Cilium, Calico).
  • Observability: Node-level log collectors (Fluent-Bit, Promtail) and metrics exporters (Node Exporter).
  • Storage: CSI drivers providing local storage.

1.3 Bible-Grade YAML: Fluent-Bit Log Shipper

This manifest demonstrates a production setup including taints, tolerations, and host-level path mounts.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
spec:
  selector:
    matchLabels:
      name: fluent-bit
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Ensure only one node is missing logs during updates
  template:
    metadata:
      labels:
        name: fluent-bit
    spec:
      # Critical: a DS usually needs to run on control-plane nodes too
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.1.4
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

2. JOBS (Run-to-Completion)

A Job creates one or more Pods and ensures that a specified number of them successfully terminate (Exit Code 0).

2.1 The Restart Logic Paradox

  • spec.template.spec.restartPolicy: Must be OnFailure or Never. (You cannot use Always in a Job).
  • backoffLimit: The Job Controller recreates the Pod if it fails. If the failures exceed this limit, the Job stops and is marked as Failed.
  • activeDeadlineSeconds: Hard limit on how long a Job can run (including retries). If exceeded, K8s kills all active Pods and fails the Job.
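The interplay of these fields can be sketched as follows (names illustrative). With restartPolicy: Never, every failure produces a brand-new Pod, and the failed Pods are kept around so their logs can be inspected; once the retries exceed backoffLimit, or activeDeadlineSeconds elapses, the Job is marked Failed:

```yaml
# Sketch (names illustrative): Never + backoffLimit keeps failed Pods for debugging
apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-task
spec:
  backoffLimit: 3           # give up after repeated Pod failures
  activeDeadlineSeconds: 60 # hard wall-clock cap, retries included
  template:
    spec:
      restartPolicy: Never  # each failure creates a fresh Pod
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "exit 1"]  # always fails, to demonstrate the retry path
```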

2.2 Completion Modes

  1. Non-Parallel Jobs: One Pod starts, Job finishes when Pod succeeds.
  2. Fixed Completion Count (completions): Job is complete when N pods finish successfully.
  3. Work Queue (parallelism): Multiple Pods run concurrently to process a queue.
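Modes 2 and 3 combine naturally: completions sets the target count while parallelism caps concurrency. A minimal sketch (names illustrative):

```yaml
# Sketch (names illustrative): fixed completion count with bounded parallelism.
# The Job succeeds once 10 Pods exit 0, with at most 3 running at a time.
apiVersion: batch/v1
kind: Job
metadata:
  name: render-frames
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo processing one item"]
```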

2.3 Bible-Grade YAML: Database Migration

apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migration-v42
spec:
  # RECOVERY: retry up to 4 times, but never run longer than 100s total
  backoffLimit: 4
  activeDeadlineSeconds: 100
  # CLEANUP: automatically delete the Job object from etcd 1 hour after it finishes
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      containers:
      - name: migration-tool
        image: company/db-migrator:latest
        env:
        - name: DB_URL
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: url
      restartPolicy: OnFailure  # kubelet restarts the container in place if it crashes

3. CRONJOBS (Scheduled Tasks)

A CronJob is a manager for Jobs. It creates a Job object based on a Cron schedule.

3.1 Architecture & Concurrency

CronJobs must handle the scenario where a Job takes longer to finish than the next scheduled interval.

  • concurrencyPolicy:
    • Allow (default): multiple Jobs may run simultaneously.
    • Forbid: skips the new run if the previous one is still running.
    • Replace: kills the currently running Job and starts a new one.
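As a sketch (name and schedule illustrative), Replace fits a frequent sync where only the newest run matters, so an overlapping run is killed rather than queued:

```yaml
# Sketch (name/schedule illustrative): only the freshest run matters,
# so an overlapping run is terminated and replaced.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: refresh
            image: busybox:1.36
            command: ["sh", "-c", "echo refreshing cache"]
```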

3.2 The "Missing" Job Problem

If the Control Plane is down during a scheduled window, the CronJob controller checks how many schedules were missed. If startingDeadlineSeconds is not set and more than 100 schedules are missed, the CronJob stops entirely and requires manual intervention.

3.3 Bible-Grade YAML: Daily Database Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"  # 2 AM UTC every day
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 200  # Allow up to 200s delay if the controller is busy
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup-agent
            image: rclone/rclone:latest
            args: ["sync", "/data", "s3:my-backup-bucket"]
          restartPolicy: OnFailure

4. ARCHITECT'S COMPARISON SUMMARY

Feature            DaemonSet                                 Job                                    CronJob
Termination        Never (runs as long as the node exists)   Once the task succeeds                 Once each scheduled task succeeds
Scaling            Implicit (one Pod per node)               parallelism field                      Managed by schedule
Self-Healing       Recreates Pod if deleted                  Recreates Pod based on backoffLimit    Recreates based on backoffLimit
Cleanup            Manual or DS deletion                     ttlSecondsAfterFinished                historyLimit fields
Primary Use Case   Log/metric agents, proxies                Migrations, batch processing           Backups, cleanup, reports

5. TROUBLESHOOTING & NINJA COMMANDS

5.1 Debugging DaemonSet Rollouts

# Check the status of a rolling update
kubectl rollout status ds/fluent-bit -n kube-system

# List DS Pods with their node placement to spot nodes missing a Pod
# (useful when troubleshooting taints/affinity)
kubectl get pods -n kube-system -l name=fluent-bit -o wide

5.2 Cleaning up "Zombies"

Completed Jobs stay in the system forever (consuming etcd space) unless ttlSecondsAfterFinished is set.

# Manually delete all completed jobs in a namespace
kubectl delete jobs --field-selector status.successful=1

5.3 Triggering a CronJob Manually

Useful for testing a backup script without waiting for 2 AM.

kubectl create job --from=cronjob/nightly-backup manual-backup-test
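Once triggered, the run behaves like any other Job and can be inspected and cleaned up directly (commands assume the Job name created above):

```shell
# Watch the manually triggered run until it completes
kubectl get job manual-backup-test -w

# Read the output of the run
kubectl logs job/manual-backup-test

# Remove the test run afterwards
kubectl delete job manual-backup-test
```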

5.4 Watching CronJob Logic

# See the last time the job ran and if it's currently active
kubectl get cronjob

Sample Output:

NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        18h             5d