
Specialized Workloads: DaemonSets, Jobs, and CronJobs

While Deployments manage long-running, stateless applications, Kubernetes provides specialized controllers for node-local infrastructure services (DaemonSets) and batch processing (Jobs and CronJobs).


1. DAEMONSETS (Infrastructure Workloads)

A DaemonSet (DS) ensures that a copy of a specific Pod is running on all (or a filtered subset of) Nodes in the cluster.

1.1 Scheduling Internals

Historically, DaemonSets were scheduled by the DaemonSet Controller. In modern Kubernetes (v1.12+), they are scheduled by the default scheduler using Node Affinity automatically injected by the controller.

  • Logic: The DS Controller adds nodeAffinity to the Pods to match the target nodes.
  • Taints: DaemonSets automatically tolerate the node.kubernetes.io/unschedulable:NoSchedule taint to ensure they run even on cordoned nodes.
  • Update Strategy: Supports RollingUpdate. Use maxUnavailable to control how many nodes lose the daemon during a version swap.
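As a sketch of the mechanism described above, the affinity the DS Controller injects into each Pod looks roughly like this (the node name is illustrative); it is what pins one Pod to exactly one node while letting the default scheduler do the placement:

```yaml
# Sketch: nodeAffinity injected by the DaemonSet controller (node name illustrative).
# One such Pod is generated per matching node, each pinned by metadata.name.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - worker-node-01
```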

1.2 Production Use Cases

  • Networking: CNI plugins (Cilium, Calico).
  • Observability: Node-level log collectors (Fluent-Bit, Promtail) and metrics exporters (Node Exporter).
  • Storage: CSI drivers providing local storage.

1.3 Bible-Grade YAML: Fluent-Bit Log Shipper

This manifest demonstrates a production setup including taints, tolerations, and host-level path mounts.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
spec:
  selector:
    matchLabels:
      name: fluent-bit
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Ensure only one node is missing logs during updates
  template:
    metadata:
      labels:
        name: fluent-bit
    spec:
      # Critical: a DS usually needs to run on control-plane nodes too
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.1.4
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

2. JOBS (Run-to-Completion)

A Job creates one or more Pods and ensures that a specified number of them successfully terminate (Exit Code 0).

2.1 The Restart Logic Paradox

  • spec.template.spec.restartPolicy: Must be OnFailure or Never. (You cannot use Always in a Job).
  • backoffLimit: The Job Controller recreates the Pod if it fails. If the failures exceed this limit, the Job stops and is marked as Failed.
  • activeDeadlineSeconds: Hard limit on how long a Job can run (including retries). If exceeded, K8s kills all active Pods and fails the Job.
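The interplay of these fields can be sketched as follows (names illustrative). With restartPolicy: Never, every failure produces a brand-new Pod, and the failed Pods are kept around so their logs can be inspected; once the retries exceed backoffLimit, or activeDeadlineSeconds elapses, the Job is marked Failed:

```yaml
# Sketch (names illustrative): Never + backoffLimit keeps failed Pods for debugging
apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-task
spec:
  backoffLimit: 3           # give up after repeated Pod failures
  activeDeadlineSeconds: 60 # hard wall-clock cap, retries included
  template:
    spec:
      restartPolicy: Never  # each failure creates a fresh Pod
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "exit 1"]  # always fails, to demonstrate the retry path
```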

2.2 Completion Modes

  1. Non-Parallel Jobs: One Pod starts, Job finishes when Pod succeeds.
  2. Fixed Completion Count (completions): Job is complete when N pods finish successfully.
  3. Work Queue (parallelism): Multiple Pods run concurrently to process a queue.
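Modes 2 and 3 combine naturally: completions sets the target count while parallelism caps concurrency. A minimal sketch (names illustrative):

```yaml
# Sketch (names illustrative): fixed completion count with bounded parallelism.
# The Job succeeds once 10 Pods exit 0, with at most 3 running at a time.
apiVersion: batch/v1
kind: Job
metadata:
  name: render-frames
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo processing one item"]
```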

2.3 Bible-Grade YAML: Database Migration

apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migration-v42
spec:
  # RECOVERY: retry up to 4 times, but never run longer than 100s total
  backoffLimit: 4
  activeDeadlineSeconds: 100
  # CLEANUP: automatically delete the Job object from etcd 1 hour after it finishes
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      containers:
      - name: migration-tool
        image: company/db-migrator:latest
        env:
        - name: DB_URL
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: url
      restartPolicy: OnFailure  # kubelet restarts the container in place if it crashes

3. CRONJOBS (Scheduled Tasks)

A CronJob is a manager for Jobs. It creates a Job object based on a Cron schedule.

3.1 Architecture & Concurrency

CronJobs must handle the scenario where a Job takes longer to finish than the next scheduled interval.

  • concurrencyPolicy:
    • Allow (default): multiple Jobs may run simultaneously.
    • Forbid: skips the new run if the previous one is still running.
    • Replace: kills the currently running Job and starts a new one.
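As a sketch (name and schedule illustrative), Replace fits a frequent sync where only the newest run matters, so an overlapping run is killed rather than queued:

```yaml
# Sketch (name/schedule illustrative): only the freshest run matters,
# so an overlapping run is terminated and replaced.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: refresh
            image: busybox:1.36
            command: ["sh", "-c", "echo refreshing cache"]
```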

3.2 The "Missing" Job Problem

If the Control Plane is down during a scheduled window, the CronJob controller checks how many schedules were missed. If startingDeadlineSeconds is not set and more than 100 schedules are missed, the CronJob stops entirely and requires manual intervention.

3.3 Bible-Grade YAML: Daily Database Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"  # 2 AM UTC every day
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 200  # Allow up to 200s delay if the controller is busy
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup-agent
            image: rclone/rclone:latest
            args: ["sync", "/data", "s3:my-backup-bucket"]
          restartPolicy: OnFailure

4. ARCHITECT'S COMPARISON SUMMARY

Feature            DaemonSet                                 Job                                    CronJob
Termination        Never (runs as long as the node exists)   Once the task succeeds                 Once each scheduled task succeeds
Scaling            Implicit (one Pod per node)               parallelism field                      Managed by schedule
Self-Healing       Recreates Pod if deleted                  Recreates Pod based on backoffLimit    Recreates based on backoffLimit
Cleanup            Manual or DS deletion                     ttlSecondsAfterFinished                historyLimit fields
Primary Use Case   Log/metric agents, proxies                Migrations, batch processing           Backups, cleanup, reports

5. TROUBLESHOOTING & NINJA COMMANDS

5.1 Debugging DaemonSet Rollouts

# Check the status of a rolling update
kubectl rollout status ds/fluent-bit -n kube-system

# List DS Pods with their node placement to spot nodes missing a Pod
# (useful when troubleshooting taints/affinity)
kubectl get pods -n kube-system -l name=fluent-bit -o wide

5.2 Cleaning up "Zombies"

Completed Jobs stay in the system forever (consuming etcd space) unless ttlSecondsAfterFinished is set.

# Manually delete all completed jobs in a namespace
kubectl delete jobs --field-selector status.successful=1

5.3 Triggering a CronJob Manually

Useful for testing a backup script without waiting for 2 AM.

kubectl create job --from=cronjob/nightly-backup manual-backup-test
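Once triggered, the run behaves like any other Job and can be inspected and cleaned up directly (commands assume the Job name created above):

```shell
# Watch the manually triggered run until it completes
kubectl get job manual-backup-test -w

# Read the output of the run
kubectl logs job/manual-backup-test

# Remove the test run afterwards
kubectl delete job manual-backup-test
```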

5.4 Watching CronJob Logic

# See the last time the job ran and if it's currently active
kubectl get cronjob

Sample Output:

NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        18h             5d