Steven's Knowledge

Background Jobs & Workflow Orchestration

Sidekiq, BullMQ, Celery, Temporal, Airflow - task queues, retries, scheduled work, and durable workflows


When a web request triggers work that takes longer than the user should wait — sending email, processing images, generating reports — you push that work to a background job. When the work has multiple steps with dependencies, retries, and long-running state, you graduate to a workflow orchestrator.

This page covers both ends of the spectrum. They're related but solve different problems.

Two Different Patterns

| | Background Jobs | Workflow Orchestration |
|---|---|---|
| Duration | Seconds to minutes | Minutes to weeks |
| Complexity | Single function | Multi-step, conditional, parallel |
| Failure handling | Retry + dead-letter | Resume from where it failed |
| State | Job arguments + return value | Persistent execution history |
| Examples | "Send welcome email" | "Onboard new customer: provision, validate, charge, fulfill" |
| Tools | Sidekiq, BullMQ, Celery, RQ, Bee-Queue | Temporal, Airflow, Step Functions, Argo Workflows, Cadence |

Why Background Jobs

| Without | With |
|---|---|
| Long requests time out / users wait | Request returns fast; work happens async |
| Failed work disappears | Retry with backoff; dead-letter for unfixable |
| Scheduled tasks need cron + scripts | Native scheduled / recurring jobs |
| Work tied to web servers | Worker fleet scales independently |
| Lost work on crash | Durable queue survives restarts |

For most apps, a reasonable rule of thumb: anything that takes more than ~100 ms, or that the response to the user doesn't depend on, belongs in a background job.
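To make the "retry with backoff; dead-letter for unfixable" row concrete, here is a minimal in-memory sketch. It is not how Sidekiq, BullMQ, or Celery are implemented (they persist jobs in Redis or RabbitMQ and run separate worker processes); `createQueue`, `runDue`, and `backoffDelayMs` are hypothetical names, and time is a simulated clock in milliseconds so the behavior is easy to trace.

```javascript
// Minimal in-memory job queue sketch: exponential backoff + dead-letter queue.
// Hypothetical API; real libraries persist jobs and run real worker processes.

function backoffDelayMs(attempt, baseMs = 1000, capMs = 60000) {
  // attempt 1 -> 1s, 2 -> 2s, 3 -> 4s, ... capped at capMs
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

function createQueue({ maxAttempts = 3 } = {}) {
  const pending = []; // { name, args, attempts, runAt }
  const deadLetter = []; // jobs that exhausted their retries
  return {
    deadLetter,
    // Called from the request handler: cheap, returns immediately.
    enqueue(name, args, now = 0) {
      pending.push({ name, args, attempts: 0, runAt: now });
    },
    // One worker pass at simulated time `now`: run every job that is due.
    runDue(handlers, now) {
      const due = pending.filter((job) => job.runAt <= now);
      for (const job of due) {
        pending.splice(pending.indexOf(job), 1);
        job.attempts += 1;
        try {
          handlers[job.name](...job.args);
        } catch (err) {
          if (job.attempts >= maxAttempts) {
            deadLetter.push({ ...job, error: String(err) }); // park for a human
          } else {
            job.runAt = now + backoffDelayMs(job.attempts); // retry later
            pending.push(job);
          }
        }
      }
    },
    size: () => pending.length,
  };
}

// Usage: one transient failure, one permanent failure
const queue = createQueue({ maxAttempts: 3 });
const sent = [];
let smtpFailuresLeft = 1; // simulate one transient SMTP error
const handlers = {
  sendEmail(to) {
    if (smtpFailuresLeft > 0) {
      smtpFailuresLeft -= 1;
      throw new Error('SMTP timeout');
    }
    sent.push(to);
  },
  parseUpload() {
    throw new Error('corrupt file'); // never succeeds
  },
};
queue.enqueue('sendEmail', ['a@example.com']); // the web request returns right after this
queue.enqueue('parseUpload', []);
queue.runDue(handlers, 0); // both fail; retried at t=1000
queue.runDue(handlers, 1000); // email succeeds; upload retried at t=3000
queue.runDue(handlers, 3000); // upload hits maxAttempts -> dead-letter queue
```

Production libraries typically also add random jitter to the backoff so that many failing jobs don't all retry in lockstep.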

The Job Queue Layer

| Tool | Language | Backend | Notes |
|---|---|---|---|
| Sidekiq | Ruby | Redis | Mature; the Ruby standard |
| BullMQ | Node.js | Redis | Modern Node standard; replaces Bull |
| Celery | Python | Redis / RabbitMQ | Python standard; complex but flexible |
| RQ | Python | Redis | Simpler than Celery |
| Inngest | Multi-language | Hosted SaaS | Code-defined workflows; modern UX |
| Trigger.dev | TypeScript | Hosted SaaS | Code-first job orchestration |
| AWS SQS + workers | Any | SQS | Cheapest; you write the worker loop |
| Cloudflare Queues + Workers | JS | Cloudflare-managed | Pairs with Edge Functions |

For Node, BullMQ is the default; for Ruby, Sidekiq; for Python, Celery or RQ. If you'd rather not operate Redis yourself, use a hosted option such as Inngest or Trigger.dev.
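The SQS row says you write the worker loop yourself. That loop relies on the visibility timeout: a received message is hidden rather than removed, and it is deleted only after processing succeeds, so a worker that crashes or throws leads to redelivery (at-least-once delivery). The sketch below uses a hypothetical in-memory `FakeQueue` in place of the AWS SDK and a simulated clock; `pollOnce` is likewise an illustrative name.

```javascript
// Hypothetical in-memory stand-in for an SQS-like queue: receive() hides a
// message for `visibilityTimeoutMs`; if the worker doesn't delete it in
// time (crash, exception), it becomes visible again and is redelivered.
class FakeQueue {
  constructor(visibilityTimeoutMs) {
    this.visibilityTimeoutMs = visibilityTimeoutMs;
    this.messages = []; // { id, body, visibleAt, receiveCount }
    this.nextId = 1;
  }
  send(body) {
    this.messages.push({ id: this.nextId++, body, visibleAt: 0, receiveCount: 0 });
  }
  receive(now) {
    const msg = this.messages.find((m) => m.visibleAt <= now);
    if (!msg) return null;
    msg.visibleAt = now + this.visibilityTimeoutMs; // hide while being processed
    msg.receiveCount += 1;
    return msg;
  }
  delete(id) {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}

// The loop you write yourself: receive, process, delete on success.
// On failure, do nothing; the message reappears after the visibility timeout.
function pollOnce(queue, handler, now) {
  const msg = queue.receive(now);
  if (!msg) return false;
  try {
    handler(msg.body);
    queue.delete(msg.id);
  } catch {
    // leave it: redelivery after the timeout is the retry mechanism
  }
  return true;
}

// Usage: the first attempt fails, the retry succeeds after the timeout
const q = new FakeQueue(30000); // 30 s visibility timeout
q.send({ userId: 42 });
let calls = 0;
const handler = () => {
  calls += 1;
  if (calls === 1) throw new Error('transient failure');
};
const r0 = pollOnce(q, handler, 0); // received, failed, now hidden
const r1 = pollOnce(q, handler, 10000); // nothing visible yet
const r2 = pollOnce(q, handler, 30000); // redelivered, succeeds, deleted
```

Because delivery is at-least-once, handlers should be idempotent. In real SQS, a redrive policy with `maxReceiveCount` moves repeatedly failing messages to a dead-letter queue; the sketch omits that.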

The Workflow Orchestration Layer

| Tool | Strengths | When |
|---|---|---|
| Temporal | Code-defined workflows; replay-based durability; multi-language | Complex business logic, long-running |
| AWS Step Functions | JSON state-machine DSL; AWS-native | AWS-heavy stacks |
| Apache Airflow | DAGs; batch / data engineering | Scheduled data pipelines |
| Argo Workflows | Kubernetes-native; container per step | Already on K8s; batch processing |
| Dagster | Data-asset oriented | Modern data engineering |
| Prefect | Pythonic; Airflow alternative | Python-centric pipelines |
| n8n / Zapier-like | No-code; visual | Glue between SaaS tools |
| Inngest / Trigger.dev | Code-first; hosted | Mid-complexity; modern stacks |

Two big questions:

  1. Is this batch / ETL or transactional? Batch → Airflow / Dagster. Transactional → Temporal / Step Functions / Inngest.
  2. Do I need code-defined workflows or visual? Almost always code, for the same reason you'd use IaC over a console.
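Airflow and Dagster (the batch side of the first question) model a pipeline as a DAG: each task runs once all of its upstream tasks have finished. A minimal sketch of that scheduling core using Kahn's topological sort; `runDag` and the task shape `{ deps, run }` are hypothetical and are not Airflow's or Dagster's API.

```javascript
// Minimal DAG runner sketch (Kahn's algorithm): a task runs only after every
// task it depends on has finished. Hypothetical API, not Airflow's or Dagster's.
function runDag(tasks) {
  // tasks: { name: { deps: [upstream task names], run: (inputs) => result } }
  const indegree = {};
  const downstream = {};
  for (const [name, task] of Object.entries(tasks)) {
    indegree[name] = task.deps.length;
    downstream[name] = downstream[name] || [];
    for (const dep of task.deps) {
      if (!(dep in tasks)) throw new Error(`${name} depends on unknown task ${dep}`);
      (downstream[dep] = downstream[dep] || []).push(name);
    }
  }

  const ready = Object.keys(tasks).filter((name) => indegree[name] === 0);
  const order = [];
  const results = {};
  while (ready.length > 0) {
    const name = ready.shift();
    // Pass each upstream task's result in, keyed by task name
    const inputs = Object.fromEntries(tasks[name].deps.map((d) => [d, results[d]]));
    results[name] = tasks[name].run(inputs);
    order.push(name);
    for (const next of downstream[name]) {
      indegree[next] -= 1;
      if (indegree[next] === 0) ready.push(next);
    }
  }
  if (order.length !== Object.keys(tasks).length) {
    throw new Error('cycle detected: tasks do not form a DAG');
  }
  return { order, results };
}

// Usage: extract feeds two independent tasks, and load needs both
const { order, results } = runDag({
  extract: { deps: [], run: () => [3, 1, 3, 2] },
  dedupe: { deps: ['extract'], run: ({ extract }) => [...new Set(extract)] },
  count: { deps: ['extract'], run: ({ extract }) => extract.length },
  load: {
    deps: ['dedupe', 'count'],
    run: ({ dedupe, count }) => `${dedupe.length} unique of ${count}`,
  },
});
```

Real schedulers also run independent tasks (here `dedupe` and `count`) in parallel, retry individual tasks, and trigger whole runs on a schedule; this version runs tasks one at a time in dependency order.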


The Killer Property: Durable Execution

Modern orchestrators (Temporal, Inngest, Trigger.dev) offer durable execution — your code's execution state is persisted, so it survives crashes, restarts, even multi-day workflows. The workflow code looks like normal code:

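// Temporal-style workflow (TypeScript SDK, written as plain JS): roughly a 6-day run
import { proxyActivities, sleep } from '@temporalio/workflow';

// Activities are ordinary functions registered on the worker; the proxy
// schedules them through Temporal, which records results and retries failures.
// isVerified is a hypothetical activity used to re-check status after each sleep.
const activities = proxyActivities({ startToCloseTimeout: '1 minute' });

export async function onboardCustomer(customerId) {
  const customer = await activities.createAccount(customerId);

  await sleep('1 day'); // durable timer: no process stays blocked for the day

  // Re-check: `customer` is a snapshot from account creation and won't update itself
  if (!(await activities.isVerified(customerId))) {
    await activities.sendReminderEmail(customer);
  }

  await sleep('5 days');

  if (!(await activities.isVerified(customerId))) {
    await activities.deactivate(customer);
    return; // stop here: no welcome bonus for a deactivated account
  }

  await activities.sendWelcomeBonus(customer);
}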

The sleep('1 day') actually sleeps for a day — the orchestrator saves state, your worker can restart, and the workflow picks up where it left off. You write business logic; the orchestrator handles persistence, retries, and time.
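How can a workflow survive a worker restart? Temporal's approach is replay: every completed step is recorded in a persisted event history, and after a crash the workflow function runs again from the top, with recorded results returned instead of re-executing the side effects. A toy, synchronous sketch of that idea follows; `makeContext`, `runWorkflow`, and the in-memory `history` array are hypothetical stand-ins (a real engine persists history in a database and records timers as events too).

```javascript
// Toy replay engine. `history` stands in for the persisted event log; it
// survives "crashes" because it lives outside the workflow run.
function makeContext(history, activities) {
  let step = 0;
  return {
    // Run an activity the first time; on replay, return the recorded result.
    call(name, ...args) {
      const i = step++;
      if (i < history.length) {
        if (history[i].name !== name) {
          throw new Error(`non-deterministic workflow: expected ${history[i].name} at step ${i}`);
        }
        return history[i].result; // replayed: no side effect
      }
      const result = activities[name](...args); // executed for real
      history.push({ name, result }); // "persist" before continuing
      return result;
    },
  };
}

function runWorkflow(workflow, history, activities) {
  return workflow(makeContext(history, activities));
}

// Usage: a two-step workflow whose worker "dies" during the second step
const sideEffects = [];
let simulateCrash = true;
const shopActivities = {
  charge: (amount) => {
    sideEffects.push(`charged ${amount}`);
    return 'receipt-1';
  },
  ship: (receipt) => {
    if (simulateCrash) {
      simulateCrash = false; // the restarted worker gets past this point
      throw new Error('worker crashed');
    }
    sideEffects.push(`shipped ${receipt}`);
    return 'tracking-1';
  },
};
const orderWorkflow = (ctx) => {
  const receipt = ctx.call('charge', 100);
  return ctx.call('ship', receipt);
};

const history = []; // would live in a database
try {
  runWorkflow(orderWorkflow, history, shopActivities);
} catch {
  // first run dies after `charge` was recorded
}
const tracking = runWorkflow(orderWorkflow, history, shopActivities); // replay, then resume
```

After the second run, the customer has been charged exactly once. The step-name check also shows why orchestrators like Temporal require workflow code to be deterministic: replay only works if the code issues the same steps in the same order every time.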

Without orchestration, this same logic would be cron jobs + database flags + complicated state machines.
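For contrast, here is roughly what the hand-rolled alternative looks like: a cron job runs every few minutes, scans customer rows, and advances each one through an explicit state machine stored in columns. A hypothetical sketch where plain objects stand in for database rows; `onboardingCronTick`, `state`, and `verified` are illustrative names.

```javascript
// Hypothetical cron-driven version of the onboarding flow. Plain objects
// stand in for database rows; in production each state change is an UPDATE.
const DAY_MS = 24 * 60 * 60 * 1000;

// Called by cron every few minutes; advances each row at most one state per tick.
function onboardingCronTick(customers, now, actions) {
  for (const c of customers) {
    const ageMs = now - c.createdAt;
    if (c.state === 'created' && ageMs >= DAY_MS) {
      if (!c.verified) actions.sendReminderEmail(c);
      c.state = 'reminded'; // flag so the next tick doesn't send it again
    } else if (c.state === 'reminded' && ageMs >= 6 * DAY_MS) {
      if (c.verified) {
        actions.sendWelcomeBonus(c);
        c.state = 'bonus_sent';
      } else {
        actions.deactivate(c);
        c.state = 'deactivated';
      }
    }
  }
}

// Usage: two customers, only one verifies in time
const outbox = [];
const actions = {
  sendReminderEmail: (c) => outbox.push(`reminder:${c.id}`),
  sendWelcomeBonus: (c) => outbox.push(`bonus:${c.id}`),
  deactivate: (c) => outbox.push(`deactivate:${c.id}`),
};
const customers = [
  { id: 1, createdAt: 0, verified: false, state: 'created' },
  { id: 2, createdAt: 0, verified: false, state: 'created' },
];
onboardingCronTick(customers, DAY_MS, actions); // day 1: both get reminders
customers[0].verified = true; // customer 1 verifies on day 3
onboardingCronTick(customers, 6 * DAY_MS, actions); // day 6: bonus for 1, deactivate 2
```

Everything the orchestrator handled implicitly is now your code: the state column, the flags that prevent duplicate emails, the polling interval, and what happens if a tick crashes halfway through the loop.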

For most apps, you start with a simple job queue (BullMQ, Sidekiq, Celery). You graduate to orchestration when you have multi-step workflows that outlive a single function call, must complete eventually, and need careful handling of partial failures. Don't reach for Temporal on day one; do reach for it when you're building scheduled state machines from scratch.

What's Different from Message Queues?

Message queues (Kafka, RabbitMQ) and background-job systems overlap. The distinction:

| Message Queues | Job Queues / Orchestrators |
|---|---|
| Transport between systems | Application-level task execution |
| Generic message → many consumers | Job → one execution |
| Job-level retry / DLQ semantics limited or DIY | First-class retries, DLQ, scheduling |
| Cross-language pub/sub | Often single-language (Sidekiq is Ruby, BullMQ is Node) |

You can build a job system on top of a message queue. But the dedicated job libraries (Sidekiq, BullMQ, Celery) layer scheduled jobs, retries, dead-lettering, dashboards on top of their queue — that's the value.

Often used together: Kafka for cross-service events; BullMQ/Sidekiq for application-internal background work.
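The "many consumers" versus "one execution" row is the core semantic difference. A toy contrast of the two delivery models; `publish` and `dispatch` are hypothetical helpers, and round-robin stands in for the way real job systems let whichever worker is idle pull the next job.

```javascript
// Message-queue style (pub/sub): every subscriber receives its own copy.
function publish(subscribers, message) {
  for (const deliver of subscribers) deliver(message);
}

// Job-queue style: each job is executed by exactly one worker.
// Round-robin assignment is a simplification; real job systems let
// whichever worker is idle pull the next job.
function dispatch(workers, jobs) {
  jobs.forEach((job, i) => workers[i % workers.length](job));
}

// Usage
const received = [];
publish(
  [(m) => received.push(`billing got ${m}`), (m) => received.push(`analytics got ${m}`)],
  'user.signed_up',
);
// received: both services saw the same event

const executed = [];
dispatch(
  [(j) => executed.push(`worker-1 ran ${j}`), (j) => executed.push(`worker-2 ran ${j}`)],
  ['email-1', 'email-2', 'email-3'],
);
// executed: each email job ran once, split across the two workers
```

In Kafka terms, fan-out happens across consumer groups, while consumers within one group split the partitions between them, so the same broker can serve either pattern depending on how consumers are grouped.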
