Background Jobs & Workflow Orchestration
Sidekiq, BullMQ, Celery, Temporal, Airflow - task queues, retries, scheduled work, and durable workflows
When a web request triggers work that takes longer than the user should wait — sending email, processing images, generating reports — you push that work to a background job. When the work has multiple steps with dependencies, retries, and long-running state, you graduate to a workflow orchestrator.
This page covers both ends of the spectrum. They're related but solve different problems.
Two Different Patterns
| | Background Jobs | Workflow Orchestration |
|---|---|---|
| Duration | Seconds to minutes | Minutes to weeks |
| Complexity | Single function | Multi-step, conditional, parallel |
| Failure handling | Retry + dead-letter | Resume from where it failed |
| State | Job arguments + return | Persistent execution history |
| Examples | "Send welcome email" | "Onboard new customer: provision, validate, charge, fulfill" |
| Tools | Sidekiq, BullMQ, Celery, RQ, Bee Queue | Temporal, Airflow, Step Functions, Argo Workflows, Cadence |
Why Background Jobs
| Without | With |
|---|---|
| Long requests time out / users wait | Request returns fast; work happens async |
| Failed work disappears | Retry with backoff; dead-letter for unfixable |
| Scheduled tasks need cron + scripts | Native scheduled / recurring jobs |
| Work tied to web servers | Worker fleet scales independently |
| Lost work on crash | Durable queue survives restarts |
For most apps: anything that takes more than about 100 ms, or that isn't essential to the user's response, goes to a background job.
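The "With" column can be pictured with a toy queue. This is a minimal in-memory sketch, not the API of Sidekiq, BullMQ, or Celery: `TinyQueue`, `runDue`, and the handler names are invented for illustration, and a real queue would persist jobs in Redis or a broker so they survive restarts.

```javascript
// In-memory stand-in for a job queue (hypothetical; for illustration only).
class TinyQueue {
  constructor({ maxAttempts = 3, baseDelayMs = 1000 } = {}) {
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.pending = [];    // jobs waiting to run
    this.completed = [];  // jobs that succeeded
    this.deadLetter = []; // jobs that exhausted their attempts
  }

  // Called from the request handler: record the work and return at once.
  enqueue(name, args) {
    const job = { name, args, attempts: 0, runAt: 0 };
    this.pending.push(job);
    return job;
  }

  // Exponential backoff: baseDelayMs, then 2x, 4x, ... for attempts 1, 2, 3, ...
  backoffMs(attempt) {
    return this.baseDelayMs * 2 ** (attempt - 1);
  }

  // Run every job due at time `now` (ms). A worker process would call this in a loop.
  runDue(handlers, now) {
    const due = this.pending.filter((job) => job.runAt <= now);
    this.pending = this.pending.filter((job) => job.runAt > now);
    for (const job of due) {
      job.attempts += 1;
      try {
        handlers[job.name](job.args);
        this.completed.push(job);
      } catch (err) {
        job.lastError = err.message;
        if (job.attempts >= this.maxAttempts) {
          this.deadLetter.push(job); // unfixable: park it for a human
        } else {
          job.runAt = now + this.backoffMs(job.attempts); // retry later
          this.pending.push(job);
        }
      }
    }
  }
}

const queue = new TinyQueue({ maxAttempts: 3, baseDelayMs: 1000 });
let emailCalls = 0;
const handlers = {
  // Fails twice, then succeeds (simulates a flaky mail server).
  sendWelcomeEmail: () => {
    emailCalls += 1;
    if (emailCalls < 3) throw new Error('SMTP timeout');
  },
  // Always fails (simulates bad input that no retry can fix).
  resizeImage: () => {
    throw new Error('corrupt file');
  },
};

queue.enqueue('sendWelcomeEmail', { userId: 42 });
queue.enqueue('resizeImage', { imageId: 7 });
for (const now of [0, 1000, 3000]) queue.runDue(handlers, now);
```

After the three passes, the email job sits in `completed` and the image job in `deadLetter`. Passing `now` in explicitly, rather than reading the clock, is a design choice that keeps the sketch deterministic.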
The Job Queue Layer
| Tool | Language | Backend | Notes |
|---|---|---|---|
| Sidekiq | Ruby | Redis | Mature; the Ruby standard |
| BullMQ | Node.js | Redis | Modern Node standard; replaces Bull |
| Celery | Python | Redis / RabbitMQ | Python standard; complex but flexible |
| RQ | Python | Redis | Simpler than Celery |
| Inngest | Multi-language | Hosted SaaS | Code-defined workflows; modern UX |
| Trigger.dev | Multi-language | Hosted SaaS | Code-first job orchestration |
| AWS SQS + workers | Any | SQS | Cheapest; you write the worker loop |
| Cloudflare Queues + Workers | JS | Native | Pairs with Edge Functions |
For Node: BullMQ is the default. For Ruby: Sidekiq. For Python: Celery or RQ. If you'd rather not operate Redis yourself, use a hosted option: Inngest or Trigger.dev.
The Workflow Orchestration Layer
| Tool | Strengths | When |
|---|---|---|
| Temporal | Code-defined workflows; replay-based durability; multi-language | Complex business logic, long-running |
| AWS Step Functions | JSON state-machine DSL; AWS-native | AWS-heavy stacks |
| Apache Airflow | DAGs; batch / data engineering | Scheduled data pipelines |
| Argo Workflows | Kubernetes-native; container per step | Already on K8s; batch processing |
| Dagster | Data-asset oriented | Modern data engineering |
| Prefect | Pythonic; Airflow alternative | Python-centric pipelines |
| n8n / Zapier-like | No-code; visual | Glue between SaaS tools |
| Inngest / Trigger.dev | Code-first; hosted | Mid-complexity; modern stacks |
Two big questions:
- Is this batch / ETL or transactional? Batch → Airflow / Dagster. Transactional → Temporal / Step Functions / Inngest.
- Do I need code-defined workflows or visual? Almost always code, for the same reason you'd use IaC over a console.
Learning Path
1. Getting Started
Run BullMQ with Redis - enqueue a job, process it, schedule a recurring task
2. Jobs vs Workflows
When to graduate to Temporal, what durable execution gives you, code examples
3. Best Practices
Idempotency, retries, dead-letter queues, observability, scaling workers
The Killer Property: Durable Execution
Modern orchestrators (Temporal, Inngest, Trigger.dev) offer durable execution: the execution state of your code is persisted, so a workflow survives crashes and restarts, even when it runs for days. The workflow code looks like normal code:
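```javascript
// Temporal-style sketch: pretend this is a multi-day workflow.
// `activities` and `sleep` stand in for what the orchestrator's SDK provides.
async function onboardCustomer(customerId) {
  let customer = await activities.createAccount(customerId);

  await sleep('1 day');
  // Re-fetch after sleeping so `verified` is current.
  // getCustomer is a hypothetical activity added for this purpose.
  customer = await activities.getCustomer(customerId);
  if (!customer.verified) {
    await activities.sendReminderEmail(customer);
  }

  await sleep('5 days');
  customer = await activities.getCustomer(customerId);
  if (!customer.verified) {
    await activities.deactivate(customer);
    return; // no welcome bonus for a deactivated account
  }

  await activities.sendWelcomeBonus(customer);
}
```

The sleep('1 day') really does wait a day: the orchestrator persists the workflow's state, your worker can restart in the meantime, and the workflow picks up where it left off. You write business logic; the orchestrator handles persistence, retries, and time.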
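One way to picture how a function can "pick up where it left off" is replay: the result of each completed step is recorded, and after a crash the function is re-run from the top with recorded results substituted for the steps already done. The following is a toy, synchronous, in-memory sketch of that idea, not Temporal's API or implementation; `runWorkflow`, `ctx.step`, and the `history` array are names invented for illustration, and timers are left out.

```javascript
// `history` stands in for the orchestrator's persisted event log (assumption).
function runWorkflow(workflow, history, input) {
  let cursor = 0;
  const ctx = {
    // Execute a side-effecting step once; on replay, return the recorded result.
    step(name, fn) {
      if (cursor < history.length) {
        const event = history[cursor++];
        if (event.name !== name) {
          throw new Error(`non-deterministic workflow: expected ${event.name}, got ${name}`);
        }
        return event.result;
      }
      const result = fn();
      history.push({ name, result });
      cursor++;
      return result;
    },
  };
  return workflow(ctx, input);
}

const sideEffects = [];    // which activities actually executed
let workerHealthy = false; // flips to true once the "worker" restarts

function onboard(ctx, customerId) {
  const account = ctx.step('createAccount', () => {
    sideEffects.push('createAccount');
    return { id: customerId, verified: false };
  });
  ctx.step('sendReminderEmail', () => {
    if (!workerHealthy) throw new Error('worker crashed');
    sideEffects.push('sendReminderEmail');
    return 'sent';
  });
  return account.id;
}

const history = [];
try {
  runWorkflow(onboard, history, 'cust_1'); // first run crashes at step 2
} catch (err) {
  workerHealthy = true; // a restarted worker takes over
}
// Second run: step 1 is replayed from history, step 2 executes for real.
const result = runWorkflow(onboard, history, 'cust_1');
```

In this sketch `createAccount` executes only once even though `onboard` runs twice. The name check in `ctx.step` hints at why replay-based systems expect workflow code to be deterministic: the second run has to request the same steps in the same order.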
Without orchestration, this same logic would be cron jobs + database flags + complicated state machines.
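For contrast, here is roughly what the hand-rolled version tends to look like. It is a sketch under assumptions: a cron job is imagined to call `tick()` for every customer row, the `state` field plays the role of the database flag, and an unverified customer is deactivated while a verified one receives the bonus. All names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Advance one customer row by at most one state; `actions` collects the
// side effects that the cron job would then have to perform.
function tick(row, now, actions) {
  const age = now - row.createdAt;
  switch (row.state) {
    case 'created':
      if (age >= 1 * DAY_MS) {
        if (!row.verified) actions.push(`sendReminderEmail:${row.id}`);
        row.state = 'reminded';
      }
      break;
    case 'reminded':
      if (age >= 6 * DAY_MS) {
        if (row.verified) {
          actions.push(`sendWelcomeBonus:${row.id}`);
          row.state = 'done';
        } else {
          actions.push(`deactivate:${row.id}`);
          row.state = 'deactivated';
        }
      }
      break;
    default:
      break; // terminal states: nothing left to do
  }
  return row;
}

const actions = [];
const row = { id: 'cust_1', createdAt: 0, verified: false, state: 'created' };
tick(row, 0.5 * DAY_MS, actions); // too early: nothing happens
tick(row, 1 * DAY_MS, actions);   // reminder goes out
tick(row, 6 * DAY_MS, actions);   // still unverified: deactivate
tick(row, 7 * DAY_MS, actions);   // terminal state: no-op
```

Even this small version scatters the business logic across states and timestamps, and it sidesteps the hard part: in a real system, performing the action and updating the flag are not atomic, so a crash between the two either repeats or loses the action.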
For most apps, start with a simple job queue (BullMQ, Sidekiq, Celery). Graduate to orchestration when you have multi-step workflows that outlive a single function call, must eventually complete, and need careful handling of partial failures. Don't reach for Temporal on day one; do reach for it when you find yourself building scheduled state machines from scratch.
What's Different from Message Queues?
Message queues (Kafka, RabbitMQ) and background-job systems overlap. The distinction:
| Message Queues | Job Queues / Orchestrators |
|---|---|
| Transport between systems | Application-level task execution |
| Generic message → many consumers | Job → one execution |
| No job-level retry policy or scheduling built in (dead-lettering, where offered, is per message) | First-class retries, DLQ, scheduling |
| Cross-language pub/sub | Often single-language (Sidekiq is Ruby, BullMQ is Node) |
You can build a job system on top of a message queue. But the dedicated job libraries (Sidekiq, BullMQ, Celery) layer scheduled jobs, retries, dead-lettering, and dashboards on top of their queue. That's the value.
Often used together: Kafka for cross-service events; BullMQ/Sidekiq for application-internal background work.
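The "many consumers" versus "one execution" row can be made concrete with two toy classes. This is a deliberately simplified in-memory sketch: `Topic` and `JobQueue` are invented names, delivery is synchronous, and real brokers blur the line (for example, several consumers can also compete for messages on a single queue).

```javascript
// Pub/sub transport: every subscriber of a topic sees every message.
class Topic {
  constructor() { this.subscribers = []; }
  subscribe(fn) { this.subscribers.push(fn); }
  publish(message) { for (const fn of this.subscribers) fn(message); }
}

// Job queue: each job is handed to exactly one worker (round-robin here).
class JobQueue {
  constructor() { this.workers = []; this.next = 0; }
  addWorker(fn) { this.workers.push(fn); }
  enqueue(job) {
    const worker = this.workers[this.next % this.workers.length];
    this.next += 1;
    worker(job);
  }
}

const seen = { billing: [], analytics: [], workerA: [], workerB: [] };

// Cross-service event: billing and analytics both react to the same order.
const orderEvents = new Topic();
orderEvents.subscribe((m) => seen.billing.push(m.orderId));
orderEvents.subscribe((m) => seen.analytics.push(m.orderId));
orderEvents.publish({ orderId: 1 });

// Application-internal work: each confirmation email is sent exactly once.
const emailJobs = new JobQueue();
emailJobs.addWorker((j) => seen.workerA.push(j.orderId));
emailJobs.addWorker((j) => seen.workerB.push(j.orderId));
emailJobs.enqueue({ orderId: 1 });
emailJobs.enqueue({ orderId: 2 });
```

Both subscribers record order 1, while the two email jobs are split between the workers so that neither job runs twice.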