Steven's Knowledge

Object Storage

Blob storage over HTTP (AWS S3, Cloudflare R2, GCS, Azure Blob, MinIO): the backbone of modern data persistence

Object storage holds unstructured blobs (files, images, video, backups, log archives, ML training data) at effectively unlimited scale, accessed over HTTP. It's the workhorse behind almost every modern system: CDN origins, backup targets, data lakes, user-uploaded photos.

What It Is (and Isn't)

Object storage | Block storage | File system
HTTP API, immutable blobs | Mountable disk | POSIX semantics
Cheap per GB; high latency per op | Fast per op; tied to one VM | Per-file ops; tied to a server or share
Eventually consistent (historically) / strong now | Strong | Strong
Scales to petabytes | TB per volume | Server-bound
Examples: S3, R2, GCS | EBS, GCE PD | EFS, NFS

If you find yourself thinking "S3 with directories", remember that object storage only simulates directories via key prefixes (photos/2025/05/...); the underlying model is a flat key-value store. Operations are per-object, not per-tree.
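To make the "flat keys, simulated directories" idea concrete, here is a minimal in-memory sketch of a delimiter-based listing. It is loosely modeled on how S3-style list calls group keys under a prefix and delimiter; the function name `list_keys` and the sample keys are assumptions for illustration, not any provider's API.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate a delimiter-based listing over a flat collection of keys.

    Returns (objects, common_prefixes): keys sitting directly under the
    prefix, and the simulated "subdirectories" one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix: roll the key up into a "folder".
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: just strings, no nesting anywhere.
keys = {
    "photos/2025/05/sunset.jpg",
    "photos/2025/05/beach.jpg",
    "photos/2025/06/hike.jpg",
    "photos/index.html",
    "readme.txt",
}

objects, folders = list_keys(keys, prefix="photos/")
# objects -> ["photos/index.html"], folders -> ["photos/2025/"]
```

The "folders" exist only in the listing result. Delete the last key under photos/2025/ and the folder disappears with it.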

Why Object Storage

Without | With
Files on a single VM disk (single point of failure) | Designed for 99.999999999% (11 nines) durability across zones
Database BLOB columns (bloats DB, slows backups) | Database stores URLs; storage handles blobs
Local disk shared across services via NFS | Every service hits storage directly via HTTPS
Manual backup of files | Lifecycle rules move old data to cheap tiers automatically
Serving uploads from your app server | Browser uploads directly via presigned URLs

The Players

Provider | Notes
AWS S3 | The original; the API everyone copies. Pricing per GB + per-request + egress
Cloudflare R2 | S3-compatible API; zero egress fees, which is disruptive for distribution-heavy workloads
Google Cloud Storage (GCS) | First-class GCP integration; native multi-region
Azure Blob Storage | Microsoft equivalent; deep hot/cool/archive tiering
Backblaze B2 | Very cheap; S3-compatible; less feature-rich
Wasabi | Same niche as B2: cheap, S3-compatible, hosted
MinIO | Open-source, self-hostable; S3 API; popular for on-prem / hybrid
SeaweedFS / Ceph RGW / Garage | Open-source options for hyperscale or air-gapped deployments
DigitalOcean Spaces / Linode Object Storage | S3-compatible, simpler pricing

The S3 API is the de facto standard — most clients work against any S3-compatible storage by changing the endpoint URL.
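A small sketch of what "changing the endpoint URL" means in practice: the bucket and key stay the same, only the host changes. The `ENDPOINTS` table and `object_url` helper are hypothetical names; the account ID, region, and localhost port are placeholders. With a real SDK such as boto3, the equivalent switch is typically the `endpoint_url` argument when creating the client. Real requests also need authentication (request signing), which this addressing-only sketch leaves out.

```python
from urllib.parse import quote

# Hypothetical endpoint table. ACCOUNT_ID, the region, and the local port
# are placeholders, not values from any real deployment.
ENDPOINTS = {
    "aws": "https://s3.us-east-1.amazonaws.com",
    "r2": "https://ACCOUNT_ID.r2.cloudflarestorage.com",
    "minio": "http://localhost:9000",
}


def object_url(provider, bucket, key):
    """Build a path-style object URL: <endpoint>/<bucket>/<key>."""
    endpoint = ENDPOINTS[provider].rstrip("/")
    # quote() keeps "/" by default, so key "paths" stay readable.
    return f"{endpoint}/{quote(bucket)}/{quote(key)}"


for provider in ENDPOINTS:
    print(object_url(provider, "my-bucket", "photos/2025/05/sunset.jpg"))
```

The application code above never branches on the provider; everything provider-specific lives in one configuration value.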

Choosing a Provider

A short decision tree:

You need... | Pick
The most-supported API and richest ecosystem | AWS S3
Public distribution at scale (CDN origin, downloads) | R2 (egress is free)
Tight GCP / BigQuery integration | GCS
Tight Azure integration | Azure Blob
Cheapest cold storage | Backblaze B2 / Wasabi
Self-host / air-gap / on-prem | MinIO
Multi-cloud abstraction | All-of-the-above via S3-compatible clients

For new projects in 2026: pick R2 if you serve significant public traffic (egress savings dominate the bill), or S3 if you're already on AWS (transfers from S3 to AWS services in the same region incur no egress charge).
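The egress argument is simple arithmetic, sketched below. The rates are placeholders chosen only to show the shape of the calculation, not real quotes from any provider; the function name `monthly_cost` is likewise an assumption. Per-request fees and free tiers are ignored.

```python
def monthly_cost(stored_gb, egress_gb, storage_rate, egress_rate):
    """Monthly bill in dollars: storage plus egress, nothing else."""
    return stored_gb * storage_rate + egress_gb * egress_rate


# Placeholder rates in $/GB, NOT real prices; check current pricing pages.
STORAGE_RATE = 0.02
EGRESS_RATE = 0.09

# A distribution-heavy workload: 1 TB stored, 50 TB served per month.
with_egress_fee = monthly_cost(1_000, 50_000, STORAGE_RATE, EGRESS_RATE)
zero_egress_fee = monthly_cost(1_000, 50_000, STORAGE_RATE, 0.0)

print(f"with egress fee: ${with_egress_fee:,.0f}")
print(f"zero egress fee: ${zero_egress_fee:,.0f}")
```

When bytes served far exceed bytes stored, the egress term dwarfs the storage term, which is why the provider choice hinges on traffic rather than capacity.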

Learning Path

What's a Bucket, an Object, a Key

s3://my-bucket/photos/2025/05/sunset.jpg
   │      │              │
   │      └─ bucket      └─ key (the full path within the bucket)
   └─ scheme
  • Bucket — top-level container. Globally named (S3) or per-account named (R2). Has a region, policy, lifecycle rules.
  • Object — the actual blob: bytes + metadata + headers.
  • Key — the object's identifier within the bucket. Looks like a path but is a flat string.
  • Metadata — HTTP-style headers (Content-Type, Cache-Control) + user-defined (x-amz-meta-*).
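
The bucket/key split above can be sketched in a few lines. `parse_s3_uri` is a hypothetical helper, not part of any SDK; it uses plain string splitting so that characters like "?" or "#" inside a key are left alone.

```python
def parse_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "s3" or not sep:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Only the FIRST slash separates bucket from key; every later slash
    # is just a character inside the flat key string.
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = parse_s3_uri("s3://my-bucket/photos/2025/05/sunset.jpg")
# bucket -> "my-bucket", key -> "photos/2025/05/sunset.jpg"
```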

Operations are per-object. There's no "rename a directory" because there are no directories.
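
What a "directory rename" amounts to can be sketched against a plain dict standing in for a bucket. `rename_prefix` and the dict-as-bucket are assumptions for illustration; against a real service, each step would be a separate copy request and delete request.

```python
def rename_prefix(store, old_prefix, new_prefix):
    """'Rename a directory' on a flat store: one copy + one delete per object.

    `store` is a dict of key -> bytes. Returns the number of objects moved.
    """
    if old_prefix == new_prefix:
        return 0
    moved = 0
    # Snapshot the matching keys first, since the store is mutated in the loop.
    for key in [k for k in store if k.startswith(old_prefix)]:
        new_key = new_prefix + key[len(old_prefix):]
        store[new_key] = store[key]  # copy to the new key
        del store[key]               # delete the original
        moved += 1
    return moved


store = {
    "photos/2025/05/sunset.jpg": b"...",
    "photos/2025/05/beach.jpg": b"...",
    "readme.txt": b"hello",
}
rename_prefix(store, "photos/2025/", "archive/2025/")
```

The work grows with the number of objects under the prefix, and the operation is not atomic: a failure halfway leaves some objects under the old prefix and some under the new one.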

Objects are immutable: writing to photos/sunset.jpg either creates a new version (if versioning is on) or replaces the object wholesale. You can't append, and you can't update part of an object in place. For mutable structured data you want a database; object storage is for blobs.
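
A toy model of these write semantics, assuming an in-memory `Bucket` class invented for this sketch (it does not mirror any SDK). It shows the two outcomes of a write, and why "append" ends up as a full read-modify-write.

```python
class Bucket:
    """In-memory stand-in for a bucket; every write replaces the whole object."""

    def __init__(self, versioning=False):
        self.versioning = versioning
        self._versions = {}  # key -> list of bytes, oldest first

    def put(self, key, data):
        history = self._versions.setdefault(key, [])
        if self.versioning:
            history.append(data)   # earlier versions stay retrievable
        else:
            history[:] = [data]    # overwrite: earlier bytes are gone
        return len(history)

    def get(self, key):
        return self._versions[key][-1]

    def version_count(self, key):
        return len(self._versions[key])


def append(bucket, key, extra):
    """'Append' = download the object, concatenate, upload the whole thing again."""
    return bucket.put(key, bucket.get(key) + extra)


plain = Bucket()
plain.put("log.txt", b"line 1\n")
append(plain, "log.txt", b"line 2\n")      # rewrites the full object

versioned = Bucket(versioning=True)
versioned.put("log.txt", b"line 1\n")
append(versioned, "log.txt", b"line 2\n")  # adds a second full copy
```

Note that with versioning on, each "append" stores another complete copy of the object, which is one reason frequently changing data belongs in a database or a log system instead.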
