Steven's Knowledge

Patterns

Direct upload, multipart, lifecycle rules, versioning, static hosting, replication, signed URLs in depth

The handful of object-storage patterns that show up in real systems. Master these and you have covered most of what you'll do with S3-compatible storage.

Direct Browser Upload (Presigned URLs)

The pattern everyone should use for user uploads:

Browser ──► Backend: "I want to upload photo X (size, type)"
Backend ──► validates user can upload; issues presigned PUT URL
Browser ──► PUT directly to storage with the URL
Browser ──► Backend: "uploaded; here's the key"
Backend ──► stores the key in DB, fires post-processing

Backend (issue URL)

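import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { v4 as uuid } from 'uuid';

const s3 = new S3Client({});

// Inside the request handler, after validating userId, size, and type:
const key = `uploads/${userId}/${uuid()}.jpg`;
const url = await getSignedUrl(
  s3,
  new PutObjectCommand({
    Bucket: 'uploads',
    Key: key,
    ContentType: 'image/jpeg',    // the browser must send this same Content-Type
    ContentLength: size,          // optional: enforce expected size
  }),
  { expiresIn: 300 }              // URL is valid for 5 minutes
);

return Response.json({ url, key });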
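
The "validates user can upload" step is where size and type limits belong, before any URL is signed. A minimal sketch, assuming a hypothetical 10 MiB cap and a hypothetical image allowlist; `decideUpload`, `MAX_UPLOAD_BYTES`, and `EXTENSION_BY_TYPE` are illustrative names, not part of any SDK:

```typescript
// Hypothetical limits and allowlist; adjust to your own product rules.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MiB
const EXTENSION_BY_TYPE = new Map<string, string>([
  ["image/jpeg", "jpg"],
  ["image/png", "png"],
  ["image/webp", "webp"],
]);

interface UploadRequest {
  size: number; // bytes, as reported by the browser
  type: string; // MIME type, as reported by the browser
}

type UploadDecision =
  | { ok: true; key: string; contentType: string }
  | { ok: false; reason: string };

// Decide whether to issue a presigned URL, and for which key.
// `id` is supplied by the caller (e.g. a UUID) so the function stays pure.
function decideUpload(userId: string, req: UploadRequest, id: string): UploadDecision {
  const ext = EXTENSION_BY_TYPE.get(req.type);
  if (ext === undefined) {
    return { ok: false, reason: `unsupported content type: ${req.type}` };
  }
  if (!Number.isInteger(req.size) || req.size <= 0 || req.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: `size must be between 1 and ${MAX_UPLOAD_BYTES} bytes` };
  }
  // The key never contains client-supplied text, only the user ID and a server-side ID.
  return { ok: true, key: `uploads/${userId}/${id}.${ext}`, contentType: req.type };
}
```

Signing the returned `contentType` instead of a hard-coded value keeps the signed header in step with what the browser sends.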

Browser (upload)

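const { url, key } = await fetch('/api/upload-url', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ size: file.size, type: file.type }),
}).then(r => r.json());

// Upload straight to storage; Content-Type must match what the backend signed
const put = await fetch(url, {
  method: 'PUT',
  headers: { 'Content-Type': file.type },
  body: file,
});
if (!put.ok) throw new Error(`Upload failed: ${put.status}`);

// Tell backend we're done
await fetch('/api/upload-complete', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ key }),
});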

Why this matters

| Without presigned URLs | With them |
| --- | --- |
| Files stream through your servers | Files go browser → storage; you never see them |
| App memory / CPU tied up per upload | Servers stay responsive |
| Upload throughput bounded by your app fleet | Bounded by browser ↔ storage (massive) |
| Multipart upload coordinated server-side | Browser does it directly |

Form POST as an alternative

For HTML form uploads (no JS), use presigned POST policies that include enforced constraints — max size, content type prefix, key prefix:

const policy = await createPresignedPost(s3, {
  Bucket: 'uploads',
  Key: `uploads/${userId}/\${filename}`,
  Conditions: [
    ['content-length-range', 0, 10 * 1024 * 1024],   // 10 MB max
    ['starts-with', '$Content-Type', 'image/'],
  ],
  Expires: 300,
});
// → policy.url + policy.fields

Render <form action={policy.url}> with the fields as hidden inputs.
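
Rendering the form can be sketched as a small string builder. This assumes the `{ url, fields }` shape shown above; `renderUploadForm` and `escapeAttr` are hypothetical helpers. The file input goes last because S3 ignores form fields that follow the file field.

```typescript
// Escape text for safe use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the upload form from a presigned POST result ({ url, fields }).
function renderUploadForm(url: string, fields: Record<string, string>): string {
  const hidden = Object.entries(fields)
    .map(([name, value]) =>
      `  <input type="hidden" name="${escapeAttr(name)}" value="${escapeAttr(value)}">`)
    .join("\n");
  return [
    `<form action="${escapeAttr(url)}" method="post" enctype="multipart/form-data">`,
    hidden,
    // Every policy field must come before the file itself.
    `  <input type="file" name="file">`,
    `  <button type="submit">Upload</button>`,
    `</form>`,
  ].join("\n");
}
```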

Multipart Upload

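For files over ~100 MB, single-PUT uploads stall on flaky networks, and one dropped connection means starting over. Multipart splits the file into parts of at least 5 MiB each (only the last part may be smaller) that upload in parallel and retry independently: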

1. CreateMultipartUpload → returns uploadId
2. UploadPart × N (in parallel) → each returns ETag
3. CompleteMultipartUpload(uploadId, parts) → finalizes

// High-level: most SDKs do this for you
import { Upload } from '@aws-sdk/lib-storage';

const upload = new Upload({
  client: s3,
  params: { Bucket: 'videos', Key: 'big.mp4', Body: fileStream },
  partSize: 10 * 1024 * 1024,   // 10 MiB per part
  queueSize: 4,                 // parts uploaded concurrently
});
upload.on('httpUploadProgress', (p) => console.log(p.loaded / p.total));
await upload.done();

Most SDKs (@aws-sdk/lib-storage, boto3's upload_fileobj) handle multipart transparently.
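
To see what the SDK is doing underneath, here is a rough sketch of the part-planning step: turning a file size into byte ranges. The 5 MiB minimum and 10,000-part cap are S3's limits; `planParts` and `PartRange` are illustrative names.

```typescript
const MIN_PART_BYTES = 5 * 1024 * 1024; // S3 minimum for every part except the last
const MAX_PARTS = 10000;                // S3 maximum number of parts per upload

interface PartRange {
  partNumber: number; // 1-based, as UploadPart expects
  start: number;      // inclusive byte offset
  end: number;        // exclusive byte offset
}

function planParts(totalBytes: number, partBytes: number): PartRange[] {
  if (totalBytes <= 0) throw new Error("totalBytes must be positive");
  if (partBytes < MIN_PART_BYTES) throw new Error("part size is below the 5 MiB minimum");
  // Grow the part size if the requested one would need more than MAX_PARTS parts.
  const size = Math.max(partBytes, Math.ceil(totalBytes / MAX_PARTS));
  const parts: PartRange[] = [];
  for (let start = 0, n = 1; start < totalBytes; start += size, n++) {
    parts.push({ partNumber: n, start, end: Math.min(start + size, totalBytes) });
  }
  return parts;
}
```

A 25 MiB file with 10 MiB parts yields three ranges, the last one 5 MiB long.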

Browser-side multipart is more involved — you presign each part URL and orchestrate parallelism. Libraries like Uppy / S3Fanout handle this.
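
The "orchestrate parallelism" part usually comes down to a bounded worker pool. A minimal sketch, with `mapWithConcurrency` as a hypothetical helper; in a real uploader the `worker` would PUT one part to its presigned URL and return the ETag:

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results come back in input order; the first rejection rejects the whole run.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array<R>(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

A fixed number of lanes keeps memory and open connections predictable, which matters more in a browser than raw throughput.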

Abandoned multipart uploads cost money. They sit in storage incurring fees until completed or aborted. Add a lifecycle rule to abort multiparts after 7 days — see below.

Lifecycle Rules

Automate moving / deleting old objects:

{
  "Rules": [
    {
      "Id": "expire-temp",
      "Status": "Enabled",
      "Filter": { "Prefix": "tmp/" },
      "Expiration": { "Days": 7 }
    },
    {
      "Id": "tier-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30,  "StorageClass": "STANDARD_IA" },
        { "Days": 90,  "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    },
    {
      "Id": "abort-multipart",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
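
To reason about what a rule does to a given object, a simplified model helps. The sketch below assumes objects start in STANDARD and ignores that S3 applies lifecycle actions in daily batches rather than at the exact threshold; `LifecycleRule`, `applyRule`, and `tierOldLogs` are illustrative names mirroring the `tier-old-logs` rule above.

```typescript
interface LifecycleRule {
  id: string;
  prefix: string;
  transitions?: { days: number; storageClass: string }[];
  expirationDays?: number;
}

type Outcome = { action: "expire" } | { action: "store"; storageClass: string };

// What one rule implies for an object of a given age (in days).
// Returns undefined when the rule's prefix does not match the key.
function applyRule(rule: LifecycleRule, key: string, ageDays: number): Outcome | undefined {
  if (!key.startsWith(rule.prefix)) return undefined;
  if (rule.expirationDays !== undefined && ageDays >= rule.expirationDays) {
    return { action: "expire" };
  }
  // Pick the latest transition whose threshold has been reached.
  let storageClass = "STANDARD";
  let reached = -1;
  for (const t of rule.transitions ?? []) {
    if (ageDays >= t.days && t.days > reached) {
      storageClass = t.storageClass;
      reached = t.days;
    }
  }
  return { action: "store", storageClass };
}

const tierOldLogs: LifecycleRule = {
  id: "tier-old-logs",
  prefix: "logs/",
  transitions: [
    { days: 30, storageClass: "STANDARD_IA" },
    { days: 90, storageClass: "GLACIER" },
    { days: 365, storageClass: "DEEP_ARCHIVE" },
  ],
  expirationDays: 2555,
};
```

Under this model a 100-day-old `logs/app.log` sits in GLACIER, and anything outside `logs/` is untouched by the rule.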

Storage tiers trade off cost vs access latency:

| Class | Per-GB cost | Retrieval | When to use |
| --- | --- | --- | --- |
| STANDARD | Highest | Instant | Hot data |
| STANDARD_IA (Infrequent Access) | ~50% less | Instant; minimum 30-day storage | Backups, monthly reports |
| GLACIER_IR (Glacier Instant Retrieval) | ~70% less | Instant; minimum 90-day storage | Cold but might need now |
| GLACIER (Glacier Flexible Retrieval) | ~85% less | Minutes to hours | Compliance archives |
| DEEP_ARCHIVE | ~95% less | Hours | "Maybe never read again" |

R2, GCS, Azure all have equivalents. Set up lifecycle rules on every bucket from day one — it's the easiest cost win.
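
A back-of-the-envelope way to size that win is to weight each tier by the days an object spends in it. The sketch below uses the approximate discounts from the table as relative prices (STANDARD = 1); they are rough ratios, not a price list, and transition, retrieval, and minimum-duration fees are left out. `relativeCost` and `RELATIVE_PRICE` are illustrative names, and `GLACIER` here means the flexible-retrieval class used in the lifecycle JSON.

```typescript
// Relative per-GB price of each class, with STANDARD = 1 (approximate, from the table).
const RELATIVE_PRICE = new Map<string, number>([
  ["STANDARD", 1.0],
  ["STANDARD_IA", 0.5],
  ["GLACIER", 0.15],
  ["DEEP_ARCHIVE", 0.05],
]);

interface Transition {
  days: number;
  storageClass: string;
}

function priceOf(storageClass: string): number {
  const price = RELATIVE_PRICE.get(storageClass);
  if (price === undefined) throw new Error(`no relative price for ${storageClass}`);
  return price;
}

// Storage cost over `horizonDays`, as a fraction of keeping the data in STANDARD throughout.
function relativeCost(transitions: Transition[], horizonDays: number): number {
  const sorted = [...transitions].sort((a, b) => a.days - b.days);
  let cost = 0;
  let from = 0;
  let current = "STANDARD";
  for (const t of sorted) {
    // Charge the days spent in the current class before this transition kicks in.
    const until = Math.min(t.days, horizonDays);
    if (until > from) {
      cost += (until - from) * priceOf(current);
      from = until;
    }
    current = t.storageClass;
  }
  if (horizonDays > from) cost += (horizonDays - from) * priceOf(current);
  return cost / horizonDays;
}
```

For the `tier-old-logs` transitions (30, 90, 365 days), the first year comes to about 0.28 of the all-STANDARD cost under these assumed ratios.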

Versioning

Turn on versioning to keep every overwrite:

aws s3api put-bucket-versioning --bucket photos \
  --versioning-configuration Status=Enabled

Now every PUT creates a new version; a DELETE without a version ID adds a "delete marker" but keeps the data. Restoring a "deleted" object is one API call: delete the delete marker. Pair versioning with a lifecycle rule that permanently deletes noncurrent versions after N days, or storage grows without bound.
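
The version stack is easier to picture with a toy model. This is an in-memory sketch of the semantics only, not how any provider stores data; `VersionedBucket` and its sequential version IDs are made up for illustration.

```typescript
type Version =
  | { versionId: string; deleteMarker: false; body: string }
  | { versionId: string; deleteMarker: true };

// In-memory model of one versioned bucket: key -> version stack, newest last.
class VersionedBucket {
  private readonly stacks = new Map<string, Version[]>();
  private nextId = 1;

  private append(key: string, version: Version): void {
    const stack = this.stacks.get(key) ?? [];
    stack.push(version);
    this.stacks.set(key, stack);
  }

  // PUT: always adds a new version on top.
  put(key: string, body: string): string {
    const versionId = `v${this.nextId++}`;
    this.append(key, { versionId, deleteMarker: false, body });
    return versionId;
  }

  // DELETE without a version ID: adds a delete marker, keeps the data.
  delete(key: string): string {
    const versionId = `v${this.nextId++}`;
    this.append(key, { versionId, deleteMarker: true });
    return versionId;
  }

  // DELETE with a version ID: permanently removes that one version.
  deleteVersion(key: string, versionId: string): void {
    const stack = this.stacks.get(key) ?? [];
    this.stacks.set(key, stack.filter((v) => v.versionId !== versionId));
  }

  // GET without a version ID: the newest version, unless it is a delete marker.
  get(key: string): string | undefined {
    const stack = this.stacks.get(key) ?? [];
    const latest = stack[stack.length - 1];
    if (latest === undefined || latest.deleteMarker) return undefined;
    return latest.body;
  }
}
```

Removing the marker with `deleteVersion` makes the previous version current again, which is the "one API call" restore.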

Versioning + replication + MFA-delete protection is the gold-standard configuration for buckets holding irreplaceable data.

Static Website Hosting

S3 (and equivalents) can serve a bucket as a website directly:

# Enable
aws s3 website s3://my-site --index-document index.html --error-document 404.html

# Make objects public (only works if the bucket allows ACLs and public access; new buckets block both by default)
aws s3 cp ./dist s3://my-site/ --recursive --acl public-read

But the right pattern in 2026 is bucket + CDN:

Internet ──► CDN (Cloudflare / CloudFront) ──► origin: bucket

The bucket is private; the CDN has an OAC / signed identity that lets it read. Users hit the CDN; the CDN hits the bucket on miss. You get edge caching, HTTPS, and DDoS protection for free. See CDN.

Replication

For DR or geographic distribution. Both the source and destination buckets must have versioning enabled:

# AWS S3 Cross-Region Replication
aws s3api put-bucket-replication --bucket source-bucket \
  --replication-configuration file://replication.json

replication.json:

{
  "Role": "arn:aws:iam::123:role/replication-role",
  "Rules": [{
    "Status": "Enabled",
    "Priority": 1,
    "Filter": {},
    "DeleteMarkerReplication": { "Status": "Enabled" },
    "Destination": {
      "Bucket": "arn:aws:s3:::dr-bucket",
      "StorageClass": "STANDARD_IA"
    }
  }]
}

Replication is eventually consistent — usually seconds, sometimes minutes. Don't depend on it for strong consistency; use it for DR, latency (multi-region reads), or compliance.

Event Notifications

Trigger something when an object is created/deleted:

S3 ObjectCreated event ──► SNS / SQS / Lambda / EventBridge ──► your code

Common uses:

  • New image uploaded → kick off thumbnail generation
  • New PDF uploaded → extract text
  • New log file → ship to your log aggregator
  • Object deleted → audit log

R2 has Workers integration; GCS has Pub/Sub notifications; Azure Blob has Event Grid. Same pattern, different glue.
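
On the receiving end, the handler mostly decodes the key and decides which job to run. A minimal sketch for S3-style records, reading only the `eventName` and `s3.object.key` fields; the job names and suffix rules are hypothetical and simply mirror the list above. Keys arrive URL-encoded with spaces as `+`, which is a common source of "object not found" bugs.

```typescript
// The subset of an S3 event notification record this sketch reads.
interface S3EventRecord {
  eventName: string; // e.g. "ObjectCreated:Put", "ObjectRemoved:Delete"
  s3: { bucket: { name: string }; object: { key: string } };
}

type Job = "thumbnail" | "extract-text" | "ship-logs" | "audit" | "ignore";

// Object keys arrive URL-encoded, with spaces written as "+".
function decodeKey(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

function routeEvent(record: S3EventRecord): { job: Job; key: string } {
  const key = decodeKey(record.s3.object.key);
  const lower = key.toLowerCase();
  if (record.eventName.startsWith("ObjectRemoved:")) return { job: "audit", key };
  if (!record.eventName.startsWith("ObjectCreated:")) return { job: "ignore", key };
  if (/\.(jpe?g|png|webp)$/.test(lower)) return { job: "thumbnail", key };
  if (lower.endsWith(".pdf")) return { job: "extract-text", key };
  if (lower.startsWith("logs/")) return { job: "ship-logs", key };
  return { job: "ignore", key };
}
```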

Static Assets with Hash-in-Filename + Long TTLs

/assets/main.a4f9e21c.css     → immutable; year-long Cache-Control
/assets/main.b3d72ed8.css     → new version

Upload with:

await s3.send(new PutObjectCommand({
  Bucket: 'static',
  Key: 'assets/main.a4f9e21c.css',
  Body: cssBuffer,
  ContentType: 'text/css',
  CacheControl: 'public, max-age=31536000, immutable',
}));

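The CDN and browsers can cache each file for the full year. New deploys publish new hashed filenames, and old versions naturally age out. Hashed assets never need a cache purge; only the unhashed HTML that references them needs a short TTL.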
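
Build tools normally generate the hashed names, but the idea fits in a few lines. In this sketch FNV-1a stands in for a real content hash such as SHA-256 so the example has no dependencies; `hashedName` and `cacheControlFor` are hypothetical helpers, and the `no-cache` fallback for unhashed files is an assumption, not something the storage layer requires.

```typescript
// FNV-1a, 32-bit: a dependency-free stand-in for a real content hash.
function fnv1a(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// "main.css" + content -> "main.<8 hex chars>.css"
function hashedName(fileName: string, content: string): string {
  const hash = fnv1a(content);
  const dot = fileName.lastIndexOf(".");
  return dot === -1
    ? `${fileName}.${hash}`
    : `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

// Hashed files are immutable; everything else must be revalidated.
function cacheControlFor(key: string): string {
  return /\.[0-9a-f]{8}\.[^./]+$/.test(key)
    ? "public, max-age=31536000, immutable"
    : "no-cache";
}
```

Because the name changes whenever the content does, a stale cache entry can never be served for a new deploy.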

Cross-Account / Cross-Provider Access

S3 bucket policies grant access across AWS accounts:

{
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::OTHER_ACCOUNT:role/data-reader" },
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::shared-data/*"
  }]
}
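
A heavily simplified model of how such a statement is matched, for intuition only. Real evaluation has many more inputs (conditions, `?` wildcards, principal and action wildcards), and for cross-account access the caller's own account must also grant the action through IAM. `isAllowed` and `arnMatches` are illustrative names.

```typescript
interface Statement {
  Effect: "Allow" | "Deny";
  Principal: { AWS: string };
  Action: string[];
  Resource: string;
}

// Match an ARN against a pattern where "*" stands for any run of characters.
function arnMatches(pattern: string, value: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(value);
}

function isAllowed(
  statements: Statement[],
  principalArn: string,
  action: string,
  resourceArn: string,
): boolean {
  const applicable = statements.filter(
    (s) =>
      s.Principal.AWS === principalArn &&
      s.Action.includes(action) &&
      arnMatches(s.Resource, resourceArn),
  );
  // An explicit Deny always wins; otherwise at least one Allow is required.
  if (applicable.some((s) => s.Effect === "Deny")) return false;
  return applicable.some((s) => s.Effect === "Allow");
}
```

Note that `arn:aws:s3:::shared-data/*` covers objects only; listing the bucket needs a separate statement on the bucket ARN itself.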

For cross-provider (e.g., AWS Lambda reading from R2), R2 uses access keys; the Lambda environment holds those keys. OIDC isn't yet common for cross-provider object storage, so manage keys via Secrets Management.

Backup Patterns

| Approach | Notes |
| --- | --- |
| Same-bucket versioning | Easy "oops" recovery; doesn't survive bucket deletion |
| Cross-region replication | DR; same provider |
| Cross-account replication | Survives account compromise |
| Cross-provider mirror (e.g., S3 → R2 nightly) | Survives provider outage / account loss |
| Object Lock / Immutable | Compliance; objects can't be deleted before retention period |

Object Lock in compliance mode + cross-account replication is the gold standard for "can't lose this" data: even an attacker with full access to your account can't delete locked versions before the retention period ends.
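
The difference between S3's two Object Lock modes can be sketched as a single check. Governance mode can be bypassed by a principal holding the bypass permission; compliance mode cannot be bypassed by anyone until the retention date passes. `canDeleteVersion` and `Retention` are illustrative names, not SDK types.

```typescript
type LockMode = "GOVERNANCE" | "COMPLIANCE";

interface Retention {
  mode: LockMode;
  retainUntil: Date;
}

// Can this object version be permanently deleted right now?
function canDeleteVersion(
  retention: Retention | undefined,
  now: Date,
  hasBypassPermission: boolean,
): boolean {
  if (retention === undefined) return true;                            // no lock on this version
  if (now.getTime() >= retention.retainUntil.getTime()) return true;   // retention has ended
  // Still under retention: only governance mode honours the bypass permission.
  return retention.mode === "GOVERNANCE" && hasBypassPermission;
}
```

This is why compliance mode is the one to pick for data that must survive a compromised admin account.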

What's Next

You now know the object-storage patterns that come up in real systems. Best Practices covers operations: security, cost, observability, naming, and pitfalls.
