Steven's Knowledge

Validation & Serialization

Validating input at the boundary, schema validation with Zod/Pydantic/JSON Schema, DTOs vs domain models, mass-assignment risks, and parse-don't-validate

Every byte that crosses your service boundary is hostile until proven otherwise. The client may be a browser, a mobile app, a partner's server, or an attacker with curl. Your job at the edge is to turn untrusted, untyped bytes into trusted, typed data — or reject them. Everything inside the boundary should be able to assume the data is already valid.

This page is about that boundary: how to validate, how to shape data on the way in and out, and the traps that turn validation into a security hole.

Validate at the Boundary

There is exactly one place to validate untrusted input: the moment it enters your system, before any business logic touches it. Not in the database layer, not scattered through services — at the edge.

HTTP request body
  → [VALIDATION HAPPENS HERE]   ← untyped → typed, untrusted → trusted
    → handler (works with a validated, typed object)
      → domain logic (never re-checks: the type guarantees validity)
        → persistence

The payoff: business logic stops being defensive. If a function takes a ValidatedOrder, it does not need to check whether quantity is a positive integer: the parser at the edge already proved it, and the type carries that proof inward. Validation done once at the edge replaces validation scattered everywhere.
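A minimal, dependency-free sketch of that flow. The names (parseOrder, orderTotalCents, handleCreateOrder), the field names, and the fixed unit price are assumptions made up for illustration; in a real service a schema library would usually replace the hand-written checks.

```typescript
// Hypothetical names throughout; this only illustrates "validate once at the edge".
type Issue = { path: string; message: string };
type ParseResult<T> = { ok: true; value: T } | { ok: false; issues: Issue[] };

// By convention only parseOrder builds this type, so holding one implies the checks passed.
interface ValidatedOrder {
  readonly sku: string;
  readonly quantity: number;
}

function parseOrder(input: unknown): ParseResult<ValidatedOrder> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, issues: [{ path: "", message: "expected a JSON object" }] };
  }
  const raw = input as Record<string, unknown>;
  const sku = raw["sku"];
  const quantity = raw["quantity"];
  const issues: Issue[] = [];
  if (typeof sku !== "string" || sku.length === 0) {
    issues.push({ path: "sku", message: "must be a non-empty string" });
  }
  if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity <= 0) {
    issues.push({ path: "quantity", message: "must be a positive integer" });
  }
  if (issues.length > 0) return { ok: false, issues };
  // The casts are safe here: both checks above passed.
  return { ok: true, value: { sku: sku as string, quantity: quantity as number } };
}

// Domain logic: accepts the validated type and performs no defensive checks.
function orderTotalCents(order: ValidatedOrder, unitPriceCents: number): number {
  return order.quantity * unitPriceCents;
}

// The edge: the only place raw input is inspected.
function handleCreateOrder(rawBody: unknown): { status: number; body: unknown } {
  const parsed = parseOrder(rawBody);
  if (!parsed.ok) {
    return { status: 400, body: { error: { code: "VALIDATION_ERROR", details: parsed.issues } } };
  }
  // 1250 is a placeholder unit price for the sketch.
  return { status: 201, body: { totalCents: orderTotalCents(parsed.value, 1250) } };
}
```

Every issue is collected before responding, so a client can fix all of its mistakes in one round trip instead of discovering them one at a time.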

Parse, Don't Validate

The deepest idea in this whole topic. A validator answers a yes/no question and throws the answer away:

// Validate: returns a boolean, type is unchanged
function isValidEmail(input: unknown): boolean { ... }

function handle(body: any) {
  if (!isValidEmail(body.email)) throw new Error('bad email');
  // body.email is STILL `any` here — the check told us nothing the type system remembers
  sendEmail(body.email);
}

A parser answers the same question but returns a more precise type that carries the proof forward:

// Parse: returns a typed value or throws; the type now encodes validity
function parseEmail(input: unknown): Email { ... } // Email is a branded string type

function handle(body: { email?: unknown }) { // each field stays `unknown` until parsed
  const email: Email = parseEmail(body.email);
  // From here on, the type IS the guarantee. No re-checking possible or needed.
  sendEmail(email);
}

Parse, don't validate. Push untrusted data through a parser at the boundary that produces a precisely-typed value. After that point, illegal states are unrepresentable: the compiler enforces what an if check only hopes for.
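One way to build the branded Email type and its parser, as a sketch. The brand symbol name, the deliberately loose pattern, the trim/lowercase normalization, and the sendEmail stub are all assumptions for illustration, not a recommendation for real address validation.

```typescript
// A branded type: a plain string at runtime, but not assignable from `string` at compile time.
declare const emailBrand: unique symbol;
type Email = string & { readonly [emailBrand]: true };

// Deliberately loose pattern for the sketch; real-world address rules are stricter.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function parseEmail(input: unknown): Email {
  if (typeof input !== "string") throw new TypeError("email must be a string");
  const normalized = input.trim().toLowerCase();
  if (!EMAIL_PATTERN.test(normalized)) throw new TypeError("email is not well-formed");
  return normalized as Email; // the only place the brand is ever applied
}

// Hypothetical consumer: it can only be called with a value that went through parseEmail.
function sendEmail(to: Email): string {
  return `queued message for ${to}`;
}

const email = parseEmail("  Ada@Example.com ");
sendEmail(email); // compiles
// sendEmail("ada@example.com"); // compile error: string is not assignable to Email
```

The cast inside parseEmail is the single trusted spot. Everywhere else, the only way to obtain an Email is to call the parser.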

This is why, for in-code validation in a typed language, schema libraries that infer types (Zod, Pydantic) beat approaches that only check (a bare if ladder, JSON Schema used purely for validation). The schema and the type stay in sync because one generates the other.

Schema Validation

Hand-written validation rots. Use a schema library that defines the shape once and derives both runtime checks and the static type.

Zod (TypeScript)

import { z } from 'zod';

const CreateUserSchema = z.object({
  email: z.string().email(),
  age: z.number().int().min(13).max(120),
  role: z.enum(['admin', 'member', 'viewer']).default('member'),
  tags: z.array(z.string()).max(10).optional(),
});

// The type is DERIVED from the schema — they cannot drift apart
type CreateUserInput = z.infer<typeof CreateUserSchema>;

function handler(rawBody: unknown) {
  const result = CreateUserSchema.safeParse(rawBody);
  if (!result.success) {
    return respond(400, { error: { code: 'VALIDATION_ERROR', details: result.error.issues } });
  }
  const input: CreateUserInput = result.data; // fully typed, fully validated
  return createUser(input);
}

Pydantic (Python)

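from typing import Literal

from fastapi import FastAPI
from pydantic import BaseModel, EmailStr, Field, field_validator  # EmailStr needs the email-validator package

app = FastAPI()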

class CreateUser(BaseModel):
    email: EmailStr
    age: int = Field(ge=13, le=120)
    role: Literal["admin", "member", "viewer"] = "member"
    tags: list[str] = Field(default_factory=list, max_length=10)

    @field_validator("email")
    @classmethod
    def normalize_email(cls, v: str) -> str:
        return v.lower().strip()

# FastAPI does this automatically: the parameter type IS the validation
@app.post("/users")
def create_user(payload: CreateUser):  # 422 returned automatically on bad input
    return service.create(payload)

Picking a Tool

| Tool | Ecosystem | Type inference | Best when |
|---|---|---|---|
| Zod | TypeScript | Schema → type (z.infer) | TS apps; one source of truth for shape + type |
| Pydantic | Python | Type → schema (annotations) | FastAPI; settings; rich coercion |
| JSON Schema | Language-agnostic | None (it's just data) | Cross-language contracts, OpenAPI, config files |

JSON Schema is the right choice when the contract must be shared across languages (an OpenAPI spec, a config schema validated by multiple services). It is the wrong choice as your primary in-code validator in a typed language — you lose type inference and write the type twice.
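To make the "write the type twice" cost concrete, here is a rough JSON Schema counterpart of CreateUserSchema held as a TypeScript constant, next to the interface you would have to maintain by hand. It is a sketch: the names createUserJsonSchema and CreateUserInput are illustrative, and the two contracts are only approximately equivalent.

```typescript
// JSON Schema (draft 2020-12 keywords). It is plain data: nothing here produces a static type.
const createUserJsonSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "object",
  additionalProperties: false,
  required: ["email", "age"],
  properties: {
    // `format` is an annotation by default; a validator must opt in to enforce it.
    email: { type: "string", format: "email" },
    age: { type: "integer", minimum: 13, maximum: 120 },
    // `default` is also only an annotation; validation alone does not fill it in.
    role: { enum: ["admin", "member", "viewer"], default: "member" },
    tags: { type: "array", items: { type: "string" }, maxItems: 10 },
  },
} as const;

// Second copy of the same shape, kept in sync by hand (or by a code generator).
interface CreateUserInput {
  email: string;
  age: number;
  role?: "admin" | "member" | "viewer";
  tags?: string[];
}

const sampleInput: CreateUserInput = { email: "ada@example.com", age: 30 };
```

Nothing stops the interface from drifting away from the schema, which is the gap that inference-based libraries close.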

DTOs vs Domain Models

A common mistake: letting the HTTP request shape be your domain object. The wire format and the domain model have different jobs and change for different reasons.

| | DTO (Data Transfer Object) | Domain Model |
|---|---|---|
| Purpose | Move data across the boundary | Express business rules and invariants |
| Shape | Flat, matches the API contract | Whatever the logic needs; rich behavior |
| Changes when | The API contract changes | The business rules change |
| Contains | Plain fields, no logic | Methods, invariants, encapsulated state |
| Exposure | Public, versioned | Internal, free to refactor |

// DTO: mirrors the JSON the client sends; validated at the edge
const CreateOrderDTO = z.object({
  customerId: z.string().uuid(),
  lineItems: z.array(z.object({ sku: z.string(), qty: z.number().int().positive() })),
});

// Domain model — the application's vocabulary, with behavior
class Order {
  private constructor(readonly id: OrderId, private items: LineItem[]) {}

  static create(dto: z.infer<typeof CreateOrderDTO>): Order { /* maps + enforces invariants */ }

  applyDiscount(pct: Percentage): void { /* business rule lives HERE, not in the DTO */ }
}

Keeping them separate means you can change your database schema or refactor domain logic without breaking the public API, and vice versa. The mapping layer between them is cheap insurance.
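A sketch of what that mapping layer might look like without any library. The field rename (qty to quantity) and the "one line per SKU" invariant are assumptions invented for the example; your own rules will differ.

```typescript
// DTO: plain data in the wire shape, assumed to be validated at the edge already.
interface CreateOrderDto {
  customerId: string;
  lineItems: { sku: string; qty: number }[];
}

// Domain vocabulary (hypothetical): note the rename from `qty` to `quantity`.
interface LineItem {
  readonly sku: string;
  readonly quantity: number;
}

class Order {
  readonly customerId: string;
  private readonly items: readonly LineItem[];

  private constructor(customerId: string, items: readonly LineItem[]) {
    this.customerId = customerId;
    this.items = items;
  }

  // The mapping layer: translates the wire shape and enforces invariants it cannot express.
  static fromDto(dto: CreateOrderDto): Order {
    if (dto.lineItems.length === 0) {
      throw new Error("an order needs at least one line item");
    }
    // Invariant assumed for illustration: one line per SKU, so duplicates are merged.
    const quantityBySku = new Map<string, number>();
    for (const { sku, qty } of dto.lineItems) {
      quantityBySku.set(sku, (quantityBySku.get(sku) ?? 0) + qty);
    }
    const items = Array.from(quantityBySku, ([sku, quantity]) => ({ sku, quantity }));
    return new Order(dto.customerId, items);
  }

  lineCount(): number {
    return this.items.length;
  }

  totalUnits(): number {
    return this.items.reduce((sum, item) => sum + item.quantity, 0);
  }
}
```

If the API later renames qty or nests lineItems differently, only fromDto changes; the rest of the domain code never sees the wire shape.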

Serialization & Deserialization

Deserialization is bytes → object (request in). Serialization is object → bytes (response out). Both are boundary crossings and both have traps.

On the way out, never serialize the domain model directly. Define an explicit response shape:

// BAD: leaks whatever the domain object happens to hold today
return res.json(user); // ← passwordHash, internal flags, future fields all leak

// GOOD: explicit allowlist of what goes on the wire
function toUserResponse(u: User) {
  return { id: u.id, email: u.email, name: u.name, createdAt: u.createdAt.toISOString() };
}
return res.json(toUserResponse(user));

Serialization rules that prevent leaks and surprises:

  • Allowlist output fields, never blocklist. A blocklist (delete user.passwordHash) silently leaks every new sensitive field someone adds later. An allowlist only ever exposes what you named.
  • Normalize types on the wire. Dates as ISO 8601 strings, money as integer minor units (cents) or a { amount, currency } object — never floats. Decide once, apply everywhere.
  • Be explicit about nulls vs absent. { "name": null } and {} are different. Pick a convention (usually: omit absent, send null only when "explicitly cleared" is meaningful).
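The three rules can be applied together in one response mapper. This is a sketch under assumptions: the Invoice shape, its field names, and the "omit when absent, null when cleared" convention are illustrative choices.

```typescript
// Hypothetical domain object; money is held as integer minor units (cents).
interface Invoice {
  id: string;
  issuedAt: Date;
  totalCents: number;
  currency: string;
  note?: string | null; // undefined = never set, null = explicitly cleared
  internalRiskScore: number; // internal only, must never reach the wire
}

// The explicit wire shape: anything not named here cannot leak.
interface InvoiceResponse {
  id: string;
  issuedAt: string; // ISO 8601
  total: { amount: number; currency: string }; // amount in minor units
  note?: string | null;
}

function toInvoiceResponse(inv: Invoice): InvoiceResponse {
  const out: InvoiceResponse = {
    id: inv.id,
    issuedAt: inv.issuedAt.toISOString(),
    total: { amount: inv.totalCents, currency: inv.currency },
  };
  // Omit the key when absent; send null only when the value was explicitly cleared.
  if (inv.note !== undefined) out.note = inv.note;
  return out;
}
```

Because the mapper names each field, adding a new sensitive property to Invoice changes nothing on the wire until someone deliberately adds it to InvoiceResponse.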

Mass-Assignment: The Classic Hole

The single most dangerous shortcut in input handling is binding the entire request body straight onto a model:

// CATASTROPHIC: whatever the client sends becomes user fields
const user = await User.update(id, req.body);
// Attacker sends: { "name": "Bob", "role": "admin", "isVerified": true, "credits": 999999 }
// → privilege escalation, for free

# Same hole in an ORM (Python)
user = User(**request.json)  # client controls EVERY column, including is_staff

The fix is the same idea as parse-don't-validate: never trust the body's shape; project it through a schema that names exactly the assignable fields.

// The schema is the allowlist. role/isVerified/credits are not in it, so they can't be set.
const UpdatableUser = z.object({ name: z.string(), bio: z.string().max(500).optional() });
const safe = UpdatableUser.parse(req.body);
await User.update(id, safe); // only `name` and `bio` can ever be written

Privileged fields (role, ownerId, accountBalance) get their own explicit, authorized endpoints — they never ride in on a general update.
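The allowlist idea does not depend on a particular library. Below is a dependency-free sketch; pickAllowed and UPDATABLE_USER_FIELDS are hypothetical names. It only filters keys and does not check value types, so treat it as an illustration of the projection step rather than a replacement for a schema (Zod's z.object, for comparison, strips unrecognized keys by default and validates the values too).

```typescript
// Hypothetical helper: copy only the named keys from an untrusted value.
function pickAllowed(input: unknown, allowed: readonly string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  if (typeof input !== "object" || input === null) return out;
  const raw = input as Record<string, unknown>;
  for (const key of allowed) {
    // Own properties only, so inherited keys are ignored.
    if (Object.prototype.hasOwnProperty.call(raw, key)) {
      out[key] = raw[key];
    }
  }
  return out;
}

// The allowlist: the only fields a general update may touch.
const UPDATABLE_USER_FIELDS = ["name", "bio"] as const;

// The attacker's payload from the example above.
const attack = { name: "Bob", role: "admin", isVerified: true, credits: 999999 };
const safe = pickAllowed(attack, UPDATABLE_USER_FIELDS);
// `safe` holds only `name`; role, isVerified, and credits were never copied.
```

The loop iterates over the allowlist rather than over the request body, so a field the server did not name is never even looked at.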

Where Validation Does NOT Belong

  • Not only in the database. A NOT NULL or CHECK constraint is a last-resort safety net, not your primary validation. By the time the DB rejects it, you have wasted a round trip and your error is a cryptic constraint violation, not a helpful 422.
  • Not in the frontend alone. Client-side validation is a UX nicety. It is trivially bypassed with curl. Every client-side rule must be re-enforced on the server.
  • Not re-done in every service method. That is the smell that says you did not parse at the boundary. Parse once, pass typed values inward.

Decision Tree

Data is entering your system from outside?
  → Parse it at the boundary into a precise type (Zod / Pydantic). Reject with a structured 400/422 on failure.

Defining the wire contract?
  → Use a DTO distinct from your domain model. Map between them explicitly.

Binding a request body to a model/ORM?
  → STOP. Project through a schema that allowlists assignable fields. Never spread req.body.

Sending a response?
  → Allowlist output fields explicitly. Never serialize the domain object directly.

Contract shared across languages?
  → JSON Schema / OpenAPI for the contract; still parse into native types per service.

Checklist

Before shipping an endpoint:

  • Untrusted input is parsed into a typed value at the boundary, before business logic.
  • The schema and the static type come from one source (inference), not two hand-written copies.
  • Invalid input returns a structured 4xx (400/422) with field-level details, not a 500.
  • The request body is never spread/bound directly onto a model — assignable fields are allowlisted.
  • Privileged fields have their own authorized endpoints, not a shared update path.
  • Responses are built from an explicit output shape, not the raw domain object.
  • Dates, money, and IDs have one canonical wire representation.
  • Client-side validation is treated as UX only; the server re-enforces every rule.
