Foundation Model Landscape

The frontier model landscape changes month to month, but the structure of it is more stable than the leaderboard suggests. Knowing the major families — and what each is optimized for — is more useful than chasing the latest benchmark.

The Major Frontier Labs

Anthropic — Claude family. Strong reasoning, long context, character and safety work treated as first-class.
OpenAI — GPT family plus reasoning models (o-series). Broadest tooling and ecosystem.
Google DeepMind — Gemini family. Native multimodality, deep integration with Google products.
Meta — Llama family. The dominant open-weights line; the foundation most fine-tuned models build on.
Mistral — efficient open and commercial models, strong European footprint.
xAI — Grok family.

Open vs Closed Weights

Closed weights — accessed only via API. Easier to ship, no infrastructure burden, but you're a tenant of someone else's model.
Open weights — Llama, Mistral, Qwen, DeepSeek. You can host, fine-tune, audit, and run offline. Lower per-token cost at scale; higher operational cost overall.

The right answer depends on your throughput, regulatory posture, and whether the marginal capability gap is worth the operational tax.
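One way to make the throughput part of that tradeoff concrete is a break-even calculation. The sketch below is illustrative only: the function names and every price in it are hypothetical assumptions, not figures from any provider, and it ignores engineering time, which is often the largest part of the operational tax.

```python
from typing import Optional


def monthly_cost_api(tokens_per_month: float, price_per_million: float) -> float:
    """Cost of a closed-weights API: purely proportional to usage."""
    return tokens_per_month / 1_000_000 * price_per_million


def monthly_cost_self_hosted(
    tokens_per_month: float,
    fixed_monthly: float,
    marginal_per_million: float,
) -> float:
    """Cost of hosting open weights: a fixed bill plus a small per-token cost."""
    return fixed_monthly + tokens_per_month / 1_000_000 * marginal_per_million


def break_even_tokens(
    api_price_per_million: float,
    fixed_monthly: float,
    marginal_per_million: float,
) -> Optional[float]:
    """Monthly token volume above which self-hosting becomes cheaper.

    Returns None when self-hosting never wins on cost, i.e. when its
    marginal cost per token is not below the API price.
    """
    saving_per_million = api_price_per_million - marginal_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly / saving_per_million * 1_000_000


# Hypothetical numbers: $10 per million tokens via API, versus
# $20,000 per month of fixed infrastructure plus $2 per million tokens.
threshold = break_even_tokens(10.0, 20_000.0, 2.0)
print(threshold)  # 2500000000.0 tokens per month
```

Below the threshold the API is cheaper; above it, self-hosting wins on cost alone. Regulatory posture and the capability gap still have to be weighed separately.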

Capability Tiers

A rough mental model of the current tiers:

Frontier — the largest, most capable models from each lab. Highest cost per token, best at hard reasoning.
Mid-tier / fast — smaller, faster, cheaper variants of the frontier. The right default for most product features.
Small / on-device — 1B–8B parameter models. Run on a laptop or phone, suitable for narrow tasks and latency-critical paths.

Most production systems route between tiers, sending only the requests that actually need frontier capability to the expensive model.
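Routing of this kind can be sketched as a small rule-based function. Everything here is an assumption made for illustration: the `Request` fields, the thresholds, and the tier names are hypothetical, and real routers often use a classifier or a cheap model to make the decision instead of fixed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A hypothetical request description used only for routing."""
    prompt: str
    needs_multistep_reasoning: bool = False
    latency_budget_ms: int = 5000


def choose_tier(
    req: Request,
    small_max_chars: int = 200,
    tight_latency_ms: int = 300,
) -> str:
    """Return "frontier", "mid", or "small" for a request.

    The rules are deliberately simple:
    1. Hard reasoning always goes to the frontier tier.
    2. Short prompts with a tight latency budget go to the small tier.
    3. Everything else takes the mid-tier default.
    """
    if req.needs_multistep_reasoning:
        return "frontier"
    if req.latency_budget_ms <= tight_latency_ms and len(req.prompt) <= small_max_chars:
        return "small"
    return "mid"


print(choose_tier(Request("Summarize this support ticket.")))
print(choose_tier(Request("Plan the migration.", needs_multistep_reasoning=True)))
print(choose_tier(Request("Is this spam?", latency_budget_ms=100)))
```

The three calls print `mid`, `frontier`, and `small` respectively. Keeping the mid tier as the fall-through default mirrors the advice above: the expensive model is the exception, not the baseline.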

Specialized Models

Reasoning models — trained to think through problems with explicit intermediate steps. Better on math, coding, and multi-step planning.
Code models — tuned specifically for code generation and editing.
Multimodal models — accept and produce mixes of text, images, audio, and video.
Embedding models — produce vectors, not text. Different family entirely.
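To illustrate why embedding models sit apart from the rest, the sketch below shows what is typically done with their output: comparing vectors rather than reading text. The vectors are hand-written toy values, not the output of any real model, and `rank` is a hypothetical helper name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query, docs[name]),
        reverse=True,
    )


# Toy 2-dimensional "embeddings"; real models emit hundreds or thousands of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank([1.0, 0.1], docs))  # ['a', 'c', 'b']
```

In a real system the vectors would come from an embedding model and the ranking would usually be delegated to a vector index, but the comparison step has this shape.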

What to Watch

The interesting axes today aren't raw quality. They are context length, cost per token, latency, tool-use reliability, and how well the model handles long-horizon agentic work. Those, not the leaderboard score, determine what you can actually build.
