Code Generation & Assistants

Code assistance is the AI use case where engineers feel the impact most directly: completion in the editor, agents that can navigate and edit a repo, reviewers that catch real bugs. The space has matured fast, and the patterns are worth studying even if you're not building a code assistant yourself.

The Spectrum of Code Assistance

From least to most autonomous:

Inline completion: suggests the next few lines as you type (Copilot, Cursor's tab).
Chat in editor: answers questions and produces snippets for a specific file or selection.
Multi-file edits: proposes changes across files, applied after a diff review (Cursor composer, Claude Code in plan mode).
Agentic coding: the model navigates the repo, runs tests, iterates on failures, and lands a change largely on its own (Claude Code, Devin, Cline).

Each level requires more model capability and more guardrails than the one before it.

What Makes a Good Code Assistant

Codebase awareness. The model sees relevant context — the function being called, the type definition, the caller — not just the cursor area.
Tool use. Search, file reading, test execution, type checking, linting. A model that runs the tests is far more useful than one that only guesses whether its code works.
Tight feedback loops. Type errors, test failures, and lint output go straight back into the model's context.
Diff-first interaction. Show the user a diff, not a wall of code. Diffs are how engineers think about change.
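
As a rough illustration of diff-first interaction, the sketch below turns a proposed edit into a unified diff using Python's standard difflib module. The helper name render_diff, the file name greet.py, and the sample contents are made up for the example; a real assistant would diff the file on disk against the model's proposed version.

```python
import difflib


def render_diff(path: str, before: str, after: str) -> str:
    """Return a unified diff between two versions of a file's text."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical "before" and "after" contents of one file.
before = "def greet(name):\n    print('hello ' + name)\n"
after = "def greet(name: str) -> None:\n    print(f'hello {name}')\n"

patch = render_diff("greet.py", before, after)
print(patch)
```

The user reviews only the lines that changed, prefixed with - and +, instead of rereading the whole file.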

Retrieval for Code

Code retrieval has its own quirks:

Symbol-aware indexing. Treat function and class definitions as first-class units, not arbitrary chunks.
Hybrid retrieval. Embeddings for semantic queries ("auth flow"), exact match for identifiers (getCurrentUser).
Graph awareness. Following imports and call sites often beats pure similarity search.
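
The first two ideas can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production retriever: it indexes one unit per function or class using the standard ast module, and it scores a query by combining an exact identifier match with a word-overlap score that merely stands in for embedding similarity. The names Symbol, index_symbols, hybrid_score, search, and the sample module are all hypothetical. Graph awareness is not covered here.

```python
import ast
import re
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str  # "function" or "class"
    lineno: int
    source: str


def index_symbols(source: str) -> list[Symbol]:
    """Split a module into one retrieval unit per function or class definition."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        text = ast.get_source_segment(source, node) or ""
        symbols.append(Symbol(node.name, kind, node.lineno, text))
    return symbols


def words(text: str) -> set[str]:
    """Lowercased word set, splitting snake_case and camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text).replace("_", " ")
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def hybrid_score(query: str, symbol: Symbol) -> float:
    # Exact-match component: the query contains the identifier verbatim.
    exact = 1.0 if symbol.name in query.split() else 0.0
    # Placeholder for embedding similarity (an assumption of this sketch):
    # the fraction of query words that appear in the symbol's source.
    q = words(query)
    overlap = len(q & words(symbol.source)) / len(q) if q else 0.0
    return exact + overlap


def search(query: str, symbols: list[Symbol]) -> list[Symbol]:
    """Return symbols ordered from best to worst match."""
    return sorted(symbols, key=lambda s: hybrid_score(query, s), reverse=True)


# Hypothetical module to index.
MODULE = '''
def get_current_user(session):
    """Return the user attached to an auth session."""
    return session.user


class TokenStore:
    def refresh(self, token):
        """Refresh an expired auth token."""
        return token
'''

symbols = index_symbols(MODULE)
print([s.name for s in symbols])
print(search("get_current_user", symbols)[0].name)
print(search("refresh expired token", symbols)[0].name)
```

In a real system the overlap term would be replaced by a vector similarity from an embedding model, while the exact-match term keeps identifier lookups precise.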

Verification Closes the Loop

Code is one of the few domains where verification is cheap: tests, type checkers, linters, builds. The most reliable agents lean on this hard:

Make a change.
Run the relevant verifier.
Read the failure and try again.

This is why coding agents have advanced faster than agents in many other domains — the environment provides ground truth.
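
The three steps above can be sketched as a small loop. This is an illustration under assumptions, not any particular agent's implementation: propose is a hypothetical stand-in for a model call that receives the previous failure text, and the verifier simply runs the candidate code plus an assertion in a fresh Python interpreter. The names run_verifier, fix_until_green, and fake_model are invented for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_verifier(code: str, test: str) -> Optional[str]:
    """Run candidate code followed by its test in a fresh interpreter.

    Returns None on success, otherwise the captured error output.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test + "\n")
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    return None if result.returncode == 0 else result.stderr


def fix_until_green(
    propose: Callable[[Optional[str]], str],
    test: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Propose, verify, and retry until the test passes or attempts run out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = propose(feedback)             # 1. make a change
        feedback = run_verifier(code, test)  # 2. run the relevant verifier
        if feedback is None:
            return code                      # verifier passed
        # 3. the failure text is passed to the next proposal
    return None


def fake_model(feedback: Optional[str]) -> str:
    """Hypothetical model stub: wrong on the first try, fixed after seeing a failure."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


test = "assert add(2, 3) == 5, 'add(2, 3) should be 5'"
result = fix_until_green(fake_model, test)
print(result)
```

The attempt cap matters in practice: without it, an agent that cannot satisfy the verifier would loop indefinitely.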

Common Failure Modes

Hallucinated APIs. Functions, methods, or imports that don't exist. Mitigate with retrieval and verification.
Style drift. Generated code that doesn't match the project's conventions.
Surface-level fixes. Patching a symptom rather than the root cause.
Spec drift. In long sessions, the model loses track of what the user actually asked for.
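
As one narrow example of verification against hallucinated APIs, the sketch below checks whether the imports in a piece of generated Python actually resolve in the current environment. It is a minimal sketch with assumed names (module_exists, missing_imports, and the sample snippet are hypothetical), it only covers absolute imports, and it does not catch hallucinated methods or wrong call signatures; a type checker or a test run is still needed for those. Note that resolving a from-import here imports the module, which executes its top-level code.

```python
import ast
import importlib
import importlib.util


def module_exists(name: str) -> bool:
    """True if `name` can be located as a module or package."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or is not a package.
        return False


def missing_imports(code: str) -> list[str]:
    """List imported modules and names in `code` that cannot be resolved."""
    problems = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if not module_exists(alias.name):
                    problems.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if not module_exists(node.module):
                problems.append(node.module)
                continue
            module = importlib.import_module(node.module)
            for alias in node.names:
                if alias.name == "*" or hasattr(module, alias.name):
                    continue
                # The name may be a submodule that has not been loaded yet.
                if not module_exists(f"{node.module}.{alias.name}"):
                    problems.append(f"{node.module}.{alias.name}")
    return problems


# Hypothetical model output: one real name, one invented name, one invented package.
generated = (
    "import json\n"
    "from json import loads, fast_loads\n"
    "import totally_made_up_pkg\n"
)
print(missing_imports(generated))
```

A non-empty result can be fed back to the model as an error message before the code is ever shown to the user.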

Where the Frontier Is

Today's frontier is long-horizon work: agents that take a vague ticket, propose a plan, implement across many files over hours, run the tests, and open a reviewable PR. The bottlenecks are evaluation, planning over long contexts, and human trust in autonomous changes — not raw code generation quality.