Documentation Writing
How to write READMEs, runbooks, ADRs, and onboarding docs in English (用英文写 README、runbook、ADR、入职文档)
Why Documentation Is Senior+ Work (为什么文档是资深以上的工作)
Code is read more than written. Documentation is read far more than written — by future you, new hires, on-call engineers at 3am, and people in adjacent teams. Engineers who write good docs get disproportionate credit because their impact compounds.
代码读多于写。文档读远多于写——未来的你、新人、凌晨 3 点的 on-call、相邻团队。写好文档的工程师获得不成比例的认可,因为影响是复利。
The principle 原则: Write for the reader you'll never meet. They have 30 seconds, no context, and a question. Answer it.
为你永远见不到的读者写。他们有 30 秒、没背景、有问题。回答它。
This page covers operational documentation (READMEs, runbooks, ADRs). For proposal documentation (design docs, RFCs, PRDs), see the Design Docs page.
本页讲运维型文档。提案型文档见 Design Docs 页面。
README (项目自述)
A README is the front door to your project. It tells someone landing on it: what is this, can I use it, how.
README 是项目的前门。告诉新来者:这是什么、能不能用、怎么用。
Standard structure (标准结构)
# Project Name
One-sentence description: what it does, who it's for.
## Status
- Active / Maintained / Deprecated
- Owner: @team-name or @person
## Quickstart
[How to run / use, in under 5 commands]
## What it does
[1-2 paragraphs of what problem it solves]
## How it works
[High-level architecture, link to deeper docs]
## Development
[How to set up, test, contribute]
## Deployment
[How it gets to production, link to runbook]
## Links
- Architecture doc
- Runbook
- Slack channel
- On-call rotationWhat to lead with (最先写什么)
The first sentence should answer: "Should I keep reading?"
第一句要回答:"我要不要继续读?"
❌ "This repository contains the implementation of..." — 没用。 ✅ "Auth-service handles login, sessions, and OAuth for all consumer products." — 一秒知道是不是要找的。
Quickstart matters most (Quickstart 最重要)
Most readers want to run something within 5 minutes. Make it possible.
大多数读者想 5 分钟内跑起来。让它可能。
## Quickstart
```bash
git clone [repo]
cd [repo]
make setup
make run
```
Now visit http://localhost:8080. Default credentials: dev/dev.Phrases (短语)
- "For [audience], if you want to [goal]..." — 给 [受众],如果你想 [目标]……
- "This is the right place if [...]. If [other goal], try [other place]." — 你来对地方了,如果……如果 [别的],去 [别处]。
- "See [link] for [topic]." — 见 [链接] 了解 [话题]。
- "Owned by [team]. Slack: #channel." — [团队] 拥有。Slack:#channel。
Runbook (操作手册)
A document for the on-call engineer at 3am. They are tired, stressed, and have never seen this system before. Optimize for them.
给凌晨 3 点的 on-call 工程师。他们疲倦、紧张、从没见过这个系统。为他们优化。
Structure (结构)
# [Service] Runbook
## Quick links
- Dashboard: [link]
- Alerts: [link]
- Repository: [link]
- Owner team: @platform
## Common issues
### Issue: HTTP 500 spike
**Symptom**: error rate > 1% on /api/users
**First steps**:
1. Check dashboard: [link]
2. Check recent deploys: [link]
3. Look at error logs: [link]
**If [pattern X]**:
- Likely cause: [Y]
- Fix: [specific commands]
**If [pattern Z]**:
- Escalate to [team]
**Rollback**:
```bash
[exact rollback command]
```
### Issue: latency p99 > 500ms
[...]
## Escalation
- Team channel: #platform-oncall
- Manager on call: see [PagerDuty link]
- After-hours emergency: [number]Runbook writing principles (runbook 写作原则)
- Imperative voice: "Run X", not "X should be run."
- 命令式:"运行 X",不是"X 应该被运行"。
- Copy-pasteable commands: full, not partial.
- 可复制粘贴的命令:完整,不是片段。
- Decision trees: "if you see X, do Y. If Z, do W."
- 决策树:看到 X 就做 Y。看到 Z 就做 W。
- Link to the dashboard, not describe it. Words age; URLs don't.
- **链接到 dashboard,不要描述。**文字会老化;URL 不会。
- Test the runbook: walk through it cold to see what's missing.
- 测试 runbook:冷启动走一遍看缺什么。
Runbook anti-patterns (反模式)
❌ "Refer to the codebase to understand the issue." — 凌晨 3 点不可能。 ❌ "Contact Bob if there's a problem." — Bob 不是 24/7。 ❌ "It should be obvious how to..." — 不会的。 ❌ Long prose paragraphs explaining context. — 没人在故障中读。
ADR (Architecture Decision Record) (架构决策记录)
A short doc capturing why a decision was made, written at the time of decision.
简短文档,记录决策为什么这样做,在做决策时写。
Structure (结构)
# ADR 0042: Use Redis for rate limiting
Status: Accepted
Date: 2026-05-06
Authors: @alice
Deciders: @platform-team
## Context
We needed cross-instance rate limiting for the public API. The existing
in-process limiter is per-instance, so users could exceed limits by
hitting different replicas.
## Decision
Use Redis with the sliding-window algorithm.
## Consequences
Positive:
- Cross-instance enforcement
- 10x faster than Postgres for this access pattern
- Already deployed for sessions, no new infra
Negative:
- New runtime dependency on Redis for /api/* paths
- Redis becomes critical-path; need to handle Redis outages
## Alternatives considered
### In-process + gossip protocol
Rejected: complex, debug nightmare, marginal benefit.
### Postgres
Rejected: ~5ms added per request; doesn't scale to expected RPS.
### Vendor (Cloudflare rate limiting)
Rejected: $X/year; doesn't cover internal services.ADR principles (ADR 原则)
- Short. 1 page max. ADRs longer than 1 page become design docs.
- **简短。**最多 1 页。超过 1 页就是设计文档。
- Immutable. Don't edit decided ADRs; add a new one that supersedes.
- **不可变。**不要编辑已决定的 ADR;新写一个取代旧的。
- Numbered. ADR 0001, 0002, ... — easy to reference.
- **编号。**ADR 0001、0002……便于引用。
- Capture the WHY. The diff shows WHAT changed. ADR captures why.
- 记录原因。diff 显示什么变了。ADR 记录为什么。
Status values (状态值)
- Proposed: in discussion
- Accepted: decided, in effect
- Rejected: considered and not adopted
- Superseded by ADR-N: replaced by a newer decision
- Deprecated: no longer applies but historic context
Onboarding Docs (入职文档)
The doc you give a new hire on day 1.
新人第一天给的文档。
Structure (结构)
# Onboarding to [team]
Welcome! Here's how to get productive in your first 2 weeks.
## Day 1
- [ ] Get laptop set up (IT)
- [ ] Get added to Slack channels: #team, #team-eng, #team-help
- [ ] Pair with @buddy for environment setup (book on calendar)
- [ ] Read [team mission doc]
## Week 1
- [ ] Read top 3 design docs:
1. [link]
2. [link]
3. [link]
- [ ] Ship "Hello World" PR (any small fix; goal: learn deploy flow)
- [ ] Have 1:1s with each direct teammate
- [ ] Read runbook for [primary service]
## Week 2
- [ ] Take first ticket (small bug or feature)
- [ ] Shadow on-call for one shift
- [ ] Review someone else's PR
## Useful links
- Architecture overview: [link]
- Runbooks: [link]
- Glossary of internal terms: [link]
- Team norms doc: [link]
## People to meet
| Person | Role | Why |
|--------|------|-----|
| Alice | Tech lead | Owner of auth service |
| Bob | PM | Drives roadmap |
| Carol | Adjacent team lead | API team — we depend on them |
## Stuck?
- Ping @buddy in Slack
- Bring questions to weekly team Q&A (Tuesdays 3pm)
- Ask in #team-help — no question is too dumbOnboarding doc principles (入职文档原则)
- Time-boxed checklists beat long prose.
- 限时清单胜过长篇文字。
- Specific links, not "search the wiki."
- 具体链接,不是"在 wiki 里搜"。
- Refresh quarterly. Old links rot fast.
- **每季度刷新。**旧链接很快失效。
- Get new hires to update it. They notice gaps you miss.
- **让新人更新它。**他们看到你错过的空隙。
Glossary / Internal Terms (词汇表)
Tech orgs accumulate jargon. A glossary saves new hires hours.
科技组织积累行话。词汇表能给新人省几小时。
# Internal Glossary
**ARC** — Auth Request Coordinator. Internal service that brokers auth checks.
**P0** — Severity 0; site down; wake-up incident.
**P1** — Severity 1; significant degradation.
**Twinkle** — Internal codename for the new identity migration.
**The Move** — Q2 2024 migration from AWS to GCP. Often referenced.Writing Patterns (写作模式)
Lead with the answer (答案先行)
Same principle as design docs. Documentation readers skim — give them the answer in the first sentence.
和设计文档原则一样。文档读者扫读——第一句就给答案。
Use action-oriented headers (用行动导向的标题)
❌ "Performance Considerations" — 模糊。 ✅ "How to debug latency spikes" — 知道点开能干什么。
❌ "Authentication Architecture" — 静态。 ✅ "How auth works (and how to debug it)" — 主动。
Code blocks for commands (命令用代码块)
Anything someone might copy-paste belongs in a code block. Always.
任何可能被复制的东西放代码块。永远。
Step-by-step beats prose (分步胜过散文)
For procedures, numbered steps. For concepts, prose. Don't mix.
操作用编号步骤。概念用散文。不要混。
Use callouts for warnings (警告用 callout)
⚠️ Warning: Do not delete the cache table. It's used by the migration runner. Removing it causes data loss.
💡 Tip: For local development, use
make run-fastto skip the integrity check (saves 10s).
Date your docs (给文档加日期)
A doc with no date is a doc you can't trust. Add last-updated at top.
没日期的文档不可信。顶部加最后更新。
> Last updated: 2026-05-06 by @alice
> Owner: @platform-teamCommon Doc Types Cheat Sheet (常见文档类型一览)
| Doc | Purpose | Length | Audience |
|---|---|---|---|
| README | Front door | 1 page | Anyone landing on the repo |
| Runbook | Fix it at 3am | 1-3 pages | On-call engineer |
| ADR | Why a decision | 1 page | Future engineers |
| Design doc | Propose a change | 3-10 pages | Reviewers + future you |
| PRD | What to build | 3-10 pages | Eng, design, leadership |
| Postmortem | What went wrong | 2-5 pages | Org-wide learning |
| Onboarding | Ramp up a new hire | 1-3 pages | New hires |
| Glossary | Define jargon | 1 page (live) | Everyone |
| API doc | How to call X | varies | API consumers |
Maintaining Documentation (维护文档)
Stale docs are worse than no docs (过时文档比没文档更糟)
A wrong doc misleads. Either keep it current or delete it.
错的文档误导。要么保持最新,要么删除。
Cadence (节奏)
- README: every release / major change.
- Runbook: after every incident that exposed a gap.
- Design doc: don't update — write a new one if things change.
- Onboarding: every quarter, with new hire feedback.
- Glossary: as new terms appear.
Phrases to use in docs (文档用语)
- "Last updated: [date]"
- "Owner: @team"
- "Status: [active / deprecated / TODO]"
- "⚠️ This is out of date as of [date] — see [link] for current state" (deprecation marker)
- "TODO: [thing]" — explicit gap markers
Calling out stale docs (标记过时)
- "This doc references [old system]. Caveat for the reader." — 这文档提到 [旧系统]。读者注意。
- "Sections [X, Y] are outdated — flag if you want them updated." — [X, Y] 段过时——想更新告诉我。
Reviewing Others' Docs (评审别人的文档)
Treat doc reviews like code reviews. The bar is "would a stranger be able to use this?"
像代码评审一样评审文档。标准是"陌生人能用这个吗?"
Useful comments (有用的评论)
- "Quick: what is X? Define it on first use." — 快速:X 是什么?首次使用时定义。
- "This step assumes [Y] — worth saying." — 这步假设了 [Y]——值得说出来。
- "Could you add a code example here?" — 这里能加代码例子吗?
- "Link to [related doc] for context." — 链接到 [相关文档] 给上下文。
- "Date / owner missing." — 缺日期 / 负责人。
Phrases to Avoid (要避免的表达)
| ❌ Avoid (避免) | Why (原因) | ✅ Better (更好) |
|---|---|---|
| "Just" | "Just run X" implies it's obvious; often it isn't | "Run X." |
| "Simply" | Same problem | (delete) |
| "Obviously" | Reader feels dumb | (delete) |
| "As mentioned above" | Reader has to scroll | Repeat the key point. |
| "For more information, see..." with no link | Useless | Always include the link. |
| "TBD" without owner / date | Becomes permanent | "TBD by @alice, target [date]." |
| Long paragraphs | Skipped | Break into bullets / steps. |
Cultural Notes (文化提示)
Documentation culture varies by company (文档文化因公司而异)
- Amazon, Stripe: heavy doc culture, written narratives are central.
- Google, Meta: design docs are mandatory; runbooks variable.
- Startups: often under-documented; documenting is a chance to stand out.
Calibrate to your culture, but err on the side of more documentation.
按文化调整,但宁可多写。
Async-first means doc-first (异步优先 = 文档优先)
If your team is distributed across time zones, well-written docs replace meetings. The ROI is enormous.
如果团队跨时区,写得好的文档替代会议。ROI 巨大。
"Docs are dead" is wrong (说"文档已死"是错的)
Some engineers claim "code should be self-documenting." Code documents what; docs document why and how. They're not interchangeable.
有工程师说"代码应该自解释"。代码记录做什么;文档记录为什么和怎么用。不可互换。
Writing docs is reviewable work (写文档是可评估的工作)
In review/promotion conversations, list docs you wrote. They count. Brag about them like you brag about shipped features.
在评估/晋升对话里,列出你写的文档。算工作。像吹发布的功能一样吹文档。
Update as you discover gaps (发现空隙就更新)
The senior+ habit: when a runbook is missing the answer you needed, update it before closing the ticket. Compounding effort.
资深以上的习惯:当 runbook 缺你需要的答案,关 ticket 前更新它。复利效应。
Tips (小贴士)
- Lead with the answer — Reader has 30 seconds. 答案先行。
- Date and own everything — Owner-less, dateless docs rot. 加日期和负责人。
- Test runbooks cold — Walk through with no context to find gaps. 冷测试 runbook。
- Update as you go — Capture knowledge while fresh. 边做边更新。
- Action-oriented headers — "How to do X" beats "X considerations." 行动导向的标题。
- Delete more than you keep — Stale docs lie. 多删,少留。