Coaching & Simulation
Using LLMs to roleplay scenarios, give feedback, and let people practice skills safely
Scenario Abstraction
A person needs to practice a skill that is hard to practice in real life — a difficult conversation, a sales objection, a clinical interview, an emergency procedure, a foreign-language exchange — or needs feedback on something they did. Traditional alternatives are expensive (1:1 human coach) or low-fidelity (slides, quizzes). LLMs can play the counterparty, observe the learner, and produce structured feedback against a rubric.
This is distinct from a chatbot in a few ways: there is a target skill with a defined rubric; the conversation is intentionally adversarial or scenario-bound, not helpful by default; and the value is the feedback, not the dialog itself.
Solution Shape
- Scenario design — write the situation: who the AI is playing, their motivation, what an excellent learner response looks like, what failure modes to probe.
- Persona prompt — instruct the model to stay in role; not break character; raise specific objections; mirror real difficulty.
- Practice session — learner interacts (text or voice). The system maintains the persona and the scenario state.
- Rubric-based scoring — at the end, a separate evaluator pass scores the learner against named competencies with examples from the transcript.
- Targeted feedback — strengths, growth areas, what to try next time, with quoted moments.
- Replay / variations — let the learner retry with the same persona, harder version, or different scenario.
- Manager dashboard — across many sessions, identify common weaknesses and skill trends.
Voice agents add complexity here (turn-taking, latency, interruption handling) but match how the real-world skill is practiced.
Key Building Blocks
- Scenario library — versioned, peer-reviewed, mapped to competencies.
- Roleplay model with system prompt designed to resist jailbreaking the persona.
- Evaluator model with a calibrated rubric.
- Conversation infrastructure — text or voice; real-time TTS / ASR if voice.
- Learner dashboard with progress over time and recommended next sessions.
- Subject-matter-expert review loop — SME evaluations on a sample to calibrate the evaluator model.
Concrete Cases
- Sales objection practice. AI plays a tough prospect with a specific objection; learner practices; manager sees scores on discovery, qualification, value articulation.
- Difficult conversations training (performance reviews, layoffs). AI plays the employee; learner practices delivering hard feedback; rubric covers empathy, clarity, legal-safe phrasing.
- Clinical communication. AI plays a patient with given symptoms and concerns; learner practices history taking, breaking bad news, motivational interviewing.
- Language learning conversation partner. AI converses at the learner's level, gives gentle corrections, expands vocabulary.
- Customer service training. AI plays angry / confused / fraudster customer; trainee practices handling per script and policy.
- Job-interview practice. AI as interviewer for a target role; STAR-method feedback.
- Negotiation training. Multi-round negotiation with constraints; AI plays counterparty with hidden preferences.
- Incident-response tabletop exercises. AI runs the scenario, plays multiple stakeholders, evaluates the playbook adherence.
- Public-speaking / pitch coaching. Learner records a pitch; LLM (+ ASR + maybe vision) gives feedback on clarity, structure, filler words, pacing.
- Crisis communication drill. Mock journalist asks hostile questions; learner practices on-message responses.
Similar Scenarios
- Internal training assistants — knowledge-assistant shape combined with roleplay for skill drills.
- Onboarding simulations — first-week practice of common situations.
- Compliance training — scenario-based, not slide-based.
- Therapy adjuncts (carefully scoped) — practice between sessions with explicit limits and crisis-handover plans.
Pitfalls & Evaluation
- Persona collapse. The model wants to be helpful; it answers the learner instead of staying in role. System prompt must firmly enforce role.
- Rubric drift. Without SME calibration, the evaluator's scoring drifts away from what experts would say. Periodically realign.
- Negative-experience risk. Scenarios involving distress (medical, layoffs, harassment) need content guards and explicit boundaries; the AI should not improvise into harmful territory.
- Feedback that is too generic to act on. "Be more empathetic" is useless. Force the evaluator to quote specific moments and propose a specific alternative.
- Voice-mode UX traps. Latency, false interruptions, ASR errors at emotional pitches. Test extensively before shipping.
Useful metrics: pre / post-training skill scores on a held-out task, SME agreement with the evaluator model, learner self-rated usefulness, retention (do learners come back?), real-world performance change (the only metric that ultimately matters).