Coaching & Simulation

Using LLMs to roleplay scenarios, give feedback, and let people practice skills safely

Scenario Abstraction

A person needs to practice a skill that is hard to practice in real life — a difficult conversation, a sales objection, a clinical interview, an emergency procedure, a foreign-language exchange — or needs feedback on something they did. Traditional alternatives are expensive (1:1 human coach) or low-fidelity (slides, quizzes). LLMs can play the counterparty, observe the learner, and produce structured feedback against a rubric.

This is distinct from a chatbot in a few ways: there is a target skill with a defined rubric; the conversation is intentionally adversarial or scenario-bound, not helpful by default; and the value is the feedback, not the dialog itself.

Solution Shape

Scenario design — write the situation: who the AI is playing, their motivation, what an excellent learner response looks like, what failure modes to probe.
Persona prompt — instruct the model to stay in role; not break character; raise specific objections; mirror real difficulty.
Practice session — learner interacts (text or voice). The system maintains the persona and the scenario state.
Rubric-based scoring — at the end, a separate evaluator pass scores the learner against named competencies with examples from the transcript.
Targeted feedback — strengths, growth areas, what to try next time, with quoted moments.
Replay / variations — let the learner retry with the same persona, harder version, or different scenario.
Manager dashboard — across many sessions, identify common weaknesses and skill trends.
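One way to keep the scenario, the persona pass, and the evaluator pass separate is to hold the scenario as data and build both prompts from it. The sketch below is only an illustration. The `Scenario` fields and the helpers `build_persona_prompt` and `build_evaluator_prompt` are hypothetical names, not a standard schema or library API. The prompt wording is an assumption, and the 1-5 scale is one arbitrary choice of rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario spec; the field names are illustrative, not a standard schema."""
    scenario_id: str
    persona: str                     # who the AI is playing
    motivation: str                  # what the persona wants out of the conversation
    objections: tuple[str, ...]      # specific pushback the persona must raise
    excellent_response: str          # what a strong learner does (evaluator only)
    competencies: tuple[str, ...]    # rubric dimensions scored after the session


def build_persona_prompt(s: Scenario) -> str:
    """System prompt for the roleplay pass.

    The rubric and the excellent response are deliberately left out, so the
    persona has nothing to leak if the learner asks for the answer.
    """
    objection_lines = "\n".join(f"- {o}" for o in s.objections)
    return (
        f"You are playing {s.persona}. Your motivation: {s.motivation}\n"
        "Stay in character for the whole conversation. Do not coach the learner, "
        "do not suggest what they should say, and do not say you are an AI.\n"
        "Raise these objections naturally and concede only when the learner "
        "genuinely addresses them:\n"
        f"{objection_lines}"
    )


def build_evaluator_prompt(s: Scenario, transcript: list[tuple[str, str]]) -> str:
    """Prompt for the separate scoring pass that runs after the session ends."""
    rendered = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
    rubric = "\n".join(f"- {c}: score 1-5" for c in s.competencies)
    return (
        "You are evaluating a learner's practice session.\n"
        f"What an excellent learner does: {s.excellent_response}\n"
        f"Score each competency:\n{rubric}\n"
        "For every score, quote the exact transcript line that justifies it "
        "and propose one specific alternative.\n\n"
        f"Transcript:\n{rendered}"
    )
```

The design choice that matters most here is the split itself. The persona prompt never sees what "good" looks like, and the evaluator only sees the finished transcript. As a result, neither pass can quietly help the other.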

Voice agents add complexity here (turn-taking, latency, interruption handling) but match how the real-world skill is practiced.
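Interruption handling is one of the places where voice gets fiddly. The persona should yield when the learner genuinely talks over it, but not on a cough or an "mm-hm". The sketch below shows one common approach: debounce voice activity while the AI is speaking. It assumes a voice-activity detector (VAD) emits one boolean per fixed-length audio frame. The class name `BargeInDetector` is hypothetical, and the 20 ms frame and 300 ms threshold are illustrative values, not tuned ones.

```python
from dataclasses import dataclass, field


@dataclass
class BargeInDetector:
    """Decides when learner speech should interrupt the persona's TTS playback.

    Assumes a VAD emits one boolean per audio frame. Short bursts of speech
    are ignored to avoid false interruptions; thresholds are illustrative.
    """
    frame_ms: int = 20          # duration of one VAD frame
    min_speech_ms: int = 300    # sustained speech required to count as barge-in
    speech_ms: int = field(default=0, init=False)  # length of current speech burst

    def update(self, agent_speaking: bool, user_voice: bool) -> bool:
        """Feed one frame; return True when TTS playback should be stopped."""
        if not agent_speaking:
            self.speech_ms = 0  # normal turn-taking, nothing to interrupt
            return False
        if user_voice:
            self.speech_ms += self.frame_ms
        else:
            self.speech_ms = 0  # burst ended before reaching the threshold
        return self.speech_ms >= self.min_speech_ms
```

The threshold is a trade-off. Raising it cuts false interruptions but makes the persona feel slow to yield, which is itself unrealistic in practice scenarios like hostile interviews.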

Key Building Blocks

Scenario library — versioned, peer-reviewed, mapped to competencies.
Roleplay model with a system prompt designed to resist attempts to break the persona.
Evaluator model with a calibrated rubric.
Conversation infrastructure — text or voice; real-time TTS / ASR if voice.
Learner dashboard with progress over time and recommended next sessions.
Subject-matter-expert review loop — SME evaluations on a sample to calibrate the evaluator model.
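A scenario library that is versioned and mapped to competencies makes "recommended next sessions" a simple lookup. The sketch below keeps the newest version of each scenario. It then suggests the easiest uncompleted scenario that targets the learner's weakest competency, measured by mean score. `LibraryEntry`, `latest_versions`, and `recommend_next` are hypothetical names. The "weakest competency, then easiest scenario" policy is one simple assumption among many possible ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LibraryEntry:
    """One versioned scenario in a hypothetical library, tagged with competencies."""
    scenario_id: str
    version: int
    competencies: frozenset[str]
    difficulty: int  # 1 = easiest


def latest_versions(library: list[LibraryEntry]) -> dict[str, LibraryEntry]:
    """Keep only the newest version of each scenario."""
    latest: dict[str, LibraryEntry] = {}
    for entry in library:
        current = latest.get(entry.scenario_id)
        if current is None or entry.version > current.version:
            latest[entry.scenario_id] = entry
    return latest


def recommend_next(library: list[LibraryEntry],
                   scores: dict[str, list[float]],
                   completed: set[str]) -> Optional[LibraryEntry]:
    """Easiest uncompleted scenario that targets the weakest competency.

    Falls through to the next-weakest competency if no scenario targets
    the weakest one; returns None when nothing is left to recommend.
    """
    means = {c: sum(v) / len(v) for c, v in scores.items() if v}
    current = latest_versions(library).values()
    for weakest in sorted(means, key=means.get):
        candidates = [e for e in current
                      if weakest in e.competencies and e.scenario_id not in completed]
        if candidates:
            return min(candidates, key=lambda e: e.difficulty)
    return None
```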

Concrete Cases

Sales objection practice. AI plays a tough prospect with a specific objection; learner practices; manager sees scores on discovery, qualification, and value articulation.
Difficult-conversation training (performance reviews, layoffs). AI plays the employee; learner practices delivering hard feedback; rubric covers empathy, clarity, and legally safe phrasing.
Clinical communication. AI plays a patient with given symptoms and concerns; learner practices history taking, breaking bad news, motivational interviewing.
Language learning conversation partner. AI converses at the learner's level, gives gentle corrections, expands vocabulary.
Customer service training. AI plays an angry or confused customer, or a fraudster posing as one; trainee practices handling them according to script and policy.
Job-interview practice. AI acts as interviewer for a target role; feedback checks answers against the STAR method (Situation, Task, Action, Result).
Negotiation training. Multi-round negotiation with constraints; AI plays counterparty with hidden preferences.
Incident-response tabletop exercises. AI runs the scenario, plays multiple stakeholders, and evaluates adherence to the playbook.
Public-speaking / pitch coaching. Learner records a pitch; an LLM (with ASR, and optionally vision) gives feedback on clarity, structure, filler words, and pacing.
Crisis communication drill. Mock journalist asks hostile questions; learner practices on-message responses.
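For the negotiation case, the counterparty's hidden preferences are scenario state that the system maintains. A design worth considering keeps those values in application code instead of the prompt. The LLM persona is then told only the decision to voice, so a learner cannot talk the model into revealing its limits. The sketch below assumes a single-issue price negotiation, a hypothetical `HiddenCounterparty` class, and a linear concession schedule. All three are illustrative choices, not the only sensible ones.

```python
from dataclasses import dataclass, field


@dataclass
class HiddenCounterparty:
    """Scenario state for a buyer persona in a single-issue price negotiation.

    The reservation price and concession schedule never appear in the prompt;
    the persona LLM is only told the decision to voice. Numbers and the
    linear concession rule are illustrative assumptions.
    """
    reservation_price: float   # most the buyer will ever pay (hidden)
    opening_offer: float       # buyer's first counter-offer
    max_rounds: int = 5
    rounds_played: int = field(default=0, init=False)

    def respond(self, seller_ask: float) -> tuple[str, float]:
        """Return (decision, price) for the seller's latest ask."""
        self.rounds_played += 1
        # Concede linearly from the opening offer toward the reservation price.
        step = (self.reservation_price - self.opening_offer) / self.max_rounds
        buyer_price = min(self.opening_offer + step * self.rounds_played,
                          self.reservation_price)
        if seller_ask <= buyer_price:
            return "accept", seller_ask
        if self.rounds_played >= self.max_rounds:
            return "walk_away", buyer_price
        return "counter", buyer_price
```

Each decision would then be passed to the persona as an instruction. An illustrative example is "counter at 60 and do not reveal your limit". The LLM handles tone and wording, while the code stays the source of truth for what the counterparty will actually accept.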

Similar Scenarios

Internal training assistants — knowledge-assistant shape combined with roleplay for skill drills.
Onboarding simulations — first-week practice of common situations.
Compliance training — scenario-based, not slide-based.
Therapy adjuncts (carefully scoped) — practice between sessions with explicit limits and crisis-handover plans.

Pitfalls & Evaluation

Persona collapse. Assistant-tuned models default to being helpful, so the model drifts out of role and starts answering or coaching the learner. The system prompt must firmly enforce the role.
Rubric drift. Without SME calibration, the evaluator's scores drift away from what experts would give. Periodically have SMEs re-score a sample of sessions and realign the rubric or evaluator prompt.
Negative-experience risk. Scenarios involving distress (medical, layoffs, harassment) need content guards and explicit boundaries; the AI should not improvise into harmful territory.
Feedback that is too generic to act on. "Be more empathetic" is useless. Force the evaluator to quote specific moments and propose a specific alternative.
Voice-mode UX traps. Latency, false interruptions, and ASR errors during emotionally charged speech. Test extensively before shipping.
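Two of these pitfalls lend themselves to cheap automated checks that run before a transcript or feedback report reaches the learner. The first is a phrase heuristic that flags persona turns which read like assistant-mode coaching. The second is a validator that rejects feedback items unless they quote the transcript verbatim and propose a concrete alternative. Everything here is a hypothetical sketch: `persona_collapsed`, `FeedbackItem`, `actionable_problems`, the phrase list, and the four-word minimum are assumptions, not tuned values. A production system might use a classifier instead of regexes.

```python
import re
from dataclasses import dataclass

# Phrases suggesting the persona slipped into assistant mode (illustrative list).
COLLAPSE_PATTERNS = [
    r"\bas an ai\b",
    r"\blanguage model\b",
    r"\byou could (?:try|say)\b",
    r"\bhere(?:'s| is) how you\b",
    r"\ba good response would\b",
]


def persona_collapsed(persona_turn: str) -> bool:
    """Flag a roleplay turn that reads like coaching rather than the character."""
    text = persona_turn.lower()
    return any(re.search(p, text) for p in COLLAPSE_PATTERNS)


@dataclass
class FeedbackItem:
    competency: str
    quote: str        # moment from the transcript the feedback refers to
    alternative: str  # concrete thing to say or do instead


def _normalize(s: str) -> str:
    """Collapse whitespace and lowercase so quotes match despite formatting."""
    return " ".join(s.split()).lower()


def actionable_problems(items: list[FeedbackItem], transcript: str) -> list[str]:
    """Return reasons the evaluator's feedback should be regenerated."""
    source = _normalize(transcript)
    problems = []
    for i, item in enumerate(items):
        quote = _normalize(item.quote)
        if not quote or quote not in source:
            problems.append(f"item {i} ({item.competency}): quote not found in transcript")
        if len(item.alternative.split()) < 4:  # crude proxy for "too generic"
            problems.append(f"item {i} ({item.competency}): alternative too vague")
    return problems
```

Generic feedback such as "Be more empathetic" fails both checks. It quotes nothing from the session and offers no specific alternative, so the evaluator pass can be retried with the failure reasons appended.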

Useful metrics: pre- and post-training skill scores on a held-out task, SME agreement with the evaluator model, learner self-rated usefulness, retention (do learners come back?), and real-world performance change (the only metric that ultimately matters).
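Two of these metrics are simple to compute once scores are collected. The first is SME agreement with the evaluator model, measured as Cohen's kappa on the same sessions; kappa corrects raw agreement for agreement expected by chance. The second is the mean pre/post change on the held-out task. The sketch assumes integer rubric levels and paired per-learner scores. The function names `cohens_kappa` and `mean_gain` are hypothetical helpers.

```python
from collections import Counter


def cohens_kappa(sme: list[int], model: list[int]) -> float:
    """Unweighted Cohen's kappa between SME and evaluator-model scores."""
    if len(sme) != len(model) or not sme:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(sme)
    observed = sum(a == b for a, b in zip(sme, model)) / n
    sme_counts, model_counts = Counter(sme), Counter(model)
    # Agreement expected if both raters assigned labels independently.
    expected = sum(sme_counts[k] * model_counts[k] for k in sme_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa undefined
    return (observed - expected) / (1 - expected)


def mean_gain(pre: list[float], post: list[float]) -> float:
    """Average per-learner change on the held-out task (post minus pre)."""
    if len(pre) != len(post) or not pre:
        raise ValueError("need paired, non-empty pre/post score lists")
    return sum(b - a for a, b in zip(pre, post)) / len(pre)
```

For ordinal 1-5 rubrics, a weighted kappa (linear or quadratic) is often the better choice. It gives partial credit when the SME and the model differ by one level instead of treating every disagreement equally.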