Steven's Knowledge

Conversation Intelligence

Turning unstructured human conversation into structured insight, actions, and metrics

Scenario Abstraction

A multi-party human conversation happens — meeting, call, interview, chat — and the business needs more than the recording. It needs: what was said, what was decided, what was felt, what should happen next, and how the conversation compares to many others like it.

The core difficulty isn't transcription (ASR is largely solved for clean audio, though accents, overlapping speech, and jargon still degrade it). It's that humans speak loosely: they backtrack, agree implicitly, change topic, joke, and rarely state action items in the clean form a downstream system expects. LLMs are well-suited here because the gap between "what someone literally said" and "the structured fact you need" is exactly the kind of fuzzy mapping that classical NLP struggled with.

Solution Shape

A canonical pipeline:

  1. Capture — audio (call recording, meeting room, headset), video (screen + camera), or chat transcript.
  2. Transcribe & diarize — speech-to-text with speaker turns. Word-level timestamps matter for later citation.
  3. Normalize — fix proper nouns, jargon, product names (often via a custom vocabulary or LLM correction pass).
  4. Extract structured facts — LLM extracts a typed schema: topics, decisions, action items (owner + due date), risks, objections, sentiment, questions asked.
  5. Score & classify — apply rubrics (was the demo done well? was the objection handled? did the meeting hit its stated goal?).
  6. Route & write back — push action items to a task tracker, update CRM stage, notify owners, store in a searchable archive.
  7. Aggregate — across many conversations, surface trends: top objections this month, win/loss themes, agent coaching gaps.
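The Aggregate step can be sketched as a simple count over per-call extraction results. This is a minimal illustration, assuming each call has already been reduced to a record with a month and a list of normalized objection labels; `CallFacts` and `top_objections` are hypothetical names, not part of any particular product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CallFacts:
    """Hypothetical per-call output of the extraction step."""
    call_id: str
    month: str               # e.g. "2024-05"
    objections: list[str]    # normalized objection labels


def top_objections(calls: list[CallFacts], month: str, n: int = 3) -> list[tuple[str, int]]:
    """Count how many calls in `month` raised each objection; return the n most common."""
    counts = Counter()
    for call in calls:
        if call.month == month:
            # set() so that a call repeating the same objection is counted once
            counts.update(set(call.objections))
    return counts.most_common(n)


# Synthetic example data
calls = [
    CallFacts("c1", "2024-05", ["price", "integration"]),
    CallFacts("c2", "2024-05", ["price", "price"]),
    CallFacts("c3", "2024-05", ["security"]),
    CallFacts("c4", "2024-04", ["price"]),
]
print(top_objections(calls, "2024-05", n=1))  # [('price', 2)]
```

Counting calls rather than mentions keeps one long, repetitive conversation from dominating the trend; whether that is the right choice depends on the question being asked.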

The non-obvious work is the schema: defining what types of facts the business cares about, and writing prompts that extract them reliably with citations back to specific turns.
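One way to picture such a schema is as typed records in which every fact carries citations into the transcript. The sketch below is an assumption-laden illustration: the field names, the `SCHEMA_VERSION` tag, and the `is_grounded` check are all hypothetical, and the grounding test is a deliberately simple substring match rather than a production-grade verifier.

```python
from dataclasses import dataclass, field
from typing import Optional

SCHEMA_VERSION = "1.0"  # hypothetical version tag for the extraction contract


@dataclass
class Citation:
    turn_index: int   # index into the transcript's list of speaker turns
    quote: str        # verbatim text that supports the extracted fact


@dataclass
class ActionItem:
    description: str
    owner: Optional[str]
    due_date: Optional[str]   # ISO date string, or None if not stated
    citations: list[Citation] = field(default_factory=list)


def is_grounded(item: ActionItem, turns: list[str]) -> bool:
    """True only if the item has citations and each quote appears in its cited turn."""
    if not item.citations:
        return False
    for c in item.citations:
        if not (0 <= c.turn_index < len(turns)):
            return False
        if c.quote.lower() not in turns[c.turn_index].lower():
            return False
    return True


# Synthetic transcript and extraction output
turns = [
    "Alice: Can someone send the pricing deck to the customer?",
    "Bob: Sure, I'll send it by Friday.",
]
items = [
    ActionItem("Send pricing deck", "Bob", None, [Citation(1, "I'll send it by Friday")]),
    ActionItem("Schedule follow-up demo", "Alice", None, []),  # uncited, so rejected
]
accepted = [i for i in items if is_grounded(i, turns)]
print([i.description for i in accepted])  # ['Send pricing deck']
```

Checking the quote against the cited turn catches both missing citations and citations that point at text the speaker never said.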

Key Building Blocks

  • ASR with diarization — Whisper-class model plus a separate diarizer such as pyannote, or a hosted speech API that provides both (Deepgram, AssemblyAI).
  • Custom vocabulary / phrase boosting — for product/people names.
  • Structured-output LLM call — typed JSON extraction, ideally with span-level grounding.
  • Rubric / judge LLM — for scoring against a defined quality framework.
  • Vector index of past transcripts — for "find similar calls" and trend search.
  • Integrations — calendar, CRM, task tracker, BI dashboards.
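The "find similar calls" building block can be illustrated with a toy similarity search. This sketch assumes nothing about a real vector database: `embed` is a stand-in that uses word counts where a real system would call an embedding model, and `most_similar` does a brute-force scan where a real system would query an index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, archive: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k archived transcripts most similar to the query."""
    q = embed(query)
    ranked = sorted(archive, key=lambda cid: cosine(q, embed(archive[cid])), reverse=True)
    return ranked[:k]


# Synthetic archive of call summaries
archive = {
    "call-1": "customer worried about price and asked for a discount",
    "call-2": "discussion of onboarding timeline and training sessions",
    "call-3": "price objection raised again competitor offers a discount",
}
print(most_similar("price discount objection", archive, k=2))  # ['call-3', 'call-1']
```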

Concrete Cases

  • Sales call intelligence (Gong-style). Record every customer call, extract objections, competitors mentioned, deal-stage signals, action items. Coach reps by comparing winning vs losing call patterns.
  • Meeting copilot for internal meetings. Generate a structured recap with decisions and owners, write action items back to Linear/Jira/Asana, and push the recap to a Slack thread.
  • Customer support QA. Score every support call against a rubric (greeting, empathy, resolution, compliance script). Flag the bottom 5% for manager review.
  • Clinical visit summary. Transcribe the doctor-patient conversation, produce SOAP-format notes, and surface coded diagnoses and medications for the EHR.
  • Job interview structuring. Extract the candidate's stated experience and the evidence for each competency in the scorecard, then fill in the ATS rubric.
  • Compliance review of financial advisor calls. Detect required disclosures, prohibited language, suitability red flags.
  • User research / call-center insights. Cluster recurring complaints, tag transcripts by theme, feed product roadmap discussions.
  • Mediation / legal deposition prep. Index full depositions, surface contradictions across testimony.
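The support QA case (score each call against a rubric, then flag the bottom 5%) can be sketched as follows. The criteria names come from the example above; the weights, the 0-to-1 score scale, and the function names are assumptions made for illustration, and in practice the per-criterion scores would come from a judge LLM or a human reviewer.

```python
import math

# Hypothetical weights; the criteria are those named in the support QA case.
RUBRIC_WEIGHTS = {"greeting": 0.1, "empathy": 0.3, "resolution": 0.4, "compliance": 0.2}


def rubric_score(criteria: dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(RUBRIC_WEIGHTS[name] * criteria[name] for name in RUBRIC_WEIGHTS)


def flag_bottom(scores: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Return the lowest-scoring call ids; flags at least one call when any exist."""
    if not scores:
        return []
    n = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get)[:n]


# Synthetic overall scores for 40 calls
scores = {f"call-{i:02d}": i / 40 for i in range(40)}
flagged = flag_bottom(scores)
print(flagged)
print(rubric_score({"greeting": 1.0, "empathy": 0.5, "resolution": 0.0, "compliance": 1.0}))
```

A relative cutoff like this always flags someone, even when every call is acceptable, so it may be worth pairing with an absolute score threshold.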

Similar Scenarios

These share most of the pipeline shape — replace "conversation" with another long, unstructured signal:

  • Chat / email thread intelligence — same extraction work, no ASR step. (Inbox triage, ticket summarization.)
  • Reading group / podcast briefs — single-speaker audio, narrative summary instead of action items.
  • Internal video library indexing — recorded all-hands and training, with search by topic.
  • In-classroom analytics — teacher speech plus student responses, analyzed to identify engagement.

Pitfalls & Evaluation

  • Garbage-in transcription — accents, overlapping speech, jargon. Always evaluate ASR word-error rate on your audio before blaming the LLM.
  • Hallucinated action items — the LLM invents commitments that nobody made. Force the extraction step to cite turn/timestamp; reject items without a citation.
  • Schema drift — the business adds new fields ad hoc until the prompt becomes unmaintainable. Treat the extraction schema as a versioned data contract.
  • Privacy & consent — recording laws differ per jurisdiction. PII redaction is part of the pipeline, not an afterthought.
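For the first pitfall, word-error rate can be measured with a standard word-level edit distance between a human reference transcript and the ASR output. The sketch below assumes simple whitespace tokenization and lowercasing; real evaluations usually also normalize punctuation and numerals, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Synthetic example: one substituted word ("acme" -> "acne") and one dropped word ("before")
ref = "please update the acme crm record before friday"
hyp = "please update the acne crm record friday"
print(word_error_rate(ref, hyp))  # 0.25
```

The example shows why jargon matters: a single misheard product name is enough to corrupt the fact the extraction step would have pulled out.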

Useful metrics: extraction F1 against a labeled set, rubric agreement with human reviewers, downstream task completion rate (did the auto-created action item actually get done?).
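Extraction F1 against a labeled set can be computed as below. This is a minimal sketch that assumes extracted items can be compared by exact match on a (call id, normalized label) pair; in practice, free-text items such as action items usually need fuzzy or human-judged matching, which is not shown.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 for extracted items versus labeled items."""
    tp = len(predicted & gold)  # items both extracted and labeled
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic labeled set and extraction output, keyed by (call id, action item label)
gold = {
    ("c1", "send pricing deck"),
    ("c1", "book follow-up"),
    ("c2", "share security whitepaper"),
}
predicted = {
    ("c1", "send pricing deck"),
    ("c2", "share security whitepaper"),
    ("c2", "schedule onsite"),  # not in the labeled set: a false positive
}
p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Reporting precision and recall separately is useful here because the two failure modes differ in cost: low precision means hallucinated items, low recall means missed commitments.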
