Select case study

Case study · 01 / 03

From a 14-person support queue to a 24/7 medical knowledge agent.

Thalamus is a Series A digital-health platform serving 80,000 members. Their support load was doubling every quarter. We replaced a brittle FAQ chatbot with a production RAG system grounded in their internal protocols, EHR notes and policy docs, and shipped it through HIPAA review in 12 weeks.

[ Client ]

Thalamus Health

Series A · Healthtech · 80k members

[ Engagement ]

Sprint + 6mo retainer

12 weeks build · ongoing MLOps

[ Role ]

Lead AI engineer

Solo build · embedded with Eng

[ Year ]

2025

Q1 - Q2, ongoing

Results · 6 months post-launchverified by client finance

92%

Auto-resolution

Up from 14% on legacy bot

$840k

Annual savings

Net of LLM costs

10x

Query spike absorbed

Open-enrollment week

PHI violations

Audit-clean since launch

[ 01 ] The Problem

Thalamus's support team had grown from 4 to 14 in 18 months and was still missing SLAs. The legacy FAQ bot was deflecting only 14% of queries and was actively making things worse by quoting out-of-date procedures.

Members were frustrated. Engineering had tried two rounds of vendor RAG products; neither passed HIPAA review, neither beat the FAQ bot on accuracy.

The CTO's brief was specific: build something they could put their name on, that does not lie, survives an audit, and pays for itself in 12 months.

[ 02 ] Approach

I started with a one-week paid discovery: shadowed the queue, talked to support agents, classified 1,400 historical tickets, and mapped the existing bot's failure modes.

The conclusion: this was not an LLM problem, it was a retrieval problem. Truth lived in Notion, Confluence, SharePoint and a custom CMS with no reconciliation layer.

Plan: build unified ingestion, ground retrieval with citations, add a multi-stage eval harness, and keep humans in the loop for anything PHI-adjacent.

Unified ingest from Notion, Confluence, SharePoint and the internal CMS, with diff-tracking for deprecated content.
Hybrid retrieval with citation-mandatory output, so answers cannot ship without source grounding.
HIPAA-safe redaction layer in front of every LLM call. PHI never leaves the VPC.
Nightly eval harness over 800 golden queries with Slack regression alerts.
Human-in-the-loop escalation for clinical-decision intent classifiers.

[ 03 ] Architecture

[ 01 ] Sources

Knowledge ingest

Notion API
Confluence
SharePoint
Internal CMS

[ 02 ] Index

Hybrid retrieval

BM25
Dense vectors
Reranker
Diff tracker

[ 03 ] Reasoning

LLM + guardrails

GPT-4o-mini
PHI redactor
Citation enforcer
Intent classifier

[ 04 ] Surface

Member chat + ops

Member web app
Agent assist
Eval dashboard
Slack escalation

The whole stack runs inside Thalamus's VPC. No member data, PHI, or transcript content ever touches my infrastructure. Eval runs nightly against a frozen golden set; any retrieval regression fires a Slack alert.

# Citation-enforced response contract
class GroundedResponse(BaseModel):
  answer: str
  citations: list[Citation]
  confidence: Literal["high", "med", "low"]
  intent: ClinicalIntent

[ 04 ] Outcome

The system shipped in 12 weeks. We launched to 10% of members, ramped to 100% over 4 weeks, and hit 92% auto-resolution by week 6.

During open enrollment it absorbed a 10x spike in query volume with no degradation in latency or accuracy.

Six months on, the system has zero PHI violations, zero rolled-back deploys, and the eval harness has caught 4 silent regressions before members saw them.

"Talha shipped a production RAG system in 12 weeks that survived our HIPAA audit on the first try. Best money we have spent."

- Sarah Mendez, VP Engineering, Thalamus Health

More work02 / 03 next

CS / 02

Seed · Legal AI · Contract automation