Case study · 01 / 03

From a 14-person support queue to a 24/7 medical knowledge agent.

Thalamus is a Series A digital-health platform serving 80,000 members. Their support load was doubling every quarter. We replaced a brittle FAQ chatbot with a production RAG system grounded in their internal protocols, EHR notes and policy docs, and shipped it through HIPAA review in 12 weeks.

[ Client ]

Thalamus Health

Series A · Healthtech · 80k members

[ Engagement ]

Sprint + 6mo retainer

12 weeks build · ongoing MLOps

[ Role ]

Lead AI engineer

Solo build · embedded with Eng

[ Year ]

2025

Q1 - Q2, ongoing

Results · 6 months post-launchverified by client finance
92%
Auto-resolution
Up from 14% on legacy bot
$840k
Annual savings
Net of LLM costs
10x
Query spike absorbed
Open-enrollment week
0
PHI violations
Audit-clean since launch

Thalamus's support team had grown from 4 to 14 in 18 months and was still missing SLAs. The legacy FAQ bot was deflecting only 14% of queries and was actively making things worse by quoting out-of-date procedures.

Members were frustrated. Engineering had tried two rounds of vendor RAG products; neither passed HIPAA review, neither beat the FAQ bot on accuracy.

The CTO's brief was specific: build something they could put their name on, that does not lie, survives an audit, and pays for itself in 12 months.

I started with a one-week paid discovery: shadowed the queue, talked to support agents, classified 1,400 historical tickets, and mapped the existing bot's failure modes.

The conclusion: this was not an LLM problem, it was a retrieval problem. Truth lived in Notion, Confluence, SharePoint and a custom CMS with no reconciliation layer.

Plan: build unified ingestion, ground retrieval with citations, add a multi-stage eval harness, and keep humans in the loop for anything PHI-adjacent.

  • Unified ingest from Notion, Confluence, SharePoint and the internal CMS, with diff-tracking for deprecated content.
  • Hybrid retrieval with citation-mandatory output, so answers cannot ship without source grounding.
  • HIPAA-safe redaction layer in front of every LLM call. PHI never leaves the VPC.
  • Nightly eval harness over 800 golden queries with Slack regression alerts.
  • Human-in-the-loop escalation for clinical-decision intent classifiers.
[ 01 ] Sources
Knowledge ingest
  • Notion API
  • Confluence
  • SharePoint
  • Internal CMS
[ 02 ] Index
Hybrid retrieval
  • BM25
  • Dense vectors
  • Reranker
  • Diff tracker
[ 03 ] Reasoning
LLM + guardrails
  • GPT-4o-mini
  • PHI redactor
  • Citation enforcer
  • Intent classifier
[ 04 ] Surface
Member chat + ops
  • Member web app
  • Agent assist
  • Eval dashboard
  • Slack escalation

The whole stack runs inside Thalamus's VPC. No member data, PHI, or transcript content ever touches my infrastructure. Eval runs nightly against a frozen golden set; any retrieval regression fires a Slack alert.

# Citation-enforced response contract class GroundedResponse(BaseModel): answer: str citations: list[Citation] confidence: Literal["high", "med", "low"] intent: ClinicalIntent

The system shipped in 12 weeks. We launched to 10% of members, ramped to 100% over 4 weeks, and hit 92% auto-resolution by week 6.

During open enrollment it absorbed a 10x spike in query volume with no degradation in latency or accuracy.

Six months on, the system has zero PHI violations, zero rolled-back deploys, and the eval harness has caught 4 silent regressions before members saw them.

"Talha shipped a production RAG system in 12 weeks that survived our HIPAA audit on the first try. Best money we have spent."

- Sarah Mendez, VP Engineering, Thalamus Health

Book a call

Got a problem AI might solve?

30 minutes. Free. You leave with a clear yes/no on whether to build, and a one-pager you can forward to your team.

[ Response ]

Within 24 hours

[ Timezone ]

GMT+5 · flexible

[ Discovery ]

Free · no NDA needed

[ Engagement ]

From $8k / 4 wks