From a 14-person support queue to a 24/7 medical knowledge agent.
Thalamus is a Series A digital-health platform serving 80,000 members. Their support load was doubling every quarter. We replaced a brittle FAQ chatbot with a production RAG system grounded in their internal protocols, EHR notes and policy docs, and shipped it through HIPAA review in 12 weeks.
Thalamus's support team had grown from 4 to 14 in 18 months and was still missing SLAs. The legacy FAQ bot was deflecting only 14% of queries and was actively making things worse by quoting out-of-date procedures.
Members were frustrated. Engineering had tried two rounds of vendor RAG products; neither passed HIPAA review, neither beat the FAQ bot on accuracy.
The CTO's brief was specific: build something they could put their name on, that does not lie, survives an audit, and pays for itself in 12 months.
I started with a one-week paid discovery: shadowed the queue, talked to support agents, classified 1,400 historical tickets, and mapped the existing bot's failure modes.
The conclusion: this was not an LLM problem, it was a retrieval problem. Truth lived in Notion, Confluence, SharePoint and a custom CMS with no reconciliation layer.
Plan: build unified ingestion, ground retrieval with citations, add a multi-stage eval harness, and keep humans in the loop for anything PHI-adjacent.
- Unified ingest from Notion, Confluence, SharePoint and the internal CMS, with diff-tracking for deprecated content.
- Hybrid retrieval with citation-mandatory output, so answers cannot ship without source grounding.
- HIPAA-safe redaction layer in front of every LLM call. PHI never leaves the VPC.
- Nightly eval harness over 800 golden queries with Slack regression alerts.
- Human-in-the-loop escalation for clinical-decision intent classifiers.
- Notion API
- Confluence
- SharePoint
- Internal CMS
- BM25
- Dense vectors
- Reranker
- Diff tracker
- GPT-4o-mini
- PHI redactor
- Citation enforcer
- Intent classifier
- Member web app
- Agent assist
- Eval dashboard
- Slack escalation
The whole stack runs inside Thalamus's VPC. No member data, PHI, or transcript content ever touches my infrastructure. Eval runs nightly against a frozen golden set; any retrieval regression fires a Slack alert.
The system shipped in 12 weeks. We launched to 10% of members, ramped to 100% over 4 weeks, and hit 92% auto-resolution by week 6.
During open enrollment it absorbed a 10x spike in query volume with no degradation in latency or accuracy.
Six months on, the system has zero PHI violations, zero rolled-back deploys, and the eval harness has caught 4 silent regressions before members saw them.
"Talha shipped a production RAG system in 12 weeks that survived our HIPAA audit on the first try. Best money we have spent."
- Sarah Mendez, VP Engineering, Thalamus Health
Got a problem AI might solve?
30 minutes. Free. You leave with a clear yes/no on whether to build, and a one-pager you can forward to your team.