Retrieval-Augmented Set Completion for Clinical Code Authoring
A two-stage approach, RASC, first retrieves similar existing clinical value sets and then classifies each candidate code with a fine-tuned model, reducing hallucination and outperforming direct LLM generation on standardized medical vocabularies.
- Clinical value set authoring is the task of identifying all codes that represent a medical concept in standardized vocabularies.
- Direct LLM prompting fails because these vocabularies are large, versioned, and not reliably memorized by the model.
- RASC retrieves the K most similar existing value sets from a corpus, then applies a classifier to each candidate code.
- A cross-encoder fine-tuned on SapBERT achieves AUROC 0.852, outperforming an MLP (0.799) and GPT-4o zero-shot (F1 0.105).
- GPT-4o returns 48.6% of codes absent from the official vocabulary, indicating hallucination.
- A retrieval-only baseline produces 12.3 irrelevant codes per true positive; the classifiers reduce this to 3.2–4.4.
- The performance gap widens as value set size increases, confirming the theoretical advantage of shrinking the output space.
- A benchmark dataset of 11,803 VSAC value sets enables reproducible evaluation.
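The retrieve-then-classify pipeline above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the toy corpus, term-overlap retrieval, and lexical scoring stand in for the paper's dense retrieval over VSAC value sets and its fine-tuned cross-encoder classifier.

```python
# Minimal sketch of the two-stage RASC idea. All data and scoring here are
# toy stand-ins: the paper retrieves over 11,803 VSAC value sets and scores
# (query, code) pairs with a cross-encoder fine-tuned on SapBERT.

def retrieve_similar_sets(query_terms, corpus, k=2):
    """Stage 1: rank existing value sets by term overlap with the query."""
    def overlap(value_set):
        return len(query_terms & value_set["terms"])
    return sorted(corpus, key=overlap, reverse=True)[:k]

def classify_candidates(query_terms, candidate_codes, threshold=0.6):
    """Stage 2: keep candidate codes whose relevance score clears a
    threshold. A real system would use a learned classifier; this toy
    score is the fraction of a code's terms shared with the query."""
    kept = []
    for code, terms in candidate_codes.items():
        score = len(query_terms & terms) / max(len(terms), 1)
        if score >= threshold:
            kept.append(code)
    return kept

# Toy corpus of existing value sets: descriptive terms plus member codes.
corpus = [
    {"name": "diabetes", "terms": {"diabetes", "mellitus", "type2"},
     "codes": {"E11.9": {"diabetes", "type2"},
               "E10.9": {"diabetes", "type1"}}},
    {"name": "hypertension", "terms": {"hypertension", "blood", "pressure"},
     "codes": {"I10": {"hypertension", "essential"}}},
]

query = {"diabetes", "type2"}
similar = retrieve_similar_sets(query, corpus, k=1)
candidates = {c: t for s in similar for c, t in s["codes"].items()}
selected = classify_candidates(query, candidates)
```

Because candidates come only from retrieved real value sets, every code the pipeline emits exists in the vocabulary by construction, which is the structural reason RASC avoids the out-of-vocabulary hallucinations seen with direct generation.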
Astrobobo tool mapping
- Knowledge Capture: Document your organization's existing value set corpus and metadata (e.g., clinical domain, code counts, update frequency) to prepare for retrieval-based automation.
- Focus Brief: Summarize the AUROC and F1 scores from RASC and competing baselines in a one-page decision brief for clinical leadership, highlighting hallucination rates in GPT-4o.
- Reading Queue: Queue related papers on clinical NLP, vocabulary standardization (SNOMED CT, ICD-10), and retrieval-augmented generation to deepen domain context.
- Daily Log: Track pilot results if you implement RASC: measure time saved per value set, false positive rate, and clinician feedback on code relevance.
Frequently asked
- Why does direct LLM generation fail at this task? Large language models are not reliably trained on the full, versioned clinical vocabularies (e.g., SNOMED CT, ICD-10). They hallucinate codes that do not exist in official systems, creating compliance and data integrity risks. Retrieval-augmented approaches ground the model in a curated corpus of real codes, eliminating out-of-vocabulary hallucinations.
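The hallucination rate reported above (48.6% of GPT-4o's codes absent from the official vocabulary) amounts to a simple membership check. A minimal sketch, assuming a toy vocabulary subset; a real check would load the full versioned code system from its official distribution, and the example codes here are illustrative only:

```python
# Flag generated codes that do not exist in the official vocabulary.
# The vocabulary below is a tiny toy subset, not a real ICD-10-CM release.

official_vocabulary = {"E11.9", "E10.9", "I10"}

def out_of_vocabulary(generated_codes, vocabulary):
    """Return generated codes absent from the vocabulary (hallucinations)."""
    return [c for c in generated_codes if c not in vocabulary]

# Hypothetical LLM output: two real codes and two invented ones.
llm_output = ["E11.9", "E11.99", "I10", "Z99.X1"]
hallucinated = out_of_vocabulary(llm_output, official_vocabulary)
rate = len(hallucinated) / len(llm_output)
```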
Cite
Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons. (2026, April 17). Retrieval-Augmented Set Completion for Clinical Code Authoring. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/retrieval-augmented-set-completion-for-clinical-code-authoring-55495e
Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons. "Retrieval-Augmented Set Completion for Clinical Code Authoring." Astrobobo Content Engine, 17 Apr 2026, https://astrobobo-content-engine.vercel.app/article/retrieval-augmented-set-completion-for-clinical-code-authoring-55495e. Based on "arxiv/cs.LG", https://arxiv.org/abs/2604.14616.
@misc{astrobobo_retrieval-augmented-set-completion-for-clinical-code-authoring-55495e_2026,
  author = {Sumit Mukherjee and Juan Shu and Nairwita Mazumder and Tate Kernell and Celena Wheeler and Shannon Hastings and Chris Sidey-Gibbons},
  title = {Retrieval-Augmented Set Completion for Clinical Code Authoring},
  year = {2026},
  url = {https://astrobobo-content-engine.vercel.app/article/retrieval-augmented-set-completion-for-clinical-code-authoring-55495e},
  note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2604.14616},
}