FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.
Data format choice significantly impacts LLM medication reconciliation accuracy, with strategy effectiveness varying by model size.
- Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on models ≤8B parameters.
- Raw JSON achieves the best performance (F1 0.9956) on 70B-parameter models, reversing the smaller-model trend.
- All model-format combinations show precision exceeding recall; omission is the dominant failure mode.
- Models plateau at 7–10 concurrent medications, systematically underserving polypharmacy patients.
- BioMistral-7B produced zero usable output despite domain pretraining, indicating instruction tuning is essential.
- The study tested five open-weight models across four serialization strategies on 200 synthetic patients (4,000 inferences).
- Practical deployment rule: use Clinical Narrative for ≤8B models, Raw JSON for 70B+.
- The pipeline is reproducible on a single AWS g6e.xlarge instance with 48 GB VRAM.
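The deployment rule and the study size in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `choose_format` and the format labels are hypothetical, and because the summary gives no guidance for models between 8B and 70B parameters, the sketch returns `None` for that range rather than guessing.

```python
from typing import Optional


def choose_format(params_billions: float) -> Optional[str]:
    """Return the serialization format suggested by the rule above.

    Sizes strictly between 8B and 70B are not covered by the rule,
    so None is returned to signal "run your own benchmark".
    The labels are hypothetical names, not identifiers from the paper.
    """
    if params_billions <= 8:
        return "clinical_narrative"
    if params_billions >= 70:
        return "raw_json"
    return None


# Study grid from the summary: 5 models x 4 formats x 200 patients.
N_MODELS, N_FORMATS, N_PATIENTS = 5, 4, 200
total_inferences = N_MODELS * N_FORMATS * N_PATIENTS

if __name__ == "__main__":
    for size in (7, 13, 70):
        print(size, choose_format(size))
    print(total_inferences)
```

Returning `None` for the uncovered range is a deliberate design choice: it forces the caller to handle the case the evidence does not address.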
Astrobobo tool mapping
- Knowledge Capture: Document your current FHIR serialization approach and the model size you are using. Cross-reference this paper's format recommendations to identify whether you are in the ≤8B or 70B+ regime and whether your format choice aligns with the evidence.
- Focus Brief: Create a one-page decision matrix: rows = your candidate models (with parameter counts), columns = serialization formats, cells = F1 scores from your own benchmark runs. Use this to select the format-model pair that maximizes recall for your patient population.
- Daily Log: Track medication reconciliation errors in your pilot deployment (if live) or in your benchmark results. Log omissions and hallucinations separately to validate the paper's finding that omission dominates, and adjust your safety thresholds accordingly.
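The logging and decision-matrix steps above could look roughly like the following sketch. It assumes medications are compared as case-insensitive name strings, which is a simplification (a real pipeline would likely match on codes such as RxNorm). All names here (`ReconciliationScore`, `score_medications`, `best_pair_by_recall`) are hypothetical, and the example medication lists are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ReconciliationScore:
    omissions: set        # in the gold list, missing from model output
    hallucinations: set   # in model output, absent from the gold list
    precision: float
    recall: float
    f1: float


def score_medications(gold, predicted) -> ReconciliationScore:
    """Score one patient's predicted medication list against the gold list."""
    gold_set = {m.strip().lower() for m in gold}
    pred_set = {m.strip().lower() for m in predicted}
    true_pos = gold_set & pred_set
    precision = len(true_pos) / len(pred_set) if pred_set else 0.0
    recall = len(true_pos) / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ReconciliationScore(
        omissions=gold_set - pred_set,
        hallucinations=pred_set - gold_set,
        precision=precision,
        recall=recall,
        f1=f1,
    )


def best_pair_by_recall(matrix):
    """matrix maps (model, format) -> recall; return the highest-recall key."""
    return max(matrix, key=matrix.get)


# Invented example: the model drops one medication and adds none.
gold = ["Metformin", "Lisinopril", "Atorvastatin", "Aspirin"]
predicted = ["metformin", "lisinopril", "aspirin"]
score = score_medications(gold, predicted)

if __name__ == "__main__":
    print(score)
```

In this invented case precision is 1.0 and recall is 0.75, the same precision-above-recall pattern the summary describes: nothing is hallucinated, but one medication is omitted. Keeping the two error sets separate makes that pattern visible in a daily log.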
Frequently asked
- Serialization format affects reconciliation accuracy significantly. According to this study, Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on smaller models (≤8B parameters), while Raw JSON performs best on 70B-parameter models. The choice of format is not cosmetic; it directly affects whether the model catches all active medications, which is critical for patient safety.
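To make the two formats concrete, here is a minimal sketch of serializing the same record both ways. The summary does not give the paper's actual templates, so the narrative wording below is an assumption, as are the function names. The sample resource is a hypothetical, heavily trimmed FHIR R4 MedicationRequest that uses only the `status`, `medicationCodeableConcept.text`, and `dosageInstruction[].text` fields.

```python
import json

# Hypothetical, trimmed FHIR R4 MedicationRequest (not from the study data).
med_request = {
    "resourceType": "MedicationRequest",
    "status": "active",
    "medicationCodeableConcept": {"text": "Metformin 500 mg oral tablet"},
    "dosageInstruction": [{"text": "Take one tablet twice daily"}],
}


def to_raw_json(resources) -> str:
    """Raw JSON strategy: pass the resources through unchanged."""
    return json.dumps(resources, indent=2)


def to_clinical_narrative(resources) -> str:
    """Narrative strategy: one plain-English sentence per resource.

    The sentence template is an illustrative assumption.
    """
    lines = []
    for res in resources:
        concept = res.get("medicationCodeableConcept", {})
        name = concept.get("text", "unknown medication")
        status = res.get("status", "unknown")
        dosage = "; ".join(
            d["text"] for d in res.get("dosageInstruction", []) if d.get("text")
        )
        sentence = f"Prescription for {name} (status: {status})"
        if dosage:
            sentence += f", instructions: {dosage}"
        lines.append(sentence + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_raw_json([med_request]))
    print(to_clinical_narrative([med_request]))
```

Either string would then be placed in the model prompt; which one to use depends on model size, per the findings above.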
Cite
Sanjoy Pator. (2026, April 25). FHIR Format Choice Shifts LLM Medication Safety by 19 Points. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef
Sanjoy Pator. "FHIR Format Choice Shifts LLM Medication Safety by 19 Points." Astrobobo Content Engine, 25 Apr 2026, https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.21076.
@misc{astrobobo_fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef_2026,
author = {Sanjoy Pator},
title = {FHIR Format Choice Shifts LLM Medication Safety by 19 Points},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef},
note = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.21076},
}