Does the format I use to serialize FHIR data really affect LLM medication reconciliation accuracy?

Yes, significantly. According to this study, Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on smaller models (≤8B parameters), while Raw JSON performs best on 70B-parameter models. The choice of format is not cosmetic; it directly impacts whether the model catches all active medications, which is critical for patient safety.

Which serialization format should I use for my medication reconciliation LLM?

Use Clinical Narrative format if your model has 8 billion parameters or fewer, and Raw JSON if it has 70 billion parameters or more. The study tested five open-weight models on 200 synthetic patients and reports this split by model size. Because the evaluation used synthetic data, benchmark both formats on your own data and model before committing, especially if your patient population has complex medication histories.
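The rule of thumb above can be written down as a small helper. This is a minimal sketch, not code from the study: the function name `choose_format` and the returned labels are assumptions made for illustration, and the middle branch exists only because the rule as stated covers ≤8B and 70B+ and says nothing about sizes in between.

```python
def choose_format(param_count_billions: float) -> str:
    """Return a starting serialization format for a given model size.

    Encodes the article's rule of thumb only. Labels are hypothetical.
    """
    if param_count_billions <= 8:
        return "clinical_narrative"
    if param_count_billions >= 70:
        return "raw_json"
    # The rule as stated does not cover sizes between 8B and 70B,
    # so this sketch declines to guess and asks for a local benchmark.
    return "benchmark_both"


if __name__ == "__main__":
    for size in (7, 13, 70):
        print(f"{size}B -> {choose_format(size)}")
```

Treat the result as a starting point for your own benchmark rather than a final answer.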

Why do LLMs miss medications more often than they invent fake ones?

Across all tested model-format combinations, precision exceeded recall, meaning omission (missing a real medication) was the dominant failure mode. This suggests LLMs are conservative in medication extraction, preferring to skip uncertain entries rather than guess. Clinically, this is safer than hallucination, but it means safety audits should prioritize catching false negatives (missed medications) over false positives.
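To make the omission versus hallucination distinction concrete, here is a minimal sketch of how precision, recall, and F1 can be computed for one patient's medication list. It assumes medications are compared as exact, already-normalized strings, which is a simplification; the function name and the example drug list are illustrative and do not come from the study.

```python
def reconciliation_metrics(expected: set[str], predicted: set[str]) -> dict:
    """Compare a model's medication list against the reference list."""
    true_pos = expected & predicted
    omissions = expected - predicted        # false negatives: real meds the model missed
    hallucinations = predicted - expected   # false positives: meds the model invented
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "omissions": sorted(omissions),
        "hallucinations": sorted(hallucinations),
    }


if __name__ == "__main__":
    # Illustrative lists only: the model drops one real medication and invents none.
    reference = {"metformin", "lisinopril", "atorvastatin", "warfarin"}
    model_output = {"metformin", "lisinopril", "atorvastatin"}
    print(reconciliation_metrics(reference, model_output))
```

In this example precision is 1.0 and recall is 0.75, the same shape of error the article describes: nothing invented, one medication missed.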

ai · 8 min read · Apr 25, 2026

FHIR Format Choice Shifts LLM Medication Safety by 19 Points

How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.

Source: arxiv/cs.AI · Sanjoy Pator

Data format choice significantly impacts LLM medication reconciliation accuracy, with strategy effectiveness varying by model size.

— Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on models ≤8B parameters.
— Raw JSON achieves best performance (F1 0.9956) on 70B-parameter models, reversing smaller-model trends.
— All model-format combinations show precision exceeding recall; omission is the dominant failure mode.
— Models plateau at 7–10 concurrent medications, systematically underserving polypharmacy patients.
— BioMistral-7B produced zero usable output despite domain pretraining, indicating instruction tuning is essential.
— Tested five open-weight models across four serialization strategies on 200 synthetic patients (4,000 inferences).
— Practical deployment rule: use Clinical Narrative for ≤8B models, Raw JSON for 70B+.
— Pipeline reproducible on single AWS g6e.xlarge instance with 48 GB VRAM.
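For readers unfamiliar with the two formats named above, the sketch below shows one way the same medication record could be rendered as Raw JSON and as a narrative sentence. The resource is a hypothetical, heavily trimmed FHIR R4 `MedicationRequest`, and the narrative template is an assumption: this summary does not reproduce the study's actual templates, nor does it name the other two serialization strategies.

```python
import json

# Hypothetical, trimmed FHIR R4 MedicationRequest; real resources carry many more fields.
med_request = {
    "resourceType": "MedicationRequest",
    "status": "active",
    "medicationCodeableConcept": {"text": "Metformin 500 mg oral tablet"},
    "dosageInstruction": [{"text": "Take one tablet twice daily"}],
    "authoredOn": "2025-01-15",
}


def to_raw_json(resource: dict) -> str:
    """Raw JSON strategy: hand the model the resource as-is."""
    return json.dumps(resource, indent=2)


def to_clinical_narrative(resource: dict) -> str:
    """Narrative strategy (assumed template): one plain-English sentence per medication."""
    name = resource.get("medicationCodeableConcept", {}).get("text", "Unknown medication")
    status = resource.get("status", "unknown")
    started = resource.get("authoredOn", "an unrecorded date")
    dosage = "; ".join(
        d["text"] for d in resource.get("dosageInstruction", []) if d.get("text")
    )
    sentence = f"{name} ({status}), prescribed on {started}"
    if dosage:
        sentence += f": {dosage}"
    return sentence + "."


if __name__ == "__main__":
    print(to_raw_json(med_request))
    print(to_clinical_narrative(med_request))
```

The narrative version drops structural tokens such as braces and key names, which is one plausible reason a smaller model might handle it more easily, though the summary does not state the cause.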

Astrobobo tool mapping

Knowledge Capture: Document your current FHIR serialization approach and the model size you are using. Cross-reference this paper's format recommendations to identify whether you are in the ≤8B or 70B+ regime and whether your format choice aligns with the evidence.
Focus Brief: Create a one-page decision matrix: rows = your candidate models (with parameter counts), columns = serialization formats, cells = F1 scores from your own benchmark runs. Use this to select the format-model pair that maximizes recall for your patient population.
Daily Log: Track medication reconciliation errors in your pilot deployment (if live) or in your benchmark results. Log omissions and hallucinations separately to validate the paper's finding that omission dominates, and adjust your safety thresholds accordingly.
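The decision matrix described under Focus Brief can be kept as a small table in code. The sketch below is illustrative only: the model names are hypothetical and every score is a placeholder to be replaced with numbers from your own benchmark runs, not a result from the paper.

```python
# Placeholder scores only: replace with recall and F1 from your own benchmark runs.
results = {
    ("model-a-8b", "clinical_narrative"): {"recall": 0.80, "f1": 0.85},
    ("model-a-8b", "raw_json"): {"recall": 0.60, "f1": 0.70},
    ("model-b-70b", "clinical_narrative"): {"recall": 0.90, "f1": 0.93},
    ("model-b-70b", "raw_json"): {"recall": 0.95, "f1": 0.96},
}


def best_format_per_model(scores: dict, metric: str = "recall") -> dict:
    """For each model, pick the serialization format with the highest metric value."""
    best = {}
    for (model, fmt), values in scores.items():
        if model not in best or values[metric] > best[model][1]:
            best[model] = (fmt, values[metric])
    return best


if __name__ == "__main__":
    for model, (fmt, value) in best_format_per_model(results).items():
        print(f"{model}: {fmt} (recall={value:.2f})")
```

Selecting on recall rather than F1 follows the article's point that missed medications are the dominant failure mode; pass `metric="f1"` if a balanced score fits your use case better.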

Cite

APA

Sanjoy Pator. (2026, April 25). FHIR Format Choice Shifts LLM Medication Safety by 19 Points. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef

MLA

Sanjoy Pator. "FHIR Format Choice Shifts LLM Medication Safety by 19 Points." Astrobobo Content Engine, 25 Apr 2026, https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.21076.

BibTeX

@misc{astrobobo_fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef_2026,
  author       = {Sanjoy Pator},
  title        = {FHIR Format Choice Shifts LLM Medication Safety by 19 Points},
  year         = {2026},
  url          = {https://astrobobo-content-engine.vercel.app/article/fhir-format-choice-shifts-llm-medication-safety-by-19-points-08a0ef},
  note         = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.21076},
}

#llm #healthcare #fhir #medication #serialization
