ai · 4 min read · Apr 30, 2026

Evergreen: Cost-Efficient Verification of LLM-Generated Claims

A system that recasts claim verification as semantic queries, reducing LLM costs by 3.2x while maintaining accuracy on aggregated data.

Source: arxiv/cs.AI · Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta

Evergreen verifies each claim in an LLM-generated summary by compiling it into a declarative semantic query and executing that query on the underlying data; verification-aware and general query optimizations cut cost 3.2x and latency 4.0x versus a baseline verifier.

  • LLM semantic aggregation produces natural language summaries that may contain ungrounded claims requiring verification.
  • Evergreen converts each claim into a declarative semantic query executed on the same engine that generated the aggregate.
  • Verification-aware optimizations include early stopping, relevance sorting, and confidence sequences to minimize LLM calls.
  • General semantic query optimizations include operator fusion, similarity filtering, and prompt caching.
  • Provenance tracking identifies minimal tuple sets justifying each verdict using semiring-based first-order logic semantics.
  • Benchmarks show F1 = 1.00 with strong LLMs, at 3.2x lower cost and 4.0x lower latency than the baseline.
  • Even with weak LLMs, Evergreen outperforms strong LLM-as-judge baselines at 48x lower cost and 2.3x lower latency.
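The verification-aware optimizations above can be illustrated with a toy example. This is a minimal sketch, not Evergreen's implementation: it checks a quantified claim such as "at least k restaurants are rated 4.5 or higher" using relevance sorting and early stopping, and records the minimal supporting tuples as provenance. The predicate here is a cheap symbolic check; in Evergreen, the expensive step it stands in for would be an LLM call, which the ordering minimizes. All names and the relevance heuristic are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    supported: bool
    provenance: list = field(default_factory=list)  # tuples justifying the verdict
    predicate_calls: int = 0

def verify_at_least_k(rows, k, threshold):
    """Check the claim 'at least k rows satisfy rating >= threshold'."""
    support, calls = [], 0
    # Relevance sorting: examine the most promising tuples first, so a
    # supported claim is confirmed with as few predicate calls as possible.
    for row in sorted(rows, key=lambda r: r["rating"], reverse=True):
        calls += 1  # each check stands in for one (possibly cached) LLM call
        if row["rating"] >= threshold:
            support.append(row)
            if len(support) >= k:  # early stopping: sufficient evidence found
                return Verdict(True, support, calls)
        else:
            # Rows are sorted descending, so with this symbolic predicate
            # no later row can satisfy it. (An LLM predicate would not
            # permit this shortcut.)
            break
    return Verdict(False, support, calls)

restaurants = [
    {"name": "A", "rating": 4.8},
    {"name": "B", "rating": 3.9},
    {"name": "C", "rating": 4.6},
    {"name": "D", "rating": 4.9},
]
verdict = verify_at_least_k(restaurants, k=2, threshold=4.5)
# verdict.supported is True after only 2 predicate calls (D, then A),
# and verdict.provenance holds the minimal supporting pair.
```

The sketch shows why ordering matters: with the tuples sorted by relevance, a true claim is settled after k checks instead of a full scan, which is where the cost and latency savings come from when each check is an LLM call.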

Astrobobo tool mapping

  • Knowledge Capture: Document the structure of claims your LLM generates (e.g., 'X restaurants have Y rating') so you can design verification queries matching those patterns.
  • Focus Brief: Create a checklist of claim types (quantifiers, comparisons, groupings) that require verification in your domain, then prioritize which to automate first.
  • Reading Queue: Track papers on semantic query optimization and provenance systems to understand how Evergreen's techniques apply to your data stack.

Frequently asked

  • How does Evergreen avoid redundant LLM calls? It uses symbolic query execution on the underlying data and invokes the LLM only when semantic reasoning is necessary, caching prompts across similar claims. Early stopping halts verification once sufficient evidence is found, and relevance sorting prioritizes high-impact tuples, reducing the data the LLM must process.
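Prompt caching across similar claims can be sketched as follows. This is an illustrative assumption about how such a cache might work, not Evergreen's actual interface: identical (template, normalized tuple) pairs hit the LLM once, and later checks are served from the cache. The fake_llm function and the normalization rule are stand-ins.

```python
llm_calls = 0
cache = {}

def fake_llm(prompt):
    # Stand-in for a real LLM judgment; counts invocations.
    global llm_calls
    llm_calls += 1
    return "kids" in prompt  # toy semantic predicate

def cached_predicate(template, tuple_text):
    # Normalize the tuple text so trivially different renderings of the
    # same tuple share a cache entry.
    key = (template, " ".join(tuple_text.lower().split()))
    if key not in cache:
        cache[key] = fake_llm(template.format(tuple_text))
    return cache[key]

template = "Does this review suggest the place is family-friendly? {}"
first = cached_predicate(template, "Great for kids, big portions")
second = cached_predicate(template, "Great for  KIDS, big portions")  # cache hit
# first == second == True, yet llm_calls == 1 despite two checks.
```

The key design choice is normalizing before keying: without it, whitespace or casing differences would defeat the cache and each near-duplicate check would pay for a fresh LLM call.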
Cite
APA
Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta. (2026, April 30). Evergreen: Cost-Efficient Verification of LLM-Generated Claims. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/evergreen-cost-efficient-verification-of-llm-generated-claims-ff9864
MLA
Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta. "Evergreen: Cost-Efficient Verification of LLM-Generated Claims." Astrobobo Content Engine, 30 Apr 2026, https://astrobobo-content-engine.vercel.app/article/evergreen-cost-efficient-verification-of-llm-generated-claims-ff9864. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.26180.
BibTeX
@misc{astrobobo_evergreen-cost-efficient-verification-of-llm-generated-claims-ff9864_2026,
  author       = {Lee, Alexander W. and Han, Benjamin and Sen, Shayak and Yeom, Sam and Cetintemel, Ugur and Datta, Anupam},
  title        = {Evergreen: Cost-Efficient Verification of LLM-Generated Claims},
  year         = {2026},
  url          = {https://astrobobo-content-engine.vercel.app/article/evergreen-cost-efficient-verification-of-llm-generated-claims-ff9864},
  note         = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.26180},
}
