ai · 8 min read · Apr 27, 2026

Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data

Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, where it evades detection until triggered.

Source: arxiv/cs.AI · Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das

Adversaries can inject tiny poisoned payloads into web-crawled pretraining data, creating latent vulnerabilities that activate only when triggered by specific prompts.

  • Stealth Pretraining Seeding (SPS) distributes small pieces of malicious content across obscure websites exposed to web crawlers.
  • Individual payloads remain small, diffuse, and superficially harmless to evade dataset filtering.
  • Poisoned logic lies dormant during standard evaluation but activates via precise alphanumeric triggers (a differential probe for this is sketched after this list).
  • PermaFrost-Attack framework uses geometric diagnostics to detect latent model behavior and vulnerabilities.
  • Attack succeeds across multiple model families and scales while often bypassing alignment defenses.
  • Threat exploits the dependency of LLMs on web-scale pretraining from sources like Common Crawl.
  • Standard safety evaluations miss dormant threats because they do not probe for hidden conditional logic.
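The paper's probing pipeline is not reproduced in this summary, but the core idea of trigger-conditioned behavior can be sanity-checked with a simple differential probe: compare the model's next-token distribution on a prompt with and without a candidate trigger string, and flag divergences far larger than a benign control produces. A minimal sketch follows; the model name, trigger string, and prompts are illustrative assumptions, not values from the paper.

```python
# Differential trigger probe (illustrative sketch, not the paper's method).
# Flags candidate strings whose insertion shifts the next-token distribution
# far more than a benign control string does.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in; any causal LM loadable via transformers works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def next_token_logprobs(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the next token
    return F.log_softmax(logits, dim=-1)

def trigger_shift(base_prompt: str, candidate: str) -> float:
    # KL(base || base+candidate): a large value means the candidate acts
    # like a behavioral switch rather than ordinary added context.
    p = next_token_logprobs(base_prompt)
    q = next_token_logprobs(base_prompt + " " + candidate)
    return F.kl_div(q, p, log_target=True, reduction="sum").item()

base = "Summarize the quarterly report."
print(trigger_shift(base, "zx91-qqv7"))    # hypothetical alphanumeric trigger
print(trigger_shift(base, "hello world"))  # benign control for comparison
```

On a clean model both values should be of the same order; a dormant backdoor would show up as an outsized shift for its exact trigger, which is precisely why safety evals that never enumerate such strings miss it.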

Astrobobo tool mapping

  • Knowledge Capture: Log the threat model (dormant poisoning, trigger-based activation, evasion of standard evals) and capture the three diagnostic methods (Thermodynamic Length, Spectral Curvature, Infection Traceback Graph) as reference techniques for your security team; a generic geometric diagnostic is sketched after this list.
  • Focus Brief: Summarize for stakeholders that pretraining data is now a security perimeter. Outline the gap between current dataset filtering (which assumes benign noise) and adversarial filtering (which assumes coordinated poisoning).
  • Reading Queue: Queue related work on data poisoning defenses, model interpretability for latent behavior, and supply-chain security in ML. Prioritize papers on trigger detection and data provenance.
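The three diagnostics above are only named in this summary, not specified. As a rough intuition for their geometric flavor, the sketch below computes a generic discrete curvature over a model's layerwise hidden-state trajectory; this is a hypothetical stand-in, not the paper's Spectral Curvature method. The premise: hidden states that take an unusually sharp turn only when a trigger is present may betray dormant conditional logic.

```python
# Generic layerwise-trajectory curvature (hypothetical stand-in diagnostic,
# not the paper's Spectral Curvature). Measures how sharply the last token's
# hidden state bends from layer to layer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", output_hidden_states=True
).eval()

def trajectory_curvature(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids).hidden_states          # (n_layers+1) x [1, T, d]
    pts = torch.stack([h[0, -1] for h in hidden])  # last-token state per layer
    steps = pts[1:] - pts[:-1]                     # layer-to-layer displacement
    cos = F.cosine_similarity(steps[1:], steps[:-1], dim=-1)
    return 1.0 - cos  # per-layer turning proxy; spikes mark sharp bends

clean = trajectory_curvature("Summarize the quarterly report.")
probed = trajectory_curvature("Summarize the quarterly report. zx91-qqv7")
print((probed - clean).abs().max())  # an anomalously large gap merits a look
```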

Frequently asked

  • How does Stealth Pretraining Seeding differ from standard data poisoning? Standard poisoning injects obvious malicious examples into training data. SPS instead distributes tiny, benign-looking payloads across many obscure websites, relying on web crawlers to absorb them into pretraining corpora. Each payload is individually undetectable, but collectively they embed dormant logic that activates only when triggered by specific prompts. This makes the attack far harder to catch during dataset construction; the toy simulation below illustrates the dilution effect.
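A toy simulation of the dilution argument follows; every number in it is hypothetical, chosen only to illustrate the mismatch between per-document filtering and corpus-level accumulation.

```python
# Toy dilution model: each seeded page keeps its payload fraction well below
# a naive per-document filter threshold, yet the crawl as a whole absorbs
# tens of thousands of trigger-bearing tokens. All numbers are hypothetical.
import random

random.seed(0)
TRIGGER = "zx91-qqv7"        # hypothetical alphanumeric trigger
FILTER_THRESHOLD = 0.02      # assumed filter: reject docs >2% payload tokens

def seeded_page(n_tokens: int = 2000, payload_tokens: int = 10) -> list[str]:
    body = ["filler"] * (n_tokens - payload_tokens)
    body += [TRIGGER] * payload_tokens   # only 0.5% of the page is payload
    random.shuffle(body)
    return body

pages = [seeded_page() for _ in range(5000)]  # many obscure seeded sites
kept = [p for p in pages if p.count(TRIGGER) / len(p) <= FILTER_THRESHOLD]
total_payload = sum(p.count(TRIGGER) for p in kept)
print(f"pages passing the filter: {len(kept)}/{len(pages)}")
print(f"trigger tokens absorbed into the corpus: {total_payload}")
```

Every page sails through the per-document check, yet the assembled corpus contains 50,000 trigger tokens; that asymmetry is the filtering gap the Focus Brief above refers to.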
Cite
APA
Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das. (2026, April 27). Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb
MLA
Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das. "Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data." Astrobobo Content Engine, 27 Apr 2026, https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.22117.
BibTeX
@misc{astrobobo_poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb_2026,
  author       = {Harsh Kumar and Rahul Maity and Tanmay Joshi and Aman Chadha and Vinija Jain and Suranjana Trivedy and Amitava Das},
  title        = {Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data},
  year         = {2026},
  url          = {https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb},
  note         = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.22117},
}
