ai · 8 min read · Apr 27, 2026

Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data

Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, where it evades detection until triggered.

Source: arxiv/cs.AI · Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das

Adversaries can inject tiny poisoned payloads into web-crawled pretraining data, creating latent vulnerabilities that activate only when triggered by specific prompts.

  • Stealth Pretraining Seeding (SPS) distributes small pieces of malicious content across obscure websites exposed to web crawlers.
  • Individual payloads remain small, diffuse, and superficially harmless to evade dataset filtering.
  • Poisoned logic lies dormant during standard evaluation but activates via precise alphanumeric triggers (a differential probe for this is sketched after this list).
  • PermaFrost-Attack framework uses geometric diagnostics to detect latent model behavior and vulnerabilities.
  • Attack succeeds across multiple model families and scales while often bypassing alignment defenses.
  • Threat exploits the dependency of LLMs on web-scale pretraining from sources like Common Crawl.
  • Standard safety evaluations miss dormant threats because they do not probe for hidden conditional logic.
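The paper's probing pipeline is not reproduced in this summary, but the core idea of trigger-conditioned behavior can be sanity-checked with a simple differential probe: compare the model's next-token distribution on a prompt with and without a candidate trigger string, and flag divergences far larger than a benign control produces. A minimal sketch follows; the model name, trigger string, and prompts are illustrative assumptions, not values from the paper.

```python
# Differential trigger probe (illustrative sketch, not the paper's method).
# Flags candidate strings whose insertion shifts the next-token distribution
# far more than a benign control string does.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in; any causal LM loadable via transformers works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def next_token_logprobs(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the next token
    return F.log_softmax(logits, dim=-1)

def trigger_shift(base_prompt: str, candidate: str) -> float:
    # KL(base || base+candidate): a large value means the candidate acts
    # like a behavioral switch rather than ordinary added context.
    p = next_token_logprobs(base_prompt)
    q = next_token_logprobs(base_prompt + " " + candidate)
    return F.kl_div(q, p, log_target=True, reduction="sum").item()

base = "Summarize the quarterly report."
print(trigger_shift(base, "zx91-qqv7"))    # hypothetical alphanumeric trigger
print(trigger_shift(base, "hello world"))  # benign control for comparison
```

On a clean model both values should be of the same order; a dormant backdoor would show up as an outsized shift for its exact trigger, which is precisely why safety evals that never enumerate such strings miss it.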

Astrobobo tool mapping

  • Knowledge Capture: Log the threat model (dormant poisoning, trigger-based activation, evasion of standard evals) and capture the three diagnostic methods (Thermodynamic Length, Spectral Curvature, Infection Traceback Graph) as reference techniques for your security team; a generic geometric diagnostic is sketched after this list.
  • Focus Brief: Summarize for stakeholders that pretraining data is now a security perimeter. Outline the gap between current dataset filtering (which assumes benign noise) and adversarial filtering (which assumes coordinated poisoning).
  • Reading Queue: Queue related work on data poisoning defenses, model interpretability for latent behavior, and supply-chain security in ML. Prioritize papers on trigger detection and data provenance.
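The three diagnostics above are only named in this summary, not specified. As a rough intuition for their geometric flavor, the sketch below computes a generic discrete curvature over a model's layerwise hidden-state trajectory; this is a hypothetical stand-in, not the paper's Spectral Curvature method. The premise: hidden states that take an unusually sharp turn only when a trigger is present may betray dormant conditional logic.

```python
# Generic layerwise-trajectory curvature (hypothetical stand-in diagnostic,
# not the paper's Spectral Curvature). Measures how sharply the last token's
# hidden state bends from layer to layer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", output_hidden_states=True
).eval()

def trajectory_curvature(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids).hidden_states          # (n_layers+1) x [1, T, d]
    pts = torch.stack([h[0, -1] for h in hidden])  # last-token state per layer
    steps = pts[1:] - pts[:-1]                     # layer-to-layer displacement
    cos = F.cosine_similarity(steps[1:], steps[:-1], dim=-1)
    return 1.0 - cos  # per-layer turning proxy; spikes mark sharp bends

clean = trajectory_curvature("Summarize the quarterly report.")
probed = trajectory_curvature("Summarize the quarterly report. zx91-qqv7")
print((probed - clean).abs().max())  # an anomalously large gap merits a look
```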

Frequently asked

  • How does Stealth Pretraining Seeding differ from standard data poisoning? Standard poisoning injects obvious malicious examples into training data. SPS instead distributes tiny, benign-looking payloads across many obscure websites, relying on web crawlers to absorb them into pretraining corpora. Each payload is individually undetectable, but collectively they embed dormant logic that activates only when triggered by specific prompts. This makes the attack far harder to catch during dataset construction; the toy simulation below illustrates the dilution effect.
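A toy simulation of the dilution argument follows; every number in it is hypothetical, chosen only to illustrate the mismatch between per-document filtering and corpus-level accumulation.

```python
# Toy dilution model: each seeded page keeps its payload fraction well below
# a naive per-document filter threshold, yet the crawl as a whole absorbs
# tens of thousands of trigger-bearing tokens. All numbers are hypothetical.
import random

random.seed(0)
TRIGGER = "zx91-qqv7"        # hypothetical alphanumeric trigger
FILTER_THRESHOLD = 0.02      # assumed filter: reject docs >2% payload tokens

def seeded_page(n_tokens: int = 2000, payload_tokens: int = 10) -> list[str]:
    body = ["filler"] * (n_tokens - payload_tokens)
    body += [TRIGGER] * payload_tokens   # only 0.5% of the page is payload
    random.shuffle(body)
    return body

pages = [seeded_page() for _ in range(5000)]  # many obscure seeded sites
kept = [p for p in pages if p.count(TRIGGER) / len(p) <= FILTER_THRESHOLD]
total_payload = sum(p.count(TRIGGER) for p in kept)
print(f"pages passing the filter: {len(kept)}/{len(pages)}")
print(f"trigger tokens absorbed into the corpus: {total_payload}")
```

Every page sails through the per-document check, yet the assembled corpus contains 50,000 trigger tokens; that asymmetry is the filtering gap the Focus Brief above refers to.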
Cite
APA
Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das. (2026, April 27). Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb
MLA
Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das. "Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data." Astrobobo Content Engine, 27 Apr 2026, https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.22117.
BibTeX
@misc{astrobobo_poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb_2026,
  author       = {Harsh Kumar and Rahul Maity and Tanmay Joshi and Aman Chadha and Vinija Jain and Suranjana Trivedy and Amitava Das},
  title        = {Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data},
  year         = {2026},
  url          = {https://astrobobo-content-engine.vercel.app/article/poisoned-pretraining-hidden-attacks-embedded-in-llm-training-data-30ecbb},
  note         = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.22117},
}
