Supervised Learning Has Built-In Geometric Blindness
Mathematical proof shows empirical risk minimization must preserve sensitivity to label-correlated but test-irrelevant features—a structural constraint, not a training bug.
Supervised learning mathematically requires encoders to retain sensitivity to training-label correlations that don't generalize, creating an unavoidable geometric constraint.
- ERM necessarily induces Jacobian sensitivity in directions that are correlated with training labels but irrelevant at test time.
- This constraint unifies four previously separate empirical phenomena: non-robust features, texture bias, corruption fragility, and the robustness-accuracy tradeoff.
- The Trajectory Deviation Index (TDI) directly measures this blind spot; standard metrics such as the Frobenius norm miss it.
- PGD adversarial training achieves high Jacobian magnitude but poor clean-input geometry (TDI 1.336 vs. PMH 0.904).
- The blind spot worsens in larger language models (ratio 0.860 → 0.742 from 66M to 340M parameters).
- Task-specific ERM fine-tuning amplifies the blind spot by 54%; PMH repairs it 11-fold with a single Gaussian-form training term.
- The defect appears at foundation-model scale across vision, NLP, and multimodal architectures (CLIP, DINO, SAM, ViT-B/16).
- Proposition 5 proves the repair term is the unique perturbation law that uniformly penalizes the encoder Jacobian.
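The contrast between Jacobian magnitude and geometric isotropy in the points above can be illustrated with a toy linear encoder. The paper defines TDI precisely; here a simple anisotropy ratio (smallest over largest singular value of the Jacobian) stands in as a hypothetical proxy — an assumed illustration, not the paper's formula.

```python
import numpy as np

# Two toy linear encoders f(x) = J @ x; for a linear map the Jacobian is J itself.
J_isotropic = 2.0 * np.eye(3)         # stretches every direction equally
J_skewed = np.diag([3.0, 3.0, 0.01])  # large overall magnitude, near-blind in one direction

def frobenius(J):
    """Overall Jacobian magnitude: sqrt of the sum of squared singular values."""
    return np.linalg.norm(J, ord="fro")

def anisotropy_ratio(J):
    """Hypothetical TDI-style proxy: min/max singular value.
    1.0 = perfectly isotropic; near 0 = a geometric blind spot in some direction."""
    s = np.linalg.svd(J, compute_uv=False)
    return s.min() / s.max()

for name, J in [("isotropic", J_isotropic), ("skewed", J_skewed)]:
    print(f"{name}: Frobenius={frobenius(J):.3f}, anisotropy={anisotropy_ratio(J):.3f}")
```

The skewed encoder has the *larger* Frobenius norm yet a near-zero anisotropy ratio, which is the divergence the article warns about: high Jacobian magnitude does not guarantee healthy clean-input geometry.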
Astrobobo tool mapping
- Knowledge Capture: Document the distinction between Jacobian magnitude (Frobenius norm) and isotropic path-length distortion (TDI) in your model-evaluation playbook. Note that a high Frobenius norm does not guarantee clean-input robustness.
- Focus Brief: Summarize Theorem 1 (the geometric blind spot) and Proposition 5 (the unique repair form) as a one-page reference for your ML team. Include the PMH loss term and its Gaussian structure.
- Daily Log: When running your next adversarial-training experiment, log both the Frobenius norm and TDI side by side. Record whether they diverge; if so, flag the run for deeper investigation.
Frequently asked
- Why is the blind spot unavoidable? It is a mathematical necessity of empirical risk minimization: any encoder trained to minimize a supervised loss must retain non-zero sensitivity (Jacobian) in directions that correlate with training labels but are irrelevant at test time. This is not a bug in current methods but a structural property of the supervised objective itself, proven in Theorem 1.
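The mechanism in this answer can be sketched with ordinary least squares on data containing a spuriously predictive feature. The data here is entirely constructed for illustration (it is not the paper's experiment): one feature tracks the label robustly, while a second, non-robust feature is tightly label-correlated in training only, like a texture cue that breaks under distribution shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.normal(size=n)  # the quantity the model should predict

# Robust feature: noisy but stable across train and test.
robust = y + 1.0 * rng.normal(size=n)
# Non-robust feature: highly label-correlated in training only
# (stands in for a cue that decorrelates at test time).
non_robust = y + 0.1 * rng.normal(size=n)

X_train = np.column_stack([robust, non_robust])
w, *_ = np.linalg.lstsq(X_train, y, rcond=None)

# ERM places almost all of its sensitivity on the non-robust direction,
# because that direction best predicts the training labels.
print("weights (robust, non_robust):", w)
```

The fitted weight on the non-robust feature dominates: minimizing training risk forces the model to stay sensitive in exactly the direction that will not generalize, which is the one-dimensional analogue of the Jacobian constraint in Theorem 1.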
Cite
Vishal Rajput. (2026, April 24). Supervised Learning Has Built-In Geometric Blindness. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/supervised-learning-has-built-in-geometric-blindness-0a1a7e
Vishal Rajput. "Supervised Learning Has Built-In Geometric Blindness." Astrobobo Content Engine, 24 Apr 2026, https://astrobobo-content-engine.vercel.app/article/supervised-learning-has-built-in-geometric-blindness-0a1a7e. Based on "arxiv/cs.AI", https://arxiv.org/abs/2604.21395.
@misc{astrobobo_supervised-learning-has-built-in-geometric-blindness-0a1a7e_2026,
author = {Vishal Rajput},
title = {Supervised Learning Has Built-In Geometric Blindness},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/supervised-learning-has-built-in-geometric-blindness-0a1a7e},
note = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2604.21395},
}