ai · 8 min read · Apr 29, 2026

Model Architecture Controls Whether Errors Stay Hidden

Transformer design determines if internal decision signals remain observable after training, independent of output confidence metrics.

Source: arxiv/cs.LG, https://arxiv.org/abs/2604.24801 · Thomas Carmichael

Transformer architecture, not just training, determines whether mid-layer activations expose token-level decision quality hidden from confidence scores.

  • Output confidence absorbs 57.7% of raw probe signal, masking true decision quality in frozen activations.
  • 24-layer 16-head configurations collapse to near-zero observability across parameter scales; other configs maintain healthy signal.
  • Observability collapse emerges during training despite improving loss, suggesting architectural constraints erase internal signals.
  • Qwen 2.5 and Llama differ by 2.9x in observability at matched 3B scale, with non-overlapping probe-score distributions.
  • Error-detection probes trained on WikiText catch 10.9–13.4% of errors confidence misses across downstream tasks.
  • Nonlinear probes and layer sweeps fail to recover signal in collapsed configurations.
  • Architecture selection functions as a monitoring decision with measurable consequences for error detection.
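The probe setup the bullets describe can be sketched minimally: fit a linear (logistic-regression) probe on frozen mid-layer activations to predict token-level errors, then score it on held-out tokens. Everything below is illustrative — the paper's actual probe architecture, layer choice, labels, and data shapes are not given here, so the synthetic activations and the "error direction" are stand-ins, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: frozen mid-layer activations (n_tokens x d_model)
# and per-token error labels (1 = the model's emitted token was wrong).
d_model, n = 64, 4000
direction = rng.normal(size=d_model)            # assumed latent "error direction"
errors = rng.integers(0, 2, size=n).astype(float)
acts = rng.normal(size=(n, d_model)) + 0.5 * np.outer(errors, direction)

def train_linear_probe(X, y, lr=0.1, steps=300):
    """Logistic-regression probe fit by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of probe logits
        g = p - y                               # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Train on the first 3000 tokens, evaluate on the held-out 1000.
w, b = train_linear_probe(acts[:3000], errors[:3000])
preds = (acts[3000:] @ w + b) > 0
acc = float((preds == errors[3000:].astype(bool)).mean())
```

In a "collapsed" configuration, the analogue of this probe would sit near chance accuracy no matter which layer it reads — that is the observability failure the bullets describe.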

Astrobobo tool mapping

  • Knowledge Capture: Document the observability scores (rho_partial) for each model architecture you evaluate. Store them alongside accuracy, latency, and your other selection criteria.
  • Focus Brief: Before committing to a model family, create a one-page observability summary: which layer-head configs collapse, which preserve signal, and what that means for your error-detection strategy.
  • Reading Queue: Add Carmichael's paper and related interpretability work to your queue. Observability is an emerging design criterion; staying current is now part of responsible model selection.

Frequently asked

  • Why doesn't output confidence expose these errors? Confidence (max-softmax) and activation norm absorb approximately 57.7% of the raw signal that probes can extract from mid-layer activations. A model can therefore be confident in its output while the internal decision-making process — visible only in frozen activations — shows uncertainty or error. Controlling for these factors reveals hidden signal that confidence alone cannot expose.
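The "controlling for confidence and norm" step is a partial correlation: regress both the probe score and the error label on the nuisance variables, then correlate the residuals. The sketch below is a generic implementation on synthetic data — how the paper computes rho_partial exactly, and the 57.7% figure, are not derivable from this summary, so treat the variable names and the toy data-generating process as assumptions.

```python
import numpy as np

def partial_correlation(x, y, controls):
    """Correlate x and y after regressing out the control variables.

    x, y: 1-D arrays (e.g. per-token probe score and error label).
    controls: 2-D array, one column per nuisance variable
              (e.g. max-softmax confidence, activation norm).
    """
    Z = np.column_stack([np.ones(len(x)), controls])   # design matrix with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residual of x given controls
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual of y given controls
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy illustration: the probe score is partly explained by confidence,
# so the raw correlation with errors understates (here even inverts) the
# confidence-independent signal.
rng = np.random.default_rng(0)
conf = rng.normal(size=2000)                           # stand-in for max-softmax
err = (conf + rng.normal(size=2000) < 0).astype(float) # errors anti-correlate with confidence
probe = 0.8 * conf + 0.5 * err + rng.normal(scale=0.3, size=2000)

raw = float(np.corrcoef(probe, err)[0, 1])
partial = partial_correlation(probe, err, conf[:, None])
```

On this toy data the raw correlation is dragged down by the confidence component, while the partial correlation recovers the genuine probe-to-error signal — the same masking effect the 57.7% figure quantifies.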
