INT4 Quantization Fails After FP32 Convergence in Predictable Phases
Post-training quantization assumes converged models are ready to compress, but INT4 quantization collapses in a three-phase pattern tied to weight updates, not learning rate decay.
INT4 quantization fails after FP32 convergence in three phases: rapid improvement, a metastable plateau, then explosive divergence driven by post-convergence weight updates.
- Three-phase divergence: rapid learning, metastable plateau (~70k steps), explosive INT4 gap growth (11% to 517%).
- Divergence onset correlates with FP32 perplexity convergence, not with the learning rate decay schedule.
- INT8 quantization remains robust across all phases; the failure is specific to INT4's 16-level grid coarseness.
- Weight outlier accumulation is ruled out via kurtosis measurements; the candidate mechanism remains a shift in the weight distribution.
- An Oscillatory Lock-In schedule reduces the INT4 gap by 2.2 percentage points; SGDR accelerates divergence uniformly.
- The study audits all 154 public Pythia-160m checkpoints with a calibration-free per-group INT4 probe (a sketch of such a probe follows this list).
- Post-convergence weight updates, not decay magnitude alone, are the proximate cause of quantization collapse.
- Schedule amplitude calibration determines whether perturbation helps or hurts quantization robustness.
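The sketch below is a minimal stand-in for such a probe, not the paper's implementation: it round-trips a weight matrix through per-group quantization and reports the relative reconstruction error. Symmetric absmax scaling, a group size of 128, and the error metric are illustrative assumptions; the paper's probe reports the INT4 perplexity gap on the actual checkpoints.

```python
# Minimal sketch of a calibration-free per-group INT4 round-trip probe.
# Assumptions (not from the paper): symmetric absmax scaling, group size 128,
# and relative weight reconstruction error as the reported metric.
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Round-trip a 2-D weight matrix through per-group b-bit quantization."""
    rows, cols = w.shape
    assert cols % group_size == 0, "choose a group size that divides the row length"
    groups = w.reshape(rows, cols // group_size, group_size)
    qmax = 2 ** (bits - 1) - 1                       # 7 for INT4, 127 for INT8
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(rows, cols)

def relative_error(w: torch.Tensor, bits: int) -> float:
    """Share of weight energy lost to the quantization grid (a crude proxy for the gap)."""
    w_hat = quantize_dequantize(w, bits=bits)
    return ((w - w_hat).norm() / w.norm()).item()

if __name__ == "__main__":
    w = torch.randn(768, 768)                        # stand-in for one checkpoint's layer
    print("INT4:", relative_error(w, bits=4))        # coarse 16-level grid
    print("INT8:", relative_error(w, bits=8))        # fine 256-level grid
```

A full audit would apply the same round trip to every weight matrix of each of the 154 public Pythia-160m checkpoints and compare the quantized model's perplexity against FP32; that per-checkpoint gap is the quantity reported as growing from 11% to 517% after convergence.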
Astrobobo tool mapping
- Knowledge Capture: Document the three-phase pattern (rapid learning, plateau, divergence) and the onset predictor (FP32 convergence, not LR decay) in your model development notes. Link to the paper's checkpoint audit results.
- Focus Brief: Create a one-page checklist: (1) Is INT4 or INT8 required? (2) If INT4, does your schedule include amplitude-calibrated oscillations? (3) When does FP32 perplexity converge? (4) Have you probed the INT4 gap post-convergence?
- Reading Queue: Queue the paper's supplementary materials (code, probe implementation, checkpoint audit) for detailed review before implementing quantization in your pipeline.
Frequently asked
- Why does INT4 collapse after convergence while INT8 stays robust? Post-convergence weight updates shift the weight distribution in ways that exceed the resolution of INT4's 16-level quantization grid. The divergence is not caused by learning rate decay magnitude alone, but by the specific pattern of weight changes after FP32 perplexity stops improving. INT8's finer grid (256 levels) remains robust, suggesting the failure is tied to INT4's coarseness; the back-of-envelope sketch below makes the resolution difference concrete.
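As a rough illustration of that coarseness argument (assuming symmetric absmax scaling; the 0.12 per-group magnitude is purely illustrative, not a measured value):

```python
# Back-of-envelope comparison of grid resolution under symmetric absmax scaling.
absmax = 0.12                      # illustrative per-group weight magnitude
step_int4 = absmax / (2**3 - 1)    # 16 levels -> 7 positive steps, ~0.0171
step_int8 = absmax / (2**7 - 1)    # 256 levels -> 127 positive steps, ~0.00094
print(step_int4 / step_int8)       # ~18.1: INT4 represents the same range ~18x more coarsely

# Worst-case per-weight rounding error is half a step: ~0.0086 for INT4 vs
# ~0.00047 for INT8. A post-convergence shift that changes per-group absmax
# rescales the whole grid, amplifying an error that is already ~18x larger at 4 bits.
```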