Search
4 results for "tokens"
- ai · arxiv/cs.AI · 6 min
OjaKV: Online Low-Rank Compression for LLM Key-Value Caches
A hybrid storage and adaptive subspace method reduces KV cache memory by compressing intermediate tokens while preserving critical anchors; it is compatible with FlashAttention.
Apr 20, 2026
- ai · arxiv/cs.LG · 4 min
Neural CTMC decouples discrete diffusion into timing and direction
A new parameterization for discrete diffusion models separates when and where tokens jump, aligning training with mathematical structure.
Apr 20, 2026
- engineering · hackernoon · 7 min
Claude Code model tiers and effort levels, explained plainly
Choosing the wrong model or effort level in Claude Code wastes tokens silently. Here is what each setting actually controls.
Apr 19, 2026
- ai · arxiv/cs.AI · 8 min
Token Importance in On-Policy Distillation: Entropy and Disagreement
Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.
Apr 17, 2026