Astrobobo · Content Engine

4 results for "tokens"

ai · arxiv/cs.AI · 6 min

OjaKV: Online Low-Rank Compression for LLM Key-Value Caches

A method combining hybrid storage with an adaptive subspace reduces KV cache memory by compressing intermediate tokens while preserving critical anchor tokens, and it remains compatible with FlashAttention.

Apr 20, 2026
ai · arxiv/cs.LG · 4 min

Neural CTMC decouples discrete diffusion into timing and direction

A new parameterization for discrete diffusion models separates when tokens jump from where they jump, aligning training with the underlying continuous-time Markov chain (CTMC) structure.

Apr 20, 2026
engineering · hackernoon · 7 min

Claude Code model tiers and effort levels, explained plainly

Choosing the wrong model or effort level in Claude Code silently wastes tokens. Here is what each setting actually controls.

Apr 19, 2026
ai · arxiv/cs.AI · 8 min

Token Importance in On-Policy Distillation: Entropy and Disagreement

The study identifies two kinds of high-value token positions in on-policy knowledge distillation: high-entropy positions, and low-entropy positions where the student and teacher disagree. Focusing on these positions enables a 50–80% token reduction.

Apr 17, 2026