engineering · 8 min read · Apr 28, 2026

Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss

A hardware architecture that decrypts neural network weights at 64-byte granularity, hiding cryptographic overhead within DRAM fetch latency on shared-memory edge accelerators.

Source: arxiv/cs.LG · Animan Naskar

Tessera decrypts DNN weights inline at cache-line granularity, achieving near-zero overhead on UMA edge devices by parallelizing AES-256-CTR with DRAM access.

  • UMA systems expose plaintext model weights to OS-level and physical attacks because the CPU and NPU share DRAM.
  • Page-level encryption (4 KB granularity) wastes bandwidth by fetching entire pages for small tensor tiles, incurring up to a 32x penalty.
  • Tessera intercepts 64-byte AXI bursts and computes AES-256-CTR keystreams in parallel with DRAM fetches, hiding crypto latency.
  • Decrypted weights stream directly into isolated NPU SRAM, eliminating permanent memory carve-outs required by trusted execution environments.
  • Measured across three SoC platforms, Tessera achieves 98.4% of theoretical bandwidth with only 1.6% overhead.
  • Architecture neutralizes DRAM extraction, rogue DMA, and compute hijacking attacks while preventing plaintext leakage across sparse tensors.
  • Design maintains constant 1x memory footprint across all layer geometries, unlike page-level schemes that degrade with irregular tensor shapes.
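The property that makes cache-line decryption possible is CTR mode itself: the keystream for any 64-byte line depends only on the key and a counter derived from that line's address, so each line decrypts independently with no neighboring data. A minimal sketch of that independence, using a SHA-512-based keystream purely as a stdlib stand-in for AES-256-CTR (Python's standard library has no AES; the counter-from-address construction is the point being illustrated, not the cipher):

```python
import hashlib

LINE = 64  # cache-line / AXI burst size in bytes

def keystream(key: bytes, line_addr: int) -> bytes:
    # Stand-in for AES-256-CTR: the counter block is derived from the
    # line's address, so the keystream for any 64-byte line can be
    # computed without touching any other line. (sha512 is used only
    # because the stdlib lacks AES; real hardware would run AES-256.)
    return hashlib.sha512(key + line_addr.to_bytes(8, "big")).digest()[:LINE]

def xor_line(data: bytes, ks: bytes) -> bytes:
    # CTR mode: encryption and decryption are the same XOR.
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"\x01" * 32
weights = bytes(range(256))  # 256 bytes of plaintext "weights" = 4 lines

# Encrypt line by line (done once, before deployment).
ct = b"".join(
    xor_line(weights[a:a + LINE], keystream(key, a))
    for a in range(0, len(weights), LINE)
)

# Decrypt only line 2 -- no other bytes are fetched or decrypted,
# unlike a 4 KB page-granular scheme.
addr = 2 * LINE
pt = xor_line(ct[addr:addr + LINE], keystream(key, addr))
assert pt == weights[addr:addr + LINE]
```

Because the counter is known before the ciphertext arrives, hardware can start generating the keystream the moment the address is issued, which is what lets Tessera overlap crypto with the DRAM fetch.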

Astrobobo tool mapping

  • Reading Queue: Add the Tessera paper to your queue, prioritizing Section 3 (architecture) and the bandwidth measurement results in Section 5.
  • Knowledge Capture: Document the key insight: crypto latency can be hidden if keystream generation is pipelined with the DRAM fetch. Capture the specific timing constraints (64-byte burst, AES-256-CTR, DRAM access time) that make this work.
  • Focus Brief: Summarize the threat model (OS compromise, physical DRAM extraction, rogue DMA) and how Tessera neutralizes each. Note the gaps (side channels, key management, sparse models).
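The latency-hiding argument can be captured in a two-line timing model: serially, per-line latency is fetch plus keystream plus XOR; pipelined, the keystream is generated during the fetch, so only the slower of the two appears on the critical path. The numbers below are illustrative assumptions, not measurements from the paper:

```python
# Toy per-line timing model (all values are assumed, in nanoseconds).
T_DRAM = 50.0  # fetch one 64-byte burst from DRAM
T_AES  = 40.0  # generate one 64-byte AES-256-CTR keystream block
T_XOR  = 1.0   # XOR the keystream with the arriving burst

serial    = T_DRAM + T_AES + T_XOR      # decrypt only after the fetch completes
pipelined = max(T_DRAM, T_AES) + T_XOR  # keystream computed during the fetch

overhead = pipelined / T_DRAM - 1.0
print(f"serial: {serial} ns, pipelined: {pipelined} ns, overhead: {overhead:.1%}")
```

Under these assumptions the pipelined path adds only the final XOR to the raw fetch time, which is the shape of result (single-digit-percent overhead) the paper reports; the constraint is that keystream generation must finish no later than the DRAM burst arrives.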

Frequently asked

  • Why does page-level encryption waste bandwidth? Page-level encryption operates at 4 KB granularity. When a neural network layer accesses a small tensor tile (e.g., 64 bytes), the system must fetch the entire 4 KB page, decrypt it, and extract the needed bytes. This forces unnecessary data movement and cache pollution. Tessera avoids this by decrypting at 64-byte cache-line granularity, matching the actual memory access size.
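As back-of-envelope arithmetic (not the paper's measurement), the data-movement amplification of page-granular decryption is simply page size divided by the tile size actually used:

```python
PAGE_BYTES = 4096  # page-level decryption granularity

# Bytes moved per useful byte for a few tile sizes. A 64-byte tile is the
# worst case (64x); the "up to 32x" figure in the summary is consistent
# with ~128-byte tiles, though that mapping is an inference on our part.
for tile in (64, 128, 1024):
    print(f"{tile}-byte tile: {PAGE_BYTES // tile}x data movement at 4 KB granularity")
```

Tessera fetches exactly the requested 64-byte lines, so its amplification stays at 1x regardless of tile size.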
Cite
APA
Animan Naskar. (2026, April 28). Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3
MLA
Animan Naskar. "Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss." Astrobobo Content Engine, 28 Apr 2026, https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3. Based on "arxiv/cs.LG", https://arxiv.org/abs/2604.23205.
BibTeX
@misc{astrobobo_tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3_2026,
  author       = {Animan Naskar},
  title        = {Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss},
  year         = {2026},
  url          = {https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3},
  note         = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2604.23205},
}
