Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss
A hardware architecture that decrypts neural network weights at 64-byte granularity, hiding cryptographic overhead within DRAM fetch latency on shared-memory edge accelerators.
Tessera decrypts DNN weights inline at cache-line granularity, achieving near-zero overhead on UMA edge devices by parallelizing AES-256-CTR with DRAM access.
- UMA systems expose plaintext model weights to OS-level and physical attacks because CPU and NPU share DRAM.
- Page-level encryption (4 KB granularity) wastes bandwidth by fetching entire pages for small tensor tiles, incurring a penalty of up to 32x.
- Tessera intercepts 64-byte AXI bursts and computes AES-256-CTR keystreams in parallel with DRAM fetches, hiding crypto latency.
- Decrypted weights stream directly into isolated NPU SRAM, eliminating the permanent memory carve-outs required by trusted execution environments.
- Measured across three SoC platforms, Tessera achieves 98.4% of theoretical bandwidth, an overhead of only 1.6%.
- The architecture neutralizes DRAM extraction, rogue DMA, and compute hijacking attacks while preventing plaintext leakage across sparse tensors.
- The design maintains a constant 1x memory footprint across all layer geometries, unlike page-level schemes that degrade with irregular tensor shapes.
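The parallelism described above rests on a property of counter mode: the keystream depends only on the key and a counter, not on the ciphertext, so it can be computed while the data is still in flight from DRAM. The sketch below illustrates that property for a single 64-byte line. It is a minimal illustration, not Tessera's design: deriving the counter from the line address, the 8-byte nonce, and every function name are assumptions, and `prf_block` uses truncated HMAC-SHA256 purely as a standard-library stand-in for the AES-256 block cipher.

```python
import hashlib
import hmac

LINE_BYTES = 64    # one cache line / AXI burst, per the summary above
BLOCK_BYTES = 16   # AES block size; a line needs four keystream blocks


def prf_block(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for AES-256 block encryption (this is NOT AES)."""
    return hmac.new(key, counter_block, hashlib.sha256).digest()[:BLOCK_BYTES]


def line_keystream(key: bytes, nonce: bytes, line_addr: int) -> bytes:
    """Keystream for one line; needs only key, nonce and address, not the data."""
    if line_addr % LINE_BYTES:
        raise ValueError("address must be 64-byte aligned")
    first_block = line_addr // BLOCK_BYTES  # assumed address-derived counter
    out = b""
    for i in range(LINE_BYTES // BLOCK_BYTES):
        counter_block = nonce + (first_block + i).to_bytes(8, "big")
        out += prf_block(key, counter_block)
    return out


def crypt_line(key: bytes, nonce: bytes, line_addr: int, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    keystream = line_keystream(key, nonce, line_addr)
    return bytes(a ^ b for a, b in zip(data, keystream))


# Hypothetical 256-byte weight buffer (four lines), encrypted line by line.
key = bytes(range(32))
nonce = b"\x00" * 8
weights = bytes(range(256))
encrypted = b"".join(
    crypt_line(key, nonce, addr, weights[addr:addr + LINE_BYTES])
    for addr in range(0, len(weights), LINE_BYTES)
)

# Decrypt only the third line, without touching its neighbours.
line2 = crypt_line(key, nonce, 128, encrypted[128:192])
assert line2 == weights[128:192]
```

Because each line's keystream is independent, a single line can be decrypted at random without reading the rest of the page, which is what makes 64-byte granularity workable in the first place.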
Astrobobo tool mapping
- Reading Queue: Add the Tessera paper to your queue, prioritizing Section 3 (architecture) and the bandwidth measurement results in Section 5.
- Knowledge Capture: Document the key insight: crypto latency can be hidden if keystream generation is pipelined with DRAM fetch. Capture the specific timing constraints (64-byte burst, AES-256-CTR, DRAM access time) that make this work.
- Focus Brief: Summarize the threat model (OS compromise, physical DRAM extraction, rogue DMA) and how Tessera neutralizes each. Note the gaps (side-channel, key management, sparse models).
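The timing constraint mentioned under Knowledge Capture can be written down as a small model. This is a sketch under stated assumptions: the function names are hypothetical, the final XOR is assumed to be the only step that must wait for the data, and the nanosecond values are made-up placeholders rather than figures from the paper.

```python
def serialized_ns(dram_ns: float, keystream_ns: float, xor_ns: float) -> float:
    """Fetch first, then generate the keystream, then XOR."""
    return dram_ns + keystream_ns + xor_ns


def overlapped_ns(dram_ns: float, keystream_ns: float, xor_ns: float) -> float:
    """Keystream generation runs alongside the fetch; only the XOR is exposed."""
    return max(dram_ns, keystream_ns) + xor_ns


def overhead(dram_ns: float, total_ns: float) -> float:
    """Added latency relative to an unencrypted fetch."""
    return (total_ns - dram_ns) / dram_ns


# Placeholder timings, chosen only to illustrate the shape of the result.
DRAM_NS, KEYSTREAM_NS, XOR_NS = 60.0, 40.0, 1.0

serial = serialized_ns(DRAM_NS, KEYSTREAM_NS, XOR_NS)
overlap = overlapped_ns(DRAM_NS, KEYSTREAM_NS, XOR_NS)
print(f"serialized: {serial} ns, overlapped: {overlap} ns")
```

In this model the crypto cost disappears whenever keystream generation finishes no later than the DRAM access; if it takes longer, the difference shows up as added latency.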
Frequently asked
- Page-level encryption operates at 4 KB granularity. When a neural network layer accesses a small tensor tile (e.g., 64 bytes), the system must fetch the entire 4 KB page, decrypt it, and extract the needed bytes. This forces unnecessary data movement and cache pollution. Tessera avoids this by decrypting at 64-byte cache-line granularity, matching the actual memory access size.
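The fetch amplification in this answer can be checked with simple arithmetic. The helper below counts how many bytes move when a tile is read in whole granules (4 KB pages versus 64-byte lines). It is an illustrative sketch with hypothetical function names, and it only counts bytes: it will not necessarily reproduce the "up to 32x" figure quoted in the summary, which comes from the paper's own workloads.

```python
PAGE_BYTES = 4096
LINE_BYTES = 64


def bytes_fetched(offset: int, length: int, granule: int) -> int:
    """Bytes moved when [offset, offset + length) is read in whole granules."""
    if length <= 0:
        return 0
    first = offset // granule
    last = (offset + length - 1) // granule
    return (last - first + 1) * granule


def amplification(offset: int, length: int, granule: int) -> float:
    """Ratio of bytes moved to bytes actually needed."""
    return bytes_fetched(offset, length, granule) / length


# A 64-byte tile at the start of a page.
page_cost = amplification(0, 64, PAGE_BYTES)   # whole page for one line
line_cost = amplification(0, 64, LINE_BYTES)   # exactly one line

# A 128-byte tile needs the same page, so the ratio halves.
page_cost_128 = amplification(0, 128, PAGE_BYTES)
```

The ratio depends on tile size and alignment: a tile that straddles a page boundary pulls in two full pages, while at line granularity the worst case is one extra 64-byte line.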
Cite this article
APA
Animan Naskar. (2026, April 28). Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss. Astrobobo Content Engine (rewrite of arxiv/cs.LG). https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3
MLA
Animan Naskar. "Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss." Astrobobo Content Engine, 28 Apr 2026, https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3. Based on "arxiv/cs.LG", https://arxiv.org/abs/2604.23205.
BibTeX
@misc{astrobobo_tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3_2026,
author = {Animan Naskar},
title = {Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/tessera-cache-line-encryption-for-edge-ai-without-bandwidth-loss-df6bf3},
note = {Astrobobo rewrite of arxiv/cs.LG, https://arxiv.org/abs/2604.23205},
}