Introducing RadixAttention to Trellis (trellis.unfoldml.com) AI

Trellis introduces RadixAttention, a KV-caching approach that uses a radix tree to reuse cached key/value activations across requests that share a prompt prefix, as is common in chat systems. Reusing those activations cuts redundant prefill compute and improves throughput, latency, and memory usage. The post details the block-paged cache design for concurrent requests and reports benchmark results showing 30–40% faster and more memory-efficient inference as the shared-prefix fraction increases.
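
To make the prefix-reuse idea concrete, here is a minimal Python sketch, not Trellis's actual implementation. It assumes a fixed block size and uses a plain trie keyed by whole token blocks (a simplification of a compressed radix tree); `PrefixCache`, `Node`, `block_id`, and the block size of 4 are hypothetical names and values chosen for illustration, and the block ids stand in for real cached key/value tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Maps one full block of token ids to the child node that owns it.
    children: Dict[Tuple[int, ...], "Node"] = field(default_factory=dict)
    # Placeholder for a handle to the cached KV block (hypothetical).
    block_id: Optional[int] = None


class PrefixCache:
    """Block-level prefix trie: a simplified stand-in for a radix tree."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.root = Node()
        self.next_block_id = 0

    def _blocks(self, tokens: List[int]) -> List[Tuple[int, ...]]:
        # Only full blocks are cacheable; a trailing partial block is ignored.
        full = len(tokens) - len(tokens) % self.block_size
        return [tuple(tokens[i:i + self.block_size])
                for i in range(0, full, self.block_size)]

    def match_prefix(self, tokens: List[int]) -> List[int]:
        """Return ids of cached blocks covering the longest shared prefix."""
        node, ids = self.root, []
        for block in self._blocks(tokens):
            child = node.children.get(block)
            if child is None:
                break
            ids.append(child.block_id)
            node = child
        return ids

    def insert(self, tokens: List[int]) -> int:
        """Register a request's blocks; return how many were newly added."""
        node, added = self.root, 0
        for block in self._blocks(tokens):
            child = node.children.get(block)
            if child is None:
                child = Node(block_id=self.next_block_id)
                self.next_block_id += 1
                node.children[block] = child
                added += 1
            node = child
        return added


# Two requests sharing an 8-token system prompt (token ids are made up).
cache = PrefixCache(block_size=4)
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
req_a = system_prompt + [9, 10, 11, 12]
req_b = system_prompt + [20, 21, 22, 23]

new_blocks = cache.insert(req_a)
reused = cache.match_prefix(req_b)
# Tokens of req_b that still need prefill after reusing cached blocks.
to_prefill = len(req_b) - len(reused) * cache.block_size
```

In this sketch the second request reuses the two blocks covering the shared system prompt and only its final block would need prefill, which is the effect the summary attributes to a growing shared-prefix fraction. A production cache would also need eviction and reference counting for concurrent requests, which are omitted here.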

June 03, 2026 07:20 Source: Lobsters