Two Leaps to 1000 Tokens/s on a 1T-Parameter Model (tilert.ai) AI

TileRT argues that reaching 1000+ tokens per second on a large model (up to 1T parameters) takes two shifts. The first is moving from kernel- and operator-level tuning to a persistent, continuously running execution engine that removes microsecond-scale “execution gaps.” The second is hardware–model co-design to eliminate microsecond-scale overheads in components such as RMSNorm, RoPE, KV-cache writes, and multi-token prediction.
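
To see why microsecond-scale gaps matter at this throughput, a rough back-of-the-envelope sketch may help. At 1000 tokens per second with one token per decode step, each step has about 1000 microseconds of wall-clock time. The layer count, kernels per layer, and gap size below are hypothetical values chosen only for illustration; they are not figures from the article, and the function names are made up for this sketch.

```python
def step_budget_us(tokens_per_second: float, tokens_per_step: float = 1.0) -> float:
    """Wall-clock time available per decode step, in microseconds."""
    return tokens_per_step * 1_000_000 / tokens_per_second


def gap_fraction(num_launches: int, gap_us: float, budget_us: float) -> float:
    """Fraction of the per-step budget spent idle between launches."""
    return num_launches * gap_us / budget_us


# Target from the article: 1000 tokens/s. One token per step is an assumption.
budget = step_budget_us(1000.0)

# Hypothetical workload (NOT from the article):
# 60 layers x 10 separately launched kernels, 2 us of idle gap per launch.
fraction = gap_fraction(60 * 10, 2.0, budget)

print(f"budget per step: {budget:.0f} us")
print(f"share of budget lost to gaps: {fraction:.0%}")
```

Under these assumed numbers the idle gaps alone exceed the whole per-step budget, which is the kind of arithmetic behind the case for a persistent engine over per-kernel tuning. Emitting more than one token per step (for example via multi-token prediction) raises the budget proportionally, as the `tokens_per_step` parameter shows.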

June 08, 2026 18:35 Source: Hacker News