KVarN: Native vLLM KV-cache quantization back end by Huawei (github.com)
Huawei has released KVarN, an Apache-licensed KV-cache quantization backend built natively into vLLM. It aims to raise long-context capacity by 3–5x while keeping accuracy at FP16 level and delivering higher throughput than FP16. It needs no calibration and is enabled with a single flag. The method is built on variance normalization, combining channel rotation with per-channel variance normalization. The project reports accuracy matching FP16 on Qwen3-32B alongside higher throughput than FP16, and it documents its implementation along with a dedicated kv-cache dtype preset for deployment.
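The summary names channel rotation and variance normalization but not their exact form. As a rough illustration of how such a pipeline can fit together, here is a minimal pure-Python sketch. It assumes an orthonormal Walsh-Hadamard transform as the rotation, per-channel standard-deviation scaling computed from the tokens themselves (so no calibration set), and symmetric per-token integer quantization. These choices and every name here (`hadamard`, `quantize`, `dequantize`) are hypothetical. This is not KVarN's implementation or API.

```python
import math
import random

def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform (hypothetical choice of rotation).
    len(x) must be a power of two; applying it twice returns the input."""
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]

def quantize(tokens, bits=4):
    """Sketch only: rotate, scale each channel to unit std, then quantize per token."""
    rotated = [hadamard(t) for t in tokens]
    dim = len(rotated[0])
    # Per-channel std over the tokens in this block, computed on the fly (no calibration data).
    std = []
    for c in range(dim):
        col = [r[c] for r in rotated]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std.append(math.sqrt(var) + 1e-6)
    qmax = 2 ** (bits - 1) - 1
    codes, scales = [], []
    for r in rotated:
        normed = [v / s for v, s in zip(r, std)]
        # Symmetric per-token scale; fall back to 1.0 for an all-zero vector.
        scale = max(abs(v) for v in normed) / qmax or 1.0
        codes.append([max(-qmax, min(qmax, round(v / scale))) for v in normed])
        scales.append(scale)
    return codes, scales, std

def dequantize(codes, scales, std):
    """Undo each step in reverse: rescale, restore channel std, inverse-rotate."""
    out = []
    for q, scale in zip(codes, scales):
        rotated = [v * scale * s for v, s in zip(q, std)]
        out.append(hadamard(rotated))  # orthonormal Hadamard is its own inverse
    return out

# Usage: 16 key vectors of width 8, with channel 3 carrying a large-variance outlier.
random.seed(0)
keys = [[random.gauss(0, 1) * (10 if c == 3 else 1) for c in range(8)] for _ in range(16)]
codes, scales, std = quantize(keys, bits=4)
recon = dequantize(codes, scales, std)
err = max(abs(a - b) for k, r in zip(keys, recon) for a, b in zip(k, r))
print(f"max abs reconstruction error: {err:.3f}")
```

Rotation spreads a single outlier channel across all channels, and the variance step evens out the remaining per-channel spread, so one per-token scale wastes fewer quantization levels. For storage, 4-bit codes take a quarter of the bits of FP16 values before scale and statistics overhead, which is one way capacity gains of the reported magnitude can arise.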
June 04, 2026 15:45
Source: Hacker News