Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency (blog.google) AI

Google says on its blog, The Keyword, that it is releasing Gemma 4 checkpoints optimized with quantization-aware training (QAT) to cut memory requirements and improve on-device performance. The checkpoints include Q4_0 and a mobile-specialized quantization format, which Google claims reduces the Gemma 4 E2B memory footprint to about 1 GB. The post describes how QAT and custom mobile quantization strategies aim to preserve quality while reducing VRAM and storage use, and notes support across tools including Hugging Face, llama.cpp, vLLM, LiteRT-LM, and Transformers.js.
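For a rough sense of why 4-bit formats shrink a checkpoint, the sketch below estimates weight storage under a Q4_0-style layout. It assumes the block layout used by llama.cpp's Q4_0 (32 weights per block, stored as 16 bytes of 4-bit values plus one 16-bit scale, so 18 bytes per block, or 4.5 bits per weight). The parameter count is a hypothetical placeholder, not a figure from the post, and the estimate ignores anything kept at higher precision as well as the KV cache and activations. It does not model the mobile-specialized format, whose details the summary does not give.

```python
def q4_0_bytes(num_weights: int, block_size: int = 32) -> int:
    """Approximate bytes needed to store num_weights in a Q4_0-style layout.

    Assumption: each block holds block_size 4-bit values (block_size // 2 bytes)
    plus one 16-bit scale (2 bytes). A partial final block is padded to a full one.
    """
    blocks = -(-num_weights // block_size)  # ceiling division
    return blocks * (block_size // 2 + 2)


def fp16_bytes(num_weights: int) -> int:
    """Bytes needed to store the same weights as 16-bit floats."""
    return num_weights * 2


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only for illustration.
    n = 2_000_000_000
    q4 = q4_0_bytes(n)
    fp16 = fp16_bytes(n)
    print(f"Q4_0-style: {q4 / 1e9:.3f} GB")
    print(f"fp16:       {fp16 / 1e9:.3f} GB")
    print(f"ratio:      {q4 / fp16:.3f}")
```

Under these assumptions the 4-bit layout needs a little over a quarter of the fp16 storage; the per-block scale is why the ratio is 4.5/16 rather than 4/16.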

June 05, 2026 17:10 Source: Hacker News