Gemma 4 12B: A unified, encoder-free multimodal model (blog.google)

Google DeepMind announced Gemma 4 12B, a unified, “encoder-free” multimodal model that routes vision and native audio inputs directly into the LLM backbone and is designed to run locally on laptops with 16 GB of VRAM or unified memory.

June 03, 2026 16:25 Source: Hacker News