A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU) (point.free) AI
Point.free describes running a 26B Gemma 4 model, with MTP (multi-token prediction) drafters and a verifier, on a 2016-era Xeon server with no GPU. It argues that CPU inference is limited mainly by DDR3 memory bandwidth, and works around that limit with llama.cpp options for speculative decoding, CPU-focused MoE routing, and weight repacking and KV-cache handling.
June 01, 2026 07:55
Source: Hacker News