RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8 (imil.net) AI

The post describes a local multi-GPU Linux setup that pairs an RTX 5080 with an RTX 3090 to run Qwen 3.6 27B quantized to Q8. It covers BIOS/PCIe settings, driver considerations, and llama.cpp build and run parameters. The author reports throughput of around 80–90 tokens per second with speculative decoding enabled (draft MTP/“ngram-mod”).

June 13, 2026 13:15 Source: Hacker News