Salomi, a research repo on extreme low-bit transformer quantization (github.com) AI
Salomi is a GitHub research repo exploring extreme low-bit (near-binary) transformer quantization and inference for GPT-2–class models, with code, experiments, and evaluation tooling. It tests whether strict 1.00 bpp (bits per parameter) post-hoc binary quantization can match or beat higher-bit quantization baselines, and concludes that the claim does not hold up under rigorous evaluation. Instead, the repo reports more credible results at around 1.2–1.35 bpp using methods such as Hessian-guided vector quantization, mixed precision, and magnitude recovery, and it directs readers to its curated assessment and validation documents rather than older drafts.
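To make the bit budgets concrete, here is a minimal sketch of generic sign (1-bit) quantization with one scale per row, plus the storage arithmetic behind a bpp figure, reading bpp as bits per parameter. It is not taken from the Salomi repo: the function names, the 16-bit scale, the 768x768 matrix shape, and the 5% / 4-bit mixed-precision split are all illustrative assumptions.

```python
# Illustrative sketch only; names and numbers are assumptions, not Salomi's code.

def binarize_row(row):
    """Sign-quantize one weight row, keeping a single scale (mean |w|)."""
    scale = sum(abs(w) for w in row) / len(row)
    signs = [1 if w >= 0 else -1 for w in row]
    return signs, scale

def dequantize_row(signs, scale):
    """Reconstruct approximate weights from the signs and the row scale."""
    return [s * scale for s in signs]

def effective_bpp(n_rows, n_cols, bits_per_weight=1.0, scale_bits=16):
    """Bits per parameter once the per-row scale overhead is counted."""
    n_weights = n_rows * n_cols
    total_bits = n_weights * bits_per_weight + n_rows * scale_bits
    return total_bits / n_weights

def mixed_precision_bpp(frac_high, high_bits, low_bits=1.0):
    """Average bits per parameter when a fraction of weights keeps more bits."""
    return (1.0 - frac_high) * low_bits + frac_high * high_bits

row = [0.5, -1.5, 1.0, -1.0]
signs, scale = binarize_row(row)
approx = dequantize_row(signs, scale)  # magnitudes collapse to one value per row

print(signs, scale, approx)
print(effective_bpp(768, 768))         # slightly above 1.0 due to the scales
print(mixed_precision_bpp(0.05, 4))    # a small higher-bit share raises the average
```

Under these assumptions, even plain sign quantization sits slightly above 1.00 bpp once scales are stored, and keeping a small share of weights at higher precision moves the average toward the range the summary describes.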
April 02, 2026 04:58
Source: Hacker News