Bias Compounds, Variance Washes Out (convergentthinking.sh) AI

The post argues that storing BF16 optimizer state with round-to-nearest can introduce a repeating rounding bias that compounds over training steps and causes training to plateau. Stochastic rounding, by contrast, produces zero-mean errors that largely cancel over time, and in the post's teacher-student MLP experiment it improves convergence.

June 01, 2026 06:15 Source: Hacker News
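The mechanism in the summary can be reproduced in miniature. The sketch below is not the post's code and does not reproduce its teacher-student MLP. It assumes a single scalar stored in BF16 that receives the same small update (1e-4) on every step. BF16 is emulated by keeping the top 16 bits of a float32 bit pattern through Python's `struct` module. The names `bf16_rtn`, `bf16_sr`, and `accumulate` are hypothetical helpers written for illustration, and NaN and infinity are not handled.

```python
import random
import struct


def f32_bits(x: float) -> int:
    # Round x to float32 and reinterpret its bytes as an unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    # Reinterpret the low 32 bits of b as a float32 value.
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def bf16_rtn(x: float) -> float:
    # Round-to-nearest-even onto the BF16 grid (the top 16 bits of float32).
    # Assumption: finite inputs only; NaN/inf are not handled in this sketch.
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)
    return bits_f32(b & 0xFFFF0000)


def bf16_sr(x: float, rng: random.Random) -> float:
    # Stochastic rounding: add uniform noise over the 16 bits BF16 discards,
    # then truncate. This rounds up with probability equal to the discarded
    # fraction, so the expected result equals the float32 input.
    b = f32_bits(x)
    b += rng.getrandbits(16)
    return bits_f32(b & 0xFFFF0000)


def accumulate(rounder, steps: int = 10_000, update: float = 1e-4, start: float = 1.0) -> float:
    # Hypothetical toy: apply the same small update repeatedly and re-round the
    # stored value after every step, like a BF16 state entry with a tiny increment.
    w = start
    for _ in range(steps):
        w = rounder(w + update)
    return w


rng = random.Random(0)
w_rtn = accumulate(bf16_rtn)
w_sr = accumulate(lambda x: bf16_sr(x, rng))
print(f"exact ~ {1.0 + 10_000 * 1e-4:.4f}  rtn = {w_rtn:.4f}  sr = {w_sr:.4f}")
```

Near 1.0 the BF16 spacing is 2^-7, so any update smaller than half of that (2^-8, about 0.0039) rounds straight back under round-to-nearest. The RTN run therefore never leaves 1.0, because it makes the same -1e-4 error on every step and that error compounds to the whole missing 1.0. In the stochastic run the per-step errors are zero-mean, so the result lands near 2.0. Its spread grows roughly with the square root of the step count instead of linearly, which is the "variance washes out" half of the title.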