MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU (arxiv.org)

MegaTrain is a proposed training system that enables full-precision training of 100B+ parameter LLMs on a single GPU by keeping model parameters and optimizer states in CPU host memory and streaming them layer-by-layer to the GPU for computation. The method uses double-buffered pipelining to overlap parameter prefetching, gradient computation, and offloading, and it avoids persistent autograd graphs via stateless layer templates. Reported results include training models of up to 120B parameters on an NVIDIA H200 with 1.5 TB of host memory, as well as higher throughput than DeepSpeed ZeRO-3 with CPU offloading on smaller models.
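The double-buffered pipelining idea can be illustrated with a minimal sketch. This is not MegaTrain's implementation: `fetch_to_device`, `compute`, and `stream_forward` are hypothetical stand-ins, with a background thread prefetching the next layer's weights into a one-slot buffer while the current layer is being computed, so that the host-to-GPU copy of layer i+1 overlaps the compute of layer i.

```python
import threading
from queue import Queue

# Hypothetical sketch: each "layer" lives in host memory as a plain list of
# floats. fetch_to_device stands in for an async host-to-GPU transfer, and
# compute stands in for the layer's forward pass. The one-slot queue acts as
# the second buffer, letting prefetch of the next layer overlap compute of
# the current one.

def fetch_to_device(layer_weights):
    # Stand-in for a host-to-device copy: just duplicate the weights.
    return list(layer_weights)

def compute(activations, device_weights):
    # Stand-in for a layer's forward pass: elementwise multiply.
    return [a * w for a, w in zip(activations, device_weights)]

def stream_forward(layers, activations):
    """Run a forward pass, prefetching each next layer while computing."""
    prefetched = Queue(maxsize=1)  # one slot = the second buffer

    def prefetcher():
        for layer in layers:
            prefetched.put(fetch_to_device(layer))  # blocks if slot is full

    t = threading.Thread(target=prefetcher)
    t.start()
    for _ in layers:
        device_weights = prefetched.get()  # wait for the prefetched buffer
        activations = compute(activations, device_weights)
    t.join()
    return activations
```

In a real system the prefetch would be an asynchronous copy on a separate CUDA stream rather than a Python thread, but the structure is the same: compute never waits on a transfer unless the prefetcher has fallen behind.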

April 08, 2026 12:45 Source: Hacker News