Benchmarks in Leipzig (arxiv.org) AI
“Benchmarks in Leipzig” reports a new dataset of 100 research-level mathematics questions compiled by 49 mathematicians during a three-day workshop in Leipzig, and tracks the outcomes for those questions across multiple stages of LLM evaluation.
June 06, 2026 14:10
Source: Hacker News