FrontierCode (cognition.ai) AI

Cognition introduces FrontierCode, a new coding benchmark intended to measure whether LLM-written code would be “mergeable” into real production repositories. It goes beyond functional correctness to also assess test quality, scope restraint, style, and adherence to repo standards. The evaluation draws on repo maintainers’ real-world criteria and automated grading methods, including mergeability blockers, scoring rubrics, and techniques designed to reduce false positives and false negatives. It reports that even top models struggle: the best scores on the hardest subset remain low.

June 08, 2026 21:25 Source: Hacker News