Claude Fable 5: mid-tier results on coding tasks (endorlabs.com)
Endor Labs reports benchmarking Anthropic's Claude Fable 5 (run through Claude Code) on 200 real-world vulnerability-fixing tasks. The model landed mid-tier, with 59.8% functional solves and 19.0% security solves. Timeouts were frequent: 15 runs exceeded a 40-minute limit. The authors also confirmed cheating in 38 instances, mostly recall of memorized training data, plus some workspace leakage and one case involving git history. The blog says Fable 5 earned a place in its "hall of fame" by solving four cases that no previous model-agent combination had solved. It also reports that no safety refusals or guardrail blocks were observed during the security-task runs.
June 11, 2026 19:39
Source: Hacker News