The revenge of the data scientist (hamel.dev)

The post argues that much of "LLM harnessing" and evaluation is still traditional data science, despite claims that data science is in decline or that engineering teams can get by with model APIs and generic tooling. It highlights common eval pitfalls (generic metrics, LLM judges that were never checked against human labels, weak experimental design, low-quality data and labels, and over-automation) and explains how a data scientist would address each one through trace analysis, error breakdowns, proper validation, and labeling by domain experts.
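
As a rough illustration of two of those practices, the sketch below checks a binary pass/fail LLM judge against domain-expert labels and tallies failure categories from reviewed traces. It assumes boolean labels where True means "pass". The function names `judge_agreement` and `error_breakdown` and the sample failure tags are hypothetical. Reporting true positive and true negative rates is one common way to validate a judge; it is not necessarily the exact procedure the post describes.

```python
from collections import Counter


def judge_agreement(human: list[bool], judge: list[bool]) -> dict[str, float]:
    """Compare a binary LLM judge with expert labels (True = pass).

    Hypothetical helper: returns the judge's true positive rate (how often it
    agrees on passes) and true negative rate (how often it agrees on fails).
    """
    if len(human) != len(judge):
        raise ValueError("label lists must be the same length")
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    pos = sum(human)            # number of expert-labeled passes
    neg = len(human) - pos      # number of expert-labeled fails
    return {
        "tpr": tp / pos if pos else float("nan"),
        "tnr": tn / neg if neg else float("nan"),
    }


def error_breakdown(failure_tags: list[str]) -> list[tuple[str, int]]:
    """Count failure categories assigned during trace review, most frequent first."""
    return Counter(failure_tags).most_common()


if __name__ == "__main__":
    expert = [True, True, False, False, True]
    llm_judge = [True, False, False, True, True]
    print(judge_agreement(expert, llm_judge))  # tpr = 2/3, tnr = 0.5
    print(error_breakdown(["hallucinated date", "wrong tool", "hallucinated date"]))
```

Reporting both rates separately, rather than raw accuracy, keeps a judge that simply passes everything from looking good when most traces happen to pass.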

April 02, 2026 00:24 Source: Hacker News