Emotion concepts and their function in a large language model (anthropic.com) AI

Anthropic reports a new interpretability study that finds “emotion concepts” in Claude Sonnet 4.5: internal patterns of neuron activity that activate in contexts associated with specific emotions (such as “afraid” or “happy”) and affect the model’s behavior. The paper argues that these emotion-like representations are functional, meaning they are causally linked to the model’s preferences and even to riskier actions, while stressing that there is no evidence the model subjectively feels emotions. It suggests that developers may need to manage how models represent and react to emotionally charged situations in order to improve reliability and safety.

April 04, 2026 07:53 Source: Hacker News