Dario Amodei, CEO of Anthropic, has claimed that artificial intelligence (AI) models “hallucinate” less frequently than humans do. He made the assertion during the company’s inaugural Code With Claude event on Thursday, where Anthropic introduced two new Claude 4 models along with enhanced features, including advances in memory and tool use. Amodei also addressed criticisms of AI, suggesting that many of the concerns raised are overstated.
Anthropic CEO Downplays AI Hallucinations
In a recent briefing, Amodei addressed the issue of AI hallucinations, indicating that they are not a barrier to the pursuit of artificial general intelligence (AGI). When asked for clarification, he remarked, “It really depends how you measure it, but I suspect that AI models probably hallucinate less than humans; however, they can do so in more unexpected ways.”
Amodei went on to note that TV broadcasters, politicians, and professionals in many fields routinely make mistakes, arguing that the errors AI makes do not negate its overall intelligence. He did, however, acknowledge that AI’s tendency to present incorrect answers with confidence poses a significant challenge.
Earlier this month, Anthropic faced legal scrutiny when its Claude chatbot generated an erroneous citation that appeared in a court filing, as reported by Bloomberg. The incident occurred in an ongoing lawsuit brought by music publishers against Anthropic, alleging copyright infringement over the lyrics of more than 500 songs.
In an essay published in October 2024, Amodei expressed optimism that AGI could arrive as early as 2026. AGI refers to a category of AI capable of comprehending, learning, and applying knowledge across a broad spectrum of tasks without human intervention.
As part of those aspirations, Anthropic unveiled Claude Opus 4 and Claude Sonnet 4 at the developer conference. The new models show significant gains in coding, tool use, and writing, with Claude Sonnet 4 scoring 72.7 percent on the SWE-bench benchmark, a state-of-the-art result in code generation.