Google is reportedly evaluating the performance of its Gemini AI by comparing its responses with those generated by Anthropic’s Claude models. According to sources, the Mountain View-based company has instructed the third-party contractors who assess the quality of Gemini’s outputs to also judge whether Claude’s responses are superior. While benchmarking against competitors is not unusual, using a rival’s model for this purpose typically requires that company’s approval, and it remains unclear whether Google has secured permission from Anthropic.
Evaluating Gemini Against Claude
A recent report from TechCrunch indicates that contractors working on the Gemini project are actively comparing its outputs against those from Claude, a direct competitor. The publication claims to have reviewed internal communications suggesting that these comparisons are being carried out as directed by Google.
Contractors reportedly have up to 30 minutes per prompt to decide whether they prefer the output from Gemini or Claude. Many of these contractors are subject-matter experts who assess the chatbots’ answers on a range of technical and niche topics, judging them on criteria such as accuracy, clarity, and thoroughness.
As the evaluation process progressed, some contractors noticed that Gemini’s outputs began referencing Anthropic and Claude. One response allegedly stated, “I am Claude, created by Anthropic.” The incident raises questions about whether Claude’s outputs are being used to improve Gemini rather than merely to benchmark it.
Under Anthropic’s commercial terms, customers are prohibited from using Claude to build “competing products or services” or to train “competing AI models” without the company’s prior approval.
In a statement to TechCrunch, Google DeepMind spokesperson Shira McNamara confirmed that the team compares model outputs as part of its evaluation process but denied that Gemini is trained on Anthropic’s models.
Additionally, a previous report suggested that contractors assessing Gemini’s responses were instructed to rate outputs outside their areas of expertise, and that a provision previously allowing them to skip such questions has reportedly been removed.