OpenAI’s recent livestream unveiling GPT-5 included impressive charts showcasing the model’s capabilities. On closer inspection, however, some of the graphs raised questions about their accuracy.
One notable chart, intended to illustrate GPT-5’s performance in “deception evaluations,” displayed inconsistent scaling. The graph listed a 50.0 percent deception rate for GPT-5 with reasoning, yet the bar for o3’s lower 47.4 percent score was drawn larger. Notably, OpenAI’s blog post provides what appear to be corrected figures, listing GPT-5’s deception rate at 16.5 percent.
Another chart presented during the livestream listed one GPT-5 score as lower than o3’s, yet rendered both with equal-sized bars. The discrepancies were glaring enough that CEO Sam Altman called the incident a “mega chart screwup,” though he noted that the accurate information is available in the company’s blog.
A member of OpenAI’s marketing team issued an apology, stating, “We fixed the chart in the blog, guys, apologies for the unintentional chart crime.”
In a follow-up discussion on Reddit, Altman addressed user questions about the graphs. He confirmed that the underlying numbers were accurate but acknowledged that the bar charts shown during the livestream were wrong, noting that errors appeared on another slide as well. He attributed the mistakes to fatigue, saying that “a lot comes together for a livestream in the last hours.”
The mishap puts OpenAI in an awkward position on a major launch day, particularly as the company touts “significant advances in reducing hallucinations” in its latest model.
Update, August 8th: Added Reddit comment from Altman.