Recent findings reveal a clear preference among journalists for human-written summaries over those produced by artificial intelligence, particularly ChatGPT. In a quantitative survey of these professionals, AI-generated summaries received an average score of just 2.26 on a five-point scale when respondents were asked whether the summaries could realistically be included in their summary collections, and an even lower average of 2.14 when rated on how compelling they were. Only one summary earned a perfect score of ‘5’ in either category, while 30 ratings were a ‘1’.
Beyond the numeric ratings, journalists offered qualitative feedback on their concerns about ChatGPT’s summaries. Common critiques included the AI’s tendency to conflate correlation with causation and its failure to provide necessary context, such as the inherent slowness of soft actuators. The summaries also leaned on overused adjectives like “groundbreaking” and “novel,” although targeted prompts somewhat alleviated this habit.
The research indicated that while ChatGPT could accurately transcribe information from scientific papers of limited complexity, it struggled to translate findings in a meaningful way. These challenges were especially evident when it summarized papers with conflicting results or was asked to condense two interconnected studies.
Although the AI’s tone and style frequently resembled human writing, journalists remained concerned about factual accuracy. They noted that using ChatGPT summaries even as a starting point would demand as much effort as, if not more than, writing summaries from scratch, largely because of the extensive fact-checking required.
These results echo earlier studies highlighting the shortcomings of AI search engines, which have been found to cite inaccurate news sources up to 60% of the time. Such inaccuracies are especially troubling in the realm of scientific literature, where precision and clear communication are crucial.
Ultimately, the AAAS journalists concluded that ChatGPT does not meet the standards required for briefs in the SciPak press package, though they left open the possibility of re-evaluating the technology should it improve significantly. Notably, GPT-5 was publicly released in August, suggesting that such advancements may be on the horizon.