Could artificial intelligence one day review scientific articles, or even carry out what we call peer review? For now, the first scientists surveyed on the question say they prefer ChatGPT’s judgments to those of their colleagues.
Recall that this review is a fundamental part of the process by which scientific knowledge is built: whether it is a supposed discovery, a hypothesis or a claim, it must be supported by published research. And ideally, that research should have been reviewed by other experts in the field before publication. This is called “peer review.”
However, the process has its limits: experts who understand the research have to be found, and those experts have to have the time. Typically, months pass between a researcher submitting an article to an academic journal and its publication.
Could ChatGPT replace human reviewers? Researchers led by James Zou, an expert in machine learning at Stanford University in California, wanted to find out. They asked GPT-4, the model behind ChatGPT, to provide “constructive criticism” of more than 3,000 studies published in 2022-2023 in journals of the Nature group (hence peer-reviewed) and of 1,700 articles from an international conference on machine learning (the International Conference on Learning Representations). They then compared the robot’s comments with those of the human reviewers. In a second step, they asked ChatGPT to do the same for a few hundred articles that had not been reviewed by anyone, and asked about 300 of their authors (all in the field of artificial intelligence or IT) to rate the robot’s criticism of their work.
In the abstract of their paper, posted as a preprint on the arXiv server on October 3rd (which, ironically, means that it has not been peer-reviewed), they first write that for more than half of the published texts and for more than three quarters (77%) of the conference texts, the robot pointed out things that at least one of the experts had also pointed out. However, the strongest result concerns the unpublished and unreviewed texts: 82% of authors said they found ChatGPT’s critique more useful than the critiques they had received in the past on other research papers.
However, the work has significant limitations. In the second part of the research, the authors’ assessment of the robot’s criticism is purely subjective and does not allow any comparison with the criticism that a person doing the same work would have expressed. As for the first part, it gives few details about which points ChatGPT failed to identify but the human reviewers raised. It remains to be seen whether scientific journals could be satisfied with these AI reviews without the assurance that the robot hasn’t “forgotten” anything important, although this question also arises in real peer review, which is far from infallible.