Why Using AI for Research Papers is a Bad Idea


Artificial intelligence (AI) is, without a doubt, the most important topic right now when it comes to copyright, authorship and attribution issues. The promise that AI can speed up or improve the writing process has made it an attractive tool.

However, concerns about the quality of that writing, the use of copyright-protected work to train AI models and the potential for AI to deceive readers have dampened that enthusiasm. 

To that end, a “third rail” topic has been the use of AI systems when creating scientific review articles. Several publishers, including Sage and Wiley, have released policies and/or statements on the topic that both limit the use of AI and require transparency for any use that does occur.

But is AI actually useful for drafting scientific papers? That is what a recent paper by Melissa A. Kacena, Lilian I. Plotkin and Jill C. Fehrenbacher, all from the Indiana University School of Medicine, sought to find out.

They took three topics, all related to musculoskeletal health, and commissioned three separate papers on each. One was to be written entirely by humans, another through a human/AI hybrid approach and the third by AI alone.

After the papers were created, the text was reviewed, and the citations were checked for accuracy. What they found did not bode particularly well for ChatGPT, or AI systems broadly. 

The Findings of the Paper 

The idea behind the study was simple: to find out whether using AI reduced the time it took to write the papers and/or improved their quality.

This was done by taking nine students and separating them into three groups. Three would write the papers themselves as humans, three would work with AI using a hybrid approach and three more would simply use ChatGPT to generate the papers wholesale.

The study used ChatGPT 4.0, the latest ChatGPT model as of this writing, and all papers were reviewed by members of the faculty. 

The AI, in its biggest win, received the highest marks for readability. While this was not specifically quantified in the report, the AI used repetition to make its papers easier to read and understand, even if that repetition was sometimes excessive.

However, that was the only major victory for ChatGPT in this study. 

According to the report, inaccuracies in the AI papers were “high,” with up to 70% of the references being incorrect. This meshes with other research that has found ChatGPT routinely either cites incorrect sources or fabricates nonexistent ones. That is, obviously, not acceptable in a research environment.
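
The paper does not describe how the references were verified, but as a rough illustration of how one might screen for fabricated citations, here is a minimal Python sketch that checks whether a DOI resolves to a real record via the public Crossref REST API (api.crossref.org). The DOI list is a placeholder, and note the limitation: a DOI that resolves can still be mis-attributed, so a check like this only catches outright fabrications.

```python
# Illustrative sketch: flag references whose DOIs do not resolve to any
# record in Crossref. A real workflow would first parse the DOIs out of
# the manuscript's reference list.
import urllib.request
import urllib.error

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Crossref returns 404 for DOIs it has no record of.
        return False

dois = ["10.1000/example.doi"]  # placeholder list of extracted DOIs
bad = [d for d in dois if not doi_exists(d)]
print(f"{len(bad)} of {len(dois)} references could not be verified")
```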

To make matters worse, the hybrid papers were the ones that scored the highest on similarity indexes, warning of possible plagiarism. Though it does not appear that full plagiarism analyses were performed, the higher similarity scores indicate that those papers likely contain more copied material.
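
The report likewise does not name the similarity-checking tool it used, but as a toy illustration of what a similarity index measures, here is a sketch that computes the fraction of a draft's word 5-grams that also appear in a source document. Commercial checkers build on far more sophisticated versions of this overlap idea.

```python
# Toy similarity index: the share of a draft's word 5-grams that also
# appear in a source text. Higher values suggest more copied material.
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_index(draft: str, source: str, n: int = 5) -> float:
    """Fraction of the draft's n-grams found verbatim in the source."""
    draft_grams = ngrams(draft, n)
    if not draft_grams:
        return 0.0
    return len(draft_grams & ngrams(source, n)) / len(draft_grams)

draft = "the effects of covid-19 on musculoskeletal health remain under active study"
source = "reviews note that the effects of covid-19 on musculoskeletal health remain under active study"
print(f"similarity: {similarity_index(draft, source):.0%}")
```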

Furthermore, the work of the AI was seen as incomplete and sloppy. ChatGPT routinely struggled to draw connections between concepts, something human authors are expected to do.

However, the biggest limitation was that ChatGPT 4.0 only has information up to 2021. This made it impossible for any of the AI-generated papers to examine recent literature. That was especially challenging for one of the three papers, which dealt with COVID-19’s impact on musculoskeletal health.

In short, though AI generation was much faster, it was far less accurate and would require a great deal more editing and correction to produce anything that would qualify as a legitimate work of scientific publishing.

Limitations and What This Means Moving Forward 

The biggest limitation of this study is sample size. Nine papers across three topics in a single field of study does not give us much in the way of statistical power. It also does not help that many elements of the analysis were not quantified and are subject to reviewer opinion.

However, if we treat the paper more as a case study, it points to some interesting issues within academic publishing when it comes to AI.  

First off, using AI is a tradeoff. While AI can speed up the writing process, producing a quality paper that is valid requires spending significantly more time on editing, fact-checking and rewriting. The more time one spends there, the more the benefit of AI shrinks, making it less and less worth the risk.

Second, AI clearly has an issue with identifying sources. Between incorrect citations and outright fabrications, AI, and ChatGPT in particular, is unreliable at citing its sources.

Finally, I found it particularly interesting that it was the hybrid papers that had the highest instances of overlapping text. Those papers were written using AI-generated text, but that text was based on human-created documents and outlines. This suggests that ChatGPT may be more prone to copying text verbatim when it is given documents to review.

Overall, it is an interesting case study and, while it will not be the final word on the matter, it paints a bleak picture of the usefulness of AI for academic publishing. 

Bottom Line 

The authors of the paper make it clear that they feel that AI systems are here to stay and that they will likely improve over time, including addressing many of the shortcomings highlighted in this paper. 

While that is likely true, AI is being used to both write and help write research papers right now. To that end, this study shows why that is a bad idea. 

As we’ve seen in the legal field, using AI in a space where accuracy is paramount is an unwise move, at least without significant fact-checking and editing.

To that end, if you are looking to use AI in this capacity, it is best to think of it as adding another coauthor: one whose role you must define, whose contributions you must attribute, and whose work you must check and verify.

But with that in mind, do you really want a coauthor that is prone to fabrication, incapable of drawing independent conclusions, at risk of committing plagiarism and unable to explain how or why it produced what it did?

If ChatGPT were a human coauthor, I doubt many would be eager to seek out its expertise. That is likely how it should be for the computer version, at least until AI makes some significant strides.
