Researcher Claims 2% of Published Papers Resemble Paper Mill Works

Jonathan Bailey, November 9, 2023

A recent article by Richard Van Noorden in Nature highlights the work of Adam Day. Day is the director of a scholarly data services company named Clear Skies, which has developed Papermill Alarm, a tool that attempts to identify journal articles and submissions that came from paper mills.

According to Day, this is achieved by looking at patterns and similarities between published papers and known paper mill-generated works. He claims that this is the best approach for detecting paper mill works, since these works are created in large batches and, generally, carry formatting, styling and other similarities between them.
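To make the general idea of similarity-based screening concrete, here is a minimal sketch. It is not how Papermill Alarm works, since the tool's method has not been made public. It swaps in a simple, well-known technique (word shingles compared with Jaccard similarity), and the function names and sample sentences are all hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(candidate, known_works, size=3):
    """Highest similarity between a candidate and any known paper mill text."""
    cand = shingles(candidate, size)
    return max((jaccard(cand, shingles(k, size)) for k in known_works), default=0.0)


# Invented sample sentences, for illustration only.
known = ["the expression of gene x was significantly upregulated in tumor tissues"]
templated = "the expression of gene y was significantly upregulated in tumor tissues"
unrelated = "we survey household savings behaviour across three decades"

print(max_similarity(templated, known))  # partial overlap, between 0 and 1
print(max_similarity(unrelated, known))  # no shared shingles
```

A real screening system would need far more than this, including a threshold chosen to keep legitimate papers that share a common template from being flagged.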

Day used his software to analyze some 48 million papers published since 2000. He found that some 400,000 of those articles share “strong textual similarities” with known paper mill studies, including some 70,000 in 2022 alone. All in all, the analysis estimates that between 1.5% and 2% of all papers published in 2022 resemble paper mill works.

Day also examined a smaller subset of 2.85 million works published in 2022 where information about the subject area was available. In that subset, he found that 2.2% resembled paper mill studies, but with wide variation by subject: medicine and biology came in at over 3%, while economics was well below 0.5%.
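For readers who want to see how these figures relate to one another, the short calculation below works through the arithmetic. It uses only the rounded numbers quoted above. The implied total for 2022 is a back-of-the-envelope figure that assumes the 70,000 flagged papers and the 1.5% to 2% estimate describe the same pool of papers, which the source does not state.

```python
# Rounded figures as quoted in the article.
TOTAL_PAPERS_SINCE_2000 = 48_000_000
FLAGGED_SINCE_2000 = 400_000
FLAGGED_IN_2022 = 70_000
RATE_2022_LOW, RATE_2022_HIGH = 0.015, 0.02
SUBSET_2022 = 2_850_000
SUBSET_RATE = 0.022

# Share of all papers since 2000 that resemble paper mill works.
overall_share = FLAGGED_SINCE_2000 / TOTAL_PAPERS_SINCE_2000

# Assumption: 70,000 flagged papers correspond to 1.5%-2% of 2022 output,
# which implies a range for the total number of papers published in 2022.
implied_2022_min = FLAGGED_IN_2022 / RATE_2022_HIGH
implied_2022_max = FLAGGED_IN_2022 / RATE_2022_LOW

# Number of flagged papers in the subset with subject-area information.
subset_flagged = SUBSET_2022 * SUBSET_RATE

print(f"Share flagged since 2000: {overall_share:.2%}")
print(f"Implied 2022 total: {implied_2022_min:,.0f} to {implied_2022_max:,.0f}")
print(f"Flagged in 2022 subset: {subset_flagged:,.0f}")
```

The overall share since 2000 (under 1%) is lower than the 2022 estimate, which is consistent with the growth over time described later in the article.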

However, others in the field have been quick to criticize this analysis. Day shared the analysis only with Nature, and it has been neither peer reviewed nor published. Of particular concern was the potential for false positives, with legitimate research being caught in the net simply for using a similar template.

Day also has not shared details about how his tool works, citing the need to protect his intellectual property and prevent paper mills from learning how to circumvent it.

In June 2022, the Committee on Publication Ethics (COPE) and the International Association of Scientific, Technical, and Medical Publishers (STM) released a similar study about paper mill works. It found that most journals could expect about 2% of the papers submitted to them to be paper mill works, while journals that had published paper mill works saw that number leap to as high as 46%.

Note: STM uses Clear Skies as part of its Integrity Hub service.

However, that study looked at papers submitted to journals, whereas Day’s more recent analysis looks at published papers. That’s why other experts in the field believe that Day’s estimates are reasonable but at the upper bounds of likelihood. Elizabeth Bik and David Bimler, research integrity experts interviewed for the Nature article, both indicated that the numbers are, at the very least, possible, but felt they were high.

Day himself, on the other hand, seems to think his study is likely an underestimation as his tool can only compare to known paper mill works. As such, he believes that the likelihood of false negatives is greater than false positives, especially since he claimed to have taken significant steps to eliminate the latter.

All this points to a significant problem: Finding out just how large of an issue paper mills are for scholarly publishing.

A Growing Concern

Back in March 2021, Bik, along with other researchers, published an article that identified some 1,400 published papers that were, at least potentially, linked to paper mills. However, even after journals were notified of the issue, only a small fraction of the papers were retracted.

While that was one of the first major alarms for the publishing industry, efforts to track the size and scope of paper mills go back to at least 2007, and likely far beyond that.

That, in turn, is something that’s pointed out by Day’s recent work. According to his unpublished estimates, he was able to find examples of likely paper mill publications going back to 2000, the earliest year in his data set.

According to data provided by Day, though, the problem has seen almost linear growth since 2000. Every year, there has been an increase in the percentage of published papers that have strong similarities with paper mill products.

Since it’s impractical to thoroughly investigate each individual work, it’s impossible to know how many false positives and negatives there are. As such, there’s likely no way to know, with any certainty, how big of a problem this is on either the submission side or the publication side.
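To illustrate why the unknown error rates matter so much, the sketch below applies a standard correction from diagnostic testing (the Rogan-Gladen estimator) to a flagged rate of 2%. The sensitivity and specificity values are purely hypothetical, since no such figures have been published for Papermill Alarm or any similar tool.

```python
def true_prevalence(flagged_rate, sensitivity, specificity):
    """Estimate true prevalence from the share of items a detector flags.

    Inverts: flagged_rate = p * sensitivity + (1 - p) * (1 - specificity)
    sensitivity: share of real paper mill works that get flagged
    specificity: share of legitimate works that are NOT flagged
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("detector must perform better than chance")
    estimate = (flagged_rate + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion


FLAGGED = 0.02  # roughly the upper estimate quoted for 2022

# Hypothetical scenarios, not measured values.
print(true_prevalence(FLAGGED, sensitivity=1.0, specificity=1.0))   # perfect detector
print(true_prevalence(FLAGGED, sensitivity=0.5, specificity=1.0))   # misses half of mill works
print(true_prevalence(FLAGGED, sensitivity=1.0, specificity=0.99))  # 1% of legitimate works flagged
```

Under these made-up inputs, the same 2% flagged rate corresponds to a true rate of about 4% if the detector misses half of all paper mill works, or about 1% if it wrongly flags just one in a hundred legitimate papers. That spread is why both the false negative and the false positive arguments carry weight.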

That said, there are three things we can say with relative certainty based on this and other research in this space:

“Researchers” Are Increasingly Turning to Paper Mills: The temptation to turn to paper mills is an ever-present one for those who are in positions where they need publications to either maintain their position or advance, but lack the time, funding and/or interest to do original research. The use of paper mills appears to be growing, along with the industry.
Legitimate Journals Are Being Impacted: The issue is often paired with predatory journals, which are journals that will publish virtually any paper, as long as they are paid a fee. However, paper mill papers are, increasingly, being submitted to legitimate journals, which are being tasked with filtering out these papers. This has made tools like STM Integrity Hub crucial in the fight.
The Defenses Are Imperfect: No matter how many false positives and negatives there are in spotting these cases, it’s clear that the number of paper mill papers being published in legitimate, respected journals is not zero. That number is, ultimately, too high.

So that raises the question: What can be done about this problem?

Addressing the Problem

In the short term, journals need to strengthen both their editorial review and peer review processes in anticipation of receiving papers that were produced by paper mills. Tools like STM Integrity Hub and Papermill Alarm will likely be key components of any such strategy.

But regardless of what tools are used, checking to see if a paper is similar to paper mill works should be as much a part of the evaluation process as a plagiarism analysis, examination of the data and other steps that are part of the review process.

In the medium term, there needs to be a real conversation about if and how such services are allowed to operate.

Essay mills, which function similarly but are targeted at students rather than researchers, have long been under legal fire. They were outright made illegal in Australia, were cut off from payment processors and had their advertising targeted in the UK.

This strong response is largely owed to the fact that essay mills disproportionately targeted minors and young adults attending government institutions, namely high schools and colleges. Getting governments and other intermediaries to care that much about mills that target researchers submitting work to private publishers will be an uphill battle.

But while an outright ban may not be warranted or practical, broader efforts to dissuade and discourage the use of paper mills could yield results.

That said, as we discussed when looking at the issue of “Zombie Plagiarism”, change will be limited without systemic improvements. As long as there is a pressure to publish and a focus on quantity of publications over quality of publications, paper mills will likely continue to thrive, especially when paired with predatory journals.

Bottom Line

In a way, this seems like a strange problem to focus on right now. In November 2022, OpenAI launched ChatGPT to the public and, with it, kicked off the generative AI land rush we’ve seen over the past year.

For essay mills and other cheating sites, this has been a complete disaster. There’s simply little reason for students to pay money for test answers and human-written essays when an AI can, most likely, generate what they need for free.

As such, in the classroom, the focus has shifted away from essay mills and contract cheating onto AI. Though essay mills and other contract cheating services certainly still exist, their influence is clearly waning.

However, paper mills won’t be so quickly replaced in the research space. While AI will definitely have impacts here, paper mills provide more than just a paper. They often provide help finding a place to publish it and in securing the desired publication. We’ve seen this in the past with how paper mills have quickly pivoted where they submit papers to

Combine that with the complexities of the topics at hand and the (hopefully) thorough examination of peer review, and AI will struggle to compete with humans in this space. That is, as long as authors hope to get published in legitimate and respected journals.

That’s not to say that AI-generated research papers aren’t or won’t be a problem. We’ve already had AI-generated abstracts fool scientists in studies. It just means that the transition from human to AI ghostwriting will be slower in academic publishing than in student essays.

That means that paper mills will remain relevant, even as the use of AI likely grows. That makes this a difficult one-two punch that journals are going to struggle to deal with.
