Detecting ChatGPT Use in Academic Papers Through Excess Vocabulary

Frequencies of PubMed abstracts containing specific words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison. (Credit: Kobak et al., 2024)

It’s no secret that students these days love using ChatGPT to help with reports and other writing tasks, but in academia it’s becoming more prevalent as well. This raises the question of whether academic writing produced with ChatGPT’s help can be distinguished in some way. According to [Dmitry Kobak] and colleagues this is indeed the case, with a strong sign of ChatGPT use being a lot of extra flowery vocabulary in the text. As detailed in their pre-print paper, the frequent repetition of certain stereotypical words marks a noticeable shift in the vocabulary of the published works they examined.

In their study, they looked at more than 14 million biomedical abstracts from 2010 to 2024, obtained via PubMed. These abstracts were analyzed for word usage and frequency, revealing both natural increases in word frequency (e.g. from the SARS-CoV-2 pandemic and the Ebola outbreak) and dramatic increases in excess vocabulary that coincide with the public availability of ChatGPT and similar LLM-based tools.
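To make the frequency analysis a bit more concrete, here is a minimal Python sketch of one way to measure how common a word is in a given year: the fraction of that year’s abstracts containing it at least once. The function name, the simple lowercase regex tokenizer and the toy corpus are assumptions for illustration only; the researchers’ own analysis code may tokenize and count differently.

```python
import re
from collections import Counter


def yearly_word_frequencies(abstracts_by_year):
    """Return {year: {word: fraction of that year's abstracts containing word}}.

    Hypothetical helper: tokenizes by lowercasing and keeping runs of a-z letters.
    """
    freqs = {}
    for year, abstracts in abstracts_by_year.items():
        counts = Counter()
        for text in abstracts:
            # Use a set so each word counts at most once per abstract
            # (presence of the word, not how often it is repeated).
            counts.update(set(re.findall(r"[a-z]+", text.lower())))
        n = len(abstracts)
        freqs[year] = {word: c / n for word, c in counts.items()}
    return freqs


# Toy corpus, made up for illustration (not real PubMed data).
corpus = {
    2022: ["We report respiratory outcomes.", "Results are shown in Table 1."],
    2024: [
        "This study delves into intricate pathways.",
        "Notably, the model delves deeper.",
        "Results are shown.",
    ],
}

freqs = yearly_word_frequencies(corpus)
print(freqs[2024].get("delves", 0.0))  # share of 2024 abstracts using "delves"
print(freqs[2022].get("delves", 0.0))  # absent in 2022, so 0.0
```

Counting presence per abstract rather than raw occurrences keeps a single verbose abstract from inflating a word’s frequency.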

In total, 774 unique excess words were identified. “Excess” here means “outside the norm”, following the pattern of “excess deaths”, where mortality during one period deviates markedly from the pattern established in previous periods. In this regard, a jump in words like respiratory is logical, but the jump in style words like intricate and notably appears to be due to LLMs’ penchant for such flowery and overly dramatic language.
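The “excess” idea can be sketched in a similar way. Assuming, as the counterfactual lines in the chart suggest, that a word’s expected 2024 frequency is a linear extrapolation of its 2021–22 trend, its excess can be expressed both as the gap between observed and expected frequency and as their ratio. Clamping falling trends to zero, the function names and the thresholds in is_excess() are hypothetical choices for this sketch, not values taken from the paper.

```python
import math


def counterfactual_frequency(p2021, p2022, target_year=2024):
    """Extrapolate the 2021->2022 trend linearly to target_year.

    Assumption: a falling trend is clamped to zero, so the expected
    frequency never drops below the 2022 level.
    """
    slope = max(p2022 - p2021, 0.0)
    return p2022 + (target_year - 2022) * slope


def excess_scores(p2021, p2022, p_observed, target_year=2024):
    """Return (gap, ratio) between observed and expected frequency."""
    q = counterfactual_frequency(p2021, p2022, target_year)
    gap = p_observed - q  # excess frequency gap
    ratio = p_observed / q if q > 0 else math.inf  # excess frequency ratio
    return gap, ratio


def is_excess(gap, ratio, p_observed, min_gap=0.01, min_ratio=2.0, min_freq=1e-4):
    """Flag a word as excess (hypothetical thresholds).

    A large absolute gap flags common words; a large ratio flags rarer words,
    as long as the word is not vanishingly rare.
    """
    return gap > min_gap or (ratio > min_ratio and p_observed > min_freq)


# Made-up frequencies for illustration (fractions of abstracts, not real data).
gap, ratio = excess_scores(0.01, 0.02, 0.10)         # rising style word
flat_gap, flat_ratio = excess_scores(0.05, 0.04, 0.04)  # declining, then flat word
print(gap, ratio, is_excess(gap, ratio, 0.10))
print(flat_gap, flat_ratio, is_excess(flat_gap, flat_ratio, 0.04))
```

With these made-up numbers, a word rising from 1% to 2% of abstracts between 2021 and 2022 is expected at 4% in 2024; observing it in 10% gives a gap of 0.06 and a ratio of 2.5, so it is flagged. The declining word is simply expected to stay at 4%, so it is not.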

The researchers have made their analysis code available for those who want to try it on another corpus. The lead author also raised the question of whether ChatGPT might influence people to write more like an LLM. For now, it remains an open question whether people will become more inclined to use ChatGPT-like vocabulary, or will strive to avoid sounding like an LLM.

