The Invisible Hand in Scientific Writing: The Rise of Large Language Models

A recent study reveals the growing influence of large language models in scientific writing, estimating that at least 10% of 2024 scientific abstracts were generated or assisted by LLMs.

Large Language Models: The Invisible Hand in Scientific Writing

A recent study estimates how widely large language models (LLMs) are used in scientific writing. By analyzing “excess word usage” in abstracts published on PubMed from 2010 to 2024, the researchers developed a new technique for measuring the influence of LLMs on academic writing.

Identifying the Invisible Hand

Detecting AI-generated text has long challenged AI companies and researchers alike. Researchers at the University of Tübingen and Northwestern University have proposed a novel approach: by examining the sudden rise of specific vocabulary in scientific abstracts, they can identify the influence of LLMs on academic writing without classifying any individual text.

Inspiration from Pandemic Studies

The researchers drew inspiration from studies that measured the impact of the COVID-19 pandemic through excess deaths compared to historical data. Applying a similar approach, they analyzed “excess word usage” in scientific abstracts published on PubMed from 2010 to 2024. This comparison revealed significant changes in vocabulary coinciding with the widespread adoption of LLMs in late 2022.

Analyzing the Data

To measure these changes, the team scrutinized 14 million abstracts, tracking how often each word appeared in each year. By comparing the expected frequency of a word, extrapolated from pre-2023 trends, with its actual usage in 2023 and 2024, they identified dramatic increases in certain terms. For example, the word “delves” appeared about 25 times as often in 2024 abstracts as anticipated. Similarly, “showcasing” and “underscores” each saw a roughly ninefold increase in usage.
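
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the study's own code: it assumes the expected frequency comes from a straight-line fit to the pre-LLM years, and the function names (linear_forecast, excess_usage) and every number in the example are invented for demonstration.

```python
def linear_forecast(years, freqs, target_year):
    """Fit a least-squares line to (year, frequency) points and evaluate it at target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freqs))
    slope = sxy / sxx
    return mean_y + slope * (target_year - mean_x)


def excess_usage(history, observed, target_year):
    """Compare an observed word frequency with the trend-based expectation.

    history  -- {year: share of abstracts containing the word}, pre-LLM years only
    observed -- share of abstracts containing the word in target_year
    Returns (expected, gap, ratio).
    """
    years = sorted(history)
    expected = linear_forecast(years, [history[y] for y in years], target_year)
    expected = max(expected, 1e-9)  # guard against a zero or negative extrapolation
    return expected, observed - expected, observed / expected


# Hypothetical frequencies for a single word (NOT taken from the study).
history = {2019: 0.0002, 2020: 0.0003, 2021: 0.0004, 2022: 0.0005}
expected, gap, ratio = excess_usage(history, observed=0.015, target_year=2024)
```

With these made-up inputs the trend line predicts a 2024 frequency of about 0.0007, so an observed frequency of 0.015 is more than 20 times the expectation. The ratio flags rare words whose usage multiplied, while the gap flags common words whose usage rose by a large absolute amount.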

The researchers noted that, unlike the noun-heavy vocabulary shifts during the COVID-19 pandemic, the post-LLM era saw a rise in verbs, adjectives, and adverbs.

Vocabulary Shifts

This surge in specific words, dubbed “marker words,” is a key indicator of LLM usage. While language naturally evolves, such abrupt and widespread changes were previously only associated with significant global events like health crises.

Geographical Variations in LLM Usage

The study also highlighted geographical differences in LLM usage. Papers from countries like China, South Korea, and Taiwan exhibited a higher frequency of marker words, suggesting LLMs are particularly useful for non-native English speakers in editing and composing scientific texts.

Conclusion

By tracking these marker words, the researchers estimate that at least 10% of 2024 scientific abstracts were generated or assisted by LLMs. This figure likely underestimates the true extent, since not every LLM-assisted text contains these specific markers.
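
Why the figure is a lower bound can be shown with a small Python sketch. It makes simplifying assumptions that may not match the study's procedure: the baseline is the share of abstracts from a pre-LLM period that contain at least one marker word (rather than a trend-based expectation), and the helper names and toy abstracts are hypothetical.

```python
import re


def share_with_marker(abstracts, markers):
    """Fraction of abstracts that contain at least one marker word."""
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & markers:
            hits += 1
    return hits / len(abstracts)


def lower_bound_llm_share(baseline_abstracts, recent_abstracts, markers):
    """Excess share of marker-containing abstracts relative to the baseline.

    Abstracts written with an LLM but without any marker word are not counted,
    so the true share can only be equal to or higher than this value.
    """
    gap = share_with_marker(recent_abstracts, markers) - share_with_marker(
        baseline_abstracts, markers
    )
    return max(gap, 0.0)


# Toy data, invented for illustration only.
markers = {"delves", "showcasing", "underscores"}
baseline = [
    "We measure enzyme activity in liver tissue.",
    "This result underscores the role of sleep.",
    "A cohort study of dietary habits.",
    "We report a new imaging protocol.",
]
recent = [
    "This study delves into enzyme activity.",
    "Our analysis underscores the role of sleep.",
    "A cohort study of dietary habits.",
    "Showcasing a new imaging protocol.",
]
bound = lower_bound_llm_share(baseline, recent, markers)
```

In this toy example one of four baseline abstracts and three of four recent abstracts contain a marker word, so the estimated lower bound is 0.5. Subtracting the baseline matters because some authors used these words before LLMs existed.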

This study raises important questions about the role of LLMs in scientific writing and the potential implications for academic integrity.