The Signpost


Recent research

YOUR ARTICLE'S DESCRIPTIVE TITLE HERE

By Tilman Bayer, ...


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

TKTK include figure 1 from the paper here

A new paper titled "The Rise of AI-Generated Content in Wikipedia"[1] estimates

"that 4.36% of 2,909 English Wikipedia articles created in August 2024 contain significant AI-generated content"

In more detail, the authors used two existing AI detectors, which

"reveal a marked increase in AI-generated content in recent[ly created] pages compared to those from before the release of GPT-3.5 [in March 2022]. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated, with lower percentages for German, French, and Italian articles. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint on controversial topics."


These are among the first research results providing a quantitative answer to an important question that Wikipedia's editing community and the Wikimedia Foundation have been weighing since at least the release of ChatGPT almost two years ago. (Cf. previous Signpost coverage: Community rejects proposal to create policy about large language models, "AI is not playing games anymore. Is Wikipedia ready?", and TKTK link ITM in this issue re Wikipedia:WikiProject AI Cleanup). The "Implications of ChatGPT for knowledge integrity on Wikipedia" were also the topic of a research project conducted in 2023-2024 by UT Sydney researchers (funded by a $32k Wikimedia Foundation grant), which recently published preliminary results highlighting "Concerns about AI-generated content bypassing human curation" as one of the challenges voiced by Wikipedians.

The new study's numbers should be valuable as concrete evidence that generative AI has indeed started to affect Wikipedia in this manner (but they might also reassure those who had feared that Wikipedia would be overrun entirely by ChatGPT-generated articles).

That said, there are several serious concerns about how to interpret the study's data, and unfortunately the authors address them only partially.

TKTK include figure 2 from the paper here

First, the researchers made no attempt to quantify how many of the articles from their headline result ("4.36% of 2,909 English Wikipedia articles created in August 2024 contain significant AI-generated content") had also been detected (and flagged or deleted) by Wikipedians. They did inspect a smaller subset, namely "the 45 English articles flagged as AI-generated by both GPTZero and Binoculars" (corresponding to 1.5% of those 2,909), finding that "Most of the 45 pages are flagged by moderators and bots with some warning, e.g., 'This article does not cite any sources. Please help improve this article by adding citations to reliable sources' or even 'This article may incorporate text from a large language model'." Even for this smaller sample, though, we are not told what percentage of AI-generated articles survived.

In other words, the paper is a rather unsatisfactory read for those interested in the important question of whether generative AI threatens to overwhelm or at least degrade Wikipedia's quality control mechanisms, or whether these handle LLM-generated articles just fine alongside the existing never-ending stream of human-generated vandalism, hoaxes, or articles with missing or misleading references (see also WikiCrow TKTK link last issue). Overall, while the paper's title boldly claims to show "The Rise of AI-Generated Content in Wikipedia", it leaves it entirely unclear whether the text that Wikipedia readers actually read has become substantially more likely to be AI-generated. (Or, for that matter, the text that AI systems themselves read, considering that Wikipedia is an important training source for LLMs, i.e. whether the paper is evidence for concerns that "The ourouborous has begun".)

Secondly, and more importantly, the reliability of AI content detection software - such as the two tools that the study's numerical results are based on - has been repeatedly questioned. To their credit, the authors are aware of these problems and try to address them, for example by combining the results of two different detectors, and by using a comparison dataset of articles created before the release of GPT-3.5 in March 2022 (which can reasonably be assumed to be virtually free of LLM-generated text). However, their method still leaves unanswered several questions that may well threaten the overall validity of the study's results.
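To illustrate the calibration step the authors describe, here is a minimal sketch (not the paper's code) of how a detection threshold can be set to a target false positive rate using scores from a known-human corpus. The scores and the 1,000-article corpus below are fabricated for illustration; the convention follows Binoculars, where a lower score means "more AI-like".

```python
# Illustrative sketch (not the authors' implementation): calibrating a
# detection threshold to a target false positive rate (FPR) on a corpus
# of texts known to be human-written (e.g. pre-GPT-3.5 Wikipedia articles).
# Convention: lower score => more AI-like, as in Binoculars.

def calibrate_threshold(human_scores, target_fpr=0.01):
    """Pick the threshold such that at most `target_fpr` of the
    known-human texts fall below it (i.e. would be wrongly flagged)."""
    ordered = sorted(human_scores)
    # Index of the largest human score we are willing to misclassify.
    cutoff_index = max(int(len(ordered) * target_fpr) - 1, 0)
    return ordered[cutoff_index]

def flag_ai(score, threshold):
    """Binoculars-style decision: flag as AI-generated if below threshold."""
    return score < threshold

# Hypothetical detector scores for 1,000 human-written articles.
human_scores = [0.8 + 0.0005 * i for i in range(1000)]
threshold = calibrate_threshold(human_scores, target_fpr=0.01)

false_positive_rate = sum(flag_ai(s, threshold) for s in human_scores) / len(human_scores)
print(false_positive_rate)  # at most 0.01 by construction
```

The key design point is that the threshold is anchored entirely on pre-LLM data, so the achieved false positive rate is known by construction; what remains unknown (and is the crux of the concerns above) is the false *negative* rate on genuinely AI-generated articles.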

In more detail, the authors "use two prominent detection tools which were suitably scalable for our study". The first tool is

"GPTZero [...] a commercial AI detector that reports the probabilities that an input text is entirely written by AI, entirely written by humans, or written by a combination of AI and humans. In our experiments we use the probability that an input text is entirely written by AI. The black-box nature of the tool limits any insight into its methodology."

The second tool is more transparent:

"An open-source method, Binoculars [...] uses two separate LLMs [...] to score a text s for AI-likelihood by normalizing perplexity by a quantity termed cross-perplexity [...] The input text is classified as AI-generated if the score is lower than a determined threshold, calibrated according to a desired false positive rate (FPR). [...] For our experiments, we use Falcon-7b and Falcon-7b-instruct [as the two LLMs, following the recommendation of the authors of the Binoculars paper.] Compared to competing open-source detectors, Binoculars reports superior performance across various domains including Wikipedia"
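The quoted score can be sketched in a few lines. The following is an illustrative toy version (not the Binoculars authors' code): the per-token probabilities for the "observer" and "performer" models, and the three-word vocabulary, are fabricated numbers standing in for what two real LLMs such as Falcon-7b and Falcon-7b-instruct would produce.

```python
import math

# Toy sketch of the Binoculars score: (log) perplexity of the observer
# model on the text, normalized by the (log) cross-perplexity between
# the performer's and observer's next-token distributions.
# All probabilities below are fabricated for illustration.

def log_perplexity(observed_token_probs):
    """Average negative log-probability the observer model assigns
    to the tokens that actually appear in the text."""
    return -sum(math.log(p) for p in observed_token_probs) / len(observed_token_probs)

def log_cross_perplexity(performer_dists, observer_dists):
    """Average cross-entropy, per position, between the performer's
    next-token distribution and the observer's."""
    total = 0.0
    for perf, obs in zip(performer_dists, observer_dists):
        total += -sum(p * math.log(q) for p, q in zip(perf, obs))
    return total / len(performer_dists)

def binoculars_score(observed_token_probs, performer_dists, observer_dists):
    """Lower score => more AI-like; flagged if below a calibrated threshold."""
    return log_perplexity(observed_token_probs) / log_cross_perplexity(
        performer_dists, observer_dists)

# A hypothetical 2-token text over a 3-word vocabulary: the observer's
# probability of each actual token, plus both models' full distributions.
obs_probs = [0.6, 0.5]
performer = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
observer = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]

score = binoculars_score(obs_probs, performer, observer)
```

The intuition behind the normalization: LLM-generated text has low perplexity under an observer LLM, but so does text that is merely predictable; dividing by cross-perplexity controls for how predictable the passage is to LLMs in general, which is what lets the method use a single threshold across domains.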

TKTK discuss "superior performance" claim and other issues

The study has only been published as an arXiv preprint at the time of writing, but according to a remark in the accompanying code, it has been accepted at the "NLP for Wikipedia Workshop" at next month's EMNLP conference.

...

Reviewed by ...

...

Reviewed by ....

Briefly


Other recent publications


Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by ...

"..."


From the abstract:

...

"..."


From the abstract:

...

"..."


From the abstract:

...

References

  1. ^ Brooks, Creston; Eggert, Samuel; Peskoff, Denis (2024-10-10), The Rise of AI-Generated Content in Wikipedia, arXiv, doi:10.48550/arXiv.2410.08044
Supplementary references and notes:



The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0