The Signpost


Technology report

WikiAnnotate: help us build a dataset of article quality evaluations

By Sage (Wiki Ed)

TL;DR

I'm working with a team of researchers to collect a high-quality dataset of fine-grained Wikipedia article assessments. Experienced editors (with at least 1,000 edits) can contribute — and get paid for it — at wikiannotate.org. We'll use this dataset to build better automated article assessment tools.

Background

I've been working at Wiki Education since 2014, building software — like the Wiki Education Dashboard — to support programs that bridge the gap between Wikipedia and academia. Our flagship program — the Wikipedia Student Program — supports hundreds of higher education courses and thousands of students every term, as professors guide their students to improve Wikipedia in their areas of expertise and interest.

The widespread adoption of AI tools has been highly disruptive — as with many online domains — to Wiki Education and our work training student editors how to contribute effectively to the sum of all human knowledge. Teaching students how Wikipedia works — and how to reliably know things and share knowledge in ways that go beyond "just trust the AI" — is more important than ever (both for Wikipedia and for the students who are learning to learn in this AI-centric information environment). You can read a recap of much of our recent work in this area, but I think the impacts AI will have on Wikipedia are just beginning.

We can and will continue adapting to the changing landscape of AI usage, but one of the things holding us back is that we don't have good tools for measuring article quality systematically and automatically. The best software tool we currently have for automatically measuring aspects of article quality — Wikimedia Foundation's ‘articlequality’ model (formerly ORES) — can't differentiate between great content written by an experienced Wikipedian and an AI-slop imitation of what a great Wikipedian would write. It uses some basic metrics, like the amount of text, number of citations, headers, images, and so on, to predict the quality of an article, but can't address anything involving the quality or accuracy of the writing itself.
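To make that limitation concrete, here is a minimal Python sketch of the kind of surface-level counting described above. It is not the articlequality model's actual feature set or code: the function name `structural_features`, the regular expressions, and the two sample snippets (including the made-up "Smith 2001" citation and the bridge dates) are all assumptions for illustration.

```python
import re


def structural_features(wikitext: str) -> dict:
    """Count surface-level features of a wikitext string (illustrative only)."""
    return {
        # Amount of text, measured crudely in characters
        "chars": len(wikitext),
        # Opening <ref> or <ref name=...> tags (does not match <references />)
        "refs": len(re.findall(r"<ref[\s>]", wikitext)),
        # Lines of the form "== Heading ==" (any heading level)
        "headings": len(
            re.findall(r"^==+[^=\n]+==+\s*$", wikitext, flags=re.MULTILINE)
        ),
        # Embedded images, e.g. [[File:Example.jpg]]
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
    }


# Two invented snippets that differ only in a date; at most one can be right.
version_a = "== History ==\nThe bridge opened in 1932.<ref>Smith 2001</ref>\n[[File:Bridge.jpg]]"
version_b = "== History ==\nThe bridge opened in 1923.<ref>Smith 2001</ref>\n[[File:Bridge.jpg]]"

features_a = structural_features(version_a)
features_b = structural_features(version_b)
print(features_a == features_b)  # True: the counts cannot see the difference
```

Both snippets produce identical counts even though they state different dates, so any score built only from counts like these cannot tell them apart.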

For Wiki Education's programs, we have one powerful tool for catching slop: the Wiki Education Dashboard integrates with the AI detection service Pangram, automatically scanning larger edits for signs of LLM-generated text. For samples of at least a few hundred words, Pangram is very good at sorting human-written prose from text that came straight out of an LLM. However, real-world AI usage patterns are much more complicated, ranging from minor copyedits to LLM-generated text that gets extensively rewritten by hand (and everything in between). In many cases — like the increasingly AI-centric Grammarly service — it's not even obvious to a student just how much of their text came out of an LLM, because AI tools get integrated into conventional text editors. We can warn a student when we detect a high likelihood of LLM text, but that kind of strategy creates an antagonistic relationship. Students perceive that they've been accused of cheating with AI, and become defensive — and still don't get a clear indication of what the AI did badly or why we have rules against AI-written article content.

Hallucination is fundamental to the way LLMs work, but they can do a pretty good job in some respects: recent models can write understandable prose about encyclopedic topics, and they can generally follow our style guidelines when prompted to do so. Some of the things they do very badly — like accurately representing the content of individual sources — are also harder for a human to notice. (I've come to think of it like this: LLMs think they've read every book, but haven't actually read any. Everything they've trained on is a muddled mix, so they can't accurately represent any single source without accessing it directly.) But it's now possible to do much better.

wikiannotate.org

We can build tools that use LLMs to explicitly evaluate an article against many aspects of our policies, guidelines and quality standards (like the detailed quality rubric of WP:ASSESS), and we can check against some of the ways we know AI usually fails catastrophically (like confabulating citations to sources that the AI didn't actually access).

That's what the "Wiki Education in the Age of Generative AI" research team is working on with wikiannotate.org. We want to collect a good dataset of fine-grained article quality assessments from experienced Wikipedians — covering general aspects of quality as well as some of the specific things that AI usually does wrong — so that we can build a tool for quantifying the ways that AI usage impacts article quality. We're looking for editors to help build this dataset, with compensation available for each completed batch of evaluations. Currently we’re offering $21 USD for each batch of 5 articles.

With help from the Wikipedia editing community, we can build on the things that LLMs do well to mitigate some of the problems they are causing. Some of the possible applications include:

If you want to help, visit wikiannotate.org to sign up and do some article assessments. Each batch is expected to take 30 to 60 minutes on average, and you can complete multiple batches.
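For a rough sense of what that works out to, the sketch below combines the figures stated in this article ($21 per batch, 5 articles per batch, 30 to 60 minutes per batch). The variable and function names are purely illustrative, and actual times will vary by editor and by article.

```python
# Figures taken from the article text
PAY_PER_BATCH_USD = 21.0
ARTICLES_PER_BATCH = 5
MINUTES_LOW, MINUTES_HIGH = 30, 60  # expected time per batch


def hourly_rate(pay_usd: float, minutes: float) -> float:
    """Convert a per-batch payment and a batch duration into USD per hour."""
    return pay_usd * 60 / minutes


per_article = PAY_PER_BATCH_USD / ARTICLES_PER_BATCH
fastest = hourly_rate(PAY_PER_BATCH_USD, MINUTES_LOW)
slowest = hourly_rate(PAY_PER_BATCH_USD, MINUTES_HIGH)

print(f"Per article: ${per_article:.2f}")                       # Per article: $4.20
print(f"Effective hourly: ${slowest:.2f} to ${fastest:.2f}")    # Effective hourly: $21.00 to $42.00
print(f"Minutes per article: {MINUTES_LOW / ARTICLES_PER_BATCH:.0f} "
      f"to {MINUTES_HIGH / ARTICLES_PER_BATCH:.0f}")            # Minutes per article: 6 to 12
```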

(All these em-dashes are my own. I've been overusing em-dashes my entire adult life, and I'm not about to stop.)



Discuss this story

"Please sign to support..."

I'd have to suggest that looks a little like canvassing. We shouldn't be telling Signpost readers which proposals to support. AndyTheGrump (talk) 15:32, 16 February 2026 (UTC)

Could you please be a bit more specific? I will do my best to address your concerns. The discussions directly linked to have already either concluded or been acted on, so even if the Signpost readers do go to those pages, they won't be able to interfere with those processes. Mitchsavl (talk) 03:30, 27 February 2026 (UTC)
My post above relates to Wikipedia:Wikipedia Signpost/2026-02-17/Technology report (n.b. the date was in fact the 16th, see diff [2]), as edited by User:Bluerasberry: Please sign to support meta:WikiCite (3), which is a proposal to establish WikiCite the citation database as an official Wikimedia project. AndyTheGrump (talk) 15:34, 10 March 2026 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0