The Signpost

Special report

Introducing Credibility Bot: Reliable Source Monitoring Anywhere

By Ocaasi and Harej

What is the problem?

Vaccine misinformation and disinformation have significant public health consequences. Because of this, the Wikipedia community works diligently to maintain the reliability of vaccine-related articles. There are guidelines ensuring vaccine content is evidence-based. Editors and administrators monitor and edit articles to maintain quality and combat misinformation.

Currently, much assessment happens ad hoc on the talk pages of individual articles, which lacks visibility to the broader community. It is still laborious to find and replace poor sources. Editors primarily lack centralized tracking of citations and centralized task management. Wikipedians struggle to get the signals they need to collectively prioritize their work. Even if volunteer editors have the capability to assess sources in one area, they lack the tools to do so across topics within an entire language edition of Wikipedia. Tools like WP:CITEWATCH and WP:JCW cover significant subsets of this data but are inherently incomplete in their coverage. This unevenness reflects Wikipedia's volunteer-driven nature and the difficulty of scaling up these projects.

The critical need to solve this problem has given us the opportunity to build a solution, not just for this subject matter but potentially for Wikipedia as a whole. Our efforts have resulted in the development of Credibility bot. While we began with the narrow topic of vaccines, the toolkit we have started building to monitor articles for the sources they use shows promise for being useful to any WikiProject or collaboration.

Read on to learn how we built the vaccine safety project, including the infrastructure to support the development of future tools and bots. If this interests you, you can help us.

Our solution

An underutilized method for maintaining reliability is compiling lists of rated domains. These "perennial sources lists" are where Wikipedians record consensus evaluations of sources that repeatedly generate debate. This guidance helps Wikipedia editors understand whether certain sources should be cited or avoided and points them to previous discussions. The lists can reflect a nuanced judgment that a source might be appropriate for a given context or subject matter, or they can categorically rule out a source as unreliable.

To demonstrate the value of this practice, we built the vaccine safety project (or WP:VSAFE) around this concept. Our goal: Make it easy for Wikipedians to increase the use of reliable sources on, and remove unreliable sources from, vaccine-related articles.

We started with an English-language perennial sources list for vaccine topics seeded with 80 sources, with analyses based on extracted conversations about these sources. We exported this list as a table with a versioned, timestamped mirror on Underlay. Data about the domains is also synchronized to the Internet Domains Wikibase, a repository of data organized around domain names that serves as a staging ground for future imports into Wikidata.

From there, we defined a set of articles in scope for this project. Approximately 800 articles were selected and screened for all sources. Rather than rely on templates appearing on talk pages, we defined a set of categories related to vaccines and vaccine hesitancy, as well as a set of Wikidata queries related to types of vaccines. We have modeled the project scope on a wiki page, though at this point that page serves more as documentation than as something that can be interacted with in a meaningful way. Framing the scope of the project this way lets us use automated tools to add (and remove) articles from the project scope as new articles are created.
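To illustrate how a scope defined by categories and queries can be recomputed automatically, here is a minimal Python sketch. It is not Credibility bot's code: the `ScopeDefinition` class, the function names, and the lookup dictionaries are all hypothetical stand-ins for whatever services actually resolve category membership and Wikidata queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeDefinition:
    """Hypothetical description of a project scope; not the project's real schema."""
    categories: frozenset = frozenset()
    wikidata_queries: frozenset = frozenset()


def resolve_scope(definition, category_members, query_results):
    """Return every article title matched by any category or query.

    category_members and query_results are plain dicts standing in for
    the services that would answer these lookups in practice.
    """
    titles = set()
    for category in definition.categories:
        titles |= set(category_members.get(category, ()))
    for query in definition.wikidata_queries:
        titles |= set(query_results.get(query, ()))
    return titles


def scope_changes(previous, current):
    """Compare two runs and report which articles entered or left the scope."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

Because the scope is recomputed from its definition on each run, a newly created article that lands in one of the listed categories would show up under "added" without anyone tagging its talk page.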

With a set of articles and a set of domain names to check those articles against, we developed Credibility bot. This bot produces a report summarizing the usage of reliable, unreliable, or mixed-reliable sources, as well as lists of domains corresponding to known bad sources and "unknown" (unrated) sources. The good news: on our first run, we could not find any articles in scope linking to a known unreliable source. Alerts are generated as sources of poor or unknown reliability are added to pages, a convenience on top of MediaWiki's own diffs.
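The kind of check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bot's implementation: the rating labels, the `RATINGS` table, the `.example` domains, and all function names are made up for the example, and the input is assumed to be a mapping from article title to the external URLs it cites.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical ratings; the real list lives on-wiki and may use other labels.
RATINGS = {
    "reliable.example": "reliable",
    "mixed.example": "mixed",
    "unreliable.example": "unreliable",
}


def domain_of(url):
    """Return the lower-cased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_report(article_links, ratings):
    """Count source usage by rating and list unrated and flagged domains."""
    usage = Counter()
    unknown = Counter()
    flagged = {}
    for title, urls in article_links.items():
        for url in urls:
            domain = domain_of(url)
            if not domain:
                continue  # skip strings that are not usable web URLs
            rating = ratings.get(domain, "unknown")
            usage[rating] += 1
            if rating == "unknown":
                unknown[domain] += 1
            elif rating == "unreliable":
                flagged.setdefault(title, []).append(domain)
    return {
        "usage": dict(usage),
        "unknown_domains": dict(unknown),
        "flagged_articles": flagged,
    }


def alerts_for_edit(old_urls, new_urls, ratings):
    """List domains newly added by an edit that are unreliable or unrated."""
    before = {domain_of(u) for u in old_urls}
    added = {domain_of(u) for u in new_urls} - before
    return sorted(
        d for d in added
        if d and ratings.get(d, "unknown") in {"unreliable", "unknown"}
    )
```

Matching on the hostname alone is a simplification: subdomains are treated as separate domains here, which a production tool would probably need to handle more carefully.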

Underpinning the presentation of this project is a new system of templates that we call Workspaces. This is a system of generic templates that lets you build organized portals on Wikipedia with little effort and is functional on both desktop and mobile. Each template exists independently of the others and is usable in contexts outside of WikiProjects as well. Bots can use these templates to render cleanly formatted pages without having to implement their own styling code.

The vision

Although our solution was designed in the context of vaccine safety, we have laid the foundation for a new approach to building WikiProjects and other on-wiki working groups. At the moment, what we have is a single project, with a bot, that happens to use certain templates. However, we seek to enable anyone to build their own focused workspace – be it for a subject matter, an institutional partnership, or even just a random rabbit hole – by filling out a template on a wiki page. The services producing these page sets are open APIs, allowing other bot developers to make use of them. Tagging talk pages with WikiProject templates will no longer be necessary to get the benefit of automated reporting. The goal is to make what is available for vaccine safety available for all projects.

This infrastructure for automatically producing sets of articles and on-wiki portals around them is a necessary baseline for what we ultimately want to achieve, which is cross-wiki task coordination. Wikipedia's backlog is notoriously long, and surfacing relevant tasks to editors is not straightforward. In our experience, this has made it harder to carry out partnerships between Wikipedia and other knowledge institutions since we do not have access to a ready list of ways in which these partners can constructively support Wikipedia. In the context of vaccine safety, this will help us get real-time updates as the usage of a given source is contested on a given article in some far-flung corner of the wiki, which could then prompt the vaccine safety project to participate in the discussion as well.

Finally, there's the matter of how we keep track of sources. To the surprise of some, Wikipedia does not have a central database of the citations that appear on articles. Partial solutions exist, including the external links table and aforementioned projects like JCW, but we are looking to build a more comprehensive solution. In the interim, we address this by focusing on web sources with domain names. This is the easiest approach, as many other source reliability datasets also identify sources in terms of their domain name, and it's a reasonable enough place to focus if your concern is weeding out unreliable web sources. But ultimately, to support Wikipedia as a whole, support for other kinds of sources will be required. This is, in itself, a massive research project that will be covered in a future article.
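A small Python sketch shows both why domain names are a convenient key and what they leave out. The citation shape used here (a dict with an optional "url" key) and the function name are assumptions made for this example only.

```python
from urllib.parse import urlsplit


def index_by_domain(citations):
    """Group citations by hostname; collect those without a usable URL.

    Each citation is a dict with an optional "url" key, a hypothetical
    shape chosen for this example.
    """
    by_domain = {}
    without_domain = []
    for citation in citations:
        host = urlsplit(citation.get("url") or "").hostname
        if host:
            by_domain.setdefault(host.lower(), []).append(citation)
        else:
            # Print books, offline journals, and similar sources end up
            # here: they have no domain name to look up.
            without_domain.append(citation)
    return by_domain, without_domain
```

Everything in the second list is invisible to a domain-based reliability check, which is the gap that support for other kinds of sources would need to close.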

You may be wondering if this vision includes wikis other than Wikipedia and languages other than English. The answer is yes. Although we will need to do some work on the Lua modules used in Workspaces to support localizable template parameters, everything else is built to be wiki-agnostic, not reliant upon any particular category or template system.

How you can support this

We are gauging whether there is interest in this kind of work being done outside of vaccine safety.

WP:VSAFE was developed with generous support from the City University of New York and Craig Newmark Philanthropies as a partnership between Hacks/Hackers (NewsQ) and Knowledge Futures Group (KFG).



The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0