Study deems COVID-19 editors smart and cool, questions of clarity and utility for WMF's proposed "Knowledge Integrity Risk Observatory"

Recent research

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Wikipedia as a trusted method of information assessment during the COVID-19 crisis"

Reviewed by Piotr Konieczny

This book chapter,^[1] unfortunately not yet in open access, provides a good overview of Wikipedia's history and practices, concluding that Wikipedia's coverage of the COVID-19 pandemic is "precise and robust", and that this generated positive coverage of Wikipedia in mainstream media. The author provides an extensive and detailed overview of how Wikipedia volunteers covered the pandemic, and highlights the efforts of the dedicated WikiProject COVID-19 (on English Wikipedia, while also mentioning COVID-19 "portals in several other languages such as French, Spanish, or German"), an offshoot of the larger WikiProject Medicine, as important in creating quality content in this topic area. He also praises Wikipedia editors for their dedication to the fight against fake news and misinformation, and the Wikimedia Foundation's "embrace" of these editors' actions.

One of the most interesting observations made by the author, if somewhat tangential to the main topic, is that Wikipedia "is used [by readers] in ways similar to a news media", which "generates a tension between Wikipedia's original encyclopaedic ambitions and these pressing journalistic tendencies" (an interesting and, in this reviewer's experience, under-researched topic, at least in English; see here for a review of a recent French book on this topic). The author concludes that this somewhat begrudging acceptance of current developments by the Wikipedia community has significantly contributed to making its coverage relevant to a general audience. At the same time, he notes that despite relying on media sources rather than waiting for scholarly coverage, Wikipedia was still able to maintain high quality in its coverage, which he attributes to editors' reliance on "legacy media outlets, like The New York Times and the BBC".

Wikimedia Foundation builds "Knowledge Integrity Risk Observatory" to enable communities to monitor at-risk Wikipedias

Reviewed by Tilman Bayer

"Taxonomy of Knowledge Integrity Risks in Wikipedia" (figure 1 in the paper)

A paper titled "A preliminary approach to knowledge integrity risk assessment in Wikipedia projects"^[2] by two members of the Wikimedia Foundation's research team provides a "taxonomy of knowledge integrity risks in Wikipedia projects [and an] initial set of indicators to be the core of a Wikipedia Knowledge Integrity Risk Observatory." The goal is to "provide Wikipedia communities with an actionable monitoring system." The paper was presented in August 2021 at an academic workshop on misinformation (part of the ACM's KDD conference), as well as at the Wikimania 2021 community conference the same month.

The taxonomy distinguishes between internal and external risks, each divided into further sub-areas (see figure). "Internal" refers "to issues specific to the Wikimedia ecosystem while [external risks] involve activity from other environments, both online and offline."

Various quantitative indicators of "knowledge integrity risks" are proposed for each area. A remark at the end of the paper clarifies that they are all meant to be calculated at the project level, i.e. to provide information about how much an entire Wikipedia language version is at risk (rather than, say, to identify specific articles or editors that may deserve extra scrutiny).

For example, the following indicators are suggested for the "content verifiability" risk category:

Distribution of articles by number of citations, number of scientific citations and number of citation and verifiability article maintenance templates, distribution of sources by reliability.

The authors emphasize that "the criteria for proposing these indicators are that they should be simple to be easily interpreted by non-technical stakeholders". Some of the proposed metrics are indeed standard in other contexts. But the paper mostly leaves open how they should be interpreted in the context of knowledge integrity risks. For example, the metrics for internal risks in the category "community capacity" include "Number of articles, editors, active editors, editors with elevated user rights (admins, bureaucrats, checkusers, oversighters, rollbackers)". The authors indicate that these are meant to identify a "shortage of patrolling resources." Presumably the idea is to construct risk indicators based on the ratio of these editor numbers to the number of articles or edits (with higher ratios perhaps indicating higher resilience to integrity risks), but the paper doesn't provide explanations.

For various other metrics, the possible interpretations are even less clear. For example, "ratio of articles for deletion; ratio of blocked accounts" are listed in the "Community governance" risk category. But do high ratios indicate a higher risk (because the project is more frequently targeted by misinformation) or a lower risk (because the local community is more effective at deleting and blocking misinformation)? Similarly, would a comparatively low "number of scientific citations" on a project indicate that it is ripe for scientific misinformation, or simply that it has fewer and shorter articles about scientific topics?

Throughout the paper, such questions often remain unresolved, raising doubts about how useful these metrics will be in practice. While the authors sometimes cite relevant literature, several of the cited publications do not support or explain a relation between the proposed metric and misinformation risks either. For example, one of the two papers cited for "controversiality" (cf. our review) points out that, contrary to what the Foundation's researchers appear to assume, editor controversies can have a positive effect on article quality ("Clearly, there is a positive role of the conflicts: if they can be resolved in a consensus, the resulting product will better reflect the state of the art"). Similarly, other research has found that "higher political polarization [among editors] was associated with higher article quality."

An exception is the "community demographics" risk category, where the authors provide the following justification:

"To illustrate the value of the indicators for knowledge integrity risk assessment in Wikipedia, we provide an example on community demographics, in particular, geographical diversity [defined as] the entropy value of the distributions of number of edits and views by country of the language editions with over 500K articles. On the one hand, we observe large entropy values for both edits and views in the Arabic, English and Spanish editions, i.e., global communities. On the other hand, other large language editions like the Italian, Indonesian, Polish, Korean or Vietnamese Wikipedia lack that geographical diversity."

(Assuming that this refers to entropy in the sense of information theory, these values are minimal (0) when all edits or views are concentrated in a single country, and maximal when every country worldwide contributes exactly the same number of edits or views.)

Here, the authors "highlight the extraordinarily low entropy of views of the Japanese Wikipedia, which supports one of the main causes attributed to misinformation incidents in this edition" (referring to concerns about historical revisionism in several Japanese Wikipedia articles). However, it remains unclear why a low diversity in views should be more directly associated with such bias problems than a low diversity in edits (where the Japanese Wikipedia appears to be largely on par with Finnish and Korean Wikipedia, and Italian, Polish and Catalan Wikipedia would seem similarly at risk). The paper also includes a plot showing a linear regression fit that indicates a relation between the two measures (entropy of views and edits). But this finding seems somewhat unsurprising if one accepts that reading and editing activity levels may be correlated, and its relevance to knowledge integrity remains unclear.
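To make the entropy measure discussed above more concrete, here is a minimal Python sketch, assuming (as this review does) that the paper means Shannon entropy over the per-country distribution of edits or views. The function name `shannon_entropy`, the choice of base-2 logarithms, and the country counts are all illustrative assumptions; none of them are taken from the paper or from real Wikimedia data.

```python
import math
from typing import Mapping


def shannon_entropy(counts: Mapping[str, float]) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts per country."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for count in counts.values():
        if count > 0:  # countries with zero activity contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical view counts by country (made-up numbers, not real data).
concentrated = {"JP": 1000, "US": 0, "BR": 0, "DE": 0}
uniform = {"JP": 250, "US": 250, "BR": 250, "DE": 250}

print(shannon_entropy(concentrated))  # all views from one country: minimal entropy
print(shannon_entropy(uniform))       # equal shares: maximal entropy for four countries
```

With four countries, the maximum is log2(4) = 2 bits. The absolute value therefore depends on how many countries are counted, which is one more reason the raw numbers are hard to interpret without further context.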

Lastly, while the paper's introduction highlights "deception techniques such as astroturfing, harmful bots, computational propaganda, sockpuppetry, data voids, etc." as major reasons for a recent rise in misinformation problems on the web, none of these are explicitly reflected in the proposed taxonomy, or captured in the quantitative indicators. (The "content quality" metrics mention the frequency of bot edits, but in the context of Wikipedia research, these usually refer to openly declared, benign bot accounts. The "geopolitics" risk category gives a nod to "political contexts" where "some well resourced interested parties (e.g., corporations, nations) might be interested in externally-coordinated long-term disinformation campaigns in specific projects," but this evidently does not capture many or most non-geopolitical abuses of Wikipedia for PR purposes.)

This omission is rather surprising, considering that problems like paid advocacy and conflict of interest editing have been discussed as major concerns among the editing community for a long time (see e.g. the hundreds of articles published over the years by the Signpost, recently in the form of a recurring "Disinformation Report" rubric). They are also among the few content issues where the Wikimedia Foundation has felt compelled to take action in the past, e.g. by changing the Terms of Use and taking legal action against some players.

The paper stresses in its title that the taxonomy is meant to be "preliminary." Indeed, since its publication last year, further work has been done on refining and improving at least the proposed metrics (if not necessarily the taxonomy itself), according to the research project's page on Meta-wiki and associated Phabricator tasks, resulting in a not yet public prototype of the risk observatory (compare screenshot below). Among other changes, the aforementioned entropy of views by country seems to have been replaced by a "Gini index" chart. Also, rather than the relative ratios of blocks mentioned in the paper, the prototype shows absolute counts of globally locked editors over time, still raising several questions on how to interpret these numbers in terms of knowledge risks.

The project appears to be part of the WMF Research team's "knowledge integrity" focus area, announced in February 2019 in one of four "white papers that outline our plans and priorities for the next 5 years" (see also last month's issue about several other efforts in this focus area, which likewise haven't yet resulted in actionable tools for editors apart from one since discontinued prototype). The "observatory" concept may have been inspired by the existing "Wikipedia Diversity Observatory" (cf. our previous coverage).

The case of Croatian Wikipedia

While it is not mentioned in the "knowledge integrity risk assessment" paper, the Croatian Wikipedia is probably the most prominent example of a Wikimedia project where knowledge integrity was found to be compromised significantly. Thus it might provide an interesting retroactive test case for the usefulness of the observatory. A "disinformation assessment report"^[3] commissioned by a different team at the Wikimedia Foundation (published in June 2021, i.e. around the same time as the paper reviewed above) found "a widespread pattern of manipulative behaviour and abuse of power by an ideologically aligned group of Croatian language Wikipedia (Hr.WP) admins and other volunteer editors", who held "undue de-facto control over the project at least from 2011 to 2020." It's unclear to this reviewer whether that kind of situation would have been reflected in any of the Wikipedia Knowledge Integrity Risk Observatory's proposed indicators. None of them seem to be suitable for distinguishing such a case - where the project's admins reverted or blocked editors who actually tried to uphold core Wikipedia principles - from the (hopefully) more frequent situation where the project's admins uphold and enforce these principles against actors who try to introduce disinformation.

Interestingly though, while the findings of the Croatian Wikipedia disinformation assessment are largely qualitative, the report also developed a quantitative indicator "to measure and quantify disinformation". This is based on examining the Wikipedia articles about individuals whom the UN's International Criminal Tribunal for the former Yugoslavia (ICTY) convicted of war crimes, counting how many of these articles mention this conviction in the first three sentences. The report's anonymous author found

"...that Croatian and Serbian language Wikipedia, in 62.5% and 39.1% of cases, respectively, avoid informing their visitors in the introductory paragraph that the person they’re reading about is a convicted war criminal who comes from the same ethnic group. All other surveyed Wikipedia languages – Serbo-Croatian, Bosnian, English, French, Spanish, and German – do this consistently and keep the information at the very beginning of an article [...]"

This metric seems much more concretely justified and actionable than any of those captured in the aforementioned observatory - albeit of course rather topic-specific and harder to operationalize (apparently the report's author had to manually inspect each article in the sample).
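As a rough illustration of why this metric is hard to operationalize, the following Python sketch approximates the "first three sentences" check with simple keyword matching. It is not the report's method (which, as noted, apparently relied on manual inspection). The function names, the naive sentence splitter, the keyword list and the sample texts are all hypothetical.

```python
import re
from typing import Iterable, Mapping


def first_sentences(text: str, n: int = 3) -> str:
    """Return the first n sentences, using a naive split after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


def share_omitting(articles: Mapping[str, str], keywords: Iterable[str], n: int = 3) -> float:
    """Fraction of articles whose first n sentences contain none of the keywords."""
    if not articles:
        raise ValueError("articles must not be empty")
    lowered = [k.lower() for k in keywords]
    omitted = sum(
        1
        for text in articles.values()
        if not any(k in first_sentences(text, n).lower() for k in lowered)
    )
    return omitted / len(articles)


# Made-up placeholder texts, not taken from any actual Wikipedia article.
sample = {
    "Person A": "Person A was a general. He was convicted of war crimes by the ICTY. He was born in 1950.",
    "Person B": "Person B was a general. He was born in 1950. He served in the army. Later he was convicted of war crimes.",
}

print(share_omitting(sample, ["convicted", "war crimes"]))
```

A keyword approach like this would need per-language term lists and would miss paraphrases or euphemisms, and the naive splitter stumbles over abbreviations. This may help explain why manual inspection was used.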

Having said this, another finding of the report may lend additional anecdotal support to the "knowledge integrity risk assessment" paper's hypothesis that low edit diversity/entropy (by country) increases knowledge integrity risks:

"Croatian Wikipedia represents the Croatian standard variant of the Serbo-Croatian language. Unlike other pluricentric Wikipedia language projects, such as English, French, German, and Spanish, Serbo-Croatian Wikipedia’s community was split up into Croatian, Bosnian, Serbian, and the original Serbo-Croatian wikis starting in 2003. The report concludes that this structure enabled local language communities to sort by points of view on each project, often falling along political party lines in the respective regions. The report asserts, furthermore, it deprived the newly-created communities of editorial diversity that normally guides and underpins the traditionally successful process of editorial consensus in other pluricentric language projects."

Consequently, the report's three main recommendations include "unifying community elections for admin and functionary roles across the involved wikis" and ultimately re-merging these wikis "into the original Serbo-Croatian language projects."

Briefly

The Wikimedia Foundation's research team has released a Python library ("mwedittypes") for automated detection of edit types, based on action categories such as adding a wikilink or changing a template parameter.
See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

Wikipedia "is winning the battle against COVID-19 misinformation"

From the abstract and the "Conclusion" section, in First Monday:^[4]

"This paper investigates the editorial framework developed by the Wikipedia community, and identifies three key factors as proving successful in the fight against medical misinformation in a global pandemic — the editors, their sources and the technological affordances of the platform."
"Perhaps most significantly for the flow of misinformation, is that unlike the interconnectivity of other online media platforms, Wikipedia is largely a one-way street. While Facebook, YouTube and Google refer their readers to the site for fact-checking, Wikipedia does not return the favour. Without a commercial agenda, its readers are not directed to other content by an algorithm, nor are they subjected to advertisements or clickbait, hijacking their attention. [...]
The site is winning the battle against COVID-19 misinformation through the combination of an enthusiastic, volunteer army (those nit-picking masses), working within the disciplined schema of rigorous referencing to credible sources, on a platform designed for transparency and efficient editing. This editorial framework, combined with sanctions, expert oversight and more stringent referencing rules, is showing Wikipedia to be a significant platform for health information during the COVID-19 pandemic."

"Associations Between Online Instruction in Lateral Reading Strategies and Fact-Checking COVID-19 News Among College Students"

From the abstract, in AERA Open:^[5]

"In Fall 2020, college students (N = 221) in an online general education civics course were taught through asynchronous assignments how to use lateral reading strategies to fact-check online information [i.e. "leaving the initial content to investigate sources and verify claims using trusted sources", which here were taken to include Wikipedia]. Students improved from pretest to posttest in the use of lateral reading to fact-check information; lateral reading was predicted by the number of assignments completed and students’ reading comprehension test scores. Students reported greater use, endorsement, and knowledge of Wikipedia at posttest, aligning with the curriculum’s emphasis on using Wikipedia to investigate information sources. Students also reported increased confidence in their ability to fact-check COVID-19 news."

"Wikipedia, Google Trends and Diet: Assessment of Temporal Trends in the Internet Users’ Searches in Italy before and during COVID-19 Pandemic"

From the abstract, in Nutrients:^[6]

"We obtained data from Google Trends and Wikipedia in order to assess whether an analysis of Internet searches could provide information on the Internet users’ behaviour/interest in diets. Differences in seasonality, year and before/during COVID-19 pandemic were assessed. From Wikipedia, we extracted the number of times a page is viewed by users, aggregated on monthly and seasonal bases. [...] The Mediterranean diet was the most frequently (33.9%), followed by the pescatarian diet (9.0%). Statistically, significant seasonal differences were found for the Mediterranean, vegetarian, Atkins, Scarsdale, and zone diets and pescetarianism."

"Collective Response to Media Coverage of the COVID-19 Pandemic on Reddit and Wikipedia: Mixed-Methods Analysis"

From the abstract, in the Journal of Medical Internet Research:^[7]

"We collected a heterogeneous data set including 227,768 web-based news articles and 13,448 YouTube videos published by mainstream media outlets, 107,898 user posts and 3,829,309 comments on the social media platform Reddit, and 278,456,892 views of COVID-19–related Wikipedia pages. To analyze the relationship between media coverage, epidemic progression, and users’ collective web-based response, we considered a linear regression model that predicts the public response for each country given the amount of news exposure. We also applied topic modelling to the data set [...].
Results: Our results show that public attention, quantified as user activity on Reddit and active searches on Wikipedia pages, is mainly driven by media coverage; meanwhile, this activity declines rapidly while news exposure and COVID-19 incidence remain high."

"War and Pieces: Comparing Perspectives About World War I and II Across Wikipedia Language Communities"

From the abstract, in the Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature:^[8]

"We introduce a methodology for approximating the extent to which narratives of conflict may diverge [...], focusing on articles about World War I and II battles written by Wikipedia’s communities of editors across four language editions. For simplicity, our unit of analysis representing each language communities’ perspectives is based on national entities and their subject-object-relation context, identified using named entity recognition and open-domain information extraction. Using a vector representation of these tuples, we evaluate how similarly different language editions portray how [sic] and how often these entities are mentioned in articles. Our results indicate that (1) language editions tend to reference associated countries more and (2) how much one language edition’s depiction overlaps with all others varies."