The Signpost

Recent research

Wikipedia's Shoah coverage succeeds where libraries fail

By Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Wikipedia succeeds where libraries fail, showing "an unmet interest" in the Shoah and Israel, also in Muslim countries

In a paper titled "The Political Geography of Shoah Knowledge and Awareness, Estimated from the Analysis of Global Library Catalogues and Wikipedia User Statistics",[1] Austrian political scientist Arno Tausch finds a disturbing "global North-South and North-East divide in the library presence of Shoah-related titles", contrasting it with "a more optimistic [trend], based on freely available information on the internet": namely, the availability and popularity of Wikipedia articles about the same topic in multiple languages. For example, the study highlights the pageview numbers of these articles in the Farsi, Arabic and Indonesian Wikipedias as "truly a hopeful sign".

The bulk of the paper consists of detailed bibliometric examinations:

"Based on the data of our research project covering 165 library catalogues (54 nationwide union catalogues, 81 national libraries, 16 legislative-assembly libraries, 14 libraries of international organizations) and the OCLC Worldcat, which by itself includes no less than 70,000 libraries in more than 170 countries, we found that there is indeed a huge global gap in Shoah library holdings. Some 69.3% of the global library presence of the leading peer-reviewed journal in the field, Holocaust and Genocide Studies, in principle available to global publics, is encountered in libraries within the geographical distance of less than 1,000 miles from New York City or Brussels. We particularly analyze the lack of Shoah knowledge and awareness in many Muslim and Catholic countries."

The author contrasts this with webometrics, where "we must regard Wikipedia download statistics [i.e. page view data] as a first and very reliable seismograph of global social network trend ... Its 49.3 million articles in almost 300 languages are therefore also a treasure trove for the research on Judaism, Israel, the memory of the victims of the Shoah, and global anti-Semitism." To "estimate whether or not a given language community on Wikipedia has a high or a low relative tendency to seek information on the Shoah", he compares the pageview numbers for the corresponding article with the annual pageview numbers for the entire Wikipedia in that language, or alternatively with those for "the culturally most neutral article in this context, the Wikipedia article on the encyclopedia Wikipedia itself."
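The comparison described above amounts to a simple ratio per language edition. The following is a minimal sketch of that idea, not the paper's actual procedure: the function name, the per-million scaling, and the language codes and counts are all hypothetical placeholders rather than data from the study.

```python
from typing import Dict


def relative_interest(
    article_views: Dict[str, int],
    baseline_views: Dict[str, int],
) -> Dict[str, float]:
    """Return article pageviews per 1,000,000 baseline pageviews, per language.

    The baseline can be either the total pageviews of that language edition
    or the pageviews of a reference article (e.g. the article on Wikipedia).
    """
    shares: Dict[str, float] = {}
    for lang, views in article_views.items():
        baseline = baseline_views.get(lang)
        if not baseline:
            # Skip languages with a missing or zero baseline.
            continue
        shares[lang] = views * 1_000_000 / baseline
    return shares


# Hypothetical placeholder counts (NOT figures from the paper).
article = {"xx": 50_000, "yy": 20_000}
total = {"xx": 1_000_000_000, "yy": 200_000_000}

# Rank language editions from highest to lowest relative interest.
ranking = sorted(
    relative_interest(article, total).items(),
    key=lambda item: item[1],
    reverse=True,
)
```

In this made-up example the smaller edition "yy" ranks first, because its article draws more views relative to the edition's overall traffic, even though its raw count is lower.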

The paper's detailed bibliometric studies contain many observations on particular countries or regions, e.g. "Among the countries holding less than 100 titles in their combined entire countrywide library system, we find countries where considerable numbers of Jews were sent to the Nazi German death camps".

While praising Wikipedia as an information source with more even coverage (across languages or countries), the author still notes that "compared to the presumed size of the Wikipedia user community [i.e. total pageview numbers], the Portuguese, Spanish, German, Italian, Persian, and French speaking Wikipedia users had a higher tendency to download the main Shoah Wikipedia article. Results for the Wikipedia downloads in Japanese, Turkish, Russian, Chinese, Swedish, Polish, Korean, Ukrainian, Czech, Finnish, English, Indonesian, Arabic, and Dutch (in descending order) were below the trend." The study similarly examines the article about Israel, observing e.g. that "With 844 daily downloads of the Israel article in Persian and 1,254 daily downloads of the Israel article in Arabic, a certain presence of the theme of Israel among Wikipedia audiences in the Middle East has now been achieved."
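One way to read "above" or "below the trend" is as the sign of a residual from a fitted line relating article pageviews to total pageviews. The summary here does not specify the paper's exact model, so the sketch below assumes a least-squares line through the origin purely for illustration; the function name and the sample values are hypothetical.

```python
from typing import Dict, Tuple


def classify_against_trend(
    views: Dict[str, Tuple[float, float]],
) -> Dict[str, str]:
    """Label each language 'above' or 'below' a fitted trend line.

    `views` maps a language code to (total_pageviews, article_pageviews).
    The trend is a least-squares line through the origin (an assumption).
    """
    sum_xy = sum(total * article for total, article in views.values())
    sum_xx = sum(total * total for total, _ in views.values())
    if sum_xx == 0:
        raise ValueError("need at least one language with non-zero total pageviews")
    slope = sum_xy / sum_xx

    labels: Dict[str, str] = {}
    for lang, (total, article) in views.items():
        expected = slope * total
        # Values exactly on the line are counted as 'below' in this sketch.
        labels[lang] = "above" if article > expected else "below"
    return labels


# Hypothetical placeholder values (NOT figures from the paper).
sample = {"aa": (100.0, 10.0), "bb": (200.0, 10.0)}
labels = classify_against_trend(sample)
```

With these placeholder values, "aa" sits above the fitted line and "bb" below it, since both have the same article pageviews but "bb" has twice the total traffic.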

The article was published in a journal of the Jerusalem Center for Public Affairs, a think-tank which (according to the English Wikipedia article about it) "is considered to be politically neo-conservative". That said, few of the author's previous publications appear to have focused on topics related to the Holocaust or the Arab-Israeli conflict.

Libraries and their biases are the main focus of the paper, with the Wikipedia-related results occupying a smaller part. Still, the library findings are of interest to Wikimedians and Wikipedia researchers as well, for example as evidence for possible risks in GLAM-WIKI collaborations, where the biases and political constraints of such cultural institutions might negatively affect Wikipedia's efforts to achieve a neutral point of view.


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"NwQM: A neural quality assessment framework for Wikipedia"

From the abstract:[2]

"In this paper we propose Neural wikipedia Quality Monitor (NwQM), a novel deep learning model which accumulates signals from several key information sources such as article text, meta data and images to obtain improved Wikipedia article representation. We present comparison of our approach against a plethora of available solutions and show 8% improvement over state-of-the-art approaches with detailed ablation studies."

"Evidence of a mostly productive and continuous effort to improve the quality of references" on English Wikipedia

From the abstract:[3]

"... we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. [...] We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase of reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, ArXiv ID), and (2) most of the reference curation work is done by registered humans (not bots or anonymous editors)."

"The network structure of scientific revolutions"

From the abstract:[4]

"Philosophers of science have long postulated how collective scientific knowledge grows. Empirical validation has been challenging due to limitations in collecting and systematizing large historical records. Here, we capitalize on the largest online encyclopedia to formulate knowledge as growing networks of articles and their hyperlinked inter-relations. We demonstrate that concept networks grow not by expanding from their core but rather by creating and filling knowledge gaps, a process which produces discoveries that are more frequently awarded Nobel prizes than others. Moreover, we operationalize paradigms as network modules to reveal a temporal signature in structural stability across scientific subjects."

"Using logical constraints to validate information in collaborative knowledge graphs: a study of COVID-19 on Wikidata"

From the abstract:[5]

"we catalog the rules describing relational and statistical COVID-19 epidemiological data and implement them in SPARQL, a query language for semantic databases. We demonstrate the efficiency of our methods to evaluate structured information, particularly COVID-19 knowledge in Wikidata, and consequently in collaborative ontologies and knowledge graphs, and we show the advantages and drawbacks of our proposed approach by comparing it to other methods for validation of linked web data."

"PNEL: Pointer Network based End-To-End Entity Linking over Knowledge Graphs"

From the abstract:[6]

"Question Answering systems are generally modelled as a pipeline consisting of a sequence of steps. In such a pipeline, Entity Linking (EL) is often the first step. [...] In this work we present a novel approach to end-to-end EL by applying the popular Pointer Network model, which achieves competitive performance. We demonstrate this in our evaluation over three datasets on the Wikidata Knowledge Graph."

"A decade of writing on Wikipedia: A comparative study of three articles"

From the abstract:[7]

"This article reports what observable writing activities characterized three Wikipedia articles, archive, design, and writing, over a three-year period from 2012–2014. It then compares these results to writing in these same three articles 10 years earlier, from 2002–2004. Results show that articles were longer and more referenced in 2012–2014. The most frequent written contributions in 2012–2014 were adding and deleting content, followed by vandalizing and reverting vandalism. Ten years earlier, content addition was likewise the most frequent activity, though vandalism and its removal were not found."


  1. ^ Tausch, Arno (2020). "The Political Geography of Shoah Knowledge and Awareness, Estimated from the Analysis of Global Library Catalogues and Wikipedia User Statistics". Jewish Political Studies Review. 31 (1/2): 7–123. ISSN 0792-335X. JSTOR 26870790.
  2. ^ Reddy, Bhanu Prakash; Bhusan, Sasi; Sarkar, Soumya; Mukherjee, Animesh (2020-10-14). "NwQM: A neural quality assessment framework for Wikipedia". arXiv:2010.06969 [cs.SI].
  3. ^ Zagovora, Olga; Ulloa, Roberto; Weller, Katrin; Flöck, Fabian (2020-10-06). "'I Updated the <ref>': The Evolution of References in the English Wikipedia and the Implications for Altmetrics". arXiv:2010.03083 [cs.CY].
  4. ^ Ju, Harang; Zhou, Dale; Blevins, Ann S.; Lydon-Staley, David M.; Kaplan, Judith; Tuma, Julio R.; Bassett, Danielle S. (2020-10-16). "The network structure of scientific revolutions". arXiv:2010.08381 [cs.DL].
  5. ^ Turki, Houcemeddine; Jemielniak, Dariusz; Hadj Taieb, Mohamed Ali; Labra Gayo, Jose Emilio; Ben Aouicha, Mohamed; Banat, Mus'ab; Shafee, Thomas; Prud'Hommeaux, Eric; Lubiana, Tiago; Das, Diptanshu; Mietchen, Daniel (2020-08-30). "Using logical constraints to validate information in collaborative knowledge graphs: a study of COVID-19 on Wikidata". doi:10.5281/zenodo.4008359.
  6. ^ Banerjee, Debayan; Chaudhuri, Debanjan; Dubey, Mohnish; Lehmann, Jens (2020-08-31). "PNEL: Pointer Network based End-To-End Entity Linking over Knowledge Graphs". arXiv:2009.00106 [cs.CL].
  7. ^ Purdy, James P. (2020-08-03). "A decade of writing on Wikipedia: A comparative study of three articles". First Monday. doi:10.5210/fm.v25i9.10857. ISSN 1396-0466.


Discuss this story


I think "Shoah", the Hebrew term for "The Holocaust", is a lot more obscure in the English-speaking world than, well, "The Holocaust". Obviously, there's nothing whatsoever wrong with calling it "Shoah", but it's probably best to gloss it at the first use. The research is interesting, but the presentation seems a little reader-unfriendly. Adam Cuerden (talk) Has about 7.7% of all FPs 01:31, 2 December 2020 (UTC)

Point taken, I should probably have used "holocaust" in the headline and maybe added an illustration or two. (I spent less time than envisaged on writing up this month's "Recent research" issue, having decided to cover a different topic over at "News and notes" shortly before the publication deadline.) Regards, HaeB (talk) 04:08, 2 December 2020 (UTC)
I can't argue much with that. I'm still working on an article that's been delayed twice. Adam Cuerden (talk) Has about 7.7% of all FPs 17:36, 2 December 2020 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0