The Signpost

Recent research

10%–30% of Wikipedia’s contributors have subject-matter expertise

By Tilman Bayer and Miriam Redi

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Generating Architectural Landmark Descriptions" from Wikipedia, DBpedia and image analysis

This paper[1] describes a process to automatically generate descriptions of architectural landmarks using Wikipedia article text, Wikimedia Commons images, and DBpedia triples. The paper lists some examples of how this approach can go awry (see below), noting that "these descriptions cannot compete, in general, with more comprehensive well-written descriptions as encountered in Wikipedia. Still, it needs to be taken account that by far not all architectural landmarks that are of interest from the professional or cultural viewpoint are covered by Wikipedia. Fused content descriptions are then a welcomed solution".

Not actually a windmill in a zen garden: The Christ the Redeemer statue in Rio de Janeiro, Brazil
Evaluation of autogenerated text for Christ the Redeemer (Table 10 from the paper)
Wikipedia (human)
Christ the Redeemer is an Art Deco statue of Jesus Christ in Rio de Janeiro, Brazil, created by French sculptor Paul Landowski and built by Brazilian engineer Heitor da Silva Costa, in collaboration with French engineer Albert Caquot. Romanian sculptor Gheorghe Leonida fashioned the face. Constructed between 1922 and 1931, the statue is 30 metres (98 ft) high, excluding its 8-metre (26 ft) pedestal. The arms stretch 28 metres (92 ft) wide.
Christ the Redeemer (statue), which was built of Soapstone, is a Statue in Brazil.
Fused [with descriptions based on image recognition, incorrect content in red]
Christ the Redeemer (statue), which was built of Soapstone, is a statue in a zen garden environment in Brazil. Its architectural style is Hellinistic. Christ the Redeemer (statue) has similarities with a windmill and a beach house. There is an elevator shaft in it.
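To illustrate how a wrong image label can end up in a fused description like the one above, here is a minimal sketch of template-based text generation from structured facts. It is not the paper's pipeline: the function `describe`, the fact keys (`material`, `type`, `country`) and the `image_labels` parameter are hypothetical names chosen for this example only.

```python
def describe(name, facts, image_labels=()):
    """Render a one-sentence description from a dict of facts.

    `facts` stands in for data extracted from triples; `image_labels`
    stands in for scene labels from an image classifier. Both the keys
    and the template are illustrative assumptions, not the paper's.
    """
    sentence = name
    material = facts.get("material")
    if material:
        sentence += f", which was built of {material},"
    sentence += f" is a {facts.get('type', 'landmark')}"
    if image_labels:
        # The first label is inserted verbatim, so a misclassified
        # scene goes straight into the generated text.
        sentence += f" in a {image_labels[0]} environment"
    country = facts.get("country")
    if country:
        sentence += f" in {country}"
    return sentence + "."


facts = {"material": "Soapstone", "type": "statue", "country": "Brazil"}
print(describe("Christ the Redeemer (statue)", facts))
print(describe("Christ the Redeemer (statue)", facts, image_labels=("zen garden",)))
```

Because the template trusts its inputs, the quality of the output depends entirely on the quality of the triples and image labels fed into it.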


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"10%–30% of Wikipedia’s contributors have substantial subject-matter expertise"

From the abstract:[2]

"we carefully crossed information from individual Wikipedia editor pages with external sources such as Google Scholar to reliably identify editors who are credentialed experts. Matching these credentialed experts with their Wikipedia editing patterns, we used this dataset to train a machine learning classifier that we then employed to identify additional expert editors and assess the nature and the scope of their work across Wikipedia. Our results suggest that the scope of expert involvement is substantial, albeit with considerable differences across topics. We estimate that approximately 10%–30% of Wikipedia’s contributors have substantial subject-matter expertise in the topics that they edit."

See also coverage of an earlier conference presentation: "Evidence of Dark Matter: Assessing the Contribution of Subject-matter Experts to Wikipedia"

Wikidata lexemes still lack multilingual links

A paper from last year's "7th Workshop on Linked Data in Linguistics"[3] presented descriptive statistics about the lexemes of Wikidata, showing that there were still relatively few multilingual links as of 2020 (i.e. around two years after the project's launch).

Wikidata's "sustainable integration into library operations remains a challenge"

From the abstract:[4]

"The review revealed that Wikidata in libraries is generally described as an open and reusable knowledgebase of structured data capable of linking local metadata with a network of global metadata. Libraries have started experimenting with Wikidata to improve the global reach and access of their unique and prominent collections and scholars. While Wikidata holds great potential to become the repository choice for authority data disambiguation and linking, its sustainable integration into library operations remains a challenge."

"On Altpedias: partisan epistemics in the encyclopaedias of alternative facts*"

From the abstract:[5]

"We consider a selection of Altpedias that reject Wikipedia’s celebrated ‘neutral point of view’ as an artefact of liberal consensus politics whilst regarding their own epistemics as inherently partisan. As opposed to disregarding objectivity or truth, Altpedias’ ‘alternative facts’ may thus be understood as the product of competing normative standpoints concerning the use value of knowledge. In competing with Wikipedia, Altpedias ultimately attempt to give their partisan viewpoints universal standards, both in tone and in their very nature as wiki platforms. Empirically, the article uses visual network analysis and natural language processing in order to represent the vernacular worldviews of several far- and extreme-right Altpedias: Metapedia, Infogalactic and Rightpedia. Theoretically, the article frames these Altpedias’ fractious approach to the study of knowledge in relation to Lyotard’s ‘general agonistic’ and his speculations concerning the impact of computation on epistemics in the postmodern condition."


  1. ^ Mille, Simon; Symeonidis, Spyridon; Rousi, Maria; Felipe, Montserrat; Stavrothanasopoulos, Klearchos; Alvanitopoulos, Petros; Carlini, Roberto; Grivolla, Jens; Meditskos, Georgios; Vrochidis, Stefanos; Wanner, Leo. A Case Study of NLG from Multimedia Data Sources: Generating Architectural Landmark Descriptions (PDF). 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+). Dublin, Ireland (Virtual).
  2. ^ Yarovoy, Alex; Nagar, Yiftach; Minkov, Einat; Arazy, Ofer (2020-10-16). "Assessing the Contribution of Subject-matter Experts to Wikipedia". ACM Transactions on Social Computing. 3 (4): 21:1–21:36. doi:10.1145/3416853. ISSN 2469-7818.
  3. ^ Nielsen, Finn Årup. "Lexemes in Wikidata: 2020 status" (PDF). Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020). pp. 82–86.
  4. ^ Tharani, Karim (2021-03-01). "Much more than a mere technology: A systematic review of Wikidata in libraries". The Journal of Academic Librarianship. 47 (2): 102326. doi:10.1016/j.acalib.2021.102326. ISSN 0099-1333.
  5. ^ Keulenaar, Emillie V. de; Tuters, Marc; Kisjes, Ivan; Beelen, Kaspar (2019-07-11). "On Altpedias: partisan epistemics in the encyclopaedias of alternative facts*". Artnodes (24).



The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0