A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
See also the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
From the abstract and the discussion section:[1]
"New disease outbreaks [e.g. Ebola, MERS, Swine influenza] are often characterized by emergent and changing information which, in turn, require Wikipedia editors to spend time and effort to retrieve and understand information that is sometimes ambiguous, complex, and contradictory. [...] the goals of this study are to identify types of uncertainty expressed by Wikipedia editors during new disease outbreaks, and examine different strategies deployed by Wikipedia editors to manage uncertainty. [...]
Wikipedia editors depend on several strategies to cope with uncertainty during a disease outbreak. These strategies rely primarily on consulting authoritative sources, reporting the uncertainty to the public, ignoring the uncertainty in the interests of maintaining simplicity, and, to a far lesser extent, setting up a mailing list to gather information and science as they emerge over time."
From the abstract:[2]
"we show that machine learning with natural language processing can accurately forecast the outcomes of group decision-making in online discussions. Specifically, we study Articles for Deletion, a Wikipedia forum for determining which content should be included on the site. Applying this model, we replicate several findings from prior work on the factors that predict debate outcomes; we then extend this prior work and present new avenues for study, particularly in the use of policy citation during discussion. Alongside these findings, we introduce a structured corpus and source code for analyzing over 400,000 deletion debates spanning Wikipedia's history."
From the abstract and discussion section:[3]
"Incorporating ideas into Wikipedia leads to those ideas being used more in the scientific literature. We provide correlational evidence of this across thousands of Wikipedia articles and causal evidence of it through a randomized control trial where we add new scientific content to Wikipedia. In the months after uploading it, an average new Wikipedia article in Chemistry is read tens of thousands of times and causes changes to hundreds of related scientific journal articles. Patterns in these changes suggest that Wikipedia articles are used as review articles, summarizing an area of science and highlighting the research contributions to it. Consistent with this reference article view, we find causal evidence that when scientific articles are added as references to Wikipedia, those articles accrue more academic citations. [...]
For each Wikipedia article that we created for this experiment we paid students $100. Assuming one Wikipedia article (or equivalent contribution) per research paper, the implicit tax on research would be ($100/$220,000) = 0.05%. [...] even with many conservative assumptions, dissemination through Wikipedia is ∼120× more cost-effective than traditional dissemination techniques."
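The implicit-tax figure in the quoted passage is a simple ratio. A minimal sketch of that arithmetic, taking the $100 payment and the $220,000 per-paper cost directly from the quote (the variable names are illustrative, not the authors'):

```python
# Figures taken from the quoted passage; names are illustrative only.
cost_per_wikipedia_article = 100      # USD paid to students per article
cost_per_research_paper = 220_000     # USD, per-paper cost used in the quote

# Implicit "tax" on research: share of a paper's cost spent on one article.
implicit_tax = cost_per_wikipedia_article / cost_per_research_paper

# Express as a percentage, rounded to two decimals as in the quote.
implicit_tax_percent = round(implicit_tax * 100, 2)
print(f"Implicit tax: {implicit_tax_percent}%")
```

Unrounded, the ratio is roughly 0.045%, which the paper reports as 0.05%.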
This research caused community discussions that ultimately led to the creation of a "Wikipedia is not a laboratory" policy on the English Wikipedia.
From the abstract:[4]
"The data examined consist of Wikipedia contributors' debates that took place on a Wikipedia discussion site ('talk page'). Taking a corpus-assisted approach combined with argumentation analysis and aspects of systemic functional linguistics, I found that Wikipedia editors repeatedly propose that Nazi Germany might have been a precursor of the EU today. However, the Wikipedia community ultimately rejects this notion and emphasises the voluntary nature guiding the EU's creation process. Thus, while the EU's legitimacy is indeed contested in the course of the debates, the Wikipedia community eventually rejects this challenge."
From the abstract:[5]
"Drawing on systems justification theory and methods for measuring the enthusiasm gap among voters, this paper quantitatively analyzes the candidates’ biographical and related articles and their editors. Information production and consumption patterns match major events over the course of the campaign, but Trump-related articles show consistently higher levels of engagement than Clinton-related articles."
From the tool documentation and abstract:[6]
"Wikipedia2Vec is a tool for learning embeddings of words and entities from Wikipedia. The learned embeddings map similar words and entities close to one another in a continuous vector space.
This tool learns embeddings of words and entities by iterating over entire Wikipedia pages and jointly optimizing the following three submodels:
- Wikipedia link graph model, which learns entity embeddings by predicting neighboring entities in Wikipedia's link graph [...]
- Word-based skip-gram model, which learns word embeddings by predicting neighboring words given each word in a text contained on a Wikipedia page.
- Anchor context model, which aims to place similar words and entities near one another in the vector space [...]
The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. [...] We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings."
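The word-based skip-gram submodel mentioned in the tool documentation trains on (word, neighboring word) pairs. A minimal sketch of how such pairs can be generated from a tokenized sentence follows; it is not Wikipedia2Vec's own code, and the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and each neighbor
    that lies within `window` positions of it (hypothetical helper)."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Illustrative sentence, not taken from the paper.
tokens = "wikipedia is a free encyclopedia".split()
pairs = skipgram_pairs(tokens, window=1)
for target, context in pairs:
    print(target, "->", context)
```

A skip-gram model would then learn embeddings by predicting each context word from its target; the link graph submodel applies the same idea to neighboring entities instead of neighboring words.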
From the abstract:[7]
"[...] we provide an overview over [...] recent advancements [in question answering research], focusing on neural network based question answering systems over knowledge graphs [including "the most popular KGQA datasets": 8 based on Freebase, 2 on DBpedia, one on DBpedia and Wikidata]. We introduce readers to the challenges in the tasks, current paradigms of approaches, discuss notable advancements, and outline the emerging trends in the field."
From the abstract:[8]
"Online encyclopediae like Wikipedia contain large amounts of text that need frequent corrections and updates. The new information may contradict existing content [...] we focus on rewriting such dynamically changing articles. [...] To this end, we propose a two-step solution: (1) We identify and remove the contradicting components in a target text for a given claim, using a neutralizing stance model; (2) We expand the remaining text to be consistent with the given claim, using a novel two-encoder sequence-to-sequence model with copy attention. Applied to a Wikipedia fact update dataset, our method successfully generates updated sentences for new claims [...]"
See also university press release: "Automated system can rewrite outdated sentences in Wikipedia articles" ("Text-generating tool pinpoints and replaces specific information in sentences while retaining humanlike grammar and style") and media coverage.
This preprint[9] presents a query-focused summarization dataset using Wikipedia's citations to align queries and documents.
This summary of the journey of knowledge graphs for Artificial Intelligence[10] also covers Wikidata:
"Wikidata (wikidata.org/) is wikipedia’s open-source machine-readable database with millions of entities where everyone can contribute and use (with reading and editing permissions) with a user-friendly query interface.
It covers a wide variety of domains and contains not only textual knowledge but also images, geocoordinates, and numerics. Wikidata uses unique identifiers for each entity/ relation for accurate querying and provides provenance metadata, unlike DBpedia and schema.org. For instance, it includes information about a fact’s correctness in terms of its origin and temporal validity (reference point of time during of the fact). Wikidata is one of the latest projects acknowledging the dynamic nature of KG and is continuously updated by human contributors unlike DBpedia which is curated from wikipedia once in a while."
From the abstract:[11]
"Based on action research with a mixed evaluation method and two rounds of interviews, the research followed the steps of 27 Israeli women activists who participated in editing workshops.
Findings: [...] having the will to edit and the knowledge of how to edit are necessary but insufficient conditions for women to participate in Wikipedia. The finding reveals two categories: pre-editing barriers of negative reputation, lack of recognition, anonymity and fear of being erased; and post-editing barriers of experiences of rejection, alienation, lack of time and profit and ownership of knowledge. The research suggests a “Vicious Circle” model, displaying how the five layers of negative reputation, anonymity, fear, alienation and rejection enhance each other, in a manner that deters women from contributing to the website."
Discuss this story
A very interesting discussion with good points on all sides. For me, the salient point is that Wikipedia combines an increasingly strict insistence on quality (citations, neutrality, non-commerciality, copyright, and all the rest) with rather little in the way of training and apprenticeship for newcomers. Helping even an able, willing, and co-operative newbie into editing an area effectively is quite a lot of work, specially for the first week or two, and such coaching requires expertise, energy and teaching skill, all quantities in quite short supply. But throwing newcomers straight into the rigours of editing live articles with no training seems increasingly drastic; it was alarming being a newbie over a decade ago, and now it's certainly worse. Other measures than apprenticeships and coaching are imaginable: we could encourage people to take an online tutorial; we could have a pop-up box asking new editors to make their suggestion on an article's talk page until they get the hang of things; there could be an automated 20-questions test so newcomers could see what skills they needed to get started; and so on. And of course, a safe place for new female editors in particular would be very welcome. Chiswick Chap (talk) 09:01, 2 April 2020 (UTC)
Science Is Shaped by Wikipedia: Evidence From a Randomized Control Trial
This research is from 2017, as are the linked discussions. I don't understand why it is listed in . —andrybak (talk) 08:40, 22 April 2020 (UTC)