A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
The "clean Wehrmacht" battle covered in the past three issues of The Signpost (May, June, July) is reviewed from a historian's perspective in The Journal of Slavic Military Studies.[1] The title of the paper is an allusion to Lost Victories, today generally accepted as an unreliable and apologetic account of the actions of German forces during World War II. The author, David Stahel, who states that he is not a Wikipedia editor, examines the behind-the-scenes mechanisms and debates that result in article content, with the observation that these debates are not consistent with "consensus among serious historians" and "many people (and in my experience students) invest [Wikipedia] with a degree of objectivity and trust that, at least on topics related to the Wehrmacht, can at times be grossly misplaced...articles on the Wehrmacht (in English Wikipedia) might struggle to meet [the standard]". The author describes questionable arguments raised by several of the pro-Wehrmacht editors and concludes their writing "may in some instances reflect extremist views or romantic notions not grounded in the historiography".
Several recent publications tackle the problem of taking machine-readable factual statements about a notable person, such as their date of birth from the Wikidata item about them, and creating a biographical summary in natural language.
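For readers unfamiliar with Wikidata's data model, the following minimal Python sketch (not taken from any of the papers discussed below; the item Q42 and the use of the public EntityData endpoint are simply illustrative choices) shows how such a machine-readable fact, here the date of birth (property P569), can be retrieved for a given item:

```python
import requests

# Fetch the full entity data for a Wikidata item (Q42 = Douglas Adams,
# chosen purely as an example) from the public EntityData endpoint.
ITEM = "Q42"
url = f"https://www.wikidata.org/wiki/Special:EntityData/{ITEM}.json"
entity = requests.get(url, timeout=30).json()["entities"][ITEM]

# Wikidata stores facts as "claims" keyed by property ID; P569 is "date of birth".
for claim in entity["claims"].get("P569", []):
    value = claim["mainsnak"]["datavalue"]["value"]
    # Dates come back in an ISO-like format such as "+1952-03-11T00:00:00Z".
    print("date of birth:", value["time"])

# The English label of the item itself:
print("label:", entity["labels"]["en"]["value"])
```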
A paper[2] by three researchers from Australia reports on using an artificial intelligence approach for "the generation of one-sentence Wikipedia biographies from facts derived from Wikidata slot-value pairs". These are modeled after the first sentences of biographical Wikipedia articles, which, the authors argue, are of particular value because they form "clear and concise biographical summaries". The task of generating them involves making decisions about which of the facts to include (e.g. the date of birth or a political party that the subject is a member of), and arranging them into a natural language sentence. To achieve this in an automated fashion, the authors trained a recurrent neural network (RNN) implemented in TensorFlow on a corpus of several hundred thousand introductory sentences extracted from English Wikipedia articles about humans, together with the corresponding Wikidata entries. (Although not mentioned in the paper, such first sentences are the subject of a community guideline on the English Wikipedia, at least some aspects of which one might expect the neural network to reconstruct from the corpus.)
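The authors' own model is not reproduced here; the sketch below is only a much-simplified illustration of the general idea of mapping linearized slot-value pairs to a sentence with an encoder-decoder RNN in TensorFlow. The vocabulary size, the linearization scheme, and the toy facts are assumptions made for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB = 20_000   # toy vocabulary size, not the paper's
EMB, HID = 128, 256

def linearize(facts):
    """Turn Wikidata-style (slot, value) pairs into one flat token sequence,
    marking each slot with a special token, e.g.
    [("TITLE", "robert cortner"), ("DATE_OF_BIRTH", "april 16 , 1927")]."""
    tokens = []
    for slot, value in facts:
        tokens.append(f"<{slot}>")
        tokens.extend(value.lower().split())
    return tokens

# Encoder: reads the linearized facts and summarizes them in a single state.
enc_in = layers.Input(shape=(None,), dtype="int32", name="facts")
enc_emb = layers.Embedding(VOCAB, EMB, mask_zero=True)(enc_in)
_, enc_state = layers.GRU(HID, return_state=True)(enc_emb)

# Decoder: generates the biography sentence token by token (teacher forcing).
dec_in = layers.Input(shape=(None,), dtype="int32", name="sentence_so_far")
dec_emb = layers.Embedding(VOCAB, EMB, mask_zero=True)(dec_in)
dec_seq = layers.GRU(HID, return_sequences=True)(dec_emb, initial_state=enc_state)
logits = layers.Dense(VOCAB)(dec_seq)

model = Model([enc_in, dec_in], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.summary()
```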
An example of the algorithm's output compared with the Wikipedia original (excerpted from Table 5 in the paper):
Source | Text |
---|---|
Wikipedia original | robert charles cortner ( april 16 , 1927 – may 19 , 1959 ) was an american automobile racing driver from redlands , california . |
Algorithm variant "S2S" | bob cortner ( april 16 , 1927 |
Algorithm variant "S2S+AE" | robert cortner ( april 16 , 1927 – may 19 , 1959 ) was an american race-car driver . |
The quality of the algorithm's output (in several variants) was evaluated against the actual human-written sentences from Wikipedia (as the "gold standard") with a standard automated test (BLEU), but also by human readers recruited from CrowdFlower. This "human preference evaluation suggests the model is nearly as good as the Wikipedia reference", with the consensus of the human raters even preferring the neural network's version 40% of the time. However, those of the algorithm's variants that are allowed to infer facts not directly stated in the Wikidata item can suffer from the problem of AI "hallucinations", e.g. claiming that Bob Cortner was a boxer instead of a race-car driver, or that he died in 2005 instead of 1959.
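For readers unfamiliar with BLEU, the metric simply measures n-gram overlap between a candidate sentence and a reference. A minimal example using NLTK (the paper's exact evaluation configuration is not reproduced here, and the smoothing choice is an assumption) might look like this:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ("robert charles cortner ( april 16 , 1927 – may 19 , 1959 ) was an "
             "american automobile racing driver from redlands , california .").split()
candidate = ("robert cortner ( april 16 , 1927 – may 19 , 1959 ) was an "
             "american race-car driver .").split()

# Smoothing avoids zero scores when some higher-order n-grams never match.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```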
Apart from describing and evaluating the algorithm, the paper also provides some results about Wikipedia itself, e.g. showing which biographical facts are most frequently used by Wikipedia editors. Table 1 from the paper lists "the top fifteen slots across entities used for input, and the % of time the value is a substring in the entity’s first sentence" in the examined corpus:
Fact | Count | % |
---|---|---|
TITLE (name) | 1,011,682 | 98 |
SEX OR GENDER | 1,007,575 | 0 |
DATE OF BIRTH | 817,942 | 88 |
OCCUPATION | 720,080 | 67 |
CITIZENSHIP | 663,707 | 52 |
DATE OF DEATH | 346,168 | 86 |
PLACE OF BIRTH | 298,374 | 25 |
EDUCATED AT | 141,334 | 32 |
SPORTS TEAM | 108,222 | 29 |
PLACE OF DEATH | 107,188 | 17 |
POSITION HELD | 87,656 | 75 |
PARTICIPANT OF | 77,795 | 23 |
POLITICAL PARTY | 74,371 | 49 |
AWARD RECEIVED | 67,930 | 44 |
SPORT | 36,950 | 72 |
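The "%" column can be read as the share of biographies in which the slot's value appears verbatim in the article's first sentence. A rough sketch of how such a coverage statistic might be computed follows; this is not the authors' code, and the toy data is hypothetical.

```python
from collections import defaultdict

def slot_coverage(examples):
    """examples: list of (facts, first_sentence) pairs, where facts is a
    list of (slot, value) tuples and first_sentence is the article's
    introductory sentence as plain text."""
    counts = defaultdict(int)    # how often each slot occurs at all
    matches = defaultdict(int)   # how often its value appears in the sentence
    for facts, sentence in examples:
        for slot, value in facts:
            counts[slot] += 1
            if value.lower() in sentence.lower():
                matches[slot] += 1
    return {slot: 100.0 * matches[slot] / counts[slot] for slot in counts}

# Hypothetical toy example:
examples = [
    ([("TITLE", "Robert Cortner"), ("OCCUPATION", "racing driver")],
     "robert cortner was an american automobile racing driver ."),
]
print(slot_coverage(examples))
```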
The paper's literature review mentions a 2016 paper titled "Neural Text Generation from Structured Data with Application to the Biography Domain"[3] as "the closest work to ours with a similar task using Wikipedia infoboxes in place of Wikidata. They condition an attentional neural language model (NLM) on local and global properties of infobox tables [...] They use 723k sentences from Wikipedia articles with 403k lower-cased words mapping to 1,740 distinct facts".
While the authors of both papers commendably make at least some of their code and data available on GitHub (1, 2), they do not seem to have aimed to make their algorithms into a tool for generating text for use in Wikipedia itself – perhaps wisely so, as previous efforts in this direction have met with community opposition due to quality concerns (e.g. in the case of a paper we covered previously here: "Bot detects theatre play scripts on the web and writes Wikipedia articles about them").
In the third, most recent research effort, covered in several publications,[4][5][6] another group of researchers likewise developed a method to automatically generate summaries of Wikipedia article topics via a neural network, based on structured data from Wikidata (and, in one variant, DBpedia).
They directly worked with community members from two small Wikipedias (Arabic and Esperanto) to evaluate "not only the quality of the generated text, but also the usefulness of our end-system to any underserved Wikipedia version" when extending the existing ArticlePlaceholder feature that is in use on some of these smaller Wikipedias. The result was that "members of the targeted language communities rank our text close to the expected quality standards of Wikipedia, and are likely to consider the generated text as part of Wikipedia. Lastly, we found that the editors are likely to reuse a large portion of the generated summaries [when writing actual Wikipedia articles], thus emphasizing the usefulness of our approach to its intended audience."
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
From the accompanying blog post: "The system was a gamified tutorial for new Wikipedia editors. Working with the tutorial creators, we conducted both a survey of its users and a randomized field experiment testing its effectiveness in encouraging subsequent contributions. We found that although users loved it, it did not affect subsequent participation rates."
See also: research project page on Meta-wiki, podcast interview, podcast coverage, Wikimedia Research Showcase presentation
Two papers by the same team of researchers explore this topic for Wikipedia editors and readers, respectively:
From the paper:
Study 1: This study investigated whether events in Wikipedia articles are represented as more likely in retrospect. For a total of 33 events, we retrieved article versions from the German Wikipedia that existed prior to the event (foresight) or after the event had happened (hindsight) and assessed indicators of hindsight bias in those articles [...] we determined the number of words of the categories "cause" (containing words such as "hence"), "certainty" (e.g., "always"), tentativeness (e.g., "maybe"), "insight" (e.g., "consider"), and "discrepancy" (e.g., "should"), because the hindsight perspective is assumed to be the result of successful causal modeling [...] There was an increase in the proportion of hindsight-related words across article versions. [...] We investigated whether there is evidence for hindsight distortions in Wikipedia articles or whether Wikipedia’s guidelines effectively prevent hindsight bias to occur. Our study provides empirical evidence for both.
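Methodologically, these word-category counts amount to dictionary-based text analysis over article versions from before and after an event. The sketch below illustrates the general approach only; the mini word lists are made up for illustration and are not the validated dictionaries used in the study.

```python
import re

# Illustrative (not the study's actual) word lists for two of the categories.
CATEGORIES = {
    "certainty":     {"always", "never", "certainly", "inevitable"},
    "tentativeness": {"maybe", "perhaps", "possibly", "might"},
}

def category_proportions(article_text):
    """Return, per category, the proportion of words falling in that word list."""
    words = re.findall(r"[a-zäöüß]+", article_text.lower())
    total = len(words) or 1
    return {cat: sum(w in wordlist for w in words) / total
            for cat, wordlist in CATEGORIES.items()}

# Hypothetical foresight vs. hindsight snippets:
foresight_version = "the plant might possibly withstand a strong earthquake"
hindsight_version = "the disaster was certainly inevitable given the known risks"
for label, text in [("foresight", foresight_version), ("hindsight", hindsight_version)]:
    print(label, category_proportions(text))
```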
From the abstract: "We report two studies with Wikipedia articles and samples from different cultures (Study 1: Germany, Singapore, USA, Vietnam, Japan, Sweden, N = 446; Study 2: USA, Vietnam, N = 144). Participants read one of two article versions (foresight and hindsight) about the Fukushima Nuclear Plant and estimated the likelihood, inevitability, and foreseeability of the nuclear disaster. Reading the hindsight article increased individuals' hindsight bias independently of analytic or holistic thinking style. "
From the abstract:[10] "...we introduce a new Wikipedia based collection specific for non-factoid answer passage retrieval containing thousands of questions with annotated answers and show benchmark results on a variety of state of the art neural architectures and retrieval models."
From the abstract:[11] "This paper gives comprehensive analyses of corpora based on Wikipedia for several tasks in question answering. Four recent corpora are collected, WikiQA, SelQA, SQuAD, and InfoQA, and first analyzed intrinsically by contextual similarities, question types, and answer categories. These corpora are then analyzed extrinsically by three question answering tasks, answer retrieval, selection, and triggering."
From the abstract:[12] "We study the task of generating from Wikipedia articles question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. [...] We apply our system [...] to the 10,000 top-ranking Wikipedia articles and create a corpus of over one million question-answer pairs."
From the abstract:[13] "We first introduce a new approach for translating natural language questions to SPARQL queries. It is able to query several KBs [knowledge bases] simultaneously, in different languages, and can easily be ported to other KBs and languages. In our evaluation, the impact of our approach is proven using 5 different well-known and large KBs: Wikidata, DBpedia, MusicBrainz, DBLP and Freebase as well as 5 different languages namely English, German, French, Italian and Spanish." Online demo: https://wdaqua-frontend.univ-st-etienne.fr/
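To give a sense of what such question-to-SPARQL translation targets, a question like "Where was Douglas Adams born?" could map to a query against the Wikidata endpoint similar to the one below. The query and Python call are illustrative only, not output from the authors' system.

```python
import requests

# Illustrative SPARQL roughly corresponding to "Where was Douglas Adams born?"
# wd:Q42 = Douglas Adams, wdt:P19 = place of birth.
QUERY = """
SELECT ?placeLabel WHERE {
  wd:Q42 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "example-script/0.1"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["placeLabel"]["value"])
```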
Discuss this story
FWIW, I would think it should be worth mentioning that a passage Stahel complains about having been cut, at the bottom of page 398, cited his own work as a source? He may have less disinterested motives in writing this than it seems. Daniel Case (talk) 20:51, 2 September 2018 (UTC)[reply]