A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
In this research, presented at last month's OpenSym conference, Balderas and colleagues experimented with a MediaWiki wiki for British university students in a computer science course to turn in and collaborate on written assignments. They were interested in developing a pedagogical intervention to combat procrastination by students, which they describe as an "ethically questionable" behavior.
They created a new version of a six-week software engineering course with the goal of reducing procrastination and evaluated it as an experiment, comparing the time management and grade performance of students between two consecutive years of the course. In the first year, the students were to turn in their assignments all at once at the end of the course and were free to use whatever software they wished. In the second year, the students were trained in MediaWiki, used it to complete weekly assignments, and were penalized for not finishing the work on time. This second group of students procrastinated less (in the first year 16% of students handed in late work, compared to only 4% in the second year) and achieved better grades (in the first year many more students received a 'B' than an 'A', but the opposite was true in the second year).
I think this study achieved its goal of demonstrating that MediaWiki may be a useful pedagogical tool, because edit history data can make it easy for instructors to monitor when students worked on their assignments. The course instructors used an open-source tool called "WikiAssignmentMonitor" that extracts data from the MediaWiki database and generates a spreadsheet showing how much progress a student made on each assignment every week or hour. The researchers used this tool to track whether students completed work on time.
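To make the idea of such a report concrete, here is a minimal sketch of how edit history could be rolled up into per-week progress. It is not the WikiAssignmentMonitor code: the function name `weekly_progress`, the record fields (`user`, `page`, `timestamp`, `bytes_added`), and the sample data are all assumptions made for illustration, and a real tool would read revisions from the MediaWiki database rather than from a hard-coded list.

```python
from collections import defaultdict
from datetime import datetime


def weekly_progress(revisions, course_start):
    """Sum the bytes each student added to each page, grouped by course week.

    `revisions` is a list of dicts with hypothetical keys: user, page,
    timestamp (datetime) and bytes_added (int). Week 1 starts at course_start.
    """
    report = defaultdict(lambda: defaultdict(int))
    for rev in revisions:
        week = (rev["timestamp"] - course_start).days // 7 + 1
        report[(rev["user"], rev["page"])][week] += rev["bytes_added"]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(weeks) for key, weeks in report.items()}


# Made-up sample data, for illustration only.
course_start = datetime(2019, 1, 7)
revisions = [
    {"user": "alice", "page": "Assignment 1",
     "timestamp": datetime(2019, 1, 8, 10, 0), "bytes_added": 1200},
    {"user": "alice", "page": "Assignment 1",
     "timestamp": datetime(2019, 1, 10, 15, 0), "bytes_added": 300},
    {"user": "bob", "page": "Assignment 1",
     "timestamp": datetime(2019, 1, 20, 23, 0), "bytes_added": 1500},
]

report = weekly_progress(revisions, course_start)
for (user, page), weeks in sorted(report.items()):
    print(user, page, weeks)
```

In this toy data, one student works during the first week while the other only starts in the second, which is the kind of pattern a weekly report would surface.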
That said, the study also suffers from limitations in its experimental design. Mainly, several things changed between the two courses besides the use of a wiki and the schedule of deadlines. In particular, the assignments themselves were not exactly the same from one year to the next. However, the researchers saw similar grade improvements for every assignment, even the ones that didn't change. Also, different version control systems were used, but it seems more plausible that the change to MediaWiki and weekly deadlines, rather than this unrelated change, explains their findings. Importantly, it isn't possible from their study to say how much of the improvement should be attributed to the use of MediaWiki and how much to changing the schedule from one final deadline to six weekly deadlines.
Despite these limitations, I thought it was interesting to see an educational application of wikis that didn't rely heavily on collaboration, but instead on other affordances of the MediaWiki software that can be useful to instructors. They didn't have to require students to turn in their work each week; they could just look at the WikiAssignmentMonitor report to check students' progress. Moreover, they could see students make progress on assignments over time at levels of granularity not normally available to course instructors. For instance, they could see whether a given student completed an assignment in one session instead of over many sessions. This paper made me curious about how this kind of monitoring would influence student behavior even if it wasn't a factor in their grades.
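As a rough illustration of the "one session versus many sessions" distinction, edit timestamps can be grouped into sessions by starting a new session whenever the gap between consecutive edits exceeds some threshold. The sketch below is an assumption-laden example, not something described in the paper: the function name `count_sessions`, the one-hour cutoff, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, max_gap=timedelta(hours=1)):
    """Count work sessions: a gap longer than max_gap starts a new session.

    The one-hour default is an arbitrary illustrative choice.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            sessions += 1
    return sessions


# Made-up edit timestamps for two hypothetical students.
one_sitting = [
    datetime(2019, 1, 13, 20, 0),
    datetime(2019, 1, 13, 20, 25),
    datetime(2019, 1, 13, 20, 40),
]
spread_out = [
    datetime(2019, 1, 8, 10, 0),
    datetime(2019, 1, 10, 16, 0),
    datetime(2019, 1, 12, 9, 30),
]

print(count_sessions(one_sitting))
print(count_sessions(spread_out))
```

The threshold is the main design choice: a shorter cutoff splits a long evening of work into several sessions, while a longer one merges edits made on the same day.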
"How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources" by Annabel Rothschild, Emma Lurie, and Eni Mustafaraj of Wellesley College, published in the 2019 Computation and Journalism Symposium, focuses on how readers determine the quality of a given news source based on information provided through Google's rich search results. This is a particularly timely study, as this summer it was reported that, for the first time, over half of searches on Google are not resulting in clicks to links.[supp 1] In other words, Google Search has become progressively more efficient at satisfying the needs of its users without the user ever visiting the sites providing the content that is surfaced via Google. This means that Google Search increasingly sets the context in which readers evaluate the quality of information they read.
Rothschild et al. conduct two studies. The first involved interviews with 30 undergraduate students as they assessed the credibility of three news sources: The Durango Herald, The Tennessean, and The Christian Times. Many of the participants indicated that they used Google as the primary medium through which they evaluated a source. As a result, in the second study, Rothschild et al. recruited 66 individuals through Amazon Mechanical Turk to evaluate the credibility of two news sources (ProPublica and Newsmax) through the Knowledge Panel alone. Both studies indicated that information surfaced by Google from Wikipedia about the news sources figured heavily in readers' assessments.
This work highlights the incredible value that Wikipedia provides to the world and to tech platforms, in particular for helping readers assess the credibility of news sources. Readers use Wikipedia, as surfaced via Google, for this purpose, but sites like YouTube and Facebook also surface Wikipedia links about a source as a means of supporting fact-checking.[supp 2] This work also points towards particularly important statements on Wikidata for assessing the quality of a source, namely awards that a publication has earned, social media presence, geographic context, and establishment date.
The paper closes by noting that despite the value that Wikipedia, as surfaced by Google, provides to readers, many news sources do not yet have a knowledge panel appearing when you search for them. It mentions the Newspapers on Wikipedia project (which had been inspired by early results from their research) as a valuable initiative for addressing this gap with many potential benefits beyond supporting credibility assessments within Google Search.
"Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics" by Włodzimierz Lewoniewski, Krzysztof Węcel, and Witold Abramowicz of Poznań University, published by MDPI Computers, examines the challenge of aggregating Wikipedia page views according to topic and comparing the quality and popularity of these topics across languages.
From a methodological standpoint, comprehensively labeling Wikipedia articles according to a relatively small number of topics is quite challenging. This problem has inspired many approaches and taxonomies (e.g., ORES drafttopic, Using Wikipedia categories for research, Wikidata Concepts Monitor). This work explores two approaches: 1) automatic mapping of the existing category network on Wikipedia to high-level categories as identified by English Wikipedia, and 2) topics as determined by a mixture of DBpedia and Wikidata classes. Figure 4 from the paper shows the proportion of articles in each topic (using the category network method).
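One simple way to picture the first approach is a breadth-first walk up the category graph from an article's categories until a high-level category is reached. The sketch below illustrates that general idea only and should not be read as the authors' actual algorithm; the function name `nearest_topics`, the toy `parents` graph, and the choice of top-level categories are assumptions.

```python
from collections import deque


def nearest_topics(start_categories, parents, top_level):
    """Return the top-level categories closest to the starting categories.

    `parents` maps a category to its parent categories. The search walks
    upward breadth-first and stops at the first depth where any top-level
    category is found, so only the nearest topics are returned.
    """
    seen = set(start_categories)
    frontier = deque((category, 0) for category in start_categories)
    found, found_depth = set(), None
    while frontier:
        category, depth = frontier.popleft()
        if found_depth is not None and depth > found_depth:
            break  # everything further away is ignored
        if category in top_level:
            found.add(category)
            found_depth = depth
            continue
        for parent in parents.get(category, ()):
            if parent not in seen:  # guards against cycles in the graph
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return found


# Toy category graph, invented for illustration.
parents = {
    "Polish physicists": ["Physicists", "Polish scientists"],
    "Physicists": ["Science"],
    "Polish scientists": ["Scientists by nationality"],
    "Scientists by nationality": ["People"],
}
top_level = {"Science", "People"}

print(nearest_topics(["Polish physicists"], parents, top_level))
```

Picking only the nearest topics is one possible tie-breaking rule. Real category networks contain cycles and very long chains, which is part of why this labeling problem is hard.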
There are a lot of data and visualizations in this paper that I would encourage the reader to view for themselves. The authors also expose their results through the website WikiRank.
(See also earlier coverage of related publications by some of the same authors)
The fifteenth edition of the annual OpenSym conference took place in Skövde, Sweden last month. The event was launched in 2005 as "WikiSym", focusing exclusively on research about wikis, but over time came to include other forms of "open collaboration" and was renamed to OpenSym several years ago. Many papers presented at this year's OpenSym (see proceedings) studied open source software collaboration, but a substantial part were still focused on Wikipedia, Wikidata and other wikis. Apart from Balderas et al.'s paper on wikis and procrastination (reviewed above), these were:
Among the takeaways presented from this overview of 28 papers which covered this area since Wikidata's launch in 2012 (some comparing it with other structured data projects such as DBpedia or YAGO):
From the abstract:
"... we extend [the] previous line of research on [automated] article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (also including recent deep learning methods [cf. previous coverage: 'Improved article quality prediction with deep learning'])."
"Document embeddings" refers to mapping each article to a vector in a vector space of "500 latent dimensions" (analogous to word embeddings), resulting in "a numeric, latent representation of the document content, its context, and semantics. We hypothesize that adding this comprehensive article representation can be leveraged for getting a better representation of the contents of an article and hence, its quality."
The edit-related features include the timestamps of the article's last 100 edits, and "the vector differences between the tf/idf vectors of the last 100 versions of the article."
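To give a feel for the second kind of edit feature, the following sketch computes tf-idf vectors for successive versions of a text and then the element-wise differences between consecutive versions. It is a simplified illustration, not the paper's pipeline: the tf-idf weighting used here (relative term frequency times `log(N/df) + 1`), the whitespace tokenization, the function names, and the sample texts are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(versions):
    """Return one tf-idf vector per version, over a shared sorted vocabulary."""
    docs = [Counter(text.lower().split()) for text in versions]
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    # Assumed weighting: idf = log(N / document frequency) + 1.
    idf = {
        term: math.log(n_docs / sum(1 for doc in docs if term in doc)) + 1.0
        for term in vocab
    }
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1  # avoid division by zero on empty text
        vectors.append([doc[term] / total * idf[term] for term in vocab])
    return vectors


def version_differences(vectors):
    """Element-wise difference between each version and the one before it."""
    return [
        [cur - prev for prev, cur in zip(previous, current)]
        for previous, current in zip(vectors, vectors[1:])
    ]


# Three made-up versions of an article; the last edit changes nothing.
versions = [
    "wiki article",
    "wiki article about history",
    "wiki article about history",
]
diffs = version_differences(tfidf_vectors(versions))
print(diffs)
```

A version that leaves the text unchanged yields an all-zero difference vector, while substantive edits produce non-zero entries for the affected terms.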
This paper examines the work on labels in Wikidata (i.e. the most common name of an item in a particular language, typically but not always coinciding with the title of the corresponding Wikipedia article in that language, if it exists). From the conclusions:
"We identify three types of editors: registered editors, bots, and anonymous editors. Bots contributed to the most number of labels for specific languages while registered users tend to contribute more to multilingual labels, i.e., translation. The hybrid approach of Wikidata, of humans and bots editing the knowledge graph alongside, supports the collaborative work towards the completion of the knowledge graph."
From the paper's conclusions:
"We studied the formal process of requesting bot rights in Wikidata [...] The RfPs [ requests for permission ] were studied mainly from two perspectives: 1) What information is provided during the time the bot rights are requested and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms and sitelinks into Wikidata, as well as the main source of bot edits have their roots in Wikipedia. This contrasts with Wikipedia where bots are performing mostly maintenance tasks. Our findings also show that most of the RfPs were approved and a small number of them were unsuccessful mainly because operators had withdrawn or there was no activity from the operators."
From the abstract and paper (co-authored by this reviewer):
"In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of [Wikipedia] reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts.
Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. [...] The median reading time [across all Wikipedias, globally] is 25 seconds and the 75th percentile is 75.1 seconds. [...] Based on our data, we estimate that humanity spent about 672,349 years reading Wikipedia from November 2017 through October 2018."
This paper presented applications of the "WikiChron" tool, available as a demo for various (non-Wikimedia) wikis at http://wikichron.science/ (with source code available on GitHub). It was also the subject of a presentation at this year's Wikimania.
This paper examined Wikitribune, a for-profit but freely licensed news site launched in 2017. While Wikitribune is (despite the name) not based on a wiki, its model of open collaboration between professional journalists and volunteers, as well as the fact that it was launched by Wikipedia founder Jimmy Wales, made it a fitting subject for OpenSym.
Among the potential barriers to volunteer participation on WikiTribune identified by the researchers, in particular in its initial version, were the website's design (emphasizing readability over editability) and a real names policy. Over time, the project's model morphed from closed to hybrid to more open (also involving the departure of all paid journalists). Some data from the project's first year, as highlighted in the presentation: the vast majority of articles (79%) were written by the paid staff. Articles tended to be UK-centric and to have low engagement in the comments, and had on average nine revisions and six different contributors.
See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
From the abstract of this paper (which received the "Outstanding Problem-Solution Paper" award at the conference):
"Standard recommender systems [...] rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines."
See also project page on Meta-wiki
From the abstract:
"[We study] participation following shocks that draw attention to an article. Such events can be recruiting opportunities due to increased attention; but can also pose a threat to the quality and control of the article and drive away newcomers. [We examine] shocks generated by drastic increases in attention as indicated by data from Google trends. We find that participation following such events is indeed different from participation during normal times–both newcomers and incumbents participate at higher rates during shocks. We also identify collaboration dynamics that mediate the effects of shocks on continued participation after the shock. The impact of shocks on participation is mediated by the amount of negative feedback given to newcomers in the form of reverted edits and the amount of coordination editors engage in through edits of the article’s talk page."
From the abstract: 
"... For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia."
Tweet by one of the authors:
"Try Cr5, our new model for crosslingual document embedding! Input: text in any of 28 languages Output: language-independent vector representation, so you can compare text across langs. Pre-trained model and API: https://github.com/epfl-dlab/Cr5 "
From the abstract:
"The article explores how a notorious case of Second World War atrocities in Ukraine – the Babi Yar massacres of 1941-1943 – is represented and interpreted on Wikipedia. Using qualitative content analysis, it examines what frames and content features are used in different language versions of Wikipedia to transcribe the traumatic narrative of Babi Yar as an online encyclopedia entry. It also investigates how these frames are constructed by scrutinizing the process of collaborative frame-building on discussion pages of Wikipedia and exploring how Wikipedia users employ different power play strategies to promote their vision of the events at Babi Yar."
(See also "Framing the Holocaust in popular knowledge" below, and related earlier coverage: "Holocaust articles compared across languages")
From the abstract:
" ... the article conducts a content analysis of three articles, in three different languages [...]: “Auschwitz-Birkenau Camp”, “The Pogrom in Jedwabne”, and “Righteous Among the Nations”. [...] Analyzing how the articles fulfill each of the roles in the different languages, the research hypothesis is that the framing of the phenomena will differ between the versions, and each version will follow pillars of the collective memory of the Holocaust in its respective country. Findings, however, are not in complete compliance with this hypothesis."
(See also "Framing the Holocaust Online" above, and related earlier coverage: "Holocaust articles compared across languages")