A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
In this paper, "Civic Engagement Meets Service Learning: Improving Wikipedia's Coverage of State Government Officials", the author argues that students' contributions to Wikipedia serve as civic engagement within the educational approach known as service learning. The paper cites other academic work highlighting Wikipedia's value as a teaching platform because of its ease of entry and its ability to "boost students' writing, information literacy, creativity, and critical-thinking skills" while they are motivated to create content that "matters to the world". Background research also showed that basic biographical information about political representatives is often hard to find, becoming "a costly and semiprecious commodity".
For the study, students edited the Wikipedia biography of "a state or local representative who lacked a substantial Wikipedia presence", either creating a new article or improving an existing low-quality one. They then wrote self-reflective essays and completed "Small-N surveys" on the subjective outcomes.
The outcomes were generally positive, except that a number of the newly created articles were deleted under Wikipedia's notability standards. The survey found that "students left the course better able to understand government, more attentive to government actions, more likely to discuss government, and more confident that their vote matters".
This paper presents a wealth of results from the "first large-scale analysis of how interactions with images happen on Wikipedia".
The authors first note that, excluding images that appear as icons, only a minority of articles are illustrated:
"Out of the 6.2M articles, 2.7M (44%) contained at least one image, for a total of 5M unique images across all English Wikipedia articles. The vast majority of the articles (91%) contain two images or less, while only 1.5% has more than eight images [..]. Around 84% of images is unique to the article where it appears."
Using a machine-learning-based topic model, they find that "Geographic articles are the most illustrated, containing 1/4 of the images in our dataset. Biographies, making up 30% of the articles on Wikipedia, also contain around 15% of the images. Topics such as entertainment (movies, plays, books), visual arts, transportation, military, biology, and sports follow, covering together another third of the images in English Wikipedia."
Examining the length of image captions, the study finds a "large fraction of the images without a description and the majority of existing captions centered around ten words." Regarding the position of images in the article, "only 36% of the images in our dataset is generally placed in infoboxes, while only 16% can be found in galleries, and that the majority of inline images are generally placed at the top of the article".
The analysis of reader interactions with these images is based on internal web log data from March 2021, which records three types of interaction: image views (opening an image in Media Viewer), pageviews (of articles with images) and page previews (on the desktop version of the Wikipedia website). These are grouped into reading sessions using the (somewhat imperfect) heuristic that a reader is uniquely identified by the combination of IP address and user agent. A main finding (highlighted in the abstract) is "that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content", or in more detail:
"We find that the [global click-through rate] across all pages in English Wikipedia with at least one image is 3.5%, meaning that around 3.5 out of 100 times readers visit a page, they also click on an image. This metric is higher for desktop (5.0%) and lower for mobile web users (2.6%), probably due to differences in the way readers navigate Wikipedia on the two devices and the better Media Viewer experience on desktop. Over time, the behavior also changes depending on the device used. For example, on desktop, readers tend to click more often on images during weekdays (Monday to Friday), with an increase of 5.5% over weekends ..."
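To make the metric concrete, here is a minimal sketch of how a click-through rate of this kind could be computed from chronologically ordered log records. The record layout `(ip, user_agent, page, kind)`, the `kind` labels and the function name are assumptions made for illustration. This is not the authors' pipeline, and the paper's exact attribution rules may differ.

```python
def click_through_rate(events):
    """Fraction of pageviews that led to at least one image view.

    `events` is a chronologically ordered iterable of hypothetical records
    (ip, user_agent, page, kind), with kind in {"pageview", "imageview"}.
    A reader is approximated by the (ip, user_agent) pair, the same
    imperfect heuristic described in the text.
    """
    pageviews = 0
    clicked_pageviews = 0
    # For the latest pageview of each (reader, page): was a click already counted?
    latest_view_clicked = {}
    for ip, user_agent, page, kind in events:
        key = ((ip, user_agent), page)
        if kind == "pageview":
            pageviews += 1
            latest_view_clicked[key] = False
        elif kind == "imageview" and latest_view_clicked.get(key) is False:
            # Count each pageview at most once, however many images are opened.
            clicked_pageviews += 1
            latest_view_clicked[key] = True
    return clicked_pageviews / pageviews if pageviews else 0.0


events = [
    ("10.0.0.1", "UA-A", "Paris", "pageview"),
    ("10.0.0.1", "UA-A", "Paris", "imageview"),
    ("10.0.0.1", "UA-A", "Paris", "imageview"),  # same pageview, not counted again
    ("10.0.0.2", "UA-B", "Paris", "pageview"),
    ("10.0.0.1", "UA-A", "Rome", "pageview"),
    ("10.0.0.2", "UA-B", "Rome", "pageview"),
]
print(click_through_rate(events))  # 0.25
```

With this definition, the reported rate of 3.5% corresponds to roughly one in 29 pageviews, since 1 / 0.035 ≈ 28.6.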
Images in articles about "topics such as transportation, visual arts, geography, and military" were found to have higher engagement, whereas "clicks on images are less likely in education, sports, and entertainment articles." Furthermore,
"we observe that the most important negative predictor is the text offset, i.e. the relative position of the image with respect to the length of the article, meaning that images are more clicked if placed in the upper part of an article. Regarding the visual content, we observe a strong positive effect of outdoor settings, consistently with the positive coefficients of transportation and geography, topics in which a large portion of images display outdoor scenes. Regarding the image position on the page, we find that images in galleries show a high level of engagement, as well as images in the infobox, even though with a moderate effect."
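The "text offset" feature can be illustrated with a short sketch. It assumes the offset is measured in characters of raw wikitext and that images are embedded with `[[File:...]]` or `[[Image:...]]` links. Both are assumptions for illustration; the paper may compute the position differently, and images passed as infobox parameters would not be matched by this pattern.

```python
import re

# Hypothetical pattern: matches the start of an inline image link in wikitext.
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)


def image_text_offsets(wikitext):
    """Relative position of each image link: 0.0 is the top of the article,
    values approaching 1.0 are near the end."""
    length = len(wikitext)
    if length == 0:
        return []
    return [match.start() / length for match in IMAGE_LINK.finditer(wikitext)]


# A 100-character toy article with one image at the top and one halfway down.
article = "[[File:A.jpg]]" + "a" * 36 + "[[Image:B.png]]" + "b" * 35
print(image_text_offsets(article))  # [0.0, 0.5]
```

Under the finding quoted above, the image with the smaller offset would be expected to attract more clicks, all else being equal.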
The researchers also investigated how reader engagement was associated with page popularity and image quality (using an automated quality rating from a machine learning model trained on a balanced dataset of community-rated "quality images" on Commons).
The paper proceeds to study more involved questions, e.g. finding that "the tendency to click on images with faces varies depending on page popularity. On pages with less that 1000 monthly pageviews, the presence of faces induces higher level of interactions, with a difference of 0.1%, whereas, after 1000 pageviews, we observe the opposite behavior, with a difference of 0.06%." and concluding that "Faces engage us, but only if unfamiliar".
Another high-level conclusion is that "Images serve a cognitive purpose" on Wikipedia, based on "a negative relation between article length and iCTR. This suggests that [...] images might be used by readers to complement missing information in the article".
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
From the abstract and paper:
"We present an Attention Feedback (AF) approach for Wikipedia readers. The fundamental idea of the proposed approach comprises the implicit capture of gaze-based feedback of Wikipedia readers using a commodity gaze tracker. The developed AF mechanism aims at overcoming the main limitation of the currently used “pageview” and “survey” based feedback approaches, i.e., data inaccuracy."
"For each reading session, along with the gaze density heat map, we also provide a set of sentences where a user-focused while reading along with the time for which each sentence was focused. [...] After processing the sentences, we arrange them in the order they are read along with their gaze quotient. By gaze quotient, we mean the time duration (in seconds) for which a sentence is being gazed at or read. [...] the proposed AF framework also captures some additional information listed below: (1) Wikilink clicks [...] (2) Eye blinks [...] (3) Scroll events [...]"
"Extensive experiments demonstrated [this setup's] efficiency compared to other feedback approaches used by the Wikipedia research community [...] Moreover, incorporating a single-camera image processing-based gaze tracker into a web application framework makes the overall system costefficient and portable. This study’s outcomes are currently being discussed in the Wikimedia Foundation for developing specialized tools to capture readers’ implicit feedback."
(See also meta:Research:Which parts of an article do readers read for an overview of related work)
From the abstract:
"We have conducted an ethnographic analysis of several [of the French] Wikipedia's terrorist attacks pages as well as interviews with regular Wikipedia's contributors. We document how Wikipedia is used during crisis by readers and contributors. Doing so, we identify a specific pace of contributions which provides reliable information to readers. [...] we highlight how historical sources (i.e. traditional media and authorities) support this pace. Our analyses demonstrate that citizens are engaging very quickly in processes of resilience and should be, therefore, considered as relevant partners by authorities when engaging a response to the crisis."
From the abstract:
"... we analyse contributions on Wikipedia and Twitter during major crises in France through online ethnographies and semi-structured interviews to investigate their roles in building and sharing information. Wikipedia has often been analysed as a collaborative tool but this approach has underestimated its use in reducing uncertainty in times of crisis. We demonstrate that despite their distinct pace and designs, Twitter and Wikipedia are used with seriousness by citizens in their dissemination of information."