A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
This paper[1] reports findings from a survey of Norwegian secondary school students about their use of Wikipedia in their coursework. The survey of 168 students aged 18 to 19 consisted of 33 Likert-scale questions and two free-response questions. Its goal was to assess how Wikipedia figures in students' literacy practices. This concept encompasses students' and teachers' attitudes towards the resources they use to learn, the social context in which they engage with those resources, and the process by which they read, remember, and understand the information each resource provides.
The main finding of the study is that students' attitudes towards Wikipedia are overwhelmingly positive, even though they consider it less trustworthy than their official course materials. Although 90% of respondents rated their textbooks as more trustworthy, they cited the ease of finding factual information (such as dates and names) as a key reason for preferring Wikipedia. They also reported that Wikipedia was better than their textbooks at explaining the "big picture" of a given topic and at facilitating more in-depth exploration. In the words of one survey respondent: "If you need to, you can read elaborations about a given topic, or you can just read the summary if that is what you need."
These findings suggest that the primary advantage Wikipedia offers students is its flexibility: it lets them find quick answers and more detailed accounts with equal ease. They also suggest that both students and teachers would benefit from a better understanding of how to critically evaluate the quality of information in Wikipedia and other open online information resources.
The study also confirmed findings from previous studies: the vast majority of students use Wikipedia to supplement their official course resources (textbooks, etc.), most of them reach Wikipedia via Google search, and students who read English tend to consult the English-language Wikipedia first, regardless of their first language or national origin.
A (conference?) paper titled "Beyond Friendships and Followers: The Wikipedia Social Network"[2] applies social network theory to the analysis of relationships between the subjects of Wikipedia biographical articles. Using Wikidata and Wikipedia metadata, the authors produce a number of findings. Some will not surprise readers, such as that "By far the largest occupational groups are politicians and football players", or that "The page with the most mentions of persons is Rosters of the top basketball teams in European club competitions" (with 4,694 mentions of 1,761 different persons). The most referenced persons are Jesus and Napoleon, followed by Barack Obama, Muhammad, Shakespeare, Adolf Hitler, and George W. Bush. Over four-fifths of the links to persons in Wikipedia point to men, which roughly reflects the gender distribution of Wikipedia biographies; the distribution of links over time similarly mirrors the biographies, most of which concern the 19th and 20th centuries. The authors, however, do not dwell on the social science implications of their findings, merely suggesting that their tool can be used to refine Wikipedia categories and disambiguation tools. The findings are interesting from the perspective of alternative approaches to categorization, since they may suggest new categories that human editors have not yet created, and perhaps provide a mathematical model of how Wikipedia categories can be created.
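The difference between counting mentions and counting distinct persons (as in "4,694 mentions of 1,761 different persons") can be illustrated with a minimal sketch. It assumes the links have already been extracted into hypothetical (page, person) pairs, one per mention; the data and function names are illustrative only, not the authors' tool or dataset, and ranking persons by the number of distinct pages that mention them is an assumption about what "most referenced" means.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (page, person) pair per mention of a person on a page.
mentions = [
    ("Page A", "Jesus"), ("Page A", "Napoleon"), ("Page A", "Jesus"),
    ("Page B", "Napoleon"), ("Page B", "Barack Obama"),
    ("Page C", "Napoleon"),
]

def page_stats(pairs):
    """Return {page: (total mentions, number of distinct persons)}."""
    total = Counter(page for page, _ in pairs)
    distinct = defaultdict(set)
    for page, person in pairs:
        distinct[page].add(person)
    return {page: (total[page], len(distinct[page])) for page in total}

def most_referenced(pairs, n=3):
    """Rank persons by how many distinct pages mention them (assumed metric)."""
    pages_per_person = defaultdict(set)
    for page, person in pairs:
        pages_per_person[person].add(page)
    ranked = sorted(pages_per_person, key=lambda p: len(pages_per_person[p]), reverse=True)
    return ranked[:n]

print(page_stats(mentions)["Page A"])  # (3, 2): three mentions of two persons
print(most_referenced(mentions, 1))    # ['Napoleon']
```

Counting mentions and counting distinct persons separately is what lets a roster page rank first on mentions while still naming "only" 1,761 people.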
This paper[3] also uses social network theory, along with Hofstede's cultural dimensions theory, Schwartz's Theory of Basic Human Values, and McCrae's five-factor model of personality, to ask research questions about the concept of online culture, in particular whether it is universal or differs across national cultures. The study focused on 72 Featured Articles in 12 languages (unfortunately, the authors do not explain why they chose those particular 12 languages over others); excluding bots, the authors analyzed more than 150,000 editors and 250,000 edits. They find that most Wikipedia edits are what they call self-loops: an editor editing an article they have edited before, without another editor's edit intervening. They do not comment on what this means for the vision of Wikipedia as a collaborative environment. The authors find significant differences in editing patterns between certain Wikipedia projects, though this reviewer finds the description of those differences (focusing on a case study of one Japanese and one Russian article) rather curt. Similarly, their discussion of how the results fit (or don't) with the established theories of Hofstede and others is interesting but short; that unsatisfying brevity may, however, be due to editorial requirements (the entire paper is only 3.5k words long, compared with a more typical length of about 8k). The authors conclude that "new dimensions of online culture can be explored from directly observed online behavior", something one hopes they will revisit, together with their dataset, in a longer paper that does it proper justice.
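The self-loop notion can be made concrete with a small sketch. It assumes, following the description above, that an edit counts as a self-loop when the previous edit to the same article was made by the same editor. Each article's revision history is treated as a sequence of editor names, consecutive editors become directed edges, and the share of edges pointing back to the same editor is measured. The function names and sample history are hypothetical; this is not the authors' code or their exact definition.

```python
from collections import Counter
from typing import Sequence

def editor_transitions(history: Sequence[str]) -> Counter:
    """Count directed edges (previous editor -> next editor) in one article's history."""
    return Counter(zip(history, history[1:]))

def self_loop_share(history: Sequence[str]) -> float:
    """Fraction of transitions in which the same editor made two consecutive edits."""
    edges = editor_transitions(history)
    total = sum(edges.values())
    if total == 0:
        return 0.0  # fewer than two edits: no transitions to classify
    loops = sum(count for (prev, cur), count in edges.items() if prev == cur)
    return loops / total

# Hypothetical revision history of a single article, oldest edit first.
history = ["Alice", "Alice", "Bob", "Bob", "Bob", "Alice"]
print(editor_transitions(history)[("Bob", "Bob")])  # 2
print(self_loop_share(history))                      # 0.6
```

Here three of the five transitions are self-loops, so a network built this way would be dominated by edges from editors to themselves even though two people edited the article.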
A paper at the 19th International Conference on Circuits, Systems, Communications and Computers (CSCC)[4] provides an overview of research on vandalism detection in Wikipedia, with a focus on machine learning. One of the paper’s conclusions is that future research should aim for language independence, as little progress has been made outside the English, German, French, and Spanish Wikipedia editions.
“Measuring Article Quality in Wikipedia Using the Collaboration Network”[5] is a paper that proposes an improved model of co-authorship for predicting the quality of Wikipedia articles. Trained on a stratified sample of articles from the English Wikipedia, the model is shown to outperform several baselines. Unfortunately, the evaluation dataset omits Start-class articles for no apparent reason and uses the latest revision of each article, which might differ considerably from the revision that received the quality rating.
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.