A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
This paper[1] begins with a review of prior research on the reasons for editor dropout on Wikipedia, which focuses on the stress of interpersonal conflict and on overburdened volunteers, especially admins. It then presents the methods and findings of new research on "more experienced and active Wikipedians", the 1% contributing the most time and content. One startling perspective gained from the survey of the 300 most active Wikipedians (with a 41% response rate!) is a lack of recognition from the wider academic and professional community: volunteer Wikipedia editing is not often treated as a "legitimate" volunteer activity that contributes, for example, to professional development. The author also states that "the current Wikimedia Foundation efforts directed at increasing positive reinforcement, developed with a focus on increasing the retention of new editors ... may be much less efficient ... when it comes to ... long-term highly active contributors" and concludes that more research is needed on interpersonal conflict as a motivation for retirement.
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
From the proceedings of OpenSym 2018:
From the abstract:[2] "This research aims to find the extent to which a particular group of university students vandalize Wikipedia, while also exploring their perceptions of vandalism. Data is obtained from a questionnaire sent to university students in educational psychology, early and primary childhood education, and related master’s programs, as well as a focus group involving a sample of these students and interviews with editors in charge of maintaining Wikipedia. [...] it seems that students and editors have some preconceived ideas (boredom, amusement, or ideological motivations) about what pushes individuals to vandalize."
Among the 928 survey participants, only 39 (4%) reported having vandalized Wikipedia. Younger students vandalized more often ("there is a meaningful difference between students under 23 (5.3 % of them vandalize) and both students from 24 to 30 (1.9 %) and from 31 to 40 (0%)"), whereas there was no significant difference between male and female respondents.
From the abstract:[3] ".... in comparison to photosharing sites like Flickr and mobile apps like Instagram, Commons is largely unknown to the general public and under-researched by scholars. We conducted an exploratory study to determine if an alternative means of contribution—a mobile application that gamifies implicitly desirable and useful behavior—could broaden awareness of and participation in Commons. Our findings from an online survey (N=103) suggest that by creating value around implicitly desirable behaviors, we can create new opportunities and alternative pathways for both increasing and broadening participation in peer-production communities such as Commons."
From the paper:[4] "The analysis shows that the introduction of a few key terms acts as milestones for the evolution of the article. When a factoid is added, more knowledge related to that factoid is likely to be added. However, different users get triggered differently, leading to the inclusion of diversified knowledge into the articles." (compare also "Do less active participants make active participants more active?" below)
See also our earlier coverage of other OpenSym 2018 papers:
Other publications:
From the abstract:[5] "In this study, we probe the indirect influence of less active participants' contributing behaviors on the quality of knowledge collaboration. [...] Using the edit data of featured articles in the Chinese Wikipedia, we examine the proposed causal path. The main findings of this study are as follows: the productivity of active participants of a Wikipedia article increases when they are triggered by less active participants' editing activities; the additional edits of active participants triggered by less active participants can improve the quality of an article; and less active participants play a major role in reviving the editing work of dormant articles. These findings reveal that less active participants play a substantial role in knowledge collaboration in online communities, as their contributing behaviors sustain collaborative work and eventually improve the quality [of Wikipedia]." (compare also "Triggering" paper above)
From the paper's[6] introduction: "51% of entity attributes in English Wikipedia infoboxes are not described in English Wikipedia articles. We aim to fill in this knowledge gap via a system that can take an entity as input and automatically generate a natural language description."
From the abstract:[7] "Nowadays, editors tend to separate different subtopics of a long Wikipedia article into multiple sub-articles. This separation seeks to improve human readability. However, it also has a deleterious effect on many Wikipedia-based tasks that rely on the article-as-concept assumption, which requires each entity (or concept) to be described solely by one article. This underlying assumption significantly simplifies knowledge representation and extraction [...] In this paper we provide an approach to match the scattered sub-articles back to their corresponding main-articles, with the intent of facilitating automated Wikipedia curation and processing." (see also related earlier coverage: "Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia")
From the abstract:[8] "We consider the network of 5 416 537 articles of English Wikipedia extracted in 2017. Using the recent reduced Google matrix (REGOMAX) method we construct the reduced network of 230 articles (nodes) of infectious diseases and 195 articles of world countries. [...] PageRank and CheiRank algorithms are used to determine the most influential diseases with the top PageRank diseases being Tuberculosis, HIV/AIDS and Malaria. From the reduced Google matrix we determine the sensitivity of world countries to specific diseases integrating their influence over all their history including the times of ancient Egyptian mummies. The obtained results are compared with the World Health Organization (WHO) data demonstrating that the Wikipedia network analysis provides reliable results with up to about 80 percent overlap between WHO and REGOMAX analyses."
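For readers unfamiliar with the two ranking measures named in this abstract, the following is a minimal sketch, not the paper's REGOMAX method: plain power-iteration PageRank on a toy link graph, with CheiRank computed as PageRank on the same graph with all links reversed. The link structure in `toy_links`, the damping factor of 0.85, and all function names are assumptions made for illustration only and do not come from the paper or from real Wikipedia data.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [outgoing link targets]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's damped rank evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Return the same graph with every link direction inverted."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank computed on the link-reversed graph."""
    return pagerank(reverse_links(links), damping, iterations)


# Hypothetical toy graph; these links are invented for illustration.
toy_links = {
    "Tuberculosis": ["HIV/AIDS"],
    "HIV/AIDS": ["Tuberculosis", "Malaria"],
    "Malaria": ["Tuberculosis"],
    "India": ["Tuberculosis", "Malaria", "HIV/AIDS"],
}

pr = pagerank(toy_links)
cr = cheirank(toy_links)
for node in sorted(pr):
    print(f"{node}: PageRank={pr[node]:.3f} CheiRank={cr[node]:.3f}")
```

In this toy graph the node with only outgoing links scores lowest on PageRank and highest on CheiRank, which illustrates the usual reading of the two measures: PageRank rewards being linked to, CheiRank rewards linking out.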
From the abstract:[9] "The Spoken Wikipedia project unites volunteer readers of encyclopedic entries. Their recordings make encyclopedic knowledge accessible to persons who are unable to read [...]. However, [these recordings] can only be consumed linearly [...]. We present a reading application which uses an alignment between the recording, text and article structure and which allows to navigate spoken articles, through a graphical or voice-based user interface (or a combination thereof). We present the results of a usability study in which we compare the two interaction modalities. We find that both types of interaction enable users to navigate articles and to find specific information much more quickly compared to a sequential presentation of the full article." (see also code on GitHub)
From the abstract and paper:[10] "This paper builds upon previous research, where we identified six common participation patterns, i.e. roles, in Wikidata. In the research presented here, we study the applicability of sequence analysis methods by analyzing the dynamics in users’ participation patterns. The sequence analysis is judged by its ability to answer three questions: (i) 'Are there any preferable role transitions in Wikidata?'; (ii) 'What are the dominant dynamic participation patterns?'; (iii) 'Are users who join earlier more turbulent contributors?' [answer: "the earlier an user joins Wikidata, the more turbulent his/her dynamic participation pattern is"] Our data set includes participation patterns of about 20,000 users in each month from October 2012 to October 2014."
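The first question above, about preferable role transitions, can be illustrated with a minimal sketch that counts how often one monthly role is followed by another. This is only an illustration of the general idea, not the paper's sequence analysis method; the user names, the role labels `role_A` and `role_B`, and the function names are hypothetical placeholders.

```python
from collections import Counter


def transition_counts(sequences):
    """Count (role, next_role) pairs over each user's month-by-month roles."""
    counts = Counter()
    for roles in sequences.values():
        # Pair every month with the month that follows it.
        for current, following in zip(roles, roles[1:]):
            counts[(current, following)] += 1
    return counts


def transition_probabilities(counts):
    """Normalize counts so transitions out of each role sum to 1."""
    totals = Counter()
    for (current, _), n in counts.items():
        totals[current] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Hypothetical data: one role label per user per month.
monthly_roles = {
    "user_a": ["role_A", "role_B", "role_B"],
    "user_b": ["role_B", "role_B", "role_A"],
}

counts = transition_counts(monthly_roles)
probs = transition_probabilities(counts)
for (current, following), p in sorted(probs.items()):
    print(f"{current} -> {following}: {p:.2f}")
```

A transition that occurs with high probability in such a table would be a candidate for a "preferable" role transition in the sense of question (i).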