By Giovanni Luca Ciampaglia, Lilaroja, Tilman Bayer, Dario Taraborelli, Steven Walling, and Daniel Mietchen

This is the third overview of recent published research on Wikipedia and other Wikimedia projects (previous issues: June 6, April 11), intended to become a monthly feature published jointly with the Wikimedia Foundation Research Committee. In addition to a focus on covering research by academics outside Wikimedia, this issue includes contributions funded by the Wikimedia Foundation. If you want your research to be featured in this monthly newsletter, you can tell us about your work by submitting it to the Wikimedia Research Index.

Edit wars and conflict metrics

A study covered in the previous edition of the research newsletter was extended and published by the authors on arXiv. The authors report a new method for classifying how disputed a Wikipedia article is, in order to detect controversies and edit wars. At its core, the method looks at pairs of editors who have reverted each other and uses their respective edit counts to define an overall metric of conflict. The formula is not immediately intuitive, but the authors illustrate it with special diagrams called "revert maps", which depict such pairs of editors in the Cartesian plane. The authors use this classifier to select two samples of pages, on disputed and non-disputed topics respectively, and analyze the time series of revisions to these pages. While they find that both time series are characterized by bursts of user activity, they claim there is a qualitative difference between the two, although their analysis appears to lack any form of statistical hypothesis testing. They apply a priority-based model of editor activity that has already been proposed to explain human activity on the web, and find two distinctive patterns of activity that can help classify "good" guys vs. "bad" guys. [1]
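To make the idea of a pair-based conflict metric concrete, here is a minimal Python sketch. The summary above does not state the authors' formula, so the scoring rule below is an assumption chosen purely for illustration: each mutually reverting pair is weighted by the smaller of the two editors' edit counts, and the sum is scaled by the number of editors involved in mutual reverts. The function names, the input format, and the sample data are all hypothetical.

```python
def mutual_revert_pairs(reverts):
    """Return the unordered editor pairs who have reverted each other.

    `reverts` is an iterable of (reverter, reverted) tuples (hypothetical format).
    """
    directed = set(reverts)
    return {
        frozenset((a, b))
        for (a, b) in directed
        if a != b and (b, a) in directed
    }


def conflict_score(reverts, edit_counts):
    """Illustrative conflict score; an assumed rule, not the paper's formula.

    Each mutual pair contributes the smaller of the two editors' edit counts,
    and the total is multiplied by the number of editors in mutual reverts.
    """
    pairs = mutual_revert_pairs(reverts)
    editors = set().union(*pairs)  # empty set when there are no mutual pairs
    weight = sum(min(edit_counts[e] for e in pair) for pair in pairs)
    return len(editors) * weight


# Hypothetical sample data: A and B revert each other, as do B and C;
# A reverts C only one way, so that pair is ignored.
reverts = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "B"), ("B", "C")]
edit_counts = {"A": 120, "B": 40, "C": 15}
score = conflict_score(reverts, edit_counts)  # 3 editors * (40 + 15) = 165
```

Taking the minimum of the two edit counts means that a dispute between two experienced editors weighs more than one between an experienced editor and a newcomer, which is one plausible way to separate edit wars from routine vandalism reverts.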

The anatomy of a Wikipedia talk page

Several pieces over the past month have focused on the structure and nature of social interaction on Wikipedia's discussion pages, both from quantitative and qualitative perspectives.

Wikipedians as "Janitors of Knowledge"

In a paper titled "Janitors of Knowledge: Constructing Knowledge in the Everyday Life of Wikipedia Editors",[6] researcher Olof Sundin of Lund University applies concepts from Science and technology studies to an online ethnography study of the Swedish Wikipedia community, focusing on the role of references in particular.

He conducted interviews with eleven active users of the Swedish Wikipedia (out of 20 contacted via e-mail) who had given "informed consent according to the recommendations of the Swedish Research Council". Their activity, as well as discussion on the village pump and on the talk pages of some articles, was observed from August 2009 to February 2010. (The paper does not link diffs of the users' comments, for privacy reasons.) The interviewees were between 20 and 50 years old, with diverse jobs and outside interests. Among other observations, the paper states that "For most of the informants the watch-list ... is the starting point for their [everyday] activities", and that "Wikipedia is also a place for identity construction, .... For Wikipedia editors, to edit is not just something you do, it is also a part of who you are". The title refers to the finding that "Cleaning work [e.g. reverting vandals] seems to be the central activity for almost all of the participants" of the interviews. The informants state that citing references has become more important on Wikipedia in recent years, as also evidenced by the introduction (in November 2009) of a requirement to cite at least one reference in the criteria for inclusion of new articles in a "New Written Articles of the Week" page (similar to the English Wikipedia's Did You Know). One section is devoted to Wikipedia's "hierarchy of references" (by reliability), mentioning the Swedish Wikipedia's equivalent of WP:RS.

As a theoretical framework, Sundin uses an actor-network theory interpretation of Wikipedia, which he explains as follows: "Within such a perspective, the editors, form and functions, core policies, guidelines of Wikipedia, its millions of articles and discussions, references, and users around the world can all be seen as actors, as they make each other do something; they construct, uphold and transform Wikipedia as we know it. An actor, for instance a functional feature in Wikipedia called the watch-list, that makes it easier for the editors to scan new contributions, or a policy document, makes other actors act in a particular way. ... Some actors have a more central role than others and some of these, if we draw on Callon (1986), are so central that they can be called obligatory passage points. An obligatory passage point can be thought of as a threshold that other actors need to pass or adjust to." Sundin identifies the Verifiability policy as such an obligatory passage point in Wikipedia's network of actors.

Use of Wikipedia among law students: a survey

An article in The Law Teacher titled "Embracing Wikipedia as a research tool for law: to Wikipedia or not to Wikipedia?" describes an anonymous survey among 101 Australian students (30 senior secondary high school students enrolled in legal studies, and 71 law degree students in their first and second year at the University of Southern Queensland) about their use and perceptions of Wikipedia.[7] The results indicate "that the majority (78%) of all students surveyed are currently using Wikipedia for some form of legal (30%) or other research (37%) or as a source of general information (11%)." One of the 101 students admitted to having vandalized Wikipedia articles, while two said they had corrected errors in Wikipedia. The use of Wikipedia for legal research among the first-year university students was much lower than among the high-school students, which the authors conjecture is "a result of legal research skills training and warnings against its use, and perhaps even a result of cultural adaptation. Seventy-eight percent of the first year law students surveyed acknowledged that Wikipedia can be unreliable and/or inaccurate." However, Wikipedia usage for legal or other research increased again among the second-year university students, which the authors surmise could be due to the students becoming "a little more streetwise within the university context and [finding] the convenience of Wikipedia appealing."

Apart from the poll results, the paper contains a small literature survey about "Wikipedia as a teaching and learning resource", observing that "the use of wikis in legal education is in its infancy. Several of the case studies in the literature reported positive outcomes," and qualitative results from an "informal preliminary investigation into academic perceptions of Wikipedia as a research source in law" ("All the academics consulted considered Wikipedia an unreliable source for legal information ... Some acknowledged a role for Wikipedia as a source for legal or incidental background information" with qualifications about accuracy and reliability). Still, "the authors argue that using Wikipedia as a tertiary source for assimilating broad overview information, for both legal and incidental research, to define and identify keywords for further research, and as a link to other resources, is acceptable when the issues surrounding the discerning use of any secondary source, peer reviewed or not, are fully understood", and that "Academics can and should contribute to Wikipedia either directly, through the contribution of research, or indirectly, through the mentoring of student contributions which can be incorporated into course content and assessments." Among other conclusions, the authors suggest "encouraging universities to develop policies consistent with academic contribution to Wikipedia".


Wikipedia research at OKCon 2011

On June 30 – July 1, the Open Knowledge Foundation held their annual meeting, the Open Knowledge conference (OKCon), this time in Berlin. On the first day, a workshop on Wikipedia & Research took place, organized by Mayo Fuster Morell (member of the Research Committee of the Wikimedia Foundation), who agreed to report back for the Signpost.

The packed room already sent a message: around 50 people attended, some of them even sitting on the floor. In a tweet, Philipp Schmidt from P2P University commented: "wikipedia research community growing and diversifying. I remember meetings with 5 people, now the room is packed. Great!". The attendance at the workshop is a sign of high interest in promoting research around Wikipedia. The good response can be read in two ways: the questions addressed are considered important in themselves, and the timing was right.

Interest in researching Wikipedia has been increasing within the scientific community since 2005. In 2011, ten years after Wikipedia started, research on Wikipedia keeps growing, with a body of research and a community of researchers in place. According to a recent review, there are currently 2,100 peer-reviewed articles and 38 doctoral theses related to Wikipedia. The willingness to collaborate, to make use of synergies between research initiatives of various kinds, and to continue innovating (in what is already becoming one of the leading nodes of methodological innovation) has also increased and continues to mature. It seems that in 2011 and the coming years, we will see not only a continued quantitative increase, but also a qualitative jump towards a more organized and challenging stage of research initiatives from and around Wikipedia. This can be expected to translate into important changes at the research level, and the initiative of research being promoted by Wikipedia (not only about Wikipedia) is likely to be well received.

During the workshop, Mathias Schindler (from Wikimedia Deutschland) presented the RENDER project, a research project looking at knowledge diversity. It is the first time a Wikimedia chapter has engaged in a large research project with other research partners at the European level.

Mayo Fuster Morell presented how research on Wikipedia had evolved over the years. Early empirical research was dominated by quantitative analyses of large data sets and by a focus on the English version of Wikipedia; the focus then expanded to other language versions and to a larger variety of issues, such as socio-political questions, and qualitative methods were also adopted. She also presented the Research Committee, a committee created by the WMF staff consisting of Wikimedia volunteers, researchers, and Wikimedia Foundation staff, with the mandate to help organize policies, practices and priorities around Wikimedia-related research.

Daniel Mietchen (likewise a member of the Research Committee of the WMF) presented the draft for an open access and open data policy of the WMF as a requirement for research projects receiving significant WMF support.

Benjamin Mako Hill (Wikimedia Foundation Advisory Board member and intellectual property researcher at MIT, among other roles) was also present, but gave up his planned presentation to allow time for debate. During the discussion, open data was the central theme of interest to the audience. Interest was also expressed in the question of data repositories.

The schedule was tight, and the session ended well before the discussions could reach a conclusion. It is clear that the discussion needs to continue, and that more occasions are needed to meet and work together on Wikipedia research and on promoting another way of doing research.

Wikimedia Summer of Research: Three topics covered so far

The "Wikimedia Summer of Research" (WSoR, see previous coverage) is a three-month program (ending in September), sponsored by the Wikimedia Foundation, which has brought together a team of eight academics working in the Foundation's Community Department. The goal is to study the dynamics of the editing community, starting with English and focusing particularly on which factors can measurably be said to affect the decline in new editors. The following is a short look at three of the many areas studied so far. Other research can be found on Meta and on Commons.

How new English Wikipedians ask for help

The early weeks of research by Jonathan Morgan, R. Stuart Geiger, and Shawn Walker focused on how new editors find and interact with help spaces, both within and outside the Help namespace. A combination of qualitative and quantitative methods has been used to address this issue, but the primary data was gathered through qualitative coding of randomized samples of new editors.

The following two charts were derived from the coding of activities by 445 new Wikipedians distributed from 2009–11.[10]

One question directed at the summer research team was whether trending articles, such as those about breaking current events or those listed in "In the news" on the Main Page, attract a significant number of new editors compared with articles not affected by current events. A related question was whether new editors who registered because of interest in unfolding-event articles are more or less likely to become repeat editors of the encyclopedia.

Using a quantitative sample of a random 20% of the thousands of articles which were trending (in terms of traffic stats) in January 2011, this study by Yusuke Matsubara showed that, perhaps surprisingly, the number of newly registered editors who participate in unfolding-event articles is proportionally quite low.[11] However, the amount of participation from anonymous editors was more significant regardless of semi-protection. This suggests that there may be an opportunity to invite good-faith anonymous contributors on trending articles to participate further by registering accounts.

The workload of new-page patrollers and vandal fighters

One theory proposed to explain the decline in participation by new editors is that newbie biting has increased over the years because more of the burden of policing vandalism, spam, etc. has been shouldered by fewer and fewer active new-page patrollers and vandal fighters, which contributes to burnout. To test this theory, summer researcher Aaron Halfaker looked at the overall workload of new-page patrollers[12] and vandal fighters[13] since 2007. He found that, like many things on Wikipedia, the workload follows a power law, with the top contributors doing most of the work. However, contrary to the hypothesis, the number of patrolling actions per editor (by both month and year) has been decreasing steadily.
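The two quantities discussed here, actions per active editor over time and the share of work done by the top contributors, can be computed from a simple action log. The following Python sketch is only an illustration under stated assumptions: the log format of (month, editor) tuples, the function names, and the sample data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def actions_per_editor_by_month(log):
    """Mean number of patrol actions per active editor, for each month.

    `log` is an iterable of (month, editor) tuples, one per action
    (a hypothetical format chosen for this sketch).
    """
    per_month = defaultdict(Counter)
    for month, editor in log:
        per_month[month][editor] += 1
    return {
        month: sum(counts.values()) / len(counts)
        for month, counts in sorted(per_month.items())
    }


def top_share(log, fraction=0.1):
    """Share of all actions performed by the most active `fraction` of editors."""
    counts = Counter(editor for _, editor in log)
    ranked = sorted(counts.values(), reverse=True)
    if not ranked:
        return 0.0
    k = max(1, int(len(ranked) * fraction))  # always count at least one editor
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical sample data: two editors in 2007, four editors in 2011.
log = (
    [("2007-01", "A")] * 8
    + [("2007-01", "B")] * 2
    + [("2011-01", "A")] * 3
    + [("2011-01", "B"), ("2011-01", "C"), ("2011-01", "D")]
)
monthly = actions_per_editor_by_month(log)  # {"2007-01": 5.0, "2011-01": 1.5}
share = top_share(log, fraction=0.25)       # top editor did 11 of 16 actions
```

In this toy data the per-editor workload falls from 5.0 to 1.5 actions per month even though one editor still does most of the work, which is the kind of pattern the paragraph above describes: a heavily skewed distribution alongside a declining average.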


