A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Northeastern University researcher Brian Keegan analyzed the gathering of hundreds of Wikipedians to cover the Sandy Hook Elementary School shooting in the immediate aftermath of the tragedy. The findings are reported in a detailed blog post that was later republished by the Nieman Journalism Lab.[1] Keegan observes that the Sandy Hook shooting article reached a length of 50 KB within 24 hours of its creation, making it the fastest-growing article by length in the first day among recent articles covering mass shootings on the English-language Wikipedia. The analysis compares the Sandy Hook page with six similar articles from a list of 43 articles on shooting sprees in the US since 2007. Among the analyses described in the study, of particular interest are the dynamics of dedicated vs. occasional contributors as the article reaches maturity: while in the first few hours contributions are evenly distributed with a majority of single-edit editors, after hour 3 or 4 a number of dedicated editors show up and "begin to take a vested interest in the article, which is manifest in the rapid centralization of the article". A plot of inter-edit time also shows the sustained frequency of revisions that these articles display days after their creation, with Sandy Hook averaging about 1 edit/minute around 24 hours after its first revision. The notebook and social network data produced by the author for the analysis are available on his website. The Nieman Journalism Lab previously covered the role that Wikipedia is playing as a platform for collaborative journalism, and why its format outperforms Wikinews, in an interview with Andrew Lih published in 2010.[2] The early revision history of the Sandy Hook shooting article was also covered in a blog post by Oxford Internet Institute fellow Taha Yasseri, though with a focus on the coverage in different Wikipedia language editions.[3]
In a forthcoming paper in the Journal of Management Information Systems (presented earlier at HICSS '12[4]), Xiaoquan (Michael) Zhang and Chong (Alex) Wang use a natural experiment to demonstrate that changes to the position of individuals within the editor network of a wiki modify their editing behavior. The data for this study came from the Chinese Wikipedia. In October 2005, the Chinese government suddenly blocked access to the Chinese Wikipedia from mainland China, creating an unanticipated decline in the editor population. As a result, the remaining editors found themselves in a new network structure and, the authors claim, any changes in editor behavior that ensued are likely effects of this discontinuous "shock" to the network. The paper defines each editor as a node (vertex) in the network, with a tie (edge) between two editors created whenever both edit the same page in the wiki. The authors then examine how changes to three aspects of individual editors' relative connectedness (centrality) to other editors within the network altered their subsequent patterns of contribution.
The main finding is that changes in the three kinds of editors' connectedness within the network result in differential changes to their editing behavior. First, an increase in the number of direct connections between one editor and the rest of the network (degree centrality) resulted in fewer edits by that editor, and more work on articles they created. Second, an increase in the overall proximity of an editor to the other members of the network (closeness centrality) resulted in fewer edits and less work on articles they created. Third, an increase in the extent to which an editor connected otherwise isolated groups in the network (betweenness centrality) resulted in more edits and more work by that editor on articles they created. Overall, these results imply that alterations to the network structure of a wiki can change both the quantity and quality of editor contributions. The researchers argue that their findings confirm the predictions of both network game theory and role theory; and that future research should try to analyze the character of the network ties created within platforms for large-scale online collaboration, to better understand how changes to network structure may alter collaborative practices and public goods creation.
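The network construction described above can be illustrated with a minimal sketch. It assumes edit data in the form of (editor, page) pairs; the editor names, page names, and function names are hypothetical and are not taken from the paper. Only degree and closeness centrality are computed here, and betweenness is omitted for brevity. The closeness variant used (reachable editors divided by the sum of distances to them) is one common convention, not necessarily the one the authors chose.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_coediting_network(edits):
    """Return {editor: set of co-editors}; two editors are tied if they edited the same page."""
    edits = list(edits)
    editors_by_page = defaultdict(set)
    for editor, page in edits:
        editors_by_page[page].add(editor)
    # Every editor is a node, even if they share no page with anyone.
    graph = {editor: set() for editor, _ in edits}
    for editors in editors_by_page.values():
        for a, b in combinations(sorted(editors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Share of the other editors that each editor is directly tied to."""
    n = len(graph)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable editors divided by the sum of shortest-path distances to them."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical edit log: editors A-D form a chain through shared pages.
edits = [("A", "P1"), ("B", "P1"), ("B", "P2"), ("C", "P2"), ("C", "P3"), ("D", "P3")]
graph = build_coediting_network(edits)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)

# A "shock" in the spirit of the natural experiment: editor B becomes unavailable.
blocked = {"B"}
graph_after = build_coediting_network((e, p) for e, p in edits if e not in blocked)
degree_after = degree_centrality(graph_after)

print(degree, closeness, degree_after, sep="\n")
```

In this toy example, removing B leaves A with no ties at all, which shows how a sudden loss of editors changes the centrality of those who remain without any action on their part.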
In an online early version of an upcoming article in Atención Primaria,[5] researchers at the Miguel Hernández University of Elche and the University of Alicante have benchmarked articles on pharmaceutical drugs in the Spanish Wikipedia against information available in a pharmaceutical database, Vademécum.[6] A subset of the Vademécum corpus of 3,595 drugs was created using simple random sampling without replacement, consisting of 386 drugs. Of these, 171 (44%) had entries on the Spanish Wikipedia, which were then scrutinized along several dimensions in May 2012. Usage of the drug was correctly indicated in 155 (91%) of these articles, dosage in 26 (15%), and side-effects in 64 (37%), with only 15 articles (9%) scoring well in all of these dimensions. The researchers conclude that, while Wikipedia has a high potential to help with the dissemination of pharmaceutical knowledge, the Spanish-language edition does not currently live up to this potential. As a possible solution, they suggest the pharmaceutical community more actively participate in editing Wikipedia. The list of the drugs involved has not been made public, since a similar study is currently underway whose results may be distorted by targeted intervention. The authors have signalled to this research report their intention to make the list available after this second study is complete.
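The sampling step and the reported percentages can be checked with a short sketch. The counts are those quoted above; the integer drug identifiers, the random seed, and the helper name pct are assumptions made for illustration, since the actual drug list has not been published.

```python
import random

POPULATION = 3595      # drugs in the Vademécum corpus
SAMPLE_SIZE = 386      # drugs drawn for the study
WITH_ARTICLE = 171     # sampled drugs with a Spanish Wikipedia entry
CORRECT = {"usage": 155, "dosage": 26, "side effects": 64, "all three": 15}

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)

# Simple random sampling without replacement over hypothetical integer drug IDs.
rng = random.Random(2012)  # arbitrary seed, only to make the sketch repeatable
sample = rng.sample(range(POPULATION), SAMPLE_SIZE)

print(f"coverage: {pct(WITH_ARTICLE, SAMPLE_SIZE)}% of sampled drugs have an article")
for dimension, count in CORRECT.items():
    print(f"{dimension}: {count}/{WITH_ARTICLE} = {pct(count, WITH_ARTICLE)}%")
```

Note that the coverage figure uses the 386 sampled drugs as its denominator, while the accuracy figures use the 171 drugs that have an article.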
A paper posted to arXiv[7] by SFI's Omidyar fellow Simon DeDeo presents evidence for non-finite state computation in a human social system using data from Wikipedia edit histories. Finite-state systems are the basis for the study of formal languages in computer science and linguistics, and many real-world complex phenomena in biology and the social sciences are also studied empirically by assuming the existence of underlying finite-state processes, for the analysis of which powerful probabilistic methods have been devised. However, the question of whether the description of a system truly entails a finite or a non-finite, unbounded number of states is an open one. This is significant from a functionalist point of view: can we classify a system by its computational properties, and can these properties help us better understand how the system works regardless of its material details?
The paper's contribution lies in its proof of a probabilistic generalization of the pumping lemma, a device used in theoretical computer science as a necessary condition for a language to be described by only a finite number of states. The lemma is applied to the edit histories of a number of the most frequently edited articles in the English Wikipedia, after these histories have been transformed into coarse-grained sequences of "cooperative" or "non-cooperative" (reversion) edits (reverts being identified by means of their SHA1 field). A Bayesian argument is applied to show that the lemma cannot hold for a majority of sequences, thus showing that Wikipedia's collaborative editing system as a whole cannot be described by any aggregation of finite-state systems. The author discusses the implications of this finding for a more grounded study of Wikipedia's editing model, and for the identification of detailed computational models of other social and biological systems.
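The coarse-graining step can be sketched as follows. The rule used here is an assumption rather than the paper's documented procedure: a revision is labeled a revert ("R") when its SHA1 hash matches that of any earlier revision of the same article, and cooperative ("C") otherwise. The function names and the sample revision texts are hypothetical.

```python
import hashlib

def sha1_of(text):
    """SHA1 hex digest of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def coarse_grain(revision_hashes):
    """Map a chronological series of revision hashes to a string of 'C' and 'R' labels."""
    seen = set()
    labels = []
    for digest in revision_hashes:
        # Assumed rule: content identical to an earlier revision means a revert.
        labels.append("R" if digest in seen else "C")
        seen.add(digest)
    return "".join(labels)

# Hypothetical revision texts: the fourth revision restores the second one.
texts = [
    "stub",
    "stub plus detail",
    "vandalized",
    "stub plus detail",
    "stub plus detail and sources",
]
sequence = coarse_grain(sha1_of(t) for t in texts)
print(sequence)  # CCCRC
```

The resulting two-symbol string is the kind of sequence to which a test such as the probabilistic pumping lemma could then be applied.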
Michela Ferron, a member of the SoNet (Social Networking) research group at the Bruno Kessler Foundation in Trento, Italy, submitted her PhD thesis[8] in December 2012. She examined the idea of viewing Wikipedia as a venue for collective memory and the language indicators of the dynamic process of memory formation in response to "traumatic" events. Parts of the thesis have already been published in journals and conference proceedings, such as WikiSym 2011 and 2012 (cf. presentation slides).
A full chapter is dedicated to the background on the concept of collective memory and its appearance in the digital world. The thesis continues with an analysis of "anniversary edits", showing a significant increase in editorial activities on articles related to traumatic events during the anniversary period compared to a large random sample of "other" articles. More detailed linguistic indicators are introduced in the next chapter. It is statistically shown that the terms related to affective processes, negative emotions, and cognitive and social processes occur more often in articles on traumatic events; "Specifically, the relative number of words expressing anxiety (e.g., “worried”), anger (e.g., “hate”) and sadness (e.g., “cry”) was significantly higher in articles about traumatic events".
In the next step, Ferron tried to distinguish between human-made and natural disasters. It has been observed that "human-made traumatic events were characterized by language referring to anger and anxiety, while the collective representation of natural disasters expressed more sadness". Finally, a detailed case study of the talk pages of articles on the 7 July 2005 London bombings and the 2011 Egyptian revolution was carried out, and language indicators, especially those related to emotions, were investigated in a dynamic framework and compared for both examples.
A First Monday article[9] reviews several aspects of Wikipedia's participation in the 18 January 2012 protests against SOPA and PIPA legislation in the US. The paper focuses on the question of legitimacy, looking at how the Wikipedia community arrived at the decision to participate in those protests.
The paper provides an interesting discussion of legitimacy in Wikipedia's governance, and discusses the legitimacy of the decision to participate in the protests. The author notes that the initiative was given a major boost by Jimmy Wales' charismatic authority: Wales posted a straw poll about the issue on his talk page on December 10, 2011, and while the issue had been discussed by the community beforehand (for example, in mid-November at the Village Pump), those discussions attracted much less attention. It is hard to say whether the protest would have happened without Jimbo's push for more discussion, as it veers towards "what if" territory; as things happened, it is true that Jimbo's actions began a landslide that led to the protests. However, this reviewer is more puzzled at the claim made in the introduction to the article that the discussion involved a "massive involvement of the Wikimedia Foundation staff". While several WMF staffers were active in the discussions in their official capacity, and while the WMF did issue some official statements about the ongoing discussion, the paper certainly does not provide any evidence to justify the word "massive".
The paper subsequently notes that the WMF focused on providing information and gently steering the discussion, without any coercion; this hardly justifies the claim of "massive involvement". At the very least, a clear explanation is necessary of precisely how many WMF staffers participated in the discussion before such a grandiose adjective as "massive" is used. It is true that the WMF staffers helped push the discussion forward, but this reviewer believes that the paper does not sufficiently justify the stress it puts on their participation, and thus may overestimate their influence.
The third part of the paper discusses how the arguments about legitimacy or the lack of it framed the subsequent discourse of the voters. The author notes that after an initial period of discussing SOPA itself, the discussion of whether it was legitimate or not for Wikipedia to become involved in the protest took over, with a major justification for it emerging in the form of an argument that it was legitimate for Wikipedia to protest against SOPA as SOPA threatened Wikipedia itself. While this is an interesting claim, unfortunately, other than citing one single comment, no other qualitative or quantitative data are provided; nor is the methodology discussed. We are not told how many individuals voted, how many commented on legitimacy or illegitimacy, how many felt that Wikipedia is threatened; we do not know how the author classified comments supporting any of the viewpoints, or the shifts in the discussion ... this list could unfortunately go on. In one specific example drawn from the conclusion, the author writes that "The main factor that shaped the multi-phased process was the will to have the community accept the final decision as legitimate, and avoid backlash. This factor especially influenced those who are suspected of relying on traditional means of legitimacy such as charisma or professionalism." At the same time, we are provided with no number, no percentage, and certainly no correlation to back up this claim. Without a clear methodology or distinct data it is hard to verify the author's claims and conclusions.
The introduction also notes that "the mass effort of planning an effective political action was not something “anyone [could] edit”" and "the debate preceding the blackout did not follow Wikipedia’s open and anarchic decision-making system"; unfortunately this reviewer finds no justification for those rather strong claims anywhere else in the article.
Overall, this is an interesting paper about legitimacy in Wikipedia, but it seems to overreach when it tries to draw conclusions from data that are simply not presented to the reader. It suffers from a failure to explain the research's methodology, making verification of the claims made very hard. Due to the lack of hard data, most conclusions are unfortunately rendered dubious, and the paper has a tendency to make strong claims that are not backed up by data or even developed later on.
In his Communication and Society PhD dissertation,[11] Randall M. Livingstone of the University of Oregon explores the relationship between the social and technical structures of Wikipedia, with a particular focus on bots and bot operators. After a fairly broad literature review (which summarizes the basic approaches to Wikipedia studies from new media theory, social network analysis, science and technology studies, and political economy), Livingstone gives a concise history of the technical development of Wikipedia, from UseModWiki to MediaWiki, and from a single server to hundreds.
The most interesting chapters for Wikipedians will be V – Wikipedia as a Sociotechnical System – and VI – Wikipedia as Collective Intelligence. Chapter 5 looks at the ways the editing community and the evolution of software (both MediaWiki and the semi-automated tools and bots that interact with editors and articles) "construct" each other. Based on 45 interviews with bot operators and WMF staff, this chapter gives an interesting and varied picture of how Wikipedia works as a sociotechnical system. It will in part be a familiar account to the more tech-minded Wikipedians, but offers an accessible overview of bots and their place in the ecosystem to editors who normally steer clear of bots and software development. Chapter 6 looks at theories of intelligence and the concept of collective intelligence, arguing that Wikipedia exhibits (at least to some extent) the key traits of stigmergy, distributed cognition, and emergence.
Discuss this story
While I'm at it, I think that you mean that the discussion on Jimmy's talk page about SOPA was in 2011, not 2001.--Sturmvogel 66 (talk) 06:36, 2 January 2013 (UTC)
I have never seen evidence that First Monday has published a useful and important paper on anything whatsoever; just bad papers on important subjects, trolling for attention - David Gerard (talk) 08:12, 2 January 2013 (UTC)
If the Spanish Wikipedia is anything like the English Wikipedia then there is a reason why they would score poorly for drug dosage information: we strongly discourage including it. We don't give medical advice and this is an open wiki - both reasons why such information would be unacceptable. This is an encyclopaedia, not a drug formulary. -- Colin°Talk 11:20, 2 January 2013 (UTC)
This page is one of the best Signpost articles I have ever seen. Good job! As for the SOPA blackout, I think a case can be made for it starting on Reddit, then the many Redditors who are also Wikipedia editors starting "a landslide that led to the protests" on Wikipedia. This appears to predate Jimbo's straw poll. (What percentage of Wikipedians watch Jimbo's talk page anyway?) --Guy Macon (talk) 09:00, 2 January 2013 (UTC)
Hmm, I'm not sure it's worth getting-into-it, but I find the review of the SOPA paper above really bizarre. It kind of reminds me of a parody about reviewing various printed books as if they were novels - i.e. the telephone listings were praised for immediately introducing many interesting characters, but didn't flesh them out and had no plot or drama. Here the reviewer is treating a humanities paper as if it were a mathematical analysis, and thus finding it wanting. For example - "At the same time, we are provided with no number, no percentage, and certainly no correlation to back up this claim". (oh no, he gave only one example, ok here's another) "Due to the lack of hard data, most conclusions are unfortunately rendered dubious, and the paper has a tendency to make strong claims that are not backed up by data or even developed later on.". Now, maybe one can argue humanities papers are gibberish because of those sorts of problems in general, and certainly that case can be made :-). But it's weird to see that general argument applied as if it were a specific failing, to an ordinary paper in its genre. -- Seth Finkelstein (talk) 23:07, 2 January 2013 (UTC)
That's the sort of thing First Monday does. I see they've taken "A Critique of Vulgar Raymondism" down. Ah, sorry, you're talking about the story - David Gerard (talk) 23:36, 2 January 2013 (UTC)
HOW WIKIPEDIA'S CONCERN IS DEPICTED ABOVE:
HOW WIKIPEDIA'S CONCERN WAS DEPICTED BY REP. LAMAR SMITH
WIKIPEDIA'S ACTUAL STATED CONCERN (WP:BLACKOUT):
It wasn't an accurate depiction of Wikipedia's concerns when Lamar Smith first said it, and it still isn't. --Guy Macon (talk) 04:00, 4 January 2013 (UTC)