The Signpost

Recent research

Top female Wikipedians, reverted newbies, link spam, social influence on admin votes, Wikipedians' weekends, WikiSym previews

By Tilman Bayer, Daniel Mietchen, Dario Taraborelli and Jodi.a.schneider

What the most active female editors contribute

A paper addressing gender imbalance in Wikipedia ("Gender differences in Wikipedia editing") by Judd Antin and collaborators won the "Best Short Paper" award at WikiSym.[1] This follows the "Best Full Paper" award given to another study on the gender gap,[2] already covered in previous editions of the research newsletter. The study by Antin and collaborators sampled 256,190 users who created a new account on the English Wikipedia between September 2010 and February 2011 and qualitatively coded their contributions by category of wiki work. The results suggest that men and women in the lower three quartiles by activity level make roughly the same contributions in each category of wiki work, whereas in the top quartile they behave in significantly different ways. The researchers found that among the top 25% of Wikipedians by activity level:

Effects of reverts on wiki work

Two plots show the changes in an editor's boldness after being reverted.

Another WikiSym 2011 paper by GroupLens researchers, including Summer of Research fellow Aaron Halfaker ("Don’t bite the newbies: how reverts affect the quantity and quality of Wikipedia work"), reports on the effects of reverts on the quality and quantity of Wikipedia editors' work, with a specific focus on newbies.[3] The study uses two sets of metrics. The quality of an editor's contributions is assessed with reverts per revision and with Persistent Word Revisions (PWR), which measures how long the words added by an editor, other than stop words, survive across subsequent revisions. Changes in editor activity are assessed with a controlled activity delta, which calculates the variation in an editor's activity across weeks relative to the week preceding the revert, normalized by the editor's daily rate of activity. The results point both to the important role of reverts in learning and quality improvement and to their negative effects on new contributors. Below are highlights from this study:

These results are consistent with the findings by Summer of Research fellows on the effects of community interactions with new Wikipedians.
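To make the Persistent Word Revisions idea from the GroupLens study more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes revisions are already tokenized into word lists, compares them as bags of words rather than tracking individual word positions through diffs, and uses a tiny illustrative stop-word list. The names `STOP_WORDS`, `added_words` and `persistent_word_revisions` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Illustrative stop-word list only; the study's actual list is not given here.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is"})


def added_words(previous: list[str], current: list[str]) -> Counter:
    """Multiset of non-stop-words that appear in `current` but not in `previous`."""
    diff = Counter(current) - Counter(previous)  # keeps positive counts only
    return Counter({w: n for w, n in diff.items() if w.lower() not in STOP_WORDS})


def persistent_word_revisions(revisions: list[list[str]], index: int) -> int:
    """Sum, over each later revision, of how many words added at `index` are still present.

    Simplification (assumption): a word counts as surviving if it occurs in the
    later revision at all, even if it was removed and re-added by someone else.
    """
    previous = revisions[index - 1] if index > 0 else []
    added = added_words(previous, revisions[index])
    total = 0
    for later in revisions[index + 1:]:
        surviving = added & Counter(later)  # multiset intersection
        total += sum(surviving.values())
    return total


history = [
    ["the", "cat"],
    ["the", "cat", "sat", "quietly"],  # this edit adds "sat" and "quietly"
    ["the", "cat", "sat"],             # "quietly" is removed
    ["the", "dog"],                    # everything else is replaced
]
pwr = persistent_word_revisions(history, 1)
```

Under these assumptions, an editor whose additions are quickly removed accumulates a low score, while additions that survive many later revisions accumulate a high one.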

Further Wikipedia coverage at WikiSym 2011: Social dynamics and global reach

Geographic location of edits for English Wikipedia article 2011 Egyptian revolution

The "Wiki tools and interfaces" session at WikiSym will see the presentation of a paper titled "Autonomous link spam detection in purely collaborative environments". According to the five authors from the University of Pennsylvania, link spam is currently "an annoying, but non-pervasive issue", but could become a grave threat to Wikipedia if new spam techniques that were explored by some of them in another paper (see below) become more widespread.

Using the STiki software by one of the authors, which is already widely used as an anti-vandalism tool on the English Wikipedia, the researchers collected mainspace edits adding external links and extracted a corpus of 5,962 link additions classified as either ham or spam, using criteria such as whether the edit had been rolled back (to determine spam), or whether it had been added by a user with rollback rights (to determine ham). From this, the researchers derived numerous features that indicate link spamming behavior, in three areas: On-wiki evidence (including very simple metrics such as the URL's length – spam links tend to be shorter – or that older and more popular articles are more likely to be targeted), properties of the landing page that the link points to (these were found to be less useful), and classification from third-party sites, including Alexa and Google Safe Browsing. The backlinks data provided by Alexa proved to be most useful for the classifier that the authors went on to construct, and tested in a live implementation in the STiki tool. They conclude that "it is clear this work will benefit the Wikipedia community".
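The two labeling criteria mentioned above, plus the simplest on-wiki feature (URL length), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LinkAddition` record and its field names are hypothetical, the paper used further criteria beyond these two, and giving the rollback signal precedence over the editor's rights is a choice made here, not something the article reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkAddition:
    """Hypothetical record for one mainspace edit that added an external link."""
    url: str
    was_rolled_back: bool
    editor_has_rollback_rights: bool


def label(edit: LinkAddition) -> Optional[str]:
    """Return "spam", "ham", or None when neither criterion applies."""
    if edit.was_rolled_back:
        return "spam"  # assumption: a rollback outweighs the editor's rights
    if edit.editor_has_rollback_rights:
        return "ham"
    return None  # unlabeled edits would be left out of the corpus


def url_length(edit: LinkAddition) -> int:
    """One very simple on-wiki feature: the length of the added URL."""
    return len(edit.url)


examples = [
    LinkAddition("http://pills.example", True, False),
    LinkAddition("http://archive.example/long/path/to/source.html", False, True),
    LinkAddition("http://blog.example/post", False, False),
]
labels = [label(e) for e in examples]
```

A real classifier would combine many such features, including the third-party signals the authors found most useful; the sketch covers only the corpus-building step.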

In another paper, presented earlier this month at CEAS ’11, five authors from the same university, including two of the same researchers, examine the possibility of "Link spamming Wikipedia for profit". They picture spam detection on Wikipedia as a pipelined process with four stages: first the MediaWiki spam blacklist (currently containing around 17,000 regular expressions); next the recent changes patrollers (often aided by software tools), who often react within seconds of an edit; then watchlisters (within minutes to days); and finally review by ordinary readers. Based on a spam/ham corpus constructed as in the other paper, this paper contains some further analysis of the characteristics of link spam destinations and spamming accounts, and of the exposure spammed links receive before they are removed (determined by both the link's lifespan and the popularity of the spammed page). The most sensitive part of the paper then leverages these results to "describe a novel and efficient spam model we estimate can significantly outperform status quo techniques", e.g. by rapidly adding links to exploit the time lag of Wikipedia's spam removal process, or by targeting popular pages. In a nod to WP:BEANS, the researchers admit that "there is the possibility that we have introduced previously unknown vectors", but the "Ethical Considerations" section emphasizes that:

"It is in no way this research’s intention to facilitate damage to Wikipedia or any wiki host. The vulnerabilities discussed in this section have been disclosed to Wikipedia’s parent organization, the Wikimedia Foundation (WMF). Further, the WMF was notified regarding the publication schedule of this document and offered technical assistance."

The authors also point to the implementation of the spam mitigation tool described in the WikiSym article.

However, the paper fails to mention that last year, one of its authors conducted actual, extensive tests of spamming techniques on the English Wikipedia that are very similar to those outlined in the paper. The spam attacks gained the attention of several IT security news websites, and even involved setting up a fake webshop to measure how many Wikipedia readers would have carried out an actual purchase of the penis enlargement pills advertised in the links. The case led to the researcher's temporary ban as a Wikipedia user, later lifted by the arbitration committee, and informed the research guidelines drafted later that year by the Wikimedia Foundation's Research Committee. See Signpost coverage: "Large scale vandalism revealed to be 'study' by university researcher" (includes a background interview with the researcher).

How social ties influence admin votes

A paper by three researchers from the University of the Philippines Diliman,[6] presented at the International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011) two months ago, examined statistical relations between voting behavior in requests for adminship (RfAs) and the on-wiki social contacts of participants. The paper includes a brief review of existing literature (in particular two papers that had already studied the relation with existing social networks[7][8]). Drawing on a January 2008 dump of the English Wikipedia, the authors analyzed 2,587 elections conducted between 2004 and 2008 (48% of them successful, with 7,231 users voting or running in at least one RfA, and 80% of the final non-neutral votes being supportive) and "1,097,223 instances of communication between 265,155 distinct pairs of users" who had run or voted in an RfA. From these user talk page messages, they generated an undirected social graph. Their results concern three areas:

Wikipedians' weekends in international comparison

A paper titled "Temporal characterization of the requests to Wikipedia" examined how search requests, read accesses and edits on Wikipedia change over time, and how they relate to those across all Wikimedia sites (based on squid logs for the whole of 2009, provided by the Wikimedia Foundation). Among the findings are differences between language versions of Wikipedia: for example, "the number of edits tends to raise in weekends" for the French, Japanese, Dutch and Polish Wikipedias, but not for other languages. Another paper, titled "Circadian patterns of Wikipedia editorial activity: A demographic analysis"[9], similarly analyzed "34 Wikipedias in different languages [trying] to characterize and find the universalities and differences in temporal activity patterns of editors", with the underlying data provided by the German Wikimedia chapter from the toolserver. Its authors found that "in contrast to diurnal [daily] pattern, which is universal to a great extent, weekly activity patterns of WPs show remarkable differences. We could, however, identify two main categories, namely 'weekends' and 'working days' active WPs."[10]
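The distinction between "weekends" and "working days" Wikipedias can be illustrated with a small Python sketch that computes the share of edits made on Saturdays and Sundays and compares it with the share expected if edits were spread evenly over the week. This is not the method of either paper: the function name `weekend_edit_share` is hypothetical, and the sketch assumes the timestamps have already been converted to the editors' local time.

```python
from __future__ import annotations

from datetime import datetime

# Share of edits that would fall on a weekend if every weekday were equally busy.
UNIFORM_WEEKEND_SHARE = 2 / 7


def weekend_edit_share(timestamps: list[datetime]) -> float:
    """Fraction of edits made on Saturday or Sunday (local time assumed)."""
    if not timestamps:
        raise ValueError("need at least one edit timestamp")
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for t in timestamps if t.weekday() >= 5)
    return weekend / len(timestamps)


edits = [
    datetime(2011, 9, 24, 10, 0),  # Saturday
    datetime(2011, 9, 25, 12, 0),  # Sunday
    datetime(2011, 9, 26, 9, 0),   # Monday
    datetime(2011, 9, 27, 9, 0),   # Tuesday
]
share = weekend_edit_share(edits)
is_weekend_active = share > UNIFORM_WEEKEND_SHARE
```

A share noticeably above 2/7 would point toward a weekend-active editing community, and a share below it toward a working-days one. Where to draw the line is a judgment call that this sketch leaves open.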


In brief

References

  1. Antin, Judd, Raymond Yee, Coye Cheshire, and Oded Nov (2011). Gender Differences in Wikipedia Editing. WikiSym 2011: Proceedings of the 7th International Symposium on Wikis, 2011. PDF (open access)
  2. S.T.K. Lam, A. Uduwage, Z. Dong, S. Sen, D.R. Musicant, L. Terveen, and J. Riedl (2011). WP:Clubhouse? An Exploration of Wikipedia's Gender Imbalance. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis, 2011. PDF (open access)
  3. Halfaker, Aaron, Aniket Kittur, and John Riedl (2011). Don't Bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work. WikiSym '11: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  4. A.G. West and I. Lee (2011). What Wikipedia Deletes: Characterizing Dangerous Collaborative Content. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  5. Ferron, Michela, and Paolo Massa (2011). Collective memory building in Wikipedia: The case of North African uprisings. WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  6. Cabunducan, Gerard, Ralph Castillo, and John Boaz Lee (2011). Voting behavior analysis in the election of Wikipedia admins. In: 2011 International Conference on Advances in Social Networks Analysis and Mining, 545–547. IEEE. DOI (closed access)
  7. J. Leskovec, D. Huttenlocher, J. Kleinberg (2010). Predicting positive and negative links in online social networks. ACM WWW International conference on World Wide Web (WWW '10), 2010. Video, PDF (open access)
  8. J. Leskovec, D. Huttenlocher, J. Kleinberg (2010). Governance in Social Media: A case study of the Wikipedia promotion process. In: AAAI International Conference on Weblogs and Social Media (ICWSM '10). Video, PDF (open access)
  9. Yasseri, Taha, Sumi, Róbert, Kertész, János (2011). Circadian patterns of Wikipedia editorial activity: A demographic analysis, arXiv (September 8, 2011). PDF (open access)
  10. Reinoso, Antonio J., Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, and Israel Herraiz (2011). Temporal characterization of the requests to Wikipedia. In Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval (DART 2011). ETSI Caminos, Canales y Puertos (UPM), September 13, 2011. PDF (open access)
  11. Reagle, Joseph, and Lauren Rhue (2011). Gender Bias in Wikipedia and Britannica. International Journal of Communication 5 (2011): 1138–1158. PDF (open access)
  12. José Felipe Ortega and Joaquín Rodríguez López (2011). El potlatch digital. Wikipedia y el triunfo del procomún y el conocimiento compartido, Catedra, September 2011. HTML (closed access)
  13. Badgett, Robert G, and Mary Moore (2011). Are students able and willing to edit Wikipedia to learn components of evidence-based practice? Kansas Journal of Medicine 4(3), August 30, 2011. PDF (open access)
  14. Reagle, Joseph M. (2010). Good Faith Collaboration: The Culture of Wikipedia. The MIT Press, 2010. HTML (open access)
  15. Liu, J. (2011). W7 model of provenance and its use in the context of Wikipedia. PhD dissertation, The University of Arizona, 2011. PDF (closed access)
  16. Graham, M., Hale, S. A. and Stephens, M. (2011). Geographies of the World’s Knowledge. Ed. Flick, C. M., London, Convoco! Edition. PDF (open access)
  17. He, Zeyi (2011). Measuring the Development of Wikipedia. In 2011 International Conference on Internet Technology and Applications, IEEE. DOI (closed access)

Discuss this story


Less technical.

Could the standard deviation results be expressed in less technical terms so that 95% of readers know what it means? One standard deviation, BTW, is a range of about 34% on both sides of the mean, or average. Tony (talk) 03:11, 27 September 2011 (UTC)

Assuming a normal distribution... let's not jump to conclusions. Dcoetzee 20:52, 3 October 2011 (UTC)

Antarctica.

While Antarctica vs. country may make for a nice sound bite, it's very much apples and oranges. Whereas Antarctica's population may be just a handful, its scientific significance and the variety of wildlife are, well, continent-sized. Jztinfinity (talk) 04:54, 27 September 2011 (UTC)

Not so sure about the current variety of wildlife, last I heard Antarctica was mostly ice cap with a few penguins round the coast. Things would have been very different a few million years ago, but in recent millennia there has not been much plantlife except in the surrounding seas. ϢereSpielChequers 08:25, 27 September 2011 (UTC)
See Ucucha's comments at Wikipedia_talk:Wikipedia_Signpost/2011-09-19/In_the_news. -- Jeandré, 2011-09-27t13:55z
I didn't take the time to follow the link above (I'm just skimming through here), but I just wanted to say, don't forget microbes when you consider the topic of "wildlife in Antarctica". I bet Antarctica has a lot of microbes that we don't know about yet. Cheers, ¾-10 23:28, 27 September 2011 (UTC)
And don't forget the presence of Belgica antarctica, the largest solely terrestrial animal indigenous to Antarctica. (The WP article definitely needs work -- at least someone should add in which parts of Antarctica B. antarctica has been found.) -- llywrch (talk) 06:08, 28 September 2011 (UTC)

Reverting and quality

This seems like a dubious analysis: "reverted editors are less likely to be reverted in the future (particularly in the week after the revert), whereas the probability of being reverted in the control group keeps growing every week." The second part can be rephrased as "the longer you edit the more probable you will get reverted". Insofar as EVERYONE gets reverted eventually, it doesn't mean much to me. Circéus (talk) 19:46, 27 September 2011 (UTC)


  • Just an important general note about reverts: Not all editors are equal. The dumber a person is, the more likely it is that they will be reverted. So, even if reverts suppress future editing, this is not necessarily a bad thing. Jason Quinn (talk) 14:28, 29 September 2011 (UTC)
My thoughts exactly! The fact that reverted editors edit less and non-reverted editors edit more is very likely due to the caliber of the edits made. Bad editors, editors that are clowning around or vandalizing, are apt to be reverted and also less likely to edit in the future than "good" editors making good edits, who aren't reverted and go on to edit more. The fact that one is reverted and the other isn't is incidental to the fact that the caliber of the groups, on average, is different. Now, if one were to study the effect of malicious or bad reversions upon future participation, that would be informative... Carrite (talk) 04:51, 30 September 2011 (UTC)
My point is that the first part is a good interpretation, but the second part (once recast as I do) is not a good comparison: since everyone gets reverted eventually (even if it's due to making a page-breaking error, or because one accidentally waded into a can of worms), it is normal that as the time without a revert rises, the probability that you will get reverted obviously will too. Circéus (talk) 06:48, 30 September 2011 (UTC)
Just to clear up potential confusion, my comment isn't in reply to yours. It's a separate note. I put mine here just because the heading was apropos. Jason Quinn (talk) 15:02, 30 September 2011 (UTC)
Oh... I'll put in a separator line to try and clarify that. Circéus (talk) 16:05, 30 September 2011 (UTC)



       
