The Signpost

Recent research

Edit war patterns, deleters vs. the 1%, never used cleanup tags, authorship inequality, higher quality from central users, and mapping the wikimediasphere

By Tilman Bayer, Piotr Konieczny, Evan and Daniel Mietchen

Dynamics of edit wars

Controversy about Michael Jackson as quantified on the basis of reverted edits to his Wikipedia article. A: Jackson is acquitted on all counts after a five-month trial. B: Jackson makes his first public appearance since the trial to accept eight records from the Guinness World Records in London, including Most Successful Entertainer of All Time. C: Jackson issues Thriller 25. D: Jackson dies in LA.

"Dynamics of Conflicts in Wikipedia"[1] develops an interesting "measure of controversiality", something that might be of interest to editors at large if it were a more widely popularized and dynamically updated statistic. The paper analyzes patterns of edit warring over Wikipedia articles. The authors conclude that edit warriors are usually willing to reach consensus, and that the rare cases of never-ending warring are those that continually attract new editors who have not yet joined the consensus.

The authors' decision to exclude from the study articles with under 100 edits because they are "evidently conflict-free" is questionable. Some articles with fewer than 100 edits have been subject to clear, if not overly long, edit warring; a recent example is Concerns and controversies related to UEFA Euro 2012. It is also unfortunate that "memory effects" – a term mentioned only in the abstract and lead, and which the authors suggest is significant in understanding the conflict dynamic – is not explained in the article. The term "memory", by itself, appears four times in the body, but is not operationalized anywhere.

A press release accompanied the paper, entitled "Wikipedia 'edit wars' show dynamics of conflict emergence and resolution". An MSNBC tech news headline summarized it sensationally but misleadingly as "Wikipedia is editorial warzone, says study".

Who deletes Wikipedia?

In a recent blog post by Wibidata, an analytics startup based in San Francisco, the authors set out to shed light on the often-quoted claim that most of Wikipedia was written by a small number of editors, noting other editorial patterns along the way.[2] Using the entire revision history of English Wikipedia (they wanted to show that their platform can scale), the authors looked at the distribution of edits across editor cohorts, grouped by number of total edits. They found that from a pure count perspective, the most active 1% of editors had contributed over 50% of the total edits (see original plot).

In response to the suggestion that the strongly skewed distribution of edits might just be due to a core set of editors who primarily make only minor formatting modifications, they looked at the net number of characters contributed by each editor. Grouping editors by total number of edits as before, they showed an even more strongly skewed distribution, with the top 1% contributing well over 100% of the total number of characters on Wikipedia (i.e. an amount of text that is larger than the current Wikipedia) and the bottom 95% of editors deleting more on average than they contributed (original plot). Next, the authors separated logged-in users from non-logged-in "users" (identified only by IP addresses) and recomputed the distribution of net character contributions. By edit-count cohort, logged-in users tended to contribute significantly more than their anonymous counterparts, and non-logged-in users tended to delete significantly more (original plot).

In summary, low-activity and new editors, along with anonymous users, tend to delete more than they contribute; this reinforces the notion that Wikipedia is largely the product of a small number of core editors.
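To see how a small cohort can account for more than 100% of the net text, the cohort analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wibidata's actual pipeline: the function names `contribution_by_editor` and `top_share`, the `(editor, size_delta)` input format, and the toy data are all hypothetical.

```python
from collections import defaultdict

def contribution_by_editor(revisions):
    """Aggregate edit counts and net characters per editor.

    `revisions` is an iterable of (editor, size_delta) pairs, where
    size_delta is the change in page length caused by one edit
    (negative when text was removed). This input format is an assumption.
    """
    edits = defaultdict(int)
    net_chars = defaultdict(int)
    for editor, delta in revisions:
        edits[editor] += 1
        net_chars[editor] += delta
    return edits, net_chars

def top_share(edits, net_chars, fraction=0.01):
    """Share of all edits and of all net characters due to the most
    active `fraction` of editors, ranked by edit count.

    The cohort always contains at least one editor. Assumes the overall
    totals are non-zero.
    """
    ranked = sorted(edits, key=lambda e: edits[e], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top = ranked[:k]
    edit_share = sum(edits[e] for e in top) / sum(edits.values())
    char_share = sum(net_chars[e] for e in top) / sum(net_chars.values())
    return edit_share, char_share

# Toy data: one prolific editor adds text, two occasional editors remove text.
sample = [
    ("A", 1000), ("A", 500), ("A", 200),
    ("B", -300),
    ("C", -400),
]

edits, net_chars = contribution_by_editor(sample)
edit_share, char_share = top_share(edits, net_chars)
print(f"top cohort: {edit_share:.0%} of edits, {char_share:.0%} of net characters")
```

In the toy data the page ends up 1,000 characters long, yet editor A added 1,700 characters net, so A's share exceeds 100% because the other editors are net deleters. Note that a count of this kind treats every character gained or lost equally, so blanking and the reverts that undo it weigh heavily in the totals.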

In a paper published in the proceedings of *SEM, a computational semantics conference, researchers from the University of North Texas and Ohio University looked into the nature of interlingual links on Wikipedia, both reviewing the quality of existing links and exploring possibilities for automatic link discovery.[3] The researchers took the directed graph of interlingual links on Wikipedia and used the lens of set-theoretic operations to structure an evaluation of existing links and to build a system for automatic link creation. For example, they suggest that the properties of symmetry and transitivity should hold for the relation of interlingual linking. This means that if there is an interlingual link from language A to B, there should also be a link from B to A, and if there is a link from language A to B, and from language B to C, then there should be a link from language A to C. (This assumption is routinely made by the many existing Interwiki bots.) They further refine the notion of transitivity by grouping article pairs by the number of transitive 'hops' required to connect a candidate article pair.
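The symmetry and transitivity properties described above can be illustrated with a small Python sketch. It is not the researchers' system: the representation of an article as a `(language, title)` tuple, the function names, and the interpretation of "hops" as the length of the shortest chain of existing links are assumptions made for the example.

```python
from collections import defaultdict, deque

def missing_reverse_links(links):
    """Links implied by symmetry: (b, a) for every existing (a, b)
    whose reverse link is absent."""
    return {(b, a) for (a, b) in links if (b, a) not in links}

def implied_links_by_path_length(links, max_len=3):
    """Links implied by transitivity.

    Returns {(source, target): n} where n (between 2 and max_len) is the
    length of the shortest chain of existing links from source to target.
    Pairs within the same language are skipped because they would not be
    interlingual links.
    """
    outgoing = defaultdict(set)
    for a, b in links:
        outgoing[a].add(b)

    implied = {}
    for start in list(outgoing):
        # Breadth-first search, bounded at max_len links from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_len:
                continue
            for nxt in outgoing.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if d >= 2 and target[0] != start[0]:
                implied[(start, target)] = d
    return implied

# Toy graph: en -> de and de -> fr exist; no reverse links, no en -> fr.
en, de, fr = ("en", "Bonobo"), ("de", "Bonobo"), ("fr", "Bonobo")
links = {(en, de), (de, fr)}

print(missing_reverse_links(links))
print(implied_links_by_path_length(links))
```

For the toy graph, symmetry suggests adding de→en and fr→de, and transitivity suggests en→fr through a chain of two existing links. As the evaluation reported in the study indicates, such implied links are only candidates and are not guaranteed to be faithful translations.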

Their methodology revolves around the creation of a sizeable annotated gold data set. Using these labels, they first evaluated the quality of existing links, finding that between one third and one half fail their criteria for legitimate translations. They then evaluated the quality of various implied links. For example, reverse links that do not already exist satisfy their criteria for faithful translation only 68% of the time.

The gold data set was used to train a boosted decision-tree classifier for selecting good candidate pairs of articles. They used various network topology features to encode the information in interlingual links for a given topic and found that they can significantly beat the baseline, which uses only the presence of direct links (73.97% compared with 69.35% accuracy).

"Wikipedia Academy" preview

Various conference papers and posters from the upcoming "Wikipedia Academy" (hosted by the German Wikimedia chapter from June 29 to July 1 in Berlin) are already available online. A brief overview of those which are presenting new research about Wikipedia:

Posters

Researcher Felipe Ortega blogged[16] about a new parser for Wikipedia dumps, to be integrated into "WikiDAT (Wikipedia Data Analysis Toolkit) ... a new integrated framework to facilitate the analysis of Wikipedia data using Python, MySQL and R. Following the pragmatic paradigm 'avoid reinventing the wheel', WikiDAT integrates some of the most efficient approaches for Wikipedia data analysis found in libre software code up to now", which will be featured in a workshop at the conference.

Special issue of "Digithum" on Wikipedia research

The open-access journal "Digithum" (subtitled "The Humanities in the Digital Era") has published a special issue containing five papers about Wikipedia from various disciplines, with a multilingual emphasis (including research about non-English Wikipedias, and Catalan and Spanish versions of the papers alongside the English versions):

Briefly

The bonobo (here a juvenile) is amongst the species that the Flora and Fauna finder finds for Congo.

References

  1. ^ Yasseri, Taha; Sumi, Robert; Rung, András; Kornai, András; Kertész, János (2012). Szolnoki, Attila (ed.). "Dynamics of Conflicts in Wikipedia". PLOS ONE. 7 (6): e38869. arXiv:1202.3643. Bibcode:2012PLoSO...738869Y. doi:10.1371/journal.pone.0038869. PMC 3380063. PMID 22745683.
  2. ^ Who Deletes Wikipedia?, June 6, 2012.
  3. ^ Dandala, B., Mihalcea, R., & Bunescu, R. (n.d.). Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia. Retrieved from http://ixa2.si.ehu.es/starsem/proc/pdf/STARSEM-SEMEVAL004.pdf (PDF)
  4. ^ Maik Anderka, Benno Stein and Matthias Busse: On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia (PDF)
  5. ^ Iolanda Pensa: The Power of Wikipedia: Legitimacy and Territorial Control (PDF)
  6. ^ Simeona Petkova: Individual and Cultural Memories on Wikipedia and Wikia, Comparative Analysis (PDF)
  7. ^ Alexander Mehler, Christian Stegbauer and Rüdiger Gleim: Latent Barriers in Wiki-based Collaborative Writing (PDF)
  8. ^ Bernardo Esteves and Henrique Cukierman: The climate change controversy through 15 articles of Portuguese Wikipedia (PDF)
  9. ^ Guillermo Garrido, Enrique Alfonseca, Jean-Yves Delort and Anselmo Peñas: "Extracting Wikipedia Historical Attributes Data" (PDF)
  10. ^ Fabian Flöck and Andriy Rodchenko: Whose article is it anyway? – Detecting authorship distribution in Wikipedia articles over time with WIKIGINI (PDF)
  11. ^ Moritz Braun: Here be Trolls: Motives, mechanisms and mythology of othering in the German Wikipedia community (PDF)
  12. ^ Carlos D'Andréa: Self-organization and emergence in peer production: editing “Biographies of living persons” in Portuguese Wikipedia (PDF)
  13. ^ Djordje Stakic: Biographical articles on Serbian Wikipedia and application of the extraction information on them (PDF)
  14. ^ Stephan Ligl: Wikipedia article namespace – user interface now and a rhizomatic alternative (PDF)
  15. ^ Marc Miquel-Ribé, David Morera-Ruíz and Joan Gomà-Ayats: Extensive Survey to Readers and Writers of Catalan Wikipedia: Use, Promotion, Perception and Motivation (PDF)
  16. ^ Ortega, Felipe: "Improving the extraction of Wikipedia data" libresoft.es, 2012-06-03
  17. ^ Marcia W. DiStaso, Marcus Messner: "Wikipedia’s Role in Reputation Management: An Analysis of the Best and Worst Companies in the United States" DIGITHUM, NO 14 (2012)
  18. ^ Antoni Oliver, Salvador Climent: Using Wikipedia to develop language resources: WordNet 3.0 in Catalan and Spanish
  19. ^ David Gómez Fontanills: "Panorama of the wikimediasphere"
  20. ^ Nathaniel Tkacz: "The Truth of Wikipedia"
  21. ^ Emilio José Rodríguez Posada, Ángel González Berdasco, Jorge A. Sierra Canduela, Santiago Navarro Sanz, Tomás Saorín: Wiki Loves Monuments 2011: the experience in Spain and reflections regarding the diffusion of cultural heritage. Digithum, no. 14 (May, 2012), p. 94.
  22. ^ Morton-Owens, E. G. (2012). A tool for extracting and indexing spatio-temporal information from biographical articles in Wikipedia. New York University. (PDF)
  23. ^ "How Big Data Sees Wikipedia". 14 June 2012.
  24. ^ Vrandečić, D. (2012). "Ratio of language links to full text in Wikipedias" simia.net, June 2012
  25. ^ Qin, Xiangju; Cunningham, Pádraig (2012). "Assessing the Quality of Wikipedia Pages Using Edit Longevity and Contributor Centrality". arXiv:1206.2517 [cs.SI].
  26. ^ Hensel, T. (2012, March 11). Impact of duration of the search on the trust judgment of Wikipedia articles. Retrieved from http://essay.utwente.nl/61602/1/Hensel%2C_T.N.C.H._%2D_s0170860_(verslag).pdf

Discuss this story


"Wikipedia Academy" preview

  • Page 7 of "On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia" makes for interesting reading for editors interested in knocking over some smaller clean-ups. Fifelfoo (talk) 03:23, 26 June 2012 (UTC)

Who deletes Wikipedia

  • Per: "In summary, low-activity and new editors, along with anonymous users, tend to delete more than they contribute; this reinforces the notion that Wikipedia is largely the product of a small number of core editors." - So when is the foundation going to stop obsessing about making Wikipedia ultra-friendly to IP editing and to start getting serious about studying who its content-creators actually are and what tools they need to do their jobs better??? Here's one hint: we're older than you think we are. Here's another: we need access to JSTOR. Carrite (talk) 23:52, 26 June 2012 (UTC)
    I think you're presuming the Foundation wants to keep its current content creators. ;-) Killiondude (talk) 06:06, 27 June 2012 (UTC)
    Well duh, Killiondude. If instead you counted on anon IPs and part-time contributors, I think the study shows pretty clearly that we soon wouldn't have much left to look at! MeegsC | Talk 11:27, 27 June 2012 (UTC)
    There's a good reason for the Foundation's advocacy of the myth that anyone can contribute & improve Wikipedia: funding from major foundations. It's easier to get a charitable grant for an encyclopedia "anyone can edit" than one only a select few can. Money does influence content & presentation in ways undreamed of by your philosophy, Horatio. -- llywrch (talk) 16:58, 27 June 2012 (UTC)
  • RE: "A small number of core editors" - Sounds like the recent 1%ers vs. 99%ers issue with the 99%ers trying to take what the 1%ers contribute. -- Uzma Gamal (talk) 12:45, 27 June 2012 (UTC)
  • I'll just chime in here with WP:HUMAN and mention that I'm an IP that;
Just wanted to point that out for people that aren't aware of the contributions of IPs. 64.40.54.121 (talk) 06:28, 28 June 2012 (UTC)
  • Anyone else notice that the contribution percentages might be skewed by reversions of page- and section-blanking? Some of the most "prolific" contributors might just be wikignomes who revert blanking vandalism. Powers T 12:32, 28 June 2012 (UTC)
    • Yes, blanking a page or reverting it are going to look much more dramatic in this methodology than actual content edits (or justified removals), if every character gained or lost is equal. So it doesn't really show you who the content contributors are. The study shows that accounts with only a few edits are net contributors--but that probably just shows that blanking vandals are less likely to register. And even if the content contributors are a select group, the "1%", you still need to find ways of steadily injecting that group with new blood or else it will be lost through attrition (regardless of whether the WMF caters to the preferences of current contributors, there's going to be a natural dropoff rate). The track record of Sanger's Citizendium model is clear: cliquishness does not build an encyclopedia. 169.231.98.141 (talk) 19:12, 28 June 2012 (UTC)
      • Since much of the text in wikipedia is interwiki links, I can guess that persons running interwiki scripts can run up a huge % of the total text.
      • Also, 1% writes 100%?? Sounds like a problem with counting methods. If someone adds a new fact, and I completely rewrite his text, is it counted as text written only by me? --Enric Naval (talk) 09:44, 30 June 2012 (UTC)
  • I've often wondered who the 1% are 20% of. I mean, 20% of the people doing 80% of the work is an old story. There is no bright line saying where the community ends or begins. So it's as if, maybe, the 5% are those we should concentrate on? Where does this lead? Charles Matthews (talk) 20:47, 28 June 2012 (UTC)
The law is generally recursive: of that 20%, 20% of them will edit 80% of the 80% of total articles--in other words, it is expected that 4% of the editors will be responsible for 64% of the content. WP is similar to other human activities. It would take a highly artificial structure to do otherwise. DGG ( talk ) 18:05, 29 June 2012 (UTC)
The thing about counting up how many characters were added by which editors really should have also mentioned an earlier similar study by Aaron Swartz.[1] In both cases I have some doubts about the methodology and in particular in the more recent study, I'd like to know if there was some attempt to separate out additions made by automated scripts rather than human editors. 69.228.171.149 (talk) 07:40, 2 July 2012 (UTC)

Inline templates

  • The story about the citation needed template isn't all that surprising. Quality content contributors (like those working on FAs) often use or encourage the use of profligate cn tagging on a specific article, so they can identify claims that need referencing in what is often an already densely referenced article. They then go about replacing all the tags with references. By contrast, unreferenced is usually a well-intentioned drive-by tag, left for some unidentified individual who has knowledge of the subject and may never respond. --Dweller (talk) 11:51, 28 June 2012 (UTC)

Typo

Fixed. -- Daniel Mietchen - WiR/OS (talk) 11:49, 17 July 2012 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0