The Signpost

Recent research

Predicting admin elections; studying flagged revision debates; classifying editor interactions; and collecting the Wikipedia literature

By Tilman Bayer, Dario Taraborelli, Jodi.a.schneider, Nicolas Jullien and Piotr Konieczny
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

How editors evaluate each other: effects of status and similarity

A team of social computing researchers based at Stanford and Cornell University studied how users evaluate each other in social media.[1] Their paper, presented at the 5th ACM Web Search and Data Mining Conference (WSDM '12), focuses on three main case studies: Wikipedia, StackOverflow and Epinions. User-to-user evaluations, the authors note, are jointly influenced by the properties of the evaluator and the target; as a result, differences in properties between the target and the evaluator should be expected to affect the evaluation. The study looks specifically at how differences in topic expertise and status affect peer evaluations. The Wikipedia case focuses on requests for adminship (RfAs), the most prominent example of peer evaluation in Wikipedia and a topic that has attracted considerable attention in the literature (Signpost research coverage: September 2011, October 2011, January 2012). Similarity is measured based on article co-authorship, and status as a function of an editor's number of contributions. Previous research by the same authors showed that the probability an evaluator will evaluate a target user positively drops dramatically when the status of the two users is very similar, and there is general evidence that homophily and similarity in editing activity have a strong influence on peer evaluation in RfAs. The study identifies two effects that jointly account for this singular finding:

In a direct application of these results, dubbed ballot-blind prediction, the authors show how the outcome of an RfA can be accurately predicted by a model that simply considers the first few participants in a discussion and their attributes, without looking at their actual evaluations of the target.
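The inputs to such a ballot-blind model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Editor` record, the use of Jaccard overlap as the co-authorship similarity, the raw edit-count difference as the status gap, and the cutoff `k` are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class Editor:
    """Hypothetical record for one editor; not a structure taken from the paper."""
    name: str
    edit_count: int  # status proxy: number of contributions
    articles: set[str] = field(default_factory=set)  # articles the editor has edited


def similarity(a: Editor, b: Editor) -> float:
    """Jaccard overlap of edited articles (an assumed stand-in for co-authorship similarity)."""
    union = a.articles | b.articles
    if not union:
        return 0.0
    return len(a.articles & b.articles) / len(union)


def status_gap(evaluator: Editor, target: Editor) -> int:
    """Positive when the evaluator has more contributions than the target."""
    return evaluator.edit_count - target.edit_count


def ballot_blind_features(
    target: Editor, voters: list[Editor], k: int = 5
) -> list[tuple[float, int]]:
    """Describe the first k voters relative to the target, ignoring how they voted."""
    return [(similarity(v, target), status_gap(v, target)) for v in voters[:k]]
```

A predictive model would then be trained on these per-voter pairs; which model to use is left open here, since the summary above does not specify one.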

Sociological analysis of debates about flagged revisions in the English, German and French Wikipedias

At the center of debates on "Coercion or empowerment": Icons signifying accepted (left) and not yet accepted (right) revisions under a flagged revisions scheme

In an article to appear in Ethics and Information Technology, Paul B. de Laat analysed debates occurring in the English, German and French Wikipedias about the evolution of the rules governing new edits.[2] As noted by the analysis of the English Wikipedia's rules, by Butler et al., 2008,[3] these rules are numerous and have increased in number and complexity; they range from the more formal and explicit (intellectual property rights) to the more informal.

De Laat's work is based on a study of the discussions around the proposal to introduce a system of reviewing edits before they appear on screen (flagged revisions, discussed on English Wikipedia at Wikipedia:Flagged revisions). It focuses on the perennial debate around the construction of knowledge commons theorized by Elinor Ostrom:[4] being a collective, open project, it must be accessible to most, but as its production becomes important for its "owners" (readers and producers), boundaries have to be set to protect its integrity. De Laat's article describes and analyzes the tensions and permanent adjustments needed to manage these apparently opposed goals.

In a Weberian analysis of bureaucracy, applicable to Wikipedia policies, he shows that two views can be invoked to explain the intensity of the discussions. He summarizes the debate as a clash between (i) those who saw the flagged revisions as "a useful tool for curbing vandalism", enabling and empowering users and editors, and (ii) those who denounced it as "a superfluous bureaucratic device that violates egalitarian principles of participation", designed to introduce a more controlled and hierarchical environment. He muses that "an intriguing question that remains to be answered, of course, is: What brought the three language communities to ultimately choose or reject such a review system? Why is it that, each in their own ways, the Germans voted for acceptance, the French for rejection, while the English have been wavering all the time between acceptance and rejection"? (p. 11) This question, and Wikipedians' views of flagged revisions, can shine light onto what kind of community Wikipedia should be, according to various factions of editors. As De Laat answers it, "many of those who reject the system of review do so from a vision of Wikipedia as an unbounded community that shares knowledge without mutual control and suspicion, while many of those who embrace the review system do so because they have a vision of Wikipedia as an organization producing reliable knowledge that keeps vandalism outside its borders". De Laat suggests that further research is needed to fully understand the factors affecting the decisions on different Wikipedias taken with regard to flagged revisions, postulating a hypothesis to be tested in further research that "those whose mother tongue is German may possibly be more deferential to hierarchy than those who speak either French or English, and therefore may prefer the order and respectability introduced by a system of reviewing".

In a paper published by the European chapter of the Association for Computational Linguistics,[5] Oliver Ferschke and coauthors describe a study of talk pages on the Simple English Wikipedia. The paper uses speech act theory and dialog acts as a theoretical framework for studying how authors use discussion pages to collaborate on article improvement. The authors have released a freely downloadable corpus of 100 segmented and annotated talk pages, called the Simple English Wikipedia Discussion Corpus and based on a new annotation schema for coordination-related dialog acts. Their schema uses 17 categories, grouped into four top-level categories: article criticism, explicit performative announcements, information content, and the interpersonal. The authors use their corpus to develop a machine-learning-based UIMA pipeline for dialog act classification, which they describe but which is not freely available. They provide a useful discussion of conversational implicature theory and good pointers to seminal and new research in dialog acts. (A longer, editable summary is available on AcaWiki.)

Majority of UK academics prohibit students from using Wikipedia, but use it just as frequently themselves

An article appearing in "Teaching in Higher Education"[6] "discusses the use of Wikipedia by academics and students for learning and teaching activities at Liverpool Hope University, [considering] the findings to be indicative of Wikipedia use at other British universities". Having sent email invitations to all staff and students at the university, the authors received responses from 133 academics and 1222 students. 75% of the student respondents said they used Wikipedia for "some purpose", which according to the authors indicates that Wikipedia use "has risen appreciably in a short period of time" among British university students, citing a 2009 study[7] which had put that number at only 17.1%. "However", they cautioned, usage was "significantly lower than usage in the USA."

Among the surveyed teaching staff, almost the same percentage (74%) used Wikipedia "for some purpose" as their students—but just 24% of them "tell their students to use Wikipedia for Learning and Teaching purposes, with 18% having not mentioned it to students and 58% having expressly told them not to." The independence of academics' answers to these two questions is highlighted by the authors as

"a key finding of the survey: there is little difference between academics that permit their students to use Wikipedia and those who do not in respect to their own use. In particular, amongst both groups, academics that used Wikipedia ‘frequently’ seem to exhibit similar usage profiles. It was indicated in the commentary that the critical difference was that they have the scholarly expertise to determine what material on Wikipedia was ‘correct’ and that which was not."

In the conclusion the authors observe that "a significant proportion of what we would see as enlightened academics at Liverpool Hope and no doubt elsewhere realise that it is pointless to try to hold back the online tide of Wikipedia. Instead, they try to give guidance in the way that students consult it: for clarification, references, comparison and definitions."

A systematic review of the Wikipedia literature

"The people's encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia"[8] is the title of a working paper which promises to be a major milestone in Wikipedia research. It is an attempt to synthesize a broad-based literature review of scholarly research on Wikipedia. The task of creating a comprehensive database of such publications has seen several efforts before and its difficulties were explored in a well-attended workshop at last year's WikiSym conference (see the October issue of this research report).

The authors intend to release their findings in a "Web 2.0" format through their wiki by the end of May 2012. The current paper is impressive in scope, but at 71 pages it badly needs a table of contents and a consistent style: the current version does not seem to adhere to any manual of style, with headings using different font sizes and even colors. It also needs clarifications, perhaps owing to its genesis (see below): the current distinction between findings on p. 12 and discussion on p. 19 seems somewhat arbitrary, and the authors at one point promise a discussion of over 2,000 articles while elsewhere talking of a sample of 139. Keeping in mind that this is just a draft, we hope the final paper will have improved flow and transparency. The presented methodology is useful for those interested in learning how to analyze large, thematic bodies of work using online databases. In one of their major contributions, the authors intend to present an overview of Wikipedia research grouped by themes (keywords), such as research on "vandalism reversion", "thesaurus construction" or "attitude towards Wikipedia". While the current draft is not yet comprehensive, it shows much potential, and in practice their wiki, which already groups the content with categories, may prove more useful as a reference work.

As explained by one of the authors, the paper merges two existing efforts, both of which already published drafts last year. And by choosing as their platform, they embrace the work of a third party, Wikimedian User:Emijrp's "Wikipapers" wiki on the same domain. This follows discussions between the three parties reported in the January issue of this research report ("New effort at comprehensive wiki research literature database"). On the wiki, the authors acknowledge the modest efforts of a fourth party, namely this research report (which just released a dataset of all publications covered until the end of 2011): "We do not include any items published after June 2011, after which the Wikimedia Research Newsletter was formally inaugurated; we're letting them pick up from where we stop."



  1. Anderson, A., Huttenlocher, D., Kleinberg, J., & Leskovec, J. (2012). Effects of user similarity in social media. Proceedings of the fifth ACM international conference on Web search and data mining - WSDM '12 (p. 703). New York, New York, USA: ACM Press. DOI, PDF (open access)
  2. de Laat, P. B. (2012). Coercion or empowerment? Moderation of content in Wikipedia as 'essentially contested' bureaucratic rules. Ethics and Information Technology, 1–13. Springer Netherlands. DOI (open access)
  3. Butler, B., Joyce, E., & Pike, J. (2008). Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia. Proceeding of the twenty-sixth annual CHI conference on Human factors in computing systems – CHI '08 (p. 1101). New York, New York, USA: ACM Press. DOI, PDF (open access)
  4. Hess, C., & Ostrom, E. (2006). A Framework for Analyzing the Knowledge Commons. In C. Hess & E. Ostrom (Eds.), Understanding Knowledge as a Commons: From Theory to Practice (pp. 41–81). MIT Press. (closed access)
  5. Ferschke, O., Gurevych, I., & Chebotar, Y. (2012). Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012). PDF (open access)
  6. Knight, C., & Pryke, S. (2012). Wikipedia and the University, a case study. Teaching in Higher Education, 1–11. Routledge. DOI (closed access)
  7. Hampton-Reeves, S., Mashiter, C., Westaway, J., Lumsden, P., Day, H., Hewertson, H., & Hart, A. (2009). Students' Use of Research Content in Teaching and Learning Behaviour. JISC. PDF (open access)
  8. Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2012). The people's encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia. SSRN eLibrary. SSRN. HTML (open access)
  9. Tzekou, P., Stamou, S., Kirtsis, N., & Zotos, N. (2011). Quality assessment of Wikipedia external links. In J. Cordeiro & J. Filipe (Eds.), WEBIST 2011, Proceedings of the 7th International Conference on Web Information Systems and Technologies (pp. 248–254). PDF (open access)
  10. Hauff, C., & Houben, G.-J. (2012). Serendipitous Browsing: Stumbling through Wikipedia. In D. Elsweiler, M. L. Wilson, & M. Harvey (Eds.), Proceedings of the "Searching 4 Fun!" workshop, collocated with the annual European Conference on Information Retrieval (ECIR2012), Barcelona, Spain, April 1, 2012 (pp. 21–24). PDF (open access)
  11. Knäusl, H. (2012). Searching Wikipedia: Learning the Why, the How, and the Role Played by Emotion. In D. Elsweiler, M. L. Wilson, & M. Harvey (Eds.), Proceedings of the "Searching 4 Fun!" workshop, collocated with the annual European Conference on Information Retrieval (ECIR2012), Barcelona, Spain, April 1, 2012 (pp. 14–15). PDF (open access)
  12. Priem, J., Piwowar, H. A., & Hemminger, B. H. (2012). Altmetrics in the Wild: Using Social Media to Explore Scholarly Impact. arXiv. PDF (open access)
  13. Florin, F., Fung, H., Halfaker, A., Keyes, O., & Taraborelli, D. (2012). Helping readers improve Wikipedia: First results from Article Feedback v5. Wikimedia Foundation blog. HTML (open access)
  14. Hall, M. M., Clough, P. D., Lopez de Lacalle, O., Soroa, A., & Agirre, E. (2012). Enabling the Discovery of Digital Cultural Heritage Objects through Wikipedia. PDF (open access)
  15. Good, B. M., Clarke, E. L., Loguercio, S., & Su, A. I. (2012). Building a biomedical semantic network in Wikipedia with Semantic Wiki Links. Database: The Journal of Biological Databases and Curation, 2012. DOI (open access)
  16. Morsey, M., Lehmann, J., Auer, S., Stadler, C., & Hellmann, S. (2012). DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and information systems, 46(2), 2. Emerald Group Publishing Limited. PDF (closed access)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0