The Signpost

Recent research

Overview of research on Wikipedia's readers; predicting which article you will edit next

By Piotr Konieczny, Maximilian Klein and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership"

This paper[1] is another major literature review in the field of Wikipedia studies, by the authors whose earlier review, "The People’s Encyclopedia Under the Gaze of the Sages",[supp 1] was covered in this research report in 2012 ("A systematic review of the Wikipedia literature").

This time the authors focus on one part of the larger body of work about Wikipedia, analyzing 99 works published up to June 2011 on the theme of "Wikipedia readership", in other words, "What do we know about people who read Wikipedia". The overview focuses less on demographic analysis (since little research has been done in that area) and more on perceptions of Wikipedia among surveyed groups of readers. Their findings include the conclusion that "Studies have found that articles generally related to entertainment and sexuality top the list, covering over 40% of visits", and that, among more serious topics, Wikipedia is a common source for health and legal information. They also find that "a very large number of academic in fact have quite positive, if nuanced, perceptions of Wikipedia’s value", and observe that the most commonly studied group has been students, who offer a convenience sample. The authors finish by identifying a number of contradictory findings and topics in need of further research, and conclude that existing studies have likely overestimated the extent to which Wikipedia's readers are cautious about the site's credibility. Finally, the authors offer valuable thoughts in the "implications for the Wikipedia community" section, such as suggesting "incorporating one or more of the algorithms for computational estimation of the reliability of Wikipedia articles that have been developed to help address credibility concerns", similar to the WikiTrust tool.

The authors also published a similar literature review paper summarizing research about the content of Wikipedia, which we hope to cover in the next issue of this research report.

Chinese-language time-zones favor Asian pop and IT topics on Wikipedia

Map of the Chinese-speaking world

A paper[2] presented at the WWW 2014 Companion Conference analyzes the readership patterns of the English and Chinese Wikipedias, with a focus on which types of articles are most popular in the English- or Chinese-language time zones. The authors used all Wikipedia pages which existed under the same name in both languages in the period from 1 June 2012 to 14 October 2012 for their study, coding them through the OpenCalais semantic analysis service with an estimated 2.6% error rate.

The authors find that readers of the English and Chinese Wikipedias from time zones of high Chinese activity browse different categories of pages. Chinese readers more often visit English Wikipedia pages about Asian culture (in particular, Japanese and Korean pop culture), as well as about mobile communications and networking technologies. The authors also find that pages in English are almost ten times as popular as those in Chinese (though their results do not identify users by nationality directly, relying instead on time-zone analysis).

In this reviewer's opinion, the study suffers from methodological problems serious enough to cast all the findings in doubt. Apparently because the authors were unaware of Interlanguage links, they consider only articles which have the same name (URL) in both the English and Chinese Wikipedias, and so find that only 7603 pages were eligible to be analyzed (as having both an English and a Chinese version). However, the Chinese Wikipedia had approximately half a million articles in the studied period, and while many do not have English equivalents yet, to expect that fewer than 2% did seems rather dubious. Similarly, our own WikiProject China estimates that English Wikipedia has almost 50,000 China-related articles. Given that WikiProject assessments often underestimate the number of relevant topics, and usually do not cover many core topics, this suggests that the study missed a vast majority of the articles that exist in both languages. It is further unclear how English- and Chinese-language time zones were operationalized. The authors do not reveal how, if at all, they controlled for the fact that readers of English Wikipedia can also come from countries where English is not a native language, or for the hundreds of millions of people outside China who live in the five time zones that span China, which overlap with India, half of Russia, Korea and major parts of Southeast Asia. As such, the findings of the study can be more broadly interpreted as "readership patterns of English and Chinese Wikipedia in Asia and the world, regarding a small subset of pages that exist on both English and Chinese Wikipedia."

"Bipartite editing prediction in Wikipedia"

Reviewed by Maximilianklein (talk)

Bipartite Editing Prediction in Wikipedia[3] is a paper in which the authors aim to solve what they call the "link prediction problem". Essentially, they aim to answer "which editors will edit which articles in the future." They claim the social utility of this is to suggest articles for users to edit. In some ways this is a similar function to SuggestBot, but using different techniques.

Their approach is to use bipartite network modelling. A bipartite network is a network with two node types, here editors and articles, in which links connect only nodes of different types (an editor to an article that editor has edited). Bipartite network modelling is becoming increasingly trendy, as in Jesus (2009)[supp 2] and Klein (2014).[supp 3]

Explaining their method, the researchers outline two approaches: "supervised learning" and "community awareness". In the supervised learning approach, the machine learning features used are association rule, k-nearest neighbor, and graph partitions. All these features, they state, can be inferred directly from the bipartite network. In the community awareness approach, the Stanford Network Analysis Project tool is used to cut the network into co-editor sets, and the authors then inspect what they call indirect features: sum of neighbors, Jaccard coefficient, preferential attachment, and Adamic–Adar score.
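To make the indirect features more concrete, here is a minimal sketch of how such scores could be computed for one candidate editor–article pair. The review does not give the paper's exact definitions, so everything here is an assumption: the toy data, the function names, and in particular the choice to compare the editor's own articles with the articles co-edited by the target article's editors. It illustrates the standard link-prediction scores on a bipartite graph, not the authors' implementation.

```python
import math

# Hypothetical toy data: editor -> set of articles edited (not from the paper).
edits = {
    "alice": {"Python", "Graph theory"},
    "bob": {"Graph theory", "Wikipedia"},
    "carol": {"Python", "Graph theory", "Wikipedia"},
}


def invert(edits):
    """Build the article -> set of editors side of the bipartite graph."""
    editors_of = {}
    for editor, articles in edits.items():
        for article in articles:
            editors_of.setdefault(article, set()).add(editor)
    return editors_of


def pair_features(edits, editor, article):
    """Score a candidate (editor, article) link with four classic features.

    Assumption: an article's "neighborhood" is the set of other articles
    touched by its editors, so it can be compared with the editor's articles.
    """
    editors_of = invert(edits)
    own = edits.get(editor, set())               # articles the editor edited
    art_editors = editors_of.get(article, set()) # editors of the target article

    co_edited = set()
    for other in art_editors:
        co_edited |= edits[other]
    co_edited.discard(article)

    common = own & co_edited
    union = own | co_edited

    # Adamic-Adar: shared articles count for more when few editors touch them.
    adamic_adar = sum(
        1.0 / math.log(len(editors_of[z]))
        for z in common
        if len(editors_of[z]) > 1  # avoid dividing by log(1) == 0
    )
    return {
        "sum_of_neighbors": len(own) + len(art_editors),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(own) * len(art_editors),
        "adamic_adar": adamic_adar,
    }


features = pair_features(edits, "alice", "Wikipedia")
print(features)
```

In this toy graph, alice has not edited "Wikipedia", but both of its editors work on the same articles she does, so the pair gets the maximum Jaccard score. A classifier would then be trained on feature vectors like this one to decide which candidate links are likely to appear.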

The authors proceed to give a table of their results, and highlight their best precision and recall statistics, which are moderate and fall in the interval [.6, .8]. Thereafter, a short, non-interpretive one-paragraph discussion concludes the paper, saying that these results might be useful. Unfortunately, they are not of much use: while the authors declare their sample size of 460,000 editor–article pairs from a category in a Wikipedia dump, they do not specify which category, or even which Wikipedia they are working on.

This machine learning paper lacks sufficient context or interpretation to be immediately valuable, despite the fact that the authors may be able to predict with close to 80% F-measure which article you might edit next. The paper is therefore a good example of the extent to which Wikipedia is used for research without even a feigned attempt to make the research useful to the Wikipedia community, or even to frame it in that way.
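For readers unfamiliar with the metric, the F-measure combines precision and recall into one number. The sketch below assumes the balanced F1 variant (the harmonic mean), since the review does not say which variant the paper reports; the function name and the sample values are illustrative only, chosen from the [.6, .8] interval mentioned above.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the balanced F1 (harmonic mean)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only, not results reported in the paper.
best_case = f_measure(0.8, 0.8)
mixed_case = f_measure(0.6, 0.8)
print(round(best_case, 3), round(mixed_case, 3))
```

Because F1 is a harmonic mean, it always lies between the precision and the recall, so with both values in [.6, .8] it cannot exceed 0.8.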


A reading room in the University of Pittsburgh's Hillman Library

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.


  1. ^ Okoli, Chitu and Mehdi, Mohamad and Mesgari, Mostafa and Nielsen, Finn Årup and Lanamäki, Arto (2014): Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership. Journal of the American Society for Information Science and Technology. ISSN 1532-2882 (In Press)
  2. ^ Tinati, Ramine; Paul Gaskell; Thanassis Tiropanis; Olivier Phillipe; Wendy Hall (2014). "Examining Wikipedia across linguistic and temporal borders". Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. WWW Companion '14. International World Wide Web Conferences Steering Committee. pp. 445–450.
  3. ^ Chang, Yang-Jui; Yu-Chuan Tsai; Hung-Yu Kao (May 2014). "Bipartite editing prediction in Wikipedia". Journal of Information Science and Engineering. 30 (3): 587–603.
  4. ^ Galloway, Ed; Cassandra DellaCorte (2014-05-02). "Increasing the discoverability of digital collections using Wikipedia: the Pitt experience". Pennsylvania Libraries: Research & Practice. 2 (1): 84–96. doi:10.5195/palrap.2014.60. ISSN 2324-7878.
  5. ^ Seo-Young Lee, Sang-Ho Lee, "A Comparison Study on the Key Factors for Success of Social Authoring Systems – focusing on Naver KiN and Wikipedia", AISS: Advances in Information Sciences and Service Sciences, Vol. 5, No. 15, pp. 137–144, 2013.
  6. ^ cgcsblog. "Citation-Filtered". Retrieved 31 May 2014.
  7. ^ Olivier Van Laere, Steven Schockaert, Vlad Tanasescu, Bart Dhoedt, Christopher B. Jones: Georeferencing Wikipedia documents using data from social media sources. Preprint, accepted for publication in: ACM Transactions on Information Systems, Volume 32 Issue 3.
  8. ^ Halfaker, Aaron; R. Stuart Geiger; Loren Terveen (2014-04-28). "Snuggle: designing for efficient socialization and ideological critique" (PDF). CHI: Conference on Human Factors in Computing Systems. doi:10.1145/2556288.2557313.
  9. ^ Xu, Danyun; Gong Cheng; Yuzhong Qu (March 2014). "Preferences in Wikipedia abstracts: empirical findings and implications for automatic entity summarization". Information Processing & Management. 50 (2): 284–296. doi:10.1016/j.ipm.2013.12.001. ISSN 0306-4573.
  10. ^ Alguliev, R. M.; R. M. Aliguliyev; I. Ya Alekperova (2014-03-01). "Cluster approach to the efficient use of multimedia resources in information warfare in wikimedia". Automatic Control and Computer Sciences. 48 (2): 97–108. doi:10.3103/S0146411614020023. ISSN 0146-4116.
  11. ^ de Laat, Paul B. (2014-04-22). "From open-source software to Wikipedia: 'backgrounding' trust by collective monitoring and reputation tracking". Ethics and Information Technology: 1–13. doi:10.1007/s10676-014-9342-9. ISSN 1388-1957.
Supplementary references:
  1. ^ Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F., & Lanamäki, A. (2012, October 24). The People’s Encyclopedia Under the Gaze of the Sages: A Systematic Review of Scholarly Research on Wikipedia. SSRN Scholarly Paper, Montreal.
  2. ^ Rut Jesus; Martin Schwartz; Sune Lehmann (2009). "Bipartite networks of Wikipedia's articles and authors: a meso-level approach" (PDF).
  3. ^ Klein. "Measuring Editor Collaborativeness With Economic Modelling".

Discuss this story

  • I would really like to see them predict what I am going to edit in the future (apart from the CSD talk page...). (I might also be interested to see what Google considers suitable advertising to target me with - based on MY search subjects, it could be rather surreal. But I'm going to leave AdBlock Plus turned on and enjoy browsing without all the flashing irrelevant ads for crap that I would not want to buy - I don't buy from ads even if I want something.) I'm damn sure the money spent in this exercise could have been used elsewhere to better effect. Yes, you can predict that someone who only edits football will go on to edit football (unless he gets a new girlfriend who is dominatingly into cross-stitch and Thomas Hardy...). That's hardly rocket science, and is the basis of DUCK blocks on a lot of socks and possibly one or two unfortunate wrong place at wrong time editors. I have doubts about the validity of SuggestBot's suggestions, but haven't signed up or investigated it. But I have even more doubts about this outside thing, and wonder how well they actually know this place. Peridon (talk) 13:19, 1 June 2014 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0