The Signpost

Recent research

The chilling effect of surveillance on Wikipedia readers

By Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Chilling effects: The impact of surveillance awareness on Wikipedia pageviews

A paper in the Berkeley Technology Law Journal[1] finds that the traffic to privacy-sensitive articles on the English Wikipedia dropped significantly around June 2013, when the existence of the US government's PRISM online surveillance program was first revealed based on documents leaked by Edward Snowden. As stated by the author, Jon Penney, the study "is among the first to evidence—using either Wikipedia data or web traffic data more generally—how government surveillance and similar actions may impact online activities, including access to information and knowledge online." It received wide media attention upon its release, as already reported last year in the Signpost.

The paper is part of a growing body of literature that studies the effect of external events on Wikipedia pageviews (for another example, see our previous issue: "How does unemployment affect reading and editing Wikipedia? The impact of the Great Recession"). The 66-page paper stands out for its methodological diligence, devoting much space to explaining and justifying its data selection and statistical approach, and to checking the robustness of the results. The framework was adapted from an earlier MIT study that had similarly examined the effect of the Snowden revelations on Google search traffic for sensitive terms, finding a statistically significant reduction of 5%. The author emphasizes the higher quality of the Wikipedia data: "unlike Google Trends, the Wikimedia Foundation provides a wealth of data on key elements of its site, including article traffic data, which can provide a more accurate picture as to any impact or chilling effects identified."

To generate a list of Wikipedia articles that could be considered privacy-sensitive in the context of US government surveillance, the author used a (publicly available) set of terms that the Department of Homeland Security (DHS) specifies as related to terrorism. The corresponding Wikipedia articles (48 altogether) include dirty bomb, suicide attack, nuclear enrichment (a redirect) and eco-terrorism. To verify the assumption that Internet users do indeed consider these topics privacy-sensitive, a survey asked 415 Mechanical Turk users to rate each topic, e.g. on whether they would be likely to delete their browser history after accessing information about it.

To examine the impact on traffic, the paper uses the time series of monthly pageviews for the 48 articles (81 million views altogether, from January 2012 to August 2014). The series is divided into the periods before and after the June 2013 "exogenous shock". As a first finding, the author notes that the average monthly views in the "after" period are lower, but points out that such comparisons (which, for example, form part of the difference in differences approach in the paper on unemployment mentioned above) are too simplistic to show an actual effect, because the difference could merely be caused by an overall declining traffic trend. (Although not stated directly in the paper, such a decline is indeed present in the data, as the study is based only on desktop pageviews, which have been gradually replaced by mobile views in recent years. The Wikimedia Foundation makes combined mobile/desktop pageview datasets available going back to 2015.)
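Why a plain before/after comparison can mislead is easy to see with made-up numbers. The following minimal sketch uses a purely hypothetical series (the starting level and slope are assumptions, not figures from the paper) that declines steadily and contains no shock at all. The "after" average still comes out lower.

```python
from statistics import mean

def declining_series(months, start=1000.0, slope=-10.0):
    """Hypothetical monthly views: a steady linear decline, no shock."""
    return [start + slope * t for t in range(months)]

# 32 months, standing in for January 2012 (index 0) to August 2014 (index 31).
views = declining_series(32)

# Index 17 corresponds to June 2013 in this numbering.
cut = 17
before, after = views[:cut], views[cut:]

# The naive comparison reports a "drop" although nothing happened at the cut.
naive_drop = mean(before) - mean(after)
print(f"mean before: {mean(before):.1f}, mean after: {mean(after):.1f}")
print(f"apparent drop: {naive_drop:.1f}")
```

Any method that is meant to isolate the effect of the June 2013 event therefore has to account for the trend that was already under way.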

The author then turns to a more sophisticated statistical method known as interrupted time series analysis (ITS). It involves a "segmented regression analysis": linear trend lines are calculated separately for the timespans before and after June 2013, providing information both on the slope (rate of growth or decline) within each period and on the size of the gap (if any) between the two lines at the point where the segments meet. This method indicates "an immediate drop-off of over 30% of overall views" following the June 2013 revelations. To further exclude the possibility that the results for these terrorism-related articles "may simply reflect overall Wikipedia article view traffic trends", an analogous ITS analysis is conducted for the pageviews to all Wikipedia articles.
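The idea of a segmented regression can be illustrated with a short sketch. This is not the author's code, and the paper's actual model specification may differ (for instance in how it treats autocorrelation or the month of the event itself). The series below is synthetic, and the function names `fit_line` and `interrupted_fit` are made up for this illustration. Fitting one least-squares line per segment gives the same point estimates as a single regression with level-change and slope-change terms.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def interrupted_fit(views, cut):
    """Fit separate trend lines before and after index `cut`.

    Returns (slope_before, slope_after, level_change), where level_change
    is the gap between the two fitted lines evaluated at the cut.
    """
    ts = list(range(len(views)))
    a0, b0 = fit_line(ts[:cut], views[:cut])
    a1, b1 = fit_line(ts[cut:], views[cut:])
    level_change = (a1 + b1 * cut) - (a0 + b0 * cut)
    return b0, b1, level_change

# Synthetic monthly views (assumed numbers): a rising trend of 5 per month,
# then a sudden drop of 300 at the cut and a flatter trend of 2 per month.
cut = 17
views = [
    1000 + 5 * t if t < cut else 1000 + 5 * cut - 300 + 2 * (t - cut)
    for t in range(32)
]

slope_before, slope_after, level_change = interrupted_fit(views, cut)
print(slope_before, slope_after, level_change)
```

Unlike a comparison of averages, this separates the immediate change in level at the interruption from any change in the trend afterwards. A real analysis would also report uncertainty for both estimates, which this sketch omits.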

The author points out the importance of the results for the Wikimedia Foundation's current lawsuit that challenges the constitutionality of the NSA surveillance of Internet traffic.

See also our review of a recent qualitative study that examined the privacy concerns of editors: "Privacy, anonymity, and perceived risk in open collaboration: a study of Tor users and Wikipedians"

Briefly

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

The Komodo dragon is the most popular reptile according to Wikipedia pageview data

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

References

  1. ^ Penney, Jon (2016-06-01). "Chilling Effects: Online Surveillance and Wikipedia Use". Berkeley Tech. L.J. doi:10.15779/Z38SS13.
  2. ^ Roll, Uri; Mittermeier, John C.; Diaz, Gonzalo I.; Novosolov, Maria; Feldman, Anat; Itescu, Yuval; Meiri, Shai; Grenyer, Richard. "Using Wikipedia page views to explore the cultural importance of global reptiles". Biological Conservation. doi:10.1016/j.biocon.2016.03.037. ISSN 0006-3207. (closed access)
  3. ^ Massa, Paolo; Zelenkauskaite, Asta (2014-03-19). "Gender Gap in Wikipedia Editing: A Cross Language Comparison" (PDF). Global Wikipedia: International and Cross-Cultural Issues in Online Collaboration. p. 12.
  4. ^ Tramullas, Jesús; Garrido-Picazo, Piedad; Sánchez-Casabón, Ana I. (2016). "Research on Wikipedia Vandalism: a brief literature review". Proceedings of the 4th Spanish Conference on Information Retrieval CERI 2016. Granada, Spain: ACM. doi:10.1145/2934732.2934748. (closed access)
  5. ^ Thomas, Paul (2016-09-15). "Wikipedia and participatory culture: Why fans edit". Transformative Works and Cultures. 22 (0). doi:10.3983/twc.2016.0902. ISSN 1941-2258.
  6. ^ Rodriguez-Hernandez, Ismael; Trillo-Lado, Raquel; Yus, Roberto (2016). WikInfoboxer: A Tool to Create Wikipedia Infoboxes Using DBpedia (PDF). University of Zaragoza, Zaragoza, Spain. p. 4.
  7. ^ Atzori, Maurizio; Gao, Shi; Mazzeo, Giuseppe M.; Zaniolo, Carlo (2016). Answering End-User Questions, Queries and Searches on Wikipedia and its History (PDF). Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. p. 12.
  8. ^ Gieck, Robin; Kinnunen, Hanna-Mari; Li, Yuanyuan; Moghaddam, Mohsen; Pradel, Franziska; Gloor, Peter A.; Paasivaara, Maria; Zylka, Matthäus P. (2016). "Cultural Differences in the Understanding of History on Wikipedia" (PDF). In Matthäus P. Zylka; Hauke Fuehres; Andrea Fronzetti Colladon; Peter A. Gloor (eds.). Designing Networks for Innovation and Improvisation. Springer Proceedings in Complexity. Springer International Publishing. pp. 3–12. ISBN 9783319426969.
  9. ^ Jatowt, Adam; Kawai, Daisuke; Tanaka, Katsumi (2016). "Predicting Importance of Historical Persons Using Wikipedia". Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. CIKM '16. New York, NY, USA: ACM. pp. 1909–1912. doi:10.1145/2983323.2983871. ISBN 9781450340731. (closed access)
  10. ^ Stanisavljevic, Darko; Hasani-Mavriqi, Ilire; Lex, Elisabeth; Strohmaier, Markus; Helic, Denis (2016-11-30). "Semantic Stability in Wikipedia". In Hocine Cherifi; Sabrina Gaito; Walter Quattrociocchi; Alessandra Sala (eds.). Complex Networks & Their Applications V. Studies in Computational Intelligence. Springer International Publishing. pp. 379–390. ISBN 9783319509006. (closed access)
  11. ^ Oliveira, João Marcos de; Gloor, Peter A. (2016). "The Citizen IS the Journalist: Automatically Extracting News from the Swarm". In Matthäus P. Zylka; Hauke Fuehres; Andrea Fronzetti Colladon; Peter A. Gloor (eds.). Designing Networks for Innovation and Improvisation. Springer Proceedings in Complexity. Springer International Publishing. pp. 141–150. ISBN 9783319426969. (closed access)
  12. ^ Suyehira, Kelsey; Spezzano, Francesca (2016). "DePP: A System for Detecting Pages to Protect in Wikipedia". Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. CIKM '16. New York, NY, USA: ACM. pp. 2081–2084. doi:10.1145/2983323.2983914. ISBN 9781450340731. (closed access)
  13. ^ Farzan, Rosta; Savage, Saiph; Saviaga, Claudia Flores (2016). Bring on Board New Enthusiasts! A Case Study of Impact of Wikipedia Art + Feminism Edit-A-Thon Events on Newcomers. SocInfo’16. Vol. Part I, LNCS 10046, pp. 24–40, 2016. Springer International Publishing. p. 17. doi:10.1007/978-3-319-47880-7_2. (closed access; direct PDF download)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0