A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Tl;dr: Users, informed consent and privacy policies online
Reviewed by Kim Osman
In new research prompted by proposed changes to data protection legislation in the European Union (EU), authors Bart Custers, Simone van der Hof, and Bart Schermer carried out a comparative analysis of the privacy policies of social media and user-generated content websites, along with a user survey (N=8,621 in 26 countries) and interviews in 13 different EU countries on awareness, values, and attitudes toward privacy online. The authors state that consent regarding personal data use is an important concept and observe, “There is mounting evidence that data subjects do not fully contemplate the consequences and risks of personal data processing.”
Custers, van der Hof and Schermer developed a set of criteria for giving informed consent about the use of personal data, including: “Is it clear who is processing the data and who is accountable?” and “Is the information provided understandable?” When existing privacy policies were assessed against these criteria, Wikipedia was the worst performing of the sites analyzed. The paper recommends that Wikipedia make clear how minors are dealt with and provide additional clarity around security measures. It also notes that IP addresses may be traced, therefore making “anonymous” Wikipedia users identifiable.
Holocaust articles compared across languages: We tell ourselves that Wikipedia works well for the most part, but that finding consensus might break down on controversial articles. Of all article topics, perhaps none is potentially more fraught than the Holocaust, and that is precisely what Rudolf Den Hartogh has tackled in his Master's thesis "The future of the Past: A case study on the representation of the Holocaust on Wikipedia". It is an in-depth comparative analysis of the Holocaust articles in the English, German, and Dutch Wikipedias. Several curious facts come out of this. For instance, the average vandalism rate on these articles is 4%, compared with 7% globally, because these articles have been locked at some point, although the Dutch version is no longer protected. Other analyses show edit activity over time, since the articles' inception. The German version saw its most intensive shaping 2 years after it was started in 2004, whereas the English and Dutch articles saw their main spurts 5 and 3 years later respectively. Moreover, the author finds "that there does not exist one representation of the Holocaust, but each language version has its own unique account of events and phenomena." Finally, the author "found that none of the Holocaust entries under study is rated ‘good quality’," so we still have not definitively addressed the hardest parts of our encyclopedia.
Lensing Wikipedia aims to extract date, location, event and role semantic data from historical English Wikipedia articles. Of course, making sense of that automatic extraction work requires visualization. Such visualization is difficult for high-dimensional data consisting of, for example, a date, a location, and multiple events and roles, all at the same time. A short proof of concept, "Visualizing Wikipedia using t-SNE" by Jasneet Singh Sabharwal, has done just this using a Barnes-Hut simulation variation of the t-distributed stochastic neighbor embedding algorithm. The resulting image shows the closeness of the semantic roles of features found in Wikipedia article text, with colors indicating similar events that articles are describing.
"Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking" looks at use and abuse of cleanup tags and infobox elements as conceptual and symbolic tools. Based on ethnographic observations and several interviews, the author provides a lengthy description of the formative first three or so weeks in the 2011 Egyptian Revolution article. It is a valuable study of how articles are developed, and the collaboration and conflicts that are common in high-activity articles. The author provides a valuable observation that "Classification work... is intensely political" and "the editing of Wikipedia articles involves continuous linking and classifying." The choice of words, categories, article titles, but also specific tags or infoboxes (a particular example discussed - whether to use Template:Infobox uprising or not - concerns a now deleted template) can be quite controversial. The author also puts forth an interesting argument that removal of cleanup tags may give false impressions of stability in articles that are not yet stable; and that infoboxes carry significant, perhaps undue weight, compared to other elements of the article.
Wikipedia's identity "based on freedom": This paper looks at Wikipedia through a number of organizational theory lenses, in particular theories of organizational identity. Of particular interest to Wikipedians is one of the aspects analyzed by the authors: the identity of the project. The authors state that "the organizational identity at Wikipedia is based on freedom". Next, they discuss the utopian ideals of freedom (such as "anyone can edit"), as contrasted with the freedom-reducing tendencies of censorship, administrative control, and bureaucratization. The authors argue that the common response to criticism of Wikipedia within the community is concealment and marginalization of said criticism. The authors point to the practical defanging of the Wikipedia:Ignore all rules policy, which has gone through a number of meaning shifts in which it was redefined to be virtually toothless, even though the name remained the same. Another way that freedom is limited is through the end-justifies-the-means utopian vision of "free access [to Wikipedia] for everyone", replacing the older "anyone can edit" meaning of freedom of editing. Unfortunately, the authors' discussion of "the subjugation of contesting voices" is very short on details and specifics. The authors allude to administrator power abuse, but fail to provide any specific discussion of how it occurs; an example they use of "deleted content" can be interpreted as nothing more sinister than the admin ability to delete content that does not meet Wikipedia's site policies, including uncontroversial deletions such as spam.
"Copyright or Copyleft? Wikipedia as a Turning Point for Authorship": This paper  touches upon a very interesting yet understudied area: what Wikipedia's existence means for copyright law. As the authors note, Wikipedia "appears to challenge some of the notions at the heart of copyright law."
Critique of Wikipedia's dispute resolution procedures: This paper claims to present an ethnographic analysis and a strong critique of Wikipedia's dispute resolution procedures, and states upfront its goal as "to tease out systemic discrimination or injustice". The strongly worded abstract is attention-drawing, promising that "A number of flaws will be identified including the ability for vocal minorities to dominate the Wikipedia community consensus". Unfortunately, while the paper provides a very detailed description of Wikipedia's dispute resolution scene, it doesn't seem to present any new data; its critique of "vocal minorities", for example, consists of a few sentences, and the entire argument is based on, and essentially a repetition of, a similar passage in Reagle's Good Faith Collaboration book. While the paper is well written and presents a number of valid arguments, it does not seem to contribute anything new to our understanding of Wikipedia, being in essence a literature review focused on the topic of dispute resolution on Wikipedia. This reviewer finds that disappointing, considering that the almost tabloid-style abstract and the introductory section promise ethnographic research, which, like anything else going beyond synthesis of existing, published research, is sadly very much absent from the paper.
Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
"A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection" (constructs a dispute corpus from Wikipedia talk pages)
"Extracting Imperatives from Wikipedia Article for Deletion Discussions" (without conclusions or published dataset, apparently)
"Use of Wikipedia by Legal Scholars: Implications for Information Literacy"
"Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction" (received the EdMedia Outstanding Paper Award)
"Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project" (project home page, allowing the live creation of a taxonomy graph for an arbitrary Wikipedia article: http://wibitaxonomy.org )
"Analysis of the accuracy and readability of herbal supplement information on Wikipedia"
"Maturity Assessment of Wikipedia Medical Articles"
"Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia"
Rughinis, Cosima; Bogdana Huma; Stefania Matei; Razvan Rughinis (June 2014). Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia. 2014 9th Iberian Conference on Information Systems and Technologies (CISTI). pp. 1–6. doi:10.1109/CISTI.2014.6876968.