A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
At the International Joint Conference on Artificial Intelligence (IJCAI) – one of the prime artificial intelligence conferences, if not the pre-eminent one – Banerjee and Mitra from Pennsylvania State University published the paper "WikiWrite: Generating Wikipedia Articles Automatically".[1]
The system described in the paper looks for red links in Wikipedia and classifies them based on their context. To find section titles, it then looks for similar existing articles. With these titles, the system searches the web for information, and eventually uses content summarization and a paraphrasing algorithm. The researchers uploaded 50 of these automatically created articles to Wikipedia, and found that 47 of them survived. Some were heavily edited after upload, others not so much.
While I was enthusiastic about the results, I was surprised by the suboptimal quality of the articles I reviewed – three that were mentioned in the paper. After a brief discussion with the authors, a wider discussion was initiated on the Wiki-research mailing list. This was followed by an entry on the English Wikipedia administrators' noticeboard (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles.
The discussion concerned the ethical implications of the research, and using Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper was an active member of the discussion; he showed a lack of awareness of these issues, and appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.
In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexities, and rarely discuss in depth how their work impacts human lives. Whether it's social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content in Wikipedia without community approval, computer science has generally not reached the level of awareness that some other sciences have of the possible effects of their research on human subjects, at least as far as this reviewer can tell.
Even in Wikipedia, there's no clear-cut, succinct Wikipedia policy I could have pointed the researchers to. The use of sockpuppets was a clear violation of policy, but an incidental component of the research. WP:POINT was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check back with the Wikimedia Research list. A lot of people there have experience with designing research plans with the community in mind, and it can help to avoid uncomfortable situations.
See also our 2015 review of a related paper coauthored by the same authors: "Bot detects theatre play scripts on the web and writes Wikipedia articles about them" and other similarly themed papers they have published since then: "WikiKreator: Automatic Authoring of Wikipedia Content"[2], "WikiKreator: Improving Wikipedia Stubs Automatically"[3], "Filling the Gaps: Improving Wikipedia Stubs"[4]. DV
A paper[5] in the journal Ethics and Information Technology examines the "system of surveillance" that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar Frederick Schauer that focuses on the concepts of rule enforcement and profiling. While providing justification for the system's efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in, for example, law enforcement, de Laat ultimately argues that in its current form, it violates an alleged "social contract" on Wikipedia by not treating anonymous and logged-in edits equally. Although generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing debate about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year[6] by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users, see this week's Technology report) and most recently discussed in the August 2016 WMF research showcase.
To answer the first question, the author turns to Schauer's work on rules, in a brief summary that is worth reading for anyone interested in Wikipedia policies and guidelines – although de Laat instead applies the concept to the "procedural rules" implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing). First, Schauer "resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis". (For example, a restaurant's "No Dogs Allowed" rule will unfairly exclude some well-behaved dogs, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:
The author cautions that these four arguments have to be reinterpreted when applying them to vandalism profiling, because it consists of "procedural rules" (which edits should be selected for inspection) rather than "substantive rules" (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be to not judge at all in many cases, because "we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in". With that qualification, Schauer's second argument provides justification for "Wikipedian profiling [because it] turns out to be amazingly effective", starting with the autonomous bots that auto-revert with an (aspired) 1:1000 false-positive rate.
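To make the notion of a procedural rule concrete, here is a minimal sketch of a profile that only decides which edits get inspected when there are more edits than reviewers can look at. The feature names, weights, and capacity are hypothetical illustrations chosen for this example; they are not the scoring used by ClueBot NG, Huggle, STiki, or any other actual tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    edit_id: int
    is_anonymous: bool   # hypothetical profile dimension
    chars_removed: int   # hypothetical profile dimension


# Hypothetical weights, made up for illustration only.
ANON_WEIGHT = 1.0
REMOVAL_WEIGHT = 0.002


def suspicion_score(edit: Edit) -> float:
    """Combine the profile dimensions into a single score."""
    return ANON_WEIGHT * edit.is_anonymous + REMOVAL_WEIGHT * edit.chars_removed


def select_for_inspection(edits: list[Edit], capacity: int) -> list[Edit]:
    """Procedural rule: pick the `capacity` highest-scoring edits to inspect.

    Nothing is reverted here; whether an inspected edit is vandalism
    is a separate (substantive) judgment.
    """
    ranked = sorted(edits, key=suspicion_score, reverse=True)
    return ranked[:capacity]


edits = [
    Edit(1, is_anonymous=False, chars_removed=0),
    Edit(2, is_anonymous=True, chars_removed=0),
    Edit(3, is_anonymous=False, chars_removed=1000),
]
inspected = select_for_inspection(edits, capacity=2)
```

In this toy setup the unselected edit is never judged at all, which is the situation the paper describes as the alternative to profiling.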
De Laat also interprets "the Schauerian argument of reliability/predictability for those affected by the rule" in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kind of edits will be subject to scrutiny. This also calls into question his subsequent remark that "it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users". The remaining two of Schauer's four arguments are judged as less pertinent. But overall the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer's theoretical framework.
Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, "Profiles, Probabilities, and Stereotypes", that studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), airport security (selecting passengers for screening) and by police officers (for example, selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see Driving While Black). For de Laat's study of Wikipedia profiling, "two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue)." Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profiler. While Schauer considers that it may be justified for "airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance", it would be an overuse if "officials ask all Muslim men and all men of Middle Eastern origin to step out of line to be searched", thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of complication in profiling, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, "the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling" since 1997.
Applying this to the case of Wikipedia's anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG – obviously their algorithm applies the existing profile directly without a human intervention that could introduce this kind of bias. For Huggle and STiki, however, "I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools." This is because both tools do not just use these features in their automatic pre-selection of edits to be reviewed, but also show the human patroller, in the edit review interface, at least whether an edit was anonymous. (The paper examines this in detail for both tools, also observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)
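The overuse concern can be illustrated with a toy calculation. In the sketch below, each queued edit carries a made-up probability of being vandalism, standing in for a tool's profile score; a patroller with limited time either follows that ranking or looks at anonymous edits first. All numbers and function names are hypothetical and are not drawn from Huggle, STiki, or the paper.

```python
# Each tuple: (is_anonymous, hypothetical probability that the edit is vandalism)
queue = [
    (True, 0.30),
    (True, 0.05),
    (True, 0.04),
    (False, 0.25),
    (False, 0.20),
    (False, 0.01),
]


def expected_catches(selected):
    """Expected number of vandal edits among the inspected ones."""
    return sum(probability for _, probability in selected)


def follow_profile(edits, capacity):
    """Inspect the edits that the full profile ranks highest."""
    return sorted(edits, key=lambda e: e[1], reverse=True)[:capacity]


def overuse_anonymity(edits, capacity):
    """Inspect anonymous edits first, ignoring the rest of the profile."""
    return sorted(edits, key=lambda e: e[0], reverse=True)[:capacity]


by_profile = expected_catches(follow_profile(queue, capacity=3))
by_overuse = expected_catches(overuse_anonymity(queue, capacity=3))
```

With these invented numbers and a capacity of three inspections, following the profile gives an expected catch of about 0.75 vandal edits, against about 0.39 when anonymity alone drives the selection. That gap is the kind of efficacy loss the paper attributes to overuse; whether it occurs in practice is, as noted above, not studied empirically.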
Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern "strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy", chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.
In contrast, de Laat argues that "the targeting of contributors who choose to remain anonymous ... is fraught with danger since anons already constitute a controversial group within the Wikipedian community." Still, he acknowledges the "undisputed fact" that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:
“normally [IP editors] do not experience any harm when their edits are selected and inspected as a result of anon-powered profiling; they will not even notice that they were surveilled since no digital traces remain of the patrolling. ... The only imaginable harm is that patrollers become over focussed on anons and indulge in what I called above 'overinspection' of such edits and wrongly classify them as vandalism ... As a consequence, they might never contribute to Wikipedia again. ... Nevertheless, I estimate this harm to be small. At any rate, the harm involved would seem to be small in comparison with the harassment of racial profiling—let alone that an 'expressive harm hypothesis' applies.”
With this said, de Laat still makes the controversial call "that the anonymous-dimension should be banned from all profiling efforts" – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,
“my main argument for the ban is a decidedly moral one. From the very beginning the Wikipedian community has operated on the basis of a 'social contract' that makes no distinction between anons and non-anons – all are citizens of equal stature. ... In sum, the express profiling of anons turns the anonymity dimension from an access condition into a social distinction; the Wikipedian community should refrain from institutionalizing such a line of division. Notice that I argue, in effect, that the Wikipedian community has only two choices: either accept anons as full citizens or not; but there is no morally defensible social contract in between.”
Sadly, while the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged social contract. While it is true that "the ability of almost anyone to edit (most) articles without registration" forms part of Wikipedia's founding principles (a principle that this reviewer strongly agrees with), the "equal stature" part seems to be de Laat's own invention – there is a long list of things that, by longstanding community consensus, require the use of an account (which after all is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles or being prevented from participating in project governance during admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat's paper. TB
A list of other recent publications that could not be covered in time for this issue—contributions are always welcome for reviewing or summarizing newly published research. This month, the list mainly gathers research about the extraction of specific content from Wikipedia.
Discuss this story
No comments on this yet? What a fascinating story. A huge helping of "thank-you" to the authors, with a generous side of gratitude. Well done. 78.26 (spin me / revolutions) 15:37, 7 September 2016 (UTC)