The Signpost

Recent research

AI-generated articles and research ethics; anonymous edits and vandalism-fighting ethics

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


While I was enthusiastic about the results, I was surprised by the suboptimal quality of the articles I reviewed – three of those mentioned in the paper.[1] After a brief discussion with the authors, a wider discussion was initiated on the Wiki-research mailing list. This was followed by an entry on the English Wikipedia administrators' noticeboard (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles.

The discussion concerned the ethical implications of the research, and the use of Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper participated actively; he initially showed a lack of awareness of these issues, but appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.

In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexities, and rarely discuss in depth how their work impacts human lives. Whether it's social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content to Wikipedia without community approval, computer science has generally not reached the level of awareness found in some other sciences about the possible effects of research on human subjects, at least as far as this reviewer can tell.

Even on Wikipedia, there is no clear-cut, succinct policy I could have pointed the researchers to. The use of sockpuppets was a clear violation of policy, but an incidental component of the research. WP:POINT was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check back with the Wikimedia Research mailing list. Many people there have experience with designing research plans with the community in mind, which can help avoid uncomfortable situations.

See also our 2015 review of a related paper coauthored by the same authors: "Bot detects theatre play scripts on the web and writes Wikipedia articles about them" and other similarly themed papers they have published since then: "WikiKreator: Automatic Authoring of Wikipedia Content"[2], "WikiKreator: Improving Wikipedia Stubs Automatically"[3], "Filling the Gaps: Improving Wikipedia Stubs"[4]. DV

Ethics researcher: Vandal fighters should not be allowed to see whether an edit was made anonymously

A paper[5] in the journal Ethics and Information Technology examines the "system of surveillance" that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar Frederick Schauer that focuses on the concepts of rule enforcement and profiling. While providing justification for the system's efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in, for example, law enforcement, de Laat ultimately argues that in its current form, it violates an alleged "social contract" on Wikipedia by not treating anonymous and logged-in edits equally. Although generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing debate about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year[6] by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users, see this week's Technology report) and most recently discussed in the August 2016 WMF research showcase.

The paper first gives an overview of the various anti-vandalism tools and bots in use, recapping an earlier paper[7] where de Laat had already asked whether these are "eroding Wikipedia's moral order" (following an even earlier 2014 paper in which he had argued that new-edit patrolling "raises a number of moral questions that need to be answered urgently"). There, de Laat's concerns included the fact that some stronger tools (rollback, Huggle, and STiki) are available only to trusted users and "cause a loss of the required moral skills in relation to newcomers", and the lack of transparency about how the tools operate (in particular when more sophisticated artificial intelligence/machine learning algorithms such as neural networks are used). The present paper expands on a separate but related concern, about the use of "profiling" to pre-select which recent edits will be subject to closer human review. The author emphasizes that on Wikipedia this usually does not mean person-based offender profiling (building profiles of individuals committing vandalism), citing only one exception in the form of a 2015 academic paper – cf. our review: "Early warning system identifies likely vandals based on their editing behavior". Rather, "the anti-vandalism tools exemplify the broader type of profiling" that focuses on actions. Based on Schauer's work, the author asks the following questions:
  1. "Is this profiling profitable, does it bring the rewards that are usually associated with it?"
  2. "is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?"
But snakes are much more dangerous! According to Schauer, while general rules are always less fair than case-by-case decisions, their existence can be justified by other arguments.

To answer the first question, the author turns to Schauer's work on rules, in a brief summary that is worth reading for anyone interested in Wikipedia policies and guidelines – although de Laat instead applies the concept to the "procedural rules" implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing). First, Schauer "resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis". (For example, a restaurant's "No Dogs Allowed" rule will unfairly exclude some well-behaved dogs, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:

The author cautions that these four arguments have to be reinterpreted when applying them to vandalism profiling, because it consists of "procedural rules" (which edits should be selected for inspection) rather than "substantive rules" (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be to not judge at all in many cases: because "we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in". With that qualification, Schauer's second argument provides justification for "Wikipedian profiling [because it] turns out to be amazingly effective", starting with the autonomous bots that auto-revert with an (aspired) 1:1000 false-positive rate.
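To make the 1:1000 figure concrete: it is essentially a calibration constraint on a bot's revert threshold. The following sketch is purely illustrative – it is not ClueBotNG's actual code, and the function name and data layout are hypothetical – but it shows how a target false-positive rate can be translated into a score threshold using a labelled sample of edits.

```python
# Hypothetical sketch of threshold calibration for an auto-reverting bot.
# This is NOT ClueBotNG's implementation; it only illustrates how a target
# false-positive rate (e.g. 1 in 1000) constrains the revert threshold.

def pick_revert_threshold(scored_edits, target_fp_rate=0.001):
    """scored_edits: iterable of (vandalism_score, is_vandalism) pairs
    from a labelled calibration sample. Returns a threshold such that
    auto-reverting edits scored strictly above it would misfire on at
    most target_fp_rate of the good-faith edits in the sample."""
    good_faith = sorted(
        (score for score, is_vandalism in scored_edits if not is_vandalism),
        reverse=True,
    )
    if not good_faith:
        return 1.0  # nothing to calibrate against; revert nothing
    allowed_mistakes = int(target_fp_rate * len(good_faith))
    # Scores at indices 0..allowed_mistakes-1 are the tolerated mistakes;
    # the threshold sits at the next-highest good-faith score.
    return good_faith[min(allowed_mistakes, len(good_faith) - 1)]
```

In practice the vandalism scores would come from a machine-learned classifier, and the false-positive rate on live edits will drift away from the calibration sample – which is presumably why the rate is described as "aspired" rather than guaranteed.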

De Laat also interprets "the Schauerian argument of reliability/predictability for those affected by the rule" in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kinds of edits will be subject to scrutiny. This also calls into question his subsequent remark that "it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users". The remaining two of Schauer's four arguments are judged as less pertinent. But overall the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer's theoretical framework.

Police traffic stops: A good analogy for anti-vandalism patrol on Wikipedia?

Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, "Profiles, Probabilities, and Stereotypes", which studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), by airport security (selecting passengers for screening), and by police officers (for example, selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see Driving While Black). For de Laat's study of Wikipedia profiling, "two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue)." Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profiler. While Schauer considers that it may be justified for "airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance", it would be an overuse if "officials ask all Muslim men and all men of Middle Eastern origin to step out of line to be searched", thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of complication, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, "the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling" since 1997.

Applying this to the case of Wikipedia's anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG – obviously, their algorithm applies the existing profile directly, without human intervention that could introduce this kind of bias. For Huggle and STiki, however, "I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools." This is because both tools do not just use these features in their automatic pre-selection of edits to be reviewed, but also expose at least one of them – whether the edit was made anonymously – to the human patroller in the edit review interface. (The paper examines this in detail for both tools, observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)
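To make the distinction concrete, here is a deliberately simplified, hypothetical scoring profile of the kind the paper discusses. The Edit fields, features, and weights below are invented for illustration and are not taken from Huggle, STiki, or ClueBotNG; the point is only that the same features that drive the automatic ranking (including anonymity and geolocation) can also be surfaced to the patroller, which is where de Laat sees room for overuse.

```python
# Toy, hand-weighted profile for ranking recent edits for human review.
# Purely illustrative: features and weights are hypothetical, not the
# actual scoring used by Huggle, STiki, or ClueBotNG.

from dataclasses import dataclass

@dataclass
class Edit:
    is_anonymous: bool   # made without logging in (an IP edit)
    country: str         # geolocation of the IP, where applicable
    chars_removed: int   # amount of existing text removed
    has_profanity: bool  # crude content-based signal

def suspicion_score(edit: Edit) -> float:
    """Higher scores are shown to patrollers earlier."""
    score = 0.0
    if edit.is_anonymous:
        score += 0.4                      # the contested "anonymous" dimension
    if edit.country in {"US", "CA", "AU"}:
        score += 0.1                      # the country dimension discussed below
    score += min(edit.chars_removed / 1000, 0.3)
    if edit.has_profanity:
        score += 0.5
    return score

# A review queue sorts edits by suspicion_score in descending order; the
# overuse worry arises when the interface additionally flags "anonymous"
# to the reviewer, who may then weight it a second time in their judgment.
```

As discussed below, de Laat's eventual proposal amounts to dropping the is_anonymous term from such a score entirely, not merely hiding it from the reviewer's interface.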

Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern "strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy", chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.

In contrast, de Laat argues that "the targeting of contributors who choose to remain anonymous ... is fraught with danger since anons already constitute a controversial group within the Wikipedian community." Still, he acknowledges the "undisputed fact" that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:

With this said, de Laat still makes the controversial call "that the anonymous-dimension should be banned from all profiling efforts" – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,

Sadly, while the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged social contract. While it is true that "the ability of almost anyone to edit (most) articles without registration" forms part of Wikipedia's founding principles (a principle that this reviewer strongly agrees with), the "equal stature" part seems to be de Laat's own invention – there is a long list of things that, by longstanding community consensus, require the use of an account (which, after all, is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles, or being barred from participating in project governance such as admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat's paper. TB

Briefly

Conferences and events

Other recent publications

A list of other recent publications that could not be covered in time for this issue—contributions are always welcome for reviewing or summarizing newly published research. This month, the list mainly gathers research about the extraction of specific content from Wikipedia.


References

  1. ^ Banerjee, Siddhartha; Mitra, Prasenjit. "WikiWrite: Generating Wikipedia Articles Automatically".
  2. ^ Banerjee, Siddhartha; Mitra, Prasenjit (October 2015). "WikiKreator: Automatic Authoring of Wikipedia Content". AI Matters. 2 (1): 4–6. doi:10.1145/2813536.2813538. ISSN 2372-3483. Closed access icon
  3. ^ Banerjee, Siddhartha; Mitra, Prasenjit (July 2015). "WikiKreator: Improving Wikipedia Stubs Automatically". Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics. pp. 867–877.
  4. ^ Banerjee, Siddhartha; Mitra, Prasenjit (2015). "Filling the Gaps: Improving Wikipedia Stubs". Proceedings of the 2015 ACM Symposium on Document Engineering. DocEng '15. New York, NY, USA: ACM. pp. 117–120. doi:10.1145/2682571.2797073. ISBN 9781450333078. Closed access icon
  5. ^ Laat, Paul B. de (30 April 2016). "Profiling vandalism in Wikipedia: A Schauerian approach to justification". Ethics and Information Technology: 1–18. doi:10.1007/s10676-016-9399-8. ISSN 1388-1957.
  6. ^ See, as an example, Halfaker, Aaron (6 December 2015). "Disparate impact of damage-detection on anonymous Wikipedia editors". Socio-technologist.
  7. ^ Laat, Paul B. de (2 September 2015). "The use of software tools and autonomous bots against vandalism: eroding Wikipedia's moral order?". Ethics and Information Technology. 17 (3): 175–188. doi:10.1007/s10676-015-9366-9. ISSN 1388-1957.
  8. ^ Tufiş, Dan; Ion, Radu; Dumitrescu, Ştefan; Ştefănescu, Dan (26 May 2014). "Large SMT Data-sets Extracted from Wikipedia" (PDF). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). TUFI 14.103. ISBN 978-2-9517408-8-4.
  9. ^ Aprosio, Alessio Palmero; Tonelli, Sara (17 September 2015). "Recognizing Biographical Sections in Wikipedia". Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal. pp. 811–816.
  10. ^ Norrby, Magnus; Nugues, Pierre (2015). Extraction of lethal events from Wikipedia and a semantic repository (PDF). workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015. Vilnius, Lithuania.
  11. ^ Hu, Linmei; Wang, Xuzhong; Zhang, Mengdi; Li, Juanzi; Li, Xiaoli; Shao, Chao; Tang, Jie; Liu, Yongbin (26 July 2015). "Learning Topic Hierarchies for Wikipedia Categories" (PDF). Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Beijing, China. pp. 346–351.
  12. ^ Nakashole, Ndapa; Mitchell, Tom; Wijaya, Derry (2015). "A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History (PDF). Proceedings of EMNLP 2015. Lisbon, Portugal. pp. 518–523.
  13. ^ Liu, Shan; Iwaihara, Mizuho (2016). "Extracting Representative Phrases from Wikipedia Article Sections". DEIM Forum 2016, C3-6. http://db-event.jpn.org/deim2016/papers/314.pdf
  14. ^ Cannaviccio, Matteo; Barbosa, Denilson; Merialdo, Paolo (2016). "Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector". Proceedings of the 19th International Workshop on Web and Databases. WebDB '16. New York, NY, USA: ACM. doi:10.1145/2932194.2932203. ISBN 9781450343107. Closed access icon
  15. ^ Ekenstierna, Gustaf Harari; Lam, Victor Shu-Ming. Extracting Scientists from Wikipedia. Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland.
  16. ^ Lowe, Daniel M.; O'Boyle, Noel M.; Sayle, Roger A. "LeadMine: Disease identification and concept mapping using Wikipedia" (PDF). Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. BCV 2015. pp. 240–246.
  17. ^ Sun, Shuang; Iwaihara, Mizuho (2016). "Finding Member Articles for Wikipedia Lists". DEIM Forum 2016, C3-3. http://db-event.jpn.org/deim2016/papers/184.pdf
  18. ^ Martín Curto, María del Rosario (15 April 2016). "Estudio sobre el contenido de las Ciencias de la Documentación en la Wikipedia en español" [Study of the Library and Information Science content of the Spanish Wikipedia]. Bachelor's thesis, University of Salamanca, 2014.

Discuss this story

These comments are automatically transcluded from this article's talk page.

No comments on this yet? What a fascinating story. A huge helping of "thank-you" to the authors, with a generous side of gratitude. Well done. 78.26 (spin me / revolutions) 15:37, 7 September 2016 (UTC)[reply]

I've got one, but it's going to upset some people. The first story here is about a kind of Turing test ... an attempt to get a machine to successfully mimic the kinds of things a human would write. This kind of research is proceeding at light speed; no one in the field is betting that machines won't be much, much better at it 10 years from now, across a range of applications. This will inevitably have dramatic consequences for Wikipedia. We can expect to see a variety of bad actors, including convincing, machine-generated sockpuppets that promote their master's articles, and show up at community votes, and generally cause mayhem. We can expect to see good actors, who create tools that efficiently do a variety of tasks that have to be done manually now, including tools that fight the bad actors. Most of the machine-users will probably be neither good nor bad; they might just be curious about how the software will work, as these researchers seem to have been, or they might be using these tools in other spheres of their lives, and never stop to think that we might object to use of those tools on Wikipedia. One thing that concerns me: if we yell at every neutral editor and researcher who uses similar tools and tell them we think they're scum (and that happened in this case, a little bit), we might, over time, convert all the neutral actors into bad actors. - Dank (push to talk) 17:09, 7 September 2016 (UTC)[reply]
Ever since seeing CGP Grey's "Humans Need Not Apply" video and reading Nick Bostrom's Superintelligence: Paths, Dangers, Strategies upon which the video was based, I have become increasingly concerned with changes in this industry. Those of us that enjoy writing an encyclopedia will not survive for long against AI that will generate a free encyclopedia for those who are only consumers. Clearly as a biased humanities student I have little regard to the professionalism of engineers mucking around in Wikipedia as they selfishly seek to solve a perceived problem without a care for either the human editors or the larger enterprise. To that end, I have no qualms about biting or "profiling" so-called neutral actors. Anyone that's not an encyclopedist is a bad actor, anyway. Chris Troutman (talk) 13:37, 8 September 2016 (UTC)[reply]
FWIW, the Quill software mentioned at the 8:54 mark in that video is described at https://www.narrativescience.com/quill. Google has a team headed by Ray Kurzweil that plans to deliver a customizable chatbot by the end of this year. So the threats (and opportunities) already exist to some extent. - Dank (push to talk) 14:14, 8 September 2016 (UTC)[reply]
  • Regarding the commentary on the paper by "de Laat", I'm sure it's a well thought out work, but I fail to see why this form of profiling (or as I think of it: filtering) is "eroding the moral order". Editing Wikipedia is not an innate human right, it's a privilege that can be taken away. To me the anti-vandalism tactics are somewhat equivalent to requiring seat belts to drive on the freeway; the police can profile the unbelted drivers, thereby focusing their efforts on (presumably) higher risk targets. How is that eroding the moral order? de Laat's reasoning seems more appropriate for a court of law. Praemonitus (talk) 19:59, 8 September 2016 (UTC)[reply]
    • I've long been of the opinion that IP vandals should be permanently banned after three strikes, and more generally that we ought to treat anonymous edits differently from edits by named editors. Bearian (talk) 17:57, 12 September 2016 (UTC)[reply]
      • Maybe that would be fair, but is it possible? We only have the technology to block IP addresses, an individual vandal is likely to move on and if we've permanently blocked their former IP address it is no skin of their nose. If anything they have provoked us into permanently disabling editing for future users of that IP address. ϢereSpielChequers 09:31, 25 September 2016 (UTC)[reply]
  • Interesting article. I'd have critiqued the IP profiling article differently. Firstly the question of intrusiveness, I can understand drivers getting annoyed if they are pulled over and breathalysed when sober. Antivandalism patrol is more like the highway patrol that looks at all the traffic and then goes after the car that is weaving all over the road or speeding. Unless you get thanked or your edit accepted you don't normally notice the time someone checks your edit and decides it isn't vandalism. Secondly it conflated the divide into IP v registered. In reality the divide is three way, IP, new account, trusted account. The main difference is in the way we treat the regulars as opposed to newbies and IPs. In effect we are like an airport with special light touch express lines for frequent flyers and staff, or a barman who doesn't repeatedly check the age of the regulars - prove you are a trusted known quantity and we will focus our security time elsewhere. If you make the comparison between IP editors and Newbies then I'm not sure the IPs have a case to gripe. In practice an IP vandal will usually get a 31 hour block for something that a newbie would get an indefinite block for. A better analogy for the IP editors and newbies is with office blocks that operate a keyfob system. I wouldn't be surprised if the researchers work in an environment with such a system. If so I challenge them to persuade their University or other workplace to drop special treatment for regulars, no side doors that only work with a fob - everyone gets to use the main entrance and sign in at reception. Anons, such as people wearing full face motorcycle helmets and having forgotten their keyfob, get the same access as everyone else. Such a system might work OK at a public library or in a village on a sparsely populated island, but not in an organisation with hundreds let alone thousands of people and in a big city. ϢereSpielChequers 09:31, 25 September 2016 (UTC)[reply]
    • It is common for organizations to issue members identification cards or badges. People who have such IDs are treated differently from visitors who do not. They can enter and leave buildings and areas within buildings without checking in at a front desk. People who do not display a badge in a work place may be politely challenged by employees or security guards ("Can I Help you?"). A library card allows one to take out books. A passport permits border crossing. Cards holders are trusted more by the organization that issues the card. None of this behavior is considered profiling or ethically dubious. Having a Wikipedia account is a form of ID. We can easily contact you if your behavior is inappropriate and block you if your bad behavior persists despite repeated warnings (3 at least). By contrast many IP addresses come from schools or cybercafes where the addresses are shared by multiple users, making warnings and blocks more difficult to deliver. A vandal may come from more than one IP address and can easily evade blocks. Finally vandalism as defined by our policy is clear cut stuff like deleting blocks of text or inserting obscenities or gibberish. Its removal is an unquestionable good. If one goes into a poor neighborhood and quietly picks up broken glass from public playgrounds without attracting any attention or making any fuss about it, would that present ethical problems? Even if the reality is that there is just as much or more broken glass in richer neighborhoods, the playgrounds cleaned up are still better for it.--agr (talk) 01:00, 29 September 2016 (UTC)[reply]




The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0