The Signpost

News and notes

The high road and the low road

By SnowFire and Nosebagbear

Scots Wikipedia language quality problems ripple around the Internet, make the news, and trigger Meta-Wiki response

King James I and VI, the actual person to have done the most damage to the Scots language in history (source). James moved his court from Scotland to London in 1603 and later commissioned the King James Version (Authorized Version) of the Bible in English only, not Scots. Both God and the government now spoke English.

The Scots Wikipedia is a quiet, sleepy, low-activity edition of Wikipedia written in the Scots language, the Anglic language traditionally spoken in the lowlands of Scotland. Nobody paid it much mind... until August 2020, when a Reddit thread entitled "I've discovered that almost every single article on the Scots version of Wikipedia is written by the same person – an American teenager who can’t speak Scots" spread across the Internet. This young volunteer, who dedicated a large amount of time over seven years to translating segments of the English Wikipedia into Scots, was unfortunately, it seems, never told that maintaining English sentence structure and translating words 1:1 from a dictionary is no way to translate at all. Further investigation showed the quality problems ran deep: articles untouched by the prolific user in question were also written in poor-quality, ungrammatical Scots, meaning that many more articles on Scots Wikipedia may be essentially worthless. The author of the Reddit post called the incident "cultural vandalism on an unprecedented scale" and wrote that "This is going to sound incredibly hyperbolic and hysterical but I think this person has possibly done more damage to the Scots language than anyone else in history."

The story hit the news media, for both high and low reasons. For the high road, this was a massive and notable failure of Wikipedia, one that has likely poisoned training data sets for the Scots language used by translation algorithms, and led any curious human readers to think that Scots is simply English spoken with an accent and a few funky words thrown in. For the low road, the hobbies and naivety of the prolific user were mocked. Some of the notable coverage includes:

Several of the tabloid-style sources omitted from this list got the story essentially wrong, confusing Scots with the Scottish Gaelic language, suggesting that the user might have just been writing in silly Groundskeeper Willie-ese, or implying that the user's admin status was relevant (a status much misunderstood by the media). The problem was the user's edits: there has been no allegation of misuse of admin tools.

Within the Wikipedia community, several actions were kicked off. User:MJL, the only other active admin on Scots Wikipedia at the time, boldly set up their own "AMA" (short for 'Ask Me Anything') on the Scotland Subreddit to explain the situation as well as solicit interest in potential fixes for Scots Wikipedia. The prolific user apologized for his mistakes after being informed of his lack of proficiency in Scots and has withdrawn from editing for now. Various split discussions eventually coalesced into an RFC on Meta-Wiki: meta:Requests for comment/Large scale language inaccuracies on the Scots Wikipedia. The current short-term course of action with the most support seems to be having a bot perform some sort of mass rollback of affected articles if they meet criteria (which are still being determined), enlisting new admins, and some proposals for other new bots.

The long-term solution requires understanding how this disaster happened in the first place. On Wikipedia user page language templates, the prolific contributor only ever rated his own Scots proficiency at 2/5 or 3/5 (changing over time). If he was really that bad at Scots – more like a 1/5 – how did nobody notice? The answer: there simply wasn't anyone to notice. To the extent there ever was an authentic Scots-speaking Scots Wikipedia community, it had departed by 2012. The contributor's contributions were "Scots-y" enough to keep the non-native speakers paying mild attention to the wiki from realizing the extent of their problems, and the user himself was a young kid when this started, clearly without the best self-awareness. If even one or two native Scots speakers had been active, they could have sounded the alarm long before seven years of wasted, counterproductive effort had passed. The fundamental problem at Scots Wikipedia is the lack of a Scots-speaking community of editors. Perhaps not only bad things have emerged from the incident: the burst of publicity has drawn the attention of Scots language groups. If the end result is to expand the Scots Wikipedia community, then perhaps something good will have come of this. Sn

Interim Trust & Safety Case Review Committee

In early July, the Wikimedia Foundation announced the creation of the Interim Trust & Safety Case Review Committee (CRC), designed to allow appeal of certain less clear-cut cases decided by the WMF (both on-wiki and event bans), including appealing against a decision by T&S not to act on a complaint. A charter, a public call for applicants, and a Q&A with WMF Vice President of Community Resilience & Sustainability Maggie Dennis were also created. The CRC charter sets out the scope, objectives, and minimum candidate requirements.

The CRC is specifically temporary, designed to terminate with the creation of a permanent process as part of the Universal Code of Conduct. If those discussions have not concluded by July 1, 2021, then either a new candidate call can be made for a new term, or a single extension of up to six months can be granted if there is a clear indication the process will wrap up by then (such as if an implementation date has been agreed).

Process: Maggie Dennis responded to a question: "Let's say user FooBar is blocked as a T&S office action and requests case review [...] What does the appeal process look like, both from FooBar's perspective and the review committee's perspective?"

Subject to changes to the process by the CRC, a rough outline was offered as follows:

  1. User emails inbox asking for a review
  2. WMF attorney confirms case is not within remit of "statutory, regulatory, employment, or legal policies", and so is subject to review
  3. User is notified it is under review and given likely timeline
  4. CRC Chair appoints 5 members who review the case for "appropriate handling; appropriate collection of evidence; appropriate outcomes"
  5. Members vote on whether to support, overturn (partially or fully), or return to the WMF for additional investigation
  6. WMF enacts that decision
  7. All involved users will be notified of decision

Overturning could occur on two main grounds: the sanction was inappropriately reached (the evidence didn't warrant the sanction) or the case did not fall within the T&S remit. The latter would indicate that a complaint could then be resubmitted at local community level (Arbitration Committee, Administrators' Noticeboard/Incidents (ANI) or equivalents). The publicly available documentation doesn't make it clear whether a case could be simultaneously overturned on both grounds and whether that would still allow for a "double jeopardy" situation. Individuals may only make a single appeal per prohibition.

Candidates: the WMF imposes a number of eligibility requirements, including holding a current or prior advanced permissions role or being an experienced contributor as part of a Wikimedia affiliate. Candidates also need to be members in full good standing with no current sanctions and be fluent in English. Several roles were viewed as exclusive, including current/former WMF staff. The en-wiki Community has decided to disallow currently serving arbitrators from acting as CRC members, which Maggie Dennis said would be accepted. Gender and linguistic diversity were also sought, the latter most likely also driving project diversity.

CRC members are intended to be able to spend up to five hours a week on the role, though there were repeated statements that it was anticipated to be less.

One particular requirement was part of a major theme: anonymity. As well as keeping all case information to themselves under a currently non-published reinforced non-disclosure agreement (NDA) – above and beyond the standard non-public information agreement – candidates made anonymous applications and are to keep both others' and their own membership secret. A number of changes were made after applications closed due to "negotiation between committee finalists and Deputy GC", including further limiting CRC membership knowledge to only three Board members but giving retired CRC members the right to self-disclose after 6 months.

The initial filter of applications was made by non-applying Stewards, with members chosen from that group by the WMF General Counsel Amanda Keton. The WMF is also hiring a contractor to support the committee.

Reporting: the CRC is to provide quarterly generalised reports (number of cases ratified, number of cases overturned). It's not clear whether additional information will also be provided, such as the number of cases T&S prohibits from going to appeal. Nbb

Brief notes


Discuss this story

About Scots Wikipedia

  • I think the main lesson to learn from this incident is that it can be very risky to write content for Wikipedia in a language that is not your first language. Although I am reasonably fluent in German, I never contribute any significant content (other than links to images, and brief captions for the images) to German Wikipedia. On the other hand, I have translated a large amount of content from other languages into my first language, English. The translated content even includes material originally published in languages I can't speak, but can translate with the assistance of Google translate. There is obvious risk in publishing content that is partially machine translated, and I think that that risk is acceptable only if the destination language is the translator's first language. Bahnfrend (talk) 05:59, 31 August 2020 (UTC)

The person has resumed editing scowiki

51 edits so far today, including talk pages as if nothing is out of the ordinary. So he's gotten over it and everything is hunky-dory now? Do people still recommend that we not attempt to talk to him? EllenCT (talk) 20:16, 31 August 2020 (UTC)

Why would we? As you can read above, the editor base seems to think this was just a childhood mistake and nothing more to say about it, which is just the sort of thing game players would conclude. Chris Troutman (talk) 20:22, 31 August 2020 (UTC)
Now that he's under the scrutiny of native speakers (it looks like the editathon has resulted in almost 3,000 good edits so far) I'm sure this will all blow over in a gust of wikilove, but there still appear to be at least 17,000 articles which are not in Scots, the vast majority of them due to him. EllenCT (talk) 21:02, 31 August 2020 (UTC)
An important piece of context is that those 51 edits were largely moving pages to new titles which had been requested as a result of the editathon involving Scots speakers. That person was helping to clear things up with the community, rather than resuming as before. Richard Nevell (talk) 09:11, 1 September 2020 (UTC)
@Richard Nevell, EllenCT, and Chris troutman: That kind of thing will be the extent of his involvement in Scots Wikipedia as things currently stand. His articles are currently on track for deletion. I assure you; AG is taking this seriously. There are a contingent of Scots Speakers who reached out to him to try and get him to resume editing. No surprise, he followed their request. –MJLTalk 06:23, 2 September 2020 (UTC)

@EllenCT:,@Richard Nevell: Dear all, I changed the title of this section and the comment above to remove the username, presumably the person involved in the incident. The username of the person involved is not mentioned in the article. In addition, according to the reddit discussion that person has received harassment due to the incident. SYSS Mouse (talk) 18:24, 1 September 2020 (UTC)

@SYSS Mouse: Sensible move, thank you. Richard Nevell (talk) 19:09, 1 September 2020 (UTC)
I think the problem here is that one person systematically ruined a whole language encyclopedia (not that there was ever much of it to begin with). One person making poorly translated edits on en.wiki is not going to cause problems of such scale. -Indy beetle (talk) 18:53, 2 September 2020 (UTC)

About Interim Trust & Safety Case Review Committee

"One particular requirement was part of a major theme: anonymity. As well as keeping all case information to themselves under a currently non-published reinforced non-disclosure agreement (NDA) – above and beyond the standard non-public information agreement – candidates made anonymous applications and are to keep both others' and their own membership secret. A number of changes were made after applications closed due to 'negotiation between committee finalists and Deputy GC', including further limiting CRC membership knowledge to only three Board members but giving retired CRC members the right to self-disclose after 6 months." Erm...what? No. This is not how we operate. We know who makes these decisions. Anonymity on the part of the decision-makers is utterly unacceptable. Even if all the details of why the decision was made can't be made public, the identity of the people who made the decision absolutely must be. Seraphimblade Talk to me 20:36, 30 August 2020 (UTC)

What are you talking about? Nearly everybody is here under a pseudonym, including yourself. We don't publish the real names of admins or ArbCom members, or even ordinary editors. Hawkeye7 (discuss) 22:21, 30 August 2020 (UTC)
I don't think the conversation is about real names, but total anonymity. We, as a community, know who our elected admins and arbs are even if we know them by pseudonym. Not knowing at all who is on the committee could mean that there's nobody on the committee and we're assured we should trust the decisions handed down. Chris Troutman (talk) 22:36, 30 August 2020 (UTC)
One concern with this secrecy is recusal, or potentially the lack of it. With Arbcom if an Arb forgets past encounters with one of the parties to a dispute it is possible to remind them. But with an anonymous review committee any recusal operates almost entirely on the honour system. Even with T&S there is the potential for a party to a case to check with T&S that a particular staff member they have bad blood with is recused re their case. It is possible that a fellow committee member could know of a conflict of interest and remind someone of it. But if the committee is a diverse group of Wikimedians, or even a random group of them, there is a real risk that they don't know each other's pasts well enough to know of occasions where each other should recuse. ϢereSpielChequers 23:40, 30 August 2020 (UTC)
Exactly as Chris Troutman said. I have no problem with pseudonymity. I do have a problem with anonymity. When I was on the ArbCom, when I cast a vote, I cast it under my username. Now, obviously, "Seraphimblade" is not my real, legal name—but it is my pseudonym on the project and what people know me by, and I signed that name to any such decision. Similarly, if I block someone, delete a page, whatever have you, my username appears in the log as having done so and I can therefore be held accountable for it. It is not done anonymously. Seraphimblade Talk to me 04:44, 31 August 2020 (UTC)
The Trust & Safety Case Review Committee are not accountable to us; they are accountable to WMF, who know their real, legal names. But if you want to initiate an RfC to abolish anonymity, I would definitely support that. Hawkeye7 (discuss) 05:31, 31 August 2020 (UTC)
If T&S is going to be taking actions regarding communities, but not be accountable to those same communities, that's a problem on a whole different scale. It's not WMF's place to be doing what the community of editors doesn't want done. Seraphimblade Talk to me 20:41, 31 August 2020 (UTC)





       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0