The Signpost
Single-page Edition
WP:POST/1
27 September 2020

Special report
Paid editing with political connections
News and notes
More large-scale errors at a "small" wiki
In the media
WIPO, Seigenthaler incident 15 years later
Featured content
Life finds a Way
Arbitration report
Clarifications and requests
Traffic report
Is there no justice?
Recent research
Wikipedia's flood biases
 

2020-09-27
By Smallbones

Many Wikipedians have seen the effects that paid advocates can have on an article. These paid editors tend to make obviously biased edits, to be very persistent, and to be easily identifiable. But some of them are much more sophisticated. This investigation focuses on one firm that, except for two seemingly minor editing mistakes, would likely not have been identified. The firm had twelve sock puppets, which are now blocked, and worked for a charity now involved in an ethics investigation of Canadian Prime Minister Justin Trudeau. The paid editing firm has also worked on articles about Russian and Ukrainian oligarchs, and subcontracted social media work from Bell Pottinger during that firm’s disastrous PR campaign in South Africa.

Percepto

Percepto, formerly known as Veribo, is an Israeli firm that advertises its online reputation management (ORM) services and its "Wikipedia consulting". Its ORM clients have included binary options companies.

Until earlier this year, it published a page on its website, "Wikipedia - The rules of the game", about how difficult Wikipedia’s rules make life for paid editors. The page ends: "Wikipedia editing is a beautiful challenge. […] To consult on any and all Wikipedia-related queries for you or your clients, leave us your contact info and we will get back to you promptly." But let’s be clear: Percepto doesn’t follow Wikipedia’s rules. For example, none of the Percepto-linked paid editors who were blocked ever declared their employer or clients.

Their website promises "strict confidentiality" for their clients and also states, "Effective online reputation management is not confined by geographic borders or conceptual barriers. In fact, the ability to overcome these traditional boundaries is essential. We know how to bend the lines without breaking them and where to shine the spotlight in order to promote your agenda effectively."

Wikipedia editors believed to be employed by the firm have been indefinitely blocked for editing articles on the WE charity, which is currently under investigation for a scandal involving Canadian Prime Minister Justin Trudeau. Editors blocked in the same sockpuppet investigation edited articles on Russian and Ukrainian businessmen Viktor Vekselberg, Boris Lozhkin, and Gennady Gazin. The Signpost reminds our readers that no purely on-Wiki investigation can completely prove the real world identity of the accounts under investigation. For instance, an editor may try to impersonate another individual or company in order to embarrass them, a tactic known as a "Joe job". The evidence in this case seems particularly strong, however. It is based not just on the thorough sockpuppet investigation and a long discussion at the Conflict of interest noticeboard, but also on a pair of remarkable editing mistakes by two accounts associated with Percepto. In this edit to the article on the Rinat Akhmetov Humanitarian Center, the now-blocked sock puppet Isaack.build inserted an apparent link to a Dropbox file:

File:///C:/Users/b oph/Dropbox (Veribo)/Delivery/Active Clients restored/Ruslan Baisarov/Wikipedia/8918 Baisarov Akhmetov Humanitarian Ophir 100.docx#%20ftnref1

This file path or link looks like a Dropbox file for the company Veribo, possibly related to a customer named Ruslan Baisarov about the Wikipedia article on the Rinat Akhmetov Humanitarian Center.

Not to be outdone, the now-blocked sockpuppet MarthaLetter made this edit to the article on ME to WE inserting a link to a possible Dropbox file:

File:///C:/Users/User/Dropbox (Veribo)/Delivery/Active Clients restored/We/Wikipedia/May 2019 project/ME to WE wiki page - phase 3.docx#%20ftn5

This file path or link looks like a Dropbox file for the company Veribo, possibly related to a customer named WE about the Wikipedia article on the for-profit firm ME to WE.
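To illustrate how much such a link gives away, here is a minimal sketch that splits a local file link into its folder names. It assumes that a Dropbox team folder appears as "Dropbox (Name)", as it does in the two links above. The function name parse_leaked_link and the returned field names are hypothetical, chosen only for this illustration, and the sketch is not the method used in the investigation.

```python
import re
from urllib.parse import unquote

# Assumption: a Dropbox team folder is named "Dropbox (TeamName)",
# as in the two links quoted in this article.
DROPBOX_TEAM = re.compile(r"^Dropbox \((?P<team>[^)]+)\)$")


def parse_leaked_link(link):
    """Split a local file:/// link into the folder names it exposes."""
    # Drop the scheme (case-insensitively) and separate the #fragment.
    without_scheme = re.sub(r"^file:///", "", link, flags=re.IGNORECASE)
    path, _, fragment = without_scheme.partition("#")
    parts = [unquote(p) for p in path.split("/") if p]

    team = None
    after_team = []
    for index, part in enumerate(parts):
        match = DROPBOX_TEAM.match(part)
        if match:
            team = match.group("team")
            after_team = parts[index + 1:]
            break

    return {
        "team": team,                  # name inside "Dropbox (...)"
        "folders": after_team[:-1],    # folders below the team folder
        "document": after_team[-1] if after_team else None,
        "fragment": unquote(fragment),
    }


link = ("File:///C:/Users/User/Dropbox (Veribo)/Delivery/"
        "Active Clients restored/We/Wikipedia/May 2019 project/"
        "ME to WE wiki page - phase 3.docx#%20ftn5")
result = parse_leaked_link(link)
print(result["team"])      # the Dropbox team name
print(result["folders"])   # the folder hierarchy below it
print(result["document"])  # the document's file name
```

The folder names are only suggestive. As noted above, they cannot by themselves prove who operated an account.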

While we cannot be 100% certain the blocked accounts were employed by Percepto, we can report on why the accounts were suspected as sockpuppets, and examine their edits.

Canadian charity scandal

WE was founded in 1995 as "Free The Children" by the then-12-year-old Craig Kielburger and his older brother Marc as an organization opposed to child labor, especially in the global south. Over the years it expanded from Canada to the UK and US, expanded its mission to include education and economic opportunity for young people, and shortened its name to WE. It also founded a for-profit arm ME to WE. Wikipedia articles or draft articles were written about it including biographies of the co-founders and the CEO of ME to WE. Other articles include ME to WE, WE Day, and recently WE Charity scandal.

The ongoing scandal involving the WE charity and Canadian Prime Minister Justin Trudeau first made the news in early July when it was revealed that Trudeau family members, including his mother Margaret Trudeau, had taken almost C$300,000 (about US$228,000 at exchange rates this week) in speaking fees from WE over about four years. The prime minister had earlier participated in the decision to award a government contract to WE where C$900 million would then be given to youth "volunteers" who had been affected economically by the COVID-19 pandemic. WE’s management fees would have been C$43.5 million.

Finance Minister Bill Morneau, who also had a conflict of interest in the matter, later resigned. Justin Trudeau apologized for not recusing himself from the contract decision. He then temporarily suspended Parliament, which had the effect of suspending two parliamentary investigations into the matter, until Parliament reopened this week. In the meantime, WE announced that it would close its Canadian operations over the next year.

Because of the block of the Percepto editors, their editing ended three months before the scandal broke. Their general editing strategy appears to be quite sophisticated and aimed at long-term influence over article content rather than short-term results. They tended to edit fairly conservatively, with major changes spread out over time, sometimes involving multiple user accounts. Many of the edits could be viewed as "housekeeping" style edits - correcting grammar, adding references, categories, or minor facts, or removing redundant material. They did try to add new articles about WE and its executives, such as this unsuccessful draft and this successful article. Some edits might be considered promotional, but not garishly so. A list of Marc Kielburger’s awards, for example, was added, a few of which might have been properly included if they’d been added by an unbiased editor. The Signpost’s investigation found few or no obvious removals of negative content, though the number of very long edits and a few borderline removals of material make this impossible to rule out. These editors did clearly leave out one type of content - our investigation could not find any additions of content that reflected poorly on WE or related persons.

These editors followed a rather sophisticated and time-consuming approach to apparently create a backstory for each editor or hide their identities. They made many edits unrelated to WE but related to Toronto that would normally suggest that they were Torontonians. But surprisingly, many of the blocked sock puppets edited according to a normal 8am-5pm Sunday-Thursday Israeli work week.
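The work-week pattern described above can be checked with a short script. This is a minimal sketch under stated assumptions: the timestamps are hypothetical examples rather than the accounts' real edit times, and Israeli local time is simplified to a fixed UTC+3 offset (Israel Daylight Time), which ignores the winter switch to UTC+2. The function names are likewise invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+3 (Israel Daylight Time); winter time is UTC+2.
ISRAEL_SUMMER = timezone(timedelta(hours=3))

# datetime.weekday(): Monday=0 ... Sunday=6, so Sunday-Thursday is:
WORK_DAYS = {6, 0, 1, 2, 3}


def in_israeli_work_week(utc_timestamp):
    """True if the edit falls on Sunday-Thursday, 8am-5pm local time."""
    local = utc_timestamp.astimezone(ISRAEL_SUMMER)
    return local.weekday() in WORK_DAYS and 8 <= local.hour < 17


def work_week_share(timestamps):
    """Fraction of edits made inside the Israeli work week."""
    if not timestamps:
        return 0.0
    hits = sum(in_israeli_work_week(t) for t in timestamps)
    return hits / len(timestamps)


# Hypothetical edit times in UTC, for illustration only.
sample_edits = [
    datetime(2019, 6, 16, 7, 30, tzinfo=timezone.utc),  # Sunday, 10:30 local
    datetime(2019, 6, 3, 12, 0, tzinfo=timezone.utc),   # Monday, 15:00 local
    datetime(2019, 6, 7, 9, 0, tzinfo=timezone.utc),    # Friday, 12:00 local
    datetime(2019, 6, 5, 20, 0, tzinfo=timezone.utc),   # Wednesday, 23:00 local
]
print(work_week_share(sample_edits))
```

A high share for an account that otherwise presents itself as Toronto-based would be one piece of behavioral evidence, not proof on its own.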

Working with Bell Pottinger

Percepto, then named Veribo, worked as a subcontractor for the infamous political lobbying firm Bell Pottinger. They worked with social media in Bell Pottinger's horrific 2017 South African campaign of social and racial division meant to support the Gupta family and the then-president of South Africa Jacob Zuma. The New Yorker later reported that Veribo stated “We now regret our involvement” with Bell Pottinger’s South Africa campaign. A more detailed interview in South Africa’s Daily Maverick quotes Veribo’s CEO expressing the same view and deflecting all responsibility for the campaign onto Bell Pottinger.

Bell Pottinger has had a long history with Wikipedia. In a videotaped undercover interview by the Bureau of Investigative Journalism, Bell Pottinger revealed that it "sorts" Wikipedia coverage of its clients with a team of employees, and also employs other "dark arts" for its clients.

When its South African campaign was exposed in 2017, Bell Pottinger immediately began losing clients and employees and the firm quickly collapsed.

How the paid editing was discovered

Wikipedia administrator OhanaUnited is a resident of Ontario and knows about WE and the recent scandal. He told The Signpost that when he first saw the edit with the Dropbox link he thought "oh, someone's finally going to expand this article on this worthy initiative." But the editor must have been new and unfamiliar with Wikipedia’s referencing practices. "How many people prepare it (their edit) as .docx and upload to Dropbox? … I was pretty sure it's undisclosed paid editing since the Dropbox file path mentioned 'Clients' but my editing areas don't typically come across paid editing and I needed second pair of eyes to look at it" so he asked about the edit at the conflict of interest noticeboard.

Wikipedia administrator Newslinger guided The Signpost through this sock puppet investigation, to the extent permitted by Wikipedia rules.

Many undisclosed paid editing cases involve the abuse of multiple accounts, also known as sockpuppets. These editors may have particular behaviors that can be observed among more than one of their accounts. This behavioral evidence can be submitted as a sockpuppet investigation (SPI) to trace the extent of the possible abuse.

In an SPI, the editing histories of the editors under examination are probed to determine whether at least two of these accounts are operated by the same individual, and to look for potential connections to other accounts. All SPIs include a review of the available behavioral evidence, and some SPIs also involve administrators known as checkusers, who can inspect site logs not revealed publicly on Wikipedia, called technical evidence. A checkuser can only examine technical evidence if there is adequate behavioral evidence.

Few investigations have "smoking gun" evidence as strong as the Dropbox links in the Percepto case. The first link was discovered by chance by an editor not usually involved in paid editing investigations: a Microsoft Windows file path that was accidentally included by editor MarthaLetter in a 16 June 2019 edit. The link identified the article ME to WE as another one of Veribo's "Active Clients restored" that had a "wiki page" in "phase 3" of some operation. The edit was tagged as a visual edit, which means that MarthaLetter most likely drafted the content in a word processor before copying and pasting it into the Wikipedia article.

The Wikipedia search feature was used to find all traces of Veribo in Wikipedia articles. Another Dropbox link from Veribo was found: on 3 June 2019, Issack.build's link revealed Ruslan Baisarov as one of Veribo's "Active Clients restored". The edit was also tagged as a visual edit.

Combined, these two links showed that Veribo was associated with at least two accounts that engaged in undisclosed paid editing and a search of technical evidence began. Five other accounts showed behavioral evidence of being connected to the first two accounts and technical evidence resulted in the blocking of 12 sock puppet accounts.

The strength of the behavioral and technical evidence in this investigation made it highly unlikely that these accounts were framed in a "Joe job", a tactic usually associated with trolls rather than paid editors.

Where we stand now

The edits made by accounts blocked for their connection to Percepto/Veribo provide a window into the world of sophisticated paid advocacy on Wikipedia by a firm which is willing to take on politically connected assignments.

These editors created some articles and influenced the content of other articles related to the WE charity which is now being investigated in a scandal involving Canadian Prime Minister Justin Trudeau. While we haven’t presented any evidence that Trudeau knew about Percepto, the charity WE should have been able to find enough information about the type of online reputation managers that they employed just by reading Percepto’s website.

Percepto or the editors connected to them were involved in editing in three areas of concern to the public. Where does this paid advocacy editing leave Wikipedia’s reputation for accuracy?

Some Wikipedia editors might think that the Percepto editors were only caught because they made two silly mistakes and were incompetent. Other Wikipedians know, much to their chagrin, that it is surprisingly easy to unknowingly reveal identifying information while editing. Percepto editors may be back - there is apparently a lot of money to be made in this business. While it may take some time to discover any new edits they make, sooner or later they would likely be discovered again. Clients of ORM firms should be aware of this.

Of course there are many companies, frauds, grifters, and kleptocrats that have things to hide and would like to use similar ORM firms to edit Wikipedia. Some of them may be willing to spend much more money on it than the WE charity could. Completely stopping paid advocacy on Wikipedia may be an impossible task, but it is a task that Wikipedians continue to diligently work on.

The Signpost will continue to cover online reputation management firms. If you have any tips that can be documented on how these firms operate on Wikipedia, please contact us here.

The WE charity and Percepto were contacted to request comments on this story. WE responded that they would get back within 48 hours. Percepto did not respond before deadline. The Signpost will update this story with any replies from them.

Update: Following the publication of "Wikipedia probe exposes an Israeli stealth PR firm that worked for scammers" in The Times of Israel, WE Charity responded to The Signpost: "WE charity appreciates the seriousness of the paid editing problem on Wikipedia. Though we have hired other reputation managers, we did not hire the firm Percepto, as suggested by your story."





2020-09-27

More large-scale errors at a "small" wiki

By Bri, Eddie891, and Smallbones

Large-scale errors at Malagasy Wiktionary

Growth of Malagasy Wiktionary, 99.23% due to bot edits

A small wiki audit of the Malagasy Wiktionary found that the wiktionary, which has the second largest number of entries (over 6,103,961), has had a large number of its pages automatically translated. Bot-Jagwar is a bot account run by Jagwar, the sole admin who has made edits. On the project, his bot has made more than 22 million edits (and counting). Jagwar also has a secondary bot account, Bot-Jagwar II, which has made a further 6,976 edits. Another major bot contributing to mg.wikt, making the exact same type of edit, is Ikotobaity, which was run by Lohataona and made 2,456,748 edits; it has been inactive since 20 October 2017. These three bots have created 6,076,769 new mainspace pages, which is 99.23% of all mainspace pages on mg.wikt. (Jagwar also ran bot edits on his main account, so the true number of bot-created entries is likely 50,000 higher.)

In this blog post, Jagwar detailed the history of his bot and mg.wikt. The bot began editing in 2010, at a rate of 50,000 edits per day, initially simply importing foreign words from other wiktionaries. After the wiki reached 200,000 pages in 2011, he wrote a script that "upload[ed] the word forms of that language", and propelled Malagasy Wiktionary to be the third largest. In 2012, Jagwar developed a more refined script, which uses natural language processing (NLP) and automated translation to generate new entries, with no human intervention or oversight. In the blog post, he wrote that translation errors were estimated at <5%, though he had "no precise idea" of it.

There is no active editing community, and Jagwar is the sole active admin on the site. Jagwar himself has only made 6 edits in the last 90 days, of which only 3 were in mainspace. The audit noted that there are various mistakes in the entries. Of a random survey of 100 non-Malagasy entries, the auditor concluded that 49 were "unusable", 29 "partially usable", and only 22 were "fully correct and usable" (though they may still have minor errors). Of Malagasy entries, the report noted that:

There are 41,902 entries categorised as lacking any definition, most of which seem to be Malagasy entries, and around 30,000 of which are the result of the definitions being removed due to copyright violation many years ago. Although there are 1,150,182 Malagasy entries in total, most of these are inflected forms, which can generally be safely created by bots. These definitionless entries are not strictly speaking incorrect, but a definition is the most central function of a dictionary, so these entries fail to be a useful part of the dictionary as a whole.

The bots also ran 218,156 edits at chr.wikt from 2012 to 2014 and 127,389 edits at ku.wikt from 2012 to 2013. The audit concluded that "Even an editing community of the size of the biggest Wiktionary, en.wikt, would not be able to clean up after these bots by hand". It strongly recommended deleting all non-Malagasy entries, removing translation sections, and telling the bot owners to cease automated creation of entries, and weakly recommended deleting all definition-less entries. – adapted by Eddie891 from Large-scale errors at Malagasy Wiktionary, written by Metaknowledge, with help from Surjection, AryamanA, Erutuon, and Smashhoof, along with input from a fluent speaker of Malagasy who wishes to remain anonymous.

Inline parenthetical citations deprecated

A Request for Comment (RfC) to deprecate the inline parenthetical citation style was closed by Seraphimblade on 5 September as having reached consensus "that inline parenthetical referencing should be deprecated". The RFC, which was begun by CaptainEek on 5 August, drew a large amount of attention and discussion. A watchlist notice for the RFC was placed on 29 August after a discussion determined that it was a sufficiently high-profile RFC.

In closing the discussion, Seraphimblade noted that roughly 71% of the community had supported the proposal and that there was only consensus to deprecate "parenthetical style citations directly inlined into articles", rather than {{harv}} style-references in <ref></ref> tags. The RFC led to the WP:PAREN and WP:CITEVAR guidelines needing an update, though as of The Signpost's publication deadline, what the update would look like was still under discussion. Before the RfC, CITEVAR specifically stated that "editors should not attempt to change an article's established citation style merely on the grounds of personal preference" and cited a 2006 Arbitration Committee decision that "Wikipedia does not mandate styles in many different areas", including citation style. E

More news

Brief notes




2020-09-27

WIPO, Seigenthaler incident 15 years later

By Smallbones and Tilman Bayer
A previous WIPO General Assembly meeting (2011)

Beijing blocks WMF from World Intellectual Property Organization, citing Wikimedia Taiwan

On September 23, at the general assembly meeting of the World Intellectual Property Organization (WIPO) in Geneva, Switzerland, the Chinese government's delegation blocked the Wikimedia Foundation from joining WIPO as an observer. The incident was reported by Quartz ("Beijing blocked Wikimedia from a UN agency because of 'Taiwan-related issues'") and news media in various other languages (for example, ZDNet France [1], Der Standard [2] and Netzpolitik.org [3]).

As summarized by Quartz,

... the Beijing delegate said that China had “spotted a large amount of content and disinformation in violation of [the] ‘One China’ principle” on webpages affiliated with Wikimedia, thereby contravening established UN protocols and “the consistent position of WIPO on Taiwan-related issues.” The Beijing representative also suggested that Wikimedia Taiwan has been “carrying out political activities… which could undermine the state’s sovereignty and territorial integrity.” [...] Beijing claims sovereignty over Taiwan, even though the ruling Communist Party has never controlled the country.

According to one eyewitness, Teresa Nobre of Communia,

This decision came as a shock to many observers of WIPO, since there has only been one case in recent memory where an observer status application to WIPO has not been accepted. In 2014, the Pirate Party International was rejected due to being a federation of political parties.

Beijing has long been known for its efforts to prevent Taiwan or Taiwanese organizations from participating in global associations (such as the World Health Organization, or, as a recent example, BirdLife International). However, excluding an international organization like WMF for such reasons seems highly unusual, with the US delegation pointing out "the established precedent at WIPO of supporting other existing observers and Member States that also have some affiliation with Taiwan. For example, the International Chamber of Commerce, the International Law Association, the Biotechnology Industry Organization ..."

Wikimedia Taiwan reacted with a statement emphasizing its status as an independent organization and its commitment to neutrality, stating "we fairly display all points of view of a controversial topic, not the point of view from any particular country or government". The Wikimedia Foundation urged China to withdraw its objection, which would enable the application to go through next year.

On the Publicpolicy mailing list, Sherwin Siy from the Wikimedia Foundation gave some background about its motivations for joining WIPO:

WIPO is where the world's countries gather to write the treaties that shape the laws that govern the world's knowledge. If you've ever complained about DRM laws being ubiquitous, you can blame lobbying that took place at WIPO; if you're glad for recent laws that make it easier for blind and visually impaired people to access books, you can thank lobbying that took place at WIPO, too.

Those treaties are negotiated among country delegations that typically sit in a big impressive room in Geneva. Meanwhile, hundreds of non-governmental organizations (NGOs) representing publishers, broadcaster, record labels, libraries, and civil society organizations sit at the back of the room, observing the negotiations as they happen and, in between official sessions, those groups hold side briefings, pass out position papers and white papers, and try to make sure that the negotiators don't forget about their particular interests.

We wanted to make sure that the Foundation could be a part of those conversations, as a way to bring more members of the community to WIPO, and make sure that our movement's interests don't get left behind.

Creative Commons (itself already an observer at WIPO) and Wikimedia Germany reacted with statements supporting the Wikimedia Foundation's application.

Like Communia ("It was particularly disappointing that the European Union and its Member States remained silent in the discussion") and former European Parliament member Julia Reda ("Shamefully, the EU kept silent"), the German chapter also criticized the lack of support from EU member states, in contrast to the reactions of the delegations from the US and the UK.

As noted by Quartz, the Chinese government's action should be seen in the context of its previous blocking of Wikipedia and more recent reports about conflicts over Taiwan-related content on Wikipedia (see Signpost coverage: "The BBC looks at Chinese government editing"). The English Wikipedia's decision some months ago to describe Taiwan as a country also comes to mind. That said, besides Wikimedia-specific aspects, it's also worth being aware of current geopolitical developments, with almost 40 Chinese warplanes crossing the previously respected Cross-Strait median on the weekend before the WIPO incident, and observers warning that a military invasion of Taiwan is becoming a more realistic possibility.

Seigenthaler incident 15 years later

Wikipedia falsely said I was convicted of attempted murder. I expected online abuse, but not this: The editing described by this article in the Seattle Times was done by a user who states that he is a teenager. He has also requested that he be indefinitely blocked and his request was granted. Fifteen years ago this month John Seigenthaler discovered that Wikipedia had suggested that he was involved with the assassinations of John and Robert Kennedy. There have been over a hundred discussions on Wikipedia:Biographies of living persons/Noticeboard involving the word "murder" since that time. We're still making the same type of mistake.

Both parties agree, curb Section 230

DOJ to Seek Congressional Curbs on Immunity for Internet Companies: (paywalled) The Wall Street Journal reports that the US Department of Justice is seeking to change the Section 230 protections for internet platforms. According to the WSJ, Section 230 of the Communications Decency Act "gives internet platforms broad latitude to police their sites and shields them from legal liability related to users’ actions, except in relatively narrow circumstances." While the WSJ did not mention Wikipedia in the article, Wikipedians might still feel threatened. Section 230 is central to the way Wikipedia operates: it says that the WMF is not responsible for your edits. Back in July Digital Trends stated the case bluntly in If Section 230 gets killed, Wikipedia will die along with it. It quoted Sherwin Siy, the Wikimedia Foundation’s senior manager for public policy saying

[If we were to] live in a world where there is no Section 230 in the United States, that changes things drastically ... It makes it a very different landscape. You’d see a lot of platforms being much more hesitant to allow users to publish things without any vetting. It would expose, for example, the Wikimedia Foundation to a lot more potential liability. It actually would just be a punishing amount of risk.

Bills cosponsored by Republicans and Democrats have been proposed to modify section 230, and presidential candidate Joe Biden has proposed revoking it.

In brief

This raises the question of "why not just skip to step 5 right away, especially if you are going to ignore the COI guideline?"
In earlier attempts to encourage the never-ending quest for free advertisements on Wikipedia, Entrepreneur has published

Odd bits

The Signpost in the media

“No crypto blogs, no crypto news sites — because these look like specialist trade press, but they’re really about advocacy: promoting their holdings. Many are blatantly pay for play, and very few ever saw a press release with ‘blockchain’ in it that they wouldn’t reprint.”



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next month's edition in the Newsroom or leave a tip on the suggestions page.





2020-09-27

Life finds a Way

By Eddie891 and Gog the Mild
Panorama of Schloss Favorite from the path to Ludwigsburg Palace, January 2017

This Signpost "Featured content" report covers material promoted from August 23 through September 20. For nominations and nominators, see the featured contents' talk pages.

Spanish battleship España, formerly Alfonso XIII underway, photographed c. 1932
Portrait of Elizabeth Willing Powel by Matthew Pratt, c. 1793
A CBC specimen in front of a printout displaying CBC and differential results
Vicente shortly after reaching tropical storm strength on October 19
Dinar minted in Yusuf I's name
A zebra
Cromwell at Dunbar, 1886, by Andrew Carrick Gow
Natalie Portman speaking at the 2019 San Diego Comic-Con International in San Diego, California
Producer of Avengers: Endgame, Kevin Feige




2020-09-27

Clarifications and requests

By Bri

Arbitration requests

Clarification and amendment requests

Amendment requests adjusting one editor's editing restrictions are not discussed here.

New case requests

Several arbitrators voting to accept the case cited the lack of resolution to issues at the Arbcom case brought concerning the same administrator this past June. However, the committee is divided on this. On 11 June 2020, Newyorkbrad wrote [4]: "I'm voting to decline today because I don't see enough recent evidence of serious incivility or personal attacks to warrant convening an admin-conduct case—but the outcome might be different if we find ourselves back here with a more solid request for a case, based on incidents occurring after today. JzG, there might be people out there looking for a good reason to file a new request. Don't give them one." On 9 September, voting to decline again, Newyorkbrad commented [5]: "In voting to decline a previous case request against JzG in June, I urged him to remain civil even in difficult situations. It is good that in both of the recent disputed discussions, he appears to have done so."





2020-09-27

Is there no justice?

By Igordebraga, Kingsif and Rebestalic
This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Kingsif and Rebestalic.

Our lives are still hindered by the COVID-19 pandemic, but Wikipedia readers don't seek the article on this disease as much as they seek other unsavory subjects such as politics and police brutality. They also remember the recently deceased and seek info on distractions, such as movies and streaming shows.

In my culture, death is not the end. It’s more of a stepping-off point. (August 23 to 29)

Most Popular Wikipedia Articles (August 23 to 29, 2020)
Rank Article Class Views Image Notes/about
1 Chadwick Boseman 9,945,698 Boseman's death from stage 4 colon cancer was announced by his family on the evening of August 28. He had fought the disease for 4 years without telling anyone, all while playing a beloved action hero in a physically demanding role (#6). He is also known for a variety of biopics, playing important black figures in history.
2 Tenet (film) 1,431,368 Christopher Nolan's long-awaited drama was postponed because of the pandemic (a topic that has slipped off the list again), but was released in movie theaters this week.
3 Kimberly Guilfoyle 1,347,829 This high-profile Trump supporter spoke at the 2020 Republican National Convention.
4 Shooting of Jacob Blake 1,249,181 Blake, an African-American man, was shot in the back by police 7 times – he was paralyzed but lived. His shooting has kicked off riots in Kenosha, Wisconsin, (where it happened) and Portland, Oregon, which have just missed out on the list. The NBA also refused to play in the direct aftermath.
5 The Batman (film) 988,016 The trailer for this film was released, and well-received, as was star Robert Pattinson (pictured).
6 Black Panther (film) 927,889 The film starred #1 as the first lead role for a black superhero in the Marvel Cinematic Universe, and made over $1 billion. It is widely cited as proving all-black casts are popularly and financially viable. A sequel was in production, and if it goes through, title character T'Challa will need a recasting.
7 Melania Trump 858,406 The First Lady delivered a speech to the RNC from her renovated White House Rose Garden.
8 QAnon 829,809 A conspiracy theory that people won't shut up about. And possibly the only one claiming that the government is the victim. Speaking of which...
9 Donald Trump 826,247 The center of attention at the RNC, who is seeking re-election.
10 Deaths in 2020 816,929 Better quote from #6's soundtrack to eulogize #1:

I just thank for the life, for the day, for the hours and another life breathin'
I did it all 'cause it feel good
You could live it all if you feel bad
Better live your life
We are running out of time

We must find a way to look after one another, as if we were one single tribe. (August 30 to September 5)

Most Popular Wikipedia Articles (August 30 to September 5, 2020)
Rank Article Class Views Image Notes/about
1 Chadwick Boseman 3,974,415 The untimely death of a talented actor, who played three iconic African-Americans (Thurgood Marshall, James Brown and Jackie Robinson) and Marvel Comics' first black superhero (#7), is still being felt by his fans.
2 Tenet (film) 1,587,728 Christopher Nolan's long-awaited drama was postponed because of the pandemic (which has slipped off the list again), but the film was released in movie theaters last week. Tenet had good reviews, albeit with criticism for a confusing plot that was not helped by dialogue drowned under noise, and opened to a somewhat respectable $20 million, given that most theaters are still closed and many people are still afraid of going out during a pandemic.
3 Cobra Kai 1,192,835 Netflix released the show, previously exclusive to YouTube Premium, that brings back The Karate Kid himself, Daniel LaRusso; the guy he crane-kicked in the face, Johnny Lawrence; and the former sensei of the title dojo, John Kreese (whose actor Martin Kove is seen with a fan to the left).
4 Pranab Mukherjee 940,517 The 13th President of India passed away at the age of 84.
5 Deaths in 2020 817,442 Seems everything we've ever known's here
Why must it drift away and die?
6 Robert F. Kennedy Jr. 811,642 His father was a senator, his uncle was president... and despite this great legacy, RFK Jr. is in the news for his anti-vaccination views and COVID-19 misinformation, going as far as speaking at a partially violent demonstration in Berlin calling for an end to anti-coronavirus restrictions. (Strangely, the redirects to his article got more hits than the actual title.)
7 Black Panther (film) 764,462 The film starred #1 as the first lead role for a black superhero in the Marvel Cinematic Universe, and made over $1 billion. This week it was given special screenings on TV in the U.S., as well as re-released in some movie theaters, in tribute to the film's deceased star.
8 Mulan (2020 film) 744,896 The pandemic screwed over the release of Disney's latest live-action remake, originally scheduled for March and postponed to the point the company decided to put it on Disney+ under the hefty tag of $30. And so, at least in the United States, the streaming service received Yifei Liu (pictured) playing the Chinese girl who decided to fight a war in her father's place (who wasn't royalty or married into it, but still counted as a Disney Princess), only now without the Eddie Murphy dragon or that awesome montage song.
9 The Boys (2019 TV series) 625,976 Still on streaming: after The Umbrella Academy on Netflix, another subversive superhero show returned, namely the Prime Video comic book adaptation where "supes" are corporate puppets and overall jerks – and now we can't even say the exception are the women, as Season 2 made the jaded veteran and the still idealistic newcomer be joined by a racist sociopath named after a neo-Nazi website.
10 William Zabka 593,068 While Zabka's career after being crane-kicked in the face didn't take off – at most, he produced an Academy Award-nominated short – he has now returned to the role of Johnny Lawrence in #3.

Mysterious as the dark side of the moon (September 6 to 12)

Most Popular Wikipedia Articles (September 6 to 12, 2020)
Rank Article Class Views Image Notes/about
1 Mulan (2020 film) 1,383,212 One of the movies that would've hit theaters in the first semester if not for the pandemic has become available on Disney+ for a hefty $30 premium (and also in theaters in some countries without the service). The story of the Chinese girl who takes her father's place in the army got a more serious approach, supposedly closer to the source material – though I bet Mulan's wire fu and the invading army having a witch are both original additions, as much as the funny characters and musical numbers that made the cartoon so beloved. Hence why audiences were not as forgiving of the movie as reviewers (especially in China, in spite of this remake of their folk tale trying to cater to them...).
2 September 11 attacks 1,357,498 Next year, the terrorist attack that really started the 21st century will have happened two decades ago. Man, are we getting old.
3 Tenet (film) 1,168,269 Christopher Nolan returned with another complicated concept – something about time manipulation and preventing a war – executed with flashy visuals, and even made it hit theaters in spite of most being closed by the pandemic. Even with that hindrance (especially when many people don't want to leave their homes), it already made back its $200 million budget at the box office, and got positive reviews, although with criticism for a confusing plot that was not helped by dialogue drowned under noise.
4 Diana Rigg 1,141,276 Dame Diana Rigg died at 82 after a long and storied career, highlighted by marrying James Bond and dying because of it, being a super spy herself, and reigning as Queen of Thorns.
5 The Boys (2019 TV series) 973,513 Instead of a season all at once, the return of the jerk superheroes at Prime Video had three episodes followed by weekly installments – so every week viewers can return to see Antony Starr's Homelander and Aya Cash's (pictured) Stormfront be horrible people.
6 Dune (2020 film) 794,940 Denis Villeneuve had good results with his approach to a beloved sci-fi film from the 1980s, so his taking on a reviled sci-fi film from the 1980s with beloved source material (#9) might also work, if the good reaction to the first trailer of this movie shows anything.
7 Deaths in 2020 777,830 Hope you got your things together
Hope you are quite prepared to die
8 Cobra Kai 776,530 Netflix took this previously YouTube Premium-exclusive series continuing the story of The Karate Kid, now focusing on the antagonist dojo that gives the show its title.
9 Dune (novel) 728,725 In 1965, Frank Herbert wrote this sci-fi novel centered around a planet that houses an addictive substance and giant worms, that eventually spawned a franchise that continued to add books until 2017. The first attempt at adapting it went wrong, but #6 keeps fans hopeful.
10 Naomi Osaka 658,268 Wimbledon was cancelled because of COVID-19, but tennis still managed to return with the 2020 US Open. This Haitian-Japanese wonder, who lives in the U.S. and plays for Japan, won the women's tournament, two years after her first triumph at Flushing Meadows.

Pulling your strings, justice is done (September 13 to 19)

Most Popular Wikipedia Articles (September 13 to 19, 2020)
Rank Article Class Views Image Notes/about
1 Ruth Bader Ginsburg 4,162,215 The Notorious R.B.G., a prominent proponent of women's rights, served as a justice with a liberal philosophy on the Supreme Court of the United States.
2 Dennis Nilsen 2,502,328 David Tennant already played a completely despicable human being on Netflix, so now he adds another on ITV – and worse, a real life one, this serial killer who is the focus of the miniseries Des.
3 The Devil All the Time (film) 907,722 Two new Netflix releases (#3 and #4): a psychological thriller book adaptation and a French coming-of-age comedy-drama criticizing the hypersexualization of pre-teens.
4 Cuties 896,477
5 Tenet (film) 795,947 This movie is apparently complex.
6 Deaths in 2020 768,735 Reminds me of the summer time
On this winter's day
See you at the bitter end!
7 The Boys (2019 TV series) 696,038 Who are you when you don't like the superheroes who work for Vought International and decide to act against the superheroes' true selves (think conceited)? You're a member of The Boys, a vigilante group – led by Karl Urban (pictured) as Billy Butcher – in a universe made possible on moving pictures by Eric Kripke for Prime Video. A second season remains running while a related short film recently dropped.
8 Naomi Osaka 666,797 This Japanese-born tennis player recently won her division of the 2020 US Open for Tennis. Also notable: Each mask she wore in the Open bore the name of an African-American racial martyr, including George Floyd and Breonna Taylor.
9 Mulan (2020 film) 658,195 This 2020 live-action remake of Disney's not hugely but sizably popular original animated film sparked the #CancelMulan hashtag once viewers decided they didn't like how Disney thanked regional authority organisations in China's Xinjiang province, where Uyghur Muslims are being forcibly re-educated.
10 Amy Coney Barrett 657,650 Currently a judge in a US appeals court, Amy Coney Barrett is seen as the apple of incumbent President Donald Trump's eye when it comes to filling the vacant position on the Supreme Court of the United States.


2020-09-27

Wikipedia's flood biases

By Tilman Bayer


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


"Uneven Coverage of Natural Disasters in Wikipedia: The Case of Floods"

A 2017 flood in China (64.29% of whose floods are covered on English Wikipedia, according to the study)

A paper[1] with this title, presented earlier this year at the "International Conference on Information Systems for Crisis Response and Management" (ISCRAM 2020), adds to the growing literature on Wikipedia's content biases, finding that while the English Wikipedia "offers good coverage of disasters, particularly those having a large number of fatalities [...] the coverage of floods in Wikipedia is skewed towards rich, English-speaking countries, in particular the US and Canada."

Any bias analysis of this kind is faced with the problem of identifying an unbiased "ground truth" that Wikipedia's coverage can be compared to. The researchers approach this diligently, resorting to "three of the most comprehensive databases documenting floods that are commonly used by the hydrology science for reference": Floodlist, which is funded by the EU's Copernicus Programme, the "Emergency Events Database" (EM-DAT), and the University of Colorado's Dartmouth Flood Observatory (DFO). Focusing on a timespan extending from 2016 to 2019, and following an elaborate process involving e.g. defining search criteria for each source and deduplicating the results, they arrived at a consolidated dataset consisting of 1102 flood events, of which only 249 were present in all three databases. The authors asked experts to identify possible reasons for these discrepancies (or biases) between the sources, e.g. the fact that Floodlist includes landslides resulting from heavy rain events that do not meet the definitions of the other two sources. They concluded that these explanations justified relying on events that were covered in at least two of the three sources, resulting in a dataset consisting of 458 floods.
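The "at least two of the three sources" rule can be illustrated with a minimal Python sketch. It is not the authors' code: the event keys (country code plus start month) and the sample entries are hypothetical, and it assumes that events have already been deduplicated within each source and mapped to a shared key, which is the hard part of the real pipeline.

```python
from collections import Counter


def consolidate(sources, min_sources=2):
    """Return the event keys that appear in at least `min_sources` sources."""
    counts = Counter()
    for events in sources.values():
        # Convert to a set so that each source counts an event only once.
        counts.update(set(events))
    return {event for event, n in counts.items() if n >= min_sources}


# Hypothetical, already deduplicated event keys: (country code, start month).
sources = {
    "Floodlist": {("US", "2017-08"), ("IN", "2018-08"), ("PE", "2017-03")},
    "EM-DAT": {("US", "2017-08"), ("IN", "2018-08"), ("ID", "2019-01")},
    "DFO": {("US", "2017-08"), ("PE", "2017-03")},
}

in_all_three = consolidate(sources, min_sources=3)
in_two_or_more = consolidate(sources, min_sources=2)
```

In this toy data, only one event is present in all three sources, while relaxing the threshold to two sources keeps three events and drops the one that a single database reports.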

The comparison dataset representing Wikipedia's coverage was constructed using keyword searches to find individual sentences mentioning flood events (rather than entire articles, which one might identify more easily using e.g. Category:Floods).

The analysis of the data focuses on the "hit rate" per country, defined as the percentage of floods from the ground truth dataset that have at least one corresponding item in the Wikipedia dataset. The United States was both the country with the highest number of floods in the ground truth dataset (36, followed by Indonesia with 25 and the Philippines with 17), and the country with by far the highest hit rate (86.11%) among the countries with the highest number of floods. Aggregated by continent, North America likewise had the highest Wikipedia coverage (49.06%), and South America the lowest (10.53%). Interestingly, Europe did not fare very well, with a hit rate of 21.18%, slightly below that of Africa (21.88%) and way behind Asia (which had 37.63% of its floods covered on English Wikipedia).
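As a rough illustration of the hit rate definition, the following Python sketch computes the percentage per country from a list of ground truth floods and a set of flood identifiers that were matched on Wikipedia. The identifiers and the data layout are hypothetical. The count of 31 matched US floods is inferred from the reported 86.11% of 36 and is not stated in the review, and the Indonesian match count is made up purely for the example.

```python
from collections import defaultdict


def hit_rates(ground_truth, covered_ids):
    """Percentage of ground-truth floods per country that have a Wikipedia match."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for flood_id, country in ground_truth:
        totals[country] += 1
        if flood_id in covered_ids:
            hits[country] += 1
    return {c: round(100 * hits[c] / totals[c], 2) for c in totals}


# Toy ground truth: 36 US floods and 25 Indonesian floods (counts from the text).
ground_truth = [(f"us-{i}", "United States") for i in range(36)]
ground_truth += [(f"id-{i}", "Indonesia") for i in range(25)]

# Assumed matches: 31 US floods (inferred from 86.11%), 5 Indonesian (invented).
covered_ids = {f"us-{i}" for i in range(31)} | {f"id-{i}" for i in range(5)}

rates = hit_rates(ground_truth, covered_ids)
```

The same function could be applied per continent or per income bucket by swapping the second element of each ground truth pair for that grouping.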

To identify possible causes of the differing hit rates by country, the authors "analyzed several socio-economic variables to see whether they correlate with floods coverage. These variables are GDP per capita, GNI per capita, country, continent, date, fatalities, number of English speakers and vulnerability index." This analysis consists of presenting various tables and graphs with the hit rate plotted over four to six buckets of the independent variable (e.g. Low income / lower middle income / upper middle income / high income), eschewing more sophisticated statistical methods. They find some evidence for a bias toward higher income countries, although the trend is not entirely consistent (e.g. in a different classification into six instead of four income levels, the second-lowest level "Lower middle income" had a higher hit rate than the three above it). They also find evidence that countries with a higher ratio of English speakers have better coverage, although "The language can be only a partial explanation because for floods in Australia the hit-rate is half and lower than other non-English-speaking countries" (similarly, the UK only ranked 16th in Wikipedia coverage among the top 20 countries with at least five floods in the ground truth data).

Still, the paper's overall conclusion is that "Wikipedia’s coverage is biased towards some countries, particularly those that are more industrialized and have large English-speaking populations, and against some countries, particularly low-income countries which also happen to be among the most vulnerable".

Unfortunately the researchers fail to acknowledge their own glaring bias in this research, namely the decision to exclusively focus on the English Wikipedia in a paper that is repeatedly hand-wringing about language disparities. To be sure, this bias has long been identified as an issue affecting a large part of Wikipedia research, and there are practical reasons for confining such an analysis to a language that researchers are fluent in. But since the authors clearly seem to frame such biases as a bad thing (at one point referring to them as "flaws" of Wikipedia), it is worth asking whether and why they think that the authors of reference works like Wikipedia should not focus their labor on those natural disasters that are more likely to affect their readers. While the study's confinement to only one of Wikipedia's hundreds of languages is mentioned in the "Limitations and future work" section, it is again framed just as an open question about Wikipedia's shortcomings ("understand how an editor’s language affects the coverage bias"), rather than as an acknowledgment of the paper's own.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Automated Adversarial Manipulation of Wikipedia Articles" using Markov chains

From the abstract and paper:[2]

"The WikipediaBot is a self-contained mechanism with modules for generating credentials for Wikipedia editors, bypassing login protections, and a production of contextually-relevant adversarial edits for target Wikipedia articles that evade conventional detection. The contextually-relevant adversarial edits are generated using an adversarial Markov chain that incorporates a linguistic manipulation attack known as MIM or malware-induced misperceptions. [...]

To show how the WikipediaBot could be used to harm discourse, we analyzed a scenario where a hypothetical adversary aims to reduce mentions of Uyghurs on the Uyghurs Wikipedia page [e.g. by changing "the ongoing repression of the Uyghurs" with "the ongoing repression of the Manchus", and other edits suggested by the MIM engine]. ... we contacted the Wikipedia security team with the details and the inner workings of WikipediaBot prior to writing this publication as part of the responsible disclosure requirement. The exposure of the WikipediaBot system architecture allows for consideration of other types of detection, prevention, and defenses then the one proposed in this paper [which was "to add a more robust CAPTCHA system to prevent edits to individual pages"]. We only tested the WikipediaBot on a local, isolated testbed, and never used it to make any adversarial manipulation on the live Wikipedia platform."


"From web to SMS: A text summarization of Wikipedia pages with character limitation"

From the abstract:[3]

"Due to the limitation of the number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To solve this issue, two extractive approaches have been combined: LSA and TextRank algorithms. [...] The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS without a GSM gateway to simulate the deployment in a real environment."

(Compare also previous efforts to make Wikipedia accessible via text messaging)
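To make the idea of summarizing under a character limit concrete, here is a minimal Python sketch of a greedy extractive selection. It is not the paper's method: the sentence scores are hypothetical stand-ins for whatever a ranking step such as LSA or TextRank would produce, and the default limit of 160 characters is an assumption based on the conventional length of a single SMS.

```python
def summarize(scored_sentences, limit=160):
    """Greedily keep the highest-scoring sentences that fit within `limit` characters."""
    chosen = []
    used = 0
    # Rank by score (highest first) while remembering the original positions.
    ranked = sorted(
        enumerate(scored_sentences), key=lambda item: item[1][1], reverse=True
    )
    for index, (sentence, _score) in ranked:
        # Account for the space that joins this sentence to the previous ones.
        extra = len(sentence) + (1 if chosen else 0)
        if used + extra <= limit:
            chosen.append((index, sentence))
            used += extra
    # Restore document order so that the summary reads naturally.
    chosen.sort()
    return " ".join(sentence for _index, sentence in chosen)


# Hypothetical sentences with made-up relevance scores.
sentences = [
    ("Wikipedia is a free online encyclopedia.", 0.9),
    ("It is written by volunteers.", 0.5),
    ("It launched in 2001.", 0.7),
]

summary = summarize(sentences, limit=70)
```

With a budget of 70 characters, the two highest-scoring sentences fit and the third is skipped. A greedy pass like this is simple but not optimal, since a long high-scoring sentence can crowd out several shorter ones.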


"RuBQ: A Russian Dataset for Question Answering over Wikidata"

From the abstract:[4]

"The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification."


"Topological Data Analysis on Simple English Wikipedia Articles"

From the abstract and paper:[5]

"We present three statistical approaches for comparing geometric data using two-parameter persistent homology [a tool from topological data analysis], and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. [...] The data in this project was produced by applying a Word2Vec algorithm to the text of articles in Simple English Wikipedia, [converting] each of 120,526 articles into a 200-dimension vector, such that articles with similar content produce vectors that are close together. The data also gives a popularity score for each article, indicating how frequently the article is accessed in Simple English Wikipedia. Abstractly, our data is a point cloud of 120,526 points in ℝ^200, with a real-valued function on each point ..."
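The data structure described in the quote, a point cloud with one real number attached to each point, can be sketched in a few lines of Python. Everything here is hypothetical: the article titles, the 3-dimensional vectors standing in for the 200-dimensional embeddings, and the popularity values. The sketch only shows what "articles with similar content produce vectors that are close together" means in terms of Euclidean distance; it does not touch persistent homology.

```python
import math

# Hypothetical point cloud: title -> (embedding vector, popularity score).
# Real vectors would have 200 dimensions; 3 are used here for readability.
articles = {
    "Cat": ((0.9, 0.1, 0.0), 120.0),
    "Dog": ((0.8, 0.2, 0.1), 150.0),
    "Volcano": ((0.0, 0.9, 0.7), 40.0),
}


def nearest(title, cloud):
    """Return the title of the closest other article by Euclidean distance."""
    point, _popularity = cloud[title]
    others = (t for t in cloud if t != title)
    return min(others, key=lambda t: math.dist(point, cloud[t][0]))


closest_to_cat = nearest("Cat", articles)
closest_to_volcano = nearest("Volcano", articles)
```

Note that `math.dist` requires Python 3.8 or later.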


Dataset provides "interesting negative information" for Wikidata

From the abstract:[6]

"Rooted in a long tradition in knowledge representation, all popular KBs [knowledge bases] only store positive information, but abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. [...] We introduce two approaches towards automatically compiling negative statements. [...] Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.4M statements for 130K popular Wikidata entities."

See also Video and slides, OpenReview page, dataset


Amazon Alexa researchers measure "social bias" on Wikidata

From the abstract:[7]

"We present the first study on social bias in knowledge graph embeddings, and propose a new metric suitable for measuring such bias. We conduct experiments on Wikidata and Freebase, and show that, as with word embeddings, harmful social biases related to professions are encoded in the embeddings with respect to gender, religion, ethnicity and nationality. For example, graph embeddings encode the information that men are more likely to be bankers, and women more likely to be homekeepers."

The paper also contains lists of the top male and female professions in Wikidata (relative to female and male, respectively), evaluated by two different metrics. For the first metric (TransE embeddings), the male list is led by baritone, military commander, banker, racing driver and engineer. The top five entries on the corresponding female professions list are nun, feminist, soprano, suffragette, and mezzo-soprano.


"Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians"

From the abstract:[8]

"First, we analyze spatio-temporal patterns of readers' and editors' engagement with MPs' Wikipedia pages, finding huge peaks of attention during election times, related to signs of engagement on other social media (e.g. Twitter). Second, we quantify editors' polarisation and find that most editors specialize in a specific party and choose specific news outlets as references. Finally we observe that the average citation quality is pretty high, with statements on 'Early life and career' missing citations most often (18%)."


References

  1. ^ Lorini, Valerio; Rando, Javier; Saez-Trumper, Diego; Castillo, Carlos (2020). "Uneven Coverage of Natural Disasters in Wikipedia: The Case of Floods" (PDF). In Amanda Hughes; Fiona McNeill; Christopher W. Zobel (eds.). ISCRAM 2020 Conference Proceedings – 17th International Conference on Information Systems for Crisis Response and Management. Blacksburg, VA (USA): Virginia Tech. pp. 688–703. Code on GitHub
  2. ^ Sharevski, Filipo; Jachim, Peter (2020-06-24). "WikipediaBot: Automated Adversarial Manipulation of Wikipedia Articles". arXiv:2006.13990 [cs].
  3. ^ Fendji, Jean Louis; Aminatou, Balkissou (2020-06-11). "From web to SMS: A text summarization of Wikipedia pages with character limitation". EAI Endorsed Transactions on Creative Technologies. 7. doi:10.4108/eai.11-6-2020.165277.
  4. ^ Korablinov, Vladislav; Braslavski, Pavel (2020-05-21). "RuBQ: A Russian Dataset for Question Answering over Wikidata". arXiv:2005.10659 [cs]. Dataset on GitHub
  5. ^ Wright, Matthew; Zheng, Xiaojun (2020-06-30). "Topological Data Analysis on Simple English Wikipedia Articles". arXiv:2007.00063 [math].
  6. ^ Arnaout, Hiba; Razniewski, Simon; Weikum, Gerhard (2020). "Enriching Knowledge Bases with Interesting Negative Statements". Automated Knowledge Base Construction (AKBC) 2020. https://www.akbc.ws/2020/papers/pSLmyZKaS
  7. ^ Fisher, Joseph; Palfrey, Dave; Christodoulopoulos, Christos; Mittal, Arpit (2020-05-07). "Measuring Social Bias in Knowledge Graph Embeddings". arXiv:1912.02761 [cs]. (also published on Amazon.Science)
  8. ^ Agarwal, Pushkal; Redi, Miriam; Sastry, Nishanth; Wood, Edward; Blick, Andrew (2020-06-23). "Wikipedia and Westminster: Quality and Dynamics of Wikipedia Pages about UK Politicians". arXiv:2006.13400 [cs].





The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0