AI is not playing games anymore. Is Wikipedia ready?

In the media

By Bri, Jayen466, Oltrepier, Smallbones and HaeB

Portland pol's publicly-paid profile: Part II

See previous coverage: "Portland politician spends $6,400 in taxpayer dollars to 'spruce up his profile on Wikipedia'" about the article Rene Gonzalez (politician)

Oregon's 2020 Ballot Measure 107 allows campaign finance disclosure regulations in the state. These regulations may have been violated by the Gonzalez campaign, in addition to Gonzalez authorizing irregular expenditures of taxpayer funds not allocated to campaigning. Alt-weekly Portland Mercury said "It's unclear which fund the money for the Wikipedia edits came from, and why the money didn't instead come from Gonzalez's mayoral campaign funds."

Two Portland-based television stations had stories on an investigation into the expenditures. KOIN, the CBS affiliate, said that Gonzalez claims "the money went to train staff on how to follow Wikipedia standards", not to conduct impermissible campaigning; KGW, the NBC affiliate, also carried a full story about the case, titled "Commissioner Rene Gonzalez now the subject of Portland campaign finance investigation". – B

Is Wikipedia ready to play the game of Jum-AInji?

A transformer might think this image depicts "The Transformer", but it does not (it is, however, depicting an instance of Japanese hardcore)

In a recent article for The New Yorker, titled Was Linguistic A.I. Created by Accident? (paywalled), Stephen Marche focuses on the role of chance and good luck in the research that led to the landmark 2017 AI paper "Attention Is All You Need", which introduced the transformer architecture. The paper was originally supposed to focus on using the transformer to make English-to-German translations.

Instead, as part of the AI model's training process, the Google team asked the transformer to read Wikipedia entries for two days, covering almost half of the platform's pages. The model was then asked to create five new Wikipedia-style articles from scratch, all about made-up subjects called "The Transformer": a fictitious Japanese hardcore punk band formed in 1968, a fictitious video game, a fictitious 2013 Australian sitcom, a fictitious studio album by an alternative metal group called Acoustic, and even a fictitious science-fiction novel. At first reading, the articles produced by Transformer on the made-up topics all looked like real Wikipedia articles: they were almost too good, "filled with inconsistencies, but [...] also strikingly detailed", suggesting that AI had made a jump of twenty or more years of progress:

Why was a neural network designed for translating text capable of writing imaginative prose from scratch? "I was shocked, blown away," (researcher Aidan) Gomez recalled. "I thought we would get to something like this in twenty years, twenty-five years, and then it just showed up." The entries were a kind of magic, and it was unclear how that magic was performed.
— Was Linguistic A.I. Created by Accident?, Stephen Marche

The historical bond between Wikipedia and machine-learning based natural language processing goes back even further. The first attempts to provide the encyclopedia with text generated using artificial neural networks trace back to at least 2009.

But artificial intelligence and large language models are not just derived from Wikipedia; they are important topics for discussion and policy about the platform's future.

The rapid rise of ChatGPT has raised the most interest and sparked dozens of research efforts towards the implementation of LLMs in the creation and improvement of Wikipedia articles, among other tasks, with the STORM system prototype being the latest example. The Wikimedia Foundation has taken note of AI's progress, for example, by expanding its Machine Learning team and even testing an experimental ChatGPT plugin between July 2023 and February 2024. The Signpost itself has included DALL-E-generated images in various articles. On the other hand, in somewhat Jumanji style, the more we get invested in the AI game, the more traps we discover: without proper checks and balances, machine-generated content can pose a threat to the integrity of Wikipedia, should the number of unsourced and fictitious articles keep increasing and causing more problems with COI-related material and disinformation.

The Spanish newspaper El País recently interviewed Wikimedian and Wikimedia España member Miguel Ángel García, along with the WMF's Director of Machine Learning, Chris Albon (in Spanish, free registration might be required). García, who joined Wikipedia in 2006, noted how many newly-registered users introduce themselves by "[pasting] a giant text, apparently well-structured and well-developed", which turns out to be poorly-written and redundant after a closer look. Luckily, the platform is usually able to handle this material through mechanisms such as speedy or proposed deletion, as well as the continuous efforts of its volunteers, which have also been acknowledged by Albon. (Everyone interested can give a helping hand by joining initiatives such as the WikiProject AI Cleanup.)

However, both expressed concerns over the long-term impact of automatic content on the encyclopedia: while García is mainly worried about the incorporation of "pseudo-media" hosting bot-generated articles as sources on Wikipedia - a phenomenon that could actually be mitigated through reports at the noticeboard - Albon took a brief detour from his usually optimistic view on AI tools, explaining that "if there's a detachment between the places where knowledge is created, like Wikipedia, and the places where it is accessed, like ChatGPT, we're at risk of losing a generation of volunteers". He also said that LLMs providing the platform with poorly-sourced or unreferenced content could "introduce an unprecedented amount of disinformation" on the Internet, since "users will not be able to easily distinguish accurate information from [AI] hallucinations"; quite an ironic situation to find ourselves in, considering that chatbots such as ChatGPT and Google Gemini are being fed with thousands of Wikipedia articles as part of their training schedules.

Titled "ENC-AI-CLOPEDIA. AI is mining the sum of human knowledge from Wikipedia. What does that mean for its future?", a separate interview by Sherwood News (the media arm of trading platform Robinhood Markets) also featured Albon, together with his colleague Lane Becker, Senior Director of Earned Revenue at the Wikimedia Foundation and president of its for-profit subsidiary Wikimedia LLC, which runs Wikimedia Enterprise.

The interviewer first confronted them with "Data from Similarweb [which] shows that traffic to Wikipedia has been in decline" since about 2020. In response, Albon pointed to the Foundation's own (presumably more precise) pageview and unique devices data, with Becker asserting that "We have not seen a significant drop in traffic on Wikimedia websites that can directly be attributed to the current surge in AI tools." (This conclusion is somewhat in contrast with two recent academic papers, see our coverage: "ChatGPT did not kill Wikipedia, but might have reduced its growth", "'Impact of Generative AI': A 'significant decrease in Wikipedia page views' after the release of ChatGPT")

However (similar to Albon in the El País interview), Becker voiced "concern [...] about the potential impact that these AI tools could have on the human motivation to continue creating and sharing knowledge. When people visit Wikipedia directly, they are more likely to become volunteer contributors themselves. If there is a disconnect between where knowledge is generated (e.g. Wikipedia) and where it is consumed (e.g. ChatGPT or Google AI Overview), we run the risk of losing a generation of volunteers." (Not mentioned, but presumably on Becker's mind as well, was the fact that these visitors are also, via Wikipedia's well-known donation banners, the Foundation's most important source of revenue by far.)

Asked "How do you feel about practically every LLM being trained on Wikipedia content?", Becker stressed that "we welcome people and organizations to extend the reach of Wikipedia's knowledge. Wikipedia is freely licensed and its APIs are available for free to everyone, so that people all over the world can use, share, add to, and remix Wikipedia content." However, "We urge AI companies to use Wikimedia's free APIs responsibly and include recognition and reciprocity for the human contributions that they are built on, through clear and consistent attribution. They should also provide pathways for continued growth and maintenance of the human-created knowledge that is used to train them" - such as "Clearly attributing knowledge back to Wikipedia", but also, for "high-volume commercial reusers of Wikipedia content to use our opt-in paid for product, Wikimedia Enterprise." Becker shared that its total revenue (i.e. not accounting for the staffing and other costs of Wikimedia Enterprise itself) "for FY 2022-23 was $3.2 million - representing 1.8% of the Wikimedia Foundation's total revenue for the period." However, he declined to disclose how much of that came from Google (one of the few publicly known customers, another one being yep.com).

– S, O, H

See also in this issue's News and notes: "AI policy positions of the Wikimedia Foundation"

In brief

Red clover for Clovermoss

Wikimedian of the Year gets recognition within her local community: Local Canadian newspaper Thorold Today recently dedicated a full article to Wikipedian Hannah Clover, the latest recipient of the Wikimedian of the Year award, presented at Wikimania 2024 in Katowice; the newspaper likely cited Hannah's acceptance speech at the convention, where she shared her fondest memories of the platform, as well as her successful RfA from last year. Among the other key contributions recognized in the award is a notable Signpost essay from January 2023, in which she gave feedback on mobile editing of Wikipedia: you can find it here.
Wikipedia discussion about the war in Gaza keeps spilling over into the real world: A recent decision to rename the article Allegations of genocide in the 2023 Israeli attack on Gaza as Gaza genocide, which followed a lengthy talk page discussion, is being covered by an ever-growing number of media outlets worldwide. The latest ones to join the list are Israeli newspaper Haaretz (paywalled), which has also highlighted a notable rise in the number of daily pageviews since the title was changed, and German portal Israelnetz, published by evangelical association Christliche Medieninitiative pro, which has stated that the page's title was "altered to Israel's detriment", while noting that a deletion discussion over the equivalent article on the German Wikipedia was also taking place (it has since been deleted).

See previous Signpost coverage about the controversy surrounding this article, as well as the discussion about the reliability of the Anti-Defamation League on the Israeli-Palestinian conflict, here and here.

Edit wars over the noisiest Olympic culture war: Italian online newspaper Fanpage.it recently covered the edit wars on the English Wikipedia page of Imane Khelif; during her participation at the 2024 Summer Olympics, where she eventually won the gold medal in her weight class, the born female Algerian boxer became the subject of extensive controversy, as sports functionaries, conservative politicians, (former) social media CEOs and writers all falsely claimed^{[neutrality is disputed]} that she had XY chromosomes or elevated levels of testosterone that could give her some kind of advantage over her opponents. As of this issue's publication, Khelif's article remains extended-protected, having been blue-locked since August 1.
The Editors reviews and events: Wikipedia beat reporter Stephen Harrison, who is best known for his articles in Slate, has recently been busy promoting his debut novel, The Editors, focused on a fictionalized version of the platform (named Infopendium) that is suddenly caught up in global cyberwarfare during the COVID-19 pandemic. Following his interviews in July, Harrison sat down with WFAA and Numlock News, as well as New America and Arizona State University, as part of a Future Tense event. WashU newspaper Student Life also published a review of the novel. KAMR (NBC in Amarillo) has an under-three-minute video if you are in a rush.
We'll be with you down every road: In a recent episode of NPR's radio program 1A, fittingly titled "Why All Roads Of Inquiry Lead To Wikipedia", host Jenn White talks about how "much of our factual questions get answered by the site", and interviews Stephen Harrison. They emphasize the collaborative nature of Wikipedia and how this can lead to heated debates, for example on Donald Trump's conviction, which The Signpost extensively covered back in June.
Boolean roulette: Musicians and YouTubers Andrew Huang and Tom McGovern recently challenged themselves to write a full song from scratch by using a random Wikipedia article as an inspiration. Helped by Matt Inouye's tool WikiRoulette, the two artists eventually resolved to turn a... Boolean circuit into a metaphor for a meaningful relationship. You can hear the final result of their efforts, with some help from Gabi Rose, at this link.
Lawsuit against Taiwanese chapter rejected: Taiwan's Liberty Times [1] (Google Translate) and United Daily News [2] (Google Translate) both report that businessman Tsai Eng-meng has lost a lawsuit he filed in Taipei's district court against Wikimedia Taiwan, after edits he made to the Chinese Wikipedia's coverage of his ties to the Chinese Communist Party and purchase of media organizations, including the China Times, had been reverted. The chapter pointed out that it is neither the owner nor the operator of the website.

Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit our next edition in the Newsroom or leave a tip on the suggestions page.

4 September 2024 (all comments)

Discuss this story

Forgive me, but under the third-to-last item, shouldn't that be Stephen Harrison? (As a fellow holder of the name I am always incredibly sensitive to correct spellings. :-) ) Also, as I recall, Lane Raspberry was also on the program. I tried to listen - WAMU is my hometown NPR station, after all - but as it was during the workday I wasn't able to get much. --Ser Amantio di Nicolao^{Che dicono a Signa?}_{Lo dicono a Signa.} 17:15, 4 September 2024 (UTC)
You're quite right. I've corrected the author spelling. * Bri (talk) 18:10, 4 September 2024 (UTC)

Ironically, you misspelled Lane Rasberry's name, itself originating in an ancestral misspelling. :) Ijon (talk) 18:29, 4 September 2024 (UTC)
@Ijon: Hey, look, do I come in here and correct your spelling? Well...evidently I do, but that's beside the point.

(This is what comes of freebasing when I should have my sources in front of me. *sigh* When will I learn?)*

Apologies to all involved for the error.

_{*Never. The answer is never.}

--Ser Amantio di Nicolao^{Che dicono a Signa?}_{Lo dicono a Signa.} 19:11, 4 September 2024 (UTC)

According to the International Boxing Association, the XY chromosome test result is correct, as reported in multiple media sources: [3] etc. The IOC chose to disregard the IBA test, but considering that the IBA are the only ones who tested for this, it seems quite biased to state the exact opposite as a fact. AnonMoos (talk) 00:42, 6 September 2024 (UTC)

I believe @Oltrepier: wrote that section, so I'll let them answer if any details are needed. The best short answer is IMHO

There are 1000s of words at Talk:Imane Khelif which I hope we won't expend over here.
The International Boxing Association is not held in much esteem these days, having been "de-recognized" (or whatever the correct word is) by the International Olympic Committee.
IBA didn't publish the genetic results (as far as I can tell), and
Genetic results aren't always conclusive, in any case, according to the BBC

But this isn't something we can solve on this page, see my 1st point. Smallbones_(smalltalk) 01:41, 6 September 2024 (UTC)

That's nice -- the IBA can't release the full lab results without violating medical legal privacy rights. The Algerian and Taiwanese boxers could release the results if they wanted to, but have chosen not to. Meanwhile, none of this changes the fact that the IBA was the only entity which tested for chromosomes, and they reported XY. Confidently asserting that XY is impossible goes far beyond ordinary "original research" into constructing a parallel fantasy world. AnonMoos (talk) 01:50, 6 September 2024 (UTC)

Now that Signpost people are aware that XY chromosomes is not a false claim, but is the best available information (though its significance is subject to interpretation), continuing the text of this piece unaltered is basically the same as lying. It would be rather unfortunate if the Signpost had no concern for truth and falsehood. By the way, claims that the Algerian and Taiwanese boxers had elevated testosterone levels, and claims that Algerian and Taiwanese boxers DID NOT have elevated testosterone levels are equally unsubstantiated, since no testosterone tests took place. Only chromosome tests took place, and the reported results were XY. AnonMoos (talk) 10:41, 8 September 2024 (UTC)
