The Signpost

Recent research

Is Wikipedia a merchant of (non-)doubt for glyphosate?; eight projects awarded Wikimedia Research Fund grants

By e_mln_e and Tilman Bayer


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Is Wikipedia a merchant of (non-)doubt for glyphosate?

Reviewed by e_mln_e
A container of the herbicide Roundup

This study[1] by Alexander A. Kaurov and Naomi Oreskes examines the circulation of a 2000 study on the safety and risk profile of glyphosate[supp 1] (WKM2000), a component of the herbicide Roundup. The authors describe the paper as ghostwritten because it "was crafted by Monsanto" (the company that produced Roundup), adding that "the paper has not been retracted and continues to be cited". The authors of WKM2000 did, in fact, disclose Monsanto funding in the paper, in line with standard scientific practice.

Oreskes, a historian of science, is well known for coauthoring the 2010 book Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming. In this new paper, she and Kaurov aim to "illustrat[e] how corporate-sponsored science infiltrates public knowledge platforms" by analyzing how WKM2000 is cited on Wikipedia, in governance and policy documents accessible through the Overton database (for context on its use in evaluating the impact of academic research, see Szomszor and Adie, 2022[supp 2]), and in the academic literature. The authors have also published an opinion piece based on the study[supp 3].

We focus here on their analysis of how WKM2000 has been used as a reference on Wikipedia.

First, they identify three articles on the English Wikipedia that use WKM2000 as a source: "Polyethoxylated tallow amine", "Roundup (herbicide)" and "Glyphosate-based herbicides". They find that as of December 1, 2024, all three cited WKM2000, and that the article about glyphosate itself had done so in the past. Looking at revision histories, they highlight multiple attempts to remove the citation or to add a cautionary sentence describing the paper as ghostwritten; these changes were generally reverted.

Kaurov and Oreskes offer a succinct history of changes to the inclusion and description of the citation, and summarize why editors chose to keep it. They do not link to the specific talk page discussions they draw on in the paper[supp 4], but we did verify that the exchanges between Wikipedians they present are similar both to current discussions on the glyphosate article's talk page (asking why "[t]he article states that there is scientific consensus that glyphosate is not carcinogenic") and to many archived discussions:

Editors argued that according to Wikipedia’s guidelines—particularly WP:MEDRS, which outlines the use of medical sources—removal of the paper would be unwarranted unless secondary reviews had criticized its methodology or findings. They contended that court documents and internal emails, which could be “cherry-picked,” do not necessarily undermine the scientific validity of the study unless reflected in peer-reviewed critiques. Another editor emphasized that the controversy had been reported in reliable secondary sources, including major news outlets like NPR, suggesting that this should be taken into account when assessing the paper’s reliability.

The authors add at this point that this is not a critique of editors' work or of Wikipedia's editorial system. But they later use this example to question the processes in place for handling scientific papers neutrally on Wikipedia.

The treatment of WKM2000 on Wikipedia also reveals a troubling interpretation of "neutrality" in scientific discourse. Wikipedia editors consistently treated the paper as a valid scientific source, even after its ghost-written nature was revealed, arguing that as long as the paper remained in the literature without formal retraction or refutation, it should be cited without caveat.

The article goes on to discuss corporate entities editing Wikipedia despite their conflicts of interest, in violation of WP:COI, to situate WKM2000 within broader corporate manipulation of Wikipedia. But it misses a far more interesting discussion: how high should editors set the bar of proof before referring to controversies about a paper that has not been disproved by later reviews and research? With regard to WKM2000, why would we mention the critique leveled by the authors, but not the European Food Safety Authority's statement about WKM2000? It concluded that "EU experts had access to, and relied primarily on, the findings of the original guideline studies and the underlying raw data to produce their own conclusions", which aligned with those of the paper, and that "even if the allegations were confirmed that these review papers were ghostwritten, there would be no impact on the overall EU assessment and conclusions on glyphosate."

Finally, the authors do not consider the ramifications of their critique of WP:MEDRS. The guideline was developed to protect the integrity of medical information on Wikipedia; going against it would open the door to adding unfounded controversies to other health topics, a heightened concern at a time when longstanding medical research is being called into question by public figures[supp 5].

For instance, WP:MEDRS is used to ensure the reliability of the article on paracetamol: it prevents the inclusion of studies with undisclosed conflicts of interest. As it happens, Wikipedia editors are currently examining (see Talk:Paracetamol#Baccarelli_2025_review) whether a paper recently cited by the Trump administration to question the safety of paracetamol (Tylenol) should be included. That paper was co-authored by Beate Ritz, a public figure well known for critiquing peer-reviewed glyphosate research[supp 6], who... did not disclose her conflicts of interest[supp 7].

In summary, the study fails to fully engage with editors' rationale for retaining WKM2000 and for giving precedence to peer-reviewed academic sources. The authors do suggest that future work could include interviewing Wikipedians.

Eight research projects awarded a total of $315,659 in grants from the 2024-25 Wikimedia Research Fund

By Tilman Bayer

The Wikimedia Foundation has announced the results of the 2024-25 Research Fund round. Eight proposals (out of 61) were funded, with a total budget of $315,659 USD:

Title and link to research project page | Applicants | Organization | Budget | Start–end dates
Extended: Opportunities for Supporting Community-Scale Communication | Cristian Danescu-Niculescu-Mizil | Cornell University | $105,000 USD | August 2025 – August 2027
Informing Memory Institutions and Humanities Researchers of the Broader Impact of Open Data Sharing via Wikidata [project page missing] | Hanlin Li, Nicholas Vincent | The University of Texas at Austin | $49,450 USD | July 15, 2025 – July 14, 2026
Lexeme based approach for the development of technical vocabulary for underserved languages: A case Study on Moroccan Darija | Anass Sedrati, Reda Benkhadra, Mounir Afifi, Jan Hoogland | Kiwix | $25,722 USD | July 1, 2025 – June 30, 2026
Wikipedia and Wikimedia projects in the focus of scientific research – a research community-building event in Ukraine | Anton Protsiuk, Mariana Senkiv, Natalia Lastovets | Wikimedia Ukraine | $12,115 USD | July 1, 2025 – March 31, 2026
Between Prompt and Publish: Community Perceptions and Practices Related to AI-Generated Wikipedia Content | Anwesha Chakraborty, Netha Hussain | N/A | $10,000 USD | October 1, 2025 – August 31, 2026
Establishing a Critical Digital Commons Research Network [project page missing] | Zachary McDowell | University of Illinois at Chicago | $14,300 USD | October 2025 – March 2026
The state of science and Wikimedia: Who is doing what, and who is funding it? | Brett Buttliere, Matthew A. Vetter, Lane Rasberry, Iolanda Pensa, Susanna Mkrtchyan, Daniel Mietchen | University of Warsaw | $49,450 USD | August 1, 2025 – July 30, 2026
Developing a wiki-integrated workflow to build a living review on just sustainability transitions | Adélie Ranville, Romain Mekarni, Rémy Gerbet, Arthur Perret, Finn Årup Nielsen, Dariusz Jemielniak | Wikimédia France | $49,622 USD | September 1, 2025 – August 31, 2026


Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

"Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data"

From the abstract:[2]

"[...] Wikipedia aggregates a large amount of data on chemistry, encompassing well over 20,000 individual Wikipedia pages and serves the general public as well as the chemistry community. Many other chemical databases and services utilize these data [...] We present a comprehensive effort that combines bulk automated data extraction over tens of thousands of pages, semiautomated data extraction over hundreds of pages, and fine-grained manual extraction of individual lists and compounds of interest. We then correlate these data with the existing contents of the U.S. Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database. This was performed with a number of intentions including ensuring as complete a mapping as possible between the Dashboard and Wikipedia so that relevant snippets of the article are loaded for the user to review. Conflicts between Dashboard content and Wikipedia in terms of, for example, identifiers such as chemical registry numbers, names, and InChIs and structure-based collisions such as SMILES were identified and used as the basis of curation of both DSSTox and Wikipedia. [...] This work also led to improved bidirectional linkage of the detailed chemistry and usage information from Wikipedia with expert-curated structure and identifier data from DSSTox for a new list of nearly 20,000 chemicals. All of this work ultimately enhances the data mappings that allow for the display of the introduction of the Wikipedia article in the community-accessible web-based EPA Comptox Chemicals Dashboard, enhancing the user experience for the thousands of users per day accessing the resource."

Using Wikidata to assess the cultural awareness and diversity of text-to-image models

From the abstract:[3]

"[...] we introduce a framework to evaluate cultural competence of T2I models along two crucial dimensions: cultural awareness and cultural diversity, and present a scalable approach using a combination of structured knowledge bases and large language models to build a large dataset of cultural artifacts to enable this evaluation. In particular, we apply this approach to build CUBE (CUltural BEnchmark for Text-to-Image models), a first-of-its-kind benchmark to evaluate cultural competence of T2I models. CUBE covers cultural artifacts associated with 8 countries across different geo-cultural regions and along 3 concepts: cuisine, landmarks, and art."

From the paper:

"We use WikiData [...] to extract cultural artifacts, as it is the world’s largest publicly available knowledge base, with each entry intended to be supported by authoritative sources of information. We [..] traverse the WikiData dump of April 2024, by first manually identifying root nodes [...], a small seed set of manually selected nodes that represent the concept in question. For example, the node ’dish’ (WikiID: Q746549) is identified as a root node for the concept ’cuisine’. We then look for child nodes that lie along the ’instance of’ (P31) and ’subclass of’ (P279) edges; e.g. ’Biriyani’,(Q271555), a popular dish from India, is a child node of ’dish’ along the ’instance of’ edge. The child nodes that have the ’country-of-origin’ (P495) or the ’country’ (P17) are extracted at the iteration.

"WikiDO": A benchmark for vision-language models, derived from the "Wikipedia Diversity Observatory"

From the abstract:[4]

"Cross-modal (image-to-text and text-to-image) retrieval is an established task used in evaluation benchmarks to test the performance of vision-language models (VLMs). [...] we introduce WikiDO (drawn from Wikipedia Diversity Observatory), a novel cross-modal retrieval benchmark to assess the OOD generalization capabilities of pretrained VLMs. This consists of newly scraped 380K image-text pairs from Wikipedia with domain labels, a carefully curated, human-verified a)in-distribution (ID) test set (3K) and b) OOD test set (3K). The image-text pairs are very diverse in topics and geographical locations. [...] Our benchmark is hosted as a competition at https://kaggle.com/competitions/wikido24 with public access to dataset and code.

See also presentation slides from NeurIPS 2024
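
For readers unfamiliar with the task: cross-modal retrieval benchmarks of this kind are typically scored with recall@k, i.e. whether the matching caption appears among a model's top-k ranked candidates for an image (and vice versa). A minimal sketch of that metric, not WikiDO's actual evaluation code:

    import numpy as np

    def recall_at_k(sim, k=5):
        # sim[i, j]: similarity of image i to caption j; the ground truth for
        # image i is assumed to be caption i (the usual benchmark convention).
        ranks = np.argsort(-sim, axis=1)  # caption indices, most similar first
        gt = np.arange(sim.shape[0])[:, None]
        return (ranks[:, :k] == gt).any(axis=1).mean()

    # Toy usage: random similarities for 100 image-caption pairs
    rng = np.random.default_rng(0)
    print(recall_at_k(rng.standard_normal((100, 100)), k=5))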

"Class Granularity": Wikidata- and Wikipedia-derived knowledge graphs "represent the real world" less "richly"

From a preprint titled "Class Granularity: How richly does your knowledge graph represent the real world?":[5]

"[...] we propose a new metric called Class Granularity, which measures how well a knowledge graph is structured in terms of how finely classes with unique characteristics are defined. Furthermore, this research presents potential impact of Class Granularity in knowledge graph's on downstream tasks."
"Class Granularity is a metric that can measure how detailed the ontology of a knowledge graph is and how well it reflects the actual knowledge graph composed of RDF triples."

"In this study, we provide Class Granularity for Wikidata, DBpedia, YAGO, and Freebase, allowing us to compare the level of granularity in LOD (Linked Open Data) that has not been addressed in previous research."

In lieu of Wikidata itself, the authors, three researchers from Naver, used their own company's knowledge graph, Raftel, "constructed by consolidating Wikidata’s ontology", which contains 28,652,479 instances (far fewer than the over 109 million items Wikidata currently holds). The results indicate that Wikidata, or at least its Raftel derivative, has lower granularity than Freebase and YAGO, though still higher than the (Wikipedia-derived) DBpedia:

Table 6: Metric comparison of LOD and Raftel
Dataset | Classes | Predicates | Instances | Triples | Avg. predicates per class | Granularity
DBpedia | 472 | 33,457 | 6,570,879 | 60,451,631 | 599 | 0.0904
YAGO | 111 | 133 | 64,611,470 | 461,321,787 | 24 | 0.1708
Freebase | 7,425 | 769,935 | 115,755,706 | 961,192,099 | 278 | 0.3964
Raftel | 287 | 1,079 | 28,652,479 | 298,359,151 | 132 | 0.1400

Wikimedians can derive consolation from the authors' caveat that "[h]aving a high Class Granularity doesn’t necessarily imply superiority", although they maintain that "it does provide a way to gauge how well classes possess distinct characteristics beyond just their quantity, which is often hard to evaluate solely based on the number of classes and predicates."
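
The paper's Class Granularity formula is not reproduced in the abstract, so we do not attempt to implement it here. The table's simpler "avg. predicates per class" column, however, can be computed directly from a triple dump; a small rdflib sketch under one plausible reading of that column (the dump file name is a hypothetical stand-in):

    from collections import defaultdict
    from rdflib import Graph, RDF

    # One plausible reading of "avg. predicates per class": the number of
    # distinct predicates used on instances of each class, averaged over
    # classes. Not the paper's Class Granularity metric itself.
    g = Graph()
    g.parse("kg_sample.nt", format="nt")  # hypothetical N-Triples dump

    preds_per_class = defaultdict(set)
    for s, _, cls in g.triples((None, RDF.type, None)):
        for _, p, _ in g.triples((s, None, None)):
            if p != RDF.type:
                preds_per_class[cls].add(p)

    avg = sum(len(ps) for ps in preds_per_class.values()) / max(len(preds_per_class), 1)
    print(f"average predicates per class: {avg:.1f}")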

"KGPrune: a Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning"

From the abstract:[6]

"[...] not all knowledge represented [in knowledge graphs] is useful or pertaining when considering a new application or specific task. Also, due to their increasing size, handling large KGs in their entirety entails scalability issues. These two aspects asks for efficient methods to extract subgraphs of interest from existing KGs. To this aim, we introduce KGPrune, a Web Application that, given seed entities of interest and properties to traverse, extracts their neighboring subgraphs from Wikidata. To avoid topical drift, KGPrune relies on a frugal pruning algorithm based on analogical reasoning to only keep relevant neighbors while pruning irrelevant ones. The interest of KGPrune is illustrated by two concrete applications, namely, bootstrapping an enterprise KG and extracting knowledge related to looted artworks."

The tool can be accessed online via browser and via an API.
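
To make the input/output shape concrete, here is a one-hop sketch of subgraph extraction from Wikidata given seed entities and properties, using the standard wbgetclaims API. It illustrates only the traversal step, not KGPrune's analogical pruning or its own API, and the example IDs are ours:

    import requests

    def neighbors(qid, props):
        """One-hop neighbours of a Wikidata item along the given properties.
        Traversal only; KGPrune's analogical pruning is not reproduced here."""
        r = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={"action": "wbgetclaims", "entity": qid, "format": "json"},
            headers={"User-Agent": "subgraph-sketch/0.1 (example)"},
        ).json()
        triples = []
        for prop, claims in r.get("claims", {}).items():
            if prop not in props:
                continue
            for claim in claims:
                value = claim["mainsnak"].get("datavalue", {}).get("value")
                if isinstance(value, dict) and value.get("entity-type") == "item":
                    triples.append((qid, prop, "Q%d" % value["numeric-id"]))
        return triples

    # e.g. neighbours of the Mona Lisa (Q12418) along creator (P170) and collection (P195)
    print(neighbors("Q12418", {"P170", "P195"}))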

References

  1. ^ Kaurov, Alexander A.; Oreskes, Naomi (2025-09-01). "The afterlife of a ghost-written paper: How corporate authorship shaped two decades of glyphosate safety discourse". Environmental Science & Policy. 171: 104160. doi:10.1016/j.envsci.2025.104160. ISSN 1462-9011.
  2. ^ Sinclair, Gabriel; Thillainadarajah, Inthirany; Meyer, Brian; Samano, Vicente; Sivasupramaniam, Sakuntala; Adams, Linda; Willighagen, Egon L.; Richard, Ann M.; Walker, Martin; Williams, Antony J. (2022-10-24). "Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data". Journal of Chemical Information and Modeling. 62 (20): 4888–4905. doi:10.1021/acs.jcim.2c00886. ISSN 1549-9596.
  3. ^ Kannen, Nithish; Ahmad, Arif; Andreetto, Marco; Prabhakaran, Vinodkumar; Prabhu, Utsav; Dieng, Adji B.; Bhattacharyya, Pushpak; Dave, Shachi (2024-12-16). "Beyond Aesthetics: Cultural Competence in Text-to-Image Models". Advances in Neural Information Processing Systems. 37: 13716–13747.
  4. ^ Kalyan, T. P.; Pasi, Piyush S.; Dharod, Sahil N.; Motiwala, Azeem A.; Jyothi, Preethi; Chaudhary, Aditi; Srinivasan, Krishna (2024-12-16). "WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models". Advances in Neural Information Processing Systems. 37: 140812–140827.
  5. ^ Seo, Sumin; Cheon, Heeseon; Kim, Hyunho (2024-11-10). "Class Granularity: How richly does your knowledge graph represent the real world?". arXiv preprint. doi:10.48550/arXiv.2411.06385.
  6. ^ Monnin, Pierre; Nousradine, Cherif-Hassan; Jarnac, Lucas; Zuckerman, Laurel; Couceiro, Miguel (2024-10-19). KGPrune: a Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning. ECAI 2024 - 27th European Conference on Artificial Intelligence. IOS Press. doi:10.3233/FAIA241038.
Supplementary references and notes:



Discuss this story

These comments are automatically transcluded from this article's talk page. To follow comments, add the page to your watchlist. If your comment has not appeared here, you can try purging the cache.

Eight research projects awarded a total of $315,659 in grants from the 2024-25 Wikimedia Research Fund

@HaeB:, thanks for this article. So these projects have been funded and the work on them will start, right? This makes me interested in the previous batch. For example there was this initiative. Has anyone measured the impact? What was the output? Were specific measures suggested/adopted? Has the gap grown or shrunk? Alaexis¿question? 13:03, 5 October 2025 (UTC)[reply]

Good questions. Note that each project has a research project page on Meta-wiki (or should have - as mentioned in the article, two of the projects in the current round are still lacking one, eleven weeks after their funding was publicly announced on July 15).
In case of the project from the previous round that you are asking about, its project page is here, and it looks like this edit by User:Rehamaltamime is the most recent update.
Regards, HaeB (talk) 21:56, 5 October 2025 (UTC)[reply]
@HaeB, thanks for the links. I think it's safe to assume that the research was not concluded by June 30 as planned. It would be interesting to look at all the last year's grants to see whether it's an exception or not. Do you know if anyone has done it already? Could it be Signpost-worthy? Alaexis¿question? 06:33, 6 October 2025 (UTC)[reply]
Absolutely worth covering, yes. I had already included an overview of the status of the 2022–23 round earlier this year here when covering the announcement of the 2024-25 round opening: Wikipedia:Wikipedia_Signpost/2025-03-22/Recent_research#Applications_are_open_for_the_2025_Wikimedia_Research_Fund (as remarked then, that table might also shed some light on possible reasons for [some changes WMF announced for the 2024-25 round] – e.g. it appears that several projects struggled to complete work within 12 months). At that point, the projects from the 2023-24 round like the one you mention were still in progress (I understand that the years refer to the WMF fiscal year where the funding was budgeted, not the timespan of the research project itself).
So you would be welcome to cover the outcomes of the 2023-24 round for our readers (here is our Etherpad for coordinating coverage in the next issue). Ideally this would include not just information on which projects were completed on time (as I did with that table in March) but also a summary of (what you deem to be) the most interesting research findings and publications resulting from each.
Regards, HaeB (talk) 06:57, 6 October 2025 (UTC)[reply]

Is Wikipedia a merchant of (non-)doubt for glyphosate?

Regarding the parallels between the controversies about glyphosate/Roundup and about Tylenol/autism that the review alludes to (featuring undisclosed researcher COIs in both cases), I happened to see this recent NYT opinion article by Michael Grunwald which likewise connects these two:

President Trump’s health secretary and Make America Healthy Again leader Robert F. Kennedy Jr., has condemned [glyphosate] as a poison fueling a disease crisis. Mr. Trump’s nominee for surgeon general, Dr. Casey Means, wrote on X that it’s driving a “slow-motion extinction event,” begging her followers: “For the love of God never buy Roundup.” In May, the administration’s initial MAHA report on childhood disease linked glyphosate to “a range of possible health effects,” from cancer to ominous “metabolic disturbances.”

[...] This debate is what happens when politics, vibes and hysteria drown out science, facts and data. There’s no weighing of benefits versus costs, much less any subtler distinction between hazards and risks. [...] Many liberals repulsed by Mr. Kennedy’s unscientific bias against vaccines and Tylenol share his unscientific bias against agri-chemicals, genetically modified organisms and industrial agriculture. [...] The Environmental Protection Agency, the European Food Safety Authority and regulators in Canada, Japan and Australia have all concluded [glyphosate is] safe for humans. [...]

This is a scientific truism that MAHA misses: The dose makes the poison. You shouldn’t swallow an entire bottle of Tylenol, but it’s a safe product, and it would take a higher dose of glyphosate than Tylenol to kill someone.

On the other hand, Grunwald also mentions that "the administration’s follow-up strategic plan in September didn’t mention glyphosate. It didn’t propose any tighter regulations of any agricultural chemicals. Now many MAHA activists believe that Mr. Kennedy has abandoned his principles to appease Mr. Trump’s agricultural donors [...]." In their Undark opinion article about the paper reviewed here, Kaurov and Oreskes similarly highlight the possible impact of the Trump administration's antiregulatory stance on an EPA "decision on the use of America’s most widely used herbicide, glyphosate" scheduled for next year, but fail to mention how well their own promotion of doubt about (what Wikipedia presents as) the scientific consensus on the safety of glyphosate aligns with the stance of other parts of the Trump administration.

Regards, HaeB (talk) 03:14, 6 October 2025 (UTC)[reply]




The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0