The Signpost

Recent research

WikiLambda the Ultimate

By e mln e and Tilman Bayer


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


"Wikilambda the ultimate: the Wikimedia foundation’s search for the perfect language"

Reviewed by User:e_mln_e

This paper[1] by Michael Falk (of the WikiHistories project) uses Critical Code Studies methods to examine Wikilambda, the extension of the MediaWiki software that underlies Wikifunctions and Abstract Wikipedia.

"Wikifunctions - Top-level architectural model" (from 2021, by the Wikimedia Foundation, reproduced as figure 1 in the paper)

Wikifunctions, a collaboratively edited library of computer functions, is the newest Wikimedia project, launched in 2023. Abstract Wikipedia, a language-independent version of Wikipedia that the Wikimedia Foundation has been developing since 2020, relies on Wikifunctions and thereby Wikilambda to convert structured data from Wikidata into natural language. In other words, Wikilambda is the programming language that Wikifunctions uses to take the structured data and facts behind Abstract Wikipedia and render them as text in different written languages.

Published in the journal AI & Society, the paper argues that Wikilambda is an attempt to create a ‘perfect language.’ Comparing it to previous attempts to create perfect languages, the paper suggests Wikilambda cannot meet its stated goals, and points to assumptions about its potential users that likely aren't correct.

Definitions

What does the author mean by a perfect language? The article refers to Umberto Eco's 1995 book The Search for the Perfect Language, which looks at various attempts in history to create ideal languages. Umberto Eco (1995, 73) distinguishes two kinds of ideal language: the “perfect” and the “universal.” As described in the article:

A perfect language is one that is “capable of mirroring the true nature of objects.” Such a language must analyse the world into its constituent parts, and provide means to build it back up again. Each word must correspond to a real component of nature, and each syntactic rule must correspond to a way that nature combines primitive elements into complex entities.

A universal language is ideal in a different way: it is a language “which everyone might, or ought to, speak.” Esperanto is an example among the spoken languages. Among programming languages, BASIC, Logo, Python and Scratch are examples of languages that are intended to be universally accessible.

Umberto Eco's book describes many such projects that failed in the past, either because language is not easily severed from symbolism, or because they demanded a significant learning effort without delivering the advantages of connection they promised. For instance, Esperanto never grew into a lingua franca. Researchers[supp 1] note that:

Despite the logical concept and intellectual appeal of a standard language, Esperanto has not evolved into a dominant worldwide language. Instead, English, with all its idiosyncrasies, is closest to an international lingua franca. Like Zamenhof, standards committees in medical informatics have recognized communication chaos and have tried to establish working models, with mixed results. In some cases, previously shunned proprietary systems have become the standard. A proposed standard, no matter how simple, logical, and well designed, may have difficulty displacing an imperfect but functional “real life” system.

Overall argument

Falk argues Wikilambda is an attempt to create two ideal languages:

The proposed “template language” for Abstract Wikipedia is intended to be both perfect and universal: it will be perfectly able to express any fact, and universally accessible by writers all over the world. To implement this “template language,” the Abstract Wikipedia team has gone about developing another perfect and universal language: Wikilambda. This programming language will enable the people of the world to collaborate to build the constructors and renderers that will define and express the sum of human knowledge. According to the Wikilambda developers, Wikilambda is universal because it breaks the hegemony of English; it is perfect because it is not actually a language.

If Wikilambda is indeed an attempt to create ideal languages, it follows that it runs the same risk of failure as the many other such projects documented by Umberto Eco. The article analyzes why.

Article summary

The article opens with a reference to the Signpost's 2023 coverage of an evaluation of Wikilambda, which found the project "at substantial risk of failure"[supp 2].

The article has four sections. After the introduction in Section 1, Section 2 describes Wikilambda and its relationship to Wikifunctions and Abstract Wikipedia (see above), and how it treats language as a conduit, i.e. the idea that "when we speak or write, we pack “content” into a sentence, which is then delivered to a speaker or reader who unpacks the content at the other end." Falk argues that language cannot be reduced in this way, because we rely on metaphors and on differing constructs to understand the world.

In Section 3, Falk discusses Wikilambda itself.

The main argument for Wikilambda’s universality is that it will break the hegemony of English. Most programming languages, observe Wikilambda’s creators, use English as a source of vocabulary. JavaScript has objects, functions and if-statements, rather than Objekte, Funktionen and wenn-statements. Since languages like JavaScript use English words, they force budding programmers to “learn English first” before they learn to program, which is unfair (“Wikifunctions:Vision” 2023). To solve this problem, Wikilambda does not use words to denote parts of a computation. Instead, each part of the computation is assigned a Z-number or Z-key in the Wikifunctions database. When a person visits a function in the Wikifunctions interface, they are presented with a translation of these Z-numbers and Z-keys into their preferred language.

Falk notes that Wikilambda developers justify this as preventing the system from reproducing imperialist, Western thinking[supp 3], which directly contradicts their other belief that language is a simple conduit for facts. Further, he points out that because English is the de facto lingua franca, developer communities turn to it to communicate across languages.
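The Z-number mechanism described in the quoted passage can be illustrated with a minimal Python sketch. It is not Wikilambda code: the identifiers (`Z90001` and so on), the label table and the helper names `label_for` and `render` are all hypothetical, chosen only to show how language-neutral identifiers might be swapped for labels in a reader's preferred language.

```python
# Hypothetical label table. These identifiers and labels are invented for
# illustration and are NOT taken from the Wikifunctions database.
LABELS = {
    "Z90001": {"en": "if", "de": "wenn", "fr": "si"},
    "Z90002": {"en": "function", "de": "Funktion", "fr": "fonction"},
    "Z90001K1": {"en": "condition", "de": "Bedingung"},
}


def label_for(zid, lang, fallback="en"):
    """Return the label for an identifier in `lang`, falling back to
    `fallback`, and finally to the raw identifier itself."""
    names = LABELS.get(zid, {})
    return names.get(lang) or names.get(fallback) or zid


def render(obj, lang):
    """Recursively replace identifiers in keys and string values with labels."""
    if isinstance(obj, dict):
        return {label_for(k, lang): render(v, lang) for k, v in obj.items()}
    if isinstance(obj, list):
        return [render(v, lang) for v in obj]
    if isinstance(obj, str):
        return label_for(obj, lang)
    return obj


# The stored object contains only identifiers; words appear at display time.
stored = {"Z90001K1": "Z90002"}
print(render(stored, "de"))
print(render(stored, "fr"))
```

In this sketch the French rendering falls back to English for the key that has no French label, which hints at the practical role English can retain even in a nominally language-neutral scheme.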

In Section 4, Falk turns to the function orchestrator, examining "What abstractions have the Wikilambda developers invented to describe their new language? What can these abstractions tell us about the nature and intent of their project?" Falk notes that the first metaphor is that of orchestration:

The orchestrate function takes as its input a piece of Wikilambda code (a ZObject), some configuration settings (invariants) and an ImplementationSelector. Its task is to run the given Wikilambda code, using the ImplementationSelector to choose between available “implementations” in the Wikifunctions database. It is this ImplementationSelector that most clearly virtualises the “orchestration” metaphor. Normally, a programming language will have just one way of doing each action: one function for addition, one for integer division, one for instantiating an array, and so on. If there are two ways of doing something, it would normally be up to the programmer to decide: perhaps there are two division routines, one that is fast and approximate and one that is slow but exact, and the programmer can select which one is appropriate for their task. The Wikilambda language is different, because there may be many ways of performing each operation, and it is the orchestrator’s job rather than the programmer’s to choose between them.

He then dives into the specifics of language design, arguing that Wikilambda developers are working to carve out new abstractions so that the language escapes traditional programming metaphors and constructs. He also notes that the language often fails because of its high level of abstraction, and has to fall back on default programming conventions.
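The orchestration idea quoted above, where a selector rather than the programmer chooses among several implementations of the same operation, can be sketched in a few lines of Python. This is an illustrative toy, not the actual orchestrator: the registry, the `cost` field, the `cheapest_first` selector and the shape of the call objects are all assumptions, and the configuration settings ("invariants") mentioned in the quote are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Implementation:
    name: str
    run: Callable[..., float]
    cost: int  # hypothetical ranking metric; lower is preferred


# Hypothetical registry: one operation with two interchangeable implementations.
REGISTRY: Dict[str, List[Implementation]] = {
    "divide": [
        Implementation("exact", lambda a, b: a / b, cost=2),
        Implementation("integer", lambda a, b: float(a // b), cost=1),
    ],
}


def cheapest_first(candidates: List[Implementation]) -> Implementation:
    """A toy selector: pick the implementation with the lowest cost."""
    return min(candidates, key=lambda impl: impl.cost)


def orchestrate(call: dict, selector) -> float:
    """Evaluate a nested call object, letting `selector` choose the
    implementation for every operation it encounters."""
    impl = selector(REGISTRY[call["function"]])
    args = [
        orchestrate(arg, selector) if isinstance(arg, dict) else arg
        for arg in call["args"]
    ]
    return impl.run(*args)


call = {"function": "divide", "args": [7, 2]}
print(orchestrate(call, cheapest_first))
```

The caller states only what should be computed. Swapping in a different selector changes which implementation runs, and hence possibly the result, without touching the call itself.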

Where does that leave us?

The article is a good introduction to the Wikilambda project as a whole and a convincing analysis of its potential failure points. It situates Wikilambda in the history of programming languages, and provides a useful case study of developers' use of metaphors and understanding of language. It also points to contradictions in the project that we should be mindful of. The article concludes with the irony that Wikilambda developers explicitly criticized "One ring to rule them all" approaches[supp 3], yet are implementing one such solution themselves. It also highlights the moral commitments made by the team: they make the entire translation process (structured data, functions, interpreter) transparent, contestable and modifiable by humans. "If nothing else, Wikilambda is a thundering critique of corporate AI hype."

See also

Briefly

"Looking at usage trends across all 11 surveyed Wikipedias from 2024-2025, it's clear that Google and YouTube are again consistently the most-frequently named platforms across survey waves and Wikipedia language editions. However, it is also clear that ChatGPT use for learning and accessing knowledge has grown considerably among Wikipedia readers from 2024-2025, particularly on arwiki, jawiki, kowiki, ptwiki, and ruwiki."

Alongside Google and YouTube, ChatGPT also received the highest favorability ratings among these other sources.

Other findings concern reader demographics, e.g. gender and age:

Consistent with previous findings from 2023 and 2024, Wikipedia readers skew young overall, although this can vary substantially by project. German Wikipedia readers in particular tend to skew older.

Share of Wikipedia readers identifying solely as men, by project (from the survey; compare also our earlier coverage: "Global Gender Differences in Wikipedia Readership")

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

"Generic Geonyms: Exploring Wikidata for Crosslinguistic Prototypical Semantics"

From the abstract:[2]

"[...] data extracted from Wikidata can be interesting for working on geonyms, classifying nouns in place names (e.g., English alley) and their content similarity across languages, e.g., whether Italian piazza and Chinese guǎng chǎng both express the concept ‘square.’ In this paper we explore the use of Wikidata entries to represent the semantic content of geonyms and compare cross-linguistic representations, and thus Wikidata’s potential as a novel, powerful resource for geo-semantic, cross-linguistic research."

"Derivative Relationships and Bibliographic Families Among Creative Works: A Systematic Study of Their Application by the Wikidata Community from the FRBR and BIBFRAME Perspective"

From the abstract:[3]

"This paper examines how the concept of bibliographic families and derivative relationships, foundational to modern bibliographic models like FRBR and BIBFRAME, manifest within Wikidata's community-driven knowledge base. Through systematic analysis of over 2,2 million creative works across audiovisual, musical, literary, and video game domains, we explore the emergent patterns of relationships between works. Our findings reveal that while traditional WEMI relationships represent only 2% of the identified connections, a rich ecosystem of other relationship types dominates the descriptive landscape."

"The New Zealand Thesis Project: Connecting a Nation’s Dissertations Using Wikidata"

From the abstract:[4]

"Nine New Zealand tertiary institutions collaborated with four Wikidata experts to upload a combined national dataset of doctoral and master’s theses. Thesis records, including author and advisor names and richly described with main subject statements, were extracted from each repository, combined, and data cleaned before being uploaded to Wikidata. The team then undertook additional data enrichment, round-tripped Wikidata’s QID identifiers back to individual repositories, and used the new records to cite theses on authors’ Wikipedia pages. Wikidata queries and other visualizations were created to demonstrate how connecting the thesis metadata to records for authors, advisors, institutions, and subjects allows new insights into our collections."

"Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata"

From the abstract:[5]

"In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok ‘Nordic Family Book.’ We focused on the second edition called Uggleupplagan, which comprises 38 volumes and over 182,000 articles. [...]. It showed a higher density within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias."

Two papers from a special issue titled "Wikidata Across the Humanities: Datasets, Methodologies, Reuse":

"Integrating Premodern Manuscript Metadata into Wikidata: A Case Study in Ontology Design and Linked Data Reuse"

From the paper:[6]

"Digital Scriptorium (DS; https://digital-scriptorium.org/) is a national consortium of institutional members who contribute data describing their premodern manuscript holdings to a union catalog of premodern manuscripts owned in North American collections, the DS Catalog. The DS Catalog is built in Wikibase and operates in many ways on the same data principles and organizational structure as Wikidata [...]

Integrating manuscript metadata into Wikidata, however, is not straightforward. Wikidata was not designed with manuscripts in mind, and its flexible but general-purpose schema presents modeling challenges for representing the complexity of premodern manuscript metadata [...]. Descriptive elements like artistic attribution, ambiguous production dates, or multilingual titles in original script require more nuanced representation than current property infrastructure often allows."

"Victims of Posterity. Identifying Gaps on 19th-Century French Art History with Wikidata"

From the abstract:[7]

"This article presents a historiographical investigation of nineteenth-century French art using Wikidata. It draws on a dataset of over 12,000 artists who exhibited at the Paris Salon between 1848 and 1880, each identified and, where possible, aligned with Wikidata entries. This alignment allows for both a quantitative analysis of artists’ posthumous visibility–assessing their presence in Wikidata and the completeness of their entries–and a qualitative evaluation of the data itself. Using OpenRefine, Wikidata entries were compared with specialized sources such as the Getty Research Institute’s Union List of Artist Names, providing insight into the reliability of basic biographical information and the broader documentation available. [...]
Three key patterns emerge: first, women artists remain largely invisible in historiography, reflecting the professional and institutional barriers they faced during their lifetimes. Second, artists highly recognized in their own time tend to maintain substantial posthumous documentation, showing the durability of reputations and the traces historians rely upon. Third, association with modernity is a particularly strong factor in ensuring posthumous recognition [...]"

"Explicit vs. Implicit Biographies: Evaluating and Adapting LLM Information Extraction on Wikidata-Derived Texts"

From the abstract:[8]

"Text Implicitness has always been challenging in Natural Language Processing (NLP), with traditional methods relying on explicit statements to identify entities and their relationships. From the sentence "Zuhdi attends church every Sunday", the relationship between Zuhdi and Christianity is evident for a human reader, but it presents a challenge when it must be inferred automatically. Large language models (LLMs) have proven effective in NLP downstream tasks such as text comprehension and information extraction (IE).

This study examines how textual implicitness affects IE tasks in pre-trained LLMs: LLaMA 2.3, DeepSeekV1, and Phi1.5. We generate two synthetic datasets of 10k implicit and explicit verbalization of biographic information to measure the impact on LLM performance and analyze whether fine-tuning implicit data improves their ability to generalize in implicit reasoning tasks."

From the paper:

"[...] a set of 10,000 random entities from Wikidata was extracted, specifically targeting entities of the Human class (e.g. Vincent Rodriguez III). The entities’ biographical information have been extracted via the Wikidata API, filtering out irrelevant information, such as identification parameters, visual references, and associated technical metadata. As shown in Table 2, 14 triples describe relevant information about the biography of Vincent Rodriguez III (e.g., occupation, country of citizenship, sexual orientation), with 18 values. Our aim is to create two parallel sentences for each person, one that describes a fact or info about them explicitly, and the other implicitly."


References

  1. ^ Falk, Michael (2026-03-11). "Wikilambda the ultimate: the Wikimedia foundation's search for the perfect language". AI & Society. doi:10.1007/s00146-026-02899-w. ISSN 1435-5655.
  2. ^ Samo, Giuseppe; Ursini, Francesco-Alessio (2025-12-18). "Generic Geonyms: Exploring Wikidata for Crosslinguistic Prototypical Semantics". Journal of Open Humanities Data. 11 (1): 77. doi:10.5334/johd.432. ISSN 2059-481X.
  3. ^ Saorín, Tomás; Pastor-Sánchez, Juan-Antonio; Perandones, María Antonia Ovalle (2025-12-24). "Derivative Relationships and Bibliographic Families Among Creative Works: A Systematic Study of Their Application by the Wikidata Community from the FRBR and BIBFRAME Perspective". Proceedings of the International Conference on Dublin Core and Metadata Applications. International Conference on Dublin Core and Metadata Applications. Dublin Core Metadata Initiative. doi:10.23106/dcmi.952592617.
  4. ^ Braisher, Tamsin; Fitchett, Deborah (2025-03-20). "The New Zealand Thesis Project: Connecting a Nation's Dissertations Using Wikidata". Journal of Librarianship and Scholarly Communication. 13 (1). doi:10.31274/jlsc.18295. ISSN 2162-3309.
  5. ^ Ahlin, Axel; Myrne, Alfred; Nugues, Pierre (2024-06-25). "Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata". arXiv:2406.17903 [cs.CL].
  6. ^ McCandless, Rose A.; Coladangelo, L. P. (2025-12-11). "Integrating Premodern Manuscript Metadata into Wikidata: A Case Study in Ontology Design and Linked Data Reuse". Journal of Open Humanities Data. 11 (1): 69. doi:10.5334/johd.431. ISSN 2059-481X.
  7. ^ Beyssat, Claire Dupin de (2025-11-21). "Victims of Posterity. Identifying Gaps on 19th-Century French Art History with Wikidata". Journal of Open Humanities Data. 11 (1): 59. doi:10.5334/johd.399. ISSN 2059-481X.
  8. ^ Stramiglio, Alessandra; Schimmenti, Andrea; Pasqual, Valentina; Erp, Marieke van; Sovrano, Francesco; Vitali, Fabio (2025-09-18). "Explicit vs. Implicit Biographies: Evaluating and Adapting LLM Information Extraction on Wikidata-Derived Texts". arXiv:2509.14943 [cs.CL].
Supplementary references and notes:
  1. ^ Patterson, R.; Huff, S. M. (1999). "The decline and fall of Esperanto: lessons for standards committees". Journal of the American Medical Informatics Association: JAMIA. 6 (6): 444–446. doi:10.1136/jamia.1999.0060444. ISSN 1067-5027. PMC 61387. PMID 10579602.
  2. ^ "Abstract Wikipedia/Google.org Fellows evaluation".
  3. ^ a b "Abstract Wikipedia/Google.org Fellows evaluation - Answer - Meta-Wiki". meta.wikimedia.org. Retrieved 2026-05-21.


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0