The Signpost

Technology report

Wikimedia Foundation's Abstract Wikipedia project "at substantial risk of failure"

By Tilman Bayer

Could Abstract Wikipedia fail?

Members of the Foundation's Abstract Wikipedia team with the Google.org Fellows and others at an offsite in Switzerland this August. Left-hand side of the table, from front to back: Ariel Gutman, Ori Livneh, Maria Keet, Sandy Woodruff, Mary Yang, Eunice Moon. At head of table: Rebecca Wambua. Right-hand side of the table, front to back: Olivia Zhang, Denny Vrandečić, Edmund Wright, Dani de Waal, Ali Assaf, James Forrester

In 2020, the Wikimedia Foundation began working on Abstract Wikipedia, which is envisaged to become the first new Wikimedia project since Wikidata's launch in 2012, accompanied and supported by the separate Wikifunctions project. Abstract Wikipedia is "a conceptual extension of Wikidata", where language-independent structured information is rendered in an automated way as human-readable text in a multitude of languages, with the hope that this will vastly increase access to Wikipedia information in hitherto underserved languages. Both Abstract Wikipedia and Wikifunctions are the brainchild of longtime Wikimedian Denny Vrandečić, who also started and led the Wikidata project at Wikimedia Deutschland before becoming a Google employee in 2013, where he began to develop these ideas before joining the Wikimedia Foundation staff in 2020 to lead their implementation.

An evaluation published earlier this month calls the project's future into question:

"This is a sympathetic critique of the technical plan for Abstract Wikipedia. We (the authors) are writing this at the conclusion of a six-month Google.org Fellowship, during which we were embedded with the Abstract Wikipedia team, and assisted with the development of the project. While we firmly believe in the vision of Abstract Wikipedia, we have serious concerns about the design and approach of the project, and think that the project faces a substantial risk." [...]

"We find [Abstract Wikipedia's] vision strongly compelling, and we believe that the project, while ambitious, is achievable. However, we think that the current effort (2020–present) to develop Abstract Wikipedia at the Wikimedia Foundation is at substantial risk of failure, because we have major concerns about the soundness of the technical plan. The core problem is the decision to make Abstract Wikipedia depend on Wikifunctions, a new programming language and runtime environment, invented by the Abstract Wikipedia team, with design goals that exceed the scope of Abstract Wikipedia itself, and architectural issues that are incompatible with the standards of correctness, performance, and usability that Abstract Wikipedia requires."

That Fellowship was part of a program by Google.org (the philanthropy organization of the for-profit company Google) that enables Google employees to do pro-bono work in support of non-profit causes. The Fellow team's tech lead was Ori Livneh, himself a longtime Wikipedian and former software engineer at the Wikimedia Foundation (2012–2016), where he founded and led the Performance Team before joining Google. The other three Google Fellows who authored the evaluation are Ariel Gutman (holder of a PhD in linguistics and author of a book titled "Attributive constructions in North-Eastern Neo-Aramaic", who also published a separate "goodbye letter" summarizing his work during the Fellowship), Ali Assaf, and Mary Yang.

The evaluation examines a long list of issues in detail, and ends with a set of recommendations centered around the conclusion that –

"Abstract Wikipedia should be decoupled from Wikifunctions. The current tight coupling of the two projects together has a multiplicative effect on risk and substantially increases the risk of failure."

Among other things, the Fellows caution the Foundation not to "invent a new programming language. The cost of developing the function composition language to the required standard of stability, performance, and correctness is large ..." They propose instead that Wikifunctions should "extend, augment, and refine the existing programming facilities in MediaWiki", with the initial version being "a central wiki for common Lua code", and that "it is better to base the system on an existing, proven language".

Regarding Abstract Wikipedia, the recommendations likewise center on limiting complexity and aiming to build on existing open-source solutions if possible, in particular for the NLG (natural language generation) part responsible for converting the information expressed in the project's language-independent formalism into a human-readable statement in a particular language:

  • "Rather than present to users a general-purpose computation system and programming environment, provide an environment specifically dedicated to authoring abstract content, grammars, and NLG renderers in a constrained formalism.
  • Converge on a single, coherent approach to NLG.
  • If possible, adopt an extant NLG system and build on it."

The Foundation's answer

A response authored by eight Foundation staff members from the Abstract Wikipedia team (published simultaneously with the Fellows' evaluation) rejects these recommendations. They begin by acknowledging that although "Wikidata went through a number of very public iterations, and faced literally years of criticism from Wikimedia communities and from academic researchers[, the] plan for Abstract Wikipedia had not faced the same level of public development and discussion. [...] Barely anyone outside of the development team itself has dived into the Abstract Wikipedia and Wikifunctions proposal as deeply as the authors of this evaluation."

However, Vrandečić's team then goes on to reject the evaluation's core recommendations, presenting the expansive scope of Wikifunctions (a universal repository of general-purpose functions) as a done deal mandated by the Board (the Wikimedia Foundation's top decision-making authority), and accusing the Google Fellows of "fallacies" rooted in "misconception":

"The Foundation's Board mandate they issued to us in May 2020 was to build the Wikifunctions new wiki platform (then provisionally called Wikilambda) and the Abstract Wikipedia project. This was based on the presentation given to them at that meeting (and pre-reading), and publicly documented on Meta. That documentation at the time very explicitly called [it] out as "a new Wikimedia project that allows to create and maintain code" and that the contents would be "a catalog of all kind[s] of functions", on top of which there would "also" (our emphasis) be code for supporting Abstract Wikipedia.

The evaluation document starts out from this claim – that Wikifunctions is incidental to Abstract Wikipedia, and a mere implementation detail. The idea that Wikifunctions will operate as a general platform was always part of the plan by the Abstract Wikipedia team.

This key point of divergence sets up much of the rest of this document [i.e. the evaluation] for fallacies and false comparisons, as they are firmly rooted in, and indeed make a lot of sense within, the reality posed by this initial framing misconception."

(The team doesn't elaborate on why the Foundation's trustees shouldn't be able to amend that May 2020 mandate if, two and a half years later, its expansive scope does indeed risk causing the entire project to fail.)

The evaluation report and the WMF's response are both lengthy (at over 6,000 and over 10,000 words, respectively), replete with technical and linguistic arguments and examples that are difficult to summarize here in full. Interested readers are encouraged to read both documents in their entirety. Nevertheless, below we attempt to highlight and explain a few key points made by each side, and to illuminate the underlying principal tensions about decisions that are likely to shape this important effort of the Wikimedia movement for decades to come.

What is the scope of the new "Wikipedia of functions"?

In an April 2020 article for the Signpost (published a few weeks before the WMF board approved his proposal), Vrandečić explained the concept of Abstract Wikipedia and a "wiki for functions" using an example describing political happenings involving San Francisco mayor London Breed:

"Instead of saying "in order to deny her the advantage of the incumbent, the board votes in January 2018 to replace her with Mark Farrell as interim mayor until the special elections", imagine we say something more abstract such as elect(elector: Board of Supervisors, electee: Mark Farrell, position: Mayor of San Francisco, reason: deny(advantage of incumbency, London Breed)) – and even more, all of these would be language-independent identifiers, so that thing would actually look more like Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)).

[...] We still need to translate [this] abstract content to natural language. So we would need to know that the elect constructor mentioned above takes the three parameters in the example, and that we need to make a template such as {elector} elected {electee} to {position} in order to {reason} (something that looks much easier in this example than it is for most other cases). And since the creation of such translators has to be made for every supported language, we need to have a place to create such translators so that a community can do it.

For this I propose a new Wikimedia project [...] to create, maintain, manage, catalog, and evaluate a new form of knowledge assets: functions. Functions are algorithms, pieces of code, that translate input into output in a determined and repeatable way. A simple function, such as the square function, could take the number 5 and return 25. The length function could take a string such as "Wikilambda" and return the number 10. Another function could translate a date in the Gregorian calendar to a date in the Julian calendar. And yet another could translate inches to centimeters. Finally, one other function, more complex than any of those examples, could take an abstract content such as Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)) and a language code, and give back the text "In order to deny London Breed the incumbency advantage, the Board of Supervisors elected Mark Farrell Mayor of San Francisco." Or, for German, "Um London Breed den Vorteil des Amtsträgers zu verweigern, wählte der Stadtrat Mark Farrell zum Bürgermeister von San Francisco."

Wikilambda will allow contributors to create and maintain functions, their implementations and tests, in a collaborative way. These include the available constructors used to create the abstract content. The functions can be used in a variety of ways: users can call them from the Web, but also from local machines or from an app. By allowing the functions in Wikilambda to be called from wikitext, we also allow to create a global space to maintain global templates and modules, another long-lasting wish by the Wikimedia communities."
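To make the quoted example concrete, the following toy sketch in Python (the data layout, constructor name and template string are invented here for illustration and are not the project's actual model) shows how a language-independent "constructor" plus a per-language "renderer" could turn abstract content into a readable sentence:

    # Toy sketch: language-independent abstract content plus one renderer.
    # The field names and template are hypothetical, not Wikifunctions' model.

    abstract_content = {
        "constructor": "elect",
        "elector": "Board of Supervisors",
        "electee": "Mark Farrell",
        "position": "Mayor of San Francisco",
        "reason": "to deny London Breed the incumbency advantage",
    }

    # One renderer per language and constructor; here just a format string.
    ENGLISH_ELECT_TEMPLATE = (
        "In order {reason}, the {elector} elected {electee} {position}."
    )

    def render_en(content: dict) -> str:
        """Render the 'elect' constructor into an English sentence."""
        return ENGLISH_ELECT_TEMPLATE.format(**content)

    print(render_en(abstract_content))
    # -> In order to deny London Breed the incumbency advantage, the Board of
    #    Supervisors elected Mark Farrell Mayor of San Francisco.

A real renderer has to do everything this toy skips: choosing and inflecting the right lexemes for each language, and handling word order, case and agreement, which is where the natural language generation difficulties discussed below come in.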

In other words, the proposal for what is now called Wikifunctions combined two kinds of functions required for Abstract Wikipedia ("constructors" like the elect example and "translators" or renderers for natural language generation that produce the human-readable Wikipedia text) with a much more general "new form of knowledge assets", functions or algorithms in the sense of computer science. While the examples Vrandečić highlighted in that April 2020 Signpost article are simple calculations such as unit conversions that are already implemented on Wikipedia today using thousands of Lua-based templates (e.g. {{convert}} for the inches-to-centimeters conversion), the working paper published earlier that month (recommended in his Signpost article for "technical aspects") evokes a much more ambitious vision:

Imagine that everyone could easily calculate exponential models of growth, or turn Islamic calendar dates into Gregorian calendar dates. That everyone could pull census data and analyze the cities of their province or state based on the data they are interested in. [...] To analyze images, videos, and number series that need to stay private. To allow everyone to answer complex questions that today would require coding and a development environment and having a dedicated computer to run. To provide access to such functionalities in every programming language, no matter how small or obscure. That is the promise of Wikilambda.

Wikilambda will provide a comprehensive library of functions, to allow everyone to create, maintain, and run functions. This will make it easier for people without a programming background to reliably compute answers to many questions. Wikilambda would also offer a place where scientists and analysts could create models together, and share specifications, standards, or tests for functions. Wikilambda provides a persistent identifier scheme for functions, thus allowing you to refer to the functions from anywhere with a clear semantics. Processes, scientific publications, and standards could refer unambiguously to a specific algorithm.

Also, the creation of new development environments or programming languages or paradigms will become easier, as they could simply refer to Wikilambda for a vast library of functions. [...]

Indeed, a function examples list created on Meta-Wiki in July 2020 already lists much more involved cases than unit conversion functions, e.g. calculating SHA256 hashes, factorizing integers or determining the "dominant color" of an image. It is not clear (to this Wikimedian at least) whether there will be any limits in scope. Will Wikifunctions become a universal code library eclipsing The Art of Computer Programming in scope, with its editors moderating disputes about the best proxmap sort implementation and patrolling recent changes for attempts to covertly insert code vulnerabilities?
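For a sense of what a catalog entry of this general-purpose kind might involve (a purely hypothetical sketch in Python, not Wikifunctions' actual data model or its composition language), consider a function bundled with community-maintained tests, in the spirit of "functions, their implementations and tests" from the proposal quoted above:

    # Hypothetical sketch of a catalog entry: one function plus its tests.

    def prime_factors(n: int) -> list[int]:
        """Return the prime factorization of n in ascending order."""
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    # Test cases that other contributors could add to or dispute.
    TESTS = {
        12: [2, 2, 3],
        97: [97],                  # a prime maps to itself
        360: [2, 2, 2, 3, 3, 5],
    }

    for value, expected in TESTS.items():
        assert prime_factors(value) == expected

The evaluation's concern is less with any single function of this kind than with maintaining an open-ended catalog of them, in a newly invented language and runtime, alongside the renderers Abstract Wikipedia actually needs.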

This ambitious vision of Wikifunctions as spearheading a democratizing revolution in computer programming (rather than just providing the technical foundation of Abstract Wikipedia) appears to fuel many of the concerns raised in the Fellows' evaluation, and, conversely, to motivate much of the pushback in the Abstract Wikipedia team's answer.

Would adapting existing NLG efforts mean perpetuating the dominance of "an imperialist English-focused Western-thinking industry"?

Another particularly contentious aspect is the Fellows' recommendation to rely on existing natural language generation tools, rather than building them from the ground up in Wikifunctions. They write:

Clearly, as Denny pointed out, the bulk work of creating NLG renderers would fall on the community of Wikipedia volunteers. [...] Hence, the necessity of a collaborative development and computation environment such as Wikifunctions.

While the core argument is correct, the fallacy lies in the scope of the Wikifunctions project, which is intended to cover any conceivable computable function (and also using various implementation languages [...]). Since NLG renderers are specific types of functions (transforming specific data types into text, possibly using specific intermediate linguistic representations) it would suffice to create a platform which allows creating such functions. There are many extant NLG systems, and some, such as Grammatical Framework, already have a vibrant community of contributors. Instead of creating a novel, general computation system such as Wikifunctions, it would suffice to create a collaborative platform which extends one of these existing approaches (or possibly creating a new NLG system adapted to the scope and contributor-profile of Abstract Wikipedia, as suggested by Ariel Gutman).
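The architectural contrast the Fellows draw can be stated in terms of interfaces: a constrained NLG platform fixes the shape of what contributors write, while a general computation system accepts arbitrary functions. The Python sketch below (types and names are invented for illustration; neither registry reflects an actual API of either project) shows the difference in scope:

    # Hypothetical contrast between the two scopes under discussion.
    from typing import Any, Callable

    AbstractContent = dict[str, Any]

    # Constrained platform: every contribution is a renderer with one fixed
    # signature, turning abstract content plus a language code into text.
    Renderer = Callable[[AbstractContent, str], str]
    RENDERERS: dict[str, Renderer] = {}

    def register_renderer(constructor: str, renderer: Renderer) -> None:
        """Register a renderer for a single constructor (e.g. 'elect')."""
        RENDERERS[constructor] = renderer

    # General computation system: any function of any signature is in scope,
    # from hash calculations to image analysis.
    FUNCTIONS: dict[str, Callable[..., Any]] = {}

    def register_function(name: str, fn: Callable[..., Any]) -> None:
        """Register an arbitrary function in the open-ended catalog."""
        FUNCTIONS[name] = fn

The Fellows argue that only the first, narrower kind of platform is needed for Abstract Wikipedia; the team's earlier-quoted reply defends the broader scope as part of the Board's original mandate.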

The Foundation's response argues that this approach would fail to cover the breadth of languages envisaged for Abstract Wikipedia:

Some of our own colleagues, like Maria Keet, have noted the inadequacy of those systems for many of the languages Abstract Wikipedia is intended to serve.

In particular, according to Keet, Grammatical Framework notably lacks the flexibility to handle certain aspects of Niger-Congo B languages’ morphology. The traditional answer to this kind of critique would be to say, “Let’s just organize an effort to work on Grammatical Framework.” The design philosophy behind Abstract Wikipedia is to make as few assumptions about a contributor’s facility with English and programming experience as possible. Organized efforts around Grammatical Framework (or otherwise) are not a bad idea at all. They may well solve some problems for some languages. However, the contributor pools for under-resourced languages are already small. Demanding expertise with specific natural languages like English and also specific programming paradigms contracts those pools still further.

(However, in a response to the response, Keet – a computer science professor at the University of Cape Town who volunteers for Abstract Wikipedia – disputed the Foundation's characterization of her concerns, stating that her "arguments got conflated into a, in shorthand, 'all against GF' that your reply suggests, but that is not the case.")
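To illustrate the kind of grammatical flexibility at issue, without making claims about any particular language, the deliberately simplified Python toy below shows why plain slot-filling templates struggle with noun-class agreement of the sort found in Bantu languages: the verb's form has to be computed from the grammatical class of the subject, so a renderer needs grammatical knowledge rather than just string substitution. The classes and morphemes here are invented placeholders:

    # Deliberately simplified, invented toy: noun-class agreement.
    # Classes and prefixes are placeholders, not data for any real language.

    NOUN_CLASS = {        # every noun belongs to a grammatical class
        "child": 1,
        "children": 2,
        "river": 3,
    }

    SUBJECT_PREFIX = {    # the verb's prefix depends on the subject's class
        1: "u-",
        2: "ba-",
        3: "i-",
    }

    def render_sentence(subject: str, verb_stem: str) -> str:
        """Compute the agreeing verb form instead of pasting fixed strings."""
        prefix = SUBJECT_PREFIX[NOUN_CLASS[subject]]
        return f"{subject} {prefix}{verb_stem}"

    # A fixed template like "{subject} {verb}" cannot express this dependency.
    print(render_sentence("child", "plays"))       # child u-plays
    print(render_sentence("children", "plays"))    # children ba-plays

Whether an existing framework such as Grammatical Framework can express such dependencies for all of the languages Abstract Wikipedia aims to serve is precisely what is in dispute between the Fellows, Keet, and the Abstract Wikipedia team.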

The Abstract Wikipedia team goes on to decry Grammatical Framework as a –

[...] solution designed by a small group of Westerners [that] is likely to produce a system that replicates the trends of an imperialist English-focused Western-thinking industry. Existing tools tell a one-voice story; they are built by the same people (in a socio-cultural sense) and produce the same outcome, which is crafted to the needs of their creators (or by the limits of their understanding). Tools are then used to build tools, which will be again used to build more tools; step by step, every level of architectural decision-making limits more and more the space that can be benefitted by these efforts.

Grammatical Framework would probably give a quick start to creating the basis for a NLG system perfectly suited to "about 45 languages of various linguistic families" – less than 10% of all existing written languages. However, while designing a system that is explicitly focused at covering the knowledge gap of languages that are underrepresented on the Internet, basing the architecture on a framework that gives support to less than the 10% most represented languages defies – by design and from the very start – the bigger purpose of the project.


Discuss this story

These comments are automatically transcluded from this article's talk page.
Boards approve high-level plans, not detailed implementation. Doc James (talk · contribs · email) 06:21, 1 January 2023 (UTC)[reply]

I'm always disappointed when a response is entirely defensive. It offers a terrible start at a dialog, if nothing else, and will predictably divide responses into polarized sides. I sympathize more with the critique than the response, myself, and so perhaps this isn't a neutral reaction, but I really wish the official foundation response had found opportunities to embrace criticism and a few opportunities to admit that a change in direction might be warranted. I.e., a "these are the critiques we feel are more valid" instead of a blanket "none of the critiques are valid". You don't have to agree, but offer a counterproposal at least. Doubling down on the original plan with no changes after the initial years' experience seems to indicate management failure, regardless of the technical merits. No project survives initial implementation completely unchanged. C. Scott Ananian (talk) 15:43, 1 January 2023 (UTC)[reply]

I agree with you that it would be very disappointing if a response were entirely defensive. If I were to solely rely on The Signpost's reporting above, it might easily seem that way. Fortunately, the entire evaluation is available - and it is lengthy, as The Signpost correctly states. As we write in the response: "We have or plan to implement many of the recommendations the fellows have made regarding security, the function model, the system’s usability, etc. Many of those did not make it in either the evaluation or this answer, as both documents focused on the remaining differences, and less on the agreements."
There are a few recommendations we do not agree with. But with many we agreed, and we either already implemented them, sometimes together with the fellows, or have tasks on our task board to implement them, many before launch. --DVrandecic (WMF) (talk) 19:31, 2 January 2023 (UTC)[reply]
Denny, it's very disappointing to see you double down on such deceptive communications tactics here. I'm excerpting below the full set of recommendations from m:Abstract Wikipedia/Google.org Fellows evaluation#Recommendations:

Wikifunctions

  • Wikifunctions should extend, augment, and refine the existing programming facilities in MediaWiki. The initial version should be a central wiki for common Lua code. [...]
  • Don’t invent a new programming language. [...] It is better to base the system on an existing, proven language.
  • The Z-Object system as currently conceived introduces a vast amount of complexity to the system. If Wikifunctions consolidates on a single implementation language (as we believe it should), much of the need for Z-Objects goes away. If there is a need to extend the native type system provided by the chosen implementation language, it should be with a minimal set of types, which should be specified in native code. They likely do not need to be modifiable on wiki.

Abstract Wikipedia

  • The design of the NLG system should start with a specification of the Abstract Content [...].
  • Rather than present to users a general-purpose computation system and programming environment, provide an environment specifically dedicated to authoring abstract content, grammars, and NLG renderers in a constrained formalism.
  • Converge on a single, coherent approach to NLG.
  • If possible, adopt an extant NLG system and build on it. One of two alternatives we mentioned above is Grammatical Framework, which already have a vibrant community of contributors.
  • Alternatively, create a new NLG system adapted to the scope and contributor-profile of Abstract Wikipedia, as previously suggested by Ariel Gutman
As far as I can see, you have dismissed every single one of these 8 recommendations (three for Wikifunctions and five for Abstract Wikipedia). What's more, a CTRL-F through the entire document shows that these are the only statements that the authors refer to as "recommendations" (or where they use the term "recommend").
So Cscott seems quite correct in describing your reaction to the evaluation's criticism as a blanket rejection, at least with regard to the resulting recommendations. The "many of" handwaving to obscure that fact and the goalpost-shifting (nobody had claimed that you had disagreed with every single thing the fellows had said outside this evaluation) really don't look good.
Regards, HaeB (talk) 21:20, 3 January 2023 (UTC)[reply]
@HaeB: Apologies. You are right, we indeed reject these eight recommendations. To explain what I meant: throughout the evaluation the fellows give many suggestions or proposals and raise many a good point, and of these, we accepted many. We have implemented many, and others are currently open tasks and planned to be implemented. I did express myself badly. I apologize for using the wrong word here. It is indeed an excellent point that, if a paper calls a section "Recommendations", then when I refer to recommendations it should mean the points in that section, and not the generic sense of "things that are suggested throughout the paper". Sorry! --DVrandecic (WMF) (talk) 22:53, 4 January 2023 (UTC)[reply]

I feel like the team's response to the criticism kind of missed the mark. The criticism raised some risks, and then suggested some solutions. The response seemed to focus on the suggested solutions and why they didn't go with them originally, which isn't what I would call the meat of the criticism. The meat of the criticism comes down to pretty generic concerns: too much waterfall, not agile enough (e.g. trying to do everything all at once), too much NIH syndrome, too much scope creep, not enough focus on an MVP. These are all very common project management risks to focus on in the tech world, and there are many solutions. The critics suggest one possible thing to do, but certainly not the only thing possible. I would expect a response to something like this to talk about how those risks will be mitigated (or dispute the significance of these risks), not just talk about how they don't like one potential solution. I also am pretty unconvinced by the appeal to avoid cultural bias. Not because I don't think that is important, but because it is being treated as a binary instead of something to minimize. Yes, it's an important thing to try to reduce, but you will never fully eliminate it, as everything anyone does is informed by cultural context. It is a risk that needs to be balanced against other risks. You can't think of it as something to eliminate entirely, as the only way to do that is to do nothing at all. Bawolff (talk) 22:05, 1 January 2023 (UTC)[reply]

These are great points, thank you. The response indeed focused very much on the points of disagreement, and not so much on the agreements. A lot of the things we agreed with have already been implemented, or are in the course of being implemented. This particularly includes the project management risks you call out. It is, for example, thanks to the fellows that we refocused, which allowed us to launch the Wikifunctions Beta during the fellowship. The fellows also contributed to our now much more stable and comprehensive testing setup. We have already reduced scope and are continuing to do so, and to speed up the launch of Wikifunctions proper, in order to focus more on an MVP given where we are now.
Some of the criticisms that are raised though are difficult to fix: we would love to have two dedicated teams, one to work on Wikifunctions, one to work on Abstract Wikipedia, but for that, we do not have the resources available. Other criticisms would have made a lot of sense to discuss in 2020 around the original proposal, but seem less actionable now, given the development in the meantime, e.g. the Python and JavaScript executors are already implemented and running on Beta.
I found the evaluation very helpful. I promise that I will keep the evaluation in mind. We will continue to focus on getting to an MVP and to launch. That is our priority. --DVrandecic (WMF) (talk) 19:52, 2 January 2023 (UTC)[reply]
Agreed with bawolff on "the appeal to avoid cultural bias." I hope the team finds ways to work with / extend GF or equivalent! And hope a global template repo is still one of the core early goals, since it is mentioned prominently in both the initial design and in this critique.
I am delighted to see this depth and clarity of discussion about the scope and impact of a Project, this is something we have been missing across Wikimedia for some time. Thanks to all involved for tackling new ideas substantive enough to warrant this. – SJ + 16:36, 3 January 2023 (UTC)[reply]
There are two kinds of cultural bias involved, really. In terms of content, there is a cultural bias built into Wikidata anyway, just on the basis of Wikidata demographics (Western views, interests, preoccupations, etc.). The linguistic bias, in terms of being able to handle agglutinative or ergative grammars etc., is a different one. I think it will have a negligible impact on community demographics and the amount of content bias there is (I don't foresee large numbers of, say, Niger-Congo language speakers coming in and taking over if their language can be handled well).
Personally, I've always been worried that Wikidata and Abstract Wikipedia will create a sort of digital colonialism, not least because the companies likely to benefit most are all in the US, and multilingual free content is their ticket to dominating new markets currently still closed to them. Andreas JN466 17:00, 3 January 2023 (UTC)[reply]
Leaving aside Wikidata (where the Wikimedia approach has basically succeeded with "semantic web" ideas by intelligent selection), I would say that the Silicon Valley approach to language translation is firmly based at present on machine learning, massive corpus computation, and other empirical ideas. What Abstract Wikipedia intends, as can be seen already in painstaking lexeme work, is so different as almost to be considered orthogonal to current orthodoxy. The outputs from the abstract syntax are heavily conditional. If you can give a formal description of how enough sentences work in language L, and can supply enough accurate translations for nouns, verbs etc. into L from abstracted concepts, you can start getting paragraphs of Wikipedia-like content, typically of assertoric force on factual subjects. All this can generate debate and refinement of the linguistic inputs via L; and possibly cultural feedback too. It seems a long way from quick wins such as machine translation offers now, and the time scale is around ten years to see what "production mode" might mean. (I base some of this on conversations around Cambridge with people having relevant business experience.) Charles Matthews (talk) 12:35, 4 January 2023 (UTC)[reply]
Charles, according to our article on it, the idea for Abstract Wikipedia was first developed in a Google paper (see also HaeB's intro above) and we're discussing the input of Google Fellows seconded to the project, whose stated aim was "to support the backend of Wikifunctions, enabling the team to speed up their schedule" (see Wikipedia:Wikipedia_Signpost/2022-06-26/News_and_notes). So I wouldn't think that Google and others have no interest in this work. Simple articles in languages like Yoruba, Igbo or Kannada, drawing on Wikidata's vast storehouse of data, would surely be a boon to any search engine maker wanting to display knowledge panels in languages that are currently poorly served and have very small online corpora, and the same goes for makers of voice assistants. (Having said that, I wouldn't discount the possibility that machine translation may advance fast enough to significantly reduce the perceived need for something like Abstract Wikipedia.) Andreas JN466 14:08, 4 January 2023 (UTC)[reply]
I wasn't discounting your argument as it applies to Wikidata. I expect Google and other big players (what's the current acronym?) would be interested in AW just because they should "keep an eye on" this part of the field. The approach in machine learning can take decisions well enough from noisy data, can make money and so on. Basically it extends the low-hanging fruit available. Trying to get AW to populate (say) the Luganda Wikipedia in order to create "core" articles in medicine, chemistry and biology is a very different type of project. It is fundamentally about mobilising people rather than machines. Wikimedia should try to get to the point where it is a routine application of known technology.
To get back to the contentious point here: if there was no real need for innovative tech in this area, I would think a rival to AW would have been announced by now (and even a spoiler project started). I would say the type of innovation required probably has unpredictable consequences, rather than predictable ones. It should increase somewhat the connectedness of the Web. By the way, the voice assistant expert I talked to obviously thought the basic approach was wrong. Charles Matthews (talk) 16:06, 4 January 2023 (UTC)[reply]
@Charles Matthews I'm having trouble parsing this part of your reply: "if there was no real need for innovative tech in this area, I would think a rival to AW would have been announced by now (and even a spoiler project started)". If there was no need for innovative tech, someone would have announced a rival?
One interesting example given in the presentation on Meta was the Simple English article on Jupiter (see m:Abstract_Wikipedia/Examples/Jupiter#Original_text). Having CC0 or CC BY-SA (this is a decision the Abstract Wikipedia team has dithered about; last I looked, it was postponed) articles like this available in dozens of languages, spoken collectively by hundreds of millions of people in Asia, Africa, South America and Oceania, would surely be of interest to voice assistant makers. I can't imagine that they would turn their noses up at it, given that they're all over Wikipedia as it is.
The other question is whether articles like this, written in simple English, are actually within reach of machine translation capability today. (DeepL certainly produces perfect translations of that Jupiter article.)
I always thought an alternative – or complementary – approach to serving more languages might be to actually put some energy into Simple English Wikipedia: write texts about key topics in that project that are designed for machine translation and avoid all the things translation engines (and learners of English) have problems with – and then advise users to let emerging translation engines have a go, having them translate articles on the fly, and review what problems remain.
This might be easier and quicker than the WMF and a very limited subset of volunteers coming up with
  • Wikifunctions grammar,
  • thousands of articles written in that grammar – essentially a special meta-language for writing articles only used in that project – and
  • natural-language generators to translate this metalanguage into dozens upon dozens of human languages.
I understand that the idea is to leverage Wikidata content, making ultimately for a far more powerful product; I just fear it might take decades, i.e. so long that by the time it could be done, everybody else will have moved on. Andreas JN466 10:38, 5 January 2023 (UTC)[reply]
@Jayen466: Well, you may even be right about the feasibility of "simple English" as a starting point for some progress in the direction of creating "core" content. I was thinking, rather, of the application of existing linguistic theory to provide some substitute for AW: there were mails early on its list saying "you do realise that certain things are already done/known to be very hard" about the prior art in this field. I don't know that prior art, so I can't comment. The approach being taken is blue skies research. It has my approval for that reason: if once a decade the WMF puts resources behind such a project, that seems about right.
The use of lexemes in AW is a coherent strategy. Wikidata has begun to integrate Commons and Wikisource with the Wikipedias in a way I approve of also. What I wrote below about the actual linguistic approach being adopted is something like "brute-force solution mitigated by the use of functional programming in a good context for it". It hasn't been said explicitly, but seems quite possible as an eventual outcome, that the Module: namespace in Wikipedia would, via Wikifunctions, be broadened out to include much more diverse code than is currently used, with just Lua. That is all back-office infrastructure, but promising for the future of Wikipedia (which is managed conservatively from the tech point of view).
There are people asking all the time for more technical attention across Wikimedia, and they will go on doing that. We see some incremental changes on Wikipedia, of the meat-and-potatoes kind. It seems to me satisfactory that there is an ambitious project launched in 2020 which might look pretty smart by 2030. In any case I come down here on Denny's side, rather than supporting the critics, because it seems care and thought is going into the design, as opposed to a race for quick results. My own limited experience of software pipelines suggests that everyone is a critic. Charles Matthews (talk) 11:20, 5 January 2023 (UTC)[reply]



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0