The Signpost

In focus

Multilingual Wikipedia

By Denny Vrandečić
Denny Vrandečić was the Wikidata director until September 2013 and was a member of the Wikimedia Foundation board of trustees from July 2015 to April 2016. He earned a PhD at the Karlsruhe Institute of Technology. He now works at Google. -S

Wikipedia’s mission is to allow everyone to share in the sum of all knowledge. Wikipedia is in its twentieth year, and it has been a success in many ways. And yet, it still has large knowledge gaps, particularly in language editions with smaller active communities. But not only there – did you know that only a third of all topics that have Wikipedia articles have an article on the English Wikipedia? Did you know that only about half of articles in the German Wikipedia have a counterpart on the English Wikipedia? There are huge amounts of knowledge out there that are not accessible to readers who can read only one or two languages.

And even if there is an article, content is often very unevenly distributed, and where one Wikipedia has long articles with several sections, another Wikipedia might just have a stub. And sometimes, articles contain very outdated knowledge. When London Breed became mayor of San Francisco, nine months later only twenty-four language editions had listed her as such. Sixty-two editions listed out-of-date mayors – and not only Ed Lee, who was mayor from 2011, but also Gavin Newsom, who was mayor from 2004 to 2011, and Willie Brown, who was mayor from 1996 to 2004. The Cebuano Wikipedia even lists Dianne Feinstein, who was mayor from 1978 to 1988, more than a decade before Wikipedia was even created.

This is no surprise, as half of the Wikipedia language editions have fewer than ten active contributors. It is challenging to write and maintain a comprehensive and current encyclopedia with ten people in their spare time. It cannot be expected that those ten contributors keep track of all the cities in the world and update their mayors in Wikipedia. In many cases those contributors would prefer to work on other articles.

Wikidata to the rescue?

This is where Wikidata can help. And in fact, it does: of the twenty-four Wikipedia language editions that listed London Breed as mayor, eight got that information from Wikidata, and were up-to-date because of that. But Wikidata cannot really tell the full story.

Ed Lee, then mayor of San Francisco, died of cardiac arrest in December 2017. London Breed, as the president of the board of supervisors, became acting mayor, but in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special elections to finish the term of Ed Lee were held in June. London Breed won the election and became mayor in July, and she also won the next regular elections a year later.

Now there are many facts in there that can be represented in Wikidata: that there was a special election for the position of the mayor of San Francisco, that it was held in June, that London Breed won that election. That there was an election in 2019. That Mark Farrell held the office from January to July. That Ed Lee died of cardiac arrest in December 2017.

But all of these facts don’t tell a story. Although Wikidata records these facts, they are spread throughout the wiki, and it is very hard to string them together in a way that allows a reader to make sense of them. Even worse, these facts are just a very small set of the billions of such facts in Wikidata, and for a reader it is hard to figure out which are relevant and which are not. Wikidata is great for answering questions, creating graphs, allowing data exploration, or making infobox-like overviews of a topic, but it is really bad at telling even the rather simple story presented above.

We have a solution for this problem, and it’s quite marvelous: language. Language is expressive, it can tell stories, it is predestined for knowledge transfer. But also, there are many languages in the world, and most of us only speak a few of them. This is a barrier for the transfer of knowledge. Here I suggest an architecture to lower this barrier, deeply inspired by the way language works.

Imagine for a moment that we start abstracting the content of a text. Instead of saying "in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special elections", imagine we say something more abstract such as elect(elector: Board of Supervisors, electee: Mark Farrell, position: Mayor of San Francisco, reason: deny(advantage of incumbency, London Breed)) – and even more, all of these would be language-independent identifiers, so that the whole thing would actually look more like Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)). At first glance, this looks much like a statement in Wikidata, but merely by putting that in a series of other such abstract statements, and having some connecting tissue between these bare-bones statements, we are inching much closer to what a full-bodied text needs.
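
As a rough illustration of this idea (a sketch, not part of the proposal itself), such abstract content can be modeled as a nested constructor call. The Python class `Constructor`, the helper `notation`, and the role names `what` and `whom` are hypothetical and introduced only for this example; the text above leaves the two arguments of deny unnamed.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Constructor:
    """A constructor call: a name plus named arguments (hypothetical model)."""
    name: str
    # Each argument is either a plain label or another nested constructor.
    args: Dict[str, Union[str, "Constructor"]]


def notation(node: Union[str, Constructor]) -> str:
    """Write a node back out in the name(role: value, ...) notation."""
    if isinstance(node, str):
        return node
    inner = ", ".join(f"{role}: {notation(arg)}" for role, arg in node.args.items())
    return f"{node.name}({inner})"


# The example from the text; "what" and "whom" are assumed role names.
content = Constructor("elect", {
    "elector": "Board of Supervisors",
    "electee": "Mark Farrell",
    "position": "Mayor of San Francisco",
    "reason": Constructor("deny", {
        "what": "advantage of incumbency",
        "whom": "London Breed",
    }),
})

print(notation(content))
```

In a language-independent version, the labels and constructor names would be replaced by identifiers, as in the Q-number form above.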

A new project: a wiki for functions

But obviously, we wouldn’t show this abstract content to the readers. We still need to translate the abstract content to natural language. So we would need to know that the elect constructor mentioned above takes the parameters shown in the example, and that we need to make a template such as {elector} elected {electee} to {position} in order to {reason} (something that looks much easier in this example than it is for most other cases). And since such translators have to be created for every supported language, we need to have a place to create them so that a community can do it.
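
A minimal sketch of such a translator, assuming one template per constructor and per language, might look as follows. Only the English elect template comes from the text above; the deny templates, the German elect template, the leaf keys, and the label tables are assumptions made up for this illustration. The labels already carry the right article and case, which is exactly the part a real renderer would have to work out itself.

```python
# Abstract content as (constructor name, {role: argument}); an argument is
# either a leaf key or another nested (name, args) pair.
CONTENT = ("elect", {
    "elector": "board",
    "electee": "farrell",
    "position": "mayor",
    "reason": ("deny", {"what": "advantage", "whom": "breed"}),
})

# One template per constructor and language (mostly hypothetical).
TEMPLATES = {
    "en": {
        "elect": "{elector} elected {electee} to {position} in order to {reason}",
        "deny": "deny {whom} the {what}",
    },
    "de": {
        "elect": "Um {reason}, wählte {elector} {electee} zum {position}",
        "deny": "{whom} {what} zu verweigern",
    },
}

# Hand-written labels; case and articles are baked in for simplicity.
LABELS = {
    "en": {
        "board": "the Board of Supervisors",
        "farrell": "Mark Farrell",
        "mayor": "Mayor of San Francisco",
        "advantage": "incumbency advantage",
        "breed": "London Breed",
    },
    "de": {
        "board": "der Stadtrat",
        "farrell": "Mark Farrell",
        "mayor": "Bürgermeister von San Francisco",
        "advantage": "den Vorteil des Amtsträgers",
        "breed": "London Breed",
    },
}


def render(content, lang):
    """Recursively fill the templates for one language."""
    if isinstance(content, str):
        return LABELS[lang][content]
    name, args = content
    rendered = {role: render(arg, lang) for role, arg in args.items()}
    return TEMPLATES[lang][name].format(**rendered)


print(render(CONTENT, "en"))
print(render(CONTENT, "de"))
```

The English result ("... elected Mark Farrell to Mayor of San Francisco ...") is noticeably clumsier than a hand-written sentence, which illustrates why plain string templates are only a starting point.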

For this I propose a new Wikimedia project, preliminarily called Wikilambda (and I am terrible with names, so I do not expect the project to be actually called this). Wikilambda would be a new project to create, maintain, manage, catalog, and evaluate a new form of knowledge assets: functions. Functions are algorithms, pieces of code, that translate input into output in a deterministic and repeatable way. A simple function, such as the square function, could take the number 5 and return 25. The length function could take a string such as "Wikilambda" and return the number 10. Another function could translate a date in the Gregorian calendar to a date in the Julian calendar. And yet another could translate inches to centimeters. Finally, one other function, more complex than any of those examples, could take abstract content such as Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)) and a language code, and give back the text "In order to deny London Breed the incumbency advantage, the Board of Supervisors elected Mark Farrell Mayor of San Francisco." Or, for German, "Um London Breed den Vorteil des Amtsträgers zu verweigern, wählte der Stadtrat Mark Farrell zum Bürgermeister von San Francisco."
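
For illustration only, the simpler functions named above could be written in Python as below. The function names and signatures are assumptions made for this sketch, not a proposed Wikilambda interface, and the calendar conversion uses a standard integer formula based on the Julian Day Number.

```python
from typing import Tuple


def square(x: int) -> int:
    """Return x multiplied by itself."""
    return x * x


def length(text: str) -> int:
    """Return the number of characters in a string."""
    return len(text)


def inches_to_centimeters(inches: float) -> float:
    """Convert inches to centimeters (1 inch is defined as 2.54 cm)."""
    return inches * 2.54


def gregorian_to_julian(year: int, month: int, day: int) -> Tuple[int, int, int]:
    """Convert a Gregorian calendar date to the same day in the Julian calendar."""
    # Step 1: Gregorian date -> Julian Day Number (a running count of days).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045
    # Step 2: Julian Day Number -> Julian calendar date.
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    n = (5 * e + 2) // 153
    return (d - 4800 + n // 10, n + 3 - 12 * (n // 10), e - (153 * n + 2) // 5 + 1)


print(square(5))                          # 25
print(length("Wikilambda"))               # 10
print(gregorian_to_julian(2020, 4, 26))   # (2020, 4, 13)
```

Each function is deterministic: the same input always yields the same output, which is what makes a shared set of tests for every implementation possible.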

Wikilambda will allow contributors to create and maintain functions, their implementations and tests, in a collaborative way. These include the available constructors used to create the abstract content. The functions can be used in a variety of ways: users can call them from the Web, but also from local machines or from an app. By allowing the functions in Wikilambda to be called from wikitext, we also make it possible to create a global space to maintain global templates and modules, another long-standing wish of the Wikimedia communities. This will allow more communities to share expertise and make the life of other projects such as the Content Translation tool easier.

This will allow the individual language communities to use text generated from the abstract content, and fill some of their knowledge gaps. The hope is that writing the functions that translate abstract content, albeit more complex, is also much less work than writing and maintaining a full-fledged encyclopedia. This will also allow smaller communities to focus on the topics they care about – local places, culture, food – and yet to have up-to-date coverage of globally relevant topics.

What do you think?

To make it absolutely clear: this proposal does not call for the replacement of the current Wikipedias. It is meant as an offer to the communities to fill in the gaps that currently exist. It would be presumptuous to assume that a text generated by Wikilambda would ever achieve the brilliance and subtlety that let many of our current Wikipedia articles shine. And although there are several advantages for many parts of the English Wikipedia as well (say for global templates or content that is actually richer in a local language), I would be surprised if the English Wikipedia community started to widely adopt what Wikilambda offers early on. But it seems that it is hard to overestimate the effect this proposal could have on smaller communities, and eventually on our whole movement in order to get a bit closer to our vision of a world in which everyone can share in the sum of all knowledge.

I invite you to read my recently published paper detailing the technical aspects and an upcoming chapter discussing the social aspects of this proposal. I have discussed this proposal with several researchers in many related research areas, with members of different Wikimedia communities, and with folks at the Wikimedia Foundation, to figure out the next steps. I also invite you to discuss this proposal, in the Comments section below, or on Meta, on Wikimedia-l, or with me directly. I am very excited to work toward it and I hope to hear your reservations and your ideas.

Update (May 8, 2020): An official proposal for Wikilambda is now up on Meta. Discussion and support can be expressed there.


Discuss this story

  • I think a first step would be to assess why we don't have an English article matching one on another wiki. Is it solely because of a language barrier? Is the concept covered by a different article? Does the explanation lie in differing notability guidelines between projects? A translate-a-thon is not a bad idea in theory but further analysis would be useful in better understanding the underlying issues. Nikkimaria (talk) 20:29, 26 April 2020 (UTC)
  • Here's a quick query that shows you a few hundred German articles without one in English (should be easy to change the language on that):
Probably even more interesting is this, the list of articles that exist both in the German and Spanish Wikipedia, but not in English:
This is just for starting this investigation, obviously, we should have a much deeper analysis, I agree with that. --denny vrandečić (talk) 20:55, 26 April 2020 (UTC)
  • Just because the article is in German and there's no English equivalent doesn't mean that the article is about a German subject. I've encountered many articles in German about Australian athletes. Hawkeye7 (discuss) 01:23, 27 April 2020 (UTC)
  • I have translated a few articles from German and other languages to English. In some other cases, the article I wanted to "port" had insufficient references per enwp requirements. This is not necessarily a surmountable problem with just more translators. Notability and BLP documentation standards differ between different Wikipedia language editions, and enwp seems to have a relatively high bar. Example, I just created a stub for Paragraf Lex. It has zero references on Serbian Wikipedia. It appears to be somewhat notable; at least, it is cited quite a few times by English Wikipedia. ☆ Bri (talk) 21:00, 26 April 2020 (UTC)
  • Indeed, I'd expect us to have the highest bar as the largest Wikipedia (by some metrics). Lower bars encourage expansion of content and accumulating an editor base, which is good for small Wikipedias, whilst higher bars encourage improving the quality of already-existing content. Nonetheless, we do have systemic biases and no doubt there is a lot of useful translation that can be done, even if the aim is not to make an article for every subject on the German Wikipedia. — Bilorv (talk) 21:18, 26 April 2020 (UTC)
  • When we talk about articles existing in one language Wikipedia not being present in the English language Wikipedia, it doesn't always prove that there is a deficit needing to be addressed. For example, a few years back I was surprised to find the Bulgarian language Wikipedia has an article on every Consul of the Roman Empire known -- who number about 1,400 between 30 BC & AD 235. (en.wikipedia has somewhat more than 1,000.) I was impressed by that, & took a close look at a few ... only to find they were the most basic of stubs, consisting of little more information than "X was a politician of ancient Rome. X was a consul in year A with Y as his colleague", & some fancy templates. (Google translate works wonders in cases like this.) No sense translating stubs like these to the English Wikipedia; we create enough stubs on our own. -- llywrch (talk) 03:40, 27 April 2020 (UTC)
In effect, notability is defined per-language. For any particular article in German Wikipedia, the topic may not be suitable for English Wikipedia, if there are not enough appropriate English-language sources. Bruce leverett (talk) 18:57, 1 May 2020 (UTC)
This is not an issue. According to WP:NOENG, English sources are preferred in the English Wikipedia, but non-English sources are allowed, and this is sensible. As for notability, my understanding is that this Multilingual Wikipedia / Abstract Wikipedia / Wikilambda proposal doesn't intend to force any article to appear in any language, but only to give an easier way to auto-generate basic articles in languages that are interested in them. --Amir E. Aharoni (talk) 12:16, 2 May 2020 (UTC)
Non-English sources are "allowed", but, to repeat what I said above, "German-language sources aren't much help to my English-speaking readers". Yes, I expect people to read the footnotes, and click on them. I understand that for some articles, including some that I have worked on, this isn't an issue. But the implication is that creating a non-stub English-language version of a foreign-language article is more than just running Google translate and fixing up the results -- much more. I'm not scoffing; in many cases of, for example, chess biographies, I have yearned to be able to transplant the knowledge from a foreign-language article to English. But it's only a little less work than writing a new article from scratch. Bruce leverett (talk) 18:42, 2 May 2020 (UTC)
@Bruce leverett: Agreed. There's also nothing that would stop us from using a cite mechanism in the Abstract Wikipedia that prefers sources in the language of the Wiki when available, and only falls back to sources in other languages if none is given. I guess it is still better to have a source in a foreign language than have no source at all, but I totally understand and agree with the idea that sources in the local language should be preferred on display. --denny vrandečić (talk) 20:48, 11 May 2020 (UTC)
"I expect people to read the footnotes, and click on them" You're going to be very disappointed. Not only don't they read the footnotes, sometimes they post questions on the talk page admitting that they didn't read the article. Hawkeye7 (discuss) 23:26, 5 July 2020 (UTC)
@Bahnfrend: But isn't that true for Wikipedia in general? We have people with different skill sets working together. Bots written in Python, templates written with many curly braces, modules in Lua, tables, images, categories, and beautiful natural language text.
The important part is that the actual content can be contributed by many people, because that is where we must make sure that the barrier is low. This is what the project really needs to get right, and it devotes quite a few resources to this challenge.
For Wikilambda itself, yes, that's a very different kind of beast - and will have a different kind of community with a different set of contributors. But they don't have to be the same contributors that contribute to the Content of the Abstract Wikipedias. But again, as in Wikipedia we will have volunteers with different skill sets working together and achieving more than they could alone. --denny vrandečić (talk) 20:53, 11 May 2020 (UTC)
Thank you! --denny vrandečić (talk) 20:49, 11 May 2020 (UTC)
Answered there, thanks! --denny vrandečić (talk) 20:49, 11 May 2020 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0