With Erysichton elaborata, the Swedish Wikipedia passed the one million article rubicon this week, following closely on the heels of the Spanish Wikipedia last month. While this is a mostly symbolic achievement, serving as a convenient benchmark with which to gain publicity and attention in an increasingly statistical world, the particular method by which the Swedish site has passed the mark has garnered significant attention—and controversy.
The Swedish Wikipedia, alongside the Dutch and much smaller Wikipedias, is one of the few to allow bots—semi-automated or automated programs—to mass-create articles. Using this method has allowed them to leap from about 968,000 articles in May to about 1,044,000 now, with about 454,000 of them being bot-created. This puts them as the fifth-largest Wikipedia, up from ninth just one month ago, and the same method has pushed the Dutch past the Germans, who had long held the title of second-largest Wikipedia. By comparison, the Polish Wikipedia, which had a similar total to the Swedish in May, is now at 973,000 articles.
The Dutch and Swedish totals come despite their far smaller userbases—for example, the Germans have an active userbase that is five times the size of the Dutch and eight times the size of the Swedish. By the same metric, the Polish are twice the size of the Swedish.
The bot-created articles themselves are basic enough: they are about four sentences long, with an infobox and sources from a common database. Each article is tagged with {{Robotskapad}}, a template that notes its origins. Erysichton elaborata, as it stood before it received attention for the achievement it represents, provides an excellent example.
The Signpost contacted the bot operator, Lsj, for his thoughts. He told us that the idea for bot-created articles came from the Dutch Wikipedia and an idea mentioned on the Swedish equivalent of the Village Pump in early 2012. While a "handful" of editors were "adamantly opposed", the great majority were in favor. Several smaller trials were conducted before the large-scale project that led to the millionth article, including on birds and sponges.
He told us that bot-created articles can offer significant benefits to Wikimedia communities: "human minds should not be wasted on mind-numbing tasks that a machine can do equally well. Let the machines do the grunt work, and let humans do what requires real intelligence." Bots are also better and far faster at repetitive tasks than humans, who can inadvertently introduce errors. Any bot errors, which in an ironic twist are typically kindled by human mistakes, can usually be fixed by a second bot run, similar to what Lsjbot will be doing to add images to the biological articles it has created.
The very concept of bot-created articles, though, has garnered significant opposition in the Wikimedia community as a whole, particularly from German Wikipedians. The prominent editor Achim Raschka authored a piece in the German-language news outlet Kurier. He lamented the Swedish Wikipedia's "bitter" milestone, which puts a spotlight on an article that has little more than "their existence and taxonomic pigeonholing" and omits key information like where the species lives or what it does. Raschka told the Signpost that these stub articles impart little useful information to readers—he asks, "who could be helped with [these] fragment[s] of data?" He also pointed at an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima":
“Aguaxima, a plant growing in Brazil and on the islands of South America. This is all that we are told about it; and I would like to know for whom such descriptions are made. It cannot be for the natives of the countries concerned, who are likely to know more about the aguaxima than is contained in this description, and who do not need to learn that the aguaxima grows in their country. It is as if you said to a Frenchman that the pear tree is a tree that grows in France, in Germany, etc. It is not meant for us either, for what do we care that there is a tree in Brazil named aguaxima, if all we know about it is its name? What is the point of giving the name? It leaves the ignorant just as they were and teaches the rest of us nothing. If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all.”
... the bot is always right, uses a neutral language, forms complete sentences, provides verifiable facts and makes no trouble, unlike us human authors. It knows ... correct formatting, rarely [vandalizes], addresses no other authors offensively, sought no barrier tests, never complains and is easily turned off without resistance. There are no bots with gender bias and of course no problems with the author leaving the site. If in any topic people are missing, there is no problem, as the programming of a few new bots by specially trained bots, perhaps with steward rights, proceeds rapidly. They are absolutely reliable even with a vote. ... We simply need to take note: Bots are better Wikipedians, our days are gone. We have only consumption, sex and drugs. But this does not have to be bad, right?
A separate Kurier article by Schlesinger, which hyperbolically compared the bot-created articles to the famous novel Brave New World and claimed that bots can and will replace human editors, is a non sequitur. While bots can create article shells and—as can be seen on the Swedish Wikipedia—even short stubs, they can never be programmed to mass-create detailed articles capable of becoming featured or even good articles.
There was also extensive discussion on the Wikimedia-l mailing list and a Wikipedia blog post.
Lsj was unaware of the wider German-language attacks on bot-created articles, but after examining them, found that they were principally based on deeply held principles, making it difficult or impossible to provide an effective counter-argument.
In reply to Hubertl's sarcastic mailing list post, Lsj commented that the statistics, including view counts, editor numbers, and participation, contradict Hubertl's argument.
Still, a major problem could come from human error. Lsj acknowledges that errors in the source materials could creep into articles, but answers that a second bot run would fix the problem. The obvious reply is simple: what if an error only crops up every so often and is not fixable by bots? What if these errors are not caught until a significant number of articles have been created? A small base of active users may not be able to deal with the required cleanup.
Despite the risks, carefully planned bot-created articles could hold significant benefits for the Wikimedia movement. As Lsj told the Signpost:
“Bots are much faster than people at those tasks that bots can do. It is not realistic to expect articles about 50,000 fungi or 100,000 flies to be hand-written within the foreseeable future. [If our slogan is] "imagine a world in which every single human being can freely share in the sum of all knowledge", bots are the only serious option for approaching that vision in the case of thousands and thousands of obscure organisms. [They provide] proper formatting, sources, infobox, categories, etc., right from the start, unlike many hand-written stubs.”
While German-language Wikipedians lament the loss in quality in these programmatic articles, especially when compared to their stringent biology project guidelines, a short article may be better than none at all. This advantage is particularly apparent in smaller languages, whose Foundation projects have few editors and limited sources of information on the Internet, but far less so for wikis with larger userbases and article counts. It remains to be seen if more wikis will choose to bolster their content in this way.
With little more than a day before voting closes for the WMF elections for three community seats on the ten-member Board of Trustees, fewer than 1700 Wikimedians out of a purported 90,000 active editors have turned out to vote—about one in every 50. This compares with a vote of almost 3500 in the last elections for these two-year seats, in June 2011.
The disappointing rate of participation is despite a lengthy pre-election period and almost two weeks of voting, with banners on all WMF sites and reminder emails sent out. The graph shows the day-by-day vote until the time of publication. The typical spurt of interest followed by a rapid fall-off in numbers occurred twice: once at the opening of voting on 8 June, and once a week later on 15 June, corresponding to the distribution of email notifications. Risker, a member of the volunteer election committee, commented: "It is lower than I would have expected ... It may be that the active community of 2013 is not as interested in the 'meta' aspects of the Wikimedia movement as in the past, as we have mostly followed the same processes as existed over the past several elections. Or it could be something entirely different. It's generally much harder to figure out why people don't do things than why they do them."
Of the 1659 votes cast at the time of writing, 592 (35.7%) are from English-language sites, 221 (13.3%) German, 157 (9.5%) Italian, 153 (9.2%) French, 82 (4.9%) Spanish, 55 (3.3%) Commons, 48 (2.9%) Polish, 41 (2.5%) Chinese, and 310 (18.7%) from all other languages.
Other languages on the radar are Japanese (27 voters) and Indonesian (12)—both welcome signs of the beginnings of a closer engagement with the worldwide movement—and Hebrew (10), Finnish (9), Danish (7), and Norwegian (7).
A notable disappointment is Hindi, with one voter out of some 200 million native speakers and a significant number of second-language speakers—the fourth-most-spoken language in the world—and an active and growing offline movement in the subcontinent.
Arabic, counting all dialects, has well over 400 million speakers, including 300 million native speakers, but managed to garner only four voters; this is despite a marked shift from the English and French Wikipedias to the Arabic Wikipedia in Arabic-speaking countries, and a successful start to a WMF education program in Egyptian universities.
Editors can vote until 23:59 UTC on Saturday 22 June, by clicking on this link to the SecurePoll interface. Instructions on voting and information about candidates are at Meta. The close of voting corresponds to Saturday afternoon to evening in the Americas, before sunrise on Sunday morning in the Subcontinent, and early to late Sunday morning in East Asia and Australia/New Zealand.
“Our school does not have a library at all so when we need to do research we have to walk a long way to the local library. When we get there we have to wait in a queue to use the one or two computers which have the internet. At school we do have 25 computers but we struggle to get to use them because they are mainly for the learners who do CAT (Computer Application Technology) as a subject. Going to an internet cafe is also not an easy option because you have to pay per half hour. 90% of us have cell phones but it is expensive for us to buy airtime so if we could get free access to Wikipedia it would make a huge difference to us.”
Discuss this story
Low voter numbers in WMF elections
The Indonesian voters were partly helped by a campaign via Indonesian Wikipedia Facebook groups. I'm curious, though, how accurate the language count of the voters is if, let's say, I'm voting from Meta? Bennylin (talk) 11:13, 21 June 2013 (UTC)
I tried to vote but gave up. Simply too difficult. Seems it is organised for programmers. Besides that, WMF and so on (local chapters etc) in my experience really does not connect to the average contributor. It does not mean WMF etc is irrelevant, but that for someone who wants to have fun contributing to Wikipedia it is simply a step away from the fun. Best regards Ulflarsen (talk) 15:26, 21 June 2013 (UTC)
A million articles... With a bot
I must say I find it hard to understand why so many people are vehemently opposed to bot-generated stubs. I myself am very much in favor of bots doing the tedious work of creating stub articles for individual species in invertebrate zoology and botany. I do understand the inevitable Immediatist versus Eventualist disagreement on Wikipedia, but still... I am amazed that in Wikipedia (of all places!) so many people intensely dislike the idea of bots creating these helpful little stubs. Once stubs are in place it is extremely easy for relative newcomers to add images or other useful pieces of information. It is a big nuisance to have to create your own stub every time you want to add an image of a species that is not represented, and I think many people who are not very experienced may be put off by that necessity. I know that some people loathe stubs, but until we ban humans from creating stubs (which are much more likely to be error-prone and much harder to fix), I don't see why we should say it is terrible to let bots create them. Invertzoo (talk) 00:06, 21 June 2013 (UTC)
Without wishing to join the debate here, let me just add a few pertinent facts:
Lsj (talk) 16:27, 23 June 2013 (UTC)
The bot won my contribution time
As someone who often tries to add photos of plants or animals that I know almost nothing about (having simply copied their binomial from a label), I completely agree with Invertzoo. I would really prefer not to have to look up all the bits and pieces to make a stub if they can be scripted from a database. If the stub is already there, I can even add a picture to another language wiki. I guess that's why sv:Gudeoconcha sophiae ceb:Gudeoconcha sophiae war:Gudeoconcha sophiae, and sv:Epiglypta howinsulae ceb:Epiglypta howinsulae war:Epiglypta howinsulae have illustrated articles about Gudeoconcha sophiae and Epiglypta howinsulae, species which live(d) solely on an English speaking Island, while the English Wikipedia does not! (Undoubtedly User:Invertzoo, a Gastropod expert, will help me rectify this, but that's not the point.) --99of9 (talk) 10:23, 21 June 2013 (UTC)
I like the idea of bots doing the grunt work and preparing an easy to improve stub for us squishier types of contributors. Starting an article from nothing can be intimidating for people, not to mention hard if you want to do it well, with all the categories, info boxes etc. Also, I find it heartening to see smaller wikipedias expanding this way. I didn't notice if the bots were using information from the English wikipedia to help with the work, but one of the reasons I contribute to the English wikipedia is my hope that some of this great collection of people's good will can one day find its way to Slovene wikipedia as well. --U5K0'sTalkMake WikiLove not WikiWar 18:30, 23 June 2013 (UTC)
Raschka's comments: wording/grammar mess
The following needs someone who actually knows what it's supposed to mean to fix it:
Should it be "were stubs that impart" or "were stubs and impart" or something like that? And the direct quote is [fixed from the original] but winds up still making no sense. Should it be "helped [by these]"? DMacks (talk) 00:44, 21 June 2013 (UTC)
Where voters come from
I noticed the statistics regarding where the voters come from. It does skew the numbers greatly that all links to the SecurePoll page in the Signpost article and the meta page explaining where to vote, are to the English Wikipedia version. If a great number of people have an account there, I wouldn't be surprised if many of them didn't bother changing the URL to their home wiki. -Svavar Kjarrval (talk) 01:07, 21 June 2013 (UTC)
Wikipedians
[Table: columns English, Spanish, Other; surviving entries: ja 346, pt 201, nl 249, sv 114]
Pric-o-pedia
The portrait of Jimmy Wales is hilarious and skillful. I have no idea why anyone would vote to delete the portrait, and the image is a flattering likeness of Wales, so I can't imagine why he objected to it if, indeed, he did. -- Ssilvers (talk) 21:54, 21 June 2013 (UTC)
pl wiki and sv wiki
"By the same metric, the Polish are twice the size of the Swedish." Userbase = active editors?
2526 (Sorry, ceb:Burg was also created by a bot) --Metrónomo (talk) 01:38, 5 July 2013 (UTC)