The Swedish Wikipedia's prolific Lsjbot, which has created a significant proportion of the site's 1.7 million articles and has nearly single-handedly pushed it to being the fourth-largest Wikipedia, was covered in the Wall Street Journal this week.
In its front-page article, the US newspaper reported that the bot has created 2.7 million articles, a figure that apparently includes its output on the Waray-Waray and Cebuano Wikipedias (where Lsjbot is also active), and that "on a good day", it creates 10,000 articles.
The Wall Street Journal's article comes as the Cebuano Wikipedia becomes the twelfth Wikipedia to cross the million-article mark, almost entirely on the strength of these formulaic articles. Of these twelve, over 40% (Swedish, Waray-Waray, Cebuano, Vietnamese, and Dutch) have received significant help from automated article creation scripts. Measured by depth, a rough indicator of quality based on edits per article and the proportion of non-article pages, Vietnamese is the highest of these five, with 18; Swedish follows with 11, and the others are all under ten. By comparison, the German Wikipedia has a depth of 90.
Bot-created articles have proved controversial among Wikimedians; by way of comment, German Wikipedian Achim Raschka pointed the Signpost to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."
Disagreement over such articles even led to a proposal last year that would have banned the overuse of bot-created articles on Wikimedia projects.
Still, the Swedish, Waray-Waray, and Cebuano Wikipedias are not the first to use bots to augment human article creators: in 2007, Volapük and Lombard were expanded by over 100,000 bot articles each; Tagalog saw a similar rise. Lombard editors later placed a moratorium on new automated articles and deleted most of them; the Lombard Wikipedia currently has around 31,000 articles. Volapük is hovering around 120,000, and the Tagalog Wikipedia has close to 63,000.
Waray-Waray, Cebuano, and Tagalog are three of the largest languages of the Philippines. Volapük is a 19th-century constructed language from Germany, and Lombard is a Romance language from northern Italy. Vietnamese is spoken primarily in Vietnam, while Dutch is spoken in the Netherlands, Belgium, and Suriname.
Discuss this story
Tweets from Congress
Could somebody create a Twitter feed showing all the Wikipedia edits from the headquarters of all the Fortune 500 companies? Smallbones(smalltalk) 13:14, 19 July 2014 (UTC)
There is also RuGovEdits (which caught some interesting stuff already). --Tgr (talk) 01:03, 20 July 2014 (UTC)
Bot-created articles
It's very sad to see this trend continuing, especially with species articles. These bots typically use non-specialist databases that simply list every species that has been described, regardless of whether they are currently considered valid. These articles are not maintained by anyone and gradually rot into outdated cruft. And because they are also commonly copied to other Wikipedias, the cruft spreads like a plague throughout the projects. Kaldari (talk) 17:42, 22 July 2014 (UTC)