The Philippines has over 120 languages

The Swedish Wikipedia's prolific Lsjbot, which has created a significant proportion of the site's 1.7 million articles and has nearly single-handedly pushed it to being the fourth-largest Wikipedia, was covered in the Wall Street Journal this week.

In its front-page article, the US newspaper reported that the bot has created 2.7 million articles, which is apparently a figure that includes the Waray-Waray and Cebuano Wikipedias (where Lsjbot is also active), and that "on a good day", it creates 10,000 articles.

The Wall Street Journal's article comes as the Cebuano Wikipedia becomes the twelfth Wikipedia to cross the million-article mark, almost entirely on the strength of these formulaic articles. Of these twelve, five (Swedish, Waray-Waray, Cebuano, Vietnamese, and Dutch), or over 40%, have received significant help from automated article creation scripts. The highest depth of these five is Vietnamese, with 18; Swedish follows with 11, and the others are all under ten. By comparison, the German Wikipedia has a depth of 90.

The practice of creating articles by bot has proved controversial among Wikimedians; by way of comment, German Wikipedian Achim Raschka pointed the Signpost to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all that was known about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."

Disagreement with these edits even led to a proposal last year that would have banned the overuse of bot-created articles on Wikimedia projects.

Still, these are not the first Wikipedias to use bots to augment human article creators: in 2007, Volapük and Lombard were expanded by over 100,000 bot articles each; Tagalog saw a similar rise. Lombard editors later placed a moratorium on new automated articles and deleted most of them; the Lombard Wikipedia currently has around 31,000 articles. Volapük is hovering around 120,000, and the Tagalog Wikipedia has close to 63,000.

Waray-Waray, Cebuano, and Tagalog are three of the largest languages of the Philippines. Volapük is a 19th-century constructed language from Germany, and Lombard is a Romance language from northern Italy. Vietnamese is primarily limited to Vietnam, while Dutch is spoken in the Netherlands, Belgium, and Suriname.

Discuss this story

Tweets from Congress

Could somebody create a Twitter feed showing all the Wikipedia edits from the headquarters of all the Fortune 500 companies? Smallbones(smalltalk) 13:14, 19 July 2014 (UTC)

It would be a great idea. However, not all big companies have their own IP addresses. --NaBUru38 (talk) 18:17, 19 July 2014 (UTC)

There is also RuGovEdits (which caught some interesting stuff already). --Tgr (talk) 01:03, 20 July 2014 (UTC)

Bot-created articles

It's very sad to see this trend continuing, especially with species articles. These bots typically use non-specialist databases that simply list every species that has been described, regardless of whether they are currently considered valid. These articles are not maintained by anyone and gradually rot into outdated cruft. And because they are also commonly copied to other Wikipedias, the cruft spreads like a plague throughout the projects. Kaldari (talk) 17:42, 22 July 2014 (UTC)

I started an en.wikipedia article on Lsjbot. --agr (talk) 18:48, 23 July 2014 (UTC)
Quite so. Off Wikipedia, there are what appear to be "click bait" sites that generate similar webpages from the US GNIS database. So you google for "Obscure Lake", and the webpage that pops up is click bait that tells you that it's in such-and-such a topographic map quadrant, which you knew already. After a few times, one recognizes them and avoids them like the plague. I worry about the cumulative effect of training our readers to think that 99 times out of 100, our article at a particular scientific name will be a waste of their clicks. I think the potential damage to our reputation far outweighs the supposed ability of this approach to "nucleate" articles, which it doesn't seem to do at any perceptible rate.
What would make much more sense, although I doubt we have the technical means to do it at this time, would be the ability to create these on demand. When someone has a little bit of information or a picture of a species they want to add, they could trigger the bot, and it would spit out a stub for them to edit. That would be much more useful than filling small Wikipedias with grey goo. Choess (talk) 02:45, 24 July 2014 (UTC)
We do have this technical ability (or, at least, we're pretty close). There's a discussion here about how to do it from Wikidata; the results would be crude but probably good enough to tell you (in your local language) "this is a type of bat, it lives in Venezuela, here's a photo" on demand. Andrew Gray (talk) 21:08, 25 July 2014 (UTC)
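
To make the on-demand idea concrete, here is a minimal sketch of rendering a crude stub from a structured record only when someone asks for it. Everything in it is an assumption for illustration: the `SpeciesRecord` fields, the `TEMPLATES` table, the `make_stub` function, and the placeholder species name are all hypothetical. It does not query Wikidata and is not how Lsjbot or any existing tool works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeciesRecord:
    """Hypothetical structured record; the field names are illustrative only."""
    scientific_name: str
    kind: str                    # e.g. "bat"
    country: str                 # e.g. "Venezuela"
    image: Optional[str] = None  # file name of a photo, if one exists


# Hypothetical sentence templates keyed by language code.
# A real system would need one entry per local language, plus
# localized values for `kind` and `country`.
TEMPLATES = {
    "en": "{name} is a type of {kind}. It lives in {country}.",
}


def make_stub(record: SpeciesRecord, lang: str = "en") -> str:
    """Render a one-paragraph stub from a record at request time."""
    text = TEMPLATES[lang].format(
        name=record.scientific_name,
        kind=record.kind,
        country=record.country,
    )
    # Mention the photo only when the record actually has one.
    if record.image:
        text += f" [Photo: {record.image}]"
    return text


if __name__ == "__main__":
    # Placeholder data, not a real species.
    example = SpeciesRecord("Examplus batus", "bat", "Venezuela", "bat.jpg")
    print(make_stub(example))
```

The point of the sketch is that nothing is stored until a reader or editor triggers it, so there is no pile of unmaintained stubs to go stale; the output is only as good as the underlying record.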


