Wikipedia gets quite a bit of press attention for drive-by vandalism, incoherent scribbles, rude gestures, and plain page blanking perpetrated by Internet trolls and schoolchildren who take the site's free-to-edit model as an invitation to cause as much havoc as possible. The public perception that Wikipedia is riddled with errors and perpetually vandalized was a major hindrance in the site's formative years, when it first engaged in its still-central battle for relevance and accuracy.
But this is a battle that, on the whole, Wikipedia has been winning for quite some time. Years of nearly unchecked growth and explosive expansion have made Wikipedia not only the largest but also the broadest information compendium the world has ever seen. Editing is tightly watched by users armed with tools like Twinkle, Huggle, rollback, semiprotection, and bots. Vandalism as we most commonly think of it is anything but dead—visible pages still regularly get as much as 50 percent of their edits reverted[1]—but today's array of anti-vandalism tools has confined it, in lesser form, to the furthest and most overtaxed fringes of Wikipedia.
The dearth of vandalism lasting more than a few seconds has done much to improve our image. Five years ago, a project as enterprising as the Wikipedia Education Program could never have even existed, let alone thrived as it does today.[2] The days when being a regular editor on Wikipedia was seen as unusual by others are slowly becoming more distant, its use ever more mainstream, and its editing body ever more academic. But another, subtler form of vandalism persists, and with the decline of its more visible cousin, may even be spreading—fabrication.[3] Wikipedia has a long, dare I say storied, history with the spinning of yarns; our internal list documents 198 of the largest ones we have caught as of 4 January 2013. This op-ed will attempt to explain why.
Wikipedia's policy on vandalism is complex and extensive. Coming in at 41 KB, it is best remembered by the {{nutshell}} wrapper that adorns its introduction, stating that "Intentionally making abusive edits to Wikipedia will result in a block", a threat carried through more often than not. At just over 5k, the guideline on dealing with hoaxes is comparatively slim, and readily admits that "it has been tried, tested, and confirmed—it is indeed possible to insert hoaxes into Wikipedia". It is not hard to tell which is the more robust of the two policies.
First and foremost, this is a consequence of Wikipedia's transitional nature. The site has become mired somewhere between the free-for-all construction binge it once was, and the authoritarian, accuracy-driven project it is quickly becoming. The days of rapidly developing horizontal sprawl are long gone, swallowed up by the project's own growth; increasingly narrow redlink gaps and ever deeper vertical coverage are the new vogue, spearheaded by the raising of standards and the creation of such initiatives as GLAM and the Education Initiative. Wikipedia gets better, but it also gets much more specialist in nature, and this has a major impact on its editing body. Explosive growth both in the number of articles and the number of editors, once the norm, has been superseded by a more than halved rate of article creation and a declining number of active editors, both despite bullish, frankly unrealistic growth projections by the Wikimedia Foundation.[4] The project has reached its saturation limit—put another way, there simply aren't enough new people out there with both the will and the smarts to sustain growth—and the result is that an increasingly small, specialized body of editors must curate an increasingly large, increasingly sophisticated project.[5]
"Now, it's pretty much up to yours truly to fix most things that are wrong in any article in the topic area I inhabit, and I just don't have the time to do it all. There are other editors in the topic, of course, but they appear to be in the same predicament."
— Cla68
A sparser, more specialized editing body dealing with highly developed articles and centered mainly on depth has a harder time vetting edits than a larger, less specialized one focused more on article creation. Take myself as an example: while I have the depth of field to make quality tweaks to Axial Seamount, I could never do as good a job fact-checking Battlecruiser as a Majestic Titan editor could, and I cannot even begin to comprehend what is going on at Infinite-dimensional holomorphy. This hasn't mattered much for pure vandalism: the specialization of tools has proved more than adequate to keep trollish edits at bay. But vetting tools have not improved at the same pace; the best available solution, pending changes, has received a considerable amount of flak for various reasons, and has so far only been rolled out in extremely limited form. On pages not actively monitored by experienced editors, falsified information can and indeed does slide right through; with an ever-shrinking pool of editors tending to an ever-growing pool of information, this problem will only get worse for the foreseeable future.
The relative decline in editor vetting capacity is paralleled by the ease with which falsehoods can be inserted into Wikipedia. Falsified encyclopedic content can exist in one of three states, ranked by their potential to fool editors examining them: inserted without a reference, inserted under a legitimate (possibly offline) reference that doesn't actually support the content, and inserted under a spurious (generally offline) reference that doesn't actually exist. While unreferenced statements added to articles are often quickly removed or at least tagged with {{citation needed}} or {{needs references}}, editors passing over a page who aren't especially knowledgeable about the topic at hand are extremely unlikely to check newly added references, even online ones, to make sure the information is legitimate. This is doubly true for citations to offline sources that don't even exist. Taking citations at face value is standard operating procedure on Wikipedia: think of the number of times that you have followed a link through, or looked up a paper, or fired off an ISBN search to ascertain the credibility of a source in an article you are reading; for most of us, the answer is probably "not many". After all, we're here to write content, not to pore over other articles' sourcing, a tedious operation that most of us would rather not perform.
This is why complex falsifications can be taken further than mere insertions: they can achieve the kinds of quality standards that ought to speedily expel any such inaccuracies with great prejudice. The good article nominations process is staffed in large part by two parties: dedicated reviewers who are veterans of the process, and experienced bystanders who want to do something relatively novel and assist with the project's perennial backlog. In neither case are the editors necessarily taking up topic matters they are familiar with (most of the time they are not), and in neither case are the editors obligated to vet the sourcing of the article in question (they rarely do; otherwise who would bother?[6]), whatever the standards on verifiability may be. And when a featured article nomination is carried through without a contribution of content experts (entirely possible), or the falsification is something relatively innocent like a new quote, such articles may even scale the heights of the highest standard of all in Wikipedia, that much-worshiped bronze star! Nor are hoaxes necessarily limited to solitary pages; they can spread across Wikipedia, either through intentional insertions by the original vandal, or through the process of "organic synthesis"—the tendency of information to disseminate between pages on Wikipedia, either through copypaste or the addition of links.
Readers of this op-ed may well take note of its alarmist tone, but they need not be worried: studies of Wikipedia have long shown that Wikipedia is very accurate, and, by derivation, that false information is statistically irrelevant. Well, if, as I have striven to show, manufacturing hoaxes on Wikipedia is so strikingly easy, why isn't it a major problem?
Answering this question requires asking another one: who are vandals, anyway? The creation of effective, long-lasting hoaxes isn't a matter of shifting a few numbers; it requires an understanding of citations and referencing, the manufacture of references to sources, and the investment of real intellectual effort in an activity usually perpetrated only by unsophisticated trolls and bored schoolchildren. As it turns out, the difficulty of making a believable case for misinformation is a high wall for would-be vandals. And even when real hoaxes are made, studies have shown that Wikipedia is generally fairly effective (if not perfect) at keeping its information clean and rid of errors. Hoaxes have reached great prominence, true, but they are small in number, and they can be caught.
But there is nonetheless a lesson to be learned. Wikipedia is extremely vulnerable. If some sophisticated bad actor wants to launch a smear campaign on the site, falsification would be the way to do it; and that is something that should concern us. The continual unveiling and debunking of hoaxes long after they have been created is a drag on the project's credibility and on its welfare, and when news about hoaxes on the site breaks in the media, it takes a toll on our mainstream acceptance. This is not a problem that can be easily solved; but nor is it one that should be, as it is now, easily ignored.
"The Quazer Beast was a perfectly normal looking article. After its creation it was categorized, copy-edited, and linked to; it was even vandalized once. That's the standard life cycle for an article. Except that the Quazer Beast is a hoax, and it isn't the only one out there ... it is important that we recognize what is a Quazer Beast and what is not."
— Society for the Preservation of the Quazer Beast
Sorted by date of discovery, here is a selection of fifteen of what I consider to be the most impactful and notable hoaxes known to have existed on Wikipedia.
Discuss this story
The prose in Bicholm conflict might have been "well crafted", but the hoax itself was transparent and easily detectable - had I reviewed the article for DYK, for example, using basic DYK checks I would almost certainly have identified it as a hoax immediately. But neither the GA review nor the (admittedly brief) FAC discussion picked up the problem.
The lesson is really a pretty simple one - be suspicious of any article none of whose major references can be verified online, and for whose content you cannot find any corroboration elsewhere. Gatoclass (talk) 09:18, 13 February 2013 (UTC)[reply]
ROFL! This article previously quoted from the Wikipedia biography controversy article, saying that the hoax had not been discovered and corrected for more than nine months, which is a clear mathematical error (May to September is four months). The "nine months" text was present in the Wikipedia biography controversy article itself due to unreverted vandalism from November 2012. I've fixed both the mainspace and SP articles, but I guess this op-ed proved its own point. Graham87 11:42, 13 February 2013 (UTC)[reply]
Great, the Chen Fang incident was my fault... Back in 2008, I found out about the hoax from an acquaintance and immediately nominated it for deletion (because contemporary news sources had a different person as the mayor). The hoaxer one day randomly introduced himself to me at work, claimed credit for the page I'd just nominated, and presented me with "evidence" that I am User:Mxn – duh – intending to pressure me to delete the AfD template. He soon deleted the template himself and produced a source that lay behind a paywall (something like Newsbank or ProQuest). It sounded fishy, so on the talk page, I promised to check the source once I got back to campus after my internship, but I never got around to it. Moral of the story: don't procrastinate, or your error will be preserved in Harvard policy for posterity. – Minh Nguyễn (talk, contribs) 12:25, 13 February 2013 (UTC)[reply]