Study of vandalism survival times

Methods

A random sample of 100 articles from the English-language edition of Wikipedia was obtained through the use of the random article link in the navigation toolbar. For each article, the history log was used to examine each recorded change, starting from the most recent and going back until a clear instance of vandalism was found. The subsequent changes were then scanned in the opposite direction, going forward in time until the vandalism was corrected.

For each such instance of vandalism, the elapsed time until correction was computed, in minutes. These are the fundamental data on which this report is based.

In addition, some notes were taken on the general nature of the vandalism. All data collection occurred on 2009-06-11.

Results

Of the 100 articles, fully 75 had never been vandalized.
Of the 25 articles that were vandalized at least once, the most recent such instance of vandalism was eventually corrected in 23 articles.
In five (20%) of the vandalized articles, the most recent instance of vandalism was corrected in less than one minute. A further four instances were corrected in less than two minutes.
The median time to correction, among the 23 instances that were corrected, was four minutes.
Two articles were found to have suffered vandalism that was never corrected. One of these was a subtle act of vandalism that was committed on 2007-02-23 and had still not been detected by the date of the study, 2009-06-11.

Discussion

Distribution of time to correction (in minutes) for Wikipedia vandalism.

A histogram of times to correction is shown in the chart to the right. Note that the horizontal axis is depicted on a logarithmic scale, to accommodate its enormously long right-hand tail.

In this histogram there are evidently two separate processes at work. The bulk of the histogram follows a curve that declines as a power function of elapsed time: this is the process by which ordinary readers and editors of Wikipedia stumble across and correct instances of vandalism.

The first two bars on the left, however, are significantly higher than the curve would suggest. The difference between the actual height of the bars and the height predicted by the curve is accounted for by the independent activity of Wikipedia's Recent Change Patrol (RCP). Members of the RCP typically monitor the Recent Change Log for suspicious edits. The RCP is able to correct most blatant vandalism within seconds of occurrence.

Both of these vandalism-correction processes act in concert to produce a remarkable result: the median time to correction for vandalism in this study was found to be just four minutes. Similar (unpublished) studies performed by this author one and two years ago yielded median times to correction of five and six minutes, respectively. It seems apparent that Wikipedia is improving its already impressive rate of vandalism detection and correction.

Problems with Mean Time to Correction

The fact that the estimated curve for the survival function is exponential on a graph whose horizontal axis is logarithmic indicates that the probability density function itself follows a power law distribution, also known as a Pareto distribution, given by the formula

$$f(x) = a\,x^{-b-1}$$

If the parameter $b$ in the above formula is less than one — as it is in this case — then the mean of the distribution is infinite. The practical significance of this unusual situation is that any sample mean calculated from empirical data conveys absolutely no information whatsoever about the typical length of time that it takes for an instance of vandalism to be corrected.
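A short derivation may make the divergence concrete. It assumes, beyond what the report states, that the density applies only above some positive lower cutoff $x_0$ (a symbol introduced here purely for illustration). The mean is then

$$E[X] = \int_{x_0}^{\infty} x \cdot a\,x^{-b-1}\,dx = a \int_{x_0}^{\infty} x^{-b}\,dx = a \left[ \frac{x^{1-b}}{1-b} \right]_{x_0}^{\infty} \qquad (b \neq 1).$$

When $b < 1$ the exponent $1-b$ is positive, so $x^{1-b}$ grows without bound and the integral diverges. When $b = 1$ the antiderivative is $\ln x$, which also diverges. Only when $b > 1$ does the integral converge, to $a\,x_0^{1-b}/(b-1)$.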

The only useful alternative to a sample mean in this situation is the sample median, which is fully robust with respect to long-tailed distributions.

Depending upon what assumptions are made concerning the rate of activity of the RCP, the parameter $b$ for the Pareto distribution lies in a range between about 0.25 and 0.40. This range is comfortably below one, indicating that the tail of the distribution is huge and that sample means are completely and utterly useless for describing the data.
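The instability of the sample mean can be illustrated with a small simulation. This is only a sketch under stated assumptions: it draws from an idealized Pareto distribution with minimum value 1 and shape $b = 0.3$ (a value picked from inside the range quoted above), and it does not use the study's data. The function names `pareto_sample` and `summarize` are made up for this illustration.

```python
import random
import statistics


def pareto_sample(b, n, rng):
    """Draw n values from a Pareto distribution with shape b and minimum 1."""
    return [rng.paretovariate(b) for _ in range(n)]


def summarize(b=0.3, n=10_000, seeds=(1, 2, 3, 4, 5)):
    """Return one (median, mean) pair per seed, each from a fresh sample."""
    results = []
    for seed in seeds:
        rng = random.Random(seed)  # separate, reproducible generator per run
        xs = pareto_sample(b, n, rng)
        results.append((statistics.median(xs), sum(xs) / len(xs)))
    return results


if __name__ == "__main__":
    for median, mean in summarize():
        print(f"median = {median:8.2f}   mean = {mean:.3e}")
```

For this idealized distribution the theoretical median is $2^{1/b}$, about 10.1 when $b = 0.3$, and the simulated medians should land close to that value on every run. The simulated means are dominated by the few largest draws and can differ from one run to the next by orders of magnitude.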

Observations on types of vandalism

About 84% of the vandalism that I observed in this random sample seemed to be just adolescent fooling around. Of the 16% that appeared more adult, half seemed to be adult humor or anger, and half seemed to come from people whose intent was to leave a permanent but nearly invisible mark upon Wikipedia. For example, the perpetrator will carefully change the spelling of an obscure name to an incorrect form, or change a location to something that still looks plausible at first glance. I imagine them coming back over and over again to the page that they altered, to see if that subtle little change is still there. Perhaps this impulse is roughly the same as the one which causes people to carve their initials into trees, or to scratch them on rocks.

Conclusions

The fact that 50% of all vandalism is being detected and reverted within an estimated four minutes of appearance should go a long way to allay fears about the susceptibility of English-language Wikipedia articles to malicious vandalism. On the other hand, the fact that an estimated 10% of all vandalism endures for months and even years indicates that some new tools and strategies are needed for rooting out the most subtle and persistent forms of vandalism.

Raw data

The elapsed times (in minutes) to correction for the instances of vandalism found in this study were as follows: { 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19, 73, 213, 490, 672, 2442, 14176, 152996 }. In addition, two cases of vandalism had never been corrected (until discovered by the author).
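As a rough check, the summary figures can be recomputed from the list above. The sketch below uses only the 23 corrected times; the two uncorrected cases have no elapsed time and are counted separately. The variable names are chosen here for illustration.

```python
import statistics

# Elapsed minutes to correction, copied from the raw data above.
corrected = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 4, 5, 8, 9, 19,
             73, 213, 490, 672, 2442, 14176, 152996]
never_corrected = 2  # instances still uncorrected when the author found them

total_instances = len(corrected) + never_corrected
median_minutes = statistics.median(corrected)
mean_minutes = sum(corrected) / len(corrected)
under_one_minute = sum(1 for t in corrected if t < 1)
largest_share = max(corrected) / sum(corrected)

print(f"vandalized articles:        {total_instances}")
print(f"median of corrected times:  {median_minutes} minutes")
print(f"mean of corrected times:    {mean_minutes:.0f} minutes")
print(f"corrected in under 1 min:   {under_one_minute / total_instances:.0%}")
print(f"share of total from largest value: {largest_share:.0%}")
```

The median of the corrected times is 4 minutes, while their mean is roughly 7,440 minutes (about five days). The single largest value accounts for about 89% of the total, which shows how completely one long-lived instance controls the mean.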

22 June 2009

Discuss this story

Author's note

I appreciate any and all commentary and criticism of this study — use the Discussion page for this. If you edit this report, please do so with extreme care. Aetheling (talk) 05:35, 15 June 2009 (UTC).

Sample size

Have you thought about using a larger sample? You'll admit, a sample size of only 100 has some pretty big error bars on it. I know it's tedious to do more, but hey ... that's what grad students are for :-P

Also, I'd like to see some way to take into account the "importance" of a page. You could use monthly page view numbers as a rough proxy. My guess is that vandalism reversion time and the popularity of an article are highly correlated. So while 4 minutes might be the median across all articles, the median across articles that people are actually reading (let's admit, most articles barely get read at all) might be smaller still. --Cyde Weys 03:09, 23 June 2009 (UTC)

Thanks for the comments. On your first point: data from 100 pages produced a very clear survival curve, which was after all the goal of the study. I am reasonably certain that adding more pages to get a smoother curve is just a waste of time. I did consider asking a grad student to help with the data collection, but in the end the sad fact is that I just don't trust a grad student to be careful enough with reading through all those edits (it helps to have some OCD traits). On your second point, I think this is an excellent idea. I wish I had thought of it! Perhaps the resulting data will allow some form of logistic regression. Next time... —Aetheling (talk) 05:19, 23 June 2009 (UTC).

Nice informative study. I also reacted negatively to the mention up front of a sample size of 100. Agree on Cyde's idea of somehow normalizing for the page views. Tempshill (talk) 19:42, 26 June 2009 (UTC)

Some thoughts on tools

Thanks for the study -- it was quite interesting to read. My first thought was, did you correct the two instances of vandalism that had not been corrected before? If not, tell me what they were, and I'll do it. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)

Sorry for not making this clear. Yes, I did correct the two instances that I found. —Aetheling (talk) 03:41, 19 June 2009 (UTC).
- I assume this fix to Leslie Roy Marston was the one from 2007, as it was vandalized on the day you mentioned; but what was the other one? 70.213.92.234 (talk) (really, User:JesseW/not logged in) 04:49, 25 June 2009 (UTC)

Regarding what tools might be helpful -- better history analysis tools would seem to be of considerable help. For quite a while now, I've wanted to take the time to craft a number of such tools: one to show all the text that has been added to an article over a given time-frame (even if it was removed within the time period); one to highlight the age of text; one to highlight text that has not stayed unaltered during a given time frame; etc. I think such tools would go a long way to rooting out vandalism that got lost in the history. The remaining problem would be intentionally subtle vandalism incorporated within otherwise correct changes, or subtle factual lies or bias, which is even harder to handle. Your thoughts would certainly be appreciated. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)

Uncorrected vandalism

I have to ask: What were the two articles with uncorrected vandalism? Kaldari (talk) 00:49, 17 June 2009 (UTC)

Sorry, I didn't keep a record. They were quite forgettable. To get a feeling for how banal most Wikipedia articles are, try clicking the Random article link for a while. You will see endless details about popular culture, obscure geographical locations, odd lists, etc. It leads me to think that a better study might focus on the more substantive articles, which also tend to attract more vandalism. —Aetheling (talk) 03:41, 19 June 2009 (UTC).

Hmm...

I think this examination is a very good start, but the small size of the sample, combined with the use of such expressions as "50% of all vandalism is being detected" and "10% of all vandalism endures for months" (emphasis mine) makes me very uncomfortable. While those were the percentages that turned up in your (rather small) sample, it's a bit over-reaching to assert that 100 samples are absolutely representative of "all vandalism". – Clockwork Soul 05:23, 23 June 2009 (UTC)

It is a small sample, but the resulting survival curve is convincing and the overall results are similar to previous studies. I don't think sampling error is a problem here. Incidentally, sampling error has no effect on the "50%" figure, because that comes from the definition of the median. It does affect the median itself (the "four minutes to reversion" figure). I have changed the wording slightly so as to make it less absolute. —Aetheling (talk) 03:23, 24 June 2009 (UTC).

Query

This is excellent work. I’ve been sceptical about the usual reassuring statements about vandalism reversion for a long time, having come across many instances of ancient vandalism persisting, even in rather high-traffic articles. This neatly describes what is going on. A question: is it possible to estimate from these results what percentage of articles are currently vandalised? (I realise that 2% of the sample was in this state, but I am not clear what, with any confidence, can be drawn from that.) Ian Spackman (talk) 06:30, 23 June 2009 (UTC)

I didn't record the data that is required to estimate the percentage of articles that are currently vandalized. It's a shame, because it would have been easy to do, had I thought about it. Next time... —Aetheling (talk) 03:23, 24 June 2009 (UTC).

It is probably impossible to prevent all vandalism - whether creative/humorous or destructive - and some examples will overlap with truthiness, POV-ism and/or genuine misunderstanding. Even if there was a drive to ensure that "every last article as of 1 January 2010 is free of error, vandalism, POV and other problems" a few examples will survive - and there will be a fresh crop of such things emerging.

I would guess that articles on current figures (e.g. the present Pope, Prime Minister, Monarch, President, Sports Champion etc.) will be more subject to vandalism than articles on their equivalents from 100, 200 or 500 years ago, or some other date. — Preceding unsigned comment added by 83.104.132.41 (talk • contribs)

There are some good points here. I appreciate that restricting the survey's focus to clear instances of vandalism was probably necessary, but I think the grey area of subtle vandalism/POV/misunderstanding presents interesting problems of its own. Presenting the study's conclusions in terms of "all vandalism" seems to sweep them a bit too far under the carpet for my taste. -- Avenue (talk) 08:52, 24 June 2009 (UTC)

Possible bias

Thanks for this interesting study. I do agree with your general conclusions. However I think that the results may be biased (in the technical sense) due to the sample design, and in particular your investigation of only the most recent instance of vandalism from each article. This means that instances of vandalism in heavily vandalised articles would be given equal weight in the results to instances in less vandalised articles, and thus each instance of vandalism in a heavily vandalised article is less likely to enter your sample than instances in less vandalised articles. If vandalism to heavily vandalised articles is corrected more quickly and thoroughly, e.g. because people expect it and watch for it more closely, then your measures of time to correction would tend to be overstated.

It might be possible to correct for this effect, e.g. by weighting based on some measure of vandalism rate. However there are other potential biases lurking here too, e.g. due to some articles being older than others. Adjusting for everything may be difficult. Another option would be to think through any assumptions you are making, and hedge the results accordingly. -- Avenue (talk) 08:38, 24 June 2009 (UTC)

Some topics are vandal magnets, while "in the news" topics (in the broadest sense) are likely to suffer much vandalism, "errors arising from overlapping editing" and other sources of error, which will drop significantly after the event passes into history (e.g. articles on George W Bush and Tony Blair are likely to show this phenomenon). And "vandalism and errors" in articles on obscure topics are likely to remain undetected for some time. Could "someone statistical" be brought in to determine suitable bases for "low", "medium" and "high" activity articles? (A more technical analysis would involve comparing articles across the various languages in which they appear, to see the way in which a particular controversy "travels.") —Preceding unsigned comment added by 83.104.132.41 (talk)

Wikiproject

Ideally, I had set up the Wikipedia:WikiProject Vandalism studies to do just the sort of study that is mentioned here. Hopefully, with this study there may be more interest in getting that project going again. Remember (talk) 16:52, 24 June 2009 (UTC)

Methodology

I see some problems with matching the conclusions to the results. We (or you) state that "50% of all vandalism is being detected and reverted within an estimated four minutes." I'm not sure you studied that. I think what you found was that 50% of previously vandalized articles had their most recent vandalism reverted within an estimated four minutes. I think there are two effects you are ignoring.

1. You're not taking a good sample of "vandalism." "Vandalisms" are edits, and so your sample should select randomly from edits. Instead, you sample randomly from articles. This substantially overweights Ted Chabasinski, which is one article but has 6 edits, and substantially underweights George W. Bush, which has more edits and thus more vandalism. If those two articles were the entire sample, you would say that 50% of articles have never been vandalized and 50% of articles have their vandalism reverted in seconds, for a median of "in seconds." In fact, what you should have done was take a random sampling of edits, determined which of those edits was vandalism, and determined the reversion time on those edits. You can select a random edit from the database in multiple ways - I'm certain the more technically literate can help you figure out the most random way.

2. You're ignoring the "still exists" vandalism that was covered by more recent vandalism. Imagine an article was vandalized in a very subtle and damaging way a year ago (say, alleging that the person was involved in the assassination of JFK). Then, imagine that someone, 1 minute ago, wrote "PENIS PENIS PENIS PENIS" over the header, which was instantly reverted. Your study would show that the vandalism TTL on this article was instant, when, in actuality, the TTL in a sample of all vandalism would show 1 TTL of instant, and 1 TTL of never reverted.

These two effects would seem, to me, to pull in different directions. My expectation is that if you gave a distribution, you would show a median TTL that was too long (4 minutes too large), but with a distribution that was far too normal (i.e. vandalism has a fatter tail than even you discuss, consisting of subtle, damaging vandalism designed to disparage people the vandal does not like). This is also being discussed offsite, at [1] but one should have a very thick skin and be able to deal with all comers if they engage at that location. Hipocrite (talk) 16:41, 25 June 2009 (UTC)

I raised some similar issues in the #Possible bias section above. Thanks for the link to the external discussion; there are some good ideas there, along with the vitriol. -- Avenue (talk) 19:19, 25 June 2009 (UTC)

I strongly agree, especially to point number 2. As a result, this report might be encouraging underestimation of the real urgency that Wikipedia is facing due to vandalism. - Subh83 (talk | contribs) 05:22, 13 April 2011 (UTC)

Never?

I have to say, I like this study and what it sets out to do. We can learn from this, then repeat it, with bigger samples to see if we have improved.

I quibble with your use of the word "never" - why not just state the time between the vandalism and the time you found it? It could have been only 1 day for all we know. I can't really see what "never" could mean in this context. Stevage 01:13, 26 June 2009 (UTC)

Studies

Would a "compare and contrast" of "rearrangements and vandalism" to Michael Jackson, Farrah Fawcett and AN Other Minor Notable be useful?

Guestimating the likelihoodness of non-constructive rearrangements for the three persons.

Mistake?

I haven't read the article very carefully, but in Wikipedia:Wikipedia Signpost/2009-06-22/Vandalism#Results... since when has 5 been 25% of 25? 20% surely? Maybe it should also be made clearer that this is the percentage of the vandalised articles, not of the total - any fool reading the page could work that out, but there are a lot of fools on Wikipedia. —Vanderdecken∴ ∫ξ φ 20:45, 1 July 2009 (UTC)
