On 15 January, the English Wikipedia turned thirteen years old. In that time, it has grown from a small site known to only a select few into one of the most popular websites on the internet. At the same time, recent data suggests that editing follows a power law, in which a comparative few users account for most of the edits and most of the writing. The result is that there is going to be bias in what is created, and how we deal with it as Wikipedians will shape the future of the site. It also raises the question of what we have to do to combat this bias: there are many ideas, but it is unclear whether they will work.
Every Wednesday, various charts are updated that show trends in editing. These include lists of the top editors, the top article creators, and overall bot edit counts, as well as a list of the editors who have made the most edits in the last thirty days, which is updated less often than the others. Over the past few years, there have been periodic attempts to decipher this information and figure out what it all means, although as far as I know, no one in the Wikimedia Foundation has published reports using it. I came across these lists in 2011 and decided to chart them to see what they showed, and, unsurprisingly, some interesting trends came up. Fast forward to two weeks ago, when I decided to update the charts for the first time since November 2012, with no idea what I would discover.
One of the more interesting trends that I found during the many hours I spent building the charts was how many edits a rather select few Wikipedians have compared to the rest of the site's users. In terms of overall numbers, 45% of the edits on Wikipedia have been made by a combined ten thousand editors and the 850+ bots on the site. When charted on a line graph, the edit counts follow a distinct power law that rises sharply for both bots and editors. Interestingly, the top bot (Cydebot) has more than three times as many edits as Koavf, the editor with the highest edit count on the site. This high number of edits has helped push the bots to a significant share of the overall edits on the site, totaling 12%. As of the publication of this article, there are 20,590,000+ users on the site, meaning that .052% of Wikipedia's users (bots included) have made nearly half of the edits.
Even more surprising were the numbers on article creators. Most Wikipedians who are active on the site have written an article or two, some as simple as a stub and some that have been expanded into a Featured Article. Other users focus on expanding existing articles, drawing on their knowledge of a specific subject area. Still others, myself included, have created hundreds or even tens of thousands of articles. Creating an article thoroughly takes time and dedication, so it is likely that many of these articles were created as stubs. The concentration shows in the fact that the top 3,000 editors have written 55% of the articles on the entire site. The next 2,000 editors have written only 5% of the articles, which means that 60% of the articles on this site have been written by 5,000 users, or .026% of the site's overall users. Of note are the numerous IP addresses that show up on these page creation lists, as before 2005 users were allowed to anonymously submit articles (a feature which was removed because of the Seigenthaler incident). On the list, the IP address 67.173.107.96 has 983 live article creations, a number which places it 459th on the list.
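For readers who want to check the percentages themselves, here is a minimal sketch of the arithmetic. It assumes the "20,590,000+" user total and the "850+" bot count can be treated as exact figures, which they are not, and the helper name `share_pct` is purely illustrative rather than anything taken from the charts.

```python
def share_pct(group_size: int, total_users: int) -> float:
    """Return group_size as a percentage of total_users."""
    return 100.0 * group_size / total_users


# Assumption: the rounded figures quoted in the article are treated as exact.
TOTAL_USERS = 20_590_000               # "20,590,000+" registered users
TOP_EDITORS_AND_BOTS = 10_000 + 850    # ten thousand editors plus "850+" bots
TOP_ARTICLE_CREATORS = 3_000 + 2_000   # top 3,000 creators plus the next 2,000

edit_share = share_pct(TOP_EDITORS_AND_BOTS, TOTAL_USERS)
creator_share = share_pct(TOP_ARTICLE_CREATORS, TOTAL_USERS)

print(f"Top editors and bots as a share of users: {edit_share:.3f}%")
print(f"Top article creators as a share of users: {creator_share:.3f}%")
```

Under these assumptions the first figure lands at roughly .053% and the second at roughly .024%, the latter a little below the figure quoted above. The gap may simply reflect a different user total being used for that calculation, and either way the share is a tiny fraction of one percent.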
One question that should be asked is why so few editors are writing so many articles. Wikipedia can often be harsh to new users, as the number of rules, both written and unwritten, can scare off even the most dedicated of writers. Those who stay seem to be the ones who want to contribute and write more for the site, but the data seems to show that they are an incredibly select few when compared to the over twenty million usernames that have been registered over the years. Furthermore, with declining editor counts, this is only going to become more of an issue over the years, as the Wikipedians who are left will probably start expanding into more niche topics, ones that are not easily researchable by the average person with stable internet access.
Another question that this brings up is what the costs are of having so few editors write so many articles. In theory, having fewer users write more articles brings standardization to the site, as there are fewer differences in prose and article quality. In reality, though, having so few users means that there is going to be an implicit bias in what is written, to degrees which have already been shown through the work of the Wikimedia Foundation. With the already low number of women on the site, this means that there will be more coverage of male-oriented topics. If a male-oriented topic is not covered immediately, there is a good chance that an article on it will be created in the coming years. Unfortunately, this means that whatever female-oriented topics are out there will probably get further neglected, as there is less of a chance that someone will even know that the subject exists, never mind it being notable enough for an article (when in doubt, go for it). The number of these super page creators only exacerbates the problem, as it means that the users who are mass-creating pages are probably not covering neglected topics, and this tilts our coverage disproportionately towards male-oriented topics.
Finally, the last question is why the majority of editors are responsible for only the remaining 40% of the articles. Most users are aware of the Article wizard, while fewer know about Articles for creation (side note: if you can, please volunteer there, as it has been flooded in the past couple of years by new articles and is in need of knowledgeable Wikipedians for reviews). Oftentimes, articles created in either of these two venues by inexperienced users are deleted or shot down before the users have any idea what is going on. This can be discouraging and dissuades users from helping out. Other times, they will come seeking help, but will get discouraged when the topic that they have been working on is deemed non-notable. Most likely, many more Wikipedians out there have attempted to create an article, but because it was deleted, the data skews the number of edits slightly further towards experienced Wikipedians, who then go on to hold a slightly larger majority of article creations as well.
The Teahouse has been a successful model of helping new editors along in the process. By providing guidance to new editors, it has found great success in its endeavors. Additionally, mentoring editors and guiding them towards articles that they might not have originally thought of working on can be a good way to direct their enthusiasm into something positive. By channeling talent and encouraging and redirecting editors onto viable paths, it is possible to ensure that a greater amount of knowledge will be present on the site in the coming years. Finally, the Wikipedia Education Program and the Wiki Education Foundation have also attempted to make inroads in the classroom by encouraging students to become more involved with the community through their school work.
The final part of this is whether or not these attempts will work. A community that is dedicated to addressing the issues that exist on the site is a community that will succeed. In the past, many reform ideas have been met with resistance from the community, often with mixed results. Other times, approaches to fixing these issues run counter to what others in the community want to do, so some editors end up unintentionally (or intentionally, for that matter) sabotaging the intentions of reform-minded users, although this is to be expected in a large community where people have differing views.
In the end, it is up to us as a community to ensure that the site continues for another thirteen successful years, as we are part of one of the greatest social, intellectual, and academic experiments on the internet. Our success in the coming years will depend on how we choose to address these issues, so it is imperative that we attempt to correct them while there are still people interested in editing the site, so that we can continue striving to be the most important encyclopedia in the world.
Discuss this story
Early discussions
Consider also that most vandalism, self-promotion and other blatantly non-constructive edits fall within the orange pie, and it becomes obvious that while more than a billion could edit, and tens of millions indeed do edit, overall it is just a few thousand people that write most of the useful content on Wikipedia. I wouldn't be surprised if you would find similar trends within contributors to high-quality articles (FA, GA). --ELEKHHT 01:03, 25 January 2014 (UTC)
The last graph is amazing (and more than a bit scary). Am I right in thinking though that the concentration of edits among a fairly small proportion of accounts is not a new development? Thanks for this interesting article BTW. Nick-D (talk) 02:49, 25 January 2014 (UTC)
As we are interested mainly in content, perhaps we should strip out bot edits. In my experience, these tend to be non-content in nature. -- Ohc ¡digame! 07:34, 25 January 2014 (UTC)
Participation inequality is commonly found in many Internet projects. Very true within all Wikimedia projects, and with smaller projects between them. Interestingly, in my recent (still collecting data) pattern on Arbcom, I am seeing it even in the Arbitrators' activity. It's everywhere, probably could even find it if we run some metrics on Signpost contributors... --Piotr Konieczny aka Prokonsul Piotrus| reply here 11:01, 25 January 2014 (UTC)
Demographics
Part of what WP has been experiencing is a normal fad cycle, in which WP broke big c. 2005 and has sort of gradually declined since as the proverbial "low hanging fruit" vanished, footnoting became more rigorous, the rulebook expanded, and so forth. People get older, they get jobs, have families, and get on with other things in life; it's normal. The question is how to replace those departing with newcomers, who in my mind's eye have specialized knowledge able to take foundation articles to a higher level and build gingerbread topics on the edifice. I figured it would be a new corps of "grey rookies": tenured and emeritus professors winding down their active teaching careers but seeing value in the educational mission of Wikipedia. I figured the one thing that was really needed was a WYSIWYG editor that was more or less as simple as MS Word.
Well, that entire vision is in question, given the abject failure of the VisualEditor project. So, maybe there's no new crop of converts to the WP mission, maybe we continue to add to our writing corps one person at a time, rather at random, much like we always have. It's a difficult puzzle to solve. I see hope in those academics seeking to make WP improvement part of their class projects, but I must frankly say that the reality of the work doesn't meet the expectations from what I've seen so far. Still, that's one potentially fertile ground for development. And hurray for those who work patiently with newcomers, teaching them the ropes, because ultimately that's how the process of developing new contributors works. Carrite (talk) 03:03, 25 January 2014 (UTC)
Thank you for your excellent article Kevin Rutherford. Looking at the data, I don't take it as a given that it shows a problem. It might. It might show a functional self-sorting. The Wikipedia is pretty much a new thing so we don't know what the edit distribution is supposed to be 13 years in.
If it does show a problem, is the problem in the distribution or the raw numbers? If it's in the distribution, we could address it by imposing a cap on number of edits per month. That this would be foolish indicates to me that it's probably not a distribution problem. So if it's a problem, it's raw numbers. So to solve the problem, we want to multiply by n the number of editors making 2000 edits/month, 500 edits/month, 20 edits/month, and so on, without much caring if the distribution changes. Well of course, ideally. (Although no change carries no downsides; n=2 could see degradation in average value of an edit, for instance; but I think most people would agree that more editors would be better, within reason.)
But what is the value of n such that we would be able to say "problem solved"? We can't say. So we're saying "some more editors would be nice". Which is true, but doesn't prove that we have a problem now. Herostratus (talk) 03:38, 25 January 2014 (UTC)
What I think would test my hypothesis would be to do an analysis of the data over the last 10 to 12 years. Identify on a month-by-month basis the people who contributed 50% of the content. (Based on Kevin's article, this appears to be a breakpoint.) Has that number fluctuated greatly over the years? Then look at the usernames in this group: what is the turn-over? Do people routinely appear in it -- is there an identifiable "core" of contributors -- or do people routinely show up for only a month or two, then drop out of it for several months? If there is an identifiable core of contributors to Wikipedia based on these two ways of crunching the numbers, then there is a limited pool of contributors. And if the WMF wants to effectively solve issues of system bias in Wikipedia, it needs to do so in light of this information. (And if such a study proves my beliefs are incorrect, I'll admit I'm wrong & stop mentioning it.) -- llywrch (talk) 20:52, 27 January 2014 (UTC)
Teahouse
Regarding this:
I was under the opposite impression - that the data gathered, so far, indicates the Teahouse has had relatively little success. Could someone point to a definitive analysis? -- John Broughton (♫♫) 03:48, 25 January 2014 (UTC)
It would be nice to see some academic metrics. Is WMF running any studies on this? They should be. (Btw, wasn't Teahouse grant-supported?). --Piotr Konieczny aka Prokonsul Piotrus| reply here 11:07, 25 January 2014 (UTC)
Longevity
While I agree that some Wikipedians have become ridiculously prolific editors, I would guess that those with the highest edit counts are also those with the longest longevity. Does Wikipedia have a class of rapid adopters that go from 0 to 10,000 edits in just a couple years? Is it more likely a cohort of Wikipedians that joined pre-2006 have stayed and racked up the edits while new editors have in the past few years joined and quit or perhaps haven't had time to be as prolific? That longevity (I would surmise) has something to do with survivability at ANI, RfA, and ARBCOM where old hands will find stability as newer members find the door. Chris Troutman (talk) 08:30, 25 January 2014 (UTC)
Holy Guacamole!
Thanks to you I just found out I am in the top 500 of article creators. I find that surprisingly sad, because I often claim that my edits are just a drop in the bucket of what needs to be done. This also reinforces my firm belief that the article creation process is really crucial to the health of the project and definitely needs more work in order to keep from scaring newcomers away. In outreach efforts, I always emphasize using a "similar existing article" to the type of article any newcomer wants to work on as an example of how to approach article creation. This approach has helped at least one person who I met briefly again last Saturday after she made 49 new articles with the help of another veteran editor. In-person outreach is the best way to get new people editing or to spur existing editors into editing more. Jane (talk) 09:29, 25 January 2014 (UTC)
Error?
"Most users are aware of the Article wizard. Even fewer know about Articles for creation." I think there is a logical problem here? --Piotr Konieczny aka Prokonsul Piotrus| reply here 10:57, 25 January 2014 (UTC)
How big's the pool?
How many self selected, volunteer, encyclopedists, who are adept at research, writing, and coding references (who no doubt find all of those tasks tedious, at times - and at other times pleasurable, or, at least worth it) are there in the world? Alanscottwalker (talk) 16:01, 25 January 2014 (UTC)
Where am I?
I should be in the 5,000s -- having done 11,097 edits. But I'm not there. How come? Smallchief (talk) 17:17, 25 January 2014 (UTC)
Redlinks and the culture of the site
I was having a discussion recently with an editor who hasn't been here that long, and had never seen anyone deliberately leave a redlink in an article. I think that is worth more of an OMG than the raw statistics on who creates new articles.
Given another conversation with someone who had taken out one of my redlinks, which had a connected conversation when I finally created the article and was told (by another editor) that it was "like a Christmas present", the culture on redlinks has taken a wrong turn. They are not to be regarded as "defects", but (as ever) growth points.
WP:REDLINK is fine in its nutshell version: further down is where BLP concerns may be overdone.
I have worked on redlink lists for most of my time here. More structured effort on missing articles could help newcomers. Charles Matthews (talk) 18:39, 25 January 2014 (UTC)
Mark Twain
On the culture of the site, I do find the statistics strange. Since 2004 I have created over 80 new articles from scratch, yet my personal statistics show me making fewer than 2000 edits in that time. I think the reason is that I write and code a new article on an off-Wiki text editor, and only when I am satisfied with it will I paste it to the Wiki editor for a final polish before saving to Wikipedia. This means that all the editing effort, often over several days of editing time per article, is not accurately represented. Apwoolrich (talk) 19:27, 25 January 2014 (UTC)
Again...
I have to say this every time bias discussions come up on Wikipedia: the people who edit are the people who edit. There's little we are going to do to change that. The distribution of editors is determined by large-scale social forces and we are mostly powerless against them. We aren't going to get as many women as men editing because women think they have better things to do. (Object to that? Well, it was one of the leading reasons women gave for not editing in a survey by the WMF to study the gender gap.) So unless the WMF is going to start offering baby sitting services to editors or start forcing people to do something they don't want to do, we are going to miss out on some demographics. I personally think, while editor demographics is interesting, efforts to "combat" it are largely futile and, at this point, somewhat blasé. Jason Quinn (talk) 03:03, 27 January 2014 (UTC)
Aaron Swartz article
Although the essay I started, WP:1EDITMYTH, does not provide many details, it can be shown even in 2002-2003 how experienced editors were mainly creating and expanding articles, not much by passing strangers of a few edits. A key example is the article for singer "Édith Piaf" created by IP 209.105.200.54 on 26 July 2002 (see: hist). Well, after it was created, under the title "Edith Piaf", that page was mostly expanded that year by IPs 200.100.200.*, who were all 250 IP addresses of the same person (as deduced by narrow focus on French subjects). Another editor who expanded the Piaf page in June 2003 was User:Arpingstone, still active this week (contribs). Other major text was added by User:DW (contribs), who edited many pages for 6 solid months (blocked 4 years later), adding content to hundreds of pages (not a hundred drive-by editors each adding a paragraph, but rather 1 username adding, adding, adding to every page). Those actions are typical: there are some editors who add/add/add content, but do not copy-edit for grammar, sources or wikilinks. Those IP editors who seem to add text for 1-2 days and leave "never to return" are often the same people with another IP address or username. I will try to expand essay WP:1EDITMYTH to better explain editor activity during years 2002-2005. -Wikid77 (talk) 17:04, 30 January 2014 (UTC)