The Signpost

Special report

The few who write Wikipedia

By Kevin Rutherford
The views expressed in this special report are those of the author only; responses and critical commentary are invited in the comments section.
Edit distribution of all Wikipedians, as of 8 January 2014

On 15 January, the English Wikipedia turned thirteen years old. In that time, it has grown from a small site known to only a select few into one of the most popular websites on the internet. At the same time, recent data suggests that edits follow a power law among users: a comparatively small group of editors accounts for most of the edits and has written most of Wikipedia. The result is that there is going to be bias in what is created, and how we deal with it as Wikipedians is indicative of the future of the site. This also raises the question of what we have to do in order to combat this bias; there are many ideas, but it remains to be seen whether they will work.

Some observations

Every Wednesday, various charts are updated that show trends in editing. These include lists of the top editors, top article creators, and overall bot edit counts, as well as a list of the editors who have made the most edits in the last thirty days, which is updated less often than the others. Over the past few years, there have been periodic attempts to decipher this information and figure out what it all means, although as far as I know, no one at the Wikimedia Foundation has published reports using it. When I came across these lists in 2011, I decided to put the trends on a chart to see what they meant and, unsurprisingly, some interesting patterns came up. Fast forward to two weeks ago, when I decided to update the charts for the first time since November 2012, and I had no idea what I would discover.

One of the more interesting trends that I found during the many hours that I spent building the charts was how many edits a rather select few Wikipedians have when compared to the rest of the site's users. In terms of overall numbers, 45% of the edits on Wikipedia have been made by a combined ten thousand editors and the 850+ bots on the site. When charted on a line graph, there is a distinct power law that rises sharply for both bots and editors. Interestingly, the top bot (Cydebot) has more than three times as many edits as Koavf, the editor with the highest edit count on the site. This high number of edits has helped push the bots to a significant percentage of the overall edits on the site, totaling 12%. As of the publication of this article, there are 20,590,000+ users on the site, meaning that .052% of Wikipedia's users (bots included) account for nearly half of all edits.
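As a rough check on the share of users quoted above, the arithmetic can be sketched in a few lines of Python. The counts come from the paragraph itself; because the article gives the bot and user totals as "850+" and "20,590,000+", they are treated here as lower bounds, and the variable names are illustrative only.

```python
# Counts quoted in the article (the bot and user totals are lower bounds).
top_editors = 10_000
bots = 850
total_users = 20_590_000


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of all registered accounts represented by the top editors plus bots.
top_share = share_percent(top_editors + bots, total_users)
print(f"{top_share:.4f}% of accounts")  # prints "0.0527% of accounts"
```

The result is about 0.0527%, which matches the .052% in the text once truncated rather than rounded.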

Even more surprising were the numbers on article creators. Most Wikipedians who are active on the site have written an article or two, some as simple as a stub and others that have been expanded to a Featured Article. Other users focus on expanding existing articles, drawing on their knowledge of a specific subject area. Still others, myself included, have created hundreds or tens of thousands of articles. Creating an article thoroughly takes time and dedication, so it is likely that many of these articles were created as stubs. The concentration shows in the fact that the top 3,000 editors have written 55% of the articles on the entire site. The next 2,000 editors have written only 5% of the articles, which means that 60% of the articles on this site have been written by 5,000 users, or .026% of the site's overall users. Of note are the numerous IP addresses that show up on these page creation lists, as before 2005 users were allowed to anonymously submit articles (a feature which was removed because of the Seigenthaler incident). On the list, the IP address 67.173.107.96 has 983 live article creations, a number which places it at 459th on the list.
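The article-creation figures can be tallied the same way. The sketch below is only an illustration: the tier sizes and percentages are the ones quoted above, the total user count is an assumption borrowed from the 20,590,000+ figure given earlier, and the names are hypothetical. With that denominator the 5,000 creators come out at roughly 0.024% of users, a little below the .026% quoted; the article does not say which user count it used, so the gap may simply reflect a different total.

```python
# Assumed denominator: the 20,590,000+ user count quoted earlier in the article.
TOTAL_USERS = 20_590_000

# (number of editors in the tier, percent of all articles they created)
tiers = [(3_000, 55.0), (2_000, 5.0)]

cumulative_editors = 0
cumulative_share = 0.0
per_editor_shares = []
for editors, share in tiers:
    cumulative_editors += editors
    cumulative_share += share
    per_editor_shares.append(share / editors)
    print(f"top {cumulative_editors:,} editors: {cumulative_share:.0f}% of articles")

# Fraction of all registered users that the 5,000 creators represent.
user_fraction = 100 * cumulative_editors / TOTAL_USERS
print(f"{user_fraction:.4f}% of users")

# How much more an average editor in the first tier created than one in the second.
dropoff = per_editor_shares[0] / per_editor_shares[1]
print(f"first tier creates about {dropoff:.1f}x more per editor")
```

The per-editor ratio of roughly seven to one between the two tiers is one way to see how steeply the curve falls off after the first few thousand creators.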

What does this all mean?

Top article creators when compared to the rest of the community

One question that should be asked is why so few editors are writing so many articles. Wikipedia can often be harsh to new users, as the number of rules, both written and unwritten, can scare off even the most dedicated of writers. Those who stay seem to be the ones who want to contribute and write more for the site, but the data seems to show that they are an incredibly select few when compared to the more than twenty million usernames that have been registered over the years. Furthermore, with declining editor counts, this concentration is only going to become more of an issue over the years, as the Wikipedians who are left will probably start expanding into more niche topics, ones that are not easily researchable by the average person with stable internet access.

Another question that this raises is what the costs are of having so few editors who write so many articles. In theory, having fewer users write more articles brings standardization to the site, as there are fewer differences in prose and article quality. In reality, though, having so few users means that there is going to be an implicit bias in what is written, to degrees which have already been shown through the work of the Wikimedia Foundation. With the already low number of women on the site, this means that there will be more coverage of male-oriented topics. If such a topic is not covered immediately, there is a good chance that an article on it will be created in the coming years. Unfortunately, this means that whatever female-oriented topics are out there will probably get further neglected, as there is less of a chance that someone will even know that the subject exists, never mind whether it is notable enough for an article (when in doubt, go for it). The number of these super page creators only exacerbates the problem, as it means that the users who are mass-creating pages are probably not covering neglected topics, and this tilts our coverage disproportionately towards male-oriented topics.

Finally, the last question is why the majority of editors are responsible for only the remaining 40% of the articles. Most users are aware of the Article wizard, while fewer know about Articles for creation (side note: if you can, please volunteer there, as they have been flooded in the past couple of years by new articles and are in need of knowledgeable Wikipedians for reviews). Oftentimes, articles created by inexperienced users in either of these two venues are deleted or shot down before the users have any idea what is going on. This can be discouraging and dissuades users from helping out. Other times, they will come seeking help, but will get discouraged when the topic that they have been working on is deemed non-notable. Most likely, many more Wikipedians out there have attempted to create an article, but because it was deleted, the data skews slightly further towards experienced Wikipedians, who then go on to hold a slightly larger majority of article creations as well.

What can we do to fix this?

The Teahouse has been a successful model of helping new editors along in the process. Through providing guidance to new editors, they have found great success in their endeavors. Additionally, mentoring editors and guiding them towards articles that they might not have originally thought of working on can be a good way to direct their enthusiasm into something positive. By channeling talent and steering editors onto viable paths, it is possible to ensure that more knowledge will be present on the site in the coming years. Finally, the Wikipedia Education Program and Wiki Education Foundation have also attempted to make inroads in the classroom, by encouraging students to become more involved with the community through their school work.

The final part of this is whether or not these attempts will work. A community that is dedicated to fixing and addressing the issues that exist on the site is a community that will succeed. In the past, many reform ideas have been met with resistance from the community, often with mixed results. Other times, approaches to fixing these issues run counter to what others in the community want to do, so some editors end up unintentionally (or intentionally, for that matter) sabotaging the intentions of reform-minded users, although this can also be expected in a large community where people have differing views.

In the end, it is up to us as a community to ensure that the site continues for another thirteen successful years, as we are part of one of the greatest social, intellectual, and academic experiments on the internet. Our success in the coming years will depend on how we choose to address these issues, so it is imperative that we attempt to correct them while there are still people interested in editing the site, so that we can continue to strive to be the most important encyclopedia in the world.

Power law of the top Wikipedian editors, which is similar to the article creation and bot curves

Discuss this story


Early discussions

Consider also that most vandalism, self-promotion and other blatantly non-constructive edits fall within the orange pie, and it becomes obvious that while more than a billion could edit, and tens of millions indeed do edit, overall it is just a few thousand people that write most of the useful content on Wikipedia. I wouldn't be surprised if you would find similar trends within contributors to high-quality articles (FA, GA). --ELEKHHT 01:03, 25 January 2014 (UTC)[reply]

That actually would be a good thing to explore in the future, as I am sure that there is data on that somewhere on the site for the things that you mentioned. Thanks for the suggestion! Kevin Rutherford (talk) 02:28, 25 January 2014 (UTC)[reply]
Wikipedia:List of Wikipedians by featured article nominations. Resolute 06:19, 25 January 2014 (UTC)[reply]

The last graph is amazing (and more than a bit scary). Am I right in thinking though that the concentration of edits among a fairly small proportion of accounts is not a new development? Thanks for this interesting article BTW. Nick-D (talk) 02:49, 25 January 2014 (UTC)[reply]

Yeah, it's one of those things that is incredibly telling, as the power curve exists with all of the groups shown. It's definitely not a new development, as this likely has been occurring for years, but I only started charting it in 2011, when the data lists came out. Thanks for the compliment though, and I look forward to doing a follow-up article in the future on this phenomenon! Kevin Rutherford (talk) 03:46, 25 January 2014 (UTC)[reply]

As we are interested mainly in content, perhaps we should strip out bot edits. In my experience, these tend to be non-content in nature. -- Ohc ¡digame! 07:34, 25 January 2014 (UTC)[reply]

Participation inequality is commonly found in many Internet projects. Very true within all Wikimedia projects, and with smaller projects between them. Interestingly, in my recent (still collecting data) pattern on Arbcom, I am seeing it even in the Arbitrators activity. It's everywhere, probably could even find it if we run some metrics on Signpost contributors... --Piotr Konieczny aka Prokonsul Piotrus| reply here 11:01, 25 January 2014 (UTC)[reply]

Demographics

Part of what WP has been experiencing is a normal fad cycle, in which WP broke big c. 2005 and has sort of gradually declined since as the proverbial "low hanging fruit" vanished, footnoting became more rigorous, the rulebook expanded, and so forth. People get older, they get jobs, have families, and get on with other things in life; it's normal. The question is how to replace those departing with newcomers, who in my mind's eye have specialized knowledge able to take foundation articles to a higher level and build gingerbread topics on the edifice. I figured it would be a new corps of "grey rookies": tenured and emeritus professors winding down their active teaching careers but seeing value in the educational mission of Wikipedia. I figured the one thing that was really needed was a WYSIWYG editor that was more or less as simple as MS Word.

Well, that entire vision is in question, given the abject failure of the VisualEditor project. So, maybe there's no new crop of converts to the WP mission, maybe we continue to add to our writing corps one person at a time, rather at random, much like we always have. It's a difficult puzzle to solve. I see hope in those academics seeking to make WP improvement part of their class projects, but I must frankly say that the reality of the work doesn't meet the expectations from what I've seen so far. Still, that's one potentially fertile ground for development. And hurray for those who work patiently with newcomers, teaching them the ropes, because ultimately that's how the process of developing new contributors works. Carrite (talk) 03:03, 25 January 2014 (UTC)[reply]

I definitely agree with you, and I had the option of exploring all of these options, but I really didn't want to write a 50K Signpost article, as no one would read it. Based on the feedback so far, I might create a series of articles in the future exploring the phenomenon further, so I look forward to what others say in order to plan out some viable ideas in the future. Kevin Rutherford (talk) 03:46, 25 January 2014 (UTC)[reply]
A series of articles on what you're finding out about the editor pool would be great! Everyone can discuss and offer opinions, but most of us aren't going to dig in and crunch through the site data. For example, I'd be curious to know more about editors adding inline citations, and about editors who add content that improves tagged articles. Djembayz (talk) 23:58, 27 January 2014 (UTC)[reply]
Very good points. What we have since 2005 is a maturing Wikipedia, that is ever harder to improve by kids, but is not yet attractive enough for the "grey rookies". Yet we fail to attract them and to provide an editing environment that makes them stay. I know there have been ideas around to address this [1], [2] but it did not happen yet. --ELEKHHT 04:22, 25 January 2014 (UTC)[reply]
Carrite, I don't agree the visual editor has been an "abject failure." The beta rollout may have been premature, but the Foundation is still working hard on the software, and doesn't plan to stop anytime soon. In my opinion, the editor is quite close to achieving its goal of generally stable and intuitive WYSIWYG editing, although I admit that would be irrelevant to those who are philosophically opposed. —Neil 09:21, 25 January 2014 (UTC)[reply]
Well, riddle me this: what's the percentage of editing being done on en-WP with this non-abjectly-failed tool? I believe the answer, rounded to the closest whole percentage point, is 0%. Someone please feel free to correct me if I'm wrong. I have no doubt, however, that money will continue to be poured into the project for as long as there is money to pour. I have no philosophical problem with WYSIWYG. I do fear that the ship has sailed and our markup code has so many intricacies that it will be impossible to get there without breaking the wiki. And I'd rather face facts than break the wiki. Carrite (talk) 19:26, 25 January 2014 (UTC)[reply]
Well, Carrite, all I know about editing patterns is that 24,000 editors on this wiki have opted in to the VE beta feature. Actual use might well be very low. But current use is not the point—it's still under significant development, and at some point in the future it will offered to all users, registered or not, by default. We'll see how it shakes out then; I'm confident it'll win plenty more converts. And I think the editor has long had a handle on the fundamental issue of translating from wikitext to HTML and back. It's now about additional features, like table editing and support for complex scripts; better user interface, for example for adding references; snappier loading; and so on. All of that has technical challenges, but nowhere near the level of impossible. —Neil 07:21, 26 January 2014 (UTC)[reply]
There are only 30k editors who are "actives" in the WMF definition, which means 5+edits/mo. Are you saying that eighty percent of those people are using VizEd? Or are you saying that, out of 20 million total usernames registered and abandoned over the years, point-one-percent of them opted into the VizEd before leaving? How many of those 24k people are in the 99+edits/mo group? 74.192.84.101 (talk) 17:57, 26 January 2014 (UTC)[reply]

Thank you for your excellent article Kevin Rutherford. Looking at the data, I don't take it as a given that it shows a problem. It might. It might show a functional self-sorting. The Wikipedia is pretty much a new thing so we don't know what the edit distribution is supposed to be 13 years in.

If it does show a problem, is the problem in the distribution or the raw numbers? If it's in the distribution, we could address it by imposing a cap on number of edits per month. That this would be foolish indicates to me that it's probably not a distribution problem. So if it's a problem, it's raw numbers. So to solve the problem, we want to multiply by n the number of editors making 2000 edits/month, 500 edits/month, 20 edits/month, and so on, without much caring if the distribution changes. Well of course, ideally. (Although no change carries no downsides; n=2 could see degradation in average value of an edit, for instance; but I think most people would agree that more editors would be better, within reason.)

But what is the value of n such that we would be able to say "problem solved"? We can't say. So we're saying "some more editors would be nice". Which is true, but doesn't prove that we have a problem now. Herostratus (talk) 03:38, 25 January 2014 (UTC)[reply]

I think the problem is more in the fact that it is indicative of the unintentional bias that has occurred over the years, and has compounded into what is shown on the charts. Having the raw numbers by themselves means nothing, but when you add our coverage of articles to the equation, the issue becomes more apparent, especially when we are at this age as a site. In the end, these issues will probably never be fixed completely, but the fact that we are working towards it and working towards increasing coverage means that we will make it more reader-friendly at the end of the day. Kevin Rutherford (talk) 03:46, 25 January 2014 (UTC)[reply]
  • Presumably you are using cumulative data that exclude those on deleted pages in mainspace. Trends would be more apparent if periodic, maybe annual, deltas were analysed. A separate analysis could exclude accounts with low or no activity from the deltas. -- Ohc ¡digame! 07:28, 25 January 2014 (UTC)[reply]
I've always held that the pool of potential Wikipedia contributors is finite, & smaller than the WMF believes -- or tells potential donors. Even under ideal conditions -- no harassing of other editors, a user-friendly interface, etc. -- there are people who would not contribute even a typo or spelling correction just because. (Years ago I had one friend I attempted to recruit to contribute to Wikipedia, whom I felt had the perfect personality & inclination for the project. His reaction: "Uh, I'll think about it" -- & to my knowledge he hasn't even looked at one Wikipedia article.)

What I think would test my hypothesis would be to do an analysis of the data over the last 10 to 12 years. Identify on a month-by-month basis the people who contributed 50% of the content. (Based on Kevin's article, this appears to be a breakpoint.) Has that number fluctuated greatly over the years? Then look at the usernames in this group: what is the turn-over? Do people routinely appear in it -- is there an identifiable "core" of contributors -- or do people routinely show up for only a month or two, then drop out of it for several months? If there is an identifiable core of contributors to Wikipedia based on these two ways of crunching the numbers, then there is a limited pool of contributors. And if the WMF wants to effectively solve issues of system bias in Wikipedia, it needs to do so in light of this information. (And if such a study proves my beliefs are incorrect, I'll admit I'm wrong & stop mentioning it.) -- llywrch (talk) 20:52, 27 January 2014 (UTC)[reply]

Teahouse

Regarding this:

The Teahouse has been a successful model of helping new editors along in the process. Through providing guidance to new editors, they have found great success in their endeavors.

I was under the opposite impression - that the data gathered, so far, indicates the Teahouse has had relatively little success. Could someone point to a definitive analysis? -- John Broughton (♫♫) 03:48, 25 January 2014 (UTC)[reply]

This link suggests that, but although the Forbes article doesn't outright state that, it is implied there. A lot of the articles out there are by Sarah, so I decided to search for independent ones when stating that. The fact that it is still active after almost two years is also a good sign of its success, as I was very skeptical of it when it was first proposed, but I have begun to respect the project over the years due to their outreach. @Sarah Stierch might be able to provide more detailed information about this, so I will ping her just in case she has more detailed information. Kevin Rutherford (talk) 04:12, 25 January 2014 (UTC)[reply]

It would be nice to see some academic metrics. Is WMF running any studies on this? They should be. (Btw, wasn't Teahouse grant-supported?). --Piotr Konieczny aka Prokonsul Piotrus| reply here 11:07, 25 January 2014 (UTC)[reply]

To the first question, nothing that I could find, but I also found a lot of articles trumpeting it online. To the second question, that is possible, although I am not sure what the true origins of the program are, besides what is easily accessible out there. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
metrics here [3], and here [4]; and the negative impact of betacommand is here [5]; yet the betacommand method is followed not teahouse. 76.161.242.74 (talk) 18:59, 26 January 2014 (UTC)[reply]
Conference paper and metrics report (thanks for linking) describe promising short-term trends, long term retention analysis of a larger sample of Teahouse guests is possible. The data are there. You can also view monthly activity metrics here. Yes, TH was a fellowship project, kind of a precursor to the current Individual Engagement Grants program. Thanks to Kevin for the excellent article! - J-Mo Talk to Me Email Me 23:44, 31 January 2014 (UTC)[reply]

Longevity

While I agree that some Wikipedians have become ridiculously prolific editors, I would guess that those with the highest edit counts are also those with the longest longevity. Does Wikipedia have a class of rapid adopters that go from 0 to 10,000 edits in just a couple years? Is it more likely a cohort of Wikipedians that joined pre-2006 have stayed and racked up the edits while new editors have in the past few years joined and quit or perhaps haven't had time to be as prolific? That longevity (I would surmise) has something to do with survivability at ANI, RfA, and ARBCOM where old hands will find stability as newer members find the door. Chris Troutman (talk) 08:30, 25 January 2014 (UTC)[reply]

I think it's mostly the rapid adopters, though I'd have to spend a few minutes (which I don't have ATM) refining some data to give you exact numbers. If anybody wants to do so, just look at the Wikipedia:Most active editors, harvest the date they started (and stopped) editing, and see how many years it takes them to get to that list. --Piotr Konieczny aka Prokonsul Piotrus| reply here 11:09, 25 January 2014 (UTC)[reply]
It's possible to start new and quickly learn the curve, but then you also draw attention to yourself if you are pumping out three thousand edits a month, a few months in. At the same time, the top editors are ones who have been here awhile, but only because to have 100,000+ edits, you are either editing Wikipedia as a full-time job, or are casually making bursts of edits over a period of years. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
As a point of reference, last week I was browsing thru the pages related to Wikipedia:Service awards, & found that very few long-term (by this I mean 7+ years of editing) are in the ranks of top contributors. The person who ranks the highest in both tenure time & edits is Michael Hardy at 10 years & 173K edits; those who have been around longer have fewer edits, & those who have more edits haven't been around as long. (This seems to be a rule about the service awards.) But then, "those who have been here a while" has a different meaning to someone who has contributed for, say, 18 months, vs. someone who has contributed for over 10 years, so I may be missing your point. -- llywrch (talk) 19:39, 27 January 2014 (UTC)[reply]
@Chris troutman: sure; I'm one of those. Over 20,000 edits in less than three years. Toccata quarta (talk) 10:34, 26 January 2014 (UTC)[reply]
Same here: 27k+ edits since I became active in spring 2011. Curly Turkey (gobble) 23:48, 31 January 2014 (UTC)[reply]
User:Yintan. Managed 18k edits in one month. Now retired, due to the 2-steps-forward-2-steps-back rate of progress fixing things around here, from what I can grok. 74.192.84.101 (talk) 17:34, 26 January 2014 (UTC)[reply]

Holy Guacamole!

Thanks to you I just found out I am in the top 500 of article creators. I find that surprisingly sad, because I often claim that my edits are just a drop in the bucket of what needs to be done. This also reinforces my firm belief that the article creation process is really crucial to the health of the project and definitely needs more work in order to keep from scaring newcomers away. In outreach efforts, I always emphasize using a "similar existing article" to the type of article any newcomer wants to work on as an example of how to approach article creation. This approach has helped at least one person who I met briefly again last Saturday after she made 49 new articles with the help of another veteran editor. In-person outreach is the best way to get new people editing or to spur existing editors into editing more. Jane (talk) 09:29, 25 January 2014 (UTC)[reply]

I've used that approach as well, as I've noticed that I learn easier by copying something and then modifying it for my tasks, but outreach is also important in this approach, especially if you are good at hands-on teaching. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
I totally agree: the best way to help people to edit Wikipedia is to meet them in person and teach them side-by-side. --NaBUru38 (talk) 17:40, 28 January 2014 (UTC)[reply]
@Jane: Using a "similar existing article" is an excellent idea. But how do you locate such articles when the categories at the bottom of articles are constantly disappearing through wp:CfDs? Have you found another method? XOttawahitech (talk) 23:32, 9 February 2014 (UTC)[reply]
Hi Ottawahitech, I have very rarely seen a category that I use get nominated for deletion. Perhaps my corner of the Wikipedia universe is pretty stable. However, I ran into a big difference in categories when I participated in an edit-a-thon for heritage sites. I worked with a Japanese Wikipedian to translate a Dutch heritage site into a new article on the English Wikipedia (made by me) and a new article in the Japanese Wikipedia. I put some evaluation slides about this experience on Commons here: File:Edit-a-thon 12 October 2013.pdf. Jane (talk) 22:33, 17 February 2014 (UTC)[reply]

Error?

"Most users are aware of the Article wizard. Even fewer know about Articles for creation." I think there is a logical problem here? --Piotr Konieczny aka Prokonsul Piotrus| reply here 10:57, 25 January 2014 (UTC)[reply]

Yeah, you're right, and I'll go correct that now. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]

How big's the pool?

How many self selected, volunteer, encyclopedists, who are adept at research, writing, and coding references (who no doubt find all of those tasks tedious, at times - and at other times pleasurable, or, at least worth it) are there in the world? Alanscottwalker (talk) 16:01, 25 January 2014 (UTC)[reply]

I'm tempted to say "about 10,000," which seems to be where the cumulative "Very Active Editor" count across all wikis has been hovering. The pop culture stuff will always have creators, in my view. The paid editors for really boring commercial topics will always be around... The question is how to get educated people to contribute to niche academic topics. Those people are out there; getting them past the walls of cumbersome wikicode (or the unoperational alternative editor), massive style manuals, hyperactive vandalism fighters, and the esoteric and frequently dysfunctional anarcho-liberal political culture here is the big question. That is the tough one... Carrite (talk) 19:20, 25 January 2014 (UTC)[reply]
I'm tempted to have it expanded to the next 5,000, but based on how the trends are occurring, it would probably only grab in the next five to seven percent of the article creators. At this point, there is probably a huge overlap in the data, as I doubt people who have created hundreds of articles have a small amount of edits. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
Why do we want every editor to be highly motivated, unpaid, good at sourcing, excellent at authoring, and a master of markup? Plus have oodles of time on their hands? Agree there are probably somewhere between 1k and 10k folks like this, in the English-speaking world... drawn from over a billion speakers of that language. But the whole point of wikipedia is to be the encyclopedia that anyone can edit, because that pushes the knowledge-summation-facilities out to the endpoints: the people who have the knowledge. This is always more efficient than trying to centralize the production of knowledge, into the hands of "the best" bureaucrats. Carrite may disagree with me about the economics analogy here, but Nupedia and Citizendium were failures, and wikipedia has thus far been a success, *because* the former were centralized walled gardens, and the latter pushed hard to be the encyclopedia anyone could edit. Do we want ~5k super-editors, or do we want ~5M editors that each help a little bit? Hope this helps. 74.192.84.101 (talk) 17:51, 26 January 2014 (UTC)[reply]
That then is what you will get: a few thousand who do all that stuff, and a few multiples of that number who do an edit here and there, and they will all edit each other, And, that seems to be what we have. Alanscottwalker (talk) 18:04, 26 January 2014 (UTC)[reply]
Funny... in a way you are correct. That *is* what we have. 20M who signed up at one point or another, and made a random edit or two, then left. A central core of ~500 active admins and ~2500 heavy contributors, which differs in *membership* over time, but will probably never go away, no matter how snarky and byzantine the editing-environment gets around here. But what I'd like to see is a growing editor-count, in both the 99+edits/mo category, and also the 5+edits/mo category, until we have double or quadruple or more. We've got 500M readers that show up every month. Surely more than 0.006 percent of those people would make good light-duty contributors, fixing one small thing a week, or thereabouts. And yet, we only have 30k, not 300k, not 3 million light-duty editors. Surely one reader in a 100, or even one reader in a thousand, is worthy of editing the encyclopedia anyone can edit. 20M people have tried, and 20M have failed to stick with it, rounding to the nearest million. Is that good? It seems non-good, to my eyes. Are you happy with the trendlines,[6][7] Alanscottwalker? Editor-count heading down, readership heading up. (The pageview-downtrend is caused by the goog's Knowledge Graph which monetized infoboxen.) I'd rather see more actives-per-reader, not fewer. 74.192.84.101 (talk) 00:11, 27 January 2014 (UTC)[reply]
Don't know. If what the pedia needs is intensive editors, for example to bring say many of the Vital articles to a really good level, than those are the editors that are needed, at this stage of the project. Alanscottwalker (talk) 00:39, 27 January 2014 (UTC)[reply]
Pointing out that that argument puts a heavy burden on said intensive editors - it would be better for them to be assisted by lots of less intensive editors, and not only to guard against systematic bias, but also because they may not necessarily all have the inclination to work on precisely the task of raising an identified set of articles to FA quality. In addition, systematic bias is an important issue - by definition, an encyclopedia aims to be comprehensive. I take issue with the common assumption that at this stage the encyclopedia is anywhere near reaching that level. There are still a mammoth number of missing articles, in my perception. More generally, it goes for both editors and articles: we need all kinds for efficiency and to match readers' needs. Yngvadottir (talk) 05:50, 27 January 2014 (UTC)[reply]
OK. We need all kinds -- and by that you mean? "Good edits" or at least not bad? I certainly agree that all different kinds of backgrounds would be better than not; I would just note that for all of them, we expect roughly the same thing in editing, regardless of background (and, yes, in dealing with each other). This is straying a bit from my original point, which was certainly not that "we want every editor to be highly motivated, unpaid, good at sourcing, excellent at authoring, and a master of markup? Plus have oodles of time on their hands?" It was not about want of any kind, but to note that there are a finite number of skills for this project, and a finite number of inducements to do it, which of necessity have to match the interest of the primarily self-selected, and compete with their other interests - leading to a finite pool. But I do agree that if a reader finds something useful/interesting/pleasurable to them here, they are probably more likely to try editing (and may decide to enter into the demands between reader and crowd-sourced-encyclopedia-editor). Alanscottwalker (talk) 19:16, 27 January 2014 (UTC)[reply]
I was responding to your suggestion that "what the pedia needs is intensive editors, for example to bring say many of the Vital articles to a really good level" (and trying to be brief). By all kinds I mean that we need not only people interested in bringing articles up to "a really good level" - which I interpreted as FA level, but the same goes for GA level - and not only people interested in working on those articles that have been labelled "vital". We also need people interested in fixing the plethora of spelling, grammar, and stylistic errors in articles, which lower our general standard and in some cases make articles very hard for readers to understand; we also need people interested in creating the articles we still lack, which most of us are probably not even aware we lack (systemic bias and the sheer size of the accumulation of recorded human knowledge at this point; your suggesting that what we really need at this point is to focus on those articles that have been identified as vital in itself suggests this, it's a common position but one I strongly disagree with that the encyclopedia is approaching completion in its coverage); we need intensive editors, sure, but we also need as many as we can get of those who only make an occasional edit, both to take some of the load off the intensive ones and because occasional editors are more likely to later become intensive editors than are people who never edit - in addition to which, we advertise this as "the encyclopedia anyone can edit", and it's easy and quite destructive to give people who try it, or have been doing it occasionally, the impression that that is a lie, we only like intensive editors or editors who've been doing it for years. 
In response to your follow-up: it would be lovely if every editor had lots of free time and was highly motivated to spend it editing Wikipedia, had good access to sources and both motivation and know-how to insert them in articles, had mastered the markup and was also a good writer. (I'll add that it would be lovely if they'd all learned English grammar in school.) But the essence of this being a wiki is that we can and do fix the markup, the poor writing, even the underwhelming referencing. These things do not all have to be features of the same person. What we can't as easily fix is holes in our coverage of topics among the editing corps - missing people - or even a lack of people willing and able to go through articles making them better, which includes our collection of more or less embarrassing microstubs as well as the articles with more or less horrendous English or more or less horrendous errors and omissions. So - we need all kinds of editors; or more accurately, I think it's counterproductive to suggest we only need one kind. Does that clarify? Yngvadottir (talk) 17:33, 28 January 2014 (UTC)[reply]
I am not sure what lie or destruction you are talking about, but the article above was talking about the pool of "active" editors, so that is the context of these comments. Moreover, it is not my suggestion, but a recent study I recall reading here that made the claim that what Wikipedia needed was to get its vital articles up to snuff or it was going to languish in mediocrity; that is why I said "if" in response to the claim that we should go back to the "old days" of contributor numbers. Alanscottwalker (talk) 17:53, 28 January 2014 (UTC)[reply]
Yes, I saw the "if", but I'm in deep disagreement with that point of view. I don't think it's a matter of the old days as recognizing that people differ and our fundamental strength is being able to draw on all of them. Yngvadottir (talk) 18:32, 28 January 2014 (UTC)[reply]
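
The reader-to-editor ratios quoted earlier in this thread (roughly 30k light-duty editors against 500M monthly readers) are simple to check. The sketch below only reruns that arithmetic; the function names are illustrative, and the input figures are the commenter's round numbers, not independently sourced statistics.

```python
def share_percent(editors: int, readers: int) -> float:
    """Return editors as a percentage of readers."""
    return 100.0 * editors / readers


def editors_at_rate(readers: int, one_in: int) -> int:
    """Return how many editors there would be if one reader in `one_in` edited."""
    return readers // one_in


# Round figures taken from the comment thread above (assumptions, not measurements).
MONTHLY_READERS = 500_000_000
LIGHT_DUTY_EDITORS = 30_000

if __name__ == "__main__":
    print(f"current share: {share_percent(LIGHT_DUTY_EDITORS, MONTHLY_READERS):.3f}%")
    print(f"one reader in 100:  {editors_at_rate(MONTHLY_READERS, 100):,} editors")
    print(f"one reader in 1000: {editors_at_rate(MONTHLY_READERS, 1000):,} editors")
```

On these figures, one reader in a thousand would already mean about 500k light-duty editors, which sits between the 300k and 3 million mentioned in the comment.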

Where am I?

I should be in the 5,000s -- having done 11,097 edits. But I'm not there. How come? Smallchief (talk) 17:17, 25 January 2014 (UTC)[reply]

You're number 5,574 on page two with 11,233 edits. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
Thanks for hunting me down. My user page gives my edit count as 11,097 as of Jan 24. This list has 11,233. Why the difference? Smallchief (talk) 00:08, 26 January 2014 (UTC)[reply]
No problem! There are actually three to four different counts for this, with each number relying on different things. Personally, I like this link, as it gives you the most comprehensive totals, and includes your deleted contributions as well. Kevin Rutherford (talk) 03:01, 26 January 2014 (UTC)[reply]

I was having a discussion recently with an editor who hasn't been here that long, and had never seen anyone deliberately leave a redlink in an article. I think that is worth more of an OMG than the raw statistics on who creates new articles.

Given another conversation with someone who had taken out one of my redlinks, which had a connected conversation when I finally created the article and was told (by another editor) that it was "like a Christmas present", the culture on redlinks has taken a wrong turn. They are not to be regarded as "defects", but (as ever) growth points.

WP:REDLINK is fine in its nutshell version: further down is where BLP concerns may be overdone.

I have worked on redlink lists for most of my time here. More structured effort on missing articles could help newcomers. Charles Matthews (talk) 18:39, 25 January 2014 (UTC)[reply]

Charles, I actually do that all the time, as the links are to potential future articles. Kevin Rutherford (talk) 23:26, 25 January 2014 (UTC)[reply]
I've seen editors go through my watchlist removing redlinks with the edit summary "cleaning up" or "fixing broken links." It's quite sad. - BanyanTree 04:44, 29 January 2014 (UTC)[reply]

Mark Twain

On the culture of the site, I do find the statistics strange. Since 2004 I have created over 80 new articles from scratch, yet my personal statistics show me making less than 2000 edits in that time. I think the reason is that I write and code a new article on an off-Wiki text editor, and only when I am satisfied with it will I paste it to the Wiki editor for a final polish before saving to Wikipedia. This means that all the editing effort, often over several days of editing time per article, is not accurately represented. Apwoolrich (talk) 19:27, 25 January 2014 (UTC)[reply]

I'm the same, in that I generally create a nice WP:WALLOFTEXT in the thirty-kilobyte range, and then save it all in a wallop. Ktr101, in your next series, can you do some work on separating out the *kinds* of edits? Twinkle-supported vandalism-reverts are important and useful, but tend to turn into WP:MMORPG. You are explicitly calling editicountitis "writing" the encyclopedia, but in fact edit-count is heavily skewed towards favoring folks that spend time protecting the encyclopedia. Actual writing, such as Apwoolrich is engaged in, can best be measured by number of bytes added which were not deleted within a week. This set of statistics would give you an orthogonal look, to show the power-curve amongst writers/inclusionists. Currently, I'm guessing your power-curve is mostly showing protectors/deletionists. Both are needed, and they are rarely the same humans. 74.192.84.101 (talk) 17:43, 26 January 2014 (UTC)[reply]
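
The metric proposed in the comment above, bytes added that were not deleted within a week, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Edit` record and its fields are hypothetical, not a MediaWiki API, and each edit is treated as either wholly surviving or wholly reverted, whereas a real measurement would need diff-level tracking of which text persists.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


@dataclass
class Edit:
    """Hypothetical edit record; the field names are assumptions for illustration."""
    user: str
    timestamp: datetime
    bytes_added: int  # net size change; negative for removals and most reverts
    reverted_at: Optional[datetime] = None  # when the edit was undone, if ever


def edit_counts(edits: Iterable[Edit]) -> Dict[str, int]:
    """Raw edit count per user, the measure used by the edit-count lists."""
    counts: Dict[str, int] = defaultdict(int)
    for e in edits:
        counts[e.user] += 1
    return dict(counts)


def surviving_bytes(edits: Iterable[Edit],
                    window: timedelta = timedelta(days=7)) -> Dict[str, int]:
    """Bytes added per user, skipping edits that were reverted within `window`."""
    totals: Dict[str, int] = defaultdict(int)
    for e in edits:
        if e.bytes_added <= 0:
            continue  # removals and pure reverts add no content
        if e.reverted_at is not None and e.reverted_at - e.timestamp <= window:
            continue  # content did not last a week
        totals[e.user] += e.bytes_added
    return dict(totals)
```

Ranking the same set of edits by `edit_counts` and by `surviving_bytes` would give the two orthogonal views the comment asks for: one dominated by high-frequency maintenance work, the other by editors who save large drafts in a single edit.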

Again...

I have to say this every time bias discussions come up on Wikipedia: the people who edit are the people who edit. There's little we are going to do to change that. The distribution of editors is determined by large-scale social forces and we are mostly powerless against them. We aren't going to get as many women as men editing because women think they have better things to do. (Object to that? Well, it was one of the leading reasons women gave for not editing in a survey by the WMF to study the gender gap.) So unless the WMF is going to start offering baby sitting services to editors or start forcing people to do something they don't want to do, we are going to miss out on some demographics. I personally think, while editor demographics is interesting, efforts to "combat" it are largely futile and, at this point, somewhat blasé. Jason Quinn (talk) 03:03, 27 January 2014 (UTC)[reply]

Aaron Swartz article

  • I am surprised you didn't mention a closely related article by Aaron Swartz from 2006 on this topic. Basically, he questions a few of your assumptions and goes to some length to show that they might not be as indicative as you think and suggest in your article. Among them is that edit count falls short as a proxy to measure who added actual content to Wikipedia: what Aaron found is that most of the content comes from the long tail of contributors, whereas cleaning up the content, wikifying it, fixing grammar and typos increases edit count (and these are all crucial tasks), but does not offer much insight about where the content comes from. Also, the number of 20 million registered editors is a red herring: what would be more interesting would be the number of actual contributors, i.e. users who contributed content to any main namespace article that did not get reverted. I guess this number is one or two orders of magnitude smaller, which would put the rest of the findings in a better relation. Having said that, thank you for the interesting article and read, even though I do not fully agree with your methodology and I am not convinced by your findings, but keep up the work! --denny vrandečić (talk) 20:08, 28 January 2014 (UTC)[reply]
Denny, thanks for posting that link! When I first read the article I was thinking the same thing and I started rooting around looking for the Aaron Swartz article and couldn't find it. Though Wikipedia has changed a lot since then, I think the main gist of what he wrote is still true. So while the bot-like repetitive tasks cause necessary edits to articles, the main meat of those articles is still based on edits by low-edit-count individuals. I myself am a "niche-player", and I would guess there is a bulk of short article stubs out there created by other people like me, interested in one specific niche, but there are also lots of "drive-by" article creators who work on one thing and don't come back for years. Jane (talk) 00:02, 30 January 2014 (UTC)[reply]
Well, that is from 2006. Maybe it was true then: I was not editing here at that time. Now, it might not be the case as much. (I don't actually know for sure: there are a lot of variables and it is hard to take all of them into account.) WP:1EDITMYTH is a recent (2013) essay on this by Wikid77: it reaches the opposite conclusions, although it is obviously still not a complete study, albeit more recent.
The earlier you go in WP history, the more likely it is that you will see what Aaron found, and the more clearly you will see that: because many important articles were then at a rudimentary state and pretty much any passing reader could add something. Now it can sometimes get very difficult to do this to some vital articles. (Mostly the ones that are GAs or above, although this is sometimes true even at lower ranks on the quality scale.)
Some questions I'm interested in: new editors may well still write much of the text for stubs, but how often is this content cited? (And if it is not, then how do we know if it is true?) And how often is this content well-formatted and grammatical? (Like it or not, good style does contribute to the reader's image of this encyclopaedia.) (Note: these are not rhetorical questions. I am actually genuinely curious about approximate answers to these questions.) Double sharp (talk) 13:22, 30 January 2014 (UTC)[reply]
Hm, so you disagree with Swartz' findings because they are old. But at the same time you admit that there has been no new study with regards to that. Also, the essay you cite has been criticized for the same reason on its respective talk page: the essay doesn't report on any research, it just states that it does not believe the findings and that you merely have to think about it in order to see the truth of that.
It might be that the Swartz study is dated and wrong today, but I would like to actually see it repeated in order to know the facts. A simple statement "today everything is different" does not persuade me. And yes, I know that doing this kind of research is hard, but that's why we should read when someone has done it, and learn from it. --denny vrandečić (talk) 23:39, 3 February 2014 (UTC)[reply]
  • Old phrases are kept years yet not sourced: I have not analyzed new-editor citation levels, but I have noticed in many hundreds of articles how the initial text of the first revision is preserved, almost revered, for many years long after created. It is almost bizarre to think that dozens of follow-on editors would not rewrite the intro wp:lede text, or reword sections for better clarity, but they don't. Many later edits to pages merely expand the text, or delete old phrases, but very few people actually rewrite an article to re-explain the concepts, and that is why exact phrase-tracing to prior edits can be checked for years. In many cases, phrases added by various users can be found, years later, almost unchanged in the overall text.
    Although the essay I started, WP:1EDITMYTH, does not provide many details, it can be shown even in 2002-2003 how experienced editors were mainly creating and expanding articles, not much by passing strangers of a few edits. A key example is the article for singer "Édith Piaf" created by IP 209.105.200.54 on 26 July 2002 (see: hist). Well, after created, as title "Edith Piaf", that page was mostly expanded that year by IPs 200.100.200.*, who were all 250 IP addresses of the same person (as deduced by narrow focus on French subjects). Another editor who expanded the Piaf page in June 2003 was User:Arpingstone, still active this week (contribs). Other major text was added by User:DW (contribs), who edited many pages for 6 solid months (blocked 4 years later), adding content to hundreds of pages (not a hundred drive-by editors each adding a paragraph, but rather 1 username adding, adding, adding to every page). Those actions are typical: there are some editors who add/add/add content, but do not copy-edit for grammar, sources or wikilinks. Those IP editors who seem to add text for 1-2 days and leave "never to return" are often the same people with another IP address or username. I will try to expand essay WP:1EDITMYTH to better explain editor activity during years 2002-2005. -Wikid77 (talk) 17:04, 30 January 2014 (UTC)[reply]



       
