The Signpost

Special report

Towards "Health Information for All": Medical content on Wikipedia received 6.5 billion page views in 2013

By James Heilman and Andrew West
Editor's note: this article originally appeared on the blog of The London School of Economics and Political Science. It is reprinted here with the permission of the authors.

We recently published “Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language” in the Journal of Medical Internet Research (JMIR). Some of the key findings include:

  1. Medical content received about 6.5 billion page views in 2013. This was extrapolated from empirical per-article “desktop” traffic data plus project-wide mobile estimates.
  2. Wikipedia appears to be the single most used website for health information globally, exceeding traffic observed at the NIH, WebMD, WHO, etc.
  3. Wikipedia’s health content is more than 1 billion bytes of text across more than 255 natural languages (equal to four times the size of the 32-volume Encyclopaedia Britannica).
  4. Nearly 1 million references support this content and the number of references has more than doubled since 2009, with citation density outpacing raw byte growth.
  5. Article access patterns are heavily language-dependent, i.e., most topics show substantially differing popularity ranks across Wikipedia’s different natural language versions.
  6. The core community of editors numbers fewer than 300, having shrunk significantly over the last 5 years.
  7. The core community is about 50% health care providers; 85% have completed a university degree; 50% have more than one university degree.
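
As a rough illustration of the kind of extrapolation mentioned in the first finding, the sketch below scales summed per-article desktop views up to an all-platform total using a single project-wide mobile fraction. This is a minimal sketch, not the paper's actual method: the function name, the assumption of one uniform mobile share, and all numbers are hypothetical and chosen only to make the arithmetic visible.

```python
def extrapolate_total_views(desktop_views_by_article, mobile_share):
    """Estimate all-platform views from per-article desktop counts.

    desktop_views_by_article: mapping of article title -> desktop page views.
    mobile_share: assumed fraction of ALL traffic that is mobile (0 <= share < 1).
    Hypothetical simplification: the same mobile share applies to every article.
    """
    if not 0 <= mobile_share < 1:
        raise ValueError("mobile_share must be in the range [0, 1)")
    desktop_total = sum(desktop_views_by_article.values())
    # If desktop traffic is (1 - mobile_share) of the total, divide to recover the total.
    return desktop_total / (1 - mobile_share)


# Made-up example values, not figures from the paper.
sample = {"Article A": 500, "Article B": 250}
estimate = extrapolate_total_views(sample, mobile_share=0.25)
print(estimate)  # 1000.0
```

A single project-wide share is the simplest possible assumption; a per-language or per-article mobile share would change the estimate.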

The paper contains many more data-driven insights we encourage readers to investigate. Equally important, this evidence provides us further motivation to expand translation of medical content across natural languages, continue to improve the quality of medical content, and identify partnerships and fundraising opportunities that will allow us to overcome current challenges.

Single most used website for health information – Figure 1: Health care site traffic comparison. Light blue portions represent official Wikimedia Foundation data.

Translating medical content

Structured translation efforts for medical topics have been ongoing since 2012 as a collaboration between WikiProject Medicine (WPMED) and Translators Without Borders (TWB). Naive approaches like using Google Translate fail because of their inaccuracy (especially with technical medical content) and limited language scope. In contrast, the current WPMED-TWB collaboration operates in 100+ languages and has translated more than 4 million words to date. This work may be useful for those developing translation engines.

One highlight is exemplified by the recent West African Ebola Outbreak, when the team was able to help quickly increase the number of languages with an “Ebola” article from 40 to more than 110. This work was additionally aided by a for-profit translation company, Rubric, freely donating their services. Data from Microsoft shows that Wikipedia was the single most used source of Ebola information in the 3 most affected countries (Sierra Leone, Liberia, and Guinea), ahead of CNN, CDC, and the World Health Organization (WHO) during the worst part of the outbreak.

Partnerships to improve core/English content

A number of organizations have now partnered with WikiProject Medicine to improve English medical content, including the University of California at San Francisco (UCSF), Mount Sinai, the Cochrane Collaboration, the Sustainable Sanitation Alliance, and the National Institutes of Health. Discussions are ongoing with WHO regarding a collaboration involving vaccine-related content.

Nearly 1 million references support medical content – Figure 2: Citations/references appearing in the medical content of English Wikipedia and all Wikipedia languages based on year-end snapshots.

Notably, the UCSF collaboration began in 2012 after James Heilman presented lectures on “Wikipedia and Medicine” at the institution. This was a catalyst for Amin Azzam, a professor at the school, to assemble an elective course that provided Wikipedia training, followed by a four-week period in which students edited and improved medical articles. Review and guidance were provided by librarians and faculty at UCSF, as well as core members of WikiProject Medicine.

In another interesting experiment, a high-quality Wikipedia article was submitted to the journal Open Medicine and underwent peer review (see previous Signpost coverage). While there are expenses associated with this form of review and dissemination, the eventual publication of the article serves as proof that Wikipedia can support professional-quality medical content. Additionally, it is an opportunity for Wikipedia authors to receive academic recognition for their work. Work is ongoing to develop an internal peer-reviewed medical journal that is PubMed-indexed.

Barriers to growth

Although medical content drew 6.5 billion page views in 2013, its authors, translators, and other contributors are all volunteers. The project did receive a small amount of funding ($12,000) via the Indigo Foundation to support the development of content in East African languages. This funding allowed content to be developed significantly more quickly than in other comparable languages.

It is also encouraging that Wikipedia as a whole has secured free and unlimited data access to its content via cellphone partnerships, enabling a knowledge resource for 400+ million people in the developing world. While this has enormous potential for the consumption of content, the form factors of the mobile devices that are pervasive in these regions make it very difficult for readers to also contribute content. Partly because of this, it remains difficult to find translators for languages of the developing world.

Expanding efforts

While the above points highlight some successes of WikiProject Medicine and its partners, the challenges also indicate there is much work to be done. Data points to the enormous potential of these efforts, and further data analysis can help us understand how to focus this work and make it more impactful. Human volunteers will undoubtedly continue their tireless contributions and further volunteers are needed, but greater funding has the potential to greatly speed our progress. Paid staff could enable experts to focus on content instead of administrative work.

Additionally, more universities are interested in engaging with Wikipedia. However, programs that familiarize potential contributors with Wikipedia norms, like that at UCSF, involve a steep learning curve and the time-intensive involvement of existing experts. At present, these factors inhibit the scalability of the initiative. Expanding the program will require provisioning “online teaching assistants”, whose funding beyond a June 2015 pilot remains unclear.

Funding could also be used to expand translation centers around the world. If such assistance is obtained, it will also allow us to broaden our scope of emphasis more quickly. Initial translation efforts focused on major diseases (HIV/AIDS, tuberculosis, etc.), and we are expanding into topics such as women’s health, sanitation matters, and items from the WHO List of Essential Medicines; however, work moves slowly.

James Heilman is a Canadian emergency room physician and clinical faculty at the University of British Columbia. He is a founding member of the Wiki Project Med Foundation.
Andrew West is a Senior Research Scientist at Verisign Labs. He completed his Ph.D. in computer science at the University of Pennsylvania.


Discuss this story


"... Wikipedia was the single most used source of Ebola information in the 3 most affected countries (Sierra Leone, Liberia, and Guinea), ahead of CNN, CDC, and the World Health Organization (WHO) during the worst part of the outbreak." That is pretty intriguing. How can we ensure that the articles in this important area do not fall prey to the "rot" that is typical on Wikipedia? As long as Wikipedia cannot guarantee any level of quality, let alone a high one, it seems unlikely that many serious health organizations would be willing to enter into partnerships that could improve the breadth and availability of our content in this area.--Anders Feder (talk) 04:58, 7 June 2015 (UTC)

We have already partnered with the NIH, Cochrane collaboration, UCSF College of Medicine, and are working a bit with the World Health Organization. They are interested in these partnerships not because our content is universally great but because it is universally highly read. Doc James (talk · contribs · email) 03:27, 8 June 2015 (UTC)
But to what extent and for how long? For the resources it takes to cope with all the vandalism and edit wars they could probably build a new hospital instead. Eventually someone will question the cost-benefit relationship.--Anders Feder (talk) 03:39, 8 June 2015 (UTC)
Been a few years for the NIH, Cochrane and UCSF. They are not putting in the resources required to build a hospital :-) which is 100s of millions to billions. Would be nice if they did though. Doc James (talk · contribs · email) 04:07, 8 June 2015 (UTC)
"Would be nice if they did though." That's what I mean :)--Anders Feder (talk) 04:11, 8 June 2015 (UTC)
Yes most initiatives are fairly small at this point in time. Doc James (talk · contribs · email) 04:13, 8 June 2015 (UTC)
Unfortunately, Wikipedia:Identifying reliable sources (medicine) is becoming a problem in fringe articles. Especially with organic food MEDRS is often used to keep positive, verifiable information by reliable sources (like agricultural colleges and universities) out of the articles. That is undermining the stature of MEDRS. The Banner talk 15:00, 7 June 2015 (UTC)
Have not really followed the content dispute regarding organic food. Doc James (talk · contribs · email) 03:27, 8 June 2015 (UTC)
Haven't followed any content dispute either, but that may be seen as a positive influence of MEDRS. Widefox; talk 17:45, 14 June 2015 (UTC)
4 times the 32 volume EB Doc James (talk · contribs · email) 19:25, 8 June 2015 (UTC)
Okay, thanks. So "127 times the size of one volume of the 32-volume EB" would be more accurate but awkward, so I revised the article in this diff to say what you just said. Revert/reword if not helpful but I think this is good. Thanks. --doncram 19:52, 8 June 2015 (UTC)

