The Signpost

Recent research

Drug articles accurate and largely complete; women "slightly overrepresented"; talking like an admin

By William Skaggs, Maximilian Klein, Piotr Konieczny, Gamaliel, Jonathan Morgan and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

German study finds Wikipedia's pharma articles accurate and largely complete

Review by William Skaggs

Recently when my 83-year-old father was undergoing medical treatment, the doctor wanted to change one of his blood pressure drugs, and in order to let us know what the effects would be, she printed out the Wikipedia article on the drug and handed it to us. This accords with the overall impression I have developed: Wikipedia's articles on drugs are pretty good – good enough to impress even doctors. A new research study[1] adds some substance to that impression.

A team of German pharmacologists picked a set of 100 drugs described in pharmacology textbooks, and compared the textbook descriptions with Wikipedia articles about the drugs, for accuracy (meaning that the Wikipedia article matched the information in the textbook) and comprehensiveness. They found that 99.7% of the facts in the Wikipedia articles were accurate, and 83.8% of the facts from the textbooks made it into the Wikipedia articles. These numbers were derived from the German Wikipedia, but the authors state that similar results were obtained for the English language version. They conclude that "our results suggest that Wikipedia is an accurate and informative source of drug information for undergraduate medical students." They also revisited the drug articles examined in 2010 by an earlier study which came to less positive conclusions (see coverage in this newsletter: "Quality of drug information in Wikipedia"), and "found the quality of pharmacological information significantly improved". Upon reviewing several other empirical studies which evaluated the quality of medical information on Wikipedia, the authors observe that "despite different methodologies, the main conclusion of these studies was that Wikipedia articles on health topics contain few errors and are well referenced, while the information provided often lacks depth."
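The two headline measures can be read as simple ratios. The sketch below is only an illustration of that arithmetic: the `ArticleTally` structure, the pooling of counts across articles, and the toy numbers are all assumptions made here, not the study's data or its exact aggregation procedure.

```python
from dataclasses import dataclass


@dataclass
class ArticleTally:
    """Hypothetical per-article counts from a textbook comparison."""
    wiki_facts_checked: int      # facts in the Wikipedia article that were checked
    wiki_facts_correct: int      # of those, how many agreed with the textbook
    textbook_facts: int          # facts the textbook states about the drug
    textbook_facts_covered: int  # of those, how many also appear in the article


def accuracy(tallies):
    """Share of checked Wikipedia facts that matched the textbook (pooled)."""
    checked = sum(t.wiki_facts_checked for t in tallies)
    correct = sum(t.wiki_facts_correct for t in tallies)
    return correct / checked


def completeness(tallies):
    """Share of textbook facts that made it into Wikipedia (pooled)."""
    total = sum(t.textbook_facts for t in tallies)
    covered = sum(t.textbook_facts_covered for t in tallies)
    return covered / total


# Toy numbers, invented for illustration only.
toy_tallies = [
    ArticleTally(40, 40, 50, 45),
    ArticleTally(60, 59, 50, 38),
]

print(f"accuracy: {accuracy(toy_tallies):.1%}")
print(f"completeness: {completeness(toy_tallies):.1%}")
```

Note that the two ratios have different denominators: accuracy is measured against what Wikipedia says, completeness against what the textbook says, so an article can score very high on one and noticeably lower on the other.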

Obviously this is something we should be proud of, but let me note a caveat. Articles about specific drugs are a prime example of the sort of thing Wikipedia is best at: articles about topics that can be handled in a systematic way, without requiring mastery of a large body of literature. As a rule, the more comprehensive a topic, the lower the quality of the Wikipedia article. Thus our article on the drug chlordiazepoxide (commonly known as Librium) is better than our benzodiazepine article, which covers the class of drugs to which Librium belongs. The latter article contains a lot of good information but is poorly organized. Our article pharmaceutical drug shows this flaw to an even greater degree. The general take-home message, supported by the German study, is that our medical articles can be very useful to people who are looking for specific facts, but tend to be less useful to people who are trying to understand broad principles.

Notable women "slightly overrepresented" (not underrepresented) on Wikipedia, but the Smurfette principle still holds

Review by Maximilianklein

"It's a man's Wikipedia? Assessing gender inequality in an online encyclopedia",[2] presented at the Ninth International AAAI Conference on Web and Social Media (ICWSM) this week, investigates gender in the biography articles of six different Wikipedias. It examines four kinds of bias: coverage bias (who makes it into the encyclopedia), structural bias (which articles link to which), lexical bias (the type of words used in the articles), and visibility bias (who is featured on the Main Page).

Coverage bias is analysed by seeing who from the reference databases of notable humans of Freebase, MIT's Pantheon, and Human Accomplishment are in Wikipedia. A surprising result here is that women are not proportionally underrepresented as hypothesised, but even "slightly overrepresented". (The researchers acknowledge that the first two of these three are at least partly based on Wikipedia themselves, but try to address this issue by "seeking patterns that exist across all three datasets".)

Structural bias is a graph-theoretical measure of how articles about men and articles about women link to each other. Across all six languages, articles about women tend to link more to articles about men than vice versa. The Smurfette Principle, the idea that women are less central in the link graph, is also tested: the in-degree of the two gendered article categories is compared, and men are found to be significantly more central in all language editions except the Spanish Wikipedia, where men and women are equally central.
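To make the two structural measures concrete, here is a minimal sketch on a toy link graph. The function names, the `"f"`/`"m"` labels and the six edges are all invented for illustration, and the sketch only compares raw averages; it does not reproduce the significance testing used in the paper.

```python
from collections import Counter


def mean_in_degree_by_gender(edges, gender):
    """Average number of incoming links per article, grouped by gender."""
    in_degree = Counter(target for _, target in edges)
    totals, counts = Counter(), Counter()
    for article, g in gender.items():
        totals[g] += in_degree[article]  # a Counter yields 0 for unlinked articles
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in counts}


def cross_gender_link_share(edges, gender):
    """For each source gender, the fraction of outgoing links that point to the other gender."""
    out_total, out_cross = Counter(), Counter()
    for source, target in edges:
        g = gender[source]
        out_total[g] += 1
        if gender[target] != g:
            out_cross[g] += 1
    return {g: out_cross[g] / out_total[g] for g in out_total}


# Toy graph, invented for illustration: A and B are biographies of women,
# C and D are biographies of men; each pair is (linking article, linked article).
toy_gender = {"A": "f", "B": "f", "C": "m", "D": "m"}
toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "C"), ("D", "A")]

print(mean_in_degree_by_gender(toy_edges, toy_gender))
print(cross_gender_link_share(toy_edges, toy_gender))
```

In this toy graph the articles about men receive more incoming links on average, and the articles about women send a larger share of their links across the gender line, which is the pattern the review describes.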

The notion of lexical bias stems from the idea of the Finkbeiner test, that a female scientist will often be noted as a woman as much as a scientist. It is indeed found that articles about women place linguistic emphasis on relationship, gender, and family, whereas the top terms in men's articles focus on their professions. The authors mention that this ties into the concept of male as the null gender. For instance, the word "divorced" is 4.4 times more frequent in a woman's article than a man's on English Wikipedia. For German and Russian, that multiplier increases to 4.7 and 4.8 times, respectively.
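A multiplier like "4.4 times more frequent" can be read as a ratio of two frequencies. The sketch below assumes one plausible reading, the share of articles in each group that contain the word at least once; the review does not say which frequency measure the authors used, and the helper names and toy texts are invented for illustration.

```python
def document_frequency(term, articles):
    """Fraction of articles whose whitespace-separated tokens include the term."""
    term = term.lower()
    hits = sum(1 for text in articles if term in text.lower().split())
    return hits / len(articles)


def frequency_ratio(term, group_a, group_b):
    """How many times more often the term appears in group_a than in group_b.

    Raises ZeroDivisionError if the term never occurs in group_b.
    """
    return document_frequency(term, group_a) / document_frequency(term, group_b)


# Toy texts, invented for illustration only.
toy_women_articles = [
    "she divorced in 1950",
    "physicist and professor",
    "divorced twice later remarried",
    "novelist",
]
toy_men_articles = [
    "engineer and inventor",
    "he divorced in 1960",
    "footballer",
    "composer",
]

print(frequency_ratio("divorced", toy_women_articles, toy_men_articles))
```

A real analysis would need proper tokenization and a much larger corpus; the point here is only that the reported multipliers compare relative rates between the two groups of biographies, not raw counts.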

Lastly, visibility bias, the propensity of gendered articles to appear on the English Wikipedia Main Page, is tested. No significant difference is found in the propensity of the two genders to appear there.

Unfortunately, this paper suffers from its Euro-focus: the six languages in question are English, German, French, Italian, Spanish and Russian. Still, the breadth of the methods used shows wide-scale issues. The authors conclude that Wikipedia does show some signs of addressing systemic bias, such as equal visibility on the Main Page and equality in coverage, but that there are still stark differences in how men and women are portrayed. Whether this is due to biases in the real world or to the way that Wikipedians write about the real world, they say, is still unknown.

Editors who use user talk pages are more involved in high-quality articles

Review by Piotr Konieczny

An article[3] in the Journal of the Association for Information Science and Technology (JASIST) examines Wikipedia editors' public communication using social network analysis theory. This research suggests that Wikipedia editors who engage in communication with others using user talk pages "are more experienced in editing high quality articles and are more integrated in the community". The author distinguishes quantitative and qualitative contributions, noting that the use of communication tools is more directly related to contributing not just to many articles, but to high quality articles, as well as to a larger number of namespaces. The use of such tools is centered on "coordinating and mentoring editors who edit lower quality articles"; in other words, the author observes that editors who edit high quality articles and use communication tools a lot seem to be more likely to reach out to less experienced editors than the other way around. The author concludes that online collaboration systems are improved through features that allow the creation of what the author calls a "personal" communication network. Though the study excluded bots, it does not seem to have investigated the details of communication (e.g. templates, warnings, awards, others), and so its conclusions on the nature of communications (rather than who engages in it) are more tentative.

"Wikipedia, collective memory, and the Vietnam war"

Should the article Vietnam War open with this lead image (because "it's one of only two photos of [a member of the US military] winning the Medal of Honor"), or instead with a depiction of the My Lai massacre? One of the many debates from the article's talk page (the current version uses a collage of several images).

Review by Piotr Konieczny

This paper,[4] likewise published in the JASIST, looks at the Talk:Vietnam War page (and its archives) and analyses it in the context of theories dealing with the concept of collective memory (cultural memory, memory space, and the "floating gap" concept introduced by Pentzold (2009) in his paper on Wikipedia[supp 1]). As such, this paper is one of several works that argue that Wikipedia is a place where the modern world's memories are being recorded and, to some extent, shaped for posterity. The paper finds that the Wikipedia article is affected by two major debates ("(a) whether the US actually lost the war and (b) whether the voice of the American Vietnam veteran should be privileged."). It reviews major, recurring arguments presented by the talk page participants, and concludes that Wikipedia allows us to study how collective memory is shaped. The author also argues that it is the very fact that such debates can be observed on Wikipedia that may distance some educators, primarily librarians, who are used to works that conceal their knowledge production processes. The author ends with a call for librarians to edit Wikipedia, and help their patrons do the same, in order to participate in the 21st century curation of collective memories.

In a separate paper, published earlier in the Journal of Documentation,[5] the author examined the debate about reliable sources on the same talk page and concluded (according to the abstract) that while much of it "is conducted without acrimony, the level of analysis one finds in the talk pages is rather shallow while the attention of individual contributors is not overly concentrated."

Survey of secondary school use of Wikipedia

Review by Gamaliel

Three researchers have conducted a survey[6] of the use and perceptions of Wikipedia among secondary school teachers and librarians in the United States. Twenty-two teachers and librarians responded to the survey. The vast majority (91%) reported that "Wikipedia had some effect on student research". Responses were mixed about how positive or negative that effect was, however. Positive comments included responses that Wikipedia is "easily understood...thorough, up-to-date, and easily edited" and "students use it to get the basic ideas for their research, then go to other websites to verify it." Negative comments largely centered on the fact that many students did not go beyond Wikipedia in their research, such as the responses that "students rely on it too heavily and do not expand their research to prove or disprove their findings" and "Students don’t want to check sources when they can just get their work done in one stop." Most (91%) reported that their schools had no policy regarding the use of Wikipedia, but responses were roughly split regarding the need for one. Teachers and those responding that Wikipedia had a negative effect were more likely to respond there was a need for such a policy, as opposed to librarians and those responding it had a positive effect. Based on the results, the authors concluded that any policy should not restrict Wikipedia use. They write "instead of banning and fighting against the usage, students need to be taught the skills to utilize it an effective way, such as how to use Wikipedia as a jumping off point to other potentially more trustworthy resources and how to evaluate the reliability of articles." Given the very small sample size of the survey, this article is more useful for its excellent literature review.


Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.


  1. ^ Kräenbring, Jona; Tika Monzon Penza; et al. (2014). "Accuracy and completeness of drug information in Wikipedia: a comparison with standard textbooks of pharmacology". PLOS ONE. 9 (9): e106930. Bibcode:2014PLoSO...9j6930K. doi:10.1371/journal.pone.0106930. PMC 4174509. PMID 25250889.
  2. ^ Wagner, Claudia; Garcia, David; Jadidi, Mohsen; Strohmaier, Markus (2015-04-21). "It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia". Ninth International AAAI Conference on Web and Social Media.
  3. ^ Tsikerdekis, Michail (2015-06-01). "Personal communication networks and their positive effects on online collaboration and outcome quality on Wikipedia". Journal of the Association for Information Science and Technology. 67 (4): 812–823. doi:10.1002/asi.23429. ISSN 2330-1643. S2CID 30732217.
  4. ^ Luyt, Brendan (2015-06-01). "Wikipedia, collective memory, and the Vietnam war". Journal of the Association for Information Science and Technology. 67 (8): 1956–1961. doi:10.1002/asi.23518. ISSN 2330-1643. S2CID 12986829.
  5. ^ Brendan Luyt (2015-03-25). "Debating reliable sources: writing the history of the Vietnam War on Wikipedia". Journal of Documentation. 71 (3): 440–455. doi:10.1108/JD-11-2013-0147. ISSN 0022-0418.
  6. ^ Polk, Tracy; Johnston, Melissa P.; Evers, Stephanie (2015-04-23). "Wikipedia Use in Research: Perceptions in Secondary Schools". TechTrends. 59 (3): 92–102. doi:10.1007/s11528-015-0858-6. ISSN 8756-3894. S2CID 255309594.
  7. ^ Miquel-Ribé, Marc (2015-04-22). "User Engagement on Wikipedia, A Review of Studies of Readers and Editors". Ninth International AAAI Conference on Web and Social Media.
  8. ^ Lobert, Joshua; Isaias, Bianca; Bernardi, Karel; Mazziotti, Giuseppe; Alemanno, Alberto; Khadar, Lamin (2015-04-25). The EU Public Interest Clinic and Wikimedia Present: Extending Freedom of Panorama in Europe. Rochester, NY: Social Science Research Network. SSRN 2602683.
  9. ^ Noble, Bill; Fernandez, Raquel (2015-06-04). Centre Stage: How Social Network Position Shapes Linguistic Coordination (PDF). 2015 Workshop on Cognitive Modeling and Computational Linguistics: Social Science Research Network.
  10. ^ Arazy, Ofer; Ortega, Felipe; Nov, Oded; Yeo, Lisa; Balila, Adam (2015). "Functional Roles and Career Paths in Wikipedia". Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW '15. New York, NY, USA: ACM. pp. 1092–1105. doi:10.1145/2675133.2675257. ISBN 978-1-4503-2922-4.
  11. ^ Keegan, Brian C.; Brubaker, Jed R. (2015). "'Is' to 'Was': Coordination and Commemoration in Posthumous Activity on Wikipedia Biographies". Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW '15. New York, NY, USA: ACM. pp. 533–546. doi:10.1145/2675133.2675238. ISBN 978-1-4503-2922-4.
  12. ^ Klein, Maximilian; Maillart, Thomas; Chuang, John (2015). "The Virtuous Circle of Wikipedia: Recursive Measures of Collaboration Structures". Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW '15. New York, NY, USA: ACM. pp. 1106–1115. doi:10.1145/2675133.2675286. ISBN 978-1-4503-2922-4.
  13. ^ Narayan, Sneha; Orlowitz, Jake; Morgan, Jonathan T.; Shaw, Aaron (2015). "Effects of a Wikipedia Orientation Game on New User Edits". Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing. CSCW'15 Companion. New York, NY, USA: ACM. pp. 263–266. doi:10.1145/2685553.2699022. ISBN 978-1-4503-2946-0.
  14. ^ Tkacz, Nathaniel (2014-12-19). Wikipedia and the Politics of Openness. Chicago ; London: University Of Chicago Press. ISBN 9780226192277.
  15. ^ Barbe, Lionel; Merzeau, Louise; Schafer, Valérie (2015-04-13). Wikipédia, objet scientifique non identifié. Presses Universit. Paris 10. ISBN 9782840169208.
  16. ^ Geoffrey Colin Fairchild: Improving disease surveillance: sentinel surveillance network design and novel uses of Wikipedia. PhD thesis, CS, University of Iowa, December 2014 pdf
  17. ^ Steiner, Thomas; Ruben Verborgh (2015-01-26). "Disaster monitoring with Wikipedia and online social networking sites: structured data and linked data fragments to the rescue?". arXiv:1501.06329 [cs.SI].
  18. ^ Sen, S. W., Ford, H., Musicant, D. R., Graham, M., Keyes, O. S. B., Hecht, B. 2015 Barriers to the Localness of Volunteered Geographic Information. CHI 2015 PDF
  19. ^ Thomas Roessing: Enzyklopädie-Amateure als Amateur-Journalisten: Wikipedia als Gateway für aktuelle Ereignisse. / Amateur encyclopedia editors as nonprofessional journalists: Wikipedia as a gateway for breaking news HTML, PDF extended abstract in English: PDF. Studies in Communication | Media, No 2 of 2014.
  20. ^ Fang, Guanshen; Sayaka Kamei; Satoshi Fujita (2015-01-31). "How to extract seasonal features of sightseeing spots from Twitter and Wikipedia (Preliminary Version)". Bulletin of Networking, Computing, Systems, and Software. 4 (1): 21–26. ISSN 2186-5140.
  21. ^ Elisa Alonso: Analysing the use and perception of Wikipedia in the professional context of translation. JoSTrans Issue 23 HTML
  22. ^ Hale, Scott A. (2015-01-04). "Cross-language Wikipedia Editing of Okinawa, Japan". Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. pp. 183–192. arXiv:1501.00657. doi:10.1145/2702123.2702346. ISBN 9781450331456. S2CID 1952716.
  23. ^ Barbu, Eduard (2015). "Property type distribution in Wordnet, corpora and Wikipedia". Expert Systems with Applications. 42 (7): 3501–3507. doi:10.1016/j.eswa.2014.11.070. ISSN 0957-4174.
  24. ^ Suzuki, Yu (2015). "Quality assessment of Wikipedia articles using h-index". Journal of Information Processing. 23 (1): 22–30. doi:10.2197/ipsjjip.23.22.
  25. ^ Picot-Clémente, Romain; Cécile Bothorel; Nicolas Jullien (2015-01-07). "Social interactions vs revisions, what is important for promotion in Wikipedia?". arXiv:1501.01526 [cs.SI].
Supplementary references and notes:
  1. ^ Pentzold, Christian (2009). "Fixing the floating gap: The online encyclopedia Wikipedia as a global memory place" (PDF). Memory Studies. 2 (2): 255–272. doi:10.1177/1750698008102055. ISSN 1750-6980. S2CID 146343263. Retrieved 26 May 2015.
  2. ^ Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Discuss this story

  • Thanks for the coverage of this research. The gender research is most useful, although the centrality and gendered language results were already known, if I remember correctly. All the best: Rich Farmbrough 22:15, 29 May 2015 (UTC).
  • With regard to coverage bias: I would have to wonder if it's not more likely that the sources used as comparisons underrepresent women, than that Wikipedia overrepresents them. With regard to language bias, historically notable females have had some family-related reason to become notable, be it family connections, early widowhood, or simply the identities of the men they married. I'm less concerned, then, with language bias on articles for women who were born 50+ years ago; it'd be interesting to see if the detected language bias still exists for newer biographies. Powers T 23:39, 29 May 2015 (UTC)
  • Found this research article on popular pages' content quality: Wasted effort and missed opportunities. Is there a WikiProject or TaskForce addressing improvements of the most popular pages based on the Top 25 Report or other similar rating? — Preceding unsigned comment added by JayaJune (talk • contribs) 17:58, 31 May 2015 (UTC)
@JayaJune: That paper was already covered in last month's issue, see also the discussion on the talk page (including one of the researchers). See here on how to alert us about new papers that should be covered. Regards, Tbayer (WMF) (talk) 07:05, 2 June 2015 (UTC)
  • I'm skeptical about the conclusion that women are "slightly overrepresented". How do we know if they are not instead underrepresented in these databases? One of these databases is crowdsourced, which means it would certainly be subject to the same systemic biases as Wikipedia, and the other is a database generated from a book from a single author with a history of dubious scholarship. Gamaliel (talk) 19:26, 31 May 2015 (UTC)
@Gamaliel: I don't know of a reason why "crowdsourced" (btw: [1]) databases should be inherently more biased than those compiled by professionals, but of course you are asking a very important question about the authors' choice of gold standard. Here is what they wrote about it:
It is important to understand that a biased reference dataset will obviously impact our results. If, for example, our reference dataset is already biased towards men (i.e., it covers only extremely famous women but also less famous men) than the proportion of women who are represented on Wikipedia would probably be higher than the proportion of men. To address this issue we analyze the coverage using several independent reference datasets (Jaccard coefficient between the three datasets ranges from 0.0 to 0.12 for different language editions), assuming that each of them will have a different bias and seeking patterns that exist across all three datasets.
While I don't expect this paper will be the last word on gender-related content bias of Wikipedia, it's a lot more solid than many other claims that have been made about the topic, especially in the media. It is also consistent with Magnus Manske's recent blog post who compared Wikidata with VIAF and ODNB (finding both more "sexist" than Wikidata) and concluded that
"Strong gender bias towards men exists in the number of biographical items on Wikipedia and Wikidata, however, this bias appears to be to a large degree due to historical and/or cultural bias, rather than generated by Wikimedians. Since our projects are not primary sources, we are restricted to material gathered by others, and so reflect their consistent bias."
On the other hand, the 2011 "WP:Clubhouse" paper found evidence that "female" films are less well covered on WP than "male" films, and a 2011 paper by Joseph Reagle and Lauren Rhue concluded that "Wikipedia provides better coverage and longer articles, and that it typically has more articles on women than Britannica in absolute terms, but we also find that Wikipedia articles on women are more likely to be missing than are articles on men relative to Britannica".
Also, Max was too modest to mention his ongoing WIGI (Wikipedia Gender Index) project in his review. While - AIUI - it won't examine coverage bias directly, it will surely yield a lot of data that should make it much easier for others to look at possible evidence for such bias.
Regards, Tbayer (WMF) (talk) 07:05, 2 June 2015 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0