The Signpost

In focus

Measuring gender diversity in Wikipedia articles

By PAC2

When thinking about gender diversity in Wikipedia, we often think of the number of biographical articles about men and women. The Humaniki project shows that about 19% of biographical articles on the English Wikipedia are about women. However, this is only one aspect of gender diversity. In this article, I develop a method which measures gender diversity at the article level and show why it's useful.

Motivation

While working on the article about economics on the French Wikipedia, I was surprised by the low number of women among the people cited in the article. So I started exploring methods to measure gender diversity. I draw a distinction between gender diversity and gender parity[1]. First, gender parity presupposes a binary view of gender, which excludes non-binary people. Second, gender parity implies that the ideal would be a fifty-fifty split between men and women. After some iterations, I found a way to measure gender diversity at the article level. This tool can be used to explore gender diversity for articles about academic fields, activities, or occupations. My approach is very basic: it simply computes the share of people cited in an article by gender.

This simple quantitative approach to measuring gender diversity is similar to many research projects on this theme in computational social science. David Doukhan tracks women's speaking time on the radio[2]. Antoine Mazières and his co-authors compute the share of screen time featuring women in popular movies[3], and Gilles Bastin and his co-authors compute how often people of each gender are cited in French newspapers[4].

Methodology

For each article, I get the list of internal links (also known as blue links). I retrieve them using the Wikipedia links API. Then I combine this query with a Wikidata SPARQL query[5]. I select all links corresponding to human beings in Wikidata (property P31 is Q5) and I retrieve their gender (property P21 in Wikidata). Note that gender in Wikidata can be male, female, non-binary, intersex, transgender female, transgender male, or agender. I'd find it more intuitive to group together transgender males with males and transgender females with females but I prefer to keep the classification of Wikidata.
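
The two queries described above can be sketched as follows. This is a minimal illustration, not the queries actually used by the tool (those are in the project methodology). The function names `build_links_params` and `build_gender_query` are hypothetical, and the sketch assumes that linked titles are matched to Wikidata items through their English Wikipedia sitelinks. It only builds the request parameters and the SPARQL text; sending them to the endpoints is left out.

```python
# Endpoints the parameters and query below are meant for.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_links_params(article_title):
    """Parameters for the MediaWiki Action API (prop=links) listing the
    internal links of one article. Long articles need several requests,
    following the 'continue' values returned by the API."""
    return {
        "action": "query",
        "prop": "links",
        "titles": article_title,
        "plnamespace": 0,   # main (article) namespace only
        "pllimit": "max",
        "format": "json",
    }


def build_gender_query(linked_titles, site="https://en.wikipedia.org/", lang="en"):
    """SPARQL text selecting, among the linked titles, the Wikidata items
    that are humans (P31 = Q5) together with their gender (P21).
    Assumption: titles are matched through Wikipedia sitelinks."""
    values = " ".join(
        '"{}"@{}'.format(t.replace("\\", "\\\\").replace('"', '\\"'), lang)
        for t in linked_titles
    )
    return f"""
SELECT ?item ?title ?genderLabel WHERE {{
  VALUES ?title {{ {values} }}
  ?page schema:about ?item ;
        schema:isPartOf <{site}> ;
        schema:name ?title .
  ?item wdt:P31 wd:Q5 ;
        wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
""".strip()


if __name__ == "__main__":
    print(build_links_params("Economics"))
    print(build_gender_query(["Adam Smith", "Joan Robinson"]))
```

Links that do not point to a human in Wikidata (concepts, places, institutions) simply drop out of the result, because they fail the P31 = Q5 condition.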

Last, I count the number of entities by gender and compute the share.
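
As a rough sketch of this last step, assuming the previous query has produced one gender label per linked person, the count and the shares could be computed like this. The function name `gender_shares` and the sample labels are made up for illustration.

```python
from collections import Counter


def gender_shares(gender_labels):
    """Count entities by gender label and return each label's share of the total."""
    counts = Counter(gender_labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no gendered entity found in the article
    return {gender: n / total for gender, n in counts.items()}


if __name__ == "__main__":
    # Illustrative labels only, not taken from a real article.
    labels = ["male", "male", "male", "female"]
    print(gender_shares(labels))
```

The labels are kept exactly as Wikidata returns them, so each of its gender values gets its own share rather than being merged into a broader group.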

Everyone can compute gender diversity for a single Wikipedia article using the gender diversity explorer tool.

This is a very basic approach. It doesn't distinguish between entities cited in the references and entities cited in the body of the article. It doesn't take into account people mentioned in the article without a link to a Wikipedia article. But even if it's imperfect, I believe this is a useful approach.

Numbers should be interpreted with caution. The number of gendered entities cited in a single article is often very low. I personally don't interpret proportions if the total number of gendered entities is lower than 50.
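
This rule of thumb can be written as a small guard. The threshold of 50 is the personal cut-off mentioned above, not a statistical standard. The names `MIN_GENDERED_ENTITIES` and `is_interpretable` are hypothetical.

```python
MIN_GENDERED_ENTITIES = 50  # personal cut-off used in this article


def is_interpretable(counts, minimum=MIN_GENDERED_ENTITIES):
    """Return True when the article cites enough gendered entities
    for the proportions to be worth reading."""
    return sum(counts.values()) >= minimum


if __name__ == "__main__":
    # 35 gendered entities in total: below the cut-off.
    print(is_interpretable({"male": 30, "female": 5}))
```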

Insights

Focus on economics

Chart measuring gender diversity in the Wikipedia article Economics in May 2022.

Let's have a look at the article about economics. In May 2022, we find 137 males, 6 cisgender females, and 1 transgender female[6]. So fewer than 5% of the people cited in the article are female. Of course, everyone knows that many prominent economists, from Adam Smith to Jean Tirole, are male. So no one is really surprised to find a vast majority of males in the results. Nobody would be able to say what a fair share of females in the article would be. However, I personally think that 5% is not much and that the contribution of women to economics is more important than that. Harriet Martineau, Mary Paley Marshall, Joan Robinson, Elinor Ostrom, Anna Schwartz, Janet Yellen, Esther Duflo, and Susan Athey have all made major contributions to economics.
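
The "fewer than 5%" figure can be checked from the counts reported above. The snippet below only redoes that arithmetic. Grouping the transgender female with the females is done here for the sake of the check and is not how the tool reports the data.

```python
# Counts reported for the English Wikipedia article "Economics" in May 2022.
counts = {"male": 137, "female": 6, "transgender female": 1}

total = sum(counts.values())  # 144 gendered entities

male_share = counts["male"] / total
female_share = counts["female"] / total
# Grouping chosen for this check only; Wikidata keeps the two values separate.
female_incl_trans_share = (counts["female"] + counts["transgender female"]) / total

print(f"male: {male_share:.1%}")                           # male: 95.1%
print(f"female: {female_share:.1%}")                       # female: 4.2%
print(f"female (grouped): {female_incl_trans_share:.1%}")  # female (grouped): 4.9%
```

Either way of counting stays below 5%, and the total of 144 is well above the threshold of 50 used for interpretation.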

Focus on academic fields

Share of people cited in articles by gender for academic fields

In this section, I compare gender diversity in Wikipedia articles about some important academic fields. As with economics, we know that most academic fields have long been dominated by male figures. So we're not surprised to find a relatively low share of women in Wikipedia articles. By comparing Physics, Architecture, Economics, Social science, Computer science, Philosophy, Mathematics, Psychology, Medicine, Music, Political science, Sociology, Biology, Science, Art, History, and Literature, I find that all of them have a proportion of men higher than 80%[7]. Values for computer science and political science should be taken with caution, since the number of people cited in those articles is lower than 50. If we exclude computer science and political science, we find that in 10 out of 15 articles, women make up fewer than 10% of all gendered entities! If we look at raw numbers, the count of women in each article is really low: 4 women in mathematics, 4 women in medicine, 1 woman in physics.

Conclusion and discussion

I believe that measuring helps to raise awareness of the problem of gender diversity in Wikipedia articles. Anyone can play with the gender diversity explorer and discover some insights.

In the coming months, I would like to explore gender diversity in articles about occupations (journalist, politician, etc.) and activities (journalism, politics, sports, etc.). I would also like to see large-scale studies looking at all articles about academic fields or all articles about occupations.

My experiments with measuring gender diversity in Wikipedia articles lead me to believe that women are often forgotten or downplayed in Wikipedia articles about general topics. It would be worthwhile to give specific attention to this topic. WikiProjects such as Women in Red could focus on this issue to ensure that the role of women hasn't been diminished in articles.

References

  1. ^ "The idea of closing the “gender gap” itself has always struck me as somewhat problematic as it implies a gulf between two equivalent sides and reinforces the idea of binary gender. An aspiration to equitable “gender diversity” might be more fitting" writes Katherine Maher in "Capstone: Making History, Building the Future Together", in Wikipedia @ 20, MIT Press, 2020, https://wikipedia20.pubpub.org/pub/4d61w771/release/2?readingCollection=08ec69da
  2. ^ https://larevuedesmedias.ina.fr/la-radio-et-la-tele-les-femmes-parlent-deux-fois-moins-que-les-hommes
  3. ^ "Computational appraisal of gender representativeness in popular movies", https://www.nature.com/articles/s41599-021-00815-9
  4. ^ Gendered News project, https://gendered-news.imag.fr/genderednews/
  5. ^ See the SPARQL queries in the project methodology
  6. ^ https://observablehq.com/@pac02/explore-gender-diversity-in-a-single-wikipedia-article?wikipedia=en.wikipedia.org&article=Economics
  7. ^ https://observablehq.com/@pac02/gender-diversity-in-wikipedia-articles-evidence-from-some?collection=@pac02/gender-diversity-in-wikipedia-articles

Discuss this story

It could be argued, though, that “increasing the proportion of women in our citations for the sake of such” is one way of countering systemic bias. I don’t think we need to know detailed statistics about the contribution of women to economics to know that 95% of citations being from men is likely to be unrepresentative and worth improving on. OwenBlacker (he/him; Talk; please {{ping}} me in replies) 09:22, 30 May 2022 (UTC)
We face the same issue with the share of women among biographies. No one knows what a good or fair share is (15%, 19%, 30%?). But in recent years, projects such as Women in Red have focused on this issue and made an effort to increase the number of biographies dedicated to women. I'm just raising the same issue at the article level (poke Chess). Of course we need to rely on sources and reflect the reality of the topic. But we have some editorial freedom in the way we write articles and we can develop some aspects of the topic. In the article about economics in French, I've dedicated a section to the question of women in economics. I think it's a good way to start (if there are some sources of course). Last but not least, it's also in my opinion one aspect of the concept of "knowledge equity", which is key in Wikimedia movement strategy (Wikipedia:Wikimedia Strategy 2018–20/Innovate in Free Knowledge). PAC2 (talk) 16:45, 31 May 2022 (UTC)
@Chaosdruid: The analysis has drawn to your attention (someone who has knowledge of female involvement in Economics) that there is a gap in the article and you have made an improvement to it. I would say that is a positive. Similar analysis of other articles may help identify other areas where there are particular gaps.
Most of the "debate" above is about refining the method of analysis to produce more accurate data. With accurate data, we will be able to spot articles that have an unusual disparity and correct them. From Hill To Shore (talk) 05:44, 9 June 2022 (UTC)
@From Hill To Shore: I have no fore-knowledge, except A level economics - I simply did a search on Google for the top 10 female economists, read about them, and used that info. That should have already been done, since this page has discussion involving 8+ editors going back for at least two weeks. I feel that is a negative. Similarly these edits include removing a male author, instead of leaving him and adding the inserted female one; which actually looks like more of a negative considering that the article now does not include the counter statement to the previous paragraph end.
Perhaps action is more important than discussion - do we wait to see if anyone else actually adds the other 2 I mentioned? Maybe then we can do an analysis of why no one bothered to actually fix the thing you were all discussing? I will leave it up to one of the other nine or so editors to maybe add some detail on the ladies I mentioned as I feel perhaps there is a little bit of looking for a disparity rather than curing it. I did not see a "gap in the article", I saw a gap in the editing of said article after someone had raised a flag.
... and yes, I get annoyed about things that are discussed and never actually acted upon Wikiwide, as well as hasty knee-jerk editing that tries to correct a perceived wrong but actually lowers the accuracy of an article. Chaosdruid (talk) 07:15, 9 June 2022 (UTC)
@Chaosdruid: So, you are complaining about knee jerk reactions but want 8+ editors to jump in and attempt to fix something they may not be familiar with? My interest and expertise do not lie with economics, so you are better placed than I to look at that article. Also, your example of a set of bad edits involve an ongoing content dispute on the article talk page that predated the publication of this edition of Signpost. Why are you trying to link an unrelated content dispute to the editors here?
You are also misrepresenting this discussion. While a few people here have talked about the example used of the economics article, most of the comments are about the principles and methods of analysis. Is there actually a problem and is the data a valid representation of the situation? You want us to fix wiki-wide problems but seem to begrudge people giving up their own time to discuss how we can better understand what the problem is and where we should fix it. That you wanted to improve the economics article and went ahead and edited it is great. However, you shouldn't expect every editor to conform to your expectations and timescales. We all improve the project in our own ways and at our own speed. From Hill To Shore (talk) 09:25, 9 June 2022 (UTC)





       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0