Almost half a century ago, officials at the University of California, Berkeley became concerned about apparent gender bias against women at their institution's graduate division: 44% of male applicants had been admitted for the fall 1973 term, but only 35% of female applicants – a large and statistically significant difference in success rates. The university asked statisticians to look into the matter. Their findings,[supp 1] published with the memorable subtitle
"Measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation."
became famous for showing that such a disparity did not provide evidence for the suspected gender bias. On closer examination, the data in that case even showed "small but statistically significant bias in favor of women" (to quote from the Wikipedia article about the underlying paradox). The Berkeley admissions case has since been taught to generations of students of statistics, to caution against the fallacy that it illustrates.
But not, apparently, to Francesca Tripodi, a sociology researcher at the UNC School of Information and Library Science, who received a lot of attention on social media over the past month (and was interviewed on NPR by Mary Louise Kelly) about a paper published in New Media & Society, titled "Ms. Categorized: Gender, notability, and inequality on Wikipedia". Her summary of one of the two main quantitative results mirrors the same statistical fallacy that had tripped up the UC Berkeley officials back in 1973:
"I sought to compare if the overall percentage of biographies about women nominated for deletion each month was proportionate to the available biographies about women. If the nomination process was not being biased by gender, the proportions between these datasets should be roughly the same. [...] From January 2017 to February 2020, the number of biographies about women on English-language Wikipedia rose from 16.83% to 18.25%, yet the percentage of biographies about women nominated for deletion each month was consistently over 25%." [my bolding]
And while Tripodi correctly points out that this overall discrepancy between articles about male and female subjects is statistically significant (just like the one in the Berkeley case), further arguments in the paper veer towards p-hacking (a term for a kind of data misuse that consists of repeating an experiment or measurement multiple times, cherry-picking those outcomes that resulted in a significant result in the expected direction, and dismissing those that did not):
"In January 2017, June 2017, July 2017, and April 2018, women’s biographies were twice as likely as men’s biographies to be miscategorized as non-notable (p < .02 for each month). The statistical significance and the real significance of the observed difference of these findings strongly support the patterns identified during my ethnographic observations. Wikipedians trying to close the gender gap must work nearly twice as hard to prove women’s notability [...] Only once (June 2018) were notable men more frequently miscategorized, but this was not statistically significant (p > .15). Three times over the three-year period my data could not reject the null hypothesis. The proportion of miscategorized biographies was equal between men and women in October 2018, November 2018, and May 2019. However, these proportions were not statistically significant (p > .85)."
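Why singling out individual months is risky can be illustrated with a rough multiple-comparisons calculation. The sketch below assumes 38 independent monthly tests (January 2017 through February 2020), each judged at the quoted threshold of p < .02. Both the independence and the exact number of tests are assumptions of this illustration, not figures taken from the paper, and the function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive among n independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or
    below family_alpha (Bonferroni correction)."""
    return family_alpha / n_tests


# Assumptions for illustration only: one test per month, tests independent.
N_MONTHS = 38
ALPHA = 0.02

fwer = familywise_error_rate(N_MONTHS, ALPHA)
expected_false_positives = N_MONTHS * ALPHA

print(f"P(at least one month with p < {ALPHA} under the null): {fwer:.3f}")
print(f"Expected number of such months under the null: {expected_false_positives:.2f}")
print(f"Bonferroni-corrected per-month threshold: {bonferroni_threshold(N_MONTHS):.5f}")
```

Under these assumptions, at least one nominally significant month is more likely than not even if there is no effect at all. This does not by itself settle whether four such months are too many to be chance, but it shows why per-month p-values should be corrected for the number of months examined before any of them are highlighted.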
Does this mean that disparities such as the one found by Tripodi here can never be evidence of gender bias? Of course not. But (again quoting from the aforementioned Wikipedia article), it requires that "confounding variables and causal relations are appropriately addressed in the statistical modeling" (with several methods being used for this purpose in bias and discrimination research) – something that is entirely lacking from Tripodi's paper. And it is easy to think of several possible confounders that might have a large effect on her analysis.
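How a confounder can produce or even reverse such a disparity (Simpson's paradox, as in the Berkeley case) can be sketched with a toy calculation. All counts below are invented for illustration, and the confounder chosen here (whether an article is "recent" or "older") is purely hypothetical and not taken from Tripodi's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    articles: int   # biographies in this group
    nominated: int  # of those, how many were nominated for deletion

    @property
    def rate(self) -> float:
        return self.nominated / self.articles


# Invented counts: within each stratum, women's biographies are nominated
# slightly LESS often than men's, but a larger share of women's biographies
# falls into the stratum with the high nomination rate.
STRATA = {
    "recent": {"women": Group(2000, 180), "men": Group(4000, 400)},
    "older": {"women": Group(1000, 8), "men": Group(20000, 200)},
}


def pooled(gender: str) -> Group:
    """Aggregate one gender over all strata, ignoring the confounder."""
    return Group(
        articles=sum(s[gender].articles for s in STRATA.values()),
        nominated=sum(s[gender].nominated for s in STRATA.values()),
    )


for name, stratum in STRATA.items():
    print(f"{name}: women {stratum['women'].rate:.1%}, men {stratum['men'].rate:.1%}")

women, men = pooled("women"), pooled("men")
print(f"pooled: women {women.rate:.1%}, men {men.rate:.1%}")

share_of_articles = women.articles / (women.articles + men.articles)
share_of_nominations = women.nominated / (women.nominated + men.nominated)
print(f"women's share of biographies: {share_of_articles:.1%}")
print(f"women's share of nominations: {share_of_nominations:.1%}")
```

In this toy data, women's share of nominations is roughly twice their share of biographies, although within each stratum their biographies are nominated less often than men's. Comparing the two overall proportions alone therefore cannot distinguish bias from a difference in the mix of articles.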
It is also noteworthy that several previous research publications that started from concerns similar to Tripodi's (e.g. that the gender gap among editors – which is very well documented across many languages and Wikimedia projects, see e.g. this reviewer's overview from some years ago – would cause a gender bias in content too) but applied more diligent methods, e.g. by attempting to use external reference points as a "ground truth" against which to compare Wikipedia's coverage, ended up with unexpected results:
To be sure, other papers found evidence for bias in expected directions, for example in the frequency of words used in articles about women. But overall, this shows that Tripodi's conclusions should be regarded with great skepticism.
Tripodi's second quantitative result, the "miscategorization" concept highlighted in the paper's title, is likewise more open to interpretation than the paper would like one to believe. The author found that once nominated for deletion, articles about women have a higher chance of surviving than articles about men. She interprets this as evidence for sexist bias against women (apparently taking the eventual AfD outcome as a baseline, i.e. postulating the English Wikipedia community as a whole as a non-sexist neutral authority against which to evaluate the individual AfD nominator's action). Other researchers have taken the exact opposite approach, under which it would have counted as evidence for bias against women if pages about them were more likely to be deleted than pages about men, e.g. Julia Adams, Hannah Brückner and Cambria Naslund in the paper reviewed here (which also, as Tripodi acknowledges, "found that women academics were not more likely to be deleted" in a sample of 6,323 AfD discussions – in contrast to Tripodi's sample, where women in general were deleted less often than men).
The quantitative results only form part of this mixed methods paper though. In its qualitative part, Tripodi draws from extensive field research, namely
hundreds of hours of ethnographic observations at 15 edit-a-thons from 2016 to 2017. Edit-a-thons are daylong events designed to improve the representation of women on Wikipedia while also providing a safe space for new editors—primarily women—to learn how to contribute to Wikipedia [...]. In addition to edit-a-thons, I also attended two large-scale Wikipedia events, smaller meetups, happy hours, and two regional chapter meetings. In-depth interviews with 33 individuals (23 Wikipedians and 10 new editors) were conducted outside participant observation spaces.
Tripodi's report about the impressions and frustrations shared by these participants is well worth reading. For example:
"In interviews following the event, newcomers said that they enjoyed the process, but would not likely edit on their own because they still found the experience too frustrating. Most had attended the event in the hopes of adding hundreds of women. They were dismayed to learn that adding just part of an article had taken the entire day. Only one person I interviewed recalled their username/password just days following their participation in an edit-a-thon and none of the new editors had added the articles they created to their “watchlist” ..."
Still, even the validity of some of the paper's qualitative observations has been questioned by Wikipedians. For example, Tripodi opens her paper with a misleading summary of the Strickland case:
"On March 7, 2014, a biography for Donna Strickland, the physicist who invented a technology used by all the high-powered lasers in the world, was created on Wikipedia. In less than six minutes, it was flagged for a “speedy deletion” and shortly thereafter erased from the site. This decision is part of the reason Dr. Strickland did not have an active Wikipedia page when she was honored with the Nobel Prize in Physics four years later. Despite clear evidence of Dr. Strickland’s professional endeavors, some did not feel her scholastic contributions were notable enough to warrant a Wikipedia biography."
However, this deletion within minutes did not at all rely on examining "evidence of Dr. Strickland’s professional endeavors" – rather, it was done based on the "Unambiguous copyright infringement" speedy deletion criterion, as can be readily inferred from the revision history that Tripodi cites here.
It is worth noting that the author of this deeply flawed paper has testified twice before the U.S. Senate Judiciary Committee in the past, on different but somewhat related matters (bias in search engine results in particular).
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
From the abstract:
"...we asked which sources informed Wikipedia’s growing pool of COVID-19-related articles during the pandemic’s first wave (January-May 2020). We found that coronavirus-related articles referenced trusted media sources and cited high-quality academic research. Moreover, despite a surge in preprints, Wikipedia’s COVID-19 articles had a clear preference for open-access studies published in respected journals and made little use of non-peer-reviewed research uploaded independently to academic servers. Building a timeline of COVID-19 articles on Wikipedia from 2001-2020 revealed a nuanced trade-off between quality and timeliness, with a growth in COVID-19 article creation and citations, from both academic research and popular media. It further revealed how preexisting articles on key topics related to the virus created a framework on Wikipedia for integrating new knowledge. [...] Lastly, we constructed a network of DOI-Wikipedia articles, which showed the landscape of pandemic-related knowledge on Wikipedia and revealed how citations create a web of scientific knowledge to support coverage of scientific topics like COVID-19 vaccine development. [...] Wikipedia successfully fended of disinformation on the COVID-19 [sic]"
From the abstract:
"This article argues for research on the effects of multilingualism and mutual intelligibility on Wikipedia reading behaviour, focusing on the Nordic countries, Denmark, Norway, and Sweden. Initial exploratory analysis shows that while residents of these countries use the native language editions quite frequently, they rely strongly on English Wikipedia, too."
From the abstract:
"... the Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the "COVID-19 Open Research Dataset" (CORD-19) that gathers 50,000+ full-text scientific articles related to the coronaviruses. [...] The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and other BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool automating the extraction and visualization of argumentative graphs, meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools ..."
From the abstract:
"Findings showed that the library has created or edited digital content for various categories of women, such as women in academia, industry and politics. These entries have received more than eight million views over a period of two years, which shows that the entries are being utilised. However, the editing exercise had been confronted with challenges such as accessing reliable citations in terms of the notability and verifiability policy of Wikipedia amongst others."
From the abstract:
"Scholarship and journalism about Wikipedia often consider the ways it carries forward, diverges from, or takes to an extreme the various qualities commonly ascribed to encyclopedias. In doing so, it is taken for granted that encyclopedias are authoritative sources of summarized knowledge based on values like accuracy and comprehensiveness, and the question becomes how Wikipedia compares. Through this dissertation, I argue that these commonly held beliefs about encyclopedias are not inherent in the text but the result of centuries of external associations and internal efforts to cultivate a particular kind of authority. Encyclopedias have had close relationships with powerful institutions throughout their history and use a variety of techniques to frame the ways readers should think about them. Furthermore, these cultivated 'encyclopedic virtues' obscure the way that encyclopedists negotiate competing priorities and influences in the knowledge production process. Rather than being perfect, neutral summaries of the world, they often reflect nationalist, religious, or capitalist interests, sometimes even requiring the consent of the powerful in order to be published at all, or in rare cases, they can even prioritize direct critique of those same institutions."