The Signpost

Recent research

STEM articles judged unsuitable for undergraduates below the first paragraph

By Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"The First Paragraph Is As Good As It Gets": study discourages STEM students from using the rest of Wikipedia articles

A study published last month in the journal College Teaching[1] evaluated the suitability of English Wikipedia articles on STEM topics for undergraduate students' "opportunistic learning", defined as "informal, self-regulated study to learn, relearn, or be introduced to a concept".

The 28 articles were chosen from "six disciplines for which willing academics familiar with introductory STEM topics were available to participate in this study: Biology, Chemistry, Environmental Science, Mathematics, Physics, and Statistics", plus a "General" STEM category. Within each of these, the authors selected "four diverse introductory topics commonly encountered in STEM programs [... covering] topics commonly misunderstood or important in the discipline". The four "Statistics" articles had already been examined in a previous paper by three of the authors (see our review: "Evaluating Wikipedia as a self-learning resource for statistics: You know they'll use it").

Each article was evaluated in three components, based on a revision from 14 November 2019: "the entire article, the preamble (before the Table of Contents) [what Wikipedia's manual of style refers to as the lead section], and the preamble first paragraph". The focus on the latter two was motivated by the observation that they "are easily accessed on mobile devices with small screens, and [...] may be all that is read" (quoted from the earlier paper), an assumption supported by several data points and research results.

The articles were evaluated using what the authors call the "ACPD framework" (developed in their earlier paper), assigning a score from 1 ("Not suitable for opportunistic learning") to 3 ("Recommended for opportunistic learning") in each of four criteria:

  • Article accuracy (A), including definitions; interpretation; notation; usage; examples. Accuracy focuses on errors, ambiguities, omissions, and inconsistencies, but also correct spelling and grammar.
  • Effectiveness of the conceptual explanations (C): logical explanations that lead to procedures; explanation beyond definitions; explanation of what is behind the procedure.
  • Effectiveness of the procedural explanations (P): accuracy of procedures explained; examples used to explain procedure; explanation of procedure.
  • Effectiveness of the display or visual components (D): clear; accessible; coherent and well-paced; organized; logical; interesting; context; readability; density of formulae; use of diagrams, videos, animations etc. for illustration; complexity, use and suitability of images.
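To make the scale of the evaluation concrete, the design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the `ComponentScore` record, its field names, and the use of an index instead of real article titles are assumptions made here. Only the category names, the three components, the four criteria and the 1 to 3 scale come from the study as summarized above.

```python
from dataclasses import dataclass
from itertools import product

# Categories as named in the study: six disciplines plus "General".
CATEGORIES = [
    "Biology", "Chemistry", "Environmental Science",
    "Mathematics", "Physics", "Statistics", "General",
]
ARTICLES_PER_CATEGORY = 4
COMPONENTS = ["first paragraph", "preamble", "entire article"]
CRITERIA = ["A", "C", "P", "D"]  # accuracy, conceptual, procedural, display


@dataclass(frozen=True)
class ComponentScore:
    """Hypothetical record: one criterion score for one article component."""
    category: str
    article_index: int  # 0..3 within the category; a placeholder, not a real title
    component: str
    criterion: str
    score: int  # 1 = not suitable ... 3 = recommended for opportunistic learning

    def __post_init__(self):
        if self.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {self.criterion!r}")
        if not 1 <= self.score <= 3:
            raise ValueError(f"score must be 1, 2 or 3, got {self.score}")


def evaluated_components():
    """Enumerate every (category, article index, component) combination."""
    return list(product(CATEGORIES, range(ARTICLES_PER_CATEGORY), COMPONENTS))


n_articles = len(CATEGORIES) * ARTICLES_PER_CATEGORY
n_components = len(evaluated_components())
print(n_articles, n_components)
```

The two printed counts correspond to the 28 articles and the 84 components mentioned in the study.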

The authors summarize the resulting ratings as follows:

"Physics was the only discipline to receive zero 3-Ratings. In contrast, Chemistry received four 3-Ratings. [...] Accuracy (A-qualifier) was a barrier to opportunistic learning in nine of the 84 components (all in Environmental Science and Statistics) [...]. The number of A-qualifiers alone suggests not recommending Wikipedia as a learning resource in STEM disciplines.

Conceptual barriers were common (all components within every discipline, except the first paragraphs of Statistics articles), and procedural barriers reasonably common (except for Chemistry). [...] Statistics has (ignominiously) the most barriers regarding displays. Statistics and Environmental Science have the most identified barriers overall.

The number of C-, P- and D-qualifiers noticeably increased while moving from the first paragraph, to the preamble, to the article (Table 3), suggesting first paragraphs are the most useful component."

In the Statistics category, the authors judged the first paragraphs "excellent" with the exception of histogram. But "the preambles and the entire articles were generally poor, with many A-qualifiers (errors). Some errors were basic..."

In "Environmental Science" (the other discipline where the evaluation had flagged accuracy concerns, in the articles extinction and greenhouse effect), the study criticized "uneven, vague, overly simplistic and/or imprecise" writing, highlighting an example from the article species which said "evolutionary processes cause species to change continually, and to grade into one another". The study also took issue with "paragraphs only tangentially related to the topic [...] For example, the 'Biodiversity' (Environmental Science) article states 'Biodiversity inspires musicians, painters, sculptors, writers and other artists', which is not useful for a learner seeking to understand the concept of biodiversity".

In "Mathematics", "articles were generally instructive from an encyclopedic viewpoint, but the fluid narrative was less useful for learners unless supplemented". While there were no accuracy concerns, "C-qualifiers were frequently applied because the development was less helpful for opportunistic learning".

For "Chemistry", the study found that "eight of the 12 article components lacked conceptual development (C). The articles introduced concepts at a level substantially above that expected of undergraduates or assumed knowledge that most would not have".

The authors emphasize that these evaluations were specific to the suitability of the Wikipedia articles for opportunistic learning, and that "a technically correct article may be a poor opportunistic learning resource. Of course, some criticisms (e.g., accuracy) may apply more generally".


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia"

From the abstract:[2]

"... we study evolution of a Wikipedia article with respect to [Wikipedia's internal] quality scales. Our results show novel non-intuitive patterns emerging from this exploration. As a second objective we attempt to develop an automated data driven approach for the detection of the early signals influencing the quality change of articles. We posit this as a change point detection problem whereby we represent an article as a time series of consecutive revisions and encode every revision by a set of intuitive features. Finally, various change point detection algorithms are used to efficiently and accurately detect the future change points."

"Digital Communication and Interactive Storytelling in Wikipedia : A Study of Greek Users' Interaction and Experience"

This master's thesis[3] presents results of a survey asking readers of Greek Wikipedia how useful they found its "interactive storytelling tools (hyperlinks to other articles, navigation tables, page previews, photos, external sources of information, etc.)", and about improvements they would suggest.

"A Map of Science in Wikipedia"

From the abstract:[4]

"We rely on an open dataset of citations from Wikipedia, and use network analysis to map the relationship between Wikipedia articles and scientific journal articles. We find that most journal articles cited from Wikipedia belong to STEM fields, in particular biology and medicine (47.6% of citations; 46.1% of cited articles). Furthermore, Wikipedia's biographies play an important role in connecting STEM fields with the humanities, in particular history."

"Analyzing Race and Country of Citizenship Bias in Wikidata"

From the abstract:[5]

"By comparing Wikidata queries to real-world datasets [listed here, ...] we discovered that there is an overrepresentation of white individuals and those with citizenship in Europe and North America; the rest of the groups are generally underrepresented. Based on these findings, we have found and linked to Wikidata additional data about STEM scientists from the minorities. This data is ready to be inserted into Wikidata with a bot."


  1. ^ Dunn, Peter K.; Brunton, Elizabeth; Marshman, Margaret; McDougall, Robert; Kent, Damon; Masters, Nicole; McKay, David (2021-11-13). "The First Paragraph Is As Good As It Gets: STEM Articles in Wikipedia and Opportunistic Learning". College Teaching: 1–10. doi:10.1080/87567555.2021.2004387. ISSN 8756-7555. S2CID 244109849.
  2. ^ Das, Paramita; Guda, Bhanu Prakash Reddy; Seelaboyina, Sasi Bhusan; Sarkar, Soumya; Mukherjee, Animesh (2021-11-02). "Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia". arXiv:2111.01496 [cs.SI].
  3. ^ Mavridis, George (2021). Digital Communication and Interactive Storytelling in Wikipedia : A Study of Greek Users' Interaction and Experience.
  4. ^ Yang, Puyu; Colavizza, Giovanni (2021). "A Map of Science in Wikipedia". arXiv:2110.13790 [cs.DL].
  5. ^ Shaik, Zaina; Ilievski, Filip; Morstatter, Fred (2021-08-11). "Analyzing Race and Country of Citizenship Bias in Wikidata". arXiv:2108.05412 [cs.AI].


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0