The Signpost

Recent research

Graham's Hierarchy of Disagreement in talk page disputes

By Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"How to disagree well: Investigating the dispute tactics used on Wikipedia"

Graham's hierarchy of disagreement

This paper,[1] presented earlier this month at the Empirical Methods in Natural Language Processing conference, applies a modified version of Graham's hierarchy of disagreement to classify talk page comments on the English Wikipedia. As explained by the authors:

"[English] Wikipedia recommends the hierarchy of disagreement formulated by Graham (2008) as a guide for constructive dispute resolution [in the Wikipedia:Dispute resolution policy]. Graham’s hierarchy posits that there are seven levels of disagreement, ranging from namecalling (at the bottom) to refuting the central point. [...] Despite its popularity, this hierarchy has not been verified empirically."

The authors call these "rebuttal tactics", and distinguish them from a second category of dispute tactics, "attempts to promote understanding and consensus (referred to as coordination tactics)." Coordination tactics are classified with a separate set of "non-disagreement labels", compiled from comment types identified in several earlier research publications about Wikipedia talk pages (e.g. a paper by Ferschke et al. that was summarized in our March 2012 issue: "Understanding collaboration-related dialog in Simple English Wikipedia").

The authors provide a dataset "of 213 disputes (comprising 3,865 utterances) on Wikipedia Talk pages, manually annotated with the dispute tactics employed in the process of resolving a disagreement between editors", allowing multiple labels for each comment ("up to three rebuttal strategies and two resolution strategies per utterance", see examples below).

These discussions are drawn from the authors' own "WikiDisputes" dataset, "which is annotated according to whether the dispute was resolved without the need for a moderator." This allows the researchers to identify relations between specific dispute tactics and the risk of a conversation escalating. For example, they

find that a lower mean rebuttal level in a disagreement is correlated with less constructive dispute resolutions, providing empirical validation of the ordering proposed by Graham (2008) and recommended by Wikipedia to its editors.
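As a rough illustration of the quantity behind this finding, the sketch below computes a mean rebuttal level for a single dispute from DH-level annotations. The data layout, the function name `mean_rebuttal_level` and the toy numbers are assumptions made for illustration, not the paper's code or data. In particular, how the authors aggregate several labels on one utterance is not stated here, so the sketch simply averages over every rebuttal label.

```python
from statistics import mean

def mean_rebuttal_level(dispute):
    """Average DH level over all rebuttal labels in one dispute.

    `dispute` is a list of utterances; each utterance is a list of
    integer DH levels (empty if it carries no rebuttal label).
    Returns None when the dispute has no rebuttal labels at all.
    """
    levels = [level for utterance in dispute for level in utterance]
    return mean(levels) if levels else None

# Toy, made-up disputes (hypothetical, not taken from the dataset).
resolved_dispute = [[6], [5], [], [4, 5]]
escalated_dispute = [[4, 3], [1], [3], []]

print(mean_rebuttal_level(resolved_dispute))
print(mean_rebuttal_level(escalated_dispute))
```

Comparing this per-dispute value between escalated and non-escalated disputes is the kind of relation the quoted finding describes.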

In particular, they examine the effect of personal attacks, finding e.g. that conversations can still recover after a personal attack happens:

"We define recovery in terms of having an utterance labeled as rebuttal level 5 or higher and no further personal attacks. By this definition, half of the disputes were found to recover after a personal attack, indicating that personal attacks do not necessarily result in conversational failure."


Of the escalated disputes with personal attacks, only 44.3% are found to recover, whereas 59.2% of resolved disputes recover post attack. This indicates that although personal attacks also occur in non-escalated disputes, participants are better adept at moving beyond them. We further find that immediate retaliation (i.e. a personal attack being followed by another personal attack) occurred in 25.7% of cases. In disputes where at least one personal attack had occurred, the probability that the initial offender will re-offend in the same conversation is 53%, while the probability of another user using a personal attack at some point subsequently is 64%.
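The recovery definition quoted above can be sketched as a small check over one annotated dispute. This is one possible reading, not the authors' implementation: it assumes that a personal attack is any utterance labeled DH0 or DH1, and that "no further personal attacks" means the level 5 (or higher) utterance comes after the last attack in the dispute. The names `ATTACK_LEVELS` and `recovers_after_attack` are hypothetical.

```python
ATTACK_LEVELS = {0, 1}  # assumption: DH0/DH1 labels count as personal attacks
RECOVERY_LEVEL = 5      # from the quoted definition: "rebuttal level 5 or higher"

def recovers_after_attack(dispute):
    """Check whether a dispute recovers after a personal attack.

    `dispute` is a list of utterances, each a list of integer DH levels.
    Returns None if the dispute contains no personal attack, otherwise
    True when some utterance after the last attack reaches
    RECOVERY_LEVEL or higher, and False when none does.
    """
    attack_positions = [
        i for i, levels in enumerate(dispute)
        if any(level in ATTACK_LEVELS for level in levels)
    ]
    if not attack_positions:
        return None
    last_attack = attack_positions[-1]
    return any(
        level >= RECOVERY_LEVEL
        for levels in dispute[last_attack + 1:]
        for level in levels
    )
```

Under this reading, a dispute labeled `[[5], [1], [6]]` counts as recovered, while `[[1], [6], [0]]` does not, because a second attack follows the level 6 utterance.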

The study then uses machine learning to assign these labels to talk page comments automatically, as a multi-label classification task. A BERT-based model performed best (according to three different performance metrics), but still struggled with some of the labels:

"The label most frequently correctly predicted is coordinating edits (111 of 137 cases), which is also the most common label in the training set. The next most correctly predicted label, proportionally, is contextualisation (75%, or 24 of 32 cases), despite not being a commonly used label. This is likely due to the additional positional information available to the model, since this label is often applied to the first utterance in a conversation. On the other hand, refutation and refuting the central point are never correctly predicted (out of 44 cases), with counterargument often mistakenly predicted instead."
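To put the quoted counts on a common scale, the short sketch below converts them into proportions. It assumes that the "cases" are the comments carrying each label in the evaluation data, which would make the ratio a per-label recall; the dictionary layout and the name `per_label_recall` are illustrative only.

```python
# Counts quoted above: (correctly predicted, total cases) per label.
quoted_counts = {
    "coordinating edits": (111, 137),
    "contextualisation": (24, 32),
    "refutation + refuting the central point": (0, 44),
}

def per_label_recall(counts):
    """Share of each label's cases that were predicted correctly."""
    return {label: correct / total for label, (correct, total) in counts.items()}

recall = per_label_recall(quoted_counts)

# Print labels from best to worst predicted.
for label, value in sorted(recall.items(), key=lambda item: -item[1]):
    print(f"{label}: {value:.1%}")
```

On these figures, coordinating edits comes out at about 81%, ahead of the 75% reported for contextualisation.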

Lastly, they apply this to the separate task of predicting whether a conversation will escalate, already examined in their earlier paper that gave rise to the "WikiDisputes" dataset. Namely, they use "multitask training with escalation as the main task and tactics as the auxiliary task, such that the features that are predictive of dispute tactics are incorporated in the escalation predictions." This improves upon their earlier prediction algorithm, "indicating that knowledge of these dispute tactics is useful for tasks beyond classifying the tactics employed."
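The general idea of training a main task together with an auxiliary task can be sketched as a weighted sum of two losses. This shows the common pattern only, under stated assumptions: the cross-entropy losses, the averaging over tactic labels and the `aux_weight` parameter are all hypothetical choices, and the paper's actual architecture and loss weighting are not described here.

```python
import math

def binary_cross_entropy(prob, target):
    """Cross-entropy for one binary decision; prob is clipped for stability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def multitask_loss(escalation_prob, escalated, tactic_probs, tactic_targets,
                   aux_weight=0.5):
    """Main-task loss (escalation) plus a weighted auxiliary loss (tactics).

    `tactic_probs` and `tactic_targets` hold one entry per tactic label,
    treated here as independent binary decisions (multi-label setting).
    `aux_weight` is a hypothetical hyperparameter.
    """
    main = binary_cross_entropy(escalation_prob, escalated)
    aux = sum(
        binary_cross_entropy(p, t) for p, t in zip(tactic_probs, tactic_targets)
    ) / len(tactic_probs)
    return main + aux_weight * aux
```

Because both terms are computed from the same underlying model, minimising the sum pushes the shared features to be informative about tactics as well as escalation, which is the effect the quoted passage describes.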

The following table (adapted from Figure 1 in the paper) shows the labeling of several comments by two different users in one talk page discussion:

"An example from the WikiTactics dataset"
| Utterance | Coordination tactic(s) | Rebuttal tactic(s) |
| --- | --- | --- |
| The community put WP:ENGVAR in place exactly because there is no rational way to resolve a style dispute like this. The notion is that if English style X is established in article, don't change it without prior consensus. Without that [policy], articles would be beset by endless edit wars over style issues that would become a time sink across the encyclopedia. | Contextualisation | |
| Hi, I am aware of WP:ENGVAR and would like to point out to you the policy says that one should "use the variety found in the first post-stub revision that introduced an identifiable variety". In the case of this article, that is "a herb", which was introduced in the original article. I will leave the current wording for a few weeks to see if anyone else decides to weigh in, and intend to then change the page to align with policy. | Suggesting a compromise | DH6: Refutation |
| It is impossible to get local consensus on this kind of thing, which is why ENGVAR exists. Leave it alone, or waste the community's time with an RfC but stop wasting your time and mine making useless arguments here. I don't care if it says "an" or "a" - what is not acceptable is messing with it. | | DH4: Repeated argument; DH3: Policing the discussion |
| I admit that when I made those edits, I didn't realise it was actually a ENGVAR issue but rather just a mistake, hence my zeal in making the changes. To emphasise: the policy exists to unambiguously resolve these debates and for this article, it should be "a herb". I see no real arguments for the contrary, and for what it's worth, my having made policy-incorrect edits (in good faith), doesn't diminish the fact that policy is clear on this one. | Conceding / recanting | DH4: Repeated argument |
| I have warned you to walk away from being a style warrior and wasting everyone's time. You will do as you will. | | DH1: Ad hominem attack |
| No one further has weighed in on this and so I am making the change in accordance with policy, as I have done on each of the herb-related pages. Each of these articles is now in accordance with WP:ENGVAR. Please do not edit it without an RFC or DR. We are now within the spirit and letter of policy on each of these pages and I hope we can draw a line under this ridiculous matter. | Coordinating edits | DH3: Policing the discussion |


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Analyzing Digital Discourses: Between Convergence and Controversy"

From the abstract:[2]

"This study analyses Wikipedia’s sites for negotiating convergence, conflict and identity, concentrating on two aspects. First, convergence and conflict at the macro-level of intercultural comparison are investigated using the example of the construction of concepts of nationalism, citizenship, identity and tribe in their English and German language versions. Second, the English articles serve as a basis to examine the types of convergence and conflict tendencies at the micro-level of the Talk-section."

From the paper's section on talk pages:

"[...] in our data, criticism of content (81 instances/31% of all 259 conflictual codings) is the most frequent conflictual category [...], followed by general metapragmatic criticism concerning clarity and more general stylistic features [...], metapragmatic criticism related to Wikipedia's principles (each comprising about half of the total of 81 metapragmatic tokens), or a mixture of both [...].

"Giving reasons for disagreeing is the mitigating strategy used most frequently in all four Talk-sections, followed by suggesting, inviting and hedged imperatives to induce further improvement of an article, agreement and additional explanation to clarify an issue [...]."

A Discursive Perspective on Wikipedia: More than an Encyclopaedia? (book)

From the publisher's description:[3]

"This book provides a concise yet comprehensive guide to Wikipedia for researchers and students of linguistics, discourse and communication studies [...]. Drawing on Herring's situational and medium factors, as well as related developments in (critical) discourse studies, the author studies the online encyclopaedia both theoretically and empirically, examining its origins, production and consumption before turning to a discussion of its societal significance and function(s). "

"What’s hot and what's not in lay psychology: Wikipedia’s most-viewed articles"

From the abstract:[4]

"We studied views of articles about psychology on 10 language editions of Wikipedia from July 1, 2015, to January 6, 2021. We were most interested in what psychology topics Wikipedia users wanted to read, and how the frequency of views changed during the COVID-19 pandemic and lockdowns. [...]. We made two important observations. The first was that during the pandemic, people in most countries looked for new ways to manage their stress without resorting to external help. [...] We also found that academic topics, typically covered in university classes, experienced a substantial drop in traffic, which could be indicative of issues with remote teaching."

"Building a Public Domain Voice Database for Odia"

From the abstract and paper:[5]

"The pilot detailed in this paper is about creating a large freely-licensed public repository of transcribed speech in the Odia language as such a repository was not known to be available. The strategy and methodology behind this process are based on the OpenSpeaks project [which is hosted on the English Wikiversity at ].

"The 'Methodology' section details the process of collecting words [from a dump of Odia Wikipedia], compiling a wordlist [making use of Wikidata lexeme forms to generate additional forms], recording the pronunciation of those words, and uploading the speech data to Wikimedia Commons using Lingua Libre."


  1. ^ De Kock, Christine; Vlachos, Andreas (December 2022). "How to disagree well: Investigating the dispute tactics used on Wikipedia". Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 3824–3837.
  2. ^ Kleinke, Sonja; Landmann, Julia (2021). "Cross-Cultural Observations on English and German Wikipedia Entries at the Interface of Convergence and Controversy". In Johansson, Marjut; Tanskanen, Sanna-Kaisa; Chovanec, Jan (eds.). Analyzing Digital Discourses: Between Convergence and Controversy. Cham: Springer International Publishing. pp. 135–162. ISBN 9783030846022.
  3. ^ Kopf, Susanne (2022). A Discursive Perspective on Wikipedia: More than an Encyclopaedia?. Cham: Springer International Publishing. ISBN 9783031110238.
  4. ^ Ciechanowski, Kaśmir; Banasik-Jemielniak, Natalia; Jemielniak, Dariusz (2022-10-12). "What's hot and what's not in lay psychology: Wikipedia's most-viewed articles". Current Psychology. doi:10.1007/s12144-022-03826-0. ISSN 1936-4733.
  5. ^ Panigrahi, Subhashish (2022-04-25). "Building a Public Domain Voice Database for Odia" (PDF). Companion Proceedings of the Web Conference 2022. WWW '22. New York, NY, USA: Association for Computing Machinery. pp. 1331–1338. doi:10.1145/3487553.3524931. ISBN 9781450391306.

Discuss this story

It's amusing to read the sample discussion and see that

  1. It is a lame dispute about whether it should be "a herb" or "an herb"
  2. That the usage had been flipped before
  3. That the first three editors were Usernameistoosimilar, Deli nk and Jytdog who have all been indefinitely blocked for their sins.

So, the main problem is not where we stand on the tactical hierarchy of argument but whether we should be arguing at all. Wikipedia is infested with dysfunctional and unproductive editors – griefers, grinders, obsessive pedants, fanatics and more. In making mountains out of molehills, they are operating at a different level and this analysis fails to capture this more fundamental issue. As usual, further research is needed...

Andrew🐉(talk) 21:39, 1 January 2023 (UTC)

@Andrew Davidson: While the dispute certainly is lame, the correct reference for the conversation used as a sample is Talk:Fenugreek#A_herb/an_herb_2, which was started by Jytdog off of that previous discussion you linked to, and grew quite a bit more involved than that one (though no more productive, AFAICT). Neither Usernameistoosimilar nor Deli nk seem to have participated. It was basically just Jytdog and Porphyro going back, and forth, and back, and forth, and... well, it's long, is the point. FeRDNYC (talk) 14:37, 3 January 2023 (UTC)
That further discussion takes the lame to another level. And then there's a part 3 in which an edit notice is requested. And now, of course, we are digging deeper with this further derivative discussion...
Looking at the actual article, I notice that it doesn't mention that fenugreek was one of the ingredients in the original herbal formula of the famous Lily the Pink. I must edit the article myself and see what further mayhem ensues. WP:INTODARKNESS...
Andrew🐉(talk) 18:06, 3 January 2023 (UTC)
RIP Andrew Davidson, died on An Hill,[a] 3 January 2023. He will be missed.
  1. ^ SWIDT?
FeRDNYC (talk) 06:16, 4 January 2023 (UTC)
I fixed that link to the source discussion. Thanks! Regards, HaeB (talk) 12:15, 4 January 2023 (UTC)
So, I edited fenugreek to add some details of its use in Lydia E. Pinkham's Vegetable Compound. I had researched this reasonably well, finding a book about the history of the compound which detailed the recipe. I cited this but the entire addition was reverted with the edit summary, "Nonsense and WP:FRINGE". This seems to be name-calling at the bottom level of the hierarchy but so it goes. The next step may be to start an RfC... Andrew🐉(talk) 16:51, 7 January 2023 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0