The Signpost

Discussion report

Six thousand noticeboard discussions in 2025 electrically winnowed down to a hundred

By JPxG

What is Wikipedia? Wikipedia is a new paradigm in human discourse. It's a place where anyone with a browser can go, pick a subject that interests them, and without even logging in, start an argument. In fact, Wikipedia is the largest and most comprehensive collection of arguments in human history, incorporating spats and vendettas on subjects ranging from Suleiman the Magnificent to Dan the Automator. (links added)

— Lore Sjöberg, "The Wikipedia FAQK", Wired, 2006

Since its beginning, the English Wikipedia has used a consensus model: community discussions are the main process to implement, interpret, reinterpret and even form policies and guidelines. Over the years, the venues for this have grown and evolved. Currently, most of it takes place at one of a couple dozen "noticeboards", internal project pages in which threads are opened to address issues or open discussions. These range from broad discussions of core sitewide policy (hence why we call it Wikipedia:Village Pump) to conduct issues with individual users (hence why we call it Wikipedia:Great Dismal Swamp).

However, there is far too much of it for anyone to keep track of: since the beginning of 2025, there have been over six thousand threads on the noticeboards and village pumps.

Who has time for that?

Luckily for the person who wants to keep up anyway, most of these are somewhat inconsequential in the grand scheme of things (one person having a minor CSS issue on a specific skin, one person vandalizing a page and being blocked immediately). The more consequential threads are few and far between. But there is still an issue here: how can we distinguish between them? Even if 90% of threads are routine everyday issues, it is still quite time-consuming to go through a giant list and determine which 10% of thread titles will end up being a discussion of significant consequence.

Well, more significant threads tend to be longer. Often, the conversations with the most participants are those which examine Wikipedia's most interesting edges in editorial policy, coverage of content, and values of users. Discussions with high engagement are almost always conflicts and debates, where discussion participants are passionate about a topic and recruit others into the conversation. Noticeboard threads follow a power-law distribution, and giving ourselves a length-based cutoff sharply decreases the number of discussions to look at. But even then, going through hundreds of noticeboard archives and manually examining the section sizes would take days.

This is where computerized analysis becomes useful. I wrote a program that would make the little electric person inside of the computer box look at every noticeboard thread after a start date, and compile a table of each discussion (its title, its URL, a count of its participants and its length) — this is what it had to say.

[Graph: total number of threads (vertical axis, 0 to 7,000) against the size cutoff (horizontal axis, 0 to 40). Size (decakilobytes and/or casks and/or gifts)]

The above graph relates the number of discussions to the cutoff length, in tens of kilobytes. Roughly speaking, one character of plain text (with no formatting) is one byte, so a kilobyte is a thousand characters: the Rifleman's Creed is about one kilobyte, the short story The Gift of the Magi is eleven kilobytes, and the short story The Cask of Amontillado is thirteen. In Wikipedia discussions, there is a lot of formatting for bolding, underlining, italicization, links and templates: for example, my signature (jp×g🗯️) is 175 bytes but shows up as four letters and an emojus. Even the default signature — JPxG (talk) — is 44 characters of code for 11 characters of text. There are plenty of other situations where people use code in discussions, but every comment is guaranteed to have one signature, so we can assume right out the gate that the byte count of a discussion will virtually always be higher than the byte count of readable text.
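The gap between source bytes and readable text can be illustrated with the default signature. This is a minimal sketch, not part of the report's program: the wikitext string below is an assumption about what a default signature looks like in source form, and `rendered_text` and `markup_ratio` are hypothetical helper names that only handle plain wikilinks, not templates or HTML.

```python
import re

# Assumed wikitext of a default signature (illustrative, not quoted from the report).
DEFAULT_SIGNATURE = "[[User:JPxG|JPxG]] ([[User talk:JPxG|talk]])"

# Matches [[target]] and [[target|label]]; the captured group is the visible label.
WIKILINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def rendered_text(wikitext: str) -> str:
    """Replace each wikilink with its visible label (a very rough rendering)."""
    return WIKILINK.sub(r"\1", wikitext)


def markup_ratio(wikitext: str) -> float:
    """Return UTF-8 bytes of source per character of visible text."""
    visible = rendered_text(wikitext)
    return len(wikitext.encode("utf-8")) / len(visible)


visible = rendered_text(DEFAULT_SIGNATURE)
print(len(DEFAULT_SIGNATURE), "characters of code for", len(visible), "characters of text")
print("ratio:", markup_ratio(DEFAULT_SIGNATURE))
```

Under that assumption the signature comes out at 44 characters of source for the 11 visible characters of "JPxG (talk)", a ratio of four to one for this fragment. A whole discussion would have a much lower ratio, since most of a comment is ordinary prose.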

Now, while I'm sure there is some way to determine an aggregate coefficient of discussion size to amount of rendered text, time constraints (as you will see) limit how much effort can be spent on this, so for the time being, we can somewhat approximately say that ten kilobytes is around the length of one Cask of Amontillado, or one Gift of the Magi.[1]

What this means is that if we decide to only read discussions from 2025 of at least one cask's length, our task goes from reading six thousand threads to reading nine hundred: more than a six-fold reduction. And if we go to two casks, it becomes four hundred, and by the time we get to fifty kilobytes, there are only 118 threads from this year to date, which represent a pretty wide gamut of discursive events: those who have enough free time to follow these places on a regular basis will likely see a lot of familiar section headings.
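The winnowing arithmetic above amounts to counting how many threads sit at or above each cutoff. The following is a small sketch of that count; the function name `threads_at_or_above` and the sample sizes are made up for illustration and are not the real noticeboard data.

```python
from bisect import bisect_left


def threads_at_or_above(sizes, cutoffs):
    """Map each cutoff (in bytes) to the number of threads at least that long."""
    ordered = sorted(sizes)
    # bisect_left finds the first index whose size is >= the cutoff,
    # so everything from that index onward survives the winnowing.
    return {cutoff: len(ordered) - bisect_left(ordered, cutoff) for cutoff in cutoffs}


# Hypothetical thread sizes in bytes, chosen only to show the shape of the result.
sample_sizes = [800, 2_500, 4_000, 9_999, 10_000, 12_000, 23_000, 51_000, 120_000]

counts = threads_at_or_above(sample_sizes, [0, 10_000, 20_000, 50_000])
print(counts)
```

Sorting once and bisecting for each cutoff keeps the work small even for several thousand threads and a long list of cutoffs, which is all a graph like the one above needs.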

Methodology

This approach does, of course, give us some shortcomings:

Ultimately, however, I think that perfect is the enemy of the good, and these shortcomings do not eliminate the benefit of this procedure. The alternative to running a pre-winnowed analysis of noticeboard discussions is not an artisanal hand-crafted holistic analysis, but rather no analysis at all. Indeed, running this analysis in July gives a substantial backlog of discussions, even with a relatively high threshold, and time constraints would dictate an extremely sparse allotment of time to each. For the intrepid, there is also a truly massive, browser-groaning table of all 921 discussions above 10k.

Drama

One thing that's quite noticeable about these discussions is that many of them are very contentious arguments about user conduct issues. That is to say, they are "dramaboard" threads. This was somewhat unexpected; while I knew that there were a lot of these, and I knew that they got very long, I didn't think that they would actually constitute a majority of high-length noticeboard discussions. Perhaps this reflects negatively on us as a project — or perhaps it reflects negatively on a noticeboard-centric methodology for winnowing discussions. I think more analysis is necessary to figure out what's going on here. In the process of preparing this report, it was pointed out to me that this could lead down a dark path — those of a certain age may recall the heyday of Encyclopædia Dramatica with consternation.

It is true, I think, that including so many intensely-personal disputes in a list of most-participated-in discussions could end up being intrusive or even voyeuristic if done without sensitivity and care. Indeed, this is the same issue that occurs when writing the Signpost arbitration report — a column that often features lurid details of good editors at their worst. But noticeboard threads, like arbitration cases, bear heavily on the policies and guidelines of the project, and are indeed inseparable from them. Many important policies and precedents are based on specific incidents, and the same is even more true of our unwritten customs.

While we may have our personal disputes, we are ourselves the persons who shape the project, and this project remains a major participant in the online world's information ecosystem — many arbitration cases are central to our coverage on contentious hot-button issues, and obviously of great import to the project at large. For this reason, I think it is appropriate to include all noticeboard threads, even the dramaboards, and maybe even especially the dramaboards.

The table

As a brief sample of what sorts of things this approach turns up — and again given the combination of time constraints with the large amount of time to be covered — I will give a table overview of noticeboard discussions above 50,000 characters closed between the beginning of 2025 and today.

Since this is a sortable wikitable, the way to view it sorted is to click on the top of the respective column: the default order has no particular significance.

In this table you can see a number of statistics for each discussion, aside from simple length. It's possible to count the number of comments in a discussion,[2] and do a winnowing based on that, rather than simple volume of commentary. It's also possible to count the number of distinct signatures, which allows winnowing based on how broad participation was, rather than how much of it occurred. Furthermore, maximum indent level can be measured, which represents the longest exchange in a subthread. One may imagine other measurements, like average indent level, which would give an approximation of how much the conversation consisted of individual exchanges (e.g. a straightforward RfC where each comment was a response to the opening question would have a low average indent level, whereas a highly personal back-and-forth argument between individual users would have a high one, even if both had the same amount of text).
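The measurements described above can be approximated with a few regular expressions over a thread's wikitext. This is a rough sketch under stated assumptions, not the program used for this report: it assumes timestamps in the usual "14:34, 18 July 2025 (UTC)" form, treats any link to a user or user talk page as a signature, and counts leading `:`, `*` and `#` characters as indent. The function names and the sample thread are hypothetical.

```python
import re

# Assumed timestamp format at the end of a signed comment.
TIMESTAMP = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")
# Links to user or user talk pages; the captured group is the username.
USER_LINK = re.compile(r"\[\[User(?: talk)?:([^|\]#/]+)", re.IGNORECASE)
# Leading run of indent markers on a line of wikitext.
INDENT = re.compile(r"^[:*#]+")


def indent_level(line: str) -> int:
    """Number of leading indent markers on one line."""
    match = INDENT.match(line)
    return len(match.group()) if match else 0


def thread_stats(wikitext: str) -> dict:
    """Rough per-thread statistics; real wikitext has many edge cases."""
    indents = [indent_level(line) for line in wikitext.splitlines() if line.strip()]
    users = {name.strip() for name in USER_LINK.findall(wikitext)}
    return {
        "length": len(wikitext),
        "comments": len(TIMESTAMP.findall(wikitext)),  # one timestamp per comment, roughly
        "distinct_users": len(users),
        "max_indent": max(indents, default=0),
        "mean_indent": sum(indents) / len(indents) if indents else 0.0,
    }


# Hypothetical four-comment exchange between two invented users.
sample = "\n".join([
    "Should we do the thing? [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 10:00, 1 July 2025 (UTC)",
    ":Yes. [[User:Bob|Bob]] ([[User talk:Bob|talk]]) 10:05, 1 July 2025 (UTC)",
    "::Why? [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 10:09, 1 July 2025 (UTC)",
    ":::Because. [[User:Bob|Bob]] ([[User talk:Bob|talk]]) 10:12, 1 July 2025 (UTC)",
])

stats = thread_stats(sample)
print(stats)
```

Custom signatures, unsigned comments, pings to other users and quoted timestamps would all throw these counts off, so the numbers are best read as approximations.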

For example:

It is my great regret that I must leave you with a simple unrefined table of discussions, but vicissitudes in my own life have recently conspired to give me very little time for on-wiki activities. However, it is my plan to keep running this program for every issue.

Perhaps someone might step forward for upcoming issues to help summarize and analyze future batches!

| Noticeboard | Heading title | Length in characters | Number of signatures | Number of distinct users | Maximum indent level | First detected timestamp | Latest detected timestamp |
|---|---|---|---|---|---|---|---|
| VPWMF | RfC: Adopting a community position on WMF AI development | 249401 | 313 | 159 | 16 | 2025-05-29 | 2025-07-03 |
| VPWMF | The WMF should not be developing an AI tool that helps spammers be more subtle | 74909 | 49 | 62 | 8 | 2025-05-24 | 2025-06-10 |
| VPWMF | WMF receives letter from Trump-appointed acting DC attorney | 147850 | 289 | 191 | 20 | 2025-04-26 | 2025-06-05 |
| VPR | Finishing WP:LUGSTUBS2 | 126305 | 175 | 60 | 19 | 2025-04-24 | 2025-07-09 |
| VPR | RfC: work field and reflinks | 51476 | 98 | 76 | 11 | 2025-04-04 | 2025-05-09 |
| VPR | On redirect from mis/other capitalization tags | 69069 | 132 | 40 | 18 | 2025-05-20 | 2025-06-02 |
| VPR | Reviving / Reopening Informal Mediation (WP:MEDCAB) | 50118 | 47 | 50 | 6 | 2025-01-25 | 2025-02-26 |
| VPT | Simple summaries: editor survey and 2-week mobile study (cont.) | 222861 | 365 | 216 | 15 | 2025-06-04 | 2025-06-22 |
| VPT | We are looking for a pilot for our new feature, Favourite Templates | 63339 | 117 | 52 | 25 | 2025-06-17 | 2025-07-05 |
| VPT | Simple summaries: editor survey and 2-week mobile study | 117788 | 229 | 221 | 11 | 2025-06-03 | 2025-06-14 |
| VPT | Simple summaries: editor survey and 2-week mobile study (cont.) | 222861 | 365 | 216 | 15 | 2025-06-04 | 2025-06-22 |
| VPT | Dark-mode navbox styling | 52234 | 4 | 6 | 3 | 2025-05-19 | 2025-05-19 |
| VPP | Admin inactivity rules workshopping | 121523 | 181 | 68 | 17 | 2025-05-25 | 2025-06-11 |
| VPP | Temporary account IP-viewer | 90310 | 162 | 70 | 9 | 2025-06-09 | 2025-06-24 |
| VPP | Rate-limiting new PRODs and AfDs? | 132788 | 207 | 75 | 16 | 2025-03-03 | 2025-05-04 |
| VPP | RfC: Amending ATD-R | 67663 | 106 | 52 | 12 | 2025-01-24 | 2025-03-23 |
| VPP | RfC: Voluntary RfA after resignation | 82006 | 173 | 163 | 8 | 2024-12-16 | 2025-01-20 |
| VPP | LLM/chatbot comments in discussions | 262672 | 408 | 251 | 12 | 2024-12-02 | 2025-01-13 |
| VPM | Heritage Foundation intending to "identify and target" editors | 86113 | 190 | 148 | 12 | 2025-01-08 | 2025-01-15 |
| VPIL | Navigation pages | 87257 | 161 | 59 | 18 | 2025-03-13 | 2025-05-26 |
| VPIL | What do we want on the front page? | 84416 | 157 | 62 | 20 | 2025-02-04 | 2025-03-30 |
| VPIL | "Eligibility", "Suitability", or "Admissibility" instead of "Notability" | 60731 | 123 | 47 | 14 | 2025-03-29 | 2025-04-05 |
| VPIL | Dealing with sportspeople stubs | 57898 | 95 | 46 | 16 | 2025-02-20 | 2025-03-08 |
| VPIL | Opt-in content warnings and image hiding | 110267 | 208 | 60 | 24 | 2024-12-11 | 2025-01-04 |
| VPWMF | WMF plan to push LLM AIs for Wikipedia content | 94457 | 117 | 71 | 15 | 2025-04-30 | 2025-05-28 |
| RSN | Paper co-authored by FRINGE org founder | 110893 | 125 | 43 | 18 | 2025-07-02 | 2025-07-13 |
| RSN | RFC: Euro-Mediterranean Human Rights Monitor | 99018 | 149 | 78 | 14 | 2025-03-19 | 2025-06-23 |
| RSN | RFC: Southern Poverty Law Center | 228357 | 365 | 231 | 22 | 2025-05-24 | 2025-06-10 |
| RSN | LiveMint for the 2025 India-Pakistan conflict | 97427 | 166 | 42 | 21 | 2025-05-21 | 2025-06-07 |
| RSN | Classical sources (Herodotus, Plutarch etc) | 177444 | 183 | 41 | 11 | 2025-05-12 | 2025-06-02 |
| RSN | Question about Hatewatch and the SPLC | 100531 | 172 | 59 | 20 | 2025-05-22 | 2025-05-31 |
| RSN | RfC: Handwritten testimony of Geneviève Esquier | 56508 | 70 | 25 | 13 | 2025-04-17 | 2025-05-23 |
| RSN | When RS make false claims, we do not treat them as true. | 84851 | 92 | 34 | 13 | 2025-03-17 | 2025-03-31 |
| RSN | Is the Cass Review a reliable source? | 92087 | 107 | 62 | 9 | 2025-02-21 | 2025-03-19 |
| RSN | Erin Reed, LA Blade, and Cass Review: Does republication of SPS in a non SPS publication remove SPS? | 165288 | 168 | 70 | 13 | 2025-01-29 | 2025-02-25 |
| RSN | Forbes contributor David Axe | 50982 | 69 | 29 | 17 | 2025-02-07 | 2025-02-17 |
| RSN | RfC: Jacobin | 156406 | 253 | 182 | 20 | 2021-07-19 | 2025-02-21 |
| RSN | RfC: Geni.com, MedLands, genealogy.eu | 52013 | 83 | 31 | 9 | 2024-12-31 | 2025-02-03 |
| RSN | Nigerian newspapers | 69908 | 108 | 60 | 11 | 2024-12-19 | 2025-01-17 |
| RSN | RFC Science-Based Medicine | 89547 | 174 | 81 | 18 | 2024-12-06 | 2025-01-11 |
| RSN | Jeff Sneider / The InSneider | 72990 | 78 | 19 | 19 | 2024-12-21 | 2025-01-09 |
| RSN | RfC: Al-Manar | 68771 | 144 | 67 | 21 | 2024-11-15 | 2025-01-03 |
| BN | Resysop Request (NaomiAmethyst) | 54289 | 107 | 67 | 17 | 2025-03-10 | 2025-03-19 |
| AN | Review of SPLC closure | 51752 | 70 | 56 | 14 | 2025-06-10 | 2025-06-25 |
| AN | RfC closure review request at Wikipedia:Fringe theories/Noticeboard#Society for Evidence-Based Gender Medicine | 196387 | 229 | 103 | 19 | 2025-05-29 | 2025-06-08 |
| AN | Creations by banned or blocked users -- must they always be speedily deleted per WP:G5? | 115171 | 191 | 86 | 18 | 2025-03-15 | 2025-03-30 |
| AN | Tban appeal | 71967 | 105 | 73 | 11 | 2025-03-25 | 2025-04-02 |
| AN | CBAN appeal - Roxy the Dog | 57739 | 113 | 85 | 20 | 2025-02-14 | 2025-02-19 |
| AN | Threats and ad-hominems being used to bully editor | 55083 | 50 | 32 | 8 | 2025-02-24 | 2025-02-28 |
| AN | Arbitration enforcement action appeal by Toa Nidhiki05 | 65250 | 46 | 38 | 12 | 2025-02-24 | 2025-03-08 |
| AN | References | 99032 | 189 | 111 | 24 | 2024-12-16 | 2025-01-01 |
| ANI | User:WhoIsCentreLeft - Action/intervention needed for WP:DISRUPTIVE, including serious and repeatedWP:COPYVIO (EDIT: Request URGENT block under WP:CVREPEAT) | 50957 | 107 | 20 | 28 | 2025-07-13 | 2025-07-14 |
| ANI | Darkwarriorblake and personal attacks | 91028 | 134 | 64 | 19 | 2025-06-27 | 2025-07-05 |
| ANI | User:bloodofox | 318443 | 335 | 173 | 17 | 2025-06-12 | 2025-07-09 |
| ANI | Ohconfucius Changing English variants without consensus | 57925 | 107 | 56 | 9 | 2025-06-19 | 2025-07-07 |
| ANI | Issues with a student project | 72302 | 66 | 44 | 12 | 2025-06-20 | 2025-06-28 |
| ANI | Grayfell selectivelly removing reliable sources from several articles | 83782 | 113 | 43 | 12 | 2025-06-22 | 2025-07-01 |
| ANI | LukeWiller | 67575 | 114 | 85 | 10 | 2025-07-01 | 2025-07-02 |
| ANI | Editors reverting RfC closure at Talk:Forspoken | 183261 | 245 | 104 | 12 | 2025-06-01 | 2025-06-19 |
| ANI | Administrator civility standards and Necrothesp | 56971 | 88 | 69 | 11 | 2025-06-18 | 2025-06-22 |
| ANI | Kellycrak88, again | 59260 | 52 | 49 | 10 | 2025-06-16 | 2025-06-24 |
| ANI | Persistent, long-term battleground behavior from multiple editors at capitalization RMs | 523983 | 768 | 219 | 19 | 2025-06-08 | 2025-07-03 |
| ANI | Editors reverting RfC closure at Talk:Forspoken | 178396 | 236 | 101 | 12 | 2025-06-01 | 2025-06-14 |
| ANI | Is it appropriate for an Admin editor to create an article just to put Nazi ancestral claims into a BLP? | 207378 | 299 | 132 | 14 | 2025-05-13 | 2025-06-09 |
| ANI | Breakdown of BRD and potential Holocaust Revisionism at Roman Shukhevych unarchived | 82751 | 127 | 65 | 12 | 2025-04-04 | 2025-05-23 |
| ANI | Newsjunkie Part 4 | 63786 | 88 | 25 | 30 | 2025-05-22 | 2025-06-04 |
| ANI | Disruptive editing from Wlaak | 76230 | 124 | 46 | 9 | 2025-04-29 | 2025-05-15 |
| ANI | David Eppstein and Good Article Reassessment | 168696 | 223 | 112 | 10 | 2025-05-08 | 2025-05-15 |
| ANI | Baseless accusations, incivility, and POV-pushing by User:TurboSuperA+ | 97564 | 124 | 66 | 15 | 2025-05-07 | 2025-05-16 |
| ANI | IP editor User:46.97.170.73 violating BLP, bludgeoning, deleting other peoples comments, POV-warring, violating NPA/being extremely hostile and may be a sockpuppet | 52504 | 95 | 76 | 9 | 2025-04-24 | 2025-05-07 |
| ANI | Davidbena and euphemisms for rape | 116372 | 185 | 112 | 13 | 2025-04-09 | 2025-04-20 |
| ANI | Ethnic Assyrian POV-push | 78070 | 71 | 35 | 15 | 2025-04-03 | 2025-04-23 |
| ANI | Continuously disruptive editing by User623921 | 95858 | 70 | 30 | 15 | 2025-03-27 | 2025-04-07 |
| ANI | Personal attack at Wikipedia talk:What Wikipedia is not | 58115 | 119 | 53 | 14 | 2025-04-16 | 2025-04-19 |
| ANI | Disruptive Editing from User TarnishedPath | 108664 | 191 | 90 | 17 | 2025-03-16 | 2025-03-26 |
| ANI | Transphobia from Ergzay | 100052 | 200 | 96 | 17 | 2025-04-01 | 2025-04-05 |
| ANI | TurboSuperA+ closes | 67632 | 88 | 64 | 9 | 2025-02-28 | 2025-03-09 |
| ANI | Harassment and attempted outing by User:CoalsCollective. | 60325 | 70 | 43 | 8 | 2025-03-04 | 2025-03-09 |
| ANI | Non-neutral paid editor | 192242 | 245 | 85 | 12 | 2025-01-16 | 2025-03-05 |
| ANI | Intimidation tactics, suppression and other violations from Simonm223 | 85072 | 100 | 58 | 9 | 2025-02-19 | 2025-03-05 |
| ANI | Bias and NOTHERE by Big Thumpus | 62118 | 108 | 50 | 15 | 2025-02-13 | 2025-02-21 |
| ANI | WP:BATTLEGROUND & WP:PA by Cerium4B | 100614 | 132 | 54 | 11 | 2025-02-05 | 2025-02-21 |
| ANI | User:Engage01: 2nd ANI notice | 58458 | 90 | 35 | 10 | 2025-02-02 | 2025-02-08 |
| ANI | Off-site harassment from Anatoly Karlin | 51665 | 66 | 21 | 19 | 2025-02-09 | 2025-02-11 |
| ANI | Kansascitt1225 | 53853 | 51 | 49 | 9 | 2025-01-26 | 2025-02-13 |
| ANI | Me (DragonofBatley) | 126597 | 197 | 51 | 17 | 2025-01-14 | 2025-01-28 |
| ANI | User:Toa_Nidhiki05: WP:OWN and WP:BATTLEGROUND behaviour. | 82047 | 86 | 34 | 12 | 2025-01-20 | 2025-01-29 |
| ANI | User:Moribundum: incivility and problem editing reported by User:Zenomonoz | 69171 | 58 | 26 | 9 | 2025-01-28 | 2025-02-02 |
| ANI | Stalking from @Iruka13 | 52795 | 64 | 39 | 10 | 2024-11-13 | 2025-01-19 |
| ANI | Edit warring to prevent an RFC | 94644 | 125 | 46 | 14 | 2025-01-05 | 2025-01-11 |
| ANI | Cross-wiki harassment and transphobia from User:DarwIn | 146741 | 284 | 134 | 19 | 2024-12-29 | 2025-01-14 |
| ANI | Beeblebrox and copyright unblocks | 62669 | 94 | 81 | 12 | 2025-01-12 | 2025-01-15 |
| ANI | User:Jwa05002 and User:RowanElder Making Ableist Comments On WP:Killing of Jordan Neely Talk Page, Threats In Lead | 75257 | 139 | 48 | 10 | 2025-01-13 | 2025-01-17 |
| ANI | Incivility and ABF in contentious topics | 143823 | 279 | 113 | 13 | 2025-01-04 | 2025-01-19 |
| ANI | User:Bgsu98 mass-nominating articles for deletion and violating WP:BEFORE | 108540 | 168 | 66 | 14 | 2025-01-08 | 2025-01-17 |
| ANI | Complaint against User:GiantSnowman | 55566 | 114 | 47 | 8 | 2025-01-05 | 2025-01-08 |
| AE | Your Friendly Neighborhood Sociologist | 72266 | 70 | 43 | 4 | 2025-06-01 | 2025-06-22 |
| AE | Colin | 120097 | 128 | 58 | 13 | 2024-12-12 | 2025-05-30 |
| AE | PadFoot2008 | 53185 | 52 | 30 | 8 | 2025-04-16 | 2025-05-08 |
| AE | Akshaypatill | 72586 | 60 | 25 | 12 | 2025-02-27 | 2025-04-05 |
| AE | FMSky | 62816 | 68 | 44 | 7 | 2025-03-22 | 2025-04-10 |
| AE | 3rdspace | 56130 | 71 | 33 | 9 | 2025-03-09 | 2025-03-18 |
| AE | Toa Nidhiki05 | 58745 | 46 | 27 | 6 | 2025-02-04 | 2025-02-18 |
| BLPN | Edit request for BLPs on US federal judge birth dates | 67754 | 136 | 51 | 15 | 2025-05-20 | 2025-06-06 |
| FTN | Society for Evidence-Based Gender Medicine | 284859 | 313 | 148 | 17 | 2025-02-03 | 2025-05-26 |
| FTN | Pathologization of trans identities | 292263 | 361 | 79 | 20 | 2025-02-07 | 2025-04-29 |
| FTN | Is WPATH the gold standard for research on trans healthcare in academia? | 87862 | 108 | 71 | 11 | 2025-02-05 | 2025-04-15 |
| FTN | Puberty blockers in children | 51122 | 53 | 47 | 7 | 2025-02-04 | 2025-02-21 |
| NORN | White Mexicans and blood type | 57824 | 91 | 23 | 23 | 2025-01-28 | 2025-02-13 |
| NPOVN | Should we try to correct for reliable sources being systematically biased against Palestinians? | 60639 | 103 | 50 | 15 | 2025-06-08 | 2025-07-06 |
| NPOVN | Geography map dispute | 115161 | 230 | 48 | 20 | 2025-02-22 | 2025-04-11 |
| NPOVN | 2024 United States presidential election | 76252 | 113 | 43 | 15 | 2025-01-09 | 2025-01-29 |
| DRN | Agent Carter (TV series) | 50940 | 77 | 13 | 20 | 2025-05-21 | 2025-06-19 |
| DRN | Sonic the Hedgehog 3 (film) | 50113 | 53 | 15 | 6 | 2025-04-04 | 2025-05-05 |
| DRN | Arameans | 62177 | 37 | 18 | 6 | 2025-03-20 | 2025-03-27 |
| DRN | The Left (Germany) | 52200 | 83 | 22 | 11 | 2025-03-07 | 2025-03-31 |
| DRN | Autism | 353378 | 287 | 34 | 19 | 2024-12-20 | 2025-01-17 |

Notes

  1. ^ Please forgive me for not having time to find a literarily acclaimed short story that is in the public domain and constitutes more precisely ten thousand characters.
  2. ^ Technically, to count the number of timestamps in a discussion, which roughly equates to the number of user signatures, which roughly equates to the number of comments. Wikitext parsing is extremely difficult!

Discuss this story


This discussion report shows very interesting trends like the longest discussions often having few distinct users. However, this information seems better suited for year-end issues, rather than appearing in every issue like the Signpost's Traffic Report. Knowing article viewership helps identify which articles are high-profile enough to warrant greater editor attention. Knowing which discussions are highly disputed attracts even more input that may be counterproductive to resolving disagreement between parties with the relevant knowledge. After all, we already alert editors to which discussions need broad consensus through WP:centralized discussion. ViridianPenguin🐧 (💬) 14:34, 18 July 2025 (UTC)

[Graph: total number of myriabytes (vertical axis, 0 to 4,000) against size of bucket in myriabytes (horizontal axis, 0 to 38).]

Here's the size of the buckets in the same units, which I've called myriabytes. Done in a rush so might need some tweaks. But the shorter buckets use more bytes. All the best: Rich Farmbrough 15:40, 18 July 2025 (UTC).




       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0