Please vote in the current ArbCom election, if you haven’t already. As of November 28, 1,483 voters have submitted a ballot, compared to 1,858 last year with 4 days left to vote. Ballots may be submitted until 23:59, 2 December 2019 (UTC).
Several indicators of Wikipedia’s progress will be celebrated over the next several weeks. The English-language Wikipedia is likely to mark its six-millionth article, sometime between New Year’s Eve 2019 and January 8, 2020. Wikipedia itself will mark its 19th birthday on Wikipedia Day, January 15, and The Signpost reaches its 15th birthday on January 10.
Like Wikipedia, The Signpost has been involved in a few disagreements but continues on an upward path. The Signpost, we are convinced, is the best place on Wikipedia for Wikipedians to write about, read about, and learn about Wikipedia. You can help us prepare for our upcoming birthday by contributing in many ways. No, we're not asking for money, but your participation in our little newspaper will ensure our continued success.
You can contribute in several ways:
Our system of writing and publishing is a combination of individual work and group effort. You will get credit via a byline in most cases, but at least one other editor will check your work, and help you with fact checking and copyediting. The editor-in-chief will then check that Wikipedia's rules have been followed.
What are the rules that apply to writing for The Signpost? This is a WikiProject like many others, such as WP:Military history or the National Register of Historic Places WikiProject. Within broad limits we set our own rules, like how an article is approved for publication or how the editor-in-chief is selected. It is important to remember that we are not writing encyclopedia articles for the mainspace, but writing journalism for a newspaper. Journalistic standards apply as well as Wikipedia rules. The policy on not including original research does not apply to Signpost articles. We always strive to be fair and accurate in our news articles, but occasionally the exact wording of the neutral point of view policy may not apply. We encourage opinion and humor pieces as well as news stories.
The policy on biographies of living persons does apply to all pages on Wikipedia, but this does not mean that we can't write about administrators, paid editors, or any other editors who put themselves in the public eye. If an article meets journalistic standards and the text would be acceptable on another WikiProject, at the administrators' incidents noticeboard, or during public Arbitration Committee proceedings, then it will be acceptable on The Signpost when approved by the editor-in-chief.
There is another way that our regular readers can contribute to The Signpost. Some Wikipedians love to argue vociferously and at length about what many people consider to be minor matters. Some writers find it difficult to accept that one of their submissions has been rejected. Others love to argue about grammar. It would do wonders for the morale of our staff if readers would occasionally let these folks know that we are volunteers contributing our time and just trying to do our best. Reader participation really is the key to our future success.
There are some types of "contributions" that we do not accept. For example, sometimes a subject of an article decides that they are better qualified to report on themselves than our reporter is. This almost never works out. If you are the subject of an article and anything the least bit controversial is reported, then the reporter will contact you for your side of the story. Letting the article subject write the article itself will likely deprive our readers of other sides of the story. If you'd like to write an opinion piece about yourself, you may contact the editor-in-chief, but this type of article is not in great demand.
A particularly obnoxious "contribution" that we will never accept comes from people who try to inject their point of view into a news article, or into another author's opinion piece, in the couple of hours just before publication. We have to check and recheck multiple aspects of each article before publication, and interfering with this process is at best obstructionism. Uninvited submissions are generally not accepted during the day before publication. Trying to edit war your opinions into an article at this time is a form of censorship, and is simply unacceptable.
So please do consider how you can best contribute to our upcoming birthday celebration. We appreciate your support.
The English Wikipedia will reach six million articles around January 1 or perhaps a bit later, according to our estimate. Previous milestones are noted below.
Previous milestones | Date | Article |
---|---|---|
1 million | 1 March 2006 | Jordanhill railway station |
2 million | 9 September 2007 | El Hormiguero |
3 million | 17 August 2009 | Beate Eriksen |
4 million | 13 July 2012 | Ezbet el-Borg |
5 million | 1 November 2015 | Persoonia terminalis |
External videos |
---|
Revolution 1 (slow version) (4:15) |
The Beatles – Revolution (fast version) (3:27) |
Everybody wants to change Wikipedia in some way. Our model of knowledge production and distribution depends on it. Be bold! If you see something in the encyclopedia you don't like, change it. Many people want to change the Wikipedia model and use it for purposes other than building an encyclopedia. Good luck to them!
But not all change is good. This month saw examples of people striving to systematically change the content of our encyclopedia, Wikipedians and others trying to tweak the Wikipedia model of many small content contributors and many small financial contributors, and governments trying to dictate what an online encyclopedia should look like.
That would be completely correct except:
There are many opportunities to discuss bad news, problems, and concerns in the Wikiverse, and I think that having candid discussions about these issues is often important. Many days I spend more time thinking about problems than about what is going well. However, I also think that acknowledging the good side and taking a moment to be appreciative can be valuable.
I encourage you to add your comments about what's making you happy this month to the talk page of this Signpost piece.
Job openings
For your listening enjoyment
Images from Norway
Wikipedia for public good
Approaching a milestone
New Wikimedia affiliates
Wiki Loves Monuments national winners
Some of the national winners of the annual Wiki Loves Monuments competition have been added to this page.
Humor
Even those who are experienced at public communications worry about making big mistakes. This video (YouTube link), from the British political satire Yes, Prime Minister, shows what happened when the Prime Minister learned that Sir Humphrey, the head of the Home Civil Service, made an indiscreet comment that was recorded by a BBC microphone. I feel some sympathy for Humphrey because occasionally I too say something that I wish that I hadn't!
"The Wait"
This is a short film regarding wildlife photography. The film has scenes of European bison in Romania. The narration is in French, and English subtitles are available. I think that wildlife photographers for Wikimedia Commons will find this film to be relatable. https://vimeo.com/180080686
Milestone on Arabic Wikipedia
Wikimedia Technical Conference reports
English Wikiquote of the day for 10 November 2019
All writers … have an obligation to our readers: it's the obligation to write true things, especially important when we are creating tales of people who do not exist in places that never were — to understand that truth is not in what happens but what it tells us about who we are. Fiction is the lie that tells the truth, after all.
— Science fiction and fantasy author Neil Gaiman of England
Recent featured media on English Wikipedia and Wikimedia Commons
Skillful translations of the sentence "What's making you happy this week?" would be very much appreciated. If you see any inaccuracies in the translations within this article then please {{ping}} User:Pine in the discussion section of this page, or boldly make the correction to the text of the article. Thank you to everyone who has helped with translations so far.
What's making you happy this month? You are welcome to write a comment on the talk page of this Signpost piece.
Two requests for arbitration committee cases were filed at Wikipedia:Arbitration/Requests/Case in November. One was withdrawn and one has been accepted.
A new request, "Drmies salting", was initiated by Wumbolo on November 9. The request was about a mainspace article that has never been created and has been salted by an administrator in order to prevent its creation. In place of the actual title, the placeholder "XYZ" was used in the request. The userpage note left for the filing party in conjunction with the salting has been oversighted and The Signpost has no further details on the page's contents.
The case request was closed as "withdrawn" by a clerk on November 10. The same day, Wumbolo was oversight blocked indefinitely. The Signpost has no further details on the reason for the user block.
A new request, "Conduct in portal space and portal deletion discussions", was initiated by ToThAc on November 18. ToThAc described the issue as follows:
As summarized in Robert McClenon's essay on issues surrounding portals, the necessity of portals in general has been heavily debated over the course of several months. In April 2018, The Transhumanist started an RfC on deprecating portals, which was closed with a rough consensus to not delete all portals.
The complainant said that despite the prior RfC, uncivil discussion of individual portal creations and deletions has ensued, and named 20 other involved parties.
The case was accepted and opened, with arbitrator Joe Roe commenting: "This has proved to be a long-running and intractable dispute."
As last week, Star Wars is present (#1), and another TV show makes an appearance at #2. Unlike last week, however, there is an ungodly number of royals peppering this list; in fact, another TV series, The Crown, is responsible (#3, #4, #5, #8, #9).
Rank | Article | Views | About |
---|---|---|---|
1 | The Mandalorian | 1,707,701 | As on another occasion when this list was heavily dominated by one subject, a Star Wars-related article takes the top spot; this time, however, it isn't the dominant topic on the list. Disney+'s debut was accompanied by this original TV series set in the Star Wars universe, which has received positive reviews. |
2 | Caitlyn Jenner | 1,570,820 | Formerly known as decathlon gold medalist and Kardashian–Jenner patriarch Bruce Jenner, Caitlyn has decided to go across The Pond and survive in the jungle on the reality show I'm a Celebrity...Get Me Out of Here! |
3 | Princess Margaret, Countess of Snowdon | 1,492,698 | The Crown has returned, and with it another spike in views for the daughters of George VI. Since the show has skipped from the 1940s to the 1960s, the actresses have changed to two women who have played crazy queens: Margaret is now Red Queen Helena Bonham Carter and Elizabeth is now Queen Anne Olivia Colman. |
4 | Elizabeth II | 1,266,912 | |
5 | Aberfan disaster | 1,212,970 | Not everything The Crown brings into this list is old or dead royalty, it turns out. |
6 | Fiona Hill (presidential advisor) | 1,156,864 | Americans are divided into two factions right now: those who are eagerly hanging onto every word of the impeachment inquiry, and those who would like to end all of those responsible for the 24-hour news cycle and all the prolonged impeachment inquiries on it. The first group has propelled this article to its position here, as they rush to find out who the hell this person is anyway. |
7 | Frozen II | 1,113,692 | This list has something for the child inside all of us, for if this isn't your style there is, of course, Fred Rogers down at #23, having been portrayed by Tom Hanks (and as Weird Al said, nobody doesn't like Tom Hanks!). Here, though, Elsa and Anna return to travel on a magical, icy journey to discover happy things, because Disney likes happy things. |
8 | Princess Alice of Battenberg | 1,073,049 | More The Crown. One would think people wouldn't Wikipedia the name of a person in a TV show they're watching, since that would count as a spoiler – but that doesn't seem to be the case. In an alternate universe, this show was probably why spoiler warnings were deprecated: a lengthy RfC concluded with the consensus "no we will not put spoiler tags on an actual real-used to be alive person, and y'all can't figure out how to do that, so no more spoiler tags". |
9 | Harold Wilson | 1,072,674 | The Prime Minister during the period portrayed in the latest season of The Crown. |
10 | Prince Andrew, Duke of York | 1,008,200 | With the multitude of other royals brought here by The Crown (#18, and a third of the rest) one might think this is simply another case of TV fever, but no, Jeffrey Epstein (#13) and his entourage of criminals brought a royal down with them. Andrew's claims that he couldn't possibly have been the person his accuser referred to – for he simply couldn't sweat, and she said he did! – were shot down by some photos (and I hate the Mirror too, don't worry), and little princey's birthday party was cancelled, not to mention the whole "being kicked out of Buckingham Palace" issue. |
Even though this Report is late due to a delay with the WP:5000, it is actually early with one topic: Star Wars got a #1 a month ahead of schedule with the Disney+ (#16) series The Mandalorian (#1, #11), and is also present in a new video game (#20). Aside from eight pages that remained from last week, there's politics (#7) and television (#8).
Rank | Article | Views | About |
---|---|---|---|
1 | The Mandalorian | 1,570,155 | Proving that in spite of a divisive Episode VIII and an underwhelming spin-off, Star Wars still moves the masses: a Disney+ (#16) series by Jon Favreau starring a bounty hunter managed to top our list. |
2 | Joker (2019 film) | 1,029,042 | The list of movies which have grossed $1 billion in 2019 finally has a non-Disney production with this take on the clown supervillain. Readers must also be curious about the "Future" section, given how rarely studios decide not to follow such acclaim and popularity with sequels. |
3 | Death Stranding | 803,135 | Hideo Kojima doesn't need Konami to make successful video games: this PlayStation 4 action game set in a post-apocalyptic USA, whose cinematic ambitions extend to casting actors such as Norman Reedus, Mads Mikkelsen and Léa Seydoux, got great reviews and good sales. |
4 | Deaths in 2019 | 723,908 | I never thought I'd die alone / I laughed the loudest, who'd have known... |
5 | Henry V of England | 669,639 | Henry the Fifth at fifth place, how appropriate. The high views are due to the Netflix movie The King. |
6 | John Demjanjuk | 587,622 | Still on Netflix, the documentary series The Devil Next Door tells the story of a Ukrainian-American autoworker who allegedly worked as a guard at Nazi extermination camps during World War II. |
7 | Marie Yovanovitch | 585,578 | Yovanovitch was the United States Ambassador to Ukraine. Trump's involvement with that country's government is having repercussions for him. Yovanovitch was ousted after an alleged smear campaign, and is now testifying in the impeachment inquiries against Trump. |
8 | Rick and Morty (season 4) | 547,199 | Adult Swim has debuted the newest episodes of this time-and-space-hopping cartoon starring a mad scientist and his grandson. |
9 | Doctor Sleep (2019 film) | 544,211 | Decades after The Shining, a traumatized Danny Torrance (now played by Ewan McGregor) tries to save a girl with similar powers from people who literally feed on said "Shining" kids. Doctor Sleep got good reviews for being atmospheric, well-acted and spooky (even if, in this writer's opinion, a bit long and slow), but has struggled at the box office, having barely recouped its $55 million budget worldwide. |
10 | Jeffrey Epstein | 523,050 | Jeffrey Epstein has become the Internet's newest "tree-fiddy" – wedged into every unexpected nook and cranny, the message awaits: "Epstein didn't kill himself". While the real-life events that transpired in his cell that night remain a matter of conjecture, the popular opinion is certainly clear. |
The eleventh month began with the subjects of two streaming productions in the lead, meaning Netflix is responsible for boosted views of 15th-century English kings and World War II Soviets who worked with the Nazis. More history is found in the 30th anniversary of the Berlin Wall's fall (#14), men who inspired holidays (#24) and Google Doodles (#21), a land dispute to be settled (#12) and a battle to be documented by Bollywood (#7). Speaking of movies, #3 and #5 are the same Hollywood blockbusters from previous weeks, now joined by a horror flick (#15) and an actor (#10) who found love (#9) years after a tragedy (#19). The recently deceased (#4, #11), video games (#6), books adapted by HBO (#13), MMA (#18, #20), politicians from both sides of the Pond (#16, #17), and a changed landmark (#23) close the list.
Rank | Article | Views | About |
---|---|---|---|
1 | Henry V of England | 1,803,080 | One of the royal Henrys chronicled in Shakespeare's plays, which in turn are now adapted in the Netflix production The King. |
2 | John Demjanjuk | 948,250 | Still on Netflix, the documentary series The Devil Next Door tells the story of a Ukrainian-American autoworker who allegedly worked as a guard at Nazi extermination camps during World War II. |
3 | Joker (2019 film) | 940,728 | 10 years after the Batman movie that made everyone just want to talk about the Joker broke a billion dollars worldwide, a movie just about Gotham's clown sociopath is nearing a ten-digit gross as well. Seems like everyone wants to dance with the devil in the pale moonlight. |
4 | Jeffrey Epstein | 918,694 | The possibility that the deceased pedophile financier was killed instead of having hanged himself has become an online meme. |
5 | Terminator: Dark Fate | 916,532 | In spite of being better than what you'd expect from a movie with a 63-year-old female gunslinger and a 72-year-old killer robot, Terminator: Dark Fate has not enthralled audiences (it opened atop the box office but has now fallen to #5, and has only broken $200 million worldwide so far) and possibly won't get any follow-ups. As a fan of this series, even if I had objections to some things in Dark Fate, it saddens me to see the franchise terminated. |
6 | Death Stranding | 796,900 | Hideo Kojima is back (while Konami continues to neglect his best-known work) with this PlayStation 4 title whose cinematic ambitions extend to casting actors such as Norman Reedus, Mads Mikkelsen and Léa Seydoux. |
7 | Third Battle of Panipat | 796,729 | Bollywood released the first trailer for Panipat, which in December will re-enact this 1761 confrontation against an invading Afghan army. |
8 | The King (2019 film) | 791,674 | Timothée Chalamet plays our #1 in this Netflix movie. |
9 | Alexandra Grant | 771,172 | Possibly the most popular actor of the year, Keanu Reeves has reportedly been dating an artist with whom he has already written two books, making all his fans very happy that "Sad Keanu" might be a thing of the past. |
10 | Keanu Reeves | 761,249 | |
A new beta feature has been deployed which allows you to preview references by hovering over the inline footnote. Reference Previews display the reference and its type in a popup with a link to navigate to the reference. Similar functionality has been available through gadgets on several wikis, such as Navigation popups and Reference Tooltips.
Several userscripts stopped working suddenly on October 21, as reported on the technical Village Pump. This was due to code being deprecated and removed over a shorter timeframe than usual and without much forewarning. Following this incident, XFDcloser was made into a gadget.
The Contributions special page had its limit of 5,000 results per page decreased to just 500. The lower limit was implemented to prevent potential denial of service attacks due to the impact of certain queries with long date ranges.
Extraneous semicolons started appearing on the Watchlist and other listings on November 8. The issue arose from improvements to the mobile watchlist that unexpectedly impacted the desktop view.
The Community Wishlist Survey 2020 is open for voting until December 2nd. The WMF's Community Tech team, as per previous years, will work on the top wishes decided on by the community. Unlike previous years, this survey is exclusively for the smaller sister projects, with wishes for Wikipedia, Commons, and Wikidata specifically excluded. There are a total of 72 wishes in the survey, mostly for Wikisource, Wiktionary, and Wikiversity.
{{xx icon}} templates (and their redirects) with {{in lang}}
Latest tech news from the Wikimedia technical community: 2019 #45, #46, #47, & #48. Please tell other users about these changes. Not all changes will affect you. Translations are available on Meta.
MediaWiki:ipb-default-expiry can set the default length to block a user for your wiki. You will be able to use MediaWiki:ipb-default-expiry-ip to set a different default block length for IP editors. [1]
Revisiting last December's "Sun and Moon, water and stone" solstice theme, we present some interesting and unusual winter and holiday images. We hope you enjoy them as much as we did.
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
A paper titled "The Roles Bots Play in Wikipedia", published in Proceedings of the ACM on Human-Computer Interaction by five researchers from the Stevens Institute of Technology[1], was presented at this month's CSCW conference. Bots are a core component of English Wikipedia, and account for approximately 10 percent of all edits as of 2019. After retrieving all 1,601 registered bots (as of 28 February 2019), the researchers used a procedure involving machine learning to organize them into a taxonomy with nine key "roles":
Some bots act in several roles (e.g. AnomieBOT as Tagger, Clerk and Archiver).
The last part of the paper concerns the impact of bots on the new editors they interact with. Extending previous research that had found increased retention for newbies who were invited to the Teahouse support space by HostBot, an "Advisor" bot, the researchers show that other Advisor bots have a significant positive effect as well (although for the cited example, SuggestBot, they might have noted as a confounding factor that users must opt in to receiving its messages). Likewise confirming previous research, messages from ClueBot NG were found to have a negative effect, but this wasn't the case for other "Protector" bots: "The newcomers seem to not care about the bot signing their comments (SineBot) and are even positively influenced by the bot reverting their added links that violate Wikipedia’s copyright policy (XLinkBot)."
A press release, titled "Rise of the bots: Team completes first census of Wikipedia bots", quoted one of the authors as saying "People don't mind being criticized by bots, as long as they're polite about it. Wikipedia's transparency and feedback mechanisms help people to accept bots as legitimate members of the community."
The authors note the relevance of Wikidata to their study, where the proportion of bot edits "has reached 88%" (citing a 2014 paper), and find that the move of interlanguage link information to Wikidata led to a decrease in "Connector" bot activity on Wikipedia. At last year's CSCW, a paper titled "Bot Detection in Wikidata Using Behavioral and Other Informal Cues"[2] had presented a machine learning approach for identifying undeclared bot edits, showing that "in some cases, unflagged bot activities can significantly misrepresent human behavior in analyses". In the present study about Wikipedia, it would have been interesting to read whether the authors see any limitations in the data source they used (Category:All Wikipedia bots).
A paper in PLoS Biology[3] uses Wikipedia pageview data for "the first broad exploration of seasonal patterns of interest in nature across many species and cultures". Specifically, the researchers looked at the traffic for articles about 31,751 different species across 245 Wikipedia language editions. They found "that seasonality plays an important role in how and when people interact with plants and animals online. ... Pageview seasonality varies across taxonomic clades in ways that reflect observable patterns in phenology, with groups such as insects and flowering plants having higher seasonality than mammals. Differences between Wikipedia language editions are significant; pages in languages spoken at higher latitudes exhibit greater seasonality overall, and species seldom show the same pattern across multiple language editions." Seasonality was often found to "clearly correspond with phenological patterns (e.g., bird migration or breeding...)", but in other cases also to human-made events such as annual holidays. For example, traffic for the English Wikipedia's article on the wild turkey (Meleagris gallopavo) spiked during Thanksgiving in the US, and saw a softer peak during "the spring hunting season for wild turkey in many US states."
Overall, articles about plants and animals exhibited seasonality much more often than articles about other topics. (Concretely, 20.2% of the species articles in the dataset were found to have seasonal traffic, compared to 6.51% in a random selection of nonspecies articles. One quarter of species had a seasonal article in at least one language. Technically, seasonality was determined via a method that involved, among other steps, fitting the pageviews time series to a sinusoidal model with one or two annual peaks, using a manually defined threshold.)
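The paper's full procedure involves more steps than can be shown here, but its core idea, fitting a pageview series to a sinusoidal model with one or two annual peaks and applying a threshold, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. The "two-peak" model is assumed to be the first two annual harmonics. Goodness of fit is measured with R², and the 0.5 threshold is an arbitrary placeholder for the paper's manually defined one.

```python
import math


def _solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def harmonic_r2(views, harmonics, period=365.0):
    """R² of a least-squares fit of daily views to `harmonics` annual sine/cosine pairs."""
    rows = []
    for t in range(len(views)):
        row = [1.0]  # intercept
        for k in range(1, harmonics + 1):
            angle = 2 * math.pi * k * t / period
            row += [math.cos(angle), math.sin(angle)]
        rows.append(row)
    width = len(rows[0])
    # Normal equations (XᵀX)β = Xᵀy give the least-squares coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * v for r, v in zip(rows, views)) for i in range(width)]
    beta = _solve(xtx, xty)
    mean = sum(views) / len(views)
    ss_tot = sum((v - mean) ** 2 for v in views)
    if ss_tot == 0:
        return 0.0  # a constant series has no seasonal variation to explain
    ss_res = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, views))
    return 1.0 - ss_res / ss_tot


def is_seasonal(views, threshold=0.5):
    """Hypothetical rule: seasonal if the one- or two-peak model explains enough variance."""
    return max(harmonic_r2(views, 1), harmonic_r2(views, 2)) >= threshold
```

For example, a synthetic two-year series with a single yearly peak, `[100 + 50 * math.cos(2 * math.pi * t / 365) for t in range(730)]`, is classified as seasonal. Real pageview series would typically need further preprocessing, such as removing long-term trends, which this sketch omits.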
See also earlier coverage of a related paper involving some of the same authors: "Using Wikipedia page views to explore the cultural importance of global reptiles"
"How Does Editor Interaction Help Build the Spanish Wikipedia?" by Taryn Bipat, Diana Victoria Davidson, Melissa Guadarrama, Nancy Li, Ryder Black, David W. McDonald, and Mark Zachry of University of Washington, published in the 2019 CSCW Companion, examines talk page discussions in Spanish Wikipedia with a specific eye to how they might be different from the types of interactions in English Wikipedia.[4] It replicates work from ACM GROUP 2007 that had developed a classification scheme for how editors use policy to discuss article changes.[supp 1]
This is a short paper, so it does not have the depth of work you would expect in a full-length conference paper, but the authors select 38 talk pages from Spanish Wikipedia (presumably using the methods from the original work, which focused specifically on talk page conversations that involved high levels of conversation over the course of a month) and code them based on how often policies are linked and in what context. The contextual codes that are applied are: "article scope", "legitimacy of source", "prior consensus", "power of interpretation", "threat of sanction", "practice on other pages", and "legitimacy of contributor". They find that "power of interpretation" and "article scope" are the most-used strategies, followed by "legitimacy of source". They also found a number of examples of editors linking to English Wikipedia pages.
While I would love to see a more robust analysis comparing English and Spanish talk pages that were sampled with the same strategy and from the same time periods, this work is an example of much-needed analyses of how the frameworks and models that are designed for one language community do or do not apply to other language communities. It would be fascinating to further understand the degree to which editors who are active across multiple languages adapt their discussion strategies to the local community versus apply similar strategies across all communities.
In this article,[5] three researchers from China present "a system dynamic model of Wikipedia based on the co-evolution theory, and [investigate] the interrelationships among topic popularity, group size, collaborative conflict, coordination mechanism, and information quality by using the vector error correction model (VECM)."
These five factors ("PSCCQ") are each represented by a monthly time series:
In the paper, they are analyzed for the English Wikipedia's article on global warming, for the timespan of February 2004 to November 2015. First, the researchers apply Granger causality tests to identify which of the five variables tend to predict which, resulting in the depicted graph. For example, popularity is predicted by coordination (number of talk page discussions, as the only factor in this case), indicating perhaps that Wikipedia editors tend to be quicker to debate new information about global warming than the general public is to look up global warming on Google. Furthermore, the authors calculate the impulse response functions for each of the 20 possible ordered pairs. In the above example, this indicates how the popularity measure tends to "react" to a given increase in coordination. The application of a third technique, forecast error variance decomposition, further corroborates the results about how the five variables relate to each other.
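The first of these techniques, a Granger causality test, asks whether the past values of one series improve predictions of another beyond what that series' own past already provides. The following is a minimal pure-Python sketch for a single pair of series. It simplifies the paper's approach: there is no cointegration or VECM step, only plain least squares on the raw values. The function names are hypothetical, and the function returns only the F statistic, which would still have to be compared with a critical value of the F(lags, n - 2*lags - 1) distribution.

```python
def _solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _rss(rows, y):
    """Residual sum of squares of an ordinary least-squares fit of y on the design rows."""
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(width)]
    beta = _solve(xtx, xty)
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))


def granger_f(cause, effect, lags=1):
    """F statistic for H0: the lagged `cause` series adds nothing to predicting `effect`."""
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(effect)):
        own = [effect[t - i] for i in range(1, lags + 1)]
        other = [cause[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)             # effect's own past only
        unrestricted.append([1.0] + own + other)   # plus the candidate cause's past
        target.append(effect[t])
    df = len(target) - 2 * lags - 1  # residual degrees of freedom of the larger model
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    return ((rss_r - rss_u) / lags) / (rss_u / df)
```

Called as `granger_f(coordination, popularity, lags=1)` on two monthly series (hypothetical names standing in for the paper's coordination and popularity measures), a large value would suggest that past coordination helps predict popularity. Repeating this for ordered pairs of the five PSCCQ variables is, roughly, how a directed "who predicts whom" graph arises.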
The study presents two quite far-reaching takeaways from the relations it identified between the five factors:
An obvious limitation of this research, only somewhat coyly mentioned in the paper, is its restriction to a single article (and only one Wikipedia language version). While an effort is made to justify the choice of global warming as a high-traffic page with a substantial amount of controversies, it remains unclear how much the takeaways can be generalized.
See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
This paper[6] found that on English Wikipedia talk pages, about 22% more uncivil messages originate from impacted regions on the Mondays following the shift to daylight saving time.
From the abstract:[7]
"We analyze the relationship between the structural properties of WikiProject coeditor networks and the performance and efficiency of those projects. We confirm the existence of an overall performance-efficiency trade-off, while observing that some projects are higher than others in both performance and efficiency, suggesting the existence factors correlating positively with both. [...] Our results suggest possible benefits to decentralized collaborations made of smaller, more tightly-knit teams, and that these benefits may be modulated by the particular learning strategies in use."
From the abstract:[8]
"... we use [the] existing Web Traffic Time Series Forecasting dataset by Google to predict future traffic of Wikipedia articles. [...] we built a time-series model that utilizes RNN seq2seq mode [sic]. We then investigate the use of symmetric mean absolute percentage error (SMAPE) for measuring the overall performance and accuracy of the developed model. Finally, we compare the outcome of our developed model to existing ones to determine the effectiveness of our proposed method in predicting future traffic of Wikipedia articles."
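SMAPE, the error measure named in the abstract, averages each forecast's absolute error relative to the mean magnitude of the actual and forecast values, so it stays between 0% and 200%. The minimal sketch below is illustrative only. Definitions vary slightly between sources, and scoring a term where both values are zero as no error is an assumption made here.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 = perfect, 200 = worst)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom > 0:  # assumption: a 0/0 term (both values zero) counts as no error
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)
```

For instance, forecasting 110 daily views for an article that actually received 100 gives `smape([100], [110])` ≈ 9.52, since 10 / 105 ≈ 0.0952.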
From the abstract:[9]
"we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. It may be a static graph with dynamic node attributes (e.g. time-series), or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. [...] To demonstrate [our approach's] efficiency, we apply it to two datasets: Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by the real-world events that impact the network dynamics. Besides, the structure of the clusters and the analysis of the time evolution associated with the detected events reveals interesting facts on how humans interact, exchange and search for information ..."
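The abstract's definition of an anomaly, a localized increase in temporal activity in a cluster of nodes, can be illustrated with a deliberately simple sketch. This is not the authors' method. It assumes hypothetical inputs: an undirected adjacency dict (for example, links between articles) and per-node activity series (for example, daily page views). It flags nodes whose activity at step t exceeds their recent mean by k standard deviations, then reports connected groups of flagged nodes.

```python
import statistics
from collections import deque


def anomalous_clusters(graph, activity, t, window=7, k=3.0, min_size=2):
    """Return connected groups of nodes whose activity spikes at time step t.

    graph: dict mapping node -> set of neighboring nodes (undirected).
    activity: dict mapping node -> list of counts per time step.
    """
    if t < window:
        raise ValueError("need at least `window` earlier time steps")
    spiking = set()
    for node, series in activity.items():
        history = series[t - window:t]
        baseline = statistics.mean(history)
        # Floor the spread at 1 so a perfectly flat history doesn't flag tiny changes.
        spread = max(statistics.pstdev(history), 1.0)
        if series[t] > baseline + k * spread:
            spiking.add(node)

    clusters, seen = [], set()
    for start in spiking:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search restricted to spiking nodes
            node = queue.popleft()
            component.append(node)
            for neighbor in graph.get(node, ()):
                if neighbor in spiking and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters
```

The `min_size` parameter is what makes the detection "localized": an isolated spike on a single node is ignored unless neighboring nodes spike at the same time.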
This paper from CSCW 2017[10] "replicates, extends, and refutes conclusions" of a paper by Yasseri et al. that had received wide and prolonged media attention for its claims that Wikipedia bots are fighting each other (cf. previous review: "Wikipedia bot wars capture the imagination of the popular press - but are they real?").
From the abstract:[11]
"We propose the construction of a Digital Knowledge Economy Index, quantified by way of measuring content creation and participation through digital platforms, namely the code sharing platform GitHub, the crowdsourced encyclopaedia Wikipedia, and Internet domain registrations and estimating a fifth sub-index for the World Bank Knowledge Economy Index for [the] year 2012."
From the abstract:[12]
"This paper will discuss a technical solution [...] for faster linking across databases with a use case linking Wikidata and the Global Biotic Interactions database (GloBI). The GUODA infrastructure is a 12-node, high performance computing cluster made up of about 192 threads with 12 TB of storage and 288 GB memory. Using GUODA, 20 GB of compressed JSON from Wikidata was processed and linked to GloBI in about 10–11 min. Instead of comparing name strings or relying on a single identifier, Wikidata and GloBI were linked by comparing graphs of biodiversity identifiers external to each system. This method resulted in adding 119,957 Wikidata links in GloBI..."
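The key idea in the abstract is to link records through identifiers that live outside both databases rather than by comparing name strings. A much-simplified sketch of that idea, a join on shared external identifiers, is below. It is not GUODA's implementation, which ran on a cluster and compared identifier graphs; the record ids and identifier strings are made up for illustration.

```python
def link_by_external_ids(left, right):
    """Pair records from two databases that share at least one external identifier.

    left, right: dict mapping a record id to the set of external identifiers
    (e.g. taxonomic database ids) attached to that record.
    """
    # Index the right-hand records by each identifier they carry.
    index = {}
    for rid, ids in right.items():
        for ext in ids:
            index.setdefault(ext, set()).add(rid)

    # Every shared identifier yields a candidate link.
    links = set()
    for lid, ids in left.items():
        for ext in ids:
            for rid in index.get(ext, ()):
                links.add((lid, rid))
    return links


# Hypothetical records; the identifier strings are invented for illustration.
wikidata_like = {"item-1": {"ncbi:101", "gbif:202"}, "item-2": {"itis:303"}}
globi_like = {"taxon-a": {"gbif:202"}, "taxon-b": {"col:404"}}
print(link_by_external_ids(wikidata_like, globi_like))  # {('item-1', 'taxon-a')}
```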
From the abstract:[13]
"We study aggregated clickstream data for articles on the English Wikipedia in the form of a weighted, directed navigational network. We introduce two parameters that describe how articles act to source and spread traffic through the network, based on their in/out strength and entropy. From these, we construct a navigational phase space where different article types occupy different, distinct regions, indicating how the structure of information online has differential effects on patterns of navigation. Finally, we go on to suggest applications for this analysis in identifying and correcting deficiencies in the Wikipedia page network that may also be adapted to more general information networks."
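The abstract names the ingredients of its two parameters (in/out strength and entropy) but not their exact formulas. The sketch below uses one plausible reading: strength as the total clicks into or out of an article, and entropy as how evenly an article's outgoing clicks are spread over its links. The paper's actual parameters may combine these differently, and the toy clickstream is hypothetical.

```python
import math
from collections import defaultdict


def node_stats(edges):
    """Per-article in-strength, out-strength and outgoing-traffic entropy (bits).

    edges: iterable of (source, target, clicks) triples from a clickstream.
    """
    in_strength = defaultdict(int)
    out_strength = defaultdict(int)
    out_clicks = defaultdict(list)
    for src, dst, clicks in edges:
        out_strength[src] += clicks
        in_strength[dst] += clicks
        out_clicks[src].append(clicks)

    entropy = {}
    for node, counts in out_clicks.items():
        total = out_strength[node]
        # Shannon entropy of how this article's outgoing clicks are split across targets:
        # 0 when all traffic goes to one link, higher when traffic is spread out.
        entropy[node] = sum((c / total) * math.log2(total / c) for c in counts if c > 0)
    return dict(in_strength), dict(out_strength), entropy


# Hypothetical toy clickstream: A splits its traffic evenly, B sends everything to C.
in_s, out_s, h = node_stats([("A", "B", 30), ("A", "C", 30), ("B", "C", 10)])
print(in_s["C"], out_s["A"], h["A"], h["B"])  # 40 60 1.0 0.0
```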
This paper[14] aims to understand two paradigms of information seeking in Wikipedia: search by formulating a query, and navigation by following hyperlinks.
Adminitis is a state of mind in which some Wikipedians find themselves at times. Though generally confined to administrators, the condition has been observed in some non-administrators. Although the exact causes are unknown, it is thought to be correlated with extensive and prolonged anti-vandalism activity. The mystery is that this disease is found not only in humans but in all the creatures of the planet (except plants, so far), and even in things created by humans, the most common cases being cars and refrigerators, although the symptoms found in objects vary significantly from those that appear in humans. How these things come to show the symptoms of adminitis is still unknown. Adminitis is being studied by a great many scientists (especially doctors), but the results of these studies only confuse the situation. The Community Health Initiative has ranked it the most dangerous disease in the entire Wikiverse.
It is generally advised that when you see anything suffering from adminitis, human being or anything else, call the nearest hospital (for humans), the nearest service center (for items) or the local building inspector (for buildings). If they do not respond, immediately call the local police department to report a public health hazard, because the entity with adminitis may harm or infect others.
It is a near-universal truth that sufferers from this illness will reject any diagnosis of the condition by an outside party. With that in mind, it's important for those who have received this diagnosis to conduct a self-test. If more than three of the following apply to you, you may be suffering from this illness:
The infectious nature of this illness is unknown at this time. More study needs to be conducted in order to identify transmission mechanisms. No effective containment mechanisms have been identified. So far anti-vandal bots have proved immune to this condition, although as yet there is no convincing explanation concerning this anomaly.
Observational studies have noted that sufferers will seek the counsel of their Wikipedia friends, but end up infecting them in the process. Other studies have noted the evolution of a Wikipedia editor as potentially having a causal role in this illness.
Treatment varies from case to case. As of 2024, no consensus exists on the best methods for recovery. Some methods that have been used:
Mortality rates from this illness have not been clearly established by research. Frequently, what appears to be a wikideath becomes a wikiresurrection in the form of a user who is considerably more circumspect, and often more detached from processes. Some resurrectees have exhibited shortness of temper, but with considerably lower flameout levels. Reinfection is rare, but when it occurs it is almost always unrecoverable.
If the initial condition is not fatal, it may take months for a patient to recover, even if under the care and treatment of WP:ARBCOM.
Well, this interview aged quickly. So what has changed? What does spam look like nowadays on Wikipedia?
Firstly, I don't know if linkspam in all its forms has increased or not since then. It is no longer economical for me to spend time pursuing it.
I spend my time dealing with undisclosed paid editing instead. UPE is an imprecise term. A better one is covert advertising: the insertion of advertisements that very closely mimic the format of legitimate encyclopedic articles written by volunteers. It is irrelevant whether disclosure is made per the Terms of Use, because in either case there is no indication whatsoever to the casual reader that the editors have been paid. A reader would need to check all of the page history, the talk page and the user pages of all significant contributors to the article in order to determine whether content is paid for. The disclosure requirement is therefore completely pointless for the casual reader.
The most obvious form of UPE involves the creation of articles that would not otherwise warrant inclusion. Long-term contributors may remember when Wikipedia:Conflict of interest was titled Wikipedia:Vanity page. This is exactly the function these "articles" serve. Ghostwritten vanity pages are designed explicitly to appear as the top result and in the sidebar of a Google search, but they are hard for Wikipedians to find and, once found, hard to assess for notability. Spam is less about Viagra or Cialis, and more about early-stage startups, businesspeople, motivational speakers, cryptocurrencies and so forth.
There are numerous companies that offer ghostwritten vanity pages for a small amount of money, typically a few hundred dollars. These companies employ freelancers in English-speaking Third World countries who have very few opportunities for legitimate employment. In fact, similar dishonest activities such as running a fake news website or writing for an essay mill turn out to be quite lucrative, in purchasing power parity terms, for the freelancers concerned.[1][2]
The level of abuse is systematic, pervasive, and of increasing sophistication. The worst spammers have taken on characteristics of advanced persistent threats, including the use of compromised computers, VPNs and cloud computing infrastructure to post spam. There are no effective admin tools. Last week, two new page patrollers (Meeanaya and Ceethekreator), who screen newly created articles for notability and other problems, were blocked for corruptly reviewing spam. It is only a matter of time before paid editors systematically infiltrate the admin corps.
Much of the increase in spamming is a consequence of Wikipedia's own success. However, a large portion of the blame lies squarely with the Wikimedia Foundation. In materials targeted at donors, the WMF places significant emphasis on crude metrics of content quantity and community size, simply because that is what the WMF thinks donors want to hear.[3] The WMF therefore faces incentives very similar to those of Facebook and Google. Social media sites tolerate a high level of bots, Russian trolls and spammers because fake accounts pad their key metrics of monthly active users and ad impressions, giving the illusion of growth and making them look good in the eyes of their customers (advertisers) and investors. The WMF (like Facebook) puts similar emphasis on outreach efforts in the poor countries that are the source of much of the spam, despite multiple past high-profile failures, again because the WMF thinks donors want to see desperate, impoverished people in sub-Saharan Africa being helped.[4][5] A few extra vanity pages and sockpuppets certainly help the WMF look good in their pitch to donors.
The WMF does not sufficiently care about our admin tools being fit for purpose.[6] Like Facebook, YouTube and Google before recent scandals, investments in content moderation are seen as purely a cost[7][8] while "initiatives" that provide feel-good anecdotes for donors or increase donor-targeted metrics and hence increase donations are heavily prioritized. The WMF deserves nothing but utter condemnation and scorn for the complete lack of maintenance, let alone investment, in the code underlying the administrator toolset. A seemingly simple task such as adding a checkbox to the delete form that deletes the associated talk page requires nothing less than a fundamental rewrite of the relevant code.
The fight against spam is nothing short of an existential battle against the degeneration of this encyclopedia into a large set of vanity pages about attention-seeking subjects. And we're losing.
This week, we spent some time with WikiProject Spam. The project describes itself as a "voluntary Spam-fighting brigade" which seeks to eliminate the three types of Wikispam: advertisements masquerading as articles, external link spam, and references that serve primarily to promote the author or the work being referenced. WikiProject Spam applies policies regarding what Wikipedia is not and guidelines for external links. The project received some help in February 2007 when the English Wikipedia tagged external links as "NOFOLLOW", preventing search engines from indexing external links and limiting the incentive for many spammers to use Wikipedia as a search engine optimization tool. The project maintains outreach strategies, detailed steps for identifying and removing spam, a variety of search tools, several bots for detecting spam, and a big red button to report spam and spammers. The project was started by Jdavidb in September 2005 and has grown to include 371 members. One of the project's most active members, MER-C, agreed to show us around.
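NOFOLLOW refers to the value `nofollow` of the HTML `rel` attribute on a link, which tells search engines not to give the linked page ranking credit; that is what removes much of the search-engine-optimization payoff of adding links to Wikipedia. The sketch below, using only Python's standard library, scans a page's HTML for external links that lack it. The hostname parameter and sample HTML are hypothetical, and this is an illustration of the mechanism, not a tool WikiProject Spam uses.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowChecker(HTMLParser):
    """Collect external links in an HTML page that lack rel="nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host  # hypothetical parameter: the wiki's own hostname
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        host = urlparse(href).netloc
        # Relative links (no host) and links back to the wiki itself are internal.
        if host and host != self.own_host:
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" not in rel_values:
                self.missing.append(href)


page = (
    '<a href="/wiki/Spam">internal</a>'
    '<a href="https://shop.example/" rel="nofollow">tagged</a>'
    '<a href="https://other.example/">untagged</a>'
)
checker = NofollowChecker("en.wikipedia.org")
checker.feed(page)
print(checker.missing)  # ['https://other.example/']
```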
How much time do you typically devote each week to fighting spam?
WikiProject Spam is the most active project by edits (including bots) and the second most watched project on Wikipedia. What accounts for this high activity and interest by the Wikipedia community?
What type of wikispam do you come across most often? Do you use any special tools to detect spam or do you simply remove spam you notice while reading and editing articles?
wikipedia-en-spam
(don't go there yet, it's not currently working) and others. User:XLinkBot, a spam reversion bot, and User:COIBot use this channel as their source of link additions. Reports are triggered when a small group of users are responsible for a large fraction of link additions to a particular site, or can be requested through IRC or User:COIBot/Poke (administrators and trusted users only).
Have you had any heated conversations with spammers after removing spam from an article? What are some strategies you've used to resolve these conflicts?
Has your experience fighting spam resulted in any humorous stories? Have you heard any amusing excuses and special pleading from spammers trying to defend their edits?
Risker has held multiple positions within the Wikimedia community and is a member of the Roles & Responsibilities strategy working group.
FULBERT has worked with several WikiEdu programs and is a member of the Capacity Building strategy working group.
Jackiekoerner holds a doctorate in Higher Education and is a member of the Community Health strategy working group.
Organizations and movements develop a strategic plan to guide their activities and planning over an extended period. A strategic plan helps the parts of the movement to work together to achieve overall goals. The last Wikimedia movement strategy covered 2010–2015. Since then there's been no consistent, global direction to guide the movement. The absence of a high-level plan creates challenges for different parts of the movement to work together toward shared goals. The movement began to address this gap in 2017, when the 2030 strategic direction was developed with community consultation, and was endorsed by many organized movement groups and individual contributors. The Wikimedia Foundation has been the financial sponsor of this process.
After the strategic direction was defined, nine working groups were formed to focus on different strategic areas, and started their work in mid-2018. Extensive workshops and sessions were held at the Wikimedia Summit in March 2019 and each group carried out research, consultations, community conversations, and formulated ideas that led to the first iteration of their recommendations.
"Strategy salons" were held around the globe, both in person and online, generating ideas for the working groups to consider and incorporate into their recommendations. The working groups developed almost 90 recommendations, which were released in mid-2019 for further discussion within the community. Each group presented and workshopped its draft recommendations at Wikimania in August 2019.
Both contract and volunteer strategy liaisons worked with online communities, affiliates, and working groups, and held two regional conferences: one in East Africa and one for the East, Southeast Asia and the Pacific Regional Cooperation.
Developing a long-term strategy is difficult even in straightforward circumstances. Doing so is even more challenging for a global volunteer movement that values diversity and community input, and also values knowledge-sharing and high-quality information. Every working group received feedback from both organized and informal movement groups, as well as consultants, the coordination team, and of course individual community members. That feedback was taken into account as the working groups prepared their second round of recommendations ahead of the harmonization meeting in September 2019.
The participants of the September harmonization meeting refined key principles and identified groups of similar recommendations, but the session did not result in a fully synthesized set of draft recommendations. The working groups finished their work at the beginning of November. Some members of the working groups volunteered to complete the written draft recommendations, and this synthesis is ongoing.
Once the draft recommendations are written, other members of the former working groups will review the document. Other working group members will be going through all of the accumulated research, consultation, and feedback to ensure that key points have been addressed in the synthesized set of recommendations. In January 2020, a further round of conversations with the movement will review the proposed recommendations prior to final revisions before submission to the WMF Board of Trustees.
Early next year, once the draft recommendations are public, the Strategy Core Team will reach out to the English Wikipedia to review the recommendations and understand what proposed changes would be relevant to this community. Community members from all areas of Wikimedia will be invited to participate in this round of conversations, which will start in January 2020. The invitations will be posted on noticeboards, mailing lists and other key community discussion points. Discussions will likely take place in a centralized location, although this process has not yet been finalized.
Let's say you are interested in how many active editors from France are editing the English-language Wikipedia; or conversely, you'd like to know how many editors from the UK are editing the French-language Wikipedia. All the information needed to calculate these numbers is recorded, at least temporarily, by the Wikimedia Foundation, but unless you worked for the WMF and had access to the Geoeditors Monthly database, you could never find those numbers. The WMF did not wish to disclose this data out of concern that the numbers were precise enough that governments or others could use them to derive information that might lead to the identification of individual editors.
This month the Wikimedia Foundation made a new dataset public: Geoeditors/Public, or more informally, Active Editors by country. It allows the public to see, more or less, how many active editors (5–99 edits in a month) and very active editors (100+ edits) from about 180 individual countries contribute to active Wikipedia versions, each month from January 2019 onward. For example, if you wanted to know how many people editing from the UK made more than 99 edits to the French version of Wikipedia in September, you could look it up in this dataset. The answer is somewhere between 11 and 20.
Because of privacy concerns, exact numbers are not given, and data from 30 countries, e.g. China, Kazakhstan, Russia, Saudi Arabia and Venezuela, are excluded. Instead, the number of editors in each category (editors from country x who edited Wikipedia version y) is given only in “buckets” of ten: 1–10, 11–20, 21–30, 31–40, etc. Technical information is available here. The data are available here.
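The bucketing can be sketched as follows. The percentages in the tables below appear consistent with dividing each bucket's lower bound by the sum of the lower bounds for all reported countries (for example, 1,881 at 42.9% implies a reported total of about 4,385 very active enwiki editors). That reading is an inference from the tables, and the function names and country counts here are hypothetical.

```python
def bucket(count):
    """Return the (low, high) bucket of ten that contains an exact count of at least 1."""
    if count < 1:
        raise ValueError("only counts of at least 1 are reported")
    low = (count - 1) // 10 * 10 + 1
    return low, low + 9


def shares_from_lower_bounds(lower_bounds):
    """Each group's percentage of the summed lower bounds, to one decimal place."""
    total = sum(lower_bounds.values())
    return {name: round(100 * low / total, 1) for name, low in lower_bounds.items()}


# Hypothetical exact counts, as only the WMF would see them.
exact = {"Country A": 95, "Country B": 17, "Country C": 3}
published = {name: bucket(n) for name, n in exact.items()}
print(published)  # {'Country A': (91, 100), 'Country B': (11, 20), 'Country C': (1, 10)}
print(shares_from_lower_bounds({name: low for name, (low, _) in published.items()}))
# {'Country A': 88.3, 'Country B': 10.7, 'Country C': 1.0}
```

Because every country's true count may be up to 9 higher than its lower bound, countries with small counts are understated proportionally more than large ones, so these shares are best read as approximate.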
But enough of the preliminaries! What questions can the dataset answer that I’ve been dying to know the answer to? The following analysis is only the briefest overview of data from one month, September, quickly done. It’s not in any sense academic research, but hopefully will allow people to understand what type of data the dataset contains and what type of questions it can be used to address.
My main questions – of personal interest – are:
Table 1 shows the 11 countries with the most active editors and the 11 with the most very active editors to enwiki (14 countries total), plus two other large English-speaking countries, Ireland and South Africa. Numbers marked * are not in the largest 11.
Editors from | Editors with 100+ edits (lower bound) | % of total reported | Editors with 5–99 edits (lower bound) | % of total reported |
---|---|---|---|---|
United States | 1,881 | 42.9% | 25,401 | 41.0% |
United Kingdom | 731 | 16.7% | 7,491 | 12.1% |
Canada | 271 | 6.2% | 3,321 | 5.4% |
Australia | 231 | 5.3% | 2,491 | 4.0% |
India | 191 | 4.4% | 5,241 | 8.5% |
Germany | 121 | 2.8% | 1,281 | 2.1% |
Philippines | 81 | 1.8% | 1,021 | 1.6% |
Netherlands | 61 | 1.4% | 621* | 1.0% |
Italy | 51 | 1.2% | 831 | 1.3% |
New Zealand | 51 | 1.2% | 441* | 0.7% |
Sweden | 51 | 1.2% | 431* | 0.7% |
France | 41* | 0.9% | 791 | 1.3% |
Ireland | 41* | 0.9% | 661* | 1.1% |
Spain | 41* | 0.9% | 681 | 1.1% |
Brazil | 31* | 0.7% | 721 | 1.2% |
South Africa | 21* | 0.5% | 291* | 0.5% |
Total (in table) | 88.8% | 83.6% |
The countries with the most very active editors in enwiki are the US (43%) and the UK (17%), which together account for almost 60% of the total reported editors. These two large, rich countries predominate. Two rich but less populous countries, Canada and Australia, are also well represented, with almost 12% of the total very active editors between them.
The much smaller but still relatively rich New Zealand and Ireland, with about 1% of the total reported very active editors each, trail among those countries where English is the predominant first language.
The proportion of native English speakers by country is shown at English language#Pluricentric English. The four countries with the largest native English-speaking populations are also the largest four contributors to enwiki – in the same order: USA, UK, Canada, and Australia.
India, which has the 5th largest group of very active editors (4%) and third largest group of active editors (9%), has a very large population, for whom English is an important medium of instruction but the first language of only a small fraction. The Philippines, with nearly 2% of the reported very active editors, may be affected by similar factors as India. The percentages of reported active editors (5–99 edits) appear to be similar to the percentages for very active editors.
Six rich European Union countries where English is not the mother tongue (Germany, the Netherlands, Italy, Sweden, France and Spain) together account for 8.4% of the reported very active editors. Of the countries in this table, only the rankings of Brazil and perhaps South Africa do not appear to be directly explained by the three factors of mother tongue, population, and wealth.
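The group totals quoted in the preceding paragraphs can be checked directly against the very-active column of Table 1. A small sketch of that arithmetic follows; the dictionary simply copies the table's rounded percentages, so a sum of them can differ by a tenth of a point from a sum computed on the underlying counts.

```python
# "% of total reported" for editors with 100+ edits, copied from Table 1.
very_active_share = {
    "United States": 42.9, "United Kingdom": 16.7,
    "Canada": 6.2, "Australia": 5.3,
    "Germany": 2.8, "Netherlands": 1.4, "Italy": 1.2,
    "Sweden": 1.2, "France": 0.9, "Spain": 0.9,
}


def combined_share(countries):
    """Sum the rounded table percentages for a group of countries."""
    return round(sum(very_active_share[c] for c in countries), 1)


print(combined_share(["United States", "United Kingdom"]))  # 59.6
print(combined_share(["Canada", "Australia"]))  # 11.5
print(combined_share(["Germany", "Netherlands", "Italy", "Sweden", "France", "Spain"]))  # 8.4
```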
Table 2 shows analogous rankings for the Spanish language Wikipedia. While Spain and Argentina combine for slightly over half of the reported very active editors, the very active editors are distributed more evenly over all the reported countries. Only one country without Spanish as its predominant language, the United States, has a fairly large proportion of the very active editors. The same three factors that seem to explain the rankings for enwiki editors, mother tongue, population, and wealth, may very well explain the rankings for eswiki as well.
Nevertheless, wealth – or perhaps dialect – may be playing a stronger role in eswiki than it does in enwiki. The 12 largest countries by native Spanish-speaking population are, in order, Mexico, Colombia, Spain, Argentina, the United States, Venezuela, Peru, Chile, Ecuador, Cuba, Guatemala, and the Dominican Republic. Note that Venezuela and Cuba are excluded by the WMF from the dataset. The population rankings for native English-speaking countries are almost identical to the rankings in Wikipedia contributions of the same countries. But the population rankings for native Spanish-speaking countries are much less similar to their rankings in Wikipedia Spanish-language contributions.
Editors from | Editors with 100+ edits (lower bound) | % of total reported | Editors with 5–99 edits (lower bound) | % of total reported |
---|---|---|---|---|
Spain | 211 | 35.9% | 3,881 | 35.4% |
Argentina | 101 | 17.2% | 1,421 | 13.0% |
Mexico | 71 | 12.1% | 1,471 | 13.4% |
Chile | 51 | 8.7% | 831 | 7.6% |
Colombia | 41 | 7.0% | 851 | 7.8% |
Peru | 31 | 5.3% | 631 | 5.8% |
Ecuador | 11 | 1.9% | 231 | 2.1% |
Nicaragua | 11 | 1.9% | 81* | 0.7% |
United States | 11 | 1.9% | 251 | 2.3% |
Uruguay | 11 | 1.9% | 211 | 1.9% |
unknown | 11 | 1.9% | 11* | 0.1% |
Bolivia | 1* | 0.2% | 101 | 0.9% |
Dominican Republic | 1* | 0.2% | 101 | 0.9% |
Total (in table) | 96.0% | 91.8% |
Table 3 shows how very active editors from the US and the UK edit the non-English Wikipedias. Altogether, very active editors from the US edit in 44 different Wikipedia versions; those from the UK edit in 29. Besides the Chinese Wikipedia, with at least 51 very active editors from the US, the versions with 11–20 very active editors from the US are an interesting mix: the Spanish, Farsi (Persian), Japanese, Russian, and Simple English Wikipedias. Among UK editors, only the French Wikipedia reaches that level.
Version edited | From | Editors with 100+ edits (lower bound) |
---|---|---|
enwiki | United States | 1881 |
zhwiki | United States | 51 |
eswiki | United States | 11 |
fawiki | United States | 11 |
jawiki | United States | 11 |
ruwiki | United States | 11 |
simplewiki | United States | 11 |
37 others | United States | 37 |
enwiki | United Kingdom | 731 |
frwiki | United Kingdom | 11 |
27 others | United Kingdom | 27 |
Time is the main variable of interest that was left out of the above examinations. Right now we can see how edit contributions from different countries change over the nine months from January through September 2019. As more months of data are released, the effect of time will likely be of greater interest. For example, suppose a new program were introduced to increase the number of editors from country Y. The full effects of the program might not be seen after 9 months, but after 2 or 3 years any effects could hopefully be seen in the data.
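One caveat when tracking change over time is that the buckets hide small movements. A count that stays in the same bucket may have changed by as many as 9 editors in either direction, and a move up one bucket may mean as few as 1 extra editor. A minimal sketch of the bounds on the true change, using hypothetical bucket values:

```python
def change_range(before, after):
    """Smallest and largest true change consistent with two bucketed counts.

    before, after: (low, high) buckets such as (11, 20).
    """
    return after[0] - before[1], after[1] - before[0]


print(change_range((11, 20), (21, 30)))  # (1, 19)
print(change_range((11, 20), (11, 20)))  # (-9, 9)
```

For a program aimed at country Y, this means an effect would have to be large, or sustained across many months, before it could be distinguished from bucket noise.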
Another area of interest might involve combining this dataset with other datasets. For example, say a program is undertaken to increase the quality – rather than the quantity – of articles about country Z. Using this data in conjunction with data on readership might give a more complete understanding of the effects of the program.