The Signpost

Technology report

WMF and the German chapter face up to Toolserver uncertainty

By Jarry1250 and Tony1
Wikimedia Deutschland's CEO Pavel Richter attributes the organisation's compromise over Toolserver funding to the establishment of the multi-million dollar Wikimedia Labs project.

The Toolserver is an external service hosting the hundreds of webpages and scripts (collectively known as "tools") that assist Wikimedia communities in dozens of mostly menial tasks. Few people think that it has been operating well recently; the problems, which include high database replication lag and periods of total downtime, have caused considerable disruption to the Toolserver's usual functions. Those functions are highly valued by many Wikimedia communities, comprising data reports on the relationships between pages, categories, images, and external links; support for Wiki Loves Monuments, OpenStreetMap and GLAM projects; talk-page archiving services; edit counters; and tools aimed at easing many automated administrative processes such as the account and unblock request processes on several major wikis, as well as cross-wiki abuse detection.

How did the Toolserver start?

It was originally set up in 2005 through the donation by Sun Microsystems of servers to Wikimedia Deutschland (WMDE); so it was almost by coincidence that the German chapter was prompted to take on responsibility for the project. WMDE has since invested heavily in Toolserver infrastructure and its operations, an unusually global role for a chapter that results from the particular nature of its revenue streams and German charity laws. There has been in-kind support from the Wikimedia Foundation, mostly in the form of database replication and space in its Amsterdam data centre (valued at US$65k a year), as well as financial grants to expand the hardware. Nevertheless, WMDE still makes up the bulk of the general budget of about €100k (US$130k); other chapters, such as Wikimedia UK, have also made smaller contributions.

Wikimedia Labs vs Toolserver: a comedy of errors?

In 2011, the Foundation announced the creation of Wikimedia Labs, a much better funded project that among other things aimed to mimic the Toolserver's functionality by mid-2013. At the same time, Erik Möller, the WMF's director of engineering, announced that the Foundation would no longer be supporting the Toolserver financially, but would continue to provide the same in-kind support as it had done previously.

DaB is the volunteer who administers the Toolserver, and who in the process has acquired unique expertise in running the system. (WMDE has also contracted Marlen Caemmerer to assist in Toolserver administration since October 2011.) DaB told the Signpost that there is a simple reason for the recent degradation in performance: the Toolserver received no new hardware in 2012, while more tools have been written and more people are using them. The German chapter, he says, has refused his request to extend the hardware infrastructure, giving only a vague commitment of support. Its September forward planning, however, allocates just a fraction of last year's funding.

DaB's comments are a reference to a message from WMDE's CEO, Pavel Richter, who publicly reassured Toolserver developers this week that "Wikimedia Deutschland will make all necessary investments [including new hardware] to keep the Toolserver up and running", but said that the chapter could not ignore the existence and growth of Labs. The movement now faces a complex challenge in working out how to maintain continuous support of the tools, a complexity that is obvious from recent debates (conducted in German) on Meta and the German Wikipedia; moreover, DaB has threatened to resign if WMDE does not allocate funds for hardware purchase.

What the WMF didn't anticipate, and what it now seems as though they're naively ignoring despite the outcry, is that WMDE doesn't have anything like the foundation's eight-figure budget, and apparently the WMF has decided the Toolserver is going to get the short end of the stick when it comes to funding.

 — Hersfold

Richter's reference to Wikimedia Labs' rapid growth prompted WMF deputy director Erik Möller to express the Foundation's thinking (full version, including rationale) in response to questions raised about the scenario:

It is true that we (the WMF) have ... asked WMDE to work with us in transitioning from Toolserver to Labs. ... Chapters are autonomous organizations, and it's WMDE's call how much / whether it wants to continue to invest in [the Toolserver] ... However, for our part, we will not continue to support the current arrangement ... indefinitely. The timeline we've discussed with Wikimedia Germany is roughly as follows:
  • Wind down new account creation on Toolserver by Q2 of 2013 calendar year
  • Decommission Toolserver by December 2013
Möller accepted that Labs, while well-resourced both in terms of processing capability and storage space, is not yet suitable for Toolserver migrants, lacking (among other things) both database replication and a "Quick Start" mode for users uninterested in Labs' capability for custom server setups. While funding has been put aside for developing such features, Möller would not commit to targeted WMF funding for tool transition, and therein lies the cause of concern among volunteer Toolserver developers: that they could be left facing a switchover deadline without being in a position (lacking either the time, the capabilities, or both) to migrate their tools themselves. They are concerned, then, that only time will tell what will happen to these popular but difficult to migrate tools, to whose continued existence both WMDE and the WMF seem unwilling to commit.

English Wikipedia arbitrator Hersfold was closely involved in writing the "unblock ticket request system" (UTRS), which allows blocked users—including innocent parties caught up in range-blocks—to appeal their blocks. UTRS, created only recently and now officially mandated by the Foundation, is written for the Toolserver, not the Labs environment. Hersfold told the Signpost:

How Labs functions seems to be almost completely different from how the Toolserver functions. We've been told multiple times that Labs will provide lots of "beefy" infrastructure for tools development; ... users will be able to set up virtual machines, or "instances" ... to handle their development, and submit new programming code to a shared location. As one may expect from the Foundation, it's a very collaborative setup. Once inside their instance, a user can more-or-less do whatever they want; install MediaWiki, run a bot, set up web pages for tools, whatever. But most people on the Toolserver don't need "beefy"; we just need a web server that will let us run our tools and access the databases holding information about Wikipedia and the other projects. If someone needed "beefy," they'd have set up their own server ages ago. While Labs is all swishy and fancy (and presumably has less downtime than the Toolserver), it's an environment we're all completely unused to, and perhaps worst of all, it provides no access to the Wikimedia databases, which will prevent most tools and bots from working at all. Supposedly this functionality will be available at some point in the future [editor's note: planned for the first quarter of 2013] ... I don't think either organization fully realizes how much Wikipedia, the Commons, and all the other projects rely on the tools provided by the Toolserver ... [if it goes,] most of the tools and bots we take for granted will suddenly cease to function.

Carl, another developer, agreed: "labs will be useful for some projects, particularly for developing MediaWiki extensions. [But] the current plans seem to be intentionally preventing [other] Toolserver users from simply migrating their tools to Labs; the result will be a great leap backwards when/if the toolserver is taken offline."

The Signpost understands that a further sticking point is licensing: although encouraged to do so, some tool operators have not released their code under a free license, which is a requirement for using Labs (one operator has stated that he legally cannot do so, since he created the tool using his company's computer systems, so the company holds the copyright).

An earlier version of this article incorrectly asserted that access to the Wikimedia databases would occur in December 2013. It is actually planned for the first quarter of next year.

In brief

Signpost poll
Code review priorities
In my view, the WMF's code review priority should be...: reducing the disparity between volunteers and staff: 29%; widening the pool of reviewers: 18%; reducing the average/maximum wait time: 12%; other / unsure / impossible to pick: 41%.
You can now give your opinion on next week's poll: How often do you use the Toolserver?

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

At the time of writing, 14 BRFAs are active. As usual, community input is encouraged.

Discuss this story

I've added a mention of Marlen. However, she's not mentioned anywhere on the WMF's site, although does come up on a google search in association with "Toolserver". I hope this addresses your concern. Thanks. Tony (talk) 16:03, 2 October 2012 (UTC)
Tony, thank you for your response. Clarification: Marlen works for WMDE, not WMF. I am going to make that correction in the article; hope the Signpost does not mind! Sumana Harihareswara, Wikimedia Foundation Engineering Community Manager (talk) 17:44, 2 October 2012 (UTC)
(Personal comment:) The current version is still somewhat misleading, as it implies that the Toolserver had no paid staff before October 2011. See e.g. http://journal.toolserver.org/ ("River becomes the first paid toolserver admin", February 5, 2010), Wikipedia:Wikipedia_Signpost/2010-02-08/News_and_notes#Briefly or Jarry1250 April 2011 overview article "What is: the Toolserver?".
Also, a casual reader unfamiliar with the topic might take away the mistaken impression that Toolserver performance problems are an entirely new phenomenon. (As a small example from the Signpost itself, encountering Toolserver failure has long been a routine part of the Signpost's publication process.)
Regards, HaeB (talk) 21:06, 2 October 2012 (UTC)
It's not a new phenomenon, but it certainly has become more frequent in recent months. Hersfold non-admin(t/a/c) 21:58, 2 October 2012 (UTC)
My impression from the emails on the TS mailing list was that those features wouldn't be functional until December, or mid-2013 at the earliest. This still doesn't address many of the concerns expressed by toolserver users, which I believe hinged on the ability to join the replicated databases with their own user databases. Hersfold non-admin(t/a/c) 16:16, 2 October 2012 (UTC)
As Sumana said, stated goal for DB replication is Q1 of 2013, and we're looking into whether an earlier roll-out is feasible. So I'm not sure what your impression is based on. :-) Eloquence* 20:15, 2 October 2012 (UTC)
I must have mis-read something or missed an email then. Apologies to Tony and Jarry for the bad info. Hersfold non-admin(t/a/c) 21:58, 2 October 2012 (UTC)
I have corrected this in the article. Hope that's okay with you, Hersfold, and no worries, I'm sure worse mistakes have been made! :-) Ed [talk] [majestic titan] 05:28, 3 October 2012 (UTC)

In general, as I noted on toolserver-l, I agree with Carl that we should find ways to support projects like the WP 1.0 assessment DB in Labs. The feature set of the Labs DB replication isn't final, and it's likely going to be iterative.

We'll host an IRC meeting soon that we'll broadcast to toolserver-l@ as well to allow for more discussion of requirements for tool labs (the phase of the labs project dedicated to supporting tools development) and to answer questions about how folks can use Labs today. In the meantime, there are usually folks hanging out on #wikimedia-labs on irc.freenode.net as well in case you have immediate questions.--Eloquence* 20:18, 2 October 2012 (UTC)

Two comments as someone who uses some Toolserver tools heavily in dealing with spam:
  1. Whatever the outcome of all this, thank you Wikimedia Deutschland for subsidizing this great capability for the rest of us for so long!
  2. I believe the Foundation should fund and support the existing toolserver as long as necessary until Wikilabs is ready to replace it. (I'm also open to not replacing the toolserver -- whatever makes the best sense, I just want the tools)
Thanks also to all the tool developers around the world who've developed these useful tools, too.
--A. B. (talk • contribs) 20:32, 2 October 2012 (UTC)

I've written several tools that aid maintenance work on Wikipedia, most notably in identifying uncategorized articles and extensive work with disambiguation. If I lose (1) Wikipedia database replication or (2) the ability to join my user database to the replicated database, all of that work is lost. All of it. I know that maintenance work is not glamorous or interesting to most Wikipedians, but it is nevertheless important. I hope that those who are making the decisions about keeping Toolserver viable during the interim and how to set up Wikimedia Labs take into account the role Toolserver plays in maintaining Wikipedia infrastructure. --JaGatalk 22:37, 2 October 2012 (UTC)

Yep, we hear you, including on the user-DB-to-production-DB join issue. Our main concern is in coming up with an architecture that's reliable and performant, even when users do crazy things ;-). We'll post more details on the DB replication strategy in coming weeks, and as I noted above, will also organize open IRC sessions to dig more into some of the current use cases for tool developers. We'll post updates to toolserver-l.--Eloquence* 23:09, 2 October 2012 (UTC)
  • On a Toolserver related question, whatever happened to the Articles Created tool? The one where you could put in a user's name and it would list the articles (with or without redirects) that the user had created? It was really useful and I can't seem to find it anymore. SilverserenC 23:38, 2 October 2012 (UTC)

What I find a little hard to understand is why the Toolserver needs to be shut down at all. The reasoning behind creating Labs is pretty solid, but the answer to the question of why Labs and the Toolserver can't coexist is not. The key question seems to be SQL replication to the outside world. If the WMF takes it away, then there is no future for anything like the Toolserver at all. Period. An alternative vision could be that in the future, besides Labs, there could be multiple instances of independent [tool]servers working with replicated data. The current TS could be used as a prototype for this. The reasoning for independent systems would be that even when the Labs system is fully operational, it can't ever be used for everything. One limiting factor is the licence policy: one cannot use closed source in Labs. A second is that even though Labs' horsepower is considerable, it is not unlimited and suitable for everything. One can prefer to use specialized computing for one's own needs. --Zache (talk) 08:13, 3 October 2012 (UTC)

I have direct, personal experience of the utility of the Toolserver for creating content on the projects (Wikisource, in particular). Whatever the engineering considerations, I'm certainly concerned that the approach taken doesn't seem driven by free content. Does seem "more of the same" with the "cool" stuff. I.e. the cart gets put before the horse. Charles Matthews (talk) 16:30, 3 October 2012 (UTC)

  • The Toolserver is an essential tool. If it is underfunded by either WMDE or WMF, I would say those organisations have dropped the ball. This is especially true because those organisations have plenty of money, a lot of which they spend for more and more bureaucratic overhead. They should think hard about maintaining the Toolserver in an adequate way. I would like to point out that the Toolserver and the tools on it are also powerful projects that give reason to participate as donor in the donation campaign. Longbow4u (talk) 04:55, 5 October 2012 (UTC)
  • Setting aside any question of how this mess arose or how to fix it, it is clear that once Labs is able to support the tools, there will still be a significant time needed (presumably in some part by the tool owners) to migrate them from Toolserver to Labs (or perhaps some other infrastructure), to test them on the new infrastructure, verify they work at least as well as before, then bring them into general use. It seems braindead obvious that we need the Toolserver to continue functioning until this is accomplished. Killing Toolserver at the end of 2012, months in advance of even starting the port to Labs is absolutely ass-backwards. It should get the necessary support to keep it going until all the tools are moved and working on Labs. Where are the discussions on scheduling these moves? All I've seen so far are vague pronouncements about when Labs will be running, not when the tools will be. So, WMF, be prepared to have a large number of supremely upset users if this doesn't get sorted out. Do what it takes so that WMDE have the means and motivation to keep TS going until it is no longer needed. LeadSongDog come howl! 15:54, 5 October 2012 (UTC)
Can you explain why you are interpreting the statement "Toolserver will not end early next year. Period. Wikimedia Deutschland will make all necessary investments to keep the Toolserver up and running" as "the active support will end 30 December 2012"? Regards, HaeB (talk) 21:05, 5 October 2012 (UTC)
DaB (main admin of the Toolserver) said that if the Toolserver doesn't get proper support (new hardware so that it can handle the growth), then he will resign at the end of the year. Pavel answered that Wikimedia Deutschland will make all necessary investments to keep the Toolserver up and running, but it seems that this means something like replacement parts, because the Toolserver is going to be replaced by Labs. This, however, is not enough to handle the current situation of the Toolserver. --Zache (talk) 10:20, 8 October 2012 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0