The Signpost

Technology report

Forks, upload slowness and mobile redirection

By Jarry1250, Jorgenev

Making Wikimedia more forkable

The question of how easy it is to "fork" Wikimedia wikis, or, indeed, merely to mirror their content on another site, was posed this week on the wikitech-l mailing list by Wikimedian David Gerard. The concept is also related to that of backups, since a Wikipedia fork could provide a useful restore point if Wikimedia's servers were affected by a simultaneous technical failure, such as one caused by a potent hacking attempt.

During the discussion, Lead Software Architect Brion Vibber suggested that the Wikimedia software setup could be easily recreated, as could page content. Instead, he said, the major challenge would lie in "being able to move data around between different sites (merging changes, distributing new articles)", potentially allowing users of other sites to feed back improvements to articles whilst also receiving updates from Wikimedia users. So far, at least one site (http://wikipedia.wp.pl/) has been successful in maintaining a live copy of Wikimedia wikis, lagging behind the parent wiki it mirrors by only minutes. No site has yet implemented an automated procedure for pushing edits made by its users upstream to its parent wiki, however. Other contributors suggested that few external sites would have the capacity to host their own copy of images, or to keep in line with Wikimedia's strict policy on attribution.
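A live mirror of the kind described above can, in principle, poll the MediaWiki API's recentchanges list and re-fetch whatever pages have changed since its last pass. The following is a minimal sketch, not the wp.pl site's actual method; the endpoint, function names and polling scheme are illustrative assumptions:

```python
import json
import urllib.parse
import urllib.request

# Parent wiki to mirror (assumption: the standard api.php endpoint).
API = "https://en.wikipedia.org/w/api.php"

def recent_changes_url(since, limit=50):
    """Build an API query for pages changed since `since` (ISO 8601 timestamp)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,
        "rcdir": "newer",        # oldest first, so a mirror can replay edits in order
        "rclimit": str(limit),
        "rcprop": "title|timestamp",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def changed_titles(since):
    """Fetch the titles a mirror would need to re-fetch and overwrite locally."""
    with urllib.request.urlopen(recent_changes_url(since)) as resp:
        data = json.load(resp)
    return [rc["title"] for rc in data["query"]["recentchanges"]]

if __name__ == "__main__":
    # A mirror lagging by only minutes would run this in a loop,
    # advancing `since` to the newest timestamp seen.
    print(changed_titles("2011-08-16T00:00:00Z"))
```

Note that this only pulls changes downstream; the harder problem Vibber identifies, pushing local edits back upstream and merging them, has no equivalent one-liner.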

In unrelated news, there were also discussions about making pageview statistics more accessible to operators of tools and apps (also wikitech-l). In particular, the current reliance on the external site http://stats.grok.se to collate data was noted. As MZMcBride wrote, "currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to [the site]".
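MZMcBride's point can be made concrete: with one HTTP request per article per month, a full crawl of the English Wikipedia means millions of round trips. A sketch of the URL generation such a client would need (the /json/ URL pattern reflects how stats.grok.se exposed monthly counts at the time, and should be treated as an assumption):

```python
import urllib.parse

# Per-article, per-month JSON endpoint (assumed URL pattern).
BASE = "http://stats.grok.se/json/en"

def pageview_urls(titles, month="201108"):
    """One URL, and hence one HTTP request, per article: the scaling
    problem described above. Covering every English Wikipedia article
    would make this list ~3.7 million entries long."""
    return [f"{BASE}/{month}/{urllib.parse.quote(title)}" for title in titles]

urls = pageview_urls(["Albert Einstein", "Wikipedia"])
```

A bulk export or queryable datastore on the Wikimedia side would replace this per-article loop with a single download, which is what the discussion was pushing toward.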

Uploading was slower than it used to be, but that's fixed, says bugmeister

Early data seemed to show a dramatic fall in upload speed earlier this year.

Although hampered by a lack of data points, anecdotal evidence collected over the past fortnight pointed to a slowdown in the speed of uploading files to Wikimedia wikis. The slowdown made mass API uploading very difficult, and, as a result, a bug report was opened. "An upload that should take minutes is taking hours", wrote one commenter. Another pinpointed Wikimedia servers as the bottleneck: during a test, uploads to the Internet Archive had been over ten times quicker. As it became clear that the problem was affecting a large number of users, and as data seemed to show a dramatic decrease in upload speeds earlier this year, significant resources were devoted to the issue. WMF technicians Chad Horohoe, Roan Kattouw, Sam Reed, Rob Lanphier and Asher Feldman have all worked on the problem.

Once the upload chain was determined to be "User → Europe caching server → US caching server → Application server (Apache) → Network File System → Wikimedia server MS7", members of the operations team worked to profile where the bottleneck was occurring. Unfortunately, an error introduced by the profiling meant that uploads were in fact blocked for several minutes. Then, on 12/13 August, the problem was pinpointed and fixed: a module for helping optimise network connections, Generic Receive Offload (GRO), had in fact been slowing them down. According to WMF bugmeister Mark Hershberger, smaller data packets were being collated into much larger ones. The new packets were then too large to be handled effectively by other parts of the network infrastructure. Although there are still some reports of slowness, test performance has increased by a factor of at least three. In the future, more data on upload speed is likely to be collected to provide a benchmark against which efficiency can be tested.
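GRO of the sort blamed here is a per-interface kernel feature that administrators can inspect and toggle with ethtool (disabling it is `ethtool -K <iface> gro off`, run as root). A hedged sketch of checking the setting from Python; the interface name is an assumption, and this is not the Wikimedia operations team's actual tooling:

```python
import subprocess

def gro_enabled(ethtool_output):
    """Parse `ethtool -k <iface>` output and report whether GRO is on."""
    for line in ethtool_output.splitlines():
        if line.strip().startswith("generic-receive-offload:"):
            return line.split(":", 1)[1].strip().startswith("on")
    return False

def check_interface(iface="eth0"):
    """Run ethtool for a (hypothetical) interface and parse its output."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True).stdout
    return gro_enabled(out)

if __name__ == "__main__":
    print("GRO on:", check_interface())
```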

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.

How you can help
Spread News of Job Vacancies

This week, the Foundation's Rob Lanphier reiterated that the Foundation is having problems hiring a new Data Analysis engineer and a software developer. Know someone who might be interested? Link them to the details.


Discuss this story

Section moved to Talk:List_of_Michelin_starred_restaurants_in_the_Netherlands#Rendering_time_etc.

Not sure if you meant Network File System (protocol), as opposed to the generic term. The link Network File System is a disambiguation page. The protocol is more likely if the apache servers were running Linux for example. W Nowicki (talk) 19:27, 16 August 2011 (UTC)

Forks

I must note that I didn't intend it as an urgent call to action - rather, as something we need to keep in mind, and which will only benefit us. I am quite cognisant that the likely number of forks of English Wikipedia is zero ... but every one of the steps needed to make our projects forkable is actually (a) a good idea technically (b) important to preserving our work.

(I'd also like to make us forkable so that we can tell our more special critics "here, fork it, if you're right you'll do so much better than us." At the least, watching them come up with new excuses not to will be amusing.) - David Gerard (talk) 20:41, 16 August 2011 (UTC)

Upload speed

I upload lots of own-photograph images to Commons, and I've noticed the upload speed dropping — most of my uploading is done on a major university campus with huge bandwidth, but I've still noticed over the summer that the upload speed is markedly slower than it was over Christmas break. I'd vaguely wondered if snow on the ground made image sizes smaller and if blue sky made them larger, but I'd not really considered server problems. Nyttend (talk) 23:29, 19 August 2011 (UTC)

Chinese Wikipedia Numbers

With reference to the fact that the "page view" numbers on the Chinese Wikipedia have trebled over the last few months, I didn't quite understand what has happened. Did the bug mean that the site was getting far fewer hits from search engines and bots (i.e. the huge increase is not real people but automated programs visiting the wiki), or did the bug mean that the site is now getting far more human viewers through search engines which were previously not displaying these pages as results in response to queries? If it is the latter, as it seems (whoever found this bug has essentially made the Chinese Wikipedia three times more popular!), then that seems like a huge deal, and whoever found it should probably get an award or something (or at least a big round of applause!) 86.66.128.117 (talk) 20:01, 20 August 2011 (UTC)

It does seem (from the linked explanation) that it's the latter. I agree that would be a big deal, worthy of more than a bullet point in the "In brief" section. --Avenue (talk) 22:49, 22 August 2011 (UTC)




The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0