The Signpost

Technology report

Wikimedia down for an hour; What is: Wikipedia Offline?

By Jarry1250 and FT2

Wikimedia wikis down for an hour

As noted in last week's "Technology Report", Wikimedia wikis underwent a scheduled downtime of one hour on Tuesday 24 May at around 13:00–14:00 UTC. The downtime means that the Foundation has already missed its previously aired targets of limiting downtime to just 5.256 minutes per annum (equivalent to 99.999% uptime) and 52.6 minutes (99.99% uptime) for this calendar year. However, the work does appear to have been successful in reducing the number of out-of-date pages served to readers, and in alleviating other similar problems.
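For reference, the "nines" figures quoted above follow directly from the length of a year. A minimal sketch of the arithmetic, assuming a 365-day year (525,600 minutes), which is the convention that yields the 5.256- and 52.6-minute targets:

```python
# Allowed downtime per year implied by an uptime percentage,
# assuming a 365-day (non-leap) year of 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime(uptime):
    """Minutes of downtime per year permitted by `uptime` (a fraction)."""
    return MINUTES_PER_YEAR * (1 - uptime)

print(f"99.999% uptime allows {allowed_downtime(0.99999):.3f} min/yr")  # ~5.256
print(f"99.99%  uptime allows {allowed_downtime(0.9999):.2f} min/yr")   # ~52.56
```

(A 365.25-day year would give marginally larger allowances; sources differ on which convention they use.)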

During the downtime, designed to allow the operations team sufficient time to "update the router software and tune the configuration", access to Wikimedia sites was intermittent. The episode and its associated issues were alluded to by cartoonist Randall Munroe in his comic strip xkcd (see also this week's "In the news" for more details). Wikimedia developers enjoyed dissecting the technical aspects of the cartoon on the wikitech-l mailing list.

What is: Wikipedia Offline?


Many Wikipedia editors can now access the Internet from multiple locations: at home, at work, even on the go with smartphones. In 2010, however, only 30% of the world had any access at all to the so-called "World Wide Web", even when the high rates of availability found in the developed world are allowed to skew the data (source: CIA World Factbook). Since the Wikimedia Foundation's aim is to "encourage the growth, development and distribution of free, multilingual content", it is clear that either the remaining 70% will have to be supplied with Internet access so they can reach the online versions of Wikimedia wikis, or the Wikimedia wikis will have to be provided in an offline-friendly format (by comparison, 50% of the world has used a computer, according to Pew Research). The "Wikipedia Offline" project, then, is a WMF initiative aimed at spreading its flagship product freely to the two billion people who use a computer but cannot access the Internet.

There are two parts to the challenge. The first is ensuring that there are Wikipedias in as many languages as possible: the proportion of users for whom a Wikipedia exists in a language they speak was recently estimated at above 98% (foundation-l mailing list); about 82% have a Wikipedia in their native tongue (also foundation-l). The second is the technical challenge of supplying the information. A current strategy of the Foundation is to continue to make the raw data of Wikipedias available via so-called "dumps", while simultaneously supporting open-source programs that can process these files. In combination, this will allow whole Wikipedias either to be downloaded when an Internet connection is available, or to be shipped on DVDs or other portable media. This runs alongside the Foundation's existing project to select the most useful articles from a given Wikipedia, hence condensing an encyclopedia onto a single CD.
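To illustrate what "processing a dump" involves, here is a minimal sketch in Python. The sample document and the `<page>`/`<title>`/`<revision>`/`<text>` element names mirror the structure of real MediaWiki XML dumps (its titles and text are invented for the example); real dumps are far larger and carry an XML namespace, so production tools stream them with an incremental parser rather than loading them whole:

```python
import xml.etree.ElementTree as ET

# A toy stand-in for a MediaWiki XML dump; real dumps share this
# <page>/<title>/<revision>/<text> structure (plus a namespace).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Malaria</title>
    <revision><text>Malaria is a mosquito-borne disease...</text></revision>
  </page>
  <page>
    <title>Cholera</title>
    <revision><text>Cholera is an infection...</text></revision>
  </page>
</mediawiki>"""

def extract_pages(xml_text):
    """Yield (title, wikitext) pairs from a dump-like XML document."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        yield page.findtext("title"), page.findtext("revision/text")

for title, text in extract_pages(SAMPLE_DUMP):
    print(title)  # prints "Malaria" then "Cholera"
```

An offline reader or collection builder would take pairs like these and render the wikitext to HTML for local storage on disc or DVD.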

While "dumps" are largely tried and tested (though recent work has focussed on improving their regularity and reliability), there have also been efforts to enable the export of smaller "collections" of articles, for example those relating to major health issues faced by developing countries. This was in part made possible by a new export format (ZIM, developed by the openZIM project) that can be read by some offline readers. However, ongoing efforts focus mainly on the second half of the strategy: the provision of a good-quality reader capable of displaying offline versions of wikis. A number of possible readers were tested, and the "Kiwix" reader was selected in late 2010; the Foundation has since devoted time to improving its user interface, including translating it into more languages. There is also competition from other readers, including "Okawix", the product of the French company Linterweb. User:Ziko blogged last week about the differences he found between the two. Which, if either, will become the standard is unclear, because it is such a fast-moving area.

See also: Wikimedia strategy document, update on Wikimedia's progress (as of March 2011).

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.


Discuss this story


"office action"? "legal request"???? Is that all the explanation we will get for the deletion of those images? --Orange Mike | Talk 19:21, 31 May 2011 (UTC)[reply]

Downtime for the maintenance was well below a full hour. Also, planned maintenance does not count against the targeted downtime, as the targeted downtime is for unscheduled downtime (this is normal). That said, there have been a number of other outages that have put us past the 99.999% target, or even the 99.99% target. --Ryan Lane (WMF) (talk) 21:43, 31 May 2011 (UTC)[reply]

Thanks for clarifying Ryan. I'd never have guessed that was standard practice. Does the Foundation have a target for limiting scheduled downtime too, do you know? - Jarry1250 [Weasel? Discuss.] 21:45, 31 May 2011 (UTC)[reply]
I don't believe we have any targets listed for this. Obviously the ops team would like to have no scheduled downtime, and we rarely do.--Ryan Lane (WMF) (talk) 22:07, 2 June 2011 (UTC)[reply]

"In 2010, however, only 30% of the world had any access at all to the so-called "World Wide Web", even when the high rates of availability found in the developed world are allowed to skew the data" — what is that supposed to mean? It's the nature of an average to take both areas with high(er) and lower availability of Internet/WWW access into account. Areas with higher-than-average penetration rates aren't "skewing the data" any more than areas with a lower-than-average penetration rate are. Perhaps a "pure" average is not a good metric here, though. -- Schneelocke (talk) 12:28, 1 June 2011 (UTC)[reply]

  • Skewness refers to the spread of data. Here, a small number of advanced economies with very high penetration rates drag up the mean, but would not have touched the median. 30% is the mean average, which is not altogether appropriate here, as (I think) you acknowledge. The "even when" was a warning that 30% is not a useful metric for some uses because it hides the fact that some countries have very high rates but most have very low rates. - Jarry1250 [Weasel? Discuss.] 13:09, 1 June 2011 (UTC)[reply]
    I read that as a statement of percentile threshold, not an average of anything. Perhaps they calculated it using a population weighted mean of availability rates by country, which would be an informative statistic, but then calling it an average would only characterize the method of derivation, not the nature of what the statistic purports to represent. The fraction of people who had access is not the same kind of thing as the mean amount of access each person had. ~ Ningauble (talk) 18:47, 1 June 2011 (UTC)[reply]
Well, it's my average: I took the sum of internet users and divided it by the size of the population. 30% of all the people in the world accessed the internet in 2010, that's the important point. - Jarry1250 [Weasel? Discuss.] 21:20, 1 June 2011 (UTC)[reply]
Yes, that is the important point. It is not what is termed an "average", or measure of central tendency. If you calculated it yourself based on statistics in the cited source, it is a little misleading to cite that as the source of your conclusion. (Sorry if this seems like nitpicking. Unclear statistical writing is a pet peeve of mine because it is so widespread, even among professional writers. I highly recommend the book linked in my previous edit summary for all journalists (and encyclopedists) who use or report statistics. Despite the tongue-in-cheek title, it is very illuminating about what ought to be common sense but is commonly done wrong.) ~ Ningauble (talk) 17:11, 2 June 2011 (UTC)[reply]
To be fair, I don't even know why I used the word "average" in the first place...! Plain old "30% of the world" is more pithy anyway :P - Jarry1250 [Weasel? Discuss.] 18:01, 2 June 2011 (UTC)[reply]

More direct citations would be more useful. To dig up *where* in the Factbook this could be found would take a bit of searching. "In 2010, however, only 30% of the world had any access at all to the so-called "World Wide Web", even when the high rates of availability found in the developed world are allowed to skew the data (source: CIA World Factbook)."Jodi.a.schneider (talk) 17:48, 23 June 2011 (UTC)[reply]

Country listing, under "World". - Jarry1250 [Weasel? Discuss.] 18:03, 23 June 2011 (UTC)[reply]




The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0