US National Archives enshrines Wikipedia in Open Government Plan, plans to upload all holdings to Commons

News and notes


The US National Archives and Records Administration (NARA) has committed to engaging with Wikimedia projects in its newest Open Government Plan. The biennial effort is a roadmap for how the agency will accomplish its goals in the digital age. In the first plan, issued in 2010, Archivist of the United States David Ferriero wrote "the cornerstone of the work that we do every day is the belief that citizens have the right to see, examine, and learn from the records that document the actions of their Government. But in this digital age, we have the opportunity to work and communicate more efficiently, effectively, and in completely new ways."

These "new ways" included reaching out to Wikipedia, starting in 2011 with the hiring of Dominic McDevitt-Parks as a Wikipedian in residence. The position began as a student internship, but McDevitt-Parks has since moved to being a digital content specialist with a specialty in the Wikimedia sites. Ferriero has spoken at multiple Wikimedia events, including the Wikipedia in Higher Education summit in 2011 (see Signpost coverage) and Wikimania 2012 (video; transcript; Signpost coverage). He has been frequently quoted saying varying forms of "if Wikipedia is good enough for the Archivist of the United States, maybe it should be good enough for you."

How has the Wikimedia movement benefited from NARA and McDevitt-Parks' placement? There are three organized projects dedicated to NARA. On Wikisource, NARA has an ongoing initiative to transcribe US government documents. On Commons, NARA has uploaded over 100,000 images, the most recent of which came a month ago. On the English Wikipedia, editors have written several articles related to NARA images, such as Desegregation in the United States Marine Corps. The site has also benefited from images uploaded at the request of specific users, including images of living Medal of Honor recipients such as Charles H. Coolidge and the lead images for three US battleship articles: Pennsylvania-class battleship, USS Arizona (BB-39), and South Carolina-class battleship (Editor's note: the author of this article has made significant contributions to the last three pages).

All of that is in the past, though. The Open Government Plan lays out what NARA wants to accomplish in the next two years; but as a general plan it suffers from a lack of specifics. The Signpost contacted McDevitt-Parks to learn what the inclusion of Wikipedia in this plan will mean for the site.

He told us that there is no quantitative target for a total number of image uploads, because NARA plans to upload all of its holdings to Commons. "The records we have uploaded so far contain some of the most high-value holdings (e.g. Ansel Adams, Mathew Brady, war posters)", he said. "However, we are not limiting ourselves to particular collections. Our approach has always been simply to upload as much as possible ... to make them as widely accessible to the public as possible."

To accomplish this, volunteers are working with NARA on a new upload script to port images to Commons; the work in progress is posted on GitHub. At NARA itself, an API is in development that will make it easier to extract the metadata of the images. McDevitt-Parks says that these efforts will "allow us to more easily upload all of our existing digitized holdings to Wikimedia Commons and similar third-party platforms, and also that in the future upload to platforms like Commons will be the end of all digitization. Looking at it this way, I would say that in a way all of our digitization efforts are also for upload to Wikimedia Commons."

In the meantime, the special requests process—the first pilot launched by NARA when McDevitt-Parks began his tenure—is still available for Wikipedia editors. In the future, they hope that this ad hoc arrangement can be supplemented with a volunteer citizen scanning program that will be able to "generate greater Wikipedian-initiated digitization."

What do the Vietnamese, Waray-Waray, and Swedish Wikipedias all have in common?


The Vietnamese and Philippines-based Waray-Waray Wikipedias have crossed the one-million-article Rubicon, becoming the tenth and eleventh Wikipedias to do so. Just like the Swedish Wikipedia, the sites have attained this symbolic milestone with the help of bots, a process that has divided opinions among Wikimedians from several languages. For example, for a previous Signpost article on the topic, German Wikipedian Achim Raschka pointed us to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."

In an email to the Wikimedia-l mailing list, Vietnamese Wikipedian Minh Nguyen wrote that some editors on the site shared similar concerns and were "alarmed" at the sharp uptick in bot-created articles. Yet at the same time, crossing the one million article mark with a high proportion of auto-articles led the community to look at its small size (its roughly 1,250 active editors are fewer than those of the Catalan Wikipedia, whose language has almost 60 million fewer speakers), and they are taking steps to ease the learning curve for new editors.

The question of active users is even more pertinent for fellow millionaire Waray-Waray, which has just 71 active users. The related Cebuano Wikipedia, which has also embraced bot-created articles and will soon join the million article club, has even fewer.

Meanwhile, the Swedish Wikipedia's article-creation bot has started editing again. The bot's operator, Lsj, told the Signpost that the source code has been rewritten to use the most recent references, though it is currently mostly operating on the Waray-Waray and Cebuano Wikipedias, which will soon also have one million articles. Other Wikipedias, such as Farsi (mostly spoken in Iran), have also expressed an interest in the bot's operation. Why have other Wikipedias not adopted similar processes, aside from those (like the English and German) that have philosophical objections? Lsj believes "it is mostly a matter of whether there is somebody who knows both bots and the target language well enough, and is prepared to devote the time required. Small language versions likely do not have such a person."

This article was updated after publication with information and comments from Minh Nguyen.

In brief

Commons devolving into full-blown URAA conflict: Battle lines have formed over the past few months with a split in the Commons community over the American Uruguay Round Agreements Act, which restored US copyrights on several works. While arguments over the Commons mission have gone on for years (in the Signpost in the past year alone—op-ed, reply, and forum), the URAA debate came to a head only recently with a large community request for comment. This was closed with a strong majority in favor of disallowing deletions based on URAA alone, but a subsequent discussion was rejected both on legal grounds and as being, in the words of several editors, "unclear". Recent discussions have taken place on the Wikimedia-l mailing list, but the current situation is muddled. A new proposal to break the deadlock has been started on Commons' administrators' noticeboard, but Wikimedia Foundation board member Samuel Klein has quickly suggested a major change that would modify Commons' policy to read "Keep something that is public domain in its country of origin, as long as there is reason to believe that rights-holders would want it to be used in the rest of the world."
Wiknics: US and Dutch Wikimedians are organizing Wiknics for early July.
Wikimania volunteers needed: The organizers of this year's London Wikimania have issued a call for volunteers to serve in a number of capacities during the conference week (8–10 August).
Engineering goals: The Wikimedia Foundation's Engineering and Product Development department is currently formulating its goals for the 2014–15 fiscal year. Editors are invited to contribute their views on the draft plan's talk page.
Quarterly reviews: Three quarterly reviews have been published on Meta: the editing team (formerly VisualEditor), Parsoid team, and Analytics team. Quarterly reviews aim to ensure accountability and allow senior Foundation staff to offer specific guidance to their proliferous and diverse initiatives.


Discuss this story


On 30 June 2014, Wikipedia:Wikipedia Signpost/2014-06-25/News and notes was linked from Slashdot, a high-traffic website. (Traffic)

All prior and subsequent edits to the article are noted in its revision history.

I posted a fairly lengthy response to wikimedia-l regarding the state of bot-created stubs at the Vietnamese Wikipedia. – Minh Nguyễn (talk, contribs) 08:54, 29 June 2014 (UTC)
- Hey Mxn, I've updated the article above with information from you. Thanks! Ed [talk] [majestic titan] 15:25, 1 July 2014 (UTC)
  - Thanks, The ed17. It can be misleading sometimes to compare active users across different language editions, because for instance Catalonia probably has better Internet penetration than Vietnam. But the number of active users is lower than many in the community would like: we're #25 among Wikipedias by views per hour but only #117 by editors per speaker. I think most view it as room for the community to improve – or a metric to game... – rather than as a knock against bot operators. – Minh Nguyễn (talk, contribs) 08:57, 2 July 2014 (UTC)

We've made Slashdot! Thank you all for reading this story, and I hope you enjoy it; please suggest any improvements here. Ed [talk] [majestic titan] 02:06, 30 June 2014 (UTC)

I do hope there will be a contribution towards Wikipedia to assist -- it sounds like a lot of servers, storage and bandwidth will be consumed by this project. Wikipedia is free, but the hardware most certainly isn't. 203.12.85.25 (talk) 03:55, 30 June 2014 (UTC)

This headline is dubious, to say the least. NARA isn't uploading all its holdings. It's uploading all its digital images. Big difference. 32.218.35.228 (talk) 16:23, 30 June 2014 (UTC)
- I wouldn't have wrote it in the article had Dominic not directly said "our goal is to have all of our holdings ... available on Wikimedia Commons". :-) Ed [talk] [majestic titan] 18:49, 30 June 2014 (UTC)
  - The corollary here is that another of our big goals is that "one day, all of our records will be online." These are big goals when we are talking about more than a billion records, but the point was intentional that we are not limiting ourselves to previously digitized material. Our goal is that Wikimedia Commons should be an end point of the digitization workflow for all future digitizations, too. Dominic·t 21:45, 30 June 2014 (UTC)

This is utterly absurd. NARA has billions and trillions of records. There is no way on earth that it is going to upload them all to Commons. The last estimate I saw of how long it would take NARA to digitize its holdings was > 1000 years. (See: [1], [2].) There is obviously some misunderstanding or miscommunication here. This probably just refers to NARA's extant digital images, not to all its holdings. 32.218.35.228 (talk) 22:19, 30 June 2014 (UTC)

Who's to say this pace doesn't pick up at some point? Anyway, to clarify again, this is about their entire holdings. Ed [talk] [majestic titan]

Credulous, aren't you? The pace would have to pick up 1000% in order for the 1800 year estimate in the NY Times article to be reduced to 18 years. What has changed at NARA in the past 5-6 years to account for such an overwhelming change in pace? Nothing. Not one single thing. (Or perhaps the Koch brothers died and left all their wealth to NARA, and I missed the news.) You can believe whatever you like, but neither a flat earth nor the assertion that NARA will upload all its holdings makes the least bit of sense. 32.218.35.228 (talk) 00:30, 1 July 2014 (UTC)

Just because a mission goal is big and audacious doesn't mean it is absurd. I think Wikipedians, who work for the mission of "a world in which every single human being can freely share in the sum of all knowledge" will naturally understand this. The "billions and trillions of records" is made-up number, overstating the holdings (which may be several billion at most) by orders of magnitude. But, in any case, it's true that NARA has a lot of stuff yet to digitize before it can be uploaded. It is also true that it has millions of records already digitized which we aim to put on Wikimedia Commons, and that in itself is no small task. Dominic·t 00:59, 1 July 2014 (UTC)

Press coverage: TechCrunch, Techlicious, Gigjets, InTheCapital, Smithsonian Magazine, and Fedscoop. Ed [talk] [majestic titan] 18:23, 4 July 2014 (UTC)

Tangential: Financial Express (India). Ed [talk] [majestic titan] 02:11, 27 July 2014 (UTC)
