The Signpost

News and notes

The next big step for Wikidata—forming a hub for researchers

By Jan eissfeldt and Tony1
Wikidata accepts the Open Data Publisher Award at the ODI Awards in London, 4 November 2014. Second from the right is team member Magnus Manske, a researcher in high-throughput sequencing and data visualisation. Manske wrote the software that preceded MediaWiki and, more recently, has written some 100 tools on WMFlabs, many of them designed to facilitate contributions to Wikidata and to broaden the ways in which it is used.

Wikidata, Wikimedia's free linked database that supplies Wikipedia and its sister projects, is gearing up to submit a grant application to the EU that would expand Wikidata's scope by developing it as a science hub. The proposal, supported by more than 25 volunteers and half a dozen European institutions as project partners, aims to create a virtual research environment (VRE) that will enhance the project's capacity for freely sharing scientific data.

The goal is to overcome the insular approach of conventional, feature-complete research environments by building on Wikidata's existing community and its established role in sharing scientific data. Rather than creating another secure, self-contained and often discipline-specific platform, the proposal aims to extend the open, collaborative functions Wikidata already provides, enabling new forms of research and public interaction among both professional and citizen scientists.

The Wikidata meets archeology symposium in Berlin, March 2013
Wikidata's development teams in June 2014
A Wikidata diagram in English explaining the terminology of a Wikidata statement, in this case for "Douglas Adams" (Q42).

Wikimedia projects have a long track record of interaction with the research community, for example through Gene Wiki, which has been creating content on human genes since 2008. A blog post on Gene Wiki's recent efforts to create Wikidata items for all human genes, followed by the publication of the underlying proposal, is what triggered the drafting of the present proposal.

This was closely followed by an announcement from Google that its collaborative knowledge base, Freebase, would be decommissioned in early 2015, and that Google looked forward to the integration of Freebase into Wikidata, which is currently under discussion.

These developments came at the end of a year of solid progress for Wikidata, including improved usability. Over the course of 2014, its community became the fourth-most-active editing community among Wikimedia projects, after the English Wikipedia, Commons, and the German Wikipedia. Externally, the project gained further recognition by winning the Open Data Publisher Award 2014, presented by Tim Berners-Lee, the inventor of the World Wide Web, and Nigel Shadbolt.

The community page about the proposal places it in a broader perspective:

The proposal is being prepared under the guidance of Wikimedia's long-serving volunteer for open science cooperation, biophysicist Daniel Mietchen. He works at the Museum of Natural History in Berlin, the city where Wikidata's development team is based; the museum will act as the project's institutional coordinator. More than 10 institutions have signalled their interest in joining the endeavour as associate partners, in addition to the volunteers, Wikimedia Germany (which has been primarily responsible for developing Wikidata), and the five other European partner institutions. Research from one of these institutions, the Open University of Catalonia, is also featured in this issue's Recent research.

Mietchen told the Signpost that the budget, between €1m and €2m, would be invested primarily in building technical infrastructure, improving Wikidata's two-way connections with external data sources (including their ontologies), and training scientists and interested members of the editing community. Success in EU funding rounds is notoriously difficult to achieve; if approved, this would be by far the largest competitive external grant for a Wikimedia project. Either way, the proposal represents a significant conceptual and methodological advance: because it is being drafted under a CC-BY license, other proposers and funders are free to engage with and build on it.

Even before the current proposal was conceived, the Wikidata community had been exploring ways of improving ties to scientific communities in several respects. Back in 2012, the German community held a joint workshop with scholars from universities that included Cambridge, Stanford and Oxford to investigate the usability of the project for research (Signpost coverage). In 2013, Wikidata's WikiProject Chemistry discussed ways to collaborate with PubChem, one of the largest chemical databases.

Mietchen says that the team would welcome community members to participate in the drafting, to review the proposal text, or to help shape the advisory board and the network of associate partners. The EU's application deadline is 14 January, so timely contributions are particularly helpful. Upcoming steps toward submission are listed on the project page. Beyond that deadline, the proposal is intended to spark follow-up proposals with a disciplinary or regional focus, and to serve as a seeding ground for the newly created WikiProject Wikidata for research, which will develop procedures for coordinating future activities between the research and Wikidata communities.

Discuss this story

  • Wow! Harej (talk) 03:06, 3 January 2015 (UTC)
  • No one can see the future, but my impression of this project is that if it achieves moderate success then it could be the start of the most consequential societal contribution that Wikimedia projects make.
My reading of this is that it is talking about the Semantic Web. It discusses making technical information accessible and readable to people outside of any given field. A lot of this proposal presumes open access to writings and more importantly to the data behind research, which is an excellent demand made more persuasive by hosting a Wikimedia community who can work with the data when and if it is made free, and who will note when data is and is not being made available.
This project has the potential to bring a lot of professionalism to Wikimedia projects because it provides a data sharing environment on neutral, non-profit grounds when no obvious alternative exists. I can imagine subject matter experts using Wikidata to host their datasets here, even if only to mirror them here, and if that gets anyone to interact with them even one time in a period of years then the cost of mirroring here is so low that it could become standard for people to do this.
I see this project as the start of making Wikimedia projects more personally customized, in the sense that it seeks to make available highly specific information in a narrow range which could be generated by calling only certain parts of multiple datasets which are of interest to the individual who requests them, but which are not otherwise published. For lots of reasons, and especially the layman access to databases, this project seems like a radical intervention to me. This is why Wikidata was introduced. Blue Rasberry (talk) 16:57, 5 January 2015 (UTC)
I think you've summarized my sentiments rather aptly, Blue Rasberry; these are very heartening developments, though I think it will be years yet before people truly begin to appreciate how monumental the impacts could be. I don't think it's in the least overblown to say that using Wikimedia's framework for collaboration could prove to be an absolute revolution in many areas of research, so to have my attention brought back to Wikidata in the light of these developments has me quite excited. Snow talk 05:54, 9 January 2015 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0