The next big step for Wikidata: forming a hub for researchers
Wikidata, Wikimedia's free linked database that supplies Wikipedia and its sister projects, is gearing up to submit a grant application to the EU that would expand Wikidata's scope by developing it as a science hub. The proposal, supported by more than 25 volunteers and half a dozen European institutions as project partners, aims to create a virtual research environment (VRE) that will enhance the project's capacity for freely sharing scientific data.
The goal is to overcome the insular approach of conventional, feature-complete environments by building on Wikidata's existing community and role in sharing scientific data. Instead of secure, self-contained and often discipline-specific platforms, the push is designed to enhance the open collaborative functionalities that Wikidata already provides to enable new forms of research and public interaction among both professional and citizen scientists.
Wikimedia projects have a long track record of interaction with the research community, for example through Gene Wiki, which has been creating content on human genes since 2008. A blog post on the Gene Wiki's recent efforts to create Wikidata items for all human genes, followed by the publication of the underlying proposal, is what triggered the drafting of the present proposal.
This was closely followed by an announcement from Google that its collaborative knowledge base, Freebase, will be decommissioned in early 2015, and that the company looks forward to the integration of Freebase into Wikidata, which is currently under discussion.
These developments came at the end of a solid year of progress for Wikidata, including increased usability. Over the course of 2014, its community became the fourth-most-active editing community among Wikimedia projects, after the English Wikipedia, Commons, and the German Wikipedia. Externally, the project gained additional recognition by winning the Open Data Publisher Award 2014, presented by Tim Berners-Lee, the inventor of the World Wide Web, and Nigel Shadbolt.
This proposal is significant because no other open collaborative project (or "virtual research environment", in EU parlance) can connect the world's free databases across disciplinary and linguistic boundaries. With the inclusion of Freebase into Wikidata in 2015, the project will be capable of providing a unique open service that will, for the first time, allow both citizens and professional scientists from any research or language community to integrate their databases into an open global structure; to publicly annotate, verify, criticise and improve the quality of available data; to define its limits; to contribute to the evolution of its ontology; and to make all this available to everyone, without any restrictions on use and reuse.
The proposal is being prepared under the guidance of Wikimedia's long-serving volunteer for open science cooperation, biophysicist Daniel Mietchen. He works at the Museum of Natural History in Berlin, the city where Wikidata's development team is located; the Museum will act as the institutional coordinator of the project. More than 10 institutions have signalled their interest in joining the endeavour as associate partners, in addition to the volunteers, Wikimedia Germany (which has been primarily responsible for developing Wikidata), and the other five European partner institutions. Research from one of them, the Open University of Catalonia, is also featured in this issue's Recent research.
Mietchen told the Signpost that the budget, between €1m and €2m, would be invested primarily in building technical infrastructure, improving Wikidata's two-way connections with external data sources (including their ontologies), and training scientists and interested members of the editing community. Succeeding in EU rounds is notoriously difficult; if approved, this would be by far the biggest competitive external grant for a Wikimedia project. Either way, the proposal will be a significant conceptual and methodological advance; because it is drafted under a CC-BY license, it is available to other proposers or funders to engage with and build on.
Even before the current proposal was conceived, the Wikidata community had been exploring ways of improving ties to scientific communities in several respects. Back in 2012, the German community held a joint workshop with scholars from universities that included Cambridge, Stanford and Oxford to investigate the usability of the project for research (Signpost coverage). In 2013, Wikidata's WikiProject Chemistry discussed ways to collaborate with PubChem, one of the largest chemical databases.
Mietchen says that the team would welcome community members to participate in the drafting, to review the proposal text, or to help shape the advisory board and network of associate partners. The EU's application deadline is 14 January, so timely contributions are particularly helpful. Upcoming steps on the path to the finishing line are listed on the project page. Beyond that deadline, the proposal is intended to spark follow-ups with a disciplinary or regional focus, and to serve as a seeding ground for the newly created WikiProject Wikidata for research as it develops procedures for coordinating future activities between the research and Wikidata communities.