Wikidata Graph Split and how we address major challenges

Technology report

Wikidata Graph Split and how we address major challenges

Disclosure: I have a conflict of interest to favor all technology which I describe below, as I develop it as Wikimedian in Residence at the University of Virginia School of Data Science.

TL;DR summary – Wikidata has had a crisis since 2015, and in hindsight I wish we had talked about it sooner. More generally, I think that our Wikimedia Movement has a systemic problem of failing to identify and address our challenges. Comment below if you recognize missteps here in other Wikimedia systems.

If we had a problem, then would we talk about it?

About 1/3 of Wikidata items have always been metadata for scholarly articles from the WikiCite project, and now this is split from the main Wikidata graph.

The Linked Open Data cloud shows how open datasets link to other datasets. Since at least 2007 Wikimedia has been the most reused data resource. Consequently, any research institution which indexes its scholarly metadata in Wikidata is much more visible.

On 20 January 2026, the Wikimedia Foundation finalized the split of Wikidata into two collections of data, or "graphs". This Wikidata Graph Split affects the hundreds of regular contributors and thousands of regular tool users in the WikiCite community, who see value in curating a Wikimedia citation database. Since 2015, WikiCite's popularity exceeded the limits of Wikidata, or broke Wikidata, and consequently Wikidata has turned away new users, institutional partnerships, financial investments, and major content contribution projects due to our infrastructure lacking capacity to accept the contemporary standard of small data upload projects. All of us Wikipedia editors understand technical limitations throughout the Wikimedia projects, and to me Wikipedia's commitment to free and open-source software is endearing.

But in the case of Wikidata's limits, the problematic part was that since 2015, we tolerated uncertainty about if and when Wikidata's capacity would increase. We turned away users and projects for 10 years, and failed to signal a crisis and emergency. While I can understand Wikimedia governance planning fixes on a schedule in the context of our scarce resources, I want confidence that we have shared understanding of our challenges, and to reduce long-term uncertainty about if and when our tools will function as expected. If we had a major problem with a Wikimedia platform, then do we have the community infrastructure to talk about it?

My feeling is that our Wikidata challenge was not technical, but rather was about interpersonal relationships. For the future, I want confidence and trust that when we Wikimedia editors have major challenges, then we have a community governance system to recognize and discuss them. Look here with me at the circumstances which have slowed Wikidata growth for some years, and be hopeful with me about the success plan to fix things by summer 2027 when the Wikimedia Foundation will migrate Wikidata's backend to a new SPARQL engine.

Why anyone should care about WikiCite or Scholia

Scholia is a scholarly profiling service using Wikidata and affected by the split. Findings from this 2025 user survey included that users are enthusiastic to browse scientific research through Scholia as a Wikimedia research service.

Wikimedia annual plans all prioritize investing in the recruitment of more Wikipedia users. At the same time, we have gone many years without discussing Wikidata's limits as a major barrier to growth.

Scholia profiles for people visualize their scholarly publications, topics of works, co-authors, software use.

WikiCite is important for the Wikimedia community because it has been among the most popular Wikidata projects in terms of user count, content produced, investment attracted, university partnerships, active discussions, count of non-editor users, and stirring of passion. Universities are in the business of doing research, but lack an easy way to list their own researchers and own research publications. Only some universities can afford subscriptions to scholarly profiling services such as Web of Science or Scopus, but the WikiCite community seeks to provide this for free, to everyone, by using Wikidata to match citation metadata to researchers, institutions, and topics. The WikiCite project attracts contributors because it is easy to imagine a Wikipedia-aligned scholarly profiling service becoming fundamental to global research infrastructure.

WikiCite is the project to curate scholarly metadata in Wikidata. It includes the editing project, the community of editors and conferences, and outreach efforts through which institutions contribute their data, such as the WikiProject Program for Cooperative Cataloging project which recruited 50 universities to index their research in Wikidata. There are a handful of projects in the Wikimedia Movement which have 100s of editors and a portfolio of institutional partnerships. Although there are multiple reasons why editors come to WikiCite, a unique connection that the project has is that universities index their faculty and research publications in Wikidata both for Wikimedia community curation, and also because that indexing is a good investment as it surfaces the university's research output as linked open data in all other Internet services and AI which index research.

Scholia is a friendly web interface for accessing WikiCite collections. It is friendly in the sense that it has more than 400 scholarly queries already formatted, for example, list of a researcher's publications, list of people and research at a university, or profile of research on a topic. This sort of service is "scholarly profiling", and to sort this data, one needs the "scholarly graph of metadata" as Linked Open Data connecting topics to scholarly articles to authors to their institutions, co-authors, software, datasets, grants, and everything else. Scholia and WikiCite are the Wikimedia projects for scholarly profiling, and alternatives to services including Google Scholar, Web of Science, or OpenAlex. I am part of the Scholia team, and I am biased to favor it, but I think the WikiCite approach to connecting Wikimedia projects to a global scholarly database is one of the best and most popular project ideas that the Wikimedia Movement has developed. The WikiCite community includes a base of power users who also find value in this approach, as communicated in our 2025 survey of Scholia.

Exceeding the limits of Wikidata

In May 2024, The Signpost shared my story that "Wikidata would soon split as the sheer volume of information overloads the infrastructure". Disclosure, again: I am a Wikimedian in Residence who develops Wikidata content as a university researcher, so please note that I have an employer conflict of interest in this op-ed and in Wikidata's perpetual growth.

The split divided WikiCite content, which was 1/3 of the content of Wikidata, from everything else in Wikidata. The Wikimedia Foundation and Wikimedia community actually did discuss this, a lot. I really appreciate the Wikimedia Foundation staff who did many favors for me to give me many meetings monthly since 2024 by video, email, at conferences, and through referrals. Copied from the 2024 Signpost article, here again are the major discussion reports. The insight to gain from these reports is long term recognition of a major challenge, when all the while Wikidata is at reduced growth with no planned year in which we would increase capacity. No one did anything incorrectly, and delaying the decision always made sense at the time.

I see parts of the Wikimedia Movement that invest heavily in growing the editor community, and other parts of the Wikimedia community where I feel that technical challenges are incompatible with editor recruitment. In my view, Wikidata has been closed and in limbo for 10 years, but no community group ever organized to make a leadership statement of when Wikidata might update, and how we should make multi-year plans. There were thousands of hours of user time spent talking about the problem. We were unable to establish a governance plan to evaluate the cost of delay versus the scheduling of a decision. The worst part of this to me was that each year, there was the misunderstanding that someone was about to fix the problem, and that Wikidata service would expand. If this is a one-off in the Wikimedia Movement, then that might be tolerable, but I expect that if we had more robust community governance, then we might have a public ranked list of Wikimedia greatest challenges, and some estimate of the costs of decisions to address those challenges or delay.

Wikidata Graph Split

The Wikidata Query Service Split and its Impact on the Scholarly Graph (Q137374886) is documentation for institutions which need an explanation of the split.

Wikimedia servers use Grafana to track resource use. Here, the Wikidata Query Service has normal usage in November 2025 – January 2026.

Now that scholarly content is split into its own graph, it is hard to access. Use which was too high to manage has dropped to perhaps not at all in November 2025 – January 2026.

I am lacking insight, but now that Wikidata is split into two graphs, I am unaware of the existence of individual or institutional users of the scholarly graph which was supposed to be a solution to sustain Wikimedia community access to this content.

To clarify, Wikidata has two familiar parts: Wikibase, where users edit Wikidata; and Blazegraph, which hosts the query service. Wikibase is the data-oriented variation of MediaWiki; it is what most people think of when they are familiar with Wikidata, as it is the wiki for editing data. Wikidata's Wikibase is not split. The other part of Wikidata is its query engine, and that is split.

One of the splits is the Wikidata Query Service, now minus scholarly articles after the split.

https://query.wikidata.org/

After the graph split, now there is the scholarly graph, which is an endpoint containing only citation metadata.

https://query-scholarly.wikidata.org

This is jumping ahead a bit, but the Scholia team found the scholarly graph unusable, and migrated the full graph to a Qlever query engine. Anyone wanting to query a single graph can do so at

https://qlever.scholia.wiki/

While WikiCite is a major Wikidata project, Wikidata is such a large platform that most Wikidata users do not curate citations, and will not notice the Wikidata Graph Split. For those who do want citation data through the Wikidata Query Service, then the Wikimedia platform solution is that they have to write a two-part query in which they seek some data from the Wikidata main graph, then get citation data from the Wikidata scholarly graph. In practice, this is too difficult. If there is a user community for the Wikimedia hosted scholarly split graph, then I have not yet seen their projects, and please someone link to them in the comments section of this article.

The Scholia team hosts virtual hackathons where anyone can put issues or problems in queue for the volunteer developer team to address in the next round. The April, November, and December events from 2025 all have documentation on what volunteers had to organize to prepare for the January 2026 graph split. There is a list of affected tools, some of which have updates. The Scholia team created Wikidata Query Service graph split documentation to describe how anyone should respond to the Wikidata graph split. This is both extraordinary that volunteers put these events and labor together, but also common across Wikimedia projects that volunteers organize responses and adaptations to keep tools functional in response to Wikimedia Foundation platform changes.

Blazegraph migration

Scholia 2026 Compliance with SPARQL 1.1 (Q138233208) reports that Scholia is updated to prefer standard-compliant SPARQL 1.1 in the Qlever SPARQL engine in favor of the older-versioned and customized Wikidata SPARQL for Blazegraph

Wikidata was established in 2012 as the linked data complement to Wikipedia's prose, and was part of our strategy to keep Wikimedia projects technologically advanced. The software backend of Wikidata is the scrappy Blazegraph, which is free and open-source software. At the time of Wikidata adopting it, it already had its own independence, development team, and funding to sustain it. While no one can buy or close open-source software, companies can hire every developer and expert on the software. Amazon acquired the Blazegraph team soon after Wikidata had committed to Blazegraph as its SPARQL engine for queries. Amazon Neptune is based on Blazegraph open software, but proprietary software. Consequently, Wikidata's SPARQL engine backend has not had a significant update since Wikidata established its SPARQL endpoint in 2015.

While the Wikidata graph split relieves the Wikimedia Foundation servers of the intense computation required of a larger dataset, the graph split is not intended as a solution, but just a way to delay the crash by 2 years, assuming that we also keep restrictions on data imports and deterring expected use. Blazegraph is now abandoned technology and inferior to alternatives. The planned solution to ready Wikidata for next generation editing is to migrate Wikidata's SPARQL engine to another database by summer 2027.

In September 2025, the Wikimedia Foundation announced a schedule for a Wikidata Query Service backend update. It is good news for Wikidata editors that there is a newly appointed Wikidata Platform WMF staff team doing these changes. Everyone should support them and wish them all success. They are available to meet during scheduled office hours.

Another major change which is timely now is that when Wikidata migrates to a new SPARQL engine, we could update to standard SPARQL 1.1. The Wikidata Query Service has been using a customized, older version of SPARQL only for Wikidata. The Wikidata version of SPARQL is easy to use especially for managing multiple languages, but using customized SPARQL also has drawbacks. One drawback is that if we migrate to another system, then either we need to redesign the customization, or require that every single Wikidata tool and query be updated to standard SPARQL. The previously mentioned list of tools affected by the graph split may be small in comparison to the changes needed if we migrate to standard SPARQL.

We in the Scholia team migrated to an option which uses standard SPARQL by modifying about 400 queries.

Selection of next-generation SPARQL engine

Benchmarking SPARQL Engines on Wikidata Queries (Q137374978) reports Wikimedia community-supported testing of various Blazegraph replacements

WDQS Triple Store Evaluation: Benchmark Results Report (Q138235408) reports the Wikidata Platform team's testing of Blazegraph replacements

There is an exciting competition happening right now to decide the next SPARQL engine for Wikidata. The Wikimedia Foundation has selected two candidates: Qlever and Virtuoso. If all goes well, we should have a revived Wikidata by mid 2027 with greatly expanded capability for processing data and inviting institutional partnerships. Both of these options have 10–100× the capacity of Blazegraph, and are viable alternatives. Other candidates have already been disqualified after earlier testing.

The Scholia team has already made a commitment to Qlever. To avoid federated queries, there is a single Wikidata graph containing everything at https://qlever.scholia.wiki/ , and hosted by the Qlever team at the University of Freiburg. Virtuoso is a great candidate also and both should be tested; I am just sharing how things turned out.

Wikimedian Peter F. Patel-Schneider has been benchmarking various engines with 7 different competition benchmarking query sets, each of which is a large dataset designed to stress the systems with queries. In mid-February 2026 the Wikidata Platform team posted their WDQS Triple Store Evaluation using 3 of the simpler of those 7 datasets, and published their own benchmarking results. Communication between the Wikimedia communities and the new Wikidata Platform team is starting and ongoing. Wikimedia Switzerland has been supporting Wikimedia community engagement in the transition process, including by sponsoring research in this report and by hosting WikiCite 2025.

How we talk about challenges

The solution that I want for the graph split, and for many other existing Wikimedia Movement challenges, is simply to be able to see that there is some group of Wikimedians somewhere who have active communication about our challenges. I want to get public communication from leadership who acknowledges challenges and who has the social standing to publicly discuss possible solutions. I want to see that someone is piloting the ship upon which we all sail, and which no one would replace if it ever failed and sunk. For lots of issues at the intersection of technical development and social controversy – data management, software development, response to AI, adapting to changes in political technology regulation – I would like to see Wikimedia user leadership in development, and instead I get anxious for all the communication disfluency that we experience. Ten thousand of us or so participated in the 2018–2020 Wikimedia Movement Strategy, which had the goal of improving our governance infrastructure such that if we ever had a major problem, then we would quickly identify it and discuss it without fear. The Wikidata Graph Split is not the story here. The story here is that so much in the Wikimedia Movement is fragile, and that when we have major challenges then networks like WikiCite are unable to create chains of decision making to address them.

I appreciate all the effort that Wikimedia Foundation staff put into collaborating with the WikiCite community for the transition. The Wikimedia community is extraordinary for community participation in all levels of governance. The challenges we have are normal for Internet tech platform development anywhere, and is the way that user communities experience software updates.

What you can do

Happy Valentine's Day, everyone love one another

Participate in on-wiki conversations to make decisions.

If you want to talk with the Wikimedia Platform team, then there are migration office hours
Wikidata is currently having its boldest discussion on notability criteria. Is WikiCite in scope? What about locations in OpenStreetMap? Should we graph split biographies? Can we do WikiCite, but for Internet Archive holdings instead of scholarly publications? Is it finally time to import all proteins and all astronomical objects?
Also comment on mass editing policy, and other Wikidata requests for comment
The Wikimedia Foundation and Wikimedia Deutschland agreed off-wiki that even after migration from Blazegraph, the split graphs will not be rejoined, even if the new platform has capacity. There is no public discussion about this, and I want one. Please comment below or message me privately if you want to help arrange some public discussion for this.
The Wikimedia Foundation operates Wikidata's API, and Wikimedia Deutschland operates everything else Wikidata. They share power and money with each other. I do not know anyone in authority for Wikidata issues at either place, but right now is Valentine's Day time of year and I think they could be better pals. If anyone can, get interviews with representatives from both and get them to say publicly that each one wants the other to perpetually have all the power and control and money that they currently do. If either objects, then get them to talk it through.
Please sign to support meta:WikiCite (3), which is a proposal to establish WikiCite the citation database as an official Wikimedia project

← Previous "Technology report"

Next "Technology report" →

In this issue

17 February 2026 (all comments)

+ Add a comment

Discuss this story

These comments are automatically transcluded from this article's talk page. To follow comments, add the page to your watchlist. If your comment has not appeared here, you can try purging the cache.

The sad thing is that if you look at the financial information for the Wikimedia Foundation, you'll see that the foundation has plenty of money to throw at problems. Specifically, between net assets of the organization, and money in a separate endowment fund, there is at least $300 million for such things as increasing data storage and processing capacity, things that can and should be done well before a crisis arrives. -- John Broughton (♫♫) 18:00, 17 February 2026 (UTC)[reply]

The thing that everyone should know about Wikidata and Blazegraph is that Amazon acqui-hired everyone at the Blazegraph nonprofit organization, so it has not had a major update since 2015. Wikidata has been in trouble since that time in 2015. I think this was a major failure of vision on part of WMF. Nothing was stopping WMF from taking over the project. They could have hired people to work specificly on it. Open source is not just a place to get software without paying for it. You're expected to contribute back to make it meet your needs. We have hundreds of people working on mediawiki. Blazegraph was, as far as i can tell based on github stats, developed by basically just two people. WMF could have hired some folks to replace Systap. Bawolff (talk) 18:24, 17 February 2026 (UTC)[reply]

I mean, the WMF did initially have two people (Stas and Nik) assigned to working on WDQS and by implication, Blazegraph, it's just that the MediaWiki Core team (which, disclosure, I was on) was destroyed in the infamous engineering reorg that moved WDQS under the new "Discovery" (aka Search) department that would end up being the center of the Knowledge Engine fiasco...

Anyways, I do think this is a bit of an incomplete retelling of the story. The first candidate for WDQS was a homegrown PHP/MySQL system based on DBAL that was possibly fully written but never got close to deployment. Second candidate was a database named "Titan", and they too were acqui-hired basically the week (if not the day) the decision was being finalized!!! Blazegraph was the third choice, and when the situation repeated itself, at that point the decision was to keep moving forward despite it being an dead upstream because that had already effectively reset the project twice. I'm not sure that was the wrong decision at the time.

The last thing I'll say is that if you read old project pages like Wikibase/Indexing, you'll see that the initial MVP case was for WikiGrok, which was killed before WDQS even launched. Very much agreed that it was a failure of vision of what Wikidata could and would become. Legoktm (talk) 02:49, 19 February 2026 (UTC)[reply]

I always had the impression that stas and Nik were more doing devops related to WDQS, but were definitely not taking over general development of Blazegraph (stas has 15 commits to blazegraph, i'm not sure if Nik has any). Which is what i think would have been needed to be done back in the day if we were serious about long term viability. Even if we do end up switching to qlever, i think the same applies - wmf should hire people to work specificly on it if we want to ensure long term viability of wikidata. Qlever may not have been aqui-hired (yet!), but as far as i understand its essentially a project of one professor who gets students to help (looking at github stats it looks like there are about 5 reasonably active contributors right now). That is a small bus factor, and academia isn't exactly known for being a conductive environment for long term development of software. WMF should put resources into it to ensure its long term viable. We should not assume others will just always do the work for us. Bawolff (talk) 05:44, 19 February 2026 (UTC)[reply]

Both of these options have 10–100× the capacity of Blazegraph. I think an allegedly deserves to be added here. Blazegraph, according to marketing material, also supports significantly higher capacity then we are at currently. Its easy to claim theoretical high capacity when nobody puts it to the test. Bawolff (talk) 18:29, 17 February 2026 (UTC)[reply]

@Bawolff: I have a report as mentioned above - Benchmarking SPARQL Engines on Wikidata Queries (Q137374978). I could be communicating incorrectly, but as I understand, Peter believes this is the capacity. Qlever at least can store the current Wikidata graph with no problem, unlike Blazegraph. If we have the capacity, then I am in favor of merging the split graphs to restore the full graph, but I think WMF and WMDE are skeptical of including citation data projects at all. — Preceding unsigned comment added by Bluerasberry (talk • contribs) 23:31, 17 February 2026 (UTC)[reply]

Re benchmarking - that is an important fitst step, but it doesn't seem to be testing under load (many people querying at the same time) or with data churn (the db being updated as people edit wikidata). I'm far from an expert on this, but reading through the design of qlever, it seems like its the sort of design where rapid data changes may affect performance, much more so than blazegraph is. Bawolff (talk) 03:45, 18 February 2026 (UTC)[reply]

As an addendum, it looks like i was reading an on wiki summary originally, and not the actual paper which contains more details. While its a very promising sign that qlever queries are faster, the experimental setup is still pretty artificial (e.g. how warm caches are, lack of other concurrent traffic, lack of data updates). More importantly i don't really think this paper evaluates scalability. For that you would need to model how things are going to change over time under realistic load, not to mention the different axis of scalaility as its not just query time. (its understandable that they didnt do that, it would be an absolutely huge undertaking). The paper is still very good work and a positive sign for qlever, but i think its an overstatement to say that it shows qlever has 10-100x the capacity or that it even quantified the capacity limits of qlever at all. Like even if you ignore potential differences between reality and the experimental setup and other forms of scalability other than query time, the paper still doesn't really say much on the question since it only evaluated the products at one graph size. You can't really draw any conclusion about what the underlying scalabiliy is from collecting only a single data point. Bawolff (talk) 22:54, 18 February 2026 (UTC)[reply]

@Bawolff: Great questions. I do not have the answers, but I have more information. I see two "axes of scalability" that you mention: 1) What is capacity the database? 2) When at capacity, can a database also handle Wikidata's huge traffic?

Here are comparable Qlever instances

9B triples - https://query.wikidata.org/ ⇒ this is Blazegraph, the official Wikidata query service
17B triples - https://qlever.dev/wikidata ⇒ this is Qlever holding the unsplit Wikidata graph, including all WikiCite content
11B triples - https://qlever.dev/wikimedia-commons ⇒ commons:Commons:Structured data
206B triples - https://qlever.dev/osm-planet ⇒ OpenStreetMap
239B triples - https://qlever.dev/uniprot ⇒ UniProt

UniProt, as I understand, gets 1000s of requests a day, compared to Wikidata getting 10s of 1000s a day.

Disclaimer - I know nothing whatsoever about any of this and just repeat what people tell me - but as I understand, these numbers are evidence that Qlever has more capacity to query triples than Blazegraph, at least at low traffic, and that the traffic reports we do have demonstrate that it can handle somewhat high traffic, although we do not have a demonstration that it can handle 10x triples at Wikidata's high level of traffic.

I think we can get expert opinions about how to do more benchmarking, with all of Wikidata and artificial simulated Wikidata data to make a big database, and then test high traffic, but I think this has not been done yet.

I really am not sure how to match Wikimedia community participation and governance with technical decisions in this space, except to say that as an editor I feel the pain of the indecision and lack of conversation which has halted my projects and also a lot of Wikidata outreach. Bluerasberry (talk) 18:11, 19 February 2026 (UTC)[reply]

I think its hard. Rigiourus testing would be a very labour intensive venture, and to a certain extent there is a question of, why does it matter. Qlever is after all the most promising option, we might as well just do it and see what happens. No offense to virtuoso but its not like the competitors are really great. Its kind of the only option, so what's the point of testing it to the nth degree - it will either work or it won't and if it doesn't maybe improvements can be made.

There are many axes of scalability since so many things can affect it. For example how does RAM affect performance? You can buy machines with plenty of ram. AWS rents machines with 32 terabytes of it, so there are (expensive) off the shelf solutions with 250x the ram of current WDQS. Different products might behave very differently depending on how much of the dataset fits in ram (which also implies tracking how internal data structure size increases with graph size could be a major contributing factor. I suspect qlever is more efficient than blazegraph in this regard). I get the sense for WDQS, one of the big scalability concerns is setup time of a fresh db, as it is long and the blazegraph software is not exactly rock-solid stable. Qlever is a major improvement on this front. Bawolff (talk) 18:57, 19 February 2026 (UTC)[reply]

Important subject. If Wikidata is to become truly useful, then something needs to be done about those limits. For most major applications or types of items, it needs a far larger state of completion via mass-data-imports and these seem currently only feasible if sth is done about those technical limitations. Potential applications include Scholia charts about studies about a subject or by an author (maybe at some point altmetrics scores can also be queried; note a main issue is that only a fraction of studies have WD items), books data, documentary films, software, ingredients (eg see 1, 2), products, companies, and so on.
Think especially of where people use databases in their daily lives – isn't that where we'd like Wikidata to come in? I use the linked calorie tracker and get data from OpenFoodFacts (WD not involved), check movie data on imdb, use CodeCheck to scan products for prevalent harmful chemicals, search sorted studies on ScienceOpen, etc. Also worth noting is that one can now build SPARQL queries using natural language so it's become easier to query this vast dataset.
Scaling isn't the only difficulty; it also needs people to actually do such imports (including Anna's Archive metadata about books I think) as well as the mostly archived/stale bot requests and it probably needs some way to lock items or most properties thereof because it's not feasible to watch millions of items (alternatively better patrolling tools). It does seem like genuine innovations and out of the box thinking is needed to solve this problem of the technical limits. --Prototyperspective (talk) 01:14, 18 February 2026 (UTC)[reply]

Using Anna's Archive metadata is a non-starter because they just scraped it without permission. See [1]. If we want to convince Worldcat to what parts of their metadata they're willing to share, we should probably just ask nicely. I rather doubt they'd give us everything, but what they do give would be usable and not the kind of stuff deleted due to a court order after we've already built on top of it. SnowFire (talk) 19:54, 18 February 2026 (UTC)[reply]

We should probably go the legit route, but it also seems unlikely the anna's archive metadata is actually problematic. Databases cannot be copyrighted in the usa, and the other claims seem to be about the method of obtaining the data not the data itself. Bawolff (talk) 22:19, 18 February 2026 (UTC)[reply]

That order is not just about metadata but also other data they scraped and the process of the scraping, not the data itself. I don't think one can copyright factual metadata like how many pages a book has. Things like descriptions can't be imported. Another idea would be a separate WMF-unaffiliated site that uses Wikidata but does extend it with such data – however that would not only be far from ideal but also not needed I think. This approach could nevertheless be considered in general even without copyright concerns.
Basically, the main thing I'm saying is that it's good to think more about people's real-world potential uses and then trace back from there so to say what would need to be done for WD to be the platform where people get their data from. It needs real use for something to be worth spending time, effort and resources on. For example, charts based on citation counts or paper topics are very limited in usefulness (misleading) if there's only 10% of papers in the database that the chart shows (currently without any indication that it's incomplete btw). The problem there is not just technical limitations but also that it needs bulk data imports.
So in regards to your concern, ideas would be a) launching some thorough investigation to which extent which of its data can't be made accessible on Wikidata legally when users contribute it b) considering alternatives such as more dynamic loading of external data, a separate platform as a layer on top of Wikidata and c) looking for alternatives (OpenLibrary etc) that would however be far less complete etc. One could also work on enabling some other applications (practical/actual uses) first and wait for legal things around books metadata to be clarified. This is just what seems to be the barrier when it comes to books data. Prototyperspective (talk) 22:35, 18 February 2026 (UTC)[reply]

Databases are a famous gray area - see Database right. While any one fact might be non-copyrightable, the collection may well be, which is why (pre-Internet) you couldn't just reprint other people's almanacs or atlases or guidebooks of sports statistics. So if the WMF undertook to rebuild Worldcat on its own and put together a Worldcat clone without just scraping Worldcat for everything, then yes, that's legit, even if the result was suspiciously similar to Worldcat. Anyway I'm not remotely in tune with what the WMF "partners" division is up to, but given that the WMF already works with outright for-profit corporations like Google, I have to hope that relations with OCLC would be more cordial. If we think this data is useful to clone, it'll be many times easier and more accurate if Worldcat just gives us some subset. If the WMF says that the OCLC told us to pound sand and we think this data is a big deal, we can discuss re-inventing the DB then, IMO. SnowFire (talk) 05:20, 19 February 2026 (UTC)[reply]

I don't think the article you linked supports your position. I'd agree its grey internationally, but it doesn't seem grey in USA. Bawolff (talk) 05:24, 19 February 2026 (UTC)[reply]

I think paying attention to limits and thinking about what content should be in Wikidata is important. Instead of having one big data base I prefer federated databases. So for example for books or scientific articles having an own database with the possibility to refer to a specific book in a Wikidata statement through linking it. Most of my edits in Wikidata came through adding descriptions to items about scientific articles using mass upload using QuickStatements. So in the past I worked a lot on content rarely used in Wikipedia language versions. Regarding the query service I support trying QLever and lets see if it works. I asked about available budget for development of the chosen query service backend. To make sure what happened with Blazegraph can be avoided this time. It is from my point of view important more people pay attention to it and ask for example questions in the Annual Plan 2026 discussions regarding this topic.--Hogü-456 (talk) 22:34, 20 February 2026 (UTC)[reply]

@Hogü-456: Apparently, federating Wikibase costs tens of millions of dollars and is a long-term commitment. Federating enables limitless growth and also allows other institutions to connect content without importing it into Wikidata, but for our current Wikidata community, we do not have that level of sophistication. Also, the same barriers which have blocked Wikidata's growth since 2015 have also been blocking the growth of the Wikibase community, and federating makes much more sense if we have a robust Wikibase community that has many requested uses for federation, which right now requests do not exist. As I understand, some very devoted institutional partners to Wikibase have already left to use other platforms. Getting more data into or connected to Wikidata, whether through federation or through using a bigger database, should increase its usefulness and increase demand for it.

As for available budget, the desired strategic plan is to invest about $0, and instead choose free and open source software that other institutions will sponsor and fund, so that we get benefits of sharing. With Blazegraph, it was a surprise and unpredictable that Amazon would close its development and make it commercial. Any open source project can be closed with enough money. Right now, Qlever is developed by a team at University of Freiburg. I have no idea how much money it takes to convince a human to fork and close their software, but for as long as the team is supporting openness, it is easier to get European Union institutions to sponsor it.

I feel very vulnerable about all of this. Regardless of whether there will be an investment in federation, definitely that decision will happen after the migration from Blazegraph to Qlever or Virtuoso. Bluerasberry (talk) 23:02, 20 February 2026 (UTC)[reply]

Hi all, to clarify scope: the Blazegraph migration does not include re-merging the scholarly and main graphs and there are no plans to do so as part of this work; this is now explicitly documented in the WDQS project scope and the graph split FAQ, which we’ve updated to reflect this. Based on requests from people in the WikiCite community, WMDE has committed to supporting the community in clarifying its decision making processes. That is a basic requirement to be able to make a decision about how the WikiCite community wants to move forward given the existing constraints. That will happen later this year. We're also hosting regular Blazegraph migration office hours (next on Tuesday) for anyone who’d like to discuss what is in scope for the Blazegraph migration. Udehb-WMF (talk) 16:08, 2 March 2026 (UTC)[reply]

@Udehb-WMF: Thanks, I asked to be contacted when WMDE organizes that community support conversation, so that I can rally the WikiCite community to participate. - meta:Talk:Wikimedia Deutschland/Plan 2026. Bluerasberry (talk) 00:04, 4 March 2026 (UTC)[reply]

Explore Wikipedia history by browsing The Signpost archives.

Home

About