The Signpost

Opinion essay

The copyright crisis, and why we should care

By Moonriddengirl

Moonriddengirl has been a Wikipedian since the first half of 2007, becoming an administrator for the English Wikipedia later that year. In that capacity, she dedicates much of her volunteer time to dealing with copyright concerns at the English Wikipedia's copyright problems board and contributor copyright cleanup, attempting to implement Wikimedia's zero tolerance policy on copyright infringements. In addition, she works for the Wikimedia Foundation in community liaison. Below, Moonriddengirl outlines her view that all contributors need to pull together to manage copyright concerns on the English Wikipedia.

The views expressed are those of the author only. Other editors will often leave opposing views and potential corrections in the comments section. The Signpost welcomes proposals for op-eds. If you have one in mind, please leave a message at the opinion desk.



We have a copyright crisis. Wikipedia is full of copyright problems. How full, I don't know.

I do know that CorenSearchBot (before it became inoperable due to a catastrophic change in Yahoo's terms) routinely found several dozen new articles every day built on content copied from other websites. I know that every day more articles and images are tagged by human contributors for speedy deletion for copyright concerns or listed for the slower processes of the copyright problems board or possibly unfree files. I know that there are more tens of thousands of articles and images awaiting copyright review at WP:CCI than I want to tally; this is content placed by people we know have repeatedly violated copyright. Odds are good that a substantial portion of this content is a problem, too. In spite of policies prohibiting it—and in spite of prominent reminders of those policies on every edit page—more copyrighted content finds its way into our project every day.

Why it happens

People place copyrighted content on Wikipedia because they can, because it's easier to copy somebody else's words than write your own, because it's hard to resist using somebody else's picture when the only alternative is that an article has no pictures at all. Some people do it accidentally, attempting to change content but not changing it enough. Some people do it defiantly, using Wikipedia as part of their own statements against copyright laws.

Most people do it with good intentions, I believe. I've talked to hundreds of people about this over the last few years. Few of them seem to be out to deliberately cause trouble, even the ones who wind up being blocked because we can't get them to stop. The fact is that many of them just don't see the harm, and some have trouble even understanding what the issue is.

In some cultures, copyright is no big deal—even reputable sources copy without obvious concern. (No kidding: I've seen books by evidently respected academicians that have baldly copied from Wikipedia without credit and government websites that have done the same.) In a way, it's not much of a deal to the international Internet culture we all share. People paste news articles into their blogs or appropriate copyrighted cartoon characters as their avatars all the time, without a thought as to whether the content is copyrighted and what that might mean.

Why we should care

This may be why even some of the contributors who don't cause the problems and who plainly do understand the concept of copyright simply don't think about whether or not it's happening here. Blatant violations may pass right in front of them, and they don't notice. They simply don't seem keyed in to the issue. It happens everywhere, and, after all, if a copyright holder objects, all we have to do is take it down.

While technically true, this is an attitude Wikipedia can't afford. For whatever reasons people place the content, and however we ourselves may feel about copyright, keeping it is not only potentially damaging to copyright holders, it's bad for us. It's bad for our reusers; it's bad for Wikipedia; it's bad for our volunteers.

I'm not going to discuss the question of whether intellectual property laws are a good thing or a bad thing. (Although as a published writer who receives small royalty checks every year, I have a certain interest in the question.) It's a passionately debated subject, and, in my opinion, it's not necessary to go into it to settle the important point. It's a simple matter of fact that we are subject to intellectual property laws, and we need to recognize how working within that reality is in our best interests. While we have the option to swiftly address copyright concerns by simply pulling material from publication—indeed, we have a legal obligation to have a designated agent to answer takedown notices sent to us by copyright holders and their representatives—our content reusers may not have the option of responding so simply. If a video documentarian uses images that were hosted on Wikipedia under the mistaken belief that the free license label on them is accurate, he may have to recut his documentary to remove them or replace them with something else. If a publisher places some of our featured articles on animals in a textbook, she may have to pull it from distribution.

A propaganda cartoon explaining why multilicensing benefits reusers, part of the push to accommodate reusers.

This is a major problem. We like content reusers (if not all of them). We really do. We encourage them to do it—to use our material online, in books, newspapers, video documentaries; to use it and modify it whenever and however they like, so long as they follow the licensing terms. Indeed, the Wikimedia Foundation's mission is "to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally." We've made it as easy for them as we can. But how many times would a reuser encounter the trouble or expense of withdrawing problematic content before deciding to avoid our work? If the content we bill as "free" is not, we risk damage to our reputation and discouraging the global dissemination of our work.

Beyond that, I have personally observed the inconvenience and expense (at least in terms of time) to our volunteers when copyright problems created by others are encountered too late. "Too late" in this context would be after they have themselves engaged with the content. Too often, somebody creates an article or expands it with copyrighted content placed without permission of the copyright holder. Others come behind to improve the article, sometimes putting a great amount of time into polishing prose, locating sources, adding text. Their work is tainted, too. The time they've spent polishing copyrighted content is lost when that content must be removed. The hours they've put in could have been better spent building usable content or creating an article we can retain. Then there is the cost to their motivation. I've spoken multiple times to people in this situation who are heartsick and discouraged by the experience. I hate the thought that we've wasted their time, that we might lose them, because of a problem that was not promptly detected or resolved.

There's also a cost to the volunteers who create the problems in the first place. As I said, I believe most of these people are working in good faith. Those who have trouble grasping the issue may require more guidance than those who simply didn't think it mattered, but copyright problems can be corrected. If the issue is discovered early in a Wikipedian's career, we may be able to more easily clean up any outstanding issues and help them avoid creating more, enabling them to move forward as constructive and valuable contributors. If problems linger, more articles may be tainted, and the fallout may be greater in terms of both collateral damage to others and loss of the contributor themselves.

We need to care; we need to take action.

What we can do

Handout derived from "Let's get serious about plagiarism" in The Signpost

While copyright cleanup can use all the active contributors it can get, you can help with the problem simply by being conscious of the potential for it, so that you recognize copyright issues when they appear. Does an image look unlikely to be original to the uploader? Is text too polished, or disjointed in tone? Even if you don't feel that you can help with cleanup, you can tag suspicious text or images for others to evaluate. You can save reusers potential time and expense, save your fellow volunteers wasted effort, and perhaps keep a reparable contributor issue from devolving into an unsalvageable one. The simple act of identifying the problem is the first, crucial step to resolving it. Swift handling is the best service we can provide to our reusers, to the project and to our contributors (as well as, in my opinion, to the copyright holders). By recognizing the problem and resolving it when it first appears, we can keep it contained.



Discuss this story


Wikipedia is in a position of strength here. The search engines require Wikipedia a lot more than the reverse. We should consider blocking spidering of Wikipedia by Yahoo unless they allow Wikipedia, via CorenSearchBot, access to search results for copyvio purposes. It wouldn't take long for Yahoo to be distressed at such a decision. Can we use Google to check the copyvios? Google makes big bucks out of Wikipedia by immediately accessing the updates. Is there a technical issue with using Google, or some policy issue? Regards, SunCreator (talk) 01:58, 6 September 2011 (UTC)

Even the idea that Jimbo would announce to the press that Wikipedia was thinking of blocking Yahoo would send its share price down and prompt some immediate attempt from Yahoo to rectify the situation. It's not as if accessing its search results automatically is a problem; they just don't want everyone doing that. I'm sure they would make an exception for Wikipedia. Has anyone even asked Yahoo? Regards, SunCreator (talk) 02:08, 6 September 2011 (UTC)
Coren and Jimbo are in negotiation with Google with respect to this issue. MER-C 02:57, 6 September 2011 (UTC)
If Jimbo does do that, it's perfectly allowable by the nonexistent rules of capitalism. --Σ talkcontribs 03:49, 6 September 2011 (UTC)
While I kind of like the idea of us throwing our weight around, in the spirit of Christmas, let's not. extransit (talk) 05:23, 6 September 2011 (UTC)
  • I think it would be unethical to punish Yahoo for first helping us (do searches) and then deciding (for unknown reasons) that it cannot help us any longer. Why should anyone help us if we show that we will be vengeful when they stop? JRSpriggs (talk) 06:38, 6 September 2011 (UTC)
  • Quite a while later, but I came across this while deciding on my ArbCom votes and agree wholeheartedly, particularly since we would be punishing Yahoo for something which neither Google nor Bing allowed us to do. Doesn't Yahoo rely on Bing nowadays anyway (i.e. can we even block them independently)? Suggestions like ƒETCH's, proposing that we make noise about all three, are a little fairer but IMO still not likely to be effective. People are more likely to think that just because we're a non-profit doesn't mean others have to let us use their service in a manner that's normally charged for, so we'll come across as whiny complainers. Remember also that search engines work both ways. Yes, they take our resources by indexing, but they also make it easy for people to find our content. Us using a search engine to find copyvios isn't that much of a benefit to search engines except in an abstract 'it's good for us, therefore good for them' or 'good publicity' sort of way. Nil Einne (talk) 17:35, 5 December 2011 (UTC)
  • Great article. I try to apply a 'does it look too good to be true?' test to new articles and uploaded images, and this has produced good results (I've caught a largish number of copyright violations and been pleasantly surprised by content that turned out to be fine). In my experience text that looks like it came from a news story probably did. Nick-D (talk) 10:54, 6 September 2011 (UTC)
  • I have been criticized by an experienced Wikipedian for deleting material copied unaltered from a web site. I agree that there is a problem, both in the extent of copyvios and in the blasé attitude of many Wikipedians to copyvios. -- Donald Albury 11:02, 6 September 2011 (UTC)
  • There's no indication that Yahoo did this for the simple pleasure of spitting in Wikipedia's face so I don't see why we should freak out. Wikipedia's reaction was exactly what it should be: regret Yahoo's decision, try to find an agreement with Google. You don't need to act like a bully just because you have enough muscle to do so credibly. Pichpich (talk) 21:29, 6 September 2011 (UTC)
  • Given that Yahoo!'s CEO Carol Bartz has just been kicked out, a new executive might be more open to reversing the API changes. If a Google deal falls through, I think publicly embarrassing the three major English-language search engines a little might push someone to act. /ƒETCHCOMMS/ 04:17, 7 September 2011 (UTC)
  • I'd add that the copyvio problem is not limited to articles. I've uncovered a huge number of copyvios in my short stint at AfC as well. When I watch the new user log, I frequently check new userpages, and I'm quite liberal in tagging pages from obvious corporate accounts, because in my experience, even if they don't quite meet G11, they're often copyvios from somewhere. The Blade of the Northern Lights (話して下さい) 04:30, 7 September 2011 (UTC)

I just wanted to point out that back on August 30th, I proposed a change to Special:NewPages to help us deal with copyvios while CorenSearchBot was down. The thread can still be found at Wikipedia talk:New pages patrol#Proposal of additional bullet point at top of Special:NewPages (while CorenSearchBot is down). Singularity42 (talk) 20:01, 7 September 2011 (UTC)

  • As someone who deals with copyright issues in the File namespace on a regular basis, I can attest to the scope of the problem there. Wikipedia has several hundred thousand images and Commons has several million. On a daily basis, images that were just found on the internet and are clearly the work of other people are uploaded, usually by well-intentioned users, as 'own work' and given free licenses. I place a good deal of blame on the Wizard and its defaults; however, Moonriddengirl is correct that a major cause is the lack of knowledge about copyright among many people. Most troubling is that a good number of people know about the existence of copyright but have major details wrong. I often hear the statement "it's on the internet, therefore it's in the public domain". What is needed is a set of guides, written so clearly that a third grader could understand them, that we can link to as an easy way of showing people the mistakes they are making. Communication with these people is key. Sven Manguard Wha? 17:40, 9 September 2011 (UTC)
  • +1 to the idea that most people don't know the first or last thing about copyright law with regard to images. (For fun and frustration, if you have a Flickr account, go over there, look over new uploads, especially under certain CC licenses, and find copyright violations like screenshots or photos of three-dimensional public artwork in the U.S. For even more of a challenge, don't use a fish-in-the-barrel tag search like that; you'll still find some if you know what you're looking for. Then leave comments for the users who uploaded them telling them about this. Not a single one will have been aware of this; some of them will even tell you off.) Yahoo! is (in addition to its other problems) sitting on a huge litigation time bomb here; they are demonstrably negligent even without comparing them to us.

    We make this even more complicated with a fair-use policy that is more restrictive than U.S. law, so someone who thinks they're OK (and would be elsewhere) is actually not. I have found it interesting that, in surveys of how many new accounts stick around to become members of the community, virtually none of those whose first edit was to create a page outside of article namespace have done so. Hmm ... what kind of new user starts by creating a non-article page? You got it ... someone uploading an image that they thought they could use. (It would be interesting to see how many of them did, indeed, upload third-party copyrighted images that wouldn't be justified under our policies.) Daniel Case (talk) 19:59, 9 September 2011 (UTC)

  • Can we not make file-uploading a userright independent of autoconfirmed? It is harder to wrap one's head around all the fair-use, OTRS, etc. material than to understand "No copy/pasting text". --Σ talkcontribs 07:58, 12 September 2011 (UTC)

Yahoo and Google both permit automated queries (which is what Corenbot is/was). They charge for them, though; you can see those costs by following the links to the relevant terms of service mentioned here [1]. The cost wouldn't be minimal for the Foundation (Google: $5 per 1,000 queries, for up to 10,000 queries per day [2]; Yahoo: either 80 cents per 1,000, or 40 cents per 1,000 using a limited index with a slower refresh of about 3 days [3]). Multiply that by however many thousand new articles there are per day across all projects, times the number of queries per article (possibly one for each sentence?). And I'd suppose other non-profits, including university research projects, would like cost exemptions, including for copyvio searches, and have comparable claims. Novickas (talk) 01:19, 10 September 2011 (UTC)

And TinEye (relevant to Commons) is 10 times more costly than Google: $1,500 for 30,000 queries. Trycatch (talk) 17:36, 12 September 2011 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0