The Signpost

Op-Ed

Anti-vandalism with masked IPs: the steps forward

By Johan Jönsson
Johan Jönsson works for the WMF with the technical development of the Wikimedia wikis and Community Relations.

The Wikimedia wikis can be edited by registered and unregistered users alike. When someone isn’t logged in to an account, instead of their user name, the history – and the recent changes feed, your watchlist and so on – will show their IP address. This is mainly for attribution: when you write on the Wikimedia wikis, the copyright still belongs to you. You just give permission for the text to be spread and changed. So we need to attribute authorship to someone: a name, a pseudonym or at least an IP address. But knowing the IP behind an edit is also a tool we use to fight the edits we don’t want to see: vandalism and harassment, spam, and those that push a specific point of view at the cost of neutrality.

Roughly a year ago, a team within the Wikimedia Foundation’s Product department started a process on IP masking – hiding the IP addresses we today show in public. Our goal was roughly to try to address all the problems we knew it was going to bring, and hopefully be able to do it with no more work for vandal fighters than before we started. Recently the Wikimedia Foundation’s Legal department clarified their guidance: for legal reasons – which they can’t explain in detail due to legal privilege, the legal professional rules that control what lawyers can say about their work – this is something we have to do. We’re flexible on the how and the when, but not on the if. Thus that’s the reality we must deal with and the situation we are publicizing to the communities, as soon as we can.

There are other reasons for bringing up the subject, of course. The longer I work on the project, the stranger I personally find it that we publicly publish IPs – which I used to find completely natural, not least since I mainly contributed without being logged in for years in the earlier days of Wikipedia – of people who are trying to help make the wiki better. As a movement, we’ve had occasional debates on whether publishing the IPs really is what we should be doing for about as long as we’ve been doing it. But these are reasons for starting a conversation. Our legal experts telling us that this is something that has to be done is reason to do it.

I think one main communications issue is that we’ve tried to let the Wikimedia contributors in as early as possible and it’s not apparent to everyone where we are in the process. OK, we say, so we have to do this: Please let us know your fears and issues and everything you want us to take into account. This is something we need to solve with the wikis and vandal fighters, so that we can mitigate as much as possible. We try to ask questions as early as possible instead of doing internal planning based on our assumptions. The Wikimedia wikis have very different cultures and needs. They don’t see the same patterns around problems like undisclosed paid editing, harassment and returning vandals. The fact that I’m intimately familiar with this work on one wiki doesn’t mean there aren’t many things we need to learn from the communities, and no single wiki is a good model for all. What works for you or me will not work everywhere else.

We try to take the conversation that normally happens in Phabricator – open, but not easily accessible for most Wikimedians – and put it on the wiki. This means that we’re a couple of steps earlier in the process than people expect us to be. Some see that we plan to mask IPs, try to figure out how this is going to work and come away with the impression oh no, they have no idea what they’re doing. They have no plan. We do have a plan. It’s just that collecting information from the communities before we plan solutions is part of it. There’s time to work this out together. We’re not throwing the switch next week. Whether we know what we’re doing remains to be seen, of course, and I’m not the one to judge.

How do we plan to mitigate problems? Partly by giving more people access to the information that we’ll now be hiding from the public. We’ve been toying with the idea of a system with three tiers. First, we’d either build a new user right or maybe even just make access to the information opt-in, as long as the user meets certain criteria. Second, others could have access to part of the IP, to be able to see which range it belongs to. The threshold for access to the first user right would be lower than adminship on many wikis, since access still needs to be provided to admins on Wikimedia wikis with less stringent criteria, such as five or so users saying sure, why not, this new person seems serious and sincere. Third, the public and those with no interest in the tasks where this information is relevant would see a masked IP. Those who are involved in cross-wiki vandal fighting would need global access. We don’t intend to break the system by putting this on the checkusers and stewards. The details need to be hashed out with the communities.
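As a rough illustration of the three tiers described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tier names, the `display_ip` function, the `Masked-0001` label, and the choice of /24 (IPv4) and /64 (IPv6) as the visible range are hypothetical, not something that has been decided.

```python
import ipaddress
from enum import Enum


class Tier(Enum):
    FULL = "full"        # hypothetical: holders of the new right, or opted-in users
    PARTIAL = "partial"  # hypothetical: users who may see only the range
    PUBLIC = "public"    # everyone else sees only the mask


def display_ip(ip: str, mask_id: str, tier: Tier) -> str:
    """Return what a viewer in the given tier would see for one edit."""
    addr = ipaddress.ip_address(ip)
    if tier is Tier.FULL:
        return str(addr)
    if tier is Tier.PARTIAL:
        # Assumed prefix lengths; the real cut-off is an open question.
        prefix = 24 if addr.version == 4 else 64
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        return str(network)
    return mask_id


for tier in Tier:
    print(tier.value, "->", display_ip("192.0.2.57", "Masked-0001", tier))
# full -> 192.0.2.57
# partial -> 192.0.2.0/24
# public -> Masked-0001
```

The point of the sketch is only that the same edit can be rendered three ways depending on who is looking; who belongs in which tier is exactly what still needs to be hashed out.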

Partly we’re aiming to solve it by building new tools. We’re trying to make the checkusers’ and stewards’ lives easier by updating the checkuser tool and working on a tool to find potential undetected sockpuppets. We’re working on surfacing the information about what the IP address means in a way that’ll be accessible to more vandal fighters than used to be the case. We want to hear more needs and suggestions.

So we talk to people. In various places and languages, to figure out how it would affect them. It varies: a significant number of English Wikipedia vandal fighters have expressed concern on Meta, while Swedish Wikipedia hasn't, when explicitly asked. The Arabic Wikipedia discussion did not raise the same problems as the Chinese one.

Why do IP masking at all, some ask. Why not disable IP editing instead? We’re investing significant time and resources in trying to solve this because we’re convinced that turning off unregistered editing would severely harm the wikis. Benjamin Mako Hill has collected research on the subject. Another researcher told us that if we turn IP editing off, we’ll doom the wikis to a slow death: not because of the content added by IP edits, but because of the increased threshold to start editing. We can’t do it without harming long-term recruitment. The role unregistered editing plays also varies a lot from wiki to wiki. Compare English and Japanese Wikipedia, for example. The latter wiki has a far higher percentage of IP edits, yet the revert rate for IP edits is about a third of what it is on English Wikipedia: 9.5% compared to 27.4%, defined as reverted within 48 hours. And some smaller wikis might suffer greatly even in the shorter term.

And that’s the heart of the problem: There is no available strategy without risk. Legal risk. Risk of vandalism. Risk of hurting long-term editor recruitment. So we hope to be able to work together, listen to suggestions and problems, and build around potential obstacles and mitigate concerns. Give the communities the tools they need.


Discuss this story


How might this work with the current problem of IP leaking the identity of logged in users who are blocked to other users on the same IP? All the best: Rich Farmbrough 19:58, 1 November 2020 (UTC).[reply]

I'm not sure this in itself would change anything at all. We'll have to look into it. Thank you for raising the question. /Johan (WMF) (talk) 22:22, 1 November 2020 (UTC)[reply]

So where is the substantial improvement in anti-abuse tools you promised when you announced this unwanted project? Oh wait, you haven't deployed anything. MER-C 20:00, 1 November 2020 (UTC)[reply]

The most recent tool is the new version of the checkuser tool, Special:Investigate, which was deployed to the last remaining wikis – including English Wikipedia – in October, although it still requires significant fixes.
But to be clear: we're also far from implementing masking, and there's more time for tool development before that happens. This update is because the Wikimedia Foundation Legal department clarified that the status quo couldn't remain, which we had previously considered a potential outcome, and we wanted to let the communities know that as soon as possible. /Johan (WMF) (talk) 22:22, 1 November 2020 (UTC)[reply]
That is about 2% of the work you need to do to mitigate this when complete. Try harder. MER-C 13:43, 2 November 2020 (UTC)[reply]

I normally support WMF decisions, but a lack of transparency on why this must take place, and insisting "not if, but how" it will take place, is reminiscent of the so heavily opposed renaming efforts, also forced upon the community as something that must happen in some form or another. ɱ (talk) 20:13, 1 November 2020 (UTC)[reply]

@: I can assure you that there is at least one very good and concrete reason why WMF Legal is insisting that we mask IP addresses and this reason also prevents the WMF from discussing it publicly. I know that sounds like an Orwellian ultimatum, but that's the unfortunate reality of the legal situation. This is not analogous to the renaming effort, as it is a legal requirement, not something the WMF actually wants to do. Ryan Kaldari (WMF) (talk) 17:27, 4 November 2020 (UTC)[reply]
Ryan Kaldari (WMF), that is hard to believe when no one will answer the simple question "What law is it that requires that?". Legal codes are already publicly available, so it's not like you'd be revealing confidential information just by saying "1 USC Section 42 requires that." I'm not aware of any laws that make it illegal to display an IP address, but of course the lawyers may know something we don't. Why all the secrecy? They may be prohibited by professional ethics from talking about the advice they give, but the people who receive the advice are not similarly restricted. Seraphimblade Talk to me 23:43, 6 November 2020 (UTC)[reply]
I trust legal's advice on this and thus support that it is required. Doc James (talk · contribs · email) 20:15, 7 November 2020 (UTC)[reply]

"we publicly publish IPs [...] of people"

The longer I work on the project, the stranger I personally find it that we publicly publish IPs [...] of people who are trying to help make the wiki better. - this framing, which seems to have been the main motivation for initiating the entire effort before the sudden recent discovery of the legal requirements, is questionable to say the least. It casts these "people" as helpless victims whose IP address is forcibly exposed by the decision of others. But "IP editors" are not an immutable protected group. They are just contributors like everyone else, who have made a different choice after hitting the edit button - namely to have their contribution attributed to their IP address rather than an (easily created) account.

Now, I agree that an editor's IP address can be very sensitive (I have long advocated this view myself, e.g. as a main author of the German Wikipedia's checkuser guidelines, which are more restrictive than those of many other projects out of such concerns). But the reality is that many editors rationally decide that this is not the case for them personally.

Also unacknowledged in the rhetoric about this project is that contributing under IP can often even be the more privacy-preserving choice: The information that can be derived from a dynamic IP is frequently much less revealing than what can be concluded from a logged-in user's aggregate edits (I compiled a few examples in this Wikimania talk a good while ago).

Regards, HaeB (talk) 20:16, 1 November 2020 (UTC)[reply]

As stated above, I'm not talking about that legal message (which seems to be that WMF is not allowed to give editors these two choices even if they wanted to), but about a quite distinct rationale which was a focus when the project was initiated by the Product department last year.
And I totally agree about the potential value of the planned improvements to the checkuser tool or the various efforts to provide automated sockpuppet detection. But these are entirely separate - they wouldn't be tied to the masking effort and could in fact have been implemented years ago.
The bottom line remains that there is an inescapable tradeoff between information integrity and (perceived) privacy benefits here. Of course, now that it turns out that the existing practice is illegal, we need to fix it. But especially at this time where fighting misinformation is on many people's minds (including the Foundation's), we should not pretend that it won't have a negative impact on that work.
Regards, HaeB (talk) 20:57, 1 November 2020 (UTC)[reply]
HaeB: You're probably aware of this, but just to be clear so people reading this don't get the wrong impression about why this is happening: whereas I personally believe there are also privacy benefits, and that this has often been lost in the conversation, and not only costs and problems that we have to solve, at the end of the day, my opinion is immaterial. We're preparing to move forward on this (not now, not this month, not next month: when we've had time to prepare and develop tools) because Wikimedia Foundation Legal department recently clarified that the status quo remaining is not an option, and not because of any other argument I could make. /Johan (WMF) (talk) 22:22, 1 November 2020 (UTC)[reply]
Yes, Johan, I think Bri and myself had already mentioned this about three times above. It is not disputed that we need to change the status quo now that we have learned that it has been illegal or at least too legally risky since 2002, or perhaps only since some more recent legal developments.
Still, in your op-ed you chose to advance that "immaterial" separate non-legal argument (that we are wronging "people who are trying to help make the wiki better" by offering them the option to have their edits attributed to their IP address instead of an account). So it remains worthwhile explaining how you arrived at that view.
It is appreciated that this change is not being implemented in a rush, and that a serious effort has been made (e.g. in Claudia's report) to understand how editors currently use this IP information to deal with abuse. But unless I overlooked something in the documentation on Meta, no comparable research has been conducted about the perspectives of the objects of your concern, namely the editors who choose the IP attribution option for their edits. There is an assumption (also spelled out explicitly on the main Meta-wiki page) that they are usually not intellectually capable of really understanding the anon edit warning displayed after one clicks "edit", and are therefore unable to make an informed decision about this. But a serious assessment of the privacy vs. information integrity tradeoffs would involve estimating how often this is really the case, and for what reasons. This could also have pointed to alternative solutions, like making that warning easier to understand or perhaps even more legally pertinent. As mentioned above, editing without logging into an account can actually often (although of course not always) be the more privacy-preserving choice. And anecdotally, many IP editors appear to be experienced regulars rather than naive newbies.
Regards, HaeB (talk) 23:51, 1 November 2020 (UTC)[reply]
I did, and that might have been too personal, but I should live up to it nevertheless.
(For those skimming and not having read the entire conversation above: The argument below is not why the Foundation is moving forward with this, which is based entirely on legal requirements. This is personal notes on a topic I mentioned as reasons to start a conversation, rather than making a decision.)
For context, I was almost exclusively an IP editor for my first four years of Wikimedia editing; after a little while I had an account with very few edits to its name, but it took me years to get into the habit of logging in; I'm not here to disparage our intellectual capabilities. There are people who continue being IP editors fully or at least partially aware of what this means. I would rather say that we and almost every other website have effectively taught users that whenever you post something, there's information being thrown at you that you need to disregard. Banner blindness is a real thing. Then, even if you read it, you need to understand what an IP address is, which many don't. Then, you need to understand the implications of this, which even fewer do. How can this be used against you? Not doing this is not about lacking the intellectual capability: it's about not having to spend significant time and effort understanding the technical background just so you can make a small fix to a text online. It's the sensible choice, just like the decision to not read through the end-user license agreement is just a sane way of living one's life.
With that said, yes, this is largely based on assumptions (informed partly by having spent a lot of time talking to people who described making one or ten edits), and if I wanted to make an anonymous edit for some reason, broadcasting my IP would sure be more efficient than using my normal non-WMF account. I also think it's a weakness that IP users are not really part of the conversations around this.
Legal did consider making the warning clearer, or unavoidable, as part of their investigation, but this was rejected as an avenue forward. /Johan (WMF) (talk) 02:52, 5 November 2020 (UTC)[reply]

 Question: "[F]or legal reasons – which they can’t explain in detail due to legal privilege, the legal professional rules that control what lawyers can say about their work – this is something we have to do" @Johan (WMF): I understand that there may be reasons to keep things private, but this is a very peculiar assertion. If this is a case of legal privilege, who are the parties? Surely the WMF is the client? Mo Billings (talk) 23:58, 1 November 2020 (UTC)[reply]

I think the answer to your question is in the first paragraph of Wikipedia:Wikipedia_Signpost/2020-11-01/News and notes#Mandatory IP masking. Yes, as best I can tell, WMF's counsel has told WMF this is required. ☆ Bri (talk) 00:25, 2 November 2020 (UTC)[reply]
meta:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation#Statement_from_the_Wikimedia_Foundation_Legal_department explains a *tiny bit* more about what Legal is thinking. The reason for the secrecy is likely that Legal's advice to the WMF on this matter would be considered work product. Having your legal department publish a brief saying "we think we might be in violation of/could be sued under Law X in Country Y" is generally considered a Bad Idea. While I'd definitely like to hear more about Legal's concerns so that we as a community can better design and evaluate mitigations, that's unfortunately how the courts work. --AntiCompositeNumber (talk) 00:34, 2 November 2020 (UTC)[reply]
@Johan (WMF) and Mo Billings: - yes, stating they couldn't explain because of legal privilege was a bit odd. It can be waived by whoever the client is (in this case the WMF itself). If they (that is, Legal) can't release it because the WMF (as an organisation) refuses to waive it that is an important clarification. That (in)action could be warranted, but the specific reason should be given. Nosebagbear (talk) 00:39, 2 November 2020 (UTC)[reply]
Thanks, AntiCompositeNumber, that link was quite helpful. Still, I would be more comfortable with this if it wasn't phrased in such terms. If the WMF is the client then putting this in terms of "legal privilege" seems like a fig leaf to hide the fact that the WMF doesn't want to talk about the reasons for this change. I would rather be told that this is being done to reduce future legal exposure (without knowing the details) than be asked to go on trust. The re-branding project and the proposed board changes have recently weakened my level of trust in the WMF. Mo Billings (talk) 03:55, 2 November 2020 (UTC)[reply]
@Johan (WMF): I would appreciate a clarification from you or WMF legal on the privilege question. Thanks. Mo Billings (talk) 17:33, 4 November 2020 (UTC)[reply]
Mo Billings, I've pointed Legal to this – I'm not the right person to handle the legal questions, I'm afraid, coming to this from the product side. /Johan (WMF) (talk) 17:51, 4 November 2020 (UTC)[reply]
What's to stop a hash of the IP being used, at a minimum? Adam Cuerden (talk)Has about 7.6% of all FPs 02:54, 2 November 2020 (UTC)[reply]
A hash would not be good enough to track an IP vandal hopping within

 Question: I also am not reassured that a magical tool will be sufficient to track long-term abuse. Will that 'wand' allow us to distinguish the following known pattern of disparate IP usage? 'Griefer451' has access to computers at home, at work and sometimes at the library. They have a 'fairly' distinct style allied with a grievous resentment towards WP, resulting in both numerous defacements at intervals together with an impression that this 'editor' is somehow familiar even though a number of IPs are used at dissimilar periods of day and also migrating over weeks. How are we ever to shut down this vandal? If we can't notice that the IPs related by vandalism are clustered? That even when (home) IPs change they are actually from the same pool? This is not theoretical, but actual long-term patterns.

Further, how are we to ever notice school kiddy vandalism? Will there be a magic flag added to the tokenized identity that says this is a middle school educational pool so we can apply the dunce cap?

The legal team say they have determined an unassailable legal stance for WP? Have they determined whether it is workable? I would challenge the WMF thusly. Have every member of the legal team spend one or two hours a day following IP edits around WP, fixing the obvious vandalisms and reverting the graffitos, for at least a month. Oh, and track back in time _all_ the edits those IPs have left lying around for months. First, the lawyers will *love* the billables. Second, WMF will gain a new respect for the amount of time that IP inadvertencies soaks up, while rueing the cost of reality-based research. I feel that legal opinions are not information sufficient to proceed, but must be reconciled with our day-to-day realities. Moreover, I feel, anyone not having spent hours and hours fixing IP vandalism is not qualified to appreciate the difficulties already existing. Don't make it impossible. Shenme (talk) 04:49, 2 November 2020 (UTC)[reply]

It also encourages vandalism. Now, if someone vandalises from the UK Parliament, they get shamed. After this? They completely get away with it. Adam Cuerden (talk)Has about 7.6% of all FPs 04:59, 2 November 2020 (UTC)[reply]
And the counter-vandalism efforts of most users without access to advanced privacy tools would be rendered useless if it is impossible to track patterns now ascribed to a single IP (static school IPs, for example) or range of IPs (the classic IP-hopping vandal within a subnet). Vandalism and "sockpuppetry" would run rampant when there are fewer users capable of identifying and reporting the source of problematic edits; the rest of us would basically be playing whac-a-mole with vandalism in articles, which in my opinion is a wholly unacceptable outcome by itself. We'll see what WMF comes up with, but anything that is a net negative for non-admin (or worse, non-CheckUser) RC patrollers is a step in the wrong direction for the project (admins and CheckUsers especially are overworked enough as is). As a non-admin RC patroller, I hold reservations about this. ComplexRational (talk) 14:08, 2 November 2020 (UTC)[reply]
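To make the range concern in this thread concrete, here is a minimal Python sketch. It assumes a mask derived from a salted SHA-256 hash of the address, which is one possible reading of the hash suggestion above and not a described WMF design; `SALT` and `hashed_mask` are hypothetical names. Two neighbouring addresses in the same /24 produce unrelated masks, so the mask alone no longer shows that they sit in the same range.

```python
import hashlib
import ipaddress

SALT = b"example-salt"  # hypothetical secret; a real one would be kept private


def hashed_mask(ip: str) -> str:
    """Derive a mask from the salted SHA-256 digest of the packed address."""
    packed = ipaddress.ip_address(ip).packed
    return hashlib.sha256(SALT + packed).hexdigest()


first = "198.51.100.17"
second = "198.51.100.18"  # the next address in the same subnet
subnet = ipaddress.ip_network("198.51.100.0/24")

# Both addresses belong to the same /24 ...
print(ipaddress.ip_address(first) in subnet, ipaddress.ip_address(second) in subnet)
# ... but their masks are different digests that do not reveal any shared range.
print(hashed_mask(first)[:12], hashed_mask(second)[:12])
```

A hash like this is stable for a single address, so one static IP could still be followed, but the relationship between addresses in a range would have to be surfaced some other way.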
I don't believe in a magical wand tool either, to be clear, though I would love one – we don't have a single tool that would drastically change the field. I'd describe the current plans as smaller changes across various areas, combined with making sure that the information isn't limited to checkusers, or admins for that matter. /Johan (WMF) (talk) 17:51, 4 November 2020 (UTC)[reply]

EU Privacy Law

I used to work in the Data Protection area in the EU, so I have a suspicion that I know why this is necessary, and why the WMF might not want to concede that IP data is personal information until they are in a position to stop displaying it. However, I'm curious as to what we are going to do with the hundreds of millions of edits that are currently linked to an IP address. Leave them untouched? If you stop displaying the IP address, how do you expect people to comply with the attribution part of CC-BY-SA? To me it has long seemed a bit of a nonsense that we require attribution of IP addresses; better, in my view, to have edits by logged in users as CC-BY-SA and in future to have some legalese to the effect that if you choose not to use an account the SA bit of CC-BY-SA does not apply to you as you have not given a name for reusers to attribute your edits to. The recruitment of new editors is a really important point, but there is an alternative. Currently we are over dependent on the desktop view as the mobile view recruits very few readers to become editors. Making the mobile view more editor friendly for smartphone users is probably too big a software task for the WMF. But if we launched a tablet view, intermediate in editor friendliness between mobile and desktop, and maybe upgraded everyone on their first edit from Vector to Monobook, we might have sufficient new editors that we could afford to lose IP editing. ϢereSpielChequers 09:55, 2 November 2020 (UTC)[reply]

So, for my sins, my last job was also in EU data compliance (May 2018, fun days...), and this got discussed a bit more on meta, in obviously non-confirmed ways. Given that they didn't say we had to do this a couple of years ago, I had wondered whether one of the regulators had dropped them an unofficial message, or if one of the jurisdictions had had a case that made WMF Legal re-interpret the articles/recitals. Nosebagbear (talk) 16:27, 2 November 2020 (UTC)[reply]

User contributions

I ask this as an editor without much technical understanding of the "masking" process being proposed here: will editors still be able to see the user contributions of IP editors? Help:User contributions points out that "Other users' user contribution pages can also be accessed and are useful for seeing how other users have contributed. They can be used to track down vandalism, serial copyright violations, etc." I routinely use IP editors' user contributions pages to find and revert all of the vandalism a vandal has posted after stumbling across one instance of it in my watchlist. Will this still be possible with the IPs "masked"? If not, it will make spotting and quickly fixing the work of vandalism-only IP editors much more difficult for me. -Bryan Rutherford (talk) 04:26, 3 November 2020 (UTC)[reply]

My interpretation of that is that side would still be as normal - it would just be "user contributions of IPMask-12345" rather than looking like an IP address. Nosebagbear (talk) 10:12, 3 November 2020 (UTC)[reply]
I can confirm that everyone will still have access to contributions, just like today. The only real difference here is that you'll see something else than the IP as the user ID. The one other question mark is persistence for the masks (the user IDs, the "user names" so to speak). /Johan (WMF) (talk) 17:54, 4 November 2020 (UTC)[reply]
Years ago I remember an exchange over a biographical article which the Foundation had blanked due to a complaint, yet when queried about what the problematic content was, WMF counsel (I believe that was Mike Godwin) replied that they could not tell us "for legal reasons". This led to the equivalent of a bizarre version of 20 questions between editors & the WMF counsel to figure out what the content was so it could be excluded from future versions. I also remember another exchange where another editor needed some legal advice concerning an edit, only to be told by Mike Godwin, "I don't work for you, I work for the Foundation." In other words, the WMF has not only managed to antagonize the editing community, but taken the stance that for the most part we volunteers & our concerns are not important to the success of the projects. (If we are not considered part of the problem hindering that success.)
Reading once again this evasive language, that IP masking must be done, but the reasons can't be explained to us "for legal reasons" is, frankly, insulting. It's a repeat of the insult many of us felt in the FRAM incident: that the opinions of the people who are creating this treasure of information aren't important. Now, I'm sure someone from the Foundation will appear to argue that this is not the case, but those words won't work. Even if that person is the head of the WMF Legal team, because it's clear WMF Legal only cares about the Foundation, not about the volunteers who enable the Foundation to exist. We volunteers have given far more in labor & resources to the success of Wikipedia & related projects than the visible heads of the Foundation, & unless we are seriously included in matters like these, one day we will stop editing. This is an observation, not a threat. -- llywrch (talk) 19:07, 3 November 2020 (UTC)[reply]
@Llywrch: As a 16-year volunteer editor who also happens to work for the WMF, I can assure you that the WMF would absolutely not be doing this unless it was a clear legal necessity. And unfortunately, I can't explain why it is a legal necessity due to specific legal reasons. You may consider that evasive or insulting, but it is not intended to be either. Due to the laws under which the projects and the WMF operate, it unfortunately isn't always possible to have complete transparency. This is just as frustrating for the WMF as it is for the community. Ryan Kaldari (WMF) (talk) 17:51, 4 November 2020 (UTC)[reply]
Please note this was directed at WMF Legal. While many WMF employees are concerned about the relationship between the volunteer communities & the Foundation, I have yet to encounter any who work for that department. If anything, of all of the units within the foundation they are the most hostile to our needs & requests. -- llywrch (talk) 18:26, 4 November 2020 (UTC)[reply]
@Llywrch and Ryan Kaldari (WMF): I actually find Legal's action here annoyingly uncharacteristic - I have quite a lot of communication with them through OTRS, where they are both pleasant and to the point, and treat agents more like colleagues. I also feel it's somewhat bonkers to say that WMF Legal only cares about the Foundation, given fairly strenuous efforts to aid the Community when they could have just required compliance, and still avoided risk to the WMF. However, here, they've listed a bunch of things which wouldn't be legally binding, which makes it read more like covering detail, to make it harder to pin down the specific reason - hence a viewpoint of evasion. Nosebagbear (talk) 14:50, 5 November 2020 (UTC)[reply]

EU-US Privacy Shield invalidation

Those interested in details about this requirement might want to review the July ruling from the European Court of Justice finding that the EU-US Privacy Shield framework failed to protect Europeans' rights to data privacy.[1] 107.242.121.56 (talk) 21:30, 3 November 2020 (UTC)[reply]

About time

I have raised this issue a number of times. I think Wikipedia is today the only website which openly displays users' IP address, which can reveal data about them, and make them potentially vulnerable to hackers. We know that revealing such data can be highly inappropriate, which is why we allow oversighting of edits by unlogged in users. But by default the WMF is revealing information about users without adequately warning them of the consequences. It should be a priority matter to automatically hide people's IP address, and not because the WMF can get sued but because it can put people in harm's way, and nobody should be put in harm's way because of editing Wikipedia, even if they are vandals. The WMF could automatically assign a unique username to each IP address, making it clear this is an unregistered account, but identifying it so it can be monitored, and still allowing checkusers to look at the IP address if appropriate. It should do this for each new IP user, but also convert all existing IP edits into unique usernames, providing functionaries with all the data of the changed IP names. The information the legal team probably wants to conceal is detail on the ways that an IP address can be vulnerable (and thus the rationale for why they want to do this), and it is right that such information is concealed, and that we shouldn't be speculating here on those vulnerabilities. SilkTork (talk) 12:06, 6 November 2020 (UTC)[reply]

"I think Wikipedia is today the only website which openly displays users' IP address..." That's because WMF sites are among the few that allow unregistered editing. I don't know of any non-WMF wiki sites that allow unregistered editing; they may exist, but they are rare. So the simple solution to mitigate privacy problems may be to ban unregistered editing like everywhere else. Jules (Mrjulesd) 20:25, 6 November 2020 (UTC)[reply]
I'm not seeing a requirement to register, nor any particular difference between someone editing Wikipedia unlogged in and being automatically assigned "UserNo123456" rather than "IP:774637", other than giving that user safety which they would not otherwise have. Everything else is the same - their edits are automatically logged to them, and they are not having one moment's pause as the software assigns their user name in the same way that it currently assigns (and reveals) their IP. If you're seeing something in it which I can't see, I'd be interested to hear it. SilkTork (talk) 13:49, 7 November 2020 (UTC)[reply]
Well, particular problems include ranges; for example, IPv6 connections use /64 ranges, and without seeing these it will be difficult to pick up on user contributions. Likewise if "IP usernames" are allocated dynamically, user contributions from a particular IP will be difficult to assess, e.g. whether they come from an educational establishment that could be blocked. Not allowing IP editing seems like a simple solution; accounts are cheap, and are used in all other wikis (that I know of). It could help to alleviate a huge exodus of vandal fighters if this comes to pass. Jules (Mrjulesd) 14:03, 7 November 2020 (UTC)[reply]
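To illustrate the range point above: a minimal sketch, using only Python's standard `ipaddress` module, of how edits from several IPv6 addresses can be recognised as coming from the same /64 range. The function name `group_by_slash64` and the sample addresses (taken from the 2001:db8::/32 documentation range) are hypothetical and do not describe any actual Wikimedia tool.

```python
import ipaddress
from collections import defaultdict


def group_by_slash64(addresses):
    """Group IPv6 address strings by the /64 network that contains them."""
    groups = defaultdict(list)
    for text in addresses:
        addr = ipaddress.IPv6Address(text)
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{addr}/64", strict=False)
        groups[str(network)].append(text)
    return dict(groups)


# Illustrative addresses only (documentation range), not real editors.
edits = [
    "2001:db8:0:1:aaaa::1",
    "2001:db8:0:1:bbbb::2",
    "2001:db8:0:2::1",
]
grouped = group_by_slash64(edits)
for network, members in grouped.items():
    print(network, members)
```

If the addresses are replaced by unrelated masked names, this kind of grouping is no longer possible for anyone who cannot see the underlying IPs, which is the difficulty being raised.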
Good points. But I should imagine that one IP would equal one username to make identification easier, but without revealing any information that could be misused. Information about the actual IP address behind the username would still be available to checkusers the same as a manually created username. And if these automatically created usernames are differentiated from manually created ones, then vandal fighters would still be able to quickly identify users who have not manually created a user name, and so may have less commitment to the project, and so be more likely to make test edits. SilkTork (talk) 18:56, 7 November 2020 (UTC)[reply]
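The "one IP equals one username" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the WMF does or will implement masking: the function `masked_name`, the `Unregistered-` prefix and the secret key are all hypothetical. A keyed hash (HMAC) is used because a plain unkeyed hash of an IPv4 address could be reversed by simply trying all of the roughly four billion possible addresses.

```python
import hashlib
import hmac
import ipaddress


def masked_name(ip: str, secret_key: bytes, prefix: str = "Unregistered-") -> str:
    """Derive a stable pseudonym from an IP address using a keyed hash."""
    # Normalise first so equivalent spellings of one address map to one name.
    canonical = ipaddress.ip_address(ip).compressed
    digest = hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    # A short slice of the digest keeps the name readable (hypothetical length).
    return prefix + digest[:10]


# Hypothetical key; in practice it would have to be kept private server-side.
key = b"example-secret-key"
first = masked_name("2001:db8::1", key)
same_again = masked_name("2001:0db8:0000:0000:0000:0000:0000:0001", key)
other = masked_name("2001:db8::2", key)
print(first, same_again, other)
```

The same address always yields the same name, so contributions stay attributable, while the name itself reveals nothing about the address or its range to readers without the key.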
@SilkTork: This probably isn't the place to have this discussion, but along with the concerns that Mrjulesd has mentioned, consider how automatically assigned IDs might differ from usernames. I'm making some sensible assumptions about how this will be done (let's ignore cookies to keep this short). IDs will be assigned based on some form of fingerprinting of the user's device combined with IP address. If the browser gets updated, that ID will change. If their IP address is dynamic, it will change occasionally (or regularly, depending on how they access the internet) and that ID will change, even if they are the same person using the same device from the same location. If they edit from more than one place, say, school and a coffee shop, they will be assigned different IDs because the IP addresses are different, even though they are the same person using the same device. If they edit with more than one device, say a laptop at home and a phone while on the train, they will be assigned different IDs. Compare this to mandatory creation of an account and then signing in to that account when editing. Which makes more sense? Mo Billings (talk) 22:53, 7 November 2020 (UTC)[reply]
I am not opposed to mandatory creation of an account; I am opposed to WMF revealing people's IPs when they edit without logging in. And any problems associated with auto-generated account names would also occur with mandatory manually created account names because users can create multiple accounts. As regards incorporating the user's device into the creation of an automatic account, I'm not seeing why that is necessary or useful. The idea of an automatic account is purely to conceal the user's IP so they are not open to potential abuse. Information about the device itself would still be available to checkusers, but does not need to be incorporated into the account creation because it's not publicly revealed anyway. Any changes that can occur to the automatically generated account would also occur to the IP address, so vandal fighters would have the same problems they have now. Automatically generated accounts would not make the vandal fighters' job any easier or harder, but would offer greater protection to all unlogged in users, many of whom are not vandals. SilkTork (talk) 01:33, 8 November 2020 (UTC)[reply]
You are right, it is not necessary to make use of the device information if the aim is to simply mask IP addresses. It would be useful in distinguishing users for the purposes of persistent IDs or detecting abuse. Mo Billings (talk) 04:11, 8 November 2020 (UTC)[reply]
Thanks for the link - I'll take a look. As regards the unlogged in warning: Are you talking about the phrase: "Your IP address will be publicly visible if you make any edits."? I would regard that as a bland statement rather than a warning, as it doesn't explain that in some circumstances a hacker can make use of their IP address to force entry to their device and steal personal information which can be used in financial scams. This is not just about roughly pinpointing someone's location, it's also about identity theft. If there was a two stage access to editing unlogged-in in which the dangers were explained and on the second page a small box had to be located and ticked to confirm the person was aware of the dangers of editing unlogged in and was happy to proceed, that would be a suitable warning. Bear in mind that some very experienced users have occasionally edited unlogged in by accident - and had to have their edits suppressed. SilkTork (talk) 16:21, 12 November 2020 (UTC)[reply]
I suppose that's why they're taking this action: they're probably worried about being sued over data breaches occurring from public exposure of IP addresses from people editing here. One solution perhaps would be to make all IP edits go through a very clear disclaimer process, but even that could be subject to legal problems, particularly in regards to minors. But that is obviously not what they're considering here.
All this stems from the extremely antiquated process of allowing unregistered editing. All other major websites simply do not allow this, and therefore are not subject to liability from data breaches in this manner. I think this proposed solution of "pseudo-accounts" could cause mass confusion and consternation, and is simply not needed in this day and age, where user registration for contributing is not only an accepted practice but also an obligatory practice. User retention is simply more important than user recruitment, and we should accept that and not create these hodge-podge "solutions" which will likely cause more problems than they solve. Jules (Mrjulesd) 17:23, 12 November 2020 (UTC)[reply]

Yes, it seems quite extraordinary that we allow unregistered and anonymous people to join in without logging in from any fly-by-night Internet Cafe or temporary SIM card phone and have a go at doing whatever they fancy for good or ill with basically no accountability at all. A site "that anybody can edit" should simply mean "that anybody can freely register for" (in a couple of minutes), basta - every other website in the world works that way, and it doesn't seem to stop many of them getting huge numbers of customers. As for masking, well, it seems utterly extraordinary that the legal eagles can't tell us what law we're supposed to be complying with - why the hell not, it's a basic right to know how we're being governed. Masking is an utterly ludicrous solution, both because the IPs should be logging in, and because (as others have said above) it will make the tracking-down of vandalism worse - how are we going to warn somebody when we have no way at all of knowing if they did it before, it makes no sense: doubly ridiculous. Get them to log in and all the technical faffing-about and complexity is sidestepped. Should have been done years ago. Chiswick Chap (talk) 20:42, 14 November 2020 (UTC)[reply]

I was about to start a thread called "About damned time", but I'll just use the existing less vulgar heading. >;-) This has been a long time coming, and is very overdue. Without repeating all of HaeB's pro and con arguments, I just want to say that the fact that IP-address disclosure is meaningless to and harmless for many (maybe even most) anon editors is no excuse. The fact that it's a serious security problem for some (possibly even a safety one) is reason enough to stop broadcasting potentially personally-identifiable IP addresses to the entire world. Especially since it's very easy to become logged out without noticing until after having made some additional, now-IP edits that directly connect that IP to the user ID you were just logged in as.  — SMcCandlish ¢ 😼  16:32, 17 November 2020 (UTC)[reply]

A different IP problem

Since we have IP-related dev attention here, I want to raise a side topic: It's intensely frustrating (as well as a security problem) that our systems are presently blanket-blocking (often on a WMF-global basis) all sorts of IP addresses that WMF assumes are "web host providers or colocation providers", without regard to the obvious facts that a) IP addresses and the servers behind them often serve multiple purposes; b) once a user is logged into an actual account, what IP address they are coming from and what other services are provided by the owner of that IP address are irrelevant; and c) the endpoints of most VPNs anyone would bother subscribing to are very likely to be "web host providers or colocation providers" as most of their bread-and-butter, or they would not have the bandwidth to be useful VPN endpoints in the first place.

Example of the kind of block notice generated by this nonsense (though anonymized and without all the giant red text formatting and other hooey):

You do not have permission to edit this page, for the following reason:

You are currently unable to edit Wikipedia.

You are still able to view pages, but you are not currently able to edit, move, or create them.
Editing from 123.456.789.0/22 has been blocked (disabled) by AdminUserName for the following reason(s):
The IP address that you are currently using has been blocked because it is believed to be a web host provider or colocation provider. To prevent abuse, web hosts and colocation providers may be blocked from editing Wikipedia.
You will not be able to edit Wikipedia using a web host or colocation provider.

Since the web host acts like a proxy or VPN, because it hides your IP address, it has been blocked. To prevent abuse, these IPs may be blocked from editing Wikipedia. If you do not have any other way to edit Wikipedia, you will need to request an IP block exemption.

If you do not believe you are using a web host, you may appeal this block by adding the following text on your talk page: {{unblock|reason=Caught by a colocation web host block but this host or IP is not a web host. My IP address is _______. Place any further information here. ~~~~}}. You must fill in the blank with your IP address for this block to be investigated. Your IP address can be determined using whatismyip.com. Alternatively, if you wish to keep your IP address private you can use the unblock ticket request system. If you are using a Wikipedia account, you will need to request an IP block exemption by either using the unblock template or by submitting an appeal using the unblock ticket request system.

Administrators: The IP block exemption user right should only be applied to allow users to edit using web host in exceptional circumstances, and they should usually be directed to the functionaries team via email. If you intend to give the IPBE user right, a CheckUser needs to take a look at the account. This can be requested most easily at SPI Quick Checkuser Requests. Unblocking an IP or IP range with this template is highly discouraged without at least contacting the blocking administrator.

Using ISP Rangefinder

This block has been set to expire: 14:28, 11 October 2022.

Even if blocked, you will usually still be able to edit your user talk page and email other editors and administrators.

Other useful links: Blocking policy · Username policy · Appealing blocks: policy and guide

If the block notice is unclear, or it does not appear to relate to your actions, please ask for assistance as described at Help:I have been blocked.

I sometimes have to bounce around between 10+ endpoints on my VPN provider's network before I find one from which I can edit, and this is just downright stupid. (And then it changes again a few days later so I can't use that one, meanwhile the unnecessary block on another expires and I can use it again. For a few days. Then I have to try to come in from Panama or Japan or Zimbabwe. Until next week, then maybe Liechtenstein or New Zealand. It's just random, brain-farty, wannabe-security nonsense.)

I've had requests to unblock a specific VPN IP address, for me as a logged-in user, declined simply because it's a technical hassle. It shouldn't be a hassle. It's only a hassle because of how things have been set up on the sysadmin side of things. And sometimes these requests are declined for even more daft reasons, like maybe I'm not really who I say I am, and why am I coming in from IP addresses all over the globe, is maybe my account compromised, or am I "really me" but a bad-actor after all, despite years of service? It's blatant circular reasoning: We're screwing with your ability to edit by carpet-bombing various IP addresses because someone vandalized through them once upon a time; then we're declaring you to be a possible vandal or sockpuppet or system cracker because this idiocy has forced you to try to use other IP addresses to get in. That's called "blaming the victim".

This has to stop. While I don't entirely disagree with the dev's announcement/op-ed thing above expressing concerns that just blockading all anon IP edits would do harm to the projects by erecting a barrier to entry that many potential editors would not climb (though pt.wikipedia is providing direct evidence against that prediction), it's more than just hypothetically harmful to use blunderbuss approaches to "security" (actually just anti-vandalism and anti-socking convenience) that thwart editors like me (with 15+ years of solid experience here, and advanced permissions), and actually reduce real security by convincing various legit, account-registered editors to stop trying to log in through VPNs. Given how many editors are now editing with mobile laptops, phones, and tablets, from locations they do not completely control and which are sometimes actively targeted by persons and organizations trying to eavesdrop on data, this is a real and growing security hole (especially for users with advanced permissions like TemplateEditor, Admin, etc.), as well as a totally unnecessary pain in the butt.

Johan Jönsson, I doubt you have anything personally to do with this problem, but you appear to be in a development-insider position to amplify the squeaking of this wheel so that it actually gets some grease.

PS: This firehose approach to IP blocking doesn't even function as intended, anyway. It is often the case that I can edit from one of my VPN's IP addresses for anywhere from several minutes to an hour or longer without incident, only to eventually have it stop working, with that dunderheaded block notice popping up finally. There is a huge lag in the ability of the system that does the IP address analysis to even "get a bead" on what the IP address is and match it to a block list. That's a bit like having a car-door lock that only actually locks the door at some random interval, anywhere from a minute to several hours, after you press the lock button (and probably does it when the actual car owner is trying to get in, not when a thief is). It's certainly doing jack to prevent vandalism or socking, since by far the majority of such unconstructive behavior is going to happen quickly, not after 39 minutes or 2.6 hours of editing around as an anon at that same IP address. The entire approach is just flat-out broken.
 — SMcCandlish ¢ 😼  16:32, 17 November 2020 (UTC)[reply]

SMcCandlish: This is indeed not really my area, but that doesn't mean I can't help at least get this to the right persons. Just to make sure I understand you correctly: Is your request that the Foundation look into the technical aspects to make it easier for the Wikimedia communities to handle exceptions in the long-standing discussion around VPN access to Wikipedia and the privacy wins (and sometimes necessities, in some areas of the world, to simply access the wikis) versus the anti-vandalism/anti-sockpuppet cons? I.e. that this behaviour is partly driven by the bluntness of blocking tools, and we should give the communities the chance to act with more consideration of the specific situation? Or were you thinking of non-technical aspects? /Johan (WMF) (talk) 04:43, 18 November 2020 (UTC)[reply]
Thank you for getting back to me (and I apologize for venting a bit; it's just been maddening – it's one of the reasons my once-prodigious editing is now fractional). Your summary does indeed get to the gist of it, though there is also a social/policy/administrative "gatekeeping" aspect: it should be easier to get approval for an exception as a logged-in user. I'm not making any kind of argument/request with regard to what anons should be able to do from these same IP addresses. The tool is overly blunt (and seemingly a bit damaged, given the lag bug), but it's also being wielded more crudely than should be necessary. If there's some complication that would make anti-vandal/sock actions seriously more difficult, a likely compromise would be to make the permission available (or usually available) only to editors of a certain trust level. That could be anything from bare AutoConfirmed up to one or more advanced, high-trust permissions with a vetting process like TemplateEditor, PageMover, or FileMover. I'm not feeling strongly about the specifics. If it should be rather low but not super-low, ExtendedConfirmed might be good. If it must involve administrative judgment but still be broadly available, Rollbacker seems reasonable. Etc. PS: I was inspired to bring this up here and now because your piece above indicates that the impending legally required changes to IP-address handling are already necessitating some "re-code stuff so we don't break abuse-fighting tools" efforts, so now seems like the ideal time to deal with both problems.  — SMcCandlish ¢ 😼  06:09, 18 November 2020 (UTC)[reply]
@Johan (WMF): forgot the ping.  — SMcCandlish ¢ 😼  06:14, 18 November 2020 (UTC)[reply]
I have written an essay, User:Bri/Misapplication of blocking, that addresses some parallel concerns with overbroad application of IP blocks, and a lack of transparency and accountability. ☆ Bri (talk) 04:48, 18 November 2020 (UTC)[reply]
Thanks. I will have a look-see at it. :-)  — SMcCandlish ¢ 😼  06:09, 18 November 2020 (UTC)[reply]
@Bri: Zoiks. That's really troubling. I realize it's not the exact same issue I'm raising, but it has at-least-equally-bad implications.  — SMcCandlish ¢ 😼  06:14, 18 November 2020 (UTC)[reply]
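The compromise floated in this thread (range blocks that keep applying to anonymous edits, while logged-in editors above some trust level are exempt) can be sketched as a simple check. This is a hedged illustration only: the function `may_edit`, the ranking in `TRUST_RANK` and the default threshold are hypothetical, the ordering of permissions is an assumption made for the example, and none of it describes how MediaWiki's blocking code actually works. The addresses come from ranges reserved for documentation.

```python
import ipaddress
from typing import Iterable, Optional

# Hypothetical ranking of trust levels, lowest to highest (an assumption).
TRUST_RANK = {
    "autoconfirmed": 0,
    "extendedconfirmed": 1,
    "rollbacker": 2,
    "templateeditor": 3,
}


def may_edit(ip: str, blocked_ranges: Iterable[str],
             trust_level: Optional[str] = None,
             min_exempt_level: str = "extendedconfirmed") -> bool:
    """Return True if an edit from this IP should be allowed."""
    addr = ipaddress.ip_address(ip)
    in_blocked_range = any(addr in ipaddress.ip_network(r) for r in blocked_ranges)
    if not in_blocked_range:
        return True
    if trust_level is None:
        # Anonymous edit from a blocked range: the block applies.
        return False
    # Logged-in edit: exempt only at or above the chosen trust level.
    # Unknown levels rank below everything and are never exempt.
    return TRUST_RANK.get(trust_level, -1) >= TRUST_RANK[min_exempt_level]


blocked = ["198.51.100.0/24"]  # illustrative documentation range
print(may_edit("198.51.100.7", blocked))                     # anonymous
print(may_edit("198.51.100.7", blocked, "rollbacker"))       # trusted account
print(may_edit("198.51.100.7", blocked, "autoconfirmed"))    # below threshold
print(may_edit("203.0.113.5", blocked))                      # outside the range
```

Where the threshold sits is exactly the policy question left open above; the sketch only shows that the check itself need not be complicated.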

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0