The Signpost

Op-Ed

Random Rewards Rejected

By Kudpung

Lab rats revolt: Researchers don't get their way with the Wikipedia community

A proposed research project which would have randomly awarded barnstars to Wikipedia editors was recently withdrawn by researchers at Carnegie Mellon University (CMU). Bending to concerns expressed by en.Wikipedians that the process was a social experiment, Ph.D. student Diyi Yang and Robert E. Kraut, Ph.D, Herbert A. Simon Professor of Human-Computer Interaction at Language Technologies Institute, CMU, withdrew their proposal. Initially approved by the institutional review board (IRB) at CMU, the proposed research entitled How role-specific rewards influence Wikipedia editors' contribution would have involved placing thousands of randomly assigned barnstars on unsuspecting editors' user pages in order to monitor their reactions.

Yang's research is supported by a Facebook Fellowship. Facebook's own research has been criticized in an article in The Guardian by Sam Levin on 1 May 2017 over research in which it sought to alter the emotions of users without their consent, and again by George Monbiot in his opinion piece in the same newspaper on 31 December 2018, stating that "universities are leading us into temptation, when they should be enlightening us". The CMU proposal came under fire at Meta from several leading Wikipedians including BrownHairedGirl, Deryck Chan, Risker, SlimVirgin, and WereSpielChequers when the discussion at Meta spilled over to the Wikipedia Village Pump in a long and heated thread.

Words used by Wikipedia editors to describe the project included:

"...Barnstars awarded among Wikipedia editors and the WikiLove messages I give and receive actually mean something. To use the Barnstars (and potentially the WikiLove system) in the researchers' proposed way devalues their meaning..." – Shearonink (diff)

"Diyiy, can you reply, please, to the part of SarahSV's question where she asks "in whose interests it's being done?" For my part, I want to know why Carnegie Mellon wants to know about Wikipedian behaviour. What benefits accrue to the university? And is the experiment to be of benefit to any of the great manipulators of public behaviour such as Facebook, Google, Twitter, or anyone who desires to sharpen their sophisticated tools even further? Does the university have corporate, government, academic, or other partners who seek to benefit from barnstar-motivation studies? Are you, yourself, a ripe candidate for recruitment by Facebook or similar, based on your current social experiment activity, or arising out of your Facebook fellowship? I am seeking full transparency about any hidden partners or researcher motivations. Cui bono? Thank you." — O'Dea

Aaron Halfaker Photo: Myleen Hollero

In a 455-page paper partly funded by Google, Who Did What: Editor Role Identification in Wikipedia, delivered at the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016), Aaron Halfaker (currently WMF Principal Research Scientist) in his capacity as WMF staff collaborated with CMU researchers Diyi Yang, Robert Kraut, and Eduard Hovy. From the abstract: "Understanding the social roles played by contributors to on-line communities can facilitate the process of task routing. In this work, we develop new techniques to find roles in Wikipedia based on editors' low-level edit types and investigate how work contributed by people from different roles affect the article quality."

"Diyiy and I should have been more precise when saying 'the proposed work has nothing to do with Facebook' and 'Facebook won't benefit at all from the research we've been describing'. We should have said that Facebook does not benefit directly from our research and does not benefit more from this knowledge than do other online platforms. We started this research on the influence of social roles in Wikipedia in collaboration with the WMF and our first paper[1] on the topic was published in 2016 before Diyiy received a Facebook fellowship. The proposed research should lead to generalizable knowledge about the consequences of bestowing recognition and the influence of social roles in online groups. This generalizable knowledge could be useful to many different types of online groups, including Wikipedia, open-source software development communities, online health support groups, peer-to-peer lending groups and many others, including Facebook's online groups."– Robert Kraut

"Every single barnstar I have came as the result of significant effort on my part. I don't understand why the researchers have decided to grant what is, essentially, one of the highest interpersonal symbols of respect on the project to people who have not made the level of contribution that the rest of the community would expect to see when a barnstar is granted. It's like throwing a parade in recognition of successfully emptying the trash baskets, very disproportionate."– Risker

"Sorry but I'm not happy about this. Please see "Wikipedia is not a laboratory". The proposal could be regarded as somewhat "disruptive to the community" in diluting the value of the barnstar, which we would hope is intended as a sincere expression of appreciation from one Wikipedia editor to another. [...] Wikipedia editors are not lab rats and should not be fed barnstars to see if they scurry round any faster afterwards! Feel free to disregard this if other contributors don't see it this way." – Noyster

Winding the clock back...

Seven years ago in April, a discussion at ANI on 'IP handing out random barnstars', in which Boing! said Zebedee attempted to retain the dignity attached to the barnstar philosophy by restricting its rampant willy-nilly use by IP users, was closed with: "Barnstar campaign and other forms of appreciation are not, other than exceptional cases, problematic or disruptive or actionable. This was not the droid you were looking for."

"If the barnstars are to have any meaning, it's probably wrong. However, the guidelines on when to hand out a barnstar are pretty liberal. I suppose you could request a change in who is allowed to give barnstars maybe. Beyond that, though it seems a tad excessive, it's not really uncivil or disruptive. – Avanu"

In April 2012, almost exactly 12 months later, Softlavender filed a further ANI report on IP barnstar spamming: 'I'm all for barnstars, but their value and purpose is diluted (could even say desecrated) when meaninglessly sprayed shotgun by a constantly changing and anonymous IP range for no good reason.'

"...there is far worse vandalism than this, and many more people should be praised for the work they do, but this is just random and devalues well-deserved recognition. The IP editor clearly knows how to edit, and the right sort of phrases etc. to use, so they are not a novice, and could make useful contributions." –Arjayay

The case was closed with: 'While some find random (and inappropriate) acts of Love annoying, no consensus exists for mass action at ANI and cases can be handled one at a time. Changing policy on barnstars is clearly outside of the scope of ANI...'

The phantom barnstar bomber

The wild Barnstarist turns out in both cases to be none other than Mike Restivo editing while logged out in the pursuit of an early research agenda covered in The Signpost column 'Recent Research' from the issue of 30 April 2012. His works are cited by Halfaker et al:

  1. Restivo, Michael, and Arnout van de Rijt. "No praise without effort: experimental evidence on how rewards affect Wikipedia's contributor community." Information, Communication & Society 17, no. 4 (2014): 451-462.
  2. Restivo, Michael, and Arnout van de Rijt. "Experimental study of informal rewards in peer production." PLoS ONE 7, no. 3 (2012): e34358.


Discuss this story

  • Stolen valor is an interesting analogy, thanks for pointing that out Athaenara. Bri.public (talk) 19:01, 31 January 2019 (UTC)
  • For what it is worth, everybody recognizes that TWA awards are meaningless and values them as such. Barnstars are another matter entirely, seeing as they (usually) actually signify something. Compassionate727 (T·C) 19:31, 31 January 2019 (UTC)
User:Herostratus/External Barnstars Already ahead of you. Herostratus (talk) 02:21, 3 February 2019 (UTC)
  • Well, just because the valuation of something is low (or perceived by some as low) does not mean that said value cannot be debased. That said, I agree with the main thrust of your comments, as to what the major issues of significance here are. While it is not per se unethical to conduct research without informed consent under every single one of the various legal and institutional rubrics which define such matters, in this case (where there were non-anonymous subjects whose responses were to be tested and behaviour monitored), the approach was very obviously inappropriate in the extreme, as an ethical matter. Learning of this makes me very curious as to who is on Carnegie's institutional review board and how they possibly thought this was permissible behaviour for researchers. Indeed, to the extent any of these researchers are members of the APA, I wonder how they might feel about this attempted research, given it would have, to my eye, pretty flagrantly violated numerous provisions of the association's ethics code. I'm also curious whether the information culled, beyond being shared with the venture's partners, was also intended for use in publication with one of the APS journals, given the role Kraut serves as facilitator for the APS Wikipedia initiative. But putting the APA and APS to the side for the moment, and returning to the active parties here, this whole affair gives off an odour that does not reflect well on any of the institutions involved. It's an embarrassment that any researcher thought this could end well and yet more indication of how disrespectful academics can be of Wikipedia and its community in particular--to say nothing of how laissez-faire they can be about their ethical responsibilities with regard to online research generally. I'd like to say I believe it likely that this affair would stain the reputation of the involved parties among serious researchers, but the truth is that I rather doubt it will. Snow let's rap 01:03, 1 February 2019 (UTC)
  • From an economic/monetary standpoint, it clearly does dilute the value of the Barnstar. Barnstars have a concrete value in that they signify the community's respect and thus lend prestige and authority to the recipient. Every instance of a barnstar created reduces the value of every instance that already exists; they would lend much more prestige and authority, for example, if only twelve people had them.[1] We accept the very small dilution every time someone hands out a barnstar because we consider the value the recipient gets out of it, particularly as a morale boost, and the value the community gets out of it, as a way of identifying reputable people, to be worth it.[2] It is harder to make that argument for thousands of random barnstars, especially in light of the fact that they would lose much of their value as a recognition: you would no longer be able to look at someone's user or usertalk page and think: "That person has a barnstar, I can trust him or her to be smart, competent and experienced" because a ton of people who are potentially none of those things now have them. Compassionate727 (T·C) 19:44, 31 January 2019 (UTC)

References

  1. ^ Your hug analogy fails because the random person's hug and your SO's hug are actually two different commodities whose value is not directly dependent on one another. The value of your SO's hug would diminish, however, if they gave hugs frequently and/or to many people versus if they only hugged you and only once every quarter. The same would also be true if you received hugs from many people, because other peoples' hugs are a substitute for your SO's hug. Of course, you may perceive your SO hugging you as different or special because they're your SO, which is why we typically don't use emotionally-laden things for examples in economics.
  2. ^ If the number of barnstars is not increasing as a percentage of the total number of users in the community, the relative value of the barnstar over time may not be decreasing at all. It would still be decreasing, however, in absolute terms.
  • @Liz: Without seeing the paper myself, I would guess there are more than a few pages consisting mostly or entirely of raw or uninterpreted data. Compassionate727 (T·C) 19:48, 31 January 2019 (UTC)
  • The paper in question is 10 pages long. That's a fairly common page limit in computer science conference papers. Maybe Kudpung could update the op-ed to reflect that it's not 455 pages? Cheers, Nettrom (talk) 16:24, 4 February 2019 (UTC)
Kerry Raymond, I don't disagree in the slightest, but to be frank, any researcher working in the social and psychological sciences who is going to be doing research involving direct stimulus-response testing of individuals needs to have informed consent. That is research ethics 101. Believe me, I understand the complications that this creates for the research itself, particularly in the arena of social psychology, but there are reasons these principles were adopted by the scientific community in the last century and those reasons weren't by any stretch of the imagination trivial concerns. The lack of institutional watchdogs such as you would find in using individuals as test subjects through their employment should not be treated as free license to utilize people as test subjects through open communities online, without obtaining consent. Ethics should not go out the window just because there isn't a sufficient presence to invoke practical liabilities--that's not the bedrock upon which ethical research should lie. And frankly, even a grad student not getting this is an embarrassment to the profession and a sign that their institution has flubbed the task of their basic education in this area. Taking shortcuts that cut through ethical barriers just because you are conducting research online is no more acceptable than trolling people is acceptable because it's done in the anonymity of the internet. No researcher should feel one whit more comfortable conducting research in an online forum where that same experiment would be clearly unacceptable if conducted at a farmer's market. This isn't rocket science: the people one might be inclined to use as subjects online are still people, and it is still just as shady to exploit them by failing to get consent.
And if the only thing keeping research in line were intermediaries with their own liabilities and legal limitations, and not the good ethical sense and training of the researcher themselves, things will get to a bad place fast, as indeed we have seen happen repeatedly in recent times, often involving one of the funding partners in this very research. It's bad enough that we have to worry about this kind of behaviour from social media players, marketing firms, and the political class, all of whom act with such disturbing impunity when it comes to privacy and consent. To allow academics to get in on that game without any sense of concern as to the implications... Snow let's rap 23:18, 1 February 2019 (UTC)
I too was shocked at what appeared to be blatant disregard for "informed consent". In spite of having all the same problems, Restivo/van de Rijt got published in what I assume to be a peer-reviewed journal. I looked at that paper and I found the following paragraph near the beginning:
This study's research protocol was approved by the Committees on Research Involving Human Subjects (IRB) at the State University of New York at Stony Brook (CORIHS #2011-1394). Because the experiment presented only minimal risks to subjects, the IRB committee determined that obtaining prior informed consent from participants was not required. Confidentiality of personally-identifiable information has been maintained in strict accordance with Human Subjects Committee requirements for privacy safeguards.
What does this mean for us? It means that as far as the above-mentioned Committees on Research Involving Human Subjects are concerned, it's OK to mess with people's heads, not to mention rending the social fabric, without telling them that they're part of an experiment. What's up with that? This looks like a bigger problem than just one naïve professor at CMU and his grad student. Bruce leverett (talk) 02:49, 4 February 2019 (UTC)
Yup--and while research institutions have been known to apply that "minimal risks" standard when culling data from pre-existing media, here the researchers were to have been directly experimenting with the subjects, providing stimuli and recording responses, and that has traditionally been seen by all institutions, professional associations, and researchers in good standing as a bright-line rule for when consent is required. Unfortunately, it would seem that the principle of social psychology (that most any person in the contemporary world with web access is familiar with), whereby the consequences of improper behaviour online are seen as less consequential or "real" than the same conduct would be perceived to be offline, applies as much to many researchers with regard to their work as it does to random joes who drop their standards for appropriate conduct. Even though such researchers ought to be more on guard than most people about the irrationality and dangers of such a cognitive bias. Like I said, an absolute embarrassment to the profession and something that needs to be addressed. Someone should do a systematic review of that--a research topic of some actual consequence. Cripes, would I love to see the expression on one of these researchers' faces when, while at a conference, they realized they were being referenced obliquely in a breakdown of slipping ethical standards owing to an inability to contextual rationalization. What sweet irony that would be. Snow let's rap 09:02, 4 February 2019 (UTC)
I would love to have seen the human subjects research ethics application for this study. I wonder if a FOI request would get it? I've seen some questionable applications go through when 'commercialization' is mentioned. AugusteBlanqui (talk) 11:10, 4 February 2019 (UTC)
Well, Carnegie Mellon is a private university and thus would not typically be subject to FOIA requests directly, and while HS-IRBs are required to maintain minutes of their meetings and other documentation of their review of proposed research, they are not typically required to file these documents with OHRP or other federal oversight entity unless the agency requests it (for example, as part of a review)--and if the documents are not within the possession of such a federal agency, they typically cannot be reached by a FOIA. (There are possible exceptions where, as alluded to before, the research institution is a state entity or it used federal funds in the research). I suppose it's possible (maybe even probable) that when multiple parties sign on to an 'IRB of record' agreement (this is where the involved institutions agree to allow one IRB to investigate and authenticate compliance for joint research), if even one of the researchers involved is from a state institution (or arguably used federal funds on the research in even a trivial way), the documentation could be reachable by FOIA that way, even if the IRB in question was that of a private institution that did not use federal funds and did not file the documentation with a federal entity. But I just don't know the regulations intimately enough to say for sure. My overall inclination is to say taking this approach could be an ordeal. However, some private institutions try to be more transparent than others and I imagine some may be amenable to public requests. This is where one would begin investigating such an inquiry with regard to Carnegie Mellon. Snow let's rap 12:04, 4 February 2019 (UTC)
Snow Rise, here's a question: how would this research have ensured that no minors were used as test subjects? I mean, apparently CMU cares about this. Wikipedia-as-petri dish seems to leave the door open for violations of child protection policies as far as informed consent goes or general ethical guidelines for research. Might be worth Wikipedia Foundation getting the word out. AugusteBlanqui (talk) 12:13, 4 February 2019 (UTC)
I don't see how they could have, honestly. In many (but not all) research situations involving informed consent, parents or other legal guardians can provide consent as proxies. Here though, the IRB decided it was fine to conduct this research without asking anyone for consent, whether directly or in a guardianship role. That actually raises an interesting question, because I note that Pennsylvania has a statute (Act 153) which requires all researchers likely to have contact with minors to register with three state entities. Now obviously the type of harm that statute seeks to protect against anticipates mostly in-person interactions, but looking at the statutory language itself, I see nothing that relieves the university of that responsibility when contact is restricted to online research. In any event, CMU's own internal child protection policy makes clear that "Programs and Activities Involving Minors" is defined as "any program, event, or activity involving one or more individuals under the age of 18 that is...[s]ponsored, funded and/or operated by any Carnegie Mellon administrative unit, academic unit, or student organization, regardless of location. This includes programs and activities conducted on-campus, off-campus, or remotely via the internet or other means of communication" [emphasis added]. I suppose if I were in a dialogue with the IRB, that would be a fruitful question to raise regarding their review of this research--whether all researchers who might reasonably have had contact with minors through this research had Act 153-compliant registration. Snow let's rap 12:44, 4 February 2019 (UTC)
BrownHairedGirl, SlimVirgin, I thought the two of you might be interested in some of the issues we are discussing here, particularly as we can't be sure this will be the last time we will see something of this sort. Thank you, btw, for providing a check here; we really rely on editors like you who volunteer time on both Meta and the local project as a first line of review of such matters, and you really came through for the community. Also, hi to you both--I hope you've been well? Snow let's rap 13:28, 4 February 2019 (UTC)
Random distribution of barnstars as part of a behavioral experiment is not an act of kindness though, is it? AugusteBlanqui (talk) 17:00, 1 February 2019 (UTC)
We talk a lot about intent here on Wikipedia, such as in WP:NOTHERE. We are here to build an encyclopedia and I think it is easy to underestimate and dismiss the number of obstacles that stand in the way of doing so, but the overall sum is considerable. Mkdw talk 06:17, 3 February 2019 (UTC)
@Kudpung: Amen to that. – Athaenara 02:43, 25 February 2019 (UTC)
I appreciate the points raised above, and agree that there are issues around, for example, child protection. However, I personally don't think it's productive at this stage to get too locked into those details. My concern is that there should be high-level filters in place, and that those filters should include issues such as addiction and child protection. But it makes little sense to me to start discussing the nature of the filters when there is no framework for any filtration system.
As I noted at the VPM discussion, I have a very low regard for the ethical controls at universities. They are now so heavily dependent on corporate funding that a corporate approach to ethics is hardwired into all their decision-making processes. The responses at VPM by @Diyiy and Robertekraut to my points about disclosure only underline the convergence between corporate ethics and those of contemporary academia.
So Wikipedia needs its own filters. But m:Research:Committee is dormant (or possibly extinct), and there seems to be nothing in its place. Instead we had this proposal brought to the community with the support of @Halfak (WMF), who is a previous research colleague of Diyiy and Robertekraut. Whatever view anyone takes of either the substantive or ethical merits of that research (see it at m:Research:The Rise and Decline), Halfak had a clear conflict of interest in assessing this research. Yet so far as I could tell from the VPM discussion, there was no other oversight of this project within the WMF.
That is clearly wrong. We need some framework for assessing research ethics either at the WMF or at en.wp, or both; yet we have neither.
I don't try to follow the internal politics of the WMF, so I have no idea whether the issues raised at the VPM discussion have led to discussions within WMF; but I have seen nothing publicly about new structures or policy. I think that's a serious and astonishing omission, but it is how it is.
So it seems to me that en.wp needs to set up its own framework for screening research proposals. --BrownHairedGirl (talk) • (contribs) 05:55, 6 February 2019 (UTC)
@BrownHairedGirl: It would certainly be a start, though it would be a difficult situation for all involved if such a system were not built in lockstep with the WMF, which some under-informed researchers (unfamiliar with the organizational and legal complexities of this project and community) may presume is the only entity with whom they need to communicate such plans. Indeed, in this case, despite the fact that it seems the research was to be carried out in this local community, it seems that no effort was made to seek input anywhere outside of Meta until after the proposal came under the scrutiny of rank and file editors. Incidentally, I noted for the first time today upon review of the proposal page at Meta with a closer eye, that we are months or weeks past when most of this project was supposed to have taken place, and that the first communications here (Dec. 18) took place weeks after the stimulus portion of the experiment was to take place. Are we entirely certain that they did not proceed with any testing before their approach came under fire? I'd very much like to know the answer to that question.
Anyway, as I was saying, we're going to have real discord (I mean Knowledge Engine levels of animosity, disruption, and distrust) if the local community and the WMF don't operate as a unified front on an issue of this importance. But that shouldn't necessarily stop us from taking preliminary steps. I don't think we would have a difficult time rallying the community to create a policy which states that no research shall be conducted here which involves human behavioural testing relating to the study activities in any way induced by the study itself unless informed consent is sought from each user utilized in said study, and that failure to do so is to be treated as refused consent for each such person.
Really that needs to be in the Terms of Use to have full efficacy (one more reason we need the WMF to hear our concerns here; perhaps it does not hurt to bring WMF Legal into the conversation at this point). But, although it would be one of those very rare policies that is more precatory to outside players than useful for internal processes, creating a community consensus document as to that principle would at least have the impact of putting researchers on notice as to how such behaviour is likely to be regarded here: that would have uncertain effects with regard to later review of their research conduct by those entities (institutional or governmental) who are capable of engaging in oversight of their work at various levels. The professional and legal implications would be quite uncertain without a document that is more expressly legally operative (such as a new section/additional language in the ToU), but given the number of potential complications that might nevertheless arise from disregarding an express statement of this nature (with regard to their institutions, any professional associations to which they belong, the OHRP and other federal regulators, state Attorneys General, and their funders/commercial partners, to name a few interested entities), such a local policy might at least give researchers pause in the future about proceeding with human testing on this platform without first obtaining consent. Since it seems we can't always trust them to exercise their professional conduct in this regard by way of their own restraint without making such a blunt statement. Snow let's rap 07:01, 6 February 2019 (UTC)
Whatever the potential legal consequences to a random 'independent' researcher who violates child protection policies by conducting behavioral research, WMF should have been well aware of child protection issues. If I conduct research in Ireland then I am required to certify whether or not it involves minors. If it does involve minors, or even worse if I admit that I would be unable to tell, then there is not a chance in hell the proposal would get through without informed consent being attached. Zero. A 'minimal risk to participants' rationale might work for adults--maybe. The WMF (and/or the researcher) would be vulnerable in multiple jurisdictions too--what passes for child protection in Pennsylvania might not work in the Netherlands, etc. I'm gobsmacked that WMF 'signed off' on this. Maybe child protection/informed consent was part of the original plan? AugusteBlanqui (talk) 10:47, 6 February 2019 (UTC)
"If I conduct research in Ireland then I am required to certify whether or not it involves minors. If it does involve minors, or even worse if I admit that I would be unable to tell, then there is not a chance in hell the proposal would get through without informed consent being attached. Zero."
It's meant to work the same way in regard to U.S. research: here are the relevant federal regulations on human testing as regards special protections for research involving minors. Note that both the assent of the child and the permission of a parent or guardian are required, and assent is defined expressly as follows: "Assent means a child's affirmative agreement to participate in research. Mere failure to object should not, absent affirmative agreement, be construed as assent." The IRB is also given the direct responsibility for ascertaining that those requirements are met. Which rather raises the question of whether the researchers here made clear in their application to the IRB that close to 25% of Wikipedians are below the required age of consent for a study of this nature (the regulations also indicate that the age of consent required is the age of consent in the jurisdiction where the research takes place). If not, it raises two unavoidable possibilities, neither one of them great: 1) they just didn't think about the ethical complications here long enough for this to occur to them, or 2) this obvious reality was known to them and they didn't disclose it. If they did in fact disclose this fact in their HS-IRB application, and the Board still let the research move forward, we're potentially talking about an even bigger problem of botched ethical controls at CMU. It's worth repeating here that Pennsylvania law also requires that any researcher having contact with children obtain a number of different certifications, and that CMU's own internal policies on this make clear that the requirement, insofar as the university is concerned, is meant to apply to those who will undertake such actions online. So I'd be very interested in knowing if these certifications were sought and granted for any party to this research who was going to (or did?) have interactions with our editors, and whether this information was presented in the IRB application or in any review meetings (also required under federal law).
And while these issues regarding minors are certainly very salient and troubling concerns with regard to the ethical controls here, I don't want to get so lost in the weeds of just one of the more eye-catching issues that it seems as if this research was ethically compliant with regard to all the other participants. Because I think it very much was not. Putting aside the question of informed consent for adults for a moment--I do not believe most researchers, institutions, or professional associations would view this as acceptable circumstances to proceed without consent, "low risk" or not, but we can table that for a moment--there are numerous other concerns, the most obvious of which is privacy. Experiments which test a subject's response to stimuli (especially those conducted without their consent or knowledge) are meant to be conducted with the utmost confidentiality; there are requirements under law, under the policies of particular research establishments, and under the conduct codes of professional associations. Here, there was absolutely zero possibility of their keeping the stimulus and response of the individual subjects confidential, since they were going to be taking place on what is arguably the world's single most open platform, where every detail of those interactions would be freely viewable to anyone with an internet connection. It is mind-boggling to me that none of this sent up red flags to any of the players involved (the researchers, their university oversight, their financial partners) before this got to Meta and Wikipedia itself. Snow let's rap 21:30, 6 February 2019 (UTC)
The process should not be limited to an ethics review; it would also need to include community-level consent. Using an analogy raised by another commenter, imagine if this experiment was proposed to the management of a small employee-owned grocery store and it was discovered that the research was funded by a big-box chain. Even if the procedure was demonstrated to be completely harmless to the participants, it would obviously not be in the best interests of the community and would certainly be rejected.
In this case, a number of editors expressed that they felt uncomfortable with any research associated with Facebook. The researchers did not seem prepared to address these concerns and seemed to think that they only had to convince us that the risk to individual participants was low. We should develop a research approval procedure that involves both an ethics review and community approval, with the understanding that community approval may be withheld for any reason just as individual consent may be withheld for any reason. –dlthewave 14:56, 6 February 2019 (UTC)
Well, that's just a separate issue from what others have been focused on here--which is not to say it is a non-issue. But I will say that the researchers here did disclose their intentions with a Meta research page--a whole month before they intended to start their research...--though, from what I have seen they did not reach out to the local community until after concerns were raised at Meta (which was, I note with concern, well after they had planned to be already underway in their research). In any event, I share your concern at the cavalier attitude displayed with regard to Facebook being a part of this research, regardless of whether it was through a grant arrangement. If it were quite literally any other company in the world, these concerns would be lessened, but given the company's recent history on privacy issues regarding third party activities that have touched upon so many concerning behaviours, I don't think any concerns in this area can ever be described as mere hand-wringing. One way to address such concerns in the future is to make funding disclosures a requisite part of any research proposal presented at Meta, along with a requirement that such proposals be advertised in major community news spaces for each local project on which research will be conducted. Indeed, when you look at those proposals, they are laughably skimpy on the details regarding parties and oversight--and indeed, give only a partial accounting of the proposed research methodology itself. Snow let's rap 21:30, 6 February 2019 (UTC)
"In any case, nothing was done, due to the ignited feed-back."
I hope that's true. It would be nice to have confirmation on that either way, given that the original proposed timeline had the researchers beginning testing on Dec. 1st, and as best I have seen, the first concerns about the research were not raised until a time after that. Snow let's rap 21:35, 6 February 2019 (UTC)
"I hope that's true. It would be nice to have confirmation." Dear User:Snow Rise. If they say they haven't, then they haven't... except if they are a bunch of liars. What is your educated guess? In any case, it would be interesting to have a study of all the barnstars that were granted in the current window of say 12 months, centered on Dec. 2018, in order to detect changes of behavior, if any. Or to have a more general study, to detect if there are monthly tendencies, or general trends or what else. How many more theses!!! Pldx1 (talk) 12:10, 9 February 2019 (UTC)
I have to be honest: I can't really tell if you're being facetious or not. For my part I wouldn't have had any reservation about believing them if they said they had not yet proceeded, despite the projected timeline. But they have never actually said as much as far as I can see, in any of the related discussions, and that fact is what inspired my response to you above. However, I also trust that J-Mo would not be asserting below that they had not yet proceeded into human subject testing in such a factual and assuring manner unless he was privy to additional knowledge beyond what is present in the previous threads (which again, do not provide clarity from either researcher as to this point).
Of course, I'm assuming a lot on good faith there--for example: 1) that Diyi and Robertkraut have not responded to pings here because they are feeling a little bruised by the whole affair and taking a wikibreak, and that they are not avoiding answering questions which they know we would not like the answers to and which they now realize could potentially have real professional consequences, 2) that J-Mo did not give assurances on a mere presumption of his, rather than predicating said assurances on additional inside knowledge that he was privy to that allows him to be certain they did not proceed with testing--but I still have enough AGF in me for each of these individuals to allow for that. I may not be blown away by every aspect of the ethical conduct of these researchers or the tone of the response of certain WMF staff members in responding to community concerns here, but my "educated guess" (as you put it) is that they would not complicate the situation further by being misleading (even through omission or assumption). Snow let's rap 15:36, 9 February 2019 (UTC)


I don't understand the purpose of this op ed. But I was involved in the discussion around this particular research proposal, I have experience performing and evaluating this kind of research, and I know the alleged perpetrators (or perhaps victims is a better term) well. So here are some facts, however unwelcome they may be to some (not all) of the people involved in this discussion, and in the related discussions on the VP and on Meta.

  1. First, to address the preceding comment: the study was not performed, and will not be performed. Robertkraut and Diyiy engaged actively and in good faith with members of English Wikipedia who expressed a range of concerns about the study, and as a result of that discussion decided not to perform the study, a decision which they conveyed promptly and through the appropriate channels. They are both competent, professional and ethical researchers and have done nothing to my knowledge that would suggest otherwise. Dr. Kraut is a founding partner for the Wikipedia Education Program, and a veteran researcher of Wikipedia and other online communities.
  2. In the case of the 2016 paper, "Supporting in part by... a grant from Google" probably means that one or more of the graduate students involved had a research fellowship from Google at the time. Graduate students in technical fields are routinely funded by research grants from public and private institutions. In general, code, data, and research reports generated by researchers funded under Google (or Facebook, or NSF) grants are publicly available, and the choice of what research to perform is not dictated by the grantmaking entity. So a researcher or team might get $100k to fund 3 graduate students (tuition+stipend) for a year, based on a proposal that says something like "We will investigate the motivations of people who contribute to online communities", but Facebook/Google generally doesn't get to tell them what communities to investigate, or what research methods to use, and doesn't get special private access to their findings.
  3. It is not clear to me why Aaron Halfaker's name (and picture) are being called out here. To me, this has the appearance of an attempt to suggest that he is involved in some sort of unethical or otherwise nefarious activities to undermine Wikipedia, and is using his position within WMF to further those activities. If this is indeed what is being suggested, it is both incorrect and, frankly, kind of gross. Aaron Halfaker probably cares more about the wellbeing of English Wikipedia than any researcher you can name, and has done as much or more to further the goals of the project and benefit its participants—in both a volunteer and WMF staff capacity.
  4. IRBs assess potential for harm, according to evidence-based risk assessment criteria. Without knowing the details of CMU's IRB's response to Dr. Kraut's research proposal, I cannot comment directly on the issue of "how could the IRB let this happen?" But I can say that, to my knowledge, sending people templated expressions of gratitude is unlikely to cause them harm. Potential for online community disruption is out of the scope of IRBs, which is why we have our own documented processes for assessing potential for community disruption when we vet research proposals.
  5. Those processes are effective (as they were in this case) only insomuch as the researchers are willing to comply with them (as they did in this case). However, when researchers are subjected to personal attacks and bad faith accusations (as they were, by some editors, in this case) and drummed off Wikipedia, it undermines the authority of the process we ourselves created. A different, less professional and ethical, set of researchers who want to perform a study on Wikipedia may be less inclined to tell us about it after seeing how these researchers were treated. We as a community have very few effective protections against this kind of behavior. And we already have our hands full with legitimate vandalism, COI editors, and (at least potentially) coordinated attempts to sow disinformation in order to further the aims of state and non-state actors.
  6. The real clear and present danger that bad researchers present to Wikipedia is what they do with editors' non-public data. If someone is running a survey, or conducting interviews or otherwise collecting personal information about contributors, they need to have a clear statement of why that data is necessary to collect, how it will be used, how it will be securely stored, anonymized, etc., who has access, and how long it will be kept. Good faith, professional researchers will have clear answers to these questions. Ideally, their data practices should be verifiable by an external authority (IRBs are good at this). Bad faith researchers, or even simply naive researchers who didn't spend years in grad school, may not. Treat bad answers as red flags.
  7. Research benefits Wikipedia directly. We know what we know about the gender gap and the editor decline because of research. Intervention-style research can be an invaluable tool for figuring out how to address pressing issues like the gender gap, knowledge gaps, the editor decline, toxic cultures, vandalism, disinformation, editor burnout, and systemic bias. Some kinds of interventions don't work as well if everyone knows they're being 'intervened' with. Not having that knowledge can have a range of effects, from null/negligible to pronounced. And sometimes we don't like being 'intervened' with even if we believe the intervention isn't likely to be harmful. The potential benefits and risks of any particular intervention should be weighed based on a reasoned assessment of the nature, and scale, of both intended and unintended consequences. We have processes for that, but those processes depend entirely on good faith collaboration between researchers and community members.
  8. Research furthers Wikipedia's mission. In order to make the "sum of all human knowledge" available to everyone we need to understand how Wikipedia came to be, how it works, and even how it doesn't work. Researchers shouldn't expect that they can use "but we make science!" as a blanket excuse to do whatever they want, but the potential mission-aligned benefits of understanding this or that social or psychological feature of Wikipedia (and the people who write it) are valid points for consideration when weighing risk vs. reward.

Finally, an appeal: assume good faith of researchers who approach the Wikipedia community openly and honestly. Recognize that your own preconceived notions about what a researcher wants, or what affiliation and funding sources they have, may be incorrect or incomplete. Ask questions, but try to ask them like you'd interview a job candidate rather than like you'd interrogate a criminal suspect. You can't stop truly nefarious researchers, at least not in a systematic way. You can teach good faith researchers how to respect community norms. And you can tell them "no" and trust they will comply with community decisions. But treat them all as de-facto enemies, and you lose all the potential benefits of research while reaping none of the rewards and doing nothing to curb risks of individual harm or community disruption. J-Mo 23:51, 6 February 2019 (UTC)

Exceptionally long post: enter at risk of eye strain. Snow let's rap 04:15, 7 February 2019 (UTC)
J-Mo, I'll attempt to respond to your points in the order you have raised the issues, to the extent it is feasible while summarizing what I believe are the concerns that have been raised here.
1. First off, thank you for confirming the research did not proceed as planned before community concerns began to surface; based on the timeline presented in the Meta proposal and the date at which push-back began to develop from the community, that was not at all clear. I was hoping Robertkraut or Diyi would speak to that question, but given the strong sense of certainty you provide in your assurances, I assume you are privy to additional information (that was not made public in the Meta or VP discussion) as to where in their process they stopped. Therefore if you are saying they ceased pursuing this project before any engagement in human subject testing, I'm sure we can of course take you at your word about that and put those concerns to bed. I would note, however, that this episode underscores a need for the local communities who are to be the subject of research to be directly informed of such proposals so that objections can be raised much sooner--rather than just before (or during) the actual research itself. If the goal is to solicit the community's feedback on the proposal, a page on meta, absent promotion on the target project itself, is never going to be very effective in addressing concerns before they become urgent. That's the first thing that needs to change in our procedures.
That issue addressed, I must tell you that I nevertheless do not at this moment in time have as rosy an outlook as you do with regard to the professionalism and ethics displayed in how this research was approached. I'm going to hazard a guess here, based upon your previous comments in the VP discussion and your current assessment here, that you have only ever been a 'researcher' in the commercial/private meaning of that term. Because, had you ever undertaken research in the behavioural sciences in an academic setting, I believe you would better recognize why there are some serious questions here with regard to how these researchers approached issues such as informed consent and privacy protections for human subjects. IRB approval and hand-waving regarding "low risk" or not, I must tell you that approaching subjects in this fashion would not generally be seen as acceptable by most researchers in the social, behavioural, and psychological sciences--nor by most institutions and professional associations that provide oversight for such research. Indeed, going even farther, I believe this research, had it proceeded, could have run afoul of federal regulations (and potentially state statutes) governing the testing of human subjects--particularly with regard to privacy protections and (even more so) the use of underage subjects, for whom the assent of the subject and permission of their parent or guardian is always required (outside a handful of exceptions which do not apply here) and cannot be assumed. It is for exactly this reason that I wonder if the IRB was given all salient information here when making their determination, because I have a hard time seeing how they would approve this research if they knew that nearly a fourth of Wikipedia's editors are below the requisite age of independent consent that is relevant to this particular research.
You have spoken repeatedly in the previous discussions and here about other "potentially less ethical" researchers invading the project if we do not present a welcoming front to those willing to submit proposals. But it is worth noting that in every example you have provided thus far, the research in question at least made the individual being approached aware of the fact that they were talking to a researcher, and sought their willing engagement with the process. While I agree that the examples you provide nevertheless present issues that we as a community (and individuals) should be concerned about, such voluntary procedures--those which use surveys and passive studies of previous (non-induced) data--are considered by oversight entities (both governmental and institutional) to be fundamentally different from the process of exposing a subject to a test stimulus and then observing their reaction. These types of experiments are generally classed separately and, even in the rare case where an exception for consent might be permissible, that exception is not made for minors, and there must be controls for the protection of the privacy of all subjects--something that would have been infeasible on a platform such as this. My main point here under this first section of response being that the ethical questions that are at least raised here are not by any means trivial ones, and they aren't the type you should be eager to dismiss simply by repeatedly re-asserting that you personally think they are well balanced to achieve benefits with "low risk".
2. This is not really where my main concerns lie, and obviously I cannot speak on behalf of those who have raised these concerns. But I will say that your assertion that Facebook does not typically get privileged access to data in its grant agreements is by no means a universal principle--to be fair to you, you did throw in the "generally" there, but I think the general thrust of your statements in this area attempts to provide a degree of assurance that is undue given Facebook's historical (and indeed recent) practices--especially insofar as I presume that you have no particular knowledge as to what degree of data sharing was agreed to with regard to this particular grant. In fact, this is a big problem for us in general, and I don't see any reason why we should not require disclosures of both financial backing and data-sharing arrangements made by any researcher wishing to advance a proposal here; there's no reason they shouldn't be required to show the same level of transparency towards us as they do the review boards at their respective institutions. As a project, Wikipedia has as much skin in the game (including potential liabilities) as any party, and if researchers wish to avail themselves of this platform for their research, they can be up front with us about anything that might look like a conflict of interest or source of potential exposure for the privacy and personal data of our community members.
3. I'm not sure as to that myself. I presume (absent any information to the contrary) that the previous research used sourced data rather than direct human subject testing, and so it is not super relevant, other than Kudpung's stated purpose in showing a previous close working relationship between Mr. Halfaker and the researchers here. However, I suspect part of the reason this was raised was because it seemed as if the WMF's researchers were circling the wagons to insulate the study's researchers (and the proposal itself) from criticism. As someone who did not participate in that thread and now is on the outside looking in, I must tell you that it's very difficult to tell how much you and EpochFail were commenting as community members there and to what degree you were speaking in your WMF capacities, which I'm sure you will agree is potentially problematic. Further, there are places there where I would describe your comments as needlessly antagonistic towards expressed community concerns. I understand that this was obviously motivated by a desire to protect a pair of individuals whom you respect and whom you felt had acted in good faith. But the most ideal way of doing this is not to accuse others of bad faith, as you did during that discussion and elsewhere. I see no one in the entirety of that thread who seemed to be acting out of anything but concern for the project and its users, or in any other way which would entail "bad faith" as that term is usually used on this project (vandalism, trolling, gamesmanship, sockpuppetry, etc.). Even where they were focusing on issues that you and I may agree were not the most salient issues to contemplate (devaluation of the barnstar and so forth) I wouldn't say that they were completely irrelevant concerns for the community--and, in any event, the community members were obviously being sincere. I think that "bad faith" wording and some other comments represent poorly chosen language on your part that may have served to inflame perceptions of bias on the part of the WMF researchers in this situation.
4. You're right, we don't have access to the IRB's thinking, and in my view, that's another problem. Before we ever consider human testing research on this project again, we may very well consider requiring that exact information; IRBs are required by law to keep minutes of their research review meetings in a strictly prescribed format, and we could consider requiring a copy of these documents be presented with any human subject proposals in the future. After all, we have our own ethical obligations to our community members and I see no reason why we should have less access to the researchers' accounting of the ethical questions raised by their study and how they intend to control for them, for the purposes of making our own decision on whether to allow it to proceed.
5. & 6. I'm not sure just how effective our procedures are here; from where I am standing, they could do with some strengthening, and this situation demonstrates precisely why. I also feel like you are presenting us with a false choice here between relaxing protections to make "good actors" feel welcome and actively driving them underground. First off, if they are truly good actors as that term should be applied to behavioural researchers, they wouldn't be inclined to subvert our rules and normal ethical considerations based on how warm a welcome they receive. If a given researcher can't be trusted to comport themselves with our community rules and the mandates of their own profession with regard to ethical research, simply because they are concerned about being grilled here, and they would consider just ignoring our processes instead...then they certainly can't be trusted with the much more demanding responsibilities of protecting user confidentiality and seeking proper informed consent--and they therefore aren't the type of person we should be tailoring our approach towards in any event.
Also, I disagree that there's nothing to be done about the "bad actors". Where they are simply hoovering up information, of course we can't stop that, but there's also no reason to stop them, insofar as everyone who participates on this project agrees to allow their contributions and statements to be freely accessible and usable for almost all purposes. But where an outside researcher is trying to trigger a response, that kind of activity is going to leave a record and people are going to notice suspicious patterns. If "nefarious" researchers attempt this without having their projects approved by the community and seeking informed consent, we can shut them down just like we would any other WP:disruptive user. And supposing that they did get past our guard and engage in shady behaviour and they are academics, as soon as they publish or present findings, they can be reported at many different levels of oversight that will have potential professional complications for them--depending on the exact nature of their conduct. Commercial researchers, of course, are a little less amenable to such controls unless they do something blatantly illegal or which would bring them negative press. But commercial researchers (to the extent they come here, which I think is uncertain) are probably not likely to come through the approval process in any event, and there's no point in trying to adjust it to their whims.
7. & 8. Good faith is a two way street. I can't disagree with you that research is of vital importance to us, but it has to be approached in a non-disruptive and ethical fashion or else it will be a net negative to the project. And no researchers should ever be allowed to waltz through the front door to conduct whatever tests they want on our contributors, based solely on their own idiosyncratic analysis (not even when informed by their own IRB process) as to whether the risks and consequences outweigh the benefits. The community should always conduct its own analysis of that question, and should be afforded a high degree of transparency with regard to the researchers' intentions, methodologies, previous institutional reviews, the uses to which their data will be put (including especially with whom and in what way confidential information will be shared), and any potential conflicts of interest. And regardless of the answers to those questions, where their work involves treating our community members as test subjects, we should always, without a single exception, require that they get informed consent from anybody they wish to utilize in that fashion. For anyone who is unwilling to meet those requirements, WP:NOTLAB applies and accounts operating outside of our policies should be shut down, same as with any other disruptive user.
As to your final paragraph I agree with you thoroughly. I would only add that I don't think anyone has treated the researchers here as the "de-facto enemy". The concerns raised have been reasonable and in keeping with the objective of learning from this episode and designing more robust procedures that will benefit both researchers and the community. Nobody's objective, insofar as I have seen, is to "shame" anyone. But there are serious questions raised here as regards respecting the privacy and autonomy rights of volunteers to this project. They are necessary questions to contemplate whenever we consider approving research on this platform. Snow let's rap 04:15, 7 February 2019 (UTC)
Snow Rise, I was unable to keep my response brief, and it felt weird to post another wall of text in the "comments" section of this Op Ed, so I decided to post it on your talkpage instead. Cheers, J-Mo 00:56, 11 February 2019 (UTC)

There was actually a similar experiment at the German Wikipedia that showed that barnstars increase new editor retention. I wonder what the difference was between the two that only one was allowed to go forwards. In addition, there's also m:Research:Testing capacity of expressions of gratitude to enhance experience and motivation of editors. What determines whether or not a given experiment like this will be permitted? By the way, should this talk page be broken into sections? It seems to have gotten pretty long. Care to differ or discuss with me? The Nth User 16:15, 28 April 2019 (UTC)




       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0