The Signpost

In the news

WikiLeaks and Wikipedia; Google–WP collaboration to translate health information

By Wackywace, Tilman Bayer, and Tony1

Difficult relationship between WikiLeaks and Wikipedia

The logo of Wikileaks

As reported in last week's "In the news", the continuing media attention to Julian Assange's website WikiLeaks (which recently published thousands of documents revealing what The Guardian newspaper called "the true Afghan war") has had an adverse effect on Jimmy Wales and Sue Gardner,[1] who both reported that they frequently had to deal with people who mistakenly assumed that they were involved in Wikileaks. In an interview with The Independent, Wales said he was "fed up" with the volume of e-mails he was receiving that attacked him for "putting the lives of thousands of US troops at risk", but that his reaction was to "just roll my eyes, chuckle to myself and tell them they've got the wrong man." Somewhat in agreement with Assange, who had responded to Gardner's statements by pointing out that "wiki" was around a long time before "Wikipedia", Wales said that he could not just copyright the word 'wiki', because he did not want to prevent other people from starting wikis. He said it is not the Wikimedia Foundation's "style" to "come to blows with Wikileaks in court". He did say, however, that he thinks "the issue of having to protect our name is something I can anticipate coming up again in the future."

Both Wales' concern and the public confusion between the two "web behemoths" (The Independent) are nothing new. On January 3, 2007 (the day of the first public report of Wikileaks' existence), the domains wikileaks.com, wikileaks.us, wikileaks.biz and wikileaks.net were registered, apparently by Wales out of concern about the name. To this day, they belong to Wales' company Wikia, although they now appear to be used by Wikileaks in addition to its main site wikileaks.org.

Around that time, Wikileaks described itself as "an uncensorable version of Wikipedia for untraceable mass document leaking and analysis."[2] According to internal emails published by Wikileaks co-founder John Young on his own leaking site, Cryptome, the Wikileaks founders deliberately used Wikipedia's fame to draw attention to themselves among "1000 other organizations jostling for time" at the January 2007 World Social Forum in Kenya, where Assange lived around that time: "even those [people who aren't net-savvy] can't but help hearing about the wikpedia."

Julian Assange (2009)
Although Wikileaks' founders were sometimes quoted at the time as having "no ties" with Wikipedia, there was widespread confusion between the two. German Wikipedian Avatar described a particularly egregious example in his blog: a commentary published in the January 25, 2007, print edition of Die Welt, one of Germany's major broadsheet newspapers, called Wikileaks "the youngest branch" of Wikipedia and went on to explain that "the Internet encyclopedists with their exhibitionist will for subversion now want to spread inaccessible, allegedly suppressed knowledge, too", which "perfectly matches the Wikipedians' weird image, who always communicate the image of being a grassroots democracy in their content, but often appear to be anarchist or even chaotic in their methods." (After Avatar complained to the newspaper, the online version was quickly corrected.)

Back to 2010: In a post on his "The Wikipedian" blog some weeks ago, titled "WikiLeaks: No Wiki, Just Leaks", William Beutler (User:WWB) speculated "that the site was so named to borrow from the credibility enjoyed (and earned) by Wikipedia". He observed that while Wikileaks was running on the MediaWiki software, all participatory functions of the software seemed to be disabled, along with those providing transparency (history and recent changes).

However, Assange initially did intend to model Wikileaks after Wikipedia's collaborative processes, according to his remarks at a symposium at the Berkeley Graduate School of Journalism on April 18 (video, from 36:30, or on YouTube), where he explained what made him change his mind:

Wikipedia has at least twice been the subject of leaks published by Wikileaks. In 2009, an archive file was uploaded to Wikileaks that was described as containing postings from a private mailing list of some Wikipedia editors, several of whom were later sanctioned by ArbCom due to their involvement in the list. In November of the same year, during a public debate in Germany about deletionism on Wikipedia (see Signpost coverage: "German Wikipedia under fire from inclusionists"), Wikileaks published a copy of one of the articles in question, which had been deleted for notability reasons. And in December 2007, Assange had generated media coverage with a report titled Wikileaks busts Gitmo propaganda team, largely containing Wikipedia edits carried out under an IP belonging to the Joint Task Force Guantanamo (but that connection had already been noted four months earlier by User:Computerjoe).

At the same time, Wikileaks and Assange appear to have been concerned about their coverage on Wikipedia. On April 9, a Twitter message by Wikileaks read "WL opponents seem to have created Julian's Wikipedia page. For ethical reasons we can't edit. Please fix". (Privatemusings, who had started the article Julian Assange, replied that he was "not sure where that's coming from".) On April 21, Assange posted comments on the article's talk page itself, again acknowledging that "it would not be ethical for me to edit this article directly" but stating that "The nature of my work, exposing abuses by powerful organizations and nation states, tends to attract attacks on my person as a way to color debate. The history of this page has numerous examples" (without naming any) and voicing concern that a portrait photo used in the article "tends to undermine my message" (he later donated a publicity image under a free license). On August 16, Assange said "there are frequent attempts by military apologists and others to manipulate our Wikipedia pages".


Major Google–Wikipedia translation project: Health Speaks

Hospitals and doctors are unavailable to much of humanity, so providing free, accessible health-related information in local languages could have a major impact on health outcomes.
Gaining free access to online information could make a huge difference to the lives of many people in developing countries. Jennifer Haroon, Google.org's Manager of Health Initiatives, says that "in most parts of the world, ... quality information that would help people improve their health is not available online in local languages". She points out that as far back as 2004, the prestigious British medical journal The Lancet ran an article describing the lack of access to health information in local languages as a "major barrier to knowledge-based healthcare in developing countries [and that] among currently available technologies, only the Internet has the potential to deliver universal access to up-to-date healthcare information."

Fast-forward to 2010, and Google.org has announced on its home page a new collaboration with the English Wikipedia's WikiProject Medicine on an initiative called Health Speaks. Haroon says the aim is to support pilot translation projects in which "volunteers translate health articles into [Arabic, Hindi and Swahili] and publish them online on Wikipedia for all to access". The project will explore active cooperation between professional medical editors (hired by Google) and any interested Wikipedians to further improve the quality of articles selected by Google.org. Haroon says, "We have chosen hundreds of good quality English language health articles from Wikipedia that we hope will be translated with the assistance of Google Translator Toolkit, made locally relevant, reviewed and then published to the corresponding local language Wikipedia site." The countries targeted are Kenya, Egypt, Tanzania, and India; among those encouraged to register for volunteer translation and translation review are medical, nursing and public health students; health professionals; health NGO employees; and other students.

Volunteers will work from a dedicated site for each language, with access to an ongoing table of articles and their translators'/reviewers' usernames. Volunteers may use the online Google Translator Toolkit to assist their task, and can register for news updates and information about training events. For the first two months, Google will donate 3 US cents for each English word translated to each of three non-profit medical institutions in India, Egypt, and East Africa, up to $50,000 per institution; these institutions will provide local support for the program.

The collaboration follows activities by Google.org's parent company Google, which likewise uses the Translator Toolkit to translate Wikipedia articles (Signpost coverage: Google uses machine translation to increase content on smaller Wikipedias and Wikipedia, Google Translate and Wikimedia's India strategy).

Wikipedia's health-science experts comment

Tim Vickers is a US-based biochemist and a key player in our Google Project Taskforce. He says the main issues for Wikipedia will be "improving the quality of the articles as much as possible before they are translated, and trying to tone down the US-centred approach we commonly take to topics." Fvasconcellos is a professional translator and an experienced member of both WP:MED and WikiProject Pharmacology. As a translator, he is glad of two things: the first is Google's choice to use Google Translator Toolkit rather than Google Translate. "The Toolkit is basically a postediting suite, and thus allows articles to be translated more easily and quickly than if the job was done manually, while reducing much of the inevitable inaccuracy of machine translation." The second thing is Google's use of "folks who know what they're doing—the usernames suggest many translators and reviewers are medical professionals. Professional input is of the utmost importance when translating medical and health-related content, as the pitfalls and potential implications of inaccuracy can be far more significant than in other fields.... when you're talking about drug dosage, symptoms, or how to prevent infectious diseases, quality assurance can become a matter of life or death."

Vasconcellos says, "I do think there is a risk that talk pages will become magnets for misguided medical advice, possibly dangerous anecdotal information, and quackery. This happens in the English Wikipedia all the time [and] should certainly be on the minds of all those involved. What's the user base like? Will knowledgeable users be watching the articles for misguided edits, and the article talk pages for inappropriate content?" Kilbad (of the Dermatology task force) says: "As the coverage of medicine-related content continues to grow and improve on Wikipedia, the number of talk-page requests for medical advice will likely increase. [If this occurs, it will be] important to make Wikipedia's medical disclaimer (WP:MEDICAL) a prominent feature of these pages to discourage the giving and/or asking of advice". However, Tim Vickers feels the "talk-page advice" problem is not that common on enwiki pages, and is likely to be "similarly rare in other languages. Most of the important medical articles are on multiple watchlists, so this won't be a big job."

To Vasconcellos, there are more significant issues. "What will happen when well-meaning users start adding sources that enwiki would not use due to quality concerns? Arguments on the appropriateness of sources happen every day at enwiki. How will they be handled? Will they happen at all, or will articles grow and evolve in an unorganized fashion, with questionable content added and left to 'fester' unchecked?" And "how will traditional medical practices be dealt with? How do these Wikipedias deal with this sort of information at the moment? Is it allowed? Is it given equal footing with (Western) science-based content? We strive to make the medical content of enwiki evidence-based, and by that I mean based on recent, high-quality scientific sources. Is that practice followed at Hindi Wikipedia, for instance, and if not, should it be? Should content based on traditional medical views of malaria or diabetes, for instance, be added/allowed/encouraged for cultural reasons?"

For the Wikimedia Foundation, working through these and related issues is likely to make Health Speaks a fascinating venture into cross-cultural and cross-linguistic management. If it succeeds, Google hopes the program will grow into a larger scheme involving more languages and more articles.


Briefly


Discuss this story


Wikileaks

Wikileaks have a page about contributing at http://wikileaks.org/wiki/WikiLeaks:Writer%27s_Kit. PhilKnight (talk) 09:41, 7 September 2010 (UTC)

Not big payers, are they. You'd hope your investigative costs were insignificant. Tony (talk) 10:41, 7 September 2010 (UTC)
Based on their talk page, I gather they aren't paying anything. PhilKnight (talk) 10:46, 7 September 2010 (UTC)
I have to say, I wish Jimmy hadn't registered those Wikileak domain names. It rather muddies the waters. I think what we should be hoping for is that 'Wiki...' as a prefix becomes so widespread that we wouldn't attach it to any one movement any more than we would '... .com' or '.org' --bodnotbod (talk) 09:41, 8 September 2010 (UTC)


Note: The Wikipedia-related leaks/reports linked in the article have since been removed from wikileaks.org (http://wikileaks.org/wiki/Wikileaks_busts_Gitmo_propaganda_team and http://wikileaks.org/wiki/Mogis_Wikipedia_article_and_history_before_removal,_Nov_2009 are currently giving 404 error messages), as have many other previous leaks. For the time being, the description page for the first is still readable in Google's cache. A copy of the second appears to be here.

Jimmy Wales has since clarified the issue of the domain names:

"The domain names were legally transferred to Wikileaks a long time ago, but for unknown reasons, Wikileaks never completed the technical aspects of the transfer. Wikia has made multiple requests to them to do so, with no result yet."

(In a recent interview on the Charlie Rose show, he mentioned having been in contact with Assange about the issue.)

And in the comments to WWB's blog posting, a reader linked to a historical version of the Wikileaks FAQ which illustrates the changes in policy mentioned in this Signpost article more clearly:

Regards, HaeB (talk) 11:21, 30 November 2010 (UTC)

Campaigns Wikia

Campaigns Wikia currently redirects to Wikia, but that is not what we are looking for. Can we pipe the link instead to say a webpage about Campaigns Wikia, or its article (if there is one)? ANGCHENRUI Talk 11:09, 7 September 2010 (UTC)

I've linked to http://campaigns.wikia.com/wiki/Campaigns_Wikia. PhilKnight (talk) 20:49, 7 September 2010 (UTC)

"Copyrighting" a word

"Wales said that he could not just copyright the word 'wiki', because he did not want to prevent other people from starting wikis." Surely Wales meant he could not just trademark the word 'wiki', right? TJRC (talk) 01:05, 8 September 2010 (UTC)

Google Translating medical articles

I would have thought that we'd classify medical articles as the very last category of articles we would want to apply machine translation to, because of the sorry state of the latter. Comet Tuttle (talk) 03:17, 9 September 2010 (UTC)

I believe it is strictly as an adjunct to human translation. Tony (talk) 03:45, 9 September 2010 (UTC)
Yes indeed. While still incipient in many ways, the practice of postediting—that is, having real translators rework and rewrite a machine translation into an accurate and functionally equivalent text—is increasingly common and is being used to great effect in several fields. With a good system in the hands of a good translator, it is a surprisingly powerful tool. Some extremely high-volume translation work, such as that required by international organizations, would be next to impossible without the aid of machine translation. Emphasis on aid. "Raw" MT is indeed a disaster in most contexts and, unsurprisingly, its use often has equally disastrous consequences. Fvasconcellos (t·c) 02:26, 10 September 2010 (UTC)
What Fv says. And to reinforce his point, as I understand it, speed is one of the big issues. It is much faster to fix up a bad machine translation than to translate from scratch. Editing the bad into the good actually allows the translator to focus on different things in the relationship between the original and the translated equivalent: subtle nuances that are more likely to be drowned in the pure grunt work of translating from scratch. However, fixing (bad) machine translation is no bed of roses: it involves lower-level work, but just less of it by proportion. As with all translation, it works very well if the translator knows the original language reasonably well and the target language very well.
It occurs to me that as this project gains momentum, some of the linguistically inclined volunteers and Google-paid professionals might collaborate to produce a guideline specific to the task of translating medical texts from English to other languages with machine assistance. Perhaps those new to the task can be warned of pitfalls, of common quirks thrown up by the machine process, of things to look out for. It could be partly generic and partly language-specific. It could be combined with a guideline on the cultural sensitivities of the target readers, and how to handle the vexed issues raised in the Signpost article above concerning traditional medicine. Tony (talk) 02:44, 10 September 2010 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0