The Signpost
Single-page Edition
WP:POST/1
25 December 2013

Recent research
Cross-language editors, election predictions, vandalism experiments
Featured content
Drunken birds and treasonous kings
Discussion report
Draft namespace, VisualEditor meetings
WikiProject report
More Great WikiProject Logos
News and notes
IEG round 2 funding rewards diverse ambitions
Technology report
OAuth: future of user-designed tools
 

2013-12-25

Cross-language editors, election predictions, vandalism experiments

By Daniel Mietchen, Maximilian Klein, Piotr Konieczny and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Cohort of cross-language Wikipedia editors analyzed

Network graph of the cross-language Wikipedia edits analyzed in the study.
The same network, with the node for the English Wikipedia removed.

Analyzing edits to the then 46 largest Wikipedias between July 9 and August 8, 2013, a study[1] identified a set of about 8,000 contributors (labeled multilingual) with a global user account who had edited more than one of these language versions (excluding Simple English, which was treated separately) in that time frame. It tested five hypotheses about cross-language editing and editors and looked, for instance, at the proportion of contributions that any of these Wikipedias receives from multilingual editors versus contributions from those only editing one language version. The research found that Esperanto and Malay stand out with a high proportion of contributions from multilinguals and, at the other end, that Japanese has few contributions from multilinguals. Overall, in terms of edits per user, multilingual users made more than twice as many contributions to the study corpus as monolinguals did; they often work on the same topics across languages; and in any given language, they frequently edit articles not edited by monolinguals during the one-month period analyzed here. They thus serve a bridging function between languages.
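The cohort split described above can be illustrated with a small sketch. This is not the study's code: the function names, the input format (one `(user, language)` pair per edit) and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict

def split_cohorts(edits):
    """edits: list of (user, language_edition) pairs, one pair per edit."""
    langs_by_user = defaultdict(set)
    edits_by_user = defaultdict(int)
    for user, lang in edits:
        langs_by_user[user].add(lang)
        edits_by_user[user] += 1
    # "Multilingual" here means: edited more than one language edition.
    multilingual = {u for u, langs in langs_by_user.items() if len(langs) > 1}
    monolingual = set(langs_by_user) - multilingual
    return multilingual, monolingual, dict(edits_by_user)

def mean_edits(users, edits_by_user):
    """Average number of edits per user in a cohort."""
    return sum(edits_by_user[u] for u in users) / len(users) if users else 0.0

def multilingual_share(edits, multilingual):
    """Per language edition, the fraction of edits made by multilingual users."""
    total = defaultdict(int)
    from_multi = defaultdict(int)
    for user, lang in edits:
        total[lang] += 1
        if user in multilingual:
            from_multi[lang] += 1
    return {lang: from_multi[lang] / total[lang] for lang in total}

# Made-up toy edit log, not data from the study.
toy_edits = [("u1", "en"), ("u1", "eo"), ("u1", "eo"),
             ("u2", "en"), ("u3", "ja"), ("u3", "ja")]
multi, mono, counts = split_cohorts(toy_edits)
shares = multilingual_share(toy_edits, multi)
```

On real data the same two quantities (edits per user by cohort, and the multilingual share per language edition) would be computed over a month of edits from all 46 editions.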

Two existing write-ups are good starting points for putting the study in context.[supp 1][supp 2] In the long run, it would be interesting to extend the research to (a) cover a longer time span, (b) include contributions from non-registered users, despite technical difficulties, (c) include smaller Wikipedias, and (d) explore the effects of that bridging function in more detail, perhaps in search of ways to support its beneficial effects while minimizing the non-beneficial ones. It would also be interesting to focus on some aspects of those multilingual users (e.g. how the languages they edit in match the languages they display on their user pages) or their contributions (e.g. how their contributions to text, illustrations, references, links, templates, categories or talk page discussions differ across languages, or how contributions from multilinguals differ across topics or between pages with high and low traffic), or to entertain ideas for a multilingual version of editing tools like User:SuggestBot. The paper is one of the first to make use of Wikidata; comparing such cross-lingual Wikipedia contributions with contributions to multilingual projects like Wikidata and Commons may also be a fruitful avenue for further research. (See also earlier coverage of a CSCW paper about a similar topic: "Activity of content translators on Wikipedia examined")

Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK

A new paper on arXiv asks the question "Can electoral popularity be predicted using socially generated big data?"[2] Operating on the assumption that "sentiment data is implied in information seeking behaviour," the authors Yasseri and Bright compare Wikipedia page views and Google search trends to election outcomes in Iran, Germany and the UK. In Iran and the UK, where the researchers were able to use the articles of individual politicians, the page view and search trend data correctly pick the winners of the elections. In the UK, the data even correctly pick the order of the runners-up, but the same is not true for Iran. In the German case, no correlation is found between search data and election results. Yasseri and Bright defer to the argument from previous studies on Twitter prediction, which conclude that the sample data is too self-selecting. Overall, it is shown that "people do not simply search in the same proportions that they vote." Still, the researchers note that these techniques react "quickly to the emergence of new 'insurgent' candidates."
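The two checks reported above, whether attention data pick the winner and whether they also reproduce the order of the runners-up, can be sketched as follows. The candidate labels and all numbers are invented placeholders, not figures from the paper, and the helper names are hypothetical.

```python
def ranking(scores):
    """Return candidate names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def compare(attention, votes):
    """Compare a ranking by attention (e.g. page views) with the vote ranking."""
    predicted = ranking(attention)
    actual = ranking(votes)
    return {
        "winner_correct": predicted[0] == actual[0],
        "full_order_correct": predicted == actual,
    }

# Hypothetical placeholder numbers, for illustration only.
page_views = {"A": 120_000, "B": 45_000, "C": 60_000}
vote_share = {"A": 0.48, "B": 0.30, "C": 0.22}
result = compare(page_views, vote_share)
```

In this toy case the winner is picked correctly while the runners-up are in the wrong order, which is the pattern the review describes for Iran.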

Integrity of Wikipedia and Wikipedia research

A book titled Confidentiality and Integrity in Crowdsourcing Systems contains a chapter on the integrity of the English Wikipedia as a case study of integrity management in crowdsourcing systems.[3] To test the integrity of Wikipedia, the authors first tried to start a new article with "invalid content" (it got deleted) and then turned to vandalizing pages systematically, both of which violate Wikipedia policies (cf. Wikipedia:Vandalism). They noted that simple cases were caught by automated counter-vandalism tools (ClueBot and XLinkBot, whose user pages – one of them with a typo – are the only references cited in the chapter), whereas more subtle cases ("incorrect information containing words related to the page’s topic" or adding external links present in related Wikipedia articles) were not. No indication was given as to whether these inappropriate edits had later been removed (by the authors themselves or by other users), nor what the affected pages were or what IP address(es) they had used to make those edits.

In a next step, the authors went through dumps of the English Wikipedia from 2001 to 2011 and analyzed revision histories for "100 good and featured articles" (which refers to Wikipedia:Good articles and Wikipedia:Featured articles – later, they call this set "high-quality articles") and "100 non-featured articles" (by which they mean neither good nor featured – later, they refer to this set as "low-quality articles"). In this sample (of which no further details are given), they observed that the number of contributions to high-quality articles is about one order of magnitude higher than that of low-quality articles and "that there is a highly active group of contributors involved from the creation of high quality articles until present", while most editors to low-quality articles never contributed to those pages again. They then looked at revert rates, at the overlap between sets of top contributors to a given article across years, and at the range of topics edited by top contributors to an article, observing that "the top contributors have become the owners of high quality articles and their engagement has increased" (which runs contrary to WP:OWN), "[T]his results in higher quality for a small portion of articles in Wikipedia" and "[T]op contributors of high quality articles are more like-minded than the top contributors of low quality articles", concluding "that the main difference between low quality and featured articles is the number of contributions."
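The chapter does not say how the overlap between yearly sets of top contributors was measured. As a rough sketch of one plausible approach, the example below takes the k most active editors of an article per year and compares two years with the Jaccard index; the choice of Jaccard, the function names and the toy revision list are all assumptions, not the authors' method.

```python
from collections import Counter

def top_contributors(revisions, year, k=10):
    """revisions: iterable of (year, editor) pairs for a single article."""
    counts = Counter(editor for y, editor in revisions if y == year)
    return {editor for editor, _ in counts.most_common(k)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up toy revision history, for illustration only.
toy_revisions = [(2010, "x"), (2010, "x"), (2010, "y"),
                 (2011, "x"), (2011, "z")]
top_2010 = top_contributors(toy_revisions, 2010, k=2)
top_2011 = top_contributors(toy_revisions, 2011, k=2)
overlap = jaccard(top_2010, top_2011)
```

A value near 1 across consecutive years would correspond to the "permanent set of contributors" the authors describe; a value near 0 would indicate a constantly changing group.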

From that, they venture into extrapolating to crowdsourcing systems more generally: "[w]e observe that to have higher integrity in crowdsourcing systems, we need to have a permanent set of contributors who are dedicated for maintaining the quality of the contributions to the articles. For systems with open access such as Wikipedia, this can be a huge burden for the permanent editors. Therefore, we need new mechanisms for coordinating the activities in a crowdsourcing information system." No discussion of these new mechanisms is offered.

The chapter has a few simple tables and plots but no link to the underlying data or the code used for the analysis, and no links to relevant literature or Wikipedia policies, and it is paywalled behind a price tag of $29.95 / €24.95 / £19.95. Given that the experimental edits to Wikipedia actually damaged the project, it is hard to imagine that an ethical review panel involving Wikipedians would have approved the study in that form. In fact, such a panel does exist in the form of the Research Committee, which had not been contacted about the project. Considering further that the conclusions of the study are not new, that their possibly interesting implications for crowdsourcing more generally are not discussed, and that neither the paper nor its materials are available to those concerned about the integrity of Wikipedia, it is hard to see any benefit of this study that would outweigh the damage it caused (cf. earlier coverage: "Link spam research with controversial genesis but useful results", "Traffic analysis report and research ethics").

Briefly

References

  1. ^ Hale, Scott A. (2013). "Multilinguals and Wikipedia Editing". Proceedings of the 2014 ACM conference on Web science - Web Sci '14. pp. 99–108. arXiv:1312.0976. doi:10.1145/2615569.2615684. ISBN 9781450326223.
  2. ^ Taha Yasseri; Jonathan Bright (2013). "Can electoral popularity be predicted using socially generated big data?". arXiv:1312.2818 [physics.soc-ph].
  3. ^ Ranj Bar, A.; Maheswaran, M. (2014). "Case Study: Integrity of Wikipedia Articles". Confidentiality and Integrity in Crowdsourcing Systems. SpringerBriefs in Applied Sciences and Technology. p. 59. doi:10.1007/978-3-319-02717-3_6. ISBN 978-3-319-02716-6. (closed access)
  4. ^ https://fosdem.org/2014/schedule/event/how_we_found_600000_grammar_errors/ (abstract only)
  5. ^ Azer, S. A. (2014). "Evaluation of gastroenterology and hepatology articles on Wikipedia". European Journal of Gastroenterology & Hepatology. 26 (2): 155–63. doi:10.1097/MEG.0000000000000003. PMID 24276492. (closed access)
  6. ^ Benkler, Yochai, Aaron Shaw, and Benjamin Mako Hill. "Peer Production: A Modality of Collective Intelligence". (draft paper) http://mako.cc/academic/benkler_shaw_hill-peer_production_ci.pdf
  7. ^ Jason Stacy, Cory Blad, and Rob Velella. "Morbid Inferences: Whitman, Wikipedia, and the Debate over the Poet's Sexuality". Polymath: An Interdisciplinary Arts and Sciences Journal, Vol. 3, No. 4, Fall 2013 https://ojcs.siue.edu/ojs/index.php/polymath/article/view/2857
  8. ^ Bernard Fallery, Florence Rodhain. "Gouvernance d'Internet, gouvernance de Wikipedia : l'apport des analyses d'E. Ostrom". Management et Avenir, 65 (2013) 168–187, http://hal.archives-ouvertes.fr/docs/00/92/05/08/PDF/2013_RMA.pdf (in French, with English abstract)
  9. ^ Meyer, Christian M. "Wiktionary: The Metalexicographic and the Natural Language Processing Perspective". Ph.D. thesis, Technische Universität Darmstadt, Darmstadt (2013) http://tuprints.ulb.tu-darmstadt.de/3654/
Supplementary references:


Reader comments

2013-12-25

Drunken birds and treasonous kings

Self-Portrait, Yawning by Joseph Ducreux, c. 1783.
This Signpost featured report covers material promoted from 15 to 21 December 2013.
Portrait of Charles I of England from the studio of Anthony van Dyck, 1636
Arsenal F.C. players lining up before a league match against Chelsea in April 2012.


The Jefferson Memorial in Washington, DC, at sunset.
Disclaimer: Summaries on this page borrow shamelessly from the articles cited; see the article histories for attribution.


Reader comments

2013-12-25

Draft namespace, VisualEditor meetings

This photo of Phra Si Sanphet temple in Ayutthaya, Thailand is the Wikimedia Commons Picture of the Day for December 24, 2013. The photo is by Wikimedia contributor user:Poco a poco.
Happy holidays from The Signpost. December 25th (Christmas Day) and January 1st (New Year's Day) are holidays in much of the world.

This is mostly a list of non-article page requests for comment believed to be active on 22 December 2013 linked from subpages of Wikipedia:RfC, recent watchlist notices and SiteNotices. The last two are in bold. Items that are new to this report are in italics even if they are not new discussions. If an item can be listed under more than one category it is usually listed once only in this report. Clarifications and corrections are appreciated; please leave them in this article's comment box at the bottom of the page.

Style and naming

The front cover of a modern Hungarian passport. A Wikipedian has started a discussion about splitting the "Hungarian passport" article.

Policies and guidelines

WikiProjects and collaborations

Technical issues and templates

Proposals

English Wikipedia notable requests for permissions

(This section will include active RfAs, RfBs, CU/OS appointment requests, and Arbcom elections)

Meta

Upcoming online meetings

As noted in the Wikimedia blog, an Individual Engagement Grant was approved for the Wikimaps Atlas project.

2013-12-25

More Great WikiProject Logos

Your source for
WikiProject News
Submit your project's news and announcements for next week's WikiProject Report at the Signpost's WikiProject Desk.

We saved one last special report for 2013. After our well-received review of great WikiProject logos a couple years ago, it was only a matter of time before we collected a new batch of interesting iconography that showcases the creativity of the Wikipedia community. Hopefully, these logos will also inspire other projects to liven up their drab pages.

Before we begin, it is important to note that gilded pages do not guarantee a project's success. Slapping a new coat of paint on a flailing WikiProject won't eliminate the project's deeper flaws. We have special reports on reviving WikiProjects, learning from dead projects, and other kernels of knowledge that can help struggling projects.

The list below presents interesting designs that stood out among the many projects surveyed by one Report writer. This list is in no particular order and is by no means exhaustive. For great logos we may have overlooked, we invite our readers to post their favorite WikiProject's logo in the comments section of this report.

Award-winning project

WikiProject Wikipedia Awards, home to barnstars and other bits of WikiLove, had a logo purpose-built by Antonu, the editor who remastered most of Wikipedia's barnstars and creates new ones for WikiProjects upon request. The logo in its entirety has been translated into Korean and Urdu. The golden barnstar with a laurel wreath, created specifically for the WikiProject Wikipedia Awards logo, was refashioned for the "2.0" version of the WikiProject Barnstar, which is awarded to "someone who makes great strides in improving WikiProjects."

Interwoven fossils

The logo for WikiProject Palaeontology is simple yet distinctive, with a Plesiosaurus macrocephalus fossil wrapped around the project's name. The fossil image looks pretty good for a hundred-year-old sketch.

Practice what you teach

Like the project's dual pursuits, WikiProject Heraldry and Vexillology has dual identifiers. On the left is the "Wikipedia coat of arms", which is described thus: Or, on a puzzle piece gules a flag waving Or on a flagstaff bendwise argent; for the crest, a flag waving Or on a flagstaff palewise argent issuing from an escutcheon Or issuing from a wreath Or and gules. To the right is the WikiProject Heraldry and Vexillology seal, which would look mighty fine on a flag or letterhead.

For our next trick

WikiProject Magic has a logo calling to mind the carnivals, circuses, and other traveling shows where magicians and soothsayers once made a living. While today's illusionists and escape artists demand flashier events, this call-back to a bygone era fits the project's extraordinary members.

That's so yesterday

What better way is there to identify WikiProject Fashion than with an outfit that's hopelessly out of fashion? This 1920s postcard should remind everyone that some day your children and grandchildren will be laughing at how ridiculous you looked back in 2013.

Just when you thought it was safe to go back in the water...

WikiProject Sharks doesn't need anything flashy. While many sea creatures have dorsal fins, a simple fin-shaped object poking out of the water immediately brings sharks to mind (often to humorous effect). The moral of this story is that you don't need a DFA to design a decent logo. All it takes is an idea and a little motivation.


Just when you thought it was safe to jaywalk...

The WikiCops are coming for you. WikiProject Law Enforcement has an intricately crafted badge mixing a sheriff star, cop shield, and bobby crown with a laurel wreath and some rays of chivalric order starshine.


In all seriousness

The folks at WikiProject Editor Retention mean business. Attracting and keeping editors is a huge challenge for Wikipedia, so it's reassuring to know that the professionals at WikiProject Editor Retention are on it.


On the road again

The vast array of roadway projects and task forces have something in common that ties them all together: vectorized road signs unique to their corner of the world. If you don't have the time or know-how to create something from scratch, just use the resources that are already at your project's disposal. Logos going left to right are from: WikiProject U.S. Roads, Auto Trails Task Force, U.S. Territories Task Force, WikiProject Canada Roads, and WikiProject Australian Roads.


Full circle

We ended the first Great WikiProject Logos with a simple yet effective logo from the folks at WikiProject Zoo. Since then, they've updated their look with a wild image clearly inspired by edgy advertising buffers for some televised wildlife programs. The Wikipedians of WikiProject Zoo are clearly excited about their subject.

Did any of these stand out to you? Did we forget your favorite project's icon? Do you have something new you've been working on that you'd like to share? Post it to the comments section below!

Next week, we'll ring in the New Year with our annual retrospective. Until then, revisit years past in the archive.

Reader comments

2013-12-25

IEG round 2 funding rewards diverse ambitions

Automatically calculated "bounding boxes" for the Wikimaps Atlas project, one of seven successful applications for individual engagement grant funding.


A significant move by the Wikimedia Foundation has been the broadening of the types of activities it funds. To this end, the Foundation has developed several quite different forums for allocating that funding, setting up volunteer committees that conduct initial assessments of competitive applications. The most recent of these programs was the individual engagement grants (IEG) scheme, launched last January. The scheme awards funds to individuals or teams of up to four people to produce high-impact outcomes for the WMF's online projects. The IEG scheme favours innovative approaches to solving critical issues in the movement. This arm of WMF grantmaking is different from the Funds Dissemination Committee, which started more than a year ago and judges applications for annual operating grants by eligible affiliated organisations.

The IEG committee has just announced the results of its second twice-yearly round. There are seven successful applications for projects that are striking for their reach and diversity, underlining the complex and multidimensional nature of the Wikimedia movement. The allocations—some of them based on applications of impressive quality—involve on-the-ground social, cultural, and technical innovations. Individuals from Cameroon, Uganda, India, Israel, France, Italy, Germany, and the US will begin their projects in the new year, most of which will run from January to June.

Increasing participation and engagement

IEG recipient Emily Temple-Wood is interviewed by Wikimedia Germany on the challenges of editing Wikipedia experienced by women generally, and female scientists in particular.
Keilana (Emily Temple-Wood), an undergraduate student in Chicago studying molecular biology, Arabic, and Islamic studies, will lead a Women Scientists Workshop Development project, funded at more than US$9K. The project will encourage college-aged women to become part of the editing community and to use high-quality content to combat systemic bias. Temple-Wood wrote: "One-time edit-a-thons have a very low retention rate for creating new editors and may not be worth the time and effort it takes to put them on." Instead, her model will involve regular edit-a-thon sessions supported by outreach and the creation of a new kit; this is likely to be the basis of a kit for other groups to use.

A surprisingly large proportion of our editors are under 18, according to Temple-Wood. She and Jake Orlowitz (Ocaasi) have also been provisionally funded to pilot a week-long summer conference, Generation Wikipedia, for young Wikipedians and Wikimedians from around the globe to connect, share skills and build leadership and community capacity among the youngest generation of editors. The conference, for which $20K may be allocated, would stress the particular needs of minors for safety, privacy, and liability protection in such an environment.

Paul Kiguba and the house that will be the new Wikipedia e-learning centre in rural Uganda

The project "Mbazzi Village writes Wikipedia" has its origins in the meeting of two people from very different countries who crossed paths in an exchange program for their students: Paul Kiguba, deputy head at a primary school on a small peninsula that juts into the massive Lake Victoria in Uganda; and Dan Frendin, a teacher in Sweden. Together, they founded the Luganda Wikipedia to serve the local language. An empty house owned by Paul Kiguba in the village of Mbazzi will become a new Wikipedia centre, emphasising the writing of articles on health and agriculture. The project, funded at nearly $3K, will be assisted by Sophie Österberg, who was the WMF's Global Education Manager.

The nation of Cameroon (green)
A pilot project in the west African nation of Cameroon will be conducted to develop novel communication tools to promote an international conversation on WMF projects and the sharing of free knowledge. This will follow on from WikiAfrica Cameroon, supported by several institutions, and the French chapter's dynamic Afripédia program, which promotes French-language initiatives in the African WMF world. The centrepieces of the pilot project will be the production of a video and a series of comics, by video-makers, designers, writers, and artists in Cameroon. It will be led by Marilyn Douala Bell and Iolanda Pensa with collaboration from Michael Epacka, with funding of €15K.

Technical innovations

A further three projects involve technical innovations. Wikimaps Atlas is designed to address a problem many editors are aware of: creating maps for WMF online projects is a labour-intensive process that fails to meet the demand for accuracy and updating. The current system has left us with a large messy pool of locator and other base maps with varying styles, accuracy, and formats. The project will automate the creation of SVG base maps in a well-researched cartographic style using the latest and most accurate open geographic data. Put simply, it will systematically generate a free atlas of the world with well coded SVG files. Arun Ganesh, Hugo Lopez and collaborators will receive $12.5K to achieve this.
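The image caption at the top of this story mentions automatically calculated "bounding boxes". As a minimal illustration of what such a calculation involves (this is not the Wikimaps Atlas code; the function name, the padding parameter and the sample outline are hypothetical), a bounding box can be derived from a list of longitude/latitude points like this:

```python
def bounding_box(points, padding=0.0):
    """points: iterable of (lon, lat) pairs in degrees.

    Returns (west, south, east, north), optionally padded and clamped to
    valid coordinate ranges. Regions that cross the antimeridian are not
    handled in this simplified sketch.
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    west = max(min(lons) - padding, -180.0)
    south = max(min(lats) - padding, -90.0)
    east = min(max(lons) + padding, 180.0)
    north = min(max(lats) + padding, 90.0)
    return west, south, east, north

# Made-up rectangular outline, for illustration only.
outline = [(10.0, 2.0), (16.0, 2.0), (16.0, 13.0), (10.0, 13.0)]
box = bounding_box(outline, padding=1.0)
```

A base-map generator would use such a box to decide which part of the world to render around a given country or region.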

VisualEditor, the system of WYSIWYG editing in display mode, has had a controversial start, but will be an inevitable feature of editing on WMF projects. A key challenge is to create a centralised register of all gadgets, with a programmatic understanding of how each relates to editing and an assessment of its popularity across projects. Grantees Eran Roz and Ravid Ziv have received $4.5K to accomplish this preparatory task and, on that basis, to integrate high-priority gadgets into VisualEditor.

Wikidata Toolkit, proposed by Markus Krötzsch, a researcher at the University of Oxford and data architect for the Wikidata project, has been awarded $30K, the highest amount in this round. He will lead a small team of researchers and students at Dresden University of Technology to address a key problem surrounding Wikidata. Wikidata, a relatively new project largely supported by Wikimedia Germany, aims to create a free knowledge base about the world—names, dates, coordinates, relationships, URLs, and references—that can be read and edited by humans and machines alike. However, in Krötzsch's words, "understanding this data requires technical means for querying and analysis that are not currently available. Even skilled developers have hardly any basis for working with Wikidata." The goal is to develop technical components to simplify "query answering" of Wikidata data; in technical terms, a robust and flexible query backend will be created to provide an API for running a variety of queries. The two main outcomes will be a Wikidata toolkit and a query web service.

The next round of IEG proposals will open on 1 March 2014.

In brief

  • British Library follow-up: With the British Library's release of over one million digitized, free images, the Wikimedia Commons community is spearheading an effort to create an index of the full collection.
    One of many images released by the British Library this month
  • Virtual internship openings: The US National Archives and Records Administration (NARA), with its Wikipedian in Residence and Digital Content Specialist Dominic McDevitt-Parks, has created a new virtual internship for Wikipedians. Two tracks are available: technical and community. Being a technical intern could include scripting an upload bot for the Wikimedia Commons or analyzing NARA's Wikimedia-related activities, while community interns would coordinate WikiProjects, act as NARA's point of contact, and communicate NARA initiatives to the relevant Wikimedia communities. Requirements for these positions are available on NARA's website.
  • In the media
    • FDC chair endorses paid editing: Dariusz Jemielniak (Pundit), the chair of the Funds Dissemination Committee, has authored an article in the Daily Dot arguing that paid editing is something that cannot be controlled. Jemielniak continued, "Paid edits do and will take place on Wikipedia. Just ignoring this phenomenon will not make it go away. ... Wikipedia is too important and too valuable to let this threat grow."
    • Indian languages receive attention: The Indian broadsheet Daily News and Analysis published a short article on the 20 Indian-language Wikipedias.
  • English Wikipedia Portal drive: An effort to bring all of the Main Page's portals, which are intended to serve as landing pages for specific topics but tend to be rarely edited, to featured portal status has been completed. The biography, mathematics, science, arts, geography, history, society, and technology portals are now featured.
  • Commons video deleted: The acrimonious saga of a video which documented the making of a penis-drawn portrait of Jimmy Wales has come to a close with its deletion. The video and accompanying portrait laid bare a sharp divide in the greater Wikimedia community over the Wikimedia Commons' policies in regard to explicit or pornographic media, going so far as to be part of the impetus for a new resolution from the Wikimedia Foundation's Board of Trustees. The six administrators closing this deletion discussion—the unusual number due to the oddly high-profile page—declined to publish their votes, saying only that "The principal determining factor in our choice was that the video can be reasonably interpreted as (sexual) harassment of an editor. Files which have the implication of attack (of which harassment is a form) are outside of Commons' scope."
  • Ombudsman Commission applications due: Those wishing to apply for the Ombudsman Commission, which investigates complaints about violations of the Wikimedia privacy policy, must do so by 1 January in any time zone.
  • Arbitration report: The "Ottoman Empire–Turkey naming dispute" case was closed. The evidence phase for both the Kafziel case and the Nightscream case will close on 29 December 2013.

    Reader comments

2013-12-25

OAuth: future of user-designed tools

Interface a user sees when granting permission for an application to access their account

Last month, the OAuth extension was deployed to all Wikimedia wikis. OAuth is a standard used for allowing users to authenticate third-party applications, also known as consumers, to take actions on their behalf.

In the past, tools were forced to use systems like TUSC to authenticate users, or store a separate authentication database like UTRS. Now, these applications can take actions using your account without you having to give them your password. For example, you can use the CropTool tool to crop an image on Commons, and the cropped image will be uploaded using your own account with a tag showing that CropTool was used.
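For tool authors wondering how a consumer can act for a user without ever seeing a password: the request is signed with secrets issued to the application and to the user's authorization, and the wiki verifies the signature. The sketch below shows HMAC-SHA1 request signing in the style of OAuth 1.0a (RFC 5849), using only the Python standard library. It assumes the extension follows that version of the standard, which this report does not spell out; the URL, keys and secrets are hypothetical placeholders, and a real tool would normally rely on an existing OAuth library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def pct(value):
    """Percent-encode per RFC 3986: only unreserved characters stay as-is."""
    return quote(str(value), safe="")

def signature_base_string(method, url, params):
    """Build METHOD&encoded-url&encoded-sorted-params.

    `url` must not contain a query string; query and body parameters
    belong in `params` together with the oauth_* parameters.
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])

def sign(method, url, params, consumer_secret, token_secret=""):
    """Return the base64-encoded HMAC-SHA1 signature for a request."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    base = signature_base_string(method, url, params).encode("ascii")
    digest = hmac.new(key, base, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Hypothetical placeholder values, for illustration only.
example_params = {
    "oauth_consumer_key": "hypothetical-consumer-key",
    "oauth_token": "hypothetical-access-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1387929600",
    "oauth_nonce": "abc123",
    "oauth_version": "1.0",
}
signature = sign("GET", "https://example.org/api", example_params,
                 "hypothetical-consumer-secret", "hypothetical-token-secret")
```

Because only the signature travels with the request, the user's password never reaches the tool, and the authorization can be revoked without changing it.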

Instructions for getting your application set up to use OAuth can be found on mediawiki.org. Currently Dan Garry, the product manager for OAuth, is approving each application before it can be used. That role will transition over to the Stewards after the guidelines for OAuth consumers, which are currently being drafted, are finalised.

More information:

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

  • Deployments on hold: There are no planned deploys for the week of December 23rd due to holidays.
  • Edit visualization tool: Jeph paul has created a tool which visualizes an article's history. More information about the IEG.
  • Any updates to a file on Commons will be reflected in pages that use it faster (bug 22390).
  • Improvements to Drafts namespace: Steven Walling and Pau Giner published a blog post about what drafts might look like in the future.

    Reader comments



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0