The Signpost

Recent research

Bot writes about theatre plays; "Renaissance editors" create better content

By Kim Osman, Federico Leva, Tilman Bayer and Max Klein

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

US playwright Alice Gerstenberg. A bot-generated article about her 1920 comedy Fourteen was accepted with minimal changes.

Bot detects theatre play scripts on the web and writes Wikipedia articles about them

A paper[1] presented at the International Conference on Pattern Recognition last year (following an earlier poster) describes an automated method to improve Wikipedia's coverage of theatre plays ("only about 10% of the plays in our dataset have corresponding Wikipedia pages"). The bot searches the web for playscripts and related documents and extracts key information from them, including the play's main characters and relevant sentences from online synopses of the play. It also collects mentions in Google Books and the Google News archive, in an attempt to ensure that the play satisfies Wikipedia's notability criteria. It then compiles this information into an automatically generated Wikipedia article. Two of the 15 articles submitted as a result of this method were accepted by Wikipedia editors. For the first, Chitra by Rabindranath Tagore, the initial bot-created submission underwent significant changes by other editors ("the final page reflects some of the improvements we can incorporate in our bot"). The second one, Fourteen by Alice Gerstenberg, "was moved into Wikipedia mainspace with minimal changes. All the references, quotes and paragraphs were retained".

"Renaissance Editors" create better Wikipedia content

A study of the German Wikipedia[2] examines how the diversity of editors' contributions across the site's 8 "main categories" relates to article quality. The authors start by defining an editor's "interest profile": the proportion of bytes that the editor contributed to each category. They then propose an entropy measure that scores an interest profile higher the more evenly it is spread across more categories, that is, the more "polymath" the editor's style.
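To make the idea concrete, here is a minimal sketch of an interest profile and an entropy-based diversity score. It assumes plain Shannon entropy in bits; the exact formula used in the paper is not reproduced in this summary and may differ. The function names and the category labels ("cat_1" to "cat_8") are hypothetical placeholders, not taken from the study.

```python
import math


def interest_profile(bytes_by_category):
    """Turn per-category byte counts into proportions that sum to 1."""
    total = sum(bytes_by_category.values())
    if total <= 0:
        raise ValueError("editor has no contributed bytes")
    return {cat: n / total for cat, n in bytes_by_category.items()}


def diversity(profile):
    """Shannon entropy (in bits) of an interest profile.

    Assumption: plain Shannon entropy stands in for the paper's measure.
    Categories with zero share contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


# Hypothetical editors; the category labels are placeholders.
categories = [f"cat_{i}" for i in range(1, 9)]

# A specialist puts every byte into a single category.
specialist = interest_profile(
    {cat: (5000 if cat == "cat_1" else 0) for cat in categories}
)

# A "Renaissance editor" spreads bytes evenly over all 8 categories.
polymath = interest_profile({cat: 100 for cat in categories})

specialist_score = diversity(specialist)
polymath_score = diversity(polymath)
```

Under this assumption a pure specialist scores 0, and an editor who contributes equally to all 8 categories reaches the maximum of log2(8) = 3 bits; every other profile falls in between.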

Leonardo da Vinci is a famous example of a "Renaissance man" or "polymath".

The authors show a correlation between contributors' average diversity and the quality class of the articles they have contributed to. Article quality is determined by whether the article is a "Good Article", a "Featured Article", or neither. Total productivity, measured by bytes contributed, is also linked to diversity, although that link falls just short of statistical significance. Finally, a logistic regression shows that diversity, more than productivity, significantly determines article quality.

Despite many simplifications (e.g. a single language, naive article quality ratings, overly broad categories), the researchers' methods are well defined, clear, and convincing within a limited scope, and they put a finger on the notion that our most lauded editors tend to range all over Wikipedia.

Briefly

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

References

  1. ^ Banerjee, Siddhartha; Cornelia Caragea; Prasenjit Mitra (2014). Playscript Classification and Automatic Wikipedia Play Articles Generation. 2014 22nd International Conference on Pattern Recognition (ICPR). pp. 3630–3635. doi:10.1109/ICPR.2014.624. (Closed access; preprint and dataset available.)
  2. ^ Szejda, J.; Sydow M.; Czerniawska D. "Does a "Renaissance Man" Create Good Wikipedia Articles?" (PDF). In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval. (KDIR-2014): 425-430. doi:10.5220/0005155804250430. Retrieved 28 January 2015.
  3. ^ Mattus, Maria. "The Anyone-Can-Edit Syndrome – Intercreation Stories of Three Featured Articles on Wikipedia". Nordicom Review (35) 2014: 189–203. Retrieved 28 January 2015.
  4. ^ Borra, Erik; et al. "Societal Controversies in Wikipedia Articles" (PDF). Proceedings of CHI 15, April 18–23, 2015, Seoul, Republic of Korea. ACM. doi:10.1145/2702123.2702436. Retrieved 28 January 2015.
  5. ^ Erik Borra, Esther Weltevrede, Paolo Ciuccarelli, Andreas Kaltenbrunner, David Laniado, Giovanni Magni, Michele Mauri, Richard Rogers, Tommaso Venturini: Contropedia – the analysis and visualization of controversies in Wikipedia articles PDF
  6. ^ Wasala, Asanka; Schäler, Reinhard; Buckley, Jim; Weerasinghe, Ruvan (21 Feb 2013). Building Multilingual Language Resources in Web Localisation: A Crowdsourcing Approach. Theory and Applications of Natural Language Processing. Springer Berlin Heidelberg. pp. 69–99. ISBN 978-3-642-35085-6. Retrieved 26 January 2015.
  7. ^ Alegria, Iñaki; Cabezon, Unai; Betoño, Unai Fernandez de; Labaka, Gorka (21 Feb 2013). Reciprocal Enrichment Between Basque Wikipedia and Machine Translation. Theory and Applications of Natural Language Processing. Springer Berlin Heidelberg. pp. 101–118. ISBN 978-3-642-35085-6. Retrieved 26 January 2015.
  8. ^ Ferschke, Oliver; Daxenberger, Johannes; Gurevych, Iryna (21 Feb 2013). A Survey of NLP Methods and Resources for Analyzing the Collaborative Writing Process in Wikipedia. Theory and Applications of Natural Language Processing. Springer Berlin Heidelberg. pp. 121–160. ISBN 978-3-642-35085-6. Retrieved 26 January 2015.
  9. ^ Oltramari, Alessandro; Vetere, Guido; Chiari, Isabella; Jezek, Elisabetta (2013). Senso Comune: A Collaborative Knowledge Resource for Italian. Theory and Applications of Natural Language Processing. Springer Berlin Heidelberg. pp. 45–67. ISBN 978-3-642-35085-6. Retrieved 26 January 2015.
  10. ^ Gandica, Y.; F. Sampaio dos Aidos; J. Carvalho (2014-08-19). "The dynamic nature of conflict in Wikipedia". arXiv:1408.4362.
  11. ^ Thomas Steiner: Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection. ISWC 2014 Developers Workshop PDF
  12. ^ A Spencer, B Krige, S Nair: Digital doorway: Gaining library users through Wikipedia PDF
  13. ^ Tuan Tran and Tu Ngoc Nguyen: Hedera: Scalable Indexing and Exploring Entities in Wikipedia Revision History PDF
  14. ^ Sandro Bauer, Stephen Clark , Thore Graepel: Learning to Identify Historical Figures for Timeline Creation from Wikipedia Articles. PDF
  15. ^ Zhang, Kezun; Yanghua Xiao; Hanghang Tong; Haixun Wang; Wei Wang (2014). WiiCluster: A Platform for Wikipedia Infobox Generation. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. CIKM '14. New York, NY, USA: ACM. pp. 2033–2035. doi:10.1145/2661829.2661840. ISBN 978-1-4503-2598-1. (Closed access.)
  16. ^ Wilson, Jodi L. (2014). "Proceed With Extreme Caution: Citation to Wikipedia in Light of Contributor Demographics and Content Policies". JETLaw: Vanderbilt Journal of Entertainment & Technology Law. 16 (4): 857.
  17. ^ Armstrong, Richard (2014-08-01). "Wikipedia: helping to promote the art and science of civil engineering". Proceedings of the ICE – Civil Engineering. 167 (3): 101–101. doi:10.1680/cien.2014.167.3.101. ISSN 0965-089X. (Closed access.)



Discuss this story


Fantastic report! Lots of studies to read, thank you. :-) --Atlasowa (talk) 14:26, 30 January 2015 (UTC)

This an amazing report! I'll be even more polymath. Before reading this, I edited mostly insects, hopping from family to family, but now I have diversified to GA reviewing, editing a few Romania-related articles, and mostly working on animals. Gug01 (talk) 13:57, 31 January 2015 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0