The Signpost

Recent research

Automated Q&A from Wikipedia articles; Who succeeds in talk page discussions?

By Eddie891, Thomas Niebler, Barbara Page and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Reading Wikipedia to Answer Open-Domain Questions"

Reviewed by Thomas Niebler

This paper by Chen et al.[1] proposes using the Wikipedia article corpus as a source of world knowledge for answering open-domain questions. The authors point out that Wikipedia articles contain far more information than current knowledge bases (KBs) such as DBpedia or Freebase. While knowledge in KBs is encoded in a more machine-friendly way, the vast majority of Wikipedia's knowledge is not covered by KBs; it is contained in unstructured text and is thus difficult to access algorithmically. The proposed approach, called "DrQA", aims to overcome that limitation by leveraging the article content. It first retrieves Wikipedia articles relevant to a question, and then uses a recurrent neural network (RNN) to detect passages in the articles' paragraphs that could serve as answers. The RNN takes a set of pretrained word embeddings and several other features as input.
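
To make the first of the two stages concrete, here is a minimal sketch of a retrieval step. It is not the authors' implementation: it assumes a simple TF-IDF word-overlap score, and the names `tokenize` and `rank_articles` as well as the dictionary-of-articles input are hypothetical, chosen for illustration. The second stage (the RNN that picks an answer span from the retrieved paragraphs) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_articles(question, articles, top_k=2):
    """Return the titles of the top_k articles most similar to the question.

    `articles` maps a title to its plain text (an assumed input format).
    Each article is scored by summing log(1 + tf) * idf over the question's terms.
    """
    term_counts = {title: Counter(tokenize(text)) for title, text in articles.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many articles does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    question_terms = set(tokenize(question))
    scores = {
        title: sum(math.log(1 + counts[t]) * idf(t) for t in question_terms)
        for title, counts in term_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Given a handful of short article texts, `rank_articles("What is the capital of France?", articles)` would return the titles whose text shares the most informative words with the question; a reader model would then search only those articles for an answer.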

Evaluated on a set of four question benchmarks, DrQA appears better suited to answering open-domain questions than competing systems. While the improvement in the evaluation score seems rather small (77.3 vs. 78.8 F1 score), the task of machine reading at scale using Wikipedia points to interesting directions for future research and applications. For example, depending on the speed of the framework (which unfortunately was not discussed), a new Wikipedia service for answering such open-domain questions could be established. Furthermore, this process of answering common-knowledge questions could help improve chatbots.
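
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score between a predicted answer and a reference answer can be computed. It assumes the token-overlap variant that is common for extractive question answering benchmarks; the review does not say exactly how the paper computed its scores, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer string."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Tokens shared by both answers, counted with multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that are correct
    recall = overlap / len(ref_tokens)      # share of reference tokens that were found
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that contains the full reference plus one extra word is penalized on precision but not on recall, so it scores below 1 without being counted as entirely wrong.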

Are you a policy wonk? Who succeeds in talk page discussions?

Reviewed by Barbara (WVS)

This Carnegie Mellon University study[2] quantified the success of editors who engage in talk page discussions and the roles they play in these discussions. The roles assigned to each editor were:

Unlike earlier studies of editor interactions, this study allowed an editor to hold several roles simultaneously on an article talk page. Each editor's success was determined by analyzing the subsequent edits to the article under discussion that the editor had promoted, and the longevity of those edits. Editors who are more detail-oriented tend to have more success than those more interested in organization. When multiple editors assume the role of organization, the success of individual editors decreases. The study assessed 7,211 articles, 21,108 discussion threads, and 21,108 editor discussion pairs, as well as the average number of editors per discussion. An editor's total number of edits is not associated with success.

The researchers also published a dataset consisting of "53,175 instances in which an editor interacts with one or more other editors in a talk page discussion and achieves a measured influence on the associated article page".

"Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features"

Summarized by Eddie891

This article[3] focuses on the 1.2 million unassessed articles in the Polish Wikipedia, and considers "over 100 linguistic features to determine the quality of Wikipedia articles in Polish language." From the conclusion: "Use of linguistic features is valuable for automatic determination of quality of Wikipedia article in Polish language. Better results in terms of precision can be achieved when the whole text of [an] article is taken into the account. Then our model shows over 93% classification precision using such features as relative number of unique nouns and verbs (unique, 3rd person, impersonal). However, if we take into account only [the] leading section of an article, relative quantity of common words, locatives, vocatives and third person words are the most significant for determination of quality. Using the obtained quality models we [assess] 500 000 randomly chosen unevaluated articles from Polish Wikipedia. According to result, about 4–5% of assessed articles can be considered by Wikipedia community as high quality articles."

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer


  1. ^ Danqi Chen; Adam Fisch; Jason Weston; Antoine Bordes: Reading Wikipedia to Answer Open-Domain Questions. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  2. ^ Maki, Keith; Yoder, Michael; Jo, Yohan; Rosé, Carolyn (2017). "Roles and Success in Wikipedia Talk Pages: Identifying Latent Patterns of Behavior". Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 1: 1026–1035.
  3. ^ Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2018-01-03). "Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features". doi:10.20944/preprints201801.0017.v1.
  4. ^ Lewoniewski, Włodzimierz (2017-06-28). Enrichment of Information in Multilingual Wikipedia Based on Quality Analysis. International Conference on Business Information Systems. Lecture Notes in Business Information Processing. Springer, Cham. pp. 216–227. doi:10.1007/978-3-319-69023-0_19. ISBN 9783319690223. (closed access)
  5. ^ Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2017-10-12). Analysis of References Across Wikipedia Languages. International Conference on Information and Software Technologies. Communications in Computer and Information Science. Springer, Cham. pp. 561–573. doi:10.1007/978-3-319-67642-5_47. ISBN 9783319676418. (closed access)
  6. ^ Rubira, Rainer; Gil-Egui, Gisela (2017-10-30). "Wikipedia as a space for discursive constructions of globalization". International Communication Gazette. 81: 3–19. doi:10.1177/1748048517736415. ISSN 1748-0485. S2CID 149356870. (closed access)
  7. ^ Jipmo, Coriane Nana; Quercini, Gianluca; Bennacer, Nacéra (2017-11-05). FRISK: A Multilingual Approach to Find twitteR InterestS via wiKipedia. International Conference on Advanced Data Mining and Applications. Lecture Notes in Computer Science. Springer, Cham. pp. 243–256. doi:10.1007/978-3-319-69179-4_17. ISBN 9783319691787. (closed access)
  8. ^ Ledger, Thomas Stephen (2017-09-01). "Introduction to anatomy on Wikipedia". Journal of Anatomy. 231 (3): 430–432. doi:10.1111/joa.12640. ISSN 1469-7580. PMC 5554820. PMID 28703298. (closed access)
  9. ^ Skolik, Sebastian (2017). "Instytucjonalizacja ruchu wolnej kultury na przykładzie projektów Wikimedia w przestrzeni Europy Środkowo-Wschodniej". Wydawnictwo Uniwersytetu Śląskiego: 347–367. (closed access; in Polish, book chapter from ISBN 978-83-8012-916-0)
  10. ^ Sokolova, Sofiia (2017). "The Russian-language Wikipedia as a Measure of Society Political Mythologization". Journal of Modern Science. 33 (2): 147–176. ISSN 1734-2031. (closed access)
  11. ^ Samoilenko, Anna; Lemmerich, Florian; Weller, Katrin; Zens, Maria; Strohmaier, Markus (2017). "Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach". Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM in Montreal, Canada). 11: 210–219. arXiv:1705.08816. doi:10.1609/icwsm.v11i1.14881. S2CID 30431459.
  12. ^ Tinati, Ramine; Luczak-Roesch, Markus; Shadbolt, Nigel; Hall, Wendy (2015). "Using WikiProjects to Measure the Health of Wikipedia". Proceedings of the 24th International Conference on World Wide Web. ACM Press. pp. 369–370. doi:10.1145/2740908.2745937. ISBN 9781450334730. (closed access) / Tinati, Ramine; Luczak-Rösch, Markus; Hall, Wendy; Shadbolt, Nigel (2015-05-23). Using WikiProjects to measure the health of Wikipedia. Web Science Track, World Wide Web Conference.

Discuss this story

Possibly, but it is based upon a data dump from 2008. Carnegie Mellon has a whole department dedicated to WP. I've been there. Ironically, most do not edit. Barbara (WVS) 14:16, 5 February 2018 (UTC)
@Barbara (WVS): Note how the 2011 paper isolated authority citations and opinion changes ("alignment moves") as the primary features (beyond the writing parties, their semantic assertions, etc.) of talk pages. While the CMU paper says as much in section 2 on page 1027, they proceed to focus solely on authority claims in section 5.2.3 on page 1030, along with the other features in section 5.2, but ignore the crucial instance of participants coming into agreement with others. That's a really stark omission and I am sure their analysis would have been stronger if they included it. Do you know the authors? If so, please suggest that if it makes sense to you. (talk) 18:36, 5 February 2018 (UTC)


The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0