The Signpost

Recent research

Politically diverse editors write better articles; Reddit and Stack Overflow benefit from Wikipedia but don't give back

By Barbara Page, FULBERT, Steve Jankowski, and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Politically diverse editors and article quality

Controversy is an organized sport for some editors but may alert readers that there is more than one view on a topic.
"The Wisdom of Polarized Crowds"[1]
Reviewed by FULBERT

While politics in the United States appears to be increasingly polarized around extremes in political discourse, it was unclear how this affected the open, collective production of knowledge that is Wikipedia.

The researchers used a data dump of English Wikipedia from 12/1/16, including all edits made since its start within the domains of politics, social issues, and science. They focused on the "American liberalism" and "American conservatism" categories and sub-categories as delimiters, with breakdowns in social issues and science down four levels from the root. The researchers reached out to the Wikipedia community, Wikimedia staff, and those who directly inquired on the page they created through Meta-Wiki, with 118 responses overall for their survey. The researchers then analyzed user edits to determine political alignment based on contributions to conservative or liberal articles.
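The last step above, inferring an editor's political alignment from where they contribute, can be illustrated with a minimal sketch. It is not the authors' actual procedure: the function name, the labels "liberal" and "conservative", and the simple edit-count ratio are all assumptions made for illustration.

```python
from collections import Counter


def alignment_score(edit_labels):
    """Return a hypothetical alignment score in [-1, 1] for one editor.

    edit_labels is one label per edit, naming the category tree of the
    edited article. -1 means all counted edits went to "liberal" articles,
    +1 means all went to "conservative" articles. Edits with any other
    label are ignored. Returns None if no edit falls in either tree.
    """
    counts = Counter(
        label for label in edit_labels if label in ("liberal", "conservative")
    )
    total = counts["liberal"] + counts["conservative"]
    if total == 0:
        return None
    return (counts["conservative"] - counts["liberal"]) / total


if __name__ == "__main__":
    # An editor with two edits to liberal articles and one to a conservative one.
    print(alignment_score(["liberal", "liberal", "conservative", "science"]))
```

A score near zero would indicate an editor who contributes to both sides; a real analysis might weight edits by size rather than counting them equally.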

The researchers found that "articles attracting more attention tend to have more balanced engagement from editors along the conservative-liberal spectrum" (p. 4). They then measured the quality of articles using a tool developed by Wikimedia research staff (ORES), and determined that higher political polarization was associated with higher article quality. All this fed into their study goals of exploring the relationship between diversity of political alignment and article quality and bias. Through their statistical analysis, they determined that the quality of articles in Wikipedia improves when editors on both sides of politically polarized issues work together to seek collaborative consensus on topics. While this research focused directly on politically related topics, it surfaced a need both for political diversity and for motivated contributors.

(Cf. related earlier coverage: "Being Wikipedian is more important than the political affiliation", "Cross-language study of conflict on Wikipedia")

The study of controversy

"Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries"[2]
Reviewed by Barbara Page and Tilman Bayer

This paper presents a "method for automatic detection of controversial articles and categories in Wikipedia", based on three data sources:

The researchers argue that a mathematical model of Wikipedia talk page controversies could be used to incorporate a 'controversy' metric into web searches, giving those searching for information a way to quickly assess whether a topic is controversial. Wikipedia provides researchers with an accessible historical record of controversial discussions. The authors further describe their work: "[Assessing] the controversy should offer [readers] a chance to see the 'wider picture' rather than letting [them] obtain one-sided views." The authors' conclusions were: "Our approach can be also applied in Wikipedia or other knowledge bases for supporting the detection of controversy and content maintenance. Finally, we believe that our results could be useful for...understanding the complex nature of controversy..."

Students edit but still doubt the value of Wikipedia

"Wikipedia in higher education: Changes in perceived value through content contribution"[3]
Reviewed by Barbara Page

Students are a convenient group to study, especially if being studied is part of the syllabus. The 240 students in this study readily admitted to using Wikipedia as a resource even though they did not consider it to be 'reliable and trustworthy'. Using Wikipedia as a resource does not necessarily encourage content contributions by students. In addition, when the students in this study actually added content, their perceptions of the reliability and usefulness of Wikipedia did not change.

(For coverage of various other papers studying the use and perception of Wikipedia by students, see also our 2017 special issue on Wikipedia in Education)

Researching the research using Wikipedia as a corpus

"Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus"[4]
Reviewed by Barbara Page

The amount of research that uses Wikipedia as a source of data continues to grow and enough scholarly content now exists that systematic reviews are available. Computer science has especially been quick to see the potential of this 'mother lode' and how it can be used to study information retrieval, natural language processing, and ontology building. The reference section in this article itself makes interesting reading if only to appreciate the collection of data sets and other research that exists and continues to expand.

(See also our earlier coverage of literature reviews, some involving the same authors: "A systematic review of the Wikipedia literature", "'Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership'", "Literature reviews of Wikipedia's inputs, processes, and outputs")

Sneaky editing and masking bias

"Persistent Bias on Wikipedia: Methods and Responses"[5]
Reviewed by Barbara Page

Apparently, Wikipedia editors are not the only ones who have observed biased editing. The author of this research article (already mentioned in a previous issue) used his own article as a case study and example of biased editing. It is no surprise that an editor can 'nominally' follow editing guidelines to maintain their bias. Here is the 'how to' on such behavior:

Those who are biased sometimes support their editing even in 'the face of resistance'. This is done by:

When bias is challenged by other editors, the strategies for dealing with it are making complaints, 'mobilizing counterediting', and exposing the bias. The author's stinging conclusion speaks for itself: "It is worthwhile becoming aware of persistent bias and developing ways to counter it in order for Wikipedia to move closer to its goal of providing accurate and balanced information."

Seeking credibility

"Information Fortification: An Online Citation Behavior"[6]
Reviewed by Barbara Page

This study is a rebuttal to a 2005 position paper by Forte (one of the authors) and Bruckman, which had drawn "on Latour’s sociology of science and citation to explain citation in Wikipedia with a focus on credibility seeking". The study associates citing sources with other issues of bias and identifies patterns in how sources are cited to encourage and even fabricate controversy. It was limited to non-scientific topics and used data derived from edit logs, interviews and text analysis. "[I]nformation fortification [is] a concept that explains online citation activity that arises from both naturally occurring and manufactured forms of controversy."

Anti-vandalism on Wikidata

"Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017"[7]
Reviewed by Barbara Page

Vandalism of Wikidata can significantly disrupt the use of its data, leading to flaws in any analysis that relies on it. Collaborative efforts continue to address these concerns and have included some friendly 'competitions'. Strategies for 'fighting' vandalism at this time include manual review, community feedback, and analyzing reverting patterns. Other 'vandalism' fighting tools are being developed. Also of interest is the discussion of the effort to use "psychologically motivated features capturing a user’s personality and state of mind..."

Wikipedia's one-way relationships with Reddit and Stack Overflow

"Examining Wikipedia With a Broader Lens: Quantifying the Value of Wikipedia's Relationships with Other Large-Scale Online Communities"[8]
Reviewed by Steve Jankowski

There is a growing body of literature that examines Wikipedia's role in creating value for other websites as part of a media ecosystem. Adding to these studies is the work of Vincent, Johnson & Hecht who examined the bidirectional value created for Reddit and Stack Overflow. Conceptually, the authors distinguished between two sets of metrics to define this value. For Reddit and Stack Overflow, they understood value as being a function of user engagement (score/votes, comments, page views) that is contextualized by potential revenue. For Wikipedia, value is likewise seen as user engagement, characterized by edit count, editors gained, editors retained, and article page views, but is not contextualized by revenue (p.4).

Based on this operationalization of value, the authors assessed the amount of content and links created through associative and causal analyses. They found that Wikipedia provided substantial value to Stack Overflow and Reddit. Most clearly, they illustrated this by explaining how posts containing Wikipedia links gained engagement levels that were estimated to be worth $100K per year (p.2). However, this level of engagement did not operate in the reverse. The authors found "negligible increases" (p.2) to the number of edits and editor signups. Based on these results, the authors observed that the relationship between Wikipedia and the two communities was "one-way", with Wikipedia providing more value than it received in return.

Considering this new direction in studying Wikipedia, there are a number of elements that require commentary. The first is the obvious care the authors displayed in their methods. For example, they were conscious of the need to adjust their analyses to account for the skew of current events, and they reported inter-rater agreement for the qualitative analysis that this required. The second comment is that there is a conceptual mismatch in using revenue as a metric for analyzing value created "between communities", considering that the communities themselves do not receive any profit. Perhaps future research in this area might need greater granularity in the type of relationships that reflect differences between community-to-community, owner-to-owner, and community-to-owner.

Despite this terminological slippage, this research adds specific details to Van Dijck's analysis of the social media ecosystem[9] where she described the character of the relationship between Google and Wikipedia within a for-profit context. Likewise, the article provides greater support to conclusions presented in an earlier study conducted by McMahon, Johnson & Hecht.[10] In that paper, Google's usage of Wikipedia content in its Knowledge Graph results was shown to reduce the amount of through traffic when a link to Wikipedia was removed. As the authors of both papers agree, contextualizing Wikipedia as part of an ecosystem is significant for understanding and assessing how external relationships can be adapted to the sustainability of Wikipedia.

A 2015 study confined to the subreddit /r/todayilearned (TIL) found "strong statistical evidence suggesting Reddit threads affect Wikipedia viewership levels in a non-trivial manner", but did not examine effects on editor activity.[11]

"Knowledge categorization affects popularity and quality of Wikipedia articles"[12]
Reviewed by FULBERT

This empirical research paper explored how knowledge categorization – common in classification systems within the information sciences – works as a scientific and social process when Wikipedia articles are attended to by editors. Categorization leads to nesting of information under major topics, and the further down a hierarchy, the less editing attention articles appear to garner. Articles higher in the hierarchy are referred to as coarse-grained, and while these receive the most attention, their levels of quality have not been the focus of previous studies.

The researchers analyzed a database dump of the English-language Wikipedia from October 20, 2016, considering all articles that were members of at least one category (n=5,006,601). They defined granularity as the length of the shortest path from the root (main category), which averaged 7.59 across all articles. They then compared granularity to the number of article edits (which related to perception of higher quality articles), the number of articles as rated by importance (done individually by WikiProjects), perceptions of quality (based on being classified as a featured article), and the notion of return on effort (quality of an article relative to the amount of work done on it by editors). They conducted non-parametric and parametric statistical analyses using numerous variables drawn from the many article records in their data dump.
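The granularity measure described above, the shortest path from the root category, can be sketched with a breadth-first search over the category graph. This is a minimal illustration rather than the authors' code: the function names, the toy category tree, and the choice to count the final category-to-article link as one step are all assumptions.

```python
from collections import deque


def category_depths(subcategories, root):
    """Breadth-first search giving each category's shortest distance from root.

    subcategories maps a category name to a list of its direct subcategories.
    Categories unreachable from root are absent from the result.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcategories.get(current, ()):
            if child not in depths:  # first visit is the shortest path in BFS
                depths[child] = depths[current] + 1
                queue.append(child)
    return depths


def article_granularity(article_categories, depths):
    """Shortest root-to-article path length, or None if no category is reachable.

    Assumption: the membership link from a category to the article counts
    as one additional step.
    """
    reachable = [depths[c] for c in article_categories if c in depths]
    return min(reachable) + 1 if reachable else None


if __name__ == "__main__":
    # Hypothetical toy hierarchy; "Optics" is reachable by two routes.
    toy_tree = {
        "Main": ["Science", "Society"],
        "Science": ["Physics"],
        "Physics": ["Optics"],
        "Society": ["Optics"],
    }
    depths = category_depths(toy_tree, "Main")
    print(article_granularity(["Optics"], depths))
```

Breadth-first search suits this measure because a category can sit under several parents, and the first time the search reaches it is guaranteed to be along a shortest path.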

There were many levels of findings, with the main one being that articles in coarse-grained categories (those nearest the top of the hierarchies) received the largest number of edits and the most attention from editors, though they were least likely to be featured (highest quality) articles. This seemed to surprise the authors, as it means that those articles that receive the most attention (by editors) overall lack the depth of quality found in featured articles, most of which are further down the hierarchy.


"Mean number of edits is displayed in the x-axis. The linear regression coefficient α1 of the granularity variable explaining the number of edits [...] is displayed in the y-axis. Area of points is proportional to the number of articles in the respective top-level category." (Figure 6 from the paper)


"The baseline probability of featured articles in the respective TLC [top-level category] is displayed in the x-axis. The logistic regression coefficient of the granularity variable, when controlling for the number of edits [...], is displayed in the y-axis." (Figure 7 from the paper)

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Barbara Page and Tilman Bayer

References

  1. ^ Shi, Feng; Teplitskiy, Misha; Duede, Eamon; Evans, James (2017-11-29). "The wisdom of polarized crowds". Nature Human Behaviour. 3 (4): 329–336. arXiv:1712.06414. doi:10.1038/s41562-019-0541-6. PMID 30971793. S2CID 8947252.
  2. ^ Zielinski, Kazimierz; Nielek, Radoslaw; Wierzbicki, Adam; Jatowt, Adam (2018). "Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries". Information Processing & Management. 54 (1): 14–36. doi:10.1016/j.ipm.2017.08.005.
  3. ^ Soler-Adillon, Joan; Pavlovic, Dragana; Freixa, Pere (2018). "Wikipedia in higher education: Changes in perceived value through content contribution". Comunicar (in Spanish). 26 (54): 39–48. doi:10.3916/c54-2018-04. ISSN 1134-3478.
  4. ^ Mehdi, Mohamad; Okoli, Chitu; Mesgari, Mostafa; Nielsen, Finn Årup; Lanamäki, Arto (2017). "Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus" (PDF). Information Processing & Management. 53 (2): 505–529. doi:10.1016/j.ipm.2016.07.003. S2CID 217265814.
  5. ^ Martin, Brian (2017). "Persistent Bias on Wikipedia: Methods and Responses". Social Science Computer Review: 089443931771543. doi:10.1177/0894439317715434. S2CID 65125326.
  6. ^ Forte, Andrea; Andalibi, Nazanin; Gorichanaz, Tim; Kim, Meen Chul; Park, Thomas; Halfaker, Aaron (2018-01-07). "Information Fortification: An Online Citation Behavior" (PDF). Proceedings of the 2018 ACM Conference on Supporting Groupwork, GROUP 2018, Sanibel Island, FL, USA, January 07-10, 2018. ACM. pp. 83–92. doi:10.1145/3148330.3148347. ISBN 9781450355629. S2CID 20820320.
  7. ^ Heindorf, Stefan; Potthast, Martin; Engels, Gregor; Stein, Benno (2017). "Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017". arXiv:1712.05956 [cs.IR].
  8. ^ Vincent, Nicholas; Johnson, Isaac; Hecht, Brent (2018-04-21). "Examining Wikipedia With a Broader Lens: Quantifying the Value of Wikipedia's Relationships with Other Large-Scale Online Communities" (PDF). CHI 2018. Montréal, QC, Canada: Association for Computing Machinery.
  9. ^ Dijck, José (2013). The culture of connectivity : a critical history of social media. Oxford New York: Oxford University Press. Chapter 7.4. ISBN 9780199970780.
  10. ^ McMahon, Connor; Johnson, Isaac; Hecht, Brent (2017). "The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies". Eleventh International AAAI Conference on Web and Social Media. AAAI. pp. 142–151.
  11. ^ Carson, S. L.; Dye, T. K.; Goldbaum, D.; Moyer, D.; Carson, R. T. (2015). "Determining the influence of Reddit posts on Wikipedia pageviews". Wikipedia, a Social Pedia: Research Challenges and Opportunities: Papers from the 2015 ICWSM Workshop. ICWSM 2015. Association for the Advancement of Artificial Intelligence. pp. 75–82. ISBN 9781577357377.
  12. ^ Lerner, Jürgen; Lomi, Alessandro (2018-01-02). "Knowledge categorization affects popularity and quality of Wikipedia articles". PLOS ONE. 13 (1): e0190674. Bibcode:2018PLoSO..1390674L. doi:10.1371/journal.pone.0190674. ISSN 1932-6203. PMC 5749832. PMID 29293627.
  13. ^ Jatowt, Adam; Kawai, Daisuke; Tanaka, Katsumi (2018-02-08). "Time-focused analysis of connectivity and popularity of historical persons in Wikipedia". International Journal on Digital Libraries. 20 (4): 287–305. doi:10.1007/s00799-018-0231-4. ISSN 1432-5012. S2CID 254084505.
  14. ^ "Gesundheitsinfos: Wer suchet, der findet – Patienten mit Dr. Google zufrieden". Spotlight Gesundheit, Bertelsmann Stiftung. 2018.
  15. ^ di Sciascio, Cecilia; Strohmaier, David; Errecalde, Marcelo; Veas, Eduardo (2017). WikiLyzer: Interactive Information Quality Assessment in Wikipedia. IUI '17. New York, NY, USA: ACM. pp. 377–388. doi:10.1145/3025171.3025201. ISBN 9781450343480.
  16. ^ Thruesen, P.; Čechák, J.; Sezñec, B.; Castalio, R.; Kanhabua, N. (December 2016). To link or not to link: Ranking hyperlinks in Wikipedia using collective attention. 2016 IEEE International Conference on Big Data (Big Data). pp. 1709–1718. doi:10.1109/BigData.2016.7840785.
  17. ^ Dehigama, Kanchana; Jazeel, M. I. M. (2017-12-07). "Usage of Wikipedia by health science and social sciences & humanities undergraduates of University of Peradeniya and SouthEastern University of Sri Lanka".
  18. ^ Nguyen, Dong; McGillivray, Barbara; Yasseri, Taha (2017-12-22). "Emo, love and god: Making sense of Urban Dictionary, a crowd-sourced online dictionary". Royal Society Open Science. 5 (5). arXiv:1712.08647. doi:10.1098/rsos.172320. PMC 5990761. PMID 29892417.

Discuss this story

Amen to that! I have long believed this and formulated it this way:
Talk page negotiation table

"The best content is developed through civil collaboration between editors who hold opposing points of view."

-- BullRangifer. From WP:NEUTRALEDITOR
The best content is developed through civil collaboration between editors who hold opposing points of view. Everyone is biased,[1] and it is natural for humans to be blind to their own biases; we tend to suffer from confirmation biases[2][3] and the Dunning–Kruger effect. Therefore other editors provide an important counterbalancing service when they spot and correct the consequences of our biased editing. When pointing out such editing errors, it is important to follow the Golden Rule and assume good faith in fellow editors. No one is perfect. -- BullRangifer (talk) PingMe 03:25, 21 February 2018 (UTC)[reply]
Old issue indeed, plenty of research is available. The Wisdom of Crowds argues that crowd is wise only as long as it's diverse, while polarization is not necessary and may be harmful. --Nemo 11:47, 27 February 2018 (UTC)[reply]
Exactly! Getting a bunch of people together who already hold the same POV doesn't create much improvement. We all need each other. -- BullRangifer (talk) PingMe 15:28, 27 February 2018 (UTC)[reply]

References

  1. ^ Johnson, Carolyn Y. (February 5, 2013), Everyone is biased: Harvard professor’s work reveals we barely know our own minds, Boston.com, retrieved December 12, 2015
  2. ^ Phelps, Marcy (June 5, 2015), Are your biases showing? Avoiding confirmation bias in due diligence investigations, Phelps Research, retrieved November 15, 2015
  3. ^ Yanklowitz, Shmuly (October 3, 2013), Confirmation Bias and the Ethical Demands of Argumentation, The Huffington Post, retrieved November 15, 2015
  • Cause for thought. Those editors should be more compliant with our policies and stop pushing fringe theories and using unreliable sources. That would help the problem, because we don't want disruptive editors here. Banning them is a net plus for an accurate encyclopedia. -- BullRangifer (talk) PingMe 21:14, 21 February 2018 (UTC)[reply]
  • Case in point: assertion not backed by facts. How is the term "right wing editor" defined? For example, I consider my own political views left of center in terms of US politics, yet I am very aware that they would be considered centrist or even right of center in terms of European politics. And how can you prove that editors of one particular POV are "sanctioned or banned" more often than those of any other POV? -- llywrch (talk) 00:24, 22 February 2018 (UTC)[reply]
  • Without evidence for any biased reason why some would be banned more than others, I just assume a certain amount of good faith in our processes, ergo, there are more of one type who are misbehavin'. -- BullRangifer (talk) PingMe 01:10, 22 February 2018 (UTC)[reply]



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0