The Signpost

Recent research

Language analyses examine power structure and political slant; Wikipedia compared to commercial databases

By Tilman Bayer and Piotr Konieczny
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

Admins influence the language of non-admins

An arXiv preprint titled "Echoes of power: Language effects and power differences in social interaction"[1] examines the language used by Wikipedia editors and how conversational language can reveal power relationships. The research analyzes how much people adapt their language to that of others involved in a discussion (a process called language coordination). The findings indicate that the more a person adapts in this way, the more deferential that person is. The authors find that editors on Wikipedia tend to coordinate (language-wise) more with administrators than with non-administrators. Further, the study suggests that language coordination affects one's chances of becoming an administrator: admin candidates who coordinate more have a higher chance of being elected than those who don't change their language. Once elected, administrators tend to coordinate less.
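The summary above does not spell out how coordination is computed, so the following is only a minimal sketch under an assumed definition: coordination toward a marker (here, a small hypothetical set of function words) is the probability that a reply contains the marker when the preceding utterance did, minus the overall probability that a reply contains it. The marker set, the function names and the sample exchanges are all invented for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical marker category (assumption): a tiny set of function words.
ARTICLES: Set[str] = {"a", "an", "the"}


def has_marker(utterance: str, marker_words: Set[str]) -> bool:
    """Return True if any token of the utterance is in the marker set."""
    tokens = utterance.lower().split()
    return any(tok.strip(".,!?;:\"'()") in marker_words for tok in tokens)


def coordination(exchanges: List[Tuple[str, str]], marker_words: Set[str]) -> float:
    """Assumed measure: P(reply has marker | utterance has marker)
    minus P(reply has marker), over (utterance, reply) pairs."""
    if not exchanges:
        raise ValueError("need at least one exchange")
    reply_has = [has_marker(reply, marker_words) for _, reply in exchanges]
    # Replies whose preceding utterance exhibited the marker.
    triggered = [
        rh
        for (utterance, _), rh in zip(exchanges, reply_has)
        if has_marker(utterance, marker_words)
    ]
    if not triggered:
        raise ValueError("no utterance exhibits the marker")
    baseline = sum(reply_has) / len(reply_has)
    conditional = sum(triggered) / len(triggered)
    return conditional - baseline


# Invented sample data: (utterance by one editor, reply by another).
sample_exchanges = [
    ("Please fix the lead.", "I fixed the lead."),
    ("Sources needed here.", "Adding sources now."),
    ("Is the citation valid?", "Yes, the citation is valid."),
    ("Thanks everyone.", "Happy to help."),
]

score = coordination(sample_exchanges, ARTICLES)
```

Under this assumed definition, a positive score means the replier uses the marker more often when the other speaker has just used it. Comparing such scores for replies to administrators and to non-administrators would mirror the kind of comparison the study reports.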

A blog post on the website of Technology Review summarized the results under the headline "Algorithm Measures Human Pecking Order" and highlighted the fact that one of the authors is Jon Kleinberg, known as the inventor of the HITS algorithm (also known as "hubs and authorities").

Can Wikipedia replace commercial biography databases?

California State University, East Bay: Could it rely on biographical information from Wikipedia and the web alone?

An article[2] by a librarian and professor at California State University, East Bay compares "biographical content for literary authors writing in English" across Wikipedia, "the web" (i.e. top Google search results) and two commercial databases: the Biography Reference Bank (BRB, now part of EBSCO Industries) and Contemporary Authors Online (CAO). The study was motivated by the decision of the author's institution to cancel its subscription to CAO during a budget crisis in 2008–2009, a decision which, among other reasons, had been accompanied by "a comment that this information is 'on the web'".

The paper starts out with a literature review on the reliability of Wikipedia and then describes how the author compiled a list of 500 authors (mostly from the US and UK) by "examining curricula and textbooks from English literature courses across the USA" and soliciting additional suggestions from peers. These names were then searched on BRB, CAO (as part of the Literature Resource Center), Wikipedia and Google.

Regarding breadth of coverage, only six of the 500 names were "absent" on Wikipedia (meaning that they had "no entry of their own or reference in any other entry"), compared to 14 for CAO, and 50 for the Biography Reference Bank.

While the study does not seem to have attempted a systematic comparison of factual accuracy, it observes that Wikipedia "entries are less uniform than those in commercial databases. The biographical information ranges from extensive to perfunctory."

The author remarks favorably on Wikipedia's searchability:

"The databases and Wikipedia deal better than the Web with variant names, pseudonyms, and names that apply to multiple people. Cross-referencing is very good. [...] Wikipedia searching is very easy. There were even cases where it was easier to search Wikipedia than the databases. [...] Wikipedia also 'disambiguates' names and offers quick descriptions to enable the searcher to find the correct individual."

A large part of the comparison consists of examining each resource's production process. Wikipedians may find parallels to their policies on biographies of living people, self-published sources and notability in the description for the Biography Reference Bank:

"Current Biography [the main content source of BRB] articles rely on secondary sources, but Wilson [the then publisher] has occasionally spoken directly with subjects or their proxies. Upon publication, many articles have been sent to subjects for review before being updated for the print annual and the databases. If subjects raise objections, misinformation is corrected, but not matters of public record. Adjustments may be made for privacy, for example omitting the specific names of children.
"To be included in World Authors [another source of BRB], authors must have published more than one critically acclaimed book. [...]"
"For autobiographies, Wilson attempted to contact subjects in Junior Authors and World Authors for a statement, but not subjects in Current Biography. [... An example offered by a Wilson employee:] For some reason, Jennie Tourel, a Russian-American opera singer, often provided false information, but, according to the Wilson biography, 'passports and other documents that surfaced soon after her death helped to correct some of these inaccuracies'".

In the conclusion, the author answers the initial question by recommending that her employer "re-subscribe to a commercial biographical database" if the budget permits it again, because "Commercial databases provide a foundation with authoritative core content authenticated prior to publication and integrated with the fabric of information in the library’s holdings. They are easy to search and reliable, although they cannot be as current as Wikipedia or the Web because of their authentication processes. Wikipedia become [sic] more impressive as searching proceeded. The focus may be on verifiability rather than authority and there may be challenges in securing contributors, but the current contributors provide citations and often include unique information." All in all, she seems to favor Wikipedia and the two databases over "the web" (Google results), which "may have plenty of dross and be less reliable, harder to search, and focused on commercialism, but there are gold nuggets." She worries: "What will happen if contributors to Wikipedia and the web have no authoritative databases to use as sources?"

Students predict connections between Wikipedians

Among the student projects in a class on "Computational Analysis of Social Processes" at Rensselaer Polytechnic Institute, three analyzed social networks of Wikipedia editors.[3][4][5]

Language analysis finds Wikipedia's political bias moving from left to right

A study presented earlier this month at the annual meeting of the American Economic Association, and due to appear in The American Economic Review,[6] sets out to test whether the English Wikipedia is truly neutral by measuring bias within a sample of 28,000 entries about US political topics, examined over a decade. Bias is identified by detecting the use of language specific to one side of the American political scene (Democrats or Republicans). To quote from the article: "In brief, we ask whether a given Wikipedia article uses phrases favored more by Republican members or by Democratic members of Congress" (in the text of the 2005 Congressional Record, using a method developed in an earlier paper by Gentzkow and Shapiro, who applied it to newspapers). The authors identified, as of January 2011, 70,668 articles related to US politics, about 40% of which had a statistically significant bias. They find that Wikipedia articles are often biased upon creation, and that this bias rarely changes. Early in Wikipedia's history, most had a pro-Democratic bias, and while "by the last date, Wikipedia's articles appear to be centered close to a middle point on average", this results from a larger number of new pro-Republican articles rather than from existing articles having been rewritten neutrally.
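To make the phrase-based idea concrete, here is a minimal sketch, not the authors' actual procedure. It assumes each partisan phrase has already been given a score (positive for phrases favored by Republican members of Congress, negative for those favored by Democratic members) and takes an article's slant to be the count-weighted average of the scores of the phrases it contains. The phrase list, the scores, the sign convention and the function name are placeholders invented for illustration.

```python
from typing import Dict, Optional

# Placeholder phrase scores (assumption, not data from the study):
# positive = favored by Republican members, negative = by Democratic members.
PHRASE_SCORES: Dict[str, float] = {
    "death tax": 1.0,
    "estate tax": -1.0,
}


def slant_index(text: str, phrase_scores: Dict[str, float]) -> Optional[float]:
    """Count-weighted average score of the partisan phrases found in text.

    Returns None when the text contains none of the phrases, i.e. when
    this sketch has no basis for assigning a slant.
    """
    lowered = text.lower()
    total_hits = 0
    weighted_sum = 0.0
    for phrase, score in phrase_scores.items():
        hits = lowered.count(phrase)  # non-overlapping substring matches
        total_hits += hits
        weighted_sum += hits * score
    if total_hits == 0:
        return None
    return weighted_sum / total_hits


# Invented example text.
article_text = (
    "The estate tax, which critics call the death tax, was debated. "
    "The estate tax was later amended."
)
article_slant = slant_index(article_text, PHRASE_SCORES)
```

In this toy example the article uses the Democratic-coded placeholder twice and the Republican-coded one once, so the index comes out negative. A real analysis would also need a statistical test to decide whether an article's index differs significantly from zero, which this sketch omits.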

While the authors made efforts to exclude articles not pertinent to US politics (requiring the terms "United States" or "America" to appear at least three times in the article text), the sample also includes the clearly international article Iraq War. And in what Wikipedians may call out as systemic bias, the authors never question their assumption that for an international encyclopedia, a lack of bias would be indicated by the replication of the spectrum of opinions present in the US Congress. As early as 2006, Jimmy Wales objected to such notions with respect to the community of contributors: "If averages mattered, and due to the nature of the wiki software (no voting) they almost certainly don't, I would say that the Wikipedia community is slightly more liberal than the U.S. population on average, because we are global and the international community of English speakers is slightly more liberal than the U.S. population. ... The idea that neutrality can only be achieved if we have some exact demographic matchup to [the] United States of America is preposterous." Nevertheless, even if one turns the study on its head and reads it as a statement on average American political opinion compared to the rest of the world as reflected in the English Wikipedia, its results remain remarkable.

Briefly

References

  1. Danescu-Niculescu-Mizil, C., Lee, L., Pang, B., & Kleinberg, J. (2011). Echoes of power: Language effects and power differences in social interaction. http://arxiv.org/abs/1112.3670 (open access)
  2. Soules, A. (2012). Where’s the bio? Databases, Wikipedia, and the web. New Library World, 113(1/2), 77–89. Emerald Group Publishing Limited. DOI:10.1108/03074801211199068 (closed access)
  3. Lavoie, A. (2011). Interaction vs. Homophily in Wikipedia Administrator Selection. http://assassin.cs.rpi.edu/~magdon/courses/casp/projects/Lavoie.pdf (open access)
  4. Molnar, F. (2011). Link Prediction Analysis in the Wikipedia Collaboration Graph. http://assassin.cs.rpi.edu/~magdon/courses/casp/projects/Molnar.pdf (open access)
  5. George, R. (2011). Link prediction on a Wikipedia dataset based on triadic closure. http://www.cs.rpi.edu/~magdon/courses/casp/projects/George.pdf (open access)
  6. Zhu, Feng; Greenstein, S. (forthcoming). Is Wikipedia Biased? American Economic Review (Papers and Proceedings). http://www-bcf.usc.edu/~fzhu/wikipediabias.pdf (open access)
  7. Nielsen, F. A. (2011). Wikipedia research and tools: Review and comments. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6012 (working paper) (open access)
  8. Redondo-Olmedilla, J.-C. (2012). A Review of “Good Faith Collaboration: The Culture of Wikipedia.” The Information Society, 28(1), 53–54. Routledge. DOI:10.1080/01972243.2011.632286 (closed access)
  9. Colgrove, Caitlin; Neidert, Julia; Chakoumakos, R. (2011). Using Network Structure to Learn Category Classification in Wikipedia. http://www.stanford.edu/class/cs224w/proj/colgrove_Finalwriteup_v1.pdf (open access)
  10. Waller, V. (2011, November 8). Searching where for what: A comparison of use of the library catalogue, Google and Wikipedia. Library and Information Research. http://www.lirgjournal.org.uk/lir/ojs/index.php/lir/article/view/466 (open access)
  11. Yanxiang, X., & Tiejian, L. (2011). Measuring article quality in Wikipedia: Lexical clue model. 2011 3rd Symposium on Web Society (pp. 141–146). IEEE. DOI:10.1109/SWS.2011.6101286 (closed access)

Discuss this story

Very thorough job with this report. Pine (talk) 08:14, 31 January 2012 (UTC)

Second that. Nageh (talk) 08:32, 31 January 2012 (UTC)
Agreed. Very nice work providing explanations of these studies! Kaldari (talk) 22:27, 31 January 2012 (UTC)
Yes, very interesting; great work collecting the information and putting this together. MathewTownsend (talk) 01:09, 1 February 2012 (UTC)



       

The Signpost · written by many · served by Sinepost V0.9 · 🄯 CC-BY-SA 4.0