Wikipedia Reference Desk quality analyzed

In an article published recently in the Journal of Documentation,^[1] library researcher Pnina Shachaf analyzed the quality of answers at the Wikipedia Reference desk. The reference desk serves as an open forum for visitors to ask questions about any topic not directly related to Wikipedia itself (the Help desk answers questions about the site); anyone is welcome to help answer questions. This paper is the first to study Wikipedia's reference desk. It found that Wikipedia volunteers performed as well as or better than traditional library reference desk services on most quality measures, providing a similar level of service.

Shachaf's study analyzes the quality of Wikipedia Reference desk answers in the context of other, similar studies. It reviews earlier studies of online question and answer boards, such as Yahoo! Answers, and briefly reviews classic studies of the effectiveness of traditional library reference services.^[2] Shachaf notes that collaborative Q&A sites are a new model for reference, and that research examining their quality is still new and rarely takes account of findings from traditional reference research.

The study uses content analysis to assess reference desk answers on three measures: reliability ("a response that is accurate, complete, and verifiable"); responsiveness ("promptness of response"); and assurance ("a courteous signed response that uses information sources") (p. 982). These are based on the SERVQUAL measures, which have been used extensively in other studies of library reference services. They also map to the basic guidelines given to question-answerers on the Wikipedia reference desk (for instance, to sign responses).

The data sample was drawn from April 2007. The analysis covered 77 questions with a total of 357 responses, out of the 2,095 questions received that month (an average of 299 transactions for each of the seven topical reference desks). Shachaf notes that "on average, the Wikipedia Reference Desk received 70 requests per day and users provided an average of 4.6 responses for each request" (p. 980). Shachaf first analyzed whether the questions were asked and answered by "experienced" users (defined for the purposes of this study as editors with a userpage) or "novice" users (those without a userpage), finding that 85% of answers were provided by "experienced" users.
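The figures quoted above can be cross-checked with simple division. The following is a minimal sketch, not part of Shachaf's paper: the constant and function names are made up for illustration, and the 30-day divisor is only the assumption that the daily average was taken over the whole of April. The responses-per-question ratio here is computed from the 77-question sample, which is consistent with the 4.6 figure quoted, though the article does not say how that figure was derived.

```python
# Figures as quoted in the article; all names here are illustrative only.
TOTAL_QUESTIONS = 2095    # questions received in April 2007
DAYS_IN_APRIL = 30        # assumption: daily average taken over the full month
DESKS = 7                 # topical reference desks
SAMPLED_QUESTIONS = 77    # questions in the analyzed sample
SAMPLED_RESPONSES = 357   # responses to the sampled questions


def summarize():
    """Return the derived averages as plain floats."""
    return {
        "questions_per_desk": TOTAL_QUESTIONS / DESKS,
        "questions_per_day": TOTAL_QUESTIONS / DAYS_IN_APRIL,
        "responses_per_question": SAMPLED_RESPONSES / SAMPLED_QUESTIONS,
        "sample_share": SAMPLED_QUESTIONS / TOTAL_QUESTIONS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Rounded, these reproduce the 299 transactions per desk, roughly 70 requests per day, and 4.6 responses per request given above; the sample itself covers a little under 4% of the month's questions.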

Of the questions analyzed, Shachaf found that most were answered quickly (on average, the first response was given after four hours); that answers were signed with Wikipedia usernames; and that 92% of the questions were given a partial or complete answer, with 63% answered completely. Of the factual questions where the coders were able to determine accuracy, 55% of the answers were accurate, 26% were not accurate, and in 18% of the cases no consensus was reached on the reference desk. The 55% figure is comparable to studies of the accuracy of traditional one-on-one reference.^[3]

The sources used in reference desk answers were also examined, based on a sample of 210 interactions. Wikipedia articles were referred to in 93% of these transactions and account for 44% of the references listed. Sources such as journals, databases, and books were very rarely used. This is a major difference from answers provided in traditional library reference services; librarians tend to use and cite sources, including traditional information sources such as journals and databases.

Shachaf compares these statistics to traditional library reference services. Overall, answers at the Wikipedia reference desk are comparable to library reference services in accuracy; responses are, on average, posted more quickly than libraries reply to emails; questions are answered more completely than via library virtual reference services; and thank-yous from question askers are received at the same rate. The conclusion is that "The quality of answers on the Wikipedia Reference Desk is similar to that of traditional reference service. Wikipedia volunteers outperformed librarians or performed at the same level on most quality measures" (p. 989).

However, Shachaf cautions that these results are only achieved in the aggregate. Shachaf writes:

...while the amalgamated (group) answer on the Wikipedia Reference Desk was as good as a librarian's answer, an amateur did not answer at the same level as an expert librarian. Answering requests in this amateur manner creates a forest of mediocrity, and, at times, the "wisdom" of the crowd, not of individuals, reaches a higher level. For a user whose request received more than four answers, sorting out the best answer becomes a time consuming task. ... The quality of an individual message did not provide answers at the same level as individual librarians do, but an aggregated answer made it as accurate as a librarian's answer (pp. 988–989).

Shachaf offers some ideas as to why the all-volunteer Wikipedia Reference Desk service might work as well as library reference services: that experienced question-answerers gain practice in answering reference questions, much as professional librarians do; that the wiki itself is conducive to providing collaborative question-answering services (more so than most software used for library reference); that the type of questions being asked may differ between Wikipedia and libraries; and that (according to Shachaf, the most likely possibility) the collaborative aspects of the service, where answers can be expanded on, improved and discussed, help improve answer quality. She concludes that more research is needed into the nature of online Q&A boards staffed by volunteers.

Notes

[1] Shachaf, Pnina (2009). "The paradox of expertise: is the Wikipedia Reference Desk as good as your library?" Journal of Documentation, v. 65 (6), pp. 977–996. Not available freely online.
[2] Traditional library reference is understood in this context as a questioner interacting one-on-one with a professionally trained librarian, either in person at a library reference desk or via email, chat, or phone.
[3] Note that it is quite difficult to determine accuracy for most reference transactions, since answers may be partially accurate or have qualitative situational differences; 55% accuracy is a standard estimate based on studies of in-person reference desk interactions in the 1980s (citation to Hernon and McClure (1986) and later analyses given by Shachaf).

1 March 2010

Discuss this story


Cool. I will use the reference desk for whatever I need now! 70.171.224.249 (talk) 23:43, 2 March 2010 (UTC)

The criteria for a "novice/experienced" user are extremely flawed. As a result, that aspect of the analysis is pretty much useless.--Rockfang (talk) 01:16, 3 March 2010 (UTC)
Agree with Rockfang. Nearly 6 months editing/≈2000 edits, ≈342 on the Ref. Desk, & I don't have a user page (yet). Novice? Almost a Journeyman Editor! --220.101.28.25 (talk) 06:01, 3 March 2010 (UTC)

As great as that is, I certainly hope the 2010 RD regulars have been doing better than the 2007 ones. (average of four hours to answer?!) ALI ^{nom nom} 18:03, 3 March 2010 (UTC)

I think I read on the Ref Desk Talk page discussion of this paper that the author analyzed the first 10 questions on a specific day from each of 7 desks. This could create some sort of time zone effect affecting the speed of response, depending on when the date header rollover is w.r.t. when most editors are active.

But anyway, using that metric, and starting with the March 1 section headers (and without exhaustively going into how correct the answers are):

The Computing desk scores 3 hours 30 minutes on average (though there is one question still unanswered that I didn't include). Six questions were answered in under 20 minutes, one in 85 minutes, and the two that pulled the average down took more than 12 hours each.
The Humanities desk (disclaimer: I answered one of these, but long before thinking of this analysis) scores 3 hours 35 minutes on average (though again there is one question still unanswered). Only two questions were answered in under an hour.

Out of time to do any more, but I think a four hour average is better than it first sounds. Best, WikiJedits (talk) 17:56, 4 March 2010 (UTC)

I was surprised to see that this research paper by Shachaf is not online. I wonder why? Ottawahitech (talk) 20:10, 3 March 2010 (UTC)

I wonder if they also analysed Q&A sites from the StackOverflow family. It seems to me their method of ranking answers would probably be the best way to avoid the time consuming task of choosing the best answer, as they mention. Also, I believe they'd have fared well, because of their smart karma system. --Waldir ^talk 08:04, 4 March 2010 (UTC)

I suspect that in traditional library reference questions, we underperform, and in more expertise-related questions, we may overperform, due to the rather high level of talent and knowledge some of our reference desk regulars have displayed (over at math there are several people who are very clearly professional mathematicians fielding some of the questions). Ray^Talk 15:13, 4 March 2010 (UTC)

Same goes for the language desk; a traditional librarian is not likely to speak as many languages as all of our answerers put together do. -- Александр Дмитрий (Alexandr Dmitri) (talk) 19:46, 5 March 2010 (UTC)