Signpost investigation: code review times

Signpost investigation: code review times

Code review figures mixed, improving

The chart that accompanied last month's story about code review times, the vagaries of which prompted the present investigation.

Late last month, the "Technology report" included a story using code review backlog figures – the only code review figures then available – to construct a rough narrative about the average experience of code contributors. This week, we hope to go one better, by looking directly at code review wait times, and, in particular, median code review times

To this end, the Signpost independently analysed data from the first 23,900 changesets as they stood on September 17, incorporating some 66,500 reviews across 32,100 patchsets. From this base, changes targeted at branches other than the default "master" branch were discarded, as were changesets submitted and reviews performed by bots. Self-reviews were also discarded, but reviews made by a different user in the form of a superseding patch were retained. Finally, users were categorised by hand according to whether they would be best regarded as staff or volunteers.^{[nb 1]} Although this week's article focuses mainly on so-called "core" MediaWiki code, future issues will probe extension-related statistics.

WMF bosses will, on the whole, be pleased with the final figures. 50% of revisions to core MediaWiki code submitted during August was reviewed for the first time in just 3 hours 30 minutes, with 25% being reviewed in 20 minutes and 75% within 27 hours. These figures were similar across both first patchsets and later amendments, and similar with regard to slight changes in what qualified as "a review".^{[nb 2]} The relevant trend over time is considered in the following tables. On the left is review time across all patchsets submitted to core; with the right hand table, just the first patchset in any given changeset is included.

Month	25%	Median	75%	Current mean
May	42 minutes	4 hours and 25 minutes	1 day, 11 hours and 27 minutes	3 days, 3 hours and 38 minutes
June	47 minutes	19 hours and 10 minutes	3 days, 16 hours and 45 minutes	5 days, 8 hours and 29 minutes
July^{[nb 3]}	39–40 minutes	7 hours and 4–8 minutes	2 days, 5–9 hours	2 days, 16 hours and 38 minutes
August^{[nb 3]}	20–21 minutes	3 hours and 11–29 minutes	21–24 hours	1 day, 11 hours and 52 minutes

Month

25%

Median

75%

Current mean

May

42 minutes

4 hours and 25 minutes

1 day, 11 hours and 27 minutes

3 days, 3 hours and 38 minutes

June

47 minutes

19 hours and 10 minutes

3 days, 16 hours and 45 minutes

5 days, 8 hours and 29 minutes

July^{[nb 3]}

39–40 minutes

7 hours and 4–8 minutes

2 days, 5–9 hours

2 days, 16 hours and 38 minutes

August^{[nb 3]}

20–21 minutes

3 hours and 11–29 minutes

21–24 hours

1 day, 11 hours and 52 minutes

Month	25%	Median	75%	Current mean
May	38 minutes	3 hours and 27 minutes	1 day, 5 hours and 4 minutes	2 days, 1 hour and 58 minutes
June	45 minutes	12 hours and 34 minutes	2 days, 13 hours and 31 minutes	3 days, 7 hours and 39 minutes
July	22 minutes	3 hours and 16 minutes	1 day, 7 hours and 21 minutes	1 day, 17 hours and 18 minutes
August	19 minutes	3 hours and 33 minutes	19 hours and 50 minutes	1 day, 1 hour and 26 minutes

Month

25%

Median

75%

Current mean

May

38 minutes

3 hours and 27 minutes

1 day, 5 hours and 4 minutes

2 days, 1 hour and 58 minutes

June

45 minutes

12 hours and 34 minutes

2 days, 13 hours and 31 minutes

3 days, 7 hours and 39 minutes

July

22 minutes

3 hours and 16 minutes

1 day, 7 hours and 21 minutes

1 day, 17 hours and 18 minutes

August

19 minutes

3 hours and 33 minutes

19 hours and 50 minutes

1 day, 1 hour and 26 minutes

The data show, then, that there has been a marked improvement in getting followup patchsets reviewed quicker, while review times for "first attempt" patchsets have improved less dramatically. Other analyses are more concerning. For example, a volunteer-written patchset waits, on average (either median or mean) twice as long as a staff-written one for its first review, although the gap has closed from three times as long in June and July. Staff provide 86% of first reviews for core, with just five staff members collectively accounting for some 55% of the total.^{[nb 4]} Moreover, even in August, more than 5% of patchsets targeted at core waited a week for their first review.

As with all large datasets, it is difficult to rule out subtle methodological issues and in any case unideal to pinpoint trends over as short a period as four months. The full data set is available upon request.

Notes

^ One notable side effect of the methodology employed was the exclusion from the final analysis of patches that were abandoned or amended without review, even if the user had intended for them to be so reviewed and/or the amendments were minimal. Future analyses may wish to refine their methodology to take this into account.
^ Specifically, the in/exclusion of reviews not assigning numerical scores and the in/exclusion of force abandonments and reversions.
^ ^a ^b Ranges indicate the possible impact of patchsets still awaiting a review.
^ The equivalent figures for core plus WMF-deployed extensions are 95% and 43% respectively.

In brief

Signpost poll

External sites

You can now give your opinion on next week's poll: In my view, the WMF's code review priority should be...

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

HTML5 problems and tolerating invalidity: After initial reports last week of problems with cell alignment, this week brought fresh reports of HTML5 problems centred on Wikipedia "pushpin" maps. The maps, which show given co-ordinates overlaid on a background map, are highly sensitive to single-figure changes in display position, which seems to be the source of the present difficulties. All broken templates are expected to be fixed by hand, although it is unclear how easily or reliably this small degree of brokenness can be identified. In related news, a consensus appeared to form on the wikitech-l mailing list in support of outputting "invalid" (but still functioning) HTML5 in preference to subject it to an automated but error-prone automatic conversion.

Search data published, retracted: The publication of "anonymous search log files for Wikipedia and its sister projects" was halted and reversed on Wednesday after the anonymised data was found to contain a limited amount of personal information (Wikimedia blog). Email addresses, credit card numbers and social security numbers had previously been removed from the dataset, which was intended to (among other things) "provide valuable feedback to our editor community, who can use it to detect topics of interest that are currently insufficiently covered". The team behind the release does not yet know when the same data might be released in a more fully anonymised form; there was also no indication in the blog post of how many copies are thought to be circulating.

French Wikipedia told: do worry about performance: The recent creation of a series of templates designed to map area codes to values on the French Wikipedia caused a significant detrimental effect on overall performance, reports Lead Performance Architect Tim Starling. The templates, which provoked the first direct WMF intervention in template content in recent times, used a switch statement that included nearly 40,000 items, all of which had to be loaded into memory on each template invocation. Using Lua, due for release in January, would partially ameliorate the situation, it was reported. "At some point in the future, you may be able to put this kind of geographical data in Wikidata. Please, template authors, wait patiently", wrote Starling, while Wikidata Project Director Denny Vrandečić added that he was "at the same time delighted ... by the capability and creativity of the Wikipedia community to solve such tasks with MediaWiki template syntax and ... horrified by the necessity of the solution taken".

Your preference over preferences?: The thorny issue of rethinking the options available at Special:Preferences was raised again this week with the creation of an RfC looking at the issue (wikitech-l mailing list). Even slight amendments to certain preferences have previously caused controversy; for example, the removal of preferences tends to provoke anger among those users reliant upon them, while the renaming or repositioning of preferences tends to cause confusion. The work links in with moves to create "global" gadgets, which could then replace static preferences such as alterations to link colours.

An example of post-edit feedback from the first trial

Results in from first "post-edit feedback" trial: The results from the first trial of post-edit notifications (example pictured) have now been published on the Wikimedia blog. The trial, undertaken by the Foundation's "Editor Engagement Experiments" (E3) team, trialled displaying the messages to new editors after every save, finding an overall increase in quantity of edits and no change in editor quality. The same team is now testing messages only after "milestone" edit counts have been reached, to compare the effect on enticing new editors to edit more. In related news, the Foundation's "page curation" project saw its first stable release this week, with users given the green light to use Special:NewPagesFeed and its toolbar on a day-to-day basis.

← Previous "Technology report"

Next "Technology report" →

In this issue

24 September 2012 (all comments)

Discuss this story

These comments are automatically transcluded from this article's talk page. To follow comments, add the page to your watchlist. If your comment has not appeared here, you can try purging the cache.

For the search logs - that's kind of embarrassing especially given that I'm pretty sure its a re-occouring topic on wikitech-l about how we do not release search logs for the very reason they were taken down... Bawolff (talk) 18:43, 26 September 2012 (UTC)[reply]
- The data should be available for researchers. The benefits for the community are significant. To avoid spam harvesting, the data could be put behind a registration wall or such. --_{Piotr Konieczny aka Prokonsul Piotrus| reply here} 19:43, 26 September 2012 (UTC)[reply]

- It's more than embarrassing. Every time this topic (releasing search logs) has ever come up, the AOL search data leak is mentioned. It seems Wikimedia wanted to join the club? I'm kind of shocked. --MZMcBride (talk) 23:05, 26 September 2012 (UTC)[reply]

Non-difference in PEF-1 success rate

There was no significant difference in the success rate of editors in the PEF-1 experiment. When I arrived, this report stated that there was an insignificant decrease in quality. This is both false and misleading. There was actually an insignificant increase in quality in the reported statistics. However, this detail is irrelevant since the statistical test failed to identify a meaningful difference and other analyses from the report reversed the comparison. The best we can report to a lay audience is that no change was observed. See meta:Research:Post-edit_feedback/PEF-1. --EpochFail^(talk|work) 19:55, 26 September 2012 (UTC)[reply]

Quite right, I misread the chart. - Jarry1250 ^{[Deliberation needed]} 21:20, 26 September 2012 (UTC)[reply]