Wikipedia relies heavily on tools based on artificial intelligence (AI) to operate at its current scale. The use of AI is most apparent in counter-vandalism tools such as ClueBot NG, Huggle, and STiki, which together handle nearly all vandalism reversion on the English Wikipedia. These tools use machine-learned classifiers to automatically revert obvious vandalism or to triage likely damaging edits for human review. Arguably, they saved the Wikipedia community from being overwhelmed during the massive growth period of 2006–2007.
Unfortunately, developing and deploying such powerful AI is hard. A tool developer needs expertise in statistical classification, natural language processing, and advanced programming techniques, as well as access to hardware that can store and process large amounts of data. It is also relatively labor-intensive to maintain these AIs so that they stay current with the quality concerns of present-day Wikipedia. Likely because of these difficulties, AI-based quality control tools are available only for the English Wikipedia and a few other large wikis.
Our goal in the Revision Scoring project is to do the hard work of constructing and maintaining powerful AI so that tool developers don't have to. This cross-lingual, machine-learning-based classifier service for edits will support new wiki tools that need measures of edit quality.
We'll be making quality scores available via two different strategies: a web-based scoring service and a Python library. For example, a request to the scoring service for two English Wikipedia revisions returns a JSON document of predictions:
http://ores.wmflabs.org/scores/enwiki?models=reverted&revids=644899628|644897053
→
{"644899628":
{"damaging":
{"prediction": true,
"probability": {'true': 0.834253, 'false': 0.165747}
}
},
"644897053":
{"damaging":
{"prediction": false,
"probability": {'false': 0.95073, 'true': 0.04927}
}
}
}
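Tools can hit this service directly over HTTP. As a minimal sketch, the snippet below queries the endpoint shown above with the Python requests library; the URL and parameters are taken from the example, while the client code itself is only illustrative.

import requests

# Ask the scoring service for "reverted" predictions on two revisions.
response = requests.get(
    "http://ores.wmflabs.org/scores/enwiki",
    params={"models": "reverted", "revids": "644899628|644897053"})
scores = response.json()

# Each revision ID maps to per-model predictions and class probabilities.
for rev_id, model_scores in scores.items():
    print(rev_id, model_scores)

The second strategy is our revscoring Python library, which lets a tool generate the same kind of scores locally from a downloaded model file: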
from mw import api
from revscoring.extractors import APIExtractor
from revscoring.scorers import MLScorerModel

# Load a pre-trained model file and build an extractor that pulls revision
# data from the English Wikipedia API.
model = MLScorerModel.load(open("enwiki.damaging.20150201.model"))
api_session = api.Session("https://en.wikipedia.org/w/api.php")
extractor = APIExtractor(api_session, model.language)

# Extract each revision's feature values and score them with the model.
for rev_id in [644899628, 644897053]:
    feature_values = extractor.extract(rev_id, model.features)
    score = model.score(feature_values)
    print(score)
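Scoring revisions locally like this removes any runtime dependency on our web service; the trade-off is that a tool developer must download the model files and refresh them as we retrain the classifiers.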
We'll also provide raw labelled data for training new models.
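As a rough sketch of what that labelled data makes possible, the snippet below fits a classifier on a hypothetical labels.tsv file of revision ID and label pairs; it re-uses the extractor and feature list from the example above and reaches for scikit-learn directly rather than any particular revscoring training interface.

import csv

from sklearn.ensemble import GradientBoostingClassifier

# Read labelled revisions from a hypothetical tab-separated file where each
# row is a revision ID and a "True"/"False" damage label.
rev_ids, labels = [], []
with open("labels.tsv") as f:
    for rev_id, label in csv.reader(f, delimiter="\t"):
        rev_ids.append(int(rev_id))
        labels.append(label == "True")

# Convert each labelled revision into a feature vector, re-using the
# extractor and feature list defined in the previous example.
observations = [list(extractor.extract(rev_id, model.features))
                for rev_id in rev_ids]

# Fit an off-the-shelf classifier on the labelled feature vectors.
classifier = GradientBoostingClassifier()
classifier.fit(observations, labels)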
We've already completed our first milestone: replicating the state of the art in damage detection for the English, Turkish, and Portuguese Wikipedias. Over the next two months, we will build a hand-coding system and ask volunteers to help us label random samples of edits as "damaging" and/or "good-faith". These new datasets will help us train better classifiers. If you'd like to help us gather data or extend the scoring system to more languages, please say so on our talk page.
Discuss this story
Please exercise extreme caution to avoid encoding racism or other biases into an AI scheme. For example, there are some editors who have a major bias against having articles on every village in Pakistan, even though we have articles on every village in the U.S. Any trace of the local writing style, like saying "beautiful village", or naming prominent local families, becomes the object of ridicule for these people. Others can object to that, however. But AI (especially neural networks, but any little-studied code really) offers the last bastion of privacy. It's a place for making decisions and never mind anybody how that decision was decided. My feeling is that editors should keep a healthy skepticism - this was a project meant to be written, and reviewed, by people. Wnt (talk) 12:58, 20 February 2015 (UTC)