Wikipedia has a variety of counter-vandalism software tools developed by the community, which fall into two main categories: automatic detection (bots) and user assistance (editing applications). We briefly describe three notable bots and three notable editing applications that play a visible role in repairing vandalism on Wikipedia. Other counter-vandalism tools are detailed by the Counter-Vandalism Unit³², a community project dedicated to combating vandalism by training new counter-vandals and developing counter-vandalism tools.
Table 2.1 shows the number of bots that exist compared to users (before the year 2013) and how many of each were active in December 2012. In the next section, we further discuss the significance of bots and why research must consider them. We do not cover editing applications in this thesis because they are not solely focused on detecting vandalism, and related work is actively developing better user interfaces [West et al., 2010b; Halfaker et al., 2014].
• Notable bots:
– Anti-Vandal Tool³³ is a bot that monitors the feed of all edits on Wikipedia as they occur. It detects vandalism by matching words in the edit against a list of vandal words used in past vandalism cases.
– ClueBot³⁴ was the most active counter-vandal bot from 2007 to 2011. When this bot inspects an edit, it determines a score from a variety of pattern-matching heuristics that include large changes, mass deletes, controversial topics, targeted celebrities, incorrect redirects, vulgar words, minor sneaky changes (explained in Chapter 6), and others that are added as certain types of vandalism are discovered.
– ClueBot NG³⁵ is the successor to ClueBot and also the first Wikipedia counter-vandalism bot to use machine learning algorithms to improve the detection rate and lower false positives. ClueBot NG uses a combination
³² https://en.wikipedia.org/wiki/Wikipedia:Counter-Vandalism_Unit
³³ https://en.wikipedia.org/wiki/User:Lupin/Anti-vandal_tool
³⁴ https://en.wikipedia.org/wiki/User:ClueBot
§2.4 Wikipedia’s Counter-Vandalism Tools
URL https://es.wikipedia.org/w/index.php?title=Arte&diff=61972492&oldid=61972471
Title Arte
Editor PatruBOT
Vandalised Paragraph
Ezequiel es un artista feo, pero es mejor que un niño de 3 años, gusta de Ani y dado que su definición está abierta a múltiples interpretaciones, [. . . ]
Repaired Paragraph
La noción de arte continúa hoy día sujeta a profundas disputas, dado que su definición está abierta a múltiples interpretaciones, [. . . ]
Figure 2.5: An example case of vandalism on the Spanish Wikipedia.
URL https://fr.wikipedia.org/w/index.php?title=Algorithme&diff=79567568&oldid=79566816
Title Algorithme
Editor El Caro
Comment Révocation de vandalisme par 90.47.194.165 ; retour à la version de Lomita
Vandalised Paragraph
Une [[recette de cuisine]] est un trisomique. Elle en contient les éléments autistes
Repaired Paragraph
Une [[recette de cuisine]] est un algorithme. Elle en contient les éléments constitutifs
Figure 2.6: An example case of vandalism on the French Wikipedia.
URL https://ru.wikipedia.org/w/index.php?title=%D0%9F%D0%BE%D0%BB%D1%8C%D1%88%D0%B0&diff=1199673&oldid=1197217
Title Польша
Editor 83.21.20.7
Comment rev. vandalismul
Vandalised Addition
POLSKA PONAD WSZYSTKO, RUSKIE ŚCiERWA (Республика Польша [. . . ]
Repaired Paragraph
“Польша” (Республика Польша [. . . ]
of predefined rules, Bayesian classifiers, and artificial neural networks to generate a vandalism score for a revision, which is then passed through a threshold calculation and post-processing filters. Known vandalism instances are collected in a data set from which the bot learns models of vandalism. As the data set grows over time and new machine learning algorithms are added, ClueBot NG is expected to become more accurate in distinguishing vandalism. Some weaknesses of ClueBot NG are the lack of open or peer-reviewed research on its correctness in identifying vandalism, the discontent of editors wrongly accused of vandalism, and the focus of development mainly on the English Wikipedia, as seen in Table 2.2.
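To make the bot descriptions above concrete, the following is a minimal sketch of word-list matching, a mass-delete heuristic, and a score passed through a threshold. It is not the implementation of any of the bots described: the word list, the function names, the weights, and the threshold are all hypothetical, and a simple weighted sum stands in for the Bayesian classifiers and neural networks that ClueBot NG combines.

```python
import string

# Hypothetical word list; real tools maintain far larger, curated lists.
VANDAL_WORDS = frozenset({"stupid", "ugly", "idiot"})


def word_list_score(added_text: str) -> float:
    """Fraction of added tokens that appear in the vandal word list."""
    tokens = [t.strip(string.punctuation).lower() for t in added_text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in VANDAL_WORDS)
    return hits / len(tokens)


def mass_delete_score(old_len: int, new_len: int) -> float:
    """Fraction of the article removed by the edit (0.0 if it did not shrink)."""
    if old_len <= 0 or new_len >= old_len:
        return 0.0
    return (old_len - new_len) / old_len


def vandalism_score(added_text: str, old_len: int, new_len: int,
                    weights: tuple = (0.6, 0.4)) -> float:
    """Weighted sum of the two heuristics (weights are assumed, not learned)."""
    w_words, w_delete = weights
    return (w_words * word_list_score(added_text)
            + w_delete * mass_delete_score(old_len, new_len))


def is_vandalism(score: float, threshold: float = 0.2) -> bool:
    """Flag the revision when its score reaches the (assumed) threshold."""
    return score >= threshold
```

In this sketch, raising the threshold trades missed vandalism for fewer false positives, which is the trade-off the threshold calculation is meant to control.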
• Notable editing applications:
– Huggle³⁶ is a browser application that allows fast viewing of incoming edits. It allows users to identify vandalism or non-constructive edits, and to quickly revert them.
– STiki³⁷ is a cross-platform application for trusted users to detect and revert vandalism and other non-constructive edits. This application was developed from research [West et al., 2010b] and uses a variety of machine learning algorithms to identify potential vandalism for human editors to inspect. Importantly, it allows users to classify an edit into one of four categories: vandalism, good-faith revert, pass, and innocent. This classification feeds back into the algorithms to adjust their models.
– Snuggle³⁸ is a browser application designed to allow experienced editors to observe the activities of new editors and distinguish vandals from non-vandals. This application was developed from research [Halfaker et al., 2014] to address the decline in retention of new Wikipedia users. The interface provides four categories to classify edits, analogous to those of STiki, but also allows viewing of an editor’s editing history and personal messaging to provide feedback to (new) users.
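The feedback loop described for STiki can be sketched as follows. This is an illustrative sketch only: the class names and the mapping from the four categories to training labels are assumptions, not STiki's actual design. Here, "pass" is assumed to carry no label, and "good-faith revert" is assumed to count as a non-vandalism example.

```python
from enum import Enum


class Verdict(Enum):
    """The four categories a reviewer can assign to an edit."""
    VANDALISM = "vandalism"
    GOOD_FAITH_REVERT = "good-faith revert"
    PASS = "pass"
    INNOCENT = "innocent"


class FeedbackLog:
    """Collects reviewer verdicts so that a model can later be retrained."""

    def __init__(self) -> None:
        self._labels: list = []  # (revision_id, Verdict) pairs in review order

    def record(self, revision_id: int, verdict: Verdict) -> None:
        self._labels.append((revision_id, verdict))

    def training_examples(self) -> list:
        """Return (revision_id, is_vandalism) pairs.

        Assumption: 'pass' means the reviewer was undecided, so it is skipped;
        every other verdict becomes a positive or negative example.
        """
        return [
            (rev_id, verdict is Verdict.VANDALISM)
            for rev_id, verdict in self._labels
            if verdict is not Verdict.PASS
        ]
```

A usage example under the same assumptions:

```python
log = FeedbackLog()
log.record(101, Verdict.VANDALISM)
log.record(102, Verdict.PASS)
log.record(103, Verdict.INNOCENT)
examples = log.training_examples()  # pairs for revisions 101 and 103 only
```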