Chapter II: The Digital Information Ecosystem in South America
3.1 Argentina: Case Analysis
3.1.2 Cosecha Roja
One of the earliest studies of free-text keystroke dynamics was conducted by Gaines et al. [39] in 1980. Data were collected from six professional typists, each of whom typed three passages of text on two occasions four months apart. The three passages consisted of ordinary English text, a collection of random words and a collection of random phrases. The data from the three passages were pooled, because each passage on its own provided too little information. The study relied on the down-down time of di-graphs that occurred at least 10 times, and di-graph latencies exceeding 500 milliseconds were removed as outliers. Classification was performed with a two-sample t-test for each user, giving an FAR of zero and an FRR of about 4%. In a further refinement, only five core di-graphs (“in”, “io”, “no”, “on” and “ul”) were used in testing, which yielded perfect authentication performance, with zero FAR and zero FRR. Although these results are very encouraging, the number of participants was very small.
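The per-user t-test described above can be sketched as follows. This is a minimal illustration, not the procedure of Gaines et al.: the 500 ms outlier cut-off and the minimum of 10 occurrences come from the text, but the critical value `t_crit` and the rule that a fraction `min_agree` of di-graphs must pass are hypothetical parameters chosen for the example.

```python
import math
from statistics import mean, variance

OUTLIER_MS = 500  # latencies above 500 ms are discarded as outliers, as in the study
MIN_COUNT = 10    # a di-graph must occur at least 10 times to be tested


def clean(latencies):
    """Remove outlier latencies (> 500 ms)."""
    return [x for x in latencies if x <= OUTLIER_MS]


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if sp2 == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))


def same_typist(ref, test, t_crit=2.1, min_agree=0.8):
    """ref, test: dict mapping di-graph -> list of down-down latencies (ms).

    HYPOTHETICAL decision rule: accept when at least `min_agree` of the tested
    di-graphs show no significant difference (|t| < t_crit)."""
    tested = agree = 0
    for dg in ref.keys() & test.keys():
        a, b = clean(ref[dg]), clean(test[dg])
        if len(a) < MIN_COUNT or len(b) < MIN_COUNT:
            continue  # too few occurrences to test this di-graph
        tested += 1
        if abs(pooled_t(a, b)) < t_crit:
            agree += 1
    return tested > 0 and agree / tested >= min_agree
```

Restricting `ref` and `test` to the five core di-graphs reproduces the refined variant of the experiment in spirit, since only those keys would then be tested.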
Similarly, Umphress and Williams [34] represented each user's profile by the mean and standard deviation of all latencies. Samples containing errors, as well as those with a standard deviation above 0.75, were discarded. Only the first six letters of each word were considered, consistent with the findings of Card et al. [93]. The study applied two tests. The first compared each latency in the test data with the corresponding di-graph in the profile matrix: a latency within 0.5 standard deviations of the mean was counted as a valid di-graph, and the ratio of valid di-graphs determined whether the test data were accepted as coming from the genuine user. The second test applied a two-tailed t-test to the overall means. Seventeen participants each took two typing tests several days apart: the first, of 1400 characters, formed the user profile, and the second, of 300 characters, served as the test sample. The study reported an FRR of 12% and an FAR of 6%.
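The first test of Umphress and Williams, the ratio of valid di-graphs, can be sketched as below. The 0.5 standard-deviation band follows the text; the acceptance ratio `min_ratio` is a hypothetical value, since the text does not state the threshold used.

```python
from statistics import mean, stdev


def build_profile(samples):
    """samples: dict di-graph -> list of latencies (ms) from the profile text.
    Returns dict di-graph -> (mean, standard deviation)."""
    return {dg: (mean(v), stdev(v)) for dg, v in samples.items() if len(v) >= 2}


def valid_ratio(profile, test_latencies, k=0.5):
    """Fraction of test latencies lying within k standard deviations of the
    profile mean for the same di-graph; di-graphs absent from the profile are skipped."""
    checked = valid = 0
    for dg, latency in test_latencies:
        if dg not in profile:
            continue
        mu, sd = profile[dg]
        checked += 1
        if abs(latency - mu) <= k * sd:
            valid += 1
    return valid / checked if checked else 0.0


def accept(profile, test_latencies, min_ratio=0.6):
    """HYPOTHETICAL threshold: accept when enough di-graphs are valid."""
    return valid_ratio(profile, test_latencies) >= min_ratio
```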
The R measure, or degree of disorder, was first introduced by Bergadano et al. [73] to quantify the distance between two samples based on their tri-graph durations, where the tri-graph duration is the time elapsed between pressing the first and the last key of a three-key sequence. The tri-graphs shared by the two samples are ordered by duration in each sample, and the degree of disorder is the sum, over all tri-graphs, of the distances between each tri-graph's positions in the two orderings. Forty-four volunteers each typed a 683-character text five times, producing 220 samples that were used for genuine testing. In addition, 154 participants produced 71500 samples for impostor testing. Overall, the study reported 0% FAR and 2.3% FRR.
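A minimal sketch of the degree of disorder is shown below. It divides the raw sum by the maximum disorder attainable for n elements (n²/2 for even n, (n²−1)/2 for odd n, reached when one ordering is the reverse of the other), which is the usual way the measure is normalised to [0, 1]; breaking ties between equal durations alphabetically is an assumption made for determinism.

```python
def degree_of_disorder(s1, s2):
    """R distance between two samples.

    s1, s2: dict tri-graph -> duration (ms). Only tri-graphs present in both
    samples are compared. Returns a value in [0, 1]."""
    shared = s1.keys() & s2.keys()
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared tri-graphs")
    # Order the shared tri-graphs by duration in each sample (ties: alphabetical).
    order1 = sorted(shared, key=lambda g: (s1[g], g))
    order2 = sorted(shared, key=lambda g: (s2[g], g))
    pos2 = {g: i for i, g in enumerate(order2)}
    # Sum of position differences of every tri-graph between the two orderings.
    disorder = sum(abs(i - pos2[g]) for i, g in enumerate(order1))
    # Maximum disorder for n elements (one ordering fully reversed).
    max_disorder = n * n // 2 if n % 2 == 0 else (n * n - 1) // 2
    return disorder / max_disorder
```

For example, swapping the two middle tri-graphs of a four-element ordering gives a raw disorder of 2 out of a maximum of 8, i.e. an R distance of 0.25.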
One of the most cited free-text studies is that of Gunetti and Picardi [8], who aimed to refine the algorithm of Bergadano et al. [73]. The study relied on two measures. The first was the relative measure (R measure), the degree of disorder between two samples introduced by Bergadano et al. [73]. However, the R measure alone is not always sufficient: if every di-graph in one sample is typed exactly twice as fast as in the other, both samples produce the same ordering, the distance between them is zero, and the measure fails to tell them apart. The absolute measure (A measure) was therefore introduced to capture the absolute difference in timing between the two samples. For both measures, only the down-down times of n-graphs occurring in both typing samples were considered; unlike [73], which used only tri-graphs, these n-graphs included di-graphs, tri-graphs and longer n-graphs. Forty participants each provided fifteen typing samples over a period of six months. Two samples are judged similar, and therefore typed by the same person, only when both their A and R distances are small. The authors experimented with several combinations of A and R measures over various n-graphs, and their best result was an FRR of 5% and an FAR of 0.005%. Although these results are very good, identifying a user was computationally expensive, because each test sample had to be compared with every user's template in the database, which limits scalability.
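The weakness of the R measure, and how an absolute measure addresses it, can be illustrated with the sketch below. The A distance here counts a shared n-graph as "similar" when the ratio of its longer to its shorter duration is at most a threshold `t`; this ratio-based formulation and the value `t = 1.25` are assumptions for illustration, not necessarily the exact parameters of [8].

```python
def a_distance(s1, s2, t=1.25):
    """A distance between two samples (dict n-graph -> positive duration in ms).

    An n-graph is 'similar' in both samples when max/min of its durations <= t
    (ASSUMED threshold). Returns 1 - (similar n-graphs / shared n-graphs)."""
    shared = s1.keys() & s2.keys()
    if not shared:
        raise ValueError("no shared n-graphs")
    similar = sum(1 for g in shared if max(s1[g], s2[g]) / min(s1[g], s2[g]) <= t)
    return 1 - similar / len(shared)


def same_ordering(s1, s2):
    """True when the shared n-graphs are ranked identically, i.e. the R distance is zero."""
    shared = s1.keys() & s2.keys()
    return (sorted(shared, key=lambda g: (s1[g], g))
            == sorted(shared, key=lambda g: (s2[g], g)))


sample = {"th": 120, "he": 90, "ing": 250}
twice_as_fast = {g: d / 2 for g, d in sample.items()}
print(same_ordering(sample, twice_as_fast))  # True: R cannot separate the samples
print(a_distance(sample, twice_as_fast))     # 1.0: A flags every n-graph as different
```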
Hu et al. [115] addressed the scalability problem of Gunetti and Picardi's method [8] using a k-nearest neighbour classifier. Training samples were grouped into clusters, so that each test sample was compared only with the samples of users in the same cluster. This modification achieved accuracy comparable to that of [8] while improving computation speed by 66.7%.
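The idea of restricting comparisons to one cluster can be sketched as follows. This is a generic illustration, not the method of Hu et al.: clusters are assumed to be given in advance, each is represented by the centroid of its feature vectors, and plain Euclidean distance (`math.dist`) stands in for the A and R distances used in [8] and [115].

```python
import math
from collections import Counter


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def identify(clusters, test, k=3, distance=math.dist):
    """clusters: list of clusters, each a list of (user_id, feature_vector).

    1. Route the test vector to the cluster whose centroid is nearest.
    2. Run k-nearest neighbour voting only on that cluster's samples.
    Returns (predicted_user, number_of_samples_compared)."""
    best = min(clusters, key=lambda c: distance(centroid([v for _, v in c]), test))
    neighbours = sorted(best, key=lambda s: distance(s[1], test))[:k]
    votes = Counter(user for user, _ in neighbours)
    return votes.most_common(1)[0][0], len(best)
```

The saving comes from the second step: only the samples in one cluster are compared in full, instead of every template in the database.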
Davoudi and Kabir also carried out a number of modifications to the method of Gunetti and Picardi [8]. In [116], they combined the R and A measures with a distance that used histogram-based density estimation to estimate the probability density function of each di-graph's duration; this modification resulted in five false rejections and nine false acceptances. In [117], they modified the relative distance: the di-graph with the largest difference in duration between the two samples is selected first, and the difference between its positions in the two orderings is computed; it is then removed from both timing vectors, and the remaining vectors are sorted again. This resulted in 0.08% FAR and 18.8% FRR. In [118], they added a weight factor to each di-graph when computing the relative distance, defined as the ratio of the di-graph's number of occurrences to its standard deviation. This resulted in 0.07% FAR and 15.2% FRR. All of these modifications were evaluated on a subset of the data collected by Gunetti and Picardi [8], comprising 21 participants with 15 samples each.
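The iterative relative distance of [117] can be sketched as below, reading the description as a loop that repeats until no shared di-graphs remain. The function returns the raw sum of position differences because the text does not specify a normalisation; tie-breaking (alphabetical) is an assumption.

```python
def modified_relative_distance(s1, s2):
    """Iterative relative distance between two samples (dict di-graph -> duration).

    Repeatedly: sort the remaining shared di-graphs by duration in each sample,
    pick the di-graph whose durations differ most, add the gap between its
    positions in the two orderings, then remove it before re-sorting."""
    shared = set(s1) & set(s2)
    total = 0
    while shared:
        order1 = sorted(shared, key=lambda g: (s1[g], g))
        order2 = sorted(shared, key=lambda g: (s2[g], g))
        # Di-graph with the largest duration difference (ties: alphabetical first).
        worst = max(sorted(shared), key=lambda g: abs(s1[g] - s2[g]))
        total += abs(order1.index(worst) - order2.index(worst))
        shared.remove(worst)
    return total
```

With `s1 = {"ab": 100, "bc": 200, "cd": 300}` and `s2 = {"ab": 350, "bc": 200, "cd": 300}`, the unnormalised degree of disorder is 4, because the one slow di-graph also displaces the other two; the iterative version removes `"ab"` first and returns 2.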
Bours and Barghouthi [77] investigated continuous authentication using a penalty-reward function for long free-text input. They used the duration of a single key and the latency between two successive keys as timing features. Only the durations and latencies of keys that occurred more than 50 times, and whose mean and standard deviation were below predefined values, were added to the user's template. The penalty-reward function started at zero; it increased when the distance between the test sample and the user's template exceeded a threshold, and decreased otherwise. When its value exceeded a second threshold, the user was denied further access to the system. The experiment used data from 25 volunteers, collected over at least six days. The average number of keystrokes an impostor typed before being locked out ranged from 79 to 348.
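The penalty-reward mechanism described above can be sketched as a running score over per-keystroke distances. The zero start value and the two thresholds follow the text; the unit penalty and reward sizes and the floor at zero are assumptions, since the text does not specify them.

```python
def keystrokes_until_lockout(distances, match_threshold, lockout_threshold,
                             penalty=1.0, reward=1.0):
    """distances: per-keystroke distance between the observed timing and the template.

    Returns the number of keystrokes processed before lock-out, or None if the
    user is never locked out."""
    score = 0.0  # zero start-up value
    for n, d in enumerate(distances, start=1):
        if d > match_threshold:
            score += penalty  # keystroke does not match the template
        else:
            # ASSUMPTION: the score is not allowed to drop below zero.
            score = max(0.0, score - reward)
        if score > lockout_threshold:
            return n
    return None
```

The number returned for an impostor's keystroke stream corresponds to the "keystrokes before lock-out" metric reported in the study.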
Most previous work did not fully exploit keystroke dynamics, in the sense that it did not consider key pairs defined over groups of keys. Park et al. [69] therefore divided all keystrokes into four groups: left-hand keys, right-hand keys, the space bar and the backspace key. Di-graphs were then formed from combinations of these groups, giving sixteen key-pair features, e.g. left-hand key & space bar, left-hand key & backspace, etc. Two samples were compared using these key pairs, considering only those that appeared more than ten times, and the comparison itself was performed with the Kolmogorov-Smirnov test (KS-test). Thirty-five users participated, each typing two page-length news articles. The reported EER of 0.0892% indicates that the method substantially improved the performance of keystroke dynamics authentication.
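A sketch of the grouped key-pair comparison is shown below. The four groups and the "more than ten appearances" rule follow the text; the exact left/right key split, the use of press-to-press latency, and averaging the per-pair KS statistics into a single score are assumptions for illustration.

```python
from collections import defaultdict

# ASSUMED left/right split of a QWERTY keyboard.
LEFT = set("12345qwertasdfgzxcvb")
RIGHT = set("67890yuiophjklnm")


def key_group(key):
    """Map a key to L (left hand), R (right hand), S (space) or B (backspace)."""
    if key == " ":
        return "S"
    if key == "\b":
        return "B"
    k = key.lower()
    if k in LEFT:
        return "L"
    if k in RIGHT:
        return "R"
    return None


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def pair_latencies(events):
    """events: list of (key, press_time_ms). Returns group pair (e.g. 'LS') -> latencies."""
    out = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        g1, g2 = key_group(k1), key_group(k2)
        if g1 and g2:
            out[g1 + g2].append(t2 - t1)
    return out


def sample_distance(ref_events, test_events, min_count=11):
    """Mean KS statistic over key-pair groups seen more than ten times in both samples."""
    ref, test = pair_latencies(ref_events), pair_latencies(test_events)
    stats = [ks_statistic(ref[p], test[p]) for p in ref.keys() & test.keys()
             if len(ref[p]) >= min_count and len(test[p]) >= min_count]
    return sum(stats) / len(stats) if stats else None
```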
Similarly to Park et al. [69], Sing and Arya [50] introduced a key-grouping technique that classifies keys by their location on the keyboard. The keyboard was split into left and right halves, and each half into four rows corresponding to the rows of the keyboard, giving eight sections. For example, “wm” is represented as Left2-Right4. Flight times between these key pairs were used as features, and the Euclidean distance was used to measure the difference between the training and testing vectors. A threshold decided whether a test sample originated from the authorised user; however, no details were given about its value or how it was chosen. Data were collected from 20 subjects, each of whom performed five login trials as a legitimate user and five as an impostor. The overall performance reached 4.0% FRR and 2.0% FAR.
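The eight-section grouping and the Euclidean comparison can be sketched as follows. The row numbering (1 = number row, 4 = bottom row) is chosen so that “wm” maps to Left2-Right4 as in the example; the exact left/right split, the definition of flight time as next press minus previous release, and comparing only section pairs present in both samples are assumptions. Because the study gives no threshold value, `threshold` is left as a parameter.

```python
import math
from collections import defaultdict
from statistics import mean

ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]  # rows 1..4
LEFT_KEYS = set("12345qwertasdfgzxcvb")                       # ASSUMED hand split


def section(key):
    """Return e.g. 'Left2' or 'Right4' for a key, or None if it is not in ROWS."""
    k = key.lower()
    if len(k) != 1:
        return None
    for row_no, row in enumerate(ROWS, start=1):
        if k in row:
            side = "Left" if k in LEFT_KEYS else "Right"
            return f"{side}{row_no}"
    return None


def section_pair(k1, k2):
    s1, s2 = section(k1), section(k2)
    return f"{s1}-{s2}" if s1 and s2 else None


def mean_flight_times(events):
    """events: list of (key, press_ms, release_ms).
    Returns section pair -> mean flight time (next press - previous release)."""
    groups = defaultdict(list)
    for (k1, _, r1), (k2, p2, _) in zip(events, events[1:]):
        pair = section_pair(k1, k2)
        if pair:
            groups[pair].append(p2 - r1)
    return {p: mean(v) for p, v in groups.items()}


def euclidean_distance(train, test):
    """Euclidean distance over the section pairs present in both profiles."""
    shared = sorted(train.keys() & test.keys())
    return math.sqrt(sum((train[p] - test[p]) ** 2 for p in shared))


def is_genuine(train, test, threshold):
    return euclidean_distance(train, test) <= threshold
```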
In [119], key duration, the latency of the 20 most frequent di-graphs in English and the total duration of the 20 most frequent English words were used as keystroke features. SVMs, k-nearest neighbour and Naive Bayes classifiers were used to classify data collected from 28 individuals. To build each user profile, the first 25 repetitions of each feature were captured, 20 of which were used for training and the remaining 5 for testing. The SVMs achieved the best accuracy, 90%. Key duration produced the best performance among the features, followed by total word duration. Relying only on frequent words may not have been the best choice, since most of the frequent words in this study are very short, such as “the”, “is” and “it”; comparing these results with those obtained from longer words would have been informative.
Table 2.5 summarises some of the work on free-text keystroke dynamics authentication.
Table 2.5: A list of free-text keystroke dynamics studies.
Study | Features | Method | Subjects | Samples | Performance
Gaines et al. [39] | Latency | T-test | 6 | 36 | 0.00% FAR, 4% FRR
Umphress and Williams [34] | Latency | Standard deviation, t-test | 17 | 34 | 6% FAR, 12% FRR
Monrose & Rubin [33] | Latency, duration | Euclidean distance, probability score, weighted di-graph probability | 31 | - | 23% accuracy
Gunetti & Ruffo [71] | Latency, executed commands | Decision tree | 10 | - | 90% accuracy
Dowland et al. [78] | Latency | Mean, standard deviation | 4 | - | 50% accuracy
Gunetti & Picardi [8] | N-graph duration | Relative distance, absolute distance | 205 | 765 | 0.005% FAR, 5% FRR
Gunetti et al. [88] | N-graph duration | Relative distance | 30 | 124 | 1.67% FAR, 11.67% FRR
Villani et al. [49] | Latency, duration, typing speed, percentage of special characters, editing patterns | Euclidean distance, k-nearest neighbour | 118 | 2360 | 99.8% - 44.2% accuracy
Curtin et al. [79] | Latency, duration, typing speed, percentage of special characters, editing patterns | Euclidean distance, k-nearest neighbour | 30 | 90 | 100% - 97% accuracy
Filho & Freire [89] | Latency | Simplified Markov chain model | 15 | 150 | 41.6% - 12.7% EER
Janakiraman & Sim [120] | Latency, duration | Bhattacharyya distance | 22 | - | 100% - 70% accuracy
Buch et al. [121] | Latency, duration, percentage of special characters | Euclidean distance | 36 | 650 | 100% - 98% accuracy
Hu et al. [115] | N-graph duration | Relative distance, absolute distance, k-nearest neighbour | 36 | 36554 | 0.045% FAR, 0.005% FRR
Hempstalk et al. [80] | Typing speed, error rate, press-release ordering | One-class classification | 10 | 150 | 11.3% FAR, 20.4% FRR
Ahmed et al. [91] | Latency | Neural network | 22 | - | 0.015% FAR, 4.82% FRR
Davoudi & Kabir [116] | N-graph duration | Relative distance, absolute distance, histogram-based density estimation | 21 | 315 | 0.015% FAR, 0.0025% FRR
Pilsung et al. [122] | Latency | Kolmogorov-Smirnov test | - | - | 0.17% EER
Samura & Nishimura [76] | Latency, duration | Weighted Euclidean distance | 112 | - | 67.5% - 81.2% accuracy
Bours & Barghouthi [77] | Latency, duration | Distance measure | 25 | - | 79 - 348 keystrokes
Davoudi & Kabir [117] | N-graph duration | Modified relative distance | 21 | 315 | 0.08% FAR, 18.8% FRR
Davoudi & Kabir [118] | N-graph duration | Weighted relative distance | 21 | 315 | 0.07% FAR, 15.2% FRR
Park et al. [69] | Latency | Kolmogorov-Smirnov test | 35 | - | 0.089% EER
Messerman et al. [43] | N-graph duration | Normalized relative distance | 55 | - | 2.20% FAR, 1.84% FRR
Sing & Arya [50] | Latency | Euclidean distance | 20 | - | 2.00% FAR, 4.00% FRR
Chantan et al. [27] | Latency | Bayes classifier | - | - | 0% EER
Bakelman et al. [68] | Latency | K-nearest neighbour | 20 | 200 | 4% EER
Bours [44] | Latency, duration | Scaled Manhattan distance | 25 | 1620 | 182 keystrokes
Monaco et al. [123] | Latency, duration | K-nearest neighbour | 30 | 300 | 99.96% accuracy
Kang & Cho [124] | Latency | MV test, K-S statistic, C-M criterion, R measures, A measures, Gauss, Parzen, k-NN, SVDD | 35 | 35 | 7.87% EER
Matsubara et al. [125] | Latency, duration | Weighted Euclidean distance, relative distance | 250 | 2500 | 92% accuracy
Darabseh & Namin [119] | Latency, duration, word total time duration | SVMs, k-nearest neighbour, Naive Bayes classifier | 28 | 700 | 90% accuracy
Latency, duration and n-graph duration are described in Section 2.6. Accuracy, FAR, FRR, EER and keystrokes are defined in Section 2.7.