2. CHAPTER II FRAME OF REFERENCE
2.2. Theoretical Framework
Related work has already given first indications that consistency is of high importance, especially for a short and focused task such as authentication [107]. Until now, one of the main drivers for consistency has always been accessibility [58, 90, 105]. Within the long-term study, we found several results that support the assumption that consistency is necessary, both for performance and for memorability. The results therefore provide further evidence that consistency is of high importance. Within this work, two types of consistency are distinguished. Firstly, outer consistency refers to the consistent use of the same authentication method. Strictly speaking, the different keypad layouts of this study could be considered different authentication mechanisms.
Inner consistency, on the other hand, means that when the same authentication mechanism is used on two different occasions, the user has to perform the same input in the same way to authenticate. In this experiment, inner consistency is therefore absent for the random layout, since users have to press keys at different positions every time they want to authenticate. Unless indicated otherwise, consistency in this thesis refers to inner consistency.
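To make the distinction between inner and outer consistency concrete, the following Python sketch models keypad layouts as simplified 3x4 grids and checks whether entering the same PIN requires the same key positions in every session. The grids, the function names (`key_positions`, `random_layout`, `is_inner_consistent`) and the session count are illustrative assumptions rather than the study software, and the linear layout is not modeled.

```python
import random

# Simplified, hypothetical 3x4 keypad grids; a space marks a cell without a digit key.
TELEPHONE = ["123", "456", "789", " 0 "]
CALCULATOR = ["789", "456", "123", " 0 "]


def key_positions(layout, pin):
    """Return the (row, column) of the key for each PIN digit in `layout`."""
    where = {key: (r, c)
             for r, row in enumerate(layout)
             for c, key in enumerate(row)
             if key.isdigit()}
    return [where[digit] for digit in pin]


def random_layout(rng):
    """Assign the ten digits to the telephone grid cells in shuffled order."""
    digits = list("0123456789")
    rng.shuffle(digits)
    it = iter(digits)
    return ["".join(next(it) if cell.isdigit() else cell for cell in row)
            for row in TELEPHONE]


def is_inner_consistent(make_layout, pin, sessions=10):
    """True if entering `pin` requires the same key positions in every session.

    `make_layout` is called once per session and returns the layout shown then.
    """
    inputs = {tuple(key_positions(make_layout(), pin)) for _ in range(sessions)}
    return len(inputs) == 1


if __name__ == "__main__":
    rng = random.Random(0)
    print(is_inner_consistent(lambda: TELEPHONE, "1379"))           # True
    print(is_inner_consistent(lambda: CALCULATOR, "1379"))          # True
    print(is_inner_consistent(lambda: random_layout(rng), "1379"))  # almost surely False
```

Switching from `TELEPHONE` to `CALCULATOR` moves, for example, the digit 1 from the top-left to the bottom-left key (a lack of outer consistency), yet each fixed layout is inner-consistent on its own; only a layout that is regenerated for every session breaks inner consistency.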
The analysis of the different time measurements revealed that the results are influenced by extreme values, particularly the preparation times. Looking at the 95th percentile of the preparation times (thus ignoring the slowest 5% of attempts) shows that the maximum becomes more extreme the more a layout differs from the participants’ standard layout, the telephone layout. For the telephone layout, the maximum within the 95th percentile is 2.58 seconds. The calculator layout already increases this value by 45.3% (1.17 seconds), and the linear layout increases it by 66.7% (1.72 seconds). The random layout, which differs most from the telephone layout, increases it by 214.3% (5.53 seconds).
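The percentages above follow from dividing each absolute increase by the telephone-layout value of 2.58 seconds. The sketch below reproduces this arithmetic in Python and adds a percentile function for trimming extreme values. The thesis does not state which percentile definition was used, so the nearest-rank definition here is an assumption (NumPy's `numpy.percentile`, for example, interpolates linearly by default); the layout values in the usage part are derived by adding the reported increases to 2.58 seconds.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank p-th percentile: the smallest sample value such that
    at least p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def relative_increase(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return 100.0 * (other - baseline) / baseline


if __name__ == "__main__":
    baseline = 2.58  # telephone layout, seconds (reported value)
    # Derived values: baseline plus the reported absolute increases.
    for name, value in [("calculator", 3.75), ("linear", 4.30), ("random", 8.11)]:
        print(f"{name}: +{relative_increase(baseline, value):.1f}%")
```

Running the script prints the increases of 45.3%, 66.7% and 214.3% reported above.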
This increase can be explained by human perception: unfamiliar layouts have to be scanned sequentially to find the right digits. From study observations, we know that, especially for PINs, participants tend to start the input only after they have successfully identified all required digits. Another influencing factor is that many participants used strategies based on visual and muscle memory cues (simply put, remembering the positions instead of the digits). These strategies do not work when the layout changes, and the PIN then has to be remembered in a more complex way. Thus, the lack of consistency has an additional negative influence when participants expect consistency because of their usual authentication approach.
Results of the follow-up study using the random layout are better in terms of preparation times and error rate. For instance, participants in the follow-up study made hardly any critical errors with the random layout, whereas this layout caused most of the critical errors during the main study. This gives additional support to the assumption that consistency is an important criterion. Because the random keypad layout lacks inner consistency, the participants had to employ other (less effective) strategies to remember their PINs. Therefore, the preparation phase in the follow-up study consisted only of the time required for the sequential search, and advanced memory strategies could not cause confusion since they were not used.
Based on these results alone, one could assume that the lack of inner consistency in the random layout is acceptable, since it forces users to apply other strategies. However, chapter 4.3.1 highlighted another advantage of using a consistent layout (outer consistency). Unlike the random layout, the telephone layout allowed for significant learning effects. That is, participants employed advanced strategies that they improved over time, which resulted in a significant improvement of authentication performance. Additionally, as shown in chapter 4.3.1, the overhead created by the preparation phase of the random layout is nearly 100%. Also, as shown in figure 4.8, the error rates caused by memorability issues did not improve with training for the random layout, as opposed to the other layouts. This can be attributed to the lack of inner consistency as well and is a problem for a time-sensitive task such as authentication. Due to the lack of inner consistency, randomized systems are incompatible with the diverse strategies that users employ (such as remembering shapes). This was confirmed by the results of the final questionnaire: especially trained users noted that they used visual or muscle memory strategies, which are superior to plain memorizing. Thus, providing consistency (of some kind) has to be considered an important criterion for authentication mechanisms in public spaces. It helps to improve both performance and memorability, since muscle memory effects can support memorability even after a long period of non-use [51]. For PIN entry, it is particularly helpful that positions are easier to remember than distances [64].
Another interesting finding on consistency concerned the performance of trained versus untrained users. Even though no significant differences between the two groups were found, the active authentication times show that, especially for the random layout, trained users have a strong tendency to be slower: with the random layout, they were on average 0.41 seconds slower than untrained users. This can be explained by the trained group being more confused by alternative layouts than the untrained users, and by the different memory strategies of the two groups. Trained users' preparation times, on the other hand, were shorter for the telephone layout.
4.5 Lessons Learned
Only a long-term study like the one presented in this chapter made it possible to refine and specify two important criteria for the design of authentication mechanisms, in general and specifically for public spaces: consistency and the importance of in-depth time measurement.
Consistency is considered an important aspect in many areas of human-computer interaction, such as user interface design [112]. Even though it is sometimes seen as something negative [57], it has become a constant in software design. For authentication mechanisms, however, consistency has never before been considered a particularly important factor. Based on the results of the study, we argue that both inner and outer consistency are important factors for authentication mechanisms. Considering that authentication is a time-critical task that users never see as their primary goal (and simply want to get over with), consistency is even more important here than in standard user interface design. A problematic finding concerned inner consistency: if we consider randomization a bad design choice, the most common tool for providing security has to be critically reassessed. For instance, ColorPIN [32], presented in chapter 3.4.1, combines randomization with limited inner consistency. This enabled the study participants to use more elaborate strategies to perform the authentication, which led to a remarkable improvement of performance. The conclusion has to be that the higher the inner and outer consistency, the better the performance of the system.
A thorough approach to measuring time such as the one proposed in this work (see figure 4.2) has never been proposed before. Even though similar data is often collected, it has never been reported or analyzed. One reason might be that its importance remains mostly hidden: only an in-depth analysis of data collected over a longer period of time fully reveals its benefits. In chapter 3.3.2, MobilePIN [29] and its evaluation are described in detail. The realization that connecting the mobile device to the public terminal takes time, and that this time has to be counted as part of the authentication time (except in cases in which the device has to be connected for the actual interaction anyway), gave first indications that current time measurements had to be rethought. In the scheme presented in this chapter, the time required for connection is categorized as preparation time, that is, time for an action that has to be performed before the actual authentication task can start and without which the task cannot be started. The long-term study showed that precisely measuring authentication speed is also important, and still provides valuable insights, in cases in which no obvious preparatory action (like connecting a device) takes place.
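A minimal sketch of splitting an authentication attempt into preparation and active time, assuming three hypothetical timestamps per attempt (start of the authentication task, first input, and submission). It does not reproduce the full scheme of figure 4.2; it only illustrates the rule that everything before the first input, including connecting a mobile device as in MobilePIN, counts as preparation time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthAttempt:
    """Timestamps (seconds) of one authentication attempt.

    The event names are hypothetical; they only illustrate splitting the
    total authentication time into a preparation and an active phase.
    """
    started: float      # e.g. the terminal starts asking for credentials
    first_input: float  # first key press (or first input event)
    finished: float     # input confirmed and submitted

    def __post_init__(self):
        if not self.started <= self.first_input <= self.finished:
            raise ValueError("timestamps must be in chronological order")

    @property
    def preparation_time(self):
        # Everything before the first input counts as preparation, e.g.
        # scanning an unfamiliar layout or connecting a mobile device.
        return self.first_input - self.started

    @property
    def active_time(self):
        # Time spent actually entering the credential.
        return self.finished - self.first_input

    @property
    def total_time(self):
        return self.finished - self.started
```

For an attempt recorded as `AuthAttempt(0.0, 1.5, 4.0)`, the preparation time is 1.5 seconds and the active time 2.5 seconds. Keeping both phases separate, rather than storing only the total, is what allows effects such as the layout-dependent preparation overhead to be analyzed afterwards.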
The long-term study on keypad layouts helped to refine these two criteria. However, both are highly technical. Consistency influences the (interface) design of an authentication mechanism at an early stage, whereas authentication speed measurement takes place at a later (or final) stage of the development process. The remaining gap is how “human setups”, in contrast to technical issues, influence design and evaluation criteria. Formulated as a question: “What are behavioral criteria that influence the design of authentication mechanisms?” To fill this gap, we conducted a long-term field study on the real use of public terminals (ATMs), which is presented in the next chapter.
Chapter 5
Authentication in the Wild
Man is many things, but he is not rational.
– Oscar Wilde –
If we are not rational, how can we expect each other to behave securely? This is a very important question in general, but especially when it comes to the security of computer systems. As a consequence of this simple quote, researchers working on usable privacy and security have to deal with irrational users and irrational behavior. Education and training are often considered solutions to this problem, and there is evidence that they can sometimes work [79, 119]. A more promising approach, however, seems to be to solve these problems at the user interface level rather than shifting the responsibility to the users [111].
To find appropriate solutions for irrational behavior, it is important to know exactly which behavior causes problems. Due to the sensitivity of such data, little is known about behavioral factors that influence the design of secure authentication mechanisms for public spaces. Such factors can influence performance (such as authentication speed) or acceptance, but they can also directly compromise the security of proposed systems. Perceived levels of privacy, intimacy and security, time pressure, and anxiety were previously identified as important factors influencing the decision whether or not to use an ATM [83, 84]. However, these data are usually based on different kinds of interviews and give no insights into actual performance at and use of public terminals. The approach taken in this chapter goes one step further. We performed a series of field observations of ATMs to explore how users actually interact with them and to find out about the influence of different behavioral factors [33]. Field studies have the ability to uncover facts that would otherwise remain hidden (e.g. [63, 100]), for instance because interview partners might not admit or think about them. The main focus of the observations was on the ATM authentication process, i.e., how people enter their PIN, whether and how people protect their PIN entry from skimming attacks, and which contextual factors affect security and secure behavior.
After analyzing the first field study, we conducted two additional follow-up studies: a second field observation focusing on obtaining more detailed interaction times, and an additional set of interviews in public spaces to ground some of our findings. All results of the studies point to one important conclusion: behavior is mostly irrational and seldom secure. Besides this, the studies gave deep insights into the following:
a) The observations provided various insights into how users really interact with authentication mechanisms in public spaces. In particular, the findings on insecure behavior had been unknown or unpublished before and were quite surprising.
b) Based on these insights, a number of behavioral criteria were derived that directly affect the design and evaluation of authentication mechanisms for public spaces.
c) The study allowed us to derive several lessons learned and recommendations on how to perform field studies on the use of privacy- and security-relevant technologies. Details on these results can be found in [33].
5.1 Field Study Methodology
To optimize validity, the field observations were performed at six different locations in two central European cities, Munich (Germany) and Delft (the Netherlands). We chose ATMs that were available 24 hours a day, seven days a week. For legal reasons, they had to be located outside. This also allowed us to observe the actual ATM interactions unobtrusively. The observation method is presented later.
The data of the primary field observation was collected over a period of nearly two months. Each ATM was visited at least four times, with at least one observation session on a Sunday and at least one session during “rush hour” (i.e., mid-mornings, noon, or early evenings). This was necessary to ensure that the data collected was as broad as possible and did not, e.g., include only off-peak times, which could have biased the results. Rush hours and off-peak times were identified in pre-observations. Depending on the location (for instance, one was close to a supermarket), these times differed not only between cities but also between locations within the same city. For instance, the rush hour close to a supermarket was between 5 pm and 7 pm, while the rush hour at an ATM in a pedestrian area with shops and restaurants was at lunch time (around 1 pm).
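The sampling rules above (at least four visits per ATM, at least one on a Sunday, and at least one during a location-specific rush hour) can be expressed as a simple check over the planned sessions of one ATM. The function name, the representation of rush hours as hour windows, and the example dates are hypothetical; the windows mirror the examples in the text (5 pm to 7 pm near a supermarket, around 1 pm in a pedestrian area).

```python
from datetime import datetime

SUNDAY = 6  # datetime.weekday(): Monday == 0 ... Sunday == 6


def schedule_meets_criteria(sessions, rush_hours, min_visits=4):
    """Check one ATM's observation sessions against the sampling rules.

    sessions   -- list of datetime objects (session start times)
    rush_hours -- list of (start_hour, end_hour) windows for this location,
                  as identified in pre-observations; end_hour is exclusive
    """
    enough_visits = len(sessions) >= min_visits
    has_sunday = any(s.weekday() == SUNDAY for s in sessions)
    has_rush_hour = any(start <= s.hour < end
                        for s in sessions
                        for start, end in rush_hours)
    return enough_visits and has_sunday and has_rush_hour


if __name__ == "__main__":
    # Hypothetical plan for an ATM next to a supermarket (rush hour 5 pm to 7 pm).
    plan = [datetime(2024, 1, 7, 11),      # a Sunday
            datetime(2024, 1, 8, 17, 30),  # during rush hour
            datetime(2024, 1, 10, 9),
            datetime(2024, 1, 12, 14)]
    print(schedule_meets_criteria(plan, rush_hours=[(17, 19)]))
```

Because rush hours differ between locations, the windows are passed per ATM rather than fixed globally.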
During the pre-observations, we noticed that terminal software can differ significantly from one bank to another and can influence performance. Therefore, we also made sure to observe a variety of ATMs from different banks (six banks in total) to avoid influences of the software on the results. Each bank in the study used different software. At each ATM, 60 users were observed, resulting in an overall data set of 360 users, collected during 44 observation sessions. Of the observed users, 199 were male and 161 female.
All observations were performed and recorded by one and the same researcher. This was necessary to keep the data comparable, since different people might apply different standards during the observation, deliberately or not. Even though multiple observers might have reduced the risk of accidentally missing data, we opted for this solution since we valued consistency over efficiency (speed of data collection).
The most undesirable influence on the data would have been users realizing that they were being observed. Therefore, to remain unobtrusive during observations, we chose ATMs that were visible from public outdoor seating areas, i.e., street cafés and restaurants with tables in appropriate positions outside. Surprisingly, many of the outdoor ATMs we found were actually close to such spots, so finding appropriate locations was not an issue. Given these precautions, it is very unlikely that the observer aroused suspicion among ATM users. Additionally, individual observation sessions were kept rather short to minimize this risk.