2.3.1 Background
Technological device fingerprinting relies on measuring the small differences present in each device that make it distinguishable from other devices of the same type. It has long been established that devices such as cameras [22, 73] and typewriters [58] can be distinguished from similar devices through fingerprinting. In [29], Peter Eckersley investigated the degree to which modern web browsers are subject to such fingerprinting by analysing the information they send to websites with each request. By introducing the concept of fingerprinting to distinguish between web browsers, Eckersley set the scene for the identification of individual users through data extracted from their web browsing activities. Indeed, an individual user can be identified through their browser history, i.e. the list of URLs they have visited, which can be as unique to them as their biological fingerprint [55]. In the same article [55], Brian Hayes explains that we now also have what he refers to as a “data identity”, defined by various combinations of traits that distinguish each of us from anybody else on the planet. This idea of data identity is well supported by the work of Sweeney of Harvard University [98], who showed that a small set of simple demographic attributes, such as date of birth, zip code, and gender, is enough to identify an individual within the rest of the population. Furthermore, the authors of [112] describe a system called WiFi-ID which analyses channel state information to extract unique features that capture a person's walking style, and thus allows that individual to be uniquely identified.
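Sweeney's observation can be made concrete with a small computation: the fraction of records whose combination of quasi-identifiers (date of birth, zip code, gender) occurs only once in a data set. The sketch below is an illustration of this uniqueness measure, not Sweeney's methodology; the tuple record layout and the toy records are assumptions made for the example.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layout: (date_of_birth, zip_code, gender).
Record = Tuple[str, str, str]


def fraction_unique(records: List[Record]) -> float:
    """Fraction of records whose quasi-identifier combination occurs
    exactly once, i.e. records that single out one individual."""
    if not records:
        return 0.0
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


# Made-up toy records, for illustration only.
toy = [
    ("1980-02-14", "02138", "F"),
    ("1980-02-14", "02138", "F"),  # shares its combination, so not unique
    ("1975-07-01", "02139", "M"),
    ("1990-11-30", "02140", "F"),
]
print(fraction_unique(toy))  # 0.5: two of the four records are unique
```

Adding further attributes to each tuple can only split existing groups, so the unique fraction never decreases as more traits are combined, which is the intuition behind a "data identity".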
2.3.2 Uniqueness of Mobility Traces
In [27], de Montjoye et al. proposed a formula that determines the uniqueness of individual mobility traces. A key result of their work is that the uniqueness of human mobility traces is high, so that individual users' mobility data are likely to be re-identified using outside information about only a few locations. In the same work [27], de Montjoye et al. further showed that only four spatio-temporal points are enough to uniquely identify 95% of the users in the large data set used for their evaluation. This means that if a user u visited the set of locations {a, b, ..., z}, then four of these locations would, with high probability, be enough to establish the uniqueness of the mobility traces of u. This is consistent with our findings presented in [32] and in Chapter 4, which provides a detailed discussion of the uniqueness of individual users' mobility fingerprints. However, our work differs substantially: in addition to creating a unique user profile, we can also employ such a profile to identify the user from a short record of observed movements. For example, if {e, f, g, h} denotes an observed mobility trail, then we can employ the fingerprint constructed from the user's historical record of visits (e.g. to the locations {a, b, ..., z}) to correctly predict that the observed trail was created by the user u. A substantial part of Chapter 4 of this thesis is dedicated to investigating the relationship between user identifiability and fingerprint uniqueness, as well as the implications of compressing the fingerprint.
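The uniqueness result above can be estimated empirically: draw p points at random from each user's trace and count how often no other user's trace contains all of them. The sketch below assumes traces are sets of (location, hour) tuples, and adds a hypothetical identify helper that matches an observed trail to the historical trace with the largest overlap. Both function names are illustrative, and the overlap rule is a simple stand-in, not the fingerprinting method of Chapter 4.

```python
import random
from typing import Dict, Optional, Set, Tuple

# A spatio-temporal point, assumed here to be (location_id, hour).
Point = Tuple[str, int]
Trace = Set[Point]


def uniqueness(traces: Dict[str, Trace], p: int, seed: int = 0) -> float:
    """Estimate the fraction of users singled out by p points drawn at
    random from their own trace; users with fewer than p points are skipped."""
    rng = random.Random(seed)
    eligible = [u for u, t in traces.items() if len(t) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # sorted() gives a sequence, which random.sample requires.
        sample = set(rng.sample(sorted(traces[user]), p))
        matches = [u for u, t in traces.items() if sample <= t]
        if matches == [user]:
            unique += 1
    return unique / len(eligible)


def identify(observed: Trace, traces: Dict[str, Trace]) -> Optional[str]:
    """Hypothetical matcher: return the user whose historical trace shares
    the most points with the observed trail, or None on a tie or no overlap."""
    scores = {u: len(observed & t) for u, t in traces.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return None
    winners = [u for u, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


# Made-up traces for three users.
traces = {
    "u": {("a", 9), ("b", 10), ("c", 12), ("d", 14), ("e", 15)},
    "v": {("a", 9), ("x", 10), ("y", 12), ("z", 14)},
    "w": {("b", 10), ("c", 12), ("q", 18)},
}
print(uniqueness(traces, p=4))  # 1.0 (user "w" has too few points and is skipped)
print(identify({("b", 10), ("c", 12), ("e", 15)}, traces))  # u
```

With real data the estimate depends on p, on the spatial and temporal resolution of the points, and on the random seed, so it is usually averaged over several draws.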
2.4 Detection of Mobile Users' Social Grouping by Using Wi-Fi Activity Traces
Studies involving Wi-Fi network data analysis can be divided into two broad categories: descriptive and predictive analysis. Descriptive research on the characterisation of user mobility in Wi-Fi networks explores various features, such as the time a user spends connected to an Access Point (AP) and the amount of data a user sends and receives over the network, whereas predictive studies can be classified according to the modelling approach they adopt. Common modelling methods used in previous studies include clustering [65, 103], Support Vector Machines (SVMs) [74], and Markov models [69, 72, 107]. In this section, we review both the predictive and the descriptive research works available in the literature, focusing specifically on the social dimension of human presence within an academic institution.
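Of the predictive approaches listed above, Markov models are the simplest to illustrate. The minimal first-order sketch below assumes each user's trace has already been reduced to an ordered list of Access Point identifiers; it counts AP-to-AP transitions and predicts the most frequent successor. It is a generic illustration, not the specific models of [69, 72, 107], and the AP names are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(sequences: List[List[str]]) -> Dict[str, Counter]:
    """Count AP-to-AP transitions across users' association sequences."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions: Dict[str, Counter], current: str) -> Optional[str]:
    """Return the most frequently observed successor of `current`, if any."""
    counts = transitions.get(current)  # .get does not create a new entry
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical association sequences (AP identifiers are made up).
sequences = [
    ["library", "lab", "cafe"],
    ["library", "lab", "lecture_hall"],
    ["library", "lab", "cafe"],
    ["lab", "cafe"],
]
model = fit_transitions(sequences)
print(predict_next(model, "lab"))   # cafe
print(predict_next(model, "cafe"))  # None: never followed by another AP
```

Normalising each Counter by its total turns the counts into the transition probabilities of the Markov chain; higher-order variants condition on the last k APs instead of one.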
2.4.1 Social Groups of Mobile Users
Using data collected from one hundred mobile phones over a period of nine months, the authors of [28] proposed a system for sensing complex social systems. Using standard Bluetooth-enabled mobile phones, they were able to detect social patterns in daily user activity, infer user relationships, discover socially significant locations, and thus model the rhythms of the observed organizations. Static Bluetooth device IDs were used as an additional indicator of location, and this was shown to provide a significant improvement in user localization, especially within indoor environments such as an office building. The authors of [59] proposed a method for extracting the interaction patterns and social behaviour of mobile users through passive Wi-Fi monitoring of the probe requests and null data frames sent by smartphones. By analysing the temporal and spatial correlations of the Received Signal Strength Indicator (RSSI) of packets from these low-rate transmissions, they were able to discover proximity relationships, occupancy patterns, and social interactions among users. Although tests conducted with commodity off-the-shelf smartphones and Wi-Fi Access Points demonstrate that the proposed method is capable of detecting social relationships and interactions in a non-intrusive manner, the study was conducted on a very limited scale. In [30] and in Section 6.5 of this thesis, we discuss a method for detecting classroom friends using a data set that represents a full snapshot of Wi-Fi usage across an entire university for a period covering a full academic term.
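The RSSI-correlation idea behind [59] can be sketched as follows: if two phones move together, the signal strengths a single monitor records for them should rise and fall together. The sketch below only computes a temporal Pearson correlation between two time-aligned series; [59] also exploits spatial correlations across monitors, and the 0.8 threshold, the function names and the sample values are assumptions made for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long, time-aligned RSSI series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no co-movement signal
    return cov / (sx * sy)


def likely_together(x: Sequence[float], y: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Flag two devices as co-moving when their RSSI series, as seen by the
    same monitor, are strongly correlated (threshold is an assumed value)."""
    return pearson(x, y) >= threshold


# Made-up RSSI samples (dBm) captured by one monitor at the same instants.
phone_a = [-70, -65, -60, -55, -60, -68]
phone_b = [-72, -66, -61, -57, -62, -70]  # tracks phone_a closely
phone_c = [-50, -75, -52, -80, -49, -77]  # unrelated pattern
print(likely_together(phone_a, phone_b))  # True
print(likely_together(phone_a, phone_c))  # False
```

Correlation rather than absolute RSSI difference is used here because two co-located phones can report systematically different signal levels (different antennas, pockets versus hands) while still varying in step.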
2.4.1.1 Attendance of Learning Activities
In [97], an occupancy sensing system for a real university campus environment was proposed. The researchers conducted a lab experiment to evaluate various commercial sensors in terms of cost, ease of operation, and accuracy. Deploying a beam-counter-based system in 9 real classrooms of varying sizes across their university campus, they collected data over a period of 12 weeks covering more than 250 courses. Using the detected course attendance patterns and classroom occupancy, they developed an offline method that dynamically allocates courses to classrooms, and thus achieved gains of over 50% in room-related costs. In [76], the authors explored the use of Wi-Fi for estimating attendance in a dense university campus environment. They proposed new methods for distinguishing and filtering out Wi-Fi-connected users located outside an observed lecture room, and fed the resulting data to a regression model to estimate room occupancy. The authors of [85] analysed data from the Wi-Fi network of a technical university at different granularities (individual access points, groups of access points, and the entire network) in order to study network usage. Their work investigated whether students attending a lecture use the wireless network differently from students who are not attending one. Using a supervised learning approach based on Quadratic Discriminant Analysis (QDA), they were able to classify rooms as empty or occupied. Although the proposed method can detect room occupancy, it falls short in detecting lecture attendance, as it has no means of tracking individual students' class attendance. In Section 6.4 of this thesis, we discuss a method for estimating class attendance by tracking the attendance of individual students over the course of an academic term of 11 weeks.

In [119], the researchers attempted to measure students' behaviour in classroom-based courses in a large-scale study. They proposed a system called EDUM (EDUcation Measurement) to characterise educational behaviour at a large university campus, and investigated a number of behaviours, including class attendance, late arrival to lectures, and early departure. Their work yielded some interesting findings; for example, using measures such as the attendance ratio and the late-arrival ratio, they determined the times of day at which class attendance reaches its highest and lowest levels, as well as the most hard-working day of the week. While their proposed method employs data from multiple sources, including Wi-Fi data, in Chapter 6 of this thesis we discuss how we detected class attendance by inferring session attendance from patterns extracted only from Wi-Fi activity traces. Moreover, the ability to filter noise, i.e. bystanders (individuals who are not part of the intended class but nonetheless appear to be part of it), is a key factor in developing a successful method for detecting the attendance of an observed class. In the same chapter, namely in Section 6.4.3, we discuss two methods for noise removal: Noise Reduction and Attendance Coherence. In [119], which employs data from multiple sources, noise removal depends solely on how far a connected mobile device is located from the Access Point.
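The idea of tracking individual students over a term and discarding bystanders can be illustrated with a simple persistence rule: a device that appears in only a small fraction of a course's sessions is unlikely to be an enrolled attendee. The sketch assumes each session has already been reduced to the set of device identifiers connected to the room's Access Points; the 0.5 threshold, the function names and the device IDs are hypothetical, and this is not the Noise Reduction or Attendance Coherence method of Section 6.4.3.

```python
from collections import defaultdict
from typing import Dict, Set


def attendance_ratios(sessions: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of a course's sessions in which each device was observed."""
    seen: Dict[str, int] = defaultdict(int)
    for devices in sessions.values():
        for device in devices:
            seen[device] += 1
    return {d: c / len(sessions) for d, c in seen.items()}


def filter_bystanders(sessions: Dict[str, Set[str]],
                      min_ratio: float = 0.5) -> Dict[str, Set[str]]:
    """Keep, in every session, only devices whose term-wide attendance ratio
    reaches min_ratio; the others are treated as bystanders."""
    ratios = attendance_ratios(sessions)
    keep = {d for d, r in ratios.items() if r >= min_ratio}
    return {s: devices & keep for s, devices in sessions.items()}


# Hypothetical device IDs seen in one course's weekly sessions.
sessions = {
    "week1": {"s1", "s2", "x1"},
    "week2": {"s1", "s2"},
    "week3": {"s1", "x2"},
    "week4": {"s1", "s2"},
}
cleaned = filter_bystanders(sessions)
# Estimated attendance per session after removing x1 and x2.
print({week: len(devs) for week, devs in cleaned.items()})
```

A purely frequency-based rule like this one cannot tell a regular bystander (for example, someone who studies in the room every week) from a student, which is why richer signals such as session duration or arrival times are typically combined with it.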
2.4.2 Spatial Classification
Unfortunately, research in space-based modelling (i.e. models that focus on space) of human presence and movement behaviour falls short in devising laws that describe spatial patterns [109], for example, how interactions and the occurrences of activities are timed across a spatial distribution. Modelling space from the perspective of time allows the spatial organization and temporal ordering of spatial functions to be studied [109]. Owing to this lack of research contributions, we do not yet have a good theoretical understanding of this area [109]. However, in Chapter 7 and in [33], we investigate the hypothesis that the distribution of a social group's inter-visit duration, i.e. the waiting time between visits made by the same social group, approximately follows a uniform distribution for locations where formal activities, such as attending a meeting or a learning session, take place. We developed a model that learns a spatial classification in which the type of an observed location is predicted from the patterns of inter-visit durations of detected social groups. This model is discussed in detail in Chapter 7 of this thesis.
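One simple way to probe the uniformity hypothesis is to turn a group's visit timestamps into inter-visit durations and measure how far their empirical distribution departs from a uniform one. The sketch below uses the Kolmogorov-Smirnov statistic D as a swapped-in goodness-of-fit measure; it is not the spatial classification model of Chapter 7, and fitting the uniform bounds to the sample minimum and maximum, the function names and the visit times are assumptions made for illustration.

```python
from typing import List


def inter_visit_durations(visit_times: List[float]) -> List[float]:
    """Waiting times between consecutive visits of the same social group."""
    t = sorted(visit_times)
    return [b - a for a, b in zip(t, t[1:])]


def ks_uniform(samples: List[float]) -> float:
    """Kolmogorov-Smirnov statistic D between the empirical distribution of
    samples and a uniform distribution on [min(samples), max(samples)]."""
    xs = sorted(samples)
    n = len(xs)
    if n < 2 or xs[0] == xs[-1]:
        raise ValueError("need at least two distinct values")
    lo, hi = xs[0], xs[-1]
    d = 0.0
    for i, x in enumerate(xs):
        cdf = (x - lo) / (hi - lo)  # uniform CDF at x
        # Compare with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Made-up visit times (hours since the start of term) for two groups.
spread = inter_visit_durations([0, 1, 3, 6, 10, 15])  # gaps 1, 2, 3, 4, 5
bursty = inter_visit_durations([0, 1, 2, 3, 4, 13])  # gaps 1, 1, 1, 1, 9
print(ks_uniform(spread))  # smaller D: closer to uniform
print(ks_uniform(bursty))  # larger D: far from uniform
```

Because the uniform bounds are estimated from the same sample, the standard KS critical values are conservative here; a Monte Carlo test that simulates uniform samples of the same size would be needed for a formal p-value.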