After extracting the stay points, we built the similarity matrix using the geodesic-distance-based alignment kernel. The stay points served as the symbols for activities in our traffic sequences. We calculated the distance matrix between the user’s daily activity sequences by pairwise sequence alignment. The sequences were then clustered using DBSCAN [16]. The sequences of each cluster were additionally aligned, and we produced traffic logos for them, shown in Fig. 4.5, Fig. 4.6 (a-d) and Fig. 4.7 (a-b). To visualize time information, we labelled the time of the stay points in every sequence with M (Morning, before 9:00 am), D (Day, 9:00 am to 6:00 pm) and E (Evening, after 6:00 pm). In the traffic logos we used different colours to indicate these time labels, namely green for morning, yellow for daytime, and red for evening. Furthermore, we used two different formulas to calculate the height of each individual symbol in the logos: (a) the relative entropy of the symbol and (b) the relative entropy divided by the base frequency. Formula (b) is also derived from biological sequence analysis, where the importance of a symbol at a particular position is visualized by changing the entropy calculation (dividing the entropy by the overall base frequency). As a result, symbols which are rare in the whole sequence and predominantly appear at a certain position are more pronounced. The same logic applies to traffic sequences.
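The preprocessing chain described in this paragraph (time labelling, pairwise alignment distances, density-based clustering) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the original work: a unit-cost edit distance stands in for the geodesic-distance-based alignment kernel, the DBSCAN routine is a bare-bones textbook version, and the names `time_label`, `alignment_distance` and `dbscan`, the toy sequences, and the values of `eps` and `min_pts` are all hypothetical.

```python
from datetime import time


def time_label(t):
    """Map a time of day to M (before 9:00), D (9:00 to 18:00) or E (after 18:00)."""
    if t < time(9, 0):
        return "M"
    if t <= time(18, 0):
        return "D"
    return "E"


def alignment_distance(a, b, gap=1.0, mismatch=1.0):
    """Global alignment cost between two activity sequences.

    Assumption: unit gap and mismatch costs. The original kernel derives
    its costs from geodesic distances between stay points instead.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = [k for k in range(n) if dist[j][k] <= eps]
            if len(nb_j) >= min_pts:  # only core points expand the cluster
                queue.extend(nb_j)
    return labels


# Toy daily sequences using the activity symbols of the logos (hypothetical data).
days = ["HWH", "HWH", "HWH", "HBRH", "HBRH", "HBRH", "HWTOH"]
dist = [[alignment_distance(a, b) for b in days] for a in days]
labels = dbscan(dist, eps=1.0, min_pts=2)
print(labels)
```

On this toy input the three work days and the three weekend days form two clusters, and the single tennis day is left as noise. With a real data set, a library implementation such as scikit-learn's `DBSCAN` with `metric="precomputed"` would likely be the more practical choice.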
As one can see, traffic logos give a very dense and illustrative view of the clusters of the user’s daily activities. Fig. 4.5 shows the logos for the whole data set. (a) The traffic logos demonstrate the mixed patterns of activity for the whole data set. For example, staying at home in the morning and evening is more certain than going to work (presumably due to non-work days at the weekend). Similarly, occasional shopping in the evening (red ‘A’ and ‘R’ symbols) alternates with the still less frequent tennis in the evening (red ‘T’ symbol). Occasional swings in the normal routine, i.e., going to work early or late (red and green ‘W’s), can also be picked up. Furthermore, a yellow ‘B’ followed by ‘A’ and ‘R’ hints at a weekend routine of going to the ATM at the bank and then shopping. These activity sequences will be further segmented to enhance the visibility of each pattern. (b) The height of a symbol is divided by its overall base frequency to show its relevance to a certain position in the sequence, i.e., if a symbol occurs mostly at one specific position, its height is increased. Consequently, this gives us an activity-versus-position binding, which can be very useful when analysing the preferred order of user routines. For example, going to the bank is mainly done as the first activity on the weekend during daytime, tennis appears only as an evening hobby, and ‘O’ (friends and city center) is mainly carried out at night. Furthermore, shopping is quite routine in the evening and rare in the daytime; therefore, its effect is accentuated in the daytime and nullified in the evening. Notice that traffic logos give a compact representation of user activity patterns after approximately 99% compression, i.e., almost all of the semantic information present in the detailed activity analysis (see Fig. 4.10) of the raw data can be described through logos.
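The two height formulas can be made concrete with a small Python sketch. It is only one plausible reading of the text: we assume that the relative entropy of a symbol in a column is the term p·log2(p/q), where p is the symbol's frequency in that column and q its base frequency over the whole alignment, and that variant (b) divides this term by q. The function name `logo_heights`, the gap character and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def logo_heights(aligned, gap="-"):
    """Per-column symbol heights under two assumed formulas.

    (a) h_a = p * log2(p / q)   relative-entropy term of the symbol
    (b) h_b = h_a / q           the same term divided by the base frequency
    p is the frequency of the symbol in the column, q its frequency over
    the whole alignment; gap characters are ignored in both counts.
    """
    symbols = [s for seq in aligned for s in seq if s != gap]
    base = {s: c / len(symbols) for s, c in Counter(symbols).items()}

    heights_a, heights_b = [], []
    for j in range(len(aligned[0])):
        column = [seq[j] for seq in aligned if seq[j] != gap]
        col_a, col_b = {}, {}
        for s, c in Counter(column).items():
            p = c / len(column)
            h = p * math.log2(p / base[s])  # negative when p < q
            col_a[s] = h
            col_b[s] = h / base[s]
        heights_a.append(col_a)
        heights_b.append(col_b)
    return heights_a, heights_b


# Toy alignment: 'T' is rare overall but appears only in the last column.
aligned = ["HWH", "HWH", "HWT", "HBH"]
heights_a, heights_b = logo_heights(aligned)
print(heights_a[2])
print(heights_b[2])
```

In the last column of this toy alignment, ‘H’ is three times as frequent as ‘T’, yet under variant (b) the rare, position-specific ‘T’ receives a much larger height than ‘H’, which is the accentuation effect described above. A term can become negative when a symbol is rarer in a column than overall; how such cases are drawn is left open here.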
Fig. 4.6 shows the cluster-wise logos after alignment of the sequences in each cluster. (a) Cluster 1 is the most frequent cluster in the data, comprising around 40% of the user’s routine days. This cluster is composed of one accentuated pattern, Home (Morning) → Work (Day Time) → Home (Evening), with occasional deviations from the routine, such as going to or leaving the office early or late. A small ‘O’ at
4.2. Trajectory Clustering (for User Activity Analysis)
[Figure 4.4, panels: (a) Street Network Map; (b) Labelled Stay Points, with the labels REAL (Shop2), Home, Work, Tennis, ALDI (Shop1), City Center and Bank]
Figure 4.4.: Convex hulls of labelled stay points (blue polygons over gray-edged street network). The main stay points are Home, Office, Bank, Tennis, Shopping-1 (ALDI), Shopping-2 (REAL) and City Center.
[Figure 4.5, panels: (a) Sequence logos for the whole data set; (b) Sequence logos for the whole data set. x-axis: Position in Sequence; y-axis: Entropy]
Figure 4.5.: Traffic logos for stay-point based sequences. The x-axis denotes sequence positions and the y-axis denotes the ‘information’ present in each column. The symbols in the figure denote labels of activities based on stay points, i.e.: H denotes staying at Home; W–Working; A–shopping at ALDI; R–shopping at REAL; B–getting cash from Bank; T–playing Tennis; and O denotes Other activities for leisure (i.e., city center roaming and visiting friends). Colours of symbols denote the time of day, i.e., green denotes morning (before 9 am); yellow–daytime (9 am to 6 pm); and red–evening (after 6 pm). The height of a symbol denotes the certainty of an activity at the given day time and position in the data.
the end indicates an occasional tendency to go ‘Out’ at night. It becomes readily clear that this cluster encodes the daily routine of staying at Home in the morning with high certainty, and going to the office early or staying at home with a very
[Figure 4.6, panels: (a) Work Routine (87 objects); (b) Work & Shopping (68 objects); (c) Weekend Routine (34 objects); (d) Work & Sports (26 objects). x-axis: Position in Sequence; y-axis: Entropy]
Figure 4.6.: Traffic logos after segmentation of daily activity sequences. Every segment (or cluster) describes one of the possible routines that the user follows in her daily life. The symbols in the figure denote labels of activities based on stay points, i.e.: H denotes staying at Home; W–Working; A–shopping at ALDI; R–shopping at REAL; B–getting cash from Bank; T–playing Tennis; and O denotes Other activities for leisure (i.e., city center roaming and visiting friends). Colours of symbols denote the time of day, i.e., green denotes morning (before 9 am); yellow–daytime (9 am to 6 pm); and blue–evening (after 6 pm).
small possibility. In the daytime, the user goes to work with very high certainty and comes back around 6 pm, with a small possibility of staying at work. The small ‘O’ at the end of the logo describes a small possibility of going out for other leisure activities (city center roaming or visiting friends). (b) Cluster 2 comprises around 25% of the user’s routine days. This cluster is composed of one accentuated pattern, Home (Morning) → Work (Day Time) → Shopping (Evening) → Home (Evening). This daily routine is quite similar to (a), with one important difference: the user shops at either of two shopping centres (ALDI and REAL) after work. (c)
[Figure 4.7, panels: (a) Work Routine (87 objects); (b) Work & Shopping (68 objects); (c) Weekend Routine (34 objects); (d) Work & Sports (26 objects). x-axis: Position in Sequence; y-axis: Entropy]
Figure 4.7.: Enhanced traffic logos for visualizing the importance of a user activity w.r.t. a particular position in her activity routines. The x-axis denotes sequence positions and the y-axis denotes the ‘information’ present in each column. Every cluster describes one of the possible routines that the user follows in her daily life. The symbols in the figure denote labels of activities based on stay points, i.e.: H denotes staying at Home; W–Working; A–shopping at ALDI; R–shopping at REAL; B–getting cash from Bank; T–playing Tennis; and O denotes Other activities for leisure (i.e., city center roaming and visiting friends). Colours of symbols denote the time of day, i.e., green denotes morning (before 9 am); yellow–daytime (9 am to 6 pm); and blue–evening (after 6 pm).
describes a cluster that is possibly a weekend routine, since there is no prominent W(ork) symbol at all. On the weekend, the user stays at home in the morning and then gets cash from the bank with a small probability. Afterwards, the user shops at ALDI and then REAL, or only at REAL, during the daytime. Then she comes back home and stays there. However, with a small probability, instead of coming back home after shopping, she does some Other leisure activity (the yellow and red ‘O’ symbols), such as roaming the city center or visiting a friend, later in the day or in the evening. (d) A small percentage of the user’s days combine playing tennis with the regular routine, i.e., Home (Morning) → Work (Day) → Tennis (Evening) → Home (Evening).
Fig. 4.7 shows the enhanced traffic logos, which visualize the importance of a user activity w.r.t. a particular position in her activity routines. Roughly speaking, this figure gives an activity-versus-position binding, which can be very useful when analysing user routines. For example, the rare position-specific routines of going to the bank early in the day on weekends, visiting friends in the evening and playing tennis in the evening after some work days are pronounced in comparison to Fig. 4.6, and give an idea of the preferences and order of specific activities in the data. This is an affirmative answer to questions (Q1) and (Q3).