
3.4.2.1 Detecting Location Context using Term Weighting

This subsection is drawn largely from one of our papers, Qiu et al. (2010), which is an output of the author's PhD studies.

We used a term weighting technique to identify significant locations in this study. It is faster and less reliant on rules than clustering-based techniques. As mentioned earlier, a commonly used term weighting scheme such as TF*IDF has a number of components. By employing the components of TF*IDF term weighting (TF, DF, IDF), four weighting techniques can be defined for identifying the important locations in personal location logs. In this study, the author was interested in locating a number of important location types:

• Home/Work locations

The locations we visit most frequently are important for many location-based applications. The volunteer user had purchased a new home, so the user was expected to be located at both homes.

• Social locations

These are the locations that are most similar, and the locations that we attend periodically but not every day. The system is expected to identify important social locations from the archive automatically. Social locations are those to which the individual returns again and again, such as the family home, a relative's home and socialising locations.

• Travel, extended visit locations

These locations are places where the individual has spent some time, but to which visits do not recur frequently. Examples are holiday locations and work travel locations.

• Pass-through locations

These locations are the places where we pass through often but rarely stop. These are exemplified by short linger/stay durations which occur frequently. An example of these locations is the shops that we pass through on our way to work every day.

The user's location is important for classifying the user's activities. For example, when a user goes to a shop, the activity is not merely walking but shopping. This experiment combines activity data and location data to obtain more detail about activities. In the experiments, the important location regions and the time the user stays in them are considered when deciding the important moments.
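The TF, DF and IDF components mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each day of the location log plays the role of a "document" and each location identifier the role of a "term"; the four weighting techniques used in the study are not specified here, so the function `location_weights` and the sample log are hypothetical.

```python
import math
from collections import Counter


def location_weights(daily_logs):
    """Compute TF, DF, IDF and TF*IDF per location.

    daily_logs: list of lists of location IDs, one inner list per day
    (an assumed log layout, not the study's actual data format).
    """
    n_days = len(daily_logs)
    tf = Counter()  # total number of visits to each location
    df = Counter()  # number of distinct days on which each location appears
    for day in daily_logs:
        tf.update(day)
        df.update(set(day))

    weights = {}
    for loc in tf:
        idf = math.log(n_days / df[loc])
        weights[loc] = {
            "tf": tf[loc],
            "df": df[loc],
            "idf": idf,
            "tf_idf": tf[loc] * idf,
        }
    return weights


# Hypothetical four-day log.
logs = [
    ["home", "shop", "work", "shop", "home"],
    ["home", "shop", "work", "home"],
    ["home", "beach", "beach", "beach"],
    ["home", "shop", "work", "home"],
]
weights = location_weights(logs)
```

Under these assumptions, a location seen on every day (high DF, IDF of zero) behaves like a home or work location, while a location visited many times on few days (high TF, high IDF) behaves more like an extended visit location.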

3.4.2.2 Detecting Activity Context using SVM

This subsection on the application of SVM to detecting activity context is drawn largely from one of our papers, Qiu et al. (2011), which is an output of the author's PhD studies.

Four activity contexts detected automatically by the system are sitting/standing, lying, walking and driving. The steps for detecting activity context using SVM are shown in Figure 3.10. In the process, the training data is classified into two classes (binary classification) for each activity. Following that, the optimal parameters for each of the four activities are identified. The optimal parameters and training data are used to train the classification model for each activity. Each of the four models is then evaluated using five-fold cross-validation.
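The per-activity binary classification can be sketched as follows. The sketch assumes a one-vs-rest reading of the process, in which samples of the target activity form the positive class and all other samples form the negative class; the function names and the sample annotations are hypothetical.

```python
ACTIVITIES = ["sitting/standing", "lying", "walking", "driving"]


def binary_labels(labels, target):
    """Return +1 for samples of the target activity and -1 for all others."""
    return [1 if label == target else -1 for label in labels]


def one_vs_rest_datasets(labels):
    """Build one binary label vector per activity (assumed one-vs-rest setup)."""
    return {activity: binary_labels(labels, activity) for activity in ACTIVITIES}


# Hypothetical annotated readings.
annotated = ["walking", "driving", "lying", "walking", "sitting/standing"]
datasets = one_vs_rest_datasets(annotated)
```

Each of the four label vectors would then be paired with the same attribute vectors to train one model per activity.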

Figure 3.10: Process of classifying raw acceleration data into user activities

Source: Qiu et al. (2011)

A number of attributes are used as input to the activity classifier, and these are described in detail below.

• Raw acceleration data: Raw data can be used to judge the posture of the mobile device. Due to gravity, the accelerometer axis aligned with gravity reads about 1G. For example, when the user lies down, the horizontal axis' value decreases while the longitudinal axis' value increases.

• Standard deviation: This attribute is used to calculate the strength of activities. If the accelerations change rapidly, there is a strong likelihood that the user is walking or driving rather than sitting/standing or lying.

• Range: This attribute can be used to better distinguish driving from walking. When the user is driving, the standard deviation may be the same as when walking. However, the range over which the values change is smaller than for the walking activity. For example, when the user is walking, the maximum acceleration in the y direction can be 5, while it will be 3 when the user is driving.

Because accelerations were collected from a 3-axis accelerometer, a total of 9 attributes (raw acceleration data, standard deviation and ranges for each of the three axes) are used for one reading of acceleration.
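The nine attributes can be computed along the following lines. This is a sketch under stated assumptions: the standard deviation and range are taken over a short window of recent readings ending at the current reading, and the population standard deviation is used; the window length and the function name `window_features` are hypothetical.

```python
import statistics


def window_features(window):
    """Return 9 attributes for the latest reading in a window.

    window: list of (x, y, z) acceleration readings, oldest first.
    Per axis: raw value of the latest reading, standard deviation over
    the window, and range (max minus min) over the window.
    """
    latest = window[-1]
    features = []
    for axis in range(3):
        values = [reading[axis] for reading in window]
        features.append(latest[axis])                # raw acceleration
        features.append(statistics.pstdev(values))   # standard deviation
        features.append(max(values) - min(values))   # range
    return features


# Hypothetical three-reading window.
sample_window = [(0.0, 1.0, 0.0), (0.2, 1.2, -0.1), (0.4, 0.8, 0.1)]
sample_features = window_features(sample_window)
```

The resulting vector is ordered axis by axis (x raw, x standard deviation, x range, then y, then z).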

3.4.2.3 Segmenting Events using SVM

Following the four steps of SVM, the lifelogging system in this study segments the events from the raw data and the contexts extracted from sensors.

• Step 1: Choosing the training dataset

After collecting sensor data and uploading it to the server side, the participants were asked to annotate the event boundaries. The dataset with the annotated event boundaries is chosen to be the training dataset.

• Step 2: Extracting the optional attributes of data

Based on the users' and the researcher's own experience, some attributes are extracted from the contexts. Example attributes in this study are speed and the change in signal strength of WiFi hotspots.

• Step 3: Training the classification model

Lifelog data collected in this study was generated by different sensors, so the attributes extracted from the sensor data have very different value ranges. Before being used they must be standardised. Standardisation is a very important step before training. Its main advantage is that it prevents attributes with greater numeric ranges from dominating those with smaller numeric ranges. After standardisation, all attributes carry equal weight for the SVM. To identify a good C and a good γ, the two parameters for an RBF kernel, we used LibSVM to train our model.

• Step 4: Evaluating the classification

An SVM classifier is usually evaluated by cross-validation on the training dataset itself. In v-fold cross-validation, the training set is separated into v subsets of equal size. Each subset is tested in turn using the classifier trained on the remaining v-1 subsets. In this study, we adopted five-fold cross-validation. To evaluate the effectiveness of SVM on event segmentation, three different metrics were used: precision, recall, and F1-Measure (i.e. a single measure that incorporates both precision and recall, as defined in Section 5.6).
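The standardisation in Step 3 and the evaluation in Step 4 can be sketched as below. The SVM training itself (done with LibSVM in this study) is not shown. The sketch assumes z-score standardisation fitted on the training folds only, interleaved fold assignment, and the usual harmonic-mean form of the F1-Measure, which is assumed to match the definition in Section 5.6; all function names are hypothetical.

```python
import statistics


def standardise(train_rows, test_rows):
    """Z-score each attribute using the mean and std of the training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Constant columns have zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(test_rows)


def v_fold_indices(n_samples, v=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    folds = [list(range(i, n_samples, v)) for i in range(v)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (e.g. an event boundary)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical usage: five folds over ten samples, and one set of predictions.
folds = list(v_fold_indices(10, v=5))
scores = precision_recall_f1([1, 1, -1, -1, 1], [1, -1, 1, -1, 1])
scaled_train, scaled_test = standardise([[1.0, 10.0], [3.0, 10.0]], [[2.0, 10.0]])
```

Fitting the scaling statistics on the training folds alone keeps information from the held-out fold out of the model, which is one common design choice rather than something stated in the source.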
