modality, but on average their best performances with the use of an SVM classification algorithm are between 92.64 ± 0.74% and 95.74 ± 1.15%. Furthermore, the low percentage of support vectors found in the best models indicates that the problem is not difficult from the point of view of machine learning.
A different experiment concerning the use of SVM is [1], with the particular goal of analyzing the best placement of four sensors and the variability of training data across different days, with reference to different positions of the arm and forearm during the recordings. The results show that four correctly placed electrodes and slight signal pre-processing can give good classification results. Therefore, by focusing on the correct experimental procedure, high classification accuracies can be obtained with an SVM classifier using four electrodes, and it is thus not necessary to look for more complex algorithms with more electrodes.
A further development based on [1] is presented in [17], where a hybrid EMG classifier is proposed by combining a support vector machine and a hidden Markov model (HMM). In particular, the HMM is used to distinguish steady-state signals from transient ones, and the SVM is then used to classify the EMG signal during steady state. The HMM is introduced to allow the classification of transients, which is not made possible by a time-dependent algorithm. In conclusion, the results of the experiment show that the hybrid approach improves gesture classification by more than 12%.
2.3 Clustering
Clustering is a method of exploratory data analysis based on grouping data according to a certain notion of similarity between them. Its aim is to construct groups of data (clusters) such that data in the same cluster share similar characteristics, while data in different clusters are dissimilar. It is a technique used in many fields, such as statistical data analysis, machine learning, pattern recognition and data compression.
One approach is statistical, based on the assumption that a probabilistic model generates the data points; another, of great interest here, is the similarity-based approach. The latter defines a similarity function between pairs of data points and formulates a criterion based on it, which the clustering method must optimize. The central issue for clustering optimality is therefore the definition of a 'good' similarity function.
In particular, there exist two main categories of clustering: partitioning methods and hierarchical methods. The former construct k clusters such that each cluster contains at least one element, and each element belongs to one and only one cluster. These requirements may be summarized as follows:
(i) $\forall i = 1, \dots, k, \ \exists\, x_j \in C_i$
(ii) $\forall i, j \in \{1, \dots, k\}, \ i \neq j: \ C_i \cap C_j = \emptyset$
where $C_i$ is the i-th cluster and $X = \{x_1, \dots, x_N\}$ is the set of observations to be grouped in k clusters. Since every observation must be assigned to some cluster, we also have $\bigcup_{i=1}^{k} C_i = X$.
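The requirements on a partitioning can be checked mechanically. The following is a minimal sketch, assuming clusters are represented as Python sets; the function name `is_partition` and this representation are illustrative choices, not part of the original text. Besides non-emptiness and pairwise disjointness, it also checks that every observation is covered, which is what "each element belongs to one and only one group" requires.

```python
def is_partition(clusters, observations):
    """Return True if `clusters` (a list of sets) partitions `observations`.

    Both argument names are hypothetical; they are used here for illustration.
    """
    # (i) every cluster contains at least one element
    if any(len(c) == 0 for c in clusters):
        return False
    # (ii) distinct clusters are pairwise disjoint
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if clusters[i] & clusters[j]:
                return False
    # every observation belongs to some cluster (coverage)
    return set().union(*clusters) == set(observations)


valid = is_partition([{1, 2}, {3}], {1, 2, 3})        # disjoint, non-empty, covering
overlapping = is_partition([{1, 2}, {2, 3}], {1, 2, 3})  # element 2 is shared
```

Here `valid` holds because the two sets are non-empty, disjoint and cover all observations, while `overlapping` fails condition (ii).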
Hierarchical methods instead build a hierarchy of clusters and are distinguished by the direction followed in the construction: the approach is agglomerative if a bottom-up strategy is followed, and divisive if a top-down strategy is used. In general, the choice of the kind of clustering depends on the structure of the available data and on the specific purpose of the study.
In this Section we focus on one of the most widely used partitioning methods, namely K-means clustering, which is more suitable for dealing with large datasets.
2.3.1 K-means algorithm
Let $Y = \{x_n\}_{n=1,\dots,N}$ be a set of N realizations (samples) of a random variable X in $\mathbb{R}^D$. As a first assumption, we consider that the number K of clusters is given by the user.
Definition 2.4 (1-of-K encoder). We call 1-of-K encoder the function $r_{nk} \in \{0, 1\}$ which assigns each data point to one cluster. Equivalently,
\[ r_{nk} = \begin{cases} 1, & \text{if } x_n \in C_k \\ 0, & \text{otherwise} \end{cases} \]
with $C_k$ denoting the k-th cluster.
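As an illustration of Definition 2.4, the sketch below builds the matrix of values $r_{nk}$ from a list of cluster labels. It assumes clusters are indexed from 0 to K-1, and the name `one_of_k` is hypothetical.

```python
def one_of_k(labels, K):
    """Build the N x K matrix r with r[n][k] = 1 iff point n belongs to cluster k.

    `labels[n]` is the (0-based) index of the cluster of point n.
    """
    return [[1 if k == label else 0 for k in range(K)] for label in labels]


# Four points assigned to clusters 0, 2, 1, 2 out of K = 3
r = one_of_k([0, 2, 1, 2], K=3)
```

Each row of `r` contains exactly one 1, reflecting that every data point is assigned to one and only one cluster.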
As we said above, the idea of K-means is to assign two data points $x_i$ and $x_i'$ to the same cluster if the distance between them is small enough, and to different clusters otherwise, repeating this procedure for all pairs of data points. Specifically, K-means is characterized by the use of the squared Euclidean distance as similarity measure; therefore, denoting by $\mu_k$ the prototype of cluster $C_k$, we may introduce an objective function as
\[ J(r_{nk}, \mu_k) := \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2 \qquad (2.21) \]
It is a summation, over all the available data, of the squared Euclidean distance between each data point and its prototype $\mu_k$. Our goal is to find
the values of $\{r_{nk}\}$ and $\{\mu_k\}$ that minimize Eq. (2.21). They can be sought with an iterative procedure in which, at each step, two optimization problems have to be solved, one concerning $\{r_{nk}\}$ and the other concerning $\{\mu_k\}$. Given some initial values for the $\mu_k$, the algorithm can be summarized as in the following table:
K-means algorithm
Repeat until convergence:
1. With $\mu_k$ fixed, solve $\min_{r_{nk}} J(r_{nk}, \mu_k)$
2. With $r_{nk}$ fixed, solve $\min_{\mu_k} J(r_{nk}, \mu_k)$
Table 2.1: K-means algorithm iterative scheme
On the one hand, to execute step (1) in Table 2.1 we can use the linearity of Eq. (2.21) with respect to $r_{nk}$. We can optimize for each n separately, assigning each $x_n$ to the closest cluster center, according to the formula
\[ r_{nk} = \begin{cases} 1, & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0, & \text{otherwise} \end{cases} \qquad (2.22) \]
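The assignment step of Eq. (2.22) can be sketched in a few lines of plain Python. This is only an illustration: the names `sq_dist` and `assign` and the toy data are assumptions, and ties are broken in favour of the smallest cluster index, a choice the text does not specify.

```python
def sq_dist(x, mu):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def assign(points, centers):
    """For each point, return the index k of the closest center (Eq. 2.22).

    Ties go to the smallest index, because min() keeps the first minimum.
    """
    return [
        min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
        for x in points
    ]


# Toy data (illustrative): two points near the first center, one near the second
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0)]
centers = [(0.0, 0.5), (10.0, 10.0)]
labels = assign(points, centers)
```

The returned list stores, for each n, the index k for which $r_{nk} = 1$; this is equivalent to the 1-of-K encoding and more compact.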
On the other hand, as Eq. (2.21) is quadratic with respect to $\mu_k$, we can differentiate and set the result equal to zero, as follows
\[ \frac{\partial J(r_{nk}, \mu_k)}{\partial \mu_k} = -2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0 \iff \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}} \qquad (2.23) \]
It can be observed that the denominator of Eq. (2.23) is equal to the total number of data points assigned to cluster k, by the definition of $r_{nk}$. Therefore Eq. (2.23) is an estimate of the mean value of the data belonging to the k-th cluster.
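The update of Eq. (2.23) amounts to averaging the points of each cluster. The sketch below assumes points are tuples of floats and cluster labels are 0-based indices; the name `update_centers` is hypothetical. Returning `None` for a cluster with no points is an assumption made here, since Eq. (2.23) is undefined when its denominator is zero.

```python
def update_centers(points, labels, K):
    """Return the mean of the points assigned to each cluster (Eq. 2.23)."""
    D = len(points[0])
    sums = [[0.0] * D for _ in range(K)]   # numerator: sum_n r_nk * x_n
    counts = [0] * K                        # denominator: sum_n r_nk
    for x, k in zip(points, labels):
        counts[k] += 1
        for d in range(D):
            sums[k][d] += x[d]
    # Assumption: an empty cluster has no defined mean, so None is returned
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else None
        for k in range(K)
    ]


points = [(0.0, 0.0), (2.0, 0.0), (9.0, 9.0)]
new_centers = update_centers(points, labels=[0, 0, 1], K=2)
```

In this toy example the first center becomes the midpoint of the first two points, and the second center coincides with the single point assigned to it.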
Keeping in mind this observation, the cost function in Eq. (2.21) can be seen as the sum, over all clusters, of the squared Euclidean distances between each point and the estimate of the mean value of its cluster. In other words, it is a sum of (unnormalized) estimates of the variances associated with each cluster,
\[ J(r_{nk}, \mu_k) = \sum_{k=1}^{K} \sigma_k^2, \qquad \sigma_k^2 := \sum_{n=1}^{N} r_{nk} \, \|x_n - \mu_k\|^2 \]
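The decomposition of J into per-cluster terms can be verified numerically on a small example. The sketch below computes the objective of Eq. (2.21) directly and as a sum of per-cluster sums of squared distances to the cluster mean; the function names and the one-dimensional toy data are illustrative assumptions.

```python
def sq_dist(x, mu):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def objective(points, labels, centers):
    """Objective J of Eq. (2.21), with labels[n] = k meaning r_nk = 1."""
    return sum(sq_dist(x, centers[k]) for x, k in zip(points, labels))


def cluster_scatter(points, labels, centers, k):
    """Sum of squared distances of the points of cluster k to its center."""
    return sum(
        sq_dist(x, centers[k]) for x, lab in zip(points, labels) if lab == k
    )


# One-dimensional toy data; the centers are the means of the two clusters
points = [(0.0,), (2.0,), (10.0,), (14.0,)]
labels = [0, 0, 1, 1]
centers = [(1.0,), (12.0,)]

J = objective(points, labels, centers)
per_cluster = [cluster_scatter(points, labels, centers, k) for k in range(2)]
```

Summing `per_cluster` gives back `J`, so the tighter cluster (the first one here) contributes less to the objective than the more spread-out one.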
The two steps illustrated in Table 2.1 are repeated until the maximum number of iterations is reached or the assignments no longer change. Since each of the two steps is solved exactly and cannot increase J, the procedure acts as an alternating descent on J, but it may converge to a local minimum rather than the global minimum. An important observation concerns the choice of the initial prototypes $\mu_k$. If they are deliberately chosen by the user, the algorithm may take several steps to reach convergence. It is suggested in the literature that the best choice for improving the running time and the quality of the final solution is a random subset of k data points, as implemented in the kmeans algorithm for Matlab [?].
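Putting the two steps together, the iterative scheme with random initialization from the data can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab implementation: the names `kmeans`, `max_iter` and `seed`, the stopping rule, and the policy of keeping the old center when a cluster becomes empty are all assumptions.

```python
import random


def sq_dist(x, mu):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def kmeans(points, K, max_iter=100, seed=0):
    """Alternate assignment and update steps until the labels stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, K)   # initial prototypes: K random data points
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center
        new_labels = [
            min(range(K), key=lambda k: sq_dist(x, centers[k])) for x in points
        ]
        if new_labels == labels:      # no assignment changed: converged
            break
        labels = new_labels
        # Step 2: move each center to the mean of its assigned points
        for k in range(K):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:               # assumption: keep old center if empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Two well-separated groups of three points each (toy data)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, K=2)
```

Because the result may depend on the initial prototypes, a common precaution is to run the procedure with several seeds and keep the solution with the smallest J.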
Observation 6. In this study, we use the K-means algorithm as a supervised method to identify exactly as many clusters as the number of gestures an artificial device should reproduce. Therefore, we do not discuss the techniques most commonly used for choosing the number of clusters.