Our objective is now to reduce the feature set. We propose to use Fisher Linear Discriminant Analysis (Fisher-LDA) to select features and to further reduce the dimension of the classification model.

Figure 4.9: The classifier should be able to distinguish between the two signals above in order to find the right time to react to an event detected by the force/torque (f/t) sensor.

4.4.1 Fisher Linear Discriminant Analysis

Fisher-LDA is a tool used for classification and dimension reduction. One of the most popular tools for dimension reduction is principal component analysis (PCA). We compare the two and discuss why LDA is preferable for our problem. PCA is an orthogonal linear transformation that maps the data to a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Suppose that we have a data set $X \in \mathbb{R}^{N \times P}$, in which $x_i \in \mathbb{R}^P$, $1 \le i \le N$, denotes one data point, with zero mean (a data set with nonzero mean can be centered by subtracting the mean). The N rows thus represent N data points, and each data point includes P features. The mapping is defined by finding a projection matrix W for which:

Y = W X (4.7)

such that the individual variables of Y, considered over the data set, successively inherit the maximum possible variance from X, with each loading vector $w_i$ in W constrained to be a unit vector. PCA is an unsupervised technique and, as such, does not use the label information of the data when performing dimension reduction. LDA, by contrast, does not try to maximize the variance of the whole data set, but tries to project the data so that the classes are better separated. As PCA and LDA can both be seen as a rotation of the coordinates in which the data are presented, the difference can be summarized as in Figure 4.11.
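As an illustration of the PCA mapping described above, the following is a minimal NumPy sketch, not the implementation used in this work. It stores the data points as rows, so the projection is applied as `Xc @ W`, which corresponds to equation 4.7 up to a transposition convention. The function name `pca_project` and the toy data are assumptions made for the example.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (N points, P features) onto the first k principal components."""
    Xc = X - X.mean(axis=0)                  # center the data: PCA assumes zero mean
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)      # P x P sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest variances
    W = eigvecs[:, order]                    # P x k matrix, unit-norm loading vectors as columns
    return Xc @ W, W, eigvals[order]

# Toy data (synthetic): strongly elongated along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Y, W, variances = pca_project(X, 2)
```

Note that the labels never enter this computation: the directions depend only on the overall scatter of the data, which is the limitation that motivates LDA for a classification problem.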

Fisher-LDA considers finding the best weight w by maximizing the objective:

$$J(w) = \frac{w^T S_B w}{w^T S_W w} \tag{4.8}$$

where $S_B$ is the “between-class scatter matrix” and $S_W$ is the “within-class scatter matrix”.

Figure 4.10: Part of the WPT tree plotted for the signals in Fig. 4.9. The first subbands (highlighted by the red rectangle) capture the peak in the signal (the peaks appear in white). The WPT is computed with Symlets 5 wavelets (sym5). The zeros in the tree also show the sparsity of the WPT.

The value of J measures the separability of the classes for a set or a subset of features. The two scatter matrices are computed as:

$$S_B = \sum_c (\mu_c - \bar{x})(\mu_c - \bar{x})^T \tag{4.9}$$

$$S_W = \sum_c \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^T \tag{4.10}$$

where $\bar{x}$ is the overall mean of the data points, c denotes the class label, and $\mu_c$ is the mean of the data points in class c. An important property of the objective J is that it is invariant with respect to rescaling of the vector w. This is useful because we can always choose w such that $w^T S_W w = 1$. The maximization of J is then transformed into the optimization problem:

$$\min_w \; -\frac{1}{2} w^T S_B w \tag{4.11}$$

$$\text{s.t.} \quad w^T S_W w = 1 \tag{4.12}$$
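The constrained problem of equations 4.11 and 4.12 can be explored numerically. Setting the gradient of its Lagrangian to zero gives the generalized eigenvalue problem $S_B w = \lambda S_W w$, so a maximizer of J is the leading eigenvector of $S_W^{-1} S_B$. The sketch below assumes that $S_W$ is invertible and uses synthetic two-class data; the function names are hypothetical and it is meant as an illustration rather than the implementation used in this work.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (S_B) and within-class (S_W) scatter, following eqs. 4.9 and 4.10."""
    x_bar = X.mean(axis=0)                   # overall mean of the data points
    P = X.shape[1]
    S_B = np.zeros((P, P))
    S_W = np.zeros((P, P))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)               # class mean
        d = (mu_c - x_bar)[:, None]
        S_B += d @ d.T                       # one term of the sum over classes
        R = Xc - mu_c
        S_W += R.T @ R                       # sum over the points of class c
    return S_B, S_W

def fisher_objective(w, S_B, S_W):
    """J(w) from eq. 4.8."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)

def fisher_direction(S_B, S_W):
    """Leading eigenvector of S_W^{-1} S_B, rescaled so that w^T S_W w = 1 (assumes S_W invertible)."""
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return w / np.sqrt(w @ S_W @ w)

# Synthetic two-class data: large shared variance along x, small class offset mostly along y.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], [3.0, 0.5], size=(100, 2)),
               rng.normal([1.0, 2.0], [3.0, 0.5], size=(100, 2))])
y = np.repeat([0, 1], 100)

S_B, S_W = scatter_matrices(X, y)
w = fisher_direction(S_B, S_W)
```

Evaluating `fisher_objective` at `w` and at any rescaled copy such as `3.7 * w` gives the same value, which is the invariance that justifies fixing $w^T S_W w = 1$.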

4.4.2 Fisher LDA for Feature Selection

Figure 4.11: Difference between PCA and LDA: LDA maximizes the separability between classes, while PCA projects the data so as to maximize the scatter of all data points.

Here, we define the criterion to select the best features in our feature set of the relative energy map E. The criterion is reformulated as equation 4.13. It is calculated for every feature in E to evaluate the separability measure [Gu 11] of that feature. As the WPT produces a sparse representation of the signal, which means that many terms in the tree are nearly zero, it is reasonable to exclude those features, since they do not vary between classes.

$$J = \operatorname{tr}(S_W^{-1} S_B) \tag{4.13}$$

Compared to definition 4.8, this criterion measures the separability while omitting the projection vector. We have not simply used LDA as the classifier because our problem is not linearly separable, and also because of a lack of data. To find a classifier for an overlapping data set about which little prior information is available, we choose a kernel-based classifier. Why, then, does this criterion make sense when we use another tool for classification? As we want to reduce the size of the feature set, we observed that most features are zero, which is reasonable because the WPT produces a sparse representation of the original signal, and hence of the energy map. In this sparse representation, however, the zero terms are not exactly zero, because of the noise in the signal. In the results, we will see that the significant terms are very clearly distinguished from the near-zero terms when measured by the LDA criterion.
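To make the per-feature use of the criterion concrete, here is a minimal sketch under stated assumptions. When the criterion of equation 4.13 is evaluated on a single feature, both scatter matrices reduce to scalars, so J becomes the ratio of between-class to within-class scatter of that feature. The function name `fisher_scores`, the small constant `eps`, and the synthetic "energy map" below are assumptions for illustration only; they are not the data or code of this work.

```python
import numpy as np

def fisher_scores(E, y, eps=1e-12):
    """Per-feature J = tr(S_W^{-1} S_B); for one feature both scatters are scalars."""
    overall_mean = E.mean(axis=0)
    s_b = np.zeros(E.shape[1])
    s_w = np.zeros(E.shape[1])
    for c in np.unique(y):
        Ec = E[y == c]
        mu_c = Ec.mean(axis=0)
        s_b += (mu_c - overall_mean) ** 2        # between-class scatter of each feature
        s_w += ((Ec - mu_c) ** 2).sum(axis=0)    # within-class scatter of each feature
    return s_b / (s_w + eps)                     # eps guards against exactly constant features

# Synthetic stand-in for a sparse energy map: six features, mostly near-zero noise.
rng = np.random.default_rng(2)
E = np.abs(rng.normal(0.0, 1e-3, size=(120, 6)))
y = np.repeat([0, 1], 60)
E[:60, 0] += 0.6
E[60:, 0] += 0.1                                 # feature 0 differs between the classes
E[:60, 3] += 0.05
E[60:, 3] += 0.4                                 # feature 3 differs between the classes

scores = fisher_scores(E, y)
selected = np.argsort(scores)[::-1][:2]          # keep the two most separable features
```

On this toy data the near-zero features receive scores several orders of magnitude below the two informative ones, which is the kind of gap the criterion is meant to expose.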

During the object exchange manipulation, the orientation of the f/t changes with the situation: the posture of the robot and of the person, the exchange position, and more. Not including orientations in the event detection simplifies the problem. In fact, the f/t are expressed in the local frame of the sensor, which makes the orientations more difficult to use for the classifier. Learning a model with orientations would also be much more complex and would produce a model of much higher dimension, resulting in a slower classifier. With the orientations of the f/t excluded from the feature set, the feature set building process can be summarized as:

• The WPT tree representation is built from the force F and the torque T, where F is computed as F = ||Fx + Fy + Fz|| and T as T = ||Tx + Ty + Tz||.

• The relative energy map is computed for the WPT.

• It is then combined with a subband from the tree. The subband is also scaled to the range from 0 to 1, matching the range of the relative energy map.

• The Fisher LDA criterion is used to reduce the size of the feature set of the energy map E, and the level of the subband S is selected as a compromise between classification accuracy and feature size.
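The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: F and T are read as the Euclidean norms of the three-axis force and torque vectors, the relative energy of a subband is taken to be its energy divided by the total energy of all subbands, and the wavelet packet transform itself is not performed (the `subbands` list is a made-up stand-in for WPT coefficients). All function names are hypothetical.

```python
import numpy as np

def magnitude(xyz):
    """Euclidean norm of each 3-axis sample; xyz has shape (n_samples, 3)."""
    return np.linalg.norm(xyz, axis=1)

def relative_energy(subbands):
    """Energy of each subband divided by the total energy (assumed definition)."""
    energy = np.array([np.sum(np.square(s)) for s in subbands])
    total = energy.sum()
    return energy / total if total > 0 else energy

def min_max_scale(v):
    """Scale v linearly to the range [0, 1]; a constant vector maps to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v, dtype=float)

def build_features(subbands, subband_index):
    """Concatenate the relative energy map E with one subband S scaled to [0, 1]."""
    E = relative_energy(subbands)
    S = min_max_scale(np.asarray(subbands[subband_index], dtype=float))
    return np.concatenate([E, S])

# Toy 3-axis force samples (made-up values), one row per time step.
force_xyz = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0], [1.0, 2.0, 2.0]])
F = magnitude(force_xyz)   # this 1-D signal would be the input to the WPT

# Stand-in for the coefficients of four WPT subbands (NOT a real transform).
subbands = [np.array([4.0, -2.0, 1.0]), np.array([0.5, 0.1, -0.2]),
            np.array([0.01, 0.0, 0.02]), np.array([0.0, 0.0, 0.0])]
features = build_features(subbands, subband_index=0)
```

The resulting vector would then be reduced with the Fisher LDA criterion before being passed to the classifier.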
