(1) Departamento de Tecnología Fotónica, Facultad de Informática

MODELLING TEMPORAL PATTERNS USING SOFT COMPUTING TECHNIQUES. APPLICATION TO THE ANALYSIS OF HUMAN BODY MOVEMENTS

Author: Gonzalo Bailador del Pozo, Ingeniero en Informática
Supervisor: Gracián Triviño Barros, Doctor en Informática
Year: 2011

Submitted for the degree of Doctor en Informática at the Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, Madrid, Spain.


(3) Thesis committee for "Modelling Temporal Patterns Using Soft Computing Techniques. Application to the Analysis of Human Body Movements":

President: Luis Pastor Pérez
Secretary: Juan Antonio Felipe Fernández Hernández
Member 1: Ángel Rodríguez Martínez de Bartolomé
Member 2: Maarten W. Van Someren
Member 3: Luka Eciolaza Echeverría

The committee agreed to award the grade: Sobresaliente Cum Laude.
Place and date: Boadilla del Monte, 30 May 2011.


(5) [Acknowledgements]


(7) Table of Contents

1 INTRODUCTION
  1.1 Motivation
  1.2 Objectives of the thesis
  1.3 Outline of the thesis

2 SOFT COMPUTING
  2.1 What is Soft Computing?
  2.2 Fuzzy logic
    2.2.1 Fuzzy sets
    2.2.2 The components of fuzzy logic
    2.2.3 Fuzzy inference systems
  2.3 Artificial neural networks
    2.3.1 Fundamentals
  2.4 Evolutionary Computation
    2.4.1 GA concepts
    2.4.2 GA Operators
    2.4.3 Genetic Algorithm
  2.5 Hybrid methods
    2.5.1 ANFIS

3 STATE OF THE ART
  3.1 Basic concepts
  3.2 General schema of temporal pattern recognition methods
  3.3 Classification of temporal pattern recognition methods
  3.4 Statistical approach
  3.5 Template matching
    3.5.1 Dynamic Time Warping
  3.6 Structural approach
    3.6.1 String matching

(8)
    3.6.2 Finite Automata
    3.6.3 Graph modelling
  3.7 Connectionist approach
    3.7.1 Time-Delay Neural Network (TDNN)
    3.7.2 Jordan and Elman networks
    3.7.3 Continuous Time Recurrent Neural Network (CTRNN)
  3.8 Sequential Pattern Recognition

4 PREDICTION-ERROR-CLASSIFICATION
  4.1 Introduction
  4.2 Global schema
  4.3 Prediction-Error Block
  4.4 Predictor block using CTRNN
    4.4.1 CTRNN Predictor
    4.4.2 Pattern recognition using CTRNN
  4.5 Predictor block using ANFIS
    4.5.1 ANFIS Predictor
    4.5.2 Pattern recognition using ANFIS

5 TEMPORAL FUZZY AUTOMATA
  5.1 Introduction
  5.2 Fuzzy Finite Automata
    5.2.1 Optimistic and Pessimistic Fuzzy Automata
    5.2.2 Reyneri's Fuzzy Automata
    5.2.3 Deformed Fuzzy Automata (DFA)
    5.2.4 Hierarchical Fuzzy Automata (HFA)
  5.3 Temporal Fuzzy Automata
  5.4 Modelling a pattern using a TFA
    5.4.1 Defining the states
    5.4.2 Defining the transitions
  5.5 Recognizing a pattern with a TFA
    5.5.1 Synchronization
    5.5.2 Update
    5.5.3 Ending
  5.6 Tests
    5.6.1 Synchronization
    5.6.2 Robustness against time variability
    5.6.3 Robustness against amplitude variability
    5.6.4 Degree of matching
  5.7 A practical example
    5.7.1 Defining the states

(9)
    5.7.2 Defining the transitions
    5.7.3 Experimental results

6 LEARNING A TFA
  6.1 Normalization of instances using DTW
  6.2 Learning TFA with fixed number of states
    6.2.1 Dividing the pattern into states
    6.2.2 Defining the transitions of the TFA
  6.3 Learning TFA with variable number of states
    6.3.1 Describing a pattern using linguistic intervals
    6.3.2 Dividing the pattern into states
    6.3.3 Defining the transitions of the TFA

7 SENSOR DEVICE
  7.1 Introduction
  7.2 Sensor device

8 POSTURE RECOGNITION
  8.1 Introduction
  8.2 Capturing data
  8.3 The pendulum
  8.4 The Rubik's cube
  8.5 Modelling of body postures
    8.5.1 Conclusion

9 GESTURE RECOGNITION
  9.1 Introduction
  9.2 Experiment
  9.3 Segmentation
  9.4 Gesture recognition using HMM
  9.5 Gesture recognition using CTRNN
    9.5.1 CTRNN Predictor
    9.5.2 Analysis of CTRNN predictors for prediction
    9.5.3 Analysis of CTRNN predictors for classification
  9.6 Gesture recognition using ANFIS
    9.6.1 ANFIS predictor
    9.6.2 Analysis of ANFIS predictors for prediction
    9.6.3 Analysis of ANFIS predictors for classification
  9.7 Gesture recognition using TFA
    9.7.1 Analysis of TFA for gesture recognition
    9.7.2 Analysis of TFA for continuous recognition
  9.8 Comparison of all methods

(10)
10 HUMAN GAIT RECOGNITION
  10.1 Introduction
  10.2 Gait cycle
  10.3 Capturing the human gait
  10.4 Analysis of waist accelerations during normal gait
  10.5 Experimental setup
  10.6 Experiment
    10.6.1 Performance analysis
    10.6.2 Comparison with HMM
    10.6.3 Permanence analysis
  10.7 Conclusions

11 CONCLUSIONS
  11.1 PEC approach
  11.2 TFA approach
  11.3 Patent
  11.4 Publications
    11.4.1 Pattern recognition using temporal fuzzy automata
    11.4.2 Application of the computational theory of perceptions to human gait pattern recognition
    11.4.3 Real time gesture recognition using Continuous Time Recurrent Neural Networks
    11.4.4 Gesture recognition using a neuro-fuzzy predictor
    11.4.5 Robust Gesture Recognition using a Prediction-Error-Classification Approach
    11.4.6 Linguistic description of human body posture using fuzzy logic and several levels of abstraction
    11.4.7 Fuzzy sets of quasi-periodic signals

A COMPUTATIONAL COMPLEXITY ANALYSIS
  A.1 Introduction
  A.2 CTRNN
  A.3 ANFIS
  A.4 HMM
  A.5 TFA
  A.6 Conclusion

(11) List of Figures

2.1 Soft Computing
2.2 Linguistic labels to describe the temperature of a room
2.3 Fuzzy inference system
2.4 Mamdani's fuzzy inference system
2.5 Example of TSK's fuzzy inference system
2.6 Model of artificial neural network
2.7 Examples of ANN architectures
2.8 Example of crossover and mutation operators
2.9 Flow diagram of a Genetic Algorithm
2.10 Architecture of the ANFIS
3.1 Stages of temporal pattern recognition methods
3.2 General schema of a statistical pattern recognizer
3.3 General schema of template matching
3.4 Best warping path of both sequences
3.5 General schema of a structural pattern recognizer
3.6 Triangular episodes proposed by Cheung and Stephanopoulos. Figure extracted from [Kiv98]
3.7 Graph corresponding to the FTP of a temporal pattern
3.8 From left to right, Jordan's and Elman's recurrent neural networks
3.9 Example of Markov's chain for modelling the weather
4.1 Prediction-Error-Classification System

(12)
4.2 Prediction-Error Block
4.3 Architecture of the CTRNN
4.4 Example of signal pattern with two components
4.5 ANFIS Predictor with inputs of previous time steps
4.6 ANFIS Predictor with input vector
5.1 Pattern of a rectangular signal
5.2 TFA used to recognize the rectangular signal
5.3 Membership functions to describe the amplitude
5.4 Linguistic labels for the duration of a state
5.5 Perfect fitting of the signal with the TFA (α = 0, θ = 0.01)
5.6 Synchronization of the TFA (α = 0, θ = 0.01)
5.7 Pattern with increasing period (α = 0, θ = 0.01)
5.8 Pattern corrupted with uniform noise (α = 0, θ = 0.01)
5.9 Pattern corrupted with glitches (α = 0.5, θ = 0.01)
5.10 Different A4 signals
5.11 Modelling of A4 cycle
5.12 TimeToMove and TimeToStay for short and long states
5.13 Recognized patterns of several signals using different parameters
5.14 Degree of matching for all the recognized patterns of each different signal
6.1 Normalization of several training instances
6.2 Division into states
6.3 Linguistic labels for the duration of a state
6.4 Example of definition of the linguistic interval A1^[3,4] for input x1
6.5 Intervals that cover the signals x1 and x2 of the pattern
6.6 States that cover the summary obtained previously
6.7 Examples of calculation of Tmin and Tmax

(13)
7.1 From left to right: Biaxial Accelerometer ADXL-202 and Bluetooth module EYMF2CAMM-XX
7.2 Layout of the circuit that implements the accelerometer
7.3 Device to capture the movements
8.1 Temporal Evolution
8.2 Acceleration signals
8.3 Pendulum Coordinates
8.4 Membership functions for orientation and height
8.5 Rubik's cube
8.6 Activation of cells during the movement between seated and standing upright
8.7 Temporal fuzzy automaton to model evolution of the body posture
9.1 Gestures used to analyse the performance of the method
9.2 Acceleration signals recorded at the hand when performing a circular hand motion
9.3 Average recognition rate using HMM for training sets with different sizes
9.4 Architecture of the CTRNN used for gesture recognition
9.5 Evolution of the fitness function during a run of the training for the predictor of class A
9.6 Mean prediction error produced by all predictors for all the instances of each gesture class
9.7 Average recognition rate using CTRNN for several training sets with different sizes
9.8 Three ANFIS predictors for forecasting the next value of the acceleration signal
9.9 Membership functions used in the predictor
9.10 Mean prediction error produced by all predictors for all the instances of each gesture class

(14)
9.11 Average recognition rate using ANFIS
9.12 Cumulative prediction error for a given gesture of class A
9.13 Linguistic labels used for acceleration components (ay, az)
9.14 Recognition rate versus number of training instances for isolated gestures
9.15 Recognition rate versus number of training instances for gestures captured in a realistic environment
9.16 Number of false accepted instances with respect to the N parameter
9.17 Number of false rejected instances with respect to the N parameter
9.18 Average false accepted and false rejected positives with respect to the N parameter
9.19 Average recognition rate for several training sets with different sizes by using several methods
10.1 Division of gait cycle in phases
10.2 Subphases of human gait
10.3 Example of accelerations of the human gait
10.4 Filtered accelerations of the human gait
10.5 FAR and FRR calculated for several N parameters
10.6 Evolution of the recognition and false acceptance rate using training sets of increasing sizes
10.7 FAR and FRR curves obtained by HMM with random initialization
10.8 FAR and FRR curves obtained by HMM with oriented initialization
10.9 Confusion matrix for all the gaits of the first and second trial

(15) List of Tables

5.1 Evolution of degree of matching when including noise in the signal
9.1 Confusion matrix for isolated gestures using HMM
9.2 Confusion matrix for gestures captured in a realistic environment using HMM
9.3 Parameters of each Neuron of CTRNN
9.4 Confusion matrix for isolated gestures using CTRNN
9.5 Confusion matrix for gestures captured in a realistic environment using CTRNN
9.6 Confusion matrix for isolated gestures using ANFIS
9.7 Confusion matrix for gestures captured in a realistic environment using ANFIS
A.1 Examples of time complexities using big-O notation
A.2 Complexities of algorithms presented in this thesis


(17) RESUMEN

Most real-world signals, and especially those captured from the human body, contain events that repeat periodically. The set of common features shared by these events is known as a pattern. When the temporal extension of a pattern is considered explicitly, it is called a "temporal pattern". Examples of temporal patterns are the electrocardiogram waveform or the accelerations of the human body during gait. This thesis proposes different techniques for modelling these temporal patterns and for detecting them in a given signal. Because these signals normally present high variability in shape and duration, different concepts and techniques from Soft Computing have been applied, since they make it possible to deal with this variability. Broadly speaking, the techniques used in this thesis can be classified into the following two approaches:

• Prediction-Error-Classification. This approach consists of generating a signal predictor for each pattern and using the errors produced by each predictor to determine the class of a given signal.

• Fuzzy Finite Automata. Each temporal pattern is considered as a sequence of events that is modelled by an automaton. This automaton is then used to recognize patterns in signals.

The different techniques studied in this thesis have been evaluated on a synthetic signal and on several signals of human body movement.


(19) ABSTRACT

Many signals obtained from the real world, including those captured from the human body, present events that repeat recurrently. When these events share common features repeated in time, we call them "temporal patterns". Examples of such temporal patterns are the electrocardiogram waveform or the accelerations of the human body during gait. This thesis proposes different techniques to model these temporal patterns and to detect their presence in a given signal. Because these signals usually present high variability in their shape and duration, we have applied different concepts and techniques from Soft Computing, since they are able to deal with this variability. In general terms, two different approaches have been studied in this thesis:

• Prediction-Error-Classification approach. It consists of generating a signal predictor for each pattern and using the errors produced by each predictor to determine the class of a given signal.

• Fuzzy Finite Automata approach. The pattern is considered as a sequence of events which are modelled by an automaton. Afterwards, this automaton is used to detect patterns in a given signal.

The performance of our proposed techniques has been tested on synthetic signals and on the analysis of the movements of the human body.


(21) Chapter 1. INTRODUCTION

1.1 Motivation

Due to current advances in technology, there are electronic sensors able to measure many human body parameters. These sensors range in complexity from the simple thermometer to the electroencephalograph, which records brain waves. Usually, instantaneous measurements are not sufficient to analyse these signals, and their temporal evolution must also be taken into account. Furthermore, most of these signals show typical repetitive temporal patterns. Some examples of temporal patterns are the electrocardiogram waveform, the waves of an electroencephalogram, and the accelerations of the human body during gait. In particular, researchers in the field of medicine are interested in modelling and analysing this kind of pattern because it allows different pathologies to be detected [BLP07]. For example, the analysis of ECG signals provides information about abnormalities of the heartbeat [KJM95], and the analysis of human body movements, such as the gait, can help to diagnose diseases such as Parkinson's [MMG+07] or to prevent falls in elderly people [HKJK04].

(22) The recognition of body movements has many applications in the field of human-computer interfaces. These days there is great interest in the video game field, since such recognition allows more intuitive and natural controllers to be created. A good example is the Wii console [1], which is able to recognize gestures performed with the hand. In addition, some portable devices (e.g., the iPhone [2]) include sensors that allow some basic functions to be controlled by means of specific movements.

Although this kind of signal presents characteristic patterns that are easy for a human expert to recognize, it also shows high variability in shape and duration, which makes it difficult to model using traditional computational techniques. Therefore, it is necessary to use techniques that can deal with this variability. Soft Computing is a set of techniques and concepts whose main goal is to exploit the tolerance for imprecision, uncertainty, partial truth and approximation to achieve tractability, robustness and low solution cost. The main ingredients of Soft Computing are fuzzy logic, neural networks, probabilistic reasoning, genetic algorithms, belief networks, chaotic systems and some parts of learning theory [Zad94]. Furthermore, several Soft Computing techniques can sometimes be combined to solve a specific problem. In this manner, some of the drawbacks that the techniques present separately may be overcome using hybrid solutions.

1.2 Objectives of the thesis

The goal of this thesis is to model temporal patterns using techniques and concepts of Soft Computing and then to use these models to recognize signals following these patterns. The performance of the proposed techniques will be tested on the analysis of human body movements. This is a good example, as human beings usually repeat many movements during the day, but never in the same way.

[1] Nintendo video game console (Wii). Web page: http://wii.com
[2] Apple mobile phone (iPhone). Web page: http://www.apple.com/iphone

(23) In other words, these signals, while following a similar pattern, present high variability in amplitude and time. In particular, we have tested the proposed methods in three different experiments: recognition of some gestures performed with the hand, monitoring of the position of the body, and recognition of the specific gait of each individual.

According to the needs of these applications, we have defined some general requirements that the proposed methods must fulfil:

Robustness against noise and variability: Usually the signals captured from the real world are noisy; therefore, the model must be robust enough to deal with this noise. Furthermore, human movements present high variability, so the model must be able to represent this variability.

Including time in the model: In temporal patterns, which can be structured in different phases, time appears in two different ways [Wan03]. On the one hand, there is an order between the phases that defines when each phase occurs. On the other hand, each phase has a specific duration. Therefore, the model must provide mechanisms to include both constraints.

Avoiding segmentation: During the recognition stage, most techniques need a segmentation phase before analysing the signals. This segmentation extracts the meaningful parts of the signal; however, this process is usually not simple. Moreover, a failure in this phase blocks the posterior analysis. Hence, the method must be able to synchronize with the signal in order to avoid this previous segmentation.

Detecting the null class: Most techniques allow discriminating between several pattern classes, but they cannot reject a signal that does not belong to any

(24) particular class: they only classify a pattern into one of the predefined classes. This is called the "null class problem", and we must try to solve it.

Additionally, we have defined some optional requirements of interest for the proposed methods:

Understandability: The models should be understandable by an expert for two reasons: firstly, so that the expert can define the model during the design phase and, secondly, so that the expert can follow the evolution of the pattern during the analysis.

Degree of matching: The analysed signals usually do not fit perfectly with the defined model. Therefore, it is interesting to provide a measurement that informs about the fitness of the signal to the model.

Early recognition: In some applications that require different pre-calculations depending on the class of the pattern, it could be interesting to classify a pattern before it finishes, so that these pre-calculations can be started in advance. This is called "early recognition" and it is useful in many fields. For example, in [MUK+06] the authors move a robot that imitates the gestures of a person. They take advantage of this feature because they can forecast which gesture the person is going to perform and then start the calculations for moving the robot in advance. In [GHMDC05], the authors use this early recognition in air percussion: the percussionist makes strikes in the air, the gesture is recognized by a computer, and the corresponding sound is generated. They want to forecast these strikes because, if they wait until the strike is completed in the air, the auditory feedback can be degraded due to the de-synchronization.

(25) Low computational cost: Another interesting goal is that the analysis phase should have a low computational cost, so that the process can be used in real-time applications. Furthermore, if the proposed method is able to avoid the segmentation phase by synchronizing with the signal, it can be used for on-line recognition. Until now, most work on temporal pattern recognition has been off-line: the data is recorded once and then all the algorithms are tested on this data. However, due to the improvement in the computational capability of portable devices, this process can now be done in real time, which allows us to create more interactive applications.

1.3 Outline of the thesis

The rest of this thesis is organized as follows:

2. Soft Computing: Techniques and concepts of Soft Computing are described briefly.

3. State of the art: A review of different previous approaches used in the field of temporal pattern recognition is presented.

4. Prediction-Error-Classification: Our approach based on the comparison of the results of several predictors to classify temporal patterns is explained. Two different methods were used to create these predictors: neural networks and neuro-fuzzy systems.

5. Temporal Fuzzy Automata: This chapter defines Temporal Fuzzy Automata based on the previous concept of General Fuzzy Automata and explains how to use them to detect temporal patterns. In order to show how an expert can easily model a pattern using this approach, some basic signals are modelled.

(26) 6. Learning Temporal Fuzzy Automata: Although these automata can be modelled manually, sometimes it is interesting to model them automatically. This chapter proposes a method for learning a Temporal Fuzzy Automaton from several instances of the pattern.

7. Experiments: Human movements: In order to test our approaches, we have performed several experiments for recognizing some movements of the human body. This chapter falls into four parts:

(a) Sensor device. Selection of the device used for capturing the movements and description of the chosen one.

(b) Posture recognition. Explanation of the experiment for monitoring the posture of an individual using Temporal Fuzzy Automata.

(c) Gesture recognition. Explanation of the experiment for recognizing some gestures made with the hand and comparison of the results obtained by the different approaches.

(d) Human gait recognition. Explanation of the experiment for distinguishing the human gait of different subjects using Temporal Fuzzy Automata.

8. Conclusions: Summary of the main contributions of this thesis and evaluation of the different approaches proposed.

9. Computational Complexity Analysis: In this appendix, we present a study of the computational complexity of the different techniques used in the thesis.

(27) Chapter 2. SOFT COMPUTING

2.1 What is Soft Computing?

The term "Soft Computing" appears in opposition to the term "Hard Computing", which refers to classical computing techniques oriented towards finding exact solutions without ambiguity or imprecision. The techniques of Soft Computing try to solve real-world problems in which the only possible solutions are approximate, inexact and vague. One of the first discussions about this term took place during FLINS'96 [LRVdW98], five years after Zadeh used the term for the first time to create the Berkeley Initiative in Soft Computing (BISC). The result of this discussion was the following definition:

"Every computing process that purposely includes imprecision into the calculation on one or more levels and allows this imprecision either to change (decrease) the granularity of the problem, or to 'soften' the goal of optimization at some stage, is defined as belonging to the field of Soft Computing."

The term "Soft Computing" also refers to the attempt to combine the advantages of Fuzzy Logic, Artificial Neural Networks, Evolutionary Computation and Probabilistic Reasoning. Figure 2.1, extracted from the book [Cor01a], shows the different fields that form Soft Computing.

(28) Figure 2.1: Soft Computing

Although these four constituents share some common characteristics, they are considered complementary because some desirable features lacking in one approach are present in another. On the one hand, Fuzzy Logic is used in many applications for modelling expert knowledge, allowing the inclusion of a certain imprecision and vagueness in its descriptions. On the other hand, the main advantage of using neural networks and genetic algorithms is that they provide powerful learning and approximation methods.

In the following sections, Fuzzy Logic, Artificial Neural Networks and Evolutionary Computation are presented briefly in order to explain their main concepts. Probabilistic Reasoning is not introduced since it was not used explicitly in this thesis. Finally, hybrid methods, which combine components of different fields, are also presented.

(29) 2.2 Fuzzy logic

In 1965, Lotfi A. Zadeh, the father of Fuzzy Logic, wrote the article "Fuzzy Sets" [Zad65], where he presented the bases of fuzzy set theory, which includes classical set theory as a particular case. The syntax of fuzzy logic extends the classical one with the introduction of linguistic elements and the possibility of defining fuzzy facts and functions. One of the objectives of this logic is to provide the bases for approximate reasoning that allow using concepts with imprecision and uncertainty. This kind of reasoning is useful because our knowledge is usually imprecise, uncertain, incomplete and expressed in natural language.

2.2.1 Fuzzy sets

In a classical (crisp) set, an element a can only belong to a set A (a ∈ A) or not (a ∉ A), just as a natural number either belongs to the set of odd numbers or does not. However, this concept can be generalized by allowing elements to belong partially to a set with a degree of membership. This degree is defined in the interval [0, 1], where 0 means that the element does not belong to the set and 1 that the element is completely included in the set. The function that provides the degree of membership of every element of the domain X to the set A is called the membership function (µA):

µA : X → [0, 1]

The typical operations of classical set theory have to be redefined in order to work with fuzzy sets:

• Subset: A ⊆ B ⟺ ∀x ∈ X, µA(x) ≤ µB(x)
• Complement: Ā = X − A ⟺ µĀ(x) = N(µA(x)) = 1 − µA(x)
• Intersection: C = A ∩ B ⟺ µC(x) = T(µA(x), µB(x)) = min(µA(x), µB(x))
• Union: C = A ∪ B ⟺ µC(x) = S(µA(x), µB(x)) = max(µA(x), µB(x))
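To make these definitions concrete, here is a minimal Python sketch (not part of the original text) that represents a fuzzy set by its membership values sampled over a discretized domain and implements the operators above in their min/max versions. The triangular membership shape and all numeric values are illustrative assumptions only.

```python
import numpy as np

X = np.linspace(-10, 40, 501)          # sampled domain of discourse

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

mu_A = triangular(X, 0, 10, 20)        # fuzzy set A (illustrative)
mu_B = triangular(X, 15, 25, 35)       # fuzzy set B (illustrative)

complement_A    = 1.0 - mu_A                 # N(x) = 1 - x
intersection_AB = np.minimum(mu_A, mu_B)     # T-norm: minimum
union_AB        = np.maximum(mu_A, mu_B)     # T-conorm: maximum

# A is a subset of B iff mu_A(x) <= mu_B(x) for every x in the domain
is_subset = bool(np.all(mu_A <= mu_B))
```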

(30) Although, in this case, the simplest operators are shown, other negations (N), T-norms (T) or T-conorms (S) can be used to provide different properties. An analysis of different operators is presented in [Als96]. All these operators remain valid for crisp sets, which are a particular case of fuzzy sets.

2.2.2 The components of fuzzy logic

As said before, one of the advantages of fuzzy sets is that they can be used to represent linguistic expressions, since they are able to capture the vagueness and imprecision of words in natural language. The main components of fuzzy logic are presented in the following paragraphs.

Linguistic variable

A linguistic variable is a variable whose values are words or sentences. Each word is called a "linguistic label" and it is represented by means of a fuzzy set defined in the domain of discourse of the variable. Figure 2.2 shows the membership functions associated to the linguistic labels Cold, Warm and Hot. These labels are used to describe the linguistic variable "Temperature of a room".

Fuzzy Relation

The previous operators for complement, union and intersection are operations on sets in the same domain of discourse. However, to build more complex linguistic terms we sometimes need to relate fuzzy sets in different domains of discourse. A relation between two fuzzy sets A and B defined in the domains of discourse U and V respectively is called a "fuzzy relation". In [BK86] there is a review of many different fuzzy relations, but in the following paragraphs we present the most common ones.

(31) Figure 2.2: Linguistic labels to describe the temperature of a room

The relation AND represents the degree to which two elements x and y belong to the fuzzy sets A and B simultaneously. This fuzzy relation is implemented using a T-norm:

µAND(x, y) = T(µA(x), µB(y))

The relation OR expresses the degree to which the element x belongs to the fuzzy set A or y to the fuzzy set B. This fuzzy relation is implemented using a T-conorm:

µOR(x, y) = S(µA(x), µB(y))

Another useful relation for approximate reasoning is the implication, which defines a relation between the antecedent x and the consequent y. In fuzzy logic there are many different ways to define an implication; these are the best known ones:

• Mamdani: J(x, y) = min(x, y)
• Larsen: J(x, y) = x · y
• Reichenbach: J(x, y) = 1 − x + x · y
• Lukasiewicz: J(x, y) = min(1, 1 − x + y)

(32) Fuzzy Inference

In opposition to classical logic, in fuzzy logic the reasoning is not precise but approximate. This means that a conclusion can be inferred even though the fact does not verify the rule completely; the higher the degree of fulfilment of the fact with respect to the rule, the closer this conclusion will be to the formal conclusion. A fuzzy rule is defined as:

r: if X is A then Y is B

where X and Y are linguistic variables with values in the domains of discourse U and V respectively, and A and B are fuzzy sets over U and V respectively. This rule can be expressed by means of any of the fuzzy implications presented previously. Although there are different types of reasoning, in this thesis we only describe the generalization of Modus Ponens because it is the most commonly used. In this schema, a conclusion is obtained from a rule and a fact:

Rule: if X is A then Y is B
Fact: X is A*
Conclusion: Y is B*

where A, B, A* and B* are fuzzy sets. The conclusion B* is a fuzzy set obtained by applying Zadeh's Compositional Rule of Inference (CRI) [FZ91]:

µB*(y) = sup_{x ∈ U} T(µA*(x), J(µA(x), µB(y)))   for all y ∈ V

In the particular case that the fact A* is a numeric observation x0 instead of a fuzzy set, the CRI simplifies to the following equation [MTG03], which is commonly used in fuzzy inference systems:

µB*(y) = J(µA(x0), µB(y))   for all y ∈ V
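The following sketch illustrates the simplified CRI for a numeric observation x0, using the implication operators listed above. The fuzzy sets, the output domain and the observed value are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

# Simplified Compositional Rule of Inference for a numeric fact:
#   mu_B*(y) = J(mu_A(x0), mu_B(y))
implications = {
    "mamdani":     lambda x, y: np.minimum(x, y),
    "larsen":      lambda x, y: x * y,
    "reichenbach": lambda x, y: 1 - x + x * y,
    "lukasiewicz": lambda x, y: np.minimum(1.0, 1 - x + y),
}

def infer(mu_A, mu_B, x0, J):
    """Conclusion B* for the rule 'if X is A then Y is B' and the fact X = x0."""
    return J(mu_A(x0), mu_B)               # mu_B sampled over the Y domain

# Hypothetical sets: A triangular around 20 (degrees), B triangular over [0, 1]
Y = np.linspace(0, 1, 101)
mu_A = lambda x: max(0.0, 1 - abs(x - 20) / 10)
mu_B = np.maximum(1 - np.abs(Y - 0.5) / 0.5, 0)

mu_B_star = infer(mu_A, mu_B, x0=17.0, J=implications["mamdani"])
```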

(33) 2.2.3 Fuzzy inference systems

A fuzzy inference system (FIS) processes inputs to obtain outputs by using fuzzy inference. The main advantage of this system is that the mapping between inputs and outputs is expressed by means of fuzzy rules that can easily be designed by an expert. These systems have been used in many different fields such as data classification, expert systems and especially control applications, where they have been successfully applied [KY95].

Figure 2.3: Fuzzy inference system

Figure 2.3 displays the different parts of these systems:

• Fuzzification. Usually, in control applications the input values are numbers. Therefore, the function of this stage is to convert the input values into fuzzy sets adequate for the inference engine.

• Knowledge base. The expert knowledge for the specific control application can be expressed by means of some rules. These rules have the structure described in the previous section.

• Inference engine. From the rule base and the current inputs, conclusions are inferred using Zadeh's CRI.

(34) • Defuzzification. In a control application, the outputs have to be numbers. However, the outputs of the inference engine are fuzzy sets; hence, it is necessary to convert them.

FIS are classified into two different types: Mamdani and Takagi-Sugeno-Kang. In the following sections, we describe their main features.

Mamdani

In 1974, Mamdani was the first to apply fuzzy logic in a control application, creating a controller for a steam locomotive [Mam74]. The rules of a system proposed by Mamdani have the following form:

Rj: IF X1 is Aj1 ∧ ... ∧ Xn is Ajn THEN Y is Bj

where Xi are the inputs and Y is the output. In rule Rj, Aji are the specific linguistic labels chosen for each input i and Bj is the linguistic label that defines the output. Figure 2.4 shows an example of the inference process in a Mamdani system with two inputs (X1 and X2). In this case, the two fired rules are:

R1: IF X1 is A11 ∧ X2 is A12 THEN Y is B1
R2: IF X1 is A21 ∧ X2 is A22 THEN Y is B2

In figure 2.4, it can be observed that the first step is to obtain the membership degree of the antecedents of each rule. Since the antecedents are linked by the AND operator, implemented here with the minimum function, the result of the evaluation of all antecedents is the minimum of all of them. After that, the simplified CRI is applied to the antecedent and the consequent. In this case, Mamdani's implication is also a minimum, so the result is the membership function of the consequent cut at the degree of membership of the antecedent. However, several rules can contribute to the same linguistic variable; therefore, the overall result is the combination of the results of all these rules.

(35) Figure 2.4: Mamdani's fuzzy inference system

This combination, represented in the figure as a grey shape, is done with the OR operator, which is implemented with a maximum. Finally, the output is a fuzzy set; thus, to obtain a numeric value that summarizes the whole fuzzy set, we need to use a defuzzification method. These are the most common ones:

• First of maxima. This method provides the output value with the highest membership degree. If the fuzzy set of the output is D, defined in the universe Z, the defuzzified value y0 is:

y0 = arg max_{y ∈ Z} µD(y)

• Mean of maxima. Sometimes several points reach the maximum value; this method proposes to calculate their average.

• Centroid. This method, proposed by Sugeno in 1985, is probably the most used, as it is very accurate and provides continuous and smooth variations in the output:

y0 = ∫ y µD(y) dy / ∫ µD(y) dy
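A minimal sketch of a two-rule Mamdani step as described above: AND and the implication as minimum, rule aggregation as maximum, and the centroid approximated over a sampled output domain. The labels and numeric values are hypothetical, and at least one rule is assumed to fire with nonzero strength.

```python
import numpy as np

Y = np.linspace(0, 100, 1001)                  # sampled output domain

def tri(x, a, b, c):
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical consequent labels B1, B2
mu_B1, mu_B2 = tri(Y, 0, 25, 50), tri(Y, 50, 75, 100)

def mamdani(x1, x2, rules):
    """rules: list of ((mu_A1, mu_A2), mu_B) pairs, one per rule."""
    clipped = []
    for (mu_a1, mu_a2), mu_b in rules:
        w = min(mu_a1(x1), mu_a2(x2))          # firing strength (AND = min)
        clipped.append(np.minimum(w, mu_b))    # Mamdani implication = min
    mu_out = np.maximum.reduce(clipped)        # aggregation (OR = max)
    return (Y * mu_out).sum() / mu_out.sum()   # discrete centroid

rules = [
    ((lambda x: tri(x, 0, 5, 10), lambda x: tri(x, 0, 5, 10)), mu_B1),
    ((lambda x: tri(x, 5, 10, 15), lambda x: tri(x, 5, 10, 15)), mu_B2),
]
y0 = mamdani(7.0, 8.0, rules)
```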

(36) Takagi-Sugeno-Kang

Takagi, Sugeno and Kang (TSK) [TS85] proposed a fuzzy inference system based on rules where the antecedent contains linguistic variables but the consequent is represented by a function of the input variables. These functions are usually first-order or zero-order polynomials. This type of rule has the following form:

Rj: IF X1 is Aj1 ∧ ... ∧ Xn is Ajn THEN Yj = pj1 · X1 + ... + pjn · Xn + pj0

where Xi are the input variables, Yj is the output variable and pji are the coefficients of the polynomial. In these systems, the output of each rule is a number instead of a fuzzy set, so the defuzzification stage is not needed. The output obtained from n rules of the knowledge base is calculated as the weighted average of the individual outputs of each rule Rj:

Y = (Σ_{j=1}^{n} wj · Yj) / (Σ_{j=1}^{n} wj)

where wj = T(Aj1(x1), ..., Ajn(xn)) represents the matching between the antecedents of rule Rj and the current inputs. Although T could be any T-norm, minimum and product are the most commonly used. In order to compare this method with Mamdani's, we present the same previous example using a first-order TSK fuzzy inference system whose if-then rules are the following:

R1: IF X1 is A11 ∧ X2 is A12 THEN Y1 = p11 · X1 + p12 · X2 + p10
R2: IF X1 is A21 ∧ X2 is A22 THEN Y2 = p21 · X1 + p22 · X2 + p20

Figure 2.5 shows an example of inference using the TSK FIS that models these rules.
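A sketch of one first-order TSK inference step, assuming the product T-norm for the matching degree. The membership functions and rule coefficients below are illustrative, not taken from the example in the figure.

```python
# First-order TSK inference: firing strengths from the antecedents
# (product T-norm), linear consequents, combined by a weighted average.

def tri(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Each rule: (membership functions for X1 and X2, coefficients (p1, p2, p0))
rules = [
    ((lambda x: tri(x, 0, 5, 10), lambda x: tri(x, 0, 5, 10)), (1.0, 2.0, 0.5)),
    ((lambda x: tri(x, 5, 10, 15), lambda x: tri(x, 5, 10, 15)), (0.5, -1.0, 3.0)),
]

def tsk(x1, x2, rules):
    ws, ys = [], []
    for (mu1, mu2), (p1, p2, p0) in rules:
        ws.append(mu1(x1) * mu2(x2))           # matching degree w_j
        ys.append(p1 * x1 + p2 * x2 + p0)      # first-order consequent Y_j
    total = sum(ws)
    return sum(w * y for w, y in zip(ws, ys)) / total if total else 0.0

y = tsk(7.0, 8.0, rules)                       # no defuzzification needed
```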

(37) Figure 2.5: Example of TSK's fuzzy inference system

2.3 Artificial neural networks

Every biological neuron is composed of the cell body (soma) and of the axon and dendrites, which allow the connections between neurons. Brains are composed of millions of interconnected neurons, which is what provides their powerful operation. Artificial neural networks (ANN) are a bio-inspired technique that tries to imitate the operation of biological neurons in order to attain their advantages.

2.3.1 Fundamentals

ANN are likewise composed of basic components called neurons. The operation of each neuron is simple: it receives inputs from other neurons, applies weights to them and calculates an output value. This output value is then transmitted to other neurons, where it is used as an input. Figure 2.6 shows the typical model of an artificial neuron. The inputs Y1..N to the neuron are the outputs of other neurons multiplied by weights W1..N. The internal activity U of a neuron is calculated as the sum of all the weighted inputs and a bias θ:

U = Σ_{i=1}^{N} Wi · Yi + θ

(38) Figure 2.6: Model of artificial neural network

This sum is then evaluated through an activation function φ to calculate the output of the neuron:

Output = φ(U)

The most common φ functions are non-linear, e.g., the threshold function, the piecewise-linear function and the sigmoid. The reason for introducing a non-linear function is that the relation between inputs and outputs may be non-linear, and this is impossible to produce using only linear functions.

These basic elements can be connected following different architectures, as can be seen in figure 2.7. Although in fully connected networks each neuron is connected to every other neuron, usually the neurons are structured in layers, which constrain their possible connections. Depending on their position in the network, three types of layers can be distinguished:

• Input layer. In this layer the neurons receive the inputs of the system and introduce this information into the net.

• Hidden layer. The neurons belonging to this layer are internal and are not connected directly to the exterior. Depending on the architecture of the network, the number of hidden layers varies (it can even be zero).

(39) • Output layer. These neurons transfer the information produced in the net to the exterior, so they provide the outputs of the system.

Figure 2.7: Examples of ANN architectures

When the outputs of the neurons of a layer are only connected to neurons of subsequent layers, the network is described as a feed-forward neural network. When an output of a neuron is connected to a neuron in a preceding layer, or even to itself, the network is described as a recurrent neural network.

On the one hand, the main advantage of using neural networks is that they provide powerful learning methods. Learning in an ANN consists of the modification of the behaviour of the network with regard to external inputs using several training instances. In an ANN, the knowledge is stored in the bias of each neuron and in the weights associated to the connections between neurons; therefore, every learning process implies some modifications of the weights of the neurons. On the other hand, the main drawback of this method is that this knowledge cannot be understood easily by an expert.
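A sketch of the neuron model above and of a small fully connected feed-forward pass built from it, assuming a sigmoid activation. The 2-3-1 architecture and the random weights are illustrative assumptions only; learning is not shown.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def neuron(inputs, weights, bias):
    u = np.dot(weights, inputs) + bias     # U = sum(W_i * Y_i) + theta
    return sigmoid(u)                      # Output = phi(U)

def feedforward_layer(inputs, W, biases):
    """One layer: each row of W holds the weights of one neuron."""
    return sigmoid(W @ inputs + biases)

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron
x = np.array([0.5, -1.2])
W_hidden, b_hidden = np.random.randn(3, 2), np.random.randn(3)
W_out, b_out = np.random.randn(1, 3), np.random.randn(1)

hidden = feedforward_layer(x, W_hidden, b_hidden)
output = feedforward_layer(hidden, W_out, b_out)
```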

(40) 2.4 Evolutionary Computation

Darwin's theory of evolution puts forward that natural species evolve to adapt themselves to their environment. The better adapted individuals have a higher probability of surviving; therefore, these individuals also have a higher probability of reproducing and propagating their genes to the next generation. Furthermore, the combination of the features of two well-adapted parents can produce a new individual that is better adapted to the environment than either of them. Evolution is the result of two primary processes: natural selection and sexual reproduction. The first decides which members of the population will survive, and the second guarantees the combination of the genes in the descendants.

The first works in Evolutionary Computation (EC) date from the fifties and sixties. The basic idea consists of developing algorithms inspired by the processes of natural evolution to solve problems of learning, searching and optimization. In 1975, J.H. Holland introduced the Genetic Algorithm (GA) [Hol92], which is one of the most used techniques in EC. The GA is a search technique used in computing to find exact or approximate solutions to optimization and search problems. This technique works with a population of individuals, each of which represents a possible solution to the given problem. These solutions compete to become the best solution to the problem. The best adapted individuals are more likely to reproduce and, hence, to cross their chromosomes, producing descendants with features of both parents. The worst individuals, however, have a lower probability of reproducing, so they will disappear along with their features. Therefore the environment, which in this case consists only of the other individuals, produces a selective pressure under which only the best adapted (those that best solve the problem) survive.

(41) 2.4.1 GA concepts

Since the GA is based on the evolution of living organisms, most of its concepts come from genetics and biology:

• Individual. Every individual is a string, the counterpart of a chromosome in nature. This string represents the different parameters (genes) of a possible solution to the given problem. Each gene is encoded using an alphabet, e.g., binary. Thus, the algorithm works with the coding of the parameters instead of the parameters themselves.

• Population. A set of different individuals is a population. GA techniques consist of making this population evolve successively. The size of this population should be big enough to guarantee that the GA will cover all the zones of the search space.

• Generation. The specific population of an iteration of the GA.

• Fitness function. This function reflects the suitability of an individual for the specified problem, and it is used to assign a score to every individual. In the case of a maximization problem, the individuals with high values of the fitness function will be more likely to survive. The choice of this function is usually not an easy task, as it depends completely on the problem to be solved.

2.4.2 GA Operators

These are the different operations that can be applied to the population in order to obtain a new population. After the evaluation of every individual using the fitness function, the following operators are applied sequentially.

Selection

The first step is to choose the individuals of the population that are going to be used for reproduction.

(42) The goal of this operator is to give the best individuals of the population more chances to reproduce. These are the most used operators:

• Roulette. With this operator, the chance of an individual being selected is proportional to its fitness. The probability of choosing an individual is the quotient between its own fitness and the sum of the fitness of all individuals:

SelectionProbability_i = Fitness_i / Σ_{j=1}^{N} Fitness_j

This method is called "roulette" because it can be understood as if the individuals were chosen by spinning a roulette wheel divided into portions whose areas are proportional to the selection probabilities. The main drawback of this method occurs when the population is not uniform and one individual presents a much higher quality than the others. In that case, this individual will be chosen most of the time, which produces a loss of genetic diversity.

• Tournament. This operator uses roulette selection to produce a tournament subset of individuals. Then, the best individual in this subset is selected. The length of this subset is called the tournament window, and it affects the behaviour of the algorithm. If this parameter is quite high, the performance will be similar to the roulette operator; if it is low, the algorithm will evolve slowly. In opposition to the previous operator, this one allows the diversity to be maintained.

In addition to these selection operators, there is another method called "elitism", which allows the best individuals to be copied directly to the next generation without suffering any change (neither crossover nor mutation). The main advantage of this technique is that the next generation will be at least as good as the previous one.
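A sketch of roulette-wheel selection implementing the probability above, assuming non-negative fitness values (i.e., a maximization problem); the cumulative-sum spin is one common way to realize it.

```python
import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)                 # assumes total > 0
    r = random.uniform(0, total)           # spin the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if r <= cumulative:
            return individual
    return population[-1]                  # guard against rounding error
```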

(43) Crossover

The goal of this operator is to combine the individuals chosen in the previous stage in order to generate the descendants. The crossover operator is applied to a percentage of the population defined by the crossover percentage parameter PC. Firstly, two individuals are randomly selected from the previously selected population to form a couple. Then, a crossover point is chosen randomly to combine the information of both individuals. The descendants of these individuals contain information from both parents: from the beginning to the crossover point, the information comes from one of the parents, and from the crossover point to the end, from the other parent, as can be seen in figure 2.8.

Figure 2.8: Example of crossover and mutation operators

There are different variations of this operator, depending on the number of crossover points and other issues, but this is the basic one.

Mutation

The mutation operator consists of the random alteration of the genes of an individual with a mutation probability PM, as can be seen in figure 2.8. The goal of mutation is to produce diversity in the population. This operator produces small variations in the population, which guarantees that every point in the search space can be reached. However, the probability PM must be small so as not to turn the genetic search into a random search.

(44) 2.4.3 Genetic Algorithm

The flow diagram in figure 2.9 illustrates the different phases of the execution of a GA.

Figure 2.9: Flow diagram of a Genetic Algorithm

This process is executed until a stop criterion is fulfilled. The typical stop criterion is a maximum number of generations, though other stop criteria can be used, e.g., the suitability of the current solution or the convergence of the algorithm. This last one can be detected when the solution does not improve after several generations.
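Putting the operators together, here is a minimal generational GA sketch over binary strings: fitness-proportional (roulette) selection, one-point crossover with probability PC, bitwise mutation with probability PM, and elitism. The "one-max" fitness and all parameter values are illustrative assumptions, and a fixed number of generations stands in for the stop criterion.

```python
import random

def crossover(a, b):
    point = random.randrange(1, len(a))            # one crossover point
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(ind, pm):
    return [bit ^ 1 if random.random() < pm else bit for bit in ind]

def genetic_algorithm(fitness, n=50, length=20, pc=0.8, pm=0.01, gens=100):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(gens):                          # stop: max generations
        scores = [fitness(ind) for ind in pop]     # assumes some score > 0
        new_pop = [max(pop, key=fitness)]          # elitism: keep the best
        while len(new_pop) < n:
            # roulette selection: probability proportional to fitness
            p1, p2 = random.choices(pop, weights=scores, k=2)
            c1, c2 = crossover(p1, p2) if random.random() < pc else (p1, p2)
            new_pop += [mutate(c1, pm), mutate(c2, pm)]
        pop = new_pop[:n]
    return max(pop, key=fitness)

# Hypothetical fitness: "one-max", the number of 1 bits in the string
best = genetic_algorithm(fitness=lambda ind: sum(ind))
```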

(45) 2.5 Hybrid methods

Sometimes, several Soft Computing techniques can be combined to solve a specific problem. In this manner, some of the drawbacks that the techniques present separately can be overcome. These techniques, which mix concepts from different fields, are called "hybrid techniques"; they were represented in figure 2.1 as the intersections of the main fields. Some examples of these hybrid techniques are:

• Neuro-fuzzy systems. Fuzzy systems that use a learning algorithm derived from neural network theory to determine their parameters (linguistic labels and rules).

• Fuzzy neural networks. Neural networks that use fuzzy operators and/or fuzzify their inputs, outputs and connections.

• Genetic neural networks. Neural networks that use a genetic algorithm for training. They are used when the search space is wide.

• Genetic fuzzy systems. Fuzzy systems that use a genetic algorithm to determine the system parameters.

• Other combinations.

As an example of a hybrid method, the "Adaptive-Network-Based Fuzzy Inference System" (ANFIS) [SJ93] is described in detail. This neuro-fuzzy system represents its knowledge using several rules expressed in fuzzy logic, but it also has the capability of learning because it is implemented using neural networks. This method, which will be used in some experiments of this thesis, can generate understandable models automatically from several training instances.

2.5.1 ANFIS

In particular, ANFIS is an adaptive neural network implementation of a TSK fuzzy inference system. An adaptive network is a multilayer feed-forward neural network with supervised learning capability. Each node of this network performs a particular function on the incoming signals. These nodes can be fixed or adaptive: the adaptive ones have parameters that can be trained, while the fixed ones cannot. In order to show the equivalence between the ANFIS structure and a first-order TSK fuzzy model, we show how the example of a fuzzy inference system with two inputs represented in figure 2.5 can be converted into the adaptive network depicted in figure 2.10. In this diagram, the circles are fixed nodes, the squares are adaptive nodes, and the links between the nodes represent the flow of signals in the system.

Figure 2.10: Architecture of the ANFIS

The function of the nodes of each layer is described below:

• Layer 1: This first layer is the fuzzification layer. Every node provides the membership degree of the input value to a specific linguistic label.

These nodes are adaptive, as the parameters that define the linguistic labels can be learned during the training stage. Although triangular labels were used in figure 2.5, these linguistic labels are usually bell-shaped, defined by the following equation:

$$\mu_{A_{ij}}(X_j) = \frac{1}{1 + \left( \left( \frac{X_j - c_{ij}}{a_{ij}} \right)^{2} \right)^{b_{ij}}} \qquad (2.1)$$

The parameters $a_{ij}$, $b_{ij}$ and $c_{ij}$ are called premise parameters.

• Layer 2: The nodes of this layer multiply the signals coming from layer 1. Each node of this layer implements the evaluation of the antecedent of a rule; therefore, its result can be understood as the firing strength of a rule whose premises are the incoming signals. Although the product is the usual t-norm operator, other operators can be used.

• Layer 3: The function of this layer is to normalize the firing strengths. For this, each node calculates the ratio between the firing strength of one node of the previous layer and the sum of all firing strengths. In our example, the normalized firing strength of nodes 1 and 2 is computed as:

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \qquad (2.2)$$

• Layer 4: Every node $i$ of this layer is an adaptive node that calculates the product of the normalized firing strength coming from the previous layer and the evaluation of a first-order function $f_i$ over the inputs $X_1$ and $X_2$. The output of this node is calculated with the following equation:

$$O_i = \bar{w}_i f_i = \bar{w}_i (p_{i1} X_1 + p_{i2} X_2 + p_{i0}) \qquad (2.3)$$

These are adaptive nodes because the parameters $p_{i0}$, $p_{i1}$ and $p_{i2}$ are learned during the training stage. These parameters are called "consequence parameters".

• Layer 5: Lastly, all the outputs of the previous layer are added to obtain the overall result:

$$z = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{\sum_{i=1}^{2} w_i} \qquad (2.4)$$

As can be seen, the overall result is the same as the one obtained with the Sugeno fuzzy inference system (see figure 2.5).

Lastly, to learn the premise and consequence parameters, the ANFIS can be trained with a hybrid learning algorithm that integrates back-propagation and least-squares estimation. This algorithm consists of two passes, which are repeated several times: a forward pass that uses least-squares estimation to identify the consequent parameters while the premise parameters are kept fixed, followed by a backward pass based on gradient descent that fixes the consequent parameters and propagates the error to update the premise parameters. The forward computation through the five layers is sketched below.
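To make the layer-by-layer computation concrete, the following sketch (illustrative Python, not the thesis implementation; it hard-codes the two-rule example above and assumes the premise and consequence parameters are given) evaluates the ANFIS forward pass:

```python
import numpy as np

def bell(x, a, b, c):
    """Generalized bell membership function, eq. (2.1)."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def anfis_forward(x1, x2, premise, consequence):
    """Forward pass of a two-input, two-rule, first-order TSK ANFIS.

    premise:     four (a, b, c) tuples for labels A1, A2 (input X1)
                 and B1, B2 (input X2)
    consequence: one (p1, p2, p0) tuple per rule
    """
    a1, a2, b1, b2 = premise

    # Layer 1: fuzzification -- membership degree of each input
    mu = [bell(x1, *a1), bell(x1, *a2), bell(x2, *b1), bell(x2, *b2)]

    # Layer 2: rule firing strengths (product t-norm);
    # rule 1 combines A1 with B1, rule 2 combines A2 with B2
    w = np.array([mu[0] * mu[2], mu[1] * mu[3]])

    # Layer 3: normalized firing strengths, eq. (2.2)
    w_bar = w / w.sum()

    # Layer 4: weighted first-order consequents, eq. (2.3)
    f = np.array([p1 * x1 + p2 * x2 + p0 for (p1, p2, p0) in consequence])

    # Layer 5: overall output, eq. (2.4)
    return float(np.dot(w_bar, f))
```

The hybrid least-squares/gradient-descent training described above is omitted here: it would iteratively adjust the (a, b, c) and (p1, p2, p0) parameters to fit the training data.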

Chapter 3

STATE OF THE ART

The work presented in this thesis belongs to the field of pattern recognition. Its scope can be restricted to the subfield called "temporal pattern recognition" (or time series classification), whose main objective is to classify instances with temporal extension into a set of predefined pattern classes. The techniques studied in this field have been used in many diverse applications, e.g., speech recognition [BAM08], on-line handwriting recognition [HBT96], sign language recognition [JH01], stock market analysis [SWI98] or even gene expression data analysis [DK05]. In particular, they have been widely used in medical data analysis [BLP07], either to monitor the activity of a patient [MBC01] or to analyse well-known biological signals such as the electrocardiogram (ECG), for detecting anomalies in the heart beat [TBR03]; the electroencephalogram (EEG), for evaluating the physiological state of the brain [OGNP01]; or the electromyogram (EMG), for testing the electrical activity of muscles [TFKI00]. Currently, thanks to a new generation of sensors able to capture body movement, these techniques are also applied in the entertainment domain to recognize body movements [KPE+06] and especially gestures performed with the hands [KLJ04].

3.1 Basic concepts

In this section, the definitions of the main concepts of this field are included.

Definition 1. A time series T is a sequence of data points of an n-dimensional space taken at successive instants of time (usually at regular intervals):

$$T = \{t_1, \cdots, t_M\}$$

where M is the length of the time series.

Definition 2. A subsequence S of the time series T is a contiguous set of points of this sequence:

$$S = t_a, \cdots, t_b \quad \text{such that} \quad 1 \le a < b \le M$$

In most time series, there are "similar" subsequences that appear frequently along the given time series.

Definition 3. A temporal pattern is the common behaviour of these "similar" subsequences.

3.2 General schema of temporal pattern recognition methods

As can be seen in figure 3.1, temporal pattern recognition methods can be divided into several stages:

• Sensing. The sensors capture the information from the studied environment.

• Preprocessing. This stage tries to remove the noise usually present in real-world signals.

• Segmentation. The captured signal contains not only temporal patterns but also stretches which are not interesting to classify. The segmentation process consists of searching for sequences in the signal which are likely to contain a temporal pattern.

• Classification. This stage classifies a given signal into one of the available temporal patterns.

Figure 3.1: Stages of temporal pattern recognition methods

Sometimes the segmentation and classification stages are performed simultaneously. In this case, the task of extracting meaningful segments from input signals and recognizing them is called "pattern spotting" [Lee98]. This interesting feature makes it possible to avoid an explicit segmentation stage, which could accept false instances or reject good ones. Depending on whether or not the method can deal with a continuous flow of the signal, recognition techniques can be divided into on-line and off-line recognition methods, respectively. Off-line methods are always applied over previously captured data and can directly access any part of the signal. On-line methods, however, only have a local view of the signal and cannot access its future values. In particular, if these methods have a low computational cost and can be executed in real time, they are called real-time recognition methods. The sliding-window sketch below illustrates this local view.
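As a simple illustration of this local view (our own sketch, not part of the thesis; names and the step parameter are assumptions), an on-line method can be fed through a sliding window over the incoming samples:

```python
from collections import deque

def sliding_windows(stream, length, step=1):
    """Yield fixed-length windows over a (possibly endless) sample stream.

    An on-line recognizer sees only the current window: older samples
    are forgotten and future samples are not yet available.
    """
    window = deque(maxlen=length)
    since_last = 0
    for sample in stream:
        window.append(sample)
        since_last += 1
        if len(window) == length and since_last >= step:
            since_last = 0
            yield tuple(window)
```

A real-time recognizer would then classify each window as soon as it is yielded, which bounds both the memory and the per-sample computation.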

3.3 Classification of temporal pattern recognition methods

Although there are different ways of classifying the existing techniques for temporal pattern recognition, in this thesis we adopt the classification presented in [JDM00], which distinguishes four different approaches:

Statistical approach. Some global features, which summarize the behaviour of the whole sequence, are extracted from each sequence. The classification is performed by taking into account only these features.

Template matching. The classification is based on the comparison of a given instance with all the training instances, without any feature extraction. The training instance most similar to the given instance indicates its class. This similarity is calculated using a distance measure.

Structural approach. These methods assume that the temporal pattern presents an extractable structure. The classification is based on the similarity of the structures.

Connectionist approach. Artificial neural networks are commonly used in temporal pattern recognition since they present a high ability to learn complex patterns from training instances.

This division is only orientative, because some techniques can be included in several approaches at the same time depending on their interpretation. Furthermore, there are some hybrid techniques which combine characteristics of different approaches, so they cannot be classified into only one specific approach. In the particular case of Hidden Markov Models, the authors of [JDM00] placed this method at the frontier of pattern recognition, as it cannot be clearly assigned to any single approach. However, it is one of the most used methods in temporal pattern recognition; consequently, it will be presented in depth in a

section of this chapter named "Sequential Pattern Recognition", after the other approaches have been explained.

3.4 Statistical approach

In this approach, every signal is described using a set of D features that capture some of its global characteristics. These D features can be represented in a space of D dimensions; therefore, each signal can be viewed as a point in this feature space. The goal of this method is to obtain, from several training instances, the decision boundaries over the feature space that allow assigning a given instance to the correct class. The typical schema of a pattern recognition method based on the statistical approach is shown in figure 3.2 (extracted from [JDM00]). This schema is divided into two parts in order to distinguish the training stage from the classification stage.

Figure 3.2: General schema of a statistical pattern recognizer

The function of the "Preprocessing Segmentation" block is to reduce the noise in the signal and/or normalize it. In case the signal is not segmented, this block extracts the patterns of interest from the whole sequence. This approach can also work without segmenting the signal, but in that case the features are calculated over a sliding window of a given length, which is scrolled along the whole signal. From several training instances of every pattern, the feature selection block

aims to determine features which make it possible for every pattern class to be separated in the feature space. Usually, statistical features, e.g., mean and standard deviation, are used in time series as they describe the global behaviour of the pattern. Other features based on frequency, like the coefficients of the Fourier transform or the Discrete Time Wavelet Transform, provide information about the different frequencies present in the pattern. Other works propose more complex features like the frequency-domain entropy [BI04] or the Doppler spectrum [FK04]. In the classification stage, the "Feature Measurement" block obtains from the testing instances the features that were chosen during the feature selection phase. The learning phase consists of creating a classifier which allows deciding the class of a given sequence based on the set of selected features. There are many techniques available to create these classifiers, and a detailed explanation of them can be found in [DHS01]. In [JDM00] the authors classify these techniques into the following approaches:

• Similarity approach. This method assumes that similar signals are assigned to the same class. The similarity is measured in the feature space using a metric like the Euclidean distance. The simplest method designed under this approach is the nearest neighbour rule (1-NN), which assigns a testing instance to the class of the closest training instance.

• Probabilistic approach. The classification process is based on the optimal Bayes decision rule: an instance is assigned to the class which presents the maximum posterior probability. Depending on the estimation of the probability density functions, we can distinguish two different approaches:

– Parametric. The distribution of the features of every class can be modelled using a statistical model. Therefore, each class can be characterized by the parameters of the chosen model, and the learning problem is reduced to finding these parameters. One of the most

common models used is the multivariate Gaussian distribution.

– Non-parametric. The distribution of the features is not adjusted to any model. The typical methods used in this case are k-nearest neighbours (k-NN), Parzen windows [JR88], etc.

• Discriminant function approach. Every class has an associated function, and a pattern is assigned to the class whose function obtains the highest value. The methods included in this approach range from linear classifiers (e.g., the single-layer Perceptron [Ros58]) to non-linear classifiers (e.g., radial basis function networks [PS91]).

• Examples of other approaches:

– Decision tree. Each node of the tree represents a decision over the features, and the leaves are the different pattern classes. During testing, the tree is traversed according to the decision taken at each node until a leaf is reached. Examples of decision trees are CART [CGM98], C4.5 [Qui93], etc.

– Fuzzy expert systems. An expert system based on IF-THEN rules can be used to classify patterns. The antecedent of each rule represents some conditions over the features and the consequence is the pattern class [LL04].

Although the statistical approach has been widely used in many applications, it has a limitation: it is not scalable, because including new patterns usually requires retraining the entire system. For example, the features chosen to discriminate among the existing pattern classes may not be sufficient to distinguish a new class. A minimal sketch of the feature-based pipeline is given below.
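As a minimal illustration of this approach (our own sketch; the choice of features and of a 1-NN classifier are assumptions made for the example), each sequence is mapped to a small feature vector and classified by the nearest labelled point in feature space:

```python
import numpy as np

def extract_features(sequence):
    """Map a 1-D sequence to a vector of global features."""
    x = np.asarray(sequence, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    return np.array([x.mean(),             # global level
                     x.std(),              # global variability
                     spectrum[1:].max()])  # dominant non-DC frequency peak

def classify_1nn(sequence, train_sequences, train_labels):
    """Assign the class of the closest training point in feature space."""
    point = extract_features(sequence)
    train_points = [extract_features(s) for s in train_sequences]
    distances = [np.linalg.norm(point - p) for p in train_points]
    return train_labels[int(np.argmin(distances))]
```

In a real system the training features would be computed once and stored, and the feature set would be validated so that the classes are actually separable in the resulting space.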

3.5 Template matching

As can be observed in figure 3.3, in the template matching approach there is no feature extraction stage, as the classification is based on the comparison of the sequences themselves. A given instance is compared with all the training instances and, after that, it is assigned to a class. Therefore, the learning phase of this method is simple, as it only consists of storing the given training sequences with their respective classes. This type of learning, where the generalization is delayed until the classification stage, is called "lazy learning". Its main drawbacks are a slower classification process and the need to store the whole training data.

Figure 3.3: General schema of template matching

This approach assumes that the sequences belonging to the same class have a similar shape. The similarity between two different sequences P and Q is measured with a distance, which is calculated over the extension of the sequences. The following distances are the most used in template matching:

• The mean absolute distance (MAD):

$$MAD(P, Q) = \frac{1}{N} \sum_{i=1}^{N} |P_i - Q_i|$$

• The root mean square distance (RMSD):

$$RMSD(P, Q) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (P_i - Q_i)^2}$$

However, these distances can be applied only when the two sequences have the same length. If the sequences have different lengths, they must be rescaled to a common length using some interpolation technique. Another problem arises when two sequences with a similar shape have different amplitudes or offsets: the measured distance is then high even though both sequences have a similar shape. The solution to this problem is to apply normalization techniques that eliminate these differences. These preprocessing steps must be performed in the same way for the training instances and the testing instances. The most typical method for the classification process is k-nearest neighbour (k-NN). In this method, the distances between a given sequence and all the training sequences of all classes are calculated. Then the k nearest neighbours are selected, and the most repeated class among these neighbours indicates the class of the given sequence. When k = 1, this method reduces to a nearest neighbour classifier, which assigns the given instance to the class of its closest neighbour. This approach can also be used for pattern spotting by means of a window with the same length as the training instances: the window is scrolled over the entire time series to calculate the distance between the sequence contained in the window and the training instances. The main drawback of this approach is its high computational cost, since one comparison with all training instances is performed for each sample of the time series. However, this cost can be reduced using the early-abandon technique, that is, stopping the calculation as soon as the distance exceeds a fixed threshold, as sketched below.
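The following sketch (our own illustration; z-score normalization and the default threshold are assumptions, and all sequences are assumed to have the same length) combines normalization, the MAD distance with early abandon, and a 1-NN template matcher:

```python
import numpy as np

def znormalize(sequence):
    """Remove offset and amplitude differences before comparison."""
    x = np.asarray(sequence, dtype=float)
    std = x.std()
    return (x - x.mean()) / (std if std > 0 else 1.0)

def mad_early_abandon(p, q, threshold):
    """Mean absolute distance, abandoned as soon as the running sum
    guarantees that the final MAD will exceed the threshold."""
    n = len(p)
    partial = 0.0
    for pi, qi in zip(p, q):
        partial += abs(pi - qi)
        if partial > threshold * n:    # early abandon
            return float('inf')
    return partial / n

def nearest_template(query, templates, labels, threshold=1.0):
    """1-NN template matching over z-normalized sequences."""
    query = znormalize(query)
    best_label, best_dist = None, float('inf')
    for template, label in zip(templates, labels):
        bound = min(threshold, best_dist)  # tighten as better matches appear
        d = mad_early_abandon(query, znormalize(template), bound)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```

For pattern spotting, nearest_template would simply be applied to each window produced by a sliding window of the template length; windows whose best distance stays above the threshold return None and are rejected.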
