Sequential detection of artifacts in electroencephalographic signals using nonparametric statistical methods


Sequential Detection of Artifacts in Electroencephalographic Signals using Nonparametric Statistical Methods

Carlos Alejandro Robles Rubio

Division of Mechatronics and Information Technology
Instituto Tecnológico y de Estudios Superiores de Monterrey
Monterrey, Nuevo León, México

November 2008

A thesis submitted to the Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Monterrey, in partial fulfillment of the requirements for the degree of Master of Science in Electronic Engineering (Electronic Systems).

© 2008 Carlos Alejandro Robles Rubio

Instituto Tecnológico y de Estudios Superiores de Monterrey
Campus Monterrey
Graduate Program in Engineering
Division of Mechatronics and Information Technology

Sequential Detection of Artifacts in Electroencephalographic Signals using Nonparametric Statistical Methods
by Carlos Alejandro Robles Rubio

Dr. Frantz Bouchereau Lara, Thesis Advisor
Dr. Sergio Omar Martínez Chapa, Thesis Co-advisor
Dr. Graciano Dieck Assad, Synodal
Dr. Joaquín Acevedo M., Director of the Graduate Program

Date: November 15, 2008

The members of the thesis committee hereby approve the thesis of Carlos Alejandro Robles Rubio as a partial fulfillment of the requirements for the degree of Master of Science in Electronic Engineering (Electronic Systems).


Abstract

The main objective of this work is to develop and test an efficient algorithm for the sequential detection of artifacts in signals from an ambulatory system for the measurement and processing of EEG, considering that no previous knowledge of the artifact statistics is available. First, relevant EEG theory is presented, followed by the theoretical background. Then a novel nonparametric detector, the Central Chi-Square Detection Algorithm (CCDA), is introduced. Its performance is evaluated through simulations, and the results and conclusions demonstrate the flexibility and power of the detector. Four types of simulated artifacts are used, and both AR signals and true EEG are fed as input to the algorithm. It is shown that CCDA needs 50 s of learning samples to deliver optimum results for an AR input, and 0.25 s when true EEG signals sampled at 1 kHz are evaluated. Finally, online implementation results are presented for a visual assessment of the capabilities of CCDA when the model order M is varied and an appropriate value is set for the probability of false alarm P_FA. A resource-efficient recursive implementation for the sequential calculation of statistical moments is also presented, intended for future applications constrained in energy or memory.


Resumen

The main objective of this work is to develop and verify an efficient algorithm for the sequential detection of artifacts in signals from an ambulatory system for the measurement and processing of EEG, under the premise that no prior knowledge of the statistical characteristics of the artifacts is available. First, basic EEG theory is reviewed, followed by the theoretical framework. Next, a novel detection method based on nonparametric statistical methods is described, called the Central Chi-Square Detection Algorithm (CCDA). Its performance is then evaluated through simulations. Finally, the conclusions demonstrate the flexibility and power of the novel detector. It is shown that CCDA needs 50 s of learning samples to deliver optimum results for an AR input, and 0.25 s when true EEG signals sampled at 1 kHz are evaluated. Online implementation results are also presented for a visual review of the capabilities of CCDA when the model order M is varied and an appropriate value of the probability of false alarm P_FA is selected. A recursive implementation for the sequential calculation of statistical moments is described, which reduces the use of computational resources and can be considered for later applications with strong limitations on energy and/or memory use.


Acknowledgments

I would like to thank the Instituto Tecnológico y de Estudios Superiores de Monterrey for the institutional and financial support during my master's degree studies. Similarly, I thank the BioMEMS research group and its coordinator, Dr. Sergio Omar Martínez Chapa, for the financial support offered and the opportunity to collaborate in the electroencephalography research project. I also thank Dr. Martínez, who acted as my thesis co-advisor, for his comments and reviews in the final stage of this work.

I also want to thank my thesis advisor, Dr. Frantz Bouchereau Lara, for all his comments, reviews and guidance through the inception and development of this research work, and the synodal of the thesis committee, Dr. Graciano Dieck Assad, for his review of and remarks on this document.

I am truly grateful to my parents Carlos Salvador and Susana and my sister Paulina for their love, affection and support during all the steps in my life, which led me to the culmination of this degree.

I am thankful to M.C. Ana Cecilia Puón Díaz, M.C. Victor Hugo Pérez González and M.C. Juan Alberto González Lugo for their friendship, help and collaboration during the development of the novel methods described in this thesis. I especially thank Ana Cecilia for her support, affection and love, which helped me during this period of studies.

I would like to express my appreciation to my friends and colleagues Alejandro, Antonio, Carlos, Carolina, Christian, Deneb, Edgardo, Enrique, Ernesto, Héctor, Igmar, Jorge, José Luis, Julián, Liliana, Luis, Lyz, Manuel, Marco, Miguel, Omar, Ramón, Raúl, Rodolfo, Rubén, Sandra, Stephanie and Zu-Lym for their support, friendship and company during my master's degree studies.

Carlos Alejandro Robles Rubio
December, 2008


Contents

1 Introduction
  1.1 Problem Description
  1.2 Objective
  1.3 Justification
  1.4 Contribution
  1.5 Thesis Organization
2 EEG Background
  2.1 Electroencephalogram Basics
    2.1.1 Electroencephalogram Generation
    2.1.2 Brain Rhythms and Abnormal Epileptic EEG Patterns
    2.1.3 EEG Measurement
  2.2 Artifacts
    2.2.1 Previous work with artifacts in EEG
3 Theoretical Background
  3.1 Autoregressive Random Process Model
  3.2 Bootstrap Resampling Method
    3.2.1 Bootstrap for Dependent Data
  3.3 Detection Theory
    3.3.1 Nonparametric Tests
4 The Central Chi-Square Detection Algorithm applied to artifacts in EEG
  4.1 Adaptive Threshold
    4.1.1 Ideal threshold within a segment
    4.1.2 AR forward prediction pseudo-samples
    4.1.3 Bootstrap resamples
  4.2 Block Data Sequential Detection Algorithm
  4.3 Recursive Sequential Detection Algorithm
5 Simulation Results for Performance
  5.1 Test Cases
    5.1.1 EEG Signals
    5.1.2 Artifacts
  5.2 Simulation Results
    5.2.1 Selection of the Best Set of Moments
    5.2.2 Receiver Operating Characteristics
    5.2.3 Adaptive threshold estimation
    5.2.4 Online implementation
6 Conclusion
  6.1 Conclusion
  6.2 Future Research
    6.2.1 Adaptive Sequential CCDA
    6.2.2 Spectral Analysis with CCDA
    6.2.3 Optimum Best Set of Moments and Model Order
    6.2.4 Work on models for EEG signals
    6.2.5 A Full Nonparametric Approach
    6.2.6 The Non-Central Chi-Square Detection Algorithm
A Analysis and comparison of computational cost among CCDA strategies
B Impact of AR process input correlation in the selection of moments
References

List of Figures

2.1 Action potential general schema (adopted from S. Sanei [1])
2.2 Comparison of the waveforms of the typical brain rhythms (adopted from [1])
2.3 Example of a multichannel EEG with the occurrence of a tonic-clonic (grand mal) seizure (adopted from [1])
2.4 The 10-20 standard for electrode location in EEG
2.5 Example of a multichannel EEG with the occurrence of an OA (adopted from [1])
5.1 Example of a simulated EEG signal û_1[n]
5.2 Segment of EEG signal u_cba1ff01,O2[n]
5.3 Segment of EEG signal u_cba1ff01,T5′[n]
5.4 Segment of artifactual signals with individual artifacts of length L = 250. a) White noise, b) Sawtooth, c) Mixed ECG and EMG, d) Simulated ECG
5.5 Curve of ROC #1 for the evaluation of the BSM. See Table 5.1
5.6 Curve of ROC #2 for the evaluation of the BSM. See Table 5.1
5.7 Curve of ROC #3 for the evaluation of the BSM. See Table 5.1
5.8 Curve of ROC #4 for the evaluation of the BSM. See Table 5.1
5.9 Curve of ROC #5 for the evaluation of the BSM. See Table 5.1
5.10 Curve of ROC #6 for the evaluation of the BSM. See Table 5.1
5.11 Curve of ROC #7 for the evaluation of the BSM. See Table 5.1
5.12 Curve of ROC #8 for the evaluation of the BSM. See Table 5.1
5.13 Curve of ROC #9 for the evaluation of the BSM. See Table 5.1
5.14 Curve of ROC #10 for the evaluation of the BSM. See Table 5.1
5.15 Curve of ROC #11 for the evaluation of the BSM. See Table 5.1
5.16 Curve of ROC #12 for the evaluation of the BSM. See Table 5.1
5.17 Curve of ROC #13 for the evaluation of the BSM. See Table 5.1
5.18 Curve of ROC #14 for the evaluation of the BSM. See Table 5.1
5.19 Curve of ROC #15 for the evaluation of the BSM. See Table 5.1
5.20 Curve of ideal ROC with the BSM for the evaluation of the CCDA detector
5.21 Curve of ROC with the BSM and estimated AR parameters for the evaluation of the CCDA detector
5.22 Curve of ROC with the BSM and true EEG signal u_cba1ff01,O2[n] for the evaluation of the CCDA detector
5.23 Curve of ROC with the BSM and true EEG signal u_cba1ff01,T5′[n] for the evaluation of the CCDA detector
5.24 Curves of probability of detection and probability of false alarm when the parameter L′ is varied and the input signal is û_1[n]
5.25 Curves of MSE when applying FPP to the ideal AR process for threshold estimation with Q = 500 realizations
5.26 Curves of MSE when applying BR to the ideal AR process for threshold estimation with Q = 500 realizations
5.27 Curves of MSE when applying FPP to an AR process with L′ learning samples for threshold estimation with Q = 5000 realizations
5.28 Curves of MSE when applying BR to an AR process with L′ learning samples for threshold estimation with Q = 500 realizations
5.29 Curves of MSE when applying FPP to an AR process with L′ learning samples for threshold estimation with Q = 500 realizations
5.30 Curves of MSE when applying FPP to an AR process with L′ learning samples for threshold estimation with Q = 500 realizations
5.31 Curves of MSE when applying FPP to an AR process with L′ learning samples for threshold estimation with Q = 500 realizations
5.32 Online performance of CCDA with ideal threshold and ideal AR parameters
5.33 Online performance of CCDA with FPP estimated threshold from L′ = 250 samples and AR signal û[n]
5.34 Online performance of CCDA with FPP estimated threshold from L′ = 2 × 10^5 samples and AR signal û[n]
5.35 Online performance of CCDA with BR estimated threshold from L′ = 250 samples and AR signal û[n]
5.36 Online performance of CCDA with BR estimated threshold from L′ = 2 × 10^5 samples and AR signal û[n]
5.37 Online performance of CCDA with FPP estimated threshold from L′ = 250 samples and true EEG signal u_cba1ff01,O2[n]
5.38 Online performance of CCDA with FPP estimated threshold from L′ = 2 × 10^5 samples and true EEG signal u_cba1ff01,O2[n]
5.39 Online performance of CCDA with BR estimated threshold from L′ = 250 samples and true EEG signal u_cba1ff01,O2[n]
5.40 Online performance of CCDA with BR estimated threshold from L′ = 2 × 10^5 samples and true EEG signal u_cba1ff01,O2[n]
5.41 Online performance of CCDA with FPP estimated threshold from L′ = 250 samples and true EEG signal u_cba1ff01,O2[n] and model order M = 7
5.42 Online performance of CCDA with FPP estimated threshold from L′ = 2 × 10^5 samples and true EEG signal u_cba1ff01,O2[n] and model order M = 7
5.43 Online performance of CCDA with BR estimated threshold from L′ = 250 samples and true EEG signal u_cba1ff01,O2[n] and model order M = 7
5.44 Online performance of CCDA with BR estimated threshold from L′ = 2 × 10^5 samples and true EEG signal u_cba1ff01,O2[n] and model order M = 7
5.45 Online performance of CCDA with FPP estimated threshold from L′ = 250 samples, true EEG signal u_cba1ff01,O2[n], model order M = 7 and P_FA = 0.1
5.46 Online performance of CCDA with BR estimated threshold from L′ = 250 samples, true EEG signal u_cba1ff01,O2[n], model order M = 7 and P_FA = 0.1
B.1 Curve of ROC #1 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.2 Curve of ROC #2 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.3 Curve of ROC #3 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.4 Curve of ROC #4 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.5 Curve of ROC #5 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.6 Curve of ROC #6 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.7 Curve of ROC #7 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.8 Curve of ROC #8 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.9 Curve of ROC #9 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.10 Curve of ROC #10 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.11 Curve of ROC #11 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.12 Curve of ROC #12 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.13 Curve of ROC #13 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.14 Curve of ROC #14 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1
B.15 Curve of ROC #15 for the evaluation of the BSM for a less correlated AR than the one used in chapter 5. See Table 5.1

List of Tables

2.1 Most common types of epileptic seizures and their characteristics [1]
5.1 Moments used to find the most appropriate ROC
5.2 Data for the calculation of the BSM using the sets defined in Table 5.1
5.3 Data from the ideal ROCs evaluation
5.4 Data from the ROCs evaluation of signal u_cba1ff01,O2[n]
5.5 Data from the ROCs evaluation of signal u_cba1ff01,T5′[n]
5.6 Data from the ROCs evaluation of signals û[n], u_cba1ff01,O2[n] and u_cba1ff01,T5′[n]
A.1 Computational cost of CCDA procedures in terms of combined multiplication and sum operations
B.1 Data for the calculation of the BSM using the sets defined in Table 5.1 and lower correlation in the AR input process than that used in chapter 5


List of Acronyms

ADC      Analog to Digital Converter
AEEG     Ambulatory EEG
AP       Action Potential
AR       Autoregressive
AuROC    Area under the ROC
BioMEMS  Biological Micro Electro-Mechanical Systems
BR       Bootstrap Resampling
BSM      Best Set of Moments
BSS      Blind Source Separation
CCDA     Central Chi-Square Detection Algorithm
CNS      Central Nervous System
DI       Data Improbability
ECG      Electrocardiogram
EEG      Electroencephalogram
EKG      Electrocardiogram
EMG      Electromyogram
EOG      Electrooculogram
ERP      Event Related Potential
EV       Extreme Values
fMRI     Functional Magnetic Resonance Imaging
FPP      Forward Prediction Pseudo-sampling
IC       Independent Component
ICA      Independent Component Analysis
iid      Independent and Identically Distributed
ITESM    Instituto Tecnológico y de Estudios Superiores de Monterrey
LMS      Least Mean Square
LT       Linear Trends
MSE      Mean Squared Error
NCDA     Non-Central Chi-Square Detection Algorithm
OA       Ocular Artifact
PCA      Principal Component Analysis
PDF      Probability Distribution Function
pdf      Probability Density Function
RLS      Recursive Least Squares
ROC      Receiver Operating Characteristics
SP       Spectral Pattern
UMP      Uniformly Most Powerful
WSS      Wide Sense Stationary

Chapter 1
Introduction

This research project lies in the area of electroencephalographic (EEG) signal processing. Currently, most EEG studies are performed in dedicated facilities, generally located at hospitals, laboratories and specialized clinics, because of the equipment they need and the controlled environment they require. The controlled environment reduces the occurrence of noise that could distort the data and consequently lead to their misinterpretation. Ambulatory EEG systems facilitate these procedures: they give patients the opportunity to stay in their day-to-day environment (e.g. home or office), eliminate the hospital cost, and permit longer recordings. In addition, some of the problems that provoke the patient's symptoms may be related directly to day-to-day activities rather than to controlled environments, which is another important reason for the development of ambulatory EEG (AEEG) systems [2].

Because of the uncontrolled environment, the data obtained with an AEEG system may be mixed with other signals not related to brain activity, such as electrical noise interference or biopotentials generated by other organs of the patient. This thesis focuses on the detection of the second kind, the physiological artifacts. Throughout this text, the term artifact refers to physiological signals that arise from normal human activity.

1.1 Problem Description

The signals obtained with an electroencephalogram show not only the brain activity but also electric potentials generated by the activity of other organs. These signals are called artifacts, and the segments of recorded EEG data that contain them have to be discarded. There is current research on the cancellation of artifacts, the reconstruction of EEG signals, their spectral analysis, and some work on the detection of artifacts, such as that found in references [3] and [4]. The main drawbacks of these detection techniques are that they are not designed to detect several kinds of artifacts, and that they require a priori knowledge of the statistics of the clean EEG signal and of the artifacts being detected.

1.2 Objective

The general objective of this research work is to develop, study and test efficient algorithms for the sequential detection of artifacts in EEG signals, based on statistical signal processing techniques and assuming no previous knowledge of the artifact characteristics. The scope of this project is limited to the detection of artifacts in EEG signals; detection opens the way to their cancellation and to a hardware implementation of the algorithms, both of which are left for future research work.

1.3 Justification

The importance of this problem arises from the need of neurologists to find certain parameters and behaviors in EEG signals that help to clarify the health state of the patient. In the presence of contaminated data sections, however, they cannot analyze the information correctly. Moreover, these tests require adequate equipment to avoid repetitions caused by data acquisition problems, which could result in additional cost, time and inconvenience to patients.

In ambulatory EEG systems it is necessary to save as many resources as possible because, as in every embedded system, the algorithms must be optimized in order to run online and in real time. Identifying each artifact by its waveform, spectrum and statistics in a portable system requires a very large amount of memory and implies a high processing load. This is why a technique capable of recognizing artifacts based on common characteristics is of great utility.

1.4 Contribution

The contribution of this work is the development of a novel method for the sequential detection of artifacts in EEG signals by means of nonparametric statistical methods. The detector can be tuned for the detection of any type of artifact, and it can work with either raw EEG data or EEG after BSS (e.g. independent components from ICA). The methodology needs a priori knowledge of neither the clean EEG statistics nor the EEG-plus-artifact characteristics; it only needs a few learning samples to deliver results. The method is composed of three steps: process and detector characterization, adaptive threshold estimation, and sequential implementation. Several approaches are presented and developed for the three steps. The performance of the first step is fully tested and characterized, the second is tested with an ideal random process, and the analysis of the capabilities of the last one is left as future research work.

1.5 Thesis Organization

This document is organized as follows: Chapter 1 presents an Introduction, the Problem Description, Objective and Justification. Chapter 2 shows several EEG basics for the general understanding of the work. Chapter 3 describes the Theoretical Background needed to develop the algorithms. Chapter 4 deals with a Novel Method developed in this research, based on a Nonparametric Chi-Square Test. Chapter 5 shows the Simulation Results for performance. Chapter 6 presents Conclusions and Further Research.
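To illustrate the resource argument of Section 1.3, the following minimal sketch updates the sample mean and variance one sample at a time with constant memory. It uses Welford's well-known update and is only an assumption about what a sequential calculation of moments can look like; it is not necessarily the recursion developed later in this thesis, and the class name RunningMoments is hypothetical.

```python
class RunningMoments:
    """Constant-memory running mean and variance (Welford's update).

    Illustrative sketch only; not the recursion derived in the thesis.
    """

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running sample mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far."""
        return self.m2 / self.n if self.n > 0 else 0.0


moments = RunningMoments()
for sample in [1.0, 2.0, 3.0, 4.0]:
    moments.update(sample)
```

Only three numbers are stored regardless of the recording length, which is the kind of saving that matters in a memory-constrained portable system.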


Chapter 2
EEG Background

2.1 Electroencephalogram Basics

An electroencephalogram consists of a set of measurements of the electrical neural activity, taken from the scalp or the brain by means of electrodes. In the early days of electroencephalography the recordings were taken with few electrodes, but nowadays the signals are acquired with an array of multiple sensors and fully computerized systems, equipped with signal analysis and processing tools and enough memory for long recordings. If the data is taken directly from the brain it is called an electrocorticogram [1, 5].

The study of EEG signals is of high importance because of its numerous medical and engineering applications, such as brain computer interfacing (BCI), the study of memory, the prediction of epileptic and nonepileptic seizures, behavioral patterns and mental disorders, among others. The usefulness of the classification of EEG patterns is still limited, mainly because the signals collected from the scalp are a distorted version of the biopotentials generated in the brain, and because of the artifacts and noise introduced by the body, the environment, or other external sources that contaminate the measurement of the EEG signals [6].

2.1.1 Electroencephalogram Generation

The information transmitted by the nerves is called an action potential (AP). It is generated by an exchange of ions across the neuron membrane and is transmitted to other neurons through the axons and dendrites of these nerve cells. APs propagate at about 1 to 100 m/s and are triggered by stimuli such as chemicals, light, electricity, pressure, touch and stretching in sensory nerves, and primarily by chemical activity in the central nervous system (CNS, brain and spinal cord). A stimulus must surpass a certain threshold in order to trigger an AP.

In humans the amplitude of the AP ranges between approximately −60 mV and 10 mV. The resting potential is about −70 mV; under stimulation the membrane potential becomes less negative, and if it reaches the limit of −55 mV the AP process continues. The potential may reach +30 mV at the AP peak, after which repolarization takes the signal down to an undershoot of approximately −90 mV, called hyperpolarization, and finally back to the resting state of −70 mV (see Figure 2.1). The hyperpolarization prevents the neuron from receiving any other stimulus during this time [1].

When several neurons are activated, they generate currents within their communication channels. These currents generate a magnetic field, which can be measured by a magnetoencephalogram (MEG) machine, and a secondary electrical field over the scalp, which is what the EEG measures and records. Because of the electrical resistance of the human head, due mainly to the brain, skull and scalp, only potentials generated by a large population of active neurons can be picked up by scalp electrodes. After acquisition, the signals are amplified for either display or processing purposes [1].
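The threshold condition described above can be sketched as a simple check. The two constants are the resting and threshold potentials quoted in the text; the function name and the idea of expressing the stimulus as a depolarization in millivolts above rest are assumptions made only for illustration.

```python
RESTING_MV = -70.0    # resting membrane potential quoted in the text
THRESHOLD_MV = -55.0  # potential at which the AP process continues

def triggers_action_potential(depolarization_mv):
    """Return True if a depolarization (mV above rest) reaches the threshold.

    Hypothetical helper: it models only the all-or-none threshold test,
    not the shape or timing of the action potential.
    """
    return RESTING_MV + depolarization_mv >= THRESHOLD_MV
```

Under these values a depolarization of at least 15 mV is needed for the AP to proceed; smaller stimuli leave the neuron at rest.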
Fig. 2.1 Action potential general schema (adopted from S. Sanei [1])

According to [1], the study of EEGs aids in the diagnosis of many neurological disorders and abnormalities in the human body. These signals may be used for research on the following:
• monitoring alertness, coma, and brain death;
• locating areas of damage following head injury, stroke, and tumour;
• testing afferent pathways (by evoked potentials);
• monitoring cognitive engagement (alpha rhythm);
• producing biofeedback situations;
• controlling anesthesia depth (servo anesthesia);
• investigating epilepsy and locating seizure origin;
• testing epilepsy drug effects;
• assisting in experimental cortical excision of epileptic focus;
• monitoring brain development;
• testing drugs for convulsive effects;
• investigating sleep disorders and physiology;
• investigating mental disorders;
• providing a hybrid data recording system together with other imaging modalities.

2.1.2 Brain Rhythms and Abnormal Epileptic EEG Patterns

EEG signals have characteristic patterns that appear in certain frequency bands. Those patterns are called brain rhythms and are classified by their frequency range. The main brain waves are called alpha (α), theta (θ), beta (β), delta (δ), and gamma (γ) [1].

The theta waves are in the range of 4-7.5 Hz. These waves appear as consciousness slips towards drowsiness, and they have been associated with access to unconscious material, creative inspiration and deep meditation. Larger contingents of theta wave activity in the waking adult are abnormal and are caused by various pathological problems [1].

Alpha rhythms appear in the posterior half of the head and are usually found over the occipital region. Their characteristic frequency range lies within 8-13 Hz, and they have the appearance of a round or sinusoid-like signal with an amplitude normally below 50 µV. This rhythm has been thought to indicate a relaxed awareness without any attention or concentration. In general, alpha waves appear with the eyes closed and are reduced or disappear when the subject opens the eyes, hears unfamiliar sounds, or experiences anxiety or mental concentration. The origin and physiological significance of alpha waves are still unknown, and more research is needed in the area [1].

The activity of beta waves varies in the range of 14-26 Hz, although in some texts no upper bound is given. It is the usual waking rhythm of the brain, and it is related to active thinking, active attention, focus on the outside world, or solving concrete problems. It is encountered over the frontal and central regions. The amplitude is normally under 30 µV [1].

In the range above 30Hz lies the gamma rhythm, also called the fast beta wave. Its amplitudes are very low and it occurs rarely, but this rhythm can be used for confirmation of certain brain diseases. It is mainly located in the frontocentral area of the head. This band has been shown to be a good indication of event-related synchronization of the brain. Figure 2.2 shows a comparison of the typical waveforms for the mentioned brain rhythms [1].

Fig. 2.2 Comparison of the waveforms of the typical brain rhythms (adopted from [1]).

The previously mentioned rhythms persist as long as the subject's state does not change, and are approximately cyclic. There are other brain waveforms that may be related to other events, including: damaged sections of the brain; transients such as event-related potentials (ERPs) and positive occipital sharp transients (POSTs, also called rho (ρ) waves); and motor activity such as the mu (µ) rhythm, which has been used in feedback training

for several purposes like the treatment of epileptic seizure disorder [1].

There are several mental disorders that may provoke the appearance of abnormal patterns in the EEG, mainly due to changes in the network of neurons and variations in their communication. Some examples of these problems are aging, dementia, epilepsy, and psychiatric disorders (e.g. attention-deficit disorder), among others. Abnormal patterns may also arise from external effects such as looking at the TV screen, listening to music without any attention, or pharmacological and drug effects [1].

Epilepsy, specifically, comprises a diverse collection of disorders. The most common therapy is symptomatic, and the available drugs reduce the frequency of the seizures in the patients, but only a low percentage of patients become free of them. The term seizure refers to a transient change of behavior due to the disordered, synchronous and rhythmic firing of populations of CNS neurons, and epilepsy is defined as a disorder of brain function characterized by the occurrence of recurrent, unpredictable and unprovoked seizures. When the seizures are intentionally provoked they are considered nonepileptic. The adjectives ictal and interictal mean "seizure-like" and "between seizures" respectively [7].

The behavioral manifestations of a seizure appear in the functions normally served by the cortical region where the seizure arises. When a simple partial seizure occurs the person generally preserves consciousness; a complex partial seizure, on the other hand, is associated with impairment of consciousness. The majority of complex partial seizures originate in the temporal lobes. Absence, myoclonic, and tonic-clonic seizures are examples of generalized seizures. A given patient usually exhibits multiple kinds of seizures in different episodes [7].

For further details on the classification of epileptic seizures and the common patterns in which they appear, refer to [8]. Studies performed on EEG recordings reveal that each kind of seizure has distinctive abnormalities, e.g. the patterns for tonic seizures differ from those of a clonic seizure [7]. Figure 2.3 shows a multichannel EEG with a generalized tonic-clonic (grand mal) seizure. There are several distinctive characteristics among different kinds of seizures. Table 2.1

shows the most common kinds of seizures and their characteristics [1]. Generally a clinical seizure is characterized by a sudden change of frequency in the EEG signals. The transition from the preictal to the ictal state, in a focal epileptic seizure, presents a gradual change from chaotic to ordered waveforms. The amplitude of the spikes is not completely correlated with the severity of the seizure, and their morphology varies significantly with age, but they may occur at any level of awareness (e.g. wakefulness and deep sleep) [1].

Fig. 2.3 Example of a multichannel EEG with the occurrence of a tonic-clonic (grand mal) seizure (adopted from [1]).

Epileptic seizures can be distinguished from common artifacts by the repetitive (rhythmical) nature of the epileptic spikes, which differs from artifacts that are transient or noise-like in shape. In the specific case of the electrocardiogram (ECG), the frequency of occurrence of the waveforms is approximately 1 Hz, but these waves are very different in shape from the seizure signals [1].

Spikes and other paroxysmal discharges may be found in nonepileptic persons, including healthy individuals; however, they usually are signs of certain cerebral dysfunctions that may or may not develop into an abnormality. They may appear during periods of particular mental challenge in individuals such as soldiers at war, pilots and prisoners [1].

2.1.3 EEG Measurement

The most widely used international standard for the placement of the electrodes in EEG recordings is the 10-20 system, which has 21 recording points. The arrangement uses two points as reference: the Nasion, which is just above the nose in line with the eyes, and the Inion, found on the bone at the posterior base of the skull. From the reference points, the electrodes are positioned at intervals of 10% or 20% over the head surface. For a graphic representation of this array, see Figure 2.4. The amplitude of these signals is near 100µV if observed from the scalp, and from 1 to 2mV when recorded from the brain surface. The bandwidth is from 1 to 50Hz approximately [5, 1].

Commercial EEG recording systems frequently include impedance monitors to keep electrode impedance at an adequate level, because high impedance can lead to distortion in the signals [1]. Since the EEG signals have amplitudes in the order of microvolts and may contain frequency components up to 300Hz, the observed data should be conditioned in order to reject external potentials that may lead to erroneous interpretation. Usually a high-pass filter with a cutoff frequency of 0.5Hz can be used to eliminate very low frequency components, and a low-pass filter with a cutoff frequency usually selected between 50 and 70Hz may be used. This preprocessing can be performed either before or after the ADC. The most commonly used

sampling frequencies for EEG recordings are 100, 250, 500, 1000, and 2000 samples/s [1]. There are several methods to eliminate or mitigate the effects of power line frequency interference, from a notch filter with its null at 50 or 60Hz to adaptive noise cancelers like the one described in [9].

Fig. 2.4 The 10-20 standard for electrode location in EEG.

2.2 Artifacts

Artifacts are defined as the signals perceived by an EEG which are not generated by the brain. The physiological artifacts are those that arise from an organ of the patient's body, and the non-physiological artifacts are the ones that come from an external source. Muscle activity, ocular movement and the breathing process are examples of the first kind of artifact, while power line interference (e.g. the 60Hz frequency) and the misplacement of electrodes are considered part of the second group [10].

Due to their specific characteristics, the ocular artifacts (OA) like eye blinks are particularly difficult to locate and eliminate from the EEG signals without losing important information about event-related potentials (ERPs). This kind of artifact generates signals within the EEG roughly ten times larger in amplitude than the cortical signals, lasting from 200 to 400 ms. This fact will be used later for the design of test cases for the different algorithms described within this text. Due to the power of the blinking artifacts and the scalp's resistance, the OAs can affect a large part of the electrodes. Figure 2.5 shows an example of the EEG signals with an OA [11, 12, 1].

Another common type of artifactual signal is the ECG, which occurs when the cardiac electrical field affects the potentials on the scalp and near the eyes. It leads to interference in the EEG and EOG recordings, and it can be easily recognized by its periodicity and coincidence with the ECG channel peaks. Its waveform varies from time to time, and large inter-individual voltage variations can be observed [13]. The work in [14] analyses this kind of artifact when a subject is undergoing functional Magnetic Resonance Imaging (fMRI), where it commonly has a more severe impact on the EEG signal; the artifactual amplitude, spatial distribution on the scalp and frequency of occurrence are investigated. Their results show that this interference is normally largest in the frontal region, and that the mean amplitude of the artifact (from 78 channels in total) was 58µV with standard deviation SD = 58µV when the electrode leads were untwisted, and 36µV with SD = 34µV when the leads were twisted together. These results will be used in section 5.1.2 for the generation of test artifacts.

The Electromyogram (EMG) is yet another very common type of artifact, and it is seen on almost all EEG recordings performed in clinical practice. Artifacts from the frontalis and temporalis muscles (e.g. from clenching of the jaw muscles) are particularly common. The generated potentials are shorter in duration and can be identified by their contrast in duration, morphology and frequency (50-100 Hz) to those of the EEG. Chewing is a particular example of these artifactual signals [15].

2.2.1 Previous work with artifacts in EEG

One of the most predominant artifact rejection research lines is the use of Independent Component Analysis (ICA), a technique from the branch of blind source separation (BSS) analysis. It decomposes observed multichannel signals into various independent components, looking for statistical independence based not only on decorrelation but also on high-order statistical independence, which is what makes it different from Principal Component Analysis (PCA) [16]. The main problem with ICA is that, for EEG, the number of sources is not easily determined, so almost all of the independent components obtained may contain important EEG data that should not be ignored for clinical analysis; moreover, ICA relies on the assumption that the number of sensors is equal to the number of independent components, which is difficult to achieve [17]. Works using ICA include [3, 13, 16, 17, 18, 19, 20, 21].

In [17] an alternative to ICA for EEG artifact correction, without the assumptions of this BSS method, is presented. The detection method presented in this work can be used after ICA decomposition, but it is mainly intended for use with reconstruction methods like the one described in [17], where the algorithms work directly with the EEG data.

In [18] several methods for detection of artifacts are described: Extreme Values (EV), Linear Trends (LT), Data Improbability (DI), Kurtosis, and Spectral Pattern (SP). In EV, the data trials are labeled as artifactual if the absolute value of any data point in the trial exceeds a fixed threshold. According to [18], EV is the most widely used method in the EEG community for artifact detection and has good performance with high amplitude OAs.
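The EV rule just described is simple enough to sketch in a few lines. The following is only an illustration under stated assumptions, not the implementation used in [18]: the function name `extreme_value_trials`, the toy trial values and the 100µV threshold are all hypothetical choices made for this example.

```python
def extreme_value_trials(trials, threshold_uv):
    """Return the indices of trials with any |sample| above the threshold."""
    flagged = []
    for index, trial in enumerate(trials):
        if any(abs(sample) > threshold_uv for sample in trial):
            flagged.append(index)
    return flagged


# Toy data in microvolts (made up for illustration); the second trial
# contains a large blink-like excursion.
trials = [
    [12.0, -8.5, 20.1, -15.3],
    [10.2, 310.0, 295.4, -7.7],
    [-22.4, 18.9, -30.0, 25.6],
]
THRESHOLD_UV = 100.0  # hypothetical fixed threshold, not a value from [18]

flagged = extreme_value_trials(trials, THRESHOLD_UV)
print(flagged)
```

Because the threshold is fixed in advance, a rule of this kind depends on prior knowledge of typical artifact amplitudes, which is one motivation for data-driven thresholds.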
When using LT, the goodness of fit of the EEG activity to an oblique straight line is measured, and the data is marked as artifactual or artifact-free depending on the minimum slope of this line and the resulting goodness of fit. The DI and Kurtosis methods look for unusual behavior of the recorded EEG signal, based on the transient nature and low probability of occurrence of artifacts. DI depends on the observed joint probability density function, which is obtained by dividing the observed data into several bins, then obtaining the corresponding pdf, and finally multiplying them. The Kurtosis method relies on typical values of the kurtosis moment for artifactual signals. Finally, SP relies on the prior knowledge that certain artifacts have very well defined spectral components, which can be identified

within the EEG signal.

The work of [22] assumes that the EEG signal can be modeled with an AR process, and it uses the variance of the innovation (obtained by inverse filtering the EEG after estimating the AR parameters) as an indicator of the presence or absence of artifacts. The underlying idea is that the EEG is a stationary process within a segment of time, and thus the variance of the innovation is relatively constant. If this variance presents a significant change, the EEG signal is said to have artifactual contamination. From another point of view, the method in [23] is based on the Wavelet Transform (WT), and it is intended to eliminate ECG artifacts based on their characteristic spectral content.

In [21] the authors introduce the Hurst exponent as an indicator of the presence or absence of artifactual signals in the EEG. According to this work, a time series can be parametrized by means of the Hurst exponent, and it has been found that this parameter H has a value of 0.70-0.76 for many natural, economic and human phenomena. With respect to artifactual signals, the ECG artifact has a value in the range of H = 0.64-0.69 and the eye blinking artifact or OA is in the range of H = 0.58-0.64. The method consists in performing ICA to separate the ICs and then calculating the corresponding Hurst exponent to yield a decision. Taking this categorization into account, the signal subspace is obtained and the data is filtered to obtain the corrected EEG signal. The work presents a method for the recurrent calculation of the Hurst exponent, enabling its use in sequential online processing. The main drawback of this method is that it relies on the ICA assumptions, and also that it needs certain a priori information about the characteristic values of the Hurst exponent for each of the signals to be detected.

The novel detector developed in this thesis is based on an approach similar to that of the DI and Kurtosis methods described in [18], looking for improbability measures. However, instead of using the pdf directly observed from the data, or a single moment (like kurtosis), it is based on the pdf of a combination of estimators of statistical moments, and the thresholds are obtained automatically by means of nonparametric methods.

Table 2.1 Most common types of epileptic seizures and their characteristics [1].

Kind | Spatial location | Frequency | Description
Tonic-clonic | All electrodes with tendency to frontal ones | 6-12 Hz | It is the most common type of epileptic seizure and it has a rhythmic but spiky pattern in the EEG.
Petit-mal | - | 3 Hz | Interictal paroxysmal seizure with a generalized synchronous spike wave complex of prolonged bursts.
Psychomotor | Temporal lobe | 4-6 Hz | Also called complex partial seizure. It is presented by bursts of serrated slow waves with amplitude above 60 µV.
Cortical (focal) | - | | Rising amplitude and diminishing frequency during ictal period. It is usually initiated by local desynchronization.
Myoclonic | Frontal region | | Concomitant polyspikes, seen clearly in the EEG. They can have generalized or bilateral spatial distribution.
Tonic | - | 10 Hz | Occur in patients with Lennox-Gastaut syndrome and have spikes that repeat at the given frequency.
Atonic | Generalized | 1-2, 10 Hz | May appear in the form of a few seconds drop attack or be inhibitory, lasting for a few minutes. They show a few polyspike waves or spike waves with generalized spatial distribution followed by large slow waves.
Akinetic | - | 1-2 Hz | It is a rare kind of seizure and it is characterized by arrest of all motion, but it is not caused by a sudden loss of tone as in atonic seizure. The patient is in an absent-like state.
Jackknife | - | | They are also called salaam attacks, and are common in children with hypsarrhythmia and are either a sudden generalized flattening desynchronization or have rapid spike discharges.

Fig. 2.5 Example of a multichannel EEG with the occurrence of an OA (adopted from [1]).

Chapter 3 Theoretical Background

3.1 Autoregressive Random Process Model

An autoregressive (AR) sequence is a time series that can be used to model certain random processes. It is defined as

\[ x[n] = -\sum_{i=1}^{M} a_i x[n-i] + \epsilon[n] \tag{3.1} \]

where $\epsilon[n] \sim N(0, \sigma_\epsilon^2)$ is the driving noise of the sequence, $x[n]$ is the random variable at time $n$, and $M$ is the order of the AR sequence. An important remark is that the values of $\epsilon$ at each time $n$ form a random sample, i.e. they are independent and identically distributed [24]. A relevant aspect of AR processes is the relation between the AR coefficients $a_i$ and the lags of the autocorrelation sequence $r(l)$. This relation is given by the Yule-Walker equations

\[
\begin{bmatrix}
r(0) & r(-1) & \cdots & r(-M+1) \\
r(1) & r(0) & \cdots & r(-M+2) \\
\vdots & \vdots & \ddots & \vdots \\
r(M-1) & r(M-2) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} -a_1 \\ -a_2 \\ \vdots \\ -a_M \end{bmatrix}
=
\begin{bmatrix} r(-1) \\ r(-2) \\ \vdots \\ r(-M) \end{bmatrix}
\tag{3.2}
\]

where $r(l)$ is the value of the autocorrelation sequence at the lag $l$ and $a_1, \ldots, a_M$

are the values of the AR parameters [24]. For real-valued processes the autocorrelation sequence is symmetric with respect to the origin, i.e. $r(-l) = r(l)$, and the Yule-Walker equations become

\[
\begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(M-1) & r(M-2) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} -a_1 \\ -a_2 \\ \vdots \\ -a_M \end{bmatrix}
=
\begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}
\tag{3.3}
\]

The AR processes are asymptotically stationary, so for a sufficiently large value of $n$, the asymptotic distribution of each $x[n]$, which must be Gaussian since it is a sum of Gaussian random variables, is obtained by the following procedure. First, the mean is found by

\[
\begin{aligned}
E\{x[n]\} &= E\left\{ -\sum_{i=1}^{M} a_i x[n-i] + \epsilon[n] \right\} \\
&= -\sum_{i=1}^{M} a_i E\{x[n-i]\} + E\{\epsilon[n]\} \\
&= -E\{x[n]\} \sum_{i=1}^{M} a_i + 0 \\
E\{x[n]\} \left( 1 + \sum_{i=1}^{M} a_i \right) &= 0 \\
E\{x[n]\} &= 0
\end{aligned}
\tag{3.4}
\]

where the third line uses stationarity, $E\{x[n-i]\} = E\{x[n]\}$. Then the variance is defined as

\[
\begin{aligned}
\operatorname{Var}\{x[n]\} &= E\{x^2[n]\} - E^2\{x[n]\} = E\{x^2[n]\} \\
&= E\left\{ \left( \sum_{i=1}^{M} a_i x[n-i] \right)^2 \right\} - 2E\left\{ \epsilon[n] \sum_{i=1}^{M} a_i x[n-i] \right\} + E\{\epsilon^2[n]\} \\
&= E\left\{ \left( \sum_{i=1}^{M} a_i x[n-i] \right)^2 \right\} + \sigma_\epsilon^2
\end{aligned}
\tag{3.5}
\]

which is the general expression of the variance in terms of the order $M$ of the AR process; the cross term vanishes because $\epsilon[n]$ has zero mean and is independent of the past values $x[n-i]$. Equation (3.5) can be evaluated for different values of $M$ as required; for example, $M = 1$ and $M = 2$ give equations (3.6) and (3.7) respectively.

\[ \operatorname{Var}\{x[n]\} = \frac{\sigma_\epsilon^2}{1 - a_1^2} \tag{3.6} \]

\[ \operatorname{Var}\{x[n]\} = \frac{\sigma_\epsilon^2}{1 - a_1^2 - a_2^2 + \dfrac{2 a_1^2 a_2}{1 + a_2}} \tag{3.7} \]

In general the asymptotic distribution of $x[n]$ is

\[ x[n] \sim N\left( 0, \; E\left\{ \left( \sum_{i=1}^{M} a_i x[n-i] \right)^2 \right\} + \sigma_\epsilon^2 \right) \tag{3.8} \]

and more specific distributions can be obtained by substituting the value of $M$ and using the Yule-Walker equations to solve for the values of the autocorrelation sequence. The $x[n]$ for any $n > \tau$, where $\tau$ is a sufficiently large time value at which the AR process becomes approximately stationary, can be considered to be asymptotically identically distributed.

3.2 Bootstrap Resampling Method

The bootstrap is a computational tool for statistical inference. Some of the tasks that can be performed with the aid of bootstrap based methods are: estimation of statistical characteristics (e.g. bias, variance, probability density function (pdf)), hypothesis tests, which are the basis for signal detection, and model selection. This tool can be used when there is little or no knowledge of the statistics of the data or only a small amount of data is available [25]. In [26] the developer of the original bootstrap method coauthors a broad description and analysis of this tool.

The bootstrap principle consists in the following: suppose the values $x = \{x_1, x_2, \ldots, x_L\}$ are available, and they represent a realization of the random sample (i.e. the random variables are independent and identically distributed (iid)) $X = \{X_1, X_2, \ldots, X_L\}$, taken from an unknown distribution $F_X(x|\theta)$. Let $\hat{\theta} = \hat{\theta}(X)$ be an estimator of some parameter $\theta$ of

$F_X(x|\theta)$. The aim is to find statistical characteristics of $\hat{\theta}$ such as its distribution [25]. If $F_X(x|\theta)$ is known, it is a relatively simple task to obtain the exact values of the corresponding characteristics of $\hat{\theta}$. However, in practical applications there are several factors that can prevent these characteristics from being derived in closed form, like uncertainty about the distribution $F_X(x|\theta)$ or a very intricate form of the parameter estimator $\hat{\theta}(X)$. The problem is then how to perform statistical inference when no parametric or asymptotic results are available.

The bootstrap offers a solution to this obstacle. It suggests substituting the unknown distribution $F_X(x|\theta)$ by the empirical distribution of the data $\hat{F}_X(x|\theta)$. In general terms, the bootstrap recommends reusing the original data through resampling to create what is called a bootstrap resample. A bootstrap resample has the same size as the original sample, i.e. $x^{*b} = \{x_1^*, x_2^*, \ldots, x_L^*\}$ for $b = 1, 2, \ldots, B$, where the $x_i^*$, $i = 1, 2, \ldots, L$, are obtained from $x$ by drawing values at random with replacement. Based on the bootstrap resample $x^{*b}$, the bootstrap parameter estimates $\hat{\theta}_b^* = \hat{\theta}(x^{*b})$ are calculated, and for a large number $B$ of bootstrap parameter estimates, the distribution of $\hat{\theta}$ can be approximated by the distribution of $\hat{\theta}^*$, which originates from the bootstrap sample $x^*$. In other words, the distribution $F_{\hat{\theta}}(\hat{\theta}|x)$ is approximated by the distribution of $\hat{\theta}^*$, that is $F_{\hat{\theta}^*}(\hat{\theta}^*|x^*)$ [25].

According to [25], one rule of thumb is for the number of bootstrap samples $B$ to take a value between 25 and 50 for variance estimation and to be set to about 1,000 where a 95% confidence interval is sought. The bootstrap simulation error, that is, the difference between the true distribution and the estimated distribution, is a mixture of two independent sources known as the bootstrap (statistical) error and the simulation (Monte-Carlo) error. The first one depends solely on the sample size $L$, and the second one can be reduced by increasing the number of bootstrap resamples $B$. The general idea is to choose a number $B$ such that the simulation error is no larger than the bootstrap error. A rule of thumb mentioned in [25] is that $B = 40L$ is appropriate in many applications, although this value depends on the processes under analysis, so it is left to the experimenter's choice.
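The resampling principle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure of [25]: the data are synthetic Gaussian values, the estimator is the sample mean, and the percentile interval is one common way (chosen here for simplicity) of turning the bootstrap distribution into a 95% interval. The names `bootstrap_estimates`, `theta_star`, `lower` and `upper` are hypothetical.

```python
import random
import statistics


def bootstrap_estimates(x, estimator, num_resamples, rng):
    """Evaluate the estimator on num_resamples bootstrap resamples of x."""
    size = len(x)
    estimates = []
    for _ in range(num_resamples):
        # Draw L values from x at random, with replacement.
        resample = [x[rng.randrange(size)] for _ in range(size)]
        estimates.append(estimator(resample))
    return estimates


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.gauss(10.0, 2.0) for _ in range(50)]  # stand-in for observed data
B = 1000  # about 1,000 resamples, as suggested for a 95% interval

theta_star = sorted(bootstrap_estimates(x, statistics.mean, B, rng))

# Percentile interval: the 2.5% and 97.5% points of the bootstrap estimates.
lower = theta_star[B * 25 // 1000]
upper = theta_star[B * 975 // 1000 - 1]
std_error = statistics.stdev(theta_star)

print(f"sample mean = {statistics.mean(x):.3f}")
print(f"bootstrap standard error = {std_error:.3f}")
print(f"95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

Any other estimator that maps a list of values to a number (a variance, a higher-order moment) can be passed in place of `statistics.mean` without changing the resampling loop.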

3.2.1 Bootstrap for Dependent Data

For certain data the iid assumption is not valid, so the basic sampling with replacement previously described will not provide accurate results when estimating parameters from a population. A consistent way to extend the basic bootstrap principle to dependent data is to use data modeling and then assume that the residuals that approximate the modeling and measurement errors are iid. The idea is to reformulate the problem so that the iid components inherent to the data can be used for resampling [25].

More specifically, for AR models, the following procedure described in [25] can be implemented. Given $L$ observations $x_n$, $n = 1, \ldots, L$, of an AR process of order $M$ and coefficients $a_k$, $k = 1, \ldots, M$, the steps to perform bootstrap with this data are:

1. With the estimates $\hat{a}_k$ of $a_k$ for $k = 1, \ldots, M$ (obtained by solving the Yule-Walker equations (3.2)), calculate the residuals as $\hat{z}_n = x_n + \sum_{k=1}^{M} \hat{a}_k x_{n-k}$ for $n = M+1, \ldots, L$.

2. Create a bootstrap resample $\{x_1^*, \ldots, x_L^*\}$ by drawing $\hat{z}_{M+1}^*, \ldots, \hat{z}_L^*$ with replacement from the residuals $\{\hat{z}_{M+1}, \ldots, \hat{z}_L\}$, then letting $x_n^* = x_n$ for $n = 1, \ldots, M$ and $x_n^* = -\sum_{k=1}^{M} \hat{a}_k x_{n-k}^* + \hat{z}_n^*$ for $n = M+1, \ldots, L$.

3. Obtain bootstrap estimates $\{\hat{a}_1^*, \ldots, \hat{a}_M^*\}$ from $\{x_1^*, \ldots, x_L^*\}$.

4. Repeat steps 2-3 $B$ times to obtain $\hat{a}_1^{*b}, \ldots, \hat{a}_M^{*b}$ for $b = 1, \ldots, B$.

The bootstrap estimates $\hat{a}_1^{*b}, \ldots, \hat{a}_M^{*b}$ for $b = 1, \ldots, B$ are used to estimate the distributions of $\hat{a}_1, \ldots, \hat{a}_M$ or their statistical measures such as means, variances, or confidence intervals.

There are alternative methods for implementing the bootstrap with dependent data, like the fully nonparametric Moving Blocks bootstrap. For more details on the distinct bootstrap methods refer to [25, 26].
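Steps 1 to 4 of the AR residual bootstrap can be sketched as follows. This is a hedged illustration, not the code of [25]: it assumes a real-valued, zero-mean series, estimates the autocorrelation with the biased sample estimator, and solves the symmetric system (3.3) with a small hand-written Gaussian elimination so that only the standard library is needed. The AR(2) coefficients in `a_true`, the series length and `B = 200` are hypothetical values chosen for the demonstration.

```python
import random
import statistics


def autocorr(x, lag):
    """Biased sample autocorrelation r(lag) of a zero-mean sequence."""
    return sum(x[t] * x[t - lag] for t in range(lag, len(x))) / len(x)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def yule_walker(x, order):
    """Estimate a_1..a_M from the real-valued Yule-Walker system (3.3)."""
    r = [autocorr(x, lag) for lag in range(order + 1)]
    toeplitz = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    minus_a = solve(toeplitz, r[1:])  # the unknown vector is (-a_1..-a_M)
    return [-value for value in minus_a]


def ar_bootstrap(x, order, num_resamples, rng):
    """Residual bootstrap of the AR coefficient estimates (steps 1 to 4)."""
    a_hat = yule_walker(x, order)
    n = len(x)
    # Step 1: residuals z_n = x_n + sum_k a_k x_{n-k}.
    resid = [x[t] + sum(a_hat[k] * x[t - k - 1] for k in range(order))
             for t in range(order, n)]
    estimates = []
    for _ in range(num_resamples):
        # Step 2: keep the first M values, then rebuild the series from
        # residuals drawn with replacement.
        xs = list(x[:order])
        for t in range(order, n):
            z = resid[rng.randrange(len(resid))]
            xs.append(-sum(a_hat[k] * xs[t - k - 1] for k in range(order)) + z)
        # Step 3: re-estimate the coefficients from the resample.
        estimates.append(yule_walker(xs, order))
    return a_hat, estimates


rng = random.Random(1)
a_true = [-0.5, 0.2]  # hypothetical AR(2) in the sign convention of (3.1)
x = [0.0, 0.0]
for _ in range(600):
    x.append(-a_true[0] * x[-1] - a_true[1] * x[-2] + rng.gauss(0.0, 1.0))
x = x[100:]  # discard the start-up transient

a_hat, boot = ar_bootstrap(x, 2, 200, rng)
for k in range(2):
    values = [estimate[k] for estimate in boot]
    print(f"a_{k + 1}: estimate {a_hat[k]:.3f}, "
          f"bootstrap std {statistics.stdev(values):.3f}")
```

The spread of the bootstrap estimates in `boot` approximates the sampling variability of the Yule-Walker estimates without assuming a distribution for the driving noise.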

3.3 Detection Theory

With today's digital signal processing technology there are wide possibilities to represent information as a data set, which can be defined as $\{x[0], x[1], \ldots, x[L-1]\}$, where $L$ is the number of available samples. The general problem that detection theory tries to solve consists in determining a function $T$ of the signal data set, i.e. $T(x[0], x[1], \ldots, x[L-1])$, and finding how the values of $T$ map to a decision on the presence or absence of the event under analysis. A clear example in biomedical engineering is the detection of a cardiac arrhythmia [27, 28], or, as in this work, the presence of artifacts in EEG signals.

In a detection problem there are generally several hypotheses under consideration, as in an artifact detection system applied to EEG signals that tries to determine the kind of occurrence in the current data (e.g. electrooculogram, electrocardiogram, electromyogram). Due to the data characteristics and the presence of such candidate hypotheses, it is possible to formulate the problem based on statistical hypothesis testing theory [27, 28, 29].

A hypothesis can be defined as a statement about a population parameter. In a hypothesis testing context, there are two complementary hypotheses, which are called the null hypothesis and the alternative hypothesis, denoted by $H_0$ and $H_1$ respectively. If $\theta$ constitutes a population parameter, the general format of the null and alternative hypotheses is $H_0: \theta \in \Theta_0$ and $H_1: \theta \in \Theta_0^c$, where $\Theta_0$ is some subset of the parameter space and $\Theta_0^c$ is its complement [29]. An example for the selection of the hypotheses in a system for detection of artifacts in EEG signals would be the following:

\[ H_0: u_i[n], \quad 0 \le n \le U-1 \qquad \text{vs.} \qquad H_1: u_i[n] + a_k[n], \quad 0 \le n \le U-1 \tag{3.9} \]

where the null hypothesis corresponds to the artifact-free EEG channel $u_i$, and the alternative represents the same channel but with an artifact of type $a_k$. The intention is then to obtain a test statistic $T(X_i)$ (i.e. a detector), where $X_i$ is the sample from channel $u_i$, that can discern between the two options.

A hypothesis test is a rule that defines: a) for which sample values the decision is made to accept $H_0$ as true, and b) for which sample values $H_0$ is rejected and $H_1$ is accepted as true. The subset of the sample space for which $H_0$ is rejected is called the rejection region $R$ or critical region. Its complement is called the acceptance region $R^c$ [29].

There are several different parametric detection techniques, each one suited to a distinct kind of signal and to the environment in which it is immersed. For a deeper description of such methods see [27, 28], where detectors based on the Neyman-Pearson lemma and on Bayesian theory are developed. In [29] several methods for finding tests are presented.

When performing a hypothesis testing procedure two kinds of errors can be committed, namely the Type I error and the Type II error or, in signal processing jargon, a False Alarm and a Miss respectively. The Type I error occurs when the parameter under evaluation $\theta \in \Theta_0$, i.e. $H_0$ is true, but the sample $x \in R$, so the null hypothesis is rejected and $H_1$ is considered true; in detection terms, a false detection, or false alarm, occurs. The probability of a Type I error, or probability of false alarm $P_{FA}$, is defined as

\[ P_{FA} = P(X \in R | H_0) = P(X \in R | \theta \in \Theta_0) \tag{3.10} \]

that is, the probability that the sample $x$ belongs to the rejection region $R$, given that the parameter $\theta$ belongs to the null hypothesis parameter subspace $\Theta_0$. The Type II error, or Miss, happens when the parameter $\theta \in \Theta_0^c$, that is $H_1$ is true, but the sample $x \in R^c$,

and $H_0$ is accepted when it is false. The probability of a Type II error, or probability of miss, is defined as

\[ P_{MISS} = P(X \in R^c | H_1) = P(X \in R^c | \theta \in \Theta_0^c) = 1 - P(X \in R | \theta \in \Theta_0^c) \]

that is, the probability of accepting $H_0$ when it is false or, equivalently, one minus the probability of correctly rejecting $H_0$ when it is not true. From this last expression the probability of detection can be defined as

\[ P_D = P(X \in R | H_1) = P(X \in R | \theta \in \Theta_0^c) \tag{3.11} \]

that is, the probability of appropriately rejecting $H_0$ when it is false. From the three definitions above, it can be observed that the test with rejection region $R$ can be fully described by the function of the parameter $P(X \in R | \theta)$, which has the value $P_{FA}$ if $\theta \in \Theta_0$, and $P_D = 1 - P_{MISS}$ if $\theta \in \Theta_0^c$. This leads to the definition of the following function

\[ \beta(\theta) = P(X \in R | \theta) \tag{3.12} \]

which is named the Power Function and depends on the parameter $\theta$. A good test has a power function near 1 for most $\theta \in \Theta_0^c$ and near 0 for most $\theta \in \Theta_0$.

When looking for a good test, it is common to restrict consideration to tests that control the Type I error probability at a specified level, while obtaining the highest possible values for the detection probability in the region of interest of the parameter. A test with power function $\beta(\theta)$ is said to be a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$, for $0 \le \alpha \le 1$; and it is said to be a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$ [28, 29, 30].

The selection between $H_0$ and $H_1$ is performed based on the test statistic $T(X)$, and the decision is made by means of a threshold $\gamma$ within the possible values of this statistic. For example, for the hypothesis testing problem in (3.9), the detector $T(X)$ will accept $H_0$ for all the values of the statistic in $\Gamma_0$, which are those calculated from the sample values $x \in R^c$, and it will reject $H_0$ for all the values in $\Gamma_0^c$, obtained from the sample values $x \in R$.
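The quantities defined above can be estimated empirically for a toy detector. The sketch below is an illustration under stated assumptions, not the detector developed in this thesis: the test statistic is the mean squared value of a sample, $H_0$ is unit-variance Gaussian noise, $H_1$ adds a constant offset standing in for an artifact, and the threshold is taken as the empirical $(1 - \alpha)$ quantile of the statistic under $H_0$. The names `energy`, `draw`, `OFFSET` and the chosen sizes are hypothetical.

```python
import random


def energy(sample):
    """Toy test statistic T(X): mean squared value of the sample."""
    return sum(value * value for value in sample) / len(sample)


def draw(rng, size, offset):
    """Gaussian noise (H0 when offset is 0) plus a constant offset (H1)."""
    return [rng.gauss(0.0, 1.0) + offset for _ in range(size)]


rng = random.Random(2)
SIZE, TRIALS, ALPHA = 32, 4000, 0.05
OFFSET = 1.0  # hypothetical artifact amplitude under H1

# Threshold: empirical (1 - ALPHA) quantile of T(X) observed under H0.
t_h0 = sorted(energy(draw(rng, SIZE, 0.0)) for _ in range(TRIALS))
gamma = t_h0[TRIALS - round(ALPHA * TRIALS)]

# Fresh trials estimate the power function on each side of the hypotheses.
p_fa = sum(energy(draw(rng, SIZE, 0.0)) > gamma
           for _ in range(TRIALS)) / TRIALS
p_d = sum(energy(draw(rng, SIZE, OFFSET)) > gamma
          for _ in range(TRIALS)) / TRIALS
p_miss = 1.0 - p_d

print(f"gamma = {gamma:.3f}")
print(f"P_FA = {p_fa:.3f}, P_D = {p_d:.3f}, P_MISS = {p_miss:.3f}")
```

The estimated false alarm rate should land close to `ALPHA`, while the detection rate depends on how far the alternative lies from the null, which is the behavior the power function summarizes.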
The threshold $\gamma$ can be chosen by selecting an appropriate size for the test, so for a size $\alpha$ test that rejects for higher values of the test statistic $T(X)$, the following relation can be

(48) 3 Theoretical Background. 27. evaluated P (T (X) > γ|θ ∈ Θ0 ) = α = PF A. (3.13). that is the probability of the test statistic taking a value greater than the threshold, given that H0 is true (false alarm); and the probability of detection can be calculated as PD = P (T (X) > γ|θ ∈ Θc0 ). (3.14). that is the probability of the test statistic being greater than the threshold, given that H1 is true. The works of [27, 28] show an alternative way of presenting the performance of a detector, the Receiver Operating Characteristics (ROC). It is a plot of PD versus PF A . If the test is a good one, then the curve should be above the chance line (i.e. the diagonal) that characterizes the performance of a pure guess; so it can be said that if the ROC curve of a detector is above the curve of another one, in general terms the first one has a better performance than the other. A possible way to measure this aspect is by means of the area under the ROC curve (AuROC), for a greater area a better performance. If PF A equals zero, H0 is always selected so PD = 0. On the other hand, if PF A equals one, then H1 always selected and PD = 1. Each point on the curve corresponds to a value of (PF A , PD ) for a given threshold γ. By adjusting γ any point on the ROC curve may be obtained, and as expected, as γ increases, PF A decreases but so does PD and vice versa [27, 28]. The optimum threshold within a ROC will be considered that which maximizes the PD while keeping the PF A small. To achieve this the threshold selected will be the one that produces the highest deflection of the ROC curve from the diagonal, i.e. the value of γ that yields the point (PF A , PD ) that has the greater distance from the chance line. 3.3.1 Nonparametric Tests Sometimes the PDF from which the observed data is obtained is not known, there is not enough data to estimate the possible distribution, or the data may come from a distribution for which methods are not readily available. 
In such situations where parametric methods are not possible to be implemented is where nonparametric methods come into play [31]..
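As an illustration of the ROC-based threshold choice of Section 3.3 (the point of the curve farthest from the chance line), the following minimal sketch may help. It is not the implementation used in this work: the Gaussian test statistics, the threshold grid and the function names `roc_points`, `best_threshold` and `area_under_roc` are all assumptions made for the example. Since the distance from $(P_{FA}, P_D)$ to the diagonal is $(P_D - P_{FA})/\sqrt{2}$, maximizing $P_D - P_{FA}$ is sufficient.

```python
import random

def roc_points(scores_h0, scores_h1, thresholds):
    """Return (gamma, P_FA, P_D) for every candidate threshold gamma."""
    points = []
    for gamma in thresholds:
        p_fa = sum(t > gamma for t in scores_h0) / len(scores_h0)
        p_d = sum(t > gamma for t in scores_h1) / len(scores_h1)
        points.append((gamma, p_fa, p_d))
    return points

def best_threshold(points):
    """Pick the point with the largest deflection from the chance line."""
    # Distance to the diagonal is (P_D - P_FA) / sqrt(2), so the
    # difference P_D - P_FA can be maximized directly.
    return max(points, key=lambda p: p[2] - p[1])

def area_under_roc(points):
    """Trapezoidal area under the empirical ROC curve (AuROC)."""
    curve = sorted([(0.0, 0.0), (1.0, 1.0)] +
                   [(p_fa, p_d) for _, p_fa, p_d in points])
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Hypothetical test statistics: unit-variance Gaussians, shifted under H1.
rng = random.Random(0)
scores_h0 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
scores_h1 = [rng.gauss(2.0, 1.0) for _ in range(2000)]
thresholds = [i / 10 for i in range(-40, 61)]

points = roc_points(scores_h0, scores_h1, thresholds)
gamma_opt, p_fa_opt, p_d_opt = best_threshold(points)
auroc = area_under_roc(points)
print(gamma_opt, p_fa_opt, p_d_opt, auroc)
```

For this toy setting the selected threshold should fall roughly midway between the two means, where the two distributions cross.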

Nonparametric methods require minimal assumptions about the distribution of the population, while parametric methods require that the form of the population distribution be completely specified except for a finite number of parameters [31]. This gives the nonparametric approach wide flexibility and opens many possibilities for analysis in areas where a parametric description is hard to find or specify. For a varied treatment of nonparametric methods for testing hypotheses or estimating parameters under different conditions on the observed data, refer to [31, 32].

A specific type of nonparametric hypothesis testing procedure is the Chi-Squared test for goodness of fit. The test decides whether certain observed data belong to a given population or not. The problem is reduced to testing a multinomial setting by comparing the observed cell counts with their expected values under $H_0$, that is, under the prospective population that fits the data [33].

To test the simple hypothesis $H_0: x \sim F_X(x)$, i.e. that the random sample $X_1, \ldots, X_L$ has the PDF $F_X(x)$, the domain of $F$ is partitioned into $P$ cells, $C_1, \ldots, C_P$. If $R_1, \ldots, R_P$ are the observed numbers of $X_j$'s in these cells, then $R_l$ has the binomial distribution with parameters $L$ and

\[ p_l = P(X_j \text{ falls in } C_l) = \int_{C_l} F_X(x)\,dx \]

where $1 \le l \le P$ and the null hypothesis is true. A measure of fit can be based on the differences between the observed count in each cell, $R_l$, and the corresponding expected value $L p_l$. For large $L$ the quantities $R_l - L p_l$ can be approximated by a normal distribution and, considering the whole set of quantities, a straightforward approximation is made with a nonsingular multivariate normal distribution of $P - 1$ random variables.

Also, if $\mathbf{m} = [m_1 \ldots m_w]^T$ has a nonsingular $w$-variate normal distribution $N_w(\boldsymbol{\mu}, \mathbf{C})$, then the quadratic form $(\mathbf{m} - \boldsymbol{\mu})' \mathbf{C}^{-1} (\mathbf{m} - \boldsymbol{\mu})$ has a $\chi^2_w$ distribution as a function of $\mathbf{m}$.
Here $\boldsymbol{\mu}$ is the mean vector and $\mathbf{C}$ is the covariance matrix of $\mathbf{m}$. Considering the two previous statements, the statistic

\[ X^2 = \sum_{l=1}^{P} \frac{(R_l - L p_l)^2}{L p_l} \tag{3.15} \]

will have an approximate $\chi^2_{P-1}$ distribution when the number of available samples $L$ is large enough. It is called the Pearson chi-squared statistic [33]. The test rejects $H_0$ if the obtained value has a right tail probability lower than the selected size $\alpha$ under the chi-squared PDF. Special care should be taken with the degrees of freedom when some parameters are estimated; for a broader reference on the chi-squared family of tests refer to [33, 34].

Another way to perform nonparametric statistical tests is the Bootstrap Resampling method. The objective is to estimate the threshold $\gamma_{1-\alpha}$, so that if the test statistic (i.e. the detector) $T(X)$ exceeds this value, the null hypothesis $H_0$ is rejected in favor of the alternative $H_1$. The true value of $\gamma_{1-\alpha}$ is such that $P(T(X) > \gamma_{1-\alpha}) = \alpha$, i.e. $P_{FA} = \alpha$. In general terms, the intention is to obtain several resamples $x^*$ of the observed vector $x$, and then obtain the $(1-\alpha)$-quantile $\gamma^*_{1-\alpha}$ of the test statistic $T(X^*)$ from its distribution, $P(T(X^*) > \gamma^*_{1-\alpha} \mid x) = \alpha$. Then $\gamma^*_{1-\alpha}$ is used as an approximation for the unknown bound $\gamma_{1-\alpha}$, and the bootstrap test decides for $H_0$ if $T(X) \le \gamma^*_{1-\alpha}$ and for $H_1$ otherwise [35].

In [35] a complete Bootstrap Resampling procedure is outlined to obtain the threshold of the test of size $\alpha$ or, in other words, with $P_{FA} = \alpha$, for the process under consideration. The steps are listed next.

1. Generate a bootstrap realization $x^*(b)$. Calculate $T^*_b = T(x^*(b))$. Repeat for $b = 1, \ldots, B$; keep $T^*_1, \ldots, T^*_B$ in storage.
2. Order $T^*_1, \ldots, T^*_B$ with respect to size to get $T^*_{(1)} \le \ldots \le T^*_{(B)}$.
3. Set $\gamma^*_{1-\alpha,B} = T^*_{([(1-\alpha)B])}$, where $[a]$ denotes the largest integer $\le a$.

If $B$ is chosen large enough, $\gamma^*_{1-\alpha,B}$ will get arbitrarily close to $\gamma^*_{1-\alpha}$.
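The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: resamples are drawn i.i.d. with replacement from the observed vector (for correlated data such as EEG a different resampling scheme may be more suitable), the test statistic `mean_power` is a hypothetical choice, and the function name `bootstrap_threshold` is not taken from [35].

```python
import math
import random

def bootstrap_threshold(x, statistic, alpha, n_resamples, rng):
    """Estimate the (1 - alpha)-quantile of T(X*) from B = n_resamples resamples."""
    n = len(x)
    t_star = []
    # Step 1: generate x*(b) and store T*_b = T(x*(b)) for b = 1, ..., B.
    for _ in range(n_resamples):
        # Assumption: i.i.d. resampling with replacement from x.
        resample = [x[rng.randrange(n)] for _ in range(n)]
        t_star.append(statistic(resample))
    # Step 2: order the statistics, T*_(1) <= ... <= T*_(B).
    t_star.sort()
    # Step 3: take the order statistic with index [(1 - alpha) B].
    k = max(1, math.floor((1 - alpha) * n_resamples))
    return t_star[k - 1], t_star  # k is 1-based, the list is 0-based

def mean_power(segment):
    """Hypothetical test statistic: average power of the segment."""
    return sum(v * v for v in segment) / len(segment)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stand-in for observed data
gamma_star, t_star = bootstrap_threshold(x, mean_power, alpha=0.05,
                                         n_resamples=1000, rng=rng)
# Fraction of bootstrap statistics above the threshold (at most alpha).
exceed_fraction = sum(t > gamma_star for t in t_star) / len(t_star)
print(gamma_star, exceed_fraction)
```

A new segment would then be assigned to $H_1$ whenever its statistic exceeds `gamma_star`.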
An important remark is made in [35]: For evaluating quantiles, a larger number $B$ of resamples is needed than in other applications of the bootstrap, as they are essentially determined by only a small fraction of the largest or smallest values of $T^*_1, \ldots, T^*_B$. For $\alpha = 0.05$, e.g., $B = 1{,}000$ usually suffices, for $\alpha = 0.01$, $B = 5{,}000$ or $10{,}000$ are needed. In this work the number of resamples $B$ that will be used is determined by evaluating the mean squared error (MSE) of the threshold estimators when varying $B$.
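The dependence of the threshold estimator on $B$ can be illustrated with a small Monte Carlo experiment. The sketch below is a simplification and not the evaluation carried out in this work: the statistics $T^*_b$ are drawn directly from a standard normal distribution, whose true $(1-\alpha)$-quantile is known, so that the MSE can be computed exactly. The function names and the values of $B$ are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

def quantile_estimate(values, alpha):
    """Order-statistic estimate of the (1 - alpha)-quantile."""
    ordered = sorted(values)
    k = max(1, math.floor((1 - alpha) * len(ordered)))
    return ordered[k - 1]

def threshold_mse(n_resamples, alpha, n_trials, rng):
    """MSE of the quantile estimate built from n_resamples statistics."""
    true_gamma = NormalDist().inv_cdf(1 - alpha)  # known quantile of N(0, 1)
    squared_error = 0.0
    for _ in range(n_trials):
        # Assumption: T*_b ~ N(0, 1) stands in for the bootstrap statistics.
        t_star = [rng.gauss(0.0, 1.0) for _ in range(n_resamples)]
        squared_error += (quantile_estimate(t_star, alpha) - true_gamma) ** 2
    return squared_error / n_trials

rng = random.Random(0)
mse_by_b = {b: threshold_mse(b, alpha=0.05, n_trials=200, rng=rng)
            for b in (50, 200, 1000)}
for b, mse in mse_by_b.items():
    print(b, mse)
```

The MSE is expected to shrink as $B$ grows, which is the behaviour used to select a sufficient number of resamples.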

Chapter 4

The Central Chi-Square Detection Algorithm applied to artifacts in EEG

Due to the existence of many sources of artifacts, and the difficulty of obtaining enough recordings to parametrize them so that the common detection algorithms could be used, a nonparametric detector is developed in this work.

The first problem that arises in EEG characterization is the limited availability of samples, together with the expense incurred by the patient when many hours of recordings are required. The approach in this document is to use a resampling method to generate a sufficient number $B$ of pseudo recordings of length $L$ to obtain the statistical characteristics of the clean EEG. Then, in order to obtain the joint pdf of the samples without knowledge of their individual distributions (since an arbitrary artifact may or may not be present), the power moments and the autocorrelation lag moments of the resamples are calculated with (4.1) and (4.2) respectively. It is important to mention that non-central moments were used in this work because the AR process is zero-mean, but this is not a restriction for future developments; indeed, different classes of moments can be used if their estimators are known.

\[ \nu_{i,j} = \frac{1}{L} \sum_{n=0}^{L-1} (x_j[n])^i \quad \text{for } i \in \mathbf{c},\ 0 \le j \le B-1 \tag{4.1} \]

\[ r_{x_j}(l) = \frac{1}{L-l} \sum_{n=0}^{L-l-1} x_j[n+l]\,(x_j[n])^* \quad \text{for } 0 \le l \ll L;\ 0 \le j \le B-1 \tag{4.2} \]

By the Central Limit Theorem the pdfs of the moment estimators can be considered Gaussian, and so can their joint pdf. The mean vector $\boldsymbol{\mu}$ ($p \times 1$) and the covariance matrix $\mathbf{C}$ ($p \times p$) can therefore be estimated with $B$ realizations by (4.3) and (4.4) respectively, where $\boldsymbol{\nu}_j = [\nu_{c_1,j}, \ldots, \nu_{c_{p_m},j}]^T$, $\mathbf{r}_{x_j} = [r_{x_j}(b_1), \ldots, r_{x_j}(b_{p_l})]^T$, and $p_m + p_l = p$. The vectors $\mathbf{c}$ ($p_m \times 1$) and $\mathbf{b}$ ($p_l \times 1$) contain the selected moments and correlation lags to be used with the detector (e.g. $\mathbf{c} = [1\ 3]^H$ and $\mathbf{b} = [2\ 4]^H$ means that the moments $\nu_1$, $\nu_3$, $r_x(2)$ and $r_x(4)$ are to be used).

\[ \hat{\boldsymbol{\mu}} = \frac{1}{B} \sum_{j=0}^{B-1} \begin{bmatrix} \boldsymbol{\nu}_j \\ \mathbf{r}_{x_j} \end{bmatrix} \tag{4.3} \]

\[ \hat{\mathbf{C}} = \frac{1}{B} \sum_{j=0}^{B-1} \begin{bmatrix} \boldsymbol{\nu}_j \\ \mathbf{r}_{x_j} \end{bmatrix} \begin{bmatrix} \boldsymbol{\nu}_j \\ \mathbf{r}_{x_j} \end{bmatrix}^H \tag{4.4} \]

The detector used with this method is $\gamma(\mathbf{m}) = (\mathbf{m} - \hat{\boldsymbol{\mu}})^H \hat{\mathbf{C}}^{-1} (\mathbf{m} - \hat{\boldsymbol{\mu}})$, which under $H_0$ can be considered to have a chi-square distribution with $p$ degrees of freedom [27]. Under $H_1$ it has an unknown distribution, but the values of $\gamma$ will tend to be larger than those obtained under $H_0$. This has the form of a nonparametric Chi-Squared Test, which determines that there are significant differences between the observed data and those stated in $H_0$ when the value of $\gamma$ exceeds a certain threshold [32, 33, 34]. This critical value can be obtained by setting the probability of false alarm ($P_{FA} = \alpha$) as the right tail probability of the $\chi^2_p$ distribution. For an evaluation of the performance of this detector refer to Chapter 5.

4.1 Adaptive Threshold

The proposed algorithm uses the adaptive nonparametric approach described in [35] to obtain the characterization of the detector and the threshold value, instead of solving for the right tail probability of the $\chi^2_p$ distribution, because it is more accurate given the observed values and does not rely on the chi-square assumption. The idea of making this procedure adaptive is to enable it to recalculate the most accurate threshold for each EEG signal segment, given that these data are only quasi-stationary [1]. The steps to get the threshold are:

1. Characterize the EEG signal, i.e. obtain the estimates $\hat{\boldsymbol{\mu}}$ and $\hat{\mathbf{C}}$.
2. Obtain the threshold $\gamma_{1-\alpha}$ for the $1 - \alpha$ quantile.

It is important to notice that if the threshold is selected from the right tail probability of the $\chi^2_{p_m+p_l}$ distribution, i.e. as $\chi^2_{p_m+p_l,\,1-\alpha}$, which should give good results, the process reduces to the estimation of $\boldsymbol{\mu}$ and $\mathbf{C}$.

4.1.1 Ideal threshold within a segment

The ideal threshold is the one that shows the largest distance from the diagonal to the ROC curve, because it maximizes the probability of detection ($P_D$) under the constraint of minimizing the probability of false alarm ($P_{FA}$). Throughout most of this work, unless otherwise noted, the EEG signal will be considered a quasi-stationary pseudo-AR process that can be represented with AR parameters within a given segment of time, despite certain modeling errors. If the EEG is considered to have AR characteristics, then the values of $\boldsymbol{\mu}$ and $\mathbf{C}$ are closely related to the AR parameters $\mathbf{a} = [1\ a_1\ \ldots\ a_M]^T$, where $M$ is the order of the AR model. The Background section shows the relation between these parameters.

For example, considering the set of non-central moments (i.e. moments about zero) $\mathbf{c} = [2]^H$, $\mathbf{b} = [1\ 5]^H$ and the AR model order $M = 1$ (in Section 5.2.1 it is verified by trial and error that this is the Best Set of Moments among 15 pre-established sets), the values of $\boldsymbol{\mu}$ and $\mathbf{C}$ will be given by

\[ \boldsymbol{\mu} = \begin{bmatrix} E\{\hat{\nu}_2\} \\ E\{\hat{r}_x(1)\} \\ E\{\hat{r}_x(5)\} \end{bmatrix} \tag{4.5} \]

\[ \mathbf{C} = \begin{bmatrix} \mathrm{Cov}(\hat{\nu}_2, \hat{\nu}_2) & \mathrm{Cov}(\hat{\nu}_2, \hat{r}_x(1)) & \mathrm{Cov}(\hat{\nu}_2, \hat{r}_x(5)) \\ \mathrm{Cov}(\hat{r}_x(1), \hat{\nu}_2) & \mathrm{Cov}(\hat{r}_x(1), \hat{r}_x(1)) & \mathrm{Cov}(\hat{r}_x(1), \hat{r}_x(5)) \\ \mathrm{Cov}(\hat{r}_x(5), \hat{\nu}_2) & \mathrm{Cov}(\hat{r}_x(5), \hat{r}_x(1)) & \mathrm{Cov}(\hat{r}_x(5), \hat{r}_x(5)) \end{bmatrix} \tag{4.6} \]
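To make the structure of $\boldsymbol{\mu}$, $\mathbf{C}$ and the detector $\gamma(\mathbf{m})$ concrete for the example set $\mathbf{c} = [2]$, $\mathbf{b} = [1\ 5]$, a minimal numerical sketch follows. It rests on several assumptions that are not part of the method as stated: independently simulated AR(1) segments stand in for the $B$ pseudo recordings, the AR coefficient `phi`, the segment length and the step-like artifact are hypothetical, the covariance is computed about the sample mean, and the threshold is read from the empirical $(1-\alpha)$-quantile of the detector values on the clean segments.

```python
import numpy as np

def features(x, moments, lags):
    """Stack the power moments (4.1) and autocorrelation lags (4.2) of a segment."""
    L = len(x)
    nu = [np.mean(x ** i) for i in moments]
    r = [np.sum(x[l:] * np.conj(x[:L - l])) / (L - l) for l in lags]
    return np.real(np.array(nu + r))

def simulate_ar1(n, phi, rng, burn_in=200):
    """Hypothetical clean signal: x[k] = phi * x[k-1] + w[k], w ~ N(0, 1)."""
    w = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = w[0]
    for k in range(1, n + burn_in):
        x[k] = phi * x[k - 1] + w[k]
    return x[burn_in:]  # drop the transient

def characterize(segments, moments, lags):
    """Estimate mu and C from the feature vectors of the clean segments."""
    F = np.array([features(s, moments, lags) for s in segments])  # B x p
    mu_hat = F.mean(axis=0)
    centered = F - mu_hat  # assumption: covariance about the sample mean
    C_hat = centered.T @ centered / len(F)
    return mu_hat, C_hat, F

def detector(m, mu_hat, C_hat):
    """Quadratic form gamma(m) = (m - mu)^H C^{-1} (m - mu)."""
    d = m - mu_hat
    return float(d @ np.linalg.solve(C_hat, d))

rng = np.random.default_rng(0)
moments, lags = [2], [1, 5]
L, B, alpha = 500, 400, 0.05

clean = [simulate_ar1(L, 0.8, rng) for _ in range(B)]
mu_hat, C_hat, F = characterize(clean, moments, lags)
gammas_h0 = np.array([detector(f, mu_hat, C_hat) for f in F])
threshold = np.quantile(gammas_h0, 1 - alpha)  # empirical threshold

test_clean = simulate_ar1(L, 0.8, rng)
test_artifact = test_clean.copy()
test_artifact[200:260] += 8.0  # hypothetical step-like artifact
g_clean = detector(features(test_clean, moments, lags), mu_hat, C_hat)
g_artifact = detector(features(test_artifact, moments, lags), mu_hat, C_hat)
print(threshold, g_clean, g_artifact)
```

In this toy setting the contaminated segment yields a detector value far above the threshold, while a clean segment is expected to exceed it only with probability close to $\alpha$.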