
Automatic epileptic seizure detection in EEG based on convolutional neural networks


UNDERGRADUATE FINAL PROJECT

Presented to

UNIVERSIDAD DE LOS ANDES

FACULTY OF ENGINEERING

DEPARTMENT OF BIOMEDICAL ENGINEERING

To obtain the degree of

BIOMEDICAL ENGINEER

by

Rafael Cuperman Coifman

Automatic epileptic seizure detection in EEG based on convolutional neural networks

Defended on December 11, 2015, before the jury:

Jury composition

- Advisors: Mario Andrés Valderrama Manrique, PhD, Universidad de Los Andes, Bogotá, Colombia

Michel Le Van Quyen, PhD, ICM, Paris, France


Acknowledgments

First of all, I thank G-d for giving me the opportunity to succeed and achieve my goals. To my parents, brother and family, not only for believing in me and supporting me, but also for being my best friends. To all my friends, for the amazing times and for holding me up when things were not going well. To Mario Valderrama and Michel Le Van Quyen, for their help, guidance and recommendations, which allowed this project to be what it is. To the Charpier Lab team, the BioSerenity staff and all the people at the ICM who gave me ideas and advice: I learned a lot! And finally, to this amazing city called Paris, which left me speechless. To all the people who directly or indirectly made me who I am,

Merci Beaucoup!


Contents

1 Introduction

1.1 Introduction

1.2 Objectives

1.2.1 General Objectives

1.2.2 Specific Objectives

2 Description and motivation

3 Theoretical and Conceptual Framework

3.1 Theoretical and Conceptual Framework

3.1.1 Epilepsy [4]

3.1.2 EEG [4]

3.1.3 Wavelet Transform and Spectrograms [10]

3.1.4 ICA [7]

3.1.5 Supervised Learning [20]

3.1.6 Convolutional Neural Networks [8]

3.2 Related works

4 Work methodology

4.1 Work plan

5 Algorithm development

5.1 Initial objective modification

5.2 Epilepsiae database

5.3 Time-frequency representation: spectrograms

5.4 CHB-MIT database [22], [5]

5.5 Independent Component Analysis (ICA)

5.6 CNN with spectrograms as inputs

5.7 Raw signal images

5.8 CNN with raw signal images as inputs

5.9 Evaluation

6 Results and Discussion

6.1 Time-frequency representation: spectrograms

6.2 Independent Component Analysis (ICA)

6.3 CNN with spectrograms as inputs

6.4 CNN with raw signal images as inputs

6.4.1 First layer

6.4.2 Second layer

6.5 Evaluation


7 Conclusions and future work

7.1 Conclusions

7.2 Future work

References


Figure index

3.1 Epileptic crisis recorded with EEG [21]

3.2 International 10-20 EEG configuration [23]

3.3 Example of time-frequency representation of a signal [1]

3.4 Basic architecture of a CNN. Convolutional and pooling layers one after the other, followed by fully connected layers [13]

3.5 Filters learned in a facial recognition problem with CNN. Deeper filters learn more complex patterns [14]

5.1 Examples of crisis (left) and non-crisis (right) windows for the CHB-MIT database

5.2 Dimensions of the resulting spectrogram images for each window

5.3 General strategy for building and training the CNN: one big layer at a time

6.1 Signal and spectrograms with db wavelets

6.2 Spectrograms with sym wavelets

6.3 Spectrograms with gaus wavelets

6.4 Spectrograms with coif and haar wavelets

6.5 Spectrograms with mexh, meyr, dmey and gabor wavelets

6.6 Examples of crisis and non-crisis windows before and after ICA

6.7 Examples of crisis and non-crisis spectrograms before and after ICA. The top row shows the windows without ICA and the bottom row the windows after ICA; the left column shows crisis windows and the right column non-crisis windows.

6.8 Examples of the two cases obtained when training the CNN with spectrograms: overfitting (left) and no learning (right)

6.9 Input images for the second approach: raw image windows

6.10 Mean images for crisis windows (left) and non-crisis windows (right)

6.11 One big layer architecture for the CNN

6.12 Images in between the layers

6.13 Blurry horizontal bands in images generated from crisis windows

6.14 Generation of vectors from the 50×50 images by taking the standard deviation of each column

6.15 Standard deviation vectors produced by standard-deviation pooling of the 50×50 images. Each column is a vector, and the two classes (seizure and non-seizure) are separated by the red vertical dashed line.

6.16 Mean training and validation curves obtained with 10 different randomly selected sets of training and validation windows

6.17 Sensitivity obtained with the final classifier under the leave-one-out training method

6.18 Specificity obtained with the final classifier under the leave-one-out training method


Table index

3.1 Related works

6.1 Performance in training and validation of the CNN vs. convolutional filter size

6.2 Performance in training and validation of the CNN vs. number of convolutional filters


Chapter 1

Introduction

1.1 Introduction

In this document, the problem of Automatic Epileptic Seizure Detection is studied using a technique well known in image processing and analysis: Deep Learning. Fast and accurate automatic detection of crises and other epileptic events is currently of great interest, because it can help diagnose and treat this disease faster and more efficiently; for that reason it is an area of constant study and investigation by engineers and physicians. There are many ways to approach this problem; in this specific case, machine learning tools for images are used.

Many authors and researchers have studied this problem in different ways, with special interest in the manual extraction of features that are fed into a classifier, which makes the final decision. Classifiers can be built with many different Machine Learning techniques (SVM, neural networks, decision forests, HMM, etc.). The analysis of the Automatic Epileptic Seizure and Biomarker Detection problem can therefore be divided into two tightly related sub-problems: feature extraction and classification. In this document, a method based on Convolutional Neural Networks is studied, which unifies both sub-problems into a single one: the final classification and detection of those epileptic activities.

1.2 Objectives

1.2.1 General Objectives

Develop computational and signal-processing strategies for the analysis of epileptic electrophysiological data, in order to enable a better diagnosis of epilepsy.

1.2.2 Specific Objectives

• Review the literature on existing epileptic biomarkers and the methods for their detection and analysis.

• Develop algorithms for the detection of epileptic seizures based on existing databases and convolutional neural networks.

• Evaluate the developed methods according to their performance in different types of contexts.

• Implement methods for real-time detection of epileptic seizures based on pre-annotated measurements.


Chapter 2

Description and motivation

As defined by the Mayo Clinic, “epilepsy is a central nervous system disorder (neurological disorder) in which nerve cell activity in the brain becomes disrupted, causing seizures or periods of unusual behavior, sensations and sometimes loss of consciousness.” [15] This neurological condition affects around 65 million people worldwide (around 1% of the world population), causing seizures that can seriously endanger their lives and affect their lifestyle. The key point in the diagnosis and treatment of this disorder is its correct identification from EEG (electroencephalogram) records, currently the principal clinical exam for the diagnosis of epileptic patients. Scalp EEG records, at the scalp, the electrical potentials generated by the brain; because these signals are measured superficially, they have to cross thick layers of skin, skull and other tissues, which sometimes makes them difficult for the specialist to interpret.

For these reasons, the analysis of EEG records is a difficult and sometimes controversial procedure that must be performed by highly specialized physicians. Traditionally it is done by visual examination of the signals, which requires a significant amount of time, resources and specialists. Moreover, the records are usually contaminated with noise and artifacts due to patient movements or poor electrode contact. These conditions are important barriers to an accurate and early diagnosis of epileptic patients, making them more vulnerable to advanced stages of the disease and to complications. It is estimated that nearly 20% of patients diagnosed with epilepsy actually have non-epileptic events and are misdiagnosed for these reasons. [11]

Recent advances in computational algorithms trained on large datasets make it possible to develop automatic methods for analyzing EEG records, making the diagnosis of epilepsy faster and easier. These methods may provide additional tools for detecting important biomarkers in real time, giving deeper information about the events occurring in the brain during epileptic activity. With that information, doctors may be able to make better decisions about treatments, possibly making far fewer errors in drug prescription. These algorithms may also generate automatic alarms based on the real-time identification of abnormal electrical activity, preventing risky situations. For these reasons, the development of computational strategies for the real-time analysis of epileptic activity is an important area of study that promises significant medical benefits.


Chapter 3

Theoretical and Conceptual Framework

3.1 Theoretical and Conceptual Framework

3.1.1 Epilepsy [4]

“What is epilepsy?” is a highly debated question in the medical and scientific community. It is well known that epilepsy is one of the most common disorders of the brain [26], and there is evidence that ancient cultures knew about this disease: the oldest medical reference to epilepsy comes from the Assyrian-Babylonian civilization, around 1050 BCE [25]. It affects around 1% of the population worldwide. To be diagnosed with epilepsy, a person must experience at least two unprovoked seizures occurring more than 24 hours apart. [17]

The Mayo Clinic defines epilepsy as “a central nervous system disorder (neurological disorder) in which nerve cell activity in the brain becomes disrupted, causing seizures or periods of unusual behavior, sensations and sometimes loss of consciousness.” [15] The best-known manifestations of this illness are seizures. During an epileptic crisis, the abnormal neural electrical discharges in the brain may affect the person's muscles, provoking uncontrollable contractions that produce these highly noticeable seizures. Nevertheless, epilepsy is more than seizures: it is “a chronic condition of the brain characterized by an enduring propensity to generate epileptic seizures, and by the neurobiological, cognitive, psychological, and social consequences of this condition” [19]. This means that a person has epilepsy not only while seizures occur, but also between seizures, together with all the social and psychological problems that come with it.

Epileptic crises can manifest as seizures, which are highly noticeable, but the person can also experience other, more frequent symptoms, such as absences lasting a few seconds. During an absence, the person interrupts all activity and stares at a distant point without being aware of it. Normally, after the absence, the person resumes the previous activity without even noticing what has happened. Epilepsy can be very dangerous not only socially and psychologically, but also physically, because a crisis can occur at any moment, endangering the person and those nearby if it happens at an unfortunate place and time (for example, while driving).

3.1.2 EEG [4]

There are many ways to diagnose and analyze epileptic brain activity, but the principal techniques used for this purpose study the electric and magnetic fields produced by brain cells. Among them, the most popular is the electroencephalogram (EEG), which records the electrical activity of the brain.

These signals are recorded with electrodes, but to reach them the electrical activity has to pass through different layers of tissue and is affected by interference and other problems that can make the records confusing. Because of this, the analysis of these signals can be very challenging and has to be done by experienced specialists. Despite these problems, EEG is a very powerful instrument for the diagnosis and study of epilepsy. Figure 3.1 shows a typical epileptic crisis recorded with EEG.

Figure 3.1: Epileptic crisis recorded with EEG [21]

There are many different ways to place the electrodes when performing an EEG on a patient. The most widely used is the international 10-20 system (figure 3.2). This configuration is internationally recognized and was developed and documented to make EEGs reproducible and standardized. Each electrode position is named with a letter and, almost always, a number. Even numbers refer to the right hemisphere, odd numbers to the left hemisphere, and the letter z (zero) corresponds to the midline. F stands for frontal, T for temporal, C for central, P for parietal and O for occipital.
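The naming rule above can be expressed as a small lookup. The following Python sketch is purely illustrative: the function name `describe_electrode` is hypothetical (not part of any EEG toolbox) and, as an assumption, it only handles the single-letter prefixes listed in the text, rejecting compound labels such as Fp1.

```python
# Lobe letters as listed in the text; compound prefixes (e.g. "Fp") are out of scope.
REGIONS = {"F": "frontal", "T": "temporal", "C": "central", "P": "parietal", "O": "occipital"}

def describe_electrode(name):
    """Return (lobe, hemisphere) for a 10-20 label such as 'F3', 'Cz' or 'T8'."""
    lobe_letter, suffix = name[0].upper(), name[1:]
    if lobe_letter not in REGIONS or not suffix:
        raise ValueError(f"unsupported electrode label: {name!r}")
    if suffix.lower() == "z":            # z (zero): midline electrodes
        hemisphere = "midline"
    elif suffix.isdigit():               # even -> right, odd -> left
        hemisphere = "right" if int(suffix) % 2 == 0 else "left"
    else:
        raise ValueError(f"unsupported electrode label: {name!r}")
    return REGIONS[lobe_letter], hemisphere

print(describe_electrode("T8"))  # ('temporal', 'right')
```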

It is also possible to use a bipolar montage (for example, the so-called banana montage), in which each channel is computed as the difference between two neighboring electrodes. This is normally done to reduce the noise common to both electrodes.
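As a minimal sketch of why the subtraction helps, assume (hypothetically) that two neighboring electrodes pick up exactly the same additive noise: the bipolar channel then cancels it. The helper name `bipolar_montage`, the electrode pair and the toy signals are illustrative assumptions, not data from this project.

```python
import numpy as np

def bipolar_montage(referential, pairs):
    """Build bipolar channels as differences of neighboring electrodes.

    referential: dict mapping electrode name -> 1-D signal array
    pairs:       list of (first, second) electrode names, e.g. ("F7", "T7")
    returns:     dict mapping "first-second" -> referential[first] - referential[second]
    """
    return {f"{a}-{b}": np.asarray(referential[a], dtype=float)
                        - np.asarray(referential[b], dtype=float)
            for a, b in pairs}

# Toy example: both electrodes share the same noise component.
rng = np.random.default_rng(0)
t = np.arange(0, 1, 1 / 256)
common_noise = rng.normal(size=t.size)
signals = {
    "F7": np.sin(2 * np.pi * 10 * t) + common_noise,
    "T7": 0.5 * np.sin(2 * np.pi * 3 * t) + common_noise,
}
bipolar = bipolar_montage(signals, [("F7", "T7")])  # the shared noise cancels out
```

Real noise is rarely identical at two sites, so in practice the reduction is partial rather than exact.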

3.1.3 Wavelet Transform and Spectrograms [10]

The Wavelet Transform is a signal processing and analysis technique that transforms a signal into another representation of itself. Using the wavelet transform, the time-frequency map of the signal can be extracted, showing how its frequency spectrum varies over time (figure 3.3).


Figure 3.2: International 10-20 EEG configuration [23]

The Wavelet Transform decomposes the signal using specific functions called mother wavelets. The general definition of the Wavelet Transform is

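$$F(a, b) = \int_{-\infty}^{\infty} f(x)\, \psi_{(a,b)}(x)\, dx$$

where $f(x)$ is the original signal and $\psi_{(a,b)}(x)$ is the mother wavelet $\psi$ dilated by the scale $a$ and shifted by the translation $b$, commonly written as $\psi_{(a,b)}(x) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{x-b}{a}\right)$. In this way, the signal $f(x)$ is transformed into $F(a, b)$.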

An easy way to understand this transform is to think of it as many convolutions of the original signal with different versions of the mother wavelet. The original signal is convolved with the mother wavelet and the result is stored as scale 1. Then the mother wavelet is scaled (that is, stretched in time), a new convolution with the original signal is computed, and the result is saved as scale 2. The mother wavelet is scaled again and the process is repeated many times, giving one convolution per scale. Finally, all the results are stacked on top of each other, building the final spectrogram (figure 3.3).
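The procedure just described (one convolution per scale, stacked row by row) can be sketched in a few lines of NumPy. This is not the MATLAB code used in this project; as assumptions, it uses the Mexican hat (mexh) mother wavelet, the normalization $\psi_{(a,0)}[k] = a^{-1/2}\,\psi(k/a)$, integer scales, and hypothetical function names.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit L2 energy."""
    return 2.0 / (np.sqrt(3.0) * np.pi ** 0.25) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt_scalogram(signal, scales):
    """Stack one convolution per scale: row i holds the coefficients at scales[i]."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    rows = []
    for a in scales:
        half = min(int(np.ceil(5 * a)), (n - 1) // 2)  # keep the kernel shorter than the signal
        k = np.arange(-half, half + 1)
        kernel = mexican_hat(k / a) / np.sqrt(a)        # wavelet stretched by the scale a
        # The Mexican hat is symmetric, so convolution equals correlation here.
        rows.append(np.convolve(signal, kernel, mode="same"))
    return np.vstack(rows)                               # shape: (len(scales), len(signal))

fs = 256                                  # sampling rate in Hz (illustrative)
t = np.arange(0, 8, 1 / fs)
scales = np.arange(1, 33)
scalogram = cwt_scalogram(np.sin(2 * np.pi * 8 * t), scales)
energy_per_scale = (scalogram ** 2).sum(axis=1)
best_scale = scales[energy_per_scale.argmax()]
```

For a pure sine, the row with the most energy sits at a scale inversely related to the sine's frequency: doubling the frequency roughly halves the best scale.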

3.1.4 ICA [7]

Independent Component Analysis (ICA) is a signal processing technique used to separate a signal into its independent sources. Acquired signals, especially EEG records, often come from different and independent sources, but because of the acquisition method their characteristics are mixed. ICA allows these mixed signals to be decomposed into the (ideally) independent sources that generated them. The strong assumption made by this method is that the resulting signals (or sources) are mutually independent. Since this technique is not crucial to the development of this project, its details will not be discussed. For more detailed information, refer to [7].
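For readers who want a concrete picture, the following is a minimal NumPy sketch of one common ICA algorithm, symmetric FastICA with a tanh contrast function. It is an illustrative assumption, not necessarily the ICA variant used in this project; as with any ICA, the sources are recovered only up to order, sign and scale. The toy sources and mixing matrix are hypothetical.

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh contrast.

    X: (n_channels, n_samples) mixed signals.
    Returns estimated sources, defined only up to order, sign and scale.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)             # centre each channel
    d, E = np.linalg.eigh(np.cov(X))                   # eigen-decomposition of the covariance
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X        # whitened data (identity covariance)
    n, m = Z.shape
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update for all rows: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = G @ Z.T / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W keeps the rows orthonormal
        s, U = np.linalg.eigh(W_new @ W_new.T)
        W = U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W_new
    return W @ Z

# Toy example: two synthetic sources mixed into two "channels".
t = np.linspace(0, 1, 2000, endpoint=False)
sources = np.vstack([np.sin(2 * np.pi * 5 * t),      # smooth oscillation
                     (3 * t) % 1.0 - 0.5])            # sawtooth
mixing = np.array([[1.0, 0.5], [0.7, 1.0]])           # hypothetical mixing matrix
estimated = fast_ica(mixing @ sources)
```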

3.1.5 Supervised Learning [20]

Supervised learning is an area of Machine Learning that aims to infer the structure of some training data from their classes or labels. It differs from unsupervised learning principally in that, in the latter, the labels of the data are not known in advance. The idea of supervised learning techniques is to build a function capable of separating the data into their different classes. Once this function is constructed, it is possible to predict the category of a new, unseen data point and assign it to one of the predefined groups by giving it the corresponding label. Supervised learning problems can be divided into two main groups: regression problems, in which the output of the constructed function is a continuous numerical value; and classification problems, in which the output of the constructed function is directly the (discrete) label of the input data, determined by the structure learned from the training data. In this project, a binary classification problem is tackled with a supervised learning algorithm; binary means that there are only two possible classes.

In other words, in a supervised learning problem we are given a set of training data in which each data point belongs to a pre-assigned class, $\{(x_i, y_i)\}_{i=1}^{n} \subset \mathcal{X} \times \{-1, 1\}$. The process of pre-assigning these labels is called labeling.

The objective is then to construct a classifier $C$ that minimizes the classification error on data outside the set already seen, that is, minimizing

$$L(C) = P\{\, y \neq C(x) \,\},$$

the probability of misclassification, which equals the expected value of the indicator function $\mathbb{1}\{y \neq C(x)\}$.

To approach this type of problem, three different data sets are used: a training set, a validation set and a test set. The training set is used to construct the classification model C; the validation set is used to validate the model while adjusting different parameters, in order to avoid, as much as possible, overfitting the model to the training data; and finally, the test set contains data points seen neither in training nor in validation, and is used to estimate the real performance of the classifier C on unseen data.
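The three-way split and the error $L(C)$ can be made concrete with a short NumPy sketch. The split fractions (60/20/20) and the function names are illustrative assumptions, not values taken from this project; the empirical error is simply the fraction of points with $y \neq C(x)$.

```python
import numpy as np

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Randomly partition range(n) into disjoint training / validation / test indices."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(frac_train * n))
    n_val = int(round(frac_val * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def empirical_error(y_true, y_pred):
    """Fraction of points with y != C(x): an empirical estimate of L(C)."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

train_idx, val_idx, test_idx = split_indices(100)
err = empirical_error([1, -1, 1, 1], [1, 1, 1, -1])   # two of four labels are wrong
```

With EEG, windows cut from the same recording are strongly correlated, so a split by patient or by recording is often preferred to a purely random split of windows.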


3.1.6 Convolutional Neural Networks [8]

Convolutional Neural Networks (CNN) are the models used in this project to build classifiers capable of discriminating between an epileptic crisis and non-epileptic activity. A CNN is an architecture based on artificial feed-forward neural networks and inspired by human vision. Hubel and Wiesel [6] discovered that the neurons responsible for vision and for the recognition of forms and figures have receptive fields, and that each cell responds to a specific simple pattern. The combination of these simple patterns gives more complex representations, which allows the brain to process and interpret visual information. In other words, complex images and forms are composed of combinations of simple patterns, recognized by cells with specific receptive fields.

This type of artificial neural network tries to mimic that behavior. CNNs were introduced in 1989 by Yann LeCun [12] and, after further development, began to show excellent results in computer vision classification problems. Without going into too much detail, since CNNs are a widely investigated area, the general idea of these models rests principally on two main ideas, implemented by two types of layers (figure 3.4):

Figure 3.4: Basic architecture of a CNN. Convolutional and pooling layers one after the other, followed by fully connected layers [13]

1. Weight sharing and feature maps via convolutional layers: in this layer, a small filter is convolved with the input image. The filter is placed at one position of the image and computes the convolution over that specific area; then it moves to another position and computes the convolution there, and so on over the whole input image, producing a new, filtered image called a feature map. The important point is that all the convolutions over the whole input image use the same filter weights. This is called weight sharing. Many filters are used (all of the same size but with different weights), so many feature maps of the input image are formed (one per filter). All the feature maps are stacked along the depth dimension, forming the output image of this layer. When the CNN is trained, the network learns the weights of each filter, building filters that respond to specific patterns.

2. Pooling layers: two types of pooling layers are normally used, max pooling and mean pooling, the former being the more popular. Small rectangular blocks are taken from the input image of this layer, and the maximum (or mean) value of the elements in each block is extracted. This process has two major benefits: it reduces the dimensions of the feature maps, and it enlarges the receptive fields. After this layer, the feature maps are smaller, but each pixel of those maps carries information about several pixels of the feature maps of the previous layer.
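The two layer types just described can be sketched in plain NumPy for a single-channel image. This is an illustrative implementation, not MatConvNet's; as in most deep-learning libraries, the filter is slid without flipping (strictly a cross-correlation), and the 2×2 pooling size and function names are assumptions.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """'Valid' 2-D convolution layer with weight sharing.

    image:   (H, W) single-channel input
    filters: (K, h, w), each filter reused at every position (weight sharing)
    biases:  (K,)
    returns: (H - h + 1, W - w + 1, K), feature maps stacked along depth
    """
    K, h, w = filters.shape
    H, W = image.shape
    out = np.empty((H - h + 1, W - w + 1, K))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            # The same K filters are applied at every (i, j).
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2], [0, 1])) + biases
    return out

def max_pool(maps, size=2):
    """Non-overlapping size x size max pooling on (H, W, K) feature maps."""
    H, W, K = maps.shape
    H2, W2 = H // size, W // size
    trimmed = maps[:H2 * size, :W2 * size, :]
    return trimmed.reshape(H2, size, W2, size, K).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
filters = np.stack([np.ones((3, 3)), -np.ones((3, 3))])   # two 3x3 filters
maps = conv_layer(image, filters, biases=np.zeros(2))     # shape (4, 4, 2)
pooled = max_pool(maps, size=2)                           # shape (2, 2, 2)
```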

Besides these two main layers, other types of layers may be used, such as non-linearity, normalization or fully connected layers.

At the end of the network, a fully connected layer is used, which works as a conventional neural network and predicts the final label. The power of CNNs is that the same network learns to extract features from the input image and to classify it at the same time. When a CNN is trained, the weights of each filter are learned, making each filter an “expert” in recognizing a specific shape. Filters in the first layers learn simple patterns, while filters in deeper layers learn more complex forms. This is illustrated in figure 3.5.
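The final stage can be sketched as a flatten, a dense layer and a softmax that turns two scores (crisis / non-crisis) into probabilities. The softmax output, the two-class layout and the function name are illustrative assumptions; the text only states that a fully connected layer predicts the label.

```python
import numpy as np

def fully_connected_softmax(feature_maps, weights, bias):
    """Flatten the last feature maps and map them to class probabilities.

    feature_maps: any shape, flattened to a vector of length D
    weights:      (n_classes, D); bias: (n_classes,)
    """
    v = np.ravel(feature_maps)
    scores = weights @ v + bias
    scores = scores - scores.max()          # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Hypothetical weights: class 0 responds to the summed activations, class 1 ignores them.
weights = np.vstack([np.ones(4), np.zeros(4)])
probs = fully_connected_softmax(np.ones((2, 2, 1)), weights, np.zeros(2))
predicted_label = int(np.argmax(probs))
```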

3.2 Related works

When working on a classification or recognition problem, there are two main steps: feature extraction, where some characteristics of the input signals or vectors are extracted; and classification, where a classifier is constructed from those extracted features. One of the big benefits of CNNs is that they perform both steps at the same time, automatically. Nevertheless, since CNNs have not yet been widely explored in epileptic EEG signal analysis, the feature extraction techniques traditionally used in this context can be classified as [2]:

• Mimetic methods: they try to imitate the visual interpretation of EEG records made by doctors. Some descriptors are extracted from the signals, like slope, duration, amplitude, vertex angle and sharpness.

• Morphological methods: they extract information such as statistical behavior, spectrum and time-frequency components.

• Other methods, like template matching, component analysis, raw signal analysis, non-linear features, and so on.

Then, many different techniques can be used to construct the classifier, for example neural networks, SVM, linear classifiers, random forests or nearest neighbors.
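As an illustration of this classical two-step approach, the sketch below computes a few of the simple statistical descriptors mentioned in this section and in Table 3.1 (mean, standard deviation, maximum, minimum and energy) for one EEG window; the resulting vector would then be fed to a separate classifier. The function name and the definition of energy as the sum of squared samples are assumptions, and for simplicity the statistics are taken on the raw window rather than on wavelet sub-bands.

```python
import numpy as np

def window_features(window):
    """Return [mean, std, max, min, energy] for a 1-D EEG window."""
    x = np.asarray(window, dtype=float)
    return np.array([x.mean(), x.std(), x.max(), x.min(), np.sum(x ** 2)])

features = window_features([1.0, -1.0, 1.0, -1.0])
```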

Since this study focused on using CNNs (which have not yet been widely explored in this context) and on obtaining results on a specific database (the CHB-MIT database), it is important to compare the proposed algorithm with methods developed and tested on the same database. Table 3.1 summarizes three related works.


Figure 3.5: Filters learned in a facial recognition problem with CNN. Deeper filters learn more complex patterns [14]


Table 3.1: Related works

| Work | Feature extraction method | Classifier | Average sensitivity | Average specificity |
|---|---|---|---|---|
| Shoeb 2009 [22] | Spectral, spatial and temporal features using frequency filters | SVM with RBF kernel | 96% | Not reported |
| Nabeel et al. 2014 [16] | Energy, entropy, standard deviation, mean, maximum and minimum from wavelet decomposition | Linear classifier | 98.5% | Not reported |
| Kiranyaz et al. 2014 [9] | Many morphological, time, frequency, time-frequency, non-linear and MFCC features combined | CNBC (Collective Network | | |


Chapter 4

Work methodology

4.1 Work plan

The Gantt diagram in Appendix A details the activities proposed at the proposal phase of the study, with their respective durations and sequence. Although some details had to be changed, the general work plan was followed in practice.

The Common BioElectric laboratory in Paris is a collaborative partnership between the ICM (Institute for Brain and Spinal Cord) and the company BioSerenity, with the common objective of developing automated real-time EEG analysis solutions for epilepsy. This laboratory is very interested in developing methods based on the ICM's epilepsy research, with the help of large databases of EEG records collected by BioSerenity and stored in the EpiLab cloud platform. This platform gives access to a huge amount of EEG data for the research, validation and identification of valuable epileptic biomarkers, with the aim of helping doctors in the real-time analysis and diagnosis of epilepsy.

Based on that, this project will be carried out in Paris, France, in the second semester of 2015. The proposed methodology is based on the general objectives of the EpiLab partnership mentioned above. Before starting to develop algorithms, it is important to gather some information: a thorough understanding of epilepsy and its usual diagnostic methods, current epileptic biomarkers and their identification in EEG records, and familiarity with the EpiLab database and its format. The database was (and is still being) built from many patient EEG recordings following a standard hospital protocol, which is used to associate specific activities with the corresponding times of the recording. These measurements are annotated by specialized neurophysiologists, indicating the nature of each event (whether it is epileptic activity or simply noise or an artifact). The data is also enriched with patient information such as sex, age, diagnosis and medication. It is important to mention that the database contains EEG records not only from epileptic patients but also from healthy people, making it possible to have a control group. BioSerenity will continuously enlarge the database with measurements made with its Neuronate system.

Once familiar with the database, the author will have access to the EpiLab platform and all the information stored in it. With this large amount of data, computational algorithms will be developed for the research and validation of current and new epileptic biomarkers. Methods based on statistics, data analysis and Machine Learning will be evaluated, compared and validated in different contexts (different types of epilepsy, of recordings, of people, etc.), in order to select the best of the proposed methods. Finally, the algorithm will be given to a specialist for testing, so that it can be adjusted into an accurate real-time automated epileptic activity analyzer.


During the project, several meetings were planned with both advisors. The meetings with Michel Le Van Quyen were held face to face in Paris, while those with Mario Valderrama were held via teleconference. In addition, when specific problems arose, help and advice were obtained from specialized researchers and doctors at the ICM in Paris. All these meetings and discussions were meant to provide specific feedback and guidance in order to complete the project.


Chapter 5

Algorithm development

The entire project described in this document was implemented in MATLAB, version R2015a. The public MATLAB library MatConvNet [24] was used for the machine learning technique applied. All the code, database handling, signal and image processing, and training and testing of the developed algorithms were done in this MATLAB environment. Following the proposed work plan, the detailed steps taken to accomplish the project are presented below.

5.1 Initial objective modification

Although the original main objective of this project was to develop algorithms to detect epileptic biomarkers (specifically, epileptic spikes) for the further analysis and diagnosis of epileptic activity, the objective was modified after some study and consultation with the advisors. The new idea was to use Convolutional Neural Networks (CNN) as the Machine Learning technique to classify and detect epileptic spikes in EEG records. This technique was chosen because it is widely used nowadays in image and pattern recognition problems with excellent results, since it tries to resemble how human vision works. After some research, it was found that this technique had barely been explored in epilepsy-related problems, which is why the new focus of the project was on CNNs. In addition, to get a better feeling for the potential of CNNs in epilepsy-related problems, it was decided to develop a CNN-based algorithm capable of differentiating an epileptic crisis from non-epileptic activity, a widely studied epilepsy-related problem with plenty of information and existing algorithms, instead of going into epileptic spike detection.

5.2 Epilepsiae database

When a problem is to be solved with supervised learning, the essential first step is to acquire a large and rich database with annotated instances of all the classes to be classified. Since there were only two categories in this project, crisis and non-crisis, it was important to have access to a large database of EEG records from different patients with their corresponding annotations. The ICM in Paris, France, has access to the Epilepsiae database, a huge database developed through the collaboration of several European universities and institutes and the company BioSerenity. This database is stored in the EpiLab cloud platform and is continuously being updated and analyzed. It contains long-term records of a total of 275 patients (225 scalp and 50 intracranial) with their corresponding crisis annotations. The first algorithms and tests were tried on this database.


5.3 Time-frequency representation: spectrograms

Having a database with the corresponding annotations, it was necessary to define an approach to process the data and feed it into the machine learning technique. By simple visual examination of a crisis, just as a doctor does, it is clear that a crisis changes the frequency (spectral) content of the EEG. So the first idea was to find a way to visualize or extract the frequency components of the crisis and non-crisis windows. The first option was the Fourier Transform of the signal, since it is one of the best-known and most widely used transforms for extracting spectral information. Nevertheless, EEG signals have a special property: they are highly non-stationary (Quiroga 1998). This means that some characteristics of the time series, such as mean, variance, power or spectrum, vary with time. The Fourier Transform is not appropriate for this kind of signal, because it does not provide any information about changes in frequency over time, which is essential in epileptic EEG analysis.
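A minimal synthetic illustration of this limitation (not EEG data): a signal that switches from 5 Hz to 20 Hz and its time-reversed copy, which switches from 20 Hz to 5 Hz, have exactly the same Fourier magnitude spectrum, so the spectrum alone cannot tell which frequency came first. The sampling rate and frequencies are arbitrary choices.

```python
import numpy as np

fs = 256                                   # sampling rate in Hz (arbitrary choice)
t = np.arange(0, 4, 1 / fs)
early_low = np.where(t < 2, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))
early_high = early_low[::-1]               # same content, opposite time order

spectra_match = np.allclose(np.abs(np.fft.rfft(early_low)), np.abs(np.fft.rfft(early_high)))
signals_match = np.allclose(early_low, early_high)
print(spectra_match, signals_match)        # prints: True False
```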

To overcome this difficulty, a representation giving information about time and frequency simultaneously was needed. The proposed solution, also used by other authors, was time-frequency analysis (or multiresolution analysis), which can be implemented easily with the Wavelet Transform.

Like the Fourier Transform, the Wavelet Transform is computed by convolving the original signal with another, specific signal. In the Fourier Transform, the original signal is convolved with sinusoids of different frequencies, which, as described above, is well suited to stationary signals. The Wavelet Transform instead uses scaled and translated versions of a mother wavelet. This decomposes the original signal into sub-bands while retaining both time and frequency information, which makes it better suited to non-stationary signals such as the EEG. A smaller scale compresses the wavelet, giving finer time resolution and high-frequency information; a larger scale stretches it and gives a low-frequency representation.
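The thesis does not give its wavelet implementation or parameters. The sketch below shows one way to build such a time-frequency map with NumPy only, assuming a complex Gabor (Morlet-type) wavelet whose Gaussian envelope spans `n_cycles = 5` oscillations (an assumed value) and analysis frequencies of 1 to 30 Hz; the function name `gabor_cwt` is illustrative. Each row is the magnitude of the convolution with one scaled wavelet, so lower frequencies use longer (larger-scale) wavelets, as described above.

```python
import numpy as np

def gabor_cwt(x, fs, freqs, n_cycles=5.0):
    """Magnitude time-frequency map of a 1-D signal with complex Gabor wavelets.

    Rows follow `freqs` (Hz), columns follow the samples of x. The Gaussian envelope has
    standard deviation n_cycles / (2*pi*f) seconds, i.e. the wavelet scale grows as f falls.
    x must be longer than the longest wavelet (about 6.4 s at 1 Hz with n_cycles=5).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(freqs), x.size))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)                # envelope width in seconds
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)    # support of +-4 standard deviations
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sum(np.abs(wavelet))                  # comparable amplitudes across rows
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out
```

For a 20 s window sampled at 256 Hz, `gabor_cwt(x, 256, np.arange(1, 31))` returns a 30×5120 array, the per-channel image size used in section 5.6.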

There are many different types of mother wavelet, and several of them were tried in search of the cleanest possible time-frequency representation (spectrogram). Some papers recommend the Daubechies 4 wavelet (db4) for epileptic EEG, so it was the first choice. On closer inspection, however, db4 is recommended specifically for spike detection, because its shape closely resembles an epileptic spike; comparing the db4 shape with the morphology of a crisis showed that the two are quite different. Therefore, many different wavelets were tried, and the choice was made according to the resulting spectrograms. The following wavelets were tested:

• Haar

• Daubechies 1, 4 and 10 (db1, db4 and db10)

• Symlet 2, 5 and 8 (sym2, sym5 and sym8)

• Coiflet 2 and 8 (coif2 and coif8)

• Meyer (meyr)

• Discrete Meyer (dmey)

• Gaussian 1, 5 and 8 (gaus1, gaus5 and gaus8)

• Mexican hat (mexh)

• Gabor (gabor)


The spectrograms obtained with these wavelets were compared and, by visual examination, a mother wavelet was selected (the Gabor wavelet). Note that each signal produces a two-dimensional time-frequency representation, which can be treated as an image. Since the records were used in the banana montage, each record had 16 channels (signals), i.e. 16 images (spectrograms). To combine them, the 16 images were stacked one behind the other, giving a final image 16 channels deep for each analyzed window. These large images were the inputs to the CNN, which was meant to classify them as crisis or non-crisis.
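Stacking the per-channel spectrograms into one volume can be sketched with `numpy.stack`. The helper name `window_to_volume` is hypothetical, and the random arrays are placeholders for real spectrograms.

```python
import numpy as np

def window_to_volume(channel_spectrograms):
    """Stack equally shaped (freq, time) spectrograms into one (freq, time, channel) volume."""
    specs = [np.asarray(s, dtype=np.float32) for s in channel_spectrograms]
    if len({s.shape for s in specs}) != 1:
        raise ValueError("all channel spectrograms must have the same shape")
    return np.stack(specs, axis=-1)          # last axis = channel, in montage order

rng = np.random.default_rng(0)
placeholders = [rng.random((30, 5120)) for _ in range(16)]   # stand-ins for real spectrograms
print(window_to_volume(placeholders).shape)                  # (30, 5120, 16)
```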

5.4

CHB-MIT database [22], [5]

During the project, another public database was found online: the CHB-MIT Scalp EEG Database, named after the collaboration between investigators of the Children's Hospital Boston (CHB) and the Massachusetts Institute of Technology (MIT). It was collected at CHB and consists of scalp EEG records with and without epileptic seizures. The records come from 23 patients who were monitored for several days without taking any anti-seizure medication. All signals were acquired with electrodes placed and named according to the International 10-20 System, with a sampling rate of 256 Hz and 16-bit resolution. The whole database contains 182 seizures, each annotated with its start and end time. To use this database, windows around all 142 remaining seizures were taken as the positive examples (patient 12 was excluded because of a different recording configuration and protocol), and 142 randomly selected non-seizure windows were taken as the negative examples, giving a balanced problem (the same number of positive and negative examples). All signals were filtered between 1 and 30 Hz.
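The thesis does not describe how the random non-seizure windows were drawn. A minimal sketch, assuming uniformly sampled start times and rejecting any window that overlaps an annotated seizure (the function name and the optional `margin_s` guard band are hypothetical); the 1 to 30 Hz filtering is not shown.

```python
import numpy as np

def sample_negative_windows(record_len_s, seizures, n, win_s=20.0, margin_s=0.0, seed=0):
    """Draw n window start times (s) whose windows avoid every annotated seizure.

    seizures: list of (start_s, end_s) annotations for the record.
    margin_s: optional extra distance kept from each seizure (assumption, default 0).
    """
    rng = np.random.default_rng(seed)
    starts, attempts = [], 0
    while len(starts) < n:
        attempts += 1
        if attempts > 100_000:
            raise RuntimeError("could not find enough seizure-free windows")
        s = rng.uniform(0, record_len_s - win_s)
        e = s + win_s
        # Keep the window only if it ends before, or starts after, every seizure.
        if all(e <= a - margin_s or s >= b + margin_s for a, b in seizures):
            starts.append(s)
    return np.array(starts)
```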

Some examples of crisis and non-crisis windows are shown in figure 5.1.

This database was used in addition to the large Epilepsiae database (section 5.2) for two main reasons:

1. As shown in the results and discussion chapter, the spectrogram representation did not give positive results on the Epilepsiae database. Suspecting that the problem might lie in the database annotations, the same spectrogram approach was tried on the CHB-MIT database. However, it did not work on this second database either, which made it necessary to abandon the spectrogram representation, as explained later.

2. The CHB-MIT database has been used by several authors to develop their algorithms, which suggests that it is well annotated and suitable for developing and testing new techniques. It also makes it possible to compare the results of this work with those of other authors on the same data.

Because of those reasons, it was decided to work with the CHB-MIT database from this point on.

5.5

Independent Component Analysis (ICA)

To clean the signals by reducing noise and characteristics shared between two or more channels of the same record, ICA was applied to each record, generating new, independent and cleaner signals to train with. The reasoning was that, with ICA, the epileptic pattern might become easier to see in one specific component of the record instead of being spread across all the electrodes. In other words, the goal was to generate new signals, each independent of the others, for a cleaner visualization of the crisis. Since the records were in the banana montage with 16 channels, 16 independent signals were computed with ICA, and the CNN was then built and trained using these 16 "artificial" signals.
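The thesis does not state which ICA algorithm or implementation was used. As an illustration only, the sketch below implements symmetric FastICA with the log-cosh contrast in NumPy (function names are hypothetical). One detail matters for a double-banana montage: the temporal chain FP1-F7, F7-T7, T7-P7, P7-O1 and the parasagittal chain FP1-F3, F3-C3, C3-P3, P3-O1 both sum to FP1 − O1 (and likewise for FP2 − O2 on the right), so the 16 channels are linearly dependent. The sketch therefore drops near-zero-variance directions during whitening and may return fewer components than channels; that is a design choice of this sketch, not necessarily of the thesis.

```python
import numpy as np

def _sym_decorrelate(W):
    """Symmetric decorrelation W <- (W W^T)^(-1/2) W, keeping the unmixing rows orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return (U / np.sqrt(s)) @ U.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (log-cosh). X: (n_channels, n_samples) -> (n_components, n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    # PCA whitening; directions with (numerically) zero variance are discarded.
    d, E = np.linalg.eigh(np.cov(X))
    keep = d > d.max() * 1e-10
    K = (E[:, keep] / np.sqrt(d[keep])).T          # whitening matrix, shape (k, n_channels)
    Z = K @ X                                      # whitened data with identity covariance
    k, n = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                         # g(y) = tanh(y) for the log-cosh contrast
        W_new = _sym_decorrelate((G @ Z.T) / n - np.diag((1 - G**2).mean(axis=1)) @ W)
        # Converged when every new row is, up to sign, parallel to the old one.
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ Z
```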


Figure 5.2: Dimensions of the resulting spectrogram images for each window

5.6

CNN with spectrograms as inputs

The first idea was to use the spectrogram representation of the records as the input images for the CNN. In theory this made sense, because changes in frequency and spectrum are clearly perceptible when a crisis occurs. This path was chosen to capture time and frequency information simultaneously, an important characteristic in epileptic seizure detection.

As described, seizure and non-seizure windows were taken as the positive and negative instances. These windows included 16 channels in the following order (front to back): FP1-F7, F7-T7, T7-P7, P7-O1, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F4, F4-C4, C4-P4, P4-O2, FP2-F8, F8-T8, T8-P8, P8-O2. The wavelet transform was configured to extract the frequency components between 1 and 30 Hz in 1 Hz steps (30 points high); the windows were 20 seconds long (at a sampling rate of 256 Hz, 20×256 = 5120 points long); and each record had 16 channels, each with its own 30×5120 image, and the 16 images were stacked one behind the other (16 points deep). In summary, each crisis or non-crisis window was represented by a large grayscale spectrogram image of 30×5120×16 = 2,457,600 pixels. Figure 5.2 shows the dimensions of these final images. The same configuration was applied to the records treated with ICA, giving input images of the same size. To reduce this high dimensionality, the time axis was re-sampled. Because re-sampling in time can discard important information, only 2-fold re-sampling was considered (giving images 2560 points long), which made the input images somewhat smaller; no stronger re-sampling was applied, in order to preserve the information in the record.
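The thesis does not say how the 2-fold time re-sampling was done. The sketch below assumes the simplest option, keeping every second time column of each spectrogram volume (`decimate_time` is a hypothetical helper); averaging adjacent columns would be an alternative that smooths rather than subsamples.

```python
import numpy as np

def decimate_time(volume, factor=2):
    """Keep every `factor`-th column of a (freq, time, channel) spectrogram volume."""
    return volume[:, ::factor, :]

volume = np.zeros((30, 20 * 256, 16), dtype=np.float32)   # one 20 s window, 30 freqs, 16 channels
small = decimate_time(volume)
print(volume.size, small.shape, small.size)   # 2457600 (30, 2560, 16) 1228800
```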

With these large images, the CNN was built and trained. A major inconvenience when working with convolutional neural networks is the large number of parameters that must be chosen. In general, CNNs use three types of layers: convolutional, pooling and non-linearity layers, each with its own parameters:

• Convolutional layer: number of filters, filter height, filter length, filter depth.

• Pooling layer: type of pooling (max, mean, etc.), filter height, filter length, filter depth.

• Non-linearity layer: type of non-linearity (ReLU, sigmoid, tanh, etc.).

The choice of one parameter changes the possible values of the others. The architecture is a further parameter: a big layer is normally formed by convolution, non-linearity and pooling, so the number of big layers must also be chosen.


Given the large number of possible parameter combinations and the lack of techniques or algorithms to choose them, the planned strategy was to construct, train and validate a CNN with only one big layer (convolution + non-linearity + max-pooling) until the best performance was achieved. A new big layer would then be added after the best first layer, and so on. Figure 5.3 shows this strategy: each convolution + pooling pair (optionally with a non-linearity between them) forms a big layer, and the network is constructed one big layer at a time.
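The one-big-layer-at-a-time strategy is a greedy search. A minimal sketch, assuming a hypothetical `evaluate` callback that builds, trains and validates a CNN for a given list of layer configurations and returns its validation error (training itself is not shown):

```python
import itertools

def greedy_layerwise_search(param_grid, evaluate, max_layers=3):
    """Grow a CNN one big layer (conv + non-linearity + pooling) at a time.

    param_grid: dict mapping hyperparameter name -> candidate values for one big layer.
    evaluate:   hypothetical callable(list_of_layer_configs) -> validation error.
    Stops as soon as adding a layer no longer lowers the validation error.
    """
    names = list(param_grid)
    candidates = [dict(zip(names, vals)) for vals in itertools.product(*param_grid.values())]
    layers, best_err = [], float("inf")
    for _ in range(max_layers):
        # Try every configuration for the next layer on top of the fixed earlier layers.
        err, cfg = min(((evaluate(layers + [c]), c) for c in candidates), key=lambda p: p[0])
        if err >= best_err:
            break                                   # extra layer did not help: keep shallower net
        layers, best_err = layers + [cfg], err
    return layers, best_err
```

Earlier layers are never revisited, which keeps the cost roughly linear in depth at the price of possibly missing better joint configurations.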

Figure 5.3: General strategy for the CNN building and training: one big layer at a time.

The whole database of seizure and non-seizure windows (now as spectrogram images) was divided into 3 parts: training (50%), used to build and train the CNN; validation (25%), not seen during training and used to select the best parameters; and testing (25%), also not used during training and used to evaluate the real, final performance of the constructed CNN.
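A 50/25/25 split can be sketched as follows. Stratifying by class (so each subset keeps the 1:1 balance) is an assumption, since the thesis does not say whether the split was stratified; the helper name is hypothetical. Because the split is per window, windows from the same patient can land in more than one subset.

```python
import numpy as np

def split_50_25_25(labels, seed=0):
    """Return train/validation/test index arrays (50/25/25%), stratified by class label."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))   # shuffled indices of this class
        n_train, n_val = len(idx) // 2, len(idx) // 4
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])                     # remainder goes to test
    return np.array(train), np.array(val), np.array(test)
```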

Several CNNs were tried with this strategy, varying the parameters and testing multiple combinations of them.

5.7

Raw signal images

The second approach again used CNNs to build the classifier, but with a different type of input image. As explained in chapter 6, the spectrograms did not work well: whatever parameters or architecture were used for the CNN, the classifier did not learn anything. This motivated a new, very simple approach that tries to make the CNN see the input windows the way a person sees them. When physicians look at an EEG window to determine whether there is an epileptic crisis, they do not apply any time-frequency conversion or transform; they simply look at the signals, as in figure 5.1. From such images they recognize and extract by themselves most of the information they need. Simplicity was therefore the general idea of this new approach.

To do so, the new windows were not transformed into spectrograms or by any other transform; they were used directly as the input images for the CNN. This approach has two major advantages:

• Speed and simplicity: no transform is needed, and the images are used just as we see them.

• Size: unlike the spectrogram case, where the input images are very large, this representation gives smaller images whose size is determined only by the visual size of the plotted signals. It also makes it possible to use longer windows and capture more information around the crisis. The windows used for this new approach were 120 seconds long and their images were 434×343 pixels, giving much smaller input images for the CNN.
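The thesis used screenshots of the plotted signals; the plotting tool and amplitude scaling are not stated. As a dependency-free stand-in, the sketch below rasterizes a 16-channel window into a 343×434 binary image, one horizontal band per channel, filling each column between the minimum and maximum of the samples it covers (`render_window` and the per-window scaling are assumptions, not the thesis's procedure).

```python
import numpy as np

def render_window(window, height=343, width=434):
    """Rasterize a (n_channels, n_samples) EEG window into a binary image, one band per channel."""
    window = np.asarray(window, dtype=float)
    n_ch, n_samp = window.shape
    img = np.zeros((height, width), dtype=np.uint8)
    band = height / n_ch                                    # pixel rows available per channel
    scale = np.max(np.abs(window)) or 1.0                   # one amplitude scale for all channels
    edges = np.linspace(0, n_samp, width + 1).astype(int)   # sample range covered by each column
    for ch in range(n_ch):
        centre = (ch + 0.5) * band                          # channel 0 (FP1-F7) at the top
        for col in range(width):
            seg = window[ch, edges[col]:edges[col + 1]]
            if seg.size == 0:
                continue
            # Higher voltage is drawn higher on screen, i.e. at a smaller row index.
            top = int(np.clip(centre - seg.max() / scale * band / 2, 0, height - 1))
            bottom = int(np.clip(centre - seg.min() / scale * band / 2, 0, height - 1))
            img[top:bottom + 1, col] = 1
    return img
```

For a 120 s window at 256 Hz, `render_window` receives a 16×30720 array and returns a 343×434 image.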

5.8

CNN with raw signal images as inputs

These raw images were used as input images for the CNN, trying to make the CNN reason like a human and decide whether there is a crisis based only on these simple images.

As before, seizure and non-seizure windows were taken as the positive and negative instances. These windows included 16 channels in the following order (top to bottom): FP1-F7, F7-T7, T7-P7, P7-O1, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F4, F4-C4, C4-P4, P4-O2, FP2-F8, F8-T8, T8-P8, P8-O2. The windows were 120 seconds long and included all 16 channels. Each crisis or non-crisis window was represented by an image of 434×343 pixels.
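If only referential (single-electrode) signals are available, the 16 bipolar derivations in this order can be built as below; when the files already store bipolar channels, only the reordering applies. `BANANA` and `to_banana` are illustrative names.

```python
import numpy as np

# Double-banana order listed above (top to bottom of the image).
BANANA = ["FP1-F7", "F7-T7", "T7-P7", "P7-O1", "FP1-F3", "F3-C3", "C3-P3", "P3-O1",
          "FP2-F4", "F4-C4", "C4-P4", "P4-O2", "FP2-F8", "F8-T8", "T8-P8", "P8-O2"]

def to_banana(referential):
    """Build the 16 bipolar channels from a dict electrode -> 1-D referential signal."""
    rows = []
    for pair in BANANA:
        a, b = pair.split("-")
        rows.append(np.asarray(referential[a]) - np.asarray(referential[b]))
    return np.vstack(rows)          # shape (16, n_samples), rows in montage order
```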

Once again, the whole database of seizure and non-seizure windows images was divided in 3 parts: training (50%) which was used to generate the CNN and train it, validation (25%) which wasn’t seen in the training part and was used to select the best parameters, and testing (25%) which also wasn’t used at the training and was used to evaluate the real and final performance of the constructed CNN.

The same kind of images were generated using ICA (Independent Component Analysis) over the original records, just like explained in section 5.5. CNNs with different architectures and parameters were constructed and trained using those two possibilities (with ICA and without ICA), using always the strategy that was already explained (get the best CNN with one big layer, and then add the following).

5.9

Evaluation

The performance of the CNN classifier was evaluated mainly through the classification error on the validation set. This error was used to select the best combination of parameters and the best architecture. It makes no sense to focus on the training error, since it is calculated on the images that were used to train the system. The classification error is calculated simply by counting the misclassified windows and dividing by the total number of evaluated windows; it equals 1 − accuracy:

$$\text{Error} = \frac{\#\,\text{misclassified windows}}{\#\,\text{evaluated windows}} = 1 - \text{ACC}$$

Where a misclassified window is a window that is classified as seizure when it is not or is classified as non-seizure when it is a seizure.
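The error above can be computed in one line; the helper name is illustrative and labels are assumed to be 1 for seizure and 0 for non-seizure.

```python
import numpy as np

def classification_error(y_true, y_pred):
    """Fraction of misclassified windows, i.e. 1 - accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

print(classification_error([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```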

The final performance of the classifier is the accuracy over the test dataset.

Once a definitive CNN architecture and parameters were obtained with their corresponding accuracy, another type of evaluation was carried out: leave-one-patient-out training. This consists of training the classifier with all the patients except patient 1 and testing the CNN on that patient; the same is then done with patient 2, 3, 4 and so on, each time leaving out one patient who is used only for testing. For this evaluation, all the records of the test patient were taken and a 120-second sliding window was moved over the whole EEG of that patient with 50% overlap between consecutive windows. A crisis or non-crisis prediction was made for each position of the sliding window, which generated hundreds or thousands of windows per patient. Clearly, the great majority of those windows were truly non-seizure, which made the classification problem unbalanced (the negative class was much larger than the positive one). In this situation it is better to use metrics other than the accuracy used before, so sensitivity and specificity were calculated for each patient. They are defined as:

$$\text{Sensitivity} = \text{Recall} = \text{TPR} = \frac{TP}{TP + FN} = \frac{TP}{P}$$

$$\text{Specificity} = \text{TNR} = \frac{TN}{TN + FP} = \frac{TN}{N}$$

where

TP = True Positives: # of windows classified as seizures that really are seizures.

TN = True Negatives: # of windows classified as non-seizures that really are non-seizures.

FP = False Positives: # of windows classified as seizures that really are non-seizures.

FN = False Negatives: # of windows classified as non-seizures that really are seizures.

P = TP + FN and N = TN + FP are the numbers of actual seizure and non-seizure windows.

These metrics are better suited than the traditional accuracy to an unbalanced classification problem, because they measure the accuracy on each class separately.

For reference, the traditional accuracy is calculated as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
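The leave-one-patient-out scoring can be sketched as below. The thesis does not say how a sliding window was labelled as seizure for scoring; the sketch assumes that any overlap with an annotated seizure makes a window positive. The trained CNN that produces the predictions is not shown, and all function names are hypothetical.

```python
import numpy as np

def sliding_windows(n_samples, fs=256, win_s=120, overlap=0.5):
    """Start indices (samples) of win_s-second windows; 50% overlap gives a 60 s hop here."""
    win = int(win_s * fs)
    hop = int(win * (1 - overlap))
    return np.arange(0, n_samples - win + 1, hop)

def window_labels(starts, win, seizures):
    """1 if a window overlaps any annotated seizure (start, end) in samples, else 0 (assumed rule)."""
    return np.array([int(any(s < b and s + win > a for a, b in seizures)) for s in starts])

def sensitivity_specificity(y_true, y_pred):
    """TP / (TP + FN) and TN / (TN + FP) for binary labels (1 = seizure)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def leave_one_patient_out(patients):
    """Yield (training patients, held-out test patient) pairs."""
    for held_out in patients:
        yield [p for p in patients if p != held_out], held_out
```

For a one-hour record, `sliding_windows(3600 * 256)` yields 59 windows, each starting 60 s after the previous one.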


chapter 6

Results and Discussion

Following the methodology explained in Chapter 5, the results of each step and their discussion are presented below.

6.1

Time-frequency representation: spectrograms

The spectrogram representation of the windows was the first approach, chosen to quantify or easily visualize the time-frequency relationships that occur during a seizure. The idea was to obtain an image representative of the frequency changes in different channels and at different times. This was done with the Wavelet Transform because, as already explained, EEG signals are highly non-stationary, which makes this transform particularly suitable for them.

The first decision was which range of frequencies to consider. Since the range of interest is up to 30 Hz, the signals were first filtered up to that frequency, and the Wavelet Transform was then configured to cover only frequencies from 1 Hz to 30 Hz in steps of 1 Hz.

The second, crucial decision was the choice of mother wavelet. There are several types of wavelets, and it is essential to choose the one that best suits the specific problem being solved. To make this selection, the time-frequency representation of some windows was computed with the different wavelets. The corresponding spectrogram images are shown in figures 6.1, 6.2, 6.3, 6.4 and 6.5, all computed for the signal in figure 6.1(a). Note that these figures show the 16 spectrograms (one per channel) one below the other only for illustration; in reality the spectrograms are stacked in 3D, forming a spectrogram image 16 channels deep, as shown in figure 5.2. The best and cleanest spectrogram representation of the window was clearly obtained with the Gabor wavelet, which made it possible to see the different frequencies present over the whole time span of the signals. Based on these images, the Gabor wavelet was chosen as the mother wavelet for the Wavelet Transform.

6.2

Independent Component Analysis (ICA)

Knowing that a crisis can originate at one specific electrode but propagate to the others, a signal processing technique was used to make the signals from the different electrodes as independent from one another as possible. The aim was to reduce redundancy and make the signals cleaner and easier to visualize and analyze. To achieve this, ICA (Independent Component Analysis) was applied to the 16 signals, resulting in a new set of 16 mutually independent signals. An example for crisis and non-crisis windows can be found in figure 6.6.

CHAPTER 6. RESULTS AND DISCUSSION 23

(a) Record (b) Spectrogram with db1 wavelet

(c) Spectrogram with db4 wavelet (d) Spectrogram with db10 wavelet

Figure 6.1: Signal, spectrograms with db wavelets

As can be seen, ICA can make the signals easier to see and analyze: the difference between two very similar windows becomes perceptible when ICA is applied. Since this preprocessing seemed to work well at this point, spectrograms were also extracted from the database after ICA. Two spectrogram representations of the windows were thus available: spectrograms of the original signals without preprocessing, and spectrograms of the signals after ICA. Figure 6.7 shows examples of the resulting spectrograms for crisis and non-crisis windows before and after ICA. Visually, at least, ICA seems to help. Nevertheless, across more examples of crisis and non-crisis windows, it is difficult to perceive a difference between the two classes from their spectrograms, even with ICA.

6.3

CNN with spectrograms as inputs

As said, two different spectrograms were extracted from the windows, with and without ICA. The next step was to create the CNN and train it to differentiate between the two classes, crisis and non-crisis windows, based on their spectrograms.

Some considerations were taken into account before constructing the CNN, because of the nature of the problem and the idea behind it. When working with CNNs, the size and dimensions of the convolutional filters are crucial, since these parameters can completely change what the CNN learns and sees. In a traditional character recognition problem, for example, square filters are swept over the whole image looking for the character to be recognized. Such square, image-wide filters are used because the position of the character in the window does not matter: the CNN simply has to find the character wherever it is, and it can sweep over the whole image because the two (or three) dimensions of the image mean the same thing: pixels. With spectrograms and epileptic crisis recognition the situation is different, so a different type of filter must be used.

(a) Spectrogram with sym2 wavelet (b) Spectrogram with sym5 wavelet

(c) Spectrogram with sym8 wavelet

Figure 6.2: Spectrograms with sym wavelets

Figure 5.2 shows that in the input images of the CNN (the spectrograms), each of the three dimensions means something different: time, frequency or channel. Unlike an ordinary image, whose dimensions all mean the same thing, a crisis can appear at any moment of the window (so the filters must sweep over time), but it has specific frequency patterns: a yellow band at high frequencies means something completely different from the same band at low frequencies. For this reason there must be no sweeping across the frequency bands, so that the position in frequency is preserved. The same idea is found in [3]. The same holds for the channel dimension (depth): a crisis involves not a single channel or electrode but the relationships and differences between all of them, so there must be no sweep across the third dimension of the spectrograms either.

(a) Spectrogram with gaus1 wavelet (b) Spectrogram with gaus5 wavelet

(c) Spectrogram with gaus8 wavelet

Figure 6.3: Spectrograms with gaus wavelets

The filters were therefore allowed to slide only along the time axis, keeping the other two dimensions of the filters completely fixed. The filters used for the CNN have dimensions x×30×16, where x is an adjustable parameter (the filter length).
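The restriction can be made concrete with a "valid" convolution that spans all 30 frequencies and all 16 channels and slides only along time, so each filter produces a 1-D feature map. A NumPy sketch (`time_only_conv` is a hypothetical name; like CNN libraries, it computes cross-correlation):

```python
import numpy as np

def time_only_conv(volume, filters):
    """Slide full-frequency, full-depth filters along the time axis only.

    volume:  (n_freqs, n_time, n_channels), e.g. (30, 5120, 16)
    filters: (n_filters, n_freqs, length, n_channels), i.e. x x 30 x 16 filters
    returns: (n_filters, n_time - length + 1), one 1-D feature map per filter
    """
    n_filters, n_freqs, length, n_ch = filters.shape
    if volume.shape[0] != n_freqs or volume.shape[2] != n_ch:
        raise ValueError("filters must span the full frequency and channel axes")
    n_out = volume.shape[1] - length + 1
    out = np.empty((n_filters, n_out))
    for t in range(n_out):
        patch = volume[:, t:t + length, :]                  # (n_freqs, length, n_channels)
        out[:, t] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out
```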

Understanding these restrictions on the filters made it possible to build and train the CNN. As said, the strategy was to construct the best possible CNN with only one big layer (convolution + pooling) and then add more layers to try to improve it. The focus was on making the network with one big layer work: it did not have to work perfectly or reach a very low error rate, but it had to learn something.

However, no matter which combination of parameters was used, no good results were achieved. Testing different parameter combinations took several weeks without any positive results: the CNN was not able to learn anything from these spectrograms as input images, whether or not they were constructed with ICA.

In fact, only two situations occurred when training and testing the CNN; examples of both are shown in figure 6.8. They were: no learning at all (a classification error near 50%, which is chance level for a balanced binary classification problem) and complete overfitting (a very low training error but a validation error around 50%). In both situations the network learned nothing that generalized, so the CNNs were non-functional. Whatever combination of parameters or architecture was tested, one of these two outcomes occurred.


(a) Spectrogram with coif2 wavelet (b) Spectrogram with coif5 wavelet

(c) Spectrogram with haar wavelet

Figure 6.4: Spectrograms with coif and haar wavelets

Since no good results were obtained regardless of the parameters or architecture, it was decided to change approach and stop working with spectrograms. The conclusion of this part was that the spectrograms did not carry enough discriminative information for a CNN to distinguish between crisis and non-crisis windows. Indeed, on visual examination the spectrograms of the two categories were very similar, which made the classification very difficult.

6.4

CNN with raw signal images as inputs

Having obtained no results with the spectrogram representation, a simpler approach was tried: raw images. The idea, as explained in sections 5.7 and 5.8, was not to treat each signal as a vector of points and transform it into a spectrogram or anything else, but simply to take a screenshot of the plot of all the signals and use these images as inputs to the CNN, just as a physician does by visual examination. In this case, the input images were like the ones in figure 5.1.

A first advantage is that no Wavelet Transform is needed, so no mother wavelet has to be chosen and the CNN receives a simpler input. Another important advantage is the size and dimension of the images. Since the records are taken not as vectors of points representing a signal but as mere screenshots, windows of longer duration can be extracted without producing larger images. The images are also two-dimensional instead of the three-dimensional spectrograms. In explicit numbers, a 20-second window was represented by an image of 30×5120×16 = 2,457,600 pixels with the spectrogram representation, while the same window was represented by an image of 434×343 = 148,862 pixels with the new approach (images about 16.5 times smaller). Moreover, the raw signal images always have the same size, so longer windows could be extracted, taking into account more information from before and after the seizure.

(a) Spectrogram with mexh wavelet (b) Spectrogram with meyr wavelet

(c) Spectrogram with dmey wavelet (d) Spectrogram with gabor wavelet

Figure 6.5: Spectrograms with mexh, meyr, dmey and gabor wavelets

The seizures in the database had a mean duration of 64.29 seconds with a standard deviation of 71.27 seconds. The mean plus one standard deviation is 135.56 seconds; to use a rounder number, 120-second windows were chosen. Thus, the new approach used images about 16.5 times smaller than the spectrograms while covering 6 times more time (120 s instead of 20 s). The input images and their dimensions are illustrated in figure 6.9.
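The figures quoted above can be checked directly:

```python
spec_pixels = 30 * (20 * 256) * 16      # frequencies x samples x channels for a 20 s window
raw_pixels = 434 * 343                  # raw-signal image, same size whatever the window length
print(spec_pixels, raw_pixels, round(spec_pixels / raw_pixels, 1))   # 2457600 148862 16.5

mean_s, std_s = 64.29, 71.27            # seizure duration statistics reported above (s)
print(round(mean_s + std_s, 2), 120 / 20)                           # 135.56 6.0
```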

To check whether the two classes were sufficiently different, the average image of all the positive instances and of all the negative instances was calculated. These images are shown in figure 6.10, where a slight difference between the two classes can be appreciated.
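The class-average images amount to a pixel-wise mean over each class. A minimal sketch (the helper name is hypothetical), which also returns the absolute difference map to show where the two averages differ:

```python
import numpy as np

def class_mean_images(images, labels):
    """Pixel-wise mean of the positive (1) and negative (0) windows, and their absolute difference.

    images: (n_windows, height, width); labels: one 0/1 label per window.
    """
    images, labels = np.asarray(images, dtype=float), np.asarray(labels)
    pos = images[labels == 1].mean(axis=0)
    neg = images[labels == 0].mean(axis=0)
    return pos, neg, np.abs(pos - neg)
```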


Figure 6.6: Examples of crisis and non crisis windows before and after ICA

In this case, just as in the former one, a restriction on the filters was necessary. In these images the horizontal axis represents time, while the vertical axis represents the channels in a specific order. The filters had to sweep only along the time axis, not along the channel axis. This is why rectangular filters of size 343×x were selected, where x, once again, was an adjustable parameter (the filter length). The same CNN-building strategy was followed: construction of one big layer, then the second, and so on.

6.4.1

First layer

The architecture of the CNN that was tried is presented in figure 6.11. Its layers are:

• Convolutional layer, which receives the 343×434 input images. A number of convolutional filters (50 in this case) of size 343×x (x = 85 in this case) produce the input images of the next layer.

• Max-pooling layer, which receives 50×350 images. The 350 comes from using 85-pixel-long filters (84 pixels are lost along the sweeping dimension: 434 − 84 = 350), and the dimension of 50 is the number of filters used. This layer applies max-pooling, keeping the maximum value of every 7 pixels in time, which produces the input images of the next layer.

• Fully connected layer, which receives 50×50 images. The second 50 comes from the max-pooling layer, because 350/7 = 50.

• Output layer, which generates the crisis or non-crisis label from the 50×50 images produced before.
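The shapes in this list can be traced with a forward pass in NumPy using random weights (training is not shown). Two assumptions are labelled in the code: a ReLU after the convolution, since the list does not name the non-linearity, and a single softmax layer standing in for the fully connected and output layers, since the hidden size is not stated. `forward` is a hypothetical name.

```python
import numpy as np

def forward(img, conv_w, fc_w, fc_b, pool=7):
    """Full-height conv -> ReLU -> max-pool over time -> softmax over {non-crisis, crisis}.

    img:    (343, 434) raw-signal image (rows: channel bands, columns: time)
    conv_w: (n_filters, 343, length) filters spanning the full image height
    fc_w:   (2, n_filters * ((434 - length + 1) // pool)), fc_b: (2,)
    """
    n_filters, height, length = conv_w.shape
    if img.shape[0] != height:
        raise ValueError("filters must span the full image height")
    n_out = img.shape[1] - length + 1                       # 434 - 85 + 1 = 350 for length 85
    fmap = np.empty((n_filters, n_out))
    for t in range(n_out):                                  # slide along time only
        fmap[:, t] = np.tensordot(conv_w, img[:, t:t + length], axes=([1, 2], [0, 1]))
    fmap = np.maximum(fmap, 0.0)                            # ReLU (assumed non-linearity)
    n_pool = n_out // pool                                  # 350 // 7 = 50
    pooled = fmap[:, :n_pool * pool].reshape(n_filters, n_pool, pool).max(axis=2)
    logits = fc_w @ pooled.ravel() + fc_b                   # FC + output collapsed (assumption)
    p = np.exp(logits - logits.max())                       # numerically stable softmax
    return pooled, p / p.sum()
```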


Figure 6.7: Examples of crisis and non crisis spectrograms before and after ICA. Top row are the windows without ICA and bottom row are the windows after ICA. On the left column are crisis windows and on the right are examples of non-crisis windows.

Building the first big layer mainly required tuning two parameters of the convolutional layer: the filter length and the number of filters (85 and 50, respectively, in figure 6.11). To select the best combination, many combinations were tested; some of the results are shown in tables 6.1 and 6.2.

Table 6.1: Performance in training and validation of the CNN vs. convolutional filter size

Filter size | Train error | Validation error | Train energy | Validation energy
36 | 11% | 17% | 0.33 | 0.54
64 | 9% | 15% | 0.25 | 0.52
85 | 7% | 14% | 0.23 | 0.53
120 | 5% | 15% | 0.20 | 0.53
155 | 5% | 17% | 0.17 | 0.53
246 | 3% | 21% | 0.13 | 0.57

This new approach worked: the validation error was low, so the CNN was actually learning something from the raw input images that allowed it to classify windows as seizures or non-seizures (validation errors between 14% and 21% across the configurations in tables 6.1 and 6.2, and 14% for the best one). There was some overfitting, since the training error was lower than the validation error, but the gap was small for the best configuration (7 percentage points). Some gap between training and validation errors is normal, as long as it is not large.


Figure 6.8: Examples of the two cases obtained when training the CNN with spectrograms: overfitting (left) and no learning (right).


Figure 6.9: Input images for the second approach: raw image windows

Table 6.2: Performance in training and validation of the CNN vs. number of convolutional filters

Number of filters | Train error | Validation error | Train energy | Validation energy
20 | 12% | 18% | 0.30 | 0.53
50 | 7% | 14% | 0.23 | 0.53
100 | 7% | 16% | 0.25 | 0.53
200 | 7% | 16% | 0.21 | 0.52
500 | 3% | 20% | 0.14 | 0.53

6.4.2

Second layer

Once the first layer was good enough, with a classification error of around 14%, another big layer was added to try to reduce the error further. Starting from the 50×50 images produced by the pooling of the first big layer, two alternatives were explored:

Option 1: convolution

The first option was to repeat the strategy used for the first big layer: with the 50×50 images as input, try different parameter combinations and see which works best for a convolution + pooling architecture. However, no combination of parameters gave good results; the validation error was again around 50%, as in the examples of figure 6.8. A possible explanation is that the additional convolutional layer was confusing the network, making it more complex than necessary. Because of the lack of positive results, this option was discarded.


Figure 6.11: One big layer architecture for the CNN

Option 2: std. deviation pooling

To understand why the first option might have failed, the 50×50 intermediate images were examined. With only one big layer, the type of images the CNN received as input was known, so some design decisions (which worked well), such as the restrictions on the filter dimensions, could be made accordingly. This was not the case for the input images of the second layer, so the idea was to look at those images in order to understand them and decide how to process them. This inspection produced the images in Figure 6.12 (one example for a seizure window and one for a non-seizure window).

Of particular interest were the square images at the bottom of the figure. At first sight, these images looked very similar for both classes, but after inspecting several examples it became clear that, in most cases, the seizure windows showed a blurry horizontal band in the 50×50 images (Figure 6.13). The difference between positive and negative instances at that scale was therefore the presence or absence of these horizontal bands. The next step was to find a way to identify, or quantify, that blurriness.

The simplest way found to quantify those blurry zones was to compute the standard deviation of each column. As can be seen, all the pixels in the same column (time position) of the 50×50 images are very similar to each other; when a blurry band is present, some pixels differ, and the standard deviation can quantify that difference. Taking the standard deviation over the pixels of one column gives a single value for that column, and repeating this over all the columns converts the image into a 50-element vector (Figure 6.14) whose elements, in principle, are larger when a blurry band is present. This operation can be understood as a standard deviation pooling.
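The column-wise standard deviation pooling described above can be sketched in a few lines of NumPy. This is a re-expression for illustration, not the author's MATLAB code: the toy 50×50 image with a synthetic bright band is an assumption, and since the text does not state whether the population (N) or sample (N-1) normalization was used, both are exposed through a `ddof` parameter.

```python
import numpy as np


def std_pool_columns(feature_map, ddof=0):
    """Column-wise standard deviation pooling.

    Collapses an (H, W) feature map into a length-W vector whose entry j is the
    standard deviation of the H pixels in column j (one time position).
    ddof=0 gives the population std (NumPy default); ddof=1 matches the
    N-1 normalization that MATLAB's std uses by default.
    """
    fm = np.asarray(feature_map, dtype=float)
    if fm.ndim != 2:
        raise ValueError("expected a 2-D feature map")
    return fm.std(axis=0, ddof=ddof)  # reduce over rows -> one value per column


# Toy 50x50 map: each row is the same ramp, so every column is constant ...
image = np.tile(np.linspace(0.0, 1.0, 50), (50, 1))
# ... except columns 10-19, where rows 20-29 form a brighter "blurry band".
image[20:30, 10:20] += 0.5

vec = std_pool_columns(image)
print(vec.shape)  # (50,)
print(vec[10:20].round(3))  # 0.2 for each banded column; ~0 everywhere else
```

In the banded columns, 10 of the 50 pixels are shifted by 0.5, so the population standard deviation is 0.5 × sqrt(0.2 × 0.8) = 0.2, while the constant columns pool to (numerically) zero.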


Figure 6.12: Images in between the layers

Indeed, when inspecting the resulting vectors for all the seizure and non-seizure windows, a clear difference appeared: seizure vectors were brighter, meaning a higher standard deviation, as predicted, while non-seizure vectors were darker. Figure 6.15 shows these vectors for the training, validation and test datasets; each column is one vector and therefore represents one window.

At this point, seizure and non-seizure windows were represented as 50-element vectors that clearly differed between the two classes. These vectors were used to train and validate a conventional neural network with only fully connected layers: a network taking the vectors as inputs was constructed and trained while varying the number of hidden neurons. The best result was a training classification error of 13.6% and a validation classification error of 15.5%.
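The second stage of Option 2, a network with only fully connected layers trained on the 50-element vectors while the number of hidden neurons is varied, could look roughly like the NumPy sketch below. Everything specific here is an assumption for illustration: one tanh hidden layer with a sigmoid output, full-batch gradient descent, the learning rate and number of epochs, and the synthetic generator `make_toy_vectors`, which only mimics the observation that seizure vectors have larger values. It does not reproduce the 13.6%/15.5% results.

```python
import numpy as np


def make_toy_vectors(n_per_class, length=50, seed=0):
    """Hypothetical stand-in for std-pooled vectors: seizure windows (label 1)
    get larger values than non-seizure windows (label 0)."""
    rng = np.random.default_rng(seed)
    neg = rng.normal(0.2, 0.1, size=(n_per_class, length))
    pos = rng.normal(0.4, 0.1, size=(n_per_class, length))
    X = np.vstack([neg, pos])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def train_mlp(X, y, n_hidden, epochs=500, lr=0.5, seed=0):
    """One tanh hidden layer + sigmoid output, trained with full-batch
    gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)  # hidden activations, shape (n, n_hidden)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # P(seizure) for each window
        d_logit = (p - y) / n  # gradient of mean BCE w.r.t. the output logits
        d_h = np.outer(d_logit, W2) * (1.0 - h**2)  # backprop through tanh (old W2)
        W2 -= lr * (h.T @ d_logit)
        b2 -= lr * d_logit.sum()
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2


def error_rate(params, X, y):
    """Fraction of windows misclassified (threshold at logit 0, i.e. p = 0.5)."""
    W1, b1, W2, b2 = params
    logits = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((logits > 0).astype(float) != y))


# Standardize with training statistics only, then sweep the hidden-layer size.
X_tr, y_tr = make_toy_vectors(100, seed=1)
X_va, y_va = make_toy_vectors(50, seed=2)
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr, X_va = (X_tr - mu) / sd, (X_va - mu) / sd

results = {}
for n_hidden in (10, 20, 50):
    params = train_mlp(X_tr, y_tr, n_hidden)
    results[n_hidden] = (error_rate(params, X_tr, y_tr), error_rate(params, X_va, y_va))
    print(n_hidden, "hidden: train/val error =", results[n_hidden])
```

On this deliberately easy toy data every hidden size separates the classes almost perfectly; on the real vectors, the sweep over hidden sizes is what picks the best trade-off between training and validation error.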

This strategy worked, since the classification errors are low and the networks learn to differentiate between both classes, but the performance is not better than the one obtained with a single layer in the CNN, so adding this extra stage is not worth it. The idea was sound, and the standard deviation layer confirmed the reasoning behind it, but unfortunately it did not improve the previous results.

Figure 6.14: Generation of vectors from the 50×50 images by taking the standard deviation of each column

Figure 6.15: Standard deviation vectors produced by the standard deviation pooling of the 50×50 images. Each column is a vector, and the two classes (seizure and non-seizure) are separated by the red vertical dashed line.

One hypothesis is that computing the standard deviation vectors from the 50×50 images inside the CNN might work better. This would require implementing a standard deviation pooling layer, which does not exist in the CNN library that was used: its pooling layer can only be configured as max-pooling or mean-pooling, and it offers neither an implementation nor an option for std.dev-pooling. Perhaps, by extracting the 50×50 images from the CNN, processing them manually and then feeding them into another network, the two stages are no longer optimized jointly, so the overall model does not converge to the best point it could reach. This is only a hypothesis, and it would be very interesting to test it by implementing a std.dev-pooling layer in the library; due to time limitations, this was not attempted.
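For the hypothesized std.dev-pooling layer to be trained inside the CNN, it would have to pass gradients back to the convolutional layer. A minimal NumPy sketch of such a layer follows, independent of any particular CNN library; `std_pool_forward` and `std_pool_backward` are hypothetical names. It relies on the derivative of the population standard deviation, d sigma_j / d x_ij = (x_ij - mu_j) / (H * sigma_j), where the term coming from the mean cancels because the deviations in a column sum to zero. The small `eps` is an assumption that keeps the gradient finite for the nearly constant columns typical of non-seizure windows, and a finite-difference check is included to verify the analytic gradient.

```python
import numpy as np


def std_pool_forward(x, eps=1e-8):
    """x: (H, W) feature map -> (W,) column standard deviations (population std).
    eps keeps the gradient finite for constant columns."""
    mu = x.mean(axis=0)
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=0) + eps)
    return sigma, (x, mu, sigma)


def std_pool_backward(grad_out, cache):
    """Map dL/dsigma (W,) to dL/dx (H, W) using
    d sigma_j / d x_ij = (x_ij - mu_j) / (H * sigma_j)."""
    x, mu, sigma = cache
    H = x.shape[0]
    return grad_out * (x - mu) / (H * sigma)


def numerical_grad(f, x, h=1e-6):
    """Central finite differences of a scalar function f at x (x is modified, then restored)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * h)
    return g


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
w = rng.normal(size=4)  # arbitrary downstream weights, so the loss is L = w . sigma
loss = lambda z: float(w @ std_pool_forward(z)[0])

sigma, cache = std_pool_forward(x)
analytic = std_pool_backward(w, cache)
numeric = numerical_grad(loss, x.copy())
print(np.abs(analytic - numeric).max())  # tiny: only finite-difference error remains
```

In a real CNN framework this pair would become the forward and backward methods of a custom layer, replacing the max- or mean-pooling stage after the first big layer.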

6.5 Evaluation

After all the tests and options explored, the final classifier selected was the CNN with only one big layer, shown in Figure 6.11. This architecture achieved a classification error of around 14% on the validation set, a good performance given that it uses the simplest possible input images. The final performance of the classifier on the training, validation and test datasets is:

• Training: accuracy=0.923, sensitivity=0.958, specificity=0.887

• Validation: accuracy=0.857, sensitivity=0.800, specificity=0.914

• Test: accuracy=0.861, sensitivity=0.778, specificity=0.944
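The three metrics follow from the confusion-matrix counts, with seizure as the positive class. The sketch below uses hypothetical counts, not data from this work. It also checks that the reported values satisfy accuracy ≈ (sensitivity + specificity)/2 for all three sets, which is what one would expect if each set contains the same number of seizure and non-seizure windows; that balance is inferred from the numbers, not stated in the text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity); positive class = seizure."""
    sensitivity = tp / (tp + fn)  # fraction of seizure windows detected
    specificity = tn / (tn + fp)  # fraction of non-seizure windows correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, NOT from the thesis: 50 seizure and 50 non-seizure windows.
acc, sens, spec = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(acc, sens, spec)  # 0.85 0.8 0.9

# With equally many windows per class, accuracy = (sensitivity + specificity) / 2.
REPORTED = {  # (accuracy, sensitivity, specificity) from the list above
    "train": (0.923, 0.958, 0.887),
    "validation": (0.857, 0.800, 0.914),
    "test": (0.861, 0.778, 0.944),
}
for split, (a, se, sp) in REPORTED.items():
    print(f"{split}: accuracy={a}, (sens + spec)/2={(se + sp) / 2:.4f}")
```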

Figure 6.16 shows the training and validation curves obtained as the mean performance over 10 different random selections of training and validation windows, with the corresponding error bars. Regardless of which windows are used for training and validation, the results are almost the same (the error bars are small), which suggests that the proposed algorithm and architecture are robust and do not depend on the particular training and validation instances. At the end of the 100 epochs, the mean training error is 0.0757 ± 0.0033 and the mean validation error is 0.1446 ± 0.0051.
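The curves and error bars in Figure 6.16 aggregate 10 runs at each epoch. A minimal sketch of that aggregation follows. Since the text does not say whether the ± values are standard deviations or standard errors of the mean, both are computed; the two toy runs are made-up numbers used only to show the shapes involved.

```python
import numpy as np


def summarize_runs(curves):
    """curves: array of shape (n_runs, n_epochs), one error curve per random
    training/validation split. Returns the per-epoch mean, the sample standard
    deviation across runs and the standard error of the mean."""
    curves = np.asarray(curves, dtype=float)
    n_runs = curves.shape[0]
    mean = curves.mean(axis=0)
    std = curves.std(axis=0, ddof=1)
    sem = std / np.sqrt(n_runs)
    return mean, std, sem


# Two made-up runs of two epochs each, only to illustrate the computation.
mean, std, sem = summarize_runs([[0.5, 0.2], [0.3, 0.1]])
print(mean, sem)  # mean = [0.4, 0.15], sem = [0.1, 0.05]
```

With 10 runs of 100 epochs, the final-epoch summary would be `mean[-1]` together with `std[-1]` or `sem[-1]`, depending on which error-bar definition is intended.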

Figure 6.16: Mean training and validation curves obtained with 10 different randomly selected sets of training and validation windows

The performance under the leave-one-out technique is shown in Figures 6.17 and 6.18. In these figures, the column for patient 1 corresponds to training the network without patient 1 and then testing the classifier on that patient over all the possible windows in that patient's records (as explained in section 5.9).
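The leave-one-out scheme here holds out an entire patient per fold rather than a single window. A generic sketch of the fold generation follows. It does not reproduce the windowing details of section 5.9, and the dictionary layout and placeholder window names are assumptions.

```python
def leave_one_patient_out(windows_by_patient):
    """Yield (held_out_patient, train_windows, test_windows), one fold per patient.

    windows_by_patient maps a patient id to the list of that patient's windows.
    The held-out patient's windows never appear in the training set.
    """
    for held_out, test_windows in windows_by_patient.items():
        train_windows = [
            w
            for pid, windows in windows_by_patient.items()
            if pid != held_out
            for w in windows
        ]
        yield held_out, train_windows, list(test_windows)


# Toy example with string placeholders instead of EEG windows.
data = {1: ["p1_w1", "p1_w2"], 2: ["p2_w1"], 3: ["p3_w1", "p3_w2"]}
for patient, train, test in leave_one_patient_out(data):
    print(patient, len(train), len(test))
# 1 3 2
# 2 4 1
# 3 3 2
```

Splitting by patient rather than by window keeps windows from the same recording out of both sets at once, which is what makes the per-patient sensitivity and specificity meaningful for unseen patients.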

Figure 6.17: Sensitivity obtained with the final classifier under the leave-one-out training method

Figure 6.18: Specificity obtained with the final classifier under the leave-one-out training method


is the evaluation time. It takes 0.053 seconds (53 ms) to classify an image as seizure or non-seizure, which means this classifier can process 18.87 two-minute windows per second. All these calculations were made in MATLAB R2015a on a dual-processor computer with Intel Xeon X5650 CPUs @ 2.67 GHz.
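The throughput figure is the reciprocal of the per-window latency, assuming windows are classified one at a time: 1 / 0.053 s ≈ 18.87 windows per second. A small timing helper of the kind one could use to measure such a latency is sketched below; `mean_latency` is a hypothetical helper, and measured values will depend on the hardware and implementation.

```python
import time


def windows_per_second(latency_s):
    """Throughput when windows are classified sequentially, one after another."""
    return 1.0 / latency_s


def mean_latency(classify, windows, repeats=3):
    """Average wall-clock seconds per call of classify(window)."""
    start = time.perf_counter()
    for _ in range(repeats):
        for window in windows:
            classify(window)
    return (time.perf_counter() - start) / (repeats * len(windows))


print(round(windows_per_second(0.053), 2))  # 18.87
```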
