Ordering of functions according to multiple fuzzy
criteria: Application to electroencephalography
by
Andrea Burgos Madrigal
Thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in the area of Computer Science
at the
Instituto Nacional de Astrofísica, Óptica y Electrónica
January, 2018
Tonantzintla, Puebla
Supervised by:
Dr. Felipe Orihuela Espina, Dr. Carlos Alberto Reyes García
© INAOE 2018 All rights reserved
The author grants INAOE permission to reproduce and distribute copies of this thesis in its
To my advisors, for their time, comments and lessons, since nobody else knows what this
work entailed. To my family and friends; those who are here and those who have had to leave. To all those who
have stood by me without hesitation. Finally, I am grateful for scholarship no. 426901 that was awarded to me
Contents
Notations xv
Abstract xvii
1 Introduction 1
1.1 The research problem in a nutshell . . . 1
1.2 Preliminaries . . . 2
1.2.1 Fuzzy sets and membership function; contribution from a physiological source . . . 5
1.3 Motivation and Justification . . . 6
1.4 Research Problem . . . 8
1.5 Research Questions . . . 10
1.6 Hypothesis . . . 11
1.7 Objectives . . . 11
1.8 Assessment: scope and limitations . . . 12
1.9 Contributions . . . 13
1.10 Chapter summary . . . 13
2 Theoretical Framework 15
2.1 Fuzzy sets . . . 15
2.1.1 Set properties and basic operations . . . 17
2.2 Relations . . . 17
2.2.1 Subsethood . . . 18
2.2.2 Properties of Binary Relations . . . 18
2.3 Orders . . . 19
2.3.1 Order Bounds . . . 20
2.3.2 Hasse Diagram . . . 21
2.4 Fuzzy ordering indices . . . 22
2.4.1 Fuzzy ordering by pairwise comparisons . . . 22
2.4.2 Transitive Closure . . . 23
2.4.3 α-cut . . . 24
2.4.4 Equivalence classes, groups and partial orders from α-cuts . . . 24
2.5 Probability models . . . 25
2.5.1 Markov chains . . . 25
2.5.2 The Box Cox transformation . . . 26
2.6 Criteria prioritization . . . 26
2.6.1 Additive model (crisp case) . . . 26
2.6.2 Linguistic Hedges (fuzzy case) . . . 27
2.6.3 Contrast intensification (fuzzy case) . . . 27
2.8 Centrality . . . 29
2.8.1 Independent Component Analysis . . . 30
2.9 Electroencephalography . . . 33
2.9.1 Denoising of EEG . . . 33
2.10 Chapter summary . . . 34
3 Related work 35
3.1 Ordering ICA Components in neuroimaging . . . 35
3.2 Estimation of Membership Functions . . . 37
3.3 Fuzzy Ordering . . . 39
3.4 Chapter Summary . . . 40
4 Proposed Solution 41
4.1 Overview: Determination of the ordering criterion . . . 41
4.2 Membership functions . . . 42
4.2.1 Knowledge based memberships . . . 43
4.2.2 Prototype based memberships . . . 45
4.2.3 Distribution based membership . . . 46
4.3 Fuzzy ordering of functions . . . 48
4.3.1 Equalize membership values . . . 48
4.3.2 Lifting . . . 49
4.3.3 Criteria’s weighting . . . 49
4.3.5 Ensuring transitiveness (Preorder): transitive closure of the subsethoods . . . 52
4.3.6 Apply cutset defuzzification . . . 53
4.3.7 Ordering relation: Detection of equivalence classes for removal of potential symmetry . . . 54
4.3.8 Ensuring antisymmetry: Grouping equivalent classes . . . 56
4.3.9 The Hasse diagram: The fuzzy-zeta matrix of adjacency . . . 56
4.4 Generation of synthetic order examples for evaluation . . . 57
4.4.1 Evaluation of weighting strategies . . . 60
4.4.2 Application of an order . . . 60
4.5 Chapter Summary . . . 60
5 Experiments and Results 61
5.1 Data used in experiments . . . 61
5.2 Experiment 1: Assessment membership level functions . . . 64
5.3 Experiment 2: Stability or Tolerance to noise . . . 68
5.4 Experiment 3: Effect of different weighting strategies . . . 72
5.5 Experiment 4: Performance in EEG application . . . 77
5.5.1 Imagined speech . . . 77
5.5.2 Attention . . . 80
5.6 Empirical demonstration of the properties of the order relation . . . 85
6 Conclusions and Future Work 87
6.1 Conclusions . . . 87
6.1.1 Membership levels . . . 87
6.1.2 Weighting criteria . . . 88
6.1.3 Ordering method . . . 89
6.1.4 Application to EEG . . . 90
6.2 Future Work . . . 90
A Appendix 93
A.0.1 Derivation of the distribution . . . 93
List of Figures
1.1 Fuzzy sets nature . . . 5
1.2 Motivation and Justification . . . 7
1.3 Problem statement . . . 9
2.1 Classical Graph vs Hasse Diagram . . . 21
2.2 Concentration, dilation and intensification modifiers . . . 28
2.3 Example ICA . . . 32
4.1 Proposed methodology . . . 42
4.2 Subsets selected for the thesis . . . 43
4.3 Energy of a signal in the frequency bands . . . 44
4.4 Blink eye signals . . . 46
4.5 Real heart rate signals . . . 47
4.6 Result of transitive subsethoods following defuzzification with the α-cut (α = 0.76) . . . 53
4.7 Equivalent classes . . . 55
4.8 Resulting relation from example . . . 56
4.9 Example of a synthetic order automatically generated, and comparison with an aimed order . . . 59
5.1 Imagined Speech distributions across the scalp . . . 63
5.2 The Imagined Speech dataset and the EMOTIV Epoc+ device . . . 63
5.3 Biosemi ActiveTwo with 32 channels . . . 64
5.4 Synthetic data . . . 65
5.5 Spectra of sources and components . . . 66
5.6 Raw and distributed membership levels . . . 67
5.7 Synthetic δ signals for evaluating knowledge based membership functions tolerance to noise . . . 68
5.8 Synthetic blink signals for evaluating prototype based membership functions tolerance to noise . . . 69
5.9 Synthetic heart beat signals for evaluating distribution based membership functions tolerance to noise . . . 70
5.10 Tolerance to noise replications . . . 71
5.11 Membership values tendency across weighting methods . . . 72
5.12 Determinant of the matrices of membership values for each component across criteria modulated by the weighting strategy for a single execution . . . 73
5.13 Determinant of the matrices of membership values for each component across criteria modulated by the weighting strategy across 80 replications . . . 74
5.14 Memberships values weighted for the 5 criteria set . . . 75
5.15 Memberships values weighted for the 6 criteria set . . . 76
5.17 Orders achieved for EEG data . . . 78
5.18 Example ordering strategies outputs: imagined speech . . . 79
5.19 Comparison between fuzzy methods based on the number of edges retrieved: imagined speech . . . 80
5.20 Classification rates supported by the different orderings for imagined speech dataset classified with Random Forest . . . 81
5.21 Number of blocks sampled from the attention dataset depending on the duration . . . 82
5.22 Band pass filtering . . . 83
5.23 Example of ordering strategies output: Attention scenario . . . 83
5.24 Comparison between fuzzy methods based on the number of edges retrieved: Attention . . . 84
5.25 Precision and accuracy reported for attention scenario . . . 85
List of Tables
1.1 Example of ordering according to multiple fuzzy criteria . . . 4
2.1 Set identities: crisp sets . . . 17
2.2 Properties of crisp binary relations . . . 19
2.3 Properties of fuzzy binary relations . . . 19
2.4 Properties: relations of order . . . 20
2.5 Frequencies and associated states of mind common in the brain signal . . . 34
3.1 Related work on ordering ICA components in neuroimaging . . . 36
3.2 Related work about membership functions . . . 38
3.3 Related work about fuzzy ordering by pairwise comparisons . . . 39
4.1 Resulting membership levels . . . 49
4.2 Membership levels already equalized and with the bottom added. Note that the bottom is an element that does not have membership to any criterion . . . 51
4.3 Subsethoods among functions . . . 52
4.4 Transitive closure of the subsethood . . . 52
4.5 The cutset obtained with an α-cut = 0.76 (calculated by the Otsu method from Table ??) from an unweighted input . . . 53
4.6 Detecting symmetry. Equivalent classes . . . 55
4.7 Fuzzy-zeta matrix used to create the Hasse diagram. . . 56
4.8 Similitudes . . . 58
4.9 Similitudes weighted . . . 58
Nomenclature
$\mu_{c_i}(y_j)$    Membership level of a component $y$ to some criterion $c$.
$c_i = \{(c_{i,k}, \mu_{cc_i})\}$    Prototype-based criterion definition proposal.
$c_i = g(X) + \varepsilon$    Knowledge-based criterion definition proposal.
$c_i = P(\eta, \sigma)$    Distribution-based criterion definition proposal.
$\eta$    Mean value.
$\sigma$    Standard deviation.
$\mu^{con}_{c_i}(y_j)$    Concentration operator.
$\mu^{dil}_{c_i}(y_j)$    Dilation operator.
$\mu^{int}_{c_i}(y_j)$    Contrast intensifier operator.
$\delta$    Electroencephalographic frequency band named Delta, in the range of 1 to 4 Hz.
$\theta$    Electroencephalographic frequency band named Theta, in the range of 4 to 8 Hz.
$\alpha$    Electroencephalographic frequency band named Alpha, in the range of 8 to 12 Hz.
$\beta$    Electroencephalographic frequency band named Beta, in the range of 13 to 20 Hz.
$\alpha$-cut    Defuzzifier to crisp sets. Also found as $\alpha$-level.
Abstract
This thesis addresses the problem of ordering functions, here referred to as components, over multiple fuzzy criteria. Current solutions require explicit quantification of the relevance of the criteria to the ordering, which may be unavailable. We hypothesized that the relevance can be encoded in a weighting strategy such that the resulting ordering of the components approaches the one an expert would have produced. The solution relies on a new set of membership functions to the criteria and on the incorporation of intensifiers into the ordering relation to yield the weighting strategy. The new ordering relation is applied to electroencephalography (EEG), where the relevance of independent components to a certain neuroscientific process has to be determined. Three new fuzzy membership functions are proposed: knowledge based, prototype based and distribution based. The validity of the new membership functions is established by showing, over synthetic data, convergent membership values (high values for components close to a criterion) and divergent membership values (low values for components unrelated to the criterion). The membership functions tolerated noise up to 0.25, 0.4 and 0.2 for the knowledge, prototype and distribution based functions respectively before the ordering was affected. Two weighting strategies mapping qualitative appreciations to quantitative contributions (contrast modifiers and linguistic hedges) are tested, and their performance is compared to two explicit weighting strategies (unweighted, additive). The suggested weighting methods changed the determinant of the mixture matrix (Friedman: p < 0.05). Compared to the unweighted strategy, linguistic hedges showed higher similarity to the aimed order. Finally, a new order relation is proposed by integrating weighting, equalization and lifting into the comparison process. The performance of the new ordering relation is assessed on two ordering tasks over EEG datasets, one on imagined speech and the other from an attentional task labelled by an expert psychologist. Performance was assessed through classification wrapping and compared to the ordering obtained with the Hurst method, proposed ad hoc for ordering components of the imagined speech dataset. In both datasets, the classification after ordering is above chance; it is closer to the Hurst method (than to chance) for the imagined speech dataset, and statistically indistinguishable from this standard when accuracy rates are compared in the attention dataset. In the attention dataset, the Hurst method failed to order the components, whilst our proposal maintained some discriminative capacity. The contributions are (i) the modeling of the criteria in terms of three new membership functions, (ii) the incorporation of intensifiers as a means to automatically resolve the relevance of the functions and (iii) a new ordering relation. EEG studies of brain activity can benefit from the proposal.
Chapter 1
Introduction
Ordering relations are pervasive in everyday life and science. Relations of order may be required over functions, across multiple ordering criteria and under fuzzy domains, a problem for which current solutions are still unsatisfactory. This thesis investigates a novel relation of order that uses qualitative knowledge of the domain to automatically modulate the importance of the functions across the multiple fuzzy criteria. This chapter introduces the research carried out in this thesis. It provides a description of the problem, an overview and the rationale of the solution. It defines the critical elements of this thesis: the research questions, hypothesis, goals, contributions, scope and limitations.
1.1 The research problem in a nutshell
Electroencephalograms are affected by a number of artifacts such as eye movement, eye blinks, muscle activity or heart beats [Delorme et al., 2007, Joyce et al., 2004], which can be alleviated using several techniques. A popular choice is Independent Component Analysis (ICA), whereby the observed signals are decomposed into sources (components). The components retrieved by Independent Component Analysis are yielded unsorted, without any quantitative or qualitative guidance on what the components represent. The analyst then chooses which components to keep and which to discard, and reconstructs a cleaner Electroencephalography (EEG) record from the kept components for further analysis. The expert implicitly ranks the components according to what they consider to be more relevant for the cognitive process under study. Efforts to provide some automatic sorting or ranking of the Independent Component Analysis components of the EEG record often rely on correlations against the stimulus train, with low correlation expected for noise-related components and high correlation expected for stimulus-evoked components [Katura et al., 2008]. Then, a threshold decides which components to keep. The success of these approaches has been limited, perhaps because the Independent Component Analysis components may not always be crisply associated with noise or signal.
This thesis is motivated by the need to produce a ranking of the Independent Component Analysis components for Electroencephalography. From an abstract point of view, this is equivalent to a problem of ordering functions (components) according to multiple fuzzy criteria, a problem which arises in neuroimaging but also in other domains. To do so, the thesis assumes that each of the functions can be fuzzily associated with sets (criteria) characterizing some underlying subprocess known in advance. Then, the ranking of functions is achieved by evaluating the importance of a component in terms of its membership to each one of the known sets.
1.2 Preliminaries
Orders are used naturally in many daily activities such as, for instance, picking the best fruit in the market, choosing where to go on Sunday or deciding which coin has a bigger value. Orders are of utmost importance in science where the fundamental property of magnitude in measuring systems is characterized by an order. Any task requiring sorting is supported by a mathematical relation of order even if implicit. Relations of order, or simply orders, are relations holding the properties of reflexivity, antisymmetry and transitivity according to a sorting criterion defining the precedence (≤) of elements. Orders can be total or partial depending on compliance with trichotomy.
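The three defining properties, and the comparability condition that separates total from partial orders, can be checked mechanically on a small finite relation. The following is a minimal sketch under stated assumptions: the relation is stored as a set of ordered pairs, and the helper names and the divisibility example are hypothetical illustrations, not part of the thesis.

```python
from itertools import product


def is_reflexive(elements, relation):
    """Every element is related to itself."""
    return all((a, a) in relation for a in elements)


def is_antisymmetric(elements, relation):
    """a <= b and b <= a together imply a == b."""
    return all(
        a == b or not ((a, b) in relation and (b, a) in relation)
        for a, b in product(elements, repeat=2)
    )


def is_transitive(elements, relation):
    """a <= b and b <= c together imply a <= c."""
    return all(
        (a, c) in relation
        for a, b, c in product(elements, repeat=3)
        if (a, b) in relation and (b, c) in relation
    )


def is_total(elements, relation):
    """Every pair of elements is comparable in at least one direction."""
    return all(
        (a, b) in relation or (b, a) in relation
        for a, b in product(elements, repeat=2)
    )


# Hypothetical example: "a divides b" on {1, 2, 3, 6}.
elements = [1, 2, 3, 6]
divides = {(a, b) for a, b in product(elements, repeat=2) if b % a == 0}

is_order = (
    is_reflexive(elements, divides)
    and is_antisymmetric(elements, divides)
    and is_transitive(elements, divides)
)
print(is_order, is_total(elements, divides))  # prints: True False
```

Divisibility is an order, but 2 and 3 are incomparable, so it is only a partial one.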
Three further considerations have to be made when defining a relation of order. First, whether the ordering criterion is single, or the sorting has to be made considering multiple criteria. Second, whether the objects are characterized by constant information or a function of it. Third, whether membership of objects to the set upon which the order is to be defined is dichotomical or not.
Ordering across multiple ordering criteria. If you buy a car, you rarely choose based only on price, but also take other considerations into account. Ordering over multiple criteria is just as common in everyday life as ordering over a single criterion. Reconciling all the criteria imposes additional considerations. Mathematically, the set of objects to be sorted is the same, but it is associated with one or more other sets, each image set with its own ordering criterion. It is possible to think of the ordering over multiple criteria as defining an order over a vector rather than over a scalar.
Ordering functions. If you want to rank the stocks of several companies to decide where to invest (ranking investment alternatives), will you just take the current value, or the trend over the latest months? The ordering is not made over a fixed value of the independent variable; the whole function has to be taken into account. The ordering is made over functions.
Ordering over fuzzy sets. Say that you want to sort three avocados according to whether they are ripe or not. Ripeness is not a crisp criterion. The ordering has to take into account the membership of the object to the set over which the ordering criterion is defined. For such applications, the ordering may be expressed under fuzzy set theory, with the simplest case attaching the precedence sequence to the membership value. When the domain of application consists of fuzzy sets (membership of elements to sets is not dichotomical), the "classical" definitions of the aforementioned properties of reflexivity, antisymmetry, transitivity and trichotomy are modified accordingly. The boundary of a fuzzy set is not sharp. Ordering over fuzzy sets may still result in a crisp ordering even though the domain, i.e. the sets over which the ordering is defined, is fuzzy. This is the case studied here. There are also orderings that are fuzzy themselves, but we will not work on those in this thesis.
The above considerations may all occur together, with increasing degree of complexity. For instance, let $P = f(C) \mid C = \{c_1, c_2, c_3\}$ be the process of commuting to work, where every fuzzy subset $c_n \in C$ is a criterion: speedy $c_1$, safe $c_2$ and cheap $c_3$. Then, let $O = \{o_1, o_2, o_3\}$ be the alternatives to be sorted: bike, subway and car respectively. The membership level $\mu_i(o_j)$ of commuting alternative $o_j$ to criterion $c_i$ is given in Table 1.1. Further, the commuting alternative $o_j$ changes with time (not shown in Table 1.1), e.g. the car becomes old, the bike breaks or the subway service gets worse; that is, $o_j(t)$, but we shall keep the notation $o_j$ for simplicity. In this example, it may be possible for the decision maker to exploit a priori information from the process $P$, perhaps considering preferences regarding each criterion. For example, whenever the commute happens early, she may favour the safety criterion over the others, while if she is running late she may favour speediness. The resulting order will change accordingly. All the aforementioned considerations are present in this example: ordering across multiple criteria $c_n$, ordering over fuzzy sets with membership function $\mu_{c_i}$, and ordering over functions.
Table 1.1: Membership level $\mu_i(o_j)$ of the commuting alternatives to the defined criteria for the commuting example.

                 Speedy $c_1$   Safe $c_2$   Cheap $c_3$
Bike $o_1$           0.2           0.3           1
Subway $o_2$         0.9           0.4           0.8
Car $o_3$            0.7           0.9           0.3
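To make the effect of criterion preferences concrete, the sketch below ranks the alternatives of Table 1.1 with a plain weighted sum. This is only an illustration of an explicit additive weighting: the weight vectors are hypothetical values chosen for the example, and the weighted sum is not the relation of order proposed in this thesis.

```python
# Membership levels from Table 1.1, as (speedy, safe, cheap) per alternative.
MEMBERSHIP = {
    "bike": (0.2, 0.3, 1.0),
    "subway": (0.9, 0.4, 0.8),
    "car": (0.7, 0.9, 0.3),
}


def rank(weights):
    """Rank alternatives by weighted sum of memberships, best first."""
    scores = {
        name: sum(w * m for w, m in zip(weights, levels))
        for name, levels in MEMBERSHIP.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights (speedy, safe, cheap); not taken from the thesis.
early_commute = rank((0.2, 0.6, 0.2))  # safety favoured
late_commute = rank((0.6, 0.2, 0.2))   # speediness favoured

print(early_commute)  # ['car', 'subway', 'bike']
print(late_commute)   # ['subway', 'car', 'bike']
```

Changing only the weights swaps the first two positions, which is the behaviour described in the commuting example. Note that a weighted sum always yields a total order and requires the weights to be stated numerically.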
This example is remarkably similar to our Electroencephalography application of interest. The process in our case may be some cognitive process of interest under investigation. The sorting criteria are typical considerations in Electroencephalography analysis, e.g. energy in the $\alpha$ band, the occurrence of blinking, etc. The objects to be sorted, our ICA components, are functions. Further, if the neuroscientific question changes, then the relevance of the sorting criteria changes accordingly.
1.2.1 Fuzzy sets and membership function; contribution from a physiological source
An example of the level of membership to a given set is illustrated in Figure 1.1. Let $\theta$ be the set of time series with frequency components in the range from 4 to 7 Hz, and let the membership of a function to this set be given by some indicator of the energy that the series has in that frequency window. Figure 1.1 shows several potential elements. As exemplified, the level of membership to the set depends on the frequency content, the amplitude and the number of instances of the signal.
Figure 1.1: The sets representing the sorting criteria are fuzzy because each element to be sorted has a different degree of membership to them. The theta set is illustrated. Signals with frequencies in other ranges will get lower membership values than signals with frequencies only in the theta range. Signals with higher amplitude (i.e. more energy) in the band have higher membership to the set than signals with lower amplitude. Finally, signals with more instances will have higher membership to the set than signals with fewer.
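One possible "indicator of the energy" in a band is the fraction of spectral energy that falls inside it. The following is a minimal sketch under that assumption; the function names, the direct DFT, the sampling rate and the test signals are all hypothetical, and this is not the membership function defined later in the thesis.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum via a direct DFT (slow but dependency-free)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_membership(signal, fs, low=4.0, high=7.0):
    """Fraction of non-DC spectral energy inside [low, high] Hz, in [0, 1]."""
    power = power_spectrum(signal)
    n = len(signal)
    total = sum(power[1:])  # skip the DC bin
    if total == 0.0:
        return 0.0
    in_band = sum(p for k, p in enumerate(power) if low <= k * fs / n <= high)
    return min(1.0, in_band / total)


# Hypothetical test signals: 4 s sampled at 64 Hz.
fs, n = 64, 256
t = [i / fs for i in range(n)]
theta_like = [math.sin(2 * math.pi * 6 * ti) for ti in t]   # 6 Hz, inside theta
alpha_like = [math.sin(2 * math.pi * 10 * ti) for ti in t]  # 10 Hz, outside theta
mixed = [a + b for a, b in zip(theta_like, alpha_like)]

m_theta = band_membership(theta_like, fs)
m_alpha = band_membership(alpha_like, fs)
m_mixed = band_membership(mixed, fs)
```

Under these assumptions the 6 Hz signal gets a membership close to 1, the 10 Hz signal close to 0, and the equal mixture close to 0.5. Being a relative measure, this indicator ignores absolute amplitude and number of instances, which Figure 1.1 also lists as factors.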
In this thesis, the application domain is EEG. The information to be ordered comes from the electrical activity of the brain captured by EEG. The EEG signals have been processed for artifact removal using Blind Source Separation (BSS), specifically Independent Component Analysis, and we aim to sort the Blind Source Separation components according to their relevance for the cognitive process being investigated [Zibulevsky and Pearlmutter, 2001].
1.3 Motivation and Justification
In neuroscience, Electroencephalography (EEG) monitors the electrical activity of the brain. The EEG recording is affected by a number of artifacts (contributions from unwanted sources) such as ocular or muscular activity. Interrogating EEG data often requires isolating the information of interest regarding the cognitive process under investigation [Jung et al., 2000a]. A popular choice for artifact removal in EEG is Independent Component Analysis (ICA), a type of Blind Source Separation (BSS) approach that permits decoupling the effects of independent contributors to the EEG signal [Joyce et al., 2004]. ICA is a powerful decoupling technique, but its use comes at a price: the sources identified by ICA (i.e. the components¹) are not labelled. Labelling the components has to be done by hand or automatically using some heuristics about which sources do or do not represent information of interest [Tohka et al., 2008]. This is in effect an exercise in ordering functions, where the ones mostly related to noise are rejected.
Traditionally, in any attempt to decide which components to keep and which to reject, the dichotomical decision has been to establish whether each component is noise or not, assuming that the components correspond to pure sources [LaConte et al., 2001, Katura et al., 2008]. However, sources are more often than not impure, i.e. they do not represent orthogonally independent physical processes as ICA assumes. Keeping or rejecting them involves accepting noise or removing signal, neither of which is desirable. When the components are not pure, or analogously when they fuzzily represent certain purer contributions, rejecting the noise is not the same as looking for the signal. This is illustrated in Figure 1.2, where an order specifically generated for eliminating noise is compared with another that ranks according to the relevance of the component for the cognitive process. Over 100 executions of ICA, it is observed that the trend of one ordering is not equivalent to the negation of the other ordering method. At this point, there is a need for ordering functions, i.e. ICA output components, according to multiple criteria, i.e. physiological contributions, that are characterized fuzzily, i.e. each component can convey information from one or more physiological sources and to a varying degree.

¹ In this document, we will refer to the variables afforded by Independent Component Analysis as sources or components interchangeably. Mathematically, a component is one of the elements of a basis defining a coordinate system. Locations within a space are a combination (linear or not) of scaled components, for instance the classical $3\vec{i} + 4\vec{j} + 2\vec{z}$ in the Cartesian 3-dimensional space. Independent Component Analysis outputs a basis for the space of all possible recordings, with the particularity that the basis is formed by independent components.
(a) Ordering focusing on eliminating noise based on Hurst coefficient [Torres-García et al., 2016].
(b) Ordering focusing on the relevance for the process based on similarity to a task related signal [Katura et al., 2008].
Figure 1.2: In fuzzy sets, the complement of a set is not the negation of the set; thus, ordering to eliminate noise is not analogous to ordering to keep signal-related information. In the plots, the colors differentiate the resulting rank of each component over 100 executions to account for ICA variability.
Efforts have been made in the literature to achieve ordering of functions over multiple fuzzy criteria [Van de Walle et al., 1995, Bruggemann et al., 2011, Haven, 1998]. However, solutions thus far are limited. They require explicit quantification of the contribution of each criterion to the ordering, which may be unknown or unavailable. Often, an expert asked to sort such Independent Component Analysis components will be able to tell qualitatively, e.g. 'very much', which of the components are more relevant and thus must be kept, and which are dominated by noise and thus ought to be discarded [Van de Walle et al., 1995]. However, the expert will be unable to explicitly quantify, i.e. put a specific value on, how much every component is contributing, as an additive model would do [Van de Walle et al., 1995] (this case is also analyzed in this thesis). The only order found that exploited such qualitative modifiers to produce the final ordering was [Chuang et al., 2006], but it used only a concentration and did not compare the translators from qualitative appreciations to quantitative assessments that exist in fuzzy theory for different purposes [Zadeh, 1972]. Besides, we are not aware that these approaches for ordering functions over multiple fuzzy criteria have been used to solve the described demand in neuroimaging analysis.
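For reference, the fuzzy modifiers alluded to above can be sketched in their common textbook form: concentration squares the membership, dilation takes its square root, and contrast intensification pushes values away from 0.5. The exponents 2 and 0.5 are the usual textbook choices and are an assumption here; the thesis may parameterize its operators differently.

```python
def concentration(mu):
    """Concentration (often read as "very"): lowers memberships below 1."""
    return mu ** 2


def dilation(mu):
    """Dilation (often read as "more or less"): raises memberships above 0."""
    return mu ** 0.5


def intensification(mu):
    """Contrast intensification: values below 0.5 decrease, values above increase."""
    if mu <= 0.5:
        return 2 * mu ** 2
    return 1 - 2 * (1 - mu) ** 2


for mu in (0.2, 0.5, 0.8):
    print(mu, concentration(mu), dilation(mu), intensification(mu))
```

All three operators map $[0, 1]$ to $[0, 1]$ and leave 0 and 1 fixed, so a qualitative appreciation changes the membership levels without requiring an explicit numeric weight.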
1.4 Research Problem
Let $X = \{x_1, x_2, \ldots, x_j\}$ be a set of measured signals, e.g. EEG channel timecourses, and $P$ be a hidden generating process of interest, e.g. a certain stimulus-evoked cognitive activity, to which certain latent variables $C = \{c_1, c_2, \ldots, c_i\}$ are known to contribute, $P \simeq f(C)$. In EEG, these may correspond to activity in a certain frequency band or to a systemic contribution. $C = \{c_1, c_2, \ldots, c_i\}$ is a set of sorting criteria to be considered for the process $P$. For a certain process, these may be given or defined through the literature. Also, let $Y = \{y_j : C \to \mathbb{R}\}$ be a set of functions $y_j$ de-mixed by $A$ from the measured signals $x_l$: $Y = AX$, e.g. the components found by ICA from the EEG recordings. The problem is to generate a relation of order $r_P$ over $Y$, such that the resulting indexing $J_r(Y)$ of the elements $y_j \in Y$ reflects the relevance of each $y_j$ for $P$ according to a given appreciation of the importance of $c_i$ for $P$, $W_P(C; \mu)$, as illustrated in Figure 1.3.
By definition, as variables, the sorting criteria are sets. Since the functions to be evaluated present levels of membership rather than being just 0 or 1, here the criteria are defined as fuzzy sets. In this thesis, the membership of $y_j$ to a fuzzy criterion set $c_i$ is denoted as:

$$\mu_{c_i}(y_j) \in [0, 1] \qquad (1.1)$$

where $\mu_{c_i}(y_j)$ is the evaluation of the component $y_j$ in the criterion set $c_i$. As will be discussed, the $c_i$ may be described in different manners depending on the evidence or information known about the criterion set, and this will be one of the contributions of this thesis. Three possible manners to obtain the specification of the set $c_i$ are:
• Prototype-based; where the description of $c_i$ is in terms of prototypes, $c_i = \{(c_{i,k}, \mu_{cc_i})\}$, with $c_{i,k}$ being the $k$-th prototype describing $c_i$ and $\mu_{cc_i}$ the membership to set $c_i$.
• Knowledge-based; where the description of $c_i$ is in terms of an explicit generative model, e.g. $c_i = g(Z) + \varepsilon$ with $Z$ being some independent factor, e.g. time.
• Distribution-based; where the description of $c_i$ is in terms of the probability (density) $c_i = p(\eta, \sigma)$ assumed for those observed outcomes given the parameter values of a known model that describes a set.

Figure 1.3: The functions to be ordered $y_j$ have some membership level $\mu_{c_i}(y_j)$ to each set (or criterion) $c_i$. This degree of membership is obtained depending on the information known about the criterion $c_i$. Then, the criteria are prioritized depending on their relevance $W_P(C)$ for the process $P$. The problem is to define the relation of order $r_P$ exploiting $W_P(C)$. The expected result is an ordering $J_r(Y)$ of the elements evaluated.
In addition to affording the relation of order $r_P$ as the main contribution, this thesis will further propose ways to estimate these memberships depending on how the $c_i$ are described. For our application domain, $y_j$ will be the output of a Blind Source Separation, and may be referred to as components². Finally, we shall contribute
² Please note that this may be appreciated as ambiguous, since mathematically they
The relation of order $r_P$ shall be crisp but, considering that the elements may not contain information from the same sources, it is not required to be a total order. The associated resulting indexing $J_r(Y)$ may be given in terms of a Hasse diagram. We shall give an empirical proof that the proposed relation $r_P$ fulfils the properties of a relation of order.
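A Hasse diagram draws only the covering pairs of a partial order, i.e. the pairs $a < b$ with no element strictly in between. The sketch below extracts those pairs for a small hypothetical example. The element names, the membership vectors and the componentwise comparison are assumptions made for illustration only; the thesis builds its relation from subsethoods, not from this comparison.

```python
from itertools import product


def hasse_edges(elements, leq):
    """Covering pairs (a, b): a < b with no c such that a < c < b."""
    def lt(a, b):
        return a != b and leq(a, b)

    return sorted(
        (a, b)
        for a, b in product(elements, repeat=2)
        if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in elements)
    )


# Hypothetical membership vectors of four components over two criteria.
memberships = {
    "y1": (0.2, 0.3),
    "y2": (0.5, 0.6),
    "y3": (0.4, 0.9),
    "y4": (0.9, 0.9),
}


def dominated(a, b):
    """a precedes b if a's membership is not larger in any criterion."""
    return all(x <= y for x, y in zip(memberships[a], memberships[b]))


edges = hasse_edges(list(memberships), dominated)
print(edges)  # [('y1', 'y2'), ('y1', 'y3'), ('y2', 'y4'), ('y3', 'y4')]
```

Here y2 and y3 are incomparable, so the result is a partial order shaped like a diamond, and the pair (y1, y4) is omitted because it follows by transitivity.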
1.5 Research Questions
In this thesis, we aim at answering the following questions:
How to estimate the membership level $\mu_{c_i}(y_j)$ of an element $y_j$ to a fuzzy set $c_i$ considering the different ways the set may be described?
As described above, the sorting criteria sets may be described by prototypes, generative models or distributions. In all cases, given a function $y_j$ to be ordered, the membership levels to each criterion $\mu_{c_i}(y_j)$ have to be estimated. Of course, an infinite number of functions $\mu_{c_i}$ can be defined, e.g. a naive membership function may simply assign a constant value to every $y_j$. So the question is about choosing membership functions $\mu_{c_i}$ that have face validity, i.e. preserve the essence of the criterion; convergent validity, i.e. output high membership values for objects related to the criterion; and divergent validity, i.e. output low values for those unrelated to the criterion.
Given some expert qualitative evaluation regarding the relevance of different criteria $c_i$ for $P$, which among certain weighting strategies affords a relation of order $r_P$ that yields an ordering $J_r(Y)$ that best approximates the ground truth?
The weighting strategies to be examined in this research are: no weighting, explicit quantitative weighting (additive model), weighting based on contrast intensifiers and weighting based on linguistic hedges. Using the contrast intensifiers and the linguistic hedges is a contribution of this thesis and will be based on modifiers known in fuzzy set theory. Ground truth orderings will be generated synthetically and will be withheld from the tested relations of order. Assessment of the best approximation will be based on graph matching metrics (i.e. the Jaccard index and the XOR operation over adjacency matrices).

² (continued) of the $P$ process components $c_i$. But we will not refer to $c_i$ as components of $P$ as they are not expected to completely
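The two graph matching metrics can be sketched over the adjacency matrices of a retrieved order and an aimed order. This is a minimal illustration under stated assumptions: the function names and the two 3-node matrices are hypothetical, and the exact normalization used in the thesis may differ.

```python
def edge_set(adjacency):
    """Set of (row, column) positions holding a non-zero entry."""
    return {
        (i, j)
        for i, row in enumerate(adjacency)
        for j, value in enumerate(row)
        if value
    }


def jaccard_index(a, b):
    """Shared edges divided by edges present in either graph."""
    edges_a, edges_b = edge_set(a), edge_set(b)
    union = edges_a | edges_b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(edges_a & edges_b) / len(union)


def xor_distance(a, b):
    """Number of adjacency entries in which the two graphs disagree."""
    return sum(
        bool(x) != bool(y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Hypothetical aimed and retrieved orders over three elements.
aimed = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
retrieved = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]

similarity = jaccard_index(aimed, retrieved)  # 2 shared edges out of 3
mismatches = xor_distance(aimed, retrieved)   # 1 differing entry
```

A Jaccard index of 1 and an XOR count of 0 both indicate that the retrieved order reproduces the aimed order exactly.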
How does the proposed ordering strategy perform in the EEG application domain for sorting ICA components when compared against other ordering strategies based on stimulus-locked correlations?
The performance of the proposed ordering strategies will be evaluated on two experimental data sets, [Torres García et al., 2013] and [Soto et al., 2014], against expert manual ordering for concurrent validity. A comparison will be made against the approach chosen in [Torres García et al., 2013].
1.6 Hypothesis
If knowledge about the process $P$ is encoded in a weighting strategy $W_P(C)$ using fuzzy modifiers over preset ordering criteria $C$ such that $P \simeq f(C)$,
then a relation of order $r_P$ is generated such that its associated ordering $J_r(Y)$ statistically better matches an expert manual ordering than an approach that exploits experimental knowledge, whilst still alleviating the need to explicitly set the importance of the different functions.
1.7 Objectives
The main objective of the thesis is to propose a relation of order to rank a series of functions (components obtained from Blind Source Separation) according to a process characterized by a set of fuzzy criteria.
The specific objectives are:
• For a given domain of interest, establish an appropriate set of sorting criteria and define ways to calculate memberships to them. Accompany such definition of membership functions with evidence of convergent and divergent validity.
• Propose an ordering of functions over multiple fuzzy sets.
• Establish if, after experimental replications, the order holds (internal validity).
• Modify the relation considering the information available from the components to be ordered using fuzzy modifiers.
• Establish the concurrent validity of the proposed relation of ordering for the components using synthetic data against other ordering relations from the literature.
• For the domain of interest, order the independent components (obtained from Independent Component Analysis) associated with EEG records according to their relevance to the neural activity evoked by a specific task, and evaluate the result by classification, comparing the proposed relation of order with a gold standard.
1.8 Assessment: scope and limitations
Ranking objects is common in areas such as economics [Zavadskas and Turskis, 2011], industry [Halouani et al., 2009, Venkata Rao and Patel, 2010, Baker et al., 1992] and environmental decisions [McDaniels et al., 1999, Dongier et al., 2003, Kangas et al., 2001], among many others. In this thesis, the application domain is restricted to sorting signal components retrieved by ICA from an EEG. However, the problem is seen from a more abstract point of view and the proposed solution is amenable to different domains, e.g. [Bruggemann et al., 2011, Van de Walle et al., 1995]. The research focuses on generating the relation of order of functions considering multiple fuzzy criteria. This has several implications. First, having multiple criteria implies conflicts when evaluating the options, e.g. cost vs price. A consequent limitation is the necessity of a partial order instead of a total one. Second, choosing a fuzzy approach, where the bounds of the sets are not crisp, demands a way to calculate the level of membership of the objects to be sorted to the sets over which the ordering criteria are defined. Finally, as the objects to be ordered are functions, common scalar membership functions defined over fuzzy numbers are insufficient. This can be considered a limitation because a specific membership function should be applied depending on the information known about the set.
1.9 Contributions
• Three new membership functions for fuzzy sets are proposed depending on the previous definition of the criteria and the information given for it:
1. Comparison directly to prototypes of a set. Some synthetic prototypes are available where it is known that these prototypes are elements from the set.
2. Function from known information. That is, the information about the set is enough to define from there the membership function.
3. Proposal based on the distribution of the aimed model and the elements to be evaluated.
• The criteria defined contribute to the final decision in different degrees. In some cases, preferences for the different criteria are identified from an expert and a method to express that preference is needed. We propose fuzzy alternatives in terms of contrast intensification and linguistic hedges.
• It is common to deal with alternatives which are indifferent for the decision maker or non-comparable. The problem is defined in terms of a fuzzy methodology [Van de Walle et al., 1995] for obtaining an Order Relation whereby the use of a bottom element ⊥ guarantees that the output yields an order.
• A novel relation of order for functions based on fuzzy theory is put forth.
1.10 Chapter summary
Although the application could be extended, in this thesis it is confined to EEG. The objective is to generate a relation of order over functions according to the cognitive process being studied. Depending on the information known about the process, some subprocesses are identified and considered as criteria of interest to evaluate the contribution of the functions to the process.
Chapter 2
Theoretical Framework
This chapter brings together the concepts necessary to understand the research developed and its relevance. The chapter provides an introduction to fuzzy sets and relations of order. It also explains several advanced concepts used for the proposed ordering method, such as subsethood, transitive closure, α-cut, the weighting strategies and the Hasse diagram. Finally, the chapter is complemented with concepts related to the application domain.
2.1 Fuzzy sets
Definition 2.1.1 Let P be a classical set of objects, called the universe, whose generic elements are denoted by yj. Membership in a classical subset Ci of P is viewed as a characteristic function µCi from P to the valuation set {0, 1}, such that an element either is a member of the set or is not. These sets are called crisp sets. If the valuation set is allowed to be the real interval [0, 1], Ci is called a fuzzy set [Zadeh, 1965].
Definition 2.1.2 A boundary or frontier of a subset Ci of a space P is the set of points which can be approached both from the interior of Ci and from the outside of Ci. In other words, the boundary of the set is the subset of elements for which it is not possible to define a neighborhood that is made up only of interior points or only of exterior points. Interior points are those for which it is possible to define a neighborhood in which all neighbor points also belong to the set.
If a set contains its boundary then it is called a closed set, otherwise it is said to be an open set. A classical set has crisp boundaries [Ross, 2009]. Instead, fuzzy sets deal with the representation of entities whose boundaries are not crisp [Dubois et al., 2014, Ross, 2009].
Definition 2.1.3 A membership function µ: P → [0, 1] ⊆ R is a function from some domain P to the interval [0, 1] ⊆ R [Klir and Yuan, 1995].
Definition 2.1.4 Let P be some universal set. A fuzzy set Ci is a collection of pairs:
$$ C_i = \{(y_j, \mu_{C_i}(y_j)) \mid y_j \in P,\ \mu_{C_i}(y_j) \in [0, 1] \subseteq \mathbb{R}\} \tag{2.1} $$
The value of µCi(yj) is called the grade of membership of yj in Ci [Dubois, 1980].
The closer µCi(yj) is to 1, the more yj belongs to Ci. The fuzzy set Ci has no sharp boundary in P. When P is a finite set {y1, ..., yn}, a fuzzy set on P is expressed as [Zadeh, 1972]:
$$ C_i = \mu_{C_i}(y_1)/y_1 + \dots + \mu_{C_i}(y_n)/y_n = \sum_{j=1}^{n} \mu_{C_i}(y_j)/y_j \tag{2.2} $$
Each fuzzy set is completely and uniquely defined by one particular membership function; consequently, symbols of membership functions may also be used as labels of the associated fuzzy sets.
While the definitions of openness and closedness for a crisp set are straightforward, in fuzzy sets these classical definitions are insufficient and more elaborate topologies such as semi-open or semi-closed also appear [Ahmad and Kharal, 2009]. Note that a fuzzy set may also be well- or ill-defined, as well-definedness depends not on a property of the boundary such as fuzziness, but on the capacity to evaluate membership. If for any object yj you can evaluate its membership to P (regardless of being fuzzy), then the set is well-defined.
2.1.1 Set properties and basic operations
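The classical union (∪) and intersection (∩) [Aspnes, 2014] of ordinary subsets of P can be extended by the following formulas, proposed by [Zadeh, 1965]:
$$ \forall y_j \in P,\quad \mu_{C_i \cup C_k}(y_j) = \max(\mu_{C_i}(y_j), \mu_{C_k}(y_j)) = \mu_{C_i}(y_j) \vee \mu_{C_k}(y_j) \tag{2.3} $$
$$ \forall y_j \in P,\quad \mu_{C_i \cap C_k}(y_j) = \min(\mu_{C_i}(y_j), \mu_{C_k}(y_j)) = \mu_{C_i}(y_j) \wedge \mu_{C_k}(y_j) \tag{2.4} $$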
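Certain laws are important for the manipulation of sets. Table 2.1 lists De Morgan's laws and the complement laws, all of which hold for crisp sets (see [Rosen, 2007] for the proofs). Under the max and min operators above and the complement of Definition 2.1.5, De Morgan's laws continue to hold for fuzzy sets, whereas the complement laws do not.
Table 2.1: Set identities that hold for crisp sets. For fuzzy sets, the complement laws do not hold.
Identity | Name
$\overline{A \cap B} = \bar{A} \cup \bar{B}$ | De Morgan's law
$\overline{A \cup B} = \bar{A} \cap \bar{B}$ | De Morgan's law
$A \cup \bar{A} = U$ | Complement law
$A \cap \bar{A} = \emptyset$ | Complement law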
Definition 2.1.5 Let P be some universal set. The complement of a fuzzy set C is a fuzzy set C̃ with the membership function [Rutkowski, 2008]
$$ \forall y \in P:\quad \mu_{\tilde{C}}(y) = 1 - \mu_{C}(y) \tag{2.5} $$
Particularly relevant for this thesis is the fact that the complement laws do not hold for fuzzy sets. In the classical dichotomy of signal vs noise, eliminating noise is equivalent to enhancing signal. Under fuzzy modeling, this is not necessarily the case.
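As a minimal sketch of Equations 2.3 to 2.5, the following Python fragment stores a fuzzy set over a finite universe as a dictionary of membership grades and evaluates the two complement laws. The element names and grades (`signal`, `y1`, ...) are made up for illustration and are not taken from the thesis data.

```python
# Fuzzy sets over a finite universe, stored as {element: membership grade}.
# Element names and grades are hypothetical illustration values.

def fuzzy_union(a, b):
    """Pointwise maximum of the memberships (Equation 2.3)."""
    return {y: max(a[y], b[y]) for y in a}

def fuzzy_intersection(a, b):
    """Pointwise minimum of the memberships (Equation 2.4)."""
    return {y: min(a[y], b[y]) for y in a}

def fuzzy_complement(a):
    """Pointwise 1 - membership (Equation 2.5)."""
    return {y: 1.0 - a[y] for y in a}

signal = {"y1": 1.0, "y2": 0.75, "y3": 0.25}
noise = fuzzy_complement(signal)

# For crisp sets these would be all ones and all zeros respectively.
excluded_middle = fuzzy_union(signal, noise)
contradiction = fuzzy_intersection(signal, noise)
```

Only the crisp element `y1` satisfies both complement laws here; for `y2` and `y3` the union with the complement stays below 1 and the intersection stays above 0.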
2.2 Relations
The concept of a fuzzy relation is introduced as a generalization of crisp relations.
Binary relations may be written as (yj, yl) ∈ R, or in infix notation yj R yl.
2.2.1 Subsethood
Definition 2.2.2 Let C = {c1, ..., ci} be a set of criteria already defined. The fuzzy set yj is contained in the fuzzy set yl (or, equivalently, yj is a subset of yl, or yj is smaller than or equal to yl) if and only if ∀ci ∈ C: µci(yj) ≤ µci(yl) [Zadeh, 1965].
[Kosko, 1986] contends that if this inequality holds for all but just a few ci, one can still consider yj to be a subset of yl to some degree.
Definition 2.2.3 Fuzzy subsethood (SH) allows a given fuzzy set to contain another to some degree between 0 and 1. Kosko generalizes Zadeh's definition by using the subsethood measure in Equation 2.6, which was first proposed in [Sanchez, 1979].
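$$ SH(y_j, y_l) = \begin{cases} \dfrac{\sum_i \min(\mu_{c_i}(y_j), \mu_{c_i}(y_l))}{\sum_i \mu_{c_i}(y_j)} = \dfrac{|y_j \cap y_l|}{|y_j|} & \text{if } y_j \neq \emptyset \\ 1 & \text{otherwise} \end{cases} \tag{2.6} $$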
SH is a fuzzy set on the product yj × yl, where yj and yl are two Components. µci(yj) and µci(yl) are the scores (or memberships) of the alternatives (or components) on criterion ci. SH is calculated over the i criteria. Subsethood measures are also called inclusion grades [Young, 1996]. We may also interpret SH as a fuzzy (binary) relation between two Components with the following properties [Bruggemann et al., 2011]:
• Reflexive, since ∀yj ∈ Ci: SH(yj, yj) = 1, and
• It is not necessarily transitive.
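The subsethood measure of Equation 2.6 can be sketched in a few lines of Python. The function name `subsethood` and the membership vectors are illustrative assumptions; each vector holds the memberships of one component to the same criteria, in the same order.

```python
def subsethood(mu_j, mu_l):
    """Kosko subsethood SH(y_j, y_l) following Equation 2.6.

    mu_j and mu_l list the memberships of two components to the same
    criteria c_1..c_i, in the same order.
    """
    total = sum(mu_j)
    if total == 0:
        return 1.0  # an empty fuzzy set is contained in every set
    return sum(min(a, b) for a, b in zip(mu_j, mu_l)) / total

# Hypothetical memberships of two components to three criteria.
y_a = [0.5, 0.25, 0.25]
y_b = [1.0, 0.5, 0.0]

sh_ab = subsethood(y_a, y_b)  # (0.5 + 0.25 + 0.0) / 1.0
sh_ba = subsethood(y_b, y_a)  # (0.5 + 0.25 + 0.0) / 1.5
```

The two values differ, which illustrates that SH is not symmetric, while `subsethood(y_a, y_a)` equals 1 as required by reflexivity.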
2.2.2 Properties of Binary Relations
Let yj, yl, ys be elements of P and SH(yj, yl) a grade of subsethood (Section 2.2.1) [...] a universal set P are listed in Table 2.2. For fuzzy relations, these properties have to be redefined accordingly, as in Table 2.3.
Table 2.2: Properties of crisp binary relations R on P important for ordering
Name | Property
Reflexive | ∀yj ∈ P: (yj, yj) ∈ R
Irreflexive | ∃yj ∈ P: (yj, yj) ∉ R
Antireflexive | ∀yj ∈ P: (yj, yj) ∉ R
Symmetric | ∀yj, yl ∈ P: (yj, yl) ∈ R ⟺ (yl, yj) ∈ R
Asymmetric | ∀yj, yl ∈ P: (yj, yl) ∈ R ⇒ (yl, yj) ∉ R
Antisymmetric | ∀yj, yl ∈ P: (yj, yl) ∈ R and (yl, yj) ∈ R ⇒ yj = yl
Transitive | ∀yj, yl, ys ∈ P: (yj, yl), (yl, ys) ∈ R ⇒ (yj, ys) ∈ R
Antitransitive | ∀yj, yl, ys ∈ P: (yj, yl), (yl, ys) ∈ R ⇒ (yj, ys) ∉ R
Nontransitive | Does not satisfy transitivity
Trichotomy | ∀yj, yl ∈ P: yj = yl or yj R yl or yl R yj
Table 2.3: Properties of fuzzy binary relations relevant for ordering.
Name | Property
Reflexive | ∀yj ∈ P: SH(yj, yj) = 1
Symmetric | ∀yj, yl ∈ P: SH(yj, yl) = SH(yl, yj)
Antisymmetric | ∀yj, yl ∈ P: SH(yj, yl) > 0 and yj ≠ yl ⇒ SH(yl, yj) = 0
Transitive | ∀yj, yl, ym ∈ P: min{SH(yj, ym), SH(ym, yl)} ≤ SH(yj, yl)
Trichotomy | ∀yj, yl ∈ P: yj = yl or SH(yj, yl) > 0 or SH(yl, yj) > 0
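The fuzzy properties can be checked mechanically on a small relation matrix. The sketch below is only an illustration: the matrix `sh` is invented, `sh[j][l]` stands for SH(yj, yl), and antisymmetry is read as "SH(yj, yl) > 0 with yj ≠ yl implies SH(yl, yj) = 0".

```python
def is_reflexive(rel):
    return all(rel[j][j] == 1 for j in range(len(rel)))

def is_symmetric(rel):
    n = len(rel)
    return all(rel[j][l] == rel[l][j] for j in range(n) for l in range(n))

def is_antisymmetric(rel):
    # Assumed reading: two distinct elements never contain each other
    # to a positive degree in both directions.
    n = len(rel)
    return all(not (rel[j][l] > 0 and rel[l][j] > 0)
               for j in range(n) for l in range(n) if j != l)

def is_max_min_transitive(rel):
    n = len(rel)
    return all(min(rel[j][m], rel[m][l]) <= rel[j][l]
               for j in range(n) for m in range(n) for l in range(n))

# Hypothetical subsethood matrix for three components.
sh = [[1.0, 0.75, 0.0],
      [0.5, 1.0, 0.5],
      [0.0, 0.0, 1.0]]

properties = (is_reflexive(sh), is_symmetric(sh),
              is_antisymmetric(sh), is_max_min_transitive(sh))
```

This example relation is reflexive but neither symmetric, antisymmetric nor transitive, which is the typical situation that motivates the extra processing steps of Section 2.4.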
2.3 Orders
Ordering, or ranking, refers to prioritizing following some association rule, i.e. a relation. A relation of order, or simply order, is a binary relation on a set that meets some properties and, as a consequence, establishes precedence. The type of ordering depends on which of the properties in Tables 2.2 and 2.3 are satisfied [Klir and Folger, 1987], as summarized in Table 2.4.
Definition 2.3.1 A preorder or quasiorder is a binary relation that is reflexive and transitive.
Definition 2.3.2 A preorder that also satisfies the property of antisymmetry is a partial order.
Definition 2.3.3 Let Y be a non-empty set and let ⪯ be a relation on Y [Bloch, 2011].
(1) The relation ⪯ is antisymmetric if yj ⪯ yl and yl ⪯ yj together imply that yj = yl.
(2) The relation ⪯ is a partial ordering if it is reflexive, transitive and antisymmetric. If ⪯ is a partial ordering, the pair (Y, ⪯) is a partially ordered set, known as a poset.
Table 2.4: Properties met by binary relations of order.
Relation | Reflexivity | Symmetry | Antisymmetry | Transitivity | Trichotomy
Preorder (quasiorder) | X | | | X |
Equivalence relation | X | X | | X |
Partial order | X | | X | X |
Total order (simple, complete, linear) | X | | X | X | X
2.3.1 Order Bounds
Definition 2.3.4 Let (Y, ⪯) be a poset. Let A ⊆ Y. An upper bound for A is an element p ∈ Y such that a ⪯ p ∀a ∈ A. A least upper bound for A is an element p ∈ Y such that p is an upper bound for A, and such that p ⪯ z for any upper bound z for A. A lower bound for A is an element q ∈ Y such that q ⪯ a ∀a ∈ A. A greatest lower bound for A is an element q ∈ Y such that q is a lower bound for A, and such that w ⪯ q for any other lower bound w for A [Bloch, 2011].
Definition 2.3.5 If the poset has a smallest element, then this element is called the zero or bottom element and it is written as ⊥.
Definition 2.3.6 Lifting is the operation of adding a bottom element to a poset P. The resulting structure is denoted by P⊥ (P lifted). Critically, continuity or algebraicity do not suffer any harm from lifting [Abramsky and Jung, 1994, Gunter, 1985, Gierz et al., 2003].
Given any poset P (with or without ⊥) we form P⊥ as follows. Take an element ⊥ and define P⊥ := P ∪ {⊥} [Backhouse et al., 2003]. A poset P can be extended to a P⊥ by adding a new element as a lower bound of every element of P. In particular, if P is a discrete poset, then P⊥ is a trivial extension of P with a bottom element, and if P⊥ is not discrete, that bottom element may be defined as the value −∞ [Sanchis et al., 1967]. In this thesis, because the memberships to be ordered range from 0 to 1, the ⊥ element is 0 in every criterion.
2.3.2 Hasse Diagram
Definition 2.3.7 A Hasse diagram is an oriented graph whose nodes are the elements of the poset (Y, ⪯) and whose edges indicate the precedence defined by ⪯ [Dubois, 1980].
The elements of the set are the nodes of the graph and the ordering is encoded in the edges, with the direction indicating the sequence [Garnier and Taylor, 2001], as illustrated in Figure 2.1. Since partial orders are antisymmetric and transitive, the graph has no cycles once the self-loops implied by reflexivity are omitted.
Definition 2.3.8 Let (Y, ⪯) be a poset. The minimal elements are the elements a ∈ Y with no element b ∈ Y such that b < a. The maximal elements are the elements a ∈ Y with no element b ∈ Y such that a < b.
Figure 2.1: (a) Graph; (b) Hasse diagram. On the left is an example of a graph and on the right its equivalent representation in a more intuitive tool known as the Hasse diagram. The same information can be obtained from a reflexive and transitive graph as from its Hasse diagram, which deletes the redundant information.
There can be more than one minimal and maximal element in a poset. Differently drawn Hasse diagrams may nevertheless graphically represent the same partial order. In that case we speak of isomorphic Hasse diagrams [Bruggemann et al., 2011], and they represent equivalent relations of order.
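One way to obtain the edges of a Hasse diagram from a partial order is to keep only the covering pairs, that is, to drop the reflexive loops and every edge implied by transitivity. The sketch below assumes the order is given as a Boolean matrix; the function name `hasse_edges` and the divisibility example are illustrative choices, not part of the thesis.

```python
def hasse_edges(order):
    """Return the covering pairs (j, l) of a partial order.

    order[j][l] is True when element j precedes or equals element l.
    An edge j -> l is kept only if j != l and no third element m lies
    strictly between them.
    """
    n = len(order)
    edges = []
    for j in range(n):
        for l in range(n):
            if j == l or not order[j][l]:
                continue
            between = any(m != j and m != l and order[j][m] and order[m][l]
                          for m in range(n))
            if not between:
                edges.append((j, l))
    return edges

# Divisibility on {1, 2, 4, 3}; indices 0..3 refer to these values.
elements = [1, 2, 4, 3]
order = [[b % a == 0 for b in elements] for a in elements]
edges = hasse_edges(order)
```

In this example the value 1 is the only minimal element, while 4 and 3 are both maximal, and the edge from 1 to 4 is removed because it follows from 1 to 2 and 2 to 4.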
2.4 Fuzzy ordering indices
The problem of ordering fuzzy quantities has been addressed in [Wang and Kerre, 2001a, Wang and Kerre, 2001b]. Three major lines of thinking stand out [Ruan, 2012]:
1. Fuzzy quantities are independently evaluated. To each fuzzy quantity, one associates a real number. Fuzzy quantities are then compared according to the corresponding real numbers.
2. One or more reference sets are defined and the fuzzy quantities to be ranked are compared against the reference sets. The results of these comparisons serve as a basis to obtain the final ranking order.
3. Fuzzy quantities are compared pairwise. When all the pairwise comparisons have been made, a procedure derives an Order Relation among the fuzzy quantities from the table of pairwise comparisons.
In all cases the domain is fuzzy; however, the endpoint is crisp, either because the ordering is inherently crisp or because a so-called defuzzification procedure is added. More details about defuzzification are given in Chapter 3. In this thesis, the fuzzy quantities to be compared are functions, and the third strategy is used to achieve an ordering.
2.4.1 Fuzzy ordering by pairwise comparisons
After all pairwise comparisons have been computed among the fuzzy quantities, additional procedures may be used to yield an Order Relation among the alternatives or to choose the non-dominated alternatives. An example of this ordering approach is the Analytic Hierarchy Process (AHP) [Saaty, 2008]. In these ordering approaches, two steps are necessary [Ruan, 2012]. First, a fuzzy (binary) relation carries out the pairwise comparisons. In this work, the concept of Subsethood between each pair of components is used. Second, an Order Relation is derived from these comparisons. The final ranking is determined by real numbers. In this work, the transitive closure followed by the α-cut will be used. Both concepts are defined below.
2.4.2 Transitive Closure
Relations of order ought to hold the transitive property. By default, SH may not be sufficient to define a relation of order, since a fuzzy order requires the generalized transitivity condition in Equation 2.7 to be fulfilled. To ensure that a relation is transitive, its transitive closure may be calculated [Bruggemann et al., 2011].
$$ \min\{SH(y_j, y_m), SH(y_m, y_l)\} \le SH(y_j, y_l) \tag{2.7} $$
Definition 2.4.1 The transitive closure SHT of a relation SH is the smallest fuzzy relation which is transitive and of which SH is a subset.
Definition 2.4.2 Consider two binary fuzzy relations, P(X, Y) and Q(Y, Z), with a common set Y. The composition of these relations, which is denoted by P(X, Y) ∘ Q(Y, Z), produces a binary relation R(X, Z) on X × Z defined by Equation 2.8 [Klir and Yuan, 1995].
$$ R(x, z) = (P \circ Q)(x, z) = \max_{y \in Y} \min[P(x, y), Q(y, z)] \tag{2.8} $$
The relation SHT that holds the property of transitivity can be generated from SH using the following algorithm [Xiu et al., 2012]:
1. SH′ ← SH ∪ (SH ∘ SH)
2. If SH′ ≠ SH, let SH ← SH′. Go to step 1.
3. Let SHT ← SH′. Stop.
In [Klir and Yuan, 1995] the transitive closure is expressed by the composition ∘ and the set union found in step 1, that is:
$$ SH' = SH \cup (SH \circ SH) \tag{2.9} $$
Equation 2.9 can also be written by replacing the composition and the union with the min and max operations [Bruggemann et al., 2011] as follows:
Definition 2.4.3 Given a relation SH(yj, yl), its transitive closure SHT(yj, yl) is determined by:
$$ SH^T(y_j, y_l) = \max_{y_m} \{\min\{SH(y_j, y_m), SH(y_m, y_l)\}\} \tag{2.10} $$
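The iterative algorithm above can be sketched with plain nested lists. This is an illustrative implementation under the assumption that the relation is a square matrix with `sh[j][l]` = SH(yj, yl); the helper names and the example matrix are not from the thesis.

```python
def max_min_composition(p, q):
    """Max-min composition of two square fuzzy relations (Equation 2.8)."""
    n = len(p)
    return [[max(min(p[x][y], q[y][z]) for y in range(n))
             for z in range(n)] for x in range(n)]

def relation_union(p, q):
    """Pointwise maximum of two fuzzy relations."""
    n = len(p)
    return [[max(p[x][z], q[x][z]) for z in range(n)] for x in range(n)]

def transitive_closure(sh):
    """Repeat SH' = SH U (SH o SH) until the relation stops changing."""
    while True:
        sh_next = relation_union(sh, max_min_composition(sh, sh))
        if sh_next == sh:
            return sh_next
        sh = sh_next

# Hypothetical subsethood matrix that violates Equation 2.7 for (0, 2).
sh = [[1.0, 0.75, 0.0],
      [0.5, 1.0, 0.5],
      [0.0, 0.0, 1.0]]
sht = transitive_closure(sh)
```

Only the entry for the pair (0, 2) changes in this example: it is raised from 0 to 0.5, the strength of the indirect path through element 1.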
2.4.3 α-cut
Definition 2.4.4 Let Y = {y1, y2, ..., yj} and let A = (Y, µA(yj)) be a fuzzy set. The set of elements yj that belong to the fuzzy set A at least to the degree α ∈ [0, 1] ⊆ R, i.e. µA(yj) ≥ α, is denoted Aα and is called the α-cut or α-level set [Zimmermann, 2011].
The α-cut is a popular defuzzification approach. Since there are infinitely many values for α in [0, 1], it follows that any particular fuzzy set can be transformed into an infinite number of α-cut sets [Ross, 2009]. The α-cut can be applied to the transitive subsethood SHT = (Y, SHT(yj, yl)) as per Equation 2.11:
$$ SH^T_\alpha = \{(y_j, y_l) \in P \times P \mid SH^T(y_j, y_l) \ge \alpha\} \tag{2.11} $$
The set SHTα is a crisp set derived from its parent fuzzy set SHT.
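A minimal sketch of Equation 2.11, assuming the transitive relation is stored as a square matrix and the cut is returned as a set of index pairs. The matrix values and the two thresholds are illustrative only.

```python
def alpha_cut(sht, alpha):
    """Crisp relation {(j, l) : SHT(y_j, y_l) >= alpha} (Equation 2.11)."""
    n = len(sht)
    return {(j, l) for j in range(n) for l in range(n) if sht[j][l] >= alpha}

# Hypothetical transitive subsethood matrix for three components.
sht = [[1.0, 0.75, 0.5],
       [0.5, 1.0, 0.5],
       [0.0, 0.0, 1.0]]

cut_high = alpha_cut(sht, 0.75)
cut_low = alpha_cut(sht, 0.5)
```

Lowering α can only add pairs to the cut, so `cut_high` is a subset of `cut_low`.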
2.4.4 Equivalence classes, groups and partial orders from α-cuts
Cut sets are preorders (see Table 2.4). From each of them, a relation of equivalence can be obtained by collecting the pairs that make the relation symmetric. The resulting equivalence classes are partially ordered according to the cut set:
$$ [y_j]_\alpha \le [y_l]_\alpha \iff (y_j, y_l) \in SH^T_\alpha \text{ while } (y_l, y_j) \notin SH^T_\alpha \tag{2.12} $$
These equivalence classes become larger with decreasing α and melt together [...] known as the fuzzy-zeta matrix or SHTEα [Bruggemann et al., 2011]. Classes made up of different alternatives (components) considered equally good, denoted [yr,s,t]α with r, s and t being different components that turned out equivalent under that α, are grouped and the final (partial) order is established. This condenses the fuzzy-zeta matrix, which is equivalent to the adjacency matrix of the graph underpinning the Hasse diagram.
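The grouping into equivalence classes and the ordering of Equation 2.12 can be sketched as follows. The sketch assumes the cut is a reflexive and transitive set of index pairs (a preorder); the function names and the example cut are hypothetical.

```python
def equivalence_classes(cut, n):
    """Group elements j, l for which both (j, l) and (l, j) are in the cut."""
    classes = []
    assigned = set()
    for j in range(n):
        if j in assigned:
            continue
        members = [l for l in range(n) if (j, l) in cut and (l, j) in cut]
        classes.append(members)
        assigned.update(members)
    return classes

def class_order(cut, classes):
    """Pairs (a, b) of class indices with class a strictly below class b,
    testing Equation 2.12 on one representative of each class."""
    pairs = set()
    for a, class_a in enumerate(classes):
        for b, class_b in enumerate(classes):
            j, l = class_a[0], class_b[0]
            if a != b and (j, l) in cut and (l, j) not in cut:
                pairs.add((a, b))
    return pairs

# Hypothetical alpha-cut over three components (a preorder).
cut = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 2)}
classes = equivalence_classes(cut, 3)
order = class_order(cut, classes)
```

Components 0 and 1 end up in the same class because the cut relates them in both directions, and that class is placed below the class of component 2.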
2.5 Probability models
This section presents some miscellaneous definitions that will be useful for defining membership functions and estimating membership values to a set according to a certain criterion, and that will later be used for ordering.
2.5.1 Markov chains
Definition 2.5.1 A Markov chain is a triplet <S, A, π> where S = {si} is a set of states with cardinality #S, A#S×#S is a matrix of transition probabilities between states, and π is a vector of prior probabilities.
The pair γ = {A, π} is a state machine over S where the transition probabilities A = {aij} are the probabilities of reaching a state sj at time t, denoted sj,t, given the previous state si at time t−1, i.e. P(sj,t | si,t−1). A state si,0 that does not have a previous state is known as an initial state in the sequence [Sucar, 2015]. The parameters γ of a Markov chain can be estimated from a given synthetic signal simply by counting the number of times that the sequence is in a certain state i (e.g. a value range or observation) and the number of times there is a transition from state i to state j. Assume there is a sequence of N observations, e.g. a discrete digital signal of N samples. γi is the number of times that state i is observed and γij is the number of times the transition from i to j is observed [Sucar, 2015]:
• Initial probabilities: $\pi_i = \gamma_i / N$ (2.13)
• Transition probabilities: $a_{ij} = \gamma_{ij} / \gamma_i$ (2.14)
The probability of a sequence of states given the model is basically the product of the transition probabilities of the sequence of states (starting from the initial probability and following the transition probabilities occurring in the signal being evaluated), as shown in Eq. 2.15:
$$ P(y_i) = \pi_i \, a_{ij} \, a_{jk} \cdots \tag{2.15} $$
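The counting estimates of Equations 2.13 and 2.14 and the sequence probability of Equation 2.15 can be sketched with `collections.Counter`. The state labels and the training sequence are invented for illustration, and transitions are read as going from state i to state j.

```python
from collections import Counter

def estimate_markov_chain(sequence):
    """Estimate (pi, A) by counting, following Equations 2.13 and 2.14."""
    n = len(sequence)
    state_counts = Counter(sequence)                    # gamma_i
    pair_counts = Counter(zip(sequence, sequence[1:]))  # gamma_ij
    pi = {s: c / n for s, c in state_counts.items()}
    # gamma_i counts every visit, including the last sample, which has no
    # outgoing transition, so a row of A may sum to slightly less than 1.
    a = {(i, j): c / state_counts[i] for (i, j), c in pair_counts.items()}
    return pi, a

def sequence_probability(sequence, pi, a):
    """pi_i * a_ij * a_jk * ... (Equation 2.15); unseen transitions give 0."""
    prob = pi.get(sequence[0], 0.0)
    for i, j in zip(sequence, sequence[1:]):
        prob *= a.get((i, j), 0.0)
    return prob

# Hypothetical discretised signal with two states.
training = ["lo", "lo", "hi", "lo", "hi", "hi", "lo", "lo"]
pi, a = estimate_markov_chain(training)
p = sequence_probability(["lo", "hi", "lo"], pi, a)
```

Here `pi["lo"]` is 5/8, the transition from `lo` to `hi` has probability 2/5 and the one from `hi` to `lo` has probability 2/3, so `p` is their product with the initial probability.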
2.5.2 The Box Cox transformation
Definition 2.5.2 The Box Cox transformation has the form [Box and Cox, 1964]:
$$ \tilde{y}_j(\lambda) = \begin{cases} (y_j^{\lambda} - 1)/\lambda & \lambda \neq 0 \\ \log(y_j) & \lambda = 0 \end{cases} \tag{2.16} $$
The Box Cox transformation normalizes skewed distributions. The exponent lambda (λ) sets the strength of the correction applied. The optimal value of λ is the one for which the transformed data best fit the normal distribution. In this thesis, λ = 0, so the logarithmic transformation is used to transform probabilities to membership levels.
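Equation 2.16 translates directly into a small Python function. This is a sketch for a single positive value; the function name `box_cox` is an illustrative choice and no estimation of the optimal λ is attempted.

```python
import math

def box_cox(y, lam):
    """Box Cox transformation of a positive value y (Equation 2.16)."""
    if y <= 0:
        raise ValueError("the transformation is defined for y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

log_case = box_cox(1.0, 0)      # log(1)
power_case = box_cox(4.0, 0.5)  # (sqrt(4) - 1) / 0.5
```

For very small λ the power branch approaches the logarithm, which is why λ = 0 is defined as the logarithmic case.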
2.6 Criteria prioritization
When an order is to be made over multiple criteria, prioritization permits establishing a certain unique relation of order. Prioritization establishes the relevance of the criteria for the subsequent ordering. Here, three prioritization schemas from the literature are discussed.
2.6.1 Additive model (crisp case)
Let Y = {y1, y2, ..., yj} be the set of objects to be ranked and C = {ci = (Y, µci(yj))} be a set of fuzzy criteria for sorting. A vector W = <w1, ..., wi> with wi ∈ R expresses the relative importance of each criterion or set ci ∈ C for a subsequent ordering over the whole C [Van de Walle et al., 1995]:
$$ \mu_C(y_j) = \sum_{c_i \in C} w_i \, \mu_{c_i}(y_j) \tag{2.17} $$
The weights wi will usually be normalized such that wi ∈ [0, 1] ⊆ R and ∑ci∈C wi = 1. The choice of the wi may be arbitrary or, more commonly, informed by expertise.
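A minimal sketch of the additive model, assuming it is the weighted sum of the memberships of one object to each criterion, with the weights normalized to add up to 1. The function name and the numbers are illustrative.

```python
def additive_membership(memberships, weights):
    """Weighted sum of the memberships of one object over all criteria.

    The raw weights are normalised so that they add up to 1.
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    return sum(w * m for w, m in zip(normalised, memberships))

# Hypothetical memberships of one component to three criteria, with the
# first criterion considered twice as important as each of the others.
overall = additive_membership([0.5, 1.0, 0.0], [2, 1, 1])
```

With equal weights the same memberships would give the plain average, so the weights shift the result toward the criteria considered more important.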
2.6.2 Linguistic Hedges (fuzzy case)
Defining the specific weights in the additive model may not always come naturally. In our daily life, we more often use words (rather than numbers) to describe a rating. For example, after testing a car, the rating by linguistic variable might be very comfortable, slightly comfortable, more or less comfortable, etc., using the linguistic hedges very, slightly and more or less. A linguistic hedge or modifier is an operator that provides a quantitative modulation to the meaning of a term, and more generally of a fuzzy set. Some modifiers frequently used in the literature are concentration and dilation [Tarmudi et al., 2010].
Concentration Concentrating a fuzzy set ci = (Y, µci(yj)) to a fuzzy subset of ci results in a relatively small reduction in the magnitude of the grade of membership of yj to ci for those yj which have a large grade of membership to ci, and a relatively large reduction of membership for those yj with low membership to ci [Zadeh, 1972].
$$ \mu_{con(c_i)}(y_j) = [\mu_{c_i}(y_j)]^{n}; \quad n \in \mathbb{R} \tag{2.18} $$
where n is any real number greater than 1. The typical appearance of the concentration operator is shown in Figure 2.2a.
Dilation The effect of dilation is the opposite of that of concentration [Zadeh, 1972], spreading the grades of membership.
$$ \mu_{dil(c_i)}(y_j) = [\mu_{c_i}(y_j)]^{1/n}; \quad n \in \mathbb{R} \tag{2.19} $$
where n is any real number greater than 1. The typical appearance is shown in Figure 2.2b.
2.6.3 Contrast intensification (fuzzy case)
Contrast intensification, or simply intensification, differs from concentration in that it increases the values of µci(yj) which are above a threshold and diminishes those which are below [Zadeh, 1972], according to Equation 2.20 [Pal et al., 1981]:
$$ \mu_{int(c_i)}(y_j) = \begin{cases} 2(\mu_{c_i}(y_j))^{2} & \text{if } \mu_{c_i}(y_j) \in [0, t] \\ 1 - 2(1 - \mu_{c_i}(y_j))^{2} & \text{otherwise} \end{cases} \tag{2.20} $$
where t ∈ R is a given threshold. The typical appearance is shown in Figure 2.2c.
Figure 2.2: Effects on the area under the curve of the membership function by different modifiers: (a) Concentration, (b) Dilation and (c) Intensification. Figure reproduced from [Zadeh, 1972].
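The three modifiers can be sketched as scalar functions acting on one membership grade. The default exponent n = 2 and the default threshold t = 0.5 are assumptions made for this illustration, and the upper branch of the intensifier is written as 1 − 2(1 − µ)², so that grades above the threshold are increased.

```python
def concentrate(mu, n=2):
    """Concentration: raise the membership to a power n > 1 (Eq. 2.18)."""
    return mu ** n

def dilate(mu, n=2):
    """Dilation: take the n-th root of the membership (Eq. 2.19)."""
    return mu ** (1.0 / n)

def intensify(mu, t=0.5):
    """Contrast intensification around the threshold t (Eq. 2.20)."""
    if mu <= t:
        return 2.0 * mu ** 2
    return 1.0 - 2.0 * (1.0 - mu) ** 2

low, high = 0.25, 0.75
modified = {
    "concentrated": (concentrate(low), concentrate(high)),
    "dilated": (dilate(low), dilate(high)),
    "intensified": (intensify(low), intensify(high)),
}
```

Concentration lowers both grades, dilation raises both, and intensification pushes the low grade down and the high grade up, which is the contrast effect described above.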
2.7 Graph matching
Graph matching is the process of finding a correspondence between the nodes and the edges of two graphs that satisfies some (more or less stringent) constraints, ensuring that similar substructures in one graph are mapped to similar substructures in the other. Graph matching approaches can be coarsely divided into [Conte et al., 2004]:
• Exact matching: demands that the mapping between the nodes of the two graphs must be edge-preserving in the sense that if two nodes in the first graph are linked by an edge, they are mapped to two nodes in the second graph that are linked by an edge as well. In the most stringent form of exact matching, graph isomorphism, this condition must hold in both directions, and the mapping must be bijective.
• Inexact matching: nodes that do not satisfy edge preservation are penalized by assigning them a cost. Some inexact matching methods propose using the matching cost as a measure of dissimilarity between the graphs.
In this thesis, comparison between the rankings yielded by different orders is performed using inexact graph matching measures. Let GA and GB be two simple graphs defined over the same set V of nodes, with adjacency matrices A = {ak, k = 1, ..., n} and B = {bk, k = 1, ..., n}, and n = #V²:
Definition 2.7.1 The Jaccard index or Tanimoto index is given by Equation 2.21 [Rogers et al., 1960].
$$ T(A, B) = \frac{\sum_i (a_i \cap b_i)}{\sum_i (a_i \cup b_i)} \tag{2.21} $$
Definition 2.7.2 For two directed graphs with the same number of nodes N = #V, the Hamming distance [Hamming, 1950] is the number of uncommon connections over the number of possible connections, and is given by Equation 2.22.
$$ HD(A, B) = \frac{\sum_i |a_i \cup b_i| - \sum_i |a_i \cap b_i|}{N(N-1)} \tag{2.22} $$
If adjacency is sparse, the Jaccard index is better suited than the Hamming distance [Dehmer and Varmuza, 2015]. In this research we have special interest in the ones (representing the relation between two elements), which are measured with the intersection (the numerator of the Jaccard index).
Definition 2.7.3 The XOR graph matching similarity measure considers the cases where both matrices coincide, as per Equation 2.23 [Radhakrishna et al., 2013].
$$ S(A, B) = 1 - \frac{\sum_i XOR(a_i, b_i)}{n} \tag{2.23} $$
where n = #V² is the total number of cases evaluated, i.e. the size of the adjacency matrix.
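The three measures can be sketched on binary adjacency matrices stored as nested lists. The helper names and the two example graphs are illustrative, and returning 1.0 for the Jaccard index of two empty graphs is a convention assumed here, not taken from the thesis.

```python
def flatten(adj):
    """Adjacency matrix as a flat list of 0/1 entries a_1..a_n."""
    return [int(bool(v)) for row in adj for v in row]

def jaccard_index(a, b):
    """Shared edges over edges present in either graph (Equation 2.21)."""
    fa, fb = flatten(a), flatten(b)
    inter = sum(x & y for x, y in zip(fa, fb))
    union = sum(x | y for x, y in zip(fa, fb))
    return inter / union if union else 1.0  # assumed convention

def hamming_distance(a, b):
    """Differing entries over N(N - 1) possible edges (Equation 2.22)."""
    n_nodes = len(a)
    fa, fb = flatten(a), flatten(b)
    differing = sum(x ^ y for x, y in zip(fa, fb))
    return differing / (n_nodes * (n_nodes - 1))

def xor_similarity(a, b):
    """One minus the fraction of differing entries (Equation 2.23)."""
    fa, fb = flatten(a), flatten(b)
    return 1.0 - sum(x ^ y for x, y in zip(fa, fb)) / len(fa)

# Two hypothetical orderings of three components as directed graphs.
graph_a = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
graph_b = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

scores = (jaccard_index(graph_a, graph_b),
          hamming_distance(graph_a, graph_b),
          xor_similarity(graph_a, graph_b))
```

The graphs share two of the four edges present in either, so the Jaccard index is 0.5, while the XOR similarity is higher because it also rewards the entries where both matrices are zero.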
2.8 Centrality
In this thesis, centrality is used to determine the relevance of a Component based on the obtained order. Centrality indices establish the relative importance of the vertices in a graph [Geisberger et al., 2008, Scott, 2000]. [Freeman, 1978] reviewed a number of published measures and reduced them to three basic concepts: degree, closeness and betweenness. Let G be a graph consisting of a set V of nodes and a set E of edges connecting pairs of nodes. For simple graphs, G may be described by the adjacency matrix AV×V = {aij}, whose entry aij is 1 if edge (Vi, Vj) ∈ E and 0 otherwise.
Definition 2.8.1 The degree of node Vi is the total number of incoming and outgoing edges associated with the node.
Definition 2.8.2 The degree centrality of a node i is defined as in Equation 2.24 [Latora and Marchiori, 2007].
$$ C_i^D = \frac{e_i}{N-1} = \frac{\sum_{j \in G} a_{ij}}{N-1} \tag{2.24} $$
where ei is the degree of node Vi and N = #V. The degree centrality is based on the idea that important nodes are those with the largest number of ties to other nodes in the graph.
Definition 2.8.3 The closeness centrality of a node i is the inverse of its average distance to all the other nodes, where the distance dij is the minimum number of edges traversed to get from i to j [Latora and Marchiori, 2007].
$$ C_i^C = (L_i)^{-1} = \frac{N-1}{\sum_{j \in G} d_{ij}} \tag{2.25} $$
where Li is the average distance from Vi to all the other nodes. The closeness centrality of a node Vi is based on the concept of minimum distance (geodesic) dij.
Definition 2.8.4 The betweenness CiB of a node i, sometimes referred to as load, is defined as [Boccaletti et al., 2006]:
$$ C_i^B = \sum_{j,k \in G,\ j \neq k} \frac{n_{jk}(i)}{n_{jk}} \tag{2.26} $$
where njk is the number of shortest paths connecting two nodes Vj and Vk, while njk(i) is the number of shortest paths linking the two nodes Vj and Vk and passing through Vi.
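The three centrality indices can be sketched for a small graph using breadth-first search. The sketch assumes a connected, undirected, unweighted graph given as an adjacency dictionary, counts ordered pairs (j, k) in the betweenness sum, and leaves out pairs that include node i itself; these are assumptions of this illustration, as are the node names.

```python
from collections import deque

def bfs(graph, source):
    """Distances and numbers of shortest paths from source to every node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths

def degree_centrality(graph, i):
    """Number of neighbours over N - 1 (Equation 2.24)."""
    return len(graph[i]) / (len(graph) - 1)

def closeness_centrality(graph, i):
    """N - 1 over the sum of geodesic distances (Equation 2.25)."""
    dist, _ = bfs(graph, i)
    return (len(graph) - 1) / sum(dist.values())

def betweenness(graph, i):
    """Fraction of shortest j-k paths passing through i (Equation 2.26)."""
    info = {v: bfs(graph, v) for v in graph}
    dist_i, paths_i = info[i]
    total = 0.0
    for j in graph:
        for k in graph:
            if j == k or i in (j, k):
                continue
            dist_j, paths_j = info[j]
            if dist_j[i] + dist_i[k] == dist_j[k]:
                total += paths_j[i] * paths_i[k] / paths_j[k]
    return total

# Hypothetical graph: a hangs off b, and b, c, d form a triangle.
graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
centrality_b = (degree_centrality(graph, "b"),
                closeness_centrality(graph, "b"),
                betweenness(graph, "b"))
```

Node `b` is adjacent to every other node and lies on every shortest path that starts or ends at `a`, so it scores highest on all three indices in this example.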
2.8.1 Independent Component Analysis
Blind Source Separation (BSS) is the process that aims at separating a number of source signals from observed mixtures of these sources. The term blind comes from the very weak assumptions made about the mixing and the sources [Schobben et al., 1999]. This is mathematically equivalent to finding a different implicit coordinate basis ∆′ = {δ′} for a given space by rotating the explicit coordinate basis ∆ = {δ} according to some assumptions or constraints. Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), Independent Component Analysis (ICA) and beamforming are examples of BSS techniques. For this thesis, ICA is relevant; the others are beyond its scope.
Definition 2.8.5 Two signals X and Y are said to be statistically independent of each other if the value of one signal provides no information regarding the value of the other [Stone, 2004]; P(X|Y) = P(X), where P(X) is the probability of X and P(X|Y) is the conditional probability of X given Y.
The basis of most Independent Component Analysis (ICA) approaches is a generative model that assumes that the signals are the product of instantaneous linear combinations of some independent sources [Semmlow and Griffel, 2014], stated mathematically as:
$$ x_i(t) = a_{i1}s_1(t) + a_{i2}s_2(t) + \dots + a_{iN}s_N(t) \quad \text{for } i = 1, \dots, N \tag{2.27} $$
For brevity it is common to drop the time dependency. Equation 2.27 is a series of equations for N different signal variables xi(t), written in matrix form as:
$$ x = As \tag{2.28} $$
where s is a vector composed of all the sources, A is the mixing matrix composed of constant elements aij, and x is a vector of the measured signals. ICA is used to solve for the mixing matrix A, from which the independent components s can be obtained through matrix inversion:
$$ s = A^{-1}x \tag{2.29} $$
Figure 2.3 illustrates the separation of sources using ICA. The independent components Y are an estimate of the original signals S, and the unmixing matrix W is an estimate of A⁻¹, giving [Semmlow and Griffel, 2014]:
$$ Y = WX \tag{2.30} $$
Figure 2.3: In a real case, the measured signals are the ones already mixed and the original signals (the sources) are unknown. The figure illustrates how the sources are mixed and then separated (or approximated to the sources) using ICA. Note that the separation is not perfect and the components are still mixed to some degree.
The problem is ill-posed with infinitely many possible solutions. There are a number of algorithms for estimating the mixing matrix A, but the objective function is always guided by independence, e.g. fastICA maximizes a measure of non-Gaussianity as a proxy of statistical independence [Hyvärinen and Oja, 2000]. Independent Component Analysis can be expressed in terms of the related concepts of entropy [Bell and Sejnowski, 1995], mutual information [Cichocki and Yang, 1996], contrast functions [Comon, 1989] and other measures of the statistical independence of signals.
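The mixing and unmixing model can be illustrated numerically for two sources. The sketch below does not perform ICA: it assumes the mixing matrix A is known, so that W can be taken as its exact inverse, whereas an ICA algorithm would have to estimate W from the mixed signals alone. All names and values are hypothetical.

```python
def mat_vec(m, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Two hypothetical sources sampled at four time points.
sources = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
mixing = [[1.0, 1.0], [1.0, -1.0]]                 # A, assumed known here

mixed = [mat_vec(mixing, s) for s in sources]      # x = A s
unmixing = inverse_2x2(mixing)                     # W = A^-1
recovered = [mat_vec(unmixing, x) for x in mixed]  # y = W x
```

Because W is the exact inverse in this toy case, `recovered` matches `sources`; with an estimated W the components would only approximate the sources, as noted in the caption of Figure 2.3.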
2.9 Electroencephalography
Electroencephalography (EEG) [Nunez and Srinivasan, 2005] senses the electrical activity of the brain by means of electrodes attached to the scalp, as shown in Figure 5.2a. It is a non-invasive electrophysiological neuroimaging technique widely used for many neurological applications [Adeli et al., 2007, Schlogl et al., 2007, Hassanien and Azar, 2015]. The recording sites in an Electroencephalography are referred to as channels, often located according to the international 10-20 system [Jasper, 1958] or its successors, the 10/10 and the 10/5 systems [Jurcak et al., 2007].
EEG signals are often analyzed in terms of a set of frequency bands. The brain oscillations are usually categorized into five frequency bands: Delta δ (0.5 to 3.5 Hz), Theta θ (4 to 7 Hz), Alpha α (8 to 12 Hz), Beta β (13 to 30 Hz), and Gamma γ (>30 Hz), although there is generally a lack of consistency between studies in maintaining a definite standard range for Electroencephalography bands [Knyazev, 2013]. Sometimes other bands are further defined, e.g. the Mu µ rhythm (footnote 1). Despite this lack of uniformity, the understanding of brain activity according to these rhythms has become a de facto standard in EEG-based neuroscience [Nunez and Srinivasan, 2005]. Some relevant characteristics of these bands are reported in Table 2.5.
2.9.1 Denoising of EEG
EEG recording is susceptible to various forms and sources of noise and artifacts such as eye movement, muscle, and cardiac noise [Zhou and Gotman, 2004], which pose significant difficulties and challenges during analysis and interpretation of EEG data. A number of strategies are available to deal with noise effectively, both at the time of EEG recording and during preprocessing of recorded data [Repovs, 2010]. These include but are not limited to Blind Source Separation [Naik et al., 2014, Joyce et al., 2004], filtering
1 Please note that the symbol µ used in the literature for the Mu rhythm is the same as the one used in this study for the membership level, but it does not refer to the same concept. This ambiguity is irrelevant as we will not study the µ rhythm.