Biomedical Signal Processing and Control
journal homepage: www.elsevier.com/locate/bspc
Acoustic feature selection and classification of emotions in speech using a 3D continuous emotion model
Humberto Pérez-Espinosa∗, Carlos A. Reyes-García, Luis Villaseñor-Pineda
Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, Tonantzintla, Puebla 72840, Mexico
Article info

Article history:
Received 15 September 2010
Received in revised form 23 February 2011
Accepted 24 February 2011
Available online 3 April 2011

Keywords:
Automatic emotion recognition
Continuous emotion model
Feature selection
Abstract

In this paper we report the results obtained from experiments with a database of emotional speech in English, carried out in order to find the most important acoustic features for estimating Emotion Primitives, which determine the emotional content of speech. We are interested in exploiting the potential benefits of continuous emotion models, so in this paper we demonstrate the feasibility of applying this approach to the annotation of emotional speech, and we explore ways to take advantage of this kind of annotation to improve the automatic classification of basic emotions.
© 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.bspc.2011.02.008
1. Introduction
Emotions are very important in our everyday life; they are present in everything we do. There is a continuous interaction between emotions, behavior and thoughts, in such a way that they constantly influence each other. Emotions are a great source of information in communication and interaction among people, and they are assimilated intuitively.
The applications of emotion recognition encompass many fields, for example, as a supporting technology in medical areas such as psychology, neurology and the care of aged and impaired people. Automatic emotion recognition based on biomedical signals and facial and vocal expressions has been applied to the diagnosis and follow-up of progressive neurological disorders, specifically Huntington's and Parkinson's diseases [1]. These pathologies are characterized by a deficit in the emotional processing of fear and disgust, and thus the system could be utilized to determine the subject's reaction, or absence of reaction, to specific emotions, helping health professionals to gain a better understanding of these disorders. Furthermore, the system could be used as a reference to evaluate patients' response to certain medicines. Another important medical application is remote medical support. These kinds of environments enable communication between medical professionals and patients for regular monitoring and emergency situations. In this scenario the system recognizes the patient's emotional
∗ Corresponding author.
E-mail addresses: humbertop@inaoep.mx (H. Pérez-Espinosa), kargaxxi@inaoep.mx (C.A. Reyes-García), villasen@inaoep.mx (L. Villaseñor-Pineda).
states and then transmits data indicating that the patient is experiencing depression or sadness, so that the health-care providers monitoring them will be better prepared to respond. Such a system has the potential to improve patient satisfaction and health [2–4]. It can also be an asset for disabled people who have difficulties with communication. Hearing-impaired people who are not profoundly deaf can use residual hearing to communicate with other people and learn to speak with emotions, making communication more complete and understandable. In these cases, emotion recognition engines can be used as an important element of a computer-assisted emotional speech training system [5]. For hearing-impaired people, it could provide an easier way to learn how to speak with emotion more naturally, or help speech therapists to guide them to express emotions correctly in speech. Emotion recognition also arouses great interest in interface design, given that recognizing and understanding emotions automatically is one of the key steps towards emotional intelligence in Human–Computer Interaction (HCI). The need for automatic emotion recognition has emerged due to the tendency towards a more natural interaction between humans and computers. Affective computing is a topic within HCI that encompasses these research tendencies, trying to endow computers with the ability to detect, recognize, model and take into account the user's emotional state, which plays a role of paramount importance in the way humans make decisions [6]. Emotions are essential for human thought processes and influence interactions between people and intelligent systems.
In the area of automatic emotion recognition, mainly two annotation schemes have been used to capture and describe the emotional content in speech: discrete and continuous approaches. The discrete approach is based on the concept of basic emotions, such as anger, joy, and sadness, which are the most intense forms of emotion and from which all other emotions are generated by variations or combinations. It assumes the existence of universal emotions that can be clearly distinguished from one another by most people. On the other hand, the continuous approach represents emotional states using a continuous multidimensional space. Emotions are represented by regions in an n-dimensional space where each dimension represents an Emotion Primitive. Emotion Primitives are subjective properties shown by all emotions. The most widely accepted primitives are Arousal and Valence. Valence describes how negative or positive a specific emotion is. Arousal, also called Activation, describes the internal excitement of an individual and ranges from very calm to very active. Three-dimensional models have also been proposed. The most common three-dimensional model includes Valence and Activation as well as Dominance [7]. This additional primitive describes the degree of control that the individual intends to take over the situation or, in other words, how strong or weak the individual seems to be.
Both approaches, discrete and continuous, provide complementary information about the emotional manifestations observed in individuals [8]. Discrete categorization allows a more particularized representation of emotions in applications where a predefined set of emotions needs to be recognized. However, this approach ignores most of the spectrum of human emotional expressions. In contrast, continuous models allow the representation of any emotional state, including subtle emotions, but it has been found that it is difficult to estimate the Emotion Primitives with high precision based only on acoustic information. Some authors have begun to research how to take advantage of this theory [9–11] to estimate the emotional content in speech more adequately. Both approaches are closely related: by assessing the emotional content in speech using one of these two schemes, we can infer its counterpart in the other scheme. For instance, if an utterance is evaluated as anger, we may infer that the utterance would have a low value for Valence and high values for Activation and Dominance. Conversely, if an utterance is evaluated with low Valence and high Activation and Dominance, we could infer that this is anger. Several authors [12–14] have worked on the analysis of the most important acoustic features from the point of view of discrete categorization; however, the importance of acoustic attributes from the continuous models' point of view has not yet been studied with the same depth. We believe that the continuous approach has great potential to model the occurrence of emotions in the real world. This three-dimensional continuous model is adopted in this paper. As a first step towards exploiting the continuous approach, we analyze the most important acoustic features to automatically estimate Emotion Primitives in speech. Then, we will be able to use this estimation to locate the individual's emotional state in the multidimensional space and, if necessary, to map it to a basic emotion. In this work we apply these ideas to improve automatic emotion recognition from acoustic information. We perform some experiments in order to find the most important acoustic features for estimating Emotion Primitives in speech, and then we propose a method that uses these estimations to determine the individual's emotional state, mapping it to a basic emotion. The remainder of this paper is organized as follows. First, we describe the database in Section 2. Next, in Section 3 we describe the acoustic features and how we extract them from speech signals. The filters we applied to select the best instances are explained in Section 4. In Section 5 we propose and describe two ways of finding the best acoustic features for Emotion Primitive estimation. Experimental results and discussion about feature selection are provided in Section 6. In Section 7 we propose a way of applying the automatic Emotion Primitive estimation in order to classify basic emotions. In Section 8, we present and discuss the results obtained from basic emotion classification. Finally, the conclusions of this study and future work are discussed in Section 9.
2. Database
For the purposes of this work it is necessary to have a database labeled with Emotion Primitives, namely Valence, Activation and Dominance. In addition, we need basic emotion annotations for each sample to validate the Emotion Primitive estimation accuracy by evaluating the mapping done from the continuous to the discrete approach. There are many databases labeled with emotional categories, such as FAU Aibo [15], the Berlin Database of Emotional Speech [16] and Spanish Emotional Speech [17], and a few that are labeled with Emotion Primitives: the VAM Corpus [18] and the IEMOCAP database [8]. The database we used is labeled with both common discrete categories and Emotion Primitives. This database is called IEMOCAP [8] (Interactive Emotional Dyadic Motion Capture Database). It was collected at the Speech Analysis and Interpretation Laboratory at the University of Southern California and it is in English. It was recorded from ten actors in male–female pairs. IEMOCAP includes information about head, face and hand motion as well as audio and video, offering detailed information about facial expressions and gestures. To generate emotional dialogue, two scenarios were designed. In the first one, the actors followed a script, while in the second one, the actors improvised according to a preset situation. The complete corpus contains about 12 h of material, although at the moment only the first session, i.e., the interaction of one pair of actors, has been released. The utterances were labeled with the emotional categories happiness, anger, sadness, frustration, fear, surprise and neutral. The categories "other" and "not identified" were also included. To evaluate Emotion Primitives, an integer value between one and five was annotated for each primitive: Valence (1 – negative, 5 – positive), Activation (1 – calm, 5 – excited), and Dominance (1 – weak, 5 – strong). The characteristics of this database make it very interesting for the purposes of our work, since the annotation includes the two most important approaches. Attention was given to spontaneous interaction, in spite of using actors and ideal recording conditions. The database shows a significant diversity of emotional states. We use only the audio of the first session, segmented into turns, for the categories anger, happiness, neutral and surprise. In total, there are 1820 instances.
3. Feature extraction
We extracted acoustic features from the speech signal using two programs for acoustic analysis, Praat [19] and OpenEar [20]. We evaluated two sets of features. One of them was obtained through a selective approach, i.e., based on a study taking into account the features that could be useful, the features that have been successful in related works, and features used for other similar tasks. The second feature set was obtained by applying a brute force approach, i.e., generating a large number of features in the hope that some will be found useful. The Selective Feature Set is a set of features that we have been building over the course of our research [21,22] and was extracted with the software Praat. We designed this set of features to represent several aspects of the voice; we included the traditional attributes associated with prosody, e.g., duration, pitch and energy, as well as others which have shown good results in tasks like speech recognition, speaker recognition, classification of baby cry [23], language recognition [24], and pathology detection in voice [25,26], and which we believe can be successful in emotion recognition. Table 1 shows the number of acoustic features included in each feature group.

Table 1
Selective approach feature set.

Group          Feature type             Number of features
Prosodic
  Times        Elocution times            8
  F0           Melodic contour            9
  Energy       Energy contour            12
Voice quality
  Voice quality  Quality descriptors     24
  Voice quality  Articulation            12
Spectral
  LPC          Fast Fourier transform     4
  LPC          Long term average          5
  LPC          Wavelets                   6
  MFCC         MFCC                      96
  Cochleagram  Cochleagram               96
  LPC          LPC                       96
Total                                   368

We divide the features we include into three types: prosodic, spectral and voice quality. Prosodic features describe suprasegmental phenomena, i.e., speech units larger than phonemes, such as pitch, loudness, speed, duration, pauses and rhythm. Prosody is a rich source of information in speech processing because it carries very important paralinguistic information that complements the message with an intention that can reflect an attitude or emotional state [27]. These types of features are the most commonly used in speech emotion recognition. We subdivide prosodic features into elocution times, melodic contour and energy contour. The second type of feature is voice quality, which gives the primary distinction to a given speaker's voice when prosodic aspects are excluded. Some of the descriptions of voice quality are harshness, breathiness and nasality. Some authors [15] have studied the importance of voice quality, stating that the key to the vocal differentiation of discrete emotions seems to be voice quality. We included the two most popular voice quality descriptors, which are jitter and shimmer. We also included other voice quality descriptors that have been related to the GRBAS scale in related work [25–29]. Some of these features have never been used in speech emotion recognition. For example, the energy differences between frequency bands and the ratio between frequency bands were used by [25] to discriminate between pathological and normal voices. Increases and decreases in energy peaks were used by [26] for automatic detection of vocal fry. Articulation is also an important parameter for measuring voice quality. We included some statistical measures of the first four formants as articulatory descriptors. Spectral features describe the characteristics of a speech signal in the frequency domain beyond F0, such as harmonics and formants [15]. We included several types of spectral representations. Some of these representations have never been used in emotion recognition, such as cochleagrams, which have been used for infant cry classification [23], and others have been studied very little for this task, such as wavelets [30]. A cochleagram [19] represents the excitation of the auditory nerve filaments of the basilar membrane, which is situated in the cochlea in the inner ear. This excitation is represented as a function over time (s) and Bark frequency, which is a psychoacoustic scale. A cochleagram also models the sound volume and frequency masking. Frequency masking in the ear happens when we hear two sounds of different intensity at the same time: the weakest sound is not distinguished, as the brain only processes the masking sound. Features based on cochleagrams have been used for speech recognition with good results. In the work of Shamma et al. [31], cochleagrams were used for speech recognition at the phoneme level, surpassing the results obtained by LPC features. Wavelets are an alternative to the Fourier transform; the wavelet transform allows a good resolution at low frequencies. We also included features widely used in speech processing, such as MFCCs (Mel-frequency cepstral coefficients) [32], which represent speech perception based on human hearing and have been successfully used to discriminate phonemes. MFCCs have shown that they are useful not only to determine what is said, but also how it is said.
The brute force feature set was extracted using the software OpenEar. We extracted a total of 6552 features, including first-order functionals of low-level descriptors (LLD) such as FFT spectrum, Mel spectrum, MFCC, pitch (fundamental frequency F0 via ACF), energy, spectral descriptors, and LSP. 39 functionals, such as extremes, regression, moments, percentiles, crossings, peaks and means, were applied. Table 2 shows the features extracted by the brute force approach.

Table 2
Brute force approach feature set.

Group         Feature type               Number of features
Prosodic
  Energy      LOG energy                  117
  Times       Zero crossing rate          117
  PoV         Probability of voicing      117
  F0          F0                          234
Spectral
  MFCC        MFCC                       1521
  MEL         MEL spectrum               3042
  SEB         Spectral energy in bands    469
  SROP        Spectral roll-off point     468
  SFlux       Spectral flux               117
  SC          Spectral centroid           117
  SpecMaxMin  Spectral max and min        233
Total                                    6552

In emotion recognition from speech there are two approaches to feature processing: the static and the dynamic approach. Dynamic processing captures information about how features evolve over time. On the other hand, static processing avoids overfitting in the phonetic modeling by applying statistical functions to low-level descriptors over periods of time. Static processing is more common in emotion recognition from speech; however, dynamic processing has shown good results in recent years [33–35], and new methods have even been proposed to represent the signal properties associated with the relationship between expressiveness and voice quality [36]. Our spectral features include static coefficients that describe the spectral properties within one frame, where the signal is approximately stationary. Our feature set also includes dynamic features, which describe the behavior of the static features over time. For this purpose, the first and second derivatives of the static features in the brute force set were calculated.
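The dynamic features just described can be sketched as frame-to-frame differences of the static features; this is an illustrative stand-in (OpenEar computes regression-based deltas, which we do not reproduce here):

```python
def delta(frames):
    """First-order difference of per-frame feature vectors.

    The first frame is zero-padded so the output keeps the input length.
    """
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out

def add_dynamic(frames):
    """Append first (delta) and second (delta-delta) derivatives to each frame."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]

# Toy static features: 4 frames, 2 dimensions each.
static = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0], [7.0, 6.0]]
dynamic = add_dynamic(static)  # each frame now carries 6 values
```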
4. Instance selection
Inspecting the database instances, we realized that there were problematic instances; we thought that our machine learning algorithm would have a better performance by selecting the most appropriate and congruent instances representing the properties of the continuous and discrete approaches. Our initial data set consists of 1820 instances. We worked only with the more represented classes, so we first selected the instances from the four classes with more examples. We chose the instances from these four classes to enable the comparison of our results to the work done by [37], where only these four classes are used. After applying this filter we are left with 942 instances, of which 20% was reserved for final testing. The experiments reported in Sections 6, 7 and 8 were done using the remaining 80%, equivalent to 753 instances. Another filter was applied to these instances, which consisted of removing all instances that have the same annotation for each of the three primitives but a different annotation for the basic emotion, as these are regarded as contradictory instances that add noise to our learning process. For example, if the annotations for one instance are (Valence = 2, Activation = 4, Dominance = 3, Emotion = Anger) and the annotations for another instance are (Valence = 2, Activation = 4, Dominance = 3, Emotion = Happiness), every instance annotated with Valence = 2, Activation = 4 and Dominance = 3 is eliminated from our data set. After this filtering, we were working with a set of 401 instances in the feature selection process.
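The contradictory-instance filter can be sketched as follows (a minimal illustration with hypothetical instance records, not the authors' exact implementation):

```python
def filter_contradictory(instances):
    """Remove every instance whose (Valence, Activation, Dominance) triple
    is annotated with more than one basic emotion across the data set."""
    emotions_per_triple = {}
    for inst in instances:
        triple = (inst["valence"], inst["activation"], inst["dominance"])
        emotions_per_triple.setdefault(triple, set()).add(inst["emotion"])
    return [
        inst for inst in instances
        if len(emotions_per_triple[(inst["valence"],
                                    inst["activation"],
                                    inst["dominance"])]) == 1
    ]

data = [
    {"valence": 2, "activation": 4, "dominance": 3, "emotion": "anger"},
    {"valence": 2, "activation": 4, "dominance": 3, "emotion": "happiness"},
    {"valence": 4, "activation": 4, "dominance": 3, "emotion": "happiness"},
]
# The first two instances share a primitive triple but disagree on the
# basic emotion, so both are dropped; only the third survives.
kept = filter_contradictory(data)
```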
Table 3
Correlation index obtained for primitive estimation using the whole feature set.

Emotion Primitive   Correlation index
Valence              0.0151
Activation           0.0095
Dominance           −0.001
5. Feature selection
The need to find the best feature subsets for building our learning models arises from the low correlation obtained in the estimation of primitives using a model trained on the full set of 6920 acoustic features. The correlation coefficient measures the quality of the estimated variable, determining the strength and direction of the linear relationship between the estimated and the actual value of the variable. The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables; as it approaches zero, there is less of a relationship. Table 3 shows the correlation coefficient obtained when estimating the value of the Emotion Primitives with a model built from 942 instances (using only the four classes with more instances) and 6920 attributes. As we can see, the correlation is very low and, therefore, the models learnt from these data are not useful. Having too many features in relation to the number of instances complicates the classifier's task, SVM in this case, impeding a proper prediction model. Although many attributes can enhance discrimination power, in practice, with a limited amount of data, an excessive number of attributes significantly delays the learning process and often results in overfitting. Initially, we tested different attribute selectors, such as SubSetEval [38], which evaluates the value of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them, preferring subsets of features that are highly correlated with the class and lowly correlated with each other. We also tested ReliefAttribute [38]. The key idea of ReliefAttribute is to estimate attributes according to the values that distinguish the closest instances. For a given instance, ReliefAttribute seeks two nearest neighbors: one from the same class and another from a different class. Good attributes must, on one hand, take different values for instances of different classes and, on the other hand, the same values for instances of the same class. Using these techniques we could not improve the correlation results for primitive estimation. Due to these problems, it is necessary to devise a way to select the best features among a large number of them, bearing in mind that we have few instances. We propose two schemes for selecting attributes that work on smaller feature sets, with the idea of avoiding the search for the best features on the whole set.
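Since the correlation coefficient is the quality measure used throughout, here is a minimal Pearson-correlation sketch (the standard formula, not tied to any particular toolkit):

```python
import math

def correlation(actual, predicted):
    """Pearson correlation between annotated and estimated primitive values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (std_a * std_p)

# A perfectly linear estimate correlates at (approximately) 1.0 and an
# inverted one at -1.0, mirroring the -1..1 range described above.
r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = correlation([1, 2, 3], [3, 2, 1])
```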
5.1. Scheme 1: ungrouped feature selection
Fig. 1 shows the processes applied in this scheme. We started from an initial feature set obtained from a feature selection process applied to the VAM (Vera am Mittag) German spontaneous speech database [18]. This initial feature set achieved good correlation results in the estimation of Emotion Primitives in the VAM database. That selection process was carried out with 252 features obtained by the selective approach and 949 instances [22]. Later, we conducted the instance selection process explained in Section 4. Finally, we applied the feature selection process known as Linear Floating Forward Selection (LFFS) [39], which performs a hill-climbing search: starting with the empty set or with a predefined set, it evaluates all possible inclusions of a single attribute into the solution subset. At each step the attribute with the best evaluation is added, and the search ends when no inclusion improves the evaluation. In addition, LFFS dynamically changes the number of features included or eliminated at each step. In our experiments we use the LFFS modality called fixed width, shown in Fig. 2. In this mode the search is not performed on the whole feature set: initially, the k best features are selected and the rest are removed, and in each iteration the features added to the solution set are replaced by features taken from those that had been eliminated. The processes of this scheme are repeated for each Emotion Primitive.

Fig. 1. Ungrouped feature selection scheme.
Fig. 2. Linear forward selection with fixed width (taken from [40]).
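The fixed-width search can be sketched as below. This is a simplified greedy version: the floating (backward) steps and the SVM-based subset evaluation of the real LFFS are omitted, and `score_single` and `score_subset` are toy stand-ins introduced for illustration:

```python
def fixed_width_forward_selection(features, score_single, score_subset, k):
    """Forward selection restricted to the k best individually ranked features."""
    # Step 1: rank features individually and keep only the k best.
    pool = sorted(features, key=score_single, reverse=True)[:k]
    selected, best = [], float("-inf")
    # Step 2: hill-climbing search; stop when no inclusion improves the score.
    while pool:
        candidate = max(pool, key=lambda f: score_subset(selected + [f]))
        gain = score_subset(selected + [candidate])
        if gain <= best:
            break
        selected.append(candidate)
        pool.remove(candidate)
        best = gain
    return selected

# Toy scorer: a subset's value is the summed feature weight minus a
# redundancy penalty for every feature beyond the first.
weights = {"mfcc1": 0.9, "mel3": 0.8, "lpc2": 0.7, "f0": 0.3, "jitter": 0.1}
score_single = lambda f: weights[f]
score_subset = lambda s: sum(weights[f] for f in s) - 0.35 * max(len(s) - 1, 0)

chosen = fixed_width_forward_selection(list(weights), score_single,
                                       score_subset, k=4)
```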
5.2. Scheme 2: grouped feature selection
In this scheme, the idea is to divide the whole feature set into smaller groups according to the acoustic properties they represent. Feature groups are shown in Tables 1 and 2. Fig. 3 shows the steps followed in this scheme. First, we applied the instance selection filter explained in Section 4. Second, we divided the whole data set into smaller sets, grouping features that share the same acoustic properties (see Tables 1 and 2), and we applied LFFS; unlike Scheme 1, this time the search process by LFFS started from the empty set. Third, the features selected for each group were put together in a final set. The three steps in this group-wise selection scheme are repeated for each Emotion Primitive.
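Scheme 2 can be sketched as the same greedy search run per group from the empty set, with the winners merged into one final set (again with a toy additive scorer standing in for the SVM-based evaluation used in the paper):

```python
def select_within_group(group, score_subset):
    """Greedy forward selection starting from the empty set (per group)."""
    selected, best = [], float("-inf")
    pool = list(group)
    while pool:
        candidate = max(pool, key=lambda f: score_subset(selected + [f]))
        gain = score_subset(selected + [candidate])
        if gain <= best:
            break
        selected.append(candidate)
        pool.remove(candidate)
        best = gain
    return selected

def grouped_selection(groups, score_subset):
    """Scheme 2: select per acoustic group, then merge into one final set."""
    final = []
    for _name, feats in groups.items():
        final.extend(select_within_group(feats, score_subset))
    return final

# Hypothetical feature weights and groups for illustration.
weights = {"mfcc1": 0.9, "mfcc2": 0.2, "e1": 0.6, "e2": 0.5, "f0a": 0.4}
score = lambda s: sum(weights[f] for f in s) - 0.45 * max(len(s) - 1, 0)
groups = {"MFCC": ["mfcc1", "mfcc2"], "Energy": ["e1", "e2"], "F0": ["f0a"]}
final_set = grouped_selection(groups, score)
```

Note that, as the text observes, this construction guarantees at least one candidate is examined from every group, whereas a single search over the whole set may leave some groups empty-handed.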
6. Feature selection results
All the results of the learning experiments in this paper were obtained using Support Vector Machines (SVM) and validated by 10-fold cross-validation.
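This evaluation protocol (train a regressor, score it on held-out folds) can be sketched as follows; to stay dependency-free we substitute a one-feature least-squares regressor for the SVM, so `fit_line` is an illustrative stand-in:

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for the paper's SVM regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def cross_validated_predictions(xs, ys, folds=10, seed=0):
    """k-fold CV: each instance is predicted by a model trained without it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for k in range(folds):
        held_out = set(idx[k::folds])
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a * xs[i] + b
    return preds

# On noise-free linear data every fold recovers the true line, so the
# out-of-fold predictions match the annotations almost exactly.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 for x in xs]
preds = cross_validated_predictions(xs, ys)
```

The out-of-fold predictions would then be compared against the annotated primitives with the correlation coefficient described below.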
The metrics used to measure the importance of feature groups are: correlation coefficient, share and portion. The correlation coefficient is the most common parameter for measuring the performance of machine learning algorithms on regression tasks, as in our case. We use share and portion, which are measures proposed in [14] to assess the impact of different types of features on the performance of automatic recognition of discrete categorical emotions.
Fig. 3. Grouped feature selection scheme.
Correlation coefficient: indicates the strength and direction of the linear relationship between the annotated primitives and the primitives estimated by the trained model. It is our main metric to measure the classification results.
Share: shows the contribution of each type of feature. For example, if 28 features are selected from the group times and 150 features were selected in total, then:

Share = (28 × 100) / 150 = 18.7
Portion: shows the contribution of each type of feature weighted by the number of features per type. For example, if 28 features are selected from a total of 125 features of the times set, then:

Portion = (28 / 125) × 100 = 22.4
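In code, the two metrics read as follows (using the worked numbers from the text):

```python
def share(selected_in_group, selected_total):
    """Percentage of the final solution set contributed by one feature group."""
    return 100.0 * selected_in_group / selected_total

def portion(selected_in_group, group_size):
    """Percentage of a group's own features that entered the solution set."""
    return 100.0 * selected_in_group / group_size

# 28 of the 150 selected features come from the times group,
# which contains 125 features in total.
times_share = share(28, 150)      # about 18.7
times_portion = portion(28, 125)  # 22.4
```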
Having identified the best acoustic feature sets, we constructed individual classifiers to estimate each Emotion Primitive. Table 4 shows the evaluation results of the instance/attribute selection scheme illustrated in Fig. 1. The second column shows the results when using all attributes and all instances in the learning process; as we can see, the correlation coefficient is too low. The third column shows the results when the learning process is performed using the best features for each primitive proposed by [22]. The fourth column shows the results when the learning process uses the same features as the previous experiment, but with instances filtered as described in Section 4. Finally, the fifth column shows the results after filtering instances and applying the feature selection method LFFS. As we can see, the improvement in results was gradual after each data processing step.

The two feature selection schemes were applied to each of the three Emotion Primitives; that is, we ran the feature selection process six times, obtaining six different feature sets. This paper shows the results of the best subsets for each primitive. Tables 5–7 reflect the effectiveness of each feature set and the differences among groups. The groups PoV and SC are not shown in these tables because no features were selected from them. It is important to note that the correlation, share and portion shown in these tables are obtained by decomposing the solution set found by selection Schemes 1 and 2 into groups of features and evaluating them separately with these metrics. The highest value for each metric is
Table 4
Results for each step of the ungrouped feature selection scheme. Each cell shows the number of features/correlation coefficient.

Emotion      Baseline results   Scheme 1 results
Primitive    All attributes     Initial feature selection   Instance selection   LFFS
Valence      6920/0.0151        56/0.5188                   56/0.5597            62/0.6189
Activation   6920/0.0095        67/0.7463                   67/0.7861            60/0.7969
Dominance    6920/−0.001        23/0.6536                   23/0.7117            31/0.7437
Table 5
Scheme 1/Scheme 2 feature selection results – Valence.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    6/5        0.29/0.36      9.67/7.14    16.66/13.88
Times            125    1/5        0.33/0.36      1.61/7.14     0.80/4.00
Cochleagrams      96    8/10       0.40/0.51     12.90/14.28    8.33/10.41
LPC              111    13/8       0.34/0.53     20.96/11.42   11.71/7.20
SFlux            117    0/5        –/0.48         0/7.14        0/4.27
Energy           129    0/6        –/0.37         0/8.57        0/4.65
F0               243    0/1        –/0.31         0/1.42        0/0.41
SpecMaxMin       234    0/1        –/0.18         0/1.42        0/0.42
SEB              234    0/5        –/0.45         0/7.14        0/2.13
SROP             468    0/6        –/0.35         0/8.57        0/1.28
MFCC            1617    27/13      0.48/0.49     43.54/18.57    1.67/0.80
MEL             3042    7/5        0.51/0.51     11.29/7.14     0.24/0.16
All             6920    62/70      0.61/0.62     100/100        0.89/1.04
Table 6
Scheme 1/Scheme 2 feature selection results – Activation.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    3/10       0.08/0.72      5.00/9.80     8.33/27.77
Times            125    1/10       0.16/0.71      1.66/9.80     0.80/8.00
Cochleagrams      96    24/10      0.77/0.78     40.00/9.80    25.00/10.41
LPC              111    4/12       0.61/0.78      6.66/11.76    3.60/10.81
SFlux            117    0/4        –/0.65         0.00/3.92     0.00/3.41
Energy           129    4/11       0.76/0.78      6.66/10.78    3.10/8.52
F0               243    1/9        0.29/0.65      1.66/8.82     0.41/3.70
SpecMaxMin       234    0/11       –/0.78         0.00/10.78    0.00/4.70
SEB              234    0/4        –/0.53         0.00/3.92     0.00/1.70
SROP             468    0/6        –/0.65         0.00/5.88     0.00/1.28
MFCC            1617    23/6       0.77/0.78     38.33/5.88     1.42/0.37
MEL             3042    0/9        –/0.73         0/8.82        0/0.29
All             6920    60/102     0.79/0.78     100/100        0.86/1.47
Table 7
Scheme 1/Scheme 2 feature selection results – Dominance.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    0/10       –/0.65         0.00/13.15    0.00/27.77
Times            125    2/6        0.35/0.61      6.45/7.89     1.60/4.80
Cochleagrams      96    7/8        0.70/0.72     22.58/10.52    7.29/8.33
LPC              111    1/7        0.66/0.72      3.22/9.21     0.90/6.30
SFlux            117    1/5        0.61/0.66      3.22/6.57     0.85/4.27
Energy           129    2/12       0.15/0.71      6.45/15.78    1.15/9.30
F0               243    3/4        0.22/0.59      9.67/5.26     1.23/1.64
SpecMaxMin       234    0/6        –/0.42         0.00/7.89     0.00/2.56
SEB              234    0/2        –/0.59         0.00/2.63     0.00/0.85
SROP             468    0/4        –/0.53         0.00/5.26     0.00/0.85
MFCC            1617    11/8       0.71          35.48/10.52    0.68/0.49
MEL             3042    4/4        0.68/0.69     12.90/5.26     0.13/0.13
All             6920    31/76      0.74/0.72     100/100        0.44/1.09
marked in bold. While feature selection Scheme 2 ensures that at least one attribute of each group will be included, Scheme 1 may not include any element of certain groups in the solution set. The last row shows the total number of features selected for the Emotion Primitive. In the experiments for Valence with selection Scheme 1, the group with the highest correlation was MEL (0.5167), contributing to the solution set with 7 out of 62 features. The MEL portion is very low (0.230), since there is a total of 3042 features belonging to this group, while its share may be considered medium (11.290). This indicates that a few MEL coefficients provide very important information. Another important group was MFCC, with a correlation of 0.4877, indicating that the spectral information groups are important for estimating Valence. In the experiment for Valence with selection Scheme 2, the best groups were LPC (0.5349), cochleagrams (0.5152) and MEL (0.5129). LPC and cochleagrams show similar share (11.429 and 14.284) and portion (7.207 and 10.417) values, while MEL shows a lower share (7.143) and portion (0.164) in comparison with the two mentioned groups. As in Scheme 1, MEL provides few, but very important, features. We realized that for Valence the spectral-type groups were the best in both Schemes 1 and 2. No group by itself reached the correlation obtained with all groups together. We can see that the 8 attributes selected from LPC in Scheme 2 were much better (0.535) than the 13 attributes selected in Scheme 1 for the same group (0.342). The best correlation for Valence was obtained using Scheme 2, reaching 0.6232.
For Activation, the group MFCC has the best correlation in both selection schemes: MFCC obtained 0.7795 in Scheme 1 and 0.7897 in Scheme 2. We can see that its share (38.333 and 5.882) and portion (1.422 and 0.371) differ widely between schemes. We can also see that using only the group MFCC we can reach results similar to those obtained using all groups (0.7964 and 0.787). These results clearly indicate the importance of this group for estimating Activation. As expected, the signal energy was very important for this primitive, with a correlation of 0.7851 in Scheme 2 and 0.7639 in Scheme 1, since people experiencing increased activity or excitement are expected to show a higher volume in their voice. Other important groups for Activation are cochleagrams and LPC. An interesting effect is that the group SpecMaxMin was important in Scheme 2, with a correlation of 0.7831 and a high share, but in Scheme 1 no feature was selected from this group. The times group was expected to be important for Activation, since it has been found that the faster speech is perceived, the more excited the individual is perceived to be, and the slower it is perceived, the more relaxed the individual is perceived to be [27]. Only one times feature was selected in Scheme 1, and its correlation was very low (0.167), whereas in Scheme 2, 10 features were selected, obtaining a relatively high correlation (0.7128). In Scheme 2 the best results were not obtained using all selected attributes (0.787), but using only the ones selected by the brute force approach (0.7952). The best result for Activation was obtained using all groups in Scheme 1 (0.7964). We infer that the major groups for Activation are the spectral groups MFCC and cochleagrams, along with energy.
In the experiments carried out for Dominance, the best groups in Scheme 1 were MFCC (0.72) and cochleagrams (0.702), and in Scheme 2 LPC (0.7266) and cochleagrams (0.7244). In Scheme 2 the best results were obtained using only the features selected from the group LPC (0.7266), surpassing the results obtained using all groups (0.7157); that is, in Scheme 2 the best results came not from all the selected features (0.7157) but only from those selected by the selective approach (0.726). The best result for Dominance was obtained with Scheme 1 using all groups (0.7437). We can infer that in the case of Dominance the most important groups are the spectral ones: MFCC and cochleagrams.
6.1. Selective vs. brute force
The first studies on automatic emotion recognition often opted for a selective feature extraction approach based on expert knowledge, usually with a small number of features. Today, with the emergence of tools that allow us to extract a large number of features and the availability of more computing power, it is easier to
Table 8
Selective vs. brute force feature extraction approach – Scheme 1/Scheme 2.

Approach      Total   Selected   Correlation   Share         Portion
Valence
  Selective     368   46/27      0.48/0.56     74.19/38.57   12.50/7.34
  Brute force  6552   16/43      0.57/0.55     25.81/61.43    0.24/0.68
Activation
  Selective     368   56/37      0.79/0.79     93.33/36.28   15.22/10.05
  Brute force  6552   4/65       0.76/0.80      6.67/63.72    0.06/1.03
Dominance
  Selective     368   13/29      0.72/0.73     41.93/38.16    3.53/7.88
  Brute force  6552   18/47      0.73/0.70     58.06/61.84    0.28/0.74
apply a brute force approach. One of the points considered in this paper is to compare the selective and brute force feature extraction approaches and to analyze which one could be better. Furthermore, we are interested in validating the work that we have done before, which follows a selective approach [21,22].
Table 8 shows a comparison between feature extraction approaches (selective vs. brute force) as well as a comparison between the feature selection schemes proposed here. In all experiments, the selective and brute force extraction approaches obtained very similar correlation coefficients, with the exception of the experiment done with Scheme 1 for Valence, where the results with brute force (0.5715) were much better than with the selective approach (0.4799). However, the selective approach's share (74.194) was much higher than the brute force one's (25.806), and the correlation using both groups together was higher (0.6189) than that obtained for each group separately. It is very difficult to say which feature extraction scheme is better, since they obtain similar results when the selected features from these groups are used separately and generally yield better results when joined into one set.
7. Basic emotion classification based on Emotion Primitives
Having identified the most relevant acoustic features for estimating the Emotion Primitives, the next step is to devise a way to use these estimations to discriminate emotional states in people. In this section, we want to strengthen the findings of the feature study by demonstrating that the continuous approach can actually help us to improve automatic emotion classification from acoustic features based on a continuous emotion model. Section 2 describes the database we are working on. The IEMOCAP corpus was annotated with basic emotions and Emotion Primitives.
The classification scheme used to perform the experiments reported in this paper is illustrated in Fig. 4.
1. Acoustic features are extracted from the speech samples. As we have mentioned, we tested two sets of features: the selective set and the brute force set.
2. We applied a procedure called transformation, which consists in replacing the basic emotion labels with values corresponding to linguistic labels. For example, if an instance is labeled as anger, that label is replaced by three labels, one for each primitive:
• Low for Valence
• High for Activation
• High for Dominance
This labeling was done according to Table 9. Since this database has a manual annotation of Emotion Primitives, those annotations were used to build the table. We obtained the means of the annotated values per class from the annotations of the instances filtered according to the process described in Section 4.
3. After extracting the acoustic features and performing the transformation step, we had a data set consisting of acoustic feature vectors whose values to predict are the Emotion Primitives Valence, Activation and Dominance; the possible values that
Table 9
Primitive values calculated according to annotations made by IEMOCAP corpus evaluators. The values in italic are considered low and values in bold are considered high.

Basic emotion  Valence  Activation  Dominance
Anger 2 4 3.5
Happiness 4.42 3.35 2.58
Neutral 2.75 3 2.5
Sadness 2 2 1.75
these primitives can take are high or low. With this information, three models are trained to estimate the values of each Emotion Primitive. To build these models we used Support Vector Machines.
4. To perform the classification of basic emotions, we extracted acoustic features and applied the three models obtained in the previous step. In this way we get a high or low value for each of the instances. With these three values we constructed vectors whose attributes are the three primitives and whose value to predict is the basic emotion of the original instance.
5. Finally, we applied a machine learning algorithm (SVM) to construct a model to assign an emotion to each instance.
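The transformation step (step 2) and the reverse mapping of step 4 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the threshold of 3 used to binarize the Table 9 means into low/high is our assumption (the paper marks low and high typographically in Table 9), and the direct lookup in `classify` stands in for the SVM of step 5:

```python
# Per-emotion means of the annotated Emotion Primitives, from Table 9.
PRIMITIVE_MEANS = {            # (Valence, Activation, Dominance)
    "anger":     (2.00, 4.00, 3.50),
    "happiness": (4.42, 3.35, 2.58),
    "neutral":   (2.75, 3.00, 2.50),
    "sadness":   (2.00, 2.00, 1.75),
}

def transform(emotion, threshold=3.0):
    """Step 2: replace a basic-emotion label with three high/low labels.
    The threshold value is an assumption made for this sketch."""
    return tuple("high" if v > threshold else "low"
                 for v in PRIMITIVE_MEANS[emotion])

def classify(triple):
    """Step 4, simplified: map a predicted (V, A, D) high/low triple back
    to a basic emotion by lookup. With only 8 possible triples, collisions
    are possible; the paper resolves this with an SVM instead."""
    for emotion in PRIMITIVE_MEANS:
        if transform(emotion) == triple:
            return emotion
    return None

# The paper's own example: anger -> Low Valence, High Activation, High Dominance.
print(transform("anger"))  # → ('low', 'high', 'high')
```

With this threshold, neutral and sadness collapse onto the same low/low/low triple, which illustrates why the final mapping in step 5 is learned from data rather than done by table lookup.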
8. Basic emotion classification results
Table 10 shows the results when classifying Emotion Primitives according to high and low classes, as described in step 3 of the classification process in Section 7.
Table 11 shows a comparison of the results obtained by applying the process described in Fig. 4, using a discrete approach for
Table 10
Accuracy in classifying Emotion Primitives by high and low classes.

Emotion Primitive  Accuracy (%)
Valence            76.808
Activation         83.5411
Dominance          88.0299
Table 11
Recall when classifying basic emotions from Emotion Primitives.

Recall (%)  Anger  Happiness  Neutral  Sadness
Continuous  85.82  54.93      79.17    42.11
Discrete    74.92  43.52      75.96    67.10
Baseline    69.68  21.01      35.23    76.84
the classification of emotions, and the baseline results for the corpus with which we are working, reported in [37]. We can see that our method achieved a better recall (number of correctly classified items divided by the total number of items) on three of the four emotions: anger, happiness and neutral, whereas sadness has a lower performance. These results show that by estimating Emotion Primitives we can discriminate with adequate accuracy prototypical emotions such as anger, happiness and sadness, in addition to the neutral state, and in some cases even improve the classification that can be done directly by training basic emotion models from acoustic features. However, it is very important to emphasize that by using Emotion Primitives we can generalize the detection of emotional states in a much broader sense, because we can define different areas in the three-dimensional space depending on the emotional states of interest according to the application and the emotional states that are required to be discriminated. For example, an application might be interested in emotional states close to anger, or emotional states with high Activation, or negative Valence, or any combination of the three primitives.
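The per-class recall used in Table 11 can be computed as defined in the text; the function and the toy label lists below are illustrative, not IEMOCAP data:

```python
# Per-class recall: the number of correctly classified items of a class
# divided by the total number of items of that class.
def recall_per_class(y_true, y_pred):
    out = {}
    for c in set(y_true):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        out[c] = correct / total
    return out

# Invented toy labels, just to exercise the definition.
y_true = ["anger", "anger", "sadness", "neutral", "anger", "sadness"]
y_pred = ["anger", "neutral", "sadness", "neutral", "anger", "anger"]
print(recall_per_class(y_true, y_pred))
```

Because recall is normalized per class, it is a more informative summary than overall accuracy for a corpus like this one, where the four emotion classes are not equally frequent.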
9. Conclusions
We carried out a study on the importance of different acoustic feature types from the point of view of a continuous three-dimensional emotion model. We analyzed each Emotion Primitive separately. Through the identification of the best features, the automatic estimation of Emotion Primitives becomes more accurate, and thus the recognition and classification of people's emotional state improves. To our knowledge, the importance of acoustic features has not been studied from this approach. We have taken some ideas used in the study of the impact of features in the classification of discrete emotional categories, like the share and portion metrics, and we have applied them to the continuous approach. We divided our set of 6920 features into 12 groups according to their acoustic properties. We calculated some metrics for each group in order to estimate their performance in the automatic estimation of Emotion Primitives (correlation coefficient) and their contribution to the final set of features (share and portion). We worked with a database of acted emotions, labeled with the two most important annotation schemes, continuous and discrete. Despite being acted, this database was designed to be as little artificial as possible through the improvisation of dialogues.
We proposed two feature selection schemes, taking into account that we had many features but a very limited set of instances. We observed that both feature selection schemes obtained very similar results and they generally agree on the importance of the feature groups.
The main contribution of this paper is the analysis of acoustic features. This analysis was based mainly on the correlation obtained by the models generated from a machine learning process (SVM). We realized that spectral feature groups are very important for the three primitives. According to our results, the most important feature groups for each Emotion Primitive are:
• Valence: MEL – MFCC – cochleagrams – LPC
• Activation: MFCC – cochleagrams – energy – LPC
• Dominance: MFCC – cochleagrams – LPC – MEL
Clearly the MFCC, LPC and cochleagram groups are very important for estimating the three Emotion Primitives, since they appear among the most important for all three. These three groups belong to the spectral information category; we can conclude from this fact that spectral analysis is more important than prosodic and voice quality analysis for Emotion Primitive estimation, except for Activation, where energy is also very important. The cochleagram group is an interesting finding because, to our knowledge, it has not been used before for emotion recognition.
We observed similar correlation results for the selective and brute force feature extraction approaches when the selected features belonging to these sets were tested separately, although the brute force set was much larger than the selective set. The portion for the selective approach was always much higher than the portion for brute force; this tells us that the selective set has fewer features, but more important ones. The share was more balanced between both approaches, tending to be higher for brute force.
We proposed a way of mapping Emotion Primitives into basic emotions, and we performed a basic emotion classification to show the feasibility of applying the continuous approach. The instance/attribute selection process based on Emotion Primitives has been successful in the discrimination of emotions, reflected in the classification of basic emotions, raising the recall for three out of the four emotions studied in this work.
Interesting results have been obtained in this paper. However, a limitation is that the experiments were done on an acted speech database. It is known that there are major differences between working with acted and real data. Another limitation is the fact that the recordings we used belong to only two different people. The next task will be to validate our findings with the extended version of the IEMOCAP database and with other databases, including different languages and different recording conditions. We also plan to refine the mapping and classification scheme, trying to take better advantage of the continuous emotion model.
Acknowledgements
The authors wish to express their gratitude for the support given to carry out this research to the National Council of Science and Technology of Mexico, through postgraduate scholarship 49296 and project 106013.
References
[1] C. Vera-Muñoz, L. Pastor-Sanz, G. Fico, M. Arredondo, A Wearable EMG Monitoring System for Emotions Assessment, first ed., Springer Publishing Company, Incorporated, 2008.
[2] G. González, Bilingual computer-assisted psychological assessment: an innovative approach for screening depression in chicanos/latinos, Tech. rep., University of Michigan, 1999.
[3] F. Nasoz, K. Alvarez, L. Lisetti, N. Finkelstein, Emotion recognition from physiological signals using wireless sensors for presence technologies, Cognition, Technology & Work 6 (2004) 4–14, URL: http://portal.acm.org/citation.cfm?id=1008243.1008250.
[4] L. Vidrascu, L. Devillers, Real-life emotion representation and detection in call centers data, in: ACII, 2005, pp. 739–746.
[5] M.S. Hussain, R.A. Calvo, A framework for multimodal affect recognition, in: HCSNet Perception and Action Workshop: Tools and Techniques for Conducting EEG and MEG Experiments, 2009, pp. 1–8.
[6] M. Murugappan, M. Rizon, R. Nagarajan, S. Yaacob, D. Hazry, I. Zunaidi, 4th Kuala Lumpur International Conference on Biomedical Engineering, Berlin, Heidelberg, 2008.
[7] H. Schlosberg, Three dimensions of emotion, Psychological Review 61 (2) (1954) 81–88.
[8] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. Chang, S. Lee, S.S. Narayanan, IEMOCAP: Interactive Emotional Dyadic Motion Capture Database, Journal of Language Resources and Evaluation 42 (4) (2008) 335–359.
[9] M. Lugger, B. Yang, Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Institute of Electrical and Electronics Engineers, 2008, pp. 4945–4948.
[10] M. Wollmer, F. Eyben, B. Schuller, E. Douglas-Cowie, R. Cowie, Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks, in: Interspeech 2009, International Speech Communication Association, 2009, pp. 1595–1598.
[11] F. Eyben, M. Wollmer, A. Graves, B. Schuller, E. Douglas-Cowie, R. Cowie, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues, Journal on Multimodal User Interfaces 3 (1–2) (2010) 7–19.
[12] B. Xie, L. Chen, G. Chen, C. Chen, Statistical Feature Selection for Mandarin Speech Emotion Recognition, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2005.
[13] M. Lugger, B. Yang, An incremental analysis of different feature groups in speaker independent emotion recognition, in: Proceedings of the International Conference on Phonetic Sciences, 2007, pp. 2149–2152.
[14] A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, L. Kessous, N. Amir, Whodunnit – searching for the most important feature types signalling emotion-related user states in speech, Computer Speech and Language 25 (1) (2010) 4–28.
[15] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, first ed., Logos Verlag, 2009, URL: http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2009/Steidl09-ACO.pdf.
[16] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, B. Weiss, A database of German emotional speech, in: Interspeech 2005, International Speech Communication Association, 2005, pp. 1517–1520.
[17] J. Montero, Estrategias para la mejora de la naturalidad y la incorporación de variedad emocional a la conversión texto a voz en castellano, Ph.D. thesis, Universidad Politécnica de Madrid, 2003.
[18] M. Grimm, K. Kroschel, S. Narayanan, The Vera am Mittag German audio–visual emotional speech database, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2008), 2008, pp. 865–868.
[19] P. Boersma, Praat, a system for doing phonetics by computer, Glot International 5 (2001) 341–345.
[20] F. Eyben, M. Wollmer, B. Schuller, openEAR – introducing the Munich open-source emotion and affect recognition toolkit, in: Proceedings of the 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction, 2009, pp. 1–6.
[21] H. Pérez-Espinosa, C.A. Reyes-García, Detection of negative emotional state in speech with ANFIS and genetic algorithms, in: Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2009), Firenze University Press, 2009, pp. 25–28.
[22] H. Pérez-Espinosa, C.A. Reyes-García, L. Villaseñor-Pineda, Features selection for primitives estimation on emotional speech, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Institute of Electrical and Electronics Engineers, Dallas, Texas, 2010, pp. 5138–5141.
[23] K. Santiago, C.A. Reyes-García, M. Gomez, Conjuntos difusos tipo 2 aplicados a la comparación difusa de patrones para clasificación de llanto de infantes con riesgo neurológico, Master's thesis, INAOE, Tonantzintla, Puebla, México, 2009.
[24] A.L. Reyes, Un método para la identificación del lenguaje hablado utilizando información suprasegmental, Ph.D. thesis, INAOE, Tonantzintla, Puebla, México, 2007.
[25] T. Dubuisson, T. Dutoit, B. Gosselin, M. Remacle, On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination, EURASIP Journal on Advances in Signal Processing, Analysis and Signal Processing of Oesophageal and Pathological Voices, doi:10.1155/2009/173967.
[26] C.T. Ishi, H. Ishiguro, N. Hagita, Proposal of acoustic measures for automatic detection of vocal fry, in: Interspeech 2005, International Speech Communication Association, 2005, pp. 481–484.
[27] R. Kehrein, The prosody of authentic emotions, in: Speech Prosody 2002, International Conference, 2002, pp. 423–426.
[28] M. Lugger, B. Yang, Classification of different speaking groups by means of voice quality parameters, ITG-Fachtagung Sprach-Kommunikation.
[29] B.F. Nuñes, Evaluación perceptual de la disfonía: correlación con los parámetros acústicos y fiabilidad, Acta Otorrinolaringológica Española: Órgano oficial de la Sociedad Española de Otorrinolaringología y Patología Cérvico-Facial 55 (6) (2004) 282–287.
[30] A.B. Kandali, A. Routray, T.K. Basu, Vocal emotion recognition in five native languages of Assam using new wavelet features, International Journal of Speech Technology 12 (1) (2009) 1–13.
[31] W. Byrne, J. Robinson, S. Shamma, The auditory processing and recognition of speech, in: Proceedings of the Workshop on Speech and Natural Language, HLT '89, Association for Computational Linguistics, Stroudsburg, PA, USA, 1989, pp. 325–331, URL: http://dx.doi.org/10.3115/1075434.1075490.
[32] T. Zbynik, J. Psutka, Speech production based on the mel-frequency cepstral coefficients, in: Eurospeech 1999, International Speech Communication Association, 1999, pp. 2335–2338.
[33] P. Dumouchel, N. Dehak, R. Attabi, Y.B.N. Dehak, Cepstral and long-term features for emotion recognition, in: Interspeech 2009, International Speech Communication Association, 2009.
[34] E. Bozkurt, Improving automatic emotion recognition from speech signals, in: Interspeech 2009, International Speech Communication Association, 2009.
[35] B. Vlasenko, B. Schuller, A. Wendemuth, G. Rigoll, Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing, in: A. Paiva, R. Prada, R. Picard (Eds.), Affective Computing and Intelligent Interaction, Lecture Notes in Computer Science, vol. 4738, Springer, Berlin/Heidelberg, 2007, pp. 139–147.
[36] O. Fujimura, K. Honda, H. Kawahara, Y. Konparu, M. Morise, J.C. Williams, Noh voice quality, Logopedics Phoniatrics Vocology 34 (4) (2009) 157–170.
[37] A. Metallinou, S. Lee, S.S. Narayanan, Decision level combination of multiple modalities for recognition and analysis of emotional expression, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Institute of Electrical and Electronics Engineers, 2010, pp. 2462–2465.
[38] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (The Morgan Kaufmann Series in Data Management Systems), first ed., Morgan Kaufmann, 1999, URL: http://www.worldcat.org/isbn/1558605525.
[39] P. Pudil, J. Novovičová, J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters 15 (1994) 1119–1125.
[40] M. Gutlein, E. Frank, M. Hall, A. Karwath, Large-scale attribute selection using wrappers, in: IEEE Symposium on Computational Intelligence and Data Mining 2009, Institute of Electrical and Electronics Engineers, 2009, pp. 332–339.