Biomedical Signal Processing and Control
journal homepage: www.elsevier.com/locate/bspc
Acoustic feature selection and classification of emotions in speech using a 3D continuous emotion model
Humberto Pérez-Espinosa∗, Carlos A. Reyes-García, Luis Villaseñor-Pineda
Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, Tonantzintla, Puebla 72840, Mexico
Article info

Article history:
Received 15 September 2010
Received in revised form 23 February 2011
Accepted 24 February 2011
Available online 3 April 2011

Keywords:
Automatic emotion recognition
Continuous emotion model
Feature selection
Abstract

In this paper we report the results obtained from experiments with a database of emotional speech in English, carried out in order to find the most important acoustic features for estimating Emotion Primitives, which determine the emotional content of speech. We are interested in exploiting the potential benefits of continuous emotion models, so in this paper we demonstrate the feasibility of applying this approach to the annotation of emotional speech, and we explore ways to take advantage of this kind of annotation to improve the automatic classification of basic emotions.
© 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.bspc.2011.02.008
1. Introduction
Emotions are very important in our everyday life; they are present in everything we do. There is a continuous interaction between emotions, behavior and thoughts, in such a way that they constantly influence each other. Emotions are a great source of information in communication and interaction among people, and they are assimilated intuitively.
The applications of emotion recognition encompass many fields, for example, as a supporting technology in medical areas such as psychology, neurology and the care of aged and impaired people. Automatic emotion recognition based on biomedical signals and facial and vocal expressions has been applied to the diagnosis and follow-up of progressive neurological disorders, specifically Huntington's and Parkinson's diseases [1]. These pathologies are characterized by a deficit in the emotional processing of fear and disgust, and thus the system could be utilized to determine the subject's reaction, or absence of reaction, to specific emotions, helping health professionals to gain a better understanding of these disorders. Furthermore, the system could be used as a reference to evaluate patients' response to certain medicines. Another important medical application is remote medical support. These kinds of environments enable communication between medical professionals and patients for regular monitoring and emergency situations. In this scenario the system recognizes the patient's emotional
∗ Corresponding author.
E-mail addresses: humbertop@inaoep.mx (H. Pérez-Espinosa), kargaxxi@inaoep.mx (C.A. Reyes-García), villasen@inaoep.mx (L. Villaseñor-Pineda).
states and then transmits data indicating that the patient is experiencing depression or sadness, so that the health-care providers monitoring them will be better prepared to respond. Such a system has the potential to improve patient satisfaction and health [2–4]. It can also be an asset for disabled people who have difficulties with communication. Hearing-impaired people who are not profoundly deaf can use residual hearing to communicate with other people and learn to speak with emotions, making communication more complete and understandable. In these cases, emotion recognition engines can be used as an important element of a computer-assisted emotional speech training system [5]. For hearing-impaired people, it could provide an easier way to learn how to speak with emotion more naturally, or help speech therapists to guide them to express emotions correctly in speech. Emotion recognition also arouses great interest in interface design, given that recognizing and understanding emotions automatically is one of the key steps towards emotional intelligence in Human–Computer Interaction (HCI). The need for automatic emotion recognition has emerged due to the tendency towards a more natural interaction between humans and computers. Affective computing is a topic within HCI that encompasses these research tendencies, trying to endow computers with the ability to detect, recognize, model and take into account the user's emotional state, which plays a role of paramount importance in the way humans make decisions [6]. Emotions are essential for human thought processes and influence interactions between people and intelligent systems.
In the area of automatic emotion recognition, mainly two annotation schemes have been used to capture and describe the emotional content in speech: discrete and continuous approaches. The discrete approach is based on the concept of basic emotions, such as anger, joy, and sadness, which are the most intense forms of emotion and from which all other emotions are generated by variations or combinations. It assumes the existence of universal emotions that can be clearly distinguished from one another by most people. On the other hand, the continuous approach represents emotional states using a continuous multidimensional space. Emotions are represented by regions in an n-dimensional space where each dimension represents an Emotion Primitive. Emotion Primitives are subjective properties shown by all emotions. The most widely accepted primitives are Arousal and Valence. Valence describes how negative or positive a specific emotion is. Arousal, also called Activation, describes the internal excitement of an individual and ranges from very calm to very active. Three-dimensional models have also been proposed. The most common three-dimensional model includes Valence and Activation as well as Dominance [7]. This additional primitive describes the degree of control that the individual intends to take over the situation or, in other words, how strong or weak the individual seems to be.
Both approaches, discrete and continuous, provide complementary information about the emotional manifestations observed in individuals [8]. Discrete categorization allows a more particularized representation of emotions in applications where a predefined set of emotions needs to be recognized. However, this approach ignores most of the spectrum of human emotional expressions. In contrast, continuous models allow the representation of any emotional state, including subtle emotions, but it has been found that it is difficult to estimate the Emotion Primitives with high precision based only on acoustic information. Some authors have begun to research how to take advantage of this theory [9–11] to estimate the emotional content in speech more adequately. Both approaches are closely related: by assessing the emotional content in speech using one of these two schemes, we can infer its counterpart in the other scheme. For instance, if an utterance is evaluated as anger, we may infer that the utterance would have a low value for Valence and high values for Activation and Dominance. Conversely, if an utterance is evaluated with low Valence and high Activation and Dominance, we could infer that this is anger. Several authors [12–14] have worked on the analysis of the most important acoustic features from the point of view of discrete categorization; however, the importance of acoustic attributes from the continuous models' point of view has not yet been studied with the same depth. We believe that the continuous approach has great potential to model the occurrence of emotions in the real world. This three-dimensional continuous model is adopted in this paper. As a first step towards exploiting the continuous approach, we analyze the most important acoustic features to automatically estimate Emotion Primitives in speech. Then, we will be able to use this estimation to locate the individual's emotional state in the multidimensional space and, if necessary, to map it to a basic emotion. In this work we apply these ideas to improve automatic emotion recognition from acoustic information. We perform some experiments in order to find the most important acoustic features for estimating Emotion Primitives in speech, and then we propose a method that uses these estimations to determine the individual's emotional state, mapping it to a basic emotion. The remainder of this paper is organized as follows. First, we describe the database in Section 2. Next, in Section 3 we describe the acoustic features and how we extract them from speech signals. The filters we applied to select the best instances are explained in Section 4. In Section 5 we propose and describe two ways of finding the best acoustic features for Emotion Primitive estimation. Experimental results and discussion about feature selection are provided in Section 6. In Section 7 we propose a way of applying the automatic Emotion Primitive estimation in order to classify basic emotions. In Section 8, we present and discuss the results obtained from basic emotion classification. Finally, the conclusions of this study and future work are discussed in Section 9.
2. Database
For the purposes of this work it is necessary to have a database labeled with Emotion Primitives, namely Valence, Activation and Dominance. In addition, we need basic emotion annotations for each sample to validate the Emotion Primitive estimation accuracy by evaluating the mapping done from the continuous to the discrete approach. There are many databases labeled with emotional categories, such as FAU Aibo [15], the Berlin Database of Emotional Speech [16] and Spanish Emotional Speech [17], and a few that are labeled with Emotion Primitives: the VAM Corpus [18] and the IEMOCAP database [8]. The database we used is labeled with both common discrete categories and Emotion Primitives. This database is called IEMOCAP [8] (Interactive Emotional Dyadic Motion Capture Database). It was collected at the Speech Analysis and Interpretation Laboratory at the University of Southern California and it is in English. It was recorded from ten actors in male–female pairs. IEMOCAP includes information about head, face and hand motion as well as audio and video, offering detailed information about facial expressions and gestures. To generate emotional dialogue, two scenarios were designed. In the first one, the actors followed a script, while in the second one, the actors improvised according to a preset situation. The complete corpus contains about 12 h of material, although at the moment only the first session, i.e., the interaction of one pair of actors, has been released. The utterances were labeled with the emotional categories happiness, anger, sadness, frustration, fear, surprise and neutral. The categories "other" and "not identified" were also included. To evaluate Emotion Primitives, an integer value between one and five was annotated for each primitive: Valence (1 – negative, 5 – positive), Activation (1 – calm, 5 – excited), and Dominance (1 – weak, 5 – strong). The characteristics of this database make it very interesting for the purposes of our work, since the annotation includes the two most important approaches. Attention was given to spontaneous interaction, in spite of using actors and ideal recording conditions. The database shows a significant diversity of emotional states. We use only the audio of the first session, segmented into turns, for the categories anger, happiness, neutral and surprise. In total, there are 1820 instances.
3. Feature extraction
We extracted acoustic features from the speech signal using two programs for acoustic analysis, Praat [19] and OpenEar [20]. We evaluated two sets of features. One of them was obtained through a selective approach, i.e., based on a study taking into account the features that could be useful, the features that have been successful in related works, and features used for other similar tasks. The second feature set was obtained by applying a brute force approach, i.e., generating a large number of features in the hope that some will be found useful. The Selective Feature Set is a set of features that we have been building over the course of our research [21,22] and was extracted with the software Praat. We designed this set of features to represent several aspects of the voice; we included the traditional attributes associated with prosody, e.g., duration, pitch and energy, as well as others which have shown good results in tasks like speech recognition, speaker recognition, classification of baby cry [23], language recognition [24], and pathology detection in voice [25,26], and which we believe can be successful in emotion recognition. Table 1 shows the number of acoustic features included in each feature group.

Table 1
Selective approach feature set.

Group          Feature type             Number of features
Prosodic
  Times        Elocution times            8
  F0           Melodic contour            9
  Energy       Energy contour            12
Voice quality
  Voice quality  Quality descriptors     24
  Voice quality  Articulation            12
Spectral
  LPC          Fast Fourier transform     4
  LPC          Long term average          5
  LPC          Wavelets                   6
  MFCC         MFCC                      96
  Cochleagram  Cochleagram               96
  LPC          LPC                       96
Total                                   368

We divide the features we include into three types: prosodic, spectral and voice quality. Prosodic features describe suprasegmental phenomena, i.e., speech units larger than phonemes, such as pitch, loudness, speed, duration, pauses and rhythm. Prosody is a rich source of information in speech processing because it carries very important paralinguistic information that complements the message with an intention that can reflect an attitude or emotional state [27]. These types of features are the most commonly used in speech emotion recognition. We subdivide prosodic features into elocution times, melodic contour and energy contour. The second type of feature is voice quality, which gives the primary distinction to a given speaker's voice when prosodic aspects are excluded. Some of the descriptions of voice quality are harshness, breathiness and nasality. Some authors [15] have studied the importance of voice quality, stating that the key to the vocal differentiation of discrete emotions seems to be voice quality. We included the two most popular voice quality descriptors, which are jitter and shimmer. We also included other voice quality descriptors that have been related to the GRBAS scale in related work [25–29]. Some of these features have never been used in speech emotion recognition. For example, the energy differences between frequency bands and the ratio between frequency bands were used by [25] to discriminate between pathological and normal voices. Increases and decreases in energy peaks were used by [26] for automatic detection of vocal fry. Articulation is also an important parameter for measuring voice quality. We included some statistical measures of the first four formants as articulatory descriptors. Spectral features describe the characteristics of a speech signal in the frequency domain beyond F0, such as harmonics and formants [15]. We included several types of spectral representations. Some of these representations have never been used in emotion recognition, such as cochleagrams, which have been used for infant cry classification [23], and others have been studied very little for this task, such as wavelets [30]. A cochleagram [19] represents the excitation of the auditory nerve filaments of the basilar membrane, which is situated in the cochlea in the inner ear. This excitation is represented as a function over time (s) and Bark frequency, which is a psychoacoustic scale. A cochleagram also models the sound volume and frequency masking. Frequency masking in the ear happens when we hear two sounds of different intensity at the same time: the weakest sound is not distinguished, as the brain only processes the masking sound. Features based on cochleagrams have been used for speech recognition with good results. In the work of Shamma et al. [31], cochleagrams were used for speech recognition at the phoneme level, surpassing the results obtained by LPC features. Wavelets are an alternative to the Fourier transform; the wavelet transform allows a good resolution at low frequencies. We also included features widely used in speech processing, such as MFCCs (Mel-frequency cepstral coefficients) [32], which represent speech perception based on human hearing and have been successfully used to discriminate phonemes. MFCCs have shown that they are useful not only to determine what is said, but also how it is said.
The brute force feature set was extracted using the software OpenEar. We extracted a total of 6552 features, including first-order functionals of low-level descriptors (LLD) such as FFT spectrum, Mel spectrum, MFCC, pitch (fundamental frequency F0 via ACF), energy, spectral descriptors, and LSP. 39 functionals, such as extremes, regression, moments, percentiles, crossings, peaks and means, were applied. Table 2 shows the features extracted by the brute force approach.

Table 2
Brute force approach feature set.

Group         Feature type               Number of features
Prosodic
  Energy      LOG energy                  117
  Times       Zero crossing rate          117
  PoV         Probability of voicing      117
  F0          F0                          234
Spectral
  MFCC        MFCC                       1521
  MEL         MEL spectrum               3042
  SEB         Spectral energy in bands    469
  SROP        Spectral roll-off point     468
  SFlux       Spectral flux               117
  SC          Spectral centroid           117
  SpecMaxMin  Spectral max and min        233
Total                                    6552

In emotion recognition from speech there are two approaches to feature processing: the static and the dynamic approach. Dynamic processing captures information about how features evolve over time. On the other hand, static processing avoids overfitting in the phonetic modeling by applying statistical functions to low-level descriptors over periods of time. Static processing is more common in emotion recognition from speech; however, dynamic processing has shown good results in recent years [33–35], and new methods have even been proposed to represent the signal properties associated with the relationship between expressiveness and voice quality [36]. Our spectral features include static coefficients that describe the spectral properties within one frame, where the signal is approximately stationary. Our feature set also includes dynamic features, which describe the behavior of the static features over time. For this purpose, the first and second derivatives of the static features in the brute force set were calculated.
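The dynamic features just described can be sketched as frame-to-frame differences of the static features; this is an illustrative stand-in (OpenEar computes regression-based deltas, which we do not reproduce here):

```python
def delta(frames):
    """First-order difference of per-frame feature vectors.

    The first frame is zero-padded so the output keeps the input length.
    """
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out

def add_dynamic(frames):
    """Append first (delta) and second (delta-delta) derivatives to each frame."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]

# Toy static features: 4 frames, 2 dimensions each.
static = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0], [7.0, 6.0]]
dynamic = add_dynamic(static)  # each frame now carries 6 values
```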
4. Instance selection
Inspecting the database instances, we realized that there were problematic instances; we thought that our machine learning algorithm would have a better performance by selecting the most appropriate and congruent instances representing the properties of the continuous and discrete approaches. Our initial data set consists of 1820 instances. We worked only with the more represented classes, so we first selected the instances from the four classes with more examples. We chose the instances from these four classes to enable the comparison of our results to the work done by [37], where only these four classes are used. After applying this filter we are left with 942 instances, of which 20% was reserved for final testing. The experiments reported in Sections 6, 7 and 8 were done using the remaining 80%, equivalent to 753 instances. Another filter was applied to these instances, which consisted of removing all instances that have the same annotation for each of the three primitives but a different annotation for the basic emotion, as these are regarded as contradictory instances that add noise to our learning process. For example, if the annotations for one instance are (Valence = 2, Activation = 4, Dominance = 3, Emotion = Anger) and the annotations for another instance are (Valence = 2, Activation = 4, Dominance = 3, Emotion = Happiness), every instance annotated with Valence = 2, Activation = 4 and Dominance = 3 is eliminated from our data set. After this filtering, we were working with a set of 401 instances in the feature selection process.
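The contradictory-instance filter can be sketched as follows (a minimal illustration with hypothetical instance records, not the authors' exact implementation):

```python
def filter_contradictory(instances):
    """Remove every instance whose (Valence, Activation, Dominance) triple
    is annotated with more than one basic emotion across the data set."""
    emotions_per_triple = {}
    for inst in instances:
        triple = (inst["valence"], inst["activation"], inst["dominance"])
        emotions_per_triple.setdefault(triple, set()).add(inst["emotion"])
    return [
        inst for inst in instances
        if len(emotions_per_triple[(inst["valence"],
                                    inst["activation"],
                                    inst["dominance"])]) == 1
    ]

data = [
    {"valence": 2, "activation": 4, "dominance": 3, "emotion": "anger"},
    {"valence": 2, "activation": 4, "dominance": 3, "emotion": "happiness"},
    {"valence": 4, "activation": 4, "dominance": 3, "emotion": "happiness"},
]
# The first two instances share a primitive triple but disagree on the
# basic emotion, so both are dropped; only the third survives.
kept = filter_contradictory(data)
```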
Table 3
Correlation index obtained for primitive estimation using the whole feature set.

Emotion Primitive   Correlation index
Valence              0.0151
Activation           0.0095
Dominance           −0.001
5. Feature selection
The need to find the best feature subsets for building our learning models arises from the low correlation obtained in the estimation of primitives using a model trained on the full set of 6920 acoustic features. The correlation coefficient measures the quality of the estimated variable, determining the strength and direction of the linear relationship between the estimated and the actual value of the variable. The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables; as it approaches zero, there is less of a relationship. Table 3 shows the correlation coefficient obtained when estimating the value of the Emotion Primitives with a model built from 942 instances (using only the four classes with more instances) and 6920 attributes. As we can see, the correlation is very low and, therefore, the models learnt from these data are not useful. Having too many features in relation to the number of instances complicates the classifier's task, SVM in this case, impeding a proper prediction model. Although many attributes can enhance discrimination power, in practice, with a limited amount of data, an excessive number of attributes significantly delays the learning process and often results in overfitting. Initially, we tested different attribute selectors, such as SubSetEval [38], which evaluates the value of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them, preferring subsets of features that are highly correlated with the class and lowly correlated with each other. We also tested ReliefAttribute [38]. The key idea of ReliefAttribute is to estimate attributes according to the values that distinguish the closest instances. For a given instance, ReliefAttribute seeks two nearest neighbors: one from the same class and another from a different class. Good attributes must, on one hand, take different values for instances of different classes and, on the other hand, the same values for instances of the same class. Using these techniques we could not improve the correlation results for primitive estimation. Due to these problems, it is necessary to devise a way to select the best features among a large number of them, bearing in mind that we have few instances. We propose two schemes for selecting attributes that work on smaller feature sets, with the idea of avoiding the search for the best features on the whole set.
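Since the correlation coefficient is the quality measure used throughout, here is a minimal Pearson-correlation sketch (the standard formula, not tied to any particular toolkit):

```python
import math

def correlation(actual, predicted):
    """Pearson correlation between annotated and estimated primitive values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (std_a * std_p)

# A perfectly linear estimate correlates at (approximately) 1.0 and an
# inverted one at -1.0, mirroring the -1..1 range described above.
r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = correlation([1, 2, 3], [3, 2, 1])
```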
5.1. Scheme 1: ungrouped feature selection
Fig. 1 shows the processes applied in this scheme. We started from an initial feature set obtained from a feature selection process applied to the VAM (Vera am Mittag) German spontaneous speech database [18]. This initial feature set achieved good correlation results in the estimation of Emotion Primitives in the VAM database. That selection process was carried out with 252 features obtained by the selective approach and 949 instances [22]. Later, we conducted the instance selection process explained in Section 4. Finally, we applied the feature selection process known as Linear Floating Forward Selection (LFFS) [39], which performs a hill-climbing search: starting with the empty set or with a predefined set, it evaluates all possible inclusions of a single attribute into the solution subset. At each step the attribute with the best evaluation is added, and the search ends when no inclusion improves the evaluation. In addition, LFFS dynamically changes the number of features included or eliminated at each step. In our experiments we use the LFFS modality called fixed width, shown in Fig. 2. In this mode the search is not performed on the whole feature set: initially, the k best features are selected and the rest are removed, and in each iteration the features added to the solution set are replaced by features taken from those that had been eliminated. The processes of this scheme are repeated for each Emotion Primitive.

Fig. 1. Ungrouped feature selection scheme.
Fig. 2. Linear forward selection with fixed width (taken from [40]).
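The fixed-width search can be sketched as below. This is a simplified greedy version: the floating (backward) steps and the SVM-based subset evaluation of the real LFFS are omitted, and `score_single` and `score_subset` are toy stand-ins introduced for illustration:

```python
def fixed_width_forward_selection(features, score_single, score_subset, k):
    """Forward selection restricted to the k best individually ranked features."""
    # Step 1: rank features individually and keep only the k best.
    pool = sorted(features, key=score_single, reverse=True)[:k]
    selected, best = [], float("-inf")
    # Step 2: hill-climbing search; stop when no inclusion improves the score.
    while pool:
        candidate = max(pool, key=lambda f: score_subset(selected + [f]))
        gain = score_subset(selected + [candidate])
        if gain <= best:
            break
        selected.append(candidate)
        pool.remove(candidate)
        best = gain
    return selected

# Toy scorer: a subset's value is the summed feature weight minus a
# redundancy penalty for every feature beyond the first.
weights = {"mfcc1": 0.9, "mel3": 0.8, "lpc2": 0.7, "f0": 0.3, "jitter": 0.1}
score_single = lambda f: weights[f]
score_subset = lambda s: sum(weights[f] for f in s) - 0.35 * max(len(s) - 1, 0)

chosen = fixed_width_forward_selection(list(weights), score_single,
                                       score_subset, k=4)
```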
5.2. Scheme 2: grouped feature selection
In this scheme, the idea is to divide the whole feature set into smaller groups according to the acoustic properties they represent. Feature groups are shown in Tables 1 and 2. Fig. 3 shows the steps followed in this scheme. First, we applied the instance selection filter explained in Section 4. Second, we divided the whole data set into smaller sets, grouping features that share the same acoustic properties (see Tables 1 and 2), and we applied LFFS; unlike Scheme 1, this time the search process by LFFS started from the empty set. Third, the features selected for each group were put together in a final set. The three steps in this group-wise selection scheme are repeated for each Emotion Primitive.
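Scheme 2 can be sketched as the same greedy search run per group from the empty set, with the winners merged into one final set (again with a toy additive scorer standing in for the SVM-based evaluation used in the paper):

```python
def select_within_group(group, score_subset):
    """Greedy forward selection starting from the empty set (per group)."""
    selected, best = [], float("-inf")
    pool = list(group)
    while pool:
        candidate = max(pool, key=lambda f: score_subset(selected + [f]))
        gain = score_subset(selected + [candidate])
        if gain <= best:
            break
        selected.append(candidate)
        pool.remove(candidate)
        best = gain
    return selected

def grouped_selection(groups, score_subset):
    """Scheme 2: select per acoustic group, then merge into one final set."""
    final = []
    for _name, feats in groups.items():
        final.extend(select_within_group(feats, score_subset))
    return final

# Hypothetical feature weights and groups for illustration.
weights = {"mfcc1": 0.9, "mfcc2": 0.2, "e1": 0.6, "e2": 0.5, "f0a": 0.4}
score = lambda s: sum(weights[f] for f in s) - 0.45 * max(len(s) - 1, 0)
groups = {"MFCC": ["mfcc1", "mfcc2"], "Energy": ["e1", "e2"], "F0": ["f0a"]}
final_set = grouped_selection(groups, score)
```

Note that, as the text observes, this construction guarantees at least one candidate is examined from every group, whereas a single search over the whole set may leave some groups empty-handed.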
6. Feature selection results
All the results of the learning experiments in this paper were obtained using Support Vector Machines (SVM) and validated by 10-fold cross-validation.
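This evaluation protocol (train a regressor, score it on held-out folds) can be sketched as follows; to stay dependency-free we substitute a one-feature least-squares regressor for the SVM, so `fit_line` is an illustrative stand-in:

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for the paper's SVM regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def cross_validated_predictions(xs, ys, folds=10, seed=0):
    """k-fold CV: each instance is predicted by a model trained without it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for k in range(folds):
        held_out = set(idx[k::folds])
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a * xs[i] + b
    return preds

# On noise-free linear data every fold recovers the true line, so the
# out-of-fold predictions match the annotations almost exactly.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 for x in xs]
preds = cross_validated_predictions(xs, ys)
```

The out-of-fold predictions would then be compared against the annotated primitives with the correlation coefficient described below.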
The metrics used to measure the importance of feature groups are: correlation coefficient, share and portion. The correlation coefficient is the most common parameter for measuring the performance of machine learning algorithms on regression tasks, as in our case. We use share and portion, which are measures proposed in [14] to assess the impact of different types of features on the performance of automatic recognition of discrete categorical emotions.
Fig. 3. Grouped feature selection scheme.
Correlation coefficient: indicates the strength and direction of the linear relationship between the annotated primitives and the primitives estimated by the trained model. It is our main metric to measure the classification results.
Share: shows the contribution of each type of feature. For example, if 28 features are selected from the group times and 150 features were selected in total, then:

Share = (28 × 100) / 150 = 18.7
Portion: shows the contribution of each type of feature weighted by the number of features per type. For example, if 28 features are selected from a total of 125 features of the times set, then:

Portion = (28 / 125) × 100 = 22.4
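In code, the two metrics read as follows (using the worked numbers from the text):

```python
def share(selected_in_group, selected_total):
    """Percentage of the final solution set contributed by one feature group."""
    return 100.0 * selected_in_group / selected_total

def portion(selected_in_group, group_size):
    """Percentage of a group's own features that entered the solution set."""
    return 100.0 * selected_in_group / group_size

# 28 of the 150 selected features come from the times group,
# which contains 125 features in total.
times_share = share(28, 150)      # about 18.7
times_portion = portion(28, 125)  # 22.4
```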
Having identified the best acoustic feature sets, we constructed individual classifiers to estimate each Emotion Primitive. Table 4 shows the evaluation results of the instance/attribute selection scheme illustrated in Fig. 1. The second column shows the results when using all attributes and all instances in the learning process; as we can see, the correlation coefficient is too low. The third column shows the results when the learning process is performed using the best features for each primitive proposed by [22]. The fourth column shows the results when the learning process uses the same features as the previous experiment, but with instances filtered as described in Section 4. Finally, the fifth column shows the results after filtering instances and applying the feature selection method LFFS. As we can see, the improvement in results was gradual after each data processing step.

The two feature selection schemes were applied to each of the three Emotion Primitives; that is, we ran the feature selection process six times, obtaining six different feature sets. This paper shows the results of the best subsets for each primitive. Tables 5–7 reflect the effectiveness of each feature set and the differences among groups. The groups PoV and SC are not shown in these tables because no features were selected from them. It is important to note that the correlation, share and portion shown in these tables are obtained by decomposing the solution set found by selection Schemes 1 and 2 into groups of features and evaluating them separately with these metrics. The highest value for each metric is
Table 4
Results for each step of the ungrouped feature selection scheme. Each cell shows the number of features/correlation coefficient.

Emotion      Baseline results   Scheme 1 results
Primitive    All attributes     Initial feature selection   Instance selection   LFFS
Valence      6920/0.0151        56/0.5188                   56/0.5597            62/0.6189
Activation   6920/0.0095        67/0.7463                   67/0.7861            60/0.7969
Dominance    6920/−0.001        23/0.6536                   23/0.7117            31/0.7437
Table 5
Scheme 1/Scheme 2 feature selection results – Valence.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    6/5        0.29/0.36      9.67/7.14    16.66/13.88
Times            125    1/5        0.33/0.36      1.61/7.14     0.80/4.00
Cochleagrams      96    8/10       0.40/0.51     12.90/14.28    8.33/10.41
LPC              111    13/8       0.34/0.53     20.96/11.42   11.71/7.20
SFlux            117    0/5        –/0.48         0/7.14        0/4.27
Energy           129    0/6        –/0.37         0/8.57        0/4.65
F0               243    0/1        –/0.31         0/1.42        0/0.41
SpecMaxMin       234    0/1        –/0.18         0/1.42        0/0.42
SEB              234    0/5        –/0.45         0/7.14        0/2.13
SROP             468    0/6        –/0.35         0/8.57        0/1.28
MFCC            1617    27/13      0.48/0.49     43.54/18.57    1.67/0.80
MEL             3042    7/5        0.51/0.51     11.29/7.14     0.24/0.16
All             6920    62/70      0.61/0.62     100/100        0.89/1.04
Table 6
Scheme 1/Scheme 2 feature selection results – Activation.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    3/10       0.08/0.72      5.00/9.80     8.33/27.77
Times            125    1/10       0.16/0.71      1.66/9.80     0.80/8.00
Cochleagrams      96    24/10      0.77/0.78     40.00/9.80    25.00/10.41
LPC              111    4/12       0.61/0.78      6.66/11.76    3.60/10.81
SFlux            117    0/4        –/0.65         0.00/3.92     0.00/3.41
Energy           129    4/11       0.76/0.78      6.66/10.78    3.10/8.52
F0               243    1/9        0.29/0.65      1.66/8.82     0.41/3.70
SpecMaxMin       234    0/11       –/0.78         0.00/10.78    0.00/4.70
SEB              234    0/4        –/0.53         0.00/3.92     0.00/1.70
SROP             468    0/6        –/0.65         0.00/5.88     0.00/1.28
MFCC            1617    23/6       0.77/0.78     38.33/5.88     1.42/0.37
MEL             3042    0/9        –/0.73         0/8.82        0/0.29
All             6920    60/102     0.79/0.78     100/100        0.86/1.47
Table 7
Scheme 1/Scheme 2 feature selection results – Dominance.

Feature group   Total   Selected   Correlation   Share         Portion
Voice quality     36    0/10       –/0.65         0.00/13.15    0.00/27.77
Times            125    2/6        0.35/0.61      6.45/7.89     1.60/4.80
Cochleagrams      96    7/8        0.70/0.72     22.58/10.52    7.29/8.33
LPC              111    1/7        0.66/0.72      3.22/9.21     0.90/6.30
SFlux            117    1/5        0.61/0.66      3.22/6.57     0.85/4.27
Energy           129    2/12       0.15/0.71      6.45/15.78    1.15/9.30
F0               243    3/4        0.22/0.59      9.67/5.26     1.23/1.64
SpecMaxMin       234    0/6        –/0.42         0.00/7.89     0.00/2.56
SEB              234    0/2        –/0.59         0.00/2.63     0.00/0.85
SROP             468    0/4        –/0.53         0.00/5.26     0.00/0.85
MFCC            1617    11/8       0.71          35.48/10.52    0.68/0.49
MEL             3042    4/4        0.68/0.69     12.90/5.26     0.13/0.13
All             6920    31/76      0.74/0.72     100/100        0.44/1.09
marked in bold. While feature selection Scheme 2 ensures that at least one attribute of each group will be included, Scheme 1 may not include any element of certain groups in the solution set. The last row shows the total number of features selected for the Emotion Primitive. In the experiments for Valence with selection Scheme 1, the group with the highest correlation was MEL (0.5167), contributing to the solution set with 7 out of 62 features. The MEL portion is very low (0.230), since there is a total of 3042 features belonging to this group, while its share may be considered medium (11.290). This indicates that a few MEL coefficients provide very important information. Another important group was MFCC, with a correlation of 0.4877, indicating that the spectral information groups are important for estimating Valence. In the experiment for Valence with selection Scheme 2, the best groups were LPC (0.5349), cochleagrams (0.5152) and MEL (0.5129). LPC and cochleagrams show similar share (11.429 and 14.284) and portion (7.207 and 10.417) values, while MEL shows a lower share (7.143) and portion (0.164) in comparison with the two mentioned groups. As in Scheme 1, MEL provides few, but very important, features. We realized that for Valence the spectral-type groups were the best in both Schemes 1 and 2. No group by itself reached the correlation obtained with all groups together. We can see that the 8 attributes selected from LPC in Scheme 2 were much better (0.535) than the 13 attributes selected in Scheme 1 for the same group (0.342). The best correlation for Valence was obtained using Scheme 2, reaching 0.6232.
For Activation, the group MFCC has the best correlation in both selection schemes: MFCC obtained 0.7795 in Scheme 1 and 0.7897 in Scheme 2. We can see that its share (38.333 and 5.882) and portion (1.422 and 0.371) differ widely between schemes. We can also see that using only the group MFCC we can reach results similar to those obtained using all groups (0.7964 and 0.787). These results clearly indicate the importance of this group for estimating Activation. As expected, the signal energy was very important for this primitive, with a correlation of 0.7851 in Scheme 2 and 0.7639 in Scheme 1, since people experiencing increased activity or excitement are expected to show a higher volume in their voice. Other important groups for Activation are cochleagrams and LPC. An interesting effect is that the group SpecMaxMin was important in Scheme 2, with a correlation of 0.7831 and a high share, but in Scheme 1 no feature was selected from this group. The times group was expected to be important for Activation, since it has been found that the faster speech is perceived, the more excited the individual is perceived to be, and the slower it is perceived, the more relaxed the individual is perceived to be [27]. Only one times feature was selected in Scheme 1, and its correlation was very low (0.167), whereas in Scheme 2, 10 features were selected, obtaining a relatively high correlation (0.7128). In Scheme 2 the best results were not obtained using all selected attributes (0.787), but using only the ones selected by the brute force approach (0.7952). The best result for Activation was obtained using all groups in Scheme 1 (0.7964). We infer that the major groups for Activation are the spectral groups MFCC and cochleagrams, along with energy.
In the experiments carried out for Dominance, the best groups in Scheme 1 were MFCC (0.72) and cochleagrams (0.702), and in Scheme 2 LPC (0.7266) and cochleagrams (0.7244). In Scheme 2 the best results were obtained using only the features selected from the group LPC (0.7266), surpassing the results obtained using all groups (0.7157); that is, in Scheme 2 the best results came not from all the selected features (0.7157) but only from those selected by the selective approach (0.726). The best result for Dominance was obtained with Scheme 1 using all groups (0.7437). We can infer that in the case of Dominance the most important groups are the spectral ones: MFCC and cochleagrams.
6.1. Selective vs. brute force
The first studies on automatic emotion recognition often opted for a selective feature extraction approach based on expert knowledge, usually with a small number of features. Today, with the emergence of tools that allow us to extract a large number of features and the availability of more computing power, it is easier to
Table 8
Selective vs. brute force feature extraction approach – Scheme 1/Scheme 2.

Approach      Total   Selected   Correlation   Share         Portion
Valence
  Selective     368   46/27      0.48/0.56     74.19/38.57   12.50/7.34
  Brute force  6552   16/43      0.57/0.55     25.81/61.43    0.24/0.68
Activation
  Selective     368   56/37      0.79/0.79     93.33/36.28   15.22/10.05
  Brute force  6552   4/65       0.76/0.80      6.67/63.72    0.06/1.03
Dominance
  Selective     368   13/29      0.72/0.73     41.93/38.16    3.53/7.88
  Brute force  6552   18/47      0.73/0.70     58.06/61.84    0.28/0.74
apply a brute force approach. One of the points considered in this paper is to compare the selective and brute force feature extraction approaches and to analyze which one could be better. Furthermore, we are interested in validating the work that we have done before, which follows a selective approach [21,22].
Table 8 shows a comparison between feature extraction approaches (selective vs. brute force) as well as a comparison between the feature selection schemes proposed here. In all experiments, the selective and brute force extraction approaches obtained very similar correlation coefficients, with the exception of the experiment done with Scheme 1 for Valence, where the results with brute force (0.5715) were much better than with the selective approach (0.4799). However, the selective approach's share (74.194) was much higher than the brute force one's (25.806), and the correlation using both groups together was higher (0.6189) than that obtained for each group separately. It is very difficult to say which feature extraction scheme is better, since they obtain similar results when the selected features from these groups are used separately and generally yield better results when joined into one set.
7. Basic emotion classification based on Emotion Primitives
Having identified the most relevant acoustic features for estimating the Emotion Primitives, the next step is to devise a way to use these estimations to discriminate emotional states in people. In this section, we want to strengthen the findings of the feature study by demonstrating that the continuous approach can actually help us to improve automatic emotion classification from acoustic features based on a continuous emotion model. Section 2 describes the database we are working on. The IEMOCAP corpus was annotated with basic emotions and Emotion Primitives.
The classification scheme used to perform the experiments reported in this paper is illustrated in Fig. 4.
1. Acoustic features are extracted from the speech samples. As we have mentioned, we tested two sets of features: the selective set and the brute force set.
2. We applied a procedure called transformation, which consists in replacing the basic emotion labels with values corresponding to linguistic labels. For example, if an instance is labeled as anger, that label is replaced by three labels, one for each primitive:
• Low for Valence
• High for Activation
• High for Dominance
This labeling was done according to Table 9. Since this database has a manual annotation of Emotion Primitives, those annotations were used to build the table. We obtained the means of the annotated values per class from the annotations of the instances filtered according to the process described in Section 4.
3. After extracting the acoustic features and performing the transformation step, we had a data set consisting of acoustic feature vectors whose values to predict are the Emotion Primitives Valence, Activation and Dominance; the possible values that
Table 9
Primitive values calculated according to annotations made by IEMOCAP corpus evaluators. The values in italic are considered low and values in bold are considered high.

Basic emotion  Valence  Activation  Dominance
Anger 2 4 3.5
Happiness 4.42 3.35 2.58
Neutral 2.75 3 2.5
Sadness 2 2 1.75
these primitives can take are high or low. With this information, three models are trained to estimate the values of each Emotion Primitive. To build these models we used Support Vector Machines.
4. To perform the classification of basic emotions, we extracted acoustic features and applied the three models obtained in the previous step. In this way we get a high or low value for each of the instances. With these three values we constructed vectors whose attributes are the three primitives and whose value to predict is the basic emotion of the original instance.
5. Finally, we applied a machine learning algorithm (SVM) to construct a model to assign an emotion to each instance.
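The transformation step (step 2) and the reverse mapping of step 4 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the threshold of 3 used to binarize the Table 9 means into low/high is our assumption (the paper marks low and high typographically in Table 9), and the direct lookup in `classify` stands in for the SVM of step 5:

```python
# Per-emotion means of the annotated Emotion Primitives, from Table 9.
PRIMITIVE_MEANS = {            # (Valence, Activation, Dominance)
    "anger":     (2.00, 4.00, 3.50),
    "happiness": (4.42, 3.35, 2.58),
    "neutral":   (2.75, 3.00, 2.50),
    "sadness":   (2.00, 2.00, 1.75),
}

def transform(emotion, threshold=3.0):
    """Step 2: replace a basic-emotion label with three high/low labels.
    The threshold value is an assumption made for this sketch."""
    return tuple("high" if v > threshold else "low"
                 for v in PRIMITIVE_MEANS[emotion])

def classify(triple):
    """Step 4, simplified: map a predicted (V, A, D) high/low triple back
    to a basic emotion by lookup. With only 8 possible triples, collisions
    are possible; the paper resolves this with an SVM instead."""
    for emotion in PRIMITIVE_MEANS:
        if transform(emotion) == triple:
            return emotion
    return None

# The paper's own example: anger -> Low Valence, High Activation, High Dominance.
print(transform("anger"))  # → ('low', 'high', 'high')
```

With this threshold, neutral and sadness collapse onto the same low/low/low triple, which illustrates why the final mapping in step 5 is learned from data rather than done by table lookup.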
8. Basic emotion classification results
Table 10 shows the results when classifying Emotion Primitives according to high and low classes, as described in step 3 of the classification process in Section 7.
Table 11 shows a comparison of the results obtained by applying the process described in Fig. 4, using a discrete approach for
Table 10
Accuracy in classifying Emotion Primitives by high and low classes.

Emotion Primitive  Accuracy (%)
Valence            76.808
Activation         83.5411
Dominance          88.0299
Table 11
Recall when classifying basic emotions from Emotion Primitives.

Recall (%)  Anger  Happiness  Neutral  Sadness
Continuous  85.82  54.93      79.17    42.11
Discrete    74.92  43.52      75.96    67.10
Baseline    69.68  21.01      35.23    76.84
the classification of emotions, and the baseline results for the corpus with which we are working, reported in [37]. We can see that our method achieved a better recall (number of correctly classified items divided by the total number of items) on three of the four emotions: anger, happiness and neutral, whereas sadness has a lower performance. These results show that by estimating Emotion Primitives we can discriminate with adequate accuracy prototypical emotions such as anger, happiness and sadness, in addition to the neutral state, and in some cases even improve the classification that can be done directly by training basic emotion models from acoustic features. However, it is very important to emphasize that by using Emotion Primitives we can generalize the detection of emotional states in a much broader sense, because we can define different areas in the three-dimensional space depending on the emotional states of interest according to the application and the emotional states that are required to be discriminated. For example, an application might be interested in emotional states close to anger, or emotional states with high Activation, or negative Valence, or any combination of the three primitives.
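The per-class recall used in Table 11 can be computed as defined in the text; the function and the toy label lists below are illustrative, not IEMOCAP data:

```python
# Per-class recall: the number of correctly classified items of a class
# divided by the total number of items of that class.
def recall_per_class(y_true, y_pred):
    out = {}
    for c in set(y_true):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        out[c] = correct / total
    return out

# Invented toy labels, just to exercise the definition.
y_true = ["anger", "anger", "sadness", "neutral", "anger", "sadness"]
y_pred = ["anger", "neutral", "sadness", "neutral", "anger", "anger"]
print(recall_per_class(y_true, y_pred))
```

Because recall is normalized per class, it is a more informative summary than overall accuracy for a corpus like this one, where the four emotion classes are not equally frequent.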
9. Conclusions
We carried out a study on the importance of different acoustic feature types from the point of view of a continuous three-dimensional emotion model. We analyzed each Emotion Primitive separately. Through the identification of the best features, the automatic estimation of Emotion Primitives becomes more accurate, and thus the recognition and classification of people's emotional state improves. To our knowledge, the importance of acoustic features has not been studied from this approach. We have taken some ideas used in the study of the impact of features in the classification of discrete emotional categories, like the share and portion metrics, and we have applied them to the continuous approach. We divided our set of 6920 features into 12 groups according to their acoustic properties. We calculated some metrics for each group in order to estimate their performance in the automatic estimation of Emotion Primitives (correlation coefficient) and their contribution to the final set of features (share and portion). We worked with a database of acted emotions, labeled with the two most important annotation schemes, continuous and discrete. Despite being acted, this database was designed to be as little artificial as possible through the improvisation of dialogues.
We proposed two feature selection schemes, taking into account that we had many features but a very limited set of instances. We observed that both feature selection schemes obtained very similar results and they generally agree on the importance of the feature groups.
The main contribution of this paper is the analysis of acoustic features. This analysis was based mainly on the correlation obtained by the models generated from a machine learning process (SVM). We realized that spectral feature groups are very important for the three primitives. According to our results, the most important feature groups for each Emotion Primitive are:
• Valence: MEL – MFCC – cochleagrams – LPC
• Activation: MFCC – cochleagrams – energy – LPC
• Dominance: MFCC – cochleagrams – LPC – MEL
Clearly the MFCC, LPC and cochleagram groups are very important for estimating the three Emotion Primitives, since they appear among the most important for all three. These three groups belong to the spectral information category; we can conclude from this fact that spectral analysis is more important than prosodic and voice quality analysis for Emotion Primitive estimation, except for Activation, where energy is also very important. The cochleagram group is an interesting finding because, to our knowledge, it has not been used before for emotion recognition.
We observed similar correlation results for the selective and brute force feature extraction approaches when the selected features belonging to these sets were tested separately, although the brute force set was much larger than the selective set. The portion for the selective approach was always much higher than the portion for brute force; this tells us that the selective set has fewer features, but more important ones. The share was more balanced between both approaches, tending to be higher for brute force.
We proposed a way of mapping Emotion Primitives into basic emotions, and we performed a basic emotion classification to show the feasibility of applying the continuous approach. The instance/attribute selection process based on Emotion Primitives has been successful in the discrimination of emotions, reflected in the classification of basic emotions, raising the recall for three out of the four emotions studied in this work.
Interesting results have been obtained in this paper. However, a limitation is that the experiments were done on an acted speech database. It is known that there are major differences between working with acted and real data. Another limitation is the fact that the recordings we used belong to only two different people. The next task will be to validate our findings with the extended version of the IEMOCAP database and with other databases, including different languages and different recording conditions. We also plan to refine the mapping and classification scheme, trying to take better advantage of the continuous emotion model.
Acknowledgements
The authors wish to express their gratitude for the support given to carry out this research to the National Council of Science and Technology of Mexico, through postgraduate scholarship 49296 and project 106013.
References
[1] C. Vera-Muñoz, L. Pastor-Sanz, G. Fico, M. Arredondo, A Wearable EMG Monitoring System for Emotions Assessment, first ed., Springer Publishing Company, Incorporated, 2008.
[2] G. González, Bilingual computer-assisted psychological assessment: an innovative approach for screening depression in chicanos/latinos, Tech. rep., University of Michigan, 1999.
[3] F. Nasoz, K. Alvarez, L. Lisetti, N. Finkelstein, Emotion recognition from physiological signals using wireless sensors for presence technologies, Cognition, Technology & Work 6 (2004) 4–14, URL: http://portal.acm.org/citation.cfm?id=1008243.1008250.
[4] L. Vidrascu, L. Devillers, Real-life emotion representation and detection in call centers data, in: ACII, 2005, pp. 739–746.
[5] M.S. Hussain, R.A. Calvo, A framework for multimodal affect recognition, in: HCSNet Perception and Action Workshop: Tools and Techniques for Conducting EEG and MEG Experiments, 2009, pp. 1–8.
[6] M. Murugappan, M. Rizon, R. Nagarajan, S. Yaacob, D. Hazry, I. Zunaidi, 4th Kuala Lumpur International Conference on Biomedical Engineering, Berlin, Heidelberg, 2008.
[7] H. Schlosberg, Three dimensions of emotion, Psychological Review 61 (2) (1954) 81–88.
[8] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. Chang, S. Lee, S.S. Narayanan, IEMOCAP: Interactive Emotional Dyadic Motion Capture Database, Journal of Language Resources and Evaluation 42 (4) (2008) 335–359.
[9] M. Lugger, B. Yang, Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Institute of Electrical and Electronics Engineers, 2008, pp. 4945–4948.
[10] M. Wollmer, F. Eyben, B. Schuller, E. Douglas-Cowie, R. Cowie, Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks, in: Interspeech 2009, International Speech Communication Association, 2009, pp. 1595–1598.
[11] F. Eyben, M. Wollmer, A. Graves, B. Schuller, E. Douglas-Cowie, R. Cowie, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues, Journal on Multimodal User Interfaces 3 (1–2) (2010) 7–19.
[12] B. Xie, L. Chen, G. Chen, C. Chen, Statistical Feature Selection for Mandarin Speech Emotion Recognition, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2005.
[13] M. Lugger, B. Yang, An incremental analysis of different feature groups in speaker independent emotion recognition, in: Proceedings of the International Conference on Phonetic Sciences, 2007, pp. 2149–2152.
[14] A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, L. Kessous, N. Amir, Whodunnit – searching for the most important feature types signalling emotion-related user states in speech, Computer Speech and Language 25 (1) (2010) 4–28.
[15] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, first ed., Logos Verlag, 2009, URL: http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2009/Steidl09-ACO.pdf.
[16] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, B. Weiss, A database of German emotional speech, in: Interspeech 2005, International Speech Communication Association, 2005, pp. 1517–1520.
[17] J. Montero, Estrategias para la mejora de la naturalidad y la incorporación de variedad emocional a la conversión texto a voz en castellano, Ph.D. thesis, Universidad Politécnica de Madrid, 2003.
[18] M. Grimm, K. Kroschel, S. Narayanan, The Vera am Mittag German audio–visual emotional speech database, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2008), 2008, pp. 865–868.
[19] P. Boersma, Praat, a system for doing phonetics by computer, Glot International 5 (2001) 341–345.
[20] F. Eyben, M. Wollmer, B. Schuller, openEAR – introducing the Munich open-source emotion and affect recognition toolkit, in: Proceedings of the 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction, 2009, pp. 1–6.
[21] H. Pérez-Espinosa, C.A. Reyes-García, Detection of negative emotional state in speech with ANFIS and genetic algorithms, in: Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2009), Firenze University Press, 2009, pp. 25–28.
[22] H. Pérez-Espinosa, C.A. Reyes-García, L. Villaseñor-Pineda, Features selection for primitives estimation on emotional speech, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Institute of Electrical and Electronics Engineers, Dallas, Texas, 2010, pp. 5138–5141.
[23] K. Santiago, C.A. Reyes-García, M. Gomez, Conjuntos difusos tipo 2 aplicados a la comparación difusa de patrones para clasificación de llanto de infantes con riesgo neurológico, Master's thesis, INAOE, Tonantzintla, Puebla, México, 2009.
[24] A.L. Reyes, Un método para la identificación del lenguaje hablado utilizando información suprasegmental, Ph.D. thesis, INAOE, Tonantzintla, Puebla, México, 2007.
[25] T. Dubuisson, T. Dutoit, B. Gosselin, M. Remacle, On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination, EURASIP Journal on Advances in Signal Processing, Analysis and Signal Processing of Oesophageal and Pathological Voices, doi:10.1155/2009/173967.
[26] C.T. Ishi, H. Ishiguro, N. Hagita, Proposal of acoustic measures for automatic detection of vocal fry, in: Interspeech 2005, International Speech Communication Association, 2005, pp. 481–484.
[27] R. Kehrein, The prosody of authentic emotions, in: Speech Prosody 2002, International Conference, 2002, pp. 423–426.
[28] M. Lugger, B. Yang, Classification of different speaking groups by means of voice quality parameters, ITG-Fachtagung Sprach-Kommunikation.
[29] B.F. Nuñes, Evaluación perceptual de la disfonía: correlación con los parámetros acústicos y fiabilidad, Acta Otorrinolaringológica Española: Órgano oficial de la Sociedad Española de Otorrinolaringología y Patología Cérvico-Facial 55 (6) (2004) 282–287.
[30] A.B. Kandali, A. Routray, T.K. Basu, Vocal emotion recognition in five native languages of Assam using new wavelet features, International Journal of Speech Technology 12 (1) (2009) 1–13.
[31] W. Byrne, J. Robinson, S. Shamma, The auditory processing and recognition of speech, in: Proceedings of the Workshop on Speech and Natural Language, HLT '89, Association for Computational Linguistics, Stroudsburg, PA, USA, 1989, pp. 325–331, URL: http://dx.doi.org/10.3115/1075434.1075490.
[32] T. Zbynik, J. Psutka, Speech production based on the mel-frequency cepstral coefficients, in: Eurospeech 1999, International Speech Communication Association, 1999, pp. 2335–2338.
[33] P. Dumouchel, N. Dehak, R. Attabi, Y.B.N. Dehak, Cepstral and long-term features for emotion recognition, in: Interspeech 2009, International Speech Communication Association, 2009.
[34] E. Bozkurt, Improving automatic emotion recognition from speech signals, in: Interspeech 2009, International Speech Communication Association, 2009.
[35] B. Vlasenko, B. Schuller, A. Wendemuth, G. Rigoll, Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing, in: A. Paiva, R. Prada, R. Picard (Eds.), Affective Computing and Intelligent Interaction, Lecture Notes in Computer Science, vol. 4738, Springer, Berlin/Heidelberg, 2007, pp. 139–147.
[36] O. Fujimura, K. Honda, H. Kawahara, Y. Konparu, M. Morise, J.C. Williams, Noh voice quality, Logopedics Phoniatrics Vocology 34 (4) (2009) 157–170.
[37] A. Metallinou, S. Lee, S.S. Narayanan, Decision level combination of multiple modalities for recognition and analysis of emotional expression, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Institute of Electrical and Electronics Engineers, 2010, pp. 2462–2465.
[38] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (The Morgan Kaufmann Series in Data Management Systems), first ed., Morgan Kaufmann, 1999, URL: http://www.worldcat.org/isbn/1558605525.
[39] P. Pudil, J. Novovičová, J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters 15 (1994) 1119–1125.
[40] M. Gutlein, E. Frank, M. Hall, A. Karwath, Large-scale attribute selection using wrappers, in: IEEE Symposium on Computational Intelligence and Data Mining 2009, Institute of Electrical and Electronics Engineers, 2009, pp. 332–339.