ContentslistsavailableatScienceDirect
Computer Methods and Programs in Biomedicine
journalhomepage:www.elsevier.com/locate/cmpb
Voiceprint and machine learning models for early detection of bulbar dysfunction in ALS R
Alberto Tena
a,b,∗, Francesc Clarià
b, Francesc Solsona
b, Mónica Povedano
ca CIMNE. Building C1, North Campus, UPC. Gran Capità, Barcelona 08034, Spain
b Department of Computer Science and Industrial Engineering, University of Lleida, Lleida, 25001 Spain
c Neurology Department, Hospital Universitari de Bellvitge, L’Hospitalet, Barcelona, Spain
a rt i c l e i nf o
Article history:
Received 6 April 2022 Revised 2 November 2022 Accepted 12 December 2022
Keywords:
ALS
Bulbar dysfunction Voice
Diagnosis Machine learning
a b s t r a c t
Backgroundand Objective:Bulbar dysfunctionis aterm usedinamyotrophiclateral sclerosis(ALS).It referstomotorneurondisabilityinthecorticobulbarareaofthebrainstemwhichleadstoadysfunction ofspeechandswallowing.Oneoftheearliestsymptomsofbulbardysfunctionisvoicedeteriorationchar- acterized bygrosslydefectivearticulation,extremelyslow laboriousspeech,markedhypernasality and severeharshness.Recently,researchefforts havefocusedonvoiceanalysistocapturethisdysfunction.
Themainaimofthispaperistoprovideanewmethodologytodiagnosethisdysfunctionautomatically atearlystagesofthedisease,earlierthanclinicianscando.
Methods:Thestudyfocusedonthecreationofavoiceprintconsistingofapatterngeneratedfromthe quasi-periodiccomponents ofasteadyportionofthefiveSpanishvowels andthe computationofthe fiveprincipalandindependentcomponentsofthispattern.Then,asetofstatisticallysignificantfeatures wasobtainedusingmultivariateanalysisofvarianceandtheoutcomesofthemostcommonsupervised classificationmodelswereobtained.
Results:Thebestmodel(randomforest)obtainedanaccuracy,sensitivityandspecificityof88.3%,85.0%
and 95.0%respectivelywhenclassifyingbulbarvs.control participantsbuttheresultsworsenedwhen classifyingbulbarvs.no-bulbarpatients (accuracy,sensitivityandspecificity of78.7%, 80.0%and77.5%
respectivelyforsupportvectormachines).Duetothegreatuncertaintyfoundintheannotatedcorpusof theALSpatientswithoutbulbarinvolvement,weusedasafesemi-supervisedsupportvectormachineto relabeltheALSparticipantsdiagnosedwithoutbulbarinvolvementasbulbarandno-bulbar.Theperfor- manceoftheresultsobtainedincreased,especiallywhenclassifyingbulbarandno-bulbarpatientsob- taininganaccuracy,sensitivityandspecificityof91.0%,83.3%and100.0%respectivelyforsupportvector machines.Thisdemonstratesthatourmodelcanimprovethediagnosisofbulbardysfunctioncompared notonlywithclinicians,butalsothemethodspublishedtodate.
Conclusions:Theresultsobtaineddemonstratetheefficiencyandapplicabilityofthemethodologypre- sentedinthispaper. It mayleadtothe developmentofacheap andeasy-to-usetool toidentifythis dysfunctioninearlystagesofthediseaseandmonitorprogress.
© 2022TheAuthors.PublishedbyElsevierB.V.
ThisisanopenaccessarticleundertheCCBYlicense(http://creativecommons.org/licenses/by/4.0/)
R This work was supported by the Intelligent Energy Europe (IEE) programme and the Ministerio de Economía y Competitividad under contract TIN2017-84553-C2-2- R. FS is member of the research group 2017-SGR363, funded by the Generalitat de Catalunya.
∗Corresponding author.
E-mail addresses: [email protected] (A. Tena), [email protected] (F. Clarià), [email protected] (F. Solsona), [email protected] (M. Povedano) .
1. Introduction
Amyotrophiclateral sclerosis(ALS)is aneurodegenerative dis- easecharacterized by a progressivelossof bothupperandlower motor neuronsleading to muscular atrophy, paralysis and death.
Currently, there is no cure for ALS, although early detection can slowprogress[1].
ALSisknown asspinal (80%;limborspinalonset) andbulbar (20%;bulbaronset).Thefirstbulbarsymptomsappearearlyinthe diseaseinbulbarALS,butcanalsoappearinlaterstagesofspinal
https://doi.org/10.1016/j.cmpb.2022.107309
0169-2607/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )
ALS.Earlydetectionofbulbardysfunctionmaybethekeytoeffec- tively increasingpatients’ survival.However,diagnosing voicedis- turbances could be challenging due to the limitations of human hearing[2].
Several studies demonstrated that the voice is one of the most important aspects for detecting bulbar dysfunction. Norel etal.[3]developedmachine modelsforrecognizingthepresence and severity of ALS using a variety of frequency, spectral, and voicequalityfeatures.Anetal.[4]usedConvolutionalNeuralNet- works (CNNs) to classify the intelligible speech produced by pa- tients with ALS compared with healthy individuals. Wang et al.
[5] usedSupport VectorMachines(SVM) andNeuronalNetworks (NN) employingacousticfeatures andaddingarticulatory motion information(fromthetongueandlips).Suhas etal.[6]usedSVM andDeepNeuronalNetworks(DNNs)basedonmelfrequencycep- stralcoefficients(MFCCs)tocapturethisdysfunction.Recentstud- ies [7–9] demonstrated the feasibility of automatic detection of bulbardysfunctionthroughphonatory obtainedfromvowelutter- ance evenbefore itbecomesperceptibletohumanhearing.Great uncertainty in the annotated corpus ofthe ALS patientswithout bulbarinvolvementwasfound.In amorerecentstudy[10]time- frequency features were added improving the results. Although all methodsperformedwell ingeneral,thisperformance dropped significantly whendiagnosing bulbarinvolvementamongALS pa- tients. The aforementioned studyargued that themain causes of thisuncertaintywasasmallandwronglyannotatedcorpusofthe ALS patients withoutbulbarinvolvement. Thissuggeststhat sub- jectivemethodsemployed byclinicianscouldleadthemtomisdi- agnose thisdysfunction. This is coherent with the NEALS bulbar subcommittee,whichcallsforobjective-basedapproaches.
We conjecture that the diagnosisof ALS patientswith bulbar dysfunctionwouldgreatlybenefitfromthecreationofavoiceprint able to detect bulbar dysfunction in ALS before the first symp- tomscanbedetectedbyhumanhearing.Thiscouldbedoneeffec- tively by meansofanalyzing apatterngeneratedfromthequasi- periodic waveform produced by the vocalfolds when a vowelis elicited.Quasi-periodicwaveformanalysishasbeenappliedtosev- eral clinical applications such as heartbeat detection, cardiopul- monarymodelingandintrinsicbrainactivitydetection[11,12].Fur- thermore, performance could be increased by correctingthe bias aswell asenlargingthecorpusupsamplingit[13],andrelabeling bulbarandnon-bulbarALSpatientsbyusingsemi-supervisedclas- sifiers,aspointedoutin[14,15].
Ourobjective(andcontribution)istocreateamachine-learning model obtained by applying supervised andunsupervised classi- fiersandupsamplingtoimprovethecorpusfordiagnosingbulbar dysfunction.Thiswillbedonethroughthecreationofavoiceprint consistingof apatterngeneratedfromthe quasi-periodiccompo- nentsofasteadyportionofthefiveSpanish vowels,andthe five principalandindependentcomponentsofthispattern.Thismodel should behave properly with small and usually badly annotated corpus, the kind associated with rare diseases (i.e. ALS without bulbarinvolvement).
2. Methods
Themethodspresentedinthissectionwereimplementedanda syntheticdatasetbasedonarandomsampleofthecorpusisfreely availableonline[16].
2.1. Participants
Forty-fiveALSparticipants(26malesand19females)agedfrom 37 to 84 (M= 57.8 years, SD= 11.8 years) and 18 control sub- jects(9malesand9females)agedfrom21to68(M=45.2years, SD = 12.2years) took partinthisstudy. Allthe ALSparticipants
werediagnosedbyaneurologistandtoparticipateinthestudyit wasrequired that they were nativeSpanish speakers. No restric- tionwasestablishedrelatedtotheparticipantsbeingbilingualsas nativespeakersofother languages,such asCatalan,whichisvery commonintheCataloniaregionwherethestudyoccurred.
Bulbardysfunctionwasdiagnosedbyfollowingsubjectiveclin- ical approaches [17] and the neurologist made the diagnosis of whetheranALSpatienthadbulbardysfunction.AmongalltheALS participants,5reportedbulbaronsetand40spinalonset.However, atthetimeofthestudy,14ofthempresentedbulbarsymptoms.
Tosummarize,14ofthe63participantswereALSpatientsdiag- nosedwithbulbardysfunction(3malesand11females)agedfrom 38to84(M=56.8years,SD=12.3),31wereALSpatientsthatdid not presentthisdysfunction (23males and8females) agedfrom 37 to 81 (M = 58.3 years, SD= 11.7)and 18 were control sub- jects(9malesand9females)agedfrom21to68(M=45.2years, SD=12.2years).
2.2. Vowelrecording
SustainedsamplesoftheSpanishvowels,a,e,i, oandu,were elicitedundermedium vocalloudnessconditionsfor3-4seconds.
TherecordingsweremadeinaregularhospitalroomusinganUSB EMITAStreamingGXT252microphoneconnectedtoalaptopata samplingrateof44,100Hz.32-bitquantizationwasdoneusingAu- dicity, an open-source application. Eachindividual phonationwas cut out and anonymously labeled. The boundaries of the speech segmentsweredeterminedwithanoscillogramandaspectrogram usingthePraat manual[18],andwere audiblychecked.Thestart- ing point of the boundaries was established at the onset of the periodicenergyinthewaveform observedintheoscillogramand checked by the appearance of the formants in the spectrogram.
The endpoint was established at the endof the periodic oscilla- tionwhenamarkeddecreaseinamplitudeintheperiodicenergy wasobserved. It wasalso identified by the disappearance of the waveformintheoscillogramandtheformantsinthespectrogram.
2.3. Generatingthepatternofthequasi-periodiccomponentsofthe fiveSpanishvowels
Asample of250 ms ofeach vowel wasconsidered for analy- sisbytakingthemiddlepointatthecenterofthephonation.This fragment ofthe signal wasnormalized by centeringeach sample tohaveameanof0andscaledtohaveastandard deviationof1 (x(n)). Apatterngeneratorwasdevelopedtoobtaina patternse- quence ofthequasi-periodiccomponents ofthefundamentalfre- quencyofx(n)inspiredby[19].Thisprocessconsistedof3steps.
2.3.1. Detrendingmethod
Thebaselinewanderingofx(n),whichisalow-frequencyarte- factpresentinsignalrecordings,wasremovedbyimplementinga detrendingmethod.Toobtainthetrend,asix-orderlow-passBut- terworth filterwitha cutoff frequency of 0.0035Hzwas applied twice (forward and backward) to x(n). The combined filter had zerophasedistortion,afiltertransferfunctionequaltothesquared magnitudeoftheimplementedButterworthfiltertransferfunction, andafilterorderthatwasdoubletheorderoftheButterworthfil- ter.Then, the detrending signal xd(n) wasobtainedby removing thetrendfromx(n).Finally,eachsampleofxd(n)wascenteredto haveameanof0.Figure1ashowsx(n)andthetrendofx(n)and Fig.1bshowsx(n)andxd(n).
2.3.2. Markingthequasi-periodiccomponentsofx(n)
The spectral density
|
Xd(f)|
2 of xd(n) (Fig. 2) was obtained by means of the discrete Fourier transform (DFT) implementingFig. 1. Detrending method: Removing the trend from x (n ) .
Fig. 2. The spectral density of x d(n ) .
the fast Fouriertransform (FFT)algorithm. To identify thequasi- periods, the samples of
|
Xd(f)|
2 whose frequency was≥ 300 Hz were considered to identifythe peaksof the spectral density. To avoid noise, only the three highest peaks were selected. Finally, the quasi-periodofxd(n)wasdefinedasthe lowerspectral com- ponent ofthesethreepeaks (fr). The numberofsamplesofeach quasi-period(nrep)wascalculatedasthenearestintegerof(fs/ fr),fsbeingtherecordingsamplingrate.
The signal envelop xe(n) was obtainedby computingthe cu- mulative sum of xd(n) and then calculating the envelopeof the analyticalsignal.Figure3showsxe(n)andx(n).
Todetectthestartingandendingpointofeach quasi-period,a quasi-sinusoidalsignal,s(n),synchronizedwiththeperiodofx(n) wascomputed.Itwasobtainedbyapplyingasecond-orderButter- worth pass-bandfilterforwardandbackwardtoxe(n)withacut- off frequency fc= fs/nrep Hz. From s(n), a quadratic-bipolar sig- nal(q(n))wasgeneratedassigningaconstant-Ainthosesamples wheres(n)<0,andAinthosewheres(n)>0.Thus,bydifferenti- atingq(n),thezerocrossingsofthesynchronizedsignals(n)were obtained,whichrepresentthebeginningandendofeachquasipe- riod ofx(n).Figure4illustrates thisprocess.Figure4a represents
Fig. 3. The signal envelope of x (n ) .
x(n), s(n) and q(n) and Fig. 4b depicts the starting and ending pointsdetectedofeachquasi-periodofx(n).
Finally,thepatternfunction p(T)wasobtainedastheaverage ofthequasi-periodsofx(n),T beingtheaverageofthenumberof samplesofthequasi-periodofx(n).
2.3.3. Patternrefinement
p(T) wascompared with xd(n) to improve the boundariesof eachquasi-period.First,asanadaptedfilter,thepatternp(T)was invertedandtheresultingsignalwasconvolvedwithxd(n)tode- tectthepositions of p(T) inxd(n).The positivevalues ofthere- sultingsignalweretakenandthenegativevaluesweresetat0.
Eachquasi-perioddetectedpreviously wascenteredinthepo- sitionwherethemaximumsvaluesoftheconvolutionwerefound.
The refined pattern, pre f(T) (Fig. 5a)was computedas theaver- ageofthequasi-periodsofxd(n)withtheirnewboundariesestab- lished.
Finally, pre f(T)wasnormalizedto550samplesandthendeci- matedto110samplestoobtainpatterns,pN(T)(Fig.5b),withthe samelength.
Fig. 4. Detecting the starting and ending point of each quasi-period of x (n ) .
Fig. 5. Pattern refinement and normalization.
2.4. Principalandindependentcomponentanalysis
Principal Component Analysis (PCA) andIndependent Compo- nentAnalysis (ICA)havegreatpotentialinthetreatmentofmedi- calsignals[20].PCAisaclassicaltechniqueinstatisticaldataanal- ysis, feature extraction and data reduction, aiming at explaining observed signals asa linear combination of orthogonal principal components.ICAisatechniqueforarrayprocessinganddataanal- ysis, aimingatrecovering unobservedsignalsfromobservedmix- tures,exploitingonlytheassumptionofmutualindependencebe- tweenthesignals.
InthePCAofpN(T),fivePrincipalComponents(PCs)werecom- puted. The decompositionwasobtained asX=USV where X is pN(T) standardized, U is a unitary matrix and S is the diagonal matrixofsingularvaluessi.PCsweregivenbyUS andV contained thedirections inthisspacethat capturethemaximal variance of thematrixX.
In the ICAof pN(T), five independent components (ICs) were extracted by means of a reconstruction independent component analysis (RICA) algorithm [21]. pN(T) was standardized to have zeromeanandidentityco-variance.Themodelx=
μ
+Asismade up of the five rowsof matrix x representingthe patterns of thefive vowels with 110 samples for each pattern.
μ
is a constantrepresented bya columnvector offiverowsandsis amatrix in which eachrow(5)is an independentcomponentwith110sam- ples. A(5x5) isthe mixing matrix.Once the modelhas been ob- tained,thefiveindependentcomponentscomputedwereusedfor theanalysis.
2.5. Featuresobtainedforanalysis
Atotalof70featureswereobtainedasfollow:
• Threeentropymeasurements pereachvowelwereobtainedby meansoftheShannonentropy:
• Fromtheprobabilitydensityfunctionofasignal,formedby thepattern pN(T) repeatedthe numberofperiods of x(n) andquantified withN=2q levelsandq=8.Thismeasure- mentwascodedasentPat1...entPat5.
• The density function of the five PCs was normalized and quantifiedwithN=2q levelsandq=6.Then,theShannon entropy ofthe probability density function of thefive PCs wasobtainedandcodedasentPC1...entPC5.
• Similarly, to compute the Shannon entropyof the fiveICs, the density function of the five ICs was normalized and quantified with N=2q levels and q=6 and the Shannon entropyofeachICwasobtained.Theresultswerecodedas entIC1...entIC5.
• ThevarianceofasignalformedbythepatternpN(T)repeated the number of periods of x(n) wascomputed and coded for eachvowelasvar1...var5.
• TheKurtosisisdefinedasameasureofoutlier-prone.Itiscal- culatedfromthedistributionofasignalformedbythepattern pN(T)repeating the numberof periodsof x(n),andcoded as kurt1...kurt5. The bias-corrected equation definedin [21] was appliedtoobtaintheKurtosis.
• The rhythmvariability,RR(n),ofx(n)wascomputedby firstly calculating the differences (in seconds) between the quasi- periodic elementsofx(n)anddividingthe resultby thesam- plingfrequency(44.100Hz).Finally,RR(n)wasobtainedbyre- ducing the sampling to f r=350. Thus, RR(n) is a signal re- sampledto f r=350(Tr=0.0029 s)withabandwidth of175 Hz.Themeanandthestandard deviationofRR(mean_RRand std_RR) were then computed and coded for each vowel as medRR1...medRR5anddsvRR1...dsvRR5respectively.
• The spectrum of P_N(f) was obtained from the positive and normalizedpartoftheFFToftheautocorrelationofpN(T).The meanfrequencyofP_N(f)(fmEsPat)wascomputedaccording toEq.(1)inthefrequencyband0Hzto2205Hzandcodedfor eachvowelasfmEsPat1...fmEsPat5.
fmEsPat=
fP_N
(
f)
df (1)• Similarly, themean frequencyof theprobability densityfunc- tion of the five PCs was computed and coded as fmE- sPC1...fmEsPC5.
• Finally,theaveragespectralenergywascalculatedastheinte- gralofP_N(f).Theaveragespectralenergywascomputedand normalizedto1for5frequencybandsofthetotalspectrum(0- 4,410Hz):1,0-250Hz;2,250-750Hz;3,750-1500Hz;4,1500- 2500 Hz; 5,2500-4,410Hz. These measurements forthe five patternsandthefivebandsofeachpatternwerecodedforeach vowelasenBnEs_a1...enBnEs_a5andenBnEs_u1...enBnEs_u1,re- spectively.
2.6. Classificationmodels
FivesupervisedclassificationmodelswereimplementedinRto measuretheclassificationperformance.ThesemodelsareRandom
Forest (RF), Logistic Regression (LR), Linear DiscriminantAnalysis (LDA),NeuralNetworks(NN) andSupportVectorMachine (SVM).
The classification models were fitted with the features selected.
Thesewere standardizedbysubtractingthemeanandcenteredat 0.10-fold cross-validation wasimplemented inR using the caret package [22] to draw suitable conclusions. The upsamplingtech- niquewithreplacementwasapplied tothetrainingdataby mak- ing the group distributions equal to deal with the unbalanced dataset that could bias the classification models [13]. Supervised modelswithclassificationthresholdsof50%werebuilt.Theclassi- ficationthresholdisavaluethatconvertstheresultofaquantita- tivetestintoasimplebinarydecisionbytreatingthevaluesabove orequaltothethresholdaspositive,andthosebelowasnegative.
Inaddition,thesemi-supervisedclassificationmodelS4VMwas implementedusingtheRSSLpackage[23].S4VMreturnspredicted labelsforunlabeledinstances.Itrandomlygeneratesmultiplelow- densityseparators andmerges their predictionsby solving a lin- earprogrammingproblemmeanttopenalizethecostofdecreasing theperformanceoftheclassifier,comparedtothesupervisedSVM [24].As forSVM,a linearkernelwasused,andtheregularization parameterCforlabeledandunlabeleddatawassetat0.05.
2.7. Featureselection
Toselectasubset ofrelevantfeatures foruseinthe construc- tionoftheclassification model,theMultivariantAnalysisofVari- ance (MANOVA),whichuses thecovariancebetweenthefeatures intestingthestatistical significanceofthemean differences,was implementedinIBMSPSSStatistics.Byusingthisprocedure,itwas possibletocontrastthenullhypothesisinthefeaturesobtained.
Toperformthisstatisticalanalysis,itwasassumedthatthefea- tureshada multivariablenormaldistributionandnoassumptions weremaderegardingthehomogeneityofthevariance orthecor- relationbetweenthefeatures.Asignificancevalueofp<0.05was consideredsufficienttoassumetheexistenceoffeaturedifferences betweenthefourgroupsanalyzed.
2.8. Experiments
The participants in this study belonged to three different groups: thecontrol group with18 participants, labeled asC, the groupwith14ALSparticipantsdiagnosedwithbulbardysfunction, labeled B,andthe group with31 ALS participants not diagnosed withbulbardysfunction,labeled NB. Inaddition,the Alabelwas added to every ALS participant, withor withoutbulbar dysfunc- tion.
Threeexperimentswereperformedwiththesegroups:
1. Performance evaluation of the supervised models for 4 cases (Cvs.B, Cvs. NB, Bvs. NBandC vs. A)by usingthe original corpus.
2. Re-labelingoftheNBparticipantsasB’andC’byapplyingthe semi-supervisedS4VM algorithm. Thus, theNB group wasre- moved.
3. Re-evaluationofthemodelperformance withfournewgroups ofparticipants:Cvs.B+(B+B’),Cvs.NB-(NB-B’),B+vs.NB- andC+(C+C’)vs.B+.
Thefirstexperimentobtainedtheoutcomesofthemodelsus- ingtheoriginalcorpus.Next,duetothegreatuncertaintyfoundin the ALS participantsdiagnosed without bulbarinvolvement [2,9], itwasintended tore-labelthe participantsofthe NBgroup asB andCusingthesemi-supervisedS4VMmodel.Wetriedtoobtaina newcorpusthatcontainselementsclassifiedasbulbar(B’)orcon- trol(C’)byS4VMamongthosewhowerepreviouslydiagnosedas non-bulbarbyaclinician(NB).Inthethirdexperiment,themodels outcomeswereagainobtainedbychangingthecompositionofthe
BandNBgroupsbyaddingB’toB(B+)andremovingC’fromNB (NB-).
2.9. Performancemetrics
Thereareseveralmetricsforevaluatingclassificationalgorithms [25]. The Accuracy, Sensitivity and Specificity metrics, the most popularones,wereusedtoevaluatetheperformanceoftheclassi- ficationmodels.
3. Results
Firstly, thevoiceprintrepresentationsandthefeaturesselected inrelationtothefourcases(CvsB,CvsNB,BvsNBandCvsA) are presented.Then, theperformance ofthe classificationmodels isevaluated.
3.1. VoiceprintrepresentationstodetectbulbardysfunctioninALS
ThevoiceprintfordetectingbulbardysfunctioninALSconsisted ofthecomputationofpN(T),the5PCsof pN(T)andthe5ICsof pN(T)forthe5Spanishvowels.
Figure 6 shows the voiceprint computed of a ALS patient.
Figure 6(a)showsthe pN(T)ofthefiveSpanishVowels(a, e,i,o, u). Figure6(b)showsthe5PCsofpN(T)ofthefiveSpanishVow- els.Figure6(c)showsthe5ICsof pN(T)ofthefiveSpanishVow- els. Figure 6(d)showsthe spectrum of pN(T)of thefive Spanish Vowels.Figure 6(e)showsthespectrum ofthe5PCs of pN(T) of the fiveSpanish Vowels.Figure 6(f)showstheprobability density functionofthe5ICsofpN(T)ofthefiveSpanishVowels.
3.2. Featuresselected
Atotalof70featureswereobtained.TheMANOVAanalysiswas appliedtoselectthestatisticallysignificantfeatures(p-value<0.05) for the four comparisons analyzed: C vs. B, C vs. NB, B vs. NB and C vs. A. The features not showingstatistical significance (p- value≥0.05)were discarded.The boxplots ofthestatisticallysig- nificantfeaturesaredepictedinFig.7.
In the case C vs B, a set of 19 statistically significant fea- tures(p-value<0.05)wereobtained.TheseweremedRR2,medRR3, medRR5, fmEsPat1, enBnEs_a3, enBnEs_e4, enBnEs_e5, enBnEs_i5, enBnEs_o5, entPat2, entPat3, entPat4, entPC1, entPC2, entPC4, entPC5,entIC2,entIC3andentIC4.
In thecaseC vsNB,aset of2statisticallysignificant features wereobtained.ThesewereenBnEs_o4andenBnEs_o5.
In the case B vs NB, a set of 20 statistically significant fea- tures were obtained. These were medRR1, medRR2, medRR3, medRR4, enBnEs_e4,enBnEs_e5, enBnEs_i3, entPat1,entPat2,ent- Pat4, entPC1, entPC2, entPC3,entPC4, entPC5, entIC1,entIC2, en- tIC3,entIC4andentIC5.
In the case C vs A, a set of 3 statistically significant features wereobtained.ThesewerefmEsPat1,enBnEs_o4andenBnEs_o5.
3.3. Classificationmodelperformanceandexperiments
In the first experiment, the classification models were fitted withthe features selectedper each case.Table 1showsthe clas- sificationperformance (Accuracy,SensitivityandSpecificitymetrics) oftheclassification modelstestedforthefourcasesdefinedwith theoriginallabels(B,NBandC).
In thecaseCvs. B,the resultsindicatethat RFandSVM have a good classification performance. RF obtained thebest Accuracy, 88.3%, witha Sensitivityof 85.0% and a Specificity of95.0%. SVM obtained an Accuracy of 86.5% with a Sensitivity of 88.3% and a Specificityof85.0%.NN,LRandLDAshowedapoorerperformance,
Table 1
Model performance for the first experiment.
C vs B C vs NB B vs NB C vs A
RF Accuracy 88.3 48.7 66.7 68.6
Sensitivity 85.0 46.7 40.0 84.0
Specificity 95.0 50.0 75.8 30.0
LR Accuracy 56.7 62.0 60.2 65.0
Sensitivity 45.0 68.3 60.0 66.0
Specificity 65.0 50.0 59.2 65.0
LDA Accuracy 54.2 62.0 54.0 65.2
Sensitivity 45.0 68.3 60.0 68.5
Specificity 60.0 50.0 51.7 60.0
NN Accuracy 66.7 59.2 66.3 60.5
Sensitivity 70.0 63.3 60.0 64.0
Specificity 65.0 50.0 68.3 55.0
SVM Accuracy 86.5 68.0 78.7 73.1
Sensitivity 88.3 71.7 80.0 79.0
Specificity 85.0 60.0 77.5 60.0
Table 2
Model performance for the third experiment.
C vs B + C vs NB- B + vs NB- C + vs B+
RF Accuracy 93.5 69.3 89.7 92.4
Sensitivity 96.6 66.7 83.3 83.3
Specificity 90.0 70.0 96.7 97.5
LR Accuracy 56.0 69.2 71.7 71.4
Sensitivity 53.3 73.3 76.7 66.7
Specificity 60.0 60.0 65.0 75.0
LDA Accuracy 62.9 69.2 82.3 80.9
Sensitivity 60.0 73.3 76.7 78.3
Specificity 65.0 60.0 88.3 82.5
NN Accuracy 75.9 68.7 86.9 82.6
Sensitivity 80.0 80.0 83.3 83.3
Specificity 70.0 50.0 91.6 82.5
SVM Accuracy 89.2 69.6 91.0 92.1
Sensitivity 90.1 73.3 83.3 86.7
Specificity 90.0 60.0 100 95.0
obtaining respectiveAccuracies of66.7%,56.7%and 54.2% respec- tively.
InthecasesCvs.NB,Bvs.NBandCvs.A,poorerresultswere obtained.InallthesecasesSVMobtainedthebestAccuracy,these being68.0%,78.7%and73.1%respectively.
Inthesecondexperiment, S4VMwasappliedfromthedatala- beledCandBtoestimatetheclassofNBswhichweresplitintoC’
andB’.Fromthetotalof31NBs,9weresplitasB’and22asC’.
In the third experiment, the classification models were fitted withthefeaturesselectedandtestedforthefournewcases(Cvs.
B+,C vs. NB-,B+vs. NB-andC+ vs. B+).Table2 showstheclas- sificationperformance(Accuracy,SensitivityandSpecificitymetrics) forthisexperiment.
In the caseof C vs. B+, the results indicate that RF and SVM havegood classification performance. RF obtainedthe best Accu- racy, 93.5%,with a Sensitivityof 96.6% anda Specificity of 90.0%. SVMobtainedanAccuracyof89.2%withaSensitivityof90.1%and a Specificity of 90.0%. NN obtained an Accuracy of 75.9% with a Sensitivityof80.0%andaSpecificity of70.0%.LR andLDAshowed poorer performance, obtaining Accuracies of 56.0% and62.9% re- spectively.
IntheB+vs.NB-andC+vs.B+cases,goodmodelclassification performancewasalsoobserved.
In B+vs. NB-, SVM obtainedthe best Accuracy,91.0%,with a Sensitivityof83.3%andaSpecificityof100.0%.RFobtainedan Ac- curacyof89.7%withaSensitivityof83.3%andaSpecificityof96.7%. NNobtainedanAccuracyof86.9%withaSensitivityof83.3%anda Specificityof91.6%.LDAobtainedanAccuracyof82.3%withaSensi- tivityof76.7%andaSpecificityof88.3%.Finally,LR performedthe
Fig. 6. Voiceprint for a patient.
Fig. 7. Box plots of the statistically significant features per each case.
worst performance withan Accuracy,71.7%,a Sensitivityof 76.7%
andaSpecificityof65.0%.
InC+vs. B+,RF obtainedthebestAccuracy,92.4%,withaSen- sitivity of83.3%andaSpecificity of97.5%.SVMobtainedan Accu- racyof92.1%withaSensitivityof86.7%andaSpecificityof95.0%. NNobtainedanAccuracyof82.6%withaSensitivityof83.3%anda Specificityof82.5%.LDAobtainedanAccuracyof80.9%withaSen- sitivityof78.3%andaSpecificityof82.5%.Finally,LRhadtheworst performance withanAccuracy of71.4%,aSensitivityof66.7% and aSpecificityof75.0%.
Finally, in C vs NB- poorer results were obtained. All models showed similar performance. SVM, RF,LR, LDA and NN obtained Accuraciesof69.6%,69.3%,69.2%,69.2%and68.7%.
4. Discussion 4.1. Principalfindings
Wehavecarriedouta preliminaryassessmentofthepotential forobtainingavoiceprintforanearlydetectionofbulbardysfunc- tion inALS patients. This wasmotivatedby the need fora stan- dardizeddiagnosticprocedureforassessingbulbardysfunctionand newmethodologiesbasedonobjectivemeasurements[2].
Thestudydemonstratedthefeasibilityofthemethodologypro- posed.Itsmajorbenefitistoprovideamethodologybasedonob- jective measures to identifybulbardysfunction in early stagesof theALSdisease.Wesuggesttwonewlabels,C’andB’,toimprove
the diagnosisof those patients in whom bulbar dysfunction has notyetbeendetectedbythecurrentsubjectiveprocedures.
From the voiceprint, a total of 70 features were obtained.
These were: entPat1...entPat5, entPC1...entPC5, entIC...entIC5, var1...var5, kurt1...kurt5, medRR1...medRR5, dsvRR1...dsvRR5, fmEsPat1...fmEsPat5, fmEsPC1...fmEsPC5, enBnEs_a1...enBnEs_a5, enBnEs_e1...enBnEs_e5, enBnEs_i1...enBnEs_i5, enBnEs_o1...
enBnEs_o5andenBnEs_u1...enBnEs_u5.
The first experimentshowed the performance ofthe machine learning models used for the four cases. From the good results achievedby Cvs. B,itcanbe inferredthat themethodologypro- posedis goodatdetecting bulbardysfunction,RFandSVM being the best models for performing this task. The poor performance obtained in C vs. NB revealeda similar voice performance of Cs and NBs,asexpected. Instead, theperformance obtainedinB vs.
NB mayindicatethat someNBs voicescould be affectedinsome NBsbutthismayyetnotbeperceptibletohumanhearing.
The second experimentrevealed that 9of thetotal of31 NBs may have bulbar dysfunction. This result is consistent with the previous statement that indicated that the voices of some NBs couldbeaffected.WesuggestlabellingthesepatientsasB’iftheir voicesshowasimilarperformance toBsandC’iftheyaresimilar toC.
Thethirdexperimentperformedbetterthanthefirstonewhen B’andC’labelswereconsidered.InCvsB+,RF outperformedthe results obtained in the first experiment. Similarly, inB+ vs. NB-, theclassificationperformancewasgreatlyimproved.
In general, the third experiment achievedbetter performance than the first one. This indicates that our method can diagnose bulbardysfunction betterthanclinicianswiththecurrentsubjec- tiveapproaches.
4.2. Comparisonwithpriorwork
ThisstudyisconsistentwithTenaetal.[9],whichfoundagreat uncertainty inALS patients inwhom bulbardysfunction wasnot detected yetsuggestingthat some ofthemwere misdiagnosed.It isalsoconsistentwithPlowmanetal.[2],whichindicatedthedif- ficultiesindiagnosingbulbardysfunctionbysubjectiveapproaches.
Inmanycases,theperturbanceinthosesubjects’voicescouldnot beappreciatedbythehumanearuntiladvancedstagesofthedis- ease.Wewentastepfurtherby providingtwo newlabels,B’and C’,toachieveanearlierandmoreaccuratediagnosis.
This study isalso in line withSuhas et al., Vashkevich etal., Norel etal.,An etal.andWangetal.[3–6,8],who demonstrated that voiceisone ofthemostimportantaspects fordetecting bul- bar dysfunction. Suhas et al. obtained accuracies of 92.2% using SVMandDNNsbasedonMFCCs.Vashkevich etal.achieved90.7%
accuracy (sensitivity86.7%,specificity92.2%) withLDAusingper- turbation measurements. Norel et al. identified acoustic features in naturalistic contexts, achieving 83% accuracy (sensitivity 86%, specificity 78%) usingSVM. An et al.implemented CNNs to clas- sify the intelligible speech produced by patients with ALS and healthy individuals.Theexperimental resultsindicated asensitiv- ityof76.9%andaspecificityof92.3%.Wangetal.[5]implemented SVM andNNusingacoustic features andadding articulatory mo- tioninformation(fromtongue andlips).Accuraciesof91.7%were obtainedusingonlyacousticfeatures,increasingto96.5%withthe additionofbothlipandtonguedata.Addingmotionmeasures in- creased the classifier accuracy significantly atthe expense of in- cludingmoreinvasivemeasurementstoobtainthedata.
ThesestudiesonlyfocusedonBvs.Ccases.Weinvestigatedthe meansofoptimizingaccuracyindetectingALS bulbardysfunction byonlyanalyzingthevoicesofpatients.
Todate,onlyTenaetal.[9,10]haveconductedstudiesconsider- ing additional cases. In [9], they used phonatory subsystem fea-
tures, such as jitter, shimmer, harmonic-to-noise ratio and pitch andPCA. In [10]a set of time-frequency features were added to theanalysisandbetterresultswereachieved.Theperformanceof severalmachinelearningmodels were evaluatedconsidering four scenarios (C vs. B, C vs. NB, B vs. NB and C vs. A). In C vs. B, theyobtainedanAccuracyof98.1%witha Sensiti
v
ityof96.6%and Speci ficity of 100.0% using SVM. The results worsened in B vs.NB(Accuracyof 84.8% with Sensiti
v
ity of92.3% andSpeci ficityof 75.0% using RF). In C vs. NB andC vs. A, goodresults were also obtained,RFbeingthemodelwhichperformedbestinbothcases (Accuraciesof 94.1%and 95.8%respectively).Inthisstudy,inCvs.B,an Accuracy of88.3% wasobtained forRF witha Sensiti
v
ityof 85.0%andaSpeci ficityof 95.0%.Thisperformance improvedwhen considering B’ patients, C vs. B+,obtaining an Accuracy of93.5%outperforming the results of [3–5]. In B vs. NB, we obtained an Accuracyof 78.7% (SVM).This performance was greatly improved whenconsideringB’patients,B+vs. NB-,obtaininganAccuracyof 91.0%with a Sensiti
v
ity of83.3% anda Speci ficityof 100.0% out- performingtheresultsobtainedby[9,10].Thissuggeststhathaving well-annotated patients isessential forproperly assessing bulbar dysfunction in B vs. NB. We demonstrated that semi-supervised classificationmodelssuchasS4VMareusefultoolsforperforming thistask.4.3. Limitations
Thesizeandbiasofthisstudyisheavilyinfluencedbythefact thatALSisarareandaveryheterogeneousdiseasewherenot all thepatientspresentthe samesymptomatology. Althoughupsam- plingandsemi-supervised classifiertechniques wereused tocor- rectthebias,itwouldbenecessarytoincreasethenumberofpar- ticipantstodrawdefinitiveconclusions.
However, we proved that the method presented can be suc- cessfully appliedto such acorpus. Thequestionis whatthe out- comesshould be whenapplyingitto alarge enoughcorpus. The outcomesindicatethataccuracycouldincreasemuchmore.Aspe- cific studyshould be performed to determine the extent of this increase.
Furthermore, detecting bulbar dysfunction inevitably reflects the clinician’s own learned approach, which must depend on a similar comparative dataset analysis, gainedby so-called clinical experience.Assuch therearehiddenvariables; forexample,den- talhealthorabsentteeth,glossaldisorders,respiratorydysfunction altering breath holding and rhythm of breathing, nasal obstruc- tion, nasal-sinus mucusdischarge, etc. When a machine learning paradigmisappliedtodetectingbulbardysfunction,butitalsoap- plies to themajority of the clinical problems,these hiddenvari- ablescouldleadtodetectingbulbardysfunctionwhen,actually,the speechdisorder isproduced by alternative disorders. Thishasan effectin specificitybecause subjectsaffected bythese alternative disorders could havebeen predictedas sufferingbulbar dysfunc- tionwhen actuallythey wouldnot besuffering fromthedisease.
Althoughthisisalimitation,theALS subjectswhoparticipatedin thisstudywerediagnosedandselectedbyaneurologist,bounding thepossibleeffectsofthisphenomenon.
Additionally,concerningtheacousticanalysisemployedforthe detection of the voice impairment, neurologists are accustomed tousingcertaindifficultphraseswhentestingspeechquality,and theseincluderequirementsforrhythmiccadence,constancyofut- teranceforlongsounds,repetitiveconsonantlydrivingutterances, as well assimple vowels. In this study, the five Spanish vowels wereanalysedratherthananyconsonantsorothersoundsexisting intheSpanishlanguage.Therefore,theanalysisisnotstandardized totheextentthatitisnot inlinewithaclassicalclinicalanalysis.
AdditionalanalysisareenvisagedtoanalyseotherSpanishexisting soundstoobtainthemoreaccuratevoiceprintaspossible.
4.4. Conclusions
Promising outcomesin detecting bulbar dysfunctionwere ob- tained when comparing ALS patients withandwithout this dys- functioninearlystagesofthedisease,orpriortobeingdiagnosed byclinicians.
This could lead to the development of a cheap and simple tool that mayhelp todevelop standardizeddiagnostic procedures forassessingbulbardysfunctionbasedonobjectivemeasures and monitor progress. This directly addresses a recent statement re- leasedby theNEALSbulbarsubcommitteeregardingthe needfor objective-basedapproaches[2].
Due to the great uncertainty of the corpus, we highlight the importanceofimprovingtheannotationofALSpatientsasregards bulbar dysfunctionto develop powerfulmachine learningmodels able to distinguish this dysfunction. We provide two new labels, C’andB’,anddemonstratethatSemi-SupervisedMachinelearning models could help inthe earlydetection ofthisdysfunction.Yet, furtheranalysesareneededtodevelopthisconceptfully.Thesein- cludeperforminglongitudinalstudies inwhichpatients’ diagnosis areretrievedatseveralfollow-ups.
The usefulness of this methodology is that it could be ap- pliedto theautomatedidentificationandearlydiagnosisofmany other neurologicalorrespiratoryillnesseswhereobtainingalarge enoughandwell-annotatedcorpusisdifficult.
DeclarationofCompetingInterest Authorsdeclarenoconflictsofinterest.
Acknowledgments
This work was approved by the Research Ethics Committee for Biomedical Research Projects (CEIm) at the Bellvitge Uni- versity Hospital in Barcelona and was supported by the Min- isterio de Economía y Competitividad (TIN2017-84553-C2-2-R) and the Ministerio de Ciencia e Innovacion (PID2020-113614RB- C22). AT is a member of CIMNE, a Severo Ochoa Centre of Ex- cellence (2019-2023) under grant CEX2018-000797-S, funded by MCIN/AEI/10.13039/501100011033. The Neurology Department of theBellvitgeUniversityHospitalinBarcelonapermittedtherecord- ing of the voices of the participants in its facilities. The clini- cal records wereprovided by CarlosAugustoSalazar Talavera. Dr.
MartaFullaandMariaCarmenMajosBellmuntcontributedadvice abouttheprocessofelicitingthesounds.
References
[1] C. Carmona-Duarte, et al., Study of several parameters for the detection of amyotrophic lateral sclerosis from articulatory movement, Loquens 4 (2017), doi: 10.3989/loquens.2017.038 .
[2] E.K. Plowman, L.C. Tabor, J. Wymer, G. Pattee, The evaluation of bulbar dys- function in amyotrophic lateral sclerosis: survey of clinical practice patterns in the United States, Amyotroph. Lateral Scler. Frontotemporal Degener. 18 (5–6) (2017) 351–357, doi: 10.1080/21678421.2017.1313868 .
[3] R. Norel, M. Pietrowicz, C. Agurto, S. Rishoni, G. Cecchi, Detection of amy- otrophic lateral sclerosis (ALS) via acoustic analysis, bioRxiv (2018), doi: 10.
1101/383414 .
[4] K. An, et al., Automatic early detection of amyotrophic lateral sclerosis from intelligible speech using convolutional neural networks, in: Interspeech, 2018, pp. 1913–1917 .
[5] J. Wang, et al., Towards automatic detection of amyotrophic lateral sclerosis from speech acoustic and articulatory samples, INTERSPEECH, 2016 . [6] B.N. Suhas, et al., Comparison of speech tasks and recording devices for voice
based automatic classification of healthy subjects and patients with amy- otrophic lateral sclerosis, in: Proc. Interspeech 2019, 2019, pp. 4564–4568, doi: 10.21437/Interspeech.2019-1285 .
[7] R. Chiaramonte, C. Luciano, I. Chiaramonte, A. Serra, M. Bonfiglio, Multi- disciplinary clinical protocol for the diagnosis of bulbar amyotrophic lateral sclerosis, Acta Otorrinolaringologica (English Edition) 70 (2019) 25–31, doi: 10.
1016/j.otoeng.2017.12.010 .
[8] M. Vashkevich, A. Petrovsky, Y. Rushkevich, Bulbar ALS detection based on analysis of voice perturbation and vibrato, in: 2019 Signal Process- ing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2019, pp. 267–272 .
[9] A. Tena, F. Claria, F. Solsona, E. Meister, M. Povedano, Detection of bulbar in- volvement in patients with amyotrophic lateral sclerosis by machine learn- ing voice analysis: diagnostic decision support development study, JMIR Med.
Inform. 9 (3) (2021) e21331, doi: 10.2196/21331 . http://www.ncbi.nlm.nih.gov/
pubmed/33688838
[10] A. Tena, F. Clarià, F. Solsona, M. Povedano, Detecting bulbar involvement in patients with amyotrophic lateral sclerosis based on phonatory and time- frequency features, Sensors 22 (3) (2022), doi: 10.3390/s22031137 . https://
www.mdpi.com/1424-8220/22/3/1137
[11] B. Yousefi, J. Shin, E.H. Schumacher, S.D. Keilholz, Quasi-periodic patterns of intrinsic brain activity in individuals and their relationship to global signal, NeuroImage 167 (2018) 297–308, doi: 10.1016/j.neuroimage.2017.11.043 . https:
//www.sciencedirect.com/science/article/pii/S1053811917309771
[12] D. Obeid, S. Sadek, G. Zaharia, G.E. Zein, Touch-less heartbeat detection and cardiopulmonary modeling, in: 2009 2nd International Symposium on Ap- plied Sciences in Biomedical and Communication Technologies, 2009, pp. 1–5, doi: 10.1109/ISABEL.2009.5373616 .
[13] M. Kuhn, K. Johnson, Applied Predictive Modeling, Springer, 2013, doi: 10.1007/
978- 1- 4614- 6 84 9-3 .
[14] K.N. Batmanghelich, D.H. Ye, K.M. Pohl, B. Taskar, C. Davatzikos, ADNI, Disease classification and prediction via semi-supervised dimensionality reduction, in:
2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011, pp. 1086–1090, doi: 10.1109/ISBI.2011.5872590 .
[15] E. Adeli, K.-H. Thung, L. An, G. Wu, F. Shi, T. Wang, D. Shen, Semi-supervised discriminative classification robust to sample-outliers and feature-noises, IEEE Trans. Pattern Anal. Mach.Intell. 41 (2) (2019) 515–522, doi: 10.1109/TPAMI.
2018.2794470 .
[16] A. Tena, ALS models and data repository, ( https://github.com/atenad/ALS . Date accessed: November 9, 2021.).
[17] G.L. Pattee, Others, Provisional best practices guidelines for the evaluation of bulbar dysfunction in amyotrophic lateral sclerosis, Muscle Nerve 59 (5) (2019) 531–536, doi: 10.1002/mus.26408 .
[18] P. Boersma, D. Weenink, Praat: Doing Phonetics by Computer [Computer Pro- gram] Version 6.1.01, Technical Report, University of Amsterdam, 2019 . [19] MATLAB and Signal Processing Toolbox Release, The MathWorks, Inc., Natick,
Massachusetts, 2021 . United States
[20] I. Rodriguez-Lujan, G. Bailador, C. Sánchez Ávila, A. Herrero, G. Vidal, Analysis of pattern recognition and dimensionality reduction techniques for odor bio- metrics, Knowl.-Based Syst. 52 (2013), doi: 10.1016/j.knosys.2013.08.002 . [21] MATLAB, Version 9.9.0.1495850 (R2020b Update 1), The MathWorks Inc., Nat-
ick, Massachusetts, 2020 .
[22] Max Kuhn, The caret Package, 2009. https://github.com/topepo/caret/ . [23] J.H. Krijthe, RSSL: R package for semi-supervised learning, in: B. Kerautret,
M. Colom, P. Monasse (Eds.), Reproducible Research in Pattern Recognition, 2016, pp. 104–115 .
[24] Y.-F. Li, Z.-H. Zhou, Towards making unlabeled data never hurt, in: Proceed- ings of the 28th International Conference on Machine Learning (ICML’11), 2011 . Bellevue, Washington
[25] A. Tharwat, Classification assessment methods, Appl. Comput. Inform. (2018), doi: 10.1016/j.aci.2018.08.003 . http://www.sciencedirect.com/science/article/pii/
S2210832718301546