• No se han encontrado resultados

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information

N/A
N/A
Protected

Academic year: 2023

Share "In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information"

Copied!
13
0
0

Texto completo

(1)

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies are encouraged to visit:

http://www.elsevier.com/copyright

(2)

ContentslistsavailableatSciVerseScienceDirect

Applied Soft Computing

jo u r n al h om epa g e : w w w . e l s e v i e r . c o m / lo c a t e / a s o c

MicroCBR: A case-based reasoning architecture for the classification of microarray data

Juan F. De Paz

a

, Javier Bajo

a,∗

, Vicente Vera

b

, Juan M. Corchado

a

aDepartamentoInformáticayAutomática,UniversityofSalamanca,PlazadelaMerceds/n,37008Salamanca,Spain

bDepartamentodeEstomatologiaII,FacultaddeOdontologia,UniversidadComplutensedeMadrid,PlazadeRamonyCajal,s/n,CiudadUniversitaria,28040Madrid,Spain

a rt i c l e i n f o

Articlehistory:

Received13July2009

Receivedinrevisedform25July2011 Accepted14August2011

Availableonline22August2011

Keywords:

Case-basedreasoning HGU133

ESOINN

CLLleukemiaclassification Decisiontree

a b s t r a c t

Microarraytechnologyprovidesagreatamountofdatarequiringanalysis,andtheuseofnewintelligent algorithmsthatidentifytherelevantinformationfortheclassificationprocesshasbecomeessential.This studypresentsaclassificationtoolcalledMicroCBRthatusesthecase-basedreasoningparadigmand incorporatesanovelfilteringtechniquebasedonstatisticalmethods,anewclusteringtechniquethat usesESOINN(EnhancedSelf-OrganizingIncrementalNeuronalNetwork),andaknowledgeextraction techniquebasedontheJ48algorithm.MicroCBRhasbeenappliedtoclassify91CLLpatientsandthe resultsobtainedareshowninthispaper.

©2011ElsevierB.V.Allrightsreserved.

1. Introduction

Theuseofmicroarrays,andmorespecificallyexpressionarrays, enablestheanalysisofdifferentsequencesofoligonucleotides[1,2].

Microarrayscanbeusedinthefieldofmedicinetodetectchanges atthegeneticlevel.Simplyput,amicroarrayisanarrayofprobes thatcontainsgeneticmaterialwithapredeterminedsequence.Pre- determinedsequencescanvaryaccordingtothefieldinwhichthe studyisapplied;inthecaseofhumans,thesequenceswillcontain humangeneticmaterial.Thesesequencesarehybridizedwiththe geneticmaterialofpatients,thusallowingthedetectionofgenetic mutationsthroughtheanalysisofthepresenceorabsenceofcertain sequencesofgeneticmaterial.Thisstudywasconductedtoanalyze theintensitiesproducedduringthehybridization.Theanalysisof thisinformationallowsthedetectionofpatternsthatcharacterize certaindiseasesand,mostimportantly,thegenesassociatedwith thesedifferentdiseases.

Theanalysisofexpressionarraysiscalledexpressionanalysis.

Anexpressionanalysisbasicallyconsistsofthreestages:normal- izationandfiltering;clusteringandclassification;andextraction ofknowledge.Thesestagesarecarriedoutfromtheluminescence valuesfoundintheprobes.Presently,thenumberofprobescon-

Correspondingauthor.Address:Compa ˜nía5,37002Salamanca,Spain.

Tel.:+34639771985;fax:+34923277101.

E-mailaddresses:[email protected](J.F.DePaz),[email protected](J.Bajo), [email protected](V.Vera),[email protected](J.M.Corchado).

tainingexpressionarrayshasincreasedconsiderablytotheextent thatithasbecomenecessarytousenewmethodsandtechniques toanalyzetheinformationmoreefficiently[3].Itisnecessaryto developnewtechniquestoanalyzelargevolumesofdata,extract therelevantinformation,anddeletetheinformationwhichhasno relevancetotheclassificationprocess.Moreover,theknowledge obtainedduringtheclassificationprocessisofgreatimportance forsubsequentclassifications.Therearevariousartificialintelli- gencetechniquessuchasartificialneuralnetworks[4,5],Bayesian networks [6], and fuzzy logic [7] which have been applied to microarrayanalysis.Whilethesetechniquescanbeappliedatvari- ousstagesofexpressionanalysis,theirintegrationintosystemsthat combinevariousartificialintelligencetechniques,suchasCBR,is ofparticularinterest.Previousresearch[47]hasstudiedthecapac- ityofdifferentclassifiersinthefieldofsequencing;however,the systemonlyusesnewclassificationsinthelearningphase,andcon- sequentlydoesnottakeintoaccounttherelevanceofthechanges madebyexpertanalysis.ThispaperpresentsMicroCBR,asystem basedonCBR(case-basedreasoning)whichusespastexperiences tosolvenewproblems[8].Assuch,itisperfectlysuitedforsolving theproblemathand.Inaddition,CBRmakesitpossibletoincor- poratethevariousstagesofexpressionanalysisintothereasoning cycleoftheCBR,thusfacilitatingthecreationofstrategiessimi- lartotheprocessesfollowedinmedicallaboratories.Therecovery ofinformationfrompreviousanalysessimplifiestheclassification processbydetectingandeliminatingrelevantandirrelevantprobes detectedinpreviousanalyses.Thisallowsforalearningprocess thatcannotonlyincorporatenewcases,butalsoselecttherelevant informationfromcasesthathavebeenreviewedbyanexpert.

1568-4946/$seefrontmatter©2011ElsevierB.V.Allrightsreserved.

doi:10.1016/j.asoc.2011.08.021

(3)

Forsometimenow,wehavebeenworkingontheidentification oftechniquestoautomatethereasoningcycleofseveralCBRsys- temsasappliedtocomplexdomains[9].TheobjectiveofMicroCBR istodevelopaCBRforclassifyingpatientsbasedoninformation obtainedfrommicroarrays.Thissystemisappliedtotheclassi- ficationof subtypes ofleukemia,specifically,todetectpatterns andextractsubgroupswithintheCLLtypeofleukemia.MicroCBR incorporatesvarioustechniquesofartificialintelligence(filtering, clustering,artificialneuralnetworksandextractionofknowledge) atdifferentstages ofthereasoningcycle.In theretrievephase, newpre-processing andfilteringtechniquesareincorporatedin ordertoselecttheprobeswithrelevantinformationforclassifying patients.Thisinnovativemethodiscapableofobtainingafiltering processthatincorporatesmultiplestatisticaltechniquesbasedon hypothesiscontrasts,whichnotablyreducesthedimensionalityof thedata.Reducingthedimensionalityofthedatamakesitpossible tousetechniqueswithgreatercomputationalcomplexityinlater stagesoftheCBRcycle,whichwouldotherwisebeunviable.The reusestageincorporatesaclassificationtechniquebasedonneural networkESOINN[10].Thistechniqueproposesanovelmethodfor generatingclustersandforidentifyingthenearestclusterforthe finalclassification.ESOINNperforms clusteringbyincorporating boththedistributionprocessoftheentiresurfaceofclassification, andtheseparationbetweengroupswithlowdensityamongthem.

AnadditionalgroupingtechniqueknownasPAM[11](partition aroundmedoids),isalsoused,resultinginamoreaccurateclas- sification,sincetheresultssuggestedbytheESOINNnetworkare comparedtothoseobtainedusingthePAMtechnique.Oncethe classificationprocessisfinished,therevisephaseinitiatesatech- niqueforextractingknowledgeJ48[12]thatcanextractrulesto explainthedecisionstakenintheretrieveandreusestages.More- over,therevisestageincludesaMDS(MultidimensionalScaling) technique[13–15]forpresentinginformationinlowdimension- ality.Ahumanexpertanalyzesthisinformationandevaluatesthe proposedclassificationaswellasthevalidityoftherulesgenerated.

Finally,intheretainstage,ifthehumanexpertconsidersthepro- posedsolutionvalid,thesystemstoresthecaseinformationand therulesthat havebeen obtained.In this way,MicroCBRtakes advantageoftheinformationfrompreviouslyclassifiedpatients, andfromtherelevantandirrelevantprobes,andrulesfromthe previouslyappliedknowledgeextraction.

Thepaperisstructuredasfollows:thenextsectionbrieflyintro- ducestheproblemthatmotivatesthisresearch.Section3presents MicroCBRanddescribesthenovelstrategiesincorporatedinthe stagesoftheCBRcycle.Section4describesacasestudyspecifically developedtoevaluatetheCBRsystempresentedwithinthisstudy, consistingofaclassificationofCLLleukemiapatients.Finally,Sec- tion5presentstheresultsandconclusionsobtainedaftertesting themodel.

2. Microarraydataanalysis

Microarrayhasbecomeanessentialtoolingenomicresearch, making it possible to investigate global gene expression in all aspects of humandisease [16]. Microarray technologyis based onadatabaseofgenefragmentscalledESTs(ExpressedSequence Tags), which are used to measure target abundance using the scannedfluorescenceintensitiesfromtaggedmoleculeshybridized toESTs[17].Specifically,theHGU133plus2.0[3]arechipsused forexpressionanalysis.Thesechipsanalyzetheexpressionlevel of over 47,000 transcripts and variants, including 38,500 well- characterizedhumangenes.Itiscomprisedofmorethan54,000 probesetsand1,300,000distinctoligonucleotidefeatures.TheHG U133plus2.0providesmultiple,independentmeasurementsfor eachtranscript.Theuseofmultipleprobesprovidesacompletedata

setwithaccurate,reliable,reproducibleresultsfromeveryexper- iment. Microarray technology is a critical element for genomic analysisandallowsanin-depthstudyofmolecularcharacteriza- tionofRNAexpression,genomicchanges,epigeneticmodifications orprotein/DNAunions.

Expressionarrays[3]areatypeofmicroarraythathavebeen usedindifferentapproachestoidentifythegenesthatcharacter- izecertaindiseases[7,18,19].Inallcases,thedataanalysisprocess isessentiallycomposedofthreestages:normalizationandfilter- ing,clustering,andclassification.Thefirststepiscriticaltoachieve bothagoodnormalizationofdataandaninitialfilteringtoreduce thedimensionalityofthedatasetwithwhichtowork[20].Since theproblemathandisworkingwithhigh-dimensionalarrays,itis importanttohaveagoodpre-processingtechniquethatcanfacil- itateautomaticdecision-makingaboutthevariablesthatwillbe vitalfortheclassificationprocess.Inlightofthesedecisionsitwill bepossibletoreducetheoriginaldataset.Moreover,thechoice ofaclusteringtechniqueallowsdatatobegroupedaccordingto certainvariablesthatdominatethebehaviourofthegroup.After organizing thedataintogroups it ispossibletoextract knowl- edgeandclassifypatientswithinthegroupthatpresentsthemost similarities.

Case-based reasoning [9] is particularly applicable to this problemdomainbecauseit(i)supportsarichandevolvablerep- resentationofexperiences,problems,solutionsandfeedback;(ii) providesefficientandflexiblewaystoretrievetheseexperiences;

and(iii)appliesanalogicalreasoningtosolvenewproblems[21].

CBRsystems canbeusedtoproposenewsolutions orevaluate solutions toavoidpotentialproblems.Theresearchin[22]sug- gests that analogical reasoning is particularlyapplicable tothe biological domain, in partbecausebiological systems are often homologous(rootedinevolution).Moreover,biologistsoftenusea formofreasoningsimilartoCBR,whereexperimentsaredesigned andperformedbasedonthesimilaritybetweenfeaturesofanew systemandthoseofknownsystems.In[23]amixtureofexperts forcase-basedreasoning(MOE4CBR)isproposed.Itisamethod thatcombinesanensembleofCBRclassifierswithspectralclus- teringandlogisticregression,butdoesnotincorporateextraction of knowledgetechniquesand doesnot focusondimensionality reduction.Someapproachessuchas[9]provideCBRsolutionsand knowledgeextractiontechniques,facilitatingthecomprehension oftheclassificationprocess.ThispaperpresentsMicroCBR,aCBR solutionwhichalsoincorporatesnewknowledgeextractiontech- niques, but additionallyfocuses onthe definitionof innovative strategiesfordimensionalityreductionandclustering.Thefollow- ingsectionpresentsadetailedaccountoftheCBRsystemproposed inthiswork.

3. CBRsystemforclassifyingmicroarraydata

ThissectionpresentstheCBRsystemproposedinthecontext ofthisresearchandprovidesaclassificationtechniquebasedon previousexperiencesfordatafrommicroarrays.TheCBRdeveloped systemimitatesthebehaviourofhumanexpertsinthelaboratory andincorporatesinnovativeknowledgediscoverytechniques.The systemreceivesdatafromtheanalysisofchipsandisresponsible forclassifyingindividualsbasedonevidenceandexistingdata.

ThepurposeofCBRistosolvenewproblemsbyadaptingsolu- tionsthathavebeenusedtosolvesimilarproblemsinthepast[8].

TheprimaryconceptwhenworkingwithCBRsistheconceptof case.Acasecanbedefinedasapastexperience,andiscomposedof threeelements:aproblemdescriptionwhichdescribestheinitial problem;asolutionwhichprovidesthesequenceofactionscar- riedoutinordertosolvetheproblem;andthefinalstagewhich describesthestateachievedoncethesolutionwasapplied.CBR

(4)

managescases(pastexperiences)tosolvenewproblems.Theway casesaremanagedisknownastheCBRcycle,andconsistsoffour sequentialstepswhicharerecalledeverytimeaproblemneedsto besolved:retrieve,reuse,reviseandretain.EachstepoftheCBRlife cyclerequiresamodelormethodinordertoperformitsmission.

IntheCBRsystemproposedwithinthisstudy,theretrievephase filtersvariables,andrecoversimportantvariablesfromthecasesto determinethemostinfluentialfortheclassification.Oncethemost importantvariableshavebeenretrieved,thereusephasebegins adaptingthesolutionsfortheretrievedcasestoobtainthecluster- ing.Oncethisgroupingisaccomplished,thenextstepisknowledge extraction.Therevisephaseconsistsofanexpertrevisionforthe proposedsolution,andfinally,theretainphaseallowsthesystemto learnfromtheexperiencesobtainedinthethreepreviousphases, consequentlyupdatingthecasesmemory.

AkeyelementinaCBRsystemisacase,whichcanbedefined asapastexperience[8]andiscomposedofthreeelements:prob- lemdescription,problemsolution,andthefinalstateobtainedafter applyingthesolution.AcaseinMicroCBR contains information relatedtothepatient,therules,theproposedclassification,and theprobesmarkedasirrelevantorimportant.Thecaseisdefined bythefollowingexpression:

ij=(id,S=(A1,...,An),Cp,Cr)

whereij∈IandI={i1,...,iS}isthesetofindividuals/cases,Aisthe setofalltheprobes,Airepresentstheprobei,Cpisthepredicted classandCrtheactualclass.

In addition tothecases memory, oursystemincorporates a memoryofrulesthatcontainstheinformationextractedthrough theknowledgeextractiontechniques.Thememoryofrulesisstruc- turedasfollows:

R={r1,...,rl}, with ri=(l1∧...∧lm)→cj

where ls=(dts,os,)/dts∈Dt,os∈O,SIrr⊆A whereRisthesetofrulesfromthedecisiontree,lscontainsasetof discretizedprobes,anoperatorandarealvalue,Dtisthediscretiza- tionvaluefortheprobeAt,O={=, =/,>,<,≤,≥}operator,andSIrr

isthesetofprobesmarkedasirrelevant,cj∈Cp.

Whenanewcaseisclassified,anewdecisiontreeisgenerated intherevisestage.Asetofrulesareextractedfromthedecision treewhichprovideknowledgeabouttherelevanceoftheprobes intheclusteringandclassificationprocess.Fig.1showsascheme ofthetechniquesappliedinthedifferentstagesoftheCBRcycle.

AsseeninFig.1,theimportantprobesthatallowtheclassifica- tionofpatientsarerecoveredintheretrievephase.Theretrieve phaseisdividedinto6sub-phases:pre-processingthroughRMA, removalofirrelevantvariables,uniformdistribution,probeswith- outmeaningfulcut-offpoints,andcorrelatedvariables.Inthereuse phasethepatientsaregroupedbymeansofanESOINNneuralnet- work.Then,thepatientswithnopriorclassificationareassigned toagroupusingthenearestcluster.IntherevisephasetheJ48 [12]techniqueisappliedforextractingknowledgeaboutthemost importantprobes forthe classification,and theMDS technique [13–15]isusedtomakearepresentationinlowdimensionality.

Finally,intheretainphase,theknowledgeisupdated.Thisknowl- edgeincludesthecaseclassification,thedecisionrulesobtained, andtheinformationassociatedwiththeimportanceorirrelevance ofcertainprobesextractedfromtherules.

Fig.1showstheschemeoftheCBRsystemproposedwithinthis study.Fig.2detailsthestepsfollowedineachofthestagesofthe CBRcycle.ThestructureoftheMicroCBRwillnowbeexplainedin detail,presentinginnovativetechniquesmodelledineachofthe stagesoftheCBR.

Fig.1.StructureoftheproposedMicroCBRmodel.

3.1. Retrieve

Traditionally,onlythecasessimilartothecurrentproblemare recovered,oftenbecauseofperformance,andthenadapted.With regardtoexpressionarray,thenumber ofcasesisnotacritical factor,rather thenumberofvariables. Forthis reason,wehave incorporatedaninnovativestrategywherevariablesareretrieved atthisstageandthen,dependingontheidentifiedvariables,therest ofthestagesoftheCBRarecarriedout.Thenewstrategyallowsa notablereductioninthedimensionalityofthedata.Thefiltering phaseiscarriedoutonItogetherwiththenewcaseis.Thefiltering isonlyappliedtothoseprobesnotassociatedwithanyoftherules.

First,apre-processingofthedataisconductedusingRMA.Then, the5filteringsub-phasesareexecuted:removalofcontrolprobes, removaloferroneousprobes,removaloflow variabilityprobes, removalofprobeswithauniformdistribution,andremovalofcor- relatedprobes.Thesefivesub-phasesareoutlinedinthefollowing paragraphs.

3.1.1. RMA

Thisphasebeginsoncethelaboratoryexperimentwithmicroar- rayshasbeencompleted.Theresearcherobtainsvariousfilesthat contain grossintensityvalues. Prior toanalyzing thedata, itis important to complete the pre-processing phase, which elimi- nates defective samples and standardizes the data. As seen in Fig. 2, this phaseis normally divided into 3 sub-phases:back- groundcorrection,standardization,andsummarization.Thereis currently a limited group of algorithms that investigators use for performingthesesteps.ThemostcommonareMAS5.0[24]

(MicroarrayAffymetrixSuite5.0),PLIER[25](ProbeLogarithmic IntensityError),andRMA)[26](RobustMulti-arrayAverage).

The RMA [26] algorithm is a method for normalizing and summarizingprobe-levelintensitymeasurements.Itanalyzesthe valuesforthePM(Perfect-Match):inthefirststep,aBackground Correctioniscarriedouttoremovethenoisefromtheaveragesof thePM;inthesecondstep,thedataisquantilenormalizedinorder tocomparedatafromdifferentmicroarrays;finally,asummariza- tionismadeandthevaluesforeachprobe-setaregenerated.

(5)

Fig.2. DetailedarchitecturefortheMicroCBRmodel.

3.1.2. Irrelevantprobes

Oncethecontrol andtheerroneous probeshave beenelim- inated, the filtering process begins. The first step consists of eliminatingtheprobesmarkedasirrelevantinpreviousexecutions oftheCBRcycle.Thisway,allprobesthatcanpassthefiltering phase,butarepronetocauseerroneousresultsduringthereuse phase,areremoved.

3.1.3. Variability

Thesecondstageistoremovetheprobesthathavelowvariabil- ity.Thisworkiscarriedoutaccordingtothefollowingsteps:

1.Calculatethestandarddeviationforeachoftheprobesj

·j=+

1

n

n

j=1

·j−xij)2 (1)

wherenisthetotalnumberofcases,¯·jistheaveragepopula- tionforthevariablej,andxijisthevalueoftheprobejforthe individuali.

2.Standardizetheabovevalues zi= ·j

(2)

where =(1/n)

n

j=1·j and ·j=+

(1/n)

n

j=1·j−xij)2 wherezi≡N(0,1)

3.Discardprobesforwhichthevalueofzmeetsthefollowingcon- dition:z<−1.0.Thiswillachievetheremovalofabout16%ofthe probesifthevariablefollowsanormaldistribution.

3.1.4. Uniformdistribution

Finally,allremainingvariablesthatfollowauniformdistribu- tionareeliminated.Thevariablesthatfollowauniformdistribution willnotallowtheseparationofindividuals.Therefore,thevariables thatdonotfollowthisdistributionwillbereallyusefulvariables in theclassificationof thecases.Thecontrastofassumptionsis explainedbelow,usingtheKolmogorov–Smirnov[27]testasan example.H0:thedatafollowauniformdistribution;H1:theana- lyzeddatadonotfollowauniformdistribution.Statisticalcontrast:

D=max{D+,D} (3)

where D+=max

1≤i≤n{(i/n)−F0(xi)}, D=max

1≤i≤n{F0(xi)−((i−1)/n)} withiasthepatternofentry,nthenumberofitemsandF0(xi)the probabilityofobservingvalueslessthaniwithH0beingtrue.The valueofstatisticalcontrastiscomparedtothenextvalue:

D˛= C˛

k(n) (4)

Inthespecialcaseofuniformdistributionk(n)=√

n+0.12+ (0.11/√

n)andalevelofsignificance˛=0.05,C˛=1.358.

3.1.5. Cut-offpoints

Thisstepremovestheprobesthat,despitenotfollowingauni- formdistribution,havenoseparationbetweenelements,anddo notallowtheelementstobepartitioned.Thewaytoremovethe probesistodetectchangesinthedensitiesofthedata,andtoselect thefinalprobes.Theprobesinwhichcut-offsorhighdensitiesare notdetectedareeliminated,astheydonotprovideusefulinfor- mationtotheclassificationprocess.Thiswillkeeptheprobesthat allowtheseparationofindividuals.Thedetectionoftheseparation intervalsisperformedbycalculatingthedistancebetweenadjacent

(6)

Select a probe

Sort the values

Remove elemets

< threshold Uniform distribution

Confidence

interval Percentil

Cluster Cluster

More probes

yes no

No

yes 1, 2

3, 4

5

6, 7

8 10

10

11

12

Fig.3.Cut-offpointsoverview.

individuals.Oncethedistanceiscalculated,itispossibletodeter- minethepotentiallyrelevantvalues.Theselectioniscarriedoutby applyingconfidenceintervalsforthevaluesofthesedifferencesif thevaluesfollowauniformdistribution,orbyselectingthevalues aboveacertainpercentileifthevaluesdonotfollowanormaldis- tribution.ThediagraminFig.3illustratesthedetectionprocessfor cut-offpoints.

Thisprocessisformalizedasfollows:

1.LetIbethesetofindividualswithfilteredprobestogetherwith thenewindividual,wherex·jrepresentstheprobejforallthe individuals,andxijtheindividualifortheprobej

2.Selecttheprobej=1,x·j

3.Sortinincreasingordervaluesx·j

4.Calculatethevalueforxij=xi+1j−xij

5.Determineifthevariablexijfollowsauniformdistributionby meansoftheShapiro-Wilktest[28],otherwisegotostep10.

6.Calculatethevaluefor ¯x.j

7.Establish the confidence interval for the variance, which is established as 2·j∈[((n−1)·S2)/(2n

1,1˛/2),((n−1)· S2)/2n

1,˛/2] with ˛=0.05 and n=#x·j and the number of elementsforx·j,S2isthesamplingvariance.

8.Establishthesetofelementsformxijnotbelongingtotheset Qj={xij/xij >I+

·j}whereI+

·j

isthevalueoftheupperlimitof theconfidenceinterval.

9.Gotostep11.

10.SelectthosevaluesuptothepercentileP˛fromeveryx·jand establishthesetQj={xij/xij>P˛}

11. Selecttheprobej+1inthecaseofmoreprobesneedingrevision andgotostep2.

12.CreatethenewsetofprobesI=∪x·j/∃xij∈Qj/i>#x·u∧i<

#x−#x·u.Whereuisaconstantbetween[0,0.5].

13.Finalizeandreturnthenewsetofindividualswiththefiltered probesI.

3.1.6. Correlations

Atthelaststageofthefilteringprocess,correlatedvariablesare eliminatedsothatonlytheindependentvariablesremain.Tothis end,thelinearcorrelationindexofPearsoniscalculatedandthe probesmeetingthefollowingconditionareeliminated.

rxiy·j<˛ (5)

given ˛=0.95, rx·iy·j=(x·ix.j/x·ix·j), x·ix·j=(1/N)

n s=1.i− xsi)( ¯u.j−xsj)wherex·ix.jisthecovariancebetweenprobesiand j.

3.2. Reuse

Inthisphasetheclusteringofindividualsiscarriedout,along withtheclassificationofnewindividualstooneoftheclusters.

Thereareseveralalgorithmsforclustering,butthemostcommon arethehierarchicalalgorithms[29]andthosebasedonpartition- ing[11].Withinthehierarchicalalgorithmsthemostcommonis the dendrogram[29]. Among thepartition-basedmethods it is possibletofindalternativesbasedonRNAssuchasSOM[30](Self- OrganizingMap),GNG[31](GrowingNeuralGas)orSOINN[32]

(Self-Organizing IncrementalNeuronalNetwork).Otheralterna- tivesincludek-means[33]orPAM[11].Thesetwomethodsdefine aseriesofinitialclusterswhichcorrespondtoanewindividualor toexistingindividuals,andaremarkedastheclusterrepresenta- tives,whiletheremainingindividualsareallocatedtothenearest cluster.Theproblemwitheachofthesemethodsisthattheydonot considerchangesinthedistributionofdensitiesofindividuals,and usuallydonotdetectclusterswithatypicalforms,suchaselongated clusters.Analternativetoclustermethodscouldbethedirectappli- cationofclassifiers,includingoptimization-basedfunctionssuchas SVM[48],decisiontrees,decisionrules,ordiffusemethodssuchas thek-nearestneighbour.

3.2.1. ESOINNneuralnetwork

NeuralNetworksbasedonGNGcandetectclusterswithatyp- icalforms,adjustiterativelytothedistributionoftheindividuals, anddetectlowdensityzones.TherearevariantsoftheGNG,such as the GCS [34] (Growing Cell Structure) or SOINN [32] (Self- OrganizingIncrementalNeuronalNetwork).Unlikeself-organizing mapsbasedonmeshes,GrowingGridorGCSdonotsetthenum- berofneurons,orthedegreeofconnectivity,buttheydoestablish thedimensionalityofeachmesh.Thiscomplicatestheseparation phasebetweengroups oncetheneurons aredistributed evenly acrossthe surface.The ESOINNneuralnetwork [10] (Enhanced Self-OrganizingIncrementalNeuronalNetwork)isavariation of theSOINNneuralnetwork [32],which allowsthecreation ofa singlelayer,whileESOINNisabletoincorporateboththedistri- butionprocessalongthesurfaceandtheseparationbetweenlow densitygroups.Thelearningprocessofthenetworkisdistributed intotwostages:thefirststageofcompetitionCHL[34](Competi- tiveHebbianLearning)wheretheclosestnodetotheinputpattern isselected;andthesecondadaptation/growingstagesimilartoa GNG.Thetrainingphaseandthevariousalgorithmsappliedatevery modifiedstageareoutlinedbelow:

1.Updatetheweightsofneuronsbyfollowingaprocesssimilarto theSOINN,butintroducinganewdefinitionforthelearningrate inordertoprovidegreaterstabilityforthemodel.Thislearning

(7)

ratehasproducedgoodresultsinothernetworkssuchasSOM [36].

Wa1=n1(Ma1)(−Wa1)

Wai=n2(Ma1)(−Wai) withai∈Na1 (6) wheren1(x)=(1/√

x),n2(x)=(1/

2+x2),aiisneuroni,is theinputpattern,Ma1isthenumberofwinningsofneuronai, Na1isthesetofneighboursofai.

2.Deletetheconnectionswithhigherage.Theagesarestandard- izedandthosewhosevaluesareintheregionofrejectionwith k>0areremoved.Theassignedvalueof˛is0.05,therefore zi= ei

,z≡N(0,1) thenf(z)= 1

√2exp −z2 2

(7) where P(z<k)=˛/2→P(z<k)=0.975→(z)=0.975, k=1.96.

Thereforeallzvaluesthataregreaterthan1.96aredeleted.

3.OnceallinputpatternshavebeenintroducedthenaKS-Test[27]

iscarriedoutinordertodetermineifthedensitydistributionfor theneuronsineachgroupfollowsanormaldistribution.Ifso, thelearningprocedureisfinished;otherwisethenextpatternis processed.Thevalueof˛chosenis0.05.

Oncethecaseshavebeendistributedinthemeshes,itisnec- essarytoassigneach ofthemeshestoa class accordingtothe followingprocedure:LetIbethesetofindividualsoncetheprobes havebeen filteredand GE theset ofclusterscreatedby means oftheESOINNneuralnetwork,definedasGE={gE/gE⊆I},where giE∩gEj = ,∀i=/j with giE,gjE∈GE. Let C be the set of existing classesfortheindividualswherecj∈Cistheclassjintheset.Wecan saythatthemeshi,gEi belongstoaclassj,gEi∈cjandisrepresented asgiE

cjwhen

maxcjC#Icj/giE

#Icj ·#Icj

#I (8)

whereIcj ={s∈I/s∈cj}andIcj/gEi isthesetofindividualsfromcj restrictedtothegroupgiE.

ThesetofmeshesbelongingtotheclassisdenotedasGEcjandis definedbytheexpression(9).

GEcj=

gE icjGE

gEi

cj (9)

3.2.2. PAM

ThePAMalgorithm[11]isexecutedinparalleltotheclustering inordertofacilitateacomparisonoftheresultsobtained.Theclas- sificationmadebybothmethods,PAMandESOINN,generatesan equivalenceindexbetweenthetwomethodsthatdeterminesthe consistencyofthereusephase.ThealgorithmusedforPAMisas follows:

1.Selectthenumberofclustersdependingon#C.

2. Themetricusedforthedistanceisthesameastheoneusedin theESOINNnetwork

3.Classifythepatientstakingallofthevariablesintoaccount,with- outanyfilteringGP={gP/gP⊆I}withgPi ∩gjP= ,∀i=/j 4. OncethegroupsGParecreated,anassignationismadefollowing

theprocedureindicatedin(8).

3.2.3. Equivalenceindex

OncetheindividualshavebeenclassifiedusingboththePAM andtheESOINNneuralnetworks,theequivalenceindexforboth

Table1 Numberofprobes.

Uniform Cut-off Correlations

CBR 1311 618 541

methodseqiscalculated,andtheerrorratefortheESOINNnet- workisdeterminedasafunctionofthepre-classifiedcases.The equivalenceindexisdefinedasindicatedin(10):

eq= #{i∈I/i∈cEj ∧i∈cPj}

#I (10)

wherecjErepresentsthesetofmeshesbelongingtoclassjthrough the ESOINN network and cjP represents the set of individuals belongingtoclassjthroughthePAMalgorithm.

3.2.4. Classification

Oncethemeshesaregeneratedbytheclusteringprocess,pre- viously unclassified individuals are now classified by selecting thenearestmesh.Whenthemeshhasbeenselected,thecaseis assignedtotheclassofthemeshselected.Theassignmentisdefined as(11).

is∈GEk

cj→iu∈Cj (11)

whereiuistheunclassifiedindividual,isistheindividualclosestto theindividualandiucalculatedusingtheEuclideandistance.

Fig.2describesthefunctioningofthereusephase.Asshown inFig.2,thereusestagereceivesthefilteredandnot-filtereddata resultingfromtheretrievestageasinputs.Theinputisusedforboth theESOINNneuralnetworkandthePAMtechnique.TheESOINN neural network generates a setof groups assigned todifferent classes.Thesegroupsarecomposedofmeshescontainingdifferent elementstogetherwiththeinformationofthepreviousclassifica- tion.ThePAMtechniquerepeatsthesameprocessconcurrentlyand generatesthegroupsforeachoftheclasses.Thegroupsgenerated byPAMcontaintheindividualsandtheirpreviousclassification, butdonotconsidersub-groups.Finally,theequivalenceindexis calculatedandthenewpatientisclassified.Theerrorrateforthe ESOINNnetworkismadethrough(8) todeterminetheclassfor eachofthegroups.

3.3. Revise

AsshowninFig.1,therevisioniscarriedoutbyahumanexpert whodeterminesifthegroupassignedbythesystemiscorrect.To facilitatethehumanexperttask,theequivalenceindexandthe errorratewerecalculatedinthereusestage.It isimportantfor themedicalhumanexperttounderstandtheclassificationprocess performedinthetwopreviousstages.Forthisreason,MicroCBR providesaknowledgeextractionmethodintheRevisephase.This methodanalysesthestepsfollowedintheretrieveandreusestages, and extractsknowledge whichisthenformalizedin rules.Con- sequentlythehumanexpertcaneasilyevaluatetheclassification andextractconclusionsconcerningtheefficiencyoftheclassifica- tionprocess.Theknowledgeextractionphasedetectsanomalous classifications,sinceitaccountsfortheexistenceofprobeswith irrelevantinformation,orthosethatweredecisiveinthemisclassi- fication.Forexample,thereareprobeswhichseparateindividuals aseithermenorwomen.Similarly,thehumanexpertcanmark importantprobesthatarenoteliminatedintheretrievestagedur- ingthefiltering.

Theextractionofknowledgethatis presentedtothehuman expertiscarriedoutusingtheJ48algorithm[12].TheJ48algo- rithmistheJavaimplementationoftheC4.5algorithm,anevolution

(8)

Fig.4. Densityplotandhistogramforprobes1552280 atand219703atthatdonotmeettheuniformitytestandinwhichcut-offpointsweredetected.

Table2

Classificationerrors.

Type1 Type2 Type3

Total 46 12 33

Predicted/Erroneous 47/5 9/0 35/4

oftheoriginalID3[37],whosemainadvantageisthatitincorpo- ratesnumericalattributesintothelogicaloperationscarriedoutin thetestnodes.Thereareotheralternativesforgeneratingdecision ruleswhichoperatesimilartothedecisiontrees,includingRIPPER [38]andPART[39].Thesemethodsextractsimilarinformationto classifyindividualsaccordingtodecisionrules.Whiletheresults aresimilar,thetreesallowtheexperttoapplyamoreintuitive classificationofthecases.BeforeapplyingJ48,itisnecessaryto performadiscretizationofthevaluessothatthealgorithmiscapa- bleofworkingwithlargevolumesofinformation.Thereasonfor thisisthesearchforthecut-offpointsinthevariablesusesastrat- egythatdependsmainlyonthenumberofdifferentvaluesthata variablecontains.

Table3

Comparisonofmethods.*,differentmedian;=,equal;(−),medianofcolumnless thanmedianofrow.

CBR Dendrogram PAM SVM KNN

CBR

Dendrogram *(−)

PAM *(−) *(−)

SVM =(−) *(+) *(+)

KNN *(−) *(+) *(+) *(−)

Thegeneralobjectiveofextractionofknowledgetechniquesis toprovideahumanexpertwithinformationaboutthesystem- generatedclassificationbymeansof asetof rules thatsupport thedecision-makingprocess.Itshouldbenotedthatknowledge extractiontechniquesarenotintendedtosubstitutetherationale andexperienceofa humanexpert duringadiagnosis,ratherto complementtheprocessandserveasanadditionalmethodology orguidelineforcommonproceduresinanalysis.

Theprocessisdescribedinthefollowingsteps.LetIbetheset ofindividualsandsthenumberofprobesoncethefilteringprocess hasfinished:

fr: A1×...×As→A1×...×As

(a1,...,as)→fr(a1,...,as)=(a1,...,as)

whereAi∈[0,1]isthevalueofthetermiusingthefunctionfrand isobtainedinthefollowingway:

ai= ai−min(Ai) max(Ai)−min(Ai)

Finallythevaluesarediscretizedbymeansoffuinaseriesof predefinedlevelst.

fu: A1×...×As

s

D×...×D (a1,...,as)→fu(a1,...,as)=(d1,...,ds) whereD=

i∈{0,...,t1}i·(1/(t−1)),wecansaythatdi=djifapply- ingthefunctionfu,withdj∈Difdi∈[dj−1/(2t), dj+1/(2t)].

Oncethetransformationis finished,thesetof individualsis determinedbythesubsetI⊆

s

D×...×Dofthedata,andJ48[12]

(9)

Fig.5. Decisiontree.

isusedtogeneratetherulesthatclassifytheindividuals.Theuse ofJ48,allowsrulestobeobtainedforclassifyinganindividualik∈I totheclasscjbymeansofrulessimilarto:

ri=(l1∧...∧lm)→cj

wheredpisthevaluefortheattributepfortheindividualik.Inthis waythesetisdefinedforrulesRthatclassifytheindividualsfor eachoftheclasses.Theserulescanberepresentedasadecision tree.

Fig.2representstherevisestageforMicroCBR.Theinputcorre- spondstothediscretizationofthevalues(ifthereusephasehas beensuccessful). Subsequently,knowledgeextractionisapplied through the J48. Finally, the relevant information extracted is stored(inconsequentialprobes,importantprobes,andresultsof the classification). At this stage, in addition to therepresenta- tionofthedecisiontree,a3Drepresentationwiththeinformation retrievedisdisplayed.ThedimensionalityisreducedbyusingMDS [13–15]

3.4. Retain

Ifthehumanexpertidentifiesrelevantinformationattherevise stage,theknowledgeisacquiredandtheinformationobtainedis stored.Theinformationthatisstoredcorrespondstotheclassifi- cationsthatareconsideredcorrect,thedecisionrulesgenerated thatareconsideredrelevant,andtheprobesmarkedasirrelevant.

Fig.6.Illustrationscomparingprobesfoundintherulesfortheknowledgeextractionstage.Withinthediagonal,theillustrationrepresentsthedensitiesforeachprobe.For theillustrationsoneithersideofthediagonal,thegraphisobtainedusingtheprobescorrespondingtoitsrowandcolumn+class3individuals,class2,andoclass1.

(10)

Fig.7.MainprobesrecoveredusingJ48.

TheinformationstoredisdividedintothecasesmemoryIandthe memoryofrulesRandSIrr.

Fig.2showsthestructureoftheretainstage.Takingtherevision oftheexpertintoaccount,thesystemlearnsfromthenewexpe- rienceandstorestheinformationthattheexpertestablishedas relevant.Thestoredinformationcanincludeprobes,classifications andrules.

4. Casestudy:classificationofCLLleukemiapatients

Microarray analysishasmade it possibletocharacterizethe molecularmechanismsthat cause severalcancers. With regard to leukemia, microarray analysis has facilitated the identifica- tion of certain characteristic genes in the different variants of leukemia[7,35,40].Cancerexpertsremarkontheimportanceofthe

identificationofthegenesassociatedtoeachtypeofcancerinorder toestablishthemostefficienttreatmentsforthepatients[41,42].

TheCancerInstituteinthecityofSalamancawasinterestedinnovel toolsfordecisionsupportintheprocessofCLL(ChronicLympho- cyticLeukemia)patientclassification.

TheInstituteprovideduswith91samplesofpatientdataand askedforatooltoprovidedecisionsupportintheexpressionarray analysisprocessandtoincorporateinnovativetechniquestoreduce thedimensionalityofthedataandidentifythevariables witha higherinfluenceintheclassificationofpatients.Thesamplescorre- spondedtopatientsaffectedbychroniclymphocyticleukemia.CLL isadiseaseoflymphocytesthatappeartobematurebutarebiolog- icallyimmature.TheseBlymphocytesarisefromasubsetofCD5-B cellsthatappeartohavearoleinautoimmunity.Thepathogene- sisofchroniclymphocyticleukemiaislikelyamultistepprocess,

(11)

Table4

Classifiedcasesforeachtypedetectedaccordingtothetotalnumberofexisting elements/classifiedelements/erroneouselements.

Type1 Type2 Type3 Total

46/48/5 12/9/0 33/34/3 91

42/44/5 10/8/0 29/29/3 81

37/38/4 8/8/1 26/25/2 71

31/30/3 7/8/3 23/23/2 61

26/24/2 5/7/3 20/19/1 51

24/24/3 5/4/0 17/18/3 46

initiallyinvolvingapolyclonalexpansionofCD5-Bcellsfollowedby thetransformationofasinglecell[43].CLLisoneofthefourmain typesofleukemia.About15.110newcasesofCLLwillbediag- nosedin2008.Approximately90.179peoplearecurrentlyliving withCLL,morethanthenumberofpeoplelivingwithanyother typeofleukemia.MostpeoplewithCLLareatleast50yearsold [44].CLLstartswithachangetoasinglecellcalledalymphocyte.

Overtime,theCLLcellsmultiplyandreplacenormallymphocytes inthemarrowandlymphnodes.ThehighnumberofCLLcellsin themarrowmaycrowdoutnormalblood-formingcells,andCLL cellsarenot abletofightoffinfectionlikenormal lymphocytes do[44].Theaimofthetestsperformedinthisstudyistodeter- minewhetheroursystemisabletoclassifynewpatientsbasedon previouslyanalyzedandstoredcases.

5. Resultsobtained

Thispaper has presented MicroCBR, a case-based reasoning systemspecifically designed to analyze data from microarrays, facilitatingthegroupingandclassificationofindividuals.Moreover, MicroCBRprovidesaninnovativemethodforexploringtheclassifi- cationprocessandextractingknowledgeintheformofruleswhich helpthehumanexpertstounderstandtheclassificationprocess andobtainconclusionsabouttherelevanceoftheprobes.MicroCBR wasappliedtoa realcasescenarioforclassifyingCLLleukemia patientsattheCancerInstituteinthecityofSalamanca.Thehuman expertsinthelaboratoryhaveremarkedontheadvantagesofusing MicroCBRasadecisionsupportsystemforCLLclassification,and haveespeciallynotedthefacilityinacquiringknowledgeandexpla- nations.

Section4presentedthecasestudyconsideredinthis report, whichclassified91CLLleukemiapatientsintogroups.Theaimof thecasestudywastoidentifytheprobesthatallowclassifyingthe CLLleukemiapatientsintosubgroups.Inaninitialtest,datafrom 90patients,wherethepreviousclassificationwasnottakeninto account,wereusedintheMicroCBRsystem.Thepre-processing phasebeganwith54.675probesandtheRMAwasappliedtoobtain theluminescencevaluesforeachoftheprobesandtohomogenize thevalues fromdifferentchips.Afterthepre-processing phase, thefilteringprocesswasapplied,notablyreducingtheprobesto 618,withoutincreasingtheerrorrate.Thecut-offphasereduced theprobescomingfromtheuniformfilteringfrom1.311to618 probes.Table1showsthenumberofprobesgeneratedaftereach ofthefilteringstages.Fig.4a–dpresentstwoprobesthatpassed theuniformitytestandcontaincut-offpoints.Fig.4aandcshowsa densityplotandFig.4banddpresentthehistogramfortheprobes 1552280atand219703at.AsseeninFig.4,neitheroftheprobes followsauniformdistribution,andbothofthemhavelowdensity pointswhichallowaseparationofindividuals.Thisstagetakesthe irrelevantprobesfilteringintoaccount,asprobe201909at,which isassociatedtotheYchromosome.

Thereusephasebeginswhentheprobeshavebeenfiltered,and generatesthemeshesforthegroupsaswellasthedistributionof theindividualsalongthespace.Themeshclosesttothenewcase wasthenselectedandtheclassificationwasmade.Toevaluatethe

Fig.8. RepresentationoflowdimensionalityprobeswithMDS.

proposedmodel,thesystemclassified90individualsandonenew individual,andtheresultsobtainedwerecomparedtotheprevi- ousexistingclassifications.Table2showsthenumberofelements withanerroneousclassification.Thefinalclassificationwascom- paredwiththedataobtainedusingaclusteranalysisdendrogram [45],PAM[11],theSVMclassificationtechnique[48]andKNN[49]

onthefiltereddata,andtheresultsofthefinalclassificationwere showntohavealowerrateoferror.Theproportionoferrorsin everygroupwascalculatedandtheKurskal–Wallistest[46]was applied,asshowninTable3,todeterminateifthemedianofthese proportionswasequal.Thistestwasperformedbyapplyinga5×2 crossvalidationfollowedbytheKurskal–Wallistest.Weoptedfor theKurskal–Wallistestbecauseitisanon-parametrictechnique thatdoesnotrequireanysuppositionsaboutthedata;asanalter- nativetheMann–Whitneycouldhavebeenused.Thedifference obtainedbetweenCBRandSVM wasnot consideredsignificant, althoughthemedianwaslower.

Intherevisephase,MicroCBRextractedtheknowledgeobtained duringtheclassificationprocess,asshowninFig.5.Fig.5presents atreewherethenodesareprobesandthebranchesaredecision rules.Theleafnodescontainthegroupassignedandthenumberof elementsclassifiedcorrectlyandincorrectly.Thetreerepresented inFig.5istheresultoftheclassificationofthe90patientsand thesinglenewpatient.Fig.6representssomegraphicswherethe valuesoftheretrievedprobesarecomparedtwobytwowithJ48 [12],andthediagonalshowsthedensitygraph.AsseeninFig.6,

(12)

thereareindividualsbelongingtoclass3whichcanbeeasilysep- aratedformtherest.Fig.7a–dshowsthedatacorrespondingto thefirstthreeprobesrecoveredwithJ48.Fig.7aandbshowsthe valuesfortheprobesoftheindividuals.Theindividualsforeach ofthegroupsarerepresentedindifferentcolours,andaregres- sionplaneisobtainedforeachofthegroups.Thegraphshowshow theindividualscanbeseparatedinthespaceusingtheseprobes.

Fig.7canddshowsthesameinformationbutellipsoidshavebeen addedinordertogroup50%ofthedataforeachofthegroups,thus facilitatingthespatialseparationofthedata.Toobtainavisualrep- resentationofthepatient’sclassification,weusetheMDS[13–15], andthedimensionalityofthedataisreducedtothree.Fig.8aand brepresentstheinformationonceMDShasbeenappliedand,as shown,theindividualsofthedifferentclustersareseparatedinthe space.

Aftercarryingoutexperimentalstudiesofthedifferentproce- duresappliedtothereasoningcycle,thesystemperformancewas evaluated.Inordertoevaluatetheglobalfunctioningofthesys- tem,weincludedanadditionaltest.InthistestMicroCBRcontains 45previouslyclassifiedindividuals,andaimstoclassifytheremain- ing46casesusingpreviousknowledge.Table4showstheresults obtainedafterthetest.EachcolumninTable4representsthesub- typesdetected.EachrowinTable4representsthetotalnumber ofelementsbelongingtoeachoftheclasses,thetotalnumberof elementsassignedtothatclass,andthenumberofmisclassified elements.Wecanseethatthenumberoferrorsdecreasesasmore casesareincorporatedintothesystem.

6. Conclusions

MicroCBRisaspecializedandnovelsystemthatintegratesthe stepsofanexpressionanalysiswithinthestagesofaCBRcycle.

MicroCBRisabletoincorporatetheknowledgeacquiredinprevious classificationsanduseittoperformnewclassifications,providinga muchappreciateddecisionsupporttoolfordoctors.Theuseofthe CBRsystemfacilitatestheselectionandeliminationofprobesthat provideanomalousresults.Havingbeendiscardedintheretrieve phase,theseprobesarenotconsideredinsubsequentiterationsof thereasoningcycle.Asdemonstrated,theproposedsystemreduces thedimensionalitybasedonthefilteringofgeneswithlittlevari- abilityandthosethatdonotallowaseparationofindividualsdue tothedistributionofdata.Italsopresentsaclusteringtechnique basedontheneuronalnetworkESOINN,whichisvalidatedwith aPAMtechnique.Finally,MicroCBRincorporatesatechniquefor knowledgeextractionandpresentsittothehumanexpertsina veryintuitiveformat.

Withtheresultsobtainedfromempiricalstudieswecancon- cludethatMicroCBRprovidesatoolthatdetectsgenesandprobes, whicharethemostimportantfactorforthedetectionofpathol- ogy,andfacilitatesaclassificationandreliablediagnosis,asshown bytheresultspresentedinthispaper.MicroCBRhasbeenapplied toclassifyCLLleukemiapatientsandallowsthehumanexpertto obtaininformationabouttheclassificationprocess andtoiden- tifytheprobesconsideredasimportantorirrelevantforfurther classifications.

Acknowledgment

SpecialthankstotheInstituteofCancerofSalamancaforthe informationandtechnologyprovided.

References

[1]K.S.Lina,C.F.Chien,Clusteranalysisofgenome-wideexpressiondataforfeature extraction,ExpertSystemswithApplications36(2–2)(2009)3327–3335.

[2]Z.K.Stadlera,S.E.Come,Reviewofgene-expressionprofilinganditsclinicaluse inbreastcancer,CriticalReviewsinOncology/Hematology69(1)(2009)1–11.

[3]Affymetrix.GeneChip®HumanGenomeU133Arrayshttp://www.affymetrix.

com/support/technical/datasheets/hgu133arraysdatasheet.pdf.

[4]T.Sawa,L.Ohno-Machado,Aneuralnetworkbasedsimilarityindexforclus- teringDNAmicroarraydata,ComputersinBiologyandMedicine33(1)(2003) 1–15.

[5] D.Bianchia,R.Calogero,B.Tirozzi,Kohonenneuralnetworksandgeneticclas- sification,MathematicalandComputerModelling45(1–2)(2007)34–60.

[6] V.Baladandayuthapani,S.Ray,B.K.Mallick,BayesianmethodsforDNAmicroar- raydataanalysis,HandbookofStatistics25(1)(2005)713–742.

[7] R.Avogadri,G.Valentini,Fuzzyensembleclusteringbasedonrandomprojec- tionsforDNAmicroarraydataanalysis,ArtificialIntelligenceinMedicine45 (2–3)(2009)173–183.

[8]J.Kolodner,Case-BasedReasoning,MorganKaufmann,SanMateo,1993.

[9] F.Riverola,F.Díaz,J.M.Corchado,Gene-CBR:acase-basedreasoningtoolfor cancerdiagnosisusingmicroarraydatasets,ComputationalIntelligence22 (3–4)(2006)254–268.

[10]S.Furao,T.Ogura,O.Hasegawa,Anenhancedself-organizingincrementalneu- ralnetworkforonlineunsupervisedlearning,NeuralNetworks20(8)(2007) 893–903.

[11]L.Kaufman,P.J.Rousseeuw,FindingGroupsinData:AnIntroductiontoCluster Analysis,Wiley,NewYork,1990.

[12]N.Saravanan,S.Cholairajana,K.I.Ramachandran,Vibration-basedfaultdiag- nosisofspurbevelgearboxusingfuzzytechnique,ExpertSystemswith Applications36(2–2)(2009)3119–3135.

[13]I.Borg,P.Groenen,ModernMultidimensionalScalingTheoryandApplications, Springer-Verlag,NewYork,1997.

[14]J.B.Kruskal,MultidimensionalScalingbyoptimizinggoodnessoffittonon- metrichypothesis,Psychometrika29(1)(1964)1–27(Springer,NewYork).

[15]M.Turea,F.Tokatlib,I.Kurtc,UsingKaplan–Meieranalysistogetherwith decisiontreemethods(C&RT,CHAID,QUEST,C4.5andID3)indetermining recurrence-freesurvivalofbreastcancerpatients,ExpertSystemswithAppli- cations36(2)(2009)2017–2026.

[16]J.Quackenbush,Computationalanalysisofmicroarraydata,NatureReview Genetics2(6)(2001)418–427.

[17]R.J.Lipshutz,S.P.A.Fodor,T.R.Gingeras,D.H.Lockhart,Highdensitysynthetic oligonucleotidearrays,NatureGenetics21(1)(1999)20–24.

[18]M.Taniguchia,L.L.Guana,J.A.Basarabb,M.V.Dodsonc,S.S.Moorea,Compar- ativeanalysisongeneexpressionprofilesincattlesubcutaneousfattissues, ComparativeBiochemistryandPhysiologyPartD:GenomicsandProteomics3 (4)(2008)251–256.

[19]O. Margalit, R. Somech, N.Amariglio, G. Rechav, Microarray basedgene expressionprofilingofhematologicmalignancies:basicconceptsandclinical applications,BloodReviews4(4)(2005)223–234.

[20]N.J.Armstrong,M.A.VandeWiel,Microarraydataanalysis:fromhypotheses toconclusionsusinggeneexpressiondata,CellularOncology26(5–6)(2004) 279–290.

[21] I.Jurisica,J.Glasgow,Applicationsofcase-basedreasoninginmolecularbiol- ogy,ArtificialIntelligenceMagazine,SpecialIssueonBioinformatics25(1) (2004)85–95.

[22] J.S.Aaronson,H.Juergen,G.C.Overton,KnowledgeDiscoveryinGENBANK,in:

ProceedingsoftheFirstInternationalConferenceonIntelligentSystemsfor MolecularBiology,AAAIPress,1993,pp.3–11.

[23]N. Arshadi, I. Jurisica, Data mining for case-based reasoning in high- dimensionalbiologicaldomains,IEEETransactionsonKnowledgeandData Engineering17(8)(2005)1127–1137.

[24]Affymetrix. Statistical Algorithms Description Document, http://www.

affymetrix.com/support/technical/whitepapers/saddwhitepaper.pdf.

[25]Affymetrix.Guideto ProbeLogarithmicIntensityError (PLIER)Estimation http://www.affymetrix.com/support/technical/technotes/pliertechnote.pdf.

[26]R.A.Irizarry,B.Hobbs,F.Collin,Y.D.Beazer-Barclay,K.J.Antonellis,U.Scherf, T.P.Speed,Exploration,normalization,andsummariesofhighdensityoligonu- cleotidearrayprobeleveldata,Biostatistics4(2003)249–264.

[27]R.Brunelli,Histogramanalysisforimageretrieval,PatternRecognition34 (2001)1625–1637.

[28]J.Jureˇckováa,J.Picek,Shapiro–Wilktypetestofnormalityundernuisance regressionandscale,ComputationalStatistics&DataAnalysis51(10)(2007) 5184–5191.

[29]N.Saitou,M.Nie,Theneighbor-joiningmethod:anewmethodforreconstruct- ingphylogenetictrees,MolecularBiologyandEvolution4(1987)406–425.

[30]T.Kohonen,Self-organizedformationoftopologicallycorrectfeaturemaps, BiologicalCybernetics(1982)59–69.

[31]B.Fritzke,Agrowingneuralgasnetworklearnstopologies,AdvancesinNeural InformationProcessingSystems7(1995)625–632.

[32]F.Shen,Analgorithmforincrementalunsupervisedlearningandtopologyrep- resentation,Tokyo:Ph.D.Thesis,TokyoInstituteofTechnology,2006.

[33]S.J.Redmond,C.Heneghan,AmethodforinitialisingtheK-meansclustering algorithmusingkd-trees,PatternRecognitionLetters28(8)(2007)965–973.

[34]T.Martinetz,CompetitiveHebbianlearningruleformsperfectlytopologypre- servingmaps, in:ICANN’93:InternationalConferenceonArtificial Neural Networks,Amsterdam,1993,pp.427–434.

[35]B.Guinn,A.F.Gilkes,E.Woodward,N.B.Westwood,G.J.Muftia,D.Linchc, A.K.Burnett,K.I.Mills,Microarrayanalysisoftumourantigenexpressionin presentationacutemyeloidleukaemia,BiochemicalandBiophysicalResearch Communication333(5)(2005)703–713.

Referencias

Documento similar

A este respecto es oportuno subrayar que la estructura de los tipos de omisión y la de los tipos de comisión por omisión son distintas. En los tipos de omisión: a) se prohíbe la