Pattern recognition and classification of images of biological macromolecules using artificial neural networks

(1)

BiophysicalJournal Volume 66 June 1994 1804-1814

Pattern Recognition and Classification of Images of Biological Macromolecules using Artificial Neural Networks

R. Marabini and J. M. Carazo

Centro Nacional deBiotecnologia(CSIC), UniversidadAut6noma, Campusde CantoBlanco, 28049Madrid, Spain

ABSTRACT Thegoal of this workwastoanalyzeanimagedata set and to detect the structuralvariabilitywithinthis set.Two algorithms for pattern recognition basedon neuralnetworksarepresented, onethatperformsan unsupervised classification (theself-organizing map) and the otherasupervised classification (the learningvectorquantization).Theapproachhasadirect impact in current strategies for structuraldetermination from electronmicroscopic imagesofbiologicalmacromolecules. Inthis workweperformedaclassification of bothaligned butheterogeneous imagedata setsaswellasbasicallyhomogeneousbut otherwise rotationallymisaligned image populations, inthe lattercasecompletely avoidingthetypical referencedependency of correlation-basedalignmentmethods. A number ofexamplesonchaperoninsarepresented.Theapproachiscomputationally fast and robust with respectto noise. Programsareavailable through ftp.

INTRODUCTION

The range ofproblems associated with detecting and rec- ognizingpatternsisvast.Itisthereforenotsurprising thatthe number ofapproachesthathave beenproposedovertheyears islarge. Focusing the attentionon the field of electronmi- croscopyofbiologicalmacromolecules, the problemwead- dress here may be described by the following sequence ofevents: givenasamplecontainingeitherabiochemically homogeneouspopulation of macromolecules or aheterogeneous one, obtain electron microscope images of these specimensandextractthe maximumamountofrelevanttwo- dimensional and/or three-dimensional biological information bymeansofimageprocessing operations performedon the whole set of images (for a review, see Frank, 1990).

Images are in general very noisy, so the processing tech- niques have tobe very robust.

Twomainquestionsrelated to patternclassificationappear whenaddressing this type ofproblem. The first one is the assessment of the homogeneity of the image data set(note thatabiochemically homogeneous population doesnotnec- essarilyrenderahomogeneoussetofimages, since different two-dimensional viewsofthesamethree-dimensionalstruc- ture may exist). Ifa number ofdifferent image classes are detected atthis stage, itisthen necessary to devisemethods toclassify theoriginaldatasetinto these classes.This step isusuallycarried outbytools such asmultivariatestatistical analysis(MSA)ortherelatedprincipalcomponentanalysis (PCA) (for a review, see Franket al., 1988b).

The second main problem appears afterthisclassification, sincenowthe classesmustbealignedinsuchawaythatan averaging process performed over the images within each classrenders newinformation with an enhanced signal-to-

Receivedforpublication 22 October 1993 and in final form 10 March 1994.

Addressreprint requeststoCarazo,J.M., CentroNacional deBiotecnologia (CSIC),UniversidadAut6noma,Campusde CantoBlanco, 28049 Madrid, Spain.Tel.: 34-1-585-4510, 34-1-585-4543;Fax:34-1-585-4506; E-mail:

[email protected].

X 1994bytheBiophysical Society 0006-3495/94/06/1804/11 $2.00

noise ratio. The alignment is usually carried outby cross- correlation withrespectto areferenceimage. Unfortunately, a clear-cut separation between these two problems appar- entlydoes notexist.

Thedifficultythat arises whentryingtoseparatethe problem ofclassification from theone ofalignment is that differences between imagesmaybedue eithertogenuine differences or to positional factors, such as rotation and translation.Ideally,wewouldprefertoclassifyanimagedata setwithoutconsideringthe latter as genuine differences.In principle, methods to define a description of an image in- dependent of itsrelativepositionarepossible,butthey donot seemtowork forgeneralcasesinvolvingverynoisyimages (Schatzand VanHeel, 1990;Franketal.,1992; Marabiniand Carazo, signalproc.,inpress).Itis therefore necessarytofirst

"align" the images before classification; however, aligning isonly well defined withinahomogeneousdata set. Clearly, theproblem is recursiveandthesingle practical approach to date is to iterate between alignment with respect to some reference, an always potentially problematic step (Boekema et al., 1986), and a posteriori classification.

Topics ofrecognitionandclassification havebeenprevi- ously treated in this context by van Heel (1986), using hierarchical ascendantclassification; Frank et al. (1988a), using a hybrid k-means and ascendant classification approach; Wagenknechtetal.(1989) and Carazoetal. (1989), using directly MSAto analyze rather continuous structural changes; and Carazo et al. (1990), usingfuzzy sets, among other authors(for a review, seeFrank, 1990).

The approach we propose in this work is conceptually differenttoprevious onesin thatit usesprinciplesofneural networks toaccomplishthe classification task (for a review of neuronal networks seeLippmann, 1987).Inessence, we use aparticular type ofneural network known as the self- organizing map (Kohonen, 1990) and we show that this method works remarkably well when confronted with a va- rietyoftasks, fromtheclassificationof aheterogeneous im- agedatasetafterafirstcycleofrefinement, revealingsubtle structural variations, to the directreference-free alignment of a homogeneous set of particle images through a

1804

(2)

classification-based angularassignment.The method is very fast, taking only afew minuteson amodemworkstation to classify hundreds ofimages.

The use of neural networksisincreasinginall thoseareas where, as in image processing, many solutions can be si- multaneously tested and large computer resources are needed. Theintrinsicparallelprocessingof the data and the abilitytoadapt makes theneural networkpotentiallyfaster and more robust than the classicalsequential process. Fur- thermore, the self-organizing maps are notparametric and needvery little aprioriknowledge.

Thealgorithm wepresentherepossesses anumber ofinteresting properties that may solve some of the problems encountered when following the procedures described above. As we will show, if the image data set is homogeneous, then the method is able to directly analyze sets of images that havenotbeenrotationally aligned, by perform- ingaclassification basedontheparticle rotation angle, there- foreeliminating the reference problem.Alternatively, ifthe population ofimagesisheterogeneous buttheyareknownto be correctly aligned, then the algorithm concentrates on genuine particle differences by performing a classification without anypriordatareduction step.

THEORETICAL BACKGROUND

The self-organizing algorithmmaps a set ofn-dimensional vectors onto a two-dimensional array of nodes in such a way that vectors projected onto adjacent nodes are more similar than vectors projected onto distant ones. Note that this algorithm does notwork like the usual n-dimensional onto two-dimensional mapping programs (Radermacher and Frank, 1985) in the sense that the emphasis is not on approaching the mathematical distances between the vectors but in plotting similarvectors close to one another in the output space. In the same way, the self-organizing algorithm could be interpreted as a nonlinear projection of theprobability density function of the n-dimensional input onto an output array of nodes. Accordingly, distances in theoutput space mustbeinterpreted withcaution, because they are not directly related to metric distances in the n-dimensional space.

Themapping isobtainediteratively byasequenceof steps.

Each steprequires thepresentation of aninputvectortothe arrayof outputnodes(eachnodehasassignedacertainvector that wewill refer in the following as"code vectors"). The inputvectorisused as argument to anactivationfunction that estimatesthesimilitude betweentheinputand thecodevectors.Finally,themostsimilarcode vector as well as itsneighborhood areadjustedsuch as toimprove itsresponse to any other similar input vector.

Following the description above, the input of the self- organizing map is the set of our experimentally obtained images, while the output is a two-dimensional array of nodes, withan image assigned to each ofthem. These assigned images start with some arbitrary value but, as the algorithm proceeds,will tend to approximate the probability density distribution oftheinput data (Kohonen, 1990).

In this way, we always have two pieces of information within this scheme: one is provided directly by the mj's (thecodevectors), and the otheris the final assignment of each input data to agiven code vector, which results in a classification of the data set.

Kohonen (1990) showed that this algorithm has some interesting properties. First, it represents most faithfully those dimensions of the input space along which the variance in the sequence of inputs

xi

is most pronounced.

These will oftencorrespond to the most important features of the inputs. Second,it tries to preserve continuity, i.e., it maps similar inputs

xi

to neighboring locations. Finally, it reflects differences in the sampling density of the input space in a natural way: regions from which inputs have occurred more frequently are mapped onto larger domains and therefore with better resolution.

The basic idea of the self-organizing algorithm is as follows.

Step 0: Assume a sequence ofinput vectors x(t) with a probability density function 5, where t is the "time" coor- dinate. The input vectorsx(t) areimagesrandomlyselected from our experimentally obtained images that are sequen- tially presented to the algorithm. Note that the expression x(t)does notrefer to achangeintheimagesthemselves,but to atemporalevolution of theinputdata. In ourapplication, this evolution refers to the sequential presentation ofinput vectors to the network. In this way,x(O)(x(t), t = 0) refers tothe firstimage presentedtothenetwork,x(l)(x(t),t = 1) refers to the secondimage, etc.

Step 1: Assume a set of code vectors

mj(t)

assigned to the output nodes. Supposethat

mj(t)

hasbeeninitialized in some way.

Step2: Eachvectorx(t)isnowpresentedtotheoutput set ofcode vectors at each successive instant oftime.

Step 3: x(t)iscomparedsimultaneouslywith each

mj(t),

forexample usingadistance

dj.

Note thatalthough thedis- tance

dj

is a critical factor in themappingalgorithm, there- sults will not reflect the mathematical distance in the n-dimensional spacebetween inputimages.

Step4: The bestmatching

mj(t)

is then selected.

Step5: The selected

nodej,

that will be called nodec, is usedtosetacenteraroundwhichaneighborhoodNofnodes is definedbysomefunctionh(r, t) thatdecreases withrand t,where risthedistance between nodes in theoutputplane (thismeans thatthe value ofhwill be higher the closer to

node,

^weare,andthattheneighborhood decreases insize as the algorithm proceeds). All the code vectors within this neighborhoodwill now bemodifiedinsuch a way that they will tendtox(t)evenmoreclosely.Obviously,since h(r, t) decreases withtheradiusrfromnodec, nodes closer to

nodec

will bemore strongly modified. To make the whole procedure more stable, anotherfunction a(t) isalso introduced.

The action of this a(t) is to set an upperlimit to the magnitude ofthe changes allowed at each presentation, and it decreases witht. a(t) takes values between0and 1;fora(t) closeto0themagnitudeof thechangeswill be verysmall, whilefora(t) closeto1thechange will besobigastoalmost

(3)

Volume 66 June 1994

force the codevector

mc(t)

tobecomeequaltox(t).a(t)and h(r, t) control the speed of the code vectorupdating. a(t) fixes an upper limit to the modification of

mj(t)

in each step, avoiding the formation oflocally but not globally ordered regions in the outputplane. On the otherhand, h(r, t) is a sigmoidalfunction thatincorporates alateral inhibition acting over nodes that aresomedistance apart,causingthenear- est nodes to be more strongly updated. The inhibition in- creaseswiththetime.

Step6: The procedure continues at step 2 for each x(t) untilconvergence.

SeeTable 1 for a summary of theprocedure.

Mathematically, theself-organizingmapalgorithmcanbe describedas aMarkov process (Ritterand Schulten, 1988) whose states are the code vectors mj and the transitions, which are describedin step5 (Table 1),aredeterminedby theprobability density distribution 8of the input x(t).The necessaryandsufficient conditionstoguaranteetheconver- gence of the process are:

t

lim ae(t)dt⁼ oo tX+o

lima(t) ⁼ 0

t-xo

A full discussion onconvergence and topological properties of theself-organizing map canbe found in the work ofCottrell and Fort (1986) and Ritter and Schulten (1986, 1988).

When the numberof stepsisfinite, asin allthereal cases, theconvergenceisnotfully guaranteed. RitterandSchulten (1986)havereported the appearance of two different kinds ofinstabilities. The firsttypeis relatedtothe finalposition ofthe code vectors on the outplane, while the second one affects thecode vectorfeatures themselves.

TABLE 1 Self-organizingmapalgorithm Step

no. Process

0 Convert theimagesI intovectorsx(t).

1 Initialize themicodevectors.

Initialize the codevectorsmjassociated with each output node to random values.

2 Presentoneinputvectorx(t).

3 Compute distancetoallnodesattimet.

Compute the distance djbetween theinputvector x(t) and each output nodejusingdj ⁼ ^||x(t)- mj(t)^||

4 Select output node withminimum distance.

Select the output node as that node associated with the code vectorsmj(t)thatminimizesdj.

5 Update codevectorsfor nodej andits neighborhood.

mj are updated for node j and all nodes in a neighborhood N whose radius is defined by some function h(r, t) decreasing withrandt.

mj(t+1) =

mf(t)

+ a(t)h(r,t)(x(t)- mj(t))

wherea(t) isadecreasing function that controls the magnitude ofthe changes with the time (O < a(t) < 1) and h(r, t) is a sigmoidalfunction that controls theneighborhood of the node j inwhich the codevectors areupdated.

6 Repeat(gotostep2)until process converges.

As to the firsttype ofinstability, it should be noted that forpracticalapplications these smooth, fluctuating distribu- tions over alarge spatial scale areusuallynotverydisturbing, as oneis interestedmainly inpreserving the correct neighborhood relationships along the most important feature dimensions. As to the secondtype, the fluctuations follow the variance of the input data. If this variance exceeds some critical value, the associated nodes become unstable and a reconfiguration can occur. The calculation of this critical value istheoretically provided, for some very simple cases, by Ritter and Schulten(1986). In essence, andtranslatingthe mathematical ideas behind thispossible source ofinstability intosimpleterms,thetheoryistellingusthat if the variance amongtheimagesis verylargeorthe number ofimagesis very small, then the algorithm would be unstable,which is alogical result.

Centering our attention on the real data applications to electron microscopy to be presented later in this work, we have never encountered any convergence problems ofthis second class when analyzingthe datasetsused here,a fact that indicates that there is a good trade-off between vari- abilityand number ofimagesintypicalelectronmicroscopic applications.

We note that theprojectionof theinputdataontothe output plane does not require any a priori knowledge of the number of classes present in the input data. Itis, however, obvious that adecisiononthe total number andspatial dis- tributionof outputcells hastobedone beforehandand that, since the codevectorstendtofollowtheprobabilitydensity distribution 8of theinput data, there isarelationship between the numberof outputcodevectors andboth the number of inputimagesand8.Still,andas wenotein thesectionpre- senting the examples, simple decisions on the number of codevectorshave been sufficienttoperformthe studies presented inthis work.

Self-organizing maps could bepresented as a "kind" of k-meansclustering algorithm, in the sense that this iscon- sidered its most similar "classical" method (Lippmann, 1987).There are,however, clear differences between thetwo approaches,such asthenotionofneighborhoodcode vector updating, thatisnotpresentink-means, andthe factthatin self-organizingnetseachinputvectorispresentedonly once (if the number of samples is greater than the number of iterations), while in k-means algorithms each image is pre- sentedonce foreach iteration.

Learningvectorquantization (Kohonen, 1990),avariant ofthe described algorithm, has been used to implement a supervisedpatternclassificationtool; by supervised we mean thata set ofalready classifiedvectors isnecessary.

This latter approach is different from the one described before; insteadoftryingto approximate the code vectors to theinput vectors(or theirprobability function), we wish to obtain vectors thateffectivelyrepresent each class and pro- duceoptimal decisions.This secondmethod has two distinct stages.

First is the training procedure,inwhich aset of classified vectorsis needed.Wepresenttheclassifiedinput vectorsx(t) BiophysicalJournal

1806

(4)

tothe networkasin thepreviousalgorithm,but theupdating is done in a different way. First, no neighborhood is taken into account. Second, if the input vector x(t) is well classified, then the codevector mj isupdated tomatch itmore closely, while iftheclassificationfails,thenmjisupdatedin such a way that the distance

dj

is increased.

Theupdatingstrategy isbasicallythesame one asin the previous case ofunsupervisedclassificationin that codevec- tormjismodifiedbyaquantity proportionaltothe difference betweenthe node and the lastassignedinput, i.e.,totheterm (x(t)-

mj(t)).

However,thereare twoimportant differences inthe exact way ofproceeding.Thefirst one is that noneigh- borhood is taken inaccountandthat,therefore,thefunction h(r, t) is notconsidered.The secondoneis that thequantity (x(t) -

mj(t))

is added or subtracted to the vector mj de- pendingonwhethertheclassificationwasconsideredcorrect or incorrect. Ifcorrect, this term is added and the updated codevector

mi

becomes more similartotheassigned input x(t).Ifincorrect,then thistermis subtracted andmjbecomes less similar tox(t) than itwasbefore.

Second istheclassification itself, which isaccomplished bycalculating the distance between the unclassifiedvectors and the tuned code vectors and selecting that vector (that class) which is the closest in somedefined way.

It can beproved that thedescribed strategygives an op- timumclassifierintheBayesiansense(Kohonen, 1990).The training procedure is summarized in Table 2.

HOWTHE SELF-ORGANIZING ALGORITHM WORKS

We have defined mathematically how the self-organizing map deals with generalinputvectorsx(t). We nowpresent

TABLE 2 Learningvectorquantizationtraining algorithm Step

no. Training procedure

0 Convert the classifiedimagesIiintovectorsx(t).

1 Initialize themjvectors.

Initialize the code vectors mi associated with each output node to an initial value (for example, to some of the already classifiedimages).

2 Present oneinputvector.

3 Computedistancetoallnodesattimet.

Compute the distancedjbetween the input vectorx(t) and each outputnodejusing

d,

⁼ ||x(t) - mj(t)11

4 Selectoutput node with minimum distance.

Select the output node as that node associated with the code vectormj(t)thatminimizesdj.

5 Updatecode vectors innodej.

mj areupdated in the following way:

mj(t+1) = mj(t) + a(t)(x(t) - mj(t))

ifxiscorrectly classified

mi(t+

1) = mj(t) ^- at(t)(x(t) ^- mj(t))

if xisincorrectly classified mk(t+ 1)= mk(t) if k is different fromj.

wherea(t)isadecreasingfunction that controlsthemagnitude of thechangeswith time(O<a(t) <1).

6 Repeat(gotostep2)until theprocess converges.

a moreconceptualexplanation of how the algorithm behaves whenconfronted with a set of different images,acting as x(t) vectors. We are mainly interested in showing, through a simpleexample, how the output code vectors evolve and how images areassignedtothese vectorsthroughtheirevolution.

The case study we aregoing to use in this section corre- spondstothe classification ofasetof lateral views obtained from a biochemically pure preparation of the cytoplasmic chaperonin Tailless Complex Polypeptide-1 side view (see example 2 for a full description of the data set). The het- erogeneity within theprocessed population ofimagescomes from the fact that, intentionally, the original images have been centered but notrotationally aligned.The evolutionof the code vectors at different iterations is shown in Fig. 1.

Fig. 1,1 presents the code vectors before anytuning.These initial code vectors have been calculatedusing thefollowing equation:

CVj

=

min(Ij)

+

(max(Ij)

-

min(Ij))*randnumberj

(1) thatis,the value ofpixeliof codevectorjisarandomnum- berlyinginthe interval[min(J), max(Ij)],where

Ij

refersto animagerandomlyselected fromourdatasetthat is different for each code vector.

Itis clear that the initial code vectorsgenerated in this way arealldifferent,and that itis almostimpossibletorecognize any characteristics of the images under analysis in them.

Otherforms of code vectorinitialization have beentested, but we havenotfoundanydifference inthe final results of thealgorithm.

At anyrate,there isalways one code vector that has minimum "distance"from a certain view of the particles under analysis. Letus assumethat this codevectoris theencircled one(Fig. 1, 1). After some iterations, this code vector and itsneighborhoodareconstrained to resemble this view more closely(Fig. 1, 2).

The codevectorssurroundingcodevectora8are notsub- ject to similar forces,because while the upperright corner (codevector a4) is going to be changed to match mainly vectorswith thisparticularview,thelower left(codevector a12) is surrounded by othercode vectors that are notwell defined,so it isnotgoingto be modified toresembleonly asingleview.Whatfinally happensinthisstepisthata4will bebetter defined than a12 and its surroundings.

The immediate consequence (Fig. 1, 3) is that the code vector a12 isgoing toreceive less and less influence from theparticular viewthatbegantheclassification because its distance tothis view is larger than the distance between this samespecimenview and codevector a4.Furthermore,after this initialmodification, a4anda12aregoing to attract different views, which in this case means rotated views, im- mediately forcing a12anda4 torotate inopposite directions.

Of course, a12 influences itsneighborhood just as a8 did.

Fig. 1,4shows how the bestdefined code vectors migrate toward the corners, or borders, of the output plane. At the sametime that thismigrationoccurs, the code vectors rotate following the mechanism described inFig. 1, 3.

(5)

Volume 66 June 1994

FIGURE 1 Codevectorevolution during theclassification procedure ofa setof404 imagescorresponding to side views of the chaperoninTCP-1 complex that have been centered butnot rotationally aligned.Inall thecasespresented in thiswork,imageswere of dimensions 64X64pixelsandwere ana- lyzedwithinacircle of radius 22pixels.The labelsattached to eachcode vectoraretheir uniqueimagenamesused withinourimage processing system.(1)Codevectorsbefore anytuning. Thecircledparticle(labeledas a8)isgoingtominimize the distance to a particularview.(2)Codevectorsafter50 iterations. The upperrightareabeginstomatch aparticular view.(3)Codevectorsafter 100 iterations. Thecode vectors start to differen- tiate.(4) Code vectors after 150 iterations.

The bestmatchingcodevectorsmigrate to theborderof the outputplane andcomplete theirdifferentiation.

Itisacentralresult inself-organizingtheory that theoutput codevectorstendtofollow theprobability densitydistribu- tion8of theinput data(Kohonen, 1990).Itisthereforepos- sible inthose casesin which 8 presentswelldefined peaks (for example, whenthesample under analysispresentstwo clearclusters) that the meaningful codevectorslie inopposite corners of themapand therestoftheoutput mapremains mainly empty.

FIELDS OF APPLICATION

We propose to use theself-organizingmapsmainly fortwo purposes: after a translational and rotational alignment of macromolecules,andafteratranslational(butnotrotational) alignment ofmacromolecules.

Translational and rotational alignment of macromolecules

Theclassificationfocuses onintrinsic differencesamongthe images. In a homogeneous population only minor changes are tobe anticipated, usually related to small deformations of the particles or inhomogeneous staining (example 2, Fig. 3). On the other hand, if we have a heterogeneous population ofviewsor specimens, then the self-organizing mapapproach wouldseparatethe differentclasses(example

3). Furthermore, in populations with impurities, self- organizingmapshelptodistinguish betweenthe specimens and the impurities.

Translational (butnot rotational) alignment of macromolecules

If we startfrom abasically homogeneous datasetthatis not rotationally aligned,thentheclassification willgroupimages accordingtotheir relativeorientation(examples 1and2). It isthereforepossible tousethecodevectors as aninterme- diate step toward reference-free rotational alignment, and even to useone of the codevectors asa reference ifa reference isneeded.

The second algorithm performs a supervised classification,thatis,a setofpreclassifliedvectorsis needed.Although wehavenotusedthisapproach for any realapplicationin our laboratory,webelieve thatitmight be useful forsomespecial cases. To provide the reader with an example ofhowthis otheralgorithm works,we will presentin example 4 a real dataapplication ofthe method.

EXAMPLES

Inthis sectionweillustrate theuseoftheself-organizingmap in addressing four different classification problems. The

1808 BiophysicalJournal

(6)

following protocolhas beenfollowed from the algorithmic pointof view.1)Ahexagonaltopologyhasalwaysbeen used fortheoutputplane (outputnodesarerepresentedonasquare gridjust for displaypurpose). 2)Thetotal numberof nodes ranged from25to100.3)Tworounds ofparticle presentation wereapplied,the firstonecontaining1000presentationsand the secondone10,000(asthetotal numberoforiginalimages wasabout400,presentationswereiteratedoverthe dataset.

Thatis, eachimagewaspresented 10,000/400 = 24times, approximately). 4)Values ofa(0) = 0.05 and a(0) = 0.01 wereused in thetwo rounds ofpresentations, respectively.

Theexplicit form of the functiona(t) was:

n.iterations-t a = a(0) 1-

n.iterations

)

wheren.iterations isthenumber of iterations. 5)Theneigh- borhoodfunctionh(r, t) hada Gaussian profile, startingas the whole outputplane in the firstround and about half of it in the secondround. The explicit form of h(r, t) is:

h(r, t)

⁼ exp

2ur(t)2

⁾

where

(n.iterations -t

a(t)

⁼ 1 + (Rmax-1)

n.iterations

)

n.iterations is the number of iterations; Rmax is the initial neighborhoodradius; and

rj-rcII

is the distancebetween the nodej and the node i. 6) In all cases, images were of dimensions 64 X 64 pixels, andtheywereanalyzed within acircular mask of radius 22pixels.

Weshouldnotethattypicalcomputingtimeswerefrom5 to10 minon aSiliconGraphicsIndigo4000workstation and that the general behaviorofthe method changedvery little when different choices ofa andneighborhood wereused.

The firsttwo examplesinvolve samples from which ho- mogeneousimagesets wereobtained andpreprocessed.The preprocessingamountedto atranslational, butnotrotational, alignmentof theparticles.Thegoalistodemonstrate that the method is able to correctly order the different images as a function of their rotational angle.

Example 3 dealswith thecaseofabiochemically heterogeneous sample givingriseto a setofimages showingdif- ferent views. In thiscase,theimagesweretranslationallyand rotationallycentered and thegoalwas todifferentiateamong the biochemically distinct classes.

Finally, example 4 usesthelearningvector quantization algorithmtoautomaticallyclassifytheimagesobtained from thebiochemically heterogeneous sample alreadyused inthe previous example. An initial classification had to be per- formedto provide thealgorithm withthe needed preclassi- fied training set.

Example 1

The specimenused is B. subtilis GroEL(Carrascosa et al., 1990). GroEL is amember of the so-calledchaperoning, a

family ofproteins that playsanactive role in the translation, folding, and assembly of polypeptides (for a review, see Gatenby andEllis, 1990). A total of 307 translationally but notrotationallyaligned imagescorrespondingtothetopview of theaggregate have been used in thisexample for classification. Fig. 2 summarizes the results. Figure 2, 1 shows input data.

Fig. 2, 2 and 3 show the final ("tuned") code vectors, whicheffectivelyrepresentall thepossibledifferent views of arotationallyunalignedsetofparticles(inthistypeofanaly- sis themeaningfulcodevectors arelocatedattheperiphery of theoutputmap). Fig. 2,4showssomeof the actualimages assigned to each code vector.

As a comparison with another already well established approach in thisfield, we also present the firstfour eigenimages from the PCA performed over the input data set (Fig.2,5).While PCA is abletodetectatrend ofvariability which isindicativeof thepresenceofasevenfoldsymmetry, no explicitclassification isperformedatthisstep. The self- organizing map,by contrast, directly group theimages ac- cordingtotheirrelativeanglesand renders codevectorsthat canbevery easily interpreted.

Example 2

Theaimofthissecondexampleistoshow theversatilityof the methodwhen appliedtoanotherspecimen witha symmetry lessevident than in thepreviouscase.

Thespecimen used is the TCP-1complex(side view) (Gao etal., 1993;Marcoetal., manuscriptsubmitted).The TCP-1 complex is a cytoplasmic chaperonin that contains the t-complex polypeptide TCP-1 among several other related polypeptides.

Thegeneralimage processing approach,aswellasthelay- outof thecorresponding figure(Fig. 3) presentingtheresults, arebasicallythesameasintheprevious example.Atotal of 404imagesweretranslationally butnotrotationally aligned before being used asinputto theself-organizing map.

Asbefore, while itisvery difficulttodiscernany pattern in the input images (a representative gallery is shown in Fig.3, 1), thetunedcodevectors atthe borders of the map clearlyshow differentrotated versionsof the TCP-1complex side view(Fig. 3,2and3).Besides the ensembleinformation provided by the codevectors themselves,we also havethe list of individualimages assignedtoeachvector(see gallery inFig.3,4).^Wehave thereforeassignedrotationalanglesto theinput images directly throughthis classification scheme.

Thefirst foureigenimages from PCAare also presented inFig. 3, 5, resulting inbasically thesame situation asthe one shown in theprevious example.

A question raised is how the method would perform if appliedin thefollowing three-step procedure. First, acycle of rotationalangleassignment byclassification is carriedout

(Fig. 3, 1-4). Second,theanglesofound isappliedtoeach input image. Third, a second cycle of classification onthe already alignedimages is accomplished.

(7)

Volume 66 June 1994

* . *

* * *_

* 99

__

⁰⁰

* 00

FIGURE 2 Unsupervised classification of B. subtilis GroEL chaperonin (top view). A totalof 307translationallybutnot rotationally aligned images have been used. The external diameter of theparticles in thischaracteristic view is--14.5nm.(I ) Galleryofparticles translationallybutnot rotationally aligned.(2)Tunedcodevec- torsshowingthat the main difference be- tweentheparticlesisarotationalmisreg- istration. (3) Tuned code vectors, detail.

(4) An image randomly chosen among those assignedtothe codevectorsshown inFig. 2, 4. Note that therewere noimages assignedtothe central codevectora60 and that, therefore, theimagethat is shownat the center marked withadotisnot anorigi- nal imagebut the codevectora60itself.

(5) First four eigenimages obtained by PCA. The first two suggesta sevenfold symmetry.

The resultsofsuchadouble classification scheme ispresented inFig. 3, 6, clearly proving that the algorithm now classifiesimages accordingtosubtlevariations, suchasdif- ferencesin contrast atthe left side ofthe view.

Example3

Inthisexampleweanalyze a data set formed by an already translationally and rotationally aligned set of 420 images, corresponding to topviews obtained froma biochemically heterogeneous sample containing the TCP-1 complex (Gatenby and Ellis, 1990; Gao et al., 1993) and the TCP- 1/actin binary complex. In this case, the aim is to detect this biochemical variability within the aligned set of images.

That this datasetdoesnotlend itselfto anobvious clas- sificationis evident from thegallery of images presented in

Fig. 4,1. However, the tuned codevectorsshown inFig. 4, 2clearlypresent apatternofvariability atthecenterof the particle, providingadirectinterpretation oftheglobal variability of the dataset.Thefinalassignment of the inputimages tothe code vectors ispresented in Fig. 4, 3, which is actually aclassification of the input data set.

Inthisparticularcase aPCAapproach,while useful, was more difficult to interpret than the self-organizing map (Fig.4,4).Inessence,the first eigenimage detected a vari- abilityatthecenterof theparticle,butevenin thisimage,and certainlyinthe other threeeigenimagesshownhere, thepat- tern ofvariability is rather complex.

Example4

The purpose ofthis example is to test the learning vector quantization algorithm for supervised classification. We BiophysicalJoumal

1810

(8)

FIGURE 3 Unsupervised classification of TCP-1-complex (sideview).Atotal of 407 translationally but not rotationally aligned images have been used. The maximum dimensions of the particles in this view are -15.3 X 16.3 nm. (1) Galleryofparticles.

(2)Tuned codevectors.(3)Tuned codevectors(detail). (4)Animagerandomlychosen among those assigned to the code vectors shownin4.Note that theimagemarkedwith adot,originally assignedtothe centralcode vectoraSS,is not a lateral view butatop one.

(5)First foureigenimages obtained byprin- cipal component analysis.(6)Codevectorsof the same initial image set after rotational alignment.

haveused the same images asin example 3. By visual inspection, approximately 200 particles were classified, 100 as TCP-1 alone and the other 100 asTCP-1/actin (the re- mainder of the 420particlesare noteasily classified). This manually classified data set was then divided into two groups of equal size, each one containing 50 particles of the two classes. The first group was then used to tunethe code vectors (Fig. 5, 3), while the other one was used as thetest dataset. Theaccuracy of thesubsequent classification of the test data set was 92-96%, depending on the choice upon thetuning subsets.

Figs. 5, 1 and 2, are galleries ofTCP-1 withand without actin. Fig. 5, 3 shows the tuned code vectors. From this

output vector map it is evidentthat virtually all the influence has been received by only one code vector of each class (a3/a13).

DISCUSSION

Inthiswork we present the applicationofaparticulartype of neural network, known asthe self-organizing map, to a number of interesting problems in the field of macromolecular structure determination byelectronmicroscopy and image processing. More precisely, we address those situa- tions in whichcrystals are not availableand the structural determination has to be based on the analysis of a large

ft,,,

z,

z6-

0 lllllllllllm,;;141

(9)

BiophysicalJoumal

FIGURE 4 Unsupervisedclassification of

TCP-1 and TCP-1/actin complexes (top 4 view) performed fromasetof 420images.

The externaldiameter of theparticlesin this

characteristic view is -16.5 nm.(1) Gallery *z * ofparticles translationally and rotationally

aligned.(2)Tuned codevectors(detail). (3) 3 Animage randomlychosen among thoseas-

signedtothe codevectorsshown in 2. (4) First foureigenimagesobtainedbyprincipal component analysis. The first eigenimage, labeled asO.vec,showedaclearvariability locatedatthecenterof theparticle,but the generalpatternwasdifficulttointerpret.

FIGURE 5 Supervised classification of TCP-1 complex and TCP-1/actin complex (topview) performedfromatotalsetof 420

translationaily and rotationally aligned imagesof 64 X64pixels. (1) Galleryof TCP- 1/actin complex particles. (2) Gallery of TCP-1 complex particles. (3) Tuned code vectors.Notice that codevectorsa3anda13 have receivedmostof the influenceduringthe tuningprocess.

collection ofimagesofisolatedparticles. Recentexamples,

suchas the resolution of the three-dimensional structure of theEscherichia coli ribosome(Franket al., 1991) orof adenovirus (Stewart et al., 1991), are good examples of the

biological potentials of this type ofstudy.

Hi.,~~~~~~~.

- _ m

-_I-- III.

The needfor powerful classification toolsis clearly rec- ognizedas akey problem, andmuch work has been devoted to this problem in recent years (for a review, see Frank, 1990).We present inthis work a totally new approach to this problemthatisbased onaparticulartypeof neural network Volume 66 June 1994 1812

(10)

and that is able toperform, in a rather simple way, many of themainclassification tasks.

Wehave used asexamplesanumber of studiesinvolving chaperonins,inparticular GroELandTCP-1,toprovidethe reader with a senseofthewide rangeofproblemsthat can be addressed withthe approach we propose. On structural biology grounds,we would liketofurther comment on the results obtained whenanalyzingtopviewsof the TCP-1 and TCP-1/actin binary complexes(example 3).The methodology we proposein this work is abletoprovideaverysimple, clear, and direct piece of information onthe location of the unfoldedactin within the TCP-1complex.Itis evident from an inspection of the code vectors shown in Fig. 4 that the location of the actin is in the central channel of the aggregate, suggesting a number ofinteresting hypotheses (amore in- depth analysis is presented by Marco et al. in an accompa- nyingpaper).

We have shown that if we start from abasically homogeneouspopulation of views, thentheneural network-based approach isable toclassify directly a setofrotationallymis- aligned images into groupsrelated to their relative orientation. Alternatively, ifthepopulationofviews is potentially heterogeneous but the images are known to be correctly aligned, a classification centered on more intrinsic differ- encesamongtheparticlesmaybeperformed. The application tonegativelystainedsamplesof GroEL and TCP-1indicate that the method worksremarkably well inavarietyof cases.

Placing thisnewwork in the context of the methodology commonly used in image processing, it is clear that it is particularly valuable when addressing two vital steps, one being the elucidation of the underlying ensemblestructure fromalarge collectionofhomogeneous misaligned images, and the other the detectionofdifferent viewsamong acol- lection of alignedinput images. As far as the analysis ofa setofmisaligned images is concerned, this methodology of- fersawaytoperformaclassificationbasedonsimilaritiesin the relative orientations of the views without any prior angular correlation-basedalignment. Onthe detection of structuraldif- ferences, theapproach doesnotneedan apriori knowledgeof the number of clusters present inthestarting population.

Comparing this approachtootherspreviously proposed in thisfieldfor structure-basedclassification,^weshouldmainly comment on MSA. MSA is a very good method for data reduction, capable ofextracting the main patterns of variability within agivendataset and, inmany cases, enabling us tounderstandthepatternsofheterogeneitypresent inour imagedata set,especiallyifcoupledwith some further classification step(forareview,seeFrank,1990).MSAhas also been recently used to perform an orientation classification (Dubeetal.,1993).The action of the neural network method weproposehereisquitedifferent from MSA in that it directly performs a classification (the assignment to specific code vectors)whileatthe same time it offers adirectvisualization ofthepattern ofvariability throughaninspectionofthe code vectors themselves. The algorithm is also very fast, since typical computer times for the analysis of 400 particles within a circle ofradius 22 were on the order of 5-10 min on a modern workstation. The classification process of our

neural network approach is also very easy to understand through inspection of the codevectors.

Taking into account the properties of the newapproach discussed above,we propose to usetheself-organizing map as a new routine tool forimage classification that may, if needed, becomplemented by other tools,suchasMSA.Con- siderationsonthespeedof the neural networkapproach,the simplicity and directness of the output results, and its ro- bustness against noise, make it a valuable tool.

Author's note-The programs usedtoperform this work, user'sguide,and acompleteexampleareavailableby anonymousftpandgopheratgopher.

cnb.uam.es.

The authors thank S. Marco for carefultesting of the programs and J. J.

Merelo forintroducingus tothe world of neural networks. We also wish tothank Dr. T. Kohonen formakinghis programs availablebyanonymous ftp. GroEL imageswerekindly provided byDr. J. L.Carracosa's group, and TCP-1images by Dr. N. Cowan's andDr.J. L.Carrascosa's groups. Com- mentsfromDr.J. Frankand Dr. P. Penczekonearlierversions of this work arealsoacknowledged and greatlyappreciated.

Thisworkwassupportedin partbyGrant PB91-0910 from Plan General de Promoci6n del Conocimiento (Direcci6n General de Investigaci6n Cientifica yTecnica, DGICYT,Spain).R.Marabini holdsaCSICpredoc- toralfellowship.

REFERENCES

Boekema E. J., J. A. Berden,andM. vanHeel. 1986. Structure of mito- chondrialF1-ATPase studiedbyelectronmicroscopyandimageprocess- ing.Biochim.Biophys. Acta. 851:353-360.

Carazo, J. M.,F. F.Rivera,E. L.Zapata, M.Radermacher,and J. Frank.

1990. Fuzzysets-based classification of electronmicroscopy imagesof biological macromolecules with anapplication toribosomal particles.

J.Microsc. 157:187-203.

Carazo, J. M.,T.Wagenknecht, and J. Frank. 1989. Variationsof the 3D structureof theE. coli ribosome in the range ofoverlapviews.anapplication of the methods of multicone and local single-cone three- dimensional reconstruction.Biophys.J. 55:465-477.

Carrascosa, J. L., G. Abella, S. Marco, and J. M. Carazo. 1990. Three- dimensional reconstruction of the 7-folded form of theB.subtilis GroEL chaperonin. J. Struct. Biol. 104:2-8.

Cottrell, M., and J. C. Fort. 1986.Astochastic model ofretinotopy:aself organizingprocess.Biol. Cybern.53:405-411.

DubeP., P. Tavares,R.Lurz, and M.vanHeel. 1993. Theportal protein of bacteriophage SPP1: aDNApumpwith 13-fold symmetry. EMBO J.

12:1303-1309.

Frank, J. 1990. Classification of macromolecular assemblies studied as single particles. Q.Rev.Biophys.23:281-329.

Frank, J.,J. P. Betraudiere, J. M.Carazo,A. Verschoor, andT.Wagen- knecht, 1988a. Classification of images of biomolecular assemblies:

a study of ribosomes and ribosomal subunits of Escherichia coli.

J. Microsc. 150:99-115.

Frank J., P. Penczek, R. Grassucci, and S. Srivastava. 1991. Three- dimensional reconstruction of the 70S Escherichia coli ribosome in ice:

the distribution of ribosomal RNA. J. CellBiol. 115:597-605.

FrankJ.,P.Penczek, and W. Liu. 1992.Alignment, classification, and 3-D reconstruction ofsingle particles embedded in ice. In Proceedings of the 10thPfefferkornConferenceonSignal Processing: Scanning Microscopy Supplement. P. W. Hawkes and W. 0. Saxton, editors. Scanning MicroscopyInternational.

FrankJ., M. Radermacher, T. Wagenknecht, and A. Verschoor. 1988b.

Studyingribosomestructureby electronmicroscopyandcomputer-image processing. MethodsEnzymoL 164:3-35.

GaoY., I. E.Vainberg,R. L.Chow, andN.J.Cowan. 1993.Twocofactors and cytoplasmic chaperonin are required for the folding of alpha-tubulin and beta-tubulin. Mol. Cell. Biol. 13:2478-2485.

Gatenby, A., and R. J. Ellis. 1990. Chaperone function: the assembly of

(11)

1814 Biophysical Journal Volume66 June 1994 ribulose biphosphate carboxylase-oxygenase. Annu. Rev. Cell Bio.

6:125-149.

Kohonen,T.1990. The selforganizing map. Proc.IEEE.78(9):1464-1480.

Lippmann,R. P.1987.An introductiontocomputing with neuronalnets.

IEEEASSP Magazine. 4-22.

Radermacher, M., and J. Frank. 1985. Use of nonlinear mapping in multivariate image analysis of molecule projections. Ultramicros- copy. 17:117-126. (Erratum, Ultramicroscopy 19:75 (1986)).

Ritter, H., and K. Schulten. 1986. On thestationary state of Kohonen's self-organizing sensorymapping. Bio. Cybern. 54:99-106

Ritter H., and K. Schulten. 1988. Convergence properties of Kohonen's topology conserving maps: fluctuations, stability and dimension selec- tion.Bio. Cybern. 60:59-71.

Schatz, M., and M.vanHeel, 1990.Invariant classification of molecular views in electron micrographs.Ultramicroscopy. 32:255-264.

Stewart,P.L.,R. M.Burnett, M.Cyrklaff, and S. D. Fuller. 1991. Image reconstruction reveals the complex molecular organization of adenovirus.

Cell. 67:145-154.

vanHeel,M.1986.Finding the characteristic views of macromolecules in extremelynoisy electron micrographs. InPattern Recognition in Practice.

Vol.II.E.S.Gelsema andL.N.Kanal, editors. ElsevierNorth-Holland, New York. 291-299.

Wagenknecht, T., J.M.Carazo, M.Radermacher,and J. Frank. 1989.Three- dimensional reconstruction of theribosome fromE.coli.Biophys. J. 55:

455-464.