For an addition operation, the 32-bit words containing the exponents are sent to the main ALU. There they are passed to the A and B ports, which feed the shifter module. These ports drive all the gate arrays in parallel.

The exponents are then loaded into the XALU and the shift-amount ALU (SALU), which computes the alignment shift amount sent to the shifter. The SALU also generates some 20 branch conditions for the microcode. These conditions indicate the size of the alignment shift and whether any source operand is zero or a reserved operand. They also help to optimize the microcode flow.

The XALU, which selects the larger exponent and saves it for later use, has a 12-bit datapath and a register to hold the exponent. The size of this datapath is sufficient for the F, D, and G formats plus a guard bit for overflow or underflow detection. An ALU is provided to perform arithmetic operations on the exponent. The SALU, with an 11-bit datapath, subtracts the exponents to determine the alignment shift amount, which is always positive. The sign manipulation logic also resides in the SALU.
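The division of labor between the XALU and the SALU can be sketched in a few lines of Python. This is only an illustrative model: the function name `align_exponents` and the returned swap flag are assumptions made for the example, not names from the 8800 design.

```python
def align_exponents(exp_a: int, exp_b: int) -> tuple[int, int, bool]:
    """Model the exponent step of an add (hypothetical helper).

    Returns (larger_exponent, shift_amount, b_is_larger):
      larger_exponent -- the value the XALU would keep for later use
      shift_amount    -- the non-negative alignment shift the SALU would produce
      b_is_larger     -- True when operand B holds the larger exponent,
                         so operand A's fraction is the one to be shifted
    """
    if exp_a >= exp_b:
        return exp_a, exp_a - exp_b, False
    return exp_b, exp_b - exp_a, True


if __name__ == "__main__":
    print(align_exponents(130, 127))
    print(align_exponents(100, 140))
```

Because the smaller exponent is always subtracted from the larger one, the shift amount never goes negative, which matches the statement that the alignment shift is always positive.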

Next, the fractional part of the smaller operand is aligned by the shifter. This operation involves either one CPU cycle for F format operands or two CPU cycles for the D and G formats. The shifter unit shifts in the floating point format and can do a full 64-bit shift. The logic that determines the round bits is related to the alignment shift operation but is physically located in the priority encoder gate array. This gate array also contains some of the shifter functionality.

Nine gate arrays are used for the shifter unit. Of those, eight make up the datapath; the ninth is the control device. The shifter can accept either a 64-bit operand on the A and B ports or a 32-bit operand on either port. The shifter generates a 32-bit result that can be either the high-order or the low-order part of the answer.

Floating Point in the VAX 8800 Family

The shifter datapath gate arrays are identical: each effectively constitutes a byte slice of the design and performs a bit shift of up to seven places. Byte shifting is then performed by sending the correct shifter output to the correct byte position. This operation is facilitated by having all the outputs wired to the OR gates at all possible byte positions and by enabling the correct output.
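The two-stage scheme described above (a bit shift of zero to seven places followed by byte routing) can be modeled in software. The sketch below is a simplified behavioral model of a right shift only; `slice_shift_right` is a hypothetical name, and the model does not attempt to reproduce how bits cross between neighboring gate arrays in the real hardware.

```python
MASK64 = (1 << 64) - 1


def slice_shift_right(value: int, amount: int) -> int:
    """Right-shift a 64-bit value by 0..63 places in two stages."""
    if not 0 <= amount <= 63:
        raise ValueError("shift amount must be in 0..63")
    bit_part = amount & 7     # handled inside the byte slices (0..7 places)
    byte_part = amount >> 3   # handled by routing bytes to new positions

    # Stage 1: bit shift of at most seven places.
    shifted = (value & MASK64) >> bit_part

    # Stage 2: send each source byte to its destination byte position.
    # Bytes routed below position 0 fall off the end.
    out = 0
    for src in range(8):
        dst = src - byte_part
        if dst >= 0:
            byte = (shifted >> (8 * src)) & 0xFF
            out |= byte << (8 * dst)
    return out


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    print(hex(slice_shift_right(x, 12)))
```

Splitting the shift this way keeps each slice simple: no slice ever needs more than a seven-place bit shifter, and the remaining work is pure selection.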

The shifter performs floating point, integer, and logical shifts, as well as a number of miscellaneous functions. These include conversions from decimal-format data into integer format and vice versa. The masking of the exponent field and the insertion of the hidden bit are also done by the shifter.
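As an illustration of exponent masking and hidden-bit insertion, the sketch below unpacks a VAX F_floating longword. It assumes the architectural F_floating layout (sign in bit 15, excess-128 exponent in bits 14:7, high fraction bits in 6:0, low fraction bits in 31:16); the function name is hypothetical, and the hardware's internal operand arrangement may differ from this memory layout.

```python
def unpack_f_floating(word: int) -> tuple[int, int, int]:
    """Split a 32-bit F_floating longword into (sign, exponent, fraction).

    The returned fraction is 24 bits wide with the hidden bit inserted
    at bit 23. Zero and reserved operands (exponent field of 0) are not
    treated specially here; a real datapath must detect them separately.
    """
    sign = (word >> 15) & 1
    exponent = (word >> 7) & 0xFF            # excess-128 exponent field
    frac_hi = word & 0x7F                    # fraction bits 22..16
    frac_lo = (word >> 16) & 0xFFFF          # fraction bits 15..0
    fraction = (1 << 23) | (frac_hi << 16) | frac_lo
    return sign, exponent, fraction


if __name__ == "__main__":
    # 0x00004080 is the F_floating encoding of 1.0 (0.5 * 2**1).
    print(unpack_f_floating(0x00004080))
```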

After the alignment shift, the output of the shifter is directed to the main ALU on the bypass bus. There, the output is added to or subtracted from the fraction of the larger operand. The output of the ALU operation is now ready to be normalized in the shifter. In most cases a small normalize shift of at most one bit position left or right will be sufficient. The specialized hardware in the shifter handles this case and then rounds the result. Should a larger shift be required, then microcode will first direct the ALU result to the priority encoder gate array. There, the position of the leading 1 is found and used to determine the normalize amount for the subsequent cycle.
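The normalization step can be sketched as follows. The model treats the fraction as a plain integer whose leading 1 belongs at bit `width - 1`; the function name, the `width` parameter, and the convention of returning `(0, 0)` for a zero fraction are assumptions made for the example. Python's `int.bit_length()` stands in for the priority encoder.

```python
def normalize(fraction: int, exponent: int, width: int = 56) -> tuple[int, int]:
    """Shift the fraction so its leading 1 sits at bit width-1.

    The exponent is adjusted so that fraction * 2**exponent is preserved
    (apart from bits lost when shifting right).
    """
    if fraction == 0:
        return 0, 0                           # assumed convention for zero
    leading = fraction.bit_length() - 1       # position of the leading 1
    shift = (width - 1) - leading             # > 0: shift left, < 0: shift right
    if shift >= 0:
        return fraction << shift, exponent - shift
    return fraction >> -shift, exponent - shift


if __name__ == "__main__":
    print(normalize(0b1_0000_0000, 10, width=8))  # carry out of an add: one place right
    print(normalize(0b0100_0000, 10, width=8))    # cancellation: one place left
```

The two calls in the usage example correspond to the common case handled by the specialized hardware, a shift of one place right or left; larger shifts correspond to the slower path through the priority encoder.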

The rounding operation in the VAX 8800 CPU is unusual in that it is limited to the low-order eight bits. Therefore, a small 8-bit adder can be used for this operation. This adder is both faster and cheaper than the usual method of using a full 64-bit adder. The 8-bit adder is also sufficient to calculate the correct answer in over 99.5 percent of the addition operations. Should a carry-out be generated by this 8-bit rounding add, then clearly the result created is incorrect. In that case the computer is trapped and microcode invoked to correct the result.
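A small software model makes the fast-path and trap-path split concrete. The function names and the use of a full-width add as a stand-in for the microcode correction are assumptions for illustration only.

```python
def round_low_byte(fraction: int, increment: int) -> tuple[int, bool]:
    """Add a rounding increment using only the low-order eight bits.

    Returns (result, carry_out). When carry_out is True the result is
    wrong, because the carry was not propagated into the upper bits.
    """
    total = (fraction & 0xFF) + (increment & 0xFF)
    carry_out = total > 0xFF
    result = (fraction & ~0xFF) | (total & 0xFF)
    return result, carry_out


def round_fraction(fraction: int, increment: int) -> int:
    """Fast 8-bit path, with a full-width add as the assumed slow path."""
    result, carry_out = round_low_byte(fraction, increment)
    if carry_out:
        # Stand-in for the trap to microcode that corrects the result.
        return fraction + (increment & 0xFF)
    return result


if __name__ == "__main__":
    print(hex(round_fraction(0x1234, 1)))
    print(round_low_byte(0x12FF, 1))
    print(hex(round_fraction(0x12FF, 1)))
```

If the low byte were uniformly distributed and the increment were a single unit, a carry-out would occur for only 1 of the 256 possible low-byte values, about 0.4 percent of the time, which is consistent with the figure quoted above.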

Multiplication

As mentioned earlier, the 8800 contains a high-performance, custom-designed multiplier and divider unit. A number of factors impelled us to use such a unit. First, multiplication is a very frequent operation that is used extensively in matrix manipulation. For example, in the LINPACK benchmark, the time-critical routine contains an even mix of addition and multiplication operations.


Second, it was not possible to succumb to the temptation of using the main ALU to provide the division operation. This desire was natural since division is an infrequent operation, and the use of an ALU in a repeated subtract and shift mode was appealing. For example, the VAX 8600 uses the ALU for just that purpose. In the 8800 the main ALU also computes the virtual address. Since this datapath is very time-critical (in the 8800 as well as in most other computer designs), it cannot be allowed to go any slower. Including an extra path to accommodate division would have slowed down this critical path by around 5 ns, resulting in a 10 percent performance degradation for all operations.

Moreover, the available space for the multiplier and divider unit was limited since floating point operations are integrated with the rest of the machine. Approximately one-third of a module (12 inches by 16 inches) was available. In contrast, the VAX 8650 CPU dedicates a full module to multiplication.

The custom design of the multiplier and divider unit is basically a byte slice of a large word-sized multiplier and divider unit. The multiplier handles 8 bits per cycle; the divider handles 1 bit. Figure 4 shows the complete 56-bit by 8-bit multiplier with its eight byte-slice custom chips. Eight chips are used to form the required word size of 64 bits (56 data bits plus 8 guard bits). This arrangement is sufficient to handle F, D, and G format operations. H format operations are performed by partitioning the problem into many smaller 56-bit multiplications under microcode control.

The multiplicand is loaded into the MD latch after passing through the mask logic, which clears the sign and the exponent field and inserts the hidden bit. The PR latch and the PRGB are cleared at the start of the multiply. The PRGB contains the guard bits for the PR latch. At the end of a multiply, this latch will hold the bits required for a possible normalization shift and also for a rounding operation. The least significant eight bits of the multiplier are loaded into the multiplier latch. The first multiply cycle is now ready to be performed.

A 56-bit by 8-bit multiplication is performed between the contents of the MD and multiplier latches. The result is then added to the contents of the PR latch (which is initially zero) and then written back into it with a right shift of 8 bits. The PR latch is thus an accumulating latch and contains the 64-bit partial product of each multiplication operation. The next 8 bits of the multiplier are loaded into the multiplier latch, ready for the next cycle. This cycling continues until the multiplicand has been multiplied by all the multiplier bytes. This algorithm is similar to the one used in the VAX 8650 scheme, except that that processor has a narrower datapath of 32 bits.

[Figure 4 block diagram; recoverable labels: multiplicand input, multiplier input, 8-bit shift, PRGB, 64 bits, Booth recode, multiplier output]

Figure 4 Multiplier and Divider Unit

Digital Technical Journal No. 4 February 1987

Notice that the least significant byte of the partial product is discarded after each cycle and absorbed by the shifter unit. These bytes are required only for the H format multiply.
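The multiply cycle described above can be modeled as a loop over the multiplier bytes. This is a behavioral sketch under stated assumptions: `multiply_bytewise` is a hypothetical name, Python's integer multiply stands in for the 56-bit by 8-bit array (the Booth recoding shown in Figure 4 is not modeled), and the bytes shifted out of the partial product are collected in a separate value so they can be inspected.

```python
def multiply_bytewise(multiplicand: int, multiplier: int,
                      n_bytes: int = 7) -> tuple[int, int]:
    """Multiply 8 multiplier bits per cycle with an accumulating latch.

    Returns (pr, shifted_out):
      pr          -- final contents of the accumulating partial product
      shifted_out -- the low bytes shifted out of pr, one per cycle
    """
    assert 0 <= multiplicand < (1 << 56)
    assert 0 <= multiplier < (1 << (8 * n_bytes))
    pr = 0                                   # PR latch, cleared at the start
    shifted_out = 0
    for cycle in range(n_bytes):
        mbyte = (multiplier >> (8 * cycle)) & 0xFF   # next multiplier byte
        pr += multiplicand * mbyte                   # 56 x 8 multiply, then add
        assert pr < (1 << 64)                        # fits the 64-bit datapath
        shifted_out |= (pr & 0xFF) << (8 * cycle)    # byte leaving the latch
        pr >>= 8                                     # write back, shifted right 8
    return pr, shifted_out


if __name__ == "__main__":
    md = 0x00FF_FFFF_FFFF_FFFF
    mr = 0x00AB_CDEF_0123_4567
    high, low = multiply_bytewise(md, mr)
    print((high << 56 | low) == md * mr)
```

Recombining the two returned values reproduces the full product, which shows that the right shift after each cycle loses nothing as long as the shifted-out bytes are kept somewhere. For a result that keeps only the high-order part, those bytes can be dropped.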

Once completed, the result is sent out