Two major problems were encountered in the design of an arbitration scheme for the NMI bus. The first was the fact that between the CPUs and the I/O subsystems, called the NBIs, there was a possibility that a high-priority device could lock out a low-priority device from the bus. This is certainly possible with a fixed priority-arbitration scheme. To address this problem, the C Box implements a dynamic priority-allocation scheme that causes priority to be assigned between two groups: the I/O devices and the CPUs. Within these groups, the priority shifts between the two CPUs and the two I/O devices. For example, if all four devices wanted to use the bus all the time, the order in which the bus would be granted to the devices would be
first CPU, first I/O, second CPU, second I/O, first CPU, first I/O, second CPU, second I/O, etc.
This scheme guarantees that all devices on the bus will have nearly equal access to the bus, thus solving the lock-out problem.
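The grant order above can be reproduced with a small software model. The following is only a sketch under stated assumptions, not the C Box logic itself: the class name DynamicArbiter, the device names CPU0, CPU1, IO0, and IO1, and the rule that a winner and its group both drop to lowest priority are all hypothetical choices made to match the order given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicArbiter:
    """Toy model: priority shifts between two groups and within each group."""

    # Highest-priority member first in each list (hypothetical device names).
    groups: dict = field(default_factory=lambda: {
        "cpu": ["CPU0", "CPU1"],
        "io": ["IO0", "IO1"],
    })
    # Highest-priority group first.
    group_order: list = field(default_factory=lambda: ["cpu", "io"])

    def grant(self, requests):
        """Return the device that wins this arbitration cycle, or None."""
        for group in list(self.group_order):
            for device in list(self.groups[group]):
                if device in requests:
                    # The winner drops to lowest priority within its group...
                    self.groups[group].remove(device)
                    self.groups[group].append(device)
                    # ...and its group drops below the other group.
                    self.group_order.remove(group)
                    self.group_order.append(group)
                    return device
        return None


if __name__ == "__main__":
    arbiter = DynamicArbiter()
    everyone = {"CPU0", "CPU1", "IO0", "IO1"}
    print([arbiter.grant(everyone) for _ in range(8)])
```

With all four devices requesting continuously, the model grants the bus in the order first CPU, first I/O, second CPU, second I/O, and then repeats, so no requester is locked out.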
The second problem involves the "memory busy" situation. Whenever the memory subsystem cannot process more requests, it sends a "memory busy" signal. It could happen, for instance, that a CPU accesses the bus and attempts to write to memory. Upon receiving a memory-busy signal, the CPU will abort the write. When memory is released, some other device will access the bus and perform a write, thus filling the write queue in memory. Once again, the first CPU re-arbitrates, accesses the bus, and tries to write. Once again, that CPU receives a memory-busy signal. And so on.
The NMI arbitration scheme mentioned above solves this problem in which a device might get locked out of memory. As implemented, the arbitration scheme saves the priority state at the time before the memory-busy signal was asserted. The arbitration logic then restores that state so that the device that received the signal will get the bus when the memory-busy signal is deasserted.
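The save-and-restore behavior can be illustrated with a simplified model. This sketch assumes a flat round-robin arbiter rather than the two-group scheme, and the class and method names (BusyAwareArbiter, memory_busy, busy_deasserted) are hypothetical; it shows only the idea of rewinding the priority state, not the actual NMI logic.

```python
class BusyAwareArbiter:
    """Round-robin arbiter that rewinds its priority state after memory-busy."""

    def __init__(self, devices):
        self.priority = list(devices)  # highest priority first
        self.last_snapshot = None      # priority state before the latest grant
        self.saved = None              # state saved when memory-busy aborted a grant

    def grant(self, requests):
        """Grant the bus to the highest-priority requester and rotate it to the back."""
        snapshot = list(self.priority)
        for device in snapshot:
            if device in requests:
                self.priority.remove(device)
                self.priority.append(device)
                self.last_snapshot = snapshot
                return device
        return None

    def memory_busy(self):
        """The latest grant was aborted by memory-busy: keep the state from before it."""
        if self.saved is None:
            self.saved = self.last_snapshot

    def busy_deasserted(self):
        """Memory is free again: restore the saved state so the aborted device wins."""
        if self.saved is not None:
            self.priority = list(self.saved)
            self.saved = None


if __name__ == "__main__":
    arbiter = BusyAwareArbiter(["CPU0", "CPU1"])
    both = {"CPU0", "CPU1"}
    first = arbiter.grant(both)   # CPU0 wins, but its write is aborted
    arbiter.memory_busy()
    arbiter.busy_deasserted()
    print(first, arbiter.grant(both))  # the same device wins again
```

Without the restore step, the rotated priority would hand the bus to the other device as soon as memory was released, which is the starvation pattern described above.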
Bus Bandwidth
For the processors on the interconnect, bus bandwidth involves two components: read bandwidth and write bandwidth. The problem of inadequate read bandwidth is addressed by having a high hit-rate cache. The higher the hit rate, the fewer the requests to memory. The problem of inadequate write bandwidth can be treated in two ways. The first way is to have a write-back cache like the one on the VAX 8650 processor. Such a cache writes a block to memory only when the cache block is deallocated. This technique can significantly reduce the write bandwidth requirements.
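The relationship between hit rate and read traffic is simple arithmetic: only the references that miss the cache reach memory. The short sketch below makes that explicit. The function name memory_reads and the numbers in the example are illustrative assumptions, not measurements of any VAX processor.

```python
def memory_reads(references: int, hit_rate: float) -> float:
    """Expected number of read references that miss the cache and go to memory."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return references * (1.0 - hit_rate)


if __name__ == "__main__":
    # Illustrative values only: raising the hit rate halves the memory reads.
    for rate in (0.90, 0.95):
        print(rate, memory_reads(1000, rate))
```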
Digital Technical Journal No. 4 February 1987
In multiprocessor systems like the 8800, however, in which each processor has an internal cache, this technique becomes complicated. In these systems, a data item can exist not only in memory but also in all the caches. To maintain coherency, each write-back cache would have to notify the other cache when the first cache writes. This technique usually leads to a complex protocol and design implementation.
Another approach in a multiprocessor system, the one used in the 8800, is to implement write-through caches. In such an approach, all write references go directly to memory so that each cache on the bus can "see" all write activity. The caches can then be invalidated. Such an approach greatly simplifies the protocol for cache coherency but, as discussed earlier, generates a high degree of write traffic. The unique design of the write buffer helps to reduce this traffic, although not as much as a write-back cache would. In the 8800 processor, however, the write buffer reduces traffic enough so that the two VAX 8800 processors can write at their maximum bandwidths on the NMI bus.
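One way the write buffer reduces traffic, described later in this article, is by packing consecutive writes that fall within the same octaword. The sketch below counts bus write cycles under that rule. It assumes a single-entry buffer and byte addresses, and the function name bus_writes is hypothetical; the real buffer organization is not described here.

```python
OCTAWORD_BYTES = 16  # a VAX octaword is 128 bits


def bus_writes(addresses):
    """Count bus write cycles when consecutive writes to one octaword are packed."""
    cycles = 0
    current_block = None
    for address in addresses:
        block = address // OCTAWORD_BYTES
        if block != current_block:
            # A write to a different octaword starts a new bus write cycle.
            cycles += 1
            current_block = block
    return cycles


if __name__ == "__main__":
    trace = [0, 4, 8, 12, 16, 20, 0]
    print(len(trace), "writes issued,", bus_writes(trace), "bus write cycles")
```

In this toy trace, seven processor writes collapse into three bus cycles because the first four and the next two each stay within one octaword.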
Coherency in a Multiprocessor System
A multiprocessor system with internal caches presents a number of interesting coherency issues when sharing data. Ideally, if one processor writes to a location and the other processor reads that location, the read will always get the data that was written. In practice, achieving this condition is difficult. Several major questions arise: Did the read happen before the write or after it? What happens if both processors write to the same location at the same time? Unless controlled, these situations can produce unpredictable results.
If programs on the processors want to share data, they must use the interlock instructions in the VAX architecture. Only after an interlock instruction is processed will the memory location be guaranteed to have the correct data. The general method is as follows. Processes must decide to share a block of memory. One memory location is called the software lock, and only one process at a time is allowed to write to (or lock) that location. This is accessed with an interlock instruction, for example, the branch on bit set and set interlocked (BBSSI) or the add aligned word interlocked (ADAWI) instructions.
Aspects of the VAX 8800 C Box Design
Upon gaining the software lock, a given process can proceed to write any location in the shared block. Read-write coherency will be assured only if the other processes sharing that data observe the protocol of obtaining the software lock before modifying the data structure.
The VAX interlock instructions are implemented using interlock microinstructions. These enable a processor to lock and unlock the memory subsystem. Once locked, this subsystem excludes further attempts to lock it until an unlock has occurred. Thus only one processor or I/O system can lock the memory subsystem at any one time.
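The software-lock protocol can be sketched in a high-level language. In the following model, a Python threading.Lock stands in for the memory-subsystem interlock, and test_and_set plays the role of a BBSSI-style instruction that reads and sets the lock bit as one indivisible step. The class InterlockedMemory, the functions worker and run, and the shared counter are all hypothetical illustrations, not VAX code.

```python
import threading
import time


class InterlockedMemory:
    """Toy memory whose interlock admits one read-modify-write at a time."""

    def __init__(self):
        self._interlock = threading.Lock()  # stands in for the memory-subsystem lock
        self.lock_bit = 0                   # the software lock location
        self.shared_counter = 0             # data in the shared block

    def test_and_set(self):
        """Return the old lock bit and set it, as one indivisible step."""
        with self._interlock:
            old = self.lock_bit
            self.lock_bit = 1
            return old

    def clear(self):
        """Release the software lock with an interlocked write."""
        with self._interlock:
            self.lock_bit = 0


def worker(memory, iterations):
    for _ in range(iterations):
        while memory.test_and_set() == 1:  # lock was already held: try again
            time.sleep(0)                  # yield so the holder can finish
        memory.shared_counter += 1         # update protected by the software lock
        memory.clear()


def run(workers=2, iterations=1000):
    memory = InterlockedMemory()
    threads = [threading.Thread(target=worker, args=(memory, iterations))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return memory.shared_counter


if __name__ == "__main__":
    print(run())
```

The count is correct only because every worker obtains the software lock before touching the shared data, which is the protocol the text requires of all cooperating processes.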
When each processor has an internal cache, there is one more mechanism that keeps the two processors coherent. While one processor is performing a write to memory and while the write command is on the NMI bus, the other processor will examine its cache store to see if it contains a copy of that data. If the data is there, it is marked invalid. The next request for this data will then result in a cache miss and a subsequent fetch to memory. This simple approach is possible because the VAX 8800 caches are write-through. Although all writes are seen on the bus, the write buffer packs together consecutive writes within an octaword. Therefore, the number of invalidation cycles performed by a processor will be reduced. When an interlock write is performed, the contents of the write buffer are sent to memory. Thus the interlock mechanism ensures that data coherency will work under all conditions. Figure 6 illustrates the events that achieve coherency in the 8800.
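The invalidate-on-write mechanism can be modeled in a few lines. This is a simplified sketch under several assumptions: each cache tracks individual addresses rather than blocks, the write buffer is omitted, the writing cache keeps a copy of what it writes, and the names WriteThroughCache, peers, and snoop_write are hypothetical.

```python
class WriteThroughCache:
    """Toy write-through cache that invalidates on writes seen from another processor."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # valid cached copies: address -> value
        self.peers = []       # other caches watching the same bus

    def read(self, address):
        if address not in self.lines:
            # Cache miss: fetch the current value from memory.
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value    # assumption: the writer keeps a copy
        self.memory[address] = value   # write-through: memory is always updated
        for peer in self.peers:
            peer.snoop_write(address)  # the write is visible to the other cache

    def snoop_write(self, address):
        # Mark our copy invalid so that the next read misses and refetches.
        self.lines.pop(address, None)


if __name__ == "__main__":
    memory = {0x100: 1}
    left, right = WriteThroughCache(memory), WriteThroughCache(memory)
    left.peers.append(right)
    right.peers.append(left)
    right.read(0x100)           # right processor caches the old value
    left.write(0x100, 2)        # right's copy is invalidated
    print(right.read(0x100))    # miss, then fetch of the new value
```

Because memory always holds the latest value under write-through, invalidation alone is enough here; no cache ever has to supply data to another cache.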
Figure 6  Multiprocessor Coherency [diagram; recoverable labels: Left Processor, Right Processor, Cache, Write Buffer, NMI, Memory, Software Lock; "Write interlock forces write buffer contents to memory"; "Other processor sees write on NMI and looks in cache for invalidation"]
Summary
The general concepts used in the design of the C Box are well known to computer designers. Our goal was to achieve a simple yet high-performance design that avoided unnecessarily complex solutions that did not give comparable increases in performance. The choices made have yielded a design that fully supports the multiprocessor concept. The VAX 8800 system can translate addresses and access data faster than any previous VAX processor.
Acknowledgments
All those who worked on the VAX 8800 system contributed to the thinking that went into the C Box design. Special thanks go to Dave Sager for keeping things going.
References
1. VAX Architecture Handbook (Maynard: Digital Equipment Corporation, Order No. EB-26115-46, 1986): 7-11 to 7-19.
2. A. Smith, "Cache Memories," Computing Surveys, vol. 14, no. 3 (September 1982): 473-530.
3. S. Mishra, "The VAX 8800 Microarchitecture," Digital Technical Journal (February 1987, this issue): 20-33.
4. T. Fossum, J. McElroy, and W. English, "An Overview of the VAX 8600 System," Digital Technical Journal (August 1985): 8-23.
5. S. Farnham, M. Harvey, and K. Morse, "VMS Multiprocessing on the VAX 8800 System," Digital Technical Journal (February 1987, this issue): 111-119.
New Products
Paul J. Natusch
David C. Senerchia
Eugene L. Yu