the next and probably the most complex major issue. The results are shown in Table 7. In all cases, relative numbers are used with the characteristics of the octaword machine as the reference.
Relative MB/sec
I-stream   D-stream   All Reads
1.00       1.00       1.00
1.12       1.68       1.42
1.08       1.61       1.36
1.14       1.74       1.52

Percent Memory
I-stream   D-stream   All Reads
1.00       1.00       1.00
0.91       1.36       1.15
0.88       1.31       1.13
0.95       1.41       1.27
Interfacing a VAX Microprocessor to a High-speed Multiprocessing Bus
A summary of the results in Table 7 follows:
• The fill size has a negligible effect on performance (less than 1 percent difference). The hexword alternative delivered an average of 1 percent better performance.
It is important to keep in mind that the simulation was performed assuming the minimum delay from main memory. In a multiprocessor system, the alternative with the lower miss rate increases in performance relative to the other alternatives as the main memory access time increases.
• Hexword fetches dropped the overall miss rate by almost 30 percent. (As expected, the I-stream miss rate improvement was much higher, almost 50 percent.)
• The megabytes per second required to maintain a given performance level increased by about 40 percent overall for the hexword fetch.
• As mentioned earlier, we were less concerned about megabytes per second than about the percentage of bus and memory controller cycles per second. In this light, the hexword alternative required about 18 percent more bus cycles and 16 percent more memory cycles to support read traffic to main memory. Eighteen percent and 16 percent may seem like a big increase, but it is important to look at overall bus bandwidth. On a write-through interconnect, the writes generally dominate the traffic.
• The overall bus traffic (taking into account writes) increased by only about 9 percent. Overall memory controller cycles increased by even less, only about 4 percent. The low increase resulted because the ratio of write cycles to read cycles is higher in the memory controller than on the XMI bus.

Table 8 Write Buffer Effectiveness

                           Ratio With Write Buffer /
                           Without Write Buffer*
          Write Buffer     XMI            Memory
          Miss Rate        Utilization    Utilization
Average   47.1%            .55            .49
Minimum   40.4%            .50            .42
Maximum   54.9%            .64            .58

* The utilization numbers are expressed as ratios between the utilization with a write buffer and the utilization without the write buffer.

Table 9 XMI Bus Utilization per CPU

          I-stream   D-stream
          Reads      Reads      Writes   Total*
Average   .89%       1.39%      4.41%    6.27%
Minimum   .24%       1.26%      3.57%    5.27%
Maximum   1.65%      2.10%      5.97%    7.25%

* The numbers in this column are averages of the total XMI bus utilization across the seven workloads. These numbers are not sums of the individual utilization percentages in each column.
Based on this data, we chose the hexword fill alternative. We felt the potential for significantly more consistent performance in large multiprocessor configurations (due to decreased cache miss rate) was worth the estimated 9 percent increase in bus utilization.
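The gap between the read-only increases (18 and 16 percent) and the overall increases (9 and 4 percent) for the hexword fill can be checked with a little arithmetic. The sketch below is an illustration only, not part of the original study. It assumes that the fill size changes read cycles alone and leaves write cycles unchanged, so that the overall increase equals the read increase weighted by the reads' share of total cycles. The function name `implied_read_share` is hypothetical.

```python
def implied_read_share(read_increase: float, overall_increase: float) -> float:
    """Share of baseline cycles that must be reads for the two increases to agree.

    Assumption (not stated in the article): writes are unaffected by the
    fill size, so overall_increase = read_share * read_increase.
    """
    return overall_increase / read_increase


# Approximate figures quoted in the text (hexword relative to octaword).
bus_share = implied_read_share(read_increase=0.18, overall_increase=0.09)
mem_share = implied_read_share(read_increase=0.16, overall_increase=0.04)

print(f"implied read share of XMI bus cycles:           {bus_share:.2f}")
print(f"implied read share of memory controller cycles: {mem_share:.2f}")
```

Under this assumption, reads would account for roughly half of the bus cycles but only about a quarter of the memory controller cycles, which agrees with the observation that the ratio of write cycles to read cycles is higher in the memory controller than on the XMI bus. The inputs are rounded figures from the text, so the shares are indicative only.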
Write Buffer Effectiveness and Overall Bus Utilization
We were pleased to find that the write buffer was about as effective as we had predicted. The data in Table 8 compares the XMI write traffic generated with and without a write buffer. The data is quite consistent. On average, the write buffer reduced the number of write cycles on the bus by slightly less than half (45 percent) and reduced the memory controller cycles by slightly more than half (51 percent).
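The 45 and 51 percent reductions follow directly from the average ratios in Table 8, since a with/without ratio r corresponds to a reduction of 1 - r. A minimal sketch of that conversion follows; the names `TABLE8_AVERAGE_RATIOS` and `percent_reduction` are hypothetical and introduced only for illustration.

```python
# Average ratios from Table 8:
# utilization with a write buffer / utilization without a write buffer.
TABLE8_AVERAGE_RATIOS = {"XMI bus": 0.55, "memory controller": 0.49}


def percent_reduction(ratio: float) -> float:
    """Convert a with/without utilization ratio into a percent reduction."""
    return (1.0 - ratio) * 100.0


# Round to whole percentages, as the text does.
reductions = {
    name: round(percent_reduction(ratio))
    for name, ratio in TABLE8_AVERAGE_RATIOS.items()
}

for name, pct in reductions.items():
    print(f"{name}: about {pct} percent fewer write cycles")
```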
Table 9 shows the bus utilization by the VAX 6200 CPU running the test benchmarks. Using the average bus utilization number of 6.27 percent still yields only 50 percent for a full eight-processor system; the 7.25 percent maximum value yields 58 percent utilization. These figures are well within our 75 percent utilization design goal, and we decided to implement the write-buffer instead of the write-back design.
Another, more conservative way to look at the data is to assume that we may not have the worst-case environment covered in any single benchmark. Therefore we should look at the "sum of maximums" to determine whether the design goal is met. Using the sum of maximums approach, we require 9.72 percent of the XMI per processor, or about 78 percent for eight processors. This figure is sufficiently close to our design goal of 75 percent maximum utilization to be acceptable.
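The eight-processor projections above can be reproduced from the Table 9 figures. The sketch below assumes, as the text implicitly does, that bus utilization scales linearly with the number of CPUs. The dictionary layout and the function name `system_utilization` are hypothetical and chosen only for illustration.

```python
# Per-CPU XMI bus utilization from Table 9, in percent.
TABLE9 = {
    "average": {"i_reads": 0.89, "d_reads": 1.39, "writes": 4.41, "total": 6.27},
    "maximum": {"i_reads": 1.65, "d_reads": 2.10, "writes": 5.97, "total": 7.25},
}
DESIGN_GOAL = 75.0  # percent, maximum utilization design goal
N_CPUS = 8          # full eight-processor system


def system_utilization(per_cpu_percent: float, n_cpus: int = N_CPUS) -> float:
    """Project total bus utilization, assuming linear scaling with CPU count."""
    return per_cpu_percent * n_cpus


avg_system = system_utilization(TABLE9["average"]["total"])
max_system = system_utilization(TABLE9["maximum"]["total"])

# "Sum of maximums": add the worst case of each traffic type, even though
# no single benchmark produced all three maximums at once.
sum_of_max = sum(TABLE9["maximum"][k] for k in ("i_reads", "d_reads", "writes"))
sum_of_max_system = system_utilization(sum_of_max)

for label, value in [
    ("average total", avg_system),
    ("maximum total", max_system),
    ("sum of maximums", sum_of_max_system),
]:
    status = "within" if value <= DESIGN_GOAL else "above"
    print(f"{label}: {value:.2f} percent ({status} the {DESIGN_GOAL:.0f} percent goal)")
```

The sum of maximums lands slightly above the 75 percent goal, which is why the text describes it as sufficiently close rather than as meeting the goal outright.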
Digital Technical Journal No. 7 August 1988