AND THEIR FUNDAMENTAL RIGHTS
4. Social Rights
Performance
An area seldom addressed in compilers is the optimization of programs in memory to improve cache hit rates. Modern microprocessor performance has been increasing faster than that of the supporting memory systems. Taking advantage of this higher performance without introducing costly memories has required the use of small, high-speed cache memory. Cache memory contains recently used instructions and data. Hardware substitutes cache memory for main memory if the desired word is in the cache. This substitution is invisible to the program, other than as a performance improvement. Each main memory location must share a cache location with other memory locations because the cache is smaller than main memory. For a cache to be effective, it must contain enough of the program or data to ensure multiple reuse of the instructions or data.
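As a rough illustration of how main memory locations come to share cache locations, the sketch below assumes a direct-mapped cache in which an address maps to the cache offset given by the address modulo the cache size. The 64-kilobyte cache size and the name `cache_offset` are assumptions chosen for illustration, not details of any particular system.

```python
CACHE_SIZE = 64 * 1024  # assumed cache size in bytes (illustrative only)


def cache_offset(address: int, cache_size: int = CACHE_SIZE) -> int:
    """Return the cache offset an address maps to in a direct-mapped cache."""
    return address % cache_size


# Two addresses exactly one cache size apart compete for the same location.
a = 0x1234
b = a + CACHE_SIZE
print(cache_offset(a) == cache_offset(b))

# Neighboring addresses map to different locations and do not compete.
print(cache_offset(a) == cache_offset(a + 4))
```

Under this assumption, any two frequently used addresses that differ by a multiple of the cache size evict each other, which is the kind of competition that program placement tries to avoid.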
Figure 2 Examples of prof Output Listings
[Listings not reproduced. Callouts in the figure mark the total number of program cycles; the statistics describing calls to write_char, compiled from source file textoutput.c; the most heavily used line, line 120, located in procedure write_char; and line 36 of textoutput.c, which used 1,795,672 cycles, or 1.21% of the total.]
A valuable optimization is to position the program in memory so that the most frequently used memory locations never compete for the same cache locations. MIPS has built a tool called cord that rearranges the program to improve instruction cache utilization. This tool is made possible through the existence of precise profiling tools.

To use cord, the programmer compiles the program in the usual way. Pixie is used to add counters to the program for each basic block. After executing the instrumented program, prof is run with an option that creates a file containing dynamic execution information. That file is given to cord, along with the original executable module.
Cord computes the density for each function (procedure). The density is defined as the average number of cycles executed by each instruction in the function. Figure 3 is an example of eight functions, their sizes, cycle counts, and density.
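The density column of Figure 3 can be reproduced with a few lines of arithmetic. The sketch below divides each function's cycle count by its size exactly as the figure prints them (both in K). Treating the printed size as the divisor is an assumption about how the column was derived, since the text defines density per instruction.

```python
# (size in K, cycles in K) for each function, copied from Figure 3.
FIGURE_3 = {
    "A": (12, 960), "B": (24, 1680), "C": (20, 1200), "D": (8, 400),
    "E": (16, 640), "F": (12, 360), "G": (16, 320), "H": (20, 200),
}


def density(size_k: int, cycles_k: int) -> float:
    """Cycles per unit of size, using the units printed in the figure."""
    return cycles_k / size_k


densities = {
    name: density(size, cycles) for name, (size, cycles) in FIGURE_3.items()
}

for name, value in densities.items():
    print(f"{name}: {value:.0f}")
```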
Cord then creates a new executable module after sorting the functions according to density. Figure 3 also shows the order of the functions in the rearranged program.
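The sorting step can be sketched as follows. This is not cord's implementation: the `Function` type, the helper names, and the choice to pack functions back to back from address 0 are all assumptions made for illustration, and the placement cord actually chooses may differ from this simple packing. The sizes and cycle counts are those listed in Figure 3, given here in arbitrary order.

```python
from typing import NamedTuple


class Function(NamedTuple):
    name: str
    size_k: int     # size in K, as listed in Figure 3
    cycles_k: int   # cycles in K, as listed in Figure 3

    @property
    def density(self) -> float:
        return self.cycles_k / self.size_k


def order_by_density(functions):
    """Return the functions sorted from densest to least dense."""
    return sorted(functions, key=lambda f: f.density, reverse=True)


def lay_out(functions):
    """Assign consecutive start addresses (in K) in the given order.

    Back-to-back packing from address 0 is an illustrative assumption.
    """
    layout = {}
    address = 0
    for f in functions:
        layout[f.name] = address
        address += f.size_k
    return layout


unsorted_functions = [
    Function("E", 16, 640), Function("A", 12, 960), Function("H", 20, 200),
    Function("C", 20, 1200), Function("G", 16, 320), Function("B", 24, 1680),
    Function("F", 12, 360), Function("D", 8, 400),
]

ordered = order_by_density(unsorted_functions)
layout = lay_out(ordered)

for f in ordered:
    print(f"{f.name}: start {layout[f.name]}K, density {f.density:.0f}")
```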
This sort improves the cache hit rate because it places the functions that use the most cycles in memory so that they do not compete for the same cache locations as other frequently executing functions. The effectiveness of the sort is helped by two other features in the MIPS architecture. First, the caches are direct-mapped to memory so that each memory location corresponds to a single cache location. Second, the MIPS
operating system places virtual pages in physical memory so that adjacent virtual pages map to adjacent cache pages. As a result, the placement of functions by cord has very predictable effects on the cache.

Figure 3 also shows the arrangement in an unexpected way. Rather than being placed at the beginning of memory, the densest function (A) is placed farthest from the start of the cache. This arrangement has the effect of making the densest function share cache locations with the function 128 kilobytes (twice the cache size) away (H).

[Figure 3 plots density (10 to 80) against memory addresses (0K to 64K).]

KEY
NAME  SIZE  CYCLES  DENSITY
A     12K    960K   80
B     24K   1680K   70
C     20K   1200K   60
D      8K    400K   50
E     16K    640K   40
F     12K    360K   30
G     16K    320K   20
H     20K    200K   10

Figure 3 Cache Performance Improved by cord

Cord improves performance by as much as 20 percent to 30 percent on programs exceeding the cache size. It works solely by improving the efficiency of the instruction cache. Methods to improve data cache accesses are not available yet because of the more random nature of data accesses and the difficulty in getting accurate data reference information.

An advanced architectural simulator, Sable models (in C) the processor, including the TLB, pipeline, and register set, and the system design, incorporating the cache subsystem, main memory, and I/O interface. Developers can customize Sable for a unique system design. Sable has been used routinely at MIPS to
bring up the UNIX operating system before hardware is available. Simulation with Sable assures that the software is reliable and performs optimally when the hardware is actually available. Sable can be used with the debugger to provide full symbolic debugging in the simulation environment. Also, Sable can provide the same address traces that pixie provides to analyze operating system performance.

After a system has been brought up using Sable, other tools can assist in constructing the system on the real hardware. A simple debug monitor is available to work with the symbolic debugger to provide a symbolic debugging environment on the real hardware.

Digital Technical Journal Vol. 2 No. 2, Spring 1990