The major benefit of incremental collection is real-time performance. Real-time applications cannot tolerate the long and potentially unbounded GC pauses imposed by most non-incremental collectors. Figure 4-1 shows the time profile of the bipsctrl program running with the non-incremental, generational mostly-copying collector. Dark areas indicate that control is inside the application itself, and lighter areas indicate that control is inside the garbage collector. Each of the lighter areas lasts on the order of hundreds of milliseconds, which is unacceptable for real-time performance. Appel et al. indicate that for a garbage collector to be real-time, every GC pause must be bounded by a very small constant, and that for interactive applications this maximum pause should be less than 100 milliseconds [Appel et al. 88].
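The two pause statistics used throughout this chapter, maximum and median pause, can be computed from a list of pause durations as in the sketch below. The function name `pause_summary` and both pause lists are hypothetical, not measurements from this study; the 100-millisecond bound is the interactive limit cited from [Appel et al. 88].

```python
from statistics import median

INTERACTIVE_LIMIT_MS = 100  # maximum pause for interactive use [Appel et al. 88]

def pause_summary(pauses_ms, limit_ms=INTERACTIVE_LIMIT_MS):
    """Return (max pause, median pause, True if every pause is under the limit)."""
    worst = max(pauses_ms)
    return worst, median(pauses_ms), worst < limit_ms

# Hypothetical pause lists in milliseconds (illustration only).
print(pause_summary([250, 310, 180]))    # (310, 250, False)
print(pause_summary([4, 7, 9, 12, 60]))  # (60, 9, True)
```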
Figure 4-1: Time profile of bipsctrl running with the non-incremental collector (time scale in milliseconds)
With the incremental collector, GC is divided into smaller chunks of work, as illustrated in figure 4-2. Instead of pausing for hundreds of milliseconds each time the program enters garbage collection mode, the program now pauses much more frequently, but each pause is significantly shorter: most are under 10 milliseconds, and the maximum pause is still well within 100 milliseconds. Comparing figures 4-1 and 4-2, one can also see that the total execution time with the incremental collector is longer. This is due to the overhead of real-time (incremental) collection.
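A minimal sketch of what dividing GC into bounded chunks means: a toy mark phase that scans at most `budget` objects per increment. It is not the mostly-copying collector described here; it omits the mutator and the page-protection barrier, and its cost model (a fixed, hypothetical per-increment overhead) only illustrates why shorter pauses can come with a longer total time. All names are hypothetical.

```python
from collections import deque

def trace_in_increments(graph, roots, budget):
    """Mark every object reachable from `roots`, scanning at most `budget`
    objects per increment. Returns (marked objects, objects scanned per increment)."""
    marked, gray = set(), deque()
    for r in roots:
        if r not in marked:
            marked.add(r)
            gray.append(r)
    increments = []
    while gray:
        scanned = 0
        while gray and scanned < budget:
            obj = gray.popleft()
            scanned += 1
            for child in graph.get(obj, ()):
                if child not in marked:
                    marked.add(child)
                    gray.append(child)
        increments.append(scanned)
        # A real incremental collector would resume the mutator here and rely on
        # a barrier (e.g. page protection) to keep the trace consistent.
    return marked, increments

def modeled_times(increments, ms_per_object, ms_per_increment):
    """Hypothetical cost model: each increment pays a fixed entry/exit overhead.
    Returns (longest pause, total GC time) in milliseconds."""
    pauses = [n * ms_per_object + ms_per_increment for n in increments]
    return max(pauses), sum(pauses)

# A chain of 10 objects: 0 -> 1 -> ... -> 9.
chain = {i: [i + 1] for i in range(9)}
_, incr = trace_in_increments(chain, [0], budget=3)
_, stw = trace_in_increments(chain, [0], budget=10**9)
print(incr, modeled_times(incr, 1, 2))  # [3, 3, 3, 1] (5, 18)
print(stw, modeled_times(stw, 1, 2))    # [10] (12, 12)
```

The bounded budget caps every pause, but each extra increment pays the fixed overhead again, so total collection time rises.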
Figure 4-2: Time profile of bipsctrl running with the incremental collector (time scale in milliseconds)
Looking across the Max Pause and Median Pause rows in table 4-I on page 50, one can see that the incremental collector has much smaller maximum and median pauses than the non-incremental one for both benchmark programs. This improvement is not free, however. For instance, the total time for words running with the incremental generational collector is 24% longer than with the non-incremental generational collector ((18.18 - 14.65)/14.65 = 24%). This is because more garbage collection sessions are initiated in the incremental case than in the non-incremental case (27 vs. 18). The incremental collector inherently needs more memory to run as efficiently as the non-incremental collector: because previous-space heap pages are not reclaimed until the end of a collection, comparatively less space is available. Therefore, if the heap is not large enough, collection is initiated frequently and much of the scanning is wasted.
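The memory argument can be made concrete with a deliberately crude model, under the assumption that the incremental collector keeps roughly as many previous-space pages occupied as there are live pages until a collection finishes. The function name and all numbers are hypothetical; the model only shows the trend that a small heap makes the incremental collector trigger relatively more collections.

```python
import math

def collections_needed(total_alloc_pages, heap_pages, live_pages, incremental):
    """Rough count of collections triggered while allocating total_alloc_pages.

    Hypothetical model: a collection starts whenever the free pages run out.
    After a collection only live_pages stay occupied in the non-incremental
    case; the incremental case is assumed to also hold about live_pages of
    previous-space pages until the collection finishes."""
    retained = live_pages if incremental else 0
    free_per_cycle = heap_pages - live_pages - retained
    if free_per_cycle <= 0:
        raise ValueError("heap too small for this model")
    return math.ceil(total_alloc_pages / free_per_cycle)

# Hypothetical workload: 1200 pages allocated in total, 20 pages live.
for heap in (100, 400):
    print(heap,
          collections_needed(1200, heap, 20, incremental=False),
          collections_needed(1200, heap, 20, incremental=True))
# 100 15 20
# 400 4 4
```

With the larger hypothetical heap, both variants trigger the same number of collections, so the incremental penalty largely disappears.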
Along the same lines, the incremental generational version of bipsctrl is only 7% slower than the non-incremental generational version (4.61 sec vs. 4.29 sec), likely because the program's behavior causes the same number of garbage collection sessions (3) to be initiated in both cases, with the heap expanded once in each. The amount of scanning is therefore likely comparable in both cases, and so are the total execution times.
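Putting the figures quoted for both benchmarks side by side shows how the slowdown tracks the ratio of GC sessions initiated. Only the numbers come from the text (table 4-I); the `RUNS` structure and `compare` function are illustrative.

```python
# Total time (seconds) and GC sessions initiated, as quoted from table 4-I.
RUNS = {
    "words":    {"non-incremental": (14.65, 18), "incremental": (18.18, 27)},
    "bipsctrl": {"non-incremental": (4.29, 3),   "incremental": (4.61, 3)},
}

def compare(benchmark):
    """Return (slowdown in whole percent, ratio of GC sessions), incremental vs. not."""
    t0, n0 = RUNS[benchmark]["non-incremental"]
    t1, n1 = RUNS[benchmark]["incremental"]
    return round(100 * (t1 - t0) / t0), n1 / n0

for name in RUNS:
    print(name, compare(name))
# words (24, 1.5)
# bipsctrl (7, 1.0)
```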
The costs of incremental collection come not only from scanning and forwarding, but also from mprotect calls and page-fault traps. Table 4-II itemizes these costs. On the DECStation 3100, an mprotect call costs 45 µsec. A page trap takes 200 µsec, which includes the time to interrupt the program and to enter and exit the trap handler before control returns to the program. The matrix gives the scanning and forwarding overhead (in milliseconds) as a function of the number of pointers scanned and the size of each forwarded object (in words). For example, scanning a physical page of 1024 pointers and forwarding the 1024 64-word objects that these pointers reference takes 16 milliseconds. Scanning and forwarding is inexpensive unless the collector has to forward a large number of relatively large objects, e.g. when it scans an object with 4096 pointers, occupying 4 physical pages, in which each pointer references a 100-word object to be forwarded. Such an object is somewhat unlikely, but even then the collector handles it in a reasonable amount of time (94 milliseconds). In addition, since the collector does not copy large objects exceeding one heap page (512 bytes = 128 words) in size, the overhead of forwarding these objects, which only requires changing their space identifier(s), is very low. This shows up as a sharp drop in time overhead between the 100-word and 128-word columns: forwarding 4096 100-word objects takes 94 milliseconds, but forwarding the same number of 128-word objects takes only 20 milliseconds.
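A small lookup over the fully legible rows of Table 4-II (512 to 4096 pointers) reproduces the examples above. The names `WORD_SIZES`, `SCAN_FORWARD_MS` and `scan_forward_ms` are hypothetical; the values are copied from the table.

```python
WORD_SIZES = (4, 8, 16, 32, 64, 100, 128, 256, 512, 1024, 2048, 4096)

# Scanning and forwarding overhead in ms from the fully legible rows of
# Table 4-II, keyed by the number of pointers scanned.
SCAN_FORWARD_MS = {
    512:  (4, 3, 4, 12, 8, 11, 3, 4, 4, 4, 4, 4),
    1024: (4, 8, 8, 12, 16, 24, 4, 3, 4, 4, 4, 4),
    2048: (19, 16, 19, 19, 35, 43, 12, 12, 11, 11, 11, 11),
    4096: (24, 27, 35, 43, 66, 94, 20, 20, 16, 20, 24, 24),
}

def scan_forward_ms(n_pointers, words_per_object):
    """Measured cost of scanning n_pointers and forwarding the objects they
    reference. Raises ValueError for an object size not in the table and
    KeyError for a pointer count not in these rows."""
    column = WORD_SIZES.index(words_per_object)
    return SCAN_FORWARD_MS[n_pointers][column]

print(scan_forward_ms(1024, 64))   # 16: one physical page of pointers
print(scan_forward_ms(4096, 100))  # 94: objects smaller than a heap page are copied
print(scan_forward_ms(4096, 128))  # 20: page-sized objects change space identifier
```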
Table 4-II indicates that none of these overheads is overwhelmingly large on its own. But when a cost is incurred again and again, the cumulative overhead becomes significant. In the original design of the collector, which made no attempt to minimize the number of calls to protect memory, mprotect was the dominant overhead. With the optimization described in section 3.2.4.2, however, mprotect ceases to be the major cost of incremental collection. In fact, mprotect and page-fault overhead together account for only 3% of the total execution time in the benchmark programs. When GC is initiated too often, on the other hand, the cost of scanning and forwarding becomes the dominant overhead in the incremental collector. The data in table 4-I support the claim that total execution time grows with the number of collections initiated. Frequent garbage collection probably means that the heap is not big enough for the application's allocation needs. Instead of spending most of its effort reclaiming stale objects, the collector then likely spends a disproportionate amount of time scanning objects that were scanned only recently.
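One common way to reduce mprotect calls is to protect each run of adjacent pages with a single call, since mprotect takes a start address and a length. This sketch counts the calls under that assumption; it is not necessarily the optimization described in section 3.2.4.2, it only models the cost rather than calling mprotect, and the page list and function names are hypothetical.

```python
MPROTECT_MS = 0.045  # cost of one mprotect call on the DECStation 3100 (Table 4-II)

def coalesce(pages):
    """Group page numbers into (first_page, page_count) runs of adjacent pages.
    Each run could be covered by a single mprotect(addr, len, prot) call."""
    runs = []
    for page in sorted(set(pages)):
        if runs and runs[-1][0] + runs[-1][1] == page:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            runs.append((page, 1))
    return runs

def protect_cost_ms(pages, coalesced):
    """Modeled mprotect cost: one call per run if coalesced, else one per page."""
    calls = len(coalesce(pages)) if coalesced else len(set(pages))
    return calls * MPROTECT_MS

pages = [9, 3, 4, 5, 10, 20]  # hypothetical pages to protect
print(coalesce(pages))        # [(3, 3), (9, 2), (20, 1)]
print(round(protect_cost_ms(pages, False), 3), round(protect_cost_ms(pages, True), 3))  # 0.27 0.135
```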
It is therefore important, when tuning an application to run with the incremental collector, to make the heap large enough to prevent the excessive scanning and forwarding that results from too many GC initiations. Under most circumstances, however, Table 4-II suggests that the real-time performance of the incremental collector will remain satisfactory even for very memory-intensive programs: the combined trap and mprotect overheads plus the time spent scanning and forwarding objects are likely to keep each pause within the 100-millisecond limit.
Table 4-II: Overhead of page fault trap, mprotect and scanning
Hardware platform: DECStation 3100
Trap overhead = 200 µsec/trap
mprotect overhead = 45 µsec/call
Scanning and forwarding overhead (milliseconds), by number of pointers scanned (# PTRS) and words per forwarded object (# WORDS/OBJECT):
                                # WORDS/OBJECT
# PTRS |    4    8   16   32   64  100  128  256  512 1024 2048 4096
-------+------------------------------------------------------------
     2 |    -    -    -    -    -    -    -    -    -    4    -    -
     4 |    -    -    4    -    -    -    -    -    -    -    -    -
     8 |    -    -    -    -    -    -    -    -    -    -    -    -
    16 |    -    -    -    -
    32 |    -    -    -    -
    64 |    -    -    -    -
   128 |    -    -    -    4    4    4    -    -    -    -
   256 |    4    -    4    -    4    4    -    -    -    4
   512 |    4    3    4   12    8   11    3    4    4    4    4    4
  1024 |    4    8    8   12   16   24    4    3    4    4    4    4
  2048 |   19   16   19   19   35   43   12   12   11   11   11   11
  4096 |   24   27   35   43   66   94   20   20   16   20   24   24
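The estimate that trap, mprotect and scanning costs fit within 100 milliseconds can be checked roughly, assuming that an increment's traps, its mprotect calls and the worst scanning/forwarding cost in Table 4-II all land in a single pause. The constants are the measured values quoted in this section; the pause shape, the function names and the example count of 4 mprotect calls are hypothetical.

```python
TRAP_MS = 0.200             # page-fault trap on the DECStation 3100 (Table 4-II)
MPROTECT_MS = 0.045         # one mprotect call (Table 4-II)
WORST_SCAN_FORWARD_MS = 94  # largest entry in Table 4-II: 4096 pointers, 100-word objects
PAUSE_LIMIT_MS = 100        # interactive bound [Appel et al. 88]

def worst_case_pause_ms(n_traps, n_mprotect_calls):
    """Assumed pause shape: the traps and mprotect calls plus the worst
    tabulated scanning/forwarding cost, all inside one pause."""
    return n_traps * TRAP_MS + n_mprotect_calls * MPROTECT_MS + WORST_SCAN_FORWARD_MS

def spare_mprotect_calls(n_traps=1):
    """How many mprotect calls still fit under the limit in that worst case."""
    slack = PAUSE_LIMIT_MS - n_traps * TRAP_MS - WORST_SCAN_FORWARD_MS
    return int(slack // MPROTECT_MS)

print(round(worst_case_pause_ms(1, 4), 3))  # 94.38
print(spare_mprotect_calls())               # 128
```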