We have focused our evaluations on the specific implementation of squids in the Hamal architecture, i.e. a hardware-recognized field within capabilities. A number of other approaches to the problem of pointer disambiguation can be used in place of or in addition to this technique.
12.4.1 Generation Counters
We can associate with each pointer an m-bit saturating generation counter that indicates the number of times the object has been migrated. If two pointers being compared have the same generation counter (and it has not saturated), then the hardware can simply compare the address fields directly.
The migrated (M) bit in Hamal capabilities is a single-bit generation counter that deals with the common case of objects that are never migrated.
This completely eliminates aliasing overhead for applications that choose not to make use of forwarding pointers. Using two generation bits handles the additional case in which objects are migrated exactly once as a compaction operation after the program has initialized its data (this is one of the techniques used in [Luk99]). Again, overhead is completely eliminated in this case.
More generally, generation counters are effective in programs for which (1) objects are migrated a small number of times, and (2) at all times most working pointers to a given object have the same level of indirection.
They lose their effectiveness in programs for which either of these statements is false.
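The comparison rule described above can be sketched in C as follows. This is a minimal illustration, not the Hamal hardware: the `gen_ptr` layout, the choice of m = 2, and the names `migrate` and `gen_compare` are assumptions made for the example. When the counters differ or have saturated, the sketch reports that the comparison cannot be decided from the pointers alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pointer layout for illustration only: an m-bit saturating
   generation counter stored alongside the address (here m = 2). */
#define GEN_BITS 2u
#define GEN_MAX  ((1u << GEN_BITS) - 1u)   /* saturated counter value */

typedef struct {
    uint64_t addr;  /* address field */
    unsigned gen;   /* number of migrations, saturating at GEN_MAX */
} gen_ptr;

typedef enum { PTR_EQUAL, PTR_DIFFERENT, PTR_UNKNOWN } cmp_result;

/* Pointer handed out after the object is migrated to new_addr. */
static gen_ptr migrate(gen_ptr p, uint64_t new_addr) {
    gen_ptr q = { new_addr, p.gen < GEN_MAX ? p.gen + 1u : GEN_MAX };
    return q;
}

/* Same unsaturated generation: compare the address fields directly.
   Otherwise the pointers cannot be disambiguated without further work
   (for example a trap or a software comparison). */
static cmp_result gen_compare(gen_ptr a, gen_ptr b) {
    if (a.gen == b.gen && a.gen != GEN_MAX)
        return a.addr == b.addr ? PTR_EQUAL : PTR_DIFFERENT;
    return PTR_UNKNOWN;
}
```

With m = 1 this reduces to a single migrated bit: any pointer whose object has moved is already saturated, so only never-migrated pointers take the fast path.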
12.4.2 Software Comparisons
Instead of relying on hardware to ensure the correctness of pointer comparisons, the compiler can insert code to explicitly determine the final addresses and then compare them directly, as in [Luk99]. Figure 12-3 shows the code that must be inserted; for each pointer a copy is created, and the copy is replaced with the final address by repeatedly checking for the presence of a forwarding pointer in the memory word being addressed. The outer loop is required in systems that support concurrent garbage collection or object migration to avoid a race condition when an object is migrated while the final addresses are being computed. In a complex superscalar processor, the cost of this code may only be a few cycles (and a few registers) if the memory words being addressed are present in the data cache. The overhead will be much larger if a cache miss occurs while either of the final addresses is being computed.
Making use of hardware traps, and placing this code in a trap handler rather than inlining it at every pointer comparison, has the advantages of reducing code size and eliminating overhead when the hardware is able to disambiguate the pointers. On the other hand, overhead is increased when a trap is taken due to the need to initialize the trap handler and clean up when it has finished. In our simulations, we found that 25% of the trap cycles were used to perform the actual comparisons. Inlined software comparisons would therefore incur roughly a quarter of the trap overhead, which is roughly the same performance as hardware comparisons with two squid bits.
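temp1 = ptr1;
temp2 = ptr2;
do {
    flag = 0;    /* reset on every pass so that the outer loop can terminate */
    /* follow forwarding pointers to the final address of the first object */
    while (check_forwarding_bit(temp1)) temp1 = unforwarded_read(temp1);
    /* do the same for the second object; if it was forwarded, repeat the */
    /* outer loop in case the first object migrated in the meantime */
    while (check_forwarding_bit(temp2)) {
        temp2 = unforwarded_read(temp2);
        flag = 1;
    }
} while (flag);
compare(temp1, temp2);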
Figure 12-3: When software alone is used to ensure the correctness of pointer comparisons, the compiler must insert the above code wherever two pointers are compared.
The cost of software comparisons can be reduced (but not eliminated) by incorporating squids, as shown in Figure 12-4. This combined approach features both exponential reduction in overhead and fast inline comparisons at the expense of increased code size and register requirements.
temp = ptr1 ^ ptr2;
if (temp & SQUID_MASK)
<pointers are different>
else
<compare by dereferencing>
Figure 12-4: Using squids in conjunction with software comparison.
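As a rough, runnable model of Figure 12-4, the sketch below combines the squid test with the dereferencing loop of Figure 12-3. Everything about the encoding is an assumption made for illustration: a pointer is a 32-bit word whose top two bits hold the squid, memory is a small array of words with a forwarding bit, and `make_ptr`, `final_address`, and `same_object` are hypothetical helpers. The model is single-threaded, so the outer retry loop that guards against concurrent migration is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding for illustration: the top 2 bits of a 32-bit
   pointer are its squid; the low bits index a simulated memory. */
#define SQUID_SHIFT 30
#define SQUID_MASK  (3u << SQUID_SHIFT)
#define ADDR_MASK   (~SQUID_MASK)
#define MEM_WORDS   16

typedef struct {
    bool     forwarded;  /* forwarding bit of this memory word */
    uint32_t value;      /* forwarding pointer when forwarded is set */
} word;

static word memory[MEM_WORDS];

static uint32_t make_ptr(uint32_t squid, uint32_t addr) {
    return (squid << SQUID_SHIFT) | (addr & ADDR_MASK);
}

static bool check_forwarding_bit(uint32_t p) {
    return memory[p & ADDR_MASK].forwarded;
}

static uint32_t unforwarded_read(uint32_t p) {
    return memory[p & ADDR_MASK].value;
}

/* Follow forwarding pointers until an unforwarded word is reached. */
static uint32_t final_address(uint32_t p) {
    while (check_forwarding_bit(p)) p = unforwarded_read(p);
    return p;
}

static bool same_object(uint32_t p1, uint32_t p2) {
    if ((p1 ^ p2) & SQUID_MASK)
        return false;                  /* squids differ: different objects */
    return final_address(p1) == final_address(p2);  /* slow path */
}
```

Only pointer pairs whose squids happen to match reach the dereferencing path, which is where the reduction in overhead comes from.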
12.4.3 Data Dependence Speculation
In [Luk99], the problem of memory operation reordering is addressed using data dependence speculation ([Moshovos97], [Chrysos98]). This is a technique that allows loads to speculatively execute before an earlier store when the address of the store is not yet known. In order to support forwarding pointers, the speculation mechanism must be altered so that it compares final addresses rather than the addresses initially generated by the instruction stream. This in turn requires that the mechanism be informed each time a memory request is forwarded. The details of how this is accomplished would depend on whether forwarding is implemented directly by hardware or in software via exceptions.
Data dependence speculation does not allow stores to execute before earlier loads/stores, but this is unlikely to cause problems as a store does not produce data which is needed for program execution. A greater concern is the failure to reorder atomic read-and-modify memory operations, such as those supported by Tera [Alverson90], the Cray T3E [Scott96], or Hamal.
12.4.4 Squids without Capabilities
It is possible to implement squids without capabilities by taking the upper n bits of a pointer to be that pointer’s squid. This has the effect of subdividing the virtual address space into 2^n domains. When an object is allocated, it is randomly assigned to one of the domains. Object migration is then restricted to a single domain in order to preserve squids.
Using a large number of domains introduces fragmentation problems and makes data compaction difficult since, for example, objects from different domains cannot be placed in the same page. However, as seen in Section 12.2, noticeable performance improvements are achieved with only one or two squid bits (two or four domains).
Alternatively, the hardware can cooperate to avoid the problems associated with multiple domains by simply ignoring the upper n address bits. In this case the architecture begins to resemble a capability machine since the pointer contains both an address and some additional information. The difference is that the pointers are unprotected, so user programs must take care to avoid mutating the squid bits or performing pointer arithmetic that causes a pointer to address a different object. Additionally, because the pointer contains no segment information, arrays of objects are still a problem since a single squid would be created for the entire array.
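The domain scheme just described can be sketched as follows. The 64-bit pointer width, the choice of n = 2, and the helper names (`squid_of`, `address_in_domain`, `migration_preserves_squid`) are assumptions for this example only; a real allocator would also have to pick the domain at random when the object is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS   64u
#define SQUID_BITS  2u   /* n; hypothetical choice giving 2^n = 4 domains */
#define SQUID_SHIFT (ADDR_BITS - SQUID_BITS)

/* The squid is simply the upper n bits of the virtual address. */
static unsigned squid_of(uint64_t p) {
    return (unsigned)(p >> SQUID_SHIFT);
}

static unsigned num_domains(void) {
    return 1u << SQUID_BITS;
}

/* Build an address inside the given domain from a domain-relative offset. */
static uint64_t address_in_domain(unsigned domain, uint64_t offset) {
    uint64_t offset_mask = (UINT64_C(1) << SQUID_SHIFT) - 1;
    return ((uint64_t)domain << SQUID_SHIFT) | (offset & offset_mask);
}

/* A migration keeps the squid only if source and destination lie in the
   same domain. */
static bool migration_preserves_squid(uint64_t from, uint64_t to) {
    return squid_of(from) == squid_of(to);
}
```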
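To illustrate the care that unprotected pointers demand, the sketch below models hardware that ignores the upper n address bits, together with a pointer addition that refuses to spill into the squid bits. The 64-bit width, n = 2, and the names `effective_address` and `pointer_add` are hypothetical. The check only protects the squid bits; it cannot tell whether the result still addresses the same object, since the pointer carries no segment information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQUID_BITS  2u   /* n; hypothetical choice for this example */
#define SQUID_SHIFT (64u - SQUID_BITS)
#define OFFSET_MASK ((UINT64_C(1) << SQUID_SHIFT) - 1)

/* Address seen by the memory system when the upper n bits are ignored. */
static uint64_t effective_address(uint64_t p) {
    return p & OFFSET_MASK;
}

/* Pointer arithmetic that leaves the squid bits untouched. Returns false
   (and leaves *out unchanged) if the sum would carry into the squid bits. */
static bool pointer_add(uint64_t p, uint64_t delta, uint64_t *out) {
    uint64_t offset = p & OFFSET_MASK;
    if (delta > OFFSET_MASK - offset)
        return false;
    *out = (p & ~OFFSET_MASK) | (offset + delta);
    return true;
}
```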