The edge cache is a set-associative cache indexed by the XOR of the source and target virtual addresses of an attempted control-flow edge. Because the cache exists to assure secure control flow of an application, every entry must hold the full 64 bits of both the source and the target address, as shown in Figure 3.3. This prevents malicious, attacker-injected addresses from aliasing with valid entries in the cache. For replacement within a set, whenever an entry is accessed, its useful bit is set. Whenever a new entry is added, a victim is chosen at random from the ways in the set whose useful bit is not set. If all useful bits are set, all of them are reset and a victim is chosen at random. Whenever the edge cache is accessed, the full source and target addresses form the tag used to match the query. The size of the edge cache and its associated structures is given in Table 3.1.
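The indexing and replacement policy described above can be sketched as a small software model. This is only an illustrative sketch, not the hardware design: the class name `EdgeCache`, the default set and way counts, the use of the low bits of the XOR as the set index, the preference for empty ways, and the choice to insert new entries with the useful bit cleared are all assumptions that the text does not specify.

```python
import random


class EdgeCache:
    """Toy model of a set-associative edge cache with useful-bit replacement."""

    def __init__(self, num_sets=64, ways=4, seed=None):
        # Assumption: a power-of-two number of sets so a mask selects the set.
        assert num_sets & (num_sets - 1) == 0, "num_sets must be a power of two"
        self.num_sets = num_sets
        self.ways = ways
        self.rng = random.Random(seed)
        # Each way is None (empty) or a dict with a full tag and a useful bit.
        self.sets = [[None] * ways for _ in range(num_sets)]

    def _index(self, src, tgt):
        # Index with the XOR of source and target addresses.
        # Assumption: the low bits of the XOR select the set.
        return (src ^ tgt) & (self.num_sets - 1)

    def lookup(self, src, tgt):
        """Return True on a hit; the tag is the full (src, tgt) pair."""
        for entry in self.sets[self._index(src, tgt)]:
            if entry is not None and entry["tag"] == (src, tgt):
                entry["useful"] = True  # mark the entry on every access
                return True
        return False

    def insert(self, src, tgt):
        """Add an edge, evicting a random way whose useful bit is clear."""
        ways = self.sets[self._index(src, tgt)]
        # Assumption: empty ways are filled before any eviction happens.
        for i, entry in enumerate(ways):
            if entry is None:
                ways[i] = {"tag": (src, tgt), "useful": False}
                return
        candidates = [i for i, e in enumerate(ways) if not e["useful"]]
        if not candidates:
            # All useful bits set: reset them all, then pick any way at random.
            for entry in ways:
                entry["useful"] = False
            candidates = list(range(self.ways))
        victim = self.rng.choice(candidates)
        # Assumption: a new entry starts with its useful bit cleared.
        ways[victim] = {"tag": (src, tgt), "useful": False}
```

Because the tag is the full source and target pair, a lookup such as `cache.lookup(0x1000, 0x2004)` misses even when `(0x1000, 0x2000)` is resident in the same set.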
[Figure 3.3 diagram. Labels: n-way edge cache; region table with region address and G, U, V fields; entries holding source and target address offsets with region pointers (Pointer(S), Pointer(T)) and tags; full address generation; validated and un-validated <src,target> pairs from commit. On a hit, commit the target; on a miss, execute the sled and update the edge cache.]
Figure 3.3 Edge Cache Validation of Indirect Edges. The edge cache enables memoization of indirect control-flow edges to accelerate CDI-compliant code execution. The edge cache is an n-way, set-associative cache which, in conjunction with the region table, stores the source and target addresses of previously seen edges, each consisting of an indirect instruction's source address and a direct, conditional branch target executed from the sled. Whenever an indirect instruction (e.g., a return or indirect call) reaches commit, the source and target addresses are verified against the control edge information stored in the edge cache and region table. The highest n bits of an address are stored in the region table to exploit address locality, while the lowest 64-n address bits remain in the edge cache. If the edge cache misses on the access, execution falls through to the conditional branch sled backing the indirect instruction. Otherwise the instruction is allowed to commit.
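The split between the region table and the edge cache described in the caption can be illustrated with a short sketch. The names `RegionTable`, `compress`, and the value `REGION_BITS = 40` are hypothetical, chosen only for illustration. The table here is unbounded and omits the G, U, and V fields shown in the figure, so it models only how an address divides into a shared region part and a per-entry offset.

```python
ADDR_BITS = 64
REGION_BITS = 40  # hypothetical value for the number of high bits per region
OFFSET_BITS = ADDR_BITS - REGION_BITS
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class RegionTable:
    """Simplified, unbounded region table: pointer -> high address bits."""

    def __init__(self):
        self.regions = []

    def pointer_for(self, addr):
        # Reuse an existing region when the high bits match (address locality).
        region = addr >> OFFSET_BITS
        if region not in self.regions:
            self.regions.append(region)
        return self.regions.index(region)

    def full_address(self, pointer, offset):
        # Regenerate the full 64-bit address from region bits and offset.
        return (self.regions[pointer] << OFFSET_BITS) | offset


def compress(table, addr):
    """Split an address into (region pointer, low-order offset)."""
    return table.pointer_for(addr), addr & OFFSET_MASK
```

Two addresses that share their high bits map to the same region pointer, so only their offsets need separate storage, and `full_address` recovers the original 64-bit value for tag comparison.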
The edge cache is accessed whenever an indirect instruction is committed (e.g., return instruction). Since the potentially large conditional branch sled will be executed whenever verification fails, it is imperative to achieve a high hit rate when polling the edge cache. Factors which influence the hit rate of the edge cache include cache parameters such as size, associativity, and replacement policy. In the case of the edge cache, the point in the pipeline at which the cache is accessed can also have an impact on the rate of verification.
The immediate intuition is to poll the edge cache when an instruction is fetched, much as is done with a Branch Target Buffer (BTB). However, this policy can lead to excessive edge cache misses. When an indirect instruction is decoded, its address is known but its target can only be speculated. Prediction can therefore supply targets that are not valid control-flow targets for the given branch. Such an edge can never be confirmed by the edge cache, because it would never be produced by the conditional branch sled that executes whenever an indirect instruction is not verified. This leads to more frequent sled executions: the mispredicted instruction falls through to the sled, and the sled is guaranteed to run to completion (i.e., even if the misspeculation is discovered later, the sled continues to execute). Further, when an incorrect target supplied by prediction is verified by the edge cache but subsequently identified as mispredicted, the resolved target must still be verified.
We instead poll the edge cache when an instruction reaches commit. A mispredicted indirect instruction is generally identified as such after the execute stage, when the actual target becomes known. At the commit stage, the resolved target is always known and can be verified by the edge cache. This eliminates the execution of sleds when prediction mechanisms alias.
However, at commit any unverified edge results both in execution of the conditional branch sled for the indirect instruction and in squashing of all instructions speculatively executed after the unverified edge. This increases the penalty per sled executed but reduces the incidence of such executions. Given that longer sleds can exceed two thousand targets for some applications (such as 403.gcc), we observed that verifying edges at commit is more efficient.
To minimize the performance impact of squashing while executing CDI-enabled code, the edge cache is polled immediately at the beginning of the commit stage. This allows mispredicted indirect instructions to be identified before the fetch stage is notified of the misprediction. At this point the resolved target is known, and the edge cache can be polled with the instruction and target addresses. If the edge is verified, the resolved target is allowed to execute and fetching begins at the target address. If the edge cannot be verified, the instruction is not allowed to direct execution flow, and the program counter is set to the fall-through address, which is the first instruction of the associated sled. The edge cache is also polled for correctly predicted instructions. If the instruction and target address pair is verified, the instruction is allowed to commit as normal. If the edge cache cannot verify the edge, however, the instruction is treated as mispredicted: program control is again directed to the fall-through address to execute the sled, exactly as when a mispredicted instruction cannot be verified.
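The four commit-time cases above reduce to a small decision function. The following is a minimal sketch under stated assumptions: `Action`, `commit_indirect`, and the parameter names are hypothetical, and a plain Python set of `(source, target)` pairs stands in for the edge cache and region table.

```python
from enum import Enum


class Action(Enum):
    COMMIT = "commit as normal"
    REDIRECT_TO_TARGET = "redirect fetch to the resolved target"
    REDIRECT_TO_SLED = "squash and fall through to the sled"


def commit_indirect(verified_edges, src, predicted, resolved, sled_addr):
    """Decide what happens when an indirect instruction reaches commit.

    verified_edges: set of (src, target) pairs standing in for the edge cache.
    Returns (action, next_pc).
    """
    if (src, resolved) not in verified_edges:
        # Unverified edge: handled the same way whether or not the
        # prediction was correct. Execution falls through to the sled.
        return Action.REDIRECT_TO_SLED, sled_addr
    if predicted == resolved:
        # Verified and correctly predicted: nothing to repair.
        return Action.COMMIT, resolved
    # Verified but mispredicted: fetch restarts at the resolved target.
    return Action.REDIRECT_TO_TARGET, resolved
```

Checking verification first mirrors the text: an unverified edge always leads to the sled, so the prediction outcome matters only once the edge is known to be valid.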
As highlighted earlier in Section 3.2, whenever an edge cannot be verified and a sled is executed, the edge cache must be updated. Execution of the sled initiates recording of the PC of the instruction that initiated the sled, along with the PC of the valid target that the sled selected. When ultimately committed, this <source, target> address pair is used to update the edge cache. Note that, so long as instructions are committed in order, only a single pair may be in flight at any one time, which assures that the target comes from the sled corresponding to the given indirect instruction.
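The update path can be sketched as a tracker that holds at most one pending source address. This is an illustrative model only: `SledTracker` and its method names are hypothetical, and a plain set again stands in for the edge cache.

```python
class SledTracker:
    """Records the single in-flight <source, target> pair during a sled."""

    def __init__(self):
        self.pending_src = None

    def sled_entered(self, src_pc):
        # In-order commit means no other pair can already be in flight.
        if self.pending_src is not None:
            raise RuntimeError("a <source, target> pair is already in flight")
        self.pending_src = src_pc

    def sled_target_committed(self, target_pc, edge_cache):
        # The sled has selected a valid target: record the validated edge.
        if self.pending_src is None:
            raise RuntimeError("no sled is in progress")
        pair = (self.pending_src, target_pc)
        self.pending_src = None
        edge_cache.add(pair)
        return pair
```

Holding a single source slot is sufficient here precisely because of the in-order commit property noted above; a design that committed out of order would need to associate each target with its originating instruction explicitly.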