Using an IC model allow us to track the state of the IC throughout the course of a test execution. For every cacheable instruction retired in the trace, the IC model is looked up. On a miss, the cacheline containing the retired instruction is added to the IC model at the appropriate index and way. On a hit, the IC model is left unchanged. For example, let us assume that the processor contains a 2-way set associative IC with 128 entries indexed by linear address bits [12:6]. Suppose each entry holds a 64-byte aligned cacheline. The IC replicates this structure. For each retired instruction, we identify the 64-byte-aligned cacheline that it belongs to. If the cacheline is already present in the cache model, we proceed to the next instruction. If not, we simulate a ’fetch’ from L2 i.e the cacheline is added to the IC model. By tracking the instructions in an ISA simulation trace, we can track the IC fetch and evictions as they would happen in RTL.
00000000F6E4C7F jrnp err_guest
00000000F6E4C84 btr eax, 3
00000000F6E4C88 inov cr0, eax
00000000F6E4C8B pop eax
00000000F6E4C8C iretd
In the piece of code shown above, the conditional jump located at 0xF6E4C7F belongs to a cacheline whose index (LA[12:6]) is 31h. This 64-byte cacheline ex- tends from linear address 0xF6E4C40 to 0xF6E4C7F. At the time of retirement
of the jump instruction, this cacheline will be present in the i-cache. The next cacheline, with index 32h, extends from 0xF6E4C80 to 0xF6E4CBF. This instruc- tion will cause a prefetch of the next cacheline starting at 0xF6E4C80 (unless it is already present in the IC). This happens irrespective of the branch outcome, as the instruction itself spans into the next cacheline. If index 32h is vacant, the cacheline will be added to way0. If way0 is occupied, it will be filled in way1.
If both the ways of index 32h are already occupied, the IC model follows the way-replacement policy of the processor to ascertain which cacheline has to be evicted. For example, if either of ways is occupied by an invalidated cacheline, it is replaced. If not, if the processor follows a LRU-based replacement policy, the least recently used cacheline is replaced.
Tracking dense branches
Knowing the cache architecture allows us to calculate the cacheline bound- aries. We can also track branch discoveries and calculate if a discovered branch is recorded as a sparse branch or a dense branch. Thus an IC model allows us to use ISA simulation data to calculate if a test contains dense cacheline. It also allows us to estimate if a cacheline containing one or more dense branches is evicted to L2 or refetched to IC.
Speculative IC evictions CL0 CL1 CL2 0x043FC300 0x043FC340 0x04302340 Index = 0Ch Index = 0Dh Index = 0Dh CL3 0x043FC380 Index = 0Eh
Figure 5.3: Demonstration of how cache evictions can happen due to the specu- lative nature of code execution
Fig. 5.3 illustrates the flow of control between three cachelines. Cacheline CL1 (starting address 0x043FC300) contains a branch to CL0 (starting address 0x04302340) and has never been taken. Because the BPU has not discovered the branch, the fetch unit does sequential prefetches of cachelines CL2 (starting at address 0x43FC340) and CL3 (starting at 0x43FC380). Let us assume that CL0 already exists in the IC (since it belongs to a lower address range). CL0 contains a branch to CL3 that is known to the BPU. Let us also assume that there is
another valid cacheline CL5 (not shown) at index 0Dh of the IC, which has been used most recently recently. CL2 has an index of 0Dh and there is no vacant slot in the IC for insertion. Since both IC entries (CL0 and CL5) at index 0Dh are valid, the least recently used entry is evicted. This means that CL0 will be replaced by CL2. When the execution unit encounters the branch in CL1, it issues a redirect to the branch target. The branch target lies in CL0, but the sequential prefetch evicted CL0 from the IC. So CL0 has to be refetched from the L2 cache. The branch target for the predicted branch in CL0 is in CL3, and the cacheline CL3 exists in IC. So no additional prefetching is done. This example shows how a cacheline can be evicted due to speculative execution. Even though CL2 was never executed, it evicted CL0. If we followed the code execution without account for speculative prefetching, we would fail to see the eviction and the subsequent re-fetch of cacheline CL0. On the contrary, there would have been no such eviction if the branch in CL1 was known to the BPU.
Estimating speculative path behavior
To account for speculative evictions, in addition to updating the IC model based on code execution, we also ’prefetch code’. This is done whenever we en- counter a branch instruction or redirect. Entries are added to the IC model to account for microprocessor prefetches caused by speculation. Table 5.2 lists out
possible causes of program flow change. Whenever we encounter a branching in- Table 5.2: Sources of program flow change
Expected Unexpected
Static branch Branch discovery Dynamic branch JMP FAR/CALL FAR
RETs Interrupt Exception
struction or a change in program flow, we use the rules extracted in Sec. 4.5.4to predict the next k cachelines that the RTL is likely to prefetch. A static branch is an example of an expected redirect. A static branch is a discovered branch whose outcome and target remain unchanged. A static branch is always taken and predicted as taken by the BPU. Since the branch behavior is known, there is no bad-path overhead. When the BPU encounters a predicted branch, instruction decode automatically restarts from the branch target. For unexpected redirects, the processor is likely to have speculated incorrectly.
To determine the speculative path, we make use of code history. We use branch history information to locate the targets of discovered branches. A recur- sive prefetch is done along the speculative path until the next k cachelines are determined. If there are no discovered branches on the speculative path, since the prefetch is sequential in RTL, we simply follow the code flow to track IC activity.