When a swap cannot be used to evict an entry from the directory cache, the WayPoint controller must traverse the linked list of overflow entries, find or allocate a free record and descriptor, and write the evicted entry into those memory-resident lines. A swap cannot occur, for example, when a line is accessed for the first time at the directory and thus needs to fill into the directory cache, but all directory entries in the set that the access maps to are valid. Under these conditions, an eviction is necessary and there is no swap candidate.
Note that entries in the directory cache that are in a transient state are not allowed to be evicted. An example of a transient state is an entry that is stalled waiting to collect responses to invalidation requests while transitioning from the shared state to the modified or exclusive state. The eviction can stall while the controller waits for the responses to return. Once the responses are collected, the target of the eviction moves into a non-transient state and can be evicted. If entries in a transient state were allowed to be evicted to the overflow directory, thrashing could occur.
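The restriction on transient entries can be illustrated with a minimal sketch of victim selection. The `DirEntry` fields, the `last_use` timestamp, and the choice to prefer a non-transient LRU entry and stall only when none exists are assumptions made for illustration; the text above states only that transient entries may not be evicted and that the eviction can stall.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DirEntry:
    """Hypothetical directory cache entry; field names are illustrative."""
    tag: int
    valid: bool
    transient: bool   # e.g. waiting on invalidation responses
    last_use: int     # assumed LRU timestamp, smaller means older


def select_victim(entries: Sequence[DirEntry]) -> Optional[DirEntry]:
    """Return the least recently used valid, non-transient entry.

    Returns None when every valid entry is transient, in which case the
    eviction would have to stall until some entry leaves its transient state.
    """
    candidates = [e for e in entries if e.valid and not e.transient]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.last_use)
```

In this sketch a caller that receives `None` would retry after outstanding responses have been collected.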
When an allocation in the overflow directory is required, the free list register is checked. If it is non-NULL, the line at the head of the free list is removed by replacing the free list register with that line's next pointer, and the allocated line is linked into the overflow list as a descriptor or into a descriptor as a new WayPoint record. Deallocation of a cleared descriptor or record is performed by setting the next pointer of the freed block to the value in the free list register and then overwriting the free list register with the address of the freed block. Note that operations on the free list are LIFO to increase the probability of a cache hit when the list is accessed. Since the values stored in lines on the free list are irrelevant, write-to-own semantics at the L3 for WayPoint could also reduce the latency of accessing lines in the free list by avoiding unnecessary off-chip memory accesses.
If the free list pointer is NULL, a new line is allocated by incrementing the top of heap register to generate an address that maps into the address region cached by the cache bank associated with the directory. The heap is used only to allocate lines; lines are never returned to it, since every allocated line is either in use by the overflow directory or on the free list. We allocate the maximum space required for all possible entries expected, so an out-of-memory condition never occurs. It is possible to handle out-of-memory conditions by invalidating current overflow entries, but for simplicity we do not investigate such measures in this dissertation. Fragmentation may also be possible in overflow lists if a large number of records are allocated and then freed, but not cleared and deallocated. Compaction could be performed periodically to address this issue, but fragmentation was not found to be an issue in our implementation.
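The allocation and deallocation steps described above can be sketched as a small software model. The class name, the dictionary standing in for the next-pointer field of each memory line, and the post-increment use of the top of heap register are assumptions for illustration, not the hardware implementation.

```python
NULL = None  # stand-in for a NULL pointer


class OverflowAllocator:
    """Toy model of the free list register and the top of heap register."""

    def __init__(self, heap_base, line_size, max_lines):
        self.free_list = NULL            # address of the first free line, or NULL
        self.top_of_heap = heap_base     # next never-used line (assumed post-increment)
        self.line_size = line_size
        self.heap_limit = heap_base + line_size * max_lines
        self.next_ptr = {}               # models the next pointer stored in each free line

    def allocate(self):
        if self.free_list is not NULL:
            # Pop the head: the register takes the head line's next pointer.
            line = self.free_list
            self.free_list = self.next_ptr.pop(line)
            return line
        # Free list is empty: bump the heap. Space is provisioned for the
        # worst case, so the limit should never be reached.
        assert self.top_of_heap < self.heap_limit, "heap sized for worst case"
        line = self.top_of_heap
        self.top_of_heap += self.line_size
        return line

    def deallocate(self, line):
        # Push onto the head (LIFO): recently freed lines are reused first,
        # which makes a cache hit on the next allocation more likely.
        self.next_ptr[line] = self.free_list
        self.free_list = line
```

Because both operations touch only the head of the list, the most recently freed line is always the next one handed out, and the heap pointer moves only when the free list is empty.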
7.3 Summary and Discussion
In the design, implementation, and evaluation of WayPoint, we observe potential bottlenecks and optimization opportunities to be addressed in future work. Ordering of overflow entries in the overflow directory, replacement policy for the directory cache, and the extra time required to determine if a directory cache miss is an overflow directory hit are potential bottlenecks. While added overhead due to ordering of overflow lists and traversals for determining residency in the overflow list would appear to be performance-limiting, they were not found to be a first-order concern in our initial implementation. LRU replacement in the directory cache was also suitable for WayPoint; however, we found it to be a poor choice for a fixed directory without WayPoint due to the filtration effect of lower-level caches with regard to directory cache criticality information.
7.3.1 Optimizations
The length of overflow lists could be reduced by increasing the number of lists in the WayPoint design. Such a technique trades additional area, needed to implement more head pointer registers, for faster searches due to reduced list length. A second potential mechanism is a counting Bloom filter [101] that uses hashes of directory entry addresses to increment counters when an entry is inserted into the overflow list. Finally, the order of entries in the lists could be modified dynamically to put frequently accessed entries near the beginning of the list to reduce traversal time.
Incomplete knowledge of memory access patterns limits the performance of directory cache replacement policies. Propagating LRU status from lower-level caches to the directory using a side channel could mitigate the problem of LRU invalidating directory cache entries that are often accessed by first-level caches but are seldom accessed at the directory. However, WayPoint already deals with such cases by avoiding invalidation; it instead uses overflow lists, making replacement policy much less of a concern.
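The counting Bloom filter mentioned among the optimizations above could look roughly like the following sketch. The counter count, the number of hash functions, and the use of SHA-256 as a stand-in for hardware hash functions are assumptions; so is the decrement on removal and the membership query, which are standard for counting Bloom filters but are not spelled out in the text. The presumed benefit is that a directory cache miss could skip the list traversal whenever the filter reports an address as absent.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter keyed by directory entry address."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, addr):
        # Derive num_hashes counter indexes by prefixing the address with a
        # per-hash byte (software stand-in for independent hardware hashes).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + addr.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % len(self.counters)

    def insert(self, addr):
        """Call when an entry is inserted into the overflow list."""
        for idx in self._indexes(addr):
            self.counters[idx] += 1

    def remove(self, addr):
        """Call when an entry leaves the overflow list (assumed counterpart)."""
        for idx in self._indexes(addr):
            self.counters[idx] -= 1

    def may_contain(self, addr):
        """False means definitely absent; True may be a false positive."""
        return all(self.counters[idx] > 0 for idx in self._indexes(addr))
```

A false positive costs only an unnecessary traversal, while a negative answer is always correct as long as every insertion is paired with at most one removal.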
The WayPoint controller must access memory where the overflow lists are stored. In doing so, WayPoint must contend with normal requests for access to the top-level caches and the memory controller. In practice, the impact of contention is low: WayPoint needs to access the data cache only on a miss in the directory cache, which is the infrequent case, and many of those accesses hit in the LLC, minimizing the memory bandwidth requirements of the directory.