Reference: Thapar, M. & Delagi, B. Design and Implementation of a Distributed Cache Coherence Protocol. KSL, April, 1990.
Abstract: Cache coherence is an important well known problem in shared memory multiprocessor systems. If multiple caches are allowed to simultaneously have copies of a given memory location, a mechanism must exist to ensure that all copies remain consistent when the contents of that memory location are modified. The cache coherence problem may be solved through hardware or software. Hardware based protocols to solve the cache coherence problem are well understood for bus based shared memory architectures. However the shared bus limits the number of processors that can be connected to the bus without saturating it. To support scalable shared memory architectures, the cache coherence protocol needs to be able to work in the absence of a global broadcast mechanism. Directory based schemes are a possible solution in this environment. This paper describes a new distributed directory cache coherence protocol based on a linked-list of caches. The information about which caches have copies of the data is decentralized and distributed among the cache lines. Our implementation, like the fully mapped directory scheme, tracks any number of cache copies and never requires invalidates to be sent to all caches in the system. It has a lower cost and better network traffic characterisitics than the fully mapped directory based coherence scheme for expected memory and cache sizes. In the fully mapped scheme, the size of the memory required to hold the state information is O(MN), where M is the size of main memory and N is the number of caches. In our scheme, on the other hand, the size of the memory required to hold the state information is only O(M log N). We allow adaptive routing (so that network performance may be more robust) and do not assume that the network connecting caches preserves the order of messages. The network traffic generated for invalidations is reduced by a factor of two in our implementation compared to a fully mapped directory scheme for networks that do not preserve the order of messages. An important feature of this protocol is that locks can be supported very efficiently with minimal extra cost.