Saturday, August 28, 2010

Cache Coherence Protocol with Shared Coherence Cache for Multiprocessors

INTRODUCTION
A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit containing two or more individual processors (called cores in this sense). The cores are typically integrated onto a single integrated circuit die (known as a chip multiprocessor, or CMP), or they may be integrated onto multiple dies in a single chip package. In a CMP, each core is essentially a relatively simple single-threaded microprocessor, and together the cores can execute parallel program code. Compared with a traditional single-core processor, a CMP offers simpler control logic, high clock frequency, and low communication latency, which can bring performance and productivity advantages. It places no special requirements on the core architecture; each core can even be much simpler than a wide superscalar processor core. CMPs are widely used in image processing, networking, and other fields.
In the CMP architecture, the processor cores share a common memory space, and there is a large speed gap between the processors and main memory. A CMP design must therefore use multi-level caches, easing the problem through a hierarchical storage structure. A CMP system must also address cache coherence and the related design problems. The choice of cache coherence protocol and its design mechanism has a major impact on the overall design and development of a CMP.
The traditional MESI protocol adopts a write-invalidate mechanism. Invalidating the copies of a block in each cache is relatively expensive, and access latency increases when a processor later accesses the invalidated data. In the Dragon protocol, a write to a shared cache block must be propagated to every other cache that holds a copy of the block, which increases the burden on the bus.

THE ARCHITECTURE OF CMP WITH SC-CACHE

This architecture adds an SC-Cache (Shared Coherence Cache) to the classic CMP architecture. The SC-Cache is located between the private caches and the bus, and is characterized by small capacity and fast access. On top of it, a new snooping cache coherence protocol is introduced: CSC (Coherence with SC-Cache). Figure 1 shows the new CMP architecture with the SC-Cache.



DESIGN OF CSC PROTOCOL

For reasons of cost and technology, the private cache inside a core is very small, typically between 4KB and 64KB. The SC-Cache is designed to be much smaller than the private cache; in this experiment its standard size is 1KB. To increase the utilization of the SC-Cache, the commonly used LRU (Least Recently Used) algorithm is adopted, which replaces the least recently used block.
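The LRU replacement described above can be sketched as follows. This is only an illustration, not the implementation used in the experiment: the class name SCCache, the fully associative organization, and the 32-byte block size (which gives 32 blocks for a 1KB cache) are assumptions introduced here.

```python
from collections import OrderedDict

BLOCK_SIZE = 32                      # bytes per block (assumed, not from the text)
SC_CACHE_SIZE = 1024                 # 1KB, the size used in the experiment
NUM_BLOCKS = SC_CACHE_SIZE // BLOCK_SIZE


class SCCache:
    """Hypothetical fully associative SC-Cache with LRU replacement."""

    def __init__(self, num_blocks=NUM_BLOCKS):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()  # block address -> data, least recent first

    def lookup(self, addr):
        """Return the block data on a hit and mark it most recent, else None."""
        if addr not in self.blocks:
            return None
        self.blocks.move_to_end(addr)
        return self.blocks[addr]

    def insert(self, addr, data):
        """Insert or update a block; return the evicted address, if any."""
        evicted = None
        if addr not in self.blocks and len(self.blocks) >= self.num_blocks:
            # The cache is full: drop the least recently used block.
            evicted, _ = self.blocks.popitem(last=False)
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        return evicted
```

Keeping the blocks in an ordered dictionary makes both the hit path and the eviction path constant time, which is one simple way to model the replacement policy in a simulator.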
The CSC coherence protocol of this architecture combines the write-through and write-back mechanisms and includes four states: PI (Private-Invalidate), PE (Private-Exclusive), PD (Private-Dirty), and SS (Share-Shared). The first three states occur only in the local or remote private caches, and the last state exists only in the SC-Cache. A write to a shared block therefore simply updates the block in the SC-Cache and writes it through to main memory, without invalidating copies in other caches. The protocol involves three kinds of cache: the current processor's private cache (referred to as the local cache), a remote processor's private cache (referred to as a remote cache), and the SC-Cache. The processor first accesses the local cache. If the local cache misses, it accesses the SC-Cache. If the SC-Cache also misses, the local cache controller broadcasts the corresponding request to the remote caches on the bus.
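The access order can be sketched in a few lines. The function name locate, the string labels it returns, and the modelling of each cache as a plain set of block addresses are all assumptions made for brevity.

```python
def locate(addr, local_cache, sc_cache, remote_caches):
    """Return where a block is found, following the access order in the text.

    Each cache is modelled as a set of block addresses (an assumption);
    the returned labels are illustrative only.
    """
    if addr in local_cache:          # 1. private cache of the current processor
        return "local"
    if addr in sc_cache:             # 2. shared coherence cache
        return "sc-cache"
    # 3. miss in both: the local controller broadcasts the request on the bus
    for remote in remote_caches:
        if addr in remote:
            return "remote"
    return "memory"                  # no cache holds the block
```

For example, locate(3, {1}, {2}, [{3}]) reports a remote hit, because the block is in neither the local cache nor the SC-Cache.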

CSC coherence protocol is described as follows:

A.    The main states that may occur in the CSC protocol

(1) Invalid state PI (Private-Invalidate): the copy is inconsistent with main memory or with other cached copies, or the block is not present in the cache. Only a private cache may hold a block in this state.

(2) Private exclusive state PE (Private-Exclusive): the copy has not been modified, so it is coherent with main memory. No other cache contains a valid copy, so it is the only valid cached copy. Only a private cache may hold a block in this state.

(3) Private dirty state PD (Private-Dirty): the copy has been modified, possibly several times. It is the only valid copy and is inconsistent with main memory and with any other cached copies. When the block is replaced, it is written back to main memory. Only a private cache may hold a block in this state.

(4) Shared state SS (Share-Shared): the cached copy is coherent with the copy in main memory. A block in this state exists only in the SC-Cache, which is shared by the processors. Processors read the data directly from the SC-Cache. On a write hit the state is unchanged, and the write also updates main memory directly. In addition, the protocol distinguishes two conditions, shared and non-shared. Non-shared means that no remote cache contains a copy of the requested block; shared means that a remote cache contains a copy of the requested block.
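The four states and the rule about where each one may live can be written down compactly. The names State, valid_placement, and coherent_with_memory are hypothetical helpers introduced for this sketch.

```python
from enum import Enum


class State(Enum):
    PI = "Private-Invalidate"
    PE = "Private-Exclusive"
    PD = "Private-Dirty"
    SS = "Share-Shared"


PRIVATE_ONLY = {State.PI, State.PE, State.PD}   # held only by private caches


def valid_placement(state, in_sc_cache):
    """SS may appear only in the SC-Cache; PI, PE, PD only in private caches."""
    if in_sc_cache:
        return state is State.SS
    return state in PRIVATE_ONLY


def coherent_with_memory(state):
    """PE and SS copies match main memory; PD and PI copies do not."""
    return state in (State.PE, State.SS)
```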

B.     Processor requests

In this protocol, the cache controller receives two kinds of processor request:
• Processor read request.
• Processor write request.

C.     Coherence commands broadcast on the bus

• Bus read: on a read miss, the requesting cache broadcasts the main memory address on the bus. A remote cache monitoring the bus puts its valid copy on the bus, the SC-Cache takes the copy, and an invalidate command for the copy is broadcast on the bus. When the process ends, the state of the requesting cache does not change and the SC-Cache state changes to SS.

• Bus write: on a write miss, the requesting cache broadcasts the main memory address on the bus. A remote cache monitoring the bus puts its valid copy on the bus. The SC-Cache takes this valid copy and updates it, writing through to main memory at the same time, and an invalidate command is broadcast on the bus. When the process ends, the state of the requesting cache does not change and the SC-Cache state changes to SS.

D.     The state transition diagram of Local Cache and SC-Cache


E.    Detailed description of the CSC protocol

Read miss: the requested copy is in neither the local cache nor the SC-Cache. After acquiring the bus, the local cache broadcasts the memory address. If a remote cache holds a valid copy of the requested block (shared data exists) and the block state is PD, the remote cache puts the copy directly on the bus. The copy is supplied to the SC-Cache and its state is changed to SS, main memory is updated by write-through, and the copy in the remote cache is then invalidated. If the remote block state is PE, the copy is coherent with main memory, so it is read from main memory and supplied to the SC-Cache; the block state changes to SS, and the copy in the remote cache is then invalidated. If no other cache holds the requested copy (no shared copy), main memory must hold a valid copy; it is fetched from memory and transferred to the local cache, and the cache state is marked PE.
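The three read-miss cases can be summarized as a small decision function. The Outcome record and the function name read_miss are hypothetical; a remote state of None (or PI) stands for the non-shared case.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    source: str               # where the data comes from
    destination: str          # which cache receives the block
    new_state: str            # state of the block at the destination
    remote_invalidated: bool  # remote copy set to PI afterwards
    memory_updated: bool      # main memory updated by write-through


def read_miss(remote_state: Optional[str]) -> Outcome:
    """Outcome of a read miss, given the state of the remote copy (if any)."""
    if remote_state == "PD":
        # Dirty remote copy: supplied over the bus and written through.
        return Outcome("remote cache", "SC-Cache", "SS", True, True)
    if remote_state == "PE":
        # Clean remote copy: memory is already coherent, so read from memory.
        return Outcome("main memory", "SC-Cache", "SS", True, False)
    if remote_state in (None, "PI"):
        # Non-shared: the block goes to the local cache as PE.
        return Outcome("main memory", "local cache", "PE", False, False)
    raise ValueError(f"unexpected remote state: {remote_state!r}")
```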
Write hit: if the state of the local cache block is PD, the current processor can read or write the copy directly. If the state is PE, it changes to PD after the copy is updated, and the copy in main memory becomes stale. If the hit is in the SC-Cache, the shared copy is updated and main memory is updated at the same time; the copy's state remains SS.
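A write hit therefore behaves as write-back in the private cache and as write-through in the SC-Cache. A minimal sketch, in which the function name write_hit and the labels "local" and "sc" are assumptions:

```python
def write_hit(cache, state):
    """Return (next_state, memory_updated) for a write hit.

    cache is "local" for the private cache or "sc" for the SC-Cache.
    """
    if cache == "local" and state in ("PE", "PD"):
        # Write-back behaviour: the block becomes (or stays) dirty and
        # main memory is not touched until the block is replaced.
        return "PD", False
    if cache == "sc" and state == "SS":
        # Write-through behaviour: the shared copy and memory are both updated.
        return "SS", True
    raise ValueError(f"no write hit possible for {cache!r} in state {state!r}")
```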
Write miss: the requested copy is in neither the local cache nor the SC-Cache. If a remote cache holds the copy (shared) and its state is PE or PD, the remote cache puts the requested copy directly on the bus. The copy is supplied to the SC-Cache and its state is changed to SS; at the same time main memory is updated by write-through, and the copy in the remote cache is then invalidated. If the requested copy does not exist in the remote cache, it must be in main memory, so it is read from there and sent to the SC-Cache; its state is then changed to SS, and main memory is updated by write-through at the same time. If no other cache holds the requested copy (non-shared), main memory must hold a valid copy; it is fetched from main memory and transferred to the local cache, and the cache state is marked PD.
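The shared and non-shared write-miss cases can be sketched in the same style. The function name write_miss and the tuple layout are assumptions, and the sketch deliberately covers only the two cases whose outcome the text states unambiguously: a remote copy in PE or PD, and no copy in any other cache.

```python
def write_miss(remote_state):
    """Return (destination, new_state, remote_invalidated, memory_updated)."""
    if remote_state in ("PE", "PD"):
        # Shared: the block moves to the SC-Cache as SS, memory is updated
        # by write-through, and the remote copy is invalidated.
        return ("SC-Cache", "SS", True, True)
    if remote_state in (None, "PI"):
        # Non-shared: the block is fetched from memory into the local cache
        # and marked dirty, since it is about to be written.
        return ("local cache", "PD", False, False)
    raise ValueError(f"unexpected remote state: {remote_state!r}")
```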
If a processor is going to access a shared copy, some special operations are needed, and these operations must be controlled by the SC-Cache. For example, when multiple processors issue read and write operations to the same shared block, the write must be performed first and then the reads. When one processor requests to write a shared block, operations on that block from other processors are blocked until the writing processor's operation ends.
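One way to picture these two rules is a per-block guard held by the SC-Cache controller. The class SharedBlockGuard, the function order_requests, and the representation of a request as a (processor id, operation) pair are hypothetical; the text does not say how the control is actually implemented.

```python
class SharedBlockGuard:
    """Hypothetical per-block guard kept by the SC-Cache controller."""

    def __init__(self):
        self.writer = None           # id of the processor writing, or None

    def begin_write(self, cpu):
        """Grant the block to cpu for writing unless another write is active."""
        if self.writer is not None:
            return False
        self.writer = cpu
        return True

    def can_access(self, cpu):
        """Other processors are blocked while a write is in progress."""
        return self.writer is None or self.writer == cpu

    def end_write(self, cpu):
        """Release the block once the writing processor has finished."""
        if self.writer == cpu:
            self.writer = None


def order_requests(requests):
    """Serve writes before reads; the stable sort keeps arrival order otherwise."""
    return sorted(requests, key=lambda request: request[1] != "write")
```

With requests [(0, "read"), (1, "write"), (2, "read")], the write from processor 1 is served first and the two reads keep their original order.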



F.    CSC coherence protocol state transitions


STATE TRANSITION TABLE

Operation               Local Cache Block               SC-Cache Block
                        Initial state   Next state      Initial state   Next state
Read miss (no shared)   PI              PE              NA              NA
Read miss (shared)      PI              PI              Invalid         SS
Write miss              PI              PD              NA              NA
Write hit               PE              PD              NA              NA
Bus read                PE              PI              Invalid         SS
Bus write               PE              PI              NA              NA
Write/read hit          PD              PD              NA              NA
Bus read                PD              PI              Invalid         SS
Bus write               PD              PI              NA              NA
Write/read hit          NA              NA              SS              SS

NA means the state does not change.
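The table can be encoded directly as data and queried with a small lookup. The names ROWS and next_states are hypothetical, and one interpretation is assumed here: an NA in an initial-state column matches any current state, and an NA in a next-state column leaves the current state as it is.

```python
ROWS = [
    # (operation, local initial, local next, SC initial, SC next)
    ("read miss (no shared)", "PI", "PE", "NA", "NA"),
    ("read miss (shared)", "PI", "PI", "Invalid", "SS"),
    ("write miss", "PI", "PD", "NA", "NA"),
    ("write hit", "PE", "PD", "NA", "NA"),
    ("bus read", "PE", "PI", "Invalid", "SS"),
    ("bus write", "PE", "PI", "NA", "NA"),
    ("write/read hit", "PD", "PD", "NA", "NA"),
    ("bus read", "PD", "PI", "Invalid", "SS"),
    ("bus write", "PD", "PI", "NA", "NA"),
    ("write/read hit", "NA", "NA", "SS", "SS"),
]


def next_states(operation, local_state, sc_state):
    """Return (local next, SC next) for the first matching table row."""
    for op, l0, l1, s0, s1 in ROWS:
        if op != operation:
            continue
        if l0 not in ("NA", local_state):   # NA matches any state (assumed)
            continue
        if s0 not in ("NA", sc_state):
            continue
        return (local_state if l1 == "NA" else l1,
                sc_state if s1 == "NA" else s1)
    raise KeyError((operation, local_state, sc_state))
```

For instance, next_states("bus read", "PD", "Invalid") gives ("PI", "SS"): the snooped dirty copy is invalidated and the block now lives in the SC-Cache.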

Written by Amalendu Si, Faculty, Department of IT, MIT
