WO1997004392A1 - Shared cache memory device - Google Patents
- Publication number
- WO1997004392A1 (application PCT/EP1995/002847)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- line
- bit
- cache memory
- processors
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
Definitions
- the invention relates to a cache memory device for usage as a shared cache in a multiprocessor system, wherein the memory space is organized in a plurality of storage blocks. Furthermore, the invention relates to a computer system incorporating such a multiprocessor system and to a method to operate a shared cache memory device in such a multiprocessor system.
- A snooping coherency protocol for a multiprocessor network is known. Every processor has its own private cache and bus interface means, and the network is connected via a common system bus. Each processor has its own cache directory and image directory that duplicate each other non-atomically.
- the snooping protocol utilizes the duality of directories coupled with the non-atomicity of directory updates to maximize processor-cache availability and minimize processor-cache access times thus supporting high performance processors.
- EP-A-0 349 123 discloses a multi-processor computer system having shared memory and private cache memories.
- This multiprocessor system is implemented using a common system bus as the communication mechanism between CPU, memory, and I/O adapters. It is also common to include features on each CPU module, such as a cache memory, that enhance the performance of the execution of instructions in the CPU.
- Many architectures require that the hardware employs a mechanism by which the data in the individual CPU cache memories is kept consistent with data in main memory and with data in other cache memories.
- One such method involves each CPU monitoring transactions on the system bus, and taking appropriate action when a transaction appears on the bus which would render data in the CPU's cache incoherent. If the CPU uses queues to hold records of incoming transaction information until it can service them, the bus interface must guarantee that the queued items are processed by the cache in the correct order. If this is not done, certain types of shared data protocols fail to operate correctly.
- EP-A-0 349 123 describes a method by which hardware can guarantee the serialization of transactions requiring service by the CPU cache.
- The serialization method described guarantees that shared memory protocols operate correctly.
- the processor transfers instructions and data from system memory to the cache memory in order to have quick access to the variables of the currently executing program. As additional data, not in the cache, is required, such data is transferred from the main memory by replacing selected data in the cache.
- Various techniques or algorithms are utilized to determine which data is replaced. Since the data in the cache is duplicative of data in the main memory as seen from other processors, changes to data in one memory must similarly be changed or noted in the other memory. The problem of maintaining consistency between the cache and main memory data is referred to as coherency. For example, if the data in the cache is modified, the corresponding data in the main memory must similarly be modified.
- a scheme for modifying data in the main memory at the same time as data in the cache is known as a write-through cache.
- The main memory may be accessed by devices other than the system processor. For example, data may be written to the main memory by another device while the processor is operating on data in its cache. If the data written to main memory overwrites data presently in the cache, a coherency problem arises. It is known in the art to use a method such as bus snooping to address this problem. This involves monitoring a bus for writes to main memory and then checking a tag directory associated with the cache to see if the data is in the cache. If it is present, a flag is set in the cache tag directory to invalidate the entry so that the old data is not used by the processor.
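- The invalidation step described above can be illustrated with a minimal sketch. The class and method names (`SnoopingCache`, `fill`, `snoop_write`, `lookup`) are hypothetical and chosen for illustration only; the sketch models the tag directory as a dictionary and is not taken from any of the cited documents.

```python
class SnoopingCache:
    """Toy cache whose tag directory is invalidated by snooped bus writes."""

    def __init__(self):
        self.directory = {}               # address -> [valid_flag, data]

    def fill(self, addr, data):
        # Load a line into the cache and mark it valid.
        self.directory[addr] = [True, data]

    def snoop_write(self, addr):
        # Another device wrote this address to main memory:
        # if we hold a copy, flag it invalid so stale data is never used.
        entry = self.directory.get(addr)
        if entry is not None:
            entry[0] = False

    def lookup(self, addr):
        entry = self.directory.get(addr)
        if entry is None or not entry[0]:
            return None                   # miss: caller must go to main memory
        return entry[1]
```

- A snooped write to an address that is not in the directory leaves the cache untouched; only a matching entry loses its valid flag.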
- the processing units, main memory peripheral devices and other devices are all tied together by a multidrop system bus which carries messages, instructions, data, addresses, protocol signals, etc. to and from all devices.
- the main memory is constructed of inexpensive, slow memory components. Each access to main memory to obtain data requires use of the data and address portions of the system bus and a substantial amount of memory access time thus seriously inhibiting the performance of the computer system. It is found that, in computer systems having large main memory capacity, a substantial portion of the memory is used very infrequently. On the other hand, certain data blocks, i.e. memory locations are used very frequently by the processing units (or processors).
- the data cache is inserted between the processing unit and the system bus which couples between processing units, the main memory and all other devices on the system.
- some form of virtual addressing is used to address the space in the data cache.
- the operation of the data cache during a read instruction from the processing unit is straightforward. If the requested data is in the cache, it is forwarded directly to the processing unit. If, however, the data is not in the cache, the cache obtains the data from main memory, stores it in the cache and also forwards it to the processing unit.
- For a write to memory, there are various possible implementations. In one possible method, data can be written to the cache and the main memory simultaneously. However, a substantial amount of processor time is saved by writing only to the cache memory during the write operation and providing a flag for the location in the cache which indicates that the data in that location has been updated and is no longer consistent with the data in main memory. The main memory then needs to be updated only when this block of memory is removed from the cache to make way for a new block. This method saves unnecessary write operations to the main memory when a given cache word is updated a number of times before it is replaced in the cache.
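- The saving described above can be made concrete with a small sketch that counts writes reaching main memory under both policies. It assumes a toy cache holding a single line and a dictionary standing in for main memory; `Cache` and `count_memory_writes` are hypothetical names introduced only for this illustration.

```python
class Cache:
    """Toy single-line cache in front of a dict-backed main memory."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict: address -> value
        self.write_back = write_back  # True: deferred update, False: write-through
        self.addr = None              # address currently cached
        self.value = None
        self.changed = False          # flag: cached copy differs from memory
        self.memory_writes = 0        # counts writes that reach main memory

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def _replace(self, addr):
        # Main memory is updated only when a changed line is evicted.
        if self.changed:
            self._write_memory(self.addr, self.value)
        self.addr, self.value, self.changed = addr, self.memory.get(addr, 0), False

    def read(self, addr):
        if self.addr != addr:
            self._replace(addr)
        return self.value

    def write(self, addr, value):
        if self.addr != addr:
            self._replace(addr)
        self.value = value
        if self.write_back:
            self.changed = True               # defer the memory update
        else:
            self._write_memory(addr, value)   # update memory immediately


def count_memory_writes(write_back):
    """Write one address three times, then evict it by reading another."""
    memory = {0: 0, 1: 0}
    cache = Cache(memory, write_back)
    for value in (10, 11, 12):
        cache.write(0, value)
    cache.read(1)
    return cache.memory_writes, memory[0]
```

- With the deferred policy the three updates collapse into a single memory write at eviction, whereas the simultaneous policy writes to memory three times; both leave the same final value in memory.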
- A further concept to maintain data consistency is known from an article entitled "Data Consistency In A Multiprocessor System With Store-In Cache Concept", Microprocessing and Microprogramming 32 (1991), 215-220, North-Holland, by G. Doettling, and from US-A-5 113 514.
- The approach described in the article by G. Doettling, which is also known as the MESI-protocol, is also employed in the multiprocessor system known from EP-A-0 575 651 and
- the MESI-protocol has four states: Modified, Exclusive, Shared and Invalid.
- a modified cache line is a cache line which has been changed by one of the processors which holds the actual copy in its cache.
- An exclusive cache line is a cache line which is owned by only one processor which may write into this line.
- a shared cache line is a cache line which is copied in the caches of more than one processor.
- An invalid cache line is an invalid copy of a cache line.
- Cache line states may be recorded in each private cache directory using three bits: a multiple copy bit (MC-bit), a change bit (C-bit) and a valid bit (V-bit).
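- One possible reading of how the three directory bits encode the four MESI states is sketched below. The function name `mesi_state` is hypothetical, and the treatment of the combination V=1, MC=1, C=1 (reported here as Modified) is an assumption, since the text does not discuss that combination.

```python
def mesi_state(v, mc, c):
    """Map (V, MC, C) directory bits to a MESI state name."""
    if not v:
        return "Invalid"      # MC and C are don't-care when V is 0
    if c:
        return "Modified"     # valid and changed (assumption: MC is ignored here)
    if mc:
        return "Shared"       # valid, unchanged, copies in more than one cache
    return "Exclusive"        # valid, unchanged, only copy
```

- For example, the bit pattern 1, 0, 0 corresponds to an exclusive line and 1, 1, 0 to a shared line.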
- A bus snooping action is initiated in each processor when a command and its address are placed on the bus. The command and its address are examined by the snooping processors and the required actions are taken.
- Each of the processors searches its cache directory for the address.
- A cache line hit, i.e. a match of the address on the bus and an address stored in the directory, is signaled to the other processors by the processor which detected the cache line hit in its cache directory. If the cache line where the hit occurred is valid and changed, this cache line represents the actual copy in the system. In this case the actual cache line is retransmitted to the main memory. This is also called "Cast-Out".
- the cache line status in the cache directories of the caches which also have a cache line hit is updated correspondingly.
- This snooping action requires a couple of cycles to interrogate the bus address and the directory and to signal the result to the other processors. It is normally a synchronous operation.
- It is an object of the present invention to provide an improved cache memory device for usage as a shared cache in a multiprocessor system, an improved multiprocessor system and an improved computer system incorporating such a cache memory device and an improved method to operate a shared cache memory device in a multiprocessor system.
- the object of the invention is solved by the features laid down in the independent claims.
- the cache memory device of the invention is organized in a plurality of storage blocks, such as storage lines.
- the storage blocks can be used for the virtual addressing of the space in the cache memory device.
- the cache memory device has storage means for storage of status information for each of the storage blocks.
- Such a storage means can be realized in the cache memory device by a dedicated storage space wherein such status information can be stored for each of the storage blocks.
- the status information is indicative of valid or shared data which is stored or which is to be stored in the storage block to which the respective status information belongs.
- the cache memory device further comprises logic means to selectively store data in the storage blocks and to set the status information correspondingly.
- The provision of status information being indicative of shared data is beneficial because it allows full advantage to be taken of the shorter access time of the shared cache memory device as compared to the private caches of the processors of the multiprocessor system. If one of the processors requests data which is present in a storage block of the shared cache memory device and if the status of this data is "shared", the shared cache memory device can output the data stored in that block on the bus without further access time penalty, i.e. without having to wait for the other processors to signal whether or not the data required by the requesting processor is also present in one or more of the private cache memories of the snooping processors.
- the shared cache memory device stores the updated copy of the changed data in the corresponding storage block. This is controlled by logic means, i.e. a number of logic gates, which are incorporated in the cache memory device.
- the MESI-protocol is used to maintain the consistency of the data being stored in the private caches and the shared cache memory device.
- the status information for each of the storage blocks of the shared cache memory device comprises a valid bit (V-bit) and a multiple copy bit (MC-bit) . If the multiple copy bit of one of the storage blocks is logically one this indicates that the data of that storage block is shared, i.e. at least two private caches of different processors have a copy of these data stored therein.
- the status information being indicative of shared data in the shared cache memory device is already set to logically one if data which generally remains unchanged such as instructions, is requested by one of the processors. This is irrespective of the requested data already being shared by one of the private caches of the other processors or not. It is assumed that data, such as instructions, generally remains unchanged. Generally it is implied that a situation in which one of the private caches holds valid data which is changed - causing a cast-out situation - will occur only rarely. Hence it is possible in most cases to take full advantage of the shorter access time of the shared cache memory device to respond to a second request by another processor for the same data.
- Fig. 1 is a schematic block diagram of a multiprocessor system and a shared cache memory device according to the invention
- Fig. 2 is a signal diagram of the multiprocessor system when no usage is made of the shared cache memory device
- Fig. 3 is a signal diagram of the multiprocessor system when usage is made of the shared memory device.
- The multiprocessor system of Fig. 1 comprises a plurality of processors P1, P2, ..., Pn. Only two of the processors - P1 and P2 - are shown in Fig. 1 by way of example.
- The processors of the multiprocessor system communicate with each other and with the main memory 2 of the system and the shared cache memory 3 via the system bus 1.
- Each of the processors has a private cache memory.
- The processors P1 and P2 have the private cache memories 4 and 5, respectively.
- Each of the private cache memories has a storage array for the storage of a plurality of storage blocks. In this case the storage blocks are realized as storage lines i.e. lines 1 to n.
- each of the private caches has a storage space for storage of a valid bit (V-bit), a multiple copy bit (MC-bit) and a change bit (C-bit) for each of the lines.
- V-bit: valid bit
- MC-bit: multiple copy bit
- C-bit: change bit
- The first line of the storage space 7 holds the V-, MC- and C-bit for the storage line 1 of the storage array 6.
- The V-, MC- and C-bit of line 1 are logically 1, 0, 0, respectively.
- The corresponding bits for line 2 are 1, 1, 0; for line 3 they are 1, 0, 1; and for line n they are 0, X, X.
- "X" indicates that this bit position is a "don't care" bit. This is the case if the valid bit is logically 0.
- the corresponding bits of the storage line 1 of the storage array 8 of the private cache 5 are 0, X, X, and for the further lines 2 to n: 1, 1, 0; 0, X, X; ...; 1, 1, 0. These bits are stored in the storage space 9 of the private cache 5.
- Each of the private caches has a directory which holds the logical addresses of the data which are stored in the storage lines.
- the directory 10 of the private cache 4 holds the logical address of the data stored in the storage line 1 as well as the further addresses of the data stored in the further lines 2 to n.
- the shared cache memory device 3 has a storage array 12 for the storage of the storage lines 1 to m.
- the number m of storage lines which can be stored in the shared cache memory 3 generally is much greater than the number n of storage lines of the private cache memories.
- the number of bit positions in a storage line of the storage array 12 of the shared cache memory 3 equals the number of bit positions in a storage line of one of the private caches. This is essential for the method of addressing which is employed in this example.
- the shared cache memory device 3 further comprises a storage space 13 which serves to store the status information for each of the storage lines of the storage array 12.
- the storage space 13 has two bits of status information for each of the storage lines of the storage array 12, i.e. a V- and a MC-bit.
- the storage line 1 of the storage array 12 has a valid bit V which equals logically 0; as a consequence the multiple copy bit MC is "don't care" (X).
- the corresponding status information of the lines 2 to m is 1,1; 0,X; ...; 0,X.
- the shared cache memory device 3 comprises a directory 14 which holds the logical addresses of the data stored in the storage lines 1 to m in the storage array 12, i.e. the logical address of the data stored in the storage line 1 and for the further storage lines 2 to m, respectively.
- the operation of the shared cache memory device 3 is controlled by the logic 15.
- the logic 15 enables the storage of data in a selected one of the storage lines of the storage array 12 and sets the status information in the storage space 13 correspondingly.
- The processors P1, P2, ..., Pn as well as the shared cache memory 3 are interconnected by the signal lines 16 and 17. If the signal line 16 is raised by one of the processors, this causes a bus snooping operation of the other processors and of the shared cache memory 3. The bus snooping of the shared cache memory 3 is also accomplished by the logic 15.
- The signal line 17 is raised by a snooping processor which has detected a hit, or in other words an address match in its directory. If the signal line 17 is raised and a cast-out is required, then this signal remains active until the cast-out operation is completed.
- In table 1, eight examples are shown for possible states of the status information in a requesting processor and in a snooping processor according to the MESI-protocol which is employed here in order to maintain consistency of the data stored in the caches.
- The requesting processor requires a sequence of data having the logical address i for storage in one of the lines, for example line j of its storage array 6, for a read reference. This is called line fetch due to fetch because the requesting processor P1 only fetches the required sequence of data.
- In a line fetch due to store, the requesting processor requires the sequence of data for write purposes.
- Table 1 (only a fragment of the table survives; the column headers and the remaining rows are missing): line fetch due to store, example 5: 1 0 1 0 0 0 0 0 0; example 6: 1 0 1 1 0 0 0 0 0 0; example 7: (values missing).
- the initial situation is that the data having the logical address i are not present in the private cache of the requesting processor or if the data having the logical address i is present in the private cache the V-bit is logically 0.
- The requesting processor puts a line fetch command together with the logical address i onto the bus and initiates a bus snooping operation of the other processors P2 to Pn and the shared cache memory 3.
- The bus snooping is initiated by the processor P1 over the signal line 16.
- the snooping processor considered here has no logical address in its directory which matches the logical address i of the requested line of data or - if such a match should occur - the corresponding data is not valid which is indicated by the V-bit which equals logically 0. In this case no action or change of the status information is required by this snooping processor.
- the directory of the snooping processor considered here holds a logical address which matches the logical address i of the requested data.
- the corresponding line of data has the following status information:
- The V-bit is 1, the MC-bit is 0 and the C-bit is also 0.
- This means that the snooping processor has a valid copy of the requested line of data, that this line of data is unchanged with respect to the corresponding data of the same logical address i being stored in the main memory 2, and that there are no further copies in other snooping processors, which is indicated by the MC-bit.
- The requested line of data is outputted by the main memory 2 or by the shared cache memory 3 onto the system bus 1 to be inputted to the private cache of the requesting processor P1.
- the requesting processor and the snooping processor hold a valid copy of the line of data having the logical address i. This is reflected by the multiple copy bit (MC-bit) .
- The shared cache memory 3 will set its MC-bit to 1 if not already done by a previous bus operation.
- The multiple copy bit is set to logically 1 in the line of the storage space 7 of the processor P1 which belongs to the storage line in the storage array 6 into which the requested data is stored.
- The status information as indicated by the V-, MC- and C-bits is 1,1,0 in the requesting processor and the snooping processor in which the hit situation occurred.
- Example 3 differs from example 2 in that the multiple copy bit initially equals logically 1 in the snooping processor. This means that there is more than one copy in the private cache memories of the processors. In this case no action has to be performed by the snooping processor considered here.
- The initial state of the status information in the snooping processor is 1,0,1, i.e. the snooping processor holds a valid copy of the requested data but this copy has been changed previously so that the C-bit is logically 1.
- the snooping processor has to cast-out the changed line of data. Therefore the snooping processor has to perform the following actions:
- the changed line of data having the address i is outputted by the snooping processor from its private cache to the memory 2 via the system bus 1 in order to update the corresponding data in the memory 2.
- The snooping processor has to switch off the corresponding change bit since the line of data which is stored in the private cache of the snooping processor now equals the data having the same logical address i which is stored in the memory 2.
- The snooping processor switches on its multiple copy bit to be logically 1. This is because the changed data which is outputted from the private cache of the snooping processor in order to update the main memory 2 is inputted into the private cache of the requesting processor P1.
- The corresponding multiple copy bit in the storage space 7 of the private cache 4 of the requesting processor P1 is also switched on to be logically 1. Additionally, the changed data is put into the shared cache memory and its V-bit and MC-bit are set to 1.
- the requesting processor requires the line of data having the address i because the requesting processor wants to store data into the line of data having the logical address i. This is called a "line fetch due to store".
- Example 5 does not differ from example 1 as regards the status information of the requesting and the snooping processor, apart from the fact that the change bit belonging to the line of data having the address i which is inputted into the private cache 4 of the processor P1 is set to be logically 1. This is because the processor P1 intends to store new data into this line of data having the address i.
- The multiple copy bit equals logically 1 in the initial state of the snooping processor. For the same reason as in example 6, this line of data is invalidated in the private cache of the snooping processor.
- The private cache of the snooping processor initially holds a valid, changed line of data having the address i. This situation is analogous to the initial situation in example 4. Again the snooping processor performs a cast-out action in order to update the main memory 2. The updated data is also inputted into the private cache 4 of the processor P1 via the system bus 1, as is the case in example 4.
- The line of data having the address i is invalidated in the private cache of the snooping processor because this data is going to be changed by the requesting processor. Again this is reflected by the corresponding status information in the private cache 4 of the requesting processor P1 where the V- and C-bits are logically 1 and the MC-bit is logically 0.
- This status information indicates that the line of data which is inputted in the private cache 4 of the processor P1 is exclusive to the processor P1.
- the shared cache memory will also set its V-Bit to 0.
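- The eight examples described above can be summarized in a short sketch. It considers a single snooping processor, as the examples do, and represents status information as (V, MC, C) tuples with `None` standing for a "don't care" bit. The function name `snoop` and this representation are assumptions made for illustration; the shared cache memory's own bits are not modeled here.

```python
def snoop(snooper, for_store):
    """Return (new_snooper_bits, requester_bits, cast_out).

    snooper is the snooping processor's (V, MC, C) tuple for the
    requested line; None stands for a don't-care bit.
    for_store selects "line fetch due to store" instead of "due to fetch".
    """
    v, mc, c = snooper
    hit = v == 1                       # address match on a valid line
    cast_out = hit and c == 1          # changed copy must update main memory
    if for_store:
        # The requester is going to change the line:
        # a valid copy in the snooper is invalidated.
        new_snooper = (0, None, None) if hit else snooper
        requester = (1, 0, 1)
    else:
        # Read reference: a hit leaves two valid, unchanged copies.
        new_snooper = (1, 1, 0) if hit else snooper
        requester = (1, 1, 0) if hit else (1, 0, 0)
    return new_snooper, requester, cast_out
```

- For instance, `snoop((1, 0, 1), False)` corresponds to example 4: the snooper casts out its changed line and both processors end in the state 1,1,0.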
- Fig.2 shows an example of the bus snooping timing for a bus operation like line fetch. It is assumed that one of the processors of the multiprocessor system - for example processor P1 - has a cache miss in its private cache memory and requires a line of data having the address i to be inputted into its private cache memory. Therefore the requesting processor puts a corresponding command Cmd on the system bus 1 in order to communicate its request to the other bus participants. The requesting processor also outputs the address i of the required line of data onto the system bus 1 in order to specify which line of data is needed.
- the fetch command Cmd as well as the address i Adr are outputted onto the system bus 1 concurrently in cycle 1. by the requesting processor. Concurrently therewith the requesting processor raises the signal line 16 in order to "wake up" the other processors of the system as well as the shared cache memory 3. This is to initiate the bus snooping of the other processors and of the shared cache memory 3. In the following it is assumed that the shared cache memory 3 does not have a valid copy of the requested data so that only the signals relating to the private caches of the snooping processors are shown in Fig.2.
- the signal Cmd/Adr which is outputted by the requesting processor onto the system bus 1 propagates as an electromagnetic wave along the system bus 1. Due to the electrical characteristics of the system bus 1 the signal Cmd/Adr remains on the system bus 1 for two cycles. This is necessary to support a near end reception from processor to processor.
- A snooping processor which is situated at the near end - or in other words, a snooping processor neighboring the requesting processor - can only receive the signal Cmd/Adr when the reflected electromagnetic wave comes back from the far end of the system bus 1. This is because the voltage level of the signal Cmd/Adr as it is issued by the requesting processor is only half a full voltage swing from logically 0 to logically 1.
- the timing which is shown in Fig.2 belongs to a snooping processor at the near end, for example processor P2.
- the near end snooping processor receives the signal Cmd/Adr in the cycle 3. in its bus-in register (Bus-in Reg) . This reception of the signal Cmd/Adr is initiated by the signal CmdSel on the signal line 16.
- a directory search is carried out in the directory of the private cache of the snooping processor. This is to check whether there is a match of a logical address stored in the directory and of the logical address i of the requested line of data. If such an address match is found in the directory and if the valid bit (V-bit) of the corresponding storage line in the private cache of the snooping processor considered here is logically 1 there is a hit situation. This situation is signaled to the other processors and to the shared cache memory 3 via the signal line 17.
- the snooping processor which has found the hit in the directory of its private cache memory issues a signal Hit-out in the cycle 4.
- the signal Hit-out is received by the requesting processor one cycle later in the cycle 5.
- The received signal is named Hit-in. The reception of the signal Hit-in informs the requesting processor about the existence of a copy of the requested line of data in the private cache memory of at least one of the snooping processors.
- the signal Hit-in can only be received by the requesting processor during the "Hit-window". In this case the Hit-window is predefined as the cycle 5.
- the resulting actions of both processors are described in the above table 1.
- the snooping processor considered here has just to turn on the corresponding multiple copy bit to logically 1. In this case the signal Hit-out is only issued for one cycle, i.e. cycle 4., by the snooping processor on the signal line 17. This is the situation shown in Fig.2.
- The shared cache memory 3 of the multiprocessor system shown in Fig. 1 operates in a "store thru mode". This means that an update of a line of data in the main memory 2 is written into the shared cache memory 3 concurrently with the update of the main memory 2.
- The shared cache memory 3 has a much shorter access time than the main memory 2, which contributes to a more efficient usage of the system bus and thereby to a faster operation of the multiprocessor system.
- the valid and multiple copy bits which are stored in the storage space 13 of the shared cache memory 3 are controlled in the same way as the corresponding status bits in the private caches according to the MESI-protocol. If the valid bit (V-bit) of a line of data which is stored in the storage array 12 is logically 1 this indicates that this line of data is a valid copy of the same line of data having the same address i which is stored in the main memory 2.
- If the multiple copy bit (MC-bit) of a valid line of data having the address i is logically 1, this indicates that there is a valid copy of this line of data in at least two of the private caches of two different processors of the multiprocessor system.
- the provision of the multiple copy bit in the storage space 13 of the shared cache memory 3 is essential for a further improvement of the usage of the system bus 1.
- Processor P1 requires a line of data having the address i.
- This line of data is not present in the private cache 4 of the processor P1 so that there is a cache miss in the private cache 4.
- The processor P1 puts a line fetch command together with the address i Cmd/Adr on the system bus 1 (cf. Fig. 1).
- The requested line of data is stored neither in any of the private caches of the other processors nor in the storage array 12 of the shared cache memory 3. This means that the requested line of data is only present in the main memory 2 of the multiprocessor system.
- The requested line of data is outputted from the main memory 2 onto the system bus 1 and inputted into the private cache 4 of the requesting processor P1.
- the requested line of data is also inputted into the shared cache memory 3 and stored in one of the storage lines of the storage array 12.
- This storage operation in the shared cache memory 3 is carried out under the control of the logic 15.
- The valid, multiple copy and change bits of the requested line of data which is now stored in the storage array 6 of the processor P1 are 1,0,0, respectively.
- The valid bit of the requested line of data which is stored in the storage array 12 of the shared cache memory 3 is also logically 1 and the corresponding multiple copy bit is also logically 0.
- The valid and the multiple copy bit of the requested line of data which is now stored in the storage array 12 of the shared cache memory 3 are stored in the storage fields of the storage space 13 which belong to the storage line in the storage array 12 in which the requested line of data is now stored.
- Next, a second request for the same line of data having the address i comes from another processor, for example processor P2.
- The second request is also due to a cache miss, in this case in the private cache 5 of the requesting processor P2.
- The requesting processor P2 outputs a fetch command and the corresponding address i (Cmd/Adr) onto the system bus 1.
- This causes the processor P1 to signal a Hit-situation (cf. Fig. 2).
- The hit is signaled by the processor P1 to the other processors and to the shared cache memory 3 via the signal line 17.
- The shared cache memory 3 also has a hit, because the requested line of data having the address i is stored in the storage array 12 and is valid.
- The shared cache memory 3 waits until the window cycle (cf. cycle 5 in Fig. 2) is over. If no Hit-out signal remains active for more than one cycle, this means that there is no cast-out situation. This is the case in the example considered here, because the processor P1 has not changed the valid copy of the line of data having the address i which was input via the system bus 1 from the main memory 2.
- The Hit-situation in the private cache 4 of the processor P1 only causes the multiple copy bit in the storage space 7, which belongs to the storage line of the storage array 6 in which the line of data having the address i is stored, to be set to logically 1.
- The shared cache memory 3 outputs the requested line of data onto the system bus 1. This is carried out under the control of the logic 15.
- The requested line of data is input into the private cache 5 of the requesting processor P2.
- The multiple copy bit in the storage field of the storage space 13 which belongs to the line of the storage array 12 in which the requested line of data is stored is set to logically 1 under the control of the logic 15. This is because there is now a valid copy of this line of data in both of the private caches 4 and 5 of the processors P1 and P2.
- The corresponding multiple copy bit in the storage space 9 of the requesting processor P2 is turned on as well.
- Next, a third processor, for example processor P3, which is not shown in Fig. 1, requests the same line of data having the address i. Again this causes a Hit-situation in the shared cache memory 3, which has stored the requested line of data in its storage array 12.
- The status information stored for this line of data in the storage space 13 indicates that the line of data stored in the storage array 12 is valid and that there are two or more valid copies of this line of data in different private caches of different processors in the system.
- This situation can only occur if the requested line of data is not exclusive to one of the processors. A line of data that is exclusive to one processor while its multiple copy bit indicates further copies is not allowed, because in this case the consistency of the data stored in the different cache memories could no longer be maintained.
- The MESI-protocol, which is employed in this preferred embodiment of the invention to maintain data consistency, rules out that there is an exclusive line of data having the address i in one of the private caches of the processors while the multiple copy bit for this line of data is logically 1 in at least another one of the private caches or in the storage space 13 of the shared cache memory 3. Since, by definition, the line of data having the address i which is requested by the processor P3 is not an exclusive line of data of any of the processors, because the corresponding multiple copy bit in the storage space 13 of the shared cache memory 3 is logically 1, no cast-out situation can occur.
- The shared cache memory 3 therefore does not have to wait until the window cycle (cf. cycle 5 in Fig. 2) is over. Hence, the shared cache memory 3 outputs the requested line of data onto the system bus 1 directly after receiving the request, and from there it is input into the private cache of the requesting processor P3.
- In a further example, the processor P1 requests a line of data having the address j, where the requested data are instructions.
- The corresponding command which is issued by the processor is called "line fetch due to instruction fetch".
- This command informs the other bus participants that the requested line fetch is carried out in order to provide a line of instructions having the address j to the requesting processor P1.
- The requested line of data is output by the main memory 2 onto the system bus 1 and input both into the private cache 4 of the requesting processor P1 and into the shared cache memory 3.
- The multiple copy bit for this line of data having the address j is turned on immediately in the storage space 13, even though there is no further copy of this line of data in any of the private caches of the other processors. This is because it is relatively unlikely or even impossible that this line of data is going to be changed by the requesting processor P1. As a consequence it is assumed that the requested line of data will not be an exclusive line of data for the processor P1. This makes it possible to set the multiple copy bit of the requested line of data having the address j in the storage space 13 of the shared cache memory 3 to logically 1 already at the first request for this line by one of the processors, in this example by the processor P1. Consequently, the MC-bit in the private cache of the requesting processor P1 must also be set to 1.
- Before it stores into a line of data, the processor P1 has to issue a command on the system bus 1 so that the corresponding line of data which is stored in the storage array 12 of the shared cache memory 3 is invalidated prior to the storage operation of the processor P1.
- The invalidation of this line of data in the shared cache memory 3 is carried out under the control of the logic 15.
- The signal diagram shown in Fig. 3 illustrates the enhancement of the performance of the multiprocessor system which is due to the shared cache memory 3.
- The situation shown in Fig. 3 corresponds to the situation shown in Fig. 2 with respect to the requesting processor and the snooping processors.
- In addition, the signals of the shared cache memory 3 ("shared cache") are illustrated in Fig. 3.
- The shared cache memory 3 is placed at the far end of the system bus 1. Since the far end of the bus 1 has a high impedance, a command on the bus (Cmd/Adr) is received by the shared cache memory 3 as soon as the corresponding electromagnetic signal reaches the far end. As a consequence, the time for the reception of the signal is shorter for the shared cache memory 3 than for a near-end processor, which has to wait for the reflection of the electromagnetic signal, as already explained above.
- The electromagnetic signal from the requesting processor (Cmd/Adr) is already received in cycle 2 in the bus-in register of the shared cache memory 3. This initiates a directory search in the directory 14 of the shared cache memory 3. In the example considered here there is an address match in the directory 14. Furthermore, it is assumed that the corresponding line of data which is stored in the shared cache memory 3 is valid and that the corresponding multiple copy bit of this line of data is logically 1.
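
The bookkeeping of the valid and multiple copy bits described above can be summarised in a small software model. The following Python sketch is an illustration only and is not part of the disclosed hardware: the names `LineStatus`, `SharedCache`, `fetch` and `invalidate` are hypothetical, the private caches are reduced to dictionaries, and cast-out handling (a changed line held in a single private cache) is deliberately left out. It shows when the shared cache has to wait for the window cycle and when the MC-bit lets it answer at once.

```python
from dataclasses import dataclass, field


@dataclass
class LineStatus:
    """Status bits kept alongside one stored line of data."""
    valid: bool = False
    multiple_copy: bool = False  # MC-bit
    changed: bool = False        # change bit; only meaningful in a private cache


@dataclass
class SharedCache:
    """Hypothetical model of storage space 13 plus control logic 15."""
    status: dict = field(default_factory=dict)  # address -> LineStatus

    def fetch(self, address, requester, private, instruction_fetch=False):
        """Serve a line fetch and return (source, waited_for_window).

        `private` maps a processor name to its {address: LineStatus} dict.
        Cast-out situations are not modelled in this sketch.
        """
        entry = self.status.get(address)
        if entry is None or not entry.valid:
            # Miss in the shared cache: main memory supplies the line, which is
            # stored in the requester's private cache and in the shared cache.
            # An instruction fetch sets the MC-bit at once, because the line is
            # not expected to become exclusive to the requester.
            self.status[address] = LineStatus(True, instruction_fetch)
            private[requester][address] = LineStatus(True, instruction_fetch)
            return "main memory", False

        # Hit in the shared cache. With MC-bit 0 a single private cache might
        # hold a changed copy, so the shared cache waits for the window cycle.
        waited = not entry.multiple_copy
        entry.multiple_copy = True
        for name, lines in private.items():
            if name != requester and address in lines and lines[address].valid:
                lines[address].multiple_copy = True
        private[requester][address] = LineStatus(True, True)
        return "shared cache", waited

    def invalidate(self, address):
        """Invalidate the shared copy before a processor stores into the line."""
        if address in self.status:
            self.status[address].valid = False


# Replay the three requests for address i from the description.
private = {"P1": {}, "P2": {}, "P3": {}}
shared = SharedCache()
for cpu in ("P1", "P2", "P3"):
    print(cpu, shared.fetch("i", cpu, private))
```

In this model the request of P1 is served from main memory, the request of P2 is served from the shared cache after the window cycle, and the request of P3 is served from the shared cache without waiting, which is the bus-usage improvement attributed to the MC-bit.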
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention relates to a cache memory device intended for use as a shared cache memory (3) in a multiprocessor system. The cache memory device (3) is organized in a plurality of memory blocks and has storage means (13) for storing status information indicative of valid or shared data of the memory blocks. The cache memory (3) further comprises logic means (15) for selectively storing data in one of the memory blocks and for setting the status information accordingly.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP9506194A JPH10501914A (ja) | 1995-07-19 | 1995-07-19 | Shared cache memory device |
PCT/EP1995/002847 WO1997004392A1 (fr) | 1995-07-19 | 1995-07-19 | Shared cache memory device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP1995/002847 WO1997004392A1 (fr) | 1995-07-19 | 1995-07-19 | Shared cache memory device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997004392A1 true WO1997004392A1 (fr) | 1997-02-06 |
Family
ID=8166060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP1995/002847 WO1997004392A1 (fr) | 1995-07-19 | 1995-07-19 | Shared cache memory device |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPH10501914A (fr) |
WO (1) | WO1997004392A1 (fr) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005088457A2 (fr) * | 2004-03-10 | 2005-09-22 | Intel Corporation | Method and system to order memory operations |
WO2006031414A2 (fr) * | 2004-09-09 | 2006-03-23 | Intel Corporation | Resolving cache conflicts |
US9904734B2 (en) | 2013-10-07 | 2018-02-27 | Apdn (B.V.I.) Inc. | Multimode image and spectral reader |
US9963740B2 (en) | 2013-03-07 | 2018-05-08 | APDN (B.V.I.), Inc. | Method and device for marking articles |
US10047282B2 (en) | 2014-03-18 | 2018-08-14 | Apdn (B.V.I.) Inc. | Encrypted optical markers for security applications |
US10519605B2 (en) | 2016-04-11 | 2019-12-31 | APDN (B.V.I.), Inc. | Method of marking cellulosic products |
US10741034B2 (en) | 2006-05-19 | 2020-08-11 | Apdn (B.V.I.) Inc. | Security system and method of marking an inventory item and/or person in the vicinity |
US10745825B2 (en) | 2014-03-18 | 2020-08-18 | Apdn (B.V.I.) Inc. | Encrypted optical markers for security applications |
US10920274B2 (en) | 2017-02-21 | 2021-02-16 | Apdn (B.V.I.) Inc. | Nucleic acid coated submicron particles for authentication |
US10995371B2 (en) | 2016-10-13 | 2021-05-04 | Apdn (B.V.I.) Inc. | Composition and method of DNA marking elastomeric material |
US11675705B2 (en) * | 2019-02-28 | 2023-06-13 | Micron Technology, Inc. | Eviction of a cache line based on a modification of a sector of the cache line |
US11914520B2 (en) | 2019-02-28 | 2024-02-27 | Micron Technology, Inc. | Separate read-only cache and write-read cache in a memory sub-system |
US12007917B2 (en) | 2019-02-28 | 2024-06-11 | Micron Technology, Inc. | Priority scheduling in queues to access cache data in a memory sub-system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0481233A2 (fr) * | 1990-10-12 | 1992-04-22 | International Business Machines Corporation | Multiple level caches |
EP0608638A1 (fr) * | 1993-01-29 | 1994-08-03 | International Business Machines Corporation | Method and system for increasing the efficiency of memory data transfers to multiple processors in a data processing system |
-
1995
- 1995-07-19 WO PCT/EP1995/002847 patent/WO1997004392A1/fr active Application Filing
- 1995-07-19 JP JP9506194A patent/JPH10501914A/ja not_active Withdrawn
Non-Patent Citations (2)
Title |
---|
ANONYMOUS: "Second Level Cache for MP Systems", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 27, no. 1A, NEW YORK, US, pages 298 - 300 * |
DOETTLING: "Data consistency in a multiprocessor system with 'store in' cache concept", MICROPROCESSOR AND MICROPROGRAMMING 32, NORTH-HOLLAND, vol. 32, no. 1-5, pages 215 - 220 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005088457A3 (fr) * | 2004-03-10 | 2005-12-15 | Intel Corp | Method and system to order memory operations |
US7240168B2 (en) | 2004-03-10 | 2007-07-03 | Intel Corporation | Method and system to order memory operations |
WO2005088457A2 (fr) * | 2004-03-10 | 2005-09-22 | Intel Corporation | Method and system to order memory operations |
US10078592B2 (en) | 2004-09-09 | 2018-09-18 | Intel Corporation | Resolving multi-core shared cache access conflicts |
WO2006031414A2 (fr) * | 2004-09-09 | 2006-03-23 | Intel Corporation | Resolving cache conflicts |
WO2006031414A3 (fr) * | 2004-09-09 | 2007-01-25 | Intel Corp | Resolving cache conflicts |
US9727468B2 (en) | 2004-09-09 | 2017-08-08 | Intel Corporation | Resolving multi-core shared cache access conflicts |
US10741034B2 (en) | 2006-05-19 | 2020-08-11 | Apdn (B.V.I.) Inc. | Security system and method of marking an inventory item and/or person in the vicinity |
US9963740B2 (en) | 2013-03-07 | 2018-05-08 | APDN (B.V.I.), Inc. | Method and device for marking articles |
US10282480B2 (en) | 2013-10-07 | 2019-05-07 | Apdn (B.V.I) | Multimode image and spectral reader |
US9904734B2 (en) | 2013-10-07 | 2018-02-27 | Apdn (B.V.I.) Inc. | Multimode image and spectral reader |
US10047282B2 (en) | 2014-03-18 | 2018-08-14 | Apdn (B.V.I.) Inc. | Encrypted optical markers for security applications |
US10745825B2 (en) | 2014-03-18 | 2020-08-18 | Apdn (B.V.I.) Inc. | Encrypted optical markers for security applications |
US10519605B2 (en) | 2016-04-11 | 2019-12-31 | APDN (B.V.I.), Inc. | Method of marking cellulosic products |
US10995371B2 (en) | 2016-10-13 | 2021-05-04 | Apdn (B.V.I.) Inc. | Composition and method of DNA marking elastomeric material |
US10920274B2 (en) | 2017-02-21 | 2021-02-16 | Apdn (B.V.I.) Inc. | Nucleic acid coated submicron particles for authentication |
US11675705B2 (en) * | 2019-02-28 | 2023-06-13 | Micron Technology, Inc. | Eviction of a cache line based on a modification of a sector of the cache line |
US11914520B2 (en) | 2019-02-28 | 2024-02-27 | Micron Technology, Inc. | Separate read-only cache and write-read cache in a memory sub-system |
US12007917B2 (en) | 2019-02-28 | 2024-06-11 | Micron Technology, Inc. | Priority scheduling in queues to access cache data in a memory sub-system |
Also Published As
Publication number | Publication date |
---|---|
JPH10501914A (ja) | 1998-02-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR100567099B1 (ko) | Method and apparatus for facilitating speculative stores in a multiprocessor system using an L2 directory | |
EP1399823B1 (fr) | Using an L2 directory to facilitate speculative loads in a multiprocessor system | |
US5249284A (en) | Method and system for maintaining data coherency between main and cache memories | |
US6272602B1 (en) | Multiprocessing system employing pending tags to maintain cache coherence | |
US6718839B2 (en) | Method and apparatus for facilitating speculative loads in a multiprocessor system | |
US5913021A (en) | Memory state recovering apparatus | |
JPH07281955A (ja) | Snoop circuit for a multiprocessor system | |
EP0303648B1 (fr) | Central processing unit for a digital computer system including a cache memory management mechanism | |
US5918069A (en) | System for simultaneously writing back cached data via first bus and transferring cached data to second bus when read request is cached and dirty | |
US6922755B1 (en) | Directory tree multinode computer system | |
WO1997004392A1 (fr) | Shared cache memory device | |
US7024520B2 (en) | System and method enabling efficient cache line reuse in a computer system | |
US5737568A (en) | Method and apparatus to control cache memory in multiprocessor system utilizing a shared memory | |
KR100322223B1 (ko) | Memory controller having a queue and a snoop table | |
US5748938A (en) | System and method for maintaining coherency of information transferred between multiple devices | |
US7464227B2 (en) | Method and apparatus for supporting opportunistic sharing in coherent multiprocessors | |
US9672153B2 (en) | Memory interface control | |
JPS60237553A (ja) | Cache coherence system | |
JPH03230238A (ja) | Cache memory control system | |
KR100258883B1 (ko) | Method and apparatus for controlling cache memory in a multiprocessor system | |
JP3340047B2 (ja) | Multiprocessor system and duplicate tag control method | |
JP3081635B2 (ja) | Cache memory invalidation processing apparatus and invalidation control method | |
JPH03172943A (ja) | Cache memory control system | |
KR0145454B1 (ko) | Multiprocessor having distributed shared memory | |
KR970004520B1 (ko) | High-speed memory control method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase | Ref country code: US; Ref document number: 1997 765988; Date of ref document: 19970110; Kind code of ref document: A; Format of ref document f/p: F |
AK | Designated states | Kind code of ref document: A1; Designated state(s): JP US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
122 | Ep: pct application non-entry in european phase | |