US20050027946A1 - Methods and apparatus for filtering a cache snoop - Google Patents
- Publication number
- US20050027946A1 (application Ser. No. 10/630,465)
- Authority
- US
- United States
- Prior art keywords
- cache
- snoop
- state
- cache line
- probe
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
- G06F12/0822—Copy directories
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present disclosure pertains to cache memory and, more particularly, to methods and an apparatus for filtering a cache snoop.
- a multi-processor system typically includes a plurality of microprocessors, a plurality of associated caches, and a main memory.
- many multi-processor systems use a “write-back” (as opposed to a “write-through”) policy.
- a “write-back” policy is a cache procedure whereby a microprocessor may locally modify data in its cache without updating the main memory until the cache data needs to be replaced.
- In order to maintain cache coherency in such a system, a cache coherency protocol is used.
- One well known cache coherency protocol is the MESI (modified-exclusive-shared-invalid) cache coherency protocol.
- a cache designed for the MESI protocol uses three bits to encode five states. The five states are modified, exclusive, shared, invalid, and pending.
- many of these cache coherency protocols allow a first cache that is holding locally modified data (i.e., “dirty” data) to directly supply a second cache that is requesting the same cache line (i.e., memory block) without updating main memory.
- the first cache then puts its cache line in an “owned” state to indicate that the line is “dirty” and shared.
- the “owned” cache line is victimized (e.g., replaced)
- the first cache must write the line back to main memory so that the modifications are not lost. This write-back generates bus traffic to the main memory. Bus traffic increases memory latency and power consumption. Subsequent modifications to the cache line in the second cache will also need to be written back to main memory, thereby generating additional bus traffic.
- FIG. 1 is a block diagram of an example computer system illustrating an environment of use for the disclosed system.
- FIG. 2 is a more detailed block diagram of the example multi-processor module illustrated in FIG. 1 .
- FIG. 3 is a block diagram of an example memory hierarchy.
- FIG. 4 is a state diagram of a MESI cache coherency protocol which may be used by the L1 cache illustrated in FIG. 3 .
- FIG. 5 is a state diagram of an enhanced MESI cache coherency protocol which may be used by the L2 cache illustrated in FIG. 3 .
- FIG. 6 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for victimizing cache lines in the L2 cache illustrated in FIG. 3 .
- FIG. 6 b is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for victimization from an L2 cache when the L2 cache is designed to be inclusive of the L1 cache.
- FIG. 7 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for responding to an internal inquiry at the L1 cache illustrated in FIG. 3 .
- FIG. 8 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for handling snoop probes to the L2 cache illustrated in FIG. 3 .
- FIGS. 9-10 are a flowchart representative of an example process which may be executed by a device to implement an example mechanism for filtering snoops without the use of tags.
- FIGS. 11-12 are a flowchart representative of an example process which may be executed by a device to implement an example mechanism for filtering snoops with the use of tags.
- the disclosed system maintains cache coherency and reduces write-back traffic by using an enhanced MESI cache coherency protocol.
- the enhanced MESI protocol includes the traditional MESI cache states (i.e., modified, exclusive, shared, invalid, and pending) as well as two additional cache states (i.e., enhanced modified and enhanced exclusive).
- a modified cache line is a cache line that is different than main memory.
- an enhanced modified cache line is a cache line that is different than main memory and a copy of the cache line is in another cache.
- an L2 cache holding a cache line in the enhanced modified state “knows” an L1 cache holds the same cache line.
- the L2 cache does not necessarily know the state of the cache line in the L1 cache.
- An exclusive cache line is a cache line that is not modified (i.e., the same as main memory).
- an enhanced exclusive cache line is a cache line that is not modified and a copy of the cache line is in another cache in a modified state. For example, an L2 cache holding a cache line in the enhanced exclusive state “knows” an L1 cache holds the same cache line in the modified state.
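The state set described above can be sketched as a small enumeration. This is an illustrative model, not the patent's encoding; the state names and the two helper predicates are assumptions made for clarity.

```python
from enum import Enum, auto

class L2State(Enum):
    """Hypothetical names for the seven enhanced MESI states."""
    INVALID = auto()
    EXCLUSIVE = auto()
    ENHANCED_EXCLUSIVE = auto()  # clean here, but an L1 copy is modified
    MODIFIED = auto()
    ENHANCED_MODIFIED = auto()   # dirty here, and an L1 copy may exist in any state
    SHARED = auto()
    PENDING = auto()             # memory fill or snoop confirm in progress

def is_dirty(state: L2State) -> bool:
    # Only the modified variants differ from main memory.
    return state in (L2State.MODIFIED, L2State.ENHANCED_MODIFIED)

def l1_may_hold_copy(state: L2State) -> bool:
    # The enhanced states record that another (e.g., L1) cache holds the line.
    return state in (L2State.ENHANCED_EXCLUSIVE, L2State.ENHANCED_MODIFIED)
```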
- a cache line may be selected for victimization according to any well known victimization policy. For example, a least recently used victimization policy may be used to select a cache line for victimization. Depending on the state of the selected cache line, an internal inquiry may be issued to other caches and/or a write-back operation may be performed prior to victimizing the selected cache line.
- an L2 cache may issue an inquiry to an L1 cache if the L2 cache is about to victimize a cache line in the enhanced modified state.
- the L1 cache responds to the internal inquiry by invalidating the cache line associated with the internal inquiry or posting a hit-modified signal depending on the current state of the cache line associated with the internal inquiry in the L1 cache. If the L1 cache does not hold the cache line, the L1 cache posts a no-hit signal. Based on how the L1 cache responds to the internal inquiry, the L2 cache may victimize the cache line without performing a write-back to main memory.
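The L1 response logic described above can be sketched as follows for the case where the L2 cache is inclusive of the L1 cache. The function name and the string-valued states and responses are illustrative assumptions.

```python
def l1_inquiry_response(l1_state):
    """Model of the L1's reply to an internal inquiry (inclusive-L2 case).

    l1_state is one of 'M', 'E', 'S', or None (line not present).
    Returns (response, new_l1_state).
    """
    if l1_state == "M":
        # L1 keeps the dirty line and remains responsible for its write-back.
        return "hit-modified", "M"
    if l1_state in ("E", "S"):
        # Invalidate the clean copy and report no hit, so the L2 cache
        # performs the write-back itself before victimizing.
        return "no-hit", None
    return "no-hit", None  # line not held
```

A "hit-modified" response tells the L2 cache that the L1 cache still owns the dirty data and will eventually write it back, so the L2 cache may victimize its copy without a write-back; a "no-hit" response means the L2 copy is the last one, so dirty data must be written back first.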
- snoop probes may be filtered by monitoring a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache and/or the L2 write-back queue 306 , cache hit information is posted and snoop probes may be sent to other caches in an effort to filter snoops and reduce bus traffic. This snoop filtering may be performed with or without the use of tags.
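A rough sketch of this snoop filtering follows. Representing the L2 cache as a dictionary of states and the write-back queue as a set of addresses is an assumption; the codes 'EM' (enhanced modified) and 'EE' (enhanced exclusive) are shorthand for the enhanced states described above.

```python
def filter_snoop(probe_addr, l2_lines, writeback_queue):
    """Return (response, forward_to_l1) for an external snoop probe."""
    if probe_addr in writeback_queue:
        return "hit-modified", False     # dirty line already queued for memory
    state = l2_lines.get(probe_addr)     # None, 'M', 'E', 'S', 'EM', 'EE'
    if state is None:
        return "no-hit", False           # filtered: no need to probe the L1
    if state in ("M", "E", "S"):
        # The L2 state alone answers the probe; the L1 need not be disturbed.
        return ("hit-modified" if state == "M" else "hit"), False
    if state == "EE":
        return "hit-modified", True      # L1 is known to hold the line modified
    return "hit", True                   # 'EM': L1 state unknown, probe the L1
```

The point of the filter is the `False` cases: most probes are answered from L2 state (or the write-back queue) alone, so the L1 caches see fewer snoop probes and interconnect traffic is reduced.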
- FIG. 1 is a block diagram of an example computer system illustrating an environment of use for the disclosed system.
- the computer system 100 may be a personal computer (PC) or any other computing device.
- the computer system 100 includes a main processing unit 102 powered by a power supply 104 .
- the main processing unit 102 may include a multi-processor module 106 electrically coupled by a system interconnect 108 to a main memory device 110 , a flash memory device 112 , and one or more interface circuits 114 .
- the system interconnect 108 is an address/data bus.
- interconnects other than busses may be used to connect the multi-processor module 106 to the other devices 110 - 114 .
- one or more dedicated lines and/or a crossbar may be used to connect the multi-processor module 106 to the other devices 110 - 114 .
- the multi-processor module 106 may include any type of well known processor, such as a processor from the Intel Pentium® family of microprocessors, the Intel Itanium® family of microprocessors, the Intel Centrino® family of microprocessors, and/or the Intel XScale® family of microprocessors.
- the multi-processor module 106 may include any type of well known cache memory, such as static random access memory (SRAM).
- the main memory device 110 may include dynamic random access memory (DRAM) and/or any other form of random access memory.
- the main memory device 110 may include double data rate random access memory (DDRAM).
- the main memory device 110 may also include non-volatile memory.
- the main memory device 110 stores a software program which is executed by the multi-processor module 106 in a well known manner.
- the flash memory device 112 may be any type of flash memory device.
- the flash memory device 112 may store firmware used to boot the computer system 100 .
- the interface circuit(s) 114 may be implemented using any type of well known interface standard, such as an Ethernet interface and/or a Universal Serial Bus (USB) interface.
- One or more input devices 116 may be connected to the interface circuits 114 for entering data and commands into the main processing unit 102 .
- an input device 116 may be a keyboard, mouse, touch screen, track pad, track ball, isopoint, and/or a voice recognition system.
- One or more displays, printers, speakers, and/or other output devices 118 may also be connected to the main processing unit 102 via one or more of the interface circuits 114 .
- the display 118 may be a cathode ray tube (CRT), a liquid crystal display (LCD), or any other type of display.
- the display 118 may generate visual indications of data generated during operation of the main processing unit 102 .
- the visual indications may include prompts for human operator input, calculated values, detected data, etc.
- the computer system 100 may also include one or more storage devices 120 .
- the computer system 100 may include one or more hard drives, a compact disk (CD) drive, a digital versatile disk (DVD) drive, and/or other computer media input/output (I/O) devices.
- the computer system 100 may also exchange data with other devices 122 via a connection to a network 124 .
- the network connection may be any type of network connection, such as an Ethernet connection, digital subscriber line (DSL), telephone line, coaxial cable, etc.
- the network 124 may be any type of network, such as the Internet, a telephone network, a cable network, and/or a wireless network.
- the network devices 122 may be any type of network device.
- the network device 122 may be a client, a server, a hard drive, etc.
- FIG. 2 is a more detailed block diagram of the example multi-processor module 106 illustrated in FIG. 1 .
- the multi-processor module 106 shown includes one or more processing cores 202 and one or more caches 204 electrically coupled by an interconnect 206 .
- the processor(s) 202 and/or the cache(s) 204 communicate with the main memory 110 over the system interconnect 108 via a memory controller 208 .
- Each processor 202 may be implemented by any type of processor, such as an Intel XScale® processor.
- Each cache 204 may be constructed using any type of memory, such as static random access memory (SRAM).
- the interconnect 206 may be any type of interconnect such as a bus, one or more dedicated lines, and/or a crossbar.
- Each of the components of the multi-processor module 106 may be on the same chip or on separate chips.
- the main memory 110 may reside on a separate chip.
- when activity on the system interconnect 108 is reduced, computational performance is increased and power consumption is reduced. For example, avoiding unnecessary write-backs from a cache 204 to the main memory 110 increases the overall efficiency of the multi-processor module 106 .
- FIG. 3 is a block diagram of an example memory hierarchy 300 .
- memory elements that are “closer” to the processor 202 in the memory hierarchy 300 are faster than memory elements that are “farther” from the processor 202 in the memory hierarchy 300 .
- closer memory elements are used for potentially frequent operations, and closer memory elements are checked first when the processor 202 executes a memory operation (e.g., a read or a write).
- Closer memory elements are typically constructed using faster memory technologies. However, faster memory technologies are typically more expensive than slower memory technologies. Accordingly, close memory elements are typically smaller than distant memory elements. Although four levels of memory are shown in FIG. 3 , persons of ordinary skill in the art will readily appreciate that more or fewer levels of memory may alternatively be used.
- when the processor 202 issues a memory request, the request is passed to an L0 cache 302 .
- the L0 cache 302 is internal to the processor 202 .
- the L0 cache 302 may be external to the processor 202 . If the L0 cache 302 holds the requested memory in a state that is compatible with the memory request (e.g., a write request is made and the L0 cache holds the memory in an “exclusive” state), the L0 cache 302 fulfills the memory request (i.e., an L0 cache hit). If the L0 cache 302 does not hold the requested memory (i.e., an L0 cache miss), the memory request is passed on to an L1 cache 204 a which is typically external to the processor 202 , but may be internal to the processor 202 .
- if the L1 cache 204 a holds the requested memory in a state that is compatible with the memory request, the L1 cache 204 a fulfills the memory request (i.e., an L1 cache hit). In addition, the requested memory may be moved up from the L1 cache 204 a to the L0 cache 302 . If the L1 cache 204 a does not hold the requested memory (i.e., an L1 cache miss), the memory request is passed on to an L2 cache 204 b which is typically external to the processor 202 , but may be internal to the processor 202 .
- if the L2 cache 204 b holds the requested memory in a state that is compatible with the memory request, the L2 cache 204 b fulfills the memory request (i.e., an L2 cache hit).
- the requested memory may be moved up from the L2 cache 204 b to the L1 cache 204 a and/or the L0 cache 302 .
- if the L2 cache 204 b does not hold the requested memory (i.e., an L2 cache miss), the memory request is passed on to the main memory 110 , and the main memory 110 fulfills the memory request.
- the requested memory may be moved up from the main memory 110 to the L2 cache 204 b , the L1 cache 204 a , and/or the L0 cache 302 .
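The lookup cascade described above (L0, then L1, then L2, then main memory, with hits promoted to the closer levels) can be modeled with ordinary dictionaries standing in for the caches. This is an illustrative sketch of the search order, not the hardware mechanism, and it ignores cache states and capacity.

```python
def lookup(addr, levels, main_memory):
    """levels is an ordered list of dicts, closest first: [L0, L1, L2]."""
    for i, cache in enumerate(levels):
        if addr in cache:
            value = cache[addr]
            for closer in levels[:i]:    # move the line up the hierarchy
                closer[addr] = value
            return value, f"L{i} hit"
    value = main_memory[addr]            # every level missed
    for cache in levels:                 # fill the line into each level
        cache[addr] = value
    return value, "memory"
```

After a first access hits in a distant level, a repeated access to the same address hits in L0, which is the point of keeping frequently used lines close to the processor.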
- an intermediate structure may be used.
- an L1L2 queue 304 may be used to move a cache line from the L1 cache 204 a to the L2 cache 204 b when the cache line is being victimized in the L1 cache 204 a .
- a write-back queue 306 may be used to move a cache line from the L2 cache 204 b to the main memory 110 when the cache line is being victimized in the L2 cache 204 b.
- FIG. 4 is a state diagram of a MESI cache coherency protocol 400 which may be used by the L1 cache 204 a illustrated in FIG. 3 .
- This protocol 400 includes five states. Specifically, the MESI cache coherency protocol 400 includes an invalid state 402 , an exclusive state 404 , a modified state 406 , a shared state 408 , and a pending state (not shown).
- a person of ordinary skill in the art will readily appreciate that other cache coherency protocols may be used without departing from the scope or spirit of the disclosed system.
- Each cache line in the cache 204 a using the MESI protocol 400 is associated with one of these states.
- the state of a cache line is recorded in a cache directory.
- the state of a cache line is recorded in a tag associated with the cache line.
- the state of a cache line may be changed by retagging the cache line.
- retagging a cache line from “exclusive” to “shared” may be accomplished by changing a tag associated with the cache line from “001” to “010.”
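The retagging example above can be expressed with a small encoding table. Only the "exclusive" ("001") and "shared" ("010") codes come from the text; the remaining bit patterns are hypothetical.

```python
# Hypothetical 3-bit state tag encoding; '001' and '010' follow the
# retagging example in the text, the other codes are assumptions.
STATE_BITS = {
    "invalid": "000",
    "exclusive": "001",
    "shared": "010",
    "modified": "011",
    "pending": "100",
}
BITS_STATE = {bits: name for name, bits in STATE_BITS.items()}

def retag(tag_bits, new_state):
    """Change a cache line's state by rewriting its 3-bit state tag."""
    assert tag_bits in BITS_STATE, "unknown tag"
    return STATE_BITS[new_state]
```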
- a person of ordinary skill in the art will readily appreciate that any method of storing and changing a cache line state may be used without departing from the scope or spirit of the disclosed system.
- An “invalid” cache line is a cache line that does not contain useful data (i.e., the cache line is effectively empty).
- An “exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110 ).
- a “modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110 ) (e.g., a new value was written to the cache copy, but not to main memory's copy).
- a “shared” cache line is a cache line that is held by more than one cache. In some implementations, an “owned” state is added or combined with another state.
- An “owned” cache line is a cache line that is “modified” and “shared” (i.e., “dirty” and held by another cache).
- the “owner” of a cache line is responsible for eventually updating main memory 110 with the modified value (i.e., the “owner” is responsible for performing the write-back).
- a “pending” cache line is a cache line that is associated with a memory fill in progress or a snoop confirm in progress.
- a write-back of an exclusive or shared line is merely a notification on the system bus of victimization from the processor caches.
- a write-back of an exclusive or shared line does not involve any data transfer.
- WM: write-miss
- RME: read-miss-exclusive
- RMS: read-miss-shared
- RH: read-hit
- WH: write-hit
- SHR: snoop-hit-on-read
- SHW: snoop-hit-on-write/read-with-intent-to-modify
- a write-miss (WM) event is caused by the multi-processor module 106 attempting to write a cache line to a cache 204 a when that cache 204 a does not hold the cache line.
- a read-miss-exclusive (RME) event is caused by the multi-processor module 106 attempting to read a cache line from one cache 204 a when that cache 204 a does not hold the cache line, and no other cache 204 currently holds the cache line.
- a read-miss-shared (RMS) event is caused by the multi-processor module 106 attempting to read a cache line from one cache 204 a when that cache 204 a does not hold the cache line, but another cache 204 does hold the cache line in the shared state.
- a read-hit (RH) event is caused by the multi-processor module 106 attempting to read a cache line from a cache 204 a that holds the cache line. Other caches 204 holding the same line (but not supplying the line) see such a read as a snoop-hit-on-read (SHR) event.
- a write-hit (WH) event is caused by the multi-processor module 106 attempting to write a cache line to a cache 204 a that holds the cache line. Other caches 204 holding the same line see such a write as a snoop-hit-on-write (SHW) event.
- a snoop hit associated with a read operation where there is an intent to modify the data is handled the same way as a snoop-hit-on-write (SHW) event.
- a cache state transition may be associated with an invalidate operation (I), a cache line fill operation (R), a snoop push operation (S), or a read-with-intent-to-modify operation (RM).
- An invalidate operation (I) causes an invalidate signal to be broadcast on the interconnect 206 .
- the invalidate signal causes other caches 204 to place the associated cache line in an invalid (I) state (effectively erasing that line from the cache).
- a cache line fill operation (R) causes a cache line to be read into the cache 204 a from main memory 110 .
- a snoop push operation causes the contents of a cache line to be written back to main memory 110 .
- a read-with-intent-to-modify operation causes other caches 204 to place the associated cache line in an invalid (I) state and causes a cache line to be read into the cache 204 a from main memory 110 .
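Taken together, the events and operations above describe the standard MESI transitions, which can be tabulated as follows. The table reflects the conventional protocol; the exact arcs of FIG. 4 are not reproduced here.

```python
# (state, event) -> (next_state, operation); operations follow the text:
# I = invalidate broadcast, R = cache line fill, S = snoop push,
# RM = read-with-intent-to-modify. None means no bus operation.
MESI = {
    ("I", "WM"):  ("M", "RM"),  # fill with intent to modify; others invalidate
    ("I", "RME"): ("E", "R"),   # fill; no other cache holds the line
    ("I", "RMS"): ("S", "R"),   # fill; another cache holds the line shared
    ("E", "RH"):  ("E", None),
    ("E", "WH"):  ("M", None),
    ("E", "SHR"): ("S", None),
    ("E", "SHW"): ("I", None),
    ("S", "RH"):  ("S", None),
    ("S", "WH"):  ("M", "I"),   # invalidate the other sharers
    ("S", "SHR"): ("S", None),
    ("S", "SHW"): ("I", None),
    ("M", "RH"):  ("M", None),
    ("M", "WH"):  ("M", None),
    ("M", "SHR"): ("S", "S"),   # snoop push: write the dirty line back
    ("M", "SHW"): ("I", "S"),   # snoop push, then invalidate
}

def step(state, event):
    """Apply one MESI event and return (next_state, operation)."""
    return MESI[(state, event)]
```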
- a “tag update” event may be included.
- the tag update event is generated when an L1 cache line changes from the exclusive state 404 to the modified state 406 .
- the tag update event causes a tag update entry to be placed in the tag update queue 304 .
- an entry in the tag update queue causes an L2 cache 204 b using an enhanced MESI cache coherency protocol to transition the cache line to an enhanced exclusive state.
- the enhanced exclusive state denotes that the L1 cache 204 a owns the cache line in the modified state 406 .
- FIG. 5 is a state diagram of an enhanced MESI cache coherency protocol which may be used by the L2 cache 204 b illustrated in FIG. 3 .
- This protocol 500 includes seven states. Specifically, the enhanced MESI cache coherency protocol 500 includes an invalid state 502 , an exclusive state 504 , an enhanced exclusive state 506 , a modified state 508 , an enhanced modified state 510 , a shared state 512 , and a pending state (not shown). In addition to these seven states, an error state 514 is shown for the purpose of illustration.
- Each cache line (i.e., memory block) in the L2 cache 204 b is associated with one of these enhanced MESI protocol states.
- the state of a cache line is recorded in a cache directory.
- the state of a cache line is recorded in a tag associated with the cache line.
- the state of a cache line may be changed by retagging the cache line.
- retagging a cache line from “exclusive” to “shared” may be accomplished by changing a tag associated with the cache line from “001” to “010.”
- a person of ordinary skill in the art will readily appreciate that any method of storing and changing a cache line state may be used without departing from the scope or spirit of the disclosed system.
- An “invalid” cache line is a cache line that does not contain useful data (i.e., the cache line is effectively empty).
- An “exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110 ).
- another cache (e.g., the L1 cache 204 a ) does not own this cache line.
- a “modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110 ).
- another cache (e.g., the L1 cache 204 a ) does not own this cache line.
- a “shared” cache line is a cache line that is held by more than one cache.
- a “pending” cache line is a cache line that is associated with a memory fill in progress or a snoop confirm in progress.
- An “enhanced exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110 ) and a copy of the cache line is in another cache (e.g., an L1 cache 204 a ) in a modified state.
- An “enhanced modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110 ) and a copy of the cache line may be in another cache (e.g., an L1 cache 204 a ) in any state.
- a cache 204 b may transition from one of these states to another of these states.
- These events include an instruction fetch (I-fetch) event, a load event, a load hit event, a load no-hit event, a store event, and an L1 write-back event.
- An instruction fetch event is caused by the multi-processor module 106 attempting to read an instruction from the cache 204 b .
- a load event is caused by the multi-processor module 106 attempting to read data from the cache 204 b regardless of whether the cache 204 b actually holds the desired data.
- a load hit event is caused by the multi-processor module 106 attempting to read data from the cache 204 b when the cache 204 b holds the desired data.
- a load no-hit event is caused by the multi-processor module 106 attempting to read data from the cache 204 b when the cache 204 b does not hold the desired data.
- a store event is caused by the multi-processor module 106 attempting to store data to the cache 204 b .
- An L1 write-back event is caused by the L1 cache 204 a writing a cache line back to main memory.
- a tag update event is caused when an L1 cache line transitions from the exclusive state to the modified state.
- the tag update event causes a tag update entry to be placed in the tag update queue 304 .
- An entry in the tag update queue 304 causes an L2 cache 204 b using an enhanced MESI cache coherency protocol to transition the cache line to the enhanced exclusive state 506 .
- the enhanced exclusive state 506 denotes that the L1 cache 204 a owns the cache line in the modified state 406 .
- the tag update queue 304 facilitates an early MESI update to the L2 cache 204 b .
- the tag update queue 304 is a snoopable structure, since it can provide information about a HITM in the L1 cache 204 a .
- the L1 cache 204 a maintains its cache line in the modified state 406 , since the L1 cache 204 a is responsible for writing-back the modified data.
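The tag update path described above can be sketched as a small producer/consumer pair: the L1 cache enqueues an update when a line goes from exclusive to modified, and the L2 cache drains the queue, moving the corresponding lines to the enhanced exclusive state. Dictionaries stand in for the caches and the function names are assumptions.

```python
from collections import deque

def l1_write_hit(addr, l1, tag_update_queue):
    """An L1 exclusive-to-modified transition generates a tag update event."""
    if l1.get(addr) == "E":
        l1[addr] = "M"
        tag_update_queue.append(addr)   # early MESI update for the L2

def drain_tag_updates(l2, tag_update_queue):
    """The L2 consumes updates, marking each line enhanced exclusive ('EE')."""
    while tag_update_queue:
        addr = tag_update_queue.popleft()
        l2[addr] = "EE"                 # L1 now owns the line modified
```

Because a pending queue entry reveals a modified copy in the L1 cache, the queue itself can be snooped, which is why the text calls it a snoopable structure.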
- FIG. 6 is a flowchart representative of an example process 600 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208 ) to implement an example mechanism for victimizing lines in a cache (e.g., L2 cache 204 b ).
- the illustrated process 600 is embodied in an integrated circuit associated with a cache device 204 b .
- the illustrated process 600 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- the example process 600 handles the victimization of a cache line.
- a victimized cache line is a cache line that is overwritten by another cache line or returned to the invalid state 502 .
- a cache line may be victimized according to any well known victimization policy. For example, a least-recently-used (LRU) policy may be employed to select a cache line for victimization.
- the process 600 determines the current state of the selected cache line. Depending on the current state of the selected cache line, the process 600 may issue an internal inquiry and/or perform a write-back operation prior to victimizing the selected cache line.
- the example process 600 begins by determining if a cache line in an L2 cache 204 b needs to be victimized (block 602 ). For example, if a new cache line is about to be read into the cache 204 b , the cache 204 b may need to remove an existing cache line in order to make room for the new cache line.
- the process 600 selects a cache line for victimization (block 604 ). This selection may be performed according to any well known cache victimization selection process. For example, a least recently used (LRU) process may be used to select a cache line for victimization.
- the process 600 determines what state the selected cache line is in. In the illustrated example, if the L2 cache line is in the pending state (block 606 ), the process 600 loops back to select another cache line (block 604 ). If the L2 cache line is in the shared state 512 (block 608 ), the process 600 issues an internal inquiry (block 610 ). Similarly, if the L2 cache line is in the exclusive state 504 (block 612 ) or the enhanced modified state 510 (block 614 ), the process 600 issues the internal inquiry (block 610 ).
- the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a ).
- a “hit” snoop response means another cache (e.g., the L1 cache 204 a ) also holds the selected cache line (i.e., the cache line that is about to be victimized).
- a “no hit” snoop response means no other cache holds the selected cache line. If the snoop response is a “no hit” (block 616 ), the process 600 writes the selected cache line back to main memory 110 (block 618 ) before the cache line is victimized (block 620 ). If the snoop response is a “hit” (block 616 ), the process 600 may victimize the cache line (block 620 ) without performing the write-back (block 618 ).
- the L1 cache posts a “no hit” if the inquiry is due to an L2 victim in the enhanced modified state 510 and the L1 state is not modified.
- the L1 cache 204 a invalidates its entry if the inquiry is due to an L2 victim in the enhanced modified state 510 and the L1 state is exclusive 404 or shared 408 .
- the L1 cache 204 a need not invalidate an exclusive 404 or shared 408 line upon receiving an inquiry if the L2 cache 204 b is not designed to be inclusive of the L1 cache 204 a .
- the L1 cache 204 a posts a “no hit” if the L1 state is not modified and if the L2 cache is designed to be inclusive of the L1 cache 204 a .
- the L1 cache 204 a should invalidate an exclusive 404 or shared 408 line upon receiving an inquiry if the L2 cache 204 b is designed to be inclusive of the L1 cache 204 a.
- if the L2 cache line is in the modified state 508 , the process 600 may write the selected cache line back to main memory 110 (block 618 ) and victimize the cache line (block 620 ) without the need for the internal inquiry (block 610 ). If the L2 cache line is in the invalid state 502 (block 624 ) or the enhanced exclusive state 506 (block 626 ), the process 600 may victimize the selected cache line without the need for the internal inquiry (block 610 ) or the write-back (block 618 ). If the L2 cache line is not in any of the predefined states, the process 600 may generate an error (block 628 ).
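The decision logic of process 600 can be condensed into a single function mapping the victim's state (and, where an inquiry is issued, the L1's snoop response) to the required actions. The string-valued states and return codes are illustrative; per the text above, a "write-back" of a clean (shared or exclusive) line is only a bus notification and moves no data.

```python
def victimize_l2_line(l2_state, l1_response=None):
    """Return (action, write_back) for a selected L2 victim.

    l1_response ('hit'/'no-hit') is consulted only for the inquiry states.
    """
    if l2_state == "pending":
        return ("reselect", None)         # pick another victim (block 604)
    if l2_state in ("shared", "exclusive", "enhanced-modified"):
        # Inquiry first: a 'no-hit' means no other cache covers the line,
        # so write it back before victimizing (block 618).
        return ("inquiry", l1_response == "no-hit")
    if l2_state == "modified":
        return ("no-inquiry", True)       # write back without an inquiry
    if l2_state in ("invalid", "enhanced-exclusive"):
        return ("no-inquiry", False)      # victimize immediately (block 620)
    raise ValueError("unexpected state")  # error (block 628)
```

The interesting saving is the enhanced-modified case with a "hit" response: the L1 cache still owns the dirty data, so the L2 cache victimizes its copy without generating any write-back traffic.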
- FIG. 6 b is a flowchart representative of an example process 650 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208 ) to implement an example mechanism for victimizing lines in a L2 cache 204 b when the L2 cache 204 b is designed to be inclusive of the L1 cache 204 a .
- the illustrated process 650 is embodied in an integrated circuit associated with a cache device 204 b .
- the illustrated process 650 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- although the process 650 is described with reference to the flowchart illustrated in FIG. 6 b , a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 650 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated.
- the example process 650 handles the victimization of a cache line.
- a victimized cache line is a cache line that is overwritten by another cache line or returned to the invalid state 502 .
- a cache line may be victimized according to any well known victimization policy. For example, a least-recently-used (LRU) policy may be employed to select a cache line for victimization.
- the process 650 determines the current state of the selected cache line. Depending on the current state of the selected cache line, the process 650 may issue an internal inquiry and/or perform a write-back operation prior to victimizing the selected cache line.
- the example process 650 begins by determining if a cache line in an L2 cache 204 b needs to be victimized (block 652 ). For example, if a new cache line is about to be read into the cache 204 b , the cache 204 b may need to remove an existing cache line in order to make room for the new cache line.
- the process 650 selects a cache line for victimization (block 654 ). This selection may be performed according to any well known cache victimization selection process. For example, a least recently used (LRU) process may be used to select a cache line for victimization.
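The least-recently-used selection mentioned above can be sketched with a small ordered structure. This is one conventional software realization of LRU bookkeeping, offered purely for illustration; the patent permits any well known victimization policy and does not prescribe this mechanism.

```python
from collections import OrderedDict

class LRUTracker:
    """Minimal LRU bookkeeping for victim selection (illustrative only)."""

    def __init__(self):
        self._order = OrderedDict()   # oldest (least recently used) entry first

    def touch(self, line_addr):
        """Record an access: the line becomes most recently used."""
        self._order.pop(line_addr, None)
        self._order[line_addr] = True

    def select_victim(self):
        """Return the least recently used line, i.e. the victimization candidate."""
        return next(iter(self._order))
```
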
- the process 650 determines what state the selected cache line is in. In the illustrated example, if the L2 cache line is in the pending state (block 656 ), the process 650 loops back to select another cache line (block 654 ). If the L2 cache line is in the shared state 512 (block 658 ), the process 650 issues an internal inquiry (block 660 ) before the cache line is victimized (block 662 ).
- the process 650 issues an internal inquiry (block 666 ). As described below with reference to FIG. 7 , the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a ). If the snoop response is a “hit modified” response (block 668 ), the process 650 issues an invalidate signal on the FSB (block 670 ) before the cache line is victimized (block 662 ). If the snoop response is not a “hit modified” response (block 668 ), the process 650 may victimize the cache line (block 662 ) without issuing an invalidate signal on the FSB (block 670 ).
- the process 650 may issue the invalidate signal on the FSB (block 670 ) and victimize the cache line (block 662 ) without issuing the internal inquiry (block 666 ).
- the process 650 issues an internal inquiry (block 676 ). As described below with reference to FIG. 7 , the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a ). If the snoop response is a “hit modified” response (block 678 ), the process 650 issues an invalidate signal on the FSB (block 670 ) before the cache line is victimized (block 662 ). If the snoop response is not a “hit modified” response (block 678 ), the process 650 writes the selected cache line back to main memory 110 (block 680 ) before the cache line is victimized (block 662 ). If the L2 cache line is in the modified state 508 (block 682 ), the process 650 may write the cache line back to main memory 110 (block 680 ) and victimize the cache line (block 662 ) without issuing the internal inquiry (block 676 ).
- the process 650 victimizes the selected cache line (block 662 ). If the L2 cache line is not in any of the predefined states, the process 650 may generate an error (block 686 ).
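Of the FIG. 6 b branches, the enhanced modified case (blocks 676-680) is described most completely above and can be modeled as follows. Names are illustrative; the excerpt does not name every block-to-state mapping, so only this branch is sketched.

```python
def victimize_inclusive_enhanced_modified(l1_response):
    """Illustrative model of the FIG. 6b enhanced-modified branch.

    `l1_response` is the snoop response to the internal inquiry
    (block 676): "hitm" if the L1 cache holds the line modified.
    """
    actions = ["internal_inquiry"]        # block 676
    if l1_response == "hitm":             # block 678
        actions.append("fsb_invalidate")  # block 670: L1 holds the newer copy
    else:
        actions.append("write_back")      # block 680: L2 holds the only dirty copy
    actions.append("victimize")           # block 662
    return actions
```
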
- FIG. 7 is a flowchart representative of an example process 700 which may be executed by a device (e.g., L1 cache 204 a ) to implement an example mechanism for responding to the internal inquiry.
- the illustrated process 700 is embodied in an integrated circuit associated with a cache device 204 a .
- the illustrated process 700 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- the example process 700 responds to an internal inquiry by posting a hit or a hit-modified signal depending on the current state of the cache line associated with the internal inquiry in the L1 cache 204 a .
- when the L2 cache 204 b is inclusive of the L1 cache 204 a , victimization in the L2 cache 204 b requires invalidation from the L1 cache 204 a .
- when the L2 cache 204 b is not inclusive of the L1 cache 204 a , victimization from the L2 cache 204 b need not result in invalidation from the L1 cache 204 a.
- the illustrated process 700 begins by waiting for the internal inquiry (block 702 ).
- the process 700 determines if the cache line associated with the internal inquiry is in the L1 cache 204 a in the exclusive state 404 or the shared state 408 (block 704 ). If the cache line associated with the internal inquiry is in the L1 cache 204 a in the exclusive state 404 or the shared state 408 , the process 700 determines if the L2 cache 204 b is inclusive of the L1 cache 204 a (block 706 ).
- the process 700 invalidates the cache line (block 708 ) and posts a -HIT (i.e., a NO HIT or a MISS) signal from the L1 cache 204 a (block 710 ). If the L2 cache 204 b is not inclusive of the L1 cache 204 a (block 706 ), the process 700 posts a HIT signal (block 712 ).
- the process 700 determines if the cache line associated with the internal inquiry is in the L1 cache 204 a in the modified state 406 (block 714 ). If the cache line associated with the internal inquiry is in the L1 cache 204 a in the modified state 406 , the process 700 posts a HITM (i.e., a HIT MODIFIED) signal from the L1 cache 204 a (block 716 ). The HITM signal indicates to other caches that this L1 cache 204 a holds the cache line associated with the internal inquiry in a modified state.
- the process posts a -HIT (i.e., a NO HIT or a MISS) signal from the L1 cache 204 a (block 710 ).
- the -HIT signal indicates to other caches that this L1 cache 204 a does not hold the cache line in any non-pending state.
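The FIG. 7 inquiry-response rules above reduce to a short decision function. This is an illustrative software model; the state names, signal names, and the returned tuple shape are assumptions, not the patent's hardware interface.

```python
def l1_inquiry_response(l1_state, l2_inclusive):
    """Illustrative model of FIG. 7: the L1 cache's answer to an
    internal inquiry, returned as (response, new_l1_state)."""
    if l1_state in ("exclusive", "shared"):   # states 404/408, block 704
        if l2_inclusive:                      # block 706
            return ("no_hit", "invalid")      # blocks 708/710: invalidate and -HIT
        return ("hit", l1_state)              # block 712: line survives in L1
    if l1_state == "modified":                # state 406, block 714
        return ("hitm", l1_state)             # block 716: HIT MODIFIED
    return ("no_hit", l1_state)               # block 710: line not held
```
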
- FIG. 8 is a flowchart representative of an example process 800 which may be executed by a device to implement an example mechanism for handling snoop probes to a cache using the enhanced MESI protocol (e.g., L2 cache 204 b ).
- the illustrated process 800 is embodied in an integrated circuit associated with a cache device 204 b .
- the illustrated process 800 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- although the process 800 is described with reference to the flowchart illustrated in FIG. 8 , a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 800 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated.
- the example process 800 handles snoop probes to the L2 cache 204 b by determining what state (if any) the cache line associated with the snoop probe is in and posting a signal indicative of the current cache line state. In addition, the process 800 determines which cache (if any) has an implicit write-back of the cache line, and the process 800 may invalidate the cache line in the L2 cache 204 b.
- the illustrated process 800 begins by determining if the cache line associated with the snoop probe is in the L2 cache 204 b in the exclusive state 504 , the enhanced exclusive state 506 , or the shared state 512 (block 802 ). If the cache line associated with the snoop probe is in the L2 cache in the exclusive state 504 , the enhanced exclusive state 506 , or the shared state 512 , the process 800 posts a HIT signal from the L2 cache 204 b (block 804 ). The HIT signal indicates to other caches that this L2 cache 204 b holds the cache line associated with the snoop probe in a non-modified state.
- the process 800 determines if the cache line associated with the snoop probe is in the L2 cache 204 b in the modified state 508 or the enhanced modified state 510 (block 806 ).
- the process 800 posts a -HIT (i.e., a NO HIT or a MISS) signal from the L2 cache 204 b (block 808 ).
- the -HIT signal indicates to other caches that this L2 cache 204 b does not hold the cache line associated with the snoop probe in any valid, non-pending state.
- the process 800 posts a HITM (i.e., a HIT MODIFIED) signal from the L2 cache 204 b (block 810 ).
- the HITM signal indicates to other caches that this L2 cache 204 b holds the cache line associated with the snoop probe in a modified (or enhanced modified) state.
- the process 800 determines if the L1 cache 204 a also posted a HITM signal (block 812 ). If the L1 cache 204 a also posted a HITM signal, there is an implicit write-back of the cache line associated with the snoop probe from the L1 cache 204 a (block 814 ). As a result, the L2 cache 204 b may invalidate the cache line (block 816 ). However, if the L1 cache 204 a does not post a HITM signal, there is an implicit write-back of the cache line associated with the snoop probe from the L2 cache 204 b (block 818 ). As a result, the L2 cache 204 b may not invalidate the cache line.
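The FIG. 8 logic can be modeled as a pure function over the L2 line state and the L1's snoop answer. This is an illustrative sketch; the names and the returned tuple (signal, implicit write-back owner, resulting L2 state) are assumptions, since the patent describes bus signals rather than return values.

```python
def l2_snoop_response(l2_state, l1_posted_hitm):
    """Illustrative model of FIG. 8: the L2 cache's snoop-probe response
    and the resulting implicit write-back owner."""
    if l2_state in ("exclusive", "enhanced_exclusive", "shared"):  # block 802
        return ("hit", None, l2_state)                             # block 804
    if l2_state in ("modified", "enhanced_modified"):              # block 806
        if l1_posted_hitm:                                         # block 812
            # L1 supplies the data (block 814); L2 may invalidate (block 816).
            return ("hitm", "l1_write_back", "invalid")
        # L2 supplies the data (block 818) and need not invalidate its copy.
        return ("hitm", "l2_write_back", l2_state)
    return ("no_hit", None, l2_state)                              # block 808
```
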
- FIGS. 9-10 are a flowchart representative of an example process 900 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208 ) to implement an example mechanism for filtering snoops (without the use of tags as described below with reference to FIGS. 11-12 ).
- the illustrated process 900 is embodied in an integrated circuit associated with the memory hierarchy 300 .
- the illustrated process 900 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- although the process 900 is described with reference to the flowchart illustrated in FIGS. 9-10 , a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 900 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated.
- the example process 900 monitors a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache 204 b , the L2 write-back queue 306 , and/or the L1L2 queue 304 , the process 900 posts information and sends snoop probes to other caches in an effort to filter snoops and reduce bus traffic. This process 900 assumes that the L2 cache 204 b is maintained inclusive of the L1 cache 204 a by using an L2 victimization process (see FIG. 6 b ).
- the L2 write-back queue 306 and the L1L2 queue 304 are considered part of the L2 cache 204 b for snoop purposes.
- a snoop probe that finds a line in L2 write-back queue 306 is interpreted as finding the line in L2 cache 204 b in that state.
- a snoop probe that finds an L1 write-back in the L1L2 queue 304 is interpreted as finding the line in the L2 cache 204 b in the modified state 508 .
- the illustrated process 900 begins by waiting for an entry in the snoop queue (block 902 ).
- a snoop queue entry is a snoop probe that would normally be propagated to all the cache devices.
- the process 900 determines if the type of entry in the snoop queue is a snoop-to-share type (block 904 ).
- a snoop-to-share entry is an entry that causes a cache line to transition to the shared state if a hit is found.
- the process 900 sends a snoop probe only to the L2 cache 204 b , the L2 write-back queue 306 , and the L1L2 queue 304 (block 906 ).
- the process 900 may send the snoop probe to each of these entities simultaneously or one at a time.
- the process 900 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities.
- the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities.
- the process 900 may check for each of the L2 cache 204 b states described below before moving on to test each of the L2 write-back queue 306 states.
- the process 900 posts a HITM signal (block 910 ).
- the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 406 , 508 or an enhanced modified state 510 .
- the cache that holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b , as directly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the line in the modified state 508 or the enhanced modified state 510 .
- the cache that holds the cache line associated with the snoop probe in a modified state 406 may be the L1 cache 204 a , as indirectly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the cache line in the enhanced exclusive state 506 .
- the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 912 ).
- a snoop-to-invalidate probe is a probe that causes a cache line to transition to the invalid state if a hit is found.
- the process 900 posts a HIT signal (block 916 ).
- the MISS signal from the L2 write-back queue 306 indicates the L2 write-back queue 306 does not hold a copy of the cache line.
- the HIT signal from the L2 cache 204 b indicates to other caches that the L2 cache 204 b holds the cache line associated with the snoop probe in the shared state 512 .
- the process 900 sends a snoop-to-share probe to the L1 cache 204 a and the L0 cache 302 (block 922 ).
- the process 900 then posts a HIT signal or a HITM signal based on the response to the snoop-to-share probe from the L1 cache 204 a (block 924 ).
- the HIT signal is posted if the response from the L1 cache 204 a is not a HITM.
- the HITM signal is posted if the response from the L1 cache 204 a is a HITM.
- the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 928 ).
- the HIT signal from the L2 write-back queue 306 indicates the L2 write-back queue 306 holds a copy of the cache line.
- the process 900 posts a NO HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a (block 930 ).
- the NO HIT signal is posted if the response from the L1 cache 204 a is not a HITM.
- the HITM signal is posted if the response from the L1 cache 204 a is a HITM.
- the process 900 posts a NO HIT signal (block 934 ) and a NO HITM signal (block 936 ).
- the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 938 ).
- the process 900 posts a NO HIT signal (block 942 ) and a NO HITM signal (block 944 ).
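The snoop-to-share branch of process 900 described above can be summarized by state. Only the branches whose state-to-block mapping is explicit in this excerpt are modeled; the write-back-queue and L1L2-queue checks are omitted, and all names are illustrative.

```python
def filter_snoop_to_share(l2_state, probe_l1=None):
    """Illustrative model of the FIG. 9 snoop-to-share filter.

    `probe_l1` stands in for sending a probe to the L1/L0 caches and
    returning the response. Returns (posted_signal, probe_sent_to_l1).
    """
    if l2_state in ("enhanced_exclusive", "modified", "enhanced_modified"):
        # Some cache holds the line dirty: post HITM (block 910) and
        # invalidate any L1/L0 copies (block 912).
        return ("hitm", "snoop_to_invalidate")
    if l2_state == "shared":
        return ("hit", None)               # block 916: no probe to L1/L0 needed
    if l2_state == "exclusive":
        # L1 may hold the line modified: probe it (block 922) and relay
        # its answer (block 924).
        resp = probe_l1()
        return ("hitm" if resp == "hitm" else "hit", "snoop_to_share")
    return ("no_hit", None)                # blocks 934/936: line absent
```
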
- the process 900 determines if the type of entry in the snoop queue is a snoop-to-invalidate type (block 946 ). If the type of entry in the snoop queue is a snoop-to-invalidate entry, the process 900 sends a snoop probe only to the L2 cache 204 b , the L2 write-back queue 306 , and the L1L2 queue 304 (block 1002 of FIG. 10 ).
- the process 900 may send the snoop probe to each of these entities simultaneously or one at a time.
- the process 900 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities.
- the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities.
- the process 900 may check for each of the L2 cache 204 b states described below before moving on to test each of the L2 write-back queue 306 states.
- the process 900 posts a HITM signal (block 1006 ). Again, the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 406 , 508 or an enhanced modified state 510 .
- the cache that holds the cache line in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b , as directly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the line in the modified state 508 or the enhanced modified state 510 .
- the cache that holds the cache line in a modified state 406 may be the L1 cache 204 a , as indirectly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the cache line in the enhanced exclusive state 506 .
- the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1008 ).
- the process 900 posts a NO HIT signal (block 1012 ) and a NO HITM signal (block 1014 ). In addition, the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1016 ).
- the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1020 ).
- the process 900 posts a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a (block 1022 ).
- the HITM signal is posted if the response from the L1 cache 204 a is a HITM.
- the process 900 posts a NO HIT signal (block 1026 ) and a NO HITM signal (block 1028 ).
- FIGS. 11-12 are a flowchart representative of an example process 1100 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208 ) to implement an example mechanism for filtering snoops with the use of tags.
- the illustrated process 1100 is embodied in an integrated circuit associated with the memory hierarchy 300 .
- the illustrated process 1100 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120 ) and executed by one or more processors (e.g., multi-processor module 106 ) in a well known manner.
- the example process 1100 monitors a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache 204 b , the L2 write-back queue 306 , and/or the L1L2 queue 304 , the process 1100 posts information and sends snoop probes to other caches in an effort to filter snoops and reduce bus traffic.
- This process 1100 assumes that the L2 cache 204 b is maintained inclusive of the L1 cache 204 a by using an L2 victimization process (see FIG. 6 b ). Unlike the process 900 described above, this process 1100 performs these actions with the use of the tag update queue 304 as described herein.
- the tag update queue 304 , the L2 write-back queue 306 and the L1L2 queue 304 are considered part of the L2 cache 204 b for snoop purposes.
- a snoop probe that finds a line in L2 write-back queue 306 is interpreted as finding the line in L2 cache 204 b in that state.
- a snoop probe hit in the tag update queue 304 is interpreted as finding the line in the L2 cache 204 b in the enhanced exclusive state 506 .
- a snoop probe that finds an L1 write-back in the L1L2 queue 304 is interpreted as finding the line in L2 cache 204 b in the modified state 508 .
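The three interpretation rules above amount to an "effective L2 state" lookup that treats the queues as part of the L2 cache. In this sketch, dicts and sets stand in for the hardware structures, and the lookup order is an assumption; the hardware would consult all structures concurrently.

```python
def effective_l2_state(line, l2, l2_wb_queue, l1l2_queue, tag_update_queue):
    """Illustrative snoop-time view of a line, per the rules above.

    `l2` and `l2_wb_queue` map line -> state; the other two arguments are
    membership sets. All are software stand-ins for hardware queues.
    """
    if line in l2:
        return l2[line]              # the cached state itself
    if line in l2_wb_queue:
        return l2_wb_queue[line]     # pending L2 write-back keeps its state
    if line in tag_update_queue:
        return "enhanced_exclusive"  # queued tag update implies state 506
    if line in l1l2_queue:
        return "modified"            # queued L1 write-back implies state 508
    return "invalid"
```
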
- the illustrated process 1100 begins by waiting for an entry in the snoop queue (block 1102 ). Again, a snoop queue entry is a snoop probe that would normally be propagated to all the cache devices.
- the process 1100 determines if the type of entry in the snoop queue is a snoop-to-share type (block 1104 ). If the type of entry in the snoop queue is a snoop-to-share entry, the process 1100 sends a snoop probe only to the L2 cache 204 b , the L2 write-back queue 306 , the L1L2 queue 304 , and the tag update queue 304 (block 1106 ).
- the process 1100 may send the snoop probe to each of these entities simultaneously or one at a time, and the process 1100 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities. Accordingly, the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities.
- the process 1100 posts a HITM signal (block 1110 ). Again, the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 406 , 508 or an enhanced modified state 510 .
- the cache that holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b , as directly indicated by the L2 cache 204 b indicating it is holding the line in the modified state 508 or the enhanced modified state 510 .
- the cache that holds the cache line associated with the snoop probe in a modified state 406 may be the L1 cache 204 a , as indirectly indicated by the L2 cache 204 b indicating it is holding the cache line in the enhanced exclusive state 506 .
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1112 ).
- the process 1100 posts a HIT signal (block 1116 ).
- the HIT signal indicates to other caches that the L2 cache 204 b holds the cache line associated with the snoop probe in the shared state 512 .
- the process 1100 posts a HIT signal (block 1120 ) and sends a snoop-to-share probe to the L1 cache 204 a and the L0 cache 302 (block 1122 ).
- this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-share probe from the L1 cache 204 a because the process 1100 guarantees that the L1 cache 204 a will not modify the cache line for which a snoop is in progress. Stores committed to that cache line prior to the snoop probe will be present in the tag update queue 304 and result in a HITM from the L2 cache 204 b.
- the process 1100 posts a NO HIT signal (block 1126 ) and a NO HITM signal (block 1128 ).
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1130 ). Again, unlike the process 900 described above, this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a.
- the process 1100 posts a NO HIT signal (block 1134 ) and a NO HITM signal (block 1136 ).
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1138 ).
- the process 1100 posts a NO HIT signal (block 1142 ) and a NO HITM signal (block 1144 ).
- the process 1100 determines if the type of entry in the snoop queue is a snoop-to-invalidate type (block 1146 ). If the type of entry in the snoop queue is a snoop-to-invalidate entry, the process 1100 sends a snoop probe only to the L2 cache 204 b , the L2 write-back queue 306 , the L1L2 queue 304 , and the tag update queue 304 (block 1202 of FIG. 12 ).
- the process 1100 posts a HITM signal (block 1206 ).
- the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 406 , 508 or an enhanced modified state 510 .
- the cache that holds the cache line in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b , as directly indicated by the L2 cache 204 b .
- the cache that holds the cache line in a modified state 406 may be the L1 cache 204 a , as indirectly indicated by the L2 cache 204 b indicating it is holding the cache line in the enhanced exclusive state 506 .
- the response to the snoop probe is indicative of the L2 cache 204 b holding the cache line in the enhanced exclusive state 506 , the modified state 508 , or the enhanced modified state 510 (block 1204 )
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1208 ).
- the process 1100 posts a NO HIT signal (block 1212 ) and a NO HITM signal (block 1214 ).
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1216 ).
- the process 1100 posts a NO HIT signal (block 1220 ) and a NO HITM signal (block 1222 ).
- the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1224 ). Again, unlike the process 900 described above, this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a.
- the process 1100 posts a NO HIT signal (block 1228 ) and a NO HITM signal (block 1230 ).
Abstract
Methods and apparatus for maintaining cache coherency and reducing write-back traffic by using an enhanced MESI cache coherency protocol are disclosed. The enhanced MESI protocol includes the traditional MESI cache states (i.e., modified, exclusive, shared, invalid, and pending) as well as two additional cache states (i.e., enhanced modified and enhanced exclusive). An enhanced modified cache line is a cache line that is different than main memory and a copy of the cache line may be in another cache. An enhanced exclusive cache line is a cache line that is not modified and a copy of the cache line is in another cache in a modified state. Depending on the state of a victimized cache line, an internal inquiry may be issued to other caches and/or a write-back operation may be performed prior to victimizing the selected cache line. Snoop probes are filtered to reduce bus traffic.
Description
- The present disclosure pertains to cache memory and, more particularly, to methods and an apparatus for filtering a cache snoop.
- In an effort to increase computational power, many computing systems are turning to multi-processor systems. A multi-processor system typically includes a plurality of microprocessors, a plurality of associated caches, and a main memory. In an effort to reduce bus traffic to the main memory, many multi-processor systems use a “write-back” (as opposed to a “write-through”) policy. A “write-back” policy is a cache procedure whereby a microprocessor may locally modify data in its cache without updating the main memory until the cache data needs to be replaced.
- In order to maintain cache coherency in such a system, a cache coherency protocol is used. One well known cache coherency protocol is the MESI (modified-exclusive-shared-invalid) cache coherency protocol. Typically, a cache designed for the MESI protocol uses three bits to encode five states. The five states are modified, exclusive, shared, invalid, and pending.
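The five-state, three-bit encoding mentioned above can be written out explicitly. The particular bit patterns below are illustrative assumptions; the text only states that three bits suffice to encode the five states.

```python
from enum import Enum

class MesiState(Enum):
    """The five MESI states. Three bits give 2**3 = 8 encodings,
    enough for five states; these patterns are illustrative."""
    MODIFIED  = 0b000
    EXCLUSIVE = 0b001
    SHARED    = 0b010
    INVALID   = 0b011
    PENDING   = 0b100
```
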
- In an effort to further reduce bus traffic to the main memory, many of these cache coherency protocols allow a first cache that is holding locally modified data (i.e., “dirty” data) to directly supply a second cache that is requesting the same cache line (i.e., memory block) without updating main memory. Typically, the first cache then puts its cache line in an “owned” state to indicate that the line is “dirty” and shared. However, when the “owned” cache line is victimized (e.g., replaced), the first cache must write the line back to main memory so that the modifications are not lost. This write-back generates bus traffic to the main memory. Bus traffic increases memory latency and power consumption. Subsequent modifications to the cache line in the second cache will also need to be written back to main memory, thereby generating additional bus traffic.
- FIG. 1 is a block diagram of an example computer system illustrating an environment of use for the disclosed system.
- FIG. 2 is a more detailed block diagram of the example multi-processor module illustrated in FIG. 1 .
- FIG. 3 is a block diagram of an example memory hierarchy.
- FIG. 4 is a state diagram of a MESI cache coherency protocol which may be used by the L1 cache illustrated in FIG. 3 .
- FIG. 5 is a state diagram of an enhanced MESI cache coherency protocol which may be used by the L2 cache illustrated in FIG. 3 .
- FIG. 6 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for victimizing cache lines in the L2 cache illustrated in FIG. 3 .
- FIG. 6 b is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for victimization from an L2 cache when the L2 cache is designed to be inclusive of the L1 cache.
- FIG. 7 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for responding to an internal inquiry at the L1 cache illustrated in FIG. 3 .
- FIG. 8 is a flowchart representative of an example process which may be executed by a device to implement an example mechanism for handling snoop probes to the L2 cache illustrated in FIG. 3 .
- FIGS. 9-10 are a flowchart representative of an example process which may be executed by a device to implement an example mechanism for filtering snoops without the use of tags.
- FIGS. 11-12 are a flowchart representative of an example process which may be executed by a device to implement an example mechanism for filtering snoops with the use of tags.
- Generally, the disclosed system maintains cache coherency and reduces write-back traffic by using an enhanced MESI cache coherency protocol. The enhanced MESI protocol includes the traditional MESI cache states (i.e., modified, exclusive, shared, invalid, and pending) as well as two additional cache states (i.e., enhanced modified and enhanced exclusive). A modified cache line is a cache line that is different than main memory. In contrast, an enhanced modified cache line is a cache line that is different than main memory and a copy of the cache line is in another cache. For example, an L2 cache holding a cache line in the enhanced modified state “knows” an L1 cache holds the same cache line. However, the L2 cache does not necessarily know the state of the cache line in the L1 cache. An exclusive cache line is a cache line that is not modified (i.e., the same as main memory). In contrast, an enhanced exclusive cache line is a cache line that is not modified and a copy of the cache line is in another cache in a modified state. For example, an L2 cache holding a cache line in the enhanced exclusive state “knows” an L1 cache holds the same cache line in the modified state.
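The full enhanced MESI state set described above can be collected in one place. The numeric values below echo the figure reference numerals used throughout this document, not a hardware encoding; the pending state has no numeral in this excerpt, so its value here is a placeholder.

```python
from enum import Enum

class EnhancedMesiState(Enum):
    """The seven states of the enhanced MESI protocol described above.
    Values echo the figure's reference numerals, not a bit encoding."""
    INVALID            = 502
    EXCLUSIVE          = 504
    ENHANCED_EXCLUSIVE = 506  # clean here; another cache holds it modified
    MODIFIED           = 508  # dirty; no other cached copy known
    ENHANCED_MODIFIED  = 510  # dirty; another cache also holds a copy
    SHARED             = 512
    PENDING            = 0    # transitional; no numeral given in this excerpt
```
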
- When a cache using the enhanced MESI protocol is full and needs to bring in a new cache line, the cache must select an existing cache line to victimize. A cache line may be selected for victimization according to any well known victimization policy. For example, a least recently used victimization policy may be used to select a cache line for victimization. Depending on the state of the selected cache line, an internal inquiry may be issued to other caches and/or a write-back operation may be performed prior to victimizing the selected cache line.
- For example, an L2 cache may issue an inquiry to an L1 cache if the L2 cache is about to victimize a cache line in the enhanced modified state. The L1 cache responds to the internal inquiry by invalidating the cache line associated with the internal inquiry or posting a hit-modified signal depending on the current state of the cache line associated with the internal inquiry in the L1 cache. If the L1 cache does not hold the cache line, the L1 cache posts a no-hit signal. Based on how the L1 cache responds to the internal inquiry, the L2 cache may victimize the cache line without performing a write-back to main memory.
- In addition, snoop probes may be filtered by monitoring a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache and/or the L2 write-
back queue 306, cache hit information is posted and snoop probes may be sent to other caches in an effort to filter snoops and reduce bus traffic. This snoop filtering may be performed with or without the use of tags. - Turning now to a more detailed description,
FIG. 1 is a block diagram of an example computer system illustrating an environment of use for the disclosed system. The computer system 100 may be a personal computer (PC) or any other computing device. In the example illustrated, the computer system 100 includes a main processing unit 102 powered by a power supply 104. The main processing unit 102 may include a multi-processor module 106 electrically coupled by a system interconnect 108 to a main memory device 110, a flash memory device 112, and one or more interface circuits 114. In an example, the system interconnect 108 is an address/data bus. Of course, a person of ordinary skill in the art will readily appreciate that interconnects other than busses may be used to connect the multi-processor module 106 to the other devices 110-114. For example, one or more dedicated lines and/or a crossbar may be used to connect the multi-processor module 106 to the other devices 110-114. - The
multi-processor module 106 may include any type of well known processor, such as a processor from the Intel Pentium® family of microprocessors, the Intel Itanium® family of microprocessors, the Intel Centrino® family of microprocessors, and/or the Intel XScale® family of microprocessors. In addition, the multi-processor module 106 may include any type of well known cache memory, such as static random access memory (SRAM). The main memory device 110 may include dynamic random access memory (DRAM) and/or any other form of random access memory. For example, the main memory device 110 may include double data rate random access memory (DDRAM). The main memory device 110 may also include non-volatile memory. In an example, the main memory device 110 stores a software program which is executed by the multi-processor module 106 in a well known manner. The flash memory device 112 may be any type of flash memory device. The flash memory device 112 may store firmware used to boot the computer system 100. - The interface circuit(s) 114 may be implemented using any type of well known interface standard, such as an Ethernet interface and/or a Universal Serial Bus (USB) interface. One or
more input devices 116 may be connected to the interface circuits 114 for entering data and commands into the main processing unit 102. For example, an input device 116 may be a keyboard, mouse, touch screen, track pad, track ball, isopoint, and/or a voice recognition system. - One or more displays, printers, speakers, and/or
other output devices 118 may also be connected to the main processing unit 102 via one or more of the interface circuits 114. The display 118 may be a cathode ray tube (CRT), a liquid crystal display (LCD), or any other type of display. The display 118 may generate visual indications of data generated during operation of the main processing unit 102. The visual indications may include prompts for human operator input, calculated values, detected data, etc. - The
computer system 100 may also include one or more storage devices 120. For example, the computer system 100 may include one or more hard drives, a compact disk (CD) drive, a digital versatile disk (DVD) drive, and/or other computer media input/output (I/O) devices. - The
computer system 100 may also exchange data with other devices 122 via a connection to a network 124. The network connection may be any type of network connection, such as an Ethernet connection, digital subscriber line (DSL), telephone line, coaxial cable, etc. The network 124 may be any type of network, such as the Internet, a telephone network, a cable network, and/or a wireless network. The network devices 122 may be any type of network device. For example, the network device 122 may be a client, a server, a hard drive, etc. -
FIG. 2 is a more detailed block diagram of the example multi-processor module 106 illustrated in FIG. 1. The multi-processor module 106 shown includes one or more processing cores 202 and one or more caches 204 electrically coupled by an interconnect 206. The processor(s) 202 and/or the cache(s) 204 communicate with the main memory 110 over the system interconnect 108 via a memory controller 208. - Each
processor 202 may be implemented by any type of processor, such as an Intel XScale® processor. Each cache 204 may be constructed using any type of memory, such as static random access memory (SRAM). The interconnect 206 may be any type of interconnect such as a bus, one or more dedicated lines, and/or a crossbar. Each of the components of the multi-processor module 106 may be on the same chip or on separate chips. For example, the main memory 110 may reside on a separate chip. Typically, if activity on the system interconnect 108 is reduced, computational performance is increased and power consumption is reduced. For example, avoiding unnecessary write-backs from a cache 204 to the main memory 110 increases the overall efficiency of the multi-processor module 106. -
FIG. 3 is a block diagram of an example memory hierarchy 300. Typically, memory elements (e.g., registers, caches, main memory, etc.) that are “closer” to the processor 202 in the memory hierarchy 300 are faster than memory elements that are “farther” from the processor 202 in the memory hierarchy 300. As a result, closer memory elements are used for potentially frequent operations, and closer memory elements are checked first when the processor 202 executes a memory operation (e.g., a read or a write). - Closer memory elements are typically constructed using faster memory technologies. However, faster memory technologies are typically more expensive than slower memory technologies. Accordingly, close memory elements are typically smaller than distant memory elements. Although four levels of memory are shown in
FIG. 3, persons of ordinary skill in the art will readily appreciate that more or fewer levels of memory may alternatively be used. - In the example illustrated, when a
processor 202 executes a memory operation, the request is passed to an L0 cache 302. Typically, the L0 cache 302 is internal to the processor 202. However, the L0 cache 302 may be external to the processor 202. If the L0 cache 302 holds the requested memory in a state that is compatible with the memory request (e.g., a write request is made and the L0 cache holds the memory in an “exclusive” state), the L0 cache 302 fulfills the memory request (i.e., an L0 cache hit). If the L0 cache 302 does not hold the requested memory (i.e., an L0 cache miss), the memory request is passed on to an L1 cache 204 a which is typically external to the processor 202, but may be internal to the processor 202. - If the
L1 cache 204 a holds the requested memory in a state that is compatible with the memory request, the L1 cache 204 a fulfills the memory request (i.e., an L1 cache hit). In addition, the requested memory may be moved up from the L1 cache 204 a to the L0 cache 302. If the L1 cache 204 a does not hold the requested memory (i.e., an L1 cache miss), the memory request is passed on to an L2 cache 204 b which is typically external to the processor 202, but may be internal to the processor 202. - Like the
L1 cache 204 a, if the L2 cache 204 b holds the requested memory in a state that is compatible with the memory request, the L2 cache 204 b fulfills the memory request (i.e., an L2 cache hit). In addition, the requested memory may be moved up from the L2 cache 204 b to the L1 cache 204 a and/or the L0 cache 302. In the illustrated example, if the L2 cache 204 b does not hold the requested memory, the memory request is passed on to the main memory 110 (i.e., an L2 cache miss). If the memory request is passed on to the main memory 110, the main memory 110 fulfills the memory request. In addition, the requested memory may be moved up from the main memory 110 to the L2 cache 204 b, the L1 cache 204 a, and/or the L0 cache 302. - When cache lines and/or memory requests are being moved between two caches and/or between a cache and
main memory 110, an intermediate structure may be used. For example, an L1L2 queue 304 may be used to move a cache line from the L1 cache 204 a to the L2 cache 204 b when the cache line is being victimized in the L1 cache 204 a. Similarly, a write-back queue 306 may be used to move a cache line from the L2 cache 204 b to the main memory 110 when the cache line is being victimized in the L2 cache 204 b. -
FIG. 4 is a state diagram of a MESI cache coherency protocol 400 which may be used by the L1 cache 204 a illustrated in FIG. 3. This protocol 400 includes five states. Specifically, the MESI cache coherency protocol 400 includes an invalid state 402, an exclusive state 404, a modified state 406, a shared state 408, and a pending state (not shown). Of course, a person of ordinary skill in the art will readily appreciate that other cache coherency protocols may be used without departing from the scope or spirit of the disclosed system. - Each cache line in the
cache 204 a using the MESI protocol 400 is associated with one of these states. In an example, the state of a cache line is recorded in a cache directory. In another example, the state of a cache line is recorded in a tag associated with the cache line. In the MESI cache coherency protocol 400 there are five possible states. Accordingly, each state may be represented by a different digital combination (e.g., 000=Modified, 001=Exclusive, 010=Shared, 011=Invalid, and 100=Pending). The state of a cache line may be changed by retagging the cache line. For example, retagging a cache line from “exclusive” to “shared” may be accomplished by changing a tag associated with the cache line from “001” to “010.” Of course, a person of ordinary skill in the art will readily appreciate that any method of storing and changing a cache line state may be used without departing from the scope or spirit of the disclosed system. - An “invalid” cache line is a cache line that does not contain useful data (i.e., the cache line is effectively empty). An “exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110). A “modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110) (e.g., a new value was written to the cache copy, but not to main memory's copy). A “shared” cache line is a cache line that is held by more than one cache. In some implementations, an “owned” state is added or combined with another state. An “owned” cache line is a cache line that is “modified” and “shared” (i.e., “dirty” and held by another cache). The “owner” of a cache line is responsible for eventually updating
main memory 110 with the modified value (i.e., the “owner” is responsible for performing the write-back). A “pending” cache line is a cache line that is associated with a memory fill in progress or a snoop confirm in progress. A write-back of an exclusive or shared line is merely a notification on the system bus of victimization from the processor caches. A write-back of an exclusive or shared line does not involve any data transfer. - Several different events may be observed by a
cache 204 a using this protocol 400 which will cause the cache 204 a to transition from one of these states to another of these states. These events include a write-miss (WM), a read-miss-exclusive (RME), a read-miss-shared (RMS), a read-hit (RH), a write-hit (WH), a snoop-hit-on-read (SHR), and a snoop-hit-on-write/read-with-intent-to-modify (SHW). - A write-miss (WM) event is caused by the
multi-processor module 106 attempting to write a cache line to a cache 204 a when that cache 204 a does not hold the cache line. A read-miss-exclusive (RME) event is caused by the multi-processor module 106 attempting to read a cache line from one cache 204 a when that cache 204 a does not hold the cache line, and no other cache 204 currently holds the cache line. A read-miss-shared (RMS) event is caused by the multi-processor module 106 attempting to read a cache line from one cache 204 a when that cache 204 a does not hold the cache line, but another cache 204 does hold the cache line in the shared state. A read-hit (RH) event is caused by the multi-processor module 106 attempting to read a cache line from a cache 204 a that holds the cache line. Other caches 204 holding the same line (but not supplying the line) see such a read as a snoop-hit-on-read (SHR) event. A write-hit (WH) event is caused by the multi-processor module 106 attempting to write a cache line to a cache 204 a that holds the cache line. Other caches 204 holding the same line see such a write as a snoop-hit-on-write (SHW) event. A snoop hit associated with a read operation where there is an intent to modify the data is handled the same way as a snoop-hit-on-write (SHW) event. - In addition to transitioning a cache line from one state to another state, these events may cause bus transactions to occur. For example, a cache state transition may be associated with an invalidate operation (I), a cache line fill operation (R), a snoop push operation (S), or a read-with-intent-to-modify operation (RM). An invalidate operation (I) causes an invalidate signal to be broadcast on the
interconnect 206. The invalidate signal causes other caches 204 to place the associated cache line in an invalid (I) state (effectively erasing that line from the cache). A cache line fill operation (R) causes a cache line to be read into the cache 204 a from main memory 110. A snoop push operation (S) causes the contents of a cache line to be written back to main memory 110. A read-with-intent-to-modify operation (RM) causes other caches 204 to place the associated cache line in an invalid (I) state and causes a cache line to be read into the cache 204 a from main memory 110. - In addition to the typical MESI cache coherency protocol events, a “tag update” event may be included. The tag update event is generated when an L1 cache line changes from the
exclusive state 404 to the modified state 406. The tag update event causes a tag update entry to be placed in the tag update queue 304. As described in detail below, an entry in the tag update queue causes an L2 cache 204 b using an enhanced MESI cache coherency protocol to transition the cache line to an enhanced exclusive state. The enhanced exclusive state denotes that the L1 cache 204 a owns the cache line in the modified state 406. -
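The MESI events and transitions described for protocol 400 can be pictured as a transition table keyed by current state and observed event. The entries below are a partial, hypothetical subset consistent with a standard MESI protocol, not a complete transcription of FIG. 4:

```python
# (state, event) -> next state; a partial, illustrative MESI transition table.
TRANSITIONS = {
    ("I", "RME"): "E",   # read miss; no other cache holds the line
    ("I", "RMS"): "S",   # read miss; another cache holds the line shared
    ("I", "WM"):  "M",   # write miss brings the line in modified
    ("E", "WH"):  "M",   # write hit on an exclusive line dirties it
    ("E", "SHR"): "S",   # another cache read the line; now shared
    ("S", "SHW"): "I",   # another cache is writing; invalidate the local copy
    ("M", "SHR"): "S",   # snooped read forces a snoop push and demotion to shared
    ("M", "SHW"): "I",   # snooped write invalidates the dirty copy (after a push)
}

def next_state(state, event):
    """Look up the next MESI state; unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A read hit (RH), for example, appears nowhere in the table because it does not change the local line's state; only the snooping caches see a state-changing event.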
FIG. 5 is a state diagram of an enhanced MESI cache coherency protocol which may be used by the L2 cache 204 b illustrated in FIG. 3. This protocol 500 includes seven states. Specifically, the enhanced MESI cache coherency protocol 500 includes an invalid state 502, an exclusive state 504, an enhanced exclusive state 506, a modified state 508, an enhanced modified state 510, a shared state 512, and a pending state (not shown). In addition to these seven states, an error state 514 is shown for the purpose of illustration. - Each cache line (i.e., memory block) in the
L2 cache 204 b is associated with one of these enhanced MESI protocol states. In an example, the state of a cache line is recorded in a cache directory. In another example, the state of a cache line is recorded in a tag associated with the cache line. In the enhanced MESI cache coherency protocol 500 there are seven possible states. Accordingly, each state may be represented by a different digital combination (e.g., 000=Modified, 001=Exclusive, 010=Shared, 011=Invalid, 100=Pending, 101=Enhanced Modified, and 110=Enhanced Exclusive). The state of a cache line may be changed by retagging the cache line. For example, retagging a cache line from “exclusive” to “shared” may be accomplished by changing a tag associated with the cache line from “001” to “010.” Of course, a person of ordinary skill in the art will readily appreciate that any method of storing and changing a cache line state may be used without departing from the scope or spirit of the disclosed system. - An “invalid” cache line is a cache line that does not contain useful data (i.e., the cache line is effectively empty). An “exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110). Another cache (e.g.,
L1 cache 204 a) may own the same cache line in either an exclusive state or a modified state. A “modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110). Another cache (e.g., L1 cache 204 a) does not own this cache line. A “shared” cache line is a cache line that is held by more than one cache. A “pending” cache line is a cache line that is associated with a memory fill in progress or a snoop confirm in progress. An “enhanced exclusive” cache line is a cache line that is “non-modified” (i.e., the same as main memory 110) and a copy of the cache line is in another cache (e.g., an L1 cache 204 a) in a modified state. An “enhanced modified” cache line is a cache line that is “dirty” (i.e., different from main memory 110) and a copy of the cache line may be in another cache (e.g., an L1 cache 204 a) in any state. - Several different events may be observed by a
cache 204 b using this protocol 500 which will cause the cache 204 b to transition from one of these states to another of these states. These events include an instruction fetch (I-fetch) event, a load event, a load hit event, a load no-hit event, a store event, and an L1 write-back event. In addition, a tag update event may be used in some examples. - An instruction fetch event is caused by the
multi-processor module 106 attempting to read an instruction from the cache 204 b. A load event is caused by the multi-processor module 106 attempting to read data from the cache 204 b regardless of whether the cache 204 b actually holds the desired data. A load hit event is caused by the multi-processor module 106 attempting to read data from the cache 204 b when the cache 204 b holds the desired data. A load no-hit event is caused by the multi-processor module 106 attempting to read data from the cache 204 b when the cache 204 b does not hold the desired data. A store event is caused by the multi-processor module 106 attempting to store data to the cache 204 b. An L1 write-back event is caused by the L1 cache 204 a writing a cache line back to main memory. A tag update event is caused when an L1 cache line transitions from the exclusive state to the modified state. - As described above, in addition to the typical MESI cache coherency protocol events, a “tag update” event may be included. The tag update event is generated when an L1 cache line changes from the
exclusive state 404 to the modified state 406. The tag update event causes a tag update entry to be placed in the tag update queue 304. An entry in the tag update queue 304 causes an L2 cache 204 b using an enhanced MESI cache coherency protocol to transition the cache line to the enhanced exclusive state 506. The enhanced exclusive state 506 denotes that the L1 cache 204 a owns the cache line in the modified state 406. - The
tag update queue 304 facilitates an early MESI update to the L2 cache 204 b. The tag update queue 304 is a snoopable structure, since it can provide information about a HITM in the L1 cache 204 a. After the tag update entry is placed in the tag update queue 304, the L1 cache 204 a maintains its cache line in the modified state 406, since the L1 cache 204 a is responsible for writing-back the modified data. -
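The tag update flow above might be sketched as follows. This is a hypothetical model with invented helper names; the only behavior taken from the text is that an L1 exclusive-to-modified transition enqueues an entry, and draining the queue moves the L2 copy to the enhanced exclusive state while L1 keeps the line modified.

```python
from collections import deque

tag_update_queue = deque()  # a stand-in for the tag update queue 304

def on_l1_exclusive_to_modified(line_addr):
    """An L1 transition E -> M generates a tag update entry (L1 keeps the line modified)."""
    tag_update_queue.append(line_addr)

def drain_tag_updates(l2_states):
    """Each queued entry moves the L2 copy of that line to enhanced exclusive ('EE')."""
    while tag_update_queue:
        addr = tag_update_queue.popleft()
        if l2_states.get(addr) == "E":
            l2_states[addr] = "EE"  # L2 now 'knows' L1 owns the line modified
```

Because the queue is snoopable, a snoop arriving between the L1 transition and the drain can still discover the pending HITM condition.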
FIG. 6 is a flowchart representative of an example process 600 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208) to implement an example mechanism for victimizing lines in a cache (e.g., L2 cache 204 b). Preferably, the illustrated process 600 is embodied in an integrated circuit associated with a cache device 204 b. However, the illustrated process 600 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 600 is described with reference to the flowchart illustrated in FIG. 6, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 600 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the
example process 600 handles the victimization of a cache line. A victimized cache line is a cache line that is overwritten by another cache line or returned to the invalid state 502. A cache line may be victimized according to any well known victimization policy. For example, a least-recently-used (LRU) policy may be employed to select a cache line for victimization. Once a cache line is selected for victimization, the process 600 determines the current state of the selected cache line. Depending on the current state of the selected cache line, the process 600 may issue an internal inquiry and/or perform a write-back operation prior to victimizing the selected cache line. - The
example process 600 begins by determining if a cache line in an L2 cache 204 b needs to be victimized (block 602). For example, if a new cache line is about to be read into the cache 204 b, the cache 204 b may need to remove an existing cache line in order to make room for the new cache line. When a cache line needs to be victimized, the process 600 selects a cache line for victimization (block 604). This selection may be performed according to any well known cache victimization selection process. For example, a least recently used (LRU) process may be used to select a cache line for victimization. - Once a cache line is selected for victimization, the
process 600 determines what state the selected cache line is in. In the illustrated example, if the L2 cache line is in the pending state (block 606), the process 600 loops back to select another cache line (block 604). If the L2 cache line is in the shared state 512 (block 608), the process 600 issues an internal inquiry (block 610). Similarly, if the L2 cache line is in the exclusive state 504 (block 612) or the enhanced modified state 510 (block 614), the process 600 issues the internal inquiry (block 610). - As described below with reference to
FIG. 7, the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a). A “hit” snoop response means another cache (e.g., the L1 cache 204 a) also holds the selected cache line (i.e., the cache line that is about to be victimized). A “no hit” snoop response means no other cache holds the selected cache line. If the snoop response is a “no hit” (block 616), the process 600 writes the selected cache line back to main memory 110 (block 618) before the cache line is victimized (block 620). If the snoop response is a “hit” (block 616), the process 600 may victimize the cache line (block 620) without performing the write-back (block 618). - The L1 cache posts a “no hit” if the inquiry is due to an L2 victim in the enhanced modified
state 510 and the L1 state is not modified. The L1 cache 204 a invalidates its entry if the inquiry is due to an L2 victim in the enhanced modified state 510 and the L1 state is exclusive 404 or shared 408. The L1 cache 204 a need not invalidate an exclusive 404 or shared 408 line upon receiving an inquiry if the L2 cache 204 b is not designed to be inclusive of the L1 cache 204 a. The L1 cache 204 a posts a “no hit” if the L1 state is not modified and if the L2 cache is designed to be inclusive of the L1 cache 204 a. The L1 cache 204 a should invalidate an exclusive 404 or shared 408 line upon receiving an inquiry if the L2 cache 204 b is designed to be inclusive of the L1 cache 204 a. - If the L2 cache line is in the modified state 508 (block 622), the
process 600 may write the selected cache line back to main memory 110 (block 618) and victimize the cache line (block 620) without the need for the internal inquiry (block 610). If the L2 cache line is in the invalid state 502 (block 624) or the enhanced exclusive state 506 (block 626), the process 600 may victimize the selected cache line without the need for the internal inquiry (block 610) or the write-back (block 618). If the L2 cache line is not in any of the predefined states, the process 600 may generate an error (block 628). -
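The per-state decisions of process 600 can be condensed into a single dispatch function. This is a sketch under assumed state abbreviations (M, E, S, I, P, EM, EE); `issue_internal_inquiry` is a hypothetical callback standing in for the snoop of FIG. 7, returning True on a hit.

```python
def victimize_line(state, issue_internal_inquiry):
    """Decide what process 600 does for an L2 victim in the given state.

    Returns which actions (inquiry, write-back, victimize) are taken. A pending
    line is skipped entirely so the caller can select another victim.
    """
    actions = {"inquiry": False, "write_back": False, "victimize": False}
    if state == "P":                       # pending: select another victim
        return actions
    if state in ("S", "E", "EM"):          # shared / exclusive / enhanced modified
        actions["inquiry"] = True
        if not issue_internal_inquiry():   # no other cache holds it: write back first
            actions["write_back"] = True
    elif state == "M":                     # modified: write back, no inquiry needed
        actions["write_back"] = True
    elif state not in ("I", "EE"):         # invalid / enhanced exclusive need nothing
        raise ValueError(f"unexpected L2 state: {state}")
    actions["victimize"] = True
    return actions
```

Note how the enhanced exclusive state pays off here: the L2 cache already knows L1 owns the line modified, so neither an inquiry nor a write-back is required before victimization.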
FIG. 6 b is a flowchart representative of an example process 650 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208) to implement an example mechanism for victimizing lines in an L2 cache 204 b when the L2 cache 204 b is designed to be inclusive of the L1 cache 204 a. Preferably, the illustrated process 650 is embodied in an integrated circuit associated with a cache device 204 b. However, the illustrated process 650 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 650 is described with reference to the flowchart illustrated in FIG. 6 b, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 650 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the example process 650 handles the victimization of a cache line. A victimized cache line is a cache line that is overwritten by another cache line or returned to the
invalid state 502. A cache line may be victimized according to any well known victimization policy. For example, a least-recently-used (LRU) policy may be employed to select a cache line for victimization. Once a cache line is selected for victimization, the process 650 determines the current state of the selected cache line. Depending on the current state of the selected cache line, the process 650 may issue an internal inquiry and/or perform a write-back operation prior to victimizing the selected cache line. - The example process 650 begins by determining if a cache line in an
L2 cache 204 b needs to be victimized (block 652). For example, if a new cache line is about to be read into the cache 204 b, the cache 204 b may need to remove an existing cache line in order to make room for the new cache line. When a cache line needs to be victimized, the process 650 selects a cache line for victimization (block 654). This selection may be performed according to any well known cache victimization selection process. For example, a least recently used (LRU) process may be used to select a cache line for victimization. - Once a cache line is selected for victimization, the process 650 determines what state the selected cache line is in. In the illustrated example, if the L2 cache line is in the pending state (block 656), the process 650 loops back to select another cache line (block 654). If the L2 cache line is in the shared state 512 (block 658), the process 650 issues an internal inquiry (block 660) before the cache line is victimized (block 662).
- If the L2 cache line is in the exclusive state 504 (block 664), the process 650 issues an internal inquiry (block 666). As described below with reference to
FIG. 7, the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a). If the snoop response is a “hit modified” response (block 668), the process 650 issues an invalidate signal on the FSB (block 670) before the cache line is victimized (block 662). If the snoop response is not a “hit modified” response (block 668), the process 650 may victimize the cache line (block 662) without issuing an invalidate signal on the FSB (block 670). If the L2 cache line is in the enhanced exclusive state 506 (block 672), the process 650 may issue the invalidate signal on the FSB (block 670) and victimize the cache line (block 662) without issuing the internal inquiry (block 666). - If the L2 cache line is in the enhanced modified state 510 (block 674), the process 650 issues an internal inquiry (block 676). As described below with reference to
FIG. 7, the internal inquiry generates a snoop response from other caches (e.g., the L1 cache 204 a). If the snoop response is a “hit modified” response (block 678), the process 650 issues an invalidate signal on the FSB (block 670) before the cache line is victimized (block 662). If the snoop response is not a “hit modified” response (block 678), the process 650 writes the selected cache line back to main memory 110 (block 680) before the cache line is victimized (block 662). If the L2 cache line is in the modified state 508 (block 682), the process 650 may write the cache line back to main memory 110 (block 680) and victimize the cache line (block 662) without issuing the internal inquiry (block 676). - If the L2 cache line is in the invalid state 502 (block 684), the process 650 victimizes the selected cache line (block 662). If the L2 cache line is not in any of the predefined states, the process 650 may generate an error (block 686).
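The inclusive-L2 variant (process 650) can be condensed the same way. This is a sketch under the same assumed state abbreviations; here the hypothetical `issue_internal_inquiry` callback returns True specifically for a “hit modified” snoop response.

```python
def victimize_line_inclusive(state, issue_internal_inquiry):
    """Decide what process 650 does for a victim when L2 is inclusive of L1.

    `issue_internal_inquiry` returns True for a 'hit modified' snoop response.
    """
    a = {"inquiry": False, "fsb_invalidate": False, "write_back": False, "victimize": False}
    if state == "P":                       # pending: select another victim
        return a
    if state == "S":                       # shared: inquiry, then victimize
        a["inquiry"] = True
    elif state == "E":                     # exclusive: invalidate on the FSB only on HITM
        a["inquiry"] = True
        if issue_internal_inquiry():
            a["fsb_invalidate"] = True
    elif state == "EE":                    # enhanced exclusive: invalidate, no inquiry needed
        a["fsb_invalidate"] = True
    elif state == "EM":                    # enhanced modified: HITM decides the path
        a["inquiry"] = True
        if issue_internal_inquiry():
            a["fsb_invalidate"] = True
        else:
            a["write_back"] = True         # no modified copy in L1: L2 writes back
    elif state == "M":                     # modified: write back, no inquiry
        a["write_back"] = True
    elif state != "I":
        raise ValueError(f"unexpected L2 state: {state}")
    a["victimize"] = True
    return a
```

Compared with process 600, the extra action here is the FSB invalidate, which keeps the rest of the system consistent when a line leaves an inclusive L2 while L1 still holds it modified.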
- As discussed above, the internal inquiry generates a snoop response.
FIG. 7 is a flowchart representative of an example process 700 which may be executed by a device (e.g., L1 cache 204 a) to implement an example mechanism for responding to the internal inquiry. Preferably, the illustrated process 700 is embodied in an integrated circuit associated with a cache device 204 a. However, the illustrated process 700 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 700 is described with reference to the flowchart illustrated in FIG. 7, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 700 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the
example process 700 responds to an internal inquiry by posting a hit, a hit-modified, or a no-hit signal depending on the current state of the cache line associated with the internal inquiry in the L1 cache 204 a. When the L2 cache 204 b is inclusive of the L1 cache 204 a, victimization in the L2 cache 204 b requires invalidation from the L1 cache 204 a. When the L2 cache 204 b is not inclusive of the L1 cache 204 a, victimization from the L2 cache 204 b need not result in invalidation from the L1 cache 204 a. - The illustrated
process 700 begins by waiting for the internal inquiry (block 702). When the internal inquiry is received, the process 700 determines if the cache line associated with the internal inquiry is in the L1 cache 204 a in the exclusive state 404 or the shared state 408 (block 704). If the cache line associated with the internal inquiry is in the L1 cache 204 a in the exclusive state 404 or the shared state 408, the process 700 determines if the L2 cache 204 b is inclusive of the L1 cache 204 a (block 706). If the L2 cache 204 b is inclusive of the L1 cache 204 a, the process 700 invalidates the cache line (block 708) and posts a ˜HIT (i.e., a NO HIT or MISS) signal from the L1 cache 204 a (block 710). If the L2 cache 204 b is not inclusive of the L1 cache 204 a (block 706), the process 700 posts a hit signal (block 712). - If the cache line is not in the
exclusive state 404 or the shared state 408 (block 704), the process 700 determines if the cache line associated with the internal inquiry is in the L1 cache 204 a in the modified state 406 (block 714). If the cache line associated with the internal inquiry is in the L1 cache 204 a in the modified state 406, the process 700 posts a HITM (i.e., a HIT MODIFIED) signal from the L1 cache 204 a (block 716). The HITM signal indicates to other caches that this L1 cache 204 a holds the cache line associated with the snoop probe in a modified state. If the cache line associated with the internal inquiry is not in the L1 cache 204 a, the process posts a ˜HIT (i.e., a NO HIT or a MISS) signal from the L1 cache 204 a (block 710). The ˜HIT signal indicates to other caches that this L1 cache 204 a does not hold the cache line in any non-pending state. -
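The internal-inquiry response of blocks 702-716 can be summarized as a small state check. The following is a hypothetical Python model, not part of the disclosure: the function name, argument names, and string state labels are illustrative, and it encodes only the behavior described in the paragraphs above.

```python
# Hypothetical model of example process 700: an L1 cache responding
# to an internal inquiry, per the flowchart description above.

def l1_internal_inquiry(l1_state, l2_inclusive):
    """Return (response, invalidate_l1) for an internal inquiry.

    l1_state: "exclusive", "shared", "modified", or None (line absent).
    l2_inclusive: True if the L2 cache is inclusive of the L1 cache.
    """
    if l1_state in ("exclusive", "shared"):   # block 704
        if l2_inclusive:                      # block 706
            return "~HIT", True               # blocks 708, 710
        return "HIT", False                   # block 712
    if l1_state == "modified":                # block 714
        return "HITM", False                  # block 716
    return "~HIT", False                      # block 710: no valid copy
```

For example, under this sketch an inquiry that finds the line shared in an inclusive hierarchy invalidates the L1 copy and posts ˜HIT: `l1_internal_inquiry("shared", True)` returns `("~HIT", True)`.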
FIG. 8 is a flowchart representative of an example process 800 which may be executed by a device to implement an example mechanism for handling snoop probes to a cache using the enhanced MESI protocol (e.g., L2 cache 204 b). Preferably, the illustrated process 800 is embodied in an integrated circuit associated with a cache device 204 b. However, the illustrated process 800 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 800 is described with reference to the flowchart illustrated in FIG. 8, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 800 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the
example process 800 handles snoop probes to the L2 cache 204 b by determining what state (if any) the cache line associated with the snoop probe is in and posting a signal indicative of the current cache line state. In addition, the process 800 determines which cache (if any) has an implicit write-back of the cache line, and the process 800 may invalidate the cache line in the L2 cache 204 b. - The illustrated
process 800 begins by determining if the cache line associated with the snoop probe is in the L2 cache 204 b in the exclusive state 504, the enhanced exclusive state 506, or the shared state 512 (block 802). If the cache line associated with the snoop probe is in the L2 cache in the exclusive state 504, the enhanced exclusive state 506, or the shared state 512, the process 800 posts a HIT signal from the L2 cache 204 b (block 804). The HIT signal indicates to other caches that this L2 cache 204 b holds the cache line associated with the snoop probe in a non-modified state. - If the cache line associated with the snoop probe is not in the L2 cache in the
exclusive state 504, the enhanced exclusive state 506, or the shared state 512, the process 800 determines if the cache line associated with the snoop probe is in the L2 cache 204 b in the modified state 508 or the enhanced modified state 510 (block 806). If the cache line associated with the snoop probe is not in the L2 cache 204 b in the exclusive state 504, the enhanced exclusive state 506, the shared state 512, the modified state 508, or the enhanced modified state 510, the process 800 posts a ˜HIT (i.e., a NO HIT or a MISS) signal from the L2 cache 204 b (block 808). The ˜HIT signal indicates to other caches that this L2 cache 204 b does not hold the cache line associated with the snoop probe in any valid or non-pending state. - If the cache line associated with the snoop probe is in the
L2 cache 204 b in the modified state 508 or the enhanced modified state 510, the process 800 posts a HITM (i.e., a HIT MODIFIED) signal from the L2 cache 204 b (block 810). The HITM signal indicates to other caches that this L2 cache 204 b holds the cache line associated with the snoop probe in a modified (or enhanced modified) state. - After the
process 800 posts the HITM signal (block 810), the process 800 determines if the L1 cache 204 a also posted a HITM signal (block 812). If the L1 cache 204 a also posted a HITM signal, there is an implicit write-back of the cache line associated with the snoop probe from the L1 cache 204 a (block 814). As a result, the L2 cache 204 b may invalidate the cache line (block 816). However, if the L1 cache 204 a does not post a HITM signal, there is an implicit write-back of the cache line associated with the snoop probe from the L2 cache 204 b (block 818). As a result, the L2 cache 204 b may not invalidate the cache line. -
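The snoop-handling of blocks 802-818 can be sketched the same way. This is again a hypothetical Python model with illustrative names; the "may invalidate" of block 816 is modeled as an invalidation flag rather than a mandatory action:

```python
# Hypothetical model of example process 800: an L2 cache using the
# enhanced MESI protocol responding to a snoop probe.

def l2_snoop_response(l2_state, l1_posted_hitm):
    """Return (response, write_back_from, may_invalidate_l2).

    l2_state: "exclusive", "enhanced_exclusive", "shared",
              "modified", "enhanced_modified", or None (line absent).
    l1_posted_hitm: True if the L1 cache also posted a HITM signal.
    """
    if l2_state in ("exclusive", "enhanced_exclusive", "shared"):  # block 802
        return "HIT", None, False                                  # block 804
    if l2_state in ("modified", "enhanced_modified"):              # block 806
        if l1_posted_hitm:                                         # block 812
            return "HITM", "L1", True    # blocks 814, 816: L1 writes back
        return "HITM", "L2", False       # block 818: L2 writes back
    return "~HIT", None, False           # block 808: no valid copy
```

The third element captures the design point of blocks 812-818: when the L1 also posts HITM, the implicit write-back comes from the L1, so the L2 may safely invalidate its (stale) copy.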
FIGS. 9-10 are a flowchart representative of an example process 900 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208) to implement an example mechanism for filtering snoops (without the use of tags as described below with reference to FIGS. 11-12). Preferably, the illustrated process 900 is embodied in an integrated circuit associated with the memory hierarchy 300. However, the illustrated process 900 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 900 is described with reference to the flowchart illustrated in FIGS. 9-10, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 900 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the
example process 900 monitors a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache 204 b, the L2 write-back queue 306, and/or the L1L2 queue 304, the process 900 posts information and sends snoop probes to other caches in an effort to filter snoops and reduce bus traffic. This process 900 assumes that the L2 cache 204 b is maintained inclusive of the L1 cache 204 a by using an L2 victimization process (see FIG. 6 b). The L2 write-back queue 306 and the L1L2 queue 304 are considered part of the L2 cache 204 b for snoop purposes. A snoop probe that finds a line in the L2 write-back queue 306 is interpreted as finding the line in the L2 cache 204 b in that state. A snoop probe that finds an L1 write-back in the L1L2 queue 304 is interpreted as finding the line in the L2 cache 204 b in the modified state 508. - The illustrated
process 900 begins by waiting for an entry in the snoop queue (block 902). A snoop queue entry is a snoop probe that would normally be propagated to all the cache devices. When an entry in the snoop queue is ready, the process 900 determines if the type of entry in the snoop queue is a snoop-to-share type (block 904). A snoop-to-share entry is an entry that causes a cache line to transition to the shared state if a hit is found. If the type of entry in the snoop queue is a snoop-to-share entry, the process 900 sends a snoop probe only to the L2 cache 204 b, the L2 write-back queue 306, and the L1L2 queue 304 (block 906). - The
process 900 may send the snoop probe to each of these entities simultaneously or one at a time. In the event the snoop probe is sent to the L2 cache 204 b, the L2 write-back queue 306, and the L1L2 queue 304 one at a time, the process 900 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities. Accordingly, the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities. For example, the process 900 may check for each of the L2 cache 204 b states described below before moving on to test each of the L2 write-back queue 306 states. - In any event, if the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 908), the process 900 posts a HITM signal (block 910). The HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510. The cache that holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b, as directly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the line in the modified state 508 or the enhanced modified state 510. Alternatively, the cache that holds the cache line associated with the snoop probe in a modified state 406 may be the L1 cache 204 a, as indirectly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the cache line in the enhanced exclusive state 506. - In addition, if the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 908), the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 912). A snoop-to-invalidate probe is a probe that causes a cache line to transition to the invalid state if a hit is found. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512, and the L2 write-back queue 306 posts a MISS signal (block 914), the process 900 posts a HIT signal (block 916). The MISS signal from the L2 write-back queue 306 indicates the L2 write-back queue 306 does not hold a copy of the cache line. The HIT signal from the L2 cache 204 b indicates to other caches that the L2 cache 204 b holds the cache line associated with the snoop probe in the shared state 512. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504, and the L2 write-back queue 306 posts a MISS signal (block 920), the process 900 sends a snoop-to-share probe to the L1 cache 204 a and the L0 cache 302 (block 922). The process 900 then posts a HIT signal or a HITM signal based on the response to the snoop-to-share probe from the L1 cache 204 a (block 924). The HIT signal is posted if the response from the L1 cache 204 a is not a HITM. The HITM signal is posted if the response from the L1 cache 204 a is a HITM. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504, and the L2 write-back queue 306 posts a HIT signal (block 926), the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 928). The HIT signal from the L2 write-back queue 306 indicates the L2 write-back queue 306 holds a copy of the cache line. The process 900 then posts a NO HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a (block 930). The NO HIT signal is posted if the response from the L1 cache 204 a is not a HITM. The HITM signal is posted if the response from the L1 cache 204 a is a HITM. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512, and the L2 write-back queue 306 posts a HIT signal (block 932), the process 900 posts a NO HIT signal (block 934) and a NO HITM signal (block 936). In addition, the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 938). - If the
L2 cache 204 b and the L2 write-back queue 306 both post a MISS signal in response to the snoop probe (block 940), the process 900 posts a NO HIT signal (block 942) and a NO HITM signal (block 944). - If the type of entry in the snoop queue is not a snoop-to-share entry (block 904), the
process 900 determines if the type of entry in the snoop queue is a snoop-to-invalidate type (block 946). If the type of entry in the snoop queue is a snoop-to-invalidate entry, the process 900 sends a snoop probe only to the L2 cache 204 b, the L2 write-back queue 306, and the L1L2 queue 304 (block 1002 of FIG. 10). - As described above, the
process 900 may send the snoop probe to each of these entities simultaneously or one at a time. In the event the snoop probe is sent to the L2 cache 204 b, the L2 write-back queue 306, and the L1L2 queue 304 one at a time, the process 900 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities. Accordingly, the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities. For example, the process 900 may check for each of the L2 cache 204 b states described below before moving on to test each of the L2 write-back queue 306 states. - In any event, if the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 1004), the process 900 posts a HITM signal (block 1006). Again, the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510. The cache that holds the cache line in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b, as directly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the line in the modified state 508 or the enhanced modified state 510. Alternatively, the cache that holds the cache line in a modified state 406 may be the L1 cache 204 a, as indirectly indicated by the L2 cache 204 b posting the HITM signal in response to the L2 cache 204 b holding the cache line in the enhanced exclusive state 506. In addition, the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1008). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512 (block 1010), the process 900 posts a NO HIT signal (block 1012) and a NO HITM signal (block 1014). In addition, the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1016). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504 (block 1018), the process 900 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1020). The process 900 then posts a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a (block 1022). The HITM signal is posted if the response from the L1 cache 204 a is a HITM. - If the
L2 cache 204 b and the L2 write-back queue 306 both post a MISS signal in response to the snoop probe (block 1024), the process 900 posts a NO HIT signal (block 1026) and a NO HITM signal (block 1028). -
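The branch structure of blocks 902-1028 amounts to a decision table. The sketch below is a hypothetical Python model of that table only, with illustrative names; it assumes write-back queue and L1L2 queue hits have already been folded into the L2 state per the interpretation rules above, and it assumes (not stated explicitly at block 1022) that a non-HITM response in the exclusive-state invalidate case yields a NO HIT.

```python
# Hypothetical model of example process 900: filtering snoops
# without tags. l2_state is the state the probe finds in the L2
# cache 204 b (None on a miss), wb_hit is the L2 write-back queue
# response, and l1_response is the L1 reply to a follow-up probe.

MODIFIED_LIKE = ("enhanced_exclusive", "modified", "enhanced_modified")

def filter_snoop(snoop_type, l2_state, wb_hit, l1_response=None):
    """Return (posted_signal, probe_sent_to_l1_and_l0)."""
    if snoop_type == "snoop-to-share":
        if l2_state in MODIFIED_LIKE:                 # blocks 908-912
            return "HITM", "snoop-to-invalidate"
        if l2_state == "shared" and not wb_hit:       # blocks 914-916
            return "HIT", None
        if l2_state == "exclusive" and not wb_hit:    # blocks 920-924
            return ("HITM" if l1_response == "HITM" else "HIT",
                    "snoop-to-share")
        if l2_state == "exclusive" and wb_hit:        # blocks 926-930
            return ("HITM" if l1_response == "HITM" else "NO HIT",
                    "snoop-to-invalidate")
        if l2_state == "shared" and wb_hit:           # blocks 932-938
            return "NO HIT", "snoop-to-invalidate"
        return "NO HIT", None                         # blocks 940-944
    # snoop-to-invalidate entries (blocks 1002-1028)
    if l2_state in MODIFIED_LIKE:                     # blocks 1004-1008
        return "HITM", "snoop-to-invalidate"
    if l2_state == "shared":                          # blocks 1010-1016
        return "NO HIT", "snoop-to-invalidate"
    if l2_state == "exclusive":                       # blocks 1018-1022
        return ("HITM" if l1_response == "HITM" else "NO HIT",
                "snoop-to-invalidate")
    return "NO HIT", None                             # blocks 1024-1028
```

Note the filtering effect: in the shared-state and miss cases the signal is posted without waiting on the L1, and in the miss case no probe reaches the L1 or L0 caches at all.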
FIGS. 11-12 are a flowchart representative of an example process 1100 which may be executed by a device (e.g., L2 cache 204 b and/or memory controller 208) to implement an example mechanism for filtering snoops with the use of tags. Preferably, the illustrated process 1100 is embodied in an integrated circuit associated with the memory hierarchy 300. However, the illustrated process 1100 may be embodied in one or more software programs which are stored in one or more memories (e.g., flash memory 112 and/or hard disk 120) and executed by one or more processors (e.g., multi-processor module 106) in a well known manner. Although the process 1100 is described with reference to the flowcharts illustrated in FIGS. 11-12, a person of ordinary skill in the art will readily appreciate that many other methods of performing the process 1100 may be used. For example, the order of many of the blocks may be altered, the operation of one or more blocks may be changed, blocks may be combined, and/or blocks may be eliminated. - In general, the
example process 1100 monitors a snoop queue. Depending on the type of snoop probe in the snoop queue and the state of the associated cache line in the L2 cache 204 b, the L2 write-back queue 306, and/or the L1L2 queue 304, the process 1100 posts information and sends snoop probes to other caches in an effort to filter snoops and reduce bus traffic. This process 1100 assumes that the L2 cache 204 b is maintained inclusive of the L1 cache 204 a by using an L2 victimization process (see FIG. 6 b). Unlike the process 900 described above, this process 1100 performs these actions with the use of the tag update queue 304 as described herein. The tag update queue 304, the L2 write-back queue 306, and the L1L2 queue 304 are considered part of the L2 cache 204 b for snoop purposes. A snoop probe that finds a line in the L2 write-back queue 306 is interpreted as finding the line in the L2 cache 204 b in that state. A snoop probe hit in the tag update queue 304 is interpreted as finding the line in the L2 cache 204 b in the enhanced exclusive state 506. A snoop probe that finds an L1 write-back in the L1L2 queue 304 is interpreted as finding the line in the L2 cache 204 b in the modified state 508. - The illustrated
process 1100 begins by waiting for an entry in the snoop queue (block 1102). Again, a snoop queue entry is a snoop probe that would normally be propagated to all the cache devices. When an entry in the snoop queue is ready, the process 1100 determines if the type of entry in the snoop queue is a snoop-to-share type (block 1104). If the type of entry in the snoop queue is a snoop-to-share entry, the process 1100 sends a snoop probe only to the L2 cache 204 b, the L2 write-back queue 306, the L1L2 queue 304, and the tag update queue 304 (block 1106). - As described above, the
process 1100 may send the snoop probe to each of these entities simultaneously or one at a time, and the process 1100 may not send the snoop probe to all of these entities if a predetermined state is found before the snoop probe is sent to all of the entities. Accordingly, the tests described below may be rearranged to perform all of the tests for one of the entities before moving on to the tests for another of the entities. - In any event, if the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 1108), the process 1100 posts a HITM signal (block 1110). Again, the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510. The cache that holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b, as directly indicated by the L2 cache 204 b indicating it is holding the line in the modified state 508 or the enhanced modified state 510. Alternatively, the cache that holds the cache line associated with the snoop probe in a modified state 406 may be the L1 cache 204 a, as indirectly indicated by the L2 cache 204 b indicating it is holding the cache line in the enhanced exclusive state 506. - In addition, if the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 1108), the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1112). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512, and the L2 write-back queue 306 posts a MISS signal (block 1114), the process 1100 posts a HIT signal (block 1116). The HIT signal indicates to other caches that the L2 cache 204 b holds the cache line associated with the snoop probe in the shared state 512. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504, and the L2 write-back queue 306 posts a MISS signal (block 1118), the process 1100 posts a HIT signal (block 1120) and sends a snoop-to-share probe to the L1 cache 204 a and the L0 cache 302 (block 1122). Unlike the process 900 described above, this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-share probe from the L1 cache 204 a because the process 1100 guarantees that the L1 cache 204 a will not modify the cache line for which a snoop is in progress. Stores committed to that cache line prior to the snoop probe will be present in the tag update queue 304 and result in a HITM from the L2 cache 204 b. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504, and the L2 write-back queue 306 posts a HIT signal (block 1124), the process 1100 posts a NO HIT signal (block 1126) and a NO HITM signal (block 1128). In addition, the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1130). Again, unlike the process 900 described above, this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a. - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512, and the L2 write-back queue 306 posts a HIT signal (block 1132), the process 1100 posts a NO HIT signal (block 1134) and a NO HITM signal (block 1136). In addition, the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1138). - If the
L2 cache 204 b and the L2 write-back queue 306 both post a MISS signal in response to the snoop probe (block 1140), the process 1100 posts a NO HIT signal (block 1142) and a NO HITM signal (block 1144). - If the type of entry in the snoop queue is not a snoop-to-share entry (block 1104), the
process 1100 determines if the type of entry in the snoop queue is a snoop-to-invalidate type (block 1146). If the type of entry in the snoop queue is a snoop-to-invalidate entry, the process 1100 sends a snoop probe only to the L2 cache 204 b, the L2 write-back queue 306, the L1L2 queue 304, and the tag update queue 304 (block 1202 of FIG. 12). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 1204), the process 1100 posts a HITM signal (block 1206). Again, the HITM signal indicates to other caches that some cache 204 holds the cache line associated with the snoop probe in a modified state 508 or an enhanced modified state 510. The cache that holds the cache line in a modified state 508 or an enhanced modified state 510 may be the L2 cache 204 b, as directly indicated by the L2 cache 204 b. Alternatively, the cache that holds the cache line in a modified state 406 may be the L1 cache 204 a, as indirectly indicated by the L2 cache 204 b indicating it is holding the cache line in the enhanced exclusive state 506. In addition, if the response to the snoop probe is indicative of the L2 cache 204 b holding the cache line in the enhanced exclusive state 506, the modified state 508, or the enhanced modified state 510 (block 1204), the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1208). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the shared state 512 (block 1210), the process 1100 posts a NO HIT signal (block 1212) and a NO HITM signal (block 1214). In addition, the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1216). - If the response to the snoop probe is indicative of the
L2 cache 204 b holding the cache line in the exclusive state 504 (block 1218), the process 1100 posts a NO HIT signal (block 1220) and a NO HITM signal (block 1222). In addition, the process 1100 sends a snoop-to-invalidate probe to the L1 cache 204 a and the L0 cache 302 (block 1224). Again, unlike the process 900 described above, this process 1100 does not post a HIT signal or a HITM signal based on the response to the snoop-to-invalidate probe from the L1 cache 204 a. - If the
L2 cache 204 b and the L2 write-back queue 306 both post a MISS signal in response to the snoop probe (block 1226), the process 1100 posts a NO HIT signal (block 1228) and a NO HITM signal (block 1230). - Although the above discloses example systems, it should be noted that such systems are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of the disclosed hardware and software components could be embodied exclusively in dedicated hardware, exclusively in software, exclusively in firmware or in some combination of hardware, firmware and/or software.
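The tag-assisted filtering of blocks 1102-1230 can be sketched in the same illustrative style. The distinguishing property described above is that no posted signal depends on an L1 response: stores committed before the snoop surface as tag update queue hits, which the filter reads as the L2 enhanced exclusive state 506. This is a hypothetical sketch under that interpretation, with illustrative names:

```python
# Hypothetical model of example process 1100: filtering snoops with
# the tag update queue. A tag-update-queue hit is read as the L2
# enhanced exclusive state, so the filter never waits on an L1
# response before posting its signal.

MODIFIED_LIKE = ("enhanced_exclusive", "modified", "enhanced_modified")

def tag_filter_snoop(snoop_type, l2_state, wb_hit, tag_queue_hit=False):
    """Return (posted_signal, probe_sent_to_l1_and_l0)."""
    if tag_queue_hit:
        l2_state = "enhanced_exclusive"  # tag-queue hit reads as E'
    if l2_state in MODIFIED_LIKE:        # blocks 1108-1112 / 1204-1208
        return "HITM", "snoop-to-invalidate"
    if snoop_type == "snoop-to-share":
        if l2_state == "shared" and not wb_hit:      # blocks 1114-1116
            return "HIT", None
        if l2_state == "exclusive" and not wb_hit:   # blocks 1118-1122
            return "HIT", "snoop-to-share"
        if l2_state in ("exclusive", "shared") and wb_hit:
            return "NO HIT", "snoop-to-invalidate"   # blocks 1124-1138
        return "NO HIT", None                        # blocks 1140-1144
    # snoop-to-invalidate entries (blocks 1202-1230)
    if l2_state in ("shared", "exclusive"):          # blocks 1210-1224
        return "NO HIT", "snoop-to-invalidate"
    return "NO HIT", None                            # blocks 1226-1230
```

Comparing this with the tagless sketch of process 900 shows the design choice: every `l1_response` dependency has disappeared, so the posted signal is determined entirely by L2-side structures.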
- In addition, although certain methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all apparatuses, methods and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
Claims (44)
1. A method of filtering an external cache snoop probe to a first cache, the method comprising:
determining a cache line state, the cache line state being associated with the external cache snoop probe and a cache line in a second cache; and
posting a hit-modified signal if the cache line state is an enhanced exclusive state, wherein the enhanced exclusive state indicates a copy of the cache line is in the first cache in a modified state.
2. A method as defined in claim 1, further comprising posting the hit-modified signal if the cache line state is an enhanced modified state, wherein the enhanced modified state indicates a copy of the cache line may be in the first cache.
3. A method as defined in claim 2, further comprising posting the hit-modified signal if the cache line state is a modified state, wherein the modified state indicates the second cache owns the cache line and the first cache does not own the cache line.
4. A method as defined in claim 1, further comprising sending a snoop-to-invalidate probe to the first cache after posting the hit-modified signal.
5. A method as defined in claim 1 , wherein the external cache snoop probe is sent to the second cache, a second cache write-back queue, and an intermediate structure between the first cache and the second cache.
6. A method as defined in claim 1 , wherein the external cache snoop probe is sent only to the second cache, a second cache write-back queue, and an intermediate structure between the first cache and the second cache.
7. A method as defined in claim 1, further comprising determining a snoop type associated with the external cache snoop probe is one of a snoop-to-share type and a snoop-to-invalidate type.
8. An apparatus to filter a cache snoop probe, the apparatus comprising:
a first cache;
a second cache;
a memory controller operatively coupled to the first cache and the second cache, the memory controller being structured to (i) determine a cache line state associated with a cache line in the second cache, the cache line in the second cache being identified by the cache snoop probe, and (ii) post a hit-modified signal if the cache line state is an enhanced exclusive state, wherein the enhanced exclusive state indicates a copy of the cache line is in the first cache in a modified state.
9. An apparatus as defined in claim 8, wherein the memory controller is structured to post the hit-modified signal if the cache line state is an enhanced modified state, wherein the enhanced modified state indicates a copy of the cache line may be in the first cache.
10. An apparatus as defined in claim 8, wherein the memory controller is structured to post the hit-modified signal if the cache line state is a modified state, wherein the modified state indicates the second cache owns the cache line and the first cache does not own the cache line.
11. An apparatus as defined in claim 8, wherein the memory controller is structured to send a snoop-to-invalidate probe to the first cache after posting the hit-modified signal.
12. An apparatus as defined in claim 8, further comprising:
a main memory;
a write-back queue operatively coupled to the second cache and the main memory; and
an intermediate structure operatively coupled to the first cache and the second cache, wherein the cache snoop probe is sent to the second cache, the write-back queue, and the intermediate structure.
13. An apparatus as defined in claim 8, further comprising:
a main memory;
a write-back queue operatively coupled to the second cache and the main memory; and
an intermediate structure operatively coupled to the first cache and the second cache, wherein the cache snoop probe is sent only to the second cache, the write-back queue, and the intermediate structure.
14. A method of filtering an external cache snoop probe to a first cache, the method comprising:
determining a snoop type associated with the external cache snoop probe;
determining a cache line state, the cache line state being associated with the external cache snoop probe and a cache line in a second cache;
posting a hit signal if (i) the snoop type is a snoop-to-share type, (ii) the cache line state is a shared state, and (iii) the second cache posts a write-back miss signal; and
sending a snoop-to-share probe to the first cache after posting the hit signal if the hit signal is posted.
15. A method as defined in claim 14, further comprising posting a no hit signal if (i) the snoop type is the snoop-to-share type, (ii) the cache line state is the shared state, and (iii) the second cache posts a write-back hit signal.
16. A method as defined in claim 15, further comprising posting a no hit modified signal if (i) the snoop type is the snoop-to-share type, (ii) the cache line state is the shared state, and (iii) the second cache posts the write-back hit signal.
17. A method as defined in claim 16, further comprising sending a snoop-to-invalidate probe to the first cache after posting the no hit signal and the no hit modified signal.
18. An apparatus to filter a cache snoop probe, the apparatus comprising:
a first cache;
a second cache;
a memory controller operatively coupled to the first cache and the second cache, the memory controller being structured to (i) determine a snoop type associated with the cache snoop probe, (ii) determine a cache line state associated with a cache line in the second cache, the cache line in the second cache being identified by the cache snoop probe, and (iii) post a hit signal if (a) the snoop type is a snoop-to-share type, (b) the cache line state is a shared state, and (c) the second cache posts a write-back miss signal.
19. An apparatus as defined in claim 18, wherein the memory controller is structured to post a no hit signal if (d) the snoop type is the snoop-to-share type, (e) the cache line state is the shared state, and (f) the second cache posts a write-back hit signal.
20. An apparatus as defined in claim 19, wherein the memory controller is structured to post a no hit modified signal if (d) the snoop type is the snoop-to-share type, (e) the cache line state is the shared state, and (f) the second cache posts the write-back hit signal.
21. A method of filtering an external cache snoop probe to a first cache, the method comprising:
determining a snoop type associated with the external cache snoop probe;
determining a cache line state, the cache line state being associated with the external cache snoop probe and a cache line in a second cache; and
sending a snoop-to-share probe to the first cache if (i) the snoop type is a snoop-to-share type, (ii) the cache line state is an exclusive state, and (iii) the second cache posts a write-back miss signal.
22. A method as defined in claim 21, further comprising:
receiving a response to the snoop-to-share probe from the first cache; and
posting one of a hit signal and a hit modified signal based on the response to the snoop-to-share probe.
23. A method as defined in claim 22, further comprising sending a snoop-to-invalidate probe to the first cache if (i) the snoop type is the snoop-to-share type, (ii) the cache line state is the exclusive state, and (iii) the second cache posts a write-back hit signal.
24. A method as defined in claim 23, further comprising:
receiving a response to the snoop-to-invalidate probe from the first cache; and
posting one of the hit signal and the hit modified signal based on the response to the snoop-to-invalidate probe.
25. An apparatus to filter a cache snoop probe, the apparatus comprising:
a first cache;
a second cache;
a memory controller operatively coupled to the first cache and the second cache, the memory controller being structured to (i) determine a snoop type associated with the cache snoop probe, (ii) determine a cache line state associated with a cache line in the second cache, the cache line in the second cache being identified by the cache snoop probe, and (iii) send a snoop-to-share probe to the first cache if (a) the snoop type is a snoop-to-share type, (b) the cache line state is an exclusive state, and (c) the second cache posts a write-back miss signal.
26. An apparatus as defined in claim 25, wherein the memory controller is structured to:
receive a response to the snoop-to-share probe from the first cache; and
post one of a hit signal and a hit modified signal based on the response.
27. An apparatus as defined in claim 25, wherein the memory controller is structured to send a snoop-to-invalidate probe to the first cache if (d) the snoop type is the snoop-to-share type, (e) the cache line state is the exclusive state, and (f) the second cache posts a write-back hit signal.
28. An apparatus as defined in claim 27, wherein the memory controller is structured to:
receive a response to the snoop-to-invalidate probe from the first cache; and
post one of a hit signal and a hit modified signal based on the response.
29. A method of filtering an external cache snoop probe to a first cache, the method comprising:
determining a snoop type associated with the external cache snoop probe;
determining a cache line state, the cache line state being associated with the external cache snoop probe and a cache line in a second cache;
posting a no hit signal and a no hit modified signal if (i) the snoop type is a snoop-to-invalidate type, and (ii) the cache line state is a shared state; and
sending a snoop-to-invalidate probe to the first cache after posting the no hit signal and the no hit modified signal.
30. A method as defined in claim 29, further comprising sending the snoop-to-invalidate probe to the first cache if (i) the snoop type is the snoop-to-invalidate type, and (ii) the cache line state is an exclusive state.
31. A method as defined in claim 30, further comprising posting one of the hit modified signal and the no hit modified signal based on a response to the snoop-to-invalidate probe.
32. A method as defined in claim 29, further comprising posting the no hit signal and the no hit modified signal if (i) the snoop type is the snoop-to-invalidate type, and (ii) the cache line state is an exclusive state.
33. An apparatus to filter a cache snoop probe, the apparatus comprising:
a first cache;
a second cache;
a memory controller operatively coupled to the first cache and the second cache, the memory controller being structured to (i) determine a snoop type associated with the cache snoop probe, (ii) determine a cache line state associated with a cache line in the second cache, the cache line in the second cache being identified by the cache snoop probe, and (iii) post a no hit signal and a no hit modified signal if (a) the snoop type is a snoop-to-invalidate type, and (b) the cache line state is a shared state.
34. An apparatus as defined in claim 33, wherein the memory controller is structured to send a snoop-to-invalidate probe to the first cache after posting the no hit signal and the no hit modified signal.
35. An apparatus as defined in claim 33, wherein the memory controller is structured to send a snoop-to-invalidate probe to the first cache if (c) the snoop type is the snoop-to-invalidate type, and (d) the cache line state is an exclusive state.
36. An apparatus as defined in claim 35, wherein the memory controller is structured to post one of a hit modified signal and no hit modified signal based on a response to the snoop-to-invalidate probe.
37. An apparatus as defined in claim 35, wherein the memory controller is structured to post the no hit signal and the no hit modified signal if (i) the snoop type is the snoop-to-invalidate type, and (ii) the cache line state is the exclusive state.
38. A method of filtering an external cache snoop probe to a first cache, the method comprising:
determining a snoop type associated with the external cache snoop probe;
determining a cache line state, the cache line state being associated with the external cache snoop probe and a cache line in a second cache;
posting a hit signal if (i) the snoop type is a snoop-to-share type, (ii) the cache line state is an exclusive state, and (iii) the second cache posts a write-back miss signal; and
sending a snoop-to-share probe to the first cache if (i) the snoop type is a snoop-to-share type, (ii) the cache line state is an exclusive state, and (iii) the second cache posts a write-back miss signal.
39. A method as defined in claim 38, further comprising posting a no hit signal and a no hit modified signal if (i) the snoop type is the snoop-to-share type, (ii) the cache line state is the exclusive state, and (iii) the second cache posts a write-back hit signal.
40. A method as defined in claim 39, further comprising sending a snoop-to-invalidate probe to the first cache after posting the no hit signal and the no hit modified signal if (i) the snoop type is the snoop-to-share type, (ii) the cache line state is the exclusive state, and (iii) the second cache posts the write-back hit signal.
41. An apparatus to filter a cache snoop probe, the apparatus comprising:
a first cache;
a second cache;
a memory controller operatively coupled to the first cache and the second cache, the memory controller being structured to (i) determine a snoop type associated with the cache snoop probe, (ii) determine a cache line state associated with a cache line in the second cache, the cache line in the second cache being identified by the cache snoop probe, and (iii) post a hit signal if (a) the snoop type is a snoop-to-share type, (b) the cache line state is an exclusive state, and (c) the second cache posts a write-back miss signal.
42. An apparatus as defined in claim 41, wherein the memory controller is structured to send a snoop-to-share probe to the first cache if (d) the snoop type is the snoop-to-share type, (e) the cache line state is the exclusive state, and (f) the second cache posts the write-back miss signal.
43. An apparatus as defined in claim 41, wherein the memory controller is structured to post a no hit signal and a no hit modified signal if (d) the snoop type is the snoop-to-share type, (e) the cache line state is the exclusive state, and (f) the second cache posts a write-back hit signal.
44. An apparatus as defined in claim 43, wherein the memory controller is structured to send a snoop-to-invalidate probe to the first cache after posting the no hit signal and the no hit modified signal if (g) the snoop type is the snoop-to-share type, (h) the cache line state is the exclusive state, and (j) the second cache posts the write-back hit signal.
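The claims above collectively describe a decision table: a memory controller inspects the snoop type of an external probe, the state of the matching line in the second cache, and the second cache's write-back buffer result, then either answers the probe itself or forwards a probe to the first cache. The following Python sketch illustrates that decision table as read from claims 15-44; it is an informal paraphrase for the reader, not the patented implementation, and all identifiers (`filter_snoop`, the state and signal names) are hypothetical.

```python
def filter_snoop(snoop_type, line_state, writeback_hit):
    """Illustrative snoop-filter decision table (claims 15-44, paraphrased).

    Returns (signals_posted, probe_forwarded_to_first_cache), where the
    probe is None when the controller answers without consulting the
    first cache.
    """
    if snoop_type == "share" and line_state == "shared":
        if not writeback_hit:
            # Claim 18: shared line, no pending write-back -> post hit directly.
            return (["hit"], None)
        # Claims 16/17, 19/20: a pending write-back may hold newer data, so
        # deny the share and invalidate the first cache's copy afterward.
        return (["no_hit", "no_hit_modified"], "invalidate")
    if snoop_type == "share" and line_state == "exclusive":
        if not writeback_hit:
            # Claims 21/25 and 38/41-42: post hit and downgrade the first
            # cache's exclusive copy with a snoop-to-share probe.
            return (["hit"], "share")
        # Claims 23/27 and 39/40: write-back pending -> deny and invalidate.
        return (["no_hit", "no_hit_modified"], "invalidate")
    if snoop_type == "invalidate":
        if line_state == "shared":
            # Claims 29/33-34: shared copies cannot be modified, so answer
            # immediately, then invalidate the first cache in the background.
            return (["no_hit", "no_hit_modified"], "invalidate")
        if line_state == "exclusive":
            # Claims 30-31/35-36: the first cache may hold modified data, so
            # the probe must be forwarded and the answer posted from its response.
            return ([], "invalidate")
    return ([], None)  # no filtering case from these claims applies
```

The filtering benefit is visible in the first and third branches: when the second-level state already determines the answer, the external probe is resolved without disturbing the first cache at all.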
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/630,465 US20050027946A1 (en) | 2003-07-30 | 2003-07-30 | Methods and apparatus for filtering a cache snoop |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/630,465 US20050027946A1 (en) | 2003-07-30 | 2003-07-30 | Methods and apparatus for filtering a cache snoop |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050027946A1 (en) | 2005-02-03 |
Family
ID=34103853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/630,465 Abandoned US20050027946A1 (en) | 2003-07-30 | 2003-07-30 | Methods and apparatus for filtering a cache snoop |
Country Status (1)
Country | Link |
---|---|
US (1) | US20050027946A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5113514A (en) * | 1989-08-22 | 1992-05-12 | Prime Computer, Inc. | System bus for multiprocessor computer system |
US20040186964A1 (en) * | 2003-03-20 | 2004-09-23 | International Business Machines Corporation | Snoop filtering |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7890708B2 (en) * | 2004-12-03 | 2011-02-15 | International Business Machines Corporation | Prioritization of out-of-order data transfers on shared data bus |
US20080140893A1 (en) * | 2004-12-03 | 2008-06-12 | International Business Machines Corporation | Prioritization of out-of-order data transfers on shared data bus |
US7434007B2 (en) * | 2005-03-29 | 2008-10-07 | Arm Limited | Management of cache memories in a data processing apparatus |
US20060224829A1 (en) * | 2005-03-29 | 2006-10-05 | Arm Limited | Management of cache memories in a data processing apparatus |
US20070079072A1 (en) * | 2005-09-30 | 2007-04-05 | Collier Josh D | Preemptive eviction of cache lines from a directory |
US20080104333A1 (en) * | 2006-10-31 | 2008-05-01 | Veazey Judson E | Tracking of higher-level cache contents in a lower-level cache |
US20090106499A1 (en) * | 2007-10-17 | 2009-04-23 | Hitachi, Ltd. | Processor with prefetch function |
US20110082981A1 (en) * | 2008-04-22 | 2011-04-07 | Nxp B.V. | Multiprocessing circuit with cache circuits that allow writing to not previously loaded cache lines |
WO2009130671A1 (en) * | 2008-04-22 | 2009-10-29 | NXP B.V. | Multiprocessing circuit with cache circuits that allow writing to not previously loaded cache lines |
US20120317362A1 (en) * | 2011-06-09 | 2012-12-13 | Apple Inc. | Systems, methods, and devices for cache block coherence |
US8856456B2 (en) * | 2011-06-09 | 2014-10-07 | Apple Inc. | Systems, methods, and devices for cache block coherence |
US20140075091A1 (en) * | 2012-09-10 | 2014-03-13 | Texas Instruments Incorporated | Processing Device With Restricted Power Domain Wakeup Restore From Nonvolatile Logic Array |
US20140181448A1 (en) * | 2012-12-21 | 2014-06-26 | David A. Buchholz | Tagging in a storage device |
US9513803B2 (en) * | 2012-12-21 | 2016-12-06 | Intel Corporation | Tagging in a storage device |
CN106716949A (en) * | 2014-09-25 | 2017-05-24 | 英特尔公司 | Reducing interconnect traffics of multi-processor system with extended MESI protocol |
JP2017018900A (en) * | 2015-07-10 | 2017-01-26 | 株式会社エコ・ストリーム | Oil/water separator |
US11263720B2 (en) * | 2017-04-10 | 2022-03-01 | Intel Corporation | Frequent data value compression for graphics processing units |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7287126B2 (en) | Methods and apparatus for maintaining cache coherency | |
US7062613B2 (en) | Methods and apparatus for cache intervention | |
US7100001B2 (en) | Methods and apparatus for cache intervention | |
US6775748B2 (en) | Methods and apparatus for transferring cache block ownership | |
CN103714015B (en) | Method device and system for reducing back invalidation transactions from a snoop filter | |
US5940856A (en) | Cache intervention from only one of many cache lines sharing an unmodified value | |
US6721848B2 (en) | Method and mechanism to use a cache to translate from a virtual bus to a physical bus | |
KR100293136B1 (en) | Method of shared intervention for cache lines in the recently read state for smp bus | |
US5963974A (en) | Cache intervention from a cache line exclusively holding an unmodified value | |
US5940864A (en) | Shared memory-access priorization method for multiprocessors using caches and snoop responses | |
US6343347B1 (en) | Multiprocessor system bus with cache state and LRU snoop responses for read/castout (RCO) address transaction | |
US6353875B1 (en) | Upgrading of snooper cache state mechanism for system bus with read/castout (RCO) address transactions | |
US6662275B2 (en) | Efficient instruction cache coherency maintenance mechanism for scalable multiprocessor computer system with store-through data cache | |
US20070136535A1 (en) | System and Method for Reducing Unnecessary Cache Operations | |
US20050204088A1 (en) | Data acquisition methods | |
JP2008512772A (en) | Resolving cache conflicts | |
JP2000250812A (en) | Memory cache system and managing method therefor | |
JPH10333985A (en) | Data supply method and computer system | |
US5943685A (en) | Method of shared intervention via a single data provider among shared caches for SMP bus | |
US6574714B2 (en) | Efficient instruction cache coherency maintenance mechanism for scalable multiprocessor computer system with write-back data cache | |
US6378048B1 (en) | “SLIME” cache coherency system for agents with multi-layer caches | |
US20050027946A1 (en) | Methods and apparatus for filtering a cache snoop | |
US6615321B2 (en) | Mechanism for collapsing store misses in an SMP computer system | |
US7464227B2 (en) | Method and apparatus for supporting opportunistic sharing in coherent multiprocessors | |
US6976132B2 (en) | Reducing latency of a snoop tenure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, A DELAWARE CORPORATION, CALIFOR Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DESAI, KIRAN R.;REEL/FRAME:014418/0687 Effective date: 20030728 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |