US20020078306A1 - Method and apparatus for improving system performance in multiprocessor systems - Google Patents

Method and apparatus for improving system performance in multiprocessor systems Download PDF

Info

Publication number
US20020078306A1
Authority
US
United States
Prior art keywords
data
processor
bus
cache
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/334,061
Other versions
US6397304B1 (en
Inventor
Varghese George
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to US09/334,061 priority Critical patent/US6397304B1/en
Assigned to INTEL CORPORATION reassignment INTEL CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GEORGE, VARGHESE
Application granted granted Critical
Publication of US6397304B1 publication Critical patent/US6397304B1/en
Publication of US20020078306A1 publication Critical patent/US20020078306A1/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 Cache consistency protocols
    • G06F 12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means


Abstract

A method and apparatus to retrieve data for a multiprocessor system is described. A request for data is received at a first processor from a bus. A cache is searched for the data, with the data having a state. A determination is made as to whether the state is an exclusive state or shared state. The data is sent to the bus in accordance with the determination.

Description

    FIELD OF THE INVENTION
  • The invention relates to computers in general. In particular, the invention relates to a method and apparatus for improving system performance in multiprocessor systems by expanding the type of data available for implicit writebacks. [0001]
  • BACKGROUND OF THE INVENTION
  • In a shared memory multiprocessor system, data necessary for one processor is often present in a cache of another processor. It is more efficient to retrieve such data from the cache rather than from memory. Furthermore, the system must ensure that a request for data by a processor (or input/output device) is answered with the most current version of the data available. Therefore, the system processes a request for data by first attempting to retrieve the requested data from a processor's internal cache before going to main memory. [0002]
  • In conventional multiprocessor systems, a request for data is originated by a first processor. The other processors detect the data request and ascertain whether they have the requested data in one of their internal caches (“snoop phase”). If the requested data is present, the processor provides the requested data on a bus for transport to the first processor (“data phase”). This entire process is typically governed by a particular bus protocol for the system, and is generally referred to as an “implicit write back” scheme. [0003]
  • In addition to determining whether the requested data is present in a processor's internal cache, the cache system of the processor must determine the state of the requested data. For example, in a system using the Modified/Exclusive/Shared/Invalid (MESI) cache protocol, the requested data can be in one of four states: Modified (M); Exclusive (E); Shared (S); and Invalid (I). The M state indicates that the data within a cache has been modified relative to the same data stored in main memory. Further, both the M and E states indicate that only one processor in the multiprocessor system owns the requested data. The S state indicates that multiple processors own the data. Finally, the I state indicates that the line is invalid (i.e., the cache does not have a copy of the data). [0004]
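  • As a point of reference, the four MESI states described above might be modeled as follows. This is a minimal illustrative sketch in C; the enum and its identifiers are not part of the patent:

    enum mesi_state {
        MESI_MODIFIED,   /* M: line modified relative to main memory; one owner */
        MESI_EXCLUSIVE,  /* E: line matches main memory; exactly one owner      */
        MESI_SHARED,     /* S: line matches main memory; multiple owners        */
        MESI_INVALID     /* I: the cache holds no valid copy of the line        */
    };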
  • Conventional implicit write back schemes, however, may be less than satisfactory for a number of reasons. For example, if the requested data that one processor wants to read is present in the M state in another processor's cache, that processor is required to provide the requested data on the bus. If any of the processors has the data in the S or E state, however, the data is not provided from that processor's cache; rather, it is read from memory. This introduces latency into the data retrieval process, since it is slower to retrieve data from memory than from a cache. Moreover, retrieval from memory may unnecessarily occupy limited main memory bandwidth since other caches contain the requested data. [0005]
  • In view of the foregoing, it can be appreciated that a substantial need exists for a method and apparatus that solves the above-discussed problems. [0006]
  • SUMMARY OF THE INVENTION
  • One embodiment of the invention comprises a method and apparatus to retrieve data for a multiprocessor system. A request for data is received at a first processor from a bus. A cache is searched for the data, with the data having a state. A determination is made as to whether the state is an exclusive state or shared state. The data is sent to the bus in accordance with the determination.[0007]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a multiprocessor system suitable for practicing one embodiment of the invention. [0008]
  • FIG. 2 is a block diagram of a cache system in accordance with one embodiment of the invention. [0009]
  • FIG. 3 is a block flow diagram of the operation of a snoop controller in accordance with one embodiment of the invention. [0010]
  • DETAILED DESCRIPTION
  • The embodiments of the invention provide an improved implicit write back scheme for use in multiprocessor systems. The improved implicit write back scheme retrieves more types of data from a cache than conventional implicit write back techniques. This results in a faster and more efficient computer system. [0011]
  • More particularly, the embodiments of the invention permit the retrieval of data from a processor's cache that is stored in multiple states. For example, one embodiment of the invention utilizes the MESI cache protocol. In this embodiment of the invention, data can be retrieved from a processor's cache that is stored in the M state, E state or S state. This is a substantial improvement over conventional implicit writeback schemes. [0012]
  • Previously, conventional implicit writeback schemes were limited to retrieving cache data stored only in the M state. One reason for this is that a processor must receive the latest copy of the requested data. Thus, a primary consideration driving implicit writeback schemes was not improved performance, but rather maintaining data accuracy. Due to their focus on data accuracy, conventional implicit writeback systems failed to retrieve E or S state data since there was no issue of data accuracy associated with data stored in the E or S state. [0013]
  • Retrieving data stored in a cache in the E state or S state is substantially faster than retrieving the same data from memory. For example, measurements from real workloads such as those defined by the Transaction Processing Performance Council Benchmark C (TPC-C) show that in a four-processor system, the chances of a processor finding the data it needs in another processor's cache in the E state or S state are approximately 25-38%, depending on the size of the cache. Moreover, it can take approximately 30-40 bus clocks (assuming a 133 Megahertz bus frequency) for a conventional system to retrieve E state or S state data from memory. By way of contrast, this embodiment of the invention can retrieve E state or S state data from a cache in as little as 6 bus clocks. The actual latencies achievable depend in part on whether the bus protocol allows out-of-order data transfers. [0014]
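  • As a rough check of the figures above (an illustrative calculation, not part of the patent): at a 133 MHz bus frequency one bus clock is about 7.5 ns, so a 30-40 clock memory read costs roughly 226-301 ns, versus about 45 ns for a 6-clock cache-to-cache transfer:

    #include <stdio.h>

    int main(void)
    {
        const double ns_per_clock = 1e9 / 133e6;    /* ~7.52 ns at 133 MHz */
        printf("memory read:    %.0f-%.0f ns (30-40 clocks)\n",
               30 * ns_per_clock, 40 * ns_per_clock);
        printf("cache-to-cache: %.0f ns (6 clocks)\n", 6 * ns_per_clock);
        return 0;
    }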
  • The latency decrease derived from this improvement can be further estimated using a simulator. By simulating a TPC-C workload using an in-order bus protocol (such as the Intel® Pentium® Pro bus protocol), it has been determined that this embodiment of the invention will reduce the memory latency for all processor reads anywhere from 16-25%, depending on the cache size. The reduction is even greater for code fetches in particular, on the order of 27-38%. For a bus protocol that supports out-of-order data transfers, the reductions are in the range of 40-42% and 63-67%, respectively. [0015]
  • In addition to the latency decrease, each time a processor provides data from its cache, the associated chipset does not need to provide that data. This reduces the burden on the chipset and allows it to do more useful work. The diversion of work from the chipset to the processors takes advantage of the high bandwidth and short latencies associated with current L2 caches, while reducing the demand on system memory bandwidth and system memory latencies, which are relatively scarce resources. [0016]
  • The capability of retrieving data stored in a cache in the E state or S state is also advantageous from a bus utilization viewpoint. Since more data reads come from the processors in a multiprocessor system, each processor can provide data very efficiently on the connecting bus. [0017]
  • It is worthy to note that any reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. [0018]
  • Referring now in detail to the drawings wherein like parts are designated by like reference numerals throughout, there is illustrated in FIG. 1 a multiprocessor system suitable for practicing one embodiment of the invention. As shown in FIG. 1, a multiprocessor system 100 comprises processors 102, 104, 106 and 107. Each processor includes a cache system 112. System 100 also includes a memory 108, which is connected to processors 102, 104, 106 and 107 via a bus 110. [0019]
  • Processors 102, 104, 106 and 107 are also connected to each other via lines 114, 116, 118 and 120. Each line is present to carry a signal defined as HIT0#, HIT1#, HIT2# or HIT3#. HIT0# is an output signal from a processor indicating that the processor has the requested data in its internal cache in one of an E or S state and its willingness to provide data for the data request. HIT[1-3]# are inputs to a processor which come from the other processors' HIT0# outputs, indicating that other processors have the requested data in their internal caches in one of an E or S state and their willingness to provide data for the data request. The use of these signals will be discussed with reference to FIG. 3. [0020]
  • It can be appreciated that the particular configuration shown herein is chosen as an example only and is not limitive of the type of computer system on which the present invention can work. The number of configurations that computer systems can take are virtually limitless and techniques for setting up these configurations are well known to those skilled in the art. The present invention can operate on any of these possible configurations. [0021]
  • FIG. 2 is a block diagram of a cache system in accordance with one embodiment of the invention. FIG. 2 shows cache system 112 comprising a snoop controller 206 connected to bus 110 via a bus controller 208. Bus controller 208 includes a request queue 210. Snoop controller 206 is also connected to an L1 cache 202, an L2 cache 204, and buffers 212, 214 and 216. [0022]
  • Cache system 112 performs implicit writebacks in accordance with a bus protocol. In this embodiment of the invention, cache system 112 performs implicit writebacks in accordance with the Intel® Pentium® Pro bus protocol. [0023]
  • Snoop controller 206 controls the overall operation of the implicit write back scheme for a particular processor. In particular, snoop controller 206 implements in circuit form an implicit write back algorithm, which is modified to implement the functionality discussed in detail with reference to FIG. 3. The operation of cache system 112 in general, and snoop controller 206 in particular, will also be discussed with reference to FIG. 3. [0024]
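  • For orientation, the FIG. 2 components might be captured in a structure like the following. This is purely illustrative; the types and field names are invented for the sketch:

    struct cache_system {                     /* cache system 112           */
        struct snoop_controller *snoop;       /* snoop controller 206       */
        struct bus_controller {
            struct request_queue *reqs;       /* request queue 210          */
        } bus_ctl;                            /* bus controller 208         */
        struct cache *l1, *l2;                /* L1 cache 202, L2 cache 204 */
        struct buffer *buf[3];                /* buffers 212, 214 and 216   */
    };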
  • FIG. 3 is a block flow diagram of the operation of a snoop controller in accordance with one embodiment of the invention. As shown in FIG. 3, a data request is received at a first processor from a bus at step 300. A search is made as to whether the requested data is present in one of the processor's internal caches at step 302. [0025]
  • A determination is made as to the state of the requested data at step 304. As previously explained, cache data exists in one of many states, depending on such factors as which processor “owns” the data and whether it has been modified. In this embodiment of the invention, each possible state for the data is defined by the MESI cache protocol. If the requested data is in an M or E state at step 304, then this data is sent to the bus at step 310. If the requested data is in the I state at step 304, then the process ends. [0026]
  • If the requested data is in the S state at step 304, then the cache system for the first processor sends a confirmation signal to let the other processors know that it has the requested data in the S state at step 305. The cache system then determines which of the other processors also has the requested data in one of its internal caches at step 306. In this embodiment of the invention, steps 304 and 306 are accomplished using the signals defined as HIT0#, HIT1#, HIT2# or HIT3#, which will be described in more detail later. Once one or more of the above signals are received at the first processor from the bus indicating that a second processor (and/or third processor and/or fourth processor) also has the requested data, a determination is made as to whether the first processor or the second processor (or another processor having the requested data) should send the requested data, using a priority scheme, at step 308. The requested data is then sent to the bus in accordance with the priority scheme at step 310, at which time this process completes. [0027]
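  • The flow of steps 300 through 310 can be summarized in code, as sketched below. The sketch is illustrative only: it assumes the MESI state enum shown earlier, and the helpers assert_hit0() and should_drive_bus() are hypothetical names (the latter corresponds to the priority algorithm listed later in this description):

    #include <stdbool.h>

    enum snoop_action { PROVIDE_DATA, STAY_SILENT };

    /* hypothetical helpers: drive this CPU's HIT0# output line, and run the
       AgentID priority check (a C rendering appears after the pseudocode below) */
    void assert_hit0(void);
    bool should_drive_bus(int agent_id, const int pins[4]);

    enum snoop_action snoop_phase(enum mesi_state line_state,
                                  int agent_id, const int pins[4])
    {
        switch (line_state) {
        case MESI_MODIFIED:
        case MESI_EXCLUSIVE:
            return PROVIDE_DATA;                   /* step 304 -> step 310       */
        case MESI_SHARED:
            assert_hit0();                         /* step 305: confirm to peers */
            if (should_drive_bus(agent_id, pins))  /* steps 306-308              */
                return PROVIDE_DATA;               /* step 310                   */
            return STAY_SILENT;
        case MESI_INVALID:
        default:
            return STAY_SILENT;                    /* process ends               */
        }
    }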
  • In this embodiment of the invention, steps 304 and 306 are implemented as follows. Each processor is connected via lines 114, 116, 118 and 120. If a processor has the requested data in the S state, it drives an output HIT0# signal which is detected by the other processors. For example, if processor 102 has the requested data, it will send an output HIT0# signal via line 114. Processors 104, 106 and 107 will detect the HIT0# signal sent via line 114 as input signals HIT3#, HIT2# and HIT1# for processors 104, 106 and 107, respectively. Similarly, if processor 104 has the requested data, it will send an output HIT0# signal via line 116. Processors 102, 106 and 107 will detect the signal sent via line 116 as input signals HIT1#, HIT3# and HIT2# for processors 102, 106 and 107, respectively. This continues for processors 106 and 107. It can be appreciated that as the number of processors increases or decreases within a given system, the above-described configuration could be modified accordingly and still fall within the scope of the invention. [0028]
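  • The two wirings spelled out above follow a rotational pattern: the processor with index p samples the line driven by the processor with index d on its HIT[(d - p) mod 4]# pin. This closed form is an inference from the examples given (it also matches the HIT3# observation in the example accompanying FIG. 3 below), not a formula stated in the patent:

    /* Which HIT pin does `observer` sample for the line driven by `driver`?
       Indices 0-3 correspond to processors 102, 104, 106 and 107; a result
       of 0 is the observer's own HIT0# output (observer == driver). */
    static int hit_pin(int observer, int driver)
    {
        return ((driver - observer) % 4 + 4) % 4;
    }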
  • In view of the above, if the first processor has the requested data in the S state at step 304, it will send a confirmation signal (e.g., HIT0#) indicating such to the other processors. The first processor will then determine which of the other processors also has the requested data in the S state by checking the HIT[1-3]# input signals. This information will be used, in conjunction with a priority scheme, to determine which processor should actually drive the requested data on the data bus. [0029]
  • The embodiments of the invention help relieve processing burdens from the system chipset. The system chipset, when detecting any of the HIT[0-3]# signals as active, will know that it does not need to provide the data for this particular data request, and it will let one of the processors provide the requested data on the bus. [0030]
  • In this embodiment of the invention, a mechanism is helpful in resolving the situation where more than one processor has the requested data (i.e., in the S state). This is a fairly common scenario in the MESI cache protocol. In such a case, one processor should be picked as the provider of the data, and every other processor in the system should be informed as to which processor is picked, so that only one processor will actually drive the data bus during the data phase. [0031]
  • In this embodiment of the invention, a priority scheme is implemented for handling this situation. The priority scheme utilizes an agent identifier (AgentID) which is a unique identifier for each processor. The AgentID can be assigned during the system initialization process. For example, in this embodiment of the invention, processors 102, 104, 106 and 107 are assigned AgentID numbers 0, 1, 2 and 3, respectively. Thus, at step 308, each processor will execute an algorithm as follows: [0032]
    BEGIN 'Determine if I need to drive the data bus on a HIT'
    {
        I_should_drive_the_bus = FALSE;
        Let each CPU sample its HIT0#, HIT1#, HIT2# and HIT3# pins during the snoop
        phase. Let the sampled binary values be w, x, y and z respectively. Let a
        value of '1' indicate 'active' and '0' indicate 'inactive'.
        if (w == FALSE) {     # If you are not a candidate, quit now!
            Exit(0);
        }
        if (ID == 0) {        # If I am AgentID 0, I am picked.
            I_should_drive_the_bus = TRUE;
            Exit(0);
        }
        if (ID == 1) {        # If I am AgentID 1 and AgentID 0 wasn't picked, I am picked.
            if (z == 0) {
                I_should_drive_the_bus = TRUE;
            }
            Exit(0);
        }
        if (ID == 2) {        # If I am AgentID 2 and neither ID 0 nor 1 was picked, I am picked.
            if (z == 0 && y == 0) {
                I_should_drive_the_bus = TRUE;
            }
            Exit(0);
        }
        if (ID == 3) {        # If I am AgentID 3 and none of IDs 0, 1 or 2 was picked, I am picked.
            if (z == 0 && y == 0 && x == 0) {
                I_should_drive_the_bus = TRUE;
            }
            Exit(0);
        }
    }
    END 'Determine if I need to drive the data bus on a HIT'
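  • For concreteness, the listing above can also be rendered as a compilable C function. The following is a sketch that mirrors the pseudocode exactly; the function name, the pins[] calling convention, and the use of C itself are illustrative choices, not part of the patent:

    #include <stdbool.h>

    /* pins[i] holds the value this CPU sampled on its HIT<i># pin during the
       snoop phase (1 = active), so w, x, y and z below match the pseudocode. */
    bool should_drive_bus(int agent_id, const int pins[4])
    {
        int w = pins[0], x = pins[1], y = pins[2], z = pins[3];

        if (!w)                   /* not a candidate: quit now */
            return false;

        switch (agent_id) {
        case 0:  return true;                        /* AgentID 0 is always picked    */
        case 1:  return z == 0;                      /* picked if AgentID 0 is silent */
        case 2:  return z == 0 && y == 0;            /* picked if AgentIDs 0-1 silent */
        case 3:  return z == 0 && y == 0 && x == 0;  /* picked if AgentIDs 0-2 silent */
        default: return false;
        }
    }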
  • Thus, using the above priority scheme, the cache system for the first processor would use the information gathered at step 306 to determine whether it should drive the requested data on the data bus at step 310. If another processor is selected by the priority scheme at step 308, the cache system for the first processor would simply end the process. If the first processor has priority according to the priority scheme at step 308, then the first processor sends the requested data to the bus at step 310 and terminates the process. [0033]
  • The implicit write back scheme of FIG. 3 may be better understood using an example. For clarity, system 100 and cache system 112 will be used in the example. It can be appreciated, however, that the principles set forth herein could be used with any multiprocessor system or cache system and still fall within the scope of the invention. [0034]
  • In this example, processor 102 receives a data request from processor 104 over bus 110. The cache systems 112 of processors 102, 106 and 107 receive the data request and begin the snoop phase at step 300. Each cache system 112 determines whether the requested data is present in one of its internal caches L1 202 or L2 204, and the current state of the requested data, at step 302. If cache L1 202 or L2 204 has the requested data in an I state at step 304, then the process is completed. If either L1 202 or L2 204 has the requested data in an M state or an E state at step 304, then snoop controller 206 sends the data to the bus at step 310 and the process terminates. [0035]
  • If the requested data is in the S state at step 304, then snoop controller 206 for processor 102 sends a confirmation signal to let processors 104, 106 and 107 know that processor 102 has the requested data in the S state. Snoop controller 206 then determines which of the other processors also has the requested data in one of its internal caches at step 306. In this example, snoop controller 206 performs this function by checking the pins for input signals HIT[1-3]# during the snoop phase to determine which are active. Assume for purposes of this example that input signal HIT3# is active for processor 102. Snoop controller 206 is now aware that processor 107 also contains the requested data in the S state. Snoop controllers 206 for both processors 102 and 107 check the priority scheme to determine whether processor 102 or processor 107 should send the requested data at step 308. Assume for this example that processor 102 is AgentID 0 and processor 107 is AgentID 3, and further assume that the sampled binary values for input signals HIT0#, HIT1#, HIT2# and HIT3# are w, x, y and z, respectively. Snoop controller 206 for processor 102 determines that since w is active (e.g., has a value of “1”), and since AgentID 0 is first in the priority scheme, processor 102 is selected to drive the requested data. Similarly, snoop controller 206 for processor 107 determines that since w and x are active (e.g., each has a value of “1”), and since AgentID 0 comes before AgentID 3 in the priority scheme, processor 102 has been selected and processor 107 is not to drive the requested data on the data bus. Snoop controller 206 for processor 102 then sends the requested data to the bus at step 310, accordingly, and snoop controller 206 terminates the process. [0036]
  • Although various embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention. For example, although only four processors (processors 102, 104, 106 and 107) were used in system 100, it can be appreciated that any number of processors could be used and still fall within the scope of the invention. Further, although the MESI cache protocol was used in the embodiments of the invention, any cache protocol having data in a shared state or exclusive state could be used in accordance with the principles described herein and still fall within the scope of the invention. [0037]

Claims (15)

What is claimed is:
1. A method to retrieve data for a multiprocessor system, comprising:
receiving a request for data at a first processor from a bus;
searching a cache for said data, said data having a state;
determining whether said state is one of exclusive or shared;
sending said data to said bus in accordance with said determination.
2. The method of claim 1, wherein said state is shared, further comprising determining whether to send said data to said bus prior to sending said data to said bus.
3. The method of claim 2, wherein said determining whether to send said data comprises:
receiving a first signal at said first processor from said bus indicating a second processor has said data;
determining whether said first processor or said second processor should send said data using a priority scheme; and
sending said data to said bus in accordance with said priority scheme.
4. The method of claim 3, wherein said determining whether said first processor or second processor should send said data using said priority scheme comprises:
determining whether said first processor is of a higher priority than said second processor; and
sending said data to said bus if said first processor is of a higher priority than said second processor.
5. The method of claim 4, further comprising sending a second signal from said first processor to said second processor indicating said first processor has said data.
6. The method of claim 1, further comprising:
sending a third signal from said first processor to said bus indicating said first processor has said data, said third signal being sent using the same timing as the second signal.
7. An apparatus comprising:
a cache memory; and
control logic to receive a data request from a bus and to responsively provide data to said bus if said data request is for data in an exclusive or shared state in said cache memory.
8. The apparatus of claim 7, wherein said control logic comprises a snoop controller and a bus controller.
9. The apparatus of claim 7, wherein said control logic contains logic to determine whether data in said shared state should be sent to the data bus in response to said data request.
10. The apparatus of claim 9, wherein said control logic determines whether to send data in said shared state to said bus as a function of cache control signals received from another cache.
11. A system, comprising:
a first processor having a first cache system, said first cache system capable of sending a data request;
a second processor having a second cache system;
a third processor having a third cache system;
a bus to communicate signals between said first, second and third cache systems; and
wherein said second and third cache systems each include a cache to store data identified by said data request in an exclusive or shared state; and
wherein said second and third cache systems are capable of receiving said data request, and determining whether to send said stored data to said first processor in response to said data request using said bus.
12. The system of claim 11, wherein said stored data is in said shared state.
13. The system of claim 12, wherein said second and third cache systems are connected to each other by at least two signal lines to communicate signals indicating whether each cache for said cache systems contain stored data in said shared state.
14. The system of claim 13, wherein said second and third cache systems use a priority scheme and said signals to determine whether to send said stored data to said first processor in response to said data request using said bus.
15. The system of claim 14, wherein said priority scheme determines whether said second or third cache system is of a higher priority, and sends said stored data accordingly.
US09/334,061 1999-06-16 1999-06-16 Method and apparatus for improving system performance in multiprocessor systems Expired - Fee Related US6397304B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/334,061 US6397304B1 (en) 1999-06-16 1999-06-16 Method and apparatus for improving system performance in multiprocessor systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/334,061 US6397304B1 (en) 1999-06-16 1999-06-16 Method and apparatus for improving system performance in multiprocessor systems

Publications (2)

Publication Number Publication Date
US6397304B1 US6397304B1 (en) 2002-05-28
US20020078306A1 true US20020078306A1 (en) 2002-06-20

Family

ID=23305418

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/334,061 Expired - Fee Related US6397304B1 (en) 1999-06-16 1999-06-16 Method and apparatus for improving system performance in multiprocessor systems

Country Status (1)

Country Link
US (1) US6397304B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268031A1 (en) * 2004-05-28 2005-12-01 Yoshinori Matsui Method for controlling cache memory of storage device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6604162B1 (en) 2000-06-28 2003-08-05 Intel Corporation Snoop stall reduction on a microprocessor external bus
US6839808B2 (en) * 2001-07-06 2005-01-04 Juniper Networks, Inc. Processing cluster having multiple compute engines and shared tier one caches
EP2823389B1 (en) 2012-05-18 2017-10-04 Hitachi, Ltd. Computer system and control method thereof

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5297269A (en) * 1990-04-26 1994-03-22 Digital Equipment Company Cache coherency protocol for multi processor computer system
US5802574A (en) * 1993-12-28 1998-09-01 Intel Corporation Method and apparatus for quickly modifying cache state
JP3660679B2 (en) * 1994-03-01 2005-06-15 インテル・コーポレーション Advanced pipeline bus architecture
US6073211A (en) * 1994-12-13 2000-06-06 International Business Machines Corporation Method and system for memory updates within a multiprocessor data processing system
US5765208A (en) * 1995-09-29 1998-06-09 Motorola, Inc. Method of speculatively executing store instructions prior to performing snoop operations
US5778438A (en) * 1995-12-06 1998-07-07 Intel Corporation Method and apparatus for maintaining cache coherency in a computer system with a highly pipelined bus and multiple conflicting snoop requests
US6115795A (en) * 1997-08-06 2000-09-05 International Business Machines Corporation Method and apparatus for configurable multiple level cache with coherency in a multiprocessor system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268031A1 (en) * 2004-05-28 2005-12-01 Yoshinori Matsui Method for controlling cache memory of storage device
US7293144B2 (en) * 2004-05-28 2007-11-06 Hitachi, Ltd Cache management controller and method based on a minimum number of cache slots and priority

Also Published As

Publication number Publication date
US6397304B1 (en) 2002-05-28

Similar Documents

Publication Publication Date Title
US6353877B1 (en) Performance optimization and system bus duty cycle reduction by I/O bridge partial cache line write
US5774700A (en) Method and apparatus for determining the timing of snoop windows in a pipelined bus
US5659710A (en) Cache coherency method and system employing serially encoded snoop responses
US6012120A (en) Method and apparatus for providing DMA transfers between devices coupled to different host bus bridges
US6434639B1 (en) System for combining requests associated with one or more memory locations that are collectively associated with a single cache line to furnish a single memory operation
US6553442B1 (en) Bus master for SMP execution of global operations utilizing a single token with implied release
US20040139304A1 (en) High speed virtual instruction execution mechanism
US6502168B1 (en) Cache having virtual cache controller queues
US6321307B1 (en) Computer system and method employing speculative snooping for optimizing performance
WO2006012047A1 (en) Direct processor cache access within a system having a coherent multi-processor protocol
US6754775B2 (en) Method and apparatus for facilitating flow control during accesses to cache memory
US6507880B1 (en) Bus protocol, bus master and bus snooper for execution of global operations utilizing multiple tokens
US6480915B1 (en) Bus protocol and token manager for SMP execution of global operations utilizing a single token with implied release
US6976128B1 (en) Cache flush system and method
US6460101B1 (en) Token manager for execution of global operations utilizing multiple tokens
US6442629B1 (en) Bus protocol and token manager for execution of global operations utilizing a single token with multiple operations with explicit release
US6484230B1 (en) Method and system for speculatively processing a load instruction before completion of a preceding synchronization instruction
US6397304B1 (en) Method and apparatus for improving system performance in multiprocessor systems
US7203811B2 (en) Non-fenced list DMA command mechanism
US5802562A (en) Information processing system and including a supplemental memory and method of operation
KR0153487B1 (en) System and method for communicating between devices
US6604162B1 (en) Snoop stall reduction on a microprocessor external bus
US7159077B2 (en) Direct processor cache access within a system having a coherent multi-processor protocol
US6745298B2 (en) Internal processor buffering for implicit writebacks
US20020169930A1 (en) Memory access control system, method thereof and host bridge

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, OREGON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GEORGE, VARGHESE;REEL/FRAME:010044/0921

Effective date: 19990511

CC Certificate of correction
FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20140528