EP0507063A1 - Method and apparatus for a cross-invalidate directory - Google Patents

Method and apparatus for a cross-invalidate directory

Info

Publication number
EP0507063A1
Authority
EP
European Patent Office
Prior art keywords
cpu
data unit
directory
ownership
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP92102390A
Other languages
English (en)
French (fr)
Inventor
Patrick Melvin Gannon
Michael Ignatowski
Matthew Anthony Krygowski
Lishing Liu
Donald Walter Price
William King Rodiger
Gregory Salyer
Yee-Ming Ting
Michael Paul Witt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of EP0507063A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/082 Associative directories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies

Definitions

  • the invention relates to a cross-invalidate (XI) directory for private processor caches in a multiple processor system to control data coherence in a system.
  • Prior multiple-processor systems have used processor-private store-in L1 caches; and they have maintained the coherence of data in the system by using a set of copy directories, which are copies of all L1 cache directories. Each processor's fetch request is cross-interrogated in the copy directories of all other processors to find if any other processor has a copy of a requested data unit. This process assures that only one processor at a time can have exclusive (EX) ownership for writing in a data unit in the system. Only the one processor that has exclusive ownership of a data unit is allowed to write into the data unit.
  • a data unit can also have public ownership (previously called readonly (RO) authority) which allows all processors to read (fetch) the data unit, but prohibits all processors from writing into the data unit.
  • the data coherence problem is simpler with a store-through type of cache, which requires that all stores made in the L1 cache also be made concurrently in a backing memory.
  • the memory backing the L1 private processor caches may be an L2 shared memory, or it may be the L3 main memory.
  • the shared L2 cache may be store-in or store-through, but preferably is store-in to reduce the store bus traffic to main memory.
  • the store-in type of cache has been used in computer systems because it requires less bandwidth for its memory bus (between the memory and the cache) than is required by a store-through type of cache for the same frequency of processor accesses.
  • Each cache location may be assigned to a processor request and receive a copy of a data unit fetched from system main memory or from another cache in the system.
  • In a store-in cache, a processor stores into a data unit in a cache location without storing into the correspondingly addressed data unit in main memory, which causes the cache location to become the only location in the system containing the latest changed version of the data unit.
  • the processor may make as many stores (changes) in the data unit as its executing program requires.
  • the integrity of data in the system requires that the latest version of any data unit be used for any subsequent processing of the data unit.
  • a store-through type of cache is used only for fetching, and maintains the latest version of its accessed data units by having all store accesses change both the processor's store-through cache and the same data unit in a memory (another cache or main storage) at the next level in the system storage hierarchy.
  • the store-through characteristic of such caches does not solve the coherence problem in the system, since another processor's store-through cache could contain an older version of the same data unit. Therefore, cross-interrogation of the contents of private processor caches in multiple processor systems is needed, whether they are store-in or store-through, when a new request is being fetched into a processor cache.
  • Exclusive ownership (authority to change a cache data unit) is assigned to any processor before it is allowed to perform its first store operation in a data unit.
  • the assignment of processor ownership has been conventionally done by setting an exclusive (EX) flag bit in a cache directory (sometimes called a tag directory) associated with the respective data unit in the cache.
  • the EX flag bit's ON state typically indicates exclusive ownership and the off state of the EX flag bit indicates public ownership (called "read-only authority").
  • Exclusive ownership by a processor allows only that processor to store into the data unit; public (read-only) ownership of a data unit does not allow any processor to store into that data unit, but allows up to all processors in the system to read it (which can result in multiple copies of the non-changeable data unit in different processor caches in the system).
  • a cache fetches data units from its storage hierarchy on a demand basis, and a processor cache miss generates a fetch request which is sent to the next level in the storage hierarchy for fetching the data unit.
  • a store-in cache transmits its changed data units to main memory under control of cache replacement controls, sometimes called the LRU controls.
  • Replacement of the data unit may occur when it has not been recently accessed in the cache, and no other cache entry is available for the new request. This replacement process is sometimes called “aging out” when a least recently used (LRU) entry is chosen to be replaced with a new request.
  • the replacement controls cause the data unit (whether changed or not) in the selected entry to be replaced by another data unit (fetched as a result of a cache miss).
  • If the data unit to be replaced in the cache has been changed, it must be cast out of the cache and written into another place such as main memory before it is lost by being overwritten by the newly requested data unit being fetched from main memory.
  • a processor may request a data unit not currently in the cache, which must be fetched from main memory (or from another cache) using the requested address and stored in the newly assigned LRU cache location.
  • the cache assigns the new data unit to a cache location not in current use if one can be found. If all of the useable cache locations are currently occupied with changed data units, then one of them must be reassigned for the new request; but before the new data unit can be written into that cache location, the updated data unit currently in that location must be cast out to main memory.
  • the castout data unit has its ownership changed from an exclusive processor ownership to a main memory ownership.
  • This XI process assured exclusivity of a data unit to only one processor at a time by invalidating any copy of the data unit found in any other processor's private cache.
  • Only one of the plural processors in a multiprocessing (MP) system can have exclusive ownership (write authority) at any one time over any data unit.
  • the exclusive ownership over any data unit may be changed from one processor to another when a different processor requests exclusive ownership.
  • the prior mechanism for indicating exclusive ownership for a processor was to provide an exclusive (EX) flag bit in each L1 directory entry in a processor's private L1 cache; and the EX bit was set on to indicate which of the associated data units were "owned” by that processor.
  • the reset state of the EX flag bit indicated public ownership, which was called "readonly authority" for the associated data unit that made it simultaneously available to all processors in the system.
  • each valid data unit in any processor's private L1 cache had either exclusive ownership or public ownership.
  • USA patent application serial no. 680,176 filed on 3 April 1991 and assigned to the same assignee describes and claims an ownership interlock control for cache data units. It interlocks a change of ownership for an exclusively-owned data unit in a store-in cache with the completion of all stores to the data unit issued by its processor up to the time it responds to a received cross-invalidate (XI) signal caused by another processor requesting the data unit either exclusively or with public ownership.
  • the invention provides a single cross-invalidate (XI) directory shared by processors in a multiple processor system (MP) for controlling the coherence of data in a shared memory.
  • Each processor has its own private (L1) cache for containing copies of data units fetched from the shared memory.
  • the shared memory may be the system main memory, or may be a shared cache (L2) in the memory hierarchy.
  • the XI directory is accessed by requests to the shared memory (to the L2 cache, or to main memory if there is no L2 cache).
  • the processor private L1 caches may be store-in or store-through types of caches.
  • the XI directory may be located in a system storage controller (SCE) which interfaces the private L1 CPU caches to any shared L2 cache and to the system main memory (L3).
  • the shared directory is accessed by a CPU request for a data unit (sometimes called a "line" of data) when the CPU request misses in its private cache (i.e. does not find the requested data unit in its private cache).
  • the XI directory also may be the L2 directory for the L2 cache without having any L1 copy directories in the system.
  • If a CPU request misses in the XI directory it is an L2 cache miss, and system main storage is accessed for the CPU requested data unit, which is fetched both into the L2 cache and into the requesting processor's L1 cache.
  • When the XI directory is used without an L2 cache, its content controls accesses to requested data units in system main memory as well as data coherence among the private processor caches. It is desirable that the XI directory has a valid entry for each data unit in all private CPU caches.
  • Each entry in the XI directory of this invention uses a processor identifier (CPID) field to identify which, if any, processor(s) own the associated data unit, and indicates whether the ownership is exclusive (EX) or public (RO).
  • Exclusive ownership means that only one of the processors is the owner of the data unit.
  • Public ownership means that all processors in the system have a common ownership of the data unit.
  • the use of a CPID field(s) in each entry eliminates the need for a plurality of copy directories (copies of the L1 private CPU cache directories previously used for providing CPU identification in response to a CPU fetch request for EX ownership of a requested data unit in main storage by any of the plural processors in a system).
  • the elimination of copy directories and their cross-interrogation speeds up the average processor access to memory data by eliminating several operations in the critical processing path of handling processor fetch requests requiring that a processor acquire exclusive ownership.
  • This invention uses the CPID field (in a directory entry accessed by a request) to generate an XI request to the current owner(s) of the data unit identified by the CPID field when a request requires a change in ownership of the data unit.
  • An XI request is only sent to a processor(s) other than the requesting processor.
  • the XI signal requests the other CPU's L1 cache to terminate its ownership by invalidation (or to change it to public ownership), which must be done before the data unit can be used by the requesting CPU.
  • the generated XI signal includes an identifier tag representing its requestor and an EX/RO indicator of whether exclusive or public ownership is being requested.
  • the XI signal is sent on XI busing to the CPID identified processor.
  • the XI busing may comprise a plurality of buses, one to each CPU, from the XI signal generating means, of which a particular XI transmission is provided only on the bus(es) selected by the CPID for the XI operation.
  • the XI busing may be a single bus serially connected to all processors, in which each processor can detect and select an XI signal containing the processor's address.
  • the receiving processor stores its received XI signals into an XI queue (sometimes called a BIAS, buffer invalidation address stack).
  • the SCE XI circuits can immediately transmit another request, because the requestor's ID tag is carried to the processor receiving any XI signal.
  • the receiving processor processes the received XI signal and then provides an XI response signal containing the related ID tag.
  • the SCE uses the ID tags in its received XI responses to correlate them with its transmitted XI requests, and this correlation is used by the SCE to consummate the ownership change required for each requested data unit.
  • the ID tags with the XI signals and the XI responses allow them to be processed in a pipelined manner, both in their generation and transmission in the SCE and in their processing in the receiving processors.
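To make the tag-based correlation concrete, the following sketch models an XI signal and its response as plain C structures; the field widths, type names and the outstanding-request matching loop are illustrative assumptions, not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operation requested of the receiving CPU by an XI signal. */
typedef enum { XI_INVALIDATE, XI_DEMOTE_TO_RO } xi_op_t;

/* An XI signal sent by the SCE to the CPU(s) identified by the CPID field. */
typedef struct {
    uint32_t addr;          /* main-storage address (or representation) affected  */
    uint8_t  requestor_tag; /* ID tag of the requestor that caused this XI signal */
    xi_op_t  op;            /* invalidate, or change to public ownership          */
} xi_signal_t;

/* The XI response returned by the receiving CPU carries the same tag. */
typedef struct {
    uint8_t requestor_tag;
    bool    exceptional;    /* CPU could not give up ownership as requested       */
} xi_response_t;

/* The SCE keeps several XI signals outstanding at once; the tag in a received
 * response identifies which pending request the response consummates. */
static int match_pending(const xi_signal_t *pending, int n, xi_response_t rsp)
{
    for (int i = 0; i < n; i++)
        if (pending[i].requestor_tag == rsp.requestor_tag)
            return i;
    return -1;  /* no matching outstanding XI signal */
}
```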
  • This invention provides for different types of CPIDs, although only one CPID type may be used in each entry of the XI directory in a particular system.
  • one CPID type provides a combination of bits in each CPID field to represent a CPID value. Any processor having exclusive ownership of the associated data unit is identified by a unique CPID field value, and another unique CPID field value indicates when all processors may have public ownership of the associated data unit.
  • This type of CPID value field does not require any EX/RO field in the XI directory entry, because different CPID values respectively identify each CPU having exclusive ownership, and the public ownership CPID value represents all CPUs as having the capability of public ownership without identifying any particular CPU.
  • another CPID type provides a set of bits in which each bit, when set on, identifies a respective processor as an owner of the associated data unit.
  • This type of CPID field is used in combination with an EX/RO field in the same XI directory entry to indicate whether exclusive or public ownership of the associated data unit is obtained by the identified processor(s).
  • the CPID field is allowed to identify only one processor in the set when the EX/RO field indicates EX ownership, but the CPID field can identify from zero to all processors in the set when the EX/RO field indicates public ownership.
  • the combination of the CPID and EX/RO fields in an XI directory entry is able to specifically identify one processor in a set of processors as having exclusive ownership, or to specifically identify zero or more processors (up to all processors in the set) as having public ownership of the associated data unit.
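As an illustration of the two CPID variants just described, the sketch below models both an encoded-value CPID (no EX/RO field needed) and a per-CPU bit mask paired with an EX/RO field; the bit widths assume a system of up to 7 or 8 CPUs and are not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variant 1: a single encoded CPID value and no separate EX/RO field.
 * Value 0 means public (RO) ownership available to all CPUs;
 * values 1..7 identify the one CPU holding exclusive ownership. */
typedef struct {
    unsigned cpid_value : 3;
    unsigned valid      : 1;
    unsigned changed    : 1;   /* data unit changed relative to main memory */
} xi_entry_value_t;

static bool value_is_public(xi_entry_value_t e)          { return e.cpid_value == 0; }
static bool value_owned_ex (xi_entry_value_t e, int cpu) { return e.cpid_value == (unsigned)cpu; }

/* Variant 2: one bit per CPU plus an explicit EX/RO field.  With EX set,
 * at most one bit may be on; with RO, zero to all bits may be on, so the
 * public owners are identified specifically. */
typedef struct {
    uint8_t  cpid_mask;        /* bit i set => CPU i owns/holds the data unit */
    unsigned ex      : 1;      /* 1 = exclusive, 0 = public (read-only)       */
    unsigned valid   : 1;
    unsigned changed : 1;
} xi_entry_mask_t;

static bool mask_holds_copy(xi_entry_mask_t e, int cpu) { return (e.cpid_mask >> cpu) & 1; }
```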
  • each L1 directory entry has an EX/RO field, but no CPID field, for each data unit in its private-processor cache.
  • the L1 EX/RO field operates in combination with a correspondingly addressed XI entry in the XI directory having either the CPID field alone (without an EX/RO field), or having the CPID field combined with an EX/RO field, depending on whether or not specific identification is required for processors having public ownership, in addition to both cases providing specific identification of a processor having exclusive ownership.
  • When the EX field is set on in an L1 entry, it indicates its processor has the authority to change (write) the associated L1 cache data unit owned exclusively, and no coherence signalling with the XI directory is required for L1 hits on fetch or store accesses.
  • the elimination of coherence signalling for fetch or store hits to valid data units in the L1 cache increases the efficiency of CPU operations, whether the L1 caches are of the store-in or store-thru type.
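A minimal sketch of the L1 hit test implied above, assuming a simple L1 directory entry with valid and EX bits (the field and function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;    /* address tag of the cached L1 data unit                 */
    bool     valid;
    bool     ex;     /* set on = exclusive ownership, off = public (read-only) */
} l1_entry_t;

/* Returns true when the access completes in L1 with no coherence signalling
 * to the XI directory: any valid hit satisfies a fetch, and a valid hit with
 * the EX bit on satisfies a store.  A miss, or a store hit on a read-only
 * entry, must instead be sent to the XI directory / L2 level. */
static bool l1_access_is_local(const l1_entry_t *e, uint32_t tag, bool is_store)
{
    if (!e->valid || e->tag != tag)
        return false;
    return is_store ? e->ex : true;
}
```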
  • a cross-invalidation (XI) signal always requests its receiving processor to terminate the exclusive ownership indicated in both its L1 cache directory and in the XI directory. But a cross-invalidation (XI) signal need not request its receiving processor to invalidate all ownership of the associated data unit; it may instead request the processor only to change to public ownership. Invalidation of the data unit in the receiving processor's L1 cache directory is the way a processor gives up all ownership.
  • a specific XI signal always requests the CPU identified in the accessed CPID field to give up its exclusive ownership, but the XI signal does not always signal a termination of ownership.
  • an XI signal can instead signal a change-to-public ownership which is a reduction in the CPU's ownership.
  • XI stands for cross-invalidation, but XI signals are used for more than invalidation since they may instead signal for a reduction from exclusive to public ownership by requesting the identified CPU to change to public ownership by a "demote-to-RO" signal.
  • an XI signal has an operation field that tells its receiving CPU what it is to do with the XI signal: invalidation or demote-to-RO.
  • This invention does not physically perform any cross-interrogation operation (required with copy directories), but obtains the result of a cross-interrogation operation by merely detecting the CPID field alone, or by detecting the CPID and EX/RO fields together, depending on the type of entry being used in the XI directory.
  • the accessed XI entry may be an XI hit entry having an address matching the address of an L1 miss request, or may be an XI miss entry assigned by the XI directory replacement controls.
  • there is no cross-interrogation signalling required in this invention as was previously required among plural copy directories before an XI signal could be issued to any processor L1 cache, such as in prior USA patent 4,394,791 or 4,675,811.
  • the invention also includes several other features which decrease the signalling required in a system to increase system efficiency in handling XI requests.
  • One feature detects whether a processor is in wait state when it receives an XI request from a requestor (e.g. another processor) wanting a data unit exclusively.
  • a set of latches (located with the XI directory in the SCE) respectively represent the wait states of all processors in the system.
  • a processor is indicated in wait state when its latch is set on and not in wait state when its latch is set off, and each processor controls the state of its wait state latch. If any processor (not in wait state) exclusively requests a data unit which gets an XI hit in an entry indicating EX or RO ownership by another processor (the CPID processor), the wait state latch of the CPID processor is checked to determine if it is in wait state or not.
  • If the CPID processor is in wait state, the requestor can immediately acquire ownership of the requested data unit (e.g. by the CPID field in the entry being changed from the XIed processor to the requestor) without waiting for an XI response from the XIed processor, as is normally required.
  • the wait state controls speed up the accessing of data units owned by processors in wait state.
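A rough sketch of the wait-state shortcut, assuming the 8-bit wait state vector (WSV) described later for Fig. 9 and the encoded-value CPID entry (0 = public, 1..7 = exclusive owner); the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t wsv;  /* wait state vector: bit i set => CPU i is in wait state */

static bool cpu_in_wait_state(int cpu) { return (wsv >> cpu) & 1; }

typedef struct { uint8_t cpid_value; } xi_entry_t;  /* 0 = public, 1..7 = EX owner */

/* Exclusive request by req_cpu hitting an entry owned by another CPU.
 * Normally the SCE must wait for the owner's XI response before transferring
 * ownership; if the owner is in wait state it cannot be issuing stores, so
 * ownership can be transferred immediately (the XI signal still invalidates
 * the stale L1 copy, but the requestor does not wait for the response). */
static bool grant_without_waiting(xi_entry_t *e, int req_cpu)
{
    int owner = e->cpid_value;
    if (owner != 0 && owner != req_cpu && cpu_in_wait_state(owner)) {
        e->cpid_value = (uint8_t)req_cpu;
        return true;
    }
    return false;  /* fall back to the normal XI request/response handshake */
}
```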
  • Another feature of this invention is an exclusive-aged bit in each XI directory entry.
  • the EX-aged bit is provided as a flag bit in the XI directory entries.
  • the aged bit is set on in the XI directory entry when its corresponding exclusively-owned data unit is aged out of a processor's L1 cache (but not out of the XI directory) by the L1 directory replacement controls (often called L1 LRU controls) so that the L1 processor can use the entry for a different data unit.
  • the EX aged bit is set off in the XI directory entry when the data unit is next accessed by any processor for its L1 cache. While the aged bit is on, any CPU can exclusively fetch the XI data unit (and obtain exclusive ownership) without the overhead of any XI signalling. If any CPU does a conditional-exclusive fetch of the data unit while its aged bit is set on in the XI directory, only the aged-out CPU will get the data unit with exclusive ownership, while any other CPU will get the data unit in its L1 cache with public ownership whereby the data unit is demoted to RO in the XI directory.
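The aged-bit behaviour might be sketched as follows, again with the encoded-value CPID; the entry layout and function names are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t cpid_value;  /* 0 = public (RO), 1..7 = exclusive owner           */
    bool    ex_aged;     /* exclusively owned unit aged out of the owner's L1 */
} xi_entry_t;

/* Called by the L1 replacement (LRU) controls when an exclusively owned data
 * unit is aged out of CPU cpu's L1 cache but stays in the XI directory. */
static void l1_aged_out(xi_entry_t *e, int cpu)
{
    if (e->cpid_value == (uint8_t)cpu)
        e->ex_aged = true;
}

/* Exclusive fetch while the aged bit is on: any CPU may take exclusive
 * ownership without any XI signalling, since no L1 copy remains. */
static void ex_fetch_while_aged(xi_entry_t *e, int req_cpu)
{
    e->cpid_value = (uint8_t)req_cpu;
    e->ex_aged = false;   /* aged bit is set off on the next access */
}

/* Conditional-exclusive fetch while the aged bit is on: only the aged-out
 * owner gets the unit back exclusively; any other CPU gets it read-only and
 * the entry is demoted to RO in the XI directory.  Returns true for EX. */
static bool cond_ex_fetch_while_aged(xi_entry_t *e, int req_cpu)
{
    bool exclusive = (e->cpid_value == (uint8_t)req_cpu);
    if (!exclusive)
        e->cpid_value = 0;
    e->ex_aged = false;
    return exclusive;
}
```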
  • a further feature of this invention is that the size of each data unit in the processor-private L1 caches may be a submultiple of the size of each data unit represented by each entry in the XI directory (which represents data units in system main memory or in an L2 shared cache).
  • the data unit size represented in the XI directory is an integer multiple of the data unit size in the processor caches, for example twice the size.
  • the coherence controls reflect the data unit size difference.
  • the processor ownership represented by the CPID field in an XI directory entry for a larger data unit also represents the ownership of the smaller data unit in the L1 cache directory.
  • the CPID in a related entry in the XI directory for the related larger data unit also represents the ownership of the related smaller L1 data unit, which is a part (e.g. 1/2) of the larger data unit for XI coherence control reasons.
  • the remaining part of the larger unit cannot simultaneously be represented for exclusive ownership in another L1 cache, since only one exclusive processor CPID can be represented in an XI directory entry. But the multiple smaller data units can have public ownership in different processor caches.
  • It is possible to represent multiple exclusive-owning processors in each XI directory entry for the different sub-multiple parts (each part equal to the L1 data unit size). But this may entail significant overhead and may not obtain a sufficient increase in system performance to be worth the effort. If multiple-EX representation is desired, then multiple CPID fields, each having a respective EX/RO field, could be provided in each XI directory entry, in which each CPID & EX/RO pair of fields is associated with a respective sub-multiple part of the XI directory entry's data unit.
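To illustrate the size relationship, the following sketch assumes hypothetical sizes of 64-byte L1 data units and 128-byte XI/L2 data units; the patent only requires that the XI data unit size be an integer multiple (for example twice) of the L1 size.

```c
#include <stdint.h>

enum { L1_UNIT = 64, XI_UNIT = 128 };  /* assumed sizes; XI_UNIT is a multiple of L1_UNIT */

/* The XI directory is addressed by the larger data unit, so the two L1-sized
 * halves of one XI data unit share a single XI entry and therefore a single
 * CPID ownership indication. */
static uint32_t xi_unit_addr(uint32_t byte_addr)   { return byte_addr & ~(uint32_t)(XI_UNIT - 1); }
static uint32_t l1_unit_addr(uint32_t byte_addr)   { return byte_addr & ~(uint32_t)(L1_UNIT - 1); }
static unsigned sub_unit_index(uint32_t byte_addr) { return (byte_addr % XI_UNIT) / L1_UNIT; }
```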
  • Another feature of this invention is aggressive public (RO) fetching of "conditionally-public" fetch requests for a version of this invention using store-thru L1 caches and an XI directory with an L2 cache.
  • a "conditionally-public" type of fetch request is any request to fetch a data unit not previously committed to being stored into or being only fetched. Most requests are for fetches rather than stores.
  • "conditionally-exclusive" fetching was used, in which a bias was caused in returning a fetched data unit with exclusive ownership. This invention finds an advantage in aggressive public fetching for fetches which previously were handled as conditionally-exclusive.
  • a conditionally-public fetch returns data with public (RO) ownership, unless the request misses in the XI directory, or the XI directory already indicates exclusive (EX) ownership of the data unit by the requesting CPU.
  • Prior IBM systems returned a data unit with EX ownership for a "conditionally-exclusive" fetch request when the data unit was found for an XI request with either EX or RO ownership.
  • This feature is advantageous in a system in which the L1 cache is store-thru, since no castout can occur for a store-thru L1 cache entry owned EX if it is later changed to RO ownership (unlike an L1 store-in type of cache, in which an XI operation called "demote to RO" would cause a castout if a store had changed the data unit during its prior EX ownership).
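A sketch of the aggressive conditionally-public decision described above, using the encoded-value CPID (0 = public, 1..7 = exclusive owner); the names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool hit; uint8_t cpid_value; } xi_lookup_t;  /* result of the XI directory search */

typedef enum { GRANT_RO, GRANT_EX } grant_t;

/* Conditionally-public fetch: return the data read-only unless the request
 * misses in the XI directory, or the requesting CPU already owns the data
 * unit exclusively.  (The prior "conditionally-exclusive" handling instead
 * returned EX whenever the data unit was found held EX or RO by any CPU.) */
static grant_t conditionally_public(xi_lookup_t lk, int req_cpu)
{
    if (!lk.hit)
        return GRANT_EX;                    /* XI miss: fetch from L3 with EX ownership */
    if (lk.cpid_value == (uint8_t)req_cpu)
        return GRANT_EX;                    /* requestor is already the exclusive owner */
    return GRANT_RO;                        /* otherwise fetch with public ownership    */
}
```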
  • Figs. 1 and 9 each represent a multiprocessor system (MP) containing central processing units (CPUs) 1 - N in which each CPU contains at least one private cache and may have two private caches: an instruction cache and a data cache. Only the data cache can receive stores. The instruction cache is readonly and cannot be stored into by the processor.
  • the CPU accesses its instructions from its instruction cache and accesses its operand data from its data cache. Both the data cache and instruction cache are used for fetching a data unit requested by their CPU.
  • Each L1 data cache is a store-through type of cache, and hereafter it is referred to as each CPU's L1 cache for Fig. 1. If an instruction is to be stored into, it is done only in the instruction's data unit in the L2 cache, and then that data unit is fetched into the requesting instruction cache as a readonly data unit.
  • Although instruction caches are readonly and do not receive store requests, they receive XI requests (like any data cache).
  • An XI request to an instruction cache likewise controls the invalidation of a data unit in the instruction cache for which exclusive ownership is required by any data cache.
  • an XI request to a data cache may be caused by an instruction cache when it wants to fetch a data unit which is owned exclusively by any data cache, which causes a change-to-public-ownership in the data cache.
  • Fig. 1 has an L2 cache, while Fig. 9 does not have an L2 cache.
  • the next level in the storage hierarchy after the L1 cache level is the L2 cache in Fig. 1 and the L3 main memory in Fig. 9.
  • the XI directory is accessed by fetch requests that miss in any L1 cache.
  • the XI directory also serves as the L2 cache directory to receive fetch requests that miss at the L1 cache level, and to receive all store requests from the CPUs.
  • Store commands do not control ownership changes in the manner of fetch requests, because all stores to any data unit are preceded by a fetch request that obtains the data unit in the respective cache prior to the issuance of any store commands to the data unit.
  • In Fig. 1, if a CPU request is not found in the XI directory, the XI directory has an "XI miss", which is also an L2 cache miss, and the requested address is sent to system main storage (L3), from which the requested data unit is fetched and sent on the memory bus to the L2 cache and to the L1 cache of the CPU generating the request.
  • the data unit for the L1 cache need not be the same size as the data unit in the L2 cache which contains the L1 data unit.
  • each L1 data unit may be a sub-multiple of an L2 data unit, or they may be the same size.
  • the XI directory contains an input priority circuit that receives all requests to the L2 cache, i.e. all fetch requests and store commands from all CPUs and all I/O devices.
  • the priority circuit selects one request or command at a time for accessing in the L2 cache directory.
  • a high-order field in the selected request or command selects a row (congruence class) in the XI directory (not shown) and a comparison with an address portion finds any assigned cache directory entry and associated cache data unit location, as is conventionally done in set associative caches so these cache contained items are not shown herein.
  • Each L1 and L2 cache herein is presumed to be a 4-way set associative cache.
  • In Fig. 9, each request from L1 goes to the controller for the L3 main memory, and operates similarly to the way a request is made to the L2 cache.
  • the size of each data unit in L3 may be the same size as a data unit in the L2 cache of Fig. 1, which is a multiple of the size of the data unit in the L1 cache.
  • Each XI directory entry contains the fields shown in Fig. 2, and each L1 directory entry contains the fields shown in Fig. 3.
  • Each XI entry contains a CPU identifier (CPID) field (e.g. three bits) which is combinatorially set to a value (e.g. 1 to 6) that can identify the one CPU in the MP which is the current exclusive owner of the corresponding data unit in the L2 cache and in L3 main memory.
  • a zero value in the CPID field indicates a public ownership for the corresponding L2 and L3 data unit.
  • an XI directory replacement means assigns an entry (which is the current LRU entry) in the set-associative row being addressed by the current request.
  • the missed request is sent to L3 to fetch the requested data unit.
  • a conventional LRU replacement means allocates a replacement entry for each directory row (congruence class), in which the LRU one of the four set-associative entries in each row is a candidate for being the next assignable entry in the row for allocation to a requested data unit that must be fetched from L3 memory.
  • the candidate entry is a currently invalid entry, but if there are no invalid entries, the LRU one of the four entries in the row is selected.
  • any old data unit existing in that slot (represented by the current content of the L2 directory entry) must be checked in the directory entry to determine if it has changed data. This is done by checking the state of a change field (i.e. change bit) in content of the L2 entry before the entry is changed to represent the newly requested data unit. If the old data unit has been changed (as indicated by its CHG bit), it is the latest version of the old data unit which must be castout to the same address in main memory before the newly requested data unit can be stored in the associated location in the cache.
  • the L2 cache in Fig. 1 is internally interleaved using the four ITRLVs illustrated.
  • the four L2 interleaves each have a separate bus to the L3 main memory, which is also shown with four L3 interleave sections connected to the L3 storage controller.
  • the L3 interleave size need not be the same size as the L2 cache interleave size, even though the L2 and L3 data unit sizes are the same.
  • each single L3 data unit (comprising dozens of bytes) may be contained within any one of the four L3 interleaves.
  • each single L2 data unit (same size as an L3 data unit) may be contained in all four L2 interleaves, in which each L2 interleave has 1/4th of the L2 data unit.
  • Each L2 data unit may be sequentially accessed in the four L2 interleaves by starting with the interleave having the part of the L2 data unit containing the byte address of the L2 request being accessed.
  • any sub-multiple of the L2 data unit size may be used as the L2 interleave size.
  • Fig. 11 also provides a system having multiple CPUs, which essentially operates the same as the system in Fig. 1 or 9. But Fig. 11 permits plural L2 data units to be processed simultaneously in the XI directory; while Fig. 1 processes one L2 data unit at a time, even though it is four way interleaved in the L2 cache.
  • the XI directory is divided by address into different hardware sections to obtain parallel accessing of plural requests among the different sections of the XI directory.
  • the XI directory sections are section 0 to section R, and the number of sections is preferably a power of two.
  • Priority section 0 to priority section R are the respective priority means.
  • Each section can be accessing a request in its subset of addresses simultaneously with accessing other requests in the other sections in their respective subsets of addresses.
  • Each XI directory section has its respective priority means, which receives all request addresses being provided by all CPUs, I/O and internally in the memory system. The priority means filters the received request addresses and passes one address at a time to its respective XI directory section. Filtering is done by selecting and passing an address within its respectively assigned subset of request addresses under a priority selection process.
  • Each XI directory section in Fig. 11 may have a respective cache section that contains the L2 data units that can be addressed by addresses outputted from its associated priority means. If no L2 cache is used in the system, then the multiple-section XI directory in Fig. 11 fundamentally operates in the manner of the system in Fig. 9, except that the Fig. 11 system can process plural XI requests in parallel in its different sections, which in some cases can obtain a significant improvement in system performance.
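The address-based partitioning of Fig. 11 amounts to choosing a directory section (and its priority circuit) from a few bits of the request address; a minimal sketch, assuming the number of sections is a power of two (the actual count is not specified):

```c
#include <stdint.h>

enum { NUM_SECTIONS = 4 };  /* assumed; the patent only prefers a power of two */

/* Each section has its own priority means and can access one request per
 * cycle, so requests whose data-unit addresses fall in different sections
 * are processed in parallel.  Low-order data-unit address bits select the
 * section (and its L2 cache section, when an L2 cache is present). */
static unsigned xi_section(uint32_t data_unit_addr)
{
    return data_unit_addr & (NUM_SECTIONS - 1);
}
```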
  • FIGs. 1 and 9 generally illustrate multiprocessor (MP) computer systems which may contain the subject invention.
  • Each system includes N number of CPUs each having a private store- through cache (L1) with its L1 cache directory.
  • Each CPU accesses storage fetch requests in its L1 cache as long as it obtains cache hits indicating the requested data is available in its L1 cache.
  • Fig. 9 does not have an L2 cache.
  • the L1 cache miss causes a fetch request to the next level in the system storage hierarchy, which is the L2 cache in Fig. 1 and the L3 main memory in Fig. 9, where the fetch request is put into a request register, REQ 1 - REQ N, associated with the miss requesting CPU.
  • the CPU request includes the main memory address of the requested data unit and the type of ownership being requested for it which may be either exclusive or public ownership.
  • the CPU may make store commands for storing data into the data unit.
  • a store command usually does not overwrite the entire data unit in either the L1 or L2 cache, but may write only changed byte(s) which are marked into the data unit (which may, for example, contain dozens of bytes).
  • This manner of writing into a data unit is well known in the art, using marker bits in the store command to represent the parts of a data unit to be changed by a given store command.
  • the writing into the L2 cache may be done in the same way or it may done by copying the entire data unit from L1 into L2 immediately after the data has been written into the L1 cache.
  • an I/O request register receives all input and output (I/O) device requests to memory.
  • an I/O request accesses the L2 cache, since the latest version of a data unit may reside in the L2 cache, where it may be changed by the I/O request. If the I/O requested data unit is not in L2, it is then accessed in the L3 main memory without bringing the data unit into the L2 cache.
  • REQ 1 - REQ K present their contained requests to the input priority circuit of the XI directory.
  • the presented requests are sequenced on a machine cycle, or sub-cycle, basis by the priority circuit which presents one request at a time to the XI directory for accessing.
  • Fig. 6 shows hardware controls that are associated with an XI directory 202, which may be the XI directory in either Fig. 1 or 9, except the EX/RO field in each directory entry is not needed in the embodiment of Fig. 1.
  • An L1 register 201 is any of the request registers 1 through N in Fig. 1 or 9; it has an absolute address field, a three bit command field (bits 0:2), and a Req_CPID field.
  • the absolute address field contains the main memory address of the requested data unit.
  • the Req_CPID field contains the CPID (CPU identifier) of the requesting CPU (bits 0:2 for supporting up to 7 CPUs).
  • the three bit command field provides any of four different commands, which are: readonly request, exclusive request, a conditionally-public (i. e. conditionally-readonly) request, and a change-to-exclusive request.
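The request register 201 described above might be modeled as follows; the numeric encodings of the four commands are assumptions for illustration only.

```c
#include <stdint.h>

/* The four commands carried in the three-bit command field (bits 0:2).
 * The numeric values are assumed; the patent does not give the encoding. */
typedef enum {
    CMD_FETCH_RO      = 0,  /* readonly request                                 */
    CMD_FETCH_EX      = 1,  /* exclusive request                                */
    CMD_FETCH_COND_RO = 2,  /* conditionally-public (conditionally-readonly)    */
    CMD_CHANGE_TO_EX  = 3   /* change-to-exclusive for a data unit already held */
} l1_cmd_t;

/* Contents of an L1 request register (register 201 in Fig. 6).  In hardware
 * the command and Req_CPID fields each occupy bits 0:2 of their registers. */
typedef struct {
    uint32_t abs_addr;      /* absolute main memory address of the data unit      */
    l1_cmd_t cmd;           /* one of the four commands above                     */
    uint8_t  req_cpid;      /* CPID of the requesting CPU (supports up to 7 CPUs) */
} l1_request_t;
```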
  • the hit/miss logic 206 indicates whether the currently requested data unit is represented in the XI directory in the systems of both Figs. 1 and 9. A hit occurs if the data unit is fully represented, and signals that the access can be made without a change of the ownership indication in the XI directory or in the associated data store. It is a miss if the data unit is not so represented, which requires at least a change of the ownership indication in the accessed XI directory entry.
  • In Fig. 1, fetch controls 221 are activated by hit logic 206 to access the requested data unit in the L2 cache for a hit, and for a miss send a request to the L3 memory for accessing the requested data unit.
  • In Fig. 9, fetch controls 221 are activated by the hit logic 206 to access the requested data unit in the L3 memory for either a hit or a miss.
  • the XI generate controls 211 (shown in more detail in Fig. 5) generate an XI signal from the CPID field in an accessed entry, and selects the XI buses 212a-212n to the respective CPUs for transmitting an XI signal to the CPU(s) indicated by the CPID field in the accessed entry.
  • a specific XI request is only sent on the XI bus to the identified CPU in both Figs. 1 and 9.
  • Each specific XI request has a request identifier (TAG) and an XI address which is the main storage address (or representation) of the address to be invalidated or have its ownership changed in the identified CPU L1 cache.
  • a general XI request is sent on all XI buses to all CPUs only in the embodiment of Fig. 1.
  • the XI response controls 231 receive the CPU response, which is responding to a specific XI signal sent to the CPU.
  • the XI response controls the timing of when the XI directory controls can complete their processing for an access request (from another CPU, I/O or castout) that had the ownership conflict found in the XI directory which caused the XI signal.
  • the SCE can change the ownership indication in the accessed directory entry, which may involve only changing the ownership indication therein or may involve writing therein an entirely new content for representing a newly accessed data unit in the case of a castout request.
  • An exceptional XI response may indicate that the CPU was not able to give up ownership when requested, requiring the SCE controls to resend an identified XI signal, or that the CPU needs to get back the ownership of a data unit to perform a store in it when it had been marked in the CPU at the time the CPU gave up ownership of the data unit.
  • wait state vector (WSV) controls 261 are provided in the SCE to support the embodiment shown in Fig. 9.
  • WSV comprises eight bits that respectively correspond to the CPUs in the system. The corresponding CPU sets on its WSV bit when the CPU enters wait state, and sets it off when it leaves wait state.
  • a request/response protocol is used between the WSV set and reset control circuits and the respective CPUs.
  • Figs. 4 and 5 show the hardware pipeline contained in each of the CPUs and the SCE shown in Fig. 1 and 9.
  • the store pipeline in Figs. 4 and 5 connects the stores from any CPU to the shared L2 cache.
  • the nomenclature CPx is used in Figs. 4 and 5 to designate any of the N number of CPUs that is currently receiving an XI signal from the SCE.
  • Each CPU store command causes storing in both the respective CPU's L1 cache and in the shared L2 cache.
  • the manner of storing in L1 may be conventional.
  • Fig. 4 shows a store queue 26 which receives the store commands from its CPx in FIFO order, and sends them to a store stack 27 (located in the SCE, which is the L2 cache and L3 main memory controller) which is in Fig. 5.
  • the stack outputs its oldest store command to the L2 priority circuit for accessing in the L2 directory and L2 cache.
  • Each store command in the store queue 26 and store stack 27 contains both the address and the data for a single store operation.
  • INPTR & OUTPTR inpointer and outpointer registers
  • INPTR locates the current entry in the stack for receiving the next store from queue 26.
  • OUTPTR locates the oldest store in stack 27 to be outputted to the L2 cache.
  • INPTR is incremented each time a store is received in the current inpointer location, and OUTPTR is incremented each time a store is outputted from the stack.
  • Both the INPTR and OUTPTR wrap in the stack so that the stack never runs out of space for a next entry. This type of stack pointer control is conventional.
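A rough model of the store stack with wrapping in- and out-pointers; the stack depth is an assumption, and the overflow checking that real hardware avoids by its sizing is omitted:

```c
#include <stdint.h>

enum { STACK_SLOTS = 16 };  /* assumed depth of store stack 27 */

typedef struct {
    uint32_t addr;          /* address of the store operation */
    uint64_t data;          /* data of the store operation    */
} store_cmd_t;

typedef struct {
    store_cmd_t slot[STACK_SLOTS];
    unsigned    inptr;      /* next slot to receive a store from queue 26      */
    unsigned    outptr;     /* oldest store, next to be output to the L2 cache */
} store_stack_t;

/* INPTR is incremented (with wrap) each time a store is received ... */
static void stack_put(store_stack_t *s, store_cmd_t c)
{
    s->slot[s->inptr] = c;
    s->inptr = (s->inptr + 1) % STACK_SLOTS;
}

/* ... and OUTPTR is incremented (with wrap) each time a store is output. */
static store_cmd_t stack_get(store_stack_t *s)
{
    store_cmd_t c = s->slot[s->outptr];
    s->outptr = (s->outptr + 1) % STACK_SLOTS;
    return c;
}
```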
  • the CPz, IOy or CORn request command registers 1z, 1n or 1y respectively receive the L1 CPU fetch requests, L2 cache LRU replacement requests and I/O device requests for accesses in the L2 cache.
  • Each request command is also referred to herein as a requestor.
  • the registers 1z, 1n and 1y represent different types of request registers, of which only one register is doing a request into the L2 cache at any one time in the embodiment.
  • One of these registers is selected at a time by the L2 priority circuit for a current access cycle for accessing an entry in the L2 directory and its associated cache slot that contains the associated data unit.
  • CPz request register 1z represents any L2 request register that receives any CPU request to L2.
  • the subscript z indicates the CPU is a requesting CPU, while the subscript x is used herein to indicate any CPU which is receiving an XI signal.
  • the CORn (castout) register 1n represents ones of plural castout request registers that receives a current castout request for L2.
  • the subscript n indicates the assigned register of the plural castout registers assigned by an LRU replacement circuit for L2 (not shown) to receive the castout address. Replacement of the content of an L2 entry may be done in the conventional manner when a CPU request does not hit (i.e. misses) in the L2 directory.
  • the IOy register 1y represents any of plural registers that is selected by the L2 priority as its current request to the L2 directory. Only I/O requests that hit in L2 are used by this embodiment; an I/O request that does not hit (i.e. misses in the L2 directory) is not fetched into L2, but is then accessed in the L3 main memory in the conventional manner.
  • An access 2 in the SCE tests the value of the CPID field in the currently accessed L2 directory entry in the detailed embodiment. If circuit 2 detects the tested CPID value is in the range of 1-6, it indicates an EX ownership by the identified CPU. But if the tested CPID is zero, access 2 has detected a public RO ownership for the data unit represented by the currently selected L2 entry.
  • a detected CPID value of from 1 to 6 in this embodiment indicates the one CPU in the system having exclusive ownership of the data unit associated with the currently selected L2 directory entry.
  • a detected value of zero for the CPID indicates that the data unit has public ownership and is therefore readonly.
  • the specific XI signal initiated by access 2 is sent only to the CPU identified by the CPID in the L2 directory entry.
  • the specific XI signal includes the main memory address (or a representation thereof) for the affected data unit in the receiving processor's cache, an XI type indicator (specific or general), and an identifier (ID TAG) for this L2 request command (requestor) so that the SCE can determine which requestor is responsible for a received XI response.
  • the specific XI type indicator also indicates whether the addressed data unit is to be invalidated or changed to public ownership.
  • the sending of a specific XI signal sets an "XI response wait mode" latch 8 to "XI wait mode".
  • the XI wait caused by a specific XI signal is ended when the SCE receives the XI response to that specific XI signal.
  • a general XI signal is detected by all CPUs except the requesting CPU and is put into all of their respective XI queues.
  • a general XI signal is sent to all CPUs except the requesting CPU and the receiving CPU only invalidates if it has the addressed data unit, and does not provide any XI response.
  • an XI signal received by any CPx causes a termination in the generation of any store commands by CPx, and the execution of any current instruction is suppressed and will need to be re-executed when processing of the XI signal by CPx is completed.
  • Circuit 21 gates the invalidation address with the XI signal to a compare circuit 22 that compares the XI invalidation address in parallel with all addresses currently in the CPx store queue 26 and generates a compare or no compare signal.
  • the XI invalidation address is used to invalidate any entry in the CPx L1 cache equal to the XI invalidation address.
  • circuit 22 If circuit 22 provides a compare signal, it activates a "send store” circuit 23 to mark any store command to the invalidate address in store queue 26, which then activates an "XI response circuit” 24 to send an XI response signal to the SCE where it resets the "XI response wait mode” latch 8 to terminate the XI wait mode in the SCE.
  • If circuit 22 provides a no compare signal on its output G, it indicates there are no store commands in queue 26 for the invalidated address, and output signal G activates "XI response" circuit 24 to send the XI response signal to the SCE where it resets the "XI response wait mode" latch 8 to terminate the XI wait mode in the SCE.
  • the reset of wait mode circuit 8 causes it to output a wait mode termination signal which gates comparator 28 to compare the current L2 request address with all addresses currently in the CPx store stack 27 using a single cycle parallel compare operation.
  • a compare-equal (cmpr) signal from circuit 28 to an AND gate 29 inputs the content of an INPTR register into a capture INPTR register 15 that captures the current location in stack 27 available for an input of a current CPU store command.
  • the captured INPTR value indicates the last location in the CPx stack 27 which may contain the last store command from CPx, and the OUTPTR register value indicates the CPz location having the oldest store command from CPz in the stack.
  • the OUTPTR value is being incremented to continuously output its store command entries to update the L2 cache entries. The incrementing of OUTPTR will cause its contained pointer address to wrap around and finally become equal to the captured INPTR value.
  • the captured INPTR value is provided to a pointer comparison circuit 38 which compares it with the incremented stack OUTPTR value as the OUTPTR is incremented to output the store commands to the L2 cache.
  • an output signal D is provided from pointer compare circuit 15 to set the "store done mode" latch 13 to indicate that the store stack outputting is not yet done.
  • an output signal E is provided from circuit 15 to reset the "store done mode" latch 13 to indicate that all possible store commands have been outputted from stack 27 into the cache.
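The flush synchronisation between the capture INPTR register and the "store done mode" latch can be sketched as follows; the pointer fields mirror the store-stack model above and the names are hypothetical:

```c
#include <stdbool.h>

typedef struct { unsigned inptr, outptr; } stack_ptrs_t;  /* pointers of store stack 27 */

static unsigned captured_inptr;  /* capture INPTR register     */
static bool     store_done;      /* "store done mode" latch 13 */

/* When the XI wait mode ends and the L2 request address compares equal with a
 * stack entry, the current INPTR is captured and the latch indicates that the
 * stack output is not yet done (signal D). */
static void capture_inptr(const stack_ptrs_t *s)
{
    captured_inptr = s->inptr;
    store_done = false;
}

/* Called as OUTPTR is incremented for each store drained to the L2 cache;
 * when OUTPTR has revolved back to the captured INPTR (signal E), every
 * earlier CPx store to the data unit has been flushed and the delayed cache
 * data access 16 may be initiated. */
static void check_store_done(const stack_ptrs_t *s)
{
    if (s->outptr == captured_inptr)
        store_done = true;
}
```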
  • a current CPU request from CPz may be requesting exclusive ownership when access 2 has found public ownership exists for the currently accessed L2 entry.
  • a current CPz request is detected by box 4 to determine if it wants exclusive or public ownership. If the CPz request wants exclusive ownership, then special XI signalling is required, which specifically sets to the exclusive ownership state an EX bit in the valid L1 directory entry at the XI address in CPz and generally invalidates the XI address in all other CPUs.
  • All IOy requests are handled by access 2 merely sending a general XI invalidate signal, which prevents any CPU from interfering with any I/O access in the L2 cache.
  • the general XI signal from access 2 is used when there is no need for any XI response from any of the plural CPUs which may contain the data unit, since none can be doing store commands and all that is needed is L1 invalidation.
  • change access circuit 7 detects the change field in the current directory entry only for a CORn request; it is not used for a CPz or IOy request.
  • a switch 17 sends the data unit (castout) to the next storage level L3.
  • access 16 can immediately obtain the associated data unit from the cache data arrays for the request, i.e. a switch 18 sends the data unit to CPz for a CPU request, or switch 19 sends the data unit to the requesting channel IOy.
  • Directory entry update means 20 is immediately used for a CORn request that finds no change in the associated data unit. But if the directory entry update means 20 is being used for a CPz request, then the update of directory entry content by means 20 is delayed until after the castout has been completed (for system recovery reasons the initial content of the entry may be needed if a system failure should occur before the castout is completed).
  • the timing delay for the cache data access 16 is controlled by the output F from the "store done" latch 13 when it is reset by a compare-equal signal E from PTR compare circuit 15 (when the INPTR and OUTPTR are equal). All CPz store command entries to the requested data unit in stack 27 will have been flushed out to the cache when circuit 15 signals its output signal E, since by then the OUTPTR will have revolved back to the captured INPTR starting point for the stack output operation, and then cache data access 16 may be initiated.
  • Flow Diagrams of FIGURES 7 and 8:
  • a requesting CPU indicates in each of its fetch commands whether it is a fetch request for an exclusive fetch, a readonly fetch, or a conditionally-exclusive fetch.
  • the L1 and XI directories handle each request in a manner dependent on which of these types of fetch commands is being provided by any requesting CPU.
  • An exclusive fetch provides the requested data unit in the processor's L1 cache with exclusive ownership.
  • a readonly fetch provides the requested data unit in the processor's L1 cache with public ownership.
  • a conditionally-exclusive fetch provides the requested data unit in the processor's L1 cache with either exclusive or public ownership depending on conditions existing at the time of the request.
  • FIGs. 7A - 7D and Figs. 8A - 8E apply to the systems shown in Figs. 1, 9 and 11 for the different types of CPU fetch requests that have misses in the L1 cache.
  • the flow diagrams in Figs. 8A -8D include features additional to those used in Figs. 7A - 7D for handling similar types of fetch requests. These flow diagrams are largely self explanatory in view of the background described herein.
  • a hit occurs when an equal condition is found both between the processor requested address and the address in the accessed entry, and between the processor requested ownership and the ownership indicated in the accessed entry.
  • a hit causes the left path from step 301 or 401 to be taken, and a miss causes the right path from step 301 or 401 to be taken.
  • the hit path is significantly different among these flow charts; the miss path is similar in these flow charts.
  • Another philosophy, used in the process of Figs. 7, does not require all data units valid in the L1 directory to also be valid in the L2 cache directory.
  • the different philosophy used in the flow diagrams of Figs. 8 requires that all data units valid in the L1 directory also be valid in the L2 cache directory (the XI directory).
  • the embodiment of Figs. 7 has the advantage of allowing continued fetching (but no storing) of the LRU L1 data unit by the CPUs, but has the disadvantage of a subsequent exclusive fetch request for that data unit requiring a general XI to all CPUs (and the resulting overhead) which must be performed before any store can be done.
  • Figs. 7A and 8A each represent the operational steps for performing an exclusive fetch request to the XI directory in Figs. 1 or 11 which use the XI directory also as an L2 cache.
  • An exclusive fetch is a fetch request for giving the requesting processor exclusive ownership of a data unit, which is done when the processor knows the fetched data unit will be stored into, such as a fetch request for accessing a data unit for a sink operand.
  • The hit path is entered at step 302 if step 301 finds a hit entry to be accessed, and the miss path is entered at steps 331 and 332 if step 301 does not find any entry in the XI directory. Step 331 activates the entry replacement means for the XI directory to assign the current LRU entry in the set-associative row being addressed by the current request; and step 332 accesses the L3 main memory to fetch the requested data unit.
  • step 302 tests the CPID field in the accessed entry to determine if it indicates the public (RO) state.
  • the CPID field may have three bits representing eight different values, 0-7. Value 0 may be used to indicate public (RO) ownership, and values 1-7 may be used to represent exclusive (EX) ownership by respective CPUs 1-7 in a system of 7 CPUs. If the tested CPID value is 0, then the yes (Y) exit is taken from step 302 to step 311 which represents that a general XI signal is generated in the SCE and sent on all XI buses 212 in Fig. 6 to the XI queue in every CPU in the system to invalidate any XI addressed data unit found in any L1 cache.
  • step 312 sets the CPID field in the accessed XI directory entry to the requestor's CPID value to indicate the requestor's ownership of the associated data unit.
  • Step 321 updates the XI directory replacement means for selecting the next assignable entry in the accessed row in the XI directory.
  • step 322 returns the fetched data unit from L3 main memory to the requesting L1 and L2 caches. The exclusive fetch operation is then completed.
  • the size of a data unit in the L1 cache is a sub-multiple of the size of the data unit in either the L3 main memory or the L2 cache.
  • CPID in the associated XI directory entry indicates the CPU of that L1 cache is the owner of all sub-multiples of that data unit.
  • If step 303 finds the requestor's CPID is in the accessed XI entry, it is accordingly determined that the requesting processor already exclusively owns the data unit but does not have the requested sub-multiple of the L3/L2 data unit in its L1 cache. No change of ownership is therefore needed in this case, so step 321 is entered to update the replacement controls for the addressed row, and step 322 returns the required sub-multiple of the data unit from the L2 cache entry to the requesting L1 cache.
  • If step 303 finds the accessed CPID field has a value 1-7 which is not equal to the requestor's CPID value, then another CPU has exclusive ownership of the requested data unit.
  • step 303 determines that an XI conflict exists between the requestor and another CPU which currently owns the data unit.
  • the no (N) exit is taken from step 303 to step 304 for which the SCE issues a specific XI signal that is sent only on one of the XI buses 212 to the identified CPU where it is put into the CPU's XI queue for invalidating the XI addressed data unit in its L1 cache.
  • step 309 indicates the SCE is waiting for an XI response signal from the conflicting CPU confirming that the invalidation has occurred and the conflict no longer exists.
  • step 312 updates the directory entry by setting its CPID ownership field to the value of the requesting CPU.
  • Step 321 updates the XI directory replacement means to reflect the current access to the XI entry, and step 322 then accesses the requested L1 sub-multiple of the L2 data unit in the L2 cache and sends it to the L1 cache of the requesting CPU. The exclusive fetch operation is then completed.
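  • A minimal sketch of the decision logic in this exclusive-fetch hit path (steps 302-312), assuming the hypothetical xi_entry_t layout sketched earlier; the returned structure merely records which XI signalling the SCE would have to issue.

```c
typedef enum { XI_NONE, XI_GENERAL, XI_SPECIFIC } xi_kind_t;

typedef struct {
    xi_kind_t kind;       /* XI signalling the SCE must issue               */
    unsigned  target_cpu; /* owning CPU, meaningful only for XI_SPECIFIC    */
    bool      must_wait;  /* requestor waits for the XI response (step 309) */
} xi_action_t;

/* Exclusive-fetch hit path of Fig. 7A, steps 302-312. */
xi_action_t exclusive_fetch_hit(xi_entry_t *e, unsigned requestor)
{
    xi_action_t a = { XI_NONE, 0, false };

    if (e->cpid == CPID_PUBLIC) {          /* step 302: data unit is public (RO)   */
        a.kind  = XI_GENERAL;              /* step 311: general XI to every CPU    */
        e->cpid = requestor;               /* step 312: requestor becomes EX owner */
    } else if (e->cpid == requestor) {     /* step 303: requestor already owns it  */
        /* only the requested sub-multiple is missing from L1; no XI is needed     */
    } else {                               /* XI conflict with another EX owner    */
        a.kind       = XI_SPECIFIC;        /* step 304: XI only the owning CPU     */
        a.target_cpu = e->cpid;
        a.must_wait  = true;               /* step 309: wait for the XI response   */
        e->cpid      = requestor;          /* step 312 */
    }
    /* step 321 (replacement update) and step 322 (data to the L1 cache) follow. */
    return a;
}
```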
  • step 333 is entered to test the state of the valid bit in the replacement assigned LRU entry to determine if further checking of its content is necessary. If the content has its valid bit set to invalid state, then step 333 takes its no exit to step 351 since an invalid content in the assigned LRU entry can be immediately replaced to represent the current request to the XI directory.
  • Step 351 is entered to wait until the LRU content can be replaced; replacement can be done immediately when step 351 is entered from step 333, which indicates the LRU entry content is invalid.
  • If step 333 finds the content of the LRU XI entry is valid, that content represents the associated LRU data unit, which could have a copy in another processor's L1 cache as either EX or RO, as indicated by the value of the CPID in the LRU entry. (It is to be noted that the LRU data unit is a different data unit than the requested data unit, and they should not be confused.)
  • Step 342 next checks the state of a change flag bit in the LRU XI entry to determine if the associated LRU data unit has been changed or not. If not changed, the data unit currently has an exact copy in the L3 main memory and no castout is needed, so the no exit from step 342 is taken to step 351 which finds the LRU entry can be immediately replaced, so that step 352 is immediately entered and executed.
  • If step 342 finds the LRU change bit is on, then the yes exit is taken to step 343 to perform the castout of the LRU entry from the L2 cache to the L3 memory, since the associated data unit is the only updated version and must be cast out to the L3 main memory before the LRU entry content can be destroyed by overlaying it with information for the currently requested data unit.
  • Step 351 holds up the writing of information for the new request in the LRU XI entry until the LRU information is no longer needed for a potential recovery action should any failure occur in the castout operation.
  • step 352 is executed in which the XI directory entry for the requested data unit is written into the LRU entry (overlaying the prior entry), the fetched data unit is stored into the L2 cache, the CPID value of the new requestor's CPU is written into the LRU XI entry, and the valid bit is turned on to validate the new request's entries in the XI directory and in the L1 cache entry.
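  • The victim handling on an XI directory miss (steps 333, 342, 343, 351 and 352) might be sketched as follows, again assuming the hypothetical xi_entry_t layout; castout_to_l3() is an assumed stand-in for the L2-to-L3 write-back of a changed data unit.

```c
void castout_to_l3(const xi_entry_t *victim);   /* hypothetical helper */

void replace_lru_entry(xi_entry_t *victim, uint32_t new_tag, unsigned new_cpid)
{
    if (victim->valid && victim->change)  /* steps 333/342: victim is valid and changed   */
        castout_to_l3(victim);            /* step 343: only updated copy, cast out to L3  */
    /* step 351: the old content is held until it is no longer needed for recovery        */
    victim->addr_tag = new_tag;           /* step 352: overlay with the new request's entry */
    victim->cpid     = new_cpid;          /*           requestor's ownership is recorded    */
    victim->change   = 0;
    victim->valid    = 1;
}
```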
  • the XI miss signalled by step 301 indicates the requested data unit is not represented in the XI directory and therefore does not have its ownership represented by any CPU (and accordingly is not in the L2 cache in this embodiment).
  • the parallel path through step 332 performs the fetch from the L3 memory of the requested data unit and other operations in parallel with the execution of the path through step 331.
  • step 332 sends the request to the L3 main memory for fetching, while step 338 generates a general XI signal to all other CPUs to invalidate any copy of the requested data unit in their L1 caches at the request's address.
  • steps 351 and 352 are executed to put the fetched data unit into the associated L2 cache entry and the request's entry in the XI directory is completed.
  • Step 337 operates as soon as the fetch occurs in L3 to transfer the requested sub-multiple of the data unit on a bypass bus from the L3 memory to the L1 cache of the requesting CPU in order to allow the CPU to begin using the requested data as soon as possible.
  • The exclusive fetch steps in the embodiment of Fig. 8A are similar to, but not the same as, those in the embodiment of Fig. 7A. Their primary differences are caused by the previously-mentioned different philosophy used in the flow diagrams of Figs. 8A-E, which requires that all data units valid in the L1 directory also be valid in the L2 cache directory (the XI directory). This philosophy requires that a data unit invalidated in L2 must also be invalidated in all L1 directories. It would violate this philosophy in Fig. 8A to change a data unit to RO in L1 and invalidate it in L2, as is done in Fig. 7A.
  • The reference numbers used in Fig. 8A all begin with the digit 4, while the reference numbers used in Fig. 7A all begin with the digit 3. If the right-most two digits in a reference number are the same in Figs. 7A and 8A, then the step operates the same in both Figures. But if the right-most two digits are the same in Figs. 7A and 8A and are followed by a letter, a philosophical difference applies.
  • step 441A in Fig. 8A invalidates all copies of the LRU data unit in the L1 directories, while step 341 in Fig. 7A demotes the data unit to readonly, when in both Figures these steps are followed by removal of the LRU data unit from L2 (which is the result of invalidation).
  • Fig. 8A has no equivalent of step 338 in the L2 miss path in Fig. 7A.
  • an L2 miss of a data unit assures it is not available in L2, which further assures that no L1 directory has any part of the data unit under the philosophy in Fig. 8A.
  • step 402A tests the state of the EX/RO bit in the accessed XI entry.
  • step 402A finds the tested EX/RO bit is in EX state in the accessed XI entry, the yes exit is taken to step 403A.
  • Step 403A accesses and tests the identification bit assigned to the requesting CPU in the CPID field of the accessed XI entry. If the requesting CPU's CPID bit is off, then the requesting processor does not have any part of the requested data unit in its L1 cache, and the no exit is taken to step 501 to test the state of the EX-aged bit field in the accessed XI directory entry. If the aged bit is found on, it indicates no processor currently has the associated data unit in its L1 cache, and that the requesting processor can obtain the data unit without any XI signalling overhead.
  • step 421A is entered to turn off the aged bit, update the XI directory replacement means for this access, turn on the CPID bit for the requesting CPU, turn off the CPID bit for each other CPU, and set the EX/RO bit to EX state in the accessed XI directory entry.
  • step 422 sends the requested sub-multiple of the data unit to the L1 cache of the requesting CPU.
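  • For comparison with the 3-bit CPID encoding of Figs. 7A-E, the following sketch shows a possible entry layout for the Fig. 8A style embodiment, with one CPID bit per CPU plus EX/RO and EX-aged bits, together with the entry update performed by step 421A; all names are illustrative assumptions.

```c
#include <stdint.h>

typedef struct {
    uint8_t  cpid_bits;  /* bit n on => CPU n's L1 holds (part of) the data unit */
    unsigned ex   : 1;   /* 1 = exclusively owned, 0 = public (RO)               */
    unsigned aged : 1;   /* EX-aged: the EX owner's L1 no longer holds any part  */
} xi_entry8_t;

/* Step 421A as described above: clear the aged bit, leave only the requestor's
 * CPID bit on, and set the EX/RO bit to the exclusive state. */
static void apply_step_421a(xi_entry8_t *e, unsigned requestor)
{
    e->aged      = 0;
    e->cpid_bits = (uint8_t)(1u << requestor);
    e->ex        = 1;
}
```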
  • Fig. 8E describes the process of setting the EX-aged flag bit in an XI directory entry.
  • the initial step 460 represents a signalling operation by the L1 directory entry replacement means, which signals (to EX-aged controls associated with the XI directory) the address of a data unit being aged out of (and about to be replaced in) the L1 cache.
  • Step 461 is the detection by the EX-aged controls of the state of the EX/RO bit in the XI entry addressed for that data unit. If the EX-aged controls find the public (RO) state exists for the addressed XI entry, the no exit is taken from step 461 to step 463.
  • Step 463 represents the controls turning off the CPID bit for the CPU having signalled the aging out, so that the CPID field indicates that CPU L1 cache no longer has the addressed data unit.
  • Step 462 has the EX-aged controls turn on the aged bit in the addressed XI entry when the EX on state is found in that entry, and the CPID bit is left on.
  • the EX-aged bit is turned on, the CPID bit for the aged-out CPU remains on, and it is not turned off until another CPU's L1 cache fetches the L2 data unit, or one of its submultiples.
  • Step 460 indicates when all parts (submultiples) are no longer available in the L1 cache after any submultiple ages out of the L1 cache, whether the data unit was exclusively or publicly owned. If an exclusively-owned data unit is aged out of any private L1 CPU cache and it has a submultiple of its L2 data unit left in the L1 cache, neither the L1 nor the L2 data unit is available for accessing by any other CPU. If an aged-out submultiple data unit is publicly owned, and it has a submultiple of an L2 data unit left in the L1 cache, the CPID bit is not turned off in step 463.
  • the L2 data unit and any remaining submultiple(s) in L1 can be changed to public ownership (i.e. RO) or they can be invalidated, because the aging-out process makes it unlikely that any other submultiple will be accessed.
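  • Under the same assumed xi_entry8_t layout, the EX-aged handling of Fig. 8E (steps 460-463) could be sketched as shown here.

```c
/* Invoked when CPU `cpu` signals that a data unit (or its last sub-multiple)
 * has aged out of its L1 cache (step 460). */
void l1_aged_out(xi_entry8_t *e, unsigned cpu)
{
    if (e->ex)                                  /* step 461: exclusively owned entry  */
        e->aged = 1;                            /* step 462: set EX-aged; the owner's
                                                   CPID bit stays on                  */
    else
        e->cpid_bits &= (uint8_t)~(1u << cpu);  /* step 463: RO entry, turn off this
                                                   CPU's CPID bit                     */
}
```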
  • RO — public ownership
  • If step 402A finds that the EX/RO bit indicates an RO (public) state in the accessed XI directory entry, then other CPUs may have a copy of any parts of the data unit, and the no exit is taken to step 414A.
  • Step 414A tests the other bits in the CPID field (for the other CPUs) to determine which, if any, of the other CPUs have a copy of the public data unit, and a specific XI signal is sent only to each CPU having a CPID bit detected in an on state. Then steps 421A and 422 are entered to do their operations (described in the prior paragraph). Step 421A can be executed without waiting for the XI response signal resulting from the specific XI signal previously sent to another CPU by step 414A.
  • If step 403A is entered and finds the requestor's CPID bit is on, the requestor has a sub-multiple of the requested data unit, but it is not the sub-multiple being currently requested, since the current request to the XI directory would not have been made if the requested sub-multiple were in the requestor's L1 cache. Then no XI signalling overhead is needed, and the yes exit is taken to steps 421A and 422, which do their operations as described in the prior paragraph.
  • If step 501 is entered and finds the aged bit is off in the accessed XI directory entry, then another CPU may have the requested data unit in its L1 cache, and the no exit is taken to step 406A.
  • Step 406A operates like step 414A to test the CPID field but will find only one bit on for another CPU since steps 402A and 403A indicate another CPU has EX ownership.
  • Step 406A sends a specific XI signal to the other CPU (indicated by the on bit in the CPID field) to invalidate its copy of the requested data unit.
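  • Putting the hit-path tests of Fig. 8A together (steps 402A, 403A, 501, 414A, 406A and 421A), a sketch of the decision logic, reusing the assumed xi_entry8_t layout and step-421A helper above, could look as follows; whether the requestor must then wait for an XI response is decided by the wait state test of step 502 described next.

```c
/* Returns the mask of CPUs that must receive a specific XI signal. */
uint8_t exclusive_fetch_hit_8a(xi_entry8_t *e, unsigned requestor)
{
    uint8_t req_bit    = (uint8_t)(1u << requestor);
    uint8_t xi_targets = 0;

    if (e->ex) {                               /* step 402A: entry is exclusive       */
        if (!(e->cpid_bits & req_bit) &&       /* step 403A: requestor has no copy    */
            !e->aged)                          /* step 501: owner may still hold it   */
            xi_targets = e->cpid_bits;         /* step 406A: XI the single EX owner   */
    } else {                                   /* public (RO) data unit               */
        xi_targets = e->cpid_bits & (uint8_t)~req_bit;  /* step 414A: XI other holders */
    }
    apply_step_421a(e, requestor);             /* step 421A is done in every hit case */
    return xi_targets;                         /* step 422 then sends the data to L1  */
}
```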
  • step 502 is entered.
  • Step 502 tests the state of the "wait state bit" for the requesting CPU.
  • Each CPU has a wait state bit associated with the XI directory controls, and each CPU sets and resets its respective wait state bit to indicate whether the CPU is in wait state or operating state.
  • a CPU in wait state has no need for having data units with either exclusive or public ownership, except in so far as having a data unit saves refetching it when the CPU later goes out of wait state into operating state. It has been found that CPUs often spend a significant amount of time in wait state.
  • If step 502 finds the wait state bit is off for the CPU that received the XI signal, the requesting CPU must enter step 409 to wait for the XI response signal resulting from the specific XI signal previously sent to that CPU by step 406A.
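  • A small sketch of this wait state test (step 502), where wait_state_bit[] stands for the assumed per-CPU flag that each CPU keeps current itself.

```c
#include <stdbool.h>

/* If the CPU receiving the specific XI is in wait state, the requestor may use
 * the data immediately; otherwise it must wait for the XI response (step 409). */
bool must_wait_for_xi_response(const bool wait_state_bit[], unsigned xi_target_cpu)
{
    return !wait_state_bit[xi_target_cpu];
}
```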
  • PR/SM (processor resource/system management) partitions the CPU and I/O (input/output) resources of a data processing system into logical partitions which can respectively have their own operating system, wherein an operating system executing in any partition cannot be interfered with by an operating system executing in any other partition.
  • a CPU may be time shared among a plurality of logical partitions.
  • a requesting CPU can shorten the time for obtaining a data unit owned exclusively by another CPU switched by PR/SM to a different logical partition. This is analogous to the shortened time obtained for getting a data unit owned exclusively by another CPU in wait state. In both of these cases, the requesting CPU can immediately obtain the data unit after XI signalling to the exclusively owning CPU, without waiting for the XI response signal from that CPU.
  • a set of partition indicator bits is required for each CPU in the system to indicate the current partition to which the CPU is assigned; these are represented by the PR/SM indicator bits in Fig. 6. For example, three bits may be assigned to each CPU in a system having up to seven partitions, and the binary coding of these three bits indicates one of seven as the current partition number, or indicates zero if no partition is currently assigned.
  • Each CPU has the responsibility of maintaining the correct setting of its respective three PR/SM bits, just as it must maintain the setting of its respective wait state bit.
  • When a CPU in a PR/SM system performs step 502 to test if the wait state bit is on for another CPU receiving an XI signal (identified by the on-state CPID bit), the CPU will also perform step 503 if the XI receiving CPU is not in wait state.
  • Step 503 tests whether the XI receiving CPU is in a different PR/SM partition than the requesting CPU. If the XI receiving CPU is in a different partition, the requesting CPU can immediately get the use of the requested data without waiting for the XI receiving CPU to send an XI response signal.
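  • Extending the previous sketch with the PR/SM test of step 503; partition_of[] is an assumed per-CPU value holding the current logical partition number (zero meaning no partition assigned).

```c
#include <stdbool.h>

bool must_wait_for_xi_response_prsm(const bool wait_state_bit[],
                                    const unsigned partition_of[],
                                    unsigned requestor, unsigned xi_target_cpu)
{
    if (wait_state_bit[xi_target_cpu])                           /* step 502 */
        return false;                      /* target CPU is in wait state           */
    if (partition_of[xi_target_cpu] != partition_of[requestor])  /* step 503 */
        return false;                      /* target runs in a different partition  */
    return true;                           /* step 409: wait for the XI response    */
}
```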
  • Figs. 7B and 8B each represent the operational steps for performing a readonly fetch request to the XI directory in Figs. 1 or 11 which use the XI directory also as an L2 cache.
  • a readonly fetch is a fetch request for giving the requesting processor public ownership of a copy of the requested data unit.
  • a readonly fetch is requested when a processor knows the requested data unit will not be stored into, such as a fetch request for accessing a source operand or for accessing a data unit in an address translation table.
  • the miss path in the readonly-fetch case in Fig. 7B is essentially the same as the miss path described for the exclusive-fetch case in Fig. 7A, except that step 352A is used instead of step 352, and step 338 is not used.
  • the difference between steps 352 and 352A is primarily that 352A in Fig. 7B sets the CPID field value to the public ownership value (zero), whereas step 352 in Fig. 7A sets this field to the CPID value of the requesting CPU to indicate its exclusive ownership.
  • Step 338 is not used in Fig. 7B because the requested data unit was not found in the requestor's L1 cache and therefore need not be invalidated with an XI signal. (Note: if the requested data unit was owned exclusively by the requesting processor, the hit path is taken, in which its L1 copy is demoted to public ownership by step 304A.)
  • the hit path described for the readonly-fetch case in Fig. 7B is also similar to the hit path described for the exclusive-fetch case in Fig. 7A, except that steps 304A and 312A are used instead of steps 304 and 312, and step 311 is not used in Fig. 7B.
  • the difference between the specific XI steps 304 and 304A is that 304A does a demote to public ownership instead of a termination of ownership (i.e. invalidation) done by step 304.
  • step 312A sets the CPID value to the public CPID value of zero, instead of the step 312 operation of setting the CPID field to the requestor's CPID value.
  • Step 311 is not used in Fig. 7B because a readonly copy of the requested data unit is left in each other CPU's L1 for the readonly fetch in Fig. 7B, while the exclusive fetch in Fig. 7A needed to invalidate it.
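  • A sketch of the readonly-fetch hit path of Fig. 7B (steps 304A and 312A), reusing the assumed xi_entry_t and xi_action_t types from the earlier sketches; the specific XI here demotes the owner's L1 copy to RO rather than invalidating it, and the handling when the requestor itself is the exclusive owner follows the note at step 304A.

```c
xi_action_t readonly_fetch_hit(xi_entry_t *e, unsigned requestor)
{
    xi_action_t a = { XI_NONE, 0, false };

    if (e->cpid != CPID_PUBLIC && e->cpid != requestor) {
        a.kind       = XI_SPECIFIC;   /* step 304A: demote the owner's L1 copy to RO */
        a.target_cpu = e->cpid;
        a.must_wait  = true;          /* step 309: wait for the demotion to finish   */
    }
    e->cpid = CPID_PUBLIC;            /* step 312A: the data unit becomes public     */
    return a;
}
```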
  • the miss path for the readonly case in Fig. 8B is essentially the same as the miss path described for the exclusive-fetch case in Fig. 8A, except step 452B is used instead of step 452A.
  • the difference between steps 452A and 452B is primarily that 452B in Fig. 8B resets the EX/RO bit to indicate public ownership, whereas step 452A in Fig. 8A sets this bit to indicate exclusive ownership.
  • the hit path for the readonly-fetch case in Fig. 8B is also similar to the hit path described for the exclusive-fetch case in Fig. 8A, except that step 406B is used instead of step 406A, and step 414A is not used in Fig. 8B.
  • the difference between the specific XI steps 406A and 406B is that 406B does a demote to public ownership instead of a termination of ownership (i.e. a demote to RO instead of an invalidation) in the L1 of the other CPU having EX ownership.
  • Step 414A is not used in Fig. 8B because a readonly copy of the requested data unit is left in each other CPU's L1 for the readonly fetch in Fig. 8B, while the exclusive fetch in Fig. 8A needed to invalidate it.
  • Figs. 7C and 8C each represent the operational steps for performing a conditionally-public (i.e. conditionally-readonly) request to the XI directory in Fig. 1 or 11.
  • a conditionally-public fetch gives the requesting processor either readonly or exclusive ownership of a data unit depending on existing conditions which are indicated in the processes of Figs. 7C and 8C, but is biased in favor of giving readonly ownership.
  • the miss path in the conditionally-public case in Fig. 7C is the same as the miss path described for the exclusive-fetch case in Fig. 7A, in which the requested data unit is provided in the L1 cache with exclusive ownership.
  • the hit path described for the conditionally-public case in Fig. 7C is also similar to the hit path described for the readonly-fetch case in Fig. 7B, except for the yes exit from step 303 which does not change the currently existing exclusive ownership of the requesting CPU, and fetches into the requesting L1 cache a different sub-multiple of the data unit in the L2 cache with exclusive ownership.
  • the miss path for the conditionally-public case in Fig. 8C is the same as the miss path described for the exclusive-fetch case in Fig. 8A.
  • the hit path for the conditionally-public case in Fig. 8C is also similar to the hit path described for the readonly-fetch case in Fig. 8B, except for the yes exits from steps 403B and 501. Neither step 403B nor step 501 in Fig. 8C changes the currently existing exclusive ownership of the requesting CPU; each fetches the requested sub-multiple into the requesting L1 cache with the exclusive ownership indicated for the corresponding L2 data unit.
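  • For the Fig. 7C style, the ownership actually granted by a conditionally-public fetch might be summarized by the following sketch, reusing the assumed xi_entry_t type; the bias toward readonly ownership shows up as the default result.

```c
#include <stdbool.h>

typedef enum { OWN_RO, OWN_EX } grant_t;

grant_t conditionally_public_grant(const xi_entry_t *e, bool hit, unsigned requestor)
{
    if (!hit)
        return OWN_EX;            /* miss path: exclusive, as in the Fig. 7A miss path */
    if (e->cpid == requestor)
        return OWN_EX;            /* yes exit of step 303: existing EX ownership kept  */
    return OWN_RO;                /* otherwise the fetch is biased to readonly         */
}
```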
  • Figs. 7D and 8D each represent the operational steps used by a "promote to exclusive" command for the different embodiments of the invention described herein for changing the ownership of a data unit from readonly to exclusive in the XI directory in Fig. 1 or 11 and in the L1 cache of the requesting processor.
  • In the embodiment of Fig. 7D, a "promote to exclusive" command can miss, as well as hit, in the XI directory.
  • In the embodiment of Fig. 8D, a "promote to exclusive" command cannot miss in the XI directory because its philosophy requires all valid L1 data units to also exist in the XI directory and the L2 cache.
  • For a miss, steps 321 and 323 are entered from step 332, in which step 323 represents the sending by the XI directory controls of a confirmation signal to the requesting CPU to change its ownership indication field in its L1 cache directory to an EX indication, which signals the CPU that it can now use the data unit exclusively and may now store into it.
  • The hit path uses step 302 to test the CPID value in the accessed XI directory entry. If the CPID field has its RO value, the yes exit is taken to step 311, which sends a general XI signal to all CPUs, since any or all may have an RO copy in its L1 cache. No XI response is needed. Then step 312 sets the CPID value in the XI directory entry to the CPID value of the requesting CPU. Step 321 is entered from step 312, and then step 323 sends a confirmation signal to the requesting CPU to change its ownership indication field in its L1 cache directory to an EX indication, which indicates to the CPU that it can now exclusively use that data unit in its L1 cache.
  • In Fig. 8D, the processing for the "promote to exclusive" command is much simpler because the philosophy used in this embodiment does not allow an XI directory "miss" to occur, as has been previously explained herein. Therefore the only path represented in Fig. 8D is the hit path, in which step 411A sends a specific XI signal only to each CPU having its CPID bit set on, since only those CPUs in the system can have a copy of the public data unit at issue. Then step 412A sets on the CPID bit of the requesting CPU in the CPID field and sets the EX/RO bit field to exclusive state in the accessed XI directory entry. Finally, step 423 signals the requesting CPU to set its L1 cache entry to EX state and that it can use this data unit exclusively.
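  • A sketch of this Fig. 8D "promote to exclusive" handling, reusing the assumed xi_entry8_t layout; clearing the other CPUs' CPID bits is an assumption implied by their copies being invalidated, not an explicit statement in the text.

```c
/* Returns the mask of CPUs that must receive a specific XI signal. */
uint8_t promote_to_exclusive_8d(xi_entry8_t *e, unsigned requestor)
{
    uint8_t req_bit    = (uint8_t)(1u << requestor);
    uint8_t xi_targets = e->cpid_bits & (uint8_t)~req_bit;  /* step 411A */

    e->cpid_bits = req_bit;   /* step 412A: requestor's CPID bit turned on        */
    e->ex        = 1;         /*            EX/RO bit set to the exclusive state  */
    /* step 423: a confirmation signal then tells the requesting CPU to mark its
     * L1 entry EX and that it may use the data unit exclusively.                 */
    return xi_targets;
}
```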
  • The processes in Figs. 12 are similar to the processes described for Figs. 7 and 8, except that no L2 cache is used in Figs. 12, as represented in Fig. 9.
  • the CPID and EX/RO fields are provided in each XI directory entry, so that only specific XI signalling is used for both exclusive and readonly requests.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
EP92102390A 1991-04-03 1992-02-13 Verfahren und Vorrichtung für ein Verzeichnis zum kreuzweisen Ungültigerklären Withdrawn EP0507063A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/679,900 US5265232A (en) 1991-04-03 1991-04-03 Coherence control by data invalidation in selected processor caches without broadcasting to processor caches not having the data
US679900 1991-04-03

Publications (1)

Publication Number Publication Date
EP0507063A1 true EP0507063A1 (de) 1992-10-07

Family

ID=24728850

Family Applications (1)

Application Number Title Priority Date Filing Date
EP92102390A Withdrawn EP0507063A1 (de) 1991-04-03 1992-02-13 Verfahren und Vorrichtung für ein Verzeichnis zum kreuzweisen Ungültigerklären

Country Status (3)

Country Link
US (1) US5265232A (de)
EP (1) EP0507063A1 (de)
JP (1) JPH0775010B2 (de)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995002865A1 (fr) * 1993-07-15 1995-01-26 Bull S.A. Procede de gestion de memoires d'un systeme informatique, systeme informatique et memoire mettant en œuvre le procede
EP0690383A2 (de) * 1994-06-01 1996-01-03 Fujitsu Limited Speichesteuerverfahren und -vorrichtung, geeignet für ein Informationsverarbeitungssystem mit einem Cachespeicher
EP0735480A1 (de) * 1995-03-31 1996-10-02 Sun Microsystems, Inc. Cache-kohärentes Computersystem, das Entwertungs- und Rückschreiboperationen minimiert
WO1996039666A1 (en) * 1995-06-05 1996-12-12 Advanced Micro Devices, Inc. Reducing cache snooping overhead in a multilevel cache system with multiple bus masters and a shared level two cache
US5740399A (en) * 1995-08-23 1998-04-14 International Business Machines Corporation Modified L1/L2 cache inclusion for aggressive prefetch
US5758119A (en) * 1995-08-23 1998-05-26 International Business Machines Corp. System and method for indicating that a processor has prefetched data into a primary cache and not into a secondary cache
EP1021768A1 (de) * 1996-09-16 2000-07-26 Corollary, Inc. System und verfahren zum unterhalten von speicherkohärent in einem rechnersystem mit multiplen systembussen
EP1684181A2 (de) 2005-01-24 2006-07-26 Fujitsu Limited Speichersteuerungsvorrichtung und Verfahren
CN100514311C (zh) * 2005-02-11 2009-07-15 国际商业机器公司 用于实现组合式数据/相关性高速缓存的方法和装置
CN110959154A (zh) * 2017-07-20 2020-04-03 阿里巴巴集团控股有限公司 用于线程本地存储数据访问的私有高速缓存

Families Citing this family (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5553262B1 (en) * 1988-01-21 1999-07-06 Mitsubishi Electric Corp Memory apparatus and method capable of setting attribute of information to be cached
US5404483A (en) * 1990-06-29 1995-04-04 Digital Equipment Corporation Processor and method for delaying the processing of cache coherency transactions during outstanding cache fills
US5490261A (en) * 1991-04-03 1996-02-06 International Business Machines Corporation Interlock for controlling processor ownership of pipelined data for a store in cache
JPH0512117A (ja) * 1991-07-04 1993-01-22 Toshiba Corp キヤツシユ一致化方式
US5875464A (en) * 1991-12-10 1999-02-23 International Business Machines Corporation Computer system with private and shared partitions in cache
US5584017A (en) * 1991-12-19 1996-12-10 Intel Corporation Cache control which inhibits snoop cycles if processor accessing memory is the only processor allowed to cache the memory location
US5813030A (en) * 1991-12-31 1998-09-22 Compaq Computer Corp. Cache memory system with simultaneous access of cache and main memories
US5408653A (en) * 1992-04-15 1995-04-18 International Business Machines Corporation Efficient data base access using a shared electronic store in a multi-system environment with shared disks
US5355471A (en) * 1992-08-14 1994-10-11 Pyramid Technology Corporation Multiprocessor cache coherency tester that exercises the coherency logic exhaustively and also detects errors in a processor using an automatic CPU sort
US5455942A (en) * 1992-10-01 1995-10-03 International Business Machines Corporation Partial page write detection for a shared cache using a bit pattern written at the beginning and end of each page
JP2500101B2 (ja) * 1992-12-18 1996-05-29 インターナショナル・ビジネス・マシーンズ・コーポレイション 共用変数の値を更新する方法
JP2746530B2 (ja) * 1993-01-30 1998-05-06 洲 植 全 共有メモリマルチプロセッサ
US5642486A (en) * 1993-07-15 1997-06-24 Unisys Corporation Invalidation queue with "bit-sliceability"
US5598551A (en) * 1993-07-16 1997-01-28 Unisys Corporation Cache invalidation sequence system utilizing odd and even invalidation queues with shorter invalidation cycles
US5636365A (en) * 1993-10-05 1997-06-03 Nec Corporation Hierarchical buffer memories for selectively controlling data coherence including coherence control request means
US5530832A (en) * 1993-10-14 1996-06-25 International Business Machines Corporation System and method for practicing essential inclusion in a multiprocessor and cache hierarchy
GB2284494B (en) * 1993-11-26 1998-09-09 Hitachi Ltd Distributed shared memory management system
US5519846A (en) * 1993-12-23 1996-05-21 Unisys Corporation Multiprocessor system with scheme for managing allocation and reservation of cache segments in a cache system employing round-robin replacement and exclusive access
US5590309A (en) * 1994-04-01 1996-12-31 International Business Machines Corporation Storage protection cache and backing storage having system control element data cache pipeline and storage protection bits in a stack array with a stack directory for the stack array
JP2634147B2 (ja) * 1994-09-16 1997-07-23 インターナショナル・ビジネス・マシーンズ・コーポレイション コンピュータシステム、キャッシュヒットの判定方法
US5584013A (en) * 1994-12-09 1996-12-10 International Business Machines Corporation Hierarchical cache arrangement wherein the replacement of an LRU entry in a second level cache is prevented when the cache entry is the only inclusive entry in the first level cache
US5692153A (en) * 1995-03-16 1997-11-25 International Business Machines Corporation Method and system for verifying execution order within a multiprocessor data processing system
US5752264A (en) * 1995-03-31 1998-05-12 International Business Machines Corporation Computer architecture incorporating processor clusters and hierarchical cache memories
JP2776759B2 (ja) * 1995-04-14 1998-07-16 甲府日本電気株式会社 ロックリクエスト制御装置
US5634110A (en) * 1995-05-05 1997-05-27 Silicon Graphics, Inc. Cache coherency using flexible directory bit vectors
US5875462A (en) * 1995-12-28 1999-02-23 Unisys Corporation Multi-processor data processing system with multiple second level caches mapable to all of addressable memory
US5787477A (en) * 1996-06-18 1998-07-28 International Business Machines Corporation Multi-processor cache coherency protocol allowing asynchronous modification of cache data
US5752258A (en) * 1996-07-01 1998-05-12 Sun Microsystems, Inc. Encoding method for directory state in cache coherent distributed shared memory system
US5950226A (en) * 1996-07-01 1999-09-07 Sun Microsystems, Inc. Multiprocessing system employing a three-hop communication protocol
US5813029A (en) 1996-07-09 1998-09-22 Micron Electronics, Inc. Upgradeable cache circuit using high speed multiplexer
US5963978A (en) * 1996-10-07 1999-10-05 International Business Machines Corporation High level (L2) cache and method for efficiently updating directory entries utilizing an n-position priority queue and priority indicators
US5893161A (en) * 1996-11-12 1999-04-06 Hewlett-Packard Co. Method for allocating ownership of portions of memory in a coherent memory system
US5946710A (en) * 1996-11-14 1999-08-31 Unisys Corporation Selectable two-way, four-way double cache interleave scheme
US6078997A (en) * 1996-12-09 2000-06-20 Intel Corporation Directory-based coherency system for maintaining coherency in a dual-ported memory system
US5848434A (en) * 1996-12-09 1998-12-08 Intel Corporation Method and apparatus for caching state information within a directory-based coherency memory system
US5960455A (en) * 1996-12-30 1999-09-28 Unisys Corporation Scalable cross bar type storage controller
US5875201A (en) * 1996-12-30 1999-02-23 Unisys Corporation Second level cache having instruction cache parity error control
US6122711A (en) 1997-01-07 2000-09-19 Unisys Corporation Method of and apparatus for store-in second level cache flush
US5860093A (en) * 1997-01-21 1999-01-12 Unisys Corporation Reduced instruction processor/storage controller interface
US5900009A (en) * 1997-03-21 1999-05-04 Emc Corporation System and method for accessing records in a cache slot which are associated with a current owner storage element or at least one previous owner storage element
US6119197A (en) * 1997-10-31 2000-09-12 Micron Technology, Inc. Method for providing and operating upgradeable cache circuitry
US6052760A (en) * 1997-11-05 2000-04-18 Unisys Corporation Computer system including plural caches and utilizing access history or patterns to determine data ownership for efficient handling of software locks
US6633958B1 (en) * 1997-11-17 2003-10-14 Silicon Graphics, Inc. Multiprocessor computer system and method for maintaining cache coherence utilizing a multi-dimensional cache coherence directory structure
US6012127A (en) * 1997-12-12 2000-01-04 Intel Corporation Multiprocessor computing apparatus with optional coherency directory
US6334172B1 (en) * 1998-02-17 2001-12-25 International Business Machines Corporation Cache coherency protocol with tagged state for modified values
US6341336B1 (en) * 1998-02-17 2002-01-22 International Business Machines Corporation Cache coherency protocol having tagged state used with cross-bars
US6289419B1 (en) 1998-03-06 2001-09-11 Sharp Kabushiki Kaisha Consistency control device merging updated memory blocks
US6625694B2 (en) * 1998-05-08 2003-09-23 Fujitsu Ltd. System and method for allocating a directory entry for use in multiprocessor-node data processing systems
US6295598B1 (en) * 1998-06-30 2001-09-25 Src Computers, Inc. Split directory-based cache coherency technique for a multi-processor computer system
US6493798B2 (en) 1998-09-21 2002-12-10 Micron Technology, Inc. Upgradeable cache circuit using high speed multiplexer
US6378042B1 (en) * 1999-08-11 2002-04-23 Fast-Chip, Inc. Caching associative memory
US6457100B1 (en) 1999-09-15 2002-09-24 International Business Machines Corporation Scaleable shared-memory multi-processor computer system having repetitive chip structure with efficient busing and coherence controls
US6751698B1 (en) * 1999-09-29 2004-06-15 Silicon Graphics, Inc. Multiprocessor node controller circuit and method
JP2001265652A (ja) * 2000-03-17 2001-09-28 Hitachi Ltd キャッシュディレクトリ構成方法および情報処理装置
US6668308B2 (en) * 2000-06-10 2003-12-23 Hewlett-Packard Development Company, L.P. Scalable architecture based on single-chip multiprocessing
US6754859B2 (en) * 2001-01-03 2004-06-22 Bull Hn Information Systems Inc. Computer processor read/alter/rewrite optimization cache invalidate signals
US6647466B2 (en) * 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
US6871268B2 (en) * 2002-03-07 2005-03-22 International Business Machines Corporation Methods and systems for distributed caching in presence of updates and in accordance with holding times
US8185602B2 (en) * 2002-11-05 2012-05-22 Newisys, Inc. Transaction processing using multiple protocol engines in systems having multiple multi-processor clusters
US7082500B2 (en) * 2003-02-18 2006-07-25 Cray, Inc. Optimized high bandwidth cache coherence mechanism
US20050108481A1 (en) * 2003-11-17 2005-05-19 Iyengar Arun K. System and method for achieving strong data consistency
US8332592B2 (en) * 2004-10-08 2012-12-11 International Business Machines Corporation Graphics processor with snoop filter
US7577795B2 (en) * 2006-01-25 2009-08-18 International Business Machines Corporation Disowning cache entries on aging out of the entry
US20080104333A1 (en) * 2006-10-31 2008-05-01 Veazey Judson E Tracking of higher-level cache contents in a lower-level cache
US8683139B2 (en) 2006-10-31 2014-03-25 Hewlett-Packard Development Company, L.P. Cache and method for cache bypass functionality
US7669011B2 (en) * 2006-12-21 2010-02-23 Advanced Micro Devices, Inc. Method and apparatus for detecting and tracking private pages in a shared memory multiprocessor
US7966453B2 (en) 2007-12-12 2011-06-21 International Business Machines Corporation Method and apparatus for active software disown of cache line's exlusive rights
JP2009151457A (ja) * 2007-12-19 2009-07-09 Nec Corp キャッシュメモリシステムおよびキャッシュメモリ制御方法
US8560776B2 (en) * 2008-01-29 2013-10-15 International Business Machines Corporation Method for expediting return of line exclusivity to a given processor in a symmetric multiprocessing data processing system
US8868847B2 (en) * 2009-03-11 2014-10-21 Apple Inc. Multi-core processor snoop filtering
US9075732B2 (en) * 2010-06-15 2015-07-07 International Business Machines Corporation Data caching method
JP5687603B2 (ja) 2011-11-09 2015-03-18 株式会社東芝 プログラム変換装置、プログラム変換方法、および変換プログラム
CN103049392B (zh) * 2012-10-17 2016-04-06 华为技术有限公司 缓存目录的实现方法及装置
JP5936152B2 (ja) 2014-05-17 2016-06-15 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation メモリアクセストレース方法
US10114752B2 (en) 2014-06-27 2018-10-30 International Business Machines Corporation Detecting cache conflicts by utilizing logical address comparisons in a transactional memory
SG11201803730TA (en) * 2015-11-04 2018-06-28 Samsung Electronics Co Ltd Systems and methods for implementing coherent memory in a multiprocessor system
US10073783B2 (en) 2016-11-23 2018-09-11 Advanced Micro Devices, Inc. Dual mode local data store
US10909036B2 (en) * 2018-11-09 2021-02-02 International Business Machines Corporation Management of shared memory using asynchronous invalidation signals
WO2020175720A1 (ko) * 2019-02-28 2020-09-03 엘지전자 주식회사 디지털 디바이스 및 그 제어 방법
CN112579480B (zh) * 2020-12-09 2022-12-09 海光信息技术股份有限公司 存储管理方法、存储管理装置以及计算机系统

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0390538A2 (de) * 1989-03-28 1990-10-03 Kabushiki Kaisha Toshiba Hierarchische Cache-Speichervorrichtung

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0390538A2 (de) * 1989-03-28 1990-10-03 Kabushiki Kaisha Toshiba Hierarchische Cache-Speichervorrichtung

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PROCEEDINGS OF THE NATIONAL COMPUTER CONFERENCE. vol. 45, 1976, AFIPS,MONTVALE US pages 749 - 753; TANG: 'Cache system design in the tightly coupled multiprocessor system' *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6240491B1 (en) 1993-07-15 2001-05-29 Bull S.A. Process and system for switching between an update and invalidate mode for each cache block
WO1995002865A1 (fr) * 1993-07-15 1995-01-26 Bull S.A. Procede de gestion de memoires d'un systeme informatique, systeme informatique et memoire mettant en ×uvre le procede
EP0690383A2 (de) * 1994-06-01 1996-01-03 Fujitsu Limited Speichesteuerverfahren und -vorrichtung, geeignet für ein Informationsverarbeitungssystem mit einem Cachespeicher
EP0690383A3 (de) * 1994-06-01 1996-06-05 Fujitsu Ltd Speichesteuerverfahren und -vorrichtung, geeignet für ein Informationsverarbeitungssystem mit einem Cachespeicher
US5829039A (en) * 1994-06-01 1998-10-27 Fujitsu Limited Memory control method/device for maintaining cache consistency with status bit indicating that a command is being processed with respect to a memory area
EP0735480A1 (de) * 1995-03-31 1996-10-02 Sun Microsystems, Inc. Cache-kohärentes Computersystem, das Entwertungs- und Rückschreiboperationen minimiert
US5706463A (en) * 1995-03-31 1998-01-06 Sun Microsystems, Inc. Cache coherent computer system that minimizes invalidation and copyback operations
WO1996039666A1 (en) * 1995-06-05 1996-12-12 Advanced Micro Devices, Inc. Reducing cache snooping overhead in a multilevel cache system with multiple bus masters and a shared level two cache
US5740400A (en) * 1995-06-05 1998-04-14 Advanced Micro Devices Inc. Reducing cache snooping overhead in a multilevel cache system with multiple bus masters and a shared level two cache by using an inclusion field
US5740399A (en) * 1995-08-23 1998-04-14 International Business Machines Corporation Modified L1/L2 cache inclusion for aggressive prefetch
US5758119A (en) * 1995-08-23 1998-05-26 International Business Machines Corp. System and method for indicating that a processor has prefetched data into a primary cache and not into a secondary cache
EP1021768A1 (de) * 1996-09-16 2000-07-26 Corollary, Inc. System und verfahren zum unterhalten von speicherkohärent in einem rechnersystem mit multiplen systembussen
EP1021768A4 (de) * 1996-09-16 2002-08-14 Corollary Inc System und verfahren zum unterhalten von speicherkohärent in einem rechnersystem mit multiplen systembussen
US6622214B1 (en) 1996-09-16 2003-09-16 Intel Corporation System and method for maintaining memory coherency in a computer system having multiple system buses
EP1684181A2 (de) 2005-01-24 2006-07-26 Fujitsu Limited Speichersteuerungsvorrichtung und Verfahren
EP1684181A3 (de) * 2005-01-24 2009-05-06 Fujitsu Limited Speichersteuerungsvorrichtung und Verfahren
US8032717B2 (en) 2005-01-24 2011-10-04 Fujitsu Limited Memory control apparatus and method using retention tags
CN100514311C (zh) * 2005-02-11 2009-07-15 国际商业机器公司 用于实现组合式数据/相关性高速缓存的方法和装置
US8131936B2 (en) 2005-02-11 2012-03-06 International Business Machines Corporation Method and apparatus for implementing a combined data/coherency cache
CN110959154A (zh) * 2017-07-20 2020-04-03 阿里巴巴集团控股有限公司 用于线程本地存储数据访问的私有高速缓存
CN110959154B (zh) * 2017-07-20 2023-06-13 阿里巴巴集团控股有限公司 用于线程本地存储数据访问的私有高速缓存

Also Published As

Publication number Publication date
JPH0561770A (ja) 1993-03-12
US5265232A (en) 1993-11-23
JPH0775010B2 (ja) 1995-08-09

Similar Documents

Publication Publication Date Title
US5265232A (en) Coherence control by data invalidation in selected processor caches without broadcasting to processor caches not having the data
US5490261A (en) Interlock for controlling processor ownership of pipelined data for a store in cache
US4797814A (en) Variable address mode cache
US5875472A (en) Address conflict detection system employing address indirection for use in a high-speed multi-processor system
US6038647A (en) Cache memory device and method for providing concurrent independent multiple accesses to different subsets within the device
US5450564A (en) Method and apparatus for cache memory access with separate fetch and store queues
US5274790A (en) Cache memory apparatus having a plurality of accessibility ports
US4400770A (en) Cache synonym detection and handling means
US6523091B2 (en) Multiple variable cache replacement policy
EP0088239B1 (de) Mehrprozessor-Pufferspeicher-Ersatz unter Prozesssteuerung
JP2825550B2 (ja) 多重仮想空間アドレス制御方法および計算機システム
US5185871A (en) Coordination of out-of-sequence fetching between multiple processors using re-execution of instructions
US5146603A (en) Copy-back cache system having a plurality of context tags and setting all the context tags to a predetermined value for flushing operation thereof
EP1221095B1 (de) Ein arbitrierungsprotokoll für einen gemeinsamen daten-zwischenspeicher
US5440707A (en) Instruction and data cache with a shared TLB for split accesses and snooping in the same clock cycle
US4631660A (en) Addressing system for an associative cache memory
JPS6043540B2 (ja) デ−タ処理装置
US6571316B1 (en) Cache memory array for multiple address spaces
US6298411B1 (en) Method and apparatus to share instruction images in a virtual cache
US5675765A (en) Cache memory system with independently accessible subdivided cache tag arrays
US5361342A (en) Tag control system in a hierarchical memory control system
EP0675443A1 (de) Vorrichtung und Verfahren zum Zugriff auf direkt abgebildete Cachespeicher
EP0212678B1 (de) Anordnung zur Erkennung und Verarbeitung von Synonymen in Cache-Speichern
JP2008512758A (ja) 仮想アドレス・キャッシュに格納されたデータを共用する仮想アドレス・キャッシュ及び方法
US5966737A (en) Apparatus and method for serialized set prediction

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

17P Request for examination filed

Effective date: 19930218

17Q First examination report despatched

Effective date: 19940405

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19980703