US20050289300A1 - Disable write back on atomic reserved line in a small cache system - Google Patents


Info

Publication number
US20050289300A1
Authority
US
United States
Prior art keywords
write back
reservation
cache
atomic
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/875,953
Inventor
Roy Kim
Yasukichi Okawa
Thuong Truong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
International Business Machines Corp
Original Assignee
Sony Computer Entertainment Inc
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Computer Entertainment Inc, International Business Machines Corp filed Critical Sony Computer Entertainment Inc
Priority to US10/875,953 priority Critical patent/US20050289300A1/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TRUONG, TONY THUONG, KIM, ROY MOONSEUK
Assigned to SONY COMPUTER ENTERTAINMENT INC. reassignment SONY COMPUTER ENTERTAINMENT INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OKAWA, YASUKICHI
Priority to CNA200580020710XA priority patent/CN1985245A/en
Priority to EP05856249A priority patent/EP1769365A2/en
Priority to PCT/IB2005/004003 priority patent/WO2006085140A2/en
Priority to KR1020067027236A priority patent/KR20070040340A/en
Priority to JP2007517534A priority patent/JP2008503821A/en
Publication of US20050289300A1 publication Critical patent/US20050289300A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • G06F12/126Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30072Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087Synchronisation or serialisation instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • G06F9/3834Maintaining memory consistency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means

Abstract

The present invention provides for managing an atomic facility cache write back state machine. A first write back selection is made. A reservation pointer pointing to the reserved line in the atomic facility data array is established. A next write back selection is made. An entry for the reservation pointer for the next write back selection is removed, whereby the valid reservation line is precluded from being selected for the write back. This prevents a modified command from being invalidated.

Description

    TECHNICAL FIELD
  • The invention relates generally to the field of computer systems and, more particularly, to small cache systems in microprocessors.
  • BACKGROUND
  • High-performance processing systems require fast memory access and low memory latency to get data to the processor quickly. Because system memory can be slow to provide data to a processor, caches are designed to keep data close to the processor with faster access times. Larger caches give better overall system performance but can introduce more latency and design complexity than smaller caches. Generally, smaller caches are designed to provide a fast way for a processor to synchronize or communicate with other processors at the application level, especially in networking or graphics environments.
  • Processors send data to memory and retrieve data from memory through Store and Load commands, respectively. Data from system memory fills up the cache. A desirable condition is one where most or all of the data to be accessed by the processor is in the cache. This can happen if the application's data set is the same size as or smaller than the cache. In practice, cache size is limited by design or technology and cannot hold the whole application data set. This becomes a problem when the processor accesses new data that is not in the cache and no cache space is available for it. The cache controller must therefore find an appropriate location in the cache for the new data when it arrives from memory.
  • An LRU (Least Recently Used) algorithm is used by the cache controller to handle this situation. The LRU algorithm determines which location to use for the new data based on access-history information. If LRU selects a line that is consistent with system memory, for example one in the Shared state, the new data is simply written over that location. When LRU selects a line that is marked Modified, meaning its data is not consistent with system memory and is the only copy, the cache controller forces the Modified data at that location to be written back to system memory. This action is called a write back, or castout, and the cache location that holds the write back data is called the victim cache line.
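  • As background illustration only, the following C sketch shows how a simple controller might pick an LRU victim and decide whether a castout is required; the type and function names (cache_line_t, find_lru_victim, write_back_to_memory) are invented for this sketch and do not come from the patent.

```c
#include <stdint.h>

/* MESI-style line states (see the MESI discussion below). */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state_t;

typedef struct {
    uint64_t     tag;
    line_state_t state;
    uint32_t     lru_age;    /* larger value = less recently used */
} cache_line_t;

/* Stand-in for the long-latency bus operation that copies a Modified
 * line back to system memory (the castout). */
static void write_back_to_memory(cache_line_t *victim)
{
    (void)victim;            /* bus write omitted in this sketch */
}

/* Pick the least recently used way in the set as the victim line. */
static int find_lru_victim(const cache_line_t *set, int num_ways)
{
    int victim = 0;
    for (int way = 1; way < num_ways; way++)
        if (set[way].lru_age > set[victim].lru_age)
            victim = way;
    return victim;
}

/* On a miss with no free way: evict the LRU victim, writing it back
 * first only if it is Modified (i.e. not consistent with memory). */
static void replace_line(cache_line_t *set, int num_ways, uint64_t new_tag)
{
    int victim = find_lru_victim(set, num_ways);
    if (set[victim].state == MODIFIED)
        write_back_to_memory(&set[victim]);   /* the write back / castout */
    set[victim].tag     = new_tag;
    set[victim].state   = SHARED;             /* reloaded data, assumed shared */
    set[victim].lru_age = 0;                  /* now most recently used */
}
```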
  • A bus agent, the bus interface unit that handles bus commands for the cache, attempts to complete the write back operation as soon as it can by sending the data to system memory via bus operations. A write back ("WB") is a long-latency bus operation, since the data travels all the way to main memory.
  • There are two kinds of cache control schemes: coherent and non-coherent. In a non-coherent scheme, each cache has a unique copy of the data, and no other cache may hold the same data. This approach is relatively easy to implement, but it is inefficient, because there are times when data should be distributed throughout a multiprocessor system. A coherent cache scheme can therefore be used, which ensures that the most up-to-date data is used, distributed, or otherwise marked as valid.
  • One conventional technology that enforces coherency is the Modified, Exclusive, Shared, and Invalid (MESI) system. In MESI, each line of data in a cache in a multiprocessor system is marked with one of these four states to ensure data coherency. The marking is done by hardware, namely the memory flow controller.
  • Snooping is the process whereby slave caches watch the system bus and compare the transferred address to addresses in the cache directory in order to maintain cache coherency. Additional operations can be performed when a match is found. The terms bus snooping and bus watching are equivalent.
  • An invalidate command, used as part of a snoop command, is issued to tell the other caches that their copy of the data is no longer valid and that they should mark that line invalid. In other words, the Invalid state indicates that the line in the cache is invalid or no longer available. Such a line is therefore free to be overwritten by other data transfers.
  • In a multi-processor system, some operations such as test-and-set, compare-and-swap, or fetch-and-increment (or decrement) need to be processed inseparably; that is, no other store to the same address can occur in between. These operations are so-called atomic operations, and they are generally used for lock acquisition or semaphore operations. Some implementations provide only small building blocks such as LL (Load-Locked) and SC (Store-Conditional) from which such higher-level operations are built. Some processors introduce a reservation flag to tie the two operations (LL and SC) together atomically: LL sets up a reservation for the lock variable, and SC succeeds only if that reservation still remains. Any store operation to the same address can reset the reservation flag.
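  • As a rough illustration of how a larger atomic operation is built from the LL/SC building blocks, the C sketch below implements fetch-and-increment; load_linked() and store_conditional() are hypothetical wrappers standing in for the hardware instructions and are not defined by the patent.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical wrappers around the hardware LL/SC instructions.
 * load_linked() loads a word and sets the reservation for its address;
 * store_conditional() stores only if the reservation is still held,
 * returning true on success. */
extern uint32_t load_linked(volatile uint32_t *addr);
extern bool     store_conditional(volatile uint32_t *addr, uint32_t value);

/* Fetch-and-increment built from the LL/SC building blocks: retry until
 * the conditional store succeeds, i.e. no other processor stored to the
 * same address between the LL and the SC. */
uint32_t fetch_and_increment(volatile uint32_t *counter)
{
    uint32_t old;
    do {
        old = load_linked(counter);                   /* sets the reservation        */
    } while (!store_conditional(counter, old + 1));   /* fails if reservation is lost */
    return old;
}
```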
  • In general, an Atomic Facility is implemented at a coherency point, like a snoop cache, to snoop other processors' store operations and also to improve performance by caching a lock line. When performing atomic line data requests, there are a number of different commands. The first is the Load and Reserve instruction. Load and Reserve is issued by a source processor and looks at its associated cache to determine whether the cache holds the requested data. If the cache has the data, a "reservation" flag is set for that cache. The reservation flag means that the processor is making a reservation on that line for lock acquisition. In other words, lock acquisition (gaining sole ownership) of a block of data in main memory is accomplished by first making a reservation using Load and Reserve, and then modifying the reserved line to indicate ownership via the Store Conditional instruction. Store Conditional succeeds only if the reservation flag is still active. The reservation can be lost when other processors wanting the same lock execute a Store Conditional instruction or other reservation-kill type snoop commands on the same line. The processor then copies the reserved data from the cache into the processor as part of processing the Load and Reserve. Essentially, the processor is looking for an unlocked data pattern in the reserved line so that Store Conditional can be executed to complete the lock.
  • However, if the cache does not have the information, a bus command is generated to try to get it. If no other cache has the information, the data is retrieved from main memory. Once the data is received, the reservation flag is set.
  • Because lock acquisition is a tight loop and the same lock is likely to be used again in normal programming, the reserved line from a first lock acquisition loop is needed for future lock acquisitions. Hence, the reserved data from the Load and Reserve instruction should not be written back to main memory as a write back, since ownership of the same data is needed for the subsequent lock acquisition loop. Avoiding the write back improves performance, since writing the reserved line back and then reloading the same data from main memory is eliminated.
  • Therefore, there is a need for an atomic facility that addresses at least some of the problems associated with conventional atomic reservations.
  • SUMMARY OF THE INVENTION
  • The present invention provides for managing an atomic facility cache write back controller. A reservation pointer pointing to the reserved line in the atomic facility cache data array is established. An entry for the reservation pointer for the write back selection is removed, whereby the valid reservation line is precluded from being selected for the write back. In one aspect, the write back selection is made by employment of a least recently used (LRU) algorithm. In a further aspect, the write back selection is made with respect to the reservation pointer.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following Detailed Description taken in conjunction with the accompanying drawings, in which:
  • FIG. 1 schematically depicts a multi-processing system;
  • FIG. 2 schematically depicts an atomic facility cache;
  • FIG. 3 schematically illustrates a Lock acquisition command example;
  • FIG. 4 illustrates a write back operation flow chart; and
  • FIG. 5 illustrates an example block diagram of an atomic facility cache.
  • DETAILED DESCRIPTION
  • In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced without such specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail. Additionally, for the most part, details concerning network communications, electro-magnetic signaling techniques, and the like, have been omitted inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.
  • In the remainder of this description, a processing unit (PU) may be a sole processor of computations in a device. In such a situation, the PU is typically referred to as an MPU (main processing unit). The processing unit may also be one of many processing units that share the computational load according to some methodology or algorithm developed for a given computational device. For the remainder of this description, all references to processors shall use the term MPU whether the MPU is the sole computational element in the device or whether the MPU is sharing the computational element with other MPUs, unless otherwise indicated.
  • It is further noted that, unless indicated otherwise, all functions described herein may be performed in either hardware or software, or some combination thereof. In a preferred embodiment, however, the functions are performed by a processor, such as a computer or an electronic data processor, in accordance with code, such as computer program code, software, and/or integrated circuits that are coded to perform such functions, unless indicated otherwise.
  • Turning to FIG. 1, disclosed is a multi-processor system 100 with general central processing units (MPU1) 110 and (MPU2) 111, each of which can include an instruction unit, instruction cache, data cache, fixed point unit, floating point unit, local storage, and so on. Each processor is connected to a lower-level cache called an Atomic Facility (AF). The Atomic Facility caches (AF1 Cache) 120 and (AF2 Cache) 121 are connected to Bus Interface units (Bus IF) 130 and 131, which in turn connect to the System Bus 140. Other processors' caches are connected to the system bus via bus interface units to enable inter-processor communication. In addition to the processors, a memory controller (Mem Ctr1) 150 is attached to the system bus 140. A System Memory 151 is connected to the memory controller as common storage shared by the multiple processors.
  • Generally, the system 100 provides a mechanism to disable the write back operation on the line reserved by a Load and Reserve instruction of the lock acquisition software loop. The reserved line from the Load and Reserve instruction is used by the subsequent Store Conditional instruction in this lock acquisition loop. Hence, keeping the reserved line in the cache, instead of writing it back to memory and bringing it back, gives better performance. By using pointers, the victim line for write back is selected by the LRU algorithm, and the reservation line is never selected because the algorithm skips over the line identified by the reservation pointer.
  • Turning now to FIG. 2, the atomic facility 142 (hereafter referred to variably as "atomic facility" or "AF 142") is disclosed in more detail. The atomic facility includes data array circuitry 146 for the data array and its control logic. The control logic includes a directory 147, an RC (Read and Claim) finite state machine 143 to handle instructions from the processor core, a WB (write back) state machine 144 to handle write backs, and a Snoop state machine 145. The directory 147 holds the cache tags and their states.
  • The RC machine 143 executes the atomic instructions, Load and Reserve and Store Conditional, for inter-processor synchronization. The general purpose of this series of instructions is to synchronize operations between processors by giving ownership of common data to one processor at a time, in an orderly fashion, in a multi-processor system. The WB machine 144 handles write backs for the RC machine when a cache miss occurs for a load or store operation issued by the MPU, the atomic facility (AF) cache is full, and the victim entry is in the Modified state. The Snoop machine 145 handles snoop operations coming from the system bus to maintain memory coherency throughout the system.
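  • For orientation only, the C sketch below models the components named above as plain data structures: a directory of tagged coherency-state entries, a reservation pointer maintained by the RC machine, and busy flags standing in for the WB and Snoop machines. All sizes, type names, and field names are assumptions made for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

#define AF_NUM_LINES  8       /* assumed number of lines in the small AF cache */
#define AF_LINE_BYTES 128     /* assumed cache line size in bytes              */

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* One entry of the directory 147: the tag and its coherency state. */
typedef struct {
    uint64_t tag;
    mesi_t   state;
    uint32_t lru_age;
} af_dir_entry_t;

/* The atomic facility (AF 142). The RC machine's reservation is modeled
 * as a valid bit plus a pointer to the reserved line; the WB machine 144
 * and Snoop machine 145 are reduced to simple busy flags. */
typedef struct {
    uint8_t        data[AF_NUM_LINES][AF_LINE_BYTES]; /* data array 146          */
    af_dir_entry_t dir[AF_NUM_LINES];                 /* directory 147           */
    bool           reservation_valid;                 /* set by Load and Reserve */
    int            reservation_line;                  /* the reservation pointer */
    bool           wb_machine_busy;                   /* WB machine 144          */
    bool           snoop_machine_busy;                /* Snoop machine 145       */
} atomic_facility_t;
```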
  • Turning now to FIG. 3, illustrated is an example of a lock acquisition scenario between two processors in a multi-processor system. The lock acquisition operation entails two main atomic instructions: a Load and Reserve atomic instruction and a Store Conditional atomic instruction.
  • In the lock acquisition scenario, MPU1 first loops on a Load and Reserve at address "A" until the released-lock data pattern, zeros for simplicity, is loaded. During this instruction, a reservation flag is set with the reservation address in the RC machine. Once the lock is released by another processor, MPU1 can continue to the next instruction, Store Conditional at "A". This step finalizes the lock by storing the processor's ID into the atomic line at address "A". However, this store is conditional on the reservation flag still being active; another processor could have issued a store to acquire the same lock right before this Store Conditional instruction.
  • Since the cache coherency protocol is engaged on the Atomic Facility cache, such a store is observed by snooping: a cache-line-kill or read-exclusive snoop command arrives on the same lock line address, which kills the current reservation.
  • Once the lock is achieved by a successful Store Conditional, the reservation flag is reset. If lock acquisition is unsuccessful, the processor restarts from Load and Reserve. After a successful acquisition, the processor has full ownership of the common storage area to do its work. During this time, other processors are locked out of any access to the common area. Once the work is completed, the processor releases the lock by storing '0' to address "A." At this point a second processor, MPU2, can attain the lock when it acquires the latest "A" data for its Load and Reserve instruction and sees the zero data pattern. The second processor then continues with the Store Conditional instruction to finalize the lock, as described above for the first processor.
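  • A minimal C sketch of this lock acquisition scenario is given below, reusing the hypothetical load_linked()/store_conditional() wrappers from the earlier LL/SC example; the encoding (0 for a free lock, the processor ID for a held lock) follows the scenario above, and the function names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical wrappers for the hardware Load and Reserve / Store
 * Conditional instructions, as in the earlier sketch. */
extern uint32_t load_linked(volatile uint32_t *addr);
extern bool     store_conditional(volatile uint32_t *addr, uint32_t value);

#define LOCK_FREE 0u   /* released-lock data pattern ("zeros for simplicity") */

/* Spin until the lock at address "A" is free, then claim it by storing
 * this processor's ID. A failed Store Conditional (lost reservation)
 * simply restarts the loop from Load and Reserve. */
void acquire_lock(volatile uint32_t *lock_A, uint32_t my_processor_id)
{
    for (;;) {
        /* Loop on Load and Reserve until the released pattern is seen. */
        while (load_linked(lock_A) != LOCK_FREE)
            ;   /* another processor still owns the lock */

        /* Finalize the lock; fails if the reservation was killed. */
        if (store_conditional(lock_A, my_processor_id))
            return;
    }
}

/* Release the lock by storing the free pattern back to address "A". */
void release_lock(volatile uint32_t *lock_A)
{
    *lock_A = LOCK_FREE;
}
```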
  • Software tends to reuse the same lock line, because in many cases lock acquisition is done in a loop structure. It is therefore a good idea to preserve the previously reserved line: synchronization performance is critical for multi-processor communication, and once the lock line is invalidated from the local cache, atomic instructions suffer a serious performance degradation.
  • Turning now to FIG. 4, illustrated is one embodiment of a method 400 of write back operation. Generally, the method 400 describes the decision-making process for a write back, that is, whether a write back is needed or not. In this example implementation the atomic facility (AF 142) has only one write back (WB) machine.
  • A write back request is dispatched by the 'read and claim' (RC) machine when a load or store instruction and a directory lookup occur. In step 402, it is determined whether there was an RC miss on the DIR (Directory) lookup with no room left in the AF. If not, then in step 407 it is determined that a write back is not needed, and the method ends.
  • In step 403, the RC dispatches the WB machine right after the DIR lookup 301 finds a miss with no empty space (302 and 303) in the data array. If there is an empty space in the data array, then a write back is not needed. If there is no empty space, step 404 executes.
  • In step 404, the victim entry is chosen by the least recently used algorithm. If the designated least-recently-used victim entry 404 is modified, the WB machine has to write the modified line 405 back to memory in order to make room in the AF.
  • In step 405, it is determined whether the victim entry is modified. If not, step 407 executes and a write back is deemed not to be needed. The WB machine selects the victim entry using a Least Recently Used algorithm that is modified to skip over the reservation entry. It then stores the victim entry to memory to complete the write back operation 406.
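  • The decision process of method 400, including the reservation skip, can be summarized by the following C sketch; it assumes the hypothetical atomic_facility_t layout from the FIG. 2 sketch, and the step numbers in the comments refer to FIG. 4.

```c
/* Assumes the atomic_facility_t, af_dir_entry_t, AF_NUM_LINES and mesi_t
 * definitions from the FIG. 2 sketch above. */
#include <stdbool.h>

/* Stand-in for the long-latency bus write of a Modified line (step 406). */
static void castout_to_memory(atomic_facility_t *af, int line)
{
    (void)af; (void)line;    /* bus operation omitted in this sketch */
}

/* Victim selection (step 404): LRU over all lines, skipping the line the
 * reservation pointer identifies so the reserved line is never evicted. */
static int select_victim_skip_reservation(const atomic_facility_t *af)
{
    int victim = -1;
    for (int line = 0; line < AF_NUM_LINES; line++) {
        if (af->reservation_valid && line == af->reservation_line)
            continue;                          /* never pick the reserved line */
        if (victim < 0 || af->dir[line].lru_age > af->dir[victim].lru_age)
            victim = line;
    }
    return victim;
}

/* Method 400: decide on an RC miss whether a write back is needed. */
bool handle_rc_miss(atomic_facility_t *af)
{
    /* Steps 402/403: if any line is empty, no write back is needed. */
    for (int line = 0; line < AF_NUM_LINES; line++)
        if (af->dir[line].state == INVALID)
            return false;                      /* step 407: room available */

    int victim = select_victim_skip_reservation(af);   /* step 404 */
    if (af->dir[victim].state != MODIFIED)
        return false;                          /* step 405 -> 407: clean victim */

    castout_to_memory(af, victim);             /* step 406: complete the WB */
    af->dir[victim].state = INVALID;           /* line now free for the reload */
    return true;
}
```

  • The only difference from an ordinary LRU write back policy in this sketch is the skip over the reserved line, which is what disables the write back on the atomic reserved line.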
  • Turning now to FIG. 5, illustrated is a system 500 to manage the Atomic Facility 120. A reservation pointer points to the cache line in the Atomic Facility data cache where the reservation exists. A victim pointer is used to write back a modified entry when there is a miss from an atomic instruction; the victim pointer denotes which information is to be written back out of the atomic cache while the missed data is being reloaded. Since the LRU algorithm never selects the line identified by the reservation pointer as the victim, the Load and Reserve data is never written back to memory, and it remains available for the subsequent Store Conditional instruction. This capability therefore improves the overall performance of atomic operations in the Atomic Facility cache.
  • It is understood that the present invention can take many forms and embodiments. Accordingly, several variations may be made in the foregoing without departing from the spirit or the scope of the invention. The capabilities outlined herein allow for the possibility of a variety of programming models. This disclosure should not be read as preferring any particular programming model, but is instead directed to the underlying mechanisms on which these programming models can be built.
  • Having thus described the present invention by reference to certain of its preferred embodiments, it is noted that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are contemplated in the foregoing disclosure and, in some instances, some features of the present invention may be employed without a corresponding use of the other features. Many such variations and modifications may be considered desirable by those skilled in the art based upon a review of the foregoing description of preferred embodiments. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.

Claims (6)

1. A method of managing an atomic facility cache write back controller, comprising:
establishing a reservation pointer pointing to the reserved line in the atomic facility data array;
making a write back selection; and
removing an entry for the reservation pointer for the write back selection whereby the reservation line is precluded from being selected for the write back.
2. The method of claim 1, wherein making a write back selection further comprises employing a victim entry selection function.
3. The method of claim 2, wherein the victim entry selection function comprises a least recently used algorithm.
4. A system for performing a write back to a cache, comprising:
an atomic facility cache having an atomic facility cache data array;
a reservation pointer configured to point to a reserved line in the atomic facility cache data array;
a victim entry selection mechanism configured to make a next write back selection, wherein the victim entry selection mechanism is further configured so that the reservation line is precluded from being selected for the write back when a valid write back entry is being selected.
5. A computer program product for managing an atomic facility cache write back controller, the computer program product having a medium with a computer program embodied thereon, the computer program comprising: computer code for establishing a reservation pointer pointing to the reserved line in the atomic facility data array;
computer code for making a write back selection; and
computer code for removing an entry for the reservation pointer for the write back selection whereby the valid reservation line is precluded from being selected for the write back.
6. A processor for managing an atomic facility cache write back controller, the processor including a computer program comprising:
computer code for establishing a reservation pointer pointing to the reserved line in the atomic facility data array;
computer code for making a write back selection; and
computer code for removing an entry for the reservation pointer for the write back selection whereby the valid reservation line is precluded from being selected for the write back.
US10/875,953 2004-06-24 2004-06-24 Disable write back on atomic reserved line in a small cache system Abandoned US20050289300A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US10/875,953 US20050289300A1 (en) 2004-06-24 2004-06-24 Disable write back on atomic reserved line in a small cache system
CNA200580020710XA CN1985245A (en) 2004-06-24 2005-06-09 Disable write back on atomic reserved line in a small cache system
EP05856249A EP1769365A2 (en) 2004-06-24 2005-06-09 Disable write back on atomic reserved line in a small cache system
PCT/IB2005/004003 WO2006085140A2 (en) 2004-06-24 2005-06-09 Disable write back on atomic reserved line in a small cache system
KR1020067027236A KR20070040340A (en) 2004-06-24 2005-06-09 Disable write back on atomic reserved line in a small cache system
JP2007517534A JP2008503821A (en) 2004-06-24 2005-06-09 Method and system for invalidating writeback on atomic reservation lines in a small capacity cache system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/875,953 US20050289300A1 (en) 2004-06-24 2004-06-24 Disable write back on atomic reserved line in a small cache system

Publications (1)

Publication Number Publication Date
US20050289300A1 true US20050289300A1 (en) 2005-12-29

Family

ID=35507435

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/875,953 Abandoned US20050289300A1 (en) 2004-06-24 2004-06-24 Disable write back on atomic reserved line in a small cache system

Country Status (6)

Country Link
US (1) US20050289300A1 (en)
EP (1) EP1769365A2 (en)
JP (1) JP2008503821A (en)
KR (1) KR20070040340A (en)
CN (1) CN1985245A (en)
WO (1) WO2006085140A2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070043915A1 (en) * 2005-08-17 2007-02-22 Sun Microsystems, Inc. Conditional multistore synchronization mechanisms
US7680989B2 (en) 2005-08-17 2010-03-16 Sun Microsystems, Inc. Instruction set architecture employing conditional multistore synchronization
US20110004731A1 (en) * 2008-03-31 2011-01-06 Panasonic Corporation Cache memory device, cache memory system and processor system
WO2014102646A1 (en) * 2012-12-26 2014-07-03 Telefonaktiebolaget L M Ericsson (Publ) Atomic write and read microprocessor instructions
US20150012711A1 (en) * 2013-07-04 2015-01-08 Vakul Garg System and method for atomically updating shared memory in multiprocessor system
US20220197813A1 (en) * 2020-12-23 2022-06-23 Intel Corporation Application programming interface for fine grained low latency decompression within processor core

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7689771B2 (en) * 2006-09-19 2010-03-30 International Business Machines Corporation Coherency management of castouts
JP2011028736A (en) * 2009-07-02 2011-02-10 Fujitsu Ltd Cache memory device, arithmetic processing unit, and control method for the cache memory device
JP5828324B2 (en) * 2011-01-18 2015-12-02 日本電気株式会社 Multiprocessor system, multiprocessor control method, and processor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5958035A (en) * 1997-07-31 1999-09-28 Advanced Micro Devices, Inc. State machine based bus cycle completion checking in a bus bridge verification system
US6145057A (en) * 1997-04-14 2000-11-07 International Business Machines Corporation Precise method and system for selecting an alternative cache entry for replacement in response to a conflict between cache operation requests
US6212605B1 (en) * 1997-03-31 2001-04-03 International Business Machines Corporation Eviction override for larx-reserved addresses

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB8802102D0 (en) * 1988-01-30 1988-02-24 Int Computers Ltd Cache memory

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6212605B1 (en) * 1997-03-31 2001-04-03 International Business Machines Corporation Eviction override for larx-reserved addresses
US6145057A (en) * 1997-04-14 2000-11-07 International Business Machines Corporation Precise method and system for selecting an alternative cache entry for replacement in response to a conflict between cache operation requests
US5958035A (en) * 1997-07-31 1999-09-28 Advanced Micro Devices, Inc. State machine based bus cycle completion checking in a bus bridge verification system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070043915A1 (en) * 2005-08-17 2007-02-22 Sun Microsystems, Inc. Conditional multistore synchronization mechanisms
US7480771B2 (en) * 2005-08-17 2009-01-20 Sun Microsystems, Inc. Conditional synchronization mechanisms allowing multiple store operations to become visible while a flagged memory location is owned and remains unchanged
US7680989B2 (en) 2005-08-17 2010-03-16 Sun Microsystems, Inc. Instruction set architecture employing conditional multistore synchronization
US20110004731A1 (en) * 2008-03-31 2011-01-06 Panasonic Corporation Cache memory device, cache memory system and processor system
WO2014102646A1 (en) * 2012-12-26 2014-07-03 Telefonaktiebolaget L M Ericsson (Publ) Atomic write and read microprocessor instructions
US20150012711A1 (en) * 2013-07-04 2015-01-08 Vakul Garg System and method for atomically updating shared memory in multiprocessor system
US20220197813A1 (en) * 2020-12-23 2022-06-23 Intel Corporation Application programming interface for fine grained low latency decompression within processor core
EP4020223A1 (en) * 2020-12-23 2022-06-29 Intel Corporation Increasing per core memory bandwidth by using forget stores

Also Published As

Publication number Publication date
JP2008503821A (en) 2008-02-07
WO2006085140A3 (en) 2007-08-16
CN1985245A (en) 2007-06-20
KR20070040340A (en) 2007-04-16
EP1769365A2 (en) 2007-04-04
WO2006085140A2 (en) 2006-08-17

Similar Documents

Publication Publication Date Title
US6839816B2 (en) Shared cache line update mechanism
US11119923B2 (en) Locality-aware and sharing-aware cache coherence for collections of processors
US7814281B2 (en) Method to provide atomic update primitives in an asymmetric heterogeneous multiprocessor environment
US9390026B2 (en) Synchronizing access to data in shared memory
US8296519B2 (en) Synchronizing access to data in shared memory via upper level cache queuing
US5895495A (en) Demand-based larx-reserve protocol for SMP system buses
US7475191B2 (en) Processor, data processing system and method for synchronizing access to data in shared memory
US7213248B2 (en) High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US7254678B2 (en) Enhanced STCX design to improve subsequent load efficiency
US8924653B2 (en) Transactional cache memory system
US5043886A (en) Load/store with write-intent for write-back caches
US7228385B2 (en) Processor, data processing system and method for synchronizing access to data in shared memory
US7484042B2 (en) Data processing system and method for predictively selecting a scope of a prefetch operation
US7549025B2 (en) Efficient marking of shared cache lines
EP0608622A1 (en) Multiprocessor system with multilevel caches
WO2006085140A2 (en) Disable write back on atomic reserved line in a small cache system
US7200717B2 (en) Processor, data processing system and method for synchronizing access to data in shared memory
WO2001050274A1 (en) Cache line flush micro-architectural implementation method and system
JP2006323845A (en) Processor, data processing system, and method for initializing memory block
US7197604B2 (en) Processor, data processing system and method for synchronzing access to data in shared memory
US20050144397A1 (en) Method and apparatus for enabling volatile shared data across caches in a coherent memory multiprocessor system to reduce coherency traffic
EP0380861A2 (en) Improved data consistency between cache memories and the main memory in a multi-processor computer system
GB2401227A (en) Cache line flush instruction and method
IE901532A1 (en) Improved scheme for insuring data consistency between a¹plurality of cache memories and the main memory in a¹multiprocessor computer system

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY COMPUTER ENTERTAINMENT INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OKAWA, YASUKICHI;REEL/FRAME:015095/0617

Effective date: 20040617

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIM, ROY MOONSEUK;TRUONG, TONY THUONG;REEL/FRAME:015095/0582;SIGNING DATES FROM 20040119 TO 20040120

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION