US20130014120A1 - Fair Software Locking Across a Non-Coherent Interconnect - Google Patents
- Publication number
- US20130014120A1 (application US 13/179,344)
- Authority
- US
- United States
- Prior art keywords
- resource
- owner
- shared
- hardware
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/52—Indexing scheme relating to G06F9/52
- G06F2209/522—Manager
Definitions
- Locks are typically used to limit access to a shared resource to one process at a time, preventing multiple users from concurrently modifying the same shared data. For example, a group of processes may each have to acquire a lock before accessing a particular shared resource. When one process has acquired the lock, none of the other processes can acquire it, which gives the process that acquired the lock exclusive access to and control of the shared resource.
- The ability to acquire the lock may depend in part on how quickly an execution unit can reach the lock and how often it retries when a first attempt is unsuccessful.
- An execution unit that is remote from other execution units may be at a disadvantage because its lock acquisition signals incur a longer transmission delay than those of closer execution units. If two units begin an attempt to acquire the lock at approximately the same time, the closer execution unit is likely to have its request arrive first, and requests from the farther execution unit are likely to arrive too late. Additionally, when an execution unit cannot acquire a lock that is already in use by another device, it may back off for a period and reattempt to acquire the lock later. In the meantime, other devices may acquire the lock before the execution unit has reattempted. As a result, if a number of other devices are attempting to acquire the lock, the execution unit may have difficulty acquiring it in a timely manner.
- Access to a shared resource by a plurality of execution units is organized and controlled by issuing a ticket to each execution unit as it requests access to the resource.
- The tickets are issued by a hardware atomic unit, so each execution unit receives a unique ticket number.
- A current owner field indicates the ticket number of the execution unit that currently has access to the shared resource. When an execution unit has completed its access, it releases the shared resource and increments the owner field. Execution units awaiting access periodically check the current value of the owner field and take control of the shared resource when their respective ticket values match it.
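The ticket and owner fields described above can be sketched in software. The sketch below is a minimal Python illustration, not the patent's implementation: the hardware atomic unit is simulated with a lock-protected counter, and the class, method, and attribute names are invented for illustration.

```python
import threading

class TicketLock:
    """Minimal sketch of the ticket/owner mechanism (names are illustrative)."""

    def __init__(self):
        self._next_ticket = 0                   # next ticket value T to hand out
        self._ticket_mutex = threading.Lock()   # stands in for the hardware atomic unit
        self.owner = 0                          # owner field O ("now serving")

    def take_ticket(self):
        # Atomically return the current ticket value and increment the
        # counter, as the hardware atomic unit would.
        with self._ticket_mutex:
            t = self._next_ticket
            self._next_ticket += 1
            return t

    def release(self):
        # At concurrency level one the owner field is protected by the
        # resource itself, so a plain increment suffices.
        self.owner += 1

lock = TicketLock()
t1 = lock.take_ticket()                   # first requester
t2 = lock.take_ticket()                   # second requester
first_has_access = (t1 == lock.owner)     # t1 matches O, so it may proceed
second_has_access = (t2 == lock.owner)    # t2 must wait its turn
lock.release()                            # first requester finishes, O advances
second_after_release = (t2 == lock.owner) # now t2 matches O
```
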
- In one embodiment, multiple execution units may access the shared resource concurrently.
- The execution units determine whether they are allowed to access the shared resource by determining whether their unique ticket number is within a concurrency number of the owner field value.
- The execution units release the shared resource upon completing their required access.
- The execution units increment the owner field value after releasing the shared resource.
- In one embodiment, the execution units identify the last ticket number issued by the hardware atomic unit.
- The execution units compare the last issued ticket number to a number one less than the current value of the owner field. If the two are equal, the execution unit can expect immediate access to the shared resource and, therefore, requests a new unique ticket from the hardware atomic unit. If they are not equal, the execution unit does not expect immediate access and, therefore, does not request a new ticket.
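The last-ticket comparison above can be expressed as a small predicate. In this sketch, `last_issued` and `owner` are plain integers standing in for the hardware values, and the function name is an invention for illustration:

```python
def can_pull_ticket(last_issued, owner):
    """Return True if pulling the next ticket would grant immediate access.

    last_issued -- value of the last ticket handed out by the atomic unit
    owner       -- current value of the owner field O
    """
    # The next ticket to be issued is last_issued + 1, and it grants
    # immediate access exactly when it equals the owner field, i.e. when
    # last_issued == O - 1.
    return last_issued == owner - 1

# Freshly initialized, available resource: O = 0 and no tickets issued yet,
# so the "last issued" value is taken to be -1.
fresh = can_pull_ticket(-1, 0)
# Ticket 3 was the last one issued and its holder still owns the resource
# (O = 3), so a newly pulled ticket (4) would have to wait.
busy = can_pull_ticket(3, 3)
```
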
- FIG. 1 illustrates a multicore processor chip according to an example embodiment
- FIG. 2 illustrates a system, such as a multicore processor, comprising a core running a plurality of applications or threads according to one embodiment
- FIG. 3 is a flowchart illustrating a process for providing fair access to a shared resource
- FIG. 4 is a flowchart illustrating a conditional access process for a shared resource according to one embodiment.
- FIG. 1 illustrates a multicore processor chip 100 having cores 101 . Although only two cores 101 - 1 , 101 - 2 are illustrated, it will be understood that chip 100 may have any number of cores 101 .
- Each core 101 has a processing unit 102 , a cache 103 , and configuration registers 104 .
- Core bus 105 provides a communication medium for the components of core 101 .
- Cores 101 communicate via a chip bus 106 .
- Cores 101 may also access an on-chip memory 107 using chip bus 106 .
- One core 101 - 1 may access and manipulate the cache 103 of another core 101 - 2 .
- Often, intra-core communications on bus 105 will be faster than inter-core communications on bus 106.
- Multicore chip 100 may have a coherency protocol or a locking mechanism to allow multiple cores 101 to manipulate a cache 103 or memory 107 in a coherent and deterministic manner.
- Alternatively, the system of FIG. 1 may be any system with some form of parallel independent processing. It will be understood that the present invention is not limited to applications on a multicore chip.
- Shared data or resources, such as shared memory 107 or shared cache 103, may be required simultaneously by two or more execution units, such as threads, applications, or processes.
- In prior systems, an atomic lock is often used to prevent data collisions when two execution units attempt to access the shared resource at the same time.
- For example, an atomic lock instruction is executed when a first device accesses the shared resource, which prevents other devices from accessing the shared resource or changing the lock state.
- The lock is a hardware atomic primitive that provides mutual exclusion among the execution units.
- An execution unit that requires exclusive access to a shared resource will repeatedly request access until the request is granted.
- The waiting execution unit may use any one of a number of well-known mechanisms to reduce communication resource consumption while requesting access. For example, it may issue a new request at regular intervals, or it may use exponential back-off to determine when to issue new requests.
- A requesting execution unit, such as a processor or thread, may attempt to reduce communication congestion by backing off on its retry interval.
- As the requesting execution unit uses longer periods between attempts to access the resource, it allows other devices more opportunities to acquire the desired resource instead.
- As a result, by backing off, the requesting execution unit is at a disadvantage compared to other requesters whose requests arrive soon after the release of the resource.
- For example, two threads A and B may be waiting for a resource while a third thread C currently owns it.
- Thread A tries to acquire the resource but is denied because the resource is owned by C.
- After a brief interval of trying to access the resource, thread A backs off and waits for a number of cycles before trying again.
- While thread A is waiting to retry its access, thread C releases the resource and thread B begins attempting to access it.
- Thread B, which started its attempts to access the resource after thread A, will acquire the resource before thread A.
- Another problem involves differences in access latencies within hardware implementing the request. For systems with non-uniform access latency among components, requesting execution units that are further away from the atomic lock hardware are at a disadvantage due to propagation delay of the request. As a result, a more remote execution unit may be starved for forward progress by requesters that are closer to the resource.
- For example, three threads A, B, and C may be waiting for a resource, and thread C may have a longer access latency to the resource than either thread A or B. If all three threads contend for the resource, thread A or B is more likely to acquire it than thread C. Moreover, if thread A acquires the resource while threads B and C continue to contend for access, then when A releases the resource, thread B is more likely to acquire it than thread C. Furthermore, if thread A attempts to acquire the resource again before B releases it, then when B releases the resource, thread A will again be more likely to acquire it than thread C because of thread A's proximity. As a result, threads A and B may starve thread C of resource access and may limit thread C's forward progress.
- In one embodiment, requesters' access requests for a shared resource are ordered to make the access process fairer.
- A hardware device dispenses "tickets" that guarantee a spot in a queue of requesting threads.
- An owner field identifies the current owner of the shared resource, like a "now serving" sign, and indicates which ticket currently owns the resource. When a requesting thread sees the value of its own ticket in the owner field, that thread has exclusive access to the associated resource.
- Chip 100 includes ticket generation unit 108 that generates tickets 109 .
- Ticket generation unit 108 is a hardware atomic primitive that returns a value T, an atomically incremented number. The atomic increment of T in each ticket 109 is well suited to non-coherent systems because there is no requirement to gain ownership of a cache line or bus lock.
- Chip 100 may have multiple shared resources, such as cache 103 - 1 , 103 - 2 .
- Chip 100 further comprises an owner storage location 111 associated with each shared resource.
- Owner storage location 111 may be any dedicated hardware location or a software-determined general-purpose memory location.
- For example, the owner storage location may be a direct-mapped cache location, a hardware register, or a memory location.
- The owner storage location 111 identifies the resource owner.
- The value O in storage location 111 indicates the ticket value T of the current owner of the associated resource. If the shared resource is to be initialized as available, then O 111 is initialized to the next value T 109 that will be returned from the ticket generation unit 108. If the resource is to be initialized as already held, then O 111 is set to one less than the next value T 109 to be returned from the ticket generation unit 108.
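The two initialization choices can be written out directly. In this sketch, `next_ticket` stands for the next value the hardware unit will return; the function and variable names are illustrative only:

```python
def init_owner(next_ticket, available):
    # Available: the first ticket pulled should immediately own the
    # resource, so O starts at the next ticket value the unit will return.
    # Already held: O is one less than the next ticket value, so no ticket
    # that can be issued matches O until the current holder increments it.
    return next_ticket if available else next_ticket - 1

# A fresh ticket generation unit will return 0 as its next value T.
owner_available = init_owner(0, available=True)
owner_held = init_owner(0, available=False)
```
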
- A thread X that requires access to a shared resource first requests a ticket from ticket generation unit 108.
- Ticket generation unit 108 issues a ticket T X to thread X and then atomically increments the hardware counter 109.
- Thread X compares the value of its ticket T X to the current owner value O 111 for the shared resource. If the value of O 111 does not match the ticket T X, then thread X periodically reads the value O 111 until it matches the waiting thread's ticket value T X.
- When O 111 matches the ticket value T X, thread X owns the shared resource and can operate upon or interact with it accordingly.
- Owner field O 111 can be considered protected by the resource itself and, therefore, does not require atomic accesses or special hardware support for updates.
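Putting the acquire/release protocol together, the sketch below runs several threads through a simulated ticket lock and records the order in which they enter the critical section; because tickets are served in issue order, the entry order always matches the ticket order. The hardware atomic unit is again simulated with a lock-protected counter, and all names are illustrative:

```python
import itertools
import threading
import time

next_ticket = itertools.count()   # simulated hardware atomic counter T
ticket_mutex = threading.Lock()   # makes the simulated ticket issue atomic
owner = 0                         # owner field O
entry_order = []                  # order in which threads entered the critical section

def worker():
    global owner
    with ticket_mutex:            # atomic step: take a unique ticket
        t = next(next_ticket)
    while owner != t:             # poll O until this thread's ticket is "up"
        time.sleep(0.001)
    entry_order.append(t)         # critical section: only the owner runs this
    owner += 1                    # release: pass ownership to ticket t + 1

threads = [threading.Thread(target=worker) for _ in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

However the threads are scheduled, `entry_order` comes out as the ticket sequence 0 through 7, which is the fairness property the ticket mechanism provides.
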
- Conditional acquisition may be implemented using compare-and-swap hardware to issue a ticket T 109 only if the incremented T matches the current value in O.
- The conditional sequence, with the hardware compare-and-swap as the atomic step, is described below with reference to FIG. 4.
- Once an execution unit has taken a ticket, it must continue to monitor the current value of the owner field O and, when its ticket value T equals the owner field value O, it must access the resource or, at a minimum, increment the owner field value if it does not access the resource.
- An execution unit cannot ignore the owner field after it has taken a ticket; otherwise the resource will stall, and other devices will not be able to access it until the execution unit updates the owner field and allows the next device in line to proceed.
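This obligation can be made explicit in code: when its turn arrives, a ticket holder must either use the resource or forfeit its turn by incrementing the owner field, or every later ticket stalls. The function and parameter names below are invented for illustration:

```python
owner = 0   # owner field O; ticket 0 is currently "up"

def serve_turn(ticket, use_resource):
    """Called when this ticket matches the owner field.

    The holder must advance O even if it no longer wants the resource;
    otherwise the queue behind it stalls forever.
    """
    global owner
    assert ticket == owner, "may only be called when the ticket is up"
    if use_resource:
        pass  # ... operate on the shared resource here ...
    owner += 1    # mandatory: hand the resource to the next ticket in line

serve_turn(0, use_resource=True)    # ticket 0 uses the resource normally
serve_turn(1, use_resource=False)   # ticket 1 forfeits, but still advances O
```
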
- The example above has a concurrency level of one, meaning only one thread may access the resource at a time.
- The ticket/owner mechanism described herein may be generalized to an arbitrary concurrency level. For a concurrency level N, where N threads are allowed to operate concurrently, a thread is allowed to access the resource if T - O < N.
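The generalized admission test is a one-line predicate. The sketch adds the lower bound 0 ≤ T - O, which in-order ticket issue implies; the function name is illustrative:

```python
def may_access(ticket, owner, n):
    # A thread may proceed when its ticket falls in the concurrency
    # window: owner, owner + 1, ..., owner + n - 1.
    return 0 <= ticket - owner < n

# Concurrency level 2 with the owner field at 5: tickets 5 and 6 may run
# concurrently, while tickets 7 and 8 must wait.
admitted = [t for t in range(5, 9) if may_access(t, 5, 2)]
```
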
- For concurrency levels greater than one, the update of O 111 must be performed atomically.
- A hardware mechanism identical to the ticket generation unit, which provides an atomic update for T, can be used to update O.
- The hardware atomic mechanism for updating O may be configured to provide no return value.
- In that case, the update of O may be streamlined as a write for which the thread does not need to wait for completion.
- FIG. 2 illustrates a system 200 , such as a multicore processor, comprising a core 201 running a plurality of applications or threads A X-Z 202 .
- System 200 includes a shared resource 203 that is used by each of the threads A X-Z 202 .
- Owner field 204 identifies the current owner of shared resource 203 .
- Each of the threads A X-Z 202 may access ticket generation unit 205 to request a ticket T to access shared resource 203 .
- Each thread A X-Z 202 compares its ticket, T X-Z , to owner field O 204 to determine if it is allowed to access shared resource 203 .
- For a concurrency level N, each thread A X-Z 202 compares its ticket T X-Z to the owner field and evaluates whether it meets the criterion T - O < N. Any thread A X-Z 202 whose ticket T X-Z is within N of O is allowed to access shared resource 203.
- The width, in bits, of the atomic counter used to generate the tickets should be wide enough to count the maximum number of outstanding tickets, which may be determined by the number of waiting threads plus the concurrency level.
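With a fixed-width counter, the admission comparison still works after the counter wraps around, provided it is computed in modular arithmetic and the width satisfies the sizing rule above. A sketch with an 8-bit counter (the width is chosen only for illustration):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1   # an 8-bit counter wraps from 255 back to 0

def may_access_modular(ticket, owner, n):
    # Modular distance from the owner field; correct as long as fewer than
    # 2**WIDTH tickets are outstanding (waiting threads plus concurrency).
    return ((ticket - owner) & MASK) < n

# The counter has wrapped: the owner field is 255 and the next ticket is 0.
current = may_access_modular(255, 255, 1)   # ticket 255 is being served
wrapped = may_access_modular(0, 255, 1)     # ticket 0 must still wait
after = may_access_modular(0, 0, 1)         # once O wraps to 0, ticket 0 runs
```
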
- In one embodiment, the atomic increment is implemented as a read to a defined address that returns an atomically incremented number.
- The owner field is implemented as regular memory or as dedicated hardware storage.
- Releasing a concurrency-level-one resource can be a non-atomic or an atomic increment of the owner field value O.
- Releasing a resource may be implemented as a load, increment, and store, or as a single transaction that causes hardware to increment O, thereby reducing the number of hardware transactions required to release the resource.
- FIG. 3 is a flowchart 300 illustrating a process for providing fair access to a shared resource.
- The process illustrated in FIG. 3 may be applied to a shared resource that may be accessed by one or many execution units at the same time.
- The concurrency parameter N is the number of execution units that may simultaneously access the shared resource. For a concurrency of one, as discussed above, N = 1.
- In step 301, an execution unit, such as an application, thread, or process, that requires access to the shared resource requests a ticket from a hardware atomic unit configured to distribute tickets having unique values.
- The shared resource may be hardware or data, such as a memory block, register, device driver, or other resource.
- In step 302, the execution unit reads or otherwise obtains the current value of the owner field associated with the shared resource.
- The owner field identifies the ticket value of the execution unit that currently controls the shared resource.
- In step 303, the execution unit compares the ticket value (obtained in step 301) and the current owner field value (read in step 302) against the concurrency level N for the shared resource. If T - O ≥ N, the execution unit's ticket is not yet "up," and the execution unit moves to step 304 and continues to wait. The execution unit then returns to step 302, where it obtains a new current value of the owner field, and the process continues to the comparison in step 303. If T - O < N, the execution unit proceeds to step 305 and accesses the shared resource.
- In step 304, the execution unit may immediately return to step 302 to obtain an updated owner field value, or it may delay for a predetermined period before moving back to step 302.
- The predetermined period may be a fixed or variable interval. For example, the execution unit may use a backoff procedure to adjust the period, which may be employed to minimize traffic on a communication bus and/or to avoid collisions with other execution units that may be reading the owner field.
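One variable-interval choice for the delay in step 304 is bounded exponential backoff. The schedule below doubles the wait after each failed check up to a cap; the initial delay, growth factor, and cap are illustrative values, not values from the patent:

```python
def backoff_delays(attempts, initial=1, factor=2, cap=8):
    # Delay (in arbitrary time units) before each successive re-read of the
    # owner field: exponential growth, clamped at `cap` so the worst-case
    # polling latency stays bounded.
    delays = []
    d = initial
    for _ in range(attempts):
        delays.append(d)
        d = min(d * factor, cap)
    return delays

schedule = backoff_delays(6)   # delays used for six consecutive retries
```

Capping the backoff matters here: unlike a contended atomic lock, a waiting ticket holder cannot be overtaken, so a long sleep only delays its own turn rather than losing it.
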
- In step 306, the execution unit releases the shared resource, and the process then moves to step 307, where the execution unit increments the owner field value.
- FIG. 4 is a flowchart 400 illustrating a conditional access process for the shared resource according to one embodiment.
- Once an execution unit receives a ticket, it must continue to monitor the current owner field to prevent the shared resource from stalling. When the issued ticket number matches the owner field, the execution unit must at a minimum increment the owner field, whether or not it actually accesses the shared resource. In some embodiments, an execution unit may not want to wait for the shared resource if it is not immediately available. The process illustrated in FIG. 4 allows an execution unit to determine whether it can gain immediate access to the shared resource by "pulling" the next ticket.
- In step 401, the execution unit reads the current owner field value O associated with the shared resource.
- In step 402, the execution unit reads the value L of the last ticket issued by the hardware atomic unit.
- In step 403, the execution unit compares the last ticket value L to the current owner field value O.
- As illustrated in FIG. 3, when an execution unit completes its access and releases the shared resource (306), it then increments the owner field value (307). Accordingly, the next ticket in line will have access to the resource.
- When the execution unit cannot gain immediate access to the shared resource (i.e., L ≠ O - 1), the process moves to step 404 and the execution unit does not take a ticket. Instead, it may proceed with other operations and may reattempt access to the shared resource at a later time and/or attempt to access a different resource.
- When the execution unit can gain immediate access (i.e., L = O - 1), the process moves to step 405, where the execution unit requests a ticket from the hardware atomic unit.
- The process may then move immediately to step 406, where the execution unit accesses the shared resource.
- Optionally, the execution unit may follow the process illustrated in FIG. 3 to verify that it actually has immediate access to the shared resource.
- In step 407, the execution unit releases the shared resource, and the process then moves to step 408, where the execution unit increments the owner field value.
- Alternatively, the execution unit could simply read the next ticket value from the hardware atomic unit to determine whether the next ticket matches the current owner of the shared resource.
- However, reading the next value from the hardware atomic unit may be equivalent to issuing a new ticket, which would then require the device to continue monitoring the owner field and to wait for its turn to access the shared resource and/or to increment the owner field.
- For this reason, the value of the last issued ticket may be stored in a location that is accessible to the cores.
- It will be understood that steps 301-307 of the process illustrated in FIG. 3 and steps 401-408 of the process illustrated in FIG. 4 may be executed simultaneously and/or sequentially. It will be further understood that each step may be performed in any order and may be performed once or repeatedly.
- A processor-readable medium may include any device or medium that can store or transfer information. Examples of such a processor-readable medium include an electronic circuit, a semiconductor memory device, a flash memory, a ROM, an erasable ROM (EROM), a floppy diskette, a compact disk, an optical disk, a hard disk, a fiber optic medium, and the like.
- The software code segments may be stored in any volatile or non-volatile storage device, such as a hard drive, flash memory, solid-state memory, optical disk, CD, DVD, computer program product, or other memory device that provides computer-readable or machine-readable storage for a processor or a middleware container service.
- The memory may be a virtualization of several physical storage devices, wherein the physical storage devices are of the same or different kinds.
- The code segments may be downloaded or transferred from storage to a processor or container via an internal bus, a computer network such as the Internet or an intranet, or other wired or wireless networks.
Description
- Multiple computer programs, processes, applications, and/or threads running on a computer or processor often need to access shared data or hardware, such as a memory block, register, device driver, or other common resource.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
- Existing mechanisms require cache coherence to control ticket generation. Increasing cache coherence requirements limit scalability in the system. The mechanism described herein allows, through implementation of the hardware atomic unit, scalable non-cache coherent systems that still support an efficient shared resource arbitration mechanism.
Chip 100 includes ticket generation unit 108, which generates tickets 109. Ticket generation unit 108 is a hardware atomic primitive that returns a value T, which is an atomically incremented number. The atomic increment of T in each ticket 109 is well suited to non-coherent systems because there is no requirement to gain ownership of a cache line or bus lock. Chip 100 may have multiple shared resources, such as caches 103-1 and 103-2. Chip 100 further comprises an Owner storage location 111 associated with each shared resource. Owner storage location 111 may be any dedicated hardware location or a software-determined general-purpose memory location, such as a direct-mapped cache location, a hardware register, or a memory location.
- The Owner storage location 111 identifies the resource owner. The value O in storage location 111 indicates the ticket value T of the current owner of the associated resource. If the shared resource is to be initialized as available, then the value O 111 is initialized to contain the next value T 109 that will be returned from ticket generation unit 108. If a resource is to be initialized as already held, then O 111 is set to a value that is one less than the next value T 109 to be returned from ticket generation unit 108.
- A thread X that requires access to a shared resource first requests a ticket from ticket generation unit 108. Ticket generation unit 108 issues a ticket TX to thread X and then atomically increments the hardware counter 109. Thread X compares the value of its ticket TX to the current owner value O 111 for the shared resource. If the value of O 111 does not match the ticket TX, then thread X periodically reads the value O 111 for the resource until O 111 matches the waiting thread's ticket value TX. When O matches the ticket value TX, thread X owns the shared resource and can operate upon or interact with it accordingly. When thread X is finished with the resource, it increments O 111, which passes ownership of the resource to the next waiting thread. The owner field O 111 can be considered protected by the resource itself and therefore does not require atomic accesses or special hardware support for updates.
- Once a waiting thread is granted a ticket T, the thread must continue waiting until it obtains the resource and then must increment O 111 when finished. Conditional acquisition may be implemented using compare-and-swap hardware to issue a ticket T 109 only if an incremented T matches the current value in O. The conditional sequence, with the hardware compare-and-swap as the atomic step, is:
```
Owner = O;              // read by software
P = O - 1;              // what T must be for conditional wait to succeed
Y = Atomic(P, Owner) {
    if (P == T) {
        T = O + 1;      // increment
        return P;
    } else {
        return T;
    }
}
```
- If the returned value Y is equal to P, then the resource has been acquired; otherwise the resource has not been acquired and a ticket has not been granted.
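In software terms, the conditional sequence amounts to a compare-and-swap on the ticket counter: a ticket is issued only when the next ticket value T equals the current owner value O, so a successful swap grants immediate ownership, and a failed swap issues no ticket at all. The C11 sketch below models the hardware atomic step with `atomic_compare_exchange_strong`; the structure layout and names are illustrative assumptions, not the hardware sequence itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint next_ticket; /* T: next ticket the generator would issue */
    atomic_uint owner;       /* O: current owner field */
} ticket_lock;

/* Issue a ticket only if that ticket would immediately own the resource,
 * i.e. only if the next ticket T equals the current owner value O. */
static bool ticket_try_acquire(ticket_lock *l) {
    unsigned o = atomic_load(&l->owner); /* read O */
    unsigned expected = o;               /* next ticket must equal O to succeed */
    /* Atomic step: advance T from O to O+1, or fail without taking a ticket. */
    return atomic_compare_exchange_strong(&l->next_ticket, &expected, o + 1);
}

static void ticket_release(ticket_lock *l) {
    atomic_fetch_add(&l->owner, 1);
}
```

A failed attempt leaves the ticket counter untouched, so the caller incurs no obligation to monitor the owner field afterward.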
- In one embodiment, once an execution unit has taken a ticket, it must continue to monitor the current value of the owner field O and, when its ticket value T equals the owner field value O, the execution unit must access the resource or—at a minimum—increment the owner field value if it does not access the resource. An execution unit cannot ignore the owner field after it has taken a ticket, or the resource will become stalled and other devices will not be able to access the resource until the execution unit updates the owner field and allows the next device in line to access the resource.
- The example above has a concurrency level of one, meaning only one thread may access the resource at a time. To avoid stalling the resource and/or to allow multiple concurrent users, if supported by the resource, the ticket/owner mechanism described herein may be generalized to an arbitrary concurrency level. For a concurrency level N, where N threads are allowed to operate concurrently, a thread is allowed to access the resource if: T−O<N.
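The generalized condition T−O<N can be sketched the same way; with unsigned arithmetic the subtraction also stays correct across counter wraparound. As before, this is an illustrative C11 model rather than the hardware itself.

```c
#include <stdatomic.h>

/* Concurrency level N: a thread with ticket T may proceed whenever
 * T - O < N, i.e. its ticket falls inside the N-wide service window. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint owner;
} ticket_sem;

static unsigned ticket_sem_acquire(ticket_sem *s, unsigned n) {
    unsigned t = atomic_fetch_add(&s->next_ticket, 1);
    while (t - atomic_load(&s->owner) >= n)
        ;                               /* outside the N-wide window: wait */
    return t;
}

static void ticket_sem_release(ticket_sem *s) {
    atomic_fetch_add(&s->owner, 1);     /* slides the window forward by one */
}
```

With N=1 this degenerates to the exclusive lock above; with larger N it behaves like a FIFO-ordered counting semaphore.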
- Because multiple threads may operate concurrently on the same shared resource when the concurrency level is greater than one, the update of O 111 must in that case be performed atomically. In one embodiment, a hardware mechanism identical to ticket generation unit 108, which provides an atomic update for T, can be used to update O. Alternatively, because the return value of O is not required, the hardware atomic mechanism for updating O may be configured to provide no return value. In one embodiment, the update of O may be streamlined as a write for which the thread does not need to wait for completion.
-
FIG. 2 illustrates a system 200, such as a multicore processor, comprising a core 201 running a plurality of applications or threads AX-Z 202. System 200 includes a shared resource 203 that is used by each of the threads AX-Z 202. Owner field 204 identifies the current owner of shared resource 203. Each of the threads AX-Z 202 may access ticket generation unit 205 to request a ticket T to access shared resource 203. Each thread AX-Z 202 compares its ticket, TX-Z, to owner field O 204 to determine whether it is allowed to access shared resource 203.
- For the case of a concurrency level of one (N=1), each thread AX-Z 202 evaluates whether its ticket is equal to the owner field 204 (TX-Z=O), and whichever thread has the matching ticket is allowed to access shared resource 203.
- For the case of concurrency level N, each thread AX-Z 202 compares its ticket TX-Z to the owner field and evaluates whether it meets the criterion T−O<N. Any thread AX-Z 202 whose ticket TX-Z is within N of O is allowed to access shared resource 203.
- Using the shared resource access mechanisms described herein provides the following benefits:
-
- 1) Threads gain access to the shared resource in the order in which they present their first request to the ticket-generating hardware atomic unit.
- 2) Communication traffic to the hardware atomic unit is greatly reduced because only one reference per lock acquisition is required without regard to the level of contention.
- 3) Back-off mechanisms implemented by threads waiting for resource ownership to be passed to them do not subject those threads to fairness imbalances caused by the waiting patterns or inter-arrival rates of other threads.
- 4) Latency to the hardware atomic unit determines, at most, which position in line—or which ticket number—is granted to a thread, but such latency will not lead to starvation or a continuing arbitration disadvantage.
- In one embodiment, the width in bits of the atomic counter used to generate the tickets should be wide enough to count the maximum number of outstanding tickets, which may be determined as the number of waiting threads plus the concurrency level. The minimum number of bits is log2(maximum number of threads plus concurrency level), rounded up to a whole number of bits. For example, if the maximum number of threads is 64, then the counter must be at least six bits wide, since log2(64)=6. In some embodiments, this maximum is the number of hardware threads or logical processors in the system.
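The width rule can be checked with a small helper. Under the assumption that the counter must distinguish every outstanding ticket (waiting threads plus the concurrency level, per the text), the minimum width is the smallest number of bits whose power of two covers that count; the helper below is illustrative.

```c
/* Minimum ticket-counter width for a given maximum number of
 * outstanding tickets: smallest bits with 2^bits >= max_outstanding. */
static unsigned min_ticket_bits(unsigned max_outstanding) {
    unsigned bits = 0;
    while ((1u << bits) < max_outstanding)
        bits++;
    return bits;
}
```

For 64 outstanding tickets this yields 6 bits, matching the example in the text; 65 would require 7.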
- In some embodiments, the atomic increment is implemented as a read to a defined address, which returns an atomically incremented number.
- In some embodiments, the owner field is implemented as regular memory or as dedicated hardware storage.
- In other embodiments, releasing a concurrency-level-1 resource can be performed as either a non-atomic or an atomic increment of the owner field value O.
- In other embodiments, releasing a resource is implemented as a load, increment, and store sequence, or as a single transaction that causes hardware to increment O, thereby reducing the number of hardware transactions required to release the resource.
-
FIG. 3 is a flowchart 300 illustrating a process for providing fair access to a shared resource. The process illustrated in FIG. 3 may be applied to a shared resource that may be accessed by one or many execution units at the same time. The concurrency parameter N is the number of execution units that may simultaneously access the shared resource. For concurrency of one, as discussed above, N=1. In step 301, an execution unit, such as an application, thread, or process that requires access to the shared resource, requests a ticket from a hardware atomic unit configured to distribute tickets having unique values. The shared resource may be hardware or data, such as a memory block, register, device driver, or other resource. In step 302, the execution unit reads or otherwise obtains the current value of the owner field associated with the shared resource. The owner field identifies the ticket value of the execution unit that is currently in control of the shared resource.
- In step 303, the execution unit compares the ticket value (obtained in step 301) and the current owner field value (read in step 302) to the concurrency level N for the shared resource. If T−O≧N, then the execution unit's ticket is not yet "up," and the execution unit moves to step 304 and continues to wait. The execution unit then returns to step 302, where it obtains a new current value of the owner field, and the process continues to the comparison in step 303. In step 304, the execution unit may immediately move to step 302 to obtain an updated owner field value, or it may delay for a predetermined period before moving back to step 302. The predetermined period may be a fixed or variable interval. For example, the execution unit may use a backoff procedure to adjust the predetermined period, which may be employed to minimize traffic on a communication bus and/or to avoid collisions with other execution units that may be reading the owner field.
- If the difference between the ticket value and the owner field value is less than the concurrency level (i.e., T−O<N), then the process moves to step 305 and the execution unit is granted access to the shared resource. If the shared resource has a concurrency level of one (N=1), for example, then the execution unit is granted access when the ticket and owner field values are the same (i.e., when T=O, then T−O=0<N=1).
- After the execution unit has completed its use of the shared resource, the process moves to step 306 where the execution unit releases the shared resource and then to step 307 where the execution unit increments the owner field value.
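The steps of flowchart 300 can be collected into a single acquire routine. The C11 sketch below folds steps 301 through 307 into code; the exponential backoff used for step 304 is one illustrative policy among the fixed or variable intervals the text allows, and all names are assumptions of this sketch.

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;
    atomic_uint owner;
} fair_lock;

static void cpu_relax_for(unsigned iters) {
    for (volatile unsigned i = 0; i < iters; i++)
        ;                           /* stand-in for a platform pause/delay */
}

/* Steps 301-305: take a ticket, then poll the owner field, backing off
 * between reads, until the ticket falls within the concurrency window N. */
static unsigned fair_lock_acquire(fair_lock *l, unsigned n) {
    unsigned t = atomic_fetch_add(&l->next_ticket, 1);  /* step 301 */
    unsigned delay = 1;
    while (t - atomic_load(&l->owner) >= n) {           /* steps 302-303 */
        cpu_relax_for(delay);                           /* step 304 */
        if (delay < 1024)
            delay *= 2;            /* illustrative capped exponential backoff */
    }
    return t;                                           /* step 305 */
}

/* Steps 306-307: release the resource and pass ownership on. */
static void fair_lock_release(fair_lock *l) {
    atomic_fetch_add(&l->owner, 1);
}
```

Note that however long a waiter backs off, its position in line is fixed by its ticket, so the backoff policy affects only polling traffic, not fairness.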
-
FIG. 4 is a flowchart 400 illustrating a conditional access process for the shared resource according to one embodiment. As noted above, once an execution unit receives a ticket, it must continue to monitor the current owner field to prevent the shared resource from being stalled. When the issued ticket number matches the owner field, the execution unit must at a minimum increment the owner field, whether or not it actually accesses the shared resource. In some embodiments, an execution unit may not want to wait for the shared resource if it is not immediately available. The process illustrated in FIG. 4 allows an execution unit to determine whether it can gain immediate access to the shared resource by "pulling" the next ticket.
- In step 401, the execution unit reads the current owner field value O associated with the shared resource. In step 402, the execution unit reads the value L of the last ticket issued by the hardware atomic unit. In step 403, the execution unit compares the last ticket value L to the current owner field value O.
- If the last ticket value L is one less than the current owner field value O (i.e., L=O−1), then the next ticket issued (i.e., L+1=T) will immediately own the resource. As illustrated in FIG. 3, when an execution unit completes its access and releases the shared resource (306), it then increments the owner field value (307). Accordingly, the next ticket in line will have access to the resource.
- However, if the last ticket value L issued is greater than O−1, where O is the current owner field value, then the next ticket pulled will have to wait for access to the resource.
- In
flowchart 400, when the execution unit cannot gain immediate access to the shared resource (i.e., L≠O−1), the process moves to step 404 and the execution unit does not take a ticket. Instead, the execution unit may proceed with other operations and may reattempt access to the shared resource at a later time and/or attempt to access a different resource.
- On the other hand, when the execution unit will gain immediate access to the shared resource (i.e., L=O−1), the process moves to step 405, where the execution unit requests a ticket from the hardware atomic unit. The process may then move immediately to step 406, where the execution unit accesses the shared resource. Alternatively, between steps 405 and 406, the execution unit may perform the comparison illustrated in FIG. 3 to verify that it actually has immediate access to the shared resource.
- In other embodiments, the execution unit could simply read the next ticket value from the hardware atomic unit to determine if the next ticket matches the current owner of the shared resource. However, in some embodiments, such reading of the next value in the hardware atomic unit may be equivalent to issuing a new ticket, which would then require a device to continue to monitor owner field and to wait for a turn to access the shared resource and/or to increment the owner field. Instead, when a ticket is issued, the value of the last-issued ticket may be stored in a location that is accessible to the cores.
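Assuming the last-issued ticket value L is readable as just described, the FIG. 4 probe can be sketched as follows. Exposing L as next_ticket − 1 and the structure layout are modeling assumptions of this illustration; with unsigned arithmetic, the L=O−1 test also holds for a freshly initialized (available) lock via wraparound.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint next_ticket;
    atomic_uint owner;
} ticket_lock;

static bool probe_and_acquire(ticket_lock *l) {
    unsigned o = atomic_load(&l->owner);              /* step 401: read O */
    unsigned last = atomic_load(&l->next_ticket) - 1; /* step 402: read L */
    if (last != o - 1)                                /* step 403: L == O-1 ? */
        return false;                                 /* step 404: take no ticket */
    /* Steps 405-406: pull the ticket and access the resource. In a truly
       concurrent setting another thread could win a ticket between the
       reads above and this point, which is why the text suggests re-running
       the FIG. 3 comparison before touching the resource; that re-check is
       omitted in this single-threaded sketch. */
    atomic_fetch_add(&l->next_ticket, 1);
    return true;
}

static void probe_release(ticket_lock *l) {           /* steps 407-408 */
    atomic_fetch_add(&l->owner, 1);
}
```

The key property is that a failed probe leaves the ticket counter untouched, so a declined execution unit owes no later increment of the owner field.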
- The process illustrated in
flowchart 400 is for the case of concurrency level one, but may be generalized to allow higher concurrency levels N. For example, if the next ticket T minus the concurrency level N is less than the current owner value (i.e. T−N<O), then the next ticket T will not have to wait for access to the resource. In terms of the last ticket value L (i.e. L=T−1), this can be represented as L−N<O−1. - It will be understood that steps 301-307 of the process illustrated in
FIG. 3 and steps 401-408 of the process illustrated inFIG. 4 may be executed simultaneously and/or sequentially. It will be further understood that each step may be performed in any order and may be performed once or repetitiously. - Many of the functions described herein may be implemented in hardware, software, and/or firmware, and/or any combination thereof. When implemented in software, code segments perform the necessary tasks or steps. The program or code segments may be stored in a processor-readable, computer-readable, or machine-readable medium. The processor-readable, computer-readable, or machine-readable medium may include any device or medium that can store or transfer information. Examples of such a processor-readable medium include an electronic circuit, a semiconductor memory device, a flash memory, a ROM, an erasable ROM (EROM), a floppy diskette, a compact disk, an optical disk, a hard disk, a fiber optic medium, etc.
- The software code segments may be stored in any volatile or non-volatile storage device, such as a hard drive, flash memory, solid state memory, optical disk, CD, DVD, computer program product, or other memory device, that provides computer-readable or machine-readable storage for a processor or a middleware container service. In other embodiments, the memory may be a virtualization of several physical storage devices, wherein the physical storage devices are of the same or different kinds. The code segments may be downloaded or transferred from storage to a processor or container via an internal bus, another computer network, such as the Internet or an intranet, or via other wired or wireless networks.
- Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/179,344 US9158597B2 (en) | 2011-07-08 | 2011-07-08 | Controlling access to shared resource by issuing tickets to plurality of execution units |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130014120A1 true US20130014120A1 (en) | 2013-01-10 |
US9158597B2 US9158597B2 (en) | 2015-10-13 |
Family
ID=47439448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/179,344 Active 2032-07-16 US9158597B2 (en) | 2011-07-08 | 2011-07-08 | Controlling access to shared resource by issuing tickets to plurality of execution units |
Country Status (1)
Country | Link |
---|---|
US (1) | US9158597B2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2532424B (en) | 2014-11-18 | 2016-10-26 | Ibm | An almost fair busy lock |
US10146689B2 (en) | 2017-01-20 | 2018-12-04 | Hewlett Packard Enterprise Development Lp | Locally poll flag in multi processing node system to determine whether a resource is free to use for thread |
CN113535412B (en) | 2020-04-13 | 2024-05-10 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for tracking locks |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030195920A1 (en) * | 2000-05-25 | 2003-10-16 | Brenner Larry Bert | Apparatus and method for minimizing lock contention in a multiple processor system with multiple run queues |
US20040098723A1 (en) * | 2002-11-07 | 2004-05-20 | Zoran Radovic | Multiprocessing systems employing hierarchical back-off locks |
US20040215858A1 (en) * | 2003-04-24 | 2004-10-28 | International Business Machines Corporation | Concurrent access of shared resources |
US20070300226A1 (en) * | 2006-06-22 | 2007-12-27 | Bliss Brian E | Efficient ticket lock synchronization implementation using early wakeup in the presence of oversubscription |
US20080098180A1 (en) * | 2006-10-23 | 2008-04-24 | Douglas Larson | Processor acquisition of ownership of access coordinator for shared resource |
US20110252166A1 (en) * | 2009-01-23 | 2011-10-13 | Pradeep Padala | System and Methods for Allocating Shared Storage Resources |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7529844B2 (en) | 2002-04-26 | 2009-05-05 | Sun Microsystems, Inc. | Multiprocessing systems employing hierarchical spin locks |
US7698523B2 (en) | 2006-09-29 | 2010-04-13 | Broadcom Corporation | Hardware memory locks |
US8392925B2 (en) | 2009-03-26 | 2013-03-05 | Apple Inc. | Synchronization mechanisms based on counters |
US8838944B2 (en) | 2009-09-22 | 2014-09-16 | International Business Machines Corporation | Fast concurrent array-based stacks, queues and deques using fetch-and-increment-bounded, fetch-and-decrement-bounded and store-on-twin synchronization primitives |
-
2011
- 2011-07-08 US US13/179,344 patent/US9158597B2/en active Active
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11269692B2 (en) * | 2011-12-29 | 2022-03-08 | Oracle International Corporation | Efficient sequencer for multiple concurrently-executing threads of execution |
US9313208B1 (en) * | 2014-03-19 | 2016-04-12 | Amazon Technologies, Inc. | Managing restricted access resources |
CN104199800A (en) * | 2014-07-21 | 2014-12-10 | 上海寰创通信科技股份有限公司 | Method for eliminating mutual exclusion of table items in multi-core system |
KR102255313B1 (en) | 2015-02-02 | 2021-05-24 | 옵티멈 세미컨덕터 테크놀로지스 인코포레이티드 | Vector processor configured to operate on variable length vectors using asymmetric multi-threading |
WO2016126516A1 (en) * | 2015-02-02 | 2016-08-11 | Optimum Semiconductor Technologies, Inc. | Vector processor configured to operate on variable length vectors with asymmetric multi-threading |
KR20170110685A (en) * | 2015-02-02 | 2017-10-11 | 옵티멈 세미컨덕터 테크놀로지스 인코포레이티드 | A vector processor configured to operate on variable length vectors using asymmetric multi-threading; |
US10339094B2 (en) | 2015-02-02 | 2019-07-02 | Optimum Semiconductor Technologies, Inc. | Vector processor configured to operate on variable length vectors with asymmetric multi-threading |
WO2017018976A1 (en) * | 2015-07-24 | 2017-02-02 | Hewlett Packard Enterprise Development Lp | Lock manager |
US10423464B2 (en) | 2016-09-30 | 2019-09-24 | Hewlett Packard Enterprise Patent Development LP | Persistent ticket operation |
US20190129846A1 (en) * | 2017-10-30 | 2019-05-02 | International Business Machines Corporation | Dynamic Resource Visibility Tracking to Avoid Atomic Reference Counting |
US10621086B2 (en) * | 2017-10-30 | 2020-04-14 | International Business Machines Corporation | Dynamic resource visibility tracking to avoid atomic reference counting |
CN113767372A (en) * | 2019-05-09 | 2021-12-07 | 国际商业机器公司 | Executing multiple data requests of a multi-core processor |
US20200356485A1 (en) * | 2019-05-09 | 2020-11-12 | International Business Machines Corporation | Executing multiple data requests of multiple-core processors |
US11321146B2 (en) | 2019-05-09 | 2022-05-03 | International Business Machines Corporation | Executing an atomic primitive in a multi-core processor system |
US11681567B2 (en) * | 2019-05-09 | 2023-06-20 | International Business Machines Corporation | Method and processor system for executing a TELT instruction to access a data item during execution of an atomic primitive |
US20200034214A1 (en) * | 2019-10-02 | 2020-01-30 | Juraj Vanco | Method for arbitration and access to hardware request ring structures in a concurrent environment |
US11748174B2 (en) * | 2019-10-02 | 2023-09-05 | Intel Corporation | Method for arbitration and access to hardware request ring structures in a concurrent environment |
Also Published As
Publication number | Publication date |
---|---|
US9158597B2 (en) | 2015-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9158597B2 (en) | Controlling access to shared resource by issuing tickets to plurality of execution units | |
US7861042B2 (en) | Processor acquisition of ownership of access coordinator for shared resource | |
US8539486B2 (en) | Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode | |
US9996402B2 (en) | System and method for implementing scalable adaptive reader-writer locks | |
US9619303B2 (en) | Prioritized conflict handling in a system | |
US9170844B2 (en) | Prioritization for conflict arbitration in transactional memory management | |
JP3871305B2 (en) | Dynamic serialization of memory access in multiprocessor systems | |
US8015248B2 (en) | Queuing of conflicted remotely received transactions | |
US8689221B2 (en) | Speculative thread execution and asynchronous conflict events | |
JP5787629B2 (en) | Multi-processor system on chip for machine vision | |
US11461151B2 (en) | Controller address contention assumption | |
JP2000076217A (en) | Lock operation optimization system and method for computer system | |
US8141089B2 (en) | Method and apparatus for reducing contention for computer system resources using soft locks | |
US9747210B2 (en) | Managing a lock to a resource shared among a plurality of processors | |
EP3379421B1 (en) | Method, apparatus, and chip for implementing mutually-exclusive operation of multiple threads | |
US20110320659A1 (en) | Dynamic multi-level cache including resource access fairness scheme | |
CN106068497B (en) | Transactional memory support | |
US9442971B2 (en) | Weighted transaction priority based dynamically upon phase of transaction completion | |
Zhang et al. | Scalable adaptive NUMA-aware lock | |
CN112306703A (en) | Critical region execution method and device in NUMA system | |
WO2017131624A1 (en) | A unified lock | |
US11880304B2 (en) | Cache management using cache scope designation | |
EP2707793B1 (en) | Request to own chaining in multi-socketed systems | |
US8930628B2 (en) | Managing in-line store throughput reduction | |
CN117687744A (en) | Method for dynamically scheduling transaction in hardware transaction memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MICROSOFT CORPORATION, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ROSS, JONATHAN;REEL/FRAME:026564/0939 Effective date: 20110707 |
|
AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0001 Effective date: 20141014 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |