US7945911B1 - Barrier synchronization method and apparatus for work-stealing threads - Google Patents
- Publication number: US7945911B1
- Authority: US (United States)
- Prior art keywords: thread, threads, subtasks, stealing, state
- Legal status: Active, expires (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F9/52 — Program synchronisation; mutual exclusion, e.g. by means of semaphores
- G06F9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F12/0253 — Garbage collection, i.e. reclamation of unreferenced memory
Definitions
- This invention relates to computer systems, and more particularly to the load balancing of work distributed among a plurality of threads on a system.
- objects may have associated methods, which are routines that can be invoked by reference to the object.
- Objects may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class.
- the term “object” is not limited to such structures, but may additionally include structures with which methods and classes are not associated.
- the term “object” may be used to refer to a data structure represented in a computer system's memory; other terms sometimes used for the same concept are “record” and “structure.”
- An object may be identified by a reference, a relatively small amount of information that can be used to access the object.
- a reference can be represented as a “pointer” or a “machine address,” which may require, for instance, sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
- memory may be allocated to at least some objects dynamically. Note that not all systems or applications employ dynamic memory allocation. In some computer languages, for example, source programs must be so written that all objects to which the program's variables refer are bound to storage locations at compile time. This memory allocation approach, sometimes referred to as “static allocation,” is the policy traditionally used by the Fortran programming language, for example. Note that many systems may allow both static and dynamic memory allocation.
- Dynamic allocation has a number of advantages, among which is that the run-time system is able to adapt allocation to run-time conditions. For example, a programmer may specify that space should be allocated for a given object only in response to a particular run-time condition. For example, the C-language library function malloc( ) is often used for this purpose. Conversely, the programmer can specify conditions under which memory previously allocated to a given object can be reclaimed for reuse. For example, the C-language library function free( ) results in such memory reclamation. Because dynamic allocation provides for memory reuse, it facilitates generation of large or long-lived applications, which over the course of their lifetimes may employ objects whose total memory requirements would greatly exceed the available memory resources if they were bound to memory locations statically.
- Another kind of error may occur when an application reclaims memory for reuse even though the application still maintains at least one reference to that memory. If the reclaimed memory is reallocated for a different purpose, the application may inadvertently manipulate the same memory in multiple inconsistent ways. This kind of error is known as a “dangling reference,” because an application should not retain a reference to a memory location once that location is reclaimed. Explicit dynamic-memory management by using interfaces like malloc( )/free( ) often leads to these problems.
- Techniques used by systems that reclaim memory space automatically are commonly referred to as “garbage collection.”
- Garbage collectors operate by reclaiming space that is no longer considered “reachable.”
- Statically allocated objects represented by a program's global variables are normally considered reachable throughout a program's life.
- Statically allocated objects are not ordinarily stored in the garbage collector's managed memory space, but they may contain references to dynamically allocated objects that are stored in the garbage collector's managed memory space, and these dynamically allocated objects are considered reachable.
- An object referred to in the processor's call stack is reachable, as is an object referred to by register contents. Also, an object referred to by any reachable object is reachable.
- garbage collectors are advantageous because, whereas a programmer working on a particular sequence of code may perform creditably in most respects with only local knowledge of the application, memory allocation and reclamation may require a global knowledge of the program. Specifically, a programmer dealing with a given sequence of code may know whether some portion of memory is still in use for that sequence of code, but it is considerably more difficult for the programmer to know what the rest of the application is doing with that memory. By tracing references from a “root set,” e.g., global variables, registers, and the call stack, automatic garbage collectors may obtain global knowledge in a methodical way. Garbage collectors relieve the programmer of the need to worry about the application's global state and thus the programmer can concentrate on local-state issues. The result is applications that are more robust, having fewer, or even no, dangling references and memory leaks.
- Garbage-collection mechanisms may be implemented by various parts and at various levels of a computing system. For example, some compilers, without the programmer's explicit direction, may additionally generate garbage collection code that automatically reclaims unreachable memory space. Even in this case, though, there is a sense in which the application does not itself provide the entire garbage collector. Specifically, the application will typically call upon the underlying operating system's memory-allocation functions, and the operating system may in turn take advantage of hardware that lends itself particularly to use in garbage collection. So a system may disperse the garbage-collection mechanism over a number of computer-system layers.
- FIG. 1 illustrates an exemplary system in which various levels of source code may result in the machine instructions that a processor executes.
- a programmer may produce source code 40 written in a high-level language.
- a compiler 42 typically converts that code into “class files.” These files include routines written in instructions, called “byte code” 44 , for a “virtual machine” that various processors may be software-configured to emulate. This conversion into byte code is generally separated in time from the byte code's execution, so FIG. 1 divides the sequence into a “compile-time environment” 46 separate from a “run-time environment” 48 , in which execution occurs.
- One example of a high-level language for which compilers are available to produce such virtual-machine instructions is the Java™ programming language. (Java is a trademark or registered trademark of Sun Microsystems, Inc., in the United States and other countries.)
- the class files' byte-code routines are executed by a processor under control of a virtual-machine process 50 .
- That process emulates a virtual machine from whose instruction set the byte code is drawn.
- the virtual-machine process 50 may be specified by code stored on a local disk or some other machine-readable medium from which it is read into RAM to configure the computer system to implement the garbage collector and otherwise act as a virtual machine.
- that code's persistent storage may instead be provided by a server system remote from the processor that implements the virtual machine, in which case the code would be transmitted to the virtual-machine-implementing processor.
- FIG. 1 depicts the virtual machine as including an “interpreter” 52 for that purpose.
- virtual-machine implementations may compile the byte codes concurrently with the resultant object code's execution, so FIG. 1 further depicts the virtual machine as additionally including a “just-in-time” compiler 54 .
- compiler 42 may not contribute to providing the garbage-collection function; garbage collection may instead be implemented as part of the virtual machine 50's functionality.
- garbage collection may involve performing tasks that the garbage collector discovers dynamically. Since an object referred to by a reference in a reachable object is itself considered reachable, a collector that discovers a reachable object may find that it has further work to do, namely, following references in that object to determine whether the references refer to further objects. Note that other types of programs may also involve dynamically discovered tasks. Dynamically discovered tasks often cannot be performed as soon as they are discovered, so the program may maintain a list of discovered tasks to be performed.
- Computer systems typically provide for various types of concurrent operation.
- a user of a typical desktop computer may be simultaneously employing a word-processor program and an e-mail program together with a calculator program.
- a computer may have one processor or several simultaneously operating processors, each of which may be operating on a different program.
- operating-system software typically causes that processor to switch from one program to another rapidly enough that the user cannot usually tell that the different programs are not really executing simultaneously.
- the different running programs are usually referred to as “processes” in this connection, and the change from one process to another is said to involve a “context switch.”
- In a context switch one process is interrupted, and the contents of the program counter, call stacks, and various registers are stored, including those used for memory mapping. Then the corresponding values previously stored for a previously interrupted process are loaded, and execution resumes for that process.
- Processor hardware and operating system software typically have special provisions for performing such context switches.
- a program running as a computer system process may take advantage of such provisions to provide separate, concurrent “threads” of its own execution.
- Switching threads is similar to switching processes: the current contents of the program counter and various register contents for one thread are stored and replaced with values previously stored for a different thread. But a thread change does not involve changing the memory mapping values, as a process change does, so the new thread of execution has access to the same process-specific physical memory as the same process's previous thread.
- in some cases, multiple execution threads are merely a matter of programming convenience.
- compilers for various programming languages, such as the Java™ programming language, readily provide the “housekeeping” for spawning different threads, so the programmer is not burdened with all the details of making different threads' execution appear simultaneous.
- the use of multiple threads may provide speed advantages. A process may be performed more quickly if the system allocates different threads to different processors when processor capacity is available. To take advantage of this fact, programmers may identify constituent operations within their programs that particularly lend themselves to parallel execution. When a program reaches a point in its execution at which the parallel-execution operation can begin, the program may start different execution threads to perform different tasks within that operation.
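The parallel-phase pattern described above can be sketched as follows. This is a minimal Python illustration with hypothetical names: when a program reaches a parallelizable phase, it spawns one thread per subtask and joins them before continuing.

```python
import threading

# Results produced by the parallel phase; the lock guards the shared dict.
results = {}
lock = threading.Lock()

def subtask(name, data):
    total = sum(data)          # stand-in for a real constituent operation
    with lock:
        results[name] = total

# Hypothetical division of an operation into independent subtasks.
work = {"part1": [1, 2, 3], "part2": [4, 5], "part3": [6]}
threads = [threading.Thread(target=subtask, args=(n, d)) for n, d in work.items()]
for t in threads:
    t.start()
for t in threads:              # the program waits for the parallel phase to finish
    t.join()
assert results == {"part1": 6, "part2": 9, "part3": 6}
```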
- the initial, statically identifiable members of the root set may be divided among a plurality of threads (whose execution may be divided among many processors), and those threads may identify reachable objects in parallel.
- Each thread could maintain a list of the tasks that it has thus discovered dynamically, and it could proceed to perform all such tasks. However, much of the advantage of parallel processing may be lost if each thread performs only those tasks that it has itself discovered. Suppose, for example, that one thread of a garbage collector encounters many objects that contain many references but that other threads do not. This leaves one thread with many more tasks than the other threads. There could therefore be a significant amount of time during which that thread still has most of its tasks yet to be performed after the other threads have finished all of their tasks.
- such parallel-execution operations may be configured so that each thread may perform tasks that other threads have identified.
- different threads may be given access to some of the same task lists, and this means that their access to those lists must be synchronized to avoid inconsistency or at least duplication.
- after a first thread has read an entry from a shared task list but before that entry is removed, a second thread may read the same entry and proceed to perform the task that it specifies. In the absence of a synchronization mechanism, the first thread may then repeat the task unnecessarily.
- Synchronization mechanisms employed to prevent such untoward consequences typically involve atomically performing sets of machine instructions that are otherwise performed separately. Particularly in the multiprocessor systems in which parallel execution is especially advantageous, such “atomic” operations are expensive. Considerable work has therefore been done to minimize the frequency of their use.
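The duplicate-task hazard and its atomic remedy can be sketched as follows. Python is used for illustration and the lock merely simulates a hardware compare-and-set; all names are hypothetical.

```python
import threading

class AtomicRef:
    """A tiny compare-and-set cell. The lock stands in for the atomic
    instruction (e.g. CAS) that a real synchronization mechanism uses."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._lock:                 # makes test-and-update indivisible
            if self._value == expected:
                self._value = new
                return True
            return False

# Two threads race to claim the same task-list entry; the atomic
# compare-and-set guarantees exactly one of them wins, so the task
# is performed only once.
entry = AtomicRef("pending")
claims = []
def worker(name):
    if entry.compare_and_set("pending", "claimed"):
        claims.append(name)              # only the winner performs the task

t1 = threading.Thread(target=worker, args=("t1",))
t2 = threading.Thread(target=worker, args=("t2",))
t1.start(); t2.start(); t1.join(); t2.join()
assert len(claims) == 1
```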
- Various mechanisms may use a number of parallel threads or processors to perform a task. Each thread or processor may be assigned a set of subtasks, and may in some cases generate new subtasks to be performed. Load balancing, or distributing the subtasks so that all the threads or processors stay relatively busy, is commonly implemented by such mechanisms. Work stealing is one approach to load balancing among processors or threads.
- an operating system or a runtime system may support a certain number of processors, each with a dispatch queue. All of the threads schedule off a local dispatch queue. If any processors end up with no threads to run in their dispatch queue, an attempt may be made to steal a thread from another dispatch queue.
- Each deque has an “owner” thread, which pushes and retrieves, or “pops,” entries onto and from an end of the deque arbitrarily referred to as its “bottom,” while any other, “stealer,” thread is restricted to popping entries, and only from the other, or “top,” end of the deque.
- These stealer-thread accesses involve atomic operations. However, most deque accesses are performed by the deque's owner thread, and the threads may be configured to avoid using atomic operations for pushing or, in most cases, popping.
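The owner/stealer discipline just described can be sketched as follows. This is an illustrative Python sketch, not the actual deque algorithm of the patent: the lock stands in for the atomic (e.g. CAS) operation a real implementation performs on the top index, and the subtlety that the owner only needs an atomic operation when racing a stealer for the last entry is deliberately omitted.

```python
import threading

class WorkStealingDeque:
    """Sketch of a work-stealing deque: the owner pushes and pops at the
    "bottom"; stealer threads may only pop from the "top"."""
    def __init__(self):
        self._items = []
        self._top_lock = threading.Lock()   # stands in for an atomic op on "top"

    def push_bottom(self, task):            # owner only; no atomic needed
        self._items.append(task)

    def pop_bottom(self):                   # owner only (sketch always locks;
        with self._top_lock:                # a real deque avoids this except
            return self._items.pop() if self._items else None   # on the last entry)

    def pop_top(self):                      # stealers; always synchronized
        with self._top_lock:
            return self._items.pop(0) if self._items else None

dq = WorkStealingDeque()
for t in ("t1", "t2", "t3"):
    dq.push_bottom(t)
assert dq.pop_bottom() == "t3"   # owner works newest-first from the bottom
assert dq.pop_top() == "t1"      # a stealer takes the oldest task from the top
```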
- garbage collector thread may transitively mark objects through a heap. Initially, the tasks to be performed by a collector thread just identify those objects directly reachable from outside the heap. As those objects are marked and scanned for references to additional objects, new tasks may be generated and placed in the work queue, with each new task indicating a new object (or objects) to scan.
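The transitive marking just described can be sketched as a worklist loop. The heap below is a hypothetical object graph (ids mapping to the ids they reference), and each discovered reference plays the role of a newly generated subtask placed on the work queue.

```python
# Hypothetical object graph: each object id maps to the ids it references.
heap = {"root": ["a", "b"], "a": ["c"], "b": [], "c": ["a"], "unreached": ["b"]}

def mark_from_roots(roots):
    """Transitively mark reachable objects. Scanning an object may discover
    new objects to scan, which become new tasks on the worklist."""
    marked = set()
    worklist = list(roots)   # initial tasks: objects directly reachable from outside the heap
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)
        for ref in heap.get(obj, []):   # following references generates new tasks
            if ref not in marked:
                worklist.append(ref)
    return marked

reachable = mark_from_roots(["root"])
assert reachable == {"root", "a", "b", "c"}   # "unreached" is garbage
```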
- Multithreaded garbage collection mechanisms may implement a “consensus barrier”, and may attempt to park or suspend threads that are unable to find work or to yield to other threads in the hopes that the scheduler will allow the threads that are not scheduled to make progress.
- a garbage collection technique to parallelize collection phases in a “stop-world” garbage collector is described in a paper by Flood, et al. in the Proceedings of the Java Virtual Machine Research and Technology Symposium, Monterey, April 2001 titled “Parallel Garbage Collection For Shared Memory Multiprocessors.” The general strategy of this technique is to:
- the consensus barrier conventionally requires all the threads to check in before allowing the application to restart.
- the scheduler (which may, for example, be a part of the operating system) may have other system threads to be scheduled, or threads for other applications running on the same machine to be managed, and thus may fail to start one or more of the worker threads in a timely fashion.
- FIGS. 2A through 2C illustrate an exemplary mechanism for scheduling several worker threads to perform a task apportioned among the threads as several “subtasks” stored in deques during an exemplary “stop world” operation, and using a consensus barrier to rendezvous the threads when done.
- An application (not shown) may be suspended during the stop world operation.
- An exemplary stop world operation is garbage collection, but note that a similar mechanism may be used for other types of operations.
- scheduler 100 may apportion the initial subtasks of the overall task among the deques 106 .
- Each deque may be associated with a particular worker thread 104 .
- scheduler 100 may be, but is not necessarily, a part of operating system software.
- Scheduler 100 may then start one or more of the threads 104 .
- scheduler 100 initially starts threads 104 A and 104 B. However, for some reason, scheduler 100 may not start thread 104 C. Threads 104 A and 104 B, once started by the scheduler 100 , begin performing subtasks from their respective deques 106 A and 106 B.
- a thread 104 may pop subtasks to be performed from the bottom of its associated deque 106 . If additional subtasks that need to be performed by the thread 104 are discovered during performance of one of the subtasks, the thread may push the newly discovered subtask onto the bottom of its associated deque 106 .
- a garbage collector worker thread may discover an object that needs to be evaluated which is referenced by another object being evaluated in performing a particular subtask, and may push a subtask onto the bottom of associated deque 106 for that discovered object.
- threads 104 A and 104 B continue to perform subtasks. However, thread 104 B has completed all subtasks in its deque 106 B, which is now empty. Thread 104 B may then attempt to “steal” work (subtasks) from other threads' deques 106 . According to the deque mechanism described in the paper by Arora et al., to “steal” work, a thread 104 may pop subtasks to be performed from the top of another thread's deque 106 . In this example, thread 104 B may steal work (subtasks) from deque 106 C. Note that thread 104 B may also steal work from deque 106 A.
- thread 104 A may also steal work from thread 104 C's deque 106 C.
- thread 104 B may discover new subtasks that are pushed onto the bottom of its associated deque 106 B.
- threads 104 A and 104 B have completed all subtasks of the overall task, either by performing subtasks from their associated deques 106 or by stealing subtasks from other threads 104 , such as thread 104 C.
- the thread “checks in” at consensus barrier 102 .
- both threads 104 A and 104 B have checked in at consensus barrier 102 .
- consensus barrier may be, but is not necessarily, something as simple as a count of all threads 104 that are scheduled to perform a task, and checking in at consensus barrier 102 may include decrementing this count.
- thread 104 C has not yet been started by scheduler 100 , even though the task has been completed.
- consensus barrier 102 may still prevent the “stop world” operation from completing, even though the task is otherwise complete.
- the suspended application may have to wait for the scheduler to start thread 104 C, at which point thread 104 C would discover that it has no work to perform and thus checks in at consensus barrier 102 , allowing the “stop world” operation to complete.
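The countdown-style consensus barrier described above, and the stall caused by a thread that is never scheduled, can be sketched as follows. This is a minimal Python illustration; the lock stands in for an atomic decrement, and all names are hypothetical.

```python
import threading

class ConsensusBarrier:
    """Countdown consensus barrier: initialized to the number of worker
    threads; each thread 'checks in' by decrementing. The stop-world
    operation completes only when the count reaches zero."""
    def __init__(self, n):
        self._count = n
        self._lock = threading.Lock()   # stands in for an atomic decrement

    def check_in(self):
        with self._lock:
            self._count -= 1
            return self._count == 0     # True once all threads have checked in

    def done(self):
        with self._lock:
            return self._count == 0

barrier = ConsensusBarrier(3)
barrier.check_in()          # thread 104A finishes and checks in
barrier.check_in()          # thread 104B finishes and checks in
assert not barrier.done()   # thread 104C never started, so the barrier
                            # stalls and the suspended application waits
```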
- Embodiments of a method and apparatus for barrier synchronization of threads are described.
- Embodiments may provide a consensus barrier synchronization mechanism, or simply barrier synchronization mechanism, that allows a “stop world” operation being performed by two or more worker threads configured to “steal” work from other threads to complete, even if one or more of the threads are not scheduled/started by the thread scheduler and thus do not rendezvous or “check in” at a consensus barrier in a timely manner.
- portions (subtasks) of the overall task which were assigned to the tardy thread may have been completed by other work-stealing threads, and one of the other threads may check in the tardy thread at the consensus barrier upon determining that the thread is dormant and does not have any more apportioned work to be performed.
- the task being performed may be garbage collection for a process, such as an application, on a system, and the threads may be garbage collector worker threads.
- a thread may be checked in by another thread.
- the threads may be started with a state field for each thread stored in a vector or array, referred to herein as a thread state array, that includes one field or entry for each worker thread.
- a thread may be in one of three states: dormant, active, or stealing. Initially, all threads are in a dormant state.
- When a thread wakes up or is started by a scheduler to perform work on the task, the thread attempts to atomically change its dormant state to an active state in the thread state vector. If the thread is able to change its state to active, the thread may begin working on subtasks of the task that have been apportioned to it, and the thread has the responsibility for checking itself in at the consensus barrier when all available work has been completed.
- one or more of the threads may, for some reason, not be started, and thus may remain in a dormant state as indicated by the thread state array. If, when a thread is active and performing work, the thread needs to steal additional work from another thread, the thread first changes its state from active to stealing. The stealing thread then looks for a victim thread to steal work from. If, while looking for a victim thread, the stealing thread finds a thread whose work queue is empty (the work may already have been performed by the stealing thread or by other stealing threads), and whose state field indicates the thread is still in a dormant state, the stealing thread may change the state in the state field for the dormant thread to a stealing state.
- the stealing thread has taken over responsibility for checking in that thread at the consensus barrier.
- the stealing thread may decrement an active thread count of the consensus barrier (initially set to N, indicating the total number of threads initially scheduled to perform the task) so that the dormant thread is effectively checked in.
- its apportioned work may have been performed by one or more stealing threads, and one of the threads may have checked in at the consensus barrier for the dormant thread.
- the consensus barrier will have been met, and the suspended application may be able to resume operation in a timely manner.
- FIG. 1 illustrates an exemplary system in which various levels of source code may result in the machine instructions that a processor executes.
- FIGS. 2A through 2C illustrate an exemplary mechanism for scheduling several worker threads to perform a task apportioned among the threads as several “subtasks” stored in deques during an exemplary “stop world” operation, and using a consensus barrier to rendezvous the threads when done.
- FIGS. 3A through 3G illustrate an exemplary barrier synchronization mechanism for scheduling three worker threads to perform a task apportioned among the threads as several “subtasks” stored in deques during an exemplary “stop world” operation, and using a thread state array and a consensus barrier with an active thread count to rendezvous the threads, including threads that never wake up, according to one embodiment.
- FIGS. 4A and 4B are flowcharts of a barrier synchronization method for a multithread process such as a multithreaded garbage collector mechanism according to one embodiment.
- FIG. 5 illustrates a system implementing a barrier synchronization mechanism for a multithread process such as a multithreaded garbage collector mechanism according to one embodiment.
- Embodiments of a method and apparatus for barrier synchronization of threads are described.
- consensus barrier mechanisms may be used as a rendezvous for worker threads in which all the threads must rendezvous at the consensus barrier for the task to complete, allowing a suspended application to be restarted.
- conventional consensus barrier mechanisms have to wait for all the threads to rendezvous at the consensus barrier, including any threads that were not scheduled/started by the scheduler immediately. This may result in long delays, even multi-second delays, on even a moderately busy machine.
- Embodiments may provide a consensus barrier synchronization mechanism, or simply barrier synchronization mechanism, that allows a “stop world” operation being performed by two or more worker threads configured to “steal” work from other threads to complete, even if one or more of the threads are not scheduled/started by the thread scheduler and thus do not rendezvous or “check in” at a consensus barrier in a timely manner.
- portions (subtasks) of the overall task which were assigned to the tardy thread may have been completed by other work-stealing threads, and one of the other threads may check in the tardy thread at the consensus barrier upon determining that the thread is dormant and does not have any more apportioned work to be performed.
- states of the worker threads may be “memoized,” or recorded, in an array, vector, or similar data structure. This structure may be referred to herein as a thread state array. In one embodiment, three states for worker threads may be used: dormant, active, and stealing.
- the barrier synchronization mechanism may implement more or fewer states.
- while the names that are used are indicative of the states of the threads, other names for these states may be substituted.
- any of a variety of data structures may be used for such a thread state array, including, but not limited to: an array including an object, which may include one or more fields, for each thread; a bit field or vector wherein a certain number of bits are assigned to each thread to indicate the state of the thread; a vector of bytes wherein one or more bytes are assigned to each thread to indicate the state of the thread; and so on.
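One such thread state array, with the atomic per-entry state transition described below, can be sketched as follows. Python is used for illustration; the lock simulates the per-entry atomic instruction (CAS, LDSTUB, or LL/SC) a real implementation would use, and the class and constant names are hypothetical.

```python
import threading

DORMANT, ACTIVE, STEALING = 0, 1, 2

class ThreadStateArray:
    """Sketch of a thread state array: one entry per worker thread, updated
    only through an atomic compare-and-set style transition."""
    def __init__(self, n_threads):
        self._states = [DORMANT] * n_threads   # initially all threads dormant
        self._lock = threading.Lock()          # stands in for per-entry CAS

    def transition(self, tid, expected, new):
        with self._lock:
            if self._states[tid] == expected:
                self._states[tid] = new
                return True
            return False                       # lost the race; state unchanged

    def get(self, tid):
        with self._lock:
            return self._states[tid]

states = ThreadStateArray(3)
assert states.transition(0, DORMANT, ACTIVE)      # thread 0 wakes and activates
assert not states.transition(0, DORMANT, ACTIVE)  # a second attempt fails
assert states.transition(2, DORMANT, STEALING)    # a stealer claims dormant thread 2
```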
- a global active thread count may be used as a “countdown” consensus barrier. When a thread becomes a stealer thread and checks in at the consensus barrier, the active thread count is decremented. When the active thread count indicates that all threads have checked in, the “stop world” operation is completed and any application(s) that are suspended may be allowed to resume operations. Note that, in one embodiment, a “stealing” thread may decrement the active thread count for a dormant thread (a thread that has not started) and that has no more work to do, as that work has been “stolen” by one or more work-stealing threads.
- work may be distributed to the N worker threads before the worker threads are started.
- worker threads may instead or in addition to the above accumulate work by taking from a global/shared source (for example, the scanning of thread stacks and other root sources).
- the threads are all in a “dormant” state as indicated in the thread state array, and the active thread count is initialized to N (the number of threads that have been apportioned work to do).
- each worker thread may atomically change its state (e.g., using a suitable atomic instruction such as load-store-unsigned-byte (LDSTUB), compare-and-swap (CAS), or load-linked/store-conditionally (LL/SC)) from dormant to active in the thread state array. If the change of state is successful, the thread continues and proceeds to perform work (e.g., from subtasks stored in an associated deque).
- worker threads that exhaust their own sources of work may steal work from other “victim” threads.
- a worker thread may examine the thread state array to determine the states of other threads, looking for threads that are either dormant or active (stealing threads may have no more work to do in their own deques), and may also examine the work assigned to the other threads, for example subtasks in the threads' associated deques. If another thread has locally assigned, and stealable, work, the thread may attempt to steal work from the other thread.
- the stealing thread may compete to atomically change that victim thread's state in the thread state array to “stealing”. In one embodiment, if the thread succeeds in changing the state of the victim thread to “stealing”, the thread may then decrement the active thread count, effectively “checking in” the victim thread at the consensus barrier. In another embodiment, the thread may just change the state of the victim thread to “stealing” and not immediately decrement the active thread count.
- a thread (possibly, but not necessarily, the same thread) or, alternatively, some other process may check the state of the threads in the thread state array and, if all threads are in the “stealing” state, decrement the active count accordingly.
- if the victim thread is later started by the scheduler, it may check its state in the thread state array and discover that it is in a “stealing” state, which may indicate to the thread that it has already been checked in at the consensus barrier by a stealing thread.
- the victim (now stealing) thread may decrement the active thread count if there is no more work for it to do (note that one or more other threads may still be active).
- the thread may check to see if it has work to do (e.g., entries in its associated deque) and, if not, may itself attempt to steal work from other threads (changing its state to “stealing” in the thread state array).
- the thread may check the thread state array to see if there are any active (or dormant) threads from which work may be stolen.
- the thread may check the active thread count of the consensus barrier to determine if there may be work left to do and, if so, attempt to steal work from other threads.
- the thread may attempt to change its state in the thread state array to “active”. In one embodiment, the thread may increment the active thread count to correctly reflect the number of active threads, if the active thread count was decremented by the stealer thread that marked the thread as “stealing”. The thread may then perform whatever work it can find to do and check in at the consensus barrier as normal when no more work is found for it to perform. If no work is left to do, then the thread may terminate or go back into a dormant state.
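The check-in/revival protocol described in the bullets above amounts to a countdown consensus barrier. A hedged Python sketch (class and method names are invented; a lock approximates atomic increment/decrement):

```python
import threading

class ConsensusBarrier:
    """Hypothetical countdown consensus barrier: the count starts at N and
    reaches zero when every thread has checked in, or has been checked in
    by a stealing thread on its behalf."""
    def __init__(self, n_threads):
        self._active = n_threads
        self._lock = threading.Lock()

    def check_in(self):
        """Decrement the active thread count; returns the new count."""
        with self._lock:
            self._active -= 1
            return self._active

    def revive(self):
        """Re-increment the count when a thread marked 'stealing' finds
        work and becomes active again."""
        with self._lock:
            self._active += 1

    def all_checked_in(self):
        with self._lock:
            return self._active == 0
```

A thread that was checked in by a stealer but then finds work would call `revive()` before working, and `check_in()` again when its work is exhausted.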
- “tardy” threads may not prevent the active threads from performing all the work and rendezvousing on the consensus barrier.
- the consensus barrier active thread count indicates that all threads have checked in, even if one or more threads have not been started by the scheduler.
- the “stop world” operation may then be ended, allowing any suspended applications to continue operations.
- embodiments of the barrier synchronization mechanism enable work-stealing consensus barriers to reach a consensus that all worker threads have rendezvoused even if some worker threads were never scheduled/started by the scheduler.
- Embodiments may provide a simpler, and faster, rendezvous mechanism for work-stealing threads than conventional mechanisms. This may be especially advantageous for smaller tasks, but note that these advantages may extend to larger tasks as well.
- Embodiments also provide a simple summary structure (the thread state array) that may be examined by stealing threads to identify possible victims while minimizing the impact on the stealing threads' hardware caches, which may be beneficial, for example, on a processor supporting multiple hardware threads or cores sharing a cache that has limited cache set-associativity such as the Niagara CMT processor.
- Embodiments of the barrier synchronization mechanism may be particularly useful, for example, for garbage collectors and similar mechanisms whose parallel phases are sufficiently short in duration that scheduling of threads may have a significant impact on scaling. While garbage collection is used as an exemplary application for embodiments of the barrier synchronization mechanism, note that embodiments may be used in any multithreaded process configured to allocate portions of a task to multiple threads.
- FIGS. 3A through 3G illustrate an exemplary barrier synchronization mechanism for scheduling three worker threads to perform a task apportioned among the threads as several “subtasks” stored in deques during an exemplary “stop world” operation, and using a thread state array and consensus barrier with active thread count to rendezvous the threads, including threads that never wake up, according to one embodiment.
- FIGS. 3A through 3G illustrate an exemplary barrier synchronization mechanism for scheduling three worker threads to perform a task apportioned among the threads as several “subtasks” stored in deques during an exemplary “stop world” operation, and using a thread state array and consensus barrier with active thread count to rendezvous the threads, including threads that never wake up, according to one embodiment.
- the threads depicted in FIGS. 3A through 3G may be worker threads of a garbage collection mechanism, as previously described.
- deques are used as an exemplary mechanism for storing subtasks to be performed by worker threads, some embodiments may use other mechanisms to store subtasks for worker threads.
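As a concrete stand-in for the deques mentioned above, the following sketch lets the owner work from one end while stealers take subtasks from the other. It is a simplified illustration, not the patented implementation: a single lock replaces the lock-free deque protocols used in practice, and the names are invented.

```python
import threading
from collections import deque

class WorkDeque:
    """Hypothetical per-thread work deque: the owner pushes and pops
    subtasks at the bottom; stealing threads pop from the top."""
    def __init__(self, subtasks=()):
        self._dq = deque(subtasks)
        self._lock = threading.Lock()

    def push_bottom(self, subtask):
        with self._lock:
            self._dq.append(subtask)

    def pop_bottom(self):
        """Owner takes its next subtask, or None if the deque is empty."""
        with self._lock:
            return self._dq.pop() if self._dq else None

    def steal_top(self):
        """A stealing thread takes a subtask from the opposite end."""
        with self._lock:
            return self._dq.popleft() if self._dq else None

    def empty(self):
        with self._lock:
            return not self._dq
```

Taking from opposite ends keeps the owner and stealers from contending for the same subtask except when a single entry remains.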
- work may be distributed to the worker threads before the worker threads are started.
- worker threads may instead or in addition to the above accumulate work by taking from a global/shared source (for example, the scanning of thread stacks and other root sources).
- scheduler 200 starts threads 204 A and 204 B, but for some reason does not start thread 204 C.
- worker threads 204 A and 204 B may atomically change their states (e.g., using LDSTUB, CAS, or LL/SC) from dormant to active in the thread state array 208 . If the change of state is successful, each thread 204 continues and proceeds to perform work (e.g., from subtasks stored in a deque 206 associated with the thread 204 ).
- worker threads that exhaust their own sources of work may steal work from other “victim” threads.
- thread 204 B has emptied its associated deque 206 B, and thus may look for other work to steal from other threads.
- thread 204 B may change its state in thread state array 208 from “active” to “stealing”.
- worker thread 204 B may examine the work assigned to the other threads, for example by examining the threads' associated deques 206 .
- worker thread 204 B may examine the thread state array 208 to determine the states of other threads, looking for threads 204 that are either dormant or active (stealing threads may have no more work to do in their own deques).
- a stealer thread finds another (active or dormant) thread that has locally assigned, and stealable, work, the thread may attempt to steal work from the other thread.
- thread 204 B finds work to do in dormant thread 204 C's deque 206 C, and steals work from the deque by “popping” tasks off the top of the deque 206 C.
- thread 204 B may also steal work from active thread 204 A's deque 206 A.
- thread 204 B may change its state in thread state array 208 to indicate that it is active (and may again change its state back to “stealing” when the stolen work is completed).
- thread 204 B may increment the active thread count and then attempt to steal work from the deque 206 C. If thread 204 B is successful in stealing work from the deque, the active thread count thus reflects that thread 204 B is active. If thread 204 B fails in stealing work from the deque, thread 204 B may then decrement the active thread count and again look for a victim thread from which to attempt to steal work.
- a stealer thread finds a potential victim thread that does not have work available (e.g., if its deque 206 is empty), and the victim thread is in a “dormant” state as indicated by the thread state array 208 , the stealing thread may compete to atomically change that victim thread's state in the thread state array to “stealing”.
- as shown in FIG. 3D , thread 204 B may examine the state of thread 204 C in thread state array 208 to determine that thread 204 C is dormant, and may examine the deque 206 C associated with thread 204 C, determining that thread 204 C has no more work to be performed in deque 206 C.
- thread 204 B may change the state of thread 204 C to “stealing” in thread state array 208 , and in one embodiment may decrement the consensus barrier 202 active thread count, which now indicates two active threads. This effectively “checks in” thread 204 C. Note that, in FIG. 3D , thread 204 A is still active and performing work from its deque 206 A.
- thread 204 B may just change the state of the victim thread to “stealing” and not immediately decrement the active thread count.
- alternatively, a thread (possibly, but not necessarily, the same thread) or some other process may check the state of the threads in the thread state array and, if all threads are in the “stealing” state, decrement the active count accordingly.
- thread 204 C may check its state in thread state array 208 and determine that it is in a stealing state, and thus know that it has been checked in at the consensus barrier 202 by another thread 204 . In one embodiment, if there is still work to be done, thread 204 C may then try to steal work itself, and in one embodiment may correct (increment) the active thread count to indicate that it is now “active”, later decrementing the active thread count when it can find no more work to do. Note that, in one embodiment, if thread 204 C finds work to do, it may change its state to “active” in the thread state array 208 while performing the work.
- thread 204 B may find no more work to steal. Thread 204 B may thus check itself in at the consensus barrier 202 by decrementing the active thread count, which now indicates one active thread (thread 204 A).
- thread 204 A may exhaust its deque 206 A, finishing the last subtask from the deque, and thus may change its state to “stealing” in the thread state array 208 .
- thread 204 A may check itself in at the consensus barrier 202 by decrementing the active thread count, which now indicates zero active threads. Thread 204 A may then atomically examine the active thread count and, finding that the count is zero, determine that there is no more work to be performed.
- FIGS. 3A through 3G illustrate an exemplary embodiment, and are not intended to be limiting. Also note that the order of the operations in FIGS. 3A through 3G may be different. For example, thread 204 A may check itself in before thread 204 B checks itself in (or checks in thread 204 C) at the consensus barrier 202 .
- the system knows that the task is complete, and thus the “stop world” process may be ended and any suspended applications may resume work. Also note that this may occur even though one or more threads, such as thread 204 C, never “woke up” and thus never checked in at the consensus barrier 202 , instead being checked in by one or more other threads.
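The FIGS. 3A through 3G walkthrough can be condensed into a sequential replay. The snippet below is purely illustrative: real embodiments interleave these steps across OS threads and use atomic instructions rather than plain assignments, and the thread/subtask names are invented.

```python
# Sequential replay of the three-thread scenario (illustrative only).
state = {"A": "dormant", "B": "dormant", "C": "dormant"}
deques = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2"]}
active = 3                        # consensus barrier count starts at N

# FIG. 3A: the scheduler starts threads A and B, but never starts C.
state["A"] = state["B"] = "active"

# A and B work from their own deques; B empties its deque first and
# becomes a stealer (FIG. 3B).
deques["B"].clear()
state["B"] = "stealing"

# FIG. 3C: B steals subtasks off the top of dormant C's deque and performs them.
while deques["C"]:
    deques["C"].pop(0)

# FIG. 3D: C is dormant with an empty deque, so B marks it "stealing" and
# checks it in at the consensus barrier on its behalf.
state["C"] = "stealing"
active -= 1

# FIG. 3E: B finds no more work to steal and checks itself in.
active -= 1

# FIGS. 3F/3G: A exhausts its own deque and checks itself in; the count
# reaches zero, so the "stop world" operation may end.
deques["A"].clear()
state["A"] = "stealing"
active -= 1

assert active == 0   # consensus reached even though C never woke up
```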
- threads that remain dormant for an inordinate amount of time do not prevent the task from being completed, and do not cause the consensus barrier to prevent suspended applications from restarting once the overall task has been completed.
- FIGS. 4A and 4B are flowcharts of a barrier synchronization method for a multithread process such as a multithreaded garbage collector mechanism according to one embodiment.
- FIG. 4A is a flowchart of the initial configuration and initiation of the process.
- subtasks of a task to be performed may be apportioned among N threads.
- each thread may have an associated deque (or other structure) for storing subtasks to be performed.
- the thread state array indicates that all threads are “dormant”, and the active thread count of the consensus barrier indicates that all threads are active (i.e. the active thread count is initially N).
- a scheduler starts one or more of the N threads.
- the process may be a “stop world” process in which, for example, an application is suspended while the process is being performed by the threads.
- the started threads may atomically change their states to “active” in the thread state array.
- the scheduler may start all of the N threads.
- as a thread completes its work (which may include stealing available work from other threads, in the process of which a thread may change its state to “stealing” in the thread state array), the thread checks in at a consensus barrier (e.g., by decrementing the active thread count).
- the scheduler may be a component of a larger overall system, and may be interrupted to manage threads for other processes, including system threads and other application threads.
- embodiments of the barrier synchronization mechanism may provide a mechanism for “stealing” threads to check in dormant threads that have no work left to do at the consensus barrier, allowing the “stop world” process to complete, and applications to resume, in a timely manner.
- FIG. 4B illustrates a method for a thread that exhausts its work to “steal” work from other threads, and in so doing to attempt to “check in” dormant threads that have no work left to do at the consensus barrier, if any such dormant threads are found.
- the consensus barrier tracks the number of active threads as an active thread count.
- the active thread count is initialized to N, where N represents the total number of threads that have been apportioned work for the task.
- an active thread count of 0 indicates to all threads that all work has been done and thus the task is complete.
- stealing threads may use the active thread count maintained at the consensus barrier to determine when to stop work on the task, as well as to indicate that they are about to steal.
- a thread may exhaust the work from its associated deque.
- the thread may become a “stealing” thread, and in so doing may atomically change its state to “stealing” in the thread state array.
- the stealing thread may then decrement the active thread count to indicate that it is stealing and not active.
- the stealing thread may then atomically check the active thread count and, if the active thread count is 0, the thread knows that the task is complete (i.e., that all other threads have checked in or been checked in at the consensus barrier), and so no further action may be required of the thread, as indicated at 408 .
- the thread may then check the state of other threads in the thread state array looking for a potential victim thread (a dormant or active thread) to steal work from.
- the “stealing” thread may ignore all threads whose state is “stealing” in the thread state array.
- the “stealing” thread may examine the deques of other threads to see if any threads have stealable work, and if a thread is found that does have stealable work, the “stealing” thread may then check the potential victim thread's state in the thread state array.
- the stealing thread may or may not find a dormant or an active thread as a potential victim to steal work from, as indicated at 412 . If, at 412 , a victim thread is not found, then the stealing thread may return to 406 to check the active thread count, and proceed again from there according to the value of the active thread count. In one embodiment, a thread returning to 406 may keep track of the number of attempts to locate a victim thread and, in some embodiments, may use an exponential-backoff delay technique for which the duration increases as the number of repeated attempts increases. If, at 412 , a victim thread is found, then the stealing thread increments the active thread count at the consensus barrier, as indicated at 414 . This may indicate that the stealing thread potentially may find more work to do, and may indicate to other threads that may check the consensus barrier that the task may not be complete.
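The exponential-backoff delay mentioned at 412 might look like the following sketch. The patent specifies only that the delay grows as the number of repeated attempts grows; the base and cap constants and the random jitter here are invented.

```python
import random

def backoff_delay(attempt, base=0.0001, cap=0.01):
    """Delay (in seconds) before retrying the victim search: doubles per
    failed attempt, capped, with random jitter so retries do not proceed
    in lock step."""
    return min(cap, base * (2 ** attempt)) * random.random()
```

A caller would typically `time.sleep(backoff_delay(n))` after the n-th failed attempt to locate a victim, then recheck the active thread count.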
- the stealing thread may attempt to steal work from the active thread, if any is available, as indicated at 418 .
- the stealing thread may atomically change its state in the thread state array to “active” as indicated at 440 , and then may begin work on the stolen task as indicated at 450 .
- the stealing thread may atomically change its state in the thread state array to “active” after a successful steal.
- the stealing thread may atomically change its state in the thread state array to “active” when the first work item is generated and pushed onto its deque.
- if the steal attempt is not successful, the stealing thread may atomically decrement the active thread count at the consensus barrier, as indicated at 404 , and continue again from there.
- the stealing thread may determine if the dormant thread has any work in its deque that may be stealable, as indicated at 422 . If, at 422 , the dormant thread does have work in its deque, then the stealing thread may attempt to steal work from the dormant thread, as indicated at 424 . At 420 , if the steal attempt is successful, then the stealing thread may begin work on the stolen task, as indicated at 450 . At 420 , if the steal attempt is not successful, then the stealing thread may atomically decrement the active thread count at the consensus barrier, as indicated at 404 , and continue again from there.
- the stealing thread may attempt to change the dormant thread's state to “stealing” in the thread state array, as indicated at 426 .
- the stealing thread may atomically decrement the active thread count (for itself) at the consensus barrier, as indicated at 404 , and continue again from there.
- the stealing thread may then decrement the active thread count for the formerly dormant, now “stealing”, thread to effectively check in the thread at the consensus barrier, as indicated at 430 .
- the stealing thread may then atomically decrement the active thread count (for itself) at the consensus barrier, as indicated at 404 , and continue again from there.
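Steps 402 through 450 of FIG. 4B can be condensed into a single-threaded sketch. This is a simplified stand-in, not the patented implementation: shared structures are plain Python lists and dicts, atomicity and the backoff at 412 are omitted, and performing a stolen subtask is reduced to popping it from the victim's deque.

```python
def steal_loop(self_id, states, deques, counter):
    """Condensed FIG. 4B flow; flowchart step numbers appear as comments."""
    states[self_id] = "stealing"                   # 402: own deque exhausted
    counter["active"] -= 1                         # 404: check self in
    while True:
        if counter["active"] == 0:                 # 406
            return "task complete"                 # 408: all checked in
        # 410/412: look for a dormant or active victim thread
        victim = next((t for t, s in enumerate(states)
                       if t != self_id and s != "stealing"), None)
        if victim is None:
            continue                               # (backoff at 412 omitted)
        counter["active"] += 1                     # 414: may find more work
        if deques[victim]:                         # 418/422: stealable work?
            deques[victim].pop(0)                  # 420: steal succeeds
            states[self_id] = "active"             # 440
            # 450: perform the stolen subtask (elided), then resume stealing
            states[self_id] = "stealing"
            counter["active"] -= 1                 # back through 404
        else:
            if states[victim] == "dormant":        # 426: claim workless victim
                states[victim] = "stealing"
                counter["active"] -= 1             # 430: check the victim in
            counter["active"] -= 1                 # 404: for itself
```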
- a “workless” dormant thread may be checked in by another (stealing) thread, and will thus not prevent the “stop world” process from completing while waiting for the dormant thread to wake up and check in.
- a stealing thread may repeatedly steal work from a dormant thread's deque until the dormant thread's deque is empty.
- a stealing thread that subsequently examines the dormant thread looking for stealable work may then discover that the dormant thread has no more work, and may then attempt to change the dormant thread's state and to check in the dormant thread at the consensus barrier, as described above.
- a thread examining the thread state array may discover that all other threads are marked as “stealing”, and thus would know that there is no more stealable work. The thread may then check in at the consensus barrier to allow the “stop world” process to end and any suspended applications to resume.
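The termination check described above is a simple scan of the thread state array; a hedged sketch (function name invented):

```python
def no_stealable_work(states, self_id):
    """True if every other thread is marked 'stealing', i.e., no thread
    can have locally assigned work left to steal."""
    return all(s == "stealing"
               for tid, s in enumerate(states) if tid != self_id)
```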
- FIG. 5 illustrates a system implementing a barrier synchronization mechanism for a multithread process such as a multithreaded garbage collector mechanism according to one embodiment.
- System 250 may be any of various types of devices, including, but not limited to, a personal computer system, desktop computer, laptop or notebook computer, server computer, mainframe computer system, workstation, network computer, Personal Digital Assistant (PDA), smartphone, or other suitable device.
- System 250 may include at least one processor 252 .
- the processor 252 may be coupled to a memory 254 .
- Memory 254 is representative of various types of possible memory media, also referred to as “computer readable media” or “computer-accessible media.” Hard disk storage, floppy disk storage, removable disk storage, flash memory and random access memory (RAM) are examples of memory media.
- “memory” and “memory medium” may include an installation medium, e.g., a CD-ROM or floppy disk, a computer system memory such as DRAM, SRAM, EDO RAM, SDRAM, DDR SDRAM, Rambus RAM, etc., or a non-volatile memory such as magnetic media, e.g., a hard drive, or optical storage.
- the memory medium may include other types of memory as well, or combinations thereof.
- System 250 may include, in memory 254 , an instance of a barrier synchronization mechanism 260 as described herein.
- Memory 254 may also include a thread scheduler 200 (which may be part of an operating system on system 250 ), one or more applications 270 , two or more threads 204 , and two or more “deques” 206 or other structures for storing tasks to be performed for each thread 204 .
- Barrier synchronization mechanism 260 may allow a “stop world” operation, for example on application 270 , being performed by two or more worker threads 204 configured to “steal” work from other threads, to complete even if one or more of the threads 204 are not scheduled/started by the thread scheduler 200 and thus do not rendezvous or “check in” at consensus barrier 202 in a timely manner; the overall task, including portions (subtasks) of the task that were assigned to the tardy thread 204 , is instead completed by the other threads 204 .
- states of the worker threads 204 may be “memoized” or recorded in thread state array 208 .
- three states for worker threads may be used: dormant, active, and stealing.
- a global active thread count may be used as a countdown consensus barrier. When a thread completes and checks in at the consensus barrier, the active thread count is decremented. When the active thread count indicates that all threads 204 have checked in, the “stop world” operation is completed and any application(s) 270 that are suspended may be allowed to resume operations. Note that, in one embodiment, a “stealing” thread 204 may decrement the active thread count for a dormant thread (a thread that has not started) that has no more work to do.
- work may be distributed to the N worker threads 204 before the worker threads are started.
- the threads 204 are all in a “dormant” state in thread state array 208 and the active thread count is N.
- each worker thread may atomically change its state from dormant to active in the thread state array 208 . If the change of state is successful, the thread 204 continues and proceeds to perform work (e.g., from subtasks stored in its associated deque 206 ).
- worker threads 204 that exhaust their own sources of work may steal work from other “victim” threads.
- a worker thread may examine the thread state array 208 to determine the states of other threads, looking for threads that are either dormant or active (stealing threads may have no more work to do in their deques 206 ), and may also examine the work assigned to the other threads, for example subtasks in the threads' associated deques 206 . If another thread 204 has locally assigned, and stealable, work, the thread may attempt to steal work from the other thread.
- the stealing thread 204 may compete to atomically change that victim thread's state in the thread state array 208 to “stealing”. In one embodiment, if the thread succeeds in changing the state of the victim thread to “stealing”, the thread may then decrement the active thread count, effectively “checking in” the victim thread at the consensus barrier 202 . In another embodiment, the thread may just change the state of the victim thread to “stealing” and not immediately decrement the active thread count.
- a thread (possibly, but not necessarily, the same thread) or, alternatively, some other process may check the state of the threads in the thread state array 208 and, if all threads are in the “stealing” state, decrement the active count accordingly.
- a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD-ROM, volatile or non-volatile media such as RAM (e.g. SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc.
- a computer-accessible medium may further include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link.
Description
- assume N worker threads are blocked waiting for a request
- start N threads on a task (each thread may perform one or more subtasks of the task, as assigned)
- rendezvous the threads on a consensus barrier when the task is done
- worker thread states: dormant, active, stealing
Claims (54)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/147,066 US7945911B1 (en) | 2005-06-03 | 2005-06-03 | Barrier synchronization method and apparatus for work-stealing threads |
Publications (1)
Publication Number | Publication Date |
---|---|
US7945911B1 true US7945911B1 (en) | 2011-05-17 |
Family
ID=43981683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/147,066 Active 2030-03-15 US7945911B1 (en) | 2005-06-03 | 2005-06-03 | Barrier synchronization method and apparatus for work-stealing threads |
Country Status (1)
Country | Link |
---|---|
US (1) | US7945911B1 (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080301677A1 (en) * | 2007-05-30 | 2008-12-04 | Samsung Electronics Co., Ltd. | Apparatus and method for parallel processing |
US20090063687A1 (en) * | 2007-08-28 | 2009-03-05 | Red Hat, Inc. | Hybrid connection model |
US20100011362A1 (en) * | 2008-07-14 | 2010-01-14 | Michael Maged M | Methods for single-owner multi-consumer work queues for repeatable tasks |
US20100269110A1 (en) * | 2007-03-01 | 2010-10-21 | Microsoft Corporation | Executing tasks through multiple processors consistently with dynamic assignments |
US20100333107A1 (en) * | 2009-06-26 | 2010-12-30 | Microsoft Corporation | Lock-free barrier with dynamic updating of participant count |
US20110119469A1 (en) * | 2009-11-13 | 2011-05-19 | International Business Machines Corporation | Balancing workload in a multiprocessor system responsive to programmable adjustments in a syncronization instruction |
US20110173629A1 (en) * | 2009-09-09 | 2011-07-14 | Houston Michael | Thread Synchronization |
US20110296420A1 (en) * | 2010-05-25 | 2011-12-01 | Anton Pegushin | Method and system for analyzing the performance of multi-threaded applications |
US20120066683A1 (en) * | 2010-09-09 | 2012-03-15 | Srinath Nadig S | Balanced thread creation and task allocation |
US20120304178A1 (en) * | 2011-05-24 | 2012-11-29 | International Business Machines Corporation | Concurrent reduction optimizations for thieving schedulers |
US20130212593A1 (en) * | 2012-02-10 | 2013-08-15 | International Business Machines Corporation | Controlled Growth in Virtual Disks |
US20130247069A1 (en) * | 2012-03-15 | 2013-09-19 | International Business Machines Corporation | Creating A Checkpoint Of A Parallel Application Executing In A Parallel Computer That Supports Computer Hardware Accelerated Barrier Operations |
US20140298352A1 (en) * | 2013-03-26 | 2014-10-02 | Hitachi, Ltd. | Computer with plurality of processors sharing process queue, and process dispatch processing method |
US20150074682A1 (en) * | 2013-09-11 | 2015-03-12 | Fujitsu Limited | Processor and control method of processor |
US9195520B2 (en) | 2007-08-28 | 2015-11-24 | Red Hat, Inc. | Event driven sendfile |
US9317290B2 (en) | 2007-05-04 | 2016-04-19 | Nvidia Corporation | Expressing parallel execution relationships in a sequential programming language |
WO2016094016A1 (en) * | 2014-12-12 | 2016-06-16 | Intel Corporation | Technologies for efficient synchronization barriers with work stealing support |
US10009249B2 (en) | 2014-12-12 | 2018-06-26 | International Business Machines Corporation | System with on-demand state for applications |
US20190205178A1 (en) * | 2017-01-24 | 2019-07-04 | Oracle International Corporation | Distributed graph processing system featuring interactive remote control mechanism including task cancellation |
CN110908794A (en) * | 2019-10-09 | 2020-03-24 | 上海交通大学 | Task stealing method and system based on task stealing algorithm |
US20210286646A1 (en) * | 2019-09-15 | 2021-09-16 | Mellanox Technologies, Ltd. | Task completion system |
US11200164B2 (en) * | 2014-09-10 | 2021-12-14 | Oracle International Corporation | Coordinated garbage collection in distributed systems |
US20220164282A1 (en) * | 2020-11-24 | 2022-05-26 | International Business Machines Corporation | Reducing load balancing work stealing |
US20220171704A1 (en) * | 2019-05-31 | 2022-06-02 | Intel Corporation | Avoidance of garbage collection in high performance memory management systems |
US11461130B2 (en) | 2020-05-26 | 2022-10-04 | Oracle International Corporation | Methodology for fast and seamless task cancelation and error handling in distributed processing of large graph data |
US11822973B2 (en) | 2019-09-16 | 2023-11-21 | Mellanox Technologies, Ltd. | Operation fencing system |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030005029A1 (en) * | 2001-06-27 | 2003-01-02 | Shavit Nir N. | Termination detection for shared-memory parallel programs |
US20030005025A1 (en) * | 2001-06-27 | 2003-01-02 | Shavit Nir N. | Load-balancing queues employing LIFO/FIFO work stealing |
US20030005114A1 (en) | 2001-06-27 | 2003-01-02 | Shavit Nir N. | Globally distributed load balancing |
US6526422B1 (en) * | 2000-05-15 | 2003-02-25 | Sun Microsystems, Inc. | Striding-type generation scanning for parallel garbage collection |
US20040128401A1 (en) * | 2002-12-31 | 2004-07-01 | Michael Fallon | Scheduling processing threads |
US6823351B1 (en) | 2000-05-15 | 2004-11-23 | Sun Microsystems, Inc. | Work-stealing queues for parallel garbage collection |
US6826583B1 (en) | 2000-05-15 | 2004-11-30 | Sun Microsystems, Inc. | Local allocation buffers for parallel garbage collection |
US7069281B2 (en) * | 2003-02-24 | 2006-06-27 | Sun Microsystems, Inc. | Efficient collocation of evacuated objects in a copying garbage collector using variably filled local allocation buffers |
US7321989B2 (en) * | 2005-01-05 | 2008-01-22 | The Aerospace Corporation | Simultaneously multithreaded processing and single event failure detection method |
US7581222B2 (en) * | 2003-02-20 | 2009-08-25 | International Business Machines Corporation | Software barrier synchronization |
- 2005-06-03: US application US11/147,066 filed; granted as US7945911B1 (status: Active)
Non-Patent Citations (4)
Title |
---|
Arora, et al., "Thread Scheduling for Multiprogrammed Multiprocessors," Proceedings of the Tenth Annual ACM Symposium of Parallel Algorithms and Architectures, Jun. 1998, 11 pages. |
Blumofe, et al., "Scheduling Multithreaded Computations by Work Stealing," 1994, Proceedings of the 35th Annual IEEE Conference on Foundations of Computer Science, 13 pages. |
Cheng, et al., "A Parallel, Real-Time Garbage Collector," ACM 2001, pp. 125-136. |
Flood, et al., "Parallel Garbage Collection for Shared Memory Multiprocessors," sun.com/research/jtech, Apr. 2001, USENIX JVM Conference, 10 pages. |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8112751B2 (en) * | 2007-03-01 | 2012-02-07 | Microsoft Corporation | Executing tasks through multiple processors that process different portions of a replicable task |
US20100269110A1 (en) * | 2007-03-01 | 2010-10-21 | Microsoft Corporation | Executing tasks through multiple processors consistently with dynamic assignments |
US9317290B2 (en) | 2007-05-04 | 2016-04-19 | Nvidia Corporation | Expressing parallel execution relationships in a sequential programming language |
US8595726B2 (en) * | 2007-05-30 | 2013-11-26 | Samsung Electronics Co., Ltd. | Apparatus and method for parallel processing |
US20080301677A1 (en) * | 2007-05-30 | 2008-12-04 | Samsung Electronics Co., Ltd. | Apparatus and method for parallel processing |
US9195520B2 (en) | 2007-08-28 | 2015-11-24 | Red Hat, Inc. | Event driven sendfile |
US20090063687A1 (en) * | 2007-08-28 | 2009-03-05 | Red Hat, Inc. | Hybrid connection model |
US20100011362A1 (en) * | 2008-07-14 | 2010-01-14 | Michael Maged M | Methods for single-owner multi-consumer work queues for repeatable tasks |
US8266394B2 (en) * | 2008-07-14 | 2012-09-11 | International Business Machines Corporation | Methods for single-owner multi-consumer work queues for repeatable tasks |
US20100333107A1 (en) * | 2009-06-26 | 2010-12-30 | Microsoft Corporation | Lock-free barrier with dynamic updating of participant count |
US8924984B2 (en) * | 2009-06-26 | 2014-12-30 | Microsoft Corporation | Lock-free barrier with dynamic updating of participant count |
US9952912B2 (en) | 2009-06-26 | 2018-04-24 | Microsoft Technology Licensing, Llc | Lock-free barrier with dynamic updating of participant count using a lock-free technique |
US8832712B2 (en) * | 2009-09-09 | 2014-09-09 | Ati Technologies Ulc | System and method for synchronizing threads using shared memory having different buffer portions for local and remote cores in a multi-processor system |
US20110173629A1 (en) * | 2009-09-09 | 2011-07-14 | Houston Michael | Thread Synchronization |
US20110119469A1 (en) * | 2009-11-13 | 2011-05-19 | International Business Machines Corporation | Balancing workload in a multiprocessor system responsive to programmable adjustments in a synchronization instruction |
US9733831B2 (en) | 2009-11-13 | 2017-08-15 | Globalfoundries Inc. | Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses |
US8832403B2 (en) | 2009-11-13 | 2014-09-09 | International Business Machines Corporation | Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses |
US20110119470A1 (en) * | 2009-11-13 | 2011-05-19 | International Business Machines Corporation | Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses |
US20110296420A1 (en) * | 2010-05-25 | 2011-12-01 | Anton Pegushin | Method and system for analyzing the performance of multi-threaded applications |
US9183109B2 (en) * | 2010-05-25 | 2015-11-10 | Intel Corporation | Method and system for analyzing the performance of multi-threaded applications |
US20120066683A1 (en) * | 2010-09-09 | 2012-03-15 | Srinath Nadig S | Balanced thread creation and task allocation |
US20120304178A1 (en) * | 2011-05-24 | 2012-11-29 | International Business Machines Corporation | Concurrent reduction optimizations for thieving schedulers |
US8930955B2 (en) * | 2012-02-10 | 2015-01-06 | International Business Machines Corporation | Controlling growth in virtual disks via utilization of previously used and free disk block space |
US20130212593A1 (en) * | 2012-02-10 | 2013-08-15 | International Business Machines Corporation | Controlled Growth in Virtual Disks |
US20130247069A1 (en) * | 2012-03-15 | 2013-09-19 | International Business Machines Corporation | Creating A Checkpoint Of A Parallel Application Executing In A Parallel Computer That Supports Computer Hardware Accelerated Barrier Operations |
US9619277B2 (en) * | 2013-03-26 | 2017-04-11 | Hitachi, Ltd. | Computer with plurality of processors sharing process queue, and process dispatch processing method |
US20140298352A1 (en) * | 2013-03-26 | 2014-10-02 | Hitachi, Ltd. | Computer with plurality of processors sharing process queue, and process dispatch processing method |
US20150074682A1 (en) * | 2013-09-11 | 2015-03-12 | Fujitsu Limited | Processor and control method of processor |
US9626230B2 (en) * | 2013-09-11 | 2017-04-18 | Fujitsu Limited | Processor and control method of processor |
US11200164B2 (en) * | 2014-09-10 | 2021-12-14 | Oracle International Corporation | Coordinated garbage collection in distributed systems |
US11797438B2 (en) | 2014-09-10 | 2023-10-24 | Oracle International Corporation | Coordinated garbage collection in distributed systems |
US12117931B2 (en) | 2014-09-10 | 2024-10-15 | Oracle International Corporation | Coordinated garbage collection in distributed systems |
US10009248B2 (en) | 2014-12-12 | 2018-06-26 | International Business Machines Corporation | System with on-demand state for applications |
US10009249B2 (en) | 2014-12-12 | 2018-06-26 | International Business Machines Corporation | System with on-demand state for applications |
JP2017537393A (en) * | 2014-12-12 | 2017-12-14 | Intel Corporation | Technologies for efficient synchronization barriers with work stealing support [cross-reference to related U.S. patent application Ser. No. 14/568,831, filed Dec. 12, 2014, entitled "TECHNOLOGIES FOR EFFICIENT SYNCHRONIZATION BARRIERS WITH WORK STEALING SUPPORT"] |
WO2016094016A1 (en) * | 2014-12-12 | 2016-06-16 | Intel Corporation | Technologies for efficient synchronization barriers with work stealing support |
JP7030514B2 (en) | 2014-12-12 | 2022-03-07 | インテル コーポレイション | Efficient synchronization barrier technology with work stealing support |
US20190205178A1 (en) * | 2017-01-24 | 2019-07-04 | Oracle International Corporation | Distributed graph processing system featuring interactive remote control mechanism including task cancellation |
US10754700B2 (en) * | 2017-01-24 | 2020-08-25 | Oracle International Corporation | Distributed graph processing system featuring interactive remote control mechanism including task cancellation |
US20220171704A1 (en) * | 2019-05-31 | 2022-06-02 | Intel Corporation | Avoidance of garbage collection in high performance memory management systems |
US20210286646A1 (en) * | 2019-09-15 | 2021-09-16 | Mellanox Technologies, Ltd. | Task completion system |
US11847487B2 (en) * | 2019-09-15 | 2023-12-19 | Mellanox Technologies, Ltd. | Task completion system allowing tasks to be completed out of order while reporting completion in the original ordering |
US11822973B2 (en) | 2019-09-16 | 2023-11-21 | Mellanox Technologies, Ltd. | Operation fencing system |
CN110908794B (en) * | 2019-10-09 | 2023-04-28 | 上海交通大学 | Task stealing method and system based on task stealing algorithm |
CN110908794A (en) * | 2019-10-09 | 2020-03-24 | 上海交通大学 | Task stealing method and system based on task stealing algorithm |
US11461130B2 (en) | 2020-05-26 | 2022-10-04 | Oracle International Corporation | Methodology for fast and seamless task cancelation and error handling in distributed processing of large graph data |
US20220164282A1 (en) * | 2020-11-24 | 2022-05-26 | International Business Machines Corporation | Reducing load balancing work stealing |
US11645200B2 (en) * | 2020-11-24 | 2023-05-09 | International Business Machines Corporation | Reducing load balancing work stealing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7945911B1 (en) | Barrier synchronization method and apparatus for work-stealing threads | |
US7103887B2 (en) | Load-balancing queues employing LIFO/FIFO work stealing | |
US7016923B2 (en) | Multi-threaded garbage collector employing cascaded memory arrays of task identifiers to implement work stealing queues for task identification and processing | |
US7263700B1 (en) | Serially, reusable virtual machine | |
US8631219B2 (en) | Method and system for dynamic memory management | |
US7024436B2 (en) | Computer system with two heaps in contiguous storage | |
US8245239B2 (en) | Deterministic runtime execution environment and method | |
US7086053B2 (en) | Method and apparatus for enabling threads to reach a consistent state without explicit thread suspension | |
US8510710B2 (en) | System and method of using pooled thread-local character arrays | |
US7159215B2 (en) | Termination detection for shared-memory parallel programs | |
US20030093487A1 (en) | Method and apparatus for sharing code containing references to non-shared objects | |
WO2001097029A2 (en) | Method and apparatus for implementing an extended virtual machine | |
GB2378535A (en) | Method and apparatus for suspending a software virtual machine | |
US20020055929A1 (en) | Computer system with multiple heaps | |
US7600223B2 (en) | Abstracted managed code execution | |
US7743377B2 (en) | Cooperative threading in a managed code execution environment | |
JP2004503869A (en) | Method and apparatus for implementing a modular garbage collector | |
US20060101439A1 (en) | Memory management in a managed code execution environment | |
Muralidharan | An Analysis of Garbage Collectors for Multicore Platforms | |
AU2005236088A1 (en) | Modified computer architecture with finalization of objects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GARTHWAITE, ALEXANDER T.;REEL/FRAME:016678/0202 Effective date: 20050603 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: ORACLE AMERICA, INC., CALIFORNIA Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:ORACLE USA, INC.;SUN MICROSYSTEMS, INC.;ORACLE AMERICA, INC.;REEL/FRAME:037311/0101 Effective date: 20100212 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |