US7395418B1 - Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread - Google Patents

Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread Download PDF

Info

Publication number
US7395418B1
Authority
US
United States
Prior art keywords
thread
transactional
halt sequence
halt
processor
Prior art date
Legal status
Active, expires
Application number
US11/234,669
Inventor
Paul Caprioli
Wayne Mesard
Current Assignee
Oracle America Inc
Original Assignee
Sun Microsystems Inc
Priority date
Filing date
Publication date
Application filed by Sun Microsystems Inc
Priority to US11/234,669
Assigned to SUN MICROSYSTEMS, INC. (assignment of assignors interest; see document for details). Assignors: CAPRIOLI, PAUL; MESARD, WAYNE
Application granted
Publication of US7395418B1
Assigned to Oracle America, Inc. (merger and change of name; see document for details). Assignors: Oracle America, Inc.; ORACLE USA, INC.; SUN MICROSYSTEMS, INC.
Legal status: Active
Adjusted expiration

Classifications

    All classifications fall under G (Physics); G06 (Computing; Calculating or Counting); G06F (Electric Digital Data Processing); G06F 9/00 (Arrangements for program control, e.g. control units); G06F 9/06 (using stored programs):

    • G06F 9/46 Multiprogramming arrangements; G06F 9/466 Transaction processing; G06F 9/467 Transactional memory
    • G06F 9/30 Arrangements for executing machine instructions; G06F 9/30003 Executing specific machine instructions; G06F 9/3004 Operations on memory
    • G06F 9/3005 Operations for flow control
    • G06F 9/30076 Miscellaneous control operations (e.g. NOP); G06F 9/30087 Synchronisation or serialisation instructions
    • G06F 9/3009 Thread control instructions
    • G06F 9/30181 Instruction operation extension or modification
    • G06F 9/30189 Instruction operation extension or modification according to execution mode (e.g. mode flag)
    • G06F 9/38 Concurrent instruction execution; G06F 9/3836 Instruction issuing; G06F 9/3842 Speculative instruction execution
    • G06F 9/3851 Instruction issuing from multiple instruction streams (multistreaming)
    • G06F 9/3861 Recovery (e.g. branch mis-prediction, exception handling); G06F 9/3863 Recovery using multiple copies of the architectural state (e.g. shadow registers)
    • G06F 9/3885 Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F 9/52 Program synchronisation; mutual exclusion; G06F 9/526 Mutual exclusion algorithms

Definitions

  • When a thread encounters the checkpoint instruction of the halt sequence, it enters a speculative-execution mode, such as scout mode.
  • During speculative-execution mode, the processor first loads from the mailbox address and then executes the speculation-barrier instruction, which prevents the thread from executing subsequent instructions.
  • During normal-execution mode, the speculation-barrier instruction does not prevent the thread from executing subsequent instructions.
  • In the halt sequence, however, the speculation-barrier is encountered while the thread is in speculative-execution mode. It therefore has the effect of stopping instruction execution until an event external to the thread causes speculative-execution mode to be terminated.
  • FIG. 3 presents a flow chart illustrating the process of executing a halt sequence in accordance with an embodiment of the present invention.
  • the system starts a transactional memory operation by generating a checkpoint and entering transactional-execution mode (step 302 ).
  • the thread executes a load from a mailbox address associated with the halt sequence (step 304 ).
  • the thread then executes a stall instruction (or a speculation-barrier instruction) (step 306 ).
  • FIG. 4 presents a flow chart illustrating the process of exiting a halt sequence in accordance with an embodiment of the present invention.
  • the process begins when a second thread stores to the mailbox address to terminate the halt sequence (step 402 ).
  • the process continues when the first thread detects the interference caused by the second thread with its previous load from the mailbox address (step 404 ).
  • the first thread then exits the halt sequence (step 406 ) and continues executing instructions following the halt sequence (step 408 ).
  • the second thread terminates the halt sequence for the first thread after calculating a result required by the first thread.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)

Abstract

A technique for improving the performance of a system that supports simultaneous multi-threading (SMT). When a first thread encounters a halt sequence, the system starts a transactional memory operation by generating a checkpoint and entering a transactional-execution mode. Next, the system loads from a mailbox address associated with the halt sequence. The system then stalls execution of the first thread, so that the first thread does not execute instructions within the halt sequence, thereby freeing up processor resources for other threads. To terminate the halt sequence, a second thread stores to the mailbox address, which causes a transactional-memory mechanism within the processor to detect an interference with the previous load from the mailbox address by the first thread and which causes the first thread to exit from the halt sequence. The system then continues executing instructions following the halt sequence.

Description

BACKGROUND
1. Field of the Invention
The present invention relates to techniques for improving performance in computer systems. More specifically, the present invention relates to a method and an apparatus for using a transactional execution mechanism to free up processor resources used by a busy-waiting thread.
2. Related Art
Advances in semiconductor technology presently make it possible to integrate large-scale systems, including tens of millions of transistors, onto a single semiconductor chip. Integrating such large-scale systems onto a single semiconductor chip increases the speed at which such systems can operate, because signals between system components do not have to cross chip boundaries, and are not subject to lengthy chip-to-chip propagation delays. This increased speed has translated into greatly increased processor performance.
At the same time, processor designers have been developing techniques to improve processor performance even further. For example, the simultaneous multi-threading (SMT) technique is used to improve performance by dynamically sharing processor resources between multiple threads which execute simultaneously. For example, FIG. 1 illustrates two threads executing simultaneously in two different pipelines of a processor. In this example, instructions from thread 102 execute in ALU pipeline 106 while instructions from thread 104 simultaneously execute in memory pipeline 108.
Although processors that support SMT generally utilize processor resources efficiently, in some situations they still waste processor resources. For example, suppose that thread 102 is executing a program that performs useful work. At the same time, if thread 104 is in an idle loop (waiting for an event to occur), thread 104 continually uses processor resources to check whether the event has occurred. These resources could otherwise be used by thread 102 to perform useful work. Note that a code block that causes a thread to wait for an event in this way is referred to as a "spin-wait loop" or a "busy-wait loop."
Spin-wait loops also arise in high-performance computing (HPC) code, which partitions a complex problem into smaller sub-problems to be solved in parallel. As these sub-problems execute in parallel, each periodically communicates its results to the others. Some sub-problems complete more quickly than others; hence, the threads that execute them often have to wait for other threads to complete their sub-problems. As each thread completes its assigned sub-problem, it enters a spin-wait loop which continually checks whether all other sub-problems have completed. This is a performance problem because these spin-wait loops waste processor resources that could otherwise be used to perform useful work.
Hence, what is needed is a method and an apparatus for freeing up processor resources while executing spin-wait loops in a computer system that supports SMT.
SUMMARY
One embodiment of the present invention improves performance of a system that supports simultaneous multi-threading (SMT). When a first thread encounters a halt sequence, the system starts a transactional memory operation by generating a checkpoint and entering a transactional-execution mode, wherein instructions are speculatively executed but results are not committed to the architectural state of the processor until the transaction completes without interference. Next, the system loads from a mailbox address associated with the halt sequence. The system then stalls execution of the first thread, so that the first thread does not execute instructions within the halt sequence, thereby freeing up processor resources for other threads. To terminate the halt sequence, a second thread stores to the mailbox address, which causes a transactional-memory mechanism within the processor to detect an interference with the previous load from the mailbox address by the first thread and which causes the first thread to exit from the halt sequence. The system then continues executing instructions following the halt sequence.
In a variation on this embodiment, stalling execution of the first thread involves executing a speculation-barrier instruction, wherein during transactional-execution mode, the speculation-barrier instruction prevents the first thread from executing subsequent instructions.
In a variation on this embodiment, the speculation-barrier instruction prevents the halt sequence from committing the transactional-memory operation.
In a variation on this embodiment, the halt sequence is implemented as a system call, a library function, or a macro.
In a variation on this embodiment, the second thread terminates the halt sequence (by writing to the mailbox) after calculating a result required by the first thread.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 illustrates two threads executing simultaneously in two different pipelines.
FIG. 2 presents a code block for a halt sequence in accordance with an embodiment of the present invention.
FIG. 3 presents a flow chart illustrating the process of executing a halt sequence in accordance with an embodiment of the present invention.
FIG. 4 presents a flow chart illustrating the process of exiting a halt sequence in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs), and computer instruction signals embodied in a transmission medium (with or without a carrier wave upon which the signals are modulated). For example, the transmission medium may include a communications network, such as the Internet.
Transactional Memory
The present invention uses a transactional-memory mechanism to free up processor resources when a processor encounters a halt sequence. Transactional memory is described in more detail in U.S. Pat. No. 6,862,664, entitled "Method and Apparatus for Avoiding Locks by Speculatively Executing Critical Sections," by inventors Shailender Chaudhry, Marc Tremblay, and Quinn Jacobson. This patent is hereby incorporated by reference to provide details on how transactional memory operates and is herein referred to as "[Chaudhry]."
The transactional memory system in [Chaudhry] makes a critical section of code appear to execute atomically. Before entering the critical section, the processor executes a checkpoint instruction to generate a checkpoint, which the processor can use to return execution to the point where the checkpoint was taken. If the critical section completes successfully without interference from other threads, the processor performs a commit operation to commit the changes made during speculative-execution mode. (Note that within this specification the term "thread" refers to both threads and processes.) On the other hand, if another thread interferes with the thread executing the critical section, the thread aborts the transaction and discards the changes made during speculative-execution mode. In other words, the critical section of code completes or fails as a single unit.
During speculative-execution mode, the system speculatively executes code within the critical section, without committing results of the speculative execution to the architectural state of the processor. The system also continually monitors data references made by other threads to determine if an interfering data access occurs during speculative-execution mode. If not, the system commits changes made during speculative-execution mode and then resumes execution of the first thread past the critical section in normal-execution mode.
On the other hand, if an interfering data access is detected, the system aborts the transaction and discards changes made during the speculative-execution mode.
Note that in the transactional memory system of [Chaudhry], an interfering data access can include a store by another thread to a cache line that has been load marked by the thread. It can also include a load or a store by another thread to a cache line that has been store marked by the thread.
Halt Sequence Code Block
FIG. 2 presents a code block that implements a halt sequence in accordance with an embodiment of the present invention. A first thread can invoke the halt sequence to wait for a condition to occur, such as waiting for a second thread to calculate a result needed by the first thread. The halt sequence code uses the transactional memory system to handle halt sequences without requiring the processor to continually execute code in the halt sequence to check to see if a condition is satisfied.
The halt sequence code first generates a checkpoint which has a corresponding fail_pc address. The first thread jumps to this fail_pc address when the transactional-memory operation fails. The halt sequence then loads from a “mailbox address.” The halt sequence code then stalls the execution of the first thread. Note that the mailbox address is a location in memory which the second thread uses to notify the first thread that a condition is satisfied. When the condition is satisfied, the second thread stores to the mailbox address, thereby interfering with the previous load from the mailbox address by the first thread.
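Since FIG. 2 itself is not reproduced here, the halt sequence can be sketched in assembly-style pseudocode reconstructed from the description (the instruction mnemonics are illustrative, not an actual ISA):

```
halt_sequence:
    checkpoint  fail_pc     ; begin transaction; on failure, jump to fail_pc
    load        [mailbox]   ; load-mark the mailbox address
    spec_barrier            ; stall: no further instructions issue while
                            ; the thread is in transactional mode
    commit                  ; never reached: the barrier stalls the thread
fail_pc:
    ; reached when the transaction fails, e.g. because another thread
    ; stored to [mailbox]; execution continues here, and the code should
    ; re-check the mailbox value before proceeding
```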
Note that since the halt sequence code stalls execution of the first thread indefinitely, the first thread never reaches the commit instruction. In other words, the transactional-memory operation started by the halt sequence code never completes successfully.
Also note that once the first thread stalls, the first thread no longer executes instructions, which frees up processor resources. The first thread resumes execution of subsequent instructions only when another thread interferes with the transaction initiated by the first thread. Furthermore, note that other events can cause a transaction to fail (such as certain system events). Therefore, once the first thread resumes execution, it needs to verify that the waited-for condition has actually occurred by checking the mailbox to ensure that it contains the expected value.
A programmer can use the halt sequence code to stop execution of a thread to wait for a condition to occur. Note that the halt sequence can be implemented as a system call, a library function, or a macro.
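Wrapped as a macro, the sequence might look like the following C-flavored pseudocode. This is only a sketch: the intrinsics `__checkpoint()` and `__speculation_barrier()`, and the constant `TX_STARTED`, are hypothetical placeholders for the hardware instructions, not names defined by the patent.

```
/* Busy-wait-free wait: stall until *mailbox == expected.           */
/* __checkpoint() is assumed to return TX_STARTED on entry, and to  */
/* resume control here (as the fail_pc) after the transaction aborts. */
#define HALT_UNTIL(mailbox, expected)                            \
    do {                                                         \
        if (__checkpoint() == TX_STARTED) {                      \
            (void)*(volatile long *)(mailbox);   /* load-mark */ \
            __speculation_barrier();  /* stall; never commits */ \
        }                                                        \
        /* control resumes here after the transaction aborts */  \
    } while (*(volatile long *)(mailbox) != (expected))
```

The trailing condition implements the verification step described above: if the abort was caused by something other than the second thread's store, the mailbox will not hold the expected value and the thread waits again.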
Stalling Threads and Scout Mode
In a processor that supports scout mode, when a thread encounters a stall condition, it launches into scout mode. During scout mode, the processor speculatively executes instructions to prefetch future loads, but the processor does not commit the results to the architectural state of the processor. For example, see co-pending U.S. patent application Ser. No. 10/741,944, entitled “Generating Prefetches by Speculatively Executing Code Through Hardware Scout Threading,” by inventors Shailender Chaudhry and Marc Tremblay, filed on 19 Dec. 2003, and published on 8 Jul. 2004. This patent application is hereby incorporated by reference herein to provide details on how scout mode operates.
Because a processor that supports scout mode simply launches into scout mode upon encountering a conventional stall instruction, a conventional stall instruction will not stop a thread from executing subsequent instructions. Hence, a different type of stall instruction is needed to prevent execution of subsequent instructions during scout mode. This new type of stall instruction is referred to as a “speculation-barrier” instruction.
In one embodiment of the present invention, when a thread encounters the checkpoint instruction of the halt sequence, it enters into a speculative-execution mode, such as scout mode. During speculative-execution mode, the processor first loads from the mailbox address and then executes the speculation-barrier instruction, which prevents the thread from executing subsequent instructions. Note that in normal-execution mode, the speculation-barrier instruction does not prevent the thread from executing subsequent instructions. However, because the halt sequence starts a transactional-memory operation, the speculation-barrier will be encountered while the thread is in speculative-execution mode. It therefore has the effect of stopping instruction execution until an event external to the thread causes speculative-execution mode to be terminated.
Stalling a Thread Using a Halt Sequence
FIG. 3 presents a flow chart illustrating the process of executing a halt sequence in accordance with an embodiment of the present invention. When a thread encounters a halt sequence, the system starts a transactional memory operation by generating a checkpoint and entering transactional-execution mode (step 302). Next, the thread executes a load from a mailbox address associated with the halt sequence (step 304). The thread then executes a stall instruction (or a speculation-barrier instruction) (step 306).
Waking Up a Thread From a Stalled Halt Sequence
FIG. 4 presents a flow chart illustrating the process of exiting a halt sequence in accordance with an embodiment of the present invention. The process begins when a second thread stores to the mailbox address to terminate the halt sequence (step 402). The process continues when the first thread detects the interference by the second thread with the previous load from the mailbox address issued by the first thread (step 404). The first thread then exits the halt sequence (step 406) and continues executing instructions following the halt sequence (step 408).
In one embodiment of the present invention, the second thread terminates the halt sequence for the first thread after calculating a result required by the first thread.
The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.

Claims (15)

1. A method for improving the performance of a processor that supports speculative execution, wherein when a first thread encounters a halt sequence which is used by the first thread to wait for data that is to be generated by a second thread, the method comprises:
starting a transactional memory operation by generating a checkpoint and entering transactional-execution mode, wherein instructions are speculatively executed but results are not committed to the architectural state of the processor until the transaction completes without interference;
loading from a mailbox address associated with the halt sequence so that a transactional memory system can detect a subsequent interference with the load from the mailbox address, wherein the subsequent interference with the load from the mailbox address indicates that the halt sequence is to be terminated; and
while executing in a speculative-execution mode, stalling execution of the first thread, so that the processor does not execute instructions within the halt sequence, thereby freeing up processor resources for other threads;
wherein when the second thread has completed generating the data required by the first thread, the second thread performs a store operation to the mailbox address and a transactional-memory mechanism within the processor terminates the halt sequence by:
detecting an interference with the load from the mailbox address;
exiting the halt sequence; and
continuing execution of instructions following the halt sequence.
2. The method of claim 1, wherein stalling execution of the first thread involves executing a speculation-barrier instruction, wherein during transactional-execution mode, the speculation-barrier instruction prevents the first thread from speculatively executing subsequent instructions.
3. The method of claim 2, wherein the speculation-barrier instruction prevents the halt sequence from committing the transactional-memory operation.
4. The method of claim 1, wherein the halt sequence is implemented as:
a system call;
a library function; or
a macro.
5. The method of claim 1, wherein the second thread terminates the halt sequence for the first thread after calculating a result required by the first thread.
6. An apparatus for improving the performance of a processor that supports speculative execution, comprising:
a processing mechanism configured such that when a first thread encounters a halt sequence which is used by the first thread to wait for data that is to be generated by a second thread, the processing mechanism:
starts a transactional memory operation using a transactional-memory mechanism which involves generating a checkpoint and entering transactional-execution mode, wherein instructions are speculatively executed but results are not committed to the architectural state of the processor until the transaction completes without interference;
loads from a mailbox address associated with the halt sequence so that a transactional memory system can detect a subsequent interference with the load from the mailbox address, wherein the subsequent interference with the load from the mailbox address indicates that the halt sequence is to be terminated; and
while executing in a speculative-execution mode, stalls execution of the first thread, so that the processor does not execute instructions within the halt sequence, thereby freeing up processor resources for other threads;
wherein when the second thread has completed generating the data required by the first thread, the second thread performs a store operation to the mailbox address and the transactional-memory mechanism within the processing mechanism is configured to:
detect an interference with the load from the mailbox address;
exit the halt sequence; and
continue execution of instructions following the halt sequence, thereby terminating the halt sequence.
7. The apparatus of claim 6, wherein stalling execution of the first thread involves executing a speculation-barrier instruction, wherein during transactional-execution mode, the speculation-barrier instruction prevents the first thread from speculatively executing subsequent instructions.
8. The apparatus of claim 7, wherein the speculation-barrier instruction prevents the halt sequence from committing the transactional-memory operation.
9. The apparatus of claim 6, wherein the halt sequence is implemented as:
a system call;
a library function; or
a macro.
10. The apparatus of claim 6, wherein the second thread terminates the halt sequence for the first thread after calculating a result required by the first thread.
11. A computer system for improving the performance of a processor that supports speculative execution, comprising:
the processor;
a memory; and
a transactional-memory mechanism;
wherein when a first thread encounters a halt sequence which is used by the first thread to wait for data that is to be generated by a second thread, the processor is configured to:
start a transactional memory operation using the transactional-memory mechanism which involves generating a checkpoint and entering transactional-execution mode, wherein instructions are speculatively executed but results are not committed to the architectural state of the processor until the transaction completes without interference;
load from a mailbox address associated with the halt sequence so that a transactional memory system can detect a subsequent interference with the load from the mailbox address, wherein the subsequent interference with the load from the mailbox address indicates that the halt sequence is to be terminated; and
while executing in a speculative-execution mode, stall execution of the first thread, so that the processor does not execute instructions within the halt sequence, thereby freeing up processor resources for other threads;
wherein when the second thread has completed generating the data required by the first thread, the second thread performs a store operation to the mailbox address and the transactional-memory mechanism within the computer system is configured to:
detect an interference with the load from the mailbox address;
exit the halt sequence; and
continue execution of instructions following the halt sequence, thereby terminating the halt sequence.
12. The computer system of claim 11, wherein stalling execution of the first thread involves executing a speculation-barrier instruction, wherein during transactional-execution mode, the speculation-barrier instruction prevents the first thread from speculatively executing subsequent instructions.
13. The computer system of claim 12, wherein the speculation-barrier instruction prevents the halt sequence from committing the transactional-memory operation.
14. The computer system of claim 11, wherein the halt sequence is implemented as:
a system call;
a library function; or
a macro.
15. The computer system of claim 11, wherein the second thread terminates the halt sequence for the first thread after calculating a result required by the first thread.
US11/234,669 2005-09-22 2005-09-22 Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread Active 2026-02-03 US7395418B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/234,669 US7395418B1 (en) 2005-09-22 2005-09-22 Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread


Publications (1)

Publication Number Publication Date
US7395418B1 true US7395418B1 (en) 2008-07-01

Family

ID=39561220

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/234,669 Active 2026-02-03 US7395418B1 (en) 2005-09-22 2005-09-22 Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread

Country Status (1)

Country Link
US (1) US7395418B1 (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5649136A (en) * 1995-02-14 1997-07-15 Hal Computer Systems, Inc. Processor structure and method for maintaining and restoring precise state at any instruction boundary
US6272517B1 (en) * 1997-10-31 2001-08-07 Sun Microsystems, Incorporated Method and apparatus for sharing a time quantum
US20020087810A1 (en) * 2000-12-29 2002-07-04 Boatright Bryan D. System and method for high performance execution of locked memory instructions in a system with distributed memory and a restrictive memory model
US20030079094A1 (en) * 2001-10-19 2003-04-24 Ravi Rajwar Concurrent execution of critical sections by eliding ownership of locks
US20040093602A1 (en) * 2002-11-12 2004-05-13 Huston Larry B. Method and apparatus for serialized mutual exclusion
US6772294B2 (en) * 2002-07-08 2004-08-03 Sun Microsystems, Inc. Method and apparatus for using a non-committing data cache to facilitate speculative execution
US20040162951A1 (en) * 2003-02-13 2004-08-19 Jacobson Quinn A. Method and apparatus for delaying interfering accesses from other threads during transactional program execution
US20050246506A1 (en) * 2004-04-30 2005-11-03 Fujitsu Limited Information processing device, processor, processor control method, information processing device control method and cache memory
US7165254B2 (en) * 2004-07-29 2007-01-16 Fujitsu Limited Thread switch upon spin loop detection by threshold count of spin lock reading load instruction


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Dinning, Anne and Schonberg, Edith. "Detecting Access Anomalies in Programs with Critical Sections". ACM SIGPLAN Notices vol. 25, Issue 12, © 1991. pp. 85-96. *
Dubois, Michel and Scheurich, Christoph. "Memory Access Dependencies in Shared-Memory Multiprocessors". IEEE Transactions on Software Engineering vol. 16, No. 6, © Jun. 1990. pp. 660-673. *
Free On-Line Dictionary of Computing (FOLDOC). © 1995. www.foldoc.org, search term: checkpoint. *
Jin, Ruoming; Yang, Ge; and Agrawal, Gagan. "Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance". IEEE Transactions on Knowledge and Data Engineering vol. 17, No. 1, © Jan. 2005. pp. 71-89. *
Michael, Maged. "Scalable Lock-Free Dynamic Memory Allocation". ACM © 2004. pp. 1-12. *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070143755A1 (en) * 2005-12-16 2007-06-21 Intel Corporation Speculative execution past a barrier
US8719807B2 (en) * 2006-12-28 2014-05-06 Intel Corporation Handling precompiled binaries in a hardware accelerated software transactional memory system
US20080162886A1 (en) * 2006-12-28 2008-07-03 Bratin Saha Handling precompiled binaries in a hardware accelerated software transactional memory system
US9304769B2 (en) 2006-12-28 2016-04-05 Intel Corporation Handling precompiled binaries in a hardware accelerated software transactional memory system
US9128750B1 (en) * 2008-03-03 2015-09-08 Parakinetics Inc. System and method for supporting multi-threaded transactions
US20100186015A1 (en) * 2009-01-22 2010-07-22 International Business Machines Corporation Method and apparatus for implementing a transactional store system using a helper thread
US9098327B2 (en) * 2009-01-22 2015-08-04 International Business Machines Corporation Method and apparatus for implementing a transactional store system using a helper thread
US8448173B2 (en) * 2009-01-22 2013-05-21 International Business Machines Corporation Method and apparatus for implementing a transactional store system using a helper thread
US20130219121A1 (en) * 2009-01-22 2013-08-22 International Business Machines Corporation Method and apparatus for implementing a transactional store system using a helper thread
US8397052B2 (en) 2009-08-19 2013-03-12 International Business Machines Corporation Version pressure feedback mechanisms for speculative versioning caches
US20110047362A1 (en) * 2009-08-19 2011-02-24 International Business Machines Corporation Version Pressure Feedback Mechanisms for Speculative Versioning Caches
US8521961B2 (en) 2009-08-20 2013-08-27 International Business Machines Corporation Checkpointing in speculative versioning caches
US20110047334A1 (en) * 2009-08-20 2011-02-24 International Business Machines Corporation Checkpointing in Speculative Versioning Caches
US20110179254A1 (en) * 2010-01-15 2011-07-21 Sun Microsystems, Inc. Limiting speculative instruction fetching in a processor
US20150205586A1 (en) * 2014-01-17 2015-07-23 Nvidia Corporation System, method, and computer program product for bulk synchronous binary program translation and optimization
US9207919B2 (en) * 2014-01-17 2015-12-08 Nvidia Corporation System, method, and computer program product for bulk synchronous binary program translation and optimization
US9684537B2 (en) 2015-11-06 2017-06-20 International Business Machines Corporation Regulating hardware speculative processing around a transaction
US9690623B2 (en) 2015-11-06 2017-06-27 International Business Machines Corporation Regulating hardware speculative processing around a transaction
US10606638B2 (en) 2015-11-06 2020-03-31 International Business Machines Corporation Regulating hardware speculative processing around a transaction
US10996982B2 (en) 2015-11-06 2021-05-04 International Business Machines Corporation Regulating hardware speculative processing around a transaction
US10275254B2 (en) 2017-03-08 2019-04-30 International Business Machines Corporation Spin loop delay instruction
US10365929B2 (en) 2017-03-08 2019-07-30 International Business Machines Corporation Spin loop delay instruction
US10656950B2 (en) 2017-03-08 2020-05-19 International Business Machines Corporation Spin loop delay instruction
CN114629748A (en) * 2022-04-01 2022-06-14 日立楼宇技术(广州)有限公司 Building data processing method, edge gateway of building and storage medium
CN114629748B (en) * 2022-04-01 2023-08-15 日立楼宇技术(广州)有限公司 Building data processing method, building edge gateway and storage medium

Similar Documents

Publication Publication Date Title
US7395418B1 (en) Using a transactional execution mechanism to free up processor resources used by a busy-waiting thread
US9626187B2 (en) Transactional memory system supporting unbroken suspended execution
US7930695B2 (en) Method and apparatus for synchronizing threads on a processor that supports transactional memory
US9817644B2 (en) Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
US8544022B2 (en) Transactional memory preemption mechanism
US8688963B2 (en) Checkpoint allocation in a speculative processor
EP2619655B1 (en) Apparatus, method, and system for dynamically optimizing code utilizing adjustable transaction sizes based on hardware limitations
US6912648B2 (en) Stick and spoke replay with selectable delays
US8316366B2 (en) Facilitating transactional execution in a processor that supports simultaneous speculative threading
US9262173B2 (en) Critical section detection and prediction mechanism for hardware lock elision
EP1989619B1 (en) Hardware acceleration for a software transactional memory system
US20070198978A1 (en) Methods and apparatus to implement parallel transactions
US9501237B2 (en) Automatic mutual exclusion
JPH10312282A (en) Method and device for improving insruction completion
US20090187906A1 (en) Semi-ordered transactions
US7634639B2 (en) Avoiding live-lock in a processor that supports speculative execution
Keckler et al. Concurrent event handling through multithreading
KR100310798B1 (en) Concurrent execution of machine context synchronization operations and non-interruptible instructions
US7634641B2 (en) Method and apparatus for using multiple threads to spectulatively execute instructions
JP3146058B2 (en) Parallel processing type processor system and control method of parallel processing type processor system
US20080082804A1 (en) Method and apparatus for enabling optimistic program execution

Legal Events

Date Code Title Description
AS Assignment

Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CAPRIOLI, PAUL;MESARD, WAYNE;REEL/FRAME:017031/0459

Effective date: 20050823

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: ORACLE AMERICA, INC., CALIFORNIA

Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:ORACLE USA, INC.;SUN MICROSYSTEMS, INC.;ORACLE AMERICA, INC.;REEL/FRAME:037303/0336

Effective date: 20100212

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12