US20090049323A1 - Synchronization of processors in a multiprocessor system - Google Patents

Synchronization of processors in a multiprocessor system

Info

Publication number
US20090049323A1
US20090049323A1 (application US 11/838,630)
Authority
US
United States
Prior art keywords
processors
synchronization point
processor
interrupts
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/838,630
Inventor
Robert R. Imark
Raymond A. Gasser
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP filed Critical Hewlett Packard Development Co LP
Priority to US11/838,630 priority Critical patent/US20090049323A1/en
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. reassignment HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GASSER, RAYMOND A., IMARK, ROBERT R.
Priority to PCT/US2008/008625 priority patent/WO2009023076A1/en
Publication of US20090049323A1 publication Critical patent/US20090049323A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/522Barrier synchronisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

A method for synchronizing a first processor and multiple second processors is presented. In the method, each of the second processors waits at a second synchronization point after reaching a first synchronization point. The last of the second processors to reach the first synchronization point sends a signal to the first processor. The first processor waits at the first synchronization point until it receives the signal. After receiving the signal, the first processor initiates a launch of the second processors by launching at least one of the second processors. At least one of the second processors launched by the first processor launches another of the second processors in response to being launched by the first processor. Each of the second processors continues execution from the second synchronization point in response to being launched.

Description

    BACKGROUND
  • Designers of computer systems consistently strive for increased processing capacity with each product generation. Many different approaches have been adopted to achieve the computing speeds currently enjoyed by users of such systems. Increased system clock speeds, integrated circuit design advances, wider data paths, and various other technological developments have all contributed to increasing the processing throughput of a single processor.
  • To further enhance computer capability, pipelined and parallel arrangements of multiple processing units have been pursued successfully. Parallel processing generally began with the use of “single instruction stream, multiple data stream” (SIMD) architectures, in which multiple processors perform identical operations on different data. In such a system, a single program line, or “thread,” of instructions is executed. More advanced “multiple instruction stream, multiple data stream” (MIMD) systems allow each processor to execute a completely diverse set of instructions, or a separate copy of the same set of instructions.
  • However, even in an MIMD system, some communication or cooperation between the various processors is typically required. One type of communication involves synchronizing two or more of the processors by requiring each of the processors to halt execution at a predetermined point in its execution thread (called “rendezvous”), and then begin execution again at the same or another predetermined location (termed “launch”). One or more such synchronization points may be employed depending on the general nature and specific requirements of the overall task to be performed.
  • Typically, a multiprocessing computer system supports the use of synchronization points by way of complicated, specialized and nonstandard hardware functions. For example, a dedicated hardware implementation of a specialized broadcast interrupt may be supplied to support the use of launch points so that a single processor may inform all other processors of a launch quickly and efficiently. However, other multiprocessor systems may not provide such hardware due to the design and implementation expense involved in supporting such a specialized hardware construct.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a multiprocessor system according to an embodiment of the invention.
  • FIG. 2 is a flow diagram of a method for synchronizing the processors of FIG. 1 according to an embodiment of the invention.
  • FIG. 3 is a graphic representation of execution of the processors in the multiprocessor of FIG. 1 according to another embodiment of the invention.
  • FIG. 4 is a diagram of a rendezvous table employed in the processor execution of FIG. 3 according to an embodiment of the invention.
  • FIG. 5 is a diagram of a launch table employed in the processor execution of FIG. 3 according to an embodiment of the invention.
  • DETAILED DESCRIPTION
  • FIG. 1 provides a simplified block diagram of a multiprocessor system 100 including a first processor 102 and multiple second processors 104. In other embodiments, as few as two second processors 104, or many more than that depicted in FIG. 1, may be included. The multiprocessor system 100 may employ any type of multiprocessor architecture in which synchronization of two or more of the processors is desired. One possible architecture for the multiprocessor system may be a symmetric multiprocessing (SMP) system, although any of a plethora of other multiprocessor architectures may benefit from the inventive concepts described below.
  • While the first processor 102 is distinguished from the second processors 104, the first processor 102 and the second processors 104 may all be equivalent in terms of physical and electronic construction. The first processor 102 is instead distinguished in terms of its role in the synchronization of the processors 102, 104 from that of the second processors 104, as described in greater detail below. Further, each of the second processors 104 serves a similar synchronization function. However, the second processors 104 may or may not be similar in design and construction to each other. Only the functionality of the processors 102, 104 as described below is relevant to the embodiments presented herein.
  • FIG. 2 presents by way of a flow diagram a method 200 of synchronizing the processors 102, 104 of FIG. 1. In the method 200, each of the second processors 104 waits at a second synchronization point after reaching a first synchronization point (operation 202). The last of the second processors 104 to reach the first synchronization point sends a signal to the first processor 102 (operation 204). The first processor 102 waits at the first synchronization point until receiving the signal (operation 206), and initiates a launch of the second processors 104 after receiving the signal (operation 208). More specifically, the first processor initiates the launch by launching at least one of the second processors 104, wherein the second processor 104 being launched launches another one of the second processors 104 in response to being launched. Each of the second processors 104, in response to being launched, continues execution from the second synchronization point (operation 210).
  • FIG. 3 graphically characterizes the execution of the first processor 102 and four separate second processors 104 according to a more detailed embodiment of the invention. While four separate second processors 104 are employed in this particular example, any number of second processors 104 greater than one may be utilized in other embodiments. For identification purposes, the first processor 102 is identified as Processor 0, while the second processors 104 are labeled Processors 1-4. Execution of each of the processors 102, 104 is depicted as progressing from top to bottom in relation to a first synchronization point 302 and a second synchronization point 304. In one embodiment, the first synchronization point 302 is a rendezvous or gathering point for the processors 102, 104, while the second synchronization point 304 is a launching point from which the processors 102, 104 continue execution. Although the synchronization points 302, 304 are represented in FIG. 3 as being located in completely separate areas of the execution stream of each of the processors 102, 104, the first synchronization point 302 and the second synchronization point 304 may represent approximately the same point in the execution thread of any or all of the processors 102, 104 in another example.
  • In FIG. 3, the first processor 102 executes until reaching the first synchronization point 302, whereupon the first processor 102 waits for a signal 312 from one of the second processors 104. Meanwhile, each of the second processors 104 executes through the first synchronization point 302 to the second synchronization point 304. Each of the second processors 104 accesses a rendezvous table 310 at the first synchronization point 302 to indicate that the accessing second processor 104 has reached the first synchronization point 302. The second processor 104 that is last to reach the first synchronization point 302 issues the signal 312 to the first processor 102 so that it may continue execution.
  • FIG. 4 illustrates one specific example of the rendezvous table 310 that each of the second processors 104 accesses to indicate its arrival at the first synchronization point 302. The rendezvous table 310 includes several entries, wherein each entry is depicted in FIG. 4 as a row. Further, each entry is associated with an address 402, and includes three fields: a processor count field 404, a processor threshold field 406, and a next address field 408. Also shown in FIG. 4 is the entry of the rendezvous table 310 that is associated with each of the processors 102, 104, identified by the address 402 of the entry.
  • As each second processor 104 reaches the first synchronization point 302, that second processor 104 accesses the entry of the rendezvous table 310 to which it is assigned. The second processor 104 then performs an atomic read-modify-write operation of the processor count field 404 of that entry and compares the value read from the processor count field 404 to the processor threshold field 406. If the values are equal, the second processor 104 then accesses the rendezvous table 310 entry indicated in the next address field 408 in the same fashion as the previous entry. This process continues for each of the second processors 104 until an accessed processor count field 404 is found to be less than its associated processor threshold field 406.
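  • The table walk just described can be expressed compactly in C. The following is a minimal sketch only, assuming C11 atomics and hypothetical names (rendezvous_entry, count, threshold, next) that do not appear in the patent; a NULL next pointer stands in for the all-zero next address field.

      #include <stdatomic.h>
      #include <stddef.h>

      struct rendezvous_entry {
          atomic_uint count;              /* processor count field 404 */
          unsigned int threshold;         /* processor threshold field 406 */
          struct rendezvous_entry *next;  /* next address field 408; NULL marks the end */
      };

      /* Walk the rendezvous table as described above.  Returns nonzero if the
       * calling second processor is the last to arrive and should therefore
       * signal the first processor. */
      int rendezvous(struct rendezvous_entry *entry)
      {
          while (entry != NULL) {
              /* Atomic read-modify-write of the processor count field. */
              unsigned int old = atomic_fetch_add(&entry->count, 1u);
              if (old < entry->threshold)
                  return 0;            /* not the last arrival at this level; stop here */
              entry = entry->next;     /* value equaled the threshold; climb to the next entry */
          }
          return 1;                    /* reached the all-zero terminator: last to arrive */
      }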
  • Using the scenario depicted in FIGS. 3 and 4 as an example, Processors 1-3 are each assigned the rendezvous table 310 entry located at address 7066200. Each of these second processors 104 reads the processor count field 404 of that entry and compares it to the processor threshold field 406. Thus, the first of Processors 1-3 that reaches the first synchronization point 302 and accesses the rendezvous table 310 entry at 7066200 reads a value of zero from the processor count field 404 and writes the incremented value of one back to the processor count field 404. That processor also compares the value of zero read from the processor count field 404 to the processor threshold 406 of two, and ceases this particular access of the rendezvous table 310 as a result. Similarly, the second of the Processors 1-3 to reach the first synchronization point 302 accesses the same entry of the rendezvous table 310, resulting in a value of two being stored for the processor count field 404 at address 7066200.
  • Continuing in this manner, the third of the Processors 1-3 to pass through the first synchronization point 302 accesses the same entry. However, after reading a value of two from the processor count field 404 and writing back a three thereto, that particular second processor 104 compares the two to the processor threshold 406 of two, and after finding that they are equal, accesses the next address field 408, which stores an address of 7066100. The last of the Processors 1-3 reaching the first synchronization point 302 then accesses the entry of the rendezvous table 310 at the address of 7066100 and repeats the process. Assuming this processor 104 reaches the first synchronization point 302 before Processor 4 (the operation of which is addressed below), the processor count field 404 of zero is read, a one is written back thereto, and the zero is compared to the processor threshold field 406 of one. As a result, the last of these second processors 104 (i.e., Processors 1-3) ceases its access of the rendezvous table 310.
  • Proceeding with the example of FIGS. 3 and 4, the second processor 104 referred to as Processor 4 also accesses the entry of the rendezvous table 310 at address 7066100 when it reaches the first synchronization point 302. In this case, Processor 4 is the last second processor 104 to reach the first synchronization point 302. After reading the value of one from the processor count field 404 (written therein by the last of Processors 1-3, as described above), Processor 4 compares the one to the value of one written in the processor threshold field 406. With these two values being equal, Processor 4 reads the next address field 408 that holds the value of 7066000. Processor 4 then reads the rendezvous table 310 entry at that address, reads the processor count field 404 of zero, stores the incremented value of one thereto, and compares the zero to the zero stored in the processor threshold field 406 of that entry. With the compared values being equal, Processor 4 then reads the next address field 408 of that entry, which holds the value 0000000. In this embodiment, reading all zeros indicates to Processor 4 that it is the last of the second processors 104 to reach the first synchronization point 302.
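  • For concreteness, the three-level table of FIGS. 3 and 4 could be set up as in the short sketch below, still using the hypothetical rendezvous_entry type from above; the variable names and the use of pointers in place of the raw addresses 7066200, 7066100 and 7066000 are assumptions made for illustration. Processors 1-3 would each call rendezvous(&level0), while Processor 4 would call rendezvous(&level1).

      struct rendezvous_entry level2 = { 0, 0, NULL };     /* "7066000": terminator entry */
      struct rendezvous_entry level1 = { 0, 1, &level2 };  /* "7066100": last of Processors 1-3, plus Processor 4 */
      struct rendezvous_entry level0 = { 0, 2, &level1 };  /* "7066200": shared by Processors 1-3 */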
  • In response to being the last of the second processors 104, Processor 4 sends a signal 312 to Processor 0 (i.e., the first processor 102 of FIG. 3). The signal 312 may be issued in a number of ways. In one embodiment, the act of Processor 4 writing a one to the processor count field 404 of the rendezvous table 310 entry at address 7066000 may serve as the signal 312. In that case, Processor 0 may poll the processor count field 404 for a one to be written thereto, interpreting the one as the signal 312. In another implementation, Processor 4 writes a separate memory location, sends a message to Processor 0, or performs some other operation to implement the signal 312 to Processor 0. In addition, Processor 0 or Processor 4 may clear the processor count field 404 of each entry of the rendezvous table 310 to initialize the table 310 for the next time the first synchronization point 302 or other similar rendezvous point is employed.
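  • In the polling variant of the signal 312, the first processor's wait might look like the following sketch, again using the hypothetical types above. The spin loop and the choice to clear only the terminator's count field are illustrative assumptions; a full implementation would also clear the other entries' count fields, as noted.

      /* First processor (Processor 0): poll the terminator entry's count field
       * for the write that serves as the signal 312, then clear it for the
       * next rendezvous. */
      void wait_for_signal(struct rendezvous_entry *terminator)
      {
          while (atomic_load(&terminator->count) == 0)
              ;                                  /* busy-wait on the shared count field */
          atomic_store(&terminator->count, 0);   /* re-initialize for the next use */
      }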
  • Write operations to the processor count fields 404 as described above specifically employ an atomic read-modify-write operation often used in multiprocessor systems for processor intercommunication so that conflicts in accessing the field 404 will not arise between two or more of the second processors 104. For example, use of an atomic operation eliminates the possibility that two of the second processors will read the same value from the same processor count field 404. Other memory accesses that prevent such conflicts, such as standard “semaphore” or “mailbox” operations, could be utilized in other embodiments.
  • By employing multiple rendezvous table 310 entries, a hierarchical structure of memory locations is formed by which the second processors 104 indicate reaching the first synchronization point 302. Thus, by spreading the access to the rendezvous table 310 by each of the second processors 104 across multiple memory locations, access contention for those locations, which potentially is exacerbated by the atomic memory operations, is greatly decreased, resulting in faster signaling of the first processor 102. In addition, the particular embodiment of FIG. 4 allows generation of the signal 312 by way of a single access of the processor count field 404 of address 7066000, which the first processor 102 may be polling, as described above. In contrast to a single counter being updated by each of the second processors 104 while the first processor 102 polls the same memory location, the hierarchical model of FIG. 4 provides a more efficient method for processor synchronization.
  • A similar hierarchical model employing interrupts is utilized in FIG. 3 in launching the second processors 104 from the second synchronization point 304. More specifically, after the first processor 102 (i.e., Processor 0) receives the signal 312, the first processor 102 resumes execution to the second synchronization point 304, at which time it initiates a launch of each of the second processors 104 (i.e., Processors 1-4) by sending a first interrupt 322 to one of the second processors 104, which is Processor 1 in the embodiment of FIG. 3. In this specific example, Processor 0 also issues a second interrupt 324 to Processor 4. In turn, after receiving the first interrupt 322, Processor 1 issues a third interrupt 326 to Processor 2, and a fourth interrupt 328 to Processor 3. In response to receiving one of the interrupts 322-328, each of the second processors 104 then continues execution from the second synchronization point 304, along with the first processor 102. Thus, the first synchronization point 302 and the second synchronization point 304 are employed together to rendezvous and subsequently launch the processors 102, 104.
  • FIG. 5 graphically shows an embodiment of a launch table 330 accessed by each of the Processors 0-4 to perform the hierarchically-structured launch. As with the rendezvous table 310, the launch table 330 includes a number of entries, each of which is accessible via an address 502. Within each entry reside one or more interrupt indicators 504, wherein each indicator 504 is associated with a particular launch interrupt. In one embodiment, each interrupt indicator 504 is an interrupt mask and enable value that specifies an interrupt associated with a particular second processor 104. Thus, the processor 102, 104 issuing the interrupt may merely write the data stored at the interrupt indicator 504 to a location that causes the processor 102, 104 to generate the interrupt. Other indicators, such as a simple binary value indicating the specific second processor 104 to be launched, may be used in other examples. FIG. 5 also indicates the launch table 330 entry address and interrupt associated with each of the processors 102, 104.
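  • The launch side can be sketched in the same spirit. In the sketch below, launch_entry, MAX_INDICATORS and raise_launch_interrupt() are assumptions made for illustration: a zero indicator ends the list, and the raise function stands in for writing the interrupt mask and enable value to whatever location causes the interrupt to be generated.

      #define MAX_INDICATORS 4

      struct launch_entry {
          unsigned int indicators[MAX_INDICATORS];  /* interrupt indicators 504; 0 ends the list */
      };

      /* Platform-specific hook assumed here: writes the mask/enable value so
       * that the hardware raises the corresponding launch interrupt. */
      extern void raise_launch_interrupt(unsigned int indicator);

      /* Called by a processor at the second synchronization point (or on
       * receipt of its own launch interrupt) to issue any interrupts listed
       * in its assigned launch table entry. */
      void launch_children(const struct launch_entry *entry)
      {
          for (int i = 0; i < MAX_INDICATORS && entry->indicators[i] != 0; i++)
              raise_launch_interrupt(entry->indicators[i]);
      }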
  • In the particular example of FIG. 5, Processor 0 (i.e., the first processor 102) accesses its assigned launch table 330 address 7066700 after reaching the second synchronization point 304. The entry at address 7066700 indicates that interrupts are to be issued to Processor 1 (by way of interrupt 322) and Processor 4 (via interrupt 324), as shown in the interrupt indicators 504 of that entry. Further according to the embodiment of FIG. 5, an interrupt indicator 504 of zero denotes the end of the list of interrupts to be issued by the processor accessing the associated entry of the launch table 330.
  • After receiving its launch interrupt 322, Processor 1 accesses its assigned launch table 330 entry at address 7066800, which indicates that Processor 1 is to issue launch interrupts 326 and 328 for Processors 2 and 3, respectively. Similarly, Processor 4 accesses its assigned entry at address 7066900 in response to receiving interrupt 324 from Processor 0. However, the entry at address 7066900 lists no further launch interrupts to be issued. Similarly, Processors 2 and 3 access the same entry in response to receiving interrupts 326, 328, and find that they are not responsible for issuing any launch interrupts, either. As a result, each of the second processors 104 has received a launch interrupt, even though the first processor 102 has only issued two launch interrupts directly. Thus, the work required to issue the interrupts has been distributed among the processors 102, 104 in a hierarchical fashion, hastening the launch process.
  • In one embodiment, the rendezvous table 310, the launch table 330, and the assigned table addresses for each of the processors 102, 104 are initialized by one or more of the processors 102, 104. According to another implementation, a separate processor not discussed above may perform this function. Once the tables 310, 330 are initialized, further setup of the tables 310, 330 may not be required during the use of multiple synchronizations of the processors 102, 104.
  • The use of the two synchronization points 302, 304 may be advantageous in a number of processing contexts. In one example, each of the processors 102, 104 may act as a producer of service requests before the first synchronization point 302 and after the second synchronization point 304. In response, the first processor 102 may then process or consume the service requests after the first synchronization point 302 and before the second synchronization point 304 while the second processors 104 remain idle at the second synchronization point 304. The service requests may constitute requests for any service that may be provided by the first processor 102, including but not limited to generating billing records, or searching and retrieving items in a database.
  • In another example, the first synchronization point 302 may be used to implement a “join” operation employed in a MIMD multiprocessing system, while the second synchronization point 304 may be utilized as part of a “fork” operation. In such an environment, the first processor 102 may execute a single thread after the first synchronization point 302 while the second processors 104 wait to be called on at their second synchronization point 304. The first processor 102 may then execute a fork operation to spawn the second processors 104 by way of the launch process from the second synchronization point 304, as described above. As each of the second processors 104 then finishes the work to which it was assigned, execution in that second processor 104 reaches the next first synchronization point 302. The last of the second processors 104 to reach the first synchronization point 302 then issues the signal 312 to the first processor 102, thus joining the execution of the processors back together. The first processor 102 may then operate as a lone thread until another fork operation is undertaken at another second synchronization point 304.
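  • Putting the pieces together, the fork/join usage just described might be organized as in the following skeleton. All names are the hypothetical ones introduced in the sketches above; the work functions and wait_for_launch_interrupt() are placeholders, not details taken from the patent.

      extern void do_serial_work(void);
      extern void do_parallel_work(void);
      extern void wait_for_launch_interrupt(void);

      void first_processor_loop(struct rendezvous_entry *terminator,
                                const struct launch_entry *my_launch_entry)
      {
          for (;;) {
              do_serial_work();                  /* single thread between join and fork */
              launch_children(my_launch_entry);  /* "fork" at the second sync point 304 */
              wait_for_signal(terminator);       /* "join" at the first sync point 302  */
          }
      }

      void second_processor_loop(struct rendezvous_entry *my_entry,
                                 const struct launch_entry *my_launch_entry)
      {
          for (;;) {
              wait_for_launch_interrupt();       /* idle at the second sync point 304      */
              launch_children(my_launch_entry);  /* propagate the launch, if any assigned  */
              do_parallel_work();
              (void)rendezvous(my_entry);        /* first sync point 302; in the polling
                                                    variant the last arrival's increment
                                                    is itself the signal 312 */
          }
      }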
  • Various embodiments of a multiprocessor system and method as discussed above may provide significant benefits. Since a combination of simple shared memory communication and individual interrupts is employed to effectuate the rendezvous and launch processes, the embodiments may be implemented on most multiprocessing systems. In addition, the use of a logical processor hierarchy among the second processors 104, with the first processor 102 residing at the top of the hierarchy, facilitates quick execution of both the rendezvous and launch phases of the synchronization.
  • While several embodiments of the invention have been discussed herein, other embodiments encompassed by the scope of the invention are possible. For example, while many embodiments as described above specifically involve the use of a handful of processors within a multiprocessor system, other embodiments employing many more processors coupled together within a single system may exhibit even greater advantages over more serially oriented solutions due to the hierarchical nature of the synchronization mechanisms distributing the required work among many more processors. Further, aspects of one embodiment may be combined with those of alternative embodiments to create further implementations of the present invention. Thus, while the present invention has been described in the context of specific embodiments, such descriptions are provided for illustration and not limitation. Accordingly, the proper scope of the present invention is delimited only by the following claims and their equivalents.

Claims (23)

1. A method for synchronizing a first processor and second processors, the method comprising:
in each of the second processors, waiting at a second synchronization point after reaching a first synchronization point;
in the last of the second processors to reach the first synchronization point, sending the first processor a signal;
in the first processor, waiting at the first synchronization point until receiving the signal, and initiating a launch of the second processors after receiving the signal by launching one of the second processors, wherein the one of the second processors launches another one of the second processors in response to being launched; and
in each of the second processors, continuing execution from the second synchronization point in response to being launched.
2. The method of claim 1, wherein:
initiating the launch of the second processors comprises initiating a plurality of interrupts;
launching the one of the second processors comprises sending a first of the interrupts to the one of the second processors; and
launching the other one of the second processors comprises sending one of the remaining interrupts to the other one of the second processors in response to receiving the first interrupt.
3. The method of claim 2, further comprising:
in each of the second processors, after receiving one of the interrupts, sending another one of the interrupts to another one of the second processors if indicated in a memory location assigned to the second processor sending the other one of the interrupts.
4. The method of claim 3, wherein sending another one of the interrupts comprises writing to a memory-mapped address indicated in the memory location assigned to the second processor sending the other one of the interrupts.
5. The method of claim 1, further comprising:
in each of the second processors, after reaching the first synchronization point, updating at least one of a plurality of memory locations, wherein the memory locations indicate whether all of the second processors have reached the first synchronization point.
6. The method of claim 5, wherein updating at least one of the plurality of memory locations comprises performing at least one atomic memory update operation on the at least one of the memory locations.
7. The method of claim 6, wherein sending the first processor the signal comprises writing to one of the plurality of memory locations.
8. The method of claim 1, further comprising:
in the first processor, continuing execution from the first synchronization point to the second synchronization point in response to receiving the signal, and initiating the plurality of interrupts after reaching the second synchronization point.
9. The method of claim 8, further comprising:
in the processors, generating service requests for the first processor during execution before the first synchronization point and after the second synchronization point;
wherein continuing execution in the first processor from the first synchronization point to the second synchronization point in response to receiving the signal comprises processing the service requests.
10. The method of claim 8, wherein:
the first synchronization point is associated with a join operation of a multi-threaded program; and
the second synchronization point is associated with a fork operation of the multi-threaded program.
11. The method of claim 1, wherein the first synchronization point comprises the second synchronization point.
12. A computer-readable storage medium comprising instructions executable on a first processor and second processors for employing a method for synchronizing the processors, the method comprising:
in each of the second processors, waiting at a second synchronization point after reaching a first synchronization point;
in the last of the second processors to reach the first synchronization point, sending the first processor a signal;
in the first processor, waiting at the first synchronization point until receiving the signal, and initiating a launch of the second processors after receiving the signal by launching one of the second processors, wherein the one of the second processors launches another one of the second processors in response to being launched; and
in each of the second processors, continuing execution from the second synchronization point in response to being launched.
13. A multiprocessor system, comprising:
a first processor configured to wait at a first synchronization point until receiving a signal; and
second processors, wherein each of the second processors is configured to wait at a second synchronization point after reaching the first synchronization point, and to send the signal to the first processor if last of the second processors to reach the first synchronization point;
wherein the first processor is configured to initiate a launch of the second processors after receiving the signal by launching one of the second processors;
wherein the one of the second processors is configured to launch another one of the second processors in response to being launched; and
wherein each of the second processors is configured to continue execution from the second synchronization point in response to being launched.
14. The multiprocessor system of claim 13, wherein:
the first processor is configured to initiate the launch of the second processors by initiating a plurality of interrupts, and to launch one of the second processors by sending a first of the interrupts to one of the second processors; and
the one of the second processors is configured to launch another one of the second processors by sending at least one of the remaining interrupts in response to receiving the first interrupt.
15. The multiprocessor system of claim 14, wherein each of the second processors is configured to send another one of the interrupts to another one of the second processors after receiving one of the interrupts if indicated in a memory location assigned to the second processor sending the other one of the interrupts.
16. The multiprocessor system of claim 15, wherein each of the second processors is configured to send another one of the interrupts by writing to a memory-mapped address indicated in the memory location assigned to the second processor sending the other one of the interrupts.
17. The multiprocessor system of claim 13, wherein each of the second processors is configured to update at least one of a plurality of memory locations after reaching the first synchronization point, wherein the memory locations indicate whether all of the second processors have reached the first synchronization point.
18. The multiprocessor system of claim 17, wherein each of the second processors is configured to update at least one of the plurality of memory locations by performing at least one atomic memory update operation on the at least one of the memory locations.
19. The multiprocessor system of claim 18, wherein each of the second processors is configured to send the first processor the signal by writing to one of the plurality of memory locations.
20. The multiprocessor system of claim 13, wherein the first processor is configured to continue execution from the first synchronization point to the second synchronization point in response to receiving the signal, and to initiate the plurality of interrupts after reaching the second synchronization point.
21. The multiprocessor system of claim 20, wherein:
each of the processors is configured to generate service requests for the first processor during execution before the first synchronization point and after the second synchronization point; and
the first processor is configured to continue execution from the first synchronization point to the second synchronization point in response to receiving the signal by processing the service requests.
22. The multiprocessor system of claim 20, wherein:
the first synchronization point is associated with a join operation of a multi-threaded program; and
the second synchronization point is associated with a fork operation of the multi-threaded program.
23. The multiprocessor system of claim 13, wherein the first synchronization point comprises the second synchronization point.
US11/838,630 2007-08-14 2007-08-14 Synchronization of processors in a multiprocessor system Abandoned US20090049323A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/838,630 US20090049323A1 (en) 2007-08-14 2007-08-14 Synchronization of processors in a multiprocessor system
PCT/US2008/008625 WO2009023076A1 (en) 2007-08-14 2008-07-14 Synchronization of processors in a multiprocessor system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/838,630 US20090049323A1 (en) 2007-08-14 2007-08-14 Synchronization of processors in a multiprocessor system

Publications (1)

Publication Number Publication Date
US20090049323A1 (en) 2009-02-19

Family

ID=40350959

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/838,630 Abandoned US20090049323A1 (en) 2007-08-14 2007-08-14 Synchronization of processors in a multiprocessor system

Country Status (2)

Country Link
US (1) US20090049323A1 (en)
WO (1) WO2009023076A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106918A1 (en) * 2005-11-07 2007-05-10 Seiko Epson Corporation Multi processor system and interrupt signal sending method therefor
US20090327554A1 (en) * 2008-06-25 2009-12-31 Dell Products L.P. Synchronizing processors when entering system management mode
US20110047556A1 (en) * 2008-01-17 2011-02-24 Kosuke Nishihara Synchronization control method and information processing device
US20110274192A1 (en) * 2009-01-23 2011-11-10 Alcatel Lucent Synchronization method and device for real-time distributed system
US20130145378A1 (en) * 2011-12-01 2013-06-06 International Business Machines Corporation Determining Collective Barrier Operation Skew In A Parallel Computer

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2827247A1 (en) 2013-07-16 2015-01-21 Continental Automotive GmbH Method for synchronising changes in state in systems embedded in multi-core computers

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276823A (en) * 1988-12-09 1994-01-04 Tandem Computers Incorporated Fault-tolerant computer system with redesignation of peripheral processor
US5978838A (en) * 1996-08-19 1999-11-02 Samsung Electronics Co., Ltd. Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor
US5978830A (en) * 1997-02-24 1999-11-02 Hitachi, Ltd. Multiple parallel-job scheduling method and apparatus
US6032173A (en) * 1992-06-10 2000-02-29 Siemens Aktiengesellschaft Synchronization of a computer system having a plurality of processors
US6223228B1 (en) * 1998-09-17 2001-04-24 Bull Hn Information Systems Inc. Apparatus for synchronizing multiple processors in a data processing system
US6249880B1 (en) * 1998-09-17 2001-06-19 Bull Hn Information Systems Inc. Method and apparatus for exhaustively testing interactions among multiple processors
US20050193148A1 (en) * 2004-02-27 2005-09-01 Chen Chenghung J. Processor synchronization in a multi-processor computer system
US20070240158A1 (en) * 2006-04-06 2007-10-11 Shailender Chaudhry Method and apparatus for synchronizing threads on a processor that supports transactional memory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2708172B2 (en) * 1988-03-24 1998-02-04 株式会社東芝 Parallel processing method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276823A (en) * 1988-12-09 1994-01-04 Tandem Computers Incorporated Fault-tolerant computer system with redesignation of peripheral processor
US6032173A (en) * 1992-06-10 2000-02-29 Siemens Aktiengesellschaft Synchronization of a computer system having a plurality of processors
US5978838A (en) * 1996-08-19 1999-11-02 Samsung Electronics Co., Ltd. Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor
US5978830A (en) * 1997-02-24 1999-11-02 Hitachi, Ltd. Multiple parallel-job scheduling method and apparatus
US6223228B1 (en) * 1998-09-17 2001-04-24 Bull Hn Information Systems Inc. Apparatus for synchronizing multiple processors in a data processing system
US6249880B1 (en) * 1998-09-17 2001-06-19 Bull Hn Information Systems Inc. Method and apparatus for exhaustively testing interactions among multiple processors
US20050193148A1 (en) * 2004-02-27 2005-09-01 Chen Chenghung J. Processor synchronization in a multi-processor computer system
US20070240158A1 (en) * 2006-04-06 2007-10-11 Shailender Chaudhry Method and apparatus for synchronizing threads on a processor that supports transactional memory

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106918A1 (en) * 2005-11-07 2007-05-10 Seiko Epson Corporation Multi processor system and interrupt signal sending method therefor
US7853814B2 (en) * 2005-11-07 2010-12-14 Seiko Epson Corporation Method and system for executing a power-cutoff-specific process within a specific processor of a multiprocessor system
US20110047556A1 (en) * 2008-01-17 2011-02-24 Kosuke Nishihara Synchronization control method and information processing device
US8555291B2 (en) * 2008-01-17 2013-10-08 Nec Corporation Synchronization control method and information processing device
US20090327554A1 (en) * 2008-06-25 2009-12-31 Dell Products L.P. Synchronizing processors when entering system management mode
US7991933B2 (en) * 2008-06-25 2011-08-02 Dell Products L.P. Synchronizing processors when entering system management mode
US8260995B2 (en) 2008-06-25 2012-09-04 Dell Products L.P. Processor interrupt command response system
US20110274192A1 (en) * 2009-01-23 2011-11-10 Alcatel Lucent Synchronization method and device for real-time distributed system
US8495408B2 (en) * 2009-01-23 2013-07-23 Alcatel Lucent Synchronization method and device for real-time distributed system wherein each module sets an indicator to signal whether it is currently able to operate synchronously with other modules and resets the indicator at a unified synchronization end time
US20130145378A1 (en) * 2011-12-01 2013-06-06 International Business Machines Corporation Determining Collective Barrier Operation Skew In A Parallel Computer
US9195517B2 (en) 2011-12-01 2015-11-24 International Business Machines Corporation Determining collective barrier operation skew in a parallel computer
US9195516B2 (en) * 2011-12-01 2015-11-24 International Business Machines Corporation Determining collective barrier operation skew in a parallel computer

Also Published As

Publication number Publication date
WO2009023076A1 (en) 2009-02-19

Similar Documents

Publication Publication Date Title
US6631462B1 (en) Memory shared between processing threads
US9146777B2 (en) Parallel processing with solidarity cells by proactively retrieving from a task pool a matching task for the solidarity cell to process
US8209690B2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
EP1381939B1 (en) Registers for data transfers within a multithreaded processor
US8015248B2 (en) Queuing of conflicted remotely received transactions
US5202988A (en) System for communicating among processors having different speeds
US20090049323A1 (en) Synchronization of processors in a multiprocessor system
US7415598B2 (en) Message synchronization in network processors
JP3206914B2 (en) Multiprocessor system
KR100613923B1 (en) Context pipelines
US20200034214A1 (en) Method for arbitration and access to hardware request ring structures in a concurrent environment
KR100895536B1 (en) Data transfer mechanism
US8359459B2 (en) Using hardware support to reduce synchronization costs in multithreaded applications
JP2017045151A (en) Arithmetic processing device and control method of arithmetic processing device
JPH09138778A (en) Device and method using semaphore buffer for semaphore instruction
WO2019178178A1 (en) Thread scheduling for multithreaded data processing environments
US6766437B1 (en) Composite uniprocessor
CA2382728A1 (en) Efficient event waiting
US20030212852A1 (en) Signal aggregation
CN114756287A (en) Data processing method and device for reorder buffer and storage medium
US10489164B2 (en) Apparatuses for enqueuing kernels on a device-side
Diep et al. A general approach for supporting nonblocking data structures on distributed-memory systems
US20090300630A1 (en) Waiting based on a task group
US6996665B2 (en) Hazard queue for transaction pipeline
JP2517859B2 (en) Parallel process management method

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:IMARK, ROBERT R.;GASSER, RAYMOND A.;REEL/FRAME:019779/0336

Effective date: 20070808

STCB Information on status: application discontinuation

Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION