US20090049323A1 - Synchronization of processors in a multiprocessor system - Google Patents
- Publication number
- US20090049323A1 (application US11/838,630 / US83863007A)
- Authority
- US
- United States
- Prior art keywords
- processors
- synchronization point
- processor
- interrupts
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/522—Barrier synchronisation
Definitions
- SIMD single instruction stream, multiple data stream
- MIMD multiple instruction stream, multiple data stream
- One type of communication involves synchronizing two or more of the processors by requiring each of the processors to halt execution at a predetermined point in its execution thread (called “rendezvous”), and then begin execution again at the same or another predetermined location (termed “launch”).
- rendezvous a predetermined point in its execution thread
- launch a predetermined location
- FIG. 1 is a block diagram of a multiprocessor system according to an embodiment of the invention.
- FIG. 2 is flow diagram of a method for synchronizing the processors of FIG. 1 according to an embodiment of the invention.
- FIG. 3 is a graphic representation of execution of the processors in the multiprocessor of FIG. 1 according to another embodiment of the invention.
- FIG. 4 is a diagram of a rendezvous table employed in the processor execution of FIG. 3 according to an embodiment of the invention.
- FIG. 5 is a diagram of a launch table employed in the processor execution of FIG. 3 according to an embodiment of the invention.
- FIG. 1 provides a simplified block diagram of a multiprocessor system 100 including a first processor 102 and multiple second processors 104 . In other embodiments, as few as two second processors 104 , or many more than depicted in FIG. 1 , may be included. The multiprocessor system 100 may employ any type of multiprocessor architecture in which synchronization of two or more of the processors is desired. One possible architecture is a symmetric multiprocessing (SMP) system, although many other multiprocessor architectures may benefit from the inventive concepts described below.
- While the first processor 102 is distinguished from the second processors 104 , all of the processors 102 , 104 may be equivalent in terms of physical and electronic construction. The first processor 102 is instead distinguished by its role in the synchronization of the processors 102 , 104 , as described in greater detail below. Further, each of the second processors 104 serves a similar synchronization function, although the second processors 104 may or may not be similar in design and construction to each other. Only the functionality of the processors 102 , 104 as described below is relevant to the embodiments presented herein.
- FIG. 2 presents, by way of a flow diagram, a method 200 of synchronizing the processors 102 , 104 of FIG. 1 . In the method 200 , each of the second processors 104 waits at a second synchronization point after reaching a first synchronization point (operation 202 ). The last of the second processors 104 to reach the first synchronization point sends a signal to the first processor 102 (operation 204 ). The first processor 102 waits at the first synchronization point until receiving the signal (operation 206 ), and initiates a launch of the second processors 104 after receiving the signal (operation 208 ). More specifically, the first processor 102 initiates the launch by launching at least one of the second processors 104 , and a second processor 104 so launched launches another of the second processors 104 in response. Each of the second processors 104 , in response to being launched, continues execution from the second synchronization point (operation 210 ).
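- The flow of method 200 can be sketched in ordinary threading terms. The following Python sketch is purely illustrative (all names are invented, and a lock and events stand in for the atomic memory operations and interrupts the document describes): workers note their arrival, the last arrival signals the coordinator, and launched workers launch further workers in turn.

```python
import threading

def run_method_200(num_workers=4):
    arrivals = [0]
    lock = threading.Lock()                # models the atomic read-modify-write
    signal = threading.Event()             # the signal of operations 204/206
    launch = [threading.Event() for _ in range(num_workers)]
    children = {0: [1, 2]}                 # worker 0 launches workers 1 and 2
    resumed = []

    def worker(i):
        with lock:                         # operations 202/204: arrive, and the
            arrivals[0] += 1               # last arrival signals the coordinator
            if arrivals[0] == num_workers:
                signal.set()
        launch[i].wait()                   # wait at the second synchronization point
        for c in children.get(i, []):      # a launched worker launches others
            launch[c].set()
        resumed.append(i)                  # operation 210: continue execution

    def coordinator():
        signal.wait()                      # operation 206: wait for the signal
        launch[0].set()                    # operation 208: directly launch only
        launch[num_workers - 1].set()      # a subset of the workers

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    threads.append(threading.Thread(target=coordinator))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(resumed)
```

Note that the coordinator wakes only two workers directly; the remaining workers are reached through the `children` hierarchy, mirroring the distributed launch described below.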
- FIG. 3 graphically characterizes the execution of the first processor 102 and four separate second processors 104 according to a more detailed embodiment of the invention. While four second processors 104 are employed in this particular example, any number of second processors 104 greater than one may be utilized in other embodiments. For identification purposes, the first processor 102 is identified as Processor 0 , while the second processors 104 are labeled Processors 1 - 4 . Execution of each of the processors 102 , 104 is depicted as progressing from top to bottom in relation to a first synchronization point 302 and a second synchronization point 304 . In one embodiment, the first synchronization point 302 is a rendezvous or gathering point for the processors 102 , 104 , while the second synchronization point 304 is a launching point from which the processors 102 , 104 continue execution. While the synchronization points 302 , 304 are represented in FIG. 3 as being located in completely separate areas of the execution stream of each of the processors 102 , 104 , the first synchronization point 302 and the second synchronization point 304 may represent approximately the same point in the execution thread of any or all of the processors 102 , 104 in another example.
- the first processor 102 executes until reaching the first synchronization point 302 , whereupon the first processor 102 waits for a signal 312 from one of the second processors 104 . Meanwhile, each of the second processors 104 executes through the first synchronization point 302 to the second synchronization point 304 . Each of the second processors 104 accesses a rendezvous table 310 at the first synchronization point 302 to indicate that the accessing second processor 104 has reached the first synchronization point 302 . The second processor 104 that is last to reach the first synchronization point 302 issues the signal 312 to the first processor 102 so that it may continue execution.
- FIG. 4 illustrates one specific example of the rendezvous table 310 that each of the second processors 104 accesses to indicate its arrival at the first synchronization point 302 . The rendezvous table 310 includes several entries, each depicted in FIG. 4 as a row. Each entry is associated with an address 402 and includes three fields: a processor count field 404 , a processor threshold field 406 , and a next address field 408 . Also shown in FIG. 4 is the entry of the rendezvous table 310 that is associated with each of the processors 102 , 104 , identified by the address 402 of the entry. As each second processor 104 reaches the first synchronization point 302 , it accesses the entry of the rendezvous table 310 to which it is assigned. The second processor 104 then performs an atomic read-modify-write operation on the processor count field 404 of that entry and compares the value read from the processor count field 404 to the processor threshold field 406 . If the values are equal, the second processor 104 accesses the rendezvous table 310 entry indicated in the next address field 408 in the same fashion as the previous entry. This process continues for each of the second processors 104 until an accessed processor count field 404 is found to be less than its associated processor threshold field 406 .
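- The table-walking rule above can be modeled in a few lines. The sketch below is an assumption-laden model (the dictionary layout and a Python lock stand in for the memory layout and the hardware atomic operation): each arriving processor increments the count at its assigned entry and climbs to the next entry only when the value it read equals the threshold.

```python
import threading

_lock = threading.Lock()  # stands in for the hardware atomic read-modify-write

def make_table():
    """Hypothetical layout of the rendezvous table of FIG. 4."""
    return {
        7066200: {"count": 0, "threshold": 2, "next": 7066100},  # Processors 1-3
        7066100: {"count": 0, "threshold": 1, "next": 7066000},  # Processor 4
        7066000: {"count": 0, "threshold": 0, "next": 0},        # root entry
    }

def arrive(table, address):
    """Walk the table as an arriving second processor.

    Returns True only for the last processor to arrive overall, which must
    then signal the first processor."""
    while address != 0:
        entry = table[address]
        with _lock:                      # atomic fetch-and-increment
            count = entry["count"]
            entry["count"] = count + 1
        if count < entry["threshold"]:   # not the last arrival at this entry
            return False
        address = entry["next"]          # last arrival here: climb to next entry
    return True                          # next address of 0000000: last overall
```

Running the four arrivals of the worked example in order reproduces the document's trace: the first three callers stop early, and only the fourth walks all the way to the root entry.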
- Using the scenario depicted in FIGS. 3 and 4 as an example, Processors 1 - 3 are each assigned the rendezvous table 310 entry located at address 7066200. Each of these second processors 104 reads the processor count field 404 of that entry and compares it to the processor threshold field 406 . Thus, the first of Processors 1 - 3 to reach the first synchronization point 302 and access the rendezvous table 310 entry at 7066200 reads a value of zero from the processor count field 404 and writes the incremented value of one back to the processor count field 404 . That processor also compares the value of zero read from the processor count field 404 to the processor threshold 406 of two, and ceases this particular access of the rendezvous table 310 as a result. Similarly, the second of Processors 1 - 3 to reach the first synchronization point 302 accesses the same entry, resulting in a value of two being stored in the processor count field 404 at address 7066200. Continuing in this manner, the third of Processors 1 - 3 to pass through the first synchronization point 302 accesses the same entry. However, after reading a value of two from the processor count field 404 and writing back a three, that particular second processor 104 compares the two to the processor threshold 406 of two, and after finding them equal, accesses the next address field 408 , which stores an address of 7066100. The last of Processors 1 - 3 to reach the first synchronization point 302 then accesses the entry of the rendezvous table 310 at address 7066100 and repeats the process. Assuming this processor 104 reaches the first synchronization point 302 before Processor 4 (the operation of which is addressed below), it reads a processor count field 404 of zero, writes a one back, and compares the zero to the processor threshold field 406 of one. As a result, the last of these second processors 104 (i.e., Processors 1 - 3 ) ceases its access of the rendezvous table 310 .
- Proceeding with the example of FIGS. 3 and 4 , the second processor 104 referred to as Processor 4 also accesses the entry of the rendezvous table 310 at address 7066100 when it reaches the first synchronization point 302 . In this case, Processor 4 is the last second processor 104 to reach the first synchronization point 302 . After reading the value of one from the processor count field 404 (written therein by the last of Processors 1 - 3 , as described above), Processor 4 compares the one to the value of one written in the processor threshold field 406 . With these two values being equal, Processor 4 reads the next address field 408 , which holds the value 7066000. Processor 4 then accesses the rendezvous table 310 entry at that address, reads the processor count field 404 of zero, stores the incremented value of one thereto, and compares the zero to the zero stored in the processor threshold field 406 of that entry. With the compared values being equal, Processor 4 then reads the next address field 408 of that entry, which holds the value 0000000. In this embodiment, reading all zeros indicates to Processor 4 that it is the last of the second processors 104 to reach the first synchronization point 302 .
- In response to being the last of the second processors 104 , Processor 4 sends a signal 312 to Processor 0 (i.e., the first processor 102 of FIG. 3 ). The signal 312 may be issued in a number of ways. In one embodiment, the act of Processor 4 writing a one to the processor count field 404 of the rendezvous table 310 entry at address 7066000 may serve as the signal 312 . In that case, Processor 0 may poll the processor count field 404 for a one to be written thereto, interpreting the one as the signal 312 . In another implementation, Processor 4 writes a separate memory location, sends a message to Processor 0 , or performs some other operation to implement the signal 312 . In addition, Processor 0 or Processor 4 may clear the processor count field 404 of each entry of the rendezvous table 310 to initialize the table 310 for the next time the first synchronization point 302 or a similar rendezvous point is employed.
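- The polling form of the signal 312 , and the clearing of the table for reuse, might look like the following sketch (the entry structure is hypothetical, mirroring the model of FIG. 4 ; a sleep loop stands in for whatever polling discipline a real system would use):

```python
import threading
import time

def wait_for_signal(root_entry, poll_interval=0.0005):
    """Processor 0 spins on the root entry's count until the last arriver writes it."""
    while root_entry["count"] == 0:
        time.sleep(poll_interval)

def reset_counts(table):
    """Clear every processor count field so the table can be reused."""
    for entry in table.values():
        entry["count"] = 0

table = {7066000: {"count": 0, "threshold": 0, "next": 0}}

def send_signal():
    table[7066000]["count"] = 1   # the last arriver's write serves as signal 312

writer = threading.Thread(target=send_signal)
writer.start()
wait_for_signal(table[7066000])   # returns once the signal value appears
writer.join()
reset_counts(table)               # ready for the next rendezvous
```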
- Write operations to the processor count fields 404 as described above specifically employ an atomic read-modify-write operation, often used in multiprocessor systems for processor intercommunication, so that conflicts in accessing the field 404 do not arise between two or more of the second processors 104 . For example, use of an atomic operation eliminates the possibility that two of the second processors will read the same value from the same processor count field 404 . Other memory accesses that prevent such conflicts, such as standard “semaphore” or “mailbox” operations, could be utilized in other embodiments.
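- The property the atomic operation buys can be demonstrated directly. In the sketch below a lock-based counter stands in for the hardware read-modify-write (an assumption; real systems would use a fetch-and-add or compare-and-swap instruction): even under heavy contention from four threads, every `fetch_add` returns a distinct previous value, so no two processors ever observe the same processor count.

```python
import threading

class AtomicCounter:
    """Lock-based stand-in for an atomic read-modify-write memory operation."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n=1):
        """Atomically add n and return the value that was read."""
        with self._lock:
            prev = self._value
            self._value += n
            return prev

counter = AtomicCounter()
observed = []

def contend():
    for _ in range(1000):
        observed.append(counter.fetch_add())

threads = [threading.Thread(target=contend) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# observed now contains 4000 values, all distinct
```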
- By employing multiple rendezvous table 310 entries, a hierarchical structure of memory locations is formed by which the second processors 104 indicate reaching the first synchronization point 302 . Spreading the rendezvous table 310 accesses of the second processors 104 across multiple memory locations greatly decreases contention for those locations, which the atomic memory operations could otherwise exacerbate, resulting in faster signaling of the first processor 102 . In addition, the particular embodiment of FIG. 4 allows generation of the signal 312 by way of a single access of the processor count field 404 at address 7066000, which the first processor 102 may be polling, as described above. In contrast to a single counter being updated by each of the second processors 104 while the first processor 102 polls the same memory location, the hierarchical model of FIG. 4 provides a more efficient method for processor synchronization.
- A similar hierarchical model, employing interrupts, is utilized in FIG. 3 to launch the second processors 104 from the second synchronization point 304 . More specifically, after the first processor 102 (i.e., Processor 0 ) receives the signal 312 , it resumes execution to the second synchronization point 304 , at which time it initiates a launch of the second processors 104 (i.e., Processors 1 - 4 ) by sending a first interrupt 322 to one of the second processors 104 , which is Processor 1 in the embodiment of FIG. 3 . In this specific example, Processor 0 also issues a second interrupt 324 to Processor 4 . In turn, Processor 1 issues a third interrupt 326 to Processor 2 and a fourth interrupt 328 to Processor 3 . In response to being launched, each of the second processors 104 then continues execution from the second synchronization point 304 , along with the first processor 102 . Thus, the first synchronization point 302 and the second synchronization point 304 are employed together to rendezvous and subsequently launch the processors 102 , 104 .
- FIG. 5 graphically shows an embodiment of a launch table 330 accessed by each of Processors 0 - 4 to perform the hierarchically-structured launch. The launch table 330 includes a number of entries, each of which is accessible via an address 502 . Within each entry reside one or more interrupt indicators 504 , wherein each indicator 504 is associated with a particular launch interrupt. In one embodiment, each interrupt indicator 504 is an interrupt mask and enable value that specifies an interrupt associated with a particular second processor 104 . To issue the interrupt, the processor 102 , 104 issuing it may merely write the data stored at the interrupt indicator 504 to a location that causes the processor 102 , 104 to generate the interrupt. Other indicators, such as a simple binary value identifying the specific second processor 104 to be launched, may be used in other examples. FIG. 5 also indicates the launch table 330 entry address and interrupt associated with each of the processors 102 , 104 .
- In the example of FIG. 5 , Processor 0 accesses its assigned launch table 330 address 7066700 after reaching the second synchronization point 304 . The entry at address 7066700 indicates that interrupts are to be issued to Processor 1 (by way of interrupt 322 ) and Processor 4 (via interrupt 324 ), as shown in the interrupt indicators 504 of that entry. In one embodiment, an interrupt indicator 504 of zero denotes the end of the list of interrupts to be issued by the processor accessing the associated entry of the launch table 330 .
- After receiving its launch interrupt 322 , Processor 1 accesses its assigned launch table 330 entry at address 7066800, which indicates that Processor 1 is to issue launch interrupts 326 and 328 for Processors 2 and 3 , respectively. Similarly, Processor 4 accesses its assigned entry at address 7066900 in response to receiving interrupt 324 from Processor 0 . However, the entry at address 7066900 lists no further launch interrupts to be issued. Likewise, Processors 2 and 3 access the same entry in response to receiving interrupts 326 , 328 , and find that they are not responsible for issuing any launch interrupts, either. As a result, each of the second processors 104 has received a launch interrupt, even though the first processor 102 issued only two launch interrupts directly. The work required to issue the interrupts has thus been distributed among the processors 102 , 104 in a hierarchical fashion, hastening the launch process.
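- The launch table walk can be sketched as plain data (the list layout and address assignments are a hypothetical model of FIG. 5 , not the patent's actual encoding): each entry lists the processors to interrupt, with a zero indicator terminating the list, and following the entries from Processor 0 reaches every second processor.

```python
def interrupts_to_issue(launch_table, address):
    """Return the processors the accessing processor must interrupt."""
    targets = []
    for indicator in launch_table[address]:
        if indicator == 0:          # zero denotes the end of the interrupt list
            break
        targets.append(indicator)
    return targets

launch_table = {
    7066700: [1, 4, 0],  # Processor 0 interrupts Processors 1 and 4
    7066800: [2, 3, 0],  # Processor 1 interrupts Processors 2 and 3
    7066900: [0],        # Processors 2-4 issue no further interrupts
}
assigned_entry = {0: 7066700, 1: 7066800, 2: 7066900, 3: 7066900, 4: 7066900}

# Walking the hierarchy from Processor 0 reaches every second processor even
# though Processor 0 issues only two interrupts directly.
pending = interrupts_to_issue(launch_table, assigned_entry[0])
launched = set()
while pending:
    p = pending.pop()
    launched.add(p)
    pending.extend(interrupts_to_issue(launch_table, assigned_entry[p]))
```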
- In one implementation, the rendezvous table 310 , the launch table 330 , and the assigned table addresses for each of the processors 102 , 104 are initialized by one or more of the processors 102 , 104 . According to another implementation, a separate processor not discussed above may perform this function. Once the tables 310 , 330 are initialized, further setup of the tables 310 , 330 may not be required across multiple synchronizations of the processors 102 , 104 .
- In one example, each of the processors 102 , 104 may act as a producer of service requests before the first synchronization point 302 and after the second synchronization point 304 . The first processor 102 may then process or consume the service requests after the first synchronization point 302 and before the second synchronization point 304 , while the second processors 104 remain idle at the second synchronization point 304 . The service requests may constitute requests for any service that may be provided by the first processor 102 , including but not limited to generating billing records, or searching and retrieving items in a database.
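- This produce-then-drain pattern can be illustrated with an ordinary thread-safe queue (a sketch only; the names and the use of `queue.Queue` are assumptions, standing in for whatever shared-memory request structure a real system would use): workers enqueue requests before the rendezvous, and the first processor drains the queue alone between the two synchronization points.

```python
import queue
import threading

requests = queue.Queue()

def produce(worker_id, n=3):
    """Run before the first synchronization point: enqueue service requests."""
    for k in range(n):
        requests.put((worker_id, k))

def consume():
    """Run by the first processor alone between the two synchronization points."""
    served = []
    while not requests.empty():
        served.append(requests.get())
    return served

producers = [threading.Thread(target=produce, args=(i,)) for i in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()
served = consume()   # all producers are idle, so the queue drains serially
```

The single-consumer phase is safe precisely because the rendezvous guarantees no producer is still running when `consume` executes.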
- In another example, the first synchronization point 302 may be used to implement a “join” operation employed in a MIMD multiprocessing system, while the second synchronization point 304 may be utilized as part of a “fork” operation. In that case, the first processor 102 may execute a single thread after the first synchronization point 302 while the second processors 104 wait to be called upon at their second synchronization point 304 . The first processor 102 may then execute a fork operation to spawn the second processors 104 by way of the launch process from the second synchronization point 304 , as described above. As each second processor 104 completes its spawned work, execution in that second processor 104 reaches the next first synchronization point 302 . The last of the second processors 104 to reach the first synchronization point 302 then issues the signal 312 to the first processor 102 , thus joining the execution of the processors back together. The first processor 102 may then operate as a lone thread until another fork operation is undertaken at another second synchronization point 304 .
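- The repeating fork/join cycle above can be sketched with a reusable barrier (an assumed simplification: Python's `threading.Barrier` plays the role that the rendezvous and launch tables play in the document). Two `wait` calls per cycle bracket the lone-thread section, so the first processor's serial work always falls strictly between the parallel phases.

```python
import threading

NUM_SECOND = 4
barrier = threading.Barrier(NUM_SECOND + 1)   # first + second processors
log = []

def second_processor(i):
    for cycle in range(2):
        log.append((cycle, i))         # spawned parallel work for this cycle
        barrier.wait()                 # join: arrive at the first synchronization point
        barrier.wait()                 # fork: released from the second synchronization point

def first_processor():
    for cycle in range(2):
        barrier.wait()                 # wait for all second processors to join
        log.append((cycle, "serial"))  # lone-thread section between the points
        barrier.wait()                 # fork: launch the second processors again

threads = [threading.Thread(target=second_processor, args=(i,)) for i in range(NUM_SECOND)]
threads.append(threading.Thread(target=first_processor))
for t in threads:
    t.start()
for t in threads:
    t.join()
```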
- Various embodiments of a multiprocessor system and method as discussed above may provide significant benefits. Since a combination of simple shared memory communication and individual interrupts is employed to effectuate the rendezvous and launch processes, the embodiments may be implemented on most multiprocessing systems. In addition, the use of a logical processor hierarchy among the second processors 104 , with the first processor 102 residing at the top of the hierarchy, facilitates quick execution of both the rendezvous and launch phases of the synchronization.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
A method for synchronizing a first processor and multiple second processors is presented. In the method, each of the second processors waits at a second synchronization point after reaching a first synchronization point. The last of the second processors to reach the first synchronization point sends a signal to the first processor. The first processor waits at the first synchronization point until it receives the signal. After receiving the signal, the first processor initiates a launch of the second processors by launching at least one of the second processors. At least one of the second processors launched by the first processor launches another of the second processors in response to being launched by the first processor. Each of the second processors continues execution from the second synchronization point in response to being launched.
Description
- Designers of computer systems consistently strive for increased processing capacity with each product generation. Many different approaches have been adopted to achieve the computing speeds currently enjoyed by users of such systems. Increased system clock speeds, integrated circuit design advances, wider data paths, and various other technological developments have all contributed to increasing the processing throughput of a single processor.
- To further enhance computer capability, pipelined and parallel arrangements of multiple processing units have been pursued successfully. Parallel processing generally began with the use of “single instruction stream, multiple data stream” (SIMD) architectures, in which multiple processors perform identical operations on different data. In such a system, a single program line, or “thread,” of instructions is executed. More advanced “multiple instruction stream, multiple data stream” (MIMD) systems allow each processor to execute a completely diverse set of instructions, or a separate copy of the same set of instructions.
- However, even in an MIMD system, some communication or cooperation between the various processors is typically required. One type of communication involves synchronizing two or more of the processors by requiring each of the processors to halt execution at a predetermined point in its execution thread (called “rendezvous”), and then begin execution again at the same or another predetermined location (termed “launch”). One or more such synchronization points may be employed depending on the general nature and specific requirements of the overall task to be performed.
- Typically, a multiprocessing computer system supports the use of synchronization points by way of complicated, specialized and nonstandard hardware functions. For example, a dedicated hardware implementation of a specialized broadcast interrupt may be supplied to support the use of launch points so that a single processor may inform all other processors of a launch quickly and efficiently. However, other multiprocessor systems may not provide such hardware due to the design and implementation expense involved in supporting such a specialized hardware construct.
-
FIG. 1 is a block diagram of a multiprocessor system according to an embodiment of the invention. -
FIG. 2 is flow diagram of a method for synchronizing the processors ofFIG. 1 according to an embodiment of the invention. -
FIG. 3 is a graphic representation of execution of the processors in the multiprocessor ofFIG. 1 according to another embodiment of the invention. -
FIG. 4 is a diagram of a rendezvous table employed in the processor execution ofFIG. 3 according to an embodiment of the invention. -
FIG. 5 is a diagram of a launch table employed in the processor execution ofFIG. 3 according to an embodiment of the invention. -
FIG. 1 provides a simplified block diagram of amultiprocessor system 100 including afirst processor 102 and multiplesecond processors 104. In other embodiments, as few as twosecond processors 104, or many more than that depicted inFIG. 1 , may be included. Themultiprocessor system 100 may employ any type of multiprocessor architecture in which synchronization of two or more of the processors is desired. One possible architecture for the multiprocessor system may be a symmetric multiprocessing (SMP) system, although any of a plethora of other multiprocessor architectures may benefit from the inventive concepts described below. - While the
first processor 102 is distinguished from thesecond processors 104, thefirst processor 102 and thesecond processors 104 may all be equivalent in terms of physical and electronic construction. Thefirst processor 102 is instead distinguished in terms of its role in the synchronization of theprocessors second processors 104, as described in greater detail below. Further, each of thesecond processors 104 serve a similar synchronization function. However, thesecond processors 104 may or may not be similar in design and construction to each other. Only the functionality of theprocessors -
FIG. 2 presents by way of a flow diagram amethod 200 of synchronizing theprocessors FIG. 1 . In themethod 200, each of thesecond processors 104 waits at a second synchronization point after reaching a first synchronization point (operation 202). The last of thesecond processors 104 to reach the first synchronization point sends a signal to the first processor 102 (operation 204). Thefirst processor 102 waits at the first synchronization point until receiving the signal (operation 206), and initiates a launch of thesecond processors 104 after receiving the signal (operation 208). More specifically, the first processor initiates the launch by launching at least one of thesecond processors 104, wherein thesecond processor 104 being launched launches another one of thesecond processors 104 in response to being launched. Each of thesecond processors 104, in response to being launched, continues execution from the second synchronization point (operation 210). -
FIG. 3 graphically characterizes the execution of thefirst processor 102 and four separatesecond processors 104 according to a more detailed embodiment of the invention. While four separatesecond processors 104 are employed in this particular example, any number ofsecond processors 104 greater than one may be utilized in other embodiments. For identification purposes, thefirst processor 102 is identified asProcessor 0, while thesecond processors 104 are labeled Processors 1-4. Execution of each of theprocessors first synchronization point 302 and asecond synchronization point 304. In one embodiment, thefirst synchronization point 302 is a rendezvous or gathering point for theprocessors second synchronization point 304 is a launching point from which theprocessors synchronization points FIG. 3 as being located in completely separate areas of the execution stream of each of theprocessors first synchronization point 302 and thesecond synchronization point 304 may represent approximately the same point in the execution thread of any or all of theprocessors - In
FIG. 3 , thefirst processor 102 executes until reaching thefirst synchronization point 302, whereupon thefirst processor 102 waits for asignal 312 from one of thesecond processors 104. Meanwhile, each of thesecond processors 104 executes through thefirst synchronization point 302 to thesecond synchronization point 304. Each of thesecond processors 104 accesses a rendezvous table 310 at thefirst synchronization point 302 to indicate that the accessingsecond processor 104 has reached thefirst synchronization point 302. Thesecond processor 104 that is last to reach thefirst synchronization point 302 issues thesignal 312 to thefirst processor 102 so that it may continue execution. -
FIG. 4 illustrates one specific example of the rendezvous table 310 that each of thesecond processors 104 accesses to indicate its arrival at thefirst synchronization point 302. The rendezvous table 310 includes several entries, wherein each entry is depicted inFIG. 3 as a row. Further, each entry is associated with anaddress 402, and includes three fields: aprocessor count field 404, aprocessor threshold field 406, and anext address field 408. Also shown inFIG. 4 is the entry of the rendezvous table 310 that is associated with each of theprocessors address 402 of the entry. - As each
second processor 104 reaches thefirst synchronization point 302, thatsecond processor 104 accesses the entry of the rendezvous table 310 to which it is assigned. Thesecond processor 104 then performs an atomic read-modify-write operation of theprocessor count field 404 of that entry and compares the value read from theprocessor count field 404 to theprocessor threshold field 406. If the values are equal, thesecond processor 104 then accesses the rendezvous table 310 entry indicated in thenext address field 408 in the same fashion as the previous entry. This process continues for each of thesecond processors 104 until an accessedprocessor count field 404 is found to be less than its associatedprocessor threshold field 406. - Using the scenario depicted in
FIGS. 3 and 4 as an example, Processors 1-3 are each assigned the rendezvous table 310 entry located ataddress 7066200. Each of thesesecond processors 104 reads theprocessor count field 404 of that entry and compares it to theprocessor threshold field 406. Thus, the first of Processors 1-3 that reaches thefirst synchronization point 302 and accesses the rendezvous table 310 entry at 7066200 reads a value of zero from theprocessor count field 404 and writes the incremented value of one back to theprocessor count field 404. That processor also compares the value of zero read from theprocessor count field 404 to theprocessor threshold 406 of two, and ceases this particular access of the rendezvous table 310 as a result. Similarly, the second of the Processors 1-3 to reach thefirst synchronization point 302 accesses the same entry of the rendezvous table 310, resulting in a value of two being stored for theprocessor count field 404 ataddress 7066200. - Continuing in this manner, the third of the Processors 1-3 to pass through the
first synchronization point 302 accesses the same entry. However, after reading a value of two from theprocessor count field 404 and writing back a three thereto, that particularsecond processor 104 compares the two to theprocessor threshold 406 of two, and after finding that they are equal, accesses thenext address field 408, which stores an address of 7066100. The last of the Processors 1-3 reaching thefirst synchronization point 302 then accesses the entry of the rendezvous table 310 at the address of 7066100 and repeats the process. Assuming thisprocessor 104 reaches thefirst synchronization point 302 before Processor 4 (the operation of which is addressed below), theprocessor count field 404 of zero is read, a one is written back thereto, and the zero is compared to theprocessor threshold field 406 of one. As a result, the last of these second processors 104 (i.e., Processors 1-3) ceases its access of the rendezvous table 310. - Proceeding with the example of
FIGS. 3 and 4 , thesecond processor 104 referred to asProcessor 4 also accesses the entry of the rendezvous table 310 ataddress 7066100 when it reaches thefirst synchronization point 302. In this case,Processor 4 is the lastsecond processor 104 to reach thefirst synchronization point 302. After reading the value of one from the processor count field 404 (written therein by the last of Processors 1-3, as described above),Processor 4 compares the one to the value of one written in theprocessor threshold field 406. With these two values being equal,Processor 4 reads thenext address field 408 that holds the value of 7066000.Processor 4 then reads the rendezvous table 310 entry at that address, reads theprocessor count field 404 of zero, stores the incremented value of one thereto, and compares the zero to the zero stored in theprocessor threshold field 406 of that entry. With the compared values being equal,Processor 4 then reads thenext address field 408 of that entry, which holds thevalue 0000000. In this embodiment, reading all zeros indicates toProcessor 4 that it is the last of thesecond processors 104 to reach thefirst synchronization point 302. - In response to being the last of the
second processors 104, Processor 4 sends a signal 312 to Processor 0 (i.e., the first processor 102 of FIG. 3). The signal 312 may be issued in a number of ways. In one embodiment, the act of Processor 4 writing a one to the processor count field 404 of the rendezvous table 310 entry at address 7066000 may serve as the signal 312. In that case, Processor 0 may poll the processor count field 404 for a one to be written thereto, interpreting the one as the signal 312. In another implementation, Processor 4 writes a separate memory location, sends a message to Processor 0, or performs some other operation to implement the signal 312 to Processor 0. In addition, Processor 0 or Processor 4 may clear the processor count field 404 of each entry of the rendezvous table 310 to initialize the table 310 for the next time the first synchronization point 302 or other similar rendezvous point is employed.
- Write operations to the processor count fields 404 as described above specifically employ an atomic read-modify-write operation often used in multiprocessor systems for processor intercommunication so that conflicts in accessing the
field 404 will not arise between two or more of the second processors 104. For example, use of an atomic operation eliminates the possibility that two of the second processors will read the same value from the same processor count field 404. Other memory accesses that prevent such conflicts, such as standard “semaphore” or “mailbox” operations, could be utilized in other embodiments.
- By employing multiple rendezvous table 310 entries, a hierarchical structure of memory locations is formed by which the
second processors 104 indicate reaching the first synchronization point 302. Thus, by spreading the access to the rendezvous table 310 by each of the second processors 104 across multiple memory locations, access contention for those locations, which potentially is exacerbated by the atomic memory operations, is greatly decreased, resulting in faster signaling of the first processor 102. In addition, the particular embodiment of FIG. 4 allows generation of the signal 312 by way of a single access of the processor count field 404 of address 7066000, which the first processor 102 may be polling, as described above. In contrast to a single counter being updated by each of the second processors 104 while the first processor 102 polls the same memory location, the hierarchical model of FIG. 4 provides a more efficient method for processor synchronization.
- A similar hierarchical model employing interrupts is utilized in
FIG. 3 in launching the second processors 104 from the second synchronization point 304. More specifically, after the first processor 102 (i.e., Processor 0) receives the signal 312, the first processor 102 resumes execution to the second synchronization point 304, at which time it initiates a launch of each of the second processors 104 (i.e., Processors 1-4) by sending a first interrupt 322 to one of the second processors 104, which is Processor 1 in the embodiment of FIG. 3. In this specific example, Processor 0 also issues a second interrupt 324 to Processor 4. In turn, after receiving the first interrupt 322, Processor 1 issues a third interrupt 326 to Processor 2, and a fourth interrupt 328 to Processor 3. In response to receiving one of the interrupts 322-328, each of the second processors 104 then continues execution from the second synchronization point 304, along with the first processor 102. Thus, the first synchronization point 302 and the second synchronization point 304 are employed together to rendezvous and subsequently launch the processors 102, 104.
-
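The hierarchical rendezvous walk of FIG. 4 can be sketched in Python. This is an illustrative model only: the `RendezvousEntry` class, the lock standing in for the hardware's atomic read-modify-write, the `Event` standing in for the signal 312, and the address 7066200 assumed for the entry shared by Processors 1-3 are constructs of the sketch, not details taken from the embodiments.

```python
import threading

class RendezvousEntry:
    """One rendezvous table entry: an arrival count, a processor
    threshold, and the address of the next entry (None for 0000000)."""
    def __init__(self, threshold, next_addr):
        self.count = 0
        self.threshold = threshold
        self.next_addr = next_addr
        self._lock = threading.Lock()   # models the atomic read-modify-write

    def fetch_and_increment(self):
        # Atomically read the current count and write back count + 1, so
        # no two processors can observe the same value from this field.
        with self._lock:
            old = self.count
            self.count = old + 1
            return old

def rendezvous(table, start_addr, signal_312):
    """Walk the table from a processor's assigned entry.  A processor
    whose read matches the threshold follows next_addr; a next address
    of None (all zeros) marks it as the last to arrive overall."""
    addr = start_addr
    while addr is not None:
        entry = table[addr]
        if entry.fetch_and_increment() != entry.threshold:
            return              # not the last arrival here; go idle
        addr = entry.next_addr
    signal_312.set()            # last processor overall signals Processor 0

# Mirroring the example: Processors 1-3 share one entry (threshold 2,
# assumed address 7066200) chaining to 7066100 (threshold 1, shared with
# Processor 4), chaining to 7066000 (threshold 0, next address all zeros).
table = {
    7066200: RendezvousEntry(2, 7066100),
    7066100: RendezvousEntry(1, 7066000),
    7066000: RendezvousEntry(0, None),
}
signal_312 = threading.Event()
threads = [threading.Thread(target=rendezvous, args=(table, a, signal_312))
           for a in (7066200, 7066200, 7066200, 7066100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert signal_312.is_set()      # the count of one at 7066000 could be polled
```

Because each group of processors increments a different entry, no single memory location is contended by all of the processors, mirroring the benefit of the hierarchy described above.
-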
FIG. 5 graphically shows an embodiment of a launch table 330 accessed by each of the Processors 0-4 to perform the hierarchically-structured launch. As with the rendezvous table 310, the launch table 330 includes a number of entries, each of which is accessible via an address 502. Within each entry reside one or more interrupt indicators 504, wherein each indicator 504 is associated with a particular launch interrupt. In one embodiment, each interrupt indicator 504 is an interrupt mask and enable value that specifies an interrupt associated with a particular second processor 104. Other forms of indicator 504, such as an identity of the particular second processor 104 to be launched, may be used in other examples. FIG. 5 also indicates the launch table 330 entry address and interrupt associated with each of the processors 102, 104.
- In the particular example of
FIG. 5, Processor 0 (i.e., the first processor 102) accesses its assigned launch table 330 address 7066700 after reaching the second synchronization point 304. The entry at address 7066700 indicates that interrupts are to be issued to Processor 1 (by way of interrupt 322) and Processor 4 (via interrupt 324), as shown in the interrupt indicators 504 of that entry. Further according to the embodiment of FIG. 5, an interrupt indicator 504 of zero denotes the end of the list of interrupts to be issued by the processor accessing the associated entry of the launch table 330.
- After receiving its launch interrupt 322,
Processor 1 accesses its assigned launch table 330 entry at address 7066800, which indicates that Processor 1 is to issue launch interrupts 326 and 328 for Processors 2 and 3, respectively. Processor 4 accesses its assigned entry at address 7066900 in response to receiving interrupt 324 from Processor 0. However, the entry at address 7066900 lists no further launch interrupts to be issued. Similarly, Processors 2 and 3 find no further launch interrupts listed in their assigned entries. At this point, each of the second processors 104 has received a launch interrupt, even though the first processor 102 has only issued two launch interrupts directly. As a result, the work required to issue the interrupts has been distributed among the processors 102, 104.
- In one embodiment, the rendezvous table 310, the launch table 330, and the assigned table addresses for each of the
processors 102, 104 may be established in memory accessible to the processors 102, 104 prior to use, for example during initialization of the multiprocessor system, so that each of the processors 102, 104 may locate its assigned entries.
- The use of the two
synchronization points 302, 304 may be beneficial in a number of circumstances. For example, the second processors 104 may generate service requests for the first processor 102 during execution before the first synchronization point 302 and after the second synchronization point 304. In response, the first processor 102 may then process or consume the service requests after the first synchronization point 302 and before the second synchronization point 304 while the second processors 104 remain idle at the second synchronization point 304. The service requests may constitute requests for any service that may be provided by the first processor 102, including but not limited to generating billing records, or searching and retrieving items in a database.
- In another example, the
first synchronization point 302 may be used to implement a “join” operation employed in a MIMD multiprocessing system, while the second synchronization point 304 may be utilized as part of a “fork” operation. In such an environment, the first processor 102 may execute a single thread after the first synchronization point 302 while the second processors 104 wait to be called on at their second synchronization point 304. The first processor 102 may then execute a fork operation to spawn the second processors 104 by way of the launch process from the second synchronization point 304, as described above. As each of the second processors 104 then finishes the work to which it was assigned, execution in that second processor 104 reaches the next first synchronization point 302. The last of the second processors 104 to reach the first synchronization point 302 then issues the signal 312 to the first processor 102, thus joining the execution of the processors back together. The first processor 102 may then operate as a lone thread until another fork operation is undertaken at another second synchronization point 304.
- Various embodiments of a multiprocessor system and method as discussed above may provide significant benefits. Since a combination of simple shared memory communication and individual interrupts is employed to effectuate the rendezvous and launch processes, the embodiments may be implemented on most multiprocessing systems. In addition, the use of a logical processor hierarchy among the
second processors 104, with the first processor 102 residing at the top of the hierarchy, facilitates quick execution of both the rendezvous and launch phases of the synchronization.
- While several embodiments of the invention have been discussed herein, other embodiments encompassed by the scope of the invention are possible. For example, while many embodiments as described above specifically involve the use of a handful of processors within a multiprocessor system, other embodiments employing many more processors coupled together within a single system may exhibit even greater advantages over more serially oriented solutions due to the hierarchical nature of the synchronization mechanisms distributing the required work among many more processors. Further, aspects of one embodiment may be combined with those of alternative embodiments to create further implementations of the present invention. Thus, while the present invention has been described in the context of specific embodiments, such descriptions are provided for illustration and not limitation. Accordingly, the proper scope of the present invention is delimited only by the following claims and their equivalents.
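As a further illustration, the hierarchical launch of FIG. 5 may be sketched as follows. The entries for Processors 0, 1, and 4 follow the example above; the entry addresses assumed for Processors 2 and 3 (7067000 and 7067100) and the use of a recursive function call in place of a hardware interrupt are assumptions of the sketch, not details of the embodiments.

```python
# Each launch table entry lists the interrupt targets for the processor
# assigned to it; a zero indicator terminates the list, as in FIG. 5.
LAUNCH_TABLE = {
    7066700: [1, 4, 0],   # Processor 0 interrupts Processors 1 and 4
    7066800: [2, 3, 0],   # Processor 1 interrupts Processors 2 and 3
    7066900: [0],         # Processor 4 issues no further interrupts
    7067000: [0],         # Processor 2 (assumed address)
    7067100: [0],         # Processor 3 (assumed address)
}
ASSIGNED_ENTRY = {0: 7066700, 1: 7066800, 2: 7067000, 3: 7067100, 4: 7066900}

launch_order = []

def launch(proc_id):
    """Model of receiving a launch interrupt: the processor resumes from
    the second synchronization point, then forwards interrupts to the
    processors named in its assigned launch table entry."""
    launch_order.append(proc_id)
    for target in LAUNCH_TABLE[ASSIGNED_ENTRY[proc_id]]:
        if target == 0:       # zero denotes the end of the interrupt list
            break             # (Processor 0 itself is never a target)
        launch(target)        # stands in for issuing a hardware interrupt

launch(0)  # Processor 0 reaches the second synchronization point 304
# launch_order is now [0, 1, 2, 3, 4]: every second processor received a
# launch interrupt although Processor 0 issued only two directly.
```

The recursion makes the distribution of work explicit: each launched processor consults only its own table entry, so the interrupt-issuing effort spreads down the hierarchy rather than falling entirely on the first processor.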
Claims (23)
1. A method for synchronizing a first processor and second processors, the method comprising:
in each of the second processors, waiting at a second synchronization point after reaching a first synchronization point;
in the last of the second processors to reach the first synchronization point, sending the first processor a signal;
in the first processor, waiting at the first synchronization point until receiving the signal, and initiating a launch of the second processors after receiving the signal by launching one of the second processors, wherein the one of the second processors launches another one of the second processors in response to being launched; and
in each of the second processors, continuing execution from the second synchronization point in response to being launched.
2. The method of claim 1 , wherein:
initiating the launch of the second processors comprises initiating a plurality of interrupts;
launching the one of the second processors comprises sending a first of the interrupts to the one of the second processors; and
launching the other one of the second processors comprises sending one of the remaining interrupts to the other one of the second processors in response to receiving the first interrupt.
3. The method of claim 2 , further comprising:
in each of the second processors, after receiving one of the interrupts, sending another one of the interrupts to another one of the second processors if indicated in a memory location assigned to the second processor sending the other one of the interrupts.
4. The method of claim 3 , wherein sending another one of the interrupts comprises writing to a memory-mapped address indicated in the memory location assigned to the second processor sending the other one of the interrupts.
5. The method of claim 1 , further comprising:
in each of the second processors, after reaching the first synchronization point, updating at least one of a plurality of memory locations, wherein the memory locations indicate whether all of the second processors have reached the first synchronization point.
6. The method of claim 5 , wherein updating at least one of the plurality of memory locations comprises performing at least one atomic memory update operation on the at least one of the memory locations.
7. The method of claim 6 , wherein sending the first processor the signal comprises writing to one of the plurality of memory locations.
8. The method of claim 1 , further comprising:
in the first processor, continuing execution from the first synchronization point to the second synchronization point in response to receiving the signal, and initiating the plurality of interrupts after reaching the second synchronization point.
9. The method of claim 8 , further comprising:
in the processors, generating service requests for the first processor during execution before the first synchronization point and after the second synchronization point;
wherein continuing execution in the first processor from the first synchronization point to the second synchronization point in response to receiving the signal comprises processing the service requests.
10. The method of claim 8 , wherein:
the first synchronization point is associated with a join operation of a multi-threaded program; and
the second synchronization point is associated with a fork operation of the multi-threaded program.
11. The method of claim 1 , wherein the first synchronization point comprises the second synchronization point.
12. A computer-readable storage medium comprising instructions executable on a first processor and second processors for employing a method for synchronizing the processors, the method comprising:
in each of the second processors, waiting at a second synchronization point after reaching a first synchronization point;
in the last of the second processors to reach the first synchronization point, sending the first processor a signal;
in the first processor, waiting at the first synchronization point until receiving the signal, and initiating a launch of the second processors after receiving the signal by launching one of the second processors, wherein the one of the second processors launches another one of the second processors in response to being launched; and
in each of the second processors, continuing execution from the second synchronization point in response to being launched.
13. A multiprocessor system, comprising:
a first processor configured to wait at a first synchronization point until receiving a signal; and
second processors, wherein each of the second processors is configured to wait at a second synchronization point after reaching the first synchronization point, and to send the signal to the first processor if last of the second processors to reach the first synchronization point;
wherein the first processor is configured to initiate a launch of the second processors after receiving the signal by launching one of the second processors;
wherein the one of the second processors is configured to launch another one of the second processors in response to being launched; and
wherein each of the second processors is configured to continue execution from the second synchronization point in response to being launched.
14. The multiprocessor system of claim 13 , wherein:
the first processor is configured to initiate the launch of the second processors by initiating a plurality of interrupts, and to launch one of the second processors by sending a first of the interrupts to one of the second processors; and
the one of the second processors is configured to launch another one of the second processors by sending at least one of the remaining interrupts in response to receiving the first interrupt.
15. The multiprocessor system of claim 14 , wherein each of the second processors is configured to send another one of the interrupts to another one of the second processors after receiving one of the interrupts if indicated in a memory location assigned to the second processor sending the other one of the interrupts.
16. The multiprocessor system of claim 15 , wherein each of the second processors is configured to send another one of the interrupts by writing to a memory-mapped address indicated in the memory location assigned to the second processor sending the other one of the interrupts.
17. The multiprocessor system of claim 13 , wherein each of the second processors is configured to update at least one of a plurality of memory locations after reaching the first synchronization point, wherein the memory locations indicate whether all of the second processors have reached the first synchronization point.
18. The multiprocessor system of claim 17 , wherein each of the second processors is configured to update at least one of the plurality of memory locations by performing at least one atomic memory update operation on the at least one of the memory locations.
19. The multiprocessor system of claim 18 , wherein each of the second processors is configured to send the first processor the signal by writing to one of the plurality of memory locations.
20. The multiprocessor system of claim 13 , wherein the first processor is configured to continue execution from the first synchronization point to the second synchronization point in response to receiving the signal, and to initiate the plurality of interrupts after reaching the second synchronization point.
21. The multiprocessor system of claim 20 , wherein:
each of the processors is configured to generate service requests for the first processor during execution before the first synchronization point and after the second synchronization point; and
the first processor is configured to continue execution from the first synchronization point to the second synchronization point in response to receiving the signal by processing the service requests.
22. The multiprocessor system of claim 20 , wherein:
the first synchronization point is associated with a join operation of a multi-threaded program; and
the second synchronization point is associated with a fork operation of the multi-threaded program.
23. The multiprocessor system of claim 13 , wherein the first synchronization point comprises the second synchronization point.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/838,630 US20090049323A1 (en) | 2007-08-14 | 2007-08-14 | Synchronization of processors in a multiprocessor system |
PCT/US2008/008625 WO2009023076A1 (en) | 2007-08-14 | 2008-07-14 | Synchronization of processors in a multiprocessor system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/838,630 US20090049323A1 (en) | 2007-08-14 | 2007-08-14 | Synchronization of processors in a multiprocessor system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090049323A1 | 2009-02-19 |
Family
ID=40350959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/838,630 Abandoned US20090049323A1 (en) | 2007-08-14 | 2007-08-14 | Synchronization of processors in a multiprocessor system |
Country Status (2)
Country | Link |
---|---|
US (1) | US20090049323A1 (en) |
WO (1) | WO2009023076A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070106918A1 (en) * | 2005-11-07 | 2007-05-10 | Seiko Epson Corporation | Multi processor system and interrupt signal sending method therefor |
US20090327554A1 (en) * | 2008-06-25 | 2009-12-31 | Dell Products L.P. | Synchronizing processors when entering system management mode |
US20110047556A1 (en) * | 2008-01-17 | 2011-02-24 | Kosuke Nishihara | Synchronization control method and information processing device |
US20110274192A1 (en) * | 2009-01-23 | 2011-11-10 | Alcatel Lucent | Synchronization method and device for real-time distributed system |
US20130145378A1 (en) * | 2011-12-01 | 2013-06-06 | International Business Machines Corporation | Determining Collective Barrier Operation Skew In A Parallel Computer |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2827247A1 (en) | 2013-07-16 | 2015-01-21 | Continental Automotive GmbH | Method for synchronising changes in state in systems embedded in multi-core computers |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5276823A (en) * | 1988-12-09 | 1994-01-04 | Tandem Computers Incorporated | Fault-tolerant computer system with redesignation of peripheral processor |
US5978838A (en) * | 1996-08-19 | 1999-11-02 | Samsung Electronics Co., Ltd. | Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor |
US5978830A (en) * | 1997-02-24 | 1999-11-02 | Hitachi, Ltd. | Multiple parallel-job scheduling method and apparatus |
US6032173A (en) * | 1992-06-10 | 2000-02-29 | Siemens Aktiengesellschaft | Synchronization of a computer system having a plurality of processors |
US6223228B1 (en) * | 1998-09-17 | 2001-04-24 | Bull Hn Information Systems Inc. | Apparatus for synchronizing multiple processors in a data processing system |
US6249880B1 (en) * | 1998-09-17 | 2001-06-19 | Bull Hn Information Systems Inc. | Method and apparatus for exhaustively testing interactions among multiple processors |
US20050193148A1 (en) * | 2004-02-27 | 2005-09-01 | Chen Chenghung J. | Processor synchronization in a multi-processor computer system |
US20070240158A1 (en) * | 2006-04-06 | 2007-10-11 | Shailender Chaudhry | Method and apparatus for synchronizing threads on a processor that supports transactional memory |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2708172B2 (en) * | 1988-03-24 | 1998-02-04 | 株式会社東芝 | Parallel processing method |
-
2007
- 2007-08-14 US US11/838,630 patent/US20090049323A1/en not_active Abandoned
-
2008
- 2008-07-14 WO PCT/US2008/008625 patent/WO2009023076A1/en active Application Filing
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070106918A1 (en) * | 2005-11-07 | 2007-05-10 | Seiko Epson Corporation | Multi processor system and interrupt signal sending method therefor |
US7853814B2 (en) * | 2005-11-07 | 2010-12-14 | Seiko Epson Corporation | Method and system for executing a power-cutoff-specific process within a specific processor of a multiprocessor system |
US20110047556A1 (en) * | 2008-01-17 | 2011-02-24 | Kosuke Nishihara | Synchronization control method and information processing device |
US8555291B2 (en) * | 2008-01-17 | 2013-10-08 | Nec Corporation | Synchronization control method and information processing device |
US20090327554A1 (en) * | 2008-06-25 | 2009-12-31 | Dell Products L.P. | Synchronizing processors when entering system management mode |
US7991933B2 (en) * | 2008-06-25 | 2011-08-02 | Dell Products L.P. | Synchronizing processors when entering system management mode |
US8260995B2 (en) | 2008-06-25 | 2012-09-04 | Dell Products L.P. | Processor interrupt command response system |
US20110274192A1 (en) * | 2009-01-23 | 2011-11-10 | Alcatel Lucent | Synchronization method and device for real-time distributed system |
US8495408B2 (en) * | 2009-01-23 | 2013-07-23 | Alcatel Lucent | Synchronization method and device for real-time distributed system wherein each module sets an indicator to signal whether it is currently able to operate synchronously with other modules and resets the indicator at a unified synchronization end time |
US20130145378A1 (en) * | 2011-12-01 | 2013-06-06 | International Business Machines Corporation | Determining Collective Barrier Operation Skew In A Parallel Computer |
US9195517B2 (en) | 2011-12-01 | 2015-11-24 | International Business Machines Corporation | Determining collective barrier operation skew in a parallel computer |
US9195516B2 (en) * | 2011-12-01 | 2015-11-24 | International Business Machines Corporation | Determining collective barrier operation skew in a parallel computer |
Also Published As
Publication number | Publication date |
---|---|
WO2009023076A1 (en) | 2009-02-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:IMARK, ROBERT R.;GASSER, RAYMOND A.;REEL/FRAME:019779/0336 Effective date: 20070808 |
|
STCB | Information on status: application discontinuation |
Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION |