CN101833440A - Speculative multithreading memory data synchronous execution method under support of compiler and device thereof
- Publication number: CN101833440A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a compiler-supported method, and a corresponding device, for synchronized execution of speculative multithreading memory data. Selected memory read/write instructions are synchronized while the program runs, which reduces the number of read/write data dependence violations and improves the overall speedup of multithreaded parallel execution. The method comprises the following steps: after a candidate read/write instruction pair is obtained, a stall instruction is inserted before the read instruction and the read instruction is replaced with the synchronized read instruction synload; one or more forward instructions are added after the write instruction; and a synset instruction is added after the thread-spawning instruction spawn of the thread that contains the write instruction. Finally, the executable file produced by compiling and linking is loaded and run on a simulator, where the multithreaded program executes speculatively in synchronized mode, yielding the program result and a higher speedup.
Description
Technical field
The invention belongs to the computer field and relates to thread-level parallelism and speculative multithreading technology, and in particular to a compiler-supported method and device for synchronized execution of speculative multithreading memory data, in which accesses to memory data are synchronized.
Background technology
Thread-level parallelism has become the prevailing trend in parallel processing and parallel computing. On the one hand, multi-core processors are now in widespread use. On the other hand, chip multiprocessors, with their higher inter-core bandwidth and lower communication latency, provide a large platform for thread-level parallelism techniques. However, the utilization of processor cores is often limited by various factors, and some parallel programming methods place an extra burden on users and programmers. Speculative multithreading (SpMT) was proposed against this background.
In speculative multithreading, the compiler adopts an aggressive strategy: it automatically partitions a sequential program written in a high-level language into a multithreaded program and allows certain data dependences to exist between threads. While the threads execute in parallel, the hardware dynamically detects inter-thread data dependence violations, buffers speculative data and maintains data consistency; mechanisms such as thread commit, cancellation and verification guarantee the correctness of parallel execution.
The degree of inter-thread data dependence is one of the key factors affecting the efficiency of speculative multithreaded execution. Inter-thread data dependences mainly comprise register data dependences and memory data dependences, and both kinds can give rise to read-after-write (RAW) data dependence violations at run time. A read-after-write memory data dependence violation at run time directly causes the thread to restart, and all successor threads of the thread that read the wrong data are cancelled. This wastes processor resources and limits the speedup of parallel program execution.
There are two main approaches to resolving data dependences: value prediction and data synchronization. Register data dependences can be handled with traditional compile-time data-flow analyses, such as reaching-definition analysis and live-variable analysis, combined with value prediction. For memory data dependences, however, static data-flow analysis often cannot produce accurate dependence results. As a result, value prediction and data synchronization techniques cannot effectively resolve inter-thread memory data dependences.
Summary of the invention
The problem addressed by the invention is to provide a compiler-supported method and device for synchronized execution of speculative multithreading memory data that synchronize selected memory read/write instructions while the program runs, reduce the number of read-after-write (RAW) data dependence violations, and improve the speedup of parallel execution of multithreaded programs.
The invention is realized through the following technical solution:
A compiler-supported method for synchronized execution of speculative multithreading memory data comprises the following steps:
1) The program is partitioned by the hardware thread partitioner into a multithreaded assembly program, which is then supplied as input to the hardware program profiler and run on it;
2) While running the program, the hardware program profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
3) The synchronization compiler reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized across threads. Then, after data-flow analysis, it inserts a stall instruction before the read instruction and replaces the read instruction with the synchronized read instruction synload; it adds one or more forward instructions after the write instruction; and it adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction. The program is then compiled and linked into an executable file, which is loaded and run on the simulator;
When the synchronized child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction, and the child thread has received the synchronization message, sent by the parent thread, that contains the memory address and the written data. Then, according to the received synchronization information and whether this synchronized child thread has itself written to the same memory address, use_flag in the Sbuffer is set or reset, and this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
The operations performed by the synchronized child thread after it receives the synchronization message are carried out by the synload instruction; the operations by which the synchronized parent thread sends the synchronization message after writing the data are carried out by the forward instruction;
4) The multithreaded program executes speculatively in synchronized mode, and the simulator outputs the result of the program run.
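The instrumentation in step 3) can be illustrated with a minimal sketch. It assumes straight-line code represented as lists of assembly-like strings; the instruction syntax, the function name instrument_pair and the operand-free form of stall, forward and synset are illustrative assumptions, not the patent's actual encoding.

```python
from typing import List, Tuple


def instrument_pair(parent: List[str], child: List[str],
                    spawn_idx: int, store_idx: int,
                    load_idx: int) -> Tuple[List[str], List[str]]:
    """Return instrumented copies of the parent and child instruction lists.

    Hypothetical model: each instruction is a string such as "load r2, 0(q)".
    """
    new_parent = list(parent)
    # Insert at the larger index first so the smaller index stays valid:
    # "synset" goes right after spawn, "forward" right after the store.
    for idx, op in sorted([(spawn_idx, "synset"), (store_idx, "forward")],
                          reverse=True):
        new_parent.insert(idx + 1, op)

    new_child = list(child)
    # Replace the opcode of the read instruction with synload, keep operands.
    _, operands = new_child[load_idx].split(maxsplit=1)
    new_child[load_idx] = f"synload {operands}"
    # The stall instruction goes immediately before the synchronized read.
    new_child.insert(load_idx, "stall")
    return new_parent, new_child
```

A real pass would work on the control-flow graph rather than on a flat list, since a forward instruction may be needed on more than one path.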
The statistics on memory data dependence violations comprise: the number of times each read/write instruction is executed and its PC value; the number of read-after-write memory data dependence violations incurred by the read/write instructions; and the time interval between the executions of the read and write instructions involved in a read-after-write memory data dependence violation. The statistics are written to a file and fed back to the synchronization compiler.
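As a rough illustration of such statistics, the sketch below keeps one record per read/write instruction pair, keyed by the two PC values. Keying by pair, and the names PairStats, ViolationProfile and violation_frequency, are assumptions made for this example; the patent does not specify the record layout or the file format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PairStats:
    executions: int = 0          # times the pair was executed
    violations: int = 0          # read-after-write violations observed
    intervals: List[int] = field(default_factory=list)  # read/write gaps


class ViolationProfile:
    """Hypothetical per-pair profile keyed by (load_pc, store_pc)."""

    def __init__(self) -> None:
        self.pairs: Dict[Tuple[int, int], PairStats] = defaultdict(PairStats)

    def record(self, load_pc: int, store_pc: int, violated: bool,
               interval: Optional[int] = None) -> None:
        stats = self.pairs[(load_pc, store_pc)]
        stats.executions += 1
        if violated:
            stats.violations += 1
            if interval is not None:
                stats.intervals.append(interval)

    def violation_frequency(self, load_pc: int, store_pc: int) -> float:
        # .get() avoids creating an empty record for an unknown pair.
        stats = self.pairs.get((load_pc, store_pc))
        if stats is None or stats.executions == 0:
            return 0.0
        return stats.violations / stats.executions
```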
Computing the synchronization cost means estimating the potential speedup improvement that synchronizing a read/write instruction pair across threads would bring, so as to obtain the candidate read/write instruction pairs with high speedup-improvement potential.
The speedup-improvement potential is judged from the frequency with which the memory read/write instruction pair incurs a data dependence, the code distance from the thread-body entry to the read instruction, and the code distance between the read instruction and the write instruction.
The data-flow analysis is performed on the program's control-flow graph to find the insertion points for the stall and forward instructions: the stall instruction is inserted before the read instruction to be synchronized, and a forward instruction is inserted on every path between the write instruction to be synchronized and the thread exit.
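The requirement that every path from the write instruction to the thread exit carries a forward instruction can be checked with a simple graph traversal. The sketch below is only an illustration: the control-flow graph is modeled as a dictionary from node name to successor list, and the function name every_path_has_forward is hypothetical.

```python
from typing import Dict, List, Set


def every_path_has_forward(cfg: Dict[str, List[str]], store: str,
                           exit_node: str, forwards: Set[str]) -> bool:
    """True if every path from store to exit_node meets a forward node."""
    stack: List[str] = [store]
    seen: Set[str] = set()
    while stack:
        node = stack.pop()
        if node in forwards or node in seen:
            continue  # this path is covered, or the node was explored already
        if node == exit_node:
            return False  # reached the exit without passing a forward
        seen.add(node)
        stack.extend(cfg.get(node, []))
    return True
```

A placement pass could call this check after each tentative insertion and keep adding forward nodes until it returns True.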
The synchronized parent thread sends a synchronization message through the following steps:
(1) After the synchronized parent thread spawns the synchronized child thread, the parent thread executes the synset instruction, which sends a message to the child thread that sets the sync_flag bit of the child thread's Sbuffer;
(2) It executes store *p, a conventional memory write;
(3) It executes save_address(p), saving the memory address p of the written data in the forward address buffer;
(4) It executes receiver_id = get_immed_successor(), obtaining the number of the processor that runs the synchronized child thread;
(5) It executes mesg.value = *p, writing the data stored by the memory write into the value field of the synchronization message mesg;
(6) It executes mesg.addr = p, writing the memory address p of the written data into the addr field of the synchronization message;
(7) It executes send(mesg, receiver_id), sending the constructed synchronization message to the synchronized child thread.
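Steps (2) to (7) above can be sketched as follows. This is a software model under stated assumptions: memory is a dictionary, the on-chip network is a dictionary of per-processor mailboxes, and the names SyncMessage, store_and_forward and mailboxes are invented for the example. Step (1), the synset message, is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SyncMessage:
    addr: int    # memory address p of the written data
    value: int   # data written to that address


def store_and_forward(memory: Dict[int, int], p: int, value: int,
                      forward_addresses: List[int], successor_id: int,
                      mailboxes: Dict[int, List[SyncMessage]]) -> SyncMessage:
    memory[p] = value                              # (2) store *p
    forward_addresses.append(p)                    # (3) save_address(p)
    receiver_id = successor_id                     # (4) get_immed_successor()
    mesg = SyncMessage(addr=p, value=memory[p])    # (5) and (6) fill mesg
    mailboxes.setdefault(receiver_id, []).append(mesg)  # (7) send
    return mesg
```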
The synchronized child thread receives a synchronization message through the following steps:
(1) It executes stall(), checking whether the ready_flag bit of the Sbuffer is set; if not, the synchronized child thread remains blocked; if so, it leaves the blocked state and carries out the subsequent operations;
(2) It executes receive(mesg); once the synchronization message has been received, the ready_flag bit of the Sbuffer is set;
(3) It executes message_addr = mesg.addr, obtaining the memory address from the synchronization message and storing it in the message_addr field of the Sbuffer;
(4) It executes message_value = mesg.value, obtaining the memory data value from the synchronization message and storing it in the message_value field of the Sbuffer;
(5) It executes verify(q, message_addr), checking whether the memory address in the received synchronization message is the same as the memory address q that the read operation is about to read; if so, the use_flag bit of the Sbuffer is set;
(6) It executes loaded_value = load *q, a conventional memory read, and places the data read in loaded_value; while performing the read, it checks whether the current thread has written data to memory address q, and if so, resets the use_flag bit of the Sbuffer;
(7) It executes result = choose(loaded_value, message_value): if the use_flag bit of the Sbuffer is set, the synchronized child thread takes the memory data value from the synchronization message as the final result of the synchronized read; if use_flag is reset, it takes loaded_value as the final result of the synchronized read;
(8) It executes quit(), resetting the ready_flag bit and the use_flag bit of the Sbuffer.
A device for carrying out the compiler-supported method for synchronized execution of speculative multithreading memory data comprises the following modules:
A thread partitioner, which partitions the program into a multithreaded assembly program that is supplied as input to the hardware program profiler and run on it;
A hardware program profiler, which collects statistics on memory data dependence violations while running the program and feeds the statistics back to the synchronization compiler;
A synchronization compiler, which reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized across threads; then, after data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronized read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction;
A compile-and-link unit, which compiles and links the assembly program into a binary executable file;
A simulator, on which the executable file is loaded and run, and which includes extended support for the memory read and memory write instructions of the synchronization mechanism;
Under the synchronization mechanism, memory data is obtained as follows: when the synchronized child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction, and the child thread has received the synchronization message, sent by the parent thread, that contains the memory address and the written data. Then, according to the received synchronization information and whether this synchronized child thread has itself written to the same memory address, use_flag in the Sbuffer is set or reset, and this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
Under the synchronization mechanism, data is written and sent as follows: when the synchronized parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the forward address buffer, and the memory address and the written data are sent to the synchronized child thread in the form of a message;
The multithreaded program executes speculatively in synchronized mode, and the simulator outputs the result of the program run.
The forward address buffer is a data buffer that records the addresses of all data sent by forward instructions.
The Sbuffer is a register file used to receive synchronization messages and to control synchronization. It consists of five fields: three control flag bits, sync_flag, ready_flag and use_flag, and two message fields, message_addr and message_value. The thread processing unit resets all flag bits of the Sbuffer at initialization;
sync_flag indicates whether execution follows the synchronized method. When the flag is set, execution follows the synchronized method; when it is reset, execution is conventional: the stall instruction is skipped, the synload instruction behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether the synchronization message is ready; when the flag is set, the synchronization message is ready;
use_flag indicates whether the synchronized child thread uses the data in the message. When the flag is set, the child thread uses the data in the message; when it is reset, the child thread obtains the memory data through a conventional read;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
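The five Sbuffer fields and the child-side receive steps can be modeled together in a short sketch. Several modeling choices here are assumptions: the Sbuffer is a plain dataclass, receive is called by the (simulated) hardware when a message arrives, memory is the child's own view as a dictionary, local_writes is the set of addresses the child has written, and a blocked stall is represented by returning None.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Sbuffer:
    sync_flag: bool = False      # all flags are reset at initialization
    ready_flag: bool = False
    use_flag: bool = False
    message_addr: int = 0
    message_value: int = 0


def receive(sbuf: Sbuffer, addr: int, value: int) -> None:
    """Steps (2)-(4): latch the message and mark it ready."""
    sbuf.ready_flag = True
    sbuf.message_addr = addr
    sbuf.message_value = value


def synload(sbuf: Sbuffer, memory: Dict[int, int], q: int,
            local_writes: Set[int]) -> Optional[int]:
    """stall followed by synload; returns None while the thread is blocked."""
    if not sbuf.sync_flag:
        return memory[q]             # conventional load, stall is skipped
    if not sbuf.ready_flag:
        return None                  # (1) stall: message not ready yet
    if q == sbuf.message_addr:       # (5) verify(q, message_addr)
        sbuf.use_flag = True
    loaded_value = memory[q]         # (6) conventional read
    if q in local_writes:            # the child wrote q itself
        sbuf.use_flag = False
    # (7) choose between the forwarded value and the loaded value
    result = sbuf.message_value if sbuf.use_flag else loaded_value
    sbuf.ready_flag = False          # (8) quit: reset both flags
    sbuf.use_flag = False
    return result
```

In the sketch the forwarded value wins only when the addresses match and the child has not overwritten the location itself, which mirrors the set/reset rules for use_flag.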
Compared with the prior art, the invention has the following beneficial technical effects:
The compiler-supported method and device for synchronized execution of speculative multithreading memory data provided by the invention synchronize selected memory read/write instructions at run time, reduce the number of read-after-write data dependence violations, and improve the speedup of parallel execution of multithreaded programs.
In the invention, the candidate read/write instruction pairs to be synchronized are obtained with the help of the hardware program profiler, which collects statistics on memory data dependence violations while running the program and feeds them back to the synchronization compiler; the synchronization compiler then obtains the candidate pairs by computing the synchronization cost. When the hardware program profiler runs the program, its main purpose is to profile the program's dynamic memory access behaviour; the program runs in a lightweight mode and does not incur much additional overhead. The candidate pairs obtained are those with high speedup-improvement potential: for memory read/write instruction pairs that frequently incur data dependences, synchronization usually brings a large speedup gain, whereas for pairs that rarely incur data dependences, adding synchronization instructions tends to cause unnecessary or excessive synchronization at run time. In addition, for a given thread, the larger the code distance from the thread-body entry to the candidate read instruction and the smaller the code distance between the candidate read instruction and the candidate write instruction, the more pronounced the benefit of synchronization.
To reduce the number of read-after-write data dependence violations, the invention inserts a stall instruction before the read instruction. After executing the stall instruction, the synchronized child thread blocks and waits for the synchronized parent thread to produce the memory data it needs. One or more forward instructions are added after the write instruction. The operations performed by the synchronized child thread after it receives the synchronization message are carried out by the synload instruction, and the operations by which the synchronized parent thread sends the synchronization message after writing the data are carried out by the forward instruction;
When the synchronized child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction, and the child thread has received the message, sent by the parent thread, that contains the memory address and the written data. Then, according to the received synchronization information and whether this synchronized child thread has itself written to the same memory address, use_flag in the Sbuffer is set or reset, and this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
With the synchronized execution method of the invention, a synchronized reading thread obtains the correct memory data after synchronization and avoids being cancelled because of a read-after-write data dependence violation. The work that the processor core completed before the stall instruction is preserved, which reduces the processor resources wasted by thread restarts and cancellations.
Besides the statistics support that the hardware program profiler provides for obtaining candidate read/write instruction pairs, the functional modules newly added to the simulator to support synchronization between the synchronized child thread and the synchronized parent thread mainly comprise a forward address buffer and a local register file, the Sbuffer.
The forward address buffer is a very small data buffer that records the addresses of all data sent by this thread's forward instructions. The local register file Sbuffer is a register file used to receive synchronization messages and to control synchronization.
The threshold on the frequency with which a memory read/write instruction pair incurs a data dependence can be obtained empirically from extensive experiments. Specifically, seven test programs were selected from the Olden benchmark suite and partitioned with the Prophet thread partitioning method; in tests on a 4-core speculative execution simulator, an average speedup improvement of 34% was obtained. With the synchronized method, the minimum data dependence frequency was set to 15%, and the minimum ratio of the code distance from the thread-body entry to the read instruction to the code distance from the write instruction to the thread exit was set to 1. With memory read/write instructions selected for synchronization under these thresholds, the speedup improved by 7% on average compared with unsynchronized speculative execution. The synchronized method of the invention therefore yields a noticeable improvement in the speedup of speculative execution.
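The two thresholds quoted above (a minimum dependence frequency of 15% and a minimum code-distance ratio of 1) suggest a simple filter over profiled pairs. The sketch below is one possible reading, not the patent's cost model: the Candidate record, its field names and the cross-multiplied ratio test (used to avoid dividing by zero when the write is the last instruction) are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    load_pc: int
    store_pc: int
    dependence_frequency: float  # fraction of executions with a dependence
    entry_to_load: int           # code distance: thread entry -> read
    store_to_exit: int           # code distance: write -> thread exit


def select_pairs(candidates: Iterable[Candidate],
                 min_frequency: float = 0.15,
                 min_ratio: float = 1.0) -> List[Candidate]:
    """Keep pairs that meet both thresholds."""
    selected = []
    for c in candidates:
        if c.dependence_frequency < min_frequency:
            continue  # dependence too rare: synchronization would be wasted
        # entry_to_load / store_to_exit >= min_ratio, without dividing
        if c.entry_to_load >= min_ratio * c.store_to_exit:
            selected.append(c)
    return selected
```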
Description of drawings
Fig. 1 is a flow diagram of the compiler-supported method for synchronized execution of speculative multithreading memory data.
Fig. 2 is a schematic diagram of the synchronization of a read/write instruction pair between the synchronized parent thread and the synchronized child thread.
Fig. 3 is a schematic diagram of the multi-processing-unit (multi-core) architecture on which multithreaded memory data execution takes place.
Fig. 4 is a schematic diagram of the Sbuffer design.
Fig. 5 is a diagram of the operations performed in a synchronized read by the synchronized child thread.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings. The description explains the invention and does not limit it.
Referring to Fig. 1, the compiler-supported speculative multithreading system consists mainly of two parts, a compiler and a simulator. The hardware program profiler may be a module obtained by extending the simulator's statistics module for program memory read/write instructions. While the program runs, the hardware program profiler records the history of the reads and writes performed by the program and the memory data dependence information; using the PC values of the instructions, the compile-and-link unit determines the positions of the read and write instructions. When the hardware program profiler runs the program, its main purpose is to profile the program's dynamic memory access behaviour; the program runs in a lightweight mode and does not incur much additional overhead.
The compiler-supported method for synchronized execution of speculative multithreading memory data comprises the following steps:
1) The program is partitioned by the hardware thread partitioner into a multithreaded assembly program, which is then supplied as input to the hardware program profiler and run on it;
2) While running the program, the hardware program profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
3) The synchronization compiler reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized across threads. Then, after data-flow analysis, it inserts a stall instruction before the read instruction and replaces the read instruction with the synchronized read instruction synload; it adds one or more forward instructions after the write instruction; and it adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction. The program is then compiled and linked into an executable file, which is loaded and run on the simulator;
When the synchronized child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction, and the child thread has received the synchronization message, sent by the parent thread, that contains the memory address and the written data. Then, according to the received synchronization information and whether this synchronized child thread has itself written to the same memory address, use_flag in the Sbuffer is set or reset, and this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
The operations performed by the synchronized child thread after it receives the synchronization message are carried out by the synload instruction; the operations by which the synchronized parent thread sends the synchronization message after writing the data are carried out by the forward instruction;
4) The multithreaded program executes speculatively in synchronized mode, and the simulator outputs the result of the program run.
The synchronization cost computed by the synchronization compiler in step 3) is an estimate of the potential speedup improvement that synchronizing a read/write instruction pair across threads would bring, and is used to obtain the candidate read/write instruction pairs with high speedup-improvement potential. Read/write instruction pairs with high speedup-improvement potential are characterized as follows:
For memory read/write instruction pairs that frequently incur data dependences, synchronization usually brings a large speedup gain; for pairs that rarely incur data dependences, adding synchronization instructions tends to cause unnecessary or excessive synchronization at run time;
Besides the frequency of data dependences, code distances such as that from the thread entry to the write instruction are also taken into account when estimating the synchronization cost. The larger the code distance from the thread-body entry to the candidate read instruction and the smaller the code distance between the candidate read instruction and the candidate write instruction, the more pronounced the benefit of synchronization. Taking these factors together, the read/write instruction pairs with the greatest speedup-improvement potential are selected.
The frequency with which a memory read/write instruction pair incurs a data dependence, the code distance from the thread-body entry to the read instruction, and the code distance from the write instruction to the thread exit all depend on the particular program being executed. The method determines suitable thresholds from experimental data and uses them as the criteria for synchronizing memory read/write instructions.
The data-flow analysis carried out by the synchronization compiler is performed on the program's control-flow graph to find the insertion points for the stall and forward instructions: the stall instruction is inserted before the read instruction to be synchronized, and a forward instruction is inserted on every path between the write instruction to be synchronized and the thread exit.
Based on the above method, an embodiment of the device for carrying out the compiler-supported method for synchronized execution of speculative multithreading memory data comprises the following modules:
A thread partitioner, which partitions the program into a multithreaded assembly program that is supplied as input to the hardware program profiler and run on it;
A hardware program profiler, which collects statistics on memory data dependence violations while running the program and feeds the statistics back to the synchronization compiler;
A synchronization compiler, which reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized across threads; then, after data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronized read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction;
A compile-and-link unit, which compiles and links the assembly program into a binary executable file;
A simulator, on which the executable file is loaded and run, and which includes extended support for the memory read and memory write instructions of the synchronization mechanism;
Under the synchronization mechanism, memory data is obtained as follows: when the synchronized child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction, and the child thread has received the synchronization message, sent by the parent thread, that contains the memory address and the written data. Then, according to the received synchronization information and whether this synchronized child thread has itself written to the same memory address, use_flag in the Sbuffer is set or reset, and this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
Under the synchronization mechanism, data is written and sent as follows: when the synchronized parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the forward address buffer, and the memory address and the written data are sent to the synchronized child thread in the form of a synchronization message;
The multithreaded program executes speculatively in synchronized mode, and the simulator outputs the result of the program run.
The forward address buffer is a data buffer that records the addresses of all data sent by forward instructions.
Referring to Fig. 3, multiplied unit (multinuclear) architecture frame shown in this embodiment by the on-chip bus connection, simulator is realized according to this structural framing, comprise that the hardware program parser relies on the statistics of situation in violation of rules and regulations for internal storage data, and in order to support the synchronous operation of sub-thread and father's thread, the functional module of expansion: send address buffer (FAB) and local register file Sbuffer; In addition, thread controller, tentative data buffer memory and thread master control are in order to support multithreading to infer the module of expansion, are equal to the speculative multithreading chip multi-core processor of routine or quite.Connect a plurality of processing units by on-chip bus, multithreading is inferred execution in a synchronous manner at a plurality of processing units.
Embodiment shown in Figure 3, the hardware program parser of processing unit specifically are made of a nuclear internal program analysis controller and analysis statistics buffer memory.
Sending address buffer is a very little data buffer, is used for the address of all data that sent by the forward instruction of minute book thread.The main effect that sends address buffer is: the memory address of noting the synchrodata of transmission, the synchronous write thread is in operation subsequently, if carry out write operation (store operation), then need with the address of this write operation with send that address stored compares one by one in the address buffer, see to send in the address buffer whether stored identical memory address.If inquire identical address, show that then sending to the data of reading thread synchronously still is not up-to-date data.In this case, the synchronous write thread can send message to reading thread synchronously, restarts and reads thread synchronously.Write send address buffer after, the synchronous write thread can send to the data that write and the memory address that writes data respectively with this and read thread synchronously in the mode of message.Through hardware program parser statistics, a thread needs data in synchronization to be no more than 8 at most, and it is very little therefore to send address buffer, can not bring too much overhead to system when operation.
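The address check described above can be sketched as follows. This is a minimal software model, not the hardware design: the class name `SendAddressBuffer`, the helper `on_store`, the callback `restart_reader` and the overflow behaviour are all hypothetical. Only the capacity of 8 entries and the compare-on-store rule come from the text.

```python
class SendAddressBuffer:
    """Hypothetical software model of the send address buffer (FAB)."""

    def __init__(self, capacity=8):
        # The text states that a thread synchronizes at most 8 data items.
        self.capacity = capacity
        self.addresses = []

    def save_address(self, addr):
        """Record the address of data that a forward instruction has sent."""
        if addr in self.addresses:
            return
        if len(self.addresses) >= self.capacity:
            # Assumption: the source does not say what happens on overflow.
            raise OverflowError("send address buffer is full")
        self.addresses.append(addr)

    def conflicts_with_store(self, addr):
        """Compare a later store address with every recorded address."""
        return addr in self.addresses


def on_store(fab, addr, restart_reader):
    """Run on each later store of the write thread.

    If the address was already forwarded, the data held by the read
    thread is stale, so the read thread is asked to restart.
    """
    if fab.conflicts_with_store(addr):
        restart_reader(addr)
        return True
    return False
```

A store to an address that was never forwarded leaves the read thread alone, and only a store to a forwarded address triggers the restart message.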
The Sbuffer is a register file used to receive synchronization messages and to control the synchronization operation. It is private to each processor core. Referring to the Sbuffer design shown in Fig. 4, the Sbuffer consists of five fields: three control flag bits, sync_flag, ready_flag and use_flag, and two message fields, message_addr and message_value. The thread processing unit resets all flag bits of the Sbuffer at initialization.
sync_flag indicates whether execution follows the synchronization method. When this flag bit is set, execution follows the synchronization method. When it is reset, execution follows the conventional method: the stall instruction is skipped, the synload instruction behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether the synchronization message is ready. When this flag bit is set, the synchronization message is ready;
use_flag indicates whether the synchronous child thread uses the data in the message. When this flag bit is set, the child thread uses the data in the message. When it is reset, the child thread obtains the memory data through a conventional read operation;
message_addr stores the addr field of the message;
message_value stores the value field of the message.
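The five fields can be pictured with a small data structure. The sketch below is illustrative only: the field names follow the text, while the Python types, the `reset_flags` method and the `stall_blocks` helper are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class Sbuffer:
    """Per-core register file with three flag bits and two message fields."""
    sync_flag: bool = False    # set: execute with synchronization
    ready_flag: bool = False   # set: a synchronization message has arrived
    use_flag: bool = False     # set: use the value carried by the message
    message_addr: int = 0      # addr field of the received message
    message_value: int = 0     # value field of the received message

    def reset_flags(self):
        """Reset all flag bits, as the processing unit does at initialization."""
        self.sync_flag = False
        self.ready_flag = False
        self.use_flag = False

    def stall_blocks(self):
        """Hypothetical helper: would a stall instruction block right now?

        stall is skipped while sync_flag is reset; otherwise the thread
        waits until ready_flag is set.
        """
        return self.sync_flag and not self.ready_flag
```

With all flags reset the stall instruction is a no-op, which matches the conventional execution path described for a reset sync_flag.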
A concrete implementation of the synchronization operations performed by a candidate read/write instruction pair in the synchronous parent thread and the synchronous child thread is given below. The synset instruction is a newly added auxiliary synchronization instruction. Its purpose is to ensure that both the synchronous parent thread and the synchronous child thread exist when data synchronization is performed. After a thread executes a spawn instruction to start a new thread, it executes the synset instruction to set the sync_flag bit of the child thread's Sbuffer. Only after the sync_flag bit of a thread's Sbuffer has been set does executing a stall instruction put that thread into the waiting state. Otherwise the stall instruction is skipped and performs no operation.
The steps by which the synchronous parent thread sends the synchronization message are:
(1) after the synchronous parent thread spawns the synchronous child thread, the parent thread executes the synset instruction, which sends a message to the child thread to set the sync_flag bit of the child thread's Sbuffer;
(2) execute the store *p operation, a conventional memory write;
(3) execute the save_address(p) operation to save the memory address p of the written data in the send address buffer;
(4) execute receiver_id = get_immed_successor() to obtain the processor number of the synchronous child thread;
(5) execute the mesg.value = *p operation to write the data written by the memory write into the value field of the synchronization message mesg;
(6) execute the mesg.addr = p operation to write the memory address p of the written data into the addr field of the synchronization message;
(7) execute the send(mesg, receiver_id) operation to send the constructed synchronization message to the synchronous child thread.
For the synchronous write operation, after the store instruction is executed, the hardware performs a conventional memory write. Once the memory write has completed, the synchronous write thread stores the memory address of the write in the FAB. After the store operation, the remaining operations are performed by the forward instruction, whose function is to send the value written by the memory write instruction and the corresponding memory address to the child thread. The synchronization message that the synchronous parent thread sends to the synchronous child thread has two fields, addr and value: the addr field holds the memory address and the value field holds the written data.
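Steps (2) to (7) of the sending side can be sketched in a few lines. This is a software illustration under stated assumptions: memory is modelled as a dictionary, the FAB as a set, and `get_immed_successor` and `send` are passed in as callables because the source names these operations without defining them. The function name `synchronized_store` is hypothetical.

```python
from collections import namedtuple

# Synchronization message with the two fields named in the text.
Message = namedtuple("Message", ["addr", "value"])


def synchronized_store(memory, fab, p, data, get_immed_successor, send):
    """Store followed by the work of the forward instruction."""
    memory[p] = data                          # (2) store *p
    fab.add(p)                                # (3) save_address(p)
    receiver_id = get_immed_successor()       # (4) processor of the child
    mesg = Message(addr=p, value=memory[p])   # (5), (6) fill value and addr
    send(mesg, receiver_id)                   # (7) send(mesg, receiver_id)
    return mesg
```

Step (1), the synset message that sets sync_flag in the child's Sbuffer, happens once after spawn and is left out of this sketch.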
Referring to Fig. 5, the steps by which the synchronous child thread receives the synchronization message are:
(1) execute the stall() operation: the synchronous child thread queries whether the ready_flag bit of the Sbuffer is set; if not, the child thread stays blocked; if so, the child thread leaves the blocked state and carries out the subsequent operations;
(2) execute the receive(mesg) operation: after the synchronization message is received, the ready_flag bit of the Sbuffer is set;
(3) execute message_addr = mesg.addr to obtain the memory address in the synchronization message and store it in the message_addr field of the Sbuffer;
(4) execute message_value = mesg.value to obtain the memory data value in the synchronization message and store it in the message_value field of the Sbuffer;
(5) execute the verify(q, message_addr) operation to check whether the memory address in the received synchronization message is the same as the memory address q that the read operation is about to read; if so, set the use_flag bit of the Sbuffer;
(6) execute the loaded_value = load *q operation, a conventional memory read, and place the data read in loaded_value; while performing the read, check whether the current thread has written data to memory address q, and if so, reset the use_flag bit of the Sbuffer;
(7) execute the result = choose(loaded_value, message_value) operation: if the use_flag bit of the Sbuffer is set, the synchronous child thread takes the memory data value in the synchronization message as the final result of the synchronous read; if the use_flag bit is reset, it takes the value of loaded_value as the final result of the synchronous read;
(8) execute the quit() operation to reset the ready_flag bit and the use_flag bit of the Sbuffer.
For the synchronous read operation, the synchronous read thread first executes the stall instruction and enters the waiting state (the ready_flag bit of the Sbuffer is not set, so the synchronous child thread is blocked), waiting for the synchronous parent thread to send a message. When the message is received, the ready_flag bit of the child thread's Sbuffer is set and the wait ends. After obtaining the message, the child thread reads a memory data value through a conventional load operation. During this conventional read, the processor checks whether it has previously written to the same address. If so, the use_flag bit of the Sbuffer is reset, so that stale data does not replace newer data. The verify() operation checks whether the address that the read operation is about to read equals the memory address contained in the received message. If they are equal, the use_flag bit of the Sbuffer is set. Finally, the choose() operation selects the appropriate value as the final result of the synchronous read.
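The receiving side can be sketched in the same style. The sketch follows the order of the numbered steps (2) to (8), so a write by the child thread to address q takes precedence over an address match. It is single-threaded and assumes the message has already arrived, so the blocking of step (1) is not modelled. The dictionary-based Sbuffer, the `local_writes` set and the function name `synchronized_load` are hypothetical.

```python
def synchronized_load(sbuf, mesg, memory, local_writes, q):
    """Model of the work done by synload once the message is available.

    sbuf         -- dict holding the Sbuffer flags and message fields
    mesg         -- dict with "addr" and "value" sent by the parent thread
    memory       -- dict modelling memory as seen by a conventional load
    local_writes -- addresses this thread has already written (assumption)
    q            -- address that the read instruction wants to read
    """
    sbuf["ready_flag"] = True                    # (2) receive(mesg)
    sbuf["message_addr"] = mesg["addr"]          # (3)
    sbuf["message_value"] = mesg["value"]        # (4)
    if q == sbuf["message_addr"]:                # (5) verify(q, message_addr)
        sbuf["use_flag"] = True
    loaded_value = memory.get(q, 0)              # (6) conventional load *q
    if q in local_writes:                        # own write is newer
        sbuf["use_flag"] = False
    # (7) choose(loaded_value, message_value)
    result = sbuf["message_value"] if sbuf["use_flag"] else loaded_value
    sbuf["ready_flag"] = False                   # (8) quit()
    sbuf["use_flag"] = False
    return result
```

The forwarded value is used only when the addresses match and the child thread has not itself written to q; in every other case the conventionally loaded value is returned.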
Following the synchronization operations above, seven test programs were selected from the Olden benchmark suite: mst, bh, tsp, em3d, bisort, voronoi and health. Partitioned with the Prophet thread partitioning method and executed speculatively on a 4-core simulator, they achieve an average speedup improvement of 34%.
With the minimum data dependence frequency set to 15%, and the minimum ratio of the code distance from the thread body entry to the read instruction to the code distance from the write instruction to the thread exit set to 1, read/write memory instructions are selected for synchronization. Compared with unsynchronized speculative execution, the speedup improves by 7% on average. The synchronization method of the invention therefore noticeably improves the speedup of speculative execution.
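The two thresholds quoted above amount to a simple filter over profiled read/write pairs. The sketch below is only an illustration: the record layout, the way the frequency is computed (violations divided by executions) and the inclusive comparisons are assumptions, since the text gives the threshold values but not the exact formulas.

```python
def select_candidates(pairs, min_frequency=0.15, min_distance_ratio=1.0):
    """Return the names of read/write pairs that pass both thresholds.

    Each pair is a dict with hypothetical keys:
      name           -- label of the read/write instruction pair
      violations     -- dependence violations observed by the profiler
      executions     -- times the pair was executed
      entry_to_read  -- code distance from thread body entry to the read
      write_to_exit  -- code distance from the write to the thread exit
    """
    selected = []
    for pair in pairs:
        if pair["executions"] == 0 or pair["write_to_exit"] == 0:
            continue  # assumption: skip pairs with undefined ratios
        frequency = pair["violations"] / pair["executions"]
        ratio = pair["entry_to_read"] / pair["write_to_exit"]
        if frequency >= min_frequency and ratio >= min_distance_ratio:
            selected.append(pair["name"])
    return selected
```

A pair that rarely conflicts, or whose read sits very early in the thread body relative to the write's distance from the exit, is left to ordinary speculative execution.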
Claims (10)
1. A speculative multithreading memory data synchronous execution method under compiler support, characterized in that the multithreaded program achieves memory data synchronization through the following steps:
1) after the hardware thread partitioner partitions the program into a multithreaded assembly program, that program is used as the input to the hardware program profiler and is run on it;
2) while running the program, the hardware program profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
3) the synchronization compiler reads the statistics and, by calculating the synchronization cost, obtains the candidate read/write instruction pairs for multithreaded synchronization; then, after data-flow analysis, it adds a stall instruction before the read instruction, replaces the read instruction with the synchronous read instruction synload, and adds one or more forward instructions after the write instruction; it adds a synset instruction after the thread spawn instruction of the thread containing the write instruction; the executable file generated by compiling and linking is then loaded and run on the simulator;
when the synchronous child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction; on receiving the synchronization message sent by the synchronous parent thread, which contains the memory address and the written data, the child thread sets or resets use_flag in the Sbuffer according to the received synchronization information and whether the child thread itself has previously written to the same memory address; this flag bit decides whether the data sent by the synchronous parent thread is used; the child thread then stops waiting and continues with the subsequent instructions;
the operations of the synchronous child thread after receiving the synchronization message are performed by the synload instruction, and the operations by which the synchronous parent thread sends the synchronization message after writing data are performed by the forward instruction;
4) the multithreaded program is executed speculatively in a synchronized manner, and the simulator outputs the result of the program run.
2. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 1, characterized in that the statistics on memory data dependence violations comprise: the number of times each read/write instruction is executed and its PC value, the number of read-after-write memory data dependence violations incurred by the read/write instructions, and the time interval between the executions of the read and write instructions that incur a read-after-write memory data dependence violation; the statistics are written to a file and fed back to the synchronization compiler.
3. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 1, characterized in that calculating the synchronization cost means calculating the potential speedup improvement that a read/write instruction pair would bring after multithreaded synchronization, so as to obtain the candidate read/write instruction pairs with a large potential for speedup improvement.
4. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 3, characterized in that the potential for speedup improvement is judged from the frequency with which the memory read/write instruction pair incurs a data dependence, the code distance from the thread body entry to the read instruction, and the code distance between the read instruction and the write instruction.
5. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 1, characterized in that the data-flow analysis is performed on the control flow graph of the program to find the insertion positions of the stall instruction and the forward instructions; the stall instruction is inserted before the read instruction that needs synchronization, and the forward instructions are inserted on every path between the write instruction that needs synchronization and the thread exit.
6. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 1, characterized in that the steps by which the synchronous parent thread sends the synchronization message are:
(1) after the synchronous parent thread spawns the synchronous child thread, the parent thread executes the synset instruction, which sends a message to the child thread to set the sync_flag bit of the child thread's Sbuffer;
(2) execute the store *p operation, a conventional memory write;
(3) execute the save_address(p) operation to save the memory address p of the written data in the send address buffer;
(4) execute receiver_id = get_immed_successor() to obtain the processor number of the synchronous child thread;
(5) execute the mesg.value = *p operation to write the data written by the memory write into the value field of the synchronization message mesg;
(6) execute the mesg.addr = p operation to write the memory address p of the written data into the addr field of the synchronization message;
(7) execute the send(mesg, receiver_id) operation to send the constructed synchronization message to the synchronous child thread.
7. The speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 1, characterized in that the steps by which the synchronous child thread receives the synchronization message are:
(1) execute the stall() operation: the synchronous child thread queries whether the ready_flag bit of the Sbuffer is set; if not, the child thread stays blocked; if so, the child thread leaves the blocked state and carries out the subsequent operations;
(2) execute the receive(mesg) operation: after the synchronization message is received, the ready_flag bit of the Sbuffer is set;
(3) execute message_addr = mesg.addr to obtain the memory address in the synchronization message and store it in the message_addr field of the Sbuffer;
(4) execute message_value = mesg.value to obtain the memory data value in the synchronization message and store it in the message_value field of the Sbuffer;
(5) execute the verify(q, message_addr) operation to check whether the memory address in the received synchronization message is the same as the memory address q that the read operation is about to read; if so, set the use_flag bit of the Sbuffer;
(6) execute the loaded_value = load *q operation, a conventional memory read, and place the data read in loaded_value; while performing the read, check whether the current thread has written data to memory address q, and if so, reset the use_flag bit of the Sbuffer;
(7) execute the result = choose(loaded_value, message_value) operation: if the use_flag bit of the Sbuffer is set, the synchronous child thread takes the memory data value in the synchronization message as the final result of the synchronous read; if the use_flag bit is reset, it takes the value of loaded_value as the final result of the synchronous read;
(8) execute the quit() operation to reset the ready_flag bit and the use_flag bit of the Sbuffer.
8. A device for carrying out the speculative multithreading memory data synchronous execution method under compiler support, characterized in that it comprises the following modules:
a thread partitioner, which partitions the program into a multithreaded assembly program that is used as the input to the hardware program profiler and is run on it;
a hardware program profiler, which, while running the program, collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
a synchronization compiler, which reads the statistics and, by calculating the synchronization cost, obtains the candidate read/write instruction pairs for multithreaded synchronization; then, after data-flow analysis, it adds a stall instruction before the read instruction, replaces the read instruction with the synchronous read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread spawn instruction of the thread containing the write instruction;
a compiling and linking device, by which the assembly program is compiled and linked into a binary executable file;
a simulator, on which the executable file is loaded and run, and which includes extended support for memory read instructions and memory write instructions under the synchronization mechanism;
reading memory data under the synchronization mechanism operates as follows: when the synchronous child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction; on receiving the synchronization message sent by the synchronous parent thread, which contains the memory address and the written data, the child thread sets or resets use_flag in the Sbuffer according to the received synchronization information and whether the child thread itself has previously written to the same memory address; this flag bit decides whether the data sent by the synchronous parent thread is used; the child thread then stops waiting and continues with the subsequent instructions;
writing and sending data under the synchronization mechanism operates as follows: when the synchronous parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the send address buffer, and the memory address and the written data are sent to the synchronous child thread in the form of a synchronization message;
the multithreaded program is executed speculatively in a synchronized manner, and the simulator outputs the result of the program run.
9. The device for carrying out the speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 8, characterized in that the send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
10. The device for carrying out the speculative multithreading memory data synchronous execution method under compiler support as claimed in claim 8, characterized in that the Sbuffer is a register file used to receive synchronization messages and to control the synchronization operation; the Sbuffer consists of five fields: three control flag bits, sync_flag, ready_flag and use_flag, and two message fields, message_addr and message_value; the thread processing unit resets all flag bits of the Sbuffer at initialization;
sync_flag indicates whether execution follows the synchronization method; when this flag bit is set, execution follows the synchronization method; when it is reset, execution follows the conventional method, in which case the stall instruction is skipped, the synload instruction behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether the synchronization message is ready; when this flag bit is set, the synchronization message is ready;
use_flag indicates whether the synchronous child thread uses the data in the message; when this flag bit is set, the child thread uses the data in the message; when it is reset, the child thread obtains the memory data through a conventional read operation;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010160536 CN101833440B (en) | 2010-04-30 | 2010-04-30 | Speculative multithreading memory data synchronous execution method under support of compiler and device thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101833440A (en) | 2010-09-15 |
CN101833440B (en) | 2013-01-02 |
Family ID=42717520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010160536 Expired - Fee Related CN101833440B (en) | 2010-04-30 | 2010-04-30 | Speculative multithreading memory data synchronous execution method under support of compiler and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101833440B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063291A (en) * | 2011-01-13 | 2011-05-18 | 上海大学 | Multilevel parallel execution method of speculation thread |
CN102073580A (en) * | 2011-02-01 | 2011-05-25 | 华为技术有限公司 | Performance analyzing method and tool and computer system |
CN104424117A (en) * | 2013-08-20 | 2015-03-18 | 华为技术有限公司 | Memory physical address query method and memory physical address query device |
CN104915180A (en) * | 2014-03-10 | 2015-09-16 | 华为技术有限公司 | Data operation method and device |
CN105138309A (en) * | 2015-07-09 | 2015-12-09 | 西安交通大学 | CMP (Chip Multiprocessor)-based multi-speculative path thread partitioning method under speculative multithreading mechanism |
CN109828788A (en) * | 2018-12-21 | 2019-05-31 | 天翼电子商务有限公司 | The regulation engine accelerated method executed and system are speculated based on thread-level |
CN110347400A (en) * | 2019-06-13 | 2019-10-18 | 西安科技大学 | Compile accelerated method, routing unit and caching |
WO2020259560A1 (en) * | 2019-06-24 | 2020-12-30 | 华为技术有限公司 | Method and apparatus for inserting synchronization instruction |
CN113360157A (en) * | 2020-03-05 | 2021-09-07 | 阿里巴巴集团控股有限公司 | Program compiling method, device and computer readable medium |
WO2022218337A1 (en) * | 2021-04-14 | 2022-10-20 | 华为技术有限公司 | Method for inspecting code under weak memory order architecture, and corresponding device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020095453A1 (en) * | 2001-01-16 | 2002-07-18 | Microsoft Corporation | Thread-specific heaps |
CN1885275A (en) * | 2005-06-20 | 2006-12-27 | 中兴通讯股份有限公司 | Embedded system and real-time monitoring and processing method thereof |
CN101630276A (en) * | 2009-08-18 | 2010-01-20 | 深圳市融创天下科技发展有限公司 | High-efficiency memory pool access method |
Non-Patent Citations (1)
Title |
---|
Xuhao Wang, Yinliang Zhao, Yuanke Wei, Shaolong Song, Bo Han: "Prophet Synchronization Thread Model and Compiler Support", ISPA 2010, 2010-09-09, pp. 81-87, claims 1-10 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063291B (en) * | 2011-01-13 | 2013-08-28 | 上海大学 | Multilevel parallel execution method of speculation thread |
CN102063291A (en) * | 2011-01-13 | 2011-05-18 | 上海大学 | Multilevel parallel execution method of speculation thread |
CN102073580A (en) * | 2011-02-01 | 2011-05-25 | 华为技术有限公司 | Performance analyzing method and tool and computer system |
CN102073580B (en) * | 2011-02-01 | 2013-10-02 | 华为技术有限公司 | Performance analyzing method and tool and computer system |
CN104424117B (en) * | 2013-08-20 | 2017-09-05 | 华为技术有限公司 | Internal memory physics address inquiring method and device |
CN104424117A (en) * | 2013-08-20 | 2015-03-18 | 华为技术有限公司 | Memory physical address query method and memory physical address query device |
US10114762B2 (en) | 2013-08-20 | 2018-10-30 | Huawei Technologies Co., Ltd. | Method and apparatus for querying physical memory address |
CN104915180B (en) * | 2014-03-10 | 2017-12-22 | 华为技术有限公司 | A kind of method and apparatus of data manipulation |
CN104915180A (en) * | 2014-03-10 | 2015-09-16 | 华为技术有限公司 | Data operation method and device |
CN105138309A (en) * | 2015-07-09 | 2015-12-09 | 西安交通大学 | CMP (Chip Multiprocessor)-based multi-speculative path thread partitioning method under speculative multithreading mechanism |
CN105138309B (en) * | 2015-07-09 | 2018-07-17 | 西安交通大学 | More supposition paths thread dividing method under speculative multithreading mechanism based on CMP |
CN109828788A (en) * | 2018-12-21 | 2019-05-31 | 天翼电子商务有限公司 | The regulation engine accelerated method executed and system are speculated based on thread-level |
CN110347400A (en) * | 2019-06-13 | 2019-10-18 | 西安科技大学 | Compile accelerated method, routing unit and caching |
CN110347400B (en) * | 2019-06-13 | 2023-03-14 | 西安科技大学 | Compile acceleration method, routing unit and cache |
WO2020259560A1 (en) * | 2019-06-24 | 2020-12-30 | 华为技术有限公司 | Method and apparatus for inserting synchronization instruction |
US11934832B2 (en) | 2019-06-24 | 2024-03-19 | Huawei Technologies Co., Ltd. | Synchronization instruction insertion method and apparatus |
CN113360157A (en) * | 2020-03-05 | 2021-09-07 | 阿里巴巴集团控股有限公司 | Program compiling method, device and computer readable medium |
WO2022218337A1 (en) * | 2021-04-14 | 2022-10-20 | 华为技术有限公司 | Method for inspecting code under weak memory order architecture, and corresponding device |
Also Published As
Publication number | Publication date |
---|---|
CN101833440B (en) | 2013-01-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130102; Termination date: 20160430 |