CN101833440A - Speculative multithreading memory data synchronous execution method under support of compiler and device thereof - Google Patents

Publication number
CN101833440A
Authority
CN
China
Prior art keywords
thread
instruction
data
synchronous
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010160536
Other languages
Chinese (zh)
Other versions
CN101833440B (en
Inventor
赵银亮
王旭昊
韦远科
宋少龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN 201010160536 priority Critical patent/CN101833440B/en
Publication of CN101833440A publication Critical patent/CN101833440A/en
Application granted granted Critical
Publication of CN101833440B publication Critical patent/CN101833440B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses a compiler-supported method and device for synchronized execution of memory data in speculative multithreading. Selected memory read/write instructions are synchronized while the program runs, which reduces the number of read-after-write (RAW) data dependence violations and improves the overall speedup of parallel multithreaded execution. The method comprises the following steps: after the candidate read/write instruction pairs are obtained, a stall instruction is added before each read instruction, the read instruction is replaced with the synchronized read instruction synload, one or more forward instructions are added after the write instruction, and a synset instruction is added after the thread spawn instruction spawn of the thread containing the write instruction; the executable file generated by assembling and linking is then loaded and run on a simulator; the multithreaded program executes speculatively in a synchronized manner, producing the program result with a higher speedup.

Description

Compiler-supported method and device for synchronized execution of memory data in speculative multithreading
Technical field
The invention belongs to the computer field and relates to thread-level parallelism and speculative multithreading, in particular to a compiler-supported method and device for synchronized execution of memory data in speculative multithreading.
Background art
Thread-level parallelism has become the development trend of parallel processing and parallel computing. On the one hand, multi-core processors are now in widespread use. On the other hand, chip multiprocessors, with their higher inter-core bandwidth and lower communication latency, provide a broad platform for thread-level parallelism. However, the utilization of processor cores is often limited by various factors, and some parallel programming methods place an extra burden on users and programmers. Speculative multithreading (SpMT, Speculative Multithreading) was proposed against this background.
In speculative multithreading, the compiler adopts an aggressive strategy: it automatically partitions a sequential program written in a high-level language into a multithreaded program and allows some data dependences to exist between threads. When multiple threads execute in parallel, the hardware dynamically detects inter-thread data dependence violations, buffers speculative data, and maintains data consistency; mechanisms such as thread commit, cancellation, and verification guarantee the correctness of parallel execution.
The degree of data dependence between threads is one of the key factors affecting the efficiency of speculative multithreaded execution. Inter-thread data dependences are mainly register data dependences and memory data dependences, and both kinds can incur read-after-write (RAW, Read After Write) data dependence violations at run time. A RAW memory data dependence violation at run time directly causes the thread to restart, and all successor threads of the thread that read the wrong data are cancelled. This wastes processor resources and limits the speedup of parallel program execution.
There are two main ways to resolve data dependences: value prediction and data synchronization. Register data dependences can be handled with traditional compiler data-flow analyses, such as reaching-definition analysis and live-variable analysis, combined with value prediction. For memory data dependences, however, static data-flow analysis often cannot produce accurate dependence results. As a result, neither value prediction nor data synchronization can effectively resolve inter-thread memory data dependences.
Summary of the invention
The problem solved by the invention is to provide a compiler-supported method and device for synchronized execution of memory data in speculative multithreading that synchronize selected memory read/write instructions while the program runs, reduce the number of RAW data dependence violations, and improve the speedup of parallel multithreaded execution.
The invention is realized by the following technical solution:
A compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following steps:
1) the program is partitioned by the thread partitioner into a multithreaded assembly program, which is given as input to the hardware profiler and run on it;
2) while running the program, the hardware profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
3) the synchronization compiler reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized between threads; then, after data-flow analysis, it adds a stall instruction before each read instruction, replaces the read instruction with the synchronized read instruction synload, and adds one or more forward instructions after the write instruction; it adds a synset instruction after the thread spawn instruction spawn of the thread containing the write instruction; the executable file generated by assembling and linking is then loaded and run on the simulator;
When the synchronous child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child has received the parent's synchronization message containing the memory address and the written data. Then, based on the received synchronization information and on whether this synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or cleared; this flag decides whether the data sent by the synchronous parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
The operations of the synchronous child thread after it receives the synchronization message are performed by the synload instruction; the sending of the synchronization message by the synchronous parent thread after it writes the data is performed by the forward instruction;
4) the multithreaded program executes speculatively in a synchronized manner, and the simulator outputs the result of the program run.
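The instruction rewriting in step 3) can be illustrated with a minimal Python sketch. It is not the patent's compiler pass: the `Instr` record, the `instrument` function, and the flat instruction list that holds both the parent's and the child's code are assumptions made for illustration, and the forward instruction is placed directly after the store, whereas the method itself places it by data-flow analysis on the control flow graph.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    pc: int        # program counter of the original instruction
    op: str        # mnemonic, e.g. "load", "store", "spawn"
    arg: str = ""  # operand text, kept opaque in this sketch

def instrument(code, load_pc, store_pc, spawn_pc):
    """Return a new instruction list with the synchronization instructions added.

    Hypothetical helper: assumes straight-line code, so one forward
    directly after the store is enough.
    """
    out = []
    for ins in code:
        if ins.op == "load" and ins.pc == load_pc:
            out.append(Instr(ins.pc, "stall"))             # block until the message arrives
            out.append(Instr(ins.pc, "synload", ins.arg))  # replaces the plain load
        elif ins.op == "store" and ins.pc == store_pc:
            out.append(ins)
            out.append(Instr(ins.pc, "forward", ins.arg))  # send address and data to the child
        elif ins.op == "spawn" and ins.pc == spawn_pc:
            out.append(ins)
            out.append(Instr(ins.pc, "synset"))            # switch the child to synchronized mode
        else:
            out.append(ins)
    return out
```

For a three-instruction input (spawn, store, load), the rewritten sequence is spawn, synset, store, forward, stall, synload.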
The statistics on memory data dependence violations include: the number of times each read/write instruction is executed and its PC value, the number of RAW memory data dependence violations incurred by each read/write instruction pair, and the time interval between the executions of the read and write instructions involved in a RAW memory data dependence violation. The statistics are written to a file and fed back to the synchronization compiler.
Computing the synchronization cost means estimating how much speedup a read/write instruction pair could gain if it were synchronized between threads, so as to obtain the candidate pairs with a large potential for speedup improvement.
The speedup potential is judged from the frequency with which the memory read/write instruction pair incurs a data dependence, the code distance from the thread body entry to the read instruction, and the code distance between the read instruction and the write instruction.
The data-flow analysis is performed on the control flow graph of the program to find the insertion positions of the stall and forward instructions. The stall instruction is inserted before the read instruction to be synchronized, and the forward instruction is inserted on every path between the write instruction to be synchronized and the thread exit.
The synchronous parent thread sends the synchronization message in the following steps:
(1) after the synchronous parent thread spawns the synchronous child thread, the parent executes the synset instruction, which sends a message to the child that sets the sync_flag bit of the child's Sbuffer;
(2) execute store *p, a conventional memory write;
(3) execute save_address(p), saving the memory address p of the written data in the send address buffer;
(4) execute receiver_id = get_immed_successor(), obtaining the number of the processor that runs the synchronous child thread;
(5) execute mesg.value = *p, writing the data stored by the memory write into the value field of the synchronization message mesg;
(6) execute mesg.addr = p, writing the memory address p of the written data into the addr field of the synchronization message;
(7) execute send(mesg, receiver_id), sending the constructed synchronization message to the synchronous child thread.
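The seven parent-side steps above can be sketched in Python as follows. This is a software illustration under stated assumptions, not the simulator's hardware: `ChildStub`, `ParentThread`, the dictionary used as memory, and the list used as the child's inbox are hypothetical stand-ins, and the processor lookup get_immed_successor() is reduced to a direct reference to the child.

```python
class ChildStub:
    """Hypothetical stand-in for the synchronous child thread's receive side."""
    def __init__(self):
        self.sbuffer_sync_flag = False  # sync_flag bit of the child's Sbuffer
        self.inbox = []                 # delivered synchronization messages

class ParentThread:
    def __init__(self, memory, child):
        self.memory = memory            # dict: address -> value
        self.child = child              # immediate successor thread
        self.send_address_buffer = []   # addresses of forwarded data

    def synset(self):
        # Step (1): tell the child to run in synchronized mode.
        self.child.sbuffer_sync_flag = True

    def store_and_forward(self, p, value):
        self.memory[p] = value                       # step (2): store *p
        self.send_address_buffer.append(p)           # step (3): save_address(p)
        receiver = self.child                        # step (4): get_immed_successor()
        mesg = {"value": self.memory[p], "addr": p}  # steps (5) and (6)
        receiver.inbox.append(mesg)                  # step (7): send(mesg, receiver_id)
```

Keeping the forwarded address in the send address buffer is what later lets the parent notice that it has overwritten data it already sent.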
The synchronous child thread receives the synchronization message in the following steps:
(1) execute stall(): the synchronous child thread checks whether the ready_flag bit of the Sbuffer is set; if not, the child stays blocked; if so, the child leaves the blocked state and carries out the subsequent operations;
(2) execute receive(mesg): once the synchronization message has been received, the ready_flag bit of the Sbuffer is set;
(3) execute message_addr = mesg.addr, taking the memory address from the synchronization message and storing it in the message_addr field of the Sbuffer;
(4) execute message_value = mesg.value, taking the memory data value from the synchronization message and storing it in the message_value field of the Sbuffer;
(5) execute verify(q, message_addr), checking whether the memory address in the received synchronization message is the same as the memory address q that the read instruction is about to read; if so, the use_flag bit of the Sbuffer is set;
(6) execute loaded_value = load *q, a conventional memory read whose result is placed in loaded_value; while performing the read, check whether the current thread has written data to memory address q; if so, the use_flag bit of the Sbuffer is cleared;
(7) execute result = choose(loaded_value, message_value): if the use_flag bit of the Sbuffer is set, the synchronous child thread takes the memory data value from the synchronization message as the final result of the synchronized read; if the use_flag bit is cleared, the value of loaded_value is used as the final result of the synchronized read;
(8) execute quit(), clearing the ready_flag bit and the use_flag bit of the Sbuffer.
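The child-side steps above can be sketched in Python. The names are assumptions for illustration: `synchronized_read` is a hypothetical function, a blocking `queue.Queue.get()` stands in for the stall and receive steps, `memory` is a dictionary of committed values, and `local_writes` is a dictionary holding the child's own speculative writes.

```python
import queue

def synchronized_read(q, memory, local_writes, inbox):
    """Return the value a synchronized read of address q would produce.

    Hypothetical sketch of steps (1) to (8); flags are local variables
    rather than bits of a hardware Sbuffer.
    """
    mesg = inbox.get()             # steps (1)-(2): block until the message arrives
    ready_flag = True
    message_addr = mesg["addr"]    # step (3)
    message_value = mesg["value"]  # step (4)

    # Step (5): verify that the forwarded address is the one being read.
    use_flag = (q == message_addr)

    # Step (6): conventional read; a local write to q overrides the message.
    if q in local_writes:
        loaded_value = local_writes[q]
        use_flag = False
    else:
        loaded_value = memory[q]

    # Step (7): choose between the forwarded and the loaded value.
    result = message_value if use_flag else loaded_value

    # Step (8): quit, clearing both flags for the next synchronized read.
    ready_flag = use_flag = False
    return result
```

The forwarded value is used only when the addresses match and the child has not itself written to q; in every other case the read falls back to the conventionally loaded value.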
A device that carries out the compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following modules:
Thread partitioner: partitions the program into a multithreaded assembly program, which is given as input to the hardware profiler and run on it;
Hardware profiler: while running the program, collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
Synchronization compiler: reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized between threads; then, after data-flow analysis, adds a stall instruction before each read instruction, replaces the read instruction with the synchronized read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread spawn instruction spawn of the thread containing the write instruction;
Assembler/linker: assembles and links the assembly program into a binary executable file;
Simulator: the executable file is loaded and run on the simulator, which includes extended support for the memory read and memory write instructions under the synchronization mechanism;
Under the synchronization mechanism, memory data is obtained as follows: when the synchronous child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child has received the parent's synchronization message containing the memory address and the written data. Then, based on the received synchronization information and on whether this synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or cleared; this flag decides whether the data sent by the synchronous parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
Under the synchronization mechanism, data is written and sent as follows: when the synchronous parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the send address buffer, and the memory address and the written data are sent to the synchronous child thread in the form of a message;
The multithreaded program executes speculatively in a synchronized manner, and the simulator outputs the result of the program run.
The send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
The Sbuffer is a register file used to receive synchronization messages and to control synchronization. It consists of five fields: three control flag bits, sync_flag, ready_flag, and use_flag, and two message fields, message_addr and message_value. The thread processing unit clears all flag bits of the Sbuffer at initialization;
sync_flag indicates whether execution follows the synchronization method. When this flag is set, execution follows the synchronization method. When it is cleared, execution is conventional: stall instructions are skipped, the synload instruction performs the same operation as a conventional load instruction, and the data in the message fields are invalid;
ready_flag indicates whether the synchronization message is ready; when this flag is set, the message is ready;
use_flag indicates whether the synchronous child thread uses the data in the message. When this flag is set, the child uses the data in the message; when it is cleared, the child obtains the memory data with a conventional read;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
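As a rough software picture of this layout, the five fields and the effect of sync_flag can be written as a small Python dataclass. The class layout, the `reset_flags` method, and the `effective_op` helper are illustrative assumptions; the document describes a hardware register file, not this code.

```python
from dataclasses import dataclass

@dataclass
class Sbuffer:
    sync_flag: bool = False   # run according to the synchronization method?
    ready_flag: bool = False  # has the synchronization message arrived?
    use_flag: bool = False    # use the message data instead of the loaded data?
    message_addr: int = 0     # addr field of the synchronization message
    message_value: int = 0    # value field of the synchronization message

    def reset_flags(self):
        """Clear all flag bits, as done when the processing unit initializes."""
        self.sync_flag = self.ready_flag = self.use_flag = False

def effective_op(sb, op):
    """Map an instruction to what it does for the current sync_flag.

    Hypothetical helper: with sync_flag cleared, stall is skipped and
    synload behaves like a conventional load.
    """
    if not sb.sync_flag:
        if op == "stall":
            return "nop"
        if op == "synload":
            return "load"
    return op
```

With this encoding, code that was instrumented for synchronization still runs correctly on a thread whose parent never executed synset, because the added instructions degrade to their conventional equivalents.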
Compared with the prior art, the invention has the following beneficial technical effects:
The compiler-supported method and device for synchronized execution of memory data in speculative multithreading provided by the invention synchronize selected memory read/write instructions at run time, reduce the number of RAW data dependence violations, and improve the speedup of parallel multithreaded execution.
The candidate read/write instruction pairs to be synchronized are obtained with the help of the hardware profiler, which collects statistics on memory data dependence violations while running the program and feeds them back to the synchronization compiler; the synchronization compiler then obtains the candidate pairs by computing the synchronization cost. The main purpose of running the program on the hardware profiler is to profile the program's dynamic memory access behaviour; the program runs in a lightweight mode and does not incur much overhead. The aim is to find the candidate pairs with a large potential for speedup improvement: for memory read/write instruction pairs that frequently incur a data dependence, synchronization usually brings a large speedup gain, whereas for pairs that seldom incur a data dependence, adding synchronization instructions tends to cause unnecessary or excessive synchronization at run time. In addition, for a given thread, the larger the code distance from the thread body entry to the candidate read instruction and the smaller the code distance between the candidate read instruction and the candidate write instruction, the more pronounced the benefit of synchronization.
To reduce the number of RAW data dependence violations, the invention adds a stall instruction before the read instruction. After executing the stall instruction, the synchronous child thread blocks and waits for the synchronous parent thread to produce the memory data it needs. One or more forward instructions are added after the write instruction. The operations of the synchronous child thread after it receives the synchronization message are performed by the synload instruction; the sending of the synchronization message by the synchronous parent thread after it writes the data is performed by the forward instruction;
When the synchronous child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child has received the message containing the memory address and the written data. Then, based on the received synchronization information and on whether this synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or cleared; this flag decides whether the data sent by the synchronous parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
With the synchronized execution method of the invention, the synchronized reading thread obtains the correct memory data after data synchronization and avoids being cancelled because of a RAW data dependence violation. The work that the processor core completed before the stall instruction is preserved, which reduces the processor resources wasted by thread restarts and cancellations.
Besides the hardware profiler's statistical support for obtaining the candidate read/write instruction pairs, the functional modules newly added to the simulator to support synchronization between the synchronous child thread and the synchronous parent thread are mainly a send address buffer and a local register file, the Sbuffer.
The send address buffer is a very small data buffer that records the addresses of all data sent by this thread's forward instructions. The local register file Sbuffer is a register file used to receive synchronization messages and to control synchronization.
The threshold for the frequency with which a memory read/write instruction pair incurs a data dependence can be obtained empirically from a large number of experiments. Specifically, 7 test programs were selected from the Olden benchmark suite, partitioned with the Prophet thread partitioning method, and run speculatively on a 4-core simulator, giving an average speedup improvement of 34%. With the synchronization method, the minimum data dependence frequency was set to 15%, and the minimum ratio of the code distance from the thread body entry to the read instruction to the code distance from the write instruction to the thread exit was set to 1. Memory read/write instructions selected under these criteria were synchronized, and the speedup improved by a further 7% on average compared with unsynchronized speculative execution. The synchronization method of the invention therefore gives a noticeable improvement in the speedup of speculative execution.
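The selection rule with these two thresholds can be sketched in Python. The `PairStats` record, its field names, and the `select_candidates` function are hypothetical; only the default threshold values (a dependence frequency of at least 15% and a distance ratio of at least 1) come from the experiment described above, and treating a zero write-to-exit distance as an infinite ratio is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class PairStats:
    load_pc: int        # PC of the read instruction
    store_pc: int       # PC of the write instruction
    executions: int     # how often the pair was executed
    violations: int     # RAW violations observed for the pair
    entry_to_load: int  # code distance: thread body entry -> read instruction
    store_to_exit: int  # code distance: write instruction -> thread exit

def select_candidates(pairs, min_freq=0.15, min_ratio=1.0):
    """Keep the pairs whose profile suggests that synchronization pays off."""
    chosen = []
    for p in pairs:
        if p.executions == 0:
            continue  # never executed, nothing to judge
        freq = p.violations / p.executions
        if p.store_to_exit == 0:
            ratio = float("inf")  # assumption: write sits at the thread exit
        else:
            ratio = p.entry_to_load / p.store_to_exit
        if freq >= min_freq and ratio >= min_ratio:
            chosen.append(p)
    return chosen
```

A pair that violates in 40 of 100 executions with distances 50 and 20 passes both tests, while a pair with a 5% violation frequency or a distance ratio of 0.5 is rejected.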
Description of the drawings
Fig. 1 is a flow diagram of the compiler-supported method for synchronized execution of memory data in speculative multithreading.
Fig. 2 is a schematic diagram of the synchronization of a read/write instruction pair between the synchronous parent thread and the synchronous child thread.
Fig. 3 is a schematic diagram of the multi-processing-unit (multi-core) architecture on which the multithreaded memory data execution is carried out.
Fig. 4 is a schematic diagram of the Sbuffer design.
Fig. 5 is a schematic diagram of the operations that implement the synchronized read of the synchronous child thread.
Embodiments
The invention is described in further detail below with reference to the drawings. The description explains the invention and does not limit it.
Referring to Fig. 1, the compiler-supported speculative multithreading system consists mainly of two parts, a compiler and a simulator. The hardware profiler can be a module obtained by extending the simulator's statistics module for memory read/write instructions. While the program runs, the hardware profiler records the history of reads and writes and the memory data dependence information, and the assembler/linker determines the positions of the read and write instructions from their PC values. The main purpose of running the program on the hardware profiler is to profile the program's dynamic memory access behaviour; the program runs in a lightweight mode and does not incur much overhead.
The compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following steps:
1) the program is partitioned by the thread partitioner into a multithreaded assembly program, which is given as input to the hardware profiler and run on it;
2) while running the program, the hardware profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronization compiler;
3) the synchronization compiler reads the statistics and, by computing the synchronization cost, obtains the candidate read/write instruction pairs to be synchronized between threads; then, after data-flow analysis, it adds a stall instruction before each read instruction, replaces the read instruction with the synchronized read instruction synload, and adds one or more forward instructions after the write instruction; it adds a synset instruction after the thread spawn instruction spawn of the thread containing the write instruction; the executable file generated by assembling and linking is then loaded and run on the simulator;
When the synchronous child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child has received the parent's synchronization message containing the memory address and the written data. Then, based on the received synchronization information and on whether this synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or cleared; this flag decides whether the data sent by the synchronous parent thread is used. The child thread then stops waiting and continues with the subsequent instructions;
The operations of the synchronous child thread after it receives the synchronization message are performed by the synload instruction; the sending of the synchronization message by the synchronous parent thread after it writes the data is performed by the forward instruction;
4) the multithreaded program executes speculatively in a synchronized manner, and the simulator outputs the result of the program run.
In step 3) above, the synchronization cost computed by the synchronization compiler is an estimate of how much speedup a read/write instruction pair could gain if it were synchronized between threads; it is used to obtain the candidate pairs with a large potential for speedup improvement. Such pairs are characterized as follows:
For memory read/write instruction pairs that frequently incur a data dependence, synchronization usually brings a large speedup gain; for pairs that seldom incur a data dependence, adding synchronization instructions tends to cause unnecessary or excessive synchronization at run time;
Besides the frequency of data dependence, code distances such as the distance from the thread entry to the write instruction are also taken into account when estimating the synchronization cost. The larger the code distance from the thread body entry to the candidate read instruction and the smaller the code distance between the candidate read instruction and the candidate write instruction, the more pronounced the benefit of synchronization. Taking these factors together, the read/write instruction pairs with a large potential for speedup improvement are selected.
The frequency with which a memory read/write instruction pair incurs a data dependence, the code distance from the thread body entry to the read instruction, and the code distance from the write instruction to the thread exit all depend on the specific program being executed. The method determines suitable thresholds from experimental data and uses them as the criteria for synchronizing memory read/write instructions.
The data-flow analysis carried out by the synchronization compiler is performed on the control flow graph of the program to find the insertion positions of the stall and forward instructions. The stall instruction is inserted before the read instruction to be synchronized, and the forward instruction is inserted on every path between the write instruction to be synchronized and the thread exit.
Based on said method, finish the embodiment of the device of the speculative multithreading memory data sync manner of execution under the compiler support, comprise with lower module:
The thread dividing device, procedure division generated the multithreading assembly routine after, as the input of hardware program parser, and operation thereon;
Hardware program parser, hardware program parser are added up internal storage data and are relied on situation in violation of rules and regulations when working procedure, and give synchronous collector with the statistics information feedback;
Synchronizing compiler: reads the profiling statistics and obtains the candidate read/write instruction pairs for multithreaded synchronization by calculating the synchronization cost; then, after data-flow analysis, it adds a stall instruction before the read instruction, replaces the read instruction with the synchronous read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction;
Assembler and linker: the assembly program is assembled and linked into a binary executable file;
Simulator: the executable file is loaded and run on the simulator, which includes extended support for the memory read and memory write instructions of the synchronization mechanism;
Under the synchronization mechanism, memory data is obtained as follows: when the synchronous child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child thread has received the synchronization message, containing the memory address and the written data, sent by the synchronous parent thread; then, according to the received synchronization information and whether the current synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or reset, and this flag bit decides whether the data sent by the synchronous parent thread is used; the thread then stops waiting and continues with the subsequent instructions;
Under the synchronization mechanism, data is written and sent as follows: when the synchronous parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the send address buffer, and the memory address and the written data are sent to the synchronous child thread in the form of a synchronization message;
The multithreaded program is executed speculatively in a synchronized manner, and the simulator outputs the result of the program run.
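The instruction rewriting performed by the compiler can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the `Instr` type, the `insert_sync` function and the index arguments are hypothetical names, the threads are treated as straight-line instruction lists, and a single forward instruction is inserted (the text allows one or more, placed by data-flow analysis).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical pseudo-instruction: an opcode and one optional operand."""
    op: str
    operand: str = ""


def insert_sync(parent, child, store_idx, load_idx):
    """Rewrite one candidate write/read pair.

    parent:    instructions of the thread that contains spawn and the write
    child:     instructions of the thread that contains the read
    store_idx: index of the candidate write in parent
    load_idx:  index of the candidate read in child
    """
    new_parent = []
    for i, ins in enumerate(parent):
        new_parent.append(ins)
        if ins.op == "spawn":
            # synset goes right after the thread-spawning instruction
            new_parent.append(Instr("synset"))
        if i == store_idx:
            # forward goes right after the write instruction
            new_parent.append(Instr("forward", ins.operand))

    new_child = []
    for i, ins in enumerate(child):
        if i == load_idx:
            # stall goes before the read, and the read becomes synload
            new_child.append(Instr("stall"))
            new_child.append(Instr("synload", ins.operand))
        else:
            new_child.append(ins)
    return new_parent, new_child
```

For example, a parent of `spawn, store, add` and a child of `load, add` would become `spawn, synset, store, forward, add` and `stall, synload, add` respectively.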
The send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
Referring to Fig. 3, this embodiment shows a multi-processing-unit (multi-core) architecture connected by an on-chip bus, and the simulator is implemented according to this framework. It includes the hardware program profiler, which collects statistics on memory data dependence violations, and the functional modules added to support synchronization between child and parent threads: the send address buffer (FAB) and the local register file Sbuffer. In addition, the thread controller, the speculative data cache and the thread master control are modules added to support speculative multithreading, and are the same as or equivalent to those of a conventional speculative multithreading chip multiprocessor. Multiple processing units are connected by the on-chip bus, and the threads execute speculatively in a synchronized manner on these processing units.
In the embodiment shown in Fig. 3, the hardware program profiler of a processing unit consists of an in-core program analysis controller and an analysis statistics cache.
The send address buffer is a very small data buffer that records the addresses of all data sent by the forward instructions of its own thread. Its main role is to remember the memory addresses of the synchronization data already sent. When the synchronous writer thread later performs a write (store) operation, the address of that write must be compared one by one with the addresses stored in the send address buffer to see whether the same memory address is already recorded. If an identical address is found, the data previously sent to the synchronous reader thread is no longer the latest data. In that case the synchronous writer thread sends a message to the synchronous reader thread to restart it. After an address has been written to the send address buffer, the synchronous writer thread sends the written data and its memory address to the synchronous reader thread as a message. According to the statistics of the hardware program profiler, a thread needs to synchronize at most 8 data items, so the send address buffer is very small and adds little overhead to the system at run time.
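The address check described above can be sketched in a few lines of Python. This is a hedged illustration rather than the hardware design: the class name `SendAddressBuffer`, the `on_store` helper and the `restart_child` callback are hypothetical, the capacity of 8 is taken from the profiling figure quoted in the text, and raising an error on overflow is an assumption of the sketch.

```python
class SendAddressBuffer:
    """Records the addresses already sent by forward instructions."""

    def __init__(self, capacity=8):
        self.capacity = capacity      # at most 8 synchronized items per thread
        self.addresses = []

    def record(self, addr):
        """Remember an address whose data has been sent to the reader thread."""
        if addr in self.addresses:
            return
        if len(self.addresses) >= self.capacity:
            # Assumption of this sketch: overflow is reported as an error.
            raise OverflowError("send address buffer is full")
        self.addresses.append(addr)

    def conflicts(self, addr):
        """Compare a store address with every recorded address."""
        return addr in self.addresses


def on_store(fab, addr, restart_child):
    """Run for each later store of the writer thread.

    If the address was already forwarded, the reader holds stale data,
    so the reader thread is restarted. Returns True when that happens.
    """
    if fab.conflicts(addr):
        restart_child()
        return True
    return False
```

A later store to an address that was never forwarded leaves the reader thread alone; only a store to a recorded address triggers the restart callback.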
The Sbuffer is a register file used to receive synchronization messages and to control synchronization. It is private to each processor core. Referring to the Sbuffer design shown in Fig. 4, the Sbuffer consists of five fields: three control flag bits, sync_flag, ready_flag and use_flag, and two message fields, message_addr and message_value. The thread processing unit resets all flag bits of the Sbuffer at initialization.
sync_flag indicates whether execution follows the synchronization method. When this flag bit is set, execution follows the synchronization method; when it is reset, execution is conventional: the stall instruction is skipped, the synload instruction behaves like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether the synchronization message is ready; when this flag bit is set, the synchronization message is ready;
use_flag indicates whether the synchronous child thread uses the data in the message. When this flag bit is set, the synchronous child thread uses the data in the message; when it is reset, the synchronous child thread obtains the memory data through a conventional read operation;
message_addr stores the addr field of the message;
message_value stores the value field of the message.
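As an illustration, the five Sbuffer fields can be modelled as a small Python data class. The field names follow the text; the method names `reset_flags` and `deliver` and the use of plain integers for addresses and values are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Sbuffer:
    """Per-core register file with three flag bits and two message fields."""
    sync_flag: bool = False     # execute according to the synchronization method?
    ready_flag: bool = False    # has a synchronization message arrived?
    use_flag: bool = False      # use the message data instead of the loaded value?
    message_addr: int = 0       # addr field of the received message
    message_value: int = 0      # value field of the received message

    def reset_flags(self):
        """Done by the thread processing unit at initialization."""
        self.sync_flag = False
        self.ready_flag = False
        self.use_flag = False

    def deliver(self, addr, value):
        """Store an incoming synchronization message and mark it ready."""
        self.message_addr = addr
        self.message_value = value
        self.ready_flag = True
```

While sync_flag is reset, the message fields carry no meaning, which matches the statement that their data is invalid in conventional execution.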
One implementation of the concrete operations by which a candidate read/write instruction pair is synchronized between the synchronous parent thread and the synchronous child thread is given below. The synset instruction is a newly added auxiliary synchronization instruction; its role is to guarantee that both the synchronous parent thread and the synchronous child thread exist when data is synchronized. After a thread executes a spawn instruction to start a new thread, it executes the synset instruction to set the sync_flag bit of its child thread's Sbuffer. Only when the sync_flag bit of a thread's Sbuffer is set does executing a stall instruction put that thread into the waiting state; otherwise the stall instruction is skipped and performs no operation.
The synchronous parent thread sends a synchronization message in the following steps:
(1) after the synchronous parent thread spawns the synchronous child thread, the parent executes the synset instruction, which sends a message to the synchronous child thread to set the sync_flag bit of the child's Sbuffer;
(2) execute the store *p operation, a conventional memory write;
(3) execute save_address(p), which saves the memory address p of the written data in the send address buffer;
(4) execute receiver_id = get_immed_successor(), which obtains the processor number of the synchronous child thread;
(5) execute mesg.value = *p, which writes the data written by the memory write into the value field of the synchronization message mesg;
(6) execute mesg.addr = p, which writes the memory address p of the written data into the addr field of the synchronization message;
(7) execute send(mesg, receiver_id), which sends the constructed synchronization message to the synchronous child thread.
For a synchronous write, the hardware performs a conventional memory write when the store instruction executes. Once the memory write completes, the synchronous writer thread stores the address of the write in the FAB. The remaining operations after the store are carried out by the forward instruction, whose function is to send the value written by the memory write instruction, together with the corresponding memory address, to the child thread. The synchronization message that the synchronous parent thread sends to the synchronous child thread contains two fields, addr and value: the addr field holds the memory address and the value field holds the written data.
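The sending steps (3) to (7) can be sketched in Python as a single `forward` function that runs after the conventional store. This is a sketch under stated assumptions, not the instruction's actual implementation: memory is modelled as a dict, the FAB as a set, and `get_immed_successor` and `send` are passed in as callables because the text does not define how they are realized.

```python
def forward(memory, p, fab, get_immed_successor, send):
    """Model of the forward instruction, run after 'store *p' has completed.

    memory:              dict mapping address -> value (parent thread's view)
    p:                   address that was just written
    fab:                 set of addresses already forwarded (send address buffer)
    get_immed_successor: callable returning the child's processor number
    send:                callable taking (mesg, receiver_id)
    """
    fab.add(p)                              # (3) save_address(p)
    receiver_id = get_immed_successor()     # (4) find the child's processor
    mesg = {
        "value": memory[p],                 # (5) mesg.value = *p
        "addr": p,                          # (6) mesg.addr = p
    }
    send(mesg, receiver_id)                 # (7) deliver to the child thread
    return mesg
```

A caller would first perform the store, for example `memory[0x40] = 7`, and then call `forward(memory, 0x40, fab, lambda: 1, outbox_append)` where `outbox_append` stands for whatever message transport the simulator provides.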
Referring to Fig. 5, the synchronous child thread receives a synchronization message in the following steps:
(1) execute stall(): the synchronous child thread checks whether the ready_flag bit of the Sbuffer is set; if not, the synchronous child thread stays blocked; if so, it leaves the blocked state and carries out the subsequent operations;
(2) execute receive(mesg): after the synchronization message is received, the ready_flag bit of the Sbuffer is set;
(3) execute message_addr = mesg.addr: the memory address in the synchronization message is stored in the message_addr field of the Sbuffer;
(4) execute message_value = mesg.value: the memory data value in the synchronization message is stored in the message_value field of the Sbuffer;
(5) execute verify(q, message_addr): check whether the memory address in the received synchronization message is the same as the memory address q that the read operation is about to read; if so, set the use_flag bit of the Sbuffer;
(6) execute loaded_value = load *q: perform a conventional memory read and put the data read into loaded_value; while performing the read, check whether the current thread has written data to memory address q; if so, reset the use_flag bit of the Sbuffer;
(7) execute result = choose(loaded_value, message_value): if the use_flag bit of the Sbuffer is set, the synchronous child thread takes the memory data value in the synchronization message as the final result of the synchronous read; if the use_flag bit is reset, it takes the value of loaded_value as the final result of the synchronous read;
(8) execute quit(): reset the ready_flag bit and the use_flag bit of the Sbuffer.
For a synchronous read, the synchronous reader thread first executes the stall instruction and enters the waiting state (the ready_flag bit of the Sbuffer is not set and the synchronous child thread is blocked), waiting for the synchronous parent thread to send a message. When the message arrives, the ready_flag bit of the synchronous child thread's Sbuffer is set and the wait ends. After obtaining the message, the synchronous child thread reads a memory data value with a conventional load operation. While performing the conventional read, the processor checks whether it has previously written to the same address. If so, the use_flag bit of the Sbuffer is reset, so that stale data does not replace newer data. The verify() operation is then performed, checking whether the address to be read by the pending read operation equals the memory address contained in the message. If they are equal, the use_flag bit of the Sbuffer is set. Finally, the choose() operation selects the appropriate value as the final result of the synchronous read.
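The receiving side can be sketched in Python as follows. The sketch follows the order of the numbered steps (5) to (8), that is, verify first and the local-write check second; it is an illustration under assumptions, not the simulator's code. The helper names `new_sbuffer`, `receive` and `synload`, the dict-based memory, and the `locally_written` set of addresses the child thread has already stored to are all hypothetical, and blocking in stall() is reduced to a precondition check.

```python
from types import SimpleNamespace


def new_sbuffer():
    """Sbuffer with all flag bits reset, as after initialization."""
    return SimpleNamespace(sync_flag=False, ready_flag=False, use_flag=False,
                           message_addr=None, message_value=None)


def receive(sb, mesg):
    """Steps (2)-(4): store the message fields and mark the message ready."""
    sb.message_addr = mesg["addr"]
    sb.message_value = mesg["value"]
    sb.ready_flag = True


def synload(sb, memory, q, locally_written):
    """Steps (5)-(8) of the synchronous read of address q."""
    if not sb.sync_flag:
        return memory[q]                    # behaves like a conventional load
    if not sb.ready_flag:
        raise RuntimeError("stall() would still be blocking here")
    if q == sb.message_addr:                # (5) verify(q, message_addr)
        sb.use_flag = True
    loaded_value = memory[q]                # (6) conventional read
    if q in locally_written:                # child already wrote q itself
        sb.use_flag = False
    # (7) choose between the forwarded value and the loaded value
    result = sb.message_value if sb.use_flag else loaded_value
    sb.ready_flag = False                   # (8) quit(): reset both flags
    sb.use_flag = False
    return result
```

With a forwarded message for address 0x40 and no local write, the read returns the forwarded value; if the child has written 0x40 itself, or the message is for a different address, the conventionally loaded value is returned instead.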
With the above synchronization, seven test programs from the Olden benchmark suite (mst, bh, tsp, em3d, bisort, voronoi and health) were partitioned with the Prophet thread partitioning method and executed speculatively on a 4-core simulator, giving an average speedup improvement of 34%.
With the minimum data dependence frequency set to 15% and the minimum ratio of the code distance from the thread-body entry to the read instruction to the code distance from the write instruction to the thread exit set to 1, memory read/write instructions were selected for synchronization; compared with unsynchronized speculative execution, the speedup improved by 7% on average. The synchronization method of the present invention thus gives a clear improvement in the speedup of speculative execution.
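The two selection thresholds can be illustrated with a small Python filter. This is a sketch only: the `PairProfile` record and its field names are hypothetical, the dependence frequency is assumed here to be violations divided by executions, and the direction of the distance ratio (entry-to-read distance over write-to-exit distance) is one reading of the text.

```python
from dataclasses import dataclass


@dataclass
class PairProfile:
    """Hypothetical profiling record for one read/write instruction pair."""
    read_pc: int
    write_pc: int
    executions: int      # how often the pair was executed
    violations: int      # how often it caused a dependence violation
    entry_to_read: int   # code distance from thread-body entry to the read
    write_to_exit: int   # code distance from the write to the thread exit


def is_candidate(pair, min_freq=0.15, min_ratio=1.0):
    """Keep pairs that meet both thresholds quoted in the text."""
    if pair.executions == 0:
        return False
    freq = pair.violations / pair.executions
    # Multiplying instead of dividing avoids a zero denominator.
    ratio_ok = pair.entry_to_read >= min_ratio * pair.write_to_exit
    return freq >= min_freq and ratio_ok
```

A pair that violates in 20 of 100 executions with equal distances passes; the same pair with only 10 violations, or with the read much closer to the thread entry than the write is to the exit, is rejected.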

Claims (10)

1. A compiler-supported speculative multithreading memory data synchronization execution method, characterized in that synchronization of memory data in a multithreaded program comprises the following steps:
1) after the program has been partitioned by the hardware thread partitioner into a multithreaded assembly program, that program is given as input to the hardware program profiler and run on it;
2) while running the program, the hardware program profiler collects statistics on memory data dependence violations and feeds the statistics back to the synchronizing compiler;
3) the synchronizing compiler reads the statistics and obtains the candidate read/write instruction pairs for multithreaded synchronization by calculating the synchronization cost; then, after data-flow analysis, it adds a stall instruction before the read instruction, replaces the read instruction with the synchronous read instruction synload, and adds one or more forward instructions after the write instruction; it adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction; the executable file generated by assembling and linking is then loaded and run on the simulator;
when the synchronous child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child thread has received the synchronization message, containing the memory address and the written data, sent by the synchronous parent thread; then, according to the received synchronization information and whether the current synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or reset, and this flag bit decides whether the data sent by the synchronous parent thread is used; the thread then stops waiting and continues with the subsequent instructions;
the operations of the synchronous child thread after receiving the synchronization message are carried out by the synload instruction, and the operations by which the synchronous parent thread sends the synchronization message after writing the data are carried out by the forward instruction;
4) the multithreaded program is executed speculatively in a synchronized manner, and the simulator outputs the result of the program run.
2. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 1, characterized in that the statistics on memory data dependence violations comprise: the number of times each read/write instruction is executed and its PC value, the number of read-after-write memory data dependence violations caused by the read/write instruction, and the time interval between the execution of the read and write instructions involved in a read-after-write memory data dependence violation; the statistics are written to a file and fed back to the synchronizing compiler.
3. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 1, characterized in that calculating the synchronization cost means calculating the potential of a read/write instruction pair to improve the speedup once multithreaded synchronization is applied to it, so as to obtain the candidate read/write instruction pairs with a large potential for improving the speedup.
4. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 3, characterized in that the potential for improving the speedup is judged from the frequency with which the memory read/write instruction pair incurs a data dependence, the code distance from the thread-body entry to the read instruction, and the code distance between the read instruction and the write instruction.
5. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 1, characterized in that the data-flow analysis is carried out on the control flow graph of the program to find the insertion positions of the stall instruction and the forward instruction; the stall instruction is inserted before the read instruction to be synchronized, and the forward instruction is inserted on every path between the write instruction to be synchronized and the thread exit.
6. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 1, characterized in that the synchronous parent thread sends a synchronization message in the following steps:
(1) after the synchronous parent thread spawns the synchronous child thread, the parent executes the synset instruction, which sends a message to the synchronous child thread to set the sync_flag bit of the child's Sbuffer;
(2) execute the store *p operation, a conventional memory write;
(3) execute save_address(p), which saves the memory address p of the written data in the send address buffer;
(4) execute receiver_id = get_immed_successor(), which obtains the processor number of the synchronous child thread;
(5) execute mesg.value = *p, which writes the data written by the memory write into the value field of the synchronization message mesg;
(6) execute mesg.addr = p, which writes the memory address p of the written data into the addr field of the synchronization message;
(7) execute send(mesg, receiver_id), which sends the constructed synchronization message to the synchronous child thread.
7. The compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 1, characterized in that the synchronous child thread receives a synchronization message in the following steps:
(1) execute stall(): the synchronous child thread checks whether the ready_flag bit of the Sbuffer is set; if not, the synchronous child thread stays blocked; if so, it leaves the blocked state and carries out the subsequent operations;
(2) execute receive(mesg): after the synchronization message is received, the ready_flag bit of the Sbuffer is set;
(3) execute message_addr = mesg.addr: the memory address in the synchronization message is stored in the message_addr field of the Sbuffer;
(4) execute message_value = mesg.value: the memory data value in the synchronization message is stored in the message_value field of the Sbuffer;
(5) execute verify(q, message_addr): check whether the memory address in the received synchronization message is the same as the memory address q that the read operation is about to read; if so, set the use_flag bit of the Sbuffer;
(6) execute loaded_value = load *q: perform a conventional memory read and put the data read into loaded_value; while performing the read, check whether the current thread has written data to memory address q; if so, reset the use_flag bit of the Sbuffer;
(7) execute result = choose(loaded_value, message_value): if the use_flag bit of the Sbuffer is set, the synchronous child thread takes the memory data value in the synchronization message as the final result of the synchronous read; if the use_flag bit is reset, it takes the value of loaded_value as the final result of the synchronous read;
(8) execute quit(): reset the ready_flag bit and the use_flag bit of the Sbuffer.
8. A device for carrying out the compiler-supported speculative multithreading memory data synchronization execution method, characterized in that it comprises the following modules:
Thread partitioner: partitions the program into a multithreaded assembly program, which is given as input to the hardware program profiler and run on it;
Hardware program profiler: while running the program, collects statistics on memory data dependence violations and feeds the statistics back to the synchronizing compiler;
Synchronizing compiler: reads the profiling statistics and obtains the candidate read/write instruction pairs for multithreaded synchronization by calculating the synchronization cost; then, after data-flow analysis, it adds a stall instruction before the read instruction, replaces the read instruction with the synchronous read instruction synload, adds one or more forward instructions after the write instruction, and adds a synset instruction after the thread-spawning instruction spawn of the thread that contains the write instruction;
Assembler and linker: the assembly program is assembled and linked into a binary executable file;
Simulator: the executable file is loaded and run on the simulator, which includes extended support for the memory read and memory write instructions of the synchronization mechanism;
Under the synchronization mechanism, memory data is obtained as follows: when the synchronous child thread executes the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronous parent thread has executed the corresponding write instruction and forward instruction and the child thread has received the synchronization message, containing the memory address and the written data, sent by the synchronous parent thread; then, according to the received synchronization information and whether the current synchronous child thread has itself written to the same memory address, the use_flag in the Sbuffer is set or reset, and this flag bit decides whether the data sent by the synchronous parent thread is used; the thread then stops waiting and continues with the subsequent instructions;
Under the synchronization mechanism, data is written and sent as follows: when the synchronous parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write instruction is saved in the send address buffer, and the memory address and the written data are sent to the synchronous child thread in the form of a synchronization message;
The multithreaded program is executed speculatively in a synchronized manner, and the simulator outputs the result of the program run.
9. The device for carrying out the compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 8, characterized in that the send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
10. The device for carrying out the compiler-supported speculative multithreading memory data synchronization execution method as claimed in claim 8, characterized in that the Sbuffer is a register file used to receive synchronization messages and to control synchronization; the Sbuffer consists of five fields: three control flag bits, sync_flag, ready_flag and use_flag, and two message fields, message_addr and message_value; the thread processing unit resets all flag bits of the Sbuffer at initialization;
sync_flag indicates whether execution follows the synchronization method; when this flag bit is set, execution follows the synchronization method; when it is reset, execution is conventional: the stall instruction is skipped, the synload instruction behaves like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether the synchronization message is ready; when this flag bit is set, the synchronization message is ready;
use_flag indicates whether the synchronous child thread uses the data in the message; when this flag bit is set, the synchronous child thread uses the data in the message; when it is reset, the synchronous child thread obtains the memory data through a conventional read operation;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
CN 201010160536 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof Expired - Fee Related CN101833440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010160536 CN101833440B (en) 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010160536 CN101833440B (en) 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof

Publications (2)

Publication Number Publication Date
CN101833440A true CN101833440A (en) 2010-09-15
CN101833440B CN101833440B (en) 2013-01-02

Family

ID=42717520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010160536 Expired - Fee Related CN101833440B (en) 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof

Country Status (1)

Country Link
CN (1) CN101833440B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020095453A1 (en) * 2001-01-16 2002-07-18 Microsoft Corporation Thread-specific heaps
CN1885275A (en) * 2005-06-20 2006-12-27 中兴通讯股份有限公司 Embedded system and real-time monitoring and processing method thereof
CN101630276A (en) * 2009-08-18 2010-01-20 深圳市融创天下科技发展有限公司 High-efficiency memory pool access method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xuhao Wang, Yinliang Zhao, Yuanke Wei, Shaolong Song, Bo Han: "Prophet Synchronization Thread Model and Compiler Support", ISPA 2010, 2010-09-09, pages 81-87 (claims 1-10) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063291B (en) * 2011-01-13 2013-08-28 上海大学 Multilevel parallel execution method of speculation thread
CN102063291A (en) * 2011-01-13 2011-05-18 上海大学 Multilevel parallel execution method of speculation thread
CN102073580A (en) * 2011-02-01 2011-05-25 华为技术有限公司 Performance analyzing method and tool and computer system
CN102073580B (en) * 2011-02-01 2013-10-02 华为技术有限公司 Performance analyzing method and tool and computer system
CN104424117B (en) * 2013-08-20 2017-09-05 华为技术有限公司 Internal memory physics address inquiring method and device
CN104424117A (en) * 2013-08-20 2015-03-18 华为技术有限公司 Memory physical address query method and memory physical address query device
US10114762B2 (en) 2013-08-20 2018-10-30 Huawei Technologies Co., Ltd. Method and apparatus for querying physical memory address
CN104915180B (en) * 2014-03-10 2017-12-22 华为技术有限公司 A kind of method and apparatus of data manipulation
CN104915180A (en) * 2014-03-10 2015-09-16 华为技术有限公司 Data operation method and device
CN105138309A (en) * 2015-07-09 2015-12-09 西安交通大学 CMP (Chip Multiprocessor)-based multi-speculative path thread partitioning method under speculative multithreading mechanism
CN105138309B (en) * 2015-07-09 2018-07-17 西安交通大学 More supposition paths thread dividing method under speculative multithreading mechanism based on CMP
CN109828788A (en) * 2018-12-21 2019-05-31 天翼电子商务有限公司 The regulation engine accelerated method executed and system are speculated based on thread-level
CN110347400A (en) * 2019-06-13 2019-10-18 西安科技大学 Compile accelerated method, routing unit and caching
CN110347400B (en) * 2019-06-13 2023-03-14 西安科技大学 Compile acceleration method, routing unit and cache
WO2020259560A1 (en) * 2019-06-24 2020-12-30 华为技术有限公司 Method and apparatus for inserting synchronization instruction
US11934832B2 (en) 2019-06-24 2024-03-19 Huawei Technologies Co., Ltd. Synchronization instruction insertion method and apparatus
CN113360157A (en) * 2020-03-05 2021-09-07 阿里巴巴集团控股有限公司 Program compiling method, device and computer readable medium
WO2022218337A1 (en) * 2021-04-14 2022-10-20 华为技术有限公司 Method for inspecting code under weak memory order architecture, and corresponding device

Also Published As

Publication number Publication date
CN101833440B (en) 2013-01-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20160430