CN101833440B - Speculative multithreading memory data synchronous execution method under support of compiler and device thereof - Google Patents


Info

Publication number: CN101833440B
Authority: CN (China)
Prior art keywords: thread, instruction, message, data, synchronous
Legal status: Expired - Fee Related
Application number: CN 201010160536
Other languages: Chinese (zh)
Other versions: CN101833440A
Inventors: 赵银亮, 王旭昊, 韦远科, 宋少龙
Current assignee: Xian Jiaotong University
Original assignee: Xian Jiaotong University

Events:
Application filed by Xian Jiaotong University
Priority to CN 201010160536
Publication of CN101833440A
Application granted
Publication of CN101833440B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a compiler-supported method and device for synchronized execution of memory data in speculative multithreading, which synchronize selected memory read/write instructions at run time, reduce the frequency of read-after-write data-dependence violations, and improve the overall speedup of parallel multithreaded execution. The method comprises the following steps: after a candidate read/write instruction pair is obtained, a stall instruction is inserted before the read instruction and the read instruction is replaced with a synchronizing read instruction, synload; one or more forward instructions are inserted after the write instruction; and a synset instruction is inserted after the thread-spawning instruction spawn of the thread that contains the write instruction. Finally, the executable file produced by assembling and linking is loaded and run on a simulator, where the multithreaded program is speculatively executed in synchronized mode to produce the program's results with a higher speedup.

Description

Compiler-supported method and device for synchronized execution of memory data in speculative multithreading
Technical field
The invention belongs to the computer field and relates to thread-level parallelism and speculative multithreading, in particular to a compiler-supported method and device that synchronize memory data between speculatively executed threads.
Background art
Thread-level parallelism has become the main direction of development in parallel processing and parallel computing. On the one hand, multi-core processors are now in widespread use. On the other hand, chip multiprocessors, with their higher inter-core bandwidth and lower communication latency, provide a large platform for thread-level parallelism. However, the utilization of processor cores is often limited by many factors, and some parallel programming models place an extra burden on users and programmers. Speculative multithreading (SpMT, Speculative Multithreading) was proposed against this background.
In speculative multithreading, the compiler applies an aggressive strategy: it automatically partitions a sequential program written in a high-level language into a multithreaded program and allows some data dependences to exist between threads. When multiple threads run in parallel, hardware dynamically detects inter-thread data-dependence violations, buffers speculative data, and maintains data consistency; mechanisms such as thread commit, squash, and validation guarantee that the parallel execution is correct.
The degree of inter-thread data dependence is one of the key factors in the efficiency of speculative multithreading. Inter-thread data dependences consist mainly of register dependences and memory dependences, and both can cause read-after-write (RAW) violations at run time. A RAW memory-dependence violation at run time directly causes a thread restart: the successor thread that read the wrong data, and all threads after it, are squashed. This wastes processor resources and limits the speedup of parallel execution.
There are two main techniques for handling data dependences: value prediction and data synchronization. Register dependences can be resolved with conventional compile-time data-flow analysis, such as reaching-definitions analysis and live-variable analysis, together with value prediction. Memory dependences, however, usually cannot be determined precisely by static data-flow analysis. As a result, value prediction and data synchronization cannot effectively resolve inter-thread memory dependences.
Summary of the invention
The problem addressed by the present invention is to provide a compiler-supported method and device for synchronized execution of memory data in speculative multithreading, which synchronize selected memory read/write instructions at run time, reduce the number of RAW data-dependence violations, and improve the speedup of parallel execution of multithreaded programs.
The present invention is realized by the following technical scheme:
A compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following steps:
1) The thread partitioner divides the program into a multithreaded assembly program, which serves as the input to the hardware profiler and is run on it;
2) While running the program, the hardware profiler collects statistics on memory-dependence violations and feeds them back to the synchronization collector;
3) The synchronization collector reads the statistics and, by computing the synchronization cost, obtains candidate read/write instruction pairs for multithreaded synchronization. After data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronizing read instruction synload, inserts one or more forward instructions after the write instruction, and inserts a synset instruction after the thread-spawning instruction spawn in the thread that contains the write instruction. Finally, the executable file produced by assembling and linking is loaded and run on the simulator;
When the synchronized child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronizing parent thread has executed the matching write instruction and forward instruction. It then receives the synchronization message, containing the memory address and the written data, sent by the parent thread. Based on this message and on whether the child thread itself has already written the same memory address, it sets or resets use_flag in the Sbuffer, which determines whether the data sent by the parent thread is used. The child thread then ends the stall and continues executing subsequent instructions;
After the synchronization message is received, the remaining operations of the synchronized child thread are performed by the synload instruction; the synchronizing parent thread sends the synchronization message after writing the data, by means of the forward instruction;
4) The multithreaded program is speculatively executed in synchronized mode, and the simulator outputs the program's results.
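As a rough illustration of the rewrite in step 3), the following Python sketch applies it to one candidate pair on a toy instruction list. The Instr type, the instrument function, and its index-based interface are hypothetical. It assumes straight-line code, so a single forward directly after the write suffices; branching code needs a forward on every path to the thread exit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str            # opcode, e.g. "load", "store", "spawn" (hypothetical IR)
    args: tuple = ()


def instrument(child, parent, read_idx, write_idx):
    """Apply the step-3 rewrite to one candidate read/write pair.

    child:  instruction list of the thread containing the read
    parent: instruction list of the thread containing the write
    Returns new lists; the inputs are left unchanged.
    """
    new_child = list(child)
    rd = new_child[read_idx]
    # Put a stall in front of the read and turn the read into a synload.
    new_child[read_idx:read_idx + 1] = [Instr("stall"), Instr("synload", rd.args)]

    new_parent = list(parent)
    wr = new_parent[write_idx]
    # Straight-line assumption: one forward right after the write.
    new_parent.insert(write_idx + 1, Instr("forward", wr.args))
    # synset goes right after the spawn (assumes the parent contains one).
    spawn_idx = next(i for i, ins in enumerate(new_parent) if ins.op == "spawn")
    new_parent.insert(spawn_idx + 1, Instr("synset"))
    return new_child, new_parent
```

For a parent `[spawn, store, add]` and a child `[add, load]`, the rewrite yields `[spawn, synset, store, forward, add]` and `[add, stall, synload]`.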
The collected memory-dependence-violation statistics include: the number of times each read/write instruction is executed and its PC value; the number of RAW memory-dependence violations incurred by each read/write instruction pair; and the time interval between the execution of the read and write instructions involved in each RAW violation. The statistics are written to a file and fed back to the synchronization collector.
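A minimal sketch of how such statistics could be gathered from a memory-access trace follows. The trace format, the thread_order numbering (lower means earlier in sequential program order), and the rule used to flag a violation (a store by an earlier thread to an address that a later thread has already loaded) are simplifying assumptions, not the profiler's actual hardware logic; for instance, the sketch ignores loads that a thread satisfies from its own earlier store.

```python
from collections import Counter, defaultdict


def profile(trace):
    """trace: iterable of (time, thread_order, op, pc, addr), op in {"load", "store"}.

    Returns (exec_count, violations, intervals):
      exec_count[pc]                 -> executions of the instruction at pc
      violations[(write_pc, read_pc)] -> RAW violations for that pair
      intervals[(write_pc, read_pc)]  -> time gaps between the read and the write
    """
    exec_count = Counter()
    violations = Counter()
    intervals = defaultdict(list)
    loads = []  # loads seen so far, in time order
    for time, order, op, pc, addr in sorted(trace):
        exec_count[pc] += 1
        if op == "load":
            loads.append((time, order, pc, addr))
        elif op == "store":
            # An earlier-ordered thread storing to an address that a
            # later-ordered thread already loaded: the load saw stale data.
            for t_ld, o_ld, pc_ld, a_ld in loads:
                if a_ld == addr and o_ld > order:
                    violations[(pc, pc_ld)] += 1
                    intervals[(pc, pc_ld)].append(time - t_ld)
    return exec_count, violations, intervals
```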
Computing the synchronization cost means estimating, for each read/write instruction pair, the potential speedup gain from synchronizing it across threads, and selecting the candidate pairs with large speedup potential.
The speedup potential is judged from the frequency with which the memory read/write instruction pair incurs data dependences, the code distance from the thread-body entry to the read instruction, and the code distance between the read instruction and the write instruction.
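One way to turn these three factors into a selection rule is sketched below. The PairStats fields and the combined score are hypothetical. The 15% minimum dependence frequency is the value reported in the experiments of this document; using the ratio of the entry-to-read distance to the read-to-write distance, with a minimum of 1.0, is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairStats:
    write_pc: int
    read_pc: int
    executions: int      # times the read instruction executed
    violations: int      # RAW violations observed for this pair
    entry_to_read: int   # code distance: thread-body entry -> read
    read_to_write: int   # code distance: read -> write


def select_candidates(pairs, min_freq=0.15, min_ratio=1.0):
    """Return pairs passing both thresholds, highest estimated benefit first."""
    chosen = []
    for p in pairs:
        freq = p.violations / p.executions if p.executions else 0.0
        # A large entry->read distance and a small read->write distance favour
        # synchronization; max(..., 1) avoids division by zero.
        ratio = p.entry_to_read / max(p.read_to_write, 1)
        if freq >= min_freq and ratio >= min_ratio:
            chosen.append((freq * ratio, p))
    chosen.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in chosen]
```

Frequency filters out pairs that would mostly cause unnecessary synchronization, while the distance ratio filters out pairs where the reader would stall for a long time.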
The data-flow analysis is performed on the program's control-flow graph to find the insertion points of the stall and forward instructions: the stall instruction is inserted before each read instruction that needs synchronization, and a forward instruction is inserted on every path between the write instruction that needs synchronization and the thread exit.
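The placement rule for forward instructions can be checked with a simple graph search: if removing every block that holds a forward leaves the thread exit unreachable from the write, then every path from the write to the exit passes a forward. The sketch below works at basic-block granularity and assumes that a forward listed for a block is placed after the write point; the CFG encoding and function name are hypothetical.

```python
from collections import deque


def forwards_cover_all_paths(cfg, write_block, exit_block, forward_blocks):
    """True if every CFG path from write_block to exit_block passes a forward.

    cfg: dict mapping a block to the list of its successor blocks.
    forward_blocks: set of blocks containing a forward after the write point.
    """
    if write_block in forward_blocks:
        return True
    seen = {write_block}
    queue = deque([write_block])
    while queue:
        block = queue.popleft()
        if block == exit_block:
            return False  # reached the exit without passing a forward
        for succ in cfg.get(block, []):
            if succ not in seen and succ not in forward_blocks:
                seen.add(succ)
                queue.append(succ)
    return True
```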
The synchronizing parent thread sends a synchronization message as follows:
(1) After the parent thread spawns the synchronized child thread, it executes the synset instruction, which sends a message to the child thread and sets the sync_flag bit of the child thread's Sbuffer;
(2) Execute store *p, a conventional memory write;
(3) Execute save_address(p), saving the written memory address p into the send address buffer;
(4) Execute receiver_id = get_immed_successor() to obtain the processor number of the child thread;
(5) Execute mesg.value = *p, writing the data stored by the write into the value field of the synchronization message mesg;
(6) Execute mesg.addr = p, writing the memory address p into the addr field of the synchronization message;
(7) Execute send(mesg, receiver_id), sending the assembled synchronization message to the child thread.
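The seven sender steps can be modeled in software as follows. This is a hypothetical Python model, not the simulator's implementation. The shared memory dict stands in for the parent's (speculative) memory, a Queue per processor number stands in for the on-chip message channel, and get_immed_successor() is replaced by a stored successor_id.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Message:
    addr: int
    value: int


class ParentThread:
    """Sender side: synset, then store followed by forward (hypothetical model)."""

    def __init__(self, memory, mailboxes, successor_id):
        self.memory = memory            # addr -> value (simplified memory)
        self.mailboxes = mailboxes      # processor number -> Queue of Message
        self.successor_id = successor_id
        self.send_addresses = []        # send address buffer

    def synset(self, child_sbuffer):
        child_sbuffer["sync_flag"] = True              # step (1)

    def store_and_forward(self, p, value):
        self.memory[p] = value                         # step (2): store *p
        self.send_addresses.append(p)                  # step (3): save_address(p)
        receiver_id = self.successor_id                # step (4): get_immed_successor()
        mesg = Message(addr=p, value=self.memory[p])   # steps (5)-(6)
        self.mailboxes[receiver_id].put(mesg)          # step (7): send(mesg, receiver_id)
```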
The synchronized child thread receives a synchronization message as follows:
(1) Execute stall(): check whether the ready_flag bit of the child thread's Sbuffer is set; if not, the child thread stays blocked; if so, it leaves the blocked state and continues with the subsequent operations;
(2) Execute receive(mesg); once the synchronization message is received, the ready_flag bit of the Sbuffer is set;
(3) Execute message_addr = mesg.addr, storing the memory address of the synchronization message in the message_addr field of the Sbuffer;
(4) Execute message_value = mesg.value, storing the memory data value of the synchronization message in the message_value field of the Sbuffer;
(5) Execute verify(q, message_addr): check whether the memory address in the received synchronization message is the same as the address q that the read instruction is about to read; if so, set the use_flag bit of the Sbuffer;
(6) Execute loaded_value = load *q, a conventional memory read, placing the data read into loaded_value. During the read, check whether the current thread has itself written data to address q; if so, reset the use_flag bit of the Sbuffer;
(7) Execute result = choose(loaded_value, message_value): if the use_flag bit of the Sbuffer is set, the child thread uses the memory data value in the synchronization message as the final result of the synchronized read; if use_flag is reset, it uses loaded_value as the final result;
(8) Execute quit(), resetting the ready_flag and use_flag bits of the Sbuffer.
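A corresponding model of the receiver side, covering steps (1) to (8) and the conventional-mode behavior when sync_flag is reset, is sketched below. The blocking Queue.get() stands in for stall(), own_writes stands in for the child thread's own speculative writes, and unwritten addresses read as 0; all of these are modeling assumptions.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Message:
    addr: int
    value: int


@dataclass
class Sbuffer:
    sync_flag: bool = False
    ready_flag: bool = False
    use_flag: bool = False
    message_addr: int = 0
    message_value: int = 0


def synload(q, sb, mailbox, memory, own_writes):
    """Child side of a synchronized read of address q (hypothetical model)."""
    if not sb.sync_flag:
        # Conventional mode: stall is skipped, synload acts as a plain load.
        return own_writes.get(q, memory.get(q, 0))
    mesg = mailbox.get()                    # (1)-(2) stall until the message arrives
    sb.ready_flag = True
    sb.message_addr = mesg.addr             # (3)
    sb.message_value = mesg.value           # (4)
    sb.use_flag = (q == sb.message_addr)    # (5) verify(q, message_addr)
    # (6) conventional load; the thread's own earlier write takes precedence
    loaded_value = own_writes.get(q, memory.get(q, 0))
    if q in own_writes:
        sb.use_flag = False
    result = sb.message_value if sb.use_flag else loaded_value   # (7) choose
    sb.ready_flag = False                   # (8) quit
    sb.use_flag = False
    return result
```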
A device for carrying out the compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following modules:
Thread partitioner: divides the program into a multithreaded assembly program, which serves as the input to the hardware profiler and is run on it;
Hardware profiler: while running the program, collects statistics on memory-dependence violations and feeds them back to the synchronization collector;
Synchronization collector: reads the statistics and, by computing the synchronization cost, obtains candidate read/write instruction pairs for multithreaded synchronization. After data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronizing read instruction synload, inserts one or more forward instructions after the write instruction, and inserts a synset instruction after the thread-spawning instruction spawn in the thread that contains the write instruction;
Assembler/linker: assembles and links the assembly program into a binary executable file;
Simulator: loads and runs the executable file, and includes extended support for memory read and memory write instructions under the synchronization mechanism;
Reading memory data under the synchronization mechanism: when the synchronized child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronizing parent thread has executed the matching write instruction and forward instruction. It then receives the synchronization message, containing the memory address and the written data, sent by the parent thread. Based on this message and on whether the child thread itself has already written the same memory address, it sets or resets use_flag in the Sbuffer, which determines whether the data sent by the parent thread is used. The child thread then ends the stall and continues executing subsequent instructions;
Writing and sending data under the synchronization mechanism: when the synchronizing parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write is saved in the send address buffer, and the memory address and written data are sent to the synchronized child thread as a message;
The multithreaded program is speculatively executed in synchronized mode, and the simulator outputs the program's results.
The send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
The Sbuffer is a register file used to receive synchronization messages and to control synchronized operation. It consists of five fields: three control flags, sync_flag, ready_flag, and use_flag, and two message fields, message_addr and message_value. The thread processing unit resets all Sbuffer flags at initialization.
sync_flag indicates whether execution follows the synchronization method. When this flag is set, the thread executes synchronously. When it is reset, the thread executes conventionally: the stall instruction is skipped, the synload instruction behaves exactly like a conventional load instruction, and the data in the message fields are invalid;
ready_flag indicates whether a synchronization message is ready; when this flag is set, the message is ready;
use_flag indicates whether the child thread uses the data in the message: when this flag is set, the child thread uses the message data; when it is reset, the child thread obtains the memory data with a conventional read;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
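The Sbuffer fields and flag semantics above map onto a small record. The sketch below is a hypothetical Python model; treating the message fields as valid only when both sync_flag and ready_flag are set is an inference from the descriptions of those two flags.

```python
from dataclasses import dataclass


@dataclass
class Sbuffer:
    sync_flag: bool = False    # execute according to the synchronization method
    ready_flag: bool = False   # a synchronization message is ready
    use_flag: bool = False     # use the message data instead of a normal load
    message_addr: int = 0      # addr field of the last message
    message_value: int = 0     # value field of the last message

    def reset(self):
        """Initialization: the thread processing unit clears all flags."""
        self.sync_flag = self.ready_flag = self.use_flag = False

    def stall_skipped(self):
        # With sync_flag reset, stall is skipped and synload acts as a load.
        return not self.sync_flag

    def message_valid(self):
        # Assumed: message fields are meaningful only in synchronized mode
        # and once a message is ready.
        return self.sync_flag and self.ready_flag
```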
Compared with the prior art, the present invention has the following beneficial technical effects:
The compiler-supported method and device for synchronized execution of memory data in speculative multithreading provided by the invention synchronize selected memory read/write instructions at run time, reduce the number of RAW data-dependence violations, and improve the speedup of parallel execution of multithreaded programs.
The candidate read/write instruction pairs for multithreaded synchronization are obtained through the hardware profiler, which collects memory-dependence-violation statistics while running the program and feeds them back to the synchronization collector; the collector then obtains the candidate pairs by computing the synchronization cost. The main purpose of running the program on the hardware profiler is to capture its dynamic memory-access behavior; the program runs in a lightweight mode, so little extra overhead is incurred. Candidate pairs with large speedup potential are selected because synchronizing memory read/write pairs that frequently incur data dependences tends to bring a larger speedup gain, whereas adding synchronization instructions to pairs that rarely incur data dependences tends to cause unnecessary synchronization or over-synchronization at run time. In addition, within a thread, the larger the code distance from the thread-body entry to the candidate read instruction, and the smaller the code distance from the candidate read instruction to the candidate write instruction, the more pronounced the benefit of synchronization.
To reduce the number of RAW data-dependence violations, the present invention inserts a stall instruction before the read instruction. After executing the stall instruction, the synchronized child thread blocks and waits for the memory data it needs from the synchronizing parent thread. One or more forward instructions are inserted after the write instruction. After the synchronization message is received, the child thread's remaining operations are performed by the synload instruction, and the parent thread sends the synchronization message after writing the data by means of the forward instruction;
When the synchronized child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronizing parent thread has executed the matching write instruction and forward instruction. It then receives the synchronization message, containing the memory address and the written data, sent by the parent thread. Based on this message and on whether the child thread itself has already written the same memory address, it sets or resets use_flag in the Sbuffer, which determines whether the data sent by the parent thread is used. The child thread then ends the stall and continues executing subsequent instructions;
With the synchronized execution method of the invention, after data synchronization the reading thread obtains correct memory data and is not squashed because of a RAW violation. The work completed by the processor core before the stall instruction is preserved, which reduces the processor resources wasted by thread restarts and squashes.
Besides the hardware profiler, which provides the statistics for selecting candidate read/write pairs, the functional modules newly added to the simulator to support synchronized operation between the child thread and the parent thread are mainly a send address buffer and a local register file, the Sbuffer.
The send address buffer is a very small data buffer that records the addresses of all data sent by this thread's forward instructions. The local register file Sbuffer receives synchronization messages and controls synchronized operation.
The threshold on the frequency of data dependences between memory read/write instructions can be set to an empirical value obtained from extensive experiments. The invention selected 7 test programs from the Olden benchmark suite, partitioned them into threads with the Prophet thread-partitioning method, and executed them speculatively on a 4-core simulator, obtaining an average speedup improvement of 34%. With the synchronization method, the minimum data-dependence frequency was set to 15% and the minimum ratio of the code distance from the thread-body entry to the read instruction to the code distance from the write instruction to the thread exit was set to 1, and the selected memory read/write instructions were synchronized. Compared with unsynchronized speculative execution, the speedup improved by an average of 7%. The synchronization method of the invention therefore noticeably improves the speedup of speculative execution.
Brief description of the drawings
Fig. 1 is a flow diagram of the compiler-supported method for synchronized execution of memory data in speculative multithreading.
Fig. 2 illustrates the synchronized operation of a read/write instruction pair between the synchronizing parent thread and the synchronized child thread.
Fig. 3 is an architecture diagram of the multi-processing-unit (multi-core) system that executes the multithreaded program with synchronized memory data.
Fig. 4 shows the design of the Sbuffer.
Fig. 5 illustrates the synchronized read operation performed by the child thread.
Detailed description of embodiments
The invention is described in further detail below with reference to the drawings. The description is illustrative and does not limit the invention.
Referring to Fig. 1, the compiler-supported speculative multithreading system consists mainly of two parts, a compiler and a simulator. The hardware profiler may be a module obtained by extending the simulator with a memory read/write instruction statistics module. While the program runs, the hardware profiler records the history of read/write accesses and the memory-dependence information, and the assembler/linker locates the read/write instructions by their PC values. The main purpose of running the program on the hardware profiler is to capture its dynamic memory-access behavior; the program runs in a lightweight mode, so little extra overhead is incurred.
The compiler-supported method for synchronized execution of memory data in speculative multithreading comprises the following steps:
1) The thread partitioner divides the program into a multithreaded assembly program, which serves as the input to the hardware profiler and is run on it;
2) While running the program, the hardware profiler collects statistics on memory-dependence violations and feeds them back to the synchronization collector;
3) The synchronization collector reads the statistics and, by computing the synchronization cost, obtains candidate read/write instruction pairs for multithreaded synchronization. After data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronizing read instruction synload, inserts one or more forward instructions after the write instruction, and inserts a synset instruction after the thread-spawning instruction spawn in the thread that contains the write instruction. Finally, the executable file produced by assembling and linking is loaded and run on the simulator;
When the synchronized child thread reaches the stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronizing parent thread has executed the matching write instruction and forward instruction. It then receives the synchronization message, containing the memory address and the written data, sent by the parent thread. Based on this message and on whether the child thread itself has already written the same memory address, it sets or resets use_flag in the Sbuffer, which determines whether the data sent by the parent thread is used. The child thread then ends the stall and continues executing subsequent instructions;
After the synchronization message is received, the remaining operations of the synchronized child thread are performed by the synload instruction; the synchronizing parent thread sends the synchronization message after writing the data, by means of the forward instruction;
4) The multithreaded program is speculatively executed in synchronized mode, and the simulator outputs the program's results.
The synchronization cost computed by the synchronization collector in step 3) is the estimated speedup gain obtainable by synchronizing each read/write instruction pair across threads, and the candidate pairs with large speedup potential are selected. The read/write instruction pairs with large speedup potential are characterized as follows:
For memory read/write instruction pairs that frequently incur data dependences, synchronization tends to bring a larger speedup gain; for memory read/write instruction pairs that rarely incur data dependences, adding synchronization instructions tends to cause unnecessary synchronization or over-synchronization at run time;
Besides the frequency of data dependences, code distances such as that from the thread entry to the write instruction are also considered when estimating the synchronization cost. Within a thread, the larger the code distance from the thread-body entry to the candidate read instruction, and the smaller the code distance from the candidate read instruction to the candidate write instruction, the more pronounced the benefit of synchronization. Taking these factors into account, the read/write instruction pairs with the greatest speedup potential are selected.
The data-dependence frequency of memory read/write instructions, the code distance from the thread-body entry to the read instruction, and the code distance from the write instruction to the thread exit all depend on the specific program being executed. The method determines suitable thresholds from experimental data and uses them as the synchronization criteria for memory read/write instructions.
The data-flow analysis performed by the synchronization collector is carried out on the program's control-flow graph to find the insertion points of the stall and forward instructions: the stall instruction is inserted before each read instruction that needs synchronization, and a forward instruction is inserted on every path between the write instruction that needs synchronization and the thread exit.
Based on said method, finish the embodiment of the device of the speculative multithreading memory data synchronous manner of execution under the compiler support, comprise with lower module:
Thread divider, procedure division generated the multithreading assembly routine after, as the input of hardware program parser, and operation thereon;
Hardware program parser, hardware program parser are added up internal storage data and are relied in violation of rules and regulations situation when working procedure, and statistics information is fed back to synchronous collector;
Synchronization collector: the synchronization collector reads the profiling statistics and computes a synchronization cost to select candidate read/write instruction pairs for multithreaded synchronization. Then, guided by data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronized read instruction synload, inserts one or more forward instructions after the write instruction, and inserts a synset instruction after the thread-spawn instruction spawn of the thread that contains the write instruction;
Assembler/linker: assembles and links the assembly program into a binary executable;
Simulator: loads and runs the executable, and provides extended support for memory read and memory write instructions under the synchronization mechanism;
Obtaining memory data under the synchronization mechanism: when the synchronized child thread reaches a stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction and the child has received the synchronization message, containing the memory address and the written data, sent by the parent. Then, based on the received synchronization information and on whether this child thread has previously written to the same memory address, use_flag in Sbuffer is set or reset; this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues executing subsequent instructions;
Writing and sending data under the synchronization mechanism: once the synchronized parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write is saved in the send address buffer, and the memory address and the written data are sent to the synchronized child thread as a synchronization message;
The multithreaded program is executed speculatively in synchronized mode, and the simulator outputs the program's results.
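The instrumentation performed by the synchronization collector can be illustrated with a toy pass over a straight-line instruction list. This is a minimal sketch, not the patent's compiler: the Op enum, the Insn struct, and instrument() are hypothetical; control flow is ignored, so each forward is placed immediately after its store instead of on every path to the thread exit as data-flow analysis would determine; and, for simplicity, the load and store sit in one code region, as when successive threads run the same loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; only those the pass inspects or emits are modeled. */
typedef enum {
    OP_OTHER, OP_LOAD, OP_STORE, OP_SPAWN,
    OP_STALL, OP_SYNLOAD, OP_FORWARD, OP_SYNSET
} Op;

typedef struct {
    Op op;
    int addr_reg; /* register holding the memory address, or -1 if none */
} Insn;

/* Instruments one candidate pair in a straight-line instruction list.
 * read_idx and write_idx index into `in`; `out` needs room for 2 * n
 * entries because each input instruction yields at most two outputs.
 * Returns the number of instructions written to `out`. */
size_t instrument(const Insn *in, size_t n, size_t read_idx,
                  size_t write_idx, Insn *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == read_idx) {
            assert(in[i].op == OP_LOAD);
            out[k++] = (Insn){ OP_STALL, -1 };               /* wait for the message */
            out[k++] = (Insn){ OP_SYNLOAD, in[i].addr_reg }; /* replaces the load */
            continue;
        }
        out[k++] = in[i];
        if (i == write_idx) {
            assert(in[i].op == OP_STORE);
            out[k++] = (Insn){ OP_FORWARD, in[i].addr_reg }; /* send addr + value */
        } else if (in[i].op == OP_SPAWN) {
            out[k++] = (Insn){ OP_SYNSET, -1 }; /* set the child's sync_flag */
        }
    }
    return k;
}
```

For a program spawn; load; other; store, the pass yields spawn; synset; stall; synload; other; store; forward.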
The send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
Referring to Fig. 3, this embodiment shows a multi-processing-unit (multi-core) architecture connected by an on-chip bus; the simulator is implemented according to this architecture. Each processing unit contains a hardware program profiler that collects statistics on memory data-dependence violations and, to support synchronized execution of child and parent threads, two extended functional modules: the send address buffer (FAB) and the local register file Sbuffer. In addition, the thread controller, the speculative data cache, and the thread master control are the modules added to support speculative multithreading; they are identical or equivalent to those of a conventional speculative-multithreading chip multiprocessor. The processing units are connected by the on-chip bus, and the multithreaded program is executed speculatively, in synchronized mode, across them.
In the embodiment shown in Fig. 3, the hardware program profiler of each processing unit consists of an in-core profiling controller and a profiling statistics buffer.
The send address buffer is a very small data buffer that records the addresses of all data sent by the current thread's forward instructions. Its main purpose is to record the memory addresses of the synchronization data that has been sent. As the synchronized writing thread continues to run, each subsequent write (store operation) compares its address one by one with the addresses stored in the send address buffer to check whether the same memory address is already recorded. If a matching address is found, the data already sent to the synchronized reading thread is no longer the latest value; in this case the synchronized writing thread sends a message to the synchronized reading thread to restart it. After writing the address into the send address buffer, the synchronized writing thread sends the memory address and the written data to the synchronized reading thread as a message. Statistics from the hardware program profiler show that a thread needs to synchronize at most 8 data items, so the send address buffer can be very small and adds little run-time overhead to the system.
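The send address buffer's two duties, recording forwarded addresses and flagging later stores to an already-forwarded address, can be sketched in C. This is a minimal sketch under stated assumptions: the Fab type and both function names are hypothetical, addresses are modeled as uintptr_t, and the capacity of 8 follows the profiler observation above. The text does not say what happens when the buffer is full, so fab_record simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAB_CAPACITY 8 /* profiler: at most 8 synchronized items per thread */

typedef struct {
    uintptr_t addr[FAB_CAPACITY];
    int count;
} Fab;

/* save_address(p): remember the address of a value just forwarded.
 * Returns false when full; the text does not specify this case. */
bool fab_record(Fab *fab, uintptr_t addr) {
    if (fab->count == FAB_CAPACITY) return false;
    fab->addr[fab->count++] = addr;
    return true;
}

/* Checked on every later store by the writing thread. A hit means the
 * forwarded value is stale and the reading thread must be restarted. */
bool fab_store_needs_restart(const Fab *fab, uintptr_t store_addr) {
    assert(fab->count >= 0 && fab->count <= FAB_CAPACITY);
    for (int i = 0; i < fab->count; i++) {
        if (fab->addr[i] == store_addr) return true;
    }
    return false;
}
```

A linear scan is adequate here because the buffer never holds more than a handful of entries.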
Sbuffer is a register file used to receive synchronization messages and control synchronized execution. It is private to each processor core. Referring to the Sbuffer design shown in Fig. 4, Sbuffer consists of five fields: three control flags, sync_flag, ready_flag, and use_flag, and two message fields, message_addr and message_value. The thread processing unit resets all Sbuffer flags at initialization.
sync_flag indicates whether the thread executes in synchronized mode. When set, the thread executes in synchronized mode; when reset, it executes in the conventional way: the stall instruction is skipped, synload behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether a synchronization message is ready; when set, a synchronization message is ready;
use_flag indicates whether the synchronized child thread uses the data in the message. When set, the child uses the message data; when reset, the child obtains the memory data with a conventional read operation;
message_addr stores the addr field of the message;
message_value stores the value field of the message.
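The five Sbuffer fields map naturally onto a small C struct. This is a sketch under assumptions: the type names are hypothetical, the data value is assumed to be a 64-bit word, and sbuffer_message_valid is a hypothetical helper that encodes the rule that message fields are meaningful only in synchronized mode once a message has arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-core Sbuffer: three control flags and two message fields. */
typedef struct {
    bool sync_flag;         /* set by the parent's synset: synchronized mode */
    bool ready_flag;        /* a synchronization message has arrived */
    bool use_flag;          /* use message_value instead of the loaded value */
    uintptr_t message_addr; /* addr field of the received message */
    int64_t message_value;  /* value field of the received message */
} Sbuffer;

/* The thread processing unit resets all flags at initialization. */
void sbuffer_init(Sbuffer *sb) {
    assert(sb != NULL);
    sb->sync_flag = false;
    sb->ready_flag = false;
    sb->use_flag = false;
    sb->message_addr = 0;
    sb->message_value = 0;
}

/* With sync_flag reset, stall is skipped and synload acts as a plain
 * load, so the message fields must be ignored. */
bool sbuffer_message_valid(const Sbuffer *sb) {
    return sb->sync_flag && sb->ready_flag;
}
```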
The following gives one implementation of the concrete operations by which a candidate read/write instruction pair is synchronized between the synchronized parent thread and the synchronized child thread. The synset instruction is a newly added auxiliary instruction for synchronized execution; its role is to ensure that both the synchronized parent thread and the synchronized child thread exist when data synchronization takes place. After a thread executes a spawn instruction to launch a new thread, executing synset sets the sync_flag bit of the child thread's Sbuffer. A stall instruction executed by a thread enters the waiting state only after the sync_flag bit of that thread's Sbuffer has been set; otherwise the stall instruction is skipped and performs no operation.
The steps by which the synchronized parent thread sends a synchronization message are:
(1) After the synchronized parent thread spawns the synchronized child thread, the parent executes the synset instruction, sending a message to the child thread that sets the sync_flag bit of the child's Sbuffer;
(2) Execute store *p, a conventional memory write;
(3) Execute save_address(p), saving the memory address p of the written data in the send address buffer;
(4) Execute receiver_id = get_immed_successor(), obtaining the processor number of the synchronized child thread;
(5) Execute mesg.value = *p, writing the data stored by the memory write into the value field of the synchronization message mesg;
(6) Execute mesg.addr = p, writing the memory address p of the written data into the addr field of the synchronization message;
(7) Execute send(mesg, receiver_id), sending the constructed synchronization message to the synchronized child thread.
For a synchronized write operation, the hardware performs the conventional memory write when the store instruction executes. After completing the write, the synchronized writing thread stores the write's memory address in the FAB. The remaining operations after the store are performed by the forward instruction, whose function is to send the value written by the memory write instruction, together with the corresponding memory address, to the child thread. The synchronization message sent by the synchronized parent thread contains two fields, addr and value: addr holds the memory address and value holds the written data.
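The parent-side steps (1) to (7) can be sketched as two C functions. This is a sketch under assumptions, not the simulator's implementation: the single-slot ChildCore mailbox stands in for the on-chip bus, get_immed_successor is reduced to "the one child passed in", the FAB overflow case is treated as impossible per the at-most-8 observation, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAB_CAPACITY 8

typedef struct { uintptr_t addr; int64_t value; } Mesg;
/* Hypothetical stand-in for the child core: its sync_flag and a one-slot mailbox. */
typedef struct { bool sync_flag; bool has_mesg; Mesg mesg; } ChildCore;
typedef struct { uintptr_t addr[FAB_CAPACITY]; int count; } Fab;

/* Step (1): synset, executed right after spawn. */
void synset(ChildCore *child) { child->sync_flag = true; }

/* Step (2) is the store; steps (3)-(7) are the forward instruction's work. */
void store_and_forward(int64_t *p, int64_t v, Fab *fab, ChildCore *child) {
    *p = v;                                 /* (2) store *p */
    assert(fab->count < FAB_CAPACITY);      /* assumed never to overflow */
    fab->addr[fab->count++] = (uintptr_t)p; /* (3) save_address(p) */
    ChildCore *receiver = child;            /* (4) get_immed_successor() */
    Mesg mesg;
    mesg.value = *p;                        /* (5) mesg.value = *p */
    mesg.addr = (uintptr_t)p;               /* (6) mesg.addr = p */
    receiver->mesg = mesg;                  /* (7) send(mesg, receiver_id) */
    receiver->has_mesg = true;
}
```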
Referring to Fig. 5, the steps by which the synchronized child thread receives a synchronization message are:
(1) Execute stall(), checking whether the ready_flag bit of the synchronized child thread's Sbuffer is set. If not, the child thread remains blocked; if so, the child thread leaves the blocked state and continues with subsequent operations;
(2) Execute receive(mesg); after the synchronization message is received, the ready_flag bit of Sbuffer is set;
(3) Execute message_addr = mesg.addr, storing the memory address from the synchronization message in the message_addr field of Sbuffer;
(4) Execute message_value = mesg.value, storing the memory data value from the synchronization message in the message_value field of Sbuffer;
(5) Execute verify(q, message_addr), checking whether the memory address in the received synchronization message is the same as the memory address q to be read by the read instruction; if so, set the use_flag bit of Sbuffer;
(6) Execute loaded_value = load *q, a conventional memory read, placing the value read into loaded_value. During the read, check whether the current thread has already written data to memory address q; if so, reset the use_flag bit of Sbuffer;
(7) Execute result = choose(loaded_value, message_value): if the use_flag bit of Sbuffer is set, the synchronized child thread uses the memory data value from the synchronization message as the final result of the synchronized read; if use_flag is reset, it uses loaded_value as the final result;
(8) Execute quit(), resetting the ready_flag and use_flag bits of Sbuffer.
For a synchronized read operation, the synchronized reading thread first executes the stall instruction and enters the waiting state (the ready_flag bit of Sbuffer is not yet set, so the synchronized child thread is blocked), waiting for the synchronized parent thread to send a message. When the message arrives, the ready_flag bit of the child thread's Sbuffer is set and the wait ends. The verify() operation then checks whether the address the upcoming read will access equals the memory address carried in the message; if so, the use_flag bit of Sbuffer is set. The child thread next reads a memory value with a conventional load. During this read, the processor checks whether it has previously written to the same address; if it has, the use_flag bit of Sbuffer is reset so that stale data does not replace the thread's own newer data. Finally, the choose() operation selects the appropriate value as the result of the synchronized read.
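The child-side steps (1) to (8) can be sketched as follows. This is a sketch under assumptions: stall_must_wait, receive, and synload are hypothetical C functions modeling the instruction semantics; the boolean self_wrote_q stands in for the hardware check of whether this thread already stored to q; and verify is applied before the load, in the order of the numbered steps, so that the thread's own write always overrides a forwarded value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uintptr_t addr; int64_t value; } Mesg;
typedef struct {
    bool sync_flag, ready_flag, use_flag;
    uintptr_t message_addr;
    int64_t message_value;
} Sbuffer;

/* Step (1): stall() blocks only in synchronized mode, until a message arrives. */
bool stall_must_wait(const Sbuffer *sb) {
    return sb->sync_flag && !sb->ready_flag;
}

/* Steps (2)-(4): receive(mesg) fills the message fields and sets ready_flag. */
void receive(Sbuffer *sb, Mesg mesg) {
    sb->ready_flag = true;
    sb->message_addr = mesg.addr;
    sb->message_value = mesg.value;
}

/* Steps (5)-(8) of a synload from address q. */
int64_t synload(Sbuffer *sb, const int64_t *q, bool self_wrote_q) {
    if (!sb->sync_flag) return *q;                 /* conventional load */
    assert(sb->ready_flag);                        /* stall() has completed */
    if (sb->message_addr == (uintptr_t)q)          /* (5) verify(q, message_addr) */
        sb->use_flag = true;
    int64_t loaded_value = *q;                     /* (6) load *q */
    if (self_wrote_q)
        sb->use_flag = false;                      /* own newer write wins */
    int64_t result = sb->use_flag ? sb->message_value  /* (7) choose */
                                  : loaded_value;
    sb->ready_flag = false;                        /* (8) quit() */
    sb->use_flag = false;
    return result;
}
```

If the child has not written q, the forwarded value is returned; if it has, or if the message addresses a different location, the loaded value is returned.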
Using the synchronization operations above, seven test programs were selected from the Olden benchmark suite: mst, bh, tsp, em3d, bisort, voronoi, and health. Threads were partitioned with the Prophet thread-partitioning method, and the programs were executed speculatively on a 4-core simulator, yielding an average speedup improvement of 34%.
With the minimum data-dependence frequency set to 15%, and the minimum ratio of the code distance from the thread-body entry to the read instruction to the code distance from the write instruction to the thread exit set to 1, memory read/write instructions were selected for synchronization. Compared with non-synchronized speculative execution, the speedup improved by 7% on average. The synchronization method of the invention therefore clearly improves the speedup of speculative execution.
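The selection thresholds used in this experiment can be expressed as a simple filter. This is a sketch under assumptions: PairProfile and is_sync_candidate are hypothetical, the ratio is read as (entry-to-read distance) / (write-to-exit distance), which is one interpretation of the sentence above (claim 4 instead names the read-to-write distance), and the comparison is written as a multiplication to avoid dividing by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pair profile summary. */
typedef struct {
    double dep_frequency; /* fraction of executions with a RAW dependence */
    size_t entry_to_read; /* code distance: thread-body entry -> read */
    size_t write_to_exit; /* code distance: write -> thread exit */
} PairProfile;

#define MIN_DEP_FREQUENCY 0.15
#define MIN_DISTANCE_RATIO 1.0

/* A pair qualifies if it conflicts often enough and the assumed
 * distance ratio entry_to_read / write_to_exit is at least 1. */
bool is_sync_candidate(const PairProfile *p) {
    assert(p != NULL);
    if (p->dep_frequency < MIN_DEP_FREQUENCY) return false;
    return (double)p->entry_to_read >=
           MIN_DISTANCE_RATIO * (double)p->write_to_exit;
}
```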

Claims (7)

1. A speculative multithreading memory data synchronized execution method under compiler support, characterized in that memory data synchronization for a multithreaded program comprises the following steps:
1) After the program is partitioned by the hardware thread partitioner into a multithreaded assembly program, the assembly program serves as input to the hardware program profiler and is run on it;
2) While running the program, the hardware program profiler collects statistics on memory data-dependence violations and feeds the statistics back to the synchronization collector;
3) The synchronization collector reads the statistics and computes a synchronization cost to obtain candidate read/write instruction pairs for multithreaded synchronization. Then, guided by data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronized read instruction synload, and inserts one or more forward instructions after the write instruction; it also inserts a synset instruction after the thread-spawn instruction spawn of the thread that contains the write instruction. Finally, the executable generated by assembling and linking is loaded and run on the simulator;
When the synchronized child thread reaches a stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction and the child has received the synchronization message, containing the memory address and the written data, sent by the parent. Then, based on the received synchronization information and on whether this child thread has previously written to the same memory address, use_flag in Sbuffer is set or reset; this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues executing subsequent instructions;
The operations performed by the synchronized child thread after receiving the synchronization message are carried out by the synload instruction, and the operations by which the synchronized parent thread sends the synchronization message after writing the data are carried out by the forward instruction;
4) The multithreaded program is executed speculatively in synchronized mode, and the simulator outputs the program's results;
The steps by which the synchronized parent thread sends a synchronization message are:
(1) After the synchronized parent thread spawns the synchronized child thread, the parent executes the synset instruction, sending a message to the child thread that sets the sync_flag bit of the child's Sbuffer;
(2) Execute store *p, a conventional memory write;
(3) Execute save_address(p), saving the memory address p of the written data in the send address buffer;
(4) Execute receiver_id = get_immed_successor(), obtaining the processor number of the synchronized child thread;
(5) Execute mesg.value = *p, writing the data stored by the memory write into the value field of the synchronization message mesg;
(6) Execute mesg.addr = p, writing the memory address p of the written data into the addr field of the synchronization message;
(7) Execute send(mesg, receiver_id), sending the constructed synchronization message to the synchronized child thread;
The steps by which the synchronized child thread receives a synchronization message are:
(1) Execute stall(), checking whether the ready_flag bit of the synchronized child thread's Sbuffer is set. If not, the child thread remains blocked; if so, the child thread leaves the blocked state and continues with subsequent operations;
(2) Execute receive(mesg); after the synchronization message is received, the ready_flag bit of Sbuffer is set;
(3) Execute message_addr = mesg.addr, storing the memory address from the synchronization message in the message_addr field of Sbuffer;
(4) Execute message_value = mesg.value, storing the memory data value from the synchronization message in the message_value field of Sbuffer;
(5) Execute verify(q, message_addr), checking whether the memory address in the received synchronization message is the same as the memory address q to be read by the read instruction; if so, set the use_flag bit of Sbuffer;
(6) Execute loaded_value = load *q, a conventional memory read, placing the value read into loaded_value. During the read, check whether the current thread has already written data to memory address q; if so, reset the use_flag bit of Sbuffer;
(7) Execute result = choose(loaded_value, message_value): if the use_flag bit of Sbuffer is set, the synchronized child thread uses the memory data value from the synchronization message as the final result of the synchronized read; if use_flag is reset, it uses loaded_value as the final result;
(8) Execute quit(), resetting the ready_flag and use_flag bits of Sbuffer;
Sbuffer is a register file used to receive synchronization messages and control synchronized execution; Sbuffer consists of five fields: three control flags, sync_flag, ready_flag, and use_flag, and two message fields, message_addr and message_value; the thread processing unit resets all Sbuffer flags at initialization;
sync_flag indicates whether the thread executes in synchronized mode. When set, the thread executes in synchronized mode; when reset, it executes in the conventional way: the stall instruction is skipped, synload behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether a synchronization message is ready; when set, a synchronization message is ready;
use_flag indicates whether the synchronized child thread uses the data in the message. When set, the child uses the message data; when reset, the child obtains the memory data with a conventional read operation;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
2. The speculative multithreading memory data synchronized execution method under compiler support according to claim 1, characterized in that the collected statistics on memory data-dependence violations comprise: the number of times each read/write instruction is executed and its PC value; the number of read-after-write memory data-dependence violations incurred by each read/write instruction pair; and the time interval between the executions of the read and write instructions involved in a read-after-write violation. The statistics are written to a file and fed back to the synchronization collector.
3. The speculative multithreading memory data synchronized execution method under compiler support according to claim 1, characterized in that computing the synchronization cost means computing, for each read/write instruction pair, the potential speedup improvement obtained by synchronizing it across threads, and selecting the candidate read/write instruction pairs with large speedup-improvement potential.
4. The speculative multithreading memory data synchronized execution method under compiler support according to claim 3, characterized in that the speedup-improvement potential is judged from the frequency with which the memory read/write instruction pair incurs a data dependence, the code distance from the thread-body entry to the read instruction, and the code distance from the read instruction to the write instruction.
5. The speculative multithreading memory data synchronized execution method under compiler support according to claim 1, characterized in that the data-flow analysis is performed on the program's control-flow graph to find the insertion positions of the stall and forward instructions; the stall instruction is inserted before the read instruction that needs synchronization, and a forward instruction is inserted on every path between the write instruction that needs synchronization and the thread exit.
6. A device implementing the speculative multithreading memory data synchronized execution method under compiler support, characterized in that it comprises the following modules:
Thread partitioner: partitions the program into a multithreaded assembly program, which serves as input to the hardware program profiler and is run on it;
Hardware program profiler: while running the program, collects statistics on memory data-dependence violations and feeds the statistics back to the synchronization collector;
Synchronization collector: reads the statistics and computes a synchronization cost to select candidate read/write instruction pairs for multithreaded synchronization. Then, guided by data-flow analysis, it inserts a stall instruction before the read instruction, replaces the read instruction with the synchronized read instruction synload, inserts one or more forward instructions after the write instruction, and inserts a synset instruction after the thread-spawn instruction spawn of the thread that contains the write instruction;
Assembler/linker: assembles and links the assembly program into a binary executable;
Simulator: loads and runs the executable, and provides extended support for memory read and memory write instructions under the synchronization mechanism;
Obtaining memory data under the synchronization mechanism: when the synchronized child thread reaches a stall instruction, it stops executing subsequent instructions and waits until the corresponding synchronized parent thread has executed the corresponding write instruction and forward instruction and the child has received the synchronization message, containing the memory address and the written data, sent by the parent. Then, based on the received synchronization information and on whether this child thread has previously written to the same memory address, use_flag in Sbuffer is set or reset; this flag determines whether the data sent by the synchronized parent thread is used. The child thread then stops waiting and continues executing subsequent instructions;
Writing and sending data under the synchronization mechanism: once the synchronized parent thread has executed the corresponding write instruction and forward instruction, the memory address of the write is saved in the send address buffer, and the memory address and the written data are sent to the synchronized child thread as a synchronization message;
The multithreaded program is executed speculatively in synchronized mode, and the simulator outputs the program's results;
The steps by which the synchronized parent thread sends a synchronization message are:
(1) After the synchronized parent thread spawns the synchronized child thread, the parent executes the synset instruction, sending a message to the child thread that sets the sync_flag bit of the child's Sbuffer;
(2) Execute store *p, a conventional memory write;
(3) Execute save_address(p), saving the memory address p of the written data in the send address buffer;
(4) Execute receiver_id = get_immed_successor(), obtaining the processor number of the synchronized child thread;
(5) Execute mesg.value = *p, writing the data stored by the memory write into the value field of the synchronization message mesg;
(6) Execute mesg.addr = p, writing the memory address p of the written data into the addr field of the synchronization message;
(7) Execute send(mesg, receiver_id), sending the constructed synchronization message to the synchronized child thread;
The steps by which the synchronized child thread receives a synchronization message are:
(1) Execute stall(), checking whether the ready_flag bit of the synchronized child thread's Sbuffer is set. If not, the child thread remains blocked; if so, the child thread leaves the blocked state and continues with subsequent operations;
(2) Execute receive(mesg); after the synchronization message is received, the ready_flag bit of Sbuffer is set;
(3) Execute message_addr = mesg.addr, storing the memory address from the synchronization message in the message_addr field of Sbuffer;
(4) Execute message_value = mesg.value, storing the memory data value from the synchronization message in the message_value field of Sbuffer;
(5) Execute verify(q, message_addr), checking whether the memory address in the received synchronization message is the same as the memory address q to be read by the read instruction; if so, set the use_flag bit of Sbuffer;
(6) Execute loaded_value = load *q, a conventional memory read, placing the value read into loaded_value. During the read, check whether the current thread has already written data to memory address q; if so, reset the use_flag bit of Sbuffer;
(7) Execute result = choose(loaded_value, message_value): if the use_flag bit of Sbuffer is set, the synchronized child thread uses the memory data value from the synchronization message as the final result of the synchronized read; if use_flag is reset, it uses loaded_value as the final result;
(8) Execute quit(), resetting the ready_flag and use_flag bits of Sbuffer;
Sbuffer is a register file used to receive synchronization messages and control synchronized execution; Sbuffer consists of five fields: three control flags, sync_flag, ready_flag, and use_flag, and two message fields, message_addr and message_value; the thread processing unit resets all Sbuffer flags at initialization;
sync_flag indicates whether the thread executes in synchronized mode. When set, the thread executes in synchronized mode; when reset, it executes in the conventional way: the stall instruction is skipped, synload behaves exactly like a conventional load instruction, and the data in the message fields is invalid;
ready_flag indicates whether a synchronization message is ready; when set, a synchronization message is ready;
use_flag indicates whether the synchronized child thread uses the data in the message. When set, the child uses the message data; when reset, the child obtains the memory data with a conventional read operation;
message_addr stores the addr field of the synchronization message;
message_value stores the value field of the synchronization message.
7. The device implementing the speculative multithreading memory data synchronized execution method under compiler support according to claim 6, characterized in that the send address buffer is a data buffer that records the addresses of all data sent by forward instructions.
CN 201010160536 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof Expired - Fee Related CN101833440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010160536 CN101833440B (en) 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof


Publications (2)

Publication Number Publication Date
CN101833440A CN101833440A (en) 2010-09-15
CN101833440B true CN101833440B (en) 2013-01-02

Family

ID=42717520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010160536 Expired - Fee Related CN101833440B (en) 2010-04-30 2010-04-30 Speculative multithreading memory data synchronous execution method under support of compiler and device thereof

Country Status (1)

Country Link
CN (1) CN101833440B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063291B (en) * 2011-01-13 2013-08-28 Shanghai University Multilevel parallel execution method for speculative threads
CN102073580B (en) * 2011-02-01 2013-10-02 Huawei Technologies Co., Ltd. Performance analysis method and tool, and computer system
CN107608912B (en) * 2013-08-20 2020-10-16 Huawei Technologies Co., Ltd. Memory physical address query method and device
CN104915180B (en) * 2014-03-10 2017-12-22 Huawei Technologies Co., Ltd. Data operation method and apparatus
CN105138309B (en) * 2015-07-09 2018-07-17 Xi'an Jiaotong University Multi-speculative-path thread partitioning method for CMP-based speculative multithreading
CN109828788A (en) * 2018-12-21 2019-05-31 Tianyi Electronic Commerce Co., Ltd. Rule engine acceleration method and system based on thread-level speculative execution
CN110347400B (en) * 2019-06-13 2023-03-14 Xi'an University of Science and Technology Compilation acceleration method, routing unit and cache
CN112130898A (en) 2019-06-24 2020-12-25 Huawei Technologies Co., Ltd. Method and device for inserting synchronization instructions
CN115202662A (en) * 2021-04-14 2022-10-18 Huawei Technologies Co., Ltd. Code checking method for weak memory-ordering architectures and corresponding device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1885275A (en) * 2005-06-20 2006-12-27 ZTE Corporation Embedded system and real-time monitoring and processing method thereof
CN101630276A (en) * 2009-08-18 2010-01-20 Shenzhen Rongchuang Tianxia Technology Development Co., Ltd. High-efficiency memory pool access method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7111294B2 (en) * 2001-01-16 2006-09-19 Microsoft Corporation Thread-specific heaps

Non-Patent Citations (1)

Title
Xuhao Wang, Yinliang Zhao, Yuanke Wei, Shaolong Song, Bo Han. "Prophet Synchronization Thread Model and Compiler Support." ISPA 2010, 2010, pp. 81-87. *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20160430