CN102981839A - Data expansion optimization method for merged execution of large-scale parallel threads - Google Patents

Info

Publication number
CN102981839A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104413292A
Other languages
Chinese (zh)
Other versions
CN102981839B (en)
Inventor
吴伟
卿鹏
文延华
王珊珊
何王全
刘勇
方燕飞
毛兴权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201210441329.2A
Publication of CN102981839A
Application granted
Publication of CN102981839B
Legal status: Active


Abstract

The invention discloses a data expansion optimization method for merged execution of large-scale parallel threads. The method comprises: identifying thread invariants in the parallel threads whose execution is merged, a thread invariant being a variable whose value is identical in every merged parallel thread; and, during compilation, performing data expansion only on variables that are not thread invariants. The method reduces unnecessary expansion of thread-private data, alleviates the sharp growth of stack space after merging, and improves program execution efficiency.

Description

Data expansion optimization method for merged execution of large-scale parallel threads
Technical field
The present invention relates to the field of computers, and in particular to a data expansion optimization method for merged execution of large-scale parallel threads.
Background
In recent years, the rapid development of multi-core and many-core architectures has given application developers ever greater computing power, and commercial software developers are keen to use the mature parallel processors on the market to build highly parallel applications. However, an application with a fixed parallel granularity and implementation does not adapt well to every parallel processing platform, so a powerful parallel programming model and compiler tool support are needed to exploit the features and capabilities of each parallel processor effectively. Much research has followed from this. For example, the Hybrid Multicore Parallel Programming workbench (HMPP) uses compiler directives to translate C or Fortran programs into Compute Unified Device Architecture (CUDA) or Open Computing Language (OpenCL) programs, and S.-W. Liao et al. translate stream programs written in the stream programming language Brook for execution on multi-core CPUs. Multicore-CUDA (MCUDA) and the CUDA C compiler for the x86 platform (CUDA-x86) aim to port CUDA programs to multi-core CPU systems without a graphics processing unit (GPU), in the hope of spreading the CUDA architecture across the whole field of high-performance parallel processing.
Programming models such as CUDA and OpenCL let the user create a very large number of threads, more than current multi-core CPU systems can support, so implementations such as MCUDA and CUDA-x86 must deal with the problem of merging the execution of large-scale parallel threads. To preserve the semantics of the original program, the program is divided at thread synchronization operations into multiple loop bodies, and each loop body iterates over a number of parallel threads; that is, the parallel threads are merged and executed serially on one core. The original thread-private data must remain consistent across these thread loops, which requires expanding the private data: a scalar becomes an array, a one-dimensional array becomes a two-dimensional array, and so on. As shown in Fig. 1, the double-precision floating-point variable a is expanded into the double-precision array a[THREAD_NUM], the two-dimensional integer array d[X][Y] is expanded into the three-dimensional integer array d[THREAD_NUM][X][Y], and so on. The expansion size is the number of merged threads, THREAD_NUM in the figure; only in this way does each thread keep its own data, which guarantees correct execution.
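The expansion described above can be illustrated with a minimal C sketch. Fig. 1 is not reproduced here, so the kernel body, the function name `kernel_merged`, the value of `THREAD_NUM`, and the position of the synchronization point are all assumptions made for illustration only.

```c
#include <assert.h>

#define THREAD_NUM 4   /* assumed number of merged threads */

/* Original per-thread logic (conceptually):
 *     double a = in[tid] * 2.0;
 *     <thread synchronization>
 *     out[tid] = a + in[(tid + 1) % THREAD_NUM];
 * After merging, the code is split at the synchronization point into two
 * thread loops. The private scalar `a` must survive from the first loop to
 * the second, so it is expanded into an array with one slot per thread. */
void kernel_merged(const double *in, double *out)
{
    double a[THREAD_NUM];                        /* expanded: scalar -> array */
    assert(in != 0 && out != 0);

    for (int tid = 0; tid < THREAD_NUM; tid++)   /* thread loop 1 */
        a[tid] = in[tid] * 2.0;

    for (int tid = 0; tid < THREAD_NUM; tid++)   /* thread loop 2 */
        out[tid] = a[tid] + in[(tid + 1) % THREAD_NUM];
}
```

Every expanded variable costs `THREAD_NUM` times its original stack footprint, which is the growth the method sets out to limit.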
Data expansion clearly causes a sharp growth of stack space and hurts program execution efficiency. A common way to address this is to analyze the def-use information of variables and find those whose lifetime is confined to a single thread loop (thread-local variables). Such variables need not be expanded; the method is described in the MCUDA documentation. The loop variable a in Fig. 2 is an example: its definitions and uses are all confined to the thread loop, that is, within the for loop statement. In the merged thread loop, each thread re-assigns the variable before using it, so no data expansion is needed.
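A thread-local variable of this kind might look as follows in C. This is only a sketch: Fig. 2 is not reproduced here, and the function name `scale_merged`, the variable `t`, and the arithmetic are hypothetical.

```c
#include <assert.h>

#define THREAD_NUM 4   /* assumed number of merged threads */

/* `t` is defined and used only inside one thread loop, so every iteration
 * (that is, every merged thread) re-assigns it before reading it.
 * It can therefore remain a plain scalar. */
void scale_merged(const int *in, int *out)
{
    int t;                              /* thread-local: not expanded */
    assert(in != 0 && out != 0);

    for (int tid = 0; tid < THREAD_NUM; tid++) {
        t = in[tid] + tid;              /* definition */
        out[tid] = t * t;               /* use, within the same iteration */
    }
}
```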
However, this optimization alone still leaves many variables that require data expansion. The sharp growth of stack space is not effectively alleviated, which poses a major challenge for later program optimization. How to reduce unnecessary expansion of thread-private data and improve program execution efficiency has therefore become a technical problem that those skilled in the art urgently need to solve.
Further related content is disclosed in the Chinese patent application with publication number CN101937367A.
Summary of the invention
The technical problem to be solved by the invention is to reduce unnecessary expansion of thread-private data, alleviate the sharp growth of stack space, and improve program execution efficiency.
To solve this problem, the invention provides a data expansion optimization method for merged execution of large-scale parallel threads, comprising:
identifying thread invariants in the parallel threads whose execution is merged, a thread invariant being a variable whose value is identical in every merged parallel thread;
during compilation, performing data expansion only on variables that are not thread invariants.
Optionally, identifying thread invariants comprises:
using the def-use information of variables, the read-write dependences between variables, and the structural features of the program to construct a complete graph and identify thread invariants automatically.
Optionally, constructing the complete graph and automatically identifying thread invariants comprises:
establishing a thread-related variable information table;
scanning the syntax tree information function by function; if a variable is newly added to the thread-related variable information table during a scan, rescanning the syntax tree until the variables in the table no longer change;
the variables not in the thread-related variable information table being the thread invariants.
Optionally, before scanning the syntax tree information function by function, the method further comprises:
initializing the thread-related variable information table by adding to it the variables that can distinguish different threads.
Optionally, scanning the syntax tree information function by function comprises:
performing dependence detection on each variable newly added to the thread-related variable information table, and adding the variables that depend on the newly added variable to the table.
Optionally, performing dependence detection on a newly added variable and adding the variables that depend on it to the table comprises:
finding a statement related to the newly added variable in the current table, finding the variable assigned in that statement, and adding that variable to the table;
if the statement is a conditionally executed code block, adding all variables assigned in the code block to the table;
if the statement is a goto statement whose behavior is not consistent across all threads, adding all variables involved between the goto statement and its corresponding label to the table.
Optionally, the thread-related variable information table is a linked list.
Optionally, the method further comprises:
identifying thread-invariant statements;
transforming the program code of each thread-invariant statement according to its program structure features.
Optionally, a thread-invariant statement is a statement composed entirely of thread invariants and constants.
Optionally, identifying thread-invariant statements comprises using the compiler to analyze program statements automatically.
Optionally, transforming the program code of a thread-invariant statement according to its program structure features comprises:
if the thread invariants that make up the statement are thread-local variables, recomputing the statement in every thread;
if the thread invariants that make up the statement are not thread-local variables, analyzing the read-write properties of those thread invariants;
if a thread invariant has a write-after-read dependence within a thread loop, saving and restoring the thread invariant with a copy;
if a thread invariant has no write-after-read dependence within a thread loop, either applying guarded execution to the statement (executing it under a condition) or recomputing it;
if all statements in the thread loop are thread-invariant statements handled by guarded execution, removing the control structure of the thread loop and executing the statements in the loop in a single pass.
Optionally, saving and restoring with a copy comprises:
creating a copy of the thread invariant;
adding an assignment statement before the thread loop that sets the copy to the current value of the thread invariant;
adding an assignment statement at the beginning of the thread loop body that sets the initial value of the thread invariant in each thread to the value of the copy.
Optionally, choosing between guarded execution and recomputation of the thread-invariant statement comprises:
counting the number of consecutive thread-invariant statements eligible for guarded execution;
if that number exceeds a preset threshold, applying guarded execution to the statements; otherwise, recomputing them.
Optionally, the preset threshold is 1 to 10.
Optionally, applying guarded execution to a thread-invariant statement comprises:
adding an if statement around the thread-invariant statement whose condition tests whether the current thread is the first thread, so that the statement is executed only once, in the first thread.
Compared with the prior art, the technical solution of the invention has the following advantages:
1. By analyzing variable information, the invention identifies thread invariants and does not expand them. This reduces the amount of data expansion, alleviates the sharp growth of stack space after merging, and relieves pressure on scratch-pad memory (SPM) or reduces cache misses, which creates favorable conditions for further program optimization and improves program execution efficiency.
2. In an optional scheme, thread-invariant statements are also identified and handled according to the structural features of the program, which guarantees correct execution.
3. In an optional scheme, thread invariants are identified by constructing a complete graph, which makes identification simpler and more convenient.
Brief description of the drawings
Fig. 1 is a schematic diagram of thread-private data expansion;
Fig. 2 is a schematic diagram of a thread-local variable;
Fig. 3 is a flow chart of the first embodiment of the data expansion optimization method for merged execution of large-scale parallel threads;
Fig. 4 is a flow chart of the step of identifying thread invariants in the first embodiment shown in Fig. 3;
Fig. 5 is a program listing for one embodiment of the step of identifying thread invariants;
Fig. 6 is a schematic diagram of the intermediate results of the embodiment shown in Fig. 5;
Fig. 7 is a flow chart of the second embodiment of the data expansion optimization method for merged execution of large-scale parallel threads;
Fig. 8 is a flow chart of the step of transforming the program code of thread-invariant statements in the second embodiment shown in Fig. 7;
Fig. 9 is a program listing for one embodiment of the step of transforming the program code of thread-invariant statements;
Fig. 10 is the program listing of the embodiment shown in Fig. 9 after the program code transformation.
Detailed description
Many specific details are set forth in the following description to provide a thorough understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
Furthermore, the invention is described in detail with reference to schematic drawings. The drawings are examples provided for ease of explanation and should not limit the scope of protection of the invention.
The inventors found in practice that there exist variables, namely thread invariants, whose values are identical in every merged parallel thread. Such variables need not be expanded, which further reduces the amount of data expansion and thereby alleviates the sharp growth of stack space. Thread invariants include the thread-local variables of the prior art but are not limited to them, so this method can eliminate more data expansion than the prior-art optimization.
The invention provides a data expansion optimization method for merged execution of large-scale parallel threads. Fig. 3 is a flow chart of the first embodiment of the method. As shown in Fig. 3, this embodiment comprises the following steps:
Step S101: identify thread invariants in the parallel threads whose execution is merged. A thread invariant is a variable whose value is identical in every merged parallel thread.
Step S102: determine whether the current variable is a thread invariant. If so, execute step S103: the current variable is not expanded during compilation. If not, execute step S104: the current variable is expanded.
Step S105: determine whether every variable has been processed. If so, the method ends; otherwise, return to step S102 and process the next variable.
Fig. 4 is a flow chart of the step of identifying thread invariants in the first embodiment shown in Fig. 3. Specifically, the def-use information of variables, the read-write dependences between variables, and the structural features of the program can be used to construct a complete graph and identify thread invariants automatically. As shown in Fig. 4, the step of identifying thread invariants may comprise:
Step S1010: establish a thread-related variable information table.
Step S1011: initialize the table by adding to it the variables that can distinguish different threads.
Next, the syntax tree information is scanned repeatedly, function by function. Dependence detection is performed on each variable newly added to the thread-related variable information table, and the variables that depend on it are added to the table, until the table no longer grows. Specifically, step S1012 determines whether the table contains a newly added variable. If it does, step S1013 finds a statement related to the newly added variable in the current table. Step S1014 finds the thread variable assigned in that statement and adds it to the table. Step S1015 determines whether the statement is a conditionally executed code block. If so, step S1016 adds the variables assigned in the code block to the table. If the statement is not a conditionally executed code block, the process goes directly to step S1017, which determines whether the statement is a goto statement whose behavior is not consistent across all threads. If it is, step S1018 adds all variables involved between the goto statement and its corresponding label to the table, and the process returns to step S1012. If the statement is not a goto statement, or is a goto statement whose behavior is consistent across all threads, the process returns directly to step S1012.
If the thread-related variable information table contains no newly added variable, step S1019 is executed: the variables not in the table are the thread invariants.
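The rule for conditionally executed code blocks can be motivated with a small C sketch. The example is hypothetical and is not taken from the patent's figures: the variable `c` is assigned only constants, yet its value differs between threads because the enclosing condition depends on the thread index. The function names and `THREAD_NUM` are assumptions.

```c
#include <assert.h>

#define THREAD_NUM 4   /* assumed number of merged threads */

/* Per-thread logic (conceptually):
 *     int c = 0;
 *     if (tid % 2 == 1) c = 10;   // condition depends on the thread index
 *     <thread synchronization>
 *     out[tid] = c;
 */

/* Correct: c is treated as thread-dependent and expanded. */
void cond_expanded(int *out)
{
    int c[THREAD_NUM];
    assert(out != 0);
    for (int tid = 0; tid < THREAD_NUM; tid++) {
        c[tid] = 0;
        if (tid % 2 == 1) c[tid] = 10;
    }
    for (int tid = 0; tid < THREAD_NUM; tid++)
        out[tid] = c[tid];
}

/* Incorrect: c is wrongly kept as a scalar. */
void cond_not_expanded(int *out)
{
    int c = 0;
    assert(out != 0);
    for (int tid = 0; tid < THREAD_NUM; tid++) {
        c = 0;
        if (tid % 2 == 1) c = 10;
    }
    for (int tid = 0; tid < THREAD_NUM; tid++)
        out[tid] = c;            /* every thread sees the last thread's c */
}
```

In the scalar version every thread reads the value left by the last thread of the first loop, which is why variables assigned under a thread-dependent condition belong in the table even when the assigned expressions are themselves invariant.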
Note that, as those skilled in the art will understand, the thread-related variable information table simply provides centralized storage for thread-related variables. Its data structure can be a tree, an array, a queue, a linked list, and so on; the invention places no specific limitation on this.
The step of identifying thread invariants is further described below with reference to the drawings and a specific embodiment.
Fig. 5 is a program listing for one embodiment of the step of identifying thread invariants. Fig. 6 is a schematic diagram of the intermediate results of the embodiment shown in Fig. 5.
As shown in Fig. 5, in this embodiment N is the total number of threads, tid is the thread index of the thread loop, thread_Idx is the built-in thread index, and block_Idx is thread-invariant within thread_loop. A, B and As are global variables and need no data expansion. bx, tx, b, a and stride are thread variables, all of which would be expanded under the prior art.
These thread variables are now examined; those identified as thread invariants need not be expanded.
Referring to Fig. 6, this embodiment uses a linked list to store the thread variable information.
First, a thread-related variable list is established.
Then the list is initialized: thread_Idx, the built-in thread index, is added to the thread-related variable list, because thread_Idx distinguishes the individual threads.
Next, because the variable thread_Idx has been added, the list has grown, so dependence detection is performed on thread_Idx. The statement related to the newly added variable thread_Idx is found: tx[tid]=thread_Idx*bx[tid]. The variable assigned in this statement, tx, is added to the thread-related variable list. The statement is neither a conditionally executed code block nor a goto statement, so dependence detection for thread_Idx is complete.
Because the variable tx has been newly added to the thread-related variable list, dependence detection continues with tx. The statement related to the newly added variable tx is found: b[tid]+=As[stride[tid]+tx[tid]]. The variable assigned in this statement, b, is added to the thread-related variable list. The statement is neither a conditionally executed code block nor a goto statement, so dependence detection for tx is complete.
Because the variable b has been newly added to the thread-related variable list, dependence detection continues with b. Three statements related to b are found:
1. b[tid]=0, in which b is the assigned object; b is already in the list and need not be added again;
2. b[tid]+=As[stride[tid]+tx[tid]], in which b is again the assigned object; b is already in the list and need not be added again;
3. B[tx[tid]+a[tid]]=b[idx], in which the assigned object B is a global variable, not a thread variable, and needs no data expansion.
None of these statements is a conditionally executed code block or a goto statement, so dependence detection for b is complete, and no new variable has been added to the thread-related variable list.
This completes the identification of thread invariants. The stack variables tx and b, which are in the thread-related variable list, are thread-dependent variables and require data expansion. The stack variables bx, a and stride, which are not in the list, are thread invariants and need no data expansion.
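The walk-through above amounts to a fixed-point propagation, which can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the patented implementation: statements are reduced to hypothetical `struct stmt` records instead of a syntax tree, only the plain assignment rule is modeled (the conditional-block and goto rules are omitted), and an array stands in for the linked list. All names are made up for the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 16   /* assumed capacity of the table */
#define MAX_USES 4    /* assumed maximum number of variables read per statement */

/* One assignment: `def` is the stack variable written (NULL when the target
 * is a global such as B); `uses` lists the variables read, NULL-terminated
 * when shorter than MAX_USES. */
struct stmt {
    const char *def;
    const char *uses[MAX_USES];
};

int in_table(const char *table[], int n, const char *v)
{
    for (int i = 0; i < n; i++)
        if (strcmp(table[i], v) == 0) return 1;
    return 0;
}

/* Rescan the statements until the table stops growing.
 * `table` must have room for MAX_VARS entries and holds n seed variables.
 * Returns the final number of entries. */
int find_thread_dependent(const struct stmt *s, int ns,
                          const char *table[], int n)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < ns; i++) {
            if (s[i].def == NULL || in_table(table, n, s[i].def))
                continue;
            for (int u = 0; u < MAX_USES && s[i].uses[u] != NULL; u++) {
                if (in_table(table, n, s[i].uses[u])) {
                    assert(n < MAX_VARS);
                    table[n++] = s[i].def;   /* assigned from a dependent value */
                    changed = 1;
                    break;
                }
            }
        }
    }
    return n;
}
```

Seeding the table with thread_Idx and feeding in the assignments mentioned in the text leaves tx and b in the table, while bx, a and stride stay outside it, matching the result above.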
Fig. 7 is a flow chart of the second embodiment of the data expansion optimization method for merged execution of large-scale parallel threads. Unlike the previous embodiment, this embodiment also applies code transformation to thread-invariant statements to guarantee correct execution. As shown in Fig. 7, this embodiment comprises the following steps:
Step S201: identify thread invariants in the parallel threads whose execution is merged.
Step S202: determine whether the current variable is a thread invariant. If not, execute step S206: the current variable is expanded.
If the current variable is a thread invariant, execute step S203: the current variable is not expanded. Step S204: identify thread-invariant statements. A thread-invariant statement is a statement composed entirely of thread invariants and constants. Specifically, the compiler can analyze program statements automatically to identify thread-invariant statements. Those skilled in the art will understand that a variety of existing compilers provide this capability.
Step S205: transform the program code of each thread-invariant statement according to its program structure features.
Step S207: determine whether every variable has been processed. If so, the method ends. If not, return to step S202 and process the next variable.
Fig. 8 is a flow chart of the step of transforming the program code of thread-invariant statements in the second embodiment shown in Fig. 7. As shown in Fig. 8, this step comprises:
Step S2050: determine whether the thread invariants that make up the thread-invariant statement are thread-local variables. If they are, step S2051 is executed: the statement is left unprocessed to avoid increasing the structural complexity of the program, so it is recomputed by each thread during merged execution.
If the thread invariants that make up the statement are not thread-local variables, step S2052 determines whether a thread invariant has a write-after-read dependence within a thread loop. If it does, the thread invariant is saved and restored with a copy. Specifically, a copy of the thread invariant is created, and an assignment statement is added before the whole thread loop that sets the copy to the current value of the thread invariant. An assignment statement is also added at the beginning of the thread loop body that sets the initial value of the thread invariant in each thread to the saved copy.
If there is no write-after-read dependence, step S2054 determines whether the number of consecutive thread-invariant statements eligible for guarded execution exceeds a preset threshold. If not, step S2051 is executed and the statements are recomputed, which avoids adding if statements that would complicate the program structure and might reduce efficiency through frequent condition checks. If the number exceeds the preset threshold, step S2055 applies guarded execution: an if block is added around the thread-invariant statement whose condition tests whether the current thread is the first thread, so the statement is executed once, by the first thread. Several consecutive guarded statements can be merged into the same if block.
Note that the number of consecutive guardable statements is considered as a trade-off between the amount of computation and program complexity. With guarded execution, a thread-invariant statement is executed only once, which reduces computation, but the added if test increases program complexity. With recomputation, the statement is left unprocessed, so computation increases while program complexity is unchanged. Based on repeated tests, the inventors prefer a preset threshold in the range 1 to 10: when the number of consecutive statements that qualify for guarded execution is greater than or equal to the chosen threshold, all of them are guarded and merged into one if block. When the number is below the threshold, all of them are recomputed, which avoids adding if statements that would complicate the program structure and might reduce efficiency through frequent condition checks.
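The decision flow of Fig. 8 might be summarized by a small C function such as the one below. This is a sketch only: the enum, the struct fields and the function name are hypothetical, and the inclusive comparison follows the preceding paragraph ("greater than or equal to"), whereas other passages say the count must exceed the threshold, so the exact boundary is an assumption.

```c
#include <assert.h>

enum strategy {
    RECOMPUTE,       /* leave the statement alone; every thread recomputes it */
    SAVE_RESTORE,    /* keep a copy and restore it in each loop iteration */
    GUARDED          /* execute once under if (tid == first thread) */
};

/* Properties of one thread-invariant statement (field names are assumed). */
struct inv_stmt {
    int is_thread_local;  /* its thread invariants are thread-local variables */
    int has_war;          /* write-after-read dependence inside the thread loop */
    int run_length;       /* consecutive statements eligible for guarding */
};

enum strategy choose_strategy(const struct inv_stmt *s, int threshold)
{
    assert(s != 0 && threshold >= 1);
    if (s->is_thread_local)
        return RECOMPUTE;                 /* S2050 -> S2051 */
    if (s->has_war)
        return SAVE_RESTORE;              /* S2052: copy, then restore */
    if (s->run_length >= threshold)
        return GUARDED;                   /* S2054 -> S2055 */
    return RECOMPUTE;                     /* S2054 -> S2051 */
}
```

A larger threshold favors recomputation and simpler code; a smaller one favors fewer redundant computations at the cost of extra if tests.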
Continuing with Fig. 8, if guarded execution is applied in step S2055, step S2056 determines whether all statements in the thread loop are guarded thread-invariant statements. If so, step S2057 removes the control structure of the thread loop, and the statements in the loop are executed in a single pass.
The technical solution of the invention is further described below with reference to the drawings and a specific embodiment.
Fig. 9 is a program listing for one embodiment of the step of transforming the program code of thread-invariant statements. Fig. 10 is the program listing of the embodiment shown in Fig. 9 after the program code transformation. Refer to Fig. 9 and Fig. 10 together.
Following the order of the program code, the first thread-invariant statement found is bx=block_Idx. The thread invariant bx is defined and used only within this thread loop and is therefore a thread-local variable. The statement needs no processing and is recomputed in this thread loop.
Next, the thread-invariant statement a=bx*32 is found. The variable a is also referenced outside this thread loop (in the statement B[tx[tid]+a]=b[idx];), so a is not a thread-local variable, and the statement cannot simply be left unprocessed and recomputed as a thread-local statement would be. The variable a has no write-after-read dependence within this thread loop, so guarded execution is applied: an if statement is added around the statement whose condition tests whether the current thread is the first thread, that is, if (tid==0). Note that, depending on the threshold setting, recomputation may be chosen for this statement instead.
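A minimal C sketch of this guarded form is shown below. Fig. 10 is not reproduced here, so the surrounding loops, the function name `guarded_merged`, the output array, and the use of `tid` in place of the built-in thread index are assumptions for illustration.

```c
#include <assert.h>

#define THREAD_NUM 4   /* assumed number of merged threads */

/* bx and a are thread invariants and stay scalars; tx is thread-dependent
 * and is expanded. */
void guarded_merged(int block_Idx, int *out)
{
    int bx;
    int a = 0;
    int tx[THREAD_NUM];
    assert(out != 0);

    for (int tid = 0; tid < THREAD_NUM; tid++) {   /* thread loop 1 */
        bx = block_Idx;          /* thread-local invariant: recomputed */
        if (tid == 0)            /* guarded: executed by the first thread only */
            a = bx * 32;
        tx[tid] = tid * bx;      /* stands in for thread_Idx * bx */
    }

    for (int tid = 0; tid < THREAD_NUM; tid++)     /* thread loop 2 */
        out[tid] = tx[tid] + a;  /* a is read outside the first loop */
}
```

Because a has the same value in every thread, computing it once in the first iteration gives every later iteration and every later thread loop the value it would have computed itself.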
Next, the thread-invariant statement stride=512 is found. Within its thread loop, this statement is a thread-invariant statement eligible for guarded execution (the analysis is the same as for the statement a=bx*32 and is not repeated here). Consistent with steps S2056 and S2057, the control structure of this loop can therefore be removed and the loop replaced by a single execution: the thread_loop statement is removed, and stride=512 is executed only once.
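A small model of this step, again treating the merged thread loop as a serial loop, shows why dropping the loop control is safe when the only statement in the loop is guarded. The function names are hypothetical and the sketch assumes the loop body contains nothing besides the guarded statement.

```python
def with_thread_loop(num_threads: int) -> int:
    """Loop kept: the guarded statement runs once, the other iterations do nothing."""
    stride = 0
    for tid in range(num_threads):
        if tid == 0:
            stride = 512
    return stride


def without_thread_loop() -> int:
    """Loop control removed: the statement is executed a single time."""
    stride = 512
    return stride
```

Both forms produce the same final value of stride, so the remaining iterations of the loop are pure overhead.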
Next, the thread-invariant statement stride>>=1 is found. The thread invariant stride has a write-after-read dependence within this thread loop, so the statement stride>>=1 must be handled by saving and restoring the value through a copy. Specifically, an assignment statement is added before the loop begins to record the current value in a copy, that is, _stride_0=stride. A second assignment statement is added at the very beginning of the loop body to restore the thread invariant stride to the copied value, that is, stride=_stride_0.
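The need for the copy can be seen by modelling the merged thread loop as a serial loop in which stride is a single shared variable rather than one copy per thread. This is an illustrative sketch; the function names and the list `seen`, which records the value each logical thread ends up with, are assumptions introduced here.

```python
def naive_merged(num_threads: int, stride: int):
    """Without save and restore, each logical thread shifts the value left by the previous one."""
    seen = []
    for tid in range(num_threads):
        stride >>= 1
        seen.append(stride)
    return seen


def save_restore_merged(num_threads: int, stride: int):
    """With the copy, every logical thread starts from the same initial value."""
    _stride_0 = stride          # added before the loop: record the current value
    seen = []
    for tid in range(num_threads):
        stride = _stride_0      # added at the start of the loop body: restore
        stride >>= 1
        seen.append(stride)
    return seen
```

Starting from 512 with three logical threads, the naive version yields 256, 128, 64, whereas the save-and-restore version yields 256 for every thread, matching what independently executing threads would compute.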
It should be noted that, from the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented partly or entirely in software combined with the necessary general-purpose hardware platform. On this understanding, the essence of the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied as a software product. The computer software product may comprise one or more machine-readable media storing machine-executable instructions which, when executed by one or more machines such as a computer, a computer network, or other electronic equipment, cause the one or more machines to perform operations according to embodiments of the invention. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media suitable for storing machine-executable instructions.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible changes and modifications to the technical solution of the invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the present invention.

Claims (15)

1. A data expansion optimization method for merged execution of large-scale parallel threads, characterized by comprising:
identifying thread invariants in the parallel threads that are merged for execution, wherein a thread invariant holds the same value in every parallel thread that is merged for execution;
during compilation, performing data expansion only on variables that are not thread invariants.
2. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 1, characterized in that said identifying thread invariants comprises:
using the definition and reference information of variables, the read-write dependences between variables, and the structural features of the program to construct a complete graph and automatically identify the thread invariants.
3. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 2, characterized in that said constructing a complete graph and automatically identifying the thread invariants comprises:
establishing a thread-dependent variable information table;
scanning the syntax tree information function by function; if a variable is newly added to the thread-dependent variable information table during a scan of the syntax tree, rescanning the syntax tree until the variables in the thread-dependent variable information table no longer change;
wherein the variables that are not in the thread-dependent variable information table are the thread invariants.
4. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 3, characterized by further comprising, before said scanning the syntax tree information function by function:
initializing the thread-dependent variable information table, and adding to it the variables that can distinguish different threads.
5. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 4, characterized in that said scanning the syntax tree information function by function comprises:
performing dependence detection on the variables newly added to the thread-dependent variable information table, and adding the variables that depend on the newly added variables to the thread-dependent variable information table.
6. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 5, characterized in that said performing dependence detection on the variables newly added to the thread-dependent variable information table and adding the variables that depend on the newly added variables to the thread-dependent variable information table comprises:
finding a statement that involves a variable newly added to the current thread-dependent variable information table, finding the variable assigned in that statement, and adding that variable to the thread-dependent variable information table;
if the statement is a conditionally executed code block, adding all variables assigned in the code block to the thread-dependent variable information table;
if the statement is a goto statement whose behavior is not consistent across all threads, adding all variables involved between the goto statement and the label corresponding to the goto statement to the thread-dependent variable information table.
7. The data expansion optimization method for merged execution of large-scale parallel threads according to any one of claims 3 to 6, characterized in that the thread-dependent variable information table is a linked list.
8. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 1, characterized in that the method further comprises:
identifying thread-invariant statements;
performing program code transformation on the thread-invariant statements according to the program structure features of the thread-invariant statements.
9. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 8, characterized in that a thread-invariant statement comprises: a statement composed entirely of thread invariants and constants.
10. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 8, characterized in that said identifying thread-invariant statements comprises: using the compiler to analyze program statements automatically and identify the thread-invariant statements.
11. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 8, characterized in that the step of performing program code transformation on the thread-invariant statements according to the program structure features of the thread-invariant statements comprises:
if the thread invariant forming the thread-invariant statement is a thread-local variable, repeatedly computing the thread-invariant statement;
if the thread invariant forming the thread-invariant statement is not a thread-local variable, analyzing the read-write properties of the thread invariant;
if the thread invariant has a write-after-read dependence within a thread loop, saving and restoring the thread invariant by means of a copy;
if the thread invariant has no write-after-read dependence within a thread loop, either executing the thread-invariant statement under a guard (guarded execution) or repeatedly computing it;
if all statements in the thread loop are thread-invariant statements under guarded execution, removing the control structure of the thread loop and executing the statements in the thread loop a single time.
12. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 11, characterized in that said saving and restoring by means of a copy comprises:
creating a copy of the thread invariant;
adding an assignment statement before the thread loop that assigns the current value of the thread invariant to the copy;
adding an assignment statement at the beginning of the thread loop body that sets the initial value of the thread invariant in each thread to the value of the copy.
13. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 11, characterized in that said guarded execution or repeated computation of the thread-invariant statement comprises:
counting the number of consecutive thread-invariant statements eligible for guarded execution;
when the number of thread-invariant statements exceeds a predetermined threshold, executing the thread-invariant statements under a guard; otherwise, repeatedly computing the thread-invariant statements.
14. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 13, characterized in that the predetermined threshold is 1 to 10.
15. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 13, characterized in that the step of executing the thread-invariant statement under a guard comprises:
adding an if statement around the thread-invariant statement, the condition of the if statement testing whether the current thread number is that of the first thread, so that the thread-invariant statement is executed only once, in the first thread.
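The following sketch is offered only as an illustration of the table-based identification recited in claims 3 to 5, and forms no part of the claims. It makes several simplifying assumptions that are not in the source: each statement is reduced to a hypothetical pair (assigned variable, variables read), a Python set stands in for the linked list of claim 7, and the conditional-block and goto rules of claim 6 are omitted.

```python
from typing import Iterable, List, Set, Tuple

Statement = Tuple[str, List[str]]  # (assigned variable, variables read)


def find_thread_invariants(statements: List[Statement],
                           thread_ids: Iterable[str]) -> Set[str]:
    """Return the variables that never depend on a thread-distinguishing variable."""
    # Initialization: the table starts with variables that distinguish threads.
    table = set(thread_ids)
    changed = True
    while changed:  # rescan until the table no longer changes
        changed = False
        for target, used in statements:
            if target not in table and any(v in table for v in used):
                table.add(target)  # target depends on a thread-dependent variable
                changed = True
    all_vars = {t for t, _ in statements} | {v for _, u in statements for v in u}
    return all_vars - table  # everything outside the table is a thread invariant


# Hypothetical statement list; b is listed before idx so that a rescan is needed.
example = [("bx", ["block_Idx"]), ("a", ["bx"]),
           ("b", ["idx"]), ("idx", ["tid", "a"])]
```

For the hypothetical statement list above with tid as the only thread-distinguishing variable, idx and b end up in the table, and bx, a and block_Idx are reported as thread invariants.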
Application number: CN201210441329.2A. Priority date: 2012-11-06. Filing date: 2012-11-06. Title: Merge the Data expansion optimization method performing large-scale parallel thread. Status: Active. Publication: CN102981839B (en).

Publications (2)

Publication Number Publication Date
CN102981839A (en) 2013-03-20
CN102981839B (en) 2015-08-12

Family

ID=47855904

Country Status (1)

Country Link
CN (1) CN102981839B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391742A (en) * 2014-11-11 2015-03-04 小米科技有限责任公司 Application optimization method and device
CN110069243A (en) * 2018-10-31 2019-07-30 上海奥陶网络科技有限公司 A kind of java program threads optimization method
CN110069243B (en) * 2018-10-31 2023-03-03 上海奥陶网络科技有限公司 Java program thread optimization method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant