CN102981839B - Data expansion optimization method for merged execution of large-scale parallel threads - Google Patents

Info

Publication number
CN102981839B
CN102981839B CN201210441329.2A CN201210441329A CN102981839B CN 102981839 B CN102981839 B CN 102981839B CN 201210441329 A CN201210441329 A CN 201210441329A CN 102981839 B CN102981839 B CN 102981839B
Authority
CN
China
Prior art keywords
thread
statement
invariant
constant
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210441329.2A
Other languages
Chinese (zh)
Other versions
CN102981839A (en)
Inventor
吴伟
卿鹏
文延华
王珊珊
何王全
刘勇
方燕飞
毛兴权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201210441329.2A
Publication of CN102981839A
Application granted
Publication of CN102981839B
Current legal status: Active
Anticipated expiration

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a data expansion optimization method for merged execution of large-scale parallel threads, comprising: identifying thread invariants in the parallel threads that are merged for execution, a thread invariant being a variable whose value is the same in every one of the merged parallel threads; and, during compilation, performing data expansion only on variables that are not thread invariants. The invention reduces unnecessary expansion of thread-private data, effectively alleviates the sharp growth of stack space after merging, and improves the execution efficiency of the program.

Description

Data expansion optimization method for merged execution of large-scale parallel threads
Technical field
The present invention relates to the field of computers, and in particular to a data expansion optimization method for merged execution of large-scale parallel threads.
Background art
In recent years, multi-core and many-core architectures have developed rapidly, giving application developers ever greater computing power, and commercial software developers are keen to use the mature parallel processors on the market to build highly parallel applications. However, an application with a particular parallel granularity and implementation does not adapt well to every parallel processing platform, so a powerful parallel programming model and supporting compilation tools are needed to exploit the features and capabilities of each parallel processor effectively. Much research has been done on this basis. For example, the Hybrid Multicore Parallel Programming workbench (HMPP) uses compiler directives to translate C or Fortran programs into Compute Unified Device Architecture (CUDA) or Open Computing Language (OpenCL) programs, and S.-W. Liao et al. translate stream programs written in the stream programming language Brook for execution on multi-core CPUs. Multicore-CUDA (MCUDA) and the CUDA C compiler for the x86 platform (CUDA-x86) aim to port CUDA programs to multi-core CPU systems that have no graphics processing unit (GPU), in the hope of spreading the CUDA architecture throughout the field of high-performance parallel processing.
Programming models such as CUDA and OpenCL allow the user to create a huge number of threads, which current multi-core CPU systems cannot support directly. Implementations such as MCUDA and CUDA-x86 must therefore deal with merging large numbers of parallel threads for execution. To preserve the semantics of the original program, the program is divided at thread synchronization operations into several loop bodies; each loop body iterates over a number of parallel threads, so that these threads are merged and executed serially on one core. The private data of each original thread must remain consistent from one such thread loop to the next, which requires expanding the private data: a scalar becomes an array, a one-dimensional array becomes a two-dimensional array, and so on. As shown in Fig. 1, the double-precision floating-point variable a is expanded into the double-precision array a[THREAD_NUM], the two-dimensional integer array d[X][Y] is expanded into the three-dimensional integer array d[THREAD_NUM][X][Y], and so on. The expansion size is the number of merged threads, THREAD_NUM in the figure, so that each thread keeps its own data and execution remains correct.
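The need for expansion can be illustrated with a minimal Python sketch. It is not taken from Fig. 1: the two-phase kernel body, the value THREAD_NUM = 4 and the helper names run_merged and slot are assumptions made for illustration only. The sketch runs the same serialized kernel once with the private scalar expanded to one slot per thread and once without.

```python
THREAD_NUM = 4  # number of merged threads (illustrative value)


def run_merged(expand):
    """Serially execute a two-phase kernel; a barrier separates the phases."""
    # Private scalar a: one slot per thread if expanded, a single slot if not.
    a = [0.0] * (THREAD_NUM if expand else 1)

    def slot(tid):
        return tid if expand else 0

    out = [0.0] * THREAD_NUM
    for tid in range(THREAD_NUM):      # thread loop 1: code before the barrier
        a[slot(tid)] = tid * 10.0
    for tid in range(THREAD_NUM):      # thread loop 2: code after the barrier
        out[tid] = a[slot(tid)] + 1.0
    return out


expanded = run_merged(expand=True)    # each thread reads back its own value
collapsed = run_merged(expand=False)  # each thread reads the last thread's value
print(expanded, collapsed)
```

Without expansion, the value written by the last thread in the first loop overwrites all earlier ones, so the second loop no longer sees per-thread data.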
As can be seen, data expansion causes the stack space to grow sharply, which hurts the execution efficiency of the program. The conventional remedy is to analyze the definition-use information of each variable and find the variables whose lifetime is confined to a single thread loop (thread-local variables). Such variables need not be expanded; the method is described in the MCUDA documentation. For example, the loop variable a in Fig. 2 is defined and used only within one thread loop, namely inside the for loop statement. In the merged thread loop, every thread assigns the variable afresh before using it, so no data expansion is needed.
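A short Python sketch of this prior-art case follows. The loop body is hypothetical (Fig. 2 is not reproduced here) and the function names are illustrative; the point is only that a variable defined and used inside one thread loop gives the same result whether or not it is expanded.

```python
THREAD_NUM = 4  # illustrative number of merged threads


def kernel_scalar():
    out = [0] * THREAD_NUM
    for tid in range(THREAD_NUM):   # a single thread loop
        a = tid * tid               # defined in this iteration ...
        out[tid] = a + 1            # ... and used only in this iteration
    return out


def kernel_expanded():
    out = [0] * THREAD_NUM
    a = [0] * THREAD_NUM            # the expansion that the analysis avoids
    for tid in range(THREAD_NUM):
        a[tid] = tid * tid
        out[tid] = a[tid] + 1
    return out


scalar_result = kernel_scalar()
expanded_result = kernel_expanded()
```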
However, this optimization alone still leaves many variables that need data expansion. The sharp growth of stack space is not effectively alleviated, which poses a great challenge to later program optimization. How to reduce unnecessary expansion of thread-private data and improve the execution efficiency of the program is therefore a technical problem that those skilled in the art urgently need to solve.
More related content is disclosed in the Chinese patent application published as CN101937367A.
Summary of the invention
The technical problem solved by the present invention is to reduce unnecessary expansion of thread-private data, alleviate the sharp growth of stack space, and improve the execution efficiency of the program.
To solve this problem, the invention provides a data expansion optimization method for merged execution of large-scale parallel threads, comprising:
identifying thread invariants in the parallel threads that are merged for execution, a thread invariant being a variable whose value is the same in every one of the merged parallel threads; and
during compilation, performing data expansion only on variables that are not thread invariants.
Optionally, identifying thread invariants comprises:
using the definition-use information of variables, the read-write dependences between variables and the structural features of the program to construct a complete graph and identify thread invariants automatically.
Optionally, constructing the complete graph and identifying thread invariants automatically comprises:
creating a thread-dependent variable information table;
scanning the syntax tree information function by function; if a variable is added to the thread-dependent variable information table during a scan, rescanning the syntax tree until the variables in the table no longer change; and
treating every variable that is not in the thread-dependent variable information table as a thread invariant.
Optionally, before the syntax tree information is scanned function by function, the method further comprises:
initializing the thread-dependent variable information table by adding to it the variables that can distinguish different threads.
Optionally, scanning the syntax tree information function by function comprises:
performing dependence detection on each variable newly added to the thread-dependent variable information table, and adding the variables that depend on the newly added variable to the table.
Optionally, performing dependence detection on a newly added variable and adding the variables that depend on it to the table comprises:
finding a statement related to a variable newly added to the current thread-dependent variable information table, finding the variable assigned in that statement, and adding that variable to the table;
if the statement is a conditionally executed code block, adding all variables assigned in the code block to the table; and
if the statement is a goto statement whose behavior is not the same for all threads, adding all variables involved between the goto statement and its corresponding label to the table.
Optionally, the thread-dependent variable information table is a linked list.
Optionally, the method further comprises:
identifying thread-invariant statements; and
applying a program code transformation to each thread-invariant statement according to its program structure features.
Optionally, a thread-invariant statement is a statement composed entirely of thread invariants and constants.
Optionally, identifying thread-invariant statements comprises: using the compiler to analyze program statements automatically and identify the thread-invariant statements.
Optionally, applying a program code transformation to a thread-invariant statement according to its program structure features comprises:
if the thread invariant forming the thread-invariant statement is a thread-local variable, computing the statement repeatedly, once in each thread;
if the thread invariant forming the thread-invariant statement is not a thread-local variable, analyzing the read-write properties of the thread invariant;
if the thread invariant has a write-after-read dependence within a thread loop, preserving and restoring the thread invariant by means of a copy;
if the thread invariant has no write-after-read dependence within a thread loop, either executing the statement under a guard (guarded execution) or computing it repeatedly; and
if all statements in the thread loop are thread-invariant statements under guarded execution, removing the control structure of the thread loop so that the statements in it are executed in a single pass.
Optionally, preserving and restoring by means of a copy comprises:
creating a copy of the thread invariant;
adding an assignment statement before the thread loop that assigns the current value of the thread invariant to the copy; and
adding an assignment statement at the very beginning of the thread loop body that sets the initial value of the thread invariant in each thread to the value of the copy.
Optionally, choosing between guarded execution and repeated computation of the thread-invariant statement comprises:
counting the number of consecutive thread-invariant statements eligible for guarded execution; and
when that number exceeds a preset threshold, executing the thread-invariant statements under a guard; otherwise, computing them repeatedly.
Optionally, the preset threshold is 1 to 10.
Optionally, guarded execution of a thread-invariant statement comprises:
adding an if statement around the thread-invariant statement whose condition tests whether the current thread is the first thread, so that the statement is executed only once, in the first thread.
Compared with the prior art, the technical solution of the invention has the following advantages:
1. By analyzing variable information, the invention identifies thread invariants and performs no data expansion on them. This reduces the amount of data expansion, alleviates the sharp growth of stack space after merging, relieves pressure on the scratch pad memory (SPM) or cache and reduces cache misses, creates favorable conditions for further program optimization, and improves the execution efficiency of the program.
2. In an optional scheme, thread-invariant statements are also identified and handled according to the features of the program structure, which guarantees the correctness of program execution.
3. In an optional scheme, thread invariants are identified by constructing a complete graph, which makes identification simpler and more convenient.
Brief description of the drawings
Fig. 1 is a schematic diagram of thread-private data expansion;
Fig. 2 is a schematic diagram of a thread-local variable;
Fig. 3 is a flowchart of the first embodiment of the data expansion optimization method of the invention for merged execution of large-scale parallel threads;
Fig. 4 is a flowchart of the step of identifying thread invariants in the first embodiment shown in Fig. 3;
Fig. 5 is a program listing for one embodiment of the step of identifying thread invariants;
Fig. 6 is a schematic diagram of the intermediate results of the embodiment shown in Fig. 5;
Fig. 7 is a flowchart of the second embodiment of the data expansion optimization method of the invention for merged execution of large-scale parallel threads;
Fig. 8 is a flowchart of the step of applying the program code transformation to thread-invariant statements in the second embodiment shown in Fig. 7;
Fig. 9 is a program listing for one embodiment of the step of applying the program code transformation to thread-invariant statements;
Fig. 10 is the program listing of the embodiment shown in Fig. 9 after the program code transformation.
Detailed description of the embodiments
Many specific details are set forth below to provide a full understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited by the specific embodiments disclosed below.
Furthermore, the invention is described in detail with reference to schematic diagrams. These diagrams are examples given for ease of explanation and should not limit the scope of protection of the invention.
The inventors found in practice that some variables, called thread invariants, have the same value in every one of the merged parallel threads. Such variables need not be expanded, which further reduces the amount of data expansion and thus alleviates the sharp growth of stack space. Thread invariants include the thread-local variables of the prior art but are not limited to them, so this method can eliminate more data expansion than the prior-art optimization.
The invention provides a data expansion optimization method for merged execution of large-scale parallel threads. Fig. 3 is a flowchart of the first embodiment of the method. As shown in Fig. 3, this embodiment comprises the following steps:
Step S101: identify thread invariants in the parallel threads that are merged for execution. A thread invariant has the same value in every one of the merged parallel threads.
Step S102: determine whether the current variable is a thread invariant. If so, perform step S103: during compilation, do not expand the current variable. If not, perform step S104: expand the current variable.
Step S105: determine whether every variable has been processed. If so, finish. Otherwise, loop back to step S102 and process the next variable.
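Steps S102 to S105 can be sketched as follows. This is an illustrative Python model rather than the patented compiler pass: the tuple representation of declarations, the function name expand_declarations and the sample variables are assumptions, and the set of thread invariants is taken as given (it would come from step S101).

```python
def expand_declarations(decls, thread_invariants, thread_num="THREAD_NUM"):
    """decls: list of (c_type, name, dims) tuples; returns C declaration strings."""
    result = []
    for c_type, name, dims in decls:        # S102/S105: visit each variable in turn
        if name in thread_invariants:       # S103: thread invariant, leave as is
            new_dims = dims
        else:                               # S104: add a leading per-thread dimension
            new_dims = [thread_num] + dims
        suffix = "".join(f"[{d}]" for d in new_dims)
        result.append(f"{c_type} {name}{suffix};")
    return result


decls = [("double", "a", []), ("int", "d", ["X", "Y"]), ("int", "stride", [])]
lines = expand_declarations(decls, thread_invariants={"stride"})
print("\n".join(lines))
```

In this sample, a and d are assumed to be thread-dependent and receive the extra THREAD_NUM dimension, while stride is assumed to be a thread invariant and keeps its scalar declaration.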
Fig. 4 is a flowchart of the step of identifying thread invariants in the first embodiment shown in Fig. 3. Specifically, the definition-use information of variables, the read-write dependences between variables and the structural features of the program can be used to construct a complete graph and identify thread invariants automatically. As shown in Fig. 4, the step of identifying thread invariants may comprise:
Step S1010: create a thread-dependent variable information table.
Step S1011: initialize the table by adding to it the variables that can distinguish different threads.
Then the syntax tree information is scanned repeatedly, function by function: dependence detection is performed on each variable newly added to the thread-dependent variable information table, and the variables that depend on it are added to the table, until the table stops growing. Specifically, step S1012 checks whether the table contains a newly added variable. If it does, step S1013 finds a statement related to that variable, and step S1014 finds the thread variable assigned in that statement and adds it to the table. Step S1015 checks whether the statement is a conditionally executed code block. If so, step S1016 adds all variables assigned in the code block to the table. If not, step S1017 checks whether the statement is a goto statement whose behavior is not the same for all threads. If it is, step S1018 adds all variables involved between the goto statement and its corresponding label to the table, and processing loops back to step S1012. If the statement is not a goto statement, or is a goto statement whose behavior is the same for all threads, processing loops back to step S1012 directly.
If the table contains no newly added variable, step S1019 is performed: every variable that is not in the thread-dependent variable information table is a thread invariant.
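The iteration in steps S1010 to S1019 amounts to a fixed-point computation, which the following Python sketch models. The statement encoding (dictionaries with the hypothetical keys kind, reads, target, assigned_in_block and vars_until_label) is an assumption, as is the reading that a conditional block or goto counts as thread-divergent when its condition reads a thread-dependent variable; a real implementation would walk the compiler's syntax tree instead.

```python
def find_thread_invariants(statements, private_vars, seeds):
    """Return the private variables that never become thread-dependent."""
    dependent = set(seeds)                  # S1011: variables that distinguish threads
    changed = True
    while changed:                          # rescan until the table stops growing
        changed = False
        for st in statements:
            if not (st["reads"] & dependent):
                continue                    # statement unrelated to the table
            if st["kind"] == "assign":      # S1014: taint the assigned variable
                tainted = {st["target"]}
            elif st["kind"] == "cond":      # S1016: taint everything assigned in block
                tainted = set(st["assigned_in_block"])
            else:                           # S1018: goto that diverges between threads
                tainted = set(st["vars_until_label"])
            new = (tainted & private_vars) - dependent
            if new:
                dependent |= new
                changed = True
    return private_vars - dependent         # S1019: the rest are thread invariants


program = [
    {"kind": "assign", "target": "x", "reads": {"thread_Idx"}},
    {"kind": "assign", "target": "y", "reads": {"n"}},
    {"kind": "cond", "reads": {"x"}, "assigned_in_block": {"z"}},
    {"kind": "assign", "target": "w", "reads": {"y"}},
]
invariants = find_thread_invariants(program, {"x", "y", "z", "w", "n"}, {"thread_Idx"})
print(sorted(invariants))
```

Here x depends on the thread index directly and z is assigned under a condition on x, so both are thread-dependent; y, w and n are never reached and remain thread invariants.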
Note that, as those skilled in the art will understand, the thread-dependent variable information table simply provides centralized storage for the thread-dependent variables. Its data structure may be a tree, an array, a queue, a linked list and so on; the invention places no specific limit on it.
The step of identifying thread invariants is described further below with reference to the drawings and a specific embodiment.
Fig. 5 is a program listing for one embodiment of the step of identifying thread invariants. Fig. 6 is a schematic diagram of the intermediate results of the embodiment shown in Fig. 5.
As shown in Fig. 5, in this embodiment N is the total number of threads, tid is the thread index of the thread loop, thread_Idx is the built-in thread index, and block_Idx is invariant across the threads of a thread_loop. A, B and As are global variables and need no data expansion. bx, tx, b, a and stride are thread variables, all of which would need data expansion under the prior art.
These thread variables are now examined; those found to be thread invariants need not be expanded.
Referring to Fig. 6, this embodiment uses a linked list to store the thread variable information.
First, a thread-dependent variable list is created.
Next, the list is initialized by adding thread_Idx, the built-in thread index, to it. thread_Idx is the variable that distinguishes one thread from another.
Because thread_Idx has been added, the list has grown, so dependence detection is performed on thread_Idx. One statement related to thread_Idx is found: tx[tid] = thread_Idx * bx[tid]. The variable assigned in this statement, tx, is added to the thread-dependent variable list. The statement is neither a goto statement nor a conditionally executed code block, so dependence detection for thread_Idx ends.
Because tx has been added to the thread-dependent variable list, dependence detection continues with tx. One statement related to tx is found: b[tid] += As[stride[tid] + tx[tid]]. The variable assigned in this statement, b, is added to the thread-dependent variable list. The statement is neither a goto statement nor a conditionally executed code block, so dependence detection for tx ends.
Because b has been added to the thread-dependent variable list, dependence detection continues with b. Three statements related to b are found:
1. b[tid] = 0, in which b is the assigned object; b is already in the thread-dependent variable list and need not be added again;
2. b[tid] += As[stride[tid] + tx[tid]], in which b is again the assigned object; b is already in the thread-dependent variable list and need not be added again;
3. B[tx[tid] + a[tid]] = b[idx], in which the assigned object B is a global variable, not a thread variable, and needs no data expansion.
None of these statements is a conditionally executed code block or a goto statement, so dependence detection for b ends, and no new variable has been added to the thread-dependent variable list.
This completes the identification of thread invariants: the stack variables tx and b, which are in the thread-dependent variable list, are thread-dependent and need data expansion; the stack variables bx, a and stride, which are not in the list, are thread invariants and need no data expansion.
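The walk-through above can be replayed with a small worklist variant of the analysis that records the order in which variables enter the list. Fig. 5 is not reproduced here, so the statement list below is a reconstruction from the statements quoted in the text of both embodiments and is an assumption, as are the tuple encoding and the variable names stmts, stack_vars and table.

```python
# Each entry is (assigned variable, set of variables read), without [tid] subscripts.
stmts = [
    ("bx", {"block_Idx"}),                 # bx = block_Idx
    ("tx", {"thread_Idx", "bx"}),          # tx = thread_Idx * bx
    ("a", {"bx"}),                         # a = bx * 32
    ("stride", set()),                     # stride = 512
    ("b", set()),                          # b = 0
    ("b", {"b", "As", "stride", "tx"}),    # b += As[stride + tx]
    ("stride", {"stride"}),                # stride >>= 1
    ("B", {"tx", "a", "b"}),               # B[tx + a] = b  (B is global)
]
stack_vars = {"bx", "tx", "a", "stride", "b"}

table = ["thread_Idx"]                     # initialised with the built-in thread index
i = 0
while i < len(table):                      # process each newly added variable once
    var = table[i]
    i += 1
    for target, reads in stmts:
        if var in reads and target in stack_vars and target not in table:
            table.append(target)

thread_invariants = sorted(stack_vars - set(table))
print(table, thread_invariants)
```

Under these assumptions the list grows in the same order as in the text (thread_Idx, then tx, then b), and bx, a and stride are left over as thread invariants.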
Fig. 7 is a flowchart of the second embodiment of the data expansion optimization method of the invention for merged execution of large-scale parallel threads. Unlike the previous embodiment, this embodiment also applies a code transformation to the thread-invariant statements, to guarantee the correctness of program execution. As shown in Fig. 7, this embodiment comprises the following steps:
Step S201: identify thread invariants in the parallel threads that are merged for execution.
Step S202: determine whether the current variable is a thread invariant. If not, perform step S206: expand the current variable.
If the current variable is a thread invariant, perform step S203: do not expand the current variable. Then perform step S204: identify the thread-invariant statements. A thread-invariant statement is a statement composed entirely of thread invariants and constants. Specifically, the compiler can be used to analyze program statements automatically and identify the thread-invariant statements. Those skilled in the art will understand that several existing compilers already provide this capability.
Step S205: apply a program code transformation to each thread-invariant statement according to its program structure features.
Step S207: determine whether every variable has been processed. If so, finish. If not, return to step S202 and process the next variable.
Fig. 8 is a flowchart of the step of applying the program code transformation to thread-invariant statements in the second embodiment shown in Fig. 7. As shown in Fig. 8, this step comprises:
Step S2050: check whether the thread invariant forming the thread-invariant statement is a thread-local variable. If it is, perform step S2051: leave the statement untransformed, to avoid complicating the program structure, so that in merged execution the statement is simply computed repeatedly.
If the thread invariant forming the statement is not a thread-local variable, perform step S2052: check whether the thread invariant has a write-after-read dependence within a thread loop. If it does, the variable is preserved and restored by means of a copy. Specifically, a copy of the thread invariant is created, and an assignment statement is added before the whole thread loop that assigns the current value of the thread invariant to the copy. In addition, an assignment statement is added at the very beginning of the thread loop body that sets the initial value of the thread invariant in each thread to the value of the copy.
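The effect of the copy can be seen in a minimal Python simulation. The loop body (read stride, then shift it right by one), the initial value 512 and the names merged_loop and stride_0 are assumptions for illustration; the variable stride is kept as a single unexpanded scalar, as it would be after being classified as a thread invariant.

```python
THREAD_NUM = 4  # illustrative number of merged threads


def merged_loop(use_copy):
    """Serialize THREAD_NUM threads that each read stride and then halve it."""
    stride = 512
    seen = []
    stride_0 = stride                 # copy taken before the thread loop
    for tid in range(THREAD_NUM):
        if use_copy:
            stride = stride_0         # restore at the top of every iteration
        seen.append(stride)           # read ...
        stride >>= 1                  # ... then write: write-after-read dependence
    return seen, stride


with_copy = merged_loop(use_copy=True)
without_copy = merged_loop(use_copy=False)
print(with_copy, without_copy)
```

With the copy, every simulated thread observes the same initial value and the loop leaves stride at the value one thread would have produced. Without it, each iteration inherits the previous iteration's update, which no longer matches the per-thread semantics.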
If there is no write-after-read dependence, perform step S2054: check whether the number of consecutive thread-invariant statements eligible for guarded execution exceeds the preset threshold. If not, perform step S2051 and compute the statements repeatedly, which avoids adding if statements that would complicate the program structure and might hurt efficiency through frequent condition tests. If the number exceeds the preset threshold, perform step S2055 and execute the statements under a guard. Specifically, an if block is added around the thread-invariant statement whose condition tests whether the current thread is the first thread, so that the statement is executed once only, by the first thread. Moreover, several consecutive guarded statements can be merged into the same if block.
Note that taking the number of consecutive guarded statements into account is a trade-off between the amount of computation and program complexity. With guarded execution, the thread-invariant statement is executed only once, which reduces computation, but the added if test makes the program more complex. With repeated computation, the statement is left untransformed: computation increases, but program complexity is unchanged. After repeated experiments, the inventors prefer a preset threshold in the range 1 to 10: when the number of consecutive statements eligible for guarded execution is greater than or equal to the threshold, all of them are executed under a guard and merged into one if block; when it is smaller than the threshold, all of them are computed repeatedly, which avoids adding if statements that would complicate the program structure and might hurt efficiency through frequent condition tests.
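The threshold rule can be sketched as a small planning function. The text says both "exceeds" and "greater than or equal to" for the comparison; this sketch assumes the latter. The boolean encoding of statements, the function name plan_statements and the labels "guarded", "repeated" and "normal" are illustrative assumptions.

```python
from itertools import groupby


def plan_statements(eligible, threshold):
    """eligible[i] is True if statement i could be executed under a guard."""
    plan = []
    for is_eligible, run in groupby(eligible):   # consecutive runs of equal flags
        n = len(list(run))
        if not is_eligible:
            plan += ["normal"] * n               # ordinary thread-dependent code
        elif n >= threshold:
            plan += ["guarded"] * n              # one shared if (tid == 0) block
        else:
            plan += ["repeated"] * n             # too few to justify an if block
    return plan


result = plan_statements([True, False, True, True, True, False], threshold=3)
print(result)
```

In this example the isolated eligible statement is recomputed in every thread, while the run of three reaches the threshold and is placed under one guard.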
Continuing with Fig. 8: after guarded execution has been chosen in step S2055, step S2056 checks whether all statements in the thread loop are thread-invariant statements under guarded execution. If so, step S2057 removes the control structure of the thread loop, and the statements in the loop are executed in a single pass.
The technical solution of the invention is described further below with reference to the drawings and a specific embodiment.
Fig. 9 is a program listing for one embodiment of the step of applying the program code transformation to thread-invariant statements. Fig. 10 is the program listing of the embodiment shown in Fig. 9 after the program code transformation. The following refers to Fig. 9 and Fig. 10 together.
Following the order of the program code, the first thread-invariant statement found is bx = block_Idx. The thread invariant bx is defined and used only within this thread loop, so it is a thread-local variable. The statement therefore needs no transformation and can be computed repeatedly within the thread loop.
The next thread-invariant statement found is a = bx * 32. The variable a is also referenced outside this thread loop (in the statement B[tx[tid] + a] = b[idx];), so a is not a thread-local variable and the statement need not be computed repeatedly. Since a has no write-after-read dependence in this thread loop, the statement is executed under a guard: an if statement whose condition tests for the first thread, if (tid == 0), is added around it. Note that, depending on the threshold setting, repeated computation may be chosen for this statement instead.
The next thread-invariant statement found is stride = 512. It is the only statement in its thread loop, and it is a thread-invariant statement eligible for guarded execution (the analysis is the same as for a = bx * 32 and is not repeated here). The control structure of the loop can therefore be removed and the statement executed in a single pass: the thread_loop statement is removed and stride = 512 is executed only once.
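The three treatments discussed so far (repeated computation, guarded execution, and removal of the loop control structure) can be compared in a short Python simulation. Figs. 9 and 10 are not reproduced here; the function names, the evaluation counter and the sample value bx = 3 are assumptions for illustration.

```python
THREAD_NUM = 4  # illustrative number of merged threads


def repeated(bx):
    """Untransformed: every simulated thread recomputes a = bx * 32."""
    evaluations = 0
    a = None
    for tid in range(THREAD_NUM):
        a = bx * 32
        evaluations += 1
    return a, evaluations


def guarded(bx):
    """Guarded execution: only the first thread computes a."""
    evaluations = 0
    a = None
    for tid in range(THREAD_NUM):
        if tid == 0:
            a = bx * 32
            evaluations += 1
    return a, evaluations


def loop_removed():
    """Thread loop whose only statement is guarded: drop the loop entirely."""
    stride = 512
    return stride


print(repeated(3), guarded(3), loop_removed())
```

Both loop variants leave a with the same value; guarded execution evaluates the statement once instead of THREAD_NUM times, and a loop containing only guarded statements collapses to straight-line code.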
The next thread-invariant statement found is stride >>= 1. The thread invariant stride has a write-after-read dependence in this thread loop, so it must be preserved and restored by means of a copy. Specifically, an assignment statement is added before the loop starts to record the current value in a copy, _stride_0 = stride, and an assignment statement is added at the very beginning of the loop body to restore stride to the copied value, stride = _stride_0.
Note that, from the above description of the embodiments, those skilled in the art will clearly understand that the invention can be implemented, in part or in whole, by software combined with the necessary general-purpose hardware platform. On this understanding, the technical solution of the invention, or the part of it that contributes over the prior art, can in essence be embodied as a software product. The computer software product may include one or more machine-readable media storing machine-executable instructions which, when executed by one or more machines such as a computer, a computer network or other electronic devices, cause the machine or machines to perform operations according to embodiments of the invention. Machine-readable media include, but are not limited to, floppy disks, optical discs, CD-ROM (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, and other types of media suitable for storing machine-executable instructions.
The invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Although the invention is disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the invention, use the methods and technical content disclosed above to make possible changes and modifications to the technical solution of the invention. Therefore, any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the invention.

Claims (15)

1. A data expansion optimization method for merged execution of large-scale parallel threads, characterized by comprising:
identifying thread invariants in the parallel threads that are merged for execution, wherein a thread invariant remains identical across all of the parallel threads merged for execution;
during compilation, performing data expansion only on variables that are not thread invariants;
wherein identifying the thread invariants comprises:
using the definition-use information of variables, the read-write dependences between variables, and the structural features of the program to construct a complete graph and automatically identify the thread invariants;
wherein constructing the complete graph and automatically identifying the thread invariants comprises:
establishing a thread-dependent variable information table;
scanning syntax tree information function by function; if a variable is newly added to the thread-dependent variable information table while the syntax tree is being scanned, rescanning the syntax tree until the variables in the thread-dependent variable information table no longer change;
variables that are not in the thread-dependent variable information table being the thread invariants;
wherein, before the syntax tree information is scanned function by function, the method further comprises:
initializing the thread-dependent variable information table by adding to it the variables that can distinguish different threads;
wherein scanning the syntax tree information function by function comprises:
performing dependence detection on the variables newly added to the thread-dependent variable information table, and adding the variables that depend on the newly added variables to the thread-dependent variable information table;
wherein performing dependence detection on the newly added variables and adding the variables that depend on them to the thread-dependent variable information table comprises:
finding each statement that involves a variable newly added to the current thread-dependent variable information table, finding the variable assigned in that statement, and adding that variable to the thread-dependent variable information table;
if the statement is a conditionally executed code block, adding all variables assigned in that code block to the thread-dependent variable information table;
if the statement is a goto statement whose behavior is not consistent across all threads, adding all variables involved between the goto statement and its corresponding label to the thread-dependent variable information table.
2. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 1, characterized in that the thread-dependent variable information table is a linked list.
3. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 1, characterized in that the method further comprises:
identifying thread-invariant statements;
performing program code transformation on the thread-invariant statements according to their program structure features.
4. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 3, characterized in that the thread-invariant statements comprise: statements composed entirely of thread invariants and constants.
5. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 3, characterized in that identifying the thread-invariant statements comprises: using the compiler to analyze program statements automatically and identify the thread-invariant statements.
6. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 3, characterized in that the step of performing program code transformation on a thread-invariant statement according to its program structure features comprises:
if the thread invariant forming the thread-invariant statement is a thread-local variable, applying repeated computation to the thread-invariant statement;
if the thread invariant forming the thread-invariant statement is not a thread-local variable, analyzing the read-write properties of the thread invariant;
if the thread invariant has a write-after-read dependence within a thread loop, saving and restoring the thread invariant by means of a copy;
if the thread invariant has no write-after-read dependence within a thread loop, applying either guarded execution or repeated computation to the thread-invariant statement;
if all statements in the thread loop are thread-invariant statements under guarded execution, removing the control structure of the thread loop and executing the statements in the thread loop in a single pass.
7. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 6, characterized in that saving and restoring by means of a copy comprises:
creating a copy of the thread invariant;
adding, before the thread loop, an assignment statement that assigns the current value of the thread invariant to the copy;
adding, at the very beginning of the thread loop body, an assignment statement that sets the initial value of the thread invariant in each thread to the value of the copy.
8. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 6, characterized in that applying either guarded execution or repeated computation to the thread-invariant statement comprises:
counting the number of consecutive thread-invariant statements eligible for guarded execution;
when this number of thread-invariant statements exceeds a preset threshold, applying guarded execution to the thread-invariant statements; otherwise, applying repeated computation to the thread-invariant statements.
9. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 8, characterized in that the preset threshold is 1 to 10.
10. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 8, characterized in that the step of applying guarded execution to the thread-invariant statement comprises:
adding an if statement around the thread-invariant statement, the condition of the if statement being whether the current thread number is that of the first thread, so that the thread-invariant statement is executed only once, in the first thread.
11. A data expansion optimization method for merged execution of large-scale parallel threads, characterized by comprising:
identifying thread invariants in the parallel threads that are merged for execution, wherein a thread invariant remains identical across all of the parallel threads merged for execution;
during compilation, performing data expansion only on variables that are not thread invariants;
wherein the method further comprises:
identifying thread-invariant statements;
performing program code transformation on the thread-invariant statements according to their program structure features;
wherein the step of performing program code transformation on a thread-invariant statement according to its program structure features comprises:
if the thread invariant forming the thread-invariant statement is a thread-local variable, applying repeated computation to the thread-invariant statement;
if the thread invariant forming the thread-invariant statement is not a thread-local variable, analyzing the read-write properties of the thread invariant;
if the thread invariant has a write-after-read dependence within a thread loop, saving and restoring the thread invariant by means of a copy;
if the thread invariant has no write-after-read dependence within a thread loop, applying either guarded execution or repeated computation to the thread-invariant statement;
if all statements in the thread loop are thread-invariant statements under guarded execution, removing the control structure of the thread loop and executing the statements in the thread loop in a single pass.
12. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 11, characterized in that saving and restoring by means of a copy comprises:
creating a copy of the thread invariant;
adding, before the thread loop, an assignment statement that assigns the current value of the thread invariant to the copy;
adding, at the very beginning of the thread loop body, an assignment statement that sets the initial value of the thread invariant in each thread to the value of the copy.
13. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 11, characterized in that applying either guarded execution or repeated computation to the thread-invariant statement comprises:
counting the number of consecutive thread-invariant statements eligible for guarded execution;
when this number of thread-invariant statements exceeds a preset threshold, applying guarded execution to the thread-invariant statements; otherwise, applying repeated computation to the thread-invariant statements.
14. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 13, characterized in that the preset threshold is 1 to 10.
15. The data expansion optimization method for merged execution of large-scale parallel threads according to claim 13, characterized in that the step of applying guarded execution to the thread-invariant statement comprises:
adding an if statement around the thread-invariant statement, the condition of the if statement being whether the current thread number is that of the first thread, so that the thread-invariant statement is executed only once, in the first thread.
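
For illustration only, and not as part of the claims, the iterative identification of thread invariants recited in claim 1 can be sketched in C. The sketch makes several simplifying assumptions: a function is modeled as a flat list of assignments rather than a syntax tree, the table is a bitmask rather than a linked list, conditionally executed blocks and goto statements are not handled, and all names (Assign, thread_dependent_vars, the variable ids TX, BX, A, B, I) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TX, BX, A, B, I, NUM_VARS };  /* hypothetical variable ids */

typedef struct {
    int target;        /* variable assigned by the statement */
    unsigned sources;  /* bitmask of variables read by the statement */
} Assign;

/* Rescan the statements until the thread-dependent table stops growing. */
unsigned thread_dependent_vars(const Assign *stmts, size_t n, unsigned seeds) {
    unsigned table = seeds;  /* initialized with thread-distinguishing variables */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            bool reads_dependent = (stmts[i].sources & table) != 0;
            bool already_listed = ((table >> stmts[i].target) & 1u) != 0;
            if (reads_dependent && !already_listed) {
                table |= 1u << stmts[i].target;  /* newly added variable */
                changed = true;                  /* forces another scan */
            }
        }
    }
    return table;
}

/* A variable absent from the table is treated as a thread invariant. */
bool is_thread_invariant(unsigned table, int var) {
    return ((table >> var) & 1u) == 0;
}

/* Example: i = b; b = a + tx; a = bx * 32; seeded with the thread index tx. */
unsigned demo_table(void) {
    const Assign stmts[] = {
        { I, 1u << B },                 /* i = b       */
        { B, (1u << A) | (1u << TX) },  /* b = a + tx  */
        { A, 1u << BX },                /* a = bx * 32 */
    };
    return thread_dependent_vars(stmts, sizeof stmts / sizeof stmts[0], 1u << TX);
}
```

In this example the first scan adds b, a second scan adds i, and a third scan finds no change; a and bx never enter the table and are therefore treated as thread invariants, so only b and i would need data expansion.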

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210441329.2A CN102981839B (en) 2012-11-06 2012-11-06 Merge the Data expansion optimization method performing large-scale parallel thread

Publications (2)

Publication Number Publication Date
CN102981839A CN102981839A (en) 2013-03-20
CN102981839B true CN102981839B (en) 2015-08-12

Family

ID=47855904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210441329.2A Active CN102981839B (en) 2012-11-06 2012-11-06 Merge the Data expansion optimization method performing large-scale parallel thread

Country Status (1)

Country Link
CN (1) CN102981839B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391742B (en) * 2014-11-11 2019-03-01 小米科技有限责任公司 Optimizing application method and apparatus
CN110069243B (en) * 2018-10-31 2023-03-03 上海奥陶网络科技有限公司 Java program thread optimization method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
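FILIC: an interactive library function framework on CUDA; 吴伟 et al.; 《计算机科学》 (Computer Science); 20120330; Vol. 39, No. 3; full text *
John A. Stratton et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. Languages and Compilers for Parallel Computing. 2008 *
Research and implementation of instruction scheduling techniques in dynamic binary translation; 孙俊 et al.; 《计算机应用与软件》 (Computer Applications and Software); 20080131; Vol. 25, No. 1; full text *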

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant