CN103955406A - Super block-based based speculation parallelization method - Google Patents
- Publication number
- CN103955406A
- Authority
- CN
- China
- Prior art keywords
- superblock
- dependence
- register
- executable file
- static analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses a superblock-based speculative parallelization method comprising the following steps: dividing the program in an executable file into superblocks; statically analyzing the register data dependences between the superblocks; eliminating from the executable file the anti-dependences and output dependences found by static analysis; writing the true data dependences found by static analysis into the executable file; and speculatively executing the program in the executable file on a multi-core processor with superblocks as the scheduling units. The method eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor uses superblocks as scheduling units and avoids speculatively executing, at the same time, superblocks that have true data dependences between them, the method can improve program parallelism and reduce the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.
Description
Technical field
The present invention relates to a computer parallelization method, and in particular to a superblock-based speculative parallelization method in the fields of compiler technology and computer architecture.
Background
Multi-core processors are the development trend of current computer processors, yet programmers are not accustomed to writing parallel programs. Although processors have more and more cores, multi-core processors are underutilized in most cases.
The superblock is a compilation technique designed for very long instruction word (VLIW) processors and superscalar processors. It breaks through the limits of the program's basic blocks and further exploits the parallelism of the program.
At present, processors use the thread as the basic scheduling unit. This scheduling granularity is coarser than a superblock, so further parallelism can still be exploited.
Speculative execution is a hardware technique that lets instructions execute out of order while still completing in order. When the superblock is introduced as the processor's basic scheduling unit, the scheduling granularity becomes finer, which increases the risk of speculation failure.
Summary of the invention
The object of the invention is to provide a superblock-based speculative parallelization method that makes full use of the cores of a multi-core processor.
The technical solution adopted by the invention comprises the following steps:
1) dividing the program in the executable file into superblocks;
2) statically analyzing the register data dependences between superblocks;
3) eliminating from the executable file the anti-dependences and output dependences obtained by static analysis;
4) writing the true data dependences obtained by static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with superblocks as the scheduling units.
The static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the summary obtained in step 2.1).
In step 5), the cores of the multi-core processor share a register file, and during speculative execution, superblocks that have true data dependences between them are not executed on the multi-core processor at the same time.
The beneficial effects of the invention are:
The invention eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor uses superblocks as scheduling units and avoids speculatively executing, at the same time, superblocks that have true data dependences between them, the invention can improve program parallelism and reduce the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the invention.
Fig. 2 is an example of a data flow graph of true data dependences according to the invention.
Fig. 3 is an example of the invention scheduling superblocks onto a three-core processor.
Fig. 4 is a schematic diagram of the complete process of an embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the invention comprises the following steps:
1) dividing the program in the executable file into superblocks;
2) statically analyzing the register data dependences between superblocks;
3) eliminating from the executable file the anti-dependences and output dependences obtained by static analysis;
4) writing the true data dependences obtained by static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with superblocks as the scheduling units.
The static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the summary obtained in step 2.1).
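Steps 2.1) and 2.2) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the instruction encoding as (destination registers, source registers) tuples, the function names `summarize` and `classify`, and the assumption that superblocks run in ascending numeric order are all hypothetical. The classification is also conservative, since it ignores intervening writes that would kill a dependence.

```python
from itertools import combinations

def summarize(instructions):
    """Step 2.1 (sketch): collect the registers a superblock reads and writes.

    `instructions` is a list of (dest_regs, src_regs) tuples, a hypothetical
    decoded form of the superblock's instructions.
    """
    reads, writes = set(), set()
    for dests, srcs in instructions:
        # A read only matters to other blocks if this block has not
        # already produced the value itself.
        reads |= {r for r in srcs if r not in writes}
        writes |= set(dests)
    return reads, writes

def classify(blocks):
    """Step 2.2 (sketch): derive true, anti and output dependences.

    `blocks` maps a superblock number to its instruction list; program order
    is assumed to follow ascending numbers.
    """
    summary = {b: summarize(ins) for b, ins in blocks.items()}
    deps = {"true": set(), "anti": set(), "output": set()}
    for early, late in combinations(sorted(blocks), 2):
        e_reads, e_writes = summary[early]
        l_reads, l_writes = summary[late]
        for r in e_writes & l_reads:      # later block reads what earlier wrote
            deps["true"].add((early, late, r))
        for r in e_reads & l_writes:      # later block overwrites what earlier read
            deps["anti"].add((early, late, r))
        for r in e_writes & l_writes:     # both blocks write the same register
            deps["output"].add((early, late, r))
    return deps

# Four illustrative superblocks with one dependence of each kind.
blocks = {
    1: [(["r1"], ["r0"])],   # r1 <- f(r0)
    2: [(["r2"], ["r1"])],   # r2 <- f(r1): reads r1 written by block 1
    3: [(["r0"], ["r3"])],   # r0 <- f(r3): overwrites r0 read by block 1
    4: [(["r2"], ["r3"])],   # r2 <- f(r3): overwrites r2 written by block 2
}
deps = classify(blocks)
# deps["true"]   == {(1, 2, "r1")}
# deps["anti"]   == {(1, 3, "r0")}
# deps["output"] == {(2, 4, "r2")}
```

Each tuple names the earlier superblock, the later superblock and the register involved. Only the "true" set has to survive into the schedule; the other two are candidates for elimination.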
In step 5), the cores of the multi-core processor share a register file, which makes it easy to synchronize registers between cores, and during speculative execution, superblocks that have true data dependences between them are not executed on the multi-core processor at the same time. The invention needs to synchronize registers between the processor cores because data dependences may exist between superblocks executed in two consecutive rounds of speculative execution; this synchronization is carried out through the shared register file.
The multi-core processor is a processor with two or more cores.
Data dependences between superblocks fall into two classes: register data dependences and memory data dependences. Register numbers can be obtained directly from the instructions, so the register read/write dependences between superblocks can be obtained by static analysis. For memory reads and writes, the address is generally a register base address plus an offset, and the base address often can only be determined at run time, so memory data dependences between superblocks cannot be obtained by static analysis.
Data dependences that cannot be analyzed statically can only be speculated on at run time, which inevitably introduces the risk of speculation failure. Static analysis beforehand therefore obtains at least the register read/write dependences between superblocks, which avoids speculative register reads and so removes that part of the speculation risk.
For register data dependences between superblocks, the anti-dependences and output dependences between superblocks should be eliminated and the true data dependences retained. An anti-dependence is also called a write-after-read (WAR) dependence, an output dependence is also called a write-after-write (WAW) dependence, and a true data dependence is also called a read-after-write (RAW) dependence.
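The text does not say how the anti-dependences and output dependences are removed from the executable file. One common technique, assumed here purely for illustration, is static register renaming: every write gets a fresh register name and later reads follow the most recent name. The sketch below uses a hypothetical (destination, sources) instruction form, assumes a straight-line sequence of superblocks, and ignores the finite size of a real register file.

```python
def rename_registers(blocks):
    """Give every register write a fresh name; reads follow the latest name.

    `blocks` is a list of superblocks in program order, each a list of
    (dest, srcs) instructions (a hypothetical encoding). Returns a renamed
    copy in which only read-after-write (RAW) dependences remain.
    """
    current = {}   # original register name -> latest renamed version
    counter = {}   # original register name -> number of versions issued
    renamed = []
    for block in blocks:
        new_block = []
        for dest, srcs in block:
            # Sources are rewritten first so they see the value produced
            # by the most recent earlier write.
            new_srcs = [current.get(s, s) for s in srcs]
            counter[dest] = counter.get(dest, 0) + 1
            new_dest = f"{dest}_{counter[dest]}"
            current[dest] = new_dest
            new_block.append((new_dest, new_srcs))
        renamed.append(new_block)
    return renamed

blocks = [
    [("r1", ["r0"])],   # B1: reads r0, writes r1
    [("r0", ["r1"])],   # B2: RAW on r1 from B1, WAR on r0 against B1
    [("r1", ["r2"])],   # B3: WAW on r1 against B1, WAR on r1 against B2
]
renamed = rename_registers(blocks)
# renamed == [[("r1_1", ["r0"])], [("r0_1", ["r1_1"])], [("r1_2", ["r2"])]]
```

After renaming, B2 still reads the value B1 produced (now called r1_1), so the RAW dependence is kept, while the WAR and WAW conflicts disappear because no two blocks write, or overwrite a previously read, register of the same name.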
In a specific implementation, a data flow graph is introduced during static analysis to represent the true data dependences. As shown in Fig. 2, each box in the figure represents a superblock, and the number in the box is the superblock's sequence number. An arrow between boxes means that data written by the superblock at the tail of the arrow is read by the superblock at the head of the arrow; that is, the superblock at the head depends on the superblock at the tail. In the figure, superblock 5 depends on superblocks 2, 3 and 4, and superblocks 2, 3 and 4 depend on superblock 1.
After static analysis, the data flow graph representing the true data dependences is written into the executable file. Scheduling the superblocks according to the data flow graph stored in the executable file prevents different superblocks that have a true data dependence between them from executing at the same time.
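As a rough illustration of scheduling from the data flow graph, the sketch below groups superblocks into rounds so that a superblock runs only after everything it depends on has finished in an earlier round. The greedy round-based policy, the function name `schedule` and the dictionary encoding of the graph are assumptions for illustration; the text does not prescribe a particular scheduling algorithm.

```python
def schedule(deps, blocks, n_cores):
    """Group superblocks into rounds with no true dependence inside a round.

    `deps` maps a superblock to the set of superblocks it depends on (the
    data flow graph); `n_cores` caps how many superblocks run per round.
    """
    done, rounds = set(), []
    remaining = sorted(blocks)
    while remaining:
        # A block is ready once all of its producers ran in earlier rounds,
        # so two blocks in the same round never depend on each other.
        ready = [b for b in remaining if deps.get(b, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependence in data flow graph")
        batch = ready[:n_cores]
        rounds.append(batch)
        done.update(batch)
        remaining = [b for b in remaining if b not in done]
    return rounds

# The graph described for Fig. 2: 2, 3 and 4 depend on 1; 5 depends on 2, 3, 4.
fig2 = {2: {1}, 3: {1}, 4: {1}, 5: {2, 3, 4}}
three_cores = schedule(fig2, range(1, 6), 3)   # [[1], [2, 3, 4], [5]]
two_cores = schedule(fig2, range(1, 6), 2)     # [[1], [2, 3], [4], [5]]
```

With three cores the five superblocks finish in three rounds; with two cores the middle layer of the graph has to be split, which costs one extra round.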
Example of the invention:
Fig. 3 illustrates an example of scheduling the superblocks shown in Fig. 2 on a processor with at least three cores. C1, C2 and C3 denote three cores of the processor; B1, B2, B3, B4 and B5 denote the five superblocks; and T1, T2 and T3 denote three time periods.
When the invention is implemented, the absence of register data dependences between superblocks that run at the same time does not mean that registers need not be synchronized between cores. Superblocks in two consecutive rounds of speculative execution can still have register data dependences between them, and these superblocks may be located on different cores. Therefore, before each round of speculative execution starts, the register values of all processor cores need to be synchronized.
For example, suppose there are four superblocks B0, B1, B2 and B3 running on a processor with two cores, C0 and C1. In the first round of speculative execution, B0 and B1 are scheduled onto core C0 and core C1 respectively, and there is no register data dependence between B0 and B1. In the second round of speculative execution, B2 and B3 are scheduled onto core C0 and core C1 respectively, and there is no register data dependence between B2 and B3 either, so this schedule meets the requirement. However, if some register r read by B3 was written by B0, then before the second round of speculative execution starts, the value of r must be synchronized from core C0 to core C1.
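The two-core example can be replayed with a small sketch that works out which register values have to move between cores before each round. It assumes, for illustration only, that each core holds its own copy of the registers and that the read and write sets of every superblock are already known; the names `sync_plan`, `reads` and `writes` are hypothetical. The text describes synchronizing the registers of all cores before each round, whereas this sketch only lists the transfers that are strictly required.

```python
def sync_plan(rounds, reads, writes):
    """List the register transfers needed before each round starts.

    `rounds` is a list of dicts mapping core -> superblock for that round;
    `reads` and `writes` map a superblock to the registers it reads/writes.
    Returns one sorted list of (register, from_core, to_core) per round.
    """
    last_writer_core = {}   # register -> core holding its latest value
    plan = []
    for assignment in rounds:
        transfers = set()
        for core, block in assignment.items():
            for reg in reads[block]:
                src = last_writer_core.get(reg)
                if src is not None and src != core:
                    transfers.add((reg, src, core))
        plan.append(sorted(transfers))
        # Writes of this round become visible only to later rounds.
        for core, block in assignment.items():
            for reg in writes[block]:
                last_writer_core[reg] = core
    return plan

rounds = [{"C0": "B0", "C1": "B1"}, {"C0": "B2", "C1": "B3"}]
writes = {"B0": {"r"}, "B1": set(), "B2": set(), "B3": set()}
reads = {"B0": set(), "B1": set(), "B2": set(), "B3": {"r"}}
plan = sync_plan(rounds, reads, writes)
# plan == [[], [("r", "C0", "C1")]]
```

Nothing has to move before the first round, and before the second round the value of r written by B0 on C0 must reach C1, where B3 reads it.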
To reduce the latency of register synchronization, the cores share a register file, which is managed by register renaming. During speculative execution, a processor core must not write a logical register directly; the physical register it writes is not immediately mapped to a logical register, and the mapping is established only at commit.
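A minimal software model of this renaming scheme is sketched below. The class names `SharedRegisterFile` and `SpeculativeBlock`, the `squash` operation for a failed speculation, and the unbounded list of physical registers are assumptions made for illustration; real hardware would recycle physical registers and handle commit ordering.

```python
class SharedRegisterFile:
    """Register file shared by all cores (illustrative model)."""

    def __init__(self, logical_names):
        self.physical = []     # physical register storage
        self.committed = {}    # logical name -> physical register index
        for name in logical_names:
            self.committed[name] = self.alloc(0)

    def alloc(self, value):
        """Allocate a new physical register holding `value`."""
        self.physical.append(value)
        return len(self.physical) - 1

    def read(self, name):
        """Read the committed value of a logical register."""
        return self.physical[self.committed[name]]


class SpeculativeBlock:
    """One superblock executing speculatively on some core."""

    def __init__(self, regfile):
        self.regfile = regfile
        self.pending = {}      # logical name -> uncommitted physical index

    def read(self, name):
        # A block sees its own speculative writes first.
        if name in self.pending:
            return self.regfile.physical[self.pending[name]]
        return self.regfile.read(name)

    def write(self, name, value):
        # Never touch the logical register directly: write a fresh
        # physical register and remember the tentative mapping.
        self.pending[name] = self.regfile.alloc(value)

    def commit(self):
        # Only now do the physical registers become the logical ones.
        self.regfile.committed.update(self.pending)
        self.pending.clear()

    def squash(self):
        # Failed speculation: drop the tentative mappings.
        self.pending.clear()


rf = SharedRegisterFile(["r0", "r1"])
good = SpeculativeBlock(rf)
good.write("r0", 7)
before_commit = rf.read("r0")   # still 0: the write is not yet visible
good.commit()
after_commit = rf.read("r0")    # 7

bad = SpeculativeBlock(rf)
bad.write("r1", 9)
bad.squash()
after_squash = rf.read("r1")    # 0: the squashed write never became visible
```

Because the committed mapping lives in one shared structure, a committed value is immediately readable from any core without a copy, which is the latency benefit the paragraph above refers to.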
Fig. 4 illustrates a complete process of implementing the invention. The four arrows in the figure, from left to right, divide it into five parts. The first part is the executable file that the invention needs to process. The second part corresponds to steps 1) and 2) of the invention: dividing the program in the executable file into superblocks and statically analyzing the register data dependences between superblocks. The third part corresponds to steps 3) and 4): eliminating from the executable file the anti-dependences and output dependences obtained by static analysis, and writing the true data dependences obtained by static analysis into the executable file. The fourth and fifth parts correspond to step 5): speculatively executing the program in the executable file on a multi-core processor, with superblocks as the scheduling units. The fourth part is done by the operating system, and the fifth part is the work of the processor. The fifth part corresponds to Fig. 3.
The above embodiments are intended to explain the invention, not to limit it. Any modification or change made to the invention within its spirit and the scope of protection of the claims falls within the scope of protection of the invention.
Claims (3)
1. A superblock-based speculative parallelization method, characterized by comprising the following steps:
1) dividing the program in the executable file into superblocks;
2) statically analyzing the register data dependences between superblocks;
3) eliminating from the executable file the anti-dependences and output dependences obtained by static analysis;
4) writing the true data dependences obtained by static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with superblocks as the scheduling units.
2. The superblock-based speculative parallelization method according to claim 1, characterized in that the static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the summary obtained in step 2.1).
3. The superblock-based speculative parallelization method according to claim 1, characterized in that in step 5) the cores of the multi-core processor share a register file, and during speculative execution, superblocks that have true data dependences between them are not executed on the multi-core processor at the same time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410146566.5A CN103955406A (en) | 2014-04-14 | 2014-04-14 | Super block-based based speculation parallelization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410146566.5A CN103955406A (en) | 2014-04-14 | 2014-04-14 | Super block-based based speculation parallelization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103955406A true CN103955406A (en) | 2014-07-30 |
Family
ID=51332681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410146566.5A Pending CN103955406A (en) | 2014-04-14 | 2014-04-14 | Super block-based based speculation parallelization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103955406A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101515231A (en) * | 2009-03-23 | 2009-08-26 | 浙江大学 | Realization method for parallelization of single-threading program based on analysis of data flow |
CN101667135A (en) * | 2009-09-30 | 2010-03-10 | 浙江大学 | Interactive parallelization compiling system and compiling method thereof |
CN102043659A (en) * | 2010-12-08 | 2011-05-04 | 上海交通大学 | Compiling device for eliminating memory access conflict and implementation method thereof |
EP2372530A1 (en) * | 2008-11-28 | 2011-10-05 | Shanghai Xinhao Micro-Electronics Co. Ltd. | Data processing method and device |
CN103080900A (en) * | 2010-09-03 | 2013-05-01 | 西门子公司 | Method for parallelizing automatic control programs and compiler |
CN103377035A (en) * | 2012-04-12 | 2013-10-30 | 浙江大学 | Pipeline parallelization method for coarse-grained streaming application |
Non-Patent Citations (1)
Title |
---|
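张小强 (Zhang Xiaoqiang): "Research on transaction-based software speculative parallelism mechanisms", China Doctoral Dissertations Full-text Database, Information Science and Technology * |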
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140730 |