CN103955406A - Super block-based speculation parallelization method - Google Patents

Info

Publication number
CN103955406A
Authority
CN
China
Prior art keywords
superblock
dependence
register
executable file
static analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410146566.5A
Other languages
Chinese (zh)
Inventor
李颂元
袁明敏
孟静磊
叶敏娇
陈天洲
施青松
刘莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-04-14
Filing date
2014-04-14
Publication date
2014-07-30
Application filed by Zhejiang University ZJU
Priority to CN201410146566.5A
Publication of CN103955406A
Legal status: Pending

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a superblock-based speculative parallelization method comprising the following steps: dividing the program in an executable file into superblocks; statically analyzing the register data dependences between the superblocks; eliminating from the executable file the anti-dependences and output dependences found by the static analysis; writing the true data dependences found by the static analysis into the executable file; and speculatively executing the program in the executable file on a multi-core processor with the superblock as the scheduling unit. The method eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor uses superblocks as scheduling units and avoids speculatively executing several superblocks with true data dependences among them at the same time, the method improves program parallelism and reduces the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.

Description

A superblock-based speculative parallelization method
Technical field
The present invention relates to a parallelization method for computers, and in particular to a superblock-based speculative parallelization method in the fields of compiler technology and computer architecture.
Background
Multi-core processors are the current trend in processor development, yet programmers are not accustomed to writing parallel programs. Although processors have more and more cores, multi-core processors are underutilized in most cases.
The superblock is a compilation technique designed for very long instruction word (VLIW) processors and superscalar processors. It breaks through the limits of the program's basic blocks and exposes more of the program's parallelism.
At present, processors use the thread as the basic scheduling unit. This scheduling granularity is coarser than a superblock, so more parallelism remains to be exploited.
Speculative execution is a hardware technique that allows instructions to execute out of order while still completing in order. When the superblock is introduced as the processor's basic scheduling unit, the scheduling granularity becomes finer, which increases the risk of speculation failure.
Summary of the invention
The object of the invention is to provide a superblock-based speculative parallelization method that makes full use of the cores of a multi-core processor.
The technical solution adopted by the invention comprises the following steps:
1) dividing the program in an executable file into superblocks;
2) statically analyzing the register data dependences between the superblocks;
3) eliminating from the executable file the anti-dependences and output dependences found by the static analysis;
4) writing the true data dependences found by the static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with the superblock as the scheduling unit.
The static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).
In step 5), the cores of the multi-core processor share a register file, so that during speculative execution superblocks with a true data dependence between them are not executed on the multi-core processor at the same time.
The beneficial effects of the invention are as follows:
The invention eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor uses the superblock as the scheduling unit and avoids speculatively executing several superblocks with true data dependences among them at the same time, the invention improves program parallelism and reduces the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the invention.
Fig. 2 is an example data flow graph of true data dependences according to the invention.
Fig. 3 is an example of scheduling superblocks onto a three-core processor according to the invention.
Fig. 4 is a schematic diagram of the complete process of an embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the invention comprises the following steps:
1) dividing the program in an executable file into superblocks;
2) statically analyzing the register data dependences between the superblocks;
3) eliminating from the executable file the anti-dependences and output dependences found by the static analysis;
4) writing the true data dependences found by the static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with the superblock as the scheduling unit.
The static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).
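Steps 2.1) and 2.2) can be illustrated with a minimal Python sketch. It assumes that each superblock has already been summarized as one set of registers read and one set of registers written; the names `Superblock` and `classify_dependences` and the register names are hypothetical and do not come from the patent. The pairwise comparison is deliberately conservative: it ignores whether an intermediate superblock overwrites the register, and it treats every read as visible outside the superblock.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Superblock:
    """Summary of one superblock: the registers its instructions read and write."""
    index: int
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def classify_dependences(blocks):
    """Compare every earlier/later pair of superblocks in program order.

    Returns three lists of (earlier, later, register) tuples:
    true (read-after-write), anti (write-after-read) and
    output (write-after-write) dependences.
    """
    raw, war, waw = [], [], []
    ordered = sorted(blocks, key=lambda b: b.index)
    for earlier, later in combinations(ordered, 2):
        for reg in sorted(earlier.writes & later.reads):
            raw.append((earlier.index, later.index, reg))
        for reg in sorted(earlier.reads & later.writes):
            war.append((earlier.index, later.index, reg))
        for reg in sorted(earlier.writes & later.writes):
            waw.append((earlier.index, later.index, reg))
    return raw, war, waw


# Hypothetical three-superblock program.
blocks = [
    Superblock(1, reads={"r1"}, writes={"r2"}),
    Superblock(2, reads={"r2"}, writes={"r1"}),
    Superblock(3, reads={"r3"}, writes={"r2"}),
]
raw, war, waw = classify_dependences(blocks)
```

In this example only the true dependence of superblock 2 on superblock 1 (through `r2`) would have to be kept; the anti-dependences and the output dependence are candidates for elimination in step 3).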
In step 5), the cores of the multi-core processor share a register file, which makes it convenient to synchronize registers between cores, and during speculative execution superblocks with a true data dependence between them are not executed on the multi-core processor at the same time. The invention needs to synchronize registers between the processor cores because data dependences may exist between the superblocks speculatively executed in two consecutive rounds; the synchronization is carried out through the shared register file.
The multi-core processor is a processor with two or more cores.
Data dependences between superblocks fall into two classes: register data dependences and memory data dependences. Register numbers can be read directly from the instructions, so the register read/write dependences between superblocks can be obtained by static analysis. Memory accesses, in contrast, are generally addressed as a register base address plus an offset, and the base address can often only be determined at run time, so the memory data dependences between superblocks cannot be obtained by static analysis.
Data dependences that cannot be analyzed statically can only be speculated on at run time, which inevitably introduces a risk of speculation failure. The prior static analysis therefore obtains at least the register read/write dependences between superblocks, which avoids speculative register reads and removes that part of the speculation risk.
For register data dependences between superblocks, the anti-dependences and output dependences between superblocks should be eliminated, and the true data dependences retained. An anti-dependence is also called a write-after-read (WAR) dependence, an output dependence a write-after-write (WAW) dependence, and a true data dependence a read-after-write (RAW) dependence.
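The text does not say how the anti-dependences and output dependences are removed from the executable file. One classic way to remove WAR and WAW dependences while keeping RAW dependences is register renaming, sketched below under the assumption of an unlimited supply of fresh register names. The function `rename_registers`, the `(dest, sources)` instruction format and the `v0`, `v1`, ... names are all hypothetical, and a real implementation would also have to map live-out values back to architectural registers.

```python
def rename_registers(superblocks):
    """superblocks: list of instruction lists; each instruction is (dest, [sources]).

    Every write gets a fresh name, and every read is redirected to the newest
    name of its source, so only read-after-write links survive.
    """
    current = {}   # architectural register -> newest renamed register
    counter = 0
    renamed = []
    for block in superblocks:
        new_block = []
        for dest, sources in block:
            # Reads use the latest definition; registers never written keep their name.
            new_sources = [current.get(src, src) for src in sources]
            new_dest = f"v{counter}"
            counter += 1
            current[dest] = new_dest
            new_block.append((new_dest, new_sources))
        renamed.append(new_block)
    return renamed


program = [
    [("r2", ["r1"])],   # superblock 1 writes r2
    [("r1", ["r2"])],   # superblock 2 reads r2 (RAW) and writes r1 (WAR with superblock 1)
    [("r2", ["r3"])],   # superblock 3 writes r2 again (WAW with superblock 1)
]
renamed = rename_registers(program)
```

After renaming, superblock 2 still reads the value produced by superblock 1, but no two superblocks write the same register and no later write can clobber a register that an earlier superblock still needs to read.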
In a specific implementation, a data flow graph is introduced during static dependence analysis to represent the true data dependences. As shown in Fig. 2, each box represents a superblock and the number in the box is the superblock's sequence number. An arrow indicates that data written by the superblock at its start is read by the superblock at its end, that is, the superblock at the end of the arrow depends on the superblock at the start. Thus superblock 5 depends on superblocks 2, 3 and 4, and superblocks 2, 3 and 4 each depend on superblock 1.
After static analysis, the data flow graph representing the true data dependences is written into the executable file. Scheduling the superblocks according to this data flow graph prevents different superblocks with a true data dependence between them from executing at the same time.
Embodiment of the invention:
Fig. 3 shows an example of scheduling the superblocks of Fig. 2 on a processor with at least three cores. C1, C2 and C3 denote three processor cores, B1, B2, B3, B4 and B5 denote the five superblocks, and T1, T2 and T3 denote three time periods.
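The kind of schedule described for Fig. 3 can be approximated with a short Python sketch. It assumes a simple greedy, round-by-round policy; the function `schedule_rounds` and the dictionary encoding of the Fig. 2 graph are hypothetical, and since Fig. 3 itself is not reproduced in this text, the result should be read as one plausible schedule rather than the exact one drawn in the figure.

```python
def schedule_rounds(depends_on, num_cores):
    """Greedy round-by-round scheduling.

    depends_on maps each superblock to the set of superblocks whose writes it
    reads. A superblock becomes ready once all of those have finished in an
    earlier round, so no two superblocks in one round share a true dependence.
    """
    finished = set()
    remaining = set(depends_on)
    rounds = []
    while remaining:
        ready = sorted(b for b in remaining if depends_on[b] <= finished)
        if not ready:
            raise ValueError("cyclic true dependences; cannot schedule")
        this_round = ready[:num_cores]   # at most one superblock per core
        rounds.append(this_round)
        finished.update(this_round)
        remaining.difference_update(this_round)
    return rounds


# True data dependences as described for Fig. 2:
# 5 depends on 2, 3 and 4; 2, 3 and 4 each depend on 1.
fig2 = {1: set(), 2: {1}, 3: {1}, 4: {1}, 5: {2, 3, 4}}
three_core = schedule_rounds(fig2, num_cores=3)
two_core = schedule_rounds(fig2, num_cores=2)
```

With three cores the sketch needs three rounds, matching the three time periods T1, T2 and T3; with only two cores the middle layer no longer fits in one round and a fourth round is needed.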
When the invention is implemented, the absence of register data dependences between superblocks that run at the same time does not mean that registers need not be synchronized between cores. Register data dependences can still exist between superblocks executed in two consecutive rounds of speculative execution, and these superblocks may reside on different cores. Therefore, before each round of speculative execution starts, the register values of all processor cores must be synchronized.
For example, suppose four superblocks B0, B1, B2 and B3 run on a processor with two cores, C0 and C1. In the first round of speculative execution, B0 and B1 are scheduled onto cores C0 and C1 respectively, and there is no register data dependence between B0 and B1. In the second round, B2 and B3 are scheduled onto cores C0 and C1 respectively, and there is no register data dependence between B2 and B3 either, so this schedule meets the requirement. However, if a register r read by B3 was written by B0, then before the second round of speculative execution starts, the value of r must be synchronized from core C0 to core C1.
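The two-core example can be restated as a small Python sketch that works out which register values must move between cores before a round starts. The function `registers_to_sync` and its data layout are hypothetical; the sketch only tracks which core last wrote each register and does not model the shared register file itself.

```python
def registers_to_sync(last_writer_core, next_round):
    """Return (register, from_core, to_core) transfers needed before a round.

    last_writer_core: register -> core that most recently wrote it.
    next_round: list of (core, registers_read) for the upcoming round.
    """
    transfers = []
    for core, reads in next_round:
        for reg in sorted(reads):
            source = last_writer_core.get(reg)
            # A transfer is needed only if another core holds the latest value.
            if source is not None and source != core:
                transfers.append((reg, source, core))
    return transfers


# Round 1: B0 ran on C0 and wrote r; B1 ran on C1.
last_writer_core = {"r": "C0"}
# Round 2: B2 runs on C0 and reads nothing written in round 1; B3 runs on C1 and reads r.
next_round = [("C0", set()), ("C1", {"r"})]
transfers = registers_to_sync(last_writer_core, next_round)
```

The single transfer produced here is the synchronization of r from core C0 to core C1 described in the example.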
To reduce the latency of register synchronization, the cores share a register file, which is managed by register renaming. During speculative execution a processor core must not write logical registers directly; the physical registers it writes are not mapped to logical registers immediately, and the mapping is established only at commit.
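The commit-time mapping can be pictured with a toy Python model. It assumes a single growing pool of physical registers shared by all cores and a table from logical to physical registers that changes only at commit; the class `SharedRegisterFile` and its methods are hypothetical and leave out everything a real design needs, such as freeing physical registers or squashing failed speculation.

```python
class SharedRegisterFile:
    """Toy model: speculative writes go to fresh physical registers;
    the logical-to-physical mapping changes only at commit."""

    def __init__(self):
        self.physical = []    # physical register values, shared by all cores
        self.committed = {}   # logical register -> physical register index

    def speculative_write(self, value):
        """Allocate a physical register; no logical register is touched."""
        self.physical.append(value)
        return len(self.physical) - 1

    def commit(self, logical, phys_index):
        """Make a speculative result visible under its logical register name."""
        self.committed[logical] = phys_index

    def read(self, logical):
        """Read the committed value of a logical register."""
        return self.physical[self.committed[logical]]


rf = SharedRegisterFile()
rf.commit("r", rf.speculative_write(1))   # initial committed value of r
p = rf.speculative_write(42)              # a core speculatively computes a new r
before = rf.read("r")                     # the logical register is unchanged so far
rf.commit("r", p)                         # commit establishes the new mapping
after = rf.read("r")
```

Because every core reads through the same mapping table, a committed value becomes visible to all cores at once, which is the sense in which sharing the register file shortens synchronization.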
Fig. 4 shows a complete process of implementing the invention. The four arrows in the figure divide it, from left to right, into five parts. The first part is the executable file to be processed. The second part corresponds to steps 1) and 2): the program in the executable file is divided into superblocks, and the register data dependences between the superblocks are statically analyzed. The third part corresponds to steps 3) and 4): the anti-dependences and output dependences found by the static analysis are eliminated from the executable file, and the true data dependences found by the static analysis are written into the executable file. The fourth and fifth parts correspond to step 5): the program in the executable file is speculatively executed on a multi-core processor with the superblock as the scheduling unit. The fourth part is performed by the operating system, and the fifth part is the work of the processor. The fifth part corresponds to Fig. 3.
The above embodiments are intended to explain the invention, not to limit it. Any modification or change made to the invention within its spirit and the scope of the claims falls within the scope of protection of the invention.

Claims (3)

1. A superblock-based speculative parallelization method, characterized by comprising the following steps:
1) dividing the program in an executable file into superblocks;
2) statically analyzing the register data dependences between the superblocks;
3) eliminating from the executable file the anti-dependences and output dependences found by the static analysis;
4) writing the true data dependences found by the static analysis into the executable file;
5) speculatively executing the program in the executable file on a multi-core processor, with the superblock as the scheduling unit.
2. The superblock-based speculative parallelization method according to claim 1, characterized in that the static analysis of register data dependences between superblocks in step 2) specifically comprises the following steps:
2.1) scanning each superblock and collecting the register read and write operations of all instructions in it;
2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).
3. The superblock-based speculative parallelization method according to claim 1, characterized in that, in step 5), the cores of the multi-core processor share a register file, so that during speculative execution superblocks with a true data dependence between them are not executed on the multi-core processor at the same time.
Priority Applications (1) and Applications Claiming Priority (1)

Application Number   Publication         Priority Date   Filing Date   Title
CN201410146566.5A    CN103955406A (en)   2014-04-14      2014-04-14    Super block-based speculation parallelization method

Publications (1)

Publication Number   Publication Date
CN103955406A         2014-07-30

Family

ID=51332681

Country Status (1)

Country   Link
CN (1)    CN103955406A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140730
