CN103955406A

CN103955406A - Superblock-based speculative parallelization method

Info

Publication number: CN103955406A
Application number: CN201410146566.5A
Authority: CN
Inventors: 李颂元; 袁明敏; 孟静磊; 叶敏娇; 陈天洲; 施青松; 刘莉
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2014-04-14
Filing date: 2014-04-14
Publication date: 2014-07-30

Abstract

The invention discloses a superblock-based speculative parallelization method comprising the following steps: dividing the program in an executable file into superblocks; statically analyzing the register data dependences between the superblocks; eliminating the anti-dependences and output dependences found by the static analysis from the executable file; writing the true data dependences found by the static analysis into the executable file; and speculatively executing the program in the executable file on a multi-core processor with superblocks as the scheduling units. The method eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor uses superblocks as scheduling units and avoids speculatively executing, at the same time, superblocks that have true data dependences on one another, the method improves program parallelism and reduces the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.

Description

A superblock-based speculative parallelization method

Technical field

The present invention relates to a computer parallelization method, and in particular to a superblock-based speculative parallelization method in the fields of compiler technology and computer architecture.

Background art

Multi-core processors are the development trend of current computer processors, yet programmers are not accustomed to writing parallel programs. Although processors have more and more cores, in most cases the multi-core processor is underutilized.

The superblock is a compilation technique designed for very long instruction word (VLIW) processors and superscalar processors. It breaks through the limits of the program's basic blocks and exposes more of the program's parallelism.

At present, processors use the thread as the basic scheduling unit. This scheduling granularity is coarser than a superblock, so further parallelism remains to be exploited.

Speculative execution is a hardware technique that allows instructions to execute out of order while still completing in order. When the superblock is introduced as the processor's basic scheduling unit, the scheduling granularity becomes finer, which increases the risk of speculation failure.

Summary of the invention

The object of the invention is to provide a superblock-based speculative parallelization method that makes full use of the multi-core resources of a multi-core processor.

The technical solution adopted by the invention comprises the following steps:

1) dividing the program in an executable file into superblocks;

2) statically analyzing the register data dependences between the superblocks;

3) eliminating the anti-dependences and output dependences found by the static analysis from the executable file;

4) writing the true data dependences found by the static analysis into the executable file;

5) speculatively executing the program in the executable file on a multi-core processor, with each superblock scheduled as a thread.

The static analysis of the register data dependences between superblocks in step 2) specifically comprises the following steps:

2.1) scanning each superblock and collecting the register read and write operations of all instructions in the superblock;

2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).

The cores of the multi-core processor in step 5) share a register file, and during speculative execution superblocks that have true data dependences on one another are not executed at the same time on the multi-core processor.

The beneficial effects of the invention are as follows:

The invention eliminates the register anti-dependences and output dependences between superblocks. When the multi-core processor schedules superblocks as threads and avoids speculatively executing, at the same time, superblocks that have true data dependences on one another, the invention improves program parallelism and reduces the risk of speculation failure that arises as the processor's scheduling granularity becomes finer.

Brief description of the drawings

Fig. 1 is a schematic diagram of the steps of the invention.

Fig. 2 is an example of a data flow graph of true data dependences according to the invention.

Fig. 3 is an example of scheduling superblocks onto a three-core processor according to the invention.

Fig. 4 is a schematic diagram of the complete process of an embodiment of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and specific embodiments.

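As shown in Fig. 1, the method of the invention comprises the following steps:

1) dividing the program in an executable file into superblocks;

2) statically analyzing the register data dependences between the superblocks;

3) eliminating the anti-dependences and output dependences found by the static analysis from the executable file;

4) writing the true data dependences found by the static analysis into the executable file;

5) speculatively executing the program in the executable file on a multi-core processor, with each superblock scheduled as a thread.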

The static analysis of the register data dependences between superblocks in step 2) specifically comprises the following steps:

2.1) scanning each superblock and collecting the register read and write operations of all instructions in the superblock;

2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).
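The collection in step 2.1) can be pictured with a minimal Python sketch. The instruction encoding (a pair of destination and source register lists) and the function name `summarize_superblock` are hypothetical and are not taken from the patent. The sketch also assumes that only reads not preceded by a write inside the same superblock matter for dependences between superblocks, which is a refinement the text does not spell out.

```python
def summarize_superblock(instructions):
    """Collect register reads and writes of one superblock.

    instructions: list of (dest_registers, source_registers) pairs in
    program order. This encoding is a hypothetical stand-in for real
    decoded instructions.
    Returns (exposed_reads, writes) as two sets of register names.
    """
    exposed_reads, writes = set(), set()
    for dests, srcs in instructions:
        for reg in srcs:
            # A read counts as visible to other superblocks only if the
            # register has not already been written inside this superblock.
            if reg not in writes:
                exposed_reads.add(reg)
        writes.update(dests)
    return exposed_reads, writes


# Hypothetical three-instruction superblock.
block = [
    (["r1"], ["r2", "r3"]),  # r1 <- f(r2, r3)
    (["r4"], ["r1", "r5"]),  # r4 <- f(r1, r5); r1 was produced locally
    (["r2"], ["r4"]),        # r2 <- f(r4);     r4 was produced locally
]
reads, written = summarize_superblock(block)
```

Here `reads` contains `r2`, `r3` and `r5`, and `written` contains `r1`, `r2` and `r4`.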

The cores of the multi-core processor in step 5) share a register file, which makes it convenient to synchronize registers among the cores, and during speculative execution superblocks that have true data dependences on one another are not executed at the same time on the multi-core processor. The invention requires register synchronization between processor cores because data dependences may exist between superblocks executed in two consecutive rounds of speculative execution; the synchronization is performed through the shared register file.

The multi-core processor is a processor with two or more cores.

Data dependences between superblocks fall into two classes: register data dependences and memory data dependences. Register numbers can be read directly from the instructions, so the register read/write dependences between superblocks can be obtained by static analysis. For memory reads and writes, the address is generally a register base address plus an offset, and the base address often cannot be determined until run time, so the memory data dependences between superblocks cannot be obtained by static analysis.

Data dependences that cannot be analyzed statically can only be speculated on at run time, which inevitably introduces a risk of speculation failure. The prior static analysis therefore obtains at least the register read/write dependences between superblocks, so that speculative register reads are avoided and this part of the speculation risk is reduced.

For the register data dependences between superblocks, the anti-dependences and output dependences between superblocks should be eliminated and the true data dependences retained. An anti-dependence is also called a write-after-read (WAR) dependence, an output dependence is also called a write-after-write (WAW) dependence, and a true data dependence is also called a read-after-write (RAW) dependence.
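The three kinds of dependence can be illustrated with a small Python sketch that classifies them from per-superblock read and write sets. The function name `classify_dependences` and the example registers are hypothetical. The sketch is deliberately conservative: it compares every earlier superblock with every later one and ignores intervening writes, which a real analysis would probably take into account.

```python
from itertools import combinations


def classify_dependences(blocks):
    """Classify register dependences between superblocks.

    blocks: list of (name, reads, writes) in program order, where reads
    and writes are sets of register names.
    Returns a dict mapping 'RAW', 'WAR' and 'WAW' to lists of
    (earlier_block, later_block, register) triples.
    """
    deps = {"RAW": [], "WAR": [], "WAW": []}
    # combinations() keeps list order, so `a` always precedes `b`.
    for (a, a_reads, a_writes), (b, b_reads, b_writes) in combinations(blocks, 2):
        for reg in sorted(a_writes & b_reads):
            deps["RAW"].append((a, b, reg))   # true data dependence
        for reg in sorted(a_reads & b_writes):
            deps["WAR"].append((a, b, reg))   # anti-dependence
        for reg in sorted(a_writes & b_writes):
            deps["WAW"].append((a, b, reg))   # output dependence
    return deps


# Hypothetical read/write sets for three superblocks.
example = [
    ("B1", {"r1"}, {"r2"}),
    ("B2", {"r2"}, {"r1"}),
    ("B3", {"r3"}, {"r2"}),
]
deps = classify_dependences(example)
```

In this example only the RAW entry (B2 reads the `r2` written by B1) would be kept in the data flow graph; the WAR and WAW entries are the ones the method sets out to eliminate.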

In a specific implementation, a data flow graph is introduced during static analysis to represent the true data dependences. As shown in Fig. 2, each box represents a superblock, and the number in the box is the superblock's sequence number. An arrow between boxes indicates that data written by the superblock at the start of the arrow is read by the superblock at the end of the arrow, that is, the superblock at the end depends on the superblock at the start. Thus superblock 5 depends on superblocks 2, 3 and 4, and superblocks 2, 3 and 4 depend on superblock 1.

After static analysis, the data flow graph representing the true data dependences must be written into the executable file. Scheduling superblocks according to the data flow graph stored in the executable file ensures that superblocks with true data dependences between them are not executed at the same time.

An embodiment of the invention:

Fig. 3 shows an example of scheduling the superblocks of Fig. 2 once on a processor with at least three cores. C1, C2 and C3 denote three processor cores; B1, B2, B3, B4 and B5 denote the five superblocks; and T1, T2 and T3 denote three time periods.
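Scheduling from the data flow graph can be sketched in Python as follows. The round-based policy and the name `schedule_rounds` are assumptions made for illustration; the patent does not prescribe a particular scheduling algorithm. In each round the sketch picks at most `n_cores` superblocks whose true data dependences were all satisfied in earlier rounds.

```python
def schedule_rounds(depends_on, n_cores):
    """Group superblocks into rounds of at most n_cores blocks.

    depends_on: dict mapping each superblock to the set of superblocks
    it truly depends on (the data flow graph, hypothetical encoding).
    Returns a list of rounds; blocks within a round have no true data
    dependence on one another.
    """
    done, rounds = set(), []
    pending = sorted(depends_on)
    while pending:
        # A block is ready once everything it reads from has completed.
        ready = [b for b in pending if depends_on[b] <= done][:n_cores]
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        rounds.append(ready)
        done.update(ready)
        pending = [b for b in pending if b not in done]
    return rounds


# The graph described for Fig. 2: 5 depends on 2, 3, 4; those depend on 1.
fig2 = {1: set(), 2: {1}, 3: {1}, 4: {1}, 5: {2, 3, 4}}
three_core = schedule_rounds(fig2, n_cores=3)
two_core = schedule_rounds(fig2, n_cores=2)
```

With three cores the sketch yields three rounds, `[[1], [2, 3, 4], [5]]`, which matches the three time periods mentioned for Fig. 3; with two cores it needs four rounds.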

When the invention is implemented, the absence of register data dependences between superblocks running at the same time does not mean that registers need not be synchronized among the cores. Register data dependences can still exist between superblocks executed in two consecutive rounds of speculative execution, and those superblocks may be located on different cores. Therefore, before each round of speculative execution starts, the register values of all processor cores must be synchronized.

For example, suppose there are four superblocks B₀, B₁, B₂ and B₃ running on a processor with two cores C₀ and C₁. In the first round of speculative execution, B₀ and B₁ are scheduled onto cores C₀ and C₁ respectively, and there is no register data dependence between B₀ and B₁. In the second round, B₂ and B₃ are scheduled onto cores C₀ and C₁ respectively, and there is no register data dependence between B₂ and B₃ either, so this schedule meets the requirement. However, if a register r read by B₃ was written by B₀, then before the second round of speculative execution starts, the value of r must be synchronized from core C₀ to core C₁.
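The synchronization requirement in this example can be made concrete with a small Python sketch. It assumes private per-core register copies and computes only the transfers that are strictly needed, whereas the text speaks of synchronizing the register values of all cores; the name `sync_plan` and the data layout are hypothetical.

```python
def sync_plan(rounds, reads, writes):
    """List register transfers needed before each round.

    rounds: list of dicts mapping core name -> superblock name.
    reads / writes: dicts mapping superblock name -> set of registers.
    Returns one list per round of (register, source_core, target_core).
    """
    last_writer_core = {}
    plan = []
    for assignment in rounds:
        transfers = []
        for core, block in sorted(assignment.items()):
            for reg in sorted(reads[block]):
                src = last_writer_core.get(reg)
                # The value must be moved only if it lives on another core.
                if src is not None and src != core:
                    transfers.append((reg, src, core))
        plan.append(transfers)
        # Record writers only after the round, since blocks within a
        # round are free of register dependences on one another.
        for core, block in assignment.items():
            for reg in writes[block]:
                last_writer_core[reg] = core
    return plan


rounds = [{"C0": "B0", "C1": "B1"}, {"C0": "B2", "C1": "B3"}]
reads = {"B0": set(), "B1": set(), "B2": set(), "B3": {"r"}}
writes = {"B0": {"r"}, "B1": set(), "B2": set(), "B3": set()}
plan = sync_plan(rounds, reads, writes)
```

For the example above, nothing has to be moved before the first round, and before the second round `r` has to travel from C0 to C1.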

To reduce the latency of register synchronization, the cores share a register file, which is managed by register renaming. During speculative execution a processor core must not write a logical register directly; the physical register it writes is not mapped to a logical register immediately, and the mapping is established only at commit.
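The commit-time mapping can be modelled with a toy Python class. This is only a software sketch under assumed names (`SharedRegisterFile`, `speculative_write`, `commit`, `squash`); the patent does not describe the hardware structures, and details such as reclaiming physical registers are left out.

```python
class SharedRegisterFile:
    """Toy model of a shared register file managed by renaming."""

    def __init__(self):
        self.physical = []    # physical registers shared by all cores
        self.committed = {}   # logical register -> physical index

    def speculative_write(self, logical, value, spec_map):
        # Allocate a fresh physical register; the logical register is
        # not touched, only the writer's private speculative mapping.
        self.physical.append(value)
        spec_map[logical] = len(self.physical) - 1

    def read(self, logical, spec_map):
        # Prefer the reader's own speculative value, else the committed one.
        idx = spec_map.get(logical, self.committed.get(logical))
        return None if idx is None else self.physical[idx]

    def commit(self, spec_map):
        # Only at commit does the physical register become the logical one.
        self.committed.update(spec_map)
        spec_map.clear()

    def squash(self, spec_map):
        # Failed speculation: drop the mapping, committed state is intact.
        spec_map.clear()


rf = SharedRegisterFile()
core0_map, core1_map = {}, {}
rf.speculative_write("r", 7, core0_map)
before_commit = rf.read("r", core1_map)   # core 1 cannot see the value yet
rf.commit(core0_map)
after_commit = rf.read("r", core1_map)    # visible once committed
rf.speculative_write("r", 9, core1_map)
rf.squash(core1_map)                      # speculation failed on core 1
after_squash = rf.read("r", core1_map)
```

Because every core reads committed values from the same structure, a committed write by one core becomes visible to the others without an explicit copy, which is one way to read the latency argument above.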

Fig. 4 shows a complete process of implementing the invention. The four arrows in the figure divide it, from left to right, into five parts. The first part is the executable file to be processed. The second part corresponds to steps 1) and 2): the program in the executable file is divided into superblocks, and the register data dependences between the superblocks are statically analyzed. The third part corresponds to steps 3) and 4): the anti-dependences and output dependences found by the static analysis are eliminated from the executable file, and the true data dependences found by the static analysis are written into the executable file. The fourth and fifth parts correspond to step 5): the program in the executable file is speculatively executed on a multi-core processor, with each superblock scheduled as a thread. The fourth part is performed by the operating system, and the fifth part is the work of the processor. The fifth part corresponds to Fig. 3.

The above embodiment is intended to explain the invention, not to limit it. Any modification or change made to the invention within its spirit and the scope of the claims falls within the protection scope of the invention.

Claims

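1. A superblock-based speculative parallelization method, characterized by comprising the following steps:

1) dividing the program in an executable file into superblocks;

2) statically analyzing the register data dependences between the superblocks;

3) eliminating the anti-dependences and output dependences found by the static analysis from the executable file;

4) writing the true data dependences found by the static analysis into the executable file;

5) speculatively executing the program in the executable file on a multi-core processor, with each superblock scheduled as a thread.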

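2. The superblock-based speculative parallelization method according to claim 1, characterized in that the static analysis of the register data dependences between superblocks in step 2) specifically comprises the following steps:

2.1) scanning each superblock and collecting the register read and write operations of all instructions in the superblock;

2.2) deriving the true data dependences, anti-dependences and output dependences from the information collected in step 2.1).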

3. The superblock-based speculative parallelization method according to claim 1, characterized in that the cores of the multi-core processor in step 5) share a register file, and during speculative execution superblocks that have true data dependences on one another are not executed at the same time on the multi-core processor.