CN102880449B

CN102880449B - Method and system for scheduling delay slot in very-long instruction word structure

Info

Publication number: CN102880449B
Application number: CN201210347706.6A
Authority: CN
Inventors: 朱浩; 彭楚; 王东辉; 洪缨; 侯朝焕
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2012-09-18
Filing date: 2012-09-18
Publication date: 2014-11-05
Anticipated expiration: 2032-09-18
Also published as: CN102880449A

Abstract

The invention discloses a method and a system for scheduling delay slots in a very long instruction word (VLIW) structure. The method comprises: performing local scheduling on the instructions in the current basic block; after the local scheduling is finished, determining whether any instruction delay slots remain and, if not, ending the scheduling, otherwise placing instructions that can be filled into an instruction delay slot but have a high overhead into a local candidate instruction buffer; performing global scheduling on the instructions in the basic block of the branch target, selecting instructions that can be filled into an instruction delay slot and placing them into a global candidate instruction buffer; and selecting instructions from the local candidate instruction buffer and/or the global candidate instruction buffer and filling them into the remaining instruction delay slots. The system comprises a local scheduling unit, a global scheduling unit and a balance scheduling unit. By balancing delay slot scheduling against program parallelism, and local scheduling against global scheduling, the method and system allow programs to achieve high execution efficiency.

Description

Delay slot scheduling method and system under a very long instruction word structure

Technical field

The present invention relates to instruction scheduling technology, and in particular to a delay slot scheduling method and system under a very long instruction word structure.

Background art

A very long instruction word (VLIW) is a very long combination of instructions: it joins multiple instructions together, which increases computation speed. VLIW technology is one of the main techniques for improving a processor's instruction-level parallelism, and it exploits that parallelism through hardware/software co-design. The long instructions are assembled by the compiler rather than by the hardware-based dynamic scheduling used in superscalar processors, which significantly reduces hardware complexity and chip power consumption.

In digital signal processor (DSP) architecture design, research is usually carried out on a reduced instruction set computer (RISC) architecture combined with VLIW technology. In a pipelined processor with this architecture, branch instructions are a significant obstacle to higher performance. The prefetch address of the instruction following a branch is only produced several pipeline stages after the branch instruction executes, and the intervening delay stalls the instruction pipeline while it waits for the branch outcome. This directly affects the execution of the subsequent instruction sequence and reduces instruction parallelism. The delay slot structure exists to reduce the performance overhead of this kind of pipeline control-flow hazard. Its principle is that if instructions unrelated to the branch instruction are inserted after it, the processor's instruction pipeline stays busy and the delay introduced by the branch is put to use by those instructions; delay slots are the structure that holds these instructions. For delay slots to work well in real programs, they need the support of a software scheduling algorithm. If useful instructions are filled into the delay slots, processor performance may improve; if no suitable instruction can be found and the delay slots have to be filled with no-operation (nop) instructions, the performance loss caused by the control hazard remains. How to make the fullest possible use of instruction delay slots in software is therefore worth investigating.

Delay slot scheduling algorithms fall mainly into two schemes: local scheduling and global scheduling. Conventional compilers use only the comparatively simple scheduling within a local function fragment, that is, local scheduling, in which the instructions filled into delay slots are chosen from within a basic block. A basic block is a sequence of statements that executes in order; execution can enter only at its entry statement and leave only at its exit statement (a basic block has exactly one entry statement and one exit statement), and a basic block ends with a branch instruction. If the basic block offers no suitable instruction, nop instructions are filled in. Global scheduling goes further: when no suitable instruction in the basic block can fill the delay slots, suitable instructions may be selected from other basic blocks under certain constraint rules. In other words, global scheduling allows code motion across basic block boundaries. If this also fails, nop instructions are filled in. However, implementing global scheduling in a compiler is costly and directly affects compilation speed.

In traditional DSP architecture design, usually only a local scheduling algorithm is used, or global scheduling is adopted only as a compromise against compiler efficiency because of compilation-speed concerns. A local scheduling algorithm is confined to a local function fragment, where the number of candidate instructions is usually small; when the compiler optimizes thoroughly, the tighter logical coupling between the instructions of the local function fragment leads to low delay slot utilization. Moreover, traditional DSP architecture design schemes do not account for architectures implemented with VLIW technology: they cannot assess the impact on instruction parallelism, and their use of delay slots may damage instruction parallelism and actually lower the real efficiency of the program.

Summary of the invention

The object of the present invention is to provide a delay slot scheduling method under a very long instruction word structure that balances delay slot scheduling against program parallelism, and local scheduling against global scheduling, so that programs achieve higher execution efficiency.

To achieve the above object, in one aspect, the present invention provides a delay slot scheduling method under a very long instruction word structure, the method comprising the following steps:

performing local scheduling on the instructions in the current basic block; after the local scheduling is completed, determining whether any instruction delay slots remain; if not, ending the scheduling; otherwise, placing instructions that can be filled into an instruction delay slot but have a high overhead into a local candidate instruction buffer;

performing global scheduling on the instructions in the branch target basic block, selecting instructions that can be filled into an instruction delay slot, and placing them into a global candidate instruction buffer;

selecting instructions from the local candidate instruction buffer and/or the global candidate instruction buffer and filling them into the remaining instruction delay slots.

In another aspect, the present invention also provides a delay slot scheduling system under a very long instruction word structure, the system comprising:

a local scheduling unit, configured to perform local scheduling on the instructions in the current basic block and, after the local scheduling is completed, to determine whether any instruction delay slots remain; if not, the scheduling ends; otherwise, instructions that can be filled into an instruction delay slot but have a high overhead are placed into a local candidate instruction buffer;

a global scheduling unit, configured to perform global scheduling on the instructions in the branch target basic block, select instructions that can be filled into an instruction delay slot, and place them into a global candidate instruction buffer;

a balance scheduling unit, configured to select instructions from the local candidate instruction buffer and/or the global candidate instruction buffer and fill them into the remaining instruction delay slots.

The delay slot scheduling method under a very long instruction word structure provided by the embodiment of the present invention performs the instruction filling of delay slots at the assembly level. The method combines a local scheduling strategy and a global scheduling strategy in delay slot scheduling, so as to balance delay slot scheduling against program parallelism, and local scheduling against global scheduling, so that programs achieve higher execution efficiency.

Brief description of the drawings

Other features and advantages of the present invention will become more apparent from the following detailed description of embodiments, given by way of example with reference to the accompanying drawings.

Fig. 1 is a structural diagram of a delay slot scheduling system under a very long instruction word structure according to an embodiment of the present invention;

Fig. 2 is a flow chart of a delay slot scheduling method under a very long instruction word structure according to an embodiment of the present invention;

Fig. 3 is the directed instruction dependency graph constructed before local scheduling;

Fig. 4 is the directed instruction dependency graph constructed before global scheduling.

Detailed description of the embodiments

The technical solution of the present application is described in further detail below with reference to the drawings and embodiments.

The delay slot scheduling method under a VLIW structure proposed by the embodiment of the present invention performs the instruction filling of delay slots at the assembly level. The method combines a local scheduling strategy and a global scheduling strategy in its own delay slot scheduling algorithm and, in view of the VLIW structure's requirement for instruction parallelism, adds a balanced scheduling strategy that balances instruction delay slot scheduling against program parallelism, and local scheduling against global scheduling, in order to obtain the highest possible instruction pipeline performance.

The delay slot scheduling method and system under the VLIW structure of the embodiment are described in detail below with reference to Fig. 1 and Fig. 2. As shown in Fig. 1, the system comprises a processing unit 11, a local scheduling unit 21, a global scheduling unit 22, a balance scheduling unit 31, a statistic unit 32, and a code generation unit 41.

Processing unit 11 parses the input assembly file in order and obtains from it the specific information contained in each assembly instruction. This information includes register information, target address information, the instruction name, the class of hardware functional unit the instruction uses, and the number of cycles the instruction takes to execute. Processing unit 11 stores the received assembly file in units of function fragments and builds a hash index table over the function labels to facilitate lookup.

Processing unit 11 uses a doubly linked list structure to organize the assembly instructions in units of instruction packets. For example, consider the following assembly file, in which the number in parentheses is the instruction number:

    test:
        pr0 add   d0,d1,d2|      (1)
        pr0 sub   d3,d4,d5|      (2)
        pr0 add   d6,d7,d8|      (3)
        pr0 sub   d9,d10,d11.    (4)
        pr0 j     test1.         (5)
    test1:
        pr0 addia a10,1.         (6)
        pr0 addia a8,1|          (7)
        pr0 j     test1.         (8)
        pr0 addia a5,1|          (9)
        pr0 sub   d9,d10,d11.    (10)
        pr0 j     test.          (11)

The above assembly file is divided into two function fragments, test and test1. Instructions 1 to 4 in function fragment test form instruction packet pak1: the end symbol "|" on instructions 1 to 3 indicates that the assembly instruction is packed in parallel with the instruction that follows it, and the end symbol "." on instruction 4 indicates that it is the last instruction of the packet. The sub-list in pak1 is organized in the logical order of instructions 1 to 4, and instruction 5 forms instruction packet pak2. Function fragment test therefore consists of pak1 and pak2. The object set for local scheduling of function fragment test is { pak1, pak2 }. The object set for global scheduling is determined from the branch instruction, that is, from analysis of the predicate register pr0 of instruction 5: it comprises all instruction packets from instruction 6 of test1 up to the next branch instruction or the end of the function fragment, or between two branch instructions inside a sub-function fragment, such as instructions 9 to 11 in the above assembly file.
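The packet grouping described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each instruction is a text line ending in "|" or "." and that labels end in ":"; the function name split_packets and the list-based representation are hypothetical and stand in for the doubly linked list used by processing unit 11.

```python
def split_packets(lines):
    """Group assembly lines into parallel packets using the end symbols.

    "|" means the instruction is packed with the one that follows;
    "." closes the current packet. Labels (ending in ":") are skipped.
    Instructions are numbered from 1 in file order.
    """
    packets, current, number = [], [], 0
    for line in lines:
        text = line.strip()
        if not text or text.endswith(":"):
            continue  # blank line or function label
        number += 1
        current.append(number)
        if text.endswith("."):
            packets.append(current)  # last instruction of the packet
            current = []
        elif not text.endswith("|"):
            raise ValueError(f"missing end symbol on instruction {number}")
    if current:
        packets.append(current)  # tolerate an unterminated final packet
    return packets


# Function fragment "test" from the example assembly file.
test_fragment = [
    "test:",
    "pr0 add d0,d1,d2|",
    "pr0 sub d3,d4,d5|",
    "pr0 add d6,d7,d8|",
    "pr0 sub d9,d10,d11.",
    "pr0 j test1.",
]
print(split_packets(test_fragment))  # [[1, 2, 3, 4], [5]]
```

The two resulting packets correspond to pak1 (instructions 1 to 4) and pak2 (instruction 5).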

Statistic unit 32 counts the static instruction cycles of the target function fragment and the change in instruction cycles brought about by local scheduling and global scheduling. The function fragment cycle count it produces is the basis for evaluating instruction pipeline performance.

Specifically, in the delay slot scheduling system under the VLIW structure, each instruction delay slot can represent one saved instruction cycle. Before local scheduling and global scheduling are performed, statistic unit 32 traverses the sub-instruction list of each element of the instruction packet set { pak1, pak2, ..., pakn } of the given function fragment, computes the maximum instruction cycle count within each packet to form the set { c1, c2, ..., cn }, and sums all elements of this set to obtain the instruction cycle count of the function fragment.
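As a rough illustration of this count, the following Python sketch takes each packet as a list of per-instruction cycle counts, uses the maximum of each packet as c_i, and sums the c_i. The function name fragment_cycles and the cycle values in the example are assumptions made for illustration; the source does not give concrete cycle counts.

```python
def fragment_cycles(packets):
    """Return ([c1, ..., cn], total) for a function fragment.

    packets: list of packets, each a list of per-instruction cycle counts.
    The instructions of a packet issue in parallel, so a packet costs as
    many cycles as its slowest instruction.
    """
    per_packet = [max(packet) for packet in packets]
    return per_packet, sum(per_packet)


# Hypothetical cycle counts: pak1 holds four 1-cycle instructions,
# pak2 holds one 2-cycle branch.
per_packet, total = fragment_cycles([[1, 1, 1, 1], [2]])
print(per_packet, total)  # [1, 2] 3
```

Running the same count before and after scheduling gives the cycle difference that statistic unit 32 uses to judge whether a fill was worthwhile.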

When the delay slot scheduling system of the embodiment starts local scheduling, it must consider, on the one hand, the instruction cycles saved by the delay slots and, on the other hand, the effect on instruction cycles of splitting and regrouping instruction packets under the VLIW structure.

Local scheduling unit 21 performs local scheduling on the instructions in the current basic block and, after the local scheduling is completed, determines whether any instruction delay slots remain; if not, the scheduling ends; otherwise, instructions that can be filled into an instruction delay slot but have a high overhead are placed into the local candidate instruction buffer.

Specifically, local scheduling unit 21 implements local scheduling of the instructions within a function fragment: from the target function sub-fragment it selects instructions that are independent of the branch instruction and of the instructions that follow, and fills them into delay slots. This is a bottom-up process. Before local scheduling, the dependences between the instructions of the current basic block must be analyzed and a local directed instruction dependency graph constructed, whose entry node is the target branch instruction. From the local instruction dependency graph, all instructions in the basic block that conflict with the branch instruction through a dependence can be found; they form the local related instruction set. All instructions in the basic block that can be inserted into an instruction delay slot can also be found; they form the local candidate instruction set. Take the function fragment shown in the left part of Fig. 3, and assume that the call instruction, branch instruction 9, uses address register a11 by default under this architecture. In the right part of Fig. 3, the solid curves mark the instructions that have a data dependence with the call instruction, namely a read-after-write dependence; the related instruction set { 1, 2, 6, 7, 8 } cannot be filled into instruction delay slots. The dashed curve indicates that instruction 5 has a write-after-read anti-dependence with instruction 7 of the related instruction set, so instruction 5 cannot be added to an instruction delay slot either. The final related instruction set is therefore { 1, 2, 5, 6, 7, 8, 9 }, and the local candidate instruction set that may be filled into instruction delay slots in the function fragment shown in the left part of Fig. 3 is { 3, 4 }.
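The derivation of the related and candidate sets can be sketched in Python as a backward walk over the dependency graph, starting from the branch instruction. Fig. 3 is not reproduced here, so the edge list below is an assumption chosen only to match the sets named in the text (instructions 1, 2, 6, 7 and 8 feeding branch 9, and instruction 5 tied to instruction 7); the function name related_set is likewise hypothetical.

```python
from collections import defaultdict


def related_set(edges, branch):
    """Return the branch plus every instruction it transitively depends on.

    edges: (earlier, later) pairs meaning "earlier" must stay ahead of
    "later" because of a data dependence or anti-dependence.
    """
    preds = defaultdict(set)
    for earlier, later in edges:
        preds[later].add(earlier)
    related, stack = {branch}, [branch]
    while stack:
        node = stack.pop()
        for pred in preds[node]:
            if pred not in related:
                related.add(pred)
                stack.append(pred)
    return related


# Assumed edges: the solid curves (1, 2, 6, 7, 8 -> 9) and the dashed
# curve (5 -> 7) described for Fig. 3.
edges = [(1, 9), (2, 9), (6, 9), (7, 9), (8, 9), (5, 7)]
related = related_set(edges, branch=9)
candidates = sorted(set(range(1, 10)) - related)
print(sorted(related))  # [1, 2, 5, 6, 7, 8, 9]
print(candidates)       # [3, 4]
```

Everything in the basic block that the walk does not reach is a candidate for the delay slots, which reproduces the set { 3, 4 } of the example.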

A traditional local scheduling algorithm would add as many instructions of the candidate instruction set { 3, 4 } as possible to the instruction delay slots, according to the number of delay slots in the target architecture. On a target architecture that uses VLIW technology, however, filling in the traditional local scheduling manner has little or even no effect. From the end symbols of the assembly instructions in the left part of Fig. 3, it can be seen that instruction set { 3, 4, 5 } forms one parallel instruction packet; instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } and instruction set { 8, 9 } likewise form independent parallel instruction packets; and { 5 } forms the related instruction subset.

The balanced scheduling that balance scheduling unit 31 performs while local scheduling unit 21 carries out local scheduling is described below. The balanced scheduling algorithm in local scheduling is discussed according to two factors: the comparison between the number of instruction delay slots in the target architecture and the number of instructions in the local candidate instruction set; and whether the local related instruction subset, that is, the instructions of the local related instruction set that share a parallel instruction packet with candidate instructions about to be filled into delay slots, can form a new parallel instruction packet with the other elements of the related instruction set.

The first case: the number of instruction delay slots is greater than or equal to the number of elements of the local candidate instruction set { 3, 4 }, and the local related instruction subset of the function fragment, namely instruction 5, cannot form a new parallel instruction packet with the instructions of the related instruction set other than instruction 5, namely instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } or instruction set { 8, 9 }.

In the first case, scheduling the local candidate instruction set { 3, 4 } occupies two instruction delay slots, while instruction { 5 }, which could have formed a parallel instruction packet with { 3, 4 }, is left to form a packet of its own. On the one hand, since instruction { 3 }, instruction { 4 } and instruction { 5 } can themselves be merged into one instruction packet, scheduling the local candidate instruction set { 3, 4 } does not change the pipeline occupancy of instruction set { 3, 4, 5 }; on the other hand, scheduling { 3, 4 } wastes instruction delay slots. Therefore, in the first case, the local candidate instruction set { 3, 4 } is not scheduled, and the process moves on to the next step, global scheduling.

The second case: the number of instruction delay slots is greater than or equal to the number of elements of the local candidate instruction set, and the local related instruction subset can form a new parallel instruction packet with the local related instruction set.

For example, the number of instruction delay slots is greater than or equal to the number of elements of the local candidate instruction set { 3, 4 }, and instruction { 5 } can form a new parallel instruction packet with instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } or instruction set { 8, 9 }.

In the second case, scheduling the local candidate instruction set { 3, 4 } likewise occupies two instruction delay slots, and two sub-cases must be distinguished when instruction { 5 } is merged into another instruction packet. Case A: if introducing instruction { 5 } causes an extra nop (no-operation) instruction to be added between two instruction packets, the local candidate instruction set { 3, 4 } is not scheduled. Case B: introducing instruction { 5 } does not introduce an extra nop instruction; in case B the decision must also be weighed against the result of global scheduling.

The third case: the number of instruction delay slots is less than the number of elements of the local candidate instruction set, and the local related instruction subset, together with any one or more instructions of the local candidate instruction set, can form a new parallel instruction packet with the local related instruction set.

For example, the number of instruction delay slots is less than the number of elements of the local candidate instruction set { 3, 4 }, and instruction { 5 }, together with instruction { 3 } or instruction { 4 } of the local candidate instruction set { 3, 4 }, can form a new parallel instruction packet with instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } or instruction set { 8, 9 }.

In the third case, local scheduling selects one instruction from the local candidate instruction set { 3, 4 } and inserts it into the instruction delay slot, saving one instruction cycle. This outcome must still be weighed against the result of global scheduling.

The fourth case: the number of instruction delay slots is less than the number of elements of the local candidate instruction set, and the local related instruction subset, together with any one or more instructions of the local candidate instruction set, cannot in every instance form a new parallel instruction packet with the local related instruction set.

For example, the number of instruction delay slots is less than the number of elements of the local candidate instruction set { 3, 4 }, and instruction { 5 }, together with instruction { 3 } or instruction { 4 } of the local candidate instruction set { 3, 4 }, cannot in every instance form a new parallel instruction packet with instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } or instruction set { 8, 9 }.

In the fourth case, using local scheduling to insert one instruction of the local candidate instruction set { 3, 4 } into the instruction delay slot does not improve instruction pipeline performance, so only global scheduling is performed.

The fifth case: the number of instruction delay slots is less than the number of elements of the local candidate instruction set, and the local related instruction subset, together with any one or more instructions of the local candidate instruction set, can in no instance form a new parallel instruction packet with the local related instruction set.

For example, the number of instruction delay slots is less than the number of elements of the local candidate instruction set { 3, 4 }, and neither instruction { 5 } with instruction { 3 } nor instruction { 5 } with instruction { 4 } can form a new parallel instruction packet with instruction { 1 }, instruction { 2 }, instruction set { 6, 7 } or instruction set { 8, 9 }.

In the fifth case, if instruction { 3 } and instruction { 4 } are in the same parallel instruction packet, using local scheduling to insert one instruction of the local candidate instruction set { 3, 4 } into the instruction delay slot likewise does not improve instruction pipeline performance, so only global scheduling is performed. If instruction { 3 } and instruction { 4 } are not in the same parallel instruction packet, filling the delay slots with local candidate instructions from { 3, 4 } through local scheduling saves as many instruction cycles as there are delay slots, so only local scheduling is needed.

It should be noted that local scheduling decides whether to fill the delay slots according to the current number of instruction delay slots and the overhead of the local candidate instruction set. Local scheduling must provide a local candidate buffer that stores those instructions of the local candidate instruction set that can be scheduled but have a high overhead. After the next step, global scheduling, is completed, instructions are selected from the candidate instructions stored in the local scheduling buffer and the global scheduling buffer and filled into the remaining instruction delay slots.

Preferably, when the instructions of the local candidate instruction set that can be filled into instruction delay slots but have a high overhead are placed into the local candidate buffer, the instructions of the local instruction set that cannot improve performance are deleted first.

Global scheduling needs to move part of the code of the branch target function fragment into the delay slots. However, the call relation between program functions is usually many-to-one rather than one-to-one, so global scheduling algorithms are naturally implemented in combination with branch prediction, which usually increases code size. Under a very long instruction word structure, however, the widely used predicate registers allow a better implementation of global scheduling. Taking the architecture of one processor as an example, instructions guarded by the reserved predicate register pr0 are always executed. Whether a branch instruction is certain to be executed can therefore be determined from the number of the pr register it uses.

Global scheduling is a top-down heuristic process. Compared with local scheduling, it needs to adjust the instruction parallelism of the target code fragment to some extent, and it is a "first come, first served" process (whichever obtains the scheduling right first is processed; whichever obtains it later is not). As with local scheduling, a global directed instruction dependency graph must be constructed beforehand, and the entry node of this dependency graph is likewise the target branch instruction. From the instruction dependency graph, all instructions in the basic block that conflict with the branch instruction through a dependence can be found; they form the global related instruction set. All instructions in the basic block that can be inserted into an instruction delay slot can also be found; they form the global candidate instruction set for global scheduling. In addition, the global related instruction set is searched for instructions that share a parallel instruction packet with instructions of the global candidate instruction set; these form the global related instruction subset.

Take the function fragment shown in Fig. 4, and assume that the call instruction, branch instruction { 9 }, uses address register a11 by default under this architecture. The solid curves in the right part of Fig. 4 mark the instructions that have a data dependence with the call instruction, namely instruction set { 14, 15 }, which cannot be placed into instruction delay slots. The dashed curves mark the data dependences among the instructions of the target function fragment itself: instruction 11 cannot execute before instruction 10, and instruction 15 cannot execute before instruction 11. The global candidate instruction set that may finally be filled into instruction delay slots is therefore { 10, 11, 12, 13 }.

In the traditional approach, as many instructions as possible are filled in according to the number of remaining delay slots, and the parallelism that the program fragment itself possesses is usually overlooked.

From the function fragment in the left part of Fig. 4, it can be seen that the subset { 11, 12, 13 } of the global candidate instruction set { 10, 11, 12, 13 } lies in a single parallel instruction packet. How to use the instruction delay slots with maximum efficiency can likewise be discussed case by case.

The balanced scheduling that balance scheduling unit 31 performs while global scheduling unit 22 carries out global scheduling is described below. With reference to Fig. 3, the balanced scheduling algorithm in global scheduling is discussed according to the comparison between the number of remaining instruction delay slots in the target architecture and the number of elements of the candidate instruction set:

The first case: the number of remaining instruction delay slots is greater than or equal to the number of elements of the global candidate instruction set { 10, 11, 12, 13 }.

In the first case, the whole global candidate instruction set { 10, 11, 12, 13 } can be inserted into instruction delay slots. The subset { 11, 12, 13 } of the global candidate instruction set and instruction 14 are in the same parallel instruction packet. When instruction { 14 } cannot be added to the parallel instruction packet that contains instruction { 15 }, inserting all of { 10, 11, 12, 13 } into instruction delay slots only removes the cycles of the parallel instruction packet formed by instruction { 10 }. When instruction { 14 } can be added to the parallel instruction packet that contains instruction { 15 }, inserting all of { 10, 11, 12, 13 } into instruction delay slots only removes the cycles of the two parallel instruction packets formed by instruction { 10 } and by instruction set { 11, 12, 13 } respectively.

Therefore, when the first case of local scheduling holds, or case A of the second case of local scheduling holds, the result of global scheduling is the final delay slot scheduling result. When case B of the second case of local scheduling holds and the number of remaining delay slots equals the number of elements of the global candidate instruction set { 10, 11, 12, 13 }, the final scheduling result is determined as follows. (1) If the number of remaining delay slots is greater than or equal to the sum of the numbers of instructions in the local candidate instruction set { 3, 4 } and the global candidate instruction set { 10, 11, 12, 13 }, all of these instructions are added to the instruction delay slots. (2) If the number of remaining delay slots is less than that sum, either all elements of the local candidate instruction set or all elements of the global candidate instruction set are added to the instruction delay slots.
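Rules (1) and (2) above can be sketched in Python as follows. The text does not say which of the two sets is chosen under rule (2), nor what happens when the chosen set does not fit, so the prefer parameter and the fallback order are assumptions made purely for illustration, as is the function name select_final_fill.

```python
def select_final_fill(remaining_slots, local_buf, global_buf, prefer="global"):
    """Pick the instructions that go into the remaining delay slots.

    Rule (1): if both candidate buffers fit together, use all of them.
    Rule (2): otherwise use one buffer in its entirety.
    Assumption: "prefer" names the buffer tried first; the other buffer
    is the fallback, and nothing is filled if neither fits.
    """
    if remaining_slots >= len(local_buf) + len(global_buf):
        return list(local_buf) + list(global_buf)
    if prefer == "global":
        order = (global_buf, local_buf)
    else:
        order = (local_buf, global_buf)
    for chosen in order:
        if len(chosen) <= remaining_slots:
            return list(chosen)
    return []


local_buf, global_buf = [3, 4], [10, 11, 12, 13]
print(select_final_fill(6, local_buf, global_buf))  # [3, 4, 10, 11, 12, 13]
print(select_final_fill(4, local_buf, global_buf))  # [10, 11, 12, 13]
```

In practice the choice between the two buffers would presumably rest on the cycle counts reported by statistic unit 32 rather than on a fixed preference.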

The second situation, remaining command postpones groove number and is less than overall alternative instruction set { 10,11,12,13 } element number, and is more than or equal to the alternative instruction set element number of local scheduling that local scheduling obtains.

In the second case, the global candidate instruction set {10, 11, 12, 13} cannot be placed into the instruction delay slots in its entirety. One option, following the traditional global scheduling algorithm, is to place the subset {10, 11} of the global candidate instruction set into the instruction delay slots, which occupies two delay slots. If the instruction set {12, 13, 14, 15} cannot form a packet on its own, the effect is to save the cycle of the parallel instruction packet formed by instruction {10}; if the instruction set {12, 13, 14, 15} can form a packet on its own, the effect is to save the cycles of two parallel instruction packets. Alternatively, following the traditional global scheduling scheme, the subset {10, 11, 12} of the candidate instruction set {10, 11, 12, 13} is placed into the instruction delay slots, which occupies three delay slots. If the instruction set {13, 14, 15} can form a packet on its own, the effect is to save the cycles of two parallel instruction packets; if it cannot, the effect is to save the cycle of the parallel instruction packet formed by instruction 10.

The effect achieved in the second case of global scheduling is better than that achieved in either the first case or the second case of local scheduling. Therefore, the global scheduling result is the final scheduling result for the instruction delay slots.

In the third case, the number of remaining instruction delay slots is less than 2.

In the third case, only one instruction can be selected from the global candidate instruction set {10, 11, 12, 13} and placed into the instruction delay slot. Because instruction {10} in the global candidate instruction set {10, 11, 12, 13} forms a single-instruction packet, placing instruction 10 into the instruction delay slot saves one instruction cycle, and the delay slot is used very efficiently.

When any one of the first through fifth cases of local scheduling holds, the result of the third case of global scheduling is the final scheduling result for the instruction delay slots.
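The three global scheduling cases above are distinguished by comparing the number of remaining delay slots with the sizes of the candidate instruction sets. The following is a minimal sketch of that comparison only. The function name `classify_global_case`, the use of plain lists for the candidate sets, and the order in which overlapping conditions are tested are assumptions made for illustration; they are not taken from the embodiment.

```python
def classify_global_case(remaining_slots, global_candidates, local_candidates):
    """Return 1, 2 or 3 for the global scheduling case, or None if none applies.

    Hypothetical helper: the conditions follow the description, but testing
    them in document order (case 1, then 2, then 3) is an assumption.
    """
    if remaining_slots >= len(global_candidates):
        # Case 1: the whole global candidate set fits into the delay slots.
        return 1
    if remaining_slots >= len(local_candidates):
        # Case 2: fewer slots than global candidates, but at least as many
        # slots as local candidates.
        return 2
    if remaining_slots < 2:
        # Case 3: fewer than two delay slots remain.
        return 3
    return None


# Candidate sets from the running example in the description.
global_set = [10, 11, 12, 13]
local_set = [3, 4]
cases = [classify_global_case(n, global_set, local_set) for n in (4, 3, 2, 1)]
```

With the example sets, four remaining slots fall under the first case, three or two under the second, and one under the third.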

It should likewise be noted that the global scheduling scheme still needs to provide a global candidate instruction buffer, which stores the higher-overhead global candidate instruction set produced during global scheduling. In the balanced scheduling scheme, it is determined whether the local candidate instruction buffer is empty. If it is empty, instructions are selected from the global candidate instruction buffer and filled into the instruction delay slots; otherwise, the best-performing instructions are selected from the local candidate instruction buffer and the global candidate instruction buffer and filled into the remaining instruction delay slots.
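The selection step of the balanced scheduling scheme can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: each buffer entry is modeled as a hypothetical `(name, benefit)` pair, where `benefit` stands for an estimated number of cycles saved, and "best-performing" is interpreted as "largest benefit". The description itself reasons about whole candidate sets and packets, so the per-instruction score is a simplification.

```python
def balanced_fill(remaining_slots, local_buffer, global_buffer):
    """Pick instructions for the remaining delay slots.

    local_buffer and global_buffer are lists of (name, benefit) pairs.
    The benefit score is an assumed metric, not defined in the description.
    """
    if not local_buffer:
        # Local candidate buffer is empty: choose from the global buffer only.
        pool = list(global_buffer)
    else:
        # Otherwise compare candidates from both buffers.
        pool = list(local_buffer) + list(global_buffer)
    # Stable sort, highest benefit first; ties keep their buffer order.
    ranked = sorted(pool, key=lambda entry: entry[1], reverse=True)
    return [name for name, _ in ranked[:remaining_slots]]


only_global = balanced_fill(2, [], [("i10", 1), ("i11", 0)])
mixed = balanced_fill(2, [("i3", 1), ("i4", 0)], [("i10", 2), ("i11", 0)])
```

In the second call the global candidate `i10` and the local candidate `i3` are chosen, because they carry the two largest assumed benefits.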

Preferably, when the instructions in the global candidate instruction set that can be filled into the instruction delay slots but have higher overhead are put into the local candidate buffer, the instructions in the local instruction set that cannot improve performance are deleted first.

In addition, when the total number of instructions in the global scheduling buffer and the local scheduling buffer is greater than the number of instruction delay slots, the dependences among the instructions in the delay slots must be analyzed. Likewise, whenever an instruction is added to an instruction delay slot, the dependences between instructions must be taken into account.

In one example, three instructions are filled into the instruction delay slots, as follows:

Delayslot
Pr0 mul d0,d1,d2        (1)
Pr0 nop                 (2)
Pr0 nop sub d3,d4,d0    (3)

Here, instruction {1} and instruction {3} have a dependence through data register d0, and the mul instruction in instruction {1} takes 2 cycles to execute. The traditional approach is to insert a nop instruction into delay slot 2, but the nop instruction wastes that delay slot. Therefore, another independent instruction should be selected and filled into delay slot 2, so that the delay slot is not wasted on a nop instruction.
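Replacing the nop in delay slot 2 with an independent instruction can be sketched as below. The sketch rests on several assumptions that the description does not state: the first operand of each instruction is taken to be its destination register, instruction {3} is modeled as a plain `sub`, "independent" is taken to mean no read-after-write, write-after-read, or write-after-write conflict on any register, and the names `Instr`, `independent`, and `fill_nops` as well as the two `add` candidates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dst: Optional[str] = None       # assumed: first operand is the destination
    srcs: Tuple[str, ...] = ()


def independent(a, b):
    """True if a and b share no RAW, WAR or WAW register dependence."""
    if a.dst is not None and (a.dst == b.dst or a.dst in b.srcs):
        return False
    if b.dst is not None and b.dst in a.srcs:
        return False
    return True


def fill_nops(slots, candidates):
    """Replace each nop with the first candidate independent of all slots."""
    filled = list(slots)
    pool = list(candidates)
    for i, ins in enumerate(filled):
        if ins.op != "nop":
            continue
        for cand in pool:
            if all(independent(cand, other) for other in filled):
                filled[i] = cand
                pool.remove(cand)
                break
    return filled


slots = [
    Instr("mul", "d0", ("d1", "d2")),   # instruction (1)
    Instr("nop"),                       # instruction (2)
    Instr("sub", "d3", ("d4", "d0")),   # instruction (3), reads d0
]
candidates = [
    Instr("add", "d5", ("d0", "d6")),   # reads d0, so it depends on the mul
    Instr("add", "d7", ("d8", "d9")),   # touches no register used above
]
filled = fill_nops(slots, candidates)
```

The first candidate is rejected because it reads d0, which the mul writes; the second candidate takes the place of the nop, and instructions (1) and (3) stay where they are.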

The delay slot scheduling algorithm for the very long instruction word (VLIW) structure provided by the embodiments of the present invention combines the traditional local scheduling algorithm with the traditional global scheduling algorithm, and proposes a balanced scheduling algorithm for the very long instruction word structure. By balancing delay slot scheduling against program parallelism, and local scheduling against global scheduling, the program achieves higher execution efficiency. In addition, the computational complexity of the algorithm is lower than that of the global scheduling algorithms used in compilers, and the algorithm is more flexible and can exploit the execution efficiency of the target program more fully.

Obviously, the invention described here may have many variations without departing from its true spirit and scope. Therefore, all changes that would be apparent to those skilled in the art should be included within the scope covered by the claims. The scope of protection claimed by the present invention is limited only by the claims.

Claims

1. A delay slot scheduling method under a very long instruction word structure, characterized by comprising the following steps:

selecting an instruction from the local candidate instruction buffer and/or the global candidate instruction buffer and filling it into the remaining instruction delay slot;

wherein the local scheduling comprises:

obtaining a local candidate instruction set and a local related instruction set according to the dependences between the instructions in the current basic block;

searching, according to the local related instruction set, for the instruction elements that are in the same parallel instruction packet as each instruction element in the local candidate instruction set, to form a local related instruction subset;

selecting instructions from the local candidate instruction set and filling them into the instruction delay slots according to the number of instruction delay slots under the target architecture, the local candidate instruction set, the number of instructions in the local candidate instruction set, and the local related instruction subset.

2. The scheduling method according to claim 1, characterized in that, before the instructions that can be filled into the instruction delay slot but have higher overhead are put into the local candidate instruction buffer, the method further comprises:

deleting the instructions in the local candidate instruction set that cannot improve performance.

3. The scheduling method according to claim 1, characterized in that performing global scheduling on the instructions in the branch target basic block, selecting the instructions that can be filled into the instruction delay slot, and putting them into the global candidate instruction buffer further comprises:

obtaining a global candidate instruction set and a global related instruction set according to the dependences between the instructions in the branch target basic block;

searching, according to the global related instruction set, for the instruction elements that are in the same parallel instruction packet as the instruction elements in the global candidate instruction set, to form a global related instruction subset;

selecting, from the global candidate instruction set, the instructions that can be filled into the instruction delay slot and putting them into the global candidate instruction buffer according to the current number of remaining delay slots, the global candidate instruction set, the number of instructions in the global candidate instruction set, and the global related instruction subset.

4. The scheduling method according to claim 3, characterized in that, before the instructions that can be filled into the instruction delay slot are selected from the global candidate instruction set and put into the global candidate instruction buffer, the method further comprises:

deleting the instructions in the global candidate instruction set that cannot improve performance.

5. The scheduling method according to claim 1, characterized in that selecting an instruction from the local candidate instruction buffer and/or the global candidate instruction buffer and filling it into the remaining instruction delay slot further comprises:

determining whether the local candidate instruction buffer is empty; if so, selecting an instruction from the global candidate instruction buffer and filling it into the instruction delay slot; otherwise, selecting the best-performing instruction from the local candidate instruction buffer and the global candidate instruction buffer and filling it into the remaining instruction delay slot.

6. The scheduling method according to any one of claims 1 to 5, characterized in that the instructions are assembly instructions.