CN1560732A

CN1560732A - Processor and compiling method for static data bypass and register file data write control

Info

Publication number: CN1560732A
Application number: CNA2004100167542A
Authority: CN
Inventors: 琚小明; 史册; 姚庆栋; 李东晓; 高磊; 刘鹏
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2004-03-03
Filing date: 2004-03-03
Publication date: 2005-01-05
Anticipated expiration: 2024-03-03
Also published as: CN1276345C

Abstract

The invention relates to microprocessors and computer systems, and aims to provide a processor and a compiling method that implement data bypass and register file write control statically, in particular compiler control of the pipeline registers in a media processor. The invention provides a media processor that has no hardware bypass logic and comprises a six-stage pipeline. The invention also provides a compiling method for static data bypass and register file write control: when the above media processor is used, there is no hardware bypass logic, the data channels of the pipeline bypass are used to transmit the data to be bypassed, and the hardware only needs to retain those data channels.

Description

Processor and compiling method for static data bypass and register file data write control

Technical field

The present invention relates to microprocessors and computer systems. More particularly, it relates to a processor and a compiling method that statically implement data bypass and register file write control, and especially to compiler control of the pipeline registers in a media processor.

Background technology

Existing DSPs and media processors usually adopt an instruction-level parallelism (ILP) design. In a traditional ILP processor with multiple parallel functional units, one main problem limits its scale: the shared connections between the register file (RF) and the functional units (FUs). A very long instruction word (VLIW) processor runs into problems when it must support a large number of parallel operations; in particular, when many functional units are supported, the RF and the bypass circuit become very complex.

Because the functional units are pipelined, processors usually include a bypass circuit to reduce the delay caused by data dependences between instructions. The bypass circuit determines the data dependences between instructions in the pipeline and forwards the dependent data ahead of time to the instruction that needs it. A hardware bypass circuit identifies data dependences from the register operand addresses of the instructions in the pipeline, so a long pipeline or multiple pipelines greatly increase the complexity of the bypass circuit.

On the other hand, in a traditional pipeline design the bypass circuit forwards bypass values early but does not reduce the number of writes from the pipeline to the register file. Normally the last pipeline stage stores the output result in the register file for later instructions to use. Sometimes, once a value has been delivered through the bypass, no later instruction uses it, and there is then no need to write the result to the register file.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by providing a processor and a compiling method that statically implement data bypass and register file write control.

To solve the above technical problem, the invention adopts the following technical solutions:

The invention provides a media processor that statically implements data bypass and register file write control. The processor has no hardware bypass logic and comprises a six-stage pipeline: the IF stage fetches the instruction and computes the new PC value; the ID stage decodes the instruction and reads data from the register file; the DA stage computes the memory address of the source operand; the DM stage accesses memory and reads data from it; the EX stage performs the ALU and MAC operations, executing the instruction; the WB stage writes the output of the ALU or of a load instruction to the register file.
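
As a reading aid, the six stages named above can be written down as an enumeration. This is only a minimal sketch: the names `Stage` and `stage_of`, and the assumption of one stage per cycle with no stalls, are illustrative and not part of the disclosure.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Six pipeline stages in program order (stage names from the description)."""
    IF = 0  # fetch the instruction, compute the new PC value
    ID = 1  # decode, read data from the register file
    DA = 2  # compute the memory address of the source operand
    DM = 3  # memory access, read data from memory
    EX = 4  # ALU and MAC operations
    WB = 5  # write the ALU or load result to the register file


def stage_of(fetch_cycle: int, now: int):
    """Stage occupied at cycle `now` by an instruction fetched at `fetch_cycle`.

    Assumes one stage per cycle and no stalls (an illustrative simplification).
    Returns None when the instruction is not in the pipeline.
    """
    k = now - fetch_cycle
    return Stage(k) if 0 <= k < len(Stage) else None


if __name__ == "__main__":
    print(stage_of(0, 3).name)
```

Under these assumptions an instruction fetched at cycle 0 occupies the DM stage at cycle 3 and has left the pipeline by cycle 6.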

The invention also provides a compiling method that statically implements data bypass and register file write control. When the media processor described above is used, there is no hardware bypass logic, and the data channels of the pipeline bypass are used to transmit the data that need to be bypassed.

In the invention, when the media processor of claim 1 is used, an enable control on the register file data channel controls whether data are written back to the register file. Bypassed data are supplied by the corresponding pipeline registers. The pipeline registers that supply bypass data are numbered and named and are visible to the compiler. The number of a pipeline register corresponds to the pipeline stage that supplies the bypass data, so the number denotes both the pipeline register and the corresponding stage.

The numbered pipeline registers are visible to the compiler but cannot be used in application programming; through the compiling information s1 and s2, the compiler only controls which pipeline register data are read from. The 2-bit compiling information allows the data dependences among at most five consecutive instructions to be resolved through the bypass in the six-stage pipeline.

An enable control is placed on the write data channel of the register file; it determines whether data are written into the register file. The enable control is supplied by the compiling information d, that is, d decides whether the result is written to the register file. The compiling information d depends on the register lifetime, which is the number of instructions from the instruction that writes the destination register to the last instruction that uses that register value; the lifetime is computed over the global scope rather than within a basic block.

In the invention, the register lifetime is computed according to the delays between different instructions of the processor.

Compared with the prior art, the beneficial effects of the invention are:

Implementing the data bypass function statically removes the need to compare every bypass value; the hardware only has to retain the bypass data channels, so its structure is very simple.

Description of drawings

Fig. 1 shows the assembly instruction encoding formats of the invention;

Fig. 2 shows the positions in the pipeline of the pipeline registers that supply bypass data, together with their names;

Fig. 3 is a schematic diagram of register lifetime computation during compilation;

Fig. 4 is a schematic diagram of register lifetime computation during compilation after instruction scheduling;

Fig. 5 is a flow chart of the algorithm of the invention for computing register lifetimes and the corresponding compiling information.

Embodiment

The invention is described below with a concrete example in conjunction with the drawings:

Usually, after the decode (ID) stage, a processor analyzes data dependences and selects among several input values with a bypass logic circuit; that is, the processor hardware implements the bypass function. The same analysis and comparison of data dependences between instructions can instead be done by the compiler (statically) at compile time: the information about which data are dependent and need to be bypassed is encoded and placed in the instruction encoding, and after instruction decode it controls the transfer of the dependent data. Fig. 1 shows the encoding formats of the main instruction types of the designed instruction set. OPCODE occupies 7 bits; the registers Rs, Rt, Rd and the immediate Shamt occupy 5 bits each; the auxiliary addressing registers Arm and Arn and the registers Dst, Src1 and Src2 occupy 3 bits each; and the address offsets dispm and dispn and the addressing modes Modm and Modn occupy 4 bits each. The data-dependence compiling information s1 and s2 is specially arranged here and occupies 2 bits each. Jump instructions do not need data-dependence compiling information, and immediate instructions need only 2 bits of compiling information.

Instructions of the RE-type1 and RE-type2 types both use indirect addressing: the former obtains the data address by adding an offset to, or subtracting it from, a base register; the latter obtains it by adding or subtracting a base register and an index register. The P-type is the parallel instruction encoding format; it supports parallel execution of addition, subtraction and multiplication, and parallel execution of a store instruction with an arithmetic or logic instruction. Data-dependence information bits are arranged in all of these formats.

The data-dependence compiling information s1 and s2 describes the data dependence of source operand 1 and source operand 2 respectively. Table 1 lists the encodings of the bits s1 and s2 and their meanings. 00 means that the operand has no data dependence, 01 means that the operand needs the bypass value from the DM stage, and 10 and 11 follow by analogy. The transfer of bypass data is controlled by s1 and s2. For example, if s1 and s2 of an instruction are encoded as 00 and 10, the first source operand has no dependence on preceding instructions, while the second source operand needs the bypass value of a preceding instruction in the EX stage. Note that when s1 or s2 is 00 and the source operand uses indirect addressing, the value sent back to the ID stage is the bypass value of the DA stage.

s1/s2 (2 bits)	00	01	10	11
Compiling information	No dependence	Bypass from DM stage	Bypass from EX stage	Bypass from WB stage

Table 1
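
The encoding in Table 1 amounts to a four-way operand selector. The following is a minimal sketch of that selection, assuming a simple software model in which the pipeline-register values are held in a dictionary; `BypassSel`, `select_operand` and the dictionary keys are hypothetical names, and the indirect-addressing case (00 selecting the DA-stage value) is not modeled.

```python
from enum import IntEnum


class BypassSel(IntEnum):
    """2-bit s1/s2 encodings as listed in Table 1."""
    NONE = 0b00  # no dependence: operand comes from the register file
    DM = 0b01    # bypass value from the DM stage
    EX = 0b10    # bypass value from the EX stage
    WB = 0b11    # bypass value from the WB stage


def select_operand(s: int, rf_value: int, pipe_regs: dict) -> int:
    """Pick a source operand according to the compiler-provided field `s`.

    `pipe_regs` maps "DM", "EX" and "WB" to the values currently held in the
    corresponding pipeline registers (a modeling assumption).
    """
    sel = BypassSel(s)
    if sel is BypassSel.NONE:
        return rf_value
    return pipe_regs[sel.name]


if __name__ == "__main__":
    regs = {"DM": 11, "EX": 22, "WB": 33}
    # s1 = 00, s2 = 10: first operand from the register file, second from EX.
    print(select_operand(0b00, 7, regs), select_operand(0b10, 7, regs))
```

No address comparison appears anywhere in the selection, which is the point of moving the dependence analysis into the compiler.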

Register name	RDAA	RDM	REX	RWB
Register number	00	01	10	11

Table 2

Because the DA stage computes operand addresses, its output is the memory address of an operand. Data dependences between memory addresses, or between a memory address and a register operand, need not be considered here, so the bypass value of the DA stage is left out when analyzing the dependences of instruction source operands. The bypass values sent back to the ID stage therefore come from the DM, EX and WB stages. So that the compiler can control the transfer of the values to be bypassed, the pipeline registers are given names and numbers. Unlike the pipeline registers of an ordinary pipeline, these are visible to the compiler, which controls when data are read from them. Table 2 lists the names and numbers of the pipeline registers that supply bypass values; their positions in the pipeline are shown in Fig. 2.

No.	Instruction	Compiling information s1	Compiling information s2	Enable control d
1	lw r2, 34(r8)	00	00	0
2	lw r3, 45(r8)	00	00	1
3	add r4, r2, r6	01	00	1
4	sub r5, r3, r4	01	10	1
5	sw r5, r2(0)	10	11	0

Table 3

Table 3 shows a piece of assembly code for the designed processor and the corresponding data-dependence compiling information. The add instruction depends on the first lw instruction: its first source operand r2 needs the bypass value RDM of that lw in the DM stage, because a lw instruction cannot read its data from memory until the DM stage. Likewise, the sub instruction depends on the second lw instruction and on the add instruction, and the sw instruction needs the bypass value REX of the sub instruction in the EX stage and the bypass value RWB of the lw instruction in the WB stage.

Building on its analysis of the data dependences between instructions, the compiler derives the data-dependence compiling information according to the following rules:

For load/store instructions, data are available in the DM stage at the earliest. A store instruction finishes in the DM stage, while a load instruction goes on to the EX and WB stages.

For arithmetic and logic instructions, bypass data are available in the EX stage at the earliest, after which the instruction enters the next pipeline stage, WB.

For jump instructions, no data-dependence compiling information is recorded.

In the six-stage DSP pipeline, the DA, DM, EX and WB stages hold bypass values; together with the instruction in the ID stage, the data dependences among at most five consecutive instructions can be resolved through the bypass. Therefore, when the compiler collects the compiling information s1 and s2, the window of the dependence analysis is five instructions, that is, the compiling information records the data dependences among five consecutive instructions.
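
The five-instruction window can be sketched as a backward scan over at most four preceding instructions. The code below is an illustration under stated assumptions, not the patented compiler: the tuple representation `(mnemonic, destination, sources)`, the constant `WINDOW` and the function `raw_spacings` are hypothetical, and the listing follows Table 3 with r5 and r2 taken as the registers read by sw.

```python
WINDOW = 5  # dependence window from the description: five consecutive instructions

# (mnemonic, destination register or None, source registers), after Table 3
PROGRAM = [
    ("lw", "r2", ["r8"]),
    ("lw", "r3", ["r8"]),
    ("add", "r4", ["r2", "r6"]),
    ("sub", "r5", ["r3", "r4"]),
    ("sw", None, ["r5", "r2"]),
]


def raw_spacings(program):
    """For each instruction, map every source register defined inside the
    window to its spacing: the number of instructions lying between the
    nearest earlier definition and the instruction itself."""
    result = []
    for i, (_, _, srcs) in enumerate(program):
        found = {}
        for reg in srcs:
            # Look back over at most WINDOW - 1 preceding instructions.
            for j in range(i - 1, max(i - WINDOW, -1), -1):
                if program[j][1] == reg:
                    found[reg] = i - j - 1
                    break
        result.append(found)
    return result


if __name__ == "__main__":
    for (op, _, _), deps in zip(PROGRAM, raw_spacings(PROGRAM)):
        print(op, deps)
```

For the sw instruction this gives a spacing of 0 to sub (r5) and 3 to the first lw (r2); registers such as r8 and r6, which have no definition in the window, are simply absent from the result.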

If the spacing between two data-dependent instructions (the number of instructions between them) is greater than the number of the pipeline-stage register required by the rules above (see Table 2), the spacing determines the pipeline stage to bypass from. If the spacing is less than or equal to that number, the stage is determined by the first or second rule.

For example, the spacing between the sw and sub instructions is 0, and by the second rule the sub instruction can supply a bypass value in the EX stage at the earliest (register number 10, decimal 2). Because the spacing 0 is less than the register number, the second rule applies and the EX-stage bypass value is used as the operand of sw, so the data-dependence compiling information s1 of the sw instruction is 10.
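
The selection rule just described reduces to taking the larger of the spacing and the earliest stage number. The sketch below checks that reading against the s1/s2 columns of Table 3. The names `EARLIEST` and `bypass_code` are hypothetical, and returning 00 (read from the register file) when the spacing exceeds the WB register number is an assumption, since the text does not spell out that case.

```python
# Earliest pipeline register able to supply a value, by producer type
# (numbers from Table 2: RDM = 01, REX = 10).
EARLIEST = {"load": 0b01, "alu": 0b10}

MAX_STAGE = 0b11  # RWB, the last pipeline register that can be selected


def bypass_code(producer_kind: str, spacing: int) -> int:
    """2-bit s field for an operand whose producer is `spacing` instructions away.

    If the spacing exceeds the earliest register number, the spacing picks the
    stage; otherwise the producer's earliest stage is used.
    """
    earliest = EARLIEST[producer_kind]
    stage = spacing if spacing > earliest else earliest
    # Assumption: beyond RWB the value is read from the register file (00).
    return stage if stage <= MAX_STAGE else 0b00


if __name__ == "__main__":
    # sw r5, r2(0): r5 comes from sub (spacing 0), r2 from the first lw (spacing 3)
    print(format(bypass_code("alu", 0), "02b"), format(bypass_code("load", 3), "02b"))
```

With these inputs the sketch yields 10 and 11 for sw, 01 and 10 for sub, and 01 for the first operand of add, matching Table 3.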

In Fig. 1 the compiling information d occupies 1 bit; it is the enable-control bit that determines whether the WB stage writes data back to the register file. Because a jump instruction has no destination register, J-type instructions do not need this bit. Every instruction type other than J-type has the bit d. Instructions of the IE-type, RE-type1 and RE-type2 types all use indirect addressing: the former obtains the data address by adding an offset to, or subtracting it from, a base register; the latter obtains it by adding or subtracting a base register and an index register.
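
To make the field widths concrete, the sketch below packs and unpacks one register-type instruction word. The widths (OPCODE 7, Rs/Rt/Rd/Shamt 5 each, s1/s2 2 each, d 1) come from the description; the field order, the assumption that all of these fields share one format, and the names `FIELDS`, `pack` and `unpack` are hypothetical, because Fig. 1 is not reproduced here.

```python
# Assumed field order, most significant field first; only the widths are
# taken from the description.
FIELDS = [("opcode", 7), ("rs", 5), ("rt", 5), ("rd", 5),
          ("shamt", 5), ("s1", 2), ("s2", 2), ("d", 1)]


def pack(**values) -> int:
    """Pack the named fields into one instruction word (missing fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split an instruction word back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


if __name__ == "__main__":
    w = pack(opcode=0x12, rs=3, rt=4, rd=5, s1=0b01, s2=0b10, d=1)
    print(hex(w), unpack(w))
```

Under this assumed field set the widths sum to 32 bits, so the three compiler-provided fields s1, s2 and d take 5 bits of the word.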

d (1 bit)	0	1
Compiling information	Allow the WB stage to write the RF	Forbid the WB stage from writing the RF

Table 4

The enable-control compiling information d in the instruction encoding controls whether the WB stage writes the output result to the register file. Table 4 lists the encodings of d and their meanings. When d is 0, the result of the instruction may be written to the register file, which is the normal pipeline operation; when d is 1, writing the result of the instruction to the register file is forbidden.
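
In a software model, the write enable of Table 4 is a single guard in the WB stage. The following minimal sketch assumes the register file is modeled as a dictionary; `write_back` is a hypothetical name.

```python
def write_back(register_file: dict, dest: str, value: int, d: int) -> None:
    """WB-stage write gated by the compiler-provided bit d (Table 4).

    d == 0: the result is written to the register file (normal operation).
    d == 1: the write is suppressed; the value was consumed through the bypass.
    """
    if d == 0:
        register_file[dest] = value


if __name__ == "__main__":
    rf = {"r2": 0, "r3": 0}
    write_back(rf, "r2", 5, d=0)  # written
    write_back(rf, "r3", 9, d=1)  # suppressed
    print(rf)
```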

At compile time, while analyzing the data dependences between instructions, the compiler can also determine from them the lifetime of each instruction's destination register. The register lifetime is the number of instructions from the instruction that writes the destination register to the last instruction that uses that register value. Fig. 3 shows the register lifetime analysis for the instruction code; the dots mark the relative positions of the instructions from the one that defines a register to the last one that uses it. It is assumed here that later instructions do not use the registers defined in this basic block, that is, there is no further RAW dependence.

The compiler scans the instruction code one instruction at a time. For each instruction that defines a destination register, it scans the following instructions that have a RAW dependence on it and records the number of instructions between the defining instruction and the last instruction that uses the register; this number is the register lifetime. In Fig. 3, for example, the lifetime of r2 is 4, the lifetimes of r3 and r4 are 2, and the lifetime of r5 is 1.
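
The scan described above can be sketched for a single basic block as follows. This is an illustration only: it measures the lifetime as the index difference between the defining instruction and its last RAW use, it uses the Table 3 listing as input because Fig. 3 is not reproduced here, and `PROGRAM` and `lifetimes` are hypothetical names. Each register is assumed to be defined once.

```python
# (mnemonic, destination register or None, source registers), after Table 3
PROGRAM = [
    ("lw", "r2", ["r8"]),
    ("lw", "r3", ["r8"]),
    ("add", "r4", ["r2", "r6"]),
    ("sub", "r5", ["r3", "r4"]),
    ("sw", None, ["r5", "r2"]),
]


def lifetimes(program):
    """Map each defined register to the distance, in instructions, from its
    definition to the last later instruction that reads it."""
    result = {}
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        last_use = i
        for j in range(i + 1, len(program)):
            _, dest_j, srcs_j = program[j]
            if dest in srcs_j:
                last_use = j
            if dest_j == dest:  # a redefinition ends this value's lifetime
                break
        result[dest] = last_use - i
    return result


if __name__ == "__main__":
    print(lifetimes(PROGRAM))
```

On this listing the sketch reproduces the values quoted above for r2 (4), r3 (2) and r5 (1); since the figure itself is unavailable, values for other registers are not cross-checked against it.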

Through register lifetime analysis the compiler learns the maximum distance between dependent instructions. Data dependences are not confined to a basic block, so register lifetimes should be analyzed over the global scope; only then does the result reflect the actual situation. The analysis therefore covers the global scope and examines every path in the data dependence graph (DDG), taking the longest lifetime found as the final lifetime of the register.

In addition, determining register lifetimes requires considering how different instruction types combine. Different instruction types have different delays between them, and only by taking these delays into account does the lifetime obtained at compile time agree with the behavior at run time. A delay is a pipeline stall that arises from a data dependence between instructions; instruction scheduling can make use of these stall slots. For the designed six-stage media processor, the delays are as follows: the delay between load/store instructions is 2, the delay between an arithmetic/logic instruction and a store instruction is 3, there is no delay between other pairs of instruction types, and a branch instruction has a delay of 1.

The data dependences of the code in Fig. 3 show that in the pipeline there is no delay between the first LW and ADD, between the second LW and SUB, or between ADD and SUB, but there are 3 delay slots between SUB and SW. Because these delay slots are not filled by scheduling independent instructions or by nops, the lifetimes obtained by the compile-time analysis in Fig. 3 do not match what happens in the pipeline. The lifetime analysis must therefore also account for the delays between instructions. In Fig. 4, the delay slots of the instruction code are filled by instruction scheduling or with nops, and the register lifetimes are then analyzed again.

The effect of delays on lifetimes can be seen clearly from how bypass values are returned. For example, the analysis in Fig. 3 gives a lifetime of 1 for register r5. As Fig. 4 shows, however, the SUB instruction must reach the EX stage before its result can be sent back to the ID stage, so there are 3 pipeline stalls between the two instructions. When the code actually runs, the lifetime of r5 is therefore 4 rather than 1. Similarly, the actual lifetime of r2 is 7 rather than 4. Register lifetimes must thus be adjusted once the delays between different instruction types are taken into account.
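
The adjustment from 1 to 4 for r5 and from 4 to 7 for r2 can be reproduced by filling unused delay slots with nops and rescanning. The sketch below is one possible reading, not the algorithm of Fig. 5: `KIND`, `DELAY`, `pad_with_nops` and `lifetimes` are hypothetical names, the listing follows Table 3, and treating the load/store delay of 2 as applying from a load to a dependent store is an assumption.

```python
KIND = {"lw": "load", "sw": "store", "add": "alu", "sub": "alu"}

# Delay slots between a producer type and a dependent consumer type.
# ("alu", "store") = 3 is from the description; ("load", "store") = 2 is one
# reading of "the delay between load/store instructions is 2".
DELAY = {("alu", "store"): 3, ("load", "store"): 2}

PROGRAM = [
    ("lw", "r2", ["r8"]),
    ("lw", "r3", ["r8"]),
    ("add", "r4", ["r2", "r6"]),
    ("sub", "r5", ["r3", "r4"]),
    ("sw", None, ["r5", "r2"]),
]


def pad_with_nops(program):
    """Insert nops so every RAW pair is separated by at least its delay."""
    out = []
    last_def = {}  # register -> index in `out` of its latest definition
    for op, dest, srcs in program:
        need = 0
        for reg in srcs:
            if reg in last_def:
                p = last_def[reg]
                delay = DELAY.get((KIND[out[p][0]], KIND[op]), 0)
                gap = len(out) - p - 1  # instructions already in between
                need = max(need, delay - gap)
        out.extend([("nop", None, [])] * need)
        if dest is not None:
            last_def[dest] = len(out)
        out.append((op, dest, srcs))
    return out


def lifetimes(program):
    """Distance from each register definition to its last later use."""
    result = {}
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        uses = [j for j in range(i + 1, len(program)) if dest in program[j][2]]
        result[dest] = (uses[-1] - i) if uses else 0
    return result


if __name__ == "__main__":
    padded = pad_with_nops(PROGRAM)
    print([op for op, _, _ in padded])
    print(lifetimes(padded))
```

Three nops are inserted before sw, after which the lifetime of r5 becomes 4 and that of r2 becomes 7, in line with the figures quoted above.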

For the designed six-stage media processor, four pipeline stages can return bypass values, so the maximum register lifetime that can be served is 5: a RAW dependence of the fifth instruction after the one that defines the register can still be supplied through the bypass. An instruction whose register lifetime is greater than 5 can only read its operand from the register file. To obtain the enable-control compiling information d, an algorithm for computing register lifetimes is proposed; see Fig. 5.

After instruction scheduling and code optimization, the algorithm is applied to compute register lifetimes, following these principles:

The DDG is built over the global scope rather than being confined to a basic block.

When the DDG contains multiple paths, that is, when RAW dependences exist between different basic blocks, the longest register lifetime is used as the basis for generating the compiling information d.

For each instruction type, the relative position of the first RAW-dependent instruction is examined to determine whether all delay slots are used. If they are not, they are filled with nops and the node state values in the DDG are adjusted.

The value of the compiling information d is determined from the register lifetime.

From the result of the register lifetime analysis, the value of the enable-control compiling information d is readily obtained; see Table 3. Every instruction that defines a register has a corresponding d, so when the instruction reaches the last pipeline stage, WB, d controls whether its result is written back to the register file.

As Fig. 4 shows, the results of the second LW, ADD and SUB can all be consumed through the bypass, so these values need not be written back to the register file and d for these instructions is set to 1. The value of r2 from the first LW can only partly be consumed through the bypass in the pipeline; it must be written back to the register file for the SW instruction to use, so its d is set to 0.
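
The last step, deriving d from the adjusted lifetime, can be sketched as a threshold test against the maximum lifetime of 5 stated above. The names `MAX_BYPASS_LIFETIME` and `write_control_bit` are hypothetical. The lifetimes used are those quoted in the description (r2 = 7, r3 = 2, r4 = 2, r5 = 4), and giving d = 0 to an instruction without a destination register is an assumption based on the sw row of Table 3.

```python
MAX_BYPASS_LIFETIME = 5  # longest lifetime the bypass can serve, per the text


def write_control_bit(dest, lifetime):
    """Return d: 1 forbids the register file write, 0 allows it."""
    if dest is None:
        return 0  # assumption: Table 3 lists d = 0 for sw, which defines no register
    return 1 if lifetime <= MAX_BYPASS_LIFETIME else 0


if __name__ == "__main__":
    # (mnemonic, destination, adjusted lifetime quoted in the description)
    listing = [("lw", "r2", 7), ("lw", "r3", 2), ("add", "r4", 2),
               ("sub", "r5", 4), ("sw", None, None)]
    print([write_control_bit(dest, life) for _, dest, life in listing])
```

For the five instructions this yields the d column of Table 3: 0, 1, 1, 1, 0.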

Finally, it should be noted that the above is only a specific embodiment of the invention. The invention is obviously not limited to this embodiment, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the disclosure shall be regarded as within the scope of protection of the invention.

Claims

1. A media processor that statically implements data bypass and register file write control, characterized by comprising a six-stage pipeline, in which the IF stage fetches the instruction and computes the new PC value; the ID stage decodes the instruction and reads data from the register file; the DA stage computes the memory address of the source operand; the DM stage accesses memory and reads data from it; the EX stage performs the ALU and MAC operations, executing the instruction; and the WB stage writes the output of the ALU or of a load instruction to the register file.

2. A compiling method that statically implements data bypass and register file write control, characterized in that the media processor of claim 1 is used, there is no hardware bypass logic, and the data channels of the pipeline bypass are used to transmit the data that need to be bypassed.

3. The compiling method of claim 2, characterized in that the media processor of claim 1 is used and an enable control on the register file data channel controls whether data are written back to the register file.

4. The compiling method of claim 2, characterized in that the bypassed data are supplied by the corresponding pipeline registers.

5. The compiling method of claim 2, characterized in that the pipeline registers that supply bypass data are numbered and named and are visible to the compiler, and the number of a pipeline register corresponds to the pipeline stage that supplies the bypass data, so that the number denotes both the pipeline register and the corresponding pipeline stage.

6. The compiling method of claim 5, characterized in that the numbered pipeline registers are visible to the compiler but cannot be used in application programming, and the compiler, through the compiling information s1 and s2, only controls which pipeline register data are read from.

7. The compiling method of claim 5, characterized in that the 2-bit compiling information allows the data dependences among at most five consecutive instructions to be resolved through the bypass in the six-stage pipeline.

8. The compiling method of claim 3, characterized in that the enable control is placed on the write data channel of the register file and determines whether data are written into the register file.

9. The compiling method of claim 8, characterized in that the enable control is supplied by the compiling information d, that is, d decides whether the result is written to the register file.

10. The compiling method of claim 9, characterized in that the compiling information d depends on the register lifetime, which is the number of instructions from the instruction that writes the destination register to the last instruction that uses that register value and is computed over the global scope rather than within a basic block; the register lifetime is computed according to the delays between different instructions of the processor.