A digital signal processor with a dynamic-commit pipeline
Technical field:
The present invention relates to a general-purpose digital signal processor, and in particular to a digital signal processor with a dynamic-commit pipeline.
Background technology:
At present, pipelining plays a very important role in the performance of high-performance digital signal processors. In currently popular processors, the method that combines dynamic scheduling based on the Tomasulo algorithm with speculative (predicted) execution is widely used (e.g. PowerPC 620, MIPS R10000).
The key idea of this method is to let instructions execute out of order while still requiring them to commit in order, and to prevent any irrecoverable action from happening before an instruction commits. Unlike the ordinary DLX pipeline, this method needs an extra step, namely instruction commit. Adding the commit stage not only changes the instruction execution sequence but also requires an additional set of hardware buffers that hold instruction results until the instructions commit.
Admittedly, this method performs far better than an ordinary pipeline, which is why present processors generally adopt it. But because one of its most important ideas is in-order commit, it limits further performance gains to some extent and also lowers resource utilization.
The cause of this limitation is that the method produces wait-for-commit states. As an example, Fig. 1 shows the pipeline timing when program 1 is run on a pipeline that combines dynamic scheduling based on the Tomasulo algorithm with speculative execution.
Program 1 in the figure comprises 5 instructions, as follows:
DIV AR1, AR2      ; Instruction 1
MOV *AR3+, AR4    ; Instruction 2
MOV *AR3+, AR5    ; Instruction 3
MOV *AR3+, AR6    ; Instruction 4
MOV *AR3+, AR7    ; Instruction 5
In this figure, IS denotes the instruction issue cycle, ID the instruction decode cycle, EX the instruction execute cycle, WB the result write-back cycle, CT the instruction commit cycle, and WT a wait-for-commit cycle.
By our assumption, instruction 1 of program 1 is a divide instruction, and division generally has a relatively long execution time in a processor, depending on the operand width. The execution unit of current pipelines that support dynamic execution contains several functional units that can run in parallel (multiply/divide unit, ALU, memory access unit, etc.), so instruction 2 can be issued, decoded and executed while instruction 1 is still executing, and it writes back before instruction 1 finishes executing. However, the central principle of the method that combines Tomasulo-based dynamic scheduling with speculative execution is dynamic execution with in-order commit, so the written-back result of instruction 2 cannot commit until instruction 1 has committed. In addition, because the physical resources of the system are limited, instruction 5 finds too few physical registers and must wait until an earlier instruction commits and releases physical registers before it can issue, so its wait to issue is also relatively long.
Now suppose that the divide instruction needs 8 execute cycles and that the hardware has 8 virtual registers free. Running this program then takes 17 instruction cycles, and instructions 2, 3, 4 and 5 spend many cycles in wait states.
If other instructions could commit before instruction 1 commits, the many wait states seen in instructions 2, 3, 4 and 5 above would be reduced, which would greatly improve instruction execution efficiency and shorten the program's run time. The present invention is proposed on the basis of this idea.
Summary of the invention:
The object of the invention is to provide a digital signal processor with a dynamic-commit pipeline. The processor shortens waiting time and improves instruction efficiency.
As shown in Fig. 2, and again taking the example from the Background section, while this processor is executing instruction 1, instruction 2 finishes executing and writes back in instruction cycle 5 and is allowed to commit in instruction cycle 6. Instruction 5 then waits only 2 instruction cycles, so the whole program takes just 12 instruction cycles. Compared with the 17 instruction cycles of the method that combines Tomasulo-based dynamic scheduling with speculative execution, this saves 5 instruction cycles and thereby achieves the goal of improving instruction efficiency.
To achieve the above technical effect, the core architecture of the digital signal processor with a dynamic-commit pipeline according to the invention comprises a program control unit, an instruction decoding unit, a register file and an execution unit; it is characterized in that the architecture further comprises a commit console (submission control desk) for managing the dynamic commit of instructions; in said core architecture the program control unit is in data connection with the decoding unit and with the commit console respectively; the commit console is in data connection with the register file; the decoding unit is connected to the execution unit and to the register file respectively; and the execution unit is in data connection with the register file.
While the processor runs, the commit console mentioned above records, among other things, the conditional-execution instruction mark, the condition, the predicted outcome, and the numbers of the registers to be committed.
It should also be noted that the execution unit of this processor comprises functional units such as a multiply/divide unit, an ALU and a memory unit, and allows several of them to execute simultaneously.
In this core architecture, the program control unit is responsible for instruction prefetch, instruction issue, branch prediction, interrupt management, debug control and similar functions. The instruction decoding unit is responsible for instruction decoding and hardware resource scheduling. The register file is a multi-level mapping structure responsible for managing and scheduling logical and physical registers. The commit console is the core of the invention and manages the dynamic commit of instructions.
The principle of the invention is as follows. For a processor with a fixed instruction set, the instructions that can involve conditional branch prediction or raise an exception are known in advance, that is, the decoding unit can identify them; all other instructions follow the normal program flow. Therefore every instruction that precedes the next conditionally predicted instruction, or the next instruction that may raise an exception, is certain to take effect, and the results of these instructions can be committed dynamically. Local dynamic commit means that the other instructions lying between two conditional instructions are committed dynamically.
The invention adds an instruction commit console to a general processor architecture. After an instruction is issued it enters the decoding unit. If the decoding unit identifies the current instruction as a conditional-execution instruction or as one that may raise an exception, it attaches an extra mark when it sends the commit entry to the commit console, and it also sends the instruction type to the commit console, where it is kept until the instruction commits.
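The marking step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented hardware: the names `CommitEntry`, `decode` and `MARKED_OPCODES` are hypothetical, only the BCC mnemonic from the example programs is treated as a marked instruction, and the registers to be committed are assumed to be recorded by name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: opcodes the decoder treats as conditional or exception-capable.
# Only BCC appears in the text; a real instruction set would list more.
MARKED_OPCODES = {"BCC"}


@dataclass
class CommitEntry:
    """One record in the commit console (field layout is hypothetical)."""
    seq: int                          # program-order sequence number
    opcode: str                       # instruction type, kept until commit
    marked: bool                      # conditional / may raise an exception
    condition: Optional[str]          # branch condition, marked entries only
    predicted_taken: Optional[bool]   # prediction supplied with the instruction
    regs_to_commit: Tuple[str, ...]   # registers whose results await commit
    written_back: bool = False        # set once the result is written back


def decode(seq, opcode, dest_regs, condition=None, predicted_taken=None):
    """Build the entry the decoder would send to the commit console."""
    marked = opcode in MARKED_OPCODES
    return CommitEntry(
        seq,
        opcode,
        marked,
        condition if marked else None,
        predicted_taken if marked else None,
        tuple(dest_regs),
    )


# Instructions 4 and 5 of program 2; the destination lists are assumptions.
mov = decode(4, "MOV", ["AR3", "AR6"])
bcc = decode(5, "BCC", [], condition="AR6==#0", predicted_taken=False)
print(mov.marked, bcc.marked)
```

Keeping the mark and the instruction type in the entry itself lets the console decide commit eligibility later without consulting the decoder again.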
At commit time, every instruction in the commit console that precedes the first marked commit entry is a candidate, and any of them that already satisfies the commit condition can commit first. This both improves instruction execution efficiency and guarantees that instructions execute correctly.
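The commit rule just described can be sketched in Python as follows. This is a simplified model under assumptions: `Entry` and `committable` are hypothetical names, the console (the "submission control desk" of the text) is modelled as a program-ordered list of uncommitted entries, and the commit condition is reduced to "result written back".

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    seq: int            # program-order sequence number
    marked: bool        # conditional or exception-capable instruction
    written_back: bool  # result already sits in a physical register


def committable(console: List[Entry]) -> List[int]:
    """Return the sequence numbers that may commit right now.

    `console` holds the uncommitted entries in program order.
    """
    ready = []
    for position, entry in enumerate(console):
        if entry.marked:
            # A marked instruction commits only when it is the oldest
            # uncommitted entry, and nothing behind it may pass it.
            if position == 0 and entry.written_back:
                ready.append(entry.seq)
            break
        if entry.written_back:
            ready.append(entry.seq)
    return ready


# Instruction 1 (a long divide) has not written back; 2 and 3 have.
console = [
    Entry(1, False, False),
    Entry(2, False, True),
    Entry(3, False, True),
    Entry(4, False, False),
    Entry(5, True, True),    # conditional instruction acts as a barrier
    Entry(6, False, True),
]
print(committable(console))        # [2, 3]
print(committable(console[4:]))    # [5] once instructions 1 to 4 have committed
```

Stopping the scan at the first marked entry is what keeps the out-of-order commit local: results behind an unresolved conditional instruction are never made permanent.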
Description of drawings:
Fig. 1 is the execution diagram of the Tomasulo-based method when running pipeline program 1.
Fig. 2 is the execution diagram of the invention when running pipeline program 1.
Fig. 3 is a diagram of the internal structure of the digital signal processor provided by the invention.
Fig. 4 is a block diagram of the execution unit in the digital signal processor provided by the invention.
Fig. 5 is the execution diagram of the Tomasulo-based method when running pipeline program 2 in the embodiment.
Fig. 6 is the execution diagram of the invention when running pipeline program 2 in the embodiment.
Embodiment:
The invention is further described below through an embodiment, with reference to the accompanying drawings.
As shown in Fig. 3, in the digital signal processor provided by the invention, the program control unit is connected to the decoding unit and to the commit console; the decoding unit is closely connected with all four other parts; the execution unit is connected to the register file and to the decoding unit; and the commit console can influence the program flow control unit and the register file.
In operation, the program flow unit fetches instructions from the instruction memory and analyzes them; for a conditional-execution instruction it performs branch prediction and sends the prediction result together with the instruction to the decoding unit. The decoding unit takes the instruction supplied by the program control unit, analyzes it, and requests the required resources from the functional units of the execution unit, from the register file and from the commit console. If the remaining resources satisfy the instruction's needs, it sends the instruction to the corresponding functional unit of the execution unit and at the same time sends the instruction information to the commit console. The content written to the commit console should include the conditional-execution instruction mark (the most important execution condition of the invention), the condition, the predicted outcome, the numbers of the registers to be committed, and so on. During decoding, the instruction requests from the register file the registers it needs, which includes reading its source registers and marking the registers the instruction will modify, and this information is written to the functional unit. If a conditional-execution instruction is to modify a register, a further special mark is required.
As shown in Fig. 4, the execution unit comprises functional units such as a multiply/divide unit, an ALU and a memory unit, and several of them may execute simultaneously. An instruction held in a functional unit is actually executed only once its execution condition is met (here, that the operands it needs are ready). A functional unit can buffer several instructions of the same type; whichever of them meets the execution condition may execute first, regardless of the order in which they entered the functional unit, which realizes dynamic execution.
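The selection inside one functional unit can be illustrated with a short Python sketch. The names `Buffered` and `pick_ready` are hypothetical, operand readiness is modelled as a set of register names, and choosing the oldest ready instruction is an assumed tie-break policy; the text only requires that a ready instruction may run ahead of an earlier one that is still waiting.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Buffered(NamedTuple):
    seq: int                   # order of arrival in the functional unit
    sources: Tuple[str, ...]   # operand registers the instruction reads


def pick_ready(buffer: List[Buffered], ready_regs: Set[str]) -> Optional[Buffered]:
    """Return the first buffered instruction whose operands are all ready."""
    for instr in buffer:
        if all(reg in ready_regs for reg in instr.sources):
            return instr
    return None  # nothing can start this cycle


buffer = [
    Buffered(1, ("AR1", "AR2")),  # arrived first, AR2 is not ready yet
    Buffered(2, ("AR3",)),        # arrived later, operand already available
]
chosen = pick_ready(buffer, ready_regs={"AR1", "AR3"})
print(chosen.seq)  # 2: the later instruction executes first
```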
After a functional unit produces an instruction's result, the result is written back to a physical register to await commit. If the commit console finds that an instruction whose content is to be committed has no conditionally predicted instruction before it, the instruction is allowed to commit. Local out-of-order commit is realized by exploiting the fact that the instructions before the first conditionally predicted instruction can commit out of order.
A concrete illustration follows.
As shown in Fig. 2, take program 1 as the example (its listing is given in the Background section). With the invention, the execution time of each individual instruction is unchanged: instruction 1 still executes for 8 cycles. Instruction 2 is issued in instruction cycle 2 and writes back in instruction cycle 5; because no conditionally predicted instruction precedes instruction 2, it commits in the next beat (instruction cycle 6) and at the same time releases the space of two physical registers. Instructions 3 and 4 behave in essentially the same way. Given sufficient resources, instruction 5 would be issued in instruction cycle 5, but because the processor's register resources are limited it must wait until another instruction releases registers. With the method of Fig. 1, which combines Tomasulo-based dynamic scheduling with speculative execution, no instruction releases registers until instruction cycle 12, so instruction 5 cannot be issued until instruction cycle 13. With the invention, instruction 2 has already committed and released its registers in instruction cycle 6, so instruction 5 can be issued in instruction cycle 7. With the invention, program 1 needs only 12 instruction cycles to complete, which is much faster than the method that combines Tomasulo-based dynamic scheduling with speculative execution.
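The register pressure behind the stall of instruction 5 can be sketched in Python. The class `RegisterPool` and the constant `REGS_PER_INSTRUCTION` are hypothetical; the sketch assumes 8 free registers, as in the Background example, and that every instruction holds two registers until it commits (the text states this only for instruction 2).

```python
class RegisterPool:
    """Counts free registers; allocation fails instead of blocking."""

    def __init__(self, free: int) -> None:
        self.free = free

    def try_allocate(self, count: int) -> bool:
        """Reserve `count` registers at issue; False means the issue stalls."""
        if count > self.free:
            return False
        self.free -= count
        return True

    def release(self, count: int) -> None:
        """Return registers to the pool when an instruction commits."""
        self.free += count


REGS_PER_INSTRUCTION = 2  # assumption, see the lead-in above

pool = RegisterPool(free=8)
issued = [pool.try_allocate(REGS_PER_INSTRUCTION) for _ in range(5)]
print(issued)  # [True, True, True, True, False]: instruction 5 stalls

pool.release(REGS_PER_INSTRUCTION)  # instruction 2 commits early
retry = pool.try_allocate(REGS_PER_INSTRUCTION)
print(retry)  # True: instruction 5 can now issue
```

The earlier an instruction commits, the earlier `release` runs, which is why early commit of instruction 2 shortens the stall of instruction 5.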
Of course, because program 1 contains no conditionally predicted instruction, it shows only the advantage of dynamic commit and cannot show its local character. Program 2 is therefore described in detail below.
Program 2 comprises 9 instructions, as follows:
DIV AR1, AR2      ; Instruction 1
MOV *AR3+, AR4    ; Instruction 2
MOV *AR3+, AR5    ; Instruction 3
MOV *AR3+, AR6    ; Instruction 4
BCC L1, AR6==#0   ; Instruction 5 (condition assumed false)
DIV AR1, AR2      ; Instruction 6
MOV *AR3+, AR4    ; Instruction 7
MOV *AR3+, AR5    ; Instruction 8
MOV *AR3+, AR6    ; Instruction 9
As Figs. 5 and 6 show, the method that combines Tomasulo-based dynamic scheduling with speculative execution takes 26 instruction cycles, whereas the method of the invention takes only 17 instruction cycles, so the advantage remains clear.
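The cycle counts quoted for the two programs can be compared with a few lines of Python. The helper `compare` is a hypothetical name; the inputs are the totals stated in the text (17 versus 12 cycles for program 1, 26 versus 17 for program 2), and nothing is read from the figures.

```python
def compare(baseline_cycles: int, proposed_cycles: int):
    """Return (cycles saved, speedup ratio rounded to two decimals)."""
    saved = baseline_cycles - proposed_cycles
    speedup = round(baseline_cycles / proposed_cycles, 2)
    return saved, speedup


program_1 = compare(17, 12)
program_2 = compare(26, 17)
print(program_1)  # (5, 1.42)
print(program_2)  # (9, 1.53)
```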
The reason for the gain in program 2 is as follows. Instructions 1, 2, 3 and 4 are decoded by the normal flow. Because instruction 5 is a conditionally predicted instruction, its mark and its instruction type are written to the commit console at decode time. The commit console continually checks whether the instructions it holds can commit. It finds that instruction 2 can commit in instruction cycle 6 and that no conditional instruction precedes it, so instruction 2 commits in instruction cycle 6. In the same way, instructions 3, 4 and 1 also commit out of order. Although the result of instruction 5 is written back before that of instruction 1, instruction 5 is a conditionally predicted instruction; to guarantee correct execution it may commit only after all instructions before it have committed, so instruction 5 must wait until instruction 1 has committed before committing its result. Likewise, instructions 6, 7, 8 and 9 can commit out of order only after instruction 5 has committed. Out-of-order commit is thus bounded by conditionally predicted instructions, which realizes local out-of-order commit.
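The boundaries described for program 2 can be made explicit with a small Python sketch that splits a program into local commit windows. The function `commit_windows` is a hypothetical name, and treating only BCC as a marked instruction is an assumption taken from the example; instructions within one window may commit in any order, while the windows themselves commit in program order.

```python
from typing import FrozenSet, List, Tuple


def commit_windows(
    program: List[Tuple[int, str]],
    marked_opcodes: FrozenSet[str] = frozenset({"BCC"}),
) -> List[List[int]]:
    """Split (sequence number, opcode) pairs at each marked instruction."""
    windows: List[List[int]] = []
    current: List[int] = []
    for seq, opcode in program:
        if opcode in marked_opcodes:
            if current:
                windows.append(current)
            windows.append([seq])  # the marked instruction commits alone
            current = []
        else:
            current.append(seq)
    if current:
        windows.append(current)
    return windows


PROGRAM_2 = [
    (1, "DIV"), (2, "MOV"), (3, "MOV"), (4, "MOV"),
    (5, "BCC"),
    (6, "DIV"), (7, "MOV"), (8, "MOV"), (9, "MOV"),
]
windows = commit_windows(PROGRAM_2)
print(windows)  # [[1, 2, 3, 4], [5], [6, 7, 8, 9]]
```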
The above is only one embodiment based on the principle of the invention. A person skilled in the art can, without creative effort, make many variations on the basis of this embodiment that equally achieve the object of the invention, and such variations clearly fall within the scope of protection of the claims of the invention.