CN101866280B - Microprocessor and execution method thereof

Info

Publication number: CN101866280B
Application number: CN201010185596.9A
Authority: CN
Inventors: 杰拉德·M·卡尔; 罗德尼·E·虎克; 布莱恩·W·伯格
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2009-05-29
Filing date: 2010-05-19
Publication date: 2014-10-29
Anticipated expiration: 2030-05-19
Also published as: CN101866280A

Abstract

A pipelined microprocessor with out-of-order execution and in-order retirement, and an execution method thereof, are provided. The microprocessor includes a branch predictor that predicts a target address of a branch instruction, a fetch unit that fetches instructions at the predicted target address, and an execution unit that: resolves a target address of the branch instruction and detects that the predicted and resolved target addresses are different; in response to detecting that the predicted and resolved target addresses are different, determines whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction; if there is no such instruction, executes the branch instruction by flushing the instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address; and otherwise, refrains from executing the branch instruction.

Description

Microprocessor and execution method thereof

Technical field

The present invention relates to the field of out-of-order execution microprocessors, and in particular to the execution of branch instructions in such microprocessors.

Background art

Superscalar microprocessors have multiple execution units that execute the instruction set of the microprocessor. A superscalar microprocessor improves performance by means of its multiple execution units, which allow it to execute multiple instructions in parallel in each clock cycle. The key to realizing this potential performance gain is to keep the execution units continuously supplied with instructions to execute; otherwise, the performance of the superscalar microprocessor is no better than that of a scalar microprocessor, yet it incurs a greater hardware cost. The execution units, for example, load and store instruction operands, calculate addresses, perform logical and arithmetic operations, and resolve branch instructions. The greater the number and variety of execution units, the farther into the program instruction stream the microprocessor must look in each clock cycle to find an instruction for each execution unit. This is commonly referred to as the lookahead capability of the microprocessor.

One way to improve the lookahead capability of a microprocessor is to allow instructions to execute out of program order; such a microprocessor is commonly called an out-of-order execution microprocessor. Although instructions may execute out of order, most microprocessor architectures still require instructions to be retired in program order. In other words, the architectural state of the microprocessor that is affected by an instruction's result may only be updated in program order.

Microprocessors that execute out of order and retire in order generally include a relatively large number of pipeline stages, which is sometimes called super-pipelining. One reason a microprocessor has so many pipeline stages is this: if the instruction set architecture allows variable-length instructions, a considerable number of pipeline stages are usually needed at the front end of the pipeline to parse the stream of undifferentiated instruction bytes and to translate the parsed instructions into microinstructions.

Although the use of a branch predictor is known in the field of microprocessor design to help performance, it is also well known that branch instructions adversely affect performance in a super-pipelined microprocessor. Specifically, the more pipeline stages there are between the stage that fetches instructions according to the predicted branch target address supplied by the branch predictor and the stage that causes the fetcher to begin fetching at a resolved target address that differs from the predicted branch target address, the greater the penalty associated with a branch misprediction.

Therefore, an efficient execution method is needed for executing branch instructions in a microprocessor that executes out of order and retires in order.

Summary of the invention

One embodiment of the invention provides a pipelined microprocessor that executes out of order and retires in order, comprising a branch predictor, a fetch unit, and at least one execution unit. The branch predictor predicts a target address of a branch instruction. The fetch unit is coupled to the branch predictor and fetches instructions at the predicted target address. The execution unit is coupled to the fetch unit and is configured to: resolve the target address of the branch instruction and detect whether the predicted target address differs from the resolved target address; when the predicted target address differs from the resolved target address, determine whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction; if there is no such instruction, execute the branch instruction by flushing the mispredicted instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address; and if there is such an instruction, refrain from executing the branch instruction.

Another embodiment of the present invention provides an execution method for executing a branch instruction in a pipelined microprocessor that executes out of order and retires in order, comprising: predicting that the branch instruction will resolve to a first fetch path, and fetching instructions from the first fetch path according to that prediction; after the predicting and fetching, resolving the branch instruction to a second fetch path that is different from the first fetch path; determining whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction; if there is no such instruction, executing the branch instruction by flushing the mispredicted instructions fetched from the first fetch path and fetching instructions from the second fetch path instead; and if there is such an instruction, refraining from executing the branch instruction.

Brief description of the drawings

Fig. 1 is a block diagram of a microprocessor according to the present invention;

Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1, illustrating its selective out-of-order execution of branch instructions.

[Description of main reference numerals]

100～microprocessor; 102～instruction cache;

124～instruction formatter; 126～formatted instruction queue;

104～instruction translator; 128～translated instruction queue;

106～register alias table; 108～reservation station;

166～load unit; 164～execution logic unit;

162～branch checking logic unit;

158～control logic unit; 152, 154, 156～registers;

112～execution unit; 114～retire unit;

118～branch predictor; 116～reorder buffer;

122～fetch unit; 172～adder;

170, 174～fetch addresses; 176～predicted target address; 138～replay command;

136～correct branch address; 134～branch correct signal;

178～instruction; 146～missing load tag;

144～integer replay tag;

142～mispredicted branch tag;

132～flush signal.

Detailed description

Fig. 1 is a block diagram of a microprocessor 100 according to the present invention. The microprocessor 100 includes a pipeline made up of multiple stages, or functional units. The pipeline includes an instruction fetch unit 122, an instruction cache 102, an instruction formatter 124, a formatted instruction queue 126, an instruction translator 104, a translated instruction queue 128, a register alias table (RAT) 106, a reservation station 108, an execution unit 112, and a retire unit 114. The microprocessor 100 also includes a branch predictor 118 coupled to the fetch unit 122, and a reorder buffer (ROB) 116 coupled to the register alias table 106, the reservation station 108, the execution unit 112, and the retire unit 114.

The execution unit 112 includes a load unit 166, an execution logic unit 164, and a branch checking logic unit 162, each of which is coupled to a control logic unit 158. The execution unit 112 also includes a register 156, a register 154, and a register 152. Register 156 stores the reorder buffer tag (ROB tag) of the oldest missing load instruction; register 154 stores the ROB tag of the oldest replaying integer instruction; and register 152 stores the ROB tag of the oldest mispredicted branch instruction. Each of these registers is coupled to the control logic unit 158. The control logic unit 158 generates a correct branch address 136 to the fetch unit 122. The control logic unit 158 also generates a branch correct signal 134 to the fetch unit 122, the instruction formatter 124, the formatted instruction queue 126, the instruction translator 104, the translated instruction queue 128, and the register alias table 106. In one embodiment, all the elements of the execution unit 112 other than the load unit 166 are included in a single execution unit (namely, one of multiple integer units), and the load unit 166 is an execution unit 112 separate from the integer units.

In one embodiment, the microprocessor 100 is an x86 architecture microprocessor. A microprocessor is an x86 architecture microprocessor if it can correctly execute the majority of application programs designed for x86 architecture microprocessors. An application program is correctly executed if its expected results are obtained. The microprocessor 100 executes instructions out of order and retires them in order. Thus, even though the execution unit 112 may generate instruction results out of program order, the retire unit 114 waits until an instruction is the oldest completed instruction in the microprocessor 100 before updating the architectural state of the microprocessor 100 with the result of that instruction.

The fetch unit 122 provides a fetch address 170 to the instruction cache 102 to specify the next address from which instructions are fetched from the instruction cache 102. An adder 172 increments the fetch address 170 to generate the next sequential fetch address 174, which is provided to the fetch unit 122. The fetch unit 122 also receives a predicted target address 176 from the branch predictor 118 and the correct branch address 136 from the execution unit 112. As described below, the fetch unit 122 selects one of these addresses as the fetch address 170 provided to the instruction cache 102.

If the control logic unit 158 asserts the branch correct signal 134, the fetch unit 122 selects the correct branch address 136; otherwise, if the branch predictor 118 predicts that the branch is taken, the fetch unit 122 selects the predicted target address 176; otherwise, the fetch unit 122 selects the next sequential fetch address 174. The predicted branch address of a branch instruction is carried along with the branch instruction as it proceeds down the pipeline. If the branch predictor 118 predicts that the branch is not taken, the predicted branch address is the next sequential fetch address 174; if the branch predictor 118 predicts that the branch is taken, the predicted branch address is the predicted target address 176. The branch predictor 118 may mispredict a branch instruction, in which case the microprocessor 100 must correct the misprediction so that the correct instructions are fetched and executed. If the execution unit 112 subsequently corrects the branch (as discussed later), the predicted branch address becomes the correct branch address 136. The predicted branch address is provided to the execution unit 112 along with the branch instruction and other information about the instruction 178.
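The selection priority described above can be sketched as a small multiplexer function. This is a minimal illustration only: the function name, the parameter names, and the use of `None` to mean "signal not asserted" or "branch not predicted taken" are assumptions made for the sketch and do not come from the patent.

```python
from typing import Optional


def select_fetch_address(sequential: int,
                         predicted_target: Optional[int],
                         correct_branch: Optional[int]) -> int:
    """Choose the next fetch address 170 (hypothetical helper).

    sequential       -- next sequential fetch address 174
    predicted_target -- predicted target address 176, or None when the
                        branch predictor does not predict a taken branch
                        (this encoding is an assumption)
    correct_branch   -- correct branch address 136, or None when the
                        branch correct signal 134 is not asserted
                        (this encoding is an assumption)
    """
    # Highest priority: a correction coming from the execution unit.
    if correct_branch is not None:
        return correct_branch
    # Next: a taken-branch prediction from the branch predictor.
    if predicted_target is not None:
        return predicted_target
    # Default: keep fetching sequentially.
    return sequential
```

A correction from the execution unit overrides a simultaneous prediction, which matches the order in which the three cases are listed in the text.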

The scenario in which correcting a mispredicted branch instruction is beneficial in the microprocessor 100 can be described by the following steps.

Step 1: The execution unit 112 resolves the branch instruction. That is, the execution unit 112 receives the input operands needed to resolve the branch instruction and determines the branch direction and branch address from them. Specifically, the execution unit 112 examines the condition codes specified by the branch instruction to determine whether they satisfy the branch condition specified by the branch instruction, that is, whether the branch is taken or not taken, and also calculates the target address of the branch instruction from the source operands specified by the branch instruction. After resolving the branch direction and target address, the execution unit 112 determines that the branch predictor 118 mispredicted the branch, whether the branch predictor 118 predicted the wrong direction (taken or not taken), the wrong branch target address, or both. To simplify the description, assume in the following that the branch predictor 118 predicted path A and the fetch unit 122 fetched from path A, but path B is the correct path.

Step 2: The execution unit 112 then executes the branch instruction. That is, the execution unit 112: (1) tells the register alias table 106 to stop dispatching instructions; (2) flushes the front end of the pipeline; and (3) tells the fetch unit 122 to begin fetching from correct path B, which is indicated by the correct branch address 136 supplied by the execution unit 112. The register alias table 106 is the last stage of the pipeline that receives instructions in program order. The front end of the pipeline is the portion before the register alias table 106. In many cases, the microprocessor 100 contains many instructions older than the mispredicted branch instruction, and these must retire in program order before the mispredicted branch instruction becomes the oldest instruction and is retired. Therefore, while the branch instruction waits to become the oldest instruction, the microprocessor 100 fetches and processes good instructions (that is, instructions fetched from correct path B) and fills the front end of the pipeline with them.

Step 3: Finally, the retire unit 114 retires the branch instruction and flushes the back end of the pipeline, because the back end contains instructions that should not be executed (they were fetched from wrong path A). The back end of the pipeline is the portion after the register alias table 106.

Step 4: The retire unit 114 tells the register alias table 106 to resume dispatching instructions, namely the instructions fetched from correct path B and processed since the execution unit 112 executed the branch instruction in step 2.

Because the microprocessor 100 may begin fetching and processing instructions at the correct branch address 136 in the front end of the pipeline before the retire unit 114 retires the mispredicted branch instruction, it is beneficial to execute the branch instruction out of program order as described in step 2 (that is, to correct the misprediction early). In other words, the instructions fetched from correct path B may be executed N clock cycles sooner than if the execution unit 112 did not execute the branch instruction upon detecting the misprediction but instead simply waited until the branch instruction was ready to retire. The maximum value of N is the number of clock cycles from when the microprocessor 100 begins executing the branch instruction (that is, correcting the misprediction) until the first instruction from the "correct" branch path B reaches the register alias table 106. In one embodiment, this is particularly helpful because the branch penalty is as high as 17 clock cycles. Specifically, in one embodiment, once the fetch address 170 is redirected to the new branch path, it takes 10 clock cycles for the first instruction from the new branch path to reach the register alias table 106. In other words, early correction hides clock cycles of the branch penalty: the microprocessor 100 can hide the clock cycles between the start of the correction and the time the retire unit 114 is ready to retire the mispredicted branch instruction.
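The arithmetic behind N can be illustrated with a short sketch. It reflects one reading of the paragraph above, not a formula stated in the patent: the hidden cycles are taken to be the wait until the branch is ready to retire, capped at the front-end latency (10 clock cycles in the embodiment mentioned). The function and parameter names are hypothetical.

```python
def hidden_penalty_cycles(cycles_until_retire_ready: int,
                          front_end_latency: int = 10) -> int:
    """Estimate the branch-penalty cycles hidden by early correction.

    cycles_until_retire_ready -- clock cycles between the start of the
        correction and the point where the retire unit is ready to retire
        the mispredicted branch (hypothetical input)
    front_end_latency -- clock cycles for the first instruction from the
        redirected path to reach the register alias table; 10 is the
        figure given for one embodiment

    Assumption of this sketch: N = min(wait, front-end latency), since
    the text gives the front-end latency as the maximum value of N.
    """
    if cycles_until_retire_ready < 0 or front_end_latency < 0:
        raise ValueError("cycle counts must be non-negative")
    return min(cycles_until_retire_ready, front_end_latency)
```

Under this reading, a branch that waits 4 cycles before it can retire hides 4 cycles of penalty, while a branch that waits 25 cycles hides the full 10-cycle front-end latency.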

However, executing/correcting a branch instruction out of order is not always beneficial. The following describes a situation in which the microprocessor 100 executes/corrects a branch instruction out of order without any benefit. Specifically, in this situation the branch predictor 118 did in fact predict the branch instruction correctly, but the execution unit 112 resolves the branch instruction incorrectly because it received incorrect input operands (for example, the condition codes and/or the operands used to calculate the target address). The execution unit 112 then wrongly concludes that the branch predictor 118 mispredicted the branch (that is, the execution unit 112 determines that the predicted branch direction and/or branch address does not match the resolved branch direction and/or branch address) and executes/corrects the branch. The execution unit 112 received incorrect input operands because the branch instruction depends, directly or indirectly through a chain of dependencies, on an older instruction for its condition codes and/or source operands, and that older instruction did not provide the correct input operands to the execution unit 112. For example, an older load instruction somewhere in the dependency chain missed in the data cache and supplied stale data. To simplify the description of this non-beneficial scenario, assume in the following that the branch predictor 118 correctly predicted path A and the fetch unit 122 fetched from path A. The steps are as follows:

Step 1: As in step 1 of the beneficial scenario above, the execution unit 112 resolves the branch instruction, in this case to path B.

Step 2: As in step 2 of the beneficial scenario above, the execution unit 112 executes/corrects the branch instruction to path B.

Step 3: After the execution/correction in step 2, an instruction older than the branch instruction executed/corrected in steps 1 and 2 becomes the oldest instruction, and the retire unit 114 begins to replay the oldest instruction in the reorder buffer 116 and all newer instructions, including the executed/corrected branch instruction. Replay means that the retire unit 114 flushes the back end of the pipeline and sends a replay command 138 to the reservation station 108, so that all valid instructions in the reorder buffer 116 are re-dispatched in order from the reorder buffer 116 to the reservation station 108. (If the older replaying instruction is a mispredicted branch instruction, the front end of the pipeline is also flushed when the execution unit 112 executes/corrects the older branch instruction that is being replayed.)

Step 4: When the same branch instruction that was executed/corrected in step 2 is replayed, the execution unit 112 resolves the branch instruction and determines that path A, not path B, is the correct path. This means that the instructions flushed from the front end of the pipeline in step 2 were in fact the correctly fetched instructions. Unfortunately, it also means that the microprocessor 100 must now correct the "correction" made in step 2. (Note that during the replay, the "prediction" seen by the execution unit 112 is path B, the path to which the execution unit 112 corrected in step 2. This "prediction" of path B was not made by the branch predictor 118; rather, it was made by the execution unit 112 when it executed/corrected the branch instruction in step 2.)

Step 5: As in step 2 above, the execution unit 112 executes/corrects the branch according to the resolution made during the replay in step 4. This time, however, the execution unit 112 corrects the path to path A. The execution/correction performed in step 2 was therefore a disadvantage: it caused the microprocessor 100 to flush the instructions that had already been fetched and processed before step 1, when the branch predictor 118 correctly predicted path A, and those same instructions must now be re-fetched and re-processed in the front end of the pipeline.

To summarize the non-beneficial scenario described above: the branch predictor 118 first predicts path A, and the fetch unit 122 fetches instructions from path A. The execution unit 112 then resolves the branch to path B, which is in fact incorrect because the execution unit 112 received incorrect input operands, and executes/corrects the branch in step 2, causing the fetch unit 122 to fetch instructions from (wrong) path B. The branch instruction is subsequently replayed (because an older instruction causes it to be), and during the replay the execution unit 112 resolves the branch to path A, which is the correct path. The execution unit 112 resolves the branch to path A during the replay because this time it receives the correct input operands for the branch instruction: the instruction in the dependency chain that failed to provide a correct result before the first resolution has now produced a correct result, and that result is provided to the execution unit 112 for resolving the branch instruction. In other words, the condition code flags and/or target address calculation operands used during the replay differ from those used when the execution unit 112 first resolved the branch. The execution unit 112 therefore executes/corrects the branch so that the fetch unit 122 fetches instructions from path A.

To address this problem (that is, to retain the advantage of the beneficial scenario while reducing the likelihood of the non-beneficial scenario), the microprocessor 100 generally executes mispredicted branches out of order; however, the microprocessor 100 also attempts to identify the most common situations in which a branch instruction is wrongly resolved as mispredicted (that is, the branch predictor 118 predicted correctly), and in those situations the microprocessor 100 does not execute/correct the resolved "mispredicted" branch out of order. More specifically, the microprocessor 100 attempts to identify the most common situations in which the branch instruction will have to be replayed and will be resolved as correctly predicted during the replay. In one embodiment, these most common situations occur when an instruction older than the branch instruction is going to be replayed, namely:

(1) an older branch instruction was resolved as mispredicted;

(2) an older load instruction missed;

(3) an older integer instruction faulted.

By refraining from executing the resolved "mispredicted" branch, the microprocessor 100 does not need to re-fetch and re-process in the front end of the pipeline the same instructions that were already fetched and processed when the branch was first predicted.

Which situations are taken into account is a design decision. The decision is a tradeoff between, on the one hand, the increased complexity, speed cost, and power consumption incurred by considering a given situation and, on the other hand, the performance lost by not considering it, where the latter is essentially a function of how often the situation occurs and its average penalty in clock cycles.

Referring to Fig. 1, the reorder buffer 116 is organized as a circular queue and has multiple entries, one of which is allocated to each instruction dispatched by the register alias table 106 to the reservation station 108. Each entry in the reorder buffer 116 has an associated index whose value ranges from 0 to n-1, where n is the number of entries in the reorder buffer 116. The register alias table 106 allocates the entries of the reorder buffer 116 to instructions in program order. Therefore, the indexes, or tags, of two instructions in the reorder buffer 116 can be compared to determine which instruction is older in program order.
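Because the reorder buffer is a circular queue, a raw numeric comparison of two tags gives the wrong answer once allocation wraps past index n-1. One common way to handle this, shown below as a sketch, is to measure each tag's distance from the oldest entry. The `head` parameter (the index of the oldest entry) and both function names are assumptions of this illustration; the text states only that tags can be compared, not how the comparison is implemented.

```python
def rob_age(tag: int, head: int, n: int) -> int:
    """Distance of entry `tag` from the oldest entry `head` in a circular
    reorder buffer with n entries (0 means the oldest entry itself)."""
    if not (0 <= tag < n and 0 <= head < n):
        raise ValueError("tag and head must lie in the range 0..n-1")
    return (tag - head) % n


def is_older(tag_a: int, tag_b: int, head: int, n: int) -> bool:
    """True if the instruction at tag_a precedes the one at tag_b in
    program order, given that entries are allocated in program order."""
    return rob_age(tag_a, head, n) < rob_age(tag_b, head, n)
```

For example, with n = 8 and the oldest entry at index 6, entries are allocated in the order 6, 7, 0, 1, so tag 7 is older than tag 1 even though 7 > 1 numerically.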

The microprocessor 100 performs speculative execution of load instructions. That is, the microprocessor 100 assumes that a load instruction always hits in the data cache. Consequently, the reservation station 108 issues instructions that use load data as a source operand to the execution unit 112 without knowing whether the correct load data is available. An instruction (for example, a branch instruction) may therefore receive incorrect data because an older instruction on which it depends, directly or indirectly, used incorrect load data. When the load unit 166 detects that a load instruction has missed in the data cache and must be replayed, the load unit 166 outputs the missing load tag 146 of that load instruction to the control logic unit 158; the missing load tag 146 is a reorder buffer tag. The control logic unit 158 compares the tag in register 156 (the tag of the oldest missing load instruction) with the missing load tag 146. If the missing load tag 146 is older, the control logic unit 158 updates register 156 with the missing load tag 146. In this way, the control logic unit 158 maintains the tag of the oldest missing load instruction in the microprocessor 100.

Similarly, when the execution logic unit 164 detects that an integer instruction needs to be replayed, the execution logic unit 164 outputs the integer replay tag 144 of that integer instruction to the control logic unit 158; the integer replay tag 144 is a reorder buffer tag. The control logic unit 158 compares the tag in register 154 (the tag of the oldest replaying integer instruction) with the integer replay tag 144. If the integer replay tag 144 is older, the control logic unit 158 updates register 154 with the integer replay tag 144. In this way, the control logic unit 158 maintains the tag of the oldest replaying integer instruction in the microprocessor 100.

Furthermore, when a branch instruction is resolved and the branch checking logic unit 162 detects that it was mispredicted, the branch checking logic unit 162 outputs the mispredicted branch tag 142 of that branch instruction to the control logic unit 158; the mispredicted branch tag 142 is a reorder buffer tag. The control logic unit 158 compares the tag in register 152 (the tag of the oldest mispredicted branch instruction) with the mispredicted branch tag 142. If the mispredicted branch tag 142 is older, the control logic unit 158 updates register 152 with the mispredicted branch tag 142. In this way, the control logic unit 158 maintains the tag of the oldest mispredicted branch instruction in the microprocessor 100.
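The three registers follow the same compare-and-update rule, which can be sketched with one small class. The class name, the use of `None` for an empty register, and the `rob_head` argument used for wrap-aware age comparison are all assumptions of this sketch; the text does not describe how a register is initialized or cleared.

```python
from typing import Optional


class OldestTagRegister:
    """Tracks the ROB tag of the oldest instruction reported so far,
    in the manner described for registers 152, 154, and 156."""

    def __init__(self, rob_size: int) -> None:
        self.rob_size = rob_size
        # None means nothing has been reported yet (assumed encoding).
        self.tag: Optional[int] = None

    def _age(self, tag: int, rob_head: int) -> int:
        # Distance from the oldest ROB entry; handles circular wrap-around.
        return (tag - rob_head) % self.rob_size

    def report(self, new_tag: int, rob_head: int) -> None:
        """Replace the stored tag only if new_tag is older than it."""
        if self.tag is None or (
            self._age(new_tag, rob_head) < self._age(self.tag, rob_head)
        ):
            self.tag = new_tag
```

A usage example: with an 8-entry reorder buffer whose oldest entry is at index 6, reporting tags 1, then 7, then 0 leaves tag 7 in the register, because 7 was allocated before both 0 and 1.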

Fig. 2 is a flowchart of the operation of the microprocessor 100 of Fig. 1 according to the present invention, illustrating how the microprocessor 100 selectively executes branch instructions out of order. Flow begins at step 202.

At step 202, the execution unit 112 resolves a branch instruction and determines that it was mispredicted. Flow proceeds to decision step 204.

At decision step 204, the execution unit 112 compares the mispredicted branch tag 142 with the tag in register 152 (the tag of the oldest mispredicted branch instruction) to determine whether there is an older, unretired branch instruction that was mispredicted and therefore needs to be corrected. If so, flow proceeds to step 206; otherwise, flow proceeds to decision step 208.

At step 206, the execution unit 112 refrains from correcting, that is, executing out of order, the newer branch instruction that was resolved as mispredicted at step 202. That newer branch instruction will never retire, because the older mispredicted branch instruction will cause it to be flushed before it has a chance to become the oldest mispredicted branch instruction. Advantageously, by refraining from correcting/executing out of order the branch instruction resolved as mispredicted at step 202, the microprocessor 100 can avoid the drawbacks of the unfavorable scenario described above. In other words, if the branch predictor 118 turns out to have correctly predicted the path of the branch instruction, the instructions fetched from the correctly predicted path need not be re-fetched and re-processed by the front end of the pipeline. Flow ends at step 206.

At decision step 208, the execution unit 112 compares the tag in register 156 (the tag of the oldest missing load instruction) with the mispredicted branch tag 146 to determine whether a load instruction older than the mispredicted branch has missed. If so, flow proceeds to step 212; otherwise, flow proceeds to decision step 214.

At step 212, the execution unit 112 refrains from correcting/executing out of order the branch instruction that was resolved as mispredicted at step 202. That branch instruction will not retire, because before it has a chance to become the oldest instruction, the missing load instruction, upon becoming the oldest instruction in the machine, will cause the mispredicted branch instruction to be replayed. Advantageously, by refraining from correcting/executing out of order the branch instruction resolved as mispredicted at step 202, the microprocessor 100 can avoid the drawbacks of the unfavorable scenario described above. Flow ends at step 212.

At decision step 214, the execution unit 112 compares the tag in register 154 (the tag of the oldest replayed integer instruction) with the tag 144 of the mispredicted branch to determine whether an integer instruction that is marked for replay is older than the mispredicted branch instruction. If so, flow proceeds to step 216; otherwise, flow proceeds to step 218.

At step 216, the execution unit 112 refrains from correcting/executing out of order the branch instruction that was resolved as mispredicted at step 202. That branch instruction will not retire, because before it has a chance to become the oldest instruction, the replayed integer instruction, upon becoming the oldest instruction in the machine, will cause the mispredicted branch instruction to be replayed. Advantageously, by refraining from correcting/executing out of order the branch instruction resolved as mispredicted at step 202, the microprocessor 100 can avoid the drawbacks of the unfavorable scenario described above. Flow ends at step 216.

At step 218, the control logic 158 of the execution unit 112 provides the correct branch address 136 to the fetch unit 122. The control logic 158 also asserts the branch correction signal 134, which causes the fetch unit 122 to select the correct branch address 136 as the next fetch address 170 and causes the front end of the pipeline to correct for the branch instruction resolved as mispredicted at step 202. In other words, by asserting the branch correction signal 134, the control logic 158 causes the mispredicted branch instruction to be executed out of order, thereby realizing the advantages of the favorable scenario described above. Flow proceeds to step 222.
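The three checks at decision steps 204, 208 and 214 can be summarized as a single predicate: the mispredicted branch is corrected early (step 218) only if no older unretired mispredicted branch, missing load, or replayed integer instruction is pending. The following is a minimal sketch, not the patented circuit. It assumes tag age is measured as distance from the head of a circular reorder buffer, which the description does not specify; `ROB_SIZE`, `is_older` and `should_correct_early` are hypothetical names.

```python
from typing import Optional

ROB_SIZE = 48  # hypothetical reorder buffer depth


def is_older(a: int, b: int, head: int, rob_size: int = ROB_SIZE) -> bool:
    """True if ROB tag `a` is strictly older than ROB tag `b` (assumed circular ROB)."""
    return (a - head) % rob_size < (b - head) % rob_size


def should_correct_early(
    branch_tag: int,
    head: int,
    oldest_mispredicted_branch: Optional[int],  # contents of register 152
    oldest_missing_load: Optional[int],         # contents of register 156
    oldest_replayed_integer: Optional[int],     # contents of register 154
) -> bool:
    """Return True for step 218 (correct now), False for steps 206/212/216 (refrain)."""
    pending = (
        oldest_mispredicted_branch,  # decision step 204
        oldest_missing_load,         # decision step 208
        oldest_replayed_integer,     # decision step 214
    )
    # Any older pending event means this branch would be flushed or replayed anyway.
    return not any(
        tag is not None and is_older(tag, branch_tag, head) for tag in pending
    )
```

The comparison is strict, so a register that already holds the branch's own tag does not block the correction.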

At step 222, the register alias table 106 stops dispatching instructions in response to the asserted branch correction signal 134. Flow proceeds to step 224.

At step 224, the portion of the pipeline before the register alias table 106 flushes (or invalidates) all of its instructions in response to the asserted branch correction signal 134, and begins fetching and processing instructions at the correct branch address 136. Flow proceeds to step 226.

At step 226, the retire unit 114 determines that the mispredicted branch instruction is now ready to retire (that is, the mispredicted branch instruction is the oldest instruction in the machine) and therefore asserts the clear signal 132 to flush all instructions after the register alias table 106; in other words, the retire unit 114 flushes all instructions newer than the mispredicted branch instruction. The asserted clear signal 132 is also provided to the register alias table 106 to notify it of the event. Flow proceeds to step 228.

At step 228, the register alias table 106 resumes dispatching instructions in response to the asserted clear signal 132. Flow ends at step 228.
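The ordering of steps 218 through 228 can be traced with a toy model. This is a sketch under simplifying assumptions, not a description of the actual hardware: the front end is reduced to a list of fetch addresses, the back end to a list of instruction names ordered oldest first, and one instruction is fetched per address. `PipelineModel` and its fields and methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineModel:
    next_fetch_address: int
    dispatching: bool = True
    # Addresses of instructions held before the register alias table (front end).
    front_end: List[int] = field(default_factory=list)
    # Unretired instructions after the register alias table, oldest first.
    back_end: List[str] = field(default_factory=list)

    def fetch(self) -> None:
        # Hypothetical simplification: one instruction per address.
        self.front_end.append(self.next_fetch_address)
        self.next_fetch_address += 1

    def correct_branch(self, correct_address: int) -> None:
        self.next_fetch_address = correct_address  # step 218: redirect fetch
        self.dispatching = False                   # step 222: stop dispatching
        self.front_end.clear()                     # step 224: flush the front end

    def retire_mispredicted_branch(self, branch: str) -> None:
        if not self.back_end or self.back_end[0] != branch:
            raise ValueError("branch is not the oldest unretired instruction")
        self.back_end.clear()                      # step 226: branch retires, newer instructions flushed
        self.dispatching = True                    # step 228: resume dispatching
```

In this model, instructions fetched from the correct address between the early correction and the branch's retirement remain in the front end when dispatch resumes, which is the benefit the flow is meant to capture.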

Although the present invention has been disclosed above through several embodiments, they are presented only by way of example and not to limit the invention. Those skilled in the art will understand that various changes may be made to the invention without departing from its spirit. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods of the present invention. This can be accomplished with various programming languages, for example general programming languages (such as C and C++), hardware description languages (HDL, such as Verilog HDL and VHDL), or other available programming languages. Such software can be disposed in any known computer usable medium, for example a semiconductor, a magnetic disk, or an optical disc (such as CD-ROM or DVD-ROM). The apparatus and methods of the present invention may be included in a semiconductor intellectual property core (semiconductor IP core), for example a microcontroller core embodied in HDL, and transformed into hardware in the production of integrated circuits. In addition, the present invention may be implemented as a combination of hardware and software. Therefore, the present invention should not be limited by any embodiment described herein, but should be defined in accordance with the appended claims and their apparatus/method equivalents. Specifically, the present invention may be implemented within a microprocessor device of a general-purpose computer. Finally, those skilled in the art will understand that they can use the disclosed concepts and specific embodiments as a basis for designing or modifying other structures to carry out the same purposes as the present invention, without departing from the scope of the invention as defined by the claims.

Claims

1. A microprocessor for pipelined out-of-order execution and in-order retirement of instructions, comprising:

a cache memory;

a branch predictor, configured to predict a predicted target address of a branch instruction;

a fetch unit, coupled to the branch predictor, configured to fetch instructions at the predicted target address; and

an execution unit, coupled to the fetch unit, configured to:

resolve the target address of the branch instruction, and detect whether the predicted target address differs from the resolved target address;

when the predicted target address differs from the resolved target address, determine whether there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory;

if there is no unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory, flush the instructions fetched at the predicted target address and cause the fetch unit to fetch instructions at the resolved target address, thereby executing the branch instruction; and

if there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory, refrain from executing the branch instruction,

wherein the microprocessor further comprises a pipeline having a front end, wherein the front end includes the cache memory, the branch predictor, and the fetch unit and does not include the execution unit,

wherein the microprocessor is configured to:

if there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory, postpone retirement of the branch instruction resolved as mispredicted; and

replay the load instruction, the branch instruction, and the instructions fetched at the predicted target address, without re-fetching them from the cache memory and without re-processing them in the front end of the pipeline.

2. The microprocessor of claim 1, further comprising a storage element configured to hold the reorder buffer tag of the oldest unretired load instruction in program order that missed in the cache memory, wherein the execution unit compares the reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired load instruction in program order that missed in the cache memory, to determine whether there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory.

3. The microprocessor of claim 1, further comprising:

a register alias table, configured to receive a plurality of program instructions in program order and to dispatch the program instructions to a plurality of the execution units of the microprocessor for out-of-order execution; and

a first plurality of pipeline stages, included in the front end of the pipeline and located before the register alias table, the first plurality of pipeline stages including the branch predictor and the fetch unit,

wherein the execution unit executes the branch instruction by providing the resolved target address to the fetch unit and asserting a signal, the register alias table stops dispatching instructions in response to the signal, the first plurality of pipeline stages flush all instructions therein in response to the signal, and the fetch unit begins fetching instructions at the resolved target address in response to the signal.

4. The microprocessor of claim 3, further comprising:

a retire unit, configured to retire the program instructions in program order; and

a second plurality of pipeline stages, located after the register alias table and including the plurality of execution units and the retire unit,

wherein when the retire unit determines that the branch instruction is the oldest unretired instruction in the microprocessor, the retire unit causes all program instructions in the second plurality of pipeline stages to be flushed and, after all program instructions in the second plurality of pipeline stages have been flushed, causes the register alias table to begin dispatching program instructions to the plurality of execution units.

5. An execution method for executing a branch instruction in a pipelined out-of-order execution, in-order retirement microprocessor, the pipeline stages of the microprocessor comprising a cache memory, a branch predictor, a fetch unit, and an execution unit, the method comprising:

predicting, by the branch predictor, that the branch instruction will resolve to a first fetch path, and fetching, by the fetch unit, instructions from the first fetch path in response to the prediction;

after the predicting and fetching, performing the following steps by the execution unit:

resolving the branch instruction to a second fetch path, the second fetch path being different from the first fetch path;

determining whether there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory;

if there is no unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory, flushing the instructions fetched from the first fetch path and fetching instructions from the second fetch path instead, thereby executing the branch instruction; and

if there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory:

refraining from executing the branch instruction;

postponing retirement of the branch instruction resolved as mispredicted; and

replaying the load instruction, the branch instruction, and the instructions fetched from the first fetch path, without re-fetching them from the cache memory and without re-processing them in the front end of the pipeline, wherein the front end includes the cache memory, the branch predictor, and the fetch unit and does not include the execution unit.

6. The execution method of claim 5, further comprising:

storing the reorder buffer tag of the oldest unretired load instruction in program order that missed in the cache memory, wherein the step of determining whether there is an unretired load instruction that is older in program order than the branch instruction and that missed in the cache memory comprises comparing the reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired load instruction in program order that missed in the cache memory.

7. The execution method of claim 5, wherein the step of executing the branch instruction further comprises:

stopping the dispatching of instructions; and

flushing all instructions in the pipeline stages of the microprocessor that precede instruction dispatch.

8. The execution method of claim 7, further comprising:

after flushing all instructions in the pipeline stages of the microprocessor that precede instruction dispatch, determining whether the branch instruction is the oldest unretired instruction in the microprocessor;

when the branch instruction is the oldest unretired instruction in the microprocessor, flushing all instructions in the pipeline stages of the microprocessor that follow instruction dispatch; and

after all instructions in the pipeline stages of the microprocessor that follow instruction dispatch have been flushed, resuming the dispatching of instructions.