CN1581070A

CN1581070A - Device and method for selectively overriding return stack prediction in response to detection of a non-standard return sequence

Info

Publication number: CN1581070A
Application number: CN 200410079837
Authority: CN
Inventors: G. Glenn Henry; Thomas McDonald
Original assignee: INTELLIGENCE FIRST CO
Current assignee: INTELLIGENCE FIRST CO; IP First LLC
Priority date: 2003-10-06
Filing date: 2004-09-23
Publication date: 2005-02-16
Anticipated expiration: 2024-09-23
Also published as: TW200513961A; TWI281121B; CN1291311C

Abstract

A microprocessor for predicting a target address of a return instruction is disclosed. The microprocessor includes a BTAC and a return stack that each makes a prediction of the target address. Typically the return stack is more accurate. However, if the return stack mispredicts, update logic sets an override flag associated with the return instruction in the BTAC. The next time the return instruction is encountered, if the override flag is set, branch control logic branches the microprocessor to the BTAC prediction. Otherwise, the microprocessor branches to the return stack prediction. If the BTAC mispredicts, then the update logic clears the override flag. In one embodiment, the return stack predicts in response to decode of the return instruction. In another embodiment, the return stack predicts in response to the BTAC predicting the return instruction is present in an instruction cache line. Another embodiment includes a second, BTAC-based return stack.

Description

Device and method for selectively overriding return stack prediction in response to detection of a non-standard return sequence

Technical field

The present invention relates to the field of branch prediction in microprocessors, and in particular to predicting the target address of return instructions using a return stack and a branch target address cache.

Background technology

A microprocessor is a digital device that executes the instructions specified by a computer program. Modern microprocessors are typically pipelined; that is, many instructions are operated on simultaneously in different blocks, or pipeline stages, of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution" in Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, published by Morgan Kaufmann Publishers, San Francisco, California, 1996. They give the following excellent illustration of pipelining:

A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.

Microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to the next in one clock cycle. In an automobile assembly line, if the workers in one stage of the line are left idle because they have no car to work on, the production, or performance, of the line is diminished. Similarly, if a microprocessor stage is idle during a clock cycle because it has no instruction to operate on (an occurrence commonly referred to as a pipeline bubble), the performance of the processor is diminished.

A potential cause of pipeline bubbles is branch instructions. When a branch instruction is encountered, the processor must determine the target address of the branch instruction and begin fetching instructions at the target address rather than at the next sequential address after the branch instruction. Because the pipeline stages that definitively determine the target address are located well after the stages that fetch the instructions, branch instructions create bubbles. As discussed further below, microprocessors typically include branch prediction mechanisms to reduce the number of bubbles created by branch instructions.

One particular type of branch instruction is the return instruction. A return instruction is typically the last instruction executed by a subroutine, and its purpose is to restore program flow to the calling routine, that is, the routine that transferred program control to the subroutine. In a typical program sequence, the calling routine executes a call instruction. The call instruction instructs the microprocessor to push a return address onto a stack in memory and then to branch to the address of the subroutine. The return address pushed onto the stack is the address of the instruction that follows the call instruction in the calling routine. The subroutine eventually executes a return instruction, which pops the return address (previously pushed by the call instruction) off the stack and branches to it; this return address is the target address of the return instruction. An example of a return instruction is the x86 RET instruction. An example of a call instruction is the x86 CALL instruction.

One advantage of executing call/return sequences is that they allow subroutine calls to be nested. For example, a main routine may call subroutine A, pushing a return address; subroutine A may then call subroutine B, pushing another return address; subroutine B then executes a return instruction, which pops the return address pushed when subroutine A called it; and subroutine A then executes a return instruction, which pops the return address pushed when the main routine called it. Nested subroutine calls are very useful, and the example above can be extended to as great a call depth as the stack size supports.

Because of the regular nature of call/return instruction sequences, modern microprocessors commonly employ a branch prediction mechanism called a return stack to predict the target addresses of return instructions. A return stack is a small buffer that caches return addresses in a last-in-first-out manner. Each time a call instruction is encountered, the return address pushed onto the memory stack is also pushed onto the return stack. Each time a return instruction is encountered, the return address at the top of the return stack is popped and used as the predicted target address of the return instruction. This reduces bubbles because the microprocessor does not have to wait for the return address to be fetched from the memory stack.
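As an illustration only, the return stack behavior described above, including the nested-call example, can be modeled in a few lines of Python. The class name ReturnStack, the default depth of 16, and the drop-oldest overflow policy are assumptions of this sketch; the text does not specify them.

```python
class ReturnStack:
    """Minimal model of a hardware return stack (LIFO of return addresses).

    Assumption: fixed depth, oldest entry discarded on overflow.
    """

    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # A CALL pushes the address of the instruction that follows it.
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # drop the oldest entry when full
        self.entries.append(return_address)

    def on_return(self):
        # A RET pops the top entry and uses it as the predicted target.
        return self.entries.pop() if self.entries else None


# Nested calls: main -> A -> B, then B returns, then A returns.
rs = ReturnStack()
rs.on_call(0x1005)  # main calls A; A will return to 0x1005 in main
rs.on_call(0x2010)  # A calls B; B will return to 0x2010 in A
print(hex(rs.on_return()))  # prediction for B's RET
print(hex(rs.on_return()))  # prediction for A's RET
```

Because each RET pops the most recent unmatched CALL, the predictions come out in the reverse order of the calls, which is exactly what properly nested call/return sequences require.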

Because of the regular nature of call/return sequences, return stack predictions of return instruction target addresses are usually very accurate. However, the present inventors have observed that some programs, such as certain operating systems, do not always execute call/return instructions in the standard fashion. For example, code executing on an x86 microprocessor may execute a CALL, then a PUSH that places a different return address on the stack, then a RET, which returns to the pushed address rather than to the address of the instruction after the CALL (the address the CALL pushed onto the stack). In another example, code may execute a PUSH to place a return address on the stack, then a CALL, then two RET instructions; the second RET returns to the pushed address rather than to the instruction after the CALL that preceded the PUSH. Such behavior causes the return stack to mispredict.
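The two non-standard sequences above can be replayed against a simple model to show where the return stack and the memory stack diverge. This is a hypothetical sketch: the run function, the trace format, and the addresses 0x0805, 0x1005, and 0x3000 are illustrative assumptions, and an outer CALL is included in the second trace so that the return stack holds the return address of the CALL that preceded the PUSH.

```python
def run(trace):
    """Replay a trace and report (predicted, actual, correct) for each RET."""
    memory_stack, return_stack, results = [], [], []
    for op, value in trace:
        if op == "CALL":    # value: address of the instruction after the CALL
            memory_stack.append(value)
            return_stack.append(value)
        elif op == "PUSH":  # value: data pushed by the program; return stack untouched
            memory_stack.append(value)
        elif op == "RET":   # the real target is whatever is on the memory stack
            actual = memory_stack.pop()
            predicted = return_stack.pop() if return_stack else None
            results.append((predicted, actual, predicted == actual))
    return results


# First example: CALL, PUSH, RET.
print(run([("CALL", 0x1005), ("PUSH", 0x3000), ("RET", None)]))
# Second example: PUSH, CALL, RET, RET (after an earlier outer CALL).
print(run([("CALL", 0x0805), ("PUSH", 0x3000), ("CALL", 0x1005),
           ("RET", None), ("RET", None)]))
```

In the first trace the single RET is predicted as 0x1005 but actually goes to 0x3000. In the second trace the first RET is predicted correctly, and the second RET is predicted as 0x0805 but actually goes to 0x3000.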

Therefore, what is needed is an apparatus that predicts return instruction target addresses more accurately, particularly for code that executes non-standard call/return sequences.

Summary of the invention

The present invention provides an apparatus that detects a return stack misprediction and responsively sets an override flag associated with the return instruction, so that on the next occurrence of the return instruction the microprocessor predicts its target address with a mechanism other than the return stack. A branch target address cache (BTAC) stores the override flag associated with the return instruction. In one embodiment, the other mechanism for predicting the return instruction target address is the BTAC itself, which may generally be less accurate than the return stack for normal call/return sequences, but may be more accurate for code that executes non-standard call/return sequences.

In one aspect, the present invention provides a microprocessor. The microprocessor includes a return stack that generates a first prediction of the target address of a return instruction. The microprocessor also includes a branch target address cache (BTAC) that generates a second prediction of the target address of the return instruction and generates an override indicator. The override indicator has a predetermined value if the first prediction mispredicted the target address of a first occurrence of the return instruction. The microprocessor also includes branch control logic, coupled to the return stack and the BTAC, that causes the microprocessor to branch to the second prediction, rather than to the first prediction, for a second occurrence of the return instruction if the override indicator has the predetermined value.

In another aspect, the present invention provides an apparatus for improving branch prediction accuracy in a microprocessor having a branch target address cache (BTAC) and a return stack, each of which generates a prediction of the target address of a return instruction. The apparatus includes an override indicator. The apparatus also includes update logic, coupled to the override indicator, that updates the override indicator to a true value if the prediction generated by the return stack mispredicted the target address of a first occurrence of the return instruction. The apparatus also includes branch control logic, coupled to the override indicator, that selects the prediction generated by the BTAC, rather than the prediction generated by the return stack, for a second occurrence of the return instruction if the override indicator is true.

In another aspect, the present invention provides a method for predicting the target address of a return instruction in a microprocessor. The method includes updating an override indicator to a true value in response to a return stack mispredicting the target address of the return instruction. The method also includes generating, after the updating, a prediction of the target address with a branch target address cache (BTAC). The method also includes determining, after the BTAC generates the prediction, whether the override indicator has a true value. The method also includes causing the microprocessor to branch to the prediction generated by the BTAC if the override indicator has a true value.

In another aspect, the present invention provides an apparatus for improving branch prediction accuracy in a microprocessor having a return stack and an alternate prediction mechanism, each of which generates a prediction of the target address of a return instruction, and a branch target address cache (BTAC). The apparatus includes an override indicator. The apparatus also includes update logic, coupled to the override indicator, that updates the override indicator in the BTAC to a true value if the prediction generated by the return stack mispredicted the target address of a first occurrence of the return instruction. The apparatus also includes branch control logic, coupled to the override indicator, that selects the prediction generated by the alternate prediction mechanism, rather than the prediction generated by the return stack, for a second occurrence of the return instruction if the override indicator is true.

In another aspect, the present invention provides a computer data signal embodied in a transmission medium, comprising computer-readable program code for providing a microprocessor. The program code includes first program code for providing a return stack that generates a first prediction of the target address of a return instruction. The program code also includes second program code for providing a branch target address cache (BTAC) that generates a second prediction of the target address of the return instruction and generates an override indicator. The override indicator has a predetermined value if the first prediction mispredicted the target address of a first occurrence of the return instruction. The program code also includes third program code for providing branch control logic, coupled to the return stack and the BTAC, that causes the microprocessor to branch to the second prediction, rather than to the first prediction, for a second occurrence of the return instruction if the override indicator has the predetermined value.

An advantage of the present invention is that it potentially improves branch prediction accuracy for programs that execute non-standard call/return sequences. Simulations performed using the override mechanism of the embodiments described herein have shown improved benchmark scores. Furthermore, if the microprocessor already includes a BTAC and an alternate return instruction target address prediction mechanism, this advantage can be realized with a small amount of additional hardware.

Other features and advantages of the present invention will become apparent upon study of the remaining portions of the specification and the drawings.

Description of drawings

Fig. 1 is a block diagram of a pipelined microprocessor according to the present invention;

Fig. 2 is a flowchart illustrating operation of the microprocessor of Fig. 1 according to the present invention;

Fig. 3 is a flowchart illustrating operation of the microprocessor of Fig. 1 according to the present invention;

Fig. 4 is a flowchart illustrating operation of the microprocessor of Fig. 1 according to the present invention;

Fig. 5 is a block diagram of a pipelined microprocessor according to an alternate embodiment of the present invention; and

Fig. 6 is a flowchart illustrating operation of the microprocessor of Fig. 5 according to the alternate embodiment of the present invention.

Description of reference numerals:

100, 500: pipelined microprocessor

101, 103, 105, 107, 111, 113, 121, 123, 125, 127: pipeline registers

102: BTAC array

104: BTAC return stack

106, 126: multiplexers

108: instruction cache

112: branch control logic

114: instruction decoder

116: F-stage return stack

118: comparator

122: BTAC update logic

124: branch resolution logic

132: fetch address

134: BTAC update request signal

136: override signal

138, 154: return (ret) signals

142, 144, 146, 164, 176: target addresses

148: E-stage target address signal

152: mismatch signal

158: misprediction signal

162: next sequential fetch address

164: predicted target address

168: multiplexer (mux) select signal

172: override_F signal

174: override_E signal

182: adder

184: control signal

186: instruction bytes

Embodiment

Referring now to Fig. 1, a block diagram of a pipelined microprocessor 100 according to the present invention is shown. In one embodiment, microprocessor 100 is a microprocessor whose instruction set conforms substantially to the x86 architecture instruction set, including the x86 CALL and RET instructions. However, the present invention is not limited to x86 architecture microprocessors, but may be employed in any microprocessor that uses a return stack to predict the target addresses of return instructions.

Microprocessor 100 includes an instruction cache 108. Instruction cache 108 caches instruction bytes from a system memory coupled to microprocessor 100. Instruction cache 108 caches lines of instruction bytes; in one embodiment, a cache line comprises 32 instruction bytes. Instruction cache 108 receives a fetch address 132 from multiplexer 106. If fetch address 132 hits in instruction cache 108, instruction cache 108 outputs the cache line of instruction bytes 186 specified by fetch address 132. In particular, the instruction bytes 186 of the cache line specified by fetch address 132 may include one or more return instructions. Instruction bytes 186 are passed down the microprocessor 100 pipeline via pipeline registers 121 and 123, as shown. Although only two pipeline registers 121 and 123 are shown passing instruction bytes 186 down the pipeline, other embodiments may include more pipeline stages.

Microprocessor 100 also includes an instruction decoder, referred to as F-stage instruction decoder 114, coupled to the output of pipeline register 123. Instruction decoder 114 receives instruction bytes 186 and related information and decodes the instruction bytes. In one embodiment, microprocessor 100 supports variable-length instructions. Instruction decoder 114 receives a stream of instruction bytes and formats them into separate instructions, determining the length of each instruction. In particular, instruction decoder 114 generates a true value on return (ret) signal 154 to indicate that it has decoded a return instruction. In one embodiment, microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions, and instruction decoder 114 translates macroinstructions, such as x86 macroinstructions, into microinstructions of the native RISC instruction set. The microinstructions are passed down the microprocessor 100 pipeline via pipeline registers 125 and 127, as shown. Although only two pipeline registers 125 and 127 are shown passing the microinstructions down the pipeline, other embodiments may include more pipeline stages. For example, these stages may include a register file, address generators, data load/store units, integer execution units, floating-point execution units, MMX execution units, SSE execution units, and SSE-2 execution units.

Microprocessor 100 also includes branch resolution logic, referred to as E-stage branch resolution logic 124, coupled to the output of pipeline register 127. Branch resolution logic 124 receives branch instructions, including return instructions, as they pass down the microprocessor 100 pipeline, and finally resolves the target addresses of all branch instructions. Branch resolution logic 124 provides the correct branch instruction target address on E-stage target address signal 148 as an input to multiplexer 106. In addition, if a predicted target address was used to predict the branch instruction, branch resolution logic 124 receives the predicted target address. Branch resolution logic 124 compares the predicted target address with the correct target address 148 to determine whether the target address was mispredicted, for example by branch target address cache (BTAC) array 102, BTAC return stack 104, or F-stage return stack 116, each of which is described in detail below. If the target address was mispredicted, branch resolution logic 124 generates a true value on misprediction signal 158.

Microprocessor 100 also includes branch control logic 112, coupled to multiplexer 106. Branch control logic 112 generates a multiplexer (mux) select signal 168 that controls multiplexer 106 to select one of several input addresses, described below, for output as fetch address 132. The operation of branch control logic 112 is described in more detail below.

Microprocessor 100 also includes an adder 182 that receives fetch address 132, increments it, and provides the result as next sequential fetch address 162 to an input of multiplexer 106. If no branch instruction is predicted or executed during a given clock cycle, branch control logic 112 controls multiplexer 106 to select next sequential fetch address 162.

Microprocessor 100 also includes a branch target address cache (BTAC) array 102, coupled to receive fetch address 132. BTAC array 102 includes a plurality of storage elements, or entries, each of which caches a branch instruction target address and related branch prediction information. When fetch address 132 is input to instruction cache 108 and instruction cache 108 responsively provides the line of instruction bytes 186, BTAC array 102 substantially concurrently provides a prediction of whether a branch instruction is present in cache line 186, the predicted target address of the branch instruction, and whether the branch instruction is a return instruction. Advantageously, according to the present invention, BTAC array 102 also provides an override indicator that indicates whether the target address of a return instruction should be predicted by BTAC array 102 rather than by a return stack, as described below.

The return instruction target address 164 predicted by BTAC array 102 is provided as an input to a second multiplexer 126. The output of multiplexer 126, target address 144, is provided as an input to multiplexer 106. Target address 144 is also passed down the microprocessor 100 pipeline via pipeline registers 111 and 113, as shown. The output of pipeline register 113 is referred to as target address 176. Although only two pipeline registers 111 and 113 are shown passing target address 144 down the pipeline, other embodiments may include more pipeline stages.

In one embodiment, BTAC array 102 is configured as a 2-way set associative cache capable of storing 4096 target addresses and related information. However, the present invention is not limited to a particular BTAC array 102 embodiment. In one embodiment, lower bits of fetch address 132 select one set, or row, of BTAC array 102. An address tag is stored in each entry of BTAC array 102 to indicate the upper address bits of the address of the branch instruction whose target address is stored in that entry. The upper bits of fetch address 132 are compared with the address tag of each entry in the selected set. If the upper bits of fetch address 132 match a valid address tag in the selected set, a hit in BTAC array 102 occurs, indicating that BTAC array 102 predicts that a branch instruction is present in the line of instructions 186 selected by fetch address 132 and output by instruction cache 108 substantially concurrently with target address prediction 164.
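A minimal sketch of the set-associative lookup described above, assuming the 32-byte cache line and the 4096-entry, 2-way organization from this embodiment (2048 sets). The exact split of fetch address bits into index and tag, the FIFO replacement policy, and the Btac and split names are assumptions of the sketch, not details given in the text.

```python
LINE_BYTES = 32          # cache line size given in the text
WAYS = 2                 # 2-way set associative
ENTRIES = 4096           # total target addresses the BTAC can hold
SETS = ENTRIES // WAYS   # 2048 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()  # 5 bits select a byte within the line
INDEX_BITS = (SETS - 1).bit_length()         # 11 bits select the set


def split(fetch_address):
    """Return (set index, tag) for a fetch address (assumed bit layout)."""
    index = (fetch_address >> OFFSET_BITS) & (SETS - 1)
    tag = fetch_address >> (OFFSET_BITS + INDEX_BITS)
    return index, tag


class Btac:
    def __init__(self):
        # Each set holds up to WAYS (tag, target) pairs; presence means valid.
        self.sets = [[] for _ in range(SETS)]

    def insert(self, fetch_address, target):
        index, tag = split(fetch_address)
        ways = self.sets[index]
        ways[:] = [w for w in ways if w[0] != tag]  # replace an existing entry
        if len(ways) == WAYS:
            ways.pop(0)                             # assumed FIFO eviction
        ways.append((tag, target))

    def lookup(self, fetch_address):
        """Return the predicted target on a hit, or None on a miss."""
        index, tag = split(fetch_address)
        for way_tag, target in self.sets[index]:
            if way_tag == tag:
                return target
        return None
```

Any fetch address within the same 32-byte line maps to the same set and tag, so one entry predicts the branch for the whole line, matching the description of a hit predicting a branch somewhere in line 186.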

Each entry in BTAC array 102 also stores an indication of the type of the branch instruction present in the line of instructions 186 specified by fetch address 132. That is, BTAC array 102 also stores the type of the branch instruction whose predicted target address 164 BTAC array 102 provides to multiplexer 126. In particular, if the branch instruction type is a return instruction, BTAC array 102 generates a true value on return (ret) signal 138, which is provided to branch control logic 112. In addition, BTAC array 102 outputs override signal 136, discussed in detail below, which is also provided to branch control logic 112. In one embodiment, the branch instruction type field stored in each BTAC array 102 entry comprises two bits, encoded as shown in Table 1.

Type field   Branch instruction type
00           Neither a return nor a call
01           Call
10           Normal return
11           Override return

Table 1

In one embodiment, the most significant bit of the type field is provided on return signal 138, and the least significant bit is provided on override signal 136. For a call instruction, override signal 136 is not used. Observe that because the type field is already two bits wide and, absent the override function, only three of its four possible states would be used, no additional storage is required to hold the override bit. Override signal 136 is passed down the microprocessor 100 pipeline via pipeline registers 101, 103, 105, and 107, as shown. In particular, the output of pipeline register 103, referred to as override_F signal 172, is provided to branch control logic 112. In addition, the output of pipeline register 107 is referred to as override_E signal 174. Although only four pipeline registers 101, 103, 105, and 107 are shown passing override signal 136 down the pipeline, other embodiments may include more pipeline stages.
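The mapping of the two-bit type field onto return signal 138 and override signal 136 can be sketched as follows. The constant and function names are hypothetical; only the encodings and the MSB/LSB assignment come from Table 1 and the paragraph above.

```python
# Two-bit branch type field encodings from Table 1.
NOT_RETURN_OR_CALL = 0b00
CALL = 0b01
NORMAL_RETURN = 0b10
OVERRIDE_RETURN = 0b11


def type_field_signals(type_field):
    """Split a type field into (return signal 138, override signal 136)."""
    ret_138 = bool(type_field & 0b10)       # most significant bit
    override_136 = bool(type_field & 0b01)  # least significant bit
    # For a call (01) override_136 reads as true, but it is simply ignored
    # downstream because ret_138 is false.
    return ret_138, override_136
```

The override bit is thus only meaningful when the return bit is also set, which is why reusing the LSB costs no extra storage.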

In one embodiment, when branch resolution logic 124 resolves a new call instruction, the target address of the call instruction and a type field value indicating a call instruction are cached in BTAC array 102. Similarly, when branch resolution logic 124 resolves a new return instruction, the target address of the return instruction and a type field value indicating a normal return instruction are cached in BTAC array 102.

Microprocessor 100 also includes a return stack 104, referred to as BTAC return stack 104, coupled to receive return signal 138 from BTAC array 102. BTAC return stack 104 caches the return addresses specified by call instructions in a last-in-first-out manner. In one embodiment, when branch resolution logic 124 resolves a new call instruction, the return address specified by the call instruction is pushed onto the top of BTAC return stack 104. When BTAC array 102 indicates via return signal 138 that a return instruction is present in the cache line 186 specified by fetch address 132, the return address at the top of BTAC return stack 104 is popped and provided as target address 142 to multiplexer 126. If return signal 138 is true and override signal 136 is false, branch control logic 112 controls multiplexer 126 via control signal 184 to select target address 142 predicted by BTAC return stack 104. Otherwise, branch control logic 112 controls multiplexer 126 via control signal 184 to select target address 164 predicted by BTAC array 102.
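The selection performed by multiplexer 126 reduces to a simple condition on return signal 138 and override signal 136. The sketch below is illustrative; the function and parameter names are hypothetical.

```python
def select_mux126(ret_138, override_136, btac_target_164, btac_rs_target_142):
    """Return the address multiplexer 126 forwards as target address 144."""
    if ret_138 and not override_136:
        # Normal return: trust BTAC return stack 104.
        return btac_rs_target_142
    # Override return, or a branch that is not a return: use BTAC array 102.
    return btac_target_164
```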

Microprocessor 100 also includes a second return stack 116, referred to as F-stage return stack 116, coupled to receive return signal 154 from instruction decoder 114. F-stage return stack 116 caches the return addresses specified by call instructions in a last-in-first-out manner. In one embodiment, when branch resolution logic 124 resolves a new call instruction, the return address specified by the call instruction is pushed onto the top of F-stage return stack 116. When instruction decoder 114 indicates via return signal 154 that a return instruction has been decoded, the return address at the top of F-stage return stack 116 is popped and provided as target address 146 to multiplexer 106.

Microprocessor 100 also includes a comparator 118. Comparator 118 compares target address 146 of F-stage return stack 116 with target address 176 passed down the pipeline. If target address 146 of F-stage return stack 116 does not match target address 176, comparator 118 generates a true value on mismatch signal 152, which is provided to branch control logic 112. If return signal 154 is true, override_F 172 is false, and mismatch signal 152 is true, branch control logic 112 controls multiplexer 106 via control signal 168 to select target address 146 of F-stage return stack 116. Otherwise, branch control logic 112 controls multiplexer 106 via control signal 168 to select one of its other inputs.
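Similarly, the F-stage check performed by comparator 118 and branch control logic 112 can be sketched as a function that returns the address to redirect fetch to, or None when multiplexer 106 should select one of its other inputs. The function, its parameter names, and the use of None for "no redirect" are assumptions for illustration.

```python
def fstage_redirect(ret_154, override_f_172, fstage_target_146, piped_target_176):
    """Return target 146 if the F-stage return stack should redirect fetch, else None."""
    mismatch_152 = fstage_target_146 != piped_target_176  # comparator 118
    if ret_154 and not override_f_172 and mismatch_152:
        return fstage_target_146
    return None  # hypothetical: multiplexer 106 selects one of its other inputs
```

Note that override_F 172 suppresses the redirect, so an override return predicted by BTAC array 102 is not second-guessed by F-stage return stack 116.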

Microprocessor 100 also includes BTAC update logic 122, coupled to branch resolution logic 124 and BTAC array 102. BTAC update logic 122 receives misprediction signal 158 from branch resolution logic 124. BTAC update logic 122 also receives override_E signal 174 from pipeline register 107. BTAC update logic 122 generates a BTAC update request signal 134, which is provided to BTAC array 102. BTAC update request signal 134 includes information for updating an entry of BTAC array 102. In one embodiment, BTAC update request signal 134 includes the target address of the branch instruction, the address of the branch instruction, and a value for the type field.

When branch resolution logic 124 resolves a new branch instruction, BTAC update logic 122 generates a BTAC update request 134 to update BTAC array 102 with the target address and type of the new branch instruction, so that subsequent occurrences of the branch instruction, that is, the target address and type of a branch instruction in the line of instructions specified by fetch address 132, can be predicted. In addition, if misprediction signal 158 is true, BTAC update logic 122 generates a BTAC update request 134 to update the entry in BTAC array 102 corresponding to the branch instruction. In particular, if the branch instruction is a return instruction that was mispredicted by BTAC return stack 104 or by F-stage return stack 116, BTAC update logic 122 sets the override bit in the BTAC array 102 entry to a predetermined value to indicate that, on the next occurrence of the return instruction, prediction 142 of BTAC return stack 104 and prediction 146 of F-stage return stack 116 should be overridden by prediction 164 of BTAC array 102. In one embodiment, the type field is set to the override return value, 11, as specified in Table 1 above. Conversely, if the branch instruction is a return instruction that was mispredicted by BTAC array 102 because the override bit was set, BTAC update logic 122 clears the override bit in the BTAC array 102 entry to indicate that, on the next occurrence of the return instruction, prediction 142 of BTAC return stack 104 and, if necessary, prediction 146 of F-stage return stack 116 should be selected rather than prediction 164 of BTAC array 102. In one embodiment, the type field is set to the normal return value, 10, as specified in Table 1 above. The operation of microprocessor 100 will now be described more fully with reference to Figs. 2 through 4.

Referring now to Fig. 2, a flowchart illustrating the operation of microprocessor 100 of Fig. 1 according to the present invention is shown. Fig. 2 describes the operation of microprocessor 100 in response to BTAC array 102 and BTAC return stack 104 of Fig. 1 predicting a return instruction. Flow begins at block 202.

At block 202, fetch address 132 of Fig. 1 is applied to instruction cache 108 and, in parallel, to BTAC array 102 of Fig. 1. In response to fetch address 132, instruction cache 108 provides the line of instruction bytes 186 of Fig. 1 to the pipeline of microprocessor 100. Flow proceeds to block 204.

At block 204, BTAC array 102, based on fetch address 132, predicts via return signal 138 that a return instruction is present in the instruction cache line 186 that instruction cache 108 provides to microprocessor 100, and BTAC array 102 provides target address 164 to multiplexer 126. Flow proceeds to decision block 206.

At decision block 206, branch control logic 112 determines whether override indicator 136 is set. If so, flow proceeds to block 212; otherwise, flow proceeds to block 208.

At block 208, branch control logic 112 controls multiplexers 126 and 106 to select BTAC return stack target address 142 as fetch address 132, thereby branching microprocessor 100 to that address. Flow ends at block 208.

At block 212, branch control logic 112 controls multiplexers 126 and 106 to select BTAC array target address 164 as fetch address 132, thereby branching microprocessor 100 to that address. Flow ends at block 212.

As may be observed from Fig. 2, if override indicator 136 is set (as described below with respect to block 412, during a previous occurrence of the return instruction), branch control logic 112 overrides BTAC return stack 104 and instead selects the alternate target address 164 predicted by BTAC array 102. Consequently, if the program is executing a non-standard call/return sequence, the misprediction that BTAC return stack 104 would otherwise make can largely be avoided.
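The fetch-time selection of Fig. 2 can be modeled in software. The following is a minimal Python sketch, not the patented circuit: the dictionary standing in for BTAC array 102, the `BTACEntry` fields, the string encoding of the branch kind, the rule that a predicted call pushes the next sequential fetch address onto the BTAC return stack, and the empty-stack fallback are all illustrative assumptions.

```python
from typing import Dict, List, NamedTuple


class BTACEntry(NamedTuple):
    """Hypothetical contents of one BTAC array 102 entry."""
    kind: str        # "call" or "return" (illustrative encoding)
    target: int      # target address 164
    override: bool   # override indicator 136 (used for returns only)


def predict_fetch_line(btac: Dict[int, BTACEntry], return_stack: List[int],
                       fetch_address: int, next_sequential: int) -> int:
    """Return the next fetch address 132 chosen at fetch time (Fig. 2)."""
    entry = btac.get(fetch_address)          # block 202: BTAC lookup
    if entry is None:
        return next_sequential               # no branch predicted in this line
    if entry.kind == "call":
        # Assumption: a predicted call pushes the next sequential address.
        return_stack.append(next_sequential)
        return entry.target
    # Block 204: return predicted. Pop BTAC return stack 104 (address 142);
    # fall back to the BTAC target if the stack is empty (an assumption).
    stack_target = return_stack.pop() if return_stack else entry.target
    if entry.override:                       # decision block 206
        return entry.target                  # block 212: address 164
    return stack_target                      # block 208: address 142
```

Popping the stack even when the override bit selects the BTAC target is a further assumption; it keeps the modeled stack balanced with the call/return stream.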

Referring now to Fig. 3, a flowchart illustrating the operation of microprocessor 100 of Fig. 1 according to the present invention is shown. Fig. 3 describes the operation of microprocessor 100 in response to F-stage return stack 116 of Fig. 1 predicting a return instruction (for example, the return instruction predicted in Fig. 2). Flow begins at block 302.

At block 302, F-stage instruction decoder 114 of Fig. 1 decodes the return instruction that is present in the instruction cache line 186 output by instruction cache 108 in response to the fetch address applied to BTAC array 102 in block 202 of Fig. 2, and that was predicted by BTAC array 102 and BTAC return stack 104 as described with respect to Fig. 2. In response to F-stage instruction decoder 114 indicating via return signal 154 that a return instruction has been decoded, F-stage return stack 116 provides its predicted target address 146 to multiplexer 106. Flow proceeds to block 304.

At block 304, comparator 118 of Fig. 1 compares target address 146 predicted by F-stage return stack 116 with target address 176 of Fig. 1. If addresses 146 and 176 do not match, comparator 118 generates a true value on mismatch signal 152 of Fig. 1. Flow proceeds to decision block 306.

At decision block 306, branch control logic 112 examines mismatch signal 152 to determine whether a mismatch occurred. If so, flow proceeds to decision block 308; otherwise, flow ends.

At decision block 308, branch control logic 112 examines override_F signal 172 of Fig. 1 to determine whether override_F bit 172 is set. If so, flow ends; that is, the branch to BTAC array target address 164 performed at block 212 of Fig. 2 is not replaced by a branch to target address 146 predicted by F-stage return stack 116. If override_F bit 172 is clear, flow proceeds to block 312.

At block 312, branch control logic 112 controls multiplexer 106 to select target address 146 predicted by F-stage return stack 116, thereby branching microprocessor 100 to that address. In one embodiment, before branching to target address 146, microprocessor 100 flushes the instructions in the pipeline stages above the F-stage. Flow ends at block 312.

As may be observed from Fig. 3, if override_F indicator 172 is set (as described below with respect to block 412, during a previous occurrence of the return instruction), branch control logic 112 overrides F-stage return stack 116 and instead retains the branch to target address 164 predicted by BTAC array 102. Consequently, if the program is executing a non-standard call/return sequence, the misprediction that F-stage return stack 116 would otherwise make can largely be avoided.
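Blocks 304 through 312 of Fig. 3 reduce to a small decision function. In this hedged Python sketch, `f_stage_check` is a hypothetical name; it returns the address to re-steer fetch to, or `None` when the branch taken at fetch time stands. The pipeline flush of block 312 is not modeled.

```python
from typing import Optional


def f_stage_check(f_stage_target: int, piped_target: int,
                  override_f: bool) -> Optional[int]:
    """Model blocks 304-312 of Fig. 3.

    f_stage_target: address 146 from F-stage return stack 116.
    piped_target:   address 176 passed down from the fetch stage.
    override_f:     override_F bit 172 for this return instruction.
    Returns the address to re-steer fetch to, or None to keep the
    branch taken at fetch time.
    """
    if f_stage_target == piped_target:   # comparator 118 / decision block 306
        return None
    if override_f:                       # decision block 308
        return None                      # keep the BTAC array target 164
    return f_stage_target                # block 312 (flush not modeled)
```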

Referring now to Fig. 4, a flowchart illustrating the operation of microprocessor 100 of Fig. 1 according to the present invention is shown. Fig. 4 describes the operation of microprocessor 100 in response to resolving a return instruction (for example, the return instruction predicted and decoded in Figs. 2 and 3). Flow begins at block 402.

At block 402, E-stage branch resolution logic 124 of Fig. 1 resolves the return instruction; that is, branch resolution logic 124 finally determines the correct target address 148 of Fig. 1 for the return instruction. In particular, if microprocessor 100 was branched to an incorrect target address for the return instruction, branch resolution logic 124 generates a true value on misprediction signal 158 of Fig. 1. Flow proceeds to decision block 404.

At decision block 404, BTAC update logic 122 examines misprediction signal 158 to determine whether the target address of the return instruction was mispredicted. If so, flow proceeds to decision block 406; otherwise, flow ends.

At decision block 406, BTAC update logic 122 examines override_F signal 174 to determine whether override_F bit 174 is set. If so, flow proceeds to block 408; otherwise, flow proceeds to block 412.

At block 408, BTAC update logic 122 generates a BTAC update request 134 to clear the override bit in the entry for the mispredicted return instruction. The present inventors have observed that a given return instruction may be reached via multiple program paths. That is, the return is sometimes reached via a non-standard code path (such as one of the code paths described above), which always causes the return stack to mispredict the target address of the return instruction; yet the same return instruction may also be reached via a code path that forms a standard call/return sequence. In the latter case, the return stack generally predicts the target address of the return instruction more accurately. Therefore, if a misprediction occurs while the override bit is set, the return is presumed to have been reached via a standard call/return sequence, and BTAC update logic 122 clears the override bit in block 408. Flow proceeds to block 414.
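To see why a non-standard path defeats a return stack, the hypothetical replay below feeds call and return events through a simple LIFO stack. The trace format is an assumption made for illustration. The non-standard example is a return whose target was placed on the program stack by software rather than by a matching call; this is one common kind of non-standard sequence and not necessarily the code path described earlier in this document.

```python
from typing import List, Tuple


def return_stack_mispredicts(trace: List[Tuple[str, int]]) -> List[bool]:
    """Replay call/ret events through a LIFO return stack (a sketch).

    Each event is ("call", return_address) or ("ret", actual_target).
    Returns one flag per "ret": True if the return stack mispredicted.
    """
    stack: List[int] = []
    results: List[bool] = []
    for kind, address in trace:
        if kind == "call":
            stack.append(address)        # a call pushes its return address
        elif kind == "ret":
            predicted = stack.pop() if stack else None
            results.append(predicted != address)
        else:
            raise ValueError(f"unknown event {kind!r}")
    return results


standard = [("call", 0x104), ("ret", 0x104)]
non_standard = [("call", 0x104), ("ret", 0x800), ("ret", 0x104)]
```

In the non-standard trace the first mismatch also leaves the stack misaligned, so the later, genuine return is mispredicted as well.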

At block 412, because F-stage return stack 116 mispredicted the target address of the return instruction, BTAC update logic 122 generates a BTAC update request 134 to set the override bit in the appropriate entry of BTAC array 102. By setting the override bit in the BTAC 102 entry that stores the prediction for the return instruction, the present invention helps solve the problem created by non-standard call/return sequences: on the next occurrence, microprocessor 100 branches to BTAC array target address 164 rather than to BTAC return stack target address 142 or F-stage return stack target address 146, either of which would mispredict the target address of the return instruction. Flow proceeds to block 414.

At block 414, because the mispredicted return target address caused incorrect instructions to be fetched from instruction cache 108 into the pipeline of microprocessor 100, microprocessor 100 flushes its pipeline so that those instructions are not executed. Next, branch control logic 112 controls multiplexer 106 to select E-stage target address 148, thereby branching microprocessor 100 to fetch the correct target instructions. Flow ends at block 414.

In one embodiment, in accordance with Table 1 above, block 412 updates the type field of the BTAC array 102 entry with the binary value 11, and block 408 updates it with the binary value 10.
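The resolution flow of Fig. 4, together with the type-field encodings 10 and 11 given above, can be sketched as a pure function. This Python model is hypothetical: it reads the override bit from the entry's type field instead of from the piped-down override_F signal 174, compares the predicted and resolved targets directly instead of using misprediction signal 158, and does not model the pipeline flush of block 414.

```python
from typing import Optional, Tuple

NORMAL_RETURN = 0b10    # Table 1: use a return-stack prediction
OVERRIDE_RETURN = 0b11  # Table 1: use the BTAC array prediction


def resolve_return(type_field: int, predicted_target: int,
                   correct_target: int) -> Tuple[int, Optional[int]]:
    """Model blocks 402-414 of Fig. 4 for one resolved return.

    Returns the type field to write back to the BTAC entry and the
    address to refetch from (None when the prediction was correct).
    """
    # Decision block 404: was the return target mispredicted?
    if predicted_target == correct_target:
        return type_field, None
    # Decision block 406: was the override bit set for this occurrence?
    if type_field == OVERRIDE_RETURN:
        new_type = NORMAL_RETURN      # block 408: BTAC array missed, clear override
    else:
        new_type = OVERRIDE_RETURN    # block 412: return stack missed, set override
    # Block 414: after the (unmodeled) flush, fetch from correct target 148.
    return new_type, correct_target
```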

As may be observed from Figs. 2 through 4, the override indicator can potentially improve the accuracy of return instruction target prediction. If the microprocessor detects that the return stack mispredicted the target address of a return instruction because the return instruction was executed as part of a non-standard call/return sequence, it sets the override indicator for that return instruction in the BTAC. On the next occurrence of the return instruction, because the override indicator tells the microprocessor that the return stack is likely to mispredict the current occurrence, the microprocessor predicts the target address with a mechanism other than the return stack. Conversely, even if the return instruction was previously executed as part of a non-standard call/return sequence, if the microprocessor detects that the BTAC array mispredicted the target address because the return instruction was subsequently executed as part of a standard call/return sequence, it clears the override indicator for that return instruction in the BTAC. On the next occurrence, because the override indicator tells the microprocessor that the return stack is likely to predict the current occurrence correctly, the microprocessor uses the return stack to predict the target address.
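The adaptive behavior just described can be traced over successive occurrences of a single return instruction. In this hypothetical Python sketch each occurrence supplies the return-stack prediction, the BTAC array prediction, and the resolved target; the predictor starts with the override bit clear, and the numeric addresses are invented for illustration.

```python
from typing import List, Tuple


def run_occurrences(occurrences: List[Tuple[int, int, int]]) -> List[Tuple[bool, bool]]:
    """Replay occurrences of one return instruction (a sketch).

    Each occurrence is (return_stack_target, btac_target, actual_target).
    Returns, per occurrence, (override bit used, mispredicted?).
    """
    override = False            # assume the entry starts as a normal return
    history: List[Tuple[bool, bool]] = []
    for rs_target, btac_target, actual in occurrences:
        chosen = btac_target if override else rs_target
        mispredicted = chosen != actual
        history.append((override, mispredicted))
        if mispredicted:
            # Set after a return-stack miss, clear after a BTAC miss.
            override = not override
    return history


trace = [
    (0x104, 0x800, 0x800),  # non-standard path: return stack is wrong
    (0x104, 0x800, 0x800),  # override now set: BTAC target is used
    (0x204, 0x800, 0x204),  # standard path: BTAC target is wrong
    (0x204, 0x800, 0x204),  # override cleared: return stack is used
]
print(run_occurrences(trace))
# [(False, True), (True, False), (True, True), (False, False)]
```

The override bit is set after the first miss and cleared after the third, so only the first occurrence on each kind of path is mispredicted.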

Referring now to Fig. 5, a block diagram of a pipelined microprocessor 500 according to an alternate embodiment of the present invention is shown. Microprocessor 500 of Fig. 5 is similar to microprocessor 100 of Fig. 1, except that it includes neither BTAC return stack 104 nor multiplexer 126. Therefore, predicted target address 164 output by BTAC array 102 is provided directly to multiplexer 106 rather than through multiplexer 126. In addition, target address 164 of BTAC array 102 (rather than target address 144 of Fig. 1) is provided as the input to pipeline register 111 and is passed down the pipeline as target address 176.

Referring now to Fig. 6, a flowchart illustrating the operation of microprocessor 500 of Fig. 5 according to an alternate embodiment of the present invention is shown. Fig. 6 is similar to Fig. 2, except that decision block 206 and block 208 are absent; flow therefore proceeds from block 204 directly to block 212. Because microprocessor 500 of Fig. 5 lacks BTAC return stack 104 and multiplexer 126 of microprocessor 100 of Fig. 1, whenever BTAC array 102 predicts a return instruction via return signal 138, branch control logic 112 always branches microprocessor 500 to target address 164 predicted by BTAC array 102.

Microprocessor 500 of Fig. 5 also operates according to the flowcharts of Figs. 3 and 4. Note that because BTAC return stack 104 is not present in microprocessor 500, target address 176 passed down the pipeline is always target address 164 of BTAC array 102; therefore, the comparison performed in block 304 is between target address 146 of F-stage return stack 116 and the BTAC array 102 target address 164 passed down the pipeline.
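For microprocessor 500, the fetch-time choice disappears and only the F-stage check of Fig. 3 remains. The hypothetical Python function below lists the fetch addresses the front end is steered to for one predicted return; it assumes the override bit consulted is the one carried down the pipeline as override_F, and it omits the pipeline flush that precedes the block 312 redirect.

```python
from typing import List


def steer_return_mp500(btac_target: int, f_stage_target: int,
                       override_f: bool) -> List[int]:
    """Fetch redirects for one predicted return in microprocessor 500."""
    # Fig. 6: block 204 flows straight to block 212, so fetch always
    # follows BTAC array target address 164 first.
    redirects = [btac_target]
    # In microprocessor 500, target address 176 is always address 164.
    piped_target = btac_target
    # Blocks 304-308: compare with F-stage return stack target 146,
    # unless the override_F bit says to keep the BTAC target.
    if f_stage_target != piped_target and not override_f:
        redirects.append(f_stage_target)   # block 312
    return redirects
```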

Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although the embodiments describe a microprocessor having two return stacks, the microprocessor may have a different number of return stacks, such as a single return stack or more than two return stacks. Furthermore, although the embodiments describe the BTAC as the alternate target address prediction mechanism that overrides the return stack (in addition to storing the override bits for return instructions mispredicted by the return stack), other alternate target address prediction mechanisms, such as a branch target buffer, may be used.

Moreover, in addition to implementations of the invention using hardware, the invention may be implemented in computer readable code (e.g., computer readable program code, data, etc.) embodied in a computer usable (e.g., readable) medium. The computer code enables the functions or the fabrication, or both, of the invention disclosed herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, and the like); GDSII databases; hardware description languages (HDL) including Verilog HDL, VHDL, Altera HDL (AHDL), and so on; or other programming and/or circuit (e.g., schematic) capture tools available in the art. The computer code can be disposed in any known computer usable (e.g., readable) medium, including semiconductor memory, magnetic disk, and optical disc (e.g., CD-ROM, DVD-ROM, and the like), and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is understood that the invention can be embodied in computer code (e.g., as part of an intellectual property (IP) core, such as a microcontroller core, or as a system-level design, such as a System on Chip (SOC)) and transformed to hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and computer code.

Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes as the present invention without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A microprocessor, comprising:

A return stack, configured to generate a first prediction of a target address of a return instruction;

A branch target address cache, configured to generate a second prediction of the target address of the return instruction and to generate an override indicator, wherein the override indicator has a predetermined value if the first prediction mispredicted the target address of a first occurrence of the return instruction; and

Branch control logic, coupled to the return stack and the branch target address cache, configured to branch the microprocessor, for a second occurrence of the return instruction, to the target address of the second prediction rather than to the first prediction if the override indicator has the predetermined value.

2. The microprocessor as claimed in claim 1, further comprising:

Update logic, coupled to the branch target address cache, configured to update the override indicator to the predetermined value if the first prediction mispredicted the target address of the first occurrence of the return instruction.

3. The microprocessor as claimed in claim 2, wherein if the second prediction mispredicts the target address of a third occurrence of the return instruction, the update logic updates the override indicator to a second predetermined value different from the predetermined value, and wherein when the override indicator has the second predetermined value, the branch control logic branches the microprocessor to the target address of the first prediction for a fourth occurrence of the return instruction.

4. The microprocessor as claimed in claim 3, further comprising:

A comparator, coupled to the branch control logic, configured to compare the first prediction with the second prediction, wherein if the override indicator has the second predetermined value, the branch control logic branches the microprocessor to the target address of the first prediction whenever the comparator indicates that the first prediction and the second prediction do not match.

5. The microprocessor as claimed in claim 3, wherein the third occurrence of the return instruction is the second occurrence.

6. The microprocessor as claimed in claim 1, wherein the return stack generates the first prediction after the branch target address cache generates the second prediction.

7. The microprocessor as claimed in claim 1, further comprising:

Instruction decode logic, coupled to the branch control logic, configured to decode the return instruction, wherein the return stack generates the first prediction in response to the instruction decode logic decoding the return instruction.

8. The microprocessor as claimed in claim 7, wherein the return stack stores the first prediction of the target address in response to the instruction decode logic decoding a call instruction.

9. The microprocessor as claimed in claim 1, wherein the return stack generates the first prediction substantially concurrently with the branch target address cache generating the second prediction.

10. The microprocessor as claimed in claim 1, wherein the branch target address cache is further configured to generate an indication that the return instruction is present in the instruction bytes of a cache line provided by an instruction cache.

11. The microprocessor as claimed in claim 10, wherein:

The return stack generates the first prediction in response to the branch target address cache generating the indication that the return instruction is present in the cache line; and

The branch target address cache generates the indication that the return instruction is present in the cache line in response to an address that specifies the cache line in the instruction cache.

12. The microprocessor as claimed in claim 1, wherein the return stack stores the first prediction of the target address in response to the branch target address cache generating an indication that a call instruction is present in a cache line of instructions.

13. The microprocessor as claimed in claim 1, further comprising:

A second return stack, coupled to the branch control logic, configured to generate a third prediction of the target address of the return instruction.

14. The microprocessor as claimed in claim 13, wherein if the override indicator has the predetermined value, the branch control logic branches the microprocessor, for the second occurrence of the return instruction, to the target address of the second prediction rather than to the target address of the third prediction; and if the override indicator has a value other than the predetermined value, the branch control logic branches the microprocessor, for the second occurrence of the return instruction, to the target address of the third prediction.

15. The microprocessor as claimed in claim 14, further comprising:

A comparator, coupled to the branch control logic, configured to compare the first prediction with the third prediction;

Wherein if the comparator indicates that the first prediction and the third prediction do not match, and the override indicator has a value other than the predetermined value, the branch control logic branches the microprocessor to the first prediction after branching to the third prediction.

16. The microprocessor as claimed in claim 1, wherein the branch control logic comprises a multiplexer configured to select one of the first prediction and the second prediction and to provide the selected prediction to an instruction cache as a fetch address, thereby branching the microprocessor to the selected prediction.

17. An apparatus for improving branch prediction accuracy in a microprocessor having a branch target address cache and a return stack, each configured to generate a prediction of a target address of a return instruction, the apparatus comprising:

An override indicator;

Update logic, coupled to the override indicator, configured to update the override indicator to a true value if the prediction generated by the return stack mispredicts the target address of a first occurrence of the return instruction; and

Branch control logic, coupled to the override indicator, configured to select, for a second occurrence of the return instruction, the prediction generated by the branch target address cache rather than the prediction generated by the return stack if the override indicator is true.

18. The apparatus as claimed in claim 17, wherein the override indicator is provided by the branch target address cache, which is configured to store a plurality of override indicators for a plurality of return instructions, wherein if one of the plurality of return instructions is the return instruction, the branch target address cache provides, as the override indicator, the one of the plurality of override indicators associated with the return instruction.

19. The apparatus as claimed in claim 18, wherein the branch target address cache determines, based on a fetch address input, whether one of the plurality of return instructions is the return instruction, and wherein the fetch address is an address input to an instruction cache of the microprocessor.

20. The apparatus as claimed in claim 17, wherein if the prediction generated by the branch target address cache mispredicts the target address of the first occurrence of the return instruction, the update logic updates the override indicator to a false value.

21. The apparatus as claimed in claim 17, wherein if the override indicator is false, the branch control logic selects, for the second occurrence of the return instruction, the prediction generated by the return stack rather than the prediction generated by the branch target address cache.

22. The apparatus as claimed in claim 17, further comprising:

A comparator, coupled to the branch control logic, configured to compare the prediction generated by the branch target address cache for the second occurrence of the return instruction with the prediction generated by the return stack for the second occurrence of the return instruction.

23. The apparatus as claimed in claim 22, wherein if the override indicator is false, the branch control logic selects the prediction generated by the branch target address cache for the second occurrence of the return instruction, and subsequently selects the prediction generated by the return stack for the second occurrence if the comparator indicates that the prediction generated by the return stack does not match the prediction generated by the branch target address cache.

24. The apparatus as claimed in claim 23, wherein the branch control logic receives the prediction generated by the branch target address cache in a first clock cycle and subsequently receives the prediction generated by the return stack in a second clock cycle.

25. A method for predicting a target address of a return instruction in a microprocessor, comprising:

Updating an override indicator to a true value in response to a return stack mispredicting the target address of the return instruction;

Generating, by a branch target address cache, a prediction of the target address after said updating;

Determining whether the override indicator has a true value after the branch target address cache generates the prediction; and

Branching the microprocessor to the prediction generated by the branch target address cache if the override indicator has a true value.

26. The method as claimed in claim 25, further comprising:

Updating the override indicator to a false value in response to the branch target address cache mispredicting the target address of the return instruction.

27. The method as claimed in claim 25, further comprising:

Branching the microprocessor to the prediction generated by the branch target address cache;

Generating, by the return stack, a prediction of the target address after the branch target address cache generates its prediction of the target address; and

Comparing the prediction generated by the branch target address cache with the prediction generated by the return stack after branching the microprocessor to the prediction generated by the branch target address cache.

28. The method as claimed in claim 27, further comprising:

Branching the microprocessor to the prediction generated by the return stack if the prediction generated by the branch target address cache does not match the prediction generated by the return stack.

29. The method as claimed in claim 25, further comprising:

Generating, by the return stack, a prediction of the target address of the return instruction after said updating; and

Branching the microprocessor to the prediction generated by the return stack if the override indicator has a false value.

30. The method as claimed in claim 29, further comprising:

Predicting, by the branch target address cache in response to a fetch address, that the return instruction is present in a cache line provided by an instruction cache, wherein the fetch address specifies the cache line provided by the instruction cache.

31. The method as claimed in claim 30, wherein said generating of the prediction by the branch target address cache comprises generating the target address in response to the branch target address cache predicting that the return instruction is present in the cache line.

32. The method as claimed in claim 30, wherein said generating of the prediction by the return stack comprises generating the target address in response to the branch target address cache predicting that the return instruction is present in the cache line.

33. The method as claimed in claim 25, further comprising:

Decoding the return instruction after the branch target address cache generates the prediction of the target address.

34. The method as claimed in claim 33, wherein said generating of the prediction of the target address by the return stack comprises generating the target address in response to said decoding of the return instruction.

35. The method as claimed in claim 33, wherein the return instruction is decoded after it is output by an instruction cache.

36. An apparatus for improving branch prediction accuracy in a microprocessor having a return stack and an alternate prediction mechanism, each configured to generate a prediction of a target address of a return instruction, and a branch target address cache, the apparatus comprising:

An override indicator, provided by the branch target address cache;

Update logic, coupled to the override indicator, configured to update the override indicator in the branch target address cache to a true value if the prediction generated by the return stack mispredicts the target address of a first occurrence of the return instruction; and

Branch control logic, coupled to the override indicator, configured to select, for a second occurrence of the return instruction, the prediction generated by the alternate prediction mechanism rather than the prediction generated by the return stack if the override indicator is true.

37. A computer data signal embodied in a transmission medium, comprising:

Computer readable program code for providing a microprocessor, the program code comprising:

First program code for providing a return stack, configured to generate a first prediction of a target address of a return instruction;

Second program code for providing a branch target address cache, configured to generate a second prediction of the target address of the return instruction and to generate an override indicator, wherein the override indicator has a predetermined value if the first prediction mispredicted the target address of a first occurrence of the return instruction; and

Third program code for providing branch control logic, coupled to the return stack and the branch target address cache, configured to branch the microprocessor, for a second occurrence of the return instruction, to the target address of the second prediction rather than to the first prediction if the override indicator has the predetermined value.