CN101819522A - Microprocessor and method for analyzing related instruction - Google Patents

Info

Publication number
CN101819522A
Authority
CN
China
Prior art keywords
mentioned
calling
return
instruction
link order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010126829
Other languages
Chinese (zh)
Other versions
CN101819522B (en
Inventor
Brent Bean (布兰特·比恩)
Terry Parks (泰瑞·派克斯)
G. Glenn Henry (G·葛兰·亨利)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/481,074 (US7975132B2)
Application filed by Via Technologies Inc
Publication of CN101819522A
Application granted
Publication of CN101819522B
Legal status: Active
Anticipated expiration

Landscapes

  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The invention provides a microprocessor and a method for resolving call and return instructions. The microprocessor has a plurality of call/return stacks and correctly resolves a call or return instruction itself, rather than sending the instruction to the execution units of the microprocessor. The microprocessor fetches a call or return instruction and determines whether it is the first call or return instruction fetched after an unresolved conditional branch instruction. If it is, the microprocessor copies the contents of the current call/return stack to another call/return stack and designates that other stack as the current call/return stack. If the instruction is a call instruction, the microprocessor pushes the address of the instruction following the call instruction onto the current call/return stack and fetches the instruction at the target address of the call instruction. If the instruction is a return instruction, the microprocessor pops a return address from the current call/return stack and fetches the instruction at that return address.

Description

Microprocessor and method for analyzing related instruction
Technical field
The present invention relates to microprocessors, and in particular to the processing of call and return instructions in a pipelined microprocessor.
Background technology
Programs commonly include subroutine call instructions and subroutine return instructions. A call instruction changes program flow from the current program or instruction sequence to an instruction sequence different from the one currently being fetched and executed. A call instruction specifies a call address, also called a target address, which is the address of the first instruction of the subroutine. In addition, a call instruction directs the processor to save the address of the instruction following the call instruction; this address is called the return address. A return instruction also changes program flow to an instruction sequence different from the one currently being fetched and executed. However, a return instruction does not explicitly specify a target address. Instead, a return instruction directs the microprocessor to use the most recently saved return address as the address of the first instruction of the different instruction sequence, that is, the address at which execution resumes after the now-returning subroutine. A return instruction in the now-returning subroutine thus causes the processor to begin fetching at the instruction that follows the most recently executed call instruction.
Call and return instructions update the architectural state of the system. For example, in a conventional x86 architecture processor, a call instruction updates the architectural stack pointer register and updates memory (that is, it pushes a return address onto the stack in memory at the stack pointer value). A return instruction also updates the architectural stack pointer register.
Many conventional processors also execute instructions speculatively. That is, when a conventional processor encounters a conditional branch instruction, it predicts the outcome of the branch (its direction and target address) and continues fetching and executing instructions according to the prediction. When a call or return instruction lies on a predicted instruction path, the processor updates the architectural state associated with the call or return instruction only once it is no longer executing speculatively, that is, only after it has resolved all conditional branch instructions that precede the call or return instruction. In other words, a conventional processor sends the call or return instruction to its execution units, and the architectural state associated with the instruction is updated only after the execution units have resolved every conditional branch instruction that was fetched before the call or return instruction and has not yet retired. Call and return instructions are therefore executed and retired through the execution and retirement stages like other instructions (for example, conditional branch instructions). Consequently, measured in clock cycles, the latency of a call or return instruction is the same as that of other instructions. Moreover, call and return instructions consume considerable resources, such as execution unit time slots, register alias table (RAT) entries, reservation station entries, and reorder buffer entries.
Therefore, what is needed is a microprocessor with an improved technique for allowing programs to call subroutines and return from them.
Summary of the invention
In view of this, the invention provides a microprocessor and a method for resolving instructions therein, which allow a program to call subroutines and return from them.
One embodiment provides a method for resolving instructions, applicable to a fetch unit of a microprocessor, in which the fetch unit correctly resolves a call or return instruction rather than sending it to the execution units of the microprocessor. The microprocessor has a plurality of call/return stacks. The method comprises the following steps. First, a call or return instruction is fetched. Then, in response to the fetching step, when the call or return instruction is the first one the fetch unit has fetched after fetching an unresolved conditional branch instruction, the fetch unit copies the contents of a current call/return stack to another of the call/return stacks and designates that other stack as the current call/return stack. When the instruction is a call instruction, the fetch unit pushes a first return address onto the current call/return stack and fetches the instruction at the target address specified by the call instruction, where the first return address is the address of the instruction following the call instruction. When the instruction is a return instruction, the fetch unit pops a second return address from the current call/return stack and fetches the instruction at the second return address.
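The method steps above can be sketched in software as follows. This is a minimal illustrative model, not the disclosed hardware: the class name `FastCallReturnUnit`, the flag `unresolved_branch_seen`, the `call_len` parameter, and the round-robin choice of the "other" call/return stack are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FastCallReturnUnit:
    """Toy model: several call/return stacks, one of which is current."""
    num_stacks: int = 4
    stacks: list = field(default_factory=list)
    current: int = 0                      # index of the current call/return stack
    unresolved_branch_seen: bool = False  # hypothetical flag, set when a branch is fetched

    def __post_init__(self):
        self.stacks = [[] for _ in range(self.num_stacks)]

    def note_unresolved_branch(self):
        """Record that an unresolved conditional branch has been fetched."""
        self.unresolved_branch_seen = True

    def _maybe_switch_stack(self):
        # First call/return after an unresolved branch: copy the current
        # stack to another stack and make that copy the current stack.
        if self.unresolved_branch_seen:
            nxt = (self.current + 1) % self.num_stacks  # assumption: round-robin
            self.stacks[nxt] = list(self.stacks[self.current])
            self.current = nxt
            self.unresolved_branch_seen = False

    def fast_call(self, call_addr, call_len, target):
        """Push the address after the call; return the next fetch address."""
        self._maybe_switch_stack()
        self.stacks[self.current].append(call_addr + call_len)
        return target

    def fast_return(self):
        """Pop a return address; it becomes the next fetch address."""
        self._maybe_switch_stack()
        return self.stacks[self.current].pop()
```

In this sketch a call at address 0x100 that is 5 bytes long pushes 0x105; if a conditional branch is then fetched, the following return operates on a copy of the stack, so the original stack still holds 0x105 and can be used if the branch turns out to be mispredicted.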
Another embodiment of the invention provides a microprocessor comprising a plurality of execution units for resolving conditional branch instructions. The microprocessor comprises a plurality of call/return stacks. The microprocessor further comprises a fetch unit, coupled to the call/return stacks, for fetching a call or return instruction. When the call or return instruction is the first one fetched after the fetch unit has fetched an unresolved conditional branch instruction, the fetch unit copies the contents of a current call/return stack to another of the call/return stacks and designates that other stack as the current call/return stack. When the instruction is a call instruction, the fetch unit pushes a first return address onto the current call/return stack and fetches the instruction at the target address specified by the call instruction, where the first return address is the address of the instruction following the call instruction. When the instruction is a return instruction, the fetch unit pops a second return address from the current call/return stack and fetches the instruction at the second return address.
One advantage of the invention is that, because the fetch unit (which may be a microcode unit) executes and retires fast call and return instructions directly without dispatching them to the execution pipeline, fast call and return instructions can execute with less latency than conventional call and return instructions. Second, because the fetch unit executes fast call and return instructions correctly, the mispredictions that occur with conventional call and return instructions, and the microprocessor resources spent on the subsequent correction, are avoided. Another advantage is that fast call and return instructions can be resolved and retired using fewer microprocessor resources than conventional call and return instructions require. For example, because a fast call or return instruction is not dispatched to the execution pipeline, it does not occupy entries in the register alias table, reservation stations, execution units, or reorder buffer.
Description of drawings
Fig. 1 is a block diagram of a microprocessor according to an embodiment of the invention.
Fig. 2 is a block diagram of the fast call/return stack unit 122 of Fig. 1 according to the invention.
Fig. 3 is a flowchart of the initialization of the fast call/return stack unit according to an embodiment of the invention.
Fig. 4 is a flowchart of the operation of the fetch unit of Fig. 1 in processing a fast call instruction according to an embodiment of the invention.
Fig. 5 is a flowchart of the operation of the fetch unit of Fig. 1 in executing a fast return instruction according to an embodiment of the invention.
Fig. 6 is a flowchart of the operation of the microprocessor of Fig. 1 in executing a conditional branch instruction according to an embodiment of the invention.
Fig. 7a, Fig. 7b and Fig. 7c are tables showing the operation of the microprocessor 100 of Fig. 1 in executing a program sequence according to an embodiment of the invention.
Fig. 8 is a block diagram of a microprocessor according to another embodiment of the invention.
Fig. 9 is a flowchart of the operation of the microprocessor of Fig. 8 in processing a user program implemented in microcode according to an embodiment of the invention, where the user program may include fast call and return instructions that can be fetched and executed by a microcode unit.
Embodiment
To make the above and other objects, features, and advantages of the invention more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of a microprocessor 100 according to the invention. The microprocessor 100 includes a fetch unit 104, which fetches program instructions from an instruction cache 106. An instruction pointer register 112 provides a fetch address 168 to the instruction cache 106, and the fetch unit 104 fetches instructions from the instruction cache 106 at the fetch address 168. The fetched program instructions may include user program instructions, such as those of an operating system or an application program. The fetched instructions may include normal instructions, among them conditional branch instructions (hereinafter CB). The fetched instructions also include fast call and return instructions according to the invention. A normal instruction is a program instruction that is executed by the execution units 138 and retired by the retire unit 144 of the microprocessor 100. In contrast, a fast call or return instruction is a program instruction that is resolved and retired by the fetch unit 104 or the microcode unit 128 of the microprocessor 100, rather than being handled by the execution units 138 and the retire unit 144.
The fetch unit 104 also includes a branch predictor 118. In response to the instruction pointer register 112 of the fetch unit 104 providing the fetch address 168 to the instruction cache 106, the branch predictor 118 provides a predicted target address 156 to a multiplexer 114 and a predicted direction 158 to a fetch address control unit 126. In particular, the branch predictor 118 indicates to the fetch address control unit 126 whether the predicted target address 156 and the predicted direction 158 are valid, based on whether the fetch address 168 specifies a cache line that previously contained a conditional branch instruction.
In one embodiment, the branch predictor 118 includes a branch target cache (not shown). When the fetch address 168 hits in the branch target cache, the branch predictor 118 provides the predicted target address 156 to the multiplexer 114 and the predicted direction 158 to the fetch address control unit 126. The microprocessor 100 updates the branch target cache with the addresses and resolved target addresses of previously executed conditional branch instructions. In addition, the microprocessor 100 updates the branch target cache with direction prediction information derived from the resolved directions of previously executed conditional branch instructions.
The fetch unit 104 reads program instructions from the instruction cache 106 and passes the fetched program instructions to an instruction decoder 108. The instruction decoder 108 decodes the fetched instructions and determines whether each one is to be sent to the execution pipeline of the microprocessor 100 or executed and retired within the fetch unit 104. Here, the execution pipeline is the portion of the pipeline shown below the fetch unit 104 and the microcode unit 128 in Fig. 1, namely the multiplexer 132, register alias table 134, instruction scheduler 136, execution units 138, and retire unit 144. A conditional branch instruction is one example of a normal instruction 146 that the instruction decoder 108 sends to the execution units. In addition to sending such a normal instruction 146 to the execution units of the microprocessor 100, the instruction decoder 108 generates a CB issued indication 162 to the fast call/return stack unit 122 of the fetch unit 104. The use of the CB issued indication 162 is covered in the description of the fast call/return stack unit 122 below.
The instruction decoder 108 passes normal instructions 146 to a multiplexer 132, which selects between the normal instructions 146 from the fetch unit 104 and normal instructions 148 from a microcode unit 128. The microcode unit 128 provides the normal instructions 148 to the multiplexer 132, as described in detail with respect to Fig. 8. The multiplexer 132 passes the normal instructions 146/148 to the register alias table 134.
The register alias table (RAT) 134 determines the operand dependencies of all normal instructions 146/148. A normal instruction 146/148 currently in the register alias table 134 may have one or more operands that depend on the results of previously issued normal instructions 146/148. After determining these operand dependencies, the register alias table 134 passes the normal instruction 146/148 to the instruction scheduler 136. In the case of a conditional branch instruction 146, an operand dependency may exist on the result of a previous instruction that is needed to resolve the branch condition or the target address of the conditional branch instruction 146.
The instruction scheduler 136 schedules normal instructions 146/148 for execution. Before issuing a normal instruction 146/148 for execution, the instruction scheduler 136 waits until the required operands are available. In the case of a conditional branch instruction 146, the result of the previous normal instruction 146 must be available before the instruction scheduler 136 issues the conditional branch instruction 146 for execution. The instruction scheduler 136 passes normal instructions 146/148 whose operands are available to the execution units 138 of the microprocessor 100.
The execution units 138 execute the normal instructions 146/148. For a conditional branch instruction 146, the execution units 138 calculate the correct branch direction and target address. They then compare the calculated direction with the predicted direction 158 that was carried down the execution pipeline along with the conditional branch instruction 146, and compare the calculated target address with the predicted target address 156 that was likewise carried down the execution pipeline. If both predicted values are correct, the branch predictor 118 predicted the branch correctly, and the fetch unit 104 fetched the appropriate instructions after fetching the conditional branch instruction 146. Otherwise, the branch predictor 118 mispredicted the branch, the fetch unit 104 fetched the wrong instructions after fetching the conditional branch instruction 146, and the misprediction must be corrected. The execution units 138 pass all executed normal instructions 146/148, together with the conditional branch misprediction information, to a retire unit 144.
The retire unit 144 is located architecturally at the end of the execution pipeline; it writes execution results back to the architectural registers of the microprocessor 100 and retires the normal instructions 146/148. The execution pipeline of the microprocessor 100 comprises the multiplexer 132, register alias table 134, instruction scheduler 136, execution units 138, and retire unit 144. For a conditional branch instruction 146, the retire unit 144 generates a CB mispredicted indication 178 to the fetch unit 104 and the microcode unit 128. The CB mispredicted indication 178 is true when the branch predictor 118 mispredicted the branch outcome. In that case, the retire unit 144 also generates a CB correct target address 176. This address is the branch target address specified by the conditional branch instruction 146 if the execution units 138 resolved the branch as taken, or the next sequential address after the conditional branch instruction 146 if the execution units 138 resolved the branch as not taken. A mispredicted conditional branch instruction 146 causes the microprocessor 100 to flush from its execution pipeline all instructions that are newer (in program order) than the conditional branch instruction 146 and to begin fetching instructions at the CB correct target address 176 generated by the retire unit 144.
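The resolution of a conditional branch described above can be illustrated with a short sketch. The function name `resolve_branch` and its parameters are hypothetical, and treating a "not taken" prediction as a prediction of the next sequential address is an assumption made so that direction and target can be compared uniformly.

```python
def resolve_branch(pred_taken, pred_target, taken, branch_target, next_sequential):
    """Return (mispredicted, correct_target) for one conditional branch.

    correct_target models the CB correct target address: the branch target
    if the branch is taken, otherwise the next sequential address.
    """
    correct_target = branch_target if taken else next_sequential
    # Assumption: a branch predicted not taken implies fetching sequentially.
    predicted_next = pred_target if pred_taken else next_sequential
    mispredicted = (pred_taken != taken) or (predicted_next != correct_target)
    return mispredicted, correct_target
```

A branch counts as correctly predicted only when both the direction and the resulting fetch address match; in every other case the second element of the result is the address at which fetching would restart after the flush.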
The fetch address control unit 126 in the fetch unit 104 receives the CB mispredicted indication 178 and the CB correct target address 176 from the retire unit 144. The fetch address control unit 126 generates a multiplexer select signal 152, which controls the multiplexer 114 to select one of several candidate addresses at which the fetch unit 104 fetches instructions from the instruction cache 106. The multiplexer 114 loads the selected address into the instruction pointer register 112. If the CB mispredicted indication 178 is true, the fetch address control unit 126 generates the multiplexer select signal 152 to select the CB correct target address 176. If the CB mispredicted indication 178 is false, the fetch address control unit 126 by default generates the multiplexer select signal 152 to select the next sequential IP address (hereinafter NSIP) 154, unless the branch predictor 118 predicts a taken branch or the instruction decoder 108 indicates that a fast call or return instruction has been encountered. NSIP 154 is the next sequential address after the fetch address 168 and is generated by an incrementer 116.
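The address selection performed through the multiplexer 114 can be sketched as a simple priority function. The function name and keyword parameters are hypothetical. The text states only that a misprediction selects the CB correct target address and that NSIP is the default; the relative order of the fast call, fast return, and predicted-taken cases below is an assumption.

```python
def select_fetch_address(nsip, *, mispredicted=False, correct_target=None,
                         is_call=False, call_address=None,
                         is_return=False, popped_return_address=None,
                         predicted_taken=False, predicted_target=None):
    """Pick the next fetch address from the candidate inputs."""
    if mispredicted:
        return correct_target          # CB correct target address
    if is_call:
        return call_address            # target of a fast call
    if is_return:
        return popped_return_address   # address popped from the current stack
    if predicted_taken:
        return predicted_target        # branch predictor's target
    return nsip                        # default: next sequential IP address
```

For example, `select_fetch_address(0x104)` yields 0x104, while passing `mispredicted=True` together with `correct_target=0x200` yields 0x200 regardless of the other inputs.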
The instruction decoder 108 decodes fast call and return instructions fetched from the instruction cache 106. The instruction decoder 108 generates a true value on a call instruction indication 124 or a return instruction indication 142, respectively, to the fast call/return stack unit 122 and the fetch address control unit 126. The fast call/return stack unit 122 correctly and non-selectively executes and retires fast call and return instructions within the fetch unit 104 (that is, under no circumstances does the fetch unit 104 dispatch a fast call or return instruction to the execution units 138), so fast call and return instructions are never passed to the execution pipeline of the microprocessor 100. In particular, executing and retiring a fast call or return instruction as described here includes updating the associated architectural state. Therefore, the microprocessor 100 can execute and retire fast call and return instructions correctly and in fewer clock cycles than normal call and return instructions that are dispatched to the execution pipeline. It should be noted that the early execution and retirement of fast call and return instructions described here differs from the prediction of call or return instructions known in the art of processor design. A conventional microprocessor that predicts call and return instructions must perform the prediction and, on discovering a misprediction, flush the incorrectly fetched instructions and fetch instructions at the correct target address (that is, the correct call or return address). In contrast, the fast call/return stack unit 122 of the invention always correctly executes and resolves fast call and return instructions within the fetch unit 104. The execution units 138 and the retire unit 144 are therefore not needed to execute these call and return instructions correctly, and there is no misprediction by the fast call/return stack unit 122 to correct, because the fast call/return stack unit 122 does not predict fast call and return instructions.
The instruction decoder 108 also provides the call instruction indication 124 and the return instruction indication 142 to the fetch address control unit 126, and provides a call address 164 to the multiplexer 114. The fast call/return stack unit 122 also receives the CB issued indication 162 from the instruction decoder 108 and the CB mispredicted indication 178 from the retire unit 144. Finally, the fast call/return stack unit 122 provides a popped return address 166 to the fetch address control unit 126. The fast call/return stack unit 122 and its operation are described in detail below with reference to the block diagram of Fig. 2, the tables of Figs. 7a-7c, and the flowcharts of Figs. 3-6.
Fig. 2 is a block diagram of the fast call/return stack unit 122 of Fig. 1 according to the invention. The fast call/return stack unit 122 includes a plurality of call/return stacks 212 (hereinafter CRS 212), shown in the figure as call/return stack 0 212 through call/return stack 3 212. In one embodiment, the fast call/return stack unit 122 includes four CRS 212. Although all embodiments include a plurality of CRS 212 in the fast call/return stack unit 122, other embodiments may include more or fewer than four CRS 212.
Each CRS 212 includes the same number of entries, where each entry stores a return address from a fast call instruction, and the entries are arranged in last-in, first-out (LIFO) fashion. In one embodiment, each CRS 212 has eight call/return stack entries (although Fig. 2 shows only six), but other embodiments may have more or fewer than eight entries in each CRS 212. The number of entries in a CRS 212 is a design choice, which may take into account the maximum number of fast call instructions a program is expected to execute without an intervening fast return instruction, as well as the power each CRS 212 consumes in the microprocessor 100. When the fast call/return stack unit 122 executes a fast call instruction, it pushes a return address onto the appropriate one of the CRS 212, and when it executes a fast return instruction, it pops a return address from the appropriate one of the CRS 212. The fast call/return stack unit 122 includes a stack pointer (not shown) for each CRS 212, which indicates the entry at which a return address is pushed or popped.
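A single call/return stack of this kind can be modeled as a fixed array of entries plus a stack pointer. The sketch below is illustrative only: the class name `CallReturnStack` is hypothetical, and raising an exception on overflow or on a return without a matching call is an assumption, since the text does not say how the hardware behaves in those cases.

```python
class CallReturnStack:
    """Fixed-depth LIFO of return addresses with an explicit stack pointer."""

    def __init__(self, depth=8):
        self.entries = [None] * depth
        self.sp = 0  # number of valid entries, i.e. index of the next free slot

    def push(self, return_address):
        if self.sp == len(self.entries):
            # Assumption: overflow is reported as an error in this model.
            raise OverflowError("call/return stack is full")
        self.entries[self.sp] = return_address
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            # Assumption: a return with no matching call is an error here.
            raise IndexError("fast return without a matching fast call")
        self.sp -= 1
        return self.entries[self.sp]
```

Pushing 0x105 and then 0x20A and popping twice returns 0x20A first and 0x105 second, which is the last-in, first-out order that nested subroutine calls require.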
Each CRS 212 has a corresponding call/return stack counter 214 (hereinafter CRSC 214), so the fast call/return stack unit 122 includes the same number of CRSC 214 as CRS 212. Each CRSC 214 holds a count of the unresolved conditional branch instructions 146 that the fetch unit 104 dispatched to the execution pipeline while the corresponding CRS 212 was the current CRS 212. In other words, each CRSC 214 holds a count of the unresolved conditional branch instructions 146 seen at the speculation level associated with its CRS 212. In one embodiment, each CRSC 214 is a 6-bit counter that can count up to 64 unresolved conditional branch instructions. In other embodiments, the CRSC 214 may have more or fewer bits. An unresolved conditional branch instruction 146 is a conditional branch instruction 146 that has not yet been resolved by the execution units 138 and has not yet been retired by the retire unit 144. That is, the execution units 138 have not yet determined the correct branch direction and target address, and the retire unit 144 has not yet generated a true or false value on the CB mispredicted indication 178 of Fig. 1 to indicate whether the conditional branch instruction was predicted correctly or incorrectly.
The fast call/return stack unit 122 includes a speculative pointer 206 and a non-speculative pointer 208, each of which stores a value that points to one of the CRS 212. The speculative pointer 206 points to the current CRS 212, that is, the CRS 212 onto which a return address is pushed or from which a return address is popped in response to a fast call or return instruction, respectively. The non-speculative pointer 208 points to the CRS 212 that contains the return addresses associated with all unretired call instructions that precede, in program order, any unresolved conditional branch instruction 146. In other words, as discussed with respect to step 626 of Fig. 6 below, the non-speculative pointer 208 points to the CRS 212 to which the fetch unit 104 reverts when a conditional branch instruction 146 is resolved as mispredicted, at which point the fetch unit 104 copies the non-speculative pointer 208 to the speculative pointer 206.
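The roles of the two pointers and the per-stack counters can be sketched as follows. The class and method names are hypothetical, and the model covers only the two behaviors stated so far: a counter is incremented when a conditional branch is issued while its stack is current, and a misprediction copies the non-speculative pointer to the speculative pointer. How the counters are decremented or cleared, and how the non-speculative pointer advances, is left out because this passage does not specify it.

```python
class StackPointers:
    """Speculative and non-speculative pointers over a set of stacks."""

    def __init__(self, num_stacks=4):
        self.speculative = 0       # index of the current call/return stack
        self.non_speculative = 0   # index of the stack to fall back to
        self.counters = [0] * num_stacks  # one counter per stack

    def on_branch_issued(self):
        # Count an unresolved branch against the stack that is current now.
        self.counters[self.speculative] += 1

    def on_branch_mispredicted(self):
        # Recovery: the non-speculative stack becomes the current stack again.
        self.speculative = self.non_speculative
```

If the speculative pointer has moved on to stack 2 when a misprediction is reported, `on_branch_mispredicted()` brings it back to the stack named by the non-speculative pointer, discarding the speculative pushes and pops without touching the fallback stack.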
The fast call/return stack unit 122 also includes control logic 204, which controls the reading, writing, incrementing, decrementing, and clearing of the CRS 212, CRSC 214, speculative pointer 206, and non-speculative pointer 208. In response to the call instruction indication 124 received from the instruction decoder 108 of Fig. 1, the control logic 204 generates a pushed return address 232 to the current CRS 212 indicated by the speculative pointer 206. The value of the pushed return address 232 is the address of the instruction following the call instruction; when the address of the call instruction is the fetch address 168 of Fig. 1, the value of the pushed return address 232 is NSIP 154. In response to the return instruction indication 142 received from the instruction decoder 108 of Fig. 1, the control logic 204 causes the current CRS 212 indicated by the speculative pointer 206 to provide the popped return address 166 of Fig. 1 to the multiplexer 114. The control logic 204 also reads each CRSC 214 via read call/return stack counter signals 228 and writes the CRSC 214 via increment/clear/decrement/select call/return stack counter signals 226.
The control logic 204 reads the speculative pointer 206 via signal 216 in order to make control decisions based on its value, and writes the speculative pointer 206 via signal 218 according to those decisions. Likewise, the control logic 204 reads the non-speculative pointer 208 via signal 224 in order to make control decisions based on its value, and writes the non-speculative pointer 208 via signal 222 according to those decisions. The detailed operation of the control logic and of the fast call/return stack unit 122 is described below with reference to the flowcharts of Fig. 3 to Fig. 6 and the diagrams of Fig. 7a to Fig. 7c.
Taken together, the CRSs 212 and CRSCs 214 constitute the architectural state of the microprocessor 100 that is associated with fast call and return instructions. Consequently, once the fast call/return stack unit 122 has updated a CRS 212 in response to the fetch unit 104 decoding a fast call or return instruction, that instruction has been correctly executed and retired by the fetch unit 104. The CRSs 212 and CRSCs 214 can be modified only indirectly, by the fast call and return instructions of the instruction set architecture of the microprocessor 100, and cannot be modified by any other instruction in the instruction set architecture. Moreover, because fast call and return instructions are executed and retired by the fetch unit 104, the CRSs 212 and CRSCs 214 can be modified only by the fetch unit 104 and not by other units (e.g., the execution units). This is in contrast to conventional microprocessors, in which the architectural state associated with conventional call and return instructions can be modified by instructions other than call and return instructions. For example, in the x86 architecture, the architectural state associated with the CALL and RET instructions is the architectural stack pointer register and memory, which can be modified by other instructions of the instruction set, such as the PUSH, POP, ENTER, LEAVE, and MOV instructions. In addition, the instruction set architecture imposes a restriction on programs that contain fast call and return instructions: before executing each fast return instruction, the program must already have executed a corresponding fast call instruction.
As described above, the CRS 212 is not used to predict the return address of a fast return instruction. The CRS 212 should therefore not be confused with the call/return stacks, well known in the art of processor design, that are used to predict return addresses, such as the internal call/return stack described in U.S. Patent No. 6,314,514, entitled "METHOD AND APPARATUS FOR CORRECTING AN INTERNAL CALL/RETURN STACK IN A MICROPROCESSOR THAT SPECULATIVELY EXECUTES CALL AND RETURN INSTRUCTION". The internal call/return stack described in U.S. Patent No. 6,314,514 is not part of the architectural state of the system. Rather, in the system of that patent, the architectural state that stores the return addresses resides in system memory, and the internal call/return stack simply attempts to keep a cached version of that architectural state. However, the internal call/return stack may become inconsistent with the architectural state in system memory. That is, although the invention of U.S. Patent No. 6,314,514 attempts to keep the internal call/return stack consistent with the architectural stack in system memory, the return address it provides is a prediction that may need to be corrected, at a cost of a considerable number of clock cycles. In contrast, the CRS 212 of the present invention is included in the architectural state, located within the microprocessor 100, that is associated with the fast call and return instructions.
It should be noted that embodiments are contemplated in which, in addition to the fast call and return instructions described herein, the instruction set architecture of the microprocessor 100 also includes normal call and return instructions whose architectural state differs from the architectural state associated with the fast call and return instructions. In one embodiment, the architectural state associated with the normal call and return instructions is a stack in system memory pointed to by an architectural stack pointer register. Furthermore, the fetch unit 104 distinguishes normal call and return instructions from fast call and return instructions. More specifically, the fetch unit 104 dispatches normal call and return instructions to the execution pipeline for execution and retirement, whereas the fetch unit 104 itself correctly executes and retires fast call and return instructions. In one embodiment, the normal call and return instructions are the x86 architecture CALL and RET instructions. Unless otherwise noted, a call or return instruction referred to in the embodiments herein is a fast call or return instruction, respectively, rather than a normal call or return instruction.
Fig. 3 is a flowchart, according to an embodiment of the invention, illustrating the initialization of the fast call/return stack unit 122. Flow begins at step 304.
In step 304, the microprocessor 100 is powered up, an exception condition occurs, or the fast call/return stack unit 122 receives the first call instruction indication 124, the first return instruction indication 142, or the first CB issued indication 162. Flow proceeds to step 306.
In step 306, the fetch unit 104 clears the speculative pointer 206, the non-speculative pointer 208, the CRSs 212, and the CRSCs 214. Flow ends at step 306.
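The state cleared in step 306 can be pictured with a small software model. The following is only a sketch under stated assumptions: the class and field names are hypothetical, the number of stacks (four) is borrowed from the example of Fig. 7a, and each CRS is modeled as an unbounded Python list rather than a fixed-depth hardware stack.

```python
from dataclasses import dataclass, field

NUM_CRS = 4  # assumption: four stacks, matching the Fig. 7a example


@dataclass
class FastCallReturnStackUnit:
    """Toy model of unit 122; every name here is hypothetical."""
    crs: list = field(default_factory=lambda: [[] for _ in range(NUM_CRS)])  # CRS 212
    crsc: list = field(default_factory=lambda: [0] * NUM_CRS)                # CRSC 214
    spec_ptr: int = 0     # speculative pointer 206
    nonspec_ptr: int = 0  # non-speculative pointer 208

    def reset(self) -> None:
        """Step 306: clear both pointers, every CRS and every CRSC."""
        self.crs = [[] for _ in range(NUM_CRS)]
        self.crsc = [0] * NUM_CRS
        self.spec_ptr = 0
        self.nonspec_ptr = 0


unit = FastCallReturnStackUnit()
unit.crs[0].append(0x108)  # pretend a return address was pushed
unit.crsc[0] = 1           # and one conditional branch is unresolved
unit.spec_ptr = 1
unit.reset()               # power-up or exception: back to a clean state
```

After `reset()`, both pointers select CRS 0 and every stack and counter is empty, which is the starting point assumed by the flows of Fig. 4 to Fig. 6.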
Fig. 4 is a flowchart, according to an embodiment of the invention, illustrating the operation of the fetch unit 104 of Fig. 1 in processing a fast call instruction. Flow begins at step 404.
In step 404, after the instruction pointer register 112 provides the fetch address 168 to the instruction cache 106, the fetch unit 104 fetches a fast call instruction from the instruction cache 106. Flow proceeds to step 406.
In step 406, the instruction decoder 108 decodes the fetched fast call instruction and generates the call instruction indication 124 to the fast call/return stack unit 122 and to the fetch address control unit 126. The instruction decoder 108 also extracts or calculates the call address 164 from the decoded call instruction and provides the call address 164 to the multiplexer 114. Flow proceeds to step 408.
In step 408, the fetch unit 104 determines whether there are any unresolved conditional branch instructions by testing whether the current call/return stack counter (CRSC 214) contains a nonzero value. A nonzero value in the selected CRSC 214 indicates that unresolved conditional branch instructions 146 are outstanding; a zero value indicates that there are none. Flow proceeds to decision step 412.
In decision step 412, if unresolved conditional branch instructions 146 are outstanding, flow proceeds to decision step 422; otherwise, flow proceeds to step 414.
In step 414, the fetch unit 104 pushes the return address of the fast call instruction onto the current CRS 212 indicated by the speculative pointer 206. The control logic 204 in the fast call/return stack unit 122 reads the speculative pointer 206 to determine the current CRS 212, writes the pushed return address 232 (which is the NSIP 154) to the CRS 212 selected by the value read from the speculative pointer 206, and updates the stack pointer of the current CRS 212 accordingly. In one embodiment, if pushing the return address would overflow the current CRS 212, the microprocessor 100 generates a stack overflow exception. The exception handler saves the contents of the current CRS 212 to memory in order to free the space needed for the return address. Conversely, if popping the return address in step 514 of Fig. 5 would underflow the current CRS 212, the microprocessor 100 generates a stack underflow exception, and the exception handler restores the contents of the current CRS 212 from memory. In this embodiment, when a poorly behaved program generates a relatively large number of such exceptions, the cost of generating and handling them may offset the benefit of the fast call and return instructions described herein; however, a well-behaved program (that is, one that avoids nesting fast call instructions deeper than the depth of the CRS 212) still benefits from the fast call and return instructions. Flow proceeds to step 416.
In step 416, in response to the call instruction indication 124, the fetch address control unit 126 controls the multiplexer 114 via the multiplexer select signal 152 to load the call address 164 into the instruction pointer register 112, so that the next instruction is fetched from the instruction cache 106 at the call address 164. Flow proceeds to step 418.
In step 418, the fetch unit 104 retires the fast call instruction. In particular, the fast call instruction is not dispatched to the execution pipeline. Flow ends at step 418.
In decision step 422, the fetch unit 104 determines whether the fast call instruction is the first call or return instruction fetched after a conditional branch instruction 146. When the instruction decoder 108 dispatches a conditional branch instruction 146 to the execution pipeline, it notifies the fast call/return stack unit 122 by generating a true value on the CB issued indication 162; the retire unit 144 likewise notifies the fast call/return stack unit 122 when a conditional branch instruction retires. The control logic 204 can therefore keep track of the number of fast call or return instructions dispatched since the CB issued indication 162 was last true. If the fetch unit 104 determines that the fast call instruction is the first call or return instruction fetched after a conditional branch instruction 146, the speculation level is to be increased and flow proceeds to step 424; otherwise, the speculation level is not increased and flow proceeds to step 414.
In step 424, if no CRS 212 is available at the current speculation level, the fetch unit 104 stops fetching instructions and suspends execution of the fast call instruction until a CRS 212 becomes available. No CRS 212 is available at the current speculation level when incrementing the speculative pointer 206 in step 428 (or in step 528 of Fig. 5) would make its value equal to the value of the non-speculative pointer 208. As discussed below, when the instruction decoder 108 decodes the first fast call instruction after issuing a conditional branch instruction 146 to the execution pipeline (or decodes a fast return instruction, in step 526 of Fig. 5), the fast call/return stack unit 122 allocates a new CRS 212 in step 426, after which the newly allocated CRS 212 can no longer be allocated. Conversely, a CRS 212 may become available for allocation when a conditional branch instruction 146 is resolved. In particular, when a conditional branch instruction is mispredicted, one or more CRSs 212 may become available because the speculative pointer 206 is updated in step 626 of Fig. 6, described below. In addition, when a conditional branch instruction is resolved as correctly predicted and other conditions are also satisfied, a CRS 212 becomes available because the non-speculative pointer 208 is updated in step 624 of Fig. 6, described below. Flow proceeds to step 426.
In step 426, the fetch unit 104 allocates a new CRS 212, copies the contents of the current CRS 212, including the entries on the stack, to the newly allocated CRS 212, and clears the new CRSC 214. In particular, the return addresses in the current CRS 212 are copied to the newly allocated CRS 212. Flow proceeds to step 428.
In step 428, the fetch unit 104 increments the speculative pointer 206 so that it points to the newly allocated CRS 212. Incrementing the speculative pointer 206 increases the speculation level. The fetch unit 104 increments the speculative pointer 206 in a wrapping manner, so that the CRSs 212 are organized as a circular queue. For example, if there are four CRSs 212 and the current value of the speculative pointer 206 is 3, then the new value is zero after the control logic 204 increments the speculative pointer 206. The control logic 204 increments the non-speculative pointer 208 in the same wrapping manner, as in step 624 of Fig. 6. Flow proceeds to step 432.
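The wrapping increment and the availability test of step 424 amount to simple modular arithmetic. The helper names below are hypothetical and the code is a sketch of the arithmetic only, not of the hardware.

```python
def wrap_increment(pointer: int, num_crs: int) -> int:
    """Advance a CRS pointer by one, wrapping to 0 past the last stack."""
    return (pointer + 1) % num_crs


def no_crs_available(spec_ptr: int, nonspec_ptr: int, num_crs: int) -> bool:
    """Step 424 test: would incrementing the speculative pointer
    make it equal to the non-speculative pointer?"""
    return wrap_increment(spec_ptr, num_crs) == nonspec_ptr


# With four stacks, a pointer value of 3 wraps back to 0.
wrapped = wrap_increment(3, 4)
# Speculative pointer at 3, non-speculative at 0: the fetch unit must stall.
must_stall = no_crs_available(3, 0, 4)
```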
In step 432, the control logic 204 pushes the return address 232 (which is the NSIP 154) onto the current CRS 212, that is, the CRS 212 newly allocated in step 426. Flow proceeds to step 416.
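The call flow of Fig. 4 can be summarized in a few lines of code. This is a minimal sketch, assuming a dictionary-based state record with hypothetical field names; in particular, the `first_after_cb` flag is an assumed stand-in for the bookkeeping the control logic 204 performs in step 422, and overflow handling is not modeled.

```python
NUM_CRS = 4  # assumption: four stacks, as in the Fig. 7a example


def new_unit():
    """Hypothetical state record for unit 122."""
    return {
        "crs": [[] for _ in range(NUM_CRS)],  # CRS 212
        "crsc": [0] * NUM_CRS,                # CRSC 214
        "spec": 0,                            # speculative pointer 206
        "nonspec": 0,                         # non-speculative pointer 208
        "first_after_cb": False,              # assumed flag: set when a branch issues
    }


def fast_call(unit, call_address, nsip):
    """Handle one fast call; return the next fetch address, or None on a stall."""
    cur = unit["spec"]
    # Steps 408-422: unresolved branch outstanding, and first call/return after it?
    if unit["crsc"][cur] != 0 and unit["first_after_cb"]:
        nxt = (cur + 1) % NUM_CRS
        if nxt == unit["nonspec"]:                 # step 424: no CRS available
            return None
        unit["crs"][nxt] = list(unit["crs"][cur])  # step 426: copy contents
        unit["crsc"][nxt] = 0                      # step 426: clear new CRSC
        unit["spec"] = cur = nxt                   # step 428: new current CRS
    unit["first_after_cb"] = False
    unit["crs"][cur].append(nsip)                  # steps 414 / 432: push
    return call_address                            # step 416: fetch from target


unit = new_unit()
next_ip = fast_call(unit, 0x300, 0x108)   # call with no branch pending
unit["crsc"][0] += 1                      # a conditional branch is issued
unit["first_after_cb"] = True
next_ip2 = fast_call(unit, 0x700, 0x608)  # speculative call: uses CRS 1
```

In this model the second call leaves CRS 0 untouched and pushes onto a copy, so a later misprediction can fall back to CRS 0 simply by restoring the pointer.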
Fig. 5 is a flowchart, according to an embodiment of the invention, illustrating the operation of the fetch unit 104 of Fig. 1 in processing a fast return instruction. Flow begins at step 504.
In step 504, after the instruction pointer register 112 provides the fetch address 168 to the instruction cache 106, the fetch unit 104 fetches a fast return instruction from the instruction cache 106. Flow proceeds to step 506.
In step 506, the instruction decoder 108 decodes the fetched fast return instruction and generates the return instruction indication 142 to the fast call/return stack unit 122 and to the fetch address control unit 126. Flow proceeds to step 508.
In step 508, the fetch unit 104 determines whether there are any unresolved conditional branch instructions 146 by testing whether the current CRSC 214 is nonzero. A nonzero value in the selected CRSC 214 indicates that unresolved conditional branch instructions 146 are outstanding; a zero value indicates that there are none. Flow proceeds to decision step 512.
In decision step 512, if unresolved conditional branch instructions 146 are outstanding, flow proceeds to decision step 522; otherwise, flow proceeds to step 514.
In step 514, the fetch unit 104 pops the return address from the current CRS 212 indicated by the speculative pointer 206. The control logic 204 in the fast call/return stack unit 122 reads the speculative pointer 206 to determine the current CRS 212, updates the stack pointer of the current CRS 212, and reads the return address 166 from the CRS 212 selected by the value read from the speculative pointer 206. Flow proceeds to step 516.
In step 516, in response to the return instruction indication 142, the fetch address control unit 126 controls the multiplexer 114 via the multiplexer select signal 152 to load the popped return address 166 into the instruction pointer register 112, so that the next instruction is fetched from the instruction cache 106 at the popped return address 166. Flow proceeds to step 518.
In step 518, the fetch unit 104 retires the fast return instruction. In particular, the fast return instruction is not dispatched to the execution pipeline. Flow ends at step 518.
In decision step 522, the fetch unit 104 determines whether the fast return instruction is the first call or return instruction fetched after a conditional branch instruction 146. If the return instruction is the first fast call or return instruction dispatched since the most recent CB issued indication 162, the speculation level is to be increased and flow proceeds to step 524; otherwise, the speculation level is not increased and flow proceeds to step 514.
In step 524, if no CRS 212 is available at the current speculation level, the fetch unit 104 stops fetching instructions and suspends execution of the fast return instruction until a CRS 212 becomes available. Flow proceeds to step 526.
In step 526, the fetch unit 104 allocates a new CRS 212, copies the contents of the current CRS 212 to the newly allocated CRS 212, and clears the new CRSC 214. In particular, the return addresses in the current CRS 212 are copied to the newly allocated CRS 212. Flow proceeds to step 528.
In step 528, the fetch unit 104 increments the speculative pointer 206 so that it points to the newly allocated CRS 212. Flow proceeds to step 532.
In step 532, the control logic 204 pops the return address 166 from the newly allocated CRS 212. Flow proceeds to step 516.
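The return flow of Fig. 5 mirrors the call flow. The sketch below rests on the same assumptions as any toy model of this unit: the dictionary fields and the `first_after_cb` flag are hypothetical, and a pop from an empty list (a Python `IndexError`) merely stands in for the stack underflow exception. The addresses used in the example are illustrative.

```python
NUM_CRS = 4  # assumption: four stacks


def fast_return(unit):
    """Handle one fast return; return the popped address, or None on a stall."""
    cur = unit["spec"]
    # Steps 508-522: unresolved branch outstanding, and first call/return after it?
    if unit["crsc"][cur] != 0 and unit["first_after_cb"]:
        nxt = (cur + 1) % NUM_CRS
        if nxt == unit["nonspec"]:                 # step 524: no CRS available
            return None
        unit["crs"][nxt] = list(unit["crs"][cur])  # step 526: copy contents
        unit["crsc"][nxt] = 0                      # step 526: clear new CRSC
        unit["spec"] = cur = nxt                   # step 528: new current CRS
    unit["first_after_cb"] = False
    return unit["crs"][cur].pop()                  # steps 514 / 532: pop


# Two nested calls are outstanding and one conditional branch is unresolved.
unit = {
    "crs": [[0x224, 0x608], [], [], []],  # CRS 212
    "crsc": [1, 0, 0, 0],                 # CRSC 214
    "spec": 0,                            # speculative pointer 206
    "nonspec": 0,                         # non-speculative pointer 208
    "first_after_cb": True,               # assumed flag, see lead-in
}
next_ip = fast_return(unit)
```

Here the speculative return pops from the copy in CRS 1, so CRS 0 still holds both return addresses in case the branch turns out to have been mispredicted.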
Fig. 6 is a flowchart, according to an embodiment of the invention, illustrating the operation of the microprocessor 100 of Fig. 1 in executing a conditional branch instruction. Flow begins at step 604.
In step 604, after the instruction pointer register 112 provides the fetch address 168 to the instruction cache 106, the fetch unit 104 fetches a conditional branch instruction from the instruction cache 106. Flow proceeds to step 606.
In step 606, the instruction decoder 108 decodes the fetched conditional branch instruction and generates a true value on the CB issued indication 162. In response, the fast call/return stack unit 122 increments the CRSC 214 that corresponds to the CRS 212 pointed to by the speculative pointer 206, that is, the current CRSC 214. Each CRSC 214 stores the number of unresolved conditional branch instructions 146 for its corresponding CRS 212. Flow proceeds to step 608.
In step 608, the fetch unit 104 issues the conditional branch instruction to the execution pipeline and fetches the next instruction from either the predicted target address 156 or the NSIP 154, depending on whether the prediction direction 158 indicates that the branch will be taken. Flow proceeds to step 612.
In step 612, the execution pipeline executes and retires the conditional branch instruction 146. The execution pipeline includes the multiplexer 132, the RAT 134, the scheduler 136, the execution units 138, and the retire unit 144. Flow proceeds to step 614.
In step 614, the retire unit 144 notifies the fetch unit 104 that the conditional branch instruction 146 has retired, provides the correct branch target address via the CB correct target address 176, and indicates via the CB misprediction indication 178 whether the branch was mispredicted. Flow proceeds to decision step 616.
In decision step 616, the fetch unit 104 decides which operation to perform according to the CB misprediction indication 178. If the CB misprediction indication 178 is false, the branch was correctly predicted and flow proceeds to step 618. If the CB misprediction indication 178 is true, the branch was mispredicted and flow proceeds to step 626.
In step 618, because the conditional branch instruction 146 that the branch predictor 118 correctly predicted is being retired, the fetch unit 104 decrements the CRSC 214 indicated by the non-speculative pointer 208. Flow proceeds to decision step 622.
In decision step 622, the fetch unit 104 determines whether the CRSC 214 decremented in step 618 contains zero. A zero value in the CRSC 214 indicates that no unresolved conditional branch instructions 146 are associated with the CRS 212 indicated by the non-speculative pointer 208. If the CRSC 214 is zero, flow proceeds to step 624; otherwise, flow ends at step 622.
In step 624, the fetch unit 104 increments the non-speculative pointer 208 so that it points to the next CRS 212 in the circular queue of CRSs 212. The fetch unit 104 increments the non-speculative pointer 208 in the wrapping manner described above with respect to step 428. Flow ends at step 624.
In step 626, because a misprediction of a conditional branch instruction has been detected and must be corrected, the fetch unit 104 copies the value of the non-speculative pointer 208 to the speculative pointer 206. Copying the non-speculative pointer 208 to the speculative pointer 206 makes the CRS 212 pointed to by the non-speculative pointer 208 the current CRS 212. Flow proceeds to step 628.
In step 628, the microprocessor 100 flushes the execution pipeline. The mispredicted conditional branch instruction is being retired, which means that it is the oldest instruction in the microprocessor 100. Flushing the execution pipeline removes from the microprocessor 100 all instructions that are newer in program order than the mispredicted conditional branch instruction 146 now being retired. This is necessary because the branch predictor 118 mispredicted the conditional branch instruction, causing the fetch unit 104 to fetch instructions from the incorrect path. Flow proceeds to step 632.
In step 632, because the pipeline flush of step 628 leaves no unresolved conditional branch instructions in the microprocessor 100, the fetch unit 104 clears the current CRSC 214. Flow proceeds to step 634.
In step 634, in response to the true value of the CB misprediction indication 178, the fetch unit 104 loads the CB correct target address 176 into the instruction pointer register 112, so that the fetch unit 104 fetches the next instruction from the CB correct target address 176. Flow ends at step 634.
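The bookkeeping of Fig. 6 can be sketched as two small functions, one for issuing a conditional branch and one for retiring it. This is only an illustration: the dictionary fields and the `first_after_cb` flag are hypothetical, the pipeline flush of step 628 is not modeled, and the additional conditions that the text mentions for advancing the non-speculative pointer are not modeled either.

```python
NUM_CRS = 4  # assumption: four stacks


def issue_branch(unit):
    """Step 606: count one more unresolved branch against the current CRS."""
    unit["crsc"][unit["spec"]] += 1
    unit["first_after_cb"] = True  # assumed flag used by the call/return flows


def retire_branch(unit, mispredicted, correct_target=None):
    """Steps 614-634: return the address to reload, or None if none is needed."""
    if not mispredicted:
        ns = unit["nonspec"]
        unit["crsc"][ns] -= 1                     # step 618
        if unit["crsc"][ns] == 0:                 # step 622
            unit["nonspec"] = (ns + 1) % NUM_CRS  # step 624 (wrapping)
        return None
    unit["spec"] = unit["nonspec"]                # step 626: revert to safe CRS
    unit["crsc"][unit["spec"]] = 0                # step 632 (flush not modeled)
    unit["first_after_cb"] = False                # assumption of this sketch
    return correct_target                         # step 634


def speculating_unit():
    """State after a branch issued on CRS 0 and a call/return moved to CRS 1."""
    unit = {"crs": [[0x108], [], [], []], "crsc": [0] * NUM_CRS,
            "spec": 0, "nonspec": 0, "first_after_cb": False}
    issue_branch(unit)
    unit["spec"] = 1  # pretend a fast return allocated CRS 1 and popped from it
    return unit


wrong = speculating_unit()
reload_ip = retire_branch(wrong, mispredicted=True, correct_target=0x304)

right = speculating_unit()
no_reload = retire_branch(right, mispredicted=False)
```

On a misprediction the speculative pointer falls back to CRS 0 and fetch resumes at the correct target; on a correct prediction the non-speculative pointer advances to CRS 1, which makes CRS 0 available for reuse.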
The methods of Fig. 4 to Fig. 6 describe how the fetch unit 104 operates so as to correctly execute fast call and return instructions in the presence of unresolved conditional branch instructions 146. When the speculation level increases, the fetch unit 104 allocates a new CRS 212; when the speculation level decreases, the fetch unit 104 deallocates one or more CRSs 212. In this way, the appropriate return addresses are preserved according to how the fetch unit 104 predicts the conditional branch instructions 146 and how the execution pipeline resolves them. When a correctly predicted conditional branch instruction retires, the fast call/return stack unit 122 decrements the CRSC 214 pointed to by the non-speculative pointer 208 (see step 618 of Fig. 6). If no unresolved conditional branch instructions 146 remain at the current speculation level, the current CRS 212 is non-speculative, and the fast call/return stack unit 122 increments the non-speculative pointer 208 to point to the next CRS 212 (see step 624 of Fig. 6). When a mispredicted conditional branch instruction retires, no unresolved conditional branch instructions 146 remain, so the fast call/return stack unit 122 makes the CRS 212 pointed to by the non-speculative pointer 208 the current CRS 212, flushes the pipeline, and clears the current CRSC 214 (see steps 626 to 632 of Fig. 6).
Fig. 7a is a table, according to an embodiment of the invention, illustrating the operation of the microprocessor 100 of Fig. 1 in executing a first program sequence. The first program sequence immediately follows the initialization shown in Fig. 3. The example of Fig. 7a to Fig. 7c uses four CRSCs 214, denoted c0 through c3. The stack depth is the number of return addresses on the current CRS 212 after the current operation completes. The pointer values are the contents of the non-speculative pointer 208 and the speculative pointer 206 after the current operation completes. The instruction pointer value is the content of the instruction pointer register 112 after the current operation completes. Each instruction is four bytes long, and addresses are given in hexadecimal. Although a real instruction sequence would most likely contain many instructions that are neither fast call/return instructions nor conditional branch instructions, for simplicity the example consists mainly of fast call/return instructions and conditional branch instructions.
In the first step of Fig. 7a, the microprocessor 100 is initialized. Initialization causes the microprocessor 100 to clear all of the CRSs 212 and CRSCs 214, together with the speculative pointer 206 and the non-speculative pointer 208.
In the second step of Fig. 7a, the fetch unit 104 fetches an initial instruction from a fetch address 168 of 0x100.
In the third step of Fig. 7a, the fetch unit 104 fetches a fast call instruction from address 0x104, which is the next sequential instruction pointer (NSIP) 154 following the address of the initial instruction of the second step. The call instruction specifies a jump to address 0x300, from which the fetch unit 104 fetches a new instruction. The instruction decoder 108 generates the call instruction indication 124 to the fast call/return stack unit 122. The fast call/return stack unit 122 generates the pushed return address 232 and pushes the value 0x108 (the next sequential address after the call instruction) onto the current CRS 212 (CRS 0), increasing the stack depth of the current CRS 212 (CRS 0) to 1.
In the fourth step of Fig. 7a, the fetch unit 104 fetches a conditional branch instruction 146 from address 0x300. The conditional branch instruction 146 specifies a target address of 0xC80. In this example, it is assumed that the branch predictor 118 predicts that the branch will be taken. The branch predictor 118 generates a predicted target address 156 of 0xC80 and a prediction direction 158 indicating that the branch will be taken. The multiplexer 114 selects the predicted target address 156 of 0xC80 and loads it into the instruction pointer register 112. The instruction decoder 108 issues the conditional branch instruction 146 to the execution pipeline and generates the CB issued indication 162 to the fast call/return stack unit 122, which causes the fast call/return stack unit 122 to increment the current CRSC 214 (c0) to 1.
In the fifth step of Fig. 7a, the fetch unit 104 fetches a fast return instruction from address 0xC80, which is the predicted target address 156 of the conditional branch instruction of the fourth step. The instruction decoder 108 generates a true value on the return instruction indication 142 to the fast call/return stack unit 122. Because this is the first call or return instruction that the fetch unit 104 has fetched after a conditional branch instruction 146, the fetch unit 104 allocates a new CRS 212 (CRS 1), copies the contents of the current CRS 212 (CRS 0) to the new CRS 212 (CRS 1), clears the new CRSC 214 (c1), and increments the speculative pointer 206 to 1, making CRS 1 the current CRS 212. The fast call/return stack unit 122 reads the popped return address 166 from the new CRS 212 (CRS 1). The multiplexer 114 selects the popped return address 166, whose value is 0x108, and loads it into the instruction pointer register 112. Because the return instruction pops the only return address (0x108) on the new CRS 212 (CRS 1), the stack depth returns to zero.
In the sixth step of Fig. 7a, the conditional branch instruction 146 of the fourth step is executed and retired, and the branch is correctly resolved as not taken. Because the predicted branch direction (taken) does not match the correctly resolved branch direction (not taken), the branch was mispredicted. The retire unit 144 generates a true value on the CB misprediction indication 178 and provides the CB correct target address 176 to the fetch unit 104. As in step 626 of Fig. 6, the fetch unit 104 copies the value of the non-speculative pointer 208 to the speculative pointer 206, which makes CRS 0 the current CRS 212. The speculative pointer 206 and the non-speculative pointer 208 therefore both have the value 0. Next, as in step 632 of Fig. 6, the fetch unit 104 clears the CRSC 214 (c0). Finally, as in steps 628 and 634 of Fig. 6, the microprocessor 100 flushes the execution pipeline and loads the CB correct target address 176 (0x304) into the instruction pointer register 112.
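The first program sequence can be replayed in a small software model to check the pointer and instruction pointer values described above. The class below is a sketch under stated assumptions: all names are hypothetical, instructions are assumed to be four bytes long as in the example, the `first_after_cb` flag stands in for the control logic's bookkeeping, and stalls, overflow, and the pipeline flush are not modeled.

```python
class FetchModel:
    """Toy fetch-unit model; names and structure are assumptions of this sketch."""
    N = 4  # number of CRSs 212, as in the example

    def __init__(self):
        self.crs = [[] for _ in range(self.N)]  # CRS 212
        self.crsc = [0] * self.N                # CRSC 214 (c0..c3)
        self.spec = 0                           # speculative pointer 206
        self.nonspec = 0                        # non-speculative pointer 208
        self.first_after_cb = False             # assumed flag
        self.ip = 0                             # instruction pointer register 112

    def _maybe_allocate(self):
        """Allocate a new CRS for the first call/return after a pending branch."""
        cur = self.spec
        if self.crsc[cur] != 0 and self.first_after_cb:
            nxt = (cur + 1) % self.N
            if nxt == self.nonspec:
                raise RuntimeError("no CRS available: the fetch unit would stall")
            self.crs[nxt] = list(self.crs[cur])
            self.crsc[nxt] = 0
            self.spec = nxt
        self.first_after_cb = False

    def call(self, target):
        nsip = self.ip + 4                      # next sequential address
        self._maybe_allocate()
        self.crs[self.spec].append(nsip)
        self.ip = target

    def ret(self):
        self._maybe_allocate()
        self.ip = self.crs[self.spec].pop()

    def branch(self, target, predict_taken):
        self.crsc[self.spec] += 1
        self.first_after_cb = True
        self.ip = target if predict_taken else self.ip + 4

    def mispredict(self, correct_target):
        self.spec = self.nonspec                # step 626
        self.crsc[self.spec] = 0                # step 632
        self.first_after_cb = False
        self.ip = correct_target                # step 634


m = FetchModel()                       # first step: initialization
m.ip = 0x100
m.ip += 4                              # second step: ordinary instruction at 0x100
m.call(0x300)                          # third step: fast call at 0x104
m.branch(0xC80, predict_taken=True)    # fourth step: branch at 0x300
m.ret()                                # fifth step: fast return at 0xC80
ip_after_return, spec_after_return = m.ip, m.spec
depth_after_return = len(m.crs[m.spec])
m.mispredict(0x304)                    # sixth step: branch resolved not taken
```

The speculative return resumes fetching at 0x108 from CRS 1, and the misprediction then restores CRS 0 as the current stack and redirects fetch to 0x304.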
Fig. 7b is a table, according to an embodiment of the invention, illustrating the operation of a second program sequence. The second program sequence begins where Fig. 7a ends, so the CRSs 212, the CRSCs 214, the stack depth, the speculative pointer 206, and the non-speculative pointer 208 are the same as in the last step of Fig. 7a.
In the first step of Fig. 7b, the instruction pointer register 112 generates a fetch address 168 of 0x220 in the program sequence.
In the second step of Fig. 7b, the fetch unit 104 fetches a fast call instruction from the fetch address 168 of 0x220. The call instruction specifies a call address of 0x600, at which the fetch unit 104 fetches a new instruction sequence. The instruction decoder 108 generates a true value on the call instruction indication 124 to the fast call/return stack unit 122. The fast call/return stack unit 122 generates the pushed return address (0x224), which is the next sequential address after the call instruction, and pushes it onto the current CRS 212 (CRS 0). The stack depth of the current CRS 212 is 1.
In the third step of Fig. 7b, the fetch unit 104 fetches a normal instruction 146, which is not a conditional branch instruction, from the fetch address 168 of 0x600. The instruction decoder 108 issues the normal instruction 146 to the execution pipeline, and the fetch unit 104 increments the instruction pointer register 112.
In the fourth step of Fig. 7b, the fetch unit 104 fetches a fast call instruction from the fetch address 168 of 0x604. The call instruction specifies a call address of 0x700, at which the fetch unit 104 will fetch a new instruction sequence. The instruction decoder 108 generates a true value on the call instruction indication 124 to the fast call/return stack unit 122. The fast call/return stack unit 122 generates the pushed return address 232 to the current CRS 212 (CRS 0); its value, 0x608, is the next sequential address after the call instruction. The stack depth of the current CRS 212 is 2.
In the fifth step of Fig. 7b, the extraction unit 104 extracts a general instruction 146 that is not a conditional branch instruction from the extraction address 168 with the value 0x700. The instruction decoder 108 sends this general instruction 146 to the execution pipeline, and the extraction unit 104 increments the value of the instruction pointer register 112.
In the sixth step of Fig. 7b, the extraction unit 104 extracts a general instruction 146 that is not a conditional branch instruction from the extraction address 168 with the value 0x704. The instruction decoder 108 sends this general instruction 146 to the execution pipeline, and the extraction unit 104 increments the value of the instruction pointer register 112.
In the seventh step of Fig. 7b, the extraction unit 104 extracts a conditional branch instruction from address 0x708; the destination address of this conditional branch instruction is 0xD80. In this example, the branch predictor 118 is assumed to predict that the branch will not be taken. The branch predictor 118 produces a prediction direction 158 indicating that the branch is not taken, and the increment circuit 116 produces the NSIP 154. The multiplexer 114 loads the NSIP 154, with the value 0x70C, into the instruction pointer register 112. The instruction decoder 108 sends the conditional branch instruction 146 to the execution pipeline and produces a true value on the CB sent indication 162 to the fast call/return stack unit 122, which causes the fast call/return stack unit 122 to increment the current CRSC 214 (c0) to 1.
In the eighth step of Fig. 7b, the extraction unit 104 extracts a fast return instruction from address 0x70C. The instruction decoder 108 produces a true value on the return instruction indication 142 to the fast call/return stack unit 122. Because this is the first call or return instruction that the extraction unit 104 has extracted after a conditional branch instruction 146, the extraction unit 104 copies the contents of the current CRS 212 (CRS 0) to a new CRS 212 (CRS 1), clears the new CRSC 214 (c1), and increments the prediction pointer 206 to 1. The fast call/return stack unit 122 reads the popped return address 166 from the new CRS 212 (CRS 1) and produces a true value on the return instruction indication 142 to the extraction address control unit 126. The multiplexer 114 selects the popped return address 166, and the next instruction is extracted from the return address 0x608 that was pushed in the fourth step of Fig. 7b. Because the return address 0x224 from the second step of Fig. 7b is still on the CRS 212, the stack depth returns to 1.
In the ninth step of Fig. 7b, the conditional branch instruction 146 from the seventh step of Fig. 7b is executed and retired, and the branch is correctly resolved as not taken. Because the predicted branch direction 158 (not taken) matches the correctly resolved branch outcome (not taken), the branch was correctly predicted. The retirement unit 144 produces a false value on the CB misprediction indication 178, together with the CB correct destination address 176, to the extraction unit 104. As shown in step 618 of Fig. 6, the extraction unit 104 decrements the CRSC 214 (c0) corresponding to the CRS 212 to which the nonanticipating pointer 208 points. Then, as shown in step 622 of Fig. 6, the extraction unit 104 checks whether the nonanticipating CRSC 214 is zero. In this example, the nonanticipating CRSC 214 is c0, which now has the value 0. Therefore, as shown in step 624 of Fig. 6, the extraction unit 104 increments the nonanticipating pointer 208 to 1.
In the tenth step of Fig. 7b, the extraction unit 104 extracts a fast return instruction from address 0x60C. The instruction decoder 108 produces a true value on the return instruction indication 142 to the fast call/return stack unit 122. The fast call/return stack unit 122 reads the popped return address 166 (0x224) from the current CRS 212 (CRS 1) and produces a true value on the return instruction indication 142 to the extraction address control unit 126. The multiplexer 114 selects the popped return address 166, and the extraction unit 104 extracts the next instruction from the return address 0x224 that was pushed in the second step of Fig. 7b. Because this is not the first call or return instruction that the extraction unit 104 has extracted after extracting a conditional branch instruction 146, the prediction pointer 206 is not affected. Because the current CRS 212 no longer holds any return address, the stack depth returns to 0.
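The steps of Fig. 7b can be replayed with a small software model. The sketch below is illustrative only: the class name `FastCallReturnStacks`, the count of four stacks, the `cb_pending` flag, and the 4-byte instruction length (inferred from the addresses in the example) are assumptions rather than details given in the text. It assumes the starting state implied by the walkthrough (all stacks empty, both pointers at 0) and does not model misprediction recovery or pointer wrap-around.

```python
class FastCallReturnStacks:
    """Toy model of the CRS 212, CRSC 214 and the two pointers (names are hypothetical)."""

    def __init__(self, num_stacks=4):            # stack count is an assumption
        self.crs = [[] for _ in range(num_stacks)]
        self.crsc = [0] * num_stacks
        self.pred_ptr = 0                         # prediction pointer 206
        self.nonpred_ptr = 0                      # nonanticipating pointer 208
        self.cb_pending = False                   # CB extracted since last call/return

    def fetch_cond_branch(self):
        # A conditional branch increments the counter of the current stack.
        self.crsc[self.pred_ptr] += 1
        self.cb_pending = True

    def _switch_if_first_after_cb(self):
        # First call/return after a conditional branch: copy to a new stack.
        if self.cb_pending:
            new = self.pred_ptr + 1               # wrap-around is not modeled
            self.crs[new] = list(self.crs[self.pred_ptr])
            self.crsc[new] = 0
            self.pred_ptr = new
            self.cb_pending = False

    def fetch_call(self, call_addr, target, insn_len=4):
        self._switch_if_first_after_cb()
        self.crs[self.pred_ptr].append(call_addr + insn_len)
        return target                             # next extraction address

    def fetch_return(self):
        self._switch_if_first_after_cb()
        return self.crs[self.pred_ptr].pop()      # next extraction address

    def retire_correct_branch(self):
        # Steps 618-624 of Fig. 6 as described in the walkthrough.
        self.crsc[self.nonpred_ptr] -= 1
        if self.crsc[self.nonpred_ptr] == 0:
            self.nonpred_ptr += 1


m = FastCallReturnStacks()
t1 = m.fetch_call(0x220, 0x600)   # step 2: push 0x224 onto CRS 0
t2 = m.fetch_call(0x604, 0x700)   # step 4: push 0x608 onto CRS 0
m.fetch_cond_branch()             # step 7: c0 becomes 1
r1 = m.fetch_return()             # step 8: copy CRS 0 to CRS 1, pop from CRS 1
m.retire_correct_branch()         # step 9: c0 back to 0, nonanticipating pointer to 1
r2 = m.fetch_return()             # step 10: pop the remaining address from CRS 1
print(hex(r1), hex(r2), m.pred_ptr, m.nonpred_ptr)
```

In this model CRS 0 keeps its two entries after the copy, which is what would let a misprediction fall back to the pre-branch stack contents.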
Fig. 7c is a table illustrating the operation of a third program sequence according to an embodiment of the invention. The third program sequence begins where Fig. 7b ends, so the CRS 212, the CRSC 214, the stack depth, the prediction pointer 206, and the nonanticipating pointer 208 are the same as in the last step of Fig. 7b.
In the first step of Fig. 7c, the instruction pointer register 112 produces an extraction address 168 with the value 0x540.
In the second step of Fig. 7c, the extraction unit 104 extracts a fast call instruction from the extraction address 168 with the value 0x540. The call instruction specifies a call address of 0x580, from which the extraction unit 104 extracts a new instruction sequence. The instruction decoder 108 produces a true value on the call instruction indication 124 to the fast call/return stack unit 122. The fast call/return stack unit 122 pushes the pushed return address 232, with the value 0x544, onto the current CRS 212 (CRS 1). The stack depth of the current CRS 212 is now 1.
In the third step of Fig. 7c, the extraction unit 104 extracts a conditional branch instruction 146 from address 0x580. The conditional branch instruction 146 specifies a destination address of 0xE60. In this example, the branch predictor 118 is assumed to predict that the branch will not be taken. The branch predictor 118 produces a prediction direction 158 indicating that the branch is not taken, and the increment circuit 116 produces the NSIP 154. The multiplexer 114 loads the NSIP 154, with the value 0x584, into the instruction pointer register 112. The instruction decoder 108 sends the conditional branch instruction 146 to the execution pipeline and produces a true value on the CB sent indication 162 to the fast call/return stack unit 122, which causes the fast call/return stack unit 122 to increment the current CRSC 214 (c1) to 1.
In the fourth step of Fig. 7c, the extraction unit 104 extracts a fast call instruction from the extraction address 168 with the value 0x584. The call instruction specifies a call address of 0x5D0, from which the extraction unit 104 will extract the next instruction. The instruction decoder 108 produces the call instruction indication 124 to the fast call/return stack unit 122. Because this is the first call or return instruction that the extraction unit 104 has extracted after extracting a conditional branch instruction 146, the extraction unit 104 copies the contents of the current CRS 212 (CRS 1) to a new CRS 212 (CRS 2), clears the new CRSC 214 (c2), and increments the prediction pointer 206 to 2. The fast call/return stack unit 122 pushes a return address with the value 0x588 onto the new CRS 212 (CRS 2); this return address is the address of the next sequential instruction after the call instruction. The stack depth of the new CRS 212 (CRS 2) is 2.
In the fifth step of Fig. 7c, the extraction unit 104 extracts a fast return instruction from address 0x5D0. The instruction decoder 108 produces the return instruction indication 142 to the fast call/return stack unit 122. The fast call/return stack unit 122 reads the popped return address 166 from the current CRS 212 (CRS 2) and produces a true value on the return instruction indication 142 to the extraction address control unit 126. The multiplexer 114 selects the popped return address 166, and the next instruction is extracted from the return address 0x588 that was pushed in the fourth step of Fig. 7c. Because this is not the first call or return instruction that the extraction unit 104 has extracted after extracting a conditional branch instruction 146, the prediction pointer 206 is not affected. Because the return address 0x544 from the second step of Fig. 7c is still on the CRS 212, the stack depth returns to 1.
In the sixth step of Fig. 7c, the extraction unit 104 extracts a fast return instruction from address 0x588. The instruction decoder 108 produces the return instruction indication 142 to the fast call/return stack unit 122. The fast call/return stack unit 122 reads the popped return address 166 from the current CRS 212 (CRS 2) and produces a true value on the return instruction indication 142 to the extraction address control unit 126. The multiplexer 114 selects the popped return address 166, and the extraction unit 104 extracts the next instruction from the return address 0x544 that was pushed in the second step of Fig. 7c. Because this is not the first call or return instruction that the extraction unit 104 has extracted after extracting a conditional branch instruction 146, the prediction pointer 206 is not affected. Because the current CRS 212 (CRS 2) no longer holds any return address, the stack depth returns to 0.
In the seventh step of Fig. 7c, the conditional branch instruction 146 from the third step of Fig. 7c is executed and retired, and the branch is correctly resolved as not taken. Because the predicted branch direction 158 (not taken) matches the correctly resolved branch outcome (not taken), the branch was correctly predicted. The retirement unit 144 produces a false value on the CB misprediction indication 178, together with the CB correct destination address 176, to the extraction unit 104. As shown in step 618 of Fig. 6, the extraction unit 104 decrements the CRSC 214 (c1) corresponding to the CRS 212 to which the nonanticipating pointer 208 points. Then, as shown in step 622 of Fig. 6, the extraction unit 104 checks whether the nonanticipating CRSC 214 is zero. In this example, the nonanticipating CRSC 214 is c1, which now has the value 0. Therefore, as shown in step 624 of Fig. 6, the extraction unit 104 increments the nonanticipating pointer 208 to 2.
In the eighth step of Fig. 7c, the microprocessor 100 encounters an exception. As shown in Fig. 3, the extraction unit 104 clears the prediction pointer 206, the nonanticipating pointer 208, each CRS 212, and each CRSC 214. The fast call/return stack unit 122 is now initialized, and the exception determines the next extraction address 168 in the instruction pointer register 112.
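The steps of Fig. 7c, including the exception reset, can be replayed in the same spirit. This is a minimal sketch under stated assumptions: the function names, the dictionary layout, the `cb_pending` flag, and the stack count `NUM_STACKS = 4` are hypothetical, and the starting state (both pointers at 1, CRS 1 empty) is taken from the end of the Fig. 7b walkthrough. Return addresses are passed in directly instead of being computed.

```python
NUM_STACKS = 4  # hypothetical; the text does not state how many CRS exist


def make_state(pred=0, nonpred=0):
    """Build the model state: stacks, counters, and the two pointers."""
    return {"crs": [[] for _ in range(NUM_STACKS)], "crsc": [0] * NUM_STACKS,
            "pred": pred, "nonpred": nonpred, "cb_pending": False}


def on_cond_branch(s):
    s["crsc"][s["pred"]] += 1          # count the unresolved branch on the current CRS
    s["cb_pending"] = True


def _switch_if_first_after_cb(s):
    if s["cb_pending"]:                # first call/return after a conditional branch
        new = s["pred"] + 1            # wrap-around is not modeled
        s["crs"][new] = list(s["crs"][s["pred"]])
        s["crsc"][new] = 0
        s["pred"] = new
        s["cb_pending"] = False


def on_call(s, return_addr):
    _switch_if_first_after_cb(s)
    s["crs"][s["pred"]].append(return_addr)


def on_return(s):
    _switch_if_first_after_cb(s)
    return s["crs"][s["pred"]].pop()


def on_correct_retire(s):
    s["crsc"][s["nonpred"]] -= 1       # step 618 of Fig. 6
    if s["crsc"][s["nonpred"]] == 0:   # step 622
        s["nonpred"] += 1              # step 624


def on_exception(s):
    """Clear both pointers, every stack, and every counter (Fig. 3)."""
    s.update(make_state())


s = make_state(pred=1, nonpred=1)      # state at the end of Fig. 7b
on_call(s, 0x544)                      # step 2: push onto CRS 1
on_cond_branch(s)                      # step 3: c1 becomes 1
on_call(s, 0x588)                      # step 4: copy CRS 1 to CRS 2, then push
depth_after_step4 = len(s["crs"][2])
first = on_return(s)                   # step 5
second = on_return(s)                  # step 6
on_correct_retire(s)                   # step 7: c1 to 0, nonanticipating pointer to 2
pointers_before_exception = (s["pred"], s["nonpred"])
on_exception(s)                        # step 8
print(hex(first), hex(second), pointers_before_exception, s["pred"], s["nonpred"])
```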
Referring to Fig. 8, a schematic diagram of a microprocessor 800 according to another embodiment of the invention is shown. The microprocessor 800 is similar to, and in some embodiments the same as, the microprocessor 100 of Fig. 1. The embodiment of Fig. 8 shows the details of the microcode unit 128 of Fig. 1. In particular, the microcode unit 128 comprises a number of elements similar to those described in detail for Fig. 1, for example those of the extraction unit 104. In particular, the microcode unit 128 comprises a fast call/return stack unit 822, which is similar to the fast call/return stack unit 122 shown in Fig. 2 and operates similarly to the steps described for Fig. 3 through Figs. 7a to 7c, so as to correctly execute and retire fast call and return instructions in an instruction stream intermixed with conditional branch instructions. The fast call/return stack unit 822 of Fig. 8 corresponds to the fast call/return stack unit 122 of Fig. 2. The extraction unit 104 of Fig. 8 is similar to the extraction unit 104 of Fig. 1, and, although not shown, the microprocessor 800 of Fig. 8 also comprises an instruction cache 106 similar to the instruction cache 106 of Fig. 1.
In the embodiment of Fig. 8, the microcode unit 128 correctly executes and retires fast call and return instructions contained in microcode sequences intermixed with conditional branch instructions, in the same way that the extraction unit 104 of Fig. 1 correctly executes and retires fast call and return instructions in user programs intermixed with conditional branch instructions. In particular, the fast call and return instructions are not sent to the execution pipeline of the microprocessor 800 but are correctly executed and retired by the microcode unit 128. In the embodiment of Fig. 8, the microcode unit 128 extracts microcode instructions from a microcode ROM 806 rather than from the instruction cache 106 as in Fig. 1. Like the user programs stored in the instruction cache 106, the microcode instruction sequences stored in the microcode ROM 806 may include conditional branch instructions, and they may also include fast call and return instructions. Like the extraction unit 104 of Fig. 1, the microcode unit 128 can send general instructions 148, including conditional branch instructions, to the execution pipeline of the microprocessor 800. The microcode unit 128 therefore acts as a second extraction unit within the microprocessor 800, one that executes the microcode stored in the microcode ROM 806 rather than the user programs stored in the instruction cache 106.
According to one embodiment, unlike the extraction unit 104 of Fig. 1, the microcode unit 128 does not include a branch predictor (although embodiments in which the microcode unit 128 includes a branch predictor are contemplated). Therefore, at step 608 of Fig. 6, the microcode unit 128 always extracts the instruction at the next sequential address 854 of the microcode ROM 806. That is, when the microcode unit 128 extracts a conditional branch instruction from the microcode ROM 806, it always "predicts" that the conditional branch will not be taken. An increment circuit 816 increments the extraction address 868 of the conditional branch instruction to produce the next sequential IP address (NSIP) 854. The extraction address control unit 826 produces a multiplexer select signal 852 that causes the multiplexer 814 to select the NSIP 854. The selected address is loaded into an instruction pointer register 812, so that the extraction address 868 becomes the NSIP 854.
When a microcode conditional branch instruction 148 is sent to the execution units 138, it is correctly resolved as either taken or not taken. However, unlike with the extraction unit 104 of Fig. 1, whenever a conditional branch instruction is correctly resolved as taken, it has in every case been mispredicted by the microcode unit 128. This is because, as described above, the microcode unit 128 always "predicts" that the conditional branch will not be taken and always extracts the instruction at the NSIP address 854 of the microcode ROM 806. If the execution units 138 correctly resolve the conditional branch instruction as taken, the retirement unit 144 produces a true value on the CB misprediction indication 878 to the microcode unit 128, together with the CB correct destination address 876, and the extraction address control unit 826 produces a multiplexer select signal 852 that causes the multiplexer 814 to select the CB correct destination address 876. When the conditional branch instruction 148 is mispredicted, the actions shown in steps 626-634 of Fig. 6 described above, including flushing the execution pipeline, take place.
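The consequence of having no branch predictor in the microcode unit can be shown in a few lines. This is only a sketch: the function names, the microcode ROM address increment of 1, and the example addresses are assumptions made for illustration, since the text does not give microcode addresses.

```python
def microcode_predicted_next(fetch_addr, step=1):
    """Extraction-time choice: always the next sequential address (NSIP 854).

    The address increment `step` is an assumption.
    """
    return fetch_addr + step


def resolve_microcode_branch(taken, correct_target, nsip):
    """Model the outcome once the execution units resolve the branch.

    Returns (mispredicted, next_fetch_addr). With a fixed not-taken
    "prediction", the branch is mispredicted exactly when it is taken.
    """
    mispredicted = taken
    next_fetch = correct_target if taken else nsip
    return mispredicted, next_fetch


nsip = microcode_predicted_next(0x10)                  # hypothetical ROM address
taken_case = resolve_microcode_branch(True, 0x40, nsip)
not_taken_case = resolve_microcode_branch(False, 0x40, nsip)
print(taken_case, not_taken_case)
```

In the taken case the multiplexer selection switches to the correct destination address and the pipeline flush of steps 626-634 would follow; in the not-taken case extraction simply continues at the NSIP.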
Fig. 9 is a flowchart, according to an embodiment of the invention, illustrating the operation of the microprocessor 800 of Fig. 8 in processing a user program instruction implemented in microcode that may include fast call and return instructions extracted and executed by the microcode unit 128, where the microcode is private to the microprocessor. The flow begins at step 904.
In step 904, the extraction unit 104 of Fig. 8 extracts a user program instruction from the instruction cache 106. A user program is written in the native instruction set of the microprocessor 800 and may be an operating system, an application program, or any other program that the microprocessor 800 extracts from the instruction cache 106. The flow proceeds to step 906.
In step 906, the instruction decoder 108 in the extraction unit 104 decodes the user program instruction. The flow proceeds to step 908.
In step 908, the instruction decoder 108 in the extraction unit 104 determines whether the user program instruction is implemented in microcode. The microcode unit 128 extracts and executes microcode instruction sequences to implement user program instructions that are complex and/or seldom executed. In addition, the microcode instruction sequences include exception handling routines. If the user program instruction is not implemented in microcode, the flow proceeds to step 912. If it is implemented in microcode, the flow proceeds to step 916.
In step 912, the instruction decoder 108 in the extraction unit 104 dispatches the user program instruction to the execution units 138 to be correctly executed. The flow proceeds to step 914.
In step 914, the execution units 138 correctly execute and retire the user program instruction. In particular, the execution units 138 determine the correct branch direction and the correct destination address in the manner described above for Fig. 6 in order to correctly execute and retire conditional branch instructions dispatched by the microcode unit 128. The flow ends at step 914.
In step 916, the instruction decoder 108 in the extraction unit 104 transfers control to the microcode unit 128 to implement the user program instruction. The microcode unit 128 stores the microcode instruction sequences that implement the user program instruction. The flow proceeds to step 918.
In step 918, the microcode unit 128 extracts a microcode instruction from the microcode ROM 806. Initially, the microcode unit 128 extracts the microcode instruction at a first microcode routine address specified by the extraction unit 104. When the microcode unit 128 encounters a microcode instruction that directs it to transfer control back to the extraction unit 104, it stops extracting microcode instructions. In this manner, the microcode unit 128 extracts and executes a sequence of microcode instructions to implement the user program instruction. The flow proceeds to step 922.
In step 922, the instruction decoder 808 in the microcode unit 128 decodes the microcode instruction extracted in step 918. The flow proceeds to decision step 924.
In step 924, the microcode unit 128 determines whether the extracted microcode instruction is a fast call or return instruction. In a manner similar to that shown in Figs. 1 through 7, the microcode unit 128 executes and retires fast call and return instructions within the microcode unit 128 and does not send them to the execution units 138 of the microprocessor 800. If the extracted microcode instruction is not a fast call or return instruction, the flow proceeds to step 926. If it is a fast call or return instruction, the flow proceeds to step 932.
In step 926, the instruction decoder 808 in the microcode unit 128 sends the extracted microcode instruction, which is not a fast call or return instruction, to the execution units 138 to be correctly executed and retired. In one embodiment, the extracted microcode instruction is a general instruction 148. The flow proceeds to step 928.
In step 928, the execution units 138 correctly execute and retire the microcode instruction, which is a general instruction 148. If the general instruction 148 is a conditional branch instruction, the execution units 138 and the retirement unit 144 execute and retire it as described for step 914. If the general instruction 148 is the last instruction in the microcode instruction sequence, the microcode unit 128 transfers control to the extraction unit 104 and the flow ends at step 928; otherwise, the flow returns to step 918.
In step 932, the microcode unit 128 correctly executes and retires the fast call or return instruction in the manner shown in Fig. 4 or Fig. 5, respectively. Microcode call and return instructions executed and retired within the microcode unit 128 are fast call and return instructions because they are not sent to the execution units 138 and therefore do not incur the delay of the execution pipeline. If the fast call or return instruction is the last instruction in the microcode instruction sequence, the microcode unit 128 transfers control to the extraction unit 104 and the flow ends at step 932; otherwise, the flow returns to step 918.
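The decisions of Fig. 9 can be summarized as a dispatch loop. The sketch below is a simplified illustration, not the patented logic: the function name, the tuple format of the hypothetical `rom` table (`kind`, `arg`, `is_last`), the `microcode_entry` table, and the instruction names are all invented for the example, and a single return stack is used because no conditional branches are modeled.

```python
def run_user_instruction(insn, microcode_entry, rom):
    """Return a log of (unit, item) pairs showing which unit handled each instruction."""
    log = []
    if insn not in microcode_entry:                 # step 908: not implemented in microcode
        log.append(("execution_units", insn))       # steps 912-914
        return log
    pc = microcode_entry[insn]                      # step 916: control passes to microcode unit
    return_stack = []
    while True:
        kind, arg, is_last = rom[pc]                # steps 918 and 922: extract and decode
        if kind == "call":                          # step 932: handled inside microcode unit
            return_stack.append(pc + 1)
            log.append(("microcode_unit", f"call@{pc}"))
            next_pc = arg
        elif kind == "ret":                         # step 932
            log.append(("microcode_unit", f"ret@{pc}"))
            next_pc = return_stack.pop() if return_stack else None
        else:                                       # steps 926-928: general instruction 148
            log.append(("execution_units", arg))
            next_pc = pc + 1
        if is_last:                                 # control returns to the extraction unit
            return log
        pc = next_pc


# Hypothetical microcode ROM: address -> (kind, argument, is_last)
rom = {
    0: ("call", 10, False),
    1: ("op", "uop_b", True),
    10: ("op", "uop_a", False),
    11: ("ret", None, False),
}
microcode_entry = {"COMPLEX": 0}                    # hypothetical microcoded instruction

simple_log = run_user_instruction("ADD", microcode_entry, rom)
complex_log = run_user_instruction("COMPLEX", microcode_entry, rom)
print(simple_log)
print(complex_log)
```

Only the entries tagged `execution_units` would occupy the execution pipeline; the call and return are resolved where they are extracted.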
Although the embodiments of Figs. 1-7 and Figs. 8-9 are described separately, an embodiment is contemplated in which both the extraction unit 104 and the microcode unit 128 execute and retire fast call and return instructions.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims. For example, software can enable the function, manufacture, modeling, simulation, description, and/or testing of the apparatus and methods of the invention. This can be accomplished using general programming languages (for example, C or C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be embodied as program code in a tangible medium, such as a semiconductor, floppy disk, hard disk, optical disc (for example, CD-ROM or DVD-ROM), or any other machine-readable (for example, computer-readable) storage medium, wherein, when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the invention may also be transmitted as program code over a transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded, and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods of the invention may be included in a semiconductor intellectual property core, such as a microcontroller core (embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods of the invention may be embodied as a combination of hardware and software. Therefore, the scope of protection of the invention is defined by the appended claims. Finally, those skilled in the art may, based on the disclosed concepts and specific embodiments, make minor changes and refinements to achieve the same purposes of the invention without departing from its spirit and scope.
The reference symbols in the accompanying drawings are briefly described as follows:
100: microprocessor; 104: extraction unit; 106: instruction cache; 108: instruction decoder; 112: instruction pointer register; 114: multiplexer; 116: increment circuit; 118: branch predictor; 122: fast call/return stack unit; 124: call instruction indication; 126: extraction address control unit; 128: microcode unit; 132: multiplexer; 134: register alias table; 136: instruction scheduler; 138: execution units; 142: return instruction indication; 144: retirement unit; 146, 148: instructions; 152: multiplexer select signal; 154: next sequential IP address (NSIP); 156: predicted destination address; 158: prediction direction; 162: CB sent indication; 164: call address; 166: popped return address; 168: extraction address; 176: CB correct destination address; 178: CB misprediction indication; 204: control logic unit; 206: prediction pointer; 208: nonanticipating pointer; 212: call/return stack (CRS); 214: call/return stack counter (CRSC); 222, 224, 216: signals; 226, 228: call/return stack counter signals; 232: pushed return address; 304-306, 404-432, 504-532, 604-634: steps; 800: microprocessor; 806: microcode ROM; 808: instruction decoder; 812: instruction pointer register; 814: multiplexer; 816: increment circuit; 822: fast call/return stack unit; 824: call instruction indication; 826: extraction address control unit; 842: return instruction indication; 852: multiplexer select signal; 854: next sequential address; 862: CB sent indication; 864: call address; 866: popped return address; 868: extraction address; 876: CB correct destination address; 878: CB misprediction indication; 904-932: steps.

Claims (18)

1. A microprocessor, characterized by comprising:
a plurality of execution units, configured to resolve conditional branch instructions;
a plurality of call/return stacks; and
an extraction unit, coupled to the call/return stacks and configured to extract a call or return instruction, wherein the extraction unit is further configured to determine whether the call or return instruction is the first call or return instruction extracted after the extraction unit extracted an unresolved conditional branch instruction, and, when the call or return instruction is the first call or return instruction, the extraction unit copies the contents of a current call/return stack among the call/return stacks to another call/return stack among the call/return stacks and designates the other call/return stack as the current call/return stack;
wherein, when the call or return instruction is a call instruction, the extraction unit pushes a first return address onto the current call/return stack and extracts an instruction at the destination address specified by the call instruction, the first return address being the address of the next instruction after the call instruction; and
wherein, when the call or return instruction is a return instruction, the extraction unit pops a second return address from the current call/return stack and extracts an instruction at the second return address.
2. The microprocessor according to claim 1, characterized in that the extraction unit does not send the call or return instruction to the execution units of the microprocessor.
3. The microprocessor according to claim 1, characterized by further comprising:
a prediction pointer, coupled to the call/return stacks, wherein the prediction pointer points to the current call/return stack among the call/return stacks, and wherein, when the extraction unit designates the other call/return stack as the current call/return stack, the prediction pointer points to that current call/return stack.
4. The microprocessor according to claim 3, characterized by further comprising:
a nonanticipating pointer, coupled to the call/return stacks and configured to point to one of the call/return stacks, wherein that call/return stack contains only the return addresses associated with all call instructions that precede, in program execution order, any unresolved conditional branch instruction.
5. The microprocessor according to claim 4, characterized by further comprising:
a plurality of counters, respectively corresponding to the call/return stacks, wherein the extraction unit is configured to, in response to extracting a conditional branch instruction, increment the counter corresponding to the call/return stack to which the prediction pointer points.
6. The microprocessor according to claim 5, characterized in that, after the extraction unit updates the prediction pointer with the nonanticipating pointer, the extraction unit clears the counter associated with the call/return stack to which the nonanticipating pointer points.
7. The microprocessor according to claim 6, characterized in that the extraction unit decrements the counter corresponding to the call/return stack to which the nonanticipating pointer points in response to the execution units resolving one of the unresolved conditional branch instructions and determining that the microprocessor correctly predicted that conditional branch instruction.
8. The microprocessor according to claim 7, characterized in that, when the extraction unit decrements the counter to 0, the extraction unit increments the nonanticipating pointer.
9. The microprocessor according to claim 6, characterized in that, after the extraction unit updates the prediction pointer with the nonanticipating pointer so that it points to another call/return stack as the current call/return stack, the extraction unit clears the counter associated with the call/return stack to which the prediction pointer points.
10. A method for fast execution of call or return instructions, characterized in that the method comprises the following steps:
Fetching a call or return instruction;
In response to the fetching step, determining whether the call or return instruction is the first call or return instruction fetched by a fetch unit after the fetch unit fetched an unresolved conditional branch instruction;
When the fetch unit fetches the first call or return instruction after fetching an unresolved conditional branch instruction, copying the contents of a current call/return stack, which is one of a plurality of call/return stacks in a microprocessor, to another one of the call/return stacks, and designating the other call/return stack as the current call/return stack; and
When the fetch unit fetches the first call or return instruction after fetching an unresolved conditional branch instruction, the copying and designating steps further comprise:
When the call or return instruction is a call instruction, pushing a first return address onto the current call/return stack and fetching an instruction at the target address specified by the call instruction, wherein the first return address is the address of the instruction that follows the call instruction; and
When the call or return instruction is a return instruction, popping a second return address from the current call/return stack and fetching an instruction at the second return address.
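The steps of claim 10 can be sketched in Python as follows. This is an illustrative model only, following the reading in the abstract that the push or pop happens for every call or return while the copy happens only for the first one after an unresolved conditional branch. The class `FetchUnit`, the `branch_pending` flag, the instruction `length` parameter, and the modulo choice of the next stack are hypothetical, and the sketch does not handle running out of free stacks.

```python
class FetchUnit:
    """Hypothetical fetch-stage model with several call/return stacks."""

    def __init__(self, num_stacks=4):
        self.num_stacks = num_stacks
        self.stacks = [[] for _ in range(num_stacks)]
        self.current = 0             # index of the current call/return stack
        self.branch_pending = False  # unresolved branch fetched since last call/return

    def fetch_conditional_branch(self):
        self.branch_pending = True

    def _switch_stack_if_first(self):
        # First call/return after an unresolved conditional branch:
        # copy the current stack to another one and make that one current.
        if self.branch_pending:
            nxt = (self.current + 1) % self.num_stacks
            self.stacks[nxt] = list(self.stacks[self.current])
            self.current = nxt
            self.branch_pending = False

    def fetch_call(self, call_addr, length, target):
        self._switch_stack_if_first()
        # Push the address of the instruction that follows the call.
        self.stacks[self.current].append(call_addr + length)
        return target                # next fetch address

    def fetch_return(self):
        self._switch_stack_if_first()
        return self.stacks[self.current].pop()  # next fetch address
```

Because the older stack is left untouched after the copy, it still holds the return addresses that were valid before the speculative path began.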
11. The method for fast execution of call or return instructions according to claim 10, characterized in that the method further comprises:
Refraining from sending the call or return instruction to an execution unit of the microprocessor.
12. The method for fast execution of call or return instructions according to claim 10, characterized in that the microprocessor comprises a speculative pointer that points to the current call/return stack,
Wherein the designating step comprises updating the speculative pointer to point to the other call/return stack as the current call/return stack.
13. The method for fast execution of call or return instructions according to claim 12, characterized in that the microprocessor comprises a non-speculative pointer that points to one of the plurality of call/return stacks, wherein that call/return stack contains only return addresses associated with call or return instructions that are older in program order than all unresolved conditional branch instructions, and the method further comprises:
In response to resolving one of the unresolved conditional branch instructions and determining that the microprocessor mispredicted that conditional branch instruction, copying the non-speculative pointer to the speculative pointer.
14. The method for fast execution of call or return instructions according to claim 13, characterized in that the microprocessor comprises a corresponding counter associated with each of the call/return stacks, and the method further comprises:
After the step of copying the non-speculative pointer to the speculative pointer, clearing the value of the counter associated with the call/return stack pointed to by the non-speculative pointer.
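The misprediction recovery of claims 13 and 14 amounts to two assignments, shown here as a hedged Python sketch. The `CrsState` record and the function name are hypothetical, and the sketch assumes that branches resolve in program order, so everything recorded against the non-speculative stack is flushed together.

```python
from dataclasses import dataclass


@dataclass
class CrsState:
    """Hypothetical snapshot of the call/return stack bookkeeping."""
    stacks: list      # one list of return addresses per call/return stack
    counters: list    # one unresolved-branch counter per stack
    spec: int = 0     # speculative pointer
    nonspec: int = 0  # non-speculative pointer


def on_branch_mispredicted(s: CrsState) -> list:
    """Restore the non-speculative stack as the current stack."""
    s.spec = s.nonspec           # claim 13: copy non-speculative to speculative
    s.counters[s.nonspec] = 0    # claim 14: clear that stack's counter
    return s.stacks[s.spec]      # the stack used for subsequent fetches
```

No return addresses need to be repaired after the flush: the speculative copies are simply abandoned, which is the point of keeping several stacks.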
15. The method for fast execution of call or return instructions according to claim 13, characterized in that the microprocessor comprises a corresponding counter associated with each of the call/return stacks, and the method further comprises:
In response to fetching a conditional branch instruction, incrementing the value of the counter associated with the call/return stack pointed to by the speculative pointer.
16. The method for fast execution of call or return instructions according to claim 15, characterized in that the method further comprises:
In response to an execution unit resolving one of the unresolved conditional branch instructions and determining that the microprocessor correctly predicted that conditional branch instruction, decrementing the value of the counter associated with the call/return stack pointed to by the non-speculative pointer.
17. The method for fast execution of call or return instructions according to claim 16, characterized in that the method further comprises:
When the decrementing step reduces the value of the counter to 0, incrementing the non-speculative pointer.
18. The method for fast execution of call or return instructions according to claim 15, characterized in that the method further comprises:
After the speculative pointer is updated with the non-speculative pointer so that it points to another call/return stack as the current call/return stack, clearing the value of the counter associated with the call/return stack pointed to by the speculative pointer.
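The fetch-side counter bookkeeping of claims 15 and 18 can be sketched as below. This is an assumption-laden illustration rather than the claimed hardware: the class name, the default of four stacks, and the modulo choice of the next stack are hypothetical, and the test "the current stack's counter is nonzero" is used here as one possible way to detect the first call or return after an unresolved conditional branch. It also shows only the case where the speculative pointer moves to a newly designated stack.

```python
class SpeculativeBookkeeping:
    """Hypothetical fetch-side view of the per-stack counters."""

    def __init__(self, num_stacks=4):
        self.num_stacks = num_stacks
        self.stacks = [[] for _ in range(num_stacks)]
        self.counters = [0] * num_stacks
        self.spec = 0  # speculative pointer (current call/return stack)

    def on_fetch_conditional_branch(self):
        # Claim 15: count the branch against the current stack.
        self.counters[self.spec] += 1

    def on_fetch_call_or_return(self):
        # Assumption: a nonzero counter means an unresolved branch was
        # fetched since this stack became current.
        if self.counters[self.spec] > 0:
            nxt = (self.spec + 1) % self.num_stacks
            self.stacks[nxt] = list(self.stacks[self.spec])
            self.spec = nxt
            self.counters[self.spec] = 0  # claim 18: clear the new stack's counter
        return self.spec
```

Clearing the counter on the newly designated stack discards any stale count left over from an earlier use of that stack, so later branches are counted from zero.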
CN 201010126829 2009-03-04 2010-03-04 Microprocessor and method for analyzing related instruction Active CN101819522B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15735109P 2009-03-04 2009-03-04
US61/157,351 2009-03-04
US12/481,074 US7975132B2 (en) 2009-03-04 2009-06-09 Apparatus and method for fast correct resolution of call and return instructions using multiple call/return stacks in the presence of speculative conditional instruction execution in a pipelined microprocessor
US12/481,074 2009-06-09

Publications (2)

Publication Number Publication Date
CN101819522A true CN101819522A (en) 2010-09-01
CN101819522B CN101819522B (en) 2012-12-12

Family

ID=42654637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010126829 Active CN101819522B (en) 2009-03-04 2010-03-04 Microprocessor and method for analyzing related instruction

Country Status (1)

Country Link
CN (1) CN101819522B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6314514B1 (en) * 1999-03-18 2001-11-06 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that speculatively executes call and return instructions
CN1397880A (en) * 2001-05-04 2003-02-19 智慧第一公司 Imaginary branch target address high speed buffer storage attached with secondary predictor
CN1397876A (en) * 2001-05-04 2003-02-19 智慧第一公司 Appts. and method for replacing target address in imaginary branch target address high speed buffer storage
US20040143727A1 (en) * 2003-01-16 2004-07-22 Ip-First, Llc. Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
JP2006155374A (en) * 2004-11-30 2006-06-15 Fujitsu Ltd Branch prediction device and branch prediction method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI478054B (en) * 2011-12-29 2015-03-21 Intel Corp Programmable predication logic in command streamer instruction execution
CN104252335A (en) * 2013-06-28 2014-12-31 国际商业机器公司 Predictive fetching and decoding for selected return instructions
CN104252335B (en) * 2013-06-28 2017-04-12 国际商业机器公司 Predictive fetching and decoding for selected return instructions
CN104951697A (en) * 2014-03-28 2015-09-30 英特尔公司 Return-target restrictive return from procedure instructions, processors, methods, and systems
CN104951697B (en) * 2014-03-28 2018-07-03 英特尔公司 Return-target restrictive return from procedure instructions, processors, methods, and systems
CN108196884A (en) * 2014-04-25 2018-06-22 安华高科技通用Ip(新加坡)公司 Computer information processor using generation renames
CN108196884B (en) * 2014-04-25 2022-05-31 安华高科技股份有限公司 Computer information processor using generation renames

Also Published As

Publication number Publication date
CN101819522B (en) 2012-12-12

Similar Documents

Publication Publication Date Title
CN103543985B (en) Microprocessor and method for executing related instructions
CN101876891B (en) Microprocessor and method for quickly executing conditional branch instructions
TWI288351B (en) Pipelined microprocessor, apparatus, and method for performing early correction of conditional branch instruction mispredictions
CN101876889B (en) Method for performing a plurality of quick conditional branch instructions and relevant microprocessor
CN101149701B (en) Method and apparatus for redirection of machine check interrupts in multithreaded systems
EP0399760B1 (en) Paired instruction processor branch recovery mechanism
CN101324840B (en) Method and system for performing independent loading for reinforcement processing unit
CN102483696A (en) Methods and apparatus to predict non-execution of conditional non-branching instructions
JPH09185506A (en) Method and system for executing instruction inside processor
JPH03116236A (en) Method and system for processing exception
US7143271B2 (en) Automatic register backup/restore system and method
CN101884025B (en) Method and system for accelerating procedure return sequences
CN101819522B (en) Microprocessor and method for analyzing related instruction
CN101390046A (en) Method and apparatus for repairing a link stack
CN101819523B (en) Microprocessor and related instruction execution method
US9448800B2 (en) Reorder-buffer-based static checkpointing for rename table rebuilding
CN102163139A (en) Microprocessor fusing loading arithmetic/logic operation and skip macroinstructions
US6421774B1 (en) Static branch predictor using opcode of instruction preceding conditional branch
US5713012A (en) Microprocessor
KR100508320B1 (en) Processor having replay architecture with fast and slow replay paths
CN101324841B (en) Method and system for performing independent loading for reinforcement processing unit

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant