CN1326037C - Method and device for correcting internal call or return stack in microprocessor - Google Patents

Method and device for correcting internal call or return stack in microprocessor

Info

Publication number
CN1326037C
Authority
CN
China
Prior art keywords
stratum
call
signal
storehouse
microprocessor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB2004100038255A
Other languages
Chinese (zh)
Other versions
CN1558326A (en
Inventor
汤玛斯·C·麦当劳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INTELLIGENCE FIRST CO
Original Assignee
INTELLIGENCE FIRST CO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INTELLIGENCE FIRST CO filed Critical INTELLIGENCE FIRST CO
Priority to CNB2004100038255A priority Critical patent/CN1326037C/en
Publication of CN1558326A publication Critical patent/CN1558326A/en
Application granted granted Critical
Publication of CN1326037C publication Critical patent/CN1326037C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

The present invention discloses an apparatus for correcting an internal call/return stack (CRS) in a pipelined microprocessor. Whenever the microprocessor updates the CRS in response to a call or return instruction, correction information is also stored in a first correction stack. The microprocessor comprises two different stages capable of detecting invalidating events (such as branch mispredictions or exceptions). Once the call or return instruction passes the first detecting stage, the correction information corresponding to that instruction is transferred from the first correction stack to a second correction stack. If an invalidating event is detected in the upper detecting stage, only the correction information in the first stack is used to correct the CRS. If the invalidating event is detected in the lower detecting stage, the correction information in the second stack is used to correct the CRS as well.

Description

Method and apparatus for correcting an internal call/return stack in a microprocessor
Technical field
The present invention relates to the field of call/return stacks in microprocessors, and in particular to a method and apparatus for keeping a call/return stack consistent with main memory.
Background technology
A microprocessor is a digital device that executes the instructions specified by a computer program. Modern microprocessors commonly use an internal call/return stack to reduce the pipeline bubbles caused by the lengthy memory accesses associated with call and return instructions.
A call instruction is an instruction that changes program flow to a subroutine whose address is specified by the call instruction. When a call instruction executes, the return address (that is, the address of the instruction following the call instruction) is pushed onto a stack in main memory designated by a stack pointer register of the microprocessor, and the address of the subroutine is loaded into the instruction pointer register of the microprocessor. A return instruction is an instruction that changes program flow from the subroutine back to the instruction following the call instruction in the program.
When a return instruction executes, the return address previously pushed onto the stack is popped from the main memory stack and loaded into the instruction pointer register.
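The call and return behaviour described above can be illustrated with a minimal sketch. The class name `Machine`, the `instr_len` parameter, and the use of a Python list for the main memory stack are assumptions made for illustration only; the sketch also assumes that the subroutine address and the popped return address are loaded into the instruction pointer.

```python
class Machine:
    """Toy model of call/return semantics (illustrative names, not from the patent)."""

    def __init__(self, ip=0):
        self.ip = ip              # instruction pointer
        self.memory_stack = []    # stands in for the stack in main memory

    def call(self, subroutine_addr, instr_len):
        # The return address is the address of the instruction after the call.
        self.memory_stack.append(self.ip + instr_len)
        self.ip = subroutine_addr

    def ret(self):
        # Pop the return address that the matching call pushed.
        self.ip = self.memory_stack.pop()


m = Machine(ip=0x100)
m.call(0x400, instr_len=5)   # return address 0x105 is pushed
m.ret()                      # execution resumes at 0x105
```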
In a microprocessor that includes a call/return stack, when a call instruction executes, the return address is pushed onto the call/return stack so that instruction fetching can continue, while the later stages of the pipeline use the return address to update the main memory stack. Conversely, when a return instruction executes, the return address is popped from the call/return stack so that instruction fetching can continue without waiting for the lengthy fetch of the return address from the main memory stack.
The effectiveness of a call/return stack depends mainly on the processor's ability to keep the call/return stack consistent with main memory. Speculative updates of the call/return stack in response to call or return instructions can make it inconsistent with main memory. The update is speculative because an instruction ahead of the call or return instruction in the pipeline may cause an invalidating event, such as an exception, which requires the pipeline to flush all instructions following the instruction that caused the exception, including the speculatively executed call or return instruction. Similarly, a branch instruction executed ahead of the call or return instruction may have been mispredicted, which also requires the pipeline to flush the speculatively executed call or return instruction. Because the call/return stack was updated speculatively in response to the call or return instruction, whereas the main memory stack is not updated until the call or return instruction is no longer speculative, the contents of the call/return stack and the contents of main memory become inconsistent.
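A small sketch can make the inconsistency concrete. It assumes, purely for illustration, that the internal stack is updated when a call is decoded and that the main memory stack is updated only when the call retires; the function names and addresses are hypothetical.

```python
crs = []           # internal call/return stack, updated speculatively at decode
memory_stack = []  # main memory stack, updated only when a call retires


def decode_call(return_addr):
    # Speculative update: happens before the call is known to be valid.
    crs.append(return_addr)


def retire_call(return_addr):
    # Non-speculative update: happens once the call is certain to execute.
    memory_stack.append(return_addr)


# First call is decoded and later retires normally.
decode_call(0x105)
retire_call(0x105)

# Second call is decoded speculatively, then flushed (for example because an
# older branch was mispredicted), so it never retires.
decode_call(0x205)

# Without a correction mechanism the two stacks now disagree.
stacks_consistent = (crs == memory_stack)
```

The stale `0x205` entry left in `crs` is the kind of state the correction apparatus described in this document is meant to undo.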
Reference is made to United States Patent No. 6,314,514, "METHOD AND APPARATUS FOR CORRECTING AN INTERNAL CALL/RETURN STACK IN A MICROPROCESSOR THAT SPECULATIVELY EXECUTES CALL AND RETURN INSTRUCTIONS", which describes an apparatus and method for keeping a call/return stack consistent with main memory regardless of speculative execution of call or return instructions. However, the apparatus described in that patent can only be used in a microprocessor in which a single pipeline stage detects branch mispredictions; it cannot be applied effectively to a microprocessor that detects branch mispredictions in multiple pipeline stages.
Accordingly, the object of the present invention is to provide a method and apparatus for correcting an internal call/return stack in a microprocessor that can detect incorrectly speculated call or return instructions in multiple pipeline stages.
Summary of the invention
To achieve the above object, a preferred embodiment of the present invention provides a split correction apparatus. The split correction apparatus keeps correction information for speculative call or return instructions separately, according to the section of the pipeline in which those instructions reside, so that when an invalidating event is detected the correction can be applied selectively, based on the stage in which each call or return instruction resides relative to the stage that detected the invalidating event. One object of the present invention is to provide an apparatus for correcting a call/return stack in a pipelined microprocessor. The apparatus includes a first stack having a first plurality of entries for storing correction information associated with call or return instructions present in a first plurality of stages of the microprocessor pipeline. The apparatus also includes a second stack, coupled to the first stack, having a second plurality of entries for storing correction information associated with call or return instructions present in a second plurality of stages of the microprocessor pipeline. The apparatus further includes control logic coupled to the first and second stacks. The control logic receives a control signal indicating that a call or return instruction is passing from the first plurality of stages to the second plurality of stages, and in response moves the correction information associated with that call or return instruction from the first stack to the second stack.
Another object of the present invention is to provide a pipelined microprocessor. The microprocessor includes a call/return stack (CRS). It also includes a first and a second pipeline stage, which generate a true value on a first and a second signal, respectively, to indicate detection of a branch instruction misprediction in the first and second pipeline stage, respectively; the first stage is above the second stage in the pipeline. The microprocessor further includes an apparatus coupled to receive the first and second signals. The apparatus maintains first information associated with call or return instructions present in pipeline stages above the first stage, and second information associated with call or return instructions present in pipeline stages between the first and second stages. If the first signal is true, the apparatus uses the first information to selectively correct the CRS; if the second signal is true, the apparatus uses both the first and the second information to selectively correct the CRS.
Another object of the present invention is to provide a method for keeping a call/return stack (CRS) of a pipelined microprocessor consistent with a memory coupled to the microprocessor. The method includes receiving a request to update the CRS in response to a call or return instruction. The method also includes storing correction information in a first buffer in response to the receiving step. After the storing, the method includes detecting whether the call or return instruction has passed a first stage of the microprocessor pipeline that is configured to detect an invalidating event. Finally, the method includes moving a portion of the correction information from the first buffer to a second buffer in response to the detecting step.
A further object of the present invention is to provide a computer data signal embodied in a transmission medium, comprising computer-readable program code for providing an apparatus for correcting a call/return stack in a pipelined microprocessor. The program code includes first program code for providing a first stack having a first plurality of entries for storing correction information associated with call or return instructions present in a first plurality of stages of the microprocessor pipeline. The program code also includes second program code for providing a second stack, coupled to the first stack, having a second plurality of entries for storing correction information associated with call or return instructions present in a second plurality of stages of the microprocessor pipeline. In addition, the program code includes third program code for providing control logic coupled to the first and second stacks. The control logic receives a control signal indicating that a call or return instruction is passing from the first plurality of stages to the second plurality of stages, and in response to the control signal moves the correction information associated with that call or return instruction from the first stack to the second stack.
An advantage of the present invention is that invalidating events (such as branch mispredictions or exceptions) can be detected in multiple stages of the microprocessor pipeline while the call/return stack is still kept consistent with main memory. The invention allows branch mispredictions to be detected early in the pipeline rather than only at its bottom. This reduces pipeline bubbles, because the misprediction is corrected and the correct instructions are fetched sooner than when mispredictions can be detected only at the bottom of the pipeline.
Other features, benefits, and advantages of the present invention will become clearer upon study of the remainder of this specification and the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a pipelined microprocessor according to the present invention.
Fig. 2 is a block diagram of the call/return stack correction apparatus included in the microprocessor of Fig. 1 according to the present invention.
Fig. 3 to Fig. 7 are flowcharts of the operation of the call/return stack correction apparatus of Fig. 2 according to the present invention.
The reference numerals are as follows:
100-microprocessor
102-I stage
104-F stage
106-X stage
108-R stage
112-A stage
114-J stage
116-D stage
118-G stage
122-H stage
124-E stage
126-S stage
128-W stage
202-control logic
204-two-input multiplexer
206-call/return stack (CRS)
208-return address correction stack (RACS)
212-three-input multiplexer
214-instruction pointer register
216-high-order correction command stack (HCCS)
218-valid bits
222-six-input multiplexer
224-low-order correction command stack (LCCS)
226-valid bits
228-xfer_cmd_mux_sel signal
232-call signal
234-return signal
236-subroutine_addr signal
238-return_addr signal
242-J_stalled signal
244-J_call/ret signal
246-J_mispredict signal
248_cmd signal
252-S_call/ret signal
254-S_mispredict signal
256-S_exception signal
258-RACS_ret_addr signal
262-new_return_addr signal
264-CRS_mux_sel signal
266-push_CRS signal
268-pop_CRS signal
272-push_RACS signal
274-pop_RACS signal
276-new_subroutine_addr signal
278-IP_mux_sel signal
282-IP_load signal
284-push_HCCS signal
286-pop_HCCS signal
288-valid_bits_HCCS signal
291-J_correct_addr signal
292-CRS_ret_addr signal
293-S_correct_addr signal
294-push_LCCS signal
295-invalidating_event_addr signal
296-pop_LCCS signal
297-top_hccs_cmd signal
298-valid_bits_LCCS signal
299-top_hccs_cmd signal
302-746-operation flow of the call/return stack correction apparatus
Embodiment
Refer to Fig. 1, which is a block diagram of a pipelined microprocessor 100 according to the present invention. In a preferred embodiment, microprocessor 100 comprises twelve stages, as shown in Fig. 1.
Microprocessor 100 includes an I stage 102 (instruction fetch stage 102). I stage 102 provides a fetch address to an instruction cache in order to fetch instructions for microprocessor 100 to execute. If the fetch address misses in the instruction cache, I stage 102 fetches the missing cache line from a main memory coupled to microprocessor 100. In a preferred embodiment, I stage 102 includes a branch target address cache, which is accessed alongside the instruction cache of I stage 102 using the instruction cache fetch address. The branch target address cache stores the target addresses of previously executed branch instructions, together with prediction information about whether each branch will be taken. Based on the instruction cache fetch address, the branch target address cache generates a predicted outcome and a predicted target address for a branch instruction. Branch instructions are well known in the microprocessor art and include instructions that change program flow. Specifically, a branch instruction changes the value of an instruction pointer register or program counter register (for example, instruction pointer (IP) register 214 of Fig. 2), which holds the memory address of the next instruction to be executed. In a preferred embodiment, I stage 102 comprises multiple stages.
Microprocessor 100 also includes an F stage 104 (instruction format stage 104), coupled to I stage 102. F stage 104 includes instruction decode and format logic for decoding and formatting instructions. In a preferred embodiment, microprocessor 100 is an x86 processor, whose instruction set allows instructions of varying length. F stage 104 receives a stream of instruction bytes from the instruction cache and parses the stream into separate groups of bytes according to the length of each instruction, each group forming one x86 instruction. In particular, the decode logic of F stage 104 decodes call and return instructions and, in response to a decoded call or return instruction, updates a call/return stack (CRS) 206 of the microprocessor, shown in Fig. 2. CRS 206 and F stage 104 are described in detail below with reference to Fig. 2.
Microprocessor 100 also includes an X stage 106 (translate stage 106), coupled to F stage 104. X stage 106 includes an instruction translator that translates x86 macroinstructions into microinstructions, which are executed by the remaining stages of the pipeline.
Microprocessor 100 also includes an R stage 108 (register stage 108), coupled to X stage 106. R stage 108 includes a user-visible register set in addition to other non-user-visible registers. Instruction operands for the translated microinstructions are stored in the registers of R stage 108 for use by the subsequent stages of pipeline 100 in executing the microinstructions.
Microprocessor 100 also includes an A stage 112 (address stage 112), coupled to R stage 108. A stage 112 includes address generation logic that receives operands and microinstructions from R stage 108 and generates the addresses the microinstructions require, for example memory addresses for load/store microinstructions. In particular, A stage 112 also computes branch instruction target addresses.
Microprocessor 100 also includes a J stage 114, coupled to A stage 112. J stage 114 receives the branch instruction target address computed by A stage 112 and can detect branch mispredictions. That is, J stage 114 detects that a prediction made for a branch instruction by an earlier stage of pipeline 100 was wrong, where a misprediction means that the target address computed by A stage 112 does not match the previously predicted target address. J stage 114 is described in detail below with reference to Fig. 2.
Microprocessor 100 also includes a D stage 116 (data stage 116), coupled to J stage 114. D stage 116 includes logic for accessing the data specified by the addresses generated by A stage 112. D stage 116 loads data from main memory into microprocessor 100. In particular, D stage 116 loads the return address from the main memory stack in response to a return instruction. The return address loaded from the main memory stack is compared with the return address previously popped from call/return stack (CRS) 206 in response to the return instruction, to detect whether CRS 206 held the correct value. If the return address from CRS 206 does not match the return address from the main memory stack, pipeline 100 is flushed and restarted using the return address from the main memory stack. D stage 116 also includes a data cache, which caches data from main memory within microprocessor 100. In a preferred embodiment, the data cache is a three-cycle cache. G stage 118 is coupled to D stage 116 and serves as the second stage of the data cache access. H stage 122 is coupled to G stage 118 and serves as the third stage of the data cache access.
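The return-address check performed in the data stage can be sketched as a simple comparison. The function name and the `(flush_needed, restart_addr)` return convention are assumptions for illustration, not part of the patent.

```python
def check_return_address(crs_return_addr, memory_return_addr):
    """Compare the address predicted by the CRS with the one loaded from memory.

    Returns (flush_needed, restart_addr). restart_addr is None when the
    prediction was correct and no flush is required.
    """
    if crs_return_addr != memory_return_addr:
        # The CRS was wrong: flush and restart at the address from memory.
        return True, memory_return_addr
    return False, None


ok = check_return_address(0x105, 0x105)     # prediction matched
wrong = check_return_address(0x105, 0x200)  # mismatch, restart at 0x200
```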
Microprocessor 100 also includes an E stage 124 (execute stage 124), coupled to H stage 122. E stage 124 has execution logic (for example, an arithmetic logic unit) that executes microinstructions using the data and operands provided by the preceding stages. In particular, E stage 124 also generates a resolved target address and a resolved outcome for all branch instructions. That is, the target address produced by E stage 124 is known to be the correct target address of every branch instruction, and all predicted target addresses must match it. Likewise, E stage 124 generates a resolved outcome for all branch instructions, that is, whether or not the branch is taken.
Microprocessor 100 also includes an S stage 126 (store stage 126), coupled to E stage 124. S stage 126 stores the results of microinstruction execution received from E stage 124 to main memory. In addition, the branch target address and outcome computed by E stage 124 are provided to S stage 126, and S stage 126 uses them to update the branch target address cache of I stage 102. S stage 126 can detect whether a prediction made by an earlier stage of pipeline 100 was a misprediction. S stage 126 can also detect whether an exception has occurred. An exception here includes any change in normal program execution. An exception may be caused by a hardware or a software event, including hardware interrupts, software interrupts, non-maskable interrupts, instruction trace execution, breakpoints, arithmetic overflow or underflow, page faults, misaligned memory accesses, memory protection violations, undefined instructions, hardware faults, power failures, divide by zero, segment limit errors, floating-point errors, invalid descriptors, and component (for example, coprocessor) failures. S stage 126 is described in detail below with reference to Fig. 2.
Microprocessor 100 also includes a W stage 128 (write-back stage 128), coupled to S stage 126. W stage 128 writes the results received from S stage 126 back to the registers of R stage 108 to update the state of microprocessor 100.
As shown in Fig. 1, microprocessor 100 includes more than one stage that can detect invalidating events, such as branch mispredictions or instruction exceptions, which require microprocessor 100 to correct the speculative updates made to call/return stack (CRS) 206 in response to call or return instructions. This differs substantially from prior-art microprocessors, such as that of the United States patent mentioned above, which can detect invalidating events in only one pipeline stage. Accordingly, if an invalidating event is detected in J stage 114, the correction apparatus of Fig. 2 corrects CRS 206 for the call or return instructions in J stage 114 and the stages above it. If instead an invalidating event is detected in S stage 126, the correction apparatus of Fig. 2 corrects CRS 206 for the call or return instructions in S stage 126 and the stages above it.
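The selective correction described above can be sketched in a simplified form. This is an illustrative model only: it assumes each correction record carries the undo command together with any return address needed (the patent instead keeps return addresses in a separate stack), it ignores valid bits and stalls, and all class and method names are hypothetical.

```python
from collections import deque


class CrsCorrector:
    """Simplified two-level undo log for a speculative call/return stack."""

    def __init__(self):
        self.crs = []         # speculative call/return stack (top at the end)
        self.upper = deque()  # records for instructions at or above the first detecting stage
        self.lower = deque()  # records for instructions between the two detecting stages

    def on_call(self, return_addr):
        self.crs.append(return_addr)
        self.upper.append(("pop", None))   # undoing a push means popping

    def on_return(self):
        addr = self.crs.pop()
        self.upper.append(("push", addr))  # undoing a pop means pushing back
        return addr

    def passed_first_stage(self):
        # The oldest upper record moves to the lower log.
        self.lower.append(self.upper.popleft())

    def passed_second_stage(self):
        # The instruction is no longer speculative; drop its record.
        self.lower.popleft()

    def _undo(self, records):
        while records:
            cmd, addr = records.pop()      # newest record first
            if cmd == "pop":
                self.crs.pop()
            else:
                self.crs.append(addr)

    def invalid_at_first_stage(self):
        self._undo(self.upper)             # only the upper records are undone

    def invalid_at_second_stage(self):
        self._undo(self.upper)             # upper records are newer, undo them first
        self._undo(self.lower)


c = CrsCorrector()
c.on_call(0x10)
c.passed_first_stage()
c.on_call(0x20)
c.on_return()
c.invalid_at_first_stage()   # undoes only the last call and return
```

Keeping two logs means an event detected at the upper stage leaves the records of older instructions, which are still valid, untouched.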
Refer to Fig. 2, which is a block diagram of the call/return stack correction apparatus included in microprocessor 100 of Fig. 1 according to the present invention. Fig. 2 shows some of pipeline stages 102 to 128 of Fig. 1. As shown in Fig. 2, microprocessor 100 also includes control logic 202, a two-input multiplexer 204 coupled to control logic 202, a call/return stack (CRS) 206 coupled to multiplexer 204, a return address correction stack (RACS) 208 coupled to CRS 206, a three-input multiplexer 212 coupled to RACS 208, an instruction pointer register 214 coupled to multiplexer 212, a high-order correction command stack (HCCS) 216 coupled to control logic 202, a six-input multiplexer 222 coupled to HCCS 216, and a low-order correction command stack (LCCS) 224 coupled to multiplexer 222.
When a call instruction is decoded, F stage 104 generates a true value on a call signal 232 provided to control logic 202. In a preferred embodiment, a call instruction comprises an x86 CALL instruction. Control logic 202 responds to call signal 232 by speculatively updating call/return stack (CRS) 206, as described in detail below. In addition, F stage 104 provides control logic 202 with the address of the subroutine specified by the call instruction on a subroutine_addr signal 236. F stage 104 also provides control logic 202 with the return address of the call instruction on a return_addr signal 238. Finally, when a return instruction is decoded, F stage 104 generates a true value on a return signal 234 provided to control logic 202. In a preferred embodiment, a return instruction comprises an x86 RET instruction. Control logic 202 responds to return signal 234 by speculatively updating CRS 206, as described in detail below.
When a call or return instruction reaches J stage 114, J stage 114 generates a true value on a J_call/ret signal 244 provided to control logic 202. In a preferred embodiment, F stage 104 generates a status bit indicating that an instruction is a call or return instruction, and this status bit travels down the pipeline with the instruction. If the status bit is true, J stage 114 generates a true value on J_call/ret signal 244.
When J stage 114 is stalled, it generates a true value on J_stalled signal 242 to control logic 202; otherwise, J stage 114 generates a false value on J_stalled signal 242. A true value on J_stalled signal 242 indicates that the instruction currently in J stage 114 will not advance to the next stage (D stage 116) in the next clock cycle; a false value indicates that the instruction currently in J stage 114 will advance to the next stage in the next clock cycle.
If a branch instruction reaches J stage 114 and J stage 114 detects that the branch instruction was mispredicted, J stage 114 generates a true value on J_mispredict signal 246. All stages above J stage 114 must then be flushed so that microprocessor 100 can branch to the correct target of the branch instruction: either the instruction following the branch instruction (if the branch was mispredicted) or the correct target address of the branch instruction (if the branch was predicted correctly but the predicted address was wrong). For branches that can be resolved in J stage 114, J stage 114 also computes the correct target of the branch instruction and provides the correct address to control logic 202 on a J_correct_addr signal 291. In this description, J stage 114 and the stages above it in pipeline 100 are referred to as the upper, or high-order, stages, and the stages below J stage 114 are referred to as the lower, or low-order, stages.
When a call or return instruction reaches S stage 126, S stage 126 generates a true value on an S_call/ret signal 252 provided to control logic 202. In a preferred embodiment, S stage 126 generates a true value on S_call/ret signal 252 when the status bit generated by F stage 104, which indicates that the instruction is a call or return instruction, is true.
If a branch instruction reaches S stage 126 and S stage 126 detects that the branch instruction was previously mispredicted, S stage 126 generates a true value on an S_mispredict signal 254. S stage 126 and all stages above it are then flushed so that microprocessor 100 can branch to the correct target of the branch instruction. S stage 126 also provides the correct branch target to control logic 202 on an S_correct_addr signal 293.
If an instruction reaches S stage 126 and S stage 126 detects that the instruction has caused an exception, S stage 126 generates a true value on an S_exception signal 256. S stage 126 and all stages above it must then be flushed so that microprocessor 100 can branch to an exception handler. In a preferred embodiment, a branch instruction detected as mispredicted by J stage 114 or S stage 126, or an instruction that raises an exception in S stage 126, may itself be a call or return instruction.
One input end of multiplexer 204 can receive a new_return_addr signal 262 that is produced by steering logic 202.When F stratum 104 produced a true value on a call signal 232, steering logic 202 can be delivered to multi-function device 204 via new_return_addr signal 262 with receive and pass through the return address of return_addr signal 238 from F stratum 104.Second input end of multiplexer 204 can be corrected storehouse (RACS) 208 places from the return address and receive a RACS_ret_addr signal 258.The numerical value of RACS_ret_addr signal 258 is RACS 208 top item purpose return addresses, and its content will describe in detail in the back.The output of multiplexer 204 is as calling or the input of return stack (CRS) 206.Steering logic 202 can produce a CRS_mux_sel signal 264 with control multiplexer 204.When F stratum 104 illustrates that via call signal 232 existence of a call instruction comes predictive ground to upgrade CRS 206 to utilize the call instruction return address, steering logic 202 can produce a value on CRS_mux_sel signal 264, so that multiplexer 204 is selected new_return_addr signal 262, its situation will be described in detail in the back.Control signal 202 produces a value on CRS_mux_sel signal 264, and make multiplexer 204 can respond an invalid incident, for example select the previous wrong speculative update of RACS_ret_addr signal 258 with corrigendum CRS 206 by indicated invalid events such as J_mispredict signal 246, S_mispredict signal 254 or S_exception signals 256.
CRS 206 comprises an array of storage elements (entries) for storing call instruction return addresses. In a preferred embodiment, CRS 206 comprises 16 entries for storing 16 return addresses; to simplify the drawing, only three entries are shown in Fig. 2. The entries of CRS 206 are arranged as a stack. Control logic 202 generates a true value on a push_CRS signal 266 to push a return address from multiplexer 204 onto CRS 206. When control logic 202 pushes a return address onto CRS 206, the return addresses already in CRS 206 are each shifted down one entry (the bottom entry is pushed off), and the return address provided by multiplexer 204 is loaded into the top entry of CRS 206. Control logic 202 also generates a true value on a pop_CRS signal 268 to pop a return address off CRS 206. When control logic 202 pops a return address off CRS 206, the return address in the top entry is removed and provided on a CRS_ret_addr signal 292, and the return addresses in the other entries are each shifted up one entry. The detailed operation is described below.
The return address correction stack (RACS) 208 comprises an array of storage elements (entries) for storing call instruction return addresses. In a preferred embodiment, RACS 208 comprises three entries for storing three return addresses. The entries of RACS 208 are arranged as a stack. Control logic 202 generates a true value on a push_RACS signal 272 to push a return address from CRS_ret_addr signal 292 onto RACS 208. When control logic 202 pushes a return address onto RACS 208, the return addresses already in RACS 208 are each shifted down one entry (the bottom entry is pushed off), and the return address provided on CRS_ret_addr signal 292 is loaded into the top entry of RACS 208. Control logic 202 also generates a true value on a pop_RACS signal 274 to pop a return address off RACS 208. When control logic 202 pops a return address off RACS 208, the return address in the top entry is removed and provided on RACS_ret_addr signal 258, and the return addresses in the other entries are each shifted up one entry. The detailed operation is described below.
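The shift-down and shift-up behavior described for CRS 206 and RACS 208 can be pictured with a small software model. The following Python sketch is illustrative only: the class name `FixedStack`, its list-based layout, and the example addresses are assumptions made for this example and are not part of the disclosed hardware.

```python
class FixedStack:
    """Illustrative fixed-depth stack; entries[0] models the top entry."""

    def __init__(self, depth):
        self.entries = [None] * depth

    def push(self, value):
        # Existing entries shift down one slot; the bottom entry is pushed off.
        self.entries = [value] + self.entries[:-1]

    def pop(self):
        # The top entry is returned; the remaining entries shift up one slot.
        top = self.entries[0]
        self.entries = self.entries[1:] + [None]
        return top


# Three-entry stack, as drawn for CRS 206 in Fig. 2 (addresses are made up).
crs = FixedStack(depth=3)
for return_addr in (0x100, 0x200, 0x300, 0x400):
    crs.push(return_addr)

after_pushes = list(crs.entries)  # 0x100 has been pushed off the bottom
popped = crs.pop()                # models the value driven on CRS_ret_addr
```

A fourth push onto a three-entry stack silently discards the oldest return address, which is why the number of entries matters when sizing the stacks.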
One input of multiplexer 212 receives a new_subroutine_addr signal 276 generated by control logic 202. When F stage 104 generates a true value on call signal 232, control logic 202 forwards the subroutine address received from F stage 104 on subroutine_addr signal 236 to multiplexer 212 via new_subroutine_addr signal 276. A second input of multiplexer 212 receives CRS_ret_addr signal 292 from CRS 206. A third input of multiplexer 212 receives an invalidating_event_addr signal 295 from control logic 202. The output of multiplexer 212 is the input to instruction pointer register 214. Control logic 202 generates a true value on an IP_load signal 282 to load the output of multiplexer 212 into instruction pointer register 214. When F stage 104 indicates the presence of a call instruction via call signal 232, control logic 202 generates a value on an IP_mux_sel signal 278 that causes multiplexer 212 to select new_subroutine_addr signal 276, so that microprocessor 100 branches to the subroutine address of the call instruction, as described below with reference to Fig. 3. When F stage 104 indicates the presence of a return instruction via return signal 234, control logic 202 generates a value on IP_mux_sel signal 278 that causes multiplexer 212 to select CRS_ret_addr signal 292, so that microprocessor 100 branches to the return address stored in CRS 206 by a previously executed call instruction, as described below with reference to Fig. 3. When an invalidating event occurs, such as a branch misprediction or an instruction exception, control logic 202 generates a value on IP_mux_sel signal 278 that causes multiplexer 212 to select invalidating_event_addr signal 295, but only after CRS 206 has been corrected, as described below with reference to Fig. 6 and Fig. 7. Specifically, if the invalidating event is a branch misprediction detected at J stage 114, control logic 202 forwards J_correct_addr signal 291 onto invalidating_event_addr signal 295; if the invalidating event is a branch misprediction detected at S stage 126, control logic 202 forwards S_correct_addr signal 293 onto invalidating_event_addr signal 295; and if the invalidating event is an instruction exception detected at S stage 126, control logic 202 forwards onto invalidating_event_addr signal 295 the address of a microcode exception handler routine in the microcode memory of microprocessor 100, where the routine handles the particular exception.
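The selection performed by multiplexer 212 can be summarized as a lookup keyed on the kind of event. The sketch below is a simplified software analogy, not the disclosed circuit: the function name `next_instruction_pointer`, the event strings, the dictionary key `exception_handler_addr`, and the example addresses are all hypothetical names chosen for this illustration.

```python
def next_instruction_pointer(event, addrs):
    """Pick the address loaded into the instruction pointer (multiplexer 212)."""
    if event == "call":
        return addrs["new_subroutine_addr"]
    if event == "return":
        return addrs["CRS_ret_addr"]
    # Every invalidating event uses the third input, invalidating_event_addr;
    # which address is forwarded onto it depends on the event.
    sources = {
        "J_mispredict": "J_correct_addr",
        "S_mispredict": "S_correct_addr",
        "S_exception": "exception_handler_addr",
    }
    if event not in sources:
        raise ValueError(f"unknown event: {event}")
    return addrs[sources[event]]


# Made-up addresses, for illustration only.
addrs = {
    "new_subroutine_addr": 0x2000,
    "CRS_ret_addr": 0x1004,
    "J_correct_addr": 0x3000,
    "S_correct_addr": 0x4000,
    "exception_handler_addr": 0xF000,
}
call_target = next_instruction_pointer("call", addrs)
exception_target = next_instruction_pointer("S_exception", addrs)
```

The model ignores timing: in the described apparatus the invalidating event address is selected only after CRS 206 has been corrected.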
The high-order correction command stack (HCCS) 216 comprises an array of storage elements (entries) for storing commands used to correct CRS 206. In a preferred embodiment, HCCS 216 comprises six entries for storing six commands. The entries are arranged as a stack. The content of the top entry of HCCS 216 is provided on a top_hccs_cmd signal 297, which is supplied to control logic 202. Control logic 202 generates a true value on a push_HCCS signal 284 to push onto HCCS 216 the command provided on a cmd signal 248, which is the input to HCCS 216. When control logic 202 pushes a command onto HCCS 216, the commands already in HCCS 216 are each shifted down one entry (the bottom entry is pushed off), and the command on cmd signal 248 is loaded into the top entry of HCCS 216. Control logic 202 also generates a true value on a pop_HCCS signal 286 to pop a command off HCCS 216. When control logic 202 pops a command off HCCS 216, the command in the top entry is removed and the commands in the other entries are each shifted up one entry. HCCS 216 stores commands for correcting CRS 206 that are associated with call or return instructions present in the stages of pipeline 100 at or above J stage 114; LCCS 224, by contrast, stores commands for correcting CRS 206 that are associated with call or return instructions present in the stages of pipeline 100 at or below J stage 114. The operation of HCCS 216 and LCCS 224 is described in detail below.
Control logic 202 stores two kinds of command in HCCS 216: a POP-to-correct command and a PUSH-to-correct command. A POP-to-correct command instructs the correction apparatus to pop a return address off CRS 206, to correct for a call instruction that was speculatively executed and is now being flushed from pipeline 100. A PUSH-to-correct command instructs the correction apparatus to pop a return address off RACS 208 and then push it onto CRS 206, to correct for a return instruction that was speculatively executed and is now being flushed from pipeline 100.
Microprocessor 100 also includes storage elements for storing a valid bit 218 associated with each entry of HCCS 216. The valid bits 218 are arranged as a stack corresponding to HCCS 216. A true valid bit 218 indicates that the command in the corresponding entry of HCCS 216 is valid; otherwise the command is invalid. Control logic 202 reads and writes the valid bits 218 via a valid_bits_HCCS signal 288. Specifically, when control logic 202 pushes a command onto HCCS 216, the valid bits 218 shift down and a true value is loaded into the top valid bit 218, because a valid command has been pushed into the corresponding HCCS 216 entry. Conversely, when control logic 202 pops a command off HCCS 216, the valid bits 218 shift up and a false value is loaded into the bottom valid bit 218. The valid bits 218 are initialized to false. In addition, the valid bits 218 can be read and written individually via valid_bits_HCCS signal 288 in order to remove (that is, invalidate) the bottom valid entry of HCCS 216, as described below.
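The interplay between the command entries and their valid bits can be modeled in a few lines. This is a sketch under stated assumptions: the class `CommandStack` and the method `take_bottom_valid` are hypothetical names, and index 0 is taken to be the top entry; the hardware encoding of commands is not specified here.

```python
class CommandStack:
    """Illustrative command stack with one valid bit per entry (top is index 0)."""

    def __init__(self, depth):
        self.cmds = [None] * depth
        self.valid = [False] * depth  # valid bits are initialized to false

    def push(self, cmd):
        # Commands and valid bits shift down; the new top entry becomes valid.
        self.cmds = [cmd] + self.cmds[:-1]
        self.valid = [True] + self.valid[:-1]

    def pop(self):
        # Commands and valid bits shift up; a false value enters at the bottom.
        cmd = self.cmds[0]
        self.cmds = self.cmds[1:] + [None]
        self.valid = self.valid[1:] + [False]
        return cmd

    def take_bottom_valid(self):
        # Invalidate the bottom-most valid entry in place and return its command.
        for i in range(len(self.valid) - 1, -1, -1):
            if self.valid[i]:
                self.valid[i] = False
                return self.cmds[i]
        return None


hccs = CommandStack(depth=6)
hccs.push("POP-to-correct")    # oldest command
hccs.push("PUSH-to-correct")   # newest command, now the top entry
oldest = hccs.take_bottom_valid()
valid_after_take = list(hccs.valid)
newest = hccs.pop()
```

Clearing a valid bit in place, rather than shifting, is what lets the oldest command leave the stack from the bottom while newer commands keep their positions.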
Multiplexer 222 receives on its six inputs the six commands held in the six entries of HCCS 216. The output of multiplexer 222 is the input to the low-order correction command stack (LCCS) 224. When J stage 114 generates a true value on a J_call/ret signal 244 and a false value on a J_stalled signal 242, indicating that a call or return instruction is passing through J stage 114, control logic 202 generates an xfer_cmd_mux_sel signal 228 that causes multiplexer 222 to select one of its six inputs for LCCS 224, thereby transferring the bottom valid command of HCCS 216 to LCCS 224, as described below.
LCCS 224 comprises an array of storage elements (entries) for storing commands used to correct CRS 206. In a preferred embodiment, LCCS 224 comprises four entries for storing four commands. The entries of LCCS 224 are arranged as a stack. The content of the top entry of LCCS 224 is provided on a top_lccs_cmd signal 299, which is supplied to control logic 202. Control logic 202 generates a true value on a push_LCCS signal 294 to push onto LCCS 224 the command provided by multiplexer 222, which is the input to LCCS 224. When control logic 202 pushes a command onto LCCS 224, the commands already in LCCS 224 are each shifted down one entry (the bottom entry is pushed off), and the command provided by multiplexer 222 is loaded into the top entry of LCCS 224. Control logic 202 also generates a true value on a pop_LCCS signal 296 to pop a command off LCCS 224. When control logic 202 pops a command off LCCS 224, the command in the top entry is removed and the commands in the other entries are each shifted up one entry. LCCS 224 stores commands for correcting CRS 206 that are associated with call or return instructions present in the stages of pipeline 100 at or below J stage 114; HCCS 216, by contrast, stores commands for correcting CRS 206 that are associated with call or return instructions present in the stages of pipeline 100 at or above J stage 114. The operation of HCCS 216 and LCCS 224 is described in detail below. The commands stored in LCCS 224 are the same two kinds of command described above for HCCS 216.
Microprocessor 100 also includes storage elements for storing a valid bit 226 associated with each entry of LCCS 224. Control logic 202 reads and writes the valid bits 226 via a valid_bits_LCCS signal 298. The valid bits 226 operate in the same manner as the valid bits 218 described above.
Refer now to Fig. 3, a flowchart illustrating operation of the call/return stack correction apparatus of Fig. 2 according to the present invention. In particular, Fig. 3 shows the operation of microprocessor 100 when a call or return instruction is detected by F stage 104 of Fig. 1. Flow begins at block 302.
At block 302, F stage 104 detects the presence of a call or return instruction and predicts that the instruction will be taken. F stage 104 then generates a true value on call signal 232 or on return signal 234, depending on whether a call instruction or a return instruction is present. If the instruction is a call instruction, F stage 104 also generates the subroutine address of the call instruction on subroutine_addr 236 and its return address on return_addr 238. Flow proceeds to decision block 304.
At decision block 304, control logic 202 examines call signal 232 and return signal 234 to determine whether the instruction is a call instruction or a return instruction. For a call instruction, flow proceeds to block 306; for a return instruction, flow proceeds to block 314.
At block 306, control logic 202 forwards return_addr 238 onto new_return_addr 262, generates a value on CRS_mux_sel signal 264 that selects new_return_addr 262, and generates a true value on push_CRS signal 266 to push the return address onto CRS 206, thereby speculatively updating CRS 206 in response to the call instruction identified at decision block 304. Flow proceeds to block 308.
At block 308, control logic 202 generates a POP-to-correct command on cmd signal 248 and a true value on push_HCCS signal 284 to push the command onto HCCS 216. As described above with reference to Fig. 2, the valid bit 218 for the newly pushed command is set. Flow proceeds to block 312.
At block 312, control logic 202 generates a true value on IP_load signal 282 and a value on IP_mux_sel signal 278 that causes multiplexer 212 to select new_subroutine_addr 276, so that microprocessor 100 branches to the subroutine address of the call instruction. Flow ends at block 312.
At block 314, control logic 202 generates a true value on pop_CRS signal 268 to pop the return address stored in the top entry of CRS 206, thereby speculatively updating CRS 206 in response to the return instruction identified at decision block 304. The popped return address is provided on CRS_ret_addr signal 292. Flow proceeds to block 316.
At block 316, control logic 202 generates a true value on push_RACS signal 272 to push onto RACS 208 the value of CRS_ret_addr signal 292 that was popped off CRS 206 at block 314. Flow proceeds to block 318.
At block 318, control logic 202 generates a PUSH-to-correct command on cmd signal 248 and a true value on push_HCCS signal 284 to push the command onto HCCS 216. As described above with reference to Fig. 2, the valid bit 218 for the newly pushed command is set. Flow proceeds to block 322.
At block 322, control logic 202 generates a true value on IP_load signal 282 and a value on IP_mux_sel signal 278 that causes multiplexer 212 to select CRS_ret_addr signal 292, so that microprocessor 100 branches to the return address of the return instruction, that is, the return address popped off CRS 206 at block 314. Flow ends at block 322.
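The two paths of Fig. 3 can be traced with a short behavioral sketch. It is an illustration under simplifying assumptions: stacks are modeled as unbounded Python lists whose last element is the top entry, the function `speculative_update` and the `state` dictionary are hypothetical names, and the addresses are made up.

```python
POP_TO_CORRECT = "POP-to-correct"
PUSH_TO_CORRECT = "PUSH-to-correct"


def speculative_update(state, kind, subroutine_addr=None, return_addr=None):
    """Apply the Fig. 3 flow for one instruction; return the next fetch address."""
    if kind == "call":
        state["crs"].append(return_addr)        # block 306: speculative push
        state["hccs"].append(POP_TO_CORRECT)    # block 308: record how to undo it
        return subroutine_addr                  # block 312: branch to subroutine
    if kind == "return":
        addr = state["crs"].pop()               # block 314: speculative pop
        state["racs"].append(addr)              # block 316: keep popped address
        state["hccs"].append(PUSH_TO_CORRECT)   # block 318: record how to undo it
        return addr                             # block 322: branch to return address
    raise ValueError(f"unknown instruction kind: {kind}")


state = {"crs": [], "racs": [], "hccs": []}
target_of_call = speculative_update(
    state, "call", subroutine_addr=0x2000, return_addr=0x1004
)
target_of_return = speculative_update(state, "return")
```

Each speculative update leaves behind exactly one command naming the opposite operation, which is what later makes the update reversible.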
Refer now to Fig. 4, a flowchart illustrating operation of the call/return stack correction apparatus of Fig. 2 according to the present invention. In particular, Fig. 4 shows the operation of microprocessor 100 when a call or return instruction reaches J stage 114 of Fig. 1. Flow begins at block 402.
At block 402, control logic 202 determines that a call or return instruction has reached J stage 114 by detecting a true value on J_call/ret signal 244, and determines that J stage 114 is not stalled by detecting a false value on J_stalled signal 242. Flow proceeds to block 404.
At block 404, control logic 202 removes the command in the bottom valid entry of HCCS 216. That is, control logic 202 reads the valid bits 218 and clears to false the bottom-most valid bit 218 that holds a true value, which invalidates the bottom valid command of HCCS 216, because that command is being transferred to LCCS 224. Flow proceeds to block 406.
At block 406, control logic 202 generates a true value on push_LCCS signal 294 and a value on xfer_cmd_mux_sel signal 228 that selects the command in the HCCS 216 entry removed at block 404, thereby pushing the removed command onto LCCS 224. Flow ends at block 406.
Refer now to Fig. 5, a flowchart illustrating operation of the call/return stack correction apparatus of Fig. 2 according to the present invention. In particular, Fig. 5 shows the operation of microprocessor 100 when a call or return instruction reaches S stage 126 of Fig. 1, that is, when the call or return instruction is no longer speculative. Flow begins at block 502.
At block 502, control logic 202 determines that a call or return instruction has reached S stage 126 by detecting a true value on S_call/ret signal 252. Flow proceeds to block 504.
At block 504, control logic 202 removes the command in the bottom valid entry of LCCS 224. That is, control logic 202 reads the valid bits 226 and clears to false the bottom-most valid bit 226 that holds a true value, which invalidates the bottom valid command of LCCS 224, because the call or return instruction is no longer speculative. Consequently, if an invalidating event occurs, CRS 206 need not be corrected for the call or return instruction in S stage 126. Flow ends at block 504.
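Taken together, Fig. 4 and Fig. 5 describe the life cycle of a correction command: it is handed from HCCS 216 to LCCS 224 at J stage 114 and discarded at S stage 126. The sketch below models this with queues in which the left end is the bottom (oldest) valid entry. The function names `on_j_stage` and `on_s_stage` are hypothetical, and the valid bits are abstracted away by simply removing entries.

```python
from collections import deque


def on_j_stage(hccs, lccs, j_call_ret, j_stalled):
    """Fig. 4: move the bottom valid HCCS command to LCCS if J stage is not stalled."""
    if j_call_ret and not j_stalled and hccs:
        lccs.append(hccs.popleft())  # blocks 404 and 406


def on_s_stage(lccs, s_call_ret):
    """Fig. 5: the instruction is no longer speculative, so drop its command."""
    if s_call_ret and lccs:
        lccs.popleft()               # block 504


hccs = deque(["POP-to-correct", "PUSH-to-correct"])  # oldest command on the left
lccs = deque()

on_j_stage(hccs, lccs, j_call_ret=True, j_stalled=True)   # stalled: nothing moves
stalled_snapshot = (list(hccs), list(lccs))

on_j_stage(hccs, lccs, j_call_ret=True, j_stalled=False)  # oldest command moves
on_j_stage(hccs, lccs, j_call_ret=True, j_stalled=False)  # next command moves
moved_snapshot = (list(hccs), list(lccs))

on_s_stage(lccs, s_call_ret=True)                         # oldest command retires
```

Because commands leave each stack from the bottom in program order, the top of each stack always belongs to the youngest instruction in that part of the pipeline.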
Refer now to Fig. 6, a flowchart illustrating operation of the call/return stack correction apparatus of Fig. 2 according to the present invention. In particular, Fig. 6 shows the operation of microprocessor 100 when a branch misprediction is detected at J stage 114. Flow begins at block 602.
At block 602, control logic 202 determines from J_mispredict signal 246 that J stage 114 has detected a branch misprediction. Flow proceeds to decision block 604.
At decision block 604, control logic 202 reads the valid bit 218 associated with the top entry of HCCS 216 to determine whether the command stored there is valid. If not, flow proceeds to block 642; otherwise flow proceeds to decision block 606.
At decision block 606, control logic 202 examines the command at the top of HCCS 216 via top_hccs_cmd signal 297 to determine its type. If it is a PUSH-to-correct command, flow proceeds to block 612; if it is a POP-to-correct command, flow proceeds to block 608.
At block 608, control logic 202 generates a true value on pop_CRS signal 268 to pop a return address off CRS 206. This corrects the erroneous speculative push of a return address onto CRS 206 that was made for a call instruction in one of the stages at or above J stage 114, which are flushed when a branch misprediction is detected at J stage 114. Flow proceeds to block 614.
At block 612, control logic 202 generates a true value on pop_RACS signal 274 to pop the top return address off RACS 208; the popped return address is provided on RACS_ret_addr signal 258. Control logic 202 also generates a true value on push_CRS signal 266 to push RACS_ret_addr signal 258 onto CRS 206. This corrects the erroneous speculative pop of a return address off CRS 206 that was made for a return instruction in one of the stages at or above J stage 114, which are flushed when a branch misprediction is detected at J stage 114. Flow proceeds to block 614.
At block 614, control logic 202 generates a true value on pop_HCCS signal 286 to pop the top command off HCCS 216, because CRS 206 has now been corrected for the speculatively executed call or return instruction. Flow returns to decision block 604 to determine whether another valid command remains in HCCS 216.
At block 642, microprocessor 100 may update the prediction information that caused the branch misprediction. For example, if the misprediction was made by the branch target address cache of I stage 102, the mispredicting entry of the branch target address cache is invalidated. The next time the fetch address of the branch instruction is applied to the branch target address cache, the cache will miss, so the branch instruction will be predicted not taken and flow will proceed to the next sequential instruction rather than to the target of the branch. Flow proceeds to block 644.
At block 644, microprocessor 100 flushes the instructions present in the upper pipeline stages, that is, the stages at or above J stage 114. In a preferred embodiment, flushing an instruction comprises setting to false the valid bit associated with the instruction in its pipeline stage. Flow proceeds to block 646.
At block 646, control logic 202 generates a true value on IP_load signal 282 and a value on IP_mux_sel signal 278 that causes multiplexer 212 to select invalidating_event_addr signal 295, so that microprocessor 100 branches to the correct target of the branch instruction, which control logic 202 forwards from J_correct_addr signal 291. Flow ends at block 646.
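The loop formed by blocks 604 through 614 amounts to replaying the HCCS commands from the top down. The sketch below is a simplified model, not the disclosed logic: stacks are unbounded Python lists whose last element is the top entry, a non-empty list stands in for a valid top entry, and the name `correct_on_j_mispredict` and the symbolic return addresses are assumptions made for this example.

```python
def correct_on_j_mispredict(crs, racs, hccs):
    """Undo speculative CRS updates recorded in HCCS, newest command first."""
    while hccs:                          # decision block 604: valid top command?
        cmd = hccs.pop()                 # blocks 606 and 614: read and pop the top
        if cmd == "POP-to-correct":      # block 608: undo a speculative push
            crs.pop()
        elif cmd == "PUSH-to-correct":   # block 612: undo a speculative pop
            crs.append(racs.pop())
        else:
            raise ValueError(f"unknown command: {cmd}")


# State after a mispredicted path executed: call (pushed "B"), return (popped
# "B"), return (popped "A"). Before the mispredicted path, CRS held only "A".
crs = []
racs = ["B", "A"]                        # popped addresses, most recent on top
hccs = ["POP-to-correct", "PUSH-to-correct", "PUSH-to-correct"]

correct_on_j_mispredict(crs, racs, hccs)
```

Processing the newest command first matters: undoing the operations in reverse order is what restores the original stack contents.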
Refer now to Fig. 7, a flowchart illustrating operation of the call/return stack correction apparatus of Fig. 2 according to the present invention. In particular, Fig. 7 shows the operation of microprocessor 100 when a branch misprediction is detected at S stage 126. Flow begins at block 702.
At block 702, control logic 202 determines from S_mispredict signal 254 that S stage 126 has detected a branch misprediction, or determines from S_exception signal 256 that S stage 126 has detected an exception. Flow proceeds to decision block 704.
Blocks 704 through 714 of Fig. 7 are similar to blocks 604 through 614 of Fig. 6 and are not described again. The only difference from Fig. 6 is that when the command at the top of HCCS 216 is invalid, flow proceeds from decision block 704 to decision block 724.
At decision block 724, control logic 202 reads the valid bit 226 associated with the top entry of LCCS 224 to determine whether the command stored there is valid. If not, flow proceeds to block 742; otherwise flow proceeds to decision block 726.
At decision block 726, control logic 202 examines the command at the top of LCCS 224 via top_lccs_cmd signal 299 to determine its type. If it is a PUSH-to-correct command, flow proceeds to block 732; if it is a POP-to-correct command, flow proceeds to block 728.
At block 728, control logic 202 generates a true value on pop_CRS signal 268 to pop a return address off CRS 206. This corrects the erroneous speculative push of a return address onto CRS 206 that was made for a call instruction in one of the stages at or below J stage 114. When a branch misprediction is detected at S stage 126, or when S stage 126 detects an exception, the stages at or below J stage 114 are also flushed. Flow proceeds to block 734.
At block 732, control logic 202 generates a true value on pop_RACS signal 274 to pop the top return address off RACS 208; the popped return address is provided on RACS_ret_addr signal 258. Control logic 202 also generates a true value on push_CRS signal 266 to push RACS_ret_addr signal 258 onto CRS 206. This corrects the erroneous speculative pop of a return address off CRS 206 that was made for a return instruction in one of the stages at or below J stage 114. When a branch misprediction is detected at S stage 126, or when S stage 126 detects an exception, the stages at or below J stage 114 are also flushed. Flow proceeds to block 734.
At block 734, control logic 202 generates a true value on pop_LCCS signal 296 to pop the top command off LCCS 224, because CRS 206 has now been corrected for the speculatively executed call or return instruction. Flow returns to decision block 724 to determine whether another valid command remains in LCCS 224.
At block 742, microprocessor 100 may update the prediction information that caused the branch misprediction. If the steps of Fig. 7 are being performed because S stage 126 detected an exception at block 702, no action is taken at block 742. Flow proceeds to block 744.
At block 744, microprocessor 100 flushes the instructions present in the pipeline stages at or above S stage 126. Flow proceeds to block 746.
At block 746, control logic 202 generates a true value on IP_load signal 282 and a value on IP_mux_sel signal 278 that causes multiplexer 212 to select invalidating_event_addr signal 295. In the case of a branch misprediction, microprocessor 100 thereby branches to the correct target of the branch instruction, which control logic 202 forwards from J_correct_addr signal 291 or S_correct_addr signal 293; in the case of an exception, microprocessor 100 branches to the microcode exception handler routine that handles the exception. Flow ends at block 746.
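The difference between Fig. 6 and Fig. 7 is that an event detected at S stage 126 drains both command stacks: first HCCS 216, which holds commands for the younger instructions, then LCCS 224. The sketch below illustrates this ordering under the same simplifying assumptions as before: unbounded Python lists with the top entry last, hypothetical function names `undo` and `correct_on_s_event`, and made-up symbolic return addresses.

```python
def undo(crs, racs, commands):
    """Apply correction commands from the top of one command stack downward."""
    while commands:
        cmd = commands.pop()
        if cmd == "POP-to-correct":      # undo a speculative push (blocks 708, 728)
            crs.pop()
        elif cmd == "PUSH-to-correct":   # undo a speculative pop (blocks 712, 732)
            crs.append(racs.pop())
        else:
            raise ValueError(f"unknown command: {cmd}")


def correct_on_s_event(crs, racs, hccs, lccs):
    """Fig. 7: correct for the younger instructions first, then the older ones."""
    undo(crs, racs, hccs)   # blocks 704 through 714
    undo(crs, racs, lccs)   # blocks 724 through 734


# Before speculation CRS held ["A"]. An older call (already past J stage)
# pushed "R1"; a younger return (still above J stage) then popped "R1".
crs = ["A"]
racs = ["R1"]
hccs = ["PUSH-to-correct"]   # command for the younger return
lccs = ["POP-to-correct"]    # command for the older call

correct_on_s_event(crs, racs, hccs, lccs)
```

In this model, draining LCCS before HCCS would pop "A" instead of "R1" and leave the stack holding the wrong address, which is why the higher stack is processed first.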
Specific embodiments of the invention have been described above, but the invention is not limited to them. The foregoing describes only preferred embodiments, provided to enable those of ordinary skill in the art to make or use the invention, and does not limit its scope. For example, although two kinds of event (branch mispredictions and exceptions) have been described as requiring the microprocessor to correct updates made to the call/return stack by speculatively executed call or return instructions, the invention is not limited to these two kinds of event; a microprocessor may use the call/return stack correction apparatus to correct the call/return stack in response to other kinds of event. In addition, although the invention has been described with respect to the call and return instructions of an x86 architecture microprocessor, it may be used with any microprocessor whose instruction set includes call and return instructions. Furthermore, although the correction stacks of the described embodiments have specific numbers of entries, the size of each stack may be chosen according to the number of stages in the corresponding portion of the microprocessor pipeline.
In addition to implementations of the invention using hardware, the invention can be embodied in computer readable program code (for example, software) disposed, for example, in a computer usable (for example, readable) medium configured to store the code. The program code enables the functions, the fabrication, or both, of the invention disclosed herein. For example, this can be accomplished through the use of computer readable program code in the form of general programming languages (such as C, C++, JAVA, and the like), GDSII databases, hardware description languages (HDL) such as Verilog HDL, VHDL, AHDL, and so on, or other databases, programming tools, and/or circuit capture tools known in the art. The program code can be disposed in any known computer usable medium, including semiconductor memory, magnetic disk, and optical disc (such as CD-ROM, DVD-ROM, and the like), or embodied in a computer usable (for example, readable) transmission medium (such as a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the program code can be transmitted over communication networks, including the Internet and intranets. The functions and/or structures described above can be represented in a processor embodied in program code and transformed into hardware as part of an integrated circuit. The invention may also be embodied as a combination of hardware and program code.
Finally, those of ordinary skill in the art will appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures that carry out the same purposes as the present invention, without departing from the scope of the invention as defined by the claims.

Claims (21)

1. An apparatus for correcting a call/return stack in a pipelined microprocessor, comprising:
a first stack, having a plurality of first entries for storing correction information associated with call or return instructions in a plurality of first stages of the microprocessor pipeline;
a second stack, coupled to the first stack, having a plurality of second entries for storing correction information associated with call or return instructions in a plurality of second stages of the microprocessor pipeline; and
control logic, coupled to the first and second stacks, which receives a control signal indicating that a call or return instruction is passing from the plurality of first stages to the plurality of second stages, wherein in response to the control signal the control logic moves the correction information associated with the call or return instruction from the first stack to the second stack.
2. The apparatus of claim 1, wherein, to move the correction information associated with the call or return instruction from the first stack to the second stack, the control logic removes the correction information from a bottom valid entry of the plurality of first entries of the first stack and pushes the removed correction information onto the second stack.
3. device as claimed in claim 1, it is characterized in that if should be relevant call or link order is a call instruction, then should corrigendum information comprise that an order leaves intrinsic call or return stack to eject a return address.
4. The device of claim 1, further comprising:
a third stack, coupled to the control logic, the third stack comprising a plurality of third entries for storing return addresses of call instructions associated with the first or second plurality of pipeline stages.
5. The device of claim 1, further comprising:
a plurality of valid bits, coupled to the first stack, the valid bits indicating whether the corresponding entries of the plurality of first entries are valid.
6. The device of claim 1, further comprising:
a plurality of valid bits, coupled to the second stack, the valid bits indicating whether the corresponding entries of the plurality of second entries are valid.
7. The device of claim 1, wherein the control signal indicates that the call or return instruction has reached a bottom stage of the first plurality of pipeline stages.
8. The device of claim 1, wherein:
the control logic receives a second control signal indicating that the call or return instruction in the first plurality of pipeline stages was speculatively executed in error.
9. The device of claim 8, wherein the second control signal indicates that a branch instruction preceding the call or return instruction in the first plurality of pipeline stages was mispredicted by the microprocessor.
10. The device of claim 8, wherein the control logic, in response to the second control signal, corrects the call or return stack using the correction information stored in the plurality of first entries of the first stack.
11. The device of claim 10, wherein, for each valid entry of the plurality of first entries, the control logic pops the top valid entry of the plurality of first entries off the first stack and corrects the call or return stack according to the correction information stored in that entry.
12. The device of claim 10, wherein:
the control logic receives a third control signal indicating that the call or return instructions in the first and second pluralities of pipeline stages were speculatively executed in error.
13. The device of claim 1, wherein:
the control logic receives a second signal indicating that one of the call or return instructions in the second plurality of stages of the microprocessor pipeline is no longer speculative, wherein the control logic updates the second stack in response to the second signal.
14. The device of claim 1, wherein:
the control logic receives a second signal requesting that the call or return stack be updated in response to the call or return instruction in the first plurality of pipeline stages, wherein, in response to the second signal, the control logic stores the correction information associated with one of the call or return instructions in the first stack.
15. A pipelined microprocessor, comprising:
a call or return stack;
first and second pipeline stages, which generate a true value on a first signal and a second signal, respectively, to indicate detection of a misprediction of a branch instruction in the first and second pipeline stages, respectively, wherein the first stage is above the second stage in the pipeline;
a device, coupled to the first and second pipeline stages to receive the first and second signals, the device maintaining first information associated with call or return instructions present in pipeline stages above the first stage, and maintaining second information associated with call or return instructions in pipeline stages between the first and second stages, wherein, if the first signal is true, the device is configured to selectively correct the call or return stack using the first information, and, if the second signal is true, the device selectively corrects the call or return stack using the first and second information.
16. The microprocessor of claim 15, wherein:
the device receives a third signal indicating that one of the call or return instructions has reached the first stage;
wherein, in response to the third signal, the device transfers a portion of the first information to the second information, the portion being associated with that call or return instruction.
17. The microprocessor of claim 15, wherein:
the device receives a third signal indicating that one of the call or return instructions has reached the second stage;
wherein, in response to the third signal, the device removes a portion of the second information from the second information, the portion being associated with the call or return instruction that has reached the second stage.
18. The microprocessor of claim 15, wherein:
the device receives a third signal indicating that the second stage has detected an exception generated by an instruction;
wherein, if the third signal is true, the device corrects the call or return stack using the first and second information.
19. A method for keeping a call or return stack of a pipelined microprocessor consistent with a memory coupled to the microprocessor, comprising:
receiving a request to update the call or return stack in response to the presence of a call or return instruction;
in response to the receiving, storing correction information in a first buffer;
after the storing, detecting whether one of the call or return instructions has passed a first stage of the microprocessor pipeline, the first stage being configured to detect an invalid event; and
in response to the detecting, moving a portion of the correction information from the first buffer to a second buffer.
20. The method of claim 19, further comprising:
if the first stage detects the invalid event, correcting the call or return stack using the correction information stored in the first buffer.
21. The method of claim 19, further comprising:
if a second stage of the microprocessor pipeline detects a second invalid event, correcting the call or return stack using the correction information stored in the first and second buffers, wherein the second stage is below the first stage.
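
The two-stack correction scheme recited in the claims above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the class name `CrsCorrector` and all method names are hypothetical, correction information is reduced to simple "undo" commands, valid bits are replaced by variable-length queues, and the call or return stack is assumed never to underflow.

```python
from collections import deque


class CrsCorrector:
    """Toy model of a call/return stack (CRS) with two correction stacks.

    Hypothetical names throughout; assumes the CRS never underflows.
    """

    def __init__(self):
        self.crs = []          # speculative stack of return addresses
        self.first = deque()   # undo info for instructions above the first detection stage
        self.second = deque()  # undo info for instructions between the two detection stages

    def on_call(self, return_addr):
        # Speculative call: push the return address; the undo is "pop it again".
        self.crs.append(return_addr)
        self.first.append(("pop", None))

    def on_return(self):
        # Speculative return: pop the predicted address; the undo is "push it back".
        addr = self.crs.pop()
        self.first.append(("push", addr))
        return addr

    def pass_first_stage(self):
        # The oldest instruction leaves the upper stages: move its entry from
        # the bottom of the first stack onto the second stack.
        self.second.append(self.first.popleft())

    def pass_second_stage(self):
        # The oldest instruction is no longer speculative: discard its entry.
        self.second.popleft()

    def _undo(self, stack):
        # Undo youngest-first so the CRS is restored in reverse program order.
        while stack:
            op, addr = stack.pop()
            if op == "pop":
                self.crs.pop()
            else:
                self.crs.append(addr)

    def invalid_at_first_stage(self):
        # Only instructions above the first detection stage are squashed.
        self._undo(self.first)

    def invalid_at_second_stage(self):
        # Everything above the second detection stage is squashed:
        # younger entries (first stack) are undone before older ones (second stack).
        self._undo(self.first)
        self._undo(self.second)


crs = CrsCorrector()
crs.on_call(0x100)            # an older call...
crs.pass_first_stage()        # ...moves past the first detection stage
crs.on_call(0x200)            # a younger call and return, still in the upper stages
crs.on_return()
crs.invalid_at_first_stage()  # undoes only the two younger updates
```

In this model an invalid event at the upper detection stage leaves the entry of the older call untouched in the second stack, while an invalid event at the lower detection stage unwinds both stacks.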
CNB2004100038255A 2004-02-06 2004-02-06 Method and device for correcting internal call or return stack in microprocessor Expired - Lifetime CN1326037C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100038255A CN1326037C (en) 2004-02-06 2004-02-06 Method and device for correcting internal call or return stack in microprocessor

Publications (2)

Publication Number Publication Date
CN1558326A CN1558326A (en) 2004-12-29
CN1326037C true CN1326037C (en) 2007-07-11

Family

ID=34350797

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100038255A Expired - Lifetime CN1326037C (en) 2004-02-06 2004-02-06 Method and device for correcting internal call or return stack in microprocessor

Country Status (1)

Country Link
CN (1) CN1326037C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7971044B2 (en) * 2007-10-05 2011-06-28 Qualcomm Incorporated Link stack repair of erroneous speculative update
US8521996B2 (en) * 2009-02-12 2013-08-27 Via Technologies, Inc. Pipelined microprocessor with fast non-selective correct conditional branch instruction resolution
US8635437B2 (en) * 2009-02-12 2014-01-21 Via Technologies, Inc. Pipelined microprocessor with fast conditional branch instructions based on static exception state
US20170090927A1 (en) * 2015-09-30 2017-03-30 Paul Caprioli Control transfer instructions indicating intent to call or return

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6253315B1 (en) * 1998-08-06 2001-06-26 Intel Corporation Return address predictor that uses branch instructions to track a last valid return address
US6314514B1 (en) * 1999-03-18 2001-11-06 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that speculatively executes call and return instructions
US6408385B1 (en) * 1988-03-01 2002-06-18 Mitsubishi Denki Dabushiki Kaisha Data processor
US6530016B1 (en) * 1998-12-10 2003-03-04 Fujitsu Limited Predicted return address selection upon matching target in branch history table with entries in return address stack

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106104483A (en) * 2014-03-14 2016-11-09 Arm有限公司 Abnormality processing in microprocessor system
CN106104483B (en) * 2014-03-14 2020-07-14 Arm有限公司 Microprocessor system, method for operating microprocessor system, and readable medium

Also Published As

Publication number Publication date
CN1558326A (en) 2004-12-29

Similar Documents

Publication Publication Date Title
US5615350A (en) Apparatus to dynamically control the out-of-order execution of load-store instructions in a processor capable of dispatching, issuing and executing multiple instructions in a single processor cycle
US6170054B1 (en) Method and apparatus for predicting target addresses for return from subroutine instructions utilizing a return address cache
US7178010B2 (en) Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
CN102112966B (en) Apparatus and methods for speculative interrupt vector prefetching
EP0871109B1 (en) Forwarding of results of store instructions
US8447959B2 (en) Multithread processor and method of controlling multithread processor
CN1632877B (en) Variable latency stack cache and method for providing data
CN101815984B (en) Link stack repair of erroneous speculative update
CN100390756C (en) Virtual set high speed buffer storage for reorientation of stored data
EP1296230B1 (en) Instruction issuing in the presence of load misses
US5930832A (en) Apparatus to guarantee TLB inclusion for store operations
EP1296229B1 (en) Scoreboarding mechanism in a pipeline that includes replays and redirects
CN101493762B (en) Method and device for data processing
CN101449238A (en) Local and global branch prediction information storage
CN101460922B (en) Sliding-window, block-based branch target address cache
JP2001297000A (en) Microprocessor having first-order issuing queue and second-order issuing queue
US5634119A (en) Computer processing unit employing a separate millicode branch history table
CN101501635B (en) Methods and apparatus for reducing lookups in a branch target address cache
CN1326037C (en) Method and device for correcting internal call or return stack in microprocessor
JP2596712B2 (en) System and method for managing execution of instructions, including adjacent branch instructions
JP2001229024A (en) Microprocessor using basic cache block
CN100397365C (en) Apparatus and method for resolving deadlock fetch conditions involving branch target address cache
US5841999A (en) Information handling system having a register remap structure using a content addressable table
EP1296228B1 (en) Instruction Issue and retirement in processor having mismatched pipeline depths
CN1270233C (en) Processor and method for returning branch prediction mechanism of remote skip and remote call instruction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20070711