CN1514357A - Apparatus and method for buffering instructions and later-generated related information - Google Patents

Apparatus and method for buffering instructions and later-generated related information


Publication number
CN1514357A
Authority
CN
China
Prior art keywords
instruction
buffer
formation
clock
clock period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2003101204239A
Other languages
Chinese (zh)
Other versions
CN1310137C (en
Inventor
Thomas C. McDonald (汤玛斯·C·麦当劳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INTELLIGENCE FIRST CO
Original Assignee
INTELLIGENCE FIRST CO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/422,057 external-priority patent/US7159097B2/en
Application filed by INTELLIGENCE FIRST CO filed Critical INTELLIGENCE FIRST CO
Publication of CN1514357A publication Critical patent/CN1514357A/en
Application granted granted Critical
Publication of CN1310137C publication Critical patent/CN1310137C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

An instruction buffering apparatus is disclosed. The apparatus includes an early queue and a late queue. The early queue receives an instruction generated during a first clock cycle. The late queue receives information related to the instruction during a second clock cycle subsequent to the first clock cycle. The early queue receives load/shift control signals for loading/shifting the early queue. Registers receive the early queue load/shift signals and provide delayed versions of the signals to the late queue for controlling loading/shifting the related information in the late queue. The late queue is configured such that when the apparatus is empty, the related information may be provided during the second clock cycle, i.e., in the same clock cycle that its related instruction is provided from the early queue.

Description

Apparatus and method for buffering instructions and later-generated related information
Technical field
The present invention relates to the field of pipelining in microprocessors, and in particular to an apparatus and method for buffering instructions and their later-generated related information in a pipelined microprocessor.
Background art
Modern microprocessors are pipelined microprocessors. That is, they work on several instructions at the same time, each in a different block or pipeline stage of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." See John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd edition, Morgan Kaufmann Publishers, San Francisco, California, 1996. They go on to give the following excellent description of pipelining:
A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.
Synchronous microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to the next every clock cycle. In an automobile assembly line, if the workers at one stage sit idle because they have no car to work on, the throughput, or performance, of the line drops. Similarly, if a microprocessor stage sits idle during a clock cycle because it has no instruction to operate on (a situation commonly called a pipeline bubble), the performance of the processor drops.
In this description, the stages of a microprocessor pipeline may be logically divided into two portions. The top portion fetches and translates instructions and passes them to the bottom portion, which executes them. The top portion typically includes an instruction fetcher that fetches program instructions from memory. Because fetching instructions from system memory takes a relatively long time, the top portion also includes an instruction cache that caches instructions fetched from memory to reduce subsequent fetch times. The main job of the top pipeline stages is to have instructions ready whenever the execution stages are ready to execute them.
One method commonly employed in the top pipeline stages to avoid bubbles in the execution stages is to read ahead in the program and fetch multiple program instructions into an instruction buffer. The instruction buffer supplies instructions to the execution stages when they are ready to execute them. Instruction buffers are often arranged as first-in-first-out memories, or queues.
Instruction buffering is particularly useful when one or more instructions needed by the execution stages are missing from the instruction cache. In that case, the impact of the cache miss may be reduced to the extent that the instruction buffer can supply instructions to the execution stages while the memory fetch is in progress.
Buffering is also useful when the program contains branch instructions. Modern microprocessors use branch prediction logic to predict whether a branch instruction will be taken. If so, the target address of the branch instruction is produced, and instructions are fetched from the target address, rather than from the next sequential fetch address, and supplied to the instruction buffer.
Instruction buffering is also helpful when some processing must be performed on instructions before they are passed to the execution stages. For example, in some processors the instruction set allows instructions to be a variable number of bytes in length. Consequently, the processor must decode a stream of instruction bytes and determine the type of the next instruction in order to determine its length. The start of each instruction is determined by the length of the previous instruction. This process is commonly referred to as instruction formatting. Because instruction formatting takes some processing time, it is helpful to format multiple instructions and buffer the formatted instructions in the top portion of the pipeline, so that formatted instructions can be supplied to the execution stages when needed.
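The dependency described above, in which the start of each instruction is known only once the length of the previous one has been decoded, can be illustrated with a minimal Python sketch. The encoding used here is purely hypothetical (the first byte of each instruction holds its total length in bytes) and is not the x86 encoding or anything specified in this document; the function name split_instructions is likewise an assumption made for illustration.

```python
def split_instructions(stream):
    """Split a byte stream into instructions under a toy encoding.

    Hypothetical encoding: the first byte of every instruction gives the
    total length of that instruction in bytes.
    """
    instructions = []
    start = 0
    while start < len(stream):
        length = stream[start]
        if length == 0 or start + length > len(stream):
            break  # malformed or incomplete tail: stop and wait for more bytes
        instructions.append(stream[start:start + length])
        start += length  # the next start is known only after this length is decoded
    return instructions
```

Because each iteration depends on the length found in the previous one, the work is inherently serial, which is one reason it can pay to format several instructions ahead of time and buffer the results.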
Besides fetching instructions, the top pipeline stages also generate information related to the fetched instructions; this information is not the instruction bytes themselves, but is used by the execution stages when executing the instructions. One example is branch prediction related information, which the execution stages may need in order to update the branch prediction history or to correct a mispredicted branch. Another example is the length of the instruction, which must be determined when the processor executes variable-length instructions. The related information may be generated a clock cycle after the instruction bytes are ready to be supplied to the instruction buffer. However, the related information must be supplied to the execution stages in step with its associated instruction.
One solution to this problem is to add another pipeline stage, giving the related information more time to be buffered and supplied to the execution stages. However, this solution can potentially reduce performance. In particular, when a branch is mispredicted, every pipeline stage preceding the mispredicted branch instruction must flush its instructions, and instruction fetching must return to the point of the mispredicted branch. The more stages that must be flushed, the greater the likelihood that bubbles will appear in the execution stages of the pipeline. It is therefore desirable to keep the number of pipeline stages as small as possible. Hence, a better solution to the problem is needed.
Summary of the invention
The present invention solves the foregoing problem with a buffering apparatus that receives the related instruction information one clock cycle later than it receives the instruction bytes. The apparatus includes a means by which, when an instruction is loaded into an empty buffer, the related information effectively bypasses the buffer and is delivered to the execution stages in the same clock cycle in which it is generated. Accordingly, to achieve the above object, the present invention provides an apparatus for buffering instructions and later-generated related information in a pipelined microprocessor, wherein the related information is not available to the buffering apparatus until at least one clock cycle after the instruction is available. The apparatus includes a first queue having a first plurality of entries, each for storing an instruction. The apparatus also includes a second queue having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to the corresponding instruction in the first queue. The apparatus also includes a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue. The apparatus also includes a plurality of registers that receive the control signals and output them delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.
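The arrangement of a first queue, a second queue, and one-clock-delayed control signals can be sketched as a small cycle-level model in Python. This is an illustrative sketch only, not the patented circuit: the class name TwoStageQueue, the mux priority (load over shift over hold), and the choice to take the second queue's output from its mux rather than from its register are assumptions consistent with the abstract's statement that related information can be provided in the same cycle as its instruction when the apparatus is empty.

```python
class TwoStageQueue:
    """Early queue of instructions plus a late queue of related information.

    The late queue reuses the early queue's load/shift controls, delayed by
    one clock. Entry 0 is the output end of both queues.
    """

    def __init__(self, depth=3):
        self.depth = depth
        self.early = [None] * depth    # instruction registers
        self.late = [None] * depth     # related-information registers
        self.lload = [False] * depth   # eload delayed by one clock
        self.lshift = False            # eshift delayed by one clock

    def _late_mux(self, info_in):
        """Combinational late-queue mux outputs for the current cycle."""
        out = []
        for i in range(self.depth):
            if self.lload[i]:
                out.append(info_in)    # load the late-arriving information
            elif self.lshift:
                out.append(self.late[i + 1] if i + 1 < self.depth else None)
            else:
                out.append(self.late[i])   # hold
        return out

    def clock(self, eload, eshift, instr_in, info_in):
        """Advance one clock; return (early0, late0) as seen in this cycle."""
        late_out = self._late_mux(info_in)
        early0, late0 = self.early[0], late_out[0]
        next_early = []
        for i in range(self.depth):
            if eload[i]:
                next_early.append(instr_in)
            elif eshift:
                next_early.append(self.early[i + 1] if i + 1 < self.depth else None)
            else:
                next_early.append(self.early[i])
        # Clock edge: registers capture their inputs, controls are delayed.
        self.early = next_early
        self.late = late_out
        self.lload = list(eload)
        self.lshift = eshift
        return early0, late0
```

In this model, loading an instruction into an empty queue in one cycle and supplying its related information in the next makes both appear at the outputs together in that second cycle, without an extra stage of delay for the related information.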
In another aspect, the present invention provides an instruction buffer. The instruction buffer includes a plurality of muxed-registers, each for storing an instruction. The instruction buffer also includes a plurality of registered-muxes, each for storing information related to the instruction in the corresponding one of the muxed-registers. The instruction buffer also includes control logic, coupled to the muxed-registers, that generates a control signal to selectively load an instruction into one of the muxed-registers. The instruction buffer also includes a register that receives the value on the control signal during a first clock cycle and outputs that value during a second clock cycle following the first clock cycle, to selectively load the related information into the one of the registered-muxes corresponding to the one of the muxed-registers.
In another aspect, the present invention provides a microprocessor. The microprocessor includes an instruction formatter that outputs a branch instruction during a first clock cycle. The microprocessor also includes control logic that generates information related to a prediction of the branch instruction during a second clock cycle following the first clock cycle. The microprocessor also includes an instruction buffer, coupled to the instruction formatter, that buffers the branch instruction during the first clock cycle and receives the information during the second clock cycle. If the instruction buffer is empty during the first clock cycle, it selectively outputs the information during the second clock cycle. If the instruction buffer is not empty during the first clock cycle, it selectively buffers the information during the second clock cycle.
In another aspect, the present invention provides a method for buffering instructions and later-generated related information in a microprocessor having a pipeline. The method includes loading an instruction into an instruction queue during a first clock cycle, and generating information related to the instruction during a second clock cycle following the first clock cycle. The method also includes determining whether the instruction is shifted out of the queue during the second clock cycle. If the instruction is not shifted out of the queue, the related information is loaded into the queue during the second clock cycle. If the instruction is shifted out of the queue, the related information bypasses the queue during the second clock cycle and is provided to the pipeline along with the instruction.
In another aspect, the present invention provides an instruction buffer. The instruction buffer includes a first multiplexer having an output, a hold data input, a load data input that receives an instruction during a first clock cycle, and a control input that receives a first control signal. The first multiplexer selects the load data input if the control input is true, and the hold data input otherwise. The instruction buffer also includes a first register having an input coupled to the output of the first multiplexer and an output coupled to the hold data input of the first multiplexer. The instruction buffer also includes a second register having an input and an output. The instruction buffer also includes a second multiplexer having an output coupled to the input of the second register, a hold data input coupled to the output of the second register, a load data input that receives information related to the instruction during a second clock cycle following the first clock cycle, and a control input that receives a second control signal. The second multiplexer selects the load data input if the control input is true, and the hold data input otherwise. The instruction buffer also includes a third register having an input that receives the first control signal during the first clock cycle and an output that produces the second control signal during the second clock cycle. Thus, if the first control signal is true during the first clock cycle, the instruction and the related information are both output during the second clock cycle.
In another aspect, the present invention provides a computer data signal embodied in a transmission medium. The computer data signal includes computer-readable program code for providing an apparatus for buffering instructions and related information in a pipelined microprocessor. The related information is not available to the buffering apparatus until at least one clock cycle after the instruction is available. The program code includes first program code for providing a first queue having a first plurality of entries, each for storing an instruction. The program code also includes second program code for providing a second queue having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to the corresponding instruction in the first queue. The program code also includes third program code for providing a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue. The program code also includes fourth program code for providing a plurality of registers that receive the control signals and output them delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.
An advantage of the present invention is that it permits an instruction buffer, or queue, to be used without adding another pipeline stage, thereby improving processor performance.
Other features and advantages of the present invention are described in detail below in conjunction with the following description and the accompanying drawings.
Brief description of the drawings
Fig. 1 is a block diagram of a microprocessor according to the present invention.
Fig. 2 is a block diagram of the early queue of the formatted instruction queue of Fig. 1 according to the present invention.
Fig. 3 is a block diagram of the late queue of the formatted instruction queue of Fig. 1 according to the present invention.
Fig. 4, Fig. 5, and Fig. 6 are timing diagrams of the formatted instruction queue of Fig. 1 according to the present invention.
The reference numerals are as follows:
100: microprocessor
102: control logic
104: instruction cache
106: branch target address cache (BTAC)
108: pre-decode logic
112: instruction byte buffer
114: instruction byte buffer control logic
116: instruction formatter
118: formatted instruction queue control logic
132: early queue
134: valid bits
138: instruction translator
142: lload[2:0] signals
146: late queue
151: I-stage
152: F_new_instr signal
153: F-stage
154: translated instruction queue (XIQ)
155: X-stage
156: XIQ control logic
157: R-stage
161: control input
162: eload signals
164: eshift signal
165: instruction bytes and pre-decode information
167: instruction bytes
168: lshift signal
169: pre-decode information
171: microinstructions
172, 178, 210, 211, 212, 310, 311, 312: multiplexers
175: predicted branch target address
176: execution stage register
177: correction address
179: next sequential fetch address
181: current fetch address
182: current instruction pointer (CIP) signal
183, 185, 220, 221, 222, 320, 321, 322: registers
186: X_rel_info signal
187: formatted instruction queue (FIQ)
188: F_valid signal
189: valid bit register
191: late0 signal
193: early0 signal
194: branch prediction related information
195: XIQ_full signal
197: formatted instruction
198: F_instr_info signal
199: FIQ_full signal
202: clock signal (clk)
Detailed description of the embodiments
Referring now to Fig. 1, a block diagram of a microprocessor 100 according to the present invention is shown. Microprocessor 100 is a pipelined processor comprising multiple pipeline stages. A portion of the stages are shown, namely an I-stage 151, an F-stage 153, an X-stage 155, and an R-stage 157. The I-stage 151 comprises a stage for fetching instruction bytes, either from memory or from an instruction cache. In one embodiment, the I-stage 151 comprises multiple stages. The F-stage 153 comprises a stage for formatting a stream of unformatted instruction bytes into formatted instructions. The X-stage 155 comprises a stage for translating formatted macroinstructions into microinstructions. The R-stage 157 comprises a register stage for loading operands from a register file. Other execution stages of microprocessor 100 are not shown, such as the address generation, data, execute, store, and result write-back stages that follow the R-stage 157.
Microprocessor 100 includes an instruction cache 104 in the I-stage 151. Instruction cache 104 caches instructions fetched from a system memory coupled to microprocessor 100. Instruction cache 104 receives a current fetch address 181 that selects a cache line of instruction bytes 167 to output. In one embodiment, instruction cache 104 is a multi-stage cache; that is, instruction cache 104 requires multiple clock cycles to output the cache line corresponding to the current fetch address 181.
Microprocessor 100 also includes a multiplexer 178 in the I-stage 151. Multiplexer 178 provides the current fetch address 181. Multiplexer 178 receives a next sequential fetch address 179, which is the current fetch address 181 incremented by the size of a cache line stored in instruction cache 104. Multiplexer 178 also receives a correction address 177, which specifies the address to which microprocessor 100 branches in order to correct a branch misprediction. Multiplexer 178 also receives a predicted branch target address 175.
Microprocessor 100 also includes a branch target address cache (BTAC) 106 in the I-stage 151, coupled to multiplexer 178. BTAC 106 generates the predicted branch target address 175 in response to the current fetch address 181. BTAC 106 caches the branch target addresses of previously executed branch instructions, together with the addresses of the branch instructions themselves. In one embodiment, BTAC 106 comprises a 4-way set associative cache, and each way of a selected set contains multiple entries for storing a target address and branch prediction information for a predicted branch instruction. In addition to the predicted target address 175, BTAC 106 also outputs branch prediction related information 194. In one embodiment, the BTAC information 194 includes: an offset specifying the location of the first byte of the predicted branch instruction within the instruction cache line selected by the current fetch address 181; an indication of whether the predicted branch instruction wraps across a half-cache line boundary; a valid bit for each entry in the selected way; an indication of which way in the selected set is least-recently-used; an indication of which of the multiple entries in the selected way is least-recently-used; and a prediction of whether the branch instruction will be taken or not taken.
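A set-associative lookup of this general kind can be sketched in Python as follows. The sketch is a simplification made under stated assumptions: it stores one entry per way (the embodiment above stores several), the set count and cache line size are arbitrary, the index and tag are derived from the cache line address, and replacement is plain least-recently-used. The class name Btac and its methods are hypothetical and do not come from the patent.

```python
class Btac:
    """Toy set-associative branch target address cache with LRU replacement."""

    def __init__(self, num_sets=64, line_bytes=32, ways=4):
        self.num_sets = num_sets      # assumed value, not from the source
        self.line_bytes = line_bytes  # assumed cache line size
        self.ways = ways
        # Each set is a list of (tag, target, taken), least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def _split(self, address):
        """Return (set index, tag) for the cache line containing address."""
        line = address // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def update(self, branch_address, target, taken):
        """Record the target and direction of an executed branch."""
        index, tag = self._split(branch_address)
        entries = [e for e in self.sets[index] if e[0] != tag]
        entries.append((tag, target, taken))      # newest is most recently used
        self.sets[index] = entries[-self.ways:]   # evict least recently used

    def lookup(self, fetch_address):
        """Return (target, taken) on a hit, or None on a miss."""
        index, tag = self._split(fetch_address)
        for entry in self.sets[index]:
            if entry[0] == tag:
                self.sets[index].remove(entry)
                self.sets[index].append(entry)    # mark as most recently used
                return entry[1], entry[2]
        return None
```

A hit returns both a target address and a taken/not-taken prediction, which correspond loosely to the predicted branch target address 175 and one field of the BTAC information 194.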
Microprocessor 100 also includes control logic 102. If the current fetch address 181 matches a valid cached address in BTAC 106 of a previously executed branch instruction, and BTAC 106 predicts that the branch instruction will be taken, control logic 102 controls multiplexer 178 to select the BTAC target address 175. If a branch misprediction occurs, control logic 102 controls multiplexer 178 to select the correction address 177. Otherwise, control logic 102 controls multiplexer 178 to select the next sequential fetch address 179. Control logic 102 also receives the BTAC information 194.
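The three-way choice made for multiplexer 178 can be written as a short Python function. This is a sketch under assumptions: the text does not state which condition wins when a misprediction and a BTAC hit coincide, so giving the correction address priority is an assumption, as are the function name, the parameter names, and the 32-byte cache line size.

```python
CACHE_LINE_SIZE = 32  # assumed size in bytes; the source gives no value


def select_fetch_address(current, btac_hit, predicted_taken, btac_target,
                         mispredicted, correction_address):
    """Choose the next value of the current fetch address."""
    if mispredicted:
        # Assumed priority: correcting a misprediction overrides a new prediction.
        return correction_address
    if btac_hit and predicted_taken:
        return btac_target
    # Otherwise continue with the next sequential cache line.
    return current + CACHE_LINE_SIZE
```

For example, with no hit and no misprediction, a current fetch address of 0x1000 advances to 0x1020 under the assumed line size.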
Microprocessor 100 also includes pre-decode logic 108 in the I-stage 151, coupled to instruction cache 104. Pre-decode logic 108 receives the cache line of instruction bytes 167 provided by instruction cache 104, along with the BTAC information 194, and generates pre-decode information 169 from them. In one embodiment, the pre-decode information 169 includes: a bit associated with each instruction byte predicting whether the byte is the opcode byte of a branch instruction that BTAC 106 predicts will be taken; bits predicting the length of the next instruction, based on the predicted instruction length; a bit associated with each instruction byte predicting whether the byte is a prefix byte of the instruction; and a prediction of the outcome of a branch instruction.
Microprocessor 100 also includes an instruction byte buffer 112 in the F-stage 153, coupled to pre-decode logic 108. Instruction byte buffer 112 receives the pre-decode information 169 from pre-decode logic 108 and the instruction bytes 167 from instruction cache 104. Instruction byte buffer 112 provides the pre-decode information to control logic 102 via signal 196. In one embodiment, instruction byte buffer 112 can buffer up to four cache lines of instruction bytes and associated pre-decode information.
Microprocessor 100 also includes instruction byte buffer control logic 114, coupled to instruction byte buffer 112. Instruction byte buffer control logic 114 controls the flow of instruction bytes and associated pre-decode information into and out of instruction byte buffer 112. Instruction byte buffer control logic 114 also receives the BTAC information 194.
Microprocessor 100 also includes an instruction formatter 116 in the F-stage 153, coupled to instruction byte buffer 112. Instruction formatter 116 receives instruction bytes and pre-decode information 165 from instruction byte buffer 112 and generates a formatted instruction 197 from them. That is, instruction formatter 116 examines a stream of instruction bytes in instruction byte buffer 112, determines which bytes make up the next instruction and the length of that instruction, and outputs the next instruction as formatted_instr 197. In one embodiment, the formatted instructions provided on formatted_instr 197 comprise instructions conforming substantially to the x86 architecture instruction set. In one embodiment, formatted instructions are also referred to as macroinstructions, which are translated into microinstructions that can be executed by the execution stages of the microprocessor 100 pipeline. formatted_instr 197 is generated in the F-stage 153. Each time instruction formatter 116 outputs a formatted_instr 197, it generates a true value on the F_new_instr signal 152 to indicate that a valid formatted instruction is present on formatted_instr 197. In addition, instruction formatter 116 outputs information related to formatted_instr 197 to control logic 102 via signal F_instr_info 198. In one embodiment, F_instr_info 198 includes: a prediction, if the instruction is a branch instruction, of whether the branch will be taken or not taken; the prefix of the instruction; whether the address of the instruction hit in a branch target buffer of the microprocessor; whether the instruction is a far direct branch instruction; whether the instruction is a far indirect branch instruction; whether the instruction is a call branch instruction; whether the instruction is a return branch instruction; whether the instruction is a far return branch instruction; whether the instruction is an unconditional branch instruction; and whether the instruction is a conditional branch instruction. In addition, instruction formatter 116 outputs the address of the formatted instruction via a current instruction pointer (CIP) signal 182, which is the address of the previous instruction plus the length of the previous instruction.
Microprocessor 100 also includes a formatted instruction queue (FIQ) 187 in the X-stage 155. Formatted instruction queue 187 receives formatted_instr 197 from instruction formatter 116 and outputs formatted instructions on an early0 signal 193. In addition, formatted instruction queue 187 receives from control logic 102, via signal X_rel_info 186, information related to the formatted instructions received on formatted_instr 197. X_rel_info 186 is generated in the X-stage 155. Formatted instruction queue 187 also outputs, on a late0 signal 191, the information related to the formatted instruction it outputs on early0 signal 193. Formatted instruction queue 187 and X_rel_info 186 are described in more detail below.
Microprocessor 100 also includes formatted instruction queue (FIQ) control logic 118. FIQ control logic 118 receives the F_new_instr signal 152 from instruction formatter 116. When formatted instruction queue 187 is full, FIQ control logic 118 generates a true value on the FIQ_full signal 199, which is provided to instruction formatter 116. FIQ control logic 118 also generates an eshift signal 164 to control the shifting of instructions within formatted instruction queue 187. FIQ control logic 118 also generates a plurality of eload signals 162 to control the loading of an instruction from formatted_instr 197 into an empty entry of formatted instruction queue 187. In one embodiment, FIQ control logic 118 generates one eload signal 162 for each entry in formatted instruction queue 187. In one embodiment, formatted instruction queue 187 comprises 12 entries, each storing one formatted macroinstruction. For simplicity, however, Figs. 1 to 3 show a formatted instruction queue 187 with three entries; hence, Fig. 1 shows three eload signals 162, denoted eload[2:0] 162.
FIQ control logic 118 also maintains a valid bit 134 associated with each entry in formatted instruction queue 187. The embodiment shown in Fig. 1 includes three valid bits 134, denoted V2, V1, and V0. V0 134 is the valid bit for the lowest entry in formatted instruction queue 187; V1 134 is the valid bit for the middle entry; V2 134 is the valid bit for the highest entry. FIQ control logic 118 also outputs an F_valid signal 188, which in one embodiment is V0 134. Each valid bit 134 indicates whether the corresponding entry in formatted instruction queue 187 contains a valid instruction. FIQ control logic 118 also receives an XIQ_full signal 195.
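How the valid bits might drive the status and load signals can be sketched in Python. The sketch rests on assumptions: the passage says F_valid is V0 in one embodiment and that FIQ_full is true when the queue is full, but it does not give the equations, so treating full as "every valid bit set" and loading a new instruction into the lowest empty entry are illustrative choices. Interaction with eshift is deliberately left out, and both function names are hypothetical.

```python
def fiq_status(valid_bits):
    """Derive (F_valid, FIQ_full) from valid bits, where valid_bits[0] is V0."""
    f_valid = valid_bits[0]       # bottom entry holds a valid instruction
    fiq_full = all(valid_bits)    # assumed: full means every entry is valid
    return f_valid, fiq_full


def eload_for_new_instruction(valid_bits, f_new_instr):
    """Return one eload flag per entry, asserting at most one of them.

    Assumption: a new instruction goes into the lowest entry that is empty.
    Shifting in the same cycle is not modelled.
    """
    eload = [False] * len(valid_bits)
    if f_new_instr:
        for i, valid in enumerate(valid_bits):
            if not valid:
                eload[i] = True
                break
    return eload
```

With valid bits V0 set and V1, V2 clear, for instance, a new instruction asserts only the load flag for entry 1 in this model.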
Microprocessor 100 also includes an instruction translator 138 in the X-stage 155, coupled to formatted instruction queue 187. Instruction translator 138 receives a formatted instruction on early0 signal 193 from formatted instruction queue 187 and translates the formatted macroinstruction into one or more microinstructions 171. In one embodiment, microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions of a native, or reduced, instruction set.
Microprocessor 100 also includes a translated instruction queue (XIQ) 154 in the X-stage 155, coupled to instruction translator 138. XIQ 154 buffers the translated microinstructions 171 received from instruction translator 138. XIQ 154 also buffers the related information received from formatted instruction queue 187 via late0 signal 191. The information received via late0 signal 191 is related to the microinstructions 171 because it is related to the formatted macroinstruction from which the microinstructions 171 were translated. The execution stages of microprocessor 100 use the related information 191 in executing the associated microinstructions 171.
The microprocessor 100 also includes XIQ control logic 156, coupled to the XIQ 154. The XIQ control logic 156 receives the F_valid signal 188 and generates the XIQ_full signal 195. The XIQ control logic 156 also generates the X_load signal 164, which controls the loading of the translated microinstructions 171 and the related information 191 into the XIQ 154.
The microprocessor 100 also includes a two-input multiplexer 172 in the X-stage 155, coupled to the XIQ 154. The multiplexer 172 serves as a bypass multiplexer for selectively bypassing the XIQ 154. One input of the multiplexer 172 receives the output of the XIQ 154. The other input receives the input to the XIQ 154, namely the microinstruction 171 and the late0 signal 191. Based on a control input 161 generated by the XIQ control logic 156, the multiplexer 172 selects one of its inputs for output to an execution stage register 176 in the R-stage 157. If the execution stage register 176 is ready to receive an instruction and the XIQ 154 is empty when the instruction translator 138 outputs the microinstruction 171, the XIQ control logic 156 controls the multiplexer 172 to bypass the XIQ 154. The microprocessor 100 also includes a valid bit register 189, which receives the X_valid signal 148 from the XIQ control logic 156 to indicate whether the microinstruction and related information stored in the execution stage register 176 are valid.
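The bypass decision made for multiplexer 172 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of Fig. 1: the function name `select_mux_172`, the list used to model the XIQ 154, and the boolean `exec_stage_ready` are hypothetical names introduced here.

```python
def select_mux_172(xiq_entries, exec_stage_ready, translator_output):
    """Return (value sent toward execution stage register 176, bypassed?).

    xiq_entries       -- hypothetical list model of XIQ 154, oldest entry first
    exec_stage_ready  -- assumed flag: register 176 can accept an instruction
    translator_output -- (microinstruction 171, late0 191) pair, or None
    """
    # Bypass the queue only when the execution stage is ready and the XIQ is empty.
    if exec_stage_ready and not xiq_entries and translator_output is not None:
        return translator_output, True
    # Otherwise the mux selects the XIQ output (the oldest buffered entry, if any).
    if xiq_entries:
        return xiq_entries[0], False
    return None, False
```

With an empty queue and a ready execution stage, the pair produced by the translator goes straight through; with anything already buffered, the older entry wins and ordering is preserved.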
The formatted instruction queue 187 includes an early queue 132 for storing the formatted macroinstructions received via the formatted_instr signal 197, and a corresponding late queue 146 for storing the related information received via the X_rel_info signal 186. Fig. 1 shows the early queue 132 with three entries, denoted EE2, EE1 and EE0. EE0 is the bottom entry of the early queue 132, EE1 is the middle entry, and EE2 is the top entry. The contents of EE0 are provided on the early0 output signal 193. The eshift signal 164 and the eload[2:0] signals 162 control the shifting and loading of the early queue 132. Similarly, Fig. 1 shows the late queue 146 with three entries, denoted LE2, LE1 and LE0. LE0 is the bottom entry of the late queue 146, LE1 is the middle entry, and LE2 is the top entry. The contents of LE0 are provided on the late0 output signal 191.
The formatted instruction queue 187 also includes a register 185. At the end of a first clock cycle, the register 185 receives the eshift signal 164 from the FIQ control logic 118 and, during the next clock cycle, outputs on the lshift signal 168 the value of the eshift signal 164 that it received during the first clock cycle. The formatted instruction queue 187 also includes three registers 183. At the end of the first clock cycle, the registers 183 receive the eload[2:0] signals 162 from the FIQ control logic 118 and, during the next clock cycle, output on the lload[2:0] signals 142 the values of the eload[2:0] signals 162 received during the first clock cycle. That is, the registers 185 and 183 output one-clock-cycle-delayed versions of the eshift signal 164 and the eload[2:0] signals 162, respectively.
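The one-cycle delay provided by register 185 and registers 183 can be modeled as follows. This is a minimal sketch: the class `DelayRegister` and the helper `delayed` are hypothetical names, and a reset value of false is assumed.

```python
class DelayRegister:
    """One-cycle delay element standing in for register 185 or one of registers 183."""

    def __init__(self, initial=False):
        self.q = initial  # value visible on the output during the current cycle

    def clock(self, d):
        """Rising clock edge: capture d so it is visible during the next cycle."""
        self.q = d


def delayed(values, initial=False):
    """Given the input value for each cycle, return the output seen in each cycle."""
    reg, seen = DelayRegister(initial), []
    for d in values:
        seen.append(reg.q)  # output during this cycle is the previous cycle's input
        reg.clock(d)        # end of cycle: capture this cycle's input
    return seen


# eload[0] asserted in cycle 1 appears as lload[0] in cycle 2.
lload0 = delayed([True, False, False, False])
```

Feeding the eload[0] pattern of Fig. 4 through the model shows the same pattern on lload[0], shifted by exactly one clock cycle.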
In one embodiment, X_rel_info 186 includes: the length of the formatted macroinstruction from which the corresponding microinstruction was translated; an indication of whether the macroinstruction crosses a half-cache-line boundary; the displacement field of the macroinstruction; the immediate field of the macroinstruction; the instruction pointer of the macroinstruction; and various information related to branch prediction and correction, if the macroinstruction was predicted to be a branch instruction.
In one embodiment, the information related to branch prediction and correction includes: branch history table information used to predict whether the branch instruction will be taken or not taken; a portion of the linear instruction pointer of the branch instruction used to predict whether the branch instruction will be taken or not taken; a branch pattern that is exclusive-ORed with the linear instruction pointer to make the taken/not-taken prediction; a second branch pattern used for restoration if the branch prediction is incorrect; various flags indicating characteristics of the branch instruction, such as whether the branch instruction is a conditional branch instruction, a call instruction, the target of a return stack, a relative branch or an indirect branch, and whether the prediction of the branch outcome was made by a static predictor; and various information related to the prediction made by the BTAC 106, such as whether the current fetch address 181 matched a cached address in the BTAC 106, whether the matching address was valid, whether the branch instruction was predicted taken or not taken, the least recently used way of the set of the BTAC 106 selected by the current fetch address 181, which way of the selected set to replace if execution of the instruction requires the BTAC 106 to be updated, and the target address output by the BTAC 106. In one embodiment, a portion of X_rel_info 186 is generated during a previous clock cycle and stored for delivery along with the related information that is generated during the clock cycle after the macroinstruction is provided from entry EE0 of the early queue 132 on the early0 signal 193.
Referring now to Fig. 2, a block diagram of the early queue 132 of the formatted instruction queue 187 of Fig. 1 according to the present invention is shown.
The early queue 132 includes three muxed-registers connected in series to form a queue. The three muxed-registers constitute entries EE2, EE1 and EE0 of Fig. 1.
The top muxed-register of the early queue 132 includes a two-input multiplexer 212 and a register 222, denoted ER2, that receives the output of the multiplexer 212. The multiplexer 212 has a load input that receives the formatted_instr signal 197 of Fig. 1, and a hold input that receives the output of register ER2 222. The multiplexer 212 receives the eload[2] signal 162 of Fig. 1 as its control input. If eload[2] 162 is true, the multiplexer 212 selects the formatted_instr signal 197 on its load input; otherwise, it selects the output of register ER2 222 on its hold input. Register ER2 222 loads the value output by the multiplexer 212 on the rising edge of a clock signal, denoted clk 202.
The middle muxed-register of the early queue 132 includes a three-input multiplexer 211 and a register 221, denoted ER1, that receives the output of the multiplexer 211. The multiplexer 211 has a load input that receives the formatted_instr signal 197, a hold input that receives the output of register ER1 221, and a shift input that receives the output of register ER2 222. The multiplexer 211 receives the eload[1] signal 162 of Fig. 1 as one control input and the eshift signal 164 of Fig. 1 as another control input. If eload[1] 162 is true, the multiplexer 211 selects the formatted_instr signal 197 on its load input; otherwise, if the eshift signal 164 is true, it selects the output of register ER2 222 on its shift input; otherwise, it selects the output of register ER1 221 on its hold input. Register ER1 221 loads the value output by the multiplexer 211 on the rising edge of clk 202.
The bottom muxed-register of the early queue 132 includes a three-input multiplexer 210 and a register 220, denoted ER0, that receives the output of the multiplexer 210. The multiplexer 210 has a load input that receives the formatted_instr signal 197, a hold input that receives the output of register ER0 220, and a shift input that receives the output of register ER1 221. The multiplexer 210 receives the eload[0] signal 162 of Fig. 1 as one control input and the eshift signal 164 of Fig. 1 as another control input. If eload[0] 162 is true, the multiplexer 210 selects the formatted_instr signal 197 on its load input; otherwise, if the eshift signal 164 is true, it selects the output of register ER1 221 on its shift input; otherwise, it selects the output of register ER0 220 on its hold input. Register ER0 220 loads the value output by the multiplexer 210 on the rising edge of clk 202. The output of register ER0 220 is provided on the early0 signal 193.
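The selection priority of multiplexers 210, 211 and 212 (load first, then shift, otherwise hold) can be sketched as a next-state function. This is an illustrative model only: the function name `early_queue_step` and the list layout `[ER0, ER1, ER2]` are assumptions, and `None` stands for an entry whose contents do not matter.

```python
def early_queue_step(regs, eload, eshift, formatted_instr):
    """Compute the register contents after the next rising edge of clk 202.

    regs  -- [ER0, ER1, ER2] contents before the edge
    eload -- [eload[0], eload[1], eload[2]] during the current cycle
    """
    er0, er1, er2 = regs
    # mux 210: load input, else shift input (ER1), else hold input (ER0)
    n0 = formatted_instr if eload[0] else (er1 if eshift else er0)
    # mux 211: load input, else shift input (ER2), else hold input (ER1)
    n1 = formatted_instr if eload[1] else (er2 if eshift else er1)
    # mux 212 has only two inputs: load or hold, no shift input
    n2 = formatted_instr if eload[2] else er2
    return [n0, n1, n2]
```

Because the top entry has no shift input, it simply holds its old value during a shift; in the described design the valid bits 134, not the register contents, record which entries hold valid instructions.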
Referring now to Fig. 3, a block diagram of the late queue 146 of the formatted instruction queue 187 of Fig. 1 according to the present invention is shown.
The late queue 146 includes three registered-muxes connected in series to form a queue. The three registered-muxes constitute entries LE2, LE1 and LE0 of Fig. 1.
The top registered-mux of the late queue 146 includes a two-input multiplexer 312 and a register 322, denoted LR2, that receives the output of the multiplexer 312. The multiplexer 312 has a load input that receives the X_rel_info signal 186 of Fig. 1, and a hold input that receives the output of register LR2 322. The multiplexer 312 receives the lload[2] signal 142 of Fig. 1 as its control input. If lload[2] 142 is true, the multiplexer 312 selects the X_rel_info signal 186 on its load input; otherwise, it selects the output of register LR2 322 on its hold input. Register LR2 322 loads the value output by the multiplexer 312 on the rising edge of clk 202 of Fig. 2.
The middle registered-mux of the late queue 146 includes a three-input multiplexer 311 and a register 321, denoted LR1, that receives the output of the multiplexer 311. The multiplexer 311 has a load input that receives the X_rel_info signal 186, a hold input that receives the output of register LR1 321, and a shift input that receives the output of register LR2 322. The multiplexer 311 receives the lload[1] signal 142 of Fig. 1 as one control input and the lshift signal 168 as another control input. If lload[1] 142 is true, the multiplexer 311 selects the X_rel_info signal 186 on its load input; otherwise, if the lshift signal 168 is true, it selects the output of register LR2 322 on its shift input; otherwise, it selects the output of register LR1 321 on its hold input. Register LR1 321 loads the value output by the multiplexer 311 on the rising edge of clk 202 of Fig. 2.
The bottom registered-mux of the late queue 146 includes a three-input multiplexer 310 and a register 320, denoted LR0, that receives the output of the multiplexer 310. The multiplexer 310 has a load input that receives the X_rel_info signal 186, a hold input that receives the output of register LR0 320, and a shift input that receives the output of register LR1 321. The multiplexer 310 receives the lload[0] signal 142 of Fig. 1 as one control input and the lshift signal 168 as another control input. If lload[0] 142 is true, the multiplexer 310 selects the X_rel_info signal 186 on its load input; otherwise, if the lshift signal 168 is true, it selects the output of register LR1 321 on its shift input; otherwise, it selects the output of register LR0 320 on its hold input. Register LR0 320 loads the value output by the multiplexer 310 on the rising edge of clk 202 of Fig. 2. The output of the multiplexer 310, rather than the output of register LR0 320, is provided on the late0 signal 191 of Fig. 1.
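The key structural difference from the early queue is that late0 191 is taken from the output of multiplexer 310, before register LR0 320. A small sketch makes the consequence visible; the function name `late_queue_cycle` and the list layout `[LR0, LR1, LR2]` are assumptions introduced for illustration.

```python
def late_queue_cycle(regs, lload, lshift, x_rel_info):
    """Model one clock cycle of the late queue 146.

    regs  -- [LR0, LR1, LR2] contents during the current cycle
    lload -- [lload[0], lload[1], lload[2]] during the current cycle
    Returns (late0 during this cycle, register contents after the next rising edge).
    """
    lr0, lr1, lr2 = regs
    m0 = x_rel_info if lload[0] else (lr1 if lshift else lr0)  # mux 310
    m1 = x_rel_info if lload[1] else (lr2 if lshift else lr1)  # mux 311
    m2 = x_rel_info if lload[2] else lr2                       # mux 312 (no shift input)
    # late0 is the mux 310 output itself, so it is valid in the same cycle
    # in which X_rel_info arrives; the registers capture the mux outputs afterward.
    return m0, [m0, m1, m2]
```

When lload[0] is true, the information on X_rel_info appears on late0 in the same cycle it arrives, which is what lets it line up with the instruction already sitting in ER0.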
Referring now to Fig. 4, a timing diagram illustrating operation of the formatted instruction queue 187 of Fig. 1 according to the present invention is shown. Fig. 4 shows five clock cycles, each beginning with a rising edge of the clk signal 202 of Figs. 2 and 3. By convention, true signal values are shown as a high logic level in Fig. 4. Fig. 4 illustrates a scenario in which, at the time the instruction formatter 116 generates a new formatted macroinstruction, the XIQ 154 of Fig. 1 is not full (that is, it can receive a microinstruction from the instruction translator 138) and the formatted instruction queue 187 is empty.
During clock cycle 1, the instruction formatter 116 generates a true value on the F_new_instr signal 152 of Fig. 1 to indicate that a valid new formatted macroinstruction is present on formatted_instr 197 of Fig. 1, as shown. Because the formatted instruction queue 187 is empty, the FIQ control logic 118 of Fig. 1 generates a true value on the eload[0] signal 162 to load the valid formatted macroinstruction from formatted_instr 197 into EE0, which is the lowest empty entry of the formatted instruction queue 187.
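The rule that a new instruction is loaded into the lowest empty entry can be sketched as follows. This is a simplified illustration, not the actual FIQ control logic 118: the function name `select_eload` is hypothetical, and the sketch assumes that no shift occurs in the same cycle and that nothing is loaded when all three entries are valid.

```python
def select_eload(valid, new_instr):
    """Pick which eload bit to assert.

    valid     -- [V0, V1, V2] valid bits 134
    new_instr -- value of F_new_instr 152
    Returns [eload[0], eload[1], eload[2]].
    """
    eload = [False, False, False]
    if new_instr:
        for i, v in enumerate(valid):
            if not v:
                eload[i] = True  # lowest empty entry receives the instruction
                break
    return eload
```

With an empty queue the result asserts eload[0], as in Fig. 4; with only EE0 valid it asserts eload[1], as in the scenario of Fig. 6.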
During clock cycle 2, V0 134 of Fig. 1 (the valid bit for entry EE0 of the formatted instruction queue 187) is set to indicate that EE0 contains a valid instruction. On the rising edge of clock cycle 2, one of the registers 183 of Fig. 1 loads the eload[0] signal 162 and outputs a true value on the lload[0] signal 142. Because eload[0] 162 was true, the new instruction is loaded into ER0 220 and output on the early0 signal 193 of Fig. 1 for provision to the instruction translator 138 of Fig. 1, as shown. The instruction translator 138 translates the new macroinstruction and provides the translated microinstruction 171 to the XIQ 154. In addition, the control logic 102 generates new information related to the new instruction on X_rel_info 186, as shown. Because lload[0] 142 is true, the multiplexer 310 selects its load input and outputs the new related information provided on X_rel_info 186 on late0 191 for provision to the XIQ 154 and the multiplexer 172 of Fig. 1, as shown. Furthermore, because the instruction translator 138 translates the new instruction during clock cycle 2, the FIQ control logic 118 generates a true value on the eshift signal 164 of Fig. 1 so that the instruction is shifted out of the formatted instruction queue 187 during clock cycle 3.
During clock cycle 3, V0 134 is false because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 3, the XIQ control logic 156 loads the translated microinstruction 171 and the related instruction information provided on late0 191 into either the execution stage register 176 or the XIQ 154, depending on whether the XIQ 154 is empty or non-empty, respectively. In addition, the register 185 of Fig. 1 loads the eshift signal 164 and outputs a true value on lshift 168.
As may be observed from Fig. 4, although the new macroinstruction is generated during clock cycle 1 and its related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the execution stage during the same clock cycle.
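The Fig. 4 scenario can be replayed with a small model of the bottom entry alone (ER0 220, LR0 320, multiplexer 310 and the two delay registers). The sketch below is an approximation under stated assumptions: the function name `run_fig4` and the string placeholders are hypothetical, the upper entries are treated as empty, and eshift is assumed to be false in cycle 3, which the description does not state.

```python
def run_fig4():
    """Replay clock cycles 1..3 of Fig. 4 for entry 0 (FIQ empty, XIQ not full)."""
    # Per-cycle inputs taken from the description of Fig. 4.
    formatted_instr = ["new", None, None]
    eload0 = [True, False, False]
    eshift = [False, True, False]                 # cycle-3 value is an assumption
    x_rel_info = [None, "info(new)", None]        # related info arrives one cycle late

    er0 = lr0 = None         # contents of ER0 220 and LR0 320
    lload0 = lshift = False  # outputs of registers 183 and 185
    trace = []
    for n in range(3):
        early0 = er0
        # mux 310 drives late0; its shift input would be LR1, which is empty here
        late0 = x_rel_info[n] if lload0 else (None if lshift else lr0)
        trace.append((n + 1, early0, late0))
        # rising clock edge: registers capture their mux outputs
        er0 = formatted_instr[n] if eload0[n] else (None if eshift[n] else er0)
        lr0 = late0
        lload0, lshift = eload0[n], eshift[n]
    return trace


trace = run_fig4()
```

In the resulting trace, cycle 2 is the only cycle in which both early0 and late0 carry values, and they belong to the same instruction.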
Referring now to Fig. 5, a timing diagram illustrating operation of the formatted instruction queue 187 of Fig. 1 according to the present invention is shown. Fig. 5 is similar to Fig. 4, except that in the scenario of Fig. 5 the XIQ 154 is full when the instruction formatter 116 generates the new formatted macroinstruction.
During clock cycle 1, XIQ_full 195 is true. The instruction formatter 116 generates a new instruction on formatted_instr 197 and drives F_new_instr 152 true, as in Fig. 4. Because the formatted instruction queue 187 is empty, the FIQ control logic 118 generates a true value on the eload[0] signal 162 to load the valid formatted macroinstruction from formatted_instr 197 into EE0, as in Fig. 4.
During clock cycle 2, V0 134 is set; the register 183 outputs a true value on the lload[0] signal 142; the new instruction is loaded into the ER0 register 220 and output on the early0 signal 193 for provision to the instruction translator 138; new information related to the new instruction is generated on X_rel_info 186; and the multiplexer 310 selects its load input and outputs the new related information provided on X_rel_info 186 on the late0 signal 191 for provision to the XIQ 154 and the multiplexer 172, all as in Fig. 4. However, because the XIQ 154 is full at the start of clock cycle 2, the FIQ control logic 118 generates a false value on the eshift signal 164, unlike in Fig. 4. Subsequently, the XIQ control logic 156 deasserts the XIQ_full signal 195 to indicate that the instruction translator 138 will be ready to translate the new macroinstruction during clock cycle 3.
During clock cycle 3, because the eshift signal 164 was false on the rising edge of clk 202, the new instruction is held in the ER0 register 220 and provided on the early0 signal 193 to the instruction translator 138 for translation. Accordingly, V0 134 remains true. The instruction translator 138 translates the new macroinstruction and provides the translated microinstruction 171 to the XIQ 154. Because lload[0] 142 was true on the rising edge of clk 202, the related information provided on X_rel_info 186 during clock cycle 2 is loaded into the LR0 register 320. Because the lload[0] signal 142 and the lshift signal 168 are false for the remainder of clock cycle 3, the contents of the LR0 register 320 (that is, the new information related to the instruction) are provided on the late0 signal 191 to the XIQ 154, as shown. After the start of clock cycle 3, the FIQ control logic 118 generates a true value on the eshift signal 164 so that the new instruction is shifted out of the formatted instruction queue 187 during clock cycle 4.
During clock cycle 4, V0 134 is false because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 4, the XIQ control logic 156 loads the translated microinstruction 171 and the related instruction information provided on late0 191 into the XIQ 154. In addition, the register 185 of Fig. 1 loads the eshift signal 164 and outputs a true value on lshift 168.
As may be observed from Fig. 5, although the new macroinstruction is generated during clock cycle 1 and its related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the XIQ 154 during the same clock cycle.
Referring now to Fig. 6, a timing diagram illustrating operation of the formatted instruction queue 187 of Fig. 1 according to the present invention is shown. Fig. 6 is similar to Fig. 5, except that in the scenario of Fig. 6, when the instruction formatter 116 generates the new formatted macroinstruction, not only is the XIQ 154 full, but the formatted instruction queue 187 is also not empty.
During clock cycle 1, XIQ_full 195 is true. The instruction formatter 116 generates a new instruction on formatted_instr 197 and drives F_new_instr 152 true, as in Figs. 4 and 5. Because EE0 contains a valid instruction, V0 134 is true; however, because EE1 does not contain a valid instruction, V1 134 (the valid bit for entry EE1 of the formatted instruction queue 187 of Fig. 1) is false, as shown in Fig. 6. Consequently, the FIQ control logic 118 generates a true value on the eload[1] signal 162 to load the valid formatted macroinstruction from formatted_instr 197 into EE1. The early0 signal 193 provides the instruction held in EE0 (referred to in Fig. 6 as the old instruction), and the late0 signal 191 provides the information related to the old instruction held in LE0 (referred to as the old information), as shown in Fig. 6.
During clock cycle 2, V1 134 is set to indicate that EE1 now contains a valid instruction. V0 134 also remains set. The old instruction is held in the ER0 register 220, and the old information is held in the LR0 register 320. The register 183 outputs a true value on the lload[1] signal 142. The new instruction is loaded into the ER1 register 221, as shown in Fig. 6. New information related to the new instruction is generated on X_rel_info 186, and the multiplexer 311 of Fig. 3 selects its load input and provides that information to register LR1 321. Because the XIQ 154 is full at the start of clock cycle 2, the FIQ control logic 118 generates a false value on the eshift signal 164. Subsequently, the XIQ control logic 156 deasserts the XIQ_full signal 195 to indicate that the instruction translator 138 will be ready to translate a macroinstruction during clock cycle 3.
During clock cycle 3, because the eshift signal 164 was false on the rising edge of clk 202, the new instruction is held in the ER1 register 221. In addition, the old instruction is held in ER0 220 and provided on early0 193 to the instruction translator 138 for translation. V1 and V0 134 remain true. The instruction translator 138 translates the old instruction and provides its translated microinstruction 171 to the XIQ 154. Because the lload[0] signal 142 and the lshift signal 168 are false for the remainder of clock cycle 3, the contents of LR0 320 (that is, the old information related to the old instruction) are provided on late0 191 to the XIQ 154, as shown in Fig. 6. Because the lload[1] signal 142 was true on the rising edge of clk 202, the new related information provided on X_rel_info 186 during clock cycle 2 is loaded into the LR1 register 321. After the start of clock cycle 3, the FIQ control logic 118 generates a true value on the eshift signal 164 so that the new instruction is shifted from EE1 to EE0 during clock cycle 4.
During clock cycle 4, V1 134 is false because the new instruction has been shifted from EE1 to EE0. On the rising edge of clock cycle 4, the XIQ control logic 156 loads the microinstruction 171 translated from the old instruction and the related instruction information provided on late0 191 into the XIQ 154. In addition, the register 185 loads the eshift signal 164 and outputs a true value on lshift 168. Because the XIQ 154 is ready to receive another microinstruction, eshift 164 remains true. Because the eshift signal 164 was true on the rising edge of clk 202, the new instruction is shifted from ER1 221 to ER0 220 and provided on early0 193 to the instruction translator 138 for translation. V0 134 remains true. The instruction translator 138 translates the new instruction and provides the microinstruction 171 translated from the new instruction to the XIQ 154. Because lshift 168 is true during clock cycle 4, the information related to the new instruction held in LR1 321 is selected on the shift input of the multiplexer 310 and provided on the late0 signal 191, as shown in Fig. 6.
During clock cycle 5, the FIQ control logic 118 clears V0 134 because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 5, the XIQ control logic 156 loads the microinstruction 171 translated from the new instruction and the related instruction information provided on late0 191 into the XIQ 154.
As may be observed from Fig. 6, although the new macroinstruction is generated during clock cycle 1 and its related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the XIQ 154 during the same clock cycle.
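The Fig. 6 scenario exercises all three selections (load, hold and shift), so it is a useful check on a three-entry model of both queues driven by the same control signals, with the late queue seeing them one cycle later. The sketch below is an illustration under stated assumptions: the names `mux3`, `simulate` and `FIG6`, the dictionary layout and the string placeholders are hypothetical, and the cycle-5 inputs are assumed because the description does not give them.

```python
def mux3(load, shift, load_in, shift_in, hold_in):
    """Load has priority over shift; shift has priority over hold."""
    return load_in if load else (shift_in if shift else hold_in)


def simulate(cycles, er, lr):
    """Return [(early0, late0), ...], one pair per clock cycle.

    cycles -- per-cycle inputs: instr (formatted_instr 197), eload, eshift, info (X_rel_info 186)
    er, lr -- initial [entry0, entry1, entry2] contents of early queue 132 and late queue 146
    """
    er, lr = list(er), list(lr)
    lload, lshift = [False] * 3, False  # outputs of registers 183 and 185
    trace = []
    for c in cycles:
        # Late-queue mux outputs (310, 311, 312) use the delayed controls.
        m = [mux3(lload[0], lshift, c["info"], lr[1], lr[0]),
             mux3(lload[1], lshift, c["info"], lr[2], lr[1]),
             mux3(lload[2], False, c["info"], None, lr[2])]
        trace.append((er[0], m[0]))  # early0 is ER0; late0 is the mux 310 output
        # Rising clock edge: every register captures its mux output.
        er = [mux3(c["eload"][0], c["eshift"], c["instr"], er[1], er[0]),
              mux3(c["eload"][1], c["eshift"], c["instr"], er[2], er[1]),
              mux3(c["eload"][2], False, c["instr"], None, er[2])]
        lr = m
        lload, lshift = list(c["eload"]), c["eshift"]
    return trace


NO_LOAD = [False, False, False]
FIG6 = [
    {"instr": "new", "eload": [False, True, False], "eshift": False, "info": None},
    {"instr": None, "eload": NO_LOAD, "eshift": False, "info": "new_info"},
    {"instr": None, "eload": NO_LOAD, "eshift": True, "info": None},
    {"instr": None, "eload": NO_LOAD, "eshift": True, "info": None},
    {"instr": None, "eload": NO_LOAD, "eshift": False, "info": None},  # assumed
]
fig6_trace = simulate(FIG6, er=["old", None, None], lr=["old_info", None, None])
```

In every cycle of the trace, early0 and late0 refer to the same instruction: the old pair for cycles 1 through 3, then the new pair in cycle 4, even though the new information entered the late queue a full cycle after the new instruction entered the early queue.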
Although the present invention and its objects, features and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although an embodiment has been described that buffers macroinstructions for provision to an instruction translator for translation into microinstructions, the invention is not limited to such an embodiment; rather, the invention may be used broadly wherever instructions need to be buffered and the information related to an instruction is generated in a clock cycle after the clock cycle in which the instruction itself is generated. Furthermore, although the embodiment described is implemented in a microprocessor that processes variable-length instructions, the invention is not limited thereto and may also be used in a processor with fixed-length instructions.
In addition to implementations using hardware, the invention may be embodied in computer readable code (e.g., computer readable program code, data, etc.) contained in a computer usable (e.g., readable) medium. The computer code enables the functions or the fabrication, or both, of the invention disclosed herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA and the like); GDSII databases; hardware description languages (HDL) including Verilog HDL, VHDL, Altera HDL (AHDL), and so on; or other programming and/or circuit capture tools available in the art. The computer code can be disposed in any known computer usable (e.g., readable) medium, including semiconductor memory, magnetic disk, and optical disk (e.g., CD-ROM, DVD-ROM and the like), and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical or analog-based media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is understood that the invention can be embodied in computer code (e.g., as part of an intellectual property (IP) core, such as a microcontroller core, or as a system-level design, such as a system on chip (SOC)) and transformed to hardware as part of the production of integrated circuits. The invention may also be embodied as a combination of hardware and computer code.
The foregoing describes only preferred embodiments of the present invention and does not limit the scope in which the invention may be practiced. All equivalent changes and modifications made in accordance with the claims of the present invention, without departing from its spirit and scope, fall within the scope of protection of the claims of the present invention.

Claims (24)

1. An apparatus for buffering instructions and related information in a pipelined microprocessor, wherein the related information does not become available to the apparatus until at least one clock cycle after the instruction becomes available, the apparatus comprising:
a first queue having a first plurality of entries, each for storing an instruction;
a second queue having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to the corresponding instruction in the first queue;
a plurality of control signals, coupled to the first queue, for loading, shifting and holding the instructions in the first queue; and
a plurality of registers that receive the control signals and output one-clock-cycle-delayed versions of the control signals for loading, shifting and holding the related information in the second queue.
2. The apparatus of claim 1, wherein the apparatus is coupled between an instruction cache of the microprocessor and an execution stage of the microprocessor pipeline.
3. The apparatus of claim 2, wherein the second queue is configured to provide the related information to the execution stage during the same clock cycle in which the second queue receives the related information.
4. The apparatus of claim 3, wherein a bottom entry of the second plurality of entries of the second queue comprises:
a register for selectively storing, in a clock cycle following the same clock cycle, the related information corresponding to the instruction stored in the corresponding bottom entry of the first plurality of entries of the first queue; and
a multiplexer having an input for receiving the related information during the same clock cycle and an output coupled to an input of the register and to the execution stage, for selectively providing the related information to the execution stage during the same clock cycle based on one or more of the one-clock-cycle-delayed control signals.
5. The apparatus of claim 1, wherein the related information comprises an instruction pointer of the corresponding instruction stored in the first queue.
6. The apparatus of claim 1, wherein the related information comprises a length of the corresponding instruction stored in the first queue.
7. The apparatus of claim 1, wherein the related information comprises branch prediction information related to the corresponding instruction stored in the first queue, the corresponding instruction having been predicted to be a branch instruction.
8. The apparatus of claim 7, wherein the branch prediction information comprises branch history table information.
9. The apparatus of claim 7, wherein the branch prediction information comprises a linear instruction pointer used to predict the branch instruction.
10. The apparatus of claim 7, wherein the branch prediction information comprises a branch pattern used to predict the branch instruction.
11. The apparatus of claim 7, wherein the branch prediction information specifies a branch instruction type of the branch instruction.
12. The apparatus of claim 1, wherein the related information comprises a displacement field of the corresponding instruction stored in the first queue.
13. The apparatus of claim 1, wherein the related information comprises an immediate field of the corresponding instruction stored in the first queue.
14. An instruction buffer, comprising:
A plurality of muxed registers, each for storing an instruction;
A plurality of registered muxes, each for storing information related to the instruction in a corresponding one of the muxed registers;
Control logic, coupled to the muxed registers, for generating a control signal to selectively load the instruction into one of the muxed registers; and
A register for receiving a value of the control signal in a first clock cycle and outputting the value in a second clock cycle subsequent to the first clock cycle, to selectively load the related information into the one of the registered muxes that corresponds to the one of the muxed registers.
15. The instruction buffer as claimed in claim 14, wherein the one of the muxed registers can output the instruction to an instruction translator in the second clock cycle.
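The one-cycle delay applied to the control signal in claim 14 can be illustrated with a minimal model of a clocked register. This is a sketch under stated assumptions: the class name `DelayRegister`, the reset value of `False`, and the three-cycle stimulus are hypothetical and chosen only for the example.

```python
class DelayRegister:
    """Hypothetical one-cycle delay element for a control signal."""

    def __init__(self, reset_value=False):
        self._q = reset_value  # value captured at the previous clock edge

    def clock(self, d):
        # Return the value captured one cycle earlier, then capture the new one.
        q, self._q = self._q, d
        return q


# The load control is asserted in the first cycle, when the instruction arrives.
early_load = [True, False, False]

delay = DelayRegister()
late_load = [delay.clock(value) for value in early_load]
```

Here `late_load` is asserted in the second cycle, which is the cycle in which the related information arrives, so the same control value steers both loads.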
16. A microprocessor, comprising:
An instruction formatter for outputting a branch instruction in a first clock cycle;
Control logic for generating information related to a prediction of the branch instruction in a second clock cycle subsequent to the first clock cycle; and
An instruction buffer, coupled to the instruction formatter, for buffering the branch instruction during the first clock cycle and receiving the information in the second clock cycle, wherein the instruction buffer selectively outputs the information in the second clock cycle if the instruction buffer is empty during the first clock cycle, and selectively buffers the information during the second clock cycle if the instruction buffer is not empty during the first clock cycle.
17. The microprocessor as claimed in claim 16, further comprising:
An execution stage, coupled to the instruction buffer, wherein the execution stage receives the information from the instruction buffer in the second clock cycle if the instruction buffer is empty during the first clock cycle, and receives the information from the instruction buffer in a third clock cycle subsequent to the second clock cycle if the instruction buffer is not empty during the first clock cycle.
18. The microprocessor as claimed in claim 17, further comprising:
An instruction translator, coupled to the instruction buffer, wherein the instruction translator receives the branch instruction from the instruction buffer in the second clock cycle if the instruction buffer is empty during the first clock cycle, and receives the branch instruction from the instruction buffer in the third clock cycle if the instruction buffer is not empty during the first clock cycle.
19. The microprocessor as claimed in claim 18, wherein the execution stage can receive a translated microinstruction from the instruction translator in the second clock cycle if the instruction buffer is empty during the first clock cycle, and can receive the translated microinstruction from the instruction translator in the third clock cycle if the instruction buffer is not empty during the first clock cycle.
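The timing relationships in claims 16 to 19 can be summarized as a small lookup. The sketch below is only a reading aid: the function name and dictionary keys are hypothetical, and it assumes that the third-cycle case in each claim applies when the instruction buffer is not empty during the first clock cycle, consistent with claims 16 and 18.

```python
def delivery_cycle(buffer_empty_in_first_cycle: bool) -> dict:
    """Return which clock cycle each transfer occurs in, per claims 16 to 19."""
    cycle = "second" if buffer_empty_in_first_cycle else "third"
    return {
        "branch instruction -> instruction translator": cycle,
        "translated microinstruction -> execution stage": cycle,
        "prediction information -> execution stage": cycle,
    }


when_empty = delivery_cycle(True)
when_not_empty = delivery_cycle(False)
```

In both cases the three transfers land in the same cycle, so the prediction information stays aligned with its branch instruction.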
20. A method for buffering instructions and later-generated related information in a microprocessor having a pipeline, comprising:
Loading an instruction into an instruction queue in a first clock cycle;
Generating information related to the instruction in a second clock cycle subsequent to the first clock cycle;
Determining whether the instruction is shifted out of the queue during the second clock cycle; and
Loading the related information into the queue in the second clock cycle if the instruction is not shifted out of the queue, and bypassing the queue to deliver the related information to the pipeline together with the instruction during the second clock cycle if the instruction is shifted out of the queue.
21. The method as claimed in claim 20, further comprising:
Determining, if the instruction is not shifted out of the queue, whether the instruction is shifted down in the queue during the second clock cycle; and
Shifting the related information down in the queue in a third clock cycle subsequent to the second clock cycle if the instruction is shifted down in the queue during the second clock cycle.
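The decision steps of claims 20 and 21 can be sketched as two small functions. This is an illustrative model, not the patented implementation: the function names, the use of a Python list for the part of the queue that holds related information, and the convention that index 0 is the bottom entry are all assumptions.

```python
def handle_late_info(late_queue, entry, info, shifted_out, shifted_down):
    """Second-cycle decision; returns (bypassed_info, shift_pending)."""
    if shifted_out:
        # The instruction already left the queue, so the information
        # bypasses the queue and goes to the pipeline with it.
        return info, False
    # Otherwise the information is loaded into the queue in the second cycle.
    late_queue[entry] = info
    # If the instruction moved down, the information follows one cycle later.
    return None, shifted_down


def shift_down(late_queue):
    """Third-cycle step: move every entry one position toward index 0."""
    late_queue[:] = late_queue[1:] + [None]


# Case 1: the instruction shifts out of the queue in the second cycle.
q1 = [None, None]
bypassed, pending1 = handle_late_info(q1, 0, "len=3", shifted_out=True, shifted_down=False)

# Case 2: the instruction stays queued but moves from entry 1 down to entry 0.
q2 = [None, None]
_, pending2 = handle_late_info(q2, 1, "len=5", shifted_out=False, shifted_down=True)
if pending2:
    shift_down(q2)
```

In the first case nothing is written to the queue. In the second case the information ends up in the bottom entry one cycle after its instruction arrived there.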
22. An instruction buffer, comprising:
A first multiplexer comprising an output, a hold data input, a load data input for receiving an instruction in a first clock cycle, and a control input for receiving a first control signal, wherein the first multiplexer selects the load data input if the value of the control input is true, and otherwise selects the hold data input;
A first register comprising an input coupled to the first multiplexer output, and an output coupled to the first multiplexer hold data input;
A second register having an input and an output;
A second multiplexer comprising an output coupled to the second register input, a hold data input coupled to the second register output, a load data input for receiving information related to the instruction in a second clock cycle subsequent to the first clock cycle, and a control input for receiving a second control signal, wherein the second multiplexer selects the load data input if the value of the control input is true, and otherwise selects the hold data input; and
A third register having an input for receiving the first control signal in the first clock cycle and an output for generating the second control signal in the second clock cycle, whereby, if the first control signal is true during the first clock cycle, the instruction and the related information can both be output in the second clock cycle.
23. The instruction buffer as claimed in claim 22, wherein the second multiplexer further comprises a shift data input for receiving an output of a fourth register, the fourth register storing data of an entry of the instruction buffer located above the entry comprising the second register and the second multiplexer, and the second multiplexer further comprises a second control input for receiving a third control signal, wherein the second multiplexer selects the shift data input if the value of the control input is false and the value of the second control input is true, and otherwise selects the hold data input.
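The selection rule that claims 22 and 23 recite for the second multiplexer amounts to a fixed priority: load first, then shift, then hold. A minimal sketch of that rule follows; the function and parameter names are hypothetical.

```python
def second_mux_select(hold_data, load_data, shift_data, load_control, shift_control):
    """Pick the second multiplexer's output from its three data inputs."""
    if load_control:
        # Claim 22: a true control input selects the load data input.
        return load_data
    if shift_control:
        # Claim 23: control input false and second control input true selects shift.
        return shift_data
    # In every other case the multiplexer selects the hold data input.
    return hold_data


results = {
    (load, shift): second_mux_select("hold", "load", "shift", load, shift)
    for load in (False, True)
    for shift in (False, True)
}
```

The `results` table covers all four control combinations, including the case where both controls are true, in which the load input wins.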
24. A computer data signal embodied in a transmission medium, comprising:
Computer-readable program code for providing, in a pipelined microprocessor, a device for buffering instructions and related information, wherein the buffering device does not obtain the related information until at least one clock cycle after obtaining the instruction, the program code comprising:
First program code for providing a first queue having a first plurality of entries, each for storing an instruction;
Second program code for providing a second queue having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to the corresponding instruction in the first queue;
Third program code for providing a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue; and
Fourth program code for providing a plurality of registers that receive the control signals and output versions of the control signals delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.
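The four pieces of program code in claim 24 can be pictured together as a register-level model in which the second queue replays the first queue's controls one cycle late. The sketch below is a simplified illustration: the class name `TwoQueueBuffer`, the string-valued controls, the depth of two entries, and the omission of valid bits are all assumptions, not details taken from the claim.

```python
HOLD, LOAD, SHIFT = "hold", "load", "shift"


class TwoQueueBuffer:
    """Hypothetical model: first queue plus a second queue driven by delayed controls."""

    def __init__(self, depth):
        self.early = [None] * depth        # first queue: instructions
        self.late = [None] * depth         # second queue: related information
        self.delayed_ctl = [HOLD] * depth  # control values from the previous cycle

    @staticmethod
    def _apply(entries, ctl, data_in):
        nxt = []
        for i, old in enumerate(entries):
            if ctl[i] == LOAD:
                nxt.append(data_in)
            elif ctl[i] == SHIFT:
                # Take the value of the entry above (index 0 is the bottom).
                nxt.append(entries[i + 1] if i + 1 < len(entries) else None)
            else:
                nxt.append(old)
        return nxt

    def step(self, ctl, instruction_in, info_in):
        # The second queue uses last cycle's controls, the first queue this cycle's.
        self.late = self._apply(self.late, self.delayed_ctl, info_in)
        self.early = self._apply(self.early, ctl, instruction_in)
        self.delayed_ctl = list(ctl)


buf = TwoQueueBuffer(depth=2)
buf.step([LOAD, HOLD], "A", None)       # cycle 1: instruction A enters entry 0
buf.step([HOLD, LOAD], "B", "infoA")    # cycle 2: B enters entry 1, A's info arrives
after_cycle2 = (list(buf.early), list(buf.late))
buf.step([SHIFT, HOLD], None, "infoB")  # cycle 3: B shifts down, B's info arrives
buf.step([HOLD, HOLD], None, None)      # cycle 4: B's info shifts down one cycle late
```

Because no valid bits are modeled, the upper entries keep stale copies after a shift. Only the bottom entry is meaningful at the end of this run.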
CNB2003101204239A 2003-04-23 2003-12-11 Buffer command and its device and method for relatirely later producing related information Expired - Lifetime CN1310137C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/422,057 2003-04-23
US10/422,057 US7159097B2 (en) 2002-04-26 2003-04-23 Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts

Publications (2)

Publication Number Publication Date
CN1514357A true CN1514357A (en) 2004-07-21
CN1310137C CN1310137C (en) 2007-04-11

Family

ID=34272339

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101204239A Expired - Lifetime CN1310137C (en) 2003-04-23 2003-12-11 Buffer command and its device and method for relatirely later producing related information

Country Status (2)

Country Link
CN (1) CN1310137C (en)
TW (1) TWI232403B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107250977A (en) * 2015-03-04 2017-10-13 森蒂彼得塞米有限公司 Pass through the runtime code parallelization of the approximate monitoring to command sequence
CN107430511A (en) * 2015-03-31 2017-12-01 森蒂彼得塞米有限公司 Parallel execution based on the command sequence monitored in advance

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140208075A1 (en) * 2011-12-20 2014-07-24 James Earl McCormick, JR. Systems and method for unblocking a pipeline with spontaneous load deferral and conversion to prefetch
US20200310799A1 (en) * 2019-03-27 2020-10-01 Mediatek Inc. Compiler-Allocated Special Registers That Resolve Data Hazards With Reduced Hardware Complexity

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608885A (en) * 1994-03-01 1997-03-04 Intel Corporation Method for handling instructions from a branch prior to instruction decoding in a computer which executes variable-length instructions
US5809272A (en) * 1995-11-29 1998-09-15 Exponential Technology Inc. Early instruction-length pre-decode of variable-length instructions in a superscalar processor
US5805878A (en) * 1997-01-31 1998-09-08 Intel Corporation Method and apparatus for generating branch predictions for multiple branch instructions indexed by a single instruction pointer
US6065110A (en) * 1998-02-09 2000-05-16 International Business Machines Corporation Method and apparatus for loading an instruction buffer of a processor capable of out-of-order instruction issue
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap

Also Published As

Publication number Publication date
TW200422947A (en) 2004-11-01
TWI232403B (en) 2005-05-11
CN1310137C (en) 2007-04-11

Similar Documents

Publication Publication Date Title
CN1147794C (en) Decoupling instruction fetch-actuating engine with static jump prediction support
CN1279442C (en) Device and method for selective access in different instruction buffer stages
CN1291311C (en) Device and method for selectively covering return godown to respond the detection of non-standard return sequence
EP1624369B1 (en) Apparatus for predicting multiple branch target addresses
CN110069285B (en) Method for detecting branch prediction and processor
CN1397887A (en) Virtual set high speed buffer storage for reorientation of stored data
US7159097B2 (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
US7143269B2 (en) Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor
CN104424128B (en) Variable length instruction word processor system and method
CN1397886A (en) Imaginary branch target address high speed buffer storage
CN1310137C (en) Buffer command and its device and method for relatirely later producing related information
EP1868081A1 (en) Processor
CN1138174A (en) Program translating apparatus and processor which achieve high-speed excution of subroutine branch instructions
TWI780804B (en) Microprocessor and method for adjusting prefetch instruction
CN1206144A (en) Pipeline processor capable of reducing branch hazards with small-scale circuit
CN1947092A (en) Methods and apparatus for multi-processor pipeline parallelism
CN1206145A (en) Signal processor having pipeline processing circuit and method of the same
CN1549113A (en) Apparatus and method for invalidating instructions in an instruction queue of a pipelined microprocessor
CN1821953A (en) Apparatus for predicting multiple branch target addresses
CN1870131A (en) Character string retrieving circuit and character string retrieving method
CN1117618A (en) Data processor with branch prediction and method of operation
CN1282930C (en) Apparatus and method for efficiently updating branch target address cache
CN1308813C (en) Control mechanism referenced by non-temporary memory
CN113377442A (en) Fast predictor override method and microprocessor
CN100339825C (en) Device and method for invalidating redundant items in branch target address cache

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20070411
