CN101833436B - Device and method suitable for a microprocessor - Google Patents
- Publication number
- CN101833436B (CN201010185611A)
- Authority
- CN
- China
- Prior art keywords
- byte
- instruction
- instruction byte
- prefix
- length
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Executing Machine-Instructions (AREA)
- Advance Control (AREA)
Abstract
The invention relates to an apparatus and a method for use in a microprocessor. The instruction set architecture of the microprocessor includes a length-modifying prefix for selecting an address/operand size other than a default address/operand size. The apparatus marks the start byte and the end byte of each instruction in a stream of instruction bytes. It comprises decode logic, which decodes each instruction byte of a predetermined number of instruction bytes of the stream to determine whether the byte specifies the length-modifying prefix, and which generates start marks and end marks based on the address/operand size; and operand/address size logic, coupled to the decode logic, which provides the default operand/address size to the decode logic so that the start marks and end marks are generated in a first clock cycle, the first clock cycle being the cycle in which the decode logic decodes the predetermined number of instruction bytes.
Description
Technical field
The present invention relates to the field of microprocessors, and in particular to extracting instructions from a stream of instruction bytes in a microprocessor whose instruction set architecture has variable-length instructions.
Background technology
A microprocessor includes one or more execution units that perform the actual execution of instructions. A superscalar microprocessor can issue multiple instructions to the execution units in each clock cycle, which increases throughput, that is, the average number of instructions per clock cycle. However, the instruction fetch and decode functions at the top of the microprocessor pipeline must supply an instruction stream to the execution units at a sufficient rate to use the execution units efficiently and raise throughput. The x86 architecture makes this task harder because its instructions are not of fixed length; the length of each instruction varies, as described in detail below. An x86 microprocessor must therefore include a large amount of logic to process the incoming stream of instruction bytes and determine where each instruction begins and ends. Consequently, there is a need to increase the rate at which an x86 microprocessor can parse the instruction byte stream to obtain the individual instructions.
Summary of the invention
According to one aspect, the present invention provides an apparatus for use in a microprocessor. The instruction set architecture of the microprocessor includes a length-modifying prefix for selecting an address/operand size other than a default address/operand size, and the apparatus marks the start byte and the end byte of each instruction in a stream of instruction bytes. The apparatus comprises: decode logic, configured to decode each instruction byte of a predetermined number of instruction bytes of the stream to determine whether the instruction byte specifies the length-modifying prefix, and to generate a start mark and an end mark based on the address/operand size; and operand/address size logic, coupled to the decode logic, configured to provide the default operand/address size to the decode logic so that the start marks and end marks are generated in a first clock cycle, the decode logic decoding the predetermined number of instruction bytes in the first clock cycle. If, in any of the first clock cycle and N subsequent clock cycles, the decode logic indicates that one of the predetermined number of instruction bytes specifies the length-modifying prefix, the operand/address size logic provides the address/operand size specified by the length-modifying prefix to the decode logic in the next clock cycle to generate the start marks and end marks.
According to another aspect, the present invention provides a method for use in a microprocessor. The instruction set architecture of the microprocessor includes a length-modifying prefix for selecting an address/operand size other than a default address/operand size, and the method marks the start byte and the end byte of each instruction in a stream of instruction bytes. The method comprises: in a first clock cycle, decoding each instruction byte of a predetermined number of instruction bytes of the stream to determine whether the instruction byte is the length-modifying prefix; if none of the decoded instruction bytes is a length-modifying prefix, generating, in the first clock cycle and according to the default operand/address size, a correct start mark and end mark for each of the predetermined number of instruction bytes; and if one or more of the decoded instruction bytes is the length-modifying prefix, generating, by N clock cycles after the first clock cycle and according to the address/operand size indicated by the length-modifying prefix, the correct start mark and end mark for each of the predetermined number of instruction bytes. Here N is the number of instructions within the predetermined number of instruction bytes that contain at least one length-modifying prefix.
Description of drawings
Fig. 1 is a block diagram of a microprocessor according to an embodiment of the invention.
Fig. 2 is a block diagram of the L stage of the instruction formatter of Fig. 1.
Fig. 3 shows the accumulated prefix information 238 of Fig. 2.
Fig. 4 is a flowchart of the operation of the microprocessor of Fig. 1.
Fig. 5 is a block diagram of portions of the L stage and the M stage of the instruction formatter of Fig. 1.
Fig. 6 is a flowchart of the operation of the microprocessor elements shown in Fig. 5 to extract instructions from the instruction byte stream (up to three instructions in one embodiment) without incurring a time penalty, regardless of the number of prefix bytes in the instructions.
Fig. 7 is a block diagram of a portion of the instruction formatter of Fig. 1.
Fig. 8a and Fig. 8b are flowcharts of the operation of the portion of the instruction formatter shown in Fig. 7.
Fig. 9 is a detailed block diagram of the mux queue of Fig. 5.
Fig. 10 is a block diagram of a portion of the M stage of the instruction formatter of Fig. 1.
Fig. 11 is a block diagram of the M-stage control logic unit of Fig. 5.
Fig. 12 is a flowchart of the operation of a portion of the M stage of the instruction formatter of Fig. 1.
Fig. 13 shows the contents of the mux queue of Fig. 5 during two successive clock cycles, to illustrate the operation of the M stage.
Fig. 14 shows the contents of the mux queue of Fig. 5 during two successive clock cycles, to illustrate the operation of the M stage.
Fig. 15 shows the instruction formatter of Fig. 14 extracting and sending out, in a single clock cycle, three instructions that contain up to 40 instruction bytes.
Fig. 16 shows a case in which the BTAC of Fig. 1 has made a bad prediction that causes the microprocessor to branch erroneously, that is, the branch taken indication of Fig. 1 is true for a byte that is not the opcode of an instruction.
Fig. 17 shows the signals that make up the output of the ripple logic units.
Fig. 18 is a flowchart of the operation of the microprocessor of Fig. 1.
Fig. 19 is a detailed block diagram of the length decoder of Fig. 2.
Fig. 20 shows the arrangement of the 16 length decoders.
Fig. 21 is a flowchart of the operation of the length decoders of Fig. 20.
[Description of main reference numerals]
100 microprocessor
102 instruction cache
104 x86 instruction byte queue (XIBQ)
106 instruction formatter
108 formatted instruction queue
112 instruction translator
114 translated instruction queue
116 register alias table
118 reservation stations
122 execution units
124 retire unit
126 fetch unit
128 branch target address cache (BTAC)
132 instruction bytes
134 instruction bytes
136 x86 instruction stream
142 current fetch address
144 adder
146 predicted target address
148 executed target address
152 next sequential fetch address
154 branch taken indication
202 length decoders
204 ripple logic units
208 control logic unit
212 length decoder output
222 instruction length
224 decoded any prefix indicator
226 decoded LMP indicator
228 susceptible to LMP indicator
229 prefix information
232 start bit
234 end bit
236 valid bit
238 accumulated prefix information
252 default operand and address sizes
302 OS
304 AS
306 REX present
308 REX.W
312 REX.R
314 REX.X
316 REX.B
318 REP
322 REPNE
324 LOCK
326 segment override present
328 encoded segment override [2:0]
332 any prefix present
402-414 steps
502 mux queue
504 I1 mux
506 I2 mux
508 I3 mux
512 M-stage control logic unit
514 control signal
516 control signal
518 control signal
524 first instruction I1
526 second instruction I2
528 third instruction I3
534, 536, 538 valid indicators
602-608 steps
702 XIBQ control logic unit
802-824 steps
1002 accumulated prefix array
1004 instruction byte array
1102 subtractor
1104 partial LEN
1106 remaining LEN1
1108 byte position END1
1112 byte position END0
1114 mux
1116 adder
1118 register
1122 instruction length LEN1
1201-1222 steps
1702 bad BTAC bit
1802-1816 steps
1902 programmable logic array (PLA)
1904 adder
1906 mux
1912 eaLen value
1914 control signal
1916 immLen value
1918 eaLen value
2102-2116 steps
Embodiment
Fig. 1 is a block diagram of a microprocessor 100 according to an embodiment of the invention. Microprocessor 100 includes a pipeline made up of multiple stages or functional units: a four-stage instruction cache 102, an x86 instruction byte queue (XIBQ) 104, an instruction formatter 106 (which comprises three stages L, M and F), a formatted instruction queue 108, an instruction translator 112, a translated instruction queue 114, a register alias table 116, reservation stations 118, execution units 122 and a retire unit 124. Microprocessor 100 also includes a fetch unit 126, which provides a current fetch address 142 to instruction cache 102 to select a cache line of instruction bytes 132 to supply to XIBQ 104. Microprocessor 100 also includes an adder 144, which increments the current fetch address 142 to produce a next sequential fetch address 152 that is fed back to fetch unit 126. Fetch unit 126 also receives a predicted target address 146 from a branch target address cache (BTAC) 128. Finally, fetch unit 126 receives an executed target address 148 from execution units 122.
XIBQ 104 is a queue of entries, each of which holds 16 bytes of data from instruction cache 102. Each XIBQ 104 entry also holds pre-decoded information associated with the data bytes; the pre-decoded information is generated as the data bytes flow from instruction cache 102 to XIBQ 104. The cache data from XIBQ 104 is a stream of instruction bytes 134 in the form of 16-byte blocks, but it is not known where the x86 instructions begin or end within the stream or within a block. The task of instruction formatter 106 is to determine the start and end byte of each instruction in the stream and thereby separate the byte stream into a stream of x86 instructions 136, which is supplied to and stored in formatted instruction queue 108 for processing by the rest of the microprocessor 100 pipeline. When a reset occurs or a flow control instruction (for example a jump instruction, a subroutine call instruction or a return-from-subroutine instruction) is executed or predicted, the reset address or branch target address is provided to instruction formatter 106 as an instruction pointer, which enables instruction formatter 106 to determine the first byte of the first valid instruction in the current 16-byte block of the instruction stream. Instruction formatter 106 can then determine the start of the next instruction by adding the length of the first target instruction to its start position. Instruction formatter 106 repeats this process until another flow control instruction is executed or predicted.
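The walk described above, from a known first byte to each following instruction by adding lengths, can be sketched in a few lines of Python. This is an illustrative software model only, not the circuit of the embodiment. The function name `mark_boundaries` and the `lengths` table are hypothetical, and the sketch assumes each length covers the whole instruction, whereas the embodiment described later excludes prefix bytes from the length.

```python
def mark_boundaries(lengths, first_start, block_size=16):
    """Return (start_bits, end_bits) for one block of instruction bytes.

    lengths[i] is the (hypothetical) length in bytes of the instruction
    that would begin at byte i of the block.
    """
    start_bits = [False] * block_size
    end_bits = [False] * block_size
    pos = first_start                  # instruction pointer into the block
    while pos < block_size:
        start_bits[pos] = True
        end = pos + lengths[pos] - 1   # last byte of this instruction
        if end < block_size:           # otherwise it ends in the next block
            end_bits[end] = True
        pos = end + 1                  # start of the next instruction
    return start_bits, end_bits
```

For example, with every candidate length equal to 3 and a branch target at byte 1, the starts fall at bytes 1, 4, 7, 10 and 13.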
BTAC 128 also provides a branch taken indication 154 to XIBQ 104. There is one branch taken indication 154 for each instruction byte 132 that instruction cache 102 supplies to XIBQ 104. The branch taken indication 154 indicates whether BTAC 128 predicts that a branch instruction is present in the line of instruction bytes 132 supplied to XIBQ 104; if so, fetch unit 126 selects the predicted target address 146 provided by BTAC 128. Specifically, BTAC 128 outputs a true branch taken indication 154 for the first byte of the branch instruction (even if that first byte is a prefix byte) and a false branch taken indication 154 for the other bytes of the instruction.
Microprocessor 100 is an x86 architecture microprocessor. A microprocessor is considered an x86 architecture microprocessor if it can correctly execute the majority of application programs designed to run on an x86 microprocessor, and an application program is considered to execute correctly if its expected results are obtained. One characteristic of the x86 architecture is that the length of the instructions in its instruction set is variable, rather than fixed as in some other instruction set architectures. Furthermore, for a given x86 opcode, the presence or absence of a prefix before the opcode may affect the length of the instruction. In addition, the length of some instructions may be a function of the default operand and/or address size of the current operating mode of microprocessor 100 (for example, the D bit of the code segment descriptor, or whether microprocessor 100 is operating in IA-32e or 64-bit mode). Finally, an instruction may include a length-modifying prefix that selects an address/operand size other than the default address/operand size. For example, the operand size (OS) prefix (0x66), the address size (AS) prefix (0x67) and the REX.W bit (bit 3) of the REX prefix (0x4x) can be used to change the default address/operand size. Intel refers to these as length-changing prefixes (LCP); this specification refers to them as length-modifying prefixes (LMP). The format and length of x86 instructions are well known; for details, see Chapter 2 of the IA-32 Intel Architecture Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M, June 2006.
According to the Intel 64 and IA-32 Architectures Optimization Reference Manual, March 2009, pages 3-21 to 3-23 (available at http://www.intel.com/Assets/PDF/manual/248966.pdf): "When the predecoder encounters an LCP in the fetch line, it must use a slower length decoding algorithm. With the slower length decoding algorithm, the predecoder decodes the fetch in six cycles instead of the usual one cycle. Normal queuing in the machine pipeline generally cannot hide LCP penalties."
Fig. 2 is a block diagram of the L stage of instruction formatter 106 of Fig. 1. Instruction formatter 106 includes a plurality of length decoders 202 whose outputs 212 are coupled to respective ripple logic units 204; the outputs 214 of ripple logic units 204 are coupled to control logic unit 208 and are provided to the M stage of instruction formatter 106. In one embodiment, length decoders 202 produce their outputs 212 during the first phase of a two-phase clock signal of microprocessor 100, and ripple logic units 204 produce their outputs 214 during the second phase.
Length decoders 202 receive instruction bytes 134 from XIBQ 104. In one embodiment, each XIBQ 104 entry is 16 bytes wide, so there are correspondingly 16 length decoders 202, numbered 0 through 15 as shown in Fig. 20. Each length decoder 202 receives and decodes its corresponding instruction byte from the bottom entry of XIBQ 104. In addition, each length decoder 202 receives and decodes the next three adjacent instruction bytes. The last three length decoders 202 receive one or more of these instruction bytes from the next-to-bottom entry of XIBQ 104 (if the next-to-bottom entry of XIBQ 104 is invalid, the last three length decoders 202 must wait and produce valid outputs in the next clock cycle). Details of length decoder 202 are described with Fig. 19. This enables length decoders 202 to determine and output the instruction length 222 of the instructions in the bottom entry of XIBQ 104. In one embodiment, instruction length 222 is the number of bytes of the instruction excluding prefix bytes, that is, the number of bytes from the opcode through the last byte of the instruction. Specifically, the instruction length 222 of an instruction is the length output by the length decoder 202 that corresponds to the first instruction byte of that instruction.
To produce instruction length 222, length decoders 202 also use the operand and address sizes 218 received from control logic unit 208. Control logic unit 208 outputs an operand and address size 218 for each instruction byte 134. It determines the operand and address sizes 218 from the current default operand and address sizes 252 of microprocessor 100 and from the outputs 214 of ripple logic units 204. If the outputs 214 of ripple logic units 204 indicate that an instruction contains no LMP, control logic unit 208 outputs the default operand and address sizes to the corresponding length decoders 202 for each instruction byte of that instruction. If, however, the outputs 214 indicate that an instruction contains one or more LMPs, control logic unit 208 modifies the default operand and address sizes 252 for each instruction byte of that instruction and outputs the resulting operand and address sizes 218 to the corresponding length decoders 202. Control logic unit 208 makes the modification according to the values of the OS 302, AS 304 and REX.W 308 bits, which are part of the accumulated prefix information 238 in the outputs 214 of ripple logic units 204, as shown in Fig. 3.
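As a rough illustration of how default sizes 252 might be turned into effective sizes 218 from the OS 302, AS 304 and REX.W 308 bits, the following Python sketch applies the commonly documented x86 size-override rules. The text does not spell out this mapping, so the function name `effective_sizes`, its parameters and the rules encoded in it are assumptions drawn from general x86 behavior rather than from the embodiment.

```python
def effective_sizes(default_operand, default_address, os_prefix, as_prefix, rex_w):
    """Return (operand_bits, address_bits) after applying size overrides.

    Assumed rules (general x86 behavior, not taken from the patent text):
    - REX.W selects a 64-bit operand size and takes priority over 0x66;
      it is only meaningful in 64-bit mode.
    - 0x66 toggles the operand size between 16 and 32 bits.
    - 0x67 toggles the address size between 16 and 32 bits, or selects
      32-bit addressing when the default is 64 bits.
    """
    if rex_w:
        operand = 64
    elif os_prefix:
        operand = 16 if default_operand == 32 else 32
    else:
        operand = default_operand

    if as_prefix:
        address = {16: 32, 32: 16, 64: 32}[default_address]
    else:
        address = default_address
    return operand, address
```

With no override bits set, the function returns the defaults unchanged, which corresponds to the no-LMP case in which the control logic simply forwards the default sizes.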
As shown in Fig. 2, the output 212 of each length decoder 202 includes the instruction byte 134, the instruction length 222, a decoded any prefix indicator 224, a decoded LMP indicator 226, a susceptible to LMP indicator 228 and prefix information 229.
The decoded any prefix indicator 224 is true if the byte decoded by length decoder 202 corresponds to any x86 prefix (whether or not it is an LMP); otherwise it is false.
The decoded LMP indicator 226 is true if the byte decoded by length decoder 202 corresponds to any x86 LMP, namely the OS prefix (0x66), the AS prefix (0x67) or a REX prefix with the W bit set (0x48-0x4F); otherwise it is false.
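The two indicators 224 and 226 depend only on the value of the byte, so they can be modeled as simple predicates. The sketch below is a minimal Python model under stated assumptions: the names `decoded_any_prefix` and `decoded_lmp` and the `long_mode` parameter are hypothetical, the LMP values come from the text above, and the remaining legacy prefix values are the standard x86 encodings.

```python
# Standard x86 legacy prefix byte values: LOCK, REPNE, REP, the six
# segment overrides, operand size (OS) and address size (AS).
LEGACY_PREFIXES = frozenset(
    {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
)


def decoded_any_prefix(byte, long_mode=True):
    """Model of indicator 224: the byte has the value of any x86 prefix."""
    if byte in LEGACY_PREFIXES:
        return True
    # REX prefixes (0x40-0x4F) exist only in 64-bit mode.
    return long_mode and 0x40 <= byte <= 0x4F


def decoded_lmp(byte, long_mode=True):
    """Model of indicator 226: OS (0x66), AS (0x67) or REX with W set."""
    if byte in (0x66, 0x67):
        return True
    return long_mode and 0x48 <= byte <= 0x4F
```

Note that both predicates look at the byte value alone; whether the byte really sits in a prefix position is a separate question that the decoder cannot answer by itself.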
The susceptible to LMP indicator 228 is false if the byte decoded by length decoder 202 is an opcode byte whose instruction length is not affected by an LMP (for example, the OS prefix is mandatory for some SIMD instructions and therefore does not change their length); otherwise it is true.
Prefix information 229 comprises a plurality of bits that indicate whether the instruction byte has the value of one of the various x86 prefixes. These bits are similar to those of the accumulated prefix information 238 shown in Fig. 3. However, the prefix information 229 output by a length decoder 202 represents only a single prefix, namely the prefix value of the single instruction byte decoded by that length decoder 202. By contrast, because ripple logic units 204 accumulate the prefix information 229 provided by all of the length decoders 202, the accumulated prefix information 238 represents all of the prefixes in an instruction.
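The difference between per-byte prefix information 229 and accumulated prefix information 238 can be illustrated with a small Python sketch. It is a software analogy only. The `Prefix` flag set is a hypothetical subset of the Fig. 3 fields with arbitrary bit positions, 64-bit mode is assumed so that 0x40-0x4F are REX prefixes, and the input is assumed to begin at the first byte of an instruction.

```python
from enum import IntFlag


class Prefix(IntFlag):
    """Hypothetical subset of the Fig. 3 fields; bit positions are arbitrary."""
    NONE = 0
    OS = 1
    AS = 2
    REX = 4
    REX_W = 8
    REP = 16
    REPNE = 32
    LOCK = 64
    SEG = 128


LEGACY = {
    0x66: Prefix.OS, 0x67: Prefix.AS,
    0xF3: Prefix.REP, 0xF2: Prefix.REPNE, 0xF0: Prefix.LOCK,
    0x2E: Prefix.SEG, 0x36: Prefix.SEG, 0x3E: Prefix.SEG,
    0x26: Prefix.SEG, 0x64: Prefix.SEG, 0x65: Prefix.SEG,
}


def prefix_info(byte):
    """Per-byte information, in the spirit of prefix information 229."""
    if 0x40 <= byte <= 0x4F:          # REX prefix (64-bit mode assumed)
        info = Prefix.REX
        if byte & 0x08:               # bit 3 is REX.W
            info |= Prefix.REX_W
        return info
    return LEGACY.get(byte, Prefix.NONE)


def accumulate(instruction_bytes):
    """Ripple prefix bits forward; return (opcode_index, accumulated_bits)."""
    accumulated = Prefix.NONE
    for index, byte in enumerate(instruction_bytes):
        info = prefix_info(byte)
        if not info:                  # first non-prefix byte: the opcode
            return index, accumulated
        accumulated |= info
    return None, accumulated          # ran out of bytes before an opcode
```

Called on the bytes 0x66, 0xF3, 0x48, 0x0F, 0xB8, 0xC1, the sketch reports the opcode at index 3 with the OS, REP, REX and REX_W bits accumulated, however many prefix bytes precede the opcode.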
As shown in Fig. 2, the output 214 of each ripple logic unit 204 includes the instruction byte 134, a start bit 232, an end bit 234, a valid bit 236 and accumulated prefix information 238. The output 214 of each ripple logic unit 204 is also fed to the next adjacent ripple logic unit 204. In one embodiment, the 16 ripple logic units 204 are organized into four logic blocks, each of which processes four instruction bytes and their associated information. Each ripple logic unit 204 also outputs its corresponding instruction byte.
The start bit 232 is true if the byte processed by ripple logic unit 204 is the opcode byte of an instruction (that is, the first byte of the instruction that is not a prefix byte). Instruction formatter 106 increments a pointer past all prefix bytes, so that when the pointer points to a non-prefix byte, it points to the opcode byte of the instruction.
The end bit 234 is true if the byte processed by ripple logic unit 204 is the last byte of an instruction; otherwise it is false.
Starting with the first of the 16 valid bits 236 output by ripple logic units 204, each valid bit 236 is true up to the first unprocessed LMP.
The accumulated prefix information 238 is shown in Fig. 3 and discussed above. Control logic unit 208 uses the accumulated prefix information 238 together with the valid bits 236 to determine whether to use the default operand and address sizes 252 or to modify them.
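The rule for valid bits 236 can be written as a short Python sketch. The function name `valid_bits` and its two input lists are hypothetical, and the sketch assumes that the byte holding the first unprocessed LMP is itself marked invalid, which the text does not state explicitly.

```python
def valid_bits(lmp_at, processed):
    """Return one valid bit per byte of the block.

    lmp_at[i]    : byte i was decoded as an LMP
    processed[i] : byte i has already been handled in an earlier cycle
    Bits are true from byte 0 up to (assumed: not including) the first
    LMP that has not yet been processed, and false from there on.
    """
    bits = []
    still_valid = True
    for is_lmp, done in zip(lmp_at, processed):
        if is_lmp and not done:
            still_valid = False
        bits.append(still_valid)
    return bits
```

Once the LMP at a given byte has been processed, the run of true valid bits extends past it to the next unprocessed LMP, or to the end of the block.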
It should be noted that the outputs 212 of length decoders 202 are tentative in nature. That is, each output is produced without knowledge of the position of its instruction byte within an instruction. In particular, the prefix-related indicators 224/226/228/229 are produced on the assumption that the byte is a valid prefix, and this assumption may be wrong. A byte may happen to have the value of a prefix but actually be, for instance, a displacement byte that has the same value as an LMP. For example, 0x67 is the value of the AS prefix, which is an LMP. However, an address displacement byte, an immediate data value byte, a Mod R/M byte or an SIB byte of an instruction is not a prefix byte, yet may have the value 0x67. Only once all LMPs in the current block of instruction bytes have been processed is it certain that the outputs 212 and 214 for all bytes in the block are correct.
If no LMP is decoded among the instruction bytes of the XIBQ 104 entry in the current clock cycle, the L stage produces the ripple logic unit 204 outputs 214 (in particular the start bits 232 and end bits 234) for the entire entry in a single clock cycle. If one or more LMPs are decoded in the current XIBQ 104 entry, the number of clock cycles required to produce ripple logic unit 204 outputs 214 with correct start bits 232 and end bits 234 is N+1, where N is the number of instructions in the current XIBQ 104 entry that contain at least one LMP. The L stage achieves this regardless of how many prefixes any instruction in the entry has, as shown in the flowchart of Fig. 4. Control logic unit 208 holds state that indicates which bytes of the current block of instruction bytes have been processed and which have not. This state enables control logic unit 208 to generate the valid bit 236 and the operand and address sizes 218 for each instruction byte. Processing a block of instruction bytes that includes instructions containing LMPs is iterative: in the first clock cycle, the instruction length 222, start bit 232 and end bit 234 of the first instruction that contains an LMP may be incorrect; in the next clock cycle, the instruction length 222, start bit 232 and end bit 234 of that first instruction and of any following instructions that do not contain an LMP become correct; and in each subsequent clock cycle, the instruction length 222, start bit 232 and end bit 234 of the next instruction that contains an LMP, and of the instructions following it that do not contain an LMP, become correct. In one embodiment, the state comprises a 16-bit register that indicates whether each corresponding instruction byte has been processed.
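A concrete case makes the tentative nature of the per-byte outputs visible. In 32-bit code, the byte sequence 8B 45 67 encodes `mov eax, [ebp+0x67]`: 0x8B is the opcode, 0x45 is the Mod R/M byte and 0x67 is an 8-bit displacement. The Python sketch below, in which the function name `tentative_lmp_flags` is hypothetical, decodes each byte in isolation, as a bank of independent decoders would.

```python
def tentative_lmp_flags(block, long_mode=False):
    """Flag every byte whose value matches an LMP, ignoring its position."""
    return [
        b in (0x66, 0x67) or (long_mode and 0x48 <= b <= 0x4F)
        for b in block
    ]


# mov eax, [ebp+0x67] in 32-bit mode: opcode, Mod R/M, disp8
block = bytes([0x8B, 0x45, 0x67])
flags = tentative_lmp_flags(block)
# The third byte is flagged even though it is a displacement, not a prefix.
```

The flag on the displacement byte is harmless only because later logic, which knows where instructions start, can disregard it.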
[Marking start and end bytes for instructions that contain LMPs]
Fig. 4 is a flowchart of the operation of microprocessor 100 of Fig. 1. Flow begins at step 402.
In step 402, control logic unit 208 outputs the default operand and address sizes 218 to length decoders 202. Flow proceeds to step 404.
In step 404, during the first phase of the clock cycle, length decoders 202 decode the instruction bytes of the bottom entry of XIBQ 104 according to the operand and address sizes 218 provided by control logic unit 208 and produce their outputs 212. As described above, for each instruction byte of the bottom entry of XIBQ 104, the output 212 of length decoder 202 includes the instruction length 222 and the prefix-related indicators 224/226/228/229 (Fig. 2). Flow proceeds to step 406.
In step 406, during the second phase of the clock cycle, ripple logic units 204 produce their outputs 214 from the outputs 212 of length decoders 202. As described above, the outputs 214 of ripple logic units 204 include the start bit 232, end bit 234, valid bit 236 and accumulated prefix information 238 (Fig. 3). Flow proceeds to step 408.
In step 408, control logic unit 208 examines the outputs 214 of ripple logic units 204 to determine whether any instruction in the bottom entry of XIBQ 104 still contains an unprocessed LMP (length-modifying prefix). If so, flow proceeds to step 412; otherwise, flow proceeds to step 414.
In step 412, control logic unit 208 updates its internal state and the operand and address sizes according to the accumulated prefix information 238 provided by ripple logic units 204. Flow returns to step 404 to process the instruction bytes of the bottom entry again using the new operand and address sizes.
In step 414, control logic unit 208 determines that the instruction bytes of the bottom entry have been fully processed, shifts the entry out of XIBQ 104 and sends it to the M stage together with the ripple logic unit 204 output 214 corresponding to each instruction byte 134. Specifically, as described above, the outputs 214 of ripple logic units 204 include the start bits 232 and end bits 234, which mark the boundaries of each instruction in the instruction stream provided by instruction cache 102. This enables the M stage and F stage of instruction formatter 106 to process the instruction stream further and place individual instructions into FIQ (formatted instruction queue) 108 for processing by instruction translator 112. Flow ends at step 414.
As described above, if the instruction bytes contain no LMP (length-modifying prefix), the L stage produces the start bits 232 and end bits 234 for an entire XIBQ (x86 instruction byte queue) 104 entry in a single clock cycle. If one or more instructions in the XIBQ 104 entry contain an LMP, the number of clock cycles required to produce the start bits 232 and end bits 234 becomes N+1, where N is the number of instructions in the XIBQ 104 entry that contain at least one LMP. The L stage achieves this regardless of the number of prefixes in the instructions.
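The loop of steps 404 through 414 can be modeled at the level of whole instructions to show where the N+1 cycle count comes from. The Python sketch below is an abstraction, not a model of the byte-level hardware. The function name `cycles_to_mark_block` and the `has_lmp` list (one flag per instruction in the block) are hypothetical.

```python
def cycles_to_mark_block(has_lmp):
    """Count decode passes needed before all start/end bits are final.

    has_lmp[i] is True when instruction i of the block contains at least
    one LMP. Each pass resolves instructions up to the next instruction
    with an LMP that has not yet been accounted for.
    """
    cycles = 0
    resolved = 0          # instructions whose start/end bits are final
    handled = set()       # LMP instructions already reflected in the sizes
    while True:
        cycles += 1       # steps 404-406: one decode and ripple pass
        while resolved < len(has_lmp) and (
            not has_lmp[resolved] or resolved in handled
        ):
            resolved += 1
        if resolved == len(has_lmp):
            return cycles             # step 408 finds no unprocessed LMP
        handled.add(resolved)         # step 412: update state and sizes
```

A block whose instructions contain no LMP finishes in one pass, while a block in which two instructions contain LMPs takes three passes, in line with the N+1 figure given in the text.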
[accumulating preposition effectively to process the instruction that contains a plurality of prefix bytes]
The x86 framework allows instruction to contain 0 to 14 prefix byte.This causes the difficulty of pipeline (pipeline) front end when processing instruction byte crossfire.In the past when processing contains the instruction of prefix byte of a great deal of, delay that can encounter time.According to Intel 64 and IA-32 framework optimization reference manual (
64and IA-32 Architectures Optimization Reference Manual), in March, 2009 in Christian era, page or leaf 12-5, Intel mentions for the ATOM micro-architecture: " contain instruction meeting preposition more than three and produce the MSROM transfer, cause two clock cycle delays of front end." according to the micro-architecture (The microarchitecture of Intel and AMD CPU ' s) of another Research Literature-Intel and AMD central processing unit; author Agner Fog; Copenhagen University College of Enginerring; May 5 2009 Christian era last the renewal; page or leaf 93 (can in following page download www.agner.org/optimize/microarchitecture.pdf), it is mentioned: " containing a plurality of preposition instructions needs extra time to decode.The instruction decoder of P4 only can be processed one preposition in the cycle in a clock.On P4, contain its each preposition cost one clock cycle decoder that needs of a plurality of preposition instructions ", and " instruction decoder of P4E can be preposition in two of clock period treatment.Therefore, decodable code contains at the most two preposition instructions in the single clock cycle, and containing three or four preposition instructions then needs decode within two clock period.So P4E increases this function, be because under 64 bit patterns, a lot of instructions all contain two preposition (for example the operand size is preposition and REX is preposition).」
In contrast, embodiments of the invention can process all of the prefix bytes that the architecture allows in an instruction (up to 14) without added delay, regardless of the number of prefix bytes, provided that none of the prefixes is a length-modifying prefix (LMP). As described above, each instruction that contains one or more LMPs adds one clock cycle of processing time. The embodiments achieve this because length decoder 202 generates prefix information 229, and ripple logic 204 accumulates the prefix information 229 onto the opcode byte of the instruction to produce accumulated prefix information 238, as described in detail below.
Fig. 5 is a block diagram showing part of the L stage and the M stage (mux stage) of the instruction formatter 106 of Fig. 1. The M stage includes a mux queue 502. In one embodiment, mux queue 502 has four entries, each storing 16 bytes. The next empty entry of mux queue 502 receives the corresponding output 214 of ripple logic 204 (Fig. 2), which includes the instruction bytes 134, start bits 232, end bits 234, and accumulated prefix information 238.
The M stage also includes M-stage control logic 512, which receives the start/end bits 232/234 from the bottom entry of mux queue 502 and (in one embodiment) from the first ten bytes of the next-to-bottom entry (NTBE) of mux queue 502. Based on the start/end bits 232/234, M-stage control logic 512 controls three sets of mux logic: I1 mux 504, I2 mux 506, and I3 mux 508. I1 mux 504 outputs the first instruction I1 524 to the F stage of instruction formatter 106; I2 mux 506 outputs the second instruction I2 526 to the F stage; I3 mux 508 outputs the third instruction I3 528 to the F stage. In addition, M-stage control logic 512 outputs three valid indicators 534/536/538 that indicate whether the corresponding first, second, and third instructions 524/526/528 are valid. In this way, the M stage can extract up to three formatted instructions from the instruction stream and provide them to the F stage in a single clock cycle. In other embodiments, the M stage may extract and provide more than three formatted instructions to the F stage in a single clock cycle. Each of the three instructions 524/526/528 comprises the corresponding instruction bytes 134 with its prefix bytes replaced by the corresponding accumulated prefix information 238. In other words, each instruction 524/526/528 comprises the opcode byte, the remaining instruction bytes, and the accumulated prefix information 238. Each mux 504/506/508 receives the information 214 (except the start bits 232 and end bits 234) from the bottom entry of mux queue 502 and (in one embodiment) from the first ten bytes of the NTBE of mux queue 502, in order to select and output its respective instruction 524/526/528.
Fig. 6 is a flowchart of the operation of the elements of microprocessor 100 shown in Fig. 5 for extracting instructions (up to three in one embodiment) from the instruction byte stream without incurring a time delay, independent of the number of prefix bytes in the instructions. As described above, ripple logic 204 accumulates the prefix information 229 onto the opcode byte of the instruction to produce the accumulated prefix information 238. Flow begins at step 602.
In step 602, during the first phase of the clock cycle, length decoder 202 decodes the stream of instruction bytes 134 to generate its output 212 (Fig. 2), in particular the prefix information 229, in a manner similar to the operation of step 404. Flow proceeds to step 604.
In step 604, during the second phase of the clock cycle, ripple logic 204 uses the prefix information 229 to determine which byte of each instruction in the stream is the opcode byte (that is, the first non-prefix byte). In addition, ripple logic 204 accumulates the prefix information 229 of all the prefix bytes in the instruction (up to 14) onto the opcode byte of the instruction to produce the accumulated prefix information 238. Specifically, ripple logic 204 begins accumulating prefix information 229 at the first prefix byte of the instruction and accumulates the prefix information 229 of each byte in turn until it detects the opcode byte. At that point, ripple logic 204 stops accumulating, so that the accumulated prefix information 238 of the current instruction does not carry over into the next instruction. Ripple logic 204 starts accumulating prefix information 229 again at the first prefix byte of the next instruction and stops at its opcode byte. This process repeats for each instruction in the stream. Ripple logic 204 uses the other outputs 212 of length decoder 202 to perform the accumulation. For example, as described above, ripple logic 204 uses the instruction length 222 to determine the first byte of each instruction, which may be a prefix byte, in order to start the accumulation. Ripple logic 204 also uses the other information 224/226/228 to determine the position of the opcode byte (indicated by start bit 232), which is the first byte of an instruction that has no prefixes, and the position of the last byte of the instruction (indicated by end bit 234). Flow proceeds to step 606.
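The accumulation in step 604 can be modeled in software. The following is a minimal sketch under stated assumptions, not the hardware implementation: each byte's prefix information is represented as a hypothetical bitmask (nonzero for a prefix byte, zero otherwise), accumulation is modeled as a bitwise OR, and the per-instruction lengths are taken as given. The function name `ripple_accumulate` is illustrative.

```python
def ripple_accumulate(prefix_info, lengths):
    """Model of prefix accumulation onto the opcode byte.

    prefix_info: per-byte bitmask, nonzero if the byte is a prefix byte
                 (hypothetical encoding standing in for prefix information 229).
    lengths:     total length in bytes of each consecutive instruction,
                 prefix bytes included (standing in for instruction length 222).
    Returns (start_bits, end_bits, accumulated), one value per byte.
    """
    n = len(prefix_info)
    start_bits = [False] * n
    end_bits = [False] * n
    accumulated = [0] * n
    pos = 0
    for length in lengths:
        last = pos + length - 1          # last byte of this instruction
        running = 0                      # reset so nothing leaks across instructions
        i = pos
        while i < last and prefix_info[i] != 0:
            running |= prefix_info[i]    # accumulate one prefix byte
            i += 1
        start_bits[i] = True             # start bit marks the opcode byte
        accumulated[i] = running         # accumulated info travels with the opcode
        end_bits[last] = True
        pos += length
    return start_bits, end_bits, accumulated
```

Note that the running value is cleared at each instruction boundary, which mirrors the requirement that the accumulated prefix information of one instruction must not continue into the next.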
In step 606, the instruction bytes 134 and their corresponding start/end bits 232/234 and accumulated prefix information 238 are loaded into the next available entry of mux queue 502. In one embodiment, steps 602, 604, and 606 are performed in a single clock cycle (assuming the instructions contain no length-modifying prefix (LMP)). Flow proceeds to step 608.
In step 608, in the next clock cycle, M-stage control logic 512 controls muxes 504/506/508 to extract up to three instructions. In other words, the M stage can extract the instructions without added delay regardless of the number of prefix bytes. After being muxed, the instructions 524/526/528 are each provided to the F stage. Specifically, the M stage extracts the opcode byte and subsequent bytes of each instruction together with the accumulated prefix information 238. The F stage decodes the instructions 524/526/528 according to instruction type, certain possible exception conditions, pairability, and other characteristics in order to begin translating the instructions 524/526/528. The F stage and the instruction translator 112 can make use of the accumulated prefix information 238. Flow ends at step 608.
The present embodiment differs from conventional designs. As described above, ripple logic 204 is more complex than conventional logic: the start bit 232 that it generates points to the opcode byte of the instruction rather than to the first byte of the instruction (which may be a prefix byte) as in conventional designs, and it generates the accumulated prefix information 238. Consequently, instructions can be extracted without a time delay regardless of the number of prefix bytes (except in the case of a length-modifying prefix (LMP), as described above). By contrast, the conventional practice marks the actual first byte of the instruction as the first byte, so if the instruction contains prefix bytes, a prefix byte is marked as the first byte of the instruction. When an instruction contains multiple prefix bytes, conventional mux logic therefore incurs a time delay in order to strip the prefix bytes.
[Releasing cache data early with start/end marks when an instruction is only partially present]
Fig. 7 is a block diagram of a portion of the instruction formatter 106 of Fig. 1. In Fig. 1, instruction cache 102 provides instruction bytes 132 to XIBQ 104. In one embodiment, instruction formatter 106 includes pre-decode logic (not shown) that pre-decodes the instruction bytes 132 from instruction cache 102; the pre-decoded information is loaded into XIBQ 104 together with the instruction bytes 132. Instruction formatter 106 includes XIBQ control logic 702, which controls the loading and shifting out of XIBQ 104 entries.
Length decoder 202 and ripple logic 204 (Fig. 2) receive instruction bytes 134 from XIBQ 104 and generate output 214, which is provided to the mux queue 502 of Fig. 5 and to the M-stage control logic 512 of instruction formatter 106. M-stage control logic 512 controls the loading and shifting out of mux queue 502 entries. Mux queue 502 provides the information 214 in its entries to muxes 504/506/508 and to M-stage control logic 512, and M-stage control logic 512 in turn controls muxes 504/506/508, as described above.
A problem arises in the following situation: (1) the bottom entry of XIBQ 104 contains valid instruction bytes but the NTBE does not; (2) only part of an instruction (for example, its first or second byte) is in the bottom entry; and (3) that partial instruction does not provide enough information for length decoder 202/ripple logic 204 to determine the instruction length 222 (and the start/end bits 232/234), that is, some bytes of the instruction are located in the NTBE. For example, suppose the start bit 232 of byte 15 (the last byte) of the XIBQ 104 bottom entry is true and the value of that byte is 0x0F. In an x86 instruction, a first non-prefix byte of 0x0F indicates an extended opcode, so the subsequent bytes are needed to determine the instruction type. In other words, the instruction length cannot be determined from the 0x0F byte alone (in some cases, bytes up to the fifth byte may be needed to determine the instruction length). However, it may take some time before instruction cache 102 provides the next line of cache data to XIBQ 104, for example because of a miss in instruction cache 102 or a miss in the instruction translation lookaside buffer (TLB). A scheme is therefore needed that proceeds without waiting for the other instruction bytes. Furthermore, in some cases microprocessor 100 needs the instructions that precede the unknown-length instruction, so if those instructions are not processed, microprocessor 100 will wait indefinitely. A way of proceeding is therefore needed.
Fig. 8 is a flowchart of the operation of the portion of instruction formatter 106 shown in Fig. 7. Flow begins at step 802.
In step 802, XIBQ control logic 702 detects that an instruction at the end of the bottom entry of XIBQ 104 spans into another line of the instruction cache data stream, that the bytes of the instruction in the XIBQ 104 bottom entry are not sufficient for length decoder 202/ripple logic 204 to determine the instruction length (and the start/end bits 232/234), and that the subsequent instruction bytes needed to determine the instruction length are not yet in the XIBQ 104 NTBE, that is, the XIBQ 104 NTBE is invalid or empty. Flow proceeds to step 804.
In step 804, M-stage control logic 512 loads into mux queue 502 the output 214 that ripple logic 204 generated for the XIBQ 104 bottom entry. However, the bottom entry of XIBQ 104 is not shifted out, because the end bit 234 of the unknown-length instruction still needs to be determined. In other words, the bytes of the unknown-length instruction that are in the XIBQ 104 bottom entry must be retained so that the instruction length and end bit can be determined when the remaining bytes of the instruction arrive in XIBQ 104. Flow proceeds to step 806.
In step 806, the output 214 loaded in step 804 reaches the bottom entry of mux queue 502. At this point, M-stage control logic 512 extracts all the instructions and sends them to the F stage, except for the unknown-length instruction. However, M-stage control logic 512 does not shift out the bottom entry of mux queue 502, because the end bit 234 of the unknown-length instruction is not yet known and the remaining bytes of the instruction are not yet available. M-stage control logic 512 knows that an unknown-length instruction is present because the instruction has no valid end bit 234. In other words, a valid start bit 232 points to the first byte of the instruction, but no valid end bit 234 points to a byte of the bottom entry of mux queue 502, and the NTBE is invalid. Flow proceeds to step 808.
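The detection condition in step 806 can be expressed as a small predicate. The following is a minimal sketch, assuming the start and end bits of the bottom entry are available as hypothetical per-byte boolean lists; the function name `unknown_length_pending` is illustrative and does not come from the patent.

```python
def unknown_length_pending(start_bits, end_bits, ntbe_valid):
    """True when the bottom entry holds an instruction whose end is unknown.

    start_bits, end_bits: per-byte booleans for the bottom entry of the
    mux queue (hypothetical representation of bits 232 and 234).
    ntbe_valid: whether the next-to-bottom entry holds valid data.
    """
    starts = [i for i, bit in enumerate(start_bits) if bit]
    if not starts:
        return False                      # no instruction starts in this entry
    last_start = starts[-1]
    # A valid start bit with no end bit at or after it in this entry ...
    has_end = any(end_bits[last_start:])
    # ... and no valid NTBE that could supply the end bit.
    return (not has_end) and (not ntbe_valid)
```

In this model, an entry whose byte 15 carries a start bit but no end bit, with an invalid NTBE, would cause the queue to be held rather than shifted.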
In step 808, M-stage control logic 512 stalls mux queue 502 until the NTBE is filled with a valid output 214. Flow proceeds to step 812.
In step 812, XIBQ 104 finally receives a line of instruction bytes 132 from instruction cache 102 and loads it into the NTBE. This line of instruction bytes 132 includes the remaining bytes of the unknown-length instruction. Flow proceeds to step 814.
In step 814, length decoder 202/ripple logic 204 generate the instruction length 222 and start/end bits 232/234 for the unknown-length instruction. In one embodiment, XIBQ control logic 702 uses the instruction length 222 to calculate the number of remaining bytes of the unknown-length instruction (which are located in the NTBE of XIBQ 104 that was loaded in step 812). This remaining byte count is used to determine the position of the end bit 234 in step 818 below. Flow proceeds to step 816.
In step 816, XIBQ control logic 702 shifts out the bottom entry. However, M-stage control logic 512 does not load the output 214 of ripple logic 204 for that bottom entry, because it was already placed in mux queue 502 in step 804. Flow proceeds to step 818.
In step 818, length decoder 202/ripple logic 204 process the new XIBQ 104 bottom entry (that is, the cache data received in step 812), and M-stage control logic 512 loads the output 214 of ripple logic 204 (which includes the end bit 234 of the unknown-length instruction) into the NTBE of mux queue 502. Flow proceeds to step 822.
In step 822, M-stage control logic 512 extracts the unknown-length instruction (and any other instructions that can be extracted) from the bottom entry and the NTBE of mux queue 502 and sends it to the F stage. Flow proceeds to step 824.
In step 824, M-stage control logic 512 shifts out the bottom entry of mux queue 502. Flow ends at step 824.
As described above, the instruction formatter 106 of the present embodiment solves the foregoing problem: even when the information for part of the bottom entry of the x86 instruction byte queue (XIBQ) 104 is not yet available, it releases the information (instruction bytes, start/end bits, and accumulated prefix information) for the instructions whose information is available from the L stage as early as possible.
[Improving instruction extraction through prefix accumulation]
Fig. 9 is a detailed block diagram of the mux queue 502 of Fig. 5. In the embodiment of Fig. 9, mux queue 502 has four entries: the bottom entry (BE), the next-to-bottom entry (NTBE), the second-from-bottom entry (SFBE), and the third-from-bottom entry (TFBE). Each entry of mux queue 502 holds 16 bytes, and each byte position stores an instruction byte together with its start bit 232, end bit 234, and accumulated prefix information 238. As shown, the bytes of the BE are numbered 0 to 15 and the bytes of the NTBE are numbered 16 to 31; these numbers are also shown in Fig. 10. The bytes of the SFBE are numbered 32 to 47.
Fig. 10 is a block diagram of a portion of the M stage of the instruction formatter 106 of Fig. 1. Fig. 10 shows the accumulated prefix array 1002 and the instruction byte array 1004 of mux queue 502. The information in the accumulated prefix array 1002 and the instruction byte array 1004 is actually stored in the BE and NTBE of mux queue 502. The information of mux queue 502 is provided over wires to selection circuitry (dynamic logic in one embodiment) that includes the muxes 504/506/508 of Fig. 5. Fig. 10 shows only I1 mux 504; I2 mux 506 and I3 mux 508 receive the same inputs as I1 mux 504. The instruction muxes 504/506/508 are 16:1 muxes. As shown in Fig. 10, the inputs of I1 mux 504 are numbered 0 to 15. Each input of I1 mux 504 receives eleven instruction bytes together with the accumulated prefix information 238 that corresponds to the lowest-order byte of the eleven received instruction bytes. The lowest-order byte is the byte of instruction byte array 1004 whose byte number equals the input number of I1 mux 504. For example, input 8 of I1 mux 504 receives bytes 8 to 18 of mux queue 502 (that is, bytes 8-15 of the BE and bytes 16-18 of the NTBE) and the accumulated prefix information 238 of byte 8. I1 mux 504 receives eleven instruction bytes because, although an x86 instruction may be up to 15 bytes long, it has at most eleven non-prefix bytes. The embodiment described above extracts and passes only the non-prefix bytes to the rest of the pipeline (that is, it strips the prefix bytes and replaces them with the accumulated prefix information 238), which greatly reduces the decoding work of the later pipeline stages and allows microprocessor 100 to realize various benefits.
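The wiring of the sixteen mux inputs can be illustrated with a short sketch. It assumes the BE and NTBE are available as 16-element sequences and that the accumulated prefix information is a per-byte value; the function name `i1_mux_inputs` and the data representation are hypothetical.

```python
def i1_mux_inputs(be, ntbe, acc_prefix):
    """Build the 16 candidate inputs of a 16:1 instruction mux.

    be, ntbe:   16 instruction bytes each (bottom and next-to-bottom entry).
    acc_prefix: accumulated prefix value for each of the 16 BE byte positions
                (hypothetical stand-in for accumulated prefix information 238).
    Input k carries bytes k..k+10 of the combined BE/NTBE window plus the
    accumulated prefix value of byte k.
    """
    window = list(be) + list(ntbe)        # bytes numbered 0..31
    return [(window[k:k + 11], acc_prefix[k]) for k in range(16)]
```

With this layout, input 15 reaches bytes 15 to 25, which is why only the first ten bytes of the NTBE are needed by the muxes.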
Fig. 11 is a block diagram of the M-stage control logic 512 of Fig. 5. M-stage control logic 512 includes a 2:1 mux 1114 that generates the instruction length LEN1 1122, which is the length of the first instruction (I1 524 of Fig. 5) of the instruction stream passing through instruction formatter 106. LEN1 1122 is passed down the pipeline and processed together with the first instruction I1 524. Mux 1114 selects either the output of subtractor 1102 or the output of adder 1116, depending on whether a partial-length condition existed in the previous clock cycle. Mux 1114 is controlled by a register 1118 that stores a bit indicating whether the previous clock cycle had a partial-length condition, as described in detail with respect to Figs. 12 to 14. If a partial-length condition occurred, mux 1114 selects the output of adder 1116; otherwise, mux 1114 selects the output of subtractor 1102. The first input of adder 1116 is the remaining length of the instruction, denoted remaining LEN1 1106, which is described in detail with respect to Figs. 12 to 14. M-stage control logic 512 also includes other logic (not shown) that calculates remaining LEN1 1106 from the end bit 234 of the first instruction I1 524 (provided to M-stage control logic 512 by mux queue 502). The second input of adder 1116 is the partial length of the current instruction, denoted partial LEN 1104, which is provided by a register loaded in the previous clock cycle, as described in detail with respect to Fig. 12. Subtractor 1102 subtracts the byte position in mux queue 502 of the end bit 234 of the previous instruction (END0 1112) from the byte position in mux queue 502 of the end bit 234 of the first instruction I1 524 (END1 1108). It should be noted that although Fig. 11 shows M-stage control logic 512 performing arithmetic operations, M-stage control logic 512 need not use conventional adders and subtractors and may instead be implemented with combinational logic. For example, in one embodiment the bits are operated on in decoded form, and a subtraction can be performed using Boolean AND-OR operations. The subtractors (not shown) used to calculate the lengths of the second instruction I2 526 and the third instruction I3 528 are similar to the subtractor of the first instruction I1 524, but subtract END1 from END2 and END2 from END3, respectively. Finally, the current offset into the mux queue 502 entry is determined by selecting the byte that follows the last byte of the last instruction output by muxes 504/506/508.
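The length selection of Fig. 11 can be sketched as ordinary arithmetic, even though the hardware may use decoded combinational logic instead. The sketch below makes two labeled assumptions that the text does not state: the remaining length of a partially counted instruction is taken as its end-byte position in the current BE plus one, and `end0` is -1 when the previous instruction ended on the last byte of an entry that has since been shifted out. The function name `instruction_lengths` is illustrative.

```python
def instruction_lengths(ends, end0, partial_pending=False, partial_len=0):
    """Lengths of the instructions extracted in one clock cycle.

    ends:            end-byte positions of the extracted instructions in
                     mux-queue numbering, in order (END1, END2, END3).
    end0:            end-byte position of the previous instruction (END0).
    partial_pending: True if the previous cycle saved a partial length.
    partial_len:     the saved partial length (prefix bytes shifted out).
    """
    lengths = []
    prev = end0
    for k, end in enumerate(ends):
        if k == 0 and partial_pending:
            remaining = end + 1           # assumption: bytes 0..END1 of the BE
            lengths.append(partial_len + remaining)   # adder path
        else:
            lengths.append(end - prev)    # subtractor path: END(k) - END(k-1)
        prev = end
    return lengths
```

Only the first instruction can take the adder path, because a partial length is saved only for the instruction whose prefix bytes closed out the previous entry.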
Fig. 12 is a flowchart of the operation of a portion of the M stage of the instruction formatter 106 of Fig. 1. Flow begins at step 1201.
In step 1201, a new clock cycle begins, and M-stage control logic 512 examines the BE and NTBE (Fig. 9) of mux queue 502. Flow proceeds to step 1202.
In step 1202, M-stage control logic 512 controls muxes 504/506/508 to send the instructions in the BE of mux queue 502, and in the NTBE where possible, to the F stage of instruction formatter 106. As described above, in one embodiment the M stage can extract three instructions in one clock cycle. Because an x86 instruction may be from 1 to 15 bytes long, the bottom entry of mux queue 502 may hold from one to sixteen x86 instructions. Therefore, multiple clock cycles may be needed to extract all the instructions in the BE of mux queue 502. Furthermore, depending on whether the last byte of the BE is a prefix byte, an end byte, or another type of byte, an instruction may span the BE and the NTBE. M-stage control logic 512 therefore operates differently in each case when extracting instructions and shifting out the BE of mux queue 502, as described in detail below. In addition, M-stage control logic 512 calculates the length of each extracted and sent instruction; in particular, it uses the logic of Fig. 11 to calculate the length of the first instruction I1 524 (instruction length LEN1 1122 of Fig. 11). If a partial-length condition existed in the previous clock cycle (described in detail in step 1212), M-stage control logic 512 uses the stored partial LEN 1104 to calculate instruction length LEN1 1122; otherwise, M-stage control logic 512 uses subtractor 1102 (Fig. 11) to calculate instruction length LEN1 1122. Flow proceeds to step 1204.
In step 1204, M-stage control logic 512 determines whether all the instructions that end in the BE have been sent to the F stage. In one embodiment, the M stage can extract and send at most three instructions to the F stage in one clock cycle. Therefore, if the M stage extracts three instructions from the bottom entry and the start bit 232 of at least one more instruction is still in the bottom entry, that instruction must be extracted in the next clock cycle. If all the instructions that end in the BE have been sent to the F stage, flow proceeds to step 1206; otherwise, flow proceeds to step 1205.
In step 1205, M-stage control logic 512 does not shift out the BE, so that in the next clock cycle it can extract and send more instructions from the BE. Flow returns to step 1201 to process the next clock cycle.
In step 1206, M-stage control logic 512 determines whether the last byte of the BE is a prefix byte or a non-prefix byte. If the last byte of the BE is a non-prefix byte, flow proceeds to step 1216; if the last byte of the BE is a prefix byte, flow proceeds to step 1212.
In step 1212, M-stage control logic 512 calculates the partial length of the instruction whose prefix bytes occupy the end of the BE, that is, the number of prefix bytes between the end byte of the previous instruction and the last byte (byte 15) of the BE. This calculation is performed by arithmetic logic (not shown) of M-stage control logic 512. For example, in the example of Fig. 13, the partial length of instruction b is 14. Prefix bytes that lie between an end byte and a start byte are in a "no-man's land". They are in fact unnecessary in mux queue 502, because their content already exists in the accumulated prefix information 238, which is stored in mux queue 502 with the opcode byte of the instruction. Thus, if the last byte of the BE is a prefix byte and all other instructions in the BE have been extracted in this clock cycle, M-stage control logic 512 can shift out the BE (step 1214), because the prefix bytes are not needed (their information will be accumulated onto the opcode byte in the next 16-byte line) and M-stage control logic 512 saves the number of prefix bytes shifted out of mux queue 502 (in the partial length register 1104 of Fig. 11). On the other hand, if the last byte of the BE is a non-prefix byte that has not yet been extracted and sent, M-stage control logic 512 cannot shift it out of mux queue 502 (see step 1222). Flow proceeds to step 1214.
In step 1214, M-stage control logic 512 controls mux queue 502 to shift out the BE. Flow returns to step 1201 to process the next clock cycle.
In step 1216, M-stage control logic 512 determines whether the last byte of the BE is the end byte of an instruction, that is, whether its end bit 234 is true. If so, flow proceeds to step 1214; otherwise, flow proceeds to step 1218.
In step 1218, M-stage control logic 512 determines whether the NTBE is valid. If the end byte of the last extracted instruction is at the last byte (byte 15) of the BE, or if the last instruction spans into the NTBE and the NTBE is valid, M-stage control logic 512 shifts out the BE; otherwise, M-stage control logic 512 retains the BE until the next clock cycle. If the NTBE is valid, flow proceeds to step 1214; otherwise, flow proceeds to step 1222.
In step 1222, M-stage control logic 512 does not shift out the BE. This is because the actual bytes of the instruction (that is, its non-prefix bytes) span the BE and the NTBE, and the NTBE is invalid. In this case, M-stage control logic 512 cannot determine the instruction length, because the end bit 234 of the instruction cannot be obtained from the invalid NTBE. Flow returns to step 1201 to process the next clock cycle and wait for the NTBE to be filled with valid data.
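The decisions of steps 1204 through 1222 can be summarized as a single function. The following is a minimal sketch, assuming the four conditions are already available as booleans; the function name `be_shift_decision` and its return convention are illustrative and are not part of the patent.

```python
def be_shift_decision(all_sent, last_is_prefix, last_is_end, ntbe_valid):
    """Decide whether to shift the bottom entry (BE) out of the mux queue.

    all_sent:       all instructions ending in the BE were sent (step 1204).
    last_is_prefix: byte 15 of the BE is a prefix byte (step 1206).
    last_is_end:    byte 15 of the BE is an end byte (step 1216).
    ntbe_valid:     the next-to-bottom entry is valid (step 1218).
    Returns (shift_be, save_partial_length).
    """
    if not all_sent:
        return (False, False)   # step 1205: keep BE, extract more next cycle
    if last_is_prefix:
        return (True, True)     # steps 1212 and 1214: save partial length, shift
    if last_is_end:
        return (True, False)    # step 1216 to 1214: instruction ends at byte 15
    if ntbe_valid:
        return (True, False)    # step 1218 to 1214: instruction spans into NTBE
    return (False, False)       # step 1222: wait for a valid NTBE
```

The only path that saves a partial length is the one in which trailing prefix bytes are discarded along with the BE.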
Fig. 13 shows the contents of the mux queue 502 of Fig. 5 in two consecutive clock cycles to illustrate the operation of the M stage. The first set of mux queue 502 contents is for the first clock cycle, cycle 0, and the second set is for the second clock cycle, cycle 1. The figure shows only the bottom three entries. In Fig. 13, "S" denotes a start byte (that is, start bit 232 is true), "E" denotes an end byte (that is, end bit 234 is true), and "P" denotes a prefix byte (as indicated by the accumulated prefix information 238). Four instructions are denoted a, b, c, and d, and their start, end, and prefix bytes are shown. The byte numbers shown correspond to Fig. 9, for example bytes 0 to 47, which are located in the BE, NTBE, and SFBE of mux queue 502.
At the beginning of cycle 0, byte 1 of the BE contains the end byte Ea of instruction a, and bytes 2 to 15 of the BE contain the 14 prefix bytes Pb of instruction b. Because instruction b begins in the BE but its start byte is located in the NTBE rather than in the BE, its partial length is calculated as fourteen bytes. The contents of the NTBE and SFBE are invalid; that is, the x86 instruction byte queue 104 and length decoder 202/ripple logic 204 have not yet provided the cache data of the instruction stream and its associated information (for example, start bits 232, end bits 234, and accumulated prefix information 238) to any entry other than the BE.
During cycle 0, M-stage control logic 512 examines the contents of the BE and NTBE (step 1201 of Fig. 12) and sends instruction a to the F stage (step 1202). M-stage control logic 512 also calculates the length of instruction a, which equals the difference between the end-byte position of instruction a and the end-byte position of the previous instruction. Then, because all the instructions that end in the BE (instruction a) have been sent (step 1204) and the last byte of the BE (byte 15) is a prefix byte (step 1206), M-stage control logic 512 calculates the partial length of instruction b as fourteen bytes and stores it in the partial LEN 1104 register (step 1212). Finally, M-stage control logic 512 shifts the BE out of mux queue 502 (step 1214).
Because step 1214 shifted out the BE and the output 214 of ripple logic 204 for another 16 bytes of the stream was shifted in during cycle 0, cycle 1 begins with the BE containing: the start byte (Sb) and end byte (Eb) of instruction b at byte 0 (that is, instruction b has only a single non-prefix byte); five prefix bytes (Pc) of instruction c at bytes 1 to 5; the start byte (Sc) of instruction c at byte 6; the end byte (Ec) of instruction c at byte 8; the start byte (Sd) of instruction d at byte 9; and the end byte (Ed) of instruction d at byte 15.
During cycle 1, M-stage control logic 512 examines the contents of the BE and NTBE (step 1201) and sends instructions b, c, and d to the F stage (step 1202). M-stage control logic 512 also calculates the following: the length of instruction b (LEN1 1122) (step 1202), which is 15 bytes in this example and equals partial LEN 1104 (fourteen bytes in this example) plus the remaining length of instruction b (one byte in this example); the length of instruction c (eight bytes in this example), which equals the difference between the end-byte position of instruction c and the end-byte position of instruction b; and the length of instruction d (seven bytes in this example), which equals the difference between the end-byte position of instruction d and the end-byte position of instruction c. Furthermore, because all the instructions that end in the BE (instructions b, c, and d) have been sent (step 1204), the last byte of the BE (byte 15) is a non-prefix byte (step 1206), and the last byte of the BE is an end byte (step 1216), M-stage control logic 512 shifts the BE out of mux queue 502 (step 1214).
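The cycle 1 lengths can be checked with a few lines of arithmetic. This sketch only restates the byte positions given for Fig. 13; the constant and function names are illustrative.

```python
PARTIAL_LEN_B = 14                 # prefix bytes of b shifted out in cycle 0 (bytes 2..15)
END_B, END_C, END_D = 0, 8, 15     # end-byte positions in the cycle 1 BE

def figure13_lengths():
    """Return the lengths of instructions b, c and d as computed in cycle 1."""
    len_b = PARTIAL_LEN_B + (END_B + 1)   # saved partial length + remaining byte
    len_c = END_C - END_B                 # five prefixes + bytes 6..8
    len_d = END_D - END_C                 # bytes 9..15
    return len_b, len_c, len_d

# Consistency check: the three instructions together cover the 14 prefix
# bytes from the previous entry plus all 16 bytes of the current BE.
assert sum(figure13_lengths()) == PARTIAL_LEN_B + 16
```

The three lengths are 15, 8, and 7 bytes, matching the values stated in the text.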
In the embodiment shown in Fig. 13, by accumulating the prefix information of instruction b onto its opcode byte as accumulated prefix information 238 and by saving the partial LEN 1104 of instruction b, instruction formatter 106 can shift out the BE that contains the prefix bytes of instruction b and, in the next clock cycle, extract and send up to three instructions from mux queue 502. Without the accumulated prefix information 238 and the stored partial LEN 1104, this would not be possible (that is, instructions c and d could not be extracted and sent in the same cycle as instruction b, but would have to wait for a subsequent clock cycle). Keeping the functional units of the microprocessor supplied with enough instructions to process makes better use of the resources of microprocessor 100.
Figure 14 shows the contents of mux queue 502 of Fig. 5 during two consecutive clock cycles, to illustrate the operation of the M stage. The example of Figure 14 is similar to that of Figure 13; however, the positions of the instructions and the timing with which entries enter and leave mux queue 502 differ.
At the start of cycle 0, the BE contains the end byte (Ea) of instruction a at byte 1 and fourteen prefix bytes (Pb) of instruction b at bytes 2 to 15. Because instruction b starts in the BE but its start byte is in the NTBE, partial LEN 1104 is computed as 14. The NTBE contains: the start byte (Sb) and end byte (Eb) of instruction b at byte 16 (that is, instruction b is a single byte apart from its prefix bytes); five prefix bytes (Pc) of instruction c at bytes 17 to 21; the start byte (Sc) of instruction c at byte 22; the end byte (Ec) of instruction c at byte 27; three prefix bytes (Pd) of instruction d at bytes 28 to 30; and the start byte (Sd) of instruction d at byte 31. The SFBE contains: the end byte (Ed) of instruction d at byte 41, and the start byte (Se) of instruction e at byte 42.
During cycle 0, M-stage control logic unit 512 examines the contents of the BE and NTBE (step 1201 of Figure 12) and sends instruction a to the F stage (step 1202). In addition, M-stage control logic unit 512 computes the length of instruction a, which equals the difference between the end byte position of instruction a and the end byte position of the preceding instruction. Next, because all instructions that end in the BE (instruction a) have been sent (step 1204) and the last byte of the BE (byte 15) is a prefix byte (step 1206), M-stage control logic unit 512 computes the partial length of instruction b as fourteen bytes and stores it in the partial LEN 1104 register (step 1212). Finally, M-stage control logic unit 512 shifts the BE out of mux queue 502 (step 1214).
Because step 1214 shifted the BE out during cycle 0, cycle 1 begins with the BE containing what the NTBE held during cycle 0, and the NTBE containing what the SFBE held during cycle 0.
During cycle 1, M-stage control logic unit 512 examines the contents of the BE and NTBE (step 1201) and sends instructions b, c, and d to the F stage (step 1202). In addition, M-stage control logic unit 512 computes the following (step 1202): the length of instruction b (LEN1 1122), which is 15 bytes in this example and equals partial LEN 1104 (fourteen bytes in this example) plus the remaining length of instruction b (one byte in this example); the length of instruction c (11 bytes in this example), which equals the difference between the end byte position of instruction c and the end byte position of instruction b; and the length of instruction d (fourteen bytes in this example), which equals the difference between the end byte position of instruction d and the end byte position of instruction c. Furthermore, because all instructions that end in the BE (instructions b, c, and d) have been sent (step 1204), the last byte of the BE (byte 15) is a non-prefix byte (step 1206), the last byte of the BE is not an end byte (step 1216), and the NTBE is valid (step 1218), M-stage control logic unit 512 shifts the BE out of mux queue 502 (step 1214).
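The shift-out decisions walked through for Figures 13 and 14 can be summarized as a small decision function. This is a hedged sketch only: the function and parameter names are hypothetical, the step numbers are taken from the text, and the two "do not shift" outcomes are assumptions, because the full flow of Figure 12 is not reproduced in this passage.

```python
def m_stage_shift_decision(all_sent, last_is_prefix, last_is_end,
                           ntbe_valid, trailing_prefix_count=0):
    """Return (shift_out, partial_len) for the bottom entry (BE) in one cycle.

    all_sent:              every instruction ending in the BE was sent (step 1204)
    last_is_prefix:        byte 15 of the BE is a prefix byte (step 1206)
    last_is_end:           byte 15 of the BE is an end byte (step 1216)
    ntbe_valid:            the next-to-bottom entry holds valid bytes (step 1218)
    trailing_prefix_count: prefix bytes at the tail of the BE that belong to
                           an instruction continuing into the NTBE
    """
    if not all_sent:
        return False, 0                      # assumption: wait for a later cycle
    if last_is_prefix:
        return True, trailing_prefix_count   # steps 1212 and 1214
    if last_is_end or ntbe_valid:
        return True, 0                       # step 1214
    return False, 0                          # assumption: wait for the NTBE


# Figure 14, cycle 0: byte 15 is the last of fourteen prefix bytes of instruction b.
print(m_stage_shift_decision(True, True, False, True, 14))   # (True, 14)
# Figure 14, cycle 1: byte 15 is Sd (not a prefix, not an end byte), NTBE valid.
print(m_stage_shift_decision(True, False, False, True))      # (True, 0)
```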
In the embodiment shown in Figure 14, instruction formatter 106 can extract and send out, within a single clock cycle, three instructions containing up to 40 instruction bytes in total, as shown in Figure 15.
[Detection, marking, and accumulation of bad branch predictions for fast processing of the instruction stream]
Referring again to Fig. 1, when fetch unit 126 outputs current fetch address 142 in order to fetch a line of instruction bytes from instruction cache 102 and provide it to XIBQ 104, BTAC 128 receives current fetch address 142 at the same time. If current fetch address 142 hits in BTAC 128, a branch instruction was previously executed at this fetch address; BTAC 128 can therefore predict whether the branch instruction will be taken and, if so, also predicts target address 146. Specifically, BTAC 128 makes its prediction before microprocessor 100 has extracted the branch instruction from the instruction byte stream or decoded it. Consequently, the branch instruction that BTAC 128 predicts may not actually be present in the fetched cache line of instruction bytes; that is, BTAC 128 has made a bad prediction, which causes microprocessor 100 to branch erroneously. Note that a bad prediction is not the same as an incorrect prediction. Because program execution is dynamic, for example because the condition codes or status data on which a branch instruction depends change, every branch predictor can by its nature mispredict. A bad prediction, however, means that BTAC 128 made its prediction for a different cache line, or for the same cache line whose contents have since changed. As described in United States Patent 7,134,005, this can happen for the following reasons: BTAC 128 stores only partial address tags rather than full address tags, which causes tag aliasing; BTAC 128 stores only virtual address tags rather than physical addresses, which causes virtual aliasing; and self-modifying code. When this happens, microprocessor 100 must make sure that it does not send out the badly predicted instruction or the subsequent instructions that were erroneously fetched because of the bad prediction.
If the branch-taken indication 154 (Fig. 1) of an instruction byte is logic true but that byte is not in fact the first byte of an instruction, as shown in Figure 16, then BTAC 128 has made a bad prediction and has thereby caused microprocessor 100 to branch erroneously. As described above, a true branch-taken indication 154 provided by BTAC 128 means that BTAC 128 believes the instruction byte is the first byte (that is, the opcode) of a branch instruction, and fetch unit 126 branches to predicted target address 146 supplied by BTAC 128.
One way to detect a bad BTAC prediction is to wait until the individual instructions have been extracted from the instruction byte stream and their lengths are known, and then scan the non-first bytes of each instruction to see whether any branch-taken indication 154 is true. This method of checking is too slow, however, because it requires a great deal of masking and shifting and requires the results for the individual bytes to be ORed together, which creates a timing problem.
To avoid the timing problem, embodiments of the invention accumulate the information provided by branch-taken indication 154 as part of the processing performed by ripple logic unit 204, and use the accumulated information after the M stage has extracted the instructions. Specifically, ripple logic unit 204 detects the condition and propagates an indicator down to the last byte of the instruction, so that only a single byte, namely the last byte of the instruction, needs to be examined. When an instruction is extracted in the M stage, it is determined whether the instruction is a bad instruction, that is, whether the instruction should be included in the instruction stream and sent further down the pipeline.
Figure 17 shows the signals that make up output 214 of ripple logic unit 204. The output signals shown in Figure 17 are similar to those shown in Fig. 2, with the addition of a bad BTAC bit 1702 for each instruction byte, which is described in detail below. In addition, the output of ripple logic unit 204 includes: a signal that, when logic true, indicates that the corresponding instruction byte is the first byte of a branch instruction predicted by BTAC 128 but that the branch predicted by BTAC 128 will not be taken; and another signal, which indicates that the last byte is the end byte of an instruction.
Figure 18 shows the operational flowchart of the microprocessor 100 of Fig. 1.This flow process starts from step 1802.
In step 1802, BTAC (branch target address cache) 128 predicts that a branch instruction is present in the cache line indicated by current fetch address 142 provided by fetch unit 126, and that this branch instruction will be taken. BTAC 128 also predicts target address 146 of the branch instruction. Accordingly, XIBQ 104 receives a first line of 16 instruction bytes from instruction cache 102 at the location indicated by current fetch address 142, and then receives a second line of 16 instruction bytes from instruction cache 102 at the location indicated by predicted target address 146. Flow then proceeds to step 1804.
In step 1804, XIBQ 104 stores each branch-taken indication 154 (Fig. 1) together with the corresponding instruction byte of the two lines received in step 1802. Flow then proceeds to step 1806.
In step 1806, length decoders 202 and ripple logic unit 204 process the first line of instruction bytes and detect the condition in which an instruction byte has a logic-true branch-taken indication 154 but is not the first byte of an instruction, which is the error condition shown in Figure 16. In other words, ripple logic unit 204 knows which bytes in the line of 16 instruction bytes are the first bytes of instructions, since it needs this knowledge in order to set end bits 234. Accordingly, ripple logic unit 204 checks the logic values of branch-taken indications 154 against the first non-prefix byte of each instruction and detects this condition. Flow then proceeds to step 1808.
In step 1808, when ripple logic unit 204 detects a logic-true branch-taken indication 154 on a non-first byte of an instruction, it sets bad BTAC bit 1702 of that instruction byte to logic true. In addition, ripple logic unit 204 propagates the true bad BTAC bit 1702 from that byte position to the remaining bytes of the 16-byte line. Moreover, if the end byte of the instruction does not appear in the first line of instruction bytes, ripple logic unit 204 updates a state (for example, a flip-flop, not shown in the drawings) to indicate that a bad BTAC 128 prediction occurred within an instruction in the current line. Then, when ripple logic unit 204 processes the second line of instruction bytes, because the state is true, it sets bad BTAC bit 1702 for all bytes of the second line. Flow then proceeds to step 1812.
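The propagation described in step 1808 can be modeled in a few lines. The following Python sketch is an assumption-laden illustration, not the ripple circuit itself: `mark_bad_btac` and its arguments are hypothetical names, each list holds one flag per byte of a line, and the carry flag stands in for the state element mentioned in the text.

```python
def mark_bad_btac(taken, is_first, is_end, carry_in=False):
    """Return (bad_bits, carry_out) for one line of instruction bytes.

    taken[i]:    branch-taken indication (154) of byte i
    is_first[i]: byte i is the first byte of an instruction
    is_end[i]:   byte i is the end byte of an instruction
    carry_in:    state left over from the previous line
    """
    bad = carry_in
    end_seen_while_bad = False
    bad_bits = []
    for t, first, end in zip(taken, is_first, is_end):
        if t and not first:
            bad = True                 # true indication on a non-first byte
        bad_bits.append(bad)           # ripple the bit to all later bytes
        if bad and end:
            end_seen_while_bad = True  # the bad instruction ends in this line
    carry_out = bad and not end_seen_while_bad
    return bad_bits, carry_out


# One 16-byte line: an instruction occupies bytes 0..4 and byte 2 carries a
# true branch-taken indication although it is not a first byte.
taken = [False] * 16
taken[2] = True
is_first = [i in (0, 5) for i in range(16)]
is_end = [i == 4 for i in range(16)]
bits, carry = mark_bad_btac(taken, is_first, is_end)
print(bits.index(True), carry)  # 2 False
```

Because the bit ripples to the end byte, a later stage only has to look at end bytes rather than scan every byte of every instruction.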
In step 1812, for the first and second lines of instruction bytes, mux queue 502 stores output 214 of ripple logic unit 204, including bad BTAC bit 1702, together with each instruction byte. Flow then proceeds to step 1814.
In step 1814, M-stage control logic unit 512 finds that bad BTAC bit 1702 of an instruction byte is logic true and that end bit 234 of that instruction byte is also logic true (that is, it detects the condition of a bad BTAC 128 prediction). M-stage control logic unit 512 therefore refrains from sending the affected instruction and all subsequent instructions to the F stage, by clearing the corresponding valid bits 534/536/538. However, if an instruction precedes the affected instruction, that instruction is valid and is sent to the F stage. As described above, because the true bad BTAC bit 1702 is propagated to the end byte of the affected instruction, M-stage control logic unit 512 needs to examine only a single byte, namely the byte indicated by end bit 234, which considerably relaxes the timing constraints. Flow then proceeds to step 1816.
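The single-byte check of step 1814 can be sketched as follows. This is an illustrative Python model under stated assumptions: `valid_instructions` is a hypothetical name, the instructions of a cycle are identified only by their end-byte positions, and the returned flags stand in for valid bits 534/536/538.

```python
def valid_instructions(end_positions, bad_bits):
    """Valid flag for each extracted instruction, in stream order.

    Only the end byte of each instruction is examined; once one instruction
    is found bad, it and every later instruction are marked invalid.
    """
    valid = []
    poisoned = False
    for end in end_positions:
        poisoned = poisoned or bad_bits[end]
        valid.append(not poisoned)
    return valid


# Bad BTAC bits are true from byte 2 onward; instructions end at bytes 1, 4, 9.
bad_bits = [False, False] + [True] * 14
flags = valid_instructions([1, 4, 9], bad_bits)
print(flags)  # [True, False, False]
```

The first instruction ends before the bad byte and stays valid, which matches the statement that an instruction preceding the affected one is still sent to the F stage.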
In step 1816, microprocessor 100 invalidates the erroneous entry in BTAC 128. In addition, microprocessor 100 flushes all contents of XIBQ 104 and mux queue 502 and causes fetch unit 126 to update current fetch address 142, so that instruction bytes are fetched again from the location at which BTAC 128 produced the bad prediction. On the re-fetch, BTAC 128 does not produce the bad prediction, because the bad entry has been invalidated; that is, on the re-fetch BTAC 128 predicts that the branch is not taken. In one embodiment, step 1816 is performed by the F stage of instruction formatter 106 and/or by instruction translator 112. Flow ends at step 1816.
[Efficient determination of x86 instruction length]
Determining the length of an x86 instruction is very complex; it is described in the Intel IA-32 Architecture Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M, Chapter 2. The total instruction length is the sum of the following: the number of prefix bytes (if any), the number of opcode bytes (1, 2, or 3), the presence or absence of a ModR/M byte, the presence or absence of a SIB byte, the length of the address displacement (if any), and the length of the immediate data (if any). The following are the characteristics or requirements of x86 instructions that affect the length determination (apart from prefixes):
The number of opcode bytes is:
    3, if the first two bytes are 0F 38/3A
    2, if the first byte is 0F and the second byte is not 38/3A
    1, otherwise
Whether a ModR/M byte is present is determined by the opcode, as follows:
    If the opcode is a three-byte opcode, the ModR/M byte is mandatory
    If the opcode is a one-byte or two-byte opcode, the opcode bytes are examined
Whether a SIB byte is present is determined by the ModR/M byte.
Whether a displacement is present is determined by the ModR/M byte.
The displacement size is determined by the ModR/M byte and the current address size (AS).
Whether immediate data is present is determined by the opcode bytes.
The size of the immediate data is determined by the opcode bytes, the current operand size (OS), the current AS, and the REX.W prefix; specifically, the ModR/M byte does not affect the immediate data size.
If there is no ModR/M byte, then there is no SIB, displacement, or immediate data.
For the purpose of determining instruction length, the instruction opcode and ModR/M byte take only five forms:
opcode
0F + opcode
opcode + ModR/M
0F + opcode + ModR/M
0F + 38/3A + opcode + ModR/M
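The opcode-count rule and the five forms listed above can be expressed compactly. The sketch below is a minimal Python illustration; the function names are hypothetical, and whether a ModR/M byte follows is passed in as a flag, because the per-opcode lookup that decides it is not spelled out in this passage.

```python
def opcode_byte_count(b0, b1):
    """Number of opcode bytes, given the first two non-prefix bytes."""
    if b0 == 0x0F:
        return 3 if b1 in (0x38, 0x3A) else 2
    return 1


def opcode_form(b0, b1, has_modrm):
    """Name one of the five opcode/ModR/M forms."""
    count = opcode_byte_count(b0, b1)
    if count == 3:
        return "0F + 38/3A + opcode + ModR/M"   # ModR/M is mandatory here
    prefix = "0F + " if count == 2 else ""
    return prefix + "opcode" + (" + ModR/M" if has_modrm else "")


print(opcode_byte_count(0x0F, 0x38))      # 3
print(opcode_form(0x0F, 0xAF, True))      # 0F + opcode + ModR/M
print(opcode_form(0x89, 0xC8, True))      # opcode + ModR/M
```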
Figure 19 shows a detailed block diagram of length decoder 202 of Fig. 2. Fig. 2 shows 16 length decoders 202; Figure 19 shows one representative length decoder 202, denoted n. As shown in Fig. 2, each length decoder 202 corresponds to one byte of instruction byte stream 134. In other words, length decoder 0 corresponds to instruction byte 0, length decoder 1 corresponds to instruction byte 1, and so on up to length decoder 15, which corresponds to instruction byte 15. Length decoder 202 comprises a programmable logic array (PLA) 1902, a 4:1 multiplexer 1906, and an adder 1904.
One input of multiplexer 1906 receives a zero value. The other three inputs of multiplexer 1906 receive eaLen values 1912 from the three next higher-order PLAs 1902. Multiplexer 1906 selects one of its inputs and provides it as eaLen value 1918 at its output, which is in turn supplied to the second input of adder 1904. In one embodiment, in order to reduce propagation delay, multiplexer 1906 is omitted and each eaLen value 1912 is fed to adder 1904 as a tri-state wired-OR signal.
1. For the following instruction forms, which have no ModR/M byte, the zero value is selected:
opcode only, or
0F + opcode
2. For the following instruction form, PLA n+1 is selected:
opcode + ModR/M
3. For the following instruction form, PLA n+2 is selected:
0F + opcode + ModR/M
4. For the following instruction form, PLA n+3 is selected:
0F + 38/3A + opcode + ModR/M
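The four selection rules above reduce to one observation: the ModR/M byte, when present, directly follows the opcode bytes, so decoder n takes its eaLen from the PLA that sits as many positions higher as there are opcode bytes. A minimal Python sketch, with the hypothetical helper name `ealen_source`:

```python
def ealen_source(n, opcode_bytes, has_modrm):
    """Index of the PLA whose eaLen decoder n selects, or None for the zero input."""
    if not has_modrm:
        return None                 # rule 1: no ModR/M byte, select zero
    return n + opcode_bytes         # rules 2 to 4: PLA n+1, n+2, or n+3


print(ealen_source(3, 1, True))    # 4     (opcode + ModR/M)
print(ealen_source(3, 2, True))    # 5     (0F + opcode + ModR/M)
print(ealen_source(3, 3, True))    # 6     (0F + 38/3A + opcode + ModR/M)
print(ealen_source(3, 2, False))   # None  (0F + opcode)
```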
Figure 20 shows the arrangement of the 16 length decoders 202. PLA 15 1902 receives instruction byte 15 and instruction byte 0 of the previous line, and multiplexer 15 1906 receives eaLen values 1912 from three PLAs 1902 (not shown), which examine instruction bytes 0/1, 1/2, and 2/3 of the previous line, respectively.
The benefit of having each PLA 1902 examine only two bytes at a time is a large reduction in the number of minterms required, which in turn reduces the size of the logic circuitry on the die. This design strikes a balance between reducing the total number of minterms and the delay that the timing requirements allow.
Figure 21 shows the operational flowchart of the length decoder 202 of Figure 20.This flow process starts from step 2102.
In step 2102, for each command byte 134 from XIBQ104, corresponding PLA 1902 inspects two command byte 134, that is corresponding command byte 134 and next instruction byte 134.For example, PLA 3 (programmable logic array) 1902 inspects command byte 3 and 4.Then, flow process enters step 2104 and 2106 simultaneously.
In step 2104, each PLA 1902 assumes that its two instruction bytes 134 are the first two opcode bytes of an instruction, and generates immLen value 1916 from these two instruction bytes 134, the operand size (OS), the address size (AS), and the REX.W value. Specifically, immLen value 1916 is the sum of the number of opcode bytes (1, 2, or 3) and the size of the immediate data (0, 1, 2, 4, or 8). Flow then proceeds to step 2114.
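As an illustration of how immLen combines the opcode byte count with the immediate size, the following Python sketch uses a tiny hand-picked table. The table `IMM_KIND` and the function `imm_len` are hypothetical; the table covers only a few well-known x86 opcodes, and the dependence on address size and REX.W mentioned in the text is left out to keep the example short.

```python
# Hypothetical, very small subset of the x86 opcode map:
# opcode bytes -> kind of immediate that follows ("none", "ib", or "iz").
IMM_KIND = {
    (0x89,): "none",        # MOV r/m, r
    (0x83,): "ib",          # group 1, r/m with 8-bit immediate
    (0x81,): "iz",          # group 1, r/m with 16/32-bit immediate
    (0x0F, 0xAF): "none",   # IMUL r, r/m
    (0x0F, 0xBA): "ib",     # group 8 (BT etc.), r/m with 8-bit immediate
}


def imm_len(opcode, operand_size=32):
    """immLen = number of opcode bytes + immediate size in bytes."""
    kind = IMM_KIND[opcode]
    if kind == "none":
        imm = 0
    elif kind == "ib":
        imm = 1
    else:  # "iz": 16-bit immediate for 16-bit operands, otherwise 32-bit
        imm = 2 if operand_size == 16 else 4
    return len(opcode) + imm


print(imm_len((0x81,)))                   # 5
print(imm_len((0x81,), operand_size=16))  # 3
print(imm_len((0x0F, 0xBA)))              # 3
```

The second call shows why a length-modifying prefix matters: switching the operand size from 32 to 16 bits changes the length of the same opcode by two bytes.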
In step 2106, each PLA 1902 assumes that its first instruction byte 134 is a ModR/M byte, generates eaLen value 1912 from the ModR/M byte and the address size (AS), and provides eaLen value 1912 to the three next lower-order multiplexers 1906. Specifically, the eaLen value is the sum of the number of ModR/M bytes (1), the number of SIB bytes (0 or 1), and the size of the displacement (0, 1, 2, or 4). Flow then proceeds to step 2108.
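The eaLen sum can be illustrated with the standard x86 ModR/M decoding rules. The Python sketch below is not taken from the patent; `ea_len` is a hypothetical name, and the second argument models the fact that each PLA sees two bytes, the second of which may be the SIB byte.

```python
def ea_len(modrm, next_byte, address_size=32):
    """eaLen = 1 (ModR/M) + SIB byte (0 or 1) + displacement size (0, 1, 2, 4)."""
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 3:
        return 1                          # register operand: no SIB, no displacement
    if address_size == 16:
        sib = 0                           # 16-bit addressing has no SIB byte
        if mod == 0:
            disp = 2 if rm == 6 else 0
        else:
            disp = 1 if mod == 1 else 2
    else:                                 # 32-bit or 64-bit addressing
        sib = 1 if rm == 4 else 0
        if mod == 0:
            base_is_5 = sib == 1 and (next_byte & 7) == 5
            disp = 4 if (rm == 5 or base_is_5) else 0
        else:
            disp = 1 if mod == 1 else 4
    return 1 + sib + disp


print(ea_len(0xC0, 0x00))        # 1  register form
print(ea_len(0x44, 0x24))        # 3  ModR/M + SIB + disp8
print(ea_len(0x06, 0x00, 16))    # 3  ModR/M + disp16
```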
In step 2108, each multiplexer 1906 receives the zero input and the eaLen values 1912 from the three next higher-order PLAs 1902. For example, multiplexer 3 1906 receives eaLen values 1912 from PLAs 4, 5, and 6 1902. Flow then proceeds to step 2112.
In step 2112, each PLA 1902 generates control signal 1914 for its corresponding multiplexer 1906 to select one of the inputs according to the five forms described above. Flow then proceeds to step 2114.
In step 2114, each adder 1904 adds immLen value 1916 to eaLen value 1918 selected by multiplexer 1906 to produce instruction length 222. Flow then proceeds to step 2116.
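Putting the multiplexer and adder together, every byte position produces a speculative length as if an instruction started there. The following Python sketch shows this for four bytes; the function name and list layout are hypothetical, the per-position values were worked out by hand from the standard x86 encoding of the bytes `83 C1 01 C3` (an add with an 8-bit immediate followed by a near return), and the eaLen list is padded with three entries standing in for the following line.

```python
def lengths_per_byte(imm_len, ea_len, select):
    """Speculative instruction length at every byte position.

    imm_len[n]: immLen produced by PLA n (opcode bytes + immediate size)
    ea_len[n]:  eaLen produced by PLA n (ModR/M + SIB + displacement);
                must extend three entries past the last position
    select[n]:  0 for the zero input, or 1..3 to pick PLA n+1 .. n+3
    """
    out = []
    for n, (imm, sel) in enumerate(zip(imm_len, select)):
        ea = ea_len[n + sel] if sel else 0   # 4:1 multiplexer
        out.append(imm + ea)                 # adder
    return out


# Bytes 83 C1 01 C3, each position decoded as if it started an instruction.
imm_vals = [2, 2, 1, 1]             # opcode bytes + immediate size
ea_vals = [5, 1, 1, 1, 0, 0, 0]     # each byte interpreted as a ModR/M byte
sel_vals = [1, 1, 1, 0]             # last position has no ModR/M byte
print(lengths_per_byte(imm_vals, ea_vals, sel_vals))  # [3, 3, 2, 1]
```

Only positions 0 and 3 are real instruction starts here (lengths 3 and 1); choosing which speculative lengths to use is left to the logic that marks start and end bytes.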
In step 2116, if a length-modifying prefix (LMP) is present, the L stage spends one extra clock cycle for each instruction that contains an LMP, as shown in the preceding figures, in particular Fig. 1 to Fig. 4.
The foregoing describes only embodiments of the invention and is not intended to limit the scope of the claims of the invention. Equivalent changes or modifications made by persons skilled in the computer arts without departing from the spirit disclosed by the invention should all be included within the scope of the claims above. For example, software can be used to enable the function, fabrication, modeling, simulation, description, and/or testing of the disclosed apparatus and methods. This can be accomplished using general programming languages (for example C, C++), hardware description languages (HDL) including Verilog HDL and VHDL, and other programs. Such software can be placed on computer-usable media, for example semiconductor memory, magnetic disk, or optical disc (for example CD-ROM, DVD-ROM). The disclosed apparatus and method embodiments may be included in a semiconductor intellectual property core (IP core), for example a microprocessor core (for example, embodied in HDL), and transformed into hardware in the production of integrated circuits. Furthermore, the disclosed apparatus and method embodiments may be implemented as a combination of hardware and software. Therefore, the scope of the invention is not limited to any of the illustrative embodiments, but should be defined by the claims and their equivalents. Specifically, the invention may be implemented within a microprocessor device that can be used in a general-purpose computer. Finally, those skilled in the art can use the disclosed concepts and specific embodiments as a basis for designing or modifying other structures that achieve the same purposes, without departing from the scope of the claims of the invention.
Claims (13)
1. An apparatus for a microprocessor, the instruction set architecture of the microprocessor including a length-modifying prefix for selecting an address/operand size other than a default address/operand size, the apparatus marking the start byte and the end byte of each instruction within a stream of instruction bytes, the apparatus comprising:
a decode logic unit, configured to decode each instruction byte of a predetermined number of instruction bytes of the stream to determine whether each instruction byte specifies the length-modifying prefix, and to generate a start mark and an end mark according to an address/operand size; and
an operand/address size logic unit, coupled to the decode logic unit, configured to provide the default operand/address size to the decode logic unit so that the start mark and the end mark are generated within a first clock cycle, the decode logic unit decoding the predetermined number of instruction bytes within the first clock cycle, wherein, if during any one of the first clock cycle and N subsequent clock cycles the decode logic unit indicates that one of the predetermined number of instruction bytes specifies the length-modifying prefix, the operand/address size logic unit provides the address/operand size specified by the length-modifying prefix to the decode logic unit in the next clock cycle, in order to generate the start mark and the end mark.
2. The apparatus according to claim 1, wherein N is the number of instructions, within the predetermined number of instruction bytes of the stream, that contain at least one length-modifying prefix, and wherein N is zero if no instruction within the predetermined number of instruction bytes of the stream contains the length-modifying prefix.
3. The apparatus according to claim 2, wherein the apparatus generates the correct start mark and end mark for each of the instruction bytes within N+1 clock cycles, regardless of how many prefix bytes any instruction within the predetermined number of instruction bytes of the stream contains.
4. The apparatus according to claim 2, wherein,
in the first clock cycle, if the decode logic unit indicates that none of the predetermined number of instruction bytes of the stream specifies the length-modifying prefix; and
in the N-th clock cycle after the first clock cycle, if the decode logic unit indicates that one or more of the predetermined number of instruction bytes of the stream specify the length-modifying prefix,
the apparatus causes the predetermined number of instruction bytes of the stream to be shifted out of an instruction queue.
5. The apparatus according to claim 1, wherein,
the start mark of each decoded instruction byte indicates whether the decoded instruction byte is the first non-prefix byte of an instruction; and
the end mark of each decoded instruction byte indicates whether the decoded instruction byte is the last byte of an instruction.
6. The apparatus according to claim 1, wherein the decode logic unit generates accumulated prefix information for each instruction byte of the predetermined number of instruction bytes of the stream, the accumulated prefix information indicating one of the following:
all prefix information of the instruction byte corresponding to the accumulated prefix information and of the prefix bytes preceding it, the prefix bytes being within the instruction that contains the corresponding instruction byte; and
if the start mark of the instruction byte corresponding to the accumulated prefix information indicates that the instruction byte is the first non-prefix byte of the instruction, all prefix information associated with the prefix bytes of the instruction.
7. The apparatus according to claim 1, wherein the decode logic unit comprises:
a first logic unit, configured to generate an instruction length for each instruction byte of the predetermined number of instruction bytes of the stream; and
a second logic unit, coupled to the first logic unit, configured to generate the start mark and the end mark of each instruction byte according to the address/operand size and the instruction length provided by the first logic unit,
wherein the instruction length does not include prefix bytes.
8. A method for a microprocessor, the instruction set architecture of the microprocessor including a length-modifying prefix for selecting an address/operand size other than a default address/operand size, the method marking the start byte and the end byte of each instruction within a stream of instruction bytes, the method comprising:
within a first clock cycle, decoding each instruction byte of a predetermined number of instruction bytes of the stream to determine whether each instruction byte is the length-modifying prefix;
if none of the decoded instruction bytes is the length-modifying prefix, generating, within the first clock cycle and according to the default operand/address size, a correct start mark and a correct end mark for each instruction byte of the predetermined number of instruction bytes of the stream; and
if one or more of the decoded instruction bytes are the length-modifying prefix, generating, in the N-th clock cycle after the first clock cycle and according to the address/operand size specified by the length-modifying prefix, the correct start mark and the correct end mark for each instruction byte of the predetermined number of instruction bytes of the stream;
wherein N is the number of instructions, within the predetermined number of instruction bytes of the stream, that contain at least one length-modifying prefix.
9. The method according to claim 8, wherein the generating of the correct start mark and end mark is performed within N+1 clock cycles, regardless of how many prefix bytes any instruction within the predetermined number of instruction bytes of the stream contains.
10. The method according to claim 8, further comprising:
in the first clock cycle, if none of the predetermined number of instruction bytes of the stream is the length-modifying prefix; and
in the N-th clock cycle after the first clock cycle, if one or more of the predetermined number of instruction bytes of the stream are the length-modifying prefix,
causing the predetermined number of instruction bytes of the stream to be shifted out of an instruction queue.
11. The method according to claim 8, wherein,
the start mark of each decoded instruction byte indicates whether the decoded instruction byte is the first non-prefix byte of an instruction; and
the end mark of each decoded instruction byte indicates whether the decoded instruction byte is the last byte of an instruction.
12. The method according to claim 8, wherein the decoding of each instruction byte of the predetermined number of instruction bytes of the stream comprises:
generating accumulated prefix information for each instruction byte,
wherein the accumulated prefix information indicates one of the following:
all prefix information of the instruction byte corresponding to the accumulated prefix information and of all prefix bytes preceding it, the prefix bytes being within the instruction that contains the corresponding instruction byte; and
if the start mark of the instruction byte corresponding to the accumulated prefix information indicates that the instruction byte is the first non-prefix byte of the instruction, all prefix information associated with the prefix bytes of the instruction.
13. The method according to claim 8, wherein the decoding of each instruction byte of the predetermined number of instruction bytes of the stream comprises:
generating an instruction length for each instruction byte of the predetermined number of instruction bytes of the stream; and
generating the start mark and the end mark of each instruction byte according to the address/operand size and the instruction length,
wherein the instruction length does not include prefix bytes.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17961609P | 2009-05-19 | 2009-05-19 | |
US61/179,616 | 2009-05-19 | ||
US22829609P | 2009-07-24 | 2009-07-24 | |
US61/228,296 | 2009-07-24 | ||
US12/571,997 | 2009-10-01 | ||
US12/571,997 US8612727B2 (en) | 2009-05-19 | 2009-10-01 | Apparatus and method for marking start and end bytes of instructions in a stream of instruction bytes in a microprocessor having an instruction set architecture in which instructions may include a length-modifying prefix |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101833436A CN101833436A (en) | 2010-09-15 |
CN101833436B true CN101833436B (en) | 2013-04-17 |
Family
ID=42717516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010185611 Active CN101833436B (en) | 2009-05-19 | 2010-05-19 | Device and method suitable for a microprocessor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101833436B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11822902B2 (en) * | 2020-01-21 | 2023-11-21 | California Institute Of Technology | Systems and methods for performing lossless source coding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1375767A (en) * | 2001-07-03 | 2002-10-23 | 智权第一公司 | Apparatus and method for providing branch instruction and relative target instruction to buffering zone |
CN1414465A (en) * | 2002-05-09 | 2003-04-30 | 智慧第一公司 | Device and method for selectivity controlling result write back |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050138330A1 (en) * | 2003-12-23 | 2005-06-23 | Maxim Integrated Products, Inc. | MAXQ microcontroller |
Also Published As
Publication number | Publication date |
---|---|
CN101833436A (en) | 2010-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8769539B2 (en) | Scheduling scheme for load/store operations | |
US4860199A (en) | Hashing indexer for branch cache | |
JP6849274B2 (en) | Instructions and logic to perform a single fused cycle increment-comparison-jump | |
CN100468323C (en) | Pipeline type microprocessor, device and method for generating early stage instruction results | |
CN101558388B (en) | Data cache virtual hint way prediction, and applications thereof | |
US6542985B1 (en) | Event counter | |
US6157994A (en) | Microprocessor employing and method of using a control bit vector storage for instruction execution | |
US20020069375A1 (en) | System, method, and article of manufacture for data transfer across clock domains | |
JPH07334361A (en) | Microprocessor device with pipeline for processing of instruction and apparatus for generation of program counter value used in it | |
CN101529378B (en) | A system and method for using a working global history register | |
WO1995016954A1 (en) | Apparatus for processing instruction in computer system | |
CN104335168A (en) | Branch prediction preloading | |
JPH0785223B2 (en) | Digital computer and branch instruction execution method | |
KR20010075258A (en) | Method for calculating indirect branch targets | |
US5860154A (en) | Method and apparatus for calculating effective memory addresses | |
WO2014090085A1 (en) | Branch-free condition evaluation | |
WO2014090091A1 (en) | Tracking multiple conditions in a general purpose register and instruction therefor | |
CN101884025B (en) | Method and system for accelerating procedure return sequences | |
US9626185B2 (en) | IT instruction pre-decode | |
CN101833437B (en) | Device and method for a microprocessor | |
CN101833436B (en) | Device and method suitable for a microprocessor | |
KR100431975B1 (en) | Multi-instruction dispatch system for pipelined microprocessors with no branch interruption | |
CN101853148B (en) | Device and method adaptive to microprocessor | |
CN101853151B (en) | Device and method adaptive to microprocessor | |
CN101819517B (en) | Device and method suitable for microprocessor |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |