WO1997031310A1 - A microprocessor configured to execute instructions which specify increment of a source operand - Google Patents
- Publication number
- WO1997031310A1 WO1997031310A1 PCT/US1997/001089 US9701089W WO9731310A1 WO 1997031310 A1 WO1997031310 A1 WO 1997031310A1 US 9701089 W US9701089 W US 9701089W WO 9731310 A1 WO9731310 A1 WO 9731310A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- microprocessor
- increment
- operand
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
Definitions
- TITLE A Microprocessor Configured to Execute Instructions Which Specify Increment of a Source Operand
- This invention relates to the field of microprocessors and, more particularly, to a microprocessor configured to execute instructions which specify incrementing of a source operand
- Computer systems employ one or more microprocessors, and often employ digital signal processors (DSPs)
- the DSPs are typically included within multimedia devices such as sound cards, speech recognition cards, video capture cards, etc.
- the DSPs function as coprocessors, performing complex and repetitive mathematical computations demanded by multimedia devices and other signal processing applications more efficiently than general purpose microprocessors.
- Microprocessors are typically optimized for performing integer operations upon values stored within a main memory of a computer system. While DSPs perform many of the multimedia functions, the microprocessor manages the operation of the computer system.
- Digital signal processors include execution units which comprise one or more arithmetic logic units
- the instruction set primarily comprises DSP-type instructions (i.e. instructions optimized for the performance of complex mathematical operations) and also includes a small number of non-DSP instructions.
- the non-DSP instructions are in many ways similar to instructions executed by microprocessors, and are necessary for allowing the DSP to function independent of the microprocessor.
- the DSP is typically optimized for mathematical algorithms such as correlation, convolution, finite impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix transformations, and inner products, among other operations.
- Implementations of these mathematical algorithms generally comprise long sequences of systematic arithmetic/multiplicative operations. These operations are interrupted on various occasions by decision-type commands.
- the DSP sequences are a repetition of a very small set of instructions that are executed 70% to 90% of the time. The remaining 10% to 30% of the instructions are primarily boolean decision operations.
- Many of these mathematical algorithms perform a repetitive multiply and accumulate function in which a pair of operands are multiplied together and added to a third operand. The third operand is often used to store an accumulation of prior multiplications. Therefore, DSP hardware often includes hardware configured to quickly perform a multiply-add sequence.
- An exemplary DSP is the ADSP 2171 available from Analog Devices, Inc. of Norwood, Massachusetts.
- Instruction code written in the x86 instruction set may perform the mathematical operations that DSPs typically perform. Cost of the computer system may be reduced through the elimination of one or more DSPs while still performing equivalent functionality. Unfortunately, the instruction code written for the microprocessor may not be as efficient at performing the operations as DSP instruction code.
- DSPs often operate upon a large number of operands stored in a memory. Therefore, DSPs are configured to access the memory operands very efficiently.
- microprocessors are often not configured in this manner.
- Table 1 below shows an exemplary instruction sequence for performing an inner product, written in the x86 instruction set.
- the x86 instruction set and microprocessor architecture is employed by many microprocessors due to its widespread acceptance in the computer industry.
- MOV EDI, addr2 ; EDI = address of first element of second vector
- MOV EAX, 0 ; EAX = index into first vector
- MOV EBX, 0 ; EBX = index into second vector
- the last 8 instructions repetitively perform a multiply-add sequence upon two vectors of values stored in main memory. New operands are accessed by incrementing the EAX and EBX register values after each access.
- Similar multiply-add loops are used in other mathematical functions typically performed by DSP. It is desirable to improve the efficiency of the loop (i.e. to include fewer instructions within the loop while still performing the same function). Having fewer instructions will occupy fewer execution units for each iteration of the loop, which may improve performance by freeing execution units to execute other instructions (e.g. instructions from the next iteration of the loop). The performance increase may be considerable, since the loop is executed repetitively.
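The effect of folding the index increments into the loads can be sketched in a few lines of Python. This is only an illustration of the idea, not the instruction sequence of Table 1: the helper `load_post_increment` and both function names are hypothetical, and Python statements stand in loosely for machine instructions.

```python
def load_post_increment(vector, index):
    """Hypothetical helper: return (vector[index], index + 1).

    Models a load that also increments the register used as its index.
    """
    return vector[index], index + 1


def inner_product_explicit(a, b):
    """Multiply-add loop with separate increment steps for each index."""
    acc, i, j = 0, 0, 0
    while i < len(a):
        x = a[i]
        y = b[j]
        acc += x * y
        i += 1  # separate increment step (INC EAX in the text)
        j += 1  # separate increment step (INC EBX in the text)
    return acc


def inner_product_fused(a, b):
    """Same loop, with each increment folded into the load that uses the index."""
    acc, i, j = 0, 0, 0
    while i < len(a):
        x, i = load_post_increment(a, i)
        y, j = load_post_increment(b, j)
        acc += x * y
    return acc
```

Both functions compute the same inner product; the second simply has two fewer steps in the loop body, which is the saving the text describes.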
- the problems outlined above are in large part solved by a microprocessor in accordance with the present invention.
- the present microprocessor executes instructions which specify incrementation of a source operand in addition to the operation defined by the instructions.
- Several embodiments of the microprocessor are shown which include hardware for performing the increments.
- loops of instructions which increment a register to access new operands within successive loop iterations may be performed more efficiently. Fewer instructions may be included within the loop by specifying increments within an instruction which performs another function (for example, a load of the operands). Having fewer instructions within the loop may lead to fewer execution units being used to execute an iteration of the loop. Subsequent iterations of the loop or instructions subsequent to the loop may be executed in the execution units not used by the more efficient loop. Therefore, performance of the microprocessor and of a computer system employing the microprocessor may be increased.
- the present invention contemplates a microprocessor configured to execute an instruction which specifies incrementing a source operand in addition to an instruction operation.
- the microprocessor comprises an instruction decode unit and a second unit.
- the instruction decode unit is configured to decode the instruction. Coupled to receive an indication of the instruction from the instruction decode unit, the second unit is configured to increment the source operand in response to the indication of the instruction. The increment of the source operand is performed in addition to an instruction operation defined by the instruction.
- the present invention further contemplates a computer system comprising a microprocessor and a main memory.
- the microprocessor is configured to execute an instruction which specifies incrementing a source operand in addition to an instruction operation.
- the main memory is configured to store the instruction as well as other instructions executed by the microprocessor and operands to be operated upon by the microprocessor.
- Fig. 1 is a diagram of an instruction showing instruction fields including an SIB field and an increment field.
- Fig. 1A is a diagram of one embodiment of the SIB field shown in Fig. 1.
- Fig. 1B is a diagram of one embodiment of the increment field shown in Fig. 1.
- Fig. 1C is a diagram of another embodiment of the increment field shown in Fig. 1.
- Fig. 2 is a block diagram of a computer system including a microprocessor.
- Fig. 3 is a block diagram of one embodiment of the microprocessor shown in Fig. 2, including a plurality of execute units, a reorder buffer, and a register file.
- Fig. 4 is a block diagram of one embodiment of the reorder buffer shown in Fig. 3.
- Fig. 4A is a diagram of the information stored in one embodiment of the reorder buffer shown in Fig. 4
- Fig. 5 is a block diagram of one embodiment of an execute unit.
- Fig. 6 is a block diagram of one embodiment of the register file shown in Fig. 3.
- Instruction 10 is an improvement upon the x86 instruction format. More information regarding the x86 instruction format may be found in the publication: "PC Magazine Programmer's Technical Reference: the Processor and Coprocessor" by Hummel, Ziff-Davis Press, Emeryville, California, 1992. This publication is incorporated herein by reference in its entirety. Instruction 10 includes a prefix field 12, an opcode field 14, a Mod R/M field 16, an SIB field 18, an increment field 20, a displacement field 22, and an immediate field 24.
- opcode field 14, Mod R/M field 16, SIB field 18, and increment field 20 specify source and destination operands for instruction 10.
- Register operands may be specified by opcode field 14, Mod R/M field 16, SIB field 18 or increment field 20.
- Memory operands may be specified by Mod R/M field 16 and SIB field 18.
- SIB field 18 and increment field 20 specify a register operand containing a value used to form an address of a memory operand.
- inclusion of increment field 20 specifies that the register operand be incremented.
- the loop shown in Table 1 may be reduced from eight instructions to six instructions.
- the loop may be executed more efficiently, which may result in higher performance upon DSP functions by a microprocessor employing an instruction set including SIB field 18 and increment field 20.
- Opcode field 14 includes bits which identify a particular instruction within the instruction set employed by a microprocessor.
- opcode field 14 may also specify up to one register operand.
- Prefix field 12 may be used to modify the default operation specified by opcode field 14
- prefix field 12 may be encoded to change the operand or address size that an instruction is to operate upon
- Displacement field 22 specifies a constant displacement value which is used to form an address for a memory operand
- Immediate field 24 includes a constant value which is used directly as an operand of the instruction
- the fields of instruction 10 comprise bytes (wherein a byte is eight bits, or binary digits) as listed in Table 2 below. The number of bytes within a field may vary from instruction to instruction within the ranges set out in Table 2.
- a "field" comprises one or more bits which are logically grouped together and interpreted by a microprocessor according to the definition of the field.
- the bits of opcode field 14 are grouped together and decoded to determine which instruction within the instruction set is to be executed.
- An "operand" refers to a value operated upon during execution of the instruction or produced by execution of the instruction. Values operated upon during execution of the instruction are referred to as "source operands".
- Ordinarily, source operands are not modified via execution of the instruction.
- For instruction 10, however, a source operand may be incremented.
- the incremented source operand is stored into the storage location from which the source operand was drawn.
- a source operand which is used to form an address of a memory operand (wherein the memory operand is operated upon by the instruction) is incremented.
- SIB field 18 includes a scale field 30, an index field 32, and a base field 34.
- index field 32 specifies register operands which are used to form the address of a memory operand
- the index operand (i.e. the operand specified by index field 32) is added to the base operand (i.e. the operand specified by base field 34) to form the address.
- the base register is held constant and the index register is modified to access various memory operands within a particular memory range.
- the register operand specified by the index field may be scaled (i.e. multiplied) by a scale factor specified by scale field 30.
- scale field 30 comprises two bits encoding scaling factors of 1, 2, 4, and 8.
- base field 34 and index field 32 comprise three bits each
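The address formation described above (base plus scaled index) can be sketched as follows. The function name and the wrap to 32 bits are assumptions made for illustration; the text itself only states that the scaled index operand is added to the base operand.

```python
def effective_address(base, index, scale=1):
    """Form base + scale * index, as described for the SIB field.

    Assumption: addresses wrap to 32 bits, matching 32-bit x86 registers.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + scale * index) & 0xFFFFFFFF


def walk_range(base, count, scale):
    """Hold the base constant and step the index to visit successive operands."""
    return [effective_address(base, index, scale) for index in range(count)]
```

For example, `walk_range(0x1000, 4, 4)` visits four consecutive 4-byte operands starting at the base address.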
- Index field 32 is encoded, according to one embodiment, as shown in Table 3 below. One encoding of index field 32 indicates that increment field 20 is included within the instruction. Increment field 20 specifies the index operand for that encoding, and the index operand is incremented when instruction 10 is executed.
- increment field 20 includes an index2 field 36 and a reserved field 38. Index2 field 36 comprises three bits, encoded as shown in Table 4 for this embodiment. Reserved field 38 comprises the remaining 5 bits of increment field 20. These bits are not interpreted by a microprocessor which executes instruction 10. If the "no operand" encoding is used, then the address formed by the microprocessor is undefined.
- SIB field 18 is included in instruction 10. If the programmer wishes to code a memory operand specified by the address [Base+scale*index], index++ (wherein the index++ indication means that the index register value is incremented subsequent to forming the address) then SIB field 18 and increment field 20 are included in instruction 10. More particularly according to one embodiment, a memory operand specified by the address [EAX+1*EBX] implies an SIB field 18 coded as hexadecimal 18. If a memory operand of [EAX+1*EBX], EBX++ is desired, then SIB field 18 is coded as hexadecimal 20 and increment field 20 is coded as hexadecimal 60.
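The two worked encodings above (hexadecimal 18, and hexadecimal 20 followed by hexadecimal 60) can be reproduced with a short sketch. Tables 3 and 4 are not reproduced here, so several details are assumptions inferred from those two examples: the standard x86 register numbering, index encoding binary 100 as the value that signals an increment field, and index2 field 36 occupying the upper three bits of the increment byte. The function names are hypothetical.

```python
# Assumption: standard x86 three-bit register numbers.
REG = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
       "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}
# Assumption: this index encoding means "increment field follows".
INDEX_USES_INCREMENT_FIELD = 0b100


def encode_sib(scale, index_bits, base):
    """Pack scale (2 bits), index (3 bits), and base (3 bits) into one byte."""
    return (SCALE_BITS[scale] << 6) | (index_bits << 3) | REG[base]


def encode_memory_operand(base, index, scale=1, post_increment=False):
    """Return the SIB byte, plus an increment byte when index++ is requested."""
    if not post_increment:
        if REG[index] == INDEX_USES_INCREMENT_FIELD:
            raise ValueError("this index encoding is reserved in the sketch")
        return bytes([encode_sib(scale, REG[index], base)])
    sib = encode_sib(scale, INDEX_USES_INCREMENT_FIELD, base)
    # Assumption: index2 sits in bits 7..5; the 5 reserved bits are zero.
    increment = REG[index] << 5
    return bytes([sib, increment])
```

Under these assumptions, `encode_memory_operand("EAX", "EBX")` yields the single byte 18 (hexadecimal), and adding `post_increment=True` yields the bytes 20 and 60 (hexadecimal), matching the example in the text.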
- increment field 20 includes a plurality of increment identification fields 39A-39D.
- Each increment identification field 39A-39D identifies an action to be taken upon the value of a particular register.
- increment identification field 39A is assigned to the EAX register; increment identification field 39B is assigned to the ECX register; increment identification field 39C is assigned to the ESI register; and increment identification field 39D is assigned to the EDI register.
- each of increment identification fields 39A-39D indicates the incrementation action to be taken upon the value stored in the corresponding register.
- each increment identification field 39A-39D comprises a bit indicative, when set, that the corresponding register value should be incremented. When clear, the bit indicates that the corresponding register value should not be incremented.
- each increment identification field 39A-39D comprises two bits encoded as shown in Table 5 below.
- Table 5 advantageously allows for both increment and decrement of the corresponding register.
- up to four registers may be specified for increment using increment field 20.
- multiple source operands may be incremented using a single instruction.
- the INC EAX and INC EBX instructions may be removed from the loop shown in Table 1.
- the two increment instructions may be replaced by a single increment field 20 included within one of the remaining instructions in the loop.
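The one-bit-per-register embodiment of increment identification fields 39A-39D can be sketched as below. The register assignment (EAX, ECX, ESI, EDI) is taken from the text; the bit positions (39A in bit 0 through 39D in bit 3), the 32-bit wraparound, and the function name are assumptions for illustration. The two-bit embodiment of Table 5 is not modeled because its encodings are not reproduced here.

```python
# Registers assigned to fields 39A, 39B, 39C, 39D, in that order (from the text).
FIELD_REGISTERS = ("EAX", "ECX", "ESI", "EDI")


def apply_increment_field(field, regs):
    """Return a copy of regs with each flagged register incremented by one.

    Assumption: field 39A is bit 0, 39B is bit 1, 39C is bit 2, 39D is bit 3.
    """
    updated = dict(regs)
    for bit, name in enumerate(FIELD_REGISTERS):
        if field & (1 << bit):  # bit set: increment; bit clear: leave unchanged
            updated[name] = (updated[name] + 1) & 0xFFFFFFFF
    return updated
```

A single field value such as `0b0101` would then increment both EAX and ESI, which is how one instruction can update several source operands at once.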
- increment is used to refer to the modification of source operands under control of increment field 20.
- incrementing or decrementing source operands is contemplated, as shown in Table 5. Performing either increment or decrement operations in response to increment field 20 is within the spirit and scope of the present invention.
- Table 5 Encodings for Increment Fields 39A-39D
- Computer system 40 includes a microprocessor 42, a bus bridge 44, a main memory 46, and a plurality of input/output (I/O) devices 48A-48N (collectively referred to as I/O devices 48).
- a system bus 50 couples microprocessor 42, bus bridge 44, and main memory 46.
- I/O devices 48A-48N are coupled to bus bridge 44 via an I/O bus 52.
- microprocessor 42 includes circuitry for executing instructions such as instruction 10 shown in Fig. 1.
- microprocessor 42 includes hardware for executing an instruction to produce a result and further to concurrently increment a source register operand. Instructions which both produce a result and increment a source operand may be beneficial in a large number of programs, particularly DSP programs. Since DSP programs often include repetitive mathematical operations performed upon memory operands stored in a regular fashion within main memory 46 (such that the index operand used to form the memory address is often incremented after executing the mathematical operation), the number of instructions necessary to perform the DSP program may be advantageously reduced. The program may be smaller in size, and may execute more quickly due to the reduced number of instructions.
- Microprocessor 42 generally executes instructions and operates upon data stored in main memory 46, and may additionally communicate with I/O devices 48. In one embodiment, microprocessor 42 employs the x86 microprocessor architecture including the improved instruction encoding shown in Fig. 1.
- Bus bridge 44 is provided to assist in communications between I/O devices 48 and devices coupled to system bus 50. I/O devices 48 typically require longer bus clock cycles than microprocessor 42 and other devices coupled to system bus 50. Therefore, bus bridge 44 provides a buffer between system bus 50 and input/output bus 52. Additionally, bus bridge 44 translates transactions from one bus protocol to another.
- input/output bus 52 is an Enhanced Industry Standard Architecture (EISA) bus and bus bridge 44 translates from the system bus protocol to the EISA bus protocol.
- input/output bus 52 is a Peripheral Component Interconnect (PCI) bus and bus bridge 44 translates from the system bus protocol to the PCI bus protocol. It is noted that many variations of system bus protocols exist. Microprocessor 42 may employ any suitable system bus protocol.
- I/O devices 48 provide an interface between computer system 40 and other devices external to the computer system. Exemplary I/O devices include a modem, a serial or parallel port, a sound card, etc. I/O devices 48 may also be referred to as peripheral devices.
- Main memory 46 stores data and instructions for use by microprocessor 42. In one embodiment, main memory 46 includes at least one Dynamic Random Access Memory (DRAM) cell and a DRAM memory controller.
- Although computer system 40 as shown in Fig. 2 includes one microprocessor, other embodiments of computer system 40 may include multiple microprocessors similar to microprocessor 42.
- computer system 40 may include multiple bus bridges 44 for translating to multiple dissimilar or similar I/O bus protocols.
- a cache memory for enhancing the performance of computer system 40 by storing instructions and data referenced by microprocessor 42 in a faster memory storage may be included.
- the cache memory may be inserted between microprocessor 42 and system bus 50, or may reside on system bus 50 in a "lookaside" configuration.
- a signal is "asserted" if it conveys a value indicative of a particular condition. Conversely, a signal is "deasserted" if it conveys a value indicative of a lack of a particular condition.
- a signal may be defined to be asserted when it conveys a logical zero value or, conversely, when it conveys a logical one value.
- Microprocessor 42 includes a bus interface unit 60, an instruction cache 62, a data cache 64, an instruction decode unit 66, a plurality of execute units including execute units 68A and 68B, a load/store unit 70, a reorder buffer 72, and a register file 74.
- the plurality of execute units will be collectively referred to herein as execute units 68, and may include more execute units than execute units 68A and 68B shown in Fig. 3.
- an embodiment of microprocessor 42 may include one execute unit 68.
- Bus interface unit 60 is coupled to instruction cache 62, data cache 64, and system bus 50.
- Instruction cache 62 is coupled to instruction decode unit 66, which is further coupled to execute units 68, reorder buffer 72, and load/store unit 70.
- Reorder buffer 72, execute units 68, and load/store unit 70 are each coupled to a result bus 78 for forwarding of execution results.
- Load/store unit 70 is coupled to data cache 64.
- instruction decode unit 66 is configured to decode instructions, including instructions having increment field 20.
- When an instruction is detected which includes increment field 20, an indication that the index source operand is to be incremented in addition to producing the result defined by the opcode of the instruction is conveyed with the instruction.
- Another unit within microprocessor 42 performs the increment of the source operand.
- reorder buffer 72 allocates a storage location for the incremented source operand and performs the increment.
- the incremented source operand is stored into register file 74 as well.
- execute units 68 include hardware for performing the increment in parallel with executing the instruction. Both results are conveyed from the execute unit 68 to reorder buffer 72 simultaneously.
- register file 74 performs the increment.
- microprocessor 42 is capable of executing instruction 10. Performance of many programs including DSP programs may be enhanced.
- Instruction cache 62 is a high speed cache memory for storing instructions. It is noted that instruction cache 62 may be configured into a fully associative, set-associative, or direct mapped configuration. Instruction cache 62 may additionally include a branch prediction mechanism for predicting branch instructions as either taken or not taken. Instructions are fetched from instruction cache 62 and conveyed to instruction decode unit 66 for decode and dispatch to an execute unit 68.
- instruction decode unit 66 decodes instructions.
- decoding refers to transforming the instruction from the format shown as instruction 10 into a second format expected by execute units 68.
- the second format comprises decoded control signals for controlling data flow elements such as adders and multiplexors in order to perform the operation the instruction defines.
- instruction decode unit 66 decodes each instruction fetched from instruction cache 62.
- Instruction decode unit 66 dispatches the instruction to execute units 68 and/or load/store unit 70.
- Instruction decode unit 66 also detects the register operands used by the instruction and requests these operands from reorder buffer 72 and register file 74.
- execute units 68 are symmetrical execution units.
- Symmetrical execution units are each configured to execute a particular subset of the instruction set employed by microprocessor 42. The subsets of the instruction set executed by each of the symmetrical execution units are the same.
- execute units 68 are asymmetrical execution units configured to execute dissimilar instruction subsets.
- execute units 68 may include a branch execute unit for executing branch instructions, one or more arithmetic/logic units for executing arithmetic and logical instructions, and one or more floating point units for executing floating point instructions.
- Instruction decode unit 66 dispatches an instruction to an execute unit 68 or load/store unit 70 which is configured to execute that instruction.
- Load/store unit 70 provides an interface between execute units 68 and data cache 64. Load and store memory operations are performed by load/store unit 70 to data cache 64. Additionally, memory dependencies between load and store memory operations are detected and handled by load/store unit 70.
- Execute units 68 and load/store unit 70 may include one or more reservation stations for storing instructions whose operands have not yet been provided. An instruction is selected from those stored in the reservation stations for execution if: (1) the operands of the instruction have been provided, and (2) the instructions which are prior to the instruction being selected have not yet received operands. It is noted that a centralized reservation station may be included instead of separate reservation stations. The centralized reservation station is coupled between instruction decode unit 66, execute units 68, and load/store unit 70. Such an embodiment may perform the dispatch function within the centralized reservation station.
- Microprocessor 42 supports out of order execution, and employs reorder buffer 72 for storing execution results of speculatively executed instructions and for storing these results into register file 74 in program order; for performing dependency checking and register renaming; and for providing for mispredicted branch and exception recovery.
- one of three values is transferred to the execute unit 68 and/or load/store unit 70 which receives the instruction: (1) the value stored in reorder buffer 72, if the value has been speculatively generated; (2) a tag identifying a location within reorder buffer 72 which will store the result, if the value has not been speculatively generated; or (3) the value stored in the register within register file 74, if no instructions within reorder buffer 72 modify the register. Additionally, a storage location within reorder buffer 72 is allocated for storing the results of the instruction being decoded by instruction decode unit 66. The storage location is identified by a tag, which is conveyed to the unit receiving the instruction.
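The three-way choice described above can be sketched as a small lookup routine. The data layout, the names `RobEntry` and `lookup_operand`, and the rule that the most recently dispatched writer of a register takes precedence are assumptions for illustration; the text specifies only which of the three values is supplied in each case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: int                      # identifies this reorder buffer location
    dest: str                     # register the instruction will modify
    value: Optional[int] = None   # None until the result has been generated


def lookup_operand(reg, rob_entries, register_file):
    """Return ("value", v) or ("tag", t) for a requested register operand.

    rob_entries is assumed to be in program order, oldest first.
    """
    for entry in reversed(rob_entries):  # assumption: youngest writer wins
        if entry.dest == reg:
            if entry.value is not None:
                return ("value", entry.value)   # case (1): speculative value
            return ("tag", entry.tag)           # case (2): result still pending
    return ("value", register_file[reg])        # case (3): no pending writer
```

A unit that receives a tag rather than a value then waits for that tag to appear on the result bus.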
- execute units 68 or load/store unit 70 execute an instruction
- the tag assigned to the instruction by reorder buffer 72 is conveyed upon result bus 78 along with the result of the instruction.
- Reorder buffer 72 stores the result in the indicated storage location.
- execute units 68 and load/store unit 70 compare the tags conveyed upon result bus 78 with tags of operands for instructions stored therein. If a match occurs, the unit captures the result from result bus 78 and stores it with the corresponding instruction. In this manner, an instruction may receive the operands it is intended to operate upon. Capturing results from result bus 78 for use by instructions is referred to as "result forwarding".
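Result forwarding as described above amounts to a tag comparison against each waiting operand. The sketch below assumes operands are held as `("tag", n)` or `("value", v)` pairs; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WaitingInstruction:
    name: str
    # Each operand is ("tag", n) while pending or ("value", v) once provided.
    operands: list = field(default_factory=list)

    def capture(self, tag, result):
        """Replace any operand waiting on this tag with the forwarded result."""
        self.operands = [
            ("value", result) if op == ("tag", tag) else op
            for op in self.operands
        ]

    def ready(self):
        """True once every operand holds a value rather than a tag."""
        return all(kind == "value" for kind, _ in self.operands)
```

Each unit would call `capture` for every tag and result pair seen on the result bus; an instruction becomes eligible for execution once `ready()` is true.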
- Instruction results are stored into register file 74 by reorder buffer 72 in program order.
- Storing the results of an instruction and deleting the instruction from reorder buffer 72 is referred to as "retiring" the instruction.
- By retiring the instructions in program order, recovery from incorrect speculative execution may be performed. For example, if an instruction is subsequent to a branch instruction whose taken/not taken prediction is incorrect, then the instruction may be executed incorrectly.
- reorder buffer 72 discards the instructions subsequent to the mispredicted branch instructions. Instructions thus discarded are also flushed from execute units 68, load/store unit 70, and instruction decode unit 66.
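In-order retirement and the discarding of instructions after a mispredicted branch can be sketched with a simple queue. This is a minimal model under stated assumptions: entries are dictionaries, a branch is represented by an entry with no destination register, and the class and method names are hypothetical.

```python
from collections import deque


class ReorderBufferSketch:
    def __init__(self, register_file):
        self.entries = deque()          # oldest instruction at the left
        self.register_file = register_file

    def dispatch(self, dest=None):
        """Allocate an entry in program order; dest=None models a branch."""
        entry = {"dest": dest, "value": None}
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire completed instructions from the head, strictly in order."""
        count = 0
        while self.entries and self.entries[0]["value"] is not None:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                self.register_file[entry["dest"]] = entry["value"]
            count += 1
        return count

    def flush_after(self, branch_entry):
        """Discard every instruction younger than a mispredicted branch."""
        if not any(e is branch_entry for e in self.entries):
            raise ValueError("branch is not in the buffer")
        while self.entries[-1] is not branch_entry:
            self.entries.pop()
```

Because a younger instruction cannot update the register file until everything older has retired, a result computed down the wrong path is dropped by `flush_after` before it becomes architecturally visible.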
- Register file 74 includes storage locations for each register defined by the microprocessor architecture employed by microprocessor 42.
- microprocessor 42 may employ the x86 microprocessor architecture.
- register file 74 includes locations for storing the EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP register values.
- Data cache 64 is a high speed cache memory configured to store data to be operated upon by microprocessor 42. It is noted that data cache 64 may be configured into a fully associative, set-associative or direct-mapped configuration.
- Bus interface unit 60 is configured to effect communication between microprocessor 42 and devices coupled to system bus 50. For example, instruction fetches which miss instruction cache 62 may be transferred from main memory 46 by bus interface unit 60. Similarly, data requests performed by load/store unit 70 which miss data cache 64 may be transferred from main memory 46 by bus interface unit 60. Additionally, data cache 64 may discard a cache line of data which has been modified by microprocessor 42. Bus interface unit 60 transfers the modified line to main memory 46.
- instruction decode unit 66 may be configured to dispatch an instruction to more than one execution unit
- certain instructions may operate upon memory operands. Executing such an instruction involves transferring the memory operand from data cache 64, executing the instruction, and transferring the result to memory (if the destination operand is a memory location).
- Load/store unit 70 performs the memory transfers, and an execute unit 68 performs the execution of the instruction.
- instruction decode unit 66 may be configured to decode multiple instructions per clock cycle
- instruction decode unit 66 is configured to decode and dispatch up to one instruction per execute unit 68 and load/store unit 70 per clock cycle.
- reorder buffer 72 which implements the incrementing of source operands is shown
- the embodiment shown in Fig. 4 may be suitable for an embodiment of microprocessor 42 in which reorder buffer 72 is configured to increment source operands.
- Reorder buffer 72 includes a control unit 80, an instruction buffer 82, and a plurality of incrementor circuits 84 (including incrementor circuits 84A, 84B, and 84C).
- Control unit 80 receives destination operands, source operands, and increment indications from instruction decode unit 66 upon destination operands bus 86, source operands bus 88, and increment bus 90, respectively.
- results of executing instructions are received from execute units 68 and load/store unit 70 upon result bus 78.
- Operand tags and values are conveyed in response to source and destination operands upon operand tags/values bus 92.
- Instruction buffer 82 includes a plurality of storage locations 83 (such as storage locations 83A, 83B, and
- Instruction buffer 82 includes storage locations 83 having sufficient storage space for a destination value as well as an incremented source value. Such an embodiment may thereby store modifications to each register operand modified by a particular instruction.
- The information stored in each storage location 83 is shown below. In another embodiment, multiple source operands may be incremented (e.g. increment identification field 39 shown in Fig
- Storage locations 83 include storage sufficient for the multiple incremented source operands when employed in such embodiments.
- Destination operands bus 86 and source operands bus 88 transmit the source and destination register operands to control unit 80.
- Control unit 80 provides operand values or tags for source operands if those operands are stored within instruction buffer 82. Otherwise, the operand values are provided by register file 74.
- A storage location within instruction buffer 82 is allocated for each instruction being dispatched.
- Tags for the destination operands are allocated according to the storage locations allocated. The tag, therefore, identifies the storage location within instruction buffer 82 which is allocated to store the instruction result.
- These tags and/or operand values are conveyed upon operand tags/values bus 92 to execute units 68 and load/store unit 70.
- The receiving units associate the operands with their respective instructions.
- Control unit 80 maintains head and tail pointers indicating the first and last instructions (in program order) within instruction buffer 82. Tags are allocated using the tail pointer, and the tail pointer is modified to allocate the required number of storage locations. As noted above, the tags are conveyed along with results upon result
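The head and tail pointer scheme described above can be sketched as a small circular buffer. This is a minimal illustration only: the class name `InstructionBuffer`, its methods, and the choice to let the tail point at the next free storage location are assumptions made for the example, not details taken from the patent.

```python
class InstructionBuffer:
    """Hypothetical circular buffer of storage locations (names are illustrative)."""

    def __init__(self, size):
        self.size = size
        self.head = 0   # oldest instruction in program order
        self.tail = 0   # assumption: next free storage location
        self.count = 0  # number of currently allocated locations

    def allocate(self, n=1):
        """Allocate n consecutive locations at the tail and return their tags."""
        if self.count + n > self.size:
            raise RuntimeError("instruction buffer full")
        tags = [(self.tail + i) % self.size for i in range(n)]
        self.tail = (self.tail + n) % self.size
        self.count += n
        return tags

    def retire(self):
        """Free the location at the head and return its tag."""
        if self.count == 0:
            raise RuntimeError("instruction buffer empty")
        tag = self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return tag
```

An instruction that needs two storage locations would call `allocate(2)` and receive two tags that advance the tail together.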
- The increment indication associated with a particular instruction causes control unit 80 to allocate a tag for the source operand to be incremented, as well as for the destination operand for storing execution results. In this manner, tags are allocated for each register which is modified by the instruction.
- The tag, therefore, additionally identifies whether a particular value is the result of an instruction or is the source operand which is incremented.
- The tag comprises a plurality of bits indicative of one of storage locations 83 and a bit indicative of whether the associated value is an incremented operand value or a result value.
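One way to picture such a tag is as a storage-location index with one extra bit appended. The sketch below assumes a 4-bit location index and places the extra bit in the least significant position; both choices, along with the names `make_tag` and `split_tag`, are hypothetical and chosen only for illustration.

```python
LOCATION_BITS = 4  # assumption: 16 storage locations in the instruction buffer


def make_tag(location, is_increment):
    """Pack a storage-location index and an increment/result bit into one tag."""
    if not 0 <= location < (1 << LOCATION_BITS):
        raise ValueError("storage location out of range")
    return (location << 1) | int(is_increment)


def split_tag(tag):
    """Return (storage location, True if the value is an incremented operand)."""
    return tag >> 1, bool(tag & 1)
```

With this layout, the result and the incremented operand of one instruction share a storage-location index and differ only in the low bit.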
- The tag assigned to the source operand is provided by reorder buffer 72.
- Reorder buffer 72 is configured to broadcast incremented operands along with their respective tags upon result bus 78 when the increment is performed, such that instructions awaiting the incremented operand may receive the incremented value.
- If reorder buffer 72 is storing the source operand, then the source operand is conveyed to the unit receiving the instruction. Additionally, the source operand is copied into the storage location which stores results for the corresponding instruction.
- One of incrementor circuits 84 subsequently increments the source operand, producing the incremented source operand which is to be stored when the associated instruction is retired.
- If reorder buffer 72 is storing a tag indicative of the source operand, that tag is stored into the storage location of the corresponding instruction.
- When results are provided upon result bus 78, those results are stored into the indicated storage location and into the storage location of the instruction which increments the operand value.
- Control unit 80 searches each storage location currently allocated when results are provided to determine if another instruction is to increment the result (which the instruction is concurrently using as a source operand). Similar to the above description, one of incrementor circuits 84 subsequently increments the supplied value to form the incremented operand value.
- Increment bus 90 includes a bit for each instruction which may be concurrently dispatched by instruction decode unit 66. The bit is indicative, when set, that the associated instruction increments a source operand.
- The incremented source operand comprises the index operand. Control unit 80 stores results received upon result bus 78 into the indicated storage locations within instruction buffer 82.
- When an instruction is indicated to be the first instruction in program order and its results have been calculated, the results are stored into register file 74 and the instruction is deleted from instruction buffer 82.
- Other functions of reorder buffer 72 (such as discarding instructions when a mispredicted branch has been detected) have not been shown in Fig. 4. Such functions are well known, and any suitable mechanism for performing these functions may be employed by reorder buffer 72.
- Incrementors 84 may be coupled between control unit 80 and instruction buffer 82, in one embodiment. Such an embodiment increments source operands as these are stored into the corresponding storage location.
- Embodiments of reorder buffer 72 which increment multiple source operands per instruction are contemplated. Such a reorder buffer may be suitable for use with increment field 20 as shown in Fig. 1C.
- Storage location 83 includes a plurality of fields for storing information related to a particular instruction. Fields included are a valid field 100, a result field 102, a result valid field 104, an increment field 106, an increment valid field 108, an incremented field 110, a destination register field 112, an increment tag and register field 114, and a miscellaneous field 116.
- Increment field 106, increment valid field 108, incremented field 110, and increment tag and register field 114 are provided to support the incrementing of a source operand by an instruction.
- Increment field 106 stores the incremented source operand.
- Increment field 106 comprises 32 bits for storing a 32 bit incremented source operand.
- Increment valid field 108 is indicative of whether or not the instruction represented by storage location 83 increments a source operand.
- Increment valid field 108 comprises a bit indicative, when set, that the instruction specifies incrementation of a source operand.
- Incremented field 110 indicates that the increment has been performed (i.e. that the source operand has been received and that one of incrementor circuits 84 has incremented the value).
- Incremented field 110 comprises a bit indicative, when set, that increment field 106 has been incremented.
- Increment tag and register field 114 stores a value indicative of the register corresponding to the source operand which is to be incremented, as well as the tag corresponding to the most recent update of the register.
- The indication of the register is used when the instruction is retired to identify the location within register file 74 to update.
- The tag is used by control unit 80 to compare against the results upon result bus 78, in order to capture the source operand, increment the operand, and store the operand in increment field 106.
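The tag comparison described above can be sketched as a scan over the allocated storage locations each time a result appears on the result bus. The `Entry` class, the function name `snoop_result_bus`, the increment amount of one, and the 32-bit wraparound are all assumptions made for this illustration; the field names loosely mirror fields 106, 108, 110, and 114.

```python
from dataclasses import dataclass
from typing import Optional

MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands that wrap on overflow


@dataclass
class Entry:
    increment_valid: bool = False        # instruction increments a source operand
    incremented: bool = False            # the increment has been performed
    increment: int = 0                   # incremented source operand value
    increment_tag: Optional[int] = None  # tag of the awaited source operand


def snoop_result_bus(entries, result_tag, value):
    """Capture a broadcast result in every entry waiting on result_tag.

    Returns the number of entries that captured and incremented the value.
    """
    captured = 0
    for entry in entries:
        waiting = entry.increment_valid and not entry.incremented
        if waiting and entry.increment_tag == result_tag:
            entry.increment = (value + 1) & MASK32  # assumption: increment by one
            entry.incremented = True
            captured += 1
    return captured
```

Entries whose increment has already been performed, or which do not increment a source operand at all, ignore the broadcast.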
- Valid field 100 comprises a bit indicative, when set, that the storage location is storing valid information.
- Result field 102 stores the result of executing the instruction.
- Result field 102 comprises 32 bits.
- Result valid field 104 is indicative of the validity of result field 102.
- Result valid field 104 comprises a bit indicative, when set, that result field 102 is storing a result (i.e. the corresponding instruction has been executed).
- The register into which the destination operand is to be stored is identified by destination register field 112, which comprises 3 bits in one embodiment. The three bits identify one of the eight x86 registers. Miscellaneous information such as instruction type, an indication that the instruction is a mispredicted branch, etc. is stored in miscellaneous field 116.
- An instruction may be retired if condition (1) holds and either condition (2a) or condition (2b) holds:
- (1) result valid field 104 indicates that the result of executing the instruction has been provided
- (2a) incremented field 110 indicates that the source operand has been incremented
- (2b) increment valid field 108 indicates that the instruction is not encoded for incrementing the source operand
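The retirement conditions listed above reduce to a short boolean expression. The sketch below is illustrative: the `StorageLocation` class and `can_retire` function are hypothetical names, and the additional check of the valid bit is an assumption added for the example rather than one of the listed conditions.

```python
from dataclasses import dataclass


@dataclass
class StorageLocation:
    valid: bool = False            # cf. valid field 100
    result_valid: bool = False     # cf. result valid field 104
    increment_valid: bool = False  # cf. increment valid field 108
    incremented: bool = False      # cf. incremented field 110


def can_retire(loc):
    """True when the result is available and any requested increment is done."""
    increment_done = loc.incremented or not loc.increment_valid
    # Assumption: an entry that holds no valid information is never retired.
    return loc.valid and loc.result_valid and increment_done
```

An instruction that requests an increment therefore stays in the buffer until both its result and its incremented operand have arrived.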
- An embodiment of execute unit 68A which implements the incrementing of source operands is shown in Fig. 5.
- Other execute units 68 may be configured similarly. Execute units 68 configured in the manner of Fig. 5 may be included within microprocessor 42 as a second embodiment of microprocessor 42 which executes instructions such as instruction 10.
- Execute unit 68A receives operands and control signals comprising a decoded instruction upon operands bus 120 and control bus 122. Operands bus 120 and control bus 122 emanate from instruction decode unit 66 or from reservation stations included within or near execute unit 68A.
- Execute unit 68A produces an instruction result upon result1 bus 78A and produces an incremented source operand upon result2 bus 78B. Result1 bus 78A and result2 bus 78B form part of result bus 78.
- Execute unit 68A includes an arithmetic/logic unit (ALU) 124 which performs the arithmetic and logic operations for which execute unit 68A is configured.
- The result defined by the instruction is produced by ALU 124 by operating upon the operands provided on operands bus 120 under the control of control bus 122.
- The result is conveyed upon result1 bus 78A, to which ALU 124 is coupled.
- Execute unit 68A includes an incrementor circuit 126 for incrementing a source operand (i.e. the index operand in this embodiment) if an increment control signal within control bus 122 is asserted.
- The incremented source operand is conveyed upon result2 bus 78B, along with a tag indicative of the reorder buffer storage location which is allocated to store the incremented source operand.
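The two-result behavior of this execute unit can be sketched as a function that returns one value per result bus. Everything named here is an assumption for illustration: the `execute` function, the small `ALU_OPS` table, the increment amount of one, and the 32-bit wraparound are not specified by the text above.

```python
import operator

MASK32 = 0xFFFFFFFF  # assumption: 32-bit results that wrap on overflow

# Hypothetical subset of ALU operations selected by the control signals.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute(op, a, b, index, increment, result_tag, increment_tag):
    """Model one pass through the execute unit.

    Returns (result1, result2): result1 is (tag, ALU result); result2 is
    (tag, incremented index operand), or None when the increment control
    signal is not asserted.
    """
    result1 = (result_tag, ALU_OPS[op](a, b) & MASK32)
    result2 = (increment_tag, (index + 1) & MASK32) if increment else None
    return result1, result2
```

Because both values are produced in the same pass, a consumer of the incremented index does not have to wait for a separate increment instruction.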
- Incrementor circuits similar to incrementor circuit 126 may be included in other embodiments of execute unit 68A. These embodiments may be employed when the definition of increment field 20 as shown in Fig. 1C is employed. It is further noted that the embodiment shown in Fig. 5 may be used with a reorder buffer 72 which is configured to accept more than one result from a particular execution unit during a clock cycle. The instruction may be allocated two storage locations within reorder buffer 72, or reorder buffer 72 may include storage for two results within each storage location, similar to Fig. 4A.
- Register file 74 includes a register storage 130 including storage locations for each register within the microprocessor architecture employed by microprocessor 42. A result is conveyed to register file 74 upon result bus 78, and a register selection value indicative of the register to store the result is conveyed upon a register selection bus 132. Register storage 130 stores the result into the selected storage location. It is noted that result bus 78 and register selection bus 132 may be configured to convey multiple results and register selections concurrently. Additionally, register selections may be made for accessing operands, which are conveyed upon an operands bus 134 to requesting units.
- A plurality of incrementor circuits 136 may be included within register file 74 for incrementing source operands. Each of incrementor circuits 136 is coupled to one of the storage locations within register file 74. In one embodiment, an increment bus 138 is included for signalling register values which are to be incremented. The incrementor circuit 136 coupled to the storage location which is to be incremented increments the value stored therein and stores the incremented value into the storage location. It is noted that multiple storage locations may be selected for incrementation during a clock cycle.
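A register file with in-place incrementing can be sketched as follows. The `RegisterFile` class, its method names, the eight-register default, the increment amount of one, and the 32-bit wraparound are assumptions chosen for this example.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers that wrap on overflow


class RegisterFile:
    """Hypothetical register file whose storage locations can increment themselves."""

    def __init__(self, num_registers=8):
        self.storage = [0] * num_registers

    def write(self, select, value):
        """Store a result into the selected storage location."""
        self.storage[select] = value & MASK32

    def read(self, select):
        """Return the operand held in the selected storage location."""
        return self.storage[select]

    def increment(self, selects):
        """Increment every selected location once, modeling the increment bus."""
        for select in set(selects):
            self.storage[select] = (self.storage[select] + 1) & MASK32
```

A caller would read the original operand value first and signal the increment afterwards, so the instruction still sees the value from before the increment.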
- A register file such as register file 74 shown in Fig. 6 may be advantageously incorporated into a microprocessor which does not perform increments upon source operands, and enable the increment of source operands with minimal changes to the remainder of the microprocessor.
- A decode unit might be modified to detect the source operand increment field within the instruction and to transmit increment indications upon increment bus 138 to register file 74 after the original operand value is accessed from register file 74. Execute units would not need to be modified in this example, since the increment is performed by register file 74.
- Although the x86 instruction set and microprocessor architecture is used above in exemplary embodiments of microprocessor 42, the present invention is not limited to this instruction set. Embodiments comprising other microprocessor architectures are contemplated.
- A microprocessor has been described which executes instructions specifying increment of a source operand in addition to producing the result of the instruction.
- Instruction sequences which use source operands as an address of a memory operand and then increment that operand may be shortened by including the increment in the instruction which accesses the memory operand.
- Instruction sequences which perform such manipulations often, such as DSP instruction sequences, may enjoy increased performance due to performance of the increment in parallel with the memory access and due to the lesser execution resources required to perform the operation.
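The shortening of such sequences can be illustrated with a toy interpreter that counts executed instructions. The instruction tuples, the `run` function, word-addressed memory, and an increment amount of one are all hypothetical simplifications for this sketch and do not represent x86 encodings.

```python
def run(program, memory, regs):
    """Execute a list of toy instructions and return how many were executed.

    ("load", dest, addr_reg, post_increment) loads memory[regs[addr_reg]] into
    dest and, if post_increment is True, also increments addr_reg.
    ("inc", reg) increments a register as a separate instruction.
    """
    executed = 0
    for instr in program:
        if instr[0] == "load":
            _, dest, addr_reg, post_increment = instr
            regs[dest] = memory[regs[addr_reg]]
            if post_increment:
                regs[addr_reg] += 1
        elif instr[0] == "inc":
            regs[instr[1]] += 1
        else:
            raise ValueError("unknown instruction: %r" % (instr,))
        executed += 1
    return executed


memory = [10, 20, 30]
separate = [("load", "eax", "esi", False), ("inc", "esi")] * 3
fused = [("load", "eax", "esi", True)] * 3
```

Running `separate` and `fused` from the same starting registers leaves identical register contents, while the fused form executes half as many instructions in this model.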
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60587096A | 1996-02-23 | 1996-02-23 | |
US08/605,870 | 1996-02-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997031310A1 true WO1997031310A1 (en) | 1997-08-28 |
Family
ID=24425537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1997/001089 WO1997031310A1 (en) | 1996-02-23 | 1997-01-23 | A microprocessor configured to execute instructions which specify increment of a source operand |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1997031310A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4240142A (en) * | 1978-12-29 | 1980-12-16 | Bell Telephone Laboratories, Incorporated | Data processing apparatus providing autoincrementing of memory pointer registers |
US4616313A (en) * | 1983-03-25 | 1986-10-07 | Tokyo Shibaura Denki Kabushiki Kaisha | High speed address calculation circuit for a pipeline-control-system data-processor |
EP0206653A2 (en) * | 1985-06-28 | 1986-12-30 | Hewlett-Packard Company | Method and means for loading and storing data in a reduced instruction set computer |
EP0227900A2 (en) * | 1985-12-02 | 1987-07-08 | International Business Machines Corporation | Three address instruction data processing apparatus |
EP0230038A2 (en) * | 1985-12-20 | 1987-07-29 | Nec Corporation | Address generation system |
US5261113A (en) * | 1988-01-25 | 1993-11-09 | Digital Equipment Corporation | Apparatus and method for single operand register array for vector and scalar data processing operations |
- 1997-01-23 WO PCT/US1997/001089 patent/WO1997031310A1/en active Application Filing
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US5829028A (en) | Data cache configured to store data in a use-once manner | |
US5611063A (en) | Method for executing speculative load instructions in high-performance processors | |
US6351804B1 (en) | Control bit vector storage for a microprocessor | |
US5649138A (en) | Time dependent rerouting of instructions in plurality of reservation stations of a superscalar microprocessor | |
EP0891583B1 (en) | A microprocessor configured to detect a subroutine call of a dsp routine and to direct a dsp to execute said routine | |
US7730285B1 (en) | Data processing system with partial bypass reorder buffer and combined load/store arithmetic logic unit and processing method thereof | |
EP0762270B1 (en) | Microprocessor with load/store operation to/from multiple registers | |
US5564056A (en) | Method and apparatus for zero extension and bit shifting to preserve register parameters in a microprocessor utilizing register renaming | |
US6167507A (en) | Apparatus and method for floating point exchange dispatch with reduced latency | |
US5864689A (en) | Microprocessor configured to selectively invoke a microcode DSP function or a program subroutine in response to a target address value of branch instruction | |
US5968162A (en) | Microprocessor configured to route instructions of a second instruction set to a second execute unit in response to an escape instruction | |
WO1997025669A1 (en) | Method and apparatus to translate a first instruction set to a second instruction set | |
JP2007515715A (en) | How to transition from instruction cache to trace cache on label boundary | |
US6247117B1 (en) | Apparatus and method for using checking instructions in a floating-point execution unit | |
EP0976028B1 (en) | A microprocessor configured to switch instruction sets upon detection of a plurality of consecutive instructions | |
US5721945A (en) | Microprocessor configured to detect a DSP call instruction and to direct a DSP to execute a routine corresponding to the DSP call instruction | |
JP3207124B2 (en) | Method and apparatus for supporting speculative execution of a count / link register change instruction | |
US5713039A (en) | Register file having multiple register storages for storing data from multiple data streams | |
US5768553A (en) | Microprocessor using an instruction field to define DSP instructions | |
US5812812A (en) | Method and system of implementing an early data dependency resolution mechanism in a high-performance data processing system utilizing out-of-order instruction issue | |
US5829031A (en) | Microprocessor configured to detect a group of instructions and to perform a specific function upon detection | |
US6370637B1 (en) | Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria | |
US20020120830A1 (en) | Data processor assigning the same operation code to multiple operations | |
US6336182B1 (en) | System and method for utilizing a conditional split for aligning internal operation (IOPs) for dispatch | |
US5758117A (en) | Method and system for efficiently utilizing rename buffers to reduce dispatch unit stalls in a superscalar processor |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): JP KR |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 1997903935; Country of ref document: EP |
WWW | Wipo information: withdrawn in national office | Ref document number: 1997903935; Country of ref document: EP |
122 | Ep: pct application non-entry in european phase | |
122 | Ep: pct application non-entry in european phase | |
NENP | Non-entry into the national phase | Ref country code: JP; Ref document number: 97530155; Format of ref document f/p: F |