WO1997031310A1 - A microprocessor configured to execute instructions which specify increment of a source operand

A microprocessor configured to execute instructions which specify increment of a source operand

Info

Publication number
WO1997031310A1
Authority
WO
Grant status
Application
Application number
PCT/US1997/001089
Inventor
Thomas W. Lynch
Original Assignee
Advanced Micro Devices, Inc.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F9/355 Indexed addressing, i.e. using more than one address operand

Abstract

A microprocessor is provided which executes instructions specifying incrementation of a source operand in addition to the operation defined by the instructions. Several embodiments of the microprocessor are shown which include hardware for performing the increments. Loops of instructions which increment a register to access new operands within successive loop iterations may be performed more efficiently. Fewer instructions may be included within the loop by specifying increments within an instruction which performs another function (for example, a load of the operands). Having fewer instructions within the loop may lead to fewer execution units being used to execute an iteration of the loop. Subsequent iterations of the loop or instructions subsequent to the loop may be executed in the execution units not used by the more efficient loop. Therefore, performance of the microprocessor and of a computer system employing the microprocessor may be increased.

Description

TITLE A Microprocessor Configured to Execute Instructions Which Specify Increment of a Source Operand

BACKGROUND OF THE INVENTION

1 Field of the Invention

This invention relates to the field of microprocessors and, more particularly, to a microprocessor configured to execute instructions which specify incrementing of a source operand.

2 Description of the Relevant Art

Computer systems employ one or more microprocessors, and often employ digital signal processors (DSPs). The DSPs are typically included within multimedia devices such as sound cards, speech recognition cards, video capture cards, etc. The DSPs function as coprocessors, performing complex and repetitive mathematical computations demanded by multimedia devices and other signal processing applications more efficiently than general purpose microprocessors. Microprocessors are typically optimized for performing integer operations upon values stored within a main memory of a computer system. While DSPs perform many of the multimedia functions, the microprocessor manages the operation of the computer system.

Digital signal processors include execution units which comprise one or more arithmetic logic units (ALUs) coupled to hardware multipliers which implement complex mathematical algorithms in a pipelined manner. The DSP instruction set primarily comprises DSP-type instructions (i.e. instructions optimized for the performance of complex mathematical operations) and also includes a small number of non-DSP instructions. The non-DSP instructions are in many ways similar to instructions executed by microprocessors, and are necessary for allowing the DSP to function independently of the microprocessor.

The DSP is typically optimized for mathematical algorithms such as correlation, convolution, finite impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix transformations, and inner products, among other operations. Implementations of these mathematical algorithms generally comprise long sequences of systematic arithmetic/multiplicative operations. These operations are interrupted on various occasions by decision-type commands. In general, the DSP sequences are a repetition of a very small set of instructions that are executed 70% to 90% of the time. The remaining 10% to 30% of the instructions are primarily boolean decision operations. Many of these mathematical algorithms perform a repetitive multiply and accumulate function in which a pair of operands is multiplied together and added to a third operand. The third operand is often used to store an accumulation of prior multiplications. Therefore, DSPs often include hardware configured to quickly perform a multiply-add sequence. An exemplary DSP is the ADSP 2171 available from Analog Devices, Inc. of Norwood, Massachusetts.

As microprocessors continue to increase in performance due to increases in operating frequency and the number of transistors which may be included within a single semiconductor substrate, it becomes desirable to perform certain DSP functions within the microprocessor. Instruction code written in the x86 instruction set, for example, may perform the mathematical operations that DSPs typically perform. Cost of the computer system may be reduced through the elimination of one or more DSPs while still performing equivalent functionality. Unfortunately, the instruction code written for the microprocessor may not be as efficient at performing the operations as DSP instruction code.

DSPs often operate upon a large number of operands stored in a memory. Therefore, DSPs are configured to access the memory operands very efficiently. Unfortunately, microprocessors are often not configured in this manner. For example, Table 1 below shows an exemplary instruction sequence for performing an inner product, written in the x86 instruction set. The x86 instruction set and microprocessor architecture is employed by many microprocessors due to its widespread acceptance in the computer industry.

Table 1: Inner Product, x86 code

Instruction               Comment
MOV ECX, num_elements     ECX = number of elements in vectors
MOV ESI, addr1            ESI = address of first element of first vector
MOV EDI, addr2            EDI = address of first element of second vector
MOV EAX, 0                EAX = index into first vector
MOV EBX, 0                EBX = index into second vector
FLDZ                      zero the accumulated result
AGAIN: FLD [ESI+EAX*4]    load element of first vector
INC EAX                   increment to next element
FLD [EDI+EBX*4]           load element of second vector
INC EBX                   increment to next element
FMULP ST(1), ST           multiply elements
FADDP ST(1), ST           add product to accumulated result
DEC ECX                   decrement element counter
JNE AGAIN                 repeat if more elements
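The loop in Table 1 can be modeled in Python to make the per-iteration cost concrete. This is a minimal sketch, not an x86 emulator. Python lists stand in for the two vectors in memory, a dictionary stands in for the integer registers, and a list stands in for the x87 register stack. The name `run_table1_loop` is hypothetical. The sketch assumes non-empty vectors of equal length, because the DEC/JNE pair always executes the loop body at least once.

```python
# The eight loop-body instructions of Table 1 (labels only, for counting).
LOOP_BODY = (
    "FLD [ESI+EAX*4]", "INC EAX", "FLD [EDI+EBX*4]", "INC EBX",
    "FMULP ST(1), ST", "FADDP ST(1), ST", "DEC ECX", "JNE AGAIN",
)


def run_table1_loop(vec1, vec2):
    """Model the Table 1 inner product; return (result, loop instructions issued).

    Assumes vec1 and vec2 are non-empty and of equal length.
    """
    regs = {"ECX": len(vec1), "EAX": 0, "EBX": 0}  # MOV ECX / MOV EAX / MOV EBX
    stack = [0.0]                                  # FLDZ: ST(0) = accumulator
    issued = 0
    while True:
        stack.append(vec1[regs["EAX"]])  # FLD [ESI+EAX*4]
        regs["EAX"] += 1                 # INC EAX
        stack.append(vec2[regs["EBX"]])  # FLD [EDI+EBX*4]
        regs["EBX"] += 1                 # INC EBX
        st0 = stack.pop()
        stack[-1] *= st0                 # FMULP ST(1), ST: ST(1) *= ST(0), then pop
        st0 = stack.pop()
        stack[-1] += st0                 # FADDP ST(1), ST: ST(1) += ST(0), then pop
        regs["ECX"] -= 1                 # DEC ECX
        issued += len(LOOP_BODY)         # every body instruction issues once
        if regs["ECX"] == 0:             # JNE AGAIN falls through at zero
            break
    return stack[0], issued


result, issued = run_table1_loop([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result, issued)  # 32.0 24
```

Each element pair costs eight issued instructions. Two of them, INC EAX and INC EBX, exist only to advance the indices.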

The last 8 instructions repetitively perform a multiply-add sequence upon two vectors of values stored in main memory. New operands are accessed by incrementing the EAX and EBX register values after each access. Although shown here for the inner product, similar multiply-add loops are used in other mathematical functions typically performed by DSPs. It is desirable to improve the efficiency of the loop (i.e. to include fewer instructions within the loop while still performing the same function). Having fewer instructions will occupy fewer execution units for each iteration of the loop, which may improve performance by freeing execution units to execute other instructions (e.g. instructions from the next iteration of the loop). The performance increase may be considerable, since the loop is executed repetitively.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a microprocessor in accordance with the present invention. The present microprocessor executes instructions which specify incrementation of a source operand in addition to the operation defined by the instructions. Several embodiments of the microprocessor are shown which include hardware for performing the increments. Advantageously, loops of instructions which increment a register to access new operands within successive loop iterations may be performed more efficiently. Fewer instructions may be included within the loop by specifying increments within an instruction which performs another function (for example, a load of the operands). Having fewer instructions within the loop may lead to fewer execution units being used to execute an iteration of the loop. Subsequent iterations of the loop or instructions subsequent to the loop may be executed in the execution units not used by the more efficient loop. Therefore, performance of the microprocessor and of a computer system employing the microprocessor may be increased.

Broadly speaking, the present invention contemplates a microprocessor configured to execute an instruction which specifies incrementing a source operand in addition to an instruction operation. The microprocessor comprises an instruction decode unit and a second unit. The instruction decode unit is configured to decode the instruction. Coupled to receive an indication of the instruction from the instruction decode unit, the second unit is configured to increment the source operand in response to the indication of the instruction. The increment of the source operand is performed in addition to an instruction operation defined by the instruction.

The present invention further contemplates a computer system comprising a microprocessor and a main memory. The microprocessor is configured to execute an instruction which specifies incrementing a source operand in addition to an instruction operation. Coupled to the microprocessor, the main memory is configured to store the instruction as well as other instructions executed by the microprocessor and operands to be operated upon by the microprocessor.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which:

Fig. 1 is a diagram of an instruction showing instruction fields including an SIB field and an increment field.

Fig. 1A is a diagram of one embodiment of the SIB field shown in Fig. 1.

Fig. 1B is a diagram of one embodiment of the increment field shown in Fig. 1.

Fig. 1C is a diagram of another embodiment of the increment field shown in Fig. 1.

Fig. 2 is a block diagram of a computer system including a microprocessor.

Fig. 3 is a block diagram of one embodiment of the microprocessor shown in Fig. 2, including a plurality of execute units, a reorder buffer, and a register file.

Fig. 4 is a block diagram of one embodiment of the reorder buffer shown in Fig. 3.

Fig. 4A is a diagram of the information stored in one embodiment of the reorder buffer shown in Fig. 4.

Fig. 5 is a block diagram of one embodiment of an execute unit.

Fig. 6 is a block diagram of one embodiment of the register file shown in Fig. 3.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

Turning now to Fig. 1, a diagram of an instruction 10 is shown. Instruction 10 is an improvement upon the x86 instruction format. More information regarding the x86 instruction format may be found in the publication: "PC Magazine Programmer's Technical Reference: the Processor and Coprocessor" by Hummel, Ziff-Davis Press, Emeryville, California, 1992. This publication is incorporated herein by reference in its entirety. Instruction 10 includes a prefix field 12, an opcode field 14, a Mod R/M field 16, an SIB field 18, an increment field 20, a displacement field 22, and an immediate field 24.

Generally speaking, opcode field 14, Mod R/M field 16, SIB field 18, and increment field 20 specify source and destination operands for instruction 10. Register operands may be specified by opcode field 14, Mod R/M field 16, SIB field 18 or increment field 20. Memory operands may be specified by Mod R/M field 16 and SIB field 18. In particular, SIB field 18 and increment field 20 specify a register operand containing a value used to form an address of a memory operand. Furthermore, inclusion of increment field 20 specifies that the register operand be incremented. Using SIB field 18 and increment field 20 within the FLD instructions of Table 1 allows the INC EAX and INC EBX instructions to be removed from the loop. Advantageously, the loop shown in Table 1 may be reduced from eight instructions to six instructions. The loop may be executed more efficiently, which may result in higher performance upon DSP functions by a microprocessor employing an instruction set including SIB field 18 and increment field 20.
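The six-instruction loop can be illustrated with a small Python model in which each FLD carries the "index++" behavior of increment field 20. The address is formed from the current index value, and the index register is then incremented and written back. This is a sketch under stated assumptions. The helper `load_post_increment` and the other names are hypothetical, and the x87 stack is collapsed into a single accumulator for brevity, although FMULP and FADDP are still counted as two instructions.

```python
# The six loop-body instructions once the INC instructions are folded into the FLDs.
LOOP_BODY_WITH_INCREMENT = (
    "FLD [ESI+EAX*4], EAX++", "FLD [EDI+EBX*4], EBX++",
    "FMULP ST(1), ST", "FADDP ST(1), ST", "DEC ECX", "JNE AGAIN",
)


def load_post_increment(vector, regs, index_reg):
    """Read vector[index] using the old index value, then increment the index register."""
    value = vector[regs[index_reg]]  # address formed before the increment
    regs[index_reg] += 1             # incremented source operand written back
    return value


def run_post_increment_loop(vec1, vec2):
    """Return (result, loop instructions issued, final registers); vectors must be non-empty."""
    regs = {"ECX": len(vec1), "EAX": 0, "EBX": 0}
    acc = 0.0                                       # FLDZ
    issued = 0
    while True:
        a = load_post_increment(vec1, regs, "EAX")  # FLD with EAX++
        b = load_post_increment(vec2, regs, "EBX")  # FLD with EBX++
        acc += a * b                                # FMULP followed by FADDP
        regs["ECX"] -= 1                            # DEC ECX
        issued += len(LOOP_BODY_WITH_INCREMENT)
        if regs["ECX"] == 0:                        # JNE AGAIN falls through
            break
    return acc, issued, regs


print(run_post_increment_loop([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

For three element pairs this model issues 18 loop instructions where the eight-instruction body would issue 24, and it leaves EAX and EBX pointing past the last element exactly as the explicit INC instructions would.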

Opcode field 14 includes bits which identify a particular instruction within the instruction set employed by a microprocessor. In one embodiment, opcode field 14 may also specify up to one register operand. Prefix field 12 may be used to modify the default operation specified by opcode field 14. For example, prefix field 12 may be encoded to change the operand or address size that an instruction is to operate upon. Displacement field 22 specifies a constant displacement value which is used to form an address for a memory operand. Immediate field 24 includes a constant value which is used directly as an operand of the instruction. In one embodiment, the fields of instruction 10 comprise bytes (wherein a byte is eight bits, or binary digits) as listed in Table 2 below. The number of bytes within a field may vary from instruction to instruction within the ranges set out in Table 2.

Table 2: Bytes in each Field of One Embodiment of Instruction 10

Field           Number of Bytes
Prefix          0-5
Opcode          1-2
Mod R/M         0-1
SIB             0-1
Increment       0-1
Displacement    0, 1, 2, 4
Immediate       0, 1, 2, 4
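Because every field has a bounded size, the overall length range of instruction 10 in this embodiment follows directly from Table 2. The short sketch below transcribes the table into a dictionary (the names are illustrative choices) and sums the smallest and largest allowed sizes. It says nothing about any additional length limit that a particular processor might impose beyond this table.

```python
# Allowed byte counts per field, transcribed from Table 2.
FIELD_SIZES = {
    "prefix": range(0, 6),         # 0-5
    "opcode": range(1, 3),         # 1-2
    "mod_rm": range(0, 2),         # 0-1
    "sib": range(0, 2),            # 0-1
    "increment": range(0, 2),      # 0-1
    "displacement": (0, 1, 2, 4),
    "immediate": (0, 1, 2, 4),
}


def length_bounds(sizes=FIELD_SIZES):
    """Return (shortest, longest) instruction length in bytes allowed by the table."""
    shortest = sum(min(options) for options in sizes.values())
    longest = sum(max(options) for options in sizes.values())
    return shortest, longest


print(length_bounds())  # (1, 18)
```

Increment field 20 adds at most one byte to any instruction that uses it.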

As used herein with respect to instructions, a "field" comprises one or more bits which are logically grouped together and interpreted by a microprocessor according to the definition of the field. For example, the bits of opcode field 14 are grouped together and decoded to determine which instruction within the instruction set is to be executed. An "operand" refers to a value operated upon during execution of the instruction or produced by execution of the instruction. Values operated upon during execution of the instruction are referred to as "source operands". Typically, source operands are not modified via execution of the instruction. However, when increment field 20 is included in an instruction, a source operand may be incremented. The incremented source operand is stored into the storage location from which the source operand was drawn. In one embodiment, a source operand which is used to form an address of a memory operand (wherein the memory operand is operated upon by the instruction) is incremented. Values produced by execution of the instruction are "result operands" or "results". Operands may be "register operands", wherein the operand is stored within a register defined to exist within the microprocessor. Conversely, operands may be "memory operands" stored in a memory location which may be accessed by the microprocessor. The address of the memory operand may be formed via values stored in registers, via a displacement value, or both.

Turning now to Fig. 1A, one embodiment of SIB field 18 is shown. SIB field 18 includes a scale field 30, an index field 32, and a base field 34. Index field 32 and base field 34 specify register operands which are used to form the address of a memory operand. The index operand (i.e. the operand specified by index field 32) is added to the base operand (i.e. the operand specified by base field 34) to form the address. Typically, the base register is held constant and the index register is modified to access various memory operands within a particular memory range. The register operand specified by the index field may be scaled (i.e. multiplied) by a scale factor specified by scale field 30. In one embodiment, scale field 30 comprises two bits encoding scaling factors of 1, 2, 4, and 8. For this embodiment, base field 34 and index field 32 comprise three bits each.
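The address arithmetic described for SIB field 18 can be written out as a small function. This is a sketch under two assumptions the text does not state: the two-bit scale encodings 00, 01, 10 and 11 select 1, 2, 4 and 8 in that order, and addresses are 32 bits wide and wrap around. The name `form_address` is hypothetical. The usage line shows why a scale factor of 4 suits the four-byte elements addressed as [ESI+EAX*4] in Table 1.

```python
# Scale field 30: two bits selecting a factor of 1, 2, 4, or 8.
# Assumption: encodings 00, 01, 10, 11 map to 1, 2, 4, 8 in that order.
SCALE_FACTORS = (1, 2, 4, 8)
ADDRESS_MASK = 0xFFFFFFFF  # assumption: 32-bit addresses wrap around


def form_address(base_value, index_value, scale_bits):
    """Return base + scale * index for the given register values."""
    if not 0 <= scale_bits <= 3:
        raise ValueError("scale field 30 is two bits wide")
    return (base_value + SCALE_FACTORS[scale_bits] * index_value) & ADDRESS_MASK


# Base held constant while the index steps: consecutive 4-byte elements.
print([hex(form_address(0x1000, i, 0b10)) for i in range(3)])
# ['0x1000', '0x1004', '0x1008']
```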

Index field 32 is encoded, according to one embodiment, as shown in Table 3 below. One encoding of index field 32 indicates that increment field 20 is included within the instruction. For that encoding, increment field 20 specifies the index operand, and the index operand is incremented when instruction 10 is executed.

Table 3: Index Field 32 Encoding

Value   Index Operand
000     EAX
001     ECX
010     EDX
011     EBX
100     Increment Field
101     EBP
110     ESI
111     EDI

Turning to Fig. 1B, a pair of fields within increment field 20 is shown according to one embodiment. As shown in Fig. 1B, increment field 20 includes an index2 field 36 and a reserved field 38. Index2 field 36 comprises three bits, encoded as shown in Table 4 for this embodiment. Reserved field 38 comprises the remaining five bits of increment field 20. These bits are not interpreted by a microprocessor which executes instruction 10. If the "no operand" encoding is used, then the address formed by the microprocessor is undefined.

Table 4: Index2 Field Encoding

Value   Index Operand
000     EAX
001     ECX
010     EDX
011     EBX
100     no operand
101     EBP
110     ESI
111     EDI

As an example, if a programmer wishes to code a memory operand specified by the address [Base+scale*index], then SIB field 18 is included in instruction 10. If the programmer wishes to code a memory operand specified by the address [Base+scale*index], index++ (wherein the index++ indication means that the index register value is incremented subsequent to forming the address), then SIB field 18 and increment field 20 are included in instruction 10. More particularly, according to one embodiment, a memory operand specified by the address [EAX+1*EBX] implies an SIB field 18 coded as hexadecimal 18. If a memory operand of [EAX+1*EBX], EBX++ is desired, then SIB field 18 is coded as hexadecimal 20 and increment field 20 is coded as hexadecimal 60.
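The encodings of Tables 3 and 4, together with the hexadecimal example above, can be checked with a small decoder. The sketch rests on assumptions that the text does not state outright. It places scale, index and base in bits 7:6, 5:3 and 2:0 of SIB field 18 (the usual x86 layout). It gives base field 34 the same register numbering as Table 3, with 100 selecting ESP. It places index2 field 36 in bits 7:5 of increment field 20, which is the position consistent with the hexadecimal 60 example. The function name `decode_sib` is hypothetical.

```python
# Register numbering shared by Tables 3 and 4 (encoding 100 is special in both).
INDEX_REGS = {0b000: "EAX", 0b001: "ECX", 0b010: "EDX", 0b011: "EBX",
              0b101: "EBP", 0b110: "ESI", 0b111: "EDI"}
# Assumption: base field 34 uses standard x86 numbering, with 100 = ESP.
BASE_REGS = {**INDEX_REGS, 0b100: "ESP"}


def decode_sib(sib, increment=None):
    """Decode SIB field 18, plus increment field 20 when index encoding 100 is used."""
    scale = 1 << (sib >> 6)            # bits 7:6 -> 1, 2, 4, 8
    index_bits = (sib >> 3) & 0b111    # bits 5:3
    base = BASE_REGS[sib & 0b111]      # bits 2:0
    if index_bits != 0b100:            # ordinary index register (Table 3)
        return {"base": base, "index": INDEX_REGS[index_bits],
                "scale": scale, "post_increment": False}
    if increment is None:
        raise ValueError("index encoding 100 requires increment field 20")
    index2_bits = (increment >> 5) & 0b111  # assumption: index2 field 36 in bits 7:5
    if index2_bits == 0b100:
        raise ValueError("'no operand' encoding: address is undefined")
    # Reserved field 38 (bits 4:0) is ignored, as the text specifies.
    return {"base": base, "index": INDEX_REGS[index2_bits],
            "scale": scale, "post_increment": True}


print(decode_sib(0x18))        # [EAX+1*EBX]
print(decode_sib(0x20, 0x60))  # [EAX+1*EBX], EBX++
```

Under these assumptions, hexadecimal 18 is 00 011 000 (scale 1, index EBX, base EAX), hexadecimal 20 is 00 100 000 (index taken from increment field 20), and hexadecimal 60 is 011 00000 (index2 selects EBX).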

Turning next to Fig. 1C, a second embodiment of increment field 20 is shown. In this embodiment, increment field 20 includes a plurality of increment identification fields 39A-39D. Each increment identification field 39A-39D identifies an action to be taken upon the value of a particular register. In one particular example, increment identification field 39A is assigned to the EAX register; increment identification field 39B is assigned to the ECX register; increment identification field 39C is assigned to the ESI register; and increment identification field 39D is assigned to the EDI register. Various embodiments may include more or fewer increment identification fields 39, and the increment identification fields may be assigned to different registers. The encoding of a particular increment identification field 39A-39D indicates the incrementation action to be taken upon the value stored in the corresponding register. In one embodiment, each increment identification field 39A-39D comprises a bit which, when set, indicates that the corresponding register value should be incremented. When clear, the bit indicates that the corresponding register value should not be incremented.

In another embodiment, each increment identification field 39A-39D comprises two bits encoded as shown in Table 5 below. The embodiment shown in Table 5 advantageously allows for both increment and decrement of the corresponding register. For the embodiment shown, up to four registers may be specified for increment using increment field 20. Advantageously, multiple source operands may be incremented using a single instruction. For example, if increment identification fields are assigned to both EAX and EBX, the INC EAX and INC EBX instructions may be removed from the loop shown in Table 1. The two increment instructions may be replaced by a single increment field 20 included within one of the remaining instructions in the loop.

It is noted that, in the discussion below, the term "increment" is used to refer to the modification of source operands under control of increment field 20. Either incrementing or decrementing source operands is contemplated, as shown in Table 5. Performing either increment or decrement operations in response to increment field 20 is within the spirit and scope of the present invention.

Table 5: Encodings for Increment Fields 39A-39D

Value   Increment Operation
00      No increment
01      Increment
10      Decrement
11      reserved
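For the two-bit embodiment, decoding increment field 20 amounts to reading four Table 5 codes and applying each one to its register. The sketch below uses the example register assignment given for Fig. 1C (39A: EAX, 39B: ECX, 39C: ESI, 39D: EDI). Because the text does not specify bit positions, it assumes that field 39A occupies bits 1:0 and field 39D occupies bits 7:6, and that register values wrap at 32 bits. The function names are hypothetical.

```python
# Example assignment of fields 39A-39D to registers, taken from the Fig. 1C text.
FIELD_REGISTERS = ("EAX", "ECX", "ESI", "EDI")
# Table 5 encodings; 0b11 is reserved.
DELTAS = {0b00: 0, 0b01: +1, 0b10: -1}


def decode_increment_field(byte):
    """Map each register selected by increment field 20 to +1 or -1."""
    changes = {}
    for position, reg in enumerate(FIELD_REGISTERS):
        code = (byte >> (2 * position)) & 0b11  # assumption: field 39A in bits 1:0
        if code == 0b11:
            raise ValueError(f"reserved encoding in the field for {reg}")
        if DELTAS[code]:
            changes[reg] = DELTAS[code]
    return changes


def apply_increment_field(regs, byte):
    """Return a copy of regs with every selected register updated (32-bit wrap)."""
    updated = dict(regs)
    for reg, delta in decode_increment_field(byte).items():
        updated[reg] = (updated[reg] + delta) & 0xFFFFFFFF
    return updated


regs = {"EAX": 0, "ECX": 5, "ESI": 0x2000, "EDI": 0}
print(apply_increment_field(regs, 0b10_00_00_01))  # EAX incremented, EDI decremented
```

Because one byte covers four registers, a single instruction can advance several pointers at once. With an assignment that includes EAX and EBX, one such byte could stand in for both INC instructions of Table 1.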

Turning now to Fig. 2, one embodiment of a computer system 40 is shown. Computer system 40 includes a microprocessor 42, a bus bridge 44, a main memory 46, and a plurality of input/output (I/O) devices 48A-48N (collectively referred to as I/O devices 48). A system bus 50 couples microprocessor 42, bus bridge 44, and main memory 46. I/O devices 48A-48N are coupled to bus bridge 44 via an I/O bus 52.

Generally speaking, microprocessor 42 includes circuitry for executing instructions such as instruction 10 shown in Fig. 1. In particular, microprocessor 42 includes hardware for executing an instruction to produce a result and further to concurrently increment a source register operand. Instructions which both produce a result and increment a source operand may be beneficial in a large number of programs, particularly DSP programs. Since DSP programs often include repetitive mathematical operations performed upon memory operands stored in a regular fashion within main memory 46 (such that the index operand used to form the memory address is often incremented after executing the mathematical operation), the number of instructions necessary to perform the DSP program may be advantageously reduced. The program may be smaller in size, and may execute more quickly due to the reduced number of instructions. Microprocessor 42 generally executes instructions and operates upon data stored in main memory 46, and may additionally communicate with I/O devices 48. In one embodiment, microprocessor 42 employs the x86 microprocessor architecture including the improved instruction encoding shown in Fig. 1.

Bus bridge 44 is provided to assist in communications between I/O devices 48 and devices coupled to system bus 50. I/O devices 48 typically require longer bus clock cycles than microprocessor 42 and other devices coupled to system bus 50. Therefore, bus bridge 44 provides a buffer between system bus 50 and input/output bus 52. Additionally, bus bridge 44 translates transactions from one bus protocol to another. In one embodiment, input/output bus 52 is an Enhanced Industry Standard Architecture (EISA) bus and bus bridge 44 translates from the system bus protocol to the EISA bus protocol. In another embodiment, input/output bus 52 is a Peripheral Component Interconnect (PCI) bus and bus bridge 44 translates from the system bus protocol to the PCI bus protocol. It is noted that many variations of system bus protocols exist. Microprocessor 42 may employ any suitable system bus protocol.

I/O devices 48 provide an interface between computer system 40 and other devices external to the computer system. Exemplary I/O devices include a modem, a serial or parallel port, a sound card, etc. I/O devices 48 may also be referred to as peripheral devices. Main memory 46 stores data and instructions for use by microprocessor 42. In one embodiment, main memory 46 includes at least one Dynamic Random Access Memory (DRAM) cell and a DRAM memory controller.

It is noted that although computer system 40 as shown in Fig. 2 includes one microprocessor, other embodiments of computer system 40 may include multiple microprocessors similar to microprocessor 42.

Similarly, computer system 40 may include multiple bus bridges 44 for translating to multiple dissimilar or similar I/O bus protocols. Still further, a cache memory for enhancing the performance of computer system 40 by storing instructions and data referenced by microprocessor 42 in a faster memory storage may be included. The cache memory may be inserted between microprocessor 42 and system bus 50, or may reside on system bus 50 in a "lookaside" configuration.

It is still further noted that the present discussion may refer to the assertion of various signals. As used herein, a signal is "asserted" if it conveys a value indicative of a particular condition. Conversely, a signal is "deasserted" if it conveys a value indicative of a lack of a particular condition. A signal may be defined to be asserted when it conveys a logical zero value or, conversely, when it conveys a logical one value.

Turning now to Fig. 3, a block diagram of one embodiment of microprocessor 42 is shown. Microprocessor 42 includes a bus interface unit 60, an instruction cache 62, a data cache 64, an instruction decode unit 66, a plurality of execute units including execute units 68A and 68B, a load/store unit 70, a reorder buffer 72, and a register file 74. The plurality of execute units will be collectively referred to herein as execute units 68, and may include more execute units than execute units 68A and 68B shown in Fig. 3. Additionally, an embodiment of microprocessor 42 may include one execute unit 68. Bus interface unit 60 is coupled to instruction cache 62, data cache 64, and system bus 50. Instruction cache 62 is coupled to instruction decode unit 66, which is further coupled to execute units 68, reorder buffer 72, and load/store unit 70. Reorder buffer 72, execute units 68, and load/store unit 70 are each coupled to a result bus 78 for forwarding of execution results. Load/store unit 70 is coupled to data cache 64.

Generally speaking, instruction decode unit 66 is configured to decode instructions, including instructions having increment field 20. When an instruction is detected which includes increment field 20, an indication that the index source operand is to be incremented in addition to producing the result defined by the opcode of the instruction is conveyed with the instruction. Another unit within microprocessor 42 performs the increment of the source operand. In one embodiment, reorder buffer 72 allocates a storage location for the incremented source operand and performs the increment. When the instruction's results are stored into register file 74, the incremented source operand is stored into register file 74 as well. In another embodiment, execute units 68 include hardware for performing the increment in parallel with executing the instruction. Both results are conveyed from the execute unit 68 to reorder buffer 72 simultaneously. In yet another embodiment, register file 74 performs the increment. Advantageously, microprocessor 42 is capable of executing instruction 10. Performance of many programs, including DSP programs, may be enhanced.

Instruction cache 62 is a high-speed cache memory for storing instructions. It is noted that instruction cache 62 may be configured into a fully associative, set-associative, or direct-mapped configuration. Instruction cache 62 may additionally include a branch prediction mechanism for predicting branch instructions as either taken or not taken. Instructions are fetched from instruction cache 62 and conveyed to instruction decode unit 66 for decode and dispatch to an execute unit 68.
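The embodiment in which reorder buffer 72 performs the increment can be sketched as a toy reorder buffer. When increment field 20 is present, it allocates a second storage location for the incremented index register and fills that location at dispatch. The sketch assumes the index register's value is already available at dispatch (a real design would also have to handle a pending tag). The class, method and register names are hypothetical, including "ST0" as a stand-in destination for the loaded value. Retirement writes both locations to the register file in program order.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    tag: int
    dest: str            # register written when the entry retires
    value: object = None  # None until the value is known


class ReorderBuffer:
    def __init__(self):
        self.entries = []  # program order, oldest first
        self._next_tag = 0

    def _allocate(self, dest):
        entry = RobEntry(self._next_tag, dest)
        self._next_tag += 1
        self.entries.append(entry)
        return entry

    def dispatch(self, dest, index_reg=None, index_value=None):
        """Allocate the result entry; with an increment field, also allocate
        and immediately fill an entry holding the incremented index value."""
        result = self._allocate(dest)
        if index_reg is not None:
            bumped = self._allocate(index_reg)
            bumped.value = (index_value + 1) & 0xFFFFFFFF
        return result.tag

    def complete(self, tag, value):
        """Record a result arriving on the result bus."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value

    def retire(self, register_file):
        """Write completed entries to the register file in program order."""
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.pop(0)
            register_file[entry.dest] = entry.value


rob = ReorderBuffer()
register_file = {"EAX": 7, "ST0": 0.0}
tag = rob.dispatch("ST0", index_reg="EAX", index_value=register_file["EAX"])
rob.retire(register_file)  # nothing retires: the load result is still pending
rob.complete(tag, 2.5)     # the load result arrives
rob.retire(register_file)  # both entries now retire, in program order
print(register_file)       # {'EAX': 8, 'ST0': 2.5}
```

Because the increment entry sits behind the result entry, the new EAX value cannot reach the register file before the load itself retires.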

As noted above, instruction decode unit 66 decodes instructions. As used herein, "decoding" refers to transforming the instruction from the format shown as instruction 10 into a second format expected by execute units 68. Often, the second format comprises decoded control signals for controlling data flow elements such as adders and multiplexors in order to perform the operation the instruction defines. In the embodiment shown, instruction decode unit 66 decodes each instruction fetched from instruction cache 62. Instruction decode unit 66 dispatches the instruction to execute units 68 and/or load/store unit 70. Instruction decode unit 66 also detects the register operands used by the instruction and requests these operands from reorder buffer 72 and register file 74. In one embodiment, execute units 68 are symmetrical execution units. Symmetrical execution units are each configured to execute a particular subset of the instruction set employed by microprocessor 42. The subsets of the instruction set executed by each of the symmetrical execution units are the same. In another embodiment, execute units 68 are asymmetrical execution units configured to execute dissimilar instruction subsets. For example, execute units 68 may include a branch execute unit for executing branch instructions, one or more arithmetic/logic units for executing arithmetic and logical instructions, and one or more floating point units for executing floating point instructions. Instruction decode unit 66 dispatches an instruction to an execute unit 68 or load/store unit 70 which is configured to execute that instruction.

Load/store unit 70 provides an interface between execute units 68 and data cache 64. Load and store memory operations are performed by load/store unit 70 to data cache 64. Additionally, memory dependencies between load and store memory operations are detected and handled by load/store unit 70.

Execute units 68 and load/store unit 70 may include one or more reservation stations for storing instructions whose operands have not yet been provided. An instruction is selected from those stored in the reservation stations for execution if: (1) the operands of the instruction have been provided, and (2) the instructions which are prior to the instruction being selected have not yet received operands. It is noted that a centralized reservation station may be included instead of separate reservation stations. The centralized reservation station is coupled between instruction decode unit 66, execute units 68, and load/store unit 70. Such an embodiment may perform the dispatch function within the centralized reservation station.
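Taken together, the two selection conditions amount to choosing the oldest instruction in a reservation station whose operands have all been provided. The minimal sketch below represents each waiting instruction as a hypothetical (name, pending operand count) pair, listed oldest first. The function name is also an illustrative assumption.

```python
def select_for_execution(station):
    """Return the name of the oldest ready instruction, or None if none is ready.

    `station` lists (name, pending_operands) pairs in program order, oldest first.
    """
    for name, pending_operands in station:
        if pending_operands == 0:  # condition (1): operands provided
            return name            # condition (2): every older entry is still waiting
    return None


station = [("FMULP", 1), ("DEC ECX", 0), ("FADDP", 0)]
print(select_for_execution(station))  # DEC ECX
```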

Microprocessor 42 supports out-of-order execution, and employs reorder buffer 72 for storing execution results of speculatively executed instructions and for storing these results into register file 74 in program order; for performing dependency checking and register renaming; and for providing for mispredicted branch and exception recovery. When an instruction is decoded by instruction decode unit 66, requests for register operands are conveyed to reorder buffer 72 and register file 74. In response to the register operand requests, one of three values is transferred to the execute unit 68 and/or load/store unit 70 which receives the instruction: (1) the value stored in reorder buffer 72, if the value has been speculatively generated; (2) a tag identifying a location within reorder buffer 72 which will store the result, if the value has not been speculatively generated; or (3) the value stored in the register within register file 74, if no instructions within reorder buffer 72 modify the register. Additionally, a storage location within reorder buffer 72 is allocated for storing the results of the instruction being decoded by instruction decode unit 66. The storage location is identified by a tag, which is conveyed to the unit receiving the instruction. It is noted that, if more than one reorder buffer storage location is allocated for storing results corresponding to a particular register, the value or tag corresponding to the last result in program order is conveyed in response to a register operand request for that particular register. Tags and/or operand values are conveyed upon an operand tags/value bus 92.
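The three possible responses to a register operand request, together with the rule that the last result in program order wins, can be expressed as a lookup that scans the reorder buffer from youngest to oldest entry. In this sketch the reorder buffer is a hypothetical list of (destination, tag, value) tuples in program order, with value None while the result is still outstanding. The function name is likewise illustrative.

```python
def request_operand(reg, rob_entries, register_file):
    """Return ("value", v) or ("tag", t) in response to a register operand request."""
    for dest, tag, value in reversed(rob_entries):  # youngest writer first
        if dest == reg:
            if value is not None:
                return ("value", value)  # case (1): speculatively generated
            return ("tag", tag)          # case (2): result still pending
    return ("value", register_file[reg])  # case (3): no pending writer


rob_entries = [("EAX", 0, 5), ("EBX", 1, None), ("EAX", 2, None)]
register_file = {"EAX": 1, "EBX": 2, "ECX": 9}
print(request_operand("EAX", rob_entries, register_file))  # ('tag', 2)
print(request_operand("ECX", rob_entries, register_file))  # ('value', 9)
```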

When execute units 68 or load/store unit 70 execute an instruction, the tag assigned to the instruction by reorder buffer 72 is conveyed upon result bus 78 along with the result of the instruction. Reorder buffer 72 stores the result in the indicated storage location. Additionally, execute units 68 and load/store unit 70 compare the tags conveyed upon result bus 78 with tags of operands for instructions stored therein. If a match occurs, the unit captures the result from result bus 78 and stores it with the corresponding instruction. In this manner, an instruction may receive the operands it is intended to operate upon. Capturing results from result bus 78 for use by instructions is referred to as "result forwarding".
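Result forwarding is a tag comparison performed against every waiting operand. The minimal Python sketch below models one result-bus transfer; the `WaitingInstr` record and its per-operand `tags`/`values` lists are hypothetical stand-ins for reservation station contents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaitingInstr:
    name: str
    tags: List[Optional[int]]    # pending tag per operand (None = captured)
    values: List[Optional[int]]  # operand values once known


def forward_result(stations: List[WaitingInstr], tag: int, value: int) -> None:
    """Snoop one result-bus transfer and capture it wherever the tag matches."""
    for instr in stations:
        for i, pending in enumerate(instr.tags):
            if pending == tag:
                instr.values[i] = value
                instr.tags[i] = None


def ready(instr: WaitingInstr) -> bool:
    """An instruction is ready once no operand is still waiting on a tag."""
    return all(t is None for t in instr.tags)
```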

Instruction results are stored into register file 74 by reorder buffer 72 in program order. Storing the results of an instruction and deleting the instruction from reorder buffer 72 is referred to as "retiring" the instruction. By retiring the instructions in program order, recovery from incorrect speculative execution may be performed. For example, if an instruction is subsequent to a branch instruction whose taken/not taken prediction is incorrect, then the instruction may be executed incorrectly. When a mispredicted branch instruction or an instruction which causes an exception is detected, reorder buffer 72 discards the instructions subsequent to that instruction. Instructions thus discarded are also flushed from execute units 68, load/store unit 70, and instruction decode unit 66.
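In-order retirement and discard-on-mispredict can be sketched with a double-ended queue. This minimal Python sketch assumes hypothetical dictionary entries with `tag`, `dest`, `result`, and `done` keys, ordered oldest-first; it covers only the reorder buffer side, not the flushing of other units.

```python
from collections import deque
from typing import Deque, Dict, List


def retire(rob: Deque[dict], regfile: Dict[str, int]) -> List[int]:
    """Retire finished entries from the head, strictly in program order."""
    retired = []
    while rob and rob[0]["done"]:
        entry = rob.popleft()
        if entry["dest"] is not None:   # e.g. branches write no register
            regfile[entry["dest"]] = entry["result"]
        retired.append(entry["tag"])
    return retired


def flush_after(rob: Deque[dict], tag: int) -> None:
    """Discard every entry younger than the mispredicted/faulting entry `tag`.

    Assumes `tag` is present in `rob`; otherwise the whole buffer is emptied.
    """
    while rob and rob[-1]["tag"] != tag:
        rob.pop()
```

A completed but younger entry stays in the buffer until every older entry has finished, which is what keeps register file 74 free of speculative state.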

Register file 74 includes storage locations for each register defined by the microprocessor architecture employed by microprocessor 42. For example, microprocessor 42 may employ the x86 microprocessor architecture. For such an embodiment, register file 74 includes locations for storing the EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP register values.

Data cache 64 is a high speed cache memory configured to store data to be operated upon by microprocessor 42. It is noted that data cache 64 may be configured into a fully associative, set-associative or direct-mapped configuration.

Bus interface unit 60 is configured to effect communication between microprocessor 42 and devices coupled to system bus 50. For example, instruction fetches which miss instruction cache 62 may be transferred from main memory 46 by bus interface unit 60. Similarly, data requests performed by load/store unit 70 which miss data cache 64 may be transferred from main memory 46 by bus interface unit 60. Additionally, data cache 64 may discard a cache line of data which has been modified by microprocessor 42; bus interface unit 60 transfers the modified line to main memory 46.

It is noted that instruction decode unit 66 may be configured to dispatch an instruction to more than one execution unit. For example, in embodiments of microprocessor 42 which employ the x86 microprocessor architecture, certain instructions may operate upon memory operands. Executing such an instruction involves transferring the memory operand from data cache 64, executing the instruction, and transferring the result to memory (if the destination operand is a memory location). Load/store unit 70 performs the memory transfers, and an execute unit 68 performs the execution of the instruction. It is further noted that instruction decode unit 66 may be configured to decode multiple instructions per clock cycle. In one embodiment, instruction decode unit 66 is configured to decode and dispatch up to one instruction per execute unit 68 and load/store unit 70 per clock cycle.

Turning now to Fig. 4, an embodiment of reorder buffer 72 which implements the incrementing of source operands is shown. The embodiment shown in Fig. 4 may be suitable for an embodiment of microprocessor 42 in which reorder buffer 72 is configured to increment source operands. Reorder buffer 72 includes a control unit 80, an instruction buffer 82, and a plurality of incrementor circuits 84 (including incrementor circuits 84A, 84B, and 84C). Control unit 80 receives destination operands, source operands, and increment indications from instruction decode unit 66 upon destination operands bus 86, source operands bus 88, and increment bus 90, respectively.

Additionally, results of executing instructions are received from execute units 68 and load/store unit 70 upon result bus 78. Operand tags and values are conveyed in response to source and destination operands upon operand tags/values bus 92.

Instruction buffer 82 includes a plurality of storage locations 83 (such as storage locations 83A, 83B, and 83C) for storing instruction execution results. In one embodiment, instruction buffer 82 includes storage locations 83 having sufficient storage space for a destination value as well as an incremented source value. Such an embodiment may thereby store modifications to each register operand modified by a particular instruction. The information stored in each storage location 83 according to one embodiment is shown below. In another embodiment, multiple source operands may be incremented (e.g. increment identification field 39 shown in Fig. 1C). Storage locations 83 include storage sufficient for the multiple incremented source operands when employed in such embodiments.

Destination operands bus 86 and source operands bus 88 transmit the source and destination register operands to control unit 80. Control unit 80 provides operand values or tags for source operands if those operands are stored within instruction buffer 82; otherwise, the operand values are provided by register file 74. In addition, a storage location within instruction buffer 82 is allocated for each instruction being dispatched. In one embodiment, tags for the destination operands are allocated according to the storage locations allocated. The tag, therefore, identifies the storage location within instruction buffer 82 which is allocated to store the instruction result. These tags and/or operand values are conveyed upon operand tags/values bus 92 to execute units 68 and load/store unit 70. The receiving units associate the operands with their respective instructions. In one embodiment, control unit 80 maintains head and tail pointers indicating the first and last instructions (in program order) within instruction buffer 82. Tags are allocated using the tail pointer, and the tail pointer is modified to allocate the required number of storage locations. As noted above, the tags are conveyed along with results upon result bus 78 when a unit executes an instruction. The tag identifies the storage location which stores the result.
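Head/tail allocation can be sketched as a circular buffer in which a tag is simply a storage location index. This minimal Python sketch assumes a hypothetical buffer size and that dispatch stalls (returns `None`) when too few locations are free; the patent does not specify either detail.

```python
from typing import List, Optional


class InstructionBuffer:
    """Circular buffer of storage locations; a tag is a location index."""

    def __init__(self, size: int = 16):
        self.size = size
        self.head = 0   # oldest instruction in program order
        self.tail = 0   # next location to allocate
        self.count = 0

    def allocate(self, n: int = 1) -> Optional[List[int]]:
        """Allocate n locations at the tail; None means the buffer is full."""
        if self.count + n > self.size:
            return None
        tags = [(self.tail + i) % self.size for i in range(n)]
        self.tail = (self.tail + n) % self.size
        self.count += n
        return tags

    def retire_head(self) -> int:
        """Free the oldest location and return its tag."""
        if self.count == 0:
            raise IndexError("instruction buffer is empty")
        tag = self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return tag
```

With four locations, allocating three yields tags 0, 1, 2; after the head retires, a two-location allocation wraps around to tags 3 and 0.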

The increment indication associated with a particular instruction causes control unit 80 to allocate a tag for the source operand to be incremented, as well as for the destination operand for storing execution results. In this manner, tags are allocated for each register which is modified by the instruction. The tag, therefore, additionally identifies whether a particular value is the result of an instruction or is the source operand which is incremented. In one embodiment, the tag comprises a plurality of bits indicative of one of storage locations 83 and a bit indicative of whether the associated value is an incremented operand value or a result value. As with the tags for destination operands, if a subsequent instruction accesses the incremented source operand prior to performance of the increment, then the tag assigned to the source operand is provided by reorder buffer 72.

Reorder buffer 72 is configured to broadcast incremented operands along with their respective tags upon result bus 78 when the increment is performed, such that instructions awaiting the incremented operand may receive the incremented value.
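The tag format of the one embodiment (location bits plus one "incremented operand" bit) can be sketched as simple bit packing. This minimal Python sketch assumes a hypothetical 4-bit location field (16 storage locations 83) with the extra bit placed above it; the actual widths and bit position are not given in the text.

```python
LOCATION_BITS = 4  # hypothetical: 16 storage locations 83


def make_tag(location: int, incremented: bool) -> int:
    """Pack a storage-location index plus the 'incremented operand' bit."""
    if not 0 <= location < (1 << LOCATION_BITS):
        raise ValueError("location out of range")
    return (int(incremented) << LOCATION_BITS) | location


def split_tag(tag: int) -> tuple:
    """Inverse of make_tag: (location, is_incremented_operand)."""
    return tag & ((1 << LOCATION_BITS) - 1), bool(tag >> LOCATION_BITS)
```

Under these assumptions, the result of the instruction in location 5 carries tag 5, while its incremented source operand, broadcast separately on result bus 78, carries tag 21.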

If reorder buffer 72 is storing the source operand, then the source operand is conveyed to the unit receiving the instruction. Additionally, the source operand is copied into the storage location which stores results for the corresponding instruction. One of incrementor circuits 84 subsequently increments the source operand, producing the incremented source operand which is to be stored when the associated instruction is retired. Alternatively, if reorder buffer 72 is storing a tag indicative of the source operand, that tag is stored into the storage location of the corresponding instruction. When results are provided upon result bus 78, those results are stored into the indicated storage location and into the storage location of the instruction which increments the operand value. When results are provided, control unit 80 searches each currently allocated storage location to determine if another instruction is to increment the result (which that instruction is concurrently using as a source operand). As described above, one of incrementor circuits 84 subsequently increments the supplied value to form the incremented operand value.

If reorder buffer 72 is not storing information regarding a source operand, then register file 74 stores the source operand. For this case, reorder buffer 72 captures the source operand as it is provided to the execute unit by register file 74. Reorder buffer 72 increments the source operand and stores the incremented source operand within instruction buffer 82. In one embodiment, increment bus 90 includes a bit for each instruction which may be concurrently dispatched by instruction decode unit 66. The bit is indicative, when set, that the associated instruction increments a source operand. In one particular embodiment, the incremented source operand comprises the index operand. Control unit 80 stores results received upon result bus 78 into the indicated storage locations within instruction buffer 82. When an instruction is indicated to be the first instruction in program order and its results have been calculated, the results are stored into register file 74 and the instruction is deleted from instruction buffer 82. Other functions of reorder buffer 72 (such as discarding instructions when a mispredicted branch has been detected) are not shown in Fig. 4. Such functions are well known, and any suitable mechanism for performing these functions may be employed by reorder buffer 72. It is noted that, in one embodiment, incrementor circuits 84 may be coupled between control unit 80 and instruction buffer 82. Such an embodiment increments source operands as they are stored into the corresponding storage location.
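The paths by which reorder buffer 72 obtains the value to increment (a value held in the reorder buffer or read from register file 74, versus a tag that must later be matched on result bus 78) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the increment amount `STEP` is hypothetical (the amount is not specified in this passage), operands arrive as hypothetical `("value", v)` or `("tag", t)` pairs, and 32-bit wraparound mirrors the 32-bit increment field 106.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STEP = 1             # hypothetical increment amount
MASK32 = 0xFFFFFFFF  # increment field 106 holds 32 bits


@dataclass
class IncrementSlot:
    reg: str                         # register to be incremented
    wait_tag: Optional[int] = None   # tag of the pending source value
    value: Optional[int] = None      # incremented source operand
    incremented: bool = False


def on_dispatch(slot: IncrementSlot, operand: Tuple[str, int]) -> None:
    """operand is ("value", v) from the reorder buffer or register file,
    or ("tag", t) when the source value has not been produced yet."""
    kind, payload = operand
    if kind == "value":
        slot.value = (payload + STEP) & MASK32
        slot.incremented = True
    else:
        slot.wait_tag = payload


def on_result_bus(slots: List[IncrementSlot], tag: int, value: int) -> None:
    """Check every allocated slot against a broadcast result, as control unit 80 does."""
    for slot in slots:
        if not slot.incremented and slot.wait_tag == tag:
            slot.value = (value + STEP) & MASK32
            slot.incremented = True
```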

Although the above description discusses a single source operand being incremented per instruction, it is noted that embodiments of reorder buffer 72 are contemplated which increment multiple source operands per instruction. Such a reorder buffer may be suitable for use with increment field 20 as shown in Fig. 1C.

Turning now to Fig. 4A, one embodiment of a storage location 83 within instruction buffer 82 is shown. Storage location 83 includes a plurality of fields for storing information related to a particular instruction. Fields included are a valid field 100, a result field 102, a result valid field 104, an increment field 106, an increment valid field 108, an incremented field 110, a destination register field 112, an increment tag and register field 114, and a miscellaneous field 116.

Increment field 106, increment valid field 108, incremented field 110, and increment tag and register field 114 are provided to support the incrementing of a source operand by an instruction. Increment field 106 stores the incremented source operand. In one embodiment, increment field 106 comprises 32 bits for storing a 32 bit incremented source operand. Increment valid field 108 is indicative of whether or not the instruction represented by storage location 83 increments a source operand. In one embodiment, increment valid field 108 comprises a bit indicative, when set, that the instruction specifies incrementation of a source operand.

Incremented field 110 indicates that the increment has been performed (i.e. that the source operand has been received and that one of incrementor circuits 84 has incremented the value). In one embodiment, incremented field 110 comprises a bit indicative, when set, that increment field 106 has been incremented. Finally, increment tag and register field 114 stores a value indicative of the register corresponding to the source operand which is to be incremented, as well as the tag corresponding to the most recent update of the register. The indication of the register is used when the instruction is retired to identify the location within register file 74 to update. The tag is used by control unit 80 to compare against the results upon result bus 78, in order to capture the source operand, increment the operand, and store the incremented operand in increment field 106.

Valid field 100 comprises a bit indicative, when set, that the storage location is storing valid information.

Result field 102 stores the result of executing the instruction. In one embodiment, result field 102 comprises 32 bits. Result valid field 104 is indicative of the validity of result field 102. In one embodiment, result valid field 104 comprises a bit indicative, when set, that result field 102 is storing a result (i.e. the corresponding instruction has been executed). The register into which the destination operand is to be stored is identified by destination register field 112, which comprises 3 bits in one embodiment. The three bits identify one of the eight x86 registers. Miscellaneous information, such as the instruction type or an indication that the instruction is a mispredicted branch, is stored in miscellaneous field 116.

According to the embodiment of storage location 83 shown in Fig. 4A, an instruction may be retired if:

(1) result valid field 104 indicates that the result of executing the instruction has been provided, and (2a) incremented field 110 indicates that the source operand has been incremented, or (2b) increment valid field 108 indicates that the instruction is not encoded for incrementing the source operand.
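The fields of storage location 83 and the retirement condition can be collected into one record. This minimal Python sketch mirrors the Fig. 4A field list with hypothetical attribute names; widths are noted in comments only, since Python does not enforce them.

```python
from dataclasses import dataclass


@dataclass
class StorageLocation:
    """Fields of one storage location 83 (reference numerals from Fig. 4A)."""
    valid: bool = False            # valid field 100
    result: int = 0                # result field 102 (32 bits)
    result_valid: bool = False     # result valid field 104
    increment: int = 0             # increment field 106 (32 bits)
    increment_valid: bool = False  # increment valid field 108
    incremented: bool = False      # incremented field 110
    dest_reg: int = 0              # destination register field 112 (3 bits)
    inc_tag_and_reg: int = 0       # increment tag and register field 114
    misc: int = 0                  # miscellaneous field 116

    def can_retire(self) -> bool:
        """Condition (1) and either (2a) or (2b)."""
        return self.result_valid and (self.incremented or not self.increment_valid)
```

An incrementing instruction whose result is ready therefore waits until its increment has also been performed, so both register updates are committed together.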

Turning now to Fig. 5, an embodiment of execute unit 68A which implements the incrementing of source operands is shown. Other execute units 68 may be configured similarly. Execute units 68 configured in the manner of Fig. 5 may be included within microprocessor 42 as a second embodiment of microprocessor 42 which executes instructions such as instruction 10. Execute unit 68A receives operands and control signals comprising a decoded instruction upon operands bus 120 and control bus 122. Operands bus 120 and control bus 122 emanate from instruction decode unit 66 or from reservation stations included within or near execute unit 68A. Execute unit 68A produces an instruction result upon result1 bus 78A and produces an incremented source operand upon result2 bus 78B. Result1 bus 78A and result2 bus 78B form part of result bus 78.

Execute unit 68A includes an arithmetic/logic unit (ALU) 124 which performs the arithmetic and logic operations for which execute unit 68A is configured. The result defined by the instruction is produced by ALU 124 by operating upon the operands provided on operands bus 120 under the control of control bus 122. The result is conveyed upon result1 bus 78A, to which ALU 124 is coupled. Additionally, execute unit 68A includes an incrementor circuit 126 for incrementing a source operand (i.e. the index operand in this embodiment) if an increment control signal within control bus 122 is asserted. The incremented source operand is conveyed upon result2 bus 78B, along with a tag indicative of the reorder buffer storage location which is allocated to store the incremented source operand.

It is noted that multiple incrementor circuits similar to incrementor circuit 126 may be included in other embodiments of execute unit 68A. These embodiments may be employed when the definition of increment field 20 shown in Fig. 1C is employed. It is further noted that the embodiment shown in Fig. 5 may be used with a reorder buffer 72 which is configured to accept more than one result from a particular execution unit during a clock cycle. The instruction may be allocated two storage locations within reorder buffer 72, or reorder buffer 72 may include storage for two results within each storage location, similar to Fig. 4A.
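The dual-output behavior of execute unit 68A, an ALU result plus an optional tagged increment computed in parallel, can be sketched as a function returning two bus values. This minimal Python sketch assumes a hypothetical set of ALU operations, a default step of 1, and 32-bit wraparound; none of these specifics come from the text.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF

ALU_OPS = {  # hypothetical subset of ALU 124 operations
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
}


def execute_unit(op: str, a: int, b: int, increment: bool,
                 index: int, inc_tag: int, step: int = 1
                 ) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (result1, result2).

    result1 models ALU 124's output on result1 bus 78A; result2 models
    incrementor circuit 126's output, paired with its reorder buffer tag,
    on result2 bus 78B (None when the increment signal is deasserted).
    """
    result1 = ALU_OPS[op](a, b)
    result2 = ((index + step) & MASK32, inc_tag) if increment else None
    return result1, result2
```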

Turning now to Fig. 6, an embodiment of register file 74 which may be employed within yet another embodiment of microprocessor 42 is shown. Register file 74 includes a register storage 130 including storage locations for each register defined by the microprocessor architecture employed by microprocessor 42. A result is conveyed to register file 74 upon result bus 78, and a register selection value indicative of the register to store the result is conveyed upon a register selection bus 132. Register storage 130 stores the result into the selected storage location. It is noted that result bus 78 and register selection bus 132 may be configured to convey multiple results and register selections concurrently. Additionally, register selections may be made for accessing operands, which are conveyed upon an operands bus 134 to requesting units.

A plurality of incrementor circuits 136 (including incrementor circuits 136A, 136B, and 136C) may be included within register file 74 for incrementing source operands. Each of incrementor circuits 136 is coupled to one of the storage locations within register file 74. In one embodiment, an increment bus 138 is included for signalling register values which are to be incremented. The incrementor circuit 136 coupled to the storage location which is to be incremented increments the value stored therein and stores the incremented value into the storage location. It is noted that multiple storage locations may be selected for incrementation during a clock cycle.

A register file such as register file 74 shown in Fig. 6 may be advantageously incorporated into a microprocessor which does not perform increments upon source operands, enabling the increment of source operands with minimal changes to the remainder of the microprocessor. For example, a decode unit might be modified to detect the source operand increment field within the instruction and to transmit increment indications upon increment bus 138 to register file 74 after the original operand value is accessed from register file 74. Execute units would not need to be modified in this example, since the increment is performed by register file 74.
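The register-file approach can be sketched as storage with one in-place incrementor per location, driven by a set of selected registers. This minimal Python sketch assumes the eight x86 register names as keys, a default step of 1, and 32-bit wraparound; `increment_bus` is a hypothetical method standing in for increment bus 138.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Register storage 130 with one incrementor per storage location."""

    NAMES = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

    def __init__(self):
        self.regs = dict.fromkeys(self.NAMES, 0)

    def write(self, reg: str, value: int) -> None:
        self.regs[reg] = value & MASK32

    def read(self, reg: str) -> int:
        return self.regs[reg]

    def increment_bus(self, selected, step: int = 1) -> None:
        """Increment every selected location in place (same clock cycle)."""
        for reg in selected:
            self.regs[reg] = (self.regs[reg] + step) & MASK32
```

The ordering matters: the decode unit reads the original operand first and only then signals the increment, so the instruction itself still operates on the pre-increment value.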

Although the x86 instruction set and microprocessor architecture is used above in exemplary embodiments of microprocessor 42, the present invention is not limited to this instruction set. Embodiments comprising other microprocessor architectures are contemplated.

In accordance with the above disclosure, a microprocessor has been described which executes instructions specifying increment of a source operand in addition to producing the result of the instruction.

Advantageously, instruction sequences which use source operands as an address of a memory operand and then increment that operand may be shortened by including the increment in the instruction which accesses the memory operand. Instruction sequences which perform such manipulations frequently, such as DSP instruction sequences, may achieve increased performance because the increment is performed in parallel with the memory access and fewer execution resources are required to perform the operation.
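The shortening can be illustrated by counting loop-body instructions for an array sum addressed as base plus index. This is a minimal Python sketch, not a cycle-accurate model: it assumes word-indexed memory (index step of 1), counts one executed instruction per commented operation, and omits loop-control instructions because they are the same in both versions.

```python
from typing import List, Tuple


def sum_conventional(mem: List[int], base: int, n: int) -> Tuple[int, int]:
    """Load, add, then a separate instruction to increment the index."""
    acc = index = executed = 0
    for _ in range(n):
        value = mem[base + index]  # load [base + index]
        acc += value               # add
        index += 1                 # separate increment instruction
        executed += 3
    return acc, executed


def sum_with_increment(mem: List[int], base: int, n: int) -> Tuple[int, int]:
    """The load itself specifies incrementing its index source operand."""
    acc = index = executed = 0
    for _ in range(n):
        value = mem[base + index]  # load [base + index] ...
        index += 1                 # ... with index increment in the same instruction
        acc += value               # add
        executed += 2
    return acc, executed
```

Both versions compute the same sum; under these assumptions the loop body drops from three instructions to two, freeing an execution slot per iteration.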

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Claims

WHAT IS CLAIMED IS:
1. A microprocessor configured to execute an instruction which specifies modifying a source operand in addition to an instruction operation, comprising:
an instruction decode unit configured to decode said instruction; and
a second unit coupled to receive an indication of said instruction from said instruction decode unit, wherein said second unit is configured to modify said source operand in response to said indication of said instruction, and wherein modification of said source operand is performed in addition to an instruction operation defined by said instruction.
2. The microprocessor as recited in claim 1 wherein said source operand comprises an index operand.
3. The microprocessor as recited in claim 2 wherein said instruction further comprises a base operand, and wherein said microprocessor is configured to add said index operand to said base operand to form the address of a memory operand for use by said instruction.
4. The microprocessor as recited in claim 1 wherein said second unit comprises an execution unit configured to execute said instruction.
5. The microprocessor as recited in claim 4 wherein said execution unit comprises an incrementor circuit configured to increment said source operand.
6. The microprocessor as recited in claim 5 wherein said execution unit further comprises an arithmetic/logic unit configured to perform operations defined by said instruction.
7. The microprocessor as recited in claim 1 wherein said second unit comprises a reorder buffer configured to store instruction execution results until said results are retired.
8. The microprocessor as recited in claim 7 wherein said reorder buffer comprises an incrementor circuit coupled to receive said source operand and to produce an incremented source operand.
9. The microprocessor as recited in claim 8 wherein said reorder buffer further comprises an instruction buffer configured to store a plurality of results including said incremented source operand.
10. The microprocessor as recited in claim 1 wherein said second unit comprises a register file including a plurality of storage locations configured to store instruction execution results.
11. The microprocessor as recited in claim 10 wherein said register file comprises an incrementor circuit coupled to increment one of said instruction execution results in response to said instruction.
12. The microprocessor as recited in claim 11 wherein said register file is coupled to receive an increment indication asserted in response to said instruction.
13. The microprocessor as recited in claim 12 wherein said incrementor circuit is configured to perform an increment according to said increment indication.
14. The microprocessor as recited in claim 1 wherein said second unit is configured to increment multiple source operands in response to said instruction, wherein said instruction specifies increment of multiple source operands.
15. A computer system, comprising:
a microprocessor configured to execute an instruction which specifies incrementing a source operand in addition to an instruction operation; and
a main memory coupled to said microprocessor, wherein said main memory is configured to store said instruction as well as other instructions executed by said microprocessor and operands to be operated upon by said microprocessor.
16. The computer system as recited in claim 15, wherein said microprocessor comprises:
an instruction decode unit configured to decode said instruction; and
a second unit coupled to receive said instruction from said instruction decode unit, wherein said second unit is configured to increment said source operand in response to said instruction, and wherein increment of said source operand is performed in addition to an instruction operation defined by said instruction.
PCT/US1997/001089 1996-02-23 1997-01-23 A microprocessor configured to execute instructions which specify increment of a source operand WO1997031310A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US08/605,870 1996-02-23 1996-02-23

Publications (1)

Publication Number Publication Date
WO1997031310A1 1997-08-28

Family

ID=24425537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/001089 WO1997031310A1 (en) 1996-02-23 1997-01-23 A microprocessor configured to execute instructions which specify increment of a source operand

Country Status (1)

Country Link
WO (1) WO1997031310A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4240142A (en) * 1978-12-29 1980-12-16 Bell Telephone Laboratories, Incorporated Data processing apparatus providing autoincrementing of memory pointer registers
US4616313A (en) * 1983-03-25 1986-10-07 Tokyo Shibaura Denki Kabushiki Kaisha High speed address calculation circuit for a pipeline-control-system data-processor
EP0206653A2 (en) * 1985-06-28 1986-12-30 Hewlett-Packard Company Method and means for loading and storing data in a reduced instruction set computer
EP0227900A2 (en) * 1985-12-02 1987-07-08 International Business Machines Corporation Three address instruction data processing apparatus
EP0230038A2 (en) * 1985-12-20 1987-07-29 Nec Corporation Address generation system
US5261113A (en) * 1988-01-25 1993-11-09 Digital Equipment Corporation Apparatus and method for single operand register array for vector and scalar data processing operations



Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1997903935

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997903935

Country of ref document: EP

122 Ep: pct application non-entry in european phase
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

Ref document number: 97530155

Format of ref document f/p: F