US20020112142A1 - Implementation of a conditional move instruction in an out-of-order processor - Google Patents

Implementation of a conditional move instruction in an out-of-order processor

Info

Publication number
US20020112142A1
Authority
US
United States
Prior art keywords
instruction
instructions
generated
conditional move
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/195,121
Other versions
US6449713B1 (en
Inventor
Joel Springer Emer
Bruce Edwards
Daniel Lawrence Leibholz
Edward J. McLellan
Derrick R. Meyer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Compaq Information Technologies Group LP
Original Assignee
Compaq Information Technologies Group LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US09/195,121 priority Critical patent/US6449713B1/en
Application filed by Compaq Information Technologies Group LP filed Critical Compaq Information Technologies Group LP
Assigned to COMPAQ COMPUTER CORPORATION reassignment COMPAQ COMPUTER CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EDWARDS, BRUCE, EMER, JOEL SPRINGER, MEYER, DERRICK R., MCLELLAN, EDWARD J.
Assigned to DIGITAL EQUIPMENT CORPORATION reassignment DIGITAL EQUIPMENT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LEIBHOLZ, DANIEL LAWRENCE
Assigned to COMPAQ COMPUTER CORPORATION reassignment COMPAQ COMPUTER CORPORATION MERGER (SEE DOCUMENT FOR DETAILS). Assignors: DIGITAL EQUIPMENT CORPORATION
Assigned to COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. reassignment COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: COMPAQ COMPUTER CORPORATION
Publication of US20020112142A1 publication Critical patent/US20020112142A1/en
Publication of US6449713B1 publication Critical patent/US6449713B1/en
Application granted granted Critical
Assigned to LASALLE BANK, N.A. reassignment LASALLE BANK, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SLOAN VALVE COMPANY
Assigned to LASALLE BANK, N.A. reassignment LASALLE BANK, N.A. SECURITY AGREEMENT Assignors: SLOAN VALVE COMPANY
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP reassignment HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Anticipated expiration legal-status Critical
Assigned to SLOAN VALVE COMPANY reassignment SLOAN VALVE COMPANY TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS, TRADEMARKS AND TRADENAMES Assignors: BANK OF AMERICA, N.A. (AS SUCCESSOR-IN-INTEREST TO LASALLE BANK)
Expired - Lifetime legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094Condition code generation, e.g. Carry, Zero flag
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30072Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087Synchronisation or serialisation instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros

Definitions

  • the present invention relates generally to data processing and in particular to techniques for processing a conditional move instruction within a data processor.
  • conditional move instruction instructs a processor to test whether a particular condition exists (e.g., whether a particular register stores zero), and to move information into a destination register if the particular condition exists. If the
  • CMOVXX indicates that the instruction is a conditional move instruction that tests for a condition “XX”.
  • “S_R A ” and “S_R B ” are source operands that respectively identify registers R A and R B .
  • D_R C is a destination operand that identifies register R C .
  • the processor determines whether a condition XX exists involving physical register R A (e.g., whether physical register R A stores zero). If the condition XX exists, the processor moves the contents of physical register R B into physical register R C . Otherwise, the processor leaves the original contents of physical register R C unaltered.
  • instruction source and destination operands typically identify logical registers instead of the physical registers directly.
  • the out-of-order processor maps these logical registers to physical processor registers just before instruction execution such that the result of each instruction is stored in a new physical register. This approach enables the processor to avoid problems when executing instructions out of program order (e.g., read-after-write data hazards).
  • the pseudo-code for executing a CMOVXX instruction in an out-of-order processor is therefore somewhat more complex.
  • the out-of-order processor maps logical register R A to physical register R A1 , logical register R B to physical register R B1 , and logical register R C to physical register R C1 .
  • the out-of-order processor maps logical register R C to physical register R C2 (a new physical register).
  • the pseudo-code for executing the CMOVXX instruction in such a processor is therefore as follows:
  • the out-of-order processor determines whether a condition XX exists involving physical register R A1 (logical register R A ). If the condition XX exists, the processor moves the contents of physical register R B1 (logical register R B ) into physical register R C2 (to which logical register R C presently is mapped). As such, the contents of logical register R B are stored in logical register R C . If the condition XX does not exist, the processor moves the contents of physical register R C1 (to which logical register R C previously was mapped) into physical register R C2 such that a programmer perceives the contents of logical register R C as remaining unaltered.
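  • The difference between the two cases described above can be sketched in a few lines of Python. This is only an illustrative model, not the patent's own pseudo-code: the register file is assumed to be a dictionary, and the function names `cmov_in_order` and `cmov_out_of_order` and the predicate `is_zero` are hypothetical. It shows why the renamed form must read three registers (R A1 , R B1 and R C1 ) while the in-order form reads only two.

```python
# Hypothetical model: a register file is a dict mapping register names to values.

def cmov_in_order(regs, cond, ra, rb, rc):
    """In-order CMOVXX: reads R_A and R_B; R_C is written only if the condition holds."""
    if cond(regs[ra]):
        regs[rc] = regs[rb]
    # Otherwise R_C is simply left alone, so its old value is never read.


def cmov_out_of_order(phys, cond, ra1, rb1, rc1, rc2):
    """Renamed CMOVXX: the new register R_C2 is always written.

    Three source registers are read: R_A1 (condition), R_B1 (new value)
    and R_C1 (old value of logical register R_C).
    """
    phys[rc2] = phys[rb1] if cond(phys[ra1]) else phys[rc1]


def is_zero(value):
    """Example condition 'XX': the tested register stores zero."""
    return value == 0
```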
  • an execution circuit (or unit) of the processor receives instruction data through input ports, and executes the instruction according to the instruction data.
  • an execution unit of an in-order processor may execute the conditional move instruction:
  • R A , R B and R C refer to physical registers within the in-order processor.
  • the execution unit requires only two input ports: a first port to receive the contents of physical register R A , and a second port to receive the contents of physical register R B .
  • an execution unit of an out-of-order processor executes the CMOVXX instruction according to the following pseudo-code:
  • R A1 , R B1 , R C1 and R C2 refer to physical registers within the out-of-order processor.
  • the out-of-order execution unit requires three input ports: a first port to receive the contents of physical register R A1 , a second port to receive the contents of physical register R B1 , and a third port to receive the contents of physical register R C1 .
  • processors that use three input ports to execute instructions.
  • such a processor would require substantial semiconductor resources (e.g., a disproportionately large area for input port routing).
  • processors typically use no more than two input ports to execute non-conditional move instructions. Accordingly, processor designers generally prefer to limit the number of input ports for each instruction to no more than two.
  • CMOVXX instruction within an out-of-order processor uses three input ports.
  • an embodiment of the present invention is directed to a technique for handling a conditional move instruction in an out-of-order data processor.
  • the technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction.
  • the technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions.
  • each of the generated multiple instructions executes using no more than two input ports. As such, it is unnecessary for the processor to use three input ports to execute the instructions.
  • the generation of multiple instructions preferably involves providing a first generated instruction that determines whether a condition exists, and providing a second generated instruction that performs a move operation based on whether the condition exists.
  • the second generated instruction performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.
  • the first move operation loads a new physical register with contents from a specified source register so that, from a programmer's perspective, the processor alters a logical register mapped to the new physical register.
  • the second move operation loads the new physical register with contents of a previously used physical register (to which the logical register was previously mapped) so that, from the programmer's perspective, the processor leaves the logical register unaltered.
  • Instruction generation may involve providing a first generated instruction that produces a condition result, and providing a second generated instruction that (i) inputs the condition result from a first portion of a register that is separate from a second portion that stores standard contents of the register, and (ii) performs an operation according to the first portion.
  • the mechanisms for storing the condition result and the standard contents are treated as a single entity (e.g., a register with an extra bit field to store the condition result) rather than as separate registers.
  • the same circuitry for addressing and accessing the standard portion of the registers can be used to address and access the condition field.
  • This feature allows the processor to transfer the condition result through one of two existing input ports alleviating the need for a third input port to carry the condition result.
  • the processor includes a register file containing instruction registers, each of which has a standard field and a condition field.
  • Prior to detecting the conditional move instruction, instructions may be loaded from memory in groups (e.g., fetch blocks).
  • the technique may further involve retrieving a first group of instructions from a memory during a first fetch period, the first group of instructions including the conditional move instruction.
  • Such a retrieval enables instructions to be loaded using fewer retrieve operations than loading instructions individually.
  • the technique may further include retrieving a second group of instructions from the memory during a second fetch period, the second group following the first group within the instruction stream.
  • the technique may involve retrieving the second group of instructions from the memory again during a third fetch period while the multiple instructions are generated simultaneously. This feature provides an optimization in the sense that retrieval of the second group of instructions during the third fetch period makes the second group of instructions available at a convenient point in the pipeline to receive one of the generated multiple instructions.
  • the technique may involve overwriting the conditional move instruction in the retrieved first group of instructions with one of the generated multiple instructions, and overwriting an instruction following the conditional move instruction in the retrieved first group of instructions with another of the generated multiple instructions.
  • the instruction following the conditional move instruction is preferably a blank instruction that performs no operation when executed. Accordingly, the processor simply modifies the fetch block containing the conditional move instruction without affecting a subsequent fetch block.
  • FIG. 1 is a block diagram of an instruction pipeline for a data processor in which the present invention may be used.
  • FIG. 2 is a block diagram of a portion of an instruction fetch stage of FIG. 1 that detects a conditional move instruction within an instruction stream.
  • FIG. 3 is a block diagram of a portion of the instruction fetch stage of FIG. 1 that generates multiple instructions according to the detected conditional move instruction, and replaces the conditional move instruction within the instruction stream with the generated multiple instructions.
  • FIG. 4A is a block diagram of instructions before and after being handled in a first manner by the instruction fetch stage of FIG. 1.
  • FIG. 4B is a block diagram of instructions before and after being handled in a second manner by the instruction fetch stage of FIG. 1.
  • FIG. 5 is a flow diagram of a procedure performed by the instruction fetch stage of FIG. 1.
  • FIG. 6 is a block diagram of execution circuitry within an instruction execution stage of FIG. 1 that executes the generated multiple instructions.
  • the present invention involves detecting a conditional move instruction within an instruction stream, and replacing it with multiple replacement instructions such that a data processor processing the instruction stream executes the multiple replacement instructions rather than the original conditional move instruction.
  • the data processor uses no more than two input ports when executing each of the multiple instructions so that additional processor resources (e.g. a third input port for each instruction) are unnecessary.
  • the invention is preferably used in an instruction pipeline of a speculative execution out-of-order data processor such as the pipeline 10 shown in FIG. 1.
  • the pipeline 10 has a series of stages including an instruction fetch stage 12 , an instruction slot stage 14 , an instruction map stage 16 , an instruction issue/queue stage 18 , an instruction read stage 20 , an instruction execution stage 22 , an instruction write stage 24 , and an instruction retire stage 26 .
  • the pipeline 10 processes a stream of instructions 28 .
  • the instruction fetch stage 12 retrieves the instructions from memory.
  • the instruction slot stage 14 determines to which execution unit the instructions should be sent, e.g., a floating point unit or an integer unit (not shown).
  • the instruction map stage 16 maps the instructions such that the instructions refer to physical registers rather than logical registers.
  • the instruction issue/queue stage 18 queues the instructions for execution.
  • the instruction read stage 20 reads data used by the instructions from the physical registers.
  • the instruction execution stage 22 executes the instructions.
  • the instruction write stage 24 stores results of the executed instructions in the physical registers.
  • the instruction retire stage 26 retires the instructions by committing the processor state to the results of the instructions.
  • FIG. 2 shows a circuit portion 30 of the instruction fetch stage 12 that retrieves the instructions 28 from a memory 32 (e.g., main memory or a second-level cache), and temporarily stores the retrieved instructions 28 in an instruction cache (or ICACHE) 34 .
  • the circuit portion 30 includes a program counter circuit 36 and a detect circuit 38 .
  • the program counter circuit 36 provides program counter information (e.g., a FILL PC pointer) identifying locations within the memory 32 that store instructions to be retrieved.
  • the detect circuit 38 reads instructions from the memory 32 based on the program counter information, scans the retrieved instructions for any conditional move instructions (e.g., CMOVXX), and stores the instructions and scan results in the instruction cache 34 .
  • the detect circuit 38 groups the instructions into fetch blocks (e.g., fetch block 40 ), generates a conditional move code for each fetch block (e.g., conditional move code 42 ) indicating the locations of any conditional move instructions within that fetch block, and stores each fetch block and its corresponding conditional move code as an entry of the instruction cache 34 (e.g., entry 44 ).
  • conditional move code 42 has the binary value “0100” to indicate that the second instruction of fetch block 40 is a conditional move instruction, as shown in FIG. 2.
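  • A minimal sketch of how such a conditional move code could be derived is given here. It assumes a fetch block is represented as a list of mnemonic strings and that any mnemonic beginning with "CMOV" counts as a conditional move; the function name `conditional_move_code` and the example mnemonics are illustrative assumptions, not taken from the patent.

```python
def conditional_move_code(fetch_block):
    """Return one bit per instruction slot, '1' where the slot holds a conditional move.

    fetch_block is a list of mnemonic strings (an assumed representation).
    """
    return "".join("1" if insn.startswith("CMOV") else "0" for insn in fetch_block)


# A four-instruction fetch block whose second instruction is a conditional move.
example_block = ["INSTA", "CMOVXX", "INSTB", "INSTC"]
example_code = conditional_move_code(example_block)
```

    With the example block above, the resulting code has a single set bit in the second position, matching the "0100" value described for fetch block 40 .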
  • Each entry within the instruction cache 34 further includes cache tag information (e.g., TAG) indicating whether that entry is valid or invalid.
  • FIG. 3 shows a circuit portion 50 of the instruction fetch stage 12 that provides instructions from the instruction cache 34 to other circuits in the pipeline 10 . If a conditional move instruction exists within the instructions, the circuit portion 50 generates multiple instructions according to the conditional move instruction, and replaces the conditional move instruction with the generated multiple instructions.
  • the circuit portion 50 includes a PC latch 52 , a PC multiplexer 54 , a PC silo 56 , an instruction latch 58 , an instruction sequencer 60 , an instruction counter 62 , and conditional move logic (or CMOVXX logic) 64 .
  • the PC latch 52 , the PC multiplexer 54 and PC silo 56 (hereinafter generally referred to as PC circuitry) operate to provide program counter information identifying instruction cache entries (e.g., entry 44 ) to be transferred out of the instruction cache 34 .
  • the instruction latch 58 holds the fetch blocks from the identified entries, and provides them to the CMOVXX logic 64 .
  • the instruction sequencer 60 retrieves the corresponding conditional move codes from the identified entries, and controls the operation of the PC circuitry and the CMOVXX logic 64 based on the retrieved conditional move codes.
  • the instruction sequencer 60 signals the CMOVXX logic 64 simply to pass the fetch block from the instruction latch 58 to circuits further down the pipeline 10 (e.g., a register mapper 68 ).
  • the instruction sequencer 60 signals the PC circuitry to continue providing a program counter signal (NEXT PC) received on an input 72 of the PC multiplexer 54 so that another entry of the instruction cache 34 can be identified for transfer.
  • the instruction sequencer 60 signals the CMOVXX logic 64 (i) to generate multiple instructions (i.e., CMOV 1 XX and CMOV 2 XX), and (ii) to replace the conditional move instruction with the generated multiple instructions.
  • the CMOVXX logic 64 forms two copies of the fetch block (e.g., fetch block 40 ) containing the conditional move instruction.
  • In the first copy (fetch block 74 A), the CMOVXX logic 64 overwrites the conditional move instruction with one of the multiple instructions (CMOV 1 XX), and invalidates any instructions in the first copy that follow the conditional move instruction. In the second copy (fetch block 74 B), the CMOVXX logic 64 overwrites the conditional move instruction with another of the multiple instructions (CMOV 2 XX), and invalidates any instructions preceding the conditional move instruction in the second copy. As a result, the CMOVXX logic creates two fetch blocks that preserve the fetch block positions of the non-conditional move instructions, and that have the conditional move instruction replaced with the multiple generated instructions (CMOV 1 XX and CMOV 2 XX).
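  • The two-copy scheme can be illustrated with a short Python sketch. The list-of-strings fetch block, the `INVALID` marker and the function name `split_fetch_block` are assumptions made for illustration only; the real logic operates on encoded instructions in hardware.

```python
INVALID = None  # hypothetical marker for an invalidated instruction slot


def split_fetch_block(block, pos):
    """Split one fetch block into two copies around the conditional move at index pos.

    First copy: CMOV1XX replaces the conditional move, later slots are invalidated.
    Second copy: CMOV2XX replaces the conditional move, earlier slots are invalidated.
    Every other instruction keeps its original slot position.
    """
    first = list(block[:pos]) + ["CMOV1XX"] + [INVALID] * (len(block) - pos - 1)
    second = [INVALID] * pos + ["CMOV2XX"] + list(block[pos + 1:])
    return first, second


block_74a, block_74b = split_fetch_block(["INSTA", "CMOVXX", "INSTB", "INSTC"], 1)
```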
  • When the instruction sequencer 60 signals the CMOVXX logic 64 to convert a fetch block containing a conditional move instruction into two fetch blocks, the instruction sequencer 60 signals other circuits of the event so that they may adjust their operation accordingly.
  • the instruction sequencer 60 signals the PC circuitry to provide extra time for the CMOVXX logic 64 to convert one fetch block into the two conditional move fetch blocks 74 .
  • the PC circuitry responds by repeating previously provided program counter information enabling the instruction latch 58 to read twice a fetch block of the entry following the entry having the conditional move instruction.
  • When the instruction latch 58 reads this fetch block the first time, the CMOVXX logic 64 ignores it, since this read coincides with formation of the second copy of the two conditional move fetch blocks 74 . However, when the instruction latch 58 reads this fetch block the second time, the CMOVXX logic 64 processes it in a normal fashion.
  • the pipeline 10 is preferably capable of speculative execution of instructions since the processor is an out-of-order data processor.
  • the pipeline 10 includes silos for storing prior processor states so that the pipeline can return to a previous state when instruction execution occurs down an incorrect instruction branch.
  • the pipeline 10 includes a PC silo 56 that stores prior ICACHE addresses (e.g., FILL PC), and a register silo 70 that stores prior logical register to physical register mappings (or assignments).
  • When the instruction sequencer 60 signals the CMOVXX logic 64 to convert a fetch block containing a conditional move instruction into two fetch blocks with replaced instructions, the instruction sequencer 60 signals the silos (e.g., the PC silo 56 and the register silo 70 ) of the event. In particular, the instruction sequencer 60 signals the instruction counter 62 which, in turn, updates the PC silo 56 and the register silo 70 . Accordingly, if the pipeline 10 executes down an incorrect instruction branch and attempts to recover, the pipeline 10 will have accounted for the conversion of the one fetch block containing a conditional move instruction into two fetch blocks.
  • one fetch can be used, as shown in FIG. 4B.
  • the compiler can append a blank instruction (e.g., a NO-OP instruction) after the conditional move instruction within the executable.
  • the CMOVXX logic 64 can simply modify the instruction stream (e.g., fetch block 92 ) by replacing the CMOVXX instruction with the CMOV 1 XX instruction, and replacing the subsequent blank instruction with the CMOV 2 XX instruction (e.g., fetch block 94 ). In this situation, it is unnecessary to signal other circuits (e.g., the PC circuitry and silos) to account for a change in the number of fetch blocks in the pipeline 10 .
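  • The single-fetch-block variant can be sketched as follows. The sketch assumes the compiler-inserted blank instruction appears as the mnemonic "NOP" directly after the conditional move; the function name `replace_in_place` and the string representation are hypothetical.

```python
def replace_in_place(block, pos):
    """Replace a CMOVXX at index pos and the blank instruction after it.

    The block length is unchanged, so no second fetch block is needed.
    Raises ValueError if the following slot is not a blank instruction.
    """
    if pos + 1 >= len(block) or block[pos + 1] != "NOP":
        raise ValueError("expected a blank instruction after the conditional move")
    out = list(block)
    out[pos] = "CMOV1XX"      # first generated instruction
    out[pos + 1] = "CMOV2XX"  # second generated instruction overwrites the blank
    return out


block_94 = replace_in_place(["INSTA", "CMOVXX", "NOP", "INSTB"], 1)
```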
  • FIG. 5 shows a flow diagram of a procedure 80 performed by the circuit portions 30 , 50 of the instruction fetch stage 12 .
  • the detect circuit 38 reads a group of instructions (e.g., a fetch block) from the memory 32 (see FIG. 2).
  • the detect circuit 38 determines whether the group includes any conditional move instructions. If the group does not include any conditional move instructions, step 84 proceeds to step 86 , which involves providing the group of instructions to other circuits (e.g., to the register mapper 68 ) further down the pipeline 10 (see FIG. 3).
  • In step 88 , if the group includes a conditional move instruction, the CMOVXX logic 64 , under control of the instruction sequencer 60 which reads the conditional move code provided by the detect circuit 38 , generates multiple instructions according to the conditional move instruction (i.e., the multiple instructions preserve the “XX” operation of the CMOVXX instruction), and replaces the conditional move instruction within the instruction stream with the generated multiple instructions.
  • the CMOVXX logic 64 performs the replacement in a manner that preserves the instruction positions of the non-conditional move instructions within the fetch blocks.
  • Step 90 , which follows steps 86 and 88 , loops back to step 82 to handle more instructions within the instruction stream, unless the procedure 80 is terminated (e.g., due to a reset or power down of the processor).
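  • The overall flow of procedure 80 can be approximated by a small Python generator. This is a simplified sketch under stated assumptions: groups are lists of mnemonic strings, at most one conditional move appears per group, invalidated slots are represented by `None`, and the name `procedure_80` is chosen only to mirror the figure.

```python
def procedure_80(groups):
    """Yield fetch blocks for the rest of the pipeline, replacing any CMOVXX found."""
    for group in groups:  # step 82: read a group of instructions
        # step 84: look for a conditional move (assumes at most one per group)
        pos = next((i for i, insn in enumerate(group) if insn == "CMOVXX"), None)
        if pos is None:
            yield list(group)  # step 86: pass the group on unchanged
        else:
            # step 88: generate CMOV1XX/CMOV2XX and keep other slot positions
            yield list(group[:pos]) + ["CMOV1XX"] + [None] * (len(group) - pos - 1)
            yield [None] * pos + ["CMOV2XX"] + list(group[pos + 1:])
        # step 90: loop back for the next group


stream_out = list(procedure_80([["INSTA", "INSTB"], ["INSTA", "CMOVXX", "INSTB"]]))
```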
  • the pipeline 10 includes a register file that includes physical processor registers (see physical registers R A1 , R B1 , R C1 , R C2 and R C3 in FIG. 6).
  • Each of the registers includes a standard field for storing a standard register value (e.g., a 64-bit value), and a predicate (or condition) field (e.g., a single bit).
  • the standard field corresponds to what programmers commonly refer to as the contents of the register.
  • the predicate field is a special field that is preferably used only by the multiple instructions replacing the CMOVXX instruction. That is, the predicate field is preferably not readable directly by the programmers.
  • conditional move instruction which is replaced by the multiple instructions generated by the CMOVXX logic 64 , has the following format:
  • S_R A and S_R B identify logical source registers R A and R B , respectively, and D_R C identifies a logical destination register R C within the processor.
  • CMOV 1 XX indicates that the instruction is a first instruction generated from the CMOVXX instruction
  • S_R A and S_R C identify logical source registers R A and R C , respectively
  • D_R C identifies a logical destination register R C within the processor.
  • “XX” within “CMOV 1 XX” indicates that the CMOV 1 XX instruction performs the same type of operation (or function) as that of the CMOVXX instruction (e.g., checking whether the contents of a particular register equal zero).
  • the pseudo-code for the CMOV 1 XX instruction is as follows:
  • R A1 and R C1 are physical registers respectively mapped to logical registers R A and R C prior to mapping the CMOV 1 XX instruction
  • R C2 is a physical register mapped to logical register R C after mapping the CMOV 1 XX instruction
  • R C2 .P is a predicate field of the physical register R C2 .
  • the second instruction has the following format:
  • CMOV 2 XX indicates that the instruction is a second instruction generated from the CMOVXX instruction
  • S_R B and S_R C identify logical source registers R B and R C , respectively
  • D_R C identifies a logical destination register R C within the processor.
  • the pseudo-code for the CMOV 2 XX instruction is as follows:
  • R B1 and R C2 are physical registers respectively mapped to logical registers R B and R C after mapping the CMOV 1 XX instruction and prior to mapping the CMOV 2 XX instruction
  • R C3 is a physical register mapped to logical register R C after mapping the CMOV 2 XX instruction
  • R C2 .P is the predicate field of the physical register R C2 .
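  • The behavior of the two generated instructions can be modeled in Python as shown here. This is a sketch rather than the patent's pseudo-code: the `PhysReg` class and the function names `cmov1` and `cmov2` are hypothetical, and the predicate is assumed to be set when condition XX holds, in line with the earlier description in which the source contents reach the destination when the condition exists. Each function reads exactly two registers, with the predicate travelling alongside the standard field of R C2 .

```python
from dataclasses import dataclass


@dataclass
class PhysReg:
    value: int = 0   # standard field (the programmer-visible contents)
    p: bool = False  # predicate (condition) field, assumed to be a single bit


def cmov1(cond, ra1, rc1):
    """CMOV1XX: inputs R_A1 and R_C1, result written to a new register R_C2.

    The old value of logical R_C is passed through, and the outcome of the
    condition test on R_A1 is recorded in the predicate field.
    """
    return PhysReg(value=rc1.value, p=cond(ra1.value))


def cmov2(rb1, rc2):
    """CMOV2XX: inputs R_B1 and R_C2 (with its predicate), result written to R_C3."""
    return PhysReg(value=rb1.value if rc2.p else rc2.value)


def is_zero(value):
    """Example condition 'XX': the tested register stores zero."""
    return value == 0


# Condition holds (R_A1 is zero): logical R_C ends up with the contents of R_B.
rc3_taken = cmov2(PhysReg(7), cmov1(is_zero, PhysReg(0), PhysReg(3)))
# Condition fails: logical R_C appears unaltered.
rc3_kept = cmov2(PhysReg(7), cmov1(is_zero, PhysReg(5), PhysReg(3)))
```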
  • FIG. 6 shows an execution circuit 100 of the instruction execution stage 22 that operates during execution of the CMOV 1 XX and CMOV 2 XX instructions.
  • the execution circuit 100 includes a CMOV 1 XX circuit 102 and a CMOV 2 XX circuit 104 that connect with physical registers of the register file.
  • the CMOV 1 XX circuit 102 has a first input port 106 that receives the contents of the standard field of physical register R A1 , and a second input port 108 that receives the contents of the standard field of physical register R C1 .
  • the CMOV 2 XX circuit 104 has a first input port 110 that receives the contents of the standard field of physical register R C2 , and a second input port 112 that receives the contents of the standard field of physical register R B1 .
  • the input port 110 further receives the predicate field of physical register R C2 .
  • the predicate field of physical register R C2 (i.e., R C2 .P) passes through a connection 110 A .
  • the standard field of physical register R C2 passes through a set of connections 110 B .
  • the CMOV 1 XX circuit 102 includes an evaluation block 114 and a pass-thru block 116 .
  • the evaluation block 114 evaluates the contents of physical register R A1 using the function XX (e.g., equal to, greater than, less than, not equal to, etc.).
  • the pass-thru block 116 transfers the standard field of physical register R C1 to the standard field of physical register R C2 .
  • the CMOV 2 XX circuit 104 includes a multiplexer 118 that selects between the standard field of physical register R C2 (the connection 110 B of input port 110 ) and the standard field of physical register R B1 (input port 112 ) according to the predicate field of physical register R C2 (the connection 110 A of input port 110 ).
  • the multiplexer 118 outputs the contents of the selected standard field to the standard field of physical register R C3 . Accordingly, if the predicate field of physical register R C2 indicates that condition XX exists at physical register R A1 , the multiplexer 118 transfers the standard field of physical register R C2 to the standard field of physical register R C3 .
  • the multiplexer 118 transfers the standard field of physical register R B1 to the standard field of physical register R C3 .
  • each of the CMOV 1 XX instruction and the CMOV 2 XX instruction uses no more than two input ports.
  • An extra connection e.g., a bit line
  • Each register of the register file preferably has such an extra connection to provide access to the predicate field of that register.
  • the circuitry shown in FIG. 6 can be optimized to bypass physical register R C2 such that the output of the evaluation block 114 goes directly to the multiplexer 122 through input port 112 2 .
  • Such an optimization removes the steps of storing a value in the predicate field of physical register R C2 , and subsequently reading the predicate field of physical register R C2 .
  • the invalidated instructions of the fetch blocks 74 A and 74 B in FIG. 4A may be blank instructions (e.g., NO-OP instructions).
  • the invalidated instructions may be the original instructions (e.g., INSTA, INSTB and INSTC) with corresponding flags (not shown) set to indicated to the processor that these instructions are invalid.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A technique for handling a conditional move instruction in an out-of-order data processor. The technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction. The technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions. The generated multiple instructions are generated such that each of the generated multiple instructions executes using no more than two input ports of an execution unit. The generated multiple instructions include a first generated instruction that produces a condition result indicating whether a condition exists, and a second generated instruction that inputs the condition result as a portion of an operand which identifies a register of the out-of-order data processor. The second generated instruction performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.

Description

    FIELD OF THE INVENTION
  • The present invention relates generally to data processing and in particular to techniques for processing a conditional move instruction within a data processor. [0001]
  • BACKGROUND OF THE INVENTION
  • In general, data processors are capable of executing a variety of instructions. One type of instruction is called a conditional move instruction. From a programmer's perspective, a typical conditional move instruction instructs a processor to test whether a particular condition exists (e.g., whether a particular register stores zero), and to move information into a destination register if the particular condition exists. If the condition does not exist, the destination register is left unaltered. A typical conditional move instruction has the following format: [0002]
  • CMOVXX S_RA, S_RB, D_RC,
  • where “CMOVXX” indicates that the instruction is a conditional move instruction that tests for a condition “XX”. “S_RA” and “S_RB” are source operands that respectively identify registers RA and RB. “D_RC” is a destination operand that identifies register RC. [0003]
  • In general, how a processor uses registers depends on whether the processor is capable of executing instructions out of program order. For a processor that cannot execute instructions out of program order (i.e., an in-order processor), instruction source and destination operands typically identify physical registers within the processor. The pseudo-code for executing the CMOVXX instruction in an in-order processor is as follows:[0004]
  • if (XX(RA)), then RC=RB.
  • According to the pseudo-code, the processor determines whether a condition XX exists involving physical register RA (e.g., whether physical register RA stores zero). If the condition XX exists, the processor moves the contents of physical register RB into physical register RC. Otherwise, the processor leaves the original contents of physical register RC unaltered. [0005]
  • In a processor that is capable of executing instructions out of program order (i.e., an out-of-order processor), instruction source and destination operands typically identify logical registers instead of the physical registers directly. The out-of-order processor maps these logical registers to physical processor registers just before instruction execution such that the result of each instruction is stored in a new physical register. This approach enables the processor to avoid problems when executing instructions out of program order (e.g., read-after-write data hazards). [0006]
  • The pseudo-code for executing a CMOVXX instruction in an out-of-order processor is therefore somewhat more complex. Suppose that, prior to mapping the CMOVXX instruction, the out-of-order processor maps logical register RA to physical register RA1, logical register RB to physical register RB1, and logical register RC to physical register RC1. Additionally suppose that, after mapping the CMOVXX instruction, the out-of-order processor maps logical register RC to physical register RC2 (a new physical register). The pseudo-code for executing the CMOVXX instruction in such a processor is therefore as follows: [0007]
  • if (XX(RA1)), then RC2=RB1 else RC2=RC1.
  • According to the pseudo-code, the out-of-order processor determines whether a condition XX exists involving physical register RA1 (logical register RA). If the condition XX exists, the processor moves the contents of physical register RB1 (logical register RB) into physical register RC2 (to which logical register RC presently is mapped). As such, the contents of logical register RB are stored in logical register RC. If the condition XX does not exist, the processor moves the contents of physical register RC1 (to which logical register RC previously was mapped) into physical register RC2 such that a programmer perceives the contents of logical register RC as remaining unaltered. [0008]
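  • The renamed form of the pseudo-code can be sketched in Python as follows. This is only an illustrative model, not circuitry from the specification: the dictionary-based register file, the rename map, and the names `cmov_out_of_order`, `is_zero`, and `P0` to `P3` are assumptions introduced for the example. The sketch shows that the renamed instruction has to read three physical registers.

```python
def cmov_out_of_order(phys, rename, cond, ra, rb, rc, new_phys):
    """Model CMOVXX S_RA, S_RB, D_RC on a renaming processor.

    phys maps physical register names to values; rename maps logical
    register names to physical register names. Returns the set of
    physical registers that had to be read.
    """
    ra1, rb1, rc1 = rename[ra], rename[rb], rename[rc]  # mappings before the CMOVXX
    rename[rc] = new_phys                               # RC now maps to RC2
    # if (XX(RA1)) then RC2 = RB1 else RC2 = RC1
    phys[new_phys] = phys[rb1] if cond(phys[ra1]) else phys[rc1]
    return {ra1, rb1, rc1}


def is_zero(value):
    """Example condition XX: the register stores zero."""
    return value == 0


phys = {"P0": 0, "P1": 7, "P2": 99}            # hypothetical register contents
rename = {"RA": "P0", "RB": "P1", "RC": "P2"}  # hypothetical mappings
reads = cmov_out_of_order(phys, rename, is_zero, "RA", "RB", "RC", "P3")
print(phys["P3"], sorted(reads))
```

  • In this model the returned set always holds three distinct physical registers, whichever way the condition evaluates.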
  • SUMMARY OF THE INVENTION
  • When a processor executes an instruction within an instruction stream, an execution circuit (or unit) of the processor receives instruction data through input ports, and executes the instruction according to the instruction data. For example, an execution unit of an in-order processor may execute the conditional move instruction:[0009]
  • CMOVXX S_RA, S_RB, D_RC
  • according to the pseudo-code:[0010]
  • if (XX(RA)), then RC=RB
  • where RA, RB and RC refer to physical registers within the in-order processor. To receive instruction data used by the CMOVXX instruction, the execution unit requires only two input ports: a first port to receive the contents of physical register RA, and a second port to receive the contents of physical register RB. [0011]
  • However, an execution unit of an out-of-order processor executes the CMOVXX instruction according to the following pseudo-code:[0012]
  • if (XX(RA1)), then RC2=RB1, else RC2=RC1
  • where RA1, RB1, RC1 and RC2 refer to physical registers within the out-of-order processor. To implement this instruction, the out-of-order execution unit requires three input ports: a first port to receive the contents of physical register RA1, a second port to receive the contents of physical register RB1, and a third port to receive the contents of physical register RC1. [0013]
  • There are disadvantages to a processor that uses three input ports to execute instructions. In particular, such a processor would require substantial semiconductor resources (e.g., a disproportionately large area for input port routing). Additionally, processors typically use no more than two input ports to execute non-conditional move instructions. Accordingly, processor designers generally prefer to limit the number of input ports for each instruction to no more than two. Unfortunately, as explained above, a conventional implementation of the CMOVXX instruction within an out-of-order processor uses three input ports. [0014]
  • In contrast, an embodiment of the present invention is directed to a technique for handling a conditional move instruction in an out-of-order data processor. The technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction. The technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions. Preferably, each of the generated multiple instructions executes using no more than two input ports. As such, it is unnecessary for the processor to use three input ports to execute the instructions. [0015]
  • The generation of multiple instructions preferably involves providing a first generated instruction that determines whether a condition exists, and providing a second generated instruction that performs a move operation based on whether the condition exists. In particular, the second generated instruction performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist. When the condition exists, the first move operation loads a new physical register with contents from a specified source register so that, from a programmer's perspective, the processor alters a logical register mapped to the new physical register. When the condition does not exist, the second move operation loads the new physical register with contents of a previously used physical register (to which the logical register was previously mapped) so that, from the programmer's perspective, the processor leaves the logical register unaltered. [0016]
  • Instruction generation may involve providing a first generated instruction that produces a condition result, and providing a second generated instruction that (i) inputs the condition result from a first portion of a register that is separate from a second portion that stores standard contents of the register, and (ii) performs an operation according to the first portion. To this end, the mechanisms for storing the condition result and the standard contents are treated as a single entity (e.g., a register with an extra bit field to store the condition result) rather than as separate registers. As such, the same circuitry for addressing and accessing the standard portion of the registers can be used to address and access the condition field. This feature allows the processor to transfer the condition result through one of two existing input ports alleviating the need for a third input port to carry the condition result. In particular, the processor includes a register file containing instruction registers, each of which has a standard field and a condition field. [0017]
  • Prior to detecting the conditional move instruction, instructions may be loaded from memory in groups (e.g., fetch blocks). In particular, the technique may further involve retrieving a first group of instructions from a memory during a first fetch period, the first group of instructions including the conditional move instruction. Such a retrieval enables instructions to be loaded using fewer retrieval operations than loading instructions individually. [0018]
  • Other subsequent groups of instructions may be loaded as well. For example, the technique may further include retrieving a second group of instructions from the memory during a second fetch period, the second group following the first group within the instruction stream. The technique may involve retrieving the second group of instructions from the memory again during a third fetch period while the multiple instructions are being generated. This feature provides an optimization in the sense that retrieval of the second group of instructions during the third fetch period will make the second group of instructions available at a convenient point in the pipeline to receive one of the generated multiple instructions. [0019]
  • Alternatively, the technique may involve overwriting the conditional move instruction in the retrieved first group of instructions with one of the generated multiple instructions, and overwriting an instruction following the conditional move instruction in the retrieved first group of instructions with another of the generated multiple instructions. In this situation, the instruction following the conditional move instruction is preferably a blank instruction that performs no operation when executed. Accordingly, the processor simply modifies the fetch block containing the conditional move instruction without affecting a subsequent fetch block.[0020]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. [0021]
  • FIG. 1 is a block diagram of an instruction pipeline for a data processor in which the present invention may be used. [0022]
  • FIG. 2 is a block diagram of a portion of an instruction fetch stage of FIG. 1 that detects a conditional move instruction within an instruction stream. [0023]
  • FIG. 3 is a block diagram of a portion of the instruction fetch stage of FIG. 1 that generates multiple instructions according to the detected conditional move instruction, and replaces the conditional move instruction within the instruction stream with the generated multiple instructions. [0024]
  • FIG. 4A is a block diagram of instructions before and after being handled in a first manner by the instruction fetch stage of FIG. 1. [0025]
  • FIG. 4B is a block diagram of instructions before and after being handled in a second manner by the instruction fetch stage of FIG. 1. [0026]
  • FIG. 5 is a flow diagram of a procedure performed by the instruction fetch stage of FIG. 1. [0027]
  • FIG. 6 is a block diagram of execution circuitry within an instruction execution stage of FIG. 1 that executes the generated multiple instructions.[0028]
  • DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
  • The present invention involves detecting a conditional move instruction within an instruction stream, and replacing it with multiple replacement instructions such that a data processor processing the instruction stream executes the multiple replacement instructions rather than the original conditional move instruction. Preferably, the data processor uses no more than two input ports when executing each of the multiple instructions so that additional processor resources (e.g., a third input port for each instruction) are unnecessary. [0029]
  • The invention is preferably used in an instruction pipeline of a speculative execution out-of-order data processor such as the pipeline 10 shown in FIG. 1. The pipeline 10 has a series of stages including an instruction fetch stage 12, an instruction slot stage 14, an instruction map stage 16, an instruction issue/queue stage 18, an instruction read stage 20, an instruction execution stage 22, an instruction write stage 24, and an instruction retire stage 26. [0030]
  • The pipeline 10 processes a stream of instructions 28. First, the instruction fetch stage 12 retrieves the instructions from memory. Second, the instruction slot stage 14 determines to which execution unit the instructions should be sent, e.g., a floating point unit or an integer unit (not shown). Third, the instruction map stage 16 maps the instructions such that the instructions refer to physical registers rather than logical registers. Fourth, the instruction issue/queue stage 18 queues the instructions for execution. Fifth, the instruction read stage 20 reads data used by the instructions from the physical registers. Next, the instruction execution stage 22 executes the instructions. Then, the instruction write stage 24 stores results of the executed instructions in the physical registers. Finally, the instruction retire stage 26 retires the instructions by committing the processor state to the results of the instructions. [0031]
  • FIG. 2 shows a circuit portion 30 of the instruction fetch stage 12 that retrieves the instructions 28 from a memory 32 (e.g., main memory or a second-level cache), and temporarily stores the retrieved instructions 28 in an instruction cache (or ICACHE) 34. The circuit portion 30 includes a program counter circuit 36 and a detect circuit 38. The program counter circuit 36 provides program counter information (e.g., a FILL PC pointer) identifying locations within the memory 32 that store instructions to be retrieved. The detect circuit 38 reads instructions from the memory 32 based on the program counter information, scans the retrieved instructions for any conditional move instructions (e.g., CMOVXX), and stores the instructions and scan results in the instruction cache 34. In particular, the detect circuit 38 groups the instructions into fetch blocks (e.g., fetch block 40), generates a conditional move code for each fetch block (e.g., conditional move code 42) indicating the locations of any conditional move instructions within that fetch block, and stores each fetch block and its corresponding conditional move code as an entry of the instruction cache 34 (e.g., entry 44). By way of example, the conditional move code 42 has the binary value “0100” to indicate that the second instruction of fetch block 40 is a conditional move instruction, as shown in FIG. 2. [0032]
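  • The scan performed by the detect circuit 38 can be sketched in Python as follows. This is a software illustration under stated assumptions, not the circuit itself: the function name `conditional_move_code`, the mnemonic `CMOVEQ`, the representation of instructions as strings, and the convention that the leftmost bit corresponds to the first instruction of the fetch block are all hypothetical choices made for the example.

```python
def conditional_move_code(fetch_block):
    """Return one bit per instruction slot: '1' marks a conditional move.

    Assumption: the leftmost bit corresponds to the first instruction.
    """
    return "".join(
        "1" if instr.startswith("CMOV") else "0" for instr in fetch_block
    )


# Hypothetical four-instruction fetch block whose second slot is a conditional move.
fetch_block_40 = ["INSTA", "CMOVEQ RA, RB, RC", "INSTB", "INSTC"]
code_42 = conditional_move_code(fetch_block_40)

# An instruction cache entry pairs the fetch block with its code.
entry_44 = {"fetch_block": fetch_block_40, "cmov_code": code_42}
print(entry_44["cmov_code"])
```

  • Storing the code next to the fetch block lets a later stage decide how to handle the block without decoding the instructions a second time.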
  • Each entry within the instruction cache 34 further includes cache tag information (e.g., TAG) indicating whether that entry is valid or invalid. When the tag information indicates that the entry is valid, a read attempt of that entry results in a cache hit. If the tag information indicates that the entry is invalid, a read attempt of that entry results in a cache miss. [0033]
  • FIG. 3 shows a circuit portion 50 of the instruction fetch stage 12 that provides instructions from the instruction cache 34 to other circuits in the pipeline 10. If a conditional move instruction exists within the instructions, the circuit portion 50 generates multiple instructions according to the conditional move instruction, and replaces the conditional move instruction with the generated multiple instructions. [0034]
  • The circuit portion 50 includes a PC latch 52, a PC multiplexer 54, a PC silo 56, an instruction latch 58, an instruction sequencer 60, an instruction counter 62, and conditional move logic (or CMOVXX logic) 64. The PC latch 52, the PC multiplexer 54 and PC silo 56 (hereinafter generally referred to as PC circuitry) operate to provide program counter information identifying instruction cache entries (e.g., entry 44) to be transferred out of the instruction cache 34. The instruction latch 58 holds the fetch blocks from the identified entries, and provides them to the CMOVXX logic 64. Simultaneously, the instruction sequencer 60 retrieves the corresponding conditional move codes from the identified entries, and controls the operation of the PC circuitry and the CMOVXX logic 64 based on the retrieved conditional move codes. In particular, when a conditional move code indicates that its corresponding fetch block does not include a conditional move instruction, the instruction sequencer 60 signals the CMOVXX logic 64 simply to pass the fetch block from the instruction latch 58 to circuits further down the pipeline 10 (e.g., a register mapper 68). Additionally, the instruction sequencer 60 signals the PC circuitry to continue providing a program counter signal (NEXT PC) received on an input 72 of the PC multiplexer 54 so that another entry of the instruction cache 34 can be identified for transfer. [0035]
  • However, when a conditional move code indicates that its corresponding fetch block includes a conditional move instruction, the instruction sequencer 60 signals the CMOVXX logic 64 (i) to generate multiple instructions (i.e., CMOV1XX and CMOV2XX), and (ii) to replace the conditional move instruction with the generated multiple instructions. In response, as shown in FIG. 4A, the CMOVXX logic 64 forms two copies of the fetch block (e.g., fetch block 40) containing the conditional move instruction. In the first copy (fetch block 74A), the CMOVXX logic 64 overwrites the conditional move instruction with one of the multiple instructions (CMOV1XX), and invalidates any instructions in the first copy that follow the conditional move instruction. In the second copy (fetch block 74B), the CMOVXX logic 64 overwrites the conditional move instruction with another of the multiple instructions (CMOV2XX), and invalidates any instructions preceding the conditional move instruction in the second copy. As a result, the CMOVXX logic creates two fetch blocks that preserve the fetch block positions of the non-conditional move instructions, and that have the conditional move instruction replaced with the multiple generated instructions (CMOV1XX and CMOV2XX). [0036]
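  • The two-copy replacement of FIG. 4A can be sketched in Python as follows. The sketch rests on assumptions that are not taken from the specification: instructions are modeled as tuples, `CMOVEQ` is a hypothetical mnemonic, invalidated slots are shown as a `NO-OP` tuple, the fetch block holds exactly one conditional move, and the helper names `expand_cmov` and `split_fetch_block` are invented for the example.

```python
INVALID = ("NO-OP",)  # assumption: an invalidated slot is shown as a NO-OP


def expand_cmov(instr):
    """Turn (CMOVXX, RA, RB, RC) into the CMOV1XX and CMOV2XX tuples."""
    opcode, ra, rb, rc = instr
    xx = opcode[len("CMOV"):]                 # keep the condition suffix "XX"
    first = ("CMOV1" + xx, ra, rc, rc)        # CMOV1XX S_RA, S_RC, D_RC
    second = ("CMOV2" + xx, rb, rc, rc)       # CMOV2XX S_RB, S_RC, D_RC
    return first, second


def split_fetch_block(block, pos):
    """Form two copies of the block around the conditional move at index pos."""
    first, second = expand_cmov(block[pos])
    # First copy: keep earlier slots, invalidate everything after the CMOV slot.
    copy_a = block[:pos] + [first] + [INVALID] * (len(block) - pos - 1)
    # Second copy: invalidate earlier slots, keep everything after the CMOV slot.
    copy_b = [INVALID] * pos + [second] + block[pos + 1:]
    return copy_a, copy_b


block_40 = [("INSTA",), ("CMOVEQ", "RA", "RB", "RC"), ("INSTB",), ("INSTC",)]
block_74a, block_74b = split_fetch_block(block_40, pos=1)
print(block_74a)
print(block_74b)
```

  • Every surviving instruction stays in its original slot, which is the position-preserving property described above.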
  • With reference again directed to the circuit portion 50 and FIG. 3, when the instruction sequencer 60 signals the CMOVXX logic 64 to convert a fetch block containing a conditional move instruction into two fetch blocks, the instruction sequencer 60 signals other circuits of the event so that they may adjust their operation accordingly. In particular, the instruction sequencer 60 signals the PC circuitry to provide extra time for the CMOVXX logic 64 to convert one fetch block into the two conditional move fetch blocks 74. The PC circuitry responds by repeating previously provided program counter information enabling the instruction latch 58 to read twice a fetch block of the entry following the entry having the conditional move instruction. When the instruction latch 58 reads this fetch block the first time, the CMOVXX logic 64 ignores it since this read coincides with formation of the second copy of the two conditional move fetch blocks 74. However, when the instruction latch 58 reads this fetch block the second time, the CMOVXX logic 64 processes it in a normal fashion. [0037]
  • It should be understood that the pipeline 10 is preferably capable of speculative execution of instructions since the processor is an out-of-order data processor. The pipeline 10 includes silos for storing prior processor states so that the pipeline can return to a previous state when instruction execution occurs down an incorrect instruction branch. In particular, the pipeline 10 includes a PC silo 56 that stores prior ICACHE addresses (e.g., FILL PC), and a register silo 70 that stores prior logical register to physical register mappings (or assignments). [0038]
  • When the instruction sequencer 60 signals the CMOVXX logic 64 to convert a fetch block containing a conditional move instruction into two fetch blocks with replaced instructions, the instruction sequencer 60 signals the silos (e.g., the PC silo 56 and the register silo 70) of the event. In particular, the instruction sequencer 60 signals the instruction counter 62 which, in turn, updates the PC silo 56 and the register silo 70. Accordingly, if the pipeline 10 executes down an incorrect instruction branch and attempts to recover, the pipeline 10 will have accounted for the conversion of the one fetch block containing a conditional move instruction into two fetch blocks. [0039]
  • As an alternative to creating two fetch blocks, one fetch block can be used, as shown in FIG. 4B. In particular, when one or more programs are compiled to form an executable, the compiler can append a blank instruction (e.g., a NO-OP instruction) after the conditional move instruction within the executable. When the processor executes the executable, the CMOVXX logic 64 can simply modify the instruction stream (e.g., fetch block 92) by replacing the CMOVXX instruction with the CMOV1XX instruction, and replacing the subsequent blank instruction with the CMOV2XX instruction (e.g., fetch block 94). In this situation, it is unnecessary to signal other circuits (e.g., the PC circuitry and silos) to account for a change in the number of fetch blocks in the pipeline 10. [0040]
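  • The single-block alternative of FIG. 4B can be sketched in Python as follows. As before, this is an illustration under assumptions: the tuple encoding, the `CMOVEQ` mnemonic, the `NO-OP` tuple, and the helper names `expand_cmov` and `replace_in_place` are hypothetical, and the sketch only rewrites a conditional move that is immediately followed by a blank instruction in the same block.

```python
NOP = ("NO-OP",)  # assumption: the blank instruction appended by the compiler


def expand_cmov(instr):
    """Turn (CMOVXX, RA, RB, RC) into the CMOV1XX and CMOV2XX tuples."""
    opcode, ra, rb, rc = instr
    xx = opcode[len("CMOV"):]
    return ("CMOV1" + xx, ra, rc, rc), ("CMOV2" + xx, rb, rc, rc)


def replace_in_place(block):
    """Rewrite CMOVXX plus the following NO-OP without changing the block length."""
    out = list(block)
    for i in range(len(block) - 1):
        if block[i][0].startswith("CMOV") and block[i + 1] == NOP:
            out[i], out[i + 1] = expand_cmov(block[i])
    return out


block_92 = [("INSTA",), ("CMOVEQ", "RA", "RB", "RC"), NOP, ("INSTB",)]
block_94 = replace_in_place(block_92)
print(block_94)
```

  • Because the block count does not change, nothing in this model corresponds to the extra signaling of the PC circuitry and silos that the two-copy approach needs.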
  • FIG. 5 shows a flow diagram of a procedure 80 performed by the circuit portions 30, 50 of the instruction fetch stage 12. In step 82, the detect circuit 38 reads a group of instructions (e.g., a fetch block) from the memory 32 (see FIG. 2). In step 84, the detect circuit 38 determines whether the group includes any conditional move instructions. If the group does not include any conditional move instructions, step 84 proceeds to step 86, which involves providing the group of instructions to other circuits (e.g., to the register mapper 68) further down the pipeline 10 (see FIG. 3). However, in step 88, if the group includes a conditional move instruction, the CMOVXX logic 64, under control of the instruction sequencer 60 which reads the conditional move code provided by the detect circuit 38, generates multiple instructions according to the conditional move instruction (i.e., the multiple instructions preserve the “XX” operation of the CMOVXX instruction), and replaces the conditional move instruction within the instruction stream with the generated multiple instructions. In particular, the CMOVXX logic 64 performs the replacement in a manner that preserves the instruction positions of the non-conditional move instructions within the fetch blocks. Step 90, which follows steps 86 and 88, loops back to step 82 to handle more instructions within the instruction stream, unless the procedure 80 is terminated (e.g., due to a reset or power down of the processor). [0041]
  • Further details of how the multiple instructions execute within the pipeline 10 will now be provided. The pipeline 10 includes a register file that includes physical processor registers (see physical registers RA1, RB1, RC1, RC2 and RC3 in FIG. 6). Each of the registers includes a standard field for storing a standard register value (e.g., a 64-bit value), and a predicate (or condition) field (e.g., a single bit). The standard field corresponds to what programmers commonly refer to as the contents of the register. The predicate field is a special field that is preferably used only by the multiple instructions replacing the CMOVXX instruction. That is, the predicate field is preferably not readable directly by the programmers. [0042]
  • The conditional move instruction, which is replaced by the multiple instructions generated by the CMOVXX logic 64, has the following format: [0043]
  • CMOVXX S_RA, S_RB, D_RC
  • where S_RA and S_RB identify logical source registers RA and RB, respectively, and D_RC identifies a logical destination register RC within the processor. When the CMOVXX logic 64 encounters such an instruction, the CMOVXX logic 64, under control of the instruction sequencer 60, generates two instructions, the first of which has the following format: [0044]
  • CMOV1XX S_RA, S_RC, D_RC
  • where CMOV1XX indicates that the instruction is a first instruction generated from the CMOVXX instruction, S_RA and S_RC identify logical source registers RA and RC, respectively, and D_RC identifies a logical destination register RC within the processor. “XX” within “CMOV1XX” indicates that the CMOV1XX instruction performs the same type of operation (or function) as that of the CMOVXX instruction (e.g., checking whether the contents of a particular register equal zero). The pseudo-code for the CMOV1XX instruction is as follows: [0045]
  • RC2.P=XX(RA1); RC2=RC1
  • where RA1 and RC1 are physical registers respectively mapped to logical registers RA and RC prior to mapping the CMOV1XX instruction, RC2 is a physical register mapped to logical register RC after mapping the CMOV1XX instruction, and RC2.P is a predicate field of the physical register RC2. [0046]
  • The second instruction has the following format:[0047]
  • CMOV2XX S_RB, S_RC, D_RC
  • where CMOV2XX indicates that the instruction is a second instruction generated from the CMOVXX instruction, S_RB and S_RC identify logical source registers RB and RC, respectively, and D_RC identifies a logical destination register RC within the processor. The pseudo-code for the CMOV2XX instruction is as follows: [0048]
  • if (RC2.P) RC3=RB1 else RC3=RC2
  • where RB1 and RC2 are physical registers respectively mapped to logical registers RB and RC after mapping the CMOV1XX instruction and prior to mapping the CMOV2XX instruction, RC3 is a physical register mapped to logical register RC after mapping the CMOV2XX instruction, and RC2.P is the predicate field of the physical register RC2. [0049]
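  • The way the two generated instructions cooperate can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the execution circuit: the `PhysReg` class, the functions `cmov1`, `cmov2`, `run_split` and `is_zero`, and the example values are all hypothetical. The sketch moves RB1 when the predicate is set, which is the sense required by the CMOVXX semantics given in the Background (RC receives RB when condition XX exists, and otherwise keeps its prior contents).

```python
from dataclasses import dataclass


@dataclass
class PhysReg:
    value: int = 0     # standard field (what the programmer sees)
    p: bool = False    # predicate field (hidden condition result)


def cmov1(xx, ra1, rc1, rc2):
    """CMOV1XX: record the condition in RC2.P and copy RC1 into RC2."""
    rc2.p = xx(ra1.value)
    rc2.value = rc1.value


def cmov2(rb1, rc2, rc3):
    """CMOV2XX: select between RB1 and RC2 using the predicate carried by RC2."""
    rc3.value = rb1.value if rc2.p else rc2.value


def is_zero(value):
    """Example condition XX: the register stores zero."""
    return value == 0


def run_split(a, b, c, xx):
    """Execute CMOV1XX then CMOV2XX and return the final value of logical RC."""
    ra1, rb1, rc1 = PhysReg(a), PhysReg(b), PhysReg(c)
    rc2, rc3 = PhysReg(), PhysReg()
    cmov1(xx, ra1, rc1, rc2)   # reads RA1 and RC1 only
    cmov2(rb1, rc2, rc3)       # reads RB1 and RC2 only
    return rc3.value


print(run_split(0, 7, 99, is_zero), run_split(5, 7, 99, is_zero))
```

  • Each function reads two registers at most, and the condition result travels with RC2 rather than through a separate operand.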
  • FIG. 6 shows an execution circuit 100 of the instruction execution stage 22 that operates during execution of the CMOV1XX and CMOV2XX instructions. The execution circuit 100 includes a CMOV1XX circuit 102 and a CMOV2XX circuit 104 that connect with physical registers of the register file. The CMOV1XX circuit 102 has a first input port 106 that receives the contents of the standard field of physical register RA1, and a second input port 108 that receives the contents of the standard field of physical register RC1. The CMOV2XX circuit 104 has a first input port 110 that receives the contents of the standard field of physical register RC2, and a second input port 112 that receives the contents of the standard field of physical register RB1. [0050]
  • The input port 110 further receives the predicate field of physical register RC2. In particular, the predicate field of physical register RC2 (i.e., RC2.P) passes through a connection 110A, and the standard field of physical register RC2 passes through a set of connections 110B. [0051]
  • The CMOV1XX circuit 102 includes an evaluation block 114 and a pass-thru block 116. The evaluation block 114 evaluates the contents of physical register RA1 using the function XX (e.g., equal to, greater than, less than, not equal to, etc.). The pass-thru block 116 transfers the standard field of physical register RC1 to the standard field of physical register RC2. [0052]
  • The CMOV2XX circuit 104 includes a multiplexer 118 that selects between the standard field of physical register RC2 (the connection 110B of input port 110) and the standard field of physical register RB1 (input port 112) according to the predicate field of physical register RC2 (the connection 110A of input port 110). The multiplexer 118 outputs the contents of the selected standard field to the standard field of physical register RC3. Accordingly, if the predicate field of physical register RC2 indicates that condition XX exists at physical register RA1, the multiplexer 118 transfers the standard field of physical register RB1 to the standard field of physical register RC3. On the other hand, if the predicate field of physical register RC2 indicates that the condition XX does not exist at physical register RA1, the multiplexer 118 transfers the standard field of physical register RC2 to the standard field of physical register RC3. [0053]
  • [0054] As is shown in FIG. 6, each of the CMOV1XX instruction and the CMOV2XX instruction uses no more than two input ports. An extra connection (e.g., a bit line) is used rather than an entire third input port (multiple bit lines). Each register of the register file preferably has such an extra connection to provide access to the predicate field of that register. Such an arrangement provides substantial savings in semiconductor resources relative to providing each instruction with a third input port.
  • EQUIVALENTS
  • [0055] While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
  • [0056] For example, the circuitry shown in FIG. 6 can be optimized to bypass physical register RC2 such that the output of the evaluation block 114 goes directly to the multiplexer 118 through input port 112 2. Such an optimization removes the steps of storing a value in the predicate field of physical register RC2, and subsequently reading the predicate field of physical register RC2.
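That the bypass leaves the result unchanged can be checked with a small behavioral model. The sketch below is illustrative only: the function names, the `CONDITIONS` table, and the choice to test the value against zero are assumptions, not part of the described circuit.

```python
from typing import Callable, Dict

# Assumed condition functions XX; each tests a value against zero.
CONDITIONS: Dict[str, Callable[[int], bool]] = {
    "EQ": lambda v: v == 0,
    "NE": lambda v: v != 0,
    "LT": lambda v: v < 0,
    "GT": lambda v: v > 0,
}


def via_predicate_field(xx: str, ra: int, rb: int, rc: int) -> int:
    """Unoptimized path: store the condition in RC2's predicate field, then read it."""
    rc2 = {"standard": rc, "predicate": CONDITIONS[xx](ra)}  # write step
    select = rc2["predicate"]                                # read step
    return rc2["standard"] if select else rb


def bypassed(xx: str, ra: int, rb: int, rc: int) -> int:
    """Optimized path: the evaluation result drives the multiplexer select directly."""
    select = CONDITIONS[xx](ra)
    return rc if select else rb
```

Both functions return the same value for every input; the second simply omits the intermediate write and read of the predicate field.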
  • [0057] Furthermore, it should be understood that the invalidated instructions of the fetch blocks 74A and 74B in FIG. 4A may be blank instructions (e.g., NO-OP instructions). Alternatively, the invalidated instructions may be the original instructions (e.g., INSTA, INSTB and INSTC) with corresponding flags (not shown) set to indicate to the processor that these instructions are invalid.
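The two ways of representing invalidated instructions can be sketched as follows. The sketch assumes a fetch block is a short list of instruction slots and follows the arrangement recited in claim 9: in the first block the conditional move becomes the first generated instruction and every instruction after it is invalidated, while in the second block it becomes the second generated instruction and every instruction ahead of it is invalidated. The `Slot` type, the `invalidate` and `split_cmov` functions, the slot ordering, and the mnemonic strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical fetch-block slot: a mnemonic plus a validity flag."""
    op: str
    valid: bool = True


def invalidate(slot: Slot, use_noop: bool) -> Slot:
    """Invalidate a slot either by blanking it or by clearing its flag."""
    if use_noop:
        return Slot("NO-OP")           # blank instruction that does nothing
    return Slot(slot.op, valid=False)  # original instruction, flagged invalid


def split_cmov(block: List[Slot], pos: int,
               use_noop: bool = True) -> Tuple[List[Slot], List[Slot]]:
    """Return the two revised fetch blocks for a conditional move at index `pos`."""
    first, second = [], []
    for i, slot in enumerate(block):
        if i < pos:
            first.append(slot)                         # kept in first block
            second.append(invalidate(slot, use_noop))  # invalid in second
        elif i == pos:
            first.append(Slot("CMOV1XX"))
            second.append(Slot("CMOV2XX"))
        else:
            first.append(invalidate(slot, use_noop))   # invalid in first
            second.append(slot)                        # kept in second block
    return first, second


block = [Slot("INSTA"), Slot("CMOVXX"), Slot("INSTB"), Slot("INSTC")]
first, second = split_cmov(block, pos=1)
flagged_first, flagged_second = split_cmov(block, pos=1, use_noop=False)
```

With `use_noop=True` the invalidated slots are overwritten by blank instructions; with `use_noop=False` the original mnemonics are kept and only the flag changes.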

Claims (25)

What is claimed is:
1. A method for handling a conditional move instruction in an out-of-order data processor, comprising the steps of:
detecting a conditional move instruction within an instruction stream;
generating multiple instructions according to the detected conditional move instruction; and
replacing the conditional move instruction within the instruction stream with the generated multiple instructions.
2. The method of claim 1, wherein the step of generating includes the step of:
forming the multiple instructions such that each of the multiple instructions executes using no more than two input ports of an execution unit.
3. The method of claim 1, wherein the step of generating includes the step of:
providing a first generated instruction that determines whether a condition exists; and
providing a second generated instruction that performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.
4. The method of claim 1, wherein the step of generating includes the step of:
providing a condition result in a first portion of a register, the first portion of the register being separate from a second portion of the register that stores standard register contents; and
providing a second generated instruction that (i) inputs the condition result from the first portion of the register, and (ii) performs an operation according to the condition result.
5. The method of claim 1, further comprising the steps of:
retrieving a first group of instructions from a memory during a first fetch period, the first group of instructions including the conditional move instruction; and
retrieving a second group of instructions from the memory during a second fetch period, the second group following the first group within the instruction stream.
6. The method of claim 5, further comprising the step of:
retrieving the second group of instructions from the memory again during a third fetch period while the multiple instructions are generated simultaneously.
7. The method of claim 1, wherein the step of replacing includes the step of:
overwriting the conditional move instruction with one of the generated multiple instructions; and
overwriting an instruction following the conditional move instruction in the instruction stream with another of the generated multiple instructions.
8. The method of claim 7, wherein the instruction following the conditional move instruction in the retrieved first group of instructions is a blank instruction that performs no operation when executed.
9. The method of claim 1, wherein the step of replacing includes the step of:
retrieving, from the memory, a first group of instructions that includes the conditional move instruction;
generating a second group of instructions having the conditional move instruction from the first group; and
revising the first and second groups of instructions such that (i) the conditional move instruction of the first group is replaced with one of the generated multiple instructions and any instructions following the conditional move instruction of the first group are invalidated, and (ii) the conditional move instruction of the second group is replaced with another of the generated multiple instructions and any instructions ahead of the conditional move instruction of the second group are invalidated.
10. The method of claim 1, further comprising the step of:
associating a same program counter value with each of the generated multiple instructions such that the generated multiple instructions are identifiable when speculative execution occurs down an incorrect instruction branch.
11. The method of claim 1, further comprising the step of:
generating a code that identifies a position of the conditional move instruction within a group of instructions.
12. A pipeline circuit for handling a conditional move instruction in an out-of-order data processor, comprising:
a detect circuit that detects a conditional move instruction within an instruction stream; and
a control circuit, coupled to the detect circuit, that generates multiple instructions according to the detected conditional move instruction, and replaces the conditional move instruction within the instruction stream with the generated multiple instructions.
13. The pipeline circuit of claim 12, wherein the control circuit includes:
an instruction forming circuit that forms the multiple instructions such that each of the multiple instructions executes using no more than two input ports of an execution unit.
14. The pipeline circuit of claim 12, wherein the control circuit includes an output that provides, as the generated multiple instructions:
a first generated instruction that determines whether a condition exists; and
a second generated instruction that performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.
15. The pipeline circuit of claim 12, wherein the control circuit includes an output that provides, as the generated multiple instructions:
a first generated instruction that produces a condition result in a first portion of a register, the first portion of the register being separate from a second portion of the register that stores standard register contents; and
a second generated instruction that (i) inputs the condition result from the first portion of the register, and (ii) performs an operation according to the condition result.
16. The pipeline circuit of claim 15, further comprising:
a register file that includes multiple registers, each of the multiple registers having a standard field and a condition field, a particular one of the multiple registers being the register having the first and second portions, the first portion being a condition field and the second portion being a standard field.
17. The pipeline circuit of claim 12, wherein the control circuit includes:
a sequencing circuit that (i) retrieves a first group of instructions from a memory during a first fetch period, the first group of instructions including the conditional move instruction, and (ii) retrieves a second group of instructions from the memory during a second fetch period, the second group following the first group within the instruction stream.
18. The pipeline circuit of claim 17, wherein the sequencing circuit is further adapted to retrieve the second group of instructions from the memory again during a third fetch period while the multiple instructions are generated simultaneously.
19. The pipeline circuit of claim 12, wherein the control circuit further includes:
instruction logic that (i) overwrites the conditional move instruction with one of the generated multiple instructions, and (ii) overwrites an instruction following the conditional move instruction in the instruction stream with another of the generated multiple instructions.
20. The pipeline circuit of claim 19, wherein the instruction following the conditional move instruction in the retrieved first group of instructions is a blank instruction that performs no operation when executed.
21. The pipeline circuit of claim 12, wherein the control circuit includes circuitry that:
generates a second group of instructions having a conditional move instruction from a first group of instructions having the conditional move instruction; and
revises the first and second groups of instructions such that (i) the conditional move instruction of the first group is replaced with one of the generated multiple instructions and any instructions following the conditional move instruction of the first group are invalidated, and (ii) the conditional move instruction of the second group is replaced with another of the generated multiple instructions and any instructions ahead of the conditional move instruction of the second group are invalidated.
22. The pipeline circuit of claim 12, wherein the control circuit includes:
an instruction sequencer that associates a same program counter value with each of the generated multiple instructions such that the generated multiple instructions are identifiable when speculative execution occurs down an incorrect instruction branch.
23. The pipeline circuit of claim 12, wherein the detect circuit is adapted to generate a code that identifies a position of the conditional move instruction within a group of instructions.
24. A method for handling a conditional move instruction in an out-of-order data processor, comprising the steps of:
detecting a conditional move instruction within an instruction stream;
generating multiple instructions according to the detected conditional move instruction; and
replacing the conditional move instruction within the instruction stream with the generated multiple instructions, the generated multiple instructions being generated such that each of the generated multiple instructions executes using no more than two input ports of an execution unit, the generated multiple instructions including a first generated instruction that produces a condition result indicating whether a condition exists, and a second generated instruction that inputs the condition result as a portion of an operand which identifies a register of the out-of-order data processor, the second generated instruction performing a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.
25. A pipeline circuit for handling a conditional move instruction in an out-of-order data processor, comprising:
a detect circuit that detects a conditional move instruction within an instruction stream; and
a control circuit, coupled to the detect circuit, that (i) generates multiple instructions according to the detected conditional move instruction, and (ii) replaces the conditional move instruction within the instruction stream with the generated multiple instructions, the generated multiple instructions being generated such that each of the generated multiple instructions executes using no more than two input ports of an execution unit, the generated multiple instructions including a first generated instruction that produces a condition result indicating whether a condition exists, and a second generated instruction that inputs the condition result as a portion of an operand which identifies a register of the out-of-order data processor, the second generated instruction performing a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist.
US09/195,121 1998-11-18 1998-11-18 Implementation of a conditional move instruction in an out-of-order processor Expired - Lifetime US6449713B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/195,121 US6449713B1 (en) 1998-11-18 1998-11-18 Implementation of a conditional move instruction in an out-of-order processor

Publications (2)

Publication Number Publication Date
US20020112142A1 (en) 2002-08-15
US6449713B1 (en) 2002-09-10

Family

ID=22720140

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/195,121 Expired - Lifetime US6449713B1 (en) 1998-11-18 1998-11-18 Implementation of a conditional move instruction in an out-of-order processor

Country Status (1)

Country Link
US (1) US6449713B1 (en)

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5163140A (en) * 1990-02-26 1992-11-10 Nexgen Microsystems Two-level branch prediction cache
CA2045791A1 (en) * 1990-06-29 1991-12-30 Richard Lee Sites Branch performance in high speed processor
KR100299691B1 (en) * 1991-07-08 2001-11-22 구사마 사부로 Scalable RSC microprocessor architecture
US5564118A (en) 1992-11-12 1996-10-08 Digital Equipment Corporation Past-history filtered branch prediction
US5426600A (en) * 1993-09-27 1995-06-20 Hitachi America, Ltd. Double precision division circuit and method for digital signal processor
US5974240A (en) * 1995-06-07 1999-10-26 International Business Machines Corporation Method and system for buffering condition code data in a data processing system having out-of-order and speculative instruction execution
US5745724A (en) * 1996-01-26 1998-04-28 Advanced Micro Devices, Inc. Scan chain for rapidly identifying first or second objects of selected types in a sequential list
US5860017A (en) * 1996-06-28 1999-01-12 Intel Corporation Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction
US5889984A (en) * 1996-08-19 1999-03-30 Intel Corporation Floating point and integer condition compatibility for conditional branches and conditional moves
JPH10214188A (en) * 1997-01-30 1998-08-11 Toshiba Corp Method for supplying instruction of processor, and device therefor
US6058472A (en) * 1997-06-25 2000-05-02 Sun Microsystems, Inc. Apparatus for maintaining program correctness while allowing loads to be boosted past stores in an out-of-order machine
US6170052B1 (en) * 1997-12-31 2001-01-02 Intel Corporation Method and apparatus for implementing predicated sequences in a processor with renaming

Also Published As

Publication number Publication date
US6449713B1 (en) 2002-09-10

Legal Events

Date Code Title Description
AS Assignment

Owner name: COMPAQ COMPUTER CORPORATION, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EMER, JOEL SPRINGER;EDWARDS, BRUCE;MCLELLAN, EDWARD J.;AND OTHERS;REEL/FRAME:011236/0289;SIGNING DATES FROM 20000727 TO 20000807

AS Assignment

Owner name: DIGITAL EQUIPMENT CORPORATION, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LEIBHOLZ, DANIEL LAWRENCE;REEL/FRAME:011539/0256

Effective date: 19981110

AS Assignment

Owner name: COMPAQ COMPUTER CORPORATION, TEXAS

Free format text: MERGER;ASSIGNOR:DIGITAL EQUIPMENT CORPORATION;REEL/FRAME:011776/0688

Effective date: 19991209

AS Assignment

Owner name: COMPAQ INFORMATION TECHNOLOGIES GROUP, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:COMPAQ COMPUTER CORPORATION;REEL/FRAME:012460/0775

Effective date: 20010620

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: LASALLE BANK, N.A., ILLINOIS

Free format text: SECURITY AGREEMENT;ASSIGNOR:SLOAN VALVE COMPANY;REEL/FRAME:014683/0095

Effective date: 20030529

Owner name: LASALLE BANK, N.A., ILLINOIS

Free format text: SECURITY INTEREST;ASSIGNOR:SLOAN VALVE COMPANY;REEL/FRAME:015302/0867

Effective date: 20030529

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

AS Assignment

Owner name: SLOAN VALVE COMPANY, ILLINOIS

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENTS, TRADEMARKS AND TRADENAMES;ASSIGNOR:BANK OF AMERICA, N.A. (AS SUCCESSOR-IN-INTEREST TO LASALLE BANK);REEL/FRAME:056728/0307

Effective date: 20210630