US20020032852A1 - Backing out of a processor architectural state - Google Patents

Info

Publication number
US20020032852A1
Authority
US
United States
Prior art keywords
instruction
register
registers
processor
set forth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/132,042
Other versions
US6412067B1 (en
Inventor
Ricardo Ramirez
Michael J. Morrison
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to US09/132,042 priority Critical patent/US6412067B1/en
Assigned to INTEL CORPORATION reassignment INTEL CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MORRISON, MICHAEL J., RAMIREZ, RICARDO
Publication of US20020032852A1 publication Critical patent/US20020032852A1/en
Application granted granted Critical
Publication of US6412067B1 publication Critical patent/US6412067B1/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861: Recovery, e.g. branch miss-prediction, exception handling
    • G06F9/3863: Recovery, e.g. branch miss-prediction, exception handling using multiple copies of the architectural state, e.g. shadow registers

Definitions

  • the present invention relates generally to computers and processors, and more specifically, to delaying the deallocation of registers and backing out of architectural states.
  • Processors fetch and execute a sequence of instructions from memory.
  • the instructions ordinarily manipulate data stored in memory or registers.
  • the processor decodes the instructions into first and second types of instructions adapted to execution on particular types of hardware units.
  • the first type of micro-instruction loads and stores data between the memory and registers, which are typically internal to the processor.
  • the second type of micro-instruction manipulates data stored in the internal registers and writes the results from the manipulations back to the internal registers. Since the number of internal registers is limited, an absence of available internal registers may occur causing a bottleneck at the decode stage.
  • the processor ordinarily employs methods that efficiently use the internal registers to reduce the occurrence of decode bottlenecks.
  • One mechanism for using the limited number of internal registers entails producing instructions through several operations.
  • the processor decodes an incoming instruction into one or more instructions having logical operands.
  • logical operands are defined to mean dummy variables for some source and destination addresses of instructions.
  • an allocator assigns one or more of the available internal registers to the logical operands introduced in the first step.
  • a retirement unit deallocates the previously assigned internal registers of executed instructions without substantial delay when other instructions no longer need to read the contents of the registers. Deallocation makes more internal registers available for assignment to newly decoded instructions. Thus, retirement units should rapidly deallocate registers to reduce the occurrence of instruction decode bottlenecks.
  • Exceptions may be attributable to interrupts and faults generated during execution of instructions. Recovering from an exception involves both detecting the exception and reporting the exception to hardware that may re-execute any improperly executed instructions. The proper re-execution normally involves returning the processor to a pre-exception state. Thus, re-execution may include restoring original data to internal registers and reinserting the excepting instruction and the instructions dependent thereupon back into execution pipelines.
  • a system designed to detect and report all exceptions may employ substantial hardware, i.e., a large area on the processor chip, and may encumber the ordinary retirement cycle.
  • the detection of complex fault events may entail heavy area and time costs, because more verifications are ordinarily employed to check for complex faults.
  • Complex fault detection may slow the retirement process with verifications for rarely occurring faults.
  • an exception may occur on both the earlier and later members of the sequence, e.g., μI 1 or μI 2 .
  • Two methods may be pursued to recover from an exception on a later member, e.g., μI 2 .
  • the processor may correct the condition causing the exception and re-execute only the excepting instructions by (a) detecting which instruction excepted, and (b) reinstating the initial execution state associated therewith.
  • the processor may correct the condition causing the exception and re-execute the entire sequence, i.e., μI′ 1 , μI′ 2 , etc., whenever any member of the sequence registers an exception. Implementing either of the above methods may be problematic.
  • the sequence from the macro-instruction may include “retired” instructions, because earlier members, e.g., μI 1 , may have completed execution.
  • the instruction R+R′→R destroys the original data in R when the instruction is retired, i.e., the architectural state has changed.
  • executing earlier members of the sequence, e.g., μI 1
  • Prior art processors may not handle exceptions on instructions produced by decoding a single macro-instruction efficiently.
  • the present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
  • a first aspect of the present invention provides an apparatus.
  • the processor has a plurality of registers.
  • the processor is capable of re-executing at least one selected instruction by backing out of an architectural register state.
  • a second aspect of the present invention provides a method for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction.
  • FIG. 1 is a high-level block diagram of an embodiment, in accordance with the present invention, of a processor that delays the deallocation of a portion of the registers;
  • FIG. 2 is a flowchart illustrating an embodiment of a method for executing instructions in the processor of FIG. 1;
  • FIG. 3 is a high-level block diagram of an embodiment of a processor having a back-out register for use in delaying the deallocation of selected registers;
  • FIG. 4 is a flowchart illustrating an embodiment of a method for re-executing selected instructions in the processor of FIG. 3;
  • FIG. 5 is a flowchart illustrating an embodiment of a method of re-executing selected instructions in the processor of FIG. 3 by backing out of an architectural state;
  • FIG. 6 is a high-level block diagram of an embodiment of a processor which implements speculative execution and also backs out of architectural states for re-executions involving selected registers;
  • FIG. 7 illustrates a time line of the register allocation table and back-out register as instructions progress through the processor of FIG. 6;
  • FIG. 8 is a flowchart illustrating an embodiment of one method of operating the processor of FIG. 6.
  • an architectural state is the state of a processor's registers and memories after writes by all executed instructions determined to have executed properly, i.e., after writes by all properly retired instructions.
  • a speculative state is the state of a processor's registers and memories after writes by all executed instructions, i.e., after writes by all executed instructions whether or not the instructions have been determined to have executed properly.
  • a processor updates a speculative state to an architectural state after a retirement unit determines that the instructions, which will update the state, have properly executed.
  • FIG. 1 is a high-level block diagram illustrating a portion of a first embodiment of a processor 20 that delays the deallocation of selected registers.
  • the selected registers include all internal registers.
  • An allocator 21 is a hardware device that assigns registers 22 , 23 , 24 to a portion of the logical operands of incoming instructions.
  • the registers 22 , 23 , 24 belong to a register file 25 , i.e., a hardware structure for handling and directing accesses of the plurality of internal registers 22 , 23 , 24 .
  • a retirement unit 26 retires instructions that have been executed by an execution unit 27 . The retirement unit 26 deallocates the registers 22 , 23 , 24 .
  • Deallocation makes the registers 22 , 23 , 24 available for assignment to new incoming instructions by the allocator 21 .
  • the retirement unit 26 delays the deallocation of a selected portion of the registers 22 , 23 , 24 .
  • the registers 22 , 23 , 24 may be classified into available registers, used registers, and delayed registers.
  • the allocator 21 may assign an “available” register to a logical operand of an incoming instruction.
  • the allocator 21 may not assign either “used” or “delayed” registers to the logical operands of incoming instructions.
  • the “used” and the “delayed” registers are not deallocated in the sense that the allocator 21 may not reassign them to a destination logical operand of incoming instruction.
  • at least one instruction may read or write a “used” register. Active instructions may neither read nor write “delayed” registers.
  • the processor 20 saves the identifiers of “delayed” registers.
  • Register identifiers are physical addresses. Thus, execution results in “delayed” registers may be accessed and used even though the instructions that produced the results are retired, and the results have been removed from the processor's architectural state.
  • a class of registers stores a type of data, e.g., floating-point data, integer data, predicate values, multimedia data, etc.
  • the term class may also apply to logical operands, e.g., floating-point data, integer data, predicate values, multimedia data, etc.
  • the classes may also store types of data which are not enumerated above.
  • FIG. 2 is a flowchart illustrating an embodiment of a method 30 for executing instructions in the processor 20 of FIG. 1.
  • the allocator 21 assigns a first register 23 to a first logical operand of a first instruction.
  • the allocator 21 assigns a second register 24 to a second logical operand of a second instruction.
  • the second instruction follows the first instruction in the instruction sequence.
  • the first and second logical operands are the same logical operand. In other embodiments, the first and second logical operands may be different logical operands as long as they belong to the same preselected class, e.g., floating point.
  • the execution unit 27 executes the first and second instructions.
  • the retirement unit 26 retires the executed first instruction.
  • the retirement unit 26 saves the identifier of the first register in response to retiring the second instruction.
  • the first register 23 is a “delayed” register, i.e. the contents therein may still be retrieved.
  • the retirement unit 26 delays the deallocation, and saves the identifiers, of preselected classes of the registers 22 , 23 , 24 and logical operands, or of the registers of selected classes of instructions. Different embodiments may select different classes of the registers 22 , 23 , 24 and logical operands or different classes of instructions. In specific embodiments, the retirement unit 26 may delay the deallocation of one or of more than one selected class of registers. In some embodiments, the retirement unit 26 may delay the deallocation of the registers 22 , 23 , 24 associated with specific instruction classes, e.g., one or more registers 22 , 23 , 24 assigned to instructions resulting from the decoding of a selected single macro-instruction.
  • FIG. 3 is a high-level block diagram illustrating one embodiment of a processor 38 that includes a back-out register file 39 to delay the deallocation of selected registers.
  • the back-out register file 39 has storage positions 40 , 41 to save the identifiers of the registers 22 , 23 , 24 , previously assigned to the selected destination logical operand of one or more retired instructions.
  • the back-out register file 39 may comprise one or several registers.
  • the retirement unit 26 writes the identifier of the register 22 , 23 , 24 assigned to a destination logical operand of a selected and retired first instruction to the back-out register file 39 in response to retiring a second instruction belonging to the same class.
  • the second instruction is an instruction having the same destination logical operand as the first instruction.
  • the previously assigned registers might have been deallocated, because unretired instructions may no longer read the data stored in the register assigned to the first instruction after the retirement of the second instruction having the same destination logical register.
  • the portion of the registers 22 , 23 , 24 with identifiers stored in the back-out register file 39 are “delayed” registers.
  • the processor 38 re-executes instructions in response to selected exceptions detected by the retirement unit 26 .
  • a decoder 44 translates incoming instructions into sequences of instructions, and sends the instructions to the allocator 21 .
  • the retirement unit 26 detects selected exceptions and sets instructions for re-execution in response to the selected exceptions.
  • the retirement unit 26 may signal microcode 45 to prepare instructions for re-execution.
  • Microcode is a combination of hardware and specialized permanent memory, e.g., read-only memory (ROM), that performs a special function and is ordinarily internal to the processor.
  • In response to the signal from the retirement unit 26 , the microcode 45 reads the back-out register file 39 to obtain the identifiers of the portion of the registers 22 , 23 , 24 previously assigned to the selected logical operands.
  • the microcode 45 produces machine code for the instructions for re-execution.
  • selected logical operands are assigned the portion of the registers 22 , 23 , 24 which were previously assigned and correspond to the identifiers saved in the back-out register file 39 , i.e., the “delayed” registers.
  • the microcode 45 introduces the previous register assignments in the machine code of the instructions for re-execution. This may be referred to as “backing out architectural register assignments.”
  • An output line 46 sends the instructions to re-execute from the microcode 45 to the execution unit 27 .
  • FIG. 4 is a flowchart illustrating an embodiment of a method 47 for re-executing selected instructions in the processor 38 of FIG. 3.
  • the method includes backing out of an architectural register state.
  • the processor 38 executes a first instruction having a first register as a destination address.
  • the retirement unit 26 retires a second instruction having a second register 22 as a destination address.
  • the first and second registers 23 , 22 have been assigned to the same selected logical operand by the allocator 21 .
  • the retirement unit 26 makes the second register 22 a “delayed” register in response to determining that the first instruction is ready to retire.
  • the retirement unit 26 retires the first instruction, already having retired the second instruction.
  • the processor 38 re-executes a third instruction having the selected logical operand as a source or as a destination address.
  • the third instruction may be one of the above-mentioned instructions or another instruction.
  • Re-executing includes reassigning the second register 22 , i.e. a delayed register, to the same selected logical operand in the third instruction. By reassigning the second register to the selected logical operand, re-execution backs out of the architectural assignment.
  • some embodiments may deallocate a register if another instruction having a destination register of the selected class retires. For example, at block 53 , the retirement unit 26 deallocates the first register 23 in response to retiring a fourth instruction having the different register 24 assigned to the same logical operand.
  • the storage positions 40 , 41 may store the identifiers of both the portion of the registers 22 , 23 , 24 previously assigned and before previously assigned to the selected logical operands. Such an embodiment may back-out of several changes to the architectural register state.
  • the storage positions 40 , 41 may store the identifiers of a portion of the registers 22 , 23 , 24 previously assigned to several selected logical operands.
  • FIG. 5 is a flowchart illustrating an embodiment of a method 54 of re-executing instructions in the processor 38 of FIG. 3 by backing out of an architectural register state. Blocks 48 , 49 , 50 , 51 , and 52 were described in FIG. 4.
  • the retirement unit 26 writes the identity of the second register 22 to one of the positions 40 , 41 of the back-out register file 39 .
  • the positions 40 , 41 correspond to a destination logical operand X to which the second register was assigned by the allocator 21 .
  • the retirement unit 26 sets a third instruction for re-execution, e.g., in response to an exception.
  • the third instruction has the X logical operand as a source address.
  • the microcode 45 reads the back-out register file 39 to determine the identifier of the previously assigned register for the logical operand X, i.e., the second register 22 , and reassigns the identifier therefrom to the logical operand X in the third instruction.
  • the microcode 45 redirects the decoder 44 to send the third instruction for re-execution with the logical operand X replaced by the second register 22 .
  • the processor 38 backs out of an architectural register state to re-execute the third instruction.
  • the first and second instructions of FIGS. 4 and 5 result from decoding an incoming “packed” floating-point macro-instruction, i.e., an instruction performing several floating-point operations in parallel, or a multimedia macro-instruction.
  • the exceptions stimulating a back-out of an architectural state occur on the second or later sequential instruction of the same class.
  • the processor 38 of FIG. 3 recovers from exceptions on either the first or the second instructions by correcting and re-executing both. In some embodiments, correcting and re-executing both instructions may reduce the time and hardware used to detect exceptions.
  • the architectural register state is no longer proper for re-executing the first or earlier sequential instruction, but the back-out register file 39 enables the processor 38 to restore the proper register state.
  • Some embodiments may increase efficiency by re-executing all instructions coming from decoding a selected macro-instruction even when only a subset of the instructions encounters exceptions. This method may reduce the amount of hardware employed for detecting exceptions. Similarly, less operating time may be used to determine whether any, as opposed to which, of the selected instructions encountered an exception. In some embodiments, the time costs to individually detect the selected exceptions are high, and the selected exceptions are rare. In that case, the added time to re-execute all the instructions coming from decoding a single macro-instruction may be less than the total time saved, and re-execution by the back-out methods of FIGS. 4 and 5 may increase the effective performance of a processor.
  • FIG. 6 is a high-level block diagram illustrating an embodiment of an out-of-order processor 60 that employs speculative execution and also backs out of some architectural states for re-executions involving selected registers 22 , 23 , 24 .
  • a line 61 brings incoming instructions to a decoder 64 .
  • the decoder 64 includes a multiplexer (MUX) 63 having first and second input ports 62 , 100 .
  • the first and second input ports 62 , 100 receive newly decoded instructions and instructions for re-execution, respectively.
  • the decoder 64 sends instructions from an output port 65 of the MUX 63 to the allocator 21 .
  • the allocator 21 may write and read identifiers of a portion of the registers 22 , 23 , 24 to and from a register allocation table (“RAT”) 66 .
  • the rows 67 , 68 , 69 of the RAT 66 have both speculative and architectural assignment positions 70 , 71 to store the identifiers of the portion of the registers 22 , 23 , 24 assigned to the destination logical operands of the instructions.
  • the execution units 27 , 72 in the particular embodiment of FIG. 6 may also execute the instructions out-of-order.
  • a reorder queue (“ROQ”) 73 saves the original instruction sequence so that retirement of executed instructions may be performed in-order.
  • the retirement unit 26 may write the identifiers of selected classes of the registers 22 , 23 , 24 to a back-out register 74 having one or more storage positions (not shown).
  • the allocator 21 assignments are initially speculative.
  • the retirement unit 26 flushes unretired instructions from portions of the processor 60 between the allocator 21 and the retirement unit 26 in response to certain exceptions.
  • the processor 60 may copy the entries of the architectural assignment positions 71 to the speculative assignment positions 70 .
  • re-execution of unretired and improperly executed instructions may start from the earlier state defined by the architectural register assignments.
  • the speculative assignments become architectural in response to the proper retirement of the instruction to which the assignments were made.
  • re-execution in response to such exceptions does not entail backing out of the “architectural” state defined by the assignments of retired instructions.
  • FIG. 7 is a time line 80 of the RAT 66 and the back-out register 74 as instructions I 0 and I 1 progress through the embodiment of the processor 60 illustrated in FIG. 6.
  • the instruction I 0 retires.
  • the row 68 of the RAT 66 for the logical operand X stores the identifier of the register R 2 in both the speculative and the architectural assignment positions 70 , 71 , because allocator 21 had assigned register R 2 to I 0 .
  • the speculative and architectural assignment positions 70 , 71 may store identical identifiers between the retirement of an instruction and the allocation of a new register 22 , 23 , 24 to a second instruction having the same destination logical operand as the first instruction.
  • the entries R 2 , R 3 , and R 4 of the RAT 66 are “used” registers, meaning they may be read by active and/or incoming instructions. Active and incoming instructions may read data from the registers R 2 and R 3 in the speculative assignment positions 70 . Active and incoming instructions may also read the registers R 2 and R 4 in the architectural assignment positions 71 if the retirement unit 26 copies architectural register assignments to corresponding speculative assignments in response to an exception. As discussed with respect to FIG. 6, this corresponds to a re-execution without a back-out from an architectural state, which is instituted for certain exceptions in the embodiment of FIG. 6.
  • the register identifiers R 2 , R 3 , and R 4 in either the speculative or the architectural assignment positions 70 , 71 correspond to physical addresses of “used” registers 22 , 23 , 24 , because unretired instructions may read the data stored therein in this embodiment.
  • the allocator 21 assigns the register R 1 to the logical operand X of the instruction I 1 at block 86 .
  • the speculative assignment position 70 for the logical operand X stores the identifier R 1 in response to assignment of block 86 .
  • the instruction I 1 retires without exceptions.
  • the retirement unit 26 writes the identifier R 1 to the architectural assignment position 71 for the logical operand X and writes the identifier R 2 , from the previous architectural assignment for X, to the back-out register 74 .
  • the register R 2 is a “delayed” register, as defined above, because unretired instructions may not read R 2 even in response to an exception.
  • the register R 2 may be read if the processor 60 performs a re-execution by backing out of the writes by retired instructions, i.e., instructions that were properly executed.
  • the processor 60 may handle a selected class of exceptions in a manner that includes backing out of writes by selected retired instructions, i.e., instructions that have been determined to have properly executed.
  • the processor 60 backs out of writes by the retired instructions to execute a new instruction in response to the selected class of exceptions.
  • the new instruction is executed in a “previous” architectural register state.
  • the back-out register 74 stores register assignments for logical operands of the selected retired instructions. These register assignments enable backing out of the present architectural register state so that the execution of the new instruction can be performed with the “previous” architectural state.
  • back-out execution enables the processor 60 to execute an entire sequence of instructions in a previous architectural register state. For example, one embodiment performs a back-out execution of a new sequence of instructions, i.e., μI′ 1 , μI′ 2 , etc., in response to exceptions occurring on any instruction of a selected sequence μI 1 , μI 2 , etc., wherein the sequence comes from decoding one macro-instruction.
  • the new sequence μI′ 1 , μI′ 2 may differ from the original sequence, μI 1 , μI 2 , etc., to correct the problems that caused the exception.
  • the processor 60 effectively re-executes all of the sequence.
  • such a procedure may reduce the hardware and time costs employed for detecting the selected exceptions.
  • the retirement unit 26 delays the deallocation of selected registers 22 , 23 , 24 of previously retired instructions by transferring the corresponding identifiers of the registers 22 , 23 , 24 from the architectural assignment positions 71 to the back-out register 74 .
  • the retirement unit 26 does not inform the allocator 21 that the delayed registers 22 , 23 , 24 are available.
  • the retirement unit 26 writes the identifiers of the registers 22 , 23 , 24 of the previously retired instructions to the back-out register 74 in response to determining that a later instruction, having the same destination logical operand, is ready to retire.
  • the decoder 64 receives instructions for back-out re-execution from line 100 .
  • the retirement unit 26 directs the back-out re-execution by a signal to a select input port 102 of the MUX 63 .
  • the selected logical operands of the instructions for back-out re-execution are assigned register identifiers from the back-out register 74 .
  • the logical operand X becomes the register R 2 in the example of block 92 in FIG. 7.
  • microcode (not shown) creates the machine code for the instructions for back-out re-execution.
  • the machine code may also contain one or more bits that direct the allocator 21 not to assign other registers 22 , 23 , 24 to logical operands already assigned identifiers of “delayed” registers.
  • FIG. 8 is a flowchart illustrating an embodiment 110 of a method of operating the processor 60 of FIG. 6.
  • the allocator 21 receives a first instruction having a destination logical operand X.
  • the allocator 21 assigns a first register to the logical operand X and writes the corresponding first identifier thereof to the speculative assignment position 70 in the RAT 66 for X.
  • Subsequent instructions with the source logical operand X will read the first register.
  • the retirement unit 26 retires the executed first instruction and writes the first identifier to the architectural assignment position 71 for X.
  • the allocator 21 writes a second identifier, corresponding to a second register, to the speculative assignment position 70 for X in response to the second instruction having the destination logical operand X.
  • the retirement unit 26 writes the first identifier from the architectural assignment position 71 for X to the back-out register 74 in response to determining that the second instruction is ready to retire.
  • the first register is a delayed register, and active and/or incoming instructions may neither read from nor write to the first register.
  • the retirement unit 26 writes the second identifier to the architectural assignment position 71 for X in response to retiring the second instruction.
  • some embodiments deallocate the first register in response to retiring another instruction with the destination logical operand X.
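
The bookkeeping described for FIGS. 7 and 8 can be illustrated with a minimal Python sketch. It is an assumption-laden software model, not the disclosed hardware: the class `RenameModel`, its method names, and the extra register name `R5` are hypothetical, and the two retirement-time writes (saving the previous identifier to the back-out register, then updating the architectural position) are folded into one `retire` call for brevity.

```python
# Hypothetical model of the FIG. 8 bookkeeping; not the patented hardware.
class RenameModel:
    def __init__(self, available):
        self.available = list(available)  # registers the allocator may assign
        self.speculative = {}    # logical operand -> register (positions 70)
        self.architectural = {}  # logical operand -> register (positions 71)
        self.back_out = {}       # logical operand -> "delayed" register (74)

    def allocate(self, operand):
        """Assign an available register to a destination logical operand."""
        reg = self.available.pop(0)
        self.speculative[operand] = reg
        return reg

    def retire(self, operand, reg):
        """Retire an instruction that wrote `reg` for `operand`."""
        previous = self.architectural.get(operand)
        if previous is not None:
            # A register delayed by an earlier retirement is deallocated now.
            older = self.back_out.pop(operand, None)
            if older is not None:
                self.available.append(older)
            # The previous architectural register becomes "delayed":
            # its identifier is saved instead of being deallocated.
            self.back_out[operand] = previous
        self.architectural[operand] = reg


model = RenameModel(["R2", "R1", "R5"])
first = model.allocate("X")    # first instruction: X -> R2
model.retire("X", first)
second = model.allocate("X")   # second instruction: X -> R1
model.retire("X", second)      # R2 is now delayed, not available
```

After the second retirement, the architectural assignment for X is R1 while R2 stays out of the available pool, so its contents could still be reached through the saved identifier. Retiring a further instruction that writes X would return R2 to the pool, matching the deallocation some embodiments perform.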

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A processor having a plurality of registers is provided. The processor is capable of re-executing at least one selected instruction by backing out of an architectural register state. A method is provided for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction.
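
As a rough illustration of the reassignment step in the method summarized above, the following Python sketch substitutes saved register identifiers for selected logical operands before an instruction is sent for re-execution. The `MicroOp` record, the `back_out` function, and the operand and opcode names are hypothetical and chosen only for this example; the source does not specify an instruction encoding.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class MicroOp:
    # Illustrative instruction record; field names are assumptions.
    opcode: str
    dest: str
    sources: Tuple[str, ...]


def back_out(op: MicroOp, saved: Dict[str, str]) -> MicroOp:
    """Return a copy of `op` in which every logical operand that has a saved
    identifier is replaced by that previously assigned register."""
    def resolve(name: str) -> str:
        # Operands without a saved identifier are left for normal allocation.
        return saved.get(name, name)

    return replace(
        op,
        dest=resolve(op.dest),
        sources=tuple(resolve(s) for s in op.sources),
    )


saved_identifiers = {"X": "R2"}            # logical operand X was R2 before
original = MicroOp("add", "Y", ("X", "Z"))
reissued = back_out(original, saved_identifiers)
```

Here `reissued` reads R2 in place of the logical operand X, so the instruction would execute against the register contents of the previous architectural state, while Y and Z remain logical operands.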

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0001]
  • The present invention relates generally to computers and processors, and more specifically, to delaying the deallocation of registers and backing out of architectural states. [0002]
  • 2. Description of the Related Art [0003]
  • Processors fetch and execute a sequence of instructions from memory. The instructions ordinarily manipulate data stored in memory or registers. Typically, the processor decodes the instructions into first and second types of instructions adapted to execution on particular types of hardware units. The first type of micro-instruction loads and stores data between the memory and registers, which are typically internal to the processor. The second type of micro-instruction manipulates data stored in the internal registers and writes the results from the manipulations back to the internal registers. Since the number of internal registers is limited, an absence of available internal registers may occur causing a bottleneck at the decode stage. The processor ordinarily employs methods that efficiently use the internal registers to reduce the occurrence of decode bottlenecks. [0004]
  • One mechanism for using the limited number of internal registers entails producing instructions through several operations. First, the processor decodes an incoming instruction into one or more instructions having logical operands. Hereafter, logical operands are defined to mean dummy variables for some source and destination addresses of instructions. Second, an allocator assigns one or more of the available internal registers to the logical operands introduced in the first step. Third, a retirement unit deallocates the previously assigned internal registers of executed instructions without substantial delay when other instructions no longer need to read the contents of the registers. Deallocation makes more internal registers available for assignment to newly decoded instructions. Thus, retirement units should rapidly deallocate registers to reduce the occurrence of instruction decode bottlenecks. [0005]
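  • The allocate-and-deallocate cycle described above can be sketched with a simple free list. This is an illustrative model only; the class name `FreeListAllocator` and its methods are assumptions made for the example and do not come from the disclosure.

```python
from collections import deque


class FreeListAllocator:
    """Toy model: maps logical operands to a limited pool of internal registers."""

    def __init__(self, num_registers):
        # All internal registers start out available.
        self.free = deque(range(num_registers))
        self.mapping = {}  # logical operand -> register identifier

    def allocate(self, logical_operand):
        # Returns None when no register is available, which models
        # the bottleneck at the decode stage.
        if not self.free:
            return None
        reg = self.free.popleft()
        self.mapping[logical_operand] = reg
        return reg

    def deallocate(self, reg):
        # Models the retirement step: the register becomes available again.
        self.free.append(reg)


alloc = FreeListAllocator(num_registers=2)
r_x = alloc.allocate("X")
r_y = alloc.allocate("Y")
stalled = alloc.allocate("Z")  # no register left, so decode would stall
alloc.deallocate(r_x)          # retirement frees the register assigned to X
r_z = alloc.allocate("Z")      # allocation can proceed again
```

  • In this sketch a slow `deallocate` directly lengthens the window in which `allocate` returns `None`, which is why rapid deallocation matters in the conventional scheme.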
  • Processors also have hardware for recovering from what are referred to as execution “exceptions”. Exceptions may be attributable to interrupts and faults generated during execution of instructions. Recovering from an exception involves both detecting the exception and reporting the exception to hardware that may re-execute any improperly executed instructions. The proper re-execution normally involves returning the processor to a pre-exception state. Thus, re-execution may include restoring original data to internal registers and reinserting the excepting instruction and the instructions dependent thereupon back into execution pipelines. [0006]
  • A system designed to detect and report all exceptions may employ substantial hardware, i.e., a large area on the processor chip, and may encumber the ordinary retirement cycle. The detection of complex fault events may entail heavy area and time costs, because more verifications are ordinarily employed to check for complex faults. Complex fault detection may slow the retirement process with verifications for rarely occurring faults. [0007]
  • For a macro-instruction, I1, decoding into a sequence μI1, μI2, etc., an exception may occur on both the earlier and later members of the sequence, e.g., μI1 or μI2. Two methods may be pursued to recover from an exception on a later member, e.g., μI2. First, the processor may correct the condition causing the exception and re-execute only the excepting instructions by (a) detecting which instruction excepted, and (b) reinstating the initial execution state associated therewith. Second, the processor may correct the condition causing the exception and re-execute the entire sequence, i.e., μI′1, μI′2, etc., whenever any member of the sequence registers an exception. Implementing either of the above methods may be problematic. [0008]
  • Since detecting exceptions on individual members of a sequence may be complex, re-executing the entire sequence from decoding the macro-instruction may save time and reduce hardware needs. But, the sequence from the macro-instruction may include “retired” instructions, because earlier members, e.g., μI1, may have completed execution. For example, the instruction R+R′→R destroys the original data in R when the instruction is retired, i.e., the architectural state has changed. Thus, executing earlier members of the sequence, e.g., μI1, may be problematic. Prior art processors may not handle exceptions on instructions produced by decoding a single macro-instruction efficiently. [0009]
  • The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above. [0010]
  • SUMMARY OF THE INVENTION
  • A first aspect of the present invention provides an apparatus that includes a processor. The processor has a plurality of registers. The processor is capable of re-executing at least one selected instruction by backing out of an architectural register state. A second aspect of the present invention provides a method for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction. [0011]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the drawings in which: [0012]
  • FIG. 1 is a high-level block diagram of an embodiment, in accordance with the present invention, of a processor that delays the deallocation of a portion of the registers; [0013]
  • FIG. 2 is a flowchart illustrating an embodiment of a method for executing instructions in the processor of FIG. 1; [0014]
  • FIG. 3 is a high-level block diagram of an embodiment of a processor having a back-out register for use in delaying the deallocation of selected registers; [0015]
  • FIG. 4 is a flowchart illustrating an embodiment of a method for re-executing selected instructions in the processor of FIG. 3; [0016]
  • FIG. 5 is a flowchart illustrating an embodiment of a method of re-executing selected instructions in the processor of FIG. 3 by backing out of an architectural state; [0017]
  • FIG. 6 is a high-level block diagram of an embodiment of a processor which implements speculative execution and also backs out of architectural states for re-executions involving selected registers; [0018]
  • FIG. 7 illustrates a time line of the register allocation table and back-out register as instructions progress through the processor of FIG. 6; and [0019]
  • FIG. 8 is a flowchart illustrating an embodiment of one method of operating the processor of FIG. 6. [0020]
  • While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims. [0021]
  • DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
  • Specific embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it will be appreciated that such a development effort, even if complex and time-consuming, would be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. [0022]
  • Hereafter, an architectural state is the state of a processor's registers and memories after writes by all executed instructions determined to have executed properly, i.e., after writes by all properly retired instructions. A speculative state is the state of a processor's registers and memories after writes by all executed instructions, i.e., after writes by all executed instructions whether or not the instructions have been determined to have executed properly. A processor updates a speculative state to an architectural state after a retirement unit determines that the instructions, which will update the state, have properly executed. [0023]
  • FIG. 1 is a high-level block diagram illustrating a portion of a first embodiment of a processor 20 that delays the deallocation of selected registers. In some embodiments, the selected registers include all internal registers. An allocator 21 is a hardware device that assigns registers 22, 23, 24 to a portion of the logical operands of incoming instructions. In some embodiments, the registers 22, 23, 24 belong to a register file 25, i.e., a hardware structure for handling and directing accesses of the plurality of internal registers 22, 23, 24. A retirement unit 26 retires instructions that have been executed by an execution unit 27. The retirement unit 26 deallocates the registers 22, 23, 24 that were assigned to the executed instructions. Deallocation makes the registers 22, 23, 24 available for assignment to new incoming instructions by the allocator 21. The retirement unit 26 delays the deallocation of a selected portion of the registers 22, 23, 24. [0024]
  • Still referring to FIG. 1, the registers 22, 23, 24 may be classified into available registers, used registers, and delayed registers. The allocator 21 may assign an “available” register to a logical operand of an incoming instruction. The allocator 21 may not assign either “used” or “delayed” registers to the logical operands of incoming instructions. The “used” and the “delayed” registers are not deallocated in the sense that the allocator 21 may not reassign them to a destination logical operand of an incoming instruction. By definition, at least one instruction may read or write a “used” register. Active instructions may neither read nor write “delayed” registers. The processor 20 saves the identifiers of “delayed” registers. Register identifiers are physical addresses. Thus, execution results in “delayed” registers may be accessed and used even though the instructions that produced the results are retired, and the results have been removed from the processor's architectural state. [0025]
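  • The three-way classification above can be summarized as a small set of predicates. The sketch below is a reading aid under stated assumptions: the enum `RegState` and the function names are hypothetical and are not part of the disclosure.

```python
from enum import Enum


class RegState(Enum):
    AVAILABLE = "available"
    USED = "used"
    DELAYED = "delayed"


def allocator_may_assign(state):
    # Only "available" registers may be assigned to a logical operand
    # of an incoming instruction.
    return state is RegState.AVAILABLE


def active_instructions_may_access(state):
    # Active instructions may read or write "used" registers,
    # but neither read nor write "delayed" registers.
    return state is RegState.USED


def reachable_through_saved_identifier(state):
    # The processor saves the identifiers of "delayed" registers, so their
    # contents can still be reached during a back-out.
    return state is RegState.DELAYED


summary = {
    s.value: (
        allocator_may_assign(s),
        active_instructions_may_access(s),
        reachable_through_saved_identifier(s),
    )
    for s in RegState
}
```

  • Each state satisfies exactly one predicate in this model, which mirrors the text: a “delayed” register is neither assignable nor visible to active instructions, yet its contents are not lost.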
  • Hereafter, a class of registers stores a type of data, e.g., floating-point data, integer data, predicate values, multimedia data, etc. The term class may also apply to logical operands, e.g., floating-point data, integer data, predicate values, multimedia data, etc. In various embodiments, the classes may also store types of data, which are not enumerated above. [0026]
  • FIG. 2 is a flowchart illustrating an embodiment of a method 30 for executing instructions in the processor 20 of FIG. 1. At block 31, the allocator 21 assigns a first register 23 to a first logical operand of a first instruction. At block 32, the allocator 21 assigns a second register 24 to a second logical operand of a second instruction. The second instruction follows the first instruction in the instruction sequence. In the illustrated embodiments, the first and second logical operands are the same logical operand. In other embodiments, the first and second logical operands may be different logical operands as long as they belong to the same preselected class, e.g., floating point. At block 33, the execution unit 27 executes the first and second instructions. At block 34, the retirement unit 26 retires the executed first instruction. At block 35, the retirement unit 26 saves the identifier of the first register in response to retiring the second instruction. The first register 23 is a “delayed” register, i.e., the contents therein may still be retrieved. [0027]
  • Still referring to FIGS. 1 and 2, the retirement unit 26 delays the deallocation and saves the identifiers of preselected classes of the registers 22, 23, 24 and logical operands or of the registers of selected classes of instructions. Different embodiments may select different classes of the registers 22, 23, 24 and logical operands or different classes of instructions. In specific embodiments, the retirement unit 26 may delay the deallocation of one or of more than one selected class of registers. In some embodiments, the retirement unit 26 may delay the deallocation of the registers 22, 23, 24 associated with specific instruction classes, e.g., one or more registers 22, 23, 24 assigned to instructions resulting from the decoding of a selected single macro-instruction. [0028]
  • Some embodiments in accordance with the invention may employ “backing out” of an architectural state in a processor that speculatively executes instructions. FIG. 3 is a high-level block diagram illustrating one embodiment of a processor 38 that includes a back-out register file 39 to delay the deallocation of selected registers. The back-out register file 39 has storage positions 40, 41 to save the identifiers of the registers 22, 23, 24, previously assigned to the selected destination logical operand of one or more retired instructions. The back-out register file 39 may comprise one or several registers. The retirement unit 26 writes the identifier of the register 22, 23, 24 assigned to a destination logical operand of a selected and retired first instruction to the back-out register file 39 in response to retiring a second instruction belonging to the same class. In some embodiments, the second instruction is an instruction having the same destination logical operand as the first instruction. In the prior art, the previously assigned registers might have been deallocated, because unretired instructions may no longer read the data stored in the register assigned to the first instruction after the retirement of the second instruction having the same destination logical register. The portion of the registers 22, 23, 24 with identifiers stored in the back-out register file 39 are “delayed” registers. [0029]
  • Still referring to FIG. 3, the processor 38 re-executes instructions in response to selected exceptions detected by the retirement unit 26. A decoder 44 translates incoming instructions into sequences of instructions, and sends the instructions to the allocator 21. The retirement unit 26 detects selected exceptions and sets instructions for re-execution in response to the selected exceptions. The retirement unit 26 may signal microcode 45 to prepare instructions for re-execution. Microcode is a combination of hardware and specialized permanent memory, e.g., read-only memory (ROM), that performs a special function and is ordinarily internal to the processor. In response to the signal from the retirement unit 26, the microcode 45 reads the back-out register file 39 to obtain the identifiers of the portion of the registers 22, 23, 24 previously assigned to the selected logical operands. The microcode 45 produces machine code for the instructions for re-execution. During the re-execution, selected logical operands are assigned the portion of the registers 22, 23, 24, which were previously assigned and correspond to the identifiers saved in the back-out register file 39, i.e., the “delayed” registers. The microcode 45 introduces the previous register assignments in the machine code of the instructions for re-execution. This may be referred to as “backing out architectural register assignments.” An output line 46 sends the instructions to re-execute from the microcode 45 to the execution unit 27. [0030]
  • FIG. 4 is a flowchart illustrating an embodiment of a method 47 for re-executing selected instructions in the processor 38 of FIG. 3. The method includes backing out of an architectural register state. At block 48, the processor 38 executes a first instruction having a first register as a destination address. At block 49, the retirement unit 26 retires a second instruction having a second register 22 as a destination address. The first and second registers 23, 22 have been assigned to the same selected logical operand by the allocator 21. At block 50, the retirement unit 26 makes the second register 22 a “delayed” register in response to determining that the first instruction is ready to retire. At block 51, the retirement unit 26 retires the first instruction, already having retired the second instruction. At retirement, the first register 23 becomes an architectural register that may still be read by unretired instructions. At block 52, the processor 38 re-executes a third instruction having the selected logical operand as a source or as a destination address. The third instruction may be one of the above-mentioned instructions or another instruction. Re-executing includes reassigning the second register 22, i.e., a delayed register, to the same selected logical operand in the third instruction. By reassigning the second register to the selected logical operand, re-execution backs out of the architectural assignment. [0031]
  • Referring still to FIG. 4, some embodiments may deallocate a register if another instruction having a destination register of the selected class retires. For example, at block 53, the retirement unit 26 deallocates the first register 23 in response to retiring a fourth instruction having the different register 24 assigned to the same logical operand. In other embodiments (not shown), the storage positions 40, 41 may store the identifiers of both the portion of the registers 22, 23, 24 previously assigned and before previously assigned to the selected logical operands. Such an embodiment may back out of several changes to the architectural register state. In other embodiments, the storage positions 40, 41 may store the identifiers of a portion of the registers 22, 23, 24 previously assigned to several selected logical operands. [0032]
  • FIG. 5 is a flowchart illustrating an embodiment of a method 54 of re-executing instructions in the processor 38 of FIG. 3 by backing out of an architectural register state. Blocks 48, 49, 50, 51, and 52 were described in FIG. 4. At block 55, the retirement unit 26 writes the identity of the second register 22 to one of the positions 40, 41 of the back-out register file 39. The positions 40, 41 correspond to a destination logical operand X to which the second register was assigned by the allocator 21. At block 56, the retirement unit 26 sets a third instruction for re-execution, e.g., in response to an exception. The third instruction has the X logical operand as a source address. At block 57, the microcode 45 reads the back-out register file 39 to determine the identifier of the previously assigned register for the logical operand X, i.e., the second register 22, and reassigns the identifier therefrom to the logical operand X in the third instruction. At block 58, the microcode 45 redirects the decoder 44 to send the third instruction for re-execution with the logical operand X replaced by the second register 22. Thus, the processor 38 backs out of an architectural register state to re-execute the third instruction. [0033]
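  • The operand substitution of blocks 55 through 58 can be illustrated with a short sketch. It is a software analogy only, not the microcode itself; the function `back_out_operands`, the tuple layout of an instruction, the opcode `fadd`, and the identifier string `reg22` are all assumptions made for the example.

```python
def back_out_operands(instruction, back_out_file):
    """Return a copy of the instruction with selected logical operands
    replaced by the register identifiers saved in the back-out file."""
    opcode, dest, sources = instruction
    # An operand without a saved identifier is left as a logical operand.
    new_dest = back_out_file.get(dest, dest)
    new_sources = tuple(back_out_file.get(s, s) for s in sources)
    return (opcode, new_dest, new_sources)


# Block 55: the identity of the second register is saved for operand X.
back_out_file = {"X": "reg22"}

# Block 56: a third instruction with X as a source is set for re-execution.
third_instruction = ("fadd", "Y", ("X", "Z"))

# Blocks 57 and 58: X is replaced by the previously assigned register.
rewritten = back_out_operands(third_instruction, back_out_file)
```

  • Here `rewritten` reads from `reg22` instead of the register currently assigned to X in the architectural state, which is the sense in which re-execution backs out of that state.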
  • In one embodiment the first and second instructions of FIGS. 4 and 5 result from decoding an incoming “packed” floating-point macro-instruction, i.e., an instruction performing several floating-point operations in parallel, or a multimedia macro-instruction. In this embodiment, the exceptions stimulating a back-out of an architectural state occur on the second or later sequential instruction of the same class. The processor 38 of FIG. 3 recovers from exceptions on either the first or the second instructions by correcting and re-executing both. In some embodiments, correcting and re-executing both instructions may reduce the time and hardware used to detect exceptions. The architectural register state is no longer proper for re-executing the first or earlier sequential instruction, but the back-out register file 39 enables the processor 38 to restore the proper register state. [0034]
  • Some embodiments may increase efficiency by re-executing all instructions coming from decoding a selected macro-instruction even when only a subset of the instructions encounter exceptions. This method may reduce the amount of hardware employed for detecting exceptions. Similarly, less operating time may be used to determine whether any, as opposed to which, of the selected instructions encountered an exception. In some embodiments, the time costs to individually detect the selected exceptions are high, and the selected exceptions are rare. Then, the added time to re-execute all the instructions coming from decoding a single macro-instruction may be less than the total time saved. Then, re-execution by the back-out methods of FIGS. 4 and 5 may increase the effective performance of a processor. [0035]
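  • The trade-off above can be written as a rough inequality. This is an illustrative model only, and its symbols are assumptions introduced here rather than terms from the disclosure: p is the probability that a decoded sequence encounters a selected exception, t_det is the time saved per sequence by checking whether any instruction excepted instead of which one, and t_re is the added time to re-execute the whole sequence rather than only the excepting instructions.

```latex
p \, t_{\mathrm{re}} < t_{\mathrm{det}}
```

  • Under this model, re-executing the whole sequence pays off when the expected re-execution penalty per sequence is smaller than the detection time saved per sequence, which corresponds to the case of rare exceptions (small p) and costly individual detection (large t_det).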
  • FIG. 6 is a high-level block diagram illustrating an embodiment of an out-of-order processor 60 that employs speculative execution and also backs out of some architectural states for re-executions involving selected registers 22, 23, 24. A line 61 brings incoming instructions to a decoder 64. The decoder 64 includes a multiplexer (MUX) 63 having first and second input ports 62, 100. The first and second input ports 62, 100 receive newly decoded instructions and instructions for re-execution, respectively. The decoder 64 sends instructions from an output port 65 of the MUX 63 to the allocator 21. The allocator 21 may write and read identifiers of a portion of the registers 22, 23, 24 to and from a register allocation table (“RAT”) 66. The rows 67, 68, 69 of the RAT 66 have both speculative and architectural assignment positions 70, 71 to store the identifiers of the portion of the registers 22, 23, 24 assigned to the destination logical operands of the instructions. The execution units 27, 72 in the particular embodiment of FIG. 6 may also execute the instructions out-of-order. A reorder queue (“ROQ”) 73 saves the original instruction sequence so that retirement of executed instructions may be performed in-order. The retirement unit 26 may write the identifiers of selected classes of the registers 22, 23, 24 to a back-out register 74 having one or more storage positions (not shown). [0036]
  • Still referring to FIG. 6, the allocator 21 assignments are initially speculative. The retirement unit 26 flushes unretired instructions from portions of the processor 60 between the allocator 21 and the retirement unit 26 in response to certain exceptions. To recover from the exceptions, the processor 60 may copy the entries of the architectural assignment positions 71 to the speculative assignment positions 70. Then, re-execution of unretired and improperly executed instructions may start from the earlier state defined by the architectural register assignments. The speculative assignments become architectural in response to the proper retirement of the instruction to which the assignments were made. Thus, re-execution in response to such exceptions, as opposed to the selected exceptions of FIGS. 1-5, does not entail backing out of the “architectural” state defined by the assignments of retired instructions. [0037]
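  • The ordinary recovery path described above, which restores speculative assignments from architectural ones without any back-out, might be modeled as follows. The class `RatRow` and the functions `allocate`, `retire`, and `flush` are hypothetical names chosen for this sketch.

```python
class RatRow:
    """One row of a toy register allocation table for a logical operand."""

    def __init__(self, reg):
        self.speculative = reg    # models assignment position 70
        self.architectural = reg  # models assignment position 71


def allocate(rat, operand, reg):
    # Allocator assignments are initially speculative.
    rat[operand].speculative = reg


def retire(rat, operand, reg):
    # A speculative assignment becomes architectural at proper retirement.
    rat[operand].architectural = reg


def flush(rat):
    # Recovery from certain exceptions: copy every architectural entry
    # back to the corresponding speculative entry.
    for row in rat.values():
        row.speculative = row.architectural


rat = {"X": RatRow("R2")}
allocate(rat, "X", "R1")          # an unretired instruction now targets R1
before_flush = rat["X"].speculative
flush(rat)                        # exception: discard the speculative state
after_flush = rat["X"].speculative
```

  • The flush only ever moves the speculative state back to the current architectural state. It cannot reach an earlier architectural state, which is the gap that the back-out register 74 is meant to fill.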
  • FIG. 7 is a time line 80 of the RAT 66 and the back-out register 74 as instructions I0 and I1 progress through the embodiment of the processor 60 illustrated in FIG. 6. At block 82, the instruction I0 retires. At block 84, the row 68 of the RAT 66 for the logical operand X stores the identifier of the register R2 in both the speculative and the architectural assignment positions 70, 71, because the allocator 21 had assigned register R2 to I0. The speculative and architectural assignment positions 70, 71 may store identical identifiers between the retirement of an instruction and the allocation of a new register 22, 23, 24 to a second instruction having the same destination logical operand as the first instruction. [0038]
  • At block 84 of FIG. 7, the entries R2, R3, and R4 of the RAT 66 are “used” registers, meaning they may be read by active and/or incoming instructions. Active and incoming instructions may read data from the registers R2 and R3 in the speculative assignment positions 70. Active and incoming instructions may also read the registers R2 and R4 in the architectural assignment positions 71 if the retirement unit 26 copies architectural register assignments to corresponding speculative assignments in response to an exception. As discussed in respect to FIG. 6, this corresponds to a re-execution without a back out from an architectural state, which is instituted for certain exceptions in the embodiment of FIG. 6. The register identifiers R2, R3, and R4 in either the speculative or the architectural assignment positions 70, 71 correspond to physical addresses of “used” registers 22, 23, 24, because unretired instructions may read the data stored therein in this embodiment. [0039]
  • Still referring to FIG. 7, the allocator 21 assigns the register R1 to the logical operand X of the instruction I1 at block 86. At block 88, the speculative assignment position 70 for the logical operand X stores the identifier R1 in response to assignment of block 86. At block 90, the instruction I1 retires without exceptions. At block 92, the retirement unit 26 writes the identifier R1 to the architectural assignment position 71 for the logical operand X and writes the identifier R2, from the previous architectural assignment for X, to the back-out register 74. The register R2 is a “delayed” register, as defined above, because unretired instructions may not read R2 even in response to an exception. The register R2 may be read if the processor 60 performs a re-execution by backing out of the writes by retired instructions, i.e., instructions that were properly executed. [0040]
  • Referring back to FIG. 6, the processor 60 may handle a selected class of exceptions in a manner that includes backing out of writes by selected retired instructions, i.e., instructions that have been determined to have properly executed. In one embodiment, the processor 60 backs out of writes by the retired instructions to execute a new instruction in response to the selected class of exceptions. The new instruction is executed in a “previous” architectural register state. The back-out register 74 stores register assignments for logical operands of the selected retired instructions. These register assignments enable backing out of the present architectural register state so that the execution of the new instruction can be performed with the “previous” architectural state. [0041]
  • In some embodiments, back-out execution enables the processor 60 to execute an entire sequence of instructions in a previous architectural register state. For example, one embodiment performs a back-out execution of a new sequence of instructions, i.e., μI′1, μI′2, etc., in response to exceptions occurring on any instruction of a selected sequence μI1, μI2, etc., wherein the sequence comes from decoding one macro-instruction. The new sequence μI′1, μI′2, etc., may differ from the original sequence, μI1, μI2, etc., to correct the problems that caused the exception. In this embodiment, the processor 60 effectively re-executes all of the sequence, e.g., μI1, μI2, etc., even though the architectural state has changed due to the retirement of earlier instructions of the sequence, i.e., instructions not registering exceptions. In some embodiments, such a procedure may reduce the hardware and time costs employed for detecting the selected exceptions. [0042]
  • Referring to FIG. 6, the retirement unit 26 delays the deallocation of selected registers 22, 23, 24 of previously retired instructions by transferring the corresponding identifiers of the registers 22, 23, 24 from the architectural assignment positions 71 to the back-out register 74. The retirement unit 26 does not inform the allocator 21 that the delayed registers 22, 23, 24 are available. The retirement unit 26 writes the identifiers of the registers 22, 23, 24 of the previously retired instructions to the back-out register 74 in response to determining that a later instruction, having the same destination logical operand, is ready to retire. [0043]
  • Referring still to FIG. 6, the decoder 64 receives instructions for back-out re-execution from line 100. The retirement unit 26 directs the back-out re-execution by a signal to a select input port 102 of the MUX 63. The selected logical operands of the instructions for back-out re-execution are assigned register identifiers from the back-out register 74. For example, the logical operand X becomes the register R2 in the example of block 92 in FIG. 7. In some embodiments, microcode (not shown) creates the machine code for the instructions for back-out re-execution. The machine code may also contain one or more bits that direct the allocator 21 not to assign other registers 22, 23, 24 to logical operands already assigned identifiers of “delayed” registers. [0044]
  • FIG. 8 is a flowchart illustrating an embodiment 110 of a method of operating the processor 60 of FIG. 6. At block 112, the allocator 21 receives a first instruction having a destination logical operand X. At block 114, the allocator 21 assigns a first register to the logical operand X and writes the corresponding first identifier thereof to the speculative assignment position 70 in the RAT 66 for X. Subsequent instructions with the source logical operand X will read the first register. At block 116, the retirement unit 26 retires the executed first instruction and writes the first identifier to the architectural assignment position 71 for X. At block 118, the allocator 21 writes a second identifier, corresponding to a second register, to the speculative assignment position 70 for X in response to a second instruction having the destination logical operand X. At block 120, the retirement unit 26 writes the first identifier from the architectural assignment position 71 for X to the back-out register 74 in response to determining that the second instruction is ready to retire. Now, the first register is a delayed register, and active and/or incoming instructions may neither read from nor write to the first register. At block 122, the retirement unit 26 writes the second identifier to the architectural assignment position 71 for X in response to retiring the second instruction. At block 124, some embodiments deallocate the first register in response to retiring another instruction with the destination logical operand X. [0045]
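  • The sequence of blocks 112 through 124 can be traced with a small state model. This is a sketch under stated assumptions, not the hardware of FIG. 6: the class `BackOutModel`, its method names, and the register names are hypothetical, and the model keeps a single back-out position per logical operand.

```python
class BackOutModel:
    """Toy model of the RAT 66 and back-out register 74 for the FIG. 8 flow."""

    def __init__(self, registers):
        self.available = list(registers)  # registers the allocator may assign
        self.speculative = {}             # operand -> id (position 70)
        self.architectural = {}           # operand -> id (position 71)
        self.back_out = {}                # operand -> delayed id (register 74)

    def allocate(self, operand):
        # Blocks 114 and 118: assign a register and record it speculatively.
        reg = self.available.pop(0)
        self.speculative[operand] = reg
        return reg

    def retire(self, operand, reg):
        previous = self.architectural.get(operand)
        if previous is not None:
            released = self.back_out.get(operand)
            if released is not None:
                # Block 124: an older delayed register is deallocated.
                self.available.append(released)
            # Block 120: the previous architectural register becomes delayed.
            self.back_out[operand] = previous
        # Blocks 116 and 122: the retiring assignment becomes architectural.
        self.architectural[operand] = reg

    def state_of(self, reg):
        if reg in self.available:
            return "available"
        if reg in self.back_out.values():
            return "delayed"
        return "used"


model = BackOutModel(["R1", "R2", "R3"])
first = model.allocate("X")
model.retire("X", first)
second = model.allocate("X")
model.retire("X", second)
state_after_second = model.state_of(first)  # first register is now delayed
third = model.allocate("X")
model.retire("X", third)
state_after_third = model.state_of(first)   # first register is deallocated
```

  • In this model a register spends exactly one later retirement of the same destination operand in the delayed state before it returns to the pool, which matches the embodiment in which block 124 is performed.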
  • The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below. [0046]

Claims (33)

What is claimed is:
1. An apparatus, comprising:
a processor having a plurality of registers, the processor being capable of executing at least one selected instruction by backing out of an architectural register state.
2. The apparatus as set forth in claim 1, the processor further comprising:
at least one back-out register adapted to store an identifier of a delayed register, the processor capable of reassigning the identifier to a logical operand of the selected instruction.
3. The apparatus as set forth in claim 1, wherein the processor is capable of executing a plurality of instructions speculatively.
4. The apparatus as set forth in claim 1, the processor further comprising a retirement unit to delay deallocation of a first one of the registers in response to the retirement of a second one of the registers.
5. A method for backing a processor out of an architectural state, the method comprising:
reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and
executing the instruction.
6. The method as set forth in claim 5, wherein the act of reassigning assigns a register that is neither used nor available.
7. The method as set forth in claim 5, further comprising delaying deallocation of the register in response to retiring a second instruction.
8. The method as set forth in claim 7, wherein the act of delaying comprises inserting an identifier of the register in a back-out register and wherein the act of reassigning includes writing the identifier from the back-out register into a machine code for the instruction.
9. A processor, comprising:
a plurality of registers;
an allocator to assign the registers to logical operands of instructions;
at least one execution unit; and
a retirement unit to retire instructions executed by the execution unit and to deallocate the registers assigned to logical operands of instructions, the retirement unit being capable of changing at least one of the registers assigned to a first instruction to a delayed register.
10. The processor as set forth in claim 9, wherein the retirement unit is adapted to delay the deallocation of the one of the registers in the absence of an instruction capable of reading the one of the registers.
11. The processor as set forth in claim 9, wherein the plurality of registers belong to a preselected class.
12. The processor as set forth in claim 9, wherein the plurality of registers include at least one of a floating-point register and a multimedia register.
13. The processor as set forth in claim 9, wherein the retirement unit is adapted to delay the deallocation of a first register assigned to a destination logical operand of a first retired instruction in response to determining that a second instruction is ready to retire, the second instruction having a second register assigned to the destination logical operand.
14. The processor as set forth in claim 9, further comprising a back-out register having at least one storage position, the retirement unit to write an identifier of the one of the registers to the storage position to change the one of the registers to a delayed register.
15. The processor as set forth in claim 14, wherein the storage position corresponds to a particular logical operand.
16. The processor as set forth in claim 9, wherein the one of the registers is assigned to a first instruction and wherein the retirement unit is capable of changing the one of the registers to a delayed register in response to retiring a second instruction, the second instruction having a second register, the first and second registers being architectural and speculative registers, respectively, assigned to the same destination logical operand.
17. A processor capable of backing out of an architectural state, comprising:
a plurality of registers;
a decoder;
an allocator to assign the registers to destination logical operands of a portion of instructions received from the decoder;
at least one execution unit to execute a portion of instructions; and
a retirement unit to set selected instructions for back-out re-execution and to delay deallocation of a first register assigned to a first retired instruction in response to retiring a second instruction, a second register assigned to the second instruction.
18. The processor as set forth in claim 17, wherein the first and second registers correspond to the same destination logical operand in the first and second instructions, respectively.
19. The processor as set forth in claim 17, wherein the retirement unit is adapted to assign the first register to a third instruction in response to sending the third instruction for back-out execution, the first register being assigned to the same logical operand in the third instruction and the first instruction.
20. The processor as set forth in claim 17, wherein the retirement unit is adapted to set instructions for back-out execution in response to selected exceptions.
21. The processor as set forth in claim 17, further comprising:
a back-out register, the retirement unit adapted to write an identifier of the first register to the back-out register in response to retiring the second instruction; and
microcode to receive the identifier from the back-out register and to insert the identifier to one of the selected instructions in response to the one of the selected instructions being sent for back-out execution, the one of the selected instructions having a logical operand, the first register being assigned to the logical operand in the first instruction.
22. A method, comprising:
allocating a first register to a logical operand in a first instruction;
allocating a second register to the logical operand in a second instruction;
executing the first and second instructions;
retiring the first instruction; and
saving the identifier of the first register in response to retiring the second instruction.
23. The method as set forth in claim 22, wherein the act of saving includes delaying the deallocation of the first register.
24. The method as set forth in claim 22, further comprising executing a third instruction, the third instruction having the logical operand as an address, the act of executing including reassigning the first register to the logical operand in the third instruction.
25. The method as set forth in claim 24, wherein the act of executing a third instruction is in response to detecting a preselected exception.
26. The method as set forth in claim 24, wherein the act of executing a third instruction is performed in response to an exception on a fourth instruction, the fourth instruction being a non-leading instruction in a sequence of instructions generated by decoding a selected macro-instruction.
27. The method as set forth in claim 22, wherein the act of saving includes transferring the identifier for the first register from the architectural register assignment position in a register allocation table to a back-out register.
28. A method, comprising:
executing a first instruction having a first register assigned to a destination logical operand;
retiring a second instruction assigned a second register to the destination logical operand;
delaying the deallocation of the second register in response to determining that the first instruction is ready to retire;
retiring the first instruction; and
executing a third instruction having the logical operand as a source or destination address, the act of executing including assigning the second register to the logical operand.
29. The method as set forth in claim 28, wherein the act of executing the third instruction is in response to a preselected exception.
30. The method as set forth in claim 28, wherein the act of delaying includes writing an identifier of the second register to a position in a back-out register, the position corresponding to the logical operand.
31. The method as set forth in claim 28, wherein the act of executing includes reading the identifier from the back-out register and writing the identifier to the position for the logical operand in a machine code for the third instruction.
32. The method as set forth in claim 28, wherein the act of executing includes redirecting the instruction flow to receive the third instruction.
33. The method as set forth in claim 28, wherein the act of executing the third instruction includes re-executing a properly retired instruction.
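
The bookkeeping recited in claims 22 to 27 (allocate, retire, save the identifier of the older register instead of freeing it, then reassign it on back-out) can be illustrated with a small software model. The Python sketch below is illustrative only and forms no part of the claims or of the disclosed embodiments. The names `RenameState`, `allocate`, `retire` and `back_out_operand`, the logical operand name `fp0`, and the policy of releasing an older delayed register once a newer one is saved are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RenameState:
    """Toy model of a register allocation table with a back-out register."""
    free: list                                         # physical registers available for allocation
    architectural: dict = field(default_factory=dict)  # logical operand -> retired physical register
    back_out: dict = field(default_factory=dict)       # logical operand -> delayed physical register

    def allocate(self) -> int:
        """Hand out a free physical register for a destination logical operand."""
        return self.free.pop(0)

    def retire(self, logical: str, physical: int) -> None:
        """Retire an instruction whose destination `logical` was renamed to `physical`."""
        previous = self.architectural.get(logical)
        if previous is not None:
            # Assumed policy: only one delayed register per logical operand,
            # so an older delayed register is released here.
            stale = self.back_out.get(logical)
            if stale is not None:
                self.free.append(stale)
            # Delay deallocation: move the old architectural identifier
            # into the back-out position for this logical operand.
            self.back_out[logical] = previous
        self.architectural[logical] = physical

    def back_out_operand(self, logical: str) -> int:
        """Register that held `logical` in the previous architectural state."""
        return self.back_out[logical]


# Hypothetical sequence: two instructions write the same logical operand "fp0".
state = RenameState(free=[0, 1, 2, 3])
p1 = state.allocate()      # first instruction
p2 = state.allocate()      # second instruction
state.retire("fp0", p1)    # p1 becomes the architectural register
state.retire("fp0", p2)    # p1 is saved, not freed; p2 becomes architectural
restored = state.back_out_operand("fp0")   # a third instruction would be given p1
```

In this model a third instruction sent for back-out execution would have `restored` written into its operand position, so it reads the value that `fp0` held before the second instruction retired.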
US09/132,042 1998-08-11 1998-08-11 Backing out of a processor architectural state Expired - Lifetime US6412067B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/132,042 US6412067B1 (en) 1998-08-11 1998-08-11 Backing out of a processor architectural state

Publications (2)

Publication Number Publication Date
US20020032852A1 (en) 2002-03-14
US6412067B1 (en) 2002-06-25

Family

ID=22452179

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/132,042 Expired - Lifetime US6412067B1 (en) 1998-08-11 1998-08-11 Backing out of a processor architectural state

Country Status (1)

Country Link
US (1) US6412067B1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6625756B1 (en) * 1997-12-19 2003-09-23 Intel Corporation Replay mechanism for soft error recovery
US7406565B2 (en) * 2004-01-13 2008-07-29 Hewlett-Packard Development Company, L.P. Multi-processor systems and methods for backup for non-coherent speculative fills
US7376794B2 (en) * 2004-01-13 2008-05-20 Hewlett-Packard Development Company, L.P. Coherent signal in a multi-processor system
US7340565B2 (en) * 2004-01-13 2008-03-04 Hewlett-Packard Development Company, L.P. Source request arbitration
US8281079B2 (en) * 2004-01-13 2012-10-02 Hewlett-Packard Development Company, L.P. Multi-processor system receiving input from a pre-fetch buffer
US8301844B2 (en) * 2004-01-13 2012-10-30 Hewlett-Packard Development Company, L.P. Consistency evaluation of program execution across at least one memory barrier
US7383409B2 (en) 2004-01-13 2008-06-03 Hewlett-Packard Development Company, L.P. Cache systems and methods for employing speculative fills
US7380107B2 (en) * 2004-01-13 2008-05-27 Hewlett-Packard Development Company, L.P. Multi-processor system utilizing concurrent speculative source request and system source request in response to cache miss
US7409500B2 (en) * 2004-01-13 2008-08-05 Hewlett-Packard Development Company, L.P. Systems and methods for employing speculative fills
US7409503B2 (en) * 2004-01-13 2008-08-05 Hewlett-Packard Development Company, L.P. Register file systems and methods for employing speculative fills
US7360069B2 (en) * 2004-01-13 2008-04-15 Hewlett-Packard Development Company, L.P. Systems and methods for executing across at least one memory barrier employing speculative fills

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5797013A (en) * 1995-11-29 1998-08-18 Hewlett-Packard Company Intelligent loop unrolling
US6182210B1 (en) 1997-12-16 2001-01-30 Intel Corporation Processor having multiple program counters and trace buffers outside an execution pipeline
US6240509B1 (en) 1997-12-16 2001-05-29 Intel Corporation Out-of-pipeline trace buffer for holding instructions that may be re-executed following misspeculation
US6047370A (en) 1997-12-19 2000-04-04 Intel Corporation Control of processor pipeline movement through replay queue and pointer backup
US6205542B1 (en) 1997-12-24 2001-03-20 Intel Corporation Processor pipeline including replay
US6076153A (en) 1997-12-24 2000-06-13 Intel Corporation Processor pipeline including partial replay
US6092175A (en) * 1998-04-02 2000-07-18 University Of Washington Shared register storage mechanisms for multithreaded computer systems with out-of-order execution

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8149243B1 (en) * 2006-07-28 2012-04-03 Nvidia Corporation 3D graphics API extension for a packed float image format
US8432410B1 (en) 2006-07-28 2013-04-30 Nvidia Corporation 3D graphics API extension for a shared exponent image format
US9627038B2 (en) 2013-03-15 2017-04-18 Intel Corporation Multiport memory cell having improved density area
US9436476B2 (en) * 2013-03-15 2016-09-06 Soft Machines Inc. Method and apparatus for sorting elements in hardware structures
TWI567636B (en) * 2013-03-15 2017-01-21 軟體機器公司 Method and apparatus for sorting elements in hardware structures
US9582322B2 (en) 2013-03-15 2017-02-28 Soft Machines Inc. Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping
US20140281422A1 (en) * 2013-03-15 2014-09-18 Soft Machines, Inc. Method and Apparatus for Sorting Elements in Hardware Structures
US9753734B2 (en) 2013-03-15 2017-09-05 Intel Corporation Method and apparatus for sorting elements in hardware structures
US9891915B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method and apparatus to increase the speed of the load access and data return speed path using early lower address bits
US10180856B2 (en) 2013-03-15 2019-01-15 Intel Corporation Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping
TWI652618B (en) 2013-03-15 2019-03-01 英特爾股份有限公司 Method and apparatus for managing elements stored in hardware structures
US10289419B2 (en) 2013-03-15 2019-05-14 Intel Corporation Method and apparatus for sorting elements in hardware structures
US9946538B2 (en) 2014-05-12 2018-04-17 Intel Corporation Method and apparatus for providing hardware support for self-modifying code

Also Published As

Publication number Publication date
US6412067B1 (en) 2002-06-25

Similar Documents

Publication Publication Date Title
US6412067B1 (en) Backing out of a processor architectural state
JP2938426B2 (en) Method and apparatus for detecting and recovering interference between out-of-order load and store instructions
JP2597811B2 (en) Data processing system
US7523296B2 (en) System and method for handling exceptions and branch mispredictions in a superscalar microprocessor
US6085312A (en) Method and apparatus for handling imprecise exceptions
JP3984786B2 (en) Scheduling instructions with different latency
US6189088B1 (en) Forwarding stored dara fetched for out-of-order load/read operation to over-taken operation read-accessing same memory location
US5546597A (en) Ready selection of data dependent instructions using multi-cycle cams in a processor performing out-of-order instruction execution
EP0762270B1 (en) Microprocessor with load/store operation to/from multiple registers
US5961630A (en) Method and apparatus for handling dynamic structural hazards and exceptions by using post-ready latency
JPH02257219A (en) Pipeline processing apparatus and method
JPH06214799A (en) Method and apparatus for improvement of performance of random-sequence loading operation in computer system
JP3813157B2 (en) Multi-pipe dispatch and execution of complex instructions on superscalar processors
US6735688B1 (en) Processor having replay architecture with fast and slow replay paths
US5761467A (en) System for committing execution results when branch conditions coincide with predetermined commit conditions specified in the instruction field
US8347066B2 (en) Replay instruction morphing
US6237076B1 (en) Method for register renaming by copying a 32 bits instruction directly or indirectly to a 64 bits instruction
JPH09152973A (en) Method and device for support of speculative execution of count / link register change instruction
US7302553B2 (en) Apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue
US5678016A (en) Processor and method for managing execution of an instruction which determine subsequent to dispatch if an instruction is subject to serialization
US20040261068A1 (en) Methods and apparatus for preserving precise exceptions in code reordering by using control speculation
US5850563A (en) Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
KR100508320B1 (en) Processor having replay architecture with fast and slow replay paths
US7380111B2 (en) Out-of-order processing with predicate prediction and validation with correct RMW partial write new predicate register values

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAMIREZ, RICARDO;MORRISON, MICHAEL J.;REEL/FRAME:009562/0824;SIGNING DATES FROM 19980804 TO 19981026

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12