US20020032852A1 - Backing out of a processor architectural state - Google Patents
Backing out of a processor architectural state
- Publication number
- US20020032852A1 (application US09/132,042)
- Authority
- US
- United States
- Prior art keywords
- instruction
- register
- registers
- processor
- set forth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3861—Recovery, e.g. branch miss-prediction, exception handling
- G06F9/3863—Recovery, e.g. branch miss-prediction, exception handling using multiple copies of the architectural state, e.g. shadow registers
Definitions
- the present invention relates generally to computers and processors, and more specifically, to delaying the deallocation of registers and backing out of architectural states.
- Processors fetch and execute a sequence of instructions from memory.
- the instructions ordinarily manipulate data stored in memory or registers.
- the processor decodes the instructions into first and second types of instructions adapted to execution on particular types of hardware units.
- the first type of micro-instruction loads and stores data between the memory and registers, which are typically internal to the processor.
- the second type of micro-instruction manipulates data stored in the internal registers and writes the results from the manipulations back to the internal registers. Since the number of internal registers is limited, an absence of available internal registers may occur causing a bottleneck at the decode stage.
- the processor ordinarily employs methods that efficiently use the internal registers to reduce the occurrence of decode bottlenecks.
- One mechanism for using the limited number of internal registers entails producing instructions through several operations.
- the processor decodes an incoming instruction into one or more instructions having logical operands.
- logical operands are defined to mean dummy variables for some source and destination addresses of instructions.
- an allocator assigns one or more of the available internal registers to the logical operands introduced in the first step.
- a retirement unit deallocates the previously assigned internal registers of executed instructions without substantial delay when other instructions no longer need to read the contents of the registers. Deallocation makes more internal registers available for assignment to newly decoded instructions. Thus, retirement units should rapidly deallocate registers to reduce the occurrence of instruction decode bottlenecks.
- Exceptions may be attributable to interrupts and faults generated during execution of instructions. Recovering from an exception involves both detecting the exception and reporting the exception to hardware that may re-execute any improperly executed instructions. The proper re-execution normally involves returning the processor to a pre-exception state. Thus, re-execution may include restoring original data to internal registers and reinserting the excepting instruction and the instructions dependent thereupon back into execution pipelines.
- a system designed to detect and report all exceptions may employ substantial hardware, i.e., a large area on the processor chip, and may encumber the ordinary retirement cycle.
- the detection of complex fault events may entail heavy area and time costs, because more verifications are ordinarily employed to check for complex faults.
- Complex fault detection may slow the retirement process with verifications for rarely occurring faults.
- For a macro-instruction, I1, decoding into a sequence μI1, μI2, etc., an exception may occur on either an earlier or a later member of the sequence, e.g., μI1 or μI2.
- Two methods may be pursued to recover from an exception on a later member, e.g., μI2.
- the processor may correct the condition causing the exception and re-execute only the excepting instructions by (a) detecting which instruction excepted, and (b) reinstating the initial execution state associated therewith.
- the processor may correct the condition causing the exception and re-execute the entire sequence, i.e., μI′1, μI′2, etc., whenever any member of the sequence registers an exception. Implementing either of the above methods may be problematic.
- the sequence from the macro-instruction may include “retired” instructions, because earlier members, e.g., μI1, may have completed execution.
- the instruction R+R′→R destroys the original data in R when the instruction is retired, i.e., the architectural state has changed.
- executing earlier members of the sequence, e.g., μI1, may be problematic.
- Prior art processors may not handle exceptions on instructions produced by decoding a single macro-instruction efficiently.
- the present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
- a first aspect of the present invention provides an apparatus.
- the processor has a plurality of registers.
- the processor is capable of re-executing at least one selected instruction by backing out of an architectural register state.
- a second aspect of the present invention provides a method for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction.
- FIG. 1 is a high-level block diagram of an embodiment, in accordance with the present invention, of a processor that delays the deallocation of a portion of the registers;
- FIG. 2 is a flowchart illustrating an embodiment of a method for executing instructions in the processor of FIG. 1;
- FIG. 3 is a high-level block diagram of an embodiment of a processor having a back-out register for use in delaying the deallocation of selected registers;
- FIG. 4 is a flowchart illustrating an embodiment of a method for re-executing selected instructions in the processor of FIG. 3;
- FIG. 5 is a flowchart illustrating an embodiment of a method of re-executing selected instructions in the processor of FIG. 3 by backing out of an architectural state;
- FIG. 6 is a high-level block diagram of an embodiment of a processor which implements speculative execution and also backs out of architectural states for re-executions involving selected registers;
- FIG. 7 illustrates a time line of the register allocation table and back-out register as instructions progress through the processor of FIG. 6;
- FIG. 8 is a flowchart illustrating an embodiment of one method of operating the processor of FIG. 6.
- an architectural state is the state of a processor's registers and memories after writes by all executed instructions determined to have executed properly, i.e., after writes by all properly retired instructions.
- a speculative state is the state of a processor's registers and memories after writes by all executed instructions, i.e., after writes by all executed instructions whether or not the instructions have been determined to have executed properly.
- a processor updates a speculative state to an architectural state after a retirement unit determines that the instructions, which will update the state, have properly executed.
- FIG. 1 is a high-level block diagram illustrating a portion of a first embodiment of a processor 20 that delays the deallocation of selected registers.
- the selected registers include all internal registers.
- An allocator 21 is a hardware device that assigns registers 22 , 23 , 24 to a portion of the logical operands of incoming instructions.
- the registers 22 , 23 , 24 belong to a register file 25 , i.e., a hardware structure for handling and directing accesses of the plurality of internal registers 22 , 23 , 24 .
- a retirement unit 26 retires instructions that have been executed by an execution unit 27 . The retirement unit 26 deallocates the registers 22 , 23 .
- Deallocation makes the registers 22 , 23 , 24 available for assignment to new incoming instructions by the allocator 21 .
- the retirement unit 26 delays the deallocation of a selected portion of the registers 22 , 23 , 24 .
- the registers 22 , 23 , 24 may be classified into available registers, used registers, and delayed registers.
- the allocator 21 may assign an “available” register to a logical operand of an incoming instruction.
- the allocator 21 may not assign either “used” or “delayed” registers to the logical operands of incoming instructions.
- the “used” and the “delayed” registers are not deallocated in the sense that the allocator 21 may not reassign them to a destination logical operand of an incoming instruction.
- at least one instruction may read or write a “used” register. Active instructions may neither read nor write “delayed” registers.
- the processor 20 saves the identifiers of “delayed” registers.
- Register identifiers are physical addresses. Thus, execution results in “delayed” registers may be accessed and used even though the instructions that produced the results are retired, and the results have been removed from the processor's architectural state.
- a class of registers stores a type of data, e.g., floating-point data, integer data, predicate values, multimedia data, etc.
- the term class may also apply to logical operands, e.g., floating-point data, integer data, predicate values, multimedia data, etc.
- the classes may also store types of data that are not enumerated above.
- FIG. 2 is a flowchart illustrating an embodiment of a method 30 for executing instructions in the processor 20 of FIG. 1.
- the allocator 21 assigns a first register 23 to a first logical operand of a first instruction.
- the allocator 21 assigns a second register 24 to a second logical operand of a second instruction.
- the second instruction follows the first instruction in the instruction sequence.
- the first and second logical operands are the same logical operand. In other embodiments, the first and second logical operands may be different logical operands as long as they belong to the same preselected class, e.g., floating point.
- the execution unit 27 executes the first and second instructions.
- the retirement unit 26 retires the executed first instruction.
- the retirement unit 26 saves the identifier of the first register in response to retiring the second instruction.
- the first register 23 is a “delayed” register, i.e. the contents therein may still be retrieved.
- the retirement unit 26 delays the deallocation and saves the identifiers of preselected classes of the registers 22 , 23 , 24 and logical operands, or of the registers of selected classes of instructions. Different embodiments may select different classes of the registers 22 , 23 , 24 and logical operands or different classes of instructions. In specific embodiments, the retirement unit 26 may delay the deallocation of one or of more than one selected class of registers. In some embodiments, the retirement unit 26 may delay the deallocation of the registers 22 , 23 , 24 associated with specific instruction classes, e.g., one or more registers 22 , 23 , 24 assigned to instructions resulting from the decoding of a selected single macro-instruction.
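By way of illustration only, the three register classes and the method of FIG. 2 may be sketched in Python. This is a minimal toy model under stated assumptions, not the claimed apparatus: the names `State`, `RegisterFile`, `allocate`, `supersede`, and the register names `P0` to `P2` are hypothetical and do not appear in the original text.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"  # may be assigned to an incoming instruction
    USED = "used"            # may be read or written by active instructions
    DELAYED = "delayed"      # not reassignable; identifier has been saved


class RegisterFile:
    """Toy model (an assumption) of available, used, and delayed registers."""

    def __init__(self, names):
        self.state = {name: State.AVAILABLE for name in names}
        self.saved = {}  # logical operand -> identifier of its delayed register

    def allocate(self):
        # Only "available" registers may be assigned by the allocator.
        for name, st in self.state.items():
            if st is State.AVAILABLE:
                self.state[name] = State.USED
                return name
        raise RuntimeError("no available register; decode would stall")

    def supersede(self, operand, earlier_reg):
        # Called when a later instruction with the same operand retires.
        # The earlier register is not freed; its identifier is saved instead.
        older = self.saved.get(operand)
        if older is not None:
            self.state[older] = State.AVAILABLE  # deallocated at last
        self.state[earlier_reg] = State.DELAYED
        self.saved[operand] = earlier_reg


rf = RegisterFile(["P0", "P1", "P2"])
first = rf.allocate()       # register for operand X of the first instruction
second = rf.allocate()      # register for operand X of the second instruction
rf.supersede("X", first)    # the second instruction retires
```

In this sketch the first register stays out of the available pool until a further instruction writing the same operand retires, which mirrors the delayed deallocation described above at the cost of one extra register per tracked operand.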
- FIG. 3 is a high-level block diagram illustrating one embodiment of a processor 38 that includes a back-out register file 39 to delay the deallocation of selected registers.
- the back-out register file 39 has storage positions 40 , 41 to save the identifiers of the registers 22 , 23 , 24 , previously assigned to the selected destination logical operand of one or more retired instructions.
- the back-out register file 39 may comprise one or several registers.
- the retirement unit 26 writes the identifier of the register 22 , 23 , 24 assigned to a destination logical operand of a selected and retired first instruction to the back-out register file 39 in response to retiring a second instruction belonging to the same class.
- the second instruction is an instruction having the same destination logical operand as the first instruction.
- the previously assigned registers might have been deallocated, because unretired instructions may no longer read the data stored in the register assigned to the first instruction after the retirement of the second instruction having the same destination logical register.
- the portion of the registers 22 , 23 , 24 with identifiers stored in the back-out register file 39 are “delayed” registers.
- the processor 38 re-executes instructions in response to selected exceptions detected by the retirement unit 26 .
- a decoder 44 translates incoming instructions into sequences of instructions, and sends the instructions to the allocator 21 .
- the retirement unit 26 detects selected exceptions and sets instructions for re-execution in response to the selected exceptions.
- the retirement unit 26 may signal microcode 45 to prepare instructions for re-execution.
- Microcode is a combination of hardware and specialized permanent memory, e.g., read-only memory (ROM), that performs a special function and is ordinarily internal to the processor.
- In response to the signal from the retirement unit 26 , the microcode 45 reads the back-out register file 39 to obtain the identifiers of the portion of the registers 22 , 23 , 24 previously assigned to the selected logical operands.
- the microcode 45 produces machine code for the instructions for re-execution.
- selected logical operands are assigned the portion of the registers 22 , 23 , 24 , which were previously assigned and correspond to the identifiers saved in the back-out register file 39 , i.e. the “delayed” registers.
- the microcode 45 introduces the previous register assignments in the machine code of the instructions for re-execution. This may be referred to as “backing out architectural register assignments.”
- An output line 46 sends the instructions to re-execute from the microcode 45 to the execution unit 27 .
- FIG. 4 is a flowchart illustrating an embodiment of a method 47 for re-executing selected instructions in the processor 38 of FIG. 3.
- the method includes backing out of an architectural register state.
- the processor 38 executes a first instruction having a first register as a destination address.
- the retirement unit 26 retires a second instruction having a second register 22 as a destination address.
- the first and second registers 23 , 22 have been assigned to the same selected logical operand by the allocator 21 .
- the retirement unit 26 makes the second register 22 a “delayed” register in response to determining that the first instruction is ready to retire.
- the retirement unit 26 retires the first instruction, already having retired the second instruction.
- the processor 38 re-executes a third instruction having the selected logical operand as a source or as a destination address.
- the third instruction may be one of the above-mentioned instructions or another instruction.
- Re-executing includes reassigning the second register 22 , i.e. a delayed register, to the same selected logical operand in the third instruction. By reassigning the second register to the selected logical operand, re-execution backs out of the architectural assignment.
- some embodiments may deallocate a register if another instruction having a destination register of the selected class retires. For example, at block 53 , the retirement unit 26 deallocates the first register 23 in response to retiring a fourth instruction having the different register 24 assigned to the same logical operand.
- the storage positions 40 , 41 may store the identifiers of both the portion of the registers 22 , 23 , 24 previously assigned and before previously assigned to the selected logical operands. Such an embodiment may back-out of several changes to the architectural register state.
- the storage positions 40 , 41 may store the identifiers of a portion of the registers 22 , 23 , 24 previously assigned to several selected logical operands.
- FIG. 5 is a flowchart illustrating an embodiment of a method 54 of re-executing instructions in the processor 38 of FIG. 3 by backing out of an architectural register state. Blocks 48 , 49 , 50 , 51 , and 52 were described in FIG. 4.
- the retirement unit 26 writes the identity of the second register 22 to one of the positions 40 , 41 of the back-out register file 39 .
- the positions 40 , 41 correspond to a destination logical operand X to which the second register was assigned by the allocator 21 .
- the retirement unit 26 sets a third instruction for re-execution, e.g., in response to an exception.
- the third instruction has the X logical operand as a source address.
- the microcode 45 reads the back-out register file 39 to determine the identifier of the previously assigned register for the logical operand X, i.e. the second register 22 , and reassigns the identifier therefrom to the logical operand X in the third instruction.
- the microcode 45 redirects the decoder 44 to send the third instruction for re-execution with the logical operand X replaced by the second register 22 .
- the processor 38 backs out of an architectural register state to re-execute the third instruction.
- the first and second instructions of FIGS. 4 and 5 result from decoding an incoming “packed” floating-point macro-instruction, i.e., an instruction performing several floating-point operations in parallel, or a multimedia macro-instruction.
- the exceptions stimulating a back-out of an architectural state occur on the second or later sequential instruction of the same class.
- the processor 38 of FIG. 3 recovers from exceptions on either the first or the second instructions by correcting and re-executing both. In some embodiments, correcting and re-executing both instructions may reduce the time and hardware used to detect exceptions.
- the architectural register state is no longer proper for re-executing the first or earlier sequential instruction, but the back-out register file 39 enables the processor 38 to restore the proper register state.
- Some embodiments may increase efficiency by re-executing all instructions coming from decoding a selected macro-instruction even when only a subset of the instructions encounter exceptions. This method may reduce the amount of hardware employed for detecting exceptions. Similarly, less operating time may be used to determine whether any, as opposed to which, of the selected instructions encountered an exception. In some embodiments, the time costs to individually detect the selected exceptions are high, and the selected exceptions are rare. Then, the added time to re-execute all the instructions coming from decoding a single macro-instruction may be less than the total time saved. Then, re-execution by the back-out methods of FIGS. 4 and 5 may increase the effective performance of a processor.
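By way of illustration only, the trade-off described above may be written as a simple inequality. All symbols are assumptions introduced here and do not appear in the original text: p is the probability that a decoded macro-instruction encounters one of the selected exceptions, t_which and t_any are the retirement times spent per macro-instruction determining which instruction excepted and whether any instruction excepted, and t_one and t_seq are the times to re-execute only the excepting instruction and the entire sequence.

```latex
\underbrace{p\,\bigl(t_{\mathrm{seq}} - t_{\mathrm{one}}\bigr)}_{\text{expected added re-execution time}}
\;<\;
\underbrace{t_{\mathrm{which}} - t_{\mathrm{any}}}_{\text{detection time saved}}
```

For instance, with the assumed values p = 0.001, t_seq - t_one = 20 cycles, and t_which - t_any = 1 cycle, the left side is 0.02 cycles, so re-executing the entire sequence would be favorable under this simple model.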
- FIG. 6 is a high-level block diagram illustrating an embodiment of an out-of-order processor 60 that employs speculative execution and also backs out of some architectural states for re-executions involving selected registers 22 , 23 , 24 .
- a line 61 brings incoming instructions to a decoder 64 .
- the decoder 64 includes a multiplexer (MUX) 63 having first and second input ports 62 , 100 .
- the first and second input ports 62 , 100 receive newly decoded instructions and instructions for re-execution, respectively.
- the decoder 64 sends instructions from an output port 65 of the MUX 63 to the allocator 21 .
- the allocator 21 may write and read identifiers of a portion of the registers 22 , 23 , 24 to and from a register allocation table (“RAT”) 66 .
- the rows 67 , 68 , 69 of the RAT 66 have both speculative and architectural assignment positions 70 , 71 to store the identifiers of the portion of the registers 22 , 23 , 24 assigned to the destination logical operands of the instructions.
- the execution units 27 , 72 in the particular embodiment of FIG. 6 may also execute the instructions out-of-order.
- a reorder queue (“ROQ”) 73 saves the original instruction sequence so that retirement of executed instructions may be performed in-order.
- the retirement unit 26 may write the identifiers of selected classes of the registers 22 , 23 , 24 to a back-out register 74 having one or more storage positions (not shown).
- the allocator 21 assignments are initially speculative.
- the retirement unit 26 flushes unretired instructions from portions of the processor 60 between the allocator 21 and the retirement unit 26 in response to certain exceptions.
- the processor 60 may copy the entries of the architectural assignment positions 71 to the speculative assignment positions 70 .
- re-execution of unretired and improperly executed instructions may start from the earlier state defined by the architectural register assignments.
- the speculative assignments become architectural in response to the proper retirement of the instruction to which the assignments were made.
- re-execution in response to such exceptions does not entail backing out of the “architectural” state defined by the assignments of retired instructions.
- FIG. 7 is a time line 80 of the RAT 66 and the back-out register 74 as instructions I 0 and I 1 progress through the embodiment of the processor 60 illustrated in FIG. 6.
- the instruction I 0 retires.
- the row 68 of the RAT 66 for the logical operand X stores the identifier of the register R 2 in both the speculative and the architectural assignment positions 70 , 71 , because the allocator 21 had assigned register R 2 to I 0 .
- the speculative and architectural assignment positions 70 , 71 may store identical identifiers between the retirement of an instruction and the allocation of a new register 22 , 23 , 24 to a second instruction having the same destination logical operand as the first instruction.
- the entries R 2 , R 3 , and R 4 of the RAT 66 are “used” registers, meaning they may be read by active and/or incoming instructions. Active and incoming instructions may read data from the registers R 2 and R 3 in the speculative assignment positions 70 . Active and incoming instructions may also read the registers R 2 and R 4 in the architectural assignment positions 71 if the retirement unit 26 copies architectural register assignments to corresponding speculative assignments in response to an exception. As discussed in respect to FIG. 6, this corresponds to a re-execution without a back out from an architectural state, which is instituted for certain exceptions in the embodiment of FIG. 6.
- the register identifiers R 2 , R 3 , and R 4 in either the speculative or the architectural assignment positions 70 , 71 correspond to physical addresses of “used” registers 22 , 23 , 24 , because unretired instructions may read the data stored therein in this embodiment.
- the allocator 21 assigns the register R 1 to the logical operand X of the instruction I 1 at block 86 .
- the speculative assignment position 70 for the logical operand X stores the identifier R 1 in response to assignment of block 86 .
- the instruction I 1 retires without exceptions.
- the retirement unit 26 writes the identifier R 1 to the architectural assignment position 71 for the logical operand X and writes the identifier R 2 , from the previous architectural assignment for X, to the back-out register 74 .
- the register R 2 is a “delayed” register, as defined above, because unretired instructions may not read R 2 even in response to an exception.
- the register R 2 may be read if the processor 60 performs a re-execution by backing out of the writes by retired instructions, i.e., instructions that were properly executed.
- the processor 60 may handle a selected class of exceptions in a manner that includes backing out of writes by selected retired instructions, i.e., instructions that have been determined to have properly executed.
- the processor 60 backs out of writes by the retired instructions to execute a new instruction in response to the selected class of exceptions.
- the new instruction is executed in a “previous” architectural register state.
- the back-out register 74 stores register assignments for logical operands of the selected retired instructions. These register assignments enable backing out of the present architectural register state so that the execution of the new instruction can be performed with the “previous” architectural state.
- back-out execution enables the processor 60 to execute an entire sequence of instructions in a previous architectural register state. For example, one embodiment performs a back-out execution of a new sequence of instructions, i.e., μI′1, μI′2, etc., in response to exceptions occurring on any instruction of a selected sequence μI1, μI2, etc., wherein the sequence comes from decoding one macro-instruction.
- the new sequence μI′1, μI′2, etc., may differ from the original sequence, μI1, μI2, etc., to correct the problems that caused the exception.
- the processor 60 effectively re-executes all of the sequence.
- such a procedure may reduce the hardware and time costs employed for detecting the selected exceptions.
- the retirement unit 26 delays the deallocation of selected registers 22 , 23 , 24 of previously retired instructions by transferring the corresponding identifiers of the registers 22 , 23 , 24 from the architectural assignment positions 71 to the back-out register 74 .
- the retirement unit 26 does not inform the allocator 21 that the delayed registers 22 , 23 , 24 are available.
- the retirement unit 26 writes the identifiers of the registers 22 , 23 , 24 of the previously retired instructions to the back-out register 74 in response to determining that a later instruction, having the same destination logical operand, is ready to retire.
- the decoder 64 receives instructions for back-out re-execution from line 100 .
- the retirement unit 26 directs the back-out re-execution by a signal to a select input port 102 of the MUX 63 .
- the selected logical operands of the instructions for back-out re-execution are assigned register identifiers from the back-out register 74 .
- the logical operand X becomes the register R 2 in the example of block 92 in FIG. 7.
- microcode (not shown) creates the machine code for the instructions for back-out re-execution.
- the machine code may also contain one or more bits that direct the allocator 21 not to assign other registers 22 , 23 , 24 to logical operands already assigned identifiers of “delayed” registers.
- FIG. 8 is a flowchart illustrating an embodiment 110 of a method of operating the processor 60 of FIG. 6.
- the allocator 21 receives a first instruction having a destination logical operand X.
- the allocator 21 assigns a first register to the logical operand X and writes the corresponding first identifier thereof to the speculative assignment position 70 in the RAT 66 for X.
- Subsequent instructions with the source logical operand X will read the first register.
- the retirement unit 26 retires the executed first instruction and writes the first identifier to the architectural assignment position 71 for X.
- the allocator 21 writes a second identifier, corresponding to a second register, to the speculative assignment position 70 for X in response to a second instruction having the destination logical operand X.
- the retirement unit 26 writes the first identifier from the architectural assignment position 71 for X to the back-out register 74 in response to determining that the second instruction is ready to retire.
- the first register is a delayed register, and active and/or incoming instructions may neither read from nor write to the first register.
- the retirement unit 26 writes the second identifier to the architectural assignment position 71 for X in response to retiring the second instruction.
- some embodiments deallocate the first register in response to retiring another instruction with the destination logical operand X.
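By way of illustration only, the bookkeeping of FIGS. 7 and 8 may be sketched as a small Python model of the RAT 66 and the back-out register 74. The class `RenameTable` and its method names are hypothetical and are introduced here as assumptions; the sketch tracks identifiers only and does not model execution, the reorder queue, or the deallocation of older delayed registers.

```python
class RenameTable:
    """Toy register allocation table with a back-out register (an assumption)."""

    def __init__(self):
        self.speculative = {}    # operand -> register id, written at allocation
        self.architectural = {}  # operand -> register id, written at retirement
        self.back_out = {}       # operand -> previous architectural register id

    def allocate(self, operand, reg):
        # Allocator assignments are initially speculative.
        self.speculative[operand] = reg

    def retire(self, operand, reg):
        # The previous architectural assignment is moved to the back-out
        # register rather than being reported to the allocator as available.
        previous = self.architectural.get(operand)
        if previous is not None:
            self.back_out[operand] = previous
        self.architectural[operand] = reg

    def flush(self):
        # Recovery without back-out: restart from the architectural assignments.
        self.speculative = dict(self.architectural)

    def back_out_assignment(self, operand):
        # Recovery with back-out: return the register assigned to the
        # operand in the previous architectural state.
        return self.back_out[operand]


rat = RenameTable()
rat.allocate("X", "R2")   # I0 is assigned R2 for the logical operand X
rat.retire("X", "R2")     # I0 retires
rat.allocate("X", "R1")   # I1 is assigned R1 for the logical operand X
rat.retire("X", "R1")     # I1 retires; R2 becomes a delayed register
```

After these steps the architectural entry for X holds R1 while the back-out entry holds R2, so `rat.back_out_assignment("X")` yields the register that a back-out re-execution would substitute for X.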
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
A processor having a plurality of registers is provided. The processor is capable of re-executing at least one selected instruction by backing out of an architectural register state. A method is provided for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction.
Description
- 1. Field of the Invention
- The present invention relates generally to computers and processors, and more specifically, to delaying the deallocation of registers and backing out of architectural states.
- 2. Description of the Related Art
- Processors fetch and execute a sequence of instructions from memory. The instructions ordinarily manipulate data stored in memory or registers. Typically, the processor decodes the instructions into first and second types of instructions adapted to execution on particular types of hardware units. The first type of micro-instruction loads and stores data between the memory and registers, which are typically internal to the processor. The second type of micro-instruction manipulates data stored in the internal registers and writes the results from the manipulations back to the internal registers. Since the number of internal registers is limited, an absence of available internal registers may occur causing a bottleneck at the decode stage. The processor ordinarily employs methods that efficiently use the internal registers to reduce the occurrence of decode bottlenecks.
- One mechanism for using the limited number of internal registers entails producing instructions through several operations. First, the processor decodes an incoming instruction into one or more instructions having logical operands. Hereafter, logical operands are defined to mean dummy variables for some source and destination addresses of instructions. Second, an allocator assigns one or more of the available internal registers to the logical operands introduced in the first step. Third, a retirement unit deallocates the previously assigned internal registers of executed instructions without substantial delay when other instructions no longer need to read the contents of the registers. Deallocation makes more internal registers available for assignment to newly decoded instructions. Thus, retirement units should rapidly deallocate registers to reduce the occurrence of instruction decode bottlenecks.
- Processors also have hardware for recovering from what are referred to as execution “exceptions”. Exceptions may be attributable to interrupts and faults generated during execution of instructions. Recovering from an exception involves both detecting the exception and reporting the exception to hardware that may re-execute any improperly executed instructions. The proper re-execution normally involves returning the processor to a pre-exception state. Thus, re-execution may include restoring original data to internal registers and reinserting the excepting instruction and the instructions dependent thereupon back into execution pipelines.
- A system designed to detect and report all exceptions may employ substantial hardware, i.e., a large area on the processor chip, and may encumber the ordinary retirement cycle. The detection of complex fault events may entail heavy area and time costs, because more verifications are ordinarily employed to check for complex faults. Complex fault detection may slow the retirement process with verifications for rarely occurring faults.
- For a macro-instruction, I1, decoding into a sequence μI1, μI2, etc., an exception may occur on both the earlier and later members of the sequence, e.g. μI1, or μI2. Two methods may be pursued to recover from an exception on a later member, e.g. μI2. First, the processor may correct the condition causing the exception and re-execute only the excepting instructions by (a) detecting which instruction excepted, and (b) reinstating the initial execution state associated therewith. Second, the processor may correct the condition causing the exception and re-execute the entire sequence, i.e., μI′1, μI′2, etc., whenever any member of the sequence registers an exception. Implementing either of the above methods may be problematic.
- Since detecting exceptions on individual members of a sequence may be complex, re-executing the entire sequence from decoding the macro-instruction may save time and reduce hardware needs. But the sequence from the macro-instruction may include “retired” instructions, because earlier members, e.g., μI1, may have completed execution. For example, the instruction R+R′→R destroys the original data in R when the instruction is retired, i.e., the architectural state has changed. Thus, re-executing earlier members of the sequence, e.g., μI1, may be problematic. Prior art processors may not handle exceptions on instructions produced by decoding a single macro-instruction efficiently.
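- The hazard of re-executing a retired instruction of the form R+R′→R can be illustrated with a short sketch. The register contents and the helper `add_in_place` are hypothetical and serve only to show that a naive re-execution reads the overwritten value.

```python
def add_in_place(regs):
    """Model the instruction R + R' -> R: the result overwrites the source R."""
    regs["R"] = regs["R"] + regs["R_prime"]


regs = {"R": 2, "R_prime": 3}   # hypothetical initial register contents
add_in_place(regs)              # first execution writes 2 + 3 into R
first_result = regs["R"]
add_in_place(regs)              # naive re-execution reads the overwritten R
second_result = regs["R"]       # differs from first_result: the original R is gone
```

Re-execution gives the intended result only if the original contents of R are still reachable, which is the motivation for keeping the previously assigned register instead of deallocating it immediately.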
- The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
- A first aspect of the present invention provides an apparatus. The apparatus includes a processor having a plurality of registers. The processor is capable of re-executing at least one selected instruction by backing out of an architectural register state. A second aspect of the present invention provides a method for backing a processor out of an architectural state. The method comprises reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and re-executing the instruction.
- Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the drawings in which:
- FIG. 1 is a high-level block diagram of an embodiment, in accordance with the present invention, of a processor that delays the deallocation of a portion of the registers;
- FIG. 2 is a flowchart illustrating an embodiment of a method for executing instructions in the processor of FIG. 1;
- FIG. 3 is a high-level block diagram of an embodiment of a processor having a back-out register for use in delaying the deallocation of selected registers;
- FIG. 4 is a flowchart illustrating an embodiment of a method for re-executing selected instructions in the processor of FIG. 3;
- FIG. 5 is a flowchart illustrating an embodiment of a method of re-executing selected instructions in the processor of FIG. 3 by backing out of an architectural state;
- FIG. 6 is a high-level block diagram of an embodiment of a processor which implements speculative execution and also backs out of architectural states for re-executions involving selected registers;
- FIG. 7 illustrates a time line of the register allocation table and back-out register as instructions progress through the processor of FIG. 6; and
- FIG. 8 is a flowchart illustrating an embodiment of one method of operating the processor of FIG. 6.
- While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
- Specific embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it will be appreciated that such a development effort, even if complex and time-consuming, would be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
- Hereafter, an architectural state is the state of a processor's registers and memories after writes by all executed instructions determined to have executed properly, i.e., after writes by all properly retired instructions. A speculative state is the state of a processor's registers and memories after writes by all executed instructions, i.e., after writes by all executed instructions whether or not the instructions have been determined to have executed properly. A processor updates a speculative state to an architectural state after a retirement unit determines that the instructions, which will update the state, have properly executed.
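- The distinction between the two states can be sketched in a few lines. This is a toy model under stated assumptions: the class `ProcessorState`, its dictionaries, and the in-order `pending` list are hypothetical names introduced for illustration only.

```python
class ProcessorState:
    """Toy model of speculative and architectural register state."""

    def __init__(self):
        self.speculative = {}    # writes by all executed instructions
        self.architectural = {}  # writes by properly retired instructions only
        self.pending = []        # executed but unretired writes, in program order

    def execute(self, register, value):
        """An executed instruction updates the speculative state immediately."""
        self.speculative[register] = value
        self.pending.append((register, value))

    def retire_oldest(self):
        """Retiring the oldest executed instruction commits its write architecturally."""
        register, value = self.pending.pop(0)
        self.architectural[register] = value


state = ProcessorState()
state.execute("R1", 5)   # first instruction executes
state.execute("R1", 7)   # second instruction executes
state.retire_oldest()    # only the first instruction has retired
```

After the single retirement, the speculative state reflects both writes while the architectural state reflects only the retired one.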
- FIG. 1 is a high-level block diagram illustrating a portion of a first embodiment of a processor 20 that delays the deallocation of selected registers. In some embodiments, the selected registers include all internal registers. An allocator 21 is a hardware device that assigns registers registers register file 25, i.e., a hardware structure for handling and directing accesses of the plurality of internal registers retirement unit 26 retires instructions that have been executed by an execution unit 27. The retirement unit 26 deallocates the registers registers allocator 21. The retirement unit 26 delays the deallocation of a selected portion of the registers
- Still referring to FIG. 1, the
registers allocator 21 may assign an “available” register to a logical operand of an incoming instruction. The allocator 21 may not assign either “used” or “delayed” registers to the logical operands of incoming instructions. The “used” and the “delayed” registers are not deallocated in the sense that the allocator 21 may not reassign them to a destination logical operand of an incoming instruction. By definition, at least one instruction may read or write a “used” register. Active instructions may neither read nor write “delayed” registers. The processor 20 saves the identifiers of “delayed” registers. Register identifiers are physical addresses. Thus, execution results in “delayed” registers may be accessed and used even though the instructions that produced the results are retired, and the results have been removed from the processor's architectural state.
- Hereafter, a class of registers stores a type of data, e.g., floating-point data, integer data, predicate values, multimedia data, etc. The term class may also apply to logical operands, e.g., floating-point data, integer data, predicate values, multimedia data, etc. In various embodiments, the classes may also store types of data that are not enumerated above.
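- The three register conditions named above (“available”, “used”, and “delayed”) can be summarized in a short sketch. The enumeration and helper names are hypothetical. Treating an “available” register as not accessed by active instructions is an assumption made for the sketch and is not stated in the description.

```python
from enum import Enum


class RegisterState(Enum):
    AVAILABLE = "available"
    USED = "used"
    DELAYED = "delayed"


def allocator_may_assign(state):
    """Only an available register may be given to an incoming instruction."""
    return state is RegisterState.AVAILABLE


def active_instruction_may_access(state):
    """Used registers may be read or written by active instructions.
    Assumption: available registers are treated as not accessed."""
    return state is RegisterState.USED


def usable_for_back_out(state):
    """A delayed register keeps its contents and its saved identifier."""
    return state is RegisterState.DELAYED


summary = {
    s.value: (allocator_may_assign(s), active_instruction_may_access(s), usable_for_back_out(s))
    for s in RegisterState
}
```

The `summary` dictionary maps each condition to the three permissions, which makes it visible that a “delayed” register is neither assignable nor accessible to active instructions, yet remains reachable for a back-out.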
- FIG. 2 is a flowchart illustrating an embodiment of a method 30 for executing instructions in the processor 20 of FIG. 1. At block 31, the allocator 21 assigns a first register 23 to a first logical operand of a first instruction. At block 32, the allocator 21 assigns a second register 24 to a second logical operand of a second instruction. The second instruction follows the first instruction in the instruction sequence. In the illustrated embodiments, the first and second logical operands are the same logical operand. In other embodiments, the first and second logical operands may be different logical operands as long as they belong to the same preselected class, e.g., floating point. At block 33, the execution unit 27 executes the first and second instructions. At block 34, the retirement unit 26 retires the executed first instruction. At block 35, the retirement unit 26 saves the identifier of the first register in response to retiring the second instruction. The first register 23 is a “delayed” register, i.e., the contents therein may still be retrieved.
- Still referring to FIGS. 1 and 2, the
retirement unit 26 delays the deallocation and saves the identifiers of a preselected classes of the registers registers retirement unit 26 may delay the deallocation of one or of more than one selected class of registers. In some embodiments, the retirement unit 26 may delay the deallocation of the registers 22, 23, 24 associated with specific instruction classes, e.g., one or more registers 22, 23, 24 assigned to instructions resulting from the decoding of a selected single macro-instruction.
- Some embodiments in accordance with the invention may employ “backing out” of an architectural state in a processor that speculatively executes instructions. FIG. 3 is a high-level block diagram illustrating one embodiment of a
processor 38 that includes a back-out register file 39 to delay the deallocation of selected registers. The back-out register file 39 has storage positions registers register file 39 may comprise one or several registers. The retirement unit 26 writes the identifier of the register register file 39 in response to retiring a second instruction belonging to the same class. In some embodiments, the second instruction is an instruction having the same destination logical operand as the first instruction. In the prior art, the previously assigned registers might have been deallocated, because unretired instructions may no longer read the data stored in the register assigned to the first instruction after the retirement of the second instruction having the same destination logical register. The portion of the registers 22, 23, 24 with identifiers stored in the back-out register file 39 are “delayed” registers.
- Still referring to FIG. 3, the
processor 38 re-executes instructions in response to selected exceptions detected by the retirement unit 26. A decoder 44 translates incoming instructions into sequences of instructions, and sends the instructions to the allocator 21. The retirement unit 26 detects selected exceptions and sets instructions for re-execution in response to the selected exceptions. The retirement unit 26 may signal microcode 45 to prepare instructions for re-execution. Microcode is a combination of hardware and specialized permanent memory, e.g., read-only memory (ROM), that performs a special function and is ordinarily internal to the processor. In response to the signal from the retirement unit 26, the microcode 45 reads the back-out register file 39 to obtain the identifiers of the portion of the registers microcode 45 produces machine code for the instructions for re-execution. During the re-execution, selected logical operands are assigned the portion of the registers register file 39, i.e., the “delayed” registers. The microcode 45 introduces the previous register assignments in the machine code of the instructions for re-execution. This may be referred to as “backing out architectural register assignments.” An output line 46 sends the instructions to re-execute from the microcode 45 to the execution unit 27.
- FIG. 4 is a flowchart illustrating an embodiment of a
method 47 for re-executing selected instructions in the processor 38 of FIG. 3. The method includes backing out of an architectural register state. At block 48, the processor 38 executes a first instruction having a first register as a destination address. At block 49, the retirement unit 26 retires a second instruction having a second register 22 as a destination address. The first and second registers allocator 21. At block 50, the retirement unit 26 makes the second register 22 a “delayed” register in response to determining that the first instruction is ready to retire. At block 51, the retirement unit 26 retires the first instruction, already having retired the second instruction. At retirement, the first register 23 becomes an architectural register that may still be read by unretired instructions. At block 52, the processor 38 re-executes a third instruction having the selected logical operand as a source or as a destination address. The third instruction may be one of the above-mentioned instructions or another instruction. Re-executing includes reassigning the second register 22, i.e., a delayed register, to the same selected logical operand in the third instruction. By reassigning the second register to the selected logical operand, re-execution backs out of the architectural assignment.
- Referring still to FIG. 4, some embodiments may deallocate a register if another instruction having a destination register of the selected class retires. For example, at
block 53, the retirement unit 26 deallocates the first register 23 in response to retiring a fourth instruction having the different register 24 assigned to the same logical operand. In other embodiments (not shown), the storage positions 40, 41 may store the identifiers of both the portion of the registers registers
- FIG. 5 is a flowchart illustrating an embodiment of a
method 54 of re-executing instructions in the processor 38 of FIG. 3 by backing out of an architectural register state. Blocks block 55, the retirement unit 26 writes the identity of the second register 22 to one of the positions register file 39. The positions allocator 21. At block 56, the retirement unit 26 sets a third instruction for re-execution, e.g., in response to an exception. The third instruction has the X logical operand as a source address. At block 57, the microcode 45 reads the back-out register file 39 to determine the identifier of the previously assigned register for the logical operand X, i.e., the second register 22, and reassigns the identifier therefrom to the logical operand X in the third instruction. At block 58, the microcode 45 redirects the decoder 44 to send the third instruction for re-execution with the logical operand X replaced by the second register 22. Thus, the processor 38 backs out of an architectural register state to re-execute the third instruction.
- In one embodiment, the first and second instructions of FIGS. 4 and 5 result from decoding an incoming “packed” floating-point macro-instruction, i.e., an instruction performing several floating-point operations in parallel, or multimedia macro-instruction. In this embodiment, the exceptions stimulating a back-out of an architectural state occur on the second or later sequential instruction of the same class. The
processor 38 of FIG. 3 recovers from exceptions on either the first or the second instructions by correcting and re-executing both. In some embodiments, correcting and re-executing both instructions may reduce the time and hardware used to detect exceptions. The architectural register state is no longer proper for re-executing the first or earlier sequential instruction, but the back-out register file 39 enables the processor 38 to restore the proper register state.
- Some embodiments may increase efficiency by re-executing all instructions coming from decoding a selected macro-instruction even when only a subset of the instructions encounters exceptions. This method may reduce the amount of hardware employed for detecting exceptions. Similarly, less operating time may be used to determine whether any, as opposed to which, of the selected instructions encountered an exception. In some embodiments, the time costs to individually detect the selected exceptions are high, and the selected exceptions are rare. In that case, the added time to re-execute all the instructions coming from decoding a single macro-instruction may be less than the total time saved, and re-execution by the back-out methods of FIGS. 4 and 5 may increase the effective performance of a processor.
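- The trade-off argued above can be made concrete with a small expected-cost calculation. All cycle counts and the exception rate below are hypothetical figures chosen only for illustration. The function `expected_cost` is a simple model (detection cost paid every time, re-execution cost paid only when an exception occurs), not a measurement of any processor.

```python
def expected_cost(detect_cost, exception_rate, reexec_cost):
    """Expected cycles per macro-instruction: detection plus rare re-execution."""
    return detect_cost + exception_rate * reexec_cost


# Hypothetical figures, chosen only for illustration.
RATE = 0.001  # assumed fraction of macro-instructions that raise a selected exception

# Scheme A: identify exactly which instruction excepted, then re-execute only that one.
precise = expected_cost(detect_cost=4.0, exception_rate=RATE, reexec_cost=10.0)

# Scheme B: detect only whether any instruction excepted, then re-execute the whole sequence.
coarse = expected_cost(detect_cost=1.0, exception_rate=RATE, reexec_cost=30.0)

# Exception rate at which both schemes would cost the same under this model.
break_even_rate = (4.0 - 1.0) / (30.0 - 10.0)
```

Under these assumed numbers the coarse scheme is cheaper whenever the exception rate stays below the break-even rate, which matches the qualitative claim that rare exceptions with costly individual detection favor re-executing the whole sequence.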
- FIG. 6 is a high-level block diagram illustrating an embodiment of an out-of-order processor 60 that employs speculative execution and also backs out of some architectural states for re-executions involving selected registers 22, 23, 24. A line 61 brings incoming instructions to a decoder 64. The decoder 64 includes a multiplexer (MUX) 63 having first and second input ports second input ports decoder 64 sends instructions from an output port 65 of the MUX 63 to the allocator 21. The allocator 21 may write and read identifiers of a portion of the registers rows RAT 66 have both speculative and architectural assignment positions 70, 71 to store the identifiers of the portion of the registers execution units retirement unit 26 may write the identifiers of selected classes of the registers out register 74 having one or more storage positions (not shown).
- Still referring to FIG. 6, the allocator 21 assignments are initially speculative. The
retirement unit 26 flushes unretired instructions from portions of the processor 60 between the allocator 21 and the retirement unit 26 in response to certain exceptions. To recover from the exceptions, the processor 60 may copy the entries of the architectural assignment positions 71 to the speculative assignment positions 70. Then, re-execution of unretired and improperly executed instructions may start from the earlier state defined by the architectural register assignments. The speculative assignments become architectural in response to the proper retirement of the instruction to which the assignments were made. Thus, re-execution in response to such exceptions, as opposed to the selected exceptions of FIGS. 1-5, does not entail backing out of the “architectural” state defined by the assignments of retired instructions.
- FIG. 7 is a
time line 80 of the RAT 66 and the back-out register 74 as instructions I0 and I1 progress through the embodiment of the processor 60 illustrated in FIG. 6. At block 82, the instruction I0 retires. At block 84, the row 68 of the RAT 66 for the logical operand X stores the identifier of the register R2 in both the speculative and the architectural assignment positions 70, 71, because the allocator 21 had assigned register R2 to I0. The speculative and architectural assignment positions 70, 71 may store identical identifiers between the retirement of an instruction and the allocation of a new register
- At
block 84 of FIG. 7, the entries R2, R3, and R4 of the RAT 66 are “used” registers, meaning they may be read by active and/or incoming instructions. Active and incoming instructions may read data from the registers R2 and R3 in the speculative assignment positions 70. Active and incoming instructions may also read the registers R2 and R4 in the architectural assignment positions 71 if the retirement unit 26 copies architectural register assignments to corresponding speculative assignments in response to an exception. As discussed with respect to FIG. 6, this corresponds to a re-execution without a back out from an architectural state, which is instituted for certain exceptions in the embodiment of FIG. 6. The register identifiers R2, R3, and R4 in either the speculative or the architectural assignment positions 70, 71 correspond to physical addresses of “used” registers 22, 23, 24, because unretired instructions may read the data stored therein in this embodiment.
- Still referring to FIG. 7, the
allocator 21 assigns the register R1 to the logical operand X of the instruction I1 at block 86. At block 88, the speculative assignment position 70 for the logical operand X stores the identifier R1 in response to the assignment of block 86. At block 90, the instruction I1 retires without exceptions. At block 92, the retirement unit 26 writes the identifier R1 to the architectural assignment position 71 for the logical operand X and writes the identifier R2, from the previous architectural assignment for X, to the back-out register 74. The register R2 is a “delayed” register, as defined above, because unretired instructions may not read R2 even in response to an exception. The register R2 may be read if the processor 60 performs a re-execution by backing out of the writes by retired instructions, i.e., instructions that were properly executed.
- Referring back to FIG. 6, the
processor 60 may handle a selected class of exceptions in a manner that includes backing out of writes by selected retired instructions, i.e., instructions that have been determined to have properly executed. In one embodiment, the processor 60 backs out of writes by the retired instructions to execute a new instruction in response to the selected class of exceptions. The new instruction is executed in a “previous” architectural register state. The back-out register 74 stores register assignments for logical operands of the selected retired instructions. These register assignments enable backing out of the present architectural register state so that the execution of the new instruction can be performed with the “previous” architectural state.
- In some embodiments, back-out execution enables the
processor 60 to execute an entire sequence of instructions in a previous architectural register state. For example, one embodiment performs a back-out execution of a new sequence of instructions, i.e., μI′1, μI′2, etc., in response to exceptions occurring on any instruction of a selected sequence μI1, μI2, etc., wherein the sequence comes from decoding one macro-instruction. The new sequence μI′1, μI′2, etc., may differ from the original sequence, μI1, μI2, etc., to correct the problems that caused the exception. In this embodiment, the processor 60 effectively re-executes all of the sequence, e.g., μI1, μI2, etc., even though the architectural state has changed due to the retirement of earlier instructions of the sequence, i.e., instructions not registering exceptions. In some embodiments, such a procedure may reduce the hardware and time costs employed for detecting the selected exceptions.
- Referring to FIG. 6, the
retirement unit 26 delays the deallocation of selected registers registers out register 74. The retirement unit 26 does not inform the allocator 21 that the delayed registers 22, 23, 24 are available. The retirement unit 26 writes the identifiers of the registers out register 74 in response to determining that a later instruction, having the same destination logical operand, is ready to retire.
- Referring still to FIG. 6, the
decoder 64 receives instructions for back-out re-execution from line 100. The retirement unit 26 directs the back-out re-execution by a signal to a select input port 102 of the MUX 63. The selected logical operands of the instructions for back-out re-execution are assigned register identifiers from the back-out register 74. For example, the logical operand X becomes the register R2 in the example of block 92 in FIG. 7. In some embodiments, microcode (not shown) creates the machine code for the instructions for back-out re-execution. The machine code may also contain one or more bits that direct the allocator 21 not to assign other registers
- FIG. 8 is a flowchart illustrating an
embodiment 110 of a method of operating the processor 60 of FIG. 6. At block 112, the allocator 21 receives a first instruction having a destination logical operand X. At block 114, the allocator 21 assigns a first register to the logical operand X and writes the corresponding first identifier thereof to the speculative assignment position 70 in the RAT 66 for X. Subsequent instructions with the source logical operand X will read the first register. At block 116, the retirement unit 26 retires the executed first instruction and writes the first identifier to the architectural assignment position 71 for X. At block 118, the allocator 21 writes a second identifier, corresponding to a second register, to the speculative assignment position 70 for X in response to the second instruction having the address logical operand X. At block 120, the retirement unit 26 writes the first identifier from the architectural assignment position 71 for X to the back-out register 74 in response to determining that the second instruction is ready to retire. Now, the first register is a delayed register, and active and/or incoming instructions may neither read from nor write to the first register. At block 122, the retirement unit 26 writes the second identifier to the architectural assignment position 71 for X in response to retiring the second instruction. At block 124, some embodiments deallocate the first register in response to retiring another instruction with the destination logical operand X.
- The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below.
It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
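The sequence of FIG. 8 (blocks 112 through 124) can be followed in a short software sketch. This is a toy model under stated assumptions, not the disclosed hardware: the class `RenameTable`, its dictionaries keyed by logical operand, the list-based free pool, and the register names "R1" to "R3" are hypothetical. Releasing the older delayed register on a later retirement models only the embodiments mentioned at block 124.

```python
class RenameTable:
    """Toy register allocation table (RAT) with a back-out register per operand."""

    def __init__(self, free_registers):
        self.free = list(free_registers)  # available register identifiers
        self.speculative = {}             # operand -> speculative assignment
        self.architectural = {}           # operand -> architectural assignment
        self.back_out = {}                # operand -> identifier of the delayed register

    def allocate(self, operand):
        """Blocks 114 and 118: assign a register and record it speculatively."""
        reg = self.free.pop(0)
        self.speculative[operand] = reg
        return reg

    def retire(self, operand, reg):
        """Blocks 116 to 124: commit the assignment and delay the previous register."""
        previous = self.architectural.get(operand)
        if previous is not None:
            older = self.back_out.get(operand)
            if older is not None:
                self.free.append(older)        # block 124: release the older delayed register
            self.back_out[operand] = previous  # block 120: previous register becomes delayed
        self.architectural[operand] = reg      # blocks 116 and 122


rat = RenameTable(["R1", "R2", "R3"])
first = rat.allocate("X")
rat.retire("X", first)             # first instruction retires
second = rat.allocate("X")
rat.retire("X", second)            # first register is now delayed, not available
delayed_after_second = rat.back_out["X"]
free_after_second = list(rat.free)
third = rat.allocate("X")
rat.retire("X", third)             # a further retirement releases the first register
```

In this sketch a back-out re-execution would read `rat.back_out["X"]` to recover the previously assigned register for operand X, which corresponds to the reassignment step described for the selected exceptions.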
Claims (33)
1. An apparatus, comprising:
a processor having a plurality of registers, the processor being capable of executing at least one selected instruction by backing out of an architectural register state.
2. The apparatus as set forth in claim 1 , the processor further comprising:
at least one back-out register adapted to store an identifier of a delayed register, the processor capable of reassigning the identifier to a logical operand of the selected instruction.
3. The apparatus as set forth in claim 1 , wherein the processor is capable of executing a plurality of instructions speculatively.
4. The apparatus as set forth in claim 1 , the processor further comprising a retirement unit to delay deallocation of a first one of the registers in response to the retirement of a second one of the registers.
5. A method for backing a processor out of an architectural state, the method comprising:
reassigning a register to a logical operand of an instruction, the register having been assigned to the logical operand in a previous architectural state; and
executing the instruction.
6. The method as set forth in claim 5 , wherein the act of reassigning assigns a register that is neither used nor available.
7. The method as set forth in claim 5 , further comprising delaying deallocation of the register in response to retiring a second instruction.
8. The method as set forth in claim 7 , wherein the act of delaying comprises inserting an identifier of the register in a back-out register and wherein the act of reassigning includes writing the identifier from the back-out register into a machine code for the instruction.
9. A processor, comprising:
a plurality of registers;
an allocator to assign the registers to logical operands of instructions;
at least one execution unit; and
a retirement unit to retire instructions executed by the execution unit and to deallocate the registers assigned to logical operands of instructions, the retirement unit being capable of changing at least one of the registers assigned to a first instruction to a delayed register.
10. The processor as set forth in claim 9 , wherein the retirement unit is adapted to delay the deallocation of the one of the registers in the absence of an instruction capable of reading the one of the registers.
11. The processor as set forth in claim 9 , wherein the plurality of registers belong to a preselected class.
12. The processor as set forth in claim 9 , wherein the plurality of registers include at least one of a floating-point register and a multimedia register.
13. The processor as set forth in claim 9 , wherein the retirement unit is adapted to delay the deallocation of a first register assigned to a destination logical operand of a first retired instruction in response to determining that a second instruction is ready to retire, the second instruction having a second register assigned to the destination logical operand.
14. The processor as set forth in claim 9 , further comprising a back-out register having at least one storage position, the retirement unit to write an identifier of the one of the registers to the storage position to change the one of the registers to a delayed register.
15. The processor as set forth in claim 14 , wherein the storage position corresponds to a particular logical operand.
16. The processor as set forth in claim 9 , wherein the one of the registers is assigned to a first instruction and wherein the retirement unit is capable of changing the one of the registers to a delayed register in response to retiring a second instruction, the second instruction having a second register, the first and second registers being architectural and speculative registers, respectively, assigned to the same destination logical operand.
17. A processor capable of backing out of an architectural state, comprising:
a plurality of registers;
a decoder;
an allocator to assign the registers to destination logical operands of a portion of instructions received from the decoder;
at least one execution unit to execute a portion of instructions; and
a retirement unit to set selected instructions for back-out re-execution and to delay deallocation of a first register assigned to a first retired instruction in response to retiring a second instruction, a second register assigned to the second instruction.
18. The processor as set forth in claim 17 , wherein the first and second registers correspond to the same destination logical operand in the first and second instructions, respectively.
19. The processor as set forth in claim 17 , wherein the retirement unit is adapted to assign the first register to a third instruction in response to sending the third instruction for back-out execution, the first register being assigned to the same logical operand in the third instruction and the first instruction.
20. The processor as set forth in claim 17 , wherein the retirement unit is adapted to set instructions for back-out execution in response to selected exceptions.
21. The processor as set forth in claim 17 , further comprising:
a back-out register, the retirement unit adapted to write an identifier of the first register to the back-out register in response to retiring the second instruction; and
microcode to receive the identifier from the back-out register and to insert the identifier to one of the selected instructions in response to the one of the selected instructions being sent for back-out execution, the one of the selected instructions having a logical operand, the first register being assigned to the logical operand in the first instruction.
22. A method, comprising:
allocating a first register to a logical operand in a first instruction;
allocating a second register to the logical operand in a second instruction;
executing the first and second instructions;
retiring the first instruction; and
saving the identifier of the first register in response to retiring the second instruction.
23. The method as set forth in claim 22, wherein the act of saving includes delaying the deallocation of the first register.
24. The method as set forth in claim 22, further comprising executing a third instruction, the third instruction having the logical operand as an address, the act of executing including reassigning the first register to the logical operand in the third instruction.
25. The method as set forth in claim 24, wherein the act of executing a third instruction is in response to detecting a preselected exception.
26. The method as set forth in claim 24, wherein the act of executing a third instruction is performed in response to an exception on a fourth instruction, the fourth instruction being a non-leading instruction in a sequence of instructions generated by decoding a selected macro-instruction.
27. The method as set forth in claim 22, wherein the act of saving includes transferring the identifier for the first register from the architectural register assignment position in a register allocation table to a back-out register.
28. A method, comprising:
executing a first instruction having a first register assigned to a destination logical operand;
retiring a second instruction assigned a second register to the destination logical operand;
delaying the deallocation of the second register in response to determining that the first instruction is ready to retire;
retiring the first instruction; and
executing a third instruction having the logical operand as a source or destination address, the act of executing including assigning the second register to the logical operand.
29. The method as set forth in claim 28, wherein the act of executing the third instruction is in response to a preselected exception.
30. The method as set forth in claim 28, wherein the act of delaying includes writing an identifier of the second register to a position in a back-out register, the position corresponding to the logical operand.
31. The method as set forth in claim 28, wherein the act of executing includes reading the identifier from the back-out register and writing the identifier to the position for the logical operand in a machine code for the third instruction.
32. The method as set forth in claim 28, wherein the act of executing includes redirecting the instruction flow to receive the third instruction.
33. The method as set forth in claim 28, wherein the act of executing the third instruction includes re-executing a properly retired instruction.
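The flow recited in claims 22 to 27 (allocate, execute, retire, save the older register's identifier instead of freeing it) and the back-out read of claims 24 and 31 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `Uop`, `RenameModel`, `arch_rat`, `backout`, and `back_out_read` are hypothetical, and the policy of freeing a saved register one retirement later is an assumption that the claims do not specify. Claim 28 appears to describe the same flow with the labels of the two instructions swapped.

```python
from dataclasses import dataclass, field


@dataclass
class Uop:
    """Hypothetical instruction that writes `value` to logical register `dest`."""
    dest: str
    value: int
    phys: int = -1  # physical register id, filled in by the allocator


@dataclass
class RenameModel:
    """Toy model of renaming with delayed deallocation (names are illustrative)."""
    num_phys: int = 8
    regs: list = field(default_factory=list)      # physical register file
    free: list = field(default_factory=list)      # ids available to the allocator
    arch_rat: dict = field(default_factory=dict)  # logical -> retired physical id
    backout: dict = field(default_factory=dict)   # logical -> previous physical id

    def __post_init__(self):
        self.regs = [0] * self.num_phys
        self.free = list(range(self.num_phys))

    def allocate(self, uop):
        # Assign a free physical register to the destination logical operand.
        uop.phys = self.free.pop(0)
        return uop

    def execute(self, uop):
        self.regs[uop.phys] = uop.value

    def retire(self, uop):
        prev = self.arch_rat.get(uop.dest)
        if prev is not None:
            # Delayed deallocation: the previous architectural register is not
            # freed here. Its identifier moves to the back-out position instead.
            stale = self.backout.get(uop.dest)
            if stale is not None:
                # Assumption: a saved register is released one retirement later.
                self.free.append(stale)
            self.backout[uop.dest] = prev
        self.arch_rat[uop.dest] = uop.phys

    def back_out_read(self, logical):
        # Back-out execution: bind the logical operand to the saved register
        # and read the value it held before the latest retired write.
        return self.regs[self.backout[logical]]


model = RenameModel()
first = model.allocate(Uop("eax", 10))
second = model.allocate(Uop("eax", 20))
model.execute(first)
model.execute(second)
model.retire(first)
model.retire(second)
print(model.back_out_read("eax"))  # value written by the first instruction
```

In this model the value written by the first instruction stays readable after the second instruction retires, because the first register is held in `backout` rather than returned to `free`.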
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/132,042 US6412067B1 (en) | 1998-08-11 | 1998-08-11 | Backing out of a processor architectural state |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/132,042 US6412067B1 (en) | 1998-08-11 | 1998-08-11 | Backing out of a processor architectural state |
Publications (2)
Publication Number | Publication Date |
---|---|
US20020032852A1 (en) | 2002-03-14 |
US6412067B1 (en) | 2002-06-25 |
Family
ID=22452179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/132,042 (US6412067B1, Expired - Lifetime) | Backing out of a processor architectural state | 1998-08-11 | 1998-08-11 |
Country Status (1)
Country | Link |
---|---|
US (1) | US6412067B1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8149243B1 (en) * | 2006-07-28 | 2012-04-03 | Nvidia Corporation | 3D graphics API extension for a packed float image format |
US20140281422A1 (en) * | 2013-03-15 | 2014-09-18 | Soft Machines, Inc. | Method and Apparatus for Sorting Elements in Hardware Structures |
US9582322B2 (en) | 2013-03-15 | 2017-02-28 | Soft Machines Inc. | Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping |
US9627038B2 (en) | 2013-03-15 | 2017-04-18 | Intel Corporation | Multiport memory cell having improved density area |
US9891915B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method and apparatus to increase the speed of the load access and data return speed path using early lower address bits |
US9946538B2 (en) | 2014-05-12 | 2018-04-17 | Intel Corporation | Method and apparatus for providing hardware support for self-modifying code |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6625756B1 (en) * | 1997-12-19 | 2003-09-23 | Intel Corporation | Replay mechanism for soft error recovery |
US7406565B2 (en) * | 2004-01-13 | 2008-07-29 | Hewlett-Packard Development Company, L.P. | Multi-processor systems and methods for backup for non-coherent speculative fills |
US7376794B2 (en) * | 2004-01-13 | 2008-05-20 | Hewlett-Packard Development Company, L.P. | Coherent signal in a multi-processor system |
US7340565B2 (en) * | 2004-01-13 | 2008-03-04 | Hewlett-Packard Development Company, L.P. | Source request arbitration |
US8281079B2 (en) * | 2004-01-13 | 2012-10-02 | Hewlett-Packard Development Company, L.P. | Multi-processor system receiving input from a pre-fetch buffer |
US8301844B2 (en) * | 2004-01-13 | 2012-10-30 | Hewlett-Packard Development Company, L.P. | Consistency evaluation of program execution across at least one memory barrier |
US7383409B2 (en) | 2004-01-13 | 2008-06-03 | Hewlett-Packard Development Company, L.P. | Cache systems and methods for employing speculative fills |
US7380107B2 (en) * | 2004-01-13 | 2008-05-27 | Hewlett-Packard Development Company, L.P. | Multi-processor system utilizing concurrent speculative source request and system source request in response to cache miss |
US7409500B2 (en) * | 2004-01-13 | 2008-08-05 | Hewlett-Packard Development Company, L.P. | Systems and methods for employing speculative fills |
US7409503B2 (en) * | 2004-01-13 | 2008-08-05 | Hewlett-Packard Development Company, L.P. | Register file systems and methods for employing speculative fills |
US7360069B2 (en) * | 2004-01-13 | 2008-04-15 | Hewlett-Packard Development Company, L.P. | Systems and methods for executing across at least one memory barrier employing speculative fills |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5797013A (en) * | 1995-11-29 | 1998-08-18 | Hewlett-Packard Company | Intelligent loop unrolling |
US6182210B1 (en) | 1997-12-16 | 2001-01-30 | Intel Corporation | Processor having multiple program counters and trace buffers outside an execution pipeline |
US6240509B1 (en) | 1997-12-16 | 2001-05-29 | Intel Corporation | Out-of-pipeline trace buffer for holding instructions that may be re-executed following misspeculation |
US6047370A (en) | 1997-12-19 | 2000-04-04 | Intel Corporation | Control of processor pipeline movement through replay queue and pointer backup |
US6205542B1 (en) | 1997-12-24 | 2001-03-20 | Intel Corporation | Processor pipeline including replay |
US6076153A (en) | 1997-12-24 | 2000-06-13 | Intel Corporation | Processor pipeline including partial replay |
US6092175A (en) * | 1998-04-02 | 2000-07-18 | University Of Washington | Shared register storage mechanisms for multithreaded computer systems with out-of-order execution |
- 1998-08-11: US application US09/132,042 filed; patent US6412067B1 (status: Expired - Lifetime)
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8149243B1 (en) * | 2006-07-28 | 2012-04-03 | Nvidia Corporation | 3D graphics API extension for a packed float image format |
US8432410B1 (en) | 2006-07-28 | 2013-04-30 | Nvidia Corporation | 3D graphics API extension for a shared exponent image format |
US9627038B2 (en) | 2013-03-15 | 2017-04-18 | Intel Corporation | Multiport memory cell having improved density area |
US9436476B2 (en) * | 2013-03-15 | 2016-09-06 | Soft Machines Inc. | Method and apparatus for sorting elements in hardware structures |
TWI567636B (en) * | 2013-03-15 | 2017-01-21 | 軟體機器公司 | Method and apparatus for sorting elements in hardware structures |
US9582322B2 (en) | 2013-03-15 | 2017-02-28 | Soft Machines Inc. | Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping |
US20140281422A1 (en) * | 2013-03-15 | 2014-09-18 | Soft Machines, Inc. | Method and Apparatus for Sorting Elements in Hardware Structures |
US9753734B2 (en) | 2013-03-15 | 2017-09-05 | Intel Corporation | Method and apparatus for sorting elements in hardware structures |
US9891915B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method and apparatus to increase the speed of the load access and data return speed path using early lower address bits |
US10180856B2 (en) | 2013-03-15 | 2019-01-15 | Intel Corporation | Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping |
TWI652618B (en) | 2013-03-15 | 2019-03-01 | 英特爾股份有限公司 | Method and apparatus for managing elements stored in hardware structures |
US10289419B2 (en) | 2013-03-15 | 2019-05-14 | Intel Corporation | Method and apparatus for sorting elements in hardware structures |
US9946538B2 (en) | 2014-05-12 | 2018-04-17 | Intel Corporation | Method and apparatus for providing hardware support for self-modifying code |
Also Published As
Publication number | Publication date |
---|---|
US6412067B1 (en) | 2002-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6412067B1 (en) | Backing out of a processor architectural state | |
JP2938426B2 (en) | Method and apparatus for detecting and recovering interference between out-of-order load and store instructions | |
JP2597811B2 (en) | Data processing system | |
US7523296B2 (en) | System and method for handling exceptions and branch mispredictions in a superscalar microprocessor | |
US6085312A (en) | Method and apparatus for handling imprecise exceptions | |
JP3984786B2 (en) | Scheduling instructions with different latency | |
US6189088B1 (en) | Forwarding stored dara fetched for out-of-order load/read operation to over-taken operation read-accessing same memory location | |
US5546597A (en) | Ready selection of data dependent instructions using multi-cycle cams in a processor performing out-of-order instruction execution | |
EP0762270B1 (en) | Microprocessor with load/store operation to/from multiple registers | |
US5961630A (en) | Method and apparatus for handling dynamic structural hazards and exceptions by using post-ready latency | |
JPH02257219A (en) | Pipeline processing apparatus and method | |
JPH06214799A (en) | Method and apparatus for improvement of performance of random-sequence loading operation in computer system | |
JP3813157B2 (en) | Multi-pipe dispatch and execution of complex instructions on superscalar processors | |
US6735688B1 (en) | Processor having replay architecture with fast and slow replay paths | |
US5761467A (en) | System for committing execution results when branch conditions coincide with predetermined commit conditions specified in the instruction field | |
US8347066B2 (en) | Replay instruction morphing | |
US6237076B1 (en) | Method for register renaming by copying a 32 bits instruction directly or indirectly to a 64 bits instruction | |
JPH09152973A (en) | Method and device for support of speculative execution of count / link register change instruction | |
US7302553B2 (en) | Apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue | |
US5678016A (en) | Processor and method for managing execution of an instruction which determine subsequent to dispatch if an instruction is subject to serialization | |
US20040261068A1 (en) | Methods and apparatus for preserving precise exceptions in code reordering by using control speculation | |
US5850563A (en) | Processor and method for out-of-order completion of floating-point operations during load/store multiple operations | |
KR100508320B1 (en) | Processor having replay architecture with fast and slow replay paths | |
US7380111B2 (en) | Out-of-order processing with predicate prediction and validation with correct RMW partial write new predicate register values | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAMIREZ, RICARDO;MORRISON, MICHAEL J.;REEL/FRAME:009562/0824;SIGNING DATES FROM 19980804 TO 19981026 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | FPAY | Fee payment | Year of fee payment: 12 |