US20040068641A1 - Method and apparatus for exchanging the contents of registers - Google Patents

Method and apparatus for exchanging the contents of registers

Info

Publication number
US20040068641A1
Authority
US
United States
Prior art keywords
register
stack
contents
instructions
move
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/678,263
Inventor
Kevin Safford
Patrick Knebel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US10/678,263
Publication of US20040068641A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017 Runtime instruction translation, e.g. macros
    • G06F9/30174 Runtime instruction translation, e.g. macros for non-native instruction set, e.g. Javabyte, legacy code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)

Abstract

A processor based computer system having dependency checking logic and a register stack, wherein the system overrides the dependency logic such that move instructions associated with the stack registers may be executed in parallel. The system operates such that it can be determined whether a stack underflow exception has occurred and if it has, the move instructions can be flushed, and a micro-code handler algorithm invoked that operates to allow execution of the move instructions in parallel without a stack underflow exception.

Description

    I. FIELD
  • [0001] The present invention relates to digital computer systems, and more particularly, but not by way of limitation, to methods and apparatus for executing instructions in such systems.
  • II. BACKGROUND
  • [0002] In x86 computer systems, the floating point unit (FPU) comprises a plurality of data registers. Floating point instructions treat this plurality of data registers as a register stack. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in a stack TOP field. Thus, load operations decrement TOP by one and load a value into the new top-of-stack register, while store operations store the value from the current top-of-stack register in memory and then increment TOP by one.
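  • The TOP-relative addressing described above can be sketched as follows. This is a minimal illustrative model, not the patent's hardware: the class name `RegisterStack`, the eight-register size, and the modulo wraparound of TOP are assumptions added for the example.

```python
# Hypothetical model of a TOP-relative register stack (all names are illustrative).
class RegisterStack:
    def __init__(self, size=8):
        self.regs = [None] * size   # physical data registers
        self.top = 0                # TOP field: physical index of ST(0)

    def phys(self, i):
        # ST(i) resolves to physical register (TOP + i), wrapping around (assumed).
        return (self.top + i) % len(self.regs)

    def st(self, i):
        return self.regs[self.phys(i)]

    def load(self, value):
        # A load decrements TOP, then writes the new top-of-stack register.
        self.top = (self.top - 1) % len(self.regs)
        self.regs[self.top] = value

    def store(self):
        # A store reads the current top-of-stack register, then increments TOP.
        value = self.regs[self.top]
        self.top = (self.top + 1) % len(self.regs)
        return value


stack = RegisterStack()
stack.load(1.5)
stack.load(2.5)
print(stack.top, stack.st(0), stack.st(1))  # 6 2.5 1.5
```

  • Because every access goes through `phys`, an instruction that names ST(1) reaches a different physical register after each load or store, which is why values must be brought to the top before most instructions can use them.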
  • [0003] Many floating point instructions, however, only operate on the top one or two registers of a register stack. Thus, if the desired information is located in, e.g., the fourth stack register, one or more operations must be performed before the information in the fourth stack register can be moved into the top register of the stack where it can be operated upon. This creates a “bottleneck” in the stack. To this end, the floating point exchange register contents instruction (FXCH) is used in the IA-32 computer architecture to exchange the floating point information in a selected stack register with that in the top register of the stack. For example, the instruction
  • [0004] FXCH ST(0), ST(i)
  • [0005] will exchange the information in the top register in the stack (denoted ST(0)) with the ith register in the stack (denoted ST(i)). In this way, the bottleneck in the register stack can be alleviated by putting desired information at the top of the stack, where it can then be operated upon by most floating point instructions. More information regarding the FXCH instruction may be found in the Intel Architecture Software Developer's Manual, Volumes 1-3, which are hereby incorporated by reference.
  • [0006] In many computer architectures, instructions such as FXCH must be executed by emulation because the native hardware that supports such an instruction is not present. One way of emulating the FXCH instruction in such architectures is through a technique called register renaming. In register renaming, the physical registers in question (e.g., ST(0) and ST(i)) are mapped into a stack register map. To exchange the contents of the two physical registers, the pointers that map the physical registers into the stack register map are changed or “re-pointed” from their original register to the other register, and thus the operation is performed. However, at least one problem with register renaming is that it requires the pointers to be stored in additional hardware, which adds to the cost and complexity of the system and consumes valuable space.
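  • To make the re-pointing idea concrete, here is a small sketch of an exchange done purely on a map. The names `physical`, `stack_map`, and `fxch_by_renaming` are hypothetical; the `stack_map` list stands in for the extra pointer storage that the paragraph above identifies as the cost of this approach.

```python
# Hypothetical stack register map: logical slot ST(i) -> physical register index.
physical = ["a", "b", "c", "d"]   # register contents; never copied by the exchange
stack_map = [0, 1, 2, 3]          # stack_map[i] is the physical register behind ST(i)


def st(i):
    """Read ST(i) through the map."""
    return physical[stack_map[i]]


def fxch_by_renaming(i):
    # Re-point the two map entries; no data moves between physical registers.
    stack_map[0], stack_map[i] = stack_map[i], stack_map[0]


fxch_by_renaming(2)
print(st(0), st(2), physical)  # c a ['a', 'b', 'c', 'd']
```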
  • [0007] Another way of emulating the FXCH is to sequentially execute at least three micro-code instructions as follows:
  • [0008] move temp := ST(0);
  • [0009] move ST(0) := ST(i);
  • [0010] move ST(i) := temp;
  • [0011] This is the traditional method of exchanging the contents of the registers. This sequence of instructions uses a temporary register to switch the contents of the top register ST(0) and the ith register ST(i). This method of emulation, with its three micro-code instructions, consumes three times as many clock cycles as the single FXCH instruction and, in some cases, may consume even more, depending upon the latency associated with the move operations. Thus, there exists a need for methods and apparatus for emulating the FXCH instruction that do not add excess hardware and that consume relatively few clock cycles. More generally, there exists a need for methods and apparatus for exchanging the contents of two registers in a relatively quick and efficient manner.
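  • The three-move sequence above translates directly into the following sketch. The list `st` is a hypothetical stand-in for the stack registers, indexed relative to the top; the function name is illustrative.

```python
def exchange_with_temp(st, i):
    """Swap st[0] and st[i] using three sequential moves and a temporary."""
    temp = st[0]     # move temp  := ST(0)
    st[0] = st[i]    # move ST(0) := ST(i)
    st[i] = temp     # move ST(i) := temp
    return st


print(exchange_with_temp([1.0, 2.0, 3.0], 2))  # [3.0, 2.0, 1.0]
```

  • Each assignment depends on the one before it, so the three moves cannot overlap; that serial chain is the source of the cycle cost noted above.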
  • III. SUMMARY
  • [0012] In one embodiment of the present invention, there is a processor based computer system having dependency checking logic and a register stack, wherein the system overrides the dependency logic such that move instructions associated with the stack registers may be executed in parallel. In another embodiment, the system operates such that it can be determined whether a stack underflow exception has occurred and if it has, the move instructions can be flushed, and a micro-code handler algorithm invoked that operates to allow execution of the move instructions in parallel without a stack underflow exception.
  • IV. BRIEF DESCRIPTION OF THE DRAWINGS
  • [0013] FIG. 1 is a block diagram of a computer system including the present invention.
  • [0014] FIG. 2 is a block diagram of the processor of FIG. 1.
  • [0015] FIG. 3 is an illustration of pipelined or lockstep operations.
  • [0016] FIG. 4 is a flow chart of the operation of the stack underflow fault micro-code handler of the present invention.
  • V. DETAILED DESCRIPTION
  • [0017] A. Description of an Embodiment
  • [0018] FIG. 1 illustrates a computer system 10 in which the present invention may be implemented. The computer system 10 comprises at least one processor 20, main memory 30, and various interconnecting data, address, and control buses (numbered collectively as 40). An instruction set 50 (which may be a guest instruction set) and an operating system 60 may be stored in main memory 30. As illustrated in FIG. 2, the processor 20 comprises a floating point unit 70, dispatch or dependency checking logic 80, at least two execution units 90 a and 90 b, micro-code ROM 100, a register stack 120 (in this embodiment the register stack 120 comprises eight individual registers 120(0)-(7)), a floating point tag word (FPTW) register 130, and various buses and interconnections (numbered collectively as 110). (One skilled in the relevant art will note that there need not be a separate floating point unit; the execution units are equally capable of executing floating point instructions.)
  • [0019] Instructions are provided to the processor 20 from main memory 30. The instructions provided to the processor 20 are macro-code instructions that map to one or more micro-code instructions 140 stored in the micro-code ROM 100. The micro-code instructions can be directly executed by processor 20. Also stored in the micro-code ROM 100 is a set of micro-code handlers 150 that may be invoked to handle processor exceptions.
  • [0020] The floating point unit 70 accesses the register stack 120 to store and retrieve data in response to instructions. The FPTW register 130 is updated accordingly.
  • [0021] The processor 20 may have a pipelined architecture and may allow for parallel processing of certain instructions. Dependency checking logic 80 operates to determine which instructions can be executed in parallel, i.e., whether to issue two instructions or one instruction per cycle to the execution units 90.
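  • As a rough illustration of the decision that dependency checking logic makes, the sketch below decides between issuing one or two instructions per cycle. It is an assumption-laden model, not the logic of element 80: the `Move` type, the register names, and the specific hazard rule (either move writes a register the other reads or writes) are all chosen for the example.

```python
from typing import NamedTuple


class Move(NamedTuple):
    dst: str   # register written by the move
    src: str   # register read by the move


def has_dependency(a: Move, b: Move) -> bool:
    # Assumed hazard rule: either move writes a register the other reads or writes.
    return a.dst in (b.src, b.dst) or b.dst == a.src


def issue_width(a: Move, b: Move) -> int:
    # Issue two instructions per cycle when independent, otherwise one.
    return 1 if has_dependency(a, b) else 2


print(issue_width(Move("ST1", "ST2"), Move("ST3", "ST4")))  # 2
print(issue_width(Move("STi", "ST0"), Move("ST0", "STi")))  # 1
```

  • Under this rule, the pair of moves that would exchange ST(0) and ST(i) is exactly the kind of pair that ordinary dependency checking refuses to issue together.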
  • [0022] B. Method of Operation
  • [0023] 1. Parallel Execution of Move Instructions
  • [0024] Assume that the processor 20 is presented with the following instructions:
  • [0025] ST(i) := move ST(0);
  • [0026] ST(0) := move ST(i);
  • [0027] In a sequential microprocessor, this sequence of code is not capable of exchanging the contents of the two registers (ST(0) and ST(i)). In a sequential microprocessor, each instruction is executed independently and in serial fashion. Thus, in this sequence the first instruction will overwrite ST(i) before the second instruction reads ST(i), and the value placed in ST(0) will be the original contents of ST(0) rather than those of ST(i). This is obviously an incorrect result. As noted, the traditional method for correctly performing the desired exchange would require an additional temporary location and an extra instruction.
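  • The failure of the two-move sequence under serial execution can be reproduced with a short sketch. The list `st` and the function name are hypothetical stand-ins for the stack registers.

```python
def two_moves_in_series(st, i):
    """Run the two moves one after the other, as a sequential processor would."""
    st[i] = st[0]    # ST(i) := move ST(0)   (the old ST(i) value is lost here)
    st[0] = st[i]    # ST(0) := move ST(i)   (reads the value just written)
    return st


print(two_moves_in_series([1.0, 2.0, 3.0], 2))  # [1.0, 2.0, 1.0]
```

  • The original value 3.0 never reaches ST(0); both registers end up holding the old top-of-stack value.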
  • [0028] However, in the present invention, the computer system 10 emulates the FXCH instruction by forcing the two move instructions to be executed in parallel (illustrated conceptually below):
  • [0029] ST(i) := move ST(0); ST(0) := move ST(i);
  • [0030] Because these two instructions each modify a register used by the other instruction, hardware would normally inhibit them from being executed in parallel. Thus, in the present invention, a mechanism is provided that forces the hardware to override the dependency checking logic and to execute the two instructions in parallel. Both instructions are provided to respective execution units 90 a and 90 b that (1) read their respective operands at substantially the same time, (2) write their results at substantially the same time (at least one clock cycle after the read), and (3) finish executing their instructions at substantially the same time (i.e., they operate in lockstep). The latency associated with the respective execution units 90 a and 90 b is such that this can be accomplished. This lockstep execution is illustrated conceptually in FIG. 3. In this manner, the computer system 10 can effectively emulate the FXCH instruction in substantially less time than it would take to execute the three move instructions of a sequential system.
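  • The lockstep behavior can be modeled in software by separating a read phase from a write phase. This is only a conceptual sketch of the timing described above; the function name and the mapping of variables to execution units 90 a and 90 b are assumptions for illustration.

```python
def two_moves_in_lockstep(st, i):
    """Model both moves reading their operands before either one writes."""
    # Phase 1: both execution units read at substantially the same time.
    read_a = st[0]   # unit 90a reads ST(0)
    read_b = st[i]   # unit 90b reads ST(i)
    # Phase 2: at least one cycle later, both units write their results.
    st[i] = read_a   # ST(i) := move ST(0)
    st[0] = read_b   # ST(0) := move ST(i)
    return st


print(two_moves_in_lockstep([1.0, 2.0, 3.0], 2))  # [3.0, 2.0, 1.0]
```

  • Because neither write can land before both reads have completed, the two moves exchange the registers without a temporary location.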
  • [0031] 2. Stack Underflow Exception
  • [0032] In modern superscalar microprocessors (ones with the ability to execute multiple instructions in parallel), exceptions are precise, meaning that when an operation faults, all of the “younger” operations in the pipeline must be flushed. However, in the present invention, a temporary register is not used to hold intermediate results of operations in the pipeline; i.e., the two instructions are completed in lockstep. Thus, if either one of the two move instructions above causes an exception, both of the move instructions are flushed and re-executed in parallel to prevent data corruption. However, that is not conventional operation for many microprocessors. Conventional operation will work properly if the “first” instruction causes the exception, but not if the “second” instruction does: in that case, conventional operation dictates that the “first” instruction would not be flushed (see example below):
    ST(i) := mov ST(0)    ST(0) := mov ST(i)
    “first”               “second”
    “older”               “younger”
  • [0033] Thus, in the present invention, the computer system 10 flushes both operations when either of them causes a fault.
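  • One way to picture the flush-both rule is to stage the results of both moves and commit them only if neither faulted. The sketch below is a software analogy under stated assumptions: `StackUnderflow`, the `EMPTY` marker, and the staging dictionary are hypothetical and do not describe the actual pipeline.

```python
class StackUnderflow(Exception):
    """Hypothetical fault raised when a move reads an empty register."""


EMPTY = None  # illustrative marker for an empty register


def move(st, src):
    if st[src] is EMPTY:
        raise StackUnderflow(src)
    return st[src]


def issue_pair(st, i):
    """Return True if both moves committed, False if both were flushed."""
    staged = {}
    try:
        staged[i] = move(st, 0)   # "first" / older:    ST(i) := move ST(0)
        staged[0] = move(st, i)   # "second" / younger: ST(0) := move ST(i)
    except StackUnderflow:
        return False              # flush both moves; no write is committed
    for dst, value in staged.items():
        st[dst] = value           # commit both results together
    return True


regs = [1.0, EMPTY]
print(issue_pair(regs, 1), regs)  # False [1.0, None]
```

  • Even though only the "second" move faults in this usage, the older move's write is discarded as well, so ST(i) is not overwritten by a half-finished exchange.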
  • [0034] An exception that may occur when these move instructions are executed is called a “stack underflow” exception. A stack underflow exception occurs when an operation attempts to read the contents of an empty stack register 120(0)-(7). A floating point tag word stored in the FPTW register 130 indicates whether a stack register 120(0)-(7) is empty or not. A defined architectural response to a stack underflow is to replace the contents of the empty register with a QNaN, mark the register as non-empty, and then perform the instruction again. While it is possible to add hardware inside the execution units 90 a and 90 b to indicate which of the two move instructions caused the stack underflow fault, that is not desired because of the additional complexity and cost.
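  • The tag-word check and the QNaN response can be sketched as bit operations. The text above does not give the layout of the FPTW register 130, so this example assumes the conventional x87 encoding of two tag bits per physical register, with 0b11 meaning empty and 0b10 meaning a special value such as a NaN; the function names are illustrative.

```python
import math

EMPTY = 0b11     # assumed x87-style tag for an empty register
SPECIAL = 0b10   # assumed x87-style tag for NaNs and other special values


def tag(fptw, reg):
    """Extract the two tag bits for physical register `reg`."""
    return (fptw >> (2 * reg)) & 0b11


def is_empty(fptw, reg):
    return tag(fptw, reg) == EMPTY


def fill_with_qnan(regs, fptw, reg):
    """Apply the architectural response: write a QNaN and mark the register non-empty."""
    regs[reg] = float("nan")                  # Python NaN as a stand-in for a QNaN
    cleared = fptw & ~(0b11 << (2 * reg))
    return cleared | (SPECIAL << (2 * reg))   # updated tag word


regs = [0.0] * 8
fptw = 0xFFFF                                 # every register tagged empty
fptw = fill_with_qnan(regs, fptw, 3)
print(hex(fptw), is_empty(fptw, 3), math.isnan(regs[3]))  # 0xffbf False True
```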
  • [0035] Thus, in the computer system 10, a micro-code handler algorithm 200, such as that illustrated in FIG. 4, is invoked by processor 20 when a stack underflow exception occurs. At block 210 of FIG. 4, a stack underflow exception has occurred during an attempt to execute the two move operations in parallel. At block 220, the micro-code handler algorithm 200 causes the FPTW bits in the FPTW register 130 that correspond to the ST(0) register to be checked and then, at block 230, a decision is made as to whether register ST(0) is empty or not. If register ST(0) is empty, at block 240 its contents are replaced with a QNaN and the corresponding FPTW bit is set to indicate that the ST(0) register is no longer empty. Proceeding now to block 260, emulation of the FXCH instruction is performed again (by issuing the two instructions in parallel again). If the register ST(0) was not empty, the exception must have occurred because register ST(i) was empty and, at block 250, the register ST(i) contents are replaced with a QNaN, the corresponding FPTW bit is set, and the emulation is performed again at block 260. At block 270, if a stack underflow exception occurs once again, it is known that both registers involved in the operation must have been empty originally and that this time, the ST(i) register caused the exception. The register contents at this stage are:
  • [0036] ST(0) := QNaN    ST(i) := empty
  • [0037] Accordingly, at block 280, the ST(i) register is loaded with a QNaN and, at block 290, the emulation proceeds again, this time without any exceptions: the QNaNs in the two registers will be harmlessly exchanged.
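  • The handler flow of FIG. 4 can be traced with the following sketch. It is a simplified model under stated assumptions: the `Fpu` class, the boolean `empty` list standing in for the tag word, and the returned attempt count are all hypothetical, and the fault does not report which move caused it, matching the constraint described above.

```python
import math

QNAN = float("nan")   # stand-in for the architectural QNaN


class StackUnderflow(Exception):
    """Hypothetical fault; it carries no hint about which move caused it."""


class Fpu:
    def __init__(self, values, empty):
        self.st = list(values)      # register contents, indexed relative to TOP
        self.empty = list(empty)    # simplified tag word: True means empty

    def try_exchange(self, i):
        if self.empty[0] or self.empty[i]:
            raise StackUnderflow()  # both moves are flushed; nothing is written
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def fill(self, i):
        self.st[i] = QNAN
        self.empty[i] = False


def emulate_fxch(fpu, i):
    """Return the number of attempts needed to complete the exchange."""
    try:
        fpu.try_exchange(i)                  # initial parallel issue
        return 1
    except StackUnderflow:                   # block 210
        pass
    if fpu.empty[0]:                         # blocks 220-230: check ST(0) tag
        fpu.fill(0)                          # block 240
    else:
        fpu.fill(i)                          # block 250
    try:
        fpu.try_exchange(i)                  # block 260
        return 2
    except StackUnderflow:                   # block 270: both were empty
        fpu.fill(i)                          # block 280
    fpu.try_exchange(i)                      # block 290
    return 3


both_empty = Fpu([0.0, 0.0], [True, True])
print(emulate_fxch(both_empty, 1), both_empty.empty)  # 3 [False, False]
```

  • The worst case, with both registers empty, costs three attempts; checking only the ST(0) tag on the first fault is what lets the handler avoid any hardware that identifies the faulting move.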
  • [0038] C. Remarks
  • [0039] By “forcing” the two move instructions to execute in parallel, emulation of the FXCH instruction may be achieved in substantially less time than previous methods and without adding hardware to the computer system 10. Furthermore, a microcode handler ensures correct execution of the emulated FXCH in the event of a stack underflow exception.
  • [0040] It will be readily apparent to those skilled in the art that innumerable variations, modifications, applications, and extensions of these embodiments and principles can be made without departing from the principles and spirit of the invention. For example, the methods and apparatus described herein may be applied to emulation of other instructions and to the handling of exceptions that occur when they are executed.
  • [0041] In another embodiment, rather than re-executing the instruction with respect to the ST(i) register (at block 260 of FIG. 4), the computer system 10 instead checks the ST(i) register, thereby eliminating the need to re-execute an instruction. In this embodiment, additional hardware or microcode is added to determine which register ST(i) is referenced in the instruction. Accordingly, it is intended that the scope of the invention be limited only as necessitated by the accompanying claims.

Claims (20)

What is claimed is:
1. A method for emulating an instruction that switches the contents of a top stack register and a selected stack register, comprising:
overriding the dependency of the top stack register and the selected stack register;
executing a first instruction that moves the contents of the top stack register into the selected stack register in parallel with a second instruction that moves the contents of the selected stack register into the top stack register.
2. The method of claim 1 further comprising:
determining whether the top register is empty if a stack underflow exception occurs on a first attempt to execute the first and second instructions, and
if the top register is empty, replacing its contents with a QNaN, and
if the top register is not empty, replacing the contents of the selected register with a QNaN.
3. The method of claim 2 further comprising:
replacing the contents of the selected register with a QNaN if the top register is empty on the first attempt to execute the first and second instructions and if a stack underflow exception occurs on a second attempt to execute the first and second instructions.
4. A system comprising:
main memory storing an instruction set; and
a processor operably connected to main memory by a bus network, wherein the processor comprises:
a floating point unit;
a register stack;
dependency checking logic for determining whether instructions are executed sequentially or in parallel;
two execution units for executing instructions; and
ROM storing a micro-code handler that is invoked when two move instructions operating on the register stack are executed in parallel and cause a stack underflow exception.
5. In a processor based computer system having dependency checking logic and a register stack, a method comprising:
overriding the dependency logic such that move instructions associated with the stack registers may be executed in parallel;
executing the move instructions in parallel;
determining whether a stack underflow exception has occurred and if it has;
flushing the move instructions; and
invoking a micro-code handler algorithm that operates to allow execution of the move instructions in parallel without a stack underflow exception.
6. The method of claim 5, wherein the register stack comprises a top register and a selectable register, and wherein the act of invoking the micro-code handler further comprises:
determining whether the top register of the stack is empty and if it is, replacing its contents with an appropriate architectural response.
7. The method of claim 6, wherein the act of determining whether the top register of the stack is empty, further comprises:
replacing the contents of the selectable register with an appropriate architectural response if the top register of the stack is not empty.
8. The method of claim 7, further comprising:
executing the move instructions in parallel if the top or selectable register contents have been replaced with an appropriate architectural response.
9. The method of claim 7, further comprising:
replacing the contents of the selectable register with an appropriate architectural response if the contents of the top register have been replaced with an appropriate architectural response and a stack underflow exception has occurred.
10. A method for emulating an FXCH instruction, comprising:
providing a processor with a move ST(0) instruction and a move ST(i) instruction, wherein ST(0) denotes the top register of a stack and ST(i) denotes the ith register of the stack;
overriding the sequential dependency of the ST(0) and ST(i) registers;
executing the move ST(0) and move ST(i) instructions in parallel.
11. The method of claim 10, further comprising:
providing the move ST(0) instruction and move ST(i) instruction to respective execution units such that the instructions are executed substantially at the same time.
12. The method of claim 10, further comprising:
flushing the move ST(0) instruction and the move ST(i) instruction if a stack underflow exception occurs when the instructions are executed in parallel; and
invoking an algorithm that determines whether the stack underflow exception occurred because the ST(0) register, ST(i) register, or both were empty.
13. The method of claim 12, wherein the algorithm replaces an empty stack register with an appropriate architectural response.
14. The method of claim 13, wherein the algorithm replaces the ST(i) register with an appropriate architectural response if a stack underflow exception occurs the first time the move instructions are executed and the ST(0) register is not empty.
15. The method of claim 14, wherein the algorithm replaces the ST(i) register with an appropriate architectural response if a stack underflow exception occurs and the contents of the ST(0) register have been replaced with an appropriate architectural response.
16. A processor, comprising:
dependency checking logic; and
a register stack having a top register and a plurality of other registers;
wherein the processor is configured to:
override the dependency logic such that operations related to the register stack may be executed in parallel;
substantially simultaneously switch the contents of the top register of the register stack with the contents of another register of the register stack and vice versa, such that a hazard does not occur; and
execute an algorithm that replaces the contents of the top register, the other register, or both registers with an appropriate architectural response if an exception occurs when the contents of both registers are switched.
17. The processor of claim 16, wherein the appropriate architectural response is a QNaN.
18. The processor of claim 16 further comprising:
at least two execution units and a ROM.
19. The processor of claim 18, wherein the ROM stores a handler that is capable of implementing the appropriate architectural response.
20. A method of exchanging the contents of two registers, comprising:
overriding the dependency of the two registers; and
executing instructions that exchange the contents of the two registers in parallel and in lockstep.
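
By way of illustration only, the exchange and underflow handling recited in claims 7 to 17 can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the claimed hardware or the handler stored in ROM: the names `EMPTY`, `StackUnderflow`, `parallel_moves`, and `emulate_fxch` are hypothetical, an empty register is modeled as `None`, and the "appropriate architectural response" is modeled as Python's `math.nan`, which is a quiet NaN. Parallel, lockstep execution is approximated by reading both source registers before either destination is written.

```python
import math

EMPTY = None  # hypothetical marker for a stack register tagged as empty


class StackUnderflow(Exception):
    """Raised when a move instruction reads an empty stack register."""


def parallel_moves(stack, i):
    """Model the two move instructions executing in lockstep.

    Both sources are read before either destination is written, so the
    sequential dependency between ST(0) and ST(i) is effectively ignored.
    """
    src_top, src_i = stack[0], stack[i]
    if src_top is EMPTY or src_i is EMPTY:
        # Neither register is written: both moves are flushed.
        raise StackUnderflow
    stack[0], stack[i] = src_i, src_top


def emulate_fxch(stack, i):
    """Exchange ST(0) and ST(i), substituting a QNaN for empty registers."""
    top_replaced = False
    while True:
        try:
            parallel_moves(stack, i)
            return
        except StackUnderflow:
            if stack[0] is EMPTY and not top_replaced:
                # ST(0) was empty: replace it first, then retry the moves.
                stack[0] = math.nan
                top_replaced = True
            else:
                # ST(0) is valid or already replaced, so ST(i) was empty.
                stack[i] = math.nan


if __name__ == "__main__":
    regs = [1.0, 2.0, EMPTY]
    emulate_fxch(regs, 2)
    print(regs)
```

In this sketch the moves are retried at most twice: once after the top register is replaced, and once more after the selected register is replaced.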
US10/678,263 1999-11-26 2003-10-06 Method and apparatus for exchanging the contents of registers Abandoned US20040068641A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/678,263 US20040068641A1 (en) 1999-11-26 2003-10-06 Method and apparatus for exchanging the contents of registers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/449,804 US6668315B1 (en) 1999-11-26 1999-11-26 Methods and apparatus for exchanging the contents of registers
US10/678,263 US20040068641A1 (en) 1999-11-26 2003-10-06 Method and apparatus for exchanging the contents of registers

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/449,804 Division US6668315B1 (en) 1999-11-26 1999-11-26 Methods and apparatus for exchanging the contents of registers

Publications (1)

Publication Number Publication Date
US20040068641A1 (en) 2004-04-08

Family

ID=23785560

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/449,804 Expired - Fee Related US6668315B1 (en) 1999-11-26 1999-11-26 Methods and apparatus for exchanging the contents of registers
US10/678,263 Abandoned US20040068641A1 (en) 1999-11-26 2003-10-06 Method and apparatus for exchanging the contents of registers

Country Status (2)

Country Link
US (2) US6668315B1 (en)
FR (1) FR2801694A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150068546A1 (en) 2013-09-09 2015-03-12 Healthy Hair Inc. Hair Replacement and Method of Use

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5522051A (en) * 1992-07-29 1996-05-28 Intel Corporation Method and apparatus for stack manipulation in a pipelined processor
US5572664A (en) * 1994-06-01 1996-11-05 Advanced Micro Devices, Inc. System for generating floating point test vectors
US5696955A (en) * 1994-06-01 1997-12-09 Advanced Micro Devices, Inc. Floating point stack and exchange instruction
US5764938A (en) * 1994-06-01 1998-06-09 Advanced Micro Devices, Inc. Resynchronization of a superscalar processor
US5991863A (en) * 1996-08-30 1999-11-23 Texas Instruments Incorporated Single carry/borrow propagate adder/decrementer for generating register stack addresses in a microprocessor
US6079011A (en) * 1996-11-06 2000-06-20 Hyundai Electronics Industries Co., Ltd. Apparatus for executing a load instruction or exchange instruction in parallel with other instructions in a dual pipelined processor
US6370637B1 (en) * 1999-08-05 2002-04-09 Advanced Micro Devices, Inc. Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5471593A (en) * 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5404469A (en) * 1992-02-25 1995-04-04 Industrial Technology Research Institute Multi-threaded microprocessor architecture utilizing static interleaving
US5634118A (en) * 1995-04-10 1997-05-27 Exponential Technology, Inc. Splitting a floating-point stack-exchange instruction for merging into surrounding instructions by operand translation
US5940311A (en) * 1996-04-30 1999-08-17 Texas Instruments Incorporated Immediate floating-point operand reformatting in a microprocessor
US5860017A (en) 1996-06-28 1999-01-12 Intel Corporation Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction
US5884062A (en) * 1996-08-30 1999-03-16 Texas Instruments Incorporated Microprocessor with pipeline status integrity logic for handling multiple stage writeback exceptions
US5859999A (en) 1996-10-03 1999-01-12 Idea Corporation System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers
US5819060A (en) 1996-10-08 1998-10-06 Lsi Logic Corporation Instruction swapping in dual pipeline microprocessor
US5870576A (en) * 1996-12-16 1999-02-09 Hewlett-Packard Company Method and apparatus for storing and expanding variable-length program instructions upon detection of a miss condition within an instruction cache containing pointers to compressed instructions for wide instruction word processor architectures

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5522051A (en) * 1992-07-29 1996-05-28 Intel Corporation Method and apparatus for stack manipulation in a pipelined processor
US5572664A (en) * 1994-06-01 1996-11-05 Advanced Micro Devices, Inc. System for generating floating point test vectors
US5696955A (en) * 1994-06-01 1997-12-09 Advanced Micro Devices, Inc. Floating point stack and exchange instruction
US5764938A (en) * 1994-06-01 1998-06-09 Advanced Micro Devices, Inc. Resynchronization of a superscalar processor
US5857089A (en) * 1994-06-01 1999-01-05 Advanced Micro Devices, Inc. Floating point stack and exchange instruction
US5991863A (en) * 1996-08-30 1999-11-23 Texas Instruments Incorporated Single carry/borrow propagate adder/decrementer for generating register stack addresses in a microprocessor
US6079011A (en) * 1996-11-06 2000-06-20 Hyundai Electronics Industries Co., Ltd. Apparatus for executing a load instruction or exchange instruction in parallel with other instructions in a dual pipelined processor
US6370637B1 (en) * 1999-08-05 2002-04-09 Advanced Micro Devices, Inc. Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria

Also Published As

Publication number Publication date
US6668315B1 (en) 2003-12-23
FR2801694A1 (en) 2001-06-01

Similar Documents

Publication Publication Date Title
JP2597811B2 (en) Data processing system
US6009512A (en) Mechanism for forwarding operands based on predicated instructions
US6356918B1 (en) Method and system for managing registers in a data processing system supports out-of-order and speculative instruction execution
US5584009A (en) System and method of retiring store data from a write buffer
US5978901A (en) Floating point and multimedia unit with data type reclassification capability
US5577200A (en) Method and apparatus for loading and storing misaligned data on an out-of-order execution computer system
US6691221B2 (en) Loading previously dispatched slots in multiple instruction dispatch buffer before dispatching remaining slots for parallel execution
EP1099157B1 (en) Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US5446912A (en) Partial width stalls within register alias table
US6219773B1 (en) System and method of retiring misaligned write operands from a write buffer
US7685402B2 (en) RISC microprocessor architecture implementing multiple typed register sets
US6279107B1 (en) Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US5931943A (en) Floating point NaN comparison
US7603497B2 (en) Method and apparatus to launch write queue read data in a microprocessor recovery unit
US6119223A (en) Map unit having rapid misprediction recovery
US6195745B1 (en) Pipeline throughput via parallel out-of-order execution of adds and moves in a supplemental integer execution unit
EP0868689A1 (en) A method and apparatus for executing floating point and packed data instructions using a single register file
WO1996012228A1 (en) Redundant mapping tables
EP0651331B1 (en) A write buffer for a superpipelined, superscalar microprocessor
US5740398A (en) Program order sequencing of data in a microprocessor with write buffer
US5644779A (en) Processing system and method of operation for concurrent processing of branch instructions with cancelling of processing of a branch instruction
US5615402A (en) Unified write buffer having information identifying whether the address belongs to a first write operand or a second write operand having an extra wide latch
US20050081021A1 (en) Automatic register backup/restore system and method
JP3142813B2 (en) Information processing system and method for managing register renaming
US6266763B1 (en) Physical rename register for efficiently storing floating point, integer, condition code, and multimedia values

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION