GB2456891A

GB2456891A - Updating corrupted local working registers in a multi-staged pipelined execution unit by refreshing from the last state hold a global checkpoint array

Info

Publication number: GB2456891A
Application number: GB0823186A
Authority: GB
Inventors: Guenter Gerwig; Ulrich Mayer; Frank Lehnert; Kevin Chung-Lung Shum
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-01-30
Filing date: 2008-12-19
Publication date: 2009-08-05
Anticipated expiration: 2028-12-19
Also published as: GB0823186D0; GB2456891B

Abstract

Disclosed is a method for updating corrupted local working registers in a multi-staged pipeline structure after an exception. The registers are needed to execute complex instructions in an execution unit, e.g. a floating-point unit, whose deep pipeline structure comprises a set of local working registers and in which data dependencies exist among different instructions referencing the same registers. The method operates by refreshing any corrupted local working register from the last architected state held in a global checkpoint array. The registers may also be updated using the hardware infrastructure of the execution unit when their data is corrupted by early pipeline updates. A master copy of all local working registers may be held in the checkpoint array, which is not updated in exception cases. All early loads or early register updates from instructions that were issued after an instruction that encountered the exception may be refreshed.

Description

DESCRIPTION

Method to update corrupted local working registers in a multi-staged pipelined execution unit

Technical field

The invention generally relates to microprocessor systems and how exceptions are handled in such systems.

Background of the invention

To speed up in-order execution of data-dependent instructions, especially in multi-staged pipeline structures, which tend to be deep, specific hardware assists are implemented. Some of these hardware assists can leave the local working registers corrupted in exception cases. For architectures that require precise register values in exception cases, it is important to detect corrupted registers and restore their correct values.

In order to execute complex instructions, a floating-point execution unit, as an example, needs a multi-staged pipeline structure. Depending on the instruction being executed, its result may be available at different stages within the pipeline.

A common performance problem in such deep pipelines is data dependencies among different instructions referencing the same registers. There are different solutions that overcome the delays caused by such data or register dependencies. One is to provide internal forwarding paths between different pipeline stages, so that results can be bypassed before they are written into the register copies. However, building all these bypasses can be physically prohibitive, so state-of-the-art designs often prefer the second option.

The second option is to update internal register copies relatively early in the pipeline, which is referred to as an "early load". If an update is done very early in the pipeline, succeeding instructions reading the same register can get the updated value and can therefore proceed with execution. In this way, the gap between an instruction updating a specific register and an instruction reading this register is minimized. However, the update is then done at a time when not all potential exceptions, caused either by the updating instruction itself or by older instructions still in the pipeline, have been detected. If an exception is detected in a later pipeline stage for an older instruction, the local working copy is considered corrupted, because it was updated by a younger instruction which, architecturally, cannot have updated the working registers. With these wrong values in the working registers, further instruction processing cannot proceed.

These registers have to be restored to the correct architected values.
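As an illustration of how an early load leaves a working register corrupted, the following Python sketch models a hypothetical two-phase pipeline: early loads write the working copy at issue time, while the architected state is updated only as instructions complete in order. The names `Instr`, `simulate` and `corrupted`, the 16-register file, and the rule that neither the excepting instruction nor any younger one completes are assumptions made for illustration, not details taken from this patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    dest: Optional[int]        # FPR written by the instruction, or None
    value: float = 0.0         # result value (hypothetical)
    early_load: bool = False   # writes the working copy early in the pipeline
    raises: bool = False       # exception detected in a late pipeline stage


def simulate(program: List[Instr], num_fprs: int = 16) -> Tuple[List[float], List[float]]:
    """Return (working registers, architected registers) after running program."""
    working = [0.0] * num_fprs
    architected = [0.0] * num_fprs
    # Early stage: early loads update the working copy right away so that
    # younger dependent instructions can read the new value without waiting.
    for ins in program:
        if ins.early_load and ins.dest is not None:
            working[ins.dest] = ins.value
    # Late stage: instructions complete in order. On an exception, neither the
    # excepting instruction nor any younger one may update the architected state.
    for ins in program:
        if ins.raises:
            break
        if ins.dest is not None:
            working[ins.dest] = ins.value
            architected[ins.dest] = ins.value
    return working, architected


def corrupted(working: List[float], architected: List[float]) -> List[int]:
    """FPR numbers whose working copy differs from the architected value."""
    return [i for i, (w, a) in enumerate(zip(working, architected)) if w != a]


program = [
    Instr(dest=1, value=2.0, raises=True),      # older instruction, late exception
    Instr(dest=2, value=5.0, early_load=True),  # younger early load
]
print(corrupted(*simulate(program)))  # prints [2]
```

The older instruction raises its exception only in a late stage, but the younger early load has already written FPR 2, so register 2 is the one that must be restored.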

In a microprocessor that keeps checkpointed values of its architected registers in a recovery unit (RU) for error recovery, all working registers can be refreshed to the checkpointed values in this case. However, to fall back to the last valid architected register state, all registers used in this execution unit are refreshed, whether or not they were accidentally updated, even if only a single working copy was corrupted. This mechanism significantly degrades performance.

Another possible solution to overcome a corrupted working register is to implement twice the number of registers in hardware. For example, for a deeply pipelined floating-point unit (FPU) supporting an architecture that defines 16 floating-point registers (FPRs), an additional sixteen FPRs would be added in hardware. During execution of a "load FPR" instruction, the FPU then writes only the additional, preliminary register copy early in its pipeline, not the working register itself.

If that "load FPR" instruction and all prior instructions complete without any exception, the actual working register is updated with its preliminary copy. Such an implementation avoids updating registers unnecessarily, but it needs a large amount of additional hardware.

Also the control logic used to manage register addresses for reads and writes gets more complicated.
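The prior-art doubled register file can be sketched in Python as follows. This is a simplified, hypothetical model (the class `ShadowedRegisterFile` and its methods are invented for illustration, and only one pending early result per register is tracked); it shows both the extra storage and the extra read-address control that younger readers need to see a pending preliminary value.

```python
from typing import List


class ShadowedRegisterFile:
    """Prior-art scheme: a second set of registers holds early (preliminary)
    results until the instruction and all older ones complete."""

    def __init__(self, num_fprs: int = 16) -> None:
        self.working: List[float] = [0.0] * num_fprs
        self.preliminary: List[float] = [0.0] * num_fprs  # the doubled hardware
        self.pending: List[bool] = [False] * num_fprs     # uncommitted early result

    def early_load(self, fpr: int, value: float) -> None:
        # Early in the pipeline only the preliminary copy is written.
        self.preliminary[fpr] = value
        self.pending[fpr] = True

    def read(self, fpr: int) -> float:
        # Extra read-address control: younger readers must see a pending value.
        return self.preliminary[fpr] if self.pending[fpr] else self.working[fpr]

    def complete(self, fpr: int) -> None:
        # Instruction and all older ones finished without exception: commit.
        self.working[fpr] = self.preliminary[fpr]
        self.pending[fpr] = False

    def exception(self) -> None:
        # Drop every uncommitted early result; working copies were never touched.
        self.pending = [False] * len(self.pending)


rf = ShadowedRegisterFile()
rf.early_load(3, 7.0)
print(rf.read(3))  # 7.0, forwarded from the preliminary copy
rf.exception()
print(rf.read(3))  # 0.0, the working register was never corrupted
```

No refresh is ever needed here, but every register exists twice and every read has to select between two copies.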

In summary, the state-of-the-art implementations described above have disadvantages either in terms of performance or in terms of additional hardware. A better scheme is required that allows early updates and restores corrupted registers without incurring the performance penalty of refreshing all working registers.

US patent US 7,200,742 B2 describes a mechanism for mini-refreshing corrupted registers in an out-of-order processor environment.

Object of the invention

An object of the invention is to develop a method to restore corrupted local working registers, e.g. FPRs, in a multi-staged pipeline structure, which method provides improved performance without requiring additional hardware.

Summary of the invention

The disadvantages of the state of the art are overcome by a method to restore internal register copies, i.e. to update corrupted local working registers, e.g. FPRs, in a multi-staged pipeline structure needed to execute complex instructions in an execution unit, e.g. in a Binary Floating Point Unit (FPU), whose deep pipeline structure comprises a set of local working registers and in which data dependencies exist among different instructions referencing the same registers. According to the invention, a corrupted local working register is refreshed with the last architected state held in a global checkpoint array in a highly efficient manner.

Thus the invention provides a method that overcomes the disadvantages mentioned above by refreshing at least all corrupted local working registers from the last architected state held in a global checkpoint array.

In doing so, the invention provides a solution to refresh corrupted registers in an in-order processor environment where register renaming is not performed.

According to a preferred embodiment of the invention, working registers that are corrupted in an exception case are updated more precisely by also using existing hardware infrastructure of the execution unit's processor for exception cases in which a local working register in an execution unit, e.g. an FPR in an FPU, gets corrupted by early pipeline updates. This hardware infrastructure is normally used within the processor to recover from possibly erroneous local working register states caused by single event upsets (SEU), defective hardware, or undefined/inconsistent processor states.

Preferably, a master copy of each local working register, e.g. each FPR, is provided and held in the global checkpoint array. This master copy holds the last valid architected state and, in accordance with the architecture, is not updated in the exception cases in which local working registers can be corrupted.

A particularly preferred embodiment of the invention is characterized by refreshing all early loads or early register updates currently in the pipeline that come from instructions issued after any instruction that got an exception, or that belong to an early load which itself got the exception; according to the architecture, these updates should not have happened. If an exception occurs and there is no early load in the pipeline, no refresh is done. If an exception occurs and there are one or more early loads in the pipeline that could have corrupted working registers, all pipeline addresses are refreshed until all early loads have drained out. The refresh process does not care whether each address is used for reading or writing a specific local working register.

According to another embodiment of the invention, the information needed to do the partial refresh is derived from the execution unit pipeline. The early load indication and the FPR address associated with each pipeline stage are already used internally by the execution unit, so no extra hardware needs to be added. To support the scheme of updating corrupted local working registers, only the execution unit's early load pipeline tags have to be combined by logically "OR"ing the indications together. The resulting signal indicates whether any restoration of corrupted local working registers has to be performed when an exception is seen. The process of partially restoring corrupted local working registers is also called Mini-Refresh or, in this case, FPU Mini-Refresh.
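The OR combination of the early load tags is simple enough to show directly. The Python sketch below is a hypothetical software model of that one gate: `any()` plays the role of the OR across all stages, and the function names and the 6-stage pipeline are assumptions for illustration only.

```python
from typing import Sequence


def mini_refresh_indication(early_load_tags: Sequence[bool]) -> bool:
    """Logical OR of the early load tags of all pipeline stages."""
    return any(early_load_tags)


def restore_required(exception_seen: bool, early_load_tags: Sequence[bool]) -> bool:
    """A partial restore is needed only if an exception coincides with at
    least one early load still in flight."""
    return exception_seen and mini_refresh_indication(early_load_tags)


# Hypothetical 6-stage pipeline with one early load in stage 2.
tags = [False, False, True, False, False, False]
print(restore_required(True, tags))         # True
print(restore_required(True, [False] * 6))  # False: no refresh at all
```

Because the tags already exist per stage, the only added logic in this model is the reduction itself.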

The multi-staged pipeline structure can be a floating point unit, in particular a binary floating point unit. Preferably the multi-staged pipeline structure is arranged inside a microprocessor.

A particularly preferred embodiment of said method according to the invention comprises the following steps:

-Associating an early load flag with each pipeline stage of an execution unit, e.g. an FPU, indicating that the instruction currently at that stage has made an early register update.

-Combining all early load flags, preferably with a logical "OR", into an early update indication, also called a mini-refresh indication, which signals to a recovery unit (RU) that at least one early load instruction is in the FPU pipeline and thus that an FPU mini-refresh as described above is required. Once the last early load has drained out of the FPU pipeline, the mini-refresh indication drops.

-Propagating a register write address, e.g. an FPR write address, through all FPU pipeline stages for any instruction that updates an FPR.

-Sending the mini-refresh indication, together with the FPR write address of the last pipeline stage, to the RU in every cycle.

-Dropping the early update indication when the last early load has drained out.

-Using the FPR write address to read out the appropriate checkpoint entry in the RU after an exception is detected and while the early update indication, i.e. the mini-refresh indication, is active.

-Sending the architected register state, e.g. the FPR state, stored in the global checkpoint array to a fixed-point unit (FXU) and spreading it from there to the multi-staged pipeline unit, e.g. the floating-point unit, using a common result bus (CBUS).

-Writing, and thus refreshing, the local working register copy, e.g. the FPU FPR working copy, with the data and at the address received through the CBUS interface of said multi-staged pipeline unit, i.e. from the CBUS through the functional/recovery data path.

The register write address can be the last write address or any other value if the instruction at that pipeline stage does not update any register, or if there is no instruction at that pipeline stage.
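The steps above can be sketched as a cycle-by-cycle Python model. It assumes, for illustration only, that the exception is detected at the start of the drain, that the pipeline then holds only instructions issued after the excepting one, and that no new instructions enter (bubbles do). The RU checkpoint read and the FXU/CBUS transfer are collapsed into a single assignment, and the names `Stage` and `mini_refresh` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    early_load: bool = False  # early load flag of the instruction at this stage
    write_addr: int = 0       # FPR write address; don't-care if nothing is written


def mini_refresh(pipeline: List[Stage], checkpoint: List[float],
                 working: List[float]) -> List[int]:
    """Drain the pipeline after an exception. pipeline[0] is the first stage,
    pipeline[-1] the last. Returns the FPR addresses refreshed, in order."""
    stages = list(pipeline)
    refreshed: List[int] = []
    # Mini-refresh indication: OR of all early load flags in the pipeline.
    while any(s.early_load for s in stages):
        addr = stages[-1].write_addr      # last-stage address sent to the RU
        working[addr] = checkpoint[addr]  # checkpoint entry written back via FXU/CBUS
        refreshed.append(addr)
        stages = [Stage()] + stages[:-1]  # advance one cycle; a bubble enters
    return refreshed


checkpoint = [float(i) for i in range(16)]  # last architected FPR values
working = list(checkpoint)
working[2], working[7] = 99.0, 42.0         # corrupted by the two early loads
pipeline = [Stage(), Stage(True, 7), Stage(False, 5), Stage(True, 2)]
print(mini_refresh(pipeline, checkpoint, working))  # [2, 5, 7]
print(working == checkpoint)                        # True
```

FPR 5 belongs to a non-early-load instruction and is refreshed anyway because the indication is still active; in this model that is harmless, since it only rewrites the architected value.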

Advantages of the invention over the state of the art are that the proposed solution allows more precise updating of the FPRs that get corrupted, with only a small amount of additional hardware. This means that the invention combines the advantages of the state-of-the-art implementations described in the introduction without having their disadvantages. The hardware infrastructure used for the hardware error recovery scheme is also used for refreshing corrupted FPRs, and thus no extra hardware is required. Furthermore, the control logic needed to manage the particular FPR refreshes is not complicated, since all necessary information is associated with each FPU pipeline stage and only needs to be propagated through the pipeline without being modified. The only additional logic needed for this partial refresh scheme is some minor control logic that resides in the FPU and the RU.

Although this invention is described for an FPU, it can be extended to any execution unit with a multi-staged pipeline that benefits from having early results written into its local registers.

The foregoing, together with other objects, features, and advantages of this invention can be better appreciated with reference to the following specification, claims and drawings.

Brief description of the drawings

Fig. 1 shows a scheme of the FPU mini-refresh timing according to the invention.

Fig. 2 shows a scheme of an arrangement comprising a floating-point unit performing a method according to the invention.

Detailed description of the drawings

Compared to the state-of-the-art implementation described first, in which all sixteen FPRs are updated regardless of which working registers are corrupted, the invention updates the corrupted registers more precisely and without additional hardware. The key to refreshing a corrupted FPR copy is a master copy of each FPR that holds the last valid architected state and that is not updated in exception cases as defined by the architecture. In addition, a write path is needed to update the master copy with the latest value when an instruction has completed successfully. And in case a working copy gets corrupted, a read access path is needed to fetch the latest valid state and write it back to the appropriate working copy.

One aspect of the invention is to use the existing hardware infrastructure that recovers from potentially erroneous processor states, including FPU registers, caused by SEU, defective hardware, or undefined/inconsistent processor states, also for cases where an FPR gets corrupted by early loads. This existing hardware infrastructure has all the necessary building blocks and fits the concept of mini-refreshing FPRs perfectly.

The most important point is to find out, at the time an exception is encountered, whether an FPR has been corrupted by an early load. The key here is to improve performance significantly by refreshing corrupted registers only when needed. This is achieved by refreshing all early loads currently in the pipeline that were issued after the instruction that got the exception, or whose own execution caused the exception.

This means that if an exception occurs and there is no early load in the pipeline, no refresh is done. If an exception occurs and there are one or more early loads in the pipeline, all pipeline FPR addresses are refreshed until all early loads have drained out. For example, if only one early load has been performed, only one refresh is needed. As another example, if an early load instruction is immediately followed by a non-early-load instruction, which in turn is immediately followed by another early load instruction, three refreshes are done. For simplicity, the refresh process does not care whether the FPR address is used for reading or writing a specific FPR.
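The master copy with its write path and read access path can be modelled in a few lines of Python. This is a hypothetical sketch (the class `CheckpointArray` and the helper `refresh` are invented names): `commit` is the write path used only on successful completion, and `refresh` uses the read path to reflect the last valid state back into one working register.

```python
from typing import List


class CheckpointArray:
    """Master copy of each FPR holding the last valid architected state."""

    def __init__(self, num_fprs: int = 16) -> None:
        self.master: List[float] = [0.0] * num_fprs

    def commit(self, fpr: int, value: float) -> None:
        """Write path: used only when an instruction completes successfully."""
        self.master[fpr] = value

    def read(self, fpr: int) -> float:
        """Read path: supplies the last valid state for a corrupted copy."""
        return self.master[fpr]


def refresh(working: List[float], checkpoint: CheckpointArray, fpr: int) -> None:
    """Reflect the master copy back into one working register."""
    working[fpr] = checkpoint.read(fpr)


cp = CheckpointArray()
working = [0.0] * 16
working[4] = 1.5      # completed instruction updates the working copy ...
cp.commit(4, 1.5)     # ... and the master copy
working[4] = 8.0      # early load whose instruction never completes
refresh(working, cp, 4)
print(working[4])     # 1.5
```

Since the exception case never calls `commit`, the master copy cannot be corrupted by early loads in this model.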
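The refresh counts in these examples can be checked with a small Python function. It follows the wording of claim 4 (every pipeline entry from the first to the last early load is refreshed) and assumes the flags are listed oldest first, i.e. in draining order; the name `refresh_count` is hypothetical. For comparison, the full-refresh prior art always rewrites all sixteen FPRs.

```python
from typing import Sequence


def refresh_count(early_load_flags: Sequence[bool]) -> int:
    """Refreshes needed after an exception. Flags are listed oldest first,
    i.e. in the order the instructions drain out of the pipeline."""
    positions = [i for i, flag in enumerate(early_load_flags) if flag]
    if not positions:
        return 0  # no early load in flight: no refresh at all
    # Every entry from the first to the last early load is refreshed.
    return positions[-1] - positions[0] + 1


FULL_REFRESH = 16  # prior art: all architected FPRs, regardless of corruption

print(refresh_count([]))                   # 0
print(refresh_count([True]))               # 1
print(refresh_count([True, False, True]))  # 3
```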

All information needed to do the partial refresh can be easily derived from the Binary FPU pipeline. The early load indication and the FPR address are already associated to each pipeline stage in the FPU internally and thus no extra hardware must be added. In order to support the new mini-refresh scheme only the FPU early load pipeline tags have to be combined by logically "OR"ing the indications together. The resulting signal indicates whether a mini-refresh has to be performed when an exception is seen. The process of partially updating corrupted FPR registers is also named FPU Mini-Refresh.

The FPU mini-refresh comprises the following steps, as shown in Fig. 1 and Fig. 2:

-A flag 04 is associated with each FPU pipeline stage, indicating that the instruction currently at that stage has made an early register update. All early load flags are ORed together 05 to signal to the RU that an FPU mini-refresh may be required.

-An FPR write address is propagated through all pipeline stages 03. If an instruction does not write an FPR, or if there is no instruction at a stage, the write address value does not matter; in some implementations it most likely keeps its previous value.

-The mini-refresh indication 06, together with the FPR write address 07, is sent to the RU 13 in every cycle.

-In case of an exception, every FPR pointed to by the FPR address sent by the FPU is refreshed as long as the mini-refresh indication is active. When the last early load has drained out of the FPU pipeline, the mini-refresh indication 06 drops.

-The FPR addresses sent by the FPU are used to read out the appropriate checkpointed entry from the checkpointed array of FPRs 10 via the refresh control 09.

-Each architected FPR state 11 needed for the refresh, together with its FPR address, is sent to the FXU 02 and from there spread out to the FPU 01 using the CBUS interfaces 12.

-The FPU 01 refreshes the FPR working copy 08 with the data and FPR address from the CBUS interface 12 using the functional/recovery data path 14.
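The data path of these steps can be sketched in Python, with one class per unit and the reference numerals of Fig. 2 kept in comments. All class and method names are hypothetical, and the real hardware performs these transfers as pipelined bus cycles rather than function calls; the sketch only shows the order RU, FXU, CBUS, FPU.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CbusTransfer:
    fpr_addr: int  # FPR address travelling with the data
    data: float    # architected FPR state 11


class RecoveryUnit:  # RU 13
    def __init__(self, checkpointed_fprs: List[float]) -> None:
        self.checkpointed_fprs = checkpointed_fprs  # checkpointed array 10

    def refresh_control(self, fpr_addr: int) -> CbusTransfer:  # refresh control 09
        return CbusTransfer(fpr_addr, self.checkpointed_fprs[fpr_addr])


class FixedPointUnit:  # FXU 02
    def __init__(self, cbus_listeners: List[Callable[[CbusTransfer], None]]) -> None:
        self.cbus_listeners = cbus_listeners

    def forward(self, transfer: CbusTransfer) -> None:
        for listener in self.cbus_listeners:  # CBUS interfaces 12
            listener(transfer)


class FloatingPointUnit:  # FPU 01
    def __init__(self, num_fprs: int = 16) -> None:
        self.fpr_working_copy = [0.0] * num_fprs  # FPR working copy 08

    def on_cbus(self, transfer: CbusTransfer) -> None:  # functional/recovery path 14
        self.fpr_working_copy[transfer.fpr_addr] = transfer.data


ru = RecoveryUnit([float(i) for i in range(16)])
fpu = FloatingPointUnit()
fpu.fpr_working_copy[6] = 99.0      # corrupted by an early load
fxu = FixedPointUnit([fpu.on_cbus])
fxu.forward(ru.refresh_control(6))  # address 6 was reported by the FPU
print(fpu.fpr_working_copy[6])      # 6.0
```

Reusing the existing recovery path means the FPU needs no new write port in this model: the refresh arrives like any other CBUS result.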

Advantages of the invention over the state of the art are that the proposed solution allows more precise updating of the FPRs that get corrupted, with only a small amount of additional hardware. This means that the invention combines the advantages of the state-of-the-art implementations described in the introduction without having their disadvantages. The hardware infrastructure used for the hardware error recovery scheme is also used for refreshing corrupted FPRs, and thus no extra hardware is required. Furthermore, the control logic needed to manage the particular FPR refreshes is less complicated, since all necessary information is already associated with each FPU pipeline stage and only needs to be propagated through the pipeline without being modified. The only additional logic needed for this partial refresh scheme is some minor control logic that resides in the FPU and the RU.

The execution unit can be a floating-point unit like e.g. a binary FPU, or it can be any execution unit that employs a multi-staged pipeline that will benefit from early pipeline updates.

The invention can preferably be utilized in combination with a microprocessor known from Siegel et al.: "The IBM eServer z990 microprocessor"; IBM J. Res. & Dev. Vol. 48 No. 3/4, May/July 2004.

A particularly preferred utilization of the invention can be in combination with a floating-point unit known from Gerwig et al.: "The IBM eServer z990 floating-point unit"; IBM J. Res. & Dev. Vol. 48 No. 3/4, May/July 2004.

While the present invention has been described in detail, in conjunction with specific preferred embodiments, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art in light of the foregoing description. It is therefore contemplated that the appended claims will embrace any such alternatives, modifications and variations as falling within the true scope and spirit of the present invention.

Claims


1. Method to update corrupted local working registers in a multi-staged pipeline structure needed to execute complex instructions in an execution unit, whose deep pipeline structure comprises a set of local working registers, wherein data dependencies among different instructions referencing the same registers exist, characterized by refreshing any corrupted local working register from the last architected state held in a global checkpoint array.

2. Method according to claim 1, characterized in that working registers that are corrupted during an exception case are more precisely updated by also using an existing hardware infrastructure of the execution unit for cases where a local working register gets corrupted by early pipeline updates, which hardware infrastructure within the processor usually is used to recover from possibly erroneous local working register states caused by single event upset or defective hardware or undefined/inconsistent processor states.

3. Method according to claim 1 or 2, characterized in that a master copy of each local working register is held in the global checkpoint array, which master copy holds the last valid architected state and, in accordance with the architecture, is not updated in the exception cases in which local working registers can be corrupted.

4. Method according to one of the previous claims, characterized by refreshing all early loads or early register updates currently in the pipeline from instructions that are issued after any instruction that got an exception, or if the exception belongs to an early load itself, wherein if an exception occurs and there is no early load in the pipeline, no refresh is done, and wherein if an exception occurs and there are one or more early loads in the pipeline that could have corrupted working registers, all pipeline addresses between the first and last early loads are refreshed until all early loads have drained out.

5. Method according to one of the previous claims, characterized in that the information needed to do the partial refresh is derived from the execution unit pipeline.

6. Method according to claim 5, characterized in that the multi-staged pipeline structure can be a floating point unit, in particular a binary floating point unit.

7. Method according to claim 5, characterized in that the multi-staged pipeline structure is arranged inside a microprocessor.

8. Method according to one of the previous claims, characterized by the steps of: -associating an early load flag with each pipeline stage indicating that the current instruction at that stage has made an early register update; -combining all early load flags to signal an early update indication to a recovery unit (RU) that at least one update of a corrupted local working register is required; -propagating a register write address through all pipeline stages; -sending the register write address from the last pipeline stage together with the early update indication to the RU in every cycle; -dropping the early update indication when the last early load has drained out; -using the register write address to read out the appropriate checkpoint entry in the RU after an exception is detected and while the early update indication is active; -sending the last architected register state stored in the global checkpoint array to a fixed-point unit (FXU) and spreading it from there to the multi-staged pipeline unit using a common result bus (CBUS); -writing, and thus refreshing, the local working register copy with the data and at the address received through the CBUS interface of said multi-staged pipeline unit, through the functional/recovery data path.

9. Method according to claim 8, characterized in that register write addresses can be the last write address or any value, if the instruction at that pipeline stage is not updating any registers, or if there is no instruction at that pipeline stage.