WO2006115474A1

WO2006115474A1 - Error recovery within processing stages of an integrated circuit

Info

Publication number: WO2006115474A1
Application number: PCT/US2005/013555
Authority: WO
Inventors: David Theodore Blaauw; David Michael Bull; Shidhartha Das
Original assignee: Arm Limited; University Of Michigan
Priority date: 2005-04-21
Filing date: 2005-04-21
Publication date: 2006-11-02
Also published as: JP2008537438A; CN101203836A; GB2439019A; JP4722994B2; GB2439019B; GB0719031D0; CN100565465C

Abstract

An integrated circuit comprises an error detection circuit 3230-1 to 3230-4 operable to detect a transition in the signal value in a predetermined time window, which is indicative of an error in operation of the integrated circuit. The integrated circuit also comprises a storage unit 3296 operable to store a recoverable state of the data processing apparatus comprising at least a subset of architectural state variables corresponding to a programmer's model of the integrated circuit. An error recovery circuit 3250, 3260, 3210 is provided as part of the integrated circuit and serves to enable the integrated circuit to recover from detected errors in operation using the stored recoverable state from the storage unit 3296. An operational parameter controller 3242 of the integrated circuit adjusts the operating parameters of the integrated circuit, such as the clock frequency, the operating voltage, the body bias voltage or the temperature, in dependence upon one or more characteristics of detected errors in operation so as to maintain a finite non-zero error rate in a manner that increases overall performance.

Description

ERROR RECOVERY WITHIN PROCESSING STAGES OF AN INTEGRATED CIRCUIT

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to the field of integrated circuits. More particularly, this invention relates to the detection of operational errors within the processing stages of an integrated circuit and recovery from such errors.

Description of the Prior Art

It is known to provide integrated circuits formed of serially connected processing stages, for example a pipelined circuit. Each processing stage comprises processing logic and a latch for storing an output value from one stage which is subsequently supplied as input to the succeeding processing stage. The time taken for the processing logic to complete its processing operation determines the speed at which the integrated circuit may operate. The fastest rate at which the processing logic can operate is constrained by the slowest of the processing logic stages. In order to process data as rapidly as possible, the processing stages of the circuit will be driven at as rapid a rate as possible until the slowest of the processing stages is unable to keep pace. However, in situations where the power consumption of the integrated circuit is more important than increasing the processing rate, the operating voltage of the integrated circuit will be reduced so as to reduce power consumption to the point at which the slowest processing stage is no longer able to keep pace. Both the situation where the voltage level is reduced to the point at which the slowest processing stage can no longer keep pace and the situation where the operating frequency is increased to the point at which the slowest processing stage can no longer perform its processing will give rise to the occurrence of processing errors that will adversely affect the forward-progress of the computation.

It is known to avoid the occurrence of such processing errors by setting an integrated circuit to operate at a voltage level which is sufficiently above a minimum voltage level and at a processing frequency that is sufficiently less than the maximum desirable processing frequency taking into account properties of the integrated circuits including manufacturing variation between different integrated circuits within a batch, operating environment conditions, such as typical temperature ranges, data dependencies of signals being processed and the like. This conventional approach is cautious in restricting the maximum operating frequency and the minimum operating voltage to take account of the worst case situations.

US Patent Application Publication No. US2004-0199821 discloses a system in which an integrated circuit is arranged to operate so as to maintain a non-zero rate of errors in operation by dynamically controlling at least one performance controlling parameter, such as frequency, operating voltage, or temperature. This system enables forward progress of the computation, despite the presence of timing errors, by the use of a delayed latch that captures data at a point later in time than the main latch of the associated processing stage of the integrated circuit. The data value captured by the delayed latch is used in the event of detection of an error to replace the value captured by the main latch at a point in time before the output of the processing stage was stable. By deliberately operating the integrated circuit at a non-zero error rate, an individual integrated circuit can be tuned to obtain the fastest possible processing speed or the lowest possible energy consumption as required by the particular processing application. However, the requirement to modify the processing circuit by providing a delayed latch for each main latch of the processing stages can in certain circumstances be inflexible. For example, if operational errors are not restricted to the datapath of the central processing unit (CPU), but also occur in the control logic itself or in other critical paths of the integrated circuits then a considerable number of delay latches would have to be added to the integrated circuit to implement the error detection and recovery. Furthermore, in embodiments of US-2004-0199821 that use existing pipeline sequencing logic to implement error recovery by reading data values from the delayed latches it may be difficult to ensure that the pipeline sequencing logic itself is not affected by errors in operation, either directly due to a critical path in the control logic itself or indirectly by feeding back a metastable value from the datapath into the control logic. Thus, there is a need for a technique that enables improved performance to be derived from an integrated circuit yet does not require extensive modifications to existing integrated circuit design to accommodate error recovery operations.

SUMMARY OF THE INVENTION

Viewed from one aspect the present invention provides an integrated circuit for a data processing apparatus, said integrated circuit being operable to perform digital data processing and comprising: an error detection circuit operable to monitor a digital signal value within said integrated circuit and to detect a transition in said signal value in a predetermined time window, said transition being indicative of an error in operation of said integrated circuit; a storage unit operable to store a recoverable state of said data processing apparatus, said recoverable state comprising at least a subset of architectural state variables corresponding to a programmer's model of said integrated circuit; an error-recovery circuit responsive to said error detection circuit and operable to enable said integrated circuit to recover from said error in operation using said stored recoverable state; an operational parameter controller operable to control one or more performance controlling operational parameters of said integrated circuit; wherein said operational parameter controller dynamically controls at least one of said one or more performance controlling parameters in dependence upon one or more characteristics of errors detected by said error detection circuit to maintain a nonzero rate of errors in operation, said error-recovery circuit being operable to enable the integrated circuit to recover from said errors in operation such that data processing by said integrated circuit continues.

The present technique recognises that the operation of processing stages can be directly monitored to find the limiting conditions in which they fail. When actual failures occur, error-recovery can be performed by restoring the integrated circuit to a previous recoverable state of operation from which processing can be safely resumed. This technique recognises that error detection can be performed without the requirement to capture a delayed value from each processing stage or the requirement to reload the correct values into the processing logic in the event of an error in operation. The present technique enables integrated circuits to be relatively easily modified so that the error detection and recovery can be applied to any critical path within the integrated circuit including both CPU data paths and control logic.

The recoverable state stored by the storage unit (which may be multiple storage elements dispersed throughout the integrated circuit) could comprise at least a subset of architectural state variables corresponding to the programmer's model, such as register values, flag values and processing modes. However, in one embodiment the recoverable state comprises at least a subset of micro-architectural state variables that are not part of the programmer's model such as, for example, information on variables stored in cache. This arrangement provides flexibility in the error recovery capability of the integrated circuit since different errors in operation will require different subsets of recoverable state in order to return the integrated circuit to a state from which forward-progress of the computation can be reliably performed. It will be appreciated that some errors in operation will have effects that propagate to more state variables and different types of state variables than other errors in operation.

It will be appreciated that the error detection circuit could detect the error in operation in a number of different ways. However, in one embodiment the error detection circuit is arranged to detect a transition in a data value by calculating a difference between an input signal value at a first sampling time and the same signal at a second, subsequent sampling time. Thus, any difference in the signal value within a time period when no difference in output is expected if the circuit is operating reliably enables straightforward detection of an error. In another embodiment the error detection circuit is arranged to detect a transition in the data signal by detecting any change of state in the signal value within a predetermined time window. This contrasts with the embodiment that involves two distinct sampling points by detecting a glitch in the signal value between two sampling points that would not otherwise be detected. Thus the detection of the transition in the signal value is effectively continuous rather than discrete.
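The difference between the two detection schemes described above can be illustrated informally in software. The following Python sketch is not part of the described circuitry: it assumes, purely for illustration, that a signal is represented as a list of logic values taken at fine time steps, and the function names and index arguments are hypothetical.

```python
from typing import Sequence


def sampled_difference_error(signal: Sequence[int], first: int, second: int) -> bool:
    """Two-point scheme: compare the value at two discrete sampling times."""
    return signal[first] != signal[second]


def window_transition_error(signal: Sequence[int], start: int, end: int) -> bool:
    """Window scheme: flag any change of state between start and end (inclusive)."""
    window = signal[start:end + 1]
    return any(a != b for a, b in zip(window, window[1:]))


# A glitch: the signal leaves 0 and returns to 0 inside the window.
glitch = [0, 0, 1, 1, 0, 0]
# A late transition: the signal settles at a new value inside the window.
late = [0, 0, 0, 1, 1, 1]

glitch_seen_by_sampling = sampled_difference_error(glitch, 0, 5)
glitch_seen_by_window = window_transition_error(glitch, 0, 5)
late_seen_by_sampling = sampled_difference_error(late, 0, 5)
late_seen_by_window = window_transition_error(late, 0, 5)
```

In this toy model the late transition is reported by both schemes, whereas the glitch is reported only by the window scheme, since the signal has the same value at both sampling points.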

In one embodiment, the error detection circuit is operable to detect an error in an output signal of an associated processing circuit element of the integrated circuit. This enables effective correlation between the processing stage and the occurrence of an error. In alternative arrangements a detection circuit may be shared between a number of processing stages.

In one embodiment the integrated circuit has an error detection circuit having a metastability window that is mutually exclusive with a setup window of the associated processing circuit element (e.g. main flip-flop). This enables detection of an error in operation even when the input data transitions in the setup window of the main flip-flop. Arranging the metastability window of the error detection circuit such that it is non-overlapping with the setup window of the main latch associated with the processing stage obviates the need to provide a power-hungry metastability detection circuit and enables sensing of transitions in the data signal both during the setup window of the main latch of the processing stage and during the hold window of the clock signal that is the positive phase of the clock signal.

It will be appreciated that the integrated circuit could be a non-pipelined integrated circuit, but in one embodiment the integrated circuit is a pipelined integrated circuit comprising a plurality of serially connected processing stages.

Although the particular processing circuit element with which an error detection circuit is associated could be any circuit element capable of storing the processing value, for example a latching sense-amp, in one embodiment the processing circuit element is a latch for passing data between consecutive ones of a plurality of pipeline stages. A latch is a simple circuit element and association of an error detection circuit with a latch provides for efficient error-detection that is easy to implement. In one embodiment the error detection circuit comprises at least one error delay element arranged to delay an input digital signal to enable detection of a transition occurring during a set-up time of the processing circuit element. This avoids the possibility of an error in operation being missed when a data transition occurs during the set-up time of the main processing circuit element, since in such a case the logic state of that processing element would otherwise be unresolved. Delaying the digital signal has the effect of aligning the data transition for the input to the error detection circuit such that the sampling window of the error detection circuit overlaps the setup window of the main processing element causing signal transitions in the setup window of the main processing element to be reliably detected as errors in the error detection circuit.

It will be appreciated that the error detection circuit could take many different forms but in one embodiment the error detection circuit comprises at least one of a zero-to-one transition detector and a one-to-zero transition detector. These transition detectors could be distinct detectors or could be a single circuit operable to detect transitions of both orientations.

Although the integrated circuit could recover from errors in operation by flushing the pipeline of erroneous values and restoring a previous state directly from the reusable state store, in one embodiment the error recovery circuit comprises at least one stability pipeline stage operable to enable a verification of output values of the plurality of pipeline stages in the pipelined integrated circuit prior to commitment of those output values as stored state variables of the integrated circuit. The stability pipeline stages allow sufficient time to determine whether an error has occurred in the production of output values of the pipeline stages and this reduces the likelihood that committed state variables will be corrupted.

Although inclusion of at least one stability pipeline stage in the error recovery circuit may involve delay in committal of calculated pipeline values, in one embodiment the integrated circuit comprises data forwarding circuitry operable to supply a value calculated by a particular one of the plurality of pipeline stages directly from the particular pipeline stage to another different one of the plurality of pipeline stages for use as an input value. This reduces the impact of read-after-write hazards that could potentially arise from provision of the extra stability pipeline stages. The forwarding circuitry enables the value calculated by a previous processing stage to be supplied to a subsequent processing stage currently in the pipeline before that value has been committed to a register. This prevents the subsequent processing stage from using an incorrect input value.

It will be appreciated that the storage unit could be any type of memory, such as stack memory, but in one embodiment the storage unit includes a register bank.

Although the register bank could be operable to store state variables before those state variables have been confirmed as being free of errors, in one embodiment the register bank is operable to store only confirmed state variables, the confirmed state variables having been confirmed to be free of timing violations. Thus, the state variables stored in the register bank are reliable state variables and can be used to recover from a subsequent detected error in operation of the integrated circuit.

It will be appreciated that the integrated circuit could comprise a single storage unit comprising a single register bank. However, in one embodiment the integrated circuit comprises a speculative register bank operable to store speculative state variables whose values have not been confirmed as being free of timing violations in addition to a confirmed register bank operable to store confirmed state variables whose values have been confirmed as being correct (stable) values. This enables a portion of the error recovery to be performed in parallel with the main processing. Thus, values in the speculative register bank are corrected using values from the confirmed register bank only in the event of the detection of an error in operation of the integrated circuit. At any one time the speculative register bank stores state variables for more advanced processing stages than the currently stored state variables in the confirmed register bank. In the event of an error, the error recovery circuit is operable to replace a subset of the speculative state variables in the speculative register bank by corresponding ones of the confirmed state variables from the confirmed register bank so that the processing can return to a previous stage at which the detected error in operation has not yet affected any of the state variables. This ensures forward-progress of the computation despite the occurrence of a processing error.
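The relationship between the speculative register bank and the confirmed register bank can be sketched as a simple software model. The Python below is an illustrative assumption only: the class name, the method names and the use of dictionaries to stand in for register banks are hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class RegisterBanks:
    """Toy model: two dictionaries stand in for the two register banks."""
    speculative: Dict[str, int] = field(default_factory=dict)
    confirmed: Dict[str, int] = field(default_factory=dict)

    def write_speculative(self, reg: str, value: int) -> None:
        # Written at writeback, before timing has been verified.
        self.speculative[reg] = value

    def confirm(self, reg: str, value: int) -> None:
        # Written only once the value is known to be free of timing violations.
        self.confirmed[reg] = value

    def recover(self, regs: Iterable[str]) -> None:
        # On an error, overwrite the chosen subset of speculative state
        # with the corresponding confirmed state.
        for reg in regs:
            self.speculative[reg] = self.confirmed[reg]


banks = RegisterBanks()
banks.write_speculative("R0", 5)
banks.confirm("R0", 5)              # R0 = 5 has been verified
banks.write_speculative("R1", 7)
banks.confirm("R1", 7)              # R1 = 7 has been verified
banks.write_speculative("R0", 99)   # later update, not yet verified
banks.recover(["R0"])               # error detected: roll R0 back
```

After the call to `recover`, the speculative copy of R0 again holds the confirmed value 5, while R1 is left untouched because it was not part of the subset being restored.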

It will be appreciated that the operational parameter controller could be operable to adjust the performance controlling parameters in response to detection of an error in operation of the integrated circuit. The parameter adjustment could be performed immediately in response to detection of an error. For example the operating frequency could be reduced or the operating voltage increased to ensure that the likelihood of errors in operation is decreased. These adjustments could be performed at least temporarily. However, in one embodiment of the invention the response of adjusting the operational parameters by the operational parameter controller is damped so that there is a time delay following detection of at least one error in operation before the adjustment of one or more of the performance controlling parameters. This allows the integrated circuit to assess the likelihood of the increased error rate persisting since such an increase may not be systematic and could be dealt with without adjustment of the operational parameters by simply re-executing the relevant sequence of processing operations. However, temporary adjustment of one or more operational parameters may be performed to prevent deadlock.
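One possible way of picturing a damped response is a controller that observes errors over a fixed number of cycles before deciding whether to adjust the operating voltage. The Python sketch below is a hypothetical model only: the class name, the observation window, the error threshold and the voltage step are all assumptions chosen for illustration, and the described embodiments do not prescribe this particular policy.

```python
from collections import deque


class DampedVoltageController:
    """Adjusts the voltage only after a full observation window has elapsed."""

    def __init__(self, voltage_mv: int, window: int, threshold: int, step_mv: int):
        self.voltage_mv = voltage_mv
        self.threshold = threshold
        self.step_mv = step_mv
        self.history = deque(maxlen=window)  # True = error seen in that cycle

    def observe(self, error: bool) -> int:
        self.history.append(error)
        if len(self.history) < self.history.maxlen:
            return self.voltage_mv           # damping: keep observing
        errors = sum(self.history)
        if errors > self.threshold:
            self.voltage_mv += self.step_mv  # persistent errors: raise voltage
        elif errors == 0:
            self.voltage_mv -= self.step_mv  # no errors at all: lower voltage
        self.history.clear()                 # start a fresh observation window
        return self.voltage_mv


ctrl = DampedVoltageController(voltage_mv=1000, window=4, threshold=1, step_mv=10)
isolated = [ctrl.observe(e) for e in (True, False, False, False)]
persistent = [ctrl.observe(e) for e in (True, True, False, True)]
quiet = [ctrl.observe(e) for e in (False, False, False, False)]
```

In this model an isolated error leaves the voltage unchanged, on the assumption that re-execution alone deals with it, whereas a persistent run of errors raises the voltage at the end of the window and an error-free window lowers it again, so that the error rate stays non-zero.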

According to a second aspect the present invention provides a method of controlling an integrated circuit for performing data processing, said method comprising the steps of: monitoring a digital signal value within said integrated circuit and detecting a transition in said signal value in a predetermined time window, said transition being indicative of an error in operation of said integrated circuit; storing a recoverable state of said data processing apparatus, said recoverable state comprising at least a subset of architectural state variables corresponding to a programmer's model of said integrated circuit; using said stored recoverable state, in response to said detection of said error in operation, to enable said integrated circuit to recover from said error in operation; controlling one or more performance controlling operational parameters of said integrated circuit; wherein said step of controlling comprises dynamically controlling at least one of said one or more performance controlling parameters in dependence upon one or more characteristics of errors detected in said monitoring and detecting step to maintain a non-zero rate of errors in operation, use of said stored recoverable state in response to said detection of said error in operation enabling said integrated circuit to recover from said errors in operation such that data processing by said integrated circuit continues.

The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 schematically illustrates one example of a plurality of processing stages of an integrated circuit to which the present technique is applied;

Figure 2 schematically illustrates a pipeline in which error recovery is performed using a confirmed register bank together with a speculative register bank;

Figure 3A schematically illustrates a pipeline arrangement in which error recovery is performed using state variables stored in a single register bank;

Figure 3B is a flow chart schematically illustrating how the circuit of Figure 3A recovers from a detected error;

Figure 3C is a flow chart that schematically illustrates an operational parameter tuning process;

Figure 4 schematically illustrates a transition detection D flip-flop according to the present technique;

Figure 5 schematically illustrates a functional timing diagram that illustrates how a transition of data in a set-up window of the main flip-flop of Figure 4 is detected;

Figures 6A to 6G schematically illustrate functional timing diagrams for signals passing through the circuit of Figure 4 when detection of a transition from logic level one to logic level zero is performed;

Figures 7A to 7G schematically illustrate a functional timing diagram for the signals in the circuit of Figure 4 when detecting a data transition from the logic level zero to the logic level one;

Figures 8A and 8B schematically illustrate how the metastability windows of the main flip-flop and the transition detector of Figure 4 are non-overlapping; and

Figure 9 schematically illustrates error synchronisation of error signals derived from transition detectors.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 1 schematically illustrates part of an integrated circuit, which may be part of a synchronous pipeline within a processor core, such as an ARM processor core designed by ARM Limited of Cambridge, England. A synchronous pipeline is formed of a plurality of processing stages. The first stage comprises logic module 3010 followed by a latch 3020 in the form of a flip-flop. The output of the logic module 3010 is supplied to a transition detector 3030, which is operable to detect a transition in the logic signal value, which occurs in a predetermined time window and is indicative of an error in operation of the integrated circuit. Such errors in operation are likely to arise if the operating parameters for the integrated circuit are such that the logic module 3010 has not completed its processing operation by the time the flip-flop 3020 captures its value.

The operating parameters of the integrated circuit include the clock-signal frequency supplied by a clock 3031, an operating voltage supplied to the integrated circuit, the body bias voltage, the temperature etc. In particular, if the clock frequency is set to be so rapid that the slowest of the processing data stages is unable to keep pace, or if the operating voltage of the integrated circuit is reduced so as to reduce power consumption to the point at which the slowest of the processing stages is no longer able to keep pace, then systematic processing errors will occur. Subsequent processing stages of the integrated circuit are similarly formed of a logic module that leads into a transition detector and a flip-flop that captures the output value of the associated logic module.

In Figure 1 three stages of processing are illustrated and there are three corresponding transition detectors 3030, 3032 and 3034. The outputs of these transition detectors are each supplied to an OR gate 3040. A high output from the OR gate 3040 indicates that a processing error has occurred in at least one of the associated logic modules. This indication of an error is supplied as an output of the OR gate 3040 and as an input to an error recovery logic module 3050, which is responsive to each of the transition detectors and is operable to enable the integrated circuit to recover from an error in operation. Recovery from an error in operation is achieved by the error recovery logic 3050 by using stored state information 3060. The stored state information 3060 allows the integrated circuit to recover from the error in operation by enabling a return to a previous state of processing from which to re-commence the calculation. The state information may include both architectural state variables and micro-architectural state variables.

Architectural state variables correspond to those variables that would be specified in a programmer's model of the integrated circuit, for example register values, instruction flags, program counter values etc. An example of micro-architectural state variables is cache content. For example, for an ADD instruction with a flag set, execution of the instruction ADDS R0, R0, R1 would involve storage of state variable R0, the flags associated with the flag set operation and the program counter value associated with this instruction. Other examples of state variables are the particular operational mode of the processor, such as privileged mode or user mode. The error recovery logic 3050 enables forward progress of the computation in the presence of errors in operation of the integrated circuit. This is achieved by detection of timing errors by the transition detectors 3030, 3032, 3034 and the use of the error recovery logic 3050 to recover from the detected error using the stored state information 3060. The stored state information 3060 used for error recovery will be the values that have been confirmed to be unaffected by errors in operation and most recently stored to registers. Such stored values correspond to the architectural state of the integrated circuit prior to the detection of an error in operation.
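The way in which the individual transition detectors of Figure 1 are combined into a single error indication can be pictured with a small software model. The Python below is a deliberately simplified sketch: it assumes that a detector fires whenever the logic delay of its stage exceeds the clock period, and the function names and delay figures are hypothetical values chosen only for illustration.

```python
from typing import List


def transition_detectors(stage_delays_ns: List[float], clock_period_ns: float) -> List[bool]:
    """One flag per stage: True if the stage's logic settles after the capture edge."""
    return [delay > clock_period_ns for delay in stage_delays_ns]


def error_signal(flags: List[bool]) -> bool:
    """Models the OR gate: high if any detector has fired."""
    return any(flags)


delays = [2.1, 2.8, 2.4]   # hypothetical logic delays of three stages
safe_flags = transition_detectors(delays, clock_period_ns=3.0)
fast_flags = transition_detectors(delays, clock_period_ns=2.5)
safe_error = error_signal(safe_flags)
fast_error = error_signal(fast_flags)
```

With the longer clock period no detector fires. Shortening the period to 2.5 ns causes only the slowest stage to be flagged, and the OR of the flags then indicates an error to the recovery logic.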

Figure 2 schematically illustrates an arrangement according to one example of the present technique that uses a confirmed register bank in addition to the speculative register bank to recover from an error in operation. The arrangement comprises: a main processing pipeline 3100; a speculative register bank 3110; a plurality of stability pipeline stages 3120; a critical state buffer 3122; a confirmed state buffer 3124; a confirmed register bank 3130; an array of transition detectors 3142-1 to 3142-4; an OR logic gate 3150; error detection logic 3160; pipeline flush logic 3170; confirmed state recovery logic 3180; and program counter reset logic 3190. The main processing pipeline 3100 comprises four distinct pipeline stages, a first execution stage n, a second execution stage n-1, a third execution stage n-2 and a writeback stage n-3. Outputs from a processing pipeline stage are passed to the subsequent pipeline stage via a latch (such as a flip-flop 3020 of Figure 1). The output of the writeback pipeline stage n-3 is supplied to the speculative register bank 3110 via the signal paths 3101 and 3103, which lead respectively to the two write ports SW0 and SW1 of the speculative register bank 3110. In the particular arrangement illustrated in Figure 2 the writeback stage of the main pipeline corresponds to processing stage n-3 and thus the last state that has been stored in the speculative register bank 3110 in this arrangement corresponds to the processing stage n-4.

Output from the first execution stage n is output to the transition detector 3142-1; output from the second execution stage n-1 is output to the transition detector 3142-2; output from the third execution stage of the main pipeline n-2 is output to the transition detector 3142-3; and finally output from the writeback stage WB of the main pipeline 3100 is output to the transition detector 3142-4. Each of these transition detectors 3142-1 to 3142-4 is capable of indicating an error in operation of the processing circuitry. The outputs of all four transition detectors are supplied as inputs to the OR logic gate 3150, whose output is supplied to the error detection logic 3160. Thus if any transition is detected in any one of the four main pipeline stages n, n-1, n-2 or n-3 then the OR logic gate will output a value indicative of an error in operation. The error detection logic 3160 is responsive to the output of the OR logic gate 3150 to initiate error recovery processes performed by the pipeline flush logic 3170, confirmed state recovery logic 3180 and program counter reset logic 3190 so that the detected error in operation does not affect any of the values stored within the confirmed register bank 3130. Thus in response to a detected error in operation the pipeline flush logic 3170 initiates a pipeline flush to clear the pipeline of any potentially erroneous values. The pipeline flush logic 3170 is connected both to the critical state buffer 3122 and to the stability pipeline stages 3120. In the event of a detected error in operation all of the values in the main pipeline are flushed in addition to the values in the stability stages of the pipeline 3120 and all of the values currently stored in the critical state buffer 3122 which have not yet been stored in the confirmed register bank 3130. Once the pipeline has been flushed the confirmed state recovery logic 3180 initiates a series of processing operations whereby the data processing apparatus is returned to a previous state in which the instruction whose values have most recently been stored in the confirmed register bank 3130 has just been executed. Re-execution starting from this instruction is commenced after the program counter reset logic 3190 has reset the program counter from the current instruction to the instruction following that for which values have most recently been stored to the confirmed register bank 3130.

Normal processing operations involve execution of a plurality of instructions each of which may involve the update of a number of different types of architectural state variables. For example execution of a single given instruction may require that one or more general purpose registers, flags, a program-status register, or a program counter be updated. However, the physical elements that store these updated variables will not necessarily be updated in one and the same clock cycle, even though they relate to the same given instruction. For example, in the ARM® instruction set a load instruction is not capable of changing the flags and thus it is possible to store the updates to the flags in a processing cycle earlier than that in which the updates to the general purpose registers are stored. Note that the general purpose registers cannot be updated until it is known that a load instruction has not generated a memory-stage related exception, such as a permission fault. It will be appreciated that an error in operation could happen in any processing cycle. Thus, in the arrangement of Figure 2 it is necessary to ensure that updates to the confirmed register bank 3130 are "synchronised" to ensure that recovery is possible using instruction re-execution. This is achievable only if a certain critical sub-set of architectural state variables have been stored in the confirmed register bank 3130. To ensure that all of the critical sub-set of architectural state variables are available to enable re-execution, the critical state buffer 3122 of Figure 2 is provided to hold updated values associated with a given instruction until it is known that all of the values for critical state updates associated with that particular instruction are available and that all of the non-critical state updates have either already been stored to the confirmed register bank 3130 or are present in the confirmed state buffer 3124. Only once all of the values associated with the given instruction are available are the critical variables associated with that instruction stored in the confirmed register bank 3130. The confirmed register bank 3130 has two write ports indicated as CW0 and CW1. Similarly, the speculative register bank has two write ports SW0 and SW1.

Note that the actual physical update of values associated with a given instruction to the confirmed register bank may not happen immediately. This will be the case, for example, if more critical state updates are required than can be performed in a single processing cycle due to the limited number of write ports on the register bank (in this case two write ports). The output of the critical state buffer is supplied to the confirmed state buffer 3124 before being supplied to the confirmed register bank 3130. The confirmed state buffer 3124 is simply a write-buffer for the confirmed register bank 3130. This is provided to avoid stalling the entire pipeline in the event that there are more than two confirmed values to be written to the confirmed register bank 3130 in a given processing cycle (e.g. due to the re-ordering of the critical state updates).
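The write-buffer behaviour described above can be sketched in software. The following Python model is illustrative only: the class name, method names and register labels are assumptions introduced here, and only the limit of two write ports per processing cycle is taken from the description.

```python
from collections import deque

WRITE_PORTS = 2  # the confirmed register bank has two write ports (CW0, CW1)


class ConfirmedStateBuffer:
    """Hypothetical model of a write buffer in front of a register bank."""

    def __init__(self):
        self.pending = deque()  # queued (register, value) updates
        self.bank = {}          # stands in for the confirmed register bank

    def push(self, updates):
        """Accept any number of confirmed updates without stalling."""
        self.pending.extend(updates)

    def cycle(self):
        """Drain at most WRITE_PORTS updates in one processing cycle."""
        written = 0
        while self.pending and written < WRITE_PORTS:
            reg, value = self.pending.popleft()
            self.bank[reg] = value
            written += 1
        return written


buf = ConfirmedStateBuffer()
buf.push([("r0", 1), ("r1", 2), ("r2", 3)])  # three updates, two ports
first = buf.cycle()   # writes r0 and r1
second = buf.cycle()  # writes the remaining r2
```

In this sketch the third update simply waits one cycle in the buffer, which is the situation the write-buffer is provided to absorb.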

The output of the stability pipeline stages 3120 is supplied both to the critical state buffer 3122 and to the confirmed state buffer 3124. The stability pipeline stages 3120 allow sufficient time for errors in operation in the main pipeline to be detected by the error detection logic 3160 prior to those values being stored in the confirmed register bank 3130.

Consider the case where the transition detector 3142-3 indicates that an error has occurred in the third execution stage of the main pipeline corresponding to instruction n-2. In this case, the program counter resetting logic 3190 will reset the program counter from the instruction n to the instruction n-5, since the last confirmed state of the integrated circuit corresponds to the instruction n-6. The confirmed state corresponding to the instruction n-6 is recovered by copying the data pertaining to the critical sub-set of state variables associated with instruction n-6 from the confirmed register bank 3130 into the speculative register bank 3110 via data path 3111. Execution of the processing operations then proceeds from stage n-5 onwards so that the error in operation of the integrated circuit does not affect the outcome of the calculation. The last processing state to be stored in the confirmed register bank 3130 is the state information for processing stage n-6.

The state variables stored in the confirmed register bank 3130 have a greater mean time between failures (and are thus much less likely to be erroneous) than the state variables stored in the speculative register bank 3110. Accordingly state variables from the confirmed register bank 3130 are used to recover from the detected error in operation in the main pipeline 3100 by restoring the last confirmed state n-6 when an error in operation is detected. Thus the system is able to recover from operation errors by using the last confirmed state of the integrated circuit.
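The recovery step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that instructions are identified by integer indices and that register banks are dictionaries; the function name `recover` and the register labels are hypothetical.

```python
def recover(confirmed_bank, last_confirmed_index):
    """Return (speculative_bank, restart_index) after a detected error.

    The speculative bank is overwritten with a copy of the confirmed bank
    (modelling the copy via data path 3111), and execution restarts at the
    instruction following the last confirmed one.
    """
    speculative_bank = dict(confirmed_bank)
    return speculative_bank, last_confirmed_index + 1


n = 10                                # instruction currently entering the pipeline
confirmed = {"r0": 5, "pc_flags": 0}  # state confirmed up to instruction n-6
speculative, restart = recover(confirmed, n - 6)
# restart is n-5, matching the program counter reset described above
```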

Note that the arrangement of Figure 2 is a simplified arrangement provided for the purposes of illustration. In other arrangements according to the present technique there will not be a one-to-one correspondence between instructions and pipeline stages since a single instruction can potentially span several pipeline stages. Accordingly, in such alternative arrangements the program counter corresponding to the instruction whose critical variables were last stored to the confirmed register bank 3130 is not simply derived from the current program counter and the length of the pipeline. Rather, the program counter corresponding to the last successfully executed instruction is obtained from a separate pipeline of program counter values that shadows the main execution pipeline.

Figure 3A schematically illustrates an arrangement according to the present technique comprising a number of stability pipeline stages appended to the end of the main pipeline. The arrangement comprises a plurality of pipeline stages 3210 including two stability stages 3220 and 3222 at the end of the pipeline; an array of transition detectors 3230-1 to 3230-4; an OR gate 3240; an operational parameter controller 3242; error detection logic 3250; pipeline flush logic 3260; confirmed state recovery logic 3262; program counter resetting logic 3270; a decode pipeline stage 3280; a score card file 3282; forwarding logic 3290; a critical state buffer 3292; a confirmed state buffer 3294; and a confirmed register bank 3296.

As in the example embodiment of Figure 2, the pipeline 3210 comprises three execute stages corresponding to instructions n, (n-1), (n-2) and (n-3). Appended to the end of this pipeline are the two stability stages 3220 and 3222 corresponding respectively to two instructions (n-4) and (n-5). Appending the additional stability stages directly to the end of the main pipeline in this way causes the output to the register bank to be slightly delayed but these extra stability stages give the integrated circuit the opportunity to detect the occurrence of an error in operation before output of data to the register bank 3296. This means that the error detection process will have completed by the time the output of the pipeline is supplied to the register bank 3296. Again the outputs of each of the processing stages of the main pipeline are supplied to transition detectors 3230-1 to 3230-4, which in turn supply their outputs to the OR gate 3240. In the event of detection of an error, error recovery is initiated via the error detection logic 3250 using the pipeline flush logic 3260, the confirmed state recovery logic 3262 and the program counter resetting logic 3270, in a similar manner to that described above with reference to Figure 2. The occurrence of an error in operation is also signalled to the operational parameter controller 3242, which is operable to adjust at least one of the clock frequency, the operating voltage, the body biased voltage or the temperature in dependence upon one or more characteristics of detected errors in operation so as to maintain a finite non-zero error rate in a manner that increases overall efficiency. As mentioned above with reference to Figure 2, it will be appreciated that in alternative embodiments, there is not a one-to-one correspondence between pipeline stages and instructions.

In this example the two stability stages correspond to instruction numbers (n-4) and (n-5) respectively, which means that the last committed state variables in the register bank correspond to instruction number (n-6). Thus, for example, in the event of an error at pipeline stage (n-1) the transition detector 3230-2 is triggered, which in turn triggers a high output from the OR gate 3240. A recovery sequence is initiated and the pipeline is flushed to eliminate any pipeline values affected by the error. The program counter is reset by the logic 3270 from instruction n to the instruction (n-5) to enable forward progress of the calculation. Since the additional stability stages 3220 and 3222 incur some delay in the instruction execution in the pipeline it is appropriate to provide forwarding logic 3290 that connects the output of one pipeline stage to the input of earlier pipeline stages corresponding to later executed instructions. In this case the output of pipeline stage (n-2) is fed as input to a pipeline stage associated with execution of instruction n. Forwarding logic (not shown) is also provided from pipeline stages (n-5), (n-4), (n-3) and (n-1) and from the critical state buffer 3292 and the confirmed state buffer 3294. This enables non-committed values from later pipeline stages that have not yet been saved to the register bank 3296 to be supplied as input to subsequent processing instructions where appropriate.

The integrated circuit uses the score card file 3282 to keep track of which instruction writes to which register numbers. The score card file is written to by an earlier stage of the pipeline, in particular the decode stage 3280 of the pipeline 3210. The score card 3282 need only keep track of which instruction writes to which register and not of which instruction reads from which register since only the instruction writes are likely to affect input values to the various pipeline stages. For example, if the instruction at stage (n-2) writes to the register R3 and the subsequent instruction executed at pipeline stage n reads from register R3 as an input before the output of instruction (n-2) has been committed to the register bank, it is necessary to provide the output corresponding to the value to be written to register R3 as an input to the pipeline stage corresponding to instruction n.
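The R3 example above can be expressed as a small Python sketch of a forwarding lookup. The data structures are assumptions made for illustration: `scorecard` maps a register name to the identifier of the most recent uncommitted instruction that writes it, and `in_flight` maps such identifiers to the values they have produced. None of these names come from the described circuit.

```python
def read_operand(reg, scorecard, in_flight, register_bank):
    """Return the value an instruction should see for register `reg`.

    If the score card records an uncommitted writer whose result is
    available in the pipeline, forward that result; otherwise read the
    committed value from the register bank.
    """
    writer = scorecard.get(reg)
    if writer is not None and writer in in_flight:
        return in_flight[writer]
    return register_bank[reg]


scorecard = {"R3": "n-2"}             # instruction (n-2) writes R3
in_flight = {"n-2": 42}               # its result is not yet committed
register_bank = {"R3": 7, "R4": 9}    # committed values

r3_for_n = read_operand("R3", scorecard, in_flight, register_bank)  # forwarded
r4_for_n = read_operand("R4", scorecard, in_flight, register_bank)  # from bank
```

Only writes are tracked, as in the description, because a read never changes the value that a later instruction needs as input.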

Note that in the arrangements of both Figure 2 and Figure 3A the stages of error detection, pipeline flushing, program counter resetting and recovery of the last confirmed state can be performed in a number of different orders and the present technique is not restricted to the particular ordering of these logic modules as illustrated in these Figures.

In the arrangement of Figure 3A if an error should occur at processing stage (n-1), the state variables of the integrated circuit will be restored to the value corresponding to the last instruction that was committed to the register bank 3296. In storing the state variables used for recovery from an error, account is taken of instruction dependencies to help determine which state updates are critical. This helps to determine the ordering of writes required to leave the register bank in a consistent state, such that if an error occurs, then recovery is possible. Thus the state variables that must be restored by recovering values from the register bank will vary according to the particular error. The manner and ordering in which the state variables are stored to the register bank aids identification of a particular subset of architectural and/or micro-architectural state variables that are used by the error recovery circuits in order to recover from the error in operation.

Figure 3B schematically illustrates a sequence of operations involved in error detection and recovery as performed by the circuits of Figure 2 and Figure 3A. At stage 3297 the processing circuitry begins processing associated with the next processing cycle and subsequently at stage 3298 it is determined whether or not an error in operation has occurred. If at stage 3298 no error in operation has been detected by one of the transition detectors then the process continues by processing the subsequent cycle at stage 3297. However, if an error in operation has been detected, then the process proceeds to stage 3299 whereupon the entire pipeline is flushed of non-confirmed state variables. In alternative arrangements only a subset of values currently stored in the pipeline need be flushed. The process then continues to stage 3300 where a program counter is reset to the instruction following the last confirmed instruction. This instigates re-execution of instructions to eliminate any effects of the error in operation. At stage 3301 it is determined whether the program counter value reset at stage 3300 is equal to the last reset program counter value. This stage of the process serves to detect a deadlock in the computation whereby a given instruction repeatedly executes resulting in an error in operation.

If at stage 3301 the current program counter value is determined not to be equal to the last reset program counter value, then the process proceeds directly to stage 3303 where the program counter value is stored for future deadlock detection. However, if it is determined at stage 3301 that the program counter value is equal to the last reset program counter value this is indicative of a deadlock. Accordingly, the process proceeds to stage 3302 where one or more operating parameters of the processor are adjusted to prevent continuation of any deadlock. In this particular arrangement the adjustment of operational parameters involves reducing the clock rate temporarily. However, it will be appreciated that in alternative arrangements the voltage could be adjusted to achieve the same result. Once the clock rate has been temporarily reduced at stage 3302, the process proceeds to stage 3303 where the program counter value is stored for future deadlock detection. The process then returns to stage 3297 whereupon the next processing cycle is executed.
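The flow of Figure 3B, as described above, can be sketched as a small Python state holder. This is a simplified model under stated assumptions: program counter values are integers, the clock is slowed by an arbitrary factor of two, and restoring the clock after the temporary reduction is not modelled. The class and attribute names are hypothetical.

```python
class RecoveryController:
    """Illustrative model of stages 3298 to 3303 of Figure 3B."""

    def __init__(self, clock_mhz=100.0):
        self.clock_mhz = clock_mhz
        self.last_reset_pc = None  # value stored at stage 3303
        self.flushes = 0

    def on_cycle(self, error_detected, last_confirmed_pc):
        """Process one cycle; return the restart PC, or None if no error."""
        if not error_detected:                 # stage 3298: no error
            return None
        self.flushes += 1                      # stage 3299: flush pipeline
        reset_pc = last_confirmed_pc + 1       # stage 3300: reset PC
        if reset_pc == self.last_reset_pc:     # stage 3301: deadlock check
            self.clock_mhz *= 0.5              # stage 3302: slow clock (factor assumed)
        self.last_reset_pc = reset_pc          # stage 3303: remember PC
        return reset_pc


ctrl = RecoveryController(clock_mhz=100.0)
ctrl.on_cycle(False, 4)          # normal cycle, nothing happens
pc_a = ctrl.on_cycle(True, 4)    # first error: restart at 5, clock unchanged
pc_b = ctrl.on_cycle(True, 4)    # same restart PC again: treated as deadlock
```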

Although in the arrangement according to Figure 3B, deadlock is actively detected and a temporary change to the operational parameters is made in response to a deadlock, in alternative arrangements the operational parameters are temporarily changed in response to every error detection e.g. by slowing the clock rate. In this case there is no need to actively detect deadlock.

Figure 3C schematically illustrates a flow chart showing an operational parameter tuning process according to the present technique. The operational parameter tuning process is a separate process from the error detection and recovery process of Figure 3B. The operational parameter tuning process as illustrated in Figure 3C is a three stage process that begins at stage 3304 with sampling the error rate associated with processing operations. It is subsequently determined at stage 3305 whether the error rate is within acceptable bounds and if this is the case then no adjustments are made to operational parameters but the error rate continues to be sampled. However, if it is determined that the error rate is not within acceptable bounds then the process proceeds to the next stage 3306 whereby the operational parameters are adjusted. If this adjustment of the operational parameters does not return the sampled error rate to within the acceptable bounds, then further adjustments are made as required. The operational parameter modification process of Figure 3C can be performed entirely in hardware or using a combination of hardware and software such that the error rate information is recorded in either hardware registers or in memory. This error rate information is subsequently read by software, which uses a software programmable register to modify the operational parameters.
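A software view of the tuning loop of Figure 3C might look like the following Python sketch. It assumes that the tuned parameter is the supply voltage, expressed in integer millivolts, and that raising the voltage lowers the error rate; the bounds, step size and function name are illustrative values, not taken from the description.

```python
def tune_step(error_rate, low, high, voltage_mv, step_mv=10):
    """One pass through stages 3304 to 3306 for a sampled error rate.

    Returns the (possibly adjusted) supply voltage in millivolts.
    """
    if error_rate > high:        # too many errors: back off
        return voltage_mv + step_mv
    if error_rate < low:         # too few errors: margin can be reclaimed
        return voltage_mv - step_mv
    return voltage_mv            # within bounds: leave parameters alone


voltage = 1000
for sampled_rate in (0.0, 0.0, 0.02, 0.08):   # hypothetical samples
    voltage = tune_step(sampled_rate, low=0.01, high=0.05, voltage_mv=voltage)
```

Keeping a lower bound as well as an upper bound is what holds the error rate at a finite non-zero value rather than driving it to zero.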

Figure 4 schematically illustrates a transition detection D-type flip-flop according to the present technique. The arrangement comprises a standard master-slave positive edge triggered flip-flop 3310 and a transition detector circuit 3350. The flip-flop 3310 corresponds to the flip-flop 3020 of Figure 1 that connects the pipeline stages. In alternative arrangements the flip-flop could be replaced by any circuit element operable to store a signal value irrespective of triggering and other requirements. The processing of the circuit arrangement of Figure 4 is driven by a clock signal CLK. The clock signal nCLK corresponds to the clock signal after it has been passed through a single inverter element whereas the clock signal bCLK corresponds to the clock signal after it has been passed through two inverter elements. Input data is supplied to the main flip-flop and is also supplied to the transition detector 3350 via an arrangement of three inverters I₁, I₂ and I₃. The delay induced by the combination of three inverters is equal to the set up time of the main flip-flop. The set-up time is a characteristic of the flip-flop and represents the time required for the flip-flop circuit to stabilise at a definite logic value.

Within the transition detector 3350 the input signal is supplied to a series of four inverters I₄, I₅, I₆ and I₇. Outputs from various points in the inverter array are supplied to the transistor array comprising transistors N1, N2, N3, N4, N5 and N6. Transistor N1 is driven by an output derived from the signal corresponding to the input of the inverter I₄; the transistor N2 is driven by the output of the inverter I₆; the transistor N3 is driven by the output of the inverter I₄ and the transistor N4 is driven by the output of inverter I₇. The transistor N5 is on only when the clock signal is high. The transistor N6 is associated with a dynamic node ERR_DYN. The ERR_DYN node is robustly protected from discharge due to noise by back-to-back inverters I₈ and I₉ and an error output signal is output from the circuit via inverter I₁₀. The error signals from each individual error detection circuit are supplied to a control state machine (not shown), which is responsive to the error signals to output a global error reset signal Err_reset. This signal pre-charges the ERR_DYN node for the next error event. This conditional pre-charge scheme significantly reduces the capacitive load on a pin associated with the clock 3032 and provides a low power overhead design. It also precludes the need for an extra latching element that would otherwise be required to hold the state of the error signal during a pre-charge phase. The circuit arrangement of Figure 4 is operable to flag an error in operation of the integrated circuit when the input data transitions either in the set up time window of the main flip-flop 3310 or during the clock phase following the sampling edge as shown in Figure 5. A data transition in either the setup window or the following clock phase is indicative of a late transitioning input.

An alternative to the transition detector of Figure 4 would be to use a delayed latch, to capture the output of the processing logic at a later time than performed by the flip-flop 3020. A comparison between the delayed value and the non-delayed value stored by the flip-flop 3020 can be used to determine occurrence of an error. This error detection system was described in US Application Publication No. US2004-0199821.

This system involves detecting a transition by calculating a difference between a signal value at a first sampling time and at a second, subsequent sampling time. However, the transition detector 3350 of Figure 4 is arranged to detect any change of state in the signal within a predetermined time window.
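The distinction between the two detection approaches can be illustrated with a purely behavioural Python sketch. The signal is modelled as an initial logic value plus a list of toggle times; all function names and the chosen times are hypothetical, and the sketch says nothing about how either circuit is implemented.

```python
def value_at(initial, toggles, t):
    """Logic value at time t for a signal that flips at each toggle time."""
    flips = sum(1 for tt in toggles if tt <= t)
    return initial ^ (flips & 1)


def delayed_sample_error(initial, toggles, t1, t2):
    """Compare the value sampled at t1 with the value sampled later at t2."""
    return value_at(initial, toggles, t1) != value_at(initial, toggles, t2)


def window_transition_error(toggles, start, end):
    """Flag any change of state inside the window (start, end]."""
    return any(start < tt <= end for tt in toggles)


single_edge = [4]    # one late transition between the two sample points
pulse = [3, 5]       # the signal changes and then changes back

edge_by_compare = delayed_sample_error(1, single_edge, 2, 6)
edge_by_window = window_transition_error(single_edge, 2, 6)
pulse_by_compare = delayed_sample_error(1, pulse, 2, 6)
pulse_by_window = window_transition_error(pulse, 2, 6)
```

In this model both approaches flag the single late edge, whereas only the window-based check flags the pulse, because the two sampled values are equal.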

Figure 5 schematically illustrates a functional timing diagram for a data transition occurring within the set up period of the main flip-flop 3310. The set up time of the main flip-flop T_SETUP_FF is indicated in the upper most portion of Figure 5 in relation to the clock edge and it can be seen that the set up time immediately precedes the clock edge. The time for which the clock signal remains positive is indicated by the time period Tpos. It can be seen that the transition in the input data occurs in the set up period of the main flip-flop in this case. However, as a result of the delay elements I₁, I₂ and I₃ of Figure 4, through which the input data must pass prior to input to the transition detector 3350, the transition in the data is shifted to a later time such that it occurs within the time Tpos but outside the period T_SETUP_FF. The data profile DATA_DEL3 corresponds to the input to the first of the inverters I₄ in the transition detector 3350. This data profile is inverted with respect to the input data transition profile since it has passed through an odd number of inverters I₁, I₂ and I₃.

Figure 6 schematically illustrates a functional timing diagram representing how the circuit of Figure 4 acts to detect a data transition from logic state one to logic state zero. The circuit of Figure 4 detects such a transition when the transistors N1, N2 and N5 are all ON. As shown in Figure 6A the clock signal goes from low to high at time T_C1 and returns from a high state to a low state at time T_C2. Figure 6B shows a data transition from high to low at a time T_D which is within the period when the clock signal is high. Figure 6C shows the profile of the signal DATA_DEL3 of Figure 4, which is the output of the inverter I₃ and controls the transistor N1. This signal goes from low to high at a time T_I3, which is slightly later than the data transition time T_D. Figure 6D shows the data profile of data signal DATA_DEL4, which controls the transistor input N3. This data signal transitions from high to low at a time later again than T_I3, that is, at the time T_I4. Figure 6E shows the data profile of data signal DATA_DEL5, which is output by delay element I₅ and does not supply an input to any transistors of the transistor array. Figure 6F shows the profile of the data signal DATA_DEL6, which controls the N2 transistor input and transitions from high to low at a time T_I6 which is later than the time T_I4. Finally, Figure 6G shows the profile of DATA_DEL7, which controls the input to the transistor N4 and which transitions from low to high at a time T_I7, which is later again than time T_I6. Transistor N1 is off before the point in time T_I3 and on after that time. Transistor N3 is on prior to the time T_I4 and off after that time. Transistor N2 is on prior to the time T_I6 but is off after that time and the transistor N4 is off prior to the time T_I7 and is on after that time. Accordingly it can be seen that there is a time window in which both transistors N1 and N2 are simultaneously switched on but there is no time window in this functional timing diagram in which both the transistors N3 and N4 are switched on.

In the time window starting at T=0 and finishing at T_I3 the transistors N1 and N4 are switched off whereas the transistors N2 and N3 are switched on, since both the signal controlling N2 and the signal controlling N3 are high within that time window. In the time window between T_I3 and T_I4 the transistors N1, N2, and N3 are all switched on whereas the transistor N4 is switched off. In the time window between T_I4 and T_I6 the transistors N1 and N2 are both switched on whereas the transistors N3 and N4 are both switched off. In the time window between T_I6 and T_I7 the transistor N1 is the only transistor that is switched on and in the time window between T_I7 and T_C2 the transistors N1 and N4 are switched on whereas the transistors N2 and N3 are switched off. Accordingly for the duration when the clock pulse is high (when the transistor N5 is switched on) and from the time T_I3 to the time T_I6 the transistors N1, N2 and N5 are all switched on. This will result in the detection of a transition since a conduction path is provided from the array of transistors to the latch node Err_dyn.

Figures 7A to 7G schematically illustrate a functional timing diagram for the circuit of Figure 4 for detection of a data transition from logic value zero to logic value one. Figure 7A shows the clock signal, which is positive for a period from T_C1 to T_C2. The data transitions from zero to one as shown in Figure 7B at time T_D2, which is just within the positive phase of the clock signal. Figure 7C shows the profile of the data signal DATA_DEL3, which drives the input of transistor N1. This data signal transitions from one to zero at the time T_I3A, which is later than the time T_D2 by a time corresponding to the evaluation time of the inverter I₃. Figure 7D schematically illustrates the profile of the data signal DATA_DEL4 which drives the input of the transistor N3. This signal transitions from low to high at a time T_I4A, which is later than the time T_I3A by a period corresponding to the evaluation time of inverter I₄. Figure 7E shows the profile of the data signal DATA_DEL5 corresponding to the output of the inverter I₅. Figure 7F shows the data profile of the data signal DATA_DEL6, which drives the transistor N2 input and this signal transitions from zero to one at the time T_I6A, which is later than the time T_I4A by a time corresponding to the evaluation time of inverter I₅ and the evaluation time of inverter I₆. Finally, Figure 7G shows the data profile of the data signal DATA_DEL7, which drives the input of the transistor N4. This data signal transitions from one to zero at the time T_I7A. In this case the output of the inverter I₁₀ will transition from high to low only if transistors N3, N4 and N5 are all on. As can be seen from Figures 7A to 7G there is a time window in which this is the case, namely the time window starting at T_I4A when the transistor N3 switches on and lasting until the time T_I7A when the transistor N4 switches off. There is no time window in which the transistors N1, N2 and N5 are all switched on in this case. Thus it can be seen that a transition in the data from zero to one is indicated by the circuit of Figure 4 when the transistors N3, N4 and N5 are all switched on.
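The conduction windows described for Figures 6 and 7 can be reproduced with a small behavioural model in Python. The model assumes unit inverter delays, integer time steps and an ideal clock that is high throughout; the function names are hypothetical. It only captures the gate ordering stated above (N1 driven by DATA_DEL3, N3 by DATA_DEL4, N2 by DATA_DEL6, N4 by DATA_DEL7), not the analogue behaviour of the circuit.

```python
def gates(t, t_d, before, after, inv_delay=1):
    """On/off state of N1..N4 at time t for a data change at time t_d."""
    def delayed(n):
        # Output of the n-th inverter in the chain I1..I7: the data change
        # arrives n inverter delays late and is inverted when n is odd.
        value = before if t < t_d + n * inv_delay else after
        return value ^ (n & 1)
    return {"N1": delayed(3), "N2": delayed(6),
            "N3": delayed(4), "N4": delayed(7)}


def conducts(t, t_d, before, after, clk_high=True):
    """True when a discharge path exists: N5 on with (N1 and N2) or (N3 and N4)."""
    g = gates(t, t_d, before, after)
    return bool(clk_high and ((g["N1"] and g["N2"]) or (g["N3"] and g["N4"])))


falling = [t for t in range(10) if conducts(t, 0, 1, 0)]  # data 1 -> 0 at t=0
rising = [t for t in range(10) if conducts(t, 0, 0, 1)]   # data 0 -> 1 at t=0
stable = [t for t in range(10) if conducts(t, 0, 1, 1)]   # no transition
```

With these assumptions the falling transition conducts through N1 and N2 between the third and sixth inverter delays, the rising transition conducts through N3 and N4 between the fourth and seventh, and a stable input never conducts.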

Figure 8A schematically illustrates the functional timing diagram for the main flip-flop 3310 of Figure 4 whereas Figure 8B schematically illustrates a functional timing diagram for the transition detector circuit 3350 of Figure 4. Together, the functional timing diagrams of Figures 8A and 8B illustrate how the metastability window of the transition detector is aligned such that it does not overlap with the setup window of the main flip-flop 3310. It is required that the transition detector should detect a transition in either the setup window of the main flip-flop 3310 or in a time window following the rising edge of the clock. Such a transition is indicative of a late signal, such that the main flip-flop may not be outputting the correct value at the specified time. The clock signal illustrated in Figure 8A is associated with the main flip-flop and shows a setup window Tsetup_ff, which precedes the rising clock edge. There are two requirements that define this setup window for the main flip-flop. The first requirement is that the correct data values should always be reliably sampled and the second requirement is that the output timing (i.e. the clock to data out time) is deterministic and can be characterised. Of these requirements, typically the output timing requirement is (marginally) more stringent than that of sampling the correct value. Accordingly, the setup time Tsetup_ff for the main flip-flop can be sub-divided into two time windows. The first of these time windows is Tlate (see Figure 8A) and, if a signal transition occurs in this time window, the correct value is always sampled but the output timing is not within the specified bounds. The second window within the setup time of the main flip-flop is labelled in Figure 8A as Tmstable_ff, which is the metastability window of the main flip-flop. In the window Tmstable_ff the correct data value cannot be sampled and the time taken for the output to resolve to a defined value is likely to be non-deterministic.

Referring back to the main flip-flop as illustrated in the circuit diagram of Figure 4, in the main flip-flop 3310 it is possible that when a transmission gate TG1 closes, the voltage levels at nodes M1 and M2 on either side of an inverter situated at the output of the transmission gate TG1 are such that a tri-state inverter F1 arranged in parallel with the inverter at the output of the transmission gate TG1 will always feed back the correct value. However, the time taken for the value to pass through a subsequent transmission gate TG2 and through the nodes S1 and S2, which are on either side of a further inverter subsequent to the output of TG2, and the time taken for the value to pass through the subsequent inverters labelled by Qbar and Q will be longer than the time that would be taken if M2 was at "full-rail" (either Vdd for logic state 1 or GND for logic state 0).

Referring now to Figure 8B, which is a functional timing diagram associated with the transition detector 3350 of Figure 4, the transition detector 3350 does not have a setup time to the rising edge of the clock in the same way as the flip-flop 3310 does (and as illustrated in both Figure 5 and Figure 8A). Rather, for the transition detector 3350 there is a time window for which a transition in the data input can be reliably detected and this time window is referred to as the "sampling window". In Figure 8B the sampling window is labelled by Tsample_td. In Figure 8A the sampling window Tsample_td has been sub-divided into three distinct sub-windows. The first two sub-windows correspond to the sub-windows Tlate and Tmstable_ff of the main flip-flop as described above. A third sub-window Tincorrect, which is adjacent to the window Tmstable_ff, forms together with Tlate and Tmstable_ff the full time window Tsample_td in which a transition in the data signal must be detected by the transition detector 3350. If the data signal transitions in the sub-window Tlate, then the Q output of the flip-flop 3310 of Figure 4 will be correct but the transition will be late. If the data transition occurs in the time window Tmstable_ff, then the master latch part of the flip-flop 3310 may become metastable thus leading to an incorrect and/or late value being output by the circuit. Finally if the transition occurs in the sub-window Tincorrect then the output will have an incorrect value and the transmission gate TG1 in Figure 4 will have completely shut before the new signal value arrives. The portion of the cycle subsequent to Tincorrect in Figure 8A and indicated by Tcorrect represents the remainder of the timing cycle during which a transition is not indicative of an error. Note that the operational parameters of the device of Figure 4 are arranged such that an input signal to the main flip-flop 3310 will never evaluate later than in the Tincorrect window. This arrangement also imposes a constraint on the hold time of the input to the main flip-flop 3310, such that the earliest the input to the main flip-flop can change is the start of the Tcorrect window.
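The sub-division of the timing cycle can be summarised by a small classification function in Python. The function takes the start time of each window as a parameter, so no particular durations are implied; the parameter names and the example times are assumptions introduced for illustration.

```python
def classify_transition(t, t_late, t_mstable_ff, t_incorrect, t_correct):
    """Name the window containing a data transition at time t.

    The four arguments after t are window start times and must increase
    in the order Tlate, Tmstable_ff, Tincorrect, Tcorrect.
    """
    if not (t_late < t_mstable_ff < t_incorrect < t_correct):
        raise ValueError("window start times must be strictly increasing")
    if t < t_late or t >= t_correct:
        return "Tcorrect"       # a transition here is not an error
    if t < t_mstable_ff:
        return "Tlate"          # correct value, late output timing
    if t < t_incorrect:
        return "Tmstable_ff"    # main flip-flop may go metastable
    return "Tincorrect"         # wrong value captured


def must_flag_error(window):
    """Tsample_td covers every window except Tcorrect."""
    return window != "Tcorrect"


windows = [classify_transition(t, 10, 14, 16, 20) for t in (5, 12, 15, 18, 25)]
```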

The transition detector 3350 also has a metastability window, which is indicated as Tmstable_td in Figure 8B and this time window precedes the time window Tsample_td. If a transition occurs in the time window Tmstable_td then the Err_dyn node shown in Figure 4 may become metastable resulting in the error output becoming unknown (i.e. logic 1, logic 0 or some intermediate value). However, by designing the circuit such that Tmstable_td occurs within the window Tcorrect as shown, yet does not overlap with Tlate, Tmstable_ff or Tincorrect, it is known that if metastability does occur in the transition detector 3350 then the Q output of the main flip-flop 3310 will have both the correct value and the correct output timing. This enables the use of standard synchronising logic to be applied to the output of logic driven by the error signal. This is illustrated in Figure 9.

Figure 9 schematically illustrates error synchronisation of error signals derived from transition detectors. The arrangement of Figure 9 comprises the OR gate 3040 (corresponding to that illustrated in Figure 1), a first flip-flop 3042 and a second flip-flop 3044 to which the output of the OR gate 3040 is supplied in succession. The first flip-flop 3042 is designed specifically for fast metastability resolution and has very high gain in the feedback loop, which is the cause of metastability. A standard flip-flop typically has less gain in the feedback loop than the flip-flop 3042 since there are design tradeoffs between the gain and the other parameters of the flip-flop such as setup time and area. The second flip-flop 3044 is a standard flip-flop. As shown in Figure 9 a number of error signals, error 1, error 2, error 3, ... error N, which are derived from individual transition detectors are ORed together to form a GlobalError signal. If any one of the individual error signals that are input to the OR gate 3040 is metastable then this can also result in metastability or non-deterministic timing of the output GlobalError signal. The GlobalError signal is passed through a standard arrangement for synchronising a signal to a particular clock domain consisting of the two flip-flops 3042 and 3044. The output of the second flip-flop 3044 is a synchronised version of the GlobalError signal since it has a voltage level corresponding to a definite logic value and has deterministic timing. This signal is labelled GlobalErrorSync in Figure 9.
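The arrangement of Figure 9 can be sketched at the logic level in Python. This is a cycle-based model with hypothetical class and method names; it shows the OR reduction and the two-cycle latency of the synchroniser but does not model metastability or its resolution.

```python
class TwoFlopSynchronizer:
    """Logic-level model of OR gate 3040 followed by flip-flops 3042 and 3044."""

    def __init__(self):
        self.ff1 = 0  # first flip-flop (3042)
        self.ff2 = 0  # second flip-flop (3044), drives GlobalErrorSync

    def clock(self, errors):
        """Apply one clock edge and return GlobalErrorSync after that edge."""
        global_error = int(any(errors))       # OR of the individual error signals
        # Both flip-flops sample on the same edge, so ff2 takes the old ff1.
        self.ff2, self.ff1 = self.ff1, global_error
        return self.ff2


sync = TwoFlopSynchronizer()
trace = [
    sync.clock([0, 1, 0]),  # an error is raised by one detector
    sync.clock([0, 0, 0]),  # it reaches GlobalErrorSync on the next edge
    sync.clock([0, 0, 0]),
]
```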

In the situation where the GlobalError signal is metastable then the GlobalErrorSync signal may be either a logic 0 or a logic 1. The GlobalErrorSync signal is used by the error recovery logic 3050 of Figure 1 to determine when an error in operation has occurred. Since the metastability window of the transition detector 3350 lies entirely within the Tcorrect time window (refer to Figures 8A and 8B), in the event that the transition detector 3350 becomes metastable then the resulting value of the GlobalErrorSync signal will correspond to a "don't care" condition. In the event of a GlobalErrorSync signal indicating the logic value 1 in this case, the error recovery process will be initiated although this is benign.

Claims

1. An integrated circuit for a data processing apparatus, said integrated circuit being operable to perform digital data processing and comprising: an error detection circuit operable to monitor a digital signal value within said integrated circuit and to detect a transition in said signal value in a predetermined time window, said transition being indicative of an error in operation of said integrated circuit; a storage unit operable to store a recoverable state of said data processing apparatus, said recoverable state comprising at least a subset of architectural state variables corresponding to a programmer's model of said integrated circuit; an error-recovery circuit responsive to said error detection circuit and operable to enable said integrated circuit to recover from said error in operation using said stored recoverable state; an operational parameter controller operable to control one or more performance controlling operational parameters of said integrated circuit; wherein said operational parameter controller dynamically controls at least one of said one or more performance controlling parameters in dependence upon one or more characteristics of errors detected by said error detection circuit to maintain a nonzero rate of errors in operation, said error-recovery circuit being operable to enable the integrated circuit to recover from said errors in operation such that data processing by said integrated circuit continues.

2. An integrated circuit as claimed in claim 1, in which said recoverable state comprises at least a subset of micro-architectural state variables.

3. An integrated circuit as claimed in claim 1 or claim 2, in which said error detection circuit is arranged to detect said transition by calculating a difference between said signal value at a first sampling time and said signal value at a second, subsequent sampling time.

4. An integrated circuit as claimed in claim 1 or claim 2, in which said error detection circuit is arranged to detect said transition by detecting any change of state in said signal value within a predetermined time window.

5. An integrated circuit as claimed in any one of the preceding claims, in which said error detection circuit is operable to detect an error in an output signal of an associated processing circuit element of said integrated circuit.

6. An integrated circuit as claimed in claim 5, in which said integrated circuit has an error detection circuit metastability window that is mutually exclusive with a setup window of said associated processing circuit element.

7. An integrated circuit as claimed in claim 5 or claim 6, in which said integrated circuit comprises a plurality of error detection circuits associated with a respective plurality of processing circuit elements.

8. An integrated circuit as claimed in any one of claims 5 to 7, in which said integrated circuit has an instruction pipeline comprising a plurality of pipeline stages.

9. An integrated circuit as claimed in claim 7, in which said associated processing circuit element is a latch for passing data between consecutive ones of said plurality of pipeline stages.

10. An integrated circuit as claimed in any one of claims 5 to 9, in which said error detection circuit comprises at least one delay element arranged to delay an input digital signal to enable detection of said transition when said transition occurs during a setup time of said processing circuit element.

11. An integrated circuit as claimed in any one of the preceding claims, in which said error detection circuit comprises at least one of a zero-to-one transition detector and a one-to-zero transition detector.

12. An integrated circuit as claimed in any one of claims 5 to 11, in which said error recovery circuit comprises at least one stability pipeline stage operable to enable verification of output values of said plurality of pipeline stages prior to commitment of said output values as state variables of said integrated circuit.

13. An integrated circuit as claimed in claim 12, in which said integrated circuit comprises data forwarding circuitry operable to supply a value calculated by a particular one of said plurality of pipeline stages directly from said particular pipeline stage to another different one of said plurality of pipeline stages for use as an input value for a different pipeline stage.

14. An integrated circuit as claimed in any one of the preceding claims, in which said storage unit is a register bank operable to store confirmed state variables, said confirmed state variables having been confirmed to be unaffected by said errors in operation.

15. An integrated circuit as claimed in any one of claims 1 to 14, in which said integrated circuit comprises a speculative register bank operable to store speculative state variables whose values have not been confirmed as unaffected by said errors in operation and in which said storage unit is a confirmed register bank operable to store confirmed state variables whose values have been confirmed as being correct values.

16. An integrated circuit as claimed in claim 15, in which said error recovery circuit is operable to replace a subset of said speculative state variables in said speculative register bank by corresponding ones of said confirmed state variables from said confirmed register bank in the event that said error detection circuit detects an error in operation.

17. An integrated circuit as claimed in any one of the preceding claims, in which said operational parameter controller is operable to at least temporarily adjust one or more of said performance controlling parameters after a time delay following detection of said error in operation.

18. An integrated circuit as claimed in claim 17, comprising a deadlock detection module operable to detect repetition of an error in operation corresponding to a given instruction and in which said operational parameter controller is operable to temporarily adjust one or more of said performance controlling parameters in response to said detected repetition.

19. A method of controlling an integrated circuit for performing data processing, said method comprising the steps of: monitoring a digital signal value within said integrated circuit and detecting a transition in said signal value in a predetermined time window, said transition being indicative of an error in operation of said integrated circuit; storing a recoverable state of said data processing apparatus, said recoverable state comprising at least a subset of architectural state variables corresponding to a programmer's model of said integrated circuit; using said stored recoverable state, in response to said detection of said error in operation, to enable said integrated circuit to recover from said error in operation; controlling one or more performance controlling operational parameters of said integrated circuit; wherein said step of controlling comprises dynamically controlling at least one of said one or more performance controlling parameters in dependence upon one or more characteristics of errors detected in said monitoring and detecting step to maintain a non-zero rate of errors in operation, use of said stored recoverable state in response to said detection of said error in operation enabling said integrated circuit to recover from said errors in operation such that data processing by said integrated circuit continues.