GB2456656A - Controlling timeouts of an error recovery procedure in a digital circuit - Google Patents

Info

Publication number
GB2456656A
GB2456656A GB0822778A GB0822778A GB2456656A GB 2456656 A GB2456656 A GB 2456656A GB 0822778 A GB0822778 A GB 0822778A GB 0822778 A GB0822778 A GB 0822778A GB 2456656 A GB2456656 A GB 2456656A
Authority
GB
United Kingdom
Prior art keywords
state machine
finite state
recovery procedure
error recovery
states
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0822778A
Other versions
GB2456656B (en)
GB0822778D0 (en)
Inventor
Ulrich Mayer
Frank Lehnert
Guenter Gerwig
Scott Barnett Swaney
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of GB0822778D0
Publication of GB2456656A
Application granted
Publication of GB2456656B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The invention relates to apparatus for controlling timeouts and delays of an error recovery procedure in a digital circuit, e.g. a microprocessor. The apparatus comprises a finite state machine (FSM) 10, having a plurality of states 12 and a plurality of transitions 14. Transitions 14 are arranged between two states 12 respectively. States 12 correspond with operation steps (40, 44, 52, 56, 58, 64) of the error recovery procedure, including error classification, a drain operation, a fence operation in which a microprocessor core does not communicate with memory, a reset or refresh operation, and automatic built-in self test (ABIST). Transitions 14 of the FSM 10 depend on conditions (46, 50, 53, 57, 59, 62) for the error recovery procedure. The FSM 10 is coupled with a timeout logic circuit 20 which controls a timer to obtain the timeouts (46, 53, 57, 59) of the error recovery procedure. The FSM is configurable by a data vector which describes states 12 of the FSM for which the timer should be engaged.

Description

DESCRIPTION
An apparatus and a method for controlling timeouts and delays of an error recovery procedure in a digital circuit
Background of the invention
1. Field of the invention
The present invention relates to an apparatus for controlling timeouts and delays of an error recovery procedure in a digital circuit. Further, the present invention relates to a method for controlling timeouts and delays of an error recovery procedure in a digital circuit. In particular, the digital circuit may be a microprocessor or a core of the microprocessor.
2. Description of the related art
In some microprocessors and other digital circuits, a recovery procedure is performed after an error has occurred. During the error recovery procedure, the timeouts and delays have to be controlled. The increasing complexity of microprocessors requires the control of a relatively high number of timers. However, not all microprocessors support a recovery procedure after an error has occurred. Many microprocessors do not even detect the error. Hence, error recovery is optional in microprocessors.
The article "The IBM eServer z900 microprocessor" by T.J. Slegel, E. Pfeffer and J.A. Magee (IBM J. Res. & Dev., Vol. 48, No. 3/4, May/July 2004) describes a microprocessor with a recovery unit.
In known microprocessor systems, the long timeouts are checked by time-of-day (TOD) clocks. The short timeouts in the known systems are checked by individual counters. This concept requires a lot of time for performing the test cases.
Further, the recovery procedure needs to wait for a certain number of cycles whenever a control signal has been sent out. Examples of the control signal are a recovery reset signal or a recovery refresh mode indication signal.
Object of the invention
It is an object of the present invention to provide an improved apparatus and method for controlling timeouts and delays in an error recovery procedure.
Summary of the invention
The above object is achieved by a method as laid out in the independent claims. Further advantageous embodiments of the present invention are described in the dependent claims and are taught in the description below.
The core idea of the invention is to provide a central unit in order to control a single timer of the digital circuit. A timeout logic circuit controls the timer in order to obtain the timeouts and delays in the digital circuit. The timeout logic circuit is coupled with a finite state machine.
The finite state machine comprises a plurality of states and a plurality of transitions. Each transition is arranged between two states of the finite state machine. The states of the finite state machine correspond with operation steps of the error recovery procedure. The transitions of the finite state machine depend on conditions for the error recovery procedure.
The finite state machine is configured or configurable by a data vector, which may act as an activation vector. At least one bit of the data vector corresponds with each state of the finite state machine, respectively. Said bit of the data vector defines whether or not the timer is activated for the corresponding state.
Thus, the timer is controlled by a central device.
The apparatus of the present invention is a part of the recovery state machine hardware. The recovery procedure is defined by this hardware. If a timeout happens during a recovery action, i.e. within the recovery procedure, then a proper recovery escalation action can be started.
The central device is realized by the finite state machine, which may be flexibly configured by the data vector, e.g. the activation vector. The finite state machine is coupled with the timeout logic circuit.
The present invention allows easy adjustment of the delays and timeout values. The present invention requires only a small amount of hardware. In particular, the same counter may be used for all cycle-related waits.
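The role of the per-state bit in the data vector can be illustrated with a minimal Python sketch. The number of states, the bit ordering (bit i belongs to state i) and the names `NUM_STATES`, `ACTIVATION` and `timer_enabled` are assumptions made for illustration only; the text merely requires that at least one bit of the data vector corresponds with each state.

```python
# Hypothetical number of FSM states; the text does not fix one.
NUM_STATES = 6


def timer_enabled(activation_vector: int, state: int) -> bool:
    """Return True if the activation bit for `state` is set.

    Assumption: bit i of the vector belongs to state i.
    """
    if not 0 <= state < NUM_STATES:
        raise ValueError(f"unknown state {state}")
    return (activation_vector >> state) & 1 == 1


# Example configuration: engage the timer in states 1, 3 and 4 only.
ACTIVATION = 0b011010
```

With this configuration, `timer_enabled(ACTIVATION, 3)` is true and `timer_enabled(ACTIVATION, 0)` is false, so a single vector decides centrally which states use the shared timer.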
Brief description of the drawings
The above as well as additional objectives, features and advantages of the present invention will be apparent in the following detailed written description.
The novel and inventive features believed characteristic of the invention are set forth in the appended claims. The invention itself, its preferred embodiments and advantages thereof will be best understood by reference to the following detailed description of preferred embodiments in conjunction with the accompanying drawings, wherein:
Fig. 1 illustrates a schematic diagram of an apparatus for controlling timeouts and delays in an error recovery procedure according to a preferred embodiment of the present invention,
Fig. 2 illustrates a first part of a schematic flow chart diagram of a method for controlling timeouts and delays in an error recovery procedure according to the preferred embodiment of the present invention,
Fig. 3 illustrates a second part of the schematic flow chart diagram of the method for controlling timeouts and delays in the error recovery procedure according to the preferred embodiment of the present invention, and
Fig. 4 illustrates a third part of the schematic flow chart diagram of the method for controlling timeouts and delays in the error recovery procedure according to the preferred embodiment of the present invention.
Detailed description of the invention
Fig. 1 illustrates a schematic diagram of an apparatus for controlling timeouts and delays in an error recovery procedure according to a preferred embodiment of the present invention. In particular, the apparatus is provided for a core of a microprocessor. The apparatus comprises a finite state machine (FSM) 10 and a timeout logic circuit 20.
The finite state machine 10 includes a finite number of states 12. Said states 12 are represented by circular symbols. The number inside the circular symbols characterizes the kind of the state. Transitions 14 between two states 12 are represented by arrows. The transition 14 corresponds with a predetermined condition for performing said transition 14.
Between each two neighbouring states 12 in the sequence "0", "1", "3", "4" and "5" there is a transition 14. Thus, the states 12 with the numbers "0", "1", "3", "4" and "5" form a unidirectional series. Further, there is a transition 14 between each of the states 12 with the numbers "1", "3", "4" and "5" on the one hand and the state 12 with the number "2" on the other hand.
The finite state machine 10 in this example is a so-called acceptor finite state machine. In a digital circuit the finite state machine 10 may be implemented by a programmable logic device, a programmable logic controller, logic gates or storage elements, like flip-flops or latches.
The timeout logic circuit 20 comprises an FSM state register 22, a setup timer 24, an AND gate 26, a setup multiplexer 28 and a timer register 30. The output terminal of the FSM state register 22 is connected to a first input terminal of the AND gate 26.
The output terminal of the setup timer 24 is connected to a second input terminal of the AND gate 26. The output terminal of the AND gate 26 is connected to the setup multiplexer 28. The setup multiplexer 28 is provided for storing a plurality of different timer setup values 32. The setup values 32 can come from registers or scan-only registers, or can be hard coded. For example, the setup values 32 are single bits at the level of VDD or GND. For example, a typical implementation of the present invention combines scan-only registers and hard-coded values in order to save hardware. This comes at the price of reduced timeout value ranges. For example, large timeouts may have the low twelve bits tied to zero, and the small timeouts may have the high twelve bits tied to zero. The output terminal of the setup multiplexer 28 is connected to the input terminal of the timer register 30.
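The trade-off between hardware and timeout range can be made concrete with a short sketch. It assumes a 24-bit timer in which twelve bits are configurable and the other twelve are tied to zero; the timer width and the names `FIELD_BITS`, `large_timeout` and `small_timeout` are assumptions, since the text only mentions the twelve tied bits.

```python
FIELD_BITS = 12                      # configurable bits per setup value
FIELD_MASK = (1 << FIELD_BITS) - 1   # 0xFFF


def large_timeout(config_bits: int) -> int:
    """Large timeout: configurable high bits, low twelve bits tied to zero."""
    return (config_bits & FIELD_MASK) << FIELD_BITS


def small_timeout(config_bits: int) -> int:
    """Small timeout: configurable low bits, high twelve bits tied to zero."""
    return config_bits & FIELD_MASK
```

Under these assumptions a large timeout can only be set in steps of 4096 cycles, while a small timeout cannot exceed 4095 cycles. This is the reduced value range that the text accepts in exchange for fewer registers.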
The finite state machine 10 acts as a recovery finite state machine 10. The finite state machine 10 and the timeout logic circuit 20 are coupled via an activation vector. Initialization values are set up according to the timeout requirement of the respective state 12 of the finite state machine 10. The activation vector describes those states 12 of the finite state machine 10, for which the timer should be engaged.
Preferably, each state 12 of the finite state machine 10 corresponds with one bit of the activation vector. The initialization values describe the timeout values for each cycle in which the timer is engaged.
When a transition 14 between two states 12 of the finite state machine 10 occurs, the timeout logic circuit 20 is initialized.
If the activation vector requests a timeout, then the timeout logic circuit 20 is initialized with an initialization value. If the activation vector does not select a timer initialization vector, then the timeout logic circuit 20 is initialized to zero.
If the timeout logic circuit 20 has been initialized to a non-zero value, then the timeout logic circuit 20 starts decrementing in the next cycle. If a decrement changes the timeout logic circuit 20 to zero, then a timeout indication is generated and interpreted in those states of the finite state machine 10 for which the setup timer 24 has been set up.
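The initialization and countdown behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the circuit itself: the class name `TimeoutLogic`, the dictionary of per-state setup values and the one-bit-per-state activation vector are illustrative choices.

```python
class TimeoutLogic:
    """Behavioural model of a single shared timer driven by FSM transitions."""

    def __init__(self, activation_vector: int, setup_values: dict):
        self.activation_vector = activation_vector  # assumed: bit i = state i
        self.setup_values = setup_values            # state -> initial count
        self.timer = 0

    def on_transition(self, new_state: int) -> None:
        """Re-initialize the timer whenever the FSM enters a new state."""
        engaged = (self.activation_vector >> new_state) & 1
        # Load the setup value if the state requests a timeout, else zero.
        self.timer = self.setup_values.get(new_state, 0) if engaged else 0

    def tick(self) -> bool:
        """Advance one cycle; return True only when a decrement reaches zero."""
        if self.timer == 0:
            return False          # timer idle, no timeout indication
        self.timer -= 1
        return self.timer == 0
```

For example, with `TimeoutLogic(0b10, {1: 3})`, entering state 1 loads the value 3, and the third call to `tick()` returns the timeout indication. Entering a state whose activation bit is clear leaves the timer at zero, so no indication is ever raised there.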
Fig. 2 illustrates a first part of a schematic flow chart diagram of a method for controlling timeouts and delays in an error recovery procedure according to the preferred embodiment of the present invention. The operating steps of said method correspond with the states 12 in Fig. 1. The requesting steps of said method, in which a condition is checked, correspond with the transitions 14 in Fig. 1.
A step 40 is the idle state of the finite state machine 10, i.e. during normal operation of the microprocessor, when no error is detected, the recovery logic does nothing but wait in said idle state until an error is detected.
Next, it is checked in a step 42 whether an error has occurred. If no error has occurred, then the idle operation 40 is continued. If an error has been detected, then in a step 44 a known good machine state is preserved and an error classification is expected. The error classification establishes whether the error is recoverable or not.
In a next step 46 the error classification is evaluated as well. If an XSTP error was indicated or a timeout has occurred, then the next state is the XSTP state in a step 48. The XSTP state means that the microprocessor or other digital circuit is not usable anymore. In this case the clock may be stopped and an attempt will be made to transfer the last known good state of the non-usable processor to a spare processor. If the error was classified as recoverable and no timeout has occurred, then it is checked in a step 50 whether a recovery error has occurred. If a recovery error has occurred, then the XSTP state is likewise entered in the step 48. If no recovery error has occurred, then a drain operation is performed in a step 52.
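The decisions taken in steps 46 and 50 can be summarized as one small function. The function name, the boolean parameters and the string labels for the states are hypothetical; only the branching follows the text.

```python
def after_classification(xstp_error: bool, timeout: bool,
                         recovery_error: bool) -> str:
    """Return the state that follows the error classification (step 44)."""
    # Step 46: an XSTP error or a timeout makes the circuit unusable.
    if xstp_error or timeout:
        return "XSTP"
    # Step 50: a recovery error also escalates to the XSTP state.
    if recovery_error:
        return "XSTP"
    # Recoverable error, no timeout, no recovery error: start draining.
    return "DRAIN"
```

Only the combination of a recoverable classification, no timeout and no recovery error leads to the drain operation; every other outcome escalates to the XSTP state.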
Fig. 3 illustrates a second part of the schematic flow chart diagram of the method for controlling timeouts and delays in the error recovery procedure according to the preferred embodiment of the present invention.
After the step 52 of the drain operation, it is checked in a further step 53 whether a timeout has occurred. If the timeout has occurred, then a fence operation is performed in a step 56. During said fence operation the core of the microprocessor does not communicate with the interface between the microprocessors and a memory. If no timeout has occurred, then it is checked in a step 54 whether the drain operation has been finished. If the drain operation has not been finished yet, then the drain operation is continued in the step 52. If the drain operation has been finished, then the fence operation is likewise performed in the step 56.
After the fence operation in the step 56, it is checked in a step 57 whether the timeout has occurred. If no timeout has occurred, then the fence operation is continued in the step 56. If the timeout has occurred, then a reset operation is performed in a step 58.
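The two uses of the timer in Fig. 3 differ in an instructive way, which the following sketch tries to bring out. The function name `next_state_fig3`, the string state labels and the `drain_done` flag are assumptions for illustration.

```python
def next_state_fig3(state: str, timeout: bool, drain_done: bool = False) -> str:
    """Return the next state for the drain and fence steps of Fig. 3."""
    if state == "DRAIN":
        # Steps 53 and 54: the timeout is an upper bound. The fence
        # operation starts when draining finishes or the timer expires.
        return "FENCE" if (timeout or drain_done) else "DRAIN"
    if state == "FENCE":
        # Step 57: the timeout acts as a pure delay. Only its expiry
        # moves the machine on to the reset operation.
        return "RESET" if timeout else "FENCE"
    raise ValueError(f"state {state!r} is not part of Fig. 3")
```

In the drain state the timer bounds an operation that normally signals its own completion, whereas in the fence state the timer is the only exit condition. Both cases are served by the same shared timer.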
Fig. 4 illustrates a third part of the schematic flow chart diagram of the method for controlling timeouts and delays in the error recovery procedure according to the preferred embodiment of the present invention.
After the reset operation of the step 58, it is checked in a step 59 whether the timeout has occurred. If no timeout has occurred, then the reset operation is continued in the step 58. The steps 58 and 59 are an example of a delay. After a reset signal is activated, it is necessary to wait until said reset signal has reached each part of the microprocessor. Since the reset signal can be staged many times for cycle time reasons, it is necessary to wait for a number of cycles. In this case the timeout has to occur. If the timeout has occurred, then a refresh operation is performed in a step 60.
In a step 62 it is checked whether the refresh operation has been finished. If the refresh operation has not been finished yet, then the refresh operation is continued in the step 60. If the refresh operation has been finished, then in a next step 64 an automatic built-in self-test (ABIST) is performed. After the step 64 the recovery procedure continues in a conventional way. Some of the following recovery steps, which are not described, also use the timeout method according to the present invention.
The present invention can also be embedded in a computer program product which comprises all the features enabling the implementation of the methods described herein. Further, when loaded in a computer system, said computer program product is able to carry out these methods.
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as defined by the appended claims.
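The sequence of Fig. 4 can be traced cycle by cycle with a simple simulation. It is a sketch under assumptions: `reset_delay_cycles` stands in for the timer setup value of the reset state, `refresh_cycles` stands in for the moment at which the hardware reports that the refresh has finished, and the function name `run_fig4` is hypothetical.

```python
from typing import List


def run_fig4(reset_delay_cycles: int, refresh_cycles: int) -> List[str]:
    """Return the state occupied in each cycle, ending with the ABIST step."""
    trace = []

    # Steps 58 and 59: hold the reset state until the timer has expired.
    timer = reset_delay_cycles
    while timer > 0:
        trace.append("RESET")
        timer -= 1

    # Steps 60 and 62: stay in refresh until it is reported as finished.
    remaining = refresh_cycles
    while remaining > 0:
        trace.append("REFRESH")
        remaining -= 1

    # Step 64: the automatic built-in self-test follows the refresh.
    trace.append("ABIST")
    return trace
```

Calling `run_fig4(2, 3)` gives two cycles in the reset state, three in the refresh state, and then the ABIST step. The reset state is left only because of the timer, which matches the description of steps 58 and 59 as a delay.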
LIST OF REFERENCE NUMERALS
10 finite state machine (FSM)
12 state
14 transition
20 timeout logic circuit
22 FSM state register
24 setup timer
26 AND gate
28 setup multiplexer
30 timer register
32 initialization values
40 step of an idle operation
42 step of requesting an error
44 step of an error classification
46 step of requesting a timeout
48 step of an XSTP state
50 step of requesting a recovery error
52 step of a drain operation
53 step of requesting a timeout
54 step of requesting the drain operation
56 step of requesting a fence operation
57 step of requesting a timeout
58 step of requesting a reset operation
59 step of requesting a timeout
60 step of a refresh operation
62 step of requesting a refresh
64 step of an ABIST

Claims (19)

CLAIMS
1. An apparatus for controlling timeouts and delays of an error recovery procedure in a digital circuit, wherein
- the apparatus comprises a finite state machine (10) with a plurality of states (12) and a plurality of transitions (14),
- the transitions (14) are arranged between two states (12), respectively,
- the states (12) correspond with operation steps (40, 44, 52, 56, 58, 64) of the error recovery procedure,
- the transitions (14) of the finite state machine (10) depend on conditions (46, 50, 53, 57, 59, 62) for the error recovery procedure,
- the finite state machine (10) is coupled with a timeout logic circuit (20), and
- the timeout logic circuit (20) is provided to control a timer in order to obtain the timeouts and delays in the digital circuit.
2. The apparatus according to claim 1, wherein the finite state machine (10) is configured or configurable by a data vector.
3. The apparatus according to claim 2, wherein at least one bit of the data vector corresponds with each state (12) of the finite state machine (10), respectively.
4. The apparatus according to claim 3, wherein said bit of the data vector defines whether the timer is activated for the corresponding state (12).
5. The apparatus according to any one of the preceding claims, wherein the finite state machine (10) is an acceptor finite state machine.
6. The apparatus according to any one of the preceding claims, wherein the digital circuit is a microprocessor.
7. The apparatus according to any one of the preceding claims, wherein the apparatus is realized in hardware or a combination of hardware and software.
8. A method for controlling timeouts and delays of an error recovery procedure in a digital circuit, wherein
- said method uses a finite state machine (10) with a plurality of states (12) and a plurality of transitions (14),
- the transitions (14) are arranged between two states (12), respectively,
- operation steps (40, 44, 52, 56, 58, 64) of the error recovery procedure correspond with the states (12) of the finite state machine (10),
- conditions (46, 50, 53, 57, 59, 62) for the error recovery procedure define the transitions (14),
- the finite state machine (10) controls the timeout logic circuit (20), and
- the timeout logic circuit (20) controls a timer in order to obtain the timeouts and delays in the digital circuit.
9. The method according to claim 8, wherein a data vector configures the finite state machine (10).
10. The method according to claim 8 or 9, wherein at least one bit of the data vector corresponds with each state (12) of the finite state machine (10), respectively.
11. The method according to any one of the claims 8 to 10, wherein said bit of the data vector activates the timer for the corresponding state (12).
12. The method according to any one of the claims 8 to 11, wherein the data vector acts as an activation vector for the finite state machine (10).
13. The method according to any one of the claims 8 to 12, wherein initialization values are defined.
14. The method according to claim 13, wherein the initialization values define the timeout values for each cycle in which the timer is engaged.
15. The method according to any one of the claims 8 to 14, wherein said method uses an acceptor finite state machine.
16. The method according to any one of the claims 8 to 15, wherein said method is provided for a microprocessor.
17. The method according to any one of the claims 8 to 16, wherein said method is provided for a core of a microprocessor.
18. The method according to any one of the claims 8 to 17, wherein the method is realized in hardware, software or a combination of hardware and software.
19. A computer program product stored on a computer usable medium, comprising computer readable program means for causing a computer to perform a method according to any one of the preceding claims 8 to 18.
GB0822778.7A 2008-01-24 2008-12-15 An apparatus and a method for controlling timeouts and delays of an error recovery procedure in a digital circuit Active GB2456656B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08150585 2008-01-24

Publications (3)

Publication Number Publication Date
GB0822778D0 GB0822778D0 (en) 2009-01-21
GB2456656A GB2456656A (en) 2009-07-29
GB2456656B GB2456656B (en) 2012-11-07

Family

ID=40326081

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0822778.7A Active GB2456656B (en) 2008-01-24 2008-12-15 An apparatus and a method for controlling timeouts and delays of an error recovery procedure in a digital circuit

Country Status (1)

Country Link
GB (1) GB2456656B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229748B1 (en) 2017-11-28 2019-03-12 International Business Machines Corporation Memory interface latch with integrated write-through function
US10381098B2 (en) 2017-11-28 2019-08-13 International Business Machines Corporation Memory interface latch with integrated write-through and fence functions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4503535A (en) * 1982-06-30 1985-03-05 Intel Corporation Apparatus for recovery from failures in a multiprocessing system
US6327675B1 (en) * 1998-07-31 2001-12-04 Nortel Networks Limited Fault tolerant system and method
US6397346B1 (en) * 1996-06-03 2002-05-28 Sun Microsystems, Inc. Method and apparatus for controlling server activation in a multi-threaded environment
US6421757B1 (en) * 1998-09-30 2002-07-16 Conexant Systems, Inc Method and apparatus for controlling the programming and erasing of flash memory
US20070168720A1 (en) * 2005-11-30 2007-07-19 Oracle International Corporation Method and apparatus for providing fault tolerance in a collaboration environment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8121828B2 (en) * 1999-01-28 2012-02-21 Ati Technologies Ulc Detecting conditions for transfer of execution from one computer instruction stream to another and executing transfer on satisfaction of the conditions

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229748B1 (en) 2017-11-28 2019-03-12 International Business Machines Corporation Memory interface latch with integrated write-through function
US10381098B2 (en) 2017-11-28 2019-08-13 International Business Machines Corporation Memory interface latch with integrated write-through and fence functions
US10916323B2 (en) 2017-11-28 2021-02-09 International Business Machines Corporation Memory interface latch with integrated write-through and fence functions

Also Published As

Publication number Publication date
GB2456656B (en) 2012-11-07
GB0822778D0 (en) 2009-01-21

Similar Documents

Publication Publication Date Title
TWI338835B (en) Method and apparatus for controlling a data processing system during debug
US7574638B2 (en) Semiconductor device tested using minimum pins and methods of testing the same
CN105406842B (en) Output timing control circuit of semiconductor device and method thereof
US9870233B2 (en) Initializing a memory subsystem of a management controller
CN108334184B (en) System chip for controlling memory power using handshake process and method of operating the same
WO2009114288A1 (en) Address multiplexing in pseudo-dual port memory
JP2009527861A (en) Data processing system and method having address translation bypass
KR102354764B1 (en) Providing memory training of dynamic random access memory (dram) systems using port-to-port loopbacks, and related methods, systems, and apparatuses
US9910757B2 (en) Semiconductor device, log acquisition method and electronic apparatus
US10802742B2 (en) Memory access control
CN108962333B (en) Semiconductor device including power gating circuit and method of repairing the same
US20070038795A1 (en) Asynchronous bus interface and processing method thereof
US8266464B2 (en) Power controller, a method of operating the power controller and a semiconductor memory system employing the same
GB2456656A (en) Controlling timeouts of an error recovery procedure in a digital circuit
US9450587B2 (en) Test circuit and test method of semiconductor apparatus
US9660617B2 (en) Semiconductor apparatus
US11486913B2 (en) Electronic device for detecting stuck voltage state and method of monitoring stuck voltage state
US20180090221A1 (en) Boot-up control circuit and semiconductor apparatus including the same
US7852701B1 (en) Circuits for and methods of determining a period of time during which a device was without power
JP2005108434A (en) Semiconductor storage device
US10761581B2 (en) Method and module for programmable power management, and system on chip
US10496414B2 (en) Semiconductor device and method of operating the same
US11579776B2 (en) Optimizing power consumption of memory repair of a device
JP4587000B2 (en) Chip select circuit
JP2006127091A (en) Semiconductor integrated circuit

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20130107