EP2176757A2 - Watchdog mechanism with fault recovery - Google Patents
Watchdog mechanism with fault recovery
- Publication number
- EP2176757A2 (application number EP08786592A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- watchdog
- escalation
- level
- events
- correct
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
Definitions
- the invention relates to a method for handling watchdog events in an electronic device.
- the invention also relates to an electronic device adapted to handle watchdog events.
- Watchdog mechanisms are used in electronic devices, like watchdog devices, microcontrollers, digital signal processors (DSPs) and other devices having a CPU and executing programs. These electronic devices are usually part of an electronic system, e.g., acting as a system supervisor.
- a watchdog mechanism is typically based on a counter that is clocked by the system clock or a clock which is derived from the system clock.
- the counter issues a watchdog fault every time a predefined counter state is reached.
- the watchdog fault state entails a system reset in order to bring the system back into a well-defined initial state in case the counter is not serviced by a watchdog trigger before the predefined counter state is reached, for example because the program issuing the watchdog trigger hangs or malfunctions.
- the system reset may not be the appropriate means to overcome the problems of the CPU.
- a mere reset may cause a loss of internal data.
- the CPU is not available for further data processing, which might be a waste of CPU processing time, if only a minor and temporary problem exists.
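The conventional counter-based mechanism described above can be illustrated with a minimal sketch. The class name `WatchdogCounter`, the methods `tick` and `service`, and the limit of three clock cycles are illustrative assumptions, not taken from the patent text, and the sketch ignores whether a trigger is "correct" in any stricter sense.

```python
class WatchdogCounter:
    """Counter-based watchdog: a fault is issued when the counter reaches `limit`."""

    def __init__(self, limit: int) -> None:
        self.limit = limit   # predefined counter state that signals a fault
        self.count = 0

    def service(self) -> None:
        """Watchdog trigger from the supervised program: restart the count."""
        self.count = 0

    def tick(self) -> bool:
        """Advance by one clock cycle; return True if a watchdog fault is issued."""
        self.count += 1
        if self.count >= self.limit:
            self.count = 0   # restart so that a fault is issued on every expiry
            return True
        return False


wd = WatchdogCounter(limit=3)
healthy = [wd.tick(), wd.tick()]      # two clock cycles, no fault yet
wd.service()                          # the program services the watchdog in time
hung = [wd.tick() for _ in range(6)]  # the program hangs: no further service calls
```

In this sketch a hung program produces a fault on every third cycle. In a conventional design, each of those faults would lead to a system reset.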
- the invention provides a method for handling watchdog events of an electronic device.
- a watchdog fault is detected, which is a watchdog event in which a watchdog trigger is not correctly serviced.
- upon detection of the watchdog fault, the electronic device enters into a first escalation level from the normal mode.
- the escalation level can be one of nx escalation levels, wherein nx is an integer equal to or greater than 1.
- in this first escalation level, correct watchdog events, which are watchdog events in which a watchdog trigger is correctly serviced, and watchdog faults are detected.
- the electronic device remains in the first escalation level until a specific first escalation condition or a first de-escalation condition is met. Both conditions can be based on the detected correct watchdog events and the detected watchdog faults.
- the electronic device can recover, in a recovery step, from any of the nx escalation levels to any previous level or mode if a specific de-escalation condition is met. So, the invention provides at least one escalation level, in which a further escalation condition is monitored before the electronic device proceeds to another level.
- a de-escalation or recovery mechanism allows the electronic device to step back to a previous escalation level or to the normal mode. This allows the watchdog faults and system resets to be handled in a more flexible way.
- the electronic device may continue program execution in this first escalation state until the second escalation condition is met. If a specific de-escalation condition is met, the electronic device can even return to normal mode or a lower escalation level. This aspect provides that after a certain time of normal operation, the system can re-establish an entirely normal functionality.
- the program can be any kind of sequence of operations implemented with software, hardware, finite state machines, microcode, nanocode, logic gates, etc.
- the de-escalation condition can be defined such that n consecutive correct watchdog events are detected before another watchdog fault is detected, wherein n is greater than 1.
- the number and sequence of correct watchdog events is a reliable indicator that normal functionality of the electronic device is resumed.
- the first escalation condition can be met if the number of counted watchdog faults exceeds a maximum number of watchdog faults, or if a correct watchdog event is not detected before expiration of a first recovery time after detection of the last watchdog fault. So, there can be a time limit in the form of a recovery time, during which at least one correct watchdog event must be detected. Further, every time a watchdog fault occurs, the corresponding count of watchdog faults can be increased.
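The escalation and de-escalation conditions of the first escalation level can be sketched as a small event monitor. This is a sketch under stated assumptions rather than the claimed implementation: the names `LevelMonitor`, `on_fault`, `on_correct` and `on_timer` are hypothetical, the fault that caused entry into the level is assumed to count toward the maximum, and only the first correct event after a fault is checked against the recovery time.

```python
from dataclasses import dataclass

STAY, ESCALATE, DE_ESCALATE = "stay", "escalate", "de-escalate"


@dataclass
class LevelMonitor:
    max_faults: int       # maximum number of watchdog faults tolerated in this level
    recovery_time: float  # window after the last fault for a correct event to arrive
    n_correct: int        # consecutive correct events required to de-escalate
    entered_at: float     # time of the watchdog fault that caused entry

    def __post_init__(self) -> None:
        self.faults = 1                      # assumption: the entry fault is counted
        self.correct_run = 0                 # consecutive correct events since last fault
        self.last_fault_at = self.entered_at

    def on_fault(self, now: float) -> str:
        self.faults += 1
        self.correct_run = 0
        self.last_fault_at = now
        return ESCALATE if self.faults > self.max_faults else STAY

    def on_correct(self, now: float) -> str:
        if self.correct_run == 0 and now - self.last_fault_at > self.recovery_time:
            return ESCALATE                  # first correct event arrived too late
        self.correct_run += 1
        return DE_ESCALATE if self.correct_run >= self.n_correct else STAY

    def on_timer(self, now: float) -> str:
        """Poll without an event: escalate once the recovery window has elapsed."""
        waiting = self.correct_run == 0
        expired = now - self.last_fault_at > self.recovery_time
        return ESCALATE if waiting and expired else STAY


def fresh() -> LevelMonitor:
    return LevelMonitor(max_faults=3, recovery_time=10.0, n_correct=2, entered_at=0.0)


level = fresh()
recovered = [level.on_correct(2.0), level.on_correct(4.0)]
level = fresh()
too_many = [level.on_fault(1.0), level.on_fault(2.0), level.on_fault(3.0)]
timed_out = fresh().on_timer(11.0)
```

The three short runs exercise the three exits from the level: recovery after two consecutive correct events, escalation after too many faults, and escalation after the recovery time expires without a correct event.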
- Further escalation levels can be provided, up to a theoretically unlimited number.
- the second escalation level can have substantially the same or different properties with respect to the first escalation level.
- the electronic device can enter into a second escalation level after leaving the first escalation level and remain in the second escalation level until a second escalation condition is met.
- the electronic device can recover, in a recovery step, from any escalation level, i.e., any of the nx escalation levels, to any previous level or mode, for example to the first escalation level or to the normal mode, if a second de-escalation condition is met.
- the watchdog faults can be detected and counted and correct watchdog events can be detected concurrently.
- the second de-escalation condition can be defined such that m consecutive correct watchdog events are detected before another watchdog fault is detected, wherein m is greater than 1.
- the second escalation condition can be predetermined such that a maximum number of watchdog faults is reached or a correct watchdog event is not detected before a second recovery time has expired after detection of the last watchdog fault.
- the length of the second and the first recovery times can be the same or different.
- a reset signal can be activated in the second escalation level.
- the reset signal can be used to reset specific parts or stages of the system (e.g., the CPU) or a limited number of functional blocks of an electronic device.
- the watchdog will preferably not be reset in the second escalation level.
- the reset signal can preferably be deactivated when a finite reset time has expired. Also, the reset signal can be activated for the finite reset time each time a watchdog fault is detected. Accordingly, the reset signal is only asserted for a time sufficiently long in order to correctly reset the system. However, after having reset the system, the electronic device remains in the second escalation level and continues operation until a second escalation condition is met.
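The finite reset pulse described above can be sketched as follows. The class name `ResetPulse`, its methods, and the time values are illustrative assumptions. The sketch only models the timing of the signal, not what is actually reset.

```python
class ResetPulse:
    """Reset signal that is asserted for a finite time after each watchdog fault."""

    def __init__(self, reset_time: float) -> None:
        self.reset_time = reset_time  # how long the reset signal stays active
        self.asserted_at = None       # time of the most recent fault, if any

    def on_fault(self, now: float) -> None:
        self.asserted_at = now        # (re)start the pulse on every fault

    def is_active(self, now: float) -> bool:
        if self.asserted_at is None:
            return False
        return now - self.asserted_at < self.reset_time


reset = ResetPulse(reset_time=2.0)
before = reset.is_active(0.0)   # no fault yet
reset.on_fault(1.0)
during = reset.is_active(2.5)   # 1.5 time units after the fault
after = reset.is_active(3.0)    # reset time has expired
reset.on_fault(5.0)             # the next fault starts a new pulse
again = reset.is_active(5.5)
```

Because the pulse ends on its own, the device can stay in the second escalation level and keep evaluating watchdog events after the reset.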
- after leaving any escalation level, or numerous escalation levels of a first type or a second type following the first type, the electronic device, and therefore the system, can enter into a final safe state.
- the final safe state is a state where the system to which the electronic device belongs is secured by measures that are specific for the application.
- the electronic device can be a microcontroller in a car used for controlling the brakes. If the microcontroller malfunctions, i.e., watchdog faults occur, the microcontroller may then pass from normal mode to a first escalation level and from there to a second escalation level. If the device still malfunctions after being reset in the second escalation level, the device enters into a safe state, where the basic functionality of the brakes is maintained. After having performed the necessary steps to ensure that the brakes continue to work, the microcontroller can then, for example, switch off. Other applications may require that specific data be copied from volatile memory to non-volatile memory when the safe state is reached.
- watchdog events are handled in a more flexible way. If a processor, which uses the invention, produces a watchdog fault, the processor can remain in the first escalation level. A reset pulse is not issued. Further, normal operation of the processor can continue and important processing time is preserved. The possibility of recovering or de-escalating from any escalation level, either in a stepwise manner or directly to normal mode, gives an additional degree of reliability and system stability.
- the invention also relates to an electronic device, in particular to a microcontroller or a processor having an integrated CPU, which is adapted to handle watchdog events.
- the electronic device is adapted to detect a watchdog fault in a normal operating mode, which is a watchdog event in which a watchdog trigger is not correctly serviced. Further, the electronic device is adapted to enter from the normal mode into a first escalation level upon detection of the watchdog fault, which can be one of nx escalation levels, wherein nx is an integer equal to or greater than 1.
- the electronic device can then (i.e., in the first escalation level) detect correct watchdog events, which are watchdog events in which a watchdog trigger is correctly serviced, and concurrently detect watchdog faults.
- the electronic device embodiment is adapted to leave the first escalation level if a first escalation condition is met, which is based on the detected correct watchdog events and the detected watchdog faults. Accordingly, the electronic device is adapted in accordance with some or all of the aspects explained hereinabove.
- Each of the escalation levels and also the safe state mode may include several states.
- the first escalation level may include a first state and a second state dependent on the last detected watchdog event.
- the second escalation level may include two states: a first state, if the last detected event was a watchdog fault; and a second state, if the last event was a correct watchdog event. The electronic device may then toggle between the two states until the escalation condition is reached.
- one of the states can include issuing of the reset pulse, whereas the other state does not trigger a reset signal.
- a recovery step to a lower escalation level can advantageously only start from a specific state within a level. This might preferably be a state, in which a correct watchdog event has previously been detected at least once.
- FIG. 1 shows a simplified state diagram illustrating the steps according to the invention
- FIGS. 2A-2C show signals relating to the first escalation level according to the invention.
- FIGS. 3A-3C show signals relating to the second escalation level according to the invention.
- FIG. 1 shows the different levels or states of an example electronic device implemented in accordance with the invention.
- the watchdog fault and the system reset are typically issued as a signal, but may also be available as a flag indicating the signal value.
- in a normal mode, the electronic device remains in state S1 as long as it sees correct watchdog triggers (i.e., correct WD triggers), and the signals and the related flags WDFault and Reset are inactive.
- the parameters sa and ha are initialized and set to zero. If a watchdog fault (i.e., an incorrect WD trigger) is detected, the electronic device enters into state S2 in escalation level 1.
- the parameter sa is increased by one every time a watchdog fault (incorrect WD trigger) is detected.
- in state S2, the WDFault signal and the related flag remain active, and the reset signal Reset and the related flag remain inactive. If a correct WD trigger is detected, the processor passes to state S3, where the WDFault signal and the related flag are set inactive. The electronic device remains in state S3 as long as correct WD triggers are detected.
- the electronic device proceeds to state S4 in the second escalation level (escalation level 2). Also in state S4, the WDFault signal and the related flag are set active as long as no correct WD triggers are detected. Each time an incorrect WD trigger is detected, the parameter ha is increased by one. If a correct WD trigger occurs, the processor moves on to state S5, and the WDFault signal and the related flag are set inactive. Further, the reset signal Reset and the related flag become inactive.
- a recovery step could be provided that leads from state S5 to S3.
- Each escalation level can thus be left in two ways, either to a higher escalation level or to a lower (i.e., previous) escalation level. Escalation level 1 would then be the previous escalation level with respect to escalation level 2.
- a recovery step could be implemented leading back to any escalation level lower than nx (i.e., any previous escalation level) and also to normal mode.
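The states of FIG. 1 can be sketched as a small event-driven state machine. Several details are assumptions made for illustration and are not confirmed by the text: the class and method names, the added `SAFE` state, escalation when a counter exceeds its maximum, asserting Reset on each fault handled in state S4, and a direct recovery to S1 after n (level 1) or m (level 2) consecutive correct triggers. The recovery-time conditions are left out to keep the sketch short.

```python
from enum import Enum


class State(Enum):
    S1 = "normal mode"
    S2 = "level 1, last event was a fault"
    S3 = "level 1, last event was correct"
    S4 = "level 2, last event was a fault"
    S5 = "level 2, last event was correct"
    SAFE = "safe state"


class EscalatingWatchdog:
    def __init__(self, sa_max: int, ha_max: int, n: int, m: int) -> None:
        self.sa_max, self.ha_max, self.n, self.m = sa_max, ha_max, n, m
        self._to_normal()

    def _to_normal(self) -> None:
        self.state = State.S1
        self.sa = 0            # faults counted in escalation level 1
        self.ha = 0            # faults counted in escalation level 2
        self.run = 0           # consecutive correct triggers since the last fault
        self.wd_fault = False  # WDFault flag
        self.reset = False     # Reset flag

    def event(self, correct: bool) -> State:
        """Feed one watchdog event (True = correct trigger) and return the new state."""
        if self.state is not State.SAFE:      # the safe state is final
            if correct:
                self._on_correct()
            else:
                self._on_fault()
        return self.state

    def _on_correct(self) -> None:
        if self.state is State.S1:
            return                            # normal mode: nothing to do
        self.run += 1
        self.wd_fault = False
        self.reset = False
        level1 = self.state in (State.S2, State.S3)
        if self.run >= (self.n if level1 else self.m):
            self._to_normal()                 # recovery step back to S1
        else:
            self.state = State.S3 if level1 else State.S5

    def _on_fault(self) -> None:
        self.run = 0
        self.wd_fault = True
        if self.state in (State.S1, State.S2, State.S3):
            self.sa += 1
            if self.sa > self.sa_max:
                self.state, self.reset = State.S4, True  # escalate to level 2
            else:
                self.state = State.S2
        else:
            self.ha += 1
            if self.ha > self.ha_max:
                self.state = State.SAFE
            else:
                self.state, self.reset = State.S4, True  # assumed: reset on each fault


wd = EscalatingWatchdog(sa_max=2, ha_max=1, n=2, m=2)
trace = [wd.event(c).name for c in (False, True, False, False, True, True)]

wd2 = EscalatingWatchdog(sa_max=1, ha_max=1, n=2, m=2)
to_safe = [wd2.event(False).name for _ in range(4)]
```

The first run escalates to level 2 and then recovers to normal mode after two consecutive correct triggers. The second run sees only faults and ends in the safe state.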
- FIGS. 2A-2C show signals relating to the first escalation level, escalation level 1.
- FIG. 2A illustrates a situation where sa < sa_max and n correct WD triggers are received within the first recovery time t_recover1. Accordingly, the reset signal Reset remains inactive, and the watchdog fault signal WDFault toggles from high to low when a first watchdog fault occurs. However, within the recovery time t_recover1, n correct WD triggers occur and the watchdog fault signal WDFault is set inactive, i.e., logic high again.
- the system will move in a recovery step to state S1, i.e., back to normal mode.
- FIGS. 3A-3C show signals relating to the second escalation level.
- a reset pulse is issued in order to reset the electronic device.
- the situation for ha < ha_max is shown in FIG. 3A. If m correct watchdog events are detected (indicated by the m-th WD trigger in FIG. 3A), WDFault is set inactive, i.e., WDFault is set to logic high. This will also de-escalate the system, and the system will continue in a lower level, i.e., in normal mode. A recovery step to escalation level 1, i.e., to any previous escalation level, is also conceivable.
- FIG. 3A relates to a situation where ha < ha_max and a correct WD trigger is detected within the second recovery time t_recover2.
- An electronic device, such as any integrated electronic device with a CPU, can be adapted to perform the described method steps.
- the number of escalation levels is not limited to one first escalation level without reset and a second escalation level with a reset function.
- the number of escalation levels of the first type or the second type can be any integer equal to or greater than 1.
- the sequence of escalation levels with and without reset can be any sequence of first and second escalation levels.
- the safe state can also be reached directly after the first escalation level, if a second escalation level with reset is not required.
- the recovery or de-escalation steps are not limited. Any higher escalation level can have a recovery mechanism so as to recover to any lower or previous escalation level.
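The generalization to an arbitrary sequence of escalation levels, with or without reset and with a configurable recovery target, can be sketched in a data-driven way. All names here (`Level`, `replay`, `NORMAL`, `SAFE`) and the threshold semantics are hypothetical choices for illustration. Recovery times are again omitted, and the fault that enters level 0 from normal mode is assumed to be counted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NORMAL = -1   # hypothetical marker for normal mode
SAFE = None   # hypothetical marker for the final safe state


@dataclass(frozen=True)
class Level:
    with_reset: bool          # second-type level: a fault asserts the reset signal
    max_faults: int           # faults tolerated before escalating further
    recover_after: int        # consecutive correct events needed to de-escalate
    recover_to: int = NORMAL  # index of the level to recover to, or NORMAL


def replay(levels: List[Level], events: List[bool]) -> List[Tuple[Optional[int], bool]]:
    """Replay watchdog events (True = correct); return (position, reset) per event."""
    pos: Optional[int] = NORMAL
    faults = streak = 0
    trace = []
    for correct in events:
        if pos is SAFE:
            pass                                   # final state: nothing changes
        elif correct:
            if pos != NORMAL:
                streak += 1
                if streak >= levels[pos].recover_after:
                    pos, faults, streak = levels[pos].recover_to, 0, 0
        else:
            streak = 0
            if pos == NORMAL:
                pos, faults = 0, 1                 # first fault enters level 0
            else:
                faults += 1
                if faults > levels[pos].max_faults:
                    pos, faults = (pos + 1 if pos + 1 < len(levels) else SAFE), 0
        # Reset is asserted only for a fault handled in a level of the second type.
        reset = (not correct) and pos is not SAFE and levels[pos].with_reset
        trace.append((pos, reset))
    return trace


two_levels = [Level(False, 2, 2, NORMAL), Level(True, 1, 2, 0)]
stepwise = replay(two_levels, [False, False, False, True, True, True, True])

one_level = [Level(False, 1, 2)]
direct_safe = [pos for pos, _ in replay(one_level, [False, False, True])]
```

The first configuration recovers stepwise from level 1 to level 0 and then to normal mode. The second has no level with reset, so the safe state is reached directly after the first escalation level.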
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102007035584A DE102007035584B4 (de) | 2007-07-30 | 2007-07-30 | Watchdog device for monitoring an electronic system |
DE102007035586A DE102007035586B4 (de) | 2007-07-30 | 2007-07-30 | Watchdog device for monitoring an electronic system |
US1675207P | 2007-12-26 | 2007-12-26 | |
US1675107P | 2007-12-26 | 2007-12-26 | |
PCT/EP2008/059957 WO2009016187A2 (en) | 2007-07-30 | 2008-07-29 | Watchdog mechanism with fault recovery |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2176757A2 (de) | 2010-04-21 |
Family
ID=40219984
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08786592A Ceased EP2176757A2 (de) | 2007-07-30 | 2008-07-29 | Watchdog mechanism with fault recovery |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2176757A2 (de) |
WO (1) | WO2009016187A2 (de) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006014400A1 (en) * | 2004-07-06 | 2006-02-09 | Intel Corporation | System and method to detect errors and predict potential failures |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4291403A (en) * | 1979-05-22 | 1981-09-22 | Rockwell International Corporation | Digital implementation of parity monitor and alarm |
US4598355A (en) * | 1983-10-27 | 1986-07-01 | Sundstrand Corporation | Fault tolerant controller |
US5600785A (en) * | 1994-09-09 | 1997-02-04 | Compaq Computer Corporation | Computer system with error handling before reset |
US20030028680A1 (en) * | 2001-06-26 | 2003-02-06 | Frank Jin | Application manager for a content delivery system |
2008
- 2008-07-29: WO application PCT/EP2008/059957, published as WO2009016187A2 (status: active, application filing)
- 2008-07-29: EP application EP08786592A, published as EP2176757A2 (status: not active, ceased)
Non-Patent Citations (1)
Title |
---|
See also references of WO2009016187A2 * |
Also Published As
Publication number | Publication date |
---|---|
WO2009016187A2 (en) | 2009-02-05 |
WO2009016187A3 (en) | 2009-04-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7966528B2 (en) | Watchdog mechanism with fault escalation | |
JP3633092B2 (ja) | Microcomputer fault monitoring device | |
JP5244981B2 (ja) | Microcomputer and operating method thereof | |
CN106527249B (zh) | Peripheral watchdog timer | |
US7966527B2 (en) | Watchdog mechanism with fault recovery | |
CN107077408A (zh) | Fault handling method, computer system, baseboard management controller and system | |
CN113946148B (zh) | MCU chip wake-up system based on multi-ECU cooperative control | |
CN113407391A (zh) | Fault handling method, computer system, baseboard management controller and system | |
CN109960599B (zh) | Chip system, watchdog self-test method thereof, and electrical equipment | |
EP2176757A2 (de) | Watchdog mechanism with fault recovery | |
Lamberson | Single and Multistage Watchdog Timers | |
KR101300806B1 (ko) | Apparatus and method for handling malfunctions in a multi-process system | |
US8230286B1 (en) | Processor reliability improvement using automatic hardware disablement | |
JP4534995B2 (ja) | Restart method for digital protective relay device | |
JP2009053752A (ja) | Watchdog processing method and abnormality detection circuit | |
CN102521089A (zh) | Hardware device error detection method | |
US7962264B2 (en) | Method and apparatus for adapting a monitoring device of a control unit for a restraint system of a motor vehicle | |
JP2017007539A (ja) | Control device | |
JP4402715B2 (ja) | Microcontroller system and driving method thereof | |
CN116431377B (zh) | Watchdog circuit | |
JP4396572B2 (ja) | Reset method for signal processing device | |
JP2003097345A (ja) | Electronic control device for vehicle | |
JP2018097442A (ja) | Electronic control device | |
JPH02293939A (ja) | Processing method upon stack overflow detection | |
CN112596916A (zh) | Dual-core lockstep error recovery system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20100301 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA MK RS |
|
17Q | First examination report despatched |
Effective date: 20100621 |
|
DAX | Request for extension of the european patent (deleted) | ||
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20190205 |