CN117591347A - System abnormality detection device, processor and embedded system

Publication number
CN117591347A
Authority
CN
China
Prior art keywords
task
module
counter
interrupt
hardware clock
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410073942.6A
Other languages
Chinese (zh)
Other versions
CN117591347B (en)
Inventor
朱强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jidu Technology Co Ltd
Original Assignee
Beijing Jidu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jidu Technology Co Ltd
Priority to CN202410073942.6A
Priority claimed from CN202410073942.6A
Publication of CN117591347A
Application granted
Publication of CN117591347B
Legal status: Active

Abstract

The invention discloses a system abnormality detection device, a processor and an embedded system. The system abnormality detection device comprises a first detection module; the first detection module comprises a first hardware clock, a time recording module, a timeout detection module and a first abnormality recording module; the time recording module is used for recording the start time of a first task based on the first hardware clock, and determining the expected end time of the first task according to the start time of the first task and a first preset running time of the first task; the timeout detection module is used for detecting whether the first task has ended when the first hardware clock counts to the expected end time of the first task, and, if the first task is detected not to have ended, triggering the first abnormality recording module to generate abnormality information indicating that the first task is a timed-out abnormal task.

Description

System abnormality detection device, processor and embedded system
Technical Field
The present invention relates to the field of software technologies, and in particular, to a system anomaly detection device, a processor, and an embedded system.
Background
When software runs abnormally, the system can reset and record reset information, but the cause of the system abnormality cannot be accurately located from the reset information alone. Therefore, a scheme that can record the cause of a system abnormality is needed.
Disclosure of Invention
An object of the present invention is to provide a system abnormality detection device, a processor, and an embedded system, which can record abnormality information of the system.
According to a first aspect of the present invention, there is provided a system abnormality detection device. The device comprises a first detection module; the first detection module comprises a first hardware clock, a time recording module, a timeout detection module and a first abnormality recording module; the time recording module is used for recording the start time of a first task based on the first hardware clock, and determining the expected end time of the first task according to the start time of the first task and a first preset running time of the first task; the timeout detection module is used for detecting whether the first task has ended when the first hardware clock counts to the expected end time of the first task, and, if the first task is detected not to have ended, triggering the first abnormality recording module to generate abnormality information indicating that the first task is a timed-out abnormal task.
Optionally, the first hardware clock is a clock generated based on a system bus clock, the system bus clock being a bus clock generated based on a processor external hardware clock source.
Optionally, the apparatus comprises a second detection module; the second detection module is used for detecting timeout of a plurality of second tasks based on an interrupt mechanism of a window watchdog of the system.
Optionally, the second detection module comprises a second hardware clock, a dog feeding module, a counter setting module and a second abnormality recording module; the second hardware clock is used for triggering a first interrupt according to the dog feeding period of the window watchdog of the system; the counter setting module is used for generating a corresponding task counter when a second task starts, setting the initial value of the task counter to a first initial value, and cancelling the corresponding task counter when the second task ends, wherein the first initial value is determined based on the ratio of a second preset running time of the second task to the dog feeding period; the dog feeding module is used for, during the running period after the system is started, executing a dog feeding action on the window watchdog and triggering each task counter to be decremented by one when the first interrupt is triggered and the values of all task counters are not zero; the second abnormality recording module is used for, when the window watchdog generates an interrupt during the running period after the system is started, generating abnormality information identifying the task counter that caused the window watchdog to generate the interrupt.
Optionally, in the case that the first task and the second task are the same target task, the second preset running time is greater than the first preset running time; the timeout detection module is further configured to trigger the counter setting module to cancel a task counter corresponding to the target task when the target task is detected not to be ended.
Optionally, the apparatus comprises a third detection module; the third detection module is used for detecting whether the starting process of the system is overtime or not based on an interrupt mechanism of a window watchdog of the system.
Optionally, the third detection module comprises a second hardware clock, a dog feeding module, a counter setting module and a second abnormality recording module; the second hardware clock is used for triggering a first interrupt according to the dog feeding period of the window watchdog of the system; the counter setting module is used for generating a first counter when the system is powered on, setting the initial value of the first counter to a second initial value, and cancelling the first counter after the system has started, wherein the second initial value is determined based on the ratio of a preset system start time to the dog feeding period; the dog feeding module is used for, during system start-up, executing a dog feeding action on the window watchdog and triggering the first counter to be decremented by one when the first interrupt is triggered and the value of the first counter is not zero; the second abnormality recording module is used for generating abnormality information indicating a system start timeout when the window watchdog generates an interrupt during system start-up.
Optionally, the apparatus comprises a fourth detection module; the fourth detection module is used for detecting whether the power-down process of the system is overtime or not based on an interrupt mechanism of a window watchdog of the system.
Optionally, the fourth detection module comprises a second hardware clock, a dog feeding module, a counter setting module and a second abnormality recording module; the second hardware clock is used for triggering a first interrupt according to the dog feeding period of the window watchdog of the system; the counter setting module is used for generating a second counter when system power-down begins, setting the initial value of the second counter to a third initial value, and cancelling the second counter when system power-down ends, wherein the third initial value is determined based on the ratio of a preset system power-down time to the dog feeding period; the dog feeding module is used for, during system power-down, executing a dog feeding action on the window watchdog and triggering the second counter to be decremented by one when the first interrupt is triggered and the value of the second counter is not zero; the second abnormality recording module is used for generating abnormality information indicating a system power-down timeout when the window watchdog generates an interrupt during system power-down.
Optionally, the second hardware clock is a clock generated based on a system bus clock, the system bus clock being a bus clock generated based on a processor external hardware clock source.
Optionally, the device comprises an abnormality information storage module; the abnormality information storage module is used for storing abnormality information in a target storage area in a random access memory of a processor of the system, the target storage area being configured not to be cleared on a warm restart of the processor of the system.
Optionally, the device comprises a storage management module; the storage management module is used for backing up the abnormality information in the target storage area to a nonvolatile memory of the system after the processor of the system is restarted.
According to a second aspect of the present invention, there is provided a processor. The processor includes the system abnormality detection apparatus according to any one of the first aspects of the present invention.
According to a third aspect of the present invention, an embedded system is provided. The embedded system comprises the system abnormality detection device according to the first aspect of the present invention or the processor according to the second aspect of the present invention.
The system abnormality detection device comprises a first detection module, wherein the first detection module comprises a first hardware clock, a time recording module, a timeout detection module and a first abnormality recording module; the time recording module is used for recording the start time of a first task based on the first hardware clock, and determining the expected end time of the first task according to the start time of the first task and a first preset running time of the first task; the timeout detection module is used for detecting whether the first task has ended when the first hardware clock counts to the expected end time of the first task, and, if the first task is detected not to have ended, triggering the first abnormality recording module to generate abnormality information indicating that the first task is a timed-out abnormal task. The system abnormality detection device can actively detect task running timeouts based on the first hardware clock without occupying so many system resources that normal operation of the system is affected, and its detection results are accurate and not prone to false identification.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of a first detection module of a system abnormality detection device in an embodiment of the present application.
Fig. 2 is a schematic diagram of a second detection module of the system abnormality detection device in an embodiment of the present application.
Fig. 3 is a schematic diagram of a third detection module of the system abnormality detection device in an embodiment of the present application.
Fig. 4 is a schematic diagram of a fourth detection module of the system abnormality detection device in an embodiment of the present application.
Fig. 5 is a schematic diagram of a system abnormality detection device in an embodiment of the present application.
Fig. 6 is a schematic diagram of a first hardware clock in an embodiment of the present application.
Fig. 7 is a schematic diagram of a second hardware clock in an embodiment of the present application.
Fig. 8 is a schematic diagram of a processor in an embodiment of the present application.
Fig. 9 is a schematic diagram of an embedded system in an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
While the system is running, when software runs abnormally the system can reset and record reset information, but the cause of the system abnormality cannot be accurately located from the reset information alone. In order to accurately locate the cause of a system abnormality, the embodiments of the present application provide a system abnormality detection device.
The system abnormality detection device provided by the embodiments of the present application can be arranged in a processor of a system. The processor in the present application may be a CPU (Central Processing Unit), an MCU (Microcontroller Unit), an SoC (System on a Chip), an MPU (Microprocessor Unit), or the like. The system abnormality detection device provided by the embodiments of the present application is suitable for detecting system abnormalities of an embedded system.
Fig. 1 is a schematic diagram of a system abnormality detection device in an embodiment of the present application. Referring to Fig. 1, the system abnormality detection device includes a first detection module, where the first detection module includes a first hardware clock, a time recording module, a timeout detection module, and a first abnormality recording module.
The first hardware clock is used for timing. Compared with a software clock, a hardware clock occupies fewer system resources, has higher timing precision, and is less prone to timing errors.
In one embodiment, the first hardware clock may be a hardware clock disposed in the processor. In one embodiment, the first hardware clock may be generated based on a bus clock of the system, the bus clock being a physical signal generated by the hardware. Alternatively, in one embodiment, the first hardware clock may also be generated directly based on the hardware clock source.
In one embodiment, a system bus clock is constructed based on a hardware clock source of the system, and the first hardware clock is constructed from the bus clock. For example, the hardware clock source is coupled to a first frequency adjustment circuit to generate the bus clock, and the bus clock is coupled to a second frequency adjustment circuit to generate the first hardware clock signal. For example, referring to Fig. 6, the hardware clock source is a crystal oscillator with a clock frequency of 2 MHz; a frequency multiplier circuit multiplies the crystal oscillator frequency to form a 280 MHz bus clock, and a frequency divider circuit divides the 280 MHz bus clock to form a 32 MHz first hardware clock.
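The arithmetic of the Fig. 6 example can be checked with a short sketch. It assumes that "M" in the example denotes MHz; the function name `clock_chain` is hypothetical and only illustrates the ratios that the multiplier and divider circuits would have to realize.

```python
from fractions import Fraction

def clock_chain(source_hz, bus_hz, timer_hz):
    """Return the (multiplier, divider) ratios of the two frequency
    adjustment circuits: source -> bus clock -> first hardware clock."""
    multiplier = Fraction(bus_hz, source_hz)  # frequency multiplier circuit
    divider = Fraction(bus_hz, timer_hz)      # frequency divider circuit
    return multiplier, divider

# Values from the example, assuming "M" means MHz.
mult, div = clock_chain(2_000_000, 280_000_000, 32_000_000)
print(mult, div)  # 140 35/4
```

Under these assumptions the multiplier ratio is 140 and the divider ratio is 35/4 (8.75), so the example values would call for a divider that supports a non-integer ratio.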
In one embodiment, the processor has a hardware clock source integrated therein, and the bus clock and the first hardware clock may be generated using the processor's own hardware clock source. In one embodiment, the processor itself does not have a hardware clock source, and the processor constructs the bus clock and the first hardware clock from its external hardware clock source.
The time recording module is used for recording the starting time of the first task based on the first hardware clock, and determining the expected ending time of the first task according to the starting time of the first task and the first preset running time of the first task.
The timeout detection module is used for detecting whether the first task has ended when the first hardware clock counts to the expected end time of the first task, and, if the first task is detected not to have ended, triggering the first abnormality recording module to generate abnormality information indicating that the first task is a timed-out abnormal task.
For example, a first preset running time of a certain first task (task) is 10ms, and if the first task is not completed within 10ms after the first task starts, the first task is recorded as a timeout abnormal task.
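The cooperation of the time recording module and the timeout detection module can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the class `FirstDetector`, its method names and the task names are hypothetical, and times are plain integers in milliseconds instead of hardware clock counts.

```python
from dataclasses import dataclass, field

@dataclass
class FirstDetector:
    """Hypothetical model of the first detection module (times in ms)."""
    expected_end: dict = field(default_factory=dict)  # task -> expected end time
    finished: set = field(default_factory=set)
    abnormal: list = field(default_factory=list)      # recorded timeouts

    def task_start(self, task, now_ms, preset_ms):
        # Time recording module: expected end = start time + preset running time.
        self.expected_end[task] = now_ms + preset_ms

    def task_end(self, task):
        self.finished.add(task)

    def on_clock(self, now_ms):
        # Timeout detection module: check every task whose deadline has arrived.
        for task, deadline in list(self.expected_end.items()):
            if now_ms >= deadline:
                if task not in self.finished:
                    self.abnormal.append((task, deadline))  # abnormality record
                del self.expected_end[task]

det = FirstDetector()
det.task_start("task_a", now_ms=0, preset_ms=10)
det.task_start("task_b", now_ms=0, preset_ms=10)
det.task_end("task_b")    # task_b ends in time
det.on_clock(now_ms=10)   # the clock reaches both expected end times
print(det.abnormal)       # [('task_a', 10)]
```

Only the task that has not ended at its expected end time is recorded, which mirrors the 10 ms example above.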
In an embodiment, the first preset running time of the first task may be determined based on the running time actually required under the condition that the first task normally runs, and the system anomaly detection device in the embodiment of the application may actively detect whether the first task runs overtime.
In one embodiment, after the first detection module detects that the first task has timed out, the system may handle the timeout of the first task; for example, the system may forcibly terminate the first task.
The system abnormality detection device can actively detect a running timeout of the first task based on the first hardware clock without occupying so many system resources that normal operation of the system is affected, and its detection results are accurate and not prone to false identification.
In one embodiment, the system abnormality detection device may include a second detection module configured to perform timeout detection on a plurality of second tasks based on an interrupt mechanism of a window watchdog of the system.
Referring to Fig. 2, the second detection module includes a second hardware clock, a dog feeding module, a counter setting module, and a second abnormality recording module.
The second hardware clock is used to trigger the first interrupt according to the dog feeding period of the window watchdog of the system.
The counter setting module is used for generating a corresponding task counter when a second task starts and setting an initial value of the task counter as a first initial value, and canceling the corresponding task counter when the second task ends, wherein the first initial value is determined based on a ratio of a second preset running time of the second task to a dog feeding period.
During the running period after the system is started, the dog feeding module executes a dog feeding action on the window watchdog and triggers each task counter to be decremented by one when the first interrupt is triggered and the values of all task counters are not zero.
The second abnormality recording module is used for, when the window watchdog generates an interrupt during the running period after the system is started, generating abnormality information identifying the task counter that caused the window watchdog to generate the interrupt.
In one embodiment, the window watchdog may be integrated in the processor, or may be located outside the processor and connected to the processor. A window watchdog must be fed within its window period. If it is fed within the window period, the window watchdog is refreshed. If it is not fed within the window period, the window watchdog generates an interrupt and triggers a system reset, which causes the window watchdog and the second hardware clock to restart.
The dog feeding period is determined according to the upper window limit and the lower window limit of the window watchdog, such that the window watchdog generates no interrupt as long as it is fed every dog feeding period, and generates an interrupt as soon as a single feeding is missed. Accordingly, assuming that the window period of the window watchdog is X1 to X2 after the window watchdog is started, that is, the time point corresponding to the upper window limit is X1 after the window watchdog is started and the time point corresponding to the lower window limit is X2 after the window watchdog is started, the dog feeding period can be set to X3, where X3 must satisfy X1 ≤ X3 ≤ X2 and 2·X3 ≥ X2. In one example, assuming that the window period of the window watchdog is 10 ms to 30 ms after the window watchdog is started, that is, the time point corresponding to the upper window limit is 10 ms after the window watchdog is started and the time point corresponding to the lower window limit is 30 ms after the window watchdog is started, the dog feeding period may be set to 20 ms; after the second hardware clock and the window watchdog are restarted, the first interrupt is triggered when the second hardware clock counts to 20 ms, in accordance with the dog feeding period.
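The constraint on the dog feeding period can be expressed as a simple check. The function name `is_valid_feed_period` is hypothetical; the two conditions are taken directly from the paragraph above (X3 lies inside the window, and twice X3 is not earlier than the end of the window, so a single missed feeding cannot be made up in time).

```python
def is_valid_feed_period(x1_ms, x2_ms, x3_ms):
    """True if x3 satisfies X1 <= X3 <= X2 and 2*X3 >= X2."""
    return x1_ms <= x3_ms <= x2_ms and 2 * x3_ms >= x2_ms

print(is_valid_feed_period(10, 30, 20))  # True
print(is_valid_feed_period(10, 30, 12))  # False: 2*12 < 30
```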
In an embodiment, the second preset running time of the second task may be determined based on the running time actually required under the normal running condition of the second task, and the system anomaly detection device in the embodiment of the present application may actively detect whether the running of the second task is overtime.
In one embodiment, the task counter is a down counter, and the count value of the task counter is decremented under the control of the dog feeding module until the count value is decremented to zero.
In one embodiment, a ratio of a second preset run time of the second task to the dog feeding period is calculated and rounded up, the rounded up result is taken as a first initial value, and the initial value of the task counter is set to the first initial value.
For example, the task counter corresponding to the second task A is T1, the second preset running time of the second task A is 150 ms, the dog feeding period is 20 ms, and the initial value of the task counter T1 is 8. When the second task A starts, the counter setting module generates the task counter T1 and sets its initial value to 8.
After the second task A starts, if it does not finish running within 8 dog feeding periods, the task counter T1 is decremented to zero. If the second task A has still not finished when the 9th dog feeding period arrives, the dog feeding module cannot feed the window watchdog in the 9th dog feeding period, so the window watchdog generates an interrupt. If the second task A finishes running before the 9th dog feeding period arrives, the task counter T1 is cancelled and the dog feeding action of the dog feeding module is not affected. That is, when the second task A times out (it is still not finished when the 9th dog feeding period arrives), the window watchdog is triggered to generate an interrupt, the second abnormality recording module records the task counter T1 that caused the window watchdog to generate the interrupt, and the second task A is determined to be the timed-out task according to the recorded task counter T1.
Referring to Fig. 2, assume that a plurality of second tasks currently need to be monitored, namely second task 1 to second task N. When the first interrupt is triggered, if none of the task counters of second tasks 1 to N is zero, a dog feeding action is executed on the window watchdog and the task counter of each of second tasks 1 to N is decremented by one. When the first interrupt is triggered, if the value of any task counter is zero, the dog feeding module cannot feed the window watchdog; after the window period of the window watchdog ends, the window watchdog generates an interrupt and triggers a system reset, which causes the window watchdog and the second hardware clock to restart. When the window watchdog generates the interrupt, the second abnormality recording module generates abnormality information identifying the task counter that caused the interrupt, and the second task with the timeout abnormality can be determined from that task counter.
The system abnormality detection device can detect a running timeout of a second task based on the second hardware clock and the window watchdog without occupying so many system resources that normal operation of the system is affected, and its detection results are accurate and not prone to false identification.
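The first initial value in this example follows from rounding the ratio up. A minimal sketch, using a hypothetical helper `counter_initial_value` and integer arithmetic to avoid floating-point rounding:

```python
def counter_initial_value(preset_run_ms, feed_period_ms):
    """Ratio of preset running time to dog feeding period, rounded up."""
    return (preset_run_ms + feed_period_ms - 1) // feed_period_ms

print(counter_initial_value(150, 20))  # 8
```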
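The interplay of the task counters and the dog feeding action can be sketched as a simulation. This is a simplified, hypothetical model (class `SecondDetector` and its method names are not from the source): it records the expired counters at the moment feeding is withheld, whereas the described device generates the record when the window watchdog actually raises its interrupt.

```python
class SecondDetector:
    """Hypothetical model of the second detection module."""

    def __init__(self, feed_period_ms):
        self.feed_period_ms = feed_period_ms
        self.counters = {}    # task name -> remaining dog feeding periods
        self.feeds = 0        # number of dog feeding actions performed
        self.abnormal = None  # tasks whose counter blocked the feeding

    def task_start(self, task, preset_ms):
        # Counter setting module: initial value = ceil(preset / period).
        self.counters[task] = -(-preset_ms // self.feed_period_ms)

    def task_end(self, task):
        self.counters.pop(task, None)  # cancel the task counter

    def on_first_interrupt(self):
        """Return True if the watchdog was fed, False if feeding was withheld."""
        expired = [t for t, c in self.counters.items() if c == 0]
        if expired:
            self.abnormal = expired    # the watchdog will now time out
            return False
        for t in self.counters:
            self.counters[t] -= 1
        self.feeds += 1
        return True

det = SecondDetector(feed_period_ms=20)
det.task_start("A", preset_ms=150)  # counter starts at 8
det.task_start("B", preset_ms=40)   # counter starts at 2
det.on_first_interrupt()            # 1st interrupt: fed, counters decremented
det.task_end("B")                   # B ends in time, its counter is cancelled
while det.on_first_interrupt():     # A never ends
    pass
print(det.feeds, det.abnormal)      # 8 ['A']
```

In this run the watchdog is fed eight times and feeding is withheld at the ninth interrupt, matching the task A example above.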
In one embodiment, in the case that the first task and the second task are the same target task, the second preset running time is set to be greater than the first preset running time; the timeout detection module is further configured to trigger the counter setting module to cancel a task counter corresponding to the target task if it is detected that the target task is not ended when the first hardware clock times to an expected end time of the target task.
When the first task and the second task are the same target task, the second preset running time is set to be longer than the first preset running time for that target task. If the first detection module works normally and detects, based on the first preset running time, that the target task has timed out, the system can handle the timeout; at this point the task counter corresponding to the target task can be cancelled to stop the second detection module's timeout detection of the target task. If the first detection module fails to detect that the target task has exceeded the first preset running time, the second detection module can still detect whether the target task has timed out based on the second preset running time. In this way, the reliability of the abnormality detection device is improved.
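The redundancy between the two modules can be illustrated with a small sketch. The function `run_target_task`, its parameters, and the example times (100 ms, 150 ms, 20 ms) are hypothetical; the model assumes a target task that never ends and reports which module notices the timeout first.

```python
def run_target_task(first_preset_ms, second_preset_ms, feed_period_ms,
                    first_module_ok=True, horizon_ms=1000):
    """Return (module, time_ms) for the module that reports the timeout."""
    if second_preset_ms <= first_preset_ms:
        raise ValueError("second preset time must exceed the first")
    counter = -(-second_preset_ms // feed_period_ms)  # ratio rounded up
    for now in range(1, horizon_ms + 1):
        if first_module_ok and now == first_preset_ms:
            # First module reports the timeout; the task counter is cancelled.
            return "first", now
        if now % feed_period_ms == 0:  # first interrupt of the second clock
            if counter == 0:
                return "second", now   # feeding withheld, watchdog will fire
            counter -= 1
    return None, horizon_ms

print(run_target_task(100, 150, 20))                         # ('first', 100)
print(run_target_task(100, 150, 20, first_module_ok=False))  # ('second', 180)
```

With a working first module the timeout is reported at the first preset running time; if that module fails, the second module still catches the task at the first interrupt after its counter reaches zero.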
In one embodiment, a task is more suitable for detection by the first detection module as a first task if the actual required run time for normal operation of the task is less than the dog feeding period.
In one embodiment, the system anomaly detection device may include a third detection module. The third detection module is used for detecting whether the system starting process is overtime or not based on an interrupt mechanism of a window watchdog of the system.
Referring to Fig. 3, the third detection module includes a second hardware clock, a dog feeding module, a counter setting module, and a second abnormality recording module.
The second hardware clock is used to trigger the first interrupt according to a watchdog feeding period of a window watchdog of the system.
The counter setting module is used for generating a first counter when the system is powered on, setting the initial value of the first counter as a second initial value, and canceling the first counter after the system is started, wherein the second initial value is determined based on the ratio of the preset system starting time to the dog feeding period.
The dog feeding module is used for, during system start-up, executing a dog feeding action on the window watchdog and triggering the first counter to be decremented by one when the first interrupt is triggered and the value of the first counter is not zero.
The second abnormality recording module is used for generating abnormality information indicating a system start timeout when the window watchdog generates an interrupt during system start-up.
In one embodiment, the window watchdog may be integrated in the processor, or may be located outside the processor and connected to the processor. A window watchdog must be fed within its window period. If it is fed within the window period, the window watchdog is refreshed. If it is not fed within the window period, the window watchdog generates an interrupt and triggers a system reset, which causes the window watchdog and the second hardware clock to restart.
The dog feeding period is determined according to the upper window limit and the lower window limit of the window watchdog, such that the window watchdog generates no interrupt as long as it is fed every dog feeding period, and generates an interrupt as soon as a single feeding is missed. Accordingly, assuming that the window period of the window watchdog is X1 to X2 after the window watchdog is started, that is, the time point corresponding to the upper window limit is X1 after the window watchdog is started and the time point corresponding to the lower window limit is X2 after the window watchdog is started, the dog feeding period can be set to X3, where X3 must satisfy X1 ≤ X3 ≤ X2 and 2·X3 ≥ X2. In one example, assuming that the window period of the window watchdog is 10 ms to 30 ms after the window watchdog is started, that is, the time point corresponding to the upper window limit is 10 ms after the window watchdog is started and the time point corresponding to the lower window limit is 30 ms after the window watchdog is started, the dog feeding period may be set to 20 ms; after the second hardware clock and the window watchdog are restarted, the first interrupt is triggered when the second hardware clock counts to 20 ms, in accordance with the dog feeding period.
In one embodiment, the preset system start time may be determined based on a time required for normal start of the system, and the system anomaly detection device in the embodiment of the present application may actively detect whether the system start process is overtime.
In one embodiment, the first counter is a down counter, and the count value of the first counter is decremented under the control of the dog feeding module until it reaches zero.
In one embodiment, a ratio of a preset system start time and a dog feeding period is calculated and rounded up, the rounded up result is taken as a second initial value, and the initial value of the first counter is set to the second initial value.
Assuming that the initial value of the first counter is M, at the start of the system, the counter setting module generates the first counter and sets the initial value of the first counter to M. After the system is started, the counter setting module cancels the first counter.
After the system begins to start, if the system does not complete the start-up process within M dog feeding periods, the first counter is decremented to zero. If the system has still not finished starting when the (M+1)th dog feeding period arrives, the dog feeding module cannot feed the dog in the (M+1)th dog feeding period, so the window watchdog generates an interrupt. If the system finishes starting before the (M+1)th dog feeding period arrives, the first counter is cancelled and the dog feeding action of the dog feeding module is not affected. That is, when system start-up times out, the window watchdog is triggered to generate an interrupt, and the second exception recording module generates exception information characterizing the system start-up timeout.
The system abnormality detection device can thus detect a timeout of the system start-up process based on the second hardware clock and the window watchdog. It does not occupy excessive system resources in a way that would affect the start-up process, and the detection result is accurate and not prone to false identification.
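The behaviour described above can be modelled with a small simulation. This is only a sketch under stated assumptions: the function name `startup_times_out`, the millisecond units, and the treatment of a start-up that completes exactly on a feeding tick as "in time" are all choices made for this example, not details given in the embodiments.

```python
def startup_times_out(preset_start_ms, feed_period_ms, actual_start_ms):
    """Return True if the window watchdog would generate an interrupt during start-up.

    The first counter is initialised to M = ceil(preset / period). On every first
    interrupt (one per feeding period) the dog is fed and the counter is decremented
    only while the counter is non-zero.
    """
    counter = -(-preset_start_ms // feed_period_ms)  # integer ceiling, gives M
    tick = 0
    while True:
        tick += 1
        now_ms = tick * feed_period_ms  # time at which the first interrupt fires
        if actual_start_ms <= now_ms:
            # Start-up has finished: the counter is cancelled and feeding is unaffected.
            return False
        if counter != 0:
            counter -= 1  # feed the dog once and decrement the counter
        else:
            # Counter is zero and start-up is unfinished: no feeding takes place,
            # so the window watchdog generates an interrupt.
            return True
```

With a preset start time of 100 ms and a 20 ms feeding period, M is 5; a start-up that needs 130 ms is still unfinished at the sixth feeding tick (120 ms) and is reported as a timeout, whereas one that needs 90 ms is not.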
In one embodiment, the system abnormality detection device comprises a fourth detection module. The fourth detection module is used for detecting whether the power-down process of the system has timed out, based on an interrupt mechanism of a window watchdog of the system.
Referring to fig. 4, the fourth detection module includes a second hardware clock, a dog feeding module, a counter setting module, and a second anomaly recording module.
The second hardware clock is used to trigger the first interrupt according to a dog feeding period of a window watchdog of the system.
The counter setting module is used for generating a second counter when system power-down starts and setting an initial value of the second counter to a third initial value, and for canceling the second counter when system power-down finishes, wherein the third initial value is determined based on a ratio of the preset system power-down time to the dog feeding period.
The dog feeding module is used for executing a dog feeding action on the window watchdog and triggering the second counter to be decremented by one when the first interrupt is triggered and the value of the second counter is not zero during the power-down period of the system.
The second exception recording module is used for generating exception information representing power-down timeout of the system when the window watchdog generates an interrupt during power-down of the system.
In one embodiment, the window watchdog may be a window watchdog integrated in the processor, or may be a window watchdog located outside the processor and connected to the processor. A window watchdog must be fed during its window period. If the dog is fed during the window period, the window watchdog is refreshed. If the dog is not fed during the window period, the window watchdog generates an interrupt and triggers a system reset, which causes the window watchdog and the second hardware clock to restart.
The dog feeding period is determined from the upper window limit and the lower window limit of the window watchdog, such that the window watchdog generates no interrupt as long as it is fed once per dog feeding period, and generates an interrupt as soon as a single feeding is missed. Following this idea, assume that the window period of the window watchdog runs from X1 to X2 after the window watchdog is started, that is, the time point corresponding to the upper window limit is X1 after the window watchdog is started and the time point corresponding to the lower window limit is X2 after the window watchdog is started. The dog feeding period can then be set to X3, where X3 must satisfy X1 ≤ X3 ≤ X2 and 2·X3 ≥ X2. In one example, assuming that the window period of the window watchdog is 10 ms to 30 ms after the window watchdog is started, that is, the time point corresponding to the upper window limit is 10 ms after the window watchdog is started and the time point corresponding to the lower window limit is 30 ms after the window watchdog is started, the dog feeding period may be set to 20 ms. After the second hardware clock and the window watchdog are restarted, the first interrupt is triggered when the second hardware clock counts to 20 ms in accordance with the dog feeding period.
In an embodiment, the preset system power-down time may be determined based on a time required for normal power-down of the system, and the system abnormality detection device in the embodiment of the present application may actively detect whether a power-down process of the system is overtime.
In one embodiment, the second counter is a decrementing counter, and the count value of the second counter is decremented under control of the dog feeding module until it reaches zero.
In one embodiment, the ratio of the preset system power-down time to the dog feeding period is calculated and rounded up, the rounded-up result is used as the third initial value, and the initial value of the second counter is set to the third initial value.
Assuming that the initial value of the second counter is L, when system power-down starts, the counter setting module generates the second counter and sets the initial value of the second counter to L. After system power-down finishes, the counter setting module cancels the second counter.
After the system starts powering down, if the system does not complete the power-down process within L dog feeding periods, the second counter is decremented to zero. If the system has still not finished powering down when the (L+1)th dog feeding period arrives, the dog feeding module cannot feed the dog in the (L+1)th dog feeding period, so the window watchdog generates an interrupt. If system power-down finishes before the (L+1)th dog feeding period arrives, the second counter is cancelled and the dog feeding action of the dog feeding module is not affected. That is, when system power-down times out, the window watchdog is triggered to generate an interrupt, and the second exception recording module generates exception information characterizing the system power-down timeout.
The system abnormality detection device can thus detect a timeout of the system power-down process based on the second hardware clock and the window watchdog. It does not occupy excessive system resources in a way that would affect the power-down process, and the detection result is accurate and not prone to false identification.
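The rounding step can be illustrated with a short sketch. The helper names `counter_initial_value` and `feed_budget_ms` and the millisecond units are assumptions made for this example. Rounding up means the counter covers at least the whole preset time; rounding down could let the counter reach zero before the preset time has elapsed.

```python
def counter_initial_value(preset_ms, feed_period_ms):
    """Ratio of preset time to feeding period, rounded up (integer arithmetic)."""
    if preset_ms <= 0 or feed_period_ms <= 0:
        raise ValueError("times must be positive")
    return -(-preset_ms // feed_period_ms)


def feed_budget_ms(preset_ms, feed_period_ms):
    """Total time covered by the feedings that the counter allows."""
    return counter_initial_value(preset_ms, feed_period_ms) * feed_period_ms


# Hypothetical numbers: a 90 ms preset power-down time and a 20 ms feeding period.
initial = counter_initial_value(90, 20)   # 90 / 20 = 4.5, rounded up to 5
budget = feed_budget_ms(90, 20)           # 5 * 20 = 100 ms, not less than the 90 ms preset
```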
The second hardware clock mentioned in the above embodiments is explained below. The second hardware clock is used for timing. Compared with a software clock, a hardware clock occupies fewer system resources, offers higher timing precision, and is less prone to timing errors.
In one embodiment, the second hardware clock may be a hardware clock disposed in the processor. In one embodiment, the second hardware clock may be generated based on a bus clock of the system, the bus clock being a physical signal generated by the hardware. Alternatively, in one embodiment, the second hardware clock may also be generated directly based on the hardware clock source.
In one embodiment, a system bus clock is constructed based on a hardware clock source of the system, and the second hardware clock is constructed from the bus clock. For example, the hardware clock source is coupled to a first frequency adjustment circuit to generate the bus clock, and the bus clock is coupled to a second frequency adjustment circuit to generate the signal of the second hardware clock. For example, referring to fig. 7, the hardware clock source is a crystal oscillator with a clock frequency of 2 MHz; the crystal oscillator frequency is multiplied by a frequency multiplier circuit to form a 280 MHz bus clock, and the 280 MHz bus clock is divided by a frequency divider circuit to form a 160 MHz second hardware clock.
In one embodiment, the processor has a hardware clock source integrated therein, and the bus clock and the second hardware clock may be generated using the processor's own hardware clock source. In one embodiment, the processor itself does not have a hardware clock source, and the bus clock and the second hardware clock are constructed from a hardware clock source external to the processor.
In one embodiment, referring to fig. 5, the system anomaly detection apparatus further includes an anomaly information storage module.
The abnormality information storage module is configured to store abnormality information detected by each detection module in the foregoing embodiments into a target storage area in a random access memory (Random Access Memory, RAM) of a processor of the system, the target storage area being configured not to be cleared at a time of a warm start of the processor of the system.
That is, a target storage area is partitioned in advance from the random access memory of the processor, and this target storage area is configured not to be cleared at a warm start of the processor. By storing the exception information in the target storage area in this way, it can be ensured that previously stored exception information is not lost after the system restarts abnormally.
In one embodiment, referring to fig. 5, the system anomaly detection apparatus further includes a storage management module.
In one embodiment, the storage management module is configured to periodically back up exception information in the target storage area to a non-volatile memory (NVM) of the system.
In one embodiment, the storage management module is configured to back up the exception information in the target storage area to the nonvolatile memory of the system after the processor of the system is restarted. In one embodiment, the storage management module backs up the exception information in the target storage area to the nonvolatile memory of the system after either a cold start or a warm start of the processor.
In one embodiment, each time it performs a backup, the storage management module backs up only the exception information in the target storage area that has not yet been backed up to the nonvolatile memory of the system, thereby avoiding repeated backups and improving backup efficiency.
In one embodiment, the storage management module backs up the exception information in the target storage area into a nonvolatile flash memory (NOR flash).
In this way, the exception information in the target storage area can be backed up to the nonvolatile memory in time, which further safeguards the stored exception information and makes it convenient for relevant personnel to read and use.
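One way to back up only the records that have not yet been backed up is to remember how many records have already been copied. The following is a minimal sketch assuming an append-only record list; the class name `ExceptionStore`, its attributes, and the use of Python lists to stand in for the target storage area and the nonvolatile memory are all hypothetical.

```python
class ExceptionStore:
    """Toy model of the target storage area and its incremental backup."""

    def __init__(self):
        self.target_area = []  # stands in for the retained RAM region
        self.nvm = []          # stands in for the nonvolatile memory
        self.backed_up = 0     # number of records already copied to nvm

    def record(self, info):
        """Store one piece of exception information in the target storage area."""
        self.target_area.append(info)

    def backup_new(self):
        """Copy only records not yet backed up; return how many were copied."""
        pending = self.target_area[self.backed_up:]
        self.nvm.extend(pending)
        self.backed_up = len(self.target_area)
        return len(pending)


store = ExceptionStore()
store.record("task A timeout")
copied_first = store.backup_new()    # copies the single pending record
copied_second = store.backup_new()   # nothing new to copy
```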
In one embodiment, the storage management module is configured to store information of a current cold start as exception information in the target storage area after the processor of the system is cold started.
In one embodiment, the storage management module is configured to clean the target storage area after the processor of the system is cold started, and then store information of the cold start as abnormal information in the target storage area. The storage management module may backup the exception information in the target storage area to the non-volatile memory before the target storage area is cleaned.
In one embodiment, the storage management module is configured to store information of a current cold start as exception information in the target storage area after the processor of the system is cold started, backup the exception information in the target storage area to the nonvolatile memory, and then clean the target storage area.
That is, after the processor is cold started, the storage management module cleans the target storage area, so that the target storage area can have space to store new exception information. After the processor is cold started, the storage management module also stores the information of the cold start in the target storage area, and the newly stored information of the cold start is regarded as abnormal information to be backed up in the nonvolatile memory of the system.
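The cold-start handling can be sketched as follows, using the ordering of the embodiment in which the target storage area is backed up before it is cleaned and the cold-start record is stored afterwards. The function name `handle_cold_start`, the list-based storage, and the dictionary used as the cold-start record are assumptions for illustration; the sketch also assumes that none of the records in the target storage area has been backed up yet.

```python
def handle_cold_start(target_area, nvm, cold_start_info):
    """Back up, clean, then record the cold start in the target storage area.

    target_area and nvm are plain lists standing in for the retained RAM region
    and the nonvolatile memory; both are modified in place.
    """
    nvm.extend(target_area)              # back up existing records before cleaning
    target_area.clear()                  # free space for new exception information
    target_area.append(cold_start_info)  # the cold start itself is treated as exception information


target = ["task B timeout"]
nvm = []
handle_cold_start(target, nvm, {"event": "cold_start"})
```

After the call, the earlier record is held only in the list standing in for the nonvolatile memory, and the target storage area holds the cold-start record awaiting the next backup.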
The embodiment of the disclosure provides a system abnormality detection method, which comprises steps S101-S102.
Step S101, recording a start time of the first task based on the first hardware clock, and determining an expected end time of the first task according to the start time of the first task and a first preset running time of the first task.
Step S102, detecting whether the first task is finished or not when the first hardware clock times to the expected finishing time of the first task, and triggering the first exception recording module to generate exception information representing that the first task is a timeout exception task under the condition that the first task is detected not to be finished.
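Steps S101 and S102 can be illustrated with a small model in which a tick counter stands in for the first hardware clock. Everything named here (`FakeHardwareClock`, `TimeoutDetector`, `poll`) is hypothetical; in particular, `poll` merely models the moment at which the hardware clock reaches an expected end time.

```python
class FakeHardwareClock:
    """Tick counter standing in for the first hardware clock."""

    def __init__(self):
        self.now = 0

    def advance(self, ticks):
        self.now += ticks


class TimeoutDetector:
    def __init__(self, clock):
        self.clock = clock
        self.expected_end = {}  # task name -> expected end time
        self.finished = set()
        self.exceptions = []    # stands in for the first exception recording module

    def task_started(self, name, preset_run_time):
        # S101: record the start time and derive the expected end time.
        self.expected_end[name] = self.clock.now + preset_run_time

    def task_finished(self, name):
        self.finished.add(name)

    def poll(self):
        # S102: when the clock has reached a task's expected end time,
        # check whether that task has finished.
        for name, end in list(self.expected_end.items()):
            if self.clock.now >= end:
                if name not in self.finished:
                    self.exceptions.append(f"{name}: timeout")
                del self.expected_end[name]
                self.finished.discard(name)


clock = FakeHardwareClock()
detector = TimeoutDetector(clock)
detector.task_started("a", preset_run_time=5)
detector.task_started("b", preset_run_time=5)
clock.advance(3)
detector.task_finished("a")  # task "a" ends within its preset running time
clock.advance(2)
detector.poll()              # task "b" has not ended by its expected end time
```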
The embodiment of the disclosure provides a system abnormality detection method, which comprises steps S201-S203.
Step S201, generating a corresponding task counter at the beginning of a second task, setting the initial value of the task counter as a first initial value, and canceling the corresponding task counter at the end of the second task, wherein the first initial value is determined based on the ratio of the second preset running time of the second task to the dog feeding period of the window watchdog.
Step S202, during operation after the system is started, in the case that the first interrupt is triggered and the values of all task counters are not zero, a dog feeding action is performed once on the window watchdog and each task counter is triggered to decrement by one. The first interrupt is triggered by the second hardware clock according to the dog feeding period of the window watchdog.
Step S203, during operation after the system is started, when the window watchdog generates an interrupt, generating exception information that identifies the task counter which caused the window watchdog to generate the interrupt.
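The interplay of several task counters in steps S201 to S203 can be sketched as below. The class name `TaskWatch` and its methods are hypothetical, and the window watchdog itself is not modelled: a non-empty return value from `on_first_interrupt` simply marks the feeding that is skipped, after which the real watchdog would generate its interrupt.

```python
class TaskWatch:
    """Toy model of per-task counters gating the dog feeding action."""

    def __init__(self, feed_period_ms):
        self.feed_period_ms = feed_period_ms
        self.counters = {}  # task name -> remaining feedings allowed

    def task_started(self, name, preset_ms):
        # S201: initial value is the preset running time divided by the
        # feeding period, rounded up.
        self.counters[name] = -(-preset_ms // self.feed_period_ms)

    def task_finished(self, name):
        # S201: the counter is cancelled when the task ends.
        self.counters.pop(name, None)

    def on_first_interrupt(self):
        """S202/S203: feed and decrement, or report the counters that block feeding."""
        expired = [name for name, value in self.counters.items() if value == 0]
        if expired:
            return expired  # no feeding; these counters cause the watchdog interrupt
        for name in self.counters:
            self.counters[name] -= 1  # one feeding, every counter decremented by one
        return []


watch = TaskWatch(feed_period_ms=20)
watch.task_started("fast", preset_ms=40)   # counter starts at 2
watch.task_started("slow", preset_ms=100)  # counter starts at 5
first = watch.on_first_interrupt()   # fed: counters become 1 and 4
second = watch.on_first_interrupt()  # fed: counters become 0 and 3
third = watch.on_first_interrupt()   # "fast" is at zero and still running
```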
The embodiment of the disclosure provides a system anomaly detection method, which comprises steps S301-S303.
Step S301, a first counter is generated when the system is powered on, an initial value of the first counter is set to be a second initial value, the first counter is canceled after the system is started, and the second initial value is determined based on the ratio of the preset system starting time to the dog feeding period of the window watchdog.
Step S302, during system start-up, in case the first interrupt is triggered and the value of the first counter is not zero, performing a feeding action on the window watchdog and triggering the first counter to decrement by one. Wherein the first interrupt is triggered by the second hardware clock according to a watchdog feeding period of the window watchdog.
Step S303, during system start-up, generating exception information characterizing system start-up timeout when the window watchdog generates an interrupt.
The embodiment of the disclosure provides a system anomaly detection method, which comprises steps S401-S403.
Step S401, generating a second counter at the beginning of system power-down and setting the initial value of the second counter as a third initial value, and canceling the second counter at the end of system power-down, wherein the third initial value is determined based on the ratio of the preset system power-down time and the dog feeding period of the window watchdog.
Step S402, during power-down of the system, in a case where the first interrupt is triggered and the value of the second counter is not zero, performing a feeding action on the window watchdog and triggering the second counter to decrement by one. Wherein the first interrupt is triggered by the second hardware clock according to a watchdog feeding period of the window watchdog.
Step S403, during the power-down period of the system, generating abnormal information characterizing the power-down timeout of the system when the window watchdog generates an interrupt.
For the relevant features of the system abnormality detection methods of the embodiments disclosed above, reference may be made to the corresponding description of the system abnormality detection device; they are not repeated here. In addition, the system abnormality detection methods of the embodiments disclosed above may be used in combination, which is not limited in this application.
An embodiment of the present disclosure provides a processor, including a system anomaly detection apparatus as described in any one of the preceding embodiments. Referring to fig. 8, the processor may further include a random access memory.
The embodiment of the disclosure provides an embedded system, which comprises the system abnormality detection device described in any one of the previous embodiments or the processor described in the previous embodiments. Referring to fig. 9, the embedded system may further include a hardware clock source and a nonvolatile memory, each of which is coupled to the processor.
This embodiment describes a computer program product comprising computer programs/instructions which when executed implement the steps in the system anomaly detection method according to any one of the embodiments of the present invention.
This embodiment describes a readable storage medium having stored thereon a program or instructions that when executed by a processor implement a system anomaly detection method according to any one of the embodiments of the present invention.
The methods in this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described herein are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user device, a core network device, an OAM, or other programmable apparatus.
The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, e.g., a floppy disk, hard disk, or tape; an optical medium, e.g., a digital video disc; or a semiconductor medium, e.g., a solid state disk. The computer readable storage medium may be a volatile or nonvolatile storage medium, or may include both volatile and nonvolatile types of storage medium.
The computer programs/instructions described herein may be downloaded from a computer readable storage medium to the individual computing/processing devices or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of the computer readable program instructions; the electronic circuitry can then execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (14)

1. A system anomaly detection device, characterized in that the device comprises a first detection module; the first detection module comprises a first hardware clock, a time recording module, a timeout detection module and a first abnormality recording module;
the time recording module is used for recording the starting time of the first task based on the first hardware clock, and determining the expected ending time of the first task according to the starting time of the first task and the first preset running time of the first task;
the timeout detection module is used for detecting whether the first task is finished or not when the first hardware clock counts to the expected finishing time of the first task, and triggering the first exception recording module to generate exception information representing that the first task is a timeout exception task under the condition that the first task is detected not to be finished.
2. The apparatus of claim 1, wherein the first hardware clock is a clock generated based on a system bus clock, the system bus clock being a bus clock generated based on a processor external hardware clock source.
3. The apparatus of claim 1, wherein the apparatus comprises a second detection module; the second detection module is used for detecting timeout of a plurality of second tasks based on an interrupt mechanism of a window watchdog of the system.
4. The apparatus of claim 3, wherein the second detection module comprises a second hardware clock, a dog feeding module, a counter setting module, and a second exception logging module;
the second hardware clock is used for triggering a first interrupt according to the feeding period of the window watchdog of the system;
the counter setting module is used for generating a corresponding task counter when a second task starts, setting an initial value of the task counter as a first initial value, and canceling the corresponding task counter when the second task ends, wherein the first initial value is determined based on a ratio of a second preset running time of the second task to the dog feeding period;
the dog feeding module is used for executing a dog feeding action on the window watchdog and triggering a task counter to be decremented by one under the condition that the first interrupt is triggered and the values of all task counters are not zero during the running period after the system is started;
the second exception recording module is used for generating exception information representing a task counter which causes the window watchdog to generate an interrupt when the window watchdog generates the interrupt during the running period after the system is started.
5. The apparatus of claim 4, wherein the second preset run time is greater than the first preset run time if the first task and the second task are the same target task;
the timeout detection module is further configured to trigger the counter setting module to cancel a task counter corresponding to the target task when the target task is detected not to be ended.
6. The apparatus of claim 1, wherein the apparatus comprises a third detection module; the third detection module is used for detecting whether the starting process of the system is overtime or not based on an interrupt mechanism of a window watchdog of the system.
7. The apparatus of claim 6, wherein the third detection module comprises a second hardware clock, a dog feeding module, a counter setting module, and a second exception logging module;
the second hardware clock is used for triggering a first interrupt according to the feeding period of the window watchdog of the system;
the counter setting module is used for generating a first counter when the system is powered on and setting an initial value of the first counter as a second initial value, and canceling the first counter after system start-up is completed, wherein the second initial value is determined based on the ratio of the preset system starting time to the dog feeding period;
the dog feeding module is used for executing a dog feeding action on the window watchdog and triggering the first counter to be decremented by one when the first interrupt is triggered and the value of the first counter is not zero during the system starting;
the second exception recording module is used for generating exception information representing system starting overtime when the window watchdog generates an interrupt during system starting.
8. The apparatus of claim 1, wherein the apparatus comprises a fourth detection module; the fourth detection module is used for detecting whether the power-down process of the system is overtime or not based on an interrupt mechanism of a window watchdog of the system.
9. The apparatus of claim 8, wherein the fourth detection module comprises a second hardware clock, a dog feeding module, a counter setting module, and a second exception logging module;
the second hardware clock is used for triggering a first interrupt according to the feeding period of the window watchdog of the system;
the counter setting module is used for generating a second counter when system power-down starts, setting the initial value of the second counter as a third initial value, and canceling the second counter when system power-down ends, wherein the third initial value is determined based on the ratio of the preset system power-down time to the dog feeding period;
the dog feeding module is used for executing a dog feeding action on the window watchdog and triggering the second counter to decrease by one when the first interrupt is triggered and the value of the second counter is not zero during the power-down period of the system;
and the second exception recording module is used for generating exception information representing power-down timeout of the system when the window watchdog generates an interrupt during power-down of the system.
10. The apparatus of any of claims 4, 7, 9, wherein the second hardware clock is a clock generated based on a system bus clock, the system bus clock being a bus clock generated based on a processor external hardware clock source.
11. The apparatus of claim 1, wherein the apparatus comprises an anomaly information storage module;
the exception information storage module is to store exception information to a target memory area in a random access memory of a processor of the system, the target memory area configured not to be cleaned at a warm start of the processor of the system.
12. The apparatus of claim 11, wherein the apparatus comprises a storage management module;
the storage management module is used for backing up the exception information in the target storage area into a nonvolatile memory of the system after the processor of the system is restarted.
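The storage scheme of claims 11 and 12 can be sketched in C as follows. This is only an illustration under stated assumptions: both the RAM target storage area and the nonvolatile memory are modelled as ordinary `exc_log_t` structures, the magic-number validity check is a technique chosen here and not taken from the claims, and all names (`exc_log_t`, `exc_log_append`, `exc_log_backup_after_restart`, `LOG_MAGIC`, `LOG_MAX`) are hypothetical. On a real target the RAM structure would be placed in a region that the startup code does not zero, which is toolchain-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAGIC 0x45584C47u /* arbitrary marker, hypothetical */
#define LOG_MAX   8u          /* assumed capacity of the target storage area */

typedef struct {
    uint32_t code;      /* which exception occurred */
    uint32_t timestamp; /* when it was recorded */
} exc_record_t;

typedef struct {
    uint32_t     magic; /* LOG_MAGIC if the area holds a valid log */
    uint32_t     count; /* number of valid records */
    exc_record_t records[LOG_MAX];
} exc_log_t;

/* Append one record to an initialised log; fails if invalid or full. */
bool exc_log_append(exc_log_t *log, uint32_t code, uint32_t timestamp)
{
    if (log->magic != LOG_MAGIC || log->count >= LOG_MAX) {
        return false;
    }
    log->records[log->count].code = code;
    log->records[log->count].timestamp = timestamp;
    log->count++;
    return true;
}

/*
 * Called once after every restart. If the RAM area survived a warm start
 * with valid content, copy it to the (simulated) nonvolatile memory.
 * The RAM area is then re-initialised. Returns the number of records
 * that were backed up.
 */
uint32_t exc_log_backup_after_restart(exc_log_t *ram, exc_log_t *nvm)
{
    uint32_t backed_up = 0u;

    if (ram->magic == LOG_MAGIC && ram->count <= LOG_MAX && ram->count > 0u) {
        memcpy(nvm, ram, sizeof *nvm);
        backed_up = ram->count;
    }
    memset(ram, 0, sizeof *ram);
    ram->magic = LOG_MAGIC;
    return backed_up;
}
```

After a cold start the RAM area holds arbitrary data, the magic check fails, and nothing is backed up. After a warm start the records written before the reset are still present and are copied out before the area is reused.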
13. A processor comprising the system abnormality detection device of any one of claims 1 to 12.
14. An embedded system comprising the system abnormality detection device of any one of claims 1 to 12 or the processor of claim 13.
CN202410073942.6A 2024-01-18 System abnormality detection device, processor and embedded system Active CN117591347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410073942.6A CN117591347B (en) 2024-01-18 System abnormality detection device, processor and embedded system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410073942.6A CN117591347B (en) 2024-01-18 System abnormality detection device, processor and embedded system

Publications (2)

Publication Number Publication Date
CN117591347A true CN117591347A (en) 2024-02-23
CN117591347B CN117591347B (en) 2024-04-26

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09244923A (en) * 1996-03-11 1997-09-19 Hitachi Ltd Abnormality monitoring device using watchdog timer
CN102339029A (en) * 2011-06-30 2012-02-01 电子科技大学 Method for realizing timing protection of embedded operating system
CN102761439A (en) * 2012-06-13 2012-10-31 烽火通信科技股份有限公司 Device and method for detecting and recording abnormity on basis of watchdog in PON (Passive Optical Network) access system
CN103885847A (en) * 2014-02-08 2014-06-25 京信通信系统(中国)有限公司 Dog feeding method and device based on embedded system
US20230089576A1 (en) * 2021-09-23 2023-03-23 Apple Inc. Datalogging Circuit Triggered by a Watchdog Timer

Similar Documents

Publication Publication Date Title
EP3660681B1 (en) Memory fault detection method and device, and server
CN105988884B (en) Method and apparatus for controlling watchdog
TW201235840A (en) Error management across hardware and software layers
US20150143052A1 (en) Managing faulty memory pages in a computing system
US20200033928A1 (en) Method of periodically recording for events
CN110109741B (en) Method and device for managing circular tasks, electronic equipment and storage medium
US20120105112A1 (en) Method and Apparatus for Providing System Clock Failover
CN104320308A (en) Method and device for detecting anomalies of server
US9389942B2 (en) Determine when an error log was created
JP2016224883A (en) Fault detection method, information processing apparatus, and fault detection program
CN112631820A (en) Fault recovery method and device of software system
CN109302445A (en) Host node state determines method, apparatus, host node and storage medium
US11023335B2 (en) Computer and control method thereof for diagnosing abnormality
US9513983B2 (en) Method for maintaining file system of computer system
CN117591347B (en) System abnormality detection device, processor and embedded system
CN100507866C (en) CPU suppression system and CPU suppression method using service processor
CN117591347A (en) System abnormality detection device, processor and embedded system
US8099637B2 (en) Software fault detection using progress tracker
CN107179911B (en) Method and equipment for restarting management engine
CN113127245B (en) Method, system and device for processing system management interrupt
JP5627414B2 (en) Action log collection system and program
CN110865906B (en) Motor initial position angle storage method and device, vehicle and storage medium
JPH11259340A (en) Reactivation control circuit for computer
CN102934090A (en) Device and method for restoring information in a main storage device
WO2014112039A1 (en) Information processing device, method for controlling information processing device and information processing device control program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant