CN116048861A - Multi-level watchdog design method, device, equipment and storage medium


Info

Publication number: CN116048861A
Authority: CN (China)
Application number: CN202310076336.5A
Legal status: Pending (as listed; an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 唐仕斌, 陈淑武, 彭府, 吴世川, 邱梓捷
Current assignee: XIAMEN FOUR-FAITH COMMUNICATION TECHNOLOGY CO LTD
Original assignee: XIAMEN FOUR-FAITH COMMUNICATION TECHNOLOGY CO LTD
Application filed by XIAMEN FOUR-FAITH COMMUNICATION TECHNOLOGY CO LTD
Priority to CN202310076336.5A
Publication of CN116048861A

Classifications

    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F9/4887 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G06F2209/548 Queue
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a multi-level watchdog design method, device, equipment and storage medium. The method comprises: starting a thread watchdog module, which periodically feeds the process watchdog program, detects the process feedback signal and the current state of each thread, and handles the thread according to that state; starting a process watchdog program, which periodically feeds the kernel, detects the kernel feedback signal and the current state of each process, and handles the process according to that state; starting a kernel watchdog driver, which periodically feeds the hardware, detects the hardware feedback signal and the current state of the kernel, and handles the kernel according to that state; and starting a hardware watchdog circuit, which detects whether the kernel is feeding it and handles the embedded system according to that feeding state. This addresses the problem that existing watchdog designs lack fine-grained exception handling, while extending monitoring to individual threads tends to congest a single feeding channel, so that threads are not fed in time and their feeding times out.

Description

Multi-level watchdog design method, device, equipment and storage medium
Technical Field
The invention relates to the field of watchdogs, and in particular to a multi-level watchdog design method, device, equipment and storage medium.
Background
Given the complexity and reliability requirements of embedded systems currently on the market, important services are generally expected to recover from an exception by restarting the service, restarting the system, or resetting the hardware. Embedded Linux systems provide a standard watchdog driver framework: the kernel uses the driver to periodically reset (feed) an external hardware watchdog circuit. If the kernel stops running abnormally, feeding stops, and once the hardware watchdog circuit has gone unfed for a certain time, it resets and restarts the whole system through a chip reset pin or a power supply reset pin. The framework also exposes a /dev/watchdog character device: once an application opens the device, it must feed the kernel periodically through it, and if the application exits abnormally, the kernel timer times out and triggers a system restart. In OpenWrt, a current Linux distribution, the procd service additionally runs at the application layer, monitors application software, and restarts an application when it exits abnormally, thereby providing a watchdog function for applications.
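The standard Linux interface described above can be exercised from user space roughly as follows. This is a minimal sketch, assuming a driver that implements the generic Linux watchdog API: opening the device arms the timer, any write counts as a keepalive, and writing the character 'V' just before closing (the "magic close") disarms the timer on drivers that support it, unless the kernel was built with CONFIG_WATCHDOG_NOWAYOUT. The function name and default interval are illustrative.

```python
import os
import time


def feed_watchdog(dev_path: str = "/dev/watchdog", kicks: int = 3,
                  interval_s: float = 1.0, disarm_on_exit: bool = True) -> int:
    """Kick a Linux watchdog character device `kicks` times, then close it.

    Returns the number of keepalive writes performed.
    """
    fd = os.open(dev_path, os.O_WRONLY)   # opening the device arms the timer
    written = 0
    try:
        for i in range(kicks):
            os.write(fd, b"\0")           # any write is treated as a keepalive
            written += 1
            if i + 1 < kicks:
                time.sleep(interval_s)    # wait one feeding period
        if disarm_on_exit:
            os.write(fd, b"V")            # magic close: disarm, if supported
    finally:
        os.close(fd)
    return written
```

If the process dies before the magic close, the timer keeps running and the system restarts when it expires, which is the behavior the Background relies on.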
However, for exceptions in application-layer programs, the Linux watchdog driver generally restarts the whole system directly, although many exceptions only require restarting the application rather than the system. The Linux watchdog design also treats the external watchdog circuit as a trusted environment, yet the hardware watchdog circuit can itself fail, and when it does there is no corresponding warning or handling. Process monitoring by procd only responds to a process exiting abnormally: it does not handle a hung (deadlocked) application, and it cannot handle exceptions in key threads of a multithreaded application. Furthermore, the process daemon and the Linux watchdog are independent of each other, so once procd itself fails, exception handling for application-layer programs is immediately lost; and even if procd is hooked to the watchdog driver by some technical means, the problem of key threads in multithreaded programs remains unsolved.
In short, the two common watchdog designs in Linux systems, the watchdog driver and the procd process, each have advantages but are independent of each other, and both take the process as the smallest unit of exception monitoring, so they lack fine-grained exception handling. However, if exception monitoring is refined directly down to threads, the feeding operations of a large number of threads can easily congest a single feeding channel, so that threads are not fed in time and their feeding times out. In addition, exceptions in these designs propagate in one direction only, and the upper-level watchdog is treated as a trusted environment: if the upper level itself fails, no exception handling can be executed, and the system can only wait for the final hardware reset or system restart. An application or sub-thread that cannot go through a normal exit procedure may damage the external hardware it controls or cause safety problems, and data may be lost or corrupted.
The present application is made in view of this.
Disclosure of Invention
In view of the above, the present invention aims to provide a multi-level watchdog design method, device, equipment and storage medium that effectively solve the following problem of the prior art: watchdog designs monitor exceptions with the process as the smallest unit and lack fine-grained exception handling, whereas refining exception monitoring directly to threads tends to congest a single feeding channel, so that threads are not fed in time and their feeding times out.
The invention discloses a multi-level watchdog design method, which comprises the following steps:
starting a thread watchdog module embedded in an application program, periodically feeding the process watchdog program through a first timed task of the thread watchdog module, detecting the feedback signal of the process and the current state of each thread, and restarting the thread according to its current state;
starting a process watchdog program running in user mode, periodically feeding the kernel watchdog driver through a second timed task of the process watchdog program, detecting the feedback signal of the kernel and the current state of each process, and restarting the process according to its current state;
starting a kernel watchdog driver running in kernel mode, periodically feeding the hardware watchdog circuit through a third timed task of the kernel watchdog driver, detecting the feedback signal of the hardware and the current state of the kernel, and restarting the kernel according to its current state;
and starting a hardware watchdog circuit, detecting the feeding state of the kernel through the hardware watchdog circuit, and resetting the embedded system according to that feeding state.
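The four levels form an escalation ladder: a fault that a finer level cannot clear is handed to the next, coarser one. The following is a simplified model, assuming a fixed number of attempts per level and treating the hardware reset as the last resort that always takes effect; the action names, `escalate` and `max_attempts` are hypothetical and do not come from the patent.

```python
from typing import Callable, List, Tuple

# Recovery actions ordered from finest to coarsest, one per watchdog level.
LEVELS: List[str] = ["restart_thread", "restart_process",
                     "restart_system", "hardware_reset"]


def escalate(try_recover: Callable[[str, int], bool],
             max_attempts: int = 3) -> List[Tuple[str, int]]:
    """Try each level's recovery up to `max_attempts` times before escalating.

    `try_recover(level, attempt)` reports whether that attempt cleared the
    fault. Returns the (action, attempt) pairs executed, in order.
    """
    log: List[Tuple[str, int]] = []
    for level in LEVELS:
        last_resort = level == LEVELS[-1]
        attempts = 1 if last_resort else max_attempts
        for attempt in range(1, attempts + 1):
            log.append((level, attempt))
            # The hardware reset is assumed to always bring the system back.
            if last_resort or try_recover(level, attempt):
                return log
    return log
```

For example, a thread that hangs repeatedly but whose process recovers on the second restart produces three thread restarts followed by two process restarts, and the system is never rebooted.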
Preferably, starting the thread watchdog module embedded in the application program, periodically feeding the process watchdog program through the first timed task of the thread watchdog module, and detecting the feedback signal of the process specifically comprise:
after the application program starts, registering the process in which the thread watchdog module runs with the process watchdog program, and controlling the thread watchdog module to send feeding signals to the process watchdog program;
and when the first ACK signal returned by the process watchdog program has not been received more than a preset number of times, sending an exception signal to the threads, instructing them to save their data, and exiting the process.
Preferably, detecting the feedback signal of the process and the current state of each thread, and restarting the thread according to its current state, specifically comprise:
generating a daemon thread queue corresponding to the thread watchdog module;
when a thread starts, registering its thread ID in the daemon thread queue through a message queue, and periodically updating the thread's running time in the daemon thread queue;
continuously monitoring the running time of the threads in the daemon thread queue;
restarting a thread when it is determined, from its running time in the daemon thread queue, that the thread has hung or exited abnormally;
and when it is determined that the thread has failed to restart more than a preset number of times, exiting the process and sending an exception signal to the process watchdog program.
Preferably, starting the process watchdog program running in user mode, periodically feeding the kernel through the second timed task of the process watchdog program, and detecting the feedback signal of the kernel specifically comprise:
after the file system starts, controlling the process watchdog program to send feeding signals to the kernel watchdog driver;
and when it is determined that the second ACK signal returned by the kernel watchdog driver has not been received more than a preset number of times, sending a SIGINT signal to each daemon process, instructing it to execute its normal exit flow.
Preferably, detecting the feedback signal of the kernel and the current state of each process, and restarting the process according to its current state, specifically comprise:
registering a PIPE communication pipeline in the file system, and generating a daemon process queue corresponding to the process watchdog program;
when a daemon process starts, sending its process number through the PIPE communication pipeline to register it in the daemon process queue, and periodically updating the daemon process's running time through the PIPE communication pipeline;
continuously monitoring the running time of the daemon processes in the daemon process queue;
restarting a process when it is determined, from its running time in the daemon process queue, that the process has hung or exited abnormally;
and when it is determined that the process has failed to restart more than a preset number of times, exiting the process and sending an exception signal to the kernel watchdog driver.
Preferably, starting the kernel watchdog driver running in kernel mode, periodically feeding the hardware through the third timed task of the kernel watchdog driver, and detecting the feedback signal of the hardware specifically comprise:
after the kernel starts, controlling the kernel watchdog driver to send feeding signals to the hardware watchdog circuit;
when it is determined that the third ACK signal returned by the hardware watchdog circuit has not been received more than a preset number of times, sending a SIGIO signal to the application layer;
after catching the SIGIO signal, the application program issues a hardware damage warning.
Preferably, detecting the feedback signal of the hardware and the current state of the kernel, and restarting the kernel according to its current state, specifically comprise:
generating a character device corresponding to the kernel watchdog driver, and starting a timeout counter;
the process watchdog program periodically resets the timeout counter through the character device;
when it is determined from the timeout counter that the process watchdog program has hung or exited abnormally, incrementing by 1 an exception count stored in a power-down holding area;
when it is determined that the exception count exceeds a preset value within a preset time, stopping the reset (feeding) operation of the hardware watchdog circuit;
and performing a soft system restart when no hardware reset occurs within the preset time.
The invention also discloses a multi-level watchdog design device, comprising:
a thread watchdog unit, configured to start a thread watchdog module embedded in an application program, periodically feed the process watchdog program through a first timed task of the thread watchdog module, detect the feedback signal of the process and the current state of each thread, and restart the thread according to its current state;
a process watchdog unit, configured to start a process watchdog program running in user mode, periodically feed the kernel watchdog driver through a second timed task of the process watchdog program, detect the feedback signal of the kernel and the current state of each process, and restart the process according to its current state;
a kernel watchdog unit, configured to start a kernel watchdog driver running in kernel mode, periodically feed the hardware watchdog circuit through a third timed task of the kernel watchdog driver, detect the feedback signal of the hardware and the current state of the kernel, and restart the kernel according to its current state;
and a hardware watchdog unit, configured to start the hardware watchdog circuit, detect the feeding state of the kernel through the hardware watchdog circuit, and reset the embedded system according to that feeding state.
The invention also discloses multi-level watchdog design equipment, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the multi-level watchdog design method according to any one of the above when executing the computer program.
The invention also discloses a readable storage medium storing a computer program, the computer program being executable by a processor of the equipment in which the storage medium is located to implement the multi-level watchdog design method according to any one of the above.
In summary, the multi-level watchdog design method, device, equipment and storage medium provided by this embodiment benefit from the multi-level design: exceptions are handled hierarchically, escalating from restarting a thread, to restarting a process, to restarting the system, and finally to resetting the system hardware, so that exceptions are handled faster while system safety is preserved. The method detects and responds to errors in the hardware, kernel, application processes and service threads of the embedded system, applies fine-grained exception handling according to the error type, and interlocks the watchdog levels so that signals propagate in both directions between them. This solves the prior-art problem that watchdog designs monitor exceptions with the process as the smallest unit and lack fine-grained exception handling, while refining monitoring directly to threads tends to congest a single feeding channel, so that threads are not fed in time and their feeding times out.
Drawings
Fig. 1 is a schematic flow chart of a multi-level watchdog design method according to a first aspect of the present invention.
Fig. 2 is a schematic flow chart of a multi-level watchdog design method according to a second aspect of the present invention.
Fig. 3 is a schematic flow diagram of a thread watchdog module of a multi-level watchdog design method according to an embodiment of the present invention.
Fig. 4 is a schematic flow chart of a process watchdog program of a multi-level watchdog design method according to an embodiment of the present invention.
Fig. 5 is a schematic flow diagram of a kernel watchdog driver of a multi-level watchdog design method according to an embodiment of the present invention.
Fig. 6 is a schematic flow diagram of a hardware watchdog circuit of a multi-level watchdog design method according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of a multi-level watchdog design device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following detailed description of the embodiments shown in the figures is therefore not intended to limit the claimed scope of the invention, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to Figs. 1 to 3, a first embodiment of the present invention provides a multi-level watchdog design method, which may be executed by a watchdog design device (hereinafter, the watchdog device), and in particular by one or more processors in the watchdog device, to implement the following steps:
S101, starting a thread watchdog module embedded in an application program, periodically feeding the process watchdog program through a first timed task of the thread watchdog module, detecting the feedback signal of the process and the current state of each thread, and restarting the thread according to its current state;
specifically, step S101 includes: after the application program starts, registering the process in which the thread watchdog module runs with the process watchdog program, and controlling the thread watchdog module to send feeding signals to the process watchdog program;
when the first ACK signal returned by the process watchdog program has not been received more than a preset number of times, sending an exception signal to the threads, instructing them to save their data, and exiting the process;
generating a daemon thread queue corresponding to the thread watchdog module;
when a thread starts, registering its thread ID in the daemon thread queue through a message queue, and periodically updating the thread's running time in the daemon thread queue;
continuously monitoring the running time of the threads in the daemon thread queue;
restarting a thread when it is determined, from its running time in the daemon thread queue, that the thread has hung or exited abnormally;
and when it is determined that the thread has failed to restart more than a preset number of times, exiting the process and sending an exception signal to the process watchdog program.
It should be noted that the watchdog device may be a remote management platform that communicates with the intelligent gateway over a wired or wireless connection; the remote management platform may be a smartphone, a computer, or another smart device.
As described in the Background, the Linux watchdog driver and procd are independent of each other and monitor exceptions at process granularity, exceptions propagate in one direction only toward an upper level that is assumed to be trustworthy, and refining monitoring directly to threads risks congesting a single feeding channel so that thread feeding times out.
Specifically, in this embodiment, feeding at the thread level is implemented with mutex locks and a custom context data structure. The thread watchdog module is embedded in the application program; after the application program starts, its process is registered with the process watchdog program, and the thread watchdog module periodically feeds the process watchdog program, which resets the process's timeout count Cp.
In this embodiment, after the thread watchdog module starts, it maintains a daemon thread queue. When a thread starts, it registers its tid (thread ID) in the queue through a message queue and receives an ACK response carrying a unique registration number It; thereafter it periodically sends It, which updates its running time in the queue and resets its running count Ct. When the thread exits normally, it is removed from the queue. Meanwhile, the thread watchdog module continuously increments the running count Ct of each thread in the queue and checks its running time. If a thread hangs or exits abnormally, its running time in the queue stops updating and its Ct keeps increasing. When the module detects that the running time has not been updated for a certain period, or that Ct exceeds the set value Cto, it tries to restart the thread; if the number of restarts exceeds Eto, the process exits and reports the exception to the process watchdog program.
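The daemon thread queue described above can be modeled as follows. This sketch assumes that the module's timer calls `tick()` once per period and that a restarted thread keeps feeding with its original registration number It. Only the Ct/Cto check is modeled (the running-time check is omitted), and the class, method and callback names are hypothetical. The lock mirrors the mutex-based design of the thread watchdog module.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    tid: int
    ct: int = 0        # running count Ct, incremented every watchdog period
    restarts: int = 0  # restart attempts made for this thread


class ThreadWatchdog:
    def __init__(self, cto: int, eto: int, restart: Callable[[int], None]) -> None:
        self.cto = cto                       # limit Cto for Ct
        self.eto = eto                       # limit Eto for restart attempts
        self.restart = restart               # hypothetical restart hook
        self.lock = threading.Lock()         # mutex protecting the queue
        self.queue: Dict[int, Entry] = {}    # registration number It -> entry
        self._next_it = itertools.count(1)

    def register(self, tid: int) -> int:
        with self.lock:
            it = next(self._next_it)
            self.queue[it] = Entry(tid)
            return it                        # handed back to the thread with the ACK

    def feed(self, it: int) -> bool:
        with self.lock:
            entry = self.queue.get(it)
            if entry is None:
                return False                 # unknown It: no ACK
            entry.ct = 0                     # feeding resets Ct
            return True                      # ACK

    def unregister(self, it: int) -> None:
        with self.lock:
            self.queue.pop(it, None)         # normal thread exit

    def tick(self) -> str:
        """One period of the first timed task: returns 'ok' or 'exit_process'."""
        to_restart: List[int] = []
        with self.lock:
            for entry in self.queue.values():
                entry.ct += 1
                if entry.ct > self.cto:      # thread hung or exited abnormally
                    entry.restarts += 1
                    if entry.restarts > self.eto:
                        return "exit_process"  # escalate to the process watchdog
                    entry.ct = 0
                    to_restart.append(entry.tid)
        for tid in to_restart:
            self.restart(tid)                # called outside the lock so it may re-register
        return "ok"
```

Calling the restart hook after releasing the lock avoids a self-deadlock if the restarted thread immediately registers again.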
In addition, the thread watchdog module expects an ACK each time it feeds the process watchdog program; when the number of missed ACKs exceeds the preset value At, it signals the threads to save their data and execute their exit flow, in preparation for the subsequent restart by the hardware watchdog circuit or the kernel watchdog driver.
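The ACK-miss rule can be sketched with a shared event that business threads watch. This assumes that only consecutive misses count toward At (any ACK resets the count) and uses a Python `threading.Event` in place of whatever signal the module actually sends; `ProcessFeedChannel` and `business_thread` are hypothetical names.

```python
import threading
from typing import List


class ProcessFeedChannel:
    """Thread watchdog module side of the feed to the process watchdog."""

    def __init__(self, at: int) -> None:
        self.at = at                        # preset value At
        self.missed = 0                     # consecutive missed ACKs
        self.shutdown = threading.Event()   # stands in for the exit signal

    def on_feed_result(self, ack_received: bool) -> None:
        if ack_received:
            self.missed = 0                 # an ACK proves the upper level is alive
            return
        self.missed += 1
        if self.missed > self.at:
            self.shutdown.set()             # tell threads to save data and exit


def business_thread(channel: ProcessFeedChannel, saved: List[str]) -> None:
    # Work loop that wakes every 10 ms to check for the shutdown request.
    while not channel.shutdown.wait(timeout=0.01):
        pass                                # ... real work would go here ...
    saved.append("state flushed")           # stand-in for persisting data
```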
S102, starting a process watchdog program running in user mode, periodically feeding the kernel watchdog driver through a second timed task of the process watchdog program, detecting the feedback signal of the kernel and the current state of each process, and restarting the process according to its current state;
specifically, step S102 includes: after the file system starts, controlling the process watchdog program to send feeding signals to the kernel watchdog driver;
when it is determined that the second ACK signal returned by the kernel watchdog driver has not been received more than a preset number of times, sending a SIGINT signal to each daemon process, instructing it to execute its normal exit flow;
registering a PIPE communication pipeline in the file system, and generating a daemon process queue corresponding to the process watchdog program;
when a daemon process starts, sending its process number through the PIPE communication pipeline to register it in the daemon process queue, and periodically updating the daemon process's running time through the PIPE communication pipeline;
continuously monitoring the running time of the daemon processes in the daemon process queue;
restarting a process when it is determined, from its running time in the daemon process queue, that the process has hung or exited abnormally;
and when it is determined that the process has failed to restart more than a preset number of times, exiting the process and sending an exception signal to the kernel watchdog driver.
Referring to Fig. 4, specifically, in this embodiment, the process watchdog program is fed by the daemon processes through PIPE (pipe) communication; the process watchdog program runs in user mode and, after the system starts, periodically feeds the kernel watchdog driver.
In this embodiment, after the process watchdog program starts, it registers a PIPE communication pipeline in the file system and generates a daemon process queue. When a daemon process starts, it sends its own PID (process number) through the pipe to register in the daemon process queue and obtains a unique registration number Ip; thereafter it periodically sends Ip through the pipe, which updates its running time, resets its self-increment count Cp, and returns an ACK response. Meanwhile, the process watchdog program continuously checks the running time and running count of each process in the queue. If a process hangs or exits abnormally, the self-increment count Cp of the corresponding process in the queue keeps increasing; when Cp exceeds the set value Cpo, or the running time of the corresponding process number has not been updated for a certain time, the process watchdog program tries to restart the process. If the number of restarts exceeds Epo, the process is exited and the exception is reported to the kernel watchdog driver.
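The registration and feeding exchange over the pipe might look like the sketch below. The text message format ("REG <pid>", "FEED <Ip>", "ACK", "NAK") is an assumption, as are all names. A real deployment would more likely create a named FIFO in the file system with `os.mkfifo` and give each daemon its own reply channel, whereas this self-contained version uses two anonymous `os.pipe()` pairs inside one process. The Cp/Cpo/Epo scan is not modeled here.

```python
import os
from typing import Dict


def read_line(fd: int) -> str:
    """Read one newline-terminated message from a pipe, one byte at a time."""
    buf = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b or b == b"\n":
            return buf.decode()
        buf += b


def write_line(fd: int, msg: str) -> None:
    os.write(fd, (msg + "\n").encode())


class ProcessWatchdogPipe:
    """Process watchdog side: keeps Ip -> {pid, Cp} and answers requests."""

    def __init__(self) -> None:
        self.queue: Dict[int, Dict[str, int]] = {}
        self.next_ip = 1

    def handle(self, msg: str) -> str:
        parts = msg.split()
        if len(parts) != 2 or not parts[1].isdigit():
            return "NAK"
        kind, value = parts[0], int(parts[1])
        if kind == "REG":
            ip = self.next_ip
            self.next_ip += 1
            self.queue[ip] = {"pid": value, "cp": 0}
            return f"ACK {ip}"                 # unique registration number Ip
        if kind == "FEED" and value in self.queue:
            self.queue[value]["cp"] = 0        # feeding resets Cp
            return "ACK"
        return "NAK"

    def serve_one(self, req_fd: int, resp_fd: int) -> None:
        # Answer exactly one request read from the request pipe.
        write_line(resp_fd, self.handle(read_line(req_fd)))
```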
Each time the process watchdog program feeds the kernel watchdog driver, it expects an ACK signal; if the number of missed ACKs exceeds the preset value Ap, it sends a SIGINT signal to each daemon process, instructing it to execute its normal exit flow in preparation for the subsequent restart by the hardware watchdog circuit.
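The reaction to missed kernel ACKs can be sketched as below, assuming the daemon PIDs come from the process watchdog's queue and that a daemon which has already exited is simply skipped. The `send` parameter defaults to `os.kill` so the logic can be exercised without signalling real processes; the function name is hypothetical.

```python
import os
import signal
from typing import Callable, Iterable, List


def on_kernel_ack_missed(missed: int, ap: int, daemon_pids: Iterable[int],
                         send: Callable[[int, int], None] = os.kill) -> List[int]:
    """If more than `ap` ACKs from the kernel watchdog driver were missed, send
    SIGINT to every registered daemon so each can run its normal exit flow
    before the hardware watchdog resets the board. Returns the PIDs signalled."""
    if missed <= ap:
        return []
    signalled: List[int] = []
    for pid in daemon_pids:
        try:
            send(pid, signal.SIGINT)
            signalled.append(pid)
        except ProcessLookupError:
            pass                        # daemon already gone; nothing to notify
    return signalled
```

On the receiving side, a daemon would typically install a SIGINT handler (for example with `signal.signal`) that flushes its state before exiting.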
S103, starting a kernel watchdog driver running in kernel mode, periodically feeding the hardware watchdog circuit through a third timed task of the kernel watchdog driver, detecting the feedback signal of the hardware and the current state of the kernel, and restarting the kernel according to its current state;
specifically, step S103 includes: after the kernel starts, controlling the kernel watchdog driver to send feeding signals to the hardware watchdog circuit;
when it is determined that the third ACK signal returned by the hardware watchdog circuit has not been received more than a preset number of times, sending a SIGIO signal to the application layer;
after catching the SIGIO signal, the application program issues a hardware damage warning;
generating a character device corresponding to the kernel watchdog driver, and starting a timeout counter;
the process watchdog program periodically resets the timeout counter through the character device;
when it is determined from the timeout counter that the process watchdog program has hung or exited abnormally, incrementing by 1 an exception count stored in a power-down holding area;
when it is determined that the exception count exceeds a preset value within a preset time, stopping the reset (feeding) operation of the hardware watchdog circuit;
and performing a soft system restart when no hardware reset occurs within the preset time.
Referring to fig. 5, specifically, in this embodiment the kernel watchdog driver implements a character-device-based watchdog. The driver runs in kernel mode, and after the kernel starts, it feeds the hardware watchdog circuit through an output pin.
In this embodiment, after the kernel watchdog driver is loaded, it creates the character device /dev/ff_wtd and starts a timeout counter that increments a count Ck. The user-mode process watchdog program uses ioctl on this character device to periodically reset the timeout counter Ck in the kernel watchdog driver. If the process watchdog program hangs or exits abnormally, the counter is no longer reset and Ck keeps increasing. When Ck exceeds a set value Cko, the exception count Ek stored in the power-down retention area is incremented by 1. If Ek exceeds a set value Eko within the set time, the driver stops feeding the hardware watchdog circuit and waits for the hardware reset; otherwise, the system performs a soft restart.
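The Ck/Ek decision in the kernel watchdog driver can be modelled in user-space C as below. This is a hedged sketch, not driver code: in a real driver the tick would come from a kernel timer, the reset would arrive through the driver's ioctl handler, and Ek would live in the power-down retention area rather than in a plain struct field. The names, the way the Eko time window is tracked, and the reading of "otherwise" as a soft restart while Ek has not yet exceeded Eko are assumptions.

```c
#include <time.h>

typedef enum { KWD_OK, KWD_SOFT_RESTART, KWD_STOP_FEEDING } kwd_action;

typedef struct {
    unsigned ck, cko;      /* timeout counter Ck and its limit Cko */
    unsigned ek, eko;      /* exception count Ek (persistent in practice) and limit Eko */
    time_t window_start;   /* start of the current Ek counting window */
    time_t window_len;     /* the "set time" within which Ek is compared to Eko */
} kwd_state;

/* ioctl feed from the process watchdog program: reset Ck. */
void kwd_feed(kwd_state *s) { s->ck = 0; }

/* One timer tick of the driver. */
kwd_action kwd_tick(kwd_state *s, time_t now) {
    if (++s->ck <= s->cko)
        return KWD_OK;                     /* process watchdog still alive */
    s->ck = 0;
    if (s->ek == 0 || now - s->window_start > s->window_len) {
        s->window_start = now;             /* open a new counting window */
        s->ek = 0;
    }
    s->ek++;
    /* Too many exceptions in the window: stop feeding, let hardware reset. */
    return s->ek > s->eko ? KWD_STOP_FEEDING : KWD_SOFT_RESTART;
}
```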
After the kernel watchdog driver feeds the hardware watchdog, it determines whether the hardware watchdog circuit is working normally from the level toggling on an input pin. If the toggle signal is not received on the input pin, the hardware circuit is abnormal; the kernel watchdog driver then sends a SIGIO signal to the application layer, and after the application program captures the signal, it warns the user of hardware damage by email or an indicator light.
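On the application side, receiving the SIGIO warning could look like the sketch below. It assumes the driver notifies user space through the standard Linux asynchronous-notification mechanism (the application sets itself as owner with `F_SETOWN` and enables `O_ASYNC` on the device file descriptor); whether the /dev/ff_wtd driver actually supports this is not stated in the text, and the function names are hypothetical. The handler only sets a flag; the email or indicator-light alert would be raised from the main loop, because that work is not async-signal-safe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler, polled by the application's main loop. */
static volatile sig_atomic_t hw_fault_flag = 0;

static void on_sigio(int signo) {
    (void)signo;
    hw_fault_flag = 1;   /* async-signal-safe: only record the event */
}

/* Install the SIGIO handler; returns 0 on success. */
int install_hw_alarm_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGIO, &sa, NULL);
}

/* Ask the kernel to deliver SIGIO for `fd` to this process
 * (assumes the driver behind fd implements asynchronous notification). */
int enable_async_notify(int fd) {
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```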
S104: starting a hardware watchdog circuit, detecting the feeding state of the kernel through the hardware watchdog circuit, and resetting the embedded system according to the feeding state of the kernel.
Referring to fig. 6, specifically, in this embodiment the hardware watchdog circuit implements an IO-pin-based watchdog. It uses a common watchdog chip such as the SP706SEN; the WDI pin is connected to an output pin of the main control chip, and the WDO pin is connected to an input pin of the main control chip and to the system reset circuit.
In this embodiment, when the hardware watchdog circuit is working, each rising and falling edge received on WDI produces a corresponding rising and falling edge on the WDO pin, which is fed back to the main control chip to indicate that both the circuit and the feeding signal are valid. If the kernel watchdog driver abnormally stops feeding, WDO is held low after the circuit's timeout Th, resetting the system; this provides the system's hardware reset function. If the hardware watchdog circuit itself fails, the WDO pull-up resistor keeps the pin permanently high, so the kernel watchdog driver can detect that the circuit is abnormal.
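The feedback check described above amounts to looking for an edge in the sampled WDO level. The following sketch assumes the kernel watchdog driver samples its input pin several times per feeding period into an array; the enum and function names are hypothetical.

```c
#include <stddef.h>

typedef enum {
    WDO_HEALTHY,     /* WDO followed the WDI edges: circuit and feed are valid */
    WDO_STUCK_HIGH,  /* no edges, pulled high: hardware watchdog circuit fault */
    WDO_HELD_LOW     /* no edges, held low: reset asserted after timeout Th */
} wdo_state;

/* Classify WDO levels (0 = low, nonzero = high) sampled on the main-control
 * input pin during one feeding period. An empty window counts as no feedback. */
wdo_state classify_wdo(const int *levels, size_t n) {
    for (size_t i = 1; i < n; i++)
        if ((levels[i] != 0) != (levels[i - 1] != 0))
            return WDO_HEALTHY;          /* an edge was observed */
    return (n > 0 && levels[0] == 0) ? WDO_HELD_LOW : WDO_STUCK_HIGH;
}
```

A driver would count consecutive `WDO_STUCK_HIGH` periods and raise the SIGIO warning once they exceed the preset limit.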
In this embodiment, the multi-level watchdog design method interlocks the four levels: exception signals travel in both directions, and each level has a handling procedure for exceptions at the level above and the level below. The hardware watchdog receives the feeding signal from the kernel watchdog driver through an output pin of the main control MCU and returns a level-toggle signal to the kernel watchdog driver through an input pin of the main control MCU. The hardware watchdog and the kernel watchdog driver thus monitor each other's working state, and either can initiate exception handling when the other fails, interlocking these two levels. The kernel watchdog driver receives the feeding signal sent by the process watchdog program through ioctl on the /dev/ff_wtd character device and returns an ACK signal to the process watchdog program, so the kernel watchdog driver and the process watchdog program monitor each other and either can initiate exception handling when the other fails, interlocking these two levels. The process watchdog program receives the feeding signal sent by the thread watchdog module through the named pipe /tmp/pwtd_feed and returns an ACK signal to the thread watchdog module through the named pipe /tmp/pwtd_ack, so the process watchdog program and the thread watchdog module monitor each other and either can initiate exception handling when the other fails, interlocking these two levels.
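The thread-to-process interlock over named pipes can be sketched as follows, using the pipe paths given in the text. The `FEED`/`ACK` message format and the function names are assumptions; messages are kept shorter than `PIPE_BUF`, so each write is atomic.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PWTD_FEED_PATH "/tmp/pwtd_feed"  /* thread watchdog -> process watchdog */
#define PWTD_ACK_PATH  "/tmp/pwtd_ack"   /* process watchdog -> thread watchdog */

/* Create the FIFO if it does not exist yet; an existing FIFO is reused. */
int ensure_fifo(const char *path) {
    if (mkfifo(path, 0600) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Thread watchdog side: send one feed message (hypothetical format "FEED"). */
int twd_send_feed(int feed_fd) {
    return write(feed_fd, "FEED", 4) == 4 ? 0 : -1;
}

/* Process watchdog side: read one feed message and answer with an ACK. */
int pwtd_serve_once(int feed_fd, int ack_fd) {
    char buf[32];
    ssize_t n = read(feed_fd, buf, sizeof buf - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    if (strncmp(buf, "FEED", 4) != 0)
        return -1;                        /* unknown message: no ACK */
    return write(ack_fd, "ACK", 3) == 3 ? 0 : -1;
}

/* Thread watchdog side: consume one ACK; returns 0 if it was received. */
int twd_read_ack(int ack_fd) {
    char buf[4] = {0};
    return (read(ack_fd, buf, 3) == 3 && strcmp(buf, "ACK") == 0) ? 0 : -1;
}
```

In deployment the two sides are separate processes, each opening its read end with `O_RDONLY` and its write end with `O_WRONLY`. Opening a FIFO with `O_RDWR`, which is convenient for a single-process check, is Linux-specific behaviour.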
In summary, the multi-level watchdog design method provides four levels of watchdogs covering threads, processes, the kernel and the hardware. A lower-level exception is first handled with mild recovery means and is passed to the next level only when it cannot be recovered, and the severity of the recovery means increases level by level. This effectively prevents the secondary damage to equipment or data that results from resetting or restarting the device immediately after an exception. Summarizing exception signals also reduces the congestion of the feeding channel that occurs when low-level applications or threads feed a high-level watchdog directly, so congestion does not cause further exceptions that would trigger a more aggressive exception flow. Meanwhile, the four levels are interlocked, so exception information is transmitted in both directions: applications and threads in the lower-level environment can sense higher-level exceptions and prepare for the next, higher-level exception-handling flow.
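The level-by-level escalation can be modelled as a small table plus a walk up the levels. This is an illustrative model only; the enum, the recovery strings (paraphrasing the description) and the `recovered` input, which stands in for the outcome of each level's own recovery attempt, are hypothetical.

```c
typedef enum { LVL_THREAD, LVL_PROCESS, LVL_KERNEL, LVL_HARDWARE, LVL_COUNT } wd_level;

/* Recovery means per level, mildest first, paraphrasing the description. */
static const char *const recovery_means[LVL_COUNT] = {
    "restart the thread",
    "restart the process",
    "soft-restart the system",
    "hardware reset",
};

const char *recovery_for(wd_level l) { return recovery_means[l]; }

/* Next level up; the hardware level is the top and escalates to itself. */
wd_level escalate(wd_level l) {
    return (l + 1 < LVL_COUNT) ? (wd_level)(l + 1) : LVL_HARDWARE;
}

/* Pass an exception upward from `start` until a level's own recovery
 * succeeds; returns the level that finally handled it. */
wd_level handle_exception(wd_level start, const int recovered[LVL_COUNT]) {
    wd_level l = start;
    while (!recovered[l] && l != LVL_HARDWARE)
        l = escalate(l);
    return l;
}
```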
In short, the multi-level watchdog design method introduces the concept of an exception-signal hub at each level: every watchdog level acts as a hub that summarizes, handles and reports the exceptions of the watchdogs at the level below it. Exceptions are passed up and handled level by level, progressing gradually from fast, mild handling to slower, more aggressive handling; this addresses the problems that exception handling at different levels is mutually independent and that the handling means are not fine-grained enough. A layered signal-summarizing mechanism is also used: exception signals are summarized per level, and multiple low-level exceptions are combined into one higher-level exception signal that is passed upward. This reduces the amount of data the higher-level exception-handling environment must process and effectively solves the feeding-channel congestion caused by large numbers of threads feeding the watchdog when monitoring is done at thread granularity. Finally, the method provides multi-level, fine-grained exception handling with interlocking: exception signals are transmitted in both directions between levels, so applications in a lower-level environment can respond to higher-level exceptions, which avoids the secondary problems that aggressive exception handling could otherwise cause.
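The hub idea, in which one level collects heartbeats from many children and sends a single message upward per period, can be sketched as below. It shows only the summarizing step; the local recovery a real hub would attempt before reporting is omitted, and all names and the fixed capacity are assumptions.

```c
#include <stddef.h>

#define HUB_MAX_CHILDREN 64   /* hypothetical capacity; n must not exceed it */

typedef enum { UP_FEED, UP_EXCEPTION } upstream_msg;

typedef struct {
    size_t n;                              /* registered children */
    unsigned char seen[HUB_MAX_CHILDREN];  /* heartbeat seen this period */
} wd_hub;

/* Record a heartbeat from one child; out-of-range indices are ignored. */
void hub_heartbeat(wd_hub *h, size_t child) {
    if (child < h->n)
        h->seen[child] = 1;
}

/* Close one period and emit exactly one upstream message, however many
 * children there are. *missing receives how many children stayed silent. */
upstream_msg hub_end_period(wd_hub *h, size_t *missing) {
    size_t silent = 0;
    for (size_t i = 0; i < h->n; i++) {
        if (!h->seen[i])
            silent++;
        h->seen[i] = 0;                    /* start the next period clean */
    }
    *missing = silent;
    return silent == 0 ? UP_FEED : UP_EXCEPTION;
}
```

However many threads report to the hub, the level above receives one feed or one summarized exception per period, which is what keeps the upper feeding channel from congesting.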
Referring to fig. 7, a second embodiment of the present invention provides a multi-level watchdog design device, comprising:
a thread watchdog unit 201, configured to start a thread watchdog module embedded in an application program, periodically feed the process through a first timing task of the thread watchdog module, detect the feedback signal of the process and the current state of the thread, and restart the thread according to the current state of the thread;
a process watchdog unit 202, configured to start a process watchdog program running in user mode, periodically feed the kernel through a second timing task of the process watchdog program, detect the feedback signal of the kernel and the current state of the process, and restart the process according to the current state of the process;
a kernel watchdog unit 203, configured to start a kernel watchdog driver running in kernel mode, periodically feed the hardware through a third timing task of the kernel watchdog driver, detect the feedback signal of the hardware and the current state of the kernel, and restart the kernel according to the current state of the kernel;
and a hardware watchdog unit 204, configured to start a hardware watchdog circuit, detect the feeding state of the kernel through the hardware watchdog circuit, and reset the embedded system according to the feeding state of the kernel.
A third embodiment of the present invention provides a multi-level watchdog design device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a multi-level watchdog design method as described in any one of the above when executing the computer program.
A fourth embodiment of the present invention provides a readable storage medium storing a computer program executable by a processor of a device in which the storage medium is located to implement a multi-level watchdog design method according to any one of the above.
Illustratively, the computer programs described in the third and fourth embodiments of the present invention may be divided into one or more modules, which are stored in the memory and executed by the processor to complete the present invention. The one or more modules may be a series of computer program instruction segments capable of performing particular functions for describing the execution of the computer program in the implementation of a multi-level watchdog design device. For example, the device described in the second embodiment of the present invention.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device implementing the multi-level watchdog design method and connects the various parts of that device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the multi-level watchdog design method by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or a text conversion function); the data storage area may store data created through use of the device (such as audio data or text message data). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing the related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention.

Claims (10)

1. A multi-level watchdog design method, comprising:
starting a thread watchdog module embedded in an application program, periodically feeding the process through a first timing task of the thread watchdog module, detecting a feedback signal of the process and the current state of the thread, and restarting the thread according to the current state of the thread;
starting a process watchdog program running in user mode, periodically feeding the kernel through a second timing task of the process watchdog program, detecting a feedback signal of the kernel and the current state of the process, and restarting the process according to the current state of the process;
starting a kernel watchdog driver running in kernel mode, periodically feeding the hardware through a third timing task of the kernel watchdog driver, detecting a feedback signal of the hardware and the current state of the kernel, and restarting the kernel according to the current state of the kernel;
and starting a hardware watchdog circuit, detecting the feeding state of the kernel through the hardware watchdog circuit, and resetting the embedded system according to the feeding state of the kernel.
2. The multi-level watchdog design method according to claim 1, wherein starting the thread watchdog module embedded in the application program, periodically feeding the process through the first timing task of the thread watchdog module, and detecting the feedback signal of the process specifically comprise:
after the application program starts, registering the process in which the thread watchdog module runs with the process watchdog program, and controlling the thread watchdog module to send a watchdog feeding signal to the process watchdog program;
and when the first ACK signal returned by the process watchdog program has not been received more than a preset number of times, sending an exception signal to the thread, notifying the thread to save its data, and exiting the process.
3. The multi-level watchdog design method according to claim 1, wherein detecting the feedback signal of the process and the current state of the thread, and restarting the thread according to the current state of the thread, comprises:
generating a daemon thread queue corresponding to the thread watchdog module;
when a thread starts, registering the thread ID of the thread in the daemon thread queue through a message queue, and periodically updating the running time of the thread in the daemon thread queue;
continuously checking the running time of the threads in the daemon thread queue;
restarting a thread when it is determined, from its running time in the daemon thread queue, that the thread has hung or exited abnormally;
and when the number of failed restarts of the thread exceeds a preset number, exiting the process and sending an exception signal to the process watchdog program.
4. The multi-level watchdog design method according to claim 1, wherein starting the process watchdog program running in user mode, periodically feeding the kernel through the second timing task of the process watchdog program, and detecting the feedback signal of the kernel specifically comprise:
after the file system starts, controlling the process watchdog program to send a watchdog feeding signal to the kernel watchdog driver;
and when the second ACK signal returned by the kernel watchdog driver has not been received more than a preset number of times, sending a SIGINT signal to each daemon process to notify the process to perform a normal exit procedure.
5. The multi-level watchdog design method according to claim 1, wherein detecting the feedback signal of the kernel and the current state of the process, and restarting the process according to the current state of the process, comprises:
registering a PIPE communication pipe in the file system, and generating a daemon process queue corresponding to the process watchdog program;
when a daemon process starts, sending its process number through the PIPE communication pipe to register it in the daemon process queue, and periodically updating the running time of the daemon process through the PIPE communication pipe;
continuously checking the running time of the daemon processes in the daemon process queue;
restarting a process when it is determined, from the running time of the daemon process in the daemon process queue, that the process has hung or exited abnormally;
and when the number of failed restarts of the process exceeds a preset number, exiting the process and sending an exception signal to the kernel watchdog driver.
6. The multi-level watchdog design method according to claim 1, wherein starting the kernel watchdog driver running in kernel mode, periodically feeding the hardware through the third timing task of the kernel watchdog driver, and detecting the feedback signal of the hardware specifically comprise:
after the kernel starts, controlling the kernel watchdog driver to send a watchdog feeding signal to the hardware watchdog circuit;
when the third ACK signal returned by the hardware watchdog circuit has not been received more than a preset number of times, sending a SIGIO signal to the application layer;
after capturing the SIGIO signal, the application program issues a hardware damage warning.
7. The multi-level watchdog design method according to claim 1, wherein detecting the feedback signal of the hardware and the current state of the kernel, and restarting the kernel according to the current state of the kernel, comprises:
generating a character device corresponding to the kernel watchdog driver, and starting a timeout counter;
the process watchdog program periodically resetting the timeout counter through the character device;
when it is determined from the timeout counter that the process watchdog program has hung or exited abnormally, incrementing by 1 an exception count value stored in a power-down retention area;
stopping the feeding (reset) operation on the hardware watchdog circuit when the exception count value exceeds a preset value within a preset time;
and performing a soft restart of the system when no hardware reset occurs within the preset time.
8. A multi-level watchdog design device, comprising:
a thread watchdog unit, configured to start a thread watchdog module embedded in an application program, periodically feed the process through a first timing task of the thread watchdog module, detect a feedback signal of the process and the current state of the thread, and restart the thread according to the current state of the thread;
a process watchdog unit, configured to start a process watchdog program running in user mode, periodically feed the kernel through a second timing task of the process watchdog program, detect a feedback signal of the kernel and the current state of the process, and restart the process according to the current state of the process;
a kernel watchdog unit, configured to start a kernel watchdog driver running in kernel mode, periodically feed the hardware through a third timing task of the kernel watchdog driver, detect a feedback signal of the hardware and the current state of the kernel, and restart the kernel according to the current state of the kernel;
and a hardware watchdog unit, configured to start the hardware watchdog circuit, detect the feeding state of the kernel through the hardware watchdog circuit, and reset the embedded system according to the feeding state of the kernel.
9. A multi-level watchdog design device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a multi-level watchdog design method according to any of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that a computer program is stored, which computer program is executable by a processor of a device in which the storage medium is located, for implementing a multi-level watchdog design method according to any of the claims 1 to 7.
CN202310076336.5A 2023-01-17 2023-01-17 Multi-level watchdog design method, device, equipment and storage medium Pending CN116048861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310076336.5A CN116048861A (en) 2023-01-17 2023-01-17 Multi-level watchdog design method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310076336.5A CN116048861A (en) 2023-01-17 2023-01-17 Multi-level watchdog design method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116048861A (en) 2023-05-02

Family

ID=86133040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310076336.5A Pending CN116048861A (en) 2023-01-17 2023-01-17 Multi-level watchdog design method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116048861A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117130832A (en) * 2023-10-25 2023-11-28 南京芯驰半导体科技有限公司 Monitoring reset method and system of multi-core heterogeneous system, chip and electronic equipment
CN117130832B (en) * 2023-10-25 2024-02-23 南京芯驰半导体科技有限公司 Monitoring reset method and system of multi-core heterogeneous system, chip and electronic equipment
CN118101537A (en) * 2024-04-25 2024-05-28 北京芯驰半导体科技股份有限公司 Gateway monitoring method and device, system-level chip and electronic equipment
CN118101537B (en) * 2024-04-25 2024-08-27 北京芯驰半导体科技股份有限公司 Gateway monitoring method and device, system-level chip and electronic equipment

Similar Documents

Publication Publication Date Title
CN116048861A (en) Multi-level watchdog design method, device, equipment and storage medium
US20070088988A1 (en) System and method for logging recoverable errors
CN101268447B (en) Computer with software process monitor
US6505298B1 (en) System using an OS inaccessible interrupt handler to reset the OS when a device driver failed to set a register bit indicating OS hang condition
US7434085B2 (en) Architecture for high availability using system management mode driven monitoring and communications
JPH11149433A (en) Defect reporting system using local area network and its method
CN103577237A (en) Application program starting control method and device
CN100383748C (en) Policy-based response to system errors occuring during os runtime
US20240289243A1 (en) Server and control method therefor
CN113535446B (en) Bidirectional process daemon method and system for protecting business data during line access
CN113568707B (en) Computer control method and system for ocean platform based on container technology
US6732359B1 (en) Application process monitor
CN112612635B (en) Multi-level protection method for application program
CN109684117B (en) Processor crash recovery method and device
US20040025007A1 (en) Restricting access to a method in a component
CN115658356A (en) Watchdog feeding method and system in Linux system
JP2001101034A (en) Fault restoring method under inter-different kind of os control
CN112817727A (en) Task management method, system, equipment and storage medium based on micro-service architecture
JP2004086520A (en) Monitoring control device and its method
JPH07261888A (en) Blocking method for data processing and data processor
JP4126849B2 (en) Multi-CPU system monitoring method
CN111708666B (en) Method, system, equipment and medium for starting container log
JP4068277B2 (en) Hardware system
CN113515397B (en) IPMI command processing method, server, and non-transitory computer readable storage medium
TWI413000B (en) Method,system and storage medium for safely interrupting blocked work in a server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination