CN112749038A - Method and system for realizing software watchdog in software system - Google Patents

Method and system for realizing software watchdog in software system

Info

Publication number
CN112749038A
CN112749038A CN202110104876.0A CN202110104876A CN112749038A CN 112749038 A CN112749038 A CN 112749038A CN 202110104876 A CN202110104876 A CN 202110104876A CN 112749038 A CN112749038 A CN 112749038A
Authority
CN
China
Prior art keywords
watchdog
service process
linked list
service
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110104876.0A
Other languages
Chinese (zh)
Other versions
CN112749038B (en
Inventor
赵康
瞿洪桂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd
Priority to CN202110104876.0A
Publication of CN112749038A
Application granted
Publication of CN112749038B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files

Abstract

The invention discloses a method and a system for realizing a software watchdog in a software system. The method comprises: S1, starting a monitoring process, which loads a configuration file and enters S2; if no configuration file exists, a default configuration file is generated automatically before entering S2; S2, loading the configuration file into memory and, according to the configuration file, loading all monitored service processes into a process linked list in memory; the monitoring process then starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list. The advantage is that, by periodically checking the state of the process files in the virtual file system instead of relying on inter-process message communication, the method avoids the false restarts that occur when a service process is too busy to send heartbeat messages to the monitoring process.

Description

Method and system for realizing software watchdog in software system
Technical Field
The invention relates to the technical field of software service monitoring, in particular to a method and a system for realizing a software watchdog in a software system.
Background
Currently, watchdog systems are divided into hardware watchdog systems and software watchdog systems. A hardware watchdog generates an interrupt and restarts the system when the system enters an unrecoverable error state, and is mainly used in embedded systems. A hardware watchdog is expensive to manufacture and has a single function, and the system restart terminates other processes that are running normally. In most cases a software watchdog is implemented with Linux inter-process communication, which carries messages between a monitoring process and the service processes: each service process periodically sends a heartbeat to the monitoring process to prove that it is running normally. When the monitoring process finds that a process has not sent a heartbeat message for a long time, it judges that the process has hung and restarts it so that the system returns to normal. However, a process that is very busy with its normal work may have no time to send heartbeat messages. In that case the monitoring process mistakenly concludes that a normally running program has exited abnormally, which causes unnecessary fault recovery.
Disclosure of Invention
The present invention aims to provide a method and a system for implementing a software watchdog in a software system, so as to solve the aforementioned problems in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a method for implementing a software watchdog in a software system, comprising the following steps:
S1, starting a monitoring process, which loads a configuration file and enters S2; if no configuration file exists, a default configuration file is generated automatically before entering S2;
S2, loading the configuration file into memory and, according to the configuration file, loading all monitored service processes into a process linked list in memory; the monitoring process starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
S3, traversing the process linked list periodically in a loop, clearing or accumulating the watchdog timer of each service process according to its survival state, restarting the corresponding service process after the timeout is reached, and registering watchdog information for that service process again;
S4, traversing each item of watchdog information in the watchdog information linked list in a loop and, by comparing the timer in the watchdog information with the timeout in the watchdog information, determining whether the corresponding service process has been restarted and how many times, so that different coping strategies can be adopted.
Preferably, the configuration file includes a name of the service process, whether the service process needs to be started, a parameter for starting the service process, and a delay time of the service process.
Preferably, the watchdog information of the service process includes a process name, a process number, a timeout time, a survival flag, and a number of restarts of the service process.
Preferably, step S3 specifically includes the following steps:
S31, judging whether the current service process needs to be started; if not, go to S34; if yes, go to S32;
S32, judging whether the current service process has been started; if not, waiting for the delay time given in the configuration file, starting the service process, registering it in the watchdog information linked list, and going to S34; if yes, go to S33;
S33, acquiring the survival state according to the process number of the service process; if the survival state is alive, resetting the watchdog timer of the service process and going to S34; if the survival state is dead, setting the flag bit indicating that the service process has been started to false and going to S34;
S34, judging whether the service process is the last one in the process linked list; if yes, ending this traversal, after which the next traversal of the process linked list begins; if not, proceeding to the next service process in the process linked list.
Preferably, in step S33 the survival state of the service process is acquired according to its process number as follows:
S331, obtaining the full paths of the cmdline file and the stat file of the service process in the virtual file system according to the process number of the service process;
S332, reading the stat file of the service process into memory, and judging whether the process name in the stat file is the same as the name of the service process and whether the process state in the stat file is the zombie state; if the process name is the same and the process state is not the zombie state, go to S333; otherwise, return the survival state of the service process as dead;
S333, reading the cmdline file of the service process into memory, and judging whether the cmdline file contains the name string of the service process; if yes, return the survival state of the service process as alive; if not, return the survival state of the service process as dead.
Preferably, step S4 specifically includes the following steps:
S41, judging whether the watchdog timer of the service process corresponding to the current watchdog information is greater than the timeout in the current watchdog information; if yes, the service process has been restarted, so the watchdog timer is reset, the number of restarts is increased by 1, and the process goes to S42; if not, the watchdog timer is incremented and the process goes to S42;
S42, judging whether the current watchdog information is the last one in the watchdog information linked list; if yes, ending this traversal, after which the next traversal of the watchdog information linked list begins; if not, proceeding to the next item of watchdog information in the watchdog information linked list.
Preferably, in step S41, when the number of restarts of the service process exceeds the preset number of restarts, either the system may be restarted, so that the service process does not keep failing to restart, or the starting of the service process may be stopped.
Another object of the invention is to provide a system for realizing a software watchdog in a software system, which is used to carry out any one of the above methods and comprises:
a dynamic configuration module, used for starting the monitoring process, loading the configuration file, and loading all monitored service processes into a process linked list in memory according to the configuration file; the monitoring process starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
a timing detection module, used for traversing the process linked list periodically in a loop, clearing or accumulating the watchdog timer of each service process according to its survival state, restarting the corresponding service process after the timeout is reached, and registering watchdog information for that service process again; and for traversing each item of watchdog information in the watchdog information linked list in a loop and, by comparing the timer in the watchdog information with the timeout in the watchdog information, determining whether the corresponding service process has been restarted and how many times, so that different coping strategies can be adopted;
a process state query module, used for acquiring the survival state of a service process according to its process number.
Preferably, the dynamic configuration module includes two interfaces for dynamically turning monitoring of a given service process on or off, namely a SetProcessActive interface and a SetProcessInactive interface;
the SetProcessActive interface is used for dynamically activating a service process, and takes as parameters the name of the service process, its start parameters and its delay time; the interface first searches the process linked list for a service process with the same name; if none exists, the service process is added to the process linked list, is started in the first traversal cycle after the delay time has elapsed, and has its watchdog information registered;
the SetProcessInactive interface is used for dynamically removing a service process from monitoring, and takes as its parameter the name of the service process; the interface searches the process linked list and the watchdog information linked list for a service process with the same name; if one exists, the corresponding flag bit of the service process is set to false, and the survival state of the service process is no longer checked.
Preferably, when the number of restarts of the service process exceeds the preset number of restarts, either the system may be restarted, so that the service process does not keep failing to restart, or the starting of the service process may be stopped through the SetProcessInactive interface of the dynamic configuration module.
The invention has the following beneficial effects: 1. by periodically checking the state of the process files in the virtual file system (/proc) instead of relying on inter-process message communication, it avoids the false restarts that occur when a service process is too busy to send heartbeat messages to the monitoring process; 2. while avoiding false restarts, it also reduces system overhead and the coupling between programs.
Drawings
FIG. 1 is a schematic flow chart of a method for implementing a watchdog in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of loop detection of the survival status of each business process in the embodiment of the present invention;
FIG. 3 is a flowchart illustrating an embodiment of obtaining the survival status of a business process according to the process number of the business process;
FIG. 4 is a schematic flowchart of the loop that checks the watchdog information of each service process in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example one
In this embodiment, as shown in FIG. 1, a method for implementing a software watchdog in a software system is provided, comprising the following steps:
S1, starting a monitoring process, which loads a configuration file and enters S2; if no configuration file exists, a default configuration file is generated automatically before entering S2;
S2, loading the configuration file into memory and, according to the configuration file, loading all monitored service processes into a process linked list in memory; the monitoring process starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
S3, traversing the process linked list periodically in a loop, clearing or accumulating the watchdog timer of each service process according to its survival state, restarting the corresponding service process after the timeout is reached, and registering watchdog information for that service process again;
S4, traversing each item of watchdog information in the watchdog information linked list in a loop and, by comparing the timer in the watchdog information with the timeout in the watchdog information, determining whether the corresponding service process has been restarted and how many times, so that different coping strategies can be adopted.
In this embodiment, the configuration file includes a name of the service process, whether the service process needs to be started, a parameter for starting the service process, and a delay time of the service process.
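Step S1 and the configuration fields listed above can be illustrated with a minimal sketch. The patent does not specify the file format, the field names or the default contents, so the JSON layout, the keys name, active, args and delay_seconds, and the DEFAULT_CONFIG entry below are all assumptions made for illustration only.

```python
import json
import os

# Hypothetical default entry; the patent does not say what the default file contains.
DEFAULT_CONFIG = [
    {"name": "example_service", "active": True, "args": [], "delay_seconds": 0}
]


def load_config(path):
    """Return the list of process entries, writing a default file first if none exists (S1)."""
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(DEFAULT_CONFIG, f, indent=2)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Each entry returned by load_config would then become one node of the process linked list in S2.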
In this embodiment, assume that the monitoring process starts M service processes in sequence. After each service process is started, watchdog information must be registered for it; each item of watchdog information includes the process name, the process number (pid), the timeout, the survival flag and the number of restarts of the corresponding service process. One of the M service processes can be selected as a sub-daemon process. While the monitoring process monitors the service processes, the monitoring process and the sub-daemon process guard each other, so that the software watchdog can be recovered quickly if it becomes abnormal, which ensures the reliability of the software watchdog itself.
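As a rough sketch of what one item of watchdog information might look like, the record below carries the fields named above plus the timer (timeTick) used in steps S3 and S4. A Python list stands in for the watchdog information linked list. The helper register_watchdog, the unit of the timeout (traversal cycles) and the choice to keep the restart count when a process is registered again are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WatchdogInfo:
    name: str               # process name
    pid: int                # process number
    timeout: int            # timeout, counted in traversal cycles (assumption)
    alive: bool = True      # survival flag
    time_tick: int = 0      # watchdog timer (timeTick)
    restart_count: int = 0  # number of restarts


def register_watchdog(watchdog_list: List[WatchdogInfo], name: str,
                      pid: int, timeout: int) -> WatchdogInfo:
    """Register a started process, reusing its record when it is registered again."""
    for info in watchdog_list:
        if info.name == name:
            # Assumption: re-registration updates the pid and keeps the restart count.
            info.pid, info.alive = pid, True
            return info
    info = WatchdogInfo(name=name, pid=pid, timeout=timeout)
    watchdog_list.append(info)
    return info
```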
As shown in FIG. 2, in this embodiment, step S3 specifically includes the following:
S31, judging whether the current service process needs to be started (that is, checking the flag bit active); if not, go to S34; if yes, go to S32;
S32, judging whether the current service process has been started (that is, checking the flag isStarted); if not, waiting for the delay time given in the configuration file, starting the service process, and registering it in the watchdog information linked list; if yes, go to S33;
S33, acquiring the survival state of the service process according to its pid; if the survival state is alive, clearing the watchdog timer (timeTick) of the service process; if the survival state is dead, setting the flag bit indicating that the service process has been started (the isStarted flag) to false;
S34, judging whether the service process is the last one in the process linked list; if yes, ending this traversal, after which the next traversal of the process linked list begins; if not, proceeding to the next service process in the process linked list.
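One pass of steps S31 to S34 could be sketched as follows. This is an illustration under assumptions rather than the patented implementation: a Python list stands in for the process linked list, the timers are kept in a dictionary keyed by process name, the start-up delay is omitted, and the start and is_alive callables are hypothetical hooks injected so that the loop can be exercised without spawning real processes.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    active: bool = True       # flag bit "active": does it need to be started?
    is_started: bool = False  # flag "isStarted"
    pid: int = 0


def traverse_process_list(procs, ticks, start, is_alive):
    """Run one pass over the process list (S31 to S34).

    ticks maps a process name to its watchdog timer (timeTick).
    start(proc) returns a pid; is_alive(pid, name) returns a bool.
    """
    for p in procs:
        if not p.active:                  # S31: not monitored, skip
            continue
        if not p.is_started:              # S32: start it and register it
            p.pid = start(p)
            p.is_started = True
            ticks.setdefault(p.name, 0)
        elif is_alive(p.pid, p.name):     # S33: alive, so clear the timer
            ticks[p.name] = 0
        else:                             # S33: dead, restart on the next pass
            p.is_started = False
```

Because a dead process only has its isStarted flag cleared, the actual restart happens in S32 of the following pass.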
As shown in FIG. 3, in this embodiment, the survival state of the service process is acquired according to its pid in step S33 as follows:
S331, obtaining the full paths of the cmdline file and the stat file of the service process in the virtual file system according to the process number of the service process;
S332, reading the stat file of the service process into memory, and judging whether the process name in the stat file is the same as the name of the service process and whether the process state in the stat file is the zombie state; if the process name is the same and the process state is not the zombie state, go to S333; otherwise, return the survival state of the service process as dead;
S333, reading the cmdline file of the service process into memory, and judging whether the cmdline file contains the name string of the service process; if yes, return the survival state of the service process as alive; if not, return the survival state of the service process as dead.
In this embodiment, the /proc directory on a Linux system is a file system, namely the proc file system. Unlike ordinary file systems, /proc is a pseudo file system (that is, a virtual file system) that holds a series of special files describing the current running state of the kernel. Under /proc there is one directory per process, named after the process PID, which contains the information about that process; the state of a process can be obtained by querying the files in this directory. Two of these files are used here: cmdline, the complete command that started the process (in the directory of a zombie process this file contains no information), and stat, the state information of the process.
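The check in S331 to S333 can be sketched as below. The function name is_alive and the proc_root parameter (which allows the check to be pointed at a test directory) are illustrative assumptions. The sketch relies on two properties of Linux: the stat file has the form "pid (comm) state ...", where state Z marks a zombie, and the kernel truncates comm to 15 characters, so the comparison uses the first 15 characters of the service name.

```python
import os


def is_alive(pid, name, proc_root="/proc"):
    """Return True if process `pid` still looks like the service `name`."""
    base = os.path.join(proc_root, str(pid))           # S331: build the paths
    try:
        with open(os.path.join(base, "stat"), "rb") as f:
            stat = f.read().decode("utf-8", "replace")
        # S332: comm sits between the first "(" and the last ")".
        left, right = stat.index("("), stat.rindex(")")
        comm = stat[left + 1:right]
        state = stat[right + 1:].split()[0]
        if comm != name[:15] or state == "Z":          # wrong process or zombie
            return False
        with open(os.path.join(base, "cmdline"), "rb") as f:
            # Arguments in cmdline are separated by NUL bytes.
            cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
    except (OSError, ValueError, IndexError):
        return False                                   # no such pid, or unreadable
    return name in cmdline                             # S333: name must appear
```

Checking the name as well as the pid guards against the case where the service has died and its pid has been reused by an unrelated process.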
As shown in FIG. 4, in this embodiment, step S4 specifically includes the following:
S41, judging whether the watchdog timer of the service process corresponding to the current watchdog information is greater than the timeout in the current watchdog information; if yes, the service process has been restarted, so the watchdog timer is reset, the number of restarts is increased by 1, and the process goes to S42; if not, the watchdog timer is incremented and the process goes to S42;
S42, judging whether the current watchdog information is the last one in the watchdog information linked list; if yes, ending this traversal, after which the next traversal of the watchdog information linked list begins; if not, proceeding to the next item of watchdog information in the watchdog information linked list.
In step S41, when the number of restarts of the service process exceeds the preset number of restarts, either the system may be restarted, so that the service process does not keep failing to restart, or the starting of the service process may be stopped. For example, repeated restarts caused by insufficient system memory can be avoided by restarting the system; if the restarts are caused by a problem in the service process itself and the service is not critical, the starting of the service process may be stopped.
Suppose, for example, that a service process keeps restarting and has been restarted ten times in a row, exceeding the preset number of restarts. There are two likely reasons: 1. the system memory is insufficient, so the service process hangs after it is started; in this case the system can be restarted so that the service process does not keep failing to restart; 2. the service process itself has a problem; in this case, if it is not a particularly important service process, its starting can be stopped. The causes of repeated restarts are not limited to these two, and different strategies can be adopted for different causes. The preset number of restarts can be set according to actual conditions to better meet actual requirements.
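A pass of steps S41 and S42, together with the restart limit described above, might look like the following sketch. The names max_restarts and on_limit are assumptions: the patent leaves the preset number of restarts and the coping strategy open, so the strategy is passed in as a callback here, and a Python list stands in for the watchdog information linked list.

```python
from dataclasses import dataclass


@dataclass
class WatchdogInfo:
    name: str
    timeout: int
    time_tick: int = 0
    restart_count: int = 0


def traverse_watchdog_list(infos, max_restarts, on_limit):
    """Run one pass over the watchdog records (S41 and S42)."""
    for info in infos:
        if info.time_tick > info.timeout:
            # S41: the timer ran past the timeout, so count one restart.
            info.time_tick = 0
            info.restart_count += 1
            if info.restart_count > max_restarts:
                # Strategy hook, e.g. reboot or stop starting the process.
                on_limit(info)
        else:
            # S41: otherwise the timer keeps counting until S33 clears it.
            info.time_tick += 1
```

Since S33 clears the timer whenever the process is found alive, the timer can only exceed the timeout when the process has been dead or missing for several consecutive passes.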
Example two
In this embodiment, a system for realizing a software watchdog in a software system is provided, which is used to carry out the above method for realizing a software watchdog and comprises:
a dynamic configuration module, used for starting the monitoring process, loading the configuration file, and loading all monitored service processes into a process linked list in memory according to the configuration file; the monitoring process starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
a timing detection module, used for traversing the process linked list periodically in a loop, clearing or accumulating the watchdog timer of each service process according to its survival state, restarting the corresponding service process after the timeout is reached, and registering watchdog information for that service process again; and for traversing each item of watchdog information in the watchdog information linked list in a loop and, by comparing the timer in the watchdog information with the timeout in the watchdog information, determining whether the corresponding service process has been restarted and how many times, so that different coping strategies can be adopted;
a process state query module, used for acquiring the survival state of a service process according to its pid (process number).
In this embodiment, the dynamic configuration module includes two interfaces for dynamically turning monitoring of a given service process on or off, namely a SetProcessActive interface and a SetProcessInactive interface;
the SetProcessActive interface is used for dynamically activating a service process, and takes as parameters the name of the service process, its start parameters and its delay time; the interface first searches the process linked list for a service process with the same name; if none exists, the service process is added to the process linked list, is started in the first traversal cycle after the delay time has elapsed, and has its watchdog information registered;
the SetProcessInactive interface is used for dynamically removing a service process from monitoring, and takes as its parameter the name of the service process; the interface searches the process linked list and the watchdog information linked list for a service process with the same name; if one exists, the corresponding flag bit of the service process is set to false, and the survival state of the service process is no longer checked.
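The behaviour of the two interfaces can be sketched as follows. The Python function names mirror SetProcessActive and SetProcessInactive, but the signatures, the dictionary-based records and the return values are assumptions for illustration. The patent does not say what SetProcessActive does when the name is already present, so this sketch simply returns the existing entry unchanged.

```python
def set_process_active(process_list, name, args, delay_seconds):
    """Add a process to the list unless one with the same name already exists."""
    for entry in process_list:
        if entry["name"] == name:
            return entry  # assumption: an existing entry is left as it is
    entry = {"name": name, "args": list(args), "delay_seconds": delay_seconds,
             "active": True, "is_started": False}
    process_list.append(entry)
    return entry


def set_process_inactive(process_list, watchdog_list, name):
    """Clear the monitoring flag wherever the named process is found."""
    found = False
    for record in list(process_list) + list(watchdog_list):
        if record["name"] == name:
            record["active"] = False  # its survival state is no longer checked
            found = True
    return found
```

A newly added entry has is_started set to false, so the periodic traversal of the process list is what actually starts it.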
In this embodiment, suppose that the number of restarts of a service process exceeds the preset number of restarts, for example because the service process has been restarted ten times in a row. There are two likely reasons: 1. the system memory is insufficient, so the service process hangs after it is started; in this case the system can be restarted so that the service process does not keep failing to restart; 2. the service process itself has a problem; in this case, if it is not a particularly important service process, its starting is stopped through the SetProcessInactive interface of the dynamic configuration module, provided this does not affect the work of other service modules. The causes of repeated restarts are not limited to these two, and different strategies are adopted for different causes. The preset number of restarts can be set according to actual conditions to better meet actual requirements.
In this embodiment, on many Unix-like computer systems, the process file system is a pseudo file system that is generated dynamically at startup and is used to access process information through the kernel. It is usually mounted at the /proc directory. Because /proc is not a real file system, it occupies no storage space and only a limited amount of memory. The system for realizing the software watchdog provided by the invention does not rely on inter-process communication (IPC); instead it obtains the survival state of the process whose process number is pid by reading the stat file under the /proc/[pid] directory. The system avoids false restarts while reducing system overhead and the coupling between programs.
In this embodiment, the sdkdevds process is taken as an example. sdkdevds is a process for acquiring device capabilities. Once the sdkdevds process has started, a directory named with the process number of sdkdevds exists under /proc/. When the monitoring process detects that sdkdevds has hung for a certain time, it restarts sdkdevds and registers it in the watchdog information linked list.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
The invention provides a method and a system for realizing a software watchdog in a software system. By periodically checking the state of the process files in the virtual file system (/proc) instead of relying on inter-process message communication, it avoids the false restarts that occur when a service process is too busy to send heartbeat messages to the monitoring process. While avoiding false restarts, it also reduces system overhead and the coupling between programs.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (10)

1. A method for realizing a software watchdog in a software system, characterized by comprising the following steps:
S1, starting a monitoring process, loading a configuration file by the monitoring process, and entering S2; if no configuration file exists, automatically generating a default configuration file and entering S2;
S2, loading the configuration file into a memory, and loading all monitored business processes, in the form of a linked list, into a process linked list in the memory according to the configuration file; the monitoring program starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
S3, traversing the process linked list in a timed loop, clearing or accumulating the watchdog timer of each service process according to the survival state of each service process, restarting the corresponding service process after the timeout time is reached, and registering watchdog information for the corresponding service process again;
S4, traversing each piece of watchdog information in a loop according to the watchdog information linked list, and determining, according to the size relation between the timer in the watchdog information and the timeout time in the watchdog information, whether the service process corresponding to the watchdog information has been restarted and its number of restarts, so as to adopt different coping strategies.
2. The method of claim 1 for implementing a software watchdog in a software system, wherein: the configuration file comprises the name of the business process, whether the business process needs to be started or not, the starting parameter of the business process and the delay time of the business process.
3. The method of claim 2 for implementing a software watchdog in a software system, wherein: the watchdog information of the business process comprises a process name, a process number, timeout time, a survival flag and restart times of the business process.
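As a non-limiting illustration of the data named in claims 2 and 3, the following Python sketch models one configuration entry and one watchdog information record. The JSON file format, the field names, the `timer` field (taken from the timer mentioned in step S4), and the functions `load_config` and `register_watchdog` are assumptions made for illustration; the claims do not specify a file format or identifiers.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceConfig:
    """One configuration entry (claim 2). Field names are hypothetical."""
    name: str                       # name of the business process
    need_start: bool                # whether the process needs to be started
    start_params: List[str] = field(default_factory=list)  # start parameters
    delay_seconds: int = 0          # delay time before starting


@dataclass
class WatchdogInfo:
    """One watchdog information record (claim 3). Field names are hypothetical."""
    name: str                # process name
    pid: int                 # process number
    timeout: int             # timeout time, in traversal cycles (assumed unit)
    alive: bool = True       # survival flag
    restart_count: int = 0   # number of restarts
    timer: int = 0           # watchdog timer compared against timeout in S4


def load_config(text: str) -> List[ServiceConfig]:
    """Parse a JSON array of entries (JSON is an assumed format)."""
    return [ServiceConfig(**entry) for entry in json.loads(text)]


def register_watchdog(cfg: ServiceConfig, pid: int, timeout: int) -> WatchdogInfo:
    """Create the watchdog record for a service process that was just started."""
    return WatchdogInfo(name=cfg.name, pid=pid, timeout=timeout)
```

A Python list of these records can stand in for the process linked list and the watchdog information linked list when only ordered traversal is needed.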
4. A method for implementing a software watchdog in a software system according to claim 3, characterized in that: the step S3 specifically includes the following contents,
S31, judging whether the current business process needs to be started; if not, entering S34; if yes, entering S32;
S32, judging whether the current business process has been started; if not, starting the business process after delaying according to the delay time in the configuration file, registering the business process in the watchdog information linked list, and entering S34; if yes, entering S33;
S33, acquiring the survival state according to the process number of the business process; if the survival state is alive, resetting the watchdog timer of the business process, and entering S34; if the survival state is dead, setting the flag bit that indicates that the service process has been started to false, and entering S34;
S34, judging whether the service process is the last service process in the process linked list; if yes, ending the traversal and starting the next traversal of the process linked list; if not, performing the judgment on the next service process in the process linked list.
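As a non-limiting illustration, one traversal of the process linked list per steps S31 to S34 might be sketched in Python as follows. The names `ProcessEntry`, `WatchdogInfo`, `traverse_process_list` and the injected callbacks `start`, `is_alive` and `sleep` are assumptions for illustration. A list and a dictionary stand in for the two linked lists, and keeping the existing watchdog record on re-registration is an assumption that the claim does not settle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProcessEntry:
    name: str
    need_start: bool
    delay_seconds: int = 0
    started: bool = False   # flag bit indicating the process has been started
    pid: int = 0


@dataclass
class WatchdogInfo:
    name: str
    pid: int
    timer: int = 0


def traverse_process_list(
    processes: List[ProcessEntry],
    watchdogs: Dict[str, WatchdogInfo],
    start: Callable[[ProcessEntry], int],      # starts the process, returns its pid
    is_alive: Callable[[ProcessEntry], bool],  # survival-state query
    sleep: Callable[[float], None],            # delay function, e.g. time.sleep
) -> None:
    for entry in processes:          # S34: the loop ends after the last entry
        if not entry.need_start:     # S31: not to be started, go to next entry
            continue
        if not entry.started:        # S32: (re)start after the configured delay
            sleep(entry.delay_seconds)
            entry.pid = start(entry)
            entry.started = True
            # Register watchdog information; an existing record is kept and
            # only its pid is updated (an assumption, see the text above).
            info = watchdogs.setdefault(entry.name, WatchdogInfo(entry.name, entry.pid))
            info.pid = entry.pid
            continue
        if is_alive(entry):          # S33: alive, so reset the watchdog timer
            watchdogs[entry.name].timer = 0
        else:                        # S33: dead, so clear the started flag
            entry.started = False
```

Because a dead process only has its started flag cleared in S33, the actual restart happens in S32 on the next traversal, which is where the configured delay time applies.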
5. The method of claim 4 for implementing a software watchdog in a software system, wherein: the specific process of acquiring the survival status according to the process number of the service process in step S33 is,
S331, acquiring the complete paths of the cmdline file and the stat file of the business process in the virtual file system according to the process number of the business process;
S332, reading the stat file of the service process into a memory; judging whether the process name in the stat file is the same as the name of the business process and whether the process state in the stat file is a zombie state; if the process name in the stat file is the same as the name of the service process and the process state in the stat file is not a zombie state, entering step S333; otherwise, returning the survival state of the business process as dead;
S333, reading the cmdline file of the business process into a memory; judging whether the cmdline file contains the name string of the business process; if yes, returning the survival state of the business process as alive; if not, returning the survival state of the business process as dead.
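As a non-limiting illustration, steps S331 to S333 might be sketched in Python as follows. The function name `survival_state` and the `proc_root` parameter (which allows a directory other than /proc to be supplied for testing) are assumptions for illustration. The sketch compares against the first 15 characters of the name because the Linux kernel truncates the command name stored in the stat file to that length.

```python
import os


def survival_state(pid: int, name: str, proc_root: str = "/proc") -> bool:
    """Return True if the process is considered alive, False if dead."""
    # S331: build the complete paths of the stat and cmdline files.
    base = os.path.join(proc_root, str(pid))
    stat_path = os.path.join(base, "stat")
    cmdline_path = os.path.join(base, "cmdline")
    try:
        # S332: read the stat file and extract the process name and state.
        with open(stat_path, "rb") as f:
            stat = f.read().decode("utf-8", "replace")
        # The name is enclosed in parentheses and may contain ')' itself,
        # so the last ')' marks its end; the state is the next field.
        comm = stat[stat.index("(") + 1:stat.rindex(")")]
        state = stat[stat.rindex(")") + 1:].split()[0]
        # "Z" is the zombie state.
        if comm != name[:15] or state == "Z":
            return False
        # S333: read the cmdline file (arguments separated by NUL bytes).
        with open(cmdline_path, "rb") as f:
            cmdline = f.read()
    except (OSError, ValueError, IndexError):
        # A missing or malformed file means the process is gone: dead.
        return False
    return name.encode() in cmdline
```

Checking the name in both files guards against the case where the original process has exited and its process number has been reused by an unrelated process.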
6. The method of claim 5 for implementing a software watchdog in a software system, wherein: the step S4 specifically includes the following contents,
S41, judging whether the watchdog timer of the service process corresponding to the current watchdog information is larger than the timeout time in the current watchdog information; if yes, this indicates that the service process has been restarted, so resetting the watchdog timer, adding 1 to the number of restarts, and entering S42; if not, incrementing the watchdog timer, and entering S42;
S42, judging whether the current watchdog information is the last watchdog information in the watchdog information linked list; if yes, ending the traversal and starting the next traversal of the watchdog information linked list; if not, performing the judgment on the next watchdog information in the watchdog information linked list.
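As a non-limiting illustration, one traversal of the watchdog information linked list per steps S41 and S42 might be sketched in Python as follows. The names `WatchdogInfo` and `traverse_watchdog_list` are assumptions for illustration, a list stands in for the linked list, and returning the names of the processes counted as restarted is a convenience added here rather than something the claim requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WatchdogInfo:
    name: str
    timeout: int            # timeout time, counted in traversal cycles (assumed unit)
    timer: int = 0          # watchdog timer
    restart_count: int = 0  # number of restarts


def traverse_watchdog_list(watchdogs: List[WatchdogInfo]) -> List[str]:
    """Run one S41/S42 pass; return names whose timer exceeded the timeout."""
    restarted = []
    for info in watchdogs:             # S42: the loop ends after the last record
        if info.timer > info.timeout:  # S41: timeout exceeded
            info.timer = 0             # reset the watchdog timer
            info.restart_count += 1    # add 1 to the number of restarts
            restarted.append(info.name)
        else:
            info.timer += 1            # S41: otherwise increment the timer
    return restarted
```

A process that stays alive never reaches the timeout in this scheme, because step S33 resets its timer on every traversal of the process linked list.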
7. The method of claim 6 for implementing a software watchdog in a software system, wherein: in step S41, when the number of restarts of the service process exceeds the preset number of restarts, the system may be restarted so as to avoid endlessly restarting a service process that keeps failing, or the starting of the service process may be stopped.
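As a non-limiting illustration, the choice between the two coping strategies of claim 7 might be sketched in Python as follows. The enumeration `Action`, the function `choose_action` and the `reboot_on_limit` switch are assumptions for illustration; the claim does not state how one strategy is selected over the other.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                    # restart limit not exceeded
    REBOOT_SYSTEM = "reboot_system"  # restart the whole system
    STOP_SERVICE = "stop_service"    # stop starting this service process


def choose_action(restart_count: int, max_restarts: int, reboot_on_limit: bool) -> Action:
    """Pick a coping strategy once the restart count exceeds the preset limit."""
    if restart_count <= max_restarts:
        return Action.NONE
    # reboot_on_limit is a hypothetical policy switch supplied by the caller.
    return Action.REBOOT_SYSTEM if reboot_on_limit else Action.STOP_SERVICE
```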
8. A system for implementing a software watchdog in a software system, characterized in that: the system for implementing a software watchdog is used for implementing the method for implementing a software watchdog according to any one of the preceding claims 1 to 7, and comprises,
a dynamic configuration module, used for starting the monitoring process to load a configuration file, and for loading all monitored service processes, in the form of a linked list, into the process linked list in the memory according to the configuration file; the monitoring program starts each service process one by one according to the process linked list, registers watchdog information for each service process after it is started, and adds the watchdog information one by one to the watchdog information linked list corresponding to the process linked list;
a timing detection module, used for traversing the process linked list in a timed loop, clearing or accumulating the watchdog timer of each service process according to the survival state of each service process, restarting the corresponding service process after the timeout time is reached, and registering watchdog information for the corresponding service process again; and for traversing each piece of watchdog information in a loop according to the watchdog information linked list, and determining, according to the size relationship between the timer in the watchdog information and the timeout time in the watchdog information, whether the service process corresponding to the watchdog information has been restarted and its number of restarts, so as to adopt different coping strategies;
a process state query module, used for acquiring the survival state of the business process according to the process number of the business process.
9. The system for implementing a software watchdog in a software system according to claim 8, characterized in that: the dynamic configuration module comprises two interfaces, namely a SetProcessActive interface and a SetProcessInactive interface, which are used respectively for dynamically starting or stopping the monitoring of a given service process;
the SetProcessActive interface is used for dynamically activating a service process, and is passed the name of the service process, the starting parameter of the service process and the delay time of the service process; the interface first searches the process linked list for a service process with the same name; if none exists, the service process is added to the process linked list, is started in the first loop after the delay time is reached, and has its watchdog information registered;
the SetProcessInactive interface is used for dynamically removing a service process from monitoring, and is passed the name of the service process; the interface searches the process linked list and the watchdog information linked list for a service process with the same name; if one exists, the corresponding flag bit of the service process is set to false, and the survival state of the service process is no longer detected.
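As a non-limiting illustration, the two interfaces of claim 9 might be sketched in Python as follows. Only the interface names SetProcessActive and SetProcessInactive come from the claim; the class `DynamicConfig`, the method and field names, and the boolean return values are assumptions for illustration. The claim does not say what SetProcessActive does when a process of the same name already exists, so this sketch leaves the existing entry unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessEntry:
    name: str
    start_params: List[str] = field(default_factory=list)
    delay_seconds: int = 0
    need_start: bool = True   # flag bit checked by the detection loop
    started: bool = False     # set by the detection loop once started


class DynamicConfig:
    """Hypothetical stand-in for the dynamic configuration module."""

    def __init__(self) -> None:
        # Dictionaries keyed by process name stand in for the two linked lists.
        self.processes: Dict[str, ProcessEntry] = {}
        self.watchdogs: Dict[str, dict] = {}

    def set_process_active(self, name: str, start_params: List[str],
                           delay_seconds: int) -> bool:
        """SetProcessActive: add the process if no entry with this name exists."""
        if name in self.processes:
            return False  # behaviour for existing entries is not specified
        # The detection loop starts the process and registers its watchdog
        # information in its first pass after the delay time.
        self.processes[name] = ProcessEntry(name, list(start_params), delay_seconds)
        return True

    def set_process_inactive(self, name: str) -> bool:
        """SetProcessInactive: detach the process from monitoring."""
        if name not in self.processes or name not in self.watchdogs:
            return False
        self.processes[name].need_start = False  # no longer checked or restarted
        return True
```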
10. The system for implementing a software watchdog in a software system according to claim 8, characterized in that: when the number of restarts of the business process exceeds the preset number of restarts, the system may be restarted so as to avoid endlessly restarting a business process that keeps failing, or the starting of the business process is stopped through the SetProcessInactive interface of the dynamic configuration module.
CN202110104876.0A 2021-01-26 2021-01-26 Method and system for realizing software watchdog in software system Active CN112749038B (en)

Publications (2)

Publication Number Publication Date
CN112749038A true CN112749038A (en) 2021-05-04
CN112749038B CN112749038B (en) 2023-03-10

Family

ID=75653163

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070220375A1 (en) * 2006-02-24 2007-09-20 Symbol Technologies, Inc. Methods and apparatus for a software process monitor
CN101178662A (en) * 2006-11-08 2008-05-14 中兴通讯股份有限公司 Monitoring method of embedded LINUX applications progress
CN101630262A (en) * 2009-07-17 2010-01-20 北京数帅科技有限公司 Method for monitoring and controlling subprocess based on Linux system
CN103034552A (en) * 2012-12-11 2013-04-10 太仓市同维电子有限公司 Method for implementing software watchdog in software system
CN105677501A (en) * 2016-01-07 2016-06-15 烽火通信科技股份有限公司 Refined process monitoring method and system based on watchdog in Linux system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陆倩 等: "《一种提高嵌入式系统可靠性的自监测技术》", 《系统仿真学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113645104A (en) * 2021-10-19 2021-11-12 北京国科天迅科技有限公司 FC switch bus configuration management software monitoring method and device
CN116841622A (en) * 2023-09-01 2023-10-03 上海燧原智能科技有限公司 Address self-increasing memory instruction generation method, device, equipment and medium
CN116841622B (en) * 2023-09-01 2023-11-24 上海燧原智能科技有限公司 Address self-increasing memory instruction generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant