CN113901443A - Daemon process fault detection method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN113901443A
Authority
CN
China
Prior art keywords
daemon
recovery
strategy
daemon process
debugging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111175438.XA
Other languages
Chinese (zh)
Inventor
李拓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qax Technology Group Inc
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qax Technology Group Inc
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qax Technology Group Inc and Secworld Information Technology Beijing Co Ltd
Publication of CN113901443A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1417 Boot up procedures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a daemon process fault detection method and device, a storage medium, and electronic equipment. By studying how the file descriptors of a daemon process behave after the daemon process fails, the application proposes detecting the fault through that behavior, namely that the file descriptor is closed automatically once the daemon process fails. The fault can therefore be sensed at the moment the daemon process fails, which overcomes the shortcomings of the prior art under both long and short polling periods. A recovery operation is then performed according to a recovery strategy, which solves the problem of system functions being unavailable after the daemon process is recovered.

Description

Daemon process fault detection method and device, storage medium and electronic equipment
The present application claims priority to the Chinese patent application entitled "Daemon process fault detection method and apparatus, storage medium, and electronic device", filed with the Chinese Patent Office on September 24, 2021, with application number 202111121780.1, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of network security technologies, and in particular, to a daemon fault detection method and apparatus, a storage medium, and an electronic device.
Background
Network security equipment such as a firewall is embedded equipment whose core functions are provided by daemon processes. A daemon process is a special process that runs in the background, starts when the system boots, runs until the system shuts down, and executes a specific system task. Once a daemon process fails, the normal operation of the embedded system's functions is affected, so fault detection needs to be performed on the daemon process.
In the prior art, a monitoring script is started to poll the state of the daemon process periodically and judge whether it is running normally. However, when the polling period is long, a fault cannot be detected at the moment the daemon process fails, so the functions of the embedded system remain unavailable for a long time. When the polling period is short, the large number of polling operations consumes a large amount of CPU resources and reduces the operating efficiency of the system.
Disclosure of Invention
The application provides a daemon process fault detection method and device, a storage medium, and electronic equipment, and aims to overcome the shortcomings of the prior art under both long and short polling periods.
In order to achieve the above object, the present application provides the following technical solutions:
a daemon process fault detection method, applied to an embedded system, comprising:
after a daemon process is started, opening a pre-created process monitoring file;
acquiring a file descriptor of the process monitoring file, and writing the file descriptor into the daemon process;
if a closing instruction for closing the file descriptor is received, determining that the daemon process has failed;
determining a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies, and executing the debugging operation corresponding to the target debugging strategy;
closing the file descriptor when the debugging operation is completed;
and closing the daemon process, and executing the recovery operation corresponding to the target recovery strategy.
Optionally, in the above method, determining a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies includes:
acquiring a process identifier of the daemon process;
determining a debugging strategy and a recovery strategy corresponding to the process identifier from each pre-stored debugging strategy and recovery strategy;
and determining the debugging strategy corresponding to the process identifier as a target debugging strategy, and determining the recovery strategy corresponding to the process identifier as a target recovery strategy.
Optionally, in the above method, the process of storing each recovery strategy includes:
acquiring a configuration file of each daemon process in the embedded system;
constructing a process dependency relationship graph based on the configuration files, where the process dependency relationship graph indicates the dependency relationships among the daemon processes;
determining a recovery strategy of each daemon process based on the configuration files and the process dependency relationship graph;
and storing the recovery strategy of each daemon process.
Optionally, in the above method, determining a recovery strategy of each daemon process based on the configuration files and the process dependency relationship graph includes:
executing a first operation on each node with an in-degree of 0 in the process dependency relationship graph; the first operation includes: if the recovery strategy configuration information in the configuration file corresponding to the node is restart process or restart system, determining the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; if the recovery strategy configuration information in the configuration file corresponding to the node is neither restart process nor restart system, traversing the process dependency relationship graph from the node as the initial node according to a preset traversal strategy to obtain the restart sequence of the daemon process and the daemon processes that have a dependency relationship with it, and determining the restart sequence, the dependency relationship included in the configuration file, and the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; wherein the nodes in the process dependency relationship graph represent daemon processes;
deleting the nodes with an in-degree of 0 from the process dependency relationship graph to obtain an updated process dependency relationship graph, and returning, based on the updated process dependency relationship graph, to the step of executing the first operation on each node with an in-degree of 0, until no node remains in the process dependency relationship graph.
Optionally, in the above method, executing the recovery operation corresponding to the target recovery strategy includes:
restarting the embedded system, or restarting the daemon process and the daemon processes that have a dependency relationship with it based on the restart sequence in the recovery strategy.
Optionally, in the above method, executing the debugging operation corresponding to the target debugging strategy includes:
if the target debugging strategy is to acquire a call stack or acquire memory dump information, acquiring process fault information of the daemon process;
and analyzing the process fault information to obtain call stack or memory dump information included in the process fault information.
Optionally, in the above method, opening the pre-created process monitoring file includes:
executing an initialization function in a link library that the daemon process has linked in advance; and, while the initialization function is executing, calling a preset first function to open the pre-created process monitoring file.
Optionally, in the above method, the process of creating the process monitoring file includes:
determining the creation position of a process monitoring file to be created according to a preset creation path;
and creating a process monitoring file in the creating position.
A daemon process fault detection device, applied to an embedded system, comprising:
a starting unit, configured to open a pre-created process monitoring file after a daemon process is started;
a first acquisition unit, configured to acquire a file descriptor of the process monitoring file and write the file descriptor into the daemon process;
a first determining unit, configured to determine that the daemon process has failed if a closing instruction for closing the file descriptor is received;
a second acquisition unit, configured to determine a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies, and to execute the debugging operation corresponding to the target debugging strategy;
a first closing unit, configured to close the file descriptor when the debugging operation is completed;
and a second closing unit, configured to close the daemon process and execute the recovery operation corresponding to the target recovery strategy.
A daemon fault detection system comprising:
the system comprises a daemon process, a fault detection module and a fault recovery module;
the fault detection module is configured to open a pre-created process monitoring file after the daemon process is started, acquire a file descriptor of the process monitoring file, write the file descriptor into the daemon process, determine that the daemon process has failed if a closing instruction for closing the file descriptor is received, acquire a process identifier of the daemon process, send the process identifier to the fault recovery module, and close the daemon process after receiving a notification message sent by the fault recovery module;
the fault recovery module is configured to receive a process identifier sent by the fault detection module, determine a target debugging policy and a target recovery policy from pre-stored debugging policies and recovery policies based on the process identifier, execute a debugging operation corresponding to the target debugging policy, send a notification message to the fault detection module when the debugging operation is completed, and execute a recovery operation corresponding to the target recovery policy after it is monitored that the daemon process is closed; wherein the notification message is used to notify that a debug operation has been completed.
The above system, optionally, further includes:
a recovery strategy refinement module and a recovery strategy database;
the recovery strategy refining module is used for acquiring configuration files of all daemon processes in the embedded system, constructing a process dependency relationship graph based on the configuration files, determining a recovery strategy of each daemon process based on the configuration files and the process dependency relationship graph, and storing the recovery strategy of each daemon process to the recovery strategy database, wherein the process dependency relationship graph is used for indicating the dependency relationship among all daemon processes.
A storage medium storing a set of instructions, wherein the set of instructions, when executed by a processor, implements a daemon fault detection method as described above.
An electronic device, comprising:
a memory for storing at least one set of instructions;
and the processor is used for executing the instruction set stored in the memory and realizing the daemon process fault detection method by executing the instruction set.
Compared with the prior art, the method has the following advantages:
The application provides a daemon process fault detection method and device, a storage medium, and electronic equipment. The method includes: creating a process monitoring file in advance; opening the process monitoring file after a daemon process is started; acquiring a file descriptor of the process monitoring file and writing the file descriptor into the daemon process; determining that the daemon process has failed if a closing instruction for closing the file descriptor is received; determining a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies; executing the debugging operation corresponding to the target debugging strategy; closing the file descriptor when the debugging operation is completed; and closing the daemon process and executing the recovery operation corresponding to the target recovery strategy. By studying how the file descriptors of a daemon process behave after the daemon process fails, the technical scheme detects the fault through that behavior, namely that the file descriptor is closed automatically. The fault can therefore be sensed at the moment the daemon process fails, which overcomes the shortcomings of the prior art under both long and short polling periods. The recovery operation is executed according to the recovery strategy, which solves the problem of system functions being unavailable after the daemon process is recovered.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a daemon process fault detection method provided in the present application;
Fig. 2 is another flowchart of the daemon process fault detection method provided in the present application;
Fig. 3 is a further flowchart of the daemon process fault detection method provided in the present application;
Fig. 4 is a schematic structural diagram of a daemon process fault detection system provided in the present application;
Fig. 5 is a schematic structural diagram of a daemon process fault detection system provided in the present application;
Fig. 6 is an exemplary diagram of the daemon process fault detection method provided in the present application;
Fig. 7 is another exemplary diagram of the daemon process fault detection method provided in the present application;
Fig. 8 is a schematic structural diagram of a daemon process fault detection device provided in the present application;
Fig. 9 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the disclosure of the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in the disclosure of the present application are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
An embodiment of the present application provides a daemon process fault detection method, which may be executed by a processor of an embedded system. A flowchart of the method is shown in Fig. 1, and the method specifically includes:
S101, after the daemon process is started, opening a pre-created process monitoring file.
In this embodiment, the embedded system includes a fault detection module and a fault recovery module, where the fault detection module runs in the kernel of the embedded system and is a virtual miscellaneous device (misc device) driver.
In this embodiment, the process monitoring file is created in advance. Specifically, the creation position of the process monitoring file is determined according to a preset creation path, and the process monitoring file is created at that position. The preset creation path may be located under the root directory of the embedded system. For example, the creation path may be /dev/proc_monitor, which indicates that a process monitoring file named proc_monitor is created in the /dev directory under the root directory of the embedded system.
In this embodiment, the daemon process links a preset link library in advance, where the link library includes a dynamic library or a static library, and the link library includes an initialization function.
In this embodiment, the process of opening the pre-created process monitoring file specifically includes: after the daemon process is started, executing the initialization function in the link library linked by the daemon process, and, while the initialization function is executing, calling a preset first function to open the pre-created process monitoring file, where the preset first function may be an open function.
It should be noted that the function that calls open may be declared with the attribute __attribute__((constructor)), which ensures that the open call runs before the main function executes.
It should be noted that, after the pre-created process monitoring file is opened, a preset second function is called, where the preset second function may be a read function. The call blocks until the process identifier of the daemon process is received.
S102, obtaining a file descriptor of the process monitoring file, and writing the file descriptor into a daemon process.
In this embodiment, a file descriptor of the process monitoring file is obtained, where the file descriptor is the descriptor that the kernel of the embedded system generates for the process monitoring file after the file is opened.
In this embodiment, the file descriptor is written into the daemon process, and specifically, the file descriptor is stored in the memory of the daemon process.
S103, judging whether a closing instruction for closing the file descriptor is received; if so, executing S104; if not, repeating S103.
In this embodiment, when the daemon process is about to exit after a fault occurs, the embedded system may automatically close each file descriptor opened by the daemon process, including but not limited to the file descriptor of the process monitoring file.
In this embodiment, whether a closing instruction for closing a file descriptor of a process monitoring file is received is sensed in real time.
S104, determining that the daemon process has failed.
In this embodiment, the inventor found through research that the file descriptor is closed automatically after the daemon process fails. Based on this, when a closing instruction for closing the file descriptor is received, it can be determined that the daemon process has failed.
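The detection principle in S103 and S104 can be illustrated in user space. The following is a minimal sketch, not the misc device driver described in this embodiment: it swaps in an ordinary POSIX pipe as an assumed stand-in for the process monitoring file, and the function name detect_exit_without_polling is hypothetical. It shows only that the kernel closes a terminating process's descriptors and that a waiting party learns of the exit without any polling loop.

```python
import os


def detect_exit_without_polling(exit_code: int = 3):
    """Fork a stand-in 'daemon' and detect its exit via a descriptor close.

    Assumption: a pipe replaces the process monitoring file of the patent.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the stand-in daemon. It never closes write_fd itself;
        # it simply terminates, and the kernel closes its descriptors.
        os.close(read_fd)
        os._exit(exit_code)

    # Parent: the stand-in monitor. Drop our own copy of the write end so
    # that the child holds the only remaining one.
    os.close(write_fd)
    # Blocks (no polling loop) until the last write end is closed, then
    # returns an empty bytes object to signal end-of-file.
    data = os.read(read_fd, 1)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return data, os.WEXITSTATUS(status)
```

The blocking read returns as soon as the child is gone, so detection latency does not depend on any polling period. The sketch requires a POSIX system because it relies on os.fork.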
S105, determining a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies.
In this embodiment, the debugging policy and the recovery policy corresponding to each daemon process in the embedded system are pre-stored, and the target debugging policy and the target recovery policy are determined from the pre-stored debugging policies and recovery policies.
S106, executing the debugging operation corresponding to the target debugging strategy.
In this embodiment, the debugging operation corresponding to the target debugging strategy is executed. The debugging strategy is one of: not debugging, acquiring a call stack, or acquiring memory dump information. If the debugging strategy is not debugging, no debugging operation is performed. If the debugging strategy is acquiring a call stack or acquiring memory dump information, the process fault information of the daemon process is acquired and analyzed to obtain the call stack or memory dump information that it contains.
S107, closing the file descriptor when the debugging operation is completed.
In this embodiment, after the debugging operation is completed, the fault recovery module sends notification information to the fault detection module through a preset third function, where the notification information indicates that the debugging operation has been completed. The preset third function is a write function.
In this embodiment, the release function returns when the debugging operation is completed, and the file descriptor of the process monitoring file is closed once the release function has returned.
It should be noted that after all the file descriptors opened by the daemon process that has failed are closed, the daemon process exits, that is, the daemon process is closed.
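The ordering that S105 to S108 impose between the fault detection module and the fault recovery module can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the kernel driver: the class MonitorDevice, the function simulate_failure, and the log strings are all hypothetical, and threads stand in for the kernel and the user-space recovery module. The read, write, and release methods mirror the roles the text assigns to the functions of the same names.

```python
import queue
import threading


class MonitorDevice:
    """User-space stand-in for the misc device driver (hypothetical)."""

    def __init__(self):
        self._failed = queue.Queue()   # process identifiers awaiting read()
        self._debug_done = {}          # process identifier -> Event

    def release(self, pid, log):
        # Called when the failed daemon's descriptor is being closed.
        done = threading.Event()
        self._debug_done[pid] = done
        log.append("release entered")
        self._failed.put(pid)          # wakes the blocked read()
        done.wait()                    # hold the process until debugging ends
        log.append("release returned")

    def read(self):
        # Blocks until a process identifier is available.
        return self._failed.get()

    def write(self, pid):
        # Notification that the debugging operation has completed.
        self._debug_done[pid].set()


def simulate_failure(pid):
    log = []
    dev = MonitorDevice()
    exited = threading.Event()

    def recovery_module():
        failed = dev.read()
        log.append(f"debug {failed}")      # e.g. collect a call stack
        dev.write(failed)
        exited.wait()                      # recover only after the exit
        log.append(f"recover {failed}")

    worker = threading.Thread(target=recovery_module)
    worker.start()
    dev.release(pid, log)                  # the daemon's descriptor closes
    log.append("daemon exited")
    exited.set()
    worker.join()
    return log
```

Because release does not return until write has been called, the failed process stays available for debugging, and recovery starts only after the process has exited.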
S108, closing the daemon process, and executing the recovery operation corresponding to the target recovery strategy.
In this embodiment, after the file descriptor is closed and all the file descriptors opened by the failed daemon process have been closed, the daemon process exits, that is, the daemon process is closed. The recovery operation corresponding to the target recovery strategy is then executed, that is, the embedded system is restarted, or the daemon process and the daemon processes that have a dependency relationship with it are restarted based on the restart sequence in the recovery strategy.
Specifically, if the recovery strategy is restart system, the embedded system is restarted. If the recovery strategy is restart process, the daemon process is restarted. If the recovery strategy is to restart the daemon process together with the daemon processes that have a dependency relationship with it, those processes are restarted based on the restart sequence in the recovery strategy.
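The three recovery cases above amount to a dispatch on the kind of recovery strategy. The following minimal sketch makes that dispatch explicit. The dictionary layout, the key names kind, process, and restart_sequence, and the returned action strings are assumptions made for illustration; the patent does not specify a data format, and the sketch only plans actions rather than restarting anything.

```python
from typing import Dict, List


def plan_recovery(strategy: Dict) -> List[str]:
    """Translate a recovery strategy into an ordered list of actions.

    The strategy layout is hypothetical: {"kind": ..., plus optional fields}.
    """
    kind = strategy["kind"]
    if kind == "restart_system":
        return ["reboot"]
    if kind == "restart_process":
        return [f"restart {strategy['process']}"]
    if kind == "restart_with_dependents":
        # Restart the failed daemon and its dependents in the stored order.
        return [f"restart {name}" for name in strategy["restart_sequence"]]
    raise ValueError(f"unknown recovery strategy: {kind}")
```

For example, a strategy whose restart sequence is A followed by B yields the actions "restart A" and then "restart B", preserving the stored order.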
The daemon process fault detection method provided by this embodiment studies how the file descriptors of a daemon process behave after the daemon process fails and detects the fault through that behavior, namely that the file descriptor is closed automatically. The fault can therefore be sensed at the moment the daemon process fails, which overcomes the shortcomings of the prior art under both long and short polling periods. Fault recovery is then carried out according to the recovery strategy, including restarting the embedded system, or restarting the daemon process and the daemon processes that have a dependency relationship with it. This solves the prior-art problem that only the crashed process is restarted without considering the dependency relationships among processes, so that system functions may be abnormal after the restart.
Referring to Fig. 2, in the daemon process fault detection method provided in the embodiment of the present application, step S105 specifically includes the following steps:
S201, acquiring a process identifier of the daemon process.
In this embodiment, when a closing instruction for the file descriptor of the process monitoring file is received, the embedded system calls a preset function to obtain the process identifier of the daemon process, where the preset function may be a release function.
Optionally, the process identifier may also be sent to the read function. The read function returns after receiving the process identifier, that is, it returns the process identifier to the fault recovery module of the embedded system.
S202, determining a debugging strategy and a recovery strategy corresponding to the process identifier from the pre-stored debugging strategies and recovery strategies.
In this embodiment, a corresponding configuration file is configured for each daemon process in advance, where the configuration file at least includes recovery policy configuration information, a debugging policy, and a dependency relationship of the daemon process on other daemon processes; the debugging strategy comprises but is not limited to not debugging, obtaining a call stack or obtaining memory dump information; the recovery policy configuration information is shown in the policy table of table 1:
Table 1. Policy table (provided as an image in the original publication; its contents are not reproduced here)
In this embodiment, the recovery policy configuration information includes, but is not limited to, the restart system, the restart process, or the restart daemon itself and all other daemon processes directly or indirectly relying on the daemon process.
In this embodiment, the debugging strategy and the recovery strategy of each daemon process included in the embedded system are stored in advance. The process of storing the debugging strategies specifically includes: determining the debugging strategy of each daemon process based on the configuration file of each daemon process in the embedded system, and storing the debugging strategy of each daemon process.
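The configuration file and the lookup in S201 and S202 can be pictured with a small in-memory model. This is a sketch under assumptions: the DaemonConfig fields, the string values for the strategies, the use of the daemon name as the process identifier, and the dictionary-based stores are all hypothetical, since the contents of Table 1 and the storage format are not given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DaemonConfig:
    """Hypothetical in-memory form of one daemon's configuration file."""
    name: str
    recovery_config: str   # e.g. "restart_system" or "restart_process"
    debug_strategy: str    # e.g. "none", "call_stack", "memory_dump"
    depends_on: List[str] = field(default_factory=list)


def store_debug_strategies(configs: List[DaemonConfig]) -> Dict[str, str]:
    # One stored debugging strategy per daemon, keyed by its identifier.
    return {cfg.name: cfg.debug_strategy for cfg in configs}


def select_target_strategies(
    process_id: str,
    debug_store: Dict[str, str],
    recovery_store: Dict[str, dict],
) -> Tuple[str, dict]:
    # S202: look up both strategies by the failed daemon's identifier.
    return debug_store[process_id], recovery_store[process_id]
```

Keying both stores by the same identifier means a single lookup step yields the target debugging strategy and the target recovery strategy together.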
Referring to Fig. 3, the storage process of each recovery policy specifically includes the following steps:
S301, obtaining a configuration file of each daemon process in the embedded system.
S302, constructing a process dependency relationship graph based on the configuration files.
In this embodiment, a process dependency relationship graph is constructed based on the dependency relationships, recorded in each configuration file, of each daemon process on other daemon processes. The process dependency relationship graph indicates the dependency relationships among the daemon processes. It should be noted that the constructed graph is a directed graph in which a vertex represents a daemon process and a directed edge represents a dependency relationship between processes. For example, when daemon process B depends on daemon process A, a directed edge A -> B exists in the process dependency relationship graph.
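The edge direction described above can be made concrete with a short sketch. It assumes, for illustration only, that the dependencies read from the configuration files are available as a mapping from each daemon name to the list of daemons it depends on; the function name build_dependency_graph is hypothetical.

```python
from typing import Dict, List, Tuple


def build_dependency_graph(
    depends_on: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Build the directed graph and the in-degree of every vertex.

    If B depends on A, the graph contains the directed edge A -> B.
    """
    edges = {name: [] for name in depends_on}
    in_degree = {name: 0 for name in depends_on}
    for dependent, prerequisites in depends_on.items():
        for prerequisite in prerequisites:
            if prerequisite not in edges:
                raise KeyError(f"unknown daemon: {prerequisite}")
            edges[prerequisite].append(dependent)
            in_degree[dependent] += 1
    return edges, in_degree
```

With this direction, a vertex with an in-degree of 0 is a daemon that depends on no other daemon, and following edges from a vertex reaches every daemon that depends on it directly or indirectly.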
S303, determining the recovery strategy of each daemon process based on each configuration file and the process dependency relationship graph.
In this embodiment, the specific process of determining the recovery policy of each daemon process based on each configuration file and the process dependency graph includes the following steps:
executing a first operation on each node with an in-degree of 0 in the process dependency relationship graph; the first operation includes: if the recovery strategy configuration information in the configuration file corresponding to the node is restart process or restart system, determining the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; if the recovery strategy configuration information in the configuration file corresponding to the node is neither restart process nor restart system, traversing the process dependency relationship graph from the node as the initial node using a preset traversal strategy to obtain a restart sequence for the daemon process and the daemon processes that have a dependency relationship with it, and determining the restart sequence, together with the dependency relationship and the recovery strategy configuration information included in the configuration file, as the recovery strategy of the daemon process corresponding to the node; the nodes in the process dependency relationship graph represent daemon processes;
deleting the nodes with an in-degree of 0 from the process dependency relationship graph to obtain an updated process dependency relationship graph, and returning, based on the updated process dependency relationship graph, to the step of executing the first operation on each node with an in-degree of 0, until no node remains in the process dependency relationship graph.
In this embodiment, a first operation is performed on each node with an in-degree of 0 in the process dependency relationship graph. The first operation includes: if the recovery strategy configuration information corresponding to the node is restart process or restart system, directly determining the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; if the recovery strategy configuration information corresponding to the node is neither restart process nor restart system, traversing the process dependency relationship graph from the node as the initial node using a preset traversal strategy. Optionally, the traversal order is taken as the restart sequence of the daemon process and the daemon processes that have a dependency relationship with it, and the restart sequence, the dependency relationship, and the recovery strategy configuration information are determined as the recovery strategy of the daemon process corresponding to the node.
In this embodiment, the nodes with an in-degree of 0 are deleted from the process dependency relationship graph to obtain a new process dependency relationship graph, and the first operation is performed on each node with an in-degree of 0 in the new process dependency relationship graph. This repeats until no node remains in the process dependency relationship graph, that is, until all nodes have been deleted.
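The round-by-round deletion of in-degree-0 nodes can be sketched as follows. This is an illustration under stated assumptions, not the applicant's code: `refine_rounds` and `MAX_NODES` are hypothetical names, the graph is an adjacency matrix in which `edge[a][b]` means that b depends on a, and the first operation itself is reduced to recording the visit order. One deliberate difference: the sketch stops when no in-degree-0 node is left, so a cyclic dependency ends the loop early instead of running forever.

```c
#include <assert.h>

#define MAX_NODES 16  /* hypothetical bound on the number of daemon processes */

/*
 * Visit nodes in rounds: each round handles every node whose in-degree is 0,
 * then deletes those nodes. Writes the visit order to `order` and returns the
 * number of nodes visited (less than n if the graph contains a cycle).
 */
int refine_rounds(int n, int edge[MAX_NODES][MAX_NODES], int order[MAX_NODES])
{
    int in_degree[MAX_NODES] = {0};
    int removed[MAX_NODES] = {0};
    int visited = 0;

    assert(n >= 0 && n <= MAX_NODES);
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            if (edge[a][b])
                in_degree[b]++;

    for (;;) {
        int batch[MAX_NODES];
        int batch_len = 0;

        /* Collect the current in-degree-0 nodes before deleting any of them. */
        for (int v = 0; v < n; v++)
            if (!removed[v] && in_degree[v] == 0)
                batch[batch_len++] = v;
        if (batch_len == 0)
            break;  /* graph is empty, or only nodes on a cycle remain */

        for (int i = 0; i < batch_len; i++) {
            int v = batch[i];
            order[visited++] = v;    /* the first operation would run here */
            removed[v] = 1;
            for (int w = 0; w < n; w++)
                if (edge[v][w])
                    in_degree[w]--;  /* deleting v removes its outgoing edges */
        }
    }
    return visited;
}
```

For the edges 0 -> 1, 0 -> 2, 1 -> 3 and 2 -> 3, the first round handles node 0, the second handles nodes 1 and 2, and the third handles node 3.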
S304, storing the recovery strategy of each daemon process.
In this embodiment, the recovery policy of each daemon process is stored, and optionally, the recovery policy of each daemon process may be stored in a process fault recovery policy database that is constructed in advance.
It should be noted that the file format of the configuration file of the daemon process includes, but is not limited to, an xml format, a json format, and an ini format.
In this embodiment, the debugging policy and the recovery policy corresponding to the process identifier are determined from the stored debugging policies and recovery policies, specifically, the process name of the daemon process that has failed is determined based on the process identifier, the debugging policy and the recovery policy corresponding to the process name are determined from the stored debugging policies and recovery policies, and optionally, the recovery policy corresponding to the process name may be searched from a process failure recovery policy database.
S203, determining the debugging strategy corresponding to the process identifier as a target debugging strategy, and determining the recovery strategy corresponding to the process identifier as a target recovery strategy.
In this embodiment, the debugging policy corresponding to the process identifier is determined as a target debugging policy, and the recovery policy corresponding to the process identifier is determined as a target recovery policy.
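The lookup of the target policies by process name can be pictured as a table search. The sketch below is hypothetical throughout: the structure `policy_entry`, the function `find_policy`, the enumerator names, and the sample daemon names `netd`, `logd` and `authd` are assumptions made for illustration. In the described method the entries would come from the stored debugging and recovery policies, not from a compiled-in table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dbg_policy { DBG_NONE, DBG_CALL_STACK, DBG_MEMORY_DUMP };
enum rec_policy { REC_RESTART_SYSTEM, REC_RESTART_PROCESS, REC_RESTART_DEPENDENT };

struct policy_entry {
    const char *process_name;   /* name resolved from the process identifier */
    enum dbg_policy debug;      /* debugging policy for this daemon */
    enum rec_policy recovery;   /* recovery policy for this daemon */
};

/* Hypothetical stored policies, standing in for the policy database. */
static const struct policy_entry policy_table[] = {
    { "netd",  DBG_CALL_STACK,  REC_RESTART_DEPENDENT },
    { "logd",  DBG_NONE,        REC_RESTART_PROCESS },
    { "authd", DBG_MEMORY_DUMP, REC_RESTART_SYSTEM },
};

/* Return the entry for `process_name`, or NULL if no policy is stored. */
const struct policy_entry *find_policy(const char *process_name)
{
    assert(process_name != NULL);
    for (size_t i = 0; i < sizeof(policy_table) / sizeof(policy_table[0]); i++)
        if (strcmp(policy_table[i].process_name, process_name) == 0)
            return &policy_table[i];
    return NULL;
}
```

The entry returned for the failed daemon supplies both the target debugging policy and the target recovery policy in one step.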
In the daemon process fault detection method provided by the embodiment of the application, the embedded system does not reclaim resources such as the memory of the failed daemon process until the preset function returns, so the process fault site is preserved and the process fault information can be acquired.
Referring to fig. 4, an embodiment of the present application further provides a daemon fault detection system, including:
a daemon 401, a failure detection module 402 and a failure recovery module 403.
The fault detection module 402 is configured to open a pre-created process monitoring file after the daemon process 401 is started, obtain a file descriptor of the process monitoring file, write the file descriptor into the daemon process 401, determine that the daemon process 401 has failed if a close instruction for closing the file descriptor is received, obtain a process identifier of the daemon process 401, send the process identifier to the fault recovery module 403, and close the daemon process 401 after receiving a notification message sent by the fault recovery module 403;
a fault recovery module 403, configured to receive the process identifier sent by the fault detection module 402, determine a target debugging policy and a target recovery policy from the pre-stored debugging policies and recovery policies based on the process identifier, execute a debugging operation corresponding to the target debugging policy, send a notification message to the fault detection module when the debugging operation is completed, and execute a recovery operation corresponding to the target recovery policy after it is monitored that the daemon process is closed; wherein the notification message is used to notify that the debug operation has been completed.
In this embodiment, the daemon fault detection system may further include a recovery policy refining module and a recovery policy database, referring to fig. 5, that is, the daemon fault detection system includes a daemon 501, a fault detection module 502, a fault recovery module 503, a recovery policy database 504, and a recovery policy refining module 505.
The recovery policy refining module 505 is configured to obtain a configuration file of each daemon process in the embedded system, construct a process dependency relationship diagram based on each configuration file, determine a recovery policy of each daemon process based on each configuration file and the process dependency relationship diagram, and store the recovery policy of each daemon process in the recovery policy database 504, where the process dependency relationship diagram is used to indicate a dependency relationship between the daemon processes.
In this embodiment, referring to fig. 6, the daemon process, the fault recovery module, the recovery policy database, the recovery policy refining module, and the configuration file of the daemon process all reside in user mode, and the fault detection module resides in kernel mode.
In the implementation, the daemon process has the same functions as a daemon process in the prior art, and its code does not need to be modified. In the linking phase, however, it must additionally be linked against a dynamic or static link library provided by the present invention. The link library contains an initialization function that opens the device file /dev/proc_monitor through the open system call and records the corresponding file descriptor. The function is decorated with __attribute__((constructor)), so the compiler ensures that it runs before the main function executes.
The fault detection module runs in the kernel of the embedded system and is a virtual miscellaneous device (misc device) driver. After the fault detection module is initialized, it creates the process monitoring file /dev/proc_monitor. The embodiment implements four function interfaces for this file: an open function, a release function, a read function, and a write function.
Wherein,
(1) the open function:
when the daemon process is initialized, it opens /dev/proc_monitor through the open system call, and the embedded system in turn calls the open function of the driver to inform the fault detection module that the daemon process has been created.
(2) The release function:
when the daemon process is about to exit, the embedded system automatically closes all file descriptors opened by the daemon process, including the file descriptor corresponding to /dev/proc_monitor. When the file descriptor corresponding to /dev/proc_monitor is closed, the operating system in turn calls the release function. In this embodiment, the basic flow of the release function is as follows:
step1, acquiring a process number pid of the daemon process, namely a process identifier of the daemon process;
step2, return pid to the caller of the read function (i.e., the fault recovery module);
step3, waiting for the fault recovery module to obtain the process fault information (such as memory dump) of the fault process;
step4, the function returns.
Until the release function returns, the embedded system does not reclaim resources such as the memory of the failed daemon process, so the process fault site is preserved and the process fault information can be acquired.
(3) The read function:
invoked by the fault recovery module. If a daemon process has failed, the pid of the failed daemon process is returned; otherwise, the call blocks.
(4) The write function:
invoked by the fault recovery module to inform the fault detection module that the diagnostic information has been fully acquired.
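The interplay of the release, read and write functions can be modeled in user space as a small state machine. The sketch below is not kernel code and is not taken from the application: `monitor_state` and the `model_*` functions are hypothetical, and where the real driver would block, the model returns `-EAGAIN` so that the ordering can be followed in a single thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* User-space model of the driver state shared by release, read and write. */
struct monitor_state {
    pid_t failed_pid;   /* pid published by release(), 0 if none */
    int pid_delivered;  /* read() has handed the pid to the recovery module */
    int debug_done;     /* write() has confirmed that diagnostics are collected */
};

/* release(), steps 1-2: record the pid of the exiting daemon. */
void model_release_begin(struct monitor_state *s, pid_t pid)
{
    assert(pid > 0);
    s->failed_pid = pid;
    s->pid_delivered = 0;
    s->debug_done = 0;
}

/* read(): return the failed pid, or -EAGAIN where the real call would block. */
int model_read(struct monitor_state *s, pid_t *out)
{
    if (s->failed_pid == 0 || s->pid_delivered)
        return -EAGAIN;
    *out = s->failed_pid;
    s->pid_delivered = 1;
    return 0;
}

/* write(): the recovery module reports that debugging has finished. */
int model_write(struct monitor_state *s)
{
    if (!s->pid_delivered)
        return -EINVAL;  /* nothing to acknowledge yet */
    s->debug_done = 1;
    return 0;
}

/* release(), steps 3-4: may the function return so resources are reclaimed? */
int model_release_may_return(const struct monitor_state *s)
{
    return s->debug_done;
}
```

The key property is that `model_release_may_return` stays false between the delivery of the pid and the write call, which corresponds to the window in which the fault site is preserved.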
Each supervised daemon process has a corresponding configuration file for describing the dependency relationship of the daemon process on other processes, debugging strategies and recovery strategies when faults occur and other information.
The debugging strategy comprises the steps of not debugging, obtaining a call stack, obtaining memory dump and the like;
the crash recovery strategy includes restarting the system, restarting the process itself, and restarting the process together with all processes that directly or indirectly depend on it.
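When a configuration file is read, the policy names have to be mapped to internal values. The following sketch assumes that policies appear in the file as plain strings. The recovery names `restart_system`, `restart_process` and `restart_dependent` are the ones used in the refinement steps of this description, whereas the debugging names `none`, `call_stack` and `memory_dump`, the enumerations, and both parse functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum recovery_kind {
    RECOVERY_UNKNOWN = -1,
    RECOVERY_RESTART_SYSTEM,     /* restart the whole system */
    RECOVERY_RESTART_PROCESS,    /* restart only the failed process */
    RECOVERY_RESTART_DEPENDENT   /* restart the process and its dependents */
};

enum debug_kind {
    DEBUG_UNKNOWN = -1,
    DEBUG_OFF,          /* do not debug */
    DEBUG_CALL_STACK,   /* obtain the call stack */
    DEBUG_MEMORY_DUMP   /* obtain a memory dump */
};

/* Map a recovery policy name from the configuration file to its enum value. */
enum recovery_kind parse_recovery_kind(const char *text)
{
    assert(text != NULL);
    if (strcmp(text, "restart_system") == 0)    return RECOVERY_RESTART_SYSTEM;
    if (strcmp(text, "restart_process") == 0)   return RECOVERY_RESTART_PROCESS;
    if (strcmp(text, "restart_dependent") == 0) return RECOVERY_RESTART_DEPENDENT;
    return RECOVERY_UNKNOWN;
}

/* Map a debugging policy name (hypothetical spelling) to its enum value. */
enum debug_kind parse_debug_kind(const char *text)
{
    assert(text != NULL);
    if (strcmp(text, "none") == 0)        return DEBUG_OFF;
    if (strcmp(text, "call_stack") == 0)  return DEBUG_CALL_STACK;
    if (strcmp(text, "memory_dump") == 0) return DEBUG_MEMORY_DUMP;
    return DEBUG_UNKNOWN;
}
```

Returning an explicit unknown value lets the caller reject a misspelled policy at start-up instead of discovering it when a daemon fails.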
And the recovery strategy database is used for storing the recovery strategies of all daemons in the embedded system.
The recovery strategy refining module runs when the system is started and is used for refining the recovery strategy of the daemon process, and the recovery strategy refining module comprises the following steps:
step1, traversing the configuration files of all daemon processes and establishing a process dependency relationship graph. The process dependency relationship graph is a directed graph: vertices represent processes and directed edges represent process dependencies. For example, if process B depends on process A, then there is an edge A -> B in the graph;
step2, traversing the nodes with an in-degree of 0 in the graph: if the recovery strategy corresponding to a node is restart_system or restart_process, the recovery strategy of the process is recorded in the process fault recovery strategy database; if the recovery strategy corresponding to the node is restart_dependent, the dependency graph is traversed breadth-first with the node as the initial node, the traversal order of the nodes is taken as the restart order of the processes, and this order is recorded in the process fault recovery strategy database;
step3, deleting the nodes with an in-degree of 0 from the graph, and repeating step2 until all nodes in the dependency graph have been deleted;
step4, the process exits.
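The breadth-first traversal used in step2 for the restart_dependent strategy can be sketched as follows. `restart_order` and `MAX_DAEMONS` are hypothetical names, and the graph is again assumed to be an adjacency matrix in which `edge[a][b]` means that b depends on a.

```c
#include <assert.h>

#define MAX_DAEMONS 16  /* hypothetical bound on the number of daemon processes */

/*
 * Breadth-first traversal from `start` along edges a -> b ("b depends on a").
 * Fills `order` with the restart sequence and returns its length.
 */
int restart_order(int n, int edge[MAX_DAEMONS][MAX_DAEMONS], int start,
                  int order[MAX_DAEMONS])
{
    int seen[MAX_DAEMONS] = {0};
    int head = 0, tail = 0;   /* `order` doubles as the BFS queue */

    assert(n > 0 && n <= MAX_DAEMONS);
    assert(start >= 0 && start < n);

    seen[start] = 1;
    order[tail++] = start;
    while (head < tail) {
        int v = order[head++];
        for (int w = 0; w < n; w++) {
            if (edge[v][w] && !seen[w]) {   /* w depends on v, not yet queued */
                seen[w] = 1;
                order[tail++] = w;
            }
        }
    }
    return tail;
}
```

Starting from the failed process, the sequence contains that process first and then every process that depends on it directly or indirectly; unrelated processes are left out. One point worth checking in a real configuration is that breadth-first order is not always a topological order: if B depends on both A and C, and C depends on A, the traversal from A yields A, B, C, so B would be restarted before C.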
The fault recovery module is used for performing recovery operation when the daemon process fails, and comprises the following steps:
step1, when the embedded system starts, /dev/proc_monitor is opened through the open function to obtain a file descriptor, and the module blocks in a read function call on that file descriptor;
step2, if a daemon process fails and exits, the read function call returns, handing the pid of the failed process to the fault recovery module;
step3, acquiring a process name according to the pid, and searching a recovery strategy and a debugging strategy corresponding to the process name in a recovery strategy database;
step4, executing debugging action;
step5, notifying the fault detection module of completion of debugging through write system call;
step6, executing the recovery strategy after the process exits.
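Steps 2 to 5 amount to a blocking read, a debugging action, and an acknowledging write on the same file descriptor. The sketch below illustrates that sequence under assumptions the application does not settle: the pid is taken to arrive as a raw `pid_t`, the acknowledgment is taken to be a single byte, and the function names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until the monitor reports a failed daemon; 0 on success, -1 otherwise. */
int wait_for_failed_pid(int monitor_fd, pid_t *pid_out)
{
    ssize_t got;

    assert(pid_out != NULL);
    got = read(monitor_fd, pid_out, sizeof(*pid_out));  /* assumed wire format */
    return got == (ssize_t)sizeof(*pid_out) ? 0 : -1;
}

/* Tell the monitor that debugging is complete so that release can return. */
int ack_debug_done(int monitor_fd)
{
    const char done = 1;  /* assumed payload; the source does not define one */

    return write(monitor_fd, &done, sizeof(done)) == (ssize_t)sizeof(done) ? 0 : -1;
}

/* One pass over steps 2-5; `debug` stands in for the debugging action. */
int handle_one_failure(int monitor_fd, void (*debug)(pid_t))
{
    pid_t pid;

    if (wait_for_failed_pid(monitor_fd, &pid) != 0)
        return -1;
    if (debug != NULL)
        debug(pid);                   /* step 4: collect diagnostics */
    return ack_debug_done(monitor_fd); /* step 5: notify the fault detection module */
}
```

Keeping the acknowledgment as the last action ensures that the failed process is still intact while the debugging callback runs.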
The daemon fault detection system provided by the embodiment of the application uses the fact that the file descriptors of a daemon process are closed automatically after the daemon process fails in order to detect the fault. A fault can therefore be sensed as soon as the daemon process fails, which prevents functions of the embedded system from remaining unavailable for a long time, reduces the consumption of CPU resources, and improves the running efficiency of the system. Until the preset function returns, the embedded system does not reclaim resources such as the memory of the failed daemon process, so the process fault site is preserved and the process fault information can be acquired. Fault recovery is carried out according to a recovery strategy, which addresses the problem in the prior art that only the crashed process is restarted without considering the dependency relationships among processes, so that system functions may be abnormal after the restart.
Referring to fig. 7, the above-mentioned daemon fault detection method is exemplified as follows:
After the embedded system is started, the fault recovery module opens the process monitoring file through open(), that is, the open function, obtains a file descriptor of the process monitoring file, and calls read(), that is, the read function. The read function is a blocking call and returns only when a process identifier of a daemon process is received.
After the daemon process is started, a first notification message is sent to the fault detection module through an open function to notify the fault detection module that the daemon process is started, and a file descriptor is written into a memory of the daemon process.
When the daemon process fails, a close instruction for closing the file descriptor is received and release(), that is, the release function, is called. The release function obtains the process identifier pid of the daemon process and passes it to the read function; the return condition of the read function is then met, and the pid is returned to the fault recovery module.
The fault recovery module obtains the pid, determines the process name corresponding to the pid, looks up the debugging strategy and the recovery strategy corresponding to the process name among the pre-stored debugging strategies and recovery strategies, and carries out the debugging operation based on the debugging strategy. If the debugging strategy is to obtain a call stack or to obtain memory dump information, the process fault information of the daemon process is obtained through the function get_debug_info(), and the call stack or the memory dump information is obtained from the process fault information.
After completing the debugging operation, the failure recovery module calls write (), namely calls a write function to send a second notification message to the failure detection module so as to notify the failure detection module that the debugging is completed.
After the release function returns, the file descriptor of the process monitoring file is closed, and the daemon process exits once all of its open file descriptors have been closed.
After the daemon process exits, the fault recovery module executes the recovery operation through the recovery function recovery_crash_process() based on the recovery strategy, that is, it restarts the embedded system, or restarts the daemon process together with the daemon processes that have a dependency relationship with it.
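The final recovery step can be illustrated with a small dispatcher. The application names a function recovery_crash_process() but does not give its signature, so the sketch uses a differently named, hypothetical `apply_recovery_plan` together with assumed types. Rebooting and restarting are injected as callbacks, and the demo hooks only count and print, so that nothing is actually restarted; the daemon names in the usage note are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum recovery_action { ACTION_RESTART_SYSTEM, ACTION_RESTART_SEQUENCE };

struct recovery_plan {
    enum recovery_action action;
    const char *const *restart_sequence;  /* process names in restart order */
    size_t sequence_len;                  /* 1 when only the process itself restarts */
};

struct recovery_hooks {
    void (*reboot_system)(void);
    void (*restart_process)(const char *name);
};

/* Demo hooks that only record what would be done. */
int demo_reboot_calls;
int demo_restart_calls;

void demo_reboot(void) { demo_reboot_calls++; }

void demo_restart(const char *name)
{
    printf("restart %s\n", name);
    demo_restart_calls++;
}

/* Returns the number of processes restarted, or -1 for a system restart. */
int apply_recovery_plan(const struct recovery_plan *plan,
                        const struct recovery_hooks *hooks)
{
    assert(plan != NULL && hooks != NULL);
    if (plan->action == ACTION_RESTART_SYSTEM) {
        hooks->reboot_system();
        return -1;
    }
    for (size_t i = 0; i < plan->sequence_len; i++)
        hooks->restart_process(plan->restart_sequence[i]);  /* stored order */
    return (int)plan->sequence_len;
}
```

With a plan whose restart sequence is `netd` followed by `dnsd`, the dispatcher calls the restart hook twice in that order; with a system-restart plan it calls only the reboot hook.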
It should be noted that while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.
It should be understood that the various steps recited in the method embodiments disclosed herein may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the disclosure is not limited in this respect.
Corresponding to the method described in fig. 1, an embodiment of the present application further provides a device for detecting a daemon process fault, which is used to implement the method in fig. 1 specifically, and a schematic structural diagram of the device is shown in fig. 8, and specifically includes:
a starting unit 801, configured to start a pre-created process monitoring file after a daemon process is started;
a first obtaining unit 802, configured to obtain a file descriptor of the process monitoring file, and write the file descriptor into the daemon process;
a first determining unit 803, configured to determine that the daemon process fails if a close instruction for closing the file descriptor is received;
a second obtaining unit 804, configured to determine a target debugging policy and a target recovery policy from each pre-stored debugging policy and recovery policy, and execute a debugging operation corresponding to the target debugging policy;
a first closing unit 805 configured to close the file descriptor when the execution of the debugging operation is completed;
a second closing unit 806, configured to close the daemon process and perform a recovery operation corresponding to the recovery policy.
The daemon fault detection device provided by the embodiment of the application is based on a study of how a daemon process treats its file descriptors after it fails: the file descriptors are closed automatically. Using this behavior to detect faults allows a fault to be sensed as soon as the daemon process fails, which overcomes the respective drawbacks that the prior art exhibits under different polling period durations. The recovery operation is executed according to the recovery strategy, which avoids the problem of system functions being unavailable after the daemon process is recovered.
In an embodiment of the present application, based on the foregoing scheme, when determining the target debug policy and the target recovery policy from the pre-stored debug policies and recovery policies, the second obtaining unit 804 is specifically configured to:
acquiring a process identifier of the daemon process;
determining a debugging strategy and a recovery strategy corresponding to the process identifier from each pre-stored debugging strategy and recovery strategy;
and determining the debugging strategy corresponding to the process identifier as a target debugging strategy, and determining the recovery strategy corresponding to the process identifier as a target recovery strategy.
In an embodiment of the present application, based on the foregoing scheme, the method may further include:
the third acquisition unit is used for acquiring the configuration file of each daemon process in the embedded system;
the construction unit is used for constructing a process dependency relationship graph based on each configuration file; the process dependency relationship graph is used for indicating the dependency relationship among all the daemon processes;
the second determining unit is used for determining the recovery strategy of each daemon process based on each configuration file and the process dependency relationship graph;
and the storage unit is used for storing the recovery strategy of each daemon process.
In an embodiment of the application, based on the foregoing scheme, the second determining unit is specifically configured to:
executing a first operation on each node with an in-degree of 0 in the process dependency relationship graph; the first operation includes: if the recovery strategy configuration information in the configuration file corresponding to the node is restart process or restart system, determining the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; if the recovery strategy configuration information in the configuration file corresponding to the node is neither restart process nor restart system, traversing the process dependency relationship graph from the node as the initial node using a preset traversal strategy to obtain a restart sequence for the daemon process and the daemon processes that have a dependency relationship with it, and determining the restart sequence, together with the dependency relationship and the recovery strategy configuration information included in the configuration file, as the recovery strategy of the daemon process corresponding to the node; wherein the nodes in the process dependency relationship graph represent daemon processes;
deleting the nodes with an in-degree of 0 from the process dependency relationship graph to obtain an updated process dependency relationship graph, and returning, based on the updated process dependency relationship graph, to the step of executing the first operation on each node with an in-degree of 0, until no node remains in the process dependency relationship graph.
In an embodiment of the present application, based on the foregoing scheme, when executing the recovery operation corresponding to the recovery policy, the second closing unit 806 is specifically configured to perform:
restarting the embedded system, or restarting the daemon process and the daemon process which has a dependency relationship with the daemon process based on a restart sequence in the recovery strategy.
In an embodiment of the application, based on the foregoing scheme, when executing the debugging operation corresponding to the target debugging policy, the second obtaining unit is specifically configured to:
if the target debugging strategy is to acquire a call stack or acquire memory dump information, acquiring process fault information of the daemon process;
and analyzing the process fault information to obtain call stack or memory dump information included in the process fault information.
In an embodiment of the present application, based on the foregoing scheme, the starting unit 801 is specifically configured to:
performing initialization operation on an initialization function in a link library linked in advance by the daemon process; and calling a preset first function to start a pre-created process monitoring file in the process of executing initialization operation on the initialization function.
In an embodiment of the present application, based on the foregoing scheme, the method may further include:
the third determining unit is used for determining the creation position of the process monitoring file to be created according to a preset creation path;
and the creating unit is used for creating the process monitoring file in the creating position.
The embodiment of the present application further provides a storage medium, where an instruction set is stored in the storage medium, and when the instruction set runs, the daemon process fault detection method disclosed in any of the above embodiments is executed.
An electronic device is further provided in the embodiments of the present application, and a schematic structural diagram of the electronic device is shown in fig. 9, and specifically includes a memory 901 for storing at least one set of instruction sets; a processor 902 for executing the instruction set stored in the memory, and implementing the daemon fault detection method disclosed in any of the above embodiments by executing the instruction set.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
While several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
The foregoing description is only exemplary of the preferred embodiments disclosed herein and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, the above features and (but not limited to) technical features having similar functions disclosed in the present disclosure are mutually replaced to form the technical solution.

Claims (13)

1. A daemon process fault detection method is applied to an embedded system, and comprises the following steps:
after the daemon process is started, starting a pre-established process monitoring file;
acquiring a file descriptor of the process monitoring file, and writing the file descriptor into the daemon process;
if a closing instruction for closing the file descriptor is received, determining that the daemon process breaks down;
determining a target debugging strategy and a target recovery strategy from each pre-stored debugging strategy and recovery strategy, and executing debugging operation corresponding to the target debugging strategy;
closing the file descriptor when the debugging operation is completed;
and closing the daemon process, and executing recovery operation corresponding to the target recovery strategy.
2. The method of claim 1, wherein determining a target debug policy and a target restore policy from pre-stored debug policies and restore policies comprises:
acquiring a process identifier of the daemon process;
determining a debugging strategy and a recovery strategy corresponding to the process identifier from each pre-stored debugging strategy and recovery strategy;
and determining the debugging strategy corresponding to the process identifier as a target debugging strategy, and determining the recovery strategy corresponding to the process identifier as a target recovery strategy.
3. The method of claim 2, wherein the storing of each recovery policy comprises:
acquiring a configuration file of each daemon process in the embedded system;
constructing a process dependency relationship graph based on each configuration file; the process dependency relationship graph is used for indicating the dependency relationship among all the daemon processes;
determining a recovery strategy of each daemon process based on each configuration file and the process dependency relationship graph;
the recovery policy for each daemon process is stored.
4. The method of claim 3, wherein determining the recovery policy for each daemon process based on the respective configuration file and the process dependency graph comprises:
executing a first operation on each node with an in-degree of 0 in the process dependency relationship graph; the first operation includes: if the recovery strategy configuration information in the configuration file corresponding to the node is restart process or restart system, determining the recovery strategy configuration information as the recovery strategy of the daemon process corresponding to the node; if the recovery strategy configuration information in the configuration file corresponding to the node is neither restart process nor restart system, traversing the process dependency relationship graph from the node as the initial node using a preset traversal strategy to obtain a restart sequence for the daemon process and the daemon processes that have a dependency relationship with it, and determining the restart sequence, together with the dependency relationship and the recovery strategy configuration information included in the configuration file, as the recovery strategy of the daemon process corresponding to the node; wherein the nodes in the process dependency relationship graph represent daemon processes;
deleting the nodes with an in-degree of 0 from the process dependency relationship graph to obtain an updated process dependency relationship graph, and returning, based on the updated process dependency relationship graph, to the step of executing the first operation on each node with an in-degree of 0, until no node remains in the process dependency relationship graph.
5. The method of claim 4, wherein the performing the recovery operation corresponding to the target recovery policy comprises:
restarting the embedded system, or restarting the daemon process and the daemon process which has a dependency relationship with the daemon process based on a restart sequence in the recovery strategy.
6. The method of claim 1, wherein the performing the debug operation corresponding to the target debug policy comprises:
if the target debugging strategy is to acquire a call stack or acquire memory dump information, acquiring process fault information of the daemon process;
and analyzing the process fault information to obtain call stack or memory dump information included in the process fault information.
7. The method of claim 1, wherein the initiating a pre-created process monitoring file comprises:
performing initialization operation on an initialization function in a link library linked in advance by the daemon process; and calling a preset first function to start a pre-created process monitoring file in the process of executing initialization operation on the initialization function.
8. The method of claim 1, wherein the creation of the process monitoring file comprises:
determining the creation position of a process monitoring file to be created according to a preset creation path;
and creating a process monitoring file in the creating position.
9. A daemon process fault detection device is applied to an embedded system, and comprises the following components:
the starting unit is used for starting the pre-established process monitoring file after the daemon process is started;
the first acquisition unit is used for acquiring a file descriptor of the process monitoring file and writing the file descriptor into the daemon process;
the first determining unit is used for determining that the daemon process breaks down if a closing instruction for closing the file descriptor is received;
the second acquisition unit is used for determining a target debugging strategy and a target recovery strategy from each pre-stored debugging strategy and recovery strategy and executing debugging operation corresponding to the target debugging strategy;
a first closing unit, configured to close the file descriptor when the execution of the debugging operation is completed;
and the second closing unit is used for closing the daemon process and executing the recovery operation corresponding to the recovery strategy.
10. A daemon fault detection system, comprising:
the system comprises a daemon process, a fault detection module and a fault recovery module;
the fault detection module is used for starting a pre-established process monitoring file after the daemon process is started, acquiring a file descriptor of the process monitoring file, writing the file descriptor into the daemon process, determining that the daemon process has a fault if a closing instruction for closing the file descriptor is received, acquiring a process identifier of the daemon process, sending the process identifier to the fault recovery module, and closing the daemon process after a notification message sent by the fault recovery module is received;
the fault recovery module is configured to receive the process identifier sent by the fault detection module, determine a target debugging strategy and a target recovery strategy from the pre-stored debugging strategies and recovery strategies based on the process identifier, execute the debugging operation corresponding to the target debugging strategy, send a notification message to the fault detection module when the debugging operation is completed, and execute the recovery operation corresponding to the target recovery strategy after it is monitored that the daemon process is closed; wherein the notification message is used to notify that the debugging operation has been completed.
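The ordering of the exchange in claim 10 (detect, debug, notify, close, recover) can be made concrete with a small in-process simulation. This is a sketch under stated assumptions: the class and method names are hypothetical, strategies are represented as plain strings, and the detection module reports the daemon's closure directly instead of the recovery module monitoring for it.

```python
class FaultRecoveryModule:
    """Simulated recovery module; all names here are illustrative."""

    def __init__(self, strategies, log):
        self.strategies = strategies  # pid -> (debug strategy, recovery strategy)
        self.log = log
        self._pending = {}

    def on_fault(self, pid, notify):
        # Select both strategies by process identifier, debug first.
        debug_name, recovery_name = self.strategies[pid]
        self.log.append(f"debug:{debug_name}")
        self._pending[pid] = recovery_name
        notify(pid)  # notification message: debugging has completed

    def on_daemon_closed(self, pid):
        # Recovery runs only after the daemon has been closed.
        self.log.append(f"recover:{self._pending.pop(pid)}")


class FaultDetectionModule:
    """Simulated detection module; all names here are illustrative."""

    def __init__(self, recovery, log):
        self.recovery = recovery
        self.log = log

    def on_descriptor_close_request(self, pid):
        # A request to close the monitored descriptor marks the fault.
        self.log.append("fault-detected")
        self.recovery.on_fault(pid, self._on_debug_done)

    def _on_debug_done(self, pid):
        # Close the daemon only after the notification message arrives.
        self.log.append("close-daemon")
        self.recovery.on_daemon_closed(pid)


def simulate(pid=1234):
    log = []
    recovery = FaultRecoveryModule({pid: ("dump-stack", "restart")}, log)
    FaultDetectionModule(recovery, log).on_descriptor_close_request(pid)
    return log
```

Running `simulate()` records the four events in the order the claim requires, which shows why the notification message is needed: without it the detection module could close the daemon before its state had been captured for debugging.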
11. The system of claim 10, further comprising:
a recovery strategy refinement module and a recovery strategy database;
the recovery strategy refinement module is used for acquiring the configuration files of all daemon processes in the embedded system, constructing a process dependency relationship graph based on the configuration files, determining a recovery strategy for each daemon process based on the configuration files and the process dependency relationship graph, and storing the recovery strategy of each daemon process in the recovery strategy database, wherein the process dependency relationship graph indicates the dependency relationships among the daemon processes.
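One way the refinement step of claim 11 could work is sketched below. The claim does not say how a recovery strategy is derived from the graph, so the rule used here is an assumption: when a daemon fails, restart it together with every daemon that depends on it, dependencies first. The configuration format (a `depends_on` list per daemon) and all function names are likewise hypothetical. The sketch uses `graphlib`, available in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter


def build_dependency_graph(configs):
    """Map each daemon to the set of daemons it depends on."""
    return {name: set(cfg.get("depends_on", [])) for name, cfg in configs.items()}


def restart_order(graph, failed):
    """List the failed daemon and its transitive dependents, dependencies first."""
    affected = {failed}
    changed = True
    while changed:  # grow the set until no further dependents are found
        changed = False
        for name, deps in graph.items():
            if name not in affected and deps & affected:
                affected.add(name)
                changed = True
    # Restrict the graph to affected daemons and sort it topologically.
    subgraph = {name: graph[name] & affected for name in affected}
    return list(TopologicalSorter(subgraph).static_order())


def refine_recovery_strategies(configs):
    """Build the in-memory equivalent of a recovery strategy database."""
    graph = build_dependency_graph(configs)
    return {name: restart_order(graph, name) for name in graph}


# Hypothetical configuration data for four daemons.
EXAMPLE_CONFIGS = {
    "netd": {},
    "logd": {},
    "webui": {"depends_on": ["netd", "logd"]},
    "updater": {"depends_on": ["netd"]},
}
```

With `EXAMPLE_CONFIGS`, a failure of `logd` yields a strategy that restarts `logd` and then `webui`, while a failure of `webui` restarts only `webui`, since nothing depends on it.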
12. A storage medium storing a set of instructions, wherein the set of instructions, when executed by a processor, implements the daemon process fault detection method according to any one of claims 1 to 8.
13. An electronic device, comprising:
a memory for storing at least one set of instructions;
a processor for executing the instruction set stored in the memory, and implementing the daemon fault detection method according to any one of claims 1 to 8 by executing the instruction set.
CN202111175438.XA 2021-09-24 2021-10-09 Daemon process fault detection method and device, storage medium and electronic equipment Pending CN113901443A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021111217801 2021-09-24
CN202111121780 2021-09-24

Publications (1)

Publication Number Publication Date
CN113901443A true CN113901443A (en) 2022-01-07

Family

ID=79190740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111175438.XA Pending CN113901443A (en) 2021-09-24 2021-10-09 Daemon process fault detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113901443A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 332, 3/F, Building 102, 28 Xinjiekouwai Street, Xicheng District, Beijing 100088

Applicant after: QAX Technology Group Inc.

Applicant after: Qianxin Wangshen information technology (Beijing) Co.,Ltd.

Address before: Room 332, 3/F, Building 102, 28 Xinjiekouwai Street, Xicheng District, Beijing 100088

Applicant before: QAX Technology Group Inc.

Country or region before: China

Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.