CN110502369B - Method and device for recovering equipment crash and storage medium - Google Patents

Info

Publication number
CN110502369B
Authority
CN
China
Prior art keywords
reset
module
processor
watchdog
feeding signal
Legal status
Active
Application number
CN201910769103.7A
Other languages
Chinese (zh)
Other versions
CN110502369A (en)
Inventor
贾克明
Current Assignee
Guangzhou Jingxin Communication Technology Co ltd
Original Assignee
Comba Network Systems Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/24 Resetting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method, an apparatus, and a storage medium for recovering from a device crash, which improve the success rate of crash recovery. If a first watchdog timing module times out without receiving a first dog feeding signal, a processor in the device is instructed to reset; the first watchdog timing module is configured to receive the first dog feeding signal sent by an application layer in the device. If a second watchdog timing module times out without receiving a second dog feeding signal, the whole board in the device is instructed to reset; the second watchdog timing module is configured to receive the second dog feeding signal sent by a kernel layer in the device.

Description

Method and device for recovering equipment crash and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method, an apparatus, and a storage medium for recovering from a device crash.
Background
A processor is a common component in a communication device. To guard against failures such as a crash of the communication device, a watchdog is often provided in the device. During normal operation, the processor sends a dog feeding signal to the watchdog at a fixed period, and the watchdog restarts its timing each time it receives the signal. If the watchdog does not receive the dog feeding signal within a preset time length, it triggers a reset signal and the processor is reset.
However, if other parts of the device are faulty, resetting only the processor is unlikely to restore the device.
Disclosure of Invention
The embodiment of the application provides a method and a device for recovering equipment crash and a storage medium, which are used for improving the success rate of recovering the equipment crash.
In a first aspect, a method for recovering from a device crash is provided, which includes:
if a first watchdog timing module times out without receiving a first dog feeding signal, instructing a processor in the device to reset, where the first watchdog timing module is configured to receive the first dog feeding signal sent by an application layer in the device;
if a second watchdog timing module times out without receiving a second dog feeding signal, instructing the whole board in the device to reset, where the second watchdog timing module is configured to receive the second dog feeding signal sent by a kernel layer in the device.
In the embodiment of the application, the situations that require resetting the device are divided into two types, and a different reset operation is performed for each type. For example, a first dog feeding signal from the application layer and a second dog feeding signal from the kernel layer in the device are received; the processor is reset when the application layer fails, and the whole board is reset when the kernel layer fails. Compared with the prior-art approach of only resetting the processor, this can improve the success rate of crash recovery after the reset. In addition, because resetting the processor takes less time than resetting the whole board, choosing between the two resets according to the fault type makes crash recovery more efficient than always resetting the whole board.
In one possible embodiment, the method further comprises:
and when the time taken to reset the processor exceeds a first time length, or the time taken to reset the whole board in the device exceeds a second time length, instructing the device to perform power-on recovery processing, that is, to be powered off and then powered on again.
In the embodiment of the application, if the processor reset or the whole-board reset does not succeed, the device can be powered on again to recover, which further improves the success rate of crash recovery.
In one possible embodiment, the method further comprises:
after the processor in the device is reset, restarting timing by the first watchdog timing module when a first dog feeding signal of the application layer in the device is received;
and after the whole board in the device is reset, restarting timing by the second watchdog timing module when a second dog feeding signal of the kernel layer in the device is received.
In the embodiment of the application, once the processor has been reset successfully, the first watchdog timing module is controlled to restart timing; once the whole board has been reset successfully, the second watchdog timing module is controlled to restart timing. This ensures that both watchdog timing modules are in a normal working state.
In one possible embodiment, the processor is reset through a reset pin of the processor, and the whole board is reset through the reset pins of the modules in the whole board.
In the embodiment of the application, both the processor reset and the whole-board reset are performed by driving the reset pins of the relevant modules, so the reset mechanism is simple.
In a second aspect, an apparatus for recovering from a crash of a device is provided, including:
the first watchdog timing module is used for receiving a first dog feeding signal of an application layer in the equipment;
the second watchdog timing module is used for receiving a second dog feeding signal of a kernel layer in the equipment;
the reset control module is used for detecting whether the first watchdog timing module and the second watchdog timing module are overtime or not, and controlling the processor in the equipment to reset if the first watchdog timing module is overtime; if the second watchdog timing module is overtime, controlling the whole board in the equipment to reset; wherein the entire board includes the processor and a clock module in the device.
In one possible embodiment, the apparatus further comprises a power control module, wherein:
and the power supply control module is used for controlling the equipment to execute the power-on recovery processing again when the time length for resetting the processor exceeds a first time length or the time length for resetting the whole board exceeds a second time length.
In a possible implementation manner, the reset control module is specifically configured to control a reset pin of the processor to reset or control a reset pin of each module in the entire board to reset.
In a possible implementation manner, the first watchdog timing module is further configured to restart timing when a first dog feeding signal of an application layer in the device is received after the processor in the device is reset;
the second watchdog timing module is further configured to restart timing when a second dog feeding signal of a kernel layer in the device is received after the whole board in the device is reset.
In a possible embodiment, the apparatus further comprises: a first dog feeding signal generation module and a second dog feeding signal generation module, wherein:
the first dog feeding signal generating module is configured to detect the operating condition of the application layer in the device, generate a first dog feeding signal when the application layer operates normally, and periodically send the first dog feeding signal to the first watchdog timing module;
and the second dog feeding signal generating module is configured to detect the operating condition of the kernel layer in the device, generate a second dog feeding signal when the kernel layer operates normally, and periodically send the second dog feeding signal to the second watchdog timing module.
In a third aspect, an apparatus for recovering from a crash of a device is provided, including:
at least one processor, and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the method according to any one of the first aspect and possible embodiments by executing the instructions stored by the memory.
In a fourth aspect, a computer-readable storage medium is provided, which stores computer instructions that, when executed on a computer, cause the computer to perform the method according to any of the first aspect and possible embodiments.
Drawings
Fig. 1 is an application scenario diagram of a method for recovering from a device crash provided in an embodiment of the present application;
Fig. 2 is a first flowchart of the method for recovering from a device crash according to an embodiment of the present application;
Fig. 3 is a second flowchart of the method for recovering from a device crash according to an embodiment of the present application;
Fig. 4 is a first schematic structural diagram of an apparatus for recovering from a device crash according to an embodiment of the present application;
Fig. 5 is a second schematic structural diagram of an apparatus for recovering from a device crash according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the drawings and specific embodiments.
In order to facilitate those skilled in the art to more clearly understand the technical solutions in the present application, the terms related to the embodiments of the present application are explained below.
(1) A crash can be understood as a state in which the device does not respond to external instructions. A crash can have various causes, for example the processing resources of the device are exhausted, or the operating system inside the device encounters a runtime error.
(2) Crash recovery means taking measures, when the device has crashed, to bring it back to a normal working state in which it responds to external instructions.
In the prior art, a processor is generally paired with one watchdog timing module. When the processor crashes, the watchdog timing module no longer receives the dog feeding signal. If the watchdog timing module times out without receiving the dog feeding signal, the processor is considered to be in an abnormal state and is instructed to reset. However, other hardware in the device may also be abnormal, in which case resetting the processor alone cannot resolve the crash.
In view of this, an embodiment of the present application provides a method for recovering from a device crash. The device in the embodiment of the present application generally refers to a device with a processor, such as an embedded communication device (for example, a base station), a server, or a terminal device.
Referring to fig. 1, fig. 1 is an application scenario diagram of the method, where the application scenario includes a whole board 170, an application layer, a kernel layer, and an apparatus 100 for recovering from a device crash. The whole board 170 includes the processor 160 as well as other modules, such as a clock module and a flash memory module.
The apparatus 100 for recovering the device crash may be implemented by a software module, a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC). Generally, the apparatus 100 for recovering from device crash is disposed in the device and belongs to a part of the device, for example, the apparatus 100 for recovering from device crash is coupled in the device.
Specifically, fig. 1 further includes a schematic structural diagram of the apparatus 100 for recovering from a dead halt of the device. The apparatus 100 for recovering from a crash of a device includes a first watchdog timing module 110, a second watchdog timing module 120, a reset control module 150, and a power control module 180.
The first watchdog timing module 110 receives a first dog feeding signal sent by the application layer, and the second watchdog timing module 120 receives a second dog feeding signal sent by the kernel layer. If the first watchdog timing module 110 times out without receiving the first dog feeding signal, the reset control module 150 controls the processor 160 to reset. If the second watchdog timing module 120 times out without receiving the second dog feeding signal, the reset control module 150 controls the whole board 170 to reset. The reset control module 150 is, for example, a watchdog circuit implemented based on a Complex Programmable Logic Device (CPLD).
On the basis of the application scenario discussed in fig. 1, a method for recovering the device crash involved in the embodiment of the present application is described below, and the method is performed by the apparatus 100 for recovering the device crash discussed earlier. Referring to fig. 2, the method includes:
Step 201, if the first watchdog timing module 110 times out without receiving the first dog feeding signal, the processor in the device is instructed to reset;
Step 202, if the second watchdog timing module 120 times out without receiving the second dog feeding signal, the whole board in the device is instructed to reset.
Specifically, under normal operating conditions, the application layer of the device sends a first dog feeding signal to the first watchdog timing module 110. The first watchdog timing module 110 is configured with a first timeout period; if the first watchdog timing module 110 has still not received the first dog feeding signal when the first timeout period expires, the reset control module 150 instructs the processor 160 to reset. Similarly, when the kernel layer of the device works normally, it sends a second dog feeding signal to the second watchdog timing module 120. The second watchdog timing module 120 is configured with a second timeout period; if the second dog feeding signal has still not been received when the second timeout period expires, the reset control module 150 instructs the whole board 170 to reset.
As an embodiment, the first timeout period is less than the second timeout period. Because the load on the kernel layer is generally heavier, setting the second timeout period to be longer allows the kernel layer to feed the watchdog less frequently, which reduces the burden on the kernel layer.
It should be noted that the execution order of step 201 and step 202 may be arbitrary, and the execution order of both is not limited herein.
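The decision made in steps 201 and 202 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `WatchdogTimer`, `ResetAction`, and `decide_reset` are hypothetical, time is passed in explicitly as seconds, and the rule that a whole-board reset takes priority when both timers have expired is an assumption (the text leaves the order of the two checks open).

```python
from dataclasses import dataclass
from enum import Enum


class ResetAction(Enum):
    NONE = "none"
    RESET_PROCESSOR = "reset_processor"
    RESET_WHOLE_BOARD = "reset_whole_board"


@dataclass
class WatchdogTimer:
    """Hypothetical model of a watchdog timing module."""
    timeout_s: float
    last_fed_s: float = 0.0

    def feed(self, now_s: float) -> None:
        # Receiving a dog feeding signal restarts the timing.
        self.last_fed_s = now_s

    def timed_out(self, now_s: float) -> bool:
        return now_s - self.last_fed_s > self.timeout_s


def decide_reset(app_wdt: WatchdogTimer, kernel_wdt: WatchdogTimer,
                 now_s: float) -> ResetAction:
    # Assumption: if both timers have expired, the wider reset wins,
    # because the whole board includes the processor.
    if kernel_wdt.timed_out(now_s):
        return ResetAction.RESET_WHOLE_BOARD   # step 202
    if app_wdt.timed_out(now_s):
        return ResetAction.RESET_PROCESSOR     # step 201
    return ResetAction.NONE
```

With a 2 s application-layer timeout and a 5 s kernel-layer timeout (illustrative values only), a missing application-layer feed leads to a processor reset, while a missing kernel-layer feed leads to a whole-board reset.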
In the method for recovering from the device crash discussed in fig. 2, two watchdog timing modules are provided. One watchdog timing module is responsible for receiving the dog feeding signal from the application layer 130; if that signal is not received normally, the application layer is abnormal and the processor 160 is reset. The other watchdog timing module is responsible for receiving the dog feeding signal from the kernel layer; if that signal is not received normally, the kernel layer is abnormal and the whole board 170 is reset. By selectively resetting either the processor 160 or the whole board 170, the embodiment of the application improves the success rate of crash recovery.
The following describes in detail a process of the method for recovering from a device crash in the embodiment of the present application with reference to fig. 3.
Referring to fig. 3, after the device is powered on again, the reset control module 150 executes step 301 to start the first watchdog timing module 110 and the second watchdog timing module 120.
After the first watchdog timing module 110 and the second watchdog timing module 120 are started, the application layer sends a first dog feeding signal to the first watchdog timing module 110 at a first period Ta, and the kernel layer sends a second dog feeding signal to the second watchdog timing module 120 at a second period Tb.
Specifically, a first software module is arranged in the application layer, and a second software module is arranged in the kernel layer. The first software module is configured to detect the current operating condition of the application layer, and if the operating condition of the application layer is good, the first software module sends a first dog feeding signal to the first watchdog timing module 110. The second software module is configured to detect the current operating condition of the kernel layer, and if the operating condition of the kernel layer is good, the second software module sends a second dog feeding signal to the second watchdog timing module 120.
Both the first software module and the second software module run in the processor 160. The first software module communicates with the first watchdog timing module 110, which is equivalent to the processor 160 sending the first dog feeding signal to the first watchdog timing module 110. The second software module communicates with the second watchdog timing module 120, which is equivalent to the processor 160 sending the second dog feeding signal to the second watchdog timing module 120.
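The behaviour of the first and second software modules (check the layer, then feed only if it is healthy) might be sketched as follows. The class name `FeedSignalGenerator`, the `is_healthy` callback, and the polling style are all assumptions made for illustration; the text does not say how the operating condition of a layer is detected.

```python
from typing import Callable


class FeedSignalGenerator:
    """Hypothetical feeder: sends a dog feeding signal once per period,
    but only while the monitored layer reports a good operating condition."""

    def __init__(self, is_healthy: Callable[[], bool],
                 feed: Callable[[float], None], period_s: float) -> None:
        self.is_healthy = is_healthy
        self.feed = feed
        self.period_s = period_s      # Ta for the application layer, Tb for the kernel layer
        self.next_due_s = 0.0

    def tick(self, now_s: float) -> bool:
        """Call regularly; returns True if a feeding signal was sent."""
        if now_s < self.next_due_s:
            return False
        self.next_due_s = now_s + self.period_s
        if not self.is_healthy():
            # Withhold the signal so that the watchdog timing module times out.
            return False
        self.feed(now_s)
        return True
```

One instance would be created per layer, each wired to its own watchdog timing module. Withholding the feed is what lets the corresponding watchdog expire when the layer is unhealthy.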
The processor 160 may send each dog feeding signal to the corresponding watchdog timing module through a General-Purpose Input/Output (GPIO) interface, a Serial Peripheral Interface (SPI), an Inter-Integrated Circuit (I2C) bus interface, a Universal Asynchronous Receiver/Transmitter (UART) interface, or a Controller Area Network bus (CAN bus).
For example, the processor 160 sends the first dog feeding signal to the reset control module 150 through GPIO1, and sends the second dog feeding signal to the reset control module 150 through GPIO2.
The first watchdog timing module 110 cooperates with the reset control module 150 to implement a first process, which resets the processor 160, and the second watchdog timing module 120 cooperates with the reset control module 150 to implement a second process, which resets the whole board 170. The two processes are described separately below.
In the first process, the first watchdog timer module 110 cooperates with the reset control module 150 to implement the process of resetting the processor 160:
After the first watchdog timing module 110 is started, step 302 is performed, in which the first watchdog timing module 110 restarts timing. The reset control module 150 then performs step 303 to determine whether the first watchdog timing module 110 receives the first dog feeding signal within a first timeout period T1, where the first timeout period T1 is greater than the first period Ta.
If it is determined that the first dog feeding signal is received within the first timeout period T1, indicating that the application layer operating status is normal, the reset control module 150 executes step 302, i.e., controls the first watchdog timer module 110 to restart timing. If it is determined that the first dog feeding signal is not received within the first timeout period T1, it indicates that the operating status of the application layer may be abnormal, and in general, the application layer abnormality is caused by a fault of the processor 160, so the reset control module 150 in this embodiment of the present application performs step 304, i.e., resets the processor 160.
Specifically, there are various ways to reset the processor 160.
Mode one:
The processor 160 is provided with a reset pin, for example CPU_RST, and the reset control module 150 sends a reset signal to this reset pin to reset the processor 160.
In the first mode, by resetting the reset pin of the processor 160, the processor 160 can be reset more comprehensively, so as to improve the success rate of the crash recovery of the device.
Mode two:
Processes running in the processor 160 are shut down under the control of the reset control module 150 to implement the reset of the processor 160.
Specifically, the failure of the processor 160 is caused in large part by the excessive load, and the reset control module 150 controls to shut down part or all of the processes in the processor 160, thereby resetting the processor 160.
In the second mode, by shutting down some or all processes in the processor 160 for resetting, the influence on other functions of the processor 160 can be relatively reduced.
The reset control module 150 starts timing when it begins resetting the processor 160, and the timed duration indicates how long the reset of the processor 160 has taken. If this reset duration exceeds the first duration, that is, the first watchdog timing module 110 has still not received the first dog feeding signal once the first duration has elapsed since the reset of the processor 160 began, the reset of the processor 160 cannot be completed. In this case, the reset control module 150 controls the power control module 180 to perform power-on recovery processing on the device. Power-on recovery processing means that the power supply of the device is first disconnected and the device is then powered on again. After the device is powered on again, the process returns to step 301.
If the first watchdog timing module 110 receives the first dog feeding signal normally before the reset duration exceeds the first duration, the reset of the processor 160 has succeeded, and the process goes to step 302.
As an embodiment, in order to record the reset duration, the apparatus 100 for recovering from the device crash may be provided with a separate third timer. The reset control module 150 starts the third timer when it resets the processor 160, and the power control module 180 performs the power-on recovery processing on the device once the third timer exceeds the first duration.
To avoid the situation that the second watchdog timer module 120 times out during the process of resetting the processor 160, in the embodiment of the present application, the reset control module 150 stops the first watchdog timer module and the second watchdog timer module when the processor 160 is reset.
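The first process (monitor, reset the processor, escalate to power-on recovery if the reset takes too long) can be read as a small state machine. The following Python sketch is one possible reading under stated assumptions: the class `ProcessorResetSupervisor`, its state names, and the polling interface are hypothetical, and a single time mark stands in for both the first watchdog timing module and the third timer.

```python
from enum import Enum, auto


class State(Enum):
    MONITORING = auto()    # steps 302/303: timing, waiting for the feed
    RESETTING = auto()     # step 304 issued, waiting for the first feed
    POWER_CYCLE = auto()   # power-on recovery requested


class ProcessorResetSupervisor:
    def __init__(self, timeout_s: float, max_reset_s: float) -> None:
        self.timeout_s = timeout_s       # first timeout period T1
        self.max_reset_s = max_reset_s   # first duration
        self.state = State.MONITORING
        self.mark_s = 0.0                # start of the current timing interval

    def on_feed(self, now_s: float) -> None:
        # A first dog feeding signal restarts timing; during a reset it
        # also shows that the reset succeeded (back to step 302).
        if self.state is not State.POWER_CYCLE:
            self.state = State.MONITORING
            self.mark_s = now_s

    def poll(self, now_s: float) -> State:
        elapsed = now_s - self.mark_s
        if self.state is State.MONITORING and elapsed > self.timeout_s:
            self.state = State.RESETTING     # reset the processor
            self.mark_s = now_s              # start timing the reset
        elif self.state is State.RESETTING and elapsed > self.max_reset_s:
            self.state = State.POWER_CYCLE   # reset did not complete in time
        return self.state

    def on_power_restored(self, now_s: float) -> None:
        # After the device is powered on again, return to step 301.
        self.state = State.MONITORING
        self.mark_s = now_s
```

While the supervisor is in `RESETTING`, the watchdog timeout is not evaluated, which corresponds to the watchdog timing modules being stopped during the reset.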
In the second process, the second watchdog timing module 120 cooperates with the reset control module 150 to implement the process of resetting the whole board 170:
after the second watchdog timer module 120 is started, step 307 is executed, and the second watchdog timer module 120 restarts timing. After restarting the timing, the reset control module 150 executes step 308 to determine whether the second watchdog timer module 120 received the second watchdog signal within a second timeout period T2. The second timeout period T2 is greater than the second period Tb.
If it is determined that the second dog feeding signal is received within the second timeout period T2, the operating state of the kernel layer is normal, and step 307 is executed again, that is, the second watchdog timing module 120 restarts timing. If the second dog feeding signal is not received within the second timeout period T2, the operating state of the kernel layer may be abnormal. The causes of a kernel-layer abnormality are relatively complex, and it is difficult to resolve the device crash by resetting only the processor 160, so the reset control module 150 executes step 309 to reset the whole board 170.
Specifically, there are various ways to reset the whole board 170, for example, a reset pin is disposed on each module in the whole board 170, and the reset control module 150 inputs a reset signal to the reset pin of each module, so as to reset each module in the whole board 170.
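The difference in scope between the two pin-based resets can be shown with a short Python sketch. `ResetPin` is a hypothetical stand-in for a hardware reset line. `CPU_RST` is the pin name given in the text, while `CLK_RST` and `FLASH_RST` are invented names for the reset pins of the clock module and the flash memory module.

```python
class ResetPin:
    """Hypothetical stand-in for the reset pin of one module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pulse_count = 0

    def pulse(self) -> None:
        # On real hardware this would drive the reset line; here it only counts.
        self.pulse_count += 1


def reset_processor(pins: dict) -> None:
    # Processor reset: only the reset pin of the processor is driven.
    pins["CPU_RST"].pulse()


def reset_whole_board(pins: dict) -> None:
    # Whole-board reset: the reset pin of every module is driven.
    for pin in pins.values():
        pin.pulse()


# CLK_RST and FLASH_RST are assumed names, not taken from the text.
board_pins = {name: ResetPin(name) for name in ("CPU_RST", "CLK_RST", "FLASH_RST")}
reset_processor(board_pins)
reset_whole_board(board_pins)
```

After both calls, the processor pin has been pulsed twice and each other pin once, which reflects that the whole board includes the processor.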
The reset control module 150 starts timing when it begins resetting the whole board 170. If the reset duration of the whole board 170 exceeds a second duration, that is, the second watchdog timing module 120 has still not received the second dog feeding signal once the second duration has elapsed since the reset of the whole board 170 began, the reset of the whole board 170 cannot be completed. In this case, the reset control module 150 controls the power control module 180 to perform power-on recovery processing on the device. The power-on recovery processing is as discussed above and is not described again here. After the device is powered on again, the process returns to step 301.
If the second watchdog timing module 120 receives the second dog feeding signal normally before the reset duration exceeds the second duration, the reset of the whole board 170 has succeeded, and the process goes to step 307.
In one embodiment, the reset control module 150 starts a fourth timer when it resets the whole board 170, and the power control module 180 performs power-on recovery processing on the device once the fourth timer exceeds the second duration.
In one embodiment, the first duration is less than the second duration. Since the time period required to reset the entire board 170 is longer, the second time period is set longer than the first time period to allow sufficient time to reset the entire board 170.
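The timing relations stated in the description (T1 greater than Ta, T2 greater than Tb, and, in some embodiments, T1 less than T2 and the first duration less than the second duration) can be collected into a configuration check. The following Python sketch is illustrative only: the class `WatchdogTiming` and its field names are hypothetical, and the text gives no concrete values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WatchdogTiming:
    ta_s: float               # first period Ta (application-layer feeding period)
    tb_s: float               # second period Tb (kernel-layer feeding period)
    t1_s: float               # first timeout period T1
    t2_s: float               # second timeout period T2
    first_duration_s: float   # limit on the processor reset duration
    second_duration_s: float  # limit on the whole-board reset duration

    def violations(self) -> List[str]:
        checks = [
            (self.t1_s > self.ta_s, "T1 must be greater than Ta"),
            (self.t2_s > self.tb_s, "T2 must be greater than Tb"),
            # The next two relations hold only in some embodiments.
            (self.t1_s < self.t2_s, "T1 should be less than T2"),
            (self.first_duration_s < self.second_duration_s,
             "first duration should be less than second duration"),
        ]
        return [message for ok, message in checks if not ok]
```

A configuration that yields an empty list satisfies all four relations. Whether the last two are treated as hard requirements or as recommendations is a design choice that the text leaves to the embodiment.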
In order to avoid the first watchdog timing module 110 being timed out during the process of resetting the entire board 170, the reset control module 150 controls to stop the first watchdog timing module 110 and the second watchdog timing module 120 when the entire board 170 is reset.
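Stopping both watchdog timing modules during a reset prevents a spurious timeout while the software that feeds them is restarting. A minimal Python sketch follows; the class `StoppableWatchdog` and the helper `begin_reset` are assumed names, and a stopped timer is simply modelled as one that never reports a timeout.

```python
class StoppableWatchdog:
    """Hypothetical watchdog timing module that can be stopped and restarted."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.running = False
        self.last_fed_s = 0.0

    def start(self, now_s: float) -> None:
        self.running = True
        self.last_fed_s = now_s

    def stop(self) -> None:
        self.running = False

    def feed(self, now_s: float) -> None:
        if self.running:
            self.last_fed_s = now_s

    def timed_out(self, now_s: float) -> bool:
        # A stopped timer never reports a timeout.
        return self.running and now_s - self.last_fed_s > self.timeout_s


def begin_reset(watchdogs) -> None:
    # Stop every watchdog before the processor or the whole board is reset.
    for wdt in watchdogs:
        wdt.stop()
```

After the reset completes, the timers would be started again, so that monitoring resumes from a fresh timing interval.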
Based on the same inventive concept, an embodiment of the present application provides an apparatus 100 for recovering from a device crash. Continuing to refer to fig. 1, the apparatus 100 includes a first watchdog timing module 110, a second watchdog timing module 120, and a reset control module 150, where:
a first watchdog timing module 110, configured to receive a first dog feeding signal of an application layer in a device;
a second watchdog timing module 120, configured to receive a second dog feeding signal of a kernel layer in the device;
the reset control module 150 is configured to detect whether the first watchdog timing module and the second watchdog timing module are overtime, and control the processor in the device to reset if the first watchdog timing module is overtime; if the second watchdog timing module is overtime, controlling the whole board in the equipment to reset; wherein the entire board includes the processor and a clock module in the device.
It should be noted that, in the embodiment of the present application, the first watchdog timing module 110 and the second watchdog timing module 120 may be implemented by a software program, or may be implemented by a watchdog chip, and a specific implementation manner of the first watchdog timing module 110 and the second watchdog timing module 120 is not limited in the embodiment of the present application.
In one possible embodiment, referring to fig. 1, the apparatus 100 further includes a power control module 180, wherein:
and the reset control module 150 is configured to control the power control module 180 to perform a power-on recovery process on the device when a time duration for resetting the processor exceeds a first time duration or when a time duration for resetting the whole board exceeds a second time duration.
In one possible embodiment, please continue to refer to fig. 1, the reset control module 150 is specifically configured to control the reset pin of the processor 160 to reset or control the reset pins of the modules in the whole board 170 to reset.
In one possible embodiment, the first watchdog timing module 110 is further configured to restart timing when the first dog feeding signal of the application layer in the device is received after the processor in the device is reset;
the second watchdog timing module 120 is further configured to restart timing when the second dog feeding signal of the kernel layer in the device is received after the whole board in the device is reset.
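One way to read this behavior is that a watchdog timing module stops counting once a reset has been issued and starts counting again only when the first dog feeding signal arrives afterwards. The Python sketch below illustrates that reading; the class name `RearmingWatchdog`, its method names, and the idea that the timer is idle between the reset and the first feed are assumptions, not details stated in the source.

```python
from typing import Optional


class RearmingWatchdog:
    """Hypothetical watchdog that restarts timing on the first feed after a reset."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # None means the timer is idle, waiting for the first feed after a reset.
        self.deadline_s: Optional[float] = None

    def on_feed(self, now_s: float) -> None:
        """A dog feeding signal (re)starts timing from now_s."""
        self.deadline_s = now_s + self.timeout_s

    def on_reset_done(self) -> None:
        """After a reset, stop timing until the next feed arrives (assumption)."""
        self.deadline_s = None

    def timed_out(self, now_s: float) -> bool:
        """True only if the timer is running and its deadline has passed."""
        return self.deadline_s is not None and now_s > self.deadline_s
```

With this design a slow reboot does not immediately cause a second watchdog timeout; a reset that never completes would have to be caught by a separate limit on the reset duration.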
In one possible embodiment, referring to fig. 4, the apparatus 100 further includes a first dog feeding signal generating module 410 and a second dog feeding signal generating module 420, where:
the first dog feeding signal generating module 410 is configured to detect the running state of the application layer in the device, generate the first dog feeding signal when the application layer in the device is running normally, and periodically send the first dog feeding signal to the first watchdog timing module 110;
the second dog feeding signal generating module 420 is configured to detect the running state of the kernel layer in the device, generate the second dog feeding signal when the kernel layer in the device is running normally, and periodically send the second dog feeding signal to the second watchdog timing module 120.
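A dog feeding signal generating module can be pictured as a periodic task that feeds the watchdog only when a health check passes. The Python sketch below shows this pattern; `make_feeder`, the `is_healthy` check, and the `send_feed` callback are hypothetical names, and how the running state is actually detected is not specified in the source.

```python
from typing import Callable


def make_feeder(is_healthy: Callable[[], bool],
                send_feed: Callable[[], None]) -> Callable[[], bool]:
    """Build a tick function to be called once per feeding period.

    is_healthy: hypothetical check of the monitored layer's running state.
    send_feed:  hypothetical callback that delivers the dog feeding signal.
    """
    def tick() -> bool:
        # Feed only while the monitored layer is running normally; when it
        # is not, the missing feed lets the watchdog timing module time out.
        if is_healthy():
            send_feed()
            return True
        return False

    return tick
```

A scheduler (for example a timer thread, which is also an assumption here) would call `tick()` at an interval shorter than the corresponding watchdog timeout.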
It should be noted that the power control module 180, the first dog feeding signal generating module 410, and the second dog feeding signal generating module 420 in the embodiments of the present application are optional modules.
Based on the method for recovering from a device crash discussed above, an embodiment of the present application provides an apparatus 100 for recovering from a device crash. Referring to fig. 5, the apparatus 100 includes:
at least one processor 501, and
a memory 502 communicatively coupled to the at least one processor 501;
wherein the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 implements the method for recovering from a device crash discussed above by executing the instructions stored in the memory 502.
It should be noted that fig. 5 illustrates one processor 501, but the number of processors 501 is not limited in practice.
As an example, the processor 501 in fig. 5 may implement the functions of the first watchdog timing module 110, the second watchdog timing module 120 and the reset control module 150 in fig. 1.
As an example, the processor 501 and the memory 502 in fig. 5 may be coupled together or may be disposed independently of each other in the apparatus 100.
It should be noted that if the apparatus 100 for recovering from a device crash is coupled in the device to be processed, the processor 501 may be the same as or different from the processor 160 in the device discussed above. When the two are different, the processor 501 can be understood as the processor that executes the method for recovering from a device crash, while the processor 160 is the object on which the recovery processing is performed.
On the basis of the method for recovering from the device crash discussed above, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the method for recovering from the device crash as discussed above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A method for recovering from a crash of a device, comprising:
if a first watchdog timing module has not received a first dog feeding signal when a first timeout period is exceeded, instructing a processor in the device to reset, wherein the first watchdog timing module is configured to receive the first dog feeding signal sent by an application layer in the device;
if a second watchdog timing module has not received a second dog feeding signal when a second timeout period is exceeded, instructing the whole board in the device to reset, wherein the second watchdog timing module is configured to receive the second dog feeding signal sent by a kernel layer in the device, the second timeout period is greater than the first timeout period, and the whole board includes the processor and a clock module.
2. The method of claim 1, wherein the method further comprises:
if the time taken by the processor to perform the reset exceeds a first duration, or the time taken by the whole board to perform the reset exceeds a second duration, instructing the device to perform power-on recovery processing again.
3. The method of claim 1, wherein the method further comprises:
after the processor is reset, restarting timing by the first watchdog timing module upon receiving the first dog feeding signal of the application layer in the device;
after the whole board in the device is reset, restarting timing by the second watchdog timing module upon receiving the second dog feeding signal of the kernel layer in the device.
4. The method of any one of claims 1-3, wherein the processor is reset via a reset pin of the processor, and the whole board is reset via a reset pin of each module in the whole board, the modules comprising the processor and the clock module.
5. An apparatus for recovering from a crash of a device, comprising:
a first watchdog timing module, configured to receive a first dog feeding signal of an application layer in the device;
a second watchdog timing module, configured to receive a second dog feeding signal of a kernel layer in the device;
a reset control module, configured to detect whether the first watchdog timing module and the second watchdog timing module have timed out, to control the processor in the device to reset if the first watchdog timing module exceeds a first timeout period, and to control the whole board in the device to reset if the second watchdog timing module exceeds a second timeout period, wherein the second timeout period is greater than the first timeout period, and the whole board comprises the processor and a clock module.
6. The apparatus of claim 5, further comprising a power control module, wherein:
the power control module is configured to control the device to perform power-on recovery processing again when the time taken to reset the processor exceeds a first duration or the time taken to reset the whole board exceeds a second duration.
7. The apparatus of claim 5, wherein:
the reset control module is specifically configured to control a reset pin of the processor to reset, or to control a reset pin of each module in the whole board to reset, where the modules include the processor and the clock module.
8. The apparatus of any one of claims 5-7, wherein:
the first watchdog timing module is further configured to restart timing when a first dog feeding signal of an application layer in the device is received after the processor in the device is reset;
the second watchdog timing module is further configured to restart timing when a second dog feeding signal of a kernel layer in the device is received after the whole board in the device is reset.
9. The apparatus of any of claims 5-7, wherein the apparatus further comprises: a first dog feeding signal generation module and a second dog feeding signal generation module, wherein:
the first dog feeding signal generating module is configured to detect the running state of the application layer in the device, generate the first dog feeding signal when the application layer in the device is running normally, and periodically send the first dog feeding signal to the first watchdog timing module;
the second dog feeding signal generating module is configured to detect the running state of the kernel layer in the device, generate the second dog feeding signal when the kernel layer in the device is running normally, and periodically send the second dog feeding signal to the second watchdog timing module.
10. An apparatus for recovering from a crash of a device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1-4 by executing the instructions stored by the memory.
11. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-4.
CN201910769103.7A 2019-08-20 2019-08-20 Method and device for recovering equipment crash and storage medium Active CN110502369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910769103.7A CN110502369B (en) 2019-08-20 2019-08-20 Method and device for recovering equipment crash and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910769103.7A CN110502369B (en) 2019-08-20 2019-08-20 Method and device for recovering equipment crash and storage medium

Publications (2)

Publication Number Publication Date
CN110502369A (en) 2019-11-26
CN110502369B (en) 2022-03-25

Family

ID=68588813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910769103.7A Active CN110502369B (en) 2019-08-20 2019-08-20 Method and device for recovering equipment crash and storage medium

Country Status (1)

Country Link
CN (1) CN110502369B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948162A (en) * 2021-03-15 2021-06-11 南京能瑞自动化设备股份有限公司 Method, device and system for preventing single chip microcomputer from crashing and storage medium
CN113630281B (en) * 2021-10-14 2022-02-15 深圳华北工控股份有限公司 BYPASS control method, device, terminal and storage medium
CN116684520B (en) * 2022-09-30 2024-04-12 荣耀终端有限公司 Shutdown method, electronic equipment, storage medium and chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739305A (en) * 2010-02-09 2010-06-16 太仓市同维电子有限公司 Operating system kernel level real-time dongle monitoring device and monitoring method thereof
CN102736958A (en) * 2012-05-30 2012-10-17 深圳市共进电子股份有限公司 Watchdog realizing method and system for monitoring a plurality of process systems
CN103885847A (en) * 2014-02-08 2014-06-25 京信通信系统(中国)有限公司 Dog feeding method and device based on embedded system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003775B2 (en) * 2001-08-17 2006-02-21 Hewlett-Packard Development Company, L.P. Hardware implementation of an application-level watchdog timer
US9032258B2 (en) * 2012-09-14 2015-05-12 Infineon Technologies Ag Safety system challenge-and-response using modified watchdog timer
US20140201578A1 (en) * 2013-01-11 2014-07-17 Apple Inc. Multi-tier watchdog timer
KR101534974B1 (en) * 2013-12-19 2015-07-08 현대자동차주식회사 Micro core monitoring apparatus and method thereof
CN106844164A (en) * 2015-12-06 2017-06-13 上海协同科技股份有限公司 Device and its reset monitoring method with polylith intelligent chip
CN108304275A (en) * 2018-01-09 2018-07-20 福州瑞芯微电子股份有限公司 A kind of method and apparatus of detection Android system application layer exception
CN109710441A (en) * 2018-12-20 2019-05-03 北京东土军悦科技有限公司 A kind of software watchdog, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN110502369A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
CN110502369B (en) Method and device for recovering equipment crash and storage medium
CN100568191C (en) The full-range dog-feeding method of embedded system
CN111352338B (en) Dual-redundancy flight control computer and redundancy management method
CN107943603B (en) Running state detection method, detection circuit and electronic equipment
CN111782360A (en) Distributed task scheduling method and device
CN109361542B (en) Client fault processing method, device, system, terminal and server
CN111324494B (en) Processor control method, device and storage medium
CN114116280A (en) Interactive BMC self-recovery method, system, terminal and storage medium
US7428660B2 (en) Starting control method, duplex platform system, and information processor
CN110990328B (en) Method and system for reliable communication between dual processors of TBox
CN102637139A (en) Method, device and single panel for feeding watchdog by embedded system
CN110750374A (en) Watchdog circuit and control method thereof
CN101901168A (en) Watchdog resetting system and resetting method thereof
CN103178977A (en) Computer system and starting-up management method of same
CN101291201A (en) Heart beat information transmission system and method
CN101557307B (en) Dispatch automation system application state management method
KR101369430B1 (en) Apparatus and method for hang up management
CN115904793B (en) Memory transfer method, system and chip based on multi-core heterogeneous system
CN109684117B (en) Processor crash recovery method and device
CN111221683A (en) Double-flash hot backup method, system, terminal and storage medium for data center switch
CN107276832B (en) Method and device for improving communication reliability of PSU and system
TW201416855A (en) System power-on monitoring method and electronic apparatus
CN113938406B (en) Ethernet communication abnormity monitoring and processing method and system based on SOMEIP protocol
JP4126849B2 (en) Multi-CPU system monitoring method
CN116991637B (en) Operation control method and device of embedded system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200107

Address after: 510663 Shenzhou Road 10, Guangzhou Science City, Guangzhou economic and Technological Development Zone, Guangzhou, Guangdong

Applicant after: COMBA TELECOM SYSTEMS (CHINA) Ltd.

Address before: 510663 Shenzhou Road, Guangzhou Science City, Guangzhou economic and Technological Development Zone, Guangdong, 10

Applicant before: COMBA TELECOM SYSTEMS (CHINA) Ltd.

Applicant before: COMBA TELECOM SYSTEMS (GUANGZHOU) Ltd.

Applicant before: COMBA TELECOM TECHNOLOGY (GUANGZHOU) Ltd.

Applicant before: TIANJIN COMBA TELECOM SYSTEMS Ltd.

CB02 Change of applicant information

Address after: 510663 Shenzhou Road 10, Guangzhou Science City, Guangzhou economic and Technological Development Zone, Guangzhou, Guangdong

Applicant after: Jingxin Network System Co.,Ltd.

Address before: 510663 Shenzhou Road 10, Guangzhou Science City, Guangzhou economic and Technological Development Zone, Guangzhou, Guangdong

Applicant before: COMBA TELECOM SYSTEMS (CHINA) Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221130

Address after: 510663 Area A, 7/F, Building (7), No. 10, Shenzhou Road, Huangpu District, Guangzhou, Guangdong

Patentee after: Guangzhou Jingxin Communication Technology Co.,Ltd.

Address before: 510663 Shenzhou Road 10, Guangzhou Science City, Guangzhou economic and Technological Development Zone, Guangzhou, Guangdong

Patentee before: Jingxin Network System Co.,Ltd.
