CN108958989B - System fault recovery method and device - Google Patents

Info

Publication number
CN108958989B
Authority
CN
China
Prior art keywords
board card
board
system fault
information
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710417137.0A
Other languages
Chinese (zh)
Other versions
CN108958989A (en
Inventor
笪禹
卜弋天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd
Priority to CN201710417137.0A
Publication of CN108958989A
Application granted
Publication of CN108958989B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The embodiments of the present application provide a system fault recovery method and device, relating to the field of computer technologies. The method is applied to a first board card in an intelligent device and comprises: after determining that a system fault has occurred in a second board card, broadcasting fault information indicating that the second board card has a system fault, wherein the second board card is a board card of the intelligent device other than the first board card; upon receiving fault information, broadcast by board cards of the intelligent device other than the first board card, indicating that the second board card has a system fault, judging whether the first board card itself can perform system fault recovery on the second board card; and if so, performing system fault recovery on the second board card. Applying this scheme to system fault recovery simplifies the operation and improves fault recovery efficiency.

Description

System fault recovery method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for recovering a system failure.
Background
With the rapid development of hardware technology, more and more intelligent devices such as robots are entering people's lives. These intelligent devices typically include many functional modules. For example, a robot may include a human-machine interaction module, an image recognition module, a voice recognition module, a mechanical control module, a power management module, a motion control module, and the like. These functional modules are distributed across different board cards, and the software systems on the board cards may differ, for example an Android system, a Linux system, or a bare-metal system.
During operation of the intelligent device, the software system on a board card may fail because of user operation or other reasons, so that the intelligent device cannot operate normally. In the prior art, when this happens, maintenance personnel generally need to connect a PC or notebook computer to the faulty board card and perform system fault recovery on it with a special tool.
Therefore, when the system fault recovery is performed on the board card by applying the mode in the prior art, the operation is complex and the fault recovery efficiency is low.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for system fault recovery, so as to simplify operations performed on a board card during system fault recovery and improve fault recovery efficiency. The specific technical scheme is as follows:
A system fault recovery method, applied to a first board card in an intelligent device, comprises the following steps:
after determining that a system fault has occurred in a second board card, broadcasting fault information indicating that the second board card has a system fault, wherein the second board card is a board card of the intelligent device other than the first board card;
upon receiving fault information, broadcast by board cards of the intelligent device other than the first board card, indicating that the second board card has a system fault, judging whether the first board card itself can perform system fault recovery on the second board card;
and if so, performing system fault recovery on the second board card.
In an implementation manner of the present application, it is determined that the second board card has a system fault by:
determining a target bus from the buses of the intelligent device;
sending a fault detection message to the second board card through each target bus;
monitoring whether a fault detection response from the second board card is not received through each target bus;
if so, determining that the second board card has a system fault;
the step of broadcasting the fault information of the second board card with the system fault includes:
and broadcasting fault information of system faults of the second board card through each target bus.
In an implementation manner of the present application, the step of judging whether the first board card itself can perform system fault recovery on the second board card includes:
broadcasting its own self-recommendation information, and monitoring for self-recommendation information broadcast by other board cards;
when self-recommendation information broadcast by other board cards is monitored, judging, according to its own self-recommendation information and the received self-recommendation information, whether it can perform system fault recovery on the second board card;
and when no self-recommendation information broadcast by other board cards is monitored, determining that it can perform system fault recovery on the second board card.
In an implementation manner of the present application, before the step of broadcasting its own self-recommendation information, the method further includes:
acquiring its own hardware state information, and judging whether the hardware state information meets a first self-recommendation condition;
if the first self-recommendation condition is met, obtaining its own service scenario;
judging whether the service scenario meets a second self-recommendation condition;
and if the second self-recommendation condition is met, executing the step of broadcasting its own self-recommendation information.
In an implementation manner of the present application, before the step of performing system failure recovery on the second board, the method further includes:
sending system fault recovery prompt information;
acquiring response information of a user aiming at the system fault recovery prompt information;
and executing the step of performing system fault recovery on the second board card under the condition that the response information indicates that the user agrees to perform system fault recovery.
In an implementation manner of the present application, the step of performing system failure recovery on the second board includes:
turning off a power supply of the second board card;
controlling a system on the second board card to enter a fault recovery mode through an I/O pin connected between the board cards so as to enable the second board card to carry out system fault recovery;
Receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed;
determining whether the second board card completes system fault recovery according to the system fault recovery completion notification;
if so, the power supply of the second board card is turned off, and the power supply of the second board card is turned on again, so that the second board card is subjected to system restart.
In an implementation manner of the present application, the step of performing system failure recovery on the second board includes:
switching the starting equipment of the second board card to the backup equipment of the second board card by setting a hardware I/O state;
resetting the second board card so that: the second board card is reset and started, starting equipment is switched to the backup equipment after the hardware I/O state is read, and files for system fault recovery in the backup equipment are recovered to the main equipment of the second board card;
receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed;
determining whether the second board card completes system fault recovery according to the system fault recovery completion notification;
If so, clearing the hardware I/O state;
and sending a reset notice to the second board card so that the second board card is reset and started.
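As an illustration only, the backup-device recovery sequence above could be sketched as follows. The class `FakeBoardLink` and its methods (`set_boot_io`, `reset`, `wait_recovery_done`, `send_reset_notice`) are hypothetical stand-ins for the hardware I/O, reset line and notification channel, none of which are specified in the application. What to do when no completion notification arrives is also not specified; the sketch simply returns `False` and leaves the I/O state set, which is an assumption.

```python
class FakeBoardLink:
    """Hypothetical stand-in for the hardware link to the second board card."""

    def __init__(self, recovery_succeeds: bool = True) -> None:
        self.boot_select_backup = False
        self.log = []
        self._recovery_succeeds = recovery_succeeds

    def set_boot_io(self, backup: bool) -> None:
        # Models setting or clearing the hardware I/O state that selects the boot device.
        self.boot_select_backup = backup
        self.log.append("boot_io=backup" if backup else "boot_io=main")

    def reset(self) -> None:
        self.log.append("reset")

    def wait_recovery_done(self, timeout_s: float) -> bool:
        # Models waiting for the recovery completion notification.
        self.log.append("wait_done")
        return self._recovery_succeeds

    def send_reset_notice(self) -> None:
        self.log.append("reset_notice")


def recover_via_backup_device(link, timeout_s: float = 60.0) -> bool:
    """Run the backup-device recovery sequence; return True on success."""
    link.set_boot_io(backup=True)   # switch the boot device to the backup device
    link.reset()                    # second board boots from backup and restores the main device
    if not link.wait_recovery_done(timeout_s):
        return False                # assumption: failure handling is left to the caller
    link.set_boot_io(backup=False)  # clear the hardware I/O state
    link.send_reset_notice()        # second board resets and starts from the main device
    return True
```

With `link = FakeBoardLink()`, calling `recover_via_backup_device(link)` records the five steps in `link.log` in the order listed in the text.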
In an implementation manner of the present application, the step of performing system failure recovery on the second board includes:
switching the starting mode of the second board card to a preset upgrading mode by setting a hardware I/O state;
resetting the second board card so that: the second board card is reset and started and enters the preset upgrading mode after the hardware I/O state is read;
upgrading the second board card according to the preset upgrading mode;
after the upgrading of the second board card is determined to be completed, clearing the hardware I/O state;
and sending a reset notice to the second board card so that the second board card is reset and started.
In an implementation manner of the present application, the step of performing system failure recovery on the second board includes:
upgrading the second board card by operating a preset flash memory programming program;
and after the second board card is determined to be upgraded, resetting the second board card so as to reset and start the second board card.
A system failure recovery device is applied to a first board card in intelligent equipment, and the device comprises:
the information broadcasting module is used for broadcasting the fault information of the system fault of the second board card after the second board card is determined to have the system fault, wherein the second board card is as follows: a board card of the intelligent device except the first board card;
the recovery judging module is used for judging, upon receiving fault information, broadcast by board cards of the intelligent device other than the first board card, indicating that the second board card has a system fault, whether the first board card itself can perform system fault recovery on the second board card, and if so, triggering the fault recovery module;
and the fault recovery module is used for recovering the system fault of the second board card.
In an implementation manner of the present application, the system failure recovery apparatus further includes: a fault determination module;
the fault determining module is used for determining whether the second board card has a system fault;
the fault determination module includes:
the bus determining submodule is used for determining a target bus from the buses of the intelligent equipment;
the message sending submodule is used for sending fault detection messages to the second board card through each target bus;
The response detection submodule is used for monitoring whether fault detection responses from the second board card are not received through all the target buses, and if so, the fault determination submodule is triggered;
the fault determining submodule is used for determining that the second board card has a system fault;
the information broadcasting module is specifically configured to broadcast fault information that the second board card has a system fault through each target bus.
In an implementation manner of the present application, the recovery determining module includes:
the information receiving determining submodule is used for determining whether fault information, broadcast by board cards of the intelligent device other than the first board card, indicating that the second board card has a system fault is received;
the information broadcasting submodule is used for broadcasting the first board card's own self-recommendation information, monitoring for self-recommendation information broadcast by other board cards, triggering the recovery judgment submodule when such information is monitored, and triggering the recovery determining submodule when it is not;
the recovery judgment submodule is used for judging, according to the first board card's own self-recommendation information and the received self-recommendation information, whether the first board card can perform system fault recovery on the second board card;
and the recovery determining submodule is used for determining that the first board card can perform system fault recovery on the second board card.
In an implementation manner of the present application, the recovery determining module further includes:
the state information obtaining submodule is used for obtaining self hardware state information after the information receiving determining submodule determines that the fault information is received, judging whether the self hardware state information meets a first self-recommendation condition or not, and triggering the scene obtaining submodule if the self hardware state information meets the first self-recommendation condition;
the scene obtaining submodule is used for obtaining the first board card's own service scenario;
and the condition judgment submodule is used for judging whether the service scenario meets a second self-recommendation condition, and triggering the information broadcasting submodule if it does.
In an implementation manner of the present application, the system failure recovery apparatus further includes:
the information sending module is used for sending system fault recovery prompt information;
and the information acquisition module is used for acquiring response information of the user aiming at the system fault recovery prompt information and triggering the fault recovery module under the condition that the response information indicates that the user agrees to carry out system fault recovery.
In an implementation manner of the present application, the failure recovery module is specifically configured to turn off a power supply of the second board card; controlling a system on the second board card to enter a fault recovery mode through an I/O pin connected between the board cards so as to enable the second board card to carry out system fault recovery; receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determining whether the second board card completes system fault recovery according to the system fault recovery completion notification; if so, the power supply of the second board card is turned off, and the power supply of the second board card is turned on again, so that the second board card is subjected to system restart.
In an implementation manner of the present application, the failure recovery module is specifically configured to switch a starting device of the second board card to a backup device of the second board card by setting a hardware I/O state; resetting the second board card so that: the second board card is reset and started, starting equipment is switched to the backup equipment after the hardware I/O state is read, and files for system fault recovery in the backup equipment are recovered to the main equipment of the second board card; receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determining whether the second board card completes system fault recovery according to the system fault recovery completion notification; if so, clearing the hardware I/O state; and sending a reset notice to the second board card so that the second board card is reset and started.
In an implementation manner of the present application, the failure recovery module is specifically configured to switch a start mode of the second board card to a preset upgrade mode by setting a hardware I/O state; resetting the second board card so that: the second board card is reset and started and enters the preset upgrading mode after the hardware I/O state is read; upgrading the second board card according to the preset upgrading mode; after the upgrading of the second board card is determined to be completed, clearing the hardware I/O state; and sending a reset notice to the second board card so that the second board card is reset and started.
In an implementation manner of the present application, the failure recovery module is specifically configured to upgrade the second board card by running a preset flash programming program; and after the second board card is determined to be upgraded, resetting the second board card so as to reset and start the second board card.
An electronic device, which is a first board card in an intelligent device, includes: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of the system failure recovery method provided in the embodiment of the present application when executing the program stored in the memory.
A computer-readable storage medium, which is a readable storage medium of a first board in an intelligent device, and a computer program is stored in the computer-readable storage medium, and when being executed by a processor, the computer program implements the system fault recovery method steps provided in an embodiment of the present application.
As can be seen from the above, in the scheme provided in the embodiment of the present application, a plurality of boards in the intelligent device cooperate to determine whether a certain board in the intelligent device has a system fault, and when it is determined that the board has the system fault, a board in the intelligent device that does not have the system fault performs fault recovery on the board that has the fault. Therefore, when the scheme provided by the embodiment of the application is applied to fault recovery, manual operation of maintenance personnel is not needed, operation of the board in system fault recovery is simplified, and further fault recovery efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a system fault recovery method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another system fault recovery method according to an embodiment of the present disclosure;
Fig. 3 is a schematic signaling flow diagram of a system fault recovery method according to an embodiment of the present application;
Fig. 4 is a schematic signaling flow diagram of another system fault recovery method according to an embodiment of the present application;
Fig. 5 is a schematic signaling flow diagram of another system fault recovery method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a system fault recovery apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow diagram of a system fault recovery method provided in an embodiment of the present application, where the method is applied to a first board in an intelligent device, where the intelligent device includes at least two boards, and the first board is any one board in the intelligent device.
Specifically, the method comprises the following steps:
S101: After determining that the second board card has a system fault, broadcast fault information indicating that the second board card has a system fault.
The second board card is a board card of the intelligent device other than the first board card.
In an implementation manner of the present application, whether the second board card has a system fault may be determined as follows: determine a target bus from the buses of the intelligent device; send a fault detection message to the second board card through each target bus; monitor whether no fault detection response from the second board card is received through any of the target buses; if so, determine that the second board card has a system fault. If a fault detection response from the second board card is received through at least one of the target buses, this indicates to a certain extent that the second board card has no system fault, so it cannot be determined that the second board card has a system fault.
In this case, when the fault information of the system fault of the second board card is broadcasted, the fault information of the system fault of the second board card can be broadcasted through each target bus.
It can be understood that multiple buses may be provided in the intelligent device. In the embodiments of the present application, one or more buses may be selected from them, according to the specific state of each bus, for sending the fault detection message to the second board card. Preferably, two or more buses are selected as target buses. If the fault detection message were sent through only one bus, the second board card might fail to receive it because of data transmission delay or similar reasons and so could not feed back a fault detection response to the first board card, and the first board card would misjudge the second board card as having a system fault; using multiple target buses effectively avoids this.
In addition, the fault detection message can be understood as a message used for detecting whether a board card has a system fault.
The fault detection response can be understood as a message with which board card A, after receiving a fault detection message, notifies board card B that board card A has no system fault, where board card B is the board card that sent the fault detection message to board card A.
That is to say, if board card A has no system fault, it can normally receive the fault detection message sent by board card B and then feed back a fault detection response to board card B. If board card A has a system fault and cannot work normally, it either cannot receive the fault detection message sent by board card B or cannot respond to the message after receiving it; in both cases, board card A cannot send a fault detection response to board card B.
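The detection rule described above (a fault is determined only when no target bus returns a fault detection response) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `probe` callable, its signature, the timeout value and the bus names used in the usage note are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical probe: sends a fault detection message to `board_id` over `bus`
# and returns True if a fault detection response arrives within `timeout_s`.
Probe = Callable[[str, str, float], bool]


def has_system_fault(board_id: str,
                     target_buses: Iterable[str],
                     probe: Probe,
                     timeout_s: float = 0.5) -> bool:
    """Return True only if no target bus yields a fault detection response."""
    buses = list(target_buses)
    if not buses:
        raise ValueError("at least one target bus is required")
    # any() stops at the first response: one response on any bus is enough
    # to rule out a fault verdict for this board.
    return not any(probe(bus, board_id, timeout_s) for bus in buses)
```

For example, with a fake probe in which board "B" responds on a bus named "uart1" but not on "can0", `has_system_fault("B", ["can0", "uart1"], probe)` is `False`, while probing over "can0" alone gives `True`. This mirrors the text's reason for preferring two or more target buses.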
S102: Upon receiving fault information, broadcast by board cards of the intelligent device other than the first board card, indicating that the second board card has a system fault, judge whether the first board card itself can perform system fault recovery on the second board card, and if so, execute S103.
After the first board card determines that the second board card has the system fault, the second board card needs to be subjected to system fault recovery in order to ensure normal operation of the intelligent device. In the embodiment of the application, the mode that in the prior art, the maintenance personnel manually carry out system fault recovery on the fault board card is abandoned, and the board card without the system fault in the intelligent equipment is selected, namely, the board card in the normal state carries out system fault recovery on the fault board card. In view of this, the first board needs to determine whether the first board has the capability of performing system fault recovery on the second board, if so, the system fault recovery is performed on the second board, and if not, the first board gives up performing system fault recovery on the second board, and the other boards in the intelligent device perform system fault recovery on the second board.
The first board card is any board card in the intelligent device; that is, the system fault recovery method provided in the embodiments of the present application is applicable to any board card in the intelligent device. Accordingly, while the first board card checks whether the second board card has a system fault, the other board cards do the same. When another board card determines that the second board card has a system fault, it may broadcast fault information in the same way as the first board card, so the first board card may receive fault information, broadcast by other board cards, indicating that the second board card has a system fault.
In addition, while the first board card checks whether the second board card has a system fault, communication delay or similar causes may lead it to misjudge the second board card as having a system fault. In this embodiment, the first board card judges whether it can perform system fault recovery on the second board card only after receiving fault information broadcast by other board cards, that is, only after other board cards have also confirmed that the second board card has a system fault. The first board card therefore does not decide on recovery based on its own determination alone; whether the second board card has a system fault is confirmed jointly by multiple board cards, which effectively reduces the probability of misjudging the second board card as having a system fault.
In an implementation manner of the present application, when judging whether it can perform system fault recovery on the second board card, the first board card may broadcast its own self-recommendation information and monitor for self-recommendation information broadcast by other board cards. When such information is monitored, it judges, according to its own self-recommendation information and the received self-recommendation information, whether it can perform system fault recovery on the second board card; when no such information is monitored, it determines that it can perform system fault recovery on the second board card.
The self-recommendation information can be understood as information with which one board card in the intelligent device recommends itself to the other board cards for performing system fault recovery on a faulty board card in the intelligent device. Specifically, the self-recommendation information may include information such as CPU occupancy and service scenario. The present application uses these only as examples and does not limit the specific content of the self-recommendation information.
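One way to picture the joint confirmation is a small bookkeeping structure on the first board card. The sketch below is an assumption-laden illustration: the class name `FaultConsensus`, its methods, and the rule that a single report from another board card is sufficient are all hypothetical, since the application does not state how many reports are required.

```python
from dataclasses import dataclass, field


@dataclass
class FaultConsensus:
    """Tracks local detections and fault broadcasts from other boards (hypothetical)."""
    self_id: str
    local_faults: set = field(default_factory=set)
    peer_reports: dict = field(default_factory=dict)  # faulty board -> set of reporters

    def record_local_detection(self, faulty: str) -> None:
        self.local_faults.add(faulty)

    def record_broadcast(self, reporter: str, faulty: str) -> None:
        # Ignore our own broadcast and any report attributed to the faulty board itself.
        if reporter != self.self_id and reporter != faulty:
            self.peer_reports.setdefault(faulty, set()).add(reporter)

    def is_jointly_confirmed(self, faulty: str) -> bool:
        # Assumption: the local detection plus at least one report from another board.
        return faulty in self.local_faults and bool(self.peer_reports.get(faulty))
```

Only when `is_jointly_confirmed` returns `True` would the board card go on to judge whether it can perform the recovery itself.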
Specifically, when the first board card judges whether the first board card can be used for performing system fault recovery on the second board card according to self-recommendation information and received self-recommendation information, the priority between the first board card and each board card corresponding to the received self-recommendation information can be determined according to the self-recommendation information and the received self-recommendation information, then whether the board card with the highest priority is the first board card itself is judged, if so, the first board card is determined to be used for performing system fault recovery on the second board card, and if not, the system fault recovery on the second board card is abandoned.
For example, the priority between the first board and each board corresponding to the received self-referral information may be determined in such a manner that the lower the CPU occupancy, the higher the priority, and the like.
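The priority rule above can be sketched as follows. This is only an illustrative sketch: the `SelfRecommendation` type, the `board_id` field, and the tie-breaking rule (lowest `board_id` wins when CPU occupancies are equal) are assumptions made for the example and are not specified in the present application.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SelfRecommendation:
    board_id: int         # hypothetical identifier of the recommending board card
    cpu_occupancy: float  # fraction in the range 0.0 to 1.0


def can_perform_recovery(own: SelfRecommendation,
                         received: Iterable[SelfRecommendation]) -> bool:
    """Return True if the board card that sent `own` has the highest priority."""
    candidates = [own, *received]
    # Lower CPU occupancy means higher priority.
    # Assumption: ties are broken in favour of the lowest board_id.
    winner = min(candidates, key=lambda r: (r.cpu_occupancy, r.board_id))
    return winner.board_id == own.board_id
```

When `received` is empty, the function returns True, which matches the case in which no self-recommendation information from other board cards is detected.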
In an implementation of the present application, before broadcasting its own self-recommendation information, the first board card may obtain its own hardware state information and judge whether the hardware state information meets a first self-recommendation condition. If the first self-recommendation condition is met, the first board card obtains its own service scenario and judges whether the service scenario meets a second self-recommendation condition. If the second self-recommendation condition is met, the first board card executes the step of broadcasting its own self-recommendation information.
Specifically, the hardware state information may include: CPU occupancy, memory occupancy, and the like.
The first self-recommendation condition is a condition related to the hardware state information, and may differ according to the hardware state information involved. For example, when the hardware state information includes the CPU occupancy, a high CPU occupancy indicates that the board card is busy and may not have enough resources to perform system fault recovery on the faulty board card; in this case, the board card may give up performing system fault recovery on the faulty board card and does not broadcast self-recommendation information. Conversely, a low CPU occupancy indicates that the board card has relatively abundant idle resources and may have enough resources to perform system fault recovery on the faulty board card; in this case, the board card may recommend itself to the other board cards in the intelligent device to determine whether it can perform system fault recovery on the faulty board card. In view of the above, the first self-recommendation condition may be: the CPU occupancy is less than a preset value, for example, 40%.
The second self-recommendation condition is a condition related to the service scenario. It can be understood that different service scenarios place different overall demands on resources. Some service scenarios have a high overall demand for resources; in this case, to ensure normal execution of the service, the board card may not have enough resources to perform system fault recovery on the faulty board card, and may therefore refrain from recommending itself to the other board cards in the intelligent device. Other service scenarios have a low overall demand for resources; in this case, the board card may have enough resources to perform system fault recovery on the faulty board card, and may recommend itself to the other board cards in the intelligent device. In view of the above, the second self-recommendation condition may be: the service scenario is a preset service scenario.
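The two-stage check described above can be sketched as follows. The 40% limit is the example value given above; the scenario names in `PRESET_SCENARIOS` and the `get_scenario` callback are hypothetical and serve only to illustrate that the service scenario is obtained only after the first self-recommendation condition is met.

```python
from typing import Callable

CPU_OCCUPANCY_LIMIT = 0.40                          # example preset value (40%)
PRESET_SCENARIOS = frozenset({"idle", "standby"})   # hypothetical scenario names


def may_self_recommend(cpu_occupancy: float,
                       get_scenario: Callable[[], str]) -> bool:
    """Return True if the board card should broadcast self-recommendation information."""
    # First self-recommendation condition: CPU occupancy below the preset value.
    if cpu_occupancy >= CPU_OCCUPANCY_LIMIT:
        return False
    # Second self-recommendation condition: the service scenario is a preset one.
    # The scenario is queried only when the first condition has been met.
    return get_scenario() in PRESET_SCENARIOS
```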
S103: Perform system fault recovery on the second board card.
The manner of performing system fault recovery on the second board card depends on the type of system on the second board card. For details, refer to the methods provided in the embodiments shown in fig. 3 to fig. 5, which are not described in detail here.
As can be seen from the above, in the solutions provided in the above embodiments, the multiple board cards in the intelligent device cooperate to determine whether a certain board card in the intelligent device has a system fault, and when it is determined that the certain board card has the system fault, one board card in the intelligent device that has no system fault performs fault recovery on the faulty board card. Therefore, when the scheme provided by each embodiment is applied to fault recovery, manual operation of maintenance personnel is not needed, operation of the board in system fault recovery is simplified, and further fault recovery efficiency is improved.
The system fault recovery method provided in the embodiments of the present application is described below through a specific embodiment. Referring to fig. 2, a schematic flow diagram of another system fault recovery method is provided. The method is applied to a first board card in an intelligent device and includes:
S201: Determine the target buses from the buses of the intelligent device.
S202: Send fault detection information to the second board card through each target bus.
S203: Monitor whether no fault detection response from the second board card is received through any of the target buses; if so, execute S204.
S204: Determine that the second board card has a system fault.
S205: Broadcast, through each target bus, fault information indicating that the second board card has a system fault.
S206: When fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received, obtain the first board card's own hardware state information and judge whether it meets the first self-recommendation condition; if so, execute S207.
S207: Obtain the first board card's own service scenario.
S208: Judge whether the service scenario meets the second self-recommendation condition; if so, execute S209.
S209: Broadcast the first board card's own self-recommendation information, and monitor for self-recommendation information broadcast by other board cards.
S210: When self-recommendation information broadcast by other board cards is detected, judge, according to the first board card's own self-recommendation information and the received self-recommendation information, whether the first board card can be used to perform system fault recovery on the second board card; if so, execute S212.
S211: When no self-recommendation information broadcast by other board cards is detected, determine that the first board card can be used to perform system fault recovery on the second board card, and execute S212.
S212: Perform system fault recovery on the second board card.
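The fault determination in steps S201 to S204 can be sketched as follows: the second board card is treated as faulty only when none of the target buses yields a fault detection response. The `probe` callback, which stands for sending fault detection information on one bus and waiting for a response, and the behaviour for an empty bus list are assumptions of this sketch.

```python
from typing import Callable, Sequence


def has_system_fault(target_buses: Sequence[str],
                     probe: Callable[[str], bool]) -> bool:
    """Return True if no target bus delivers a fault detection response.

    `probe(bus)` is a hypothetical callback that sends fault detection
    information over `bus` and returns True if a response arrives in time.
    """
    if not target_buses:
        # Assumption: without any target bus, no conclusion can be drawn.
        return False
    # Probe every target bus (S202), then check that all of them were silent (S203).
    responses = [probe(bus) for bus in target_buses]
    return not any(responses)
```

Requiring silence on every target bus keeps a failure of a single bus from being mistaken for a system fault on the second board card.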
In an implementation of the present application, before performing system fault recovery on the second board card, the first board card may also send system fault recovery prompt information, obtain the user's response information for the system fault recovery prompt information, and execute the step of performing system fault recovery on the second board card when the response information indicates that the user agrees to system fault recovery.
That is, before system fault recovery is performed on the second board card, the user may be prompted that system fault recovery is about to be performed on the second board card. If the user agrees, step S103 is executed to perform system fault recovery on the second board card; if the user does not agree, system fault recovery on the second board card may be postponed. In this way, system fault recovery is performed on the faulty board card with the user's wishes taken into account, which can improve the user experience.
Specifically, the system fault recovery prompt information may be of the voice type, or may be in text, graphic, or similar form, which is not limited in the present application.
When the system fault recovery prompt information is of the voice type, the first board card may send the system fault recovery prompt information to a voice processing module in the intelligent device, and the voice processing module plays it to the user. The voice processing module may also obtain the user's voice response, perform processing such as voice recognition on it, and send the processing result to the first board card, whereby the first board card obtains the user's response information for the system fault recovery prompt information.
When the system fault recovery prompt information is in text, graphic, or similar form, the first board card may send the system fault recovery prompt information to an interface display module in the intelligent device, and the interface display module displays it to the user. The interface display module may also obtain the response input by the user and send it to the first board card, whereby the first board card obtains the user's response information for the system fault recovery prompt information.
Specifically, whether the user agrees to perform system failure recovery may be determined according to at least one of the following information:
the user's response to the voice-type system fault recovery prompt information;
the user's response to the system fault recovery prompt information in text, graphic, or similar form.
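The consent step can be sketched as follows. The `ask_user` and `recover` callbacks and the prompt wording are hypothetical; in the text above, the prompt is delivered through a voice processing module or an interface display module of the intelligent device.

```python
from typing import Callable


def recover_if_user_agrees(ask_user: Callable[[str], bool],
                           recover: Callable[[], None]) -> str:
    """Perform recovery only when the user agrees; otherwise postpone it."""
    prompt = "A board card has a system fault. Perform system fault recovery now?"
    if ask_user(prompt):
        recover()           # corresponds to executing step S103
        return "recovered"
    return "postponed"      # recovery is deferred when the user does not agree
```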
The method used to perform system fault recovery on a faulty board card depends on the type of system on that board card, and multiple system types currently exist. Different methods by which the first board card performs system fault recovery on the second board card are therefore introduced below through different embodiments.
In an implementation manner of the present application, referring to fig. 3, a signaling flow diagram of a system failure recovery method is provided, where the method includes:
S301: The first board card turns off the power supply of the second board card.
S302: The second board card is powered off.
S303: The first board card controls the system on the second board card to enter a fault recovery mode through an I/O pin connected between the board cards.
S304: The second board card is powered on and enters the fault recovery mode.
S305: The second board card performs system fault recovery.
S306: After the system fault recovery is completed, the second board card sends a system fault recovery completion notification to the first board card.
S307: The first board card receives the system fault recovery completion notification and determines, according to the notification, whether the second board card has completed system fault recovery.
S308: If so, the first board card turns off the power supply of the second board card.
S309: The second board card is powered off.
S310: The first board card turns on the power supply of the second board card.
S311: The second board card restarts its system, at which point the first board card has completed system fault recovery on the second board card.
The inventors have verified experimentally that the above method is applicable when the system on the second board card is an Android system.
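The first board card's side of the fig. 3 flow can be sketched as follows. `FakeSecondBoard` is a hypothetical in-memory stand-in that only records the actions applied to it. The explicit power-on before S304 is an assumption of this sketch, since the text above does not state which component restores power at that point.

```python
class FakeSecondBoard:
    """Hypothetical stand-in that records what the first board card does to it."""

    def __init__(self) -> None:
        self.log = []

    def set_power(self, on: bool) -> None:
        self.log.append("power_on" if on else "power_off")

    def assert_recovery_pin(self) -> None:
        self.log.append("recovery_pin")

    def wait_recovery_done(self) -> bool:
        self.log.append("wait_done")
        return True  # models receipt of the completion notification


def recover_by_recovery_mode(board) -> bool:
    board.set_power(False)              # S301: turn off the power supply
    board.assert_recovery_pin()         # S303: select fault recovery mode via I/O pin
    board.set_power(True)               # assumption: power is restored so S304 can occur
    if not board.wait_recovery_done():  # S306/S307: wait for and check the notification
        return False
    board.set_power(False)              # S308: turn off the power supply again
    board.set_power(True)               # S310: turn it on so the system restarts (S311)
    return True
```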
Each board card is generally provided with a main device and a backup device. The main device stores information such as the system files of the system on the board card, and the operation of the system on the board card depends on the system files in the main device. When a system fault occurs on the board card, it can be understood that the system files stored in the main device are faulty. The backup device stores backup files of the system on the board card.
Based on the above situation, in an implementation manner of the present application, referring to fig. 4, a signaling flow diagram of another system fault recovery method is provided, where the method includes:
S401: The first board card switches the starting device (that is, the boot device) of the second board card to the backup device of the second board card by setting a hardware I/O state.
S402: The first board card performs a reset operation on the second board card.
S403: The second board card resets and starts.
S404: The second board card reads the hardware I/O state.
S405: The second board card switches the starting device to the backup device.
S406: The second board card restores the files for system fault recovery in the backup device to the main device of the second board card.
S407: After the system fault recovery is completed, the second board card sends a system fault recovery completion notification to the first board card.
S408: The first board card receives the system fault recovery completion notification and determines, according to the notification, whether the second board card has completed system fault recovery.
S409: If so, the first board card clears the hardware I/O state.
S410: The first board card sends a reset notification to the second board card.
S411: The second board card resets and starts, at which point the first board card has completed system fault recovery on the second board card.
The inventors have verified experimentally that the above method is applicable when the system on the second board card is a Linux system.
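The backup-device flow of fig. 4 can be sketched with a small in-memory model. The `SecondBoard` class, its dictionary-based main and backup devices, and the `boot_from_backup` flag that models the hardware I/O state are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class SecondBoard:
    main: dict                      # files on the main device
    backup: dict                    # backup files on the backup device
    boot_from_backup: bool = False  # models the hardware I/O state
    booted_from: str = "main"

    def reset(self) -> bool:
        """Reset and start; return True if a restore from backup was completed."""
        if self.boot_from_backup:
            self.booted_from = "backup"
            self.main = dict(self.backup)  # S406: restore files to the main device
            return True                    # S407: completion notification
        self.booted_from = "main"
        return False


def recover_from_backup(board: SecondBoard) -> bool:
    board.boot_from_backup = True   # S401: set the hardware I/O state
    if not board.reset():           # reset the board and wait for completion (S408)
        return False
    board.boot_from_backup = False  # S409: clear the hardware I/O state
    board.reset()                   # S410/S411: reset so the board starts from the main device
    return True
```

Clearing the I/O state before the final reset is what returns the board to its main device; otherwise it would start from the backup device again.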
In an implementation manner of the present application, referring to fig. 5, a signaling flow diagram of another system fault recovery method is provided, where the method includes:
S501: The first board card switches the start mode of the second board card to a preset upgrade mode by setting a hardware I/O state.
S502: The first board card performs a reset operation on the second board card.
S503: The second board card resets and starts.
S504: The second board card reads the hardware I/O state.
S505: The second board card enters the preset upgrade mode.
S506: The first board card upgrades the second board card according to the preset upgrade mode.
S507: The first board card determines that the upgrade of the second board card is completed.
S508: The first board card clears the hardware I/O state.
S509: The first board card sends a reset notification to the second board card.
S510: The second board card resets and starts, at which point the first board card has completed system fault recovery on the second board card.
The preset upgrade mode may be: ISP upgrade mode.
The inventors have verified experimentally that the above method is applicable when the system on the second board card is a bare-metal system and the second board card supports ISP upgrade.
In an implementation manner of the present application, system fault recovery can also be performed on the second board card by the following manner:
upgrading the second board card by operating a preset flash memory programming program;
and after the upgrade of the second board card is determined to be completed, resetting the second board card so as to reset and start the second board card.
The preset flash memory programming program may be a JTAG program.
The inventors have verified experimentally that the above method is applicable when the system on the second board card is a bare-metal system and the second board card supports JTAG upgrade.
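The correspondence between system type and recovery method described in the embodiments above can be summarized in a small selection function. The names `RecoveryMethod` and `select_recovery_method`, the string labels for system types, and the preference for ISP over JTAG when a board card supports both are assumptions of this sketch, not part of the present application.

```python
from enum import Enum
from typing import Optional


class RecoveryMethod(Enum):
    RECOVERY_MODE = "fault recovery mode via I/O pin (fig. 3)"
    BACKUP_DEVICE = "restore from backup device (fig. 4)"
    ISP_UPGRADE = "preset upgrade mode, e.g. ISP (fig. 5)"
    FLASH_PROGRAMMING = "flash memory programming program, e.g. JTAG"


def select_recovery_method(system_type: str,
                           supports_isp: bool = False,
                           supports_jtag: bool = False) -> Optional[RecoveryMethod]:
    """Pick a recovery method for the second board card, or None if none applies."""
    kind = system_type.lower()
    if kind == "android":
        return RecoveryMethod.RECOVERY_MODE
    if kind == "linux":
        return RecoveryMethod.BACKUP_DEVICE
    if kind == "bare-metal":
        if supports_isp:    # assumption: ISP is preferred when both are supported
            return RecoveryMethod.ISP_UPGRADE
        if supports_jtag:
            return RecoveryMethod.FLASH_PROGRAMMING
    return None
```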
Corresponding to the system fault recovery method, the embodiment of the application also provides a system fault recovery device.
Fig. 6 is a schematic structural diagram of a system failure recovery apparatus provided in an embodiment of the present application, where the apparatus is applied to a first board in an intelligent device, and includes:
the information broadcasting module 601 is configured to broadcast fault information that a system fault occurs on a second board card after it is determined that the system fault occurs on the second board card, where the second board card is: a board card of the intelligent device except the first board card;
a recovery determining module 602, configured to judge, when fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received, whether the first board card itself can be used to perform system fault recovery on the second board card, and if so, trigger a failure recovery module 603;
The failure recovery module 603 is configured to perform system failure recovery on the second board.
Specifically, the system failure recovery apparatus may further include: a fault determination module;
the fault determining module is used for determining whether the second board card has a system fault;
the fault determination module includes:
the bus determining submodule is used for determining a target bus from the buses of the intelligent equipment;
the message sending submodule is used for sending fault detection messages to the second board card through each target bus;
the response detection submodule is used for monitoring whether no fault detection response from the second board card is received through any of the target buses, and if so, triggering the fault determination submodule;
the fault determining submodule is used for determining that the second board card has a system fault;
the information broadcasting module is specifically configured to broadcast fault information that the second board card has a system fault through each target bus.
Specifically, the recovery determining module 602 may include:
the information receiving determining submodule is used for determining that fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received;
the information broadcasting submodule is used for broadcasting the first board card's own self-recommendation information, monitoring for self-recommendation information broadcast by other board cards, triggering the recovery judgment submodule when self-recommendation information broadcast by other board cards is detected, and triggering the recovery determining submodule when no self-recommendation information broadcast by other board cards is detected;
the recovery judgment submodule is used for judging, according to the first board card's own self-recommendation information and the received self-recommendation information, whether the first board card itself can be used to perform system fault recovery on the second board card;
and the recovery determining submodule is used for determining that the first board card itself can be used to perform system fault recovery on the second board card.
Specifically, the recovery determining module 602 may further include:
the state information obtaining submodule is used for obtaining self hardware state information after the information receiving determining submodule determines that the fault information is received, judging whether the self hardware state information meets a first self-recommendation condition or not, and triggering the scene obtaining submodule if the self hardware state information meets the first self-recommendation condition;
the scene obtaining submodule is used for obtaining a self service scene;
and the condition judgment submodule is used for judging whether the self service scene meets a second self-recommendation condition or not, and triggering the information broadcasting submodule if the self service scene meets the second self-recommendation condition.
Specifically, the system failure recovery apparatus may further include:
the information sending module is used for sending system fault recovery prompt information;
and the information acquisition module is used for acquiring response information of the user aiming at the system fault recovery prompt information and triggering the fault recovery module under the condition that the response information indicates that the user agrees to carry out system fault recovery.
Specifically, the failure recovery module 603 is specifically configured to turn off the power supply of the second board card; controlling a system on the second board card to enter a fault recovery mode through an I/O pin connected between the board cards so as to enable the second board card to carry out system fault recovery; receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determining whether the second board card completes system fault recovery according to the system fault recovery completion notification; if so, the power supply of the second board card is turned off, and the power supply of the second board card is turned on again, so that the second board card is subjected to system restart.
Specifically, the failure recovery module 603 is specifically configured to switch the starting device of the second board card to the backup device of the second board card by setting a hardware I/O state; resetting the second board card so that: the second board card is reset and started, starting equipment is switched to the backup equipment after the hardware I/O state is read, and files for system fault recovery in the backup equipment are recovered to the main equipment of the second board card; receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determining whether the second board card completes system fault recovery according to the system fault recovery completion notification; if so, clearing the hardware I/O state; and sending a reset notice to the second board card so that the second board card is reset and started.
Specifically, the failure recovery module 603 is specifically configured to switch a start mode of the second board card to a preset upgrade mode by setting a hardware I/O state; resetting the second board card so that: the second board card is reset and started and enters the preset upgrading mode after the hardware I/O state is read; upgrading the second board card according to the preset upgrading mode; after the upgrading of the second board card is determined to be completed, clearing the hardware I/O state; and sending a reset notice to the second board card so that the second board card is reset and started.
Specifically, the failure recovery module 603 is specifically configured to upgrade the second board card by running a preset flash programming program; and after the second board card is determined to be upgraded, resetting the second board card so as to reset and start the second board card.
As can be seen from the above, in the solutions provided in the above embodiments, the multiple board cards in the intelligent device cooperate to determine whether a certain board card in the intelligent device has a system fault, and when it is determined that the certain board card has the system fault, one board card in the intelligent device that has no system fault performs fault recovery on the faulty board card. Therefore, when the scheme provided by each embodiment is applied to fault recovery, manual operation of maintenance personnel is not needed, operation of the board in system fault recovery is simplified, and further fault recovery efficiency is improved.
An embodiment of the present application further provides an electronic device, where the electronic device is a first board card in an intelligent device, as shown in fig. 7, including: a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 are communicated with each other via the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the system failure recovery method provided in the embodiment of the present application when executing the program stored in the memory 703.
Specifically, the system fault recovery method includes:
after determining that a system fault occurs in a second board card, broadcasting fault information of the system fault occurring in the second board card, wherein the second board card is as follows: a board card of the intelligent device except the first board card;
when fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received, judging whether the first board card itself can be used to perform system fault recovery on the second board card;
and if so, performing system fault recovery on the second board card.
Other embodiments of the system fault recovery method are the same as those mentioned in the previous method, and are not described again here.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As can be seen from the above, in the scheme provided in this embodiment, the multiple board cards in the intelligent device cooperate to determine whether a certain board card in the intelligent device has a system fault, and when it is determined that the board card has the system fault, a board card in the intelligent device that does not have the system fault performs fault recovery on the board card that has the fault. Therefore, when the scheme provided by the embodiment is applied to fault recovery, manual operation of maintenance personnel is not needed, operation of the board card during system fault recovery is simplified, and further fault recovery efficiency is improved.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is a readable storage medium of a first board in an intelligent device, and a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for recovering a system failure provided in the embodiment of the present application is implemented.
Specifically, the system fault recovery method includes:
after determining that a system fault occurs in a second board card, broadcasting fault information of the system fault occurring in the second board card, wherein the second board card is as follows: a board card of the intelligent device except the first board card;
when fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received, judging whether the first board card itself can be used to perform system fault recovery on the second board card;
and if so, performing system fault recovery on the second board card.
Other embodiments of the system fault recovery method are the same as those mentioned in the previous method, and are not described again here.
As can be seen from the above, in the scheme provided in this embodiment, after executing the computer program stored in the computer-readable storage medium, the multiple board cards in the intelligent device cooperate to determine whether a system fault occurs on one board card in the intelligent device, and when it is determined that the system fault occurs on the board card, a board card in the intelligent device that does not have the system fault performs fault recovery on the board card that has the fault. Therefore, when the scheme provided by the embodiment is applied to fault recovery, manual operation of maintenance personnel is not needed, operation of the board card during system fault recovery is simplified, and further fault recovery efficiency is improved.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the computer-readable storage medium, since they are substantially similar to the embodiments of the method, the description is simple, and for the relevant points, reference may be made to the partial description of the embodiments of the method.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (18)

1. The system fault recovery method is characterized by being applied to a first board card in intelligent equipment, wherein the first board card is any one board card in the intelligent equipment;
the method comprises the following steps:
after determining that a system fault occurs in a second board card, broadcasting fault information of the system fault occurring in the second board card, wherein the second board card is as follows: a board card of the intelligent device except the first board card;
when fault information indicating that the second board card has a system fault, broadcast by board cards in the intelligent device other than the first board card, is received, judging whether the first board card itself can be used to perform system fault recovery on the second board card;
if so, performing system fault recovery on the second board card;
wherein the step of judging whether the first board card itself can be used to perform system fault recovery on the second board card comprises the following steps:
broadcasting self-recommendation information of the first board card, and monitoring for self-recommendation information broadcast by other board cards, wherein the self-recommendation information is information by which the first board card recommends itself to the other board cards to perform system fault recovery on the faulty board card;
when self-recommendation information broadcast by other board cards is detected, judging, according to the first board card's own self-recommendation information and the received self-recommendation information, whether the first board card itself can be used to perform system fault recovery on the second board card;
and when no self-recommendation information broadcast by other board cards is detected, determining that the first board card itself can be used to perform system fault recovery on the second board card.
2. The method of claim 1, wherein the second board card is determined to have a system fault by:
determining a target bus from the buses of the intelligent device;
sending fault detection information to the second board card through each target bus;
monitoring whether no fault detection response from the second board card is received through any of the target buses;
if so, determining that the second board card has a system fault;
wherein the step of broadcasting the fault information indicating that the system fault has occurred in the second board card comprises:
broadcasting, through each target bus, the fault information indicating that the system fault has occurred in the second board card.
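The detection procedure of claim 2 can be sketched as follows. This is an illustrative sketch only: the `Probe` callables, the `broadcast` callback, the bus names used in any caller, and the `"FAULT <board>"` message format are all assumptions and are not defined by the claims.

```python
from typing import Callable, Dict

# A probe sends fault detection information on one target bus and returns
# True if the second board card responded (the timeout handling is assumed
# to live inside the probe).
Probe = Callable[[], bool]


def detect_and_broadcast(probes: Dict[str, Probe],
                         broadcast: Callable[[str, str], None],
                         faulty_board: str) -> bool:
    """Probe every target bus; broadcast a fault only if none of them answers."""
    if not probes:
        raise ValueError("at least one target bus is required")
    # Send fault detection information through each target bus.
    responses = {bus: probe() for bus, probe in probes.items()}
    if any(responses.values()):
        # A response arrived on at least one bus, so no system fault is declared.
        return False
    # No bus produced a response: broadcast the fault through each target bus.
    for bus in probes:
        broadcast(bus, f"FAULT {faulty_board}")
    return True
```

Requiring silence on every target bus, rather than on a single one, avoids declaring a system fault when only one bus link is broken.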
3. The method according to claim 1, further comprising, before the step of broadcasting the first board card's own self-recommendation information:
acquiring the first board card's own hardware state information, and judging whether the hardware state information meets a first self-recommendation condition;
if the first self-recommendation condition is met, obtaining the first board card's own service scenario;
judging whether the service scenario meets a second self-recommendation condition;
and if the second self-recommendation condition is met, executing the step of broadcasting the first board card's own self-recommendation information.
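The two-stage check of claim 3 can be sketched as below. The claims do not define the first or second self-recommendation condition, so the `BoardStatus` fields, the thresholds, and the scenario labels are hypothetical values chosen only to make the control flow concrete.

```python
from dataclasses import dataclass


@dataclass
class BoardStatus:
    """Hypothetical snapshot of the first board card; all fields are assumptions."""
    cpu_load: float        # fraction of CPU in use, 0.0 to 1.0
    free_storage_mb: int   # free storage available for recovery files
    service_scenario: str  # label of the service the board card is running


# Assumed thresholds and scenario labels, for illustration only.
MAX_CPU_LOAD = 0.8
MIN_FREE_STORAGE_MB = 64
BUSY_SCENARIOS = {"navigation", "voice_call"}


def meets_first_condition(status: BoardStatus) -> bool:
    """First self-recommendation condition: the hardware state is healthy enough."""
    return (status.cpu_load <= MAX_CPU_LOAD
            and status.free_storage_mb >= MIN_FREE_STORAGE_MB)


def meets_second_condition(status: BoardStatus) -> bool:
    """Second self-recommendation condition: the current service can be interrupted."""
    return status.service_scenario not in BUSY_SCENARIOS


def should_self_recommend(status: BoardStatus) -> bool:
    """Check the hardware state first; only then look at the service scenario."""
    if not meets_first_condition(status):
        return False
    return meets_second_condition(status)
```

Only when `should_self_recommend` returns `True` would the board card go on to broadcast its self-recommendation information.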
4. The method of claim 1, further comprising, prior to the step of performing system fault recovery on the second board card:
sending system fault recovery prompt information;
acquiring response information of a user to the system fault recovery prompt information;
and executing the step of performing system fault recovery on the second board card if the response information indicates that the user agrees to the system fault recovery.
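The user-confirmation gate of claim 4 can be sketched in a few lines. The prompt text, the accepted answers, and the `prompt_user` and `recover` callbacks are assumptions; the claims do not say how the prompt is delivered or how the response is encoded.

```python
from typing import Callable


def recover_with_consent(prompt_user: Callable[[str], str],
                         recover: Callable[[], None]) -> bool:
    """Run recovery only if the user agrees; return whether recovery was run."""
    # Send the system fault recovery prompt and collect the user's response.
    response = prompt_user("A board card has a system fault. Recover now? [yes/no]")
    # Assumed encoding of agreement: "yes" or "y", case-insensitive.
    if response.strip().lower() in {"yes", "y"}:
        recover()
        return True
    return False
```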
5. The method of claim 1, wherein the step of performing system fault recovery on the second board card comprises:
turning off a power supply of the second board card;
controlling, through an I/O pin connected between the board cards, a system on the second board card to enter a fault recovery mode, so that the second board card performs system fault recovery;
receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed;
determining, according to the system fault recovery completion notification, whether the second board card has completed system fault recovery;
if so, turning off the power supply of the second board card and turning it on again, so that the second board card performs a system restart.
6. The method of claim 1, wherein the step of performing system fault recovery on the second board card comprises:
switching the boot device of the second board card to a backup device of the second board card by setting a hardware I/O state;
resetting the second board card, so that the second board card resets and starts, switches its boot device to the backup device after reading the hardware I/O state, and restores the files for system fault recovery from the backup device to the main device of the second board card;
receiving a system fault recovery completion notification sent by the second board card after the system fault recovery is completed;
determining, according to the system fault recovery completion notification, whether the second board card has completed system fault recovery;
if so, clearing the hardware I/O state;
and sending a reset notification to the second board card, so that the second board card resets and starts.
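The backup-device recovery sequence of claim 6 can be sketched from the first board card's point of view. The `SecondBoard` interface, its method names, and the timeout are hypothetical; `FakeBoard` is a stand-in that only records the order of operations. The claim does not say what happens when recovery does not complete, so returning `False` and leaving the hardware I/O state set is an assumption of this sketch.

```python
from typing import List, Protocol


class SecondBoard(Protocol):
    """Hypothetical control interface to the second board card."""
    def set_boot_io(self, use_backup: bool) -> None: ...
    def reset(self) -> None: ...
    def wait_recovery_done(self, timeout_s: float) -> bool: ...
    def send_reset_notification(self) -> None: ...


def recover_from_backup(board: SecondBoard, timeout_s: float = 60.0) -> bool:
    """Drive the claimed sequence; return True if recovery completed."""
    board.set_boot_io(use_backup=True)   # set the hardware I/O state
    board.reset()                        # board boots from the backup device
    if not board.wait_recovery_done(timeout_s):
        return False                     # assumed handling: stop here
    board.set_boot_io(use_backup=False)  # clear the hardware I/O state
    board.send_reset_notification()      # board resets and boots normally
    return True


class FakeBoard:
    """Test double that logs each call instead of touching hardware."""
    def __init__(self, recovers: bool) -> None:
        self.recovers = recovers
        self.log: List[str] = []

    def set_boot_io(self, use_backup: bool) -> None:
        self.log.append("io=backup" if use_backup else "io=main")

    def reset(self) -> None:
        self.log.append("reset")

    def wait_recovery_done(self, timeout_s: float) -> bool:
        self.log.append("wait")
        return self.recovers

    def send_reset_notification(self) -> None:
        self.log.append("reset_notification")
```

Clearing the I/O state before the final reset matters: otherwise the second board card would read the state again on startup and boot from the backup device a second time.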
7. The method of claim 1, wherein the step of performing system fault recovery on the second board card comprises:
switching the boot mode of the second board card to a preset upgrade mode by setting a hardware I/O state;
resetting the second board card, so that the second board card resets and starts, and enters the preset upgrade mode after reading the hardware I/O state;
upgrading the second board card according to the preset upgrade mode;
after determining that the upgrade of the second board card is completed, clearing the hardware I/O state;
and sending a reset notification to the second board card, so that the second board card resets and starts.
8. The method of claim 1, wherein the step of performing system fault recovery on the second board card comprises:
upgrading the second board card by running a preset flash memory programming program;
and after determining that the upgrade of the second board card is completed, resetting the second board card, so that the second board card resets and starts.
9. A system fault recovery device, characterized in that the device is applied to a first board card in an intelligent device, wherein the first board card is any one of the board cards in the intelligent device;
the device comprises:
an information broadcasting module, used for broadcasting, after it is determined that a system fault has occurred in a second board card, fault information indicating that the system fault has occurred in the second board card, wherein the second board card is a board card in the intelligent device other than the first board card;
a recovery judging module, used for judging, upon receiving fault information, broadcast by a board card in the intelligent device other than the first board card, indicating that the system fault has occurred in the second board card, whether the first board card itself can be used to perform system fault recovery on the second board card, and if so, triggering a fault recovery module;
the fault recovery module, used for performing system fault recovery on the second board card;
wherein the recovery judging module comprises:
an information receiving determining submodule, used for determining that the fault information indicating that the system fault has occurred in the second board card, broadcast by a board card in the intelligent device other than the first board card, has been received;
an information broadcasting submodule, used for broadcasting the first board card's own self-recommendation information and monitoring for self-recommendation information broadcast by other board cards, triggering a recovery judgment submodule when self-recommendation information broadcast by other board cards is detected, and triggering a recovery determination submodule when no self-recommendation information broadcast by other board cards is detected;
the recovery judgment submodule, used for judging, according to the first board card's own self-recommendation information and the received self-recommendation information, whether the first board card itself can be used to perform system fault recovery on the second board card;
and the recovery determination submodule, used for determining that the first board card itself can be used to perform system fault recovery on the second board card.
10. The device of claim 9, further comprising: a fault determination module;
the fault determination module is used for determining whether the second board card has a system fault;
the fault determination module comprises:
a bus determining submodule, used for determining a target bus from the buses of the intelligent device;
a message sending submodule, used for sending fault detection messages to the second board card through each target bus;
a response detection submodule, used for monitoring whether no fault detection response from the second board card is received through any of the target buses, and if so, triggering a fault determining submodule;
the fault determining submodule, used for determining that the second board card has a system fault;
the information broadcasting module is specifically configured to broadcast, through each target bus, the fault information indicating that the system fault has occurred in the second board card.
11. The device of claim 9, wherein the recovery judging module further comprises:
a state information obtaining submodule, used for obtaining the first board card's own hardware state information after the information receiving determining submodule determines that the fault information has been received, judging whether the hardware state information meets a first self-recommendation condition, and if so, triggering a scenario obtaining submodule;
the scenario obtaining submodule, used for obtaining the first board card's own service scenario;
and a condition judgment submodule, used for judging whether the service scenario meets a second self-recommendation condition, and if so, triggering the information broadcasting submodule.
12. The device of claim 9, further comprising:
an information sending module, used for sending system fault recovery prompt information;
and an information acquisition module, used for acquiring response information of a user to the system fault recovery prompt information, and triggering the fault recovery module if the response information indicates that the user agrees to the system fault recovery.
13. The device of claim 9, wherein
the fault recovery module is specifically configured to: turn off a power supply of the second board card; control, through an I/O pin connected between the board cards, a system on the second board card to enter a fault recovery mode, so that the second board card performs system fault recovery; receive a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determine, according to the system fault recovery completion notification, whether the second board card has completed system fault recovery; and if so, turn off the power supply of the second board card and turn it on again, so that the second board card performs a system restart.
14. The device of claim 9, wherein
the fault recovery module is specifically configured to: switch the boot device of the second board card to a backup device of the second board card by setting a hardware I/O state; reset the second board card, so that the second board card resets and starts, switches its boot device to the backup device after reading the hardware I/O state, and restores the files for system fault recovery from the backup device to the main device of the second board card; receive a system fault recovery completion notification sent by the second board card after the system fault recovery is completed; determine, according to the system fault recovery completion notification, whether the second board card has completed system fault recovery; if so, clear the hardware I/O state; and send a reset notification to the second board card, so that the second board card resets and starts.
15. The device of claim 9, wherein
the fault recovery module is specifically configured to: switch the boot mode of the second board card to a preset upgrade mode by setting a hardware I/O state; reset the second board card, so that the second board card resets and starts, and enters the preset upgrade mode after reading the hardware I/O state; upgrade the second board card according to the preset upgrade mode; after determining that the upgrade of the second board card is completed, clear the hardware I/O state; and send a reset notification to the second board card, so that the second board card resets and starts.
16. The device of claim 9, wherein
the fault recovery module is specifically configured to: upgrade the second board card by running a preset flash memory programming program; and after determining that the upgrade of the second board card is completed, reset the second board card, so that the second board card resets and starts.
17. An electronic device, the electronic device being a first board card in an intelligent device, characterized by comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the method steps of any of claims 1 to 8 when executing the program stored in the memory.
18. A computer-readable storage medium, being a readable storage medium of a first board card in an intelligent device, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 8.
CN201710417137.0A 2017-06-06 2017-06-06 System fault recovery method and device Active CN108958989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710417137.0A CN108958989B (en) 2017-06-06 2017-06-06 System fault recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710417137.0A CN108958989B (en) 2017-06-06 2017-06-06 System fault recovery method and device

Publications (2)

Publication Number Publication Date
CN108958989A CN108958989A (en) 2018-12-07
CN108958989B true CN108958989B (en) 2021-09-17

Family

ID=64495057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710417137.0A Active CN108958989B (en) 2017-06-06 2017-06-06 System fault recovery method and device

Country Status (1)

Country Link
CN (1) CN108958989B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538613B (en) * 2020-04-28 2023-06-13 浙江大华技术股份有限公司 Cluster system exception recovery processing method and device
CN114185603B (en) * 2021-11-08 2024-01-05 深圳云天励飞技术股份有限公司 Control method of intelligent accelerator card, server and intelligent accelerator card
CN114928640A (en) * 2022-04-22 2022-08-19 西安万像电子科技有限公司 Exception handling method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101135984A (en) * 2007-01-08 2008-03-05 中兴通讯股份有限公司 Hardware information backup device, and method for backup operation information and saving detecting information
CN102968352A (en) * 2012-12-14 2013-03-13 杨晓松 System and method for process monitoring and multi-stage recovery
CN103618618A (en) * 2013-11-13 2014-03-05 福建星网锐捷网络有限公司 Line card fault recovery method and related device based on distributed PCIE system
CN104635718A (en) * 2013-11-12 2015-05-20 沈阳新松机器人自动化股份有限公司 Robot fault repairing system and method
CN105005395A (en) * 2015-07-18 2015-10-28 成都生辉电子科技有限公司 Method for setting spare key of intelligent equipment
CN105550056A (en) * 2015-12-11 2016-05-04 中国航空工业集团公司西安航空计算技术研究所 System reconfiguration based fault self-recovery system and realization method therefor
CN106370949A (en) * 2016-08-31 2017-02-01 北京术锐技术有限公司 Operation robot incomplete running state fault detection method
CN106375114A (en) * 2016-08-26 2017-02-01 迈普通信技术股份有限公司 Hot plug fault recovery method and distributed device
CN106789306A (en) * 2016-12-30 2017-05-31 深圳市风云实业有限公司 Restoration methods and system are collected in communication equipment software fault detect

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128555A (en) * 1997-05-29 2000-10-03 Trw Inc. In situ method and system for autonomous fault detection, isolation and recovery
KR101275707B1 (en) * 2011-12-22 2013-07-30 (주)소만사 Network based data loss prevention appliance system of multi instance structure which assures high availability and provides mirroring, in-line, and mirroring/in-line dual network adjustment method and the operating method thereof
CN104102572A (en) * 2013-04-01 2014-10-15 中兴通讯股份有限公司 Method and device for detecting and processing system faults
CN105071968A (en) * 2015-08-18 2015-11-18 大唐移动通信设备有限公司 Method and device for repairing hidden failures of service plane and control plane of communication device

Also Published As

Publication number Publication date
CN108958989A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108228374B (en) Equipment fault processing method, device and system
CN108958989B (en) System fault recovery method and device
CN107315656B (en) Multi-kernel embedded PLC software recovery method and PLC
CN102455950A (en) Firmware recovery system and method of base board management controller
CN109143954B (en) System and method for realizing controller reset
CN100492305C (en) Fast restoration method of computer system and apparatus
CN106610712B (en) Substrate management controller resetting system and method
CN104079454A (en) Equipment exception detecting method and equipment
CN109766197B (en) 4G module stable working method based on Android system
CN114116280A (en) Interactive BMC self-recovery method, system, terminal and storage medium
CN110445932B (en) Abnormal card dropping processing method and device, storage medium and terminal
CN106776206A (en) The method of monitor process state, device and electronic equipment
CN117130832A (en) Monitoring reset method and system of multi-core heterogeneous system, chip and electronic equipment
KR20210113595A (en) Anomaly handling method, terminal device and storage medium
CN111880992B (en) Monitoring and maintaining method for controller state in storage device
CN111371642B (en) Network card fault detection method, device, equipment and storage medium
CN113412480B (en) Mounting processing method, mounting processing device, electronic equipment and computer readable storage medium
CN107273291B (en) Processor debugging method and system
CN111130856A (en) Server configuration method, system, equipment and computer readable storage medium
CN101971562B (en) Method, device and system for controlling automatic running process performance
CN114647531B (en) Failure solving method, failure solving system, electronic device, and storage medium
CN113434354B (en) Bus exception handling method and device, electronic equipment and readable storage medium
CN115904770A (en) Process recovery method and device, electronic equipment and storage medium
CN108664361B (en) PCIE non-transparent channel repairing method and device
CN114610530A (en) Disaster tolerance switching method and device for business system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant