CN111124749A - Method and system for automatically repairing BMC (baseboard management controller) system of tightly-coupled high-performance computer system - Google Patents

Publication number
CN111124749A
Authority
CN
China
Prior art keywords
bmc
file
starting
boot
computer system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910839696.XA
Other languages
Chinese (zh)
Inventor
吴智
张春林
韩小虎
张祯
建澜涛
黄益明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-06
Filing date
2019-09-06
Publication date
2020-05-08
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910839696.XA
Publication of CN111124749A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1435 Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading

Abstract

The invention belongs to the field of high-performance computer system maintenance, and particularly relates to an automatic repair method and repair system for the BMC (baseboard management controller) system of a tightly coupled high-performance computer system. The method is characterized in that, when the management system detects that a BMC has failed to boot, it controls the BMC to restart, and the BMC acquires the BMC boot file required for booting from the network file service system and loads it into the BMC's own memory to run. With this scheme, the BMC boot file only needs to be stored in the remote file system; if a BMC fails to boot because Flash bad blocks have crashed its system, it can acquire the file required for booting through the remote network file system and repair itself automatically. The BMCs of multiple nodes in the computer system share a single stored copy of the BMC boot file in the remote file system, which saves resources and simplifies maintenance, and the BMCs of multiple nodes can acquire the BMC boot file from the network file system simultaneously, which improves maintenance efficiency.

Description

Method and system for automatically repairing BMC (baseboard management controller) system of tightly-coupled high-performance computer system
Technical Field
The invention belongs to the field of high-performance computer system maintenance, and particularly relates to an automatic repair method for the BMC (baseboard management controller) system of a high-performance computer system.
Background
As the computing performance of high-performance computers continues to improve, the number of nodes in the host system increases dramatically, and the scale of the distributed maintenance system, whose main task is maintaining those nodes, grows with it. Tightly coupled high-performance computing systems currently deploy an extremely large number of baseboard management controllers (BMCs), and because the BMCs are an important component of the maintenance system, their reliability is particularly important.
The core component of a BMC is an embedded system. The BMC usually stores its own operating system files in high-capacity Flash, and Flash is prone, with a certain probability, to bad blocks and failures that can cause the BMC system to fail to boot. One current improvement uses two-level storage: the bootloader is stored in NOR Flash, which is more reliable but has a higher unit price and smaller capacity, while the kernel and file system are stored in eMMC NAND Flash, which is less reliable but has larger capacity. When a bad block in the eMMC NAND Flash causes the kernel or file system to fail, the Flash is re-burned offline. However, this approach is inefficient in large-scale systems: it lengthens the mean time to recover from a failure and consequently reduces the reliability and availability of the maintenance system.
The invention patent application with application publication number CN103246583A, published on August 14, 2013, discloses an electronic device with a CPU BIOS repair function and a repair method. The electronic device comprises a connection port that electrically connects the CPU system to an external device. When the boot module of the CPU detects that its BIOS version is wrong, it sends an instruction to the connected external device and copies the CPU BIOS stored there into the flash memory. Therefore, when the BIOS in the flash memory cannot be loaded normally, the CPU can load the BIOS from the external device connected to the electronic device. That method is mainly aimed at BIOS updating; a BIOS file occupies little space, so several copies can be stored locally on the external device. The present method, by contrast, mainly addresses repair of the kernel and root file system, which occupy a large space and are therefore not suited to the limited storage of a local embedded system. In addition, in that method the whole process of BIOS version detection, version search, and update is initiated by the CPU system being repaired, which places high demands on that system and makes the CPU boot firmware complex, so the method is not suitable for repairing an embedded system whose functions are simpler than those of an ordinary CPU system.
In a tightly coupled high-performance computing system, the BMCs of the whole system are connected via Ethernet. To improve maintenance efficiency, a tree-type management architecture is generally used, that is, a management system is placed above the first-layer BMCs to manage them.
Disclosure of Invention
The invention aims to enable a BMC to recover automatically from a system crash caused by Flash bad blocks and thereby improve the reliability of the maintenance system of a high-performance computer system. To this end it provides an automatic BMC system repair method for a tightly coupled high-performance computer system, characterized by comprising the following step:
when booting fails, the BMC acquires the BMC boot file required for booting from the network file service system and loads it into its own memory to run.
With this scheme, the BMC boot file only needs to be stored in the remote file system; if a BMC fails to boot because Flash bad blocks have crashed its system, it can acquire the file required for booting through the remote network file system and recover automatically. The BMCs of multiple nodes in the computer system share a single stored copy of the BMC boot file in the remote file system, which saves resources and simplifies maintenance, and the BMCs of multiple nodes can acquire the BMC boot file from the network file system simultaneously, which improves maintenance efficiency.
Further, after the BMC acquires the BMC boot file required for booting from the network file service system upon boot failure and loads it into its memory to run, the method further includes: the BMC stores the BMC boot file. The BMC boot file acquired from the network file system is used to update the BMC boot file stored locally in the BMC, so that the BMC can subsequently boot normally from the locally updated BMC boot file.
Further, the BMC boot file refers to a BMC kernel and a BMC root file system.
Further, the step in which the BMC, upon boot failure, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run specifically comprises: monitoring the boot state of each BMC in the computer system and instructing a BMC to enter an automatic repair mode when its boot failure is detected; and, in the automatic repair mode, the BMC acquiring the BMC boot file required for booting from the network file service system and loading it into its memory to run. The boot state can be monitored over the serial communication interface that already exists between the BMC of each node and the management system that existing tightly coupled high-performance computers place above the BMCs, so the cost of the upgrade is low.
Further, monitoring the boot state of each BMC in the computer system specifically comprises: monitoring the serial port output signals of each BMC in the computer system to determine the boot state of each BMC.
Further, instructing the BMC to enter the automatic repair mode when its boot failure is detected specifically comprises: sending a reset command to the BMC when its boot failure is detected; and, after the BMC has reset according to the received reset command, the management system controlling the BMC through the serial port to start the network-based automatic repair.
Further, the step in which the BMC, in the automatic repair mode, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run specifically comprises: the management system controls the BMC to communicate with the network file service system over Ethernet; the BMC downloads the BMC boot file from the network file service system; and the BMC loads the downloaded BMC boot file into its memory to run.
The invention also provides an automatic BMC system repair system for a high-performance computer system, characterized by comprising: a management system, used for monitoring the boot state of each BMC of the computer system and instructing a BMC to enter an automatic repair mode when its boot failure is detected; a network file system, used for storing the BMC boot file; and the BMCs of the computer system, each of which, in the automatic repair mode, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run.
With this scheme, the BMC boot file only needs to be stored in the remote file system; if a BMC fails to boot because Flash bad blocks have crashed its system, it can acquire the file required for booting through the remote network file system and recover automatically. The BMCs of multiple nodes in the computer system share a single stored copy of the BMC boot file in the remote file system, which saves resources and simplifies maintenance, and the BMCs of multiple nodes can acquire the BMC boot file from the network file system simultaneously, which improves maintenance efficiency.
Further, the management system comprises a serial port monitoring program that monitors the serial port output signals of each BMC in the computer system to determine the boot state of each BMC.
Further, the BMC comprises a first storage module used for storing the BMC boot loader and a second storage module used for storing the BMC kernel and the BMC root file system.
The invention has the following beneficial effects:
1. System boot failures caused by Flash bad blocks can be repaired without manual intervention; this self-repair capability can greatly improve system reliability.
2. The method can acquire the files required by starting through a remote network file system to realize automatic recovery, and is suitable for BMC maintenance of large-scale systems.
3. Resources are saved, maintenance is convenient, and the BMC of the nodes can acquire BMC files from the network file system at the same time, so that the maintenance efficiency is improved.
4. The software and hardware resources of the existing maintenance system in the tightly coupled high-performance computing system are fully utilized, and additional hardware is not needed.
Drawings
Fig. 1 is a system diagram of an automatic repair system according to a first embodiment of the present invention.
Detailed Description
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that the conventional terms should be interpreted as having a meaning that is consistent with their meaning in the relevant art and this disclosure. The present disclosure is to be considered as an example of the invention and is not intended to limit the invention to the particular embodiments.
Embodiment one
This embodiment provides a system for automatically repairing the BMC (baseboard management controller) system of a tightly coupled high-performance computer system. It is suitable for a computer system that includes BMCs, for example a high-performance computer system whose host system contains a large number of nodes. Such a system deploys a very large number of BMCs as a distributed maintenance system to maintain its nodes.
The automatic BMC system repair system of this embodiment includes a management system, a network file system, and the BMCs deployed in the high-performance computer system. A BMC usually stores its operating system files in its own storage, such as large-capacity Flash, and because Flash suffers bad blocks and failures with a certain probability, the BMC system may fail to boot. The BMC in this embodiment therefore stores the files required for booting in a two-level manner, using two different types of storage module: the first storage module has higher reliability and a higher unit price, and the second storage module has lower reliability and a correspondingly lower unit price. For example, the first storage module may be implemented with NOR Flash and the second storage module with eMMC NAND Flash. The boot loader (Bootloader) of the BMC is stored in the first storage module, and the BMC boot file, consisting of the BMC kernel (Kernel) and the BMC root file system (Rootfs), is stored in the second storage module. Under normal conditions, the BMC runs the boot loader after power-on; the boot loader loads the BMC kernel from the second storage module into the memory of the BMC and passes the kernel boot parameters to the BMC kernel. The boot parameters comprise the local path of the BMC kernel in the BMC and the local path of the BMC root file system in the BMC. The initialization scripts and services of the BMC root file system in the second storage module are then loaded into the memory of the BMC to run, which completes the boot of the BMC. When the second storage module of the BMC has a bad block or fails, the boot loader cannot load the BMC boot file after power-on, which can crash the system and cause the BMC boot to fail.
The network file system stores the BMC boot files required to boot each BMC in the computer system. Each BMC in the computer system can access a file on the network file system over a network (e.g., Ethernet) as if it were a local file, and so obtain the BMC boot file required for its boot. The management system can allocate an IP address to each BMC of the computer system so that the BMC can establish a network connection with the network file system and access it.
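The address allocation described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `allocate_bmc_addresses`, the BMC identifiers, and the idea of handing out consecutive host addresses from one management subnet are all assumptions made for illustration.

```python
import ipaddress
from typing import Dict, Iterable


def allocate_bmc_addresses(bmc_ids: Iterable[str], subnet: str) -> Dict[str, str]:
    """Assign one usable host address from `subnet` to each BMC identifier.

    Hypothetical helper: the patent only states that the management system
    allocates an IP address to each BMC, not how it chooses the address.
    Raises ValueError if the subnet has fewer usable hosts than BMCs.
    """
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    table: Dict[str, str] = {}
    for bmc_id in bmc_ids:
        try:
            table[bmc_id] = str(next(hosts))
        except StopIteration:
            raise ValueError(f"subnet {subnet} is too small") from None
    return table
```

A call such as `allocate_bmc_addresses(["bmc-0", "bmc-1"], "10.0.0.0/30")` would give each BMC its own address in that subnet; a real deployment would more likely keep a persistent mapping per node.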
The management system monitors the boot state of each BMC of the computer system and instructs a BMC to enter an automatic repair mode when its boot failure is detected. In its working state, the management system of this embodiment runs a serial port monitoring program. Through the ports by which the management system communicates with the serial port of each BMC in the computer system, this program monitors the serial port output signals of each BMC and judges the boot state of the BMC from them, thereby monitoring the boot state of every BMC.
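A minimal Python sketch of how such a monitoring program might judge the boot state from serial output follows. The marker strings, the `BootState` names, and the rule that silence past a deadline counts as failure are assumptions; the patent does not specify what the BMC prints or how the output is interpreted.

```python
from enum import Enum
from typing import Iterable, Sequence


class BootState(Enum):
    BOOTING = "booting"
    STARTED = "started"
    FAILED = "failed"


# Hypothetical marker strings; real firmware prints its own messages.
SUCCESS_MARKERS: Sequence[str] = ("login:",)
FAILURE_MARKERS: Sequence[str] = ("Kernel panic", "bad block")


def classify_boot_state(serial_lines: Iterable[str], timed_out: bool = False) -> BootState:
    """Judge a BMC's boot state from the lines seen on its serial port."""
    for line in serial_lines:
        if any(marker in line for marker in FAILURE_MARKERS):
            return BootState.FAILED
        if any(marker in line for marker in SUCCESS_MARKERS):
            return BootState.STARTED
    # No decisive line was seen: silence past the deadline counts as failure.
    return BootState.FAILED if timed_out else BootState.BOOTING
```

Keeping the classifier a pure function of the captured lines makes it easy to run against recorded serial logs, independently of the code that reads the serial ports.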
When the management system detects through the serial port that one or more BMCs have failed to boot, it sends a reset command (rst) to each BMC that failed. After the BMC has reset, the management system controls it through the serial port to start the network-based automatic repair. The BMC runs the boot loader after power-on. The management system allocates an IP address to the BMC, and the BMC uses this address to establish Ethernet communication with the network file service system so that it can access the BMC boot file stored in the network file system. The boot loader of the BMC loads the BMC kernel from the network file system into the memory of the BMC and passes the kernel boot parameters to the BMC kernel. The boot parameters comprise the IP address information of the BMC, the network path of the BMC kernel, and the network path of the BMC root file system. The initialization scripts and services of the BMC root file system in the network file system are then loaded into the memory of the BMC to run, which completes the boot of the BMC.
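The boot parameters used in repair mode can be sketched as a small Python data structure. This is an illustration under stated assumptions: the patent does not name a protocol, so the sketch assumes a Linux kernel that mounts its root over NFS using the kernel's `root=/dev/nfs`, `nfsroot=`, and `ip=` parameters; the class name, field names, and the added netmask field are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkBootParams:
    bmc_ip: str       # address the management system assigned to this BMC
    server_ip: str    # address of the network file service system
    netmask: str      # assumption: not listed in the patent's parameters
    kernel_path: str  # network path of the BMC kernel
    rootfs_path: str  # network path of the BMC root file system

    def kernel_source(self) -> str:
        """Location from which the boot loader fetches the kernel."""
        return f"{self.server_ip}:{self.kernel_path}"

    def kernel_cmdline(self) -> str:
        """Render Linux-style boot arguments for an NFS-mounted root.

        The ip= value follows <client-ip>:<server-ip>:<gw-ip>:<netmask>,
        with the gateway field left empty.
        """
        return (
            "root=/dev/nfs "
            f"nfsroot={self.server_ip}:{self.rootfs_path} "
            f"ip={self.bmc_ip}:{self.server_ip}::{self.netmask}"
        )
```

In normal operation the equivalent parameters would instead point at local paths in the second storage module; only the parameter values change between the two modes, which is why the boot loader in the reliable first storage module can serve both.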
In another embodiment, after the BMC has booted successfully by loading the BMC boot file from the network file system, it also rewrites the downloaded BMC boot file into the second storage module to complete the system repair.
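The write-back step might look like the following Python sketch. It makes two assumptions that are not in the patent: the second storage module is treated as a mounted file system rather than a raw Flash partition, and the copy is verified with a SHA-256 checksum before it replaces the damaged file. The names `write_back` and `sha256_of` are hypothetical.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_back(downloaded: Path, local_target: Path) -> None:
    """Replace local_target with the downloaded boot file.

    The file is first copied next to the target and verified, so a bad
    copy never overwrites the existing local file.
    """
    staging = local_target.with_name(local_target.name + ".new")
    shutil.copyfile(downloaded, staging)
    if sha256_of(staging) != sha256_of(downloaded):
        staging.unlink()
        raise OSError("copy does not match the downloaded boot file")
    os.replace(staging, local_target)  # atomic when both are on one file system
```

Verifying before replacing matters here because the target medium is the one that just produced a bad block; a copy that silently lands on another bad block would otherwise leave the BMC unable to boot locally again.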
Embodiment two
The automatic BMC system repair method of this embodiment is suitable for a computer system that includes BMCs, for example a high-performance computer system whose host system contains a large number of nodes. Such a system deploys a very large number of BMCs as a distributed maintenance system to maintain its nodes.
In the method of this embodiment, when the management system detects that the BMC of a node has failed to boot, it controls that BMC to acquire the BMC boot file required for booting from the network file service system and load it into the BMC's memory to run. The method specifically comprises the following steps:
Step S1: monitor the boot state of each BMC in the computer system and instruct a BMC to enter an automatic repair mode when its boot failure is detected. In this step, the management system described in the first embodiment monitors the serial port output signals of each BMC in the computer system to determine the boot state of each BMC. When a boot failure is detected, the management system sends a reset command to the BMC that failed, and that BMC then completes loading from the network file system and the repair step by step under the control of the management system.
Step S2: in the automatic repair mode, the BMC acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run. Specifically, the BMC runs the boot loader after power-on. The management system allocates an IP address to the BMC, and the BMC uses this address to establish Ethernet communication with the network file service system so that it can access the BMC boot file stored in the network file system. The BMC boot file refers to the files necessary for booting the BMC system; in this embodiment it includes the BMC kernel and the BMC root file system. The boot loader of the BMC loads the BMC kernel from the network file system into the memory of the BMC and passes the kernel boot parameters to the BMC kernel. The boot parameters comprise the IP address information of the BMC, the network path of the BMC kernel, and the network path of the BMC root file system. The initialization scripts and services of the BMC root file system in the network file system are then loaded into the memory of the BMC to run, which completes the boot of the BMC.
Step S3: the BMC stores the BMC boot file. After the restart in step S2 succeeds, the BMC rewrites the BMC boot file downloaded from the network file system in step S2 into the second storage module. The BMC boot file acquired from the network file system thus updates the BMC boot file stored locally in the BMC, so that the BMC can subsequently boot normally from the locally updated BMC boot file.
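Steps S1 to S3 can be traced end to end with a toy Python simulation. Everything in it is hypothetical: `SimulatedBmc` stands in for a real BMC and its serial and network interfaces, and the boot outcomes are plain flags rather than real hardware behavior. The sketch only shows the order of operations the method describes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulatedBmc:
    """Toy stand-in for one BMC; every field here is hypothetical."""
    name: str
    local_image_ok: bool
    running: bool = False
    log: List[str] = field(default_factory=list)

    def boot_local(self) -> bool:
        self.running = self.local_image_ok
        self.log.append("local boot ok" if self.running else "local boot failed")
        return self.running

    def reset(self) -> None:
        self.running = False
        self.log.append("reset")

    def boot_from_network(self, image: bytes) -> bool:
        self.running = bool(image)
        self.log.append("network boot ok" if self.running else "network boot failed")
        return self.running

    def store_image(self, image: bytes) -> None:
        self.local_image_ok = True
        self.log.append("image written back")


def repair_if_needed(bmc: SimulatedBmc, network_image: bytes) -> bool:
    """Run steps S1 to S3 for one BMC and report whether it ends up running."""
    if bmc.boot_local():                          # S1: boot state is monitored
        return True
    bmc.reset()                                   # S1: reset, enter repair mode
    if not bmc.boot_from_network(network_image):  # S2: load from the network
        return False
    bmc.store_image(network_image)                # S3: update the local copy
    return True
```

For a BMC created as `SimulatedBmc("node-0", local_image_ok=False)`, calling `repair_if_needed(bmc, b"image")` walks through the failed local boot, the reset, the network boot, and the write-back in that order, and records each step in `bmc.log`.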
Although embodiments of the present invention have been described, various changes or modifications may be made by one of ordinary skill in the art within the scope of the appended claims.

Claims (10)

1. A method for automatically repairing the BMC system of a tightly coupled high-performance computer system, suitable for a computer system comprising a BMC, characterized by comprising the following step:
when booting fails, the BMC acquires the BMC boot file required for booting from the network file service system and loads it into its own memory to run.
2. The method of claim 1, wherein after the BMC acquires the BMC boot file required for booting from the network file service system upon boot failure and loads it into its memory to run, the method further comprises:
the BMC stores the BMC boot file.
3. The method of claim 1, wherein:
the BMC boot file refers to a BMC kernel and a BMC root file system.
4. The method of claim 1, wherein the step in which the BMC, upon boot failure, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run specifically comprises:
monitoring the boot state of each BMC in the computer system and instructing a BMC to enter an automatic repair mode when its boot failure is detected;
in the automatic repair mode, the BMC acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run.
5. The method of claim 4, wherein monitoring the boot state of each BMC in the computer system specifically comprises:
monitoring the serial port output signals of each BMC in the computer system to determine the boot state of each BMC.
6. The method of claim 4, wherein instructing the BMC to enter the automatic repair mode when its boot failure is detected specifically comprises:
sending a reset command to the BMC when its boot failure is detected;
the BMC sets the automatic repair mode identifier to valid according to the received reset command;
the BMC restarts.
7. The method of claim 4, wherein the step in which the BMC, in the automatic repair mode, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run specifically comprises:
the BMC communicates with the network file service system over Ethernet;
the BMC downloads the BMC boot file from the network file service system;
the BMC loads the downloaded BMC boot file into its memory to run.
8. A system for automatically repairing the BMC (baseboard management controller) system of a tightly coupled high-performance computer system, characterized by comprising:
a management system, used for monitoring the boot state of each BMC of the computer system and instructing a BMC to enter an automatic repair mode when its boot failure is detected;
a network file system, used for storing the BMC boot file;
and the BMCs of the computer system, each of which, in the automatic repair mode, acquires the BMC boot file required for booting from the network file service system and loads it into its memory to run.
9. The system of claim 8, wherein:
the management system comprises a serial port monitoring program used for monitoring the serial port output signals of each BMC in the computer system to determine the boot state of each BMC.
10. The system of claim 8, wherein:
the BMC comprises a first storage module used for storing the BMC boot loader and a second storage module used for storing the BMC kernel and the BMC root file system.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910839696.XA CN111124749A (en) 2019-09-06 2019-09-06 Method and system for automatically repairing BMC (baseboard management controller) system of tightly-coupled high-performance computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910839696.XA CN111124749A (en) 2019-09-06 2019-09-06 Method and system for automatically repairing BMC (baseboard management controller) system of tightly-coupled high-performance computer system

Publications (1)

Publication Number Publication Date
CN111124749A true CN111124749A (en) 2020-05-08

Family

ID=70495275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910839696.XA Pending CN111124749A (en) 2019-09-06 2019-09-06 Method and system for automatically repairing BMC (baseboard management controller) system of tightly-coupled high-performance computer system

Country Status (1)

Country Link
CN (1) CN111124749A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953803A (en) * 2020-07-07 2020-11-17 锐捷网络股份有限公司 BMC starting method, equipment, system and storage medium
CN113127030A (en) * 2021-03-18 2021-07-16 山东英信计算机技术有限公司 Multi-node server BMC loading method, system, device and storage medium
EP4124957A3 (en) * 2021-09-08 2023-05-03 Beijing Baidu Netcom Science Technology Co., Ltd. Core board, server, fault repairing method and apparatus, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232397A (en) * 2008-02-22 2008-07-30 华为技术有限公司 Apparatus and method for renovating multi controller systems
CN103595572A (en) * 2013-11-27 2014-02-19 牛永伟 Selfreparing method of nodes in cloud computing cluster
CN103902327A (en) * 2012-12-29 2014-07-02 鸿富锦精密工业(深圳)有限公司 BMC starting system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508