WO2018095107A1 - Method and apparatus for handling exceptions of a BIOS program - Google Patents

Method and apparatus for handling exceptions of a BIOS program

Publication number
WO2018095107A1
Authority
WO
WIPO (PCT)
Prior art keywords
bios program
bios
program
determining
main
Application number
PCT/CN2017/100375
Other languages
English (en)
French (fr)
Inventor
陈莹亮
张德
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2018095107A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3644 Software debugging by instrumenting at runtime
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Definitions

  • The present invention relates to the field of communications, and in particular to a method and an apparatus for handling exceptions of a BIOS program.
  • The BIOS (Basic Input Output System) is solidified in a storage medium on the server motherboard.
  • The BIOS stores the server's most important basic input and output programs, the power-on self-test program, and the system boot program.
  • Its main function is to provide the lowest-level and most direct hardware setup and control for the server, so the reliability and importance of the BIOS are self-evident.
  • Traditional servers often use a single-BIOS design, meaning that there is only one BIOS on a server, which is less reliable.
  • When the BIOS program is abnormal or the storage medium is damaged, service can resume only after professional operation and maintenance personnel have repaired it. A long service interruption often causes considerable economic loss to the customer.
  • The embodiments of the present invention provide a method and an apparatus for handling exceptions of a BIOS program, so that when a first BIOS program fails, a second BIOS can be started as the main BIOS program for service processing.
  • In a first aspect, an embodiment of the present invention provides a method for handling exceptions of a BIOS program, including:
  • The first BIOS program is one of the N BIOS programs of the physical device, N is an integer greater than or equal to 2, and the first BIOS program is the main BIOS program used to start the physical device.
  • After the first BIOS program is determined to have failed, a second BIOS program of the N BIOS programs is determined as the main BIOS program, and the second BIOS program is triggered to start the device in the role of the main BIOS program.
  • Because the device is restarted with the second BIOS in the role of the main BIOS program once the first BIOS program is determined to have failed, the device can resolve the BIOS failure in a short time, the service processing required of the device is not interrupted for a long time, and the loss caused by the BIOS program failure is reduced.
  • With reference to the first aspect, in a first possible implementation, the method further includes handling an upgrade fault that occurs during a BIOS program upgrade.
  • First, permission to read and write the storage medium where the first BIOS program is located is obtained, and then the first BIOS program is upgraded. If the first BIOS program is upgraded successfully, some or all of the other BIOS programs in the device are upgraded. If the upgrade of the first BIOS program fails, the first BIOS program is considered faulty; the second BIOS program may then be used as the main BIOS program according to the foregoing method, and the second BIOS program is triggered to start the device in the role of the main BIOS program.
  • Because read and write permission for the first BIOS program has been obtained, the first BIOS program can be upgraded through out-of-band management; that is, the physical channel used for upgrading the first BIOS program differs from the physical channel used for executing the first BIOS program to perform service processing. The first BIOS program can therefore be upgraded while the device is performing service processing, without interrupting the service.
  • With reference to the first aspect, in a second possible implementation, a watchdog may be used to monitor whether the first BIOS program has failed.
  • The watchdog can detect an exception of the first BIOS program caused by factors such as latent program errors or harsh environmental interference.
  • The watchdog can also reset the failed first BIOS program.
  • With reference to the first aspect, in a third possible implementation, whether the first BIOS program has failed may also be determined by monitoring the signal sent by the first BIOS through the hardware interface within a first preset time.
  • The watchdog detects whether the first BIOS program has failed by receiving a software signal. If the device has not yet enabled the watchdog, whether the first BIOS program has failed can still be determined by detecting the signal that the first BIOS program sends through the hardware interface.
  • With reference to the first aspect, in a fourth possible implementation, a CPLD may also be used to monitor whether the first BIOS program has failed. If the CPLD detects that the hardware interface signal of the storage medium where the first BIOS program is located is abnormal, it is determined that the first BIOS program has failed. In some cases, an abnormal hardware interface signal prevents the storage medium from working normally, so the first BIOS program cannot work normally either; in that situation the CPLD can be used to monitor whether the first BIOS program can run normally.
  • In a fifth possible implementation, after the CPLD detects that the first BIOS has failed, the identifier in the CPLD that indicates the failure of the first BIOS may be cleared. After the identifier is cleared, the CPLD sends a reset signal to reset the device, that is, the device is started with the second BIOS program as the main program.
  • With reference to the first aspect, in a sixth possible implementation, the device may further include M MEs, where M is an integer greater than or equal to 2. The first ME, which serves as the primary ME, is monitored; after the first ME is determined to have failed, a second ME of the M MEs is determined as the primary ME, and the second ME is triggered to start the device in the role of the primary ME.
  • The stability and reliability of ME operation are also very important.
  • In the embodiments of the present invention the ME may therefore also be monitored. After the primary ME is determined to have failed, the second BIOS starts the device in the role of the main BIOS program so that the device resumes normal operation, reducing the loss caused by the ME failure.
  • In a second aspect, an embodiment of the present invention provides an apparatus for handling exceptions of a BIOS program, including a determining module and a triggering module.
  • The determining module is configured to: determine that the first BIOS program has failed, where the first BIOS program is one of N BIOS programs, N is an integer greater than or equal to 2, and the first BIOS program is the main BIOS program used to start the physical device before the failure of the first BIOS program is determined; and, after determining that the first BIOS program has failed, determine that a second BIOS program of the N BIOS programs is the main BIOS program.
  • The triggering module is configured to trigger the second BIOS program to start the device in the role of the main BIOS program after the determining module determines that the second BIOS program is the main BIOS program.
  • The apparatus further includes an upgrade module, configured to: obtain read and write permission for the storage medium where the first BIOS program is located; upgrade the first BIOS program; and, if the first BIOS program is upgraded successfully, upgrade the second BIOS program.
  • The determining module is specifically configured to determine, by means of the watchdog, whether the first BIOS program has failed.
  • The determining module is specifically configured to determine that the first BIOS has failed if no signal sent by the first BIOS through the hardware interface is detected within a preset time.
  • The determining module is specifically configured to: monitor, by means of a CPLD, the hardware interface signal of the storage medium where the first BIOS program is located; and, if that hardware interface signal is abnormal, determine that the first BIOS program has failed.
  • The triggering module is further configured to clear the identifier in the CPLD that indicates that the first BIOS program has failed, so that the CPLD triggers the second BIOS program to start the device in the role of the main BIOS program.
  • The determining module is further configured to: determine that a first management engine (ME) has failed, where the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME used to start the physical device before the first ME fails; and, after determining that the first ME has failed, determine that a second ME of the M MEs is the primary ME.
  • The triggering module is further configured to trigger the second ME to start the device in the role of the primary ME.
  • A computer-readable storage medium stores computer-executable instructions, and a BMC (Baseboard Management Controller) executes the computer-executable instructions to implement the BIOS program exception handling method provided by the first aspect or the various possible implementations of the first aspect.
  • A computer program product includes computer-executable instructions stored in a computer-readable storage medium.
  • The BMC can read the computer-executable instructions from the computer-readable storage medium and execute them to implement the BIOS program exception handling method provided by the first aspect or the various possible implementations of the first aspect.
  • FIG. 1 is a schematic diagram of the connections of out-of-band management software in the prior art;
  • FIG. 2 is a first schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 3 is a second schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 4 is a third schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 5 is a fourth schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 6 is a fifth schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 7 is a sixth schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of an apparatus for handling exceptions of a BIOS program according to an embodiment of the present invention.
  • The embodiments of the present invention provide a method and an apparatus for handling exceptions of a BIOS program.
  • The method can be applied to a physical device that has at least two BIOS programs; the BIOS programs may be solidified in the same storage medium or in different storage media.
  • The storage medium may be a non-volatile memory, a flash memory chip, or another storage medium, which is not limited in the embodiments of the present invention.
  • The BIOS program is preferably controlled by means of out-of-band management.
  • In out-of-band management, management control information and data information are transmitted through different physical channels, so the control plane and the data plane are completely independent and do not affect each other.
  • While the BIOS program is running, that is, performing service processing through the data plane channel, it can also be controlled through the control plane channel. The BIOS can therefore be managed while the device is processing services, and it is not necessary to wait until the device is in the standby state, that is, powered on but not performing service processing.
  • Out-of-band management software can be completely or partially separated from the service system of the device.
  • FIG. 1 shows the out-of-band management software and the device.
  • Out-of-band management software can manage the service system, for example by managing its power (powering on or off, and so on), providing remote KVM (Keyboard Video Mouse) functions, and providing image mounting functions for convenience.
  • The out-of-band management software can also manage the hardware of the device, for example by monitoring the working status of the CPU (Central Processing Unit), memory, hard disk, and network card so that abnormalities are detected in time.
  • The out-of-band management software can also provide a variety of user interfaces, such as WEB-, SSH-, and FTP-based interfaces, through which users manage the device.
  • The BMC, as out-of-band management software, manages power-on, power, speed, alarms, and fault diagnosis.
  • The BMC can communicate with the BIOS through the LPC (Low Pin Count) to monitor the BIOS and the status of the service system during power-on.
  • Through the SPI (Serial Peripheral Interface), the firmware of the BIOS solidified in SPI Flash can be upgraded.
  • FIG. 2 is a schematic flowchart of a method for handling exceptions of a BIOS program according to an embodiment of the present invention.
  • The method may be performed by out-of-band management software (for example, a BMC). As shown in the figure, the method includes the following steps:
  • Step 201: Determine that the first BIOS program has failed.
  • The first BIOS program is one of the N BIOS programs solidified on the storage medium of the device, N is an integer greater than or equal to 2, and the first BIOS program is the main BIOS program used to start the device before the first BIOS program fails.
  • Step 202: After determining that the first BIOS program has failed, determine a second BIOS program of the N BIOS programs as the main BIOS program.
  • The BIOS program with the highest priority among the N-1 BIOS programs other than the first BIOS program may be selected, according to a preset priority, as the second BIOS program and made the main program; alternatively, one BIOS program may be selected at random from the N-1 BIOS programs other than the first BIOS program as the main program.
  • The embodiments of the present invention do not limit this.
  • Step 203: After determining that the second BIOS program is the main BIOS program, trigger the second BIOS program to start the device in the role of the main BIOS program.
  • The terms "first BIOS program" and "second BIOS program" used in the embodiments of the present invention serve only to distinguish the programs and do not refer to a specific BIOS program.
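The selection in step 202 can be illustrated with a minimal Python sketch. Everything named here (`BiosImage`, `select_second_bios`, `handle_bios_fault`, the "larger value means higher priority" convention) is a hypothetical illustration and not part of the patent text; the sketch only models the two selection policies the description mentions, preset priority and random choice.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BiosImage:
    name: str       # hypothetical label such as "bios0"
    priority: int   # assumed convention: larger value means higher priority


def select_second_bios(images: List[BiosImage], failed: BiosImage,
                       by_priority: bool = True,
                       rng: Optional[random.Random] = None) -> BiosImage:
    """Step 202: choose the new main BIOS from the N-1 programs other than the failed one."""
    candidates = [img for img in images if img is not failed]
    if not candidates:
        raise ValueError("at least two BIOS programs are required (N >= 2)")
    if by_priority:
        # Policy 1: highest preset priority among the remaining programs.
        return max(candidates, key=lambda img: img.priority)
    # Policy 2: random choice among the remaining programs.
    return (rng or random.Random()).choice(candidates)


def handle_bios_fault(images: List[BiosImage], failed: BiosImage) -> str:
    """Steps 202 and 203: pick the second BIOS, then report which image should boot the device."""
    second = select_second_bios(images, failed)
    return f"boot {second.name} as main BIOS"
```

In a real BMC the last step would drive a hardware reset rather than return a string; the string stands in for that trigger so the flow can be exercised without hardware.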
  • When step 201 is implemented, the watchdog may be used to determine whether the first BIOS program has failed, as shown in the corresponding figure. A program may contain latent errors, or the device may be disturbed by external electromagnetic fields so that register and memory data become corrupted, causing the program to enter an infinite loop and stop working normally.
  • The watchdog can periodically check the working status of the chip and send a restart signal to the chip once an error occurs; the command issued by the watchdog has the highest priority among the program's interrupts.
  • The watchdog, also known as the watchdog timer, is a timer circuit.
  • Its input can receive the signal sent by the first BIOS, and its output can send a reset signal to the first BIOS.
  • When the first BIOS program is operating normally, it periodically sends a signal to the watchdog (commonly known as "feeding the dog"); after receiving the signal, the watchdog clears its timer and starts timing again.
  • If the first BIOS program is abnormal, the watchdog does not receive the signal within the set time and therefore sends a reset signal to the first BIOS to reset it.
  • When the watchdog has not received the signal within the set time, it can also send a signal to the out-of-band management software to notify it that the first BIOS program has failed.
  • After receiving the signal sent by the watchdog, the out-of-band management software performs steps 202 and 203, that is, it determines the second BIOS as the main BIOS and triggers the second BIOS program to start the device in the role of the main BIOS program.
  • The out-of-band management software may also notify the user that the first BIOS program is running abnormally, that the device will switch to the second BIOS program, and that the service system will be reset. Once the user knows that the first BIOS program is abnormal, the user can investigate the cause.
  • The watchdog can be independent of the out-of-band management software or integrated into it, which is not limited in the embodiments of the present invention.
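The "feeding the dog" behaviour described above can be modelled in software as follows. This is a sketch under stated assumptions: the `Watchdog` class, its `feed` and `poll` methods, and the injectable `clock` are all hypothetical names, and a real watchdog is a hardware timer circuit rather than a polled Python object.

```python
import time
from typing import Callable


class Watchdog:
    """Toy software model of a watchdog timer (assumed interface, not from the patent)."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout   # e.g. notify the out-of-band management software
        self.clock = clock
        self.last_feed = clock()
        self.expired = False

    def feed(self) -> None:
        """Called when the periodic signal from the first BIOS arrives: restart timing."""
        self.last_feed = self.clock()

    def poll(self) -> bool:
        """Return True once the set time has elapsed without a feed; notify only once."""
        if not self.expired and self.clock() - self.last_feed > self.timeout_s:
            self.expired = True
            self.on_timeout()
        return self.expired
```

Passing a fake `clock` makes the timeout logic testable without waiting in real time; the `on_timeout` callback is where steps 202 and 203 would be started.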
  • When step 201 is implemented, the out-of-band management software may also monitor the signal sent by the first BIOS through the hardware interface, as shown in FIG. 4. If the first BIOS program runs normally, it sends a signal through the hardware interface. When the out-of-band management software detects that signal, it clears the timer and starts timing again; if it does not detect the signal within a preset time, it confirms that the first BIOS program is abnormal.
  • The watchdog detects whether the first BIOS program has failed by receiving a software signal. If the device has not yet enabled the watchdog, whether the first BIOS program has failed can still be determined by detecting the signal that the first BIOS program sends through the hardware interface. The two embodiments above can therefore be combined so that a failure of the first BIOS program is discovered in time.
  • In some cases the storage medium where the first BIOS program is located may not work properly, and the service system of the device is reset continuously.
  • In that situation the watchdog, or the mechanism by which the out-of-band management software monitors the signal sent by the first BIOS through the hardware interface, may not be able to start normally. Monitoring can then be performed by a CPLD (Complex Programmable Logic Device).
  • When the CPLD detects that the hardware electrical signal of the storage medium where the first BIOS program is located is abnormal, it sets an identifier, which indicates whether the storage medium is normal, to the abnormal value. After the identifier is set to the abnormal value, a signal is sent to the out-of-band management software, and the out-of-band management software periodically scans the identifier in the CPLD. When the out-of-band management software determines from the identifier that the first BIOS program cannot work normally, it determines the second BIOS program as the main BIOS program and clears the abnormality identifier in the CPLD.
  • After the identifier is cleared, the CPLD may directly trigger a device reset, that is, trigger the second BIOS program to start the device in the role of the main BIOS program.
  • Alternatively, the out-of-band management software triggers the second BIOS program to start the device in the role of the main BIOS program.
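The interaction between the CPLD identifier and the periodic scan by the out-of-band management software can be sketched as below. `CpldModel`, `bmc_scan_once`, and the `reset_count` field are hypothetical stand-ins; a real CPLD exposes registers over a board-specific bus, which is not modelled here.

```python
class CpldModel:
    """Toy stand-in for a CPLD holding a storage-medium fault identifier (assumed design)."""

    def __init__(self) -> None:
        self.fault_flag = False
        self.reset_count = 0

    def report_signal(self, signal_ok: bool) -> None:
        # The CPLD sets the identifier when the hardware signal is abnormal.
        if not signal_ok:
            self.fault_flag = True

    def clear_fault_flag(self) -> None:
        # Clearing the identifier makes the CPLD issue a device reset.
        if self.fault_flag:
            self.fault_flag = False
            self.reset_count += 1


def bmc_scan_once(cpld: CpldModel, main_bios: str, standby_bios: str) -> str:
    """One periodic scan: return the BIOS that should be main after the scan."""
    if not cpld.fault_flag:
        return main_bios
    cpld.clear_fault_flag()   # triggers the reset that boots the standby BIOS
    return standby_bios
```

Calling `bmc_scan_once` on a timer corresponds to the periodic scan; the reset counter only records that clearing the identifier caused exactly one reset.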
  • The BIOS program exception handling method provided by the embodiments of the present invention can also be applied when upgrading a BIOS program.
  • In the prior art, the BIOS program must be upgraded while the device is in standby, that is, before the device has started service processing, so the upgrade must interrupt the service. For services that require uninterrupted operation 24 hours a day, the prior-art upgrade process is very inconvenient.
  • In the embodiments of the present invention, the out-of-band management software can perform out-of-band management, so that service processing on the data plane and the firmware upgrade on the control plane proceed simultaneously.
  • When the first BIOS program needs to run for service processing, the program required is usually copied into memory and executed from there. Upgrading the first BIOS program through a control plane that is independent of the data plane therefore does not affect ongoing service processing, and the approach can be applied to devices whose service systems must run without interruption.
  • The out-of-band management software first obtains read and write permission for the storage medium where the first BIOS program is located so that it can update the first BIOS program.
  • The upgrade process can be as shown in FIG. 6.
  • After the out-of-band management software obtains a new version of the BIOS program, it first upgrades the first BIOS program, which is the main BIOS program. If the upgrade succeeds, it goes on to upgrade the BIOS programs other than the first BIOS program among the N BIOS programs in the device.
  • The other BIOS programs may be upgraded in descending order of a preset priority, or in some other order.
  • The embodiments of the present invention do not limit this.
  • If the upgrade of the first BIOS program fails, the second BIOS program may be determined as the main BIOS program according to the foregoing method, and the second BIOS program is then triggered to start the device in the role of the main BIOS program.
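The upgrade flow above (main BIOS first, the others only on success, failover on failure) can be sketched as follows. The function `upgrade_all`, the `flash` callback, and the result dictionary are hypothetical; the sketch assumes the caller passes the images with the main BIOS first and the rest already sorted by descending priority.

```python
from typing import Callable, Dict, List


def upgrade_all(order: List[str], flash: Callable[[str], bool]) -> Dict[str, object]:
    """Upgrade order[0] (the main BIOS) first, then the others if that succeeds.

    `flash(name)` is an assumed callback that writes the new image and returns
    True on success. A failed upgrade of the main BIOS is treated as a fault,
    so the next image in `order` becomes the main BIOS.
    """
    if len(order) < 2:
        raise ValueError("at least two BIOS programs are required (N >= 2)")
    main, others = order[0], order[1:]
    if not flash(main):
        # Upgrade failure counts as a fault of the first BIOS: fail over.
        return {"main": others[0], "upgraded": [], "failed": [main]}
    upgraded, failed = [main], []
    for name in others:
        (upgraded if flash(name) else failed).append(name)
    return {"main": main, "upgraded": upgraded, "failed": failed}
```

For example, `upgrade_all(["bios0", "bios1"], lambda n: n != "bios0")` reports `bios1` as the new main BIOS without touching it, which keeps one known-good image available.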
  • The Management Engine (ME) is solidified together with the BIOS on the storage medium of the device's motherboard.
  • The ME performs some of the information management tasks of device management.
  • The stability of the ME also affects how the device runs during the startup phase.
  • If the ME fails, its firmware program needs to be reloaded.
  • Multiple ME programs can be solidified in one device, with one of them used as the main ME.
  • The first ME program, which is the main ME program, is usually mirrored on the motherboard of the device.
  • The motherboard of the device can send the hardware information of the device to the out-of-band management software.
  • Based on the hardware information, the out-of-band management software can communicate with the device's motherboard through SMLink (System Management Link).
  • After determining that the first ME has failed, the out-of-band management software may determine that the second ME program is the main ME program and trigger the second ME to start the device in the role of the main ME program.
  • The out-of-band management software can also send a reset command to the first ME to reset the first ME and copy the firmware program from the second ME.
  • Typically, one storage medium is solidified with one BIOS program and one ME program, and the first BIOS program and the first ME program are usually solidified in the same storage medium. Therefore, when the first BIOS program or the first ME program is determined to be abnormal, the second BIOS program and the second ME program in another storage medium are determined as the main BIOS program and the main ME program.
  • A plurality of BIOS programs and/or a plurality of ME programs may also be solidified in the same storage medium, which is not limited in the embodiments of the present invention.
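When a BIOS program and an ME program share one storage medium, a fault in either one switches both, which the following sketch illustrates. `FlashChip` and `switch_on_fault` are hypothetical names, and the sketch assumes exactly the paired layout described above (one BIOS and one ME per storage medium).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlashChip:
    name: str   # hypothetical storage-medium label
    bios: str   # BIOS program solidified on this storage medium
    me: str     # ME program solidified on the same storage medium


def switch_on_fault(chips: List[FlashChip], active: FlashChip,
                    faulty_program: str) -> FlashChip:
    """Return the storage medium whose BIOS and ME both become the main programs."""
    if faulty_program not in (active.bios, active.me):
        raise ValueError("fault does not belong to the active storage medium")
    for chip in chips:
        if chip != active:
            return chip   # its BIOS and ME take over together
    raise RuntimeError("no other storage medium available")
```

Treating the storage medium as the unit of failover keeps the BIOS and ME versions on the active medium matched; other layouts permitted by the description would need a different selection rule.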
  • In the embodiments of the present invention, after the first BIOS program is determined to have failed, the device is restarted with the second BIOS in the role of the main BIOS program, so the device can resolve the BIOS failure in a short time, the service processing required of the device is not interrupted for a long time, and the loss caused by the BIOS program failure is reduced.
  • The software signal of the first BIOS program may be monitored by the watchdog, the signal sent by the first BIOS program through the hardware interface may be monitored by the out-of-band management software, and the hardware signal of the storage medium where the BIOS program is located may be monitored by the CPLD. The first BIOS program is thus monitored from several angles, so that its failure can be discovered and resolved in time in different scenarios.
  • The out-of-band management software can upgrade the BIOS firmware while the device is performing service processing, avoiding the inconvenience of interrupting the service for a firmware upgrade.
  • The ME can also be monitored, and when the ME fails, the device switches to the second ME.
  • FIG. 8 is a schematic structural diagram of an apparatus for handling exceptions of a BIOS program according to an embodiment of the present invention. As shown in the figure, the apparatus includes a determining module 801 and a triggering module 802, and may further include an upgrade module 803.
  • The determining module 801 is configured to: determine that the first BIOS program has failed, where the first BIOS program is one of N BIOS programs, N is an integer greater than or equal to 2, and the first BIOS program is the main BIOS program used to start the physical device before the failure of the first BIOS program is determined; and, after determining that the first BIOS program has failed, determine that a second BIOS program of the N BIOS programs is the main BIOS program.
  • The triggering module 802 is configured to trigger the second BIOS program to start the device in the role of the main BIOS program after the determining module determines that the second BIOS program is the main BIOS program.
  • The upgrade module 803 is configured to: obtain read and write permission for the storage medium where the first BIOS program is located; upgrade the first BIOS program; and, if the first BIOS program is upgraded successfully, upgrade the second BIOS program.
  • The determining module 801 can determine, by means of the watchdog, whether the first BIOS program has failed.
  • The determining module 801 may also detect the signal sent by the first BIOS through the hardware interface. If no such signal is detected within a preset time, it determines that the first BIOS has failed.
  • The determining module 801 is further configured to: monitor, by means of the CPLD, the hardware interface signal of the storage medium where the first BIOS program is located; and, if the CPLD detects that this hardware interface signal is abnormal, determine that the first BIOS program has failed.
  • The triggering module 802 is further configured to clear the identifier in the CPLD that indicates that the first BIOS program has failed, so that the CPLD triggers the second BIOS program to start the device in the role of the main BIOS program.
  • The determining module 801 is further configured to: determine that a first management engine (ME) has failed, where the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME used to start the physical device before the first ME fails; and, after determining that the first ME has failed, determine that a second ME of the M MEs is the primary ME.
  • The triggering module 802 is further configured to trigger the second ME to start the device in the role of the primary ME.
  • An embodiment of the present invention further provides a computer-readable storage medium that stores computer-executable instructions; the BMC executes the instructions to implement the foregoing embodiments of the BIOS program exception handling method.
  • Embodiments of the present invention also provide a computer program product including computer-executable instructions stored in a computer-readable storage medium.
  • The BMC can read the computer-executable instructions from the computer-readable storage medium and execute them to implement the embodiments of the BIOS program exception handling method.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

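A method and an apparatus for handling exceptions of a BIOS program. In the method, a first BIOS program is one of N BIOS programs of a physical device and is the main BIOS program that starts the physical device; after the first BIOS program is determined to have failed (201), a second BIOS program of the N BIOS programs is determined as the main BIOS program (202); and the second BIOS is triggered to start the device in the role of the main BIOS program (203). Because the device is restarted with the second BIOS in the role of the main BIOS program once the first BIOS program is determined to have failed, the device can resolve the BIOS failure in a short time, the service processing required of the device is not interrupted for a long time, and the loss caused by the BIOS program failure is reduced.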

Description

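Exception handling method and apparatus for a BIOS program
Technical Field
The present invention relates to the communications field, and in particular to an exception handling method and apparatus for a BIOS program.
Background
Servers are used ever more widely, and critical sectors such as finance, government, and education place higher requirements on server reliability and stability.
The BIOS (Basic Input Output System) is burned into a storage medium on the server mainboard. It holds the server's most important basic input/output routines, the power-on self-test program, and the system bootstrap program. Its main function is to provide the lowest-level, most direct hardware configuration and control for the server, so the reliability of the BIOS is critical.
A traditional server usually adopts a single-BIOS design, that is, the server has only one BIOS, and this design has poor reliability. When the BIOS program becomes abnormal or the storage medium is damaged, service can be restored only after professional operation and maintenance personnel have repaired the fault, and a long service interruption often causes considerable economic loss to the customer.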
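Summary
Embodiments of the present invention provide an exception handling method and apparatus for a BIOS program, so that when a first BIOS program fails, a second BIOS program can be started as the primary BIOS program to perform service processing.
According to a first aspect, an embodiment of the present invention provides an exception handling method for a BIOS program, including the following.
A first BIOS program is one of N BIOS programs of a physical device, where N is an integer greater than or equal to 2, and the first BIOS program is the primary BIOS program used to start the physical device.
After it is determined that the first BIOS program has failed, a second BIOS program among the N BIOS programs is determined as the primary BIOS program, and the second BIOS program is triggered to start the device in the role of the primary BIOS program.
Because the device is restarted with the second BIOS program acting as the primary BIOS program once the first BIOS program is determined to have failed, the BIOS fault is resolved within a short time, the service processing performed by the device is not interrupted for a long period, and the loss caused by the BIOS program fault is reduced.
With reference to the first aspect, in a first possible implementation of the first aspect, the method further covers the handling of an upgrade fault that occurs while a BIOS program is being upgraded. Read/write permission for the storage medium holding the first BIOS program is obtained first, and the first BIOS program is then upgraded. If the first BIOS program is upgraded successfully, some or all of the other BIOS programs in the device are upgraded. If the upgrade of the first BIOS program fails, the first BIOS program is deemed to have failed; the second BIOS program can then be made the primary BIOS program as described above and triggered to start the device in that role.
Because read/write permission for the first BIOS program has been obtained, the first BIOS program can be upgraded by means of out-of-band management. That is, the physical channel used to upgrade the first BIOS program differs from the physical channel used to execute the first BIOS program for service processing, so the first BIOS program can be upgraded while the device is processing services, without interrupting them.
With reference to the first aspect, in a second possible implementation of the first aspect, a watchdog may be used to monitor whether the first BIOS program has failed. The watchdog can detect an abnormality of the first BIOS program caused by factors such as a latent program error or interference from a harsh environment, and it can also reset the failed first BIOS program.
With reference to the first aspect, in a third possible implementation of the first aspect, whether the first BIOS program has failed may also be judged by monitoring the signals that the first BIOS sends through a hardware interface within a first preset time. The watchdog detects a failure of the first BIOS program by receiving software signals; if the device has not yet enabled the watchdog, the failure can still be judged from the signals that the first BIOS program sends through the hardware interface.
With reference to the first aspect, in a fourth possible implementation of the first aspect, a CPLD may also be used to monitor whether the first BIOS program has failed. If the CPLD detects that the hardware interface signal of the storage medium holding the first BIOS program is abnormal, it is determined that the first BIOS program has failed. In some cases an abnormal hardware interface signal prevents the storage medium from working normally, and the first BIOS program therefore cannot work normally either; the CPLD can then be used to monitor whether the first BIOS program is able to run normally.
With reference to the first aspect and the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, after the CPLD detects that the first BIOS has failed, the flag in the CPLD that indicates the failure of the first BIOS may be cleared. Once the flag is cleared, the CPLD sends a reset signal to reset the device, that is, the device is started with the second BIOS program as the primary program.
With reference to the first aspect, in a sixth possible implementation of the first aspect, the device may further include M MEs (management engines), where M is an integer greater than or equal to 2. The first ME, which acts as the primary ME, is monitored. After it is determined that the first ME has failed, a second ME among the M MEs is determined as the primary ME, and the second ME is triggered to start the device in the role of the primary ME.
Because stable and reliable operation of the ME is also very important, the ME may also be monitored in the embodiments of the present invention. After it is determined that the primary ME has failed, the second ME starts the device in the role of the primary ME, so that the device resumes normal operation and the loss caused by the ME fault is reduced.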
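According to a second aspect, an embodiment of the present invention provides an exception handling apparatus for a BIOS program, including a determining module and a triggering module.
The determining module is configured to determine that a first BIOS program has failed, where the first BIOS program is one of N BIOS programs, N is a positive integer greater than or equal to 2, and the first BIOS program is the primary BIOS program that was used to start a physical device before the first BIOS program failed; and, after determining that the first BIOS program has failed, to determine that a second BIOS program among the N BIOS programs is the primary BIOS program.
The triggering module is configured to trigger the second BIOS program to start the device in the role of the primary BIOS program after the determining module determines that the second BIOS program is the primary BIOS program.
With reference to the second aspect, in a first possible implementation of the second aspect, the apparatus further includes an upgrade module configured to: obtain read/write permission for the storage medium holding the first BIOS program; upgrade the first BIOS program; and, if the first BIOS program is upgraded successfully, upgrade the second BIOS program.
With reference to the second aspect, in a second possible implementation of the second aspect, the determining module is specifically configured to determine, by using a watchdog, whether the first BIOS program has failed.
With reference to the second aspect, in a third possible implementation of the second aspect, the determining module is specifically configured to determine that the first BIOS has failed if no signal sent by the first BIOS through a hardware interface is detected within a preset time.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the determining module is specifically configured to: monitor, by using a CPLD, the hardware interface signal of the storage medium holding the first BIOS program; and, if the CPLD detects that this hardware interface signal is abnormal, determine that the first BIOS program has failed.
With reference to the second aspect and the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the triggering module is further configured to clear the flag in the CPLD that indicates the failure of the first BIOS program, so that the CPLD triggers the second BIOS program to start the device in the role of the primary BIOS program.
With reference to the second aspect, in a sixth possible implementation of the second aspect, the determining module is further configured to: determine that a first management engine (ME) has failed, where the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME that was used to start the physical device before the first ME failed; and, after determining that the first ME has failed, determine that a second ME among the M MEs is the primary ME.
The triggering module is further configured to trigger the second ME to start the device in the role of the primary ME.
According to a third aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions, and a BMC (Baseboard Management Controller) executes the instructions to implement the exception handling method for a BIOS program provided in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, a computer program product is provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. The BMC can read the instructions from the computer-readable storage medium and execute them to implement the exception handling method for a BIOS program provided in the first aspect or any possible implementation of the first aspect.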
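Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of out-of-band management software connections in the prior art;
FIG. 2 is a first schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 3 is a second schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 4 is a third schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 5 is a fourth schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 6 is a fifth schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 7 is a sixth schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the exception handling apparatus for a BIOS program according to an embodiment of the present invention.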
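Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings.
To address abnormalities of a BIOS program in a device, embodiments of the present invention provide an exception handling method and apparatus for a BIOS program.
The method provided in the embodiments of the present invention can be applied to a physical device in which at least two BIOS programs are burned. The at least two BIOS programs may be stored in the same storage medium or in different storage media. The storage medium may be a non-volatile memory, a Flash memory chip, or another storage medium, which is not limited in the embodiments of the present invention.
In the embodiments of the present invention, when a BIOS program becomes abnormal, the BIOS program is preferably controlled by means of out-of-band management. Out-of-band management means that management and control information and data information are carried over different physical channels, so that the control plane and the data plane are fully independent and do not affect each other. With out-of-band management, a BIOS program that is running, that is, performing service processing through the data-plane channel, can still be controlled through the control-plane channel. BIOS management operations can therefore be carried out while the device is processing services, rather than only when the device is in standby, that is, powered on but not processing services.
Software that implements out-of-band management, such as a BMC (Baseboard Management Controller), may be fully or partly separated from the service system of the device. FIG. 1 shows the case in which the out-of-band management software is fully separated from the service system. The out-of-band management software can manage the service system, for example by managing service power (power-on, power-off, and so on), providing a remote KVM (Keyboard, Video, Mouse) function, and providing image mounting, which makes it easier for users to manage the service system. It can also manage the hardware of the device, for example by monitoring the working status of the CPU (Central Processing Unit), memory, hard disks, and network adapters so that abnormalities are found in time. In addition, it can provide several user interfaces through which users manage the device, for example interfaces based on WEB, SSH, or FTP.
At present, the out-of-band management software BMC manages power-on and power-off, temperature, fan speed regulation, alarms, fault diagnosis, and so on. When managing the BIOS, the BMC can communicate with the BIOS over LPC (Low Pin Count interface) to monitor the state of the BIOS and the service system during power-on. Alternatively, it can communicate over SPI (Serial Peripheral Interface) with the BIOS stored in an SPI Flash chip and, after obtaining read/write permission, upgrade the BIOS firmware stored in the SPI Flash.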
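The exception handling method for a BIOS program provided in the embodiments of the present invention is described in detail below.
FIG. 2 is a schematic flowchart of the exception handling method for a BIOS program according to an embodiment of the present invention. The method may be performed by out-of-band management software (for example, a BMC). As shown in the figure, the method includes the following steps:
Step 201: Determine that a first BIOS program has failed.
The first BIOS program is one of N BIOS programs burned into the storage medium of the device, where N is an integer greater than or equal to 2, and the first BIOS program is the primary BIOS program that was used to start the device before the first BIOS program failed.
Step 202: After determining that the first BIOS program has failed, determine a second BIOS program among the N BIOS programs as the primary BIOS program.
Optionally, when N is an integer greater than 2, the second BIOS program may be chosen according to a preset priority, that is, the BIOS program with the highest priority among the N-1 BIOS programs other than the first BIOS program becomes the primary program. Alternatively, one of the N-1 BIOS programs other than the first BIOS program may be selected at random as the primary program. This is not limited in the embodiments of the present invention.
Step 203: After determining that the second BIOS program is the primary BIOS program, trigger the second BIOS program to start the device in the role of the primary BIOS program.
It should be understood that "first BIOS program" and "second BIOS program" are used in the embodiments of the present invention only to distinguish between programs and do not refer to any particular BIOS program.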
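Steps 202 and 203 can be sketched in Python as follows. This is only an illustrative model, not the patented implementation: the `BiosImage` type, the convention that a larger `priority` value means higher priority, and the `boot` callback are assumptions introduced for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class BiosImage:
    name: str
    priority: int  # assumed convention: larger value = higher priority


def select_new_primary(
    images: Sequence[BiosImage],
    failed: BiosImage,
    use_priority: bool = True,
    rng: Optional[random.Random] = None,
) -> BiosImage:
    """Step 202: choose the second BIOS program among the N-1 remaining ones."""
    candidates = [img for img in images if img != failed]
    if not candidates:
        raise ValueError("no standby BIOS program available (N must be >= 2)")
    if use_priority:
        # Preset-priority policy: the highest-priority standby image wins.
        return max(candidates, key=lambda img: img.priority)
    # Random policy: any standby image may be chosen.
    return (rng or random).choice(candidates)


def handle_bios_fault(
    images: Sequence[BiosImage],
    failed_primary: BiosImage,
    boot: Callable[[BiosImage], None],
) -> BiosImage:
    """Steps 202 and 203: pick the new primary, then trigger the boot."""
    new_primary = select_new_primary(images, failed_primary)
    boot(new_primary)  # hypothetical hook that restarts the device
    return new_primary
```

With three images whose priorities are 3, 1, and 2, a failure of the priority-3 image makes the priority-2 image the new primary under the priority policy.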
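In some embodiments, step 201 may be implemented by using a watchdog to determine whether the first BIOS program has failed, as shown in FIG. 3. A program may contain latent errors, or interference from an external electromagnetic field may corrupt register and memory data, so that the program falls into an infinite loop and cannot continue to work normally. The watchdog periodically checks the working state of the chip and sends it a restart signal as soon as an error occurs; the command issued by the watchdog has the highest priority among the program's interrupts.
A watchdog, also called a watchdog timer, is a timer circuit. Its input can receive signals sent by the first BIOS, and its output can send a reset signal to the first BIOS. While the first BIOS program runs normally, it periodically sends a signal to the watchdog (commonly called "feeding the dog") to indicate that it is running normally; on receiving the signal, the watchdog clears its timer and starts timing again. When the first BIOS program runs abnormally, it cannot send the signal; because the watchdog receives no signal within the set time, it sends a reset signal to the first BIOS so that the first BIOS is reset.
When the watchdog receives no signal within the set time, it may also send a signal to the out-of-band management software to report that the first BIOS program has failed. After receiving this signal, the out-of-band management software performs steps 202 and 203, that is, it determines the second BIOS as the primary BIOS and triggers the second BIOS program to start the device in the role of the primary BIOS program.
Optionally, after receiving the signal from the watchdog indicating that the first BIOS program has failed, the out-of-band management software may also present a warning to the user, so that the user knows that the first BIOS program is running abnormally, that a switch to the second BIOS program is required, and that the service system will be reset. Once aware of the abnormality, the user can investigate its cause.
In a specific implementation, the watchdog may be independent of the out-of-band management software or integrated into it, which is not limited in the embodiments of the present invention.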
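The watchdog behaviour described above can be modelled as a small timer object. The following Python sketch is a software stand-in for what the text describes as a timer circuit; the class name, the `tick` interface, and the two callbacks (`reset_bios`, `notify_bmc`) are hypothetical names chosen for illustration.

```python
from typing import Callable


class WatchdogTimer:
    """Software model of a watchdog: expires if it is not fed in time."""

    def __init__(
        self,
        timeout_s: float,
        reset_bios: Callable[[], None],
        notify_bmc: Callable[[], None],
    ) -> None:
        self.timeout_s = timeout_s
        self.reset_bios = reset_bios  # assumed hook: reset signal to the BIOS
        self.notify_bmc = notify_bmc  # assumed hook: fault report to the BMC
        self.elapsed_s = 0.0
        self.expired = False

    def feed(self) -> None:
        """Called by a healthy BIOS: clear the timer and start timing again."""
        self.elapsed_s = 0.0
        self.expired = False

    def tick(self, dt_s: float) -> None:
        """Advance time; fire both hooks once when the timeout is reached."""
        if self.expired:
            return
        self.elapsed_s += dt_s
        if self.elapsed_s >= self.timeout_s:
            self.expired = True
            self.reset_bios()
            self.notify_bmc()
```

In this model the out-of-band management software would run steps 202 and 203 inside its `notify_bmc` handler. Time is advanced explicitly through `tick` so that the behaviour can be exercised deterministically.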
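In some other embodiments, step 201 may be implemented by having the out-of-band management software monitor the signals that the first BIOS sends through a hardware interface, as shown in FIG. 4. If the first BIOS program is running normally, it sends signals through the hardware interface. When the out-of-band management software detects such a signal, it clears its timer and starts timing again; if it detects no signal from the first BIOS through the hardware interface within a preset time, it concludes that the first BIOS program has become abnormal.
The watchdog detects a failure of the first BIOS program by receiving software signals. If the device has not yet enabled the watchdog, the failure can still be judged from the signals that the first BIOS program sends through the hardware interface. The two embodiments can therefore be combined so that a failure of the first BIOS program is detected more promptly.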
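One way to combine the two detection paths is sketched below in Python. It assumes, purely for illustration, that both the hardware-interface signal and the software watchdog signal can be reduced to "time of the last signal seen"; `HeartbeatMonitor` and `first_bios_failed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks the last time a signal was seen on one channel."""

    timeout_s: float
    last_seen_s: float = 0.0

    def signal(self, now_s: float) -> None:
        # A signal arrived: restart the timing window.
        self.last_seen_s = now_s

    def timed_out(self, now_s: float) -> bool:
        return now_s - self.last_seen_s > self.timeout_s


def first_bios_failed(
    now_s: float,
    hw_monitor: HeartbeatMonitor,
    sw_watchdog: Optional[HeartbeatMonitor] = None,
) -> bool:
    """Report a fault if either enabled channel has gone silent.

    sw_watchdog is None while the watchdog has not been enabled yet,
    in which case only the hardware-interface channel is consulted.
    """
    if hw_monitor.timed_out(now_s):
        return True
    return sw_watchdog is not None and sw_watchdog.timed_out(now_s)
```

Treating silence on either channel as a fault favours early detection; a stricter deployment could require both channels to time out before failing over.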
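In addition, in some cases an abnormal electrical signal in the hardware may prevent the storage medium holding the first BIOS program from working normally, which in turn causes the service system of the device to reset repeatedly. In this situation, neither the watchdog nor the out-of-band monitoring of the signals sent by the first BIOS through the hardware interface may be able to start normally. A CPLD (Complex Programmable Logic Device) can then be used for monitoring.
As shown in FIG. 5, when the CPLD detects that the hardware electrical signal of the storage medium holding the first BIOS program is abnormal, the CPLD sets the flag bit that records whether the storage medium is normal to a value indicating an abnormality. After setting the flag, the CPLD may actively send a signal to the out-of-band management software, or the out-of-band management software may scan the flag bit in the CPLD at regular intervals. When the out-of-band management software determines from the flag that the first BIOS program cannot run normally, it determines the second BIOS program as the primary BIOS program and clears the abnormality flag in the CPLD.
Optionally, after the abnormality flag in the CPLD is cleared, the CPLD may directly trigger a device reset, that is, trigger the second BIOS program to start the device in the role of the primary BIOS program. Alternatively, after the second BIOS program has been determined as the primary BIOS program, the out-of-band management software may trigger the second BIOS program to start the device in that role.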
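The flag handshake between the CPLD and the out-of-band management software can be illustrated with a small Python model. The CPLD is real hardware, so `CpldModel` and `bmc_poll` are hypothetical stand-ins; the reset signal is represented by a counter.

```python
class CpldModel:
    """Toy model of the CPLD flag bit for the primary BIOS flash."""

    def __init__(self) -> None:
        self.flash_fault_flag = False
        self.reset_count = 0  # stands in for reset signals sent to the device

    def report_signal_abnormal(self) -> None:
        # The CPLD saw an abnormal electrical signal on the flash interface.
        self.flash_fault_flag = True

    def clear_fault_flag(self) -> None:
        # Clearing a set flag makes the CPLD issue a device reset.
        if self.flash_fault_flag:
            self.flash_fault_flag = False
            self.reset_count += 1


def bmc_poll(cpld: CpldModel, primary: str, standby: str) -> str:
    """One scan by the out-of-band manager; returns the BIOS to boot from."""
    if not cpld.flash_fault_flag:
        return primary
    # Fault recorded: switch to the standby image first, then clear the
    # flag so that the resulting reset boots from the new primary.
    new_primary = standby
    cpld.clear_fault_flag()
    return new_primary
```

The sketch follows the polling variant, in which the out-of-band software scans the flag periodically; the variant in which the CPLD actively signals the out-of-band software would call the same logic from an event handler instead.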
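The exception handling method provided in the embodiments of the present invention can also be applied when a BIOS program is upgraded. In the prior art, a BIOS program must be upgraded while the device is in standby, that is, before the device has started processing services. The upgrade therefore necessarily interrupts service, which is very inconvenient for service systems such as trading systems and databases that must run 24 hours a day without interruption.
In the embodiments of the present invention, because the out-of-band management software works by means of out-of-band management, data-plane service processing and control-plane firmware upgrade can proceed at the same time. When the first BIOS program needs to run for service processing, the programs required for service processing are usually copied into memory first, and service processing is carried out by running the programs in memory. Upgrading the first BIOS program through the control plane, which is independent of the data plane, therefore does not affect ongoing service processing. The method can thus be applied to devices whose service systems must not be interrupted.
The out-of-band management software first obtains read/write permission for the storage medium holding the first BIOS program so that it can update the first BIOS program. The upgrade process may be as shown in FIG. 6. When the out-of-band management software obtains a new version of the BIOS program, it first upgrades the first BIOS program, which is the primary BIOS program. If that upgrade succeeds, it continues by upgrading the BIOS programs other than the first BIOS program among the N BIOS programs in the device. Specifically, when N is an integer greater than 2, the BIOS programs that have not yet been upgraded may be upgraded in descending order of a preset priority, or the other BIOS programs may be upgraded in random order. Alternatively, only some of the other BIOS programs may be upgraded for now, and the remaining ones upgraded later when the device load is low. This is not limited in the embodiments of the present invention.
If the upgrade of the first BIOS program fails, that is, the first BIOS program cannot run normally and has failed, the second BIOS program can be determined as the primary BIOS program as described above and then triggered to start the device in the role of the primary BIOS program.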
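The upgrade order and the fallback on failure can be summarised in a short Python sketch. The `flash` callback is a hypothetical stand-in for "obtain read/write permission and write the new image"; it is assumed to return `True` on success. The standby list is assumed to be already ordered by the chosen policy (for example, descending priority).

```python
from typing import Callable, List, Sequence, Tuple


def upgrade_all(
    primary: str,
    standbys: Sequence[str],
    flash: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Upgrade the primary image first, then the standby images.

    Returns (image that should be primary afterwards, images upgraded).
    """
    if not standbys:
        raise ValueError("at least one standby BIOS program is required")
    upgraded: List[str] = []
    if not flash(primary):
        # A failed upgrade of the primary is treated as a BIOS fault:
        # fail over to the first standby and leave the others untouched.
        return standbys[0], upgraded
    upgraded.append(primary)
    for name in standbys:
        # Standby images are upgraded only after the primary succeeded.
        if flash(name):
            upgraded.append(name)
    return primary, upgraded
```

Upgrading the primary first and touching the standby images only after it succeeds keeps at least one known-good image available for failover throughout the procedure.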
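Typically, the management engine (ME) is stored together with the BIOS in the storage medium on the device mainboard. During system startup the ME must complete some device-management configuration, so the stability of the ME also affects the operation of the device during startup.
Because the ME is important to the device during startup, and because a failed ME usually cannot recover automatically and needs its firmware reloaded, the embodiments of the present invention may also store multiple ME programs in one device. When the first ME program, which acts as the primary ME program, fails, operation can switch to another ME to keep the device running normally.
The first ME program, acting as the primary ME program, is usually imaged onto the device mainboard. The mainboard can send the hardware information of the device to the out-of-band management software, which can then communicate with the mainboard over SMLink (System Management Link) according to that hardware information. As shown in FIG. 7, when an abnormality of the first ME program is detected, or when the first ME itself reports an abnormality, the out-of-band management software can determine the second ME program as the primary ME program and trigger the second ME to start the device in the role of the primary ME program. The out-of-band management software may also send a reset instruction to the first ME so that the first ME resets and copies the firmware from the second ME.
In general, one storage medium holds one BIOS program and one ME program, that is, the first BIOS program and the first ME program are usually stored in the same storage medium. Therefore, when the first BIOS program or the first ME program is determined to be abnormal, the second BIOS program and the second ME program in another storage medium are usually determined as the primary BIOS program and the primary ME program. Multiple BIOS programs and/or multiple ME programs may of course also be stored in the same storage medium, which is not limited in the embodiments of the present invention.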
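The ME failover and recovery sequence can be sketched in Python as follows. The `MeImage` type and the `boot_with` callback are illustrative assumptions; real ME firmware recovery goes through platform-specific mechanisms that are not modelled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MeImage:
    name: str
    firmware: bytes


def handle_me_fault(
    first: MeImage,
    second: MeImage,
    boot_with: Callable[[MeImage], None],
) -> MeImage:
    """Switch to the second ME, then restore the first ME from it."""
    # The second ME becomes the primary ME and starts the device.
    boot_with(second)
    # Model of "reset the first ME and copy the firmware from the second ME".
    first.firmware = second.firmware
    return second
```

After the call, the first ME holds the same firmware as the second ME, so it can serve as a standby again if the copy actually succeeded on real hardware, which this sketch simply assumes.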
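With the foregoing embodiments, the device is restarted with the second BIOS program acting as the primary BIOS program once the first BIOS program is determined to have failed, so that the BIOS fault is resolved within a short time, the service processing performed by the device is not interrupted for a long period, and the loss caused by the BIOS program fault is reduced. To determine whether the first BIOS program has failed, the watchdog can monitor the software signals of the first BIOS program, the out-of-band management software can monitor the signals that the first BIOS program sends through the hardware interface, and the CPLD can monitor the hardware signals of the storage medium holding the first BIOS program. Because the first BIOS program is monitored from several angles, a failure can be detected and resolved promptly in different scenarios. When the BIOS firmware needs to be upgraded, the out-of-band management software can perform the upgrade while the device is processing services, which avoids the inconvenience of interrupting service for a firmware upgrade. In addition, the ME can also be monitored in the embodiments of the present invention, and operation switches to the second ME when the ME fails.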
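Based on the same technical concept, an embodiment of the present invention further provides an exception handling apparatus for a BIOS program, which implements the foregoing method embodiments. FIG. 8 is a schematic structural diagram of the apparatus. As shown in the figure, the apparatus includes a determining module 801 and a triggering module 802, and may further include an upgrade module 803.
The determining module 801 is configured to determine that a first BIOS program has failed, where the first BIOS program is one of N BIOS programs, N is an integer greater than or equal to 2, and the first BIOS program is the primary BIOS program that was used to start a physical device before the first BIOS program failed; and, after determining that the first BIOS program has failed, to determine that a second BIOS program among the N BIOS programs is the primary BIOS program.
The triggering module 802 is configured to trigger the second BIOS program to start the device in the role of the primary BIOS program after the determining module determines that the second BIOS program is the primary BIOS program.
Further, the apparatus may include the upgrade module 803, configured to: obtain read/write permission for the storage medium holding the first BIOS program; upgrade the first BIOS program; and, if the first BIOS program is upgraded successfully, upgrade the second BIOS program.
Optionally, the determining module 801 may determine, by using a watchdog, whether the first BIOS program has failed.
Optionally, the determining module 801 may also monitor the signals sent by the first BIOS through a hardware interface and, if no such signal is detected within a preset time, determine that the first BIOS has failed.
Optionally, the determining module 801 may also monitor, by using a CPLD, the hardware interface signal of the storage medium holding the first BIOS program and, if the CPLD detects that this signal is abnormal, determine that the first BIOS program has failed.
Optionally, if the determining module 801 detects through the CPLD that the first BIOS program has failed, the triggering module 802 is further configured to clear the flag in the CPLD that indicates the failure of the first BIOS program, so that the CPLD triggers the second BIOS program to start the device in the role of the primary BIOS program.
Optionally, the determining module 801 may be further configured to: determine that a first management engine (ME) has failed, where the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME that was used to start the physical device before the first ME failed; and, after determining that the first ME has failed, determine that a second ME among the M MEs is the primary ME. In this case, the triggering module 802 is further configured to trigger the second ME to start the device in the role of the primary ME.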
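An embodiment of the present invention further provides a computer-readable storage medium that stores computer-executable instructions; the BMC executes these instructions to implement the foregoing embodiments of the exception handling method for a BIOS program.
An embodiment of the present invention further provides a computer program product that includes computer-executable instructions stored in a computer-readable storage medium. The BMC can read the instructions from the computer-readable storage medium and execute them to implement the embodiments of the exception handling method for a BIOS program.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make further changes and modifications to these embodiments once the basic inventive concept is known. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.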

Claims (14)

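  1. An exception handling method for a basic input/output system (BIOS) program, wherein the method comprises:
    determining that a first BIOS program has failed, wherein the first BIOS program is one of N BIOS programs, N is a positive integer greater than or equal to 2, and the first BIOS program is the primary BIOS program that was used to start a physical device before the first BIOS program failed;
    after determining that the first BIOS program has failed, determining that a second BIOS program among the N BIOS programs is the primary BIOS program; and
    after determining that the second BIOS program is the primary BIOS program, triggering the second BIOS program to start the device in the role of the primary BIOS program.
  2. The method according to claim 1, further comprising:
    obtaining read/write permission for the storage medium holding the first BIOS program;
    upgrading the first BIOS program; and
    if the first BIOS program is upgraded successfully, upgrading the second BIOS program.
  3. The method according to claim 1, wherein whether the first BIOS program has failed is determined by using a watchdog.
  4. The method according to claim 1, wherein the determining that a first BIOS program has failed comprises:
    if no signal sent by the first BIOS through a hardware interface is detected within a preset time, determining that the first BIOS has failed.
  5. The method according to claim 1, wherein the determining that a first BIOS program has failed comprises:
    monitoring, by using a complex programmable logic device (CPLD), the hardware interface signal of the storage medium holding the first BIOS program; and
    if the CPLD detects that the hardware interface signal of the storage medium holding the first BIOS program is abnormal, determining that the first BIOS program has failed.
  6. The method according to claim 5, wherein the triggering the second BIOS program to start the device in the role of the primary BIOS program comprises:
    clearing the flag in the CPLD that indicates the failure of the first BIOS program, so that the CPLD triggers the second BIOS program to start the device in the role of the primary BIOS program.
  7. The method according to claim 1, further comprising:
    determining that a first management engine (ME) has failed, wherein the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME that was used to start the physical device before the first ME failed;
    after determining that the first ME has failed, determining that a second ME among the M MEs is the primary ME; and
    triggering the second ME to start the device in the role of the primary ME.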
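  8. An exception handling apparatus for a basic input/output system (BIOS) program, comprising:
    a determining module, configured to determine that a first BIOS program has failed, wherein the first BIOS program is one of N BIOS programs, N is a positive integer greater than or equal to 2, and the first BIOS program is the primary BIOS program that was used to start a physical device before the first BIOS program failed; and, after determining that the first BIOS program has failed, to determine that a second BIOS program among the N BIOS programs is the primary BIOS program; and
    a triggering module, configured to trigger the second BIOS program to start the device in the role of the primary BIOS program after the determining module determines that the second BIOS program is the primary BIOS program.
  9. The apparatus according to claim 8, further comprising an upgrade module configured to:
    obtain read/write permission for the storage medium holding the first BIOS program; upgrade the first BIOS program; and, if the first BIOS program is upgraded successfully, upgrade the second BIOS program.
  10. The apparatus according to claim 8, wherein the determining module is specifically configured to:
    determine, by using a watchdog, whether the first BIOS program has failed.
  11. The apparatus according to claim 8, wherein the determining module is specifically configured to:
    if no signal sent by the first BIOS through a hardware interface is detected within a preset time, determine that the first BIOS has failed.
  12. The apparatus according to claim 8, wherein the determining module is specifically configured to:
    monitor, by using a complex programmable logic device (CPLD), the hardware interface signal of the storage medium holding the first BIOS program; and
    if the CPLD detects that the hardware interface signal of the storage medium holding the first BIOS program is abnormal, determine that the first BIOS program has failed.
  13. The apparatus according to claim 12, wherein the triggering module is specifically configured to:
    clear the flag in the CPLD that indicates the failure of the first BIOS program, so that the CPLD triggers the second BIOS program to start the device in the role of the primary BIOS program.
  14. The apparatus according to claim 8, wherein the determining module is further configured to:
    determine that a first management engine (ME) has failed, wherein the first ME is one of M MEs, M is an integer greater than or equal to 2, and the first ME is the primary ME that was used to start the physical device before the first ME failed; and, after determining that the first ME has failed, determine that a second ME among the M MEs is the primary ME;
    and the triggering module is further configured to trigger the second ME to start the device in the role of the primary ME.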
PCT/CN2017/100375 2016-11-24 2017-09-04 Exception handling method and apparatus for a BIOS program WO2018095107A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611050425.9A CN106776282A (zh) 2016-11-24 2016-11-24 Exception handling method and apparatus for a BIOS program
CN201611050425.9 2016-11-24

Publications (1)

Publication Number Publication Date
WO2018095107A1 (zh) 2018-05-31

Family

ID=58910670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/100375 WO2018095107A1 (zh) 2016-11-24 2017-09-04 Exception handling method and apparatus for a BIOS program

Country Status (2)

Country Link
CN (1) CN106776282A (zh)
WO (1) WO2018095107A1 (zh)

Also Published As

Publication number Publication date
CN106776282A (zh) 2017-05-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17874434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17874434

Country of ref document: EP

Kind code of ref document: A1