WO2022037014A1 - Boot restoration method for ARM server, and related apparatus

Info

Publication number
WO2022037014A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
error counter
startup
boot
preset threshold
Prior art date
Application number
PCT/CN2021/073359
Other languages
French (fr)
Chinese (zh)
Inventor
孙秀强
黄家明
乔英良
李道童
王兵
李勋堂
张炳会
孙良勇
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2022037014A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating

Definitions

  • the present application relates to the technical field of servers, and in particular to a startup repair method for an ARM server; it also relates to a startup repair apparatus, a device, and a computer-readable storage medium for an ARM server.
  • the Quicksilver processor is a processor chip designed by Ampere on the v8 architecture licensed from ARM, with 80 64-bit ARM (Advanced RISC Machines) processor cores.
  • its SCP (System Control Processor) firmware includes the management programs for the SMpro (System Management Program) and PMpro (Power Management Program) microcontrollers. The SMpro microcontroller manages the entire system, including the secure boot mechanism, processor clocks and resets, system boot, power failure detection, and error handling.
  • the PMpro microcontroller provides power management functions, including ACPI (Advanced Configuration and Power Interface) P-state (power state) control, dynamic voltage and frequency adjustment, dynamic power consumption evaluation, and over-temperature protection.
  • the Quicksilver processor secure boot scheme involves multiple modules, including SMpro, PMpro, ATF (ARM Trusted Firmware), and UEFI (Unified Extensible Firmware Interface).
  • the SMpro boot program follows the ARM platform's TBBR (Trusted Board Boot Requirements) specification and performs SLM (Source Library Maintenance) header security verification on the SEC phase and image files of SMpro; only after this verification succeeds can boot continue to the PMpro stage. PMpro likewise follows the TBBR specification of the ARM platform to perform SLM header security verification on its key and content. Once the SCP firmware secure boot verification is complete, the ATF program can be started.
  • the ATF firmware includes stages such as BL1 (Boot Loader stage 1), BL2 (Boot Loader stage 2), BL31 (Boot Loader stage 3-1), BL32 (Boot Loader stage 3-2), and BL33 (Boot Loader stage 3-3), and each stage also follows the ARM platform TBBR specification for security verification. When this verification is complete, the BL33 stage jumps to the UEFI firmware for normal boot. An incorrect processor core voltage, memory voltage, memory speed, or memory working mode setting will prevent the system from starting normally, which poses a great challenge to the batch deployment and application maintenance of ARM servers.
  • the purpose of this application is to provide a startup repair method for an ARM server, which can automatically repair the wrong settings of the ARM server and ensure that the server can be started normally.
  • Another object of the present application is to provide a startup repair apparatus, a device, and a computer-readable storage medium for an ARM server, all of which have the above technical effects.
  • the present application provides a startup repair method for an ARM server, including:
  • the configuration information of the ARM processor is restored to the default setting, and after the configuration information of the ARM processor is restored to the default setting, the boot-up phase is restarted.
  • the condition for triggering judgment as to whether the value of the error counter reaches the preset threshold is that any one of the startup boot phases in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds the preset value.
  • the boot-up phase includes:
  • the changing the value of the error counter according to a preset rule includes:
  • the value of the error counter is restored to the initial value.
  • restoring the value of the error counter to an initial value including:
  • the value of the error counter is restored to the initial value before the end of the UEFI phase.
  • the present application also provides a device for repairing an ARM server, including:
  • initialization module used to initialize the value of the error counter
  • a judgment module used for judging whether the value of the error counter reaches a preset threshold whenever any one of the startup boot stages in the ARM server startup process fails to execute;
  • a changing module configured to change the value of the error counter according to a preset rule and restart the boot-up phase if the preset threshold is not reached;
  • a restoration module configured to restore the configuration information of the ARM processor to the default setting if the preset threshold is reached, and to restart the boot stage after the configuration information of the ARM processor has been restored to the default setting.
  • the present application also provides a boot repair device for an ARM server, including:
  • the processor is configured to implement the steps of the ARM server startup repair method according to any one of the above when executing the computer program.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the startup repair method for an ARM server described in any of the above are implemented.
  • the startup repair method for an ARM server includes: initializing the value of an error counter; whenever any one of the boot stages in the startup process of the ARM server fails to execute, judging whether the value of the error counter reaches a preset threshold; if the preset threshold is not reached, changing the value of the error counter according to a preset rule and restarting the boot stage; if the preset threshold is reached, restoring the configuration information of the ARM processor to the default setting and, after that restoration, restarting the boot stage.
  • the boot-repair apparatus, device and computer-readable storage medium for an ARM server provided by the present application all have the above technical effects.
  • FIG. 1 is a schematic flowchart of a startup repair method for an ARM server provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a startup repair device for an ARM server provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a startup repair device for an ARM server provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the core of the present application is to provide a startup repair method for an ARM server, which can automatically repair the wrong settings of the ARM server and ensure that the server can be started normally.
  • Another core of the present application is to provide a boot repair device, equipment and computer-readable storage medium for an ARM server, all of which have the above-mentioned technical effects.
  • FIG. 1 is a schematic flowchart of an ARM server startup repair method provided by an embodiment of the present application. Referring to FIG. 1, the method mainly includes:
  • the present application adds a counter, that is, an error counter, on the SMpro of the SCP firmware, and counts the number of restarts through the error counter.
  • the SMpro of the SCP firmware initializes the value of the error counter, such as initializing the value of the error counter to zero.
  • the booting phase includes a SMpro phase, a PMpro phase, an ATF phase, and a UEFI phase.
  • the SMpro stage, the PMpro stage, the ATF stage, and the UEFI stage are executed sequentially. Specifically, if the SMpro stage executes successfully, boot continues to the PMpro stage. If the PMpro stage executes successfully, the ATF stage is executed, including BL1, BL2, BL31, BL32, BL33 and other stages. If the ATF stage executes successfully, the BL33 stage jumps to the UEFI stage. If the UEFI stage executes successfully, the system boots to the OS, that is, the operating system.
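As a rough illustration of this sequential hand-off, the Python sketch below runs four stage callables in order and reports the first one that fails. The names `STAGE_ORDER` and `run_boot`, and the choice to model each stage as a function returning a boolean, are assumptions made for illustration only; the source does not describe the firmware interfaces.

```python
from typing import Callable, Dict, Optional

# Stage order as stated in the text; the identifiers are illustrative.
STAGE_ORDER = ["SMpro", "PMpro", "ATF", "UEFI"]


def run_boot(stages: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the boot stages in order.

    Returns the name of the first stage that fails, or None when every
    stage succeeds and the system can hand over to the OS.
    """
    for name in STAGE_ORDER:
        if not stages[name]():
            return name  # later stages are never attempted
    return None
```

A stage that fails stops the chain, so a failure in ATF means the UEFI callable is never invoked.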
  • under normal conditions the above boot stages execute in sequence. After the configuration information has been modified manually, however, one or more of the boot stages may fail to execute, so that the ARM server cannot start normally. Therefore, in the present application, whenever any one of the above boot stages fails to execute, the SMpro firmware determines whether the value of the error counter has reached a preset threshold. The preset threshold represents the maximum number of restarts; that is, the SMpro firmware determines whether the number of restarts has reached the preset threshold.
  • the preset threshold is 3.
  • the above-mentioned condition for triggering the judgment on whether the value of the error counter reaches the preset threshold is that any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds the preset value. That is, whenever any one of the boot stages fails and the corresponding watchdog timer exceeds the preset value, the SMpro firmware determines whether the value of the error counter reaches the preset threshold.
  • when the SMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter reaches the preset threshold.
  • when the PMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter reaches the preset threshold.
  • when the ATF stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter reaches the preset threshold.
  • when the UEFI stage fails to boot to the OS, so that the system cannot be booted normally, and the BMC's FRB-2 watchdog timer exceeds the preset value, the BMC informs the SMpro firmware that the UEFI stage has failed, and the SMpro firmware determines whether the value of the error counter reaches the preset threshold.
  • the above-mentioned preset value can be set to any value greater than the power-on time.
  • the value of the error counter is changed according to the preset rule, and the boot stage is restarted. Specifically, the first boot stage, that is, the SMpro stage, is restarted. If the SMpro stage executes successfully, the PMpro stage is executed automatically; if the PMpro stage executes successfully, the ATF stage is executed, and so on. After the restart, the judgment on whether the value of the error counter reaches the preset threshold is again performed whenever any of the boot stages fails and the corresponding watchdog timer exceeds the preset value.
  • the above-mentioned changing of the value of the error counter according to the preset rule includes: increasing the value of the error counter by one. Specifically, whenever any of the boot stages in the ARM server startup process fails, it is determined whether the value of the error counter reaches the preset threshold; if not, the value of the error counter is incremented by one and the boot stage is restarted. In this way, the value of the error counter corresponds to the number of restarts, and when the number of restarts reaches its maximum, the value of the error counter reaches the preset threshold.
  • S104 If the preset threshold is reached, restore the configuration information of the ARM processor to the default setting, and restart the boot stage after restoring the configuration information of the ARM processor to the default setting.
  • the backup NVRAM Non-Volatile Random Access Memory, non-volatile random access memory
  • BIOS Basic Input/Output System
  • taking an increment of one per restart and a preset threshold of 3 as an example:
  • if the SMpro stage fails to execute, judge whether the value of the error counter is 3. If it is 3, restore the configuration information of the ARM processor to the default setting; if it is not 3, restart the SMpro stage and increment the error counter by one. If the SMpro stage executes successfully, the PMpro stage is executed. If the PMpro stage fails to execute, judge whether the value of the error counter is 3. If it is 3, restore the configuration information of the ARM processor to the default setting; if it is not 3, restart the SMpro stage and increment the error counter by one. If the PMpro stage executes successfully, the ATF stage is executed.
  • if the ATF stage fails to execute, judge whether the value of the error counter is 3. If it is 3, restore the configuration information of the ARM processor to the default setting; if it is not 3, restart the SMpro stage and increment the error counter by one. If the ATF stage executes successfully, the UEFI stage is executed.
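The walk-through above can be traced with a small simulation. This is only a sketch under stated assumptions: the function name `simulate`, the `config_bad` flag standing in for a wrong manual setting, and the `max_attempts` guard are all hypothetical, and the sketch assumes that restoring the defaults makes the failing stage succeed on the next attempt.

```python
from typing import List, Tuple


def simulate(failing_stage: str, threshold: int = 3,
             max_attempts: int = 10) -> List[Tuple[int, str, int]]:
    """Trace boot attempts as (attempt number, action, counter value)."""
    stages = ["SMpro", "PMpro", "ATF", "UEFI"]
    counter = 0        # error counter initialised to zero
    config_bad = True  # assumed: a manual setting breaks one stage
    trace = []
    for attempt in range(1, max_attempts + 1):
        # Stages run in order; the first failing one aborts the attempt.
        failed = next(
            (s for s in stages if config_bad and s == failing_stage), None)
        if failed is None:
            counter = 0  # cleared once the boot succeeds
            trace.append((attempt, "boot_ok", counter))
            break
        if counter == threshold:
            config_bad = False  # restore default configuration, then restart
            trace.append((attempt, "restore_defaults", counter))
        else:
            counter += 1        # count this restart
            trace.append((attempt, "restart", counter))
    return trace
```

With a threshold of 3 and a persistent ATF failure, `simulate("ATF")` records three counted restarts, then a restore of the defaults on the fourth failure, then a successful fifth attempt with the counter back at zero.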
  • when the ARM server starts normally, the value of the error counter is restored to the initial value. Specifically, the value of the error counter may be restored to the initial value before the UEFI stage ends. That is, when the UEFI stage has encountered no problem, the value of the error counter is restored to the initial value, for example cleared, before booting to the OS.
  • the startup repair method for an ARM server includes: initializing the value of an error counter; whenever any one of the boot stages in the startup process of the ARM server fails to execute, judging whether the value of the error counter reaches the preset threshold; if the preset threshold is not reached, changing the value of the error counter according to the preset rule and restarting the boot stage; if the preset threshold is reached, restoring the configuration information of the ARM processor to the default setting and, after that restoration, restarting the boot stage.
  • an error counter is added to count the number of restarts, and when the value of the error counter reaches a preset threshold, the configuration information of the ARM processor is restored to the default setting and restarted. Therefore, by restoring the configuration information of the ARM processor to the default settings, the ARM server can be guaranteed to start normally, and the problem that the ARM server cannot be started normally caused by artificial modification of the configuration information can be effectively solved.
  • FIG. 2 is a schematic diagram of an apparatus for booting up an ARM server according to an embodiment of the present application.
  • the apparatus includes:
  • an initialization module 10 for initializing the value of the error counter
  • the judgment module 20 is used for judging whether the value of the error counter reaches a preset threshold value whenever any one of the startup boot stages in the ARM server startup process fails to execute;
  • the changing module 30 is used for changing the value of the error counter according to the preset rule and restarting the boot phase if the preset threshold is not reached;
  • the restoring module 40 is configured to restore the configuration information of the ARM processor to the default setting if the preset threshold is reached, and restart the boot stage after restoring the configuration information of the ARM processor to the default setting.
  • the booting phase includes: a SMpro phase, a PMpro phase, an ATF phase, and a UEFI phase.
  • the condition for triggering the judgment of whether the value of the error counter reaches the preset threshold is that any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds the preset value.
  • the judging module 20 includes:
  • the first judgment unit is used for judging whether the value of the error counter reaches the preset threshold when the execution of the SMpro stage fails and the timing of the watchdog timer of the SCP firmware exceeds the preset value;
  • the second judgment unit is used to judge whether the value of the error counter reaches the preset threshold when the execution of the PMpro stage fails and the watchdog timer of the SCP firmware exceeds the preset value;
  • the third judgment unit is used to judge whether the value of the error counter reaches the preset threshold when the execution of the ATF stage fails and the watchdog timer of the SCP firmware exceeds the preset value;
  • the fourth judgment unit is configured to judge whether the value of the error counter reaches the preset threshold when the UEFI stage fails to execute and the FRB-2 watchdog timer of the BMC exceeds the preset value.
  • the changing module 30 is specifically configured to increase the value of the error counter by one if the preset threshold is not reached, and restart the boot phase.
  • the count value restoration module is used to restore the value of the error counter to the initial value after the ARM server starts normally.
  • the count value restoration module is specifically configured to restore the value of the error counter to the initial value before the UEFI stage ends.
  • the present application also provides a boot-repair device for an ARM server.
  • the device includes a memory 1 and a processor 2 .
  • the memory 1 is used to store the computer program;
  • the processor 2 is used to execute the computer program to realize the following steps:
  • the present application also provides a computer-readable storage medium 4. As shown in FIG. 4, a computer program 41 is stored on the computer-readable storage medium 4. When the computer program 41 is executed by the processor, the following steps can be implemented:
  • the computer-readable storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • random access memory (RAM)
  • read-only memory (ROM)
  • electrically programmable ROM
  • electrically erasable programmable ROM (EEPROM)
  • registers
  • hard disk
  • removable disk
  • CD-ROM (compact disc read-only memory)

Abstract

A boot restoration method for an ARM server. The method comprises: initializing the value of an error counter (S101); each time that the execution of any boot loader stage during a boot process of an ARM server is unsuccessful, determining whether the value of the error counter reaches a preset threshold value (S102); if the value does not reach the preset threshold value, changing the value of the error counter according to a preset rule, and restarting the boot loader stage (S103); and if the value reaches the preset threshold value, restoring configuration information of an ARM processor to a default setting, and restarting the boot loader stage after the configuration information of the ARM processor is restored to the default setting (S104). By means of the method, an error setting of an ARM server can be automatically restored, thereby ensuring that the server can normally boot up. Further disclosed are a boot restoration apparatus for an ARM server, and a device and a computer-readable storage medium, which all have said technical effects.

Description

A startup repair method for an ARM server, and related apparatus
This application claims priority to Chinese patent application No. 202010851698.3, entitled "A Startup Repair Method for an ARM Server and Related Apparatus", filed with the China National Intellectual Property Administration on August 21, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of servers, and in particular to a startup repair method for an ARM server; it also relates to a startup repair apparatus, a device, and a computer-readable storage medium for an ARM server.
Background
The Quicksilver processor is a processor chip designed by Ampere on the v8 architecture licensed from ARM, with 80 64-bit ARM (Advanced RISC Machines) processor cores. Its SCP (System Control Processor) firmware contains the management programs for the SMpro (System Management Program) and PMpro (Power Management Program) microcontrollers. The SMpro microcontroller manages the entire system, including the secure boot mechanism, processor clocks and resets, system boot, power failure detection, and error handling. The PMpro microcontroller provides power management functions, including ACPI (Advanced Configuration and Power Interface) P-state (power state) control, dynamic voltage and frequency adjustment, dynamic power consumption evaluation, and over-temperature protection. The Quicksilver processor secure boot scheme involves multiple modules, including SMpro, PMpro, ATF (ARM Trusted Firmware), and UEFI (Unified Extensible Firmware Interface). The SMpro boot program follows the ARM platform's TBBR (Trusted Board Boot Requirements) specification and performs SLM (Source Library Maintenance) header security verification on the SEC phase and image files of SMpro; only after this verification succeeds can boot continue to the PMpro stage. PMpro likewise follows the TBBR specification of the ARM platform to perform SLM header security verification on its key and content. Once the SCP firmware secure boot verification is complete, the ATF program can be started. The ATF firmware includes stages such as BL1 (Boot Loader stage 1), BL2 (Boot Loader stage 2), BL31 (Boot Loader stage 3-1), BL32 (Boot Loader stage 3-2), and BL33 (Boot Loader stage 3-3), and each stage also follows the ARM platform TBBR specification for security verification. When this verification is complete, the BL33 stage jumps to the UEFI firmware for normal boot. An incorrect processor core voltage, memory voltage, memory speed, or memory working mode setting will prevent the system from starting normally, which poses a great challenge to the batch deployment and application maintenance of ARM servers.
Therefore, how to automatically repair incorrect settings of an ARM server has become a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The purpose of this application is to provide a startup repair method for an ARM server that can automatically repair incorrect settings of the ARM server and ensure that the server starts normally. Another purpose of this application is to provide a startup repair apparatus, a device, and a computer-readable storage medium for an ARM server, all of which have the above technical effects.
To solve the above technical problem, the present application provides a startup repair method for an ARM server, including:
initializing the value of an error counter;
whenever any one of the boot stages in the startup process of the ARM server fails to execute, judging whether the value of the error counter reaches a preset threshold;
if the preset threshold is not reached, changing the value of the error counter according to a preset rule, and restarting the boot stage;
if the preset threshold is reached, restoring the configuration information of the ARM processor to the default setting and, after the configuration information of the ARM processor has been restored to the default setting, restarting the boot stage.
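The decision made on each failure can be sketched as a single function. The following Python is a minimal illustration, not the firmware implementation: `BootState`, `on_stage_failure`, the returned action strings, and the default threshold of 3 are assumed names and values, and "increment by one" is used as the preset rule.

```python
from dataclasses import dataclass

PRESET_THRESHOLD = 3  # assumed value for illustration


@dataclass
class BootState:
    error_counter: int = 0          # initialised before the first boot
    config_is_default: bool = False


def on_stage_failure(state: BootState,
                     threshold: int = PRESET_THRESHOLD) -> str:
    """Handle one failed boot stage and return the action taken."""
    if state.error_counter >= threshold:
        # Threshold reached: fall back to the default configuration,
        # then restart the boot stages.
        state.config_is_default = True
        return "restore_defaults_and_restart"
    # Threshold not reached: apply the preset rule and restart.
    state.error_counter += 1
    return "restart"
```

Calling `on_stage_failure` four times on a fresh `BootState` yields three plain restarts followed by a restore of the defaults, which matches the role of the threshold as the maximum number of restarts.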
Optionally, the condition that triggers judging whether the value of the error counter reaches the preset threshold is that any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds a preset value.
Optionally, the boot stages include:
an SMpro stage, a PMpro stage, an ATF stage, and a UEFI stage.
Optionally, judging whether the value of the error counter reaches the preset threshold whenever any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds the preset value includes:
when the SMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, judging whether the value of the error counter reaches the preset threshold;
when the PMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, judging whether the value of the error counter reaches the preset threshold;
when the ATF stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, judging whether the value of the error counter reaches the preset threshold;
when the UEFI stage fails to execute and the FRB-2 (Fault Resilient Booting, level 2) watchdog timer of the BMC (Baseboard Management Controller) exceeds the preset value, judging whether the value of the error counter reaches the preset threshold.
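The pairing of stages and watchdog timers listed above can be written as a lookup table plus a trigger check. This is a hedged sketch: `WATCHDOG_FOR_STAGE`, `watchdog_trigger`, and the use of seconds for the elapsed and preset times are assumptions for illustration, since the source does not give units or an interface.

```python
from typing import Optional

# Stage-to-watchdog pairing taken from the list above; names are illustrative.
WATCHDOG_FOR_STAGE = {
    "SMpro": "SCP watchdog",
    "PMpro": "SCP watchdog",
    "ATF": "SCP watchdog",
    "UEFI": "BMC FRB-2 watchdog",
}


def watchdog_trigger(stage: str, stage_ok: bool,
                     elapsed_s: float, preset_s: float) -> Optional[str]:
    """Return the watchdog that triggers the counter check, or None.

    The check is due only when the stage has failed AND its watchdog
    timer has run past the preset value.
    """
    if stage not in WATCHDOG_FOR_STAGE:
        raise ValueError(f"unknown boot stage: {stage}")
    if not stage_ok and elapsed_s > preset_s:
        return WATCHDOG_FOR_STAGE[stage]
    return None
```

Both conditions must hold, so a failed stage whose timer has not yet expired does not trigger the threshold check.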
Optionally, changing the value of the error counter according to the preset rule includes:
incrementing the value of the error counter by one.
Optionally, the method further includes:
after the ARM server starts normally, restoring the value of the error counter to the initial value.
Optionally, restoring the value of the error counter to the initial value after the ARM server starts normally includes:
restoring the value of the error counter to the initial value before the UEFI stage ends.
To solve the above technical problem, the present application further provides a startup repair apparatus for an ARM server, including:
an initialization module, configured to initialize the value of an error counter;
a judgment module, configured to determine, whenever any boot stage in the ARM server startup process fails, whether the value of the error counter has reached a preset threshold;
a changing module, configured to change the value of the error counter according to a preset rule and restart the boot stages if the preset threshold has not been reached;
a restoration module, configured to restore the configuration information of the ARM processor to default settings if the preset threshold has been reached, and to restart the boot stages after the configuration information has been restored to the default settings.
To solve the above technical problem, the present application further provides a startup repair device for an ARM server, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the startup repair method for an ARM server according to any one of the above when executing the computer program.
To solve the above technical problem, the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the startup repair method for an ARM server according to any one of the above are implemented.
The startup repair method for an ARM server provided by the present application includes: initializing the value of an error counter; whenever any boot stage in the ARM server startup process fails, determining whether the value of the error counter has reached a preset threshold; if the preset threshold has not been reached, changing the value of the error counter according to a preset rule and restarting the boot stages; and if the preset threshold has been reached, restoring the configuration information of the ARM processor to default settings and, after the restoration, restarting the boot stages.
Thus, the startup repair method provided by the present application adds an error counter that counts the number of restarts and, when the value of the error counter reaches the preset threshold, restores the configuration information of the ARM processor to default settings and restarts. Restoring the configuration information to the default settings allows the ARM server to start normally, which effectively solves the problem of the ARM server failing to start because the configuration information was manually modified.
The startup repair apparatus, device, and computer-readable storage medium for an ARM server provided by the present application all have the above technical effects.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a startup repair method for an ARM server provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a startup repair apparatus for an ARM server provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a startup repair device for an ARM server provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application.
Detailed Description
The core of the present application is to provide a startup repair method for an ARM server that can automatically repair incorrect settings of the ARM server and ensure that the server starts normally. Another core of the present application is to provide a startup repair apparatus, device, and computer-readable storage medium for an ARM server, all of which have the above technical effects.
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, which is a schematic flowchart of a startup repair method for an ARM server provided by an embodiment of the present application, the method mainly includes:
S101: Initialize the value of the error counter;
Specifically, the present application adds a counter, namely the error counter, to the SMpro of the SCP firmware, and uses this error counter to count the number of restarts. When the ARM server is powered on or cold restarted, the SMpro of the SCP firmware initializes the value of the error counter, for example to zero.
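The initialization rule of S101 can be sketched as follows. This is an illustrative sketch only: the names `ResetCause` and `counter_after_reset` are hypothetical, and the assumption that a restart issued by the repair flow leaves the counter untouched is inferred from the need to count restarts across attempts; it is not stated explicitly in the text.

```python
from enum import Enum, auto


class ResetCause(Enum):
    POWER_ON = auto()
    COLD_RESTART = auto()
    # Assumption: a restart issued by the repair flow after a failed boot stage.
    REPAIR_RESTART = auto()


INITIAL_VALUE = 0  # the text gives zero as an example initial value


def counter_after_reset(cause: ResetCause, current: int) -> int:
    """Return the error counter value that SMpro would hold after a reset."""
    if cause in (ResetCause.POWER_ON, ResetCause.COLD_RESTART):
        return INITIAL_VALUE  # S101: initialize on power-on or cold restart
    return current  # assumed: the count survives a repair-flow restart
```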
S102: Whenever any boot stage in the ARM server startup process fails, determine whether the value of the error counter has reached a preset threshold;
Specifically, the boot stages include the SMpro stage, the PMpro stage, the ATF stage, and the UEFI stage, which are executed in that order. If the SMpro stage succeeds, control passes to the PMpro stage. If the PMpro stage succeeds, the ATF stage is executed, which includes sub-stages such as BL1, BL2, BL31, BL32, and BL33. If the ATF stage succeeds, its BL33 sub-stage jumps to the UEFI stage. If the UEFI stage succeeds, the system boots into the OS, that is, the operating system.
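The sequential hand-off between boot stages can be sketched as below. The stage names come from the text; the `stage_ok` callback and the function name `run_boot_chain` are hypothetical stand-ins for the real firmware hand-off, used here only to show that a failure stops the chain at the failing stage.

```python
from enum import IntEnum
from typing import Callable, Optional


class BootStage(IntEnum):
    SMPRO = 0
    PMPRO = 1
    ATF = 2   # covers BL1, BL2, BL31, BL32, BL33; BL33 jumps to UEFI
    UEFI = 3


def run_boot_chain(stage_ok: Callable[[BootStage], bool]) -> Optional[BootStage]:
    """Run the stages in order.

    Returns the first stage that fails, or None if every stage succeeds
    and the chain would hand over to the operating system.
    """
    for stage in BootStage:  # IntEnum iterates in definition order
        if not stage_ok(stage):
            return stage
    return None
```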
Under normal circumstances, the boot stages execute successfully one after another. After the configuration information has been manually modified, however, one or more of the boot stages may fail, so that the ARM server cannot start normally. Therefore, in the present application, whenever any of the boot stages fails, the SMpro firmware determines whether the value of the error counter has reached the preset threshold. The preset threshold represents the maximum number of restarts; in other words, the SMpro firmware determines whether the number of restarts has reached the preset threshold.
The present application does not limit the specific value of the preset threshold, which can be set according to actual needs. For example, the preset threshold is 3.
In addition, the condition that triggers the determination of whether the value of the error counter has reached the preset threshold is that any boot stage in the ARM server startup process fails and the corresponding watchdog timer exceeds a preset value. That is, whenever any boot stage fails and the corresponding watchdog timer exceeds the preset value, the SMpro firmware determines whether the value of the error counter has reached the preset threshold.
Further, determining whether the value of the error counter has reached the preset threshold, whenever any boot stage in the ARM server startup process fails and the corresponding watchdog timer exceeds the preset value, includes:
when the SMpro stage fails and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter has reached the preset threshold;
when the PMpro stage fails and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter has reached the preset threshold;
when the ATF stage fails and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter has reached the preset threshold;
when the UEFI stage fails and the FRB-2 watchdog timer of the BMC exceeds the preset value, determining whether the value of the error counter has reached the preset threshold.
Specifically, when the SMpro stage fails to hand over to the PMpro stage, so that the server cannot boot, and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter has reached the preset threshold. When the PMpro stage fails to hand over to the ATF stage, so that the server cannot boot, and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter has reached the preset threshold. When the ATF stage fails to hand over to the UEFI stage, so that the server cannot boot, and the watchdog timer of the SCP firmware exceeds the preset value, the SMpro firmware determines whether the value of the error counter has reached the preset threshold. When the UEFI stage fails to boot into the OS, so that the system cannot boot, and the FRB-2 watchdog timer of the BMC exceeds the preset value, the BMC notifies the SMpro firmware that the UEFI stage has failed, and the SMpro firmware then determines whether the value of the error counter has reached the preset threshold. The preset value can be set to any value greater than the normal boot time.
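The pairing of boot stages with watchdog timers, and the trigger condition itself, can be sketched as follows. The stage-to-watchdog pairing is taken from the text; the identifiers (`WATCHDOG_FOR_STAGE`, `should_check_counter`) and the use of seconds as the time unit are assumptions made for illustration.

```python
SCP_WATCHDOG = "SCP firmware watchdog timer"
BMC_FRB2_WATCHDOG = "BMC FRB-2 watchdog timer"

# Which watchdog timer covers which boot stage, per the description.
WATCHDOG_FOR_STAGE = {
    "SMpro": SCP_WATCHDOG,
    "PMpro": SCP_WATCHDOG,
    "ATF": SCP_WATCHDOG,
    "UEFI": BMC_FRB2_WATCHDOG,  # the BMC then notifies the SMpro firmware
}


def should_check_counter(stage_failed: bool, elapsed_s: float, preset_s: float) -> bool:
    """Trigger condition: the stage failed AND its watchdog exceeded the preset value.

    preset_s is expected to be larger than the normal boot time, so a slow
    but healthy boot does not trigger the check.
    """
    return stage_failed and elapsed_s > preset_s
```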
S103: If the preset threshold has not been reached, change the value of the error counter according to a preset rule and restart the boot stages;
Specifically, if the value of the error counter has not reached the preset threshold, the value of the error counter is changed according to the preset rule and the boot stages are restarted, beginning with the first boot stage, that is, the SMpro stage. If the SMpro stage succeeds, the PMpro stage is executed automatically; if the PMpro stage succeeds, the ATF stage is executed; and so on. After the restart, the same check applies: whenever any boot stage fails and the corresponding watchdog timer exceeds the preset value, it is determined whether the value of the error counter has reached the preset threshold.
In one specific implementation, changing the value of the error counter according to the preset rule includes incrementing the value of the error counter by one. Specifically, whenever any boot stage in the ARM server startup process fails, it is determined whether the value of the error counter has reached the preset threshold; if not, the value of the error counter is incremented by one and the boot stages are restarted. The value of the error counter therefore corresponds to the number of restarts, and when the number of restarts reaches the maximum, the value of the error counter reaches the preset threshold.
S104: If the preset threshold has been reached, restore the configuration information of the ARM processor to default settings and, after the restoration, restart the boot stages.
Specifically, if the preset threshold has been reached, the backup NVRAM (Non-Volatile Random Access Memory) parameters of the BIOS (Basic Input/Output System) are loaded, which restores the configuration information of the ARM processor to the default settings. After the restoration, the boot stages are restarted so that the ARM server can start normally.
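Loading the backup parameters over the active configuration can be pictured with the following minimal sketch. It models both parameter sets as plain dictionaries, which is an assumption for illustration; the function name `restore_default_config` and the setting names in the usage note are hypothetical, and real firmware would read and write NVRAM variables rather than Python objects.

```python
import copy


def restore_default_config(active: dict, backup_nvram: dict) -> dict:
    """Overwrite the active settings in place with a copy of the backup parameters.

    The backup is deep-copied so that later changes to the active
    configuration cannot corrupt the stored defaults.
    """
    active.clear()
    active.update(copy.deepcopy(backup_nvram))
    return active
```

For instance, an active configuration such as `{"mem_speed": 9999, "extra": 1}` restored from a backup of `{"mem_speed": 3200}` ends up holding only the backup's entries.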
Take as an example an error counter with an initial value of 0 that is incremented by one whenever any boot stage in the ARM server startup process fails, and a preset threshold of 3:
After the ARM server is powered on or cold started, the value of the error counter is initialized to zero;
If the SMpro stage fails, it is determined whether the value of the error counter is 3. If it is 3, the configuration information of the ARM processor is restored to the default settings; if it is not 3, the SMpro stage is restarted and the value of the error counter is incremented by one. If the SMpro stage succeeds, the PMpro stage is executed. If the PMpro stage fails, it is determined whether the value of the error counter is 3. If it is 3, the configuration information of the ARM processor is restored to the default settings; if it is not 3, the SMpro stage is restarted and the value of the error counter is incremented by one. If the PMpro stage succeeds, the ATF stage is executed. Likewise, if the ATF stage fails, it is determined whether the value of the error counter is 3. If it is 3, the configuration information of the ARM processor is restored to the default settings; if it is not 3, the SMpro stage is restarted and the value of the error counter is incremented by one. If the ATF stage succeeds, the UEFI stage is executed. If the UEFI stage fails, it is determined whether the value of the error counter is 3. If it is 3, the configuration information of the ARM processor is restored to the default settings; if it is not 3, the SMpro stage is restarted and the value of the error counter is incremented by one. If the UEFI stage succeeds, the system boots into the OS.
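The example with a threshold of 3 can be traced with a small simulation. This is a sketch under stated assumptions, not the firmware implementation: the failure model (`first_failed_stage`, a configuration key `breaks_stage`), the `max_attempts` safety limit, and the function names are all hypothetical. The counter is cleared on a successful boot, following the optional reset before the end of the UEFI stage.

```python
THRESHOLD = 3  # preset threshold used in the example


def first_failed_stage(config: dict):
    """Hypothetical failure model: a bad setting breaks one named boot stage.

    Returns the name of the failing stage, or None when all stages succeed.
    """
    return config.get("breaks_stage")


def boot_with_repair(config: dict, defaults: dict, max_attempts: int = 10):
    """Simulate repeated boot attempts; return (log, final counter value)."""
    counter = 0  # S101: initialized on power-on or cold restart
    log = []
    for attempt in range(1, max_attempts + 1):
        failed = first_failed_stage(config)
        if failed is None:
            counter = 0  # cleared before the UEFI stage hands over to the OS
            log.append((attempt, "booted"))
            return log, counter
        if counter == THRESHOLD:  # S104: restore defaults, then restart
            config.clear()
            config.update(defaults)
            log.append((attempt, f"{failed} failed: restore defaults"))
        else:  # S103: count the restart and start again from SMpro
            counter += 1
            log.append((attempt, f"{failed} failed: counter={counter}"))
    return log, counter
```

With a configuration that persistently breaks the ATF stage, the first three failures are counted restarts, the fourth failure triggers the restoration, and the fifth attempt boots.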
Further, on the basis of the above embodiment, the method also includes:
after the ARM server starts normally, restoring the value of the error counter to its initial value. Specifically, the value of the error counter may be restored to its initial value before the UEFI stage ends. That is, when the UEFI stage has executed without any problem and is about to boot into the OS, the value of the error counter is restored to its initial value, for example by clearing the error counter to zero.
In summary, the startup repair method for an ARM server provided by the present application includes: initializing the value of an error counter; whenever any boot stage in the ARM server startup process fails, determining whether the value of the error counter has reached a preset threshold; if the preset threshold has not been reached, changing the value of the error counter according to a preset rule and restarting the boot stages; and if the preset threshold has been reached, restoring the configuration information of the ARM processor to default settings and, after the restoration, restarting the boot stages. The method adds an error counter that counts the number of restarts and, when the value of the error counter reaches the preset threshold, restores the configuration information of the ARM processor to default settings and restarts. Restoring the configuration information to the default settings allows the ARM server to start normally, which effectively solves the problem of the ARM server failing to start because the configuration information was manually modified.
The present application further provides a startup repair apparatus for an ARM server; the apparatus described below and the method described above may be read with reference to each other. Referring to FIG. 2, which is a schematic diagram of a startup repair apparatus for an ARM server provided by an embodiment of the present application, the apparatus includes:
an initialization module 10, configured to initialize the value of an error counter;
a judgment module 20, configured to determine, whenever any boot stage in the ARM server startup process fails, whether the value of the error counter has reached a preset threshold;
a changing module 30, configured to change the value of the error counter according to a preset rule and restart the boot stages if the preset threshold has not been reached;
a restoration module 40, configured to restore the configuration information of the ARM processor to default settings if the preset threshold has been reached, and to restart the boot stages after the configuration information has been restored to the default settings.
On the basis of the above embodiment, optionally, the boot stages include: the SMpro stage, the PMpro stage, the ATF stage, and the UEFI stage.
On the basis of the above embodiment, optionally, the condition that triggers the determination of whether the value of the error counter has reached the preset threshold is that any boot stage in the ARM server startup process fails and the corresponding watchdog timer exceeds a preset value.
On the basis of the above embodiment, optionally, the judgment module 20 includes:
a first judgment unit, configured to determine whether the value of the error counter has reached the preset threshold when the SMpro stage fails and the watchdog timer of the SCP firmware exceeds the preset value;
a second judgment unit, configured to determine whether the value of the error counter has reached the preset threshold when the PMpro stage fails and the watchdog timer of the SCP firmware exceeds the preset value;
a third judgment unit, configured to determine whether the value of the error counter has reached the preset threshold when the ATF stage fails and the watchdog timer of the SCP firmware exceeds the preset value;
a fourth judgment unit, configured to determine whether the value of the error counter has reached the preset threshold when the UEFI stage fails and the FRB-2 watchdog timer of the BMC exceeds the preset value.
On the basis of the above embodiment, the changing module 30 is specifically configured to increment the value of the error counter by one and restart the boot stages if the preset threshold has not been reached.
On the basis of the above embodiment, optionally, the apparatus further includes:
a count value restoration module, configured to restore the value of the error counter to its initial value after the ARM server starts normally.
On the basis of the above embodiment, optionally, the count value restoration module is specifically configured to restore the value of the error counter to its initial value before the UEFI stage ends.
The present application further provides a startup repair device for an ARM server. Referring to FIG. 3, the device includes a memory 1 and a processor 2. The memory 1 is configured to store a computer program; the processor 2 is configured to execute the computer program to implement the following steps:
initializing the value of an error counter; whenever any boot stage in the ARM server startup process fails, determining whether the value of the error counter has reached a preset threshold; if the preset threshold has not been reached, changing the value of the error counter according to a preset rule and restarting the boot stages; and if the preset threshold has been reached, restoring the configuration information of the ARM processor to default settings and, after the restoration, restarting the boot stages.
For details of the device provided by the present application, refer to the above method embodiments; they are not repeated here.
The present application further provides a computer-readable storage medium 4. As shown in FIG. 4, a computer program 41 is stored on the computer-readable storage medium 4; when the computer program 41 is executed by a processor, the following steps can be implemented:
initializing the value of an error counter; whenever any boot stage in the ARM server startup process fails, determining whether the value of the error counter has reached a preset threshold; if the preset threshold has not been reached, changing the value of the error counter according to a preset rule and restarting the boot stages; and if the preset threshold has been reached, restoring the configuration information of the ARM processor to default settings and, after the restoration, restarting the boot stages.
The computer-readable storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
For details of the computer-readable storage medium provided by the present application, refer to the above method embodiments; they are not repeated here.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be read with reference to each other. Because the apparatus, device, and computer-readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The technical solutions provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (10)

  1. A startup repair method for an ARM server, characterized by comprising:
    initializing the value of an error counter;
    whenever any boot stage in the ARM server startup process fails, determining whether the value of the error counter has reached a preset threshold;
    if the preset threshold has not been reached, changing the value of the error counter according to a preset rule and restarting the boot stages;
    if the preset threshold has been reached, restoring the configuration information of the ARM processor to default settings and, after the configuration information of the ARM processor has been restored to the default settings, restarting the boot stages.
  2. The startup repair method according to claim 1, wherein the boot stages comprise: an SMpro stage, a PMpro stage, an ATF stage, and a UEFI stage.
  3. The startup repair method according to claim 1, wherein the condition that triggers the determination of whether the value of the error counter reaches the preset threshold is: any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds a preset value.
  4. The startup repair method according to claim 2, wherein determining, whenever any one of the boot stages in the startup process of the ARM server fails to execute and the corresponding watchdog timer exceeds a preset value, whether the value of the error counter reaches the preset threshold comprises:
    when the SMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter reaches the preset threshold;
    when the PMpro stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter reaches the preset threshold;
    when the ATF stage fails to execute and the watchdog timer of the SCP firmware exceeds the preset value, determining whether the value of the error counter reaches the preset threshold;
    when the UEFI stage fails to execute and the FRB-2 watchdog timer of the BMC exceeds the preset value, determining whether the value of the error counter reaches the preset threshold.
  5. The startup repair method according to claim 1, wherein changing the value of the error counter according to the preset rule comprises:
    incrementing the value of the error counter by one.
  6. The startup repair method according to claim 1, further comprising:
    after the ARM server has started normally, restoring the value of the error counter to its initial value.
  7. The startup repair method according to claim 6, wherein restoring the value of the error counter to the initial value after the ARM server has started normally comprises:
    restoring the value of the error counter to the initial value before the UEFI stage ends.
  8. A repair apparatus for an ARM server, comprising:
    an initialization module, configured to initialize the value of an error counter;
    a determination module, configured to determine, whenever any one of the boot stages in the startup process of the ARM server fails to execute, whether the value of the error counter reaches a preset threshold;
    a changing module, configured to, if the preset threshold is not reached, change the value of the error counter according to a preset rule and restart the boot stage;
    a restoration module, configured to, if the preset threshold is reached, restore the configuration information of the ARM processor to default settings and, after the configuration information of the ARM processor has been restored to the default settings, restart the boot stage.
  9. A startup repair device for an ARM server, comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement, when executing the computer program, the steps of the startup repair method for an ARM server according to any one of claims 1 to 7.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the startup repair method for an ARM server according to any one of claims 1 to 7.
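
For illustration only, the trigger condition recited in claims 3 and 4 can be sketched in Python as a lookup from boot stage to the watchdog timer that supervises it, plus a check that the stage has failed and the timer has exceeded its preset value. The pairing of stages with watchdogs is taken from claim 4; the names `WATCHDOG_FOR_STAGE` and `should_check_counter` and the use of seconds for the timer values are assumptions of this sketch.

```python
# Stage-to-watchdog pairing as recited in claim 4.
WATCHDOG_FOR_STAGE = {
    "SMpro": "SCP firmware watchdog",
    "PMpro": "SCP firmware watchdog",
    "ATF": "SCP firmware watchdog",
    "UEFI": "BMC FRB-2 watchdog",
}


def should_check_counter(stage: str, stage_failed: bool,
                         elapsed_s: float, timeout_s: float) -> bool:
    """Return True when the error counter should be compared with the threshold.

    The check is triggered only if the stage failed AND its watchdog timer
    ran past the preset value (timeout_s, a hypothetical unit of seconds).
    """
    if stage not in WATCHDOG_FOR_STAGE:
        raise ValueError(f"unknown boot stage: {stage}")
    return stage_failed and elapsed_s > timeout_s
```

A caller would evaluate `should_check_counter` for the stage that is currently running and, when it returns true, proceed to the threshold comparison of claim 1.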
PCT/CN2021/073359 2020-08-21 2021-01-22 Boot restoration method for arm server, and related apparatus WO2022037014A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010851698.3 2020-08-21
CN202010851698.3A CN112000508A (en) 2020-08-21 2020-08-21 Starting repair method of ARM server and related device

Publications (1)

Publication Number Publication Date
WO2022037014A1 (en) 2022-02-24

Family

ID=73473974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073359 WO2022037014A1 (en) 2020-08-21 2021-01-22 Boot restoration method for arm server, and related apparatus

Country Status (2)

Country Link
CN (1) CN112000508A (en)
WO (1) WO2022037014A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000508A (en) * 2020-08-21 2020-11-27 苏州浪潮智能科技有限公司 Starting repair method of ARM server and related device
CN113032026A (en) * 2021-03-19 2021-06-25 山东英信计算机技术有限公司 Firmware management method, device, equipment and medium for server mainboard

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100010390A (en) * 2008-07-22 2010-02-01 엘지전자 주식회사 Microcomputer and method for controlling thereof
CN107038085A (en) * 2016-02-03 2017-08-11 阿里巴巴集团控股有限公司 A kind of restorative procedure of client application, apparatus and system
CN107844330A (en) * 2017-10-25 2018-03-27 郑州云海信息技术有限公司 A kind of method and system of enhancing ARM startup of server code reliabilities
CN107894949A (en) * 2017-10-11 2018-04-10 五八有限公司 The method, apparatus and equipment of abnormality processing
CN109783149A (en) * 2019-01-17 2019-05-21 Oppo广东移动通信有限公司 Start-up control method, device, mobile terminal and storage medium
CN112000508A (en) * 2020-08-21 2020-11-27 苏州浪潮智能科技有限公司 Starting repair method of ARM server and related device

Also Published As

Publication number Publication date
CN112000508A (en) 2020-11-27

Similar Documents

Publication Publication Date Title
US10534618B2 (en) Auto bootloader recovery in BMC
US9946553B2 (en) BMC firmware recovery
TWI754317B (en) Method and system for optimal boot path for a network device
JP5575338B2 (en) Information processing apparatus, information processing method, and computer program
TWI664574B (en) Method of patching boot code of read-only memory and system-on-chip
US9880908B2 (en) Recovering from compromised system boot code
US8918778B2 (en) Method of fail safe flashing management device and application of the same
CN105917306B (en) System and method for configuring system firmware configuration data
WO2022037014A1 (en) Boot restoration method for arm server, and related apparatus
US20040158702A1 (en) Redundancy architecture of computer system using a plurality of BIOS programs
EP2513781A1 (en) Methods and devices for updating firmware of a component using a firmware update application
WO2016206514A1 (en) Startup processing method and device
BR112014014815B1 (en) COMPUTING DEVICE, METHOD AND STORAGE MEANS FOR PERFORMING FIRMWARE BACKUP COPY
CN103885847A (en) Dog feeding method and device based on embedded system
US20090271660A1 (en) Motherboard, a method for recovering the bios thereof and a method for booting a computer
TW201239759A (en) BIOS update method and computer system for using the same
CN107766102B (en) Boot method of dual basic input/output system (BIOS) and electronic device with same
US20220214945A1 (en) System Booting Method and Apparatus, Node Device, and Computer-Readable Storage Medium
TW201944239A (en) Server and method for restoring a baseboard management controller automatically
CN111090546A (en) Method, device and equipment for restarting operating system and readable storage medium
US9342392B2 (en) Image forming apparatus, image forming apparatus control method, and recording medium
JP5585502B2 (en) Information processing apparatus and firmware update method thereof
TWI554876B (en) Method for processing node replacement and server system using the same
CN111078452A (en) BMC firmware image recovery method and device
TWI839136B (en) Firmware update method for downstream devices of bmc

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21857120; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21857120; Country of ref document: EP; Kind code of ref document: A1)