WO2023065601A1 - Server component self-test exception recovery method, apparatus, system and medium - Google Patents

Server component self-test exception recovery method, apparatus, system and medium

Info

Publication number
WO2023065601A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
firmware data
self
firmware
preset
Prior art date
Application number
PCT/CN2022/083574
Other languages
English (en)
French (fr)
Inventor
叶明洋
张敏
刘闻禹
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US18/564,699 (US20240264914A1)
Publication of WO2023065601A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2284 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by power-on test, e.g. power-on self test [POST]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G06F11/273 Tester hardware, i.e. output processing circuits
    • G06F11/277 Tester hardware, i.e. output processing circuits with comparison between actual response and known fault-free response

Definitions

  • The present application relates to the field of computer technology, and in particular to a method, device, system and medium for recovering from server component self-test abnormalities.
  • BIOS stands for Basic Input Output System.
  • POST stands for Power-On Self-Test.
  • The firmware is usually loaded from a flash memory chip.
  • When the firmware is abnormal, the usual remedy is to manually switch the flash memory chip or reprogram the firmware to resolve the abnormality that occurs during the self-test.
  • This approach costs considerable time and labor, and it cannot prevent the problem from recurring.
  • The purpose of the present application is to provide a method, device, system and medium for recovering from server component self-test abnormalities, which can automatically recover from power-on self-test abnormalities that occur during server operating system startup.
  • The specific solution is as follows:
  • The present application discloses a server component self-test exception recovery method, which is applied to a server control chip, including:
  • If the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  • Querying the corresponding first firmware data and second firmware data from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data includes:
  • sending the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query operations;
  • Judging whether the self-test abnormal state data is wrong includes:
  • Before obtaining the preset state data corresponding to the self-test abnormal state data from the field replaceable unit, the method further includes:
  • If the operation corresponding to the self-test abnormal state data is the preset operation, triggering the step of acquiring the preset state data corresponding to the self-test abnormal state data from the field replaceable unit.
  • After judging whether the self-test abnormal state data is wrong, the method further includes:
  • The method further includes:
  • If the first firmware data is inconsistent with the second firmware data, judging whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
  • If the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  • The method further includes:
  • The present application discloses a server component self-test exception recovery device, which is applied to a server control chip, including:
  • a data acquisition module, configured to obtain the self-test abnormal state data sent by the platform control center when a target component in the server has a self-test abnormality;
  • a data query module, configured to query corresponding first firmware data and second firmware data from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data;
  • a data comparison module, configured to compare whether the first firmware data is consistent with the second firmware data;
  • a data sending module, configured to, if the first firmware data is consistent with the second firmware data, determine target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  • The server component self-test exception recovery device also includes:
  • a third judging unit, configured to, if the first firmware data is inconsistent with the second firmware data, judge whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
  • a second data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, send whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  • The present application discloses a server system, including:
  • The system also includes a server control chip connected to the first flash memory chip, the second flash memory chip and the memory respectively, for executing the computer program in the memory to implement the following steps:
  • When a target component in the server has a self-test abnormality, obtaining the self-test abnormal state data sent by the platform control center; querying corresponding first firmware data and second firmware data from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data; comparing whether the first firmware data is consistent with the second firmware data; and if the first firmware data is consistent with the second firmware data, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  • The method further includes:
  • If the first firmware data is inconsistent with the second firmware data, judging whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
  • If the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  • The present application discloses a computer-readable storage medium for storing a computer program, wherein, when the computer program is executed by a processor, the steps of the server component self-test exception recovery method disclosed above are implemented.
  • The present application provides a server component self-test exception recovery method applied to a server control chip.
  • When a target component in the server has a self-test abnormality, the self-test abnormal state data sent by the platform control center is obtained; corresponding first firmware data and second firmware data are queried from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data; the first firmware data is then compared with the second firmware data for consistency; if the two are consistent, target firmware data for self-test exception recovery is determined based on the first firmware data and the second firmware data, and the target firmware data is sent to the platform control center, so that the platform control center can use the target firmware data to perform self-test exception recovery.
  • In other words, the present application queries the first firmware data and the second firmware data from the two flash memory chips connected to the server control chip based on the self-test abnormal state data sent by the platform control center, and determines the target firmware data for self-test exception recovery based on the first firmware data and the second firmware data.
  • FIG. 1 is a flowchart of a server component self-test abnormal recovery method disclosed in the present application.
  • FIG. 2 is a flowchart of target firmware data determination disclosed in the present application.
  • FIG. 3 is a flow chart of a specific server component self-test abnormal recovery method disclosed in the present application.
  • FIG. 4 is a flowchart of a preset operation judgment disclosed in the present application.
  • FIG. 5 is a schematic structural diagram of a server component self-test abnormal recovery device disclosed in the present application.
  • FIG. 6 is a schematic structural diagram of a server system disclosed in the present application.
  • FIG. 7 is a structural diagram of a computer device disclosed in the present application.
  • During startup, the server operating system first uses a self-test program to check whether each internal component is working normally, so that the server operating system can start normally.
  • The firmware is usually loaded from a flash memory chip.
  • When the firmware is abnormal, the usual remedy is to manually switch the flash memory chip or reprogram the firmware to resolve the abnormality that occurs during the self-test.
  • This approach costs considerable time and labor, and it cannot prevent the problem from recurring.
  • The embodiment of the present application discloses a server component self-test exception recovery method, which can automatically recover from self-test abnormalities that occur during server operating system startup.
  • The server component self-test exception recovery method is applied to a server control chip, such as a complex programmable logic device, an FPGA (field programmable gate array), a PLA (programmable logic array) or a DSP (digital signal processor), and comprises:
  • Step S11: When a target component in the server has a self-test abnormality, obtain the self-test abnormal state data sent by the platform control center.
  • That is, when a self-test abnormality occurs, the self-test abnormal state data sent by the platform control center is obtained.
  • Step S12: Query corresponding first firmware data and second firmware data from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data.
  • Step S13: Compare whether the first firmware data is consistent with the second firmware data.
  • Step S14: If the first firmware data is consistent with the second firmware data, determine the target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  • When the two are consistent, the first firmware data and the second firmware data are ANDed, the target firmware data is determined based on the result of the operation, and the target firmware data is then sent to the platform control center, so that the platform control center can use the target firmware data to perform self-test exception recovery.
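  • A minimal Python sketch of steps S13 and S14, assuming firmware data can be modeled as byte strings; the function name `determine_target_firmware` is hypothetical and does not come from the application. It compares the two copies and, when they agree, derives the target firmware data with a bitwise AND as described above.

```python
from typing import Optional


def determine_target_firmware(first: bytes, second: bytes) -> Optional[bytes]:
    """Return target firmware data when both flash copies agree, else None.

    Hypothetical helper: the byte-string model of firmware data is an assumption.
    """
    if first != second:
        # Inconsistent copies are handled by a separate fallback path.
        return None
    # Bitwise AND of two identical byte strings yields the same bytes.
    return bytes(a & b for a, b in zip(first, second))
```

  • Because the AND is only applied after the equality check, the result always equals either input; returning `None` keeps the inconsistent case explicit for the caller.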
  • If the first firmware data is inconsistent with the second firmware data, it is judged whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data.
  • The preset state data refers to the data corresponding to the self-test abnormal state data when the target component is in a normal state.
  • The preset firmware data is the correct firmware data that should be returned for the preset state data. Therefore, when the first firmware data is inconsistent with the second firmware data, an abnormality has occurred when the first flash memory chip or the second flash memory chip loaded firmware, and the two results need to be checked against the preset firmware data corresponding to the preset state data in the field replaceable unit.
  • If the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, whichever of the first firmware data and the second firmware data is consistent with the preset firmware data is used as the target firmware data and sent to the platform control center, so that the platform control center can use the target firmware data to perform self-test exception recovery. This case also indicates that one of the first flash memory chip and the second flash memory chip was abnormal when loading the firmware.
  • Judging whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data further includes: if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, sending the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery.
  • If the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, both the first flash memory chip and the second flash memory chip were abnormal when loading firmware. In this case the preset firmware data corresponding to the preset state data in the field replaceable unit is sent to the platform control center as the target firmware data, so that the platform control center can use the target firmware data to perform self-test exception recovery.
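  • The three outcomes described above can be sketched in Python as follows. This is an illustration under assumptions, not the application's implementation: firmware data is modeled as byte strings, and the function name `select_target_firmware` and the diagnosis strings are hypothetical.

```python
from typing import Tuple


def select_target_firmware(first: bytes, second: bytes, preset: bytes) -> Tuple[bytes, str]:
    """Pick the target firmware data and a short diagnosis (hypothetical helper).

    first/second: firmware data queried from the two flash memory chips.
    preset: preset firmware data held in the field replaceable unit.
    """
    if first == second:
        # Consistent copies: AND them, as in the consistent branch of the method.
        return bytes(a & b for a, b in zip(first, second)), "both flash chips agree"
    if preset == first:
        return first, "second flash chip suspect"
    if preset == second:
        return second, "first flash chip suspect"
    # Neither copy matches the preset firmware data: fall back to the preset.
    return preset, "both flash chips suspect"
```

  • For example, `select_target_firmware(b"A", b"B", b"B")` selects the second chip's data and flags the first chip, which mirrors the statement that one chip was abnormal when loading firmware.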
  • The embodiment of the present application discloses a specific server component self-test exception recovery method, which is applied to a server control chip. Compared with the previous embodiment, this embodiment further explains and optimizes the technical solution, and includes:
  • Step S21: When a target component in the server has a self-test abnormality, obtain the self-test abnormal state data sent by the platform control center.
  • Step S22: Judge whether the self-test abnormal state data is wrong.
  • Judging whether the self-test abnormal state data is wrong may specifically include: obtaining preset state data corresponding to the self-test abnormal state data from the field replaceable unit; and comparing the self-test abnormal state data with the preset state data to determine whether the self-test abnormal state data is wrong.
  • The preset state data is the data of the target component in a normal state, so by comparing the self-test state data with the preset state data, it can be judged from the comparison result whether the self-test abnormal state data is wrong.
  • Before obtaining the preset state data corresponding to the self-test abnormal state data from the field replaceable unit, the method further includes: using the firmware data stored locally by the server control chip to judge whether the operation corresponding to the self-test abnormal state data is a preset operation; and if the operation corresponding to the self-test abnormal state data is the preset operation, triggering the step of obtaining the preset state data corresponding to the self-test abnormal state data from the field replaceable unit. It should be pointed out that this judgment is made by the server control chip according to its locally stored firmware data, and it takes place before the preset state data is obtained from the field replaceable unit.
  • The locally stored firmware data is obtained based on data sent by the platform control center. The preset operation mainly refers to a key operation: the judgment checks whether the operation information corresponding to the state data characterizes a key operation, such as a command for loading a key driver or the return value of a key driver loading operation.
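  • A hedged Python sketch of the gate and the check in step S22. The `StateData` record, its fields, and the lookup tables are all hypothetical modeling choices; the application does not specify the layout of the state data or of the field replaceable unit contents.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class StateData:
    operation: str  # hypothetical operation identifier, e.g. "load_key_driver"
    value: int      # hypothetical status or return value reported for it


def is_state_data_wrong(
    reported: StateData,
    key_operations: Set[str],
    fru_preset_values: Dict[str, int],
) -> Optional[bool]:
    """Return None for non-key operations, else whether the data is wrong."""
    if reported.operation not in key_operations:
        # Not a preset (key) operation: the FRU lookup is never triggered.
        return None
    # Assumes the FRU holds a preset value for every key operation.
    preset_value = fru_preset_values[reported.operation]
    # Data is judged wrong when it differs from the preset state data.
    return reported.value != preset_value
```

  • Returning `None` for non-key operations keeps "not processed" distinct from "processed and correct", which matches the idea that only key operations go through recovery.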
  • Step S23: If the self-test abnormal state data is correct, send the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query operations.
  • When the self-test state data is consistent with the preset state data, the self-test state data can be determined to be correct. The self-test abnormal state data is then sent directly to the first flash memory chip and the second flash memory chip, which have previously established connections with the server control chip, and the two chips use the self-test abnormal state data to perform the corresponding firmware data query operations.
  • If the self-test abnormal state data is wrong, the preset state data in the field replaceable unit is sent to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the preset state data in the field replaceable unit to perform the corresponding firmware data query operations.
  • When the self-test abnormal state data is inconsistent with the preset state data, the self-test abnormal state data can be determined to be wrong. The preset state data corresponding to the self-test abnormal state data in the field replaceable unit, that is, the data of the target component in a normal state, is then sent to the first flash memory chip and the second flash memory chip, which use the preset state data to perform the corresponding firmware data query operations.
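  • Step S23 can be sketched in Python as below, assuming each flash memory chip is modeled as a lookup table from state data to firmware data. `FlashTable` and `query_both_flash_chips` are hypothetical names, and the "wrong" judgment is passed in as a flag rather than recomputed.

```python
from typing import Dict, Optional, Tuple

FlashTable = Dict[bytes, bytes]  # hypothetical model: state data -> firmware data


def query_both_flash_chips(
    reported_state: bytes,
    preset_state: bytes,
    reported_is_wrong: bool,
    flash_1: FlashTable,
    flash_2: FlashTable,
) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Query both chips with the reported state data, or the FRU preset if it is wrong."""
    # Wrong state data is replaced by the preset state data from the FRU.
    key = preset_state if reported_is_wrong else reported_state
    # A missing entry is reported as None instead of raising.
    return flash_1.get(key), flash_2.get(key)
```

  • The pair returned here corresponds to the first firmware data and second firmware data that the later comparison step consumes.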
  • Step S24: Obtain the first firmware data queried by the first flash memory chip and the second firmware data queried by the second flash memory chip.
  • After the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query operations, the first firmware data queried by the first flash memory chip and the second firmware data queried by the second flash memory chip are obtained.
  • Step S25: Compare whether the first firmware data is consistent with the second firmware data.
  • Step S26: If the first firmware data is consistent with the second firmware data, determine the target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  • This embodiment processes only error-free self-test abnormal state data that represents key operations. This avoids automatic exception recovery for components used for non-critical operations, which saves time and cost and improves the efficiency of the self-test exception recovery process.
  • The embodiment of the present application discloses a server component self-test exception recovery device, which is applied to a server control chip, and the device includes:
  • a data acquisition module 11, configured to obtain the self-test abnormal state data sent by the platform control center when a target component in the server has a self-test abnormality;
  • a data query module 12, configured to query corresponding first firmware data and second firmware data from the first flash memory chip and the second flash memory chip based on the self-test abnormal state data;
  • a data comparison module 13, configured to compare whether the first firmware data is consistent with the second firmware data;
  • a data sending module 14, configured to, if the first firmware data is consistent with the second firmware data, determine the target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center can use the target firmware data to perform self-test exception recovery.
  • The data query module 12 also includes:
  • a first judging unit, configured to judge whether the self-test abnormal state data is wrong;
  • a first query unit, configured to, if the self-test abnormal state data is correct, send the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query operations;
  • a first data acquisition unit, configured to acquire the first firmware data queried by the first flash memory chip and the second firmware data queried by the second flash memory chip.
  • The first judging unit may specifically include:
  • a second data acquisition unit, configured to acquire preset state data corresponding to the self-test abnormal state data from the field replaceable unit;
  • a data comparison unit, configured to determine whether the self-test abnormal state data is wrong by comparing the self-test abnormal state data with the preset state data.
  • The server component self-test exception recovery device may also include:
  • a second judging unit, configured to use the firmware data stored locally by the server control chip to judge whether the operation corresponding to the self-test abnormal state data is a preset operation;
  • a triggering module, configured to, if the operation corresponding to the self-test abnormal state data is the preset operation, trigger the step of acquiring the preset state data corresponding to the self-test abnormal state data from the field replaceable unit.
  • The server component self-test exception recovery device further includes:
  • a first data sending unit, configured to, if the self-test abnormal state data is wrong, send the preset state data in the field replaceable unit to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the preset state data in the field replaceable unit to perform the corresponding firmware data query operations;
  • a third data acquisition unit, configured to acquire the first firmware data queried by the first flash memory chip and the second firmware data queried by the second flash memory chip.
  • The data comparison module 13 may also include:
  • a third judging unit, configured to judge whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data;
  • a second data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, send whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  • The server component self-test exception recovery device further includes:
  • a third data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, send the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery.
  • FIG. 6 and FIG. 7 are schematic structural diagrams of a server system and a computer device provided in the embodiments of the present application.
  • The server system includes a server control chip (such as the complex programmable logic device shown in FIG. 6); a platform control center, a first flash memory chip and a second flash memory chip, each connected to the server control chip through an SPI (Serial Peripheral Interface) link; and a field replaceable unit and a baseboard management controller, each connected to the server control chip through an I2C (Inter-Integrated Circuit) link.
  • The baseboard management controller is configured to record the relevant information disclosed in any of the foregoing embodiments, so that engineers can review it and troubleshoot the problems it indicates.
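  • The connection topology described above can be written down as a small Python table. This is an illustrative sketch only: the peer names and the `Link` type are hypothetical, and placing both flash memory chips on SPI links follows one reading of the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Bus(Enum):
    SPI = "SPI"
    I2C = "I2C"


@dataclass(frozen=True)
class Link:
    peer: str  # hypothetical name for the device at the far end of the link
    bus: Bus   # link type between the server control chip and that device


# Links as seen from the server control chip (e.g. a CPLD).
CONTROL_CHIP_LINKS = (
    Link("platform_control_center", Bus.SPI),
    Link("flash_chip_1", Bus.SPI),
    Link("flash_chip_2", Bus.SPI),
    Link("field_replaceable_unit", Bus.I2C),
    Link("baseboard_management_controller", Bus.I2C),
)


def peers_on(bus: Bus) -> List[str]:
    """List the devices reachable from the control chip over the given bus type."""
    return [link.peer for link in CONTROL_CHIP_LINKS if link.bus is bus]
```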
  • the computer device may specifically include: at least one processor 21 , at least one memory 22 , a power supply 23 , a communication interface 24 , an input/output interface 25 and a communication bus 26 .
  • the memory 22 is used to store a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the server component self-test exception recovery method performed by a computer device as disclosed in any of the foregoing embodiments.
  • the power supply 23 is used to provide operating voltage for each hardware device on the computer device 20;
  • the communication interface 24 can create a data transmission channel between the computer device 20 and external devices; it may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited here;
  • the input/output interface 25 is used to obtain external input data or to output data externally; its specific interface type can be selected according to the needs of the application and is not specifically limited here.
  • the processor 21 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 21 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array).
  • the processor 21 may also include a main processor and a coprocessor; the main processor, also called the CPU (Central Processing Unit), processes data in the wake-up state, and the coprocessor is a low-power processor that processes data in the standby state.
  • in some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen.
  • in some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.
  • the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it include the operating system 221, the computer program 222 and the data 223, and the storage may be temporary or permanent.
  • the operating system 221 manages and controls each hardware device on the computer device 20 as well as the computer program 222, so that the processor 21 can compute on and process the massive data 223 in the memory 22; it can be Windows, Unix, Linux, etc.
  • the computer program 222 may further include a computer program capable of completing other specific tasks in addition to the computer program that can be used to complete the server component self-test exception recovery method performed by the computer device 20 disclosed in any of the foregoing embodiments.
  • the data 223 may not only include data received by the computer device and transmitted from an external device, but may also include data collected by its own input and output interface 25 .
  • further, an embodiment of the present application also discloses a storage medium in which a computer program is stored; when the computer program is loaded and executed by a processor, it implements the method steps performed in the server component self-test exception recovery process disclosed in any of the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Stored Programmes (AREA)

Abstract

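The present application discloses a server component self-test exception recovery method, device, system, and medium. The method includes: when a self-test exception occurs in a target component of a server, acquiring self-test abnormal state data sent by a platform control center; querying a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data; comparing whether the first firmware data and the second firmware data are consistent; and, if they are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery. By determining the target firmware data from two flash memory chips based on the self-test abnormal state data, the present application saves the labor and time costs of self-test exception recovery.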

Description

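Server component self-test exception recovery method, device, system, and medium
This application claims priority to Chinese patent application No. 202111218289.0, entitled "Server component self-test exception recovery method, device, system, and medium", filed with the China Patent Office on October 20, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a server component self-test exception recovery method, device, system, and medium.
Background
With the development of computer technology, industrial computers have been applied in many areas of production, such as industrial control, data acquisition, and environmental monitoring. Most of these settings require the computer system to work stably for long periods. After a server is powered on, a self-test program must first check each internal component to determine whether it is working normally, so that the server operating system can boot properly. For example, the BIOS (Basic Input Output System) is a set of programs stored in firmware on the server motherboard, and one of its main functions is the POST (Power-On Self-Test), which checks the components of the server system.
At present, firmware is usually loaded from a single flash memory chip during the self-test. When that firmware is abnormal, the problem is mainly resolved by manually switching flash memory chips or re-flashing the firmware. This approach incurs considerable time and labor costs, and it cannot prevent the problem from recurring.
In summary, how to automatically recover from self-test exceptions that occur while the server operating system boots is a problem that remains to be solved.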
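Summary of the Invention
In view of this, the purpose of the present application is to provide a server component self-test exception recovery method, device, system, and medium that can automatically recover from power-on self-test exceptions occurring while the server operating system boots. The specific solution is as follows:
In a first aspect, the present application discloses a server component self-test exception recovery method, applied to a server control chip, including:
when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center;
querying a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data;
comparing whether the first firmware data and the second firmware data are consistent;
if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.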
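Optionally, querying the first flash memory chip and the second flash memory chip for the corresponding first firmware data and second firmware data based on the self-test abnormal state data includes:
determining whether the self-test abnormal state data is erroneous;
if the self-test abnormal state data is correct, sending the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query;
acquiring the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
Optionally, determining whether the self-test abnormal state data is erroneous includes:
acquiring, from a field replaceable unit, preset state data corresponding to the self-test abnormal state data;
comparing the self-test abnormal state data with the preset state data to determine whether the self-test abnormal state data is erroneous.
Optionally, before acquiring, from the field replaceable unit, the preset state data corresponding to the self-test abnormal state data, the method further includes:
using firmware data stored locally on the server control chip to determine whether the operation corresponding to the self-test abnormal state data is a preset operation;
if the operation corresponding to the self-test abnormal state data is the preset operation, triggering the step of acquiring, from the field replaceable unit, the preset state data corresponding to the self-test abnormal state data.
Optionally, after determining whether the self-test abnormal state data is erroneous, the method further includes:
if the self-test abnormal state data is erroneous, sending the preset state data in the field replaceable unit to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the preset state data in the field replaceable unit to perform the corresponding firmware data query;
acquiring the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
Optionally, after comparing whether the first firmware data and the second firmware data are consistent, the method further includes:
if the first firmware data and the second firmware data are inconsistent, determining whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
Optionally, after determining whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, the method further includes:
if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, sending the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery.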
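In a second aspect, the present application discloses a server component self-test exception recovery device, applied to a server control chip, including:
a data acquisition module, configured to acquire self-test abnormal state data sent by a platform control center when a self-test exception occurs in a target component of the server;
a data query module, configured to query a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data;
a data comparison module, configured to compare whether the first firmware data and the second firmware data are consistent;
a data sending module, configured to, if the first firmware data and the second firmware data are consistent, determine target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
Optionally, the server component self-test exception recovery device further includes:
a third judging unit, configured to, if the first firmware data and the second firmware data are inconsistent, determine whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
a second data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, send whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.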
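In a third aspect, the present application discloses a server system, including:
a first flash memory chip, a second flash memory chip, and a memory for storing a computer program;
the system further includes a server control chip connected to the first flash memory chip, the second flash memory chip, and the memory respectively, and configured to execute the computer program in the memory to implement the following steps:
when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center; querying the first flash memory chip and the second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data; comparing whether the first firmware data and the second firmware data are consistent; if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
Optionally, after comparing whether the first firmware data and the second firmware data are consistent, the steps further include:
if the first firmware data and the second firmware data are inconsistent, determining whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the server component self-test exception recovery disclosed above.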
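It can be seen that the present application provides a server component self-test exception recovery method applied to a server control chip. First, when a self-test exception occurs in a target component of the server, the self-test abnormal state data sent by the platform control center is acquired, and the first flash memory chip and the second flash memory chip are queried for the corresponding first firmware data and second firmware data based on that state data. The first firmware data and the second firmware data are then compared; if they are consistent, target firmware data for self-test exception recovery is determined based on them and sent to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery. In other words, the present application queries the two flash memory chips connected to the server control chip, based on the self-test abnormal state data sent by the platform control center, and determines the target firmware data from the two results. Using two flash memory chips in this way raises the success rate of component self-test exception recovery, which reduces the labor cost of later manual interventions such as replacing a flash memory chip. When a self-test exception occurs after the server is powered on, recovery proceeds automatically using the target firmware data, so that the server operating system can boot quickly and normally.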
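Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a server component self-test exception recovery method disclosed in the present application;
FIG. 2 is a flowchart of target firmware data determination disclosed in the present application;
FIG. 3 is a flowchart of a specific server component self-test exception recovery method disclosed in the present application;
FIG. 4 is a flowchart of preset operation judgment disclosed in the present application;
FIG. 5 is a schematic structural diagram of a server component self-test exception recovery device disclosed in the present application;
FIG. 6 is a schematic structural diagram of a server system disclosed in the present application;
FIG. 7 is a structural diagram of a computer device disclosed in the present application.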
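Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
At present, after a server is powered on, a self-test program must first check each internal component to determine whether it is working normally, so that the server operating system can boot properly. Firmware is usually loaded from a single flash memory chip during the self-test. When that firmware is abnormal, the problem is mainly resolved by manually switching flash memory chips or re-flashing the firmware. This approach incurs considerable time and labor costs, and it cannot prevent the problem from recurring. For this reason, an embodiment of the present application discloses a server component self-test exception recovery method that can automatically recover from self-test exceptions occurring while the server operating system boots.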
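Referring to FIG. 1, an embodiment of the present application discloses a server component self-test exception recovery method, applied to a server control chip, for example a complex programmable logic device, an FPGA (field-programmable gate array), a PLA (programmable logic array), or a DSP (digital signal processor). The method includes:
Step S11: when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center.
In this embodiment, when a self-test exception occurs in a target component of the server during the power-on self-test, the self-test abnormal state data sent by the platform control center is acquired.
Step S12: querying a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data.
In this embodiment, after the self-test abnormal state data is acquired, the first flash memory chip and the second flash memory chip, which are connected to the server control chip in advance, are queried based on that state data for the corresponding first firmware data and second firmware data used for self-test exception recovery.
Step S13: comparing whether the first firmware data and the second firmware data are consistent.
In this embodiment, the first firmware data and the second firmware data found by the query are compared, and the comparison result determines whether they are consistent.
Step S14: if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
In this embodiment, when the comparison shows that the first firmware data and the second firmware data are consistent, an AND operation is performed on the first firmware data and the second firmware data, and the target firmware data for self-test exception recovery is obtained from the result. The target firmware data is then sent to the platform control center, so that the platform control center uses it to perform self-test exception recovery.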
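The AND step in S14 can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how firmware data is represented, so the sketch treats it as a byte string, and the function name `determine_target_firmware` is hypothetical.

```python
def determine_target_firmware(first: bytes, second: bytes) -> bytes:
    """Return target firmware data for two copies that already match.

    Assumption: firmware data is modelled as `bytes`; the source does not
    specify a representation.
    """
    if first != second:
        # The inconsistent case is handled by a separate fallback path.
        raise ValueError("firmware copies differ")
    # A bitwise AND of two identical byte strings reproduces the same bytes.
    return bytes(a & b for a, b in zip(first, second))
```

Because the two inputs are already known to be equal at this point, the AND leaves the data unchanged; the sketch keeps the operation only to mirror the step as described.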
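As shown in FIG. 2, in another specific embodiment, if the first firmware data and the second firmware data are inconsistent, it is determined whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data; if so, whichever of the first firmware data and the second firmware data is consistent with the preset firmware data is sent to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery. It should be noted that, in this embodiment, the preset state data is the data that corresponds to the self-test abnormal state data and describes the target component in its normal state, and the preset firmware data is the correct firmware data that should be returned for that preset state data. Therefore, when the first firmware data and the second firmware data are inconsistent, the first flash memory chip or the second flash memory chip encountered an exception while loading firmware, and the preset firmware data in the field replaceable unit is used to decide which of the two results, if any, is correct. If the preset firmware data is consistent with the first firmware data or the second firmware data, the matching firmware data is taken as the target firmware data and sent to the platform control center, so that the platform control center uses it to perform self-test exception recovery. This also indicates that exactly one of the two flash memory chips encountered an exception while loading firmware.
Further, after determining whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, the method also includes: if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, sending the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery. In this case both the first flash memory chip and the second flash memory chip encountered an exception while loading firmware, so the preset firmware data corresponding to the preset state data in the field replaceable unit is taken as the target firmware data and sent to the platform control center for self-test exception recovery.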
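The three outcomes described above (both copies agree, one copy agrees with the field replaceable unit, neither does) can be sketched as one selection function. The function name and the source labels are hypothetical, and firmware data is again assumed to be a byte string.

```python
from typing import Tuple


def select_recovery_firmware(first: bytes, second: bytes,
                             preset: bytes) -> Tuple[bytes, str]:
    """Pick the firmware data to send to the platform control center.

    `preset` stands for the preset firmware data held in the field
    replaceable unit. The second tuple element is an illustrative label
    naming where the chosen data came from.
    """
    if first == second:
        # Consistent copies: either one serves as the target firmware data.
        return first, "both_flash_chips"
    if first == preset:
        # Only the first chip returned the expected data.
        return first, "first_flash_chip"
    if second == preset:
        # Only the second chip returned the expected data.
        return second, "second_flash_chip"
    # Neither chip matches: fall back to the copy in the FRU.
    return preset, "field_replaceable_unit"
```

The label is not part of the described method; it is included because a caller would plausibly want to log which chip misbehaved.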
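It can be seen that the present application provides a server component self-test exception recovery method applied to a server control chip. First, when a self-test exception occurs in a target component of the server, the self-test abnormal state data sent by the platform control center is acquired, and the first flash memory chip and the second flash memory chip are queried for the corresponding first firmware data and second firmware data based on that state data. The first firmware data and the second firmware data are then compared; if they are consistent, target firmware data for self-test exception recovery is determined based on them and sent to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery. In other words, the present application queries the two flash memory chips connected to the server control chip, based on the self-test abnormal state data sent by the platform control center, and determines the target firmware data from the two results. Using two flash memory chips in this way raises the success rate of component self-test exception recovery, which reduces the labor cost of later manual interventions such as replacing a flash memory chip. When a self-test exception occurs after the server is powered on, recovery proceeds automatically using the target firmware data, so that the server operating system can boot quickly and normally.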
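Referring to FIG. 3, an embodiment of the present application discloses a specific server component self-test exception recovery method, applied to a server control chip. Compared with the previous embodiment, this embodiment further explains and refines the technical solution. It specifically includes:
Step S21: when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center.
Step S22: determining whether the self-test abnormal state data is erroneous.
In this embodiment, determining whether the self-test abnormal state data is erroneous may specifically include: acquiring, from a field replaceable unit, preset state data corresponding to the self-test abnormal state data; and comparing the self-test abnormal state data with the preset state data to determine whether the self-test abnormal state data is erroneous. The field replaceable unit records preset state data corresponding to the self-test abnormal state data, and this preset state data describes the target component in its normal state. Comparing the self-test abnormal state data with the preset state data therefore shows whether the self-test abnormal state data is erroneous.
Further, as shown in FIG. 4, in this embodiment, before acquiring the preset state data corresponding to the self-test abnormal state data from the field replaceable unit, the method also includes: using firmware data stored locally on the server control chip to determine whether the operation corresponding to the self-test abnormal state data is a preset operation; and, if it is, triggering the step of acquiring the preset state data corresponding to the self-test abnormal state data from the field replaceable unit. The locally stored firmware data is derived from data previously sent by the platform control center. The preset-operation check asks whether the operation information corresponding to the state data represents a key operation, for example a command to load a key driver or the return value of a key driver loading operation. Only when the operation corresponding to the self-test abnormal state data is a key operation is the subsequent step of acquiring the preset state data from the field replaceable unit triggered; if the operation is not a key operation, that step is not executed.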
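A minimal sketch of the preset-operation gate follows. The source does not define the layout of the state data, so this sketch assumes, purely for illustration, that the first byte carries an operation code and that the control chip keeps a local set of codes for key operations; `KEY_OPERATION_CODES` and its values are hypothetical.

```python
from typing import FrozenSet

# Hypothetical operation codes for key operations, e.g. loading a key driver
# and the return value of that load. Real codes are not given in the source.
KEY_OPERATION_CODES: FrozenSet[int] = frozenset({0x10, 0x11})


def is_preset_operation(state_data: bytes,
                        key_codes: FrozenSet[int] = KEY_OPERATION_CODES) -> bool:
    """Return True if the state data corresponds to a key operation.

    Assumption: the first byte of `state_data` is the operation code.
    Empty state data is treated as not a key operation.
    """
    return len(state_data) > 0 and state_data[0] in key_codes
```

Only when this check passes would the control chip go on to fetch the preset state data from the field replaceable unit.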
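Step S23: if the self-test abnormal state data is correct, sending the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query.
In this embodiment, when the self-test abnormal state data is consistent with the preset state data, the self-test abnormal state data is determined to be correct. It is then sent directly to the first flash memory chip and the second flash memory chip, which are connected to the server control chip in advance, and both chips use it to perform the corresponding firmware data query.
In another specific embodiment, if the self-test abnormal state data is erroneous, the preset state data in the field replaceable unit is sent to the first flash memory chip and the second flash memory chip, so that both chips use the preset state data to perform the corresponding firmware data query. That is, when the self-test abnormal state data is inconsistent with the preset state data, the self-test abnormal state data is determined to be erroneous, and the preset state data corresponding to it in the field replaceable unit, namely the data describing the target component in its normal state, is sent to the two flash memory chips instead and used for the query.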
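The choice of what to send to the two flash memory chips can be sketched as below. The function name is hypothetical and state data is assumed to be a byte string.

```python
from typing import Tuple


def choose_query_key(abnormal_state: bytes,
                     preset_state: bytes) -> Tuple[bytes, bool]:
    """Decide which state data is sent to the flash memory chips.

    Returns the query key and a flag telling whether the reported
    self-test abnormal state data was correct.
    """
    reported_ok = abnormal_state == preset_state
    # Correct data is forwarded as-is; erroneous data is replaced by the
    # preset state data taken from the field replaceable unit.
    key = abnormal_state if reported_ok else preset_state
    return key, reported_ok
```

In both branches the key ends up equal to the preset state data; the flag is what distinguishes the two cases, which may be useful if the outcome is recorded for troubleshooting.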
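Step S24: acquiring the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
In this embodiment, after the first flash memory chip and the second flash memory chip have performed the corresponding firmware data query using the self-test abnormal state data, the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip are acquired.
Step S25: comparing whether the first firmware data and the second firmware data are consistent.
Step S26: if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
For more detail on steps S21, S25, and S26, refer to the corresponding content disclosed in the foregoing embodiment, which is not repeated here.
It can be seen that, in the process of querying the first flash memory chip and the second flash memory chip for the corresponding first firmware data and second firmware data, the firmware data stored locally on the server control chip is first used to determine whether the operation corresponding to the self-test abnormal state data is a preset operation. Only when it is a preset operation is the self-test abnormal state data then compared with the preset state data recorded in the field replaceable unit, which describes the target component in its normal state. When the two are consistent, the self-test abnormal state data is determined to be correct and is sent to the first flash memory chip and the second flash memory chip, so that both chips use it to perform the corresponding firmware data query. This embodiment therefore processes only self-test abnormal state data that represents a key operation and is correct. This avoids automatic exception recovery for components that perform non-key operations, which saves time and improves the efficiency of self-test exception recovery.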
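The whole flow of this embodiment, including the fallback for inconsistent firmware data, can be sketched end to end. Everything concrete here is an assumption made for illustration: the two flash memory chips and the field replaceable unit are modelled as dictionaries, the first byte of the state data is taken as an operation code that also indexes the FRU record, and the names `FruRecord` and `recover` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class FruRecord:
    """Hypothetical FRU entry: known-good state data and firmware data."""
    preset_state: bytes
    preset_firmware: bytes


def recover(state: bytes,
            key_ops: Set[int],
            fru: Dict[int, FruRecord],
            flash1: Dict[bytes, bytes],
            flash2: Dict[bytes, bytes]) -> Optional[bytes]:
    """Return the firmware data to send back, or None if nothing is done."""
    # S21 is assumed done: `state` is what the platform control center sent.
    # Gate: only key (preset) operations are recovered.
    if not state or state[0] not in key_ops:
        return None
    record = fru.get(state[0])
    if record is None:
        return None  # assumption: without an FRU record there is no reference
    # S22/S23: forward correct state data, otherwise use the preset state data.
    key = state if state == record.preset_state else record.preset_state
    # S24: each chip looks up firmware data for the key (None if not found).
    first = flash1.get(key)
    second = flash2.get(key)
    # S25/S26: consistent copies are combined with a bitwise AND.
    if first is not None and first == second:
        return bytes(a & b for a, b in zip(first, second))
    # Fallback: prefer whichever copy matches the FRU's preset firmware data.
    if first == record.preset_firmware:
        return first
    if second == record.preset_firmware:
        return second
    return record.preset_firmware
```

A caller would pass the state data received over the link together with its lookup tables; a return value of `None` means the exception was not one this flow handles.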
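Referring to FIG. 5, an embodiment of the present application discloses a server component self-test exception recovery device, applied to a server control chip. The device includes:
a data acquisition module 11, configured to acquire self-test abnormal state data sent by a platform control center when a self-test exception occurs in a target component of the server;
a data query module 12, configured to query a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data;
a data comparison module 13, configured to compare whether the first firmware data and the second firmware data are consistent;
a data sending module 14, configured to, if the first firmware data and the second firmware data are consistent, determine target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
It can be seen that the present application provides a server component self-test exception recovery method applied to a server control chip. First, when a self-test exception occurs in a target component of the server, the self-test abnormal state data sent by the platform control center is acquired, and the first flash memory chip and the second flash memory chip are queried for the corresponding first firmware data and second firmware data based on that state data. The first firmware data and the second firmware data are then compared; if they are consistent, target firmware data for self-test exception recovery is determined based on them and sent to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery. In other words, the present application queries the two flash memory chips connected to the server control chip, based on the self-test abnormal state data sent by the platform control center, and determines the target firmware data from the two results. Using two flash memory chips in this way raises the success rate of component self-test exception recovery, which reduces the labor cost of later manual interventions such as replacing a flash memory chip. When a self-test exception occurs after the server is powered on, recovery proceeds automatically using the target firmware data, so that the server operating system can boot quickly and normally.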
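In some specific embodiments, the data query module 12 further includes:
a first judging unit, configured to determine whether the self-test abnormal state data is erroneous;
a first query unit, configured to, if the self-test abnormal state data is correct, send the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query;
a first data acquisition unit, configured to acquire the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
In some specific embodiments, the first judging unit may specifically include:
a second data acquisition unit, configured to acquire, from a field replaceable unit, preset state data corresponding to the self-test abnormal state data;
a data comparison unit, configured to compare the self-test abnormal state data with the preset state data to determine whether the self-test abnormal state data is erroneous.
In some specific embodiments, the server component self-test exception recovery device may further include:
a second judging unit, configured to use firmware data stored locally on the server control chip to determine whether the operation corresponding to the self-test abnormal state data is a preset operation;
a step triggering module, configured to, if the operation corresponding to the self-test abnormal state data is the preset operation, trigger the step of acquiring, from the field replaceable unit, the preset state data corresponding to the self-test abnormal state data.
In some specific embodiments, the server component self-test exception recovery device further includes:
a first data sending unit, configured to, if the self-test abnormal state data is erroneous, send the preset state data in the field replaceable unit to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the preset state data in the field replaceable unit to perform the corresponding firmware data query;
a third data acquisition unit, configured to acquire the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
In some specific embodiments, the device may further include, downstream of the data comparison module 13:
a third judging unit, configured to, if the first firmware data and the second firmware data are inconsistent, determine whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data;
a second data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, send whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
In some specific embodiments, the server component self-test exception recovery device further includes:
a third data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, send the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery.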
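FIG. 6 and FIG. 7 are schematic structural diagrams of a server system and a computer device provided in embodiments of the present application.
The server system includes a server control chip (for example the complex programmable logic device shown in FIG. 6); a platform control center, a first flash memory chip, and a second flash memory chip, each connected to the server control chip through an SPI (Serial Peripheral Interface) link; and a field replaceable unit and a baseboard management controller, each connected to the server control chip through an I2C (Inter-Integrated Circuit) link. The baseboard management controller is used to record the relevant information disclosed in any of the foregoing embodiments, so that engineers can log and troubleshoot the problems that this information indicates.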
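The connection layout just described can be captured as a small table, which is sometimes handy when reasoning about which peer sits on which bus. This is only a descriptive sketch: the `Link` type, the peer names, and `peers_on` are hypothetical and do not correspond to any real driver API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    """One connection between the server control chip and a peer."""
    bus: str
    peer: str


# Layout as described for FIG. 6: three SPI peers and two I2C peers.
CONTROL_CHIP_LINKS: Tuple[Link, ...] = (
    Link("SPI", "platform_control_center"),
    Link("SPI", "first_flash_chip"),
    Link("SPI", "second_flash_chip"),
    Link("I2C", "field_replaceable_unit"),
    Link("I2C", "baseboard_management_controller"),
)


def peers_on(bus: str) -> List[str]:
    """List the peers reachable from the control chip over the given bus."""
    return [link.peer for link in CONTROL_CHIP_LINKS if link.bus == bus]
```

For example, `peers_on("I2C")` lists the field replaceable unit and the baseboard management controller, the two parts that hold reference data and records rather than boot firmware.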
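The computer device may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used to store a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the server component self-test exception recovery method performed by a computer device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 provides operating voltage for each hardware device on the computer device 20. The communication interface 24 can create a data transmission channel between the computer device 20 and external devices; it may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited here. The input/output interface 25 is used to obtain external input data or to output data externally; its specific interface type can be selected according to the needs of the application and is not specifically limited here.
The processor 21 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 21 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor; the main processor, also called the CPU (Central Processing Unit), processes data in the wake-up state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.
In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include the operating system 221, the computer program 222, and the data 223, and the storage may be temporary or permanent.
The operating system 221 manages and controls each hardware device on the computer device 20 as well as the computer program 222, so that the processor 21 can compute on and process the massive data 223 in the memory 22; it can be Windows, Unix, Linux, etc. Besides the computer program that can be used to complete the server component self-test exception recovery method performed by the computer device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to complete other specific tasks. The data 223 may include data received by the computer device from external devices as well as data collected by its own input/output interface 25.
Further, an embodiment of the present application also discloses a storage medium in which a computer program is stored; when the computer program is loaded and executed by a processor, it implements the method steps performed in the server component self-test exception recovery process disclosed in any of the foregoing embodiments.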
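Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The server component self-test exception recovery method, device, equipment, and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.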

Claims (12)

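  1. A server component self-test exception recovery method, characterized in that it is applied to a server control chip and includes:
    when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center;
    querying a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data;
    comparing whether the first firmware data and the second firmware data are consistent;
    if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  2. The server component self-test exception recovery method according to claim 1, characterized in that querying the first flash memory chip and the second flash memory chip for the corresponding first firmware data and second firmware data based on the self-test abnormal state data includes:
    determining whether the self-test abnormal state data is erroneous;
    if the self-test abnormal state data is correct, sending the self-test abnormal state data to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the self-test abnormal state data to perform the corresponding firmware data query;
    acquiring the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
  3. The server component self-test exception recovery method according to claim 2, characterized in that determining whether the self-test abnormal state data is erroneous includes:
    acquiring, from a field replaceable unit, preset state data corresponding to the self-test abnormal state data;
    comparing the self-test abnormal state data with the preset state data to determine whether the self-test abnormal state data is erroneous.
  4. The server component self-test exception recovery method according to claim 3, characterized in that, before acquiring, from the field replaceable unit, the preset state data corresponding to the self-test abnormal state data, the method further includes:
    using firmware data stored locally on the server control chip to determine whether the operation corresponding to the self-test abnormal state data is a preset operation;
    if the operation corresponding to the self-test abnormal state data is the preset operation, triggering the step of acquiring, from the field replaceable unit, the preset state data corresponding to the self-test abnormal state data.
  5. The server component self-test exception recovery method according to claim 3, characterized in that, after determining whether the self-test abnormal state data is erroneous, the method further includes:
    if the self-test abnormal state data is erroneous, sending the preset state data in the field replaceable unit to the first flash memory chip and the second flash memory chip, so that the first flash memory chip and the second flash memory chip use the preset state data in the field replaceable unit to perform the corresponding firmware data query;
    acquiring the first firmware data found by the first flash memory chip and the second firmware data found by the second flash memory chip.
  6. The server component self-test exception recovery method according to claim 1, characterized in that, after comparing whether the first firmware data and the second firmware data are consistent, the method further includes:
    if the first firmware data and the second firmware data are inconsistent, determining whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
    if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  7. The server component self-test exception recovery method according to claim 6, characterized in that, after determining whether the preset firmware data corresponding to the preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, the method further includes:
    if the preset firmware data in the field replaceable unit is consistent with neither the first firmware data nor the second firmware data, sending the preset firmware data in the field replaceable unit to the platform control center, so that the platform control center uses the preset firmware data to perform self-test exception recovery.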
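  8. A server component self-test exception recovery device, characterized in that it is applied to a server control chip and includes:
    a data acquisition module, configured to acquire self-test abnormal state data sent by a platform control center when a self-test exception occurs in a target component of the server;
    a data query module, configured to query a first flash memory chip and a second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data;
    a data comparison module, configured to compare whether the first firmware data and the second firmware data are consistent;
    a data sending module, configured to, if the first firmware data and the second firmware data are consistent, determine target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and send the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  9. The server component self-test exception recovery device according to claim 8, characterized in that it further includes:
    a third judging unit, configured to, if the first firmware data and the second firmware data are inconsistent, determine whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
    a second data sending unit, configured to, if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, send whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  10. A server system, characterized in that it includes:
    a first flash memory chip, a second flash memory chip, and a memory for storing a computer program;
    the system further includes a server control chip connected to the first flash memory chip, the second flash memory chip, and the memory respectively, and configured to execute the computer program in the memory to implement the following steps:
    when a self-test exception occurs in a target component of the server, acquiring self-test abnormal state data sent by a platform control center; querying the first flash memory chip and the second flash memory chip for corresponding first firmware data and second firmware data based on the self-test abnormal state data; comparing whether the first firmware data and the second firmware data are consistent; if the first firmware data and the second firmware data are consistent, determining target firmware data for self-test exception recovery based on the first firmware data and the second firmware data, and sending the target firmware data to the platform control center, so that the platform control center uses the target firmware data to perform self-test exception recovery.
  11. The server system according to claim 10, characterized in that, after comparing whether the first firmware data and the second firmware data are consistent, the steps further include:
    if the first firmware data and the second firmware data are inconsistent, determining whether preset firmware data corresponding to preset state data in the field replaceable unit is consistent with the first firmware data or the second firmware data, wherein the preset state data is the preset state data in the field replaceable unit that corresponds to the self-test abnormal state data;
    if the preset firmware data in the field replaceable unit is consistent with the first firmware data or the second firmware data, sending whichever of the first firmware data and the second firmware data is consistent with the preset firmware data to the platform control center, so that the platform control center uses that firmware data to perform self-test exception recovery.
  12. A computer-readable storage medium, characterized in that it is used to store a computer program, wherein the computer program, when executed by a processor, implements the steps of the server component self-test exception recovery according to any one of claims 1 to 7.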
PCT/CN2022/083574 2021-10-20 2022-03-29 服务器组件自检异常恢复方法、装置、系统及介质 WO2023065601A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/564,699 US20240264914A1 (en) 2021-10-20 2022-03-29 Method and device for recovering self-test exception of server component, system and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111218289.0 2021-10-20
CN202111218289.0A CN113672306B (zh) 2021-10-20 2021-10-20 服务器组件自检异常恢复方法、装置、系统及介质

Publications (1)

Publication Number Publication Date
WO2023065601A1 true WO2023065601A1 (zh) 2023-04-27

Family

ID=78550637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083574 WO2023065601A1 (zh) 2021-10-20 2022-03-29 服务器组件自检异常恢复方法、装置、系统及介质

Country Status (3)

Country Link
US (1) US20240264914A1 (zh)
CN (1) CN113672306B (zh)
WO (1) WO2023065601A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061603A (zh) * 2019-12-30 2020-04-24 鹍骐科技(北京)股份有限公司 可记录自检数据的主板和计算机、自检数据的记录方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672306B (zh) * 2021-10-20 2022-02-18 苏州浪潮智能科技有限公司 服务器组件自检异常恢复方法、装置、系统及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100281297A1 (en) * 2009-04-29 2010-11-04 Jibbe Mahmoud K Firmware recovery in a raid controller by using a dual firmware configuration
CN110908847A (zh) * 2019-11-22 2020-03-24 苏州浪潮智能科技有限公司 一种异常恢复方法、系统、电子设备及存储介质
CN111858119A (zh) * 2020-07-13 2020-10-30 山东云海国创云计算装备产业创新中心有限公司 一种bios故障修复方法及相关装置
CN112667462A (zh) * 2020-12-15 2021-04-16 苏州浪潮智能科技有限公司 一种服务器的双闪存运行监测的系统、方法及介质
CN113064757A (zh) * 2021-03-26 2021-07-02 山东英信计算机技术有限公司 一种服务器固件自恢复系统及服务器
CN113672306A (zh) * 2021-10-20 2021-11-19 苏州浪潮智能科技有限公司 服务器组件自检异常恢复方法、装置、系统及介质

Also Published As

Publication number Publication date
CN113672306B (zh) 2022-02-18
US20240264914A1 (en) 2024-08-08
CN113672306A (zh) 2021-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882226

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE