CN113704023B - Firmware self-recovery device, method and server system - Google Patents


Info

Publication number
CN113704023B
Authority
CN
China
Prior art keywords
rom
data information
data
firmware
roms
Prior art date
Legal status
Active
Application number
CN202110808136.5A
Other languages
Chinese (zh)
Other versions
CN113704023A (en)
Inventor
李倩倩 (Li Qianqian)
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110808136.5A
Publication of CN113704023A
Application granted
Publication of CN113704023B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The application discloses a firmware self-recovery device, a firmware self-recovery method and a server system. The data information of any one of three ROMs can be calculated from the data information of the other two ROMs using a preset check algorithm, so the data information of any two ROMs is equivalent to the complete data information. Corruption of any one of the three ROMs therefore does not affect data integrity, which improves data security and fault tolerance and effectively improves the stability and reliability of the server system. In addition, because the three ROMs cooperate to store the complete data information, the read/write speed of the data information can be improved.

Description

Firmware self-recovery device, method and server system
Technical Field
The present application relates to the field of server systems, and in particular, to a firmware self-recovery device, a firmware self-recovery method, and a server system.
Background
At present, the firmware of mainstream server systems is generally stored in a read-only memory (ROM). If the firmware in the ROM is corrupted, the server system may fail to start, which lowers its stability and reliability.
In the prior art, to improve stability and reliability, some server systems use a dual-ROM design in which the system automatically switches to the other ROM when the information in one ROM is corrupted. Even with this design, fault tolerance remains poor, which limits how much the stability and reliability of the server system can be improved.
Solving the above technical problem is therefore a task that those skilled in the art currently need to address.
Disclosure of Invention
The purpose of the application is to provide a firmware self-recovery device, a firmware self-recovery method and a server system in which the data information of any one of three ROMs can be calculated from the data information of the other two ROMs and a preset check algorithm. The data information of any two ROMs is therefore equivalent to the complete data information, so corruption of any one of the three ROMs does not affect data integrity; this improves data security and fault tolerance and effectively improves the stability and reliability of the server system. In addition, because the three ROMs cooperate to store the complete data information, the read/write speed of the data information can be improved.
To solve the above technical problem, the present application provides a firmware self-recovery device, including:
a first ROM for storing first firmware data;
a second ROM for storing second firmware data, where the first firmware data and the second firmware data together make up all the valid firmware data of the server system;
a third ROM for storing check information calculated from the first firmware data and the second firmware data by a preset check algorithm; and
a controller for acquiring the data information of the first ROM, the second ROM and the third ROM and, if the data information of one ROM is corrupted, recovering the corrupted data information from the data information of the other two ROMs and the preset check algorithm, so that it can be loaded and used when a computing board of the server system starts up.
Preferably, the controller is further configured to:
write the recovered data information into the ROM whose data information was corrupted, so as to restore the original data information in that ROM.
Preferably, there are multiple first ROMs, second ROMs and third ROMs, all in the same number, and one first ROM, one second ROM and one third ROM form a ROM group;
the firmware self-recovery device further includes:
a plurality of management boards, on which the ROM groups are arranged one per board;
an adapter board, on which the controller is arranged and which is connected to the plurality of management boards;
the controller is specifically configured to determine a main management board from among the management boards that are in place and acquire the data information of the three ROMs on it. If the data information of one of these ROMs is corrupted, the controller recovers it from the data information of the other two ROMs and the preset check algorithm; if the data information of two or all three ROMs is corrupted, the controller determines a new main management board from the remaining management boards and repeats the acquisition step, until the complete data information of the three ROMs is obtained for a computing board of the server system to load and use at startup.
Preferably, the controller is further configured to:
erase the data information of the ROM group on a management board whose data information is corrupted, and write the data information of the ROM group on a management board whose data information is intact into the corresponding ROMs of that group.
Preferably, there are two first ROMs, two second ROMs and two third ROMs, and hence two management boards: a master management board and a slave management board;
the controller is specifically configured to detect whether the two management boards are in place. If both are in place, it acquires the data information of the three ROMs on the master management board; if the data information of one of these ROMs is corrupted, it recovers that data from the data information of the other two ROMs and the preset check algorithm, and if the data information of two or all three ROMs is corrupted, it switches to acquiring the data information of the three ROMs from the slave management board. If only one management board is in place, it acquires the data information of the three ROMs directly from that board, for a computing board of the server system to load and use at startup.
Preferably, the first firmware data, the second firmware data and the check information are all binary data;
the preset check algorithm is defined bit by bit as follows:
when bit n of the first firmware data is 0 and bit n of the second firmware data is 0, bit n of the check information is 0, where n is a positive integer;
when bit n of the first firmware data is 0 and bit n of the second firmware data is 1, bit n of the check information is 1;
when bit n of the first firmware data is 1 and bit n of the second firmware data is 0, bit n of the check information is 1;
when bit n of the first firmware data is 1 and bit n of the second firmware data is 1, bit n of the check information is 0.
Preferably, the controller is further configured to:
count the number of times the data information of each ROM has been corrupted and determine whether the count for a target ROM exceeds a preset threshold; if so, issue a reminder to replace the target ROM, where the target ROM is any one of the ROMs.
Preferably, there are a plurality of computing boards;
the firmware self-recovery device further includes:
a switch circuit connected to the controller and to each of the computing boards;
the controller is further configured to control the switch circuit, according to the boot-firmware loading requirements of the computing boards, to open the communication link between the switch circuit and a target computing board that needs to load boot firmware, so that the complete data information of the three ROMs is transmitted to the target computing board for loading and use when it starts up.
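The claim does not say how the controller arbitrates between several computing boards that request boot firmware. The following is a minimal sketch, assuming the switch circuit conducts only one link at a time and pending requests are served in arrival order; `SwitchCircuit` and `serve_boot_requests` are hypothetical names, and the byte string stands in for the complete data information of the three ROMs.

```python
from collections import deque


class SwitchCircuit:
    """Hypothetical model of the switch circuit: at most one link conducts at a time."""

    def __init__(self, board_ids):
        self.board_ids = set(board_ids)
        self.active = None  # computing board whose link is currently open

    def connect(self, board_id):
        if board_id not in self.board_ids:
            raise ValueError(f"unknown computing board {board_id}")
        self.active = board_id

    def disconnect(self):
        self.active = None


def serve_boot_requests(switch, requests, firmware_image):
    """Open the link to each requesting board in turn and hand it the firmware.

    requests: board ids that need to load boot firmware, in arrival order.
    Returns a dict mapping each served board id to the image it received.
    """
    delivered = {}
    pending = deque(requests)
    while pending:
        target = pending.popleft()
        switch.connect(target)  # conduct only the link to the target board
        delivered[target] = firmware_image  # stand-in for the actual transfer
        switch.disconnect()
    return delivered


switch = SwitchCircuit(["node0", "node1", "node2"])
served = serve_boot_requests(switch, ["node2", "node0"], b"\x55\xaa")
print(sorted(served))  # ['node0', 'node2']
```

Serving one board at a time is only one possible policy; hardware that supports several simultaneous links could use a different scheme.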
To solve the above technical problem, the present application further provides a firmware self-recovery method, applied to any of the above firmware self-recovery devices, including:
respectively acquiring data information of the first ROM, the second ROM and the third ROM;
if the data information of one ROM is corrupted, recovering the corrupted data information from the data information of the other two ROMs and the preset check algorithm, for loading and use when a computing board of the server system starts up.
To solve the above technical problem, the application further provides a server system comprising any of the above firmware self-recovery devices.
In summary, the application provides a firmware self-recovery device comprising a first ROM, a second ROM, a third ROM and a controller. The controller acquires the data information of the three ROMs and, if the data information of one ROM is corrupted, recovers it from the data information of the other two ROMs and a preset check algorithm, so that a computing board of the server system can load it at startup. Because the data information of any one ROM can be calculated from that of the other two, the data information of any two ROMs is equivalent to the complete data information, and corruption of any single ROM does not affect data integrity. This improves data security and fault tolerance and effectively improves the stability and reliability of the server system; in addition, because the three ROMs cooperate to store the complete data information, the read/write speed of the data information can be improved.
The application also provides a firmware self-recovery method and a server system, which have the same beneficial effects as the firmware self-recovery device.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a firmware self-recovery device according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a specific structure of a firmware self-recovery device according to an embodiment of the present application;
Fig. 3 is a flowchart of a firmware self-recovery method according to an embodiment of the present application.
Detailed Description
The core of the application is a firmware self-recovery device, method and server system in which the data information of any one of three ROMs can be calculated from the data information of the other two ROMs and a preset check algorithm, so that corruption of any single ROM does not affect data integrity and the stability and reliability of the server system are improved.
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of protection of the application.
Referring to Fig. 1, which is a schematic structural diagram of a firmware self-recovery device according to an embodiment of the application.
The firmware self-recovery device includes:
a first ROM 1 for storing first firmware data;
a second ROM 2 for storing second firmware data, where the first firmware data and the second firmware data together make up all the valid firmware data of the server system;
a third ROM 3 for storing check information calculated from the first firmware data and the second firmware data by a preset check algorithm; and
a controller 4 configured to acquire the data information of the first ROM 1, the second ROM 2 and the third ROM 3 and, if the data information of one of the ROMs is corrupted, recover the corrupted data information from the data information of the other two ROMs and the preset check algorithm, so that a computing board of the server system can load and use it at startup.
Specifically, the firmware self-recovery device of the present application includes a first ROM 1, a second ROM 2, a third ROM 3 and a controller 4, and works as follows:
The application divides all the valid firmware data of the server system into first firmware data, stored in the first ROM 1, and second firmware data, stored in the second ROM 2; combining the firmware data read from the first ROM 1 and the second ROM 2 therefore yields all the valid firmware data of the server system.
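The description states that the valid firmware is divided between ROM 1 and ROM 2 and later mentions a striped storage mode, but it does not give the split rule. A minimal sketch, assuming fixed-size stripes alternated between the two ROMs and zero padding so that both halves have equal length (which the bit-by-bit check algorithm requires); `STRIPE`, `split_firmware` and `join_firmware` are hypothetical names.

```python
STRIPE = 4  # hypothetical stripe size in bytes; a real layout might use the flash sector size


def split_firmware(image: bytes, stripe: int = STRIPE):
    """Split a firmware image into first/second firmware data by alternating stripes.

    Even-numbered stripes go to ROM 1 and odd-numbered stripes to ROM 2. The image
    is zero-padded to a whole number of stripe pairs so both halves are equally long.
    """
    pair = 2 * stripe
    padded = image + bytes(-len(image) % pair)
    rom1 = bytearray()
    rom2 = bytearray()
    for offset in range(0, len(padded), pair):
        rom1 += padded[offset:offset + stripe]
        rom2 += padded[offset + stripe:offset + pair]
    return bytes(rom1), bytes(rom2)


def join_firmware(rom1: bytes, rom2: bytes, length: int, stripe: int = STRIPE) -> bytes:
    """Interleave the two halves again and drop the padding (original length must be known)."""
    out = bytearray()
    for offset in range(0, len(rom1), stripe):
        out += rom1[offset:offset + stripe]
        out += rom2[offset:offset + stripe]
    return bytes(out[:length])


image = bytes(range(10))
first, second = split_firmware(image)
print(first.hex(), second.hex())  # 0001020308090000 0405060700000000
assert join_firmware(first, second, len(image)) == image
```

Where the original image length is stored is left open here; the sketch simply passes it in.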
The application calculates check information from the first firmware data and the second firmware data using a preset check algorithm and stores the result in the third ROM 3. Consequently, if the data information of the first ROM 1 is corrupted while that of the second ROM 2 and the third ROM 3 is intact, the data information of the first ROM 1 can be calculated from the data information of the second ROM 2 and the third ROM 3 using the preset check algorithm, yielding the complete data information of all three ROMs. Likewise, if the second ROM 2 is corrupted while the first ROM 1 and the third ROM 3 are intact, its data information can be calculated from the first ROM 1 and the third ROM 3; and if the third ROM 3 is corrupted while the first ROM 1 and the second ROM 2 are intact, its data information can be calculated from the first ROM 1 and the second ROM 2. In each case the complete data information of the three ROMs is obtained and is loaded and used when a computing board of the server system starts up.
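The three recovery cases reduce to a single operation, because the bit rules given in this description for the check algorithm are exactly bitwise exclusive OR (XOR), which is its own inverse. A minimal sketch, with ROM contents modelled as equal-length byte strings; `check_info` and `recover` are hypothetical helper names.

```python
def check_info(first: bytes, second: bytes) -> bytes:
    """Check information for ROM 3: bitwise XOR of the first and second firmware data."""
    if len(first) != len(second):
        raise ValueError("both halves must have the same length")
    return bytes(a ^ b for a, b in zip(first, second))


def recover(roms: list, lost: int) -> bytes:
    """Rebuild ROM index `lost` (0, 1 or 2) from the other two ROMs.

    Since XOR is its own inverse, each ROM equals the XOR of the other two,
    so the same operation recovers ROM 1, ROM 2 or ROM 3.
    """
    a, b = (roms[i] for i in range(3) if i != lost)
    return bytes(x ^ y for x, y in zip(a, b))


rom1 = bytes([0x12, 0x34, 0xAB])
rom2 = bytes([0xF0, 0x0F, 0x55])
rom3 = check_info(rom1, rom2)
roms = [rom1, rom2, rom3]
for lost in range(3):
    assert recover(roms, lost) == roms[lost]
print(rom3.hex())  # e23bfe
```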
Accordingly, the controller 4 is connected to each of the first ROM 1, the second ROM 2 and the third ROM 3 and acquires their data information. If none of the three ROMs is corrupted, the complete data information of the three ROMs is obtained directly; if the data information of one ROM is corrupted, it is recovered from the data information of the other two ROMs and the preset check algorithm, giving the complete data information of the three ROMs for a computing board of the server system to load at startup.
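The description does not say how the controller detects that a ROM is corrupted. The sketch below assumes, purely for illustration, that a CRC32 value for each ROM was recorded when the ROMs were programmed and is available to the controller; `load_firmware` is a hypothetical name.

```python
import zlib


def load_firmware(roms, expected_crc):
    """Return intact contents of all three ROMs, or None if they cannot be recovered.

    roms: three byte strings read from ROM 1, ROM 2 and ROM 3.
    expected_crc: three CRC32 values recorded at programming time (assumed
    integrity check; not specified in the patent).
    """
    damaged = [i for i in range(3) if zlib.crc32(roms[i]) != expected_crc[i]]
    if not damaged:
        return list(roms)  # nothing corrupted: use the data directly
    if len(damaged) > 1:
        return None  # two or three ROMs corrupted: cannot rebuild from this group
    lost = damaged[0]
    a, b = (roms[i] for i in range(3) if i != lost)
    result = list(roms)
    result[lost] = bytes(x ^ y for x, y in zip(a, b))  # XOR of the other two ROMs
    return result


good = [b"\x01\x02", b"\x10\x20", b"\x11\x22"]  # ROM 3 = ROM 1 XOR ROM 2
crcs = [zlib.crc32(r) for r in good]
corrupted = [good[0], b"\xff\xff", good[2]]  # ROM 2 corrupted
print(load_firmware(corrupted, crcs) == good)  # True
```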
Based on the above embodiments:
as an alternative embodiment, the controller 4 is further configured to:
write the recovered data information into the ROM whose data information was corrupted, so as to restore the original data information in that ROM.
In other words, the controller 4 writes the recovered data information back into the corrupted ROM, restoring its original contents. The stored content of the corrupted ROM is thus recovered automatically, without manual reflashing, which makes the device easier to operate and maintain.
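Writing the recovered data back only works if the ROM is rewritable. The sketch below assumes NOR-flash-like behaviour (erase sets every bit to 1; programming can only clear bits), which is why the write-back erases first. `FlashRom` and `write_back` are hypothetical names, not an interface from the patent.

```python
class FlashRom:
    """Hypothetical in-memory model of one rewritable ROM (assumed NOR-flash-like)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)

    def erase(self) -> None:
        # Erasing sets every bit back to 1 (0xFF bytes)
        self.data[:] = b"\xff" * len(self.data)

    def program(self, image: bytes) -> None:
        if len(image) > len(self.data):
            raise ValueError("image larger than ROM")
        for i, byte in enumerate(image):
            self.data[i] &= byte  # programming can only clear bits (1 -> 0)

    def read(self) -> bytes:
        return bytes(self.data)


def write_back(rom: FlashRom, recovered: bytes) -> bool:
    """Rewrite a corrupted ROM with recovered data and verify by reading it back."""
    rom.erase()
    rom.program(recovered)
    return rom.read()[:len(recovered)] == recovered


damaged_rom2 = FlashRom(b"\x00\x13\x37\x00")  # corrupted contents
print(write_back(damaged_rom2, bytes([0x10, 0x20, 0x30, 0x40])))  # True
```

Programming without the erase step would leave bits that are already 0 stuck, so the read-back check would fail.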
Referring to Fig. 2, which is a schematic diagram of a specific structure of a firmware self-recovery device according to an embodiment of the application.
As an alternative embodiment, there are multiple first ROMs 1, second ROMs 2 and third ROMs 3, all in the same number, and one first ROM 1, one second ROM 2 and one third ROM 3 form a ROM group;
the firmware self-recovery device further includes:
a plurality of management boards, on which the ROM groups are arranged one per board;
an adapter board 5, on which the controller 4 is arranged and which is connected to the plurality of management boards;
the controller 4 is specifically configured to determine a main management board from among the management boards that are in place and acquire the data information of the three ROMs on it. If the data information of one of these ROMs is corrupted, the controller 4 recovers it from the data information of the other two ROMs and the preset check algorithm; if the data information of two or all three ROMs is corrupted, the controller 4 determines a new main management board from the remaining management boards and repeats the acquisition step, until the complete data information of the three ROMs is obtained for a computing board of the server system to load and use at startup.
Specifically, the first ROMs 1, the second ROMs 2 and the third ROMs 3 are equal in number and there are several of each; each set of one first ROM 1, one second ROM 2 and one third ROM 3 forms a ROM group, giving a plurality of ROM groups.
On this basis, the firmware self-recovery device of the present application further includes a plurality of management boards (chassis management controller (CMC) boards are used in this embodiment) and an adapter board 5, which work as follows:
The management boards are connected to the adapter board 5 so that the controller 4 on the adapter board 5 can communicate with the ROM group on each management board. Specifically, the adapter board 5 is provided with a plurality of first communication interfaces, and each management board is provided with a second communication interface that plugs into one of the first communication interfaces; the controller 4 on the adapter board 5 communicates with the ROM group on each management board through these interfaces.
The controller 4 determines a main management board from among the management boards that are in place and acquires the data information of its three ROMs. If none of them is corrupted, the complete data information of the three ROMs is obtained directly; if one is corrupted, it is recovered from the other two and the preset check algorithm. If two or all three are corrupted, the complete data information cannot be recovered from this board, so the controller 4 determines a new main management board from the remaining management boards and repeats the same procedure on it. This continues until the complete data information of the three ROMs is obtained for a computing board of the server system to load at startup.
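The board-selection loop can be sketched as follows. It assumes boards are tried in a fixed order and, as in the illustration above, that corruption is detected with a per-ROM CRC32 recorded at programming time; the board records and the names `rebuild_group`, `select_main_board` and `make_board` are hypothetical. A board with exactly one corrupted ROM is still accepted, since that ROM can be rebuilt.

```python
import zlib


def rebuild_group(roms, crc):
    """Return the three ROM images of one board, rebuilding at most one of them.

    Two or more corrupted ROMs make the board unusable (returns None).
    """
    damaged = [i for i in range(3) if zlib.crc32(roms[i]) != crc[i]]
    if len(damaged) >= 2:
        return None
    images = list(roms)
    if damaged:
        lost = damaged[0]
        a, b = (roms[i] for i in range(3) if i != lost)
        images[lost] = bytes(x ^ y for x, y in zip(a, b))
    return images


def select_main_board(boards):
    """Try in-place boards in order; return (name, images) from the first usable one."""
    for board in boards:
        if not board["present"]:
            continue  # boards that are not in place are skipped
        images = rebuild_group(board["roms"], board["crc"])
        if images is not None:
            return board["name"], images  # this board becomes the main management board
    return None, None  # no in-place board can supply complete firmware


def make_board(name, present, roms):
    """Hypothetical board record with CRCs recorded while the ROMs were still good."""
    return {"name": name, "present": present, "roms": list(roms),
            "crc": [zlib.crc32(r) for r in roms]}


good = [b"\x0a", b"\x05", b"\x0f"]  # ROM 3 = ROM 1 XOR ROM 2
cmc_a = make_board("CMC-A", True, good)
cmc_a["roms"][0] = b"\x00"  # corrupt two ROMs after the CRCs were recorded
cmc_a["roms"][2] = b"\x00"
cmc_b = make_board("CMC-B", True, good)
print(select_main_board([cmc_a, cmc_b])[0])  # CMC-B
```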
The application therefore provides double protection: three ROMs that jointly store the complete data information in a striped manner, plus multiple management boards, which greatly improves the safety and reliability of the server system.
As an alternative embodiment, the controller 4 is further configured to:
erase the data information of the ROM group on a management board whose data information is corrupted, and write the data information of the ROM group on a management board whose data information is intact into the corresponding ROMs of that group.
Specifically, the controller 4 erases the data information of the ROM group on the management board whose data information is corrupted (the first management board), that is, the data information of its three ROMs, and then writes the data information of the ROM group on a management board whose data information is intact (the second management board) into the ROM group of the first management board: the first ROM of the second management board is copied to the first ROM of the first management board, the second ROM to the second ROM, and the third ROM to the third ROM. The stored contents of the ROMs on the corrupted management board are thus recovered automatically, without manual rewriting, which makes the device easier to operate and maintain.
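The ROM-by-ROM copy can be sketched as below, again modelling each ROM as a byte buffer with an assumed flash-style erase (all bits set to 1) before programming. `resync_group` is a hypothetical name; the point is only that ROM k on the healthy board is copied to ROM k on the damaged board.

```python
def resync_group(damaged_group, healthy_group):
    """Copy a healthy board's ROM group onto a damaged board's group, ROM by ROM.

    damaged_group is a list of three bytearrays (ROM 1, ROM 2, ROM 3) modified in
    place; healthy_group holds the three intact images from the other board.
    """
    if len(damaged_group) != 3 or len(healthy_group) != 3:
        raise ValueError("a ROM group holds exactly three ROMs")
    for target, source in zip(damaged_group, healthy_group):
        target[:] = b"\xff" * len(source)  # erase: all bits back to 1 (flash-style, assumed)
        for i, byte in enumerate(source):
            target[i] &= byte  # program: ROM k on the healthy board -> ROM k here
    return damaged_group


cmc0 = [bytearray(b"\x00\x00"), bytearray(b"\x99"), bytearray(b"\x11\x22")]  # two ROMs bad
cmc1 = [b"\x01\x02", b"\x10\x20", b"\x11\x22"]  # intact group on the other board
resync_group(cmc0, cmc1)
print([bytes(r) for r in cmc0] == cmc1)  # True
```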
Note that a management board on which only one ROM is corrupted and the other two are intact is still regarded as a management board whose data information is intact; that ROM is repaired by writing the recovered data information back into it, as described in the embodiment above. A management board is regarded as having corrupted data information only when two or all three of its ROMs are corrupted.
As an alternative embodiment, there are two first ROMs 1, two second ROMs 2 and two third ROMs 3, and hence two management boards: a master management board CMC0 and a slave management board CMC1;
the controller 4 is specifically configured to detect whether the two management boards are in place. If both are in place, it acquires the data information of the three ROMs on the master management board CMC0; if the data information of one of the three ROMs is corrupted, it recovers that data from the data information of the other two ROMs and the preset check algorithm, and if the data information of two or all three ROMs is corrupted, it switches to acquiring the data information of the three ROMs from the slave management board CMC1. If only one management board is in place, it acquires the data information of the three ROMs directly from that board, for a computing board of the server system to load at startup.
Specifically, since there are two of each ROM, the firmware self-recovery device includes two management boards, and the controller 4 designates one as the master management board CMC0 and the other as the slave management board CMC1 in advance.
On this basis, after power-on the controller 4 automatically detects (over the I2C bus) whether the master management board CMC0 and the slave management board CMC1 are in place. If both are in place, it acquires the data information of the three ROMs on CMC0: if none is corrupted, the complete data information is obtained directly; if one is corrupted, it is recovered from the other two and the preset check algorithm; if two or all three are corrupted, the controller 4 switches to the three ROMs on the slave management board CMC1 and applies the same rules there. If two or all three ROMs on CMC1 are also corrupted, the complete data information of the three ROMs cannot be obtained. If only one management board is in place, the controller 4 acquires the data information of the three ROMs directly from that board and applies the same rules; if two or all three of its ROMs are corrupted, the complete data information of the three ROMs cannot be obtained.
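The two-board decision flow can be written as a small decision function. This sketch does not model the I2C presence detection itself: it assumes the presence flags and the number of corrupted ROMs per board are already known. `boot_source` is a hypothetical name.

```python
from typing import Dict, Optional


def boot_source(cmc0_present: bool, cmc1_present: bool,
                damaged: Dict[str, int]) -> Optional[str]:
    """Return "CMC0" or "CMC1" as the firmware source, or None if neither can supply it.

    damaged maps each board name to the number of corrupted ROMs found on it:
    0 means intact, 1 means recoverable with the check algorithm, 2 or 3 means unusable.
    """

    def usable(name: str) -> bool:
        return damaged[name] <= 1

    if cmc0_present and cmc1_present:
        if usable("CMC0"):
            return "CMC0"  # both in place: the master board is tried first
        return "CMC1" if usable("CMC1") else None  # switch to the slave board
    if cmc0_present or cmc1_present:
        only = "CMC0" if cmc0_present else "CMC1"
        return only if usable(only) else None  # a single board has no fallback
    return None  # no management board in place


print(boot_source(True, True, {"CMC0": 2, "CMC1": 0}))   # CMC1
print(boot_source(False, True, {"CMC0": 0, "CMC1": 1}))  # CMC1
```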
As an alternative embodiment, the first firmware data, the second firmware data and the verification information are all binary data;
the preset process of the checking algorithm comprises the following steps:
when the nth bit data of the first firmware data is 0 and the nth bit data of the second firmware data is 0, the nth bit data of the check information is 0; wherein n is a positive integer;
when the nth bit data of the first firmware data is 0 and the nth bit data of the second firmware data is 1, the nth bit data of the check information is 1;
when the nth bit data of the first firmware data is 1 and the nth bit data of the second firmware data is 0, the nth bit data of the check information is 1;
when the nth bit data of the first firmware data is 1 and the nth bit data of the second firmware data is 1, the nth bit data of the check information is 0.
Specifically, the first firmware data, the second firmware data and the check information of the present application are all binary data (containing only 0s and 1s). The check algorithm works bit by bit: if the nth bits of the first and second firmware data are 0 and 0, the nth bit of the check information is 0; if they are 0 and 1, it is 1; if they are 1 and 0, it is 1; and if they are 1 and 1, it is 0. This is the bitwise exclusive OR (XOR), so the data information of any one ROM can be calculated from the data information of the other two ROMs through the preset check algorithm.
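The four rules above are the truth table of bitwise XOR, and because XOR is its own inverse, any single lost ROM can be rebuilt from the other two. The sketch below illustrates this; make_check and rebuild are hypothetical names, and it assumes both firmware halves have been padded to the same length, which the source does not state.

```python
def make_check(first: bytes, second: bytes) -> bytes:
    """Check data for the third ROM: bit n = bit n of first XOR bit n of second."""
    if len(first) != len(second):
        # Assumption: both halves are stored in equal-sized ROM images.
        raise ValueError("firmware halves must be the same length")
    return bytes(a ^ b for a, b in zip(first, second))


def rebuild(survivor_a: bytes, survivor_b: bytes) -> bytes:
    """Rebuild any single lost ROM image as the XOR of the two surviving images."""
    return make_check(survivor_a, survivor_b)


# The four per-bit rules listed above are exactly the XOR truth table.
for a_bit, b_bit, c_bit in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]:
    assert a_bit ^ b_bit == c_bit
```

The same routine serves for generating the check data and for recovery, which is the property that lets one parity ROM protect two data ROMs.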
As an alternative embodiment, the controller 4 is further configured to:
accumulate the number of times the data information of each ROM has been corrupted, and determine whether the count for a target ROM exceeds a preset threshold; if so, issue a replacement reminder for the target ROM, where the target ROM is any one of the ROMs.
Further, the controller 4 may accumulate the number of times the data information of each ROM on each management board has been corrupted, and determine whether the count for any one ROM (referred to as the target ROM) exceeds a preset threshold. If it does, the target ROM is considered due for replacement and a replacement reminder is issued, so that the user is promptly informed that the target ROM needs to be replaced.
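A minimal sketch of the corruption counter, assuming ROMs are identified by a hashable key such as ("CMC0", "first"); the class name CorruptionTracker, the key format and the threshold value are illustrative assumptions, since the source only speaks of "a preset threshold".

```python
from collections import Counter
from typing import Hashable


class CorruptionTracker:
    """Counts corruption events per ROM and flags ROMs whose count exceeds a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, rom_id: Hashable) -> bool:
        """Record one corruption of rom_id; return True if a replacement reminder is due."""
        self.counts[rom_id] += 1
        # "Greater than" the threshold, as in the description above.
        return self.counts[rom_id] > self.threshold
```

Keying the counter per board and per ROM means a weak chip on one management board is flagged without being masked by healthy chips elsewhere.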
As an alternative embodiment, there are a plurality of computing boards;
the firmware self-recovery device further includes:
a switching circuit Switch connected to the controller 4 and the plurality of computing boards, respectively;
the controller 4 is further configured to control, according to the boot firmware loading requirements of the plurality of computing boards, the switching circuit Switch to open the communication link between the controller 4 and a target computing board that needs to load boot firmware, so as to transmit the complete data information of the three ROMs to the target computing board for loading when it starts.
Further, the server system contains a plurality of computing boards, and the firmware self-recovery device further includes a switching circuit Switch (which may be disposed on the adapter board 5) controlled by the controller 4. According to the boot firmware loading requirements of the computing boards, the controller 4 directs the Switch to open the communication link between itself and the computing board that needs to load boot firmware (the target computing board), and then transmits the complete data information of the three ROMs to that board (via the I2C bus) for loading at startup. In this way the computing boards share the ROM data on the management board, which helps reduce cost.
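The switch-controlled sharing described above can be modelled roughly as follows. Everything in this sketch is hypothetical: the Switch class stands in for the switching circuit, computing boards are identified by switch port numbers, requests are served first-come first-served (the source only says "according to the boot firmware loading requirements"), and the I2C transfer is reduced to recording which bytes each board received.

```python
from typing import Dict, List, Optional


class Switch:
    """Hypothetical model of the switching circuit: one board linked at a time."""

    def __init__(self, ports: int) -> None:
        self.ports = ports
        self.active: Optional[int] = None

    def connect(self, port: int) -> None:
        if not 0 <= port < self.ports:
            raise ValueError(f"no computing board on port {port}")
        self.active = port  # open the controller <-> target board link


def serve_boot_requests(switch: Switch, image: bytes, requests: List[int]) -> Dict[int, bytes]:
    """Link each requesting board in turn and hand it the shared firmware image."""
    delivered: Dict[int, bytes] = {}
    for port in requests:
        switch.connect(port)
        delivered[port] = image  # stands in for the transfer over the opened link
    return delivered
```

Serving one board at a time mirrors the single-controller design: the boards share one copy of the recovered ROM data rather than each carrying its own.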
In addition, the controller 4 of the application can be implemented by the FPGA (Field Programmable Gate Array) already present in the server system, which saves cost. When the server system is powered on, the FPGA powers up before the computing boards, so it can transmit the complete data information of the ROMs on the management board to the computing boards for loading at power-up.
Referring to fig. 3, fig. 3 is a flowchart of a firmware self-recovery method according to an embodiment of the application.
The firmware self-recovery method is applied to any of the firmware self-recovery devices described above and comprises the following steps:
Step S1: acquire the data information of the first ROM, the second ROM and the third ROM respectively.
Step S2: if the data information of one ROM is corrupted, recover it based on the data information of the other two ROMs and the preset verification algorithm, for loading and use by a computing board of the server system at startup.
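Steps S1 and S2 presuppose that the controller can tell which ROM is corrupted, but the source does not say how. The sketch below assumes, purely for illustration, that a known-good CRC-32 is available for each ROM; it also writes the rebuilt image back into the corrupted ROM (as claim 2 describes) and assumes the first and second firmware data are combined by simple concatenation. The function name self_recover is hypothetical.

```python
import zlib
from typing import List, Optional


def self_recover(roms: List[bytearray], expected_crc: List[int]) -> Optional[bytes]:
    """Run steps S1 and S2 on one ROM group and return the combined firmware.

    roms holds [first, second, check]; expected_crc holds a known-good CRC-32
    for each ROM (a hypothetical integrity check, not named in the source).
    """
    # Step S1: acquire each ROM's data and mark any CRC mismatch as corrupted.
    bad = [i for i, (img, crc) in enumerate(zip(roms, expected_crc))
           if zlib.crc32(bytes(img)) != crc]
    if len(bad) > 1:
        return None  # two or three ROMs lost: not recoverable from this group
    if bad:
        # Step S2: rebuild the single corrupted ROM as the XOR of the other two.
        a, b = (roms[i] for i in range(3) if i != bad[0])
        roms[bad[0]][:] = bytes(x ^ y for x, y in zip(a, b))  # write back
    # Assumption: "combining" the two firmware halves means concatenation.
    return bytes(roms[0]) + bytes(roms[1])
```

Writing the rebuilt image back means the next boot finds all three ROMs intact, so a second, independent corruption later can still be tolerated.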
For details of the firmware self-recovery method provided by the present application, refer to the embodiments of the firmware self-recovery device above; they are not repeated here.
The application also provides a server system comprising any of the firmware self-recovery devices described above.
For details of the server system provided by the present application, refer to the embodiments of the firmware self-recovery device above; they are not repeated here.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A firmware self-recovery device, comprising:
a first ROM for storing first firmware data;
a second ROM for storing second firmware data, wherein the first firmware data and the second firmware data together constitute all valid firmware data of a server system;
a third ROM for storing check information calculated from the first firmware data and the second firmware data by a preset verification algorithm;
and a controller configured to acquire the data information of the first ROM, the second ROM and the third ROM respectively and, if the data information of one ROM is corrupted, to recover the corrupted data information based on the data information of the other two ROMs and the preset verification algorithm, for loading and use by a computing board of the server system at startup.
2. The firmware self-recovery device of claim 1, wherein the controller is further configured to:
write the recovered data information into the ROM whose data information was corrupted, so as to restore the original data information in that ROM.
3. The firmware self-recovery device of claim 1, wherein the first ROMs, the second ROMs and the third ROMs are equal in number and each is plural, and one first ROM, one second ROM and one third ROM form a ROM group;
the firmware self-recovery device further includes:
a plurality of management boards, each provided with one ROM group;
an adapter board on which the controller is provided and which is connected to the plurality of management boards;
the controller is specifically configured to determine a master management board from among the management boards that are in place and acquire the data information of the three ROMs on it; if the data information of one of the ROMs is corrupted, recover it based on the data information of the other two ROMs and the preset verification algorithm; if the data information of two or all three ROMs is corrupted, re-determine the master management board from the remaining management boards and repeat the acquisition step until the complete data information of the three ROMs is obtained, for loading and use by the computing board of the server system at startup.
4. The firmware self-recovery device of claim 3, wherein the controller is further configured to:
erase the data information of the ROM group on a management board whose data information is corrupted, and write into it the corresponding data information of the ROM group on a management board whose data information is intact.
5. The firmware self-recovery device of claim 3, wherein there are two each of the first ROMs, the second ROMs and the third ROMs, and the two management boards are a master management board and a slave management board;
the controller is specifically configured to detect whether each of the two management boards is in place; if both are in place, acquire the data information of the three ROMs on the master management board and, if the data information of one of the ROMs is corrupted, recover it based on the data information of the other two ROMs and the preset verification algorithm; if the data information of two or all three ROMs is corrupted, switch to acquiring the data information of the three ROMs on the slave management board; if only one management board is in place, acquire the data information of the three ROMs directly from that board; the data is then loaded and used by the computing board of the server system at startup.
6. The firmware self-recovery device of claim 1, wherein the first firmware data, the second firmware data and the check information are binary data;
the preset verification algorithm comprises the following rules:
when the nth bit data of the first firmware data is 0 and the nth bit data of the second firmware data is 0, the nth bit data of the check information is 0; wherein n is a positive integer;
when the nth bit data of the first firmware data is 0 and the nth bit data of the second firmware data is 1, the nth bit data of the check information is 1;
when the nth bit data of the first firmware data is 1 and the nth bit data of the second firmware data is 0, the nth bit data of the check information is 1;
when the nth bit data of the first firmware data is 1 and the nth bit data of the second firmware data is 1, the nth bit data of the check information is 0.
7. The firmware self-recovery device of claim 2, wherein the controller is further configured to:
accumulate the number of times the data information of each ROM has been corrupted, and determine whether the count for a target ROM exceeds a preset threshold; if so, issue a replacement reminder for the target ROM, wherein the target ROM is any one of the ROMs.
8. The firmware self-recovery device of any one of claims 1-7, wherein there are a plurality of computing boards;
the firmware self-recovery device further includes:
a switch circuit connected to the controller and to each of the plurality of computing boards;
the controller is further configured to control, according to the boot firmware loading requirements of the plurality of computing boards, the switch circuit to open a communication link between the controller and a target computing board that needs to load boot firmware, so as to transmit the complete data information of the three ROMs to the target computing board for loading and use when the target computing board starts.
9. A firmware self-recovery method, applied to a firmware self-recovery device as claimed in any one of claims 1 to 8, comprising:
respectively acquiring data information of the first ROM, the second ROM and the third ROM;
if the data information of one ROM is corrupted, recovering the corrupted data information based on the data information of the other two ROMs and the preset verification algorithm, for loading and use by the computing board of the server system at startup.
10. A server system comprising the firmware self-recovery device according to any one of claims 1-8.
CN202110808136.5A 2021-07-16 2021-07-16 Firmware self-recovery device, method and server system Active CN113704023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110808136.5A CN113704023B (en) 2021-07-16 2021-07-16 Firmware self-recovery device, method and server system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110808136.5A CN113704023B (en) 2021-07-16 2021-07-16 Firmware self-recovery device, method and server system

Publications (2)

Publication Number Publication Date
CN113704023A CN113704023A (en) 2021-11-26
CN113704023B true CN113704023B (en) 2023-08-11

Family

ID=78648834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110808136.5A Active CN113704023B (en) 2021-07-16 2021-07-16 Firmware self-recovery device, method and server system

Country Status (1)

Country Link
CN (1) CN113704023B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11984175B2 (en) * 2022-05-25 2024-05-14 Advanced Micro Devices, Inc. Automatic mirrored ROM

Citations (1)

Publication number Priority date Publication date Assignee Title
CN110109716A (en) * 2019-05-13 2019-08-09 深圳忆联信息系统有限公司 Method, apparatus, computer device and storage medium for ensuring stable loading of SSD firmware

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7849454B2 (en) * 2006-01-13 2010-12-07 Dell Products L.P. Automatic firmware corruption recovery and update

Also Published As

Publication number Publication date
CN113704023A (en) 2021-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant