CN115048244A - Hardware repair method and system for server, computer equipment and medium - Google Patents

Publication number
CN115048244A
Authority
CN
China
Prior art keywords
hardware
expander
firmware
large system
abnormal operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210655271.5A
Other languages
Chinese (zh)
Other versions
CN115048244B (en)
Inventor
季树荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210655271.5A priority Critical patent/CN115048244B/en
Publication of CN115048244A publication Critical patent/CN115048244A/en
Application granted granted Critical
Publication of CN115048244B publication Critical patent/CN115048244B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Stored Programmes (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a hardware repair method and system for a server, a computer device, and a medium. The method comprises the following steps: dividing a firmware partition in the storage space of a nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition; detecting the running state of each piece of underlying hardware through the expander; in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully; in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander; and writing that firmware file into the abnormally running underlying hardware through the expander, then restarting the large system after the writing is finished, so as to complete the repair. With this scheme, abnormally running hardware can be repaired quickly and correctly, ensuring the normal operation of the whole server.

Description

Hardware repair method and system for server, computer equipment and medium
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method, a system, a computer device, and a medium for repairing hardware of a server.
Background
In the era of cloud computing and big data, mass data storage calls for storage products with better performance and faster transmission rates. Higher transmission rates, together with guarantees of data integrity and reliability, make the server system more complex, and that complexity means more underlying hardware must operate cooperatively. Firmware is the soul of a hardware device, and the firmware of the various devices exchanges data, is interdependent, and is highly coupled, so firmware safety is particularly important. If the firmware of some hardware was not upgraded to a valid version, or had no initial version at all when it left the factory, or if the underlying hardware runs abnormally for some reason while the storage system is operating, the information obtained by the large system will be wrong. In serious cases the whole server cannot operate normally, with unacceptable consequences.
Effective upgrading of hardware firmware in various scenarios is therefore very important. If the large system loads and runs normally while some firmware is abnormal, the repair is straightforward: the large system runs on the CPU and has a file system, so after startup it can fetch the firmware file directly from a directory of the file system and perform a normal upgrade. Such an upgrade requires that the large system start normally and that the firmware file be packaged into the large system upgrade file. After the large system starts, it checks the version number and running condition of the target hardware firmware. If the target hardware started normally but the version number of the running firmware is inconsistent with the firmware version in the upgrade package, a firmware upgrade is triggered. If the firmware started abnormally (the firmware running file is damaged or simply empty), the large system can likewise write the firmware running file from the upgrade package directly into the target hardware, thereby repairing the faulty firmware. In many cases, however, the large system has not been loaded. If underlying hardware such as a CPLD, an FPGA, or a PSU then starts abnormally or has empty firmware, there is no large system available to repair it, and the entire server may fail to operate.
Disclosure of Invention
In view of this, the present invention provides a method, a system, a computer device, and a medium for repairing hardware of a server, which solve the problem that when a large system is not loaded and bottom hardware of the server is abnormal, the abnormal bottom hardware cannot be repaired, so that the bottom hardware cannot be normally started or a running program is disordered, and even the entire server cannot normally run.
Based on the above object, an aspect of the embodiments of the present invention provides a method for repairing hardware of a server, which specifically includes the following steps:
dividing a firmware partition in a storage space of an expander nonvolatile storage device, and writing firmware files corresponding to all bottom hardware of a server into the firmware partition;
detecting the running state of each bottom layer hardware through the expander;
in response to the detection of the abnormal running state of the bottom layer hardware, judging whether the large system is loaded successfully;
reading a firmware file corresponding to the bottom hardware with abnormal operation from the firmware partition through the expander in response to the fact that the large system is not loaded successfully;
and writing the corresponding firmware file into the bottom hardware with abnormal operation through the expander, and restarting the large system after the writing is finished so as to finish the repair of the bottom hardware with abnormal operation.
In some embodiments, partitioning the firmware partition in the memory space of the expander nonvolatile memory device comprises:
and dividing a firmware partition and a temporary storage partition in the storage space of the expander nonvolatile storage device.
In some embodiments, after determining whether the large system is successfully loaded in response to detecting the running state exception of the underlying hardware, the method further comprises:
responding to the successful loading of the large system, acquiring a firmware file corresponding to the bottom hardware with abnormal operation through the large system, sending the firmware file acquired through the large system to an expander, and executing the following steps based on the expander:
storing the firmware file acquired by the large system in a temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition;
and driving the JTAG of the bottom hardware with abnormal operation to write the firmware file acquired by the large system into the bottom hardware with abnormal operation, and restarting the large system after the writing is finished.
In some embodiments, detecting the operational state of each underlying hardware by the expander comprises:
and periodically inquiring heartbeat information of each bottom layer hardware through the expander so as to detect the running state of each bottom layer hardware.
In some embodiments, detecting the operational state of each underlying hardware by the expander comprises:
and reading a register of the bottom hardware through the expander to confirm the running state of the bottom hardware, and synchronizing the running state of the bottom hardware to an upper-layer large system.
In some embodiments, writing the corresponding firmware file to the underlying hardware of the running exception via the expander comprises:
and driving the JTAG of the bottom hardware with abnormal operation through the expander so as to write the corresponding firmware file into the bottom hardware with abnormal operation.
In some embodiments, the non-volatile memory device includes any one of: FLASH and NVRAM.
In another aspect of the embodiments of the present invention, a hardware repair system for a server is further provided, including:
the writing module is configured to divide firmware partitions in the storage space of the expander nonvolatile storage device and write firmware files corresponding to all bottom hardware of the server into the firmware partitions;
the detection module is configured to detect the running state of each bottom layer hardware through the expander;
the judging module is configured to respond to the detection of the abnormal running state of the bottom layer hardware and judge whether the large system is loaded successfully;
the reading module is configured to respond to the fact that the large system is not loaded successfully, and read the firmware files corresponding to the bottom layer hardware with abnormal operation from the firmware partitions through the expander;
and the repairing module is configured to write the corresponding firmware file into the bottom hardware with abnormal operation through the expander, and restart the large system after the writing is finished so as to finish repairing the bottom hardware with abnormal operation.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the computer program when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that implements the above method steps when executed by a processor.
The invention has at least the following beneficial technical effects: dividing firmware partitions in a storage space of an expander nonvolatile storage device, and writing firmware files corresponding to all bottom hardware of a server into the firmware partitions; detecting the running state of each bottom layer hardware through the expander; in response to the detection of the abnormal running state of the bottom layer hardware, judging whether the large system is loaded successfully; reading a firmware file corresponding to the bottom hardware with abnormal operation from the firmware partition through the expander in response to the fact that the large system is not loaded successfully; the corresponding firmware file is written into the bottom hardware with abnormal operation through the expander, and the large system is restarted after the writing is finished so as to finish the repair of the bottom hardware with abnormal operation, so that the hardware with abnormal operation can be quickly and correctly repaired, and the normal operation of the whole server is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a block diagram of an embodiment of a hardware repair method for a server according to the present invention;
FIG. 2 is a flowchart of another embodiment of a hardware repair method for a server according to the present invention;
FIG. 3 is a diagram illustrating an embodiment of a hardware repair system for a server according to the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of a computer device provided in the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities or two parameters that share the same name but are not identical. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
In view of the above object, a first aspect of the embodiments of the present invention provides an embodiment of a hardware repair method for a server. As shown in fig. 1, it includes the following steps:
S10, dividing a firmware partition in the storage space of the expander nonvolatile storage device, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
S20, detecting the running state of each piece of underlying hardware through the expander;
S30, in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully;
S40, in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
S50, writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is finished so as to complete the repair of the abnormally running underlying hardware.
The expander sits between the large system and the underlying hardware (such as a CPLD, an FPGA, or a PSU) and acts as a collection and relay station for information. Because the expander has no file system, an area is set aside in its nonvolatile storage device (such as FLASH or NVRAM) to store the firmware files of the underlying hardware. When the large system is not loaded and the underlying hardware runs abnormally, the expander can then write the corresponding firmware file from that area directly into the underlying hardware to repair it, restoring normal operation of the whole server.
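The flow of steps S10 to S50 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the patented implementation: the `Device` and `Expander` classes, the `healthy` flag, and the `jtag_write` method are hypothetical names, and real flash and JTAG access is replaced by plain Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for underlying hardware such as a CPLD."""
    name: str
    firmware: bytes = b""
    healthy: bool = True

    def jtag_write(self, image: bytes) -> None:
        # Stand-in for driving the device's JTAG port (assumption).
        self.firmware = image
        self.healthy = True


@dataclass
class Expander:
    """Hypothetical expander model: a firmware partition, no file system."""
    firmware_partition: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def provision(self, images: dict) -> None:
        # S10: pre-store one firmware image per device.
        self.firmware_partition.update(images)

    def repair(self, devices: list, large_system_loaded: bool) -> list:
        repaired = []
        for dev in devices:
            if dev.healthy:                    # S20: check the running state
                continue
            if large_system_loaded:            # S30: the other branch, not modeled here
                self.log.append(f"{dev.name}: defer to large system")
                continue
            image = self.firmware_partition[dev.name]   # S40: read from the partition
            dev.jtag_write(image)                        # S50: write to the device
            repaired.append(dev.name)
        if repaired:
            self.log.append("restart large system")     # S50: restart after writing
        return repaired


expander = Expander()
expander.provision({"CPLD": b"cpld-v1", "FPGA": b"fpga-v1"})
cpld = Device("CPLD", healthy=False)       # empty firmware, abnormal start
fpga = Device("FPGA", firmware=b"fpga-v1")
repaired_names = expander.repair([cpld, fpga], large_system_loaded=False)
```

Only devices reported as abnormal are touched, and the restart is requested once after all writes rather than per device; the source does not specify that detail, so it is a design choice of this sketch.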
Here, the large system refers to the software adapted on the Linux system running on the server CPU, which presents the information of the server to the client.
Fig. 2 is a flowchart illustrating repair of the hardware of the server. The specific process is as follows:
the expander detects the running state of each piece of underlying hardware; suppose the CPLD is detected to be running abnormally;
judging whether the large system has been loaded successfully;
if the large system has not been loaded successfully, executing the following steps based on the expander:
reading the firmware file of the CPLD prestored in the firmware partition;
driving the JTAG of the CPLD to write the firmware file into the CPLD, and restarting the large system after the writing is finished.
The following three application scenarios illustrate expander-based hardware repair when the large system has not been loaded successfully.
1) Research and development debugging stage: during research, development, and debugging, cooperation among the firmware of multiple hardware devices is at an early stage and adaptation of the large system is not complete. Anomalies in data acquisition, data format, and data interaction occur easily and can cause some firmware to run abnormally or even hang. Upgrading and repairing the abnormal firmware directly through the expander reduces the number of times the firmware must be burned manually, saving research and development time and improving efficiency.
2) Testing and production stage: during testing and production debugging, no large system is present. Testers and production line workers are not product developers; their skill levels differ and their debugging methods vary, so they can trigger abnormal scenarios that have extremely low probability and are extremely difficult to locate. If the firmware cannot start normally, the available remedies are limited and the problem is typically solved by replacing the chip, which is inefficient. Upgrading and repairing the abnormal firmware directly through the expander can improve testing and production efficiency.
3) Customer site stage: the product delivered to the customer site often does not carry the final firmware version, and the large system may not yet be adapted. If the underlying firmware starts abnormally at this point, crude methods such as directly replacing a chip or the firmware FLASH cannot be used at the customer site. Upgrading and repairing the abnormal firmware directly through the expander reduces the customer's awareness of the anomaly, which can greatly improve customer experience and benefit the company's reputation.
In the embodiment of the invention, the firmware partitions are divided in the storage space of the expander nonvolatile storage device, and the firmware files corresponding to all the bottom hardware of the server are written into the firmware partitions; detecting the running state of each bottom layer hardware through the expander; in response to the detection of the abnormal running state of the bottom layer hardware, judging whether the large system is loaded successfully; reading a firmware file corresponding to the bottom hardware with abnormal operation from the firmware partition through the expander in response to the fact that the large system is not loaded successfully; the corresponding firmware file is written into the bottom hardware with abnormal operation through the expander, and the large system is restarted after the writing is finished so as to finish the repair of the bottom hardware with abnormal operation, so that the hardware with abnormal operation can be quickly and correctly repaired, and the normal operation of the whole server is ensured.
In some embodiments, partitioning the firmware partition in the memory space of the expander nonvolatile memory device comprises:
and dividing a firmware partition and a temporary storage partition in the storage space of the expander nonvolatile storage device.
Specifically, the firmware partition prestores the firmware files of the underlying hardware, so that when the large system is not loaded, the expander can read the firmware file of the abnormally running underlying hardware from the firmware partition and write it into that hardware. The temporary storage partition is used when the large system has been loaded successfully and started normally but underlying hardware is abnormal: it temporarily holds the firmware file that the large system read from its file system directory, and the temporarily stored firmware file is cleared after it has been written into the abnormal underlying hardware.
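One way to picture the two partitions on a device without a file system is a raw byte array with fixed offsets. The sketch below is an assumption for illustration only: the slot size, the device-to-slot map, and the 4-byte length header are not described in the source, and a `bytearray` stands in for the FLASH or NVRAM.

```python
import struct

SLOT_SIZE = 64                                       # assumed bytes per slot (tiny, for illustration)
FIRMWARE_SLOTS = {"CPLD": 0, "FPGA": 1, "PSU": 2}    # assumed device-to-slot map
FIRMWARE_BASE = 0                                    # firmware partition starts here
TEMP_BASE = SLOT_SIZE * len(FIRMWARE_SLOTS)          # temporary partition follows it
FLASH_SIZE = TEMP_BASE + SLOT_SIZE
HEADER = struct.Struct("<I")                         # 4-byte little-endian image length


def write_slot(flash: bytearray, base: int, image: bytes) -> None:
    """Store a length-prefixed image at a raw offset (no file system needed)."""
    if HEADER.size + len(image) > SLOT_SIZE:
        raise ValueError("image does not fit in slot")
    flash[base:base + HEADER.size] = HEADER.pack(len(image))
    start = base + HEADER.size
    flash[start:start + len(image)] = image


def read_slot(flash: bytearray, base: int) -> bytes:
    """Read back the length-prefixed image stored at a raw offset."""
    (length,) = HEADER.unpack_from(flash, base)
    start = base + HEADER.size
    return bytes(flash[start:start + length])


def firmware_offset(device: str) -> int:
    """Offset of a device's slot inside the firmware partition."""
    return FIRMWARE_BASE + FIRMWARE_SLOTS[device] * SLOT_SIZE


def clear_temp(flash: bytearray) -> None:
    """Erase the temporary partition once the staged image has been written out."""
    flash[TEMP_BASE:TEMP_BASE + SLOT_SIZE] = bytes(SLOT_SIZE)


flash = bytearray(FLASH_SIZE)
write_slot(flash, firmware_offset("CPLD"), b"cpld-image")    # prestored image
write_slot(flash, TEMP_BASE, b"staged-by-large-system")      # image staged by the large system
```

Calling `clear_temp(flash)` after a repair empties only the temporary partition, so the prestored images in the firmware partition remain available for the case where the large system is not loaded.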
In some embodiments, after determining whether the large system is successfully loaded in response to detecting the running state exception of the underlying hardware, the method further comprises:
responding to the successful loading of the large system, acquiring a firmware file corresponding to the bottom hardware with abnormal operation through the large system, sending the firmware file acquired through the large system to the expander, and executing the following steps based on the expander:
storing the firmware file acquired by the large system in a temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition;
and driving the JTAG of the bottom hardware with abnormal operation to write the firmware file acquired by the large system into the bottom hardware with abnormal operation, and restarting the large system after the writing is finished.
The hardware repair process when the large system has been loaded successfully is described below with reference to FIG. 2. The specific process is as follows:
the expander detects the running state of each bottom layer hardware, and supposes that the CPLD is detected to be abnormal in running;
judging whether the large system is loaded successfully or not;
if the large system is loaded successfully, executing the following steps based on the large system:
acquiring and decompressing a firmware file;
sending the decompressed firmware file to the expander, and executing the following steps based on the expander after the firmware file is sent to the expander:
storing the decompressed firmware file into a temporary storage partition;
reading the decompressed firmware file from the temporary storage partition, driving the JTAG of the CPLD to write it into the CPLD, and restarting the large system after the writing is finished.
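The sequence for the case where the large system is loaded can be sketched as follows. This is a minimal illustration under assumptions: the source says only that the firmware file is decompressed, so zlib is used here as a placeholder format, and `ExpanderStub`, `receive`, and `flash_from_temp` are hypothetical names with the JTAG write reduced to a dictionary update.

```python
import zlib


class ExpanderStub:
    """Hypothetical expander: a temporary partition plus a JTAG write hook."""

    def __init__(self) -> None:
        self.temp_partition = b""
        self.written = {}            # device name -> image written over JTAG

    def receive(self, image: bytes) -> None:
        # Store the image sent by the large system in the temporary partition.
        self.temp_partition = image

    def flash_from_temp(self, device: str) -> None:
        # Read the staged image, write it to the device, then clear the staging area.
        image = self.temp_partition
        self.written[device] = image     # stand-in for driving the device's JTAG
        self.temp_partition = b""


def repair_with_large_system(package: bytes, device: str, expander: ExpanderStub) -> str:
    image = zlib.decompress(package)     # large system: acquire and decompress (zlib assumed)
    expander.receive(image)              # large system sends the image to the expander
    expander.flash_from_temp(device)     # expander: read from staging, write, clear
    return "restart large system"        # restart once the writing has finished


package = zlib.compress(b"cpld-v2")
stub = ExpanderStub()
action = repair_with_large_system(package, "CPLD", stub)
```

The expander still performs the actual write in this branch; the large system only supplies the image, which matches the division of work described above.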
The embodiment of the invention can be applied to the storage system of a server. When the firmware of some underlying hardware in the storage system is abnormal and the large system has been loaded, the large system can read the firmware file corresponding to that hardware directly from the file system and then upgrade the firmware of the abnormal hardware through the expander, thereby repairing the faulty hardware. If the large system has not been loaded successfully, the expander acts as a temporary processor. The firmware files of the hardware in the server that the expander can monitor and has an interactive link with (such as the CPLD, FPGA, and PSU) are prestored in the firmware partition of the expander's nonvolatile storage device. When some hardware of the storage system is abnormal and the large system has not been loaded successfully, logic set in the expander firmware reads the prestored firmware file directly from the firmware partition and upgrades the faulty hardware. The faulty firmware is thus repaired quickly and correctly, ensuring normal operation of the storage system and normal loading of the large system.
In some embodiments, detecting the operational state of each underlying hardware by the expander comprises:
and periodically inquiring heartbeat information of each bottom layer hardware through the expander so as to detect the running state of each bottom layer hardware.
In some embodiments, detecting the operational state of each underlying hardware by the expander comprises:
and reading a register of the bottom hardware through the expander to confirm the running state of the bottom hardware, and synchronizing the running state of the bottom hardware to an upper-layer large system.
In a specific embodiment of the invention, the expander program provides, as far as possible, an interface for querying the firmware running state of each piece of underlying hardware. The running state is confirmed either by polling (that is, periodically querying) the heartbeat information of underlying hardware such as the CPLD, FPGA, and PSU, or by actually reading a register of the underlying hardware over a bus such as I2C or GPIO, and the information is synchronized to the upper-layer large system.
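Heartbeat polling of this kind can be sketched as below. The source does not define what the heartbeat information looks like, so this example assumes a counter that the device increments while it runs and treats a counter that stops advancing for several consecutive polls as abnormal. `HeartbeatMonitor`, `max_missed`, and the reader callables are hypothetical; on real hardware each reader would wrap a register read over a bus such as I2C.

```python
from typing import Callable, Dict, Optional


class HeartbeatMonitor:
    """Flags a device as abnormal when its heartbeat counter stops advancing."""

    def __init__(self, readers: Dict[str, Callable[[], int]], max_missed: int = 3) -> None:
        self.readers = readers            # device name -> function returning the counter
        self.max_missed = max_missed      # assumed tolerance before declaring a fault
        self.last: Dict[str, Optional[int]] = {name: None for name in readers}
        self.missed: Dict[str, int] = {name: 0 for name in readers}

    def poll(self) -> Dict[str, str]:
        """Run one polling round and return the state of every device."""
        states = {}
        for name, read in self.readers.items():
            try:
                value: Optional[int] = read()   # e.g. a register read over I2C
            except OSError:
                value = None                    # a bus error counts as a missed beat
            if value is None or value == self.last[name]:
                self.missed[name] += 1
            else:
                self.missed[name] = 0
            if value is not None:
                self.last[name] = value
            abnormal = self.missed[name] >= self.max_missed
            states[name] = "abnormal" if abnormal else "normal"
        return states


ticks = iter(range(100))
monitor = HeartbeatMonitor(
    {"CPLD": lambda: next(ticks),     # counter keeps advancing
     "FPGA": lambda: 7},              # counter is stuck
    max_missed=2,
)
rounds = [monitor.poll() for _ in range(3)]
```

The dictionary returned by each `poll()` call is the piece of state that would be synchronized to the upper-layer large system; in a real expander the rounds would be driven by a timer rather than a list comprehension.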
The chassis management data required by the large system is obtained by accessing the mainboard expander. The mainboard expander retrieves the data through underlying channels such as the CPLD and I2C, and then exchanges it with the large system in packaged form.
In some embodiments, writing the corresponding firmware file to the underlying hardware of the running exception via the expander comprises:
and driving the JTAG of the bottom hardware with abnormal operation through the expander so as to write the corresponding firmware file into the bottom hardware with abnormal operation.
In some embodiments, the non-volatile memory device includes any one of: FLASH and NVRAM.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a hardware repair system for a server, including:
a writing module 110, where the writing module 110 is configured to divide a firmware partition in a storage space of the expander nonvolatile storage device, and write firmware files corresponding to all bottom-layer hardware of the server into the firmware partition;
a detection module 120, wherein the detection module 120 is configured to detect the operating state of each bottom layer hardware through the expander;
a determining module 130, wherein the determining module 130 is configured to determine whether the large system is successfully loaded in response to detecting that the running state of the underlying hardware is abnormal;
the reading module 140, the reading module 140 being configured to, in response to the large system not being loaded successfully, read, by the expander, a firmware file corresponding to the bottom layer hardware with the abnormal operation from the firmware partition;
and the repairing module 150 is configured to write the corresponding firmware file into the bottom hardware with abnormal operation through the expander, and restart the large system after the writing is completed so as to complete the repairing of the bottom hardware with abnormal operation.
Based on the same inventive concept, according to another aspect of the present invention, as shown in FIG. 4, an embodiment of the present invention further provides a computer device 30. The computer device 30 comprises a processor 310 and a memory 320; the memory 320 stores a computer program 321 that can run on the processor, and the processor 310 executes the program to perform the following method steps:
Dividing a firmware partition in a storage space of an expander nonvolatile storage device, and writing firmware files corresponding to all bottom hardware of a server into the firmware partition;
detecting the running state of each bottom layer hardware through the expander;
in response to the detection of the abnormal running state of the bottom layer hardware, judging whether the large system is loaded successfully;
reading a firmware file corresponding to the bottom hardware with abnormal operation from the firmware partition through the expander in response to the fact that the large system is not loaded successfully;
and writing the corresponding firmware file into the bottom hardware with abnormal operation through the expander, and restarting the large system after the writing is finished so as to finish the repair of the bottom hardware with abnormal operation.
The memory, as a non-volatile computer-readable storage medium, may be used to store a non-volatile software program, a non-volatile computer-executable program, and modules, such as program instructions/modules corresponding to the hardware repair method of the server in this embodiment of the present application. The processor executes various functional applications and data processing of the device by running the nonvolatile software programs, instructions and modules stored in the memory, that is, the hardware repair method of the server implementing the above method embodiments.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In some embodiments, partitioning the firmware partition in the memory space of the expander nonvolatile memory device comprises:
and dividing a firmware partition and a temporary storage partition in the storage space of the expander nonvolatile storage device.
In some embodiments, after determining, in response to detecting an abnormal running state of underlying hardware, whether the large system is successfully loaded, the method further comprises:
in response to the large system being successfully loaded, acquiring, through the large system, the firmware file corresponding to the underlying hardware with abnormal operation, sending that firmware file to the expander, and executing the following steps on the expander:
storing the firmware file acquired by the large system in the temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition; and
driving the JTAG of the underlying hardware with abnormal operation to write the firmware file acquired by the large system into that hardware, and restarting the large system after the writing is finished.
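The sequence for the case in which the large system has loaded successfully can be sketched as follows. This is a simulation under stated assumptions, not the patented implementation: the `Expander` class, the callables `fetch_firmware`, `jtag_write`, and `restart`, and the in-memory temporary partition are all hypothetical stand-ins for real hardware interfaces.

```python
from typing import Callable


class Expander:
    """Hypothetical model of an expander with a temporary storage partition."""

    def __init__(self, temp_size: int) -> None:
        self._temp = bytearray(temp_size)  # stands in for the temporary partition
        self._length = 0                   # number of valid staged bytes

    def stage(self, firmware: bytes) -> None:
        # Store the firmware file received from the large system.
        if len(firmware) > len(self._temp):
            raise ValueError("firmware file does not fit in the temporary partition")
        self._temp[: len(firmware)] = firmware
        self._length = len(firmware)

    def read_staged(self) -> bytes:
        # Read the staged firmware file back from the temporary partition.
        return bytes(self._temp[: self._length])


def repair_via_large_system(
    expander: Expander,
    fetch_firmware: Callable[[str], bytes],
    jtag_write: Callable[[str, bytes], None],
    restart: Callable[[], None],
    device_id: str,
) -> None:
    firmware = fetch_firmware(device_id)  # large system acquires the file
    expander.stage(firmware)              # store in the temporary partition
    staged = expander.read_staged()       # read it back from the partition
    jtag_write(device_id, staged)         # drive JTAG to write the firmware
    restart()                             # restart the large system afterwards
```

Passing the hardware operations in as callables keeps the ordering of the steps visible while leaving the actual JTAG and restart mechanisms unspecified.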
In some embodiments, detecting the running state of each underlying hardware through the expander comprises:
periodically querying the heartbeat information of each underlying hardware through the expander to detect the running state of each underlying hardware.
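Periodic heartbeat querying might look like the sketch below. It assumes, purely for illustration, that each piece of underlying hardware exposes a heartbeat counter that advances while the hardware is healthy; the embodiment does not define the heartbeat format. The names `find_stalled`, `poll_heartbeats`, and `read_heartbeat` are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List


def find_stalled(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return the devices whose heartbeat counter did not advance between polls."""
    return sorted(dev for dev, count in current.items() if previous.get(dev) == count)


def poll_heartbeats(
    read_heartbeat: Callable[[str], int],
    device_ids: Iterable[str],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Query every device once per interval and collect the stalled ones."""
    devices = list(device_ids)
    previous: Dict[str, int] = {}
    abnormal = set()
    for _ in range(rounds):
        current = {dev: read_heartbeat(dev) for dev in devices}
        abnormal.update(find_stalled(previous, current))
        previous = current
        sleep(interval_s)
    return sorted(abnormal)
```

A device is only flagged from the second poll onward, because a stalled counter can be recognized only by comparing two consecutive readings.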
In some embodiments, detecting the running state of each underlying hardware through the expander comprises:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to the upper-layer large system.
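A register-based check can be sketched in a few lines. The register address, the meaning of bit 0, and the callables `read_register` and `notify_large_system` are assumptions for illustration; the actual register layout depends on the hardware and is not given in the embodiment.

```python
from typing import Callable

STATUS_OK_BIT = 0x01  # hypothetical: bit 0 set means the hardware runs normally


def check_and_sync(
    read_register: Callable[[str, int], int],
    notify_large_system: Callable[[str, bool], None],
    device_id: str,
    status_addr: int,
) -> bool:
    """Read the status register, decode the running state, and report it upward."""
    value = read_register(device_id, status_addr)
    healthy = bool(value & STATUS_OK_BIT)
    notify_large_system(device_id, healthy)  # synchronize state to the large system
    return healthy
```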
In some embodiments, writing the corresponding firmware file into the underlying hardware with abnormal operation through the expander comprises:
driving, through the expander, the JTAG of the underlying hardware with abnormal operation so as to write the corresponding firmware file into that hardware.
In some embodiments, the nonvolatile storage device includes any one of: FLASH or NVRAM.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 5, an embodiment of the present invention further provides a computer-readable storage medium 40, where the computer-readable storage medium 40 stores a computer program 410, which when executed by a processor, performs the above method.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments corresponding thereto.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A hardware repair method for a server, comprising:
dividing a firmware partition in the storage space of a nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
detecting the running state of each underlying hardware through the expander;
in response to detecting an abnormal running state of underlying hardware, determining whether a large system is successfully loaded;
in response to the large system not being successfully loaded, reading, through the expander, the firmware file corresponding to the underlying hardware with abnormal operation from the firmware partition; and
writing the corresponding firmware file into the underlying hardware with abnormal operation through the expander, and restarting the large system after the writing is finished, so as to complete the repair of the underlying hardware with abnormal operation.
2. The method of claim 1, wherein dividing the firmware partition in the storage space of the nonvolatile storage device of the expander comprises:
dividing the storage space of the nonvolatile storage device of the expander into the firmware partition and a temporary storage partition.
3. The method of claim 2, further comprising, after determining, in response to detecting an abnormal running state of underlying hardware, whether the large system is successfully loaded:
in response to the large system being successfully loaded, acquiring, through the large system, the firmware file corresponding to the underlying hardware with abnormal operation, sending that firmware file to the expander, and executing the following steps on the expander:
storing the firmware file acquired by the large system in the temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition; and
driving the JTAG of the underlying hardware with abnormal operation to write the firmware file acquired by the large system into that hardware, and restarting the large system after the writing is finished.
4. The method of claim 1, wherein detecting the running state of each underlying hardware through the expander comprises:
periodically querying the heartbeat information of each underlying hardware through the expander to detect the running state of each underlying hardware.
5. The method of claim 1, wherein detecting the running state of each underlying hardware through the expander comprises:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to an upper-layer large system.
6. The method of claim 1, wherein writing the corresponding firmware file into the underlying hardware with abnormal operation through the expander comprises:
driving, through the expander, the JTAG of the underlying hardware with abnormal operation so as to write the corresponding firmware file into that hardware.
7. The method of claim 1, wherein the non-volatile storage device comprises any one of: FLASH and NVRAM.
8. A hardware repair system for a server, comprising:
a writing module configured to divide a firmware partition in the storage space of a nonvolatile storage device of an expander and to write the firmware files corresponding to all underlying hardware of the server into the firmware partition;
a detection module configured to detect the running state of each underlying hardware through the expander;
a judging module configured to determine, in response to detecting an abnormal running state of underlying hardware, whether a large system is successfully loaded;
a reading module configured to read, through the expander and in response to the large system not being successfully loaded, the firmware file corresponding to the underlying hardware with abnormal operation from the firmware partition; and
a repairing module configured to write the corresponding firmware file into the underlying hardware with abnormal operation through the expander, and to restart the large system after the writing is finished, so as to complete the repair of the underlying hardware with abnormal operation.
9. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210655271.5A 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server Active CN115048244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210655271.5A CN115048244B (en) 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server

Publications (2)

Publication Number Publication Date
CN115048244A true CN115048244A (en) 2022-09-13
CN115048244B CN115048244B (en) 2024-06-07

Family

ID=83160479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210655271.5A Active CN115048244B (en) 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server

Country Status (1)

Country Link
CN (1) CN115048244B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140115386A1 (en) * 2012-10-24 2014-04-24 Hon Hai Precision Industry Co., Ltd. Server and method for managing server
US20200201568A1 (en) * 2018-12-20 2020-06-25 Micron Technology, Inc. Exception handling based on responses to memory requests in a memory subsystem
CN112230939A (en) * 2020-09-01 2021-01-15 西安广和通无线软件有限公司 Hardware module repairing method and device, computer equipment and storage medium
CN113448760A (en) * 2021-06-05 2021-09-28 山东英信计算机技术有限公司 Method, system, equipment and medium for recovering abnormal state of hard disk

Also Published As

Publication number Publication date
CN115048244B (en) 2024-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant