CN115048244B - Hardware repairing method, system, computer equipment and medium of server - Google Patents

Publication number
CN115048244B
Authority
CN
China
Prior art keywords
hardware
expander
firmware
large system
abnormal operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210655271.5A
Other languages
Chinese (zh)
Other versions
CN115048244A (en)
Inventor
季树荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210655271.5A
Publication of CN115048244A
Application granted
Publication of CN115048244B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Stored Programmes (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a hardware repair method, system, computer device and medium for a server. The method comprises the following steps: dividing a firmware partition in the storage space of the nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition; detecting the running state of each piece of underlying hardware through the expander; in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully; in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander; and writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware. With this scheme, abnormally running hardware can be repaired quickly and correctly, ensuring the normal operation of the whole server.

Description

Hardware repairing method, system, computer equipment and medium of server
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method, a system, a computer device, and a medium for repairing hardware of a server.
Background
In the era of cloud computing and big data, mass data storage requires storage products with better performance and higher transmission rates, while the integrity and reliability of the data must still be guaranteed. Server systems have therefore become more complex, and a more complex system means that more underlying hardware must cooperate. Firmware is the soul of a hardware device, and the firmware of the various devices exchanges data, is interdependent and is highly coupled, so the safety of the firmware is particularly important. If the firmware of some piece of hardware has not been upgraded to a valid version, or has no initial version at all when leaving the factory, or if the underlying hardware runs abnormally for some reason while the storage system is operating, the large system obtains erroneous information; in serious cases the whole server cannot operate normally, which is unacceptable.
Effective upgrading of hardware firmware is therefore very important in many scenarios. If the large system has loaded and runs normally when the firmware becomes abnormal, the repair is straightforward: the large system runs on the CPU and has a file system, so once it has started it can take the firmware file directly from a directory of the file system and perform a normal upgrade. This approach requires the large system to start normally and the firmware file to be packaged into the large-system upgrade file. After the large system starts, it checks the version number and running condition of the firmware of the target hardware. If the target hardware starts normally but the version number of the running firmware is inconsistent with the firmware version in the upgrade package, a firmware upgrade is triggered. If the firmware fails to start normally (the firmware running file is damaged or empty), the large system can likewise write the firmware running file from the upgrade package directly into the target hardware, thereby repairing the faulty firmware. In many cases, however, the large system is not loaded. If underlying hardware such as a CPLD, FPGA or PSU then starts abnormally or its firmware is empty, there is no large system available to repair it, and the entire server may fail to operate.
Disclosure of Invention
In view of this, the invention provides a hardware repair method, system, computer device and medium for a server, which solve the problem that, when underlying hardware of the server becomes abnormal while the large system is not loaded, the abnormal hardware cannot be repaired, so that it cannot start normally or its running program becomes disordered, and the whole server may even fail to operate normally.
Based on the above object, one aspect of the embodiments of the present invention provides a hardware repair method for a server, which specifically includes the following steps:
dividing a firmware partition in the storage space of the nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
detecting the running state of each piece of underlying hardware through the expander;
in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully;
in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
and writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
In some embodiments, dividing a firmware partition in the storage space of the nonvolatile storage device of the expander includes:
dividing a firmware partition and a temporary storage partition in the storage space of the nonvolatile storage device of the expander.
In some embodiments, after judging whether the large system has been loaded successfully in response to detecting that the running state of a piece of underlying hardware is abnormal, the method further includes:
in response to the large system having been loaded successfully, acquiring the firmware file corresponding to the abnormally running underlying hardware through the large system, sending the acquired firmware file to the expander, and executing the following steps on the expander:
storing the firmware file acquired by the large system in the temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition;
and driving the JTAG interface of the abnormally running underlying hardware to write the firmware file acquired by the large system into that hardware, and restarting the large system after the writing is completed.
In some embodiments, detecting the running state of each piece of underlying hardware through the expander includes:
periodically querying the heartbeat information of each piece of underlying hardware through the expander to detect its running state.
In some embodiments, detecting the running state of each piece of underlying hardware through the expander includes:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to the upper-layer large system.
In some embodiments, writing the corresponding firmware file into the abnormally running underlying hardware through the expander includes:
driving, through the expander, the JTAG interface of the abnormally running underlying hardware to write the corresponding firmware file into that hardware.
In some embodiments, the nonvolatile storage device is either of the following: FLASH or NVRAM.
In another aspect of the embodiments of the present invention, there is also provided a hardware repair system for a server, including:
a writing module configured to divide a firmware partition in the storage space of the nonvolatile storage device of an expander, and to write the firmware files corresponding to all underlying hardware of the server into the firmware partition;
a detection module configured to detect the running state of each piece of underlying hardware through the expander;
a judging module configured to judge, in response to detecting that the running state of a piece of underlying hardware is abnormal, whether the large system has been loaded successfully;
a reading module configured to read, in response to the large system not having been loaded successfully, the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
and a repair module configured to write the corresponding firmware file into the abnormally running underlying hardware through the expander, and to restart the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
In yet another aspect of the embodiment of the present invention, there is also provided a computer apparatus, including: at least one processor; and a memory storing a computer program executable on the processor, which when executed by the processor, performs the steps of the method as above.
In yet another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps as described above.
The invention has at least the following beneficial technical effects: a firmware partition is divided in the storage space of the nonvolatile storage device of the expander, and the firmware files corresponding to all underlying hardware of the server are written into the firmware partition; the running state of each piece of underlying hardware is detected through the expander; in response to detecting that the running state of a piece of underlying hardware is abnormal, whether the large system has been loaded successfully is judged; in response to the large system not having been loaded successfully, the firmware file corresponding to the abnormally running underlying hardware is read from the firmware partition through the expander; and the corresponding firmware file is written into the abnormally running underlying hardware through the expander, and the large system is restarted after the writing is completed, so as to complete the repair. Abnormally running hardware can thus be repaired quickly and correctly, ensuring the normal operation of the whole server.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of an embodiment of a method for repairing hardware of a server according to the present invention;
FIG. 2 is a flowchart of a method for repairing hardware of a server according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a hardware repair system of a server according to the present invention;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used only to distinguish two entities or parameters that share a name but are not the same. They are used merely for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
Based on the above object, a first aspect of the embodiments of the present invention provides an embodiment of a hardware repair method for a server. As shown in FIG. 1, it includes the following steps:
S10, dividing a firmware partition in the storage space of the nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
S20, detecting the running state of each piece of underlying hardware through the expander;
S30, in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully;
S40, in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
S50, writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
The expander runs between the large system and the underlying hardware (such as a CPLD, FPGA or PSU) and acts as an information collection and transfer station, but it has no file system. A region is therefore set aside in the nonvolatile storage device of the expander (such as FLASH or NVRAM) to store the firmware files of the underlying hardware. When the large system is not loaded and a piece of underlying hardware runs abnormally, the expander can read the corresponding firmware file from this region and write it directly into the underlying hardware, repairing the hardware and restoring normal operation of the whole server.
Here, the large system refers to a system adapted on Linux and running on the server CPU, which presents the server's information to the client.
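Because the expander has no file system, the firmware images in the firmware partition have to be located by some fixed layout rather than by file name. The patent does not specify such a layout; the following is a minimal sketch under assumed conventions (a one-byte entry count, a directory of fixed-size entries holding a name, offset, length and CRC32, followed by the raw images). The names `ENTRY`, `build_firmware_partition` and `read_firmware` are hypothetical.

```python
import struct
import zlib

# Hypothetical directory entry (not from the patent):
# 8-byte device name, image offset, image length, CRC32 of the image.
ENTRY = struct.Struct("<8sIII")


def build_firmware_partition(images):
    """Pack {device name: firmware bytes} into one raw partition image."""
    header_size = 1 + ENTRY.size * len(images)
    directory = b""
    payload = b""
    for name, data in images.items():
        offset = header_size + len(payload)
        directory += ENTRY.pack(name.encode(), offset, len(data), zlib.crc32(data))
        payload += data
    return bytes([len(images)]) + directory + payload


def read_firmware(partition, device):
    """Look up one device's firmware image and check it against its CRC32."""
    count = partition[0]
    for i in range(count):
        name, offset, length, crc = ENTRY.unpack_from(partition, 1 + i * ENTRY.size)
        if name.rstrip(b"\x00").decode() == device:
            data = partition[offset:offset + length]
            if zlib.crc32(data) != crc:
                raise ValueError("firmware image for %s is corrupted" % device)
            return data
    raise KeyError(device)
```

The CRC check is an added safeguard for illustration only; the patent itself does not describe integrity checking of the stored images.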
FIG. 2 shows a flowchart for repairing the hardware of a server. The specific flow is as follows:
the expander detects the running state of each piece of underlying hardware; assume that a CPLD running abnormality is detected at this point;
whether the large system has been loaded successfully is judged;
if the large system has not been loaded successfully, the following steps are executed on the expander:
reading the CPLD firmware file pre-stored in the firmware partition;
driving the JTAG interface of the CPLD to write the firmware file into the CPLD, and restarting the large system after the writing is completed.
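The branch taken when the large system has not been loaded can be sketched as a small decision function. This is an illustrative model only: the function name, the dictionary standing in for the firmware partition, and the `jtag_write` and `restart_large_system` callbacks are all hypothetical stand-ins for expander firmware routines that the patent does not name.

```python
from typing import Callable, Dict


def repair_without_large_system(
    device: str,
    large_system_loaded: bool,
    firmware_partition: Dict[str, bytes],
    jtag_write: Callable[[str, bytes], None],
    restart_large_system: Callable[[], None],
) -> bool:
    """Repair `device` from the firmware partition if the large system is absent.

    Returns True when the expander-only repair path was taken.
    """
    if large_system_loaded:
        # The large system is available, so this path does not apply.
        return False
    image = firmware_partition[device]   # read the pre-stored firmware file
    jtag_write(device, image)            # write it into the abnormal hardware
    restart_large_system()               # restart once the write has completed
    return True
```

Passing the hardware operations in as callbacks keeps the decision logic separate from the bus-level details, which differ between CPLD, FPGA and PSU targets.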
The application scenarios for expander-based hardware repair when the large system is not loaded successfully are described below from three aspects.
1) Development and debugging stage: during development and debugging, the firmware of the various hardware devices is still at an early stage of integration and the large-system adaptation is not complete, so abnormalities in data acquisition, data format or data interaction easily cause some firmware to run abnormally or even hang. Having the expander update and repair the abnormal firmware directly reduces the number of times the abnormal firmware must be burned manually, saves development time and improves development efficiency.
2) Testing and production stage: during testing and production debugging, no large system is present. Because test and production-line staff are not the product developers, their skill levels vary and their debugging methods differ, so rare abnormal scenarios that are extremely hard to locate can be triggered. If firmware then cannot start normally, the available means are limited and the problem is usually solved by replacing the chip, which is inefficient. Having the expander update and repair the abnormal firmware directly improves testing and production efficiency.
3) Customer site stage: a product sent to a customer site often does not carry the final firmware version, and the large system may not yet be adapted. If underlying firmware starts abnormally, the crude approach of directly replacing a chip or the firmware FLASH cannot be used at a customer site. Having the expander update and repair the abnormal firmware directly reduces the customer's awareness of the abnormality, greatly improves the customer experience and benefits the company's reputation.
According to the embodiment of the invention, a firmware partition is divided in the storage space of the nonvolatile storage device of the expander, and the firmware files corresponding to all underlying hardware of the server are written into the firmware partition; the running state of each piece of underlying hardware is detected through the expander; in response to detecting that the running state of a piece of underlying hardware is abnormal, whether the large system has been loaded successfully is judged; in response to the large system not having been loaded successfully, the firmware file corresponding to the abnormally running underlying hardware is read from the firmware partition through the expander; and the corresponding firmware file is written into the abnormally running underlying hardware through the expander, and the large system is restarted after the writing is completed, so as to complete the repair. Abnormally running hardware can thus be repaired quickly and correctly, ensuring the normal operation of the whole server.
In some embodiments, dividing a firmware partition in the storage space of the nonvolatile storage device of the expander includes:
dividing a firmware partition and a temporary storage partition in the storage space of the nonvolatile storage device of the expander.
Specifically, the firmware partition is used to pre-store the firmware files of the underlying hardware, so that when the large system is not loaded the expander can read the firmware file of the abnormally running underlying hardware from the firmware partition and write it into that hardware. The temporary storage partition is used, when the large system has loaded successfully and started normally, to temporarily hold the firmware file that the large system reads from its file system directory; the temporarily stored firmware file is cleared after it has been written into the abnormal underlying hardware.
In some embodiments, after judging whether the large system has been loaded successfully in response to detecting that the running state of a piece of underlying hardware is abnormal, the method further includes:
in response to the large system having been loaded successfully, acquiring the firmware file corresponding to the abnormally running underlying hardware through the large system, sending the acquired firmware file to the expander, and executing the following steps on the expander:
storing the firmware file acquired by the large system in the temporary storage partition;
reading the firmware file acquired by the large system from the temporary storage partition;
and driving the JTAG interface of the abnormally running underlying hardware to write the firmware file acquired by the large system into that hardware, and restarting the large system after the writing is completed.
The hardware repair process when the large system has been loaded successfully is described with reference to FIG. 2. The specific flow is as follows:
the expander detects the running state of each piece of underlying hardware; assume that a CPLD running abnormality is detected at this point;
whether the large system has been loaded successfully is judged;
if the large system has been loaded successfully, the following steps are executed on the large system:
obtaining the firmware file and decompressing it;
transmitting the decompressed firmware file to the expander, after which the following steps are executed on the expander:
storing the decompressed firmware file in the temporary storage partition;
reading the decompressed firmware file from the temporary storage partition, driving the JTAG interface of the CPLD to write the firmware file into the CPLD, and restarting the large system after the writing is completed.
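The large-system path above can be modelled in a few lines to make the role of the temporary storage partition concrete. This is a sketch under stated assumptions: the `Expander` class and `large_system_repair` function are hypothetical, the JTAG write and the restart are simulated by recording state, and zlib is assumed as the compression format because the patent does not name one.

```python
import zlib


class Expander:
    """Toy model of the expander side of the large-system repair path."""

    def __init__(self):
        self.temp_partition = None  # temporary storage partition (empty)
        self.written = {}           # device -> image written (simulated JTAG)
        self.restarts = 0           # number of large-system restarts requested

    def receive(self, image):
        # Store the firmware file sent by the large system.
        self.temp_partition = image

    def flash_from_temp(self, device):
        # Read the staged image back, write it, clear the staging area, restart.
        image = self.temp_partition
        if image is None:
            raise RuntimeError("no firmware staged in the temporary partition")
        self.written[device] = image
        self.temp_partition = None
        self.restarts += 1


def large_system_repair(expander, device, package):
    """Large-system side: decompress the firmware file and hand it over."""
    image = zlib.decompress(package)  # assumed compression format
    expander.receive(image)
    expander.flash_from_temp(device)
```

Clearing the staged image after the write follows the description of the temporary storage partition given earlier in this section.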
The embodiment of the invention can be used in the storage system of a server. When the firmware of a piece of underlying hardware in the storage system is abnormal and the large system has been loaded, the large system can directly read the firmware file corresponding to that hardware from its file system and then update the firmware of the abnormal hardware through the expander, thereby repairing the faulty hardware. If the large system has not been loaded successfully, the expander acts as a temporary processor. The firmware files of the hardware that the expander can monitor and with which it has an interaction link in the server (for example a CPLD, FPGA or PSU) are pre-stored in the firmware partition of the nonvolatile storage device of the expander. When a piece of hardware in the storage system is abnormal and the large system has not been loaded successfully, logic set in the expander firmware directly reads the pre-stored firmware file from the firmware partition and upgrades the faulty hardware, so that the faulty firmware is repaired quickly and correctly and the normal operation of the storage system and the normal loading of the large system are ensured.
In some embodiments, detecting the running state of each piece of underlying hardware through the expander includes:
periodically querying the heartbeat information of each piece of underlying hardware through the expander to detect its running state.
In some embodiments, detecting the running state of each piece of underlying hardware through the expander includes:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to the upper-layer large system.
In a specific embodiment of the present invention, the expander program provides interfaces for querying the firmware running state of as many pieces of underlying hardware as possible. The running state of the hardware is confirmed either by periodically querying the heartbeat information of underlying hardware such as a CPLD, FPGA or PSU, or by actually reading a register of the underlying hardware over a bus such as I2C or GPIO, and this information is synchronized to the upper-layer large system.
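One common way to realise periodic heartbeat checking is to read a counter that healthy firmware keeps incrementing and to flag the device when the counter stops advancing. The patent does not describe the heartbeat format, so the sketch below assumes such a counter; the `HeartbeatMonitor` class and the `max_missed` threshold are hypothetical.

```python
class HeartbeatMonitor:
    """Flags a device as abnormal when its heartbeat counter stops advancing."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # stalled polls tolerated before flagging
        self.last = {}                # device -> last counter value seen
        self.missed = {}              # device -> consecutive stalled polls

    def poll(self, device, counter):
        """Record one poll result; return True if the device looks abnormal."""
        if self.last.get(device) == counter:
            self.missed[device] = self.missed.get(device, 0) + 1
        else:
            self.missed[device] = 0
        self.last[device] = counter
        return self.missed[device] >= self.max_missed
```

Requiring several consecutive stalled polls before reporting an abnormality avoids triggering a firmware rewrite on a single delayed reading; the appropriate threshold depends on the polling period and is not specified in the source.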
The chassis-management data needed by the large system is obtained by accessing the mainboard expander. The mainboard expander collects the data through the CPLD, I2C and other low-level protocols and then exchanges it with the large system in the form of packets. The expander therefore already serves as an information collection and processing platform and an information transfer station, and its information interaction channels with the other firmware are essentially already in place. As a result, when underlying hardware runs abnormally and the large system is not loaded, the expander can take the upgrade file directly from the firmware files stored in its nonvolatile storage device (such as FLASH) and write it into the CPLD or other hardware, thereby repairing the underlying hardware.
In some embodiments, writing the corresponding firmware file into the abnormally running underlying hardware through the expander includes:
driving, through the expander, the JTAG interface of the abnormally running underlying hardware to write the corresponding firmware file into that hardware.
In some embodiments, the nonvolatile storage device is either of the following: FLASH or NVRAM.
Based on the same inventive concept, according to another aspect of the present invention, as shown in FIG. 3, an embodiment of the present invention further provides a hardware repair system for a server, including:
a writing module 110 configured to divide a firmware partition in the storage space of the nonvolatile storage device of an expander, and to write the firmware files corresponding to all underlying hardware of the server into the firmware partition;
a detection module 120 configured to detect the running state of each piece of underlying hardware through the expander;
a judging module 130 configured to judge, in response to detecting that the running state of a piece of underlying hardware is abnormal, whether the large system has been loaded successfully;
a reading module 140 configured to read, in response to the large system not having been loaded successfully, the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
and a repair module 150 configured to write the corresponding firmware file into the abnormally running underlying hardware through the expander, and to restart the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
According to another aspect of the present invention, and based on the same inventive concept, as shown in FIG. 4, an embodiment of the present invention further provides a computer device 30, which includes a processor 310 and a memory 320; the memory 320 stores a computer program 321 executable on the processor, and the processor 310 performs the following method steps when executing the program:
dividing a firmware partition in the storage space of the nonvolatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
detecting the running state of each piece of underlying hardware through the expander;
in response to detecting that the running state of a piece of underlying hardware is abnormal, judging whether the large system has been loaded successfully;
in response to the large system not having been loaded successfully, reading the firmware file corresponding to the abnormally running underlying hardware from the firmware partition through the expander;
and writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
The memory, as a non-volatile computer readable storage medium, can be used to store non-volatile software programs, non-volatile computer executable programs and modules, such as the program instructions/modules corresponding to the hardware repair method of the server in the embodiments of the present invention. The processor executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions and modules stored in the memory, that is, it implements the hardware repair method of the server of the above method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and at least one application program required for a function, and the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and the remote memory may be connected to the local module through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In some implementations, partitioning firmware partitions in a memory space of an expander nonvolatile memory device includes:
and dividing a firmware partition and a temporary storage partition in the storage space of the expander nonvolatile storage device.
In some embodiments, after determining whether the large system has loaded successfully in response to detecting that the running state of underlying hardware is abnormal, the method further includes:
in response to the large system having loaded successfully, acquiring, through the large system, the firmware file corresponding to the abnormally running underlying hardware, sending the firmware file acquired through the large system to the expander, and performing the following steps on the expander:
storing the firmware file acquired through the large system in the temporary storage partition;
reading the firmware file acquired through the large system from the temporary storage partition; and
driving the JTAG of the abnormally running underlying hardware to write the firmware file acquired through the large system into that hardware, and restarting the large system after the writing is completed.
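The choice between the two firmware sources can be sketched as follows. This is a toy model under stated assumptions, not the patented implementation: the `Expander` class, the `repair` function, the `fetch_from_large_system` callback, and the dictionary standing in for JTAG writes are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Optional


class Expander:
    """Toy model of the expander (hypothetical; real hardware access is not shown)."""

    def __init__(self, firmware_partition: Dict[str, bytes]) -> None:
        self.firmware_partition = dict(firmware_partition)  # images written in advance
        self.temporary_partition: Dict[str, bytes] = {}     # images sent by the large system
        self.flashed: Dict[str, bytes] = {}                 # stands in for JTAG writes

    def jtag_write(self, hw_id: str, image: bytes) -> None:
        # A real expander would drive the JTAG chain of the target device here.
        self.flashed[hw_id] = image


def repair(
    expander: Expander,
    hw_id: str,
    large_system_loaded: bool,
    fetch_from_large_system: Optional[Callable[[str], bytes]] = None,
) -> str:
    """Write a firmware image to the abnormal hardware and report which source was used."""
    if large_system_loaded:
        if fetch_from_large_system is None:
            raise ValueError("a fetch callback is required when the large system is loaded")
        # Store the file obtained through the large system, then read it back.
        expander.temporary_partition[hw_id] = fetch_from_large_system(hw_id)
        image = expander.temporary_partition[hw_id]
        source = "temporary"
    else:
        # The large system is unavailable, so fall back to the pre-written image.
        image = expander.firmware_partition[hw_id]
        source = "firmware"
    expander.jtag_write(hw_id, image)
    return source  # the caller would restart the large system after this returns
```

The sketch keeps both paths behind one function so that the only difference between them is where the image comes from; the write and the restart are common to both.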
In some embodiments, detecting, by the expander, the running state of each piece of underlying hardware includes:
periodically querying, through the expander, the heartbeat information of each piece of underlying hardware to detect its running state.
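Periodic heartbeat querying could look roughly like the sketch below. The `query_heartbeat` callback, the polling interval, and the `max_misses` threshold are assumptions for illustration; the source states only that heartbeat information is queried periodically.

```python
import time
from typing import Callable, Dict, Iterable, List


def poll_once(hardware_ids: Iterable[str], query_heartbeat: Callable[[str], bool]) -> List[str]:
    """Run one polling round and return the IDs whose heartbeat query failed."""
    return [hw for hw in hardware_ids if not query_heartbeat(hw)]


def monitor(
    hardware_ids: Iterable[str],
    query_heartbeat: Callable[[str], bool],
    rounds: int,
    interval_s: float,
    max_misses: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll for a fixed number of rounds; report hardware that missed max_misses rounds in a row."""
    ids = list(hardware_ids)
    misses: Dict[str, int] = {hw: 0 for hw in ids}
    for _ in range(rounds):
        failed = set(poll_once(ids, query_heartbeat))
        for hw in ids:
            # A successful heartbeat resets the consecutive-miss counter.
            misses[hw] = misses[hw] + 1 if hw in failed else 0
        sleep(interval_s)
    return [hw for hw in ids if misses[hw] >= max_misses]
```

Requiring several consecutive misses is a design choice of this sketch, intended to avoid treating a single delayed reply as a fault.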
In some embodiments, detecting, by the expander, the running state of each piece of underlying hardware includes:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to the upper-layer large system.
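A register-based check might be sketched as below. The bit layout (`READY_BIT`, `FAULT_BIT`), the `read_register` callback, and the `report` callback used to synchronize state to the large system are all hypothetical; the source does not describe the register format or the synchronization channel.

```python
from typing import Callable, Dict, Iterable

READY_BIT = 0x01  # hypothetical: set when the device has finished initializing
FAULT_BIT = 0x02  # hypothetical: set when the device reports an internal fault


def decode_status(reg_value: int) -> str:
    """Map a raw status register value to 'normal' or 'abnormal'."""
    if (reg_value & FAULT_BIT) or not (reg_value & READY_BIT):
        return "abnormal"
    return "normal"


def check_and_sync(
    hardware_ids: Iterable[str],
    read_register: Callable[[str], int],
    report: Callable[[str, str], None],
) -> Dict[str, str]:
    """Read each device's status register and report the decoded state upward."""
    states = {hw: decode_status(read_register(hw)) for hw in hardware_ids}
    for hw, state in states.items():
        report(hw, state)  # stands in for synchronizing to the upper-layer large system
    return states
```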
In some embodiments, writing, by the expander, the corresponding firmware file to the abnormally running underlying hardware includes:
driving, through the expander, the JTAG of the abnormally running underlying hardware to write the corresponding firmware file into that hardware.
In some embodiments, the non-volatile storage device includes any one of the following: FLASH or NVRAM.
According to another aspect of the present invention, as shown in fig. 5, there is also provided a computer-readable storage medium 40, the computer-readable storage medium 40 storing a computer program 410 which, when executed by a processor, performs the above method.
Finally, it should be noted that, as those skilled in the art will appreciate, all or part of the procedures of the method embodiments described above may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the corresponding method embodiments.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the relative merits of the embodiments. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. made to the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A method for repairing hardware of a server, comprising:
dividing a firmware partition in the storage space of a non-volatile storage device of an expander, and writing the firmware files corresponding to all underlying hardware of the server into the firmware partition;
detecting the running state of each piece of underlying hardware through the expander;
in response to detecting that the running state of underlying hardware is abnormal, determining whether the large system has loaded successfully;
in response to the large system not having loaded successfully, reading, through the expander, the firmware file corresponding to the abnormally running underlying hardware from the firmware partition; and
writing the corresponding firmware file into the abnormally running underlying hardware through the expander, and restarting the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
2. The method of claim 1, wherein dividing the firmware partition in the storage space of the non-volatile storage device of the expander comprises:
dividing the storage space of the non-volatile storage device of the expander into a firmware partition and a temporary storage partition.
3. The method of claim 2, further comprising, after determining whether the large system has loaded successfully in response to detecting that the running state of underlying hardware is abnormal:
in response to the large system having loaded successfully, acquiring, through the large system, the firmware file corresponding to the abnormally running underlying hardware, sending the firmware file acquired through the large system to the expander, and performing the following steps on the expander:
storing the firmware file acquired through the large system in the temporary storage partition;
reading the firmware file acquired through the large system from the temporary storage partition; and
driving the JTAG of the abnormally running underlying hardware to write the firmware file acquired through the large system into that hardware, and restarting the large system after the writing is completed.
4. The method of claim 1, wherein detecting, by the expander, the running state of each piece of underlying hardware comprises:
periodically querying, through the expander, the heartbeat information of each piece of underlying hardware to detect its running state.
5. The method of claim 1, wherein detecting, by the expander, the running state of each piece of underlying hardware comprises:
reading a register of the underlying hardware through the expander to confirm the running state of the underlying hardware, and synchronizing that running state to the upper-layer large system.
6. The method of claim 1, wherein writing the corresponding firmware file to the abnormally running underlying hardware by the expander comprises:
driving, through the expander, the JTAG of the abnormally running underlying hardware to write the corresponding firmware file into that hardware.
7. The method of claim 1, wherein the non-volatile storage device comprises any one of: FLASH or NVRAM.
8. A hardware repair system for a server, comprising:
a writing module configured to divide a firmware partition in the storage space of a non-volatile storage device of an expander and to write the firmware files corresponding to all underlying hardware of the server into the firmware partition;
a detection module configured to detect the running state of each piece of underlying hardware through the expander;
a judging module configured to determine, in response to detecting that the running state of underlying hardware is abnormal, whether the large system has loaded successfully;
a reading module configured to read, through the expander and in response to the large system not having loaded successfully, the firmware file corresponding to the abnormally running underlying hardware from the firmware partition; and
a repair module configured to write the corresponding firmware file into the abnormally running underlying hardware through the expander and to restart the large system after the writing is completed, so as to complete the repair of the abnormally running underlying hardware.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor performs the steps of the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1 to 7.
CN202210655271.5A 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server Active CN115048244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210655271.5A CN115048244B (en) 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210655271.5A CN115048244B (en) 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server

Publications (2)

Publication Number Publication Date
CN115048244A CN115048244A (en) 2022-09-13
CN115048244B CN115048244B (en) 2024-06-07

Family

ID=83160479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210655271.5A Active CN115048244B (en) 2022-06-10 2022-06-10 Hardware repairing method, system, computer equipment and medium of server

Country Status (1)

Country Link
CN (1) CN115048244B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230939A (en) * 2020-09-01 2021-01-15 西安广和通无线软件有限公司 Hardware module repairing method and device, computer equipment and storage medium
CN113448760A (en) * 2021-06-05 2021-09-28 山东英信计算机技术有限公司 Method, system, equipment and medium for recovering abnormal state of hard disk

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201417536A (en) * 2012-10-24 2014-05-01 Hon Hai Prec Ind Co Ltd Method and system for automatically managing servers
US20200201568A1 (en) * 2018-12-20 2020-06-25 Micron Technology, Inc. Exception handling based on responses to memory requests in a memory subsystem

Also Published As

Publication number Publication date
CN115048244A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN111290918B (en) Server running state monitoring method and device and computer readable storage medium
CN107660289B (en) Automatic network control
KR101712172B1 (en) The preliminary diagnosis and analysis and recovery system of computer error, and method thereof
CN110225078B (en) Application service updating method, system and terminal equipment
CN112099825B (en) Method, device, equipment and storage medium for upgrading component
CN114884796B (en) Fault processing method and device, electronic equipment and storage medium
CN111897697B (en) Server hardware fault repairing method and device
CN115994044B (en) Database fault processing method and device based on monitoring service and distributed cluster
WO2024148857A1 (en) Method and apparatus for filtering root cause of server fault, and non-volatile readable storage medium and electronic apparatus
CN114528350B (en) Cluster brain fracture processing method, device, equipment and readable storage medium
CN112068935A (en) Method, device and equipment for monitoring deployment of kubernets program
CN113703823A (en) BMC (baseboard management controller) firmware upgrading method and device, electronic equipment and storage medium
CN115048244B (en) Hardware repairing method, system, computer equipment and medium of server
CN111124724B (en) Node fault testing method and device of distributed block storage system
CN110968456B (en) Method and device for processing fault disk in distributed storage system
CN113778763B (en) Intelligent switching method and system for three-way interface service faults
CN107273291B (en) Processor debugging method and system
CN115269252A (en) Application program fault processing method, device, equipment and storage medium
CN105677515A (en) Online backup method and system for database
CN111611142A (en) Information collection method, device and storage medium
CN114978891B (en) Processing method, device and storage medium for BIOS configuration of network device
CN113656208B (en) Data processing method, device, equipment and storage medium of distributed storage system
US20240160506A1 (en) Operation support apparatus, system, method, and computer-readable medium
CN118626210A (en) Task execution method, device, computer equipment and storage medium
CN117439863A (en) Alarm processing method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant