CN112445640A - Server downtime fault positioning and isolating system and method - Google Patents

Publication number
CN112445640A
Authority
CN
China
Prior art keywords
loaded
equipment
loading
current
fru
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011116419.5A
Other languages
Chinese (zh)
Inventor
叶明洋
王鹏
张敏
付水论
杨德晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011116419.5A
Publication of CN112445640A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a server downtime fault positioning and isolating system. A BIOS acquires the information stored in an FRU mounted under a BMC (baseboard management controller) and compares the current device to be loaded with the information, stored in the FRU and updated in real time, on all loading devices that have not failed. If that information includes the current device to be loaded, the BIOS loads it. The BMC obtains from a timing module the loading time of the device the BIOS is currently loading and judges from that loading time whether the server system is down. If the server system is down, the BMC removes the information of the current device to be loaded from the FRU, thereby locating and isolating the faulty device.

Description

Server downtime fault positioning and isolating system and method
Technical Field
The invention relates to the field of server faults, in particular to a system and a method for positioning and isolating a server downtime fault.
Background
With the development of information technology, server configurations have become increasingly rich and can meet a wide range of requirements. Because servers typically run critical application software, the reliability requirements on the system are very high.
However, as server configurations grow richer, a wide variety of devices can be attached to the system, which steadily increases both the complexity of the service applications and the probability of system instability. When a server goes down in a machine room, the operation of the service applications is seriously affected.
In current designs, downtime must be judged and confirmed manually. To locate the faulty device, an engineer must reproduce the phenomenon and verify it repeatedly, relying on personal experience. This consumes a large amount of time and labor and hinders efficient judgment and location of server downtime faults.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a system and a method for positioning and isolating server downtime faults. They avoid the large amount of time and labor consumed by manual judgment and verification and improve the efficiency of judging and locating server downtime faults.
A first aspect of the invention provides a server downtime fault positioning and isolating system, comprising a BIOS, a BMC, an FRU, a PCH and a timing module. The FRU is mounted on the BMC and stores information, updated in real time, on all loading devices that have not failed. The BIOS is communicatively connected to the BMC through the PCH, acquires the information stored in the FRU mounted under the BMC, and compares the current device to be loaded with that information. If the information includes the current device to be loaded, the BIOS loads it; if not, the BIOS moves on to the next device to be loaded. The BMC is communicatively connected to the timing module, obtains from it the loading time of the device the BIOS is currently loading, and judges from that loading time whether the server system is down. If the server system is down, the BMC removes the information of the current device to be loaded from the FRU, thereby locating and isolating the faulty device.
Optionally, the information of the loading device includes a loading time preset threshold corresponding to each loading device that has not failed.
Further, judging whether the server system is down according to the loading time of the current device to be loaded specifically comprises:
judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold corresponding to that device; if so, the server system is down, and if not, the server system is not down.
Optionally, the timing module is a CPLD.
The second aspect of the present invention provides a method for locating and isolating a server downtime fault, which is implemented based on the system for locating and isolating a server downtime fault according to the first aspect of the present invention, and comprises:
after the system is powered on, the BIOS acquires all the real-time updated loading equipment information which is stored in the FRU mounted under the BMC and is not in fault, compares the current equipment to be loaded with all the real-time updated loading equipment information which is stored in the FRU and is not in fault, and if all the real-time updated loading equipment information which is stored in the FRU and is not in fault comprises the current equipment to be loaded, the BIOS loads the current equipment to be loaded;
the BMC acquires the loading time of the BIOS current device to be loaded in the timing module, and judges whether the server system is down according to the loading time of the current device to be loaded; if the server system is down, the BMC removes the current equipment information to be loaded in the FRU to realize fault equipment isolation;
and if all the updated information of the loading equipment which is not failed in real time and stored by the FRU does not comprise the current equipment to be loaded, the BIOS continues to load the next equipment to be loaded.
Optionally, the method further comprises: starting the server system once all the devices to be loaded have finished loading and no downtime has occurred during loading.
Optionally, after the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device, the method further comprises:
the BMC sets prompt information and records the current downtime phenomenon and the current device to be loaded.
Optionally, judging whether the server system is down according to the loading time of the current device to be loaded specifically comprises:
judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold corresponding to that device; if so, the server system is down, and if not, the server system is not down.
Optionally, the timing module records the loading time of the current device to be loaded, and after the recording of the loading time of the current device to be loaded is completed, the timing module is cleared to zero to record the loading time of the next device to be loaded.
Further, the BIOS loads the devices to be loaded in sequence according to a stored HOB list, and the total number of devices to be loaded stored in the HOB list is not less than the number of all non-failed loading devices stored in the FRU after real-time updating.
The technical scheme adopted by the invention comprises the following technical effects:
1. The invention avoids the large amount of time and labor consumed by manual judgment and verification, automatically locates and isolates the device being loaded when the server goes down, and improves the efficiency of judging and locating server downtime faults.
2. In the technical solution of the invention, the loading device information includes a preset loading-time threshold for each loading device, and downtime is judged by whether the loading time of the current device to be loaded exceeds that device's preset threshold. Because each device is judged against its own threshold, different thresholds can be set according to the actual condition of each device, which makes downtime judgment more flexible across different devices.
3. In the technical solution of the invention, after the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device, it sets prompt information recording the current downtime phenomenon and the current device to be loaded. This facilitates locating the faulty device and later testing and analysis of the downtime, and avoids having to reproduce the downtime phenomenon many times.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the system according to the first embodiment of the invention;
FIG. 2 is a schematic flow diagram of the method according to the second embodiment of the invention;
FIG. 3 is a schematic flow diagram of the method according to the third embodiment of the invention;
FIG. 4 is a schematic flow diagram of the method according to the fourth embodiment of the invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example one
As shown in FIG. 1, the invention provides a server downtime fault positioning and isolating system, which includes a BIOS 1, a BMC 2, an FRU 3, a PCH 4 and a timing module 5. The FRU 3 is mounted on the BMC 2 and stores information, updated in real time, on all loading devices that have not failed. The BIOS 1 is communicatively connected to the BMC 2 through the PCH 4, acquires the information stored in the FRU 3 mounted under the BMC 2, and compares the current device to be loaded with that information. If the information includes the current device to be loaded, the BIOS 1 loads it; if not, the BIOS 1 moves on to the next device to be loaded. The BMC 2 is communicatively connected to the timing module 5, obtains from it the loading time of the device the BIOS 1 is currently loading, and judges from that loading time whether the server system is down. If the server system is down, the BMC 2 removes the information of the current device to be loaded from the FRU 3, thereby locating and isolating the faulty device.
The BIOS 1 (Basic Input Output System) and the BMC 2 (Baseboard Management Controller) are communicatively connected through the PCH 4 (an integrated south bridge). Specifically, the BIOS 1 and the PCH 4 are connected through a Serial Peripheral Interface (SPI) bus, the PCH 4 is communicatively connected to the BMC 2 through an LPC (Low Pin Count) bus, and the BMC 2 controls the enable signal (FLASH_CS) of the BIOS 1 flash through the PCH 4. The BMC 2 is communicatively connected to the FRU 3 (Field Replaceable Unit) and the timing module 5 via an I2C bus.
The loading device information includes a preset loading-time threshold corresponding to each loading device that has not failed. Downtime is judged by whether the loading time of the current device to be loaded exceeds that device's preset threshold. Because each device is judged against its own threshold, different thresholds can be set according to the actual condition of each device, which makes downtime judgment more flexible across different devices. Specifically, the preset loading-time threshold of each device to be loaded may be read from the real-time updated thresholds stored in the FRU 3. The information on all non-failed loading devices in the FRU 3 may be stored as a data list or a database, for example as entries pairing the name of each non-failed loading device with the preset loading-time threshold corresponding to that device … …
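As a minimal sketch of the data list described above, the FRU contents can be modeled as a mapping from device name to preset loading-time threshold. The device names, the threshold values, and the helper `lookup_threshold` are illustrative assumptions and do not appear in the patent.

```python
from typing import Dict, Optional

# Hypothetical FRU contents: device name -> preset loading-time threshold (seconds).
# Names and values are assumptions made for illustration only.
fru_table: Dict[str, float] = {
    "raid_card": 8.0,
    "nic_0": 3.0,
    "gpu_0": 12.0,
}


def lookup_threshold(table: Dict[str, float], device: str) -> Optional[float]:
    """Return the device's threshold, or None when the device is not listed,
    i.e. it is not recorded as a non-failed loading device."""
    return table.get(device)
```

A missing entry is what causes the BIOS to move on to the next device, so the lookup deliberately returns `None` instead of raising.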
Judging whether the server system is down according to the loading time of the current device to be loaded is specifically as follows: judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold of that device; if so, the server system is down, and if not, the server system is not down.
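The judgment above reduces to a strict comparison. The following sketch assumes times expressed in seconds; the function name `is_server_down` is hypothetical.

```python
def is_server_down(load_time_s: float, threshold_s: float) -> bool:
    """Judge downtime: True only when the measured loading time is strictly
    greater than the device's preset loading-time threshold."""
    return load_time_s > threshold_s
```

A loading time exactly equal to the threshold is treated as not down, since the text requires the loading time to be greater than the threshold.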
Specifically, the timing module 5 may be a CPLD (Complex Programmable Logic Device) that records the loading time of the device to be loaded. After the loading time of the current device has been recorded, the timing module 5 is cleared to zero so that it can record the loading time of the next device to be loaded.
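The record-then-clear behaviour of the timing module can be sketched in software as follows. This is only a stand-in for the CPLD: the class `LoadTimer`, its methods, and the injectable `clock` parameter are assumptions introduced for illustration.

```python
import time
from typing import Callable, Optional


class LoadTimer:
    """Software stand-in for the timing module (a CPLD in the patent)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None

    def start(self) -> None:
        """Begin timing the device that is currently being loaded."""
        self._start = self._clock()

    def elapsed(self) -> float:
        """Loading time so far; 0.0 when the timer has been cleared."""
        if self._start is None:
            return 0.0
        return self._clock() - self._start

    def clear(self) -> None:
        """Clear to zero so the next device starts from a fresh count."""
        self._start = None
```

Injecting the clock keeps the sketch testable without real delays; on hardware the BMC would instead read the counter over the I2C bus.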
Further, if the server system is down, the BMC 2 removes the information of the current device to be loaded from the FRU 3 to locate and isolate the faulty device, and the BMC 2 sets prompt information recording the current downtime phenomenon and the current device to be loaded. The record may, for example, take the form of a log; the invention is not limited in this respect.
The BIOS 1 loads the devices to be loaded in sequence according to a stored HOB list (a list giving the order in which the BIOS loads the devices to be loaded). The total number of devices to be loaded stored in the HOB list is not less than the number of real-time updated loading devices stored in the FRU 3, that is, the number of all non-failed loading devices recorded in the FRU 3 after real-time updating.
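The counting constraint between the HOB list and the FRU can be checked with a one-line comparison. The function name `hob_covers_fru` and the device names in the usage are hypothetical.

```python
from typing import Iterable, Sequence


def hob_covers_fru(hob_list: Sequence[str], fru_devices: Iterable[str]) -> bool:
    """Check that the HOB list holds at least as many devices to be loaded
    as there are non-failed loading devices recorded in the FRU."""
    return len(hob_list) >= len(set(fru_devices))
```

For example, `hob_covers_fru(["nic_0", "gpu_0", "raid_card"], ["nic_0", "gpu_0"])` holds, because an isolated device stays in the HOB list while its FRU entry is gone.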
The invention avoids the large amount of time and labor consumed by manual judgment and verification, automatically locates and isolates the device being loaded when the server goes down, and improves the efficiency of judging and locating server downtime faults.
In the technical solution of the invention, the loading device information includes a preset loading-time threshold for each loading device, and downtime is judged by whether the loading time of the current device to be loaded exceeds that device's preset threshold. Because each device is judged against its own threshold, different thresholds can be set according to the actual condition of each device, which makes downtime judgment more flexible across different devices.
Example two
As shown in FIG. 2, the technical solution of the invention further provides a method for locating and isolating a server downtime fault, which is implemented based on the first embodiment of the invention and includes:
S1, after the system is powered on, the BIOS acquires the real-time updated information on all non-failed loading devices stored in the FRU mounted under the BMC, and compares the current device to be loaded with that information;
S2, judging whether the information stored in the FRU includes the current device to be loaded; if yes, executing step S3; if no, executing step S5;
S3, the BIOS loads the current device to be loaded;
S4, the BMC obtains from the timing module the loading time of the device the BIOS is currently loading, and judges from that loading time whether the server system is down; if yes, executing step S6; if no, executing step S5;
S5, the BIOS continues with the next device to be loaded;
S6, the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device.
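Steps S1 to S6 can be sketched as a single pass over the HOB list. This is a simplified model under stated assumptions: `run_boot_sequence` and `measure_load_time` are hypothetical names, the FRU is modeled as a dictionary of thresholds, and the pass simply continues after a detected downtime, whereas a real server would hang and need a restart.

```python
from typing import Callable, Dict, List, Tuple


def run_boot_sequence(
    hob_list: List[str],
    fru_table: Dict[str, float],
    measure_load_time: Callable[[str], float],
) -> Tuple[List[str], List[str]]:
    """Walk the HOB list once and return (loaded, isolated) device names."""
    loaded: List[str] = []
    isolated: List[str] = []
    for device in hob_list:                    # S5: take the next device
        if device not in fru_table:            # S1/S2: not listed as non-failed
            continue
        load_time = measure_load_time(device)  # S3: load; timing module reading
        if load_time > fru_table[device]:      # S4: downtime judged
            del fru_table[device]              # S6: BMC removes the FRU entry
            isolated.append(device)
        else:
            loaded.append(device)
    return loaded, isolated
```

Because the isolated device no longer appears in `fru_table`, a later pass skips it at the S2 check, which is the isolation effect the method relies on.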
In step S1, after the system is powered on, the BIOS communicates with the BMC through the PCH to obtain all the non-failure loaded device information stored in the FRU mounted under the BMC after being updated in real time, and compares the current device to be loaded with all the non-failure loaded device information stored in the FRU after being updated in real time; the loading device information includes a loading time preset threshold corresponding to each loading device that has not failed.
In step S4, judging whether the server system is down according to the loading time of the current device to be loaded is specifically as follows: judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold corresponding to that device; if so, the server system is down, and if not, the server system is not down. The preset loading-time threshold corresponding to a device to be loaded can be adjusted according to the actual condition of each loading device, such as its type, so that different thresholds can be set for different devices and downtime judgment becomes more flexible. Specifically, the preset loading-time threshold of each device to be loaded may be read from the real-time updated thresholds stored in the FRU. The information on all non-failed loading devices in the FRU may be stored as a data list or a database, for example as entries pairing the name of each non-failed loading device with the preset loading-time threshold corresponding to that device … …
The timing module records the loading time of the current device to be loaded. After the recording is finished, the timing module is cleared to zero so that it can record the loading time of the next device to be loaded.
Specifically, the timing module may be a CPLD or another type of timing module, and may be chosen flexibly in practical applications; the invention is not limited in this respect.
The BIOS loads the devices to be loaded in sequence according to a stored HOB list (a list giving the order in which the BIOS loads the devices to be loaded). The total number of devices to be loaded stored in the HOB list is not less than the number of real-time updated loading devices stored in the FRU, that is, the number of all non-failed loading devices recorded in the FRU after real-time updating.
The invention avoids the large amount of time and labor consumed by manual judgment and verification, automatically locates and isolates the device being loaded when the server goes down, and improves the efficiency of judging and locating server downtime faults.
In the technical solution of the invention, the loading device information includes a preset loading-time threshold for each loading device, and downtime is judged by whether the loading time of the current device to be loaded exceeds that device's preset threshold. Because each device is judged against its own threshold, different thresholds can be set according to the actual condition of each device, which makes downtime judgment more flexible across different devices.
Example three
As shown in FIG. 3, the technical solution of the invention further provides a method for locating and isolating a server downtime fault, which is implemented based on the first embodiment of the invention and includes:
S1, after the system is powered on, the BIOS acquires the real-time updated information on all non-failed loading devices stored in the FRU mounted under the BMC, and compares the current device to be loaded with that information;
S2, judging whether the information stored in the FRU includes the current device to be loaded; if yes, executing step S3; if no, executing step S5;
S3, the BIOS loads the current device to be loaded;
S4, the BMC obtains from the timing module the loading time of the device the BIOS is currently loading, and judges from that loading time whether the server system is down; if yes, executing step S6; if no, executing step S5;
S5, the BIOS continues with the next device to be loaded;
S6, the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device;
S7, starting the server system once all the devices to be loaded have finished loading and no downtime has occurred during loading.
In step S7, once the BIOS has finished loading all the devices to be loaded and the server system has not gone down, the server system starts normally.
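One way to read step S7 is that loading is repeated until a complete pass finishes without downtime, at which point the server starts. The text does not spell out what happens between a detected downtime and the next loading attempt, so the sketch below assumes the pass is simply restarted after the faulty entry is removed. The names `boot_until_clean`, `measure_load_time` and `max_passes` are hypothetical.

```python
from typing import Callable, Dict, List


def boot_until_clean(
    hob_list: List[str],
    fru_table: Dict[str, float],
    measure_load_time: Callable[[str], float],
    max_passes: int = 10,
) -> int:
    """Repeat loading passes until one finishes without downtime.

    Returns the number of passes needed before the server can start.
    """
    for attempt in range(1, max_passes + 1):
        down = False
        for device in hob_list:
            if device not in fru_table:
                continue                      # isolated or unknown: skip it
            if measure_load_time(device) > fru_table[device]:
                del fru_table[device]         # isolate, then restart the pass
                down = True
                break
        if not down:
            return attempt                    # clean pass: start the server
    raise RuntimeError("no clean loading pass within max_passes")
```

Each restart removes exactly one entry from the FRU model, so the number of passes is at most one more than the number of faulty devices.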
Example four
As shown in FIG. 4, the technical solution of the invention further provides a method for locating and isolating a server downtime fault, which is implemented based on the first embodiment of the invention and includes:
S1, after the system is powered on, the BIOS acquires the real-time updated information on all non-failed loading devices stored in the FRU mounted under the BMC, and compares the current device to be loaded with that information;
S2, judging whether the information stored in the FRU includes the current device to be loaded; if yes, executing step S3; if no, executing step S5;
S3, the BIOS loads the current device to be loaded;
S4, the BMC obtains from the timing module the loading time of the device the BIOS is currently loading, and judges from that loading time whether the server system is down; if yes, executing step S6; if no, executing step S5;
S5, the BIOS continues with the next device to be loaded;
S6, the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device;
S7, the BMC sets prompt information and records the current downtime phenomenon and the current device to be loaded;
S8, starting the server system once all the devices to be loaded have finished loading and no downtime has occurred during loading.
In the technical solution of the invention, after the BMC removes the information of the current device to be loaded from the FRU to isolate the faulty device, it sets prompt information recording the current downtime phenomenon and the current device to be loaded. This facilitates locating the faulty device and later testing and analysis of the downtime, and avoids having to reproduce the downtime phenomenon many times.
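Steps S6 and S7 can be sketched together as removing the FRU entry and appending a record of the event. The record fields, the class `DowntimeRecord`, and the function `isolate_and_record` are assumptions for illustration; the patent only says that the record may take the form of a log.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DowntimeRecord:
    """One hypothetical log entry describing a detected downtime."""
    device: str
    load_time_s: float
    threshold_s: float


def isolate_and_record(
    fru_table: Dict[str, float],
    log: List[DowntimeRecord],
    device: str,
    load_time_s: float,
) -> None:
    """S6: remove the device's entry from the FRU model.
    S7: record the downtime event and the device that was being loaded."""
    threshold = fru_table.pop(device)
    log.append(DowntimeRecord(device, load_time_s, threshold))
```

Keeping the measured time next to the threshold in each record lets an engineer see how far the device overran without reproducing the downtime.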
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (10)

1. A server downtime fault positioning and isolating system is characterized by comprising: the system comprises a BIOS, a BMC, an FRU, a PCH and a timing module, wherein the FRU is mounted on the BMC and stores all non-fault loading equipment information which is updated in real time; the BIOS is in communication connection with the BMC through the PCH, acquires FRU storage information mounted under the BMC, compares current equipment to be loaded with all real-time updated loading equipment information which is stored in the FRU and does not have faults, and loads the current equipment to be loaded if all real-time updated loading equipment information which is stored in the FRU and does not have faults comprises the current equipment to be loaded; if all the updated information of the loading equipment which is not failed in real time and stored by the FRU does not comprise the current equipment to be loaded, the BIOS continues to load the next equipment to be loaded; the BMC is in communication connection with the timing module, acquires the loading time of the BIOS current device to be loaded in the timing module, and judges whether the server system is down according to the loading time of the current device to be loaded; and if the server system is down, the BMC removes the current equipment information to be loaded in the FRU so as to realize the positioning isolation of the fault equipment.
2. The system of claim 1, wherein the information about the loading devices comprises a preset threshold value of loading time corresponding to each loading device that has not failed.
3. The server downtime fault locating and isolating system according to claim 2, wherein judging whether the server system is down according to the loading time of the current device to be loaded specifically comprises:
judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold corresponding to the current device to be loaded; if so, the server system is down; if not, the server system is not down.
4. The server downtime fault locating and isolating system according to any one of claims 1 to 3, wherein the timing module is a CPLD.
5. A method for locating and isolating a server downtime fault, implemented on the basis of the server downtime fault locating and isolating system according to any one of claims 1 to 4, comprising the following steps:
after the system is powered on, the BIOS acquires the information, updated in real time, on all non-faulty loading devices stored in the FRU mounted under the BMC, and compares the current device to be loaded with that information; if that information includes the current device to be loaded, the BIOS loads the current device to be loaded;
the BMC acquires from the timing module the time taken by the BIOS to load the current device to be loaded, and judges from that loading time whether the server system is down; if the server system is down, the BMC removes the information on the current device to be loaded from the FRU, thereby isolating the faulty device;
and if the information on all non-faulty loading devices stored in the FRU does not include the current device to be loaded, the BIOS proceeds to the next device to be loaded.
6. The method for locating and isolating the server downtime fault according to claim 5, further comprising: starting the server system once all devices to be loaded have finished loading and no downtime has occurred during loading.
7. The method for locating and isolating the server downtime fault according to claim 5, wherein, after the BMC removes the information on the current device to be loaded from the FRU to isolate the faulty device, the method further comprises:
the BMC sets prompt information and records the current downtime event and the current device to be loaded.
8. The method for locating and isolating the server downtime fault according to claim 5, wherein judging whether the server system is down according to the loading time of the current device to be loaded specifically comprises:
judging whether the loading time of the current device to be loaded is greater than the preset loading-time threshold corresponding to the current device to be loaded; if so, the server system is down; if not, the server system is not down.
9. The method for locating and isolating the server downtime fault according to any one of claims 5 to 8, wherein the timing module records the loading time of the current device to be loaded, and, once that loading time has been recorded, the timing module is reset to zero so that it can record the loading time of the next device to be loaded.
10. The method for locating and isolating the server downtime fault according to any one of claims 5 to 8, wherein the BIOS loads the devices to be loaded in sequence according to a stored HOB list, and the number of devices to be loaded stored in the HOB list is not less than the number of non-faulty loading devices, updated in real time, stored in the FRU.
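
The flow recited in claims 5 to 10 can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not the applicant's firmware: the function name `boot_once`, the dictionary `fru_thresholds` (standing in for the FRU contents and the per-device thresholds of claim 2), the callback `measure_load_time` (standing in for the timing module) and the device names are all hypothetical.

```python
def boot_once(hob_list, fru_thresholds, measure_load_time):
    """Simulate one power-on pass over the HOB list (hypothetical model).

    hob_list          -- devices in the order the BIOS would load them
    fru_thresholds    -- stand-in for the FRU: non-faulty device -> load-time threshold
    measure_load_time -- stand-in for the timing module: device -> measured load time

    Returns (loaded, skipped, faulty); faulty is the device isolated in this
    pass, or None if every listed device loaded within its threshold.
    """
    loaded, skipped = [], []
    for device in hob_list:
        threshold = fru_thresholds.get(device)
        if threshold is None:
            # Device is not recorded in the FRU: move on to the next device.
            skipped.append(device)
            continue
        # The timer is taken per device, mirroring the reset in claim 9.
        elapsed = measure_load_time(device)
        if elapsed > threshold:
            # Treated as downtime: remove the entry so the next pass skips it.
            del fru_thresholds[device]
            return loaded, skipped, device
        loaded.append(device)
    return loaded, skipped, None


# Example with made-up devices and times (all values are assumptions).
hob = ["nic", "raid", "gpu"]
fru = {"nic": 5.0, "raid": 5.0, "gpu": 5.0}
times = {"nic": 1.0, "raid": 30.0, "gpu": 2.0}

first_pass = boot_once(hob, fru, times.__getitem__)   # "raid" exceeds its threshold
second_pass = boot_once(hob, fru, times.__getitem__)  # "raid" is now skipped
```

In this sketch the first pass stops at the device whose load time exceeds its threshold and deletes its entry; the second pass skips that device and completes, which corresponds to the condition in claim 6 under which the server system is started.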
CN202011116419.5A 2020-10-19 2020-10-19 Server downtime fault positioning and isolating system and method Withdrawn CN112445640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011116419.5A CN112445640A (en) 2020-10-19 2020-10-19 Server downtime fault positioning and isolating system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011116419.5A CN112445640A (en) 2020-10-19 2020-10-19 Server downtime fault positioning and isolating system and method

Publications (1)

Publication Number Publication Date
CN112445640A true CN112445640A (en) 2021-03-05

Family

ID=74735548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011116419.5A Withdrawn CN112445640A (en) 2020-10-19 2020-10-19 Server downtime fault positioning and isolating system and method

Country Status (1)

Country Link
CN (1) CN112445640A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800025A (en) * 2018-12-13 2019-05-24 平安普惠企业管理有限公司 Page loading method, device, equipment and storage medium
CN109947586A (en) * 2019-03-20 2019-06-28 浪潮商用机器有限公司 A kind of method, apparatus and medium of isolated fault equipment
CN111526207A (en) * 2020-05-06 2020-08-11 金蝶软件(中国)有限公司 Data transmission method and related equipment

Similar Documents

Publication Publication Date Title
CN111312325B (en) BBU fault diagnosis method and device, electronic equipment and storage medium
CN111274077A (en) Disk array reliability testing method, system, terminal and storage medium
CN113378403B (en) Simulation test modeling method, system, test method, device and storage medium
CN109167701B (en) Consistency checking method, device and system for power distribution automation standardization extension communication protocol
CN114117973A (en) Logic synthesis method, device and storage medium
CN112073263A (en) Method, system, equipment and medium for testing and monitoring reliability of white box switch
CN112100085B (en) Android application program stability testing method, device and equipment
CN111597181B (en) Distributed heterogeneous data cleaning system based on visual management
CN111078476B (en) Network card drive firmware stability test method, system, terminal and storage medium
CN110990289B (en) Method and device for automatically submitting bug, electronic equipment and storage medium
CN112445640A (en) Server downtime fault positioning and isolating system and method
CN111240913A (en) Server DQS error-reporting memory batch test method and device
CN111707966A (en) CPLD electric leakage detection method and device
CN111552584B (en) Testing system, method and device for satellite primary fault diagnosis isolation and recovery function
CN104678292A (en) Test method and device for CPLD (Complex Programmable Logic Device)
CN107167675A (en) A kind of ageing testing method and device of CANBus terminals
CN116449810B (en) Fault detection method and device, electronic equipment and storage medium
CN113568842B (en) Automatic testing method and system for batch tasks
CN115065628B (en) Automatic test method and test system for fault code self-clearing of controller without sleep strategy
CN113609577B (en) Automobile electric appliance principle inspection method
CN112034296B (en) Avionics fault injection system and method
CN117112443A (en) BootLoader test method, device and system
CN107390115B (en) Method for detecting SC serial port and MC serial port of IO in raid memory in batch
CN115408268A (en) OTA small flow flash test system and method
CN116089283A (en) Monitoring test method, system, equipment and readable medium for simulating quasi-production environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210305
