CN115687026A - Multi-node server fault early warning method, device, equipment and medium - Google Patents


Info

Publication number
CN115687026A
Authority
CN
China
Legal status
Pending
Application number
CN202211190559.6A
Other languages
Chinese (zh)
Inventor
刘传旭
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211190559.6A
Publication of CN115687026A
Legal status: Pending (current)

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application discloses a multi-node server fault early warning method, device, equipment and medium, which relate to the technical field of computers and are applied to a multi-node server chassis. The method comprises the following steps: determining a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis; acquiring first fault information returned by the fault node server, and determining a preset fault type of the first fault information; performing detection corresponding to the preset fault type on the node server to be early-warned to obtain a detection result, and judging whether the detection result meets a preset fault condition; if so, recording corresponding second fault information in the node server to be early-warned, and generating corresponding warning information based on the second fault information; if not, updating the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type. Fault early warning for multi-node servers can thereby be realized.

Description

Multi-node server fault early warning method, device, equipment and medium
Technical Field
The invention relates to the technical field of computers, and in particular to a multi-node server fault early warning method, device, equipment and medium.
Background
A multi-node server typically consists of one chassis and multiple node servers. Each node server is provided with a BMC (Baseboard Management Controller) module, which is responsible for the management and control of that single node server; the chassis also has a CMC (Chassis Management Controller) module, which is responsible for centrally managing the power supplies and fans of the whole chassis and the BMC modules of the node servers.
Node servers with the same configuration are usually co-located in the same chassis, so when one node server fails, the other node servers in that chassis usually face the same failure risk.
In summary, when one node server in the chassis fails, it needs to warn the other node servers, which then check, for that type of fault, whether they have the same fault or a milder form of it, so that more serious problems are avoided. How to implement fault early warning for multi-node servers is therefore a problem to be solved in the field.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a multi-node server fault early warning method, apparatus, device and medium that can realize fault early warning for multi-node servers. The specific scheme is as follows:
In a first aspect, the present application discloses a multi-node server fault early warning method, applied to a multi-node server chassis, including:
determining a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis;
acquiring first fault information returned by the fault node server, and determining a preset fault type of the first fault information;
correspondingly detecting the node server to be early-warned based on the preset fault type to obtain a detection result, and judging whether the detection result meets a preset fault condition or not;
if the detection result meets the preset fault condition, recording corresponding second fault information in the node server to be early-warned, and generating corresponding warning information based on the second fault information; if not, updating the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
Optionally, before the first fault information returned by the fault node server is obtained, the method further includes:
detecting the fault node server through the baseboard management controller of the fault node server to obtain the first fault information.
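The four claimed steps can be sketched in Python as a single pass over the chassis. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Node` model, the `run_early_warning` function, the caller-supplied `detect` callback, and the choice of halving the check interval as the parameter update are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One node server in the chassis (hypothetical model, not defined by the patent)."""
    name: str
    fault_info: Optional[dict] = None   # first fault information, set if this node failed
    detection_interval_s: float = 60.0  # fault early warning parameter: seconds between checks
    fault_log: List[dict] = field(default_factory=list)  # recorded second fault information
    alarms: List[str] = field(default_factory=list)      # generated warning information


def run_early_warning(nodes: List[Node], detect: Callable[[Node, str], bool]) -> None:
    # S11: pick the fault node server; every other node is a node to be early-warned.
    failed = next((n for n in nodes if n.fault_info is not None), None)
    if failed is None:
        return
    to_warn = [n for n in nodes if n is not failed]

    # S12: read the preset fault type out of the first fault information.
    fault_type = failed.fault_info["type"]

    for node in to_warn:
        # S13: type-specific detection; True means the preset fault condition is met.
        if detect(node, fault_type):
            # S14, yes branch: record second fault information and generate a warning.
            node.fault_log.append({"node": node.name, "type": fault_type, "source": failed.name})
            node.alarms.append(f"{node.name}: {fault_type} fault also seen on {failed.name}")
        else:
            # S14, no branch: update an early warning parameter (here: check twice as often).
            node.detection_interval_s /= 2


nodes = [Node("A", fault_info={"type": "temperature"}), Node("B"), Node("C")]
run_early_warning(nodes, detect=lambda node, fault_type: node.name == "B")
print(nodes[1].alarms)                # ['B: temperature fault also seen on A']
print(nodes[2].detection_interval_s)  # 30.0
```

Passing `detect` in as a callback keeps the control flow of steps S11 to S14 separate from the type-specific checks, which the patent describes as differing between sensor faults and module faults.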
Optionally, the obtaining first failure information returned by the failed node server includes:
acquiring the first fault information returned by the fault node server through a preset two-wire serial bus interface or a preset network interface.
Optionally, the determining the preset fault type of the first fault information includes:
recording the first fault information, and processing it with the baseboard management controller in the node server to be early-warned to determine the preset fault type of the first fault information.
Optionally, the performing, based on the preset fault type, corresponding detection on the node server to be early-warned to obtain a detection result includes:
if the preset fault type is any one or more of a preset voltage fault type, a preset current fault type and a preset temperature fault type, detecting the corresponding position in the node server to be early-warned to obtain a detection result.
Optionally, the performing, based on the preset fault type, corresponding detection on the node server to be early-warned to obtain a detection result includes:
if the preset fault type is any one or more of a preset CPU fault type, a preset memory fault type, a preset PCIE fault type and a preset hard disk fault type, detecting the corresponding module in the node server to be early-warned to obtain a detection result.
Optionally, the updating the corresponding fault early warning parameter in the node server to be early warned based on the preset fault type includes:
updating, based on the preset fault type, any one or more of the following fault early warning parameters in the node server to be early-warned: the corresponding fault early warning threshold, the fault detection frequency and the corresponding warning level.
In a second aspect, the present application discloses a multi-node server fault early warning device, which is applied to a multi-node server chassis, and includes:
the node server determining module is used for determining a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis;
the type determining module is used for acquiring first fault information returned by the fault node server and determining a preset fault type of the first fault information;
the judging module is used for correspondingly detecting the node server to be early-warned based on the preset fault type to obtain a detection result and judging whether the detection result meets a preset fault condition or not;
the early warning module is used for recording corresponding second fault information in the node server to be early-warned if the detection result meets the preset fault condition, and generating corresponding warning information based on the second fault information; and, if not, updating the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the multi-node server fault early warning method disclosed above.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the multi-node server failure early warning method disclosed above.
Thus, in the present application, a fault node server and a node server to be early-warned are determined from all node servers of the multi-node server chassis; first fault information returned by the fault node server is acquired, and its preset fault type is determined; detection corresponding to the preset fault type is performed on the node server to be early-warned to obtain a detection result, and whether the detection result meets a preset fault condition is judged; if so, corresponding second fault information is recorded in the node server to be early-warned, and corresponding warning information is generated based on it; if not, the corresponding fault early warning parameters in the node server to be early-warned are updated based on the preset fault type. In this way, once the fault node server and the node server to be early-warned of a multi-node server chassis are determined, the preset fault type is derived from the first fault information of the fault node server, and fault early warning processing is then carried out according to that type.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a multi-node server fault early warning method disclosed in the present application;
FIG. 2 is a flow chart of a specific multi-node server failure early warning method disclosed in the present application;
FIG. 3 is a schematic structural diagram of a multi-node server failure warning apparatus disclosed in the present application;
FIG. 4 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A multi-node server typically consists of one chassis and multiple node servers. The node server is provided with a BMC module which is responsible for management control of a single node server; the chassis is also provided with a CMC module which is responsible for centralized management of the power supply, the fan and the BMC module of each node server of the whole chassis.
Node servers with the same configuration are usually co-located in the same chassis, so when one node server fails, the other node servers in that chassis usually face the same failure risk.
Accordingly, the present application provides a multi-node server fault early warning scheme that can realize fault early warning for multi-node servers.
Referring to fig. 1, an embodiment of the present application discloses a multi-node server fault early warning method, which is applied to a multi-node server chassis, and includes:
Step S11: Determine a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis.
In this embodiment, before the first fault information returned by the fault node server is obtained, the method further includes: detecting the fault node server through its baseboard management controller to obtain the first fault information. For example, suppose a multi-node server chassis contains two node servers, node server A and node server B. If the baseboard management controller of node server A detects that node server A has failed and generates first fault information, node server A is determined as the fault node server, and node server B is determined as the node server to be early-warned.
In this embodiment, obtaining the first fault information returned by the fault node server specifically includes: acquiring the first fault information through a preset two-wire serial bus interface or a preset network interface. That is, node server A may send the first fault information to the baseboard management controller of node server B through a preset two-wire serial bus interface (I²C), or through a preset network interface.
Step S12: Acquire the first fault information returned by the fault node server, and determine the preset fault type of the first fault information.
In this embodiment, determining the preset fault type of the first fault information includes: recording the first fault information, and processing it with the baseboard management controller in the node server to be early-warned to determine its preset fault type. After the baseboard management controller of node server B acquires the first fault information of node server A, it records that information and then processes it to determine the preset fault type. The first fault information can also be obtained through a preset external interface of node server B, for example a WEB (World Wide Web) interface or IPMI (Intelligent Platform Management Interface); the preset external interface corresponds to the preset two-wire serial bus interface or the preset network interface.
Step S13: Perform detection corresponding to the preset fault type on the node server to be early-warned to obtain a detection result, and judge whether the detection result meets a preset fault condition.
Here, judging whether the detection result meets the preset fault condition means determining whether the detection result indicates that the fault corresponding to the first fault information has also occurred in node server B.
Step S14: If so, record corresponding second fault information in the node server to be early-warned, and generate corresponding warning information based on the second fault information; if not, update the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
In this embodiment, if the preset fault condition is met, node server B has the fault corresponding to the first fault information; if it is not met, node server B does not currently have that fault. In the former case, corresponding second fault information is recorded, alarm information corresponding to the second fault information is generated, and the alarm information is displayed on a preset platform to inform the user that fault handling is required, so that a larger fault is avoided. In the latter case, node server B does not currently have the fault but is at an elevated risk of developing it, so the corresponding fault early warning parameters in node server B may be updated, for example by increasing the fault detection frequency and the fault threshold, that is, by raising the vigilance toward the fault corresponding to the first fault information, which serves the early warning purpose.
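As an illustration of how the BMC of node server A might produce first fault information, and how the chassis is then split into the fault node server and the nodes to be early-warned, the following sketch assumes the BMC simply compares sensor readings with fixed limits. The sensor names `P12V` and `CPU0_Temp`, the limit values, and the helpers `bmc_self_check` and `split_chassis` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sensor limits (name -> (low, high)) and sensor categories; real values
# would be configured in each node's BMC.
LIMITS: Dict[str, Tuple[float, float]] = {"P12V": (11.4, 12.6), "CPU0_Temp": (5.0, 90.0)}
SENSOR_KIND: Dict[str, str] = {"P12V": "voltage", "CPU0_Temp": "temperature"}


def bmc_self_check(node: str, readings: Dict[str, float]) -> Optional[dict]:
    """Return first fault information for the first out-of-range sensor, or None."""
    for sensor, value in readings.items():
        low, high = LIMITS[sensor]
        if not low <= value <= high:
            return {"node": node, "sensor": sensor, "type": SENSOR_KIND[sensor], "value": value}
    return None


def split_chassis(readings_by_node: Dict[str, Dict[str, float]]) -> Tuple[Optional[dict], List[str]]:
    """Step S11: find the fault node server; all other nodes become nodes to be early-warned."""
    for node, readings in readings_by_node.items():
        info = bmc_self_check(node, readings)
        if info is not None:
            return info, [other for other in readings_by_node if other != node]
    return None, []


info, to_warn = split_chassis({
    "A": {"P12V": 12.0, "CPU0_Temp": 97.0},
    "B": {"P12V": 12.1, "CPU0_Temp": 55.0},
})
print(info)     # {'node': 'A', 'sensor': 'CPU0_Temp', 'type': 'temperature', 'value': 97.0}
print(to_warn)  # ['B']
```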
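The patent does not define how the first fault information is encoded on the two-wire serial bus or the network. One possibility, shown here purely as an assumption, is a fixed 8-byte record packed with Python's `struct` module, so that the same bytes can be carried by either transport. The field layout, the type codes and the functions `encode_fault` and `decode_fault` are hypothetical, and the actual bus or socket write is left out.

```python
import struct

# Hypothetical fixed layout for first fault information (the patent does not define one):
# node id (uint8), fault type code (uint8), location id (uint16), reading in milli-units (int32),
# little-endian with no padding, 8 bytes in total.
FAULT_MSG = struct.Struct("<BBHi")
FAULT_CODES = {"voltage": 1, "current": 2, "temperature": 3,
               "cpu": 4, "memory": 5, "pcie": 6, "hard_disk": 7}
CODE_TO_TYPE = {code: name for name, code in FAULT_CODES.items()}


def encode_fault(node_id: int, fault_type: str, location: int, reading: float) -> bytes:
    """Pack first fault information into bytes that either transport can carry."""
    return FAULT_MSG.pack(node_id, FAULT_CODES[fault_type], location, round(reading * 1000))


def decode_fault(payload: bytes) -> dict:
    """Unpack bytes received by node B's BMC back into first fault information."""
    node_id, code, location, milli = FAULT_MSG.unpack(payload)
    return {"node_id": node_id, "type": CODE_TO_TYPE[code], "location": location,
            "reading": milli / 1000}


payload = encode_fault(node_id=1, fault_type="temperature", location=0x21, reading=97.5)
print(len(payload))           # 8
print(decode_fault(payload))  # {'node_id': 1, 'type': 'temperature', 'location': 33, 'reading': 97.5}
```

A fixed binary layout is used here only because it is compact and transport-neutral; a text format such as JSON would work equally well over the network path.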
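If the first fault information arrives as an IPMI-style sensor event, node server B's BMC could determine the preset fault type with a lookup table, as sketched below. The sensor type codes are taken from the IPMI v2.0 sensor type table, but mapping them onto the patent's preset fault types (in particular treating Critical Interrupt as PCIE) is an assumption, and `record_and_classify` is a hypothetical helper.

```python
from typing import List, Optional

# Assumption: the first fault information carries an IPMI sensor type code.
# The codes follow the IPMI v2.0 sensor type table; grouping 0x13 (Critical Interrupt)
# under PCIE is an illustrative choice, not something the patent specifies.
SENSOR_TYPE_TO_FAULT = {
    0x01: "temperature",
    0x02: "voltage",
    0x03: "current",
    0x07: "cpu",        # Processor
    0x0C: "memory",
    0x0D: "hard_disk",  # Drive Slot (Bay)
    0x13: "pcie",       # Critical Interrupt, e.g. PCI PERR/SERR
}

received: List[dict] = []  # node B's record of incoming first fault information


def record_and_classify(first_fault: dict) -> Optional[str]:
    """Record the first fault information, then return its preset fault type (None if unmapped)."""
    received.append(first_fault)
    return SENSOR_TYPE_TO_FAULT.get(first_fault["sensor_type"])


print(record_and_classify({"node": "A", "sensor_type": 0x0C}))  # memory
print(record_and_classify({"node": "A", "sensor_type": 0x04}))  # None (0x04 is Fan)
```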
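The patent leaves the preset fault condition open. A minimal sketch, assuming the condition is "reading outside its limits" for sensor-type faults and "at least one reported error" for module-type faults, might look like this; `DetectionResult` and `meets_fault_condition` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SENSOR_TYPES = {"voltage", "current", "temperature"}


@dataclass
class DetectionResult:
    """Hypothetical detection result from checking node B for one preset fault type."""
    fault_type: str
    reading: Optional[float] = None  # sensor-type checks: value at the same position
    error_count: int = 0             # module-type checks: errors found by BIOS/registers


def meets_fault_condition(result: DetectionResult,
                          limits: Optional[Tuple[float, float]] = None) -> bool:
    """Assumed condition: sensor out of its limits, or any module error reported."""
    if result.fault_type in SENSOR_TYPES:
        if result.reading is None or limits is None:
            return False  # nothing to compare, so the condition is not met
        low, high = limits
        return not low <= result.reading <= high
    return result.error_count > 0


print(meets_fault_condition(DetectionResult("temperature", reading=95.0), (5.0, 90.0)))  # True
```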
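For the branch where the fault is confirmed, the record-then-alarm sequence can be sketched as building a second fault record and serializing the warning information as JSON for the preset platform. The field names, the `severity` value and the functions `make_second_fault` and `make_alarm` are assumptions; the patent does not specify a message format or a platform API.

```python
import json
from datetime import datetime, timezone


def make_second_fault(node: str, fault_type: str, location: str, source_node: str) -> dict:
    """Second fault information recorded on the node to be early-warned (hypothetical fields)."""
    return {
        "node": node,
        "fault_type": fault_type,
        "location": location,
        "correlated_with": source_node,  # fault node server whose warning triggered the check
        "time": datetime.now(timezone.utc).isoformat(),
    }


def make_alarm(second_fault: dict) -> str:
    """Serialize warning information as JSON for display on a preset platform."""
    message = (f"{second_fault['node']}: {second_fault['fault_type']} fault at "
               f"{second_fault['location']}, same type as on {second_fault['correlated_with']}")
    return json.dumps({"severity": "warning", "message": message, "detail": second_fault})


print(make_alarm(make_second_fault("B", "voltage", "P12V", "A")))
```

Keeping a `correlated_with` field lets an operator see that the alarm on node B was triggered by the earlier fault on node A rather than by B's own periodic checks.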
Thus, in this embodiment, a fault node server and a node server to be early-warned are determined from all node servers of the multi-node server chassis; first fault information returned by the fault node server is acquired, and its preset fault type is determined; detection corresponding to the preset fault type is performed on the node server to be early-warned to obtain a detection result, and whether the detection result meets a preset fault condition is judged; if so, corresponding second fault information is recorded in the node server to be early-warned, and corresponding warning information is generated based on it; if not, the corresponding fault early warning parameters in the node server to be early-warned are updated based on the preset fault type. In this way, when a fault node server exists in the multi-node server chassis, the node server to be early-warned is checked for the same type of fault. If the detection result meets the preset fault condition, that is, the node server to be early-warned has the corresponding fault, corresponding warning information is generated; if not, that is, the node server to be early-warned does not currently have the corresponding fault, its fault early warning parameters can be updated to raise vigilance toward the fault corresponding to the first fault information. Fault early warning for the multi-node server is thereby realized.
Referring to fig. 2, an embodiment of the present application discloses a specific multi-node server fault early warning method, which is applied to a multi-node server chassis, and includes:
Step S21: Determine a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis.
For a more specific working process of the step S21, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Step S22: Acquire the first fault information returned by the fault node server, and determine the preset fault type of the first fault information.
For more specific working process of the step S22, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Step S23: Perform detection corresponding to the preset fault type on the node server to be early-warned to obtain a detection result, and judge whether the detection result meets a preset fault condition.
In a specific embodiment, performing detection on the node server to be early-warned based on the preset fault type to obtain a detection result specifically includes: if the preset fault type is any one or more of a preset voltage fault type, a preset current fault type and a preset temperature fault type, detecting the corresponding position in the node server to be early-warned to obtain a detection result.
In another specific embodiment, performing detection on the node server to be early-warned based on the preset fault type to obtain a detection result specifically includes: if the preset fault type is any one or more of a preset CPU fault type, a preset memory fault type, a preset PCIE fault type and a preset hard disk fault type, detecting the corresponding module in the node server to be early-warned to obtain a detection result.
Step S24: If so, record corresponding second fault information in the node server to be early-warned, and generate corresponding warning information based on the second fault information.
In this embodiment, for voltage, current and temperature faults, the baseboard management controller of node server B checks the same position as indicated by the first fault information, and records the fault if the same fault is detected. For faults of the Central Processing Unit (CPU), memory, Peripheral Component Interconnect Express (PCIE) devices, hard disks and the like, the Baseboard Management Controller (BMC) of node server B needs to notify the Basic Input Output System (BIOS) to test the relevant module, while the BMC also collects register contents to analyze possible faults; if the same fault is detected, the BMC of node server B records it.
Step S25: If not, update, based on the preset fault type, any one or more of the corresponding fault early warning threshold, fault detection frequency and warning level in the node server to be early-warned.
In this embodiment, when the preset fault type of the first fault information is any one of voltage, current and temperature and node server B does not detect a fault, the fault early warning threshold of the same position on node server B is increased, and the fault detection frequency of that position is increased. When the preset fault type is any one of CPU, memory, PCIE and hard disk and node server B does not detect a fault, the alarm level of the same position and type on node server B is raised, and the fault detection frequency of that position is increased.
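The split between BMC-side checks for sensor faults and BIOS/register checks for module faults can be sketched as a dispatch on the preset fault type. The four callbacks stand in for platform-specific operations (BMC sensor reads, a BMC-to-BIOS request, register collection) that the patent does not detail; their names and signatures are hypothetical, as is the rule that any non-zero error register counts as a detected fault.

```python
from typing import Callable, Dict, List, Tuple

SENSOR_TYPES = {"voltage", "current", "temperature"}
MODULE_TYPES = {"cpu", "memory", "pcie", "hard_disk"}


def detect_same_fault(
    fault_type: str,
    location: str,
    read_sensor: Callable[[str], float],
    limits: Dict[str, Tuple[float, float]],
    request_bios_check: Callable[[str], bool],
    read_error_registers: Callable[[str], List[int]],
) -> bool:
    """Return True if node B shows the same fault. All hooks are hypothetical stand-ins
    for platform-specific BMC sensor reads, BMC-to-BIOS requests and register dumps."""
    if fault_type in SENSOR_TYPES:
        # Voltage/current/temperature: the BMC itself checks the same position.
        low, high = limits[location]
        return not low <= read_sensor(location) <= high
    if fault_type in MODULE_TYPES:
        # CPU/memory/PCIE/hard disk: ask the BIOS to test the module and also
        # look for non-zero error registers collected by the BMC.
        bios_found = request_bios_check(fault_type)
        registers = read_error_registers(fault_type)
        return bios_found or any(reg != 0 for reg in registers)
    raise ValueError(f"unknown preset fault type: {fault_type}")
```

Usage with stub hooks, for example a memory check where one collected register is non-zero:

```python
found = detect_same_fault("memory", "DIMM_A1", lambda s: 0.0, {},
                          lambda t: False, lambda t: [0, 0x4])
print(found)  # True
```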
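The parameter updates of step S25 can be sketched as follows, assuming one `WarningParams` record per monitored position. The class, the step size, the level cap and the interval floor are illustrative assumptions. The threshold is increased by a step because the text says it is increased; whether that makes the check stricter or looser depends on how the threshold is defined, which the patent does not state.

```python
from dataclasses import dataclass

SENSOR_TYPES = {"voltage", "current", "temperature"}


@dataclass
class WarningParams:
    """Fault early warning parameters of one position on node B (hypothetical)."""
    threshold: float         # fault early warning threshold
    check_interval_s: float  # seconds between checks, i.e. 1 / detection frequency
    alarm_level: int = 1     # higher means more urgent


def update_params(params: WarningParams, fault_type: str,
                  threshold_step: float = 1.0, max_level: int = 5,
                  min_interval_s: float = 1.0) -> WarningParams:
    if fault_type in SENSOR_TYPES:
        # Voltage/current/temperature: increase the threshold of the same position.
        params.threshold += threshold_step
    else:
        # CPU/memory/PCIE/hard disk: raise the alarm level, capped at max_level.
        params.alarm_level = min(params.alarm_level + 1, max_level)
    # Both cases: double the detection frequency, with a floor on the interval.
    params.check_interval_s = max(params.check_interval_s / 2, min_interval_s)
    return params


p = update_params(WarningParams(threshold=85.0, check_interval_s=60.0), "temperature")
print(p)  # WarningParams(threshold=86.0, check_interval_s=30.0, alarm_level=1)
```

The interval floor keeps repeated warnings from driving the polling rate of node B's BMC without bound.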
Thus, when a fault node server exists, it sends an early warning to the baseboard management controllers of the other node servers in the multi-node server chassis. After receiving the first fault information, those baseboard management controllers perform a self-check based on the fault type to detect whether the corresponding fault exists. This improves the fault detection capability of the multi-node server, allows possible faults to be warned of in advance, and reduces serious faults.
Referring to fig. 3, an embodiment of the present application discloses a multi-node server fault early warning apparatus, which is applied to a multi-node server chassis, and includes:
a node server determining module 11, configured to determine a failed node server and a node server to be early-warned from all node servers of the multi-node server chassis;
the type determining module 12 is configured to obtain first fault information returned by the faulty node server, and determine a preset fault type of the first fault information;
the judging module 13 is configured to perform detection corresponding to the preset fault type on the node server to be early-warned to obtain a detection result, and to judge whether the detection result meets a preset fault condition;
the early warning module 14 is configured to: if the detection result meets the preset fault condition, record corresponding second fault information in the node server to be early-warned and generate corresponding warning information based on the second fault information; if not, update the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
Thus, with this apparatus, a fault node server and a node server to be early-warned are determined from all node servers of the multi-node server chassis; first fault information returned by the fault node server is acquired, and its preset fault type is determined; detection corresponding to the preset fault type is performed on the node server to be early-warned to obtain a detection result, and whether the detection result meets a preset fault condition is judged; if so, corresponding second fault information is recorded in the node server to be early-warned, and corresponding warning information is generated based on it; if not, the corresponding fault early warning parameters in the node server to be early-warned are updated based on the preset fault type. In this way, when a fault node server exists in the multi-node server chassis, the node server to be early-warned is checked for the same type of fault. If the detection result meets the preset fault condition, that is, the node server to be early-warned has the corresponding fault, corresponding warning information is generated; if not, that is, the node server to be early-warned does not currently have the corresponding fault, its fault early warning parameters can be updated to raise vigilance toward the fault corresponding to the first fault information. Fault early warning for the multi-node server is thereby realized.
In some embodiments, the multi-node server failure early warning apparatus includes:
a first fault information generation unit, configured to detect the fault node server through the baseboard management controller of the fault node server to obtain the first fault information.
In some embodiments, the type determining module 12 includes:
a first fault information acquisition unit, configured to acquire the first fault information returned by the fault node server through a preset two-wire serial bus interface or a preset network interface.
In some embodiments, the type determining module 12 includes:
a fault type determining unit, configured to record the first fault information and process it with the baseboard management controller in the node server to be early-warned to determine the preset fault type of the first fault information.
In some embodiments, the judging module 13 includes:
a first detection result acquisition unit, configured to detect the corresponding position in the node server to be early-warned to obtain a detection result if the preset fault type is any one or more of a preset voltage fault type, a preset current fault type and a preset temperature fault type.
In some embodiments, the judging module 13 includes:
a second detection result acquisition unit, configured to detect the corresponding module in the node server to be early-warned to obtain a detection result if the preset fault type is any one or more of a preset CPU fault type, a preset memory fault type, a preset PCIE fault type and a preset hard disk fault type.
In some embodiments, the early warning module 14 includes:
an early warning parameter updating unit, configured to update, based on the preset fault type, any one or more of the corresponding fault early warning threshold, fault detection frequency and warning level in the node server to be early-warned.
Further, an embodiment of the present application also provides an electronic device. FIG. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment; its contents should not be construed as limiting the scope of use of the present application in any way.
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device specifically comprises at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26. The memory 22 is used to store a computer program, which is loaded and executed by the processor 21 to implement the following steps:
determining a fault node server and a node server to be early-warned from all node servers of the multi-node server chassis;
acquiring first fault information returned by the fault node server, and determining a preset fault type of the first fault information;
correspondingly detecting the node server to be early-warned based on the preset fault type to obtain a detection result, and judging whether the detection result meets a preset fault condition;
if yes, recording corresponding second fault information in the node server to be early-warned, and generating corresponding warning information based on the second fault information; and if not, updating the corresponding fault early warning parameters in the node server to be early warned based on the preset fault type.
In some embodiments, the processor, by executing the computer program stored in the memory, may specifically implement the following steps:
detecting the fault node server through the baseboard management controller of the fault node server to obtain the first fault information.
In some embodiments, the processor may specifically implement the following steps by executing the computer program stored in the memory:
acquiring the first fault information returned by the fault node server through a preset two-wire serial bus interface or a preset network interface.
In some embodiments, the processor, by executing the computer program stored in the memory, may specifically implement the following steps:
recording the first fault information, and processing it with the baseboard management controller in the node server to be early-warned to determine the preset fault type of the first fault information.
In some embodiments, the processor, by executing the computer program stored in the memory, may specifically implement the following steps:
if the preset fault type is any one or more of a preset voltage fault type, a preset current fault type and a preset temperature fault type, detecting the corresponding position in the node server to be early-warned to obtain a detection result.
In some embodiments, the processor may specifically implement the following steps by executing the computer program stored in the memory:
if the preset fault type is any one or more of a preset CPU fault type, a preset memory fault type, a preset PCIE fault type and a preset hard disk fault type, detecting the corresponding module in the node server to be early-warned to obtain a detection result.
In some embodiments, the processor, by executing the computer program stored in the memory, may further implement the following steps:
updating, based on the preset fault type, any one or more of the corresponding fault early warning threshold, fault detection frequency and warning level in the node server to be early-warned.
In this embodiment, the power supply 23 provides the operating voltage for each hardware device of the electronic device. The communication interface 24 establishes a data transmission channel between the electronic device and external devices, and may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein. The input/output interface 25 obtains external input data or outputs data to the outside; its specific interface type may be selected according to the application requirements and is not specifically limited herein.
The processor 21 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 21 may be implemented in at least one of the following hardware forms: a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array) or a PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which renders the content to be shown on the display screen. In some embodiments, the processor 21 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
In addition, the memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk or the like. The resources stored on it include an operating system 221, a computer program 222 and data 223, and the storage may be transient or persistent.
The operating system 221 manages and controls the hardware devices and the computer program 222 on the electronic device, so that the processor 21 can operate on and process the data 223 in the memory 22; it may be Windows, Unix, Linux or the like. In addition to the computer program that performs the multi-node server fault early warning method executed by the electronic device disclosed in any of the foregoing embodiments, the computer program 222 may include computer programs for other specific tasks. The data 223 may include data received by the electronic device from external devices, as well as data collected by the input/output interface 25 itself.
Further, an embodiment of the present application discloses a computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the multi-node server fault early warning method disclosed in any of the foregoing embodiments.
Finally, it should be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The multi-node server fault early warning method, device, equipment and medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may make changes to the specific embodiments and the scope of application according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A multi-node server fault early warning method, applied to a multi-node server chassis, comprising the following steps:
determining a faulty node server and a node server to be early-warned from all node servers in the multi-node server chassis;
acquiring first fault information returned by the faulty node server, and determining a preset fault type of the first fault information;
detecting the node server to be early-warned according to the preset fault type to obtain a detection result, and judging whether the detection result meets a preset fault condition;
if so, recording corresponding second fault information in the node server to be early-warned, and generating corresponding warning information based on the second fault information; if not, updating the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
2. The multi-node server fault early warning method according to claim 1, wherein before the acquiring of the first fault information returned by the faulty node server, the method further comprises:
detecting the faulty node server through a baseboard management controller of the faulty node server to obtain the first fault information.
3. The multi-node server fault early warning method according to claim 1, wherein the acquiring of the first fault information returned by the faulty node server comprises:
acquiring the first fault information returned by the faulty node server through a preset two-wire serial bus interface or a preset network interface.
4. The multi-node server fault early warning method according to claim 1, wherein the determining of the preset fault type of the first fault information comprises:
recording the first fault information, and processing the first fault information with a baseboard management controller in the node server to be early-warned to determine the preset fault type of the first fault information.
5. The multi-node server fault early warning method according to claim 1, wherein the detecting of the node server to be early-warned according to the preset fault type to obtain a detection result comprises:
if the preset fault type is any one or more of a preset voltage fault type, a preset current fault type and a preset temperature fault type, detecting the corresponding position in the node server to be early-warned to obtain the detection result.
6. The multi-node server fault early warning method according to claim 1, wherein the detecting of the node server to be early-warned according to the preset fault type to obtain a detection result comprises:
if the preset fault type is any one or more of a preset CPU fault type, a preset memory fault type, a preset PCIe fault type and a preset hard disk fault type, detecting the corresponding module in the node server to be early-warned to obtain the detection result.
7. The multi-node server fault early warning method according to any one of claims 1 to 6, wherein the updating of the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type comprises:
updating, based on the preset fault type, any one or more of the corresponding fault early warning threshold, the fault detection frequency and the corresponding warning level in the node server to be early-warned.
8. A multi-node server fault early warning device, applied to a multi-node server chassis, comprising:
a node server determining module, configured to determine a faulty node server and a node server to be early-warned from all node servers in the multi-node server chassis;
a type determining module, configured to acquire first fault information returned by the faulty node server and determine a preset fault type of the first fault information;
a judging module, configured to detect the node server to be early-warned according to the preset fault type to obtain a detection result, and judge whether the detection result meets a preset fault condition;
an early warning module, configured to, if the detection result meets the preset fault condition, record corresponding second fault information in the node server to be early-warned and generate corresponding warning information based on the second fault information, and if not, update the corresponding fault early warning parameters in the node server to be early-warned based on the preset fault type.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the multi-node server fault early warning method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the multi-node server fault early warning method according to any one of claims 1 to 7.
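Taken together, claim 1 and the modules of claim 8 describe a four-step flow that can be sketched end to end in Python. This is a simplified, assumption-laden illustration rather than the claimed implementation: nodes are in-memory objects, a node is treated as faulty when any reading exceeds a hard limit, the fault type is inferred from the sensor name, the preset fault condition is "the same sensor on the node to be early-warned is at or above its warning threshold", and the only parameter updated is that threshold. `Node`, `fault_type_of` and `early_warn` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    readings: Dict[str, float]        # sensor name -> current value
    warn_threshold: Dict[str, float]  # fault type -> early-warning threshold
    fault_log: List[Tuple[str, float]] = field(default_factory=list)  # second fault info
    alerts: List[str] = field(default_factory=list)


def fault_type_of(sensor: str) -> str:
    # Hypothetical classification: only temperature and voltage sensors here.
    return "temperature" if sensor.endswith("_Temp") else "voltage"


def early_warn(chassis: List[Node], fault_limit: Dict[str, float], step: float = 2.0) -> None:
    def faults(node: Node) -> List[Tuple[str, float]]:
        return [(s, v) for s, v in node.readings.items()
                if s in fault_limit and v > fault_limit[s]]

    # Step 1: split the chassis into faulty nodes and nodes to be early-warned.
    faulty = [n for n in chassis if faults(n)]
    to_warn = [n for n in chassis if not faults(n)]
    for bad in faulty:
        # Step 2: first fault information and its preset fault type.
        for sensor, _ in faults(bad):
            ftype = fault_type_of(sensor)
            for node in to_warn:
                # Step 3: detect the same position on the node to be warned.
                local = node.readings.get(sensor)
                if local is None:
                    continue
                if local >= node.warn_threshold[ftype]:
                    # Step 4a: record second fault information and raise an alert.
                    node.fault_log.append((sensor, local))
                    node.alerts.append(f"{node.name}: {sensor}={local} (peer {bad.name} faulted)")
                else:
                    # Step 4b: update the early-warning parameter instead.
                    node.warn_threshold[ftype] -= step


node1 = Node("node1", {"CPU0_Temp": 97.0}, {"temperature": 85.0})
node2 = Node("node2", {"CPU0_Temp": 88.0}, {"temperature": 85.0})
node3 = Node("node3", {"CPU0_Temp": 60.0}, {"temperature": 85.0})
early_warn([node1, node2, node3], {"CPU0_Temp": 95.0})
print(node2.alerts)          # ['node2: CPU0_Temp=88.0 (peer node1 faulted)']
print(node3.warn_threshold)  # {'temperature': 83.0}
```

In this toy run, node1 is the faulty node; node2 already exceeds its warning threshold and receives an alert, while node3 does not and instead has its threshold tightened.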
CN202211190559.6A 2022-09-28 2022-09-28 Multi-node server fault early warning method, device, equipment and medium Pending CN115687026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211190559.6A CN115687026A (en) 2022-09-28 2022-09-28 Multi-node server fault early warning method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115687026A true CN115687026A (en) 2023-02-03

Family

ID=85064012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211190559.6A Pending CN115687026A (en) 2022-09-28 2022-09-28 Multi-node server fault early warning method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115687026A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684306A (en) * 2023-06-29 2023-09-01 Suzhou Inspur Intelligent Technology Co Ltd Fault prediction method, device, equipment and readable storage medium
CN116684306B (en) * 2023-06-29 2023-11-03 Suzhou Inspur Intelligent Technology Co Ltd Fault prediction method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US9298800B2 (en) Discovering relationships between data processing environment components
EP2472402B1 (en) Remote management systems and methods for mapping operating system and management controller located in a server
US20130212257A1 (en) Computer program and monitoring apparatus
CN109088775B (en) Abnormity monitoring method and device and server
US10591970B2 (en) Industrial asset management systems and methods thereof
CN112380089A (en) Data center monitoring and early warning method and system
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
CN115687026A (en) Multi-node server fault early warning method, device, equipment and medium
CN108959025A (en) A kind of server alarm method, device and server
JP6002856B2 (en) Monitoring system and monitoring method
CN116225812B (en) Baseboard management controller system operation method, device, equipment and storage medium
CN115510064A (en) ES alarm data backfilling method, device, equipment and medium
CN110752972A (en) Network card state monitoring method, device, equipment and medium
CN113110970B (en) Method, device, equipment and medium for monitoring all parts in server working mode
CN114296979A (en) Method and device for detecting abnormal state of Internet of things equipment
CN112817827A (en) Operation and maintenance method, device, server, equipment, system and medium
CN102822806B (en) Detect the state that gets nowhere of application
CN111309532A (en) PCIE equipment abnormity detection method, system, electronic equipment and storage medium
CN111414274A (en) Far-end eliminating method for abnormal state of cabinet applied to data center
CN117349127B (en) GPU card-falling detection method and device
CN117033084B (en) Virtual machine backup method and device, electronic equipment and storage medium
CN116684306B (en) Fault prediction method, device, equipment and readable storage medium
CN117055718B (en) System, method, device, equipment and storage medium for detecting power consumption of server
CN115391140A (en) Abnormal state processing method of server liquid leakage detection device and related components
CN117992264A (en) Host fault repairing method, device and system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination