CN117349128A

CN117349128A - Fault monitoring method, device and equipment of server cluster and storage medium

Info

Publication number: CN117349128A
Application number: CN202311654834.XA
Authority: CN
Inventors: 陈栋; 李春; 魏兴华; 李建辉; 杨禹航; 吴炎; 臧冰凌; 张文件; 罗春; 王显伟
Original assignee: Hangzhou Woqu Technology Co ltd
Current assignee: Hangzhou Woqu Technology Co ltd
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2024-01-05
Anticipated expiration: 2043-12-05
Also published as: CN117349128B

Abstract

The invention relates to the technical field of server monitoring, and in particular to a fault monitoring method, device and equipment of a server cluster and a storage medium. The method comprises the following steps: acquiring relation information of a server cluster; processing the relation information of the server cluster to generate a connection association diagram of the server cluster; acquiring current attribute information of the server cluster; and generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster. The fault state of each server in the server cluster can thus be queried intuitively, and the probability of an overall fault can be determined from the fault states together with the connection relations, thereby realizing fault monitoring of the server cluster.

Description

Fault monitoring method, device and equipment of server cluster and storage medium

Technical Field

The present invention relates to the field of server monitoring technologies, and in particular, to a method, an apparatus, a device, and a storage medium for monitoring a failure of a server cluster.

Background

Database all-in-one machines are connected to one another through network communication and together form a logical service cluster, namely a database all-in-one machine cluster. Within such a cluster, a server may crash or enter another abnormal state for various reasons, which can cause data abnormalities; the faults of the servers therefore need to be monitored.

Disclosure of Invention

Aiming at the technical problems, the invention provides a fault monitoring method of a server cluster, which comprises the following steps:

acquiring the relation information of the server cluster;

processing the relation information of the server cluster to generate a connection association diagram of the server cluster;

acquiring the current attribute information of the server cluster; and

generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

The invention also provides a fault monitoring device of the server cluster, which comprises:

a first acquisition module, used for acquiring the relation information of the server cluster;

a first generation module, used for processing the relation information of the server cluster and generating a connection association diagram of the server cluster;

a second acquisition module, used for acquiring the current attribute information of the server cluster; and

a second generation module, used for generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

The invention further provides a computer device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the above fault monitoring method of the server cluster when executing the computer program.

The invention further provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the above fault monitoring method of the server cluster.

Compared with the prior art, the invention has obvious advantages and beneficial effects. By means of the above technical scheme, the fault monitoring method, device and equipment of the server cluster and the storage medium achieve considerable technical progress and practicability, have wide industrial utilization value, and have at least the following advantages:

The invention discloses a fault monitoring method, device and equipment of a server cluster and a storage medium. The method comprises the following steps: acquiring relation information of a server cluster; processing the relation information of the server cluster to generate a connection association diagram of the server cluster; acquiring current attribute information of the server cluster; and generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster. The fault state of each server in the server cluster can thus be queried intuitively, and the probability of an overall fault can be determined from the fault states together with the connection relations, thereby realizing fault monitoring of the server cluster.

The foregoing is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that its features and advantages are more readily apparent, preferred embodiments are described in detail below in conjunction with the accompanying drawings.

Drawings

Fig. 1 is a flowchart of a fault monitoring method for a server cluster according to a first embodiment of the present invention;

Fig. 2 is a flowchart of step S2 according to the first embodiment of the present invention;

Fig. 3 is a flowchart of step S4 according to the first embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a fault monitoring device for a server cluster according to a second embodiment of the present invention;

Fig. 5 is a schematic structural diagram of the first generating module 2 according to the second embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the second generating module 4 according to the second embodiment of the present invention.

Detailed Description

In order to further describe the technical means adopted by the present invention to achieve its intended purpose and the effects obtained, specific implementations and effects of the fault monitoring method of a server cluster according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

Example 1

As shown in fig. 1, a first embodiment provides a fault monitoring method for a server cluster, where the method includes:

S1, acquiring relation information of a server cluster, wherein the server cluster comprises a plurality of servers; for example, the servers are database all-in-one machines.

Specifically, the relationship information of the server cluster includes relationship information of a plurality of servers, where the relationship information of each server refers to a communication connection relationship between any server and other servers except the server.

S2, processing the relation information of the server clusters to generate a connection association diagram of the server clusters.

Specifically, as shown in fig. 2, processing the relation information of the server cluster to generate the connection association diagram of the server cluster further includes the following steps:

S21, determining a connection association server ID set of the server cluster according to the relation information of the server cluster;

S22, generating the connection association diagram of the server cluster according to the connection association server ID set of the server cluster.

In a specific embodiment, the connection association server ID set of the server cluster is determined by the following steps:

S211, acquiring the server ID list $A = \{A_1, A_2, \dots, A_i, \dots, A_m\}$ corresponding to the server cluster, where $A_i$ is the i-th server ID, $i = 1, 2, \dots, m$, and $m$ is the number of server IDs corresponding to the server cluster.

Specifically, the server ID is a unique identity of the server.

S212, acquiring, according to the relation information of the server cluster corresponding to $A$, the connection association server ID set $B = \{B_1, B_2, \dots, B_i, \dots, B_m\}$ corresponding to $A$, where $B_i = \{B_{i1}, B_{i2}, \dots, B_{ij}, \dots, B_{in(i)}\}$, $B_{ij}$ is the j-th connection association server ID corresponding to $A_i$, $j = 1, 2, \dots, n(i)$, and $n(i)$ is the number of connection association server IDs corresponding to $A_i$; that is, the connection association server ID set of the server cluster is $B$.

Further, a connection association server ID corresponding to $A_i$ is the unique identity of a server that has relation information with the server corresponding to $A_i$.
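Steps S211 and S212 can be sketched as follows. This is a minimal illustration only: it assumes the relation information is available as a list of undirected server-to-server links, and the names `build_association_sets`, `links`, and the sample server IDs are hypothetical, not part of the disclosed method.

```python
def build_association_sets(server_ids, links):
    """Return B as a dict mapping each server ID A_i to its set B_i
    of connection association server IDs.

    Assumption: `links` is an iterable of (id_a, id_b) pairs, each meaning
    that the two servers have a communication connection relationship.
    """
    assoc = {sid: set() for sid in server_ids}
    for a, b in links:
        if a == b:
            continue  # a server is only related to servers other than itself
        assoc[a].add(b)
        assoc[b].add(a)
    return assoc


# Hypothetical sample cluster with four servers.
A = ["s1", "s2", "s3", "s4"]
links = [("s1", "s2"), ("s2", "s3"), ("s2", "s4")]

B = build_association_sets(A, links)
# n(i): number of connection association server IDs for each A_i.
n = {sid: len(B[sid]) for sid in A}
```

Storing each $B_i$ as a set keeps duplicate links from inflating $n(i)$, which matters because the later root and leaf selection depends on these counts.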

Specifically, the connection association diagram of the server cluster is a tree-structured association diagram comprising a connection-associated root node and s layers of connection-associated leaf nodes, where the number of leaf nodes is not uniform across layers. In step S22, the connection association diagram is generated by the following steps:

S221, acquiring the list of connection association server ID numbers $n = \{n(1), n(2), \dots, n(i), \dots, n(m)\}$ corresponding to $A$.

S222, determining the root node associated with the connection according to n.

In a specific embodiment, when $n(i)$ alone is the minimum number of associated server IDs in $n$, the connection-associated root node is $A_i$.

In another specific embodiment, in step S222, the root node associated with the connection is further determined by:

S2221, acquiring, according to $n$, a first intermediate server ID set $C = \{C_1, C_2, \dots, C_x, \dots, C_p\}$, where $C_x$ is the x-th first intermediate server ID, $x = 1, 2, \dots, p$, and $p$ is the number of first intermediate server IDs.

Further, a first intermediate server ID is a server ID corresponding to the minimum value in $n$.

S2222, acquiring from $B$ the list $z = \{z(1), z(2), \dots, z(x), \dots, z(p)\}$ of numbers of associated server IDs corresponding to $C$, where $z(x)$ is the number of associated server IDs corresponding to $C_x$.

S2223, when any $z(x)$ is the minimum number of associated server IDs in $z$, determining $C_x$ as the connection-associated root node.

The root node and the leaf nodes are found through the association relations, so that a connection association diagram with a reasonable tree structure is constructed. The diagram can subsequently be associated with fault information, which allows the probability of an overall fault to be determined and thereby achieves effective monitoring of faults of the server cluster.
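The root selection of steps S221 to S2223 might look like the sketch below. The text does not spell out how $z(x)$ differs from $n$, so the tie-break here is an assumption: $z(x)$ is taken as the total number of associated server IDs over the servers associated with $C_x$. The function name `pick_root` and the sample data are hypothetical.

```python
def pick_root(assoc):
    """Pick the connection-associated root node from B (`assoc`).

    `assoc` maps each server ID to its set of connection association IDs.
    """
    n = {sid: len(peers) for sid, peers in assoc.items()}
    smallest = min(n.values())
    # Set C: every server ID whose count equals the minimum value in n.
    candidates = [sid for sid in assoc if n[sid] == smallest]
    if len(candidates) == 1:
        return candidates[0]  # n(i) alone is the minimum
    # ASSUMPTION: z(x) sums the association counts of C_x's peers.
    z = {c: sum(n[p] for p in assoc[c]) for c in candidates}
    lowest = min(z.values())
    # Remaining ties are broken by insertion order (also an assumption).
    return next(c for c in candidates if z[c] == lowest)


# Hypothetical cluster: links s1-s2, s2-s3, s2-s4, s4-s5.
assoc = {
    "s1": {"s2"},
    "s2": {"s1", "s3", "s4"},
    "s3": {"s2"},
    "s4": {"s2", "s5"},
    "s5": {"s4"},
}
root = pick_root(assoc)
```

In this sample, s1, s3 and s5 all have a single association, so the tie-break decides; s5 wins because its only peer, s4, has fewer associations than s2.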

S223, determining all leaf nodes $D = \{D_1, D_2, \dots, D_r, \dots, D_s\}$ according to the connection-associated root node, where $D_r = \{D_{r1}, D_{r2}, \dots, D_{ry}, \dots, D_{rq(r)}\}$, $D_{ry}$ is the y-th leaf node in the r-th layer, $r = 1, 2, \dots, s$, $y = 1, 2, \dots, q(r)$, and $q(r)$ is the number of leaf nodes in the r-th layer. It can be understood that $D_{ry}$ is any server ID in $A$, other than the server IDs in the server ID list corresponding to $D_{r-1}$ and the server ID corresponding to the connection-associated root node, whose number of associated server IDs is not greater than a preset first server ID number threshold.

Further, in step S223, $D_{1y}$ is determined by the following steps:

S2231, acquiring the second intermediate server ID list $U = \{U_1, U_2, \dots, U_g, \dots, U_v\}$ corresponding to the connection-associated root node, where $U_g$ is the g-th second intermediate server ID corresponding to the connection-associated root node, $g = 1, 2, \dots, v$, and $v$ is the number of second intermediate server IDs corresponding to the connection-associated root node.

Further, a second intermediate server ID is an associated server ID corresponding to the connection-associated root node in $B$.

S2232, acquiring the number of associated server IDs corresponding to each $U_g$, and taking each $U_g$ whose number of associated server IDs is not greater than the preset first server ID number threshold as $D_{1y}$.

Preferably, the first server ID number threshold may be determined by a person skilled in the art according to the level of the leaf node, which will not be described in detail herein.

When the leaf nodes are determined, reasonable leaf nodes can be identified accurately based on the number of associated server IDs, which makes the determined probability of an overall fault more reasonable and thereby achieves effective monitoring of faults of the server cluster.
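The layer-by-layer leaf determination of step S223 can be illustrated roughly as below. Several points are assumptions rather than disclosed details: each new layer is drawn from the servers associated with the previous layer, servers already placed are skipped, and a single `threshold` is used for every layer (the text allows the threshold to depend on the layer). The name `build_layers` and the sample data are hypothetical.

```python
def build_layers(assoc, root, threshold):
    """Return the leaf layers D_1..D_s as a list of lists of server IDs."""
    n = {sid: len(peers) for sid, peers in assoc.items()}
    placed = {root}      # server IDs already used as root or leaf
    frontier = [root]    # nodes whose associated servers form the next layer
    layers = []
    while frontier:
        layer = []
        for parent in frontier:
            for sid in sorted(assoc[parent]):
                # Keep only unplaced servers within the count threshold.
                if sid not in placed and n[sid] <= threshold:
                    placed.add(sid)
                    layer.append(sid)
        if not layer:
            break
        layers.append(layer)
        frontier = layer
    return layers


# Hypothetical cluster: links s1-s2, s2-s3, s2-s4, s4-s5; root chosen as s5.
assoc = {
    "s1": {"s2"},
    "s2": {"s1", "s3", "s4"},
    "s3": {"s2"},
    "s4": {"s2", "s5"},
    "s5": {"s4"},
}
layers = build_layers(assoc, "s5", threshold=3)
```

With a threshold of 3 every server is placed in three layers; lowering it to 2 excludes s2 and therefore everything reachable only through it, which shows how the threshold shapes the tree.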

S3, obtaining the current attribute information of the server cluster.

Specifically, the current attribute information of the server cluster includes current attribute information of each server, where the current attribute information of each server includes: current hardware state information of the server, current network state information of the server, and current software state information of the server.

S4, generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

In a specific embodiment, as shown in fig. 3, generating the fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster further includes the following steps:

S41, determining the current fault label vector corresponding to each server according to the current attribute information corresponding to each server in the server cluster.

S42, generating a fault state diagram of the server cluster according to the current fault label vector of each server and the connection association diagram of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

Specifically, the step S41 further includes the following steps:

S411, acquiring the current attribute information $A^0 = \{A^0_1, A^0_2, \dots, A^0_i, \dots, A^0_m\}$ corresponding to $A$, where $A^0_i = (A^0_{i1}, A^0_{i2}, A^0_{i3})$, $A^0_{i1}$ is the current hardware state information of $A_i$, $A^0_{i2}$ is the current network state information of $A_i$, and $A^0_{i3}$ is the current software state information of $A_i$.

S412, acquiring, according to $A^0$, the current fault label vector set $B^0 = \{B^0_1, B^0_2, \dots, B^0_i, \dots, B^0_m\}$ corresponding to $A^0$, where $B^0_i = (B^0_{i1}, B^0_{i2}, B^0_{i3})$, $B^0_{i1}$ is the fault probability value corresponding to $A^0_{i1}$, $B^0_{i2}$ is the fault probability value corresponding to $A^0_{i2}$, and $B^0_{i3}$ is the fault probability value corresponding to $A^0_{i3}$. It can be understood that $A^0_{i1}$, $A^0_{i2}$ and $A^0_{i3}$ are each input into a corresponding trained neural network model to obtain their respective fault probability values. Methods for obtaining a fault probability value with a neural network model are known to those skilled in the art and are not described here.
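The shape of steps S411 and S412 can be sketched as follows. The three scoring functions are hypothetical stand-ins for the trained neural network models, which the text does not specify; the attribute fields (`temp_c`, `packet_loss`, `failed_services`, `total_services`) are likewise invented for illustration.

```python
def clamp(p):
    """Keep a score inside the probability range [0, 1]."""
    return max(0.0, min(1.0, p))


# Hypothetical stand-ins for the three trained models (NOT neural networks).
def hardware_model(state):
    return clamp(state["temp_c"] / 100)


def network_model(state):
    return clamp(state["packet_loss"])


def software_model(state):
    return clamp(state["failed_services"] / state["total_services"])


def fault_label_vectors(attrs):
    """Map each server ID to its vector of (hardware, network, software)
    fault probability values, one model per attribute component."""
    return {
        sid: (hardware_model(hw), network_model(net), software_model(sw))
        for sid, (hw, net, sw) in attrs.items()
    }


# Hypothetical current attribute information A0 for two servers.
A0 = {
    "s1": ({"temp_c": 50}, {"packet_loss": 0.25},
           {"failed_services": 1, "total_services": 4}),
    "s2": ({"temp_c": 120}, {"packet_loss": 0.0},
           {"failed_services": 0, "total_services": 4}),
}
B0 = fault_label_vectors(A0)
```

Keeping one model per attribute component mirrors the text, where hardware, network and software state are scored separately rather than by a single combined model.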

Specifically, the step S42 further includes the following steps:

S421, acquiring the connection association diagram of the server cluster;

S422, recording each $B^0_i$ on the node corresponding to its server ID in the connection association diagram of the server cluster to generate the fault state diagram of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

The fault state diagram is constructed by combining the fault probabilities with the connection association diagram, so that a reasonable probability of an overall fault can be determined, thereby achieving effective monitoring of faults of the server cluster.
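Steps S421 and S422 amount to annotating each node of the connection association diagram with its fault label vector. The sketch below assumes the diagram is represented as a root plus a list of leaf layers; the names `build_fault_state_graph`, `flagged`, the `limit` parameter and the sample values are hypothetical, and the query helper is only one possible way to read the resulting diagram.

```python
def build_fault_state_graph(root, layers, vectors):
    """Record each server's fault label vector on its node.

    Returns a dict: server ID -> {"layer": depth, "fault": vector},
    with the root at layer 0 and leaf layers numbered from 1.
    """
    graph = {root: {"layer": 0, "fault": vectors[root]}}
    for depth, layer in enumerate(layers, start=1):
        for sid in layer:
            graph[sid] = {"layer": depth, "fault": vectors[sid]}
    return graph


def flagged(graph, limit):
    """List server IDs whose highest fault probability reaches `limit`."""
    return sorted(
        sid for sid, node in graph.items() if max(node["fault"]) >= limit
    )


# Hypothetical tree and fault label vectors.
layers = [["s4"], ["s2"], ["s1", "s3"]]
vectors = {
    "s5": (0.1, 0.0, 0.0),
    "s4": (0.2, 0.9, 0.1),
    "s2": (0.0, 0.0, 0.0),
    "s1": (0.5, 0.25, 0.25),
    "s3": (0.0, 0.8, 0.0),
}
graph = build_fault_state_graph("s5", layers, vectors)
suspects = flagged(graph, limit=0.8)
```

Because each node keeps its layer alongside its vector, a monitor can see at a glance whether a suspect server sits near the root or at the edge of the tree.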

The fault monitoring method of the server cluster in this embodiment includes: acquiring relation information of a server cluster; processing the relation information of the server cluster to generate a connection association diagram of the server cluster; acquiring current attribute information of the server cluster; and generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster. The fault state of each server in the server cluster can thus be queried intuitively, and the probability of an overall fault can be determined from the fault states together with the connection relations, thereby realizing fault monitoring of the server cluster.

Example two

As shown in fig. 4, a second embodiment provides a fault monitoring device for a server cluster, where the device includes:

The first obtaining module 1 is configured to obtain relationship information of a server cluster, where the server cluster includes a plurality of servers; for example, the servers are database all-in-one machines.

The first generating module 2 is configured to process the relationship information of the server cluster, and generate a connection association diagram of the server cluster.

Specifically, as shown in fig. 5, the first generating module 2 further includes:

a first determining module 21, configured to determine a connection association server ID set of the server cluster according to relationship information of the server cluster;

the first graph generating module 22 is configured to generate a connection association graph of the server cluster according to the connection association server ID set of the server cluster.

In a specific embodiment, the first determining module 21 includes:

a server ID list obtaining module 211, configured to obtain the server ID list $A = \{A_1, A_2, \dots, A_i, \dots, A_m\}$ corresponding to the server cluster, where $A_i$ is the i-th server ID, $i = 1, 2, \dots, m$, and $m$ is the number of server IDs corresponding to the server cluster.

Specifically, the server ID is a unique identity of the server.

A connection association server ID set acquisition module 212, configured to acquire, according to the relationship information of the server cluster corresponding to $A$, the connection association server ID set $B = \{B_1, B_2, \dots, B_i, \dots, B_m\}$ corresponding to $A$, where $B_i = \{B_{i1}, B_{i2}, \dots, B_{ij}, \dots, B_{in(i)}\}$, $B_{ij}$ is the j-th connection association server ID corresponding to $A_i$, $j = 1, 2, \dots, n(i)$, and $n(i)$ is the number of connection association server IDs corresponding to $A_i$; that is, the connection association server ID set of the server cluster is $B$.

Specifically, the connection association diagram of the server cluster is a tree-structured association diagram comprising a connection-associated root node and s layers of connection-associated leaf nodes, where the number of leaf nodes is not uniform across layers; the first graph generating module 22 includes:

The connection association server ID number list obtaining module 221 is configured to obtain the list of connection association server ID numbers $n = \{n(1), n(2), \dots, n(i), \dots, n(m)\}$ corresponding to $A$.

The root node determining module 222 is configured to determine, according to n, a root node associated with the connection.

In another specific embodiment, the root node determination module 222 includes:

a first intermediate server ID set acquisition module 2221, configured to acquire, according to $n$, a first intermediate server ID set $C = \{C_1, C_2, \dots, C_x, \dots, C_p\}$, where $C_x$ is the x-th first intermediate server ID, $x = 1, 2, \dots, p$, and $p$ is the number of first intermediate server IDs.

A first execution module 2222, configured to acquire from $B$ the list $z = \{z(1), z(2), \dots, z(x), \dots, z(p)\}$ of numbers of associated server IDs corresponding to $C$, where $z(x)$ is the number of associated server IDs corresponding to $C_x$.

A second execution module 2223, configured to determine $C_x$ as the connection-associated root node when any $z(x)$ is the minimum number of associated server IDs in $z$.

A leaf node determining module 223, configured to determine all leaf nodes $D = \{D_1, D_2, \dots, D_r, \dots, D_s\}$ according to the connection-associated root node, where $D_r = \{D_{r1}, D_{r2}, \dots, D_{ry}, \dots, D_{rq(r)}\}$, $D_{ry}$ is the y-th leaf node in the r-th layer, $r = 1, 2, \dots, s$, $y = 1, 2, \dots, q(r)$, and $q(r)$ is the number of leaf nodes in the r-th layer. It can be understood that $D_{ry}$ is any server ID in $A$, other than the server IDs in the server ID list corresponding to $D_{r-1}$ and the server ID corresponding to the connection-associated root node, whose number of associated server IDs is not greater than a preset first server ID number threshold.

Further, the leaf node determining module 223 includes:

a third execution module 2231, configured to obtain the second intermediate server ID list $U = \{U_1, U_2, \dots, U_g, \dots, U_v\}$ corresponding to the connection-associated root node, where $U_g$ is the g-th second intermediate server ID corresponding to the connection-associated root node, $g = 1, 2, \dots, v$, and $v$ is the number of second intermediate server IDs corresponding to the connection-associated root node.

A fourth execution module 2232, configured to obtain the number of associated server IDs corresponding to each $U_g$, and to take each $U_g$ whose number of associated server IDs is not greater than the preset first server ID number threshold as $D_{1y}$.

The second acquisition module 3 is configured to acquire the current attribute information of the server cluster.

Specifically, the current attribute information of the server cluster includes current attribute information of each server, where the current attribute information of each server includes: current hardware state information of the server, current network state information of the server, and current software state information of the server.

The second generating module 4 is configured to generate a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

In a specific embodiment, as shown in fig. 6, the second generating module 4 further includes:

The second determining module 41 is configured to determine a current fault label vector corresponding to each server according to the current attribute information corresponding to each server in the server cluster.

The second graph generating module 42 is configured to generate a fault state diagram of the server cluster according to the current fault label vector of each server and the connection association diagram of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

Specifically, the second determining module 41 includes:

a current attribute information obtaining module 411, configured to obtain the current attribute information $A^0 = \{A^0_1, A^0_2, \dots, A^0_i, \dots, A^0_m\}$ corresponding to $A$, where $A^0_i = (A^0_{i1}, A^0_{i2}, A^0_{i3})$, $A^0_{i1}$ is the current hardware state information of $A_i$, $A^0_{i2}$ is the current network state information of $A_i$, and $A^0_{i3}$ is the current software state information of $A_i$.

The current fault label vector set obtaining module 412 is configured to obtain, according to $A^0$, the current fault label vector set $B^0 = \{B^0_1, B^0_2, \dots, B^0_i, \dots, B^0_m\}$ corresponding to $A^0$, where $B^0_i = (B^0_{i1}, B^0_{i2}, B^0_{i3})$, $B^0_{i1}$ is the fault probability value corresponding to $A^0_{i1}$, $B^0_{i2}$ is the fault probability value corresponding to $A^0_{i2}$, and $B^0_{i3}$ is the fault probability value corresponding to $A^0_{i3}$. It can be understood that $A^0_{i1}$, $A^0_{i2}$ and $A^0_{i3}$ are each input into a corresponding trained neural network model to obtain their respective fault probability values. Methods for obtaining a fault probability value with a neural network model are known to those skilled in the art and are not described here.

Specifically, the second graph generation module 42 includes:

a fifth execution module 421, configured to obtain the connection association diagram of the server cluster;

a sixth execution module 422, configured to record each $B^0_i$ on the node corresponding to its server ID in the connection association diagram of the server cluster to generate the fault state diagram of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

acquiring relation information of a server cluster;

processing the relation information of the server clusters to generate a connection association diagram of the server clusters;

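acquiring current attribute information of the server cluster;

generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.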

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

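acquiring relation information of a server cluster;

processing the relation information of the server cluster to generate a connection association diagram of the server cluster;

acquiring current attribute information of the server cluster;

generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.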

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above functional units and the division of the modules are illustrated, and in practical application, the above functions may be allocated to different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments do not limit the present invention. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, make changes or modifications to obtain equivalent embodiments; any simple modification, equivalent change or modification made to the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims

1. A method for monitoring a failure of a server cluster, the method comprising:

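acquiring relation information of a server cluster;

processing the relation information of the server cluster to generate a connection association diagram of the server cluster;

acquiring current attribute information of the server cluster;

generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.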

2. The method for fault monitoring of a server cluster according to claim 1, wherein the server cluster comprises a number of servers.

3. The failure monitoring method of a server cluster according to claim 2, wherein the relationship information of the server cluster includes relationship information of a plurality of servers, wherein the relationship information of each server refers to a communication connection relationship between any one server and other servers except itself.

4. The method for monitoring a failure of a server cluster according to claim 3, wherein the processing the relationship information of the server cluster, and generating a connection association graph of the server cluster further comprises the steps of:

determining a connection association server ID set of the server cluster according to the relation information of the server cluster;

and generating a connection association diagram of the server cluster according to the connection association server ID set of the server cluster.

5. The method for fault monitoring of a server cluster as claimed in claim 2, wherein,

the current attribute information of the server cluster includes current attribute information of each server, wherein the current attribute information of each server includes: current hardware state information of the server, current network state information of the server, and current software state information of the server.

6. The method for monitoring a failure of a server cluster according to claim 5, wherein generating a failure state diagram of the server cluster according to the connection association diagram of the server cluster and current attribute information of the server cluster, so that the monitoring of the failure of the server cluster according to the failure state diagram of the server cluster further comprises the steps of:

determining a current fault label vector corresponding to each server according to the current attribute information corresponding to each server in the server cluster;

generating a fault state diagram of the server cluster according to the current fault label vector of each server and the connection association diagram of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.

7. A fault monitoring device for a server cluster, the device comprising:

the first acquisition module is used for acquiring the relation information of the server cluster;

the first generation module is used for processing the relation information of the server cluster and generating a connection association diagram of the server cluster;

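the second acquisition module is used for acquiring the current attribute information of the server cluster;

the second generation module is used for generating a fault state diagram of the server cluster according to the connection association diagram of the server cluster and the current attribute information of the server cluster, so that faults of the server cluster are monitored according to the fault state diagram of the server cluster.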

8. The fault monitoring device of a server cluster according to claim 7, wherein the server cluster comprises a number of servers.

9. The failure monitoring apparatus of claim 8, wherein the relationship information of the server cluster includes relationship information of a plurality of servers, wherein the relationship information of each server refers to a communication connection relationship between any one server and other servers except itself.

10. The fault monitoring device of a server cluster according to claim 9, wherein the first generating module comprises:

the first determining module is used for determining a connection association server ID set of the server cluster according to the relation information of the server cluster;

and the first graph generation module is used for generating a connection association graph of the server cluster according to the connection association server ID set of the server cluster.

11. The failure monitoring apparatus of the server cluster according to claim 8, wherein the current attribute information of the server cluster includes current attribute information of each server, wherein the current attribute information of each server includes: current hardware state information of the server, current network state information of the server, and current software state information of the server.

12. The fault monitoring device of a server cluster according to claim 11, wherein the second generating module comprises:

the second determining module is used for determining a current fault label vector corresponding to each server according to the current attribute information corresponding to each server in the server cluster;

and the second graph generating module is used for generating a fault state graph of the server cluster according to the current fault label vector of each server and the connection association graph of the server cluster, so that faults of the server cluster are monitored according to the fault state graph of the server cluster.

13. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements a fault monitoring method of a server cluster according to any of claims 1-6 when the computer program is executed.

14. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method for fault monitoring of a server cluster according to any one of claims 1 to 6.