CN114465874A

CN114465874A - Fault prediction method, device, electronic equipment and storage medium

Info

Publication number: CN114465874A
Application number: CN202210358521.9A
Authority: CN
Inventors: 易存道
Original assignee: Beijing Baolande Software Co ltd
Current assignee: Beijing Baolande Software Co ltd
Priority date: 2022-04-07
Filing date: 2022-04-07
Publication date: 2022-05-10
Anticipated expiration: 2042-04-07
Also published as: CN114465874B

Abstract

The invention provides a fault prediction method, a fault prediction device, electronic equipment and a storage medium. The method comprises: constructing a knowledge graph to be predicted, which takes network elements as nodes and the calling or deployment relationships among the network elements as edges; determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the subgraph set of the knowledge graph to be predicted and the fault subgraph set of each historical fault knowledge graph; and determining the fault probability corresponding to the knowledge graph to be predicted based on these similarities. The method, device, electronic equipment and storage medium provided by the invention improve the accuracy of fault prediction and enable early fault warning, which reduces both the probability of faults occurring and their impact on enterprises. Because no manual processing is required, they also save substantial manpower and material resources and improve the efficiency of fault prediction.

Description

Fault prediction method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular to a fault prediction method and device, an electronic device, and a storage medium.

Background

With the rapid development of science and technology, computer software is increasingly deployed in distributed cloud environments, and the dependencies among components are complex. A business system deployed by an enterprise usually involves a large number of devices. After long-term operation, these devices inevitably fail for various reasons, and a fault can have a great impact on the whole business system, causing losses to the enterprise. It is therefore necessary to predict failures of the equipment in a business system.

At present, most fault prediction techniques rely on operation and maintenance personnel predicting faults from correlations among equipment indexes, and predictions made this way are inaccurate. Moreover, the many kinds of equipment generate a large volume of events, alarms, faults and log data; relying solely on manual processing consumes considerable manpower and material resources, and the error rate of that processing cannot be guaranteed.

Disclosure of Invention

The invention provides a fault prediction method, a fault prediction device, electronic equipment and a storage medium, which overcome the low fault prediction accuracy of the prior art and improve the accuracy of fault prediction.

The invention provides a fault prediction method, which comprises the following steps:

constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges;

determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph;

and determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

According to the fault prediction method provided by the invention, the similarity between the knowledge graph to be predicted and each historical fault knowledge graph is determined based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph, and the method comprises the following steps:

determining the same subgraph based on the intersection of the subgraph set of the knowledge graph to be predicted and the fault subgraph set of any historical fault knowledge graph;

and determining the similarity between the knowledge graph to be predicted and any historical fault knowledge graph based on the same subgraph, the corresponding weight of the same subgraph and the union of the subgraph set and the fault subgraph set.

According to the fault prediction method provided by the invention, the weight corresponding to the same subgraph is determined based on the following formula:

$$w = \frac{e^{d}}{\sum_{i=1}^{N} e^{d_i}}$$

wherein w is the weight corresponding to the same subgraph, d is the depth of the same subgraph, N is the number of subgraphs in the subgraph set, d_i is the depth of the i-th subgraph, and e is the natural base.

According to the fault prediction method provided by the invention, the construction of the knowledge graph to be predicted comprises the following steps:

constructing network element subgraphs of various abnormal network elements based on the abnormal events of the various abnormal network elements;

and constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements.

According to the fault prediction method provided by the invention, the step of constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements comprises the following steps:

constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements, the relationship information of the abnormal network elements and the association probabilities among the network element subgraphs of the abnormal network elements.

According to the fault prediction method provided by the invention, each sub-graph in the sub-graph set of the knowledge graph to be predicted is determined based on the following steps:

decomposing the knowledge graph to be predicted into the subgraphs based on the edges and the attributes of the nodes in the knowledge graph to be predicted.

According to the fault prediction method provided by the invention, the similarity between the knowledge graph to be predicted and each historical fault knowledge graph is determined based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph, and then the method further comprises the following steps:

determining a similar historical fault knowledge graph similar to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph;

and determining the fault prediction time corresponding to the knowledge graph to be predicted based on the historical fault time corresponding to the similar historical fault knowledge graph.

The present invention also provides a failure prediction apparatus, comprising:

a construction unit, configured to construct a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges;

the determining unit is used for determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph;

and the prediction unit is used for determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

The present invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the fault prediction method as described in any of the above when executing the program.

The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a fault prediction method as described in any one of the above.

According to the fault prediction method, device, electronic equipment and storage medium provided by the invention, a knowledge graph to be predicted is constructed, and its similarity to each historical fault knowledge graph is determined from the intersection and union of its subgraph set and the fault subgraph set of each historical fault knowledge graph, from which the corresponding fault probability is determined. This improves the accuracy of fault prediction and enables early fault warning, reducing the probability of faults and their impact on enterprises; because no manual processing is needed, it also saves substantial manpower and material resources and improves the efficiency of fault prediction.

Drawings

In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic flow chart of a fault prediction method provided by the present invention;

FIG. 2 is a first exemplary diagram of a knowledge graph to be predicted provided by the present invention;

FIG. 3 is a first exemplary diagram of a network element subgraph of an abnormal network element provided by the present invention;

FIG. 4 is a second exemplary diagram of a network element subgraph of an abnormal network element provided by the present invention;

FIG. 5 is a second exemplary diagram of a knowledge graph to be predicted provided by the present invention;

FIG. 6 is an exemplary diagram of a historical fault occurrence timeline provided by the present invention;

FIG. 7 is an exemplary diagram of a predicted fault occurrence timeline provided by the present invention;

FIG. 8 is a second schematic flowchart of the fault prediction method provided by the present invention;

FIG. 9 is a schematic structural diagram of a fault prediction device provided by the present invention;

FIG. 10 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a fault prediction method. Fig. 1 is a schematic flow chart of a fault prediction method provided by the present invention, as shown in fig. 1, the method includes:

step 110, constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges;

step 120, determining similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph;

and step 130, determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

Specifically, the knowledge graph to be predicted, that is, the knowledge graph on which fault prediction is to be performed, may be constructed from the abnormal index data collected while the system is running when an abnormal event occurs. It takes the network elements related to the abnormal index data as nodes and the calling or deployment relationships between those network elements as edges. A historical fault knowledge graph is a knowledge graph corresponding to a historical fault of a network element; it may be constructed from the index data recorded when the historical fault occurred, or compiled by experts from rule-based experience data. In general, a network element fault occurs only after an abnormal event has persisted for some time, so the embodiment of the present invention can predict faults from the similarity between the knowledge graph to be predicted and each historical fault knowledge graph, allowing timely handling and avoiding unnecessary losses.

After the knowledge graph to be predicted is constructed, the similarity between the knowledge graph to be predicted and each historical fault knowledge graph can be calculated according to the intersection and the union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph, wherein the sub-graph set of the knowledge graph to be predicted is a set formed by sub-graphs of the knowledge graph to be predicted, and the fault sub-graph set of the historical fault knowledge graph is a set formed by sub-graphs of the historical fault knowledge graph. Then, the fault probability corresponding to the knowledge graph to be predicted can be determined according to the similarity between the knowledge graph to be predicted and each historical fault knowledge graph, wherein the fault probability is the prediction probability of the network element in the knowledge graph to be predicted to have faults.

When calculating the similarity, the similarity between the knowledge graph to be predicted and each historical fault knowledge graph can be determined from the intersection and union of the subgraph set and the fault subgraph set alone, or from the intersection and union combined with other information, such as graph information of the subgraphs. Further, the maximum of the similarities between the knowledge graph to be predicted and the historical fault knowledge graphs can be taken as the fault probability corresponding to the knowledge graph to be predicted. In addition, the type of fault likely to occur can be predicted from the fault type of the historical fault knowledge graph with the maximum similarity, and the services likely to be affected can be predicted from the key services corresponding to the faulty network element in that historical fault knowledge graph.
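The selection step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the similarity scores are assumed to have been computed already, and the `HistoricalFault` record, the graph ids and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoricalFault:
    """Hypothetical record describing one historical fault knowledge graph."""
    fault_type: str
    key_services: list[str]


def predict_fault(similarities: dict[str, float],
                  history: dict[str, HistoricalFault]) -> tuple[float, str, list[str]]:
    """Return (fault probability, likely fault type, possibly affected services).

    The fault probability is the maximum similarity over all historical fault
    knowledge graphs; the fault type and key services are read from the
    historical graph that attains that maximum.
    """
    if not similarities:
        raise ValueError("no historical fault knowledge graph to compare against")
    best_id = max(similarities, key=similarities.__getitem__)
    best = history[best_id]
    return similarities[best_id], best.fault_type, best.key_services


# Hypothetical similarities and historical records, for illustration only.
history = {
    "hist-1": HistoricalFault("connection pool exhausted", ["order-service"]),
    "hist-2": HistoricalFault("disk full", ["log-service"]),
}
prob, fault_type, services = predict_fault({"hist-1": 0.72, "hist-2": 0.35}, history)
print(prob, fault_type, services)  # 0.72 connection pool exhausted ['order-service']
```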

It should be noted that, unlike methods that predict faults from correlations among equipment indexes, the embodiment of the present invention generates a knowledge graph to be predicted that carries more comprehensive association information and predicts faults on the basis of that graph, so a more accurate prediction result can be obtained. Moreover, the knowledge graph to be predicted is decomposed into subgraphs and its similarity to each historical fault knowledge graph is analyzed at subgraph granularity, which improves the accuracy of the similarity calculation and, in turn, the accuracy of fault prediction.

According to the method provided by the embodiment of the invention, the knowledge graph to be predicted is constructed, and the similarity between the knowledge graph to be predicted and each historical fault knowledge graph is determined based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph, so that the fault probability corresponding to the knowledge graph to be predicted is determined, the accuracy of fault prediction can be improved, fault early warning is realized, the fault occurrence probability is reduced, the influence on an enterprise is reduced, manual processing is not needed, a large amount of manpower and material resources are saved, and the efficiency of fault prediction is improved.

Based on the above embodiment, to further improve the efficiency of fault prediction, the historical fault knowledge graphs in step 120 may be screened from those stored in a knowledge base according to the nodes in the knowledge graph to be predicted. Specifically, if the knowledge graph to be predicted contains m nodes, [m × n] of them are selected as seed nodes, and the historical fault knowledge graphs containing the seed nodes are selected from the knowledge base. The number of selected historical fault knowledge graphs can be adjusted by adjusting the value of n, which may be set to 0.9, for example.
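A minimal Python sketch of the seed-node screening, under stated assumptions: the text does not say how the [m × n] seed nodes are chosen or whether a historical graph must contain all of them, so this sketch reads [m × n] as rounding down (keeping at least one seed), takes the first nodes in the given order as seeds, and keeps only graphs that contain every seed. `screen_history` and the knowledge-base layout are hypothetical.

```python
import math


def screen_history(nodes_to_predict: list[str],
                   knowledge_base: dict[str, set[str]],
                   n: float = 0.9) -> list[str]:
    """Return ids of historical fault knowledge graphs that contain the seed nodes.

    Assumptions not fixed by the text: [m x n] is read as rounding down (with
    at least one seed), the seeds are simply the first k nodes in the given
    order, and a historical graph qualifies only if it contains every seed.
    """
    m = len(nodes_to_predict)
    k = max(1, math.floor(m * n))
    seeds = set(nodes_to_predict[:k])
    return [graph_id for graph_id, graph_nodes in knowledge_base.items()
            if seeds <= graph_nodes]


# Hypothetical knowledge base: historical graph id -> node set.
kb = {
    "hist-1": {"app", "host", "db"},
    "hist-2": {"app", "vm"},
    "hist-3": {"app", "host", "instance"},
}
print(screen_history(["app", "host", "instance"], kb))  # ['hist-1', 'hist-3']
```

Under the all-seeds reading, a larger n yields more seeds and therefore fewer matching historical graphs.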

Based on any of the above embodiments, step 120 includes:

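determining the same subgraph based on the intersection of the subgraph set of the knowledge graph to be predicted and the fault subgraph set of the historical fault knowledge graph;

and determining the similarity between the knowledge graph to be predicted and the historical fault knowledge graph based on the same subgraph, the weight corresponding to the same subgraph and the union of the subgraph set and the fault subgraph set.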

Specifically, for each historical failure knowledge graph, the similarity between the knowledge graph to be predicted and the historical failure knowledge graph can be calculated in the following manner: firstly, determining the same subgraph in two sets, namely the same subgraph, according to the intersection between the subgraph set of the knowledge graph to be predicted and the fault subgraph set of the historical fault knowledge graph; and then, calculating the similarity between the knowledge graph to be predicted and the historical fault knowledge graph according to the same subgraph, the weight corresponding to the same subgraph and the union set of the subgraph set and the fault subgraph set. Here, the weight corresponding to the same subgraph may be set according to an empirical value or may be obtained through intelligent calculation, which is not specifically limited in the embodiment of the present invention.

For example, take a knowledge graph G1 to be predicted with subgraph set S1 and a historical fault knowledge graph G2 with fault subgraph set S2. If the intersection of the two sets shows that subgraphs s_a and s_b are same subgraphs, the similarity between the knowledge graph to be predicted and the historical fault knowledge graph can be determined from s_a and its corresponding weight, s_b and its corresponding weight, and the union of the subgraph set and the fault subgraph set.

Based on any of the above embodiments, the weight corresponding to the same subgraph is determined based on the following formula:

$$w = \frac{e^{d}}{\sum_{i=1}^{N} e^{d_i}}$$

wherein w is the weight corresponding to the same subgraph, d is the depth of the same subgraph, N is the number of subgraphs in the subgraph set, d_i is the depth of the i-th subgraph, and e is the natural base.

Specifically, in order to further improve the calculation accuracy of the similarity between the knowledge graph to be predicted and the historical fault knowledge graph, the weight corresponding to the same sub-graph in the embodiment of the present invention may be obtained by performing depth calculation on the same sub-graph, and specifically, the following formula may be adopted:

$$w = \frac{e^{d}}{\sum_{i=1}^{N} e^{d_i}}$$

wherein w is the weight corresponding to the same subgraph, d is the depth of the same subgraph, N is the number of subgraphs in the subgraph set of the knowledge graph to be predicted, e is the natural base, and d_i is the depth of the i-th subgraph. It should be noted that, since nodes in a subgraph may carry attributes, when the depth of the same subgraph is calculated in the embodiment of the present invention, both the nodes and their attributes are treated as objects; the length between every two objects is calculated, and the maximum of these lengths is taken as the depth.

For example, FIG. 2 is a first exemplary diagram of the knowledge graph to be predicted provided by the present invention. As shown in FIG. 2, the nodes in the knowledge graph to be predicted are A and B, and a, b and c are attributes of node A. If the same subgraphs determined by the above steps are Aa, AB and bAB, then the depth of the same subgraph Aa is 1 (between A and a), the depth of the same subgraph AB is 1 (between A and B), and the depth of the same subgraph bAB is 2 (between b and B). These depths can be substituted into the above formula to obtain the corresponding weights.
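The depth rule and the weighting can be sketched in Python as follows, assuming the weight takes the softmax form w = e^d / Σ e^{d_i} over the N subgraphs of the knowledge graph to be predicted; this form is consistent with the quantities the text lists (the depth d of the same subgraph, the number of subgraphs N, the depth d_i of the i-th subgraph and the natural base e). Subgraphs are given as edge lists in which attributes appear as extra objects; the function names are hypothetical.

```python
import math
from collections import defaultdict, deque


def depth(edges: list[tuple[str, str]]) -> int:
    """Depth of a subgraph: the largest shortest-path length between any two
    objects, where both nodes and node attributes count as objects.
    Object labels must be unique within the subgraph."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deepest = 0
    for start in list(adj):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        deepest = max(deepest, max(dist.values()))
    return deepest


def depth_weights(depths: dict[str, int]) -> dict[str, float]:
    """Assumed softmax weighting: w_s = e^{d_s} / sum over the N subgraphs of e^{d_i}."""
    total = sum(math.exp(d) for d in depths.values())
    return {name: math.exp(d) / total for name, d in depths.items()}


# Subgraphs of the FIG. 2 example: node A has attributes a, b, c; edge A-B.
subgraphs = {
    "Aa": [("A", "a")], "Ab": [("A", "b")], "Ac": [("A", "c")],
    "AB": [("A", "B")],
    "aAB": [("a", "A"), ("A", "B")],
    "bAB": [("b", "A"), ("A", "B")],
    "cAB": [("c", "A"), ("A", "B")],
}
depths = {name: depth(edges) for name, edges in subgraphs.items()}
weights = depth_weights(depths)
print(depths["Aa"], depths["AB"], depths["bAB"])  # 1 1 2
```

Under this assumed form, deeper same subgraphs receive exponentially larger weights: here a depth-2 subgraph weighs e times as much as a depth-1 subgraph.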

Further, the similarity between the knowledge graph G1 to be predicted and the historical fault knowledge graph G2 can be calculated by the following formula:

$$\mathrm{sim}(G_1, G_2) = \frac{\sum_{s \in S_1 \cap S_2} w_s}{\lvert S_1 \cup S_2 \rvert}$$

where S1 is the subgraph set of G1, S2 is the fault subgraph set of G2, and w_s is the weight of subgraph s. The meaning of the formula is: the subgraph set of G1 is point-multiplied by the weight of each of its subgraphs to form a weight matrix, which is restricted to the intersection of that set with the fault subgraph set of G2. Multiplying each same subgraph by its corresponding weight, summing, and dividing by the union of the subgraph set and the fault subgraph set gives the similarity between the knowledge graph to be predicted and the historical fault knowledge graph.
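The similarity computation can be sketched as a weighted, Jaccard-style score, assuming the numerator is the sum of the weights of the same subgraphs and the denominator is the number of subgraphs in the union; the softmax weighting is likewise an assumption. The fault subgraph set `s2` (including the subgraph `Bd`) and the function names are hypothetical.

```python
import math


def depth_weights(depths: dict[str, int]) -> dict[str, float]:
    """Assumed softmax weighting over the subgraphs of G1."""
    total = sum(math.exp(d) for d in depths.values())
    return {s: math.exp(d) / total for s, d in depths.items()}


def similarity(s1: set[str], s2: set[str], w1: dict[str, float]) -> float:
    """Weighted, Jaccard-style similarity: the summed weights of the same
    subgraphs (S1 ∩ S2) divided by the number of subgraphs in S1 ∪ S2."""
    union = s1 | s2
    if not union:
        return 0.0
    return sum(w1[s] for s in s1 & s2) / len(union)


# G1: the FIG. 2 subgraphs and their depths; G2: a hypothetical fault subgraph set.
depths_g1 = {"Aa": 1, "Ab": 1, "Ac": 1, "AB": 1, "aAB": 2, "bAB": 2, "cAB": 2}
s2 = {"Aa", "AB", "bAB", "Bd"}
w = depth_weights(depths_g1)
sim = similarity(set(depths_g1), s2, w)
print(round(sim, 4))  # 0.0485
```

Here the same subgraphs are Aa, AB and bAB, and the union holds eight subgraphs, so the score is (2·w_Aa + w_bAB) / 8.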

According to the method provided by the embodiment of the invention, the knowledge graph to be predicted is decomposed into the sub-graphs, the depth of the sub-graphs is converted into the sub-graph weight in the similarity calculation process, and the similarity between the knowledge graph to be predicted and each historical fault knowledge graph is analyzed by combining the sub-graph weight, so that the calculation accuracy of the similarity between the knowledge graph to be predicted and the historical fault knowledge graph can be further improved, and the accuracy of fault prediction is further improved.

Based on any of the above embodiments, step 110 includes:

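constructing network element subgraphs of each abnormal network element based on the abnormal events of each abnormal network element;

and constructing a knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements.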

Specifically, the index data of the network element device may be detected in real time during the operation of the system, and when the abnormal index data is detected, the abnormal index data may be classified in groups according to the index type corresponding to the abnormal index data to obtain different types of abnormal events occurring in each abnormal network element, where the types of the abnormal events may include network element index abnormality, alarm, abnormal log, call chain abnormality, slow SQL (Structured Query Language), key service abnormality, and the like, which is not specifically limited in this embodiment of the present invention.

On this basis, a network element subgraph can be constructed for each abnormal network element from the abnormal events occurring on it: the abnormal network element serves as the node of the subgraph, and each type of attribute of the node corresponds to one type of abnormal event. FIG. 3 and FIG. 4 are exemplary diagrams of network element subgraphs of abnormal network elements provided by the present invention. As shown in FIG. 3, the abnormal network element is an instance, and the types of abnormal events occurring on it include abnormal logs, alarms, index anomalies and call chain anomalies. As shown in FIG. 4, the abnormal network element is a host, and the types of abnormal events occurring on it include log error reports and GC (Garbage Collection) count. Then, the network element subgraphs that belong to the same knowledge graph can be determined from the network element subgraphs of the abnormal network elements, and the knowledge graph to be predicted is constructed from them.
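A minimal sketch of building network element subgraphs from grouped abnormal records. The record layout `(network element, event type, detail)`, the function name and the sample records are hypothetical; a real system would derive them from its own index, alarm, log and call-chain data.

```python
from collections import defaultdict


def build_element_subgraphs(
        records: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group abnormal records (network element, event type, detail) into one
    subgraph per abnormal network element: the element is the node and each
    event type becomes one attribute of that node."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for element, event_type, detail in records:
        grouped[element][event_type].append(detail)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {element: dict(attrs) for element, attrs in grouped.items()}


# Hypothetical abnormal records shaped after the FIG. 3 and FIG. 4 examples.
records = [
    ("instance-1", "abnormal log", "NullPointerException in OrderController"),
    ("instance-1", "alarm", "response time above threshold"),
    ("instance-1", "call chain anomaly", "timeout calling payment"),
    ("host-1", "log error report", "disk I/O error"),
    ("host-1", "GC count", "full GC count above threshold"),
]
subgraphs = build_element_subgraphs(records)
print(sorted(subgraphs["host-1"]))  # ['GC count', 'log error report']
```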

Based on any of the above embodiments, constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements comprises:

constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements, the relationship information of the abnormal network elements and the association probabilities among the network element subgraphs of the abnormal network elements.

Specifically, the network element subgraphs of the abnormal network elements may be connected according to the network element subgraphs of the abnormal network elements and the relationship information of the abnormal network elements to obtain an initial abnormal knowledge graph, where the relationship information of the abnormal network elements includes a call relationship, a deployment relationship, and the like related to the abnormal network elements, and the call relationship may be manually arranged or generated by using call chain data, which is not specifically limited in the embodiment of the present invention.

On this basis, algorithms such as frequent subgraph mining can be applied to mine the association probabilities among network element subgraphs centered on different network elements; the network element subgraphs whose association probability is greater than a preset threshold are extracted and merged, finally yielding the knowledge graph to be predicted on which fault prediction is currently to be performed. Here, the association probability characterizes the strength of association between network element subgraphs: the stronger the association between two network element subgraphs, the more likely they correspond to the same fault, so they should be placed in the same knowledge graph for fault prediction.
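The extraction-and-merge step can be sketched with union-find, assuming the association probabilities have already been mined (frequent subgraph mining itself is not implemented here) and that subgraphs linked above the threshold are merged transitively. The threshold 0.6, the element names and the probabilities are hypothetical.

```python
def merge_by_association(elements: list[str],
                         assoc: dict[tuple[str, str], float],
                         threshold: float) -> list[set[str]]:
    """Group network element subgraphs whose pairwise association probability
    exceeds the threshold (transitively), using union-find."""
    parent = {e: e for e in elements}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), prob in assoc.items():
        if prob > threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for e in elements:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


# Hypothetical association probabilities between network element subgraphs.
assoc = {("app", "instance"): 0.8, ("instance", "host"): 0.7, ("host", "db"): 0.2}
groups = merge_by_association(["app", "instance", "host", "db"], assoc, threshold=0.6)
print(sorted(sorted(g) for g in groups))  # [['app', 'host', 'instance'], ['db']]
```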

It should be noted that the knowledge graph to be predicted takes network elements as nodes and the calling or deployment relationships between them as edges; when the relationship information of an abnormal network element involves a non-abnormal network element, the resulting knowledge graph to be predicted also includes a node for that non-abnormal network element. For example, FIG. 5 is a second exemplary diagram of the knowledge graph to be predicted provided by the present invention. As shown in FIG. 5, no abnormal event is mounted on the virtual machine network element, so it is a non-abnormal network element, while the instance, the host and the application are abnormal network elements. From the relationship information of the instance, the host and the application, it can be determined that the application and the host have a direct calling relationship, so the network element subgraph of the host can be connected directly to the network element subgraph of the application. The instance and the application have no direct calling relationship but have a deployment relationship: the instance is deployed on a virtual machine, which is in turn deployed on the application. The network element subgraph of the instance is therefore connected to the network element subgraph of the virtual machine, which is then connected to the network element subgraph of the application.

Further, considering that every network element in a business system is ultimately embodied in an application, when merging network element subgraphs the embodiment of the present invention connects the network element subgraph of each abnormal network element to the network element subgraph of its associated application network element, so that the root node is always an application. For example, in the above example, the instance and the host are abnormal network elements, and their network element subgraphs are ultimately connected to the network element subgraph of the associated application network element.
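The connection rule from the FIG. 5 example can be sketched as a breadth-first search from each abnormal network element along call or deployment relations until an application network element is reached, so that non-abnormal intermediate elements such as the virtual machine end up in the graph. The relation map, the `is_app` predicate and the choice of BFS (shortest path) are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Optional


def path_to_application(start: str,
                        relations: dict[str, list[str]],
                        is_application: Callable[[str], bool]) -> list[str]:
    """Breadth-first search along call/deployment relations from an abnormal
    network element to the nearest application network element. Intermediate
    elements on the path are kept even if they have no abnormal events."""
    prev: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if is_application(cur):
            path = []
            node: Optional[str] = cur
            while node is not None:  # walk back to the start element
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in relations.get(cur, []):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return []  # no application reachable


def is_app(name: str) -> bool:
    return name == "application"


# Relations shaped after the FIG. 5 example (names are hypothetical).
relations = {"instance": ["vm"], "vm": ["application"], "host": ["application"]}
print(path_to_application("instance", relations, is_app))  # ['instance', 'vm', 'application']
print(path_to_application("host", relations, is_app))      # ['host', 'application']
```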

Based on any embodiment, each sub-graph in the sub-graph set of the knowledge graph to be predicted is determined based on the following steps:

decomposing the knowledge graph to be predicted into subgraphs based on the edges and attributes of the nodes in the knowledge graph to be predicted.

Specifically, the knowledge graph to be predicted takes network elements as nodes, and the various types of abnormal events occurring on a network element can be taken as the various types of attributes of the corresponding node. Therefore, the knowledge graph to be predicted can be decomposed, according to its edges and the attributes of each node, into subgraphs with depths from 1 to d, where d is the depth of the knowledge graph to be predicted, while ensuring that each node in each subgraph contains at most one type of attribute. Taking the knowledge graph to be predicted in FIG. 2 as an example, the graph contains nodes A and B, and node A contains three types of attributes a, b and c; extraction yields the subgraphs Aa, Ab, Ac, AB, aAB, bAB and cAB. Neither node in subgraph AB contains an attribute, while node A in each of the other subgraphs contains one type of attribute.
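A simplified Python sketch of the decomposition. It reproduces the FIG. 2 example but, unlike the full method, only enumerates single-node and single-edge subgraphs rather than all depths from 1 to d, and it builds labels by string concatenation as the text does; the function name is hypothetical.

```python
from itertools import product


def decompose(attrs: dict[str, list[str]], edges: list[tuple[str, str]]) -> list[str]:
    """Decompose a graph into subgraphs in which every node carries at most one
    attribute. Simplified sketch: only single-node and single-edge subgraphs
    are enumerated, and labels are formed by concatenation as in the text
    (unambiguous only for single-character names)."""
    subgraphs = []
    for node, node_attrs in attrs.items():
        for attr in node_attrs:
            subgraphs.append(node + attr)          # e.g. "Aa"
    for u, v in edges:
        # Each endpoint carries either no attribute ("") or exactly one.
        for au, av in product([""] + attrs.get(u, []), [""] + attrs.get(v, [])):
            subgraphs.append(au + u + v + av)      # e.g. "AB", "bAB"
    return subgraphs


# FIG. 2 example: node A has attributes a, b, c; node B has none; edge A-B.
print(decompose({"A": ["a", "b", "c"], "B": []}, [("A", "B")]))
# ['Aa', 'Ab', 'Ac', 'AB', 'aAB', 'bAB', 'cAB']
```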

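Based on any of the above embodiments, after step 120 the method further includes:

determining a similar historical fault knowledge graph similar to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph;

and determining the fault prediction time corresponding to the knowledge graph to be predicted based on the historical fault time corresponding to the similar historical fault knowledge graph.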

Specifically, considering that the fault occurrence time cannot be accurately predicted in the existing fault prediction technology, in the embodiment of the invention, according to the similarity between the knowledge graph to be predicted and each historical fault knowledge graph, the historical fault knowledge graph most similar to the knowledge graph to be predicted, namely the similar historical fault knowledge graph, is determined, and then according to the historical fault time corresponding to the similar historical fault knowledge graph, the future fault occurrence time corresponding to the knowledge graph to be predicted, namely the fault prediction time, is determined.

Further, considering that a network element fault usually occurs only after an abnormal event has persisted for some time, the embodiment of the present invention can obtain, from the historical fault time corresponding to the similar historical fault knowledge graph, the time length T₁ from the earliest occurrence of an abnormal event to the network element fault. FIG. 6 is an exemplary diagram of a historical fault occurrence timeline provided by the present invention. As shown in FIG. 6, T₁ is the difference between the fault occurrence time point and the earliest abnormal time, where the time at which the network element device or component failed is taken as the fault occurrence time point.

The interception rule for the earliest abnormal time is as follows. A historical fault involves multiple abnormal events. Suppose a given abnormal event was reported at times t₁, t₂, t₃, ..., t_x, ..., t_n, where t_x > t_(x+1), so that t₁ is the latest reporting time. Scanning backward from t₁, when the interval between two adjacent reporting times is greater than a given threshold, the later of the two reporting times is taken as the earliest occurrence time of that abnormal event in the current anomaly, denoted t_A. Applying this rule to every abnormal event of one historical fault yields earliest occurrence times such as t_A, t_B and t_C; if t_A < t_B < t_C, then t_A is the finally determined earliest abnormal time.
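A minimal Python sketch of the interception rule, assuming report times are plain numbers (for example minutes or epoch seconds) grouped under hypothetical event-type keys. The rule as stated does not cover the case where no gap exceeds the threshold; the sketch assumes the oldest report is then used.

```python
from typing import Dict, List


def earliest_occurrence(report_times: List[float], threshold: float) -> float:
    """Earliest time of the current burst of one abnormal event (t_A in the text).

    report_times must be non-empty; order does not matter.
    """
    times = sorted(report_times, reverse=True)  # t1 > t2 > ... > tn, t1 is the latest report
    for later, earlier in zip(times, times[1:]):
        if later - earlier > threshold:
            return later  # gap too large: the later of the two reports starts the burst
    return times[-1]  # assumption: no large gap, so the oldest report is used


def earliest_anomaly_time(events: Dict[str, List[float]], threshold: float) -> float:
    """Final earliest abnormal time: the minimum over all abnormal events of the fault."""
    return min(earliest_occurrence(times, threshold) for times in events.values())


# Hypothetical report times (minutes) for two abnormal events of one historical fault
history = {"alarm": [100, 95, 92, 40], "slow_sql": [99, 97, 90]}
print(earliest_anomaly_time(history, threshold=10))  # 90
```

Here the alarm burst starts at 92 (the gap from 92 back to 40 exceeds 10), the slow-SQL burst starts at 90, and the smaller of the two is the earliest abnormal time.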

FIG. 7 is an exemplary diagram of a fault occurrence prediction time axis according to the present invention. As shown in FIG. 7, the fault prediction time T can be determined from T₁, the currently detected abnormal time T₀₁, and the current earliest abnormal occurrence time T₀₀ (determined by the interception rule above, which is not repeated here). That is, the corresponding network element in the knowledge graph to be predicted is expected to fail at time T after the current abnormal time T₀₁:

$$T = T_1 - (T_{01} - T_{00})$$
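As a worked example with illustrative numbers (not data from the patent): suppose the similar historical fault occurred 600 minutes after its earliest abnormal time (T₁ = 600) and the current anomalies began 300 minutes ago (T₀₁ - T₀₀ = 300). The formula then gives T = 600 - 300 = 300 minutes. The hypothetical helper below computes the same quantity from raw timestamps.

```python
def fault_prediction_time(hist_fault_time: float, hist_earliest_anomaly: float,
                          current_anomaly_time: float, current_earliest_anomaly: float) -> float:
    """Return T, the predicted delay after the current abnormal time T01."""
    t1 = hist_fault_time - hist_earliest_anomaly               # T1, from the historical fault (FIG. 6)
    elapsed = current_anomaly_time - current_earliest_anomaly  # T01 - T00
    return t1 - elapsed                                        # T = T1 - (T01 - T00)


# Illustrative timestamps in minutes
T = fault_prediction_time(1000, 400, 5300, 5000)
print(T)  # 300
```

A negative result would mean the current anomaly has already lasted longer than the historical lead time; the text does not say how that case is handled.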

Based on any of the above embodiments, FIG. 8 is a second schematic flow chart of the fault prediction method provided by the present invention. As shown in FIG. 8, the method includes the following steps:

S1, constructing network element subgraphs of abnormal network elements

1) Abnormal event extraction for abnormal network elements

The abnormal events of each abnormal network element within a period of time can be extracted from the system. To ensure that the extracted abnormal events belong to the same fault, the abnormal events of each abnormal network element are extracted as follows. Suppose the abnormal network element generated abnormal events at time nodes t₁, t₂, t₃, ..., t_x, ..., t_n, where t_x > t_(x+1) and t₁ is the time node of the most recent abnormal event of the network element. Starting from t₁, the search proceeds backward in time: once the abnormal event at time node t_x has been extracted, if the interval between t_x and t_(x+1) is less than or equal to a given time threshold, the abnormal event at t_(x+1) is also extracted. When the interval between two adjacent abnormal events exceeds the given time threshold, the backward search stops.
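The backward search can be sketched in Python as follows. The AbnormalEvent record and its fields are hypothetical; events are assumed to belong to a single abnormal network element, and events sharing a time node have an interval of zero and are therefore extracted together.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AbnormalEvent:
    kind: str    # hypothetical event type, e.g. "alarm", "slow_sql"
    time: float  # time node at which the event occurred


def extract_current_events(events: List[AbnormalEvent], threshold: float) -> List[AbnormalEvent]:
    """Events of one abnormal network element that belong to the same (current) fault."""
    ordered = sorted(events, key=lambda e: e.time, reverse=True)  # t1 is the latest time node
    if not ordered:
        return []
    extracted = [ordered[0]]
    for newer, older in zip(ordered, ordered[1:]):
        if newer.time - older.time > threshold:
            break  # interval exceeds the threshold: stop searching backward
        extracted.append(older)
    return extracted


events = [AbnormalEvent("alarm", 100), AbnormalEvent("slow_sql", 95),
          AbnormalEvent("log_error", 92), AbnormalEvent("alarm", 40)]
print([e.time for e in extract_current_events(events, threshold=10)])  # [100, 95, 92]
```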

2) Network element subgraph construction of abnormal network elements

The index data of the extracted abnormal events of each abnormal network element is grouped by the index type of the monitored object. This yields the different types of abnormal events (including network element index anomalies, alarms, abnormal logs, call chain anomalies, slow SQL, key service anomalies and the like) that occurred in each abnormal network element within the period, and a network element subgraph of each abnormal network element is constructed from these abnormal events, as shown in FIGS. 3 and 4.

S2, constructing a knowledge graph to be predicted

1) Extraction of relation information of abnormal network elements

First, the deployment relationships between network elements are acquired from the enterprise AMDB (Application Management Database), a database that provides basic data on asset equipment and its deployment relationships. Then, based on the call relationships between network elements in the enterprise IT (Information Technology) system and the deployment relationships acquired from the AMDB, the relationships between the detected abnormal network elements are completed, preliminarily forming an abnormal knowledge graph.
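The grouping of abnormal events into node attributes, together with the completion of relations between abnormal network elements, can be sketched as follows. The input shapes are hypothetical: events are (network element, event type) pairs, and call and deployment relations are (source, target) pairs such as might be read from the IT system and the AMDB. The edge labels "calls" and "deployed_on" are illustrative. The sketch also assumes that only relations whose two endpoints are both abnormal network elements are kept.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_abnormal_graph(
    events: Iterable[Tuple[str, str]],
    call_edges: Iterable[Tuple[str, str]],
    deploy_edges: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Set[Tuple[str, str, str]]]:
    """Return (node attributes, labelled edges) of a preliminary abnormal knowledge graph."""
    attrs: Dict[str, Set[str]] = defaultdict(set)
    for element, kind in events:
        attrs[element].add(kind)  # each abnormal-event type becomes a node attribute
    abnormal = set(attrs)

    edges: Set[Tuple[str, str, str]] = set()
    for relation, pairs in (("calls", call_edges), ("deployed_on", deploy_edges)):
        for src, dst in pairs:
            if src in abnormal and dst in abnormal:  # assumption: keep links between abnormal elements only
                edges.add((src, dst, relation))
    return dict(attrs), edges


nodes, edges = build_abnormal_graph(
    events=[("inst1", "slow_sql"), ("inst1", "alarm"), ("vm1", "cpu_high")],
    call_edges=[("inst1", "inst2")],
    deploy_edges=[("inst1", "vm1")],
)
print(edges)  # {('inst1', 'vm1', 'deployed_on')}
```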

2) To-be-predicted knowledge graph construction

On the basis of step 1), a frequent subgraph mining algorithm is used to mine the association probabilities between network element subgraphs centered on different network elements, and the network element subgraphs whose association probability is greater than a preset threshold are extracted and combined. Finally, the network element subgraph of each abnormal network element is connected to the network element subgraph of the application network element associated with it, so that the root node is always an application. This yields the knowledge graph to be predicted, i.e. the graph on which fault prediction currently needs to be performed, as shown in FIG. 5.
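The frequent subgraph mining step itself (for example an algorithm such as gSpan) is beyond a short example. The sketch below therefore assumes the association probabilities have already been mined and are supplied as a hypothetical dict keyed by pairs of network elements; app_of is likewise a hypothetical lookup from element to application. The sketch keeps the pairs above the threshold and attaches every abnormal network element to its associated application node, so that applications act as roots.

```python
from typing import Dict, Iterable, Set, Tuple


def assemble_prediction_graph(
    abnormal_elements: Iterable[str],
    assoc_prob: Dict[Tuple[str, str], float],
    app_of: Dict[str, str],
    threshold: float,
) -> Set[Tuple[str, str]]:
    """Combine associated element subgraphs and root them at their application nodes."""
    # Keep only element pairs whose (pre-mined) association probability exceeds the threshold.
    edges = {pair for pair, p in assoc_prob.items() if p > threshold}
    # Connect each abnormal element to its associated application, which acts as the root.
    for element in abnormal_elements:
        app = app_of.get(element)
        if app is not None and app != element:
            edges.add((app, element))
    return edges


edges = assemble_prediction_graph(
    abnormal_elements=["inst1", "vm1"],
    assoc_prob={("inst1", "vm1"): 0.8, ("inst1", "db1"): 0.2},
    app_of={"inst1": "app1", "vm1": "app1"},
    threshold=0.5,
)
print(sorted(edges))  # [('app1', 'inst1'), ('app1', 'vm1'), ('inst1', 'vm1')]
```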

S3, fault prediction is carried out based on the knowledge graph to be predicted

First, fault prediction based on the knowledge graph to be predicted

The knowledge graph to be predicted obtained in the above steps is sent to an AI (Artificial Intelligence) analysis center. The AI analysis center applies a graph similarity calculation method: it calculates the similarity between the knowledge graph to be predicted and the historical fault knowledge graphs stored in the knowledge base, determines the top-n most similar historical fault knowledge graphs in the knowledge base (the similar historical fault knowledge graphs), and takes the fault cause of the similar historical fault knowledge graph as the current fault cause. The specific process is as follows:

1) Coarse screening of similar graphs

Taking the knowledge graph to be predicted in FIG. 5 as an example, the graph contains four nodes, including instance, virtual machine and application nodes. [4 × n] of these nodes are selected as seed nodes, and the historical fault knowledge graphs containing the seed nodes are selected from the knowledge base. The number of selected historical fault knowledge graphs is controlled by adjusting n; in the embodiment of the present invention, n is set to 0.9, and K historical fault knowledge graphs containing real faults are selected from the knowledge base.
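A possible reading of the coarse screen, sketched in Python under several assumptions that the text does not fix. A historical fault graph is kept if it contains at least [N × n] of the N nodes of the graph to be predicted. Nodes are compared by hypothetical string labels. The bracket [·] is taken to round down, so 4 × 0.9 gives 3 seed nodes.

```python
import math
from typing import Dict, List, Set


def coarse_screen(query_nodes: Set[str], history: Dict[str, Set[str]], ratio: float = 0.9) -> List[str]:
    """IDs of historical fault graphs sharing at least [len(query_nodes) x ratio] nodes with the query."""
    need = math.floor(len(query_nodes) * ratio)  # assumption: [.] rounds down, so 4 x 0.9 -> 3
    return [gid for gid, nodes in history.items() if len(query_nodes & nodes) >= need]


query = {"app", "inst1", "inst2", "vm"}
history = {
    "fault_1": {"app", "inst1", "vm"},
    "fault_2": {"app", "db"},
    "fault_3": {"app", "inst1", "inst2", "vm", "db"},
}
print(coarse_screen(query, history))  # ['fault_1', 'fault_3']
```

Raising the ratio toward 1 makes the screen stricter and reduces K; lowering it admits more candidate graphs into the more expensive similarity step.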

2) Graph decomposition

The knowledge graphs currently obtained, namely the knowledge graph to be predicted and the K historical fault knowledge graphs selected in step 1), K+1 graphs in total, are decomposed separately. Each graph is decomposed into n sub-graphs of depth 1 to d in which every node carries at most one type of attribute, where d is the maximum depth of the graph.

3) Similarity calculation

Each sub-graph obtained by decomposing the knowledge graph to be predicted is assigned a weight. The weight of the i-th sub-graph is computed from the depth of the i-th sub-graph, the number n of decomposed sub-graphs, and the natural base e.
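The weight formula itself does not survive in this text. One form consistent with the quantities it names (sub-graph depth, the number n of sub-graphs, and e) is a softmax over depths, w_i = e^(d_i) / Σ_{j=1..n} e^(d_j). The sketch below uses that form purely as an assumption; under it, deeper sub-graphs receive larger weights and all weights sum to 1.

```python
import math
from typing import List


def subgraph_weights(depths: List[int]) -> List[float]:
    """Assumed weight form: w_i = exp(d_i) / sum_j exp(d_j) over the n decomposed sub-graphs."""
    m = max(depths)  # subtracting the max avoids overflow and leaves the ratios unchanged
    exps = [math.exp(d - m) for d in depths]
    total = sum(exps)
    return [x / total for x in exps]


# Depths of the seven FIG. 2 sub-graphs: Aa, Ab, Ac (depth 1); AB, aAB, bAB, cAB (depth 2)
weights = subgraph_weights([1, 1, 1, 2, 2, 2, 2])
print([round(w, 3) for w in weights])
```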

The similarity between the knowledge graph G1 to be predicted and a historical fault knowledge graph G2 is calculated with a weighted Jaccard similarity algorithm. In this calculation, the sub-graphs in the intersection of the sub-graph set of G1 and the fault sub-graph set of G2 are weighted by the weights of the corresponding sub-graphs in the sub-graph set of G1, and the weighted intersection is compared with the union of the two sets, as in the Jaccard index.
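A minimal sketch of a weighted Jaccard similarity: the sum of weights over the intersection divided by the sum of weights over the union. It assumes sub-graphs are hashable values and that a caller-supplied weight function is applied to every sub-graph in the union; the text itself only specifies weights for the sub-graphs of G1. With a softmax-style weight e^d / Z, the common normalizer Z cancels in the ratio, so the example uses e^d directly, taking d as the number of nodes in a path-shaped sub-graph.

```python
import math
from typing import Callable, Hashable, Set


def weighted_jaccard(query: Set[Hashable], fault: Set[Hashable],
                     weight: Callable[[Hashable], float]) -> float:
    """Sum of weights over the intersection divided by sum of weights over the union."""
    union = query | fault
    if not union:
        return 0.0
    inter = query & fault
    return sum(weight(g) for g in inter) / sum(weight(g) for g in union)


# Sub-graphs as node-label paths; depth = number of nodes; weight = e**depth (assumption).
g1 = {("A",), ("A", "B"), ("B", "C")}        # sub-graph set of the graph to be predicted
g2 = {("A", "B"), ("B", "C"), ("C", "D")}    # fault sub-graph set of a historical graph
sim = weighted_jaccard(g1, g2, weight=lambda g: math.exp(len(g)))
print(round(sim, 3))  # 0.594, i.e. 2e / (1 + 3e)
```

The two shared depth-2 sub-graphs dominate the score, which reflects the intent of weighting deeper (more specific) structures more heavily.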

With this algorithm, the historical fault knowledge graph in the knowledge base that is most similar to the knowledge graph to be predicted, i.e. the similar historical fault knowledge graph, can be found, and the corresponding similarity is taken as the fault probability of the knowledge graph to be predicted. In practice, because anomaly prediction must be performed efficiently, the calculation is generally parallelized across multiple threads.
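Parallel scoring against the knowledge base might look like the following sketch. It uses Python's standard concurrent.futures.ThreadPoolExecutor with an unweighted Jaccard score as a placeholder similarity; the history dict, its IDs and the top_n parameter are hypothetical. In CPython, threads mainly help when the similarity calls wait on I/O (for example knowledge-base lookups); for purely CPU-bound scoring, ProcessPoolExecutor is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[tuple], b: Set[tuple]) -> float:
    """Placeholder similarity; a weighted version would be used in practice."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_similar(query: Set[tuple], history: Dict[str, Set[tuple]],
                top_n: int = 3, workers: int = 4) -> List[Tuple[str, float]]:
    """Score every historical fault graph in parallel and return the top_n (id, similarity) pairs."""
    ids = list(history)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda gid: jaccard(query, history[gid]), ids))
    ranked = sorted(zip(ids, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


query = {("A",), ("A", "B")}
history = {"f1": {("A",), ("A", "B")}, "f2": {("C",)}, "f3": {("A",), ("C",)}}
best_id, probability = top_similar(query, history)[0]
print(best_id, probability)  # f1 1.0 (the best similarity serves as the fault probability)
```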

Second, prediction of fault occurrence time

The historical fault time of the similar historical fault knowledge graph with high similarity is obtained from the knowledge base. For that historical fault, the earliest occurrence time of each abnormal event is obtained and the final earliest abnormal time is determined; the time at which the network element device or component failed is taken as the fault discovery time, which gives the time required from the earliest occurrence of the abnormal events to the network element fault. The future time of the fault is then predicted from the current abnormal time and the current earliest occurrence time of the abnormal events.

The fault type corresponding to the similar historical fault knowledge graph is taken as the predicted fault type of the network element device in the knowledge graph to be predicted, and the similarity between the knowledge graph to be predicted and the similar historical fault knowledge graph is taken as the fault occurrence probability, i.e. the fault probability. The key services associated with the predicted faulty network element in the similar historical fault knowledge graph are then looked up and reported as the services that may ultimately be affected.

Based on any of the above embodiments, the following problems exist in the conventional failure prediction technology for devices in a service system:

Inaccurate prediction: in the prior art, operation and maintenance personnel mostly predict faults from correlations between device indexes, which yields inaccurate results and cannot predict service faults. Reliance on manual processing: each service system and device generates a large volume of events, alarms, faults and log data; processing them manually consumes substantial manpower and material resources, and the error rate of the processing cannot be guaranteed. Unclear fault typing: most of the prior art analyzes only device logs and key device index data, and cannot show the call relationships among the components of an IT system or the anomalies of those components. In addition, with current data formats, the occurrence time, type, manifestation and impact scope of a fault cannot be accurately predicted.

To solve the above problems, embodiments of the present invention provide a method for predicting key service faults based on an operation and maintenance fault knowledge graph, aimed at the IT cluster systems of large enterprises. The abnormality detection center ingests data such as the system deployment structure, call relationships, operating status and index data in real time. When abnormal index data is detected, it is immediately processed and analyzed to form a knowledge graph to be predicted, which is compared with the historical fault knowledge graphs corresponding to historical faults, and fault prediction is performed on that basis. The knowledge graph to be predicted is a data structure that stores the abnormal events occurring in the system in graph form.

The specific process is as follows. The abnormal events of each abnormal network element within a period of time are extracted from the system, and network element subgraphs of the abnormal network elements are constructed; the knowledge graph to be predicted is then constructed by combining the connection relationships between network elements with the association probabilities mined from the network element subgraphs by a frequent subgraph mining algorithm. The resulting knowledge graph to be predicted is sent to the AI analysis center, which compares it with the historical fault knowledge graphs of faults that previously occurred in the system to predict the faults that may occur and their probability. Finally, the time at which the fault may occur is estimated from the timing data of past faults in the system.

Through the above process, the method provided by the embodiment of the present invention predicts the faults that may occur in the future, their occurrence probability and their occurrence time, thereby providing fault early warning and reducing both the probability of faults and their impact on the enterprise. Because fault prediction for network element devices is based on the graph, more comprehensive fault association information can be used, more accurate prediction results are obtained, and the call relationships among the components of an IT system and the anomalies of those components can be shown.

The following describes the failure prediction apparatus provided by the present invention, and the failure prediction apparatus described below and the failure prediction method described above may be referred to in correspondence with each other.

Based on any of the above embodiments, the present invention provides a failure prediction apparatus. Fig. 9 is a schematic structural diagram of a failure prediction apparatus provided in the present invention, and as shown in fig. 9, the apparatus includes:

a constructing unit 910, configured to construct a knowledge graph to be predicted, where the knowledge graph to be predicted uses network elements as nodes and uses a call relationship or a deployment relationship between the network elements as edges;

a determining unit 920, configured to determine similarity between the knowledge graph to be predicted and each historical failure knowledge graph based on intersection and union of the sub-graph set of the knowledge graph to be predicted and the failure sub-graph set of each historical failure knowledge graph;

and the prediction unit 930 is configured to determine the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

According to the device provided by the embodiment of the invention, the knowledge graph to be predicted is constructed, and the similarity between the knowledge graph to be predicted and each historical fault knowledge graph is determined based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph, so that the fault probability corresponding to the knowledge graph to be predicted is determined, the accuracy of fault prediction can be improved, fault early warning is realized, the fault occurrence probability is reduced, the influence on an enterprise is reduced, manual processing is not needed, a large amount of manpower and material resources are saved, and the efficiency of fault prediction is improved.

Based on any of the above embodiments, the determining unit 920 is configured to:

wherein the weight corresponding to the same sub-graph is determined from the depth of that sub-graph, the number of sub-graphs in the sub-graph set, and the depth of the i-th sub-graph.

Based on any of the above embodiments, the constructing unit 910 includes:

a subgraph construction subunit, configured to construct network element subgraphs of different abnormal network elements based on the abnormal events of the different abnormal network elements;

and a graph construction subunit, configured to construct the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements.

Based on any of the above embodiments, the graph construction subunit is configured to:

Based on any of the above embodiments, the apparatus further comprises a temporal prediction unit configured to:

FIG. 10 illustrates the physical structure of an electronic device. As shown in FIG. 10, the electronic device may include a processor 1010, a communications interface 1020, a memory 1030 and a communication bus 1040, where the processor 1010, the communications interface 1020 and the memory 1030 communicate with one another via the communication bus 1040. The processor 1010 may call logic instructions in the memory 1030 to perform a fault prediction method comprising: constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges; determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph; and determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

Furthermore, the above logic instructions in the memory 1030 can be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the failure prediction method provided by the above methods, the method including: constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges; determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph; and determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for fault prediction provided by the above methods, the method comprising: constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges; determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph; and determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of fault prediction, comprising:

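constructing a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges;

determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph;

and determining the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.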

2. The fault prediction method according to claim 1, wherein the determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph comprises:

3. The failure prediction method of claim 2, wherein the corresponding weight of the same sub-graph is determined based on the following formula:

wherein the weight corresponding to the same sub-graph is determined from the depth of the same sub-graph, the number of sub-graphs in the sub-graph set, and the depth of the i-th sub-graph.

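4. The fault prediction method according to claim 1, wherein the constructing the knowledge graph to be predicted comprises:

constructing network element subgraphs of different abnormal network elements based on the abnormal events of the different abnormal network elements;

and constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements.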

5. The fault prediction method according to claim 4, wherein the constructing the knowledge graph to be predicted based on the network element subgraphs of the abnormal network elements comprises:

6. The fault prediction method according to any one of claims 1 to 5, characterized in that each sub-graph in the set of sub-graphs of the knowledge-graph to be predicted is determined based on the following steps:

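7. The fault prediction method according to any one of claims 1 to 5, wherein the determining the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph further comprises:

determining, based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph, a similar historical fault knowledge graph that is similar to the knowledge graph to be predicted;

and determining the fault prediction time corresponding to the knowledge graph to be predicted based on the historical fault time corresponding to the similar historical fault knowledge graph.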

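8. A fault prediction apparatus, comprising:

a constructing unit, configured to construct a knowledge graph to be predicted, wherein the knowledge graph to be predicted takes network elements as nodes and takes a calling relationship or a deployment relationship among the network elements as edges;

a determining unit, configured to determine the similarity between the knowledge graph to be predicted and each historical fault knowledge graph based on the intersection and union of the sub-graph set of the knowledge graph to be predicted and the fault sub-graph set of each historical fault knowledge graph;

and a prediction unit, configured to determine the fault probability corresponding to the knowledge graph to be predicted based on the similarity between the knowledge graph to be predicted and each historical fault knowledge graph.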

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the fault prediction method of any one of claims 1 to 7 when executing the program.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the fault prediction method according to any one of claims 1 to 7.