CN113900844B - Fault root cause positioning method, system and storage medium based on service code level

Info

Publication number: CN113900844B
Application number: CN202111127982.7A
Authority: CN
Inventors: 沈梦家; 曹立; 隋楷心; 刘大鹏; 王继斌; 张文池; 吴楠; 陈恒茂
Original assignee: Beijing Bishi Technology Co ltd
Current assignee: Beijing Bishi Technology Co ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2024-07-09
Anticipated expiration: 2041-09-26
Also published as: CN113900844A

Abstract

The invention provides a service code level-based fault root cause positioning method, system and storage medium. The method comprises the following steps: constructing a global heterogeneous topological graph comprising calling relations among systems and calling relations among service codes; constructing a time sequence anomaly detection model based on multidimensional indexes, and carrying out anomaly detection on each calling edge of the global heterogeneous topological graph; generating a heterogeneous fault graph based on the anomaly detection result of each calling edge; and performing fault root cause positioning on the obtained heterogeneous fault graph based on a random walk object level ordering algorithm. The heterogeneous topological graph displays the finer-grained service code calling relationships and membership relationships concisely and clearly; fusing the association characteristics of the multidimensional indexes improves the accuracy of index anomaly detection on the calling edges of the heterogeneous topological structure; and the node ordering algorithm of the heterogeneous graph improves the accuracy of fault root cause positioning.

Description

Fault root cause positioning method, system and storage medium based on service code level

Technical Field

The invention relates to fault root cause positioning, in particular to fault root cause positioning based on service code level.

Background

With the rapid development of technologies such as cloud computing and service computing, and with growing business demand, more and more modern enterprises deploy applications and system services in cloud computing environments; such deployments are referred to as distributed cloud applications or micro-services. Compared with the traditional centralized architecture, the distributed architecture offers better component scalability, higher development productivity and lower cost.

To ensure high availability and reliability, application providers must deploy link monitoring systems that collect key performance indicators for each service, such as network response time, service response rate and success rate, so that complex distributed environments can meet availability constraints and stringent service level objectives. However, as service requirements grow more complex and the micro-service scale grows larger, a fault generates a large number of index alarms because of the multiple calling dependencies across systems. Faced with massive alarm index information, a system administrator can hardly identify the key alarm indexes and the corresponding fault root cause system quickly by manual analysis alone. Machine learning algorithms are therefore needed to automatically process and analyze the monitoring index data and the system topological relationships so that the fault root cause system can be located quickly.

However, most of the existing link tracking monitoring systems only collect call relation data between systems, perform fault root cause positioning based on call relation of a system layer, and do not consider key information of service codes of system call, so that the existing scheme is difficult to position to a fine-grained fault root cause problem, and abnormal information is easily hidden due to coarse-grained data aggregation information of the system layer.

In addition, because services are complex and periodic, existing simple anomaly detection strategies based on a fixed threshold or k-sigma produce many false positives or false negatives. For example, an alarm rule such as "the response rate is lower than 90% for more than 3 minutes" does not perform satisfactorily across different services. Moreover, most conventional anomaly detection algorithms trigger alarms from a single index and do not consider the complex dependencies among multiple key performance indexes, so false alarms occur easily; the false alarm rate is even higher when detecting index anomalies on the finer-grained calling edges of a heterogeneous topological structure.

Finally, for data scenarios that combine systems and service codes, current academia and industry mostly analyze call data of a single level, whereas most actual scenarios involve call data of different levels and are more complex. A fault root cause positioning solution that fuses system-level and service-code-level data is therefore needed.

Disclosure of Invention

In order to solve the above problems existing in the prior art, the present invention provides:

A fault root cause positioning method based on service code level mainly comprises the following steps:

S1, constructing a global heterogeneous topological graph comprising calling relations among systems and calling relations among service codes;

S2, constructing a time sequence anomaly detection model based on multidimensional indexes, and carrying out anomaly detection on each calling edge of the global heterogeneous topological graph;

S3, generating a heterogeneous fault graph based on the anomaly detection result of each calling edge;

S4, performing fault root cause positioning on the obtained heterogeneous fault graph based on a random walk object level ordering algorithm.

A fault root cause positioning system based on service code level mainly comprises the following modules:

The global heterogeneous topological graph generation module is used for constructing a global heterogeneous topological graph comprising calling relations among systems and calling relations among service codes;

the anomaly detection module is used for constructing a time sequence anomaly detection model based on multidimensional indexes and carrying out anomaly detection on each call edge of the global heterogeneous topological graph;

The heterogeneous fault diagram generation module is used for generating a heterogeneous fault diagram based on the abnormal detection result of each calling edge;

And the fault root positioning module is used for positioning the fault root of the obtained heterogeneous fault graph based on a random walk object level ordering algorithm.

A storage medium storing a computer program; when the computer program is executed by a processor in a computer device, the computer device performs the method described above.

By constructing the heterogeneous topological graph, the invention displays the finer-grained service code calling relationships and membership relationships concisely and clearly. By fusing the association characteristics of the multidimensional indexes, a time sequence anomaly detection model based on multidimensional indexes is constructed to detect anomalies on the calling edges of the global heterogeneous topological graph; compared with prior-art detection on a single index, which suffers from a high false alarm rate, this improves the accuracy of index anomaly detection on the calling edges of the heterogeneous topological structure. By combining the node ordering algorithm of the heterogeneous graph with machine learning algorithms, the heterogeneous fault graph and the root cause system corresponding to the current alarm are obtained automatically and presented to the system administrator as a visual graph and root cause recommendations for subsequent analysis and processing, which helps the administrator locate the fault source efficiently and improves the accuracy of fault root cause positioning.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the method of the present invention

FIG. 2 is a heterogeneous topological graph of the system and service code calling relationships of the present invention

FIG. 3 shows the time sequence anomaly detection model based on multidimensional indexes

FIG. 4 is a diagram showing the results of the index anomaly detection of the present invention

FIG. 5 is a heterogeneous fault graph of the present invention

FIG. 6 is a visual interface for fault root cause positioning of the present invention

Detailed Description

In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced otherwise than as described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.

Example 1

In order to solve the problems in the prior art, the present embodiment provides a service code level-based fault root cause positioning method, a flowchart of which is shown in FIG. 1. The method mainly includes the following steps:

S1, constructing a global heterogeneous topological graph comprising calling relations among systems and calling relations among service codes.

In order to locate anomalies and root causes at the finer-grained service code level, the invention proposes a graph construction strategy for the mixed relationships between service codes and application systems. In addition, if a system call is forwarded through the enterprise service bus system ESB_F5, the service code calling relationships and the service code membership relationships in the upstream and downstream systems can be obtained by sorting the CMDB service call comparison table. The construction process of the heterogeneous topological graph is described below with actual sample data.

The service monitoring system collects call data and state of service transaction in detail in the log, for example, after the call log at a certain alarm time is analyzed, the call data and state are shown in the following table 1:

TABLE 1 transaction detail data after parsing

It can be seen that the data include the system nodes S1, S2, S3 and S4 and the service code nodes T1 and T2 called by the system nodes S1 and S2. Taking the calling relations among these nodes together, the heterogeneous topological graph containing the calling relations of the system nodes and the service code nodes is constructed as shown in FIG. 2; it reflects the calling relations of the global systems and service codes. Each calling edge carries time sequence index data formed by aggregating the transaction detail data at a set time granularity, and the indexes adopted by the invention comprise: transaction amount, work amount, response amount, failure amount, unresponsive amount, success rate and response time. Compared with the prior art, which only involves calling topological graphs among systems, the heterogeneous topological graph comprising the calling relations among both systems and service codes captures the finer-grained calling relationships and membership relationships of the service codes, and its representation is concise and clear.
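The construction described above can be sketched in Python as follows. This is a minimal illustration only: the record layout, the `build_heterogeneous_graph` helper, the two-hop edge layout (caller system to service code, service code to the system that owns it) and the subset of indexes computed are all assumptions, since Table 1 is not reproduced in this text.

```python
from collections import defaultdict

# Hypothetical parsed transaction records (field layout is an assumption):
# (epoch_seconds, caller_system, service_code, callee_system, success, response_ms)
RECORDS = [
    (0,  "S1", "T1", "S3", True,  120.0),
    (20, "S1", "T1", "S3", False, 900.0),
    (70, "S1", "T1", "S3", True,  110.0),
    (10, "S2", "T2", "S4", True,   80.0),
]


def build_heterogeneous_graph(records, granularity_s=60):
    """Return (node_types, edge_series).

    node_types maps node name -> "system" or "service_code".
    edge_series maps (src, dst) -> {bucket_start: dict of aggregated indexes}.
    """
    node_types = {}
    raw = defaultdict(lambda: defaultdict(list))
    for ts, caller, code, callee, ok, rt in records:
        node_types[caller] = "system"
        node_types[callee] = "system"
        node_types[code] = "service_code"
        bucket = ts - ts % granularity_s  # start of the aggregation window
        # Assumed layout: caller system -> service code -> owning system.
        for edge in ((caller, code), (code, callee)):
            raw[edge][bucket].append((ok, rt))

    edge_series = {}
    for edge, buckets in raw.items():
        edge_series[edge] = {}
        for bucket, calls in sorted(buckets.items()):
            total = len(calls)
            successes = sum(1 for ok, _ in calls if ok)
            edge_series[edge][bucket] = {
                "transaction_amount": total,
                "failure_amount": total - successes,
                "success_rate": successes / total,
                "response_time": sum(rt for _, rt in calls) / total,
            }
    return node_types, edge_series


node_types, edge_series = build_heterogeneous_graph(RECORDS)
```

Each calling edge ends up with one small dictionary of indexes per time bucket, which is the per-edge time sequence that the anomaly detection step consumes.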

Because real business traffic is huge and very complex, the resulting global heterogeneous topological graph tends to be very complex. However, only local business is affected when a fault occurs in the production environment, so the invention performs anomaly detection on the calling edges of the global heterogeneous topological graph to obtain the local heterogeneous topological graph in which the fault occurs.

S2, constructing a time sequence anomaly detection model based on multidimensional indexes, and carrying out anomaly detection on the calling edge of the global heterogeneous topological graph. The time series abnormality detection model based on the multidimensional index is constructed through a graph attention mechanism as shown in fig. 3. S2 specifically comprises the following steps:

S2.1, normalizing the time sequences of the time window corresponding to the n indexes;

Here n represents the number of KPI indexes collected for each calling edge. To take the association characteristics among the indexes into account, the n KPI indexes are represented as nodes, that is, the i-th index corresponds to node v_i. The input features {v_1, v_2, …, v_n} corresponding to the n KPI indexes are obtained by min-max normalization, where node v_i represents the w-dimensional feature vector corresponding to the i-th KPI index and the dimension w equals the length of the time window.
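A minimal sketch of the min-max normalization step, assuming each index window is a plain list of numbers. The helper names and the handling of a constant window (mapped to zeros to avoid division by zero) are assumptions, not part of the source.

```python
def min_max_normalize(window):
    """Scale one index window of length w into the range [0, 1]."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # Assumption: a constant window carries no shape information.
        return [0.0 for _ in window]
    return [(x - lo) / (hi - lo) for x in window]


def normalize_indicators(windows):
    """Normalize n index windows independently; returns n vectors of length w."""
    return [min_max_normalize(w) for w in windows]


# Two hypothetical indexes over a window of w = 4 time points.
features = normalize_indicators([[10, 20, 30, 40], [0.9, 0.9, 0.9, 0.9]])
```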

S2.2, learning the fusion characteristics of the nodes through a graph attention mechanism.

The fusion feature h_i of node v_i is calculated by the following formula:

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\, v_j\Big)$$

where N(i) represents the set of neighbor nodes of node v_i, v_j represents a neighbor node of node v_i (the w-dimensional feature vector corresponding to the j-th index), σ represents the sigmoid activation function, and α_ij represents the association weight of node v_i and node v_j. The association weight α_ij is calculated by the following formulas:

$$e_{ij} = \mathrm{LeakyReLU}\big(W \cdot (v_i \oplus v_j)\big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L} \exp(e_{il})}$$

where e_ij represents the attention value of the edge between node v_i and node v_j, e_il represents the attention value of the edge between node v_i and node v_l, ⊕ represents the feature connection (concatenation) operation, LeakyReLU is an activation function, W represents a learnable parameter matrix, L represents the number of neighbor nodes of node v_i, and l represents the sequence number of a neighbor node of node v_i.

The fusion features of all nodes are calculated in this way and are denoted H_i.
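The graph attention step can be sketched in plain Python as below. It is a sketch under stated assumptions: the learnable parameter W is treated as a single weight vector of length 2w, every index node is connected to every node including itself unless a neighbor map is given, and the LeakyReLU slope of 0.2 is an arbitrary choice. None of these details is confirmed by the source.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fusion(features, weights, neighbors=None):
    """Return (H, alpha) for n index nodes.

    features: n vectors of length w (the normalized index windows).
    weights:  vector of length 2w standing in for the learnable parameter W.
    neighbors: optional dict i -> list of neighbor indices.
    """
    n = len(features)
    if neighbors is None:
        neighbors = {i: list(range(n)) for i in range(n)}
    fused, alpha = [], []
    for i in range(n):
        # Attention value e_ij = LeakyReLU(W . (v_i concat v_j))
        scores = []
        for j in neighbors[i]:
            concat = features[i] + features[j]  # list concatenation
            scores.append(leaky_relu(sum(a * b for a, b in zip(weights, concat))))
        # Softmax over the neighbors of node i (shifted for numerical stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        a_i = [e / total for e in exps]
        alpha.append(a_i)
        # h_i = sigmoid(sum_j alpha_ij * v_j), applied element-wise
        width = len(features[i])
        fused.append([
            sigmoid(sum(a * features[j][k] for a, j in zip(a_i, neighbors[i])))
            for k in range(width)
        ])
    return fused, alpha


H, alpha = attention_fusion([[0.0, 1.0], [1.0, 0.0]], [0.5, -0.5, 0.25, 0.1])
```

The output keeps the n x w shape of the input, which matches the later step where the fused features are concatenated with the original sequences.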

S2.3, learning the embedded features of the time sequences corresponding to the different indexes based on the obtained fusion features H_i of all the nodes.

After learning through the graph attention mechanism, the fusion features H_i of all nodes have an output dimension of n x w. They are concatenated with the original sequence features to obtain features of dimension n x 2w, which are then input into an LSTM module to encode long-term time sequence dependencies, and the embedded features of the time sequences corresponding to the different indexes are obtained by learning.

S2.4, obtaining the predicted values of the time sequences of all indexes at time t based on the obtained embedded features of the time sequences corresponding to the different indexes.

Specifically, the embedded features of all the indexes are input into a multi-layer perceptron (MLP) to obtain the predicted values of all time sequences at time t. The mean square error (MSE) loss function is used as the optimization function:

$$\mathrm{Loss} = \frac{1}{n}\sum_{i=1}^{n}\big(x_i(t) - \hat{x}_i(t)\big)^2$$

where n represents the number of predicted indicators, x_i(t) is the observed value of the i-th index at time t, and \hat{x}_i(t) is its predicted value.
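For reference, the mean square error objective over the n predicted indicators can be written as a few lines of Python. The function name and the list-based inputs are illustrative assumptions.

```python
def mse_loss(actual, predicted):
    """Mean square error between observed and predicted index values at time t."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)  # number of predicted indicators
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


# Two hypothetical indicators: one predicted exactly, one off by 2.
loss = mse_loss([1.0, 2.0], [1.0, 4.0])
```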

S2.5, calculating, based on the obtained predicted values of the time sequences of all the indexes at time t, an anomaly score score_i(t) representing the degree of deviation of each index.

The deviation value of the i-th index is calculated by the following formula:

The deviation value of the index is normalized by the following formula:

where score_i(t) is the anomaly score. Experiments show that this normalization gives the best effect. With the multi-index time sequence anomaly detection model, the degree of deviation of each index can be observed more intuitively.

S2.6, judging whether the calling edge is abnormal based on the obtained anomaly score score_i(t). Specifically, the obtained anomaly score score_i(t) representing the degree of deviation of the index is compared with a preset threshold, and when score_i(t) is larger than the threshold, the detection result of the calling edge is judged to be abnormal. As shown in FIG. 4, red edges indicate anomalies and black edges indicate normal behavior.
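The scoring and thresholding steps might look like the following sketch. The deviation and normalization formulas are not reproduced in this text, so the choices here are assumptions: the deviation is taken as the absolute prediction error, the normalization uses the median and interquartile range of past deviations, and an edge is flagged when any of its indexes exceeds the threshold.

```python
import statistics


def deviation(actual, predicted):
    """Assumed deviation value: absolute prediction error of one index."""
    return abs(actual - predicted)


def normalized_scores(history_devs, current_devs):
    """Normalize each index's current deviation against its own history.

    history_devs: per-index list of past deviation values (at least two each).
    current_devs: per-index deviation value at time t.
    """
    scores = []
    for past, dev in zip(history_devs, current_devs):
        median = statistics.median(past)
        q1, _, q3 = statistics.quantiles(past, n=4)
        iqr = q3 - q1
        # Guard against a zero spread in the history (assumption).
        scores.append((dev - median) / (iqr if iqr > 0 else 1e-9))
    return scores


def edge_is_abnormal(scores, threshold=3.0):
    """Assumed rule: the edge is abnormal if any index score exceeds the threshold."""
    return max(scores) > threshold


history = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]
scores = normalized_scores(history, [deviation(25.0, 5.0)])
abnormal = edge_is_abnormal(scores)
```

The threshold of 3.0 is a placeholder; the source only states that a preset threshold is used.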

Compared with the traditional time sequence anomaly detection method, the time sequence anomaly detection model based on the multi-dimensional indexes, which is constructed by the invention, does not depend on any data distribution assumption, and takes correlation dependence characteristics among the multi-dimensional indexes of service call into consideration, so that anomaly detection is more accurate and efficient.

S3, generating a heterogeneous fault diagram based on the abnormal detection result of each calling edge.

Specifically, based on S2, obtaining an abnormal call edge in the heterogeneous topological graph, filtering call edge data with a normal detection result from the global heterogeneous topological graph, and obtaining a heterogeneous fault graph only showing a fault part. For example, filtering the global heterogeneous topology map of FIG. 2 results in a heterogeneous fault map as shown in FIG. 5.
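A minimal sketch of this filtering step, assuming the detection results are available as a dictionary keyed by calling edge. The names and the sample flags are hypothetical.

```python
def build_fault_graph(edges, abnormal, node_types):
    """Keep only abnormal calling edges and the nodes they touch."""
    fault_edges = [edge for edge in edges if abnormal.get(edge, False)]
    fault_nodes = {node for edge in fault_edges for node in edge}
    return {node: node_types[node] for node in sorted(fault_nodes)}, fault_edges


# Hypothetical global graph and detection results.
NODE_TYPES = {
    "S1": "system", "S2": "system", "S3": "system", "S4": "system",
    "T1": "service_code", "T2": "service_code",
}
EDGES = [("S1", "T1"), ("T1", "S3"), ("S2", "T2"), ("T2", "S4")]
ABNORMAL = {("S1", "T1"): True, ("T1", "S3"): True, ("S2", "T2"): False}

fault_nodes, fault_edges = build_fault_graph(EDGES, ABNORMAL, NODE_TYPES)
```

Edges with no detection result are treated as normal here, which is a design choice of the sketch rather than something the source specifies.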

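S4, performing fault root cause positioning on the obtained heterogeneous fault graph based on a random walk object level ordering algorithm. Specifically, S4 includes the following steps:

S4.1, determining an object set V and an object type set A based on the heterogeneous fault graph generated in S3.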

In particular, the heterogeneous fault graph generated by S3 may be formally represented as G = (V, E), where V and E represent the object set and the relationship set, respectively. Since the heterogeneous graph contains multiple types of objects, an object type mapping function φ: V → A is set, where A represents the set of distinct object types after mapping; multiple different instances of the same type are mapped to their common object type through the mapping function.

S4.2, based on the obtained object type set A, corresponding exception propagation factors are allocated for different object types.

Corresponding anomaly propagation factors are allocated to the different object types based on their importance in the heterogeneous fault graph. Specifically, the anomaly propagation factors of the different object types may be assigned by expert knowledge, or learned from historical data through a combinatorial search optimization algorithm such as simulated annealing.
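As an illustration of the learning option, the sketch below runs a generic simulated annealing search over a dictionary of propagation factors. Everything specific is an assumption: the factors are keyed by (source type, target type) and bounded to [0, 1], the cooling schedule is linear, and the loss function is a stand-in. In practice the loss would be computed from historical faults, for example from the rank of the known root cause.

```python
import math
import random


def anneal_factors(initial, loss, steps=2000, temp0=1.0, seed=0):
    """Search for propagation factors in [0, 1] that minimize `loss`."""
    rng = random.Random(seed)
    current = dict(initial)
    current_loss = loss(current)
    best, best_loss = dict(current), current_loss
    keys = sorted(current)
    for step in range(steps):
        temp = temp0 * (1.0 - step / steps) + 1e-9  # linear cooling
        candidate = dict(current)
        key = rng.choice(keys)
        moved = candidate[key] + rng.uniform(-0.1, 0.1)
        candidate[key] = min(1.0, max(0.0, moved))
        cand_loss = loss(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        accept = cand_loss <= current_loss or rng.random() < math.exp(
            (current_loss - cand_loss) / temp
        )
        if accept:
            current, current_loss = candidate, cand_loss
            if current_loss < best_loss:
                best, best_loss = dict(current), current_loss
    return best, best_loss


def toy_loss(factors):
    """Stand-in objective; a real one would be derived from historical faults."""
    return sum((value - 0.7) ** 2 for value in factors.values())


INITIAL = {("system", "service_code"): 0.1, ("service_code", "system"): 0.9}
best_factors, best_loss = anneal_factors(INITIAL, toy_loss)
```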

Compared with the method in the prior art that the difference of the abnormal propagation among different object types is not considered, the method and the device have the advantages that the difference of the weights of the abnormal propagation among different object types is expressed by setting the abnormal propagation factors among different object types, so that the accuracy and pertinence of the subsequent root cause score calculation are effectively improved.

S4.3, based on the obtained object set V, obtaining the pivot value of each object through iterative computation with the PageRank algorithm, and taking the pivot value as the initial root cause score R_ea of each object.

Where a represents any object in the object set V.
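A standard power-iteration PageRank over the fault graph can be sketched as follows. The damping value, the treatment of dangling nodes and the choice to walk along the calling direction are assumptions; the source only states that PageRank is used to obtain the initial scores.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for dst in out[node]:
                new[dst] += damping * rank[node] / len(out[node])
        delta = sum(abs(new[node] - rank[node]) for node in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical fault graph: S1 calls service code T1, which belongs to S3.
initial_scores = pagerank(["S1", "T1", "S3"], [("S1", "T1"), ("T1", "S3")])
```

With edges pointing from caller to callee, the walk accumulates score on downstream nodes, so the most deeply called object receives the largest initial score in this small example.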

S4.4, determining a root cause score R_x for each object based on the obtained anomaly propagation factors and the initial root cause scores.

Specifically, the root cause score R_x of the object x is obtained by:

where X and Y respectively represent the object set of type X and the object set of type Y in the object type set A, x represents an object in the object set of type X, and y represents an object in the object set of type Y; R_x and R_y represent the root cause scores of object x and object y, respectively; M_XY is an adjacency matrix whose elements are denoted m_xY: if there is a relationship between object x and object type Y, then m_xY = num(x, Y), and otherwise m_xY = 0; num(x, Y) represents the total number of relationships between object x and all objects in the object set of type Y; γ_XY denotes the anomaly propagation factor between object type X and object type Y; and ε represents the attenuation factor, which is selected based on expert knowledge.

By incorporating the object ordering algorithm of the heterogeneous graph, the invention addresses the problem that the initial root cause score does not consider the relations among different object types.
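Because the exact scoring formula is not reproduced in this text, the following is only one plausible reading of the step. It blends each object's initial score with scores propagated from its neighbors, weights the propagation by a type-pair factor, and divides each neighbor's contribution by num(y, X), the number of relationships that neighbor has with objects of the receiving type. The update rule, the undirected treatment of relationships and all names are assumptions.

```python
def root_scores(node_types, edges, initial, gamma, epsilon=0.5, iterations=50):
    """Iteratively combine initial scores with type-weighted neighbor scores.

    gamma maps (type of sending object, type of receiving object) -> factor.
    epsilon weights the initial score against the propagated part.
    """
    neighbors = {node: [] for node in node_types}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def num(obj, obj_type):
        # Number of relationships between obj and objects of the given type.
        return sum(1 for nb in neighbors[obj] if node_types[nb] == obj_type)

    scores = dict(initial)
    for _ in range(iterations):
        new = {}
        for x in node_types:
            propagated = 0.0
            for y in neighbors[x]:
                factor = gamma.get((node_types[y], node_types[x]), 0.0)
                propagated += factor * scores[y] / num(y, node_types[x])
            new[x] = epsilon * initial[x] + (1.0 - epsilon) * propagated
        scores = new
    return scores


TYPES = {"S1": "system", "T1": "service_code", "S3": "system"}
LINKS = [("S1", "T1"), ("T1", "S3")]
START = {"S1": 0.2, "T1": 0.3, "S3": 0.5}
GAMMA = {("system", "service_code"): 1.0, ("service_code", "system"): 0.5}

final_scores = root_scores(TYPES, LINKS, START, GAMMA)
```

Setting `epsilon` to 1.0 reduces the result to the initial scores, which makes it easy to see how much the type-aware propagation changes the ranking.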

S4.5, selecting an object corresponding to the root cause score of the top-K as a fault root cause positioning result based on the obtained root cause score of each object.

Wherein the root score of top-K represents the top K largest root scores.
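Selecting the top-K objects is a simple sort. In this sketch, ties are broken by object name so that the output is deterministic, which is an added convention rather than something the source specifies.

```python
def top_k_root_causes(scores, k):
    """Return the k objects with the largest root cause scores."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical root cause scores for three objects.
candidates = top_k_root_causes({"S1": 0.1, "T1": 0.3, "S3": 0.6}, k=2)
```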

Specifically, the obtained fault root positioning result is displayed in a visual form, as shown in fig. 6, and is provided for a system administrator to refer to.

The heterogeneous topological graph displays the finer-grained service code calling relationships and membership relationships concisely and clearly. Fusing the association characteristics of the multidimensional indexes improves the accuracy of index anomaly detection on the calling edges of the heterogeneous topological structure. Root cause positioning uses the node ordering algorithm of the heterogeneous graph, which considers both the pivot values of anomaly propagation for the objects in the heterogeneous graph and the anomaly propagation factors among different object types. After the system monitoring data pass through this processing framework, the heterogeneous fault graph and the root cause system corresponding to the current alarm are obtained automatically by the machine learning algorithms and presented to the system administrator in visual form and as root cause recommendations for analysis and processing, which helps the administrator locate the fault root cause efficiently and improves the accuracy of fault root cause positioning.

Example 2

The embodiment provides a fault root cause positioning system based on service code level, which mainly comprises the following modules:

The global heterogeneous topological graph generation module is used for constructing a global heterogeneous topological graph comprising call relations among systems and call relations among service codes.

In order to locate anomalies and root causes at the finer-grained service code level, the invention proposes a graph construction strategy for the mixed relationships between service codes and application systems. In addition, if a system call is forwarded through the enterprise service bus system ESB_F5, the service code calling relationships and the service code membership relationships in the upstream and downstream systems can be obtained by sorting the CMDB service call comparison table.

The anomaly detection module is used for constructing a time sequence anomaly detection model based on multidimensional indexes and carrying out anomaly detection on the call edge of the global heterogeneous topological graph. The anomaly detection model is constructed by a graph attention mechanism as shown in fig. 3. The abnormality detection module is used for realizing the following functions:

First, the time sequences of the time window corresponding to the n indexes are normalized.

Here n represents the number of KPI indexes collected for each calling edge. To take the association characteristics among the indexes into account, the n KPI indexes are represented as nodes, that is, the i-th index corresponds to node v_i. The input features {v_1, v_2, …, v_n} corresponding to the n KPI indexes are obtained by min-max normalization, where node v_i represents the w-dimensional feature vector corresponding to the i-th KPI index and the dimension w equals the length of the time window.

The fusion characteristics of different nodes are learned through a graph attention mechanism.

Specifically, the fusion feature h_i of the node v_i is calculated by the following formulas:

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\, v_j\Big), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L} \exp(e_{il})}, \qquad e_{ij} = \mathrm{LeakyReLU}\big(W \cdot (v_i \oplus v_j)\big)$$

where N(i) represents the set of neighbor nodes of node v_i, σ represents the sigmoid activation function, α_ij represents the association weight of node v_i and node v_j, e_ij represents the attention value of the edge between node v_i and node v_j, e_il represents the attention value of the edge between node v_i and node v_l, ⊕ represents the feature connection (concatenation) operation, LeakyReLU is an activation function, W represents a learnable parameter matrix, L represents the number of neighbor nodes of node v_i, and l represents the sequence number of a neighbor node of node v_i. The fusion features of all nodes are calculated in this way and are denoted H_i.

Based on the obtained fusion features H_i of all the nodes, the embedded features of the time sequences corresponding to the different indexes are obtained by learning.

Based on the obtained embedded features of the time sequences corresponding to the different indexes, the predicted values of the time sequences of all indexes at time t are obtained.

In the mean square error loss used for optimization, n represents the number of predicted indicators. With the multi-index time sequence anomaly detection model, the degree of deviation of each index can be observed more intuitively.

Based on the obtained predicted values of the time sequences of all the indexes at time t, an anomaly score score_i(t) representing the degree of deviation of each index is calculated.

The deviation value of the index is normalized by the following formula:

Based on the obtained anomaly score score_i(t), whether the calling edge is abnormal is determined.

Specifically, the obtained anomaly score score_i(t) representing the degree of deviation of the index is compared with a preset threshold, and when score_i(t) is larger than the threshold, the detection result of the calling edge is judged to be abnormal. As shown in FIG. 4, red edges indicate anomalies and black edges indicate normal behavior.

The heterogeneous fault diagram generation module is used for generating a heterogeneous fault diagram based on the abnormal detection result of each calling edge.

Specifically, based on the abnormality detection module, an abnormal call edge in the heterogeneous topological graph is obtained, call edge data with a normal detection result is filtered from the global heterogeneous topological graph, and a heterogeneous fault graph only displaying a fault part is obtained. For example, filtering the global heterogeneous topology map of FIG. 2 results in a heterogeneous fault map as shown in FIG. 5.

The fault root cause positioning module is used for performing fault root cause positioning on the obtained heterogeneous fault graph based on a random walk object level ordering algorithm. Specifically, the fault root cause positioning module is used for realizing the following functions:

An object set V and an object type set A are determined based on the heterogeneous fault graph generated by the heterogeneous fault graph generation module.

In particular, the heterogeneous fault graph generated by the heterogeneous fault graph generation module may be formally represented as G = (V, E), where V and E represent the object set and the relationship set, respectively. Since the heterogeneous graph contains multiple types of objects, an object type mapping function φ: V → A is set, where A represents the set of distinct object types after mapping; multiple different instances of the same type are mapped to their common object type through the mapping function.

Based on the obtained object type set A, corresponding exception propagation factors are allocated for different object types.

Based on the obtained object set V, the pivot value of each object is obtained through iterative calculation with the PageRank algorithm and is used as the initial root cause score R_ea of each object.

Where a represents any object in the object set V.

Based on the obtained anomaly propagation factors and the initial root cause scores, a root cause score R_x is determined for each object.

Specifically, the root cause score R_x of the object x is obtained by:

And selecting an object corresponding to the root cause score of the top-K as a fault root cause positioning result based on the obtained root cause score of each object.

Wherein the root score of top-K represents the top K largest root scores.

Example 3

The present embodiment provides a storage medium storing a computer program; when the computer program is executed by a processor in a computer device, the computer device performs the method described above.

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the embodiments. However, it will be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Thus, the foregoing descriptions of specific embodiments described herein are presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the embodiments to the precise form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the above teachings. Additionally, as used herein to refer to the position of a component, the terms above and below or their synonyms do not necessarily refer to absolute positions relative to external references, but rather to relative positions of components with reference to the figures.

Furthermore, the foregoing figures and description include many concepts and features that can be combined in various ways to achieve various benefits and advantages. Thus, features, components, elements, and/or concepts from the various figures may be combined to produce embodiments or implementations that are not necessarily shown or described in this specification. Furthermore, not all of the features, components, elements, and/or concepts illustrated in the drawings or description may be required in any particular embodiment and/or implementation. It should be understood that such embodiments and/or implementations fall within the scope of the present description.

Claims

1. The fault root cause positioning method based on the service code level is characterized by comprising the following steps:

s2, constructing a time sequence anomaly detection model based on multidimensional indexes, and carrying out anomaly detection on each call edge of the global heterogeneous topological graph;

s4, positioning a fault root cause of the obtained heterogeneous fault map based on a random walk object level ordering algorithm;

wherein step S2 further comprises the following steps:

S2.1, normalizing the time series within the time window corresponding to each of the n indexes;

S2.2, learning the fusion features of the nodes through a graph attention mechanism;

S2.3, learning embedded features of the time series corresponding to the different indexes based on the obtained fusion features h_i of all the nodes;

S2.4, obtaining predicted values of the time series of all indexes at time t based on the obtained embedded features of the time series corresponding to the different indexes;

S2.5, calculating, based on the obtained predicted values of the time series of all indexes at time t, an anomaly score score(t) representing the degree of deviation of the indexes;

S2.6, judging whether the call edge is abnormal based on the obtained anomaly score score(t).
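
Steps S2.1, S2.5 and S2.6 can be illustrated with a minimal sketch. The claim does not state the normalization formula, the score formula or the decision threshold, so min-max normalization, a maximum-absolute-deviation score and the threshold value below are all assumptions, and the function names are hypothetical. The predicted values are taken as given here instead of being produced by the graph attention model of S2.2 to S2.4.

```python
def min_max_normalize(window):
    """S2.1 (assumed form): scale one index's time window to [0, 1]."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # A constant window carries no variation; map it to zeros.
        return [0.0 for _ in window]
    return [(v - lo) / (hi - lo) for v in window]


def anomaly_score(observed, predicted):
    """S2.5 (assumed form): largest absolute deviation over all indexes at time t."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted need one value per index")
    return max(abs(o - p) for o, p in zip(observed, predicted))


def edge_is_abnormal(observed, predicted, threshold=0.3):
    """S2.6: flag the call edge when score(t) exceeds a threshold (value assumed)."""
    return anomaly_score(observed, predicted) > threshold
```

For example, with normalized observations `[0.9, 0.2]` and predictions `[0.5, 0.25]` for two indexes, the largest deviation is about 0.4, so the edge would be flagged under the assumed threshold of 0.3.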

2. The service code level-based fault root cause positioning method of claim 1, wherein each call edge of the global heterogeneous topological graph carries time-series index data generated by aggregating transaction detail data at a set time granularity, the index data comprising a combination of at least two of: transaction amount, success amount, response amount, failure amount, unresponsive amount, success rate, and response time.
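
The aggregation described in this claim can be sketched as follows. The record layout (`caller`, `callee`, `ts`, `success`, `rt_ms`), the 60-second default granularity and the function name are hypothetical; only four of the listed indexes are computed, as an illustration and not as the patented implementation.

```python
from collections import defaultdict


def aggregate_edge_indexes(transactions, granularity_s=60):
    """Aggregate transaction detail records into per-edge, per-window indexes.

    Each record is a dict with hypothetical keys: 'caller', 'callee',
    'ts' (epoch seconds), 'success' (bool) and 'rt_ms' (response time).
    """
    buckets = defaultdict(list)
    for tx in transactions:
        # Align the timestamp to the start of its time window.
        window_start = int(tx["ts"]) // granularity_s * granularity_s
        buckets[(tx["caller"], tx["callee"], window_start)].append(tx)

    series = {}
    for key, txs in sorted(buckets.items()):
        total = len(txs)
        ok = sum(1 for tx in txs if tx["success"])
        series[key] = {
            "transaction_amount": total,
            "failure_amount": total - ok,
            "success_rate": ok / total,
            "response_time_ms": sum(tx["rt_ms"] for tx in txs) / total,
        }
    return series
```

Each key of the returned dictionary identifies one call edge and one time window, so the values for a fixed edge, read in window order, form the multidimensional time series used for anomaly detection.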

3. The service code level-based fault root cause positioning method of claim 1, wherein learning the fusion features of the nodes through a graph attention mechanism in S2.2 comprises:

the fusion feature h_i of node i is calculated by the following formula:

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\, v_j\Big)$$

wherein N(i) represents the set of neighbor nodes of node v_i, v_j represents a neighbor node of node v_i, σ represents the sigmoid activation function, α_ij represents the association weight between node v_i and node v_j, and node v_j is the w-dimensional feature vector corresponding to the j-th KPI index;

the association weight α_ij is calculated by the following formula:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L} \exp(e_{il})}, \qquad e_{ij} = \mathrm{LeakyReLU}\big(W^{\top}(v_i \oplus v_j)\big)$$

wherein e_ij represents the attention value of the call edge between node v_i and node v_j, e_il represents the attention value of the call edge between node v_i and node v_l, ⊕ represents the feature concatenation operation, LeakyReLU is an activation function, W represents a learnable parameter matrix, L represents the number of neighbor nodes of node v_i, and l represents the index of a neighbor node of node v_i.
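
The attention computation described in this claim can be sketched in plain Python, assuming the standard graph attention form: a LeakyReLU score over concatenated features, a softmax over the neighbors, and a sigmoid over the weighted sum. The function names, the LeakyReLU slope and the use of a single weight vector (so that each attention value is a scalar) are assumptions made for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_node(i, features, neighbors, weights):
    """Return (h_i, alpha) for node i.

    features:  list of equal-length lists, one feature vector per KPI index
    neighbors: indices j in N(i)
    weights:   learnable vector of length 2 * dim (assumed shape)
    """
    v_i = features[i]
    # e_ij = LeakyReLU(weights . (v_i concatenated with v_j))
    e = [
        leaky_relu(sum(a * b for a, b in zip(weights, v_i + features[j])))
        for j in neighbors
    ]
    # alpha_ij: softmax of e_ij over the neighbors of i (shifted for stability)
    m = max(e)
    exp_e = [math.exp(x - m) for x in e]
    total = sum(exp_e)
    alpha = [x / total for x in exp_e]
    # h_i = sigmoid(sum_j alpha_ij * v_j), applied element-wise
    h_i = [
        sigmoid(sum(a * features[j][d] for a, j in zip(alpha, neighbors)))
        for d in range(len(v_i))
    ]
    return h_i, alpha
```

With all weights set to zero every neighbor receives the same attention, so h_i reduces to the sigmoid of the plain average of the neighbor vectors; training the weights is what lets strongly correlated indexes contribute more.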

4. The service code level-based fault root cause positioning method of claim 1, wherein S4 comprises the following steps:

S4.1, determining an object set V and an object type set A based on the heterogeneous fault graph generated in step S3;

S4.2, assigning corresponding anomaly propagation factors to the different object types based on the obtained object type set A;

S4.3, based on the obtained object set V, obtaining the pivot value of each object through iterative computation using the PageRank algorithm, as the initial root cause score R_ex of each object;

S4.4, determining the root cause score R_x of each object based on the obtained anomaly propagation factors and the initial root cause scores;

S4.5, selecting the objects corresponding to the top-K root cause scores as the fault root cause positioning result based on the obtained root cause score of each object.
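
Steps S4.3 and S4.5 can be sketched as below: a plain power-iteration PageRank over the fault graph, followed by a top-K selection. The damping factor, the iteration count, the even redistribution of score from objects without outgoing edges, and the reading of an edge as pointing from caller to callee are assumptions; the claim only names the PageRank algorithm.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; edges are (source, target) pairs."""
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def top_k(scores, k):
    """S4.5: objects with the K largest root cause scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On a hypothetical chain `sysA -> sysB -> codeB1`, the object at the end of the abnormal call chain accumulates the highest score, which matches the intuition that the deepest abnormal callee is the most likely root cause.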

5. The service code level-based fault root cause positioning method of claim 4, wherein S4.2 comprises: assigning the anomaly propagation factors by expert knowledge, or by a combinatorial search optimization algorithm based on historical data.
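
One simple reading of a combinatorial search over historical data is an exhaustive grid search that keeps the factor combination ranking the known root cause within the top K most often. The sketch below assumes that reading; the function name, the candidate grid and the `rank_fn` callback are hypothetical, and the claim does not specify the search strategy or the objective.

```python
from itertools import product


def search_propagation_factors(type_pairs, candidates, history, rank_fn, k=1):
    """Exhaustive search over anomaly propagation factor combinations.

    type_pairs: list of (X, Y) object-type pairs that need a factor
    candidates: candidate factor values tried for every pair
    history:    list of (fault_case, true_root_cause) from past incidents
    rank_fn:    callable(factors, fault_case) -> objects ordered by score
    """
    best_factors, best_hits = None, -1
    for combo in product(candidates, repeat=len(type_pairs)):
        factors = dict(zip(type_pairs, combo))
        # Count the historical cases whose true root cause lands in the top K.
        hits = sum(
            1 for case, truth in history if truth in rank_fn(factors, case)[:k]
        )
        if hits > best_hits:
            best_factors, best_hits = factors, hits
    return best_factors, best_hits
```

The search cost grows exponentially with the number of type pairs, which is acceptable only because a heterogeneous fault graph typically has few object types.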

6. The service code level-based fault root cause positioning method of claim 4, wherein S4.4 comprises:

the root cause score R_x of object x is calculated by the following formula:

wherein X and Y respectively represent the object set of type X and the object set of type Y in the object type set A, x represents an object in the object set of type X, and y represents an object in the object set of type Y; R_x and R_y respectively represent the root cause scores of object x and object y, and R_ex is the initial root cause score of object x; M_xY is an adjacency matrix whose elements are denoted m_xY, where m_xY = num(x, Y) if there is a relationship between object x and object type Y, and m_xY = 0 if there is no relationship between object x and object type Y; num(x, Y) represents the total number of relationships between object x and all objects in the object set of type Y; γ_XY denotes the anomaly propagation factor between object type X and object type Y; and ε represents the attenuation factor.
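
The formula itself is not reproduced in this text, so the update rule in the sketch below is an assumption modeled on PopRank-style object-level ranking: each object keeps a share ε of its initial score and receives the remainder from related objects, weighted by the type-level propagation factor and normalized by num(x, Y). The function name, argument layout and iteration count are hypothetical.

```python
def propagate_root_scores(initial, obj_type, relations, gamma,
                          epsilon=0.5, iterations=20):
    """Iterate an assumed PopRank-style update of the root cause scores.

    initial:   {object: R_ex}, initial root cause score
    obj_type:  {object: type name}
    relations: {(x, y): count}, number of relationships from x to y
    gamma:     {(X, Y): anomaly propagation factor between types X and Y}
    epsilon:   attenuation factor weighting the initial score
    """
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for x in initial:
            # m_xY = num(x, Y): relationships between x and all objects of type Y
            m = {}
            for (src, y), count in relations.items():
                if src == x:
                    m[obj_type[y]] = m.get(obj_type[y], 0) + count
            received = 0.0
            for (src, y), count in relations.items():
                if src == x:
                    y_type = obj_type[y]
                    factor = gamma.get((obj_type[x], y_type), 0.0)
                    received += factor * count / m[y_type] * scores[y]
            updated[x] = epsilon * initial[x] + (1.0 - epsilon) * received
        scores = updated
    return scores
```

Setting `epsilon=1.0` returns the initial PageRank-based scores unchanged, while smaller values let the type-dependent propagation factors reshape the ranking.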

7. The service code level-based fault root cause positioning method of claim 4, wherein the top-K root cause scores are the K largest root cause scores; and S4.5 further comprises: displaying the obtained fault root cause positioning result in visual form.

8. A fault root cause positioning system based on the service code level, characterized by mainly comprising the following modules:

a fault root cause positioning module, used for performing fault root cause positioning on the obtained heterogeneous fault graph based on a random walk object level ordering algorithm;

wherein constructing the time-series anomaly detection model based on multidimensional indexes and performing anomaly detection on each call edge of the global heterogeneous topological graph further comprises the following steps:

9. A storage medium, characterized in that it stores a computer program; when the computer program is executed by a processor in a computer device, the computer device performs the method according to any one of claims 1-7.