CN116909788A

CN116909788A - Task-oriented and view-invariant multi-modal fault diagnosis method and system

Info

Publication number: CN116909788A
Application number: CN202310844026.3A
Authority: CN
Inventors: 李兵; 谢帅宇; 王健; 何涵彬
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-10-20

Abstract

The application discloses a task-oriented and view-invariant multi-modal fault diagnosis method and system. The method comprises: constructing an instance dependency graph of a micro-service system based on trace information and obtaining a multi-modal event expression for each instance; randomly deactivating some of the non-root-cause nodes of the instance dependency graph to obtain an augmented data set; performing feature aggregation on the instance dependency graph with a graph neural network to obtain multi-modal graph-level fault expressions, performing task-oriented learning on these expressions, and constructing cross-modal associations; and, based on the graph-level fault expressions, jointly learning the two fault diagnosis tasks of root cause localization and fault classification to obtain the final root cause ranking and the fault type. By deeply mining the hidden relations between the diagnosis tasks and the data of the different modalities, and by constructing cross-modal associations, the application picks out the root cause from a large number of faulty micro-services and identifies the fault type, thereby assisting engineers in fault diagnosis.

Description

Task-oriented and view-invariant multi-modal fault diagnosis method and system

Technical Field

The application relates to the technical field of micro-services in software engineering, and in particular to a task-oriented and view-invariant multi-modal fault diagnosis method and system.

Background

Micro-service architectures are becoming increasingly popular because of their scalability, fast iteration, and similar advantages. A micro-service architecture splits a monolithic application into multiple micro-services according to business logic; each micro-service has multiple instance replicas, and micro-services call one another via HTTP and RPC. However, the complex call relationships in a micro-service system allow a failure to propagate from one micro-service instance to another, so that a large number of micro-services become anomalous at the same time. For example, the Facebook outage of September 2021 lasted approximately 6 hours and caused a loss of about $7980.

Fault diagnosis is an essential task in a micro-service system. When a fault occurs, operation and maintenance staff need to analyze monitoring data of multiple modalities, quickly locate the faulty micro-service instance (for example, the login service), determine the fault type (for example, insufficient memory), and apply a targeted remedy (for example, raising the memory configuration). Much existing research diagnoses faults from a single kind of monitoring data: GIED and DejaVu extract features from historical metrics for fault detection and localization; MicroRank and MicroSketch use the structural information and call details of traces to infer the most probable faulty micro-service; DyCause performs fault diagnosis by extracting the log information of APIs. However, monitoring data of a single modality does not contain comprehensive information. With trace information alone, for example, hardware resource faults are difficult to diagnose.

In recent years, attention has turned to integrating monitoring data of multiple modalities for fault diagnosis. Fusing multi-modal monitoring data is difficult because the data structures of the modalities differ greatly, associations between them are hard to establish, and their time scales are hard to align. Some recent work focuses on fusing multi-modal monitoring data, which broadens the view of fault diagnosis tools. For example, DiagFusion adopts early fusion: during data processing it unifies the multi-modal monitoring data into event features for the subsequent downstream tasks. Eadro adopts intermediate fusion: it extracts high-dimensional features from each modality with a modality-specific neural network and then fuses them.

The present inventors have found that at least the following technical problems exist in the prior art in the process of implementing the present application:

Compared with methods based on single-modality data, which use only part of the useful information, DiagFusion and Eadro perform better in fault diagnosis. However, both perform only simple feature extraction before fusing the multi-modal features: DiagFusion fuses directly at the data level and Eadro fuses at the feature level, and the fused result is fed straight into the downstream diagnosis task. Neither performs targeted training based on the characteristics of the individual modalities and tasks, so fault diagnosis accuracy remains low.

Disclosure of Invention

The invention aims to provide a task-oriented and view-invariant multi-modal fault diagnosis method and system, which solve, or at least partially solve, the technical problem of low fault diagnosis accuracy in the prior art.

In order to solve the technical problems, the technical scheme of the invention is as follows:

A first aspect provides a task-oriented and view-invariant multi-modal fault diagnosis method, comprising:

S1: constructing an instance dependency graph of a micro-service system based on trace information, uniformly extracting and encoding the multi-modal events of each micro-service instance on the instance dependency graph, and obtaining a multi-modal event expression for each micro-service instance, wherein the instance dependency graph comprises nodes and edges, the nodes represent micro-service instances, the edges represent call relationships between micro-service instances, and the nodes comprise root-cause nodes and non-root-cause nodes;

S2: randomly deactivating some of the non-root-cause nodes of the instance dependency graph to obtain an augmented data set;

S3: performing feature aggregation on the instance dependency graph based on a graph neural network and the obtained multi-modal event expressions to obtain multi-modal graph-level fault expressions, and performing task-oriented learning and cross-modal association on the multi-modal graph-level fault expressions;

S4: based on the graph-level fault expressions and the augmented data set, jointly learning the two fault diagnosis tasks of root cause localization and fault classification to obtain the final multi-modal fault diagnosis result, which comprises a root cause ranking and a fault type.

In one embodiment, step S1 includes:

S1.1: capturing the interaction paths between micro-service instances with distributed tracing, obtaining trace information from the collected paths, and constructing the instance dependency graph of the micro-service system based on the trace information;

S1.2: extracting and encoding the abnormal events in the metric sequences, extracting and encoding the abnormal events of each span in the traces, and extracting and encoding the log templates in the logs, wherein a trace records the path of a user request through the micro-service system;

S1.3: encoding the event sequences of the three modalities to obtain the event expressions of the three modalities.

In one embodiment, step S2 obtains the number of non-root-cause nodes to be randomly deactivated according to the following formula:

where $p$ denotes the proportion of nodes that are randomly deactivated, $\mathcal{G}$ denotes the instance dependency graph, and $M$ is the number of non-root-cause nodes that are randomly deactivated.

In one embodiment, step S3 includes:

S3.1: for the $k$-th node $v_k$ of any instance dependency graph $\mathcal{G}$ in the graph list GL, with three modal expressions $(x_k^M, x_k^T, x_k^L)$, three encoders $(E_T, E_M, E_L)$ based on topology adaptive graph neural networks perform feature aggregation on each modal expression to obtain node expressions $(h_k^M, h_k^T, h_k^L)$ fused with neighbor-node information, from which the multi-modal graph-level fault expressions are obtained; here $x_k^M$, $x_k^T$, $x_k^L$ denote the event expressions of node $v_k$ in the metric, trace, and log modalities respectively; $E_T$, $E_M$, $E_L$ are the encoders of the trace, metric, and log modalities respectively; and $h_k^M$, $h_k^T$, $h_k^L$ are the node expressions of $v_k$ in the metric, trace, and log modalities respectively, each aggregating the information of all neighbor nodes within $K$ hops;

S3.2: mining the potential contribution of each modality to a specific task through task-oriented learning;

S3.3: mining view-invariant information by constructing associations between the modalities, the view-invariant information including the abnormal micro-services and the degree of the fault.

In one embodiment, the three modalities are the metric modality, the trace modality, and the log modality, and the graph-level fault expression of the metric modality is obtained as follows:

a TAG graph neural network is used as the encoder to perform feature fusion on the metric modality:

where $x_z^M$ denotes the metric feature of the $z$-th neighbor node in the metric modality, and $W_1$ and $W_2$ denote learnable parameters of the TAG network;

after two graph neural network layers, the graph-level fault expression of the metric modality is obtained through a max-pooling layer:

where $|\mathcal{V}|$ denotes the number of all nodes in the instance dependency graph, and $f^M$ denotes the graph-level fault expression of the metric modality.
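The max-pooling readout described above can be illustrated with a minimal sketch. It assumes that each node already carries an aggregated feature vector (the output of the two encoder layers, which is not reproduced here) and that pooling is an element-wise maximum over all nodes; the function name and the toy vectors are hypothetical.

```python
def max_pool_readout(node_vectors):
    """Element-wise maximum over the feature vectors of all nodes in one graph.

    Assumption: node_vectors holds one equally sized vector per node, as it
    would come out of the graph encoder for a single modality.
    """
    if not node_vectors:
        raise ValueError("graph has no nodes")
    # zip(*...) walks the vectors dimension by dimension.
    return [max(dimension) for dimension in zip(*node_vectors)]


# Toy example: three nodes with two-dimensional features.
node_features = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
f_M = max_pool_readout(node_features)
```

The readout returns a vector whose size does not depend on the number of nodes, which is what lets graphs with different numbers of instances share the same downstream classifier.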

In one embodiment, the loss in step S3.2 is a multi-modal task-oriented loss, expressed as:

where $\mathcal{L}^T$, $\mathcal{L}^M$, $\mathcal{L}^L$ are the task-oriented losses of the trace, metric, and log modalities respectively, and $\mathcal{L}_{TO}$ denotes the multi-modal task-oriented loss;

the loss in S3.3 is a cross-modal loss over the three modalities, expressed as:

where $\mathcal{L}^{M,L}$ denotes the cross-modal contrastive loss between the metric and log modalities, $\mathcal{L}^{M,T}$ denotes the cross-modal contrastive loss between the metric and trace modalities, and $\mathcal{L}_{CM}$ is the cross-modal loss over the three modalities.

In one embodiment, step S4 includes:

S4.1: and adopting a medium-term fusion mode, and fusing multi-mode graph level fault expression during model training:

wherein ,f^M ，f ^T and f^L The method comprises the steps of respectively representing graph level fault expression of an index mode, tracking graph level fault expression of a mode and graph level fault expression of a log mode;

s4.2: the root cause positioning and fault classification tasks are jointly trained by adopting two multi-layer perceptron, and for the root cause positioning task, the difference between the root cause and the true root cause obtained by measuring the cross entropy loss function is selected:

wherein T represents the number of failure samples, N represents the total number of micro-service examples, and y when the root cause of the ith sample is micro-service example m _i,m =1, otherwise y _i,m ＝0；Representing root loss due to positioning task; p is p _i,m The probability that the root cause representing the ith sample is the micro-service instance m;

for the fault classification task, cross entropy loss is selected as an optimization objective:

wherein C represents a total number of failure categories, y when the failure type of the ith sample is type C _i,c =1, otherwise y _i,c ＝0，Showing the failure classification task loss; p is p _i,c A probability that the fault type representing the i-th sample is c;

s4.3: considering the loss of two fault diagnosis tasks, the task direction loss and the cross-modal loss in a unified way, the final optimization objective can be expressed as:

Where a, ss, epsilon represent the weight of each component, respectively,representing a multi-modal task oriented penalty, +.>Is a cross-modal loss.

Based on the same inventive concept, a second aspect of the present invention provides a task oriented and view-invariant multi-modal fault diagnosis system, comprising:

a multi-modal event expression acquisition module, configured to construct an instance dependency graph of a micro-service system based on trace information, uniformly extract and encode the multi-modal events of each micro-service instance on the instance dependency graph, and obtain a multi-modal event expression for each micro-service instance, wherein the instance dependency graph comprises nodes and edges, the nodes represent micro-service instances, the edges represent call relationships between micro-service instances, and the nodes comprise root-cause nodes and non-root-cause nodes;

an augmented data acquisition module, configured to randomly deactivate some of the non-root-cause nodes of the instance dependency graph to obtain an augmented data set;

a graph-level fault expression acquisition module, configured to perform feature aggregation on the instance dependency graph based on a graph neural network and the obtained multi-modal event expressions, obtain multi-modal graph-level fault expressions, and perform task-oriented learning and cross-modal association on the multi-modal graph-level fault expressions;

a multi-modal fault diagnosis module, configured to jointly learn the two fault diagnosis tasks of root cause localization and fault classification based on the graph-level fault expressions and the augmented data set, and obtain the final multi-modal fault diagnosis result, which comprises a root cause ranking and a fault type.

Based on the same inventive concept, a third aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method of the first aspect.

Based on the same inventive concept, a fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the method according to the first aspect when executing said program.

Compared with the prior art, the application has the following advantages and beneficial technical effects:

The application discloses a task-oriented and view-invariant multi-modal fault diagnosis method. First, an instance dependency graph of the micro-service system is constructed based on trace information, and the multi-modal events of each instance on the graph are uniformly extracted and encoded to obtain a multi-modal event expression for each instance. Next, some of the non-root-cause nodes of the instance dependency graph are randomly deactivated to obtain an augmented data set. Then, feature aggregation is performed on the instance dependency graph with a graph neural network to obtain multi-modal graph-level fault expressions, on which task-oriented learning is performed and cross-modal associations are constructed. Finally, based on the graph-level fault expressions, the two fault diagnosis tasks of root cause localization and fault classification are learned jointly to obtain the final root cause ranking and fault type. By deeply mining the hidden relations between the diagnosis tasks and the data of the different modalities, and by constructing cross-modal associations, the application picks out the root cause from a large number of faulty micro-services and identifies the fault type. It thereby addresses the low diagnosis accuracy of existing diagnosers based on multi-modal (trace, metric, and log) monitoring data, which neither fully mine the hidden relations among the modalities nor account for the preference of each diagnosis task for a specific modality, and so improves fault diagnosis accuracy and assists engineers in fault diagnosis.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the task-oriented and view-invariant multi-modal fault diagnosis method disclosed in an embodiment of the present application;

FIG. 2 is an instance dependency graph constructed in an embodiment of the application;

FIG. 3 is a schematic diagram of a metric sequence in an embodiment of the application;

FIG. 4 is an illustration of a log sample in an embodiment of the application;

FIG. 5 is an architecture diagram of the task-oriented and view-invariant multi-modal fault diagnosis system in an embodiment of the application;

FIG. 6 is a schematic diagram of the parameter experiment for the inactivation ratio p in an embodiment of the present application;

FIG. 7 is a schematic diagram of the parameter experiment for the dimension parameter δ in an embodiment of the present application.

Detailed Description

Through extensive research and practice, the inventors have found that most fault diagnosis methods in the prior art are based on single-modality monitoring data, whose limited information prevents a diagnoser from covering all fault scenarios; and that existing diagnosers based on multi-modal (trace, metric, and log) monitoring data do not fully mine the hidden relations among the modalities and ignore the preference of a diagnosis task for a specific modality, so their fault diagnosis accuracy is low.

To solve the above problems, the present application provides a task-oriented and view-invariant multi-modal fault diagnosis method, comprising the following steps: constructing an instance dependency graph of the micro-service system based on trace information, uniformly extracting and encoding the multi-modal events of each instance on the instance dependency graph, and obtaining a multi-modal event expression for each instance; randomly deactivating some of the non-root-cause nodes of the instance dependency graph to obtain an augmented data set; performing feature aggregation on the instance dependency graph with a graph neural network to obtain multi-modal graph-level fault expressions, performing task-oriented learning on these expressions, and constructing cross-modal associations; and, based on the graph-level fault expressions, jointly learning the two fault diagnosis tasks of root cause localization and fault classification to obtain the final root cause ranking and the fault type. By deeply mining the hidden relations between the diagnosis tasks and the data of the different modalities, and by constructing cross-modal associations, the application picks out the root cause from a large number of faulty micro-services and identifies the fault type, thereby assisting engineers in fault diagnosis.

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Example 1

This embodiment discloses a task-oriented and view-invariant multi-modal fault diagnosis method; referring to FIG. 1, the method comprises the following steps:

Specifically, because multi-modal fault diagnosis data sets are scarce and manual annotation is difficult, the training data can be expanded through data augmentation. Monitoring data is sometimes lost during collection (for example, the data of a modality cannot be collected because of network or storage problems). The data augmentation in this embodiment simulates this situation, making the data set as diverse as possible so that the trained model generalizes better.

In one embodiment, step S1 includes:

Referring to FIG. 2, the right side shows the instance dependency graph constructed in an embodiment of the present invention, and the left side shows the trace information of the micro-service system.

In a specific implementation, the extraction and encoding of abnormal events in the metric sequence in step S1.2 may be implemented as follows:

for a given metric sequence (FIG. 3 shows one), outliers are extracted with the 3-sigma anomaly detection algorithm, and the "content" of the event template records the name of the anomalous metric and the anomaly direction, where the anomaly direction is the position of the outlier relative to the upper and lower 3-sigma thresholds. For example, <1684475398705, S1, CPU_usage, +> indicates that the CPU usage is above the upper 3-sigma threshold.
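The 3-sigma extraction of metric events can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes the mean and standard deviation are estimated from a separate fault-free baseline window, and the function name, the `"+"`/`"-"` direction tokens, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def three_sigma_events(instance, metric, baseline, samples):
    """Return <timestamp, instance, metric, direction> events for outliers.

    baseline: metric values from a fault-free period (assumption), used to
              estimate the mean and standard deviation.
    samples:  (timestamp_ms, value) pairs to be checked.
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    upper, lower = mu + 3 * sigma, mu - 3 * sigma
    events = []
    for timestamp, value in samples:
        if value > upper:
            events.append((timestamp, instance, metric, "+"))  # above upper threshold
        elif value < lower:
            events.append((timestamp, instance, metric, "-"))  # below lower threshold
    return events


baseline = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50]
samples = [(1684475398705, 95.0), (1684475399705, 50.5), (1684475400705, 12.0)]
metric_events = three_sigma_events("S1", "CPU_usage", baseline, samples)
```

Estimating the thresholds from a separate baseline keeps a single large outlier from inflating the standard deviation it is compared against.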

Extracting and encoding the abnormal events of each span in a trace may be implemented as follows:

the trace information is shown on the left of FIG. 2. A trace records the path of a user request through the micro-service system. One trace consists of multiple spans; each span is reported by a specific micro-service and records the processing time, the parent node ID, and the ID of the current node. Given trace information is grouped by trace path, outliers among all call latencies of each span on a trace path are extracted with the 3-sigma anomaly detection algorithm, and the "content" of the event template records the callee at the time of the anomaly. For example, <1684475398705, S1, S2> indicates that micro-service instance S1 experienced an anomaly when calling micro-service instance S2.
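Because each span records its own ID and its parent's ID, the call edges of the instance dependency graph can be recovered by joining spans to their parents. The sketch below illustrates that idea under assumed field names (`span_id`, `parent_id`, `instance`), which are hypothetical and not taken from any particular tracing system.

```python
def build_instance_graph(spans):
    """Derive nodes (instances) and directed call edges from trace spans.

    Each span is a dict with hypothetical keys:
      span_id   - ID of the current span
      parent_id - ID of the parent span, or None for the entry span
      instance  - micro-service instance that reported the span
    """
    by_id = {span["span_id"]: span for span in spans}
    nodes = {span["instance"] for span in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # An edge caller -> callee exists when a span's parent ran on another instance.
        if parent is not None and parent["instance"] != span["instance"]:
            edges.add((parent["instance"], span["instance"]))
    return nodes, edges


spans = [
    {"span_id": "a", "parent_id": None, "instance": "S1"},
    {"span_id": "b", "parent_id": "a", "instance": "S2"},
    {"span_id": "c", "parent_id": "b", "instance": "S3"},
    {"span_id": "d", "parent_id": "a", "instance": "S2"},
]
graph_nodes, graph_edges = build_instance_graph(spans)
```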

Extracting and encoding the log templates in the logs may be implemented as follows:

logs are semi-structured data. FIG. 4 shows a log sample, which mainly consists of a static log template (non call service: # as a downstream service) and a dynamically changing message (xxService); the key to extracting log events is mining the log templates. The log templates can be extracted with the widely used Drain log parsing technique, and the "content" of the event template records the ID of the template that occurred. For example, 17asc28ud in <1684475398705, S1, 17asc28ud> is the ID of a log template.

Step S1.3 encodes the event sequences of the three modalities to obtain the event expressions of the three modalities. As shown in FIG. 2, the nodes and edges of the instance dependency graph $\mathcal{G}$ correspond to the micro-service instances of the micro-service system and the call relationships between them. The events are grouped by the instance ID they carry, and each event sequence is attached to the corresponding node as an attribute. An event sequence is treated like a sentence in natural language and each event in it like a word, and a fastText word-embedding model encodes the event sequence of each modality separately to obtain a high-dimensional expression of the events of each modality. Each node $v$ of graph $\mathcal{G}$ then has the attribute:

$X = (x^M, x^T, x^L)$

where $x^M$, $x^T$, $x^L$ denote the high-dimensional expressions of the event sequences of node $v$ in the metric, trace, and log modalities, respectively. Performing this event extraction on every historical fault yields the graph list GL.
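The grouping step that precedes the embedding can be sketched as below. The sketch only covers grouping events by instance ID and modality in time order; the fastText encoding itself is omitted. The explicit modality tag on each event and the content tokens are hypothetical additions made for illustration.

```python
from collections import defaultdict


def group_events_by_instance(events):
    """Group (timestamp, instance_id, modality, content) events per node.

    Returns {instance_id: {"metric": [...], "trace": [...], "log": [...]}},
    with each list ordered by timestamp so it can be read as a "sentence"
    of event "words" by a word-embedding model.
    """
    grouped = defaultdict(lambda: {"metric": [], "trace": [], "log": []})
    for timestamp, instance_id, modality, content in sorted(events, key=lambda e: e[0]):
        grouped[instance_id][modality].append(content)
    return dict(grouped)


events = [
    (1684475398705, "S1", "metric", "CPU_usage+"),
    (1684475398900, "S1", "log", "17asc28ud"),
    (1684475398800, "S1", "trace", "S2"),
    (1684475398600, "S2", "metric", "memory_usage+"),
]
node_attributes = group_events_by_instance(events)
```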

Specifically, consider the nodes of the instance dependency graph $\mathcal{G}$: the non-root-cause micro-service instances can be divided into abnormal non-root-cause instances and normal instances, both of which are non-root-cause nodes. First, losing normal micro-service instances does not affect root cause inference, because existing methods such as MicroRCA and AAMR mostly infer the root cause from the monitoring data of abnormal micro-services. Second, removing part of the data of abnormal non-root-cause instances narrows the scope of root cause localization. Losing some non-root-cause nodes therefore does not affect root cause inference.

FIG. 6 shows the parameter experiment for the inactivation ratio $p$; in this embodiment, $p = 0.2$.

For any instance dependency graph $\mathcal{G}$ in the graph list GL, some of the non-root-cause nodes of $\mathcal{G}$ are randomly deactivated to obtain an augmented graph $\mathcal{G}'$, which achieves data augmentation. The number of non-root-cause nodes to deactivate is obtained from the formula given for step S2; the augmented graph $\mathcal{G}'$ obtained after random deactivation is also added to the graph list GL and shares the same root cause and fault type labels as $\mathcal{G}$.
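The augmentation step can be sketched as follows. Two points are assumptions rather than statements of the method: the number of deactivated nodes is taken as the floor of $p$ times the number of non-root-cause nodes, and "deactivating" a node is modelled as dropping the node together with its incident edges. The function and variable names are hypothetical.

```python
import math
import random


def augment_graph(nodes, edges, root_cause, p=0.2, rng=None):
    """Return a copy of the graph with some non-root-cause nodes deactivated."""
    rng = rng or random.Random()
    candidates = sorted(node for node in nodes if node != root_cause)
    m = math.floor(p * len(candidates))  # assumed form of the count M
    dropped = set(rng.sample(candidates, m))
    kept_nodes = set(nodes) - dropped
    # Edges touching a deactivated node disappear with it (assumption).
    kept_edges = {(u, v) for u, v in edges if u not in dropped and v not in dropped}
    return kept_nodes, kept_edges


nodes = {f"S{i}" for i in range(1, 11)}
edges = {(f"S{i}", f"S{i + 1}") for i in range(1, 10)}
aug_nodes, aug_edges = augment_graph(nodes, edges, root_cause="S3", p=0.2,
                                     rng=random.Random(0))
```

The augmented graph keeps the root-cause node, so it can reuse the root cause and fault type labels of the original graph.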

In one embodiment, step S3 includes:

S3.1: for the $k$-th node $v_k$ of any instance dependency graph $\mathcal{G}$ in the graph list GL, with three modal expressions $(x_k^M, x_k^T, x_k^L)$, three encoders $(E_T, E_M, E_L)$ based on topology adaptive graph neural networks perform feature aggregation on each modal expression to obtain node expressions $(h_k^M, h_k^T, h_k^L)$ fused with neighbor-node information, from which the multi-modal graph-level fault expressions are obtained; here $x_k^M$, $x_k^T$, $x_k^L$ denote the event expressions of node $v_k$ in the metric, trace, and log modalities respectively; $E_T$, $E_M$, $E_L$ are the encoders of the trace, metric, and log modalities respectively; and $h_k^M$, $h_k^T$, $h_k^L$ are the node expressions of $v_k$ in the metric, trace, and log modalities respectively, each aggregating the information of all neighbor nodes within $K$ hops;

the loss in S3.3 is a cross-modal loss of three modes, expressed as:

Specifically, after the graph-level fault expressions of the three modalities are obtained, step S3.2 uses task-oriented learning to mine the potential contribution of each modality to a specific task. Take the trace modality and the root cause localization task as an example: for faults that occur on the same micro-service instance at different times, the collected traces are similar, because the micro-services and fault links affected by the same root cause are similar. The agreement between the trace-modality information of two such faults can therefore be maximized at the expression level. Given a mini-batch of fault samples $S = \{s_1, s_2, \ldots, s_n\}$, the $i$-th fault sample $s_i$ includes the graph-level trace expression $f_i^T$ and the corresponding root cause label $y_i$. The samples that share the root cause of $s_i$ form its positive sample set $P(i) = \{j \mid j \in [1, n],\ y_j = y_i\}$. For the trace modality and the root cause localization task, the corresponding task-oriented loss function $\mathcal{L}^T$ can then be defined as:

where $g(i)$ measures whether the trace expression $f_i^T$ is close to its positive sample set within the mini-batch, and can be expressed as:

where the function $\phi$ computes the degree of similarity of two features; for a given $i$-th feature $f_i^T$ and $j$-th feature $f_j^T$ in the trace modality, their degree of similarity can be computed as:

where $\tau$ is the temperature coefficient, which adjusts the sensitivity to hard samples, and $\mathrm{sim}(\cdot)$ denotes the cosine similarity of two features. $\mathcal{L}^T$ amplifies the effective information shared by the trace modality across the samples of a mini-batch that have the same root cause. Likewise, because the metric modality is widely used in root cause localization, the metric modality can perform task-oriented learning with the root cause localization task; and because the log modality contains rich detail about the system, the log modality can perform task-oriented learning with the fault identification task. The multi-modal task-oriented loss can be expressed as:

where $\mathcal{L}^M$ and $\mathcal{L}^L$ denote the task-oriented losses of the metric and log modalities, respectively.
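The task-oriented loss for one modality can be illustrated with a supervised contrastive loss. The sketch below is one common formulation and should be read as an assumption about the missing formulas: it takes $\phi(a, b) = \exp(\mathrm{sim}(a, b)/\tau)$, excludes a sample from its own positive set, and averages over the samples that have at least one positive. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity sim(a, b) of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def phi(a, b, tau):
    """Assumed similarity degree: exp(sim(a, b) / tau)."""
    return math.exp(cosine(a, b) / tau)


def task_oriented_loss(features, labels, tau=0.1):
    """Supervised contrastive loss over a mini-batch of graph-level features."""
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no other sample shares this label
        denominator = sum(phi(features[i], features[k], tau) for k in range(n) if k != i)
        # g(i): average log-ratio of positive similarity to all other similarities.
        g_i = sum(math.log(phi(features[i], features[j], tau) / denominator)
                  for j in positives) / len(positives)
        total -= g_i
        counted += 1
    return total / counted if counted else 0.0


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = task_oriented_loss(features, [0, 0, 1, 1])     # labels follow the geometry
loss_mismatched = task_oriented_loss(features, [0, 1, 0, 1])   # labels cut across it
```

The loss is small when samples with the same root cause label are already close in the feature space, and large when the labels cut across the clusters.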

In step S3.3, the hidden relations among the multi-modal expressions are mined while task-oriented learning is performed. The different modalities can be regarded as describing the same fault from different views. Although the modalities focus on different levels of the micro-service system, some view-invariant information is present in all of them, such as which micro-services are abnormal and how severe the fault is. Such view-invariant information can therefore be mined by constructing associations between the modalities. Given a mini-batch of fault samples $S = \{s_1, s_2, \ldots, s_n\}$, each fault sample $s_i = (f_i^M, f_i^T, f_i^L)$ contains the graph-level expressions of the three modalities; expressions of the same sample in different modalities form positive pairs, and expressions of different samples form negative pairs. The aim is to pull any two modalities closer together in the feature space. For any two modalities $D_1$ and $D_2$, the learning objective can be expressed as:

where $\ell(f_i^{D_1}, f_i^{D_2})$ is the contrastive loss of the two modal expressions, which can be computed as:

where $\mathbb{1}_{k \neq i} = 1$ if $k \neq i$, and $\mathbb{1}_{k \neq i} = 0$ otherwise. Taking the metric modality as the core modality, the graph-level expressions of the trace and log modalities are pulled toward the graph-level expression of the metric modality, and the cross-modal loss of the three modalities can be expressed as:

where $\mathcal{L}^{M,L}$ denotes the cross-modal contrastive loss between the metric and log modalities, and $\mathcal{L}^{M,T}$ denotes the cross-modal contrastive loss between the metric and trace modalities. Guided by $\mathcal{L}_{CM}$, the view-invariant information shared by the three modalities is strengthened.
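A cross-modal contrastive loss between two modalities can be sketched in the InfoNCE style. The exact form in the source formulas is not recoverable, so the following are assumptions: the positive pair is the same sample seen in the two modalities, the negatives are the other samples of the second modality, and the three-modality loss is the sum of the metric-log and metric-trace terms. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pair_loss(f_a, f_b, tau=0.1):
    """Contrastive loss pulling modality b toward modality a, sample by sample."""
    n = len(f_a)
    loss = 0.0
    for i in range(n):
        positive = math.exp(cosine(f_a[i], f_b[i]) / tau)
        negatives = sum(math.exp(cosine(f_a[i], f_b[k]) / tau)
                        for k in range(n) if k != i)
        loss -= math.log(positive / (positive + negatives))
    return loss / n


def cross_modal_loss(f_metric, f_trace, f_log, tau=0.1):
    """Metric is the core modality; trace and log are pulled toward it."""
    return pair_loss(f_metric, f_log, tau) + pair_loss(f_metric, f_trace, tau)


f_metric = [[1.0, 0.0], [0.0, 1.0]]
f_log = [[0.9, 0.1], [0.1, 0.9]]
aligned = pair_loss(f_metric, f_log)
shuffled = pair_loss(f_metric, [f_log[1], f_log[0]])
total = cross_modal_loss(f_metric, f_log, f_log)
```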

In one embodiment, step S4 includes:

Specifically, step S4.1 fuses the graph-level fault expressions of the three modalities. These expressions serve as the basis for the downstream fault diagnosis tasks, so an intermediate fusion approach is adopted and the multi-modal graph-level fault expressions are fused during model training.

The two downstream fault diagnosis tasks, root cause localization and fault classification, can both be regarded as classification problems. In step S4.2, two multi-layer perceptrons (MLPs) are therefore trained jointly on the two tasks. For the root cause localization task, a cross-entropy loss function measures the difference between the predicted root cause and the true root cause.
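For reference, a cross-entropy loss for each of the two tasks can be written as below, with $T$ fault samples, $N$ micro-service instances, $C$ fault categories, one-hot labels $y$, and predicted probabilities $p$. This is the standard form given as a sketch: the loss names $\mathcal{L}_{RCL}$ and $\mathcal{L}_{FC}$ and the averaging over the $T$ samples are assumptions, not a reproduction of the original formulas.

```latex
\mathcal{L}_{RCL} = -\frac{1}{T}\sum_{i=1}^{T}\sum_{m=1}^{N} y_{i,m}\,\log p_{i,m},
\qquad
\mathcal{L}_{FC} = -\frac{1}{T}\sum_{i=1}^{T}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}
```

Because each $y_{i,\cdot}$ is one-hot, only the log-probability assigned to the true root cause (or true fault type) of each sample contributes to the sum.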

Step S4.3 jointly considers the losses of the two fault diagnosis tasks, the task-oriented loss and the cross-modal loss to obtain the final optimization objective. Because the task-oriented loss and the cross-modal loss are both contrastive losses in nature, their sum can be expressed directly as a single term whose scale (dimension) is adjusted by the parameter δ. Because statically setting the values of α, β and ε requires many trials and considerable domain knowledge, a dynamic weighting method is adopted in which the three weights are adjusted continuously during training, so that the final optimization objective can be rewritten as:

where Γ denotes the set of all tasks to be learned (here, root cause localization and fault classification), γ denotes a single specific task, $\omega_\Gamma$ denotes all learnable parameters of the above fault diagnosis method, the loss term denotes the loss of task γ under the parameters ω, and $c_\gamma$ denotes the learnable weight of a single task γ (i.e., α, β, ε). In this way, a ranked list of root cause candidates and a specific fault type are finally obtained, providing a reference for operators to take the corresponding recovery measures.
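The exact dynamic weighting formula is not reproduced in this text. As one possible illustration of learnable loss weights, the sketch below uses uncertainty-style weighting, in which each loss term $L_\gamma$ is scaled by $c_\gamma = e^{-s_\gamma}$ and a regulariser $s_\gamma$ keeps the weights from collapsing to zero. This is a substituted, well-known technique chosen for illustration; the patent's own rule may differ, and the variable names and the three example loss values are hypothetical.

```python
import math


def weighted_objective(losses, log_vars):
    """Sum over loss terms of exp(-s) * L + s, where exp(-s) plays the role of c_gamma."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def update_log_vars(losses, log_vars, lr=0.1):
    """One gradient-descent step on the weight parameters s.

    d/ds [exp(-s) * L + s] = 1 - exp(-s) * L
    """
    return [s - lr * (1.0 - math.exp(-s) * loss) for loss, s in zip(losses, log_vars)]


# Illustrative values for three loss terms (hypothetical numbers).
losses = [2.0, 0.5, 1.0]
log_vars = [0.0, 0.0, 0.0]
for _ in range(500):
    log_vars = update_log_vars(losses, log_vars)
weights = [math.exp(-s) for s in log_vars]
```

With the losses held fixed, each weight settles at the reciprocal of its loss, so larger loss terms are automatically down-weighted. In real training the losses change at every step and the weights are updated together with the model parameters.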

The inventive points and improvements of the present invention are described below:

1. Multi-modal data encoding (representation) method

Patent document 1 (CN115640159A, a micro-service fault diagnosis method and system) adopts a modality-by-modality learning approach: "the information representation of each modality is learned using a specific model for the characteristics of each modality". The present invention instead extracts and encodes the multi-modal events of each instance on the instance dependency graph in a unified manner to obtain the multi-modal event expression of each instance. Compared with patent document 1, the present invention converts the heterogeneous multi-modal data into unified events and retains only the important event information of the multi-modal data, which reduces the difficulty of multi-modal data fusion while also reducing model complexity, thereby improving the efficiency of feature extraction.

Patent document 2 (CN115309575A, a method, device and equipment for diagnosing micro-service faults based on a graph convolutional neural network) adopts the following scheme: alarm events of the target micro-service to be diagnosed are collected within a preset time period before and after the fault, wherein the alarm events are generated from multi-modal data and include at least index alarm events, log alarm events and call chain alarm events. That is, patent document 2 extracts the multi-modal data as a single alarm event sequence that combines the events of the three modalities (index, log, call chain), which is equivalent to performing multi-modal fusion at the data processing stage. The present scheme instead extracts an event sequence for each of the three modalities, three alarm event sequences in total. The advantage is that a high-dimensional feature expression can be extracted for each modality separately, and multi-modal fusion on these high-dimensional features can achieve better results.

2. Multi-modal fusion method

In patent document 1 (CN115640159A, a micro-service fault diagnosis method and system), the multi-modal representation fusion step concatenates the log representation HL, the KPI representation HK and the trace representation HT obtained in the previous stage into one large vector. The fusion approach of patent document 1 thus simply concatenates the multi-modal representations. On this basis, the scheme of the present invention proposes "task-oriented learning" to strengthen the advantage of a specific modality on a given task, and "constructing cross-modal associations" to mine the view-invariant information shared between modalities. These two techniques make it possible to better fuse and mine the useful information in the multi-modal features.

Patent document 2 (CN115309575A, a method, device and equipment for diagnosing micro-service faults based on a graph convolutional neural network) adopts the following scheme: alarm events of the target micro-service to be diagnosed are collected within a preset time period before and after the fault, wherein the alarm events are generated from multi-modal data. That is, the combined multi-modal events in patent document 2 form a single alarm event sequence, which is equivalent to fusing the modalities at the data processing stage. The present invention extracts the events of the three modalities separately, obtains their high-dimensional features and then fuses these features, and further adopts "task-oriented learning" and "constructing cross-modal associations" to strengthen the multi-modal fusion, so that the hidden information in the high-dimensional features can be fully exploited and the performance of the downstream tasks is enhanced.

3. Data augmentation

The present invention is based on the assumption that missing some non-root-cause nodes does not affect the derivation of the root cause, and obtains an augmented data set by randomly inactivating some non-root-cause nodes on the instance dependency graph. Patent document 2 discloses performing data enhancement on the initial training data set by exchanging at least one sample node vector of a sample micro-service according to the root cause micro-service node label or the micro-service fault type label of the sample micro-service, and then training the event vector generator with the enhanced training data set to obtain the trained event vector generator. The data enhancement in patent document 2 therefore aims at training the event vector generator, that is, at obtaining a better initial feature expression of the events; it neither increases the size of the data set nor alleviates the sparsity of multi-modal fault diagnosis data. The present scheme first proposes, based on domain knowledge, that "missing some non-root-cause nodes does not affect the derivation of the root cause", and then simulates the data loss that may occur in an online environment: a new data sample is obtained by randomly inactivating some non-root-cause nodes, and it shares the same root cause label with the original sample, thereby augmenting the data set.
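The augmentation idea described above can be sketched as follows. The sketch assumes that "inactivating" a node means removing it and its incident edges from the instance dependency graph, and that the number of inactivated nodes is a fixed fraction of the non-root-cause nodes; the formula the patent uses for that number is not reproduced in this text, so `drop_ratio` and the function name are hypothetical.

```python
import random


def augment_by_node_inactivation(nodes, edges, root_cause, drop_ratio=0.2, rng=None):
    """Return a new sample with some non-root-cause nodes inactivated.

    nodes      -- list of instance names in the dependency graph
    edges      -- list of (caller, callee) pairs
    root_cause -- name of the labelled root cause instance (never dropped)
    drop_ratio -- assumed fraction of non-root-cause nodes to inactivate
    """
    rng = rng or random.Random()
    candidates = [n for n in nodes if n != root_cause]
    n_drop = min(len(candidates), int(drop_ratio * len(candidates)))
    dropped = set(rng.sample(candidates, n_drop))
    kept_nodes = [n for n in nodes if n not in dropped]
    kept_edges = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
    # The augmented sample shares the root cause label of the original sample.
    return kept_nodes, kept_edges, root_cause
```

Calling the function several times with different random states yields several augmented graphs per original fault sample, each carrying the original label.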

Fig. 7 is a schematic diagram of a parameter experiment of the dimension parameter δ in the embodiment of the invention.

In order to verify the effectiveness and the beneficial effects of the method proposed by the present invention, specific experimental data are described below.

On the GAIA and AIOps-22 data sets, the method proposed by the present invention (labeled "we") was compared with the existing methods DiagFusion and Eadro on the root cause localization and fault classification tasks. The comparison results are shown in Table 1.

Table 1. Comparison of the method of the invention with the prior art

In addition, an ablation experiment was carried out to verify the contribution of each module; see Table 2 for details.

Table 2. Ablation experiments

In Table 2, AUG denotes the data augmentation strategy, TO denotes task-oriented learning, and CM denotes cross-modal association.

Example II

Based on the same inventive concept, this embodiment discloses a task-oriented and view-invariant multi-modal fault diagnosis system. Referring to Fig. 5, the system includes:

Fig. 5 shows the framework of the task-oriented and view-invariant multi-modal fault diagnosis system, in which event extraction corresponds to the multi-modal event expression acquisition module, data augmentation corresponds to the augmented data acquisition module, multi-modal joint learning corresponds to the graph-level fault expression acquisition module, and fault diagnosis corresponds to the multi-modal fault diagnosis module.

Because the system described in the second embodiment of the present invention is the system used to implement the task-oriented and view-invariant multi-modal fault diagnosis method of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the system, so a detailed description is omitted here. All systems used to implement the method of the first embodiment fall within the scope of protection of the present invention.

Example III

Based on the same inventive concept, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method as described in embodiment one.

Because the computer-readable storage medium described in the third embodiment of the present application is the computer-readable storage medium used to implement the task-oriented and view-invariant multi-modal fault diagnosis method of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the computer-readable storage medium, so a detailed description is omitted here. All computer-readable storage media used to implement the method of the first embodiment fall within the scope of protection of the present application.

Example IV

Based on the same inventive concept, the application also provides a computer device, comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor executes the program to implement the method in the first embodiment.

Because the computer device described in the fourth embodiment of the present application is the computer device used to implement the task-oriented and view-invariant multi-modal fault diagnosis method of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the computer device, so a detailed description is omitted here. All computer devices used to implement the method of the first embodiment fall within the scope of protection of the present application.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims and the equivalents thereof, the present invention is also intended to include such modifications and variations.

Claims

1. A task oriented and view-invariant multi-modal fault diagnosis method, comprising:

2. The task-oriented and view-invariant multi-modal fault diagnosis method according to claim 1, wherein step S1 comprises:

3. The task-oriented and view-invariant multi-modal fault diagnosis method of claim 1, wherein step S2 obtains the number of non-root-cause nodes to be randomly inactivated according to the following formula:

4. The task-oriented and view-invariant multimodal fault diagnosis method of claim 1, wherein step S3 comprises:

S3.1: for the k-th node $v_k$ of any instance dependency graph in the graph list GL, given the expressions of the node in the three modalities, encoders $(E_T, E_M, E_L)$ based on three topology-adaptive graph neural networks are used to perform feature aggregation on each modal expression to obtain node expressions fused with neighbor node information, from which the multi-modal graph-level fault expressions are further obtained; wherein the three modal expressions of node $v_k$ are, respectively, the event expression of the index modality, the event expression of the tracking modality and the event expression of the log modality; $E_T$, $E_M$ and $E_L$ are the encoder of the tracking modality, the encoder of the index modality and the encoder of the log modality, respectively; and the resulting node expressions are, respectively, the node expression of $v_k$ in the index modality that aggregates the information of all neighbor nodes within K hops, the node expression of $v_k$ in the tracking modality that aggregates the information of all neighbor nodes within K hops, and the node expression of $v_k$ in the log modality that aggregates the information of all neighbor nodes within K hops;

5. The task-oriented and view-invariant multi-modal fault diagnosis method of claim 4, wherein the three modalities comprise an index modality, a tracking modality and a log modality, and wherein the graph-level fault expression of the index modality is obtained by:

6. The task-oriented and view-invariant multi-modal fault diagnosis method of claim 4, wherein the loss in step S3.2 is a multi-modal task-oriented loss, expressed as:

the loss in S3.3 is a cross-modal loss of three modes, expressed as:

7. The task-oriented and view-invariant multimodal fault diagnosis method of claim 1, wherein step S4 comprises:

8. A task oriented and view invariant multi-modal fault diagnosis system comprising:

9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the program is executed.