WO2023029654A1

WO2023029654A1 - Fault root cause determination method and device, and storage medium and electronic device

Info

Publication number: WO2023029654A1
Application number: PCT/CN2022/098678
Authority: WO
Inventors: 杜家强; 罗秋野; 杨民凡; 付光荣
Original assignee: 中兴通讯股份有限公司
Priority date: 2021-09-06
Filing date: 2022-06-14
Publication date: 2023-03-09
Also published as: CN115774855A

Abstract

Embodiments of the present invention provide a fault root cause determination method and device, and a storage medium and an electronic device. The method comprises: obtaining current service fault data; on the basis of a pretrained target GCN model, determining a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data; and determining a fault root cause of the current service fault data according to the fault classification result. Therefore, the problems in the related art of low operation and maintenance efficiency and high costs because fault root cause locating depends on the experience and service level of operation and maintenance personnel or service experts are solved; and a fault category classification of the current fault is inferred on the basis of the GCN model, and a fault root cause is determined according to the fault classification result, such that the operation and maintenance costs are reduced while the operation and maintenance efficiency is improved.

Description

Method, device, storage medium and electronic device for determining the root cause of a fault

Cross References to Related Applications

This disclosure is based on Chinese patent application CN202111039384.4, filed on September 6, 2021 and entitled "Method, device, storage medium and electronic device for determining the root cause of a fault", and claims priority to that application, the entire disclosure of which is incorporated herein by reference.

Technical Field

Embodiments of the present disclosure relate to the communication field, and in particular, relate to a method, device, storage medium, and electronic device for determining the root cause of a fault.

Background

Fault alarms in telecommunication networks are characterized by large data volumes and frequent sudden faults. For example, when a network device fails and triggers an alarm, its associated upstream and downstream devices also fail because of the correlation between devices, so that derived alarm information is generated within a short time. The traditional manual method of locating the root cause of a fault depends on the experience of the operation and maintenance personnel and is inefficient. The rule-based reasoning method relies on the accumulation and extraction of rule knowledge, and accumulating rule knowledge is a long-term process. If unsupervised learning is used for rule extraction, business experts still need to identify and confirm the extracted rules, which depends on their experience and business level.

No solution has yet been proposed for the problem in the related art that locating the root cause of a fault depends on the experience and business level of operation and maintenance personnel or business experts, resulting in low operation and maintenance efficiency and high cost.

Summary

Embodiments of the present disclosure provide a method, device, storage medium, and electronic device for determining the root cause of a fault, so as to at least solve the problem in the related art that locating the root cause of a fault depends on the experience and business level of operation and maintenance personnel or business experts, resulting in low operation and maintenance efficiency and high cost.

According to an embodiment of the present disclosure, a method for determining the root cause of a fault is provided, including:

obtaining current service fault data;

determining, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;

determining a fault root cause of the current service fault data according to the fault classification result.

According to another embodiment of the present disclosure, a device for determining the root cause of a fault is also provided, including:

an acquisition module, configured to acquire current service fault data;

a first determining module, configured to determine, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;

a second determining module, configured to determine a fault root cause of the current service fault data according to the fault classification result.

According to yet another embodiment of the present disclosure, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any one of the above method embodiments.

According to yet another embodiment of the present disclosure, an electronic device is also provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.

In the embodiments of the present disclosure, current service fault data is obtained; based on a pre-trained target GCN model, a fault classification result of the current service fault data is determined according to the fault feature data; and the fault root cause of the current service fault data is determined according to the fault classification result. This solves the problem in the related art that locating the root cause of a fault depends on the experience and business level of operation and maintenance personnel or business experts, resulting in low operation and maintenance efficiency and high cost. The fault category of the current fault is inferred based on the GCN model, and the root cause is determined according to the fault classification result, which improves operation and maintenance efficiency and reduces operation and maintenance cost.

Brief Description of the Drawings

FIG. 1 is a block diagram of the hardware structure of a mobile terminal running a method for determining the root cause of a fault according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of a method for determining the root cause of a fault according to an embodiment of the present disclosure;

Fig. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure;

FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on the GCN graph convolutional neural network according to the present embodiment;

FIG. 5 is a schematic diagram of fault sample topology data according to this embodiment;

FIG. 6 is a schematic diagram of a GCN model according to the present embodiment;

Fig. 7 is a block diagram of a fault root cause determination device according to this embodiment.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings and in combination with the embodiments.

It should be noted that the terms "first" and "second" in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

The method embodiments provided in the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal or a similar computing device. Taking running on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal running the method for determining the root cause of a fault according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.

The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the fault root cause determination method in the embodiments of the present disclosure. The processor 102 runs the computer program stored in the memory 104, thereby executing various functional applications and the slicing processing of the service chain address pool, that is, implementing the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. The specific example of the above network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network Interface Controller (NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is used to communicate with the Internet in a wireless manner.

This embodiment provides a method for determining the root cause of a fault that runs on the above-mentioned mobile terminal or network architecture. FIG. 2 is a flowchart of a method for determining the root cause of a fault according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:

Step S202, acquiring current service failure data;

Step S204, based on the pre-trained target GCN model, determine the fault classification result of the current business fault data according to the fault characteristic data corresponding to the current business fault data;

Step S206, determining the fault root cause of the current service fault data according to the fault classification result.

In this embodiment, the above step S204 may specifically include:

S2021. Extract the fault topology data and the fault feature data of the current service fault data;

S2022. Input the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is a set of probabilities corresponding to each fault category. Further, S2022 may specifically include: forming the fault feature data of each node in the fault topology data into a first feature matrix, and forming the connection relationships between nodes in the fault topology data into a first adjacency matrix; determining, according to the first adjacency matrix, a second adjacency matrix including self-connections, wherein both the first adjacency matrix and the second adjacency matrix represent the connection relationships between nodes; determining a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it; and inputting the first feature matrix, the second adjacency matrix and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.

In this embodiment, the above step S206 may specifically include: determining that the fault category corresponding to the maximum probability in the target fault category matrix is the root cause of the fault of the current service fault data.

In an optional embodiment, FIG. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure. As shown in FIG. 3, the process further includes the following steps:

Step S302, extracting a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information corresponding to the fault category of the fault sample feature data;

Further, the above step S302 specifically includes: collecting historical raw data; calculating the distance between service data in the historical raw data based on the connectivity, distance and weight of the fault topology data, the time span and its weight, and the preset rules and their weights; and dividing the service data whose distance is smaller than a preset threshold into the same cluster, to obtain the preset number of historical network fault samples.

Step S304, using the fault sample topology data, the fault sample feature data and the fault label information corresponding to the fault sample feature data to train the original GCN model to obtain the target GCN model, wherein the fault sample topology data and the fault sample feature data are the inputs of the original GCN model, and the fault category output by the trained target GCN model for the fault sample feature data and the fault category actually corresponding to the fault sample feature data satisfy a preset objective function.

Further, the above step S304 may specifically include:

S3041. Form the fault sample feature data of each node in the fault sample topology data into a second feature matrix, and form the connection relationships between nodes in the fault sample topology data into a third adjacency matrix;

S3042. Determine a fourth adjacency matrix including self-connections according to the third adjacency matrix, wherein both the third adjacency matrix and the fourth adjacency matrix represent the connection relationships between nodes;

S3043. Determine a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it;

S3044. Compose the label information corresponding to the fault categories of the fault sample feature data into a fault category matrix;

S3045. Train the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix, and the fault category matrix to obtain the target GCN model.

Further, the above step S3045 may specifically include: determining the weight parameters of the target GCN model in the following manner:

$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} X W^{(0)}\right) W^{(1)}\right)$$

where Z is the fault category matrix, ReLU is the activation function of the first layer of the original GCN model, Softmax is the activation function of the second layer of the original GCN model, X is the second feature matrix, $\tilde{A}$ is the fourth adjacency matrix, $\tilde{D}$ is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;

The target GCN model is determined according to the weight parameters.

FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on the GCN graph convolutional neural network according to this embodiment. As shown in FIG. 4, the scheme comprises a network fault sample extraction module 42, a network GCN model training module 44, a network current fault identification module 46 and a network GCN reasoning module 48.

This embodiment comprehensively considers the telecommunication topology, the fault feature data and the label information. Fault features are extracted automatically through forward propagation and back propagation of the GCN graph convolutional neural network, and the trained GCN model is used to classify the nodes involved in a new fault so as to identify the root cause of the fault.

In the network fault sample extraction module 42, a telecommunication fault is represented as a set of service data that are topologically related in space and close in time, including basic topology, alarm data and performance KPI data. For example, if a transmission interruption alarm occurs at a physical site, it triggers base station outage alarms at downstream sites within a short time, together with some KPI data anomalies; all of these are taken as the service data of this transmission interruption fault.

The distance between service data is calculated based on three dimensions: topological connectivity, distance and weight; time span and weight; and empirical rules and weight. Service data whose distance is smaller than a specific threshold are divided into the same cluster, which is treated as one fault. Each fault is labeled and classified by root cause, using manual methods and automatic methods based on empirical rules. The topology, alarm, performance KPI and label information related to each fault are converted into the output of this module. The data format of the fault samples is described in detail below.
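The clustering step described above can be sketched as follows. This is only an illustrative sketch: the source gives no weights, thresholds or distance formula, so the weight constants, the normalisation limits (`max_hops`, `max_gap_s`), the record fields and the use of union-find to merge records are all assumptions.

```python
from itertools import combinations

# Hypothetical weights for the three dimensions; the source gives no values.
W_TOPOLOGY, W_TIME, W_RULE = 0.5, 0.3, 0.2

def distance(a, b, hops, max_hops=4, max_gap_s=600.0):
    """Weighted distance in [0, 1] between two service records.

    a, b: dicts with 'site', 'time' (seconds) and 'rule' (empirical rule tag).
    hops: dict mapping frozenset({site_a, site_b}) to a hop count; pairs that
    are missing are treated as topologically unconnected.
    """
    if a["site"] == b["site"]:
        topo = 0.0
    else:
        h = hops.get(frozenset((a["site"], b["site"])))
        topo = 1.0 if h is None else min(h / max_hops, 1.0)
    gap = min(abs(a["time"] - b["time"]) / max_gap_s, 1.0)
    rule = 0.0 if a["rule"] == b["rule"] else 1.0
    return W_TOPOLOGY * topo + W_TIME * gap + W_RULE * rule

def cluster(records, hops, threshold=0.4):
    """Put records whose distance is below the threshold in one cluster."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if distance(records[i], records[j], hops) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

records = [
    {"site": 1, "time": 0.0, "rule": "transmission"},
    {"site": 2, "time": 30.0, "rule": "transmission"},
    {"site": 7, "time": 5000.0, "rule": "power"},
]
hops = {frozenset((1, 2)): 1}
clusters = cluster(records, hops)  # lists of record indices, one list per fault
```

Here the first two records are adjacent, close in time and share a rule tag, so they fall into one cluster, while the third record forms its own cluster.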

FIG. 5 is a schematic diagram of fault sample topology data according to this embodiment. As shown in FIG. 5, different types of telecommunication equipment form a network that carries upper-layer voice or data services. The physical site is the basic unit of the network and of its operation and maintenance. Each physical site generally includes three types of equipment: power supply equipment, transmission equipment and communication equipment, and multiple physical sites constitute a telecommunication network. The number in each node is the number of the physical site; a total of 8 physical sites form a directed graph, and each site includes power supply equipment, transmission equipment and communication equipment. The topology is used as an input of GCN model training. It is in JSON format, where each key is the ID of a physical site and its value is the list of IDs of the directly adjacent physical sites.

Fault sample feature data: a physical site is composed of different types of telecommunication equipment. When a fault occurs, the service data belonging to the same fault are extracted and converted, and the key equipment alarms and KPI performance data are combined, with the alarms first and the performance data after them, arranged in the order of power supply equipment, transmission equipment and communication equipment. The format is shown in Table 1.
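The JSON listing for FIG. 5 is not available in this text. As an illustration only, the sketch below parses a hypothetical three-site topology in the described key/value layout into a 0-1 adjacency matrix; the site IDs, the helper name `adjacency_from_json` and the assumption that every neighbor also appears as a key are not from the source.

```python
import json

# Hypothetical topology: key = physical site ID, value = IDs of the directly
# adjacent physical sites (directed edges, as in a directed graph).
topology_json = '{"1": ["2", "3"], "2": ["3"], "3": []}'

def adjacency_from_json(text):
    """Return (site_ids, A) with A[i][j] = 1 if site i lists site j as a neighbor.

    Assumes every neighbor ID also appears as a key of the JSON object.
    """
    topology = json.loads(text)
    site_ids = sorted(topology, key=int)          # fixed node order
    index = {site: i for i, site in enumerate(site_ids)}
    n = len(site_ids)
    A = [[0] * n for _ in range(n)]
    for site, neighbors in topology.items():
        for neighbor in neighbors:
            A[index[site]][index[neighbor]] = 1
    return site_ids, A

site_ids, A = adjacency_from_json(topology_json)
```

Keeping `site_ids` alongside `A` preserves the mapping from matrix rows back to physical sites, which is needed later to report which site a classification row belongs to.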

Table 1

The frequently occurring characteristic alarms are arranged in a fixed order, and each site is marked 1 or 0 according to whether the alarm occurred during the fault. The key KPI indicators are arranged in a specific order. The last column is the root cause label of the node: power fault (PowerFault), transmission fault (TransFault), communication equipment fault (CommunicationFault) or normal node (Normal). Each fault sample includes the topology in JSON format, the feature data in CSV format, and the fault classification labels. Multiple fault samples form the output of this module and the input of the GCN model training module.

Network GCN training module 44: graph convolution performs deep learning on graph data and extracts features from it, so that these features can be used to classify the nodes of the graph.
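Since Table 1 is not available in this text, the following sketch uses a hypothetical CSV layout to show how such a sample could be turned into a feature matrix and a one-hot label matrix. Only the four label names come from the description; the column names, values and the helper `load_sample` are assumptions.

```python
import csv
import io

# Label names as given in the description, in a fixed order.
CATEGORIES = ["PowerFault", "TransFault", "CommunicationFault", "Normal"]

# Hypothetical sample: alarm flags (0/1) first, then a KPI value, label last.
sample_csv = """site,power_alarm,trans_alarm,comm_alarm,kpi_drop_rate,label
1,0,1,0,0.35,TransFault
2,0,0,1,0.80,Normal
"""

def load_sample(text):
    """Return (X, Y): per-node feature rows and one-hot label rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    feature_cols = [c for c in rows[0] if c not in ("site", "label")]
    X = [[float(r[c]) for c in feature_cols] for r in rows]
    Y = [[1 if r["label"] == c else 0 for c in CATEGORIES] for r in rows]
    return X, Y

X, Y = load_sample(sample_csv)
```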

Suppose the graph data has N nodes, each with its own features. The node features form an N×D matrix X, and the relationships between nodes form an N×N matrix A, known as the adjacency matrix. X and A are the inputs of the GCN model. In this network model, the propagation rule between layers is as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{1}$$

Adjacency matrix A: represents the connection relationships between nodes and is assumed here to be a 0-1 matrix, where $\tilde{A} = A + I$ denotes the adjacency matrix including self-connections. The adjacency matrix $\tilde{A}$ corresponding to the topology of FIG. 5 is: [matrix not reproduced in this text]

Degree matrix $\tilde{D}$: the degree of each node is the number of nodes it is connected to. $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The degree matrix corresponding to FIG. 5 is: [matrix not reproduced in this text]

Feature matrix X: corresponds to $H^{(0)}$ in the formula and represents the node features, $X \in \mathbb{R}^{N \times D}$, where D is the feature dimension: [matrix not reproduced in this text]
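As a small worked illustration of the adjacency matrix with self-connections, $\tilde{A} = A + I$, and the degree matrix $\tilde{D}$ with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, consider a three-node chain 1-2-3 with undirected links. This toy graph and the symmetric normalization shown are assumptions chosen for brevity, not the FIG. 5 topology:

```latex
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
\tilde{A} = A + I = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad
\tilde{D} = \operatorname{diag}(2, 3, 2)

\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} =
\begin{pmatrix}
\tfrac{1}{2} & \tfrac{1}{\sqrt{6}} & 0 \\
\tfrac{1}{\sqrt{6}} & \tfrac{1}{3} & \tfrac{1}{\sqrt{6}} \\
0 & \tfrac{1}{\sqrt{6}} & \tfrac{1}{2}
\end{pmatrix}
```

Each entry is $\tilde{A}_{ij} / \sqrt{\tilde{D}_{ii}\tilde{D}_{jj}}$, so a node with many neighbors contributes less weight per link than a node with few.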

FIG. 6 is a schematic diagram of the GCN model according to this embodiment. As shown in FIG. 6, the GCN model is a two-layer GCN, and its formula is as follows:

$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} X W^{(0)}\right) W^{(1)}\right) \tag{2}$$

Each GCN layer performs transformation, aggregation and activation.

Transformation: transform and learn the current node features, here by the multiplication rule XW;

Aggregation: aggregate the features of the nodes in the neighborhood to obtain the new features of the node, here by the simple addition rule AX;

Activation: the activation function adds nonlinearity; the activation function of the first layer is ReLU, and the activation function of the second layer is Softmax.
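The transformation, aggregation and activation steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the helper names, the symmetric normalization of the self-connected adjacency matrix by the degree matrix (the commonly used GCN propagation rule), the toy three-node graph and all weight values are hypothetical.

```python
import math

def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def normalize(A):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a 0-1 matrix A with zero diagonal."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # diagonal of the degree matrix
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN: transform (X W), aggregate (A_hat ...), activate."""
    A_hat = normalize(A)
    H1 = relu(matmul(A_hat, matmul(X, W0)))             # layer 1, ReLU
    return softmax_rows(matmul(A_hat, matmul(H1, W1)))  # layer 2, Softmax

# Toy inputs: 3 nodes, 2 features, 3 hidden units, 4 fault categories.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
W1 = [[0.2, -0.1, 0.4, 0.0], [0.1, 0.3, -0.2, 0.5], [-0.3, 0.2, 0.1, 0.1]]
Z = gcn_forward(A, X, W0, W1)  # one probability row per node
```

Each row of `Z` is a probability distribution over the four categories, so it sums to 1.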

The sample data are fed into the GCN model for forward propagation. The cross entropy between the output classification results and the manual labels is computed as the loss function, and the loss function is optimized by the dynamic gradient descent method, so that the model parameters are updated automatically until the loss no longer decreases. The trained parameters W⁽⁰⁾ and W⁽¹⁾ give the required model, which is the output of this module.

Network current fault identification module 46: the network fault sample extraction module 42 extracts multi-sample data from historical service data, whereas the network current fault identification module 46 extracts the data of a single ongoing fault, which requires no manual labeling, and provides the input of the network GCN reasoning module 48. Its output is the topology data of the current fault and unlabeled fault feature data; in all other respects it is the same as the network fault sample extraction module 42.

Network GCN reasoning module 48: based on the trained GCN model, when a new fault occurs, the topology output by the network current fault identification module 46 is converted into the adjacency matrix $\tilde{A}$, which together with the node features X serves as the input of this module. Substituting them into formula 2 yields the classification Z ($Z \in \mathbb{R}^{N \times 4}$) of each faulty node.
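The training procedure (forward propagation, cross-entropy loss, gradient-based update) can be sketched as below. This is a simplified sketch, not the described implementation: plain full-batch gradient descent stands in for the "dynamic gradient descent method", a fixed epoch count replaces the stop-when-loss-stalls rule, and the helper names, toy graph, learning rate and positive initial range for W0 are all assumptions.

```python
import math
import random

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def normalize(A):
    """D~^(-1/2) (A + I) D~^(-1/2) for a 0-1 matrix A with zero diagonal."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def train(A, X, Y, hidden=4, lr=0.1, epochs=300, seed=0):
    """Fit W0 and W1 of a two-layer GCN; returns (W0, W1, losses)."""
    rng = random.Random(seed)
    n, d, c = len(X), len(X[0]), len(Y[0])
    # Positive initial W0 keeps the ReLU units active on non-negative features
    # (a choice made for this sketch only).
    W0 = [[rng.uniform(0.1, 0.5) for _ in range(hidden)] for _ in range(d)]
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(hidden)]
    A_hat = normalize(A)
    S = matmul(A_hat, X)  # aggregated input features, constant during training
    losses = []
    for _ in range(epochs):
        # Forward propagation
        P = matmul(S, W0)
        H = [[max(0.0, v) for v in row] for row in P]
        Q = matmul(A_hat, H)
        Z = softmax_rows(matmul(Q, W1))
        losses.append(-sum(y * math.log(z + 1e-12)
                           for zr, yr in zip(Z, Y) for z, y in zip(zr, yr)) / n)
        # Back propagation of the mean cross-entropy loss
        dL = [[(z - y) / n for z, y in zip(zr, yr)] for zr, yr in zip(Z, Y)]
        dW1 = matmul(transpose(Q), dL)
        dQ = matmul(dL, transpose(W1))
        dH = matmul(transpose(A_hat), dQ)
        dP = [[g if p > 0 else 0.0 for g, p in zip(gr, pr)] for gr, pr in zip(dH, P)]
        dW0 = matmul(transpose(S), dP)
        # Gradient descent update of both weight matrices
        W0 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W0, dW0)]
        W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, dW1)]
    return W0, W1, losses

# Toy sample: 3-node chain, one-hot features, labels over 4 fault categories.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
W0, W1, losses = train(A, X, Y)
```

The returned `losses` list records the loss before each update, which makes it easy to check that training is making progress or to add an early-stopping rule.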

In this embodiment, the topological structure of the telecommunication network and the service data of each node are used as input, and a GCN graph convolutional neural network model is trained to identify the root cause of a fault, identifying not only the faulty physical node but also the fault category. Using the GCN neural network reduces the dependence on the experience of human experts, which makes the approach easier to promote and deploy. Because the topology data are taken into account, the model uses not only the feature data of a node but also aggregates the features of its adjacent nodes, which fits the way faults propagate. In addition, the node features cover not only alarm data but also the relevant key performance data, so the information is more comprehensive and the approach works well.

The following briefly describes this embodiment by taking the topology structure of a telecommunication network with 9 physical sites as an example.

Prepare the network structure data and feature data. As shown in FIG. 5, the number in each node is the number of the physical site; a total of 8 physical sites form a topology, and each site includes power supply equipment, transmission equipment and communication equipment. The topology is used as an input of GCN model training. It is in JSON format, where each key is the ID of a physical site and its value is the list of IDs of the directly adjacent physical sites.

The adjacency matrix $\tilde{A}$, including self-connections, corresponding to the topology of FIG. 5 is: [matrix not reproduced in this text]

Degree matrix $\tilde{D}$: the degree of each node is the number of nodes it is connected to. $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The degree matrix corresponding to FIG. 5 is: [matrix not reproduced in this text]

Extract the feature data X of each node. A physical site is composed of different telecommunication equipment. When a fault occurs, the service data belonging to the same fault are extracted and converted, and the key equipment alarms and KPI performance data are combined and arranged in the order of power supply equipment, transmission equipment and communication equipment. The features for FIG. 5 are shown in Table 2.

Table 2

Feature matrix X: corresponds to $H^{(0)}$ in the formula and represents the node features, $X \in \mathbb{R}^{N \times D}$, where D is the feature dimension. The above table is converted into the feature matrix as follows: [matrix not reproduced in this text]

Prepare multiple sets of fault topology data and feature sample data as input for GCN training.

Train the GCN model. The inputs of the GCN model, namely the adjacency matrix $\tilde{A}$, the degree matrix $\tilde{D}$ and the feature matrix X, are now available.

Substituting them into the above formula and performing multiple rounds of forward propagation and back propagation to train the parameters W⁽⁰⁾ and W⁽¹⁾ yields the required model.

Fault root cause location and classification: input the new adjacency matrix $\tilde{A}$, degree matrix $\tilde{D}$ and feature matrix X to obtain the classification Z of each node ($Z \in \mathbb{R}^{N \times 4}$), where N is the number of nodes and 4 is the number of categories. Each row gives the probabilities of the four cases for one node: power fault, transmission fault, communication equipment fault and no fault. For example, [0, 0.9, 0.1, 0] indicates that the node has a transmission fault.
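Reading the root cause off the category matrix amounts to taking the most probable category in each row. A minimal sketch, assuming the column order given above; the helper name `root_causes`, the site IDs and the second row of `Z` are hypothetical, while the first row is the example from the text:

```python
# Column order of Z as described: power, transmission, communication, no fault.
CATEGORIES = ["PowerFault", "TransFault", "CommunicationFault", "Normal"]

def root_causes(Z, site_ids):
    """Map each row of the N x 4 probability matrix to its most probable category."""
    result = {}
    for site, probs in zip(site_ids, Z):
        best = max(range(len(probs)), key=probs.__getitem__)
        result[site] = CATEGORIES[best]
    return result

Z = [[0.0, 0.9, 0.1, 0.0], [0.05, 0.05, 0.1, 0.8]]
causes = root_causes(Z, ["1", "2"])
```

With these inputs, site "1" is reported as a transmission fault and site "2" as a normal node.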

According to another embodiment of the present disclosure, a device for determining the root cause of a fault is also provided. FIG. 7 is a block diagram of the device for determining the root cause of a fault according to this embodiment. As shown in FIG. 7 , it includes:

Obtaining module 72, configured to obtain current service failure data;

The first determination module 74 is configured to determine the fault classification result of the current business fault data according to the fault characteristic data corresponding to the current business fault data based on the pre-trained target GCN model;

The second determining module 76 is configured to determine the fault root cause of the current service fault data according to the fault classification result.

In an exemplary embodiment, the first determining module 74 includes:

An acquisition submodule, configured to extract the fault topology data and the fault feature data of the current service fault data;

An input submodule, configured to input the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is a set of probabilities corresponding to each fault category.

In an exemplary embodiment, the second determining module 76 is further configured to determine the fault category corresponding to the maximum probability in the target fault category matrix as the fault root cause of the current service fault data.

In an exemplary embodiment, the input submodule is further configured to:

form the fault feature data of each node in the fault topology data into a first feature matrix, and form the connection relationships between nodes in the fault topology data into a first adjacency matrix;

determine a second adjacency matrix including self-connections according to the first adjacency matrix;

determine a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it;

input the first feature matrix, the second adjacency matrix and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.

In an exemplary embodiment, the device also includes:

An extraction module configured to extract a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information corresponding to fault categories of the fault sample feature data;

A training module, configured to use the fault sample topology data, the fault sample feature data and the fault label information corresponding to the fault sample feature data to train the original GCN model to obtain the target GCN model, wherein the fault sample topology data and the fault sample feature data are the inputs of the original GCN model, and the fault category output by the trained target GCN model for the fault sample feature data and the fault category actually corresponding to the fault sample feature data satisfy a preset objective function.

In an exemplary embodiment, the training module includes:

A composing submodule, configured to form the fault sample feature data of each node in the fault sample topology data into a second feature matrix, and form the connection relationships between nodes in the fault sample topology data into a third adjacency matrix;

The first determining submodule is configured to determine a fourth adjacency matrix including self-connections according to the third adjacency matrix;

The second determining submodule is configured to determine a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it;

A composing submodule, configured to compose the label information corresponding to the fault categories of the fault sample feature data into a fault category matrix;

The training submodule is configured to train the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix to obtain the target GCN model.

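In an exemplary embodiment, the training submodule is further configured to:

Determine the weight parameters of the target GCN model in the following manner:

$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$

wherein Z is the fault category matrix, ReLU is the activation function of the first layer of the original GCN model, Softmax is the activation function of the second layer of the original GCN model, X is the second feature matrix, $\tilde{A}$ is the fourth adjacency matrix, $\tilde{D}$ is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;

The target GCN model is determined according to the weight parameters.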
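A forward pass of a two-layer GCN of this kind can be sketched in plain Python as follows. The sketch assumes the widely used symmetric normalization of the self-connected adjacency matrix by the inverse square root of the degree matrix, with ReLU after the first layer and a row-wise Softmax after the second; all function names, the example topology, the feature values and the weights are hypothetical, and no training loop is shown.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(a_tilde):
    """Scale entry (i, j) by 1/sqrt(d_i * d_j), where d_i is the row sum of a_tilde."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    n = len(a_tilde)
    return [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def relu(m):
    """Apply ReLU element-wise."""
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Apply a numerically stable Softmax to each row."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(a_tilde, x, w0, w1):
    """Two-layer GCN: Softmax(A_hat * ReLU(A_hat * X * W0) * W1)."""
    a_hat = normalize_adjacency(a_tilde)
    hidden = relu(matmul(matmul(a_hat, x), w0))
    return softmax_rows(matmul(matmul(a_hat, hidden), w1))


# Hypothetical inputs: 3 nodes, 2 features per node, 2 fault categories.
fourth_adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # adjacency with self-connections
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]        # feature matrix X
w0 = [[0.2, -0.1], [0.4, 0.3]]                         # first-layer weights (made up)
w1 = [[0.5, -0.5], [-0.3, 0.8]]                        # second-layer weights (made up)

z = gcn_forward(fourth_adjacency, features, w0, w1)    # one probability row per node
root_causes = [row.index(max(row)) for row in z]       # most probable category per node
```

Each row of `z` is a probability distribution over the fault categories, so taking the index of the largest entry corresponds to selecting the fault category with the maximum probability.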

In an exemplary embodiment, the extraction module includes:

The collection submodule is configured to collect historical raw data;

The calculation submodule is configured to calculate the distance between items of service data in the historical raw data based on the connectivity distance of the fault topology data and its weight, the time span and its weight, and the preset rules and their weight;

The division submodule is configured to divide the service data whose distance is smaller than a preset threshold into the same cluster, to obtain the preset number of historical network fault samples.
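One way to picture the calculation and division steps is the sketch below. It is a hedged illustration only: the disclosure does not specify how the three factors are combined, so the weighted sum, the record fields (`node`, `time`, `rule_group`), the hop-distance table, the weights, and the single-linkage grouping via union-find are all assumptions introduced for the example.

```python
def weighted_distance(a, b, weights, hop_distance):
    """Assumed distance: weighted sum of topology hops, time gap and rule mismatch."""
    topo = hop_distance[a["node"]][b["node"]]
    time_gap = abs(a["time"] - b["time"])
    rule = 0.0 if a["rule_group"] == b["rule_group"] else 1.0
    return weights["topology"] * topo + weights["time"] * time_gap + weights["rule"] * rule


def cluster_by_threshold(records, threshold, distance):
    """Group records so that any pair closer than the threshold shares a cluster."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if distance(records[i], records[j]) < threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical historical raw data: two nearby alarms and one unrelated alarm.
records = [
    {"node": "A", "time": 0, "rule_group": "g1"},
    {"node": "B", "time": 2, "rule_group": "g1"},
    {"node": "C", "time": 100, "rule_group": "g2"},
]
hop_distance = {
    "A": {"A": 0, "B": 1, "C": 5},
    "B": {"A": 1, "B": 0, "C": 4},
    "C": {"A": 5, "B": 4, "C": 0},
}
weights = {"topology": 1.0, "time": 0.1, "rule": 2.0}

clusters = cluster_by_threshold(
    records,
    threshold=3.0,
    distance=lambda a, b: weighted_distance(a, b, weights, hop_distance),
)
```

With these made-up values, the first two records fall within the threshold of each other and form one cluster, while the third record stays in a cluster of its own.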

Embodiments of the present disclosure also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.

In an exemplary embodiment, the above-mentioned computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store a computer program.

Embodiments of the present disclosure also provide an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.

In an exemplary embodiment, the electronic device may further include a transmission device and an input and output device, wherein the transmission device is connected to the processor, and the input and output device is connected to the processor.

For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementation manners, and details will not be repeated here in this embodiment.

Obviously, those skilled in the art should understand that each module or step of the present disclosure described above can be implemented by a general-purpose computing device, and that the modules or steps can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. They can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be executed in an order different from that given here, or the modules or steps can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.

The above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principle of the present disclosure shall be included in the protection scope of the present disclosure.

Claims

A method for determining the root cause of a fault, comprising:

Obtaining current service fault data;

Determining, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;

Determining the fault root cause of the current service fault data according to the fault classification result.
The method according to claim 1, wherein, based on the pre-trained target GCN model, determining the fault classification result of the current service fault data according to the fault feature data comprises:

Extracting fault topology data and the fault feature data from the current service fault data;

Inputting the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is a set of probabilities corresponding to each fault category.
The method according to claim 2, wherein determining the fault root cause of the current service fault data according to the fault classification result comprises:

Determining the fault category corresponding to the maximum probability in the target fault category matrix as the root cause of the fault of the current service fault data.
The method according to claim 2, wherein, inputting the fault topology data and fault feature data into the target GCN model, and obtaining the target fault classification result output by the target GCN model comprises:

Composing the fault feature data of each node in the fault topology data into a first feature matrix, and forming the connection relationship between each node in the fault topology data into a first adjacency matrix;

determining a second adjacency matrix comprising self-connections based on the first adjacency matrix;

Determining a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a matrix composed of the degrees of multiple nodes, the degree of each node is the number of nodes connected to that node, and the first degree matrix is a diagonal matrix;

Inputting the first feature matrix, the second adjacency matrix, and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.
The method according to any one of claims 1 to 4, wherein, before determining, based on the pre-trained target GCN model, the fault classification result of the current service fault data according to the fault feature data corresponding to the current service fault data, the method further comprises:

Extracting a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information corresponding to the fault category of the fault sample feature data;

Training the original GCN model using the fault sample topology data, the fault sample feature data, and the label information corresponding to the fault category of the fault sample feature data, to obtain the target GCN model, wherein the fault sample topology data and the fault sample feature data are the input of the original GCN model, and the fault category that the trained target GCN model outputs for the fault sample feature data and the fault category actually corresponding to the target operation result of the fault sample feature data satisfy a preset objective function.
The method according to claim 5, wherein training the original GCN model using the fault sample topology data, the fault sample feature data, and the label information corresponding to the fault category of the fault sample feature data to obtain the target GCN model comprises:

Composing the fault sample feature data of each node in the fault sample topology data into a second feature matrix, and composing the connection relationships between the nodes in the fault sample topology data into a third adjacency matrix;

Determining a fourth adjacency matrix comprising self-connections according to the third adjacency matrix;

Determining a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a matrix composed of the degrees of multiple nodes, the degree of each node is the number of nodes connected to that node, and the second degree matrix is a diagonal matrix;

Composing the label information corresponding to the fault category of the fault sample characteristic data into a fault category matrix;

The original GCN model is trained according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix to obtain the target GCN model.
The method according to claim 6, wherein training the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix to obtain the target GCN model comprises:

Determining the weight parameters of the target GCN model in the following manner:

$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$

wherein Z is the fault category matrix, ReLU is the activation function of the first layer of the original GCN model, Softmax is the activation function of the second layer of the original GCN model, X is the second feature matrix, $\tilde{A}$ is the fourth adjacency matrix, $\tilde{D}$ is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;

The target GCN model is determined according to the weight parameters.
The method according to claim 5, wherein extracting a preset number of historical network fault samples comprises:

Collect historical raw data;

Calculating the distance between items of service data in the historical raw data based on the connectivity distance of the fault topology data and its weight, the time span and its weight, and the preset rules and their weight;

The service data whose distance is smaller than the preset threshold is divided into the same cluster to obtain the preset number of historical network fault samples.
A device for determining the root cause of a fault, comprising:

The acquisition module is configured to acquire current service fault data;

The first determination module is configured to determine the fault classification result of the current business fault data according to the fault feature data corresponding to the current business fault data based on the pre-trained target GCN model;

The second determining module is configured to determine the fault root cause of the current service fault data according to the fault classification result.
A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method described in any one of claims 1 to 8 when running.
An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method described in any one of claims 1 to 8.