WO2023029654A1 - Fault root cause determination method and device, and storage medium and electronic device - Google Patents

Fault root cause determination method and device, and storage medium and electronic device

Info

Publication number
WO2023029654A1
WO2023029654A1 PCT/CN2022/098678 CN2022098678W WO2023029654A1 WO 2023029654 A1 WO2023029654 A1 WO 2023029654A1 CN 2022098678 W CN2022098678 W CN 2022098678W WO 2023029654 A1 WO2023029654 A1 WO 2023029654A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
data
matrix
target
gcn model
Prior art date
Application number
PCT/CN2022/098678
Other languages
French (fr)
Chinese (zh)
Inventor
杜家强
罗秋野
杨民凡
付光荣
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023029654A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • Embodiments of the present disclosure relate to the communication field, and in particular, relate to a method, device, storage medium, and electronic device for determining the root cause of a fault.
  • Fault alarms in a telecommunication network are characterized by large data volumes and frequent sudden faults. For example, when a network device fails and raises an alarm, the uplink and downlink devices associated with it also fail because of the correlation between devices, and generate derived alarm information within a short time.
  • The traditional manual method of locating the root cause of a fault depends on the experience of the operation and maintenance personnel and is inefficient.
  • The rule reasoning method relies on accumulating and extracting rule knowledge, which is itself a long-term process. If unsupervised learning is used for rule extraction, business experts still need to identify and confirm the rules, so the result depends on their experience and business level.
  • Embodiments of the present disclosure provide a method, device, storage medium, and electronic device for determining the root cause of a fault, so as to at least solve the problem in the related art that locating the root cause of a fault depends on the experience and business level of operation and maintenance personnel or business experts, which makes operation and maintenance inefficient and costly.
  • a method for determining the root cause of a fault including:
  • a device for determining the root cause of a fault including:
  • an acquisition module configured to acquire current service fault data;
  • a first determination module configured to determine, based on the pre-trained target GCN model, the fault classification result of the current service fault data according to the fault feature data;
  • a second determination module configured to determine the fault root cause of the current service fault data according to the fault classification result.
  • a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
  • an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
  • The current service fault data is obtained; based on the pre-trained target GCN model, the fault classification result of the current service fault data is determined according to the fault feature data; and the fault root cause of the current service fault data is determined according to the fault classification result. This solves the problem in the related art that locating the root cause of a fault depends on the experience and business level of operation and maintenance personnel or business experts, which makes operation and maintenance inefficient and costly. The fault category of the current fault is inferred by the GCN model and the root cause is determined from the fault classification result, which improves operation and maintenance efficiency and reduces operation and maintenance cost.
  • FIG. 1 is a block diagram of the hardware structure of a mobile terminal for the method for determining the root cause of a fault according to an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a method for determining the root cause of a fault according to an embodiment of the present disclosure;
  • FIG. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on the GCN graph convolutional neural network according to the present embodiment;
  • FIG. 5 is a schematic diagram of fault sample topology data according to the present embodiment;
  • FIG. 6 is a schematic diagram of a GCN model according to the present embodiment;
  • FIG. 7 is a block diagram of a fault root cause determination device according to the present embodiment.
  • FIG. 1 is a block diagram of the hardware structure of the mobile terminal according to the method for determining the root cause of a fault according to an embodiment of the present disclosure.
  • The mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108.
  • FIG. 1 is only for illustration, and it does not limit the structure of the above mobile terminal.
  • the mobile terminal may also include more or fewer components than those shown in FIG. 1 , or have a different configuration from that shown in FIG. 1 .
  • The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the fault root cause determination method in the embodiment of the present disclosure. The processor 102 runs the computer program stored in the memory 104, thereby executing various functional applications and slicing processing of the service chain address pool, that is, implementing the above-mentioned method.
  • the memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory that is remotely located relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or transmit data via a network.
  • the specific example of the above network may include a wireless network provided by the communication provider of the mobile terminal.
  • In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
  • FIG. 2 is a flowchart of a method for determining the root cause of a fault according to an embodiment of the disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202: acquire current service fault data;
  • Step S204: based on the pre-trained target GCN model, determine the fault classification result of the current service fault data according to the fault feature data corresponding to the current service fault data;
  • Step S206: determine the fault root cause of the current service fault data according to the fault classification result.
  • step S204 may specifically include:
  • S2022 may specifically include: forming the fault feature data of each node in the fault topology data into a first feature matrix, and forming the connection relationships between nodes in the fault topology data into a first adjacency matrix;
  • determining, according to the first adjacency matrix, a second adjacency matrix that includes self-connections, wherein both the first adjacency matrix and the second adjacency matrix represent the connection relationships between nodes;
  • determining a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it; and inputting the first feature matrix, the second adjacency matrix, and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.
  • the above step S206 may specifically include: determining that the fault category corresponding to the maximum probability in the target fault category matrix is the root cause of the fault of the current service fault data.
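  • The matrix preparation described for step S204 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `build_gcn_inputs`, the dictionary-based inputs, and the choice to treat links as undirected are all hypothetical.

```python
import numpy as np


def build_gcn_inputs(topology, features):
    """Build the feature, adjacency and degree matrices from a site topology.

    topology: dict mapping a site ID to the IDs of its directly adjacent sites.
    features: dict mapping a site ID to its list of feature values.
    Both argument layouts are assumptions made for this sketch.
    """
    node_ids = sorted(topology)
    index = {node_id: i for i, node_id in enumerate(node_ids)}
    n = len(node_ids)

    # First adjacency matrix: a 0-1 matrix of connections between nodes.
    a = np.zeros((n, n))
    for src, neighbors in topology.items():
        for dst in neighbors:
            a[index[src], index[dst]] = 1.0
            a[index[dst], index[src]] = 1.0  # assumption: links are symmetric

    # Second adjacency matrix: the first one plus self-connections.
    a_tilde = a + np.eye(n)
    # First degree matrix: diagonal, computed from the second adjacency matrix.
    d_tilde = np.diag(a_tilde.sum(axis=1))
    # First feature matrix: one row of features per node.
    x = np.array([features[node_id] for node_id in node_ids], dtype=float)
    return node_ids, x, a_tilde, d_tilde
```

  • Because the degree matrix is derived from the adjacency matrix with self-connections, each node counts itself once in its own degree.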
  • FIG. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure. As shown in FIG. 3, the process further includes the following steps:
  • Step S302 extracting a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information corresponding to the fault category of the fault sample feature data;
  • step S302 specifically includes: collecting historical raw data; calculating the distance between items of service data in the historical raw data based on the connectivity and distance of the fault topology data and their weights, the time span and its weight, and preset rules and their weights; and dividing the service data whose distance is smaller than the preset threshold into the same cluster to obtain the preset number of historical network fault samples.
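  • The threshold-based grouping in step S302 can be illustrated with a small sketch. The text does not specify the distance formula or the clustering procedure, so everything below is an assumption: `weighted_distance` is a hypothetical stand-in that combines a topology term and a time term with made-up weights, and `cluster_by_distance` uses a union-find structure so that any two records closer than the threshold end up in the same cluster.

```python
from itertools import combinations


def cluster_by_distance(records, distance, threshold):
    """Group records so that any pair closer than `threshold` shares a cluster."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if distance(records[i], records[j]) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


def weighted_distance(a, b, hop_weight=1.0, time_weight=1.0 / 600):
    """Hypothetical distance: weighted hop count plus weighted time gap (seconds)."""
    hops = abs(a["site"] - b["site"])  # stand-in for a real shortest-path hop count
    return hop_weight * hops + time_weight * abs(a["time"] - b["time"])
```

  • Each resulting cluster would then be treated as one fault and labeled with its root cause category.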
  • Step S304: use the fault sample topology data, the fault sample feature data, and the fault label information corresponding to the fault sample feature data to train the original GCN model to obtain the target GCN model, wherein the fault sample topology data and the fault sample feature data are the inputs of the original GCN model, and the fault category output by the trained target GCN model for the fault sample feature data and the fault category that actually corresponds to the fault sample feature data satisfy the preset objective function.
  • step S304 may specifically include:
  • step S3045 may specifically include: determining the weight parameters of the target GCN model in the following manner:
  • Z is the fault category matrix
  • ReLU is the activation function of the first layer of the original GCN model
  • Softmax is the activation function of the second layer of the original GCN model
  • X is the second feature matrix
  • $W^{(0)}$ and $W^{(1)}$ are the weight parameters
  • the target GCN model is determined according to the weight parameters.
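  • Under the assumption that the model follows the standard two-layer graph convolutional network, which is consistent with the symbols defined above (ReLU in the first layer, Softmax in the second, feature matrix X, weights $W^{(0)}$ and $W^{(1)}$), the relation between these quantities can be written as below. Here $\tilde{A}$ denotes the adjacency matrix with self-connections and $\tilde{D}$ its diagonal degree matrix; the symbol $\hat{A}$ is introduced only for this sketch.

```latex
Z = \mathrm{Softmax}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A}\,X\,W^{(0)}\right)W^{(1)}\right),
\qquad
\hat{A} = \tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}
```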
  • FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on the GCN graph convolutional neural network according to the present embodiment. As shown in FIG. 4, it consists of a network fault sample extraction module 42, a network GCN model training module 44, a current network fault identification module 46, and a network GCN reasoning module 48.
  • The fault features are extracted automatically through the forward propagation and back propagation of the GCN graph convolutional neural network, and the trained GCN model is used to classify the relevant nodes of a new fault in order to identify the root cause of the fault.
  • A telecommunication fault is embodied as a set of service data that is spatially related in topology and close in time, including basic topology, alarm data, and performance KPI data. If a transmission interruption alarm occurs at a physical site, it triggers base station out-of-service alarms at downstream sites within a short time, together with some KPI data anomalies; all of these are treated as the service data of this transmission interruption fault.
  • the distance between business data is calculated, and the business data with a distance smaller than a specific threshold is divided into the same cluster as a fault.
  • Each fault is marked and classified according to the root cause by manual methods and automatic methods based on empirical rules.
  • the topology, alarm, performance KPI, and label information related to each fault are converted as the output of this module.
  • the data format of the failure sample will be described in detail below.
  • Fig. 5 is a schematic diagram of fault sample topology data according to the present embodiment.
  • different types of telecommunication equipment form a network for carrying voice or digital services on the upper layer, wherein the physical site is the basic structure of the network and its operation and maintenance.
  • each physical site generally includes these three types of equipment: power supply equipment, transmission equipment and communication equipment, multiple physical sites constitute a telecommunication network.
  • the number in the node is the number of the physical site, and a total of 8 physical sites form a directed graph, and each site includes power supply equipment, transmission equipment and communication equipment.
  • the topology needs to be used as the input of the GCN model training, the format is JSON format, the key is the current physical site ID, and its value is the ID of the directly adjacent neighbor physical site:
  • Because the physical site is composed of different types of telecommunication equipment, when a fault occurs the service data belonging to the same fault is extracted and converted, and the key equipment alarms and KPI performance data are combined; the performance data is followed by the alarm, and the data is arranged in the order of power supply equipment, transmission equipment, and communication equipment. The format is shown in Table 1.
  • Each fault sample includes topology structure in JSON format, feature data in CSV format, and fault classification annotations. Multiple fault samples are used as the output of this module and the input of the GCN model training module.
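  • A fault sample in the described layout (JSON topology, CSV features, fault annotation) could be loaded as in the sketch below. The concrete topology, the column names (`site_id`, `power_alarm`, `transmission_alarm`, `comm_alarm`, `kpi_drop_rate`, `label`), and the label strings are hypothetical examples chosen for illustration; they are not taken from Table 1 or FIG. 5.

```python
import csv
import io
import json

# Hypothetical sample: key = physical site ID, value = IDs of directly adjacent sites.
EXAMPLE_TOPOLOGY = '{"1": ["2", "3"], "2": ["1"], "3": ["1"]}'

# Hypothetical feature table: one row per site, alarms and a KPI, plus the annotation.
EXAMPLE_FEATURES = """site_id,power_alarm,transmission_alarm,comm_alarm,kpi_drop_rate,label
1,0,1,0,0.35,transmission
2,0,0,1,0.80,none
3,0,0,1,0.75,none
"""


def load_fault_sample(topology_json, features_csv, label_column="label"):
    """Parse one fault sample into topology, per-site features and per-site labels."""
    topology = json.loads(topology_json)
    features, labels = {}, {}
    for row in csv.DictReader(io.StringIO(features_csv)):
        site_id = row.pop("site_id")
        labels[site_id] = row.pop(label_column)
        # The remaining columns are numeric features, kept in column order.
        features[site_id] = [float(value) for value in row.values()]
    return topology, features, labels
```

  • Keeping the site ID as the join key between the JSON and CSV parts makes it straightforward to line up rows of the feature matrix with rows of the adjacency matrix later.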
  • Network GCN training module 44: graph convolution performs deep learning on graph data and extracts features from it, so that these features can be used to classify the nodes of the graph.
  • N nodes nodes
  • X and A are the input of the GCN model.
  • the propagation method between layers is as follows:
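  • Assuming the standard graph convolution rule, with which the adjacency matrix, degree matrix, and feature matrix definitions in this embodiment are consistent, the layer-wise propagation can be written as below. Here $\sigma$ is the activation function of the layer, $W^{(l)}$ is the weight matrix of layer $l$, and $I_N$ is the identity matrix that adds the self-connections; these symbol choices are assumptions of this sketch.

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right),
\qquad
\tilde{A} = A + I_N,
\qquad
\tilde{D}_{ii} = \sum_{j}\tilde{A}_{ij},
\qquad
H^{(0)} = X
```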
  • Adjacency matrix A used to represent the connection relationship between nodes, here it is assumed to be a 0-1 matrix, where,
  • the degree of each node refers to the number of nodes it connects, which is a diagonal matrix, where the diagonal elements
  • the degree matrix corresponding to Figure 5 is:
  • Feature matrix X: $H^{(0)}$ in the corresponding formula, used to represent the features of the nodes, $X \in \mathbb{R}^{N \times D}$, where D is the dimension of the features, as follows:
  • Fig. 6 is a schematic diagram of the GCN model according to this embodiment. As shown in Fig. 6, the GCN model is a two-layer GCN, and its formula is as follows:
  • Each layer of the GCN network performs transformation, aggregation, and activation.
  • Transformation: transform and learn the current node features; here this is the multiplication rule XW;
  • Aggregation: aggregate the features of the nodes in the neighborhood to obtain new features for the node; here this is the simple addition rule AX;
  • Activation: the activation function is used to add nonlinearity; the activation function of the first layer is ReLU, and the activation function of the second layer is Softmax.
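  • The three operations of each layer can be sketched as a forward pass in NumPy. This is an illustrative sketch, not the disclosed implementation: the function names are hypothetical, and the aggregation uses the symmetrically normalized adjacency matrix with self-connections (an assumption based on the degree matrix described earlier) rather than the plain product AX.

```python
import numpy as np


def normalize_adjacency(a):
    """Add self-connections and scale by the inverse square root of the degrees."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gcn_forward(a, x, w0, w1):
    """Two-layer GCN forward pass: returns one probability row per node."""
    a_hat = normalize_adjacency(a)
    # Layer 1: transformation (X W0), aggregation (A_hat ...), ReLU activation.
    hidden = np.maximum(a_hat @ (x @ w0), 0.0)
    # Layer 2: transformation (H W1), aggregation, Softmax activation.
    return softmax(a_hat @ (hidden @ w1))
```

  • With four output columns in `w1`, each row of the result is a probability distribution over four fault categories for the corresponding node.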
  • Current network fault identification module 46: the network fault sample extraction module 42 extracts multi-sample data from historical service data, whereas the current network fault identification module 46 extracts the data of a single ongoing fault, without manual labeling, for use by the network GCN reasoning module 48. Its output is the topology data of the current fault and the unlabeled fault feature data; in all other respects it is the same as the network fault sample extraction module 42.
  • Network GCN reasoning module 48: based on the trained GCN model, when a new fault occurs, the topology output by the current fault extraction module is converted into an adjacency matrix, which together with the node features X serves as the input of this module. Substituting these into formula 2 yields the classification $Z \in \mathbb{R}^{N \times 4}$ of each faulty node.
  • The topological structure of the telecommunication network and the service data of each node are used as the input, and the GCN graph convolutional neural network model is trained to identify the root cause of the fault, identifying not only the physical node of the fault but also its fault classification.
  • This embodiment uses the GCN neural network to reduce the dependence on the experience of human experts, which makes it easier to promote and deploy. It takes the topology data into account: it uses not only the feature data of a node but also aggregates the features of its adjacent nodes, which fits the way faults propagate. In addition, the node features cover not only the alarm data but also the relevant key performance data, so the information is more comprehensive and the results are better.
  • the number in the node is the number of the physical site, a total of 8 physical sites form a topology, and each site includes power supply equipment, transmission equipment and communication equipment.
  • the topology needs to be used as the input of the GCN model training, the format is JSON format, the key is the current physical site ID, and its value is the ID of the directly adjacent neighbor physical site:
  • The adjacency matrix corresponding to the topology of Figure 5 is:
  • the degree of each node refers to the number of nodes it is connected to, which is a diagonal matrix where the diagonal elements
  • the degree matrix corresponding to Figure 5 is:
  • Extract the feature data X of each node: the physical site is composed of different telecommunication equipment. When a fault occurs, the service data belonging to the same fault is extracted and converted, and the key equipment alarms and KPI performance data are combined. The performance data are arranged in the order of power supply equipment, transmission equipment, and communication equipment. The features of Figure 5 are shown in Table 2.
  • Feature matrix X: $H^{(0)}$ in the corresponding formula, used to represent the features of the nodes, $X \in \mathbb{R}^{N \times D}$, where D is the dimension of the features; the above table is transformed into the feature matrix as follows:
  • From the feature matrix X, the classification Z of each node can be obtained ($Z \in \mathbb{R}^{N \times 4}$), where N is the number of nodes and 4 is the number of classification outcomes for each node: the probabilities of power failure, transmission failure, communication equipment failure, and no failure, respectively. For example, [0, 0.9, 0.1, 0] indicates that the node has a transmission failure.
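  • Reading the root cause from Z amounts to taking the most probable category in each row. The sketch below assumes the column order given in the text (power failure, transmission failure, communication equipment failure, no failure); the function names and the list-of-lists input are hypothetical.

```python
# Column order of Z as stated in the text.
FAULT_CATEGORIES = [
    "power failure",
    "transmission failure",
    "communication equipment failure",
    "no failure",
]


def classify_nodes(z, node_ids):
    """Map each node to the fault category with the highest probability."""
    result = {}
    for node_id, probabilities in zip(node_ids, z):
        best = max(range(len(probabilities)), key=lambda k: probabilities[k])
        result[node_id] = FAULT_CATEGORIES[best]
    return result


def root_cause_nodes(z, node_ids):
    """Keep only the nodes whose most probable category is an actual fault."""
    return {
        node_id: category
        for node_id, category in classify_nodes(z, node_ids).items()
        if category != "no failure"
    }
```

  • For the row [0, 0.9, 0.1, 0] used as the example in the text, `classify_nodes` returns "transmission failure" for that node.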
  • FIG. 7 is a block diagram of the device for determining the root cause of a fault according to this embodiment. As shown in FIG. 7 , it includes:
  • Obtaining module 72 configured to obtain current service failure data
  • the first determination module 74 is configured to determine the fault classification result of the current business fault data according to the fault characteristic data corresponding to the current business fault data based on the pre-trained target GCN model;
  • the second determining module 76 is configured to determine the fault root cause of the current service fault data according to the fault classification result.
  • the first determining module 74 includes:
  • the acquisition sub-module is configured to extract the fault topology data and the fault feature data of the current service fault data;
  • the input sub-module is configured to input the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is the set of probabilities corresponding to each fault category.
  • the second determination module 76 is also set to
  • the input sub-module is also set to
  • the fault feature data of each node in the fault topology data is formed into a first feature matrix, and the connection relationship between each node in the fault topology data is formed into a first adjacency matrix;
  • the first degree matrix is a matrix composed of the degrees of multiple nodes, the degree of each of the multiple nodes refers to the number of nodes connected to that node, and the first degree matrix is a diagonal matrix;
  • the device also includes:
  • An extraction module configured to extract a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information corresponding to fault categories of the fault sample feature data;
  • the training module is configured to use the fault sample topology data, the fault sample feature data, and the fault label information corresponding to the fault sample feature data to train the original GCN model to obtain the target GCN model, wherein the fault sample topology data and the fault sample feature data are the inputs of the original GCN model, and the fault category output by the trained target GCN model for the fault sample feature data and the fault category that actually corresponds to the fault sample feature data satisfy the preset objective function.
  • the training module includes:
  • a composing sub-module configured to form the fault sample feature data of each node in the fault sample topology data into a second feature matrix, and to form the connection relationships between nodes in the fault sample topology data into a third adjacency matrix;
  • the first determining submodule is configured to determine a fourth adjacency matrix including self-connections according to the third adjacency matrix;
  • the second determining submodule is configured to determine a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a matrix composed of the degrees of multiple nodes, the degree of each of the multiple nodes refers to the number of nodes connected to that node, and the second degree matrix is a diagonal matrix;
  • the training submodule is configured to train the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix to obtain the target GCN model.
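  • How the weight parameters might be fitted from these matrices can be sketched with plain gradient descent. The text only refers to a "preset objective function", so the cross-entropy loss, the learning rate, the hidden size, and the function name `train_gcn` are all assumptions of this sketch; `a_hat` is assumed to be the normalized adjacency matrix with self-connections, and `y_onehot` the fault category matrix in one-hot form.

```python
import numpy as np


def train_gcn(a_hat, x, y_onehot, hidden=8, lr=0.5, epochs=200, seed=0):
    """Fit W0 and W1 of a two-layer GCN by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    c = y_onehot.shape[1]
    w0 = rng.normal(scale=0.1, size=(d, hidden))
    w1 = rng.normal(scale=0.1, size=(hidden, c))
    ax = a_hat @ x  # aggregation of the input features, constant across epochs
    losses = []
    for _ in range(epochs):
        # Forward propagation: ReLU layer followed by Softmax layer.
        pre = ax @ w0
        h = np.maximum(pre, 0.0)
        ah = a_hat @ h
        logits = ah @ w1
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p = p / p.sum(axis=1, keepdims=True)
        # Assumed objective: mean cross-entropy against the labeled categories.
        losses.append(float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))))
        # Back propagation through Softmax, aggregation and ReLU.
        d_logits = (p - y_onehot) / n
        grad_w1 = ah.T @ d_logits
        d_h = a_hat.T @ (d_logits @ w1.T)
        d_pre = d_h * (pre > 0)
        grad_w0 = ax.T @ d_pre
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1, losses
```

  • The returned loss history gives a simple way to check that training is converging before the weights are used for inference.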
  • the training submodule is also set to
  • Z is the fault category matrix
  • ReLU is the activation function of the first layer of the original GCN model
  • Softmax is the activation function of the second layer of the original GCN model
  • X is the second feature matrix
  • $W^{(0)}$ and $W^{(1)}$ are the weight parameters
  • the target GCN model is determined according to the weight parameters.
  • the extraction module includes:
  • the collection sub-module is set to collect historical raw data
  • the calculation sub-module is configured to calculate the distance between business data in the historical raw data based on connectivity, distance and weight of fault topology data, time span and weight, preset rules and weight;
  • the division sub-module is configured to divide the service data whose distance is smaller than the preset threshold into the same cluster, and obtain the preset number of historical network fault samples.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.
  • The above-mentioned computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store computer programs.
  • Embodiments of the present disclosure also provide an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the electronic device may further include a transmission device and an input and output device, wherein the transmission device is connected to the processor, and the input and output device is connected to the processor.
  • Each module or step of the above disclosure can be implemented by a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. They can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be executed in an order different from that given here. Alternatively, they can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Biology (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present invention provide a fault root cause determination method and device, and a storage medium and an electronic device. The method comprises: obtaining current service fault data; on the basis of a pretrained target GCN model, determining a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data; and determining a fault root cause of the current service fault data according to the fault classification result. Therefore, the problems in the related art of low operation and maintenance efficiency and high costs because fault root cause locating depends on the experience and service level of operation and maintenance personnel or service experts are solved; and a fault category classification of the current fault is inferred on the basis of the GCN model, and a fault root cause is determined according to the fault classification result, such that the operation and maintenance costs are reduced while the operation and maintenance efficiency is improved.

Description

Method, Device, Storage Medium and Electronic Device for Determining the Root Cause of a Fault
Cross-Reference to Related Applications
The present disclosure is based on Chinese patent application CN202111039384.4, filed on September 6, 2021 and entitled "Method, Device, Storage Medium and Electronic Device for Determining the Root Cause of a Fault", and claims priority to that application, the entire disclosure of which is incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of communications, and in particular to a method, a device, a storage medium and an electronic device for determining the root cause of a fault.
Background
Fault alarms in telecommunication networks are characterized by large data volumes and frequent sudden faults. For example, when a network device fails and raises an alarm, the upstream and downstream devices associated with it also fail because of the correlation between devices, so derived alarms are generated within a short time. The traditional manual approach to locating the root cause of a fault depends on the experience of operation and maintenance personnel and is inefficient. The rule-based reasoning approach depends on accumulating, summarizing and extracting rule knowledge, and accumulating that knowledge is itself a lengthy process. If unsupervised learning is used to extract rules, business experts still have to identify and confirm them, so the result depends on the experts' experience and professional skill.
No solution has yet been proposed for the problem in the related art that locating the root cause of a fault depends on the experience and skill of operation and maintenance personnel or business experts, which makes operation and maintenance inefficient and costly.
Summary
Embodiments of the present disclosure provide a method, a device, a storage medium and an electronic device for determining the root cause of a fault, so as to at least solve the problem in the related art that locating the root cause of a fault depends on the experience and skill of operation and maintenance personnel or business experts, which makes operation and maintenance inefficient and costly.
According to an embodiment of the present disclosure, a method for determining the root cause of a fault is provided, including:
obtaining current service fault data;
determining, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;
determining the fault root cause of the current service fault data according to the fault classification result.
According to another embodiment of the present disclosure, a device for determining the root cause of a fault is further provided, including:
an acquisition module, configured to obtain current service fault data;
a first determination module, configured to determine, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;
a second determination module, configured to determine the fault root cause of the current service fault data according to the fault classification result.
According to yet another embodiment of the present disclosure, a computer-readable storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps of any one of the above method embodiments.
According to yet another embodiment of the present disclosure, an electronic device is further provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of any one of the above method embodiments.
In the embodiments of the present disclosure, current service fault data is obtained; based on a pre-trained target GCN model, a fault classification result of the current service fault data is determined according to fault feature data corresponding to the current service fault data; and the fault root cause of the current service fault data is determined according to the fault classification result. This solves the problem in the related art that locating the root cause of a fault depends on the experience and skill of operation and maintenance personnel or business experts, which makes operation and maintenance inefficient and costly. The fault category of the current fault is inferred by the GCN model and the fault root cause is determined from the classification result, which improves operation and maintenance efficiency while reducing operation and maintenance cost.
Brief Description of the Drawings
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for the fault root cause determination method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of the fault root cause determination method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure;
FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on a GCN (graph convolutional neural network) according to this embodiment;
FIG. 5 is a schematic diagram of fault sample topology data according to this embodiment;
FIG. 6 is a schematic diagram of the GCN model according to this embodiment;
FIG. 7 is a block diagram of the fault root cause determination device according to this embodiment.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the drawings.
It should be noted that the terms "first", "second" and the like in the specification, the claims and the above drawings of the present disclosure are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
The method embodiments provided in the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for the fault root cause determination method according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the fault root cause determination method in the embodiments of the present disclosure. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and service chain address pool slicing processing, that is, implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a fault root cause determination method that runs on the above mobile terminal or network architecture. FIG. 2 is a flowchart of the fault root cause determination method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: obtain current service fault data;
Step S204: based on a pre-trained target GCN model, determine a fault classification result of the current service fault data according to the fault feature data corresponding to the current service fault data;
Step S206: determine the fault root cause of the current service fault data according to the fault classification result.
In this embodiment, step S204 may specifically include:
S2021: extract the fault topology data and the fault feature data of the current service fault data;
S2022: input the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, where the fault classification result is a target fault category matrix, and the target fault category matrix is the set of probabilities corresponding to each fault category. Further, S2022 may specifically include: forming a first feature matrix from the fault feature data of each node in the fault topology data, and forming a first adjacency matrix from the connection relationships between the nodes in the fault topology data; determining, from the first adjacency matrix, a second adjacency matrix that includes self-connections, where both the first adjacency matrix and the second adjacency matrix represent the connection relationships between nodes; determining a first degree matrix from the second adjacency matrix, where the first degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it; and inputting the first feature matrix, the second adjacency matrix and the first degree matrix into the target GCN model to obtain the fault classification result output by the target GCN model.
In this embodiment, step S206 may specifically include: determining the fault category with the highest probability in the target fault category matrix as the fault root cause of the current service fault data.
In an optional embodiment, FIG. 3 is a flowchart of GCN model training according to a preferred embodiment of the present disclosure. As shown in FIG. 3, the flow further includes the following steps:
Step S302: extract a preset number of historical network fault samples, where each historical network fault sample includes fault sample topology data, fault sample feature data, and label information of the fault category corresponding to the fault sample feature data;
Further, step S302 specifically includes: collecting historical raw data; calculating the distance between service data items in the historical raw data based on the connectivity and distance of the fault topology data and their weight, the time span and its weight, and preset rules and their weight; and grouping service data items whose distance is smaller than a preset threshold into the same cluster, to obtain the preset number of historical network fault samples.
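The grouping step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the disclosure: the record fields (`site`, `time`), the weights, the threshold and the hop-distance table are hypothetical, and the rule-based term is left out. Records whose weighted distance falls below the threshold are merged into one cluster with a union-find structure.

```python
from itertools import combinations


def weighted_distance(a, b, hops, w_topo=0.6, w_time=0.4, max_gap=600.0):
    """Combine topology distance and time gap into one score (assumed weights)."""
    if a["site"] == b["site"]:
        hop = 0
    else:
        key = (min(a["site"], b["site"]), max(a["site"], b["site"]))
        hop = hops.get(key)
    if hop is None:
        # Sites with no known path never end up in the same cluster.
        return float("inf")
    gap = abs(a["time"] - b["time"]) / max_gap  # normalised time span
    return w_topo * hop + w_time * gap


def cluster(records, hops, threshold=1.0):
    """Group records whose pairwise distance is below the threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if weighted_distance(records[i], records[j], hops) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical service data: alarms at sites 1 and 2 close in time,
# an unrelated alarm at site 5, and a much later alarm at site 2.
records = [
    {"site": 1, "time": 0.0},
    {"site": 2, "time": 30.0},
    {"site": 5, "time": 40.0},
    {"site": 2, "time": 5000.0},
]
hops = {(1, 2): 1}  # hop count between connected sites, keyed by sorted pair
clusters = cluster(records, hops)
```

Each resulting cluster would then be treated as one fault sample and labeled with its root fault category.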
Step S304: train an original GCN model using the fault sample topology data, the fault sample feature data and the label information of the fault category corresponding to the fault sample feature data, to obtain the target GCN model, where the fault sample topology data and the fault sample feature data are the inputs of the original GCN model, and the fault category output by the trained target GCN model for the fault sample feature data and the fault category actually corresponding to the fault sample feature data satisfy a preset objective function.
Further, step S304 may specifically include:
S3041: form a second feature matrix from the fault sample feature data of each node in the fault sample topology data, and form a third adjacency matrix from the connection relationships between the nodes in the fault sample topology data;
S3042: determine, from the third adjacency matrix, a fourth adjacency matrix that includes self-connections, where both the third adjacency matrix and the fourth adjacency matrix represent the connection relationships between nodes;
S3043: determine a second degree matrix from the fourth adjacency matrix, where the second degree matrix is a diagonal matrix composed of the degrees of the nodes, and the degree of a node is the number of nodes connected to it;
S3044: form a fault category matrix from the label information of the fault category corresponding to the fault sample feature data;
S3045: train the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix, to obtain the target GCN model.
Further, step S3045 may specifically include: determining the weight parameters of the target GCN model in the following manner:
$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} X W^{(0)}\right) W^{(1)}\right)$$
where $Z$ is the fault category matrix, ReLU is the activation function of the first layer of the original GCN model, Softmax is the activation function of the second layer of the original GCN model, $X$ is the second feature matrix, $\tilde{A}$ is the fourth adjacency matrix, $\tilde{D}$ is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;
the target GCN model is determined according to the weight parameters.
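The two-layer forward pass can be sketched in NumPy as follows. This is a minimal sketch that assumes the standard GCN form, in which both layers use the symmetrically normalised adjacency matrix built from the self-connected adjacency matrix and its degree matrix. The function names, the three-node graph, the random features and the layer sizes are illustrative assumptions, not values from the disclosure.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a 0-1 adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])          # add self-connections
    d = A_tilde.sum(axis=1)                   # node degrees (row sums)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt


def softmax(S):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(A, X, W0, W1):
    """Two-layer GCN: ReLU in the first layer, Softmax in the second."""
    A_hat = normalized_adjacency(A)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)      # layer 1: aggregate, transform, ReLU
    return softmax(A_hat @ H1 @ W1)           # layer 2: aggregate, transform, Softmax


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # hypothetical 3-node chain
X = rng.normal(size=(3, 5))                   # 3 nodes, 5 features each
W0 = rng.normal(size=(5, 8))                  # hidden size 8 (assumed)
W1 = rng.normal(size=(8, 4))                  # 4 fault categories
Z = gcn_forward(A, X, W0, W1)
```

Each row of `Z` is a probability distribution over the four fault categories for one node.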
FIG. 4 is a schematic diagram of root cause location of telecommunication equipment faults based on a GCN (graph convolutional neural network) according to this embodiment. As shown in FIG. 4, the system consists of a network fault sample extraction module 42, a network GCN model training module 44, a network current fault identification module 46 and a network GCN inference module 48.
In this embodiment, the telecommunication topology, the fault feature data and the labeling information are considered together. Fault features are extracted automatically through forward and backward propagation of the GCN, and the trained GCN model classifies the nodes involved in a newly occurring fault to identify the root fault.
Network fault sample extraction module 42: a telecommunication fault appears as a set of service data items that are spatially related in the topology and close in time, including the basic topology, alarm data and performance KPI data. For example, when a transmission interruption alarm occurs at a physical site, it triggers base-station out-of-service alarms at downstream sites and some KPI anomalies within a short time; all of these are treated as service data of that transmission interruption fault.
The distance between service data items is calculated along three dimensions: connectivity and distance in the topology with their weight, time span with its weight, and empirical rules with their weight. Service data items whose distance is smaller than a specific threshold are grouped into the same cluster as one fault, and each fault is labeled with its root fault category, either manually or automatically using empirical rules. The topology, alarms, performance KPIs and labeling information related to each fault are converted and used as the output of this module. The data format of a fault sample is described in detail below.
FIG. 5 is a schematic diagram of fault sample topology data according to this embodiment. As shown in FIG. 5, different types of telecommunication equipment form a network that carries upper-layer voice or digital services. The physical site is the basic unit of network composition and of operation and maintenance. Each physical site generally contains three types of equipment: power supply equipment, transmission equipment and communication equipment, and multiple physical sites make up the telecommunication network. The numbers in the nodes are physical site numbers; 8 physical sites in total form a directed graph. The topology is an input to GCN model training. It is in JSON format: each key is the ID of a physical site, and its value is the IDs of the directly adjacent neighbor sites:
Figure PCTCN2022098678-appb-000004
Figure PCTCN2022098678-appb-000005
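Converting such a topology into an adjacency matrix can be sketched as follows. The JSON string here is a hypothetical four-site sample, not the topology of FIG. 5, and the assumed layout (site ID as key, list of neighbor IDs as value) follows only the textual description above. Because the graph is directed, an edge is recorded only from a key to each of its listed neighbors.

```python
import json

import numpy as np

# Hypothetical sample: site 1 points to sites 2 and 3, site 2 points to site 4.
topology_json = '{"1": [2, 3], "2": [4], "3": [], "4": []}'


def adjacency_from_topology(text):
    """Build a 0-1 adjacency matrix from a {site: [neighbours]} JSON object."""
    topo = json.loads(text)
    ids = sorted({int(k) for k in topo} | {int(n) for v in topo.values() for n in v})
    index = {site: i for i, site in enumerate(ids)}   # site ID -> row/column
    A = np.zeros((len(ids), len(ids)))
    for site, neighbours in topo.items():
        for n in neighbours:
            A[index[int(site)], index[int(n)]] = 1.0
    return ids, A


ids, A = adjacency_from_topology(topology_json)
```

The returned `ids` list records which site each row and column of `A` refers to, so the rows of the feature matrix can be put in the same order.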
Fault sample feature data: a physical site is made up of different types of telecommunication equipment. When a fault occurs, the service data belonging to the same fault is extracted and converted, and the key equipment alarms and KPI performance data are combined, with alarms first and performance data after, each ordered by power supply equipment, transmission equipment and communication equipment. The format is shown in Table 1.
Table 1
Figure PCTCN2022098678-appb-000006
The frequently occurring characteristic alarms are arranged in a fixed order, and each site is marked 1 or 0 for each alarm according to whether that alarm occurred during the fault. The key KPI indicators are arranged in a specific order. The last column is the label indicating whether the node is a root node: power fault (PowerFault), transmission fault (TransFault), communication equipment fault (CommunicationFault) or normal node (Normal). Each fault sample includes the topology in JSON format and the feature data with fault classification labels in CSV format. Multiple fault samples form the output of this module and the input of the GCN model training module.
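Loading such a CSV sample into a feature matrix and a one-hot label matrix might look like the sketch below. The column names and values are hypothetical, since the actual layout of Table 1 is not reproduced here; only the four label names and the "alarms first, KPIs after, label last" ordering come from the description.

```python
import csv
import io

import numpy as np

LABELS = ["PowerFault", "TransFault", "CommunicationFault", "Normal"]

# Hypothetical sample: two alarm flags, one KPI value, then the label.
sample_csv = """site,alarm_power_off,alarm_link_down,kpi_drop_rate,label
1,1,0,0.02,PowerFault
2,0,1,0.35,Normal
3,0,0,0.01,Normal
"""


def load_features(text):
    """Return the feature matrix X (N x D) and one-hot label matrix Y (N x 4)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    feature_cols = [c for c in rows[0] if c not in ("site", "label")]
    X = np.array([[float(r[c]) for c in feature_cols] for r in rows])
    Y = np.zeros((len(rows), len(LABELS)))
    for i, r in enumerate(rows):
        Y[i, LABELS.index(r["label"])] = 1.0
    return X, Y


X, Y = load_features(sample_csv)
```

Here site 1 is labeled as the root node with a power fault, while the downstream sites are labeled Normal even though site 2 raised a derived alarm.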
Network GCN training module 44: graph convolution applies deep learning to graph data and extracts features from it, so that these features can be used to classify the nodes of the graph.
Suppose the graph data has N nodes, each with its own features. Let the node features form an N×D matrix X, and let the relationships between the nodes form an N×N matrix A, known as the adjacency matrix. X and A are the inputs of the GCN model. Within the network, propagation between layers is as follows:
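$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
where $\sigma$ is the activation function and $W^{(l)}$ is the weight matrix of layer $l$.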
Adjacency matrix A: represents the connection relationships between nodes and is assumed here to be a 0-1 matrix. $\tilde{A} = A + I_N$ denotes the adjacency matrix with self-connections added. The adjacency matrix $\tilde{A}$ corresponding to the topology of FIG. 5 is:
Figure PCTCN2022098678-appb-000010
Degree matrix $\tilde{D}$: the degree of a node is the number of nodes connected to it. $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The degree matrix corresponding to FIG. 5 is:
Figure PCTCN2022098678-appb-000013
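As a small worked example, consider a hypothetical three-node chain 1-2-3 (not the topology of FIG. 5). The example assumes the usual GCN convention that self-connections are added as the adjacency matrix plus the identity, and that a node's degree is the row sum of the resulting matrix. The last line shows the symmetrically normalised matrix that the layers then multiply by:

```latex
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},
\quad
\tilde{A} = A + I_3 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix},
\quad
\tilde{D} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}

\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} =
\begin{pmatrix}
\tfrac{1}{2} & \tfrac{1}{\sqrt{6}} & 0 \\
\tfrac{1}{\sqrt{6}} & \tfrac{1}{3} & \tfrac{1}{\sqrt{6}} \\
0 & \tfrac{1}{\sqrt{6}} & \tfrac{1}{2}
\end{pmatrix}
```

Entry (i, j) of the normalised matrix is the corresponding entry of the self-connected adjacency matrix divided by the square root of the product of the two nodes' degrees, so a well-connected node contributes less weight to each neighbor.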
Feature matrix X: corresponds to $H^{(0)}$ in the formula and represents the node features, $X \in \mathbb{R}^{N \times D}$, where D is the feature dimension:
Figure PCTCN2022098678-appb-000014
FIG. 6 is a schematic diagram of the GCN model according to this embodiment. As shown in FIG. 6, the GCN model is a two-layer GCN, with the following formula:
$$Z = \mathrm{Softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} X W^{(0)}\right) W^{(1)}\right)$$
Each GCN layer performs transformation, aggregation and activation.
Transformation: the current node features are transformed by learned weights; here this is the multiplication rule XW;
Aggregation: the features of neighboring nodes are aggregated to obtain the node's new features; here this is the simple addition rule AX;
Activation: an activation function is applied to add nonlinearity; the first-layer activation function is ReLU and the second-layer activation function is Softmax.
The sample data is fed into the GCN model for forward propagation. The cross-entropy between the output classification results and the manual labels is used as the loss function, and the loss function is optimized with a dynamic gradient descent method so that the model parameters are updated automatically until the loss stops decreasing. The trained parameters $W^{(0)}$ and $W^{(1)}$ give the required model, which is the output of this module.
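The training loop can be sketched in NumPy as below. Several simplifications are assumptions of this sketch rather than details from the disclosure: it uses plain fixed-step gradient descent instead of a dynamic (adaptive) variant, runs for a fixed number of steps instead of stopping when the loss plateaus, trains on a single hypothetical four-node sample, and assumes the standard symmetric normalisation of the self-connected adjacency matrix. The gradients are derived by hand for the two-layer ReLU/Softmax model with cross-entropy loss.

```python
import numpy as np


def normalize(A):
    """Symmetrically normalise A + I by the node degrees."""
    A_tilde = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(S):
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def train(A, X, Y, hidden=8, lr=0.2, steps=300, seed=0):
    """Fit W0 and W1 by gradient descent on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    A_hat = normalize(A)
    W0 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W1 = rng.normal(scale=0.1, size=(hidden, Y.shape[1]))
    AX = A_hat @ X                                # aggregation of input features
    losses = []
    for _ in range(steps):
        # Forward propagation.
        S1 = AX @ W0
        H1 = np.maximum(S1, 0.0)                  # ReLU
        AH = A_hat @ H1
        Z = softmax(AH @ W1)
        losses.append(-np.mean(np.sum(Y * np.log(Z + 1e-12), axis=1)))
        # Backward propagation.
        dS2 = (Z - Y) / len(Y)                    # gradient of softmax + cross-entropy
        dW1 = AH.T @ dS2
        dS1 = (A_hat.T @ dS2 @ W1.T) * (S1 > 0)   # back through aggregation and ReLU
        dW0 = AX.T @ dS1
        W0 -= lr * dW0
        W1 -= lr * dW1
    return W0, W1, losses


# Hypothetical sample: a 4-node chain, one-hot node features,
# node 0 labeled PowerFault and the others Normal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
Y = np.array([[1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)
W0, W1, losses = train(A, X, Y)
```

In practice the loop would iterate over many fault samples, and a stopping criterion based on the change in loss would replace the fixed step count.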
Network current fault identification module 46: whereas the network fault sample extraction module 42 extracts multiple samples from historical service data, the network current fault identification module 46 extracts the data of the single fault that is currently occurring. This data has not been manually labeled; the label is exactly what the network GCN inference module 48 outputs. The output of this module is the topology data of the current fault and the unlabeled fault feature data; in other respects it is the same as the network fault sample extraction module 42.
Network GCN inference module 48: based on the trained GCN model, when a new fault occurs, the topology output by the network current fault identification module 46 is converted into the adjacency matrix $\tilde{A}$, which together with the node features X forms the input of this module. Substituting them into Formula 2 (the two-layer GCN formula) gives the classification Z of each node ($Z \in \mathbb{R}^{N \times 4}$).
This embodiment takes the topology of the telecommunication network and the service data of each node as input features and trains a GCN model to identify root-cause faults. It identifies not only the faulty physical node but also the category of the fault. Using a GCN reduces the dependence on human expert experience, which makes the approach easier to promote and deploy. Because the topology data is taken into account, the model uses not only the feature data of a node itself but also aggregates the features of its adjacent nodes, which matches the way faults propagate. In addition, the node features cover both alarm data and the related key performance data, so the information is more complete. The results are good.
This embodiment is briefly described below, taking the topology of a telecommunication network with 9 physical sites as an example.
Prepare the network structure data and feature data. As shown in FIG. 5, the numbers in the nodes are physical site numbers; 8 physical sites in total form a topology, each containing power supply equipment, transmission equipment and communication equipment. The topology is an input to GCN model training. It is in JSON format: each key is the ID of a physical site, and its value is the IDs of the directly adjacent neighbor sites:
Figure PCTCN2022098678-appb-000017
The adjacency matrix $\tilde{A}$ corresponding to the topology of FIG. 5 is:
Figure PCTCN2022098678-appb-000020
Degree matrix $\tilde{D}$: the degree of a node is the number of nodes connected to it. $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The degree matrix corresponding to FIG. 5 is:
Figure PCTCN2022098678-appb-000024
Extract the feature data X of each node. A physical site is made up of different telecommunication equipment. When a fault occurs, the service data belonging to the same fault is extracted and converted, and the key equipment alarms and KPI performance data are combined: alarms first, then performance data, each ordered by power supply equipment, transmission equipment and communication equipment. The features for Figure 5 are shown in Table 2.
Table 2
Figure PCTCN2022098678-appb-000025
Feature matrix X: this corresponds to $H^{(0)}$ in the formula and represents the features of the nodes, $X \in \mathbb{R}^{N \times D}$, where N is the number of nodes and D is the feature dimension. Converting the table above into a feature matrix gives:
Figure PCTCN2022098678-appb-000026
Prepare multiple sets of fault topology data and feature sample data as the input for GCN training.
Train the GCN model. The inputs of the GCN model, namely the adjacency matrix
Figure PCTCN2022098678-appb-000027
the degree matrix
Figure PCTCN2022098678-appb-000028
and the feature matrix X, are now all available.
Figure PCTCN2022098678-appb-000029
Substituting these inputs into the above formula and iterating over multiple rounds of forward propagation and back-propagation trains the parameters $W^{(0)}$ and $W^{(1)}$, which yields the required model.
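The training loop can be illustrated with a small NumPy sketch. Because the formula itself survives only as an image, the code assumes the widely used two-layer GCN form $Z = \mathrm{softmax}(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\,W^{(1)})$ with $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, which matches the symbols the text names (ReLU, Softmax, the adjacency matrix with self-connections, the degree matrix, $W^{(0)}$, $W^{(1)}$). The cross-entropy loss, plain gradient descent, hidden size, learning rate, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def normalize(A_tilde):
    """Symmetric normalisation D^-1/2 * A_tilde * D^-1/2 (assumed form)."""
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_gcn(A_tilde, X, Y, hidden=8, lr=0.2, epochs=300, seed=0):
    """Train W0, W1 of a two-layer GCN with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    A_hat = normalize(A_tilde)
    W0 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W1 = rng.normal(scale=0.1, size=(hidden, Y.shape[1]))
    AX = A_hat @ X
    losses = []
    for _ in range(epochs):
        # Forward propagation
        H_pre = AX @ W0
        H = np.maximum(H_pre, 0.0)           # ReLU (layer 1)
        AH = A_hat @ H
        Z = softmax(AH @ W1)                 # Softmax (layer 2)
        losses.append(-np.mean(np.sum(Y * np.log(Z + 1e-12), axis=1)))
        # Back-propagation of the mean cross-entropy loss
        d_logits = (Z - Y) / X.shape[0]
        dW1 = AH.T @ d_logits
        dH = A_hat.T @ d_logits @ W1.T
        dW0 = AX.T @ (dH * (H_pre > 0))
        W0 -= lr * dW0
        W1 -= lr * dW1
    return W0, W1, losses

# Toy data (hypothetical): a 4-node path graph, one-hot features and labels.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)
X = np.eye(4)
Y = np.eye(4)
W0, W1, losses = train_gcn(A_tilde, X, Y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In practice a graph learning library and an adaptive optimizer would normally replace the hand-written gradients; they are spelled out here only to make the forward and backward passes visible.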
Locate and classify the fault root cause. Inputting a new adjacency matrix
Figure PCTCN2022098678-appb-000030
degree matrix
Figure PCTCN2022098678-appb-000031
and feature matrix X yields the classification Z of each node ($Z \in \mathbb{R}^{N \times 4}$), where N is the number of nodes and 4 is the number of classes per node: the probabilities of a power supply fault, a transmission fault, a communication equipment fault and no fault, respectively. For example, [0, 0.9, 0.1, 0] indicates that the node has a transmission fault.
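Reading the root cause off the output matrix amounts to taking the most probable class in each row. A minimal sketch follows; the class order comes from the text, the first row is the example vector from the text, and the second row, the `FAULT_CLASSES` list, and the `root_causes` helper are hypothetical names and data added for illustration.

```python
import numpy as np

# Class order as listed in the description.
FAULT_CLASSES = [
    "power supply fault",
    "transmission fault",
    "communication equipment fault",
    "no fault",
]

def root_causes(Z):
    """Map each row of the N x 4 probability matrix Z to its most likely class."""
    Z = np.asarray(Z, dtype=float)
    return [FAULT_CLASSES[i] for i in Z.argmax(axis=1)]

Z = [
    [0.0, 0.9, 0.1, 0.0],    # example vector from the description
    [0.05, 0.05, 0.1, 0.8],  # hypothetical second node
]
print(root_causes(Z))  # ['transmission fault', 'no fault']
```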
According to another embodiment of the present disclosure, a device for determining a fault root cause is also provided. FIG. 7 is a block diagram of the device according to this embodiment. As shown in FIG. 7, the device includes:
an acquisition module 72, configured to acquire current service fault data;
a first determining module 74, configured to determine, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;
a second determining module 76, configured to determine the fault root cause of the current service fault data according to the fault classification result.
In an exemplary embodiment, the first determining module 74 includes:
an acquisition sub-module, configured to extract the fault topology data and the fault feature data of the current service fault data;
an input sub-module, configured to input the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is the set of probabilities corresponding to each fault category.
In an exemplary embodiment, the second determining module 76 is further configured to
determine that the fault category with the maximum probability in the target fault category matrix is the fault root cause of the current service fault data.
In an exemplary embodiment, the input sub-module is further configured to:
form a first feature matrix from the fault feature data of each node in the fault topology data, and form a first adjacency matrix from the connection relationships between the nodes in the fault topology data;
determine, according to the first adjacency matrix, a second adjacency matrix that includes self-connections;
determine a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a matrix composed of the degrees of a plurality of nodes, the degree of each node is the number of nodes connected to that node, and the first degree matrix is a diagonal matrix;
input the first feature matrix, the second adjacency matrix and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.
In an exemplary embodiment, the device further includes:
an extraction module, configured to extract a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information of the fault category corresponding to the fault sample feature data;
a training module, configured to train the original GCN model using the fault sample topology data, the fault sample feature data, and the label information of the fault corresponding to the fault sample feature data, to obtain the target GCN model, wherein the topology structure and the fault sample feature data are the inputs of the original GCN model, and the fault category that the trained target GCN model outputs for the fault sample feature data and the fault category that actually corresponds to the fault sample feature data satisfy a preset objective function.
In an exemplary embodiment, the training module includes:
a composing sub-module, configured to form a second feature matrix from the fault sample feature data of each node in the fault topology data, and form a third adjacency matrix from the connection relationships between the nodes in the fault sample topology data;
a first determining sub-module, configured to determine, according to the third adjacency matrix, a fourth adjacency matrix that includes self-connections;
a second determining sub-module, configured to determine a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a matrix composed of the degrees of a plurality of nodes, the degree of each node is the number of nodes connected to that node, and the second degree matrix is a diagonal matrix;
a composing sub-module, configured to form a fault category matrix from the label information of the fault category corresponding to the fault sample feature data;
a training sub-module, configured to train the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix, to obtain the target GCN model.
In an exemplary embodiment, the training sub-module is further configured to
determine the weight parameters of the target GCN model in the following manner:
Figure PCTCN2022098678-appb-000032
where Z is the fault category matrix, ReLU is the layer-1 activation function of the original GCN model, Softmax is the layer-2 activation function of the original GCN model, X is the second feature matrix,
Figure PCTCN2022098678-appb-000033
is the fourth adjacency matrix,
Figure PCTCN2022098678-appb-000034
is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;
determine the target GCN model according to the weight parameters.
In an exemplary embodiment, the extraction module includes:
a collection sub-module, configured to collect historical raw data;
a calculation sub-module, configured to calculate the distance between service data in the historical raw data based on the connectivity and distance of the fault topology data and their weights, the time span and its weight, and preset rules and their weights;
a division sub-module, configured to assign service data whose distance is smaller than a preset threshold to the same cluster, to obtain the preset number of historical network fault samples.
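One way to read the sample extraction step is as a weighted distance followed by threshold-based grouping. The sketch below is only an interpretation under stated assumptions: the disclosure does not give the distance formula, so the weighted sum, the `Record` fields, the weights, the hop table, and the single-linkage merging via union-find are all hypothetical choices.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Record:
    site: int         # physical site that produced the alarm or KPI record
    timestamp: float  # seconds
    rule_group: str   # hypothetical tag assigned by a preset rule

def distance(a, b, hops, w_topo=1.0, w_time=1 / 300, w_rule=1.0):
    """Weighted sum of topology distance, time span and rule mismatch."""
    if a.site == b.site:
        topo = 0.0
    else:
        # Site pairs missing from the hop table are treated as unreachable.
        topo = hops.get(frozenset((a.site, b.site)), float("inf"))
    time_gap = abs(a.timestamp - b.timestamp)
    rule = 0.0 if a.rule_group == b.rule_group else 1.0
    return w_topo * topo + w_time * time_gap + w_rule * rule

def cluster(records, hops, threshold):
    """Put records whose pairwise distance is below threshold in one cluster."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if distance(records[i], records[j], hops) < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

records = [
    Record(site=1, timestamp=0, rule_group="power"),
    Record(site=2, timestamp=60, rule_group="power"),
    Record(site=5, timestamp=4000, rule_group="transmission"),
    Record(site=5, timestamp=4030, rule_group="transmission"),
]
hops = {frozenset((1, 2)): 1, frozenset((1, 5)): 3, frozenset((2, 5)): 2}
print(cluster(records, hops, threshold=2.0))  # [[0, 1], [2, 3]]
```

Each resulting cluster would then be treated as one historical fault sample, from which the topology data, feature data and label are taken.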
Embodiments of the present disclosure also provide a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps of any one of the above method embodiments.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store a computer program.
Embodiments of the present disclosure also provide an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps of any one of the above method embodiments.
In an exemplary embodiment, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples of this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations; details are not repeated here.
Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented by general-purpose computing devices. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. They can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described can be executed in an order different from the one given here. Alternatively, they can be fabricated as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present disclosure is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. For those skilled in the art, the present disclosure may be modified and varied in many ways. Any modification, equivalent replacement or improvement made within the principles of the present disclosure shall fall within its scope of protection.

Claims (11)

  1. A method for determining a fault root cause, comprising:
    acquiring current service fault data;
    determining, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;
    determining the fault root cause of the current service fault data according to the fault classification result.
  2. The method according to claim 1, wherein determining, based on the pre-trained target GCN model, the fault classification result of the current service fault data according to the fault feature data comprises:
    extracting fault topology data and the fault feature data of the current service fault data;
    inputting the fault topology data and the fault feature data into the target GCN model to obtain the fault classification result output by the target GCN model, wherein the fault classification result is a target fault category matrix, and the target fault category matrix is the set of probabilities corresponding to each fault category.
  3. The method according to claim 2, wherein determining the fault root cause of the current service fault data according to the fault classification result comprises:
    determining that the fault category with the maximum probability in the target fault category matrix is the fault root cause of the current service fault data.
  4. The method according to claim 2, wherein inputting the fault topology data and the fault feature data into the target GCN model to obtain the target fault classification result output by the target GCN model comprises:
    forming a first feature matrix from the fault feature data of each node in the fault topology data, and forming a first adjacency matrix from the connection relationships between the nodes in the fault topology data;
    determining, according to the first adjacency matrix, a second adjacency matrix that includes self-connections;
    determining a first degree matrix according to the second adjacency matrix, wherein the first degree matrix is a matrix composed of the degrees of a plurality of nodes, the degree of each node is the number of nodes connected to that node, and the first degree matrix is a diagonal matrix;
    inputting the first feature matrix, the second adjacency matrix and the first degree matrix into the target GCN model to obtain the target fault classification result output by the target GCN model.
  5. The method according to any one of claims 1 to 4, wherein, before determining, based on the pre-trained target GCN model, the fault classification result of the current service fault data according to the fault feature data corresponding to the current service fault data, the method further comprises:
    extracting a preset number of historical network fault samples, wherein the historical network fault samples include fault sample topology data, fault sample feature data, and label information of the fault category corresponding to the fault sample feature data;
    training an original GCN model using the fault sample topology data, the fault sample feature data, and the label information of the fault corresponding to the fault sample feature data, to obtain the target GCN model, wherein the topology structure and the fault sample feature data are the inputs of the original GCN model, and the fault category that the trained target GCN model outputs for the fault sample feature data and the fault category that actually corresponds to the fault sample feature data satisfy a preset objective function.
  6. The method according to claim 5, wherein training the original GCN model using the fault sample topology data, the fault sample feature data, and the label information of the fault corresponding to the fault sample feature data, to obtain the target GCN model, comprises:
    forming a second feature matrix from the fault sample feature data of each node in the fault topology data, and forming a third adjacency matrix from the connection relationships between the nodes in the fault sample topology data;
    determining, according to the third adjacency matrix, a fourth adjacency matrix that includes self-connections;
    determining a second degree matrix according to the fourth adjacency matrix, wherein the second degree matrix is a matrix composed of the degrees of a plurality of nodes, the degree of each node is the number of nodes connected to that node, and the second degree matrix is a diagonal matrix;
    forming a fault category matrix from the label information of the fault category corresponding to the fault sample feature data;
    training the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix, to obtain the target GCN model.
  7. The method according to claim 6, wherein training the original GCN model according to the second feature matrix, the fourth adjacency matrix, the second degree matrix and the fault category matrix, to obtain the target GCN model, comprises:
    determining the weight parameters of the target GCN model in the following manner:
    Figure PCTCN2022098678-appb-100001
    where Z is the fault category matrix, ReLU is the layer-1 activation function of the original GCN model, Softmax is the layer-2 activation function of the original GCN model, X is the second feature matrix,
    Figure PCTCN2022098678-appb-100002
    is the fourth adjacency matrix,
    Figure PCTCN2022098678-appb-100003
    is the second degree matrix, and $W^{(0)}$ and $W^{(1)}$ are the weight parameters;
    determining the target GCN model according to the weight parameters.
  8. The method according to claim 5, wherein extracting the preset number of historical network fault samples comprises:
    collecting historical raw data;
    calculating the distance between service data in the historical raw data based on the connectivity and distance of the fault topology data and their weights, the time span and its weight, and preset rules and their weights;
    assigning service data whose distance is smaller than a preset threshold to the same cluster, to obtain the preset number of historical network fault samples.
  9. A device for determining a fault root cause, comprising:
    an acquisition module, configured to acquire current service fault data;
    a first determining module, configured to determine, based on a pre-trained target GCN model, a fault classification result of the current service fault data according to fault feature data corresponding to the current service fault data;
    a second determining module, configured to determine the fault root cause of the current service fault data according to the fault classification result.
  10. A computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 8.
  11. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 8.
PCT/CN2022/098678 2021-09-06 2022-06-14 Fault root cause determination method and device, and storage medium and electronic device WO2023029654A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111039384.4 2021-09-06
CN202111039384.4A CN115774855A (en) 2021-09-06 2021-09-06 Fault root cause determination method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2023029654A1 true WO2023029654A1 (en) 2023-03-09

Family

ID=85387432

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098678 WO2023029654A1 (en) 2021-09-06 2022-06-14 Fault root cause determination method and device, and storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN115774855A (en)
WO (1) WO2023029654A1 (en)

Also Published As

Publication number Publication date
CN115774855A (en) 2023-03-10

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE