WO2022048652A1 - Fault location method, electronic device, and storage medium - Google Patents

Fault location method, electronic device, and storage medium

Info

Publication number
WO2022048652A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
tested
network
data
objects
Prior art date
Application number
PCT/CN2021/116527
Other languages
English (en)
French (fr)
Inventor
韩俊华
薄开涛
彭鑫
郭慧峰
何力
李伟
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP21863714.8A (patent/EP4156022A4/en)
Publication of WO2022048652A1 (patent/WO2022048652A1/zh)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Definitions

  • The embodiments of the present application relate to the field of computers, and in particular to a fault location method, an electronic device, and a storage medium.
  • Fault location currently relies mainly on extracting the characteristic information of network elements. Because the characteristic information of each network element is treated independently of the others, fault location is inaccurate.
  • An embodiment of the present application provides a method for locating faults, including: acquiring at least two objects to be tested in the network to be tested, characteristic data of the objects to be tested, and associations between the objects to be tested; generating an object relationship structure diagram according to the at least two objects to be tested, the characteristic data of the objects to be tested, and each of the associations; and locating the fault object in the network to be tested according to the object relationship structure diagram and a preset fault location model, where the fault location model has a graph neural network structure.
  • an embodiment of the present application further provides an electronic device, including: at least one processor; and,
  • the controller can perform the above-mentioned method of fault location.
  • the embodiments of the present application further provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the above-mentioned method for locating faults is implemented.
  • FIG. 1 is a flowchart of a method for locating faults according to the first embodiment of the present application
  • FIG. 2 is a flowchart of a method for locating a fault according to a second embodiment of the present application
  • FIG. 3 is a flowchart of a method for locating a fault according to a third embodiment of the present application.
  • FIG. 4 is a schematic diagram of an edge in a method for locating faults according to a third embodiment of the present application.
  • FIG. 5 is a schematic diagram of an edge in a method for locating faults according to a third embodiment of the present application.
  • FIG. 6 is a schematic diagram of an object relationship structure diagram in a method for locating faults according to a third embodiment of the present application.
  • FIG. 7 is a schematic diagram of aggregating sample nodes in a method for locating faults according to a third embodiment of the present application.
  • FIG. 8 is a structural block diagram of an electronic device according to a fourth embodiment of the present application.
  • the first embodiment of the present application relates to a method for locating faults, the process of which is shown in Figure 1:
  • Step 101 Acquire at least two objects to be tested in the network to be tested, characteristic data of the objects to be tested, and associations between the objects to be tested.
  • Step 102 Generate an object relationship structure diagram according to the at least two objects to be tested, the characteristic data of the objects to be tested, and each association relationship.
  • Step 103 According to the object relationship structure diagram and the preset fault location model, locate the fault object in the network under test, where the fault location model has a graph neural network structure.
  • The fault location method proposed in the present application obtains the objects to be tested in the network to be tested and the relationships between them, generates an object relationship structure diagram from them (thereby converting the network under test into the form of a graph), and locates the fault object in the network under test with the preset fault location model. The fault location model is obtained by training a graph neural network, which encodes a central node through its surrounding nodes, so training makes full use of the relationships between the surrounding nodes and the central node. Performing fault location with the graph neural network model and the object relationship structure diagram therefore makes the location of the fault object more accurate.
  • the second embodiment of the present application relates to a method for locating faults, and the method for locating faults is applied to electronic devices, such as servers and the like.
  • the second embodiment is an example of steps 101-103 in the first embodiment, and the process is shown in Figure 2:
  • Step 201 Acquire at least two objects to be tested in the network to be tested, characteristic data of the objects to be tested, and an association relationship between the objects to be tested.
  • The fault location method in this example is mainly used to locate faults in a network. The network to be tested can be any network, for example a clock and time synchronization network, a Synchronous Digital Hierarchy (SDH) network, a Packet Transport Network (PTN), an IP Radio Access Network (IPRAN), an Optical Transport Network (OTN), or an IP network.
  • The object to be tested is an object in the network to be tested on which fault location is to be performed, for example a network element or an optical fiber link in the network to be tested.
  • the characteristic data of the object to be tested may be acquired by means of collection, and the characteristic data may be alarm data or performance data in the object to be tested, or may include both alarm data and performance data.
  • the association relationship may be the connection relationship between the objects to be tested, and the relationship between the objects to be tested can be detected through data transmission.
  • Step 202 Take the object to be tested as a corresponding node.
  • Each object to be tested can be abstracted as a node. If the objects to be tested include network elements, each network element is converted into a corresponding node; if the objects to be tested also include optical fiber links, each optical fiber link is likewise converted into a corresponding node. For example, when the objects to be tested are network elements and optical fiber links, both the network elements and the optical fiber links in the network to be tested become nodes.
  • Step 203 Generate node information of the node corresponding to each object to be tested according to the characteristic data of that object.
  • The number of kinds of feature data is obtained first; the feature data is then converted into a feature vector whose dimension equals that number, and the feature vector is used as the node information of the node.
  • For example, if the feature data of the object to be tested consists of alarm data, the number N of alarm types is obtained; if it consists of performance data, the number M of performance data types is obtained; if it includes both, with N types of alarm data and M types of performance data, the number of kinds of feature data is N+M.
  • If the feature data includes alarm data, the alarm data is converted into digitally encoded data, the digitally encoded data is used as the feature vector, and the feature vector is used as the node information of the node.
  • Alarm data may be character data. To unify the data form, it can be converted into digitally encoded data, for example a 0/1 encoding in which each type of alarm corresponds to one feature dimension: the dimension is 1 if the alarm is present and 0 otherwise. With N different types of alarm, this yields an N-dimensional feature vector, which is recorded as the node information of the node.
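The 0/1 alarm encoding described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `encode_alarms` and the alarm names in `ALARM_TYPES` are hypothetical.

```python
from typing import Iterable, List, Sequence


def encode_alarms(active_alarms: Iterable[str], alarm_types: Sequence[str]) -> List[int]:
    """Return an N-dimensional 0/1 vector with one dimension per known alarm type."""
    active = set(active_alarms)
    return [1 if alarm in active else 0 for alarm in alarm_types]


# Hypothetical alarm catalogue (N = 4); the names are illustrative only.
ALARM_TYPES = ["LOS", "CLOCK_UNLOCK", "FREQ_OFFSET", "LINK_DOWN"]

# A node that currently raises two of the four alarm types.
vector = encode_alarms(["CLOCK_UNLOCK", "LINK_DOWN"], ALARM_TYPES)
```

Keeping the alarm catalogue in a fixed order matters here, because every node's vector must use the same dimension for the same alarm type.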
  • If the feature data includes performance data of the object to be tested, the performance data is normalized and the normalized performance data is used as the feature vector; alternatively, the performance data is assigned to one of at least two preset discrete numerical intervals, and the value corresponding to the interval in which the performance data falls is used as the feature vector.
  • The value of the performance data may also be used directly as the value in the feature vector, or the performance data may first be normalized or discretized.
  • Normalization maps the range of the performance data onto the interval from 0 to 1.
  • Discretization sets one or more thresholds, which divide the value range into multiple discrete numerical intervals. Each performance value is assigned to the interval it falls in, and the value corresponding to that interval is used as the value in the feature vector.
  • For example, a single threshold yields two discrete intervals: a performance value that exceeds the threshold falls into interval 1, whose corresponding value is 1; otherwise it falls into interval 0, whose corresponding value is 0. With multiple thresholds, the intervals are ordered from low to high and each corresponds to one value; three thresholds, for instance, give four intervals with the values 0, 1, 2, and 3.
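The two preprocessing options for performance data can be sketched as below. The text only states that values are mapped to the interval from 0 to 1, so min-max scaling over a known range is an assumption here, as are the function names and the example thresholds.

```python
from bisect import bisect_left
from typing import Sequence


def min_max_normalize(value: float, low: float, high: float) -> float:
    """Scale value into [0, 1], assuming the valid range [low, high] is known."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)  # out-of-range values are clipped
    return (clipped - low) / (high - low)


def discretize(value: float, thresholds: Sequence[float]) -> int:
    """Return the index of the interval that value falls into.

    thresholds must be sorted in ascending order. A value must strictly
    exceed a threshold to move into the next interval, so T thresholds
    produce the interval indices 0..T.
    """
    return bisect_left(thresholds, value)


# Hypothetical thresholds: three thresholds give four intervals (0, 1, 2, 3).
THRESHOLDS = [10.0, 20.0, 30.0]
interval = discretize(25.0, THRESHOLDS)
```

`bisect_left` counts the thresholds that are strictly smaller than the value, which matches the rule that only a value exceeding a threshold moves up one interval.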
  • Step 204 Generate an edge between every two nodes according to the association relationship between the objects to be tested.
  • the relationship between the objects to be tested is taken as an edge between every two nodes. For example, if network element A and network element B are connected, an edge is generated between the corresponding node A and node B.
  • Step 205 According to each node, the node information of each node, and each edge, an object relationship structure graph is formed.
  • the node, the node information of the node, and each edge are combined to form the object relationship structure diagram.
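Steps 202 to 205 can be summarized in a small data structure. The sketch below is an assumption-laden illustration: the class name `ObjectGraph`, its methods, and the node names are hypothetical, and edges are stored as unordered pairs because this embodiment does not yet distinguish directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ObjectGraph:
    """Object relationship structure graph: nodes, node information, and edges."""
    features: Dict[str, List[float]] = field(default_factory=dict)
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def add_node(self, name: str, feature_vector: List[float]) -> None:
        # Steps 202/203: one node per object, with its feature vector as node information.
        self.features[name] = list(feature_vector)

    def add_edge(self, a: str, b: str) -> None:
        # Step 204: one edge per association between two known objects.
        if a not in self.features or b not in self.features:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((a, b) if a <= b else (b, a))

    def neighbors(self, name: str) -> List[str]:
        return sorted(b if a == name else a for a, b in self.edges if name in (a, b))


# Step 205: combine nodes, node information, and edges into one graph.
graph = ObjectGraph()
graph.add_node("NE-A", [0, 1, 0.2])
graph.add_node("FIBER-1", [1, 0, 0.9])
graph.add_node("NE-B", [0, 0, 0.1])
graph.add_edge("NE-A", "FIBER-1")
graph.add_edge("FIBER-1", "NE-B")
```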
  • Step 206 According to the object relationship structure diagram and the preset fault location model, locate the fault object in the network under test.
  • the fault localization model is obtained by training based on the graph neural network structure in advance, and the fault localization model can be trained based on the node classification model of the graph neural network.
  • In the propagation formula, h denotes the representation vector of a node (its "Embedding"); the subscript v is the index of the current node; u is the index of a node adjacent to node v; the superscript k indicates the k-th layer; σ denotes the activation function; W_k and B_k are matrices; N(v) is the set of neighbor nodes of node v; and AGG(·) denotes the aggregation operation.
  • The basic version of the graph neural network propagation mechanism averages the neighbor node information of a node when aggregating and uses a neural network to perform the aggregation operation, where the node information of each node can be the node's representation vector h.
  • The mathematical description of this propagation mechanism is shown in formula (2):
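Formulas (1) and (2) themselves do not appear in this text. A commonly used form that is consistent with the symbols defined above is given here as an assumption, not as the application's exact notation: the general propagation rule with an unspecified aggregation, followed by the basic version that averages over neighbors.

```latex
% Assumed general propagation rule, using the symbols defined in the text
h_v^{k} = \sigma\!\left( W_k \cdot \mathrm{AGG}\!\left( \{\, h_u^{k-1} : u \in N(v) \,\} \right) + B_k \, h_v^{k-1} \right)

% Assumed basic version: AGG is the mean over the neighbor set N(v)
h_v^{k} = \sigma\!\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k \, h_v^{k-1} \right)
```

In both forms, $h_v^{0}$ is the initial feature vector of node $v$, and the term $B_k h_v^{k-1}$ carries the node's own representation from the previous layer.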
  • Graph neural networks include Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Gated Graph Neural Networks, Graph Isomorphism Networks (GIN), and graph sampling and aggregation networks (Graph SAmple and aggreGatE, GraphSAGE), among others.
  • The sample structure diagrams in the training set are used to train the model and obtain the network parameters of the fault location model; the model is then verified with the test sample set and the network parameters are adjusted to obtain the final fault location model.
  • the sample structure graph in the training set may include nodes and node relationship information, node labels, node feature data, and the like.
  • Node labels can be divided into faulty nodes and normal nodes, or divided into fault source nodes, fault-affected nodes, and normal nodes, according to the needs of fault location scenarios.
  • The sample networks behind the sample structure diagrams come from two sources: simulated network environments built in the laboratory and network environments in use on the live network.
  • the sample structure diagram can also be formed based on the data of the same sample network to be tested acquired at different times.
  • Any graph neural network model with node classification capability can be used, including but not limited to Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Gated Graph Neural Networks, Graph Isomorphism Networks (GIN), and GraphSAGE networks.
  • A softmax classifier is used as the node classifier in this example; other classifiers can also be selected.
  • A corresponding loss function needs to be designed to measure how far the model's predicted values deviate from the actual values, so that the node classification model based on the graph neural network can be trained.
  • Commonly used loss functions include the cross-entropy loss, 0-1 loss, squared loss, absolute loss, logarithmic loss, and exponential loss functions.
  • The cross-entropy loss function is used in this example.
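For a single node, the softmax classifier and cross-entropy loss named above can be written out as below. This is a plain-Python sketch for illustration; the function names are hypothetical, and a real training setup would compute these over all labeled nodes in a deep learning framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    """Negative log-probability assigned to the true class of one node."""
    return -math.log(max(probs[true_class], 1e-12))  # clamp to avoid log(0)


# Scores for the classes (normal, faulty) of one node; the values are made up.
probs = softmax([2.0, 0.0])
loss_if_normal = cross_entropy(probs, 0)
loss_if_faulty = cross_entropy(probs, 1)
```

The loss is small when the model assigns high probability to the node's true label and grows as that probability shrinks, which is the deviation the loss function is meant to measure.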
  • the fault location model based on the graph neural network structure is trained by using the training sample set. After the model training is completed, the test sample set is used to verify the effect of model node classification. When the accuracy of node classification reaches the standard used by actual business, the parameters obtained by model training can be solidified for application of fault location.
  • The object relationship structure diagram is displayed together with the location of the fault object in it.
  • The result for each object in the network to be tested can be displayed as a topology map or as a list; the topology map gives an intuitive presentation.
  • In the topology map, different node types are rendered with different background colors. For example, faulty nodes can be rendered with a red background, nodes and edges that will be affected in the direction of fault propagation with a yellow or orange background, and normal nodes and edges with a green background or no background color.
  • In a list, faulty nodes and normal nodes can be distinguished by adding a column.
  • the third embodiment of the present application relates to a method for locating faults.
  • The third embodiment is an improvement on the second embodiment. The main improvement is that this embodiment determines whether each association relationship is directional and, if so, converts the association relationship into a directional edge. The process is shown in Figure 3.
  • Step 301 Acquire at least two objects to be tested in the network to be tested, characteristic data of the objects to be tested, and an association relationship between the objects to be tested.
  • Step 302 Use the object to be tested as a corresponding node.
  • Step 303 Generate node information of the node corresponding to each object to be tested according to the characteristic data of that object.
  • Steps 301 to 303 in this example are substantially the same as steps 201 to 203 in the second embodiment, and will not be repeated here.
  • Step 304 Perform the following processing for each association relationship: determine whether the association relationship is directional, and if so, convert it into an edge that represents the direction.
  • A directional association is abstracted as a directed edge. For example, if network element A transmits data unidirectionally to network element B, the association relationship between them is directional, and an edge as shown in Figure 4 is formed, with node A pointing to node B. If the objects to be tested include both optical fiber links and network elements, and the data of node A enters the optical fiber through its input port 1 and leaves through its output port 2 to reach node C, the optical fiber link is taken as node B and the edges shown in Figure 5 are formed.
  • Each edge has corresponding edge information, which includes the source end and the destination end. For example, the source end is node A and the destination end is input port 1 of the optical fiber link.
  • If the association relationship is not directional, it is converted directly into the corresponding edge, forming an undirected graph.
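Step 304 can be sketched as a conversion from association records to an edge list. The `Association` type and `to_edges` function are hypothetical names, and representing an undirected association as a pair of opposite directed edges is a common convention assumed here, not something the text prescribes.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Association(NamedTuple):
    source: str      # source end of the association
    target: str      # destination end of the association
    directed: bool   # True if data or faults flow only from source to target


def to_edges(associations: Iterable[Association]) -> List[Tuple[str, str]]:
    """Convert associations to (source, destination) edges."""
    edges: List[Tuple[str, str]] = []
    for assoc in associations:
        edges.append((assoc.source, assoc.target))
        if not assoc.directed:
            # Assumed convention: an undirected edge is stored in both directions.
            edges.append((assoc.target, assoc.source))
    return edges


# Node A sends data through fiber link B to node C; C and D are linked both ways.
edges = to_edges([
    Association("A", "B", directed=True),
    Association("B", "C", directed=True),
    Association("C", "D", directed=False),
])
```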
  • Step 305 According to each node, the node information of each node, and each edge, an object relationship structure graph is formed.
  • This step is substantially the same as step 205 in the second embodiment, and will not be repeated here.
  • Step 306 According to the object relationship structure diagram and the preset fault location model, locate the fault object in the network under test.
  • The training process of the fault location model may include: acquiring a sample structure diagram from the training set, where the sample structure diagram is generated from the sample objects in a sample network, the feature data of each sample object, and the association relationships between the sample objects; and, if the edges in the sample structure diagram are directional, performing the following aggregation for each sample node during training: aggregating, along the propagation direction, the node information of the sample node and the node information of its adjacent nodes, where the adjacent nodes are the other sample nodes within a preset distance of the sample node.
  • The preset distance can be expressed as the number of layers between the adjacent node and the sample node. For example, as shown in Figures 6 and 7, when the aggregation operation is performed on node A (the rectangles in Figure 7 represent aggregation) and k is set to 2, the adjacent nodes are the sample nodes within 2 layers of node A.
  • Layer 0 is the input layer, and the representation vector at layer 0 is the initial Embedding of each sample node.
  • At layer 1, the Embedding of sample node B is propagated from the Embeddings of its neighboring nodes A and C, the Embedding of sample node C is propagated from the Embeddings of nodes A, B, E, and F, and the Embedding of sample node D is propagated from the Embedding of node A.
  • At layer 2, the Embedding of sample node A is propagated from the Embeddings of nodes B, C, and D.
  • The adjacent nodes of sample node A are therefore nodes B, C, and D at layer 1 and sample nodes A, B, C, E, and F at layer 0.
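The layer-by-layer neighborhood in this example can be reproduced with a short sketch. Figure 6 is not available in this text, so the adjacency below is reconstructed from the neighbor lists stated above and is an assumption; `neighbors_by_hop` is a hypothetical helper name.

```python
from typing import Dict, List, Set


def neighbors_by_hop(adjacency: Dict[str, List[str]], start: str, k: int) -> List[Set[str]]:
    """Return, for each hop 1..k, the nodes whose embeddings feed the aggregation.

    hops[0] holds the nodes aggregated directly into start, hops[1] holds the
    nodes aggregated into those, and so on. A node may reappear at a later hop,
    as in a message-passing computation tree.
    """
    hops: List[Set[str]] = []
    frontier = {start}
    for _ in range(k):
        frontier = {n for node in frontier for n in adjacency.get(node, [])}
        hops.append(frontier)
    return hops


# Adjacency reconstructed from the text: each key lists the nodes it aggregates from.
ADJACENCY = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E", "F"],
    "D": ["A"],
}

hops = neighbors_by_hop(ADJACENCY, "A", k=2)
```

With k = 2, the first hop corresponds to the layer-1 nodes in the text and the second hop to the layer-0 nodes.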
  • If the association relationships are directional, a directed graph is established, and during aggregation only the feature information of adjacent nodes in the direction of fault propagation is aggregated. This reduces the computational cost, avoids interference from adjacent but irrelevant node information, and improves the accuracy of the fault location model.
  • a network under test is used as an example to introduce the entire fault location process.
  • Scenario 1 The network to be tested is a clock synchronization network.
  • the clock synchronization network is mainly responsible for synchronizing the clock frequency of each network device in the network, and controlling the clock frequency deviation of each network device within the required range.
  • Clock synchronization is directional, that is, the clock frequency is synchronized to the downstream network device through the upstream network device.
  • When a fault occurs, it propagates in a direction: if an upstream network device fails, the downstream network devices become abnormal.
  • The goal of fault location is to quickly find the faulty network device when the clock synchronization network fails.
  • For fault diagnosis of the clock synchronization network, the objects to be tested and the relationships between them are converted into nodes and edges, and an object relationship structure diagram is generated.
  • The objects to be tested include network elements, physical fiber links, and external clock sources, which are converted into nodes. The connections between network elements, physical fiber links, and external clock sources are converted into edges.
  • For example, when the objects to be tested are network elements, edges are established according to the link relationships between the network elements. Because fault propagation in the clock synchronization network is directional, each association relationship is converted into a directed edge.
  • Clock alarms and performance data of network elements can be collected as characteristic data of the object to be tested.
  • the alarm and performance data generated by the physical ports connecting both ends of the physical fiber link can be used as its characteristic data.
  • For an external clock source, the clock alarms and performance data of the network elements connected to it can be collected and used as its characteristic data.
  • Each node's feature data is converted into a feature vector.
  • Alarm data is converted into a feature vector with 0/1 encoding: each type of clock-related alarm corresponds to one feature dimension, represented by the value 1 if the alarm exists and by the value 0 otherwise.
  • If there are N types of clock-related alarms, an N-dimensional feature vector is obtained.
  • Performance data in this example is normalized, with each performance value mapped to the interval from 0 to 1. Each type of performance data corresponds to one feature dimension, so M types of performance indicators yield an M-dimensional feature vector. The feature vectors of the alarm data and the performance data are concatenated into an N+M-dimensional feature vector, which is used as the node information of the node.
  • the types of nodes in the object relationship structure diagram can be divided, for example, they can be divided into fault nodes and normal nodes, or divided into fault source nodes, fault affected nodes and normal nodes.
  • the types of nodes are divided into faulty nodes and normal nodes.
  • the sample structure graph includes node, edge information, node labels, and node information.
  • the source of the sample structure diagram includes the simulated network environment based on the laboratory and the real network environment used in the live network.
  • an end-to-end graph neural network based fault localization model is preset.
  • the graph neural network model can be a graph attention network
  • the node classifier can be a softmax classifier
  • the loss function can be a cross-entropy loss function.
  • The input of the fault location model is the nodes and edges of the graph together with the feature vector of each node, and the output is the type label of each node. Note that when the aggregation calculation of the graph neural network is performed, only the adjacent nodes in the direction of fault propagation are aggregated.
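Direction-aware aggregation can be illustrated with one mean-aggregation step over a directed edge list. This sketch assumes that "adjacent nodes in the direction of fault propagation" means the upstream nodes whose edges point at the current node; the function name and the toy network are hypothetical, and the learned matrices and activation of a real graph neural network layer are omitted.

```python
from typing import Dict, List, Sequence, Tuple


def aggregate_upstream(
    features: Dict[str, Sequence[float]],
    directed_edges: Sequence[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """For each node, average the feature vectors of the nodes pointing to it."""
    incoming: Dict[str, List[str]] = {node: [] for node in features}
    for src, dst in directed_edges:
        incoming[dst].append(src)

    aggregated: Dict[str, List[float]] = {}
    for node, vec in features.items():
        sources = incoming[node]
        if not sources:
            # No upstream neighbor: nothing propagates into this node.
            aggregated[node] = [0.0] * len(vec)
        else:
            aggregated[node] = [
                sum(features[s][i] for s in sources) / len(sources)
                for i in range(len(vec))
            ]
    return aggregated


# Toy clock network: the clock source feeds ne1 and ne2, and ne1 also feeds ne2.
FEATURES = {"clk": [1.0, 0.0], "ne1": [0.0, 1.0], "ne2": [0.0, 0.0]}
EDGES = [("clk", "ne1"), ("ne1", "ne2"), ("clk", "ne2")]
aggregated = aggregate_upstream(FEATURES, EDGES)
```

Downstream nodes never contribute to an upstream node's aggregate here, which is how the directed graph keeps adjacent but irrelevant node information out of the calculation.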
  • the fault location model is trained by using the training set for the node classification model based on the graph neural network.
  • A training sample structure graph includes the nodes and edges of the graph and the label and feature vector of each node. After model training is completed, the test set is used to verify the effect of the training.
  • A test sample structure graph likewise includes the nodes and edges of the graph and the label and feature vector of each node.
  • The network parameters of the fault location model are continuously updated through training and verification; when the accuracy of fault location reaches the standard required by actual services, the parameters obtained by model training can be solidified to produce the fault location model.
  • The process of locating the fault object in the network under test with the trained fault location model is as follows. First, the information of the clock synchronization network in which the fault is to be located is collected, including the objects to be tested, the relationships between the objects to be tested, and the characteristic data of each object to be tested. Then the collected feature data is preprocessed and converted into digitally encoded data, which is used as the feature vector of the node; the objects to be tested are used as nodes and the relationships between them as edges.
  • The nodes, node information, and edges form an object relationship structure diagram, which is input into the fault location model to obtain the node type of each node.
  • If the node type of a node is the faulty node type, the location of the faulty node is obtained and fault location is complete.
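The last step, reading fault locations off the model output, can be sketched as below. The label order in `LABELS`, the function name, and the example scores are assumptions for illustration; the text only states that nodes whose predicted type is the faulty type are reported.

```python
from typing import Dict, List, Sequence

# Assumed label order of the classifier output; the text does not fix it.
LABELS = ("normal", "faulty")


def locate_faulty_nodes(node_scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the nodes whose highest-scoring class is 'faulty'."""
    faulty: List[str] = []
    for node, scores in node_scores.items():
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if LABELS[predicted] == "faulty":
            faulty.append(node)
    return sorted(faulty)


# Made-up per-node class probabilities, as a softmax classifier would output.
located = locate_faulty_nodes({
    "NE-A": [0.9, 0.1],
    "NE-B": [0.2, 0.8],
    "FIBER-1": [0.6, 0.4],
})
```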
  • The result can be presented as a topology map or as a list.
  • The topology map gives an intuitive presentation, with different node types rendered in different background colors. For example, faulty nodes can be rendered with a red background, nodes and edges that will be affected in the direction of fault propagation with a yellow or orange background, and normal nodes and edges with a green background or no background color.
  • In a list, faulty nodes and normal nodes can be distinguished by adding a column.
  • The fault location model in this example uses the feature information of each clock synchronization network node and also makes full use of the feature information of the adjacent nodes around it.
  • Fault propagation in the clock synchronization network is directional.
  • Aggregation for a node is computed over its own node information and the node information of its adjacent nodes in the fault propagation direction, which reduces the computational cost and also reduces interference from irrelevant node information.
  • Scenario 2: The network under test is a bearer network.
  • The bearer network is mainly responsible for transmitting service data and can provide services such as Layer 2 virtual private network (L2VPN) or Layer 3 virtual private network (L3VPN).
  • Data transmission in the bearer network covers both the sending and the receiving direction, so faults can propagate in two directions.
  • When a network element node or a physical fiber link of the bearer network fails, the surrounding adjacent data transmission nodes are affected, and the fault may propagate in either direction. Fault propagation in this example is therefore treated as non-directional.
  • The goal of fault location is to find the faulty node quickly when the bearer network fails.
  • For the bearer network, the objects to be tested include network elements and physical fiber links, and each object to be tested is taken as a node.
  • The association between network elements and physical fiber links is taken as an edge.
  • Because fault propagation in the bearer network has no direction, undirected edges are generated.
  • Clock-related alarms and performance data of network elements can be collected as the feature data of the objects to be tested.
  • For a physical fiber link, the alarms and performance data generated by the physical ports at both ends of the link can be used as its feature data.
  • For an external clock source, data can be collected from the network elements connected to it, and their clock-related alarms and performance data used as its feature data.
  • The feature data of each node is converted into a feature vector.
  • For alarm data, this example uses 0/1 encoding: each type of clock-related alarm corresponds to one feature dimension, with the value 1 if the alarm is present and the value 0 otherwise.
  • For each node, N different types of alarm yield an N-dimensional feature vector.
  • For performance data, this example normalizes each performance value into the interval from 0 to 1. Each type of performance data corresponds to one feature dimension, so M types of performance indicator yield an M-dimensional feature vector.
  • The feature vectors of the alarm data and the performance data are concatenated into an (N+M)-dimensional feature vector, which is used as the node information of the node.
  • Depending on the fault location scenario, the nodes in the object relationship structure graph can be divided into types, for example into faulty nodes and normal nodes, or into fault root-cause nodes, fault-affected nodes and normal nodes.
  • In this example, nodes are divided into faulty nodes and normal nodes.
  • Each sample structure graph includes node and edge information, node labels and node information.
  • The sample structure graphs come from laboratory-simulated network environments and from real network environments in live use.
  • An end-to-end fault location model based on a graph neural network is preset.
  • The graph neural network model can be a graph attention network, the node classifier a softmax classifier, and the loss function a cross-entropy loss function.
  • The input of the fault location model is the nodes and edges of the graph and the feature vector of each node; the output is the type label of each node. Note that because fault propagation is not directional here, the aggregation step of the graph neural network must aggregate each node together with all of its surrounding adjacent nodes.
  • The training set is used to train the graph-neural-network-based fault location model.
  • Each training sample structure graph contains the nodes and edges of the graph and the label and feature vector of each node. After training, the test set is used to verify the trained model.
  • Each test sample structure graph likewise contains the nodes and edges of the graph and the label and feature vector of each node.
  • The network parameters of the fault location model are continuously updated through training and verification; once the fault location accuracy meets the standard required by actual services, the parameters obtained by training can be frozen to produce the fault location model.
  • Locating the faulty object in the network under test with the trained fault location model proceeds as follows. First, information about the clock synchronization network in which the fault is to be located is collected, including the objects to be tested, the relationships between them, and the feature data of each object to be tested. Then the collected feature data is preprocessed and converted into numerically encoded data, which serves as the feature vector of each node; each object to be tested is taken as a node, and the relationships between objects are taken as edges.
  • The nodes, node information and edges form an object relationship structure graph; this graph is input into the fault location model to obtain the node type of each node.
  • The object relationship structure graph includes nodes, node information and edges. If a node's type is the faulty-node type, the position of that faulty node is obtained, which completes the fault location.
  • The result can be presented as a topology map or as a list.
  • A topology map presents the result intuitively, with different node types rendered in different background colors. For example, faulty nodes can be rendered with a red background, nodes and edges that would be affected along the fault propagation direction with a yellow or orange background, and normal nodes and edges with a green background or no background color.
  • In list form, faulty nodes and normal nodes can be distinguished by adding a column.
  • The fourth embodiment of the present application relates to an electronic device whose structural block diagram is shown in FIG. 8. The electronic device includes at least one processor 401 and a memory 402 communicatively connected to the at least one processor 401, wherein the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 so that the at least one processor 401 can perform the fault location method described above.
  • The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and links together the various circuits of the one or more processors and the memory.
  • The bus may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here.
  • The bus interface provides an interface between the bus and the transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and for general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data that the processor uses when performing operations.
  • The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the fault location method described above.
  • The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

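The embodiments of the present application relate to the field of computers, and in particular to a fault location method, an electronic device and a storage medium. The fault location method provided by the embodiments includes: acquiring at least two objects to be tested in a network under test, feature data of the objects to be tested, and the association relationships between the objects to be tested; generating an object relationship structure graph from the at least two objects to be tested, their feature data and the association relationships; and locating a faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the fault location model having a graph neural network structure.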

Description

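Fault location method, electronic device and storage medium
Cross-Reference
This application is based on, and claims priority to, Chinese patent application No. 202010921946.7, filed on September 4, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computers, and in particular to a fault location method, an electronic device and a storage medium.
Background
As networks grow larger and their structures more complex, locating a network fault quickly when one occurs becomes very important. Machine-learning-based methods are commonly used for fault location.
However, machine-learning-based network fault location methods mainly extract the feature information of individual network elements, and the feature information of the different network elements is treated as mutually independent, which makes fault location inaccurate.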
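Summary
To achieve the above objective, an embodiment of the present application provides a fault location method, including: acquiring at least two objects to be tested in a network under test, feature data of the objects to be tested, and the association relationships between the objects to be tested; generating an object relationship structure graph from the at least two objects to be tested, their feature data and the association relationships; and locating a faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the fault location model having a graph neural network structure.
To achieve the above objective, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the fault location method described above.
To achieve the above objective, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the fault location method described above.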
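Brief Description of the Drawings
FIG. 1 is a flowchart of the fault location method according to the first embodiment of the present application;
FIG. 2 is a flowchart of the fault location method according to the second embodiment of the present application;
FIG. 3 is a flowchart of the fault location method according to the third embodiment of the present application;
FIG. 4 is a schematic diagram of an edge in the fault location method according to the third embodiment of the present application;
FIG. 5 is a schematic diagram of an edge in the fault location method according to the third embodiment of the present application;
FIG. 6 is a schematic diagram of the object relationship structure graph in the fault location method according to the third embodiment of the present application;
FIG. 7 is a schematic diagram of aggregating sample nodes in the fault location method according to the third embodiment of the present application;
FIG. 8 is a structural block diagram of the electronic device according to the fourth embodiment of the present application.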
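Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand, however, that many technical details are given in the embodiments so that the reader can better understand the application; the technical solutions claimed in this application can still be implemented without these technical details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and should not limit the specific implementation of this application; the embodiments may be combined with and refer to one another where they do not contradict.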
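The first embodiment of the present application relates to a fault location method, whose flow is shown in FIG. 1:
Step 101: Acquire at least two objects to be tested in the network under test, the feature data of the objects to be tested, and the association relationships between the objects to be tested.
Step 102: Generate an object relationship structure graph from the at least two objects to be tested, their feature data and the association relationships.
Step 103: Locate the faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the fault location model having a graph neural network structure.
The fault location method proposed in this application acquires the objects to be tested in the network under test and the association relationships between them, generates an object relationship structure graph from them so that the network under test is converted into graph form, and locates the faulty object in the network under test with the preset fault location model. The fault location model is obtained by training a graph neural network. A graph neural network encodes a central node through its surrounding nodes, so training makes full use of the relationships between the surrounding nodes and the central node; performing fault location with this graph neural network model and the object relationship structure graph therefore locates the faulty object more accurately.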
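The second embodiment of the present application relates to a fault location method applied to an electronic device such as a server. The second embodiment illustrates steps 101 to 103 of the first embodiment by example; its flow is shown in FIG. 2:
Step 201: Acquire at least two objects to be tested in the network under test, the feature data of the objects to be tested, and the association relationships between the objects to be tested.
In some examples, the fault location method is mainly used to locate faults in a network. The network under test can be any kind of network, for example a clock and time synchronization network, a Synchronous Digital Hierarchy (SDH) network, a Packet Transport Network (PTN), an IP Radio Access Network (IPRAN), an Optical Transport Network (OTN) or an IP network. An object to be tested is an object in the network under test in which a fault needs to be located, for example a network element or a fiber link.
The feature data of an object to be tested can be obtained by collection. The feature data may be the alarm data or the performance data of the object, or both. An association relationship may be a connection relationship between objects to be tested, and can be detected from the transmission of data between them.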
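Step 202: Take each object to be tested as a corresponding node.
In some examples, the objects to be tested are used as nodes. If the objects to be tested are network elements, each network element is converted into a corresponding node; if the objects to be tested also include fiber links, each fiber link is likewise converted into a corresponding node. In other words, when the objects to be tested are network elements and fiber links, both the network elements and the fiber links of the network under test are abstracted as nodes.
Step 203: Generate the node information of the node corresponding to each object to be tested from the feature data of that object.
In one example, the number of kinds of feature data is determined, the feature data is converted into a feature vector whose dimension equals that number, and the feature vector is used as the node information of the node.
In some examples, the number of kinds of feature data is obtained as follows. If the feature data includes several kinds of alarm, the number N of alarm kinds is obtained; if the feature data includes performance data, the number M of performance data kinds is obtained; if the feature data includes both alarm data with N kinds and performance data with M kinds, the number of kinds of feature data is N+M.
In one example, if the feature data includes alarm data, the alarm data is converted into numerically encoded data, the numerically encoded data is used as the feature vector, and the feature vector is used as the node information of the node.
In some examples, because alarm data may be character data, it can be converted into numerically encoded data to give the data a uniform form, and the numerically encoded data is used as the feature data. For example, alarm data can be represented with 0/1 encoding: each type of alarm corresponds to one feature dimension, which is 1 if the alarm is present and 0 otherwise. N different types of alarm data yield an N-dimensional feature vector, which is recorded as the node information of the node.
In one example, if the feature data includes the performance data of the object to be tested, the performance data is normalized and the normalized performance data is used as the feature vector; alternatively, the performance data is distributed over at least two preset discrete value intervals, and the value corresponding to the interval in which the performance data falls is used as the feature vector.
In some examples, if the feature data includes performance data, the performance values can be used directly as the values in the feature vector; to make the feature vector easier to represent, the performance data can also be normalized or discretized. Normalization maps the range of the performance data into the interval from 0 to 1. Discretization sets one or more thresholds, which define several discrete value intervals; the performance data is assigned to these intervals according to the thresholds, and the value corresponding to the interval in which it falls is used as the value in the feature vector. For example, with one threshold and the two resulting intervals, a performance value above the threshold is assigned to discrete interval 1, whose value is 1; otherwise it is assigned to discrete interval 0, whose value is 0. With several thresholds, the range is divided from low to high into discrete intervals, each with its own value, and the value of the interval in which the performance data falls is taken; three thresholds, for instance, give four discrete intervals with the values 0, 1, 2 and 3.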
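The encoding described in step 203 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the alarm names, performance indicator names, value ranges and helper names are all hypothetical, and min-max scaling is assumed as the normalization.

```python
from typing import Dict, List, Sequence, Tuple


def encode_alarms(active: Sequence[str], alarm_types: Sequence[str]) -> List[float]:
    """0/1 encoding: one dimension per alarm type, 1.0 if that alarm is present."""
    present = set(active)
    return [1.0 if alarm in present else 0.0 for alarm in alarm_types]


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalization into [0, 1], clamped at the bounds (assumed scheme)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def discretize(value: float, thresholds: Sequence[float]) -> int:
    """Number of thresholds exceeded; T thresholds give the values 0..T."""
    return sum(1 for t in thresholds if value > t)


def node_feature_vector(
    active_alarms: Sequence[str],
    alarm_types: Sequence[str],
    perf: Dict[str, float],
    perf_ranges: Dict[str, Tuple[float, float]],
) -> List[float]:
    """Concatenate the N alarm dimensions and the M performance dimensions."""
    alarm_part = encode_alarms(active_alarms, alarm_types)
    perf_part = [normalize(perf[name], lo, hi) for name, (lo, hi) in perf_ranges.items()]
    return alarm_part + perf_part


# Hypothetical alarm types (N = 3) and performance indicators (M = 2).
ALARM_TYPES = ["CLOCK_LOSS", "FREQ_OFFSET", "HOLDOVER"]
PERF_RANGES = {"freq_deviation_ppb": (0.0, 100.0), "phase_error_ns": (0.0, 1000.0)}

vector = node_feature_vector(
    ["FREQ_OFFSET"],
    ALARM_TYPES,
    {"freq_deviation_ppb": 25.0, "phase_error_ns": 500.0},
    PERF_RANGES,
)
print(vector)  # [0.0, 1.0, 0.0, 0.25, 0.5]
```

With three thresholds, `discretize` returns one of the four values 0, 1, 2 or 3, matching the discretization variant described above.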
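Step 204: Generate an edge between each pair of nodes according to the association relationships between the objects to be tested.
In some examples, the association relationship between two objects to be tested is used as the edge between the corresponding two nodes. For example, if network element A is connected to network element B, an edge is generated between the corresponding nodes A and B.
Step 205: Form the object relationship structure graph from every node, the node information of every node, and every edge.
The nodes, their node information and the edges together form the object relationship structure graph.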
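Steps 202 to 205 amount to building an attributed graph. The sketch below assumes a plain adjacency-set representation with undirected edges; the function name and the node identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_object_graph(
    features: Dict[str, List[float]],
    relations: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Return an adjacency map with one entry per object to be tested.

    `features` maps each node to its node information (feature vector);
    `relations` lists the association relationships as node pairs.
    """
    relations = list(relations)
    unknown = {name for pair in relations for name in pair} - set(features)
    if unknown:
        raise ValueError(f"relation refers to unknown objects: {sorted(unknown)}")
    adjacency: Dict[str, Set[str]] = {name: set() for name in features}
    for a, b in relations:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Two network elements joined by one fiber link; the link is itself a node.
features = {"NE-A": [1.0, 0.2], "NE-B": [0.0, 0.9], "FIBER-1": [0.0, 0.0]}
adjacency = build_object_graph(features, [("NE-A", "FIBER-1"), ("FIBER-1", "NE-B")])
```

The pair (`features`, `adjacency`) then plays the role of the object relationship structure graph: nodes, node information and edges.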
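Step 206: Locate the faulty object in the network under test according to the object relationship structure graph and the preset fault location model.
In some examples, the fault location model is obtained in advance by training a graph neural network structure; it can be trained as a graph-neural-network node classification model.
To make this example easier to understand, graph neural networks are introduced below.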
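The propagation mechanism of a graph neural network is described mathematically by formula (1):
Figure PCTCN2021116527-appb-000001
Here h denotes the representation vector of a node (its "Embedding"), the subscript v is the index of the current node, u is the index of a node adjacent to node v, the superscript k denotes layer k of the adjacent nodes, σ is the activation function, W_k and B_k are matrices, N(v) is the set of neighbor nodes of node v, and AGG(·) is the aggregation operation. When k = 0,
Figure PCTCN2021116527-appb-000002
where x_v is the input feature vector of node v.
The basic version of the propagation mechanism aggregates the information of a node's neighbors by averaging and uses a neural network to perform the aggregation; the node information of each node can be the node's representation vector h. This propagation mechanism is described mathematically by formula (2):
Figure PCTCN2021116527-appb-000003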
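The formula images are not reproduced in this text, so the sketch below does not claim to be formula (2) itself. It assumes the common mean-aggregation form that matches the symbol definitions above, namely h_v^k = σ(W_k · mean over u in N(v) of h_u^(k-1) + B_k · h_v^(k-1)) with h_v^0 = x_v, and it assumes ReLU for σ. All function names are hypothetical.

```python
from typing import Dict, List, Set

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_aggregate(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of the neighbor embeddings (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def propagate(
    h: Dict[str, Vector], neighbors: Dict[str, Set[str]], W: Matrix, B: Matrix
) -> Dict[str, Vector]:
    """One propagation layer: relu(W @ mean(neighbor embeddings) + B @ own embedding)."""
    new_h = {}
    for v, h_v in h.items():
        agg = mean_aggregate([h[u] for u in neighbors[v]], len(h_v))
        new_h[v] = relu([a + b for a, b in zip(matvec(W, agg), matvec(B, h_v))])
    return new_h


# Layer 0 embeddings are the input feature vectors x_v.
h0 = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained W_k and B_k

h1 = propagate(h0, neighbors, identity, identity)
print(h1["A"])  # [1.5, 1.0]
```

Calling `propagate` k times, each time with that layer's matrices, yields the layer-k embeddings; in a trained model W_k and B_k would be learned rather than set to the identity.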
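Formula (2) is the basic version of the graph neural network propagation mechanism; other versions can also be used. According to their propagation mechanism, graph neural networks can be divided into Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Gated Graph Neural Networks, Graph Isomorphism Networks (GIN), GraphSAGE (Graph SAmple and aggreGatE) networks, and others.
In this example, the preset graph-neural-network-based fault location model is trained on the sample structure graphs in the training set to obtain the network parameters of the model; the model is then verified on the test sample set and the network parameters adjusted accordingly, yielding the fault location model.
In some examples, a sample structure graph in the training set may include the nodes and node relationship information, node labels, node feature data, and so on. Depending on the fault location scenario, node labels may be divided into faulty nodes and normal nodes, or into fault root-cause nodes, fault-affected nodes and normal nodes. The sample networks underlying the sample structure graphs come from laboratory-simulated network environments and from network environments in live use. Sample structure graphs can also be formed from data of the same sample network acquired at different times.
An end-to-end graph-neural-network-based fault location model is built according to the needs of fault location. Any graph neural network model capable of node classification can be chosen, including but not limited to Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), Gated Graph Neural Networks, Graph Isomorphism Networks (GIN) and GraphSAGE networks.
To classify nodes, a classifier must also be appended after the node Embedding; it maps the Embedding that each node obtains from graph neural network propagation to the corresponding output class. This example uses a softmax classifier, although other classifiers can be chosen. A corresponding loss function must also be designed to measure how far the model's predictions deviate from the actual values, so that the graph-neural-network node classification model can be trained. Common loss functions include the cross-entropy loss, 0-1 loss, squared loss, absolute loss, logarithmic loss and exponential loss; this example uses the cross-entropy loss.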
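The classifier and loss named above can be illustrated with a short sketch. It shows only the standard softmax and cross-entropy definitions applied to one node's class scores; the mapping from Embedding to scores (the classifier's weights) is omitted, and the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn per-class scores into probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-probability assigned to the true class of one node."""
    return -math.log(max(probs[label], 1e-12))


# Two classes (index 0 = normal, index 1 = faulty) for a single node.
probs = softmax([0.0, 0.0])
loss = cross_entropy(probs, 1)
```

During training, the per-node losses over all labeled nodes of a sample structure graph would typically be averaged and minimized with respect to the network parameters.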
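The graph-neural-network-based fault location model is trained on the training sample set. After training, the test sample set is used to verify the model's node classification performance. Once the node classification accuracy meets the standard required by actual services, the trained parameters can be frozen and the model applied to fault location.
In one example, the object relationship structure graph and the position of the faulty object in it are displayed.
In some examples, the result for each object in the network under test can be displayed as a topology map or as a list. A topology map presents the result intuitively, with different node types rendered in different background colors: faulty nodes can be rendered with a red background, nodes and edges that would be affected along the fault propagation direction with a yellow or orange background, and normal nodes and edges with a green background or no background color. In list form, faulty nodes and normal nodes can be distinguished by adding a column.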
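Both presentation forms reduce to a mapping from node type to display attribute. The sketch below assumes three hypothetical type names ("fault", "affected", "normal") and leaves the actual drawing to whatever topology viewer is in use.

```python
from typing import Dict, List, Tuple

# Background colors per node type, following the color scheme described above.
BACKGROUND = {"fault": "red", "affected": "orange", "normal": "green"}


def topology_styles(node_types: Dict[str, str]) -> Dict[str, str]:
    """Background color for each node in the topology map."""
    return {node: BACKGROUND[node_type] for node, node_type in node_types.items()}


def list_rows(node_types: Dict[str, str]) -> List[Tuple[str, str]]:
    """List form: one row per node, with an added column holding the node type."""
    return [(node, node_types[node]) for node in sorted(node_types)]


node_types = {"NE2": "affected", "NE1": "fault", "NE3": "normal"}
styles = topology_styles(node_types)
rows = list_rows(node_types)
```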
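The third embodiment of the present application relates to a fault location method. It improves on the second embodiment, the main improvement being that this embodiment checks whether an association relationship is directional and, if so, converts it into a directed edge. Its flow is shown in FIG. 3.
Step 301: Acquire at least two objects to be tested in the network under test, the feature data of the objects to be tested, and the association relationships between the objects to be tested.
Step 302: Take each object to be tested as a corresponding node.
Step 303: Generate the node information of the node corresponding to each object to be tested from the feature data of that object.
Steps 301 to 303 in this example are substantially the same as steps 201 to 203 in the second embodiment and are not described again here.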
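Step 304: For each association relationship, determine whether it is directional and, if so, convert it into an edge that represents that direction.
In some examples, it is determined whether an association relationship is directional; if it is, the relationship is abstracted as a directed edge. For example, if network element A transmits data to network element B in one direction only, the association relationship between A and B is directional, and the edge shown in FIG. 4 can be formed, pointing from node A to node B. If the objects to be tested include both fiber links and network elements, and the data of node A enters the fiber at input port 1 and is transmitted through output port 2 to node C, with the fiber link as node B, the edges shown in FIG. 5 can be formed. Each edge has corresponding edge information that includes a source end and a destination end; for example, the source end is node A and the destination end is input port 1 of the fiber link.
If an association relationship has no direction, it is converted directly into a corresponding edge, forming an undirected graph.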
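Step 304 can be sketched as a per-relationship branch. The representation is an assumption made for illustration: each relationship is a hypothetical `(source, destination, is_directional)` triple, a directed edge is stored once, and an undirected edge is stored in both directions.

```python
from typing import Dict, Iterable, Set, Tuple


def build_edges(relations: Iterable[Tuple[str, str, bool]]) -> Dict[str, Set[str]]:
    """Map each node to the set of nodes its edges point to."""
    successors: Dict[str, Set[str]] = {}
    for source, destination, is_directional in relations:
        successors.setdefault(source, set()).add(destination)
        successors.setdefault(destination, set())
        if not is_directional:
            # No direction: the edge can be traversed both ways.
            successors[destination].add(source)
    return successors


# Chain resembling the description of FIG. 5: node A -> fiber link (node B) -> node C.
directed = build_edges([("A", "B", True), ("B", "C", True)])
undirected = build_edges([("A", "B", False)])
```

Port-level edge information (for example, input port 1 as the destination end) could be carried as an extra attribute per edge; it is left out here to keep the sketch short.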
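Step 305: Form the object relationship structure graph from every node, the node information of every node, and every edge.
This step is substantially the same as step 205 in the second embodiment and is not described again here.
Step 306: Locate the faulty object in the network under test according to the object relationship structure graph and the preset fault location model.
In one example, the training process of the fault location model may include: acquiring the sample structure graphs in the training set, each generated from the sample objects in a sample network, the feature data of each sample object, and the association relationships between the sample objects; and, if the edges of a sample structure graph are directional, performing the following aggregation for each sample node during training on that graph: aggregating, along the propagation direction, the node information of the sample node and the node information of its adjacent nodes, where the adjacent nodes are the other sample nodes whose distance from the sample node is within a preset distance.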
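In some examples, the adjacent nodes of a sample node may be the other sample nodes within a preset distance of it, and the distance between sample nodes can be expressed as the number of the layer in which the adjacent node lies. For example, as shown in FIG. 6 and FIG. 7, suppose node A is to be aggregated (the rectangles in FIG. 7 denote aggregation) and k is set to 2; the adjacent nodes are then the sample nodes within 2 layers of node A.
Layer 0 is the input layer, where the representation vector of each sample node is its initial Embedding. Layer 1: the Embedding of sample node B is propagated from the Embeddings of its neighbors A and C; the Embedding of sample node C is propagated from the Embeddings of nodes A, B, E and F; the Embedding of sample node D is propagated from the Embedding of sample node A. Layer 2: the Embedding of sample node A is propagated from the Embeddings of its neighbors B, C and D. In other words, the adjacent nodes of sample node A are nodes B, C and D in layer 1 and sample nodes A, C, B, E and F in layer 0. In the aggregation operation, the node information of sample node A and of its adjacent nodes is aggregated as computed by formula (1); because the preset distance is set to 2, k = 2 can be set in formula (1) in advance.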
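The set of nodes that contribute to node A when k = 2 can be found with a bounded breadth-first search. FIG. 6 is not reproduced in this text, so the adjacency below is inferred from the layer-by-layer description above (A is adjacent to B, C, D; B to A, C; C to A, B, E, F; D to A) and should be read as an assumption; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List


def nodes_within(adjacency: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Hop distance of every node reachable from `start` in at most `max_hops` steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the preset distance
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


# Adjacency inferred from the description of FIG. 6 and FIG. 7 (an assumption).
ADJACENCY = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E", "F"],
    "D": ["A"],
    "E": ["C"],
    "F": ["C"],
}

print(nodes_within(ADJACENCY, "A", 2))  # {'B': 1, 'C': 1, 'D': 1, 'E': 2, 'F': 2}
```

With the preset distance reduced to 1, only B, C and D would feed node A, which corresponds to a single propagation layer.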
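It is worth noting that if fault propagation is directional and a directed graph is built, then when the graph neural network aggregates the feature information of surrounding adjacent nodes, it can aggregate only the feature information of the adjacent nodes in the fault propagation direction. This reduces the computational cost and avoids interference from nodes that are adjacent but irrelevant, which improves the accuracy of the fault location model.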
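One way to read "adjacent nodes in the fault propagation direction" is that, for a directed graph, a node aggregates only from the nodes whose faults can propagate to it, that is, its upstream neighbors. That reading is an assumption, as are the function and node names in the sketch below.

```python
from typing import Iterable, Set, Tuple


def neighbors_for_aggregation(
    edges: Iterable[Tuple[str, str]], node: str, directed: bool
) -> Set[str]:
    """Neighbors whose node information is aggregated into `node`.

    Directed graph: only sources of edges that end at `node` (upstream nodes).
    Undirected graph: every node that shares an edge with `node`.
    """
    if directed:
        return {source for source, destination in edges if destination == node}
    return {a if b == node else b for a, b in edges if node in (a, b)}


# Hypothetical clock distribution: SRC -> NE1 -> NE2 and NE1 -> NE3.
edges = [("SRC", "NE1"), ("NE1", "NE2"), ("NE1", "NE3")]

upstream_only = neighbors_for_aggregation(edges, "NE1", directed=True)
all_neighbors = neighbors_for_aggregation(edges, "NE1", directed=False)
```

For NE1 the directed variant aggregates one neighbor instead of three, which is where the reduction in computation and in irrelevant information comes from.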
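The whole fault location process is described below using example networks under test.
Scenario 1: The network under test is a clock synchronization network.
A clock synchronization network is mainly responsible for synchronizing the clock frequencies of the network devices in a network, keeping the clock frequency deviation of each device within the required range. Clock synchronization is directional: clock frequency is synchronized from upstream network devices to downstream network devices. Fault propagation is therefore directional as well: a fault in an upstream device causes anomalies in downstream devices. The goal of fault location is to find the faulty network device quickly when the clock synchronization network fails.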
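The objects to be tested in clock synchronization network fault diagnosis, and the relationships between them, are converted into nodes and edges to generate the object relationship structure graph. For a clock synchronization network the objects to be tested include network elements, physical fiber links and external clock sources, which are converted into nodes; the connection relationships between them are converted into edges. For example, if the objects to be tested are network elements, edges are built from the link relationships between the network elements. Because fault propagation in a clock synchronization network is directional, the association relationships are converted into directed edges. By collecting information about the network elements, physical fiber links and external clock sources of the clock synchronization network, the corresponding nodes, edges and node information are obtained, forming a directed object relationship structure graph.
Clock-related alarms and performance data of network elements can be collected as the feature data of the objects to be tested. For a physical fiber link, the alarms and performance data generated by the physical ports at both ends of the link can be used as its feature data. For an external clock source, data can be collected from the network elements connected to it, and their clock-related alarms and performance data used as its feature data.
The feature data of each node is converted into a feature vector. For alarm data, this example uses 0/1 encoding: each type of clock-related alarm corresponds to one feature dimension, with the value 1 if the alarm is present and the value 0 otherwise. For each node, N different types of alarm yield an N-dimensional feature vector. For performance data, this example normalizes each performance value into the interval from 0 to 1. Each type of performance data corresponds to one feature dimension, so M types of performance indicator yield an M-dimensional feature vector. The feature vectors of the alarm data and the performance data are concatenated into an (N+M)-dimensional feature vector, which is used as the node information of the node.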
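Depending on the fault location scenario, the nodes in the object relationship structure graph can be divided into types, for example into faulty nodes and normal nodes, or into fault root-cause nodes, fault-affected nodes and normal nodes. In this example, nodes are divided into faulty nodes and normal nodes.
Before the fault location model is trained, multiple sample structure graphs are acquired to form the training set; each sample structure graph includes nodes, edge information, node labels, node information, and so on. The sample structure graphs come from laboratory-simulated network environments and from real network environments in live use.
After the training set is obtained, an end-to-end graph-neural-network-based fault location model is preset. The graph neural network model can be a graph attention network, the node classifier a softmax classifier, and the loss function a cross-entropy loss function. The input of the fault location model is the nodes and edges of the graph and the feature vector of each node; the output is the type label of each node. Note that the aggregation step of the graph neural network aggregates the adjacent nodes in the fault propagation direction.
The fault location model, built as a graph-neural-network node classification model, is trained on the training set; each training sample structure graph contains the nodes and edges of the graph and the label and feature vector of each node. After training, the test set is used to verify the trained model; each test sample structure graph likewise contains the nodes and edges of the graph and the label and feature vector of each node. The network parameters of the fault location model are continuously updated through training and verification; once the fault location accuracy meets the standard required by actual services, the parameters obtained by training can be frozen to produce the fault location model.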
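The verification step can be illustrated by the acceptance check alone. The sketch assumes node classification accuracy on the test set as the metric and a hypothetical `required_accuracy` threshold standing in for "the standard required by actual services"; the training loop itself is not shown.

```python
from typing import Dict


def accuracy(predicted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of labeled test nodes whose predicted type matches the label."""
    if not expected:
        raise ValueError("expected labels must not be empty")
    correct = sum(1 for node, label in expected.items() if predicted.get(node) == label)
    return correct / len(expected)


def can_freeze(predicted: Dict[str, str], expected: Dict[str, str], required_accuracy: float) -> bool:
    """True once the model is accurate enough for its parameters to be frozen."""
    return accuracy(predicted, expected) >= required_accuracy


expected = {"A": "fault", "B": "normal", "C": "fault", "D": "normal"}
predicted = {"A": "fault", "B": "normal", "C": "normal", "D": "normal"}

print(accuracy(predicted, expected))  # 0.75
```

If `can_freeze` returns False, training and verification would continue with further parameter updates.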
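Locating the faulty object in the network under test with the trained fault location model proceeds as follows. First, information about the clock synchronization network in which the fault is to be located is collected, including the objects to be tested, the relationships between them, and the feature data of each object to be tested. Then the collected feature data is preprocessed and converted into numerically encoded data, which serves as the feature vector of each node; each object to be tested is taken as a node and the relationships between objects as edges, and the nodes, node information and edges form the object relationship structure graph. This graph, which includes nodes, node information and edges, is input into the fault location model to obtain the node type of each node. If a node's type is the faulty-node type, the position of that faulty node is obtained, which completes the fault location.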
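The last step of this procedure, turning the model output into located faults, can be sketched as follows. The sketch assumes the model returns one probability per class for each node and that the class order is ("normal", "fault"); both the output format and the names are assumptions.

```python
from typing import Dict, List, Sequence


def locate_faults(
    node_scores: Dict[str, Sequence[float]],
    class_names: Sequence[str] = ("normal", "fault"),
) -> List[str]:
    """Return the nodes whose highest-scoring class is "fault"."""
    faulty = []
    for node, scores in node_scores.items():
        best = max(range(len(scores)), key=scores.__getitem__)
        if class_names[best] == "fault":
            faulty.append(node)
    return faulty


# Hypothetical per-node class probabilities produced by the trained model.
scores = {"NE1": [0.9, 0.1], "NE2": [0.2, 0.8], "FIBER-1": [0.6, 0.4]}
print(locate_faults(scores))  # ['NE2']
```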
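The result can be presented as a topology map or as a list. A topology map presents the result intuitively, with different node types rendered in different background colors. For example, faulty nodes can be rendered with a red background, nodes and edges that would be affected along the fault propagation direction with a yellow or orange background, and normal nodes and edges with a green background or no background color. In list form, faulty nodes and normal nodes can be distinguished by adding a column.
The fault location model in this example uses the feature information of each clock synchronization network node and also makes full use of the feature information of the adjacent nodes around it. The graph neural network exploits the fault feature information of the objects to be tested more fully, so the fault location accuracy is higher. In addition, fault propagation in the clock synchronization network is directional: when the graph neural network aggregates the feature information of surrounding adjacent nodes, aggregation for a node is computed over its own node information and the node information of its adjacent nodes in the fault propagation direction, which reduces the computational cost and also reduces interference from irrelevant node information.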
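Scenario 2: The network under test is a bearer network.
The bearer network is mainly responsible for transmitting service data and can provide services such as Layer 2 virtual private network (L2VPN) or Layer 3 virtual private network (L3VPN). Data transmission in the bearer network covers both the sending and the receiving direction, so faults can propagate in two directions. When a network element node or a physical fiber link of the bearer network fails, the surrounding adjacent data transmission nodes are affected, and the fault may propagate in either direction; fault propagation in this example is therefore treated as non-directional. The goal of fault location is to find the faulty node quickly when the bearer network fails.
For the bearer network, the objects to be tested include network elements and physical fiber links, and each object to be tested is taken as a node. The association between network elements and physical fiber links is taken as an edge. Because fault propagation in the bearer network has no direction, undirected edges are generated. By collecting information about the network element nodes and physical fiber links of the bearer network, the node information of the corresponding nodes is obtained, and the object relationship structure graph of the bearer network, which is an undirected graph, is generated from the nodes, node information and edges.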
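Clock-related alarms and performance data of network elements can be collected as the feature data of the objects to be tested. For a physical fiber link, the alarms and performance data generated by the physical ports at both ends of the link can be used as its feature data. For an external clock source, data can be collected from the network elements connected to it, and their clock-related alarms and performance data used as its feature data.
The feature data of each node is converted into a feature vector. For alarm data, this example uses 0/1 encoding: each type of clock-related alarm corresponds to one feature dimension, with the value 1 if the alarm is present and the value 0 otherwise. For each node, N different types of alarm yield an N-dimensional feature vector. For performance data, this example normalizes each performance value into the interval from 0 to 1. Each type of performance data corresponds to one feature dimension, so M types of performance indicator yield an M-dimensional feature vector. The feature vectors of the alarm data and the performance data are concatenated into an (N+M)-dimensional feature vector, which is used as the node information of the node.
Depending on the fault location scenario, the nodes in the object relationship structure graph can be divided into types, for example into faulty nodes and normal nodes, or into fault root-cause nodes, fault-affected nodes and normal nodes. In this example, nodes are divided into faulty nodes and normal nodes.
Before the fault location model is trained, multiple sample structure graphs are acquired to form the training set; each sample structure graph includes node and edge information, node labels, node information, and so on. The sample structure graphs come from laboratory-simulated network environments and from real network environments in live use.
After the training set is obtained, an end-to-end graph-neural-network-based fault location model is preset. The graph neural network model can be a graph attention network, the node classifier a softmax classifier, and the loss function a cross-entropy loss function. The input of the fault location model is the nodes and edges of the graph and the feature vector of each node; the output is the type label of each node. Note that because fault propagation is not directional here, the aggregation step of the graph neural network must aggregate each node together with all of its surrounding adjacent nodes.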
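The graph-neural-network-based fault location model is trained on the training set; each training sample structure graph contains the nodes and edges of the graph and the label and feature vector of each node. After training, the test set is used to verify the trained model; each test sample structure graph likewise contains the nodes and edges of the graph and the label and feature vector of each node. The network parameters of the fault location model are continuously updated through training and verification; once the fault location accuracy meets the standard required by actual services, the parameters obtained by training can be frozen to produce the fault location model.
Locating the faulty object in the network under test with the trained fault location model proceeds as follows. First, information about the clock synchronization network in which the fault is to be located is collected, including the objects to be tested, the relationships between them, and the feature data of each object to be tested. Then the collected feature data is preprocessed and converted into numerically encoded data, which serves as the feature vector of each node; each object to be tested is taken as a node and the relationships between objects as edges, and the nodes, node information and edges form the object relationship structure graph. This graph, which includes nodes, node information and edges, is input into the fault location model to obtain the node type of each node. If a node's type is the faulty-node type, the position of that faulty node is obtained, which completes the fault location.
The result can be presented as a topology map or as a list. A topology map presents the result intuitively, with different node types rendered in different background colors. For example, faulty nodes can be rendered with a red background, nodes and edges that would be affected along the fault propagation direction with a yellow or orange background, and normal nodes and edges with a green background or no background color. In list form, faulty nodes and normal nodes can be distinguished by adding a column.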
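The fourth embodiment of the present application relates to an electronic device whose structural block diagram is shown in FIG. 8. The electronic device includes at least one processor 401 and a memory 402 communicatively connected to the at least one processor 401, wherein the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 so that the at least one processor 401 can perform the fault location method described above.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and links together the various circuits of the one or more processors and the memory. The bus may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. The bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data that the processor uses when performing operations.
The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the fault location method described above.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing this application, and that in practice various changes in form and detail can be made to them without departing from the spirit and scope of this application.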

Claims (10)

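  1. A fault location method, comprising:
    acquiring at least two objects to be tested in a network under test, feature data of the objects to be tested, and association relationships between the objects to be tested;
    generating an object relationship structure graph according to the at least two objects to be tested, the feature data of the objects to be tested, and the association relationships;
    locating a faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the fault location model having a graph neural network structure.
  2. The fault location method according to claim 1, wherein the acquiring at least two objects to be tested in a network under test, feature data of the objects to be tested, and association relationships between the objects to be tested comprises:
    taking each object to be tested as a corresponding node;
    generating node information of the node corresponding to the object to be tested according to the feature data of the object to be tested;
    generating an edge between every two of the nodes according to the association relationships between the objects to be tested;
    forming the object relationship structure graph from each node, the node information of each node, and each edge.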
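  3. The fault location method according to claim 2, wherein the generating an edge between every two of the nodes according to the association relationships between the objects to be tested comprises:
    performing the following processing for each association relationship:
    determining whether the association relationship is directional and, if so, converting the association relationship into an edge representing the directionality.
  4. The fault location method according to claim 2 or 3, wherein the acquiring the feature data of the object to be tested and generating the node information of the node corresponding to the object to be tested comprises:
    obtaining the number of kinds of the feature data;
    converting the feature data into a feature vector whose dimension equals the number of kinds;
    using the feature vector as the node information of the node.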
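  5. The fault location method according to claim 4, wherein, if the feature data includes performance data of the object to be tested,
    the converting the feature data into a feature vector whose dimension equals the number of kinds comprises:
    normalizing the performance data and using the normalized performance data as the feature vector; or
    distributing the performance data over at least two preset discrete value intervals, and taking the value corresponding to the discrete value interval in which the performance data falls as the feature vector.
  6. The fault location method according to claim 4, wherein, if the feature data includes alarm data,
    the converting the feature data into a feature vector whose dimension equals the number of kinds comprises:
    converting the alarm data into numerically encoded data;
    using the numerically encoded data as the feature vector.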
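  7. The fault location method according to claim 4, wherein, before the locating a faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the method further comprises:
    acquiring sample structure graphs in a training set, each sample structure graph being generated from sample objects in a sample network and the association relationships between the sample objects;
    if the edges in the sample structure graph are directional, performing, during training on the sample structure graph, the following aggregation for each sample node: aggregating, along the propagation direction, the node information of the sample node and the node information of the adjacent nodes of the sample node, the adjacent nodes being other sample nodes whose distance from the sample node is within a preset distance.
  8. The fault location method according to any one of claims 1 to 7, wherein, after the locating a faulty object in the network under test according to the object relationship structure graph and a preset fault location model, the method comprises:
    displaying the object relationship structure graph and the position of the faulty object in the object relationship structure graph.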
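  9. An electronic device, comprising: at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform the fault location method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the fault location method according to any one of claims 1 to 8.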
PCT/CN2021/116527 2020-09-04 2021-09-03 Fault location method, electronic device and storage medium WO2022048652A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21863714.8A EP4156022A4 (en) 2020-09-04 2021-09-03 FAULT LOCATION METHOD, ELECTRONIC DEVICE AND RECORDING MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010921946.7 2020-09-04
CN202010921946.7A CN114221857A (zh) 2020-09-04 2020-09-04 Fault location method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022048652A1 true WO2022048652A1 (zh) 2022-03-10

Family

ID=80491622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116527 WO2022048652A1 (zh) 2020-09-04 2021-09-03 Fault location method, electronic device and storage medium

Country Status (3)

Country Link
EP (1) EP4156022A4 (zh)
CN (1) CN114221857A (zh)
WO (1) WO2022048652A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
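CN115481692A (zh) * 2022-09-23 2022-12-16 常州安控电器成套设备有限公司 SGAN-based fault diagnosis method for water pump units
CN115494349A (zh) * 2022-11-04 2022-12-20 国网浙江省电力有限公司金华供电公司 Method for locating single-phase ground fault sections in active distribution networks
CN115857461A (zh) * 2023-03-02 2023-03-28 东莞正大康地饲料有限公司 Online monitoring method and system for piglet premix feed production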

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114785674A (zh) * 2022-04-27 2022-07-22 中国电信股份有限公司 Fault location method and apparatus, and computer-storable medium
WO2024114675A1 (zh) * 2022-12-02 2024-06-06 青岛海信日立空调系统有限公司 Multi-split air conditioning system and control method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106998256A (zh) * 2016-01-22 2017-08-01 腾讯科技(深圳)有限公司 Communication fault location method and server
US20170236000A1 (en) * 2016-02-16 2017-08-17 Samsung Electronics Co., Ltd. Method of extracting feature of image to recognize object
US20200242506A1 (en) * 2019-01-25 2020-07-30 Optum Services (Ireland) Limited Systems and methods for time-based abnormality identification within uniform dataset
CN111538872A (zh) * 2020-07-09 2020-08-14 太平金融科技服务(上海)有限公司 Method, apparatus, computer device and medium for visualizing service node information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10050853B2 (en) * 2016-08-25 2018-08-14 Fujitsu Limited Neural network learning methods to identify network ports responsible for packet loss or delay
US10637715B1 (en) * 2017-05-02 2020-04-28 Conviva Inc. Fault isolation in over-the-top content (OTT) broadband networks
CN110995475B (zh) * 2019-11-20 2023-04-11 国网湖北省电力有限公司信息通信公司 Transfer-learning-based fault detection method for power communication networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4156022A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
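CN115481692A (zh) * 2022-09-23 2022-12-16 常州安控电器成套设备有限公司 SGAN-based fault diagnosis method for water pump units
CN115481692B (zh) * 2022-09-23 2023-10-10 江苏安控智汇科技股份有限公司 SGAN-based fault diagnosis method for water pump units
CN115494349A (zh) * 2022-11-04 2022-12-20 国网浙江省电力有限公司金华供电公司 Method for locating single-phase ground fault sections in active distribution networks
CN115494349B (zh) * 2022-11-04 2023-04-07 国网浙江省电力有限公司金华供电公司 Method for locating single-phase ground fault sections in active distribution networks
CN115857461A (zh) * 2023-03-02 2023-03-28 东莞正大康地饲料有限公司 Online monitoring method and system for piglet premix feed production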

Also Published As

Publication number Publication date
EP4156022A1 (en) 2023-03-29
CN114221857A (zh) 2022-03-22
EP4156022A4 (en) 2024-06-05

Similar Documents

Publication Publication Date Title
WO2022048652A1 (zh) Fault location method, electronic device and storage medium
WO2019136955A1 (zh) Profiling-based network anomaly detection method, apparatus, device and medium
WO2018103453A1 (zh) Method and apparatus for detecting a network
US9697100B2 (en) Event correlation
US8751417B2 (en) Trouble pattern creating program and trouble pattern creating apparatus
JP7116103B2 (ja) Method, apparatus and device for predicting optical module failure
US7688758B2 (en) Node merging process for network topology representation
US11501106B2 (en) Anomaly factor estimation device, anomaly factor estimation method, and storage medium
WO2015154478A1 (zh) Method and apparatus for creating performance measurement tasks and processing performance measurement results
TW200849917A (en) Detecting method of network invasion
US10050853B2 (en) Neural network learning methods to identify network ports responsible for packet loss or delay
WO2019179457A1 (zh) Method and apparatus for determining the state of a network device
CN107113191A (zh) Inline packet tracing in data center fabric networks
CN115277102A (zh) Network attack detection method, apparatus, electronic device and storage medium
US11153193B2 (en) Method of and system for testing a computer network
Yan et al. Principal Component Analysis Based Network Traffic Classification.
Savaliya et al. Securing industrial communication with software-defined networking.
Guo et al. FullSight: A feasible intelligent and collaborative framework for service function chains failure detection
WO2021038639A1 (ja) Device identification apparatus, device identification method and device identification program
US10057148B2 (en) Data-driven estimation of network port delay
CN114978976A (zh) Data anomaly detection method and apparatus for SRv6 converged networks
Haddadi et al. Tuning topology generators using spectral distributions
JP2022037107A (ja) Failure analysis apparatus, failure analysis method and failure analysis program
JP7217885B2 (ja) Network scanning apparatus, program for execution by a computer, and computer-readable recording medium storing the program
US8542601B2 (en) Abnormality locating method and apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021863714

Country of ref document: EP

Effective date: 20221219

NENP Non-entry into the national phase

Ref country code: DE