CN115225457B

CN115225457B - Knowledge and data dual-drive-based network fault analysis method

Info

Publication number: CN115225457B
Application number: CN202210729564.3A
Authority: CN
Inventors: 朱晓荣; 谷奉锦
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2023-09-26
Anticipated expiration: 2042-06-24
Also published as: CN115225457A

Abstract

The invention discloses a network fault analysis method driven jointly by knowledge and data. A network fault knowledge graph is constructed on the basis of step-by-step diagnosis of network faults with machine learning, and network fault analysis is then performed through knowledge retrieval and knowledge reasoning, so that the final analysis result has high diagnostic accuracy and good interpretability.

Description

Knowledge and data dual-drive-based network fault analysis method

Technical Field

The invention relates to the technical field of communication, in particular to a network fault analysis method based on knowledge and data dual driving.

Background

With the development of technology, mobile communication networks have undergone successive rounds of change and innovation and have become an indispensable part of daily life. As the number of network users grows and their demands diversify, network scenarios have become highly complex, and managing such a complex network environment so as to ensure normal network operation is very challenging.

In recent years, with the development of big data mining and artificial intelligence, intelligent fault diagnosis methods based on machine learning have gained increasing favor. Machine-learning-based network fault diagnosis is now widely applied and mainly includes methods based on the Support Vector Machine (SVM), methods based on the Artificial Neural Network (ANN), methods based on deep learning, and the like. Such methods can make full use of big data to diagnose likely fault modes, fault causes and so on, but network fault diagnosis based on data alone has two problems. The first is interpretability: conventional machine learning and deep learning models are black boxes to the user and cannot explain the fault diagnosis results they output, which limits their use in practical engineering. The second is that prior knowledge, such as the unstructured knowledge in a fault isolation manual, cannot be used effectively, so unstructured data resources are wasted.

The knowledge graph concept has attracted wide attention in industry since Google introduced it in 2012 and has been applied in many fields in recent years. A knowledge graph aggregates massive information, data and link relations into knowledge, making information resources easier to compute, understand and evaluate, and forms a semantic knowledge base. It provides a more effective way to express, organize, manage and use massive, heterogeneous and dynamic big data, raising the intelligence level of the network and bringing it closer to human cognitive thinking. Knowledge graphs are now widely used in fields such as medical care, education and fault diagnosis.

To make up for the shortcomings of fault diagnosis based on machine learning alone and to make full use of prior knowledge, a network fault analysis method that combines a knowledge graph with machine learning is provided.

Disclosure of Invention

The purpose of the invention is as follows: to address the problem that network fault diagnosis relies on machine learning alone, a network fault analysis method based on knowledge and data dual drive is constructed by incorporating a knowledge graph. Fault diagnosis is carried out with machine learning, the diagnosis result is used as a known condition for knowledge reasoning and knowledge retrieval in the knowledge graph, and information related to the fault is output, which improves the interpretability and practicability of fault analysis.

To achieve the above functions, the invention designs a network fault analysis method based on knowledge and data dual drive, in which the following steps S1 to S3 are executed for a network in a target area in which a network fault occurs, to complete network fault diagnosis:

S1. A network fault name, a network fault cause, a network fault description, a network fault manifestation, a problem caused by a network fault, a network fault location, a network fault solution and a network fault type are each taken as an upper-layer ontology of the network fault knowledge graph. For each upper-layer ontology, knowledge extraction is performed on structured, semi-structured and unstructured data to obtain triples, each consisting of the upper-layer ontology, another upper-layer ontology related to it, and the relation between the two. The knowledge graph is constructed and stored on the basis of the triples corresponding to each upper-layer ontology;

S2. For a network in the target area in which a network fault occurs, data samples with known network fault names and network fault causes are collected. First, with the data samples as input and the network fault names as output, a machine learning model is trained to obtain a network fault name diagnosis model. Then, with the data samples as input and the network fault causes as output, a machine learning model is trained to obtain a network fault cause diagnosis model. The two models are applied to actual data samples from the target area to obtain the network fault name and the network fault cause of the network fault in the target area, and a network fault knowledge graph of the target area network is constructed on the basis of the obtained network fault name and network fault cause and the knowledge graph obtained in step S1;

S3. On the basis of the network fault knowledge graph of the target area network obtained in step S2, for a network fault occurring in the target area, the fault name output by the network fault name diagnosis model and the fault cause output by the network fault cause diagnosis model are each taken as a core entity. Using a knowledge retrieval method based on subgraph matching, the upper-layer ontologies at distance 1 from the core entity in the knowledge graph are retrieved and output, together with the relations between the core entity and those upper-layer ontologies, completing the diagnosis of the network fault occurring in the target area.
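The retrieval in step S3 can be illustrated with a minimal sketch. It assumes the knowledge graph is available as a list of (head, relation, tail) triples and uses a plain adjacency lookup as a stand-in for the subgraph-matching retrieval named in the text. The triples, the relation names and the helper functions `build_adjacency` and `one_hop_subgraph` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail); entity and relation names are
# illustrative and do not reproduce the patent's actual graph.
TRIPLES = [
    ("weak coverage", "has_cause", "large station spacing"),
    ("weak coverage", "has_symptom", "low RSRP"),
    ("weak coverage", "has_solution", "add a base station"),
    ("large station spacing", "has_solution", "add a base station"),
    ("poor quality", "has_cause", "MOD3 interference"),
]


def build_adjacency(triples):
    """Index each triple under both endpoints so edges can be followed either way."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail, "out"))
        adjacency[tail].append((relation, head, "in"))
    return adjacency


def one_hop_subgraph(adjacency, core_entity):
    """Return every (relation, neighbor, direction) at distance 1 from core_entity."""
    return sorted(adjacency.get(core_entity, []))


adjacency = build_adjacency(TRIPLES)
for relation, neighbor, direction in one_hop_subgraph(adjacency, "weak coverage"):
    print(relation, neighbor, direction)
```

Calling `one_hop_subgraph` once with the diagnosed fault name and once with the diagnosed fault cause would give the two sets of neighbors and relations that step S3 outputs.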

As a preferred technical scheme of the invention: in step S1, the constructed knowledge graph is stored in the form of a Neo4j graph database.

As a preferred technical scheme of the invention: in step S2, the network fault names include weak coverage and poor quality, and each of the two network fault names corresponds to its own network fault causes. The network fault causes corresponding to weak coverage include large station spacing, indoor distribution leakage, abnormal handover threshold, missing neighbor cells and abnormal measurement threshold; the network fault causes corresponding to poor quality include MOD3 interference, pilot pollution and overlapping coverage;

The specific steps of step S2 are as follows:

S21. For a network in the target area in which a network fault occurs, data samples comprising at least 8 feature values are collected and standardized, and the feature values are sorted in descending order of relevance using the feature ranking function of the XGBoost algorithm;

S22. From the ranking obtained in step S21, the first 8 feature values are selected as input, and, with the network fault name and the network fault cause as output, machine learning models are trained with each of 6 machine learning algorithms (logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes and support vector machine) to obtain the network fault name diagnosis model and the network fault cause diagnosis model;

S23. The diagnosis accuracy of the 6 machine learning algorithms in step S22 is compared; for the network fault name diagnosis model and the network fault cause diagnosis model respectively, the machine learning algorithm with the highest diagnosis accuracy is selected, and its diagnosis result is output.
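The feature selection in steps S21 and S22 amounts to sorting features by a relevance score and keeping the first 8. The sketch below assumes the scores have already been produced by an XGBoost feature-ranking call; the numeric scores, the placeholder names `feature_5` to `feature_10`, and the helpers `top_k_features` and `project` are invented for illustration.

```python
# Hypothetical relevance scores standing in for the output of an XGBoost
# feature ranking; the numbers and the placeholder feature names are invented.
IMPORTANCE = {
    "RSRP": 0.31, "SINR": 0.22, "RSRQ": 0.14, "RSSI": 0.09,
    "feature_5": 0.07, "feature_6": 0.06, "feature_7": 0.04,
    "feature_8": 0.03, "feature_9": 0.02, "feature_10": 0.02,
}


def top_k_features(importance, k=8):
    """Sort feature names by descending score and keep the first k."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


def project(sample, selected):
    """Keep only the selected features of one sample, in ranked order."""
    return [sample[name] for name in selected]


selected = top_k_features(IMPORTANCE)
print(selected)
```

Each training sample would then be passed through `project` before being handed to any of the 6 candidate algorithms, so that all of them see the same 8 inputs.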

Beneficial effects: compared with the prior art, the advantages of the invention include:

1. Machine learning is combined with a knowledge graph for network fault diagnosis, which maintains diagnostic accuracy while effectively improving the interpretability of the diagnosis.

2. The network fault diagnosis problem is divided into two sub-problems, fault name and fault cause, and a different machine learning algorithm can be used for each, which improves the accuracy of network fault diagnosis.

3. The fault diagnosis result is presented as a knowledge graph subgraph, which makes the result clearer and the response faster.

Drawings

FIG. 1 is a flow chart of the knowledge and data dual-drive network fault analysis method provided in accordance with an embodiment of the present invention;

FIG. 2 is a structure diagram of the knowledge graph provided in accordance with an embodiment of the present invention;

FIG. 3 is the network fault knowledge graph provided in accordance with an embodiment of the present invention;

FIG. 4 is a comparison of machine learning accuracy provided in accordance with an embodiment of the present invention;

FIG. 5 is the knowledge subgraph output with weak coverage as the core entity, provided in accordance with an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

Referring to FIG. 1, in the network fault analysis method based on knowledge and data dual drive provided by the embodiment of the invention, the following steps S1 to S3 are executed for a network in a target area in which a network fault occurs, to complete network fault diagnosis:

S1. A network fault name, a network fault cause, a network fault description, a network fault manifestation, a problem caused by a network fault, a network fault location, a network fault solution and a network fault type are each taken as an upper-layer ontology of the network fault knowledge graph. For each upper-layer ontology, knowledge extraction is performed on structured, semi-structured and unstructured data to obtain triples, each consisting of the upper-layer ontology, another upper-layer ontology related to it, and the relation between the two. The knowledge graph is constructed and stored on the basis of the triples corresponding to each upper-layer ontology. For the knowledge graph structure, refer to FIG. 2.

The constructed knowledge graph is stored in the form of a Neo4j graph database.
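One way to load such triples into Neo4j is to turn each triple into a Cypher `MERGE` statement. The sketch below only builds the statement text and its parameters and does not open a database connection. The node labels `FaultName` and `FaultCause`, the relationship type `HAS_CAUSE`, the `name` property and the helper `triple_to_cypher` are assumptions made for illustration, not a schema given in the patent.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head_label, relation, tail_label):
    """Build one parameterised Cypher statement for a (head, relation, tail) triple.

    Labels and relationship types cannot be passed as Cypher parameters, so they
    are validated and interpolated; entity names go through $head and $tail.
    """
    for identifier in (head_label, relation, tail_label):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )


# Hypothetical schema: labels and relationship type are illustrative only.
statement = triple_to_cypher("FaultName", "HAS_CAUSE", "FaultCause")
parameters = {"head": "weak coverage", "tail": "large station spacing"}
print(statement)
print(parameters)
```

Using `MERGE` rather than `CREATE` keeps the load idempotent: running the same triple twice does not duplicate nodes or relationships.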

S2. For a network in the target area in which a network fault occurs, data samples with known network fault names and network fault causes are collected. First, with the data samples as input and the network fault names as output, a machine learning model is trained to obtain a network fault name diagnosis model. Then, with the data samples as input and the network fault causes as output, a machine learning model is trained to obtain a network fault cause diagnosis model. The two models are applied to actual data samples from the target area to obtain the network fault name and the network fault cause of the network fault in the target area, and a network fault knowledge graph of the target area network is constructed on the basis of the obtained network fault name and network fault cause and the knowledge graph obtained in step S1. For the network fault knowledge graph, refer to FIG. 3.
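The two-model arrangement of step S2 can be sketched as follows. A toy nearest-centroid classifier stands in for whichever machine learning algorithm is actually chosen; the training vectors, the relation name `has_cause` and the helpers `fit_centroids` and `predict` are hypothetical. The label strings are fault names and fault causes named in this document.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the feature vectors of each label (a toy nearest-centroid model)."""
    grouped = defaultdict(list)
    for sample, label in zip(samples, labels):
        grouped[label].append(sample)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))


# Hypothetical, already-standardised training data: two features per sample.
X = [[-1.2, -0.9], [-1.0, -1.1], [0.9, 1.1], [1.1, 0.8]]
fault_names = ["weak coverage", "weak coverage", "poor quality", "poor quality"]
fault_causes = ["large station spacing", "missing neighbor cells",
                "MOD3 interference", "pilot pollution"]

name_model = fit_centroids(X, fault_names)    # first model: sample -> fault name
cause_model = fit_centroids(X, fault_causes)  # second model: sample -> fault cause

new_sample = [-1.1, -0.95]
diagnosis = (predict(name_model, new_sample), "has_cause",
             predict(cause_model, new_sample))
print(diagnosis)
```

The resulting triple is the kind of record that could be attached to the knowledge graph from step S1 to form the fault knowledge graph of the target area.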

In step S2, the network fault names include weak coverage and poor quality, and each of the two network fault names corresponds to its own network fault causes. The network fault causes corresponding to weak coverage include large station spacing, indoor distribution leakage, abnormal handover threshold, missing neighbor cells and abnormal measurement threshold; the network fault causes corresponding to poor quality include MOD3 interference, pilot pollution and overlapping coverage;

The specific steps of step S2 are as follows:

S21. For a network in the target area in which a network fault occurs, data samples comprising at least 8 feature values are collected; the fault data samples comprise feature values related to the network fault state, including at least RSRP, RSRQ, RSSI and SINR. The data samples are standardized, and the feature values are sorted in descending order of relevance using the feature ranking function of the XGBoost algorithm;

The specific method of standardization is as follows:

For the k feature value sets $\{X_1, X_2, \dots, X_i, \dots, X_k\}$, where $X_i = \{x_{i1}, x_{i2}, \dots, x_{in}\}$ and $x_{i1}, x_{i2}, \dots, x_{in}$ are the values of the i-th feature in the n fault data samples,

each feature value is normalized by the following formula:

where $x_{ij}$ is the j-th value in the feature value set $X_i$, and $Y_{ij}$ is the normalized value of $x_{ij}$ after preprocessing.
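The normalization formula itself is not reproduced in this text, so the following sketch assumes z-score standardization (subtract the mean of the feature, divide by its standard deviation). That choice is an assumption; min-max scaling would be another plausible reading. The RSRP readings and the helper `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score one feature: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical RSRP readings (dBm) from n = 4 fault samples, i.e. one set X_i.
rsrp = [-110.0, -100.0, -90.0, -80.0]
print(standardize(rsrp))
```

Applying the function to each of the k feature sets in turn puts all features on a comparable scale before ranking and training.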

S22. From the feature value ranking of the fault data samples obtained in step S21, the first 8 feature values are selected as input, and, with the network fault name and the network fault cause as output, machine learning models are trained with each of 6 machine learning algorithms: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, naive Bayes and support vector machine;

S23. The diagnosis accuracy of the 6 machine learning algorithms in step S22 is compared; for the network fault name and the network fault cause respectively, the machine learning algorithm with the highest diagnosis accuracy is selected, and its diagnosis result is output. For a comparison of the accuracy of the 6 machine learning algorithms, refer to FIG. 4.

The diagnosis accuracy is calculated as follows:

$$Ac = \frac{TP + TN}{TP + TN + FP + FN}$$

where Ac is the diagnosis accuracy, TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, FP is the number of negative samples predicted as positive (false positives), and FN is the number of positive samples predicted as negative (false negatives).
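As a worked illustration, the sketch below computes accuracy as the share of correct predictions, (TP + TN) divided by all four counts, and then picks the algorithm with the highest accuracy for one diagnosis task. The confusion counts are invented and do not reproduce the results in FIG. 4; the helper `accuracy` is hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Ac = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


# Hypothetical (TP, TN, FP, FN) counts per algorithm for one diagnosis task;
# the numbers are invented for illustration.
COUNTS = {
    "logistic regression": (40, 42, 8, 10),
    "k-nearest neighbors": (45, 44, 6, 5),
    "decision tree": (43, 41, 9, 7),
}

scores = {name: accuracy(*counts) for name, counts in COUNTS.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Running the same comparison separately for the fault name task and the fault cause task allows a different algorithm to be selected for each.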

For the knowledge subgraph output with weak coverage as the core entity, refer to FIG. 5.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims

1. A network fault analysis method based on knowledge and data dual drive is characterized in that the following steps S1-S3 are executed aiming at a network with network faults in a target area to finish network fault diagnosis:

2. The network fault analysis method based on knowledge and data dual drive as claimed in claim 1, wherein in step S1 the constructed knowledge graph is stored in the form of a Neo4j graph database.

3. The network fault analysis method based on knowledge and data dual drive as claimed in claim 1, wherein in step S2 the network fault names include weak coverage and poor quality, each of the two network fault names corresponds to its own network fault causes, the network fault causes corresponding to weak coverage include large station spacing, indoor distribution leakage, abnormal handover threshold, missing neighbor cells and abnormal measurement threshold, and the network fault causes corresponding to poor quality include MOD3 interference, pilot pollution and overlapping coverage;

the specific steps of step S2 are as follows: