CN112260873A - Dynamic network fault diagnosis method under 5G network slice

Info

Publication number: CN112260873A
Application number: CN202011137354.2A
Authority: CN
Inventors: 黄晓奇; 谭康; 保剑; 周瑾瑜; 王龙; 周雨涛; 丘国良; 郑启文; 欧明辉; 吴俊宇; 宋旅宁
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2021-01-22
Anticipated expiration: 2040-10-22
Also published as: CN112260873B

Abstract

The invention provides a dynamic network fault diagnosis method under 5G network slicing. The method comprises: calculating the unavailability credibility of each fault node from the probability that the fault node causes each symptom node to be unavailable within the symptom range observed by the network management system and the probability that it causes each symptom node to be unavailable within the entire symptom range, and calculating a correction value for each symptom node; correcting the original state and the current state of the symptom node using the correction value; constructing a time-slice-based fault propagation model; acquiring the corrected original state and current state of the symptom node, and calculating the original state of the fault node from the corrected original state of the symptom node; and calculating the current state of the fault node from the original state of the fault node and the corrected original and current states of the symptom node. The method addresses the increased noise and the inaccurate fault propagation model caused by the dynamic nature of existing networks.

Description

Dynamic network fault diagnosis method under 5G network slice

Technical Field

The invention relates to the technical field of 5G communication, in particular to a dynamic network fault diagnosis method under a 5G network slice.

Background

The 5G network offers high bandwidth and low latency and meets people's networking needs in work and daily life. Against this background, 5G application scenarios are multiplying, and work efficiency and everyday convenience have improved markedly. However, because 5G services are varied and numerous, the demand for network resources has increased significantly. To improve the utilization of network resources and guarantee the quality of service of the various services, network slicing has become a network architecture generally accepted by network equipment vendors and network operators. After network slicing, the existing physical network is divided into an underlying network and a virtual network. The wireless, transmission and data sub-networks of the underlying network are each sliced into different sub-networks. The virtual network is composed of wireless slices, transmission slices and data slices, and obtains its resources from the underlying network. For convenience of description, the present invention refers to the resources of the underlying network collectively as underlying node resources and underlying link resources, and to the resources of the virtual network collectively as virtual node resources and virtual link resources. Because the stable operation of the underlying network and the virtual network must be ensured in order to improve the quality of 5G services, fault diagnosis for 5G networks has become a current research focus.

Fault diagnosis algorithms can be divided into two strategies: passive monitoring and active monitoring. Passive monitoring locates faults passively, based on data from the network management system. Active monitoring actively measures network characteristics in real time according to the characteristics of the services, so that potential problems are found and faults repaired quickly. For example, the literature [GONTARA, Salah; BOUFAIED, Amine; KORBAA, Oujdi. Fault Localization Algorithm in Computer Networks Based on the Boolean Particle Swarm Optimization [C]// Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)] designs an end-to-end fault localization algorithm based on a Boolean particle swarm optimization algorithm. In that scheme, the network monitoring system is adjusted dynamically according to the network topology characteristics, so that the fault diagnosis algorithm adapts better to changes in the network environment. The fault diagnosis process generally uses a dependency matrix for fault localization, and the models can be divided into binary and non-binary models.

Existing methods mainly address fault diagnosis in a static network environment. Because 5G network slicing supports dynamic migration and on-demand scaling, network node and network link resources change dynamically over time. This gives rise to two problems: the dynamics increase network noise, and the changing network resources make the fault propagation model inaccurate.

Disclosure of Invention

The invention aims to provide a dynamic network fault diagnosis method under a 5G network slice, which is used for solving the problems of network noise increase and inaccurate fault propagation model caused by the dynamic property of the existing network.

The invention provides a dynamic network fault diagnosis method under a 5G network slice, which comprises the following steps:

step S11, constructing an initial fault propagation model, wherein the initial fault propagation model comprises faults, symptoms and directed lines from the faults to the symptoms;

step S12, obtaining the probability that each fault node causes unavailability of each symptom node in the symptom range observed by the network manager, the unavailability probability that each fault node causes each symptom node in the whole symptom range, and calculating the unavailability credibility of each fault node according to the probability that each fault node causes unavailability of each symptom node in the symptom range observed by the network manager and the unavailability probability that each fault node causes each symptom node in the whole symptom range;

step S13, calculating the correction value of the symptom node according to the unavailable credibility of each fault node having influence with the symptom node;

step S14, when the correction value of the symptom node and the original value of the symptom node respectively represent that the symptom is in different states, averaging the correction value of the symptom node and the original value of the symptom node to obtain an average value of the symptom node, and correcting the original state and the current state of the symptom node by using the average value of the symptom node;

step S15, constructing a fault propagation model based on the time slice, wherein the fault propagation model based on the time slice comprises the original state of the fault node, the current state of the fault node and the original state and the current state of the symptom node after correction, and acquiring the state transition probability of the fault node from the original state to the current state;

step S16, acquiring the original state and the current state of the corrected symptom node, and calculating the original state of the fault node in the original state of the corrected symptom node according to the original state of the corrected symptom node;

and step S17, calculating the current state of the fault node according to the original state of the fault node, the corrected original state and the current state of the symptom node.

Further, the formula for calculating the unavailability reliability of each failed node according to the probability that each failed node causes unavailability of each symptom node in the symptom range observed by the network manager and the unavailability probability that each failed node causes each symptom node in the total symptom range in the step S12 is specifically:

wherein $\alpha_{f_i}$ is the unavailability credibility of the i-th fault node, $s_j \in S_o$ indicates that the j-th symptom node is within the symptom range observed by the network management system, $s_j \in S$ indicates that the j-th symptom node is within the entire symptom range, and $P(s_j \mid f_i)$ denotes the probability that the i-th fault node causes the j-th symptom node to be unavailable.
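
The formula itself is not reproduced in this text. A form consistent with the symbol definitions above, offered as an assumption rather than as the filed formula, is the ratio of the two sums named in step S12:

```latex
\alpha_{f_i} = \frac{\sum_{s_j \in S_o} P(s_j \mid f_i)}{\sum_{s_j \in S} P(s_j \mid f_i)}
```

Under this reading, $\alpha_{f_i}$ lies between 0 and 1 and approaches 1 when most of the symptoms that fault node $f_i$ can cause have actually been observed.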

Further, step S13 specifically includes:

step S131, adding the unavailable credibility of each fault node having influence on the symptom node to obtain the sum of the unavailable credibility;

and step S132, dividing the sum of the unavailable credibility by the number of fault nodes having influence on the symptom node to obtain a corrected value of the symptom node.

Further, the formula for calculating the original state of the fault node in the original state of the modified symptom node according to the original state of the modified symptom node in step S16 is specifically:

wherein $F^1$ is the original state of the fault nodes; the other terms of the formula denote, in order, the original state of the fault nodes, the original state of the corrected symptom nodes, and the probability that the original state of the fault nodes is abnormal when the original state of the corrected symptom nodes is abnormal.
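
A maximum a posteriori reading of this step can be written as follows. This is a sketch under assumptions rather than the filed formula: $S^1$ is a hypothetical symbol for the corrected original state of the symptom nodes, and $F^{1*}$ is taken to be the intermediate maximum that the detailed description mentions.

```latex
F^{1*} = \max_{F^1} \, p\!\left(F^1 \mid S^1\right), \qquad F^1 = \arg F^{1*}
```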

Further, the formula for implementing step S17 is specifically:

and $F^2 = \arg F^{2*}$,

wherein $F^2$ is the current state of the fault nodes; the other terms of the formula denote, in order, the original state of the symptom nodes, the current state of the symptom nodes, and the current state of the fault nodes.
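
One way to read step S17, given only as an assumed form, is that the current state is the assignment that maximizes the posterior given both corrected symptom states and the already diagnosed original fault state. $S^1$ and $S^2$ are hypothetical symbols for the corrected original and current states of the symptom nodes; the relation $F^2 = \arg F^{2*}$ is the part that survives in the text.

```latex
F^{2*} = \max_{F^2} \, p\!\left(F^2 \mid F^1, S^1, S^2\right), \qquad F^2 = \arg F^{2*}
```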

The implementation of the invention has the following beneficial effects:

According to the method, the original state and the current state of the symptom nodes are acquired through the network management system, and the unavailability credibility of the fault nodes is introduced to correct them. Time slices are introduced into the fault propagation model, so that the current state of each fault node is calculated after its original state has been obtained. Using the relation between fault nodes and symptom nodes, together with the temporal correlation of the fault nodes, the method corrects part of the interference introduced by network dynamics. This addresses the increase in network noise and the inaccuracy of the fault propagation model caused by dynamically changing network resources.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a dynamic network fault diagnosis method under a 5G network slice according to an embodiment of the present invention.

Fig. 2 is a schematic view of virtual network resource allocation under a 5G network slice according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an initial fault propagation model under a 5G network slice according to an embodiment of the present invention.

Fig. 4 is a comparison diagram of diagnostic accuracy provided by an embodiment of the present invention.

Fig. 5 is a comparison diagram of the diagnostic false alarm rate provided by the embodiment of the invention.

Fig. 6 is a comparison diagram of diagnostic durations provided by an embodiment of the present invention.

Detailed Description

The invention is described below with reference to the accompanying drawings and embodiments.

As shown in fig. 1, an embodiment of the present invention provides a dynamic network fault diagnosis method under a 5G network slice, where the method includes:

Step S11, constructing an initial fault propagation model, wherein the initial fault propagation model comprises fault nodes, symptom nodes and directed lines from the fault nodes to the symptom nodes.

In this embodiment, the underlying network and the virtual network are represented by the undirected weighted graphs $G^S = (N^S, E^S)$ and $G^V = (N^V, E^V)$ respectively, where $N^S$ and $N^V$ denote the underlying node set and the virtual node set, and $E^S$ and $E^V$ denote the underlying link set and the virtual link set. The mapping $M_N: (N^V \to N^S, E^V \to P^S)$ indicates that underlying network nodes $N^S$ allocate resources to virtual network nodes $N^V$ and that underlying network paths $P^S$ allocate resources to virtual network links $E^V$. An underlying network path $P^S$ is an end-to-end underlying link resource formed by connecting several underlying links; its two endpoints are the underlying network nodes onto which the two endpoints of the virtual network link $E^V$ are mapped. For example, the resources of virtual nodes a and b of the virtual network VN1 are allocated by underlying nodes A and B of the underlying network SN, respectively.

During daily operation, a network operator can obtain the running states of the various services and the states of network resources in real time through network management software. Because of network slicing, the running state of a service cannot be matched directly to underlying network resources. Therefore, before the fault propagation model is established, the virtual network is first mapped onto the underlying network according to the mapping relationship between them; see Fig. 2 for the specific network mapping. A service fault propagation model is then constructed from the service states, the underlying network resources, and the bearing relationship between services and underlying network resources.

The service fault propagation model includes symptoms, faults, and directed lines from faults to symptoms. A symptom is the state of a service, which is either available or unavailable, denoted by s = 1 and s = 0 respectively. A fault refers to an underlying network resource; its state is either resource available or resource unavailable, denoted by f = 1 and f = 0 respectively. A directed line from a fault to a symptom represents the influence of the fault node on the symptom node, and the number on the line represents the degree of influence. Referring to Fig. 3, for example, the value on the line from fault node A to one of its symptom nodes is 0.7, written as a conditional probability. Its physical meaning is that, according to the historical operating experience of the network, when the underlying resource is in the unavailable state it causes that symptom node to be unavailable with probability 0.7. The values on the directed lines are expressed as probabilities mainly because of network dynamics, network noise and similar factors.
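
As an illustration only, the fault propagation model described above could be held in a small data structure such as the following Python sketch. The class name, the method names, the symptom labels "s1" and "s2", and every edge weight other than 0.7 are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FaultPropagationModel:
    """Bipartite fault-to-symptom graph with a probability on each directed line."""

    # edges[fault][symptom] = P(symptom unavailable | fault unavailable)
    edges: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def add_edge(self, fault: str, symptom: str, prob: float) -> None:
        if not 0.0 <= prob <= 1.0:
            raise ValueError("edge weight must be a probability")
        self.edges.setdefault(fault, {})[symptom] = prob

    def faults_affecting(self, symptom: str) -> List[str]:
        """Fault nodes that have a directed line to the given symptom node."""
        return sorted(f for f, out in self.edges.items() if symptom in out)


model = FaultPropagationModel()
model.add_edge("A", "s1", 0.7)  # 0.7 is the value quoted in the text
model.add_edge("D", "s1", 0.4)  # illustrative value
model.add_edge("E", "s1", 0.2)  # illustrative value
model.add_edge("A", "s2", 0.5)  # illustrative value
```

Storing the edges per fault node keeps the lookup of "which symptoms can this fault cause" cheap, which is the direction used when computing the unavailability credibility.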

Step S12, obtaining the probability that each fault node causes unavailability of each symptom node in the symptom range observed by the network manager, the unavailability probability that each fault node causes each symptom node in the whole symptom range, and calculating the unavailability credibility of each fault node according to the probability that each fault node causes unavailability of each symptom node in the symptom range observed by the network manager and the unavailability probability that each fault node causes each symptom node in the whole symptom range.

Specifically, the formula for calculating the unavailability reliability of each fault node according to the probability that each fault node causes unavailability of each symptom node in the symptom range observed by the network manager and the unavailability probability that each fault node causes unavailability of each symptom node in the whole symptom range is specifically as follows:

wherein $\alpha_{f_i}$ is the unavailability credibility of the i-th fault node, $s_j \in S_o$ indicates that the j-th symptom node is within the symptom range observed by the network management system, $s_j \in S$ indicates that the j-th symptom node is within the entire symptom range, and $P(s_j \mid f_i)$ denotes the probability that the i-th fault node causes the j-th symptom node to be unavailable.
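
The calculation in step S12 can be sketched in Python as follows. The sketch assumes that the credibility is the ratio of the summed probabilities over the observed symptoms to the summed probabilities over all symptoms of the fault node. That matches the two quantities named above but is not confirmed by a formula in this text; the function name, argument names and sample values are likewise illustrative.

```python
def unavailability_credibility(edge_probs, observed):
    """Credibility of ONE fault node under the assumed ratio form.

    edge_probs: {symptom: P(symptom unavailable | fault)} for this fault node.
    observed:   set of symptom names reported as unavailable.
    """
    total = sum(edge_probs.values())
    if total == 0:
        return 0.0  # the fault node influences no symptom at all
    seen = sum(p for s, p in edge_probs.items() if s in observed)
    return seen / total


# Fault node with three possible symptoms, two of which were observed.
cred_a = unavailability_credibility(
    {"s1": 0.7, "s2": 0.5, "s3": 0.8},
    observed={"s1", "s3"},
)
```

With these sample values the observed symptoms account for 1.5 of the total weight 2.0, so the credibility is about 0.75.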

Step S13, calculating a correction value of the symptom node based on the unavailability reliability of each of the failed nodes having an influence on the symptom node.

Specifically, in the initial fault propagation model created in step S11, a fault node that influences a symptom node is one that has a directed line to that symptom node. Referring to Fig. 3, for a symptom node whose influencing fault nodes are A, D and E, the unavailability credibilities of fault node A, fault node D and fault node E are calculated and added together; since 3 fault nodes influence the symptom node, the sum is divided by 3 to obtain the correction value of the symptom node.
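
The averaging in step S13 is simple enough to state directly in code. In this minimal sketch the function name and the credibility values for A, D and E are assumptions chosen for illustration.

```python
def symptom_correction(credibilities):
    """Mean unavailability credibility of the fault nodes pointing at one symptom.

    credibilities: {fault name: unavailability credibility of that fault node}
    """
    if not credibilities:
        raise ValueError("symptom has no influencing fault nodes")
    return sum(credibilities.values()) / len(credibilities)


# Three influencing fault nodes, as in the A, D, E example of the text.
correction = symptom_correction({"A": 0.9, "D": 0.6, "E": 0.3})
```

Here the three credibilities sum to 1.8, and dividing by 3 gives a correction value of about 0.6.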

Step S14, when the correction value of the symptom node and the original value of the symptom node indicate different states, averaging the two to obtain an average value of the symptom node, and correcting the original state and the current state of the symptom node using the average value.

In this embodiment, the original value of a symptom node is its value before correction. For example, if the original value of the symptom node is 0 and its correction value is 1, the two are averaged to avoid an erroneous correction, giving a symptom value of 0.5, and the original state and the current state of the symptom node are corrected using this average. If the original value of the symptom node agrees with its correction value, no correction is required.
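
Step S14 can be sketched as below. The text does not say how a fractional correction value is mapped to a state, so the 0.5 threshold used here is an assumption, as are the function and parameter names.

```python
def correct_symptom(original, correction, threshold=0.5):
    """Return the symptom value after step S14.

    threshold is an assumed cut-off for turning a value in [0, 1] into one of
    the two states; the source does not specify one.
    """
    same_state = (original >= threshold) == (correction >= threshold)
    if same_state:
        return original  # the two values agree, so nothing is corrected
    return (original + correction) / 2


disagree = correct_symptom(0, 1)    # the example from the text
agree = correct_symptom(1, 0.8)     # both values fall on the same side
```

In the first call the two values indicate different states and are averaged to 0.5; in the second they agree, so the original value is kept.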

Step S15, constructing a fault propagation model based on the time slice, wherein the fault propagation model based on the time slice comprises the original state of the fault node, the current state of the fault node and the original state and the current state of the symptom node after correction, and acquiring the state transition probability of the fault node from the original state to the current state.

In this embodiment, the original state and the current state of the fault nodes are each represented as a set of per-node states, where N is the number of fault nodes; likewise, the original state and the current state of the symptom nodes are each represented as a set of per-node states, where M is the number of symptom nodes.

The probability that the i-th fault node transitions from the original state to the current state is denoted $p(f_i^2 \mid f_i^1)$. In view of the characteristics of the dynamic environment, the invention adds the time slice t to the fault propagation model, so that different network models are described in different time slices. Each fault node then holds the states of several time slices, and the state in each time slice depends on the state in the previous one; $p(f_i^t \mid f_i^{t-1})$ denotes the state transition probability of the fault node, that is, the probability that the state of the fault node in time slice t is $f_i^t$ given that its state in the previous time slice t-1 was $f_i^{t-1}$. The state transition probability $p(f_i^2 \mid f_i^1)$ can be obtained by collecting statistics over a chosen period of time through the network management system.
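
Collecting the state transition probability from network management statistics could look like the following sketch, which counts consecutive state pairs in a per-node history. The function name and the sample history are assumptions; states follow the convention of the text (1 for available, 0 for unavailable).

```python
from collections import Counter


def estimate_transition_probs(history):
    """Estimate P(current state | previous state) for one fault node.

    history: list of 0/1 states, one entry per time slice, oldest first.
    Returns {(prev, cur): probability}, containing only pairs that were seen.
    """
    pair_counts = Counter(zip(history, history[1:]))
    prev_counts = Counter(history[:-1])
    return {
        (prev, cur): n / prev_counts[prev]
        for (prev, cur), n in pair_counts.items()
    }


probs = estimate_transition_probs([1, 1, 1, 0, 0, 1, 1, 1])
```

For this history the node was available before five of the seven transitions and stayed available in four of them, so `probs[(1, 1)]` is 0.8 and `probs[(1, 0)]` is 0.2.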

Step S16, acquiring the corrected original state and current state of the symptom node, and calculating the original state of the fault node from the corrected original state of the symptom node.

In step S16, the formula for calculating the original state of the fault node from the corrected original state of the symptom node is specifically:

wherein $F^1$ is the original state of the fault nodes; the other terms of the formula denote, in order, the original state of the fault nodes and the original state of the corrected symptom nodes.

From Bayesian theory and formula (1) it can be seen that:

wherein $F^{1*}$ is an intermediate variable. Because the value of one term in formula (2) is independent of the other, formula (2) can be simplified into formula (3);

wherein $p(f_i^1)$ is the fault probability of the i-th fault node in the original state; the parent node of the corrected j-th symptom node in the original state is taken from the time-slice-based fault propagation model; and the conditional term denotes the probability that the fault node in the original state is unavailable when the corrected j-th symptom node in the original state is unavailable, the fault node in the original state being the parent node of that symptom node.

In addition, $p(f_i^1)$ and this conditional probability may be obtained from the network management system.
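
To make the inference in step S16 concrete, the following Python sketch finds the most probable set of failed fault nodes by brute-force enumeration. It is not the formula of the invention: it assumes independent fault priors, a noisy-OR combination of the edge probabilities, and binary symptom observations, none of which is stated in the text, and all names and numbers are illustrative.

```python
from itertools import product


def map_fault_state(priors, edges, unavailable):
    """Brute-force MAP estimate of which fault nodes are failed.

    priors:      {fault: prior failure probability}
    edges:       {fault: {symptom: P(symptom unavailable | fault failed)}}
    unavailable: set of symptoms observed as unavailable
    """
    faults = sorted(priors)
    symptoms = sorted({s for out in edges.values() for s in out})
    best, best_score = None, -1.0
    for bits in product([False, True], repeat=len(faults)):
        failed = {f for f, b in zip(faults, bits) if b}
        score = 1.0
        for f in faults:  # prior term for every fault node
            score *= priors[f] if f in failed else 1.0 - priors[f]
        for s in symptoms:  # noisy-OR likelihood of every symptom
            p_ok = 1.0
            for f in failed:
                p_ok *= 1.0 - edges.get(f, {}).get(s, 0.0)
            score *= (1.0 - p_ok) if s in unavailable else p_ok
        if score > best_score:
            best, best_score = failed, score
    return best


priors = {"A": 0.01, "D": 0.005, "E": 0.002}
edges = {"A": {"s1": 0.7, "s2": 0.5}, "D": {"s1": 0.4}, "E": {"s1": 0.2, "s3": 0.9}}
diagnosis = map_fault_state(priors, edges, unavailable={"s1", "s2"})
```

In this toy input only fault node A can explain both unavailable symptoms, so the result is the set containing A. Enumeration is exponential in the number of fault nodes and is shown only because it is easy to check by hand.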

The formula for implementing step S17 is specifically:

and $F^2 = \arg F^{2*}$,

wherein $F^2$ is the current state of the fault nodes; the other terms of the formula denote, in order, the original state of the symptom nodes, the current state of the symptom nodes, and the current state of the fault nodes.

According to Bayesian inference, the calculation process of formula (4) is as follows:

wherein the parent-node term denotes the parent node of the corrected k-th symptom node in the current state in the time-slice-based fault propagation model, and $p(f_i^2 \mid f_i^1)$ is the state transition probability of the i-th fault node from the original state to the current state.

In addition, $p(f_i^1)$, the conditional probability between each symptom node and its parent node, and $p(f_i^2 \mid f_i^1)$ can be obtained directly from network management software statistics or by further calculation on the statistical data.
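
The role of the state transition probability in step S17 can be illustrated with a sketch in which the transition probability replaces the static prior of each fault node. As before, this is an assumed formulation (independent fault nodes, noisy-OR symptoms, brute-force search) with hypothetical names and values, not the formula of the invention.

```python
from itertools import product


def map_current_state(prev_failed, p_fail_given, edges, unavailable_now):
    """Brute-force MAP estimate of the fault nodes failed in the current slice.

    prev_failed:     set of fault nodes diagnosed as failed in the original state
    p_fail_given:    {fault: (P(failed now | ok before), P(failed now | failed before))}
    edges:           {fault: {symptom: P(symptom unavailable | fault failed)}}
    unavailable_now: set of symptoms unavailable in the current state
    """
    faults = sorted(p_fail_given)
    symptoms = sorted({s for out in edges.values() for s in out})
    best, best_score = None, -1.0
    for bits in product([False, True], repeat=len(faults)):
        failed = {f for f, b in zip(faults, bits) if b}
        score = 1.0
        for f in faults:  # transition term conditioned on the previous state
            p_fail = p_fail_given[f][1 if f in prev_failed else 0]
            score *= p_fail if f in failed else 1.0 - p_fail
        for s in symptoms:  # noisy-OR likelihood of the current symptoms
            p_ok = 1.0
            for f in failed:
                p_ok *= 1.0 - edges.get(f, {}).get(s, 0.0)
            score *= (1.0 - p_ok) if s in unavailable_now else p_ok
        if score > best_score:
            best, best_score = failed, score
    return best


edges = {"A": {"s1": 0.7, "s2": 0.5}, "D": {"s1": 0.4}}
p_fail_given = {"A": (0.01, 0.9), "D": (0.01, 0.9)}
with_history = map_current_state({"A"}, p_fail_given, edges, {"s1"})
without_history = map_current_state(set(), p_fail_given, edges, {"s1"})
```

In this toy input the same observation is attributed to A when A was already failed in the original state, and to D when no fault node was previously failed, which shows how the temporal term can change the diagnosis.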

To verify the effect of the method, a 5G network slicing environment is simulated, and the GT-ITM tool [E. W. Zegura, K. L. Calvert, S. Bhattacharjee. How to model an internetwork [C]// Proceedings of IEEE INFOCOM, 1996] is used to generate the network topology in the experiment. The network topology includes both an underlying network and a virtual network. The number of underlying network nodes increases from 100 to 500, and the number of virtual network nodes increases from 5 to 15. Resources of the virtual network are mapped onto the underlying network using a classical mapping algorithm.

For end-to-end service simulation, 10% of the nodes of each virtual network are selected as start nodes, and 2 of the remaining nodes are selected as target nodes. The start nodes and the target nodes are connected using a shortest path algorithm to simulate end-to-end services. For fault simulation of the underlying network nodes, the prior fault probabilities of the underlying nodes are set to be uniformly distributed between 0.001 and 0.01. To simulate network dynamics, the state of the network is changed every 20 seconds and the service state information is collected again.

To verify the performance of the method, the DNFDA algorithm used by the method is compared with a fault diagnosis algorithm based on a fault propagation model (FDAoFPM). The comparison algorithm builds a fault propagation model from the relation between faults and symptoms and does not optimize that model. The algorithms are compared along three dimensions: diagnostic accuracy, diagnostic false alarm rate and diagnostic duration. Diagnostic accuracy is the proportion of the total fault resources that are diagnosed; the higher the accuracy, the more fault resources the algorithm identifies. A false alarm occurs when a resource is in a normal state but the diagnosis algorithm identifies it as abnormal; the diagnostic false alarm rate is the ratio of wrongly diagnosed fault resources to the total number of real fault resources, so a lower false alarm rate means better algorithm performance. The diagnostic duration is the time from receipt of the service states and the network topology until the set of fault nodes has been diagnosed.
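
The two ratio metrics defined above can be computed as in this short sketch. The function name and the sample sets are assumptions; both ratios use the number of real fault resources as the denominator, following the definitions in the text.

```python
def diagnosis_metrics(true_faults, diagnosed):
    """Return (accuracy, false_alarm_rate) for one diagnosis run.

    true_faults: set of resources that are really failed
    diagnosed:   set of resources the algorithm reported as failed
    """
    if not true_faults:
        raise ValueError("metrics are undefined without any real fault")
    accuracy = len(true_faults & diagnosed) / len(true_faults)
    false_alarm_rate = len(diagnosed - true_faults) / len(true_faults)
    return accuracy, false_alarm_rate


# Four real faults; the algorithm finds two of them and raises one false alarm.
acc, far = diagnosis_metrics({"A", "D", "E", "G"}, {"A", "D", "K"})
```

For this sample the accuracy is 0.5 and the false alarm rate is 0.25.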

The diagnostic accuracy comparison is shown in Fig. 4, with the X-axis representing the number of network nodes and the Y-axis representing the diagnostic accuracy. As can be seen from the figure, both algorithms achieve good diagnostic accuracy at all network scales. Comparing the two, the method of the invention improves diagnostic accuracy, because it optimizes the fault model according to the network characteristics and thereby makes the fault diagnosis model more accurate.

The diagnostic false alarm rate comparison is shown in Fig. 5. The X-axis represents the number of network nodes and the Y-axis represents the diagnostic false alarm rate. As can be seen from the figure, the number of network nodes has little influence on the false alarm rate of either algorithm, which indicates that network scale has little effect on the fault propagation model. In addition, the false alarm rate of the method of the invention is lower than that of the conventional algorithm, because the invention corrects noisy symptoms and thereby improves the accuracy of the fault diagnosis model.

The diagnostic duration comparison is shown in Fig. 6. The X-axis represents the number of network nodes and the Y-axis represents the diagnostic duration. As can be seen from the figure, the diagnostic duration of both algorithms increases rapidly as the number of network nodes increases. This is because a larger network makes the fault propagation model grow rapidly, so fault diagnosis takes longer. Comparing the two algorithms, the method of the invention takes longer, because it must also optimize the model, which adds to the overall diagnosis time.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A dynamic network fault diagnosis method under a 5G network slice is characterized by comprising the following steps:

2. The method according to claim 1, wherein in step S12 the formula for calculating the unavailability credibility of each fault node from the probability that the fault node causes each symptom node to be unavailable within the symptom range observed by the network management system and the probability that it causes each symptom node to be unavailable within the entire symptom range is specifically:

wherein,

3. The method according to claim 1, wherein step S13 specifically includes:

4. The method according to claim 1, wherein the formula for calculating the original state of the fault node in the original state of the modified symptom node according to the original state of the modified symptom node in step S16 is specifically as follows:

wherein $F^1$ is the original state of the fault nodes; the other terms of the formula denote, in order, the original state of the fault nodes and the original state of the corrected symptom nodes.

5. The method of claim 1, wherein the formula for implementing step S17 is specifically:

and $F^2 = \arg F^{2*}$,

wherein $F^2$ is the current state of the fault nodes; the other terms of the formula denote, in order, the original state of the symptom nodes, the current state of the symptom nodes, and the current state of the fault nodes.