CN112260873B

CN112260873B - Dynamic network fault diagnosis method under 5G network slice

Info

Publication number: CN112260873B
Application number: CN202011137354.2A
Authority: CN
Inventors: 黄晓奇; 谭康; 保剑; 周瑾瑜; 王龙; 周雨涛; 丘国良; 郑启文; 欧明辉; 吴俊宇; 宋旅宁
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2023-03-03
Anticipated expiration: 2040-10-22
Also published as: CN112260873A

Abstract

The invention provides a dynamic network fault diagnosis method under 5G network slicing. The method calculates the unavailability credibility of each fault node from the probability that the fault node causes each symptom node to be unavailable within the symptom range observed by network management and the corresponding probability within the entire symptom range, and from this calculates a correction value for each symptom node. The correction value is used to correct the original state and the current state of the symptom node. A time-slice-based fault propagation model is then constructed; the corrected original state and current state of the symptom nodes are acquired; the original state of each fault node is calculated from the corrected original state of the symptom nodes; and the current state of each fault node is calculated from the original state of the fault node and the corrected original and current states of the symptom nodes. The method addresses the increased noise and the inaccurate fault propagation model caused by the dynamic behavior of existing networks.

Description

Dynamic network fault diagnosis method under 5G network slice

Technical Field

The invention relates to the technical field of 5G communication, in particular to a dynamic network fault diagnosis method under a 5G network slice.

Background

The high bandwidth and low latency of 5G networks meet the network requirements of people's work and daily life. Against this background, 5G networks are being applied in more and more scenarios, and working efficiency and convenience of life have improved markedly. However, because 5G networks carry many types of services in large numbers, the demand for network resources has increased significantly. To improve the utilization of network resources and guarantee the quality of the various services, network slicing has become a network architecture widely accepted by network equipment vendors and network operators. After network slicing, an existing physical network is divided into an underlying network and virtual networks. The wireless sub-network, the transmission sub-network and the data sub-network of the underlying network are sliced into different sub-networks. A virtual network is composed of wireless slices, transmission slices and data slices and acquires its resources from the underlying network. For convenience of description, the present invention refers to the resources of the underlying network collectively as underlying node resources and underlying link resources, and to the resources of a virtual network collectively as virtual node resources and virtual link resources. To ensure the stable operation of the underlying network and the virtual networks, and thereby improve the quality of 5G services, fault diagnosis for 5G networks has become a current research hotspot.

Fault diagnosis algorithms can be divided into two strategies: passive monitoring and active monitoring. Passive monitoring locates faults passively, based on data from the network management system. Active monitoring actively monitors network characteristics in real time according to the characteristics of the services, so that potential problems are found quickly and faults are repaired. For example, the literature [GONTARA, Salah; BOUFAIED, Amine; KORBAA, Oujdi. Fault Localization Algorithm in Computer Networks Based on the Boolean Particle Swarm Optimization [C]// Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)] designs an end-to-end fault localization algorithm based on a Boolean particle swarm optimization algorithm. Other work dynamically adjusts the network monitoring system according to the topological characteristics of the network, so that the fault diagnosis algorithm adapts better to changes in the network environment. The fault diagnosis process generally uses a dependency matrix for fault localization, and the models can be divided into binary models and non-binary models.

Existing methods mainly address fault diagnosis in a static network environment. Because 5G network slicing supports dynamic migration and on-demand growth, network node and network link resources change dynamically over time. This dynamic behavior increases network noise, and the dynamic change of network resources makes the fault propagation model inaccurate.

Disclosure of Invention

The invention aims to provide a dynamic network fault diagnosis method under a 5G network slice, which is used for solving the problems of network noise increase and inaccurate fault propagation model caused by the dynamic property of the existing network.

The invention provides a dynamic network fault diagnosis method under a 5G network slice, which comprises the following steps:

Step S11, constructing an initial fault propagation model, wherein the initial fault propagation model comprises fault nodes, symptom nodes, and directed lines from the fault nodes to the symptom nodes;

Step S12, obtaining the probability that each fault node causes each symptom node to be unavailable within the symptom range observed by network management, and the probability that each fault node causes each symptom node to be unavailable within the entire symptom range, and calculating the unavailability credibility of each fault node from these two sets of probabilities;

Step S13, calculating a correction value of the symptom node according to the unavailability credibility of each fault node that influences the symptom node;

Step S14, when the correction value of the symptom node and the original value of the symptom node indicate different symptom states, averaging the correction value and the original value of the symptom node, and correcting the original state and the current state of the symptom node using this average value;

Step S15, constructing a time-slice-based fault propagation model, wherein the time-slice-based fault propagation model comprises the original state of the fault nodes, the current state of the fault nodes, and the corrected original state and current state of the symptom nodes, and acquiring the state transition probability of each fault node from the original state to the current state;

Step S16, acquiring the corrected original state and current state of the symptom nodes, and calculating the original state of the fault nodes from the corrected original state of the symptom nodes;

Step S17, calculating the current state of the fault nodes from the original state of the fault nodes and the corrected original state and current state of the symptom nodes.

Further, in step S12, the formula for calculating the unavailability credibility of each fault node from the probability that the fault node causes each symptom node to be unavailable within the symptom range observed by network management and the probability that it causes each symptom node to be unavailable within the entire symptom range is specifically:

where $\alpha_{f_i}$ is the unavailability credibility of the ith fault node, $s_j \in S_o$ indicates that the jth symptom node is within the symptom range observed by network management, $s_j \in S$ indicates that the jth symptom node is within the entire symptom range, and $P(s_j \mid f_i)$ is the probability that the ith fault node causes the jth symptom node to be unavailable.

Further, step S13 specifically includes:

Step S131, adding together the unavailability credibilities of the fault nodes that influence the symptom node to obtain the unavailability credibility sum;

Step S132, dividing the unavailability credibility sum by the number of fault nodes that influence the symptom node to obtain the correction value of the symptom node.

Further, in step S16, the formula for calculating the original state of the fault nodes from the corrected original state of the symptom nodes is specifically:

where $F^1$ is the original state of the fault nodes, and the remaining terms denote, respectively, the original state of the fault nodes, the original state of the corrected symptom nodes, and the probability that the original state of the fault nodes is abnormal when the original state of the corrected symptom nodes is abnormal.

Further, the formula for implementing step S17 is specifically:

and $F^2 = \arg F^{2*}$,

where $F^2$ is the current state of the fault nodes, and the remaining terms denote, respectively, the original state of the symptom nodes, the current state of the symptom nodes, and the current state of the fault nodes.

The implementation of the invention has the following beneficial effects:

In the method, the original state and the current state of the symptom nodes are acquired through network management, and the unavailability credibility of the fault nodes is introduced to correct them. Time slices are introduced into the fault propagation model, so that the current state of the fault nodes is calculated after their original state has been obtained. By exploiting the relationship between fault nodes and symptom nodes and the temporal correlation of the fault nodes, part of the interference introduced by network dynamics is corrected. This addresses the increased network noise caused by network dynamics and the inaccurate fault propagation model caused by dynamic changes in network resources.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a dynamic network fault diagnosis method under a 5G network slice according to an embodiment of the present invention.

Fig. 2 is a schematic view of virtual network resource allocation under a 5G network slice according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an initial fault propagation model under a 5G network slice according to an embodiment of the present invention.

Fig. 4 is a comparison diagram of the diagnostic accuracy provided by an embodiment of the present invention.

Fig. 5 is a comparison diagram of the diagnostic false alarm rate provided by the embodiment of the invention.

Fig. 6 is a comparison diagram of diagnostic durations provided by an embodiment of the present invention.

Detailed Description

In this patent, the following description will be given with reference to the accompanying drawings and examples.

As shown in fig. 1, an embodiment of the present invention provides a dynamic network fault diagnosis method under a 5G network slice, where the method includes:

Step S11, constructing an initial fault propagation model, wherein the initial fault propagation model comprises fault nodes, symptom nodes, and directed lines from the fault nodes to the symptom nodes.

In this embodiment, the underlying network and the virtual network are represented by the undirected weighted graphs $G^S = (N^S, E^S)$ and $G^V = (N^V, E^V)$, respectively, where $N^S$ and $N^V$ denote the underlying node set and the virtual node set, and $E^S$ and $E^V$ denote the underlying link set and the virtual link set. The mapping $M_N : (N^V \to N^S, E^V \to P^S)$ indicates that underlying network nodes $N^S$ allocate resources to virtual network nodes $N^V$ and that underlying network paths $P^S$ allocate resources to virtual network links $E^V$. An underlying network path $P^S$ is an end-to-end underlying link resource formed by connecting several underlying links; its two endpoints are the underlying network nodes to which the two endpoints of the corresponding virtual network link $E^V$ are mapped. For example, the resources of virtual nodes a and b of virtual network VN1 are allocated by underlying nodes A and B of underlying network SN, respectively.
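The mapping described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the dictionary layout, the function names `path_links` and `mapping_is_consistent`, and the intermediate underlying node `C` are hypothetical; only the assignment of virtual nodes a, b to underlying nodes A, B comes from the text.

```python
def path_links(path):
    """Split an underlying path (list of nodes) into its underlying links."""
    return list(zip(path, path[1:]))


def mapping_is_consistent(node_map, link_map):
    """Check that each underlying path starts and ends at the underlying
    nodes hosting the two endpoints of the virtual link it carries."""
    for (u, v), path in link_map.items():
        if path[0] != node_map[u] or path[-1] != node_map[v]:
            return False
    return True


# Virtual nodes a, b of VN1 are hosted on underlying nodes A, B (from the text).
node_map = {"a": "A", "b": "B"}
# Hypothetical underlying path A-C-B carrying the virtual link (a, b).
link_map = {("a", "b"): ["A", "C", "B"]}

links_for_ab = path_links(link_map[("a", "b")])
```

Resolving a virtual link to its underlying links in this way is what allows a service state observed on the virtual network to be related to underlying resources.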

In the daily operation of a network operator, the running states of the various services and the states of network resources can be acquired in real time through network management software. Because of network slicing, the running state of a service cannot be related directly to underlying network resources. Therefore, before the fault propagation model is established, the virtual network is first mapped onto the underlying network based on the mapping relationship between the two; the specific network mapping is shown in Fig. 2. A service fault propagation model is then constructed from the service states, the underlying network resources, and the bearing relationship between the services and the underlying network resources.

The service fault propagation model comprises symptoms, faults, and directed lines from faults to symptoms. A symptom refers to the state of a service, which is either available or unavailable, represented by s = 1 and s = 0, respectively. A fault refers to an underlying network resource; its state is either available or unavailable, represented by f = 1 and f = 0, respectively. A directed line from a fault to a symptom represents the influence of the fault node on the symptom node, and the number on the line represents the degree of that influence (see Fig. 3). For example, the line from fault node A to one of its symptom nodes carries the value 0.7, expressed as a conditional probability of the form $P(s_j \mid f_i)$. The physical meaning is that, according to the historical operating experience of the network, when this underlying resource is unavailable it causes the symptom node to be unavailable with probability 0.7. The value on a directed line is expressed as a probability mainly because of network dynamics, network noise, and similar factors.
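As a rough illustration, the fault propagation model can be held as a weighted bipartite graph. In the sketch below the nested-dictionary layout, the symptom names `s1` to `s3`, and every weight other than the 0.7 on the line leaving fault node A are assumptions made for the example.

```python
# fault node -> {symptom node: P(symptom unavailable | fault unavailable)}
model = {
    "A": {"s1": 0.7, "s2": 0.4},   # 0.7 is the example value from the text
    "D": {"s1": 0.5},              # hypothetical weight
    "E": {"s1": 0.6, "s3": 0.8},   # hypothetical weights
}


def parents(model, symptom):
    """Return the fault nodes that have a directed line to `symptom`."""
    return sorted(f for f, edges in model.items() if symptom in edges)


influencing_s1 = parents(model, "s1")
```

Storing the edges per fault node keeps both directions cheap to query: the outgoing lines of a fault are a dictionary lookup, and the influencing faults of a symptom are a single pass over the model.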

Step S12, obtaining the probability that each fault node causes each symptom node to be unavailable within the symptom range observed by network management, and the probability that each fault node causes each symptom node to be unavailable within the entire symptom range, and calculating the unavailability credibility of each fault node from these two sets of probabilities.

Specifically, the formula for calculating the unavailability credibility of each fault node from these two sets of probabilities is:

where $\alpha_{f_i}$ is the unavailability credibility of the ith fault node, $s_j \in S_o$ indicates that the jth symptom node is within the symptom range observed by network management, $s_j \in S$ indicates that the jth symptom node is within the entire symptom range, and $P(s_j \mid f_i)$ is the probability that the ith fault node causes the jth symptom node to be unavailable.
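The symbol definitions above suggest a ratio: the probabilities $P(s_j \mid f_i)$ summed over the observed symptoms $S_o$, divided by the same sum over all symptoms $S$. The sketch below assumes that ratio form; it is one reading consistent with the definitions, not a confirmed transcription of the formula, and the function name and example weights are hypothetical.

```python
def unavailability_credibility(edges, observed):
    """Assumed form: sum of P(s_j | f_i) over observed symptoms divided by
    the sum of P(s_j | f_i) over all symptoms the fault node can cause.

    edges:    {symptom: P(symptom unavailable | this fault unavailable)}
    observed: set of symptoms reported as unavailable by network management
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0  # the fault node influences no symptom
    explained = sum(p for s, p in edges.items() if s in observed)
    return explained / total


# Hypothetical weights for one fault node and two observed symptoms.
edges_A = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
alpha_A = unavailability_credibility(edges_A, observed={"s1", "s2"})
```

Under this reading the credibility lies between 0 and 1 and grows as more of the symptoms a fault node can cause are actually observed.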

Step S13, calculating the correction value of the symptom node according to the unavailability credibility of each fault node that influences the symptom node.

Specifically, in the initial fault propagation model created in step S11, a fault node that influences a symptom node is a fault node with a directed line to that symptom node. Referring to Fig. 3, for a symptom node whose influencing fault nodes are A, D and E, the unavailability credibilities of fault nodes A, D and E are calculated and added together; since 3 fault nodes influence the symptom node, the sum is divided by 3 to obtain the correction value of the symptom node.
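The averaging in step S13 can be sketched in a few lines of Python. The function name, the symptom name `s1`, and the credibility values are hypothetical; only the rule (sum over the influencing fault nodes, divided by their number) follows the text.

```python
def symptom_correction(symptom, model, credibility):
    """Mean unavailability credibility of the fault nodes that have a
    directed line to `symptom` (steps S131 and S132)."""
    influencing = [f for f, edges in model.items() if symptom in edges]
    if not influencing:
        raise ValueError(f"no fault node influences {symptom!r}")
    return sum(credibility[f] for f in influencing) / len(influencing)


# Fault nodes A, D, E influence the symptom, as in the Fig. 3 example.
model = {"A": {"s1": 0.7}, "D": {"s1": 0.5}, "E": {"s1": 0.6}, "B": {"s2": 0.9}}
credibility = {"A": 0.9, "D": 0.6, "E": 0.3, "B": 0.1}  # hypothetical values

correction_s1 = symptom_correction("s1", model, credibility)
```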

Step S14, when the correction value of the symptom node and the original value of the symptom node indicate different symptom states, averaging the correction value and the original value of the symptom node, and correcting the original state and the current state of the symptom node using this average value.

In this embodiment, the original value of a symptom node is its value before correction. For example, if the original value of the symptom node is 0 and its correction value is 1, the two are averaged to avoid an erroneous correction, giving an average value of 0.5, and the original state and the current state of the symptom node are corrected using this average value. If the original value of the symptom node agrees with its correction value, no correction is required.
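A minimal sketch of the step S14 rule follows. The text does not say how a fractional correction value is mapped to a state, so the 0.5 threshold used to decide whether two values "indicate different states" is an assumption, as is the function name.

```python
def corrected_symptom_value(original, correction, threshold=0.5):
    """Average the two values only when they indicate different states.

    `threshold` is an assumed cut-off: values at or above it are read as
    state 1, values below it as state 0.
    """
    same_state = (original >= threshold) == (correction >= threshold)
    if same_state:
        return original  # the values agree, no correction needed
    return (original + correction) / 2


# The example from the text: original value 0, correction value 1.
value = corrected_symptom_value(0, 1)
```

Averaging rather than overwriting keeps the original observation partly in play, which matches the stated aim of avoiding an erroneous correction.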

Step S15, constructing a time-slice-based fault propagation model, wherein the time-slice-based fault propagation model comprises the original state of the fault nodes, the current state of the fault nodes, and the corrected original state and current state of the symptom nodes, and acquiring the state transition probability of each fault node from the original state to the current state.

In this embodiment, $F^1 = \{f_1^1, f_2^1, \dots, f_N^1\}$ denotes the original states of the fault nodes and $F^2 = \{f_1^2, f_2^2, \dots, f_N^2\}$ denotes their current states, where N is the number of fault nodes; $S^1 = \{s_1^1, s_2^1, \dots, s_M^1\}$ denotes the original states of the symptom nodes and $S^2 = \{s_1^2, s_2^2, \dots, s_M^2\}$ denotes their current states, where M is the number of symptom nodes.

The state transition probability of the ith fault node from the original state to the current state is denoted $p(f_i^2 \mid f_i^1)$. Considering the characteristics of the dynamic environment, the invention adds the time slice t to the fault propagation model, so that different network models are described in different time slices. Each fault node then has a state in each of several time slices, the state in each time slice depends on the state in the previous time slice, and $p(f_i^t \mid f_i^{t-1})$ denotes the state transition probability of the fault node, that is, the probability that the state of the fault node in time slice t is $f_i^t$ given that its state in the previous time slice t-1 was $f_i^{t-1}$. The probability $p(f_i^2 \mid f_i^1)$ can be obtained by collecting statistics over a chosen time slice through network management.
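One simple way to obtain such a transition probability from network management statistics is to count consecutive state pairs in a node's history. The sketch below assumes that approach; the function name and the sample history are hypothetical, and states follow the text's convention (1 available, 0 unavailable).

```python
from collections import Counter


def estimate_transition(history):
    """Estimate p(current | previous) for one fault node by counting
    consecutive pairs in its per-time-slice state history.

    Returns {(previous, current): probability}.
    """
    pair_counts = Counter(zip(history, history[1:]))
    prev_counts = Counter(history[:-1])
    return {
        (prev, cur): n / prev_counts[prev]
        for (prev, cur), n in pair_counts.items()
    }


# Hypothetical state history of one underlying node over ten time slices.
history = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
transition = estimate_transition(history)
```

For each previous state, the estimated probabilities over the possible current states sum to 1, so the result can be used directly as a conditional distribution.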

Step S16, acquiring the corrected original state and current state of the symptom nodes, and calculating the original state of the fault nodes from the corrected original state of the symptom nodes.

In step S16, the original state of the fault nodes is calculated from the corrected original state of the symptom nodes using formula (1):

where $F^1$ is the original state of the fault nodes, and the remaining terms denote, respectively, the original state of the fault nodes and the original state of the corrected symptom nodes.

From Bayesian theory and formula (1), it follows that:

where $F^{1*}$ is an intermediate variable. Because the probability of the corrected symptom-node states does not depend on the fault-node states, formula (2) can be simplified to formula (3):

where $p(f_i^1)$ is the failure probability of the ith fault node in the original state; the parent-node term denotes the parent nodes, in the time-slice-based fault propagation model, of the corrected jth symptom node in the original state; and the conditional-probability term denotes the probability that, when the corrected jth symptom node in the original state is unavailable, the fault nodes in the original state that are its parent nodes are unavailable.

In addition, $p(f_i^1)$ and this conditional probability may be obtained from the network management system.
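To make the Bayesian step concrete, the sketch below searches for the most probable original fault set by exhaustive enumeration, scoring each candidate as the product of fault priors and symptom likelihoods. Several parts are assumptions rather than the patent's formulas: the noisy-OR likelihood for a symptom given its parent faults, the boolean treatment of corrected symptom states, all names, and all numeric values.

```python
from itertools import product


def symptom_likelihood(down, failed_parents, weights):
    """Noisy-OR (assumed): the symptom stays up only if no failed parent
    propagates its fault."""
    p_up = 1.0
    for f in failed_parents:
        p_up *= 1.0 - weights[f]
    return (1.0 - p_up) if down else p_up


def map_original_faults(prior, model, symptoms_down):
    """Return the set of failed fault nodes with the highest score.

    prior:         {fault: failure probability in the original state}
    model:         {fault: {symptom: propagation probability}}
    symptoms_down: {symptom: True if (corrected) state is unavailable}
    """
    faults = sorted(prior)
    best, best_score = set(), -1.0
    for assignment in product([False, True], repeat=len(faults)):
        failed = {f for f, bad in zip(faults, assignment) if bad}
        score = 1.0
        for f in faults:
            score *= prior[f] if f in failed else 1.0 - prior[f]
        for s, down in symptoms_down.items():
            weights = {f: model[f][s] for f in faults if s in model[f]}
            failed_parents = [f for f in weights if f in failed]
            score *= symptom_likelihood(down, failed_parents, weights)
        if score > best_score:
            best, best_score = failed, score
    return best


prior = {"A": 0.01, "D": 0.005, "E": 0.002}          # hypothetical priors
model = {"A": {"s1": 0.7, "s2": 0.9}, "D": {"s1": 0.5}, "E": {"s1": 0.6}}
diagnosis = map_original_faults(prior, model, {"s1": True, "s2": True})
```

Enumeration is exponential in the number of fault nodes, so it only suits small examples; it is used here because it makes the objective being maximized easy to see.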

The formula for implementing step S17 is specifically:

and $F^2 = \arg F^{2*}$,

where $F^2$ is the current state of the fault nodes, and the remaining terms denote, respectively, the original state of the symptom nodes, the current state of the symptom nodes, and the current state of the fault nodes.

According to Bayesian inference, formula (4) is calculated as follows:

where the two parent-node terms denote, respectively, the parent nodes of the corrected jth symptom node in the original state and the parent nodes of the corrected kth symptom node in the current state in the time-slice-based fault propagation model, and $p(f_i^2 \mid f_i^1)$ is the state transition probability of the ith fault node from the original state to the current state.

In addition, $p(f_i^1)$, the conditional probabilities of the symptom nodes given their parent nodes, and $p(f_i^2 \mid f_i^1)$ can be obtained directly from network management software statistics or by further calculation on the collected statistics.
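The role of the state transition probability in step S17 can be illustrated with a second enumeration sketch. It assumes the original fault set has already been diagnosed and scores each candidate current fault set as the product of per-node transition probabilities and noisy-OR symptom likelihoods. This factorization, the noisy-OR model, the names, and the numbers are all assumptions, not the patent's formula (4).

```python
from itertools import product


def map_current_faults(prev_failed, transition, model, symptoms_down_now):
    """Return the most probable set of currently failed fault nodes.

    prev_failed:       set of fault nodes diagnosed as failed in the original state
    transition:        {fault: {was_failed (bool): P(failed now | was_failed)}}
    model:             {fault: {symptom: propagation probability}}
    symptoms_down_now: {symptom: True if current (corrected) state is unavailable}
    """
    faults = sorted(model)
    best, best_score = set(), -1.0
    for assignment in product([False, True], repeat=len(faults)):
        failed = {f for f, bad in zip(faults, assignment) if bad}
        score = 1.0
        for f in faults:
            p_fail = transition[f][f in prev_failed]
            score *= p_fail if f in failed else 1.0 - p_fail
        for s, down in symptoms_down_now.items():
            p_up = 1.0  # noisy-OR (assumed) over currently failed parents
            for f in failed:
                p_up *= 1.0 - model[f].get(s, 0.0)
            score *= (1.0 - p_up) if down else p_up
        if score > best_score:
            best, best_score = failed, score
    return best


model = {"A": {"s1": 0.7}, "D": {"s1": 0.5}}
# Hypothetical transition probabilities: failures tend to persist.
transition = {f: {True: 0.8, False: 0.01} for f in model}
current = map_current_faults({"A"}, transition, model, {"s1": True})
```

Because a node that was failed in the previous time slice is likely to remain failed, the temporal term favors explanations that are consistent with the earlier diagnosis.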

To verify the effect of the method, a 5G network slicing environment is simulated, and the GT-ITM tool (E. W. Zegura, K. L. Calvert, S. Bhattacharjee. How to model an internetwork [C]// Proceedings of IEEE INFOCOM, 1996) is used in the experiment to generate the network topology. The network topology includes both an underlying network and virtual networks. The number of nodes in the underlying network is increased from 100 to 500, and the number of nodes in each virtual network is increased from 5 to 15. The resource mapping from the virtual networks to the underlying network uses a classical mapping algorithm.

For end-to-end service simulation, 10% of the nodes of each virtual network are selected as starting nodes, and 2 of the remaining nodes are selected as target nodes. Each starting node is connected to its target nodes using a shortest-path algorithm to simulate end-to-end services. For fault simulation of the underlying network nodes, the prior failure probability of each underlying node follows a uniform distribution on [0.001, 0.01]. To simulate network dynamics, the state of the network is changed every 20 seconds, and the state information of the services is acquired again.
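The simulation setup can be sketched as follows. The sketch makes several assumptions: hop-count shortest paths via breadth-first search (the text does not name the shortest-path algorithm or the link weights), two target nodes per starting node, and hypothetical function names and topology.

```python
import random
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search: shortest path by hop count (assumed metric)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None  # dst is unreachable from src


def sample_priors(nodes, rng, low=0.001, high=0.01):
    """Draw each node's prior failure probability uniformly from [low, high]."""
    return {n: rng.uniform(low, high) for n in nodes}


def pick_endpoints(nodes, rng):
    """Pick 10% of the nodes as starting nodes and, for each, 2 targets
    among the remaining nodes (per-start targets are an assumption)."""
    starts = rng.sample(nodes, max(1, len(nodes) // 10))
    rest = [n for n in nodes if n not in starts]
    return {s: rng.sample(rest, 2) for s in starts}


rng = random.Random(0)
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
path = shortest_path(adj, "a", "d")
priors = sample_priors(list(adj), rng)
endpoints = pick_endpoints([f"v{i}" for i in range(10)], rng)
```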

To verify the performance of the method, the DNFDA algorithm used by the method is compared with a fault diagnosis algorithm based on a fault propagation model (FDAoFPM). The comparison algorithm builds a fault propagation model from the relationship between faults and symptoms and does not optimize that model. The algorithms are compared along three dimensions: diagnosis accuracy, diagnosis false alarm rate, and diagnosis duration. The diagnosis accuracy is the proportion of the total fault resources that are diagnosed; the higher the diagnosis accuracy, the more fault resources the algorithm identifies. A false alarm occurs when a resource is in the normal state but the diagnosis algorithm identifies it as abnormal; the diagnosis false alarm rate is the proportion of wrongly diagnosed fault resources relative to the total number of real fault resources. Therefore, the lower the diagnosis false alarm rate, the better the algorithm performs. The diagnosis duration is the time from receiving the service states and the network topology to producing the diagnosed set of fault nodes.
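The two ratio metrics defined above translate directly into set arithmetic. The sketch below follows those definitions; the function name and the sample fault sets are hypothetical.

```python
def diagnosis_metrics(true_faults, diagnosed):
    """Return (accuracy, false_alarm_rate) as defined in the text:

    accuracy         = correctly diagnosed faults / total real faults
    false_alarm_rate = wrongly diagnosed faults   / total real faults
    """
    true_faults, diagnosed = set(true_faults), set(diagnosed)
    if not true_faults:
        raise ValueError("metrics are undefined without real faults")
    accuracy = len(true_faults & diagnosed) / len(true_faults)
    false_alarm_rate = len(diagnosed - true_faults) / len(true_faults)
    return accuracy, false_alarm_rate


# Hypothetical run: four real faults, three diagnosed, one of them wrong.
accuracy, false_alarm_rate = diagnosis_metrics(
    true_faults={"A", "B", "C", "D"},
    diagnosed={"A", "B", "E"},
)
```

Because both metrics share the same denominator, the false alarm rate can exceed 1 when the algorithm reports many more faults than actually exist.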

The diagnostic accuracy comparison is shown in Fig. 4, with the X-axis representing the number of network nodes and the Y-axis representing the diagnostic accuracy. As can be seen from the figure, both algorithms achieve good diagnostic accuracy across the different network scales. Comparing the two, the method of the invention achieves higher diagnostic accuracy, because it optimizes the fault model according to the network characteristics, which improves the accuracy of the fault diagnosis model.

The diagnosis false alarm rate comparison is shown in Fig. 5, with the X-axis representing the number of network nodes and the Y-axis representing the diagnosis false alarm rate. As can be seen from the figure, the number of network nodes has little influence on the false alarm rate of either algorithm, which indicates that network scale has little influence on the fault propagation model. In addition, the false alarm rate of the method of the invention is lower than that of the conventional algorithm, because the invention corrects noisy symptoms and thereby improves the accuracy of the fault diagnosis model.

The diagnosis duration comparison is shown in Fig. 6, with the X-axis representing the number of network nodes and the Y-axis representing the diagnosis duration. As can be seen from the figure, the diagnosis duration of both algorithms increases rapidly as the number of network nodes increases, because a larger network makes the fault propagation model grow quickly, so fault diagnosis takes longer. Comparing the two algorithms, the diagnosis duration of the method of the invention is longer, because the method must additionally optimize the model, which increases the overall diagnosis time.

The implementation of the invention has the following beneficial effects:

In the method, the original state and the current state of the symptom nodes are acquired through network management, and the unavailability credibility of the fault nodes is introduced to correct them. Time slices are introduced into the fault propagation model, so that the current state of the fault nodes is calculated after their original state has been obtained. By exploiting the relationship between fault nodes and symptom nodes and the temporal correlation of the fault nodes, part of the interference introduced by network dynamics is corrected. This addresses the increased network noise caused by network dynamics and the inaccurate fault propagation model caused by dynamic changes in network resources.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A dynamic network fault diagnosis method under 5G network slice is characterized by comprising the following steps:

step S16, acquiring the corrected original state and the current state of the symptom node, and calculating the original state of the fault node in the corrected original state of the symptom node according to the corrected original state of the symptom node;

2. The method according to claim 1, wherein the formula for calculating the unavailability reliability of each failed node in the step S12 according to the probability that each failed node causes unavailability of each symptom node in the symptom range observed by the network manager and the unavailability probability that each failed node causes unavailability of each symptom node in the total symptom range is specifically:

where $\alpha_{f_i}$ is the unavailability credibility of the ith fault node, $s_j \in S_o$ indicates that the jth symptom node is within the symptom range observed by network management, $s_j \in S$ indicates that the jth symptom node is within the entire symptom range, and $P(s_j \mid f_i)$ is the probability that the ith fault node causes the jth symptom node to be unavailable.

3. The method according to claim 1, wherein step S13 specifically comprises:

and step S132, dividing the sum of the unavailable credibility by the number of fault nodes having influence on symptom nodes to obtain a corrected value of the symptom nodes.

4. The method according to claim 1, wherein the formula for calculating the original state of the fault node in the original state of the modified symptom node according to the original state of the modified symptom node in step S16 is specifically: