CN111368888A

CN111368888A - Service function chain fault diagnosis method based on deep dynamic Bayesian network

Info

Publication number: CN111368888A
Application number: CN202010116968.6A
Authority: CN
Inventors: 唐伦; 廖皓; 贺兰钦; 曹睿; 胡彦娟
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Shenzhen Wanzhida Technology Transfer Center Co ltd; Xixian New Area Digital Technology Co.,Ltd.
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2020-07-03
Anticipated expiration: 2040-02-25
Also published as: CN111368888B

Abstract

The invention relates to a service function chain fault diagnosis method based on a deep dynamic Bayesian network and belongs to the technical field of communication. Taking the characteristics of the service function chain scenario into account, a fault diagnosis model is constructed according to the fault propagation relation in the hierarchical network architecture of a service function chain, and high-dimensional symptom data are collected by monitoring the performance data of a plurality of virtual network functions on a physical node. In view of the diversity of network symptom observation data under an SDN/NFV framework and the spatial correlation between physical nodes and virtual network functions, a deep belief network is adopted to extract features from the observation data. Finally, a dynamic Bayesian network is introduced to diagnose the fault source in real time by exploiting the temporal correlation of fault propagation. Aimed at the 5G end-to-end network slicing scenario, the method can effectively process high-dimensional network data and meet the system's requirement on fault diagnosis accuracy.

Description

Service function chain fault diagnosis method based on deep dynamic Bayesian network

Technical Field

The invention belongs to the technical field of communication, and relates to a service function chain fault diagnosis method based on a deep dynamic Bayesian network.

Background

In recent years, as user demands have become increasingly diverse, it has become difficult for the conventional network architecture to adapt to changing user requirements. A 5G network should therefore be highly flexible in order to cope with the diversification of user service demands. Network slicing based on Software Defined Networking (SDN) and Network Function Virtualization (NFV) has become a key technology that allows 5G network operators to provide customized services on demand in a sustainable way. NFV decouples Network Functions (NFs) from physical hardware and replaces dedicated hardware with general-purpose hardware, so that network functions can be deployed conveniently and quickly at any position in the network, while general-purpose hardware resources are allocated on demand and scaled dynamically. In a sliced network, each service request consists of a set of Virtual Network Functions (VNFs) that provide different network services and are interconnected in a certain order to form a Service Function Chain (SFC). At present, most research focuses on the acceptance rate of virtual network requests, the allocation of underlying network resources, the deployment of SFCs and the like, and neglects the reliability of the virtual network. Because service function chains can be created, migrated and destroyed dynamically, network failures are more likely to occur. To ensure the quality of service (QoS) of service function chains in a network function virtualization environment, the network must be able to recover quickly from failures.

However, fault recovery in virtualized networks is a very serious problem at present. With the exponential growth of user traffic and the increasing complexity of network structures, the current manual network operation and maintenance mode is both inefficient and costly. To reduce operation and maintenance expenditure and improve efficiency, 5G networks introduce the concept of the self-organizing network (SON), in which self-configuration, self-optimization and self-healing are used to realize self-management of the network [7,8]. Fault diagnosis is the key to locating the fault source in the network and is the primary prerequisite for network self-healing.

The existing diagnostic methods have several problems: (1) Existing explicit diagnosis methods based on the network topology and a fault propagation model usually rely on a static network obtained from expert knowledge. In a virtualized network, however, the network topology and the fault dependency relationships may change at any time, and the sharing of underlying resources may cause mutual interference between faults and even unknown faults, leading to wrong diagnoses. Probe-based methods occupy network resources and easily cause congestion in the network, which degrades the quality of service. Model-based methods need to reconstruct the network topology and the fault dependency graph model for each diagnosis, so the timeliness of diagnosis is difficult to guarantee in a large-scale network. (2) Data in a virtualized network are massive, high-dimensional and multi-source, and traditional diagnosis methods cannot adequately account for the weak correlations in such data. (3) Learning-based methods can classify faults intelligently, but the training data are not timely, so the actual diagnosis accuracy is not high.

Disclosure of Invention

In view of the above, the present invention provides an SFC fault diagnosis method based on a deep dynamic Bayesian network to address the problem of VNF node fault location. The method can effectively diagnose the fault location of the underlying node from changes in the performance data of the VNF nodes in a network virtualization environment, and meets the requirement of network reliability.

In order to achieve the purpose, the invention provides the following technical scheme:

A service function chain fault diagnosis method based on a deep dynamic Bayesian network comprises the following steps:

S1: constructing a fault diagnosis model according to the fault propagation relation in the hierarchical network architecture of a service function chain;

S2: monitoring, at a physical node, the performance data of a plurality of Virtual Network Functions (VNFs) deployed on it, and collecting high-dimensional symptom data;

S3: in view of the diversity of network observation data under the SDN/NFV framework and the spatial correlation between physical nodes and VNFs, establishing a corresponding Deep Belief Network (DBN) model to perform feature extraction and dimension reduction on the observation data, using the k-step contrastive divergence algorithm (CD-k) to approximately sample the historical observation data set, and fine-tuning the model with an adaptive BP algorithm with an added momentum term;

S4: establishing a Dynamic Bayesian Network (DBN) model to diagnose fault sources in real time by exploiting the temporal correlation between faults, and locating the fault sources with a 1.5-time-slice junction tree inference algorithm.

Further, in step S1, in the service function chain scenario, the NFV MANO in the virtualization layer determines, according to the user service request, the VNFs required by the service and their logical links, and guarantees the general-purpose server resources occupied by the running VNFs (computing, network and storage) and the specific bandwidth on the path; the SDN controller then connects the VNFs to form an SFC and controls the transmission connections; the application layer comprises a plurality of SFCs serving various service flows, each SFC being formed by chaining different network functions in a certain order and providing end-to-end service for its service flow.

Further, in step S1, the fault diagnosis model established according to the fault propagation relation in the hierarchical network architecture of the service function chain first locates, at the application layer, the VNF nodes that may be faulty, and then locates the root cause of the fault according to the mapping between the faulty application-layer VNF nodes and the infrastructure layer; for the layered network architecture, a Dynamic Bayesian Network (DBN), which captures causal relations over time and adapts to dynamic changes in the environment, is employed for fault diagnosis.

Further, step S2 includes monitoring the performance data of a plurality of VNFs on the physical node and collecting high-dimensional symptom data. The data are normalized in preprocessing to eliminate the influence of the different dimensions (units and scales) of the symptom information, using the linear max-min method, whose conversion function is:

$$y' = \frac{y - y_{\min}}{y_{\max} - y_{\min}}$$

where $y$ is a raw symptom value and $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of that symptom in the sample set.
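As an illustration of this preprocessing step, the following Python sketch applies linear max-min scaling, y' = (y - y_min) / (y_max - y_min), to each symptom column independently. The function name `min_max_normalize`, the sample metrics and the handling of constant columns are assumptions made for the example and are not specified in the source.

```python
def min_max_normalize(samples):
    """Scale every symptom column of `samples` to [0, 1] with linear max-min scaling."""
    columns = list(zip(*samples))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in samples:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption of this sketch: a constant column is mapped to 0.0
            # instead of dividing by zero.
            scaled_row.append((value - low) / span if span else 0.0)
        normalized.append(scaled_row)
    return normalized


# Hypothetical rows: [CPU utilization %, processing delay ms, bandwidth occupancy %]
raw = [
    [20.0, 5.0, 30.0],
    [60.0, 15.0, 30.0],
    [100.0, 25.0, 30.0],
]
scaled = min_max_normalize(raw)
```

After scaling, metrics measured in different units contribute on the same [0, 1] range, which is the stated purpose of the normalization.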

Further, in step S3, in view of the diversity of network observation data under the SDN/NFV architecture and the spatial correlation between physical nodes and VNFs, a corresponding Deep Belief Network (DBN) model is established:

S31: using a multi-hidden-layer neural network consisting of three stacked Restricted Boltzmann Machines (RBMs), performing greedy layer-by-layer training of the network in an unsupervised manner, and learning the high-level fault features of the physical nodes using only the historical observation data set of the SFC virtual nodes;

S32: adding a softmax layer on top of the three-layer RBM model to form a Deep Belief Network (DBN) that classifies node faults, and performing supervised backward fine-tuning with labeled data to obtain the classification model of the initial time slice;

S33: further optimizing the parameters using real-time symptom data.

Further, step S3 specifically includes the following steps:

the parameter to be learned is θ; in the SFC scenario, θ is the probabilistic dependency between the fault symptoms and the actual faults, namely

θ = {w_ij, a_i, b_j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}

where w_ij represents the weight between visible-layer node i and hidden-layer node j, a_i represents the bias of visible-layer node i, b_j represents the bias of hidden-layer node j, n is the number of fault factors X of the physical node, and m is the number of virtual node observation data Y;

for parameters of the SFC fault diagnosis model in a single time slice, learning is carried out by adopting a deep belief network, and the parameters are trained in an off-line and on-line learning mode:

firstly, a historical observation data set of the fault nodes is collected as training samples, which are divided into labeled and unlabeled samples; let S and U denote the sets of labeled and unlabeled samples, respectively, and let Y and X denote the various types of symptom information of a faulty VNF node and the label output of the fault type, respectively; the set of historical observation data of the VNF nodes on the same physical node is denoted Q = {…, Q_i, …}, where Q_i = [Y_t, Y_t-1, …, Y_t-d+1] and d is the dimension of the model's input sample; finally, the unlabeled samples for unsupervised learning are denoted U(Y), the labeled samples for supervised learning are denoted S(Y, X), and all data samples are divided into mini-batch data sets so that batch training speeds up the training of the DBN model;

then, the sample set is divided proportionally into a training set, a validation set and a test set, and the model is trained with the unlabeled and labeled mini-batch data sets; the parameters of the RBM are learned in an unsupervised manner from the unlabeled data set U(Y), so that the probability distribution of the RBM network fits the training samples better; the k-step contrastive divergence (CD-k) algorithm is used to approximately sample the samples, and the parameter θ is updated by computing the gradient of the log-likelihood function;
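The CD-k update described above can be sketched in plain Python for a binary RBM. This is a minimal illustration under stated assumptions, not the implementation of the invention: the unit counts, learning rate, toy data and function names are hypothetical, binary visible units are assumed, and the update is applied per sample rather than per mini-batch. The parameters follow the notation θ = {w_ij, a_i, b_j}.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, w, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, w, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw a binary vector from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd_k_update(v0, w, a, b, k=1, lr=0.1, rng=random):
    """Apply one CD-k update of (w, a, b) in place for one training vector v0."""
    ph0 = hidden_probs(v0, w, b)           # positive phase, driven by the data
    h = sample(ph0, rng)
    vk, phk = v0, ph0
    for _ in range(k):                     # k steps of alternating Gibbs sampling
        vk = sample(visible_probs(h, w, a), rng)
        phk = hidden_probs(vk, w, b)
        h = sample(phk, rng)
    for i in range(len(a)):
        for j in range(len(b)):
            # Approximate log-likelihood gradient: data term minus model term.
            w[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
        a[i] += lr * (v0[i] - vk[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - phk[j])


# Hypothetical toy setup: m = 4 visible (symptom) units, n = 2 hidden units.
rng = random.Random(0)
m, n = 4, 2
w = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(m)]
a = [0.0] * m
b = [0.0] * n
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for _ in range(200):
    for v in data:
        cd_k_update(v, w, a, b, k=1, lr=0.1, rng=rng)
```

Larger k gives a closer approximation of the model term at the cost of more sampling steps; k = 1 is a common default.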

after the RBM1 model has been iteratively adjusted by the CD-k fast learning algorithm, preliminary model parameters are obtained; the activation states of the hidden-layer neurons obtained from training RBM1 are then used as the network input of RBM2, and the subsequent RBM3 model is trained in the same way, until all RBMs in the DBN model have been trained; the hidden-layer output of the last RBM serves as the input of the softmax classifier;
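To make the stacking concrete, the sketch below propagates one symptom vector through three RBM layers (RBM1, RBM2, RBM3) and feeds the last hidden layer into a softmax classifier. All layer sizes and weight values are hypothetical placeholders standing in for trained parameters, and the helper names are assumptions of this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_activations(inputs, weights, hidden_bias):
    """Hidden activation probabilities of one RBM given its visible inputs."""
    return [
        sigmoid(hidden_bias[j] + sum(x * weights[i][j] for i, x in enumerate(inputs)))
        for j in range(len(hidden_bias))
    ]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dbn_forward(symptoms, rbm_layers, out_weights, out_bias):
    """Pass symptoms through the stacked RBMs, then classify with softmax."""
    activations = symptoms
    for weights, hidden_bias in rbm_layers:  # RBM1 -> RBM2 -> RBM3
        activations = layer_activations(activations, weights, hidden_bias)
    scores = [
        out_bias[c] + sum(h * out_weights[j][c] for j, h in enumerate(activations))
        for c in range(len(out_bias))
    ]
    return softmax(scores)


def constant_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]


# Hypothetical parameters: 4 symptoms -> 3 -> 3 -> 2 hidden units -> 2 fault classes.
rbm_layers = [
    (constant_matrix(4, 3, 0.1), [0.0] * 3),
    (constant_matrix(3, 3, 0.1), [0.0] * 3),
    (constant_matrix(3, 2, 0.1), [0.0] * 2),
]
out_weights = [[1.0, -1.0], [0.5, -0.5]]
out_bias = [0.0, 0.0]
probabilities = dbn_forward([0.2, 0.9, 0.4, 0.7], rbm_layers, out_weights, out_bias)
```

In greedy layer-wise training, each `(weights, hidden_bias)` pair would come from training one RBM on the activations produced by the layer below it.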

after the optimal model parameters of the unsupervised pre-training stage have been obtained, supervised backward fine-tuning is carried out with the labeled data S(Y, X) to establish the complex nonlinear relationship between the fault features and the node state labels, where the label values represent the true fault state of each VNF of the SFC; a BP algorithm with an adaptively decreasing learning rate and an added momentum term is used to fine-tune the overall model parameters of the deep belief network backward, taking the parameters of the unsupervised stage as initialization parameters, and the expression is as follows:

where θ_t and θ_t-1 denote the correction of the parameter in the t-th and (t-1)-th iterations, respectively, b is the momentum coefficient, a is the learning rate, and ∂ln L/∂θ is the gradient of the log-likelihood function of the current sample.
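A momentum update of this kind can be sketched as follows. Because the expression itself is not reproduced in the text, the form used here is an assumption: the new correction equals the momentum coefficient times the previous correction plus the learning rate times the log-likelihood gradient (gradient ascent), and the learning rate is halved whenever the objective gets worse. The function names and the halving factor are hypothetical.

```python
def momentum_step(theta, previous_delta, gradient, learning_rate, momentum):
    """Return (updated parameters, new correction) for one momentum update."""
    delta = [
        momentum * d + learning_rate * g
        for d, g in zip(previous_delta, gradient)
    ]
    updated = [p + d for p, d in zip(theta, delta)]
    return updated, delta


def reduce_learning_rate(learning_rate, previous_objective, current_objective, factor=0.5):
    """Shrink the learning rate when the log-likelihood objective decreased."""
    if current_objective < previous_objective:
        return learning_rate * factor
    return learning_rate


# One hypothetical update of a single parameter.
theta, delta = momentum_step(
    theta=[1.0], previous_delta=[0.5], gradient=[2.0],
    learning_rate=0.25, momentum=0.5,
)
next_rate = reduce_learning_rate(0.25, previous_objective=-1.0, current_objective=-2.0)
```

The momentum term reuses part of the previous correction, which damps oscillations between iterations; the rate reduction is one simple way to realize an adaptively decreasing learning rate.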

After a DBN model has been obtained from the historical observation data of VNF nodes of the same type, the model is optimized in real time within a slice cycle using real-time observation data of the faulty VNF nodes; the sample R_t = [Y_t, Y_t-1, …, Y_t-d+1] is updated in real time by a sliding-window sampling mechanism, where d is the length of the sliding window: each time a time slice t elapses, the observation Y_t at time t is added and Y_t-d is deleted, so that the size of the input sample stays unchanged; the parameters of the model are optimized by training on this single sample set, and, given the fault symptoms Y_t-d+1:t at time t, the output is the predicted infrastructure-layer node state p(X_t | Y_t-d+1:t), where X_t = {x₁, x₂, …, x_n}.
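The sliding-window mechanism can be illustrated with a bounded queue. This is a minimal sketch; the class name `SlidingWindow` and the one-dimensional toy observations are assumptions of the example.

```python
from collections import deque


class SlidingWindow:
    """Keeps the d most recent observations, R_t = [Y_t, Y_t-1, ..., Y_t-d+1]."""

    def __init__(self, length):
        self._buffer = deque(maxlen=length)

    def push(self, observation):
        # Appending Y_t to a full deque silently drops the oldest entry Y_t-d,
        # so the sample size stays constant.
        self._buffer.append(observation)

    def is_full(self):
        return len(self._buffer) == self._buffer.maxlen

    def sample(self):
        """Return the current window, newest observation first."""
        return list(reversed(self._buffer))


window = SlidingWindow(length=3)
for t in range(1, 6):
    window.push([float(t)])  # hypothetical one-dimensional observation Y_t
latest = window.sample()
```

Each call to `sample()` yields the fixed-size input on which the model would be re-optimized for the current time slice.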

Further, in step S4, the method specifically includes:

the dynamic Bayesian network (DBN) model is defined as (B₀, B→), where B₀ represents the prior network of the initial time slice of the online learning phase of the deep belief network, i.e. the physical node state at the initial time, and B→ represents the hidden-state transition model of a BN composed of two or more time slices;

the dynamic Bayesian network infers, from the observation data Y_t = {y₁, y₂, …, y_m}, the most probable value of the hidden variable X_t = {x₁, x₂, …, x_n}, where Y represents the symptom information of the SFC virtual nodes and has m possible values, and X represents the physical node state of the infrastructure layer and has n possible actual outcomes;

let the prior distribution matrix of the initial hidden variable be π; then

π = (π_i)_1×n, i = 1, 2, …, n

where π_i is the prior probability of the working state of the infrastructure-layer node at the initial time; the posterior probability estimated for the initial time slice in the online learning stage of the deep belief network is then used as the prior probability of the node state;

let the state transition matrix between fault nodes be A; then

A = (a_ik)_n×n, a_ik = P(X_t = i | X_t-1 = k), i, k = 1, 2, …, n

where a_ik represents the influence of the state of a fault factor of the fault node at time t-1 on its state at time t;

let the observation (emission) matrix between the fault nodes and the symptom information be B; then

B = (b_ij)_n×m, b_ij = P(Y_t = j | X_t = i), i = 1, 2, …, n, j = 1, 2, …, m

where b_ij represents the influence of fault state i on the working performance data of the virtual node at time t;

under the classical assumptions of the dynamic Bayesian network model, the joint probability of the observations and the states is given by:

$$P(X_{1:T}, Y_{1:T}) = P(X_1)\,P(Y_1 \mid X_1)\prod_{t=2}^{T} P(X_t \mid X_{t-1})\,P(Y_t \mid X_t)$$

where $P(Y_t \mid X_t)$ represents the observation emission probability required for DDBN inference. The observation emission probability is then modeled by the Deep Belief Network (DBN), which extracts features from high-dimensional data well;
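The factorization of the joint probability into a prior, transition terms and emission terms can be checked numerically with the matrices π, A and B defined above. The sketch below assumes two hidden states and two symptom values with hypothetical probabilities, uses 0-based indices, and keeps the convention a_ik = P(X_t = i | X_t-1 = k), so the columns of `A` sum to 1.

```python
import itertools


def joint_probability(states, observations, pi, A, B):
    """P(X_1:T = states, Y_1:T = observations) under the first-order model."""
    probability = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        previous, current = states[t - 1], states[t]
        # A[current][previous] = P(X_t = current | X_t-1 = previous)
        probability *= A[current][previous] * B[current][observations[t]]
    return probability


# Hypothetical model: state 0 = normal, state 1 = faulty.
pi = [0.75, 0.25]
A = [[0.75, 0.25],
     [0.25, 0.75]]   # each column sums to 1
B = [[0.75, 0.25],
     [0.25, 0.75]]   # each row sums to 1

example = joint_probability([0, 1], [0, 1], pi, A, B)

# Sanity check: the joint probabilities of all length-2 sequences sum to 1.
total = sum(
    joint_probability(list(xs), list(ys), pi, A, B)
    for xs in itertools.product(range(2), repeat=2)
    for ys in itertools.product(range(2), repeat=2)
)
```

In the method described here, the entries of `B` would not be a fixed table but would come from the deep belief network's output for the observed symptoms.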

finally, SFC fault inference is performed: the probability distribution of the fault root cause is computed given the fault symptoms, and the 1.5-time-slice junction tree inference algorithm is used to find the most probable fault value, i.e. the one maximizing P(x_t = i | y_1:T);

In the SFC fault diagnosis model, the main idea of using the 1.5-time-slice junction tree inference algorithm for SFC fault inference is as follows. According to the Markov property of the dynamic Bayesian network, the fault nodes that have child nodes in the next time slice are called interface nodes; once the values of the interface nodes are known, the states of past nodes and the states of future nodes are independent of each other. Let JT_t be the junction tree of time slice t, C_t the clique in JT_t that contains I_t, and D_t the clique in JT_t that contains I_t-1. The interface I_t of time slice t receives information from the interface I_t-1 of the previous time slice and passes information to the interface I_t+1 of the next time slice; inference is carried out by message propagation between the interfaces, as follows:

step 1: construct the 1.5-time-slice junction tree JT_t by applying moralization, triangulation and other steps to the DBN-based SFC fault inference model; a clique tree is built by triangulating the transition probability matrix A between fault nodes, finding the maximal cliques of the triangulated graph, and connecting the maximal cliques through separators formed by the intersection of two cliques, which yields the junction tree; each clique has a potential function ψ, which is the product of the conditional probability tables (CPTs) of the nodes in that clique;

step 2: forward propagation of information: the junction tree JT_t of the current time slice obtains new evidence from the junction tree JT_t-1 of the previous time slice;

step 3: backward propagation of information: the junction tree JT_t of the current time slice absorbs evidence from the junction tree JT_t+1 of the following time slice, and the probability distribution of the junction tree JT_t of the current time slice is updated.

Further, in the process of reasoning through message propagation between the interfaces in step S4, step 2 specifically includes the following steps:

step 21) initialize the potential function ψ of the junction tree JT_t;

step 22) first assign values to the symptom nodes, then take the symptoms before time slice t as the prior information P(I_t-1 | Y_1:t-1); collect the symptom information to C_t-1 and marginalize C_t-1 to obtain the probability distribution of I_t-1, which is passed from C_t-1 to D_t through the interface I_t-1;

where I_t-1 ∈ C_t-1 ∩ D_t represents the propagation of the fault node between time slices;

step 23) collect the symptom data Y_t of the SFC fault node in the current time slice and input it to the junction tree as evidence;

step 24) collect and add the evidence at the root C_t;

step 25) return JT_t; the process ends when t = T; otherwise, set t = t + 1 and go to step 22).

Further, in the process of reasoning through message propagation between the interfaces in step S4, step 3 specifically includes the following steps:

step 31) if t = T, distribute the evidence;

step 32) marginalize D_t+1 to obtain the probability distribution of I_t;

step 33) update I_t in C_t of JT_t by absorbing from D_t+1;

where

and

representing the original potential;

step 34) distribute the evidence from the root C_t;

step 35) return JT_t; the process ends when t = 1; otherwise, set t = t - 1 and go to step 31).
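The forward (evidence collection) and backward (evidence distribution) passes can be illustrated on the simplest case, a model with a single hidden state variable per time slice, where junction-tree message passing reduces to the standard forward-backward recursion. The sketch below is that simplified recursion and not the 1.5-time-slice junction tree implementation itself; the function name and the numeric values of π, A and B are hypothetical, indices are 0-based, and a_ik = P(X_t = i | X_t-1 = k) as defined above.

```python
def forward_backward(observations, pi, A, B):
    """Return P(X_t = i | Y_1:T) for every time slice t and state i."""
    n, T = len(pi), len(observations)

    # Forward pass: alpha[t][i] is proportional to P(X_t = i | Y_1:t).
    alpha = []
    for t, y in enumerate(observations):
        if t == 0:
            row = [pi[i] * B[i][y] for i in range(n)]
        else:
            row = [
                B[i][y] * sum(A[i][k] * alpha[t - 1][k] for k in range(n))
                for i in range(n)
            ]
        total = sum(row)  # assumes the observation has non-zero probability
        alpha.append([value / total for value in row])

    # Backward pass: beta[t][k] is proportional to P(Y_t+1:T | X_t = k).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        y_next = observations[t + 1]
        row = [
            sum(A[i][k] * B[i][y_next] * beta[t + 1][i] for i in range(n))
            for k in range(n)
        ]
        total = sum(row)
        beta[t] = [value / total for value in row]

    # Combine both passes and normalize per time slice.
    posteriors = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(row)
        posteriors.append([value / total for value in row])
    return posteriors


# Hypothetical model: state 0 = normal, state 1 = faulty.
pi = [0.75, 0.25]
A = [[0.75, 0.25], [0.25, 0.75]]   # columns sum to 1
B = [[0.75, 0.25], [0.25, 0.75]]   # rows sum to 1
posteriors = forward_backward([0, 1], pi, A, B)
most_likely = [row.index(max(row)) for row in posteriors]
```

Taking the arg-max of each row gives the most probable state per time slice, which corresponds to selecting the value that maximizes P(x_t = i | y_1:T).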

The beneficial effects of the invention are as follows: while meeting the system's requirement on diagnosis accuracy, the proposed fault diagnosis method effectively extracts features from massive, high-dimensional, multi-source data in a complex network, ensures real-time fault diagnosis, and has high application value in wireless communication systems.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a schematic diagram of a scenario in which the present invention may be applied;

FIG. 2 is a schematic diagram illustrating the dependence of a network slice on a fault in the present invention;

FIG. 3 is a schematic diagram of a fault diagnosis model according to the present invention;

FIG. 4 is a flow chart of the SFC fault diagnosis algorithm of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

The drawings are for illustrative purposes only; they are schematic representations rather than pictures of actual objects and are not to be construed as limiting the invention. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.

The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.

FIG. 1 is a schematic diagram of a scenario to which the present invention can be applied. Referring to FIG. 1, a service function forwarding graph consisting of two service function chains is divided into an application layer, a virtualization layer and an infrastructure layer. The application layer includes service functions and the virtual links connecting them. The virtualization layer comprises an NFV MANO controller and an SDN controller, and implements functions such as resource management, network service orchestration and fault management. The infrastructure layer comprises the physical infrastructure nodes and their connections; the access network and core network equipment use various general-purpose servers and implement network function virtualization through virtual machines. Different slices can flexibly deploy the VNFs of their SFCs according to users' network service requests in order to meet the users' quality-of-service requirements. In FIG. 1, the VNFs of the two SFCs in a slice are deployed in sequence in a DU pool, a CU pool and the core network, and the two SFCs may share common VNFs and virtual links. A general-purpose server node in the infrastructure layer may fail because of random factors in the environment, and through the mapping relationship this manifests as a failure of the VNF nodes of the SFC. To ensure the stability and service quality of the network, the node states of the VNFs need to be monitored, and a self-healing technique is then used to recover the virtual network quickly from the failure.

FIG. 2 is a schematic diagram of the fault dependency of a network slice in the present invention. The fault diagnosis system needs to abstract the monitored system resources. The network resources involved in a 5G end-to-end sliced network can be divided into physical-layer resources and logical-layer resources: physical-layer resources include CPU, memory, network, storage, bandwidth, ports and links, and logical-layer resources include the Web proxy, firewall, network address translation, intrusion prevention system, TCP optimizer, load balancer and the like. The fault dependency graph reflects the hierarchical structure and bearing relationships of the network and represents how faults propagate in the network and cause services to break down; the root cause analysis strategy finds the root cause of a fault according to the fault dependency graph.

FIG. 3 is a schematic diagram of the fault diagnosis model in the present invention. Referring to FIG. 3, the SFC fault diagnosis model of the invention relies on a dynamic Bayesian network, which captures causal relations over time and adapts to dynamic changes in the environment, and on a deep belief network, which handles high-dimensional input data well. The fault inference relationship of the SFC is formalized with the DDBN model: the dynamic Bayesian network models the high-level temporal relationships, and the deep belief network extracts features from the observation data within each time slice.

For a detected faulty SFC node, the data of all VNF nodes on the physical node are collected once per time slice. The node data collected over T time slices are denoted Y = {Y₁, …, Y_t, …, Y_T}, and the node data collected in a single time slice are denoted Y_t = {y₁, y₂, …, y_m}, where each entry represents the CPU utilization, processing delay, waiting delay, bandwidth occupancy, etc. of a VNF node H_m mapped to the physical node. If a VNF node of the SFC fails, the observation data of that node appear as fault symptoms, with abnormal data such as increased processing delay, increased waiting delay, increased bandwidth occupancy and increased CPU load. The hidden variable X = {X₁, …, X_t, …, X_T} represents the true state of the physical node in each time slice and corresponds to the various fault sources of the physical layer and the logical layer, including software and hardware components that may fail, such as the CPU, ports, cache and DNS. The data Y_t collected from the fault symptom nodes in time slice t are a manifestation of the true state X_t of the physical node, and the true state X_t in time slice t depends on the true state X_t-1 in time slice t-1.

The invention discloses a service function chain fault diagnosis method based on a deep dynamic Bayesian network, which comprises the following steps of:

in the service function chain scenario, the NFV MANO in the virtualization layer determines, according to the user service request, the VNFs required by the service and their logical links, and guarantees the general-purpose server resources occupied by the running VNFs (computing, network and storage) and the specific bandwidth on the path; the SDN controller then connects the VNFs to form an SFC and controls the transmission connections; the application layer comprises a plurality of SFCs serving various service flows, each SFC being formed by chaining different network functions in a certain order and providing end-to-end service for its service flow.

The fault diagnosis model established according to the fault propagation relation in the layered network architecture of the service function chain first locates, at the application layer, the VNF nodes that may be faulty, and then locates the root cause of the fault according to the mapping between the faulty application-layer VNF nodes and the infrastructure layer; for the layered network architecture, a Dynamic Bayesian Network (DBN), which captures causal relations over time and adapts to dynamic changes in the environment, is employed for fault diagnosis.

the performance data of a plurality of VNFs are monitored on the physical node, and high-dimensional symptom data are collected; the data are normalized in preprocessing to eliminate the influence of the different dimensions (units and scales) of the symptom information, using the linear max-min method, whose conversion function is:

$$y' = \frac{y - y_{\min}}{y_{\max} - y_{\min}}$$

where $y$ is a raw symptom value and $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of that symptom in the sample set;

in view of the diversity of network observation data under the SDN/NFV architecture and the spatial correlation between physical nodes and VNFs, a corresponding Deep Belief Network (DBN) model is established:

s33: the parameters are further optimized using real-time symptom data.

θ = {w_ij, a_i, b_j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}

firstly, a historical observation data set of the fault nodes is collected as training samples, which are divided into labeled and unlabeled samples; let S and U denote the sets of labeled and unlabeled samples, respectively, and let Y and X denote the various types of symptom information of a faulty VNF node and the label output of the fault type, respectively; the set of historical observation data of the VNF nodes on the same physical node is denoted Q = {…, Q_i, …}, where Q_i = [Y_t, Y_t-1, …, Y_t-d+1] and d is the dimension of the model's input sample; finally, the unlabeled samples for unsupervised learning are denoted U(Y), the labeled samples for supervised learning are denoted S(Y, X), and all data samples are divided into mini-batch data sets so that batch training speeds up the training of the DBN model;

then, the sample set is divided proportionally into a training set, a validation set and a test set, and the model is trained with the unlabeled and labeled mini-batch data sets; the parameters of the RBM are learned in an unsupervised manner from the unlabeled data set U(Y), so that the probability distribution of the RBM network fits the training samples better; the k-step contrastive divergence (CD-k) algorithm is then used to approximately sample the samples, and the parameter θ is updated by computing the gradient of the log-likelihood function;

after the optimal model parameters of the unsupervised pre-training stage are obtained, supervised backward fine-tuning is carried out with the labeled data S(Y, X) to establish the complex nonlinear relationship between the fault features and the node state labels, where the label values represent the true state of each VNF fault of the SFC; the adaptive-learning-rate BP algorithm with an added momentum term is used to fine-tune the overall model parameters of the deep belief network backward, with the parameters of the unsupervised stage taken as the initialization parameters, and the expression is as follows:

where θ_t and θ_t-1 denote the correction of the parameter in the t-th and (t-1)-th iterations, respectively, b is the momentum term coefficient, a is the learning rate, and ∂ln L/∂θ is the gradient of the log-likelihood function of the current sample;
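For illustration, and assuming the update takes the common momentum form in which the correction at iteration t equals b times the previous correction plus a times the gradient ∂ln L/∂θ, one step might be sketched as below. The learning rate a is held fixed here, so the adaptive adjustment mentioned in the text is not modelled; the function name and default values are hypothetical.

```python
def momentum_step(theta, prev_delta, grad, a=0.1, b=0.9):
    """Return (new_theta, delta) for one gradient-ascent step with momentum.

    delta_t = b * delta_{t-1} + a * grad, where grad is d(ln L)/d(theta).
    """
    delta = [b * d + a * g for d, g in zip(prev_delta, grad)]
    new_theta = [p + d for p, d in zip(theta, delta)]
    return new_theta, delta


theta, delta = [0.0], [0.0]
theta, delta = momentum_step(theta, delta, grad=[1.0])   # first correction: 0.1
theta, delta = momentum_step(theta, delta, grad=[1.0])   # momentum adds to it
```

With a constant gradient the correction grows from step to step, which is the accelerating effect a momentum term is meant to provide.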

after the DBN model is obtained from the historical observation data of VNF nodes of the same type, the model is optimized in real time using the real-time observation data of the faulty VNF nodes within the slicing period; the sample R_t = [Y_t, Y_t-1, …, Y_t-d+1] is updated in real time by a sliding-window sampling mechanism, where d denotes the length of the sliding window, i.e., each time a time slice t elapses, the observation data Y_t at time t is added and Y_t-d is deleted at the same time, keeping the size of the input sample unchanged; then, the parameters of the model are optimized in single-sample training mode, and at time t, given the fault symptoms Y_t-d+1:t, the predicted infrastructure-layer node state is output as p(X_t | Y_t-d+1:t), where X_t = {x₁, x₂, …, x_n}.
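The sliding-window update of R_t can be sketched with a bounded double-ended queue. The window length and the placeholder observations below are hypothetical values chosen only to show the mechanism.

```python
from collections import deque

d = 3                       # window length (hypothetical value)
window = deque(maxlen=d)    # a full deque discards its oldest item on append


def push_observation(window, y_t):
    """Add Y_t; once the window is full, Y_{t-d} is dropped automatically."""
    window.append(y_t)
    # Newest first, matching R_t = [Y_t, Y_{t-1}, ..., Y_{t-d+1}].
    return list(reversed(window))


for y in ["Y1", "Y2", "Y3", "Y4"]:
    sample = push_observation(window, y)
```

After the fourth observation arrives, the window holds Y4, Y3, Y2, so the input sample keeps a constant size of d.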

The dynamic Bayesian network (DBN) model is defined as (B₀, B_→), where B₀ represents the prior network of the initial time slice in the online learning phase of the deep belief network, i.e., the initial physical node state, and B_→ represents the hidden state transition model of a BN composed of more than two time slices;

let the prior distribution matrix of the initial hidden variables be π; then

π = (π_i)_1×n, i = 1, 2, …, n

where π_i is the prior probability of the corresponding operating state of the infrastructure-layer node at the initial time; the posterior probability estimated for the initial time slice in the online learning stage of the deep belief network is then used as the prior probability of the node state;

the state transition matrix between the faulty nodes is A, where

A = (a_ik)_n×n, a_ik = P(X_t = i | X_t-1 = k), i, k = 1, 2, …, n

B = (b_ij)_n×m, b_ij = P(Y_t = j | X_t = i), i = 1, 2, …, n, j = 1, 2, …, m,

where b_ij represents the observation emission probability required for DDBN inference; the observation emission probability is then modeled with a deep belief network (DBN), which extracts the features of high-dimensional data well;
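To make the roles of π, A and B concrete, the sketch below performs one predict-and-correct filtering step for a single discrete hidden state variable. It is a simplified illustration only: the two-state model, the numeric values and the function name are hypothetical, and the emission matrix is given as a small table rather than produced by a deep belief network.

```python
def filter_step(belief, A, B, y):
    """Predict with A, then correct with the emission column for symptom y."""
    n = len(belief)
    # a_ik = P(X_t = i | X_{t-1} = k), so the prediction sums over k.
    predicted = [sum(A[i][k] * belief[k] for k in range(n)) for i in range(n)]
    unnorm = [B[i][y] * predicted[i] for i in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


pi = [0.9, 0.1]             # hypothetical prior: state 0 = normal, 1 = faulty
A = [[0.95, 0.10],          # columns sum to 1 under the a_ik convention above
     [0.05, 0.90]]
B = [[0.8, 0.2],            # rows: states i, columns: discrete symptom values j
     [0.3, 0.7]]
belief = filter_step(pi, A, B, y=1)
```

Observing the symptom value that is more likely under the faulty state raises the belief in that state relative to the prior.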

In the SFC fault diagnosis model, the main idea of using the 1.5-time-slice junction tree inference algorithm for SFC fault inference is as follows: according to the Markov property of the dynamic Bayesian network, consider the set of fault nodes that have child nodes in the next time slice; when the values of these nodes are known, the states of past nodes and the states of future nodes are independent of each other, and these nodes are called interface nodes; let JT_t be the junction tree of time slice t, C_t be the clique in JT_t that contains I_t, and D_t be the clique in JT_t that contains I_t-1; the interface node I_t of a time slice connects to the interface I_t-1 of the previous time slice and also to the interface I_t+1 of the next time slice, inference is carried out between the interfaces through message propagation, and the inference process is as follows:

step 2: forward message propagation, in which the junction tree JT_t of the current time slice obtains new evidence from the junction tree JT_t-1 of the previous time slice;

step 21) initialize the potential function ψ of the junction tree JT_t;

step 24) collect and add evidence at the root C_t;

step 31) if t = T, distribute the evidence;

step 32) marginalize D_t+1 to obtain the probability distribution of I_t;

step 33) update I_t in C_t of JT_t by absorbing D_t+1;

wherein

and

representing the original potential;

step 34) distribute the evidence from the root C_t;
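When each time slice contains a single discrete hidden variable, the interface I_t is just X_t, and the collect and distribute passes reduce to the familiar forward-backward recursion. The sketch below illustrates that special case only; it is not the full 1.5-time-slice junction tree algorithm, and the model values and function name are hypothetical.

```python
def forward_backward(pi, A, B, ys):
    """Return smoothed P(X_t | Y_1:T) for every time slice t."""
    n, T = len(pi), len(ys)
    # Forward pass (collect evidence): alpha[t][i] = P(X_t = i | Y_1:t).
    alpha = []
    for t, y in enumerate(ys):
        if t == 0:
            pred = pi
        else:
            pred = [sum(A[i][k] * alpha[-1][k] for k in range(n))
                    for i in range(n)]
        un = [B[i][y] * pred[i] for i in range(n)]
        z = sum(un)
        alpha.append([u / z for u in un])
    # Backward pass (distribute evidence), started at the last slice t = T.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        un = [sum(A[i][k] * B[i][ys[t + 1]] * beta[t + 1][i] for i in range(n))
              for k in range(n)]
        z = sum(un)
        beta[t] = [u / z for u in un]
    # Combine both messages and normalise per slice.
    smoothed = []
    for t in range(T):
        un = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(un)
        smoothed.append([u / z for u in un])
    return smoothed


pi = [0.9, 0.1]             # hypothetical prior: state 0 = normal, 1 = faulty
A = [[0.95, 0.10],          # a_ik = P(X_t = i | X_{t-1} = k); columns sum to 1
     [0.05, 0.90]]
B = [[0.8, 0.2],            # b_ij = P(Y_t = j | X_t = i)
     [0.3, 0.7]]
smoothed = forward_backward(pi, A, B, ys=[0, 1, 1])
```

The backward pass lets later symptoms revise the belief about earlier slices, which is the purpose of distributing evidence once t = T is reached.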

FIG. 4 is a flow chart of the SFC fault diagnosis algorithm of the present invention. The process is as follows:

step 401: collecting related monitoring data of a plurality of VNF nodes on a physical node corresponding to the SFC symptom node, and extracting data characteristics;

step 402: carrying out normalization preprocessing on the feature data;

step 403: the observation data set is divided into a historical observation data set and a real-time symptom data set. Multilayer RBM pre-training is performed on the historical observation data set, a softmax layer is added, the deep belief network model is fine-tuned backward using the adaptive BP algorithm with a momentum term, and the model parameters θ are extracted. Finally, the model parameters are further optimized using the real-time symptom data set, and the predicted physical node state is output;

step 404: obtaining a state transition probability matrix of the dynamic Bayesian network model according to the predicted physical node state, obtaining an observed emission probability matrix according to the real-time symptom data set and the predicted physical node state, and constructing the dynamic Bayesian network model based on the observed emission probability matrix;

step 405: carrying out SFC fault inference according to the 1.5-time-slice junction tree inference algorithm to obtain the fault source.

Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims

1. A service function chain fault diagnosis method based on a deep dynamic Bayesian network is characterized in that: the method comprises the following steps:

s2: monitoring performance data of a plurality of Virtual Network Functions (VNFs) on a physical node, and collecting high-dimensional data of symptoms;

s3: to address the diversity of network observation data and the spatial correlation between a physical node and a VNF under the SDN/NFV architecture, a corresponding deep belief network (DBN) model is established to perform feature extraction and dimension reduction on the observation data, the historical observation data set is approximately sampled by the k-step contrastive divergence algorithm CD-k, and the model is fine-tuned using the adaptive BP algorithm with an added momentum term;

s4: a dynamic Bayesian network (DBN) model is established to diagnose the fault source in real time by exploiting the temporal correlation among faults, and the fault source is located using the 1.5-time-slice junction tree inference algorithm.

2. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 1, wherein: in step S1, in the service function chain scenario, the NFV MANO in the virtualization layer determines, according to the user service request, VNFs and their logical links required by the service, and ensures resources of a general server occupied by the operation of the VNFs and specific bandwidths on paths, where the resources of the general server include computation, network, and storage, and then the SDN controller connects the VNFs to form an SFC and control transmission connection; the application layer comprises a plurality of SFCs for serving various service flows, each SFC is formed by different network functions in a chain mode according to a certain sequence and provides end-to-end service for the service flows.

3. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 1, wherein: in step S1, according to the fault propagation relationship in the hierarchical network architecture of the service function chain, the established fault diagnosis model needs to first locate a VNF node that may have a fault at the application layer, and then locate the root of the fault according to the mapping relationship between the VNF node that has a fault at the application layer and the infrastructure layer; for a layered network architecture, a Dynamic Bayesian Network (DBN) capable of causal association over time and adapting to environmental dynamics is employed for fault diagnosis.

4. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 1, wherein: in step S2, monitoring multiple VNF performance data on the physical node, and collecting high-dimensional data of the symptom; in order to improve the learning efficiency of model parameters and improve the accuracy of the model, data needs to be subjected to normalization preprocessing to eliminate the influence caused by different symptom information dimensions, a linear maximum and minimum value method is adopted to carry out preprocessing on the data, and the conversion function is as follows:

5. The deep dynamic Bayesian network based service function chain fault diagnosis method according to claim 1, wherein: in step S3, to address the diversity of network observation data under the SDN/NFV architecture and the spatial correlation between the physical node and the VNF, a corresponding deep belief network (DBN) model is established:

s33: the parameters are further optimized using real-time symptom data.

6. The deep dynamic Bayesian network based service function chain fault diagnosis method as recited in claim 5, wherein: in step S3, the following contents are specifically included:

θ = {w_ij, a_i, b_j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}

first, the historical observation data set of the faulty nodes is collected as training samples, which are divided into labeled and unlabeled samples; let S and U denote the sets of labeled and unlabeled samples, respectively, and let Y and X denote the various types of symptom information of a faulty VNF node and the label output of the fault type, respectively; the set of historical observation data of the VNF nodes on the same physical node is denoted Q = {…, Q_i, …}, where Q_i = [Y_t, Y_t-1, …, Y_t-d+1] and d is the dimension of the model's input sample; finally, the unlabeled samples for unsupervised learning are denoted U(Y), the labeled samples for supervised learning are denoted S(Y, X), and all data samples are divided into mini-batch data sets so that batch training speeds up the training of the DBN model;

after the optimal model parameters of the unsupervised pre-training stage are obtained, supervised backward fine-tuning is carried out with the labeled data S(Y, X) to establish the complex nonlinear relationship between the fault features and the node state labels, where the label values represent the true state of each VNF fault of the SFC; the BP algorithm with the reduced adaptive learning rate is used to fine-tune the overall model parameters of the deep belief network backward, with the parameters of the unsupervised stage taken as the initialization parameters, and the expression is as follows:

where θ_t and θ_t-1 denote the correction of the parameter in the t-th and (t-1)-th iterations, respectively, b is the momentum term coefficient, a is the learning rate, and ∂ln L/∂θ is the gradient of the log-likelihood function of the current sample;

7. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 1, wherein: in step S4, the method specifically includes:

the dynamic Bayesian network (DBN) model is defined as (B₀, B_→), where B₀ represents the prior network of the initial time slice in the online learning stage of the deep belief network, i.e., the initial physical node state; B_→ represents the hidden state transition model of a BN composed of more than two time slices;

let the prior distribution matrix of the initial hidden variables be π; then

π = (π_i)_1×n, i = 1, 2, …, n

where π_i is the prior probability of the corresponding operating state of the infrastructure-layer node at the initial time; the posterior probability estimated for the initial time slice in the online learning stage of the deep belief network is used as the prior probability of the node state;

the state transition matrix between the faulty nodes is A, where

A = (a_ik)_n×n, a_ik = P(X_t = i | X_t-1 = k), i, k = 1, 2, …, n

B = (b_ij)_n×m, b_ij = P(Y_t = j | X_t = i), i = 1, 2, …, n, j = 1, 2, …, m,

where b_ij represents the observation emission probability required for DDBN inference; the observation emission probability is modeled with a deep belief network (DBN), which extracts the features of high-dimensional data well;

According to the Markov property of the dynamic Bayesian network, consider the set of fault nodes that have child nodes in the next time slice; when the values of these nodes are known, the states of past nodes and the states of future nodes are independent of each other, and these nodes are called interface nodes; let JT_t be the junction tree of time slice t, C_t be the clique in JT_t that contains I_t, and D_t be the clique in JT_t that contains I_t-1; the interface node I_t of a time slice connects to the interface I_t-1 of the previous time slice and also to the interface I_t+1 of the next time slice, inference is carried out between the interfaces through message propagation, and the inference process is as follows:

8. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 7, wherein: in the process of reasoning through message propagation between the interfaces in step S4, step 2 specifically includes the following steps:

step 21) initialize the potential function ψ of the junction tree JT_t;

step 22) first assign values to the symptom nodes, then take the symptoms before time slice t as the prior information P(I_t-1 | Y_1:t-1), collect the symptom information to C_t-1, and marginalize to obtain the probability distribution of I_t-1; the message is passed from C_t-1 to D_t through the interface I_t-1;

step 24) collect and add evidence at the root C_t;

9. The deep dynamic bayesian network based service function chain fault diagnosis method according to claim 8, wherein: in the process of reasoning through message propagation between the interfaces in step S4, step 3 specifically includes the following steps:

step 31) if t = T, distribute the evidence;

step 32) marginalize D_t+1 to obtain the probability distribution of I_t;

step 33) update I_t in C_t of JT_t by absorbing D_t+1;

wherein

and

representing the original potential;

step 34) distribute the evidence from the root C_t;