CN109194583B

CN109194583B - Network congestion link diagnosis method and system based on deep reinforcement learning

Info

Publication number: CN109194583B
Application number: CN201810890267.0A
Authority: CN
Inventors: 潘胜利; 曾德泽
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2021-05-14
Anticipated expiration: 2038-08-07
Also published as: CN109194583A

Abstract

The invention discloses a network congestion link diagnosis method and system based on deep learning. Reinforcement learning and deep learning are combined through a deep Q-network (DQN), which copes well with high-dimensional states, and a "state-action-reward" strategy is used to construct labels in the Q-learning manner in order to diagnose congested links. In the reinforcement learning part of the DQN, the state is defined as a two-tuple consisting of a link and the set of congestion states of all paths passing through the link; the action is defined as guessing whether the link is congested based on the congestion state set of those paths; and the reward is defined as a positive reward when the guess is right and a negative reward when the guess is wrong. The deep learning part of the DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, the DQN automatically learns the association between congested paths and congested links, so that congested links in the network can be diagnosed accurately.

Description

Network congestion link diagnosis method and system based on deep reinforcement learning

Technical Field

The invention relates to the field of network congestion link diagnosis, in particular to a network congestion link diagnosis method and system based on deep reinforcement learning.

Background

External measurement techniques based on router cooperation initiate the measurement process at the edge of the network and obtain the parameters to be measured from the feedback of internal nodes to the probe data. Common tools include ping for diagnosing network connectivity, traceroute for obtaining the network topology, and pathchar for measuring performance parameters such as link bandwidth and delay. Such approaches fail when internal nodes do not cooperate, for example for network security reasons. In addition, these methods mostly use ICMP (Internet Control Message Protocol) messages as probe data, and ICMP messages have low priority in real networks, so the measured performance parameters may not accurately reflect the actual state of the network. End-to-end measurement obtains end-to-end performance parameters by sending and receiving data between network edge nodes. It relies only on the basic store-and-forward function of routers, so its dependence on the network is minimal. Network tomography (NT) is a method for inferring internal network parameters, such as link performance parameters and topology, from end-to-end measurement data. Because network tomography can obtain internal performance parameters without the cooperation of internal nodes, and fits the non-cooperative, heterogeneous and edge-controlled character of the current Internet, the present invention builds on the tomography of network link performance parameters and uses deep learning to diagnose congested links, so that the running state of links in the network is obtained more accurately and quickly.

Disclosure of Invention

In order to solve the technical problems, the application provides a network congestion link diagnosis method and system based on deep reinforcement learning.

To solve the technical problem, the invention adopts a network congestion link diagnosis method based on deep learning, which comprises the following steps:

S1, collecting M sets of actual link congestion state data from the network to be diagnosed to obtain M link congestion state network directed acyclic graphs serving as a sample pool, where M is an integer greater than 1. The link congestion state network is represented by a directed acyclic graph $G = (V, E)$, where $V = \{0, 1, 2, \ldots, k, \ldots, m\}$ is the set of network nodes, $E = \{l_1, l_2, \ldots, l_k, \ldots, l_m\}$ is the set of links, and link $l_k$ denotes the link whose end node is $k$. The set of all paths in the network is defined as $P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$, and the corresponding set of path congestion state observations is defined as $Y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$, where $y_i$ is the congestion state of the $i$-th path $p_i$: $y_i = 1$ indicates that path $p_i$ is congested, and $y_i = 0$ indicates that path $p_i$ is normal. $\phi_k$ denotes the set of paths passing through link $l_k$, and $Y_k$ is the set of congestion state observations of the paths in $\phi_k$. The set of link states in the network is $X = \{x_1, x_2, \ldots, x_k, \ldots, x_m\}$;

S2, performing decision state modeling on each link congestion state network directed acyclic graph respectively, and jointly generating a state set $S$. A state $s$ is defined as a two-tuple of a link and the set of congestion states of the paths passing through that link, i.e. $s = s_k = (l_k, Y_k)$, and the state set is $S = \{s_1, s_2, s_3, \ldots, s_k, \ldots, s_m\}$. For state $s = s_k$, the set of actions that can be taken is $A$, where action $a = 0$ represents guessing that link $l_k$ is a normal link and action $a = 1$ represents guessing that $l_k$ is a congested link. When the guessed link congestion state is the same as the true link congestion state, a reward is obtained; otherwise, a penalty is obtained;

S3, training the neural network with the DQN training method, using the state set $S$ and the corresponding decision set $B$ as the training data set, where each group of training data takes a state of a directed acyclic graph as input and the corresponding decision as output;

S4, performing decision state modeling, by the same method as in step S2, on the network directed acyclic graph on which network congestion link diagnosis is to be performed, generating an initial task state $s_0$, and feeding state $s_0$ into the neural network trained in step S3 to predict the link congestion states.

Further, in the method for diagnosing a network congestion link based on deep learning of the present invention, the method for constructing an objective function of DQN in step S3 is as follows:

A1, using a deep neural network as the Q-value network with parameters $\omega$, i.e. updating $\omega$ so that the Q function approaches the optimal Q value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$, where $s$ represents a state and $a$ represents a decision;

A2, defining the objective function as the mean square error of the Q value:

$L(\omega) = E\left[\left(r + \gamma \cdot \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$;

where $s'$ represents the next state, $a'$ represents the next decision, $E$ represents the expectation operation, $r$ represents the reward, and $\gamma$ represents the attenuation (discount) coefficient;

A3, calculating the gradient of the objective function with respect to the parameter $\omega$;

A4, using stochastic gradient descent (SGD) to optimize the objective end to end.
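As an illustration of the objective in step A2, the following minimal sketch evaluates the mean-square error between the target and the current Q value for a small batch. The function name `td_loss`, the list-based data layout and the `done` flags marking absorbing next states are assumptions made for this example; they do not come from the patent.

```python
def td_loss(q_sa, q_next, rewards, done, gamma=0.9):
    """Mean of (r + gamma * max_a' Q(s', a') - Q(s, a))^2 over a batch.

    q_sa    : Q(s, a) for the action actually taken, one float per sample
    q_next  : list of Q(s', a') values over all actions a', one list per sample
    rewards : reward r, one float per sample
    done    : True where s' is absorbing (assumed flag; no bootstrap term then)
    """
    total = 0.0
    for q, q_n, r, d in zip(q_sa, q_next, rewards, done):
        target = r if d else r + gamma * max(q_n)
        total += (target - q) ** 2
    return total / len(q_sa)
```

In a real DQN the Q values would come from the network with parameters $\omega$; plain numbers are used here only to keep the arithmetic visible.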

Furthermore, in the deep learning-based network congestion link diagnosis method of the present invention, the DQN training mainly comprises the following steps:

B1, initializing an experience pool $D$ with capacity $N$ for storing training samples;

B2, initializing the neural network $Q$ of the action-value function, with its weight parameters $\theta$ set to random values;

B3, initializing the neural network of the target action-value function, which has the same structure as $Q$ and weight parameters $\theta^- = \theta$;

B4, setting the total number of episodes $M$;

B5, initializing the network input state $s_0$ and computing the network output;

B6, taking the state set $S_{next} = \{s_0\}$ as the input set and recursively updating the network parameters.

Further, in the method for diagnosing a network congestion link based on deep learning of the present invention, the step B6 of recursively updating the network includes:

B61, performing an action guess for each state in the input set, updating the network, and obtaining the next state; if the next state is non-absorbing, adding it to the next state set;

B62, if the next state set is not empty, taking the next state set as the input of the recursive network update and continuing the recursion; otherwise, ending the process.
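The recursion in steps B61 and B62 can be sketched as follows. This is only a skeleton under stated assumptions: `step` is a hypothetical callback standing in for "guess an action, update the network, return the non-absorbing next states", and the return value (the number of recursion rounds) is added purely so the behaviour can be checked.

```python
def recursive_update(states, step):
    """Process a set of states, then recurse on the collected next states.

    step(state) is a hypothetical callback: it performs the action guess and
    network update for one state and returns a list of non-absorbing next
    states (an empty list if the absorbing state was reached).
    Returns the number of recursion rounds that were executed.
    """
    next_states = []
    for state in states:
        next_states.extend(step(state))   # B61: guess, update, collect successors
    if next_states:                       # B62: continue while successors remain
        return 1 + recursive_update(next_states, step)
    return 1
```

With a toy successor table such as `{"l1": ["l2", "l5"], "l2": [], "l5": ["l6"], "l6": []}` and `step = table.__getitem__`, starting from `["l1"]` takes three rounds.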

Further, in the deep learning based network congestion link diagnosis method of the present invention, the step of performing action guessing on each state in step B61 to perform network update and obtain the next set of states includes:

C1, selecting an action with an $\epsilon$-greedy strategy: with probability $\epsilon$, randomly selecting an action from the action set $A$ as $a_t$; otherwise, inputting the current state into the current network, computing the Q value of every action with a single forward pass of the CNN, and selecting the action with the largest Q value as $a_t$;

C2, executing $a_t$ to obtain the feedback $r_t$ and the next state $s_{t+1}$;

C3, storing the four-tuple $(s_t, a_t, r_t, s_{t+1})$ in $D$, where $D$ holds the transitions of $N$ time steps;

C4, randomly sampling a minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$;

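C5, calculating the target value of each sampled state, i.e. updating the Q value with the reward obtained after executing the action and using it as the target value: if the next state is the absorbing state, $y_j = r_j$; otherwise $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^-)$, computed with the target network;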

C6, updating the parameters $\theta$ by SGD;

C7, after every $C$ iterations, setting the parameters $\theta^-$ of the target action-value function network to the parameters $\theta$ of the current action-value function network $Q$, where $C$ is a positive integer greater than 1.

Further, in the network congestion link diagnosis method based on deep learning of the present invention, the method as a whole has a state set $S$, a policy set $C$ and a policy $\pi$; the action at the next moment, $a = \pi(s)$, is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$. An attenuation coefficient $\gamma$ is applied to each subsequent state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = E[R(s_0) + \gamma V^{\pi}(s_1)]$.

The invention also provides a network congestion link diagnosis system based on deep learning, which adopts any one of the network congestion link diagnosis methods based on deep learning to diagnose the network congestion link.

The invention has the following beneficial effects: building on the tomography of network link performance parameters, deep learning and reinforcement learning are combined through Deep Q-Learning, so that the running state of links in the network can be obtained more accurately and quickly. DQN is used for network training, and a recursive update scheme is proposed to handle the multiple next states that may arise from a single state transition in this problem. Compared with the SCFS algorithm, the method has higher inference accuracy and robustness.

Drawings

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a flowchart of an embodiment of the deep learning based network congestion link diagnosis method of the present invention;

FIG. 2 is a schematic diagram of neural network training in the present invention;

FIG. 3 is a flowchart of DQN training in the present invention;

FIG. 4 is a flowchart of the recursive update in DQN training in the present invention;

FIG. 5 is a flowchart of the action-guess network update in DQN training in the present invention;

FIG. 6 is a schematic diagram of the state transition when 0 is guessed;

FIG. 7 is a schematic diagram of the state transition when 1 is guessed;

FIG. 8 is a graph of the number of training cycles versus the degree of difference between the value network and the target network in the present invention;

FIG. 9 is a comparison of the detection rate (DR) of the present invention and the SCFS algorithm;

FIG. 10 is a comparison of the false positive rate (FPR) of the present invention and the SCFS algorithm.

Detailed Description

For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

The invention discloses a method and a system for diagnosing congested network links based on Deep Q-Learning (DQN). Reinforcement learning and deep learning are combined through DQN, which copes well with high-dimensional states, and a "state-action-reward" strategy is used to construct labels in the Q-learning manner in order to diagnose congested links. In the reinforcement learning part of the DQN, the "state" is defined as a two-tuple consisting of a link and the set of congestion states of all paths passing through the link; the "action" is defined as guessing whether the link is congested based on the congestion state set of those paths; and the "reward" is defined as a positive reward when the guess is right and a negative reward when the guess is wrong. The deep learning part of the DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, the DQN automatically learns the association between congested paths and congested links, which enables accurate diagnosis of congested links. Simulation results in various network congestion scenarios show that the DQN method has better congested link diagnosis performance than the traditional SCFS method.

Referring to FIG. 1, the method for diagnosing a network congestion link based on deep learning according to the present embodiment includes the following steps:

S1, collecting M sets of actual link congestion state data from the network to be diagnosed to obtain M link congestion state network directed acyclic graphs serving as a sample pool, where M is a positive integer greater than 1;

S2, performing decision state modeling on each link congestion state network directed acyclic graph respectively, and jointly generating a state set $S$;

S3, training the neural network with the DQN training method, taking as input the state set $S$ generated from the M link congestion state network directed acyclic graphs and the corresponding decision set $B$, where each group of training data takes a state of a directed acyclic graph as input and the corresponding decision as output;

S4, performing decision state modeling, by the same method as in step S2, on the network directed acyclic graph on which network congestion link diagnosis is to be performed, generating an initial task state $s_0$, and feeding state $s_0$ into the neural network obtained by DQN training to predict the link congestion states.

Referring to FIG. 2, the DQN objective function is constructed as follows:

A1, using a deep neural network as the Q-value network with parameters $\omega$, i.e. updating $\omega$ so that the Q function approaches the optimal Q value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$, where $s$ represents a state and $a$ represents a decision (action);

A2, defining the objective function, i.e. the loss function, as the mean square error of the Q value: $L(\omega) = E\left[\left(r + \gamma \cdot \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$;

where $L(\omega)$ is the objective function, $s'$ represents the next state, $a'$ represents the next decision, $E$ represents the expectation operation, $r$ represents the reward, and $\gamma$ represents the attenuation (discount) coefficient;

A3, calculating the gradient of the loss function with respect to the parameter $\omega$;

A4, using stochastic gradient descent (SGD) to optimize the objective end to end;
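The gradient expression for step A3 is not reproduced in this text. As a hedged reconstruction (an assumption, not necessarily the patent's own formula), differentiating the objective of step A2 with respect to $\omega$ while treating the target term $r + \gamma \max_{a'} Q(s', a', \omega)$ as a constant, as is usual in DQN, would give:

```latex
\frac{\partial L(\omega)}{\partial \omega}
  = -2\, E\!\left[
      \left( r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega) \right)
      \frac{\partial Q(s, a, \omega)}{\partial \omega}
    \right]
```

The factor 2 is commonly absorbed into the learning rate, and the SGD step of A4 moves $\omega$ against this gradient.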

Referring to FIG. 3, the main steps of DQN training include:

B3, initializing the neural network of the target action-value function, which has the same structure as $Q$ and weight parameters $\theta^- = \theta$;

B4, setting the total number of episodes $M$;

B5, initializing the network input state $s_0$ and computing the network output;

B6, taking the state set $S_{next} = \{s_0\}$ as the input set and recursively updating the network parameters;

The step of recursively updating the network in step B6 includes:

B62, if the next state set is not empty, taking the next state set as the input of the recursive network update and continuing the recursion; otherwise, ending the process;

More specific steps are shown in FIG. 4. The step in B61 of performing an action guess for each state to update the network and obtain the next state set is shown in FIG. 5 and includes:

C1, selecting an action with an $\epsilon$-greedy strategy: with a (small) probability $\epsilon$, randomly selecting an action from the action set $A$ as $a_t$; otherwise, inputting the current state into the current network (a single forward pass of the CNN) to compute the Q value of every action, and selecting the action with the largest Q value (the optimal action) as $a_t$;

C2, executing $a_t$ to obtain the feedback $r_t$ and the next state $s_{t+1}$;

C3, storing the four-tuple $(s_t, a_t, r_t, s_{t+1})$ in $D$ ($D$ holds the transitions of $N$ time steps);

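C5, calculating the target value of each sampled state (updating the Q value with the reward obtained after executing the action and using it as the target value): if the next state is the absorbing state, $y_j = r_j$; otherwise $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^-)$, computed with the target network;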

C6, updating the parameters $\theta$ by SGD;

C7, after every $C$ iterations, setting the parameters $\theta^-$ of the target action-value function network to the parameters $\theta$ of the current action-value function network $Q$.
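Steps C1 and C5 can be illustrated with the short sketch below. The function names, the list of per-action Q values and the injectable `rng` argument are assumptions for this example. For a non-absorbing next state the sketch uses the standard DQN target $r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta^-)$, which is consistent with the TargetQ expression given elsewhere in this description but is stated here as an assumption.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """C1: random action with probability epsilon, otherwise the arg-max action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def target_value(reward, next_q_values, absorbing, gamma=0.9):
    """C5: y_j = r_j if the next state is absorbing, else r_j + gamma * max Q_target."""
    if absorbing:
        return reward
    # next_q_values are assumed to come from the target network (parameters theta^-)
    return reward + gamma * max(next_q_values)
```

For the two-action case of this method, `q_values` would hold the Q values of "guess normal" (index 0) and "guess congested" (index 1).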

The network congestion link diagnosis method based on deep reinforcement learning mainly comprises two parts: the Deep Q-Network (DQN) and congestion link diagnosis. Deep Q-Learning combines reinforcement learning with a deep neural network, and congestion link diagnosis obtains the congestion state of end-to-end links by the network tomography method.

Variable definition of the network:

The network is represented by a directed acyclic graph $G = (V, E)$, where $V = \{0, 1, 2, \ldots, k, \ldots, m\}$ is the set of nodes, $E = \{l_1, l_2, \ldots, l_k, \ldots, l_m\}$ is the set of links, and link $l_k$ denotes the link whose end node is $k$. The set of all paths in the network is defined as $P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$, and the corresponding set of path congestion state observations is defined as $Y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$, where $y_i$ is the congestion state of the $i$-th path $p_i$: $y_i = 1$ indicates that path $p_i$ is congested, and $y_i = 0$ indicates that path $p_i$ is normal. $\phi_k$ denotes the set of paths passing through link $l_k$, and $Y_k$ is the set of congestion state observations of the paths in $\phi_k$. The set of link states in the network is $X = \{x_1, x_2, \ldots, x_k, \ldots, x_m\}$.
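To make the notation concrete, the sketch below derives the per-link observation sets $Y_k$ from a routing description. The dictionary layout (`paths` mapping a path id to the links it traverses, `path_states` mapping a path id to $y_i$) and the function name are assumptions for illustration only.

```python
def build_link_observations(paths, path_states):
    """Return {link: Y_k}, the observed states of the paths that pass through each link.

    paths       : {path_id: [link, ...]}  links traversed by each path (assumed layout)
    path_states : {path_id: 0 or 1}       y_i, 1 = congested, 0 = normal
    """
    observations = {}
    for path_id in sorted(paths):                 # fixed order keeps Y_k reproducible
        for link in paths[path_id]:
            observations.setdefault(link, []).append(path_states[path_id])
    return observations
```

Each pair `(link, observations[link])` then corresponds to one state $s_k = (l_k, Y_k)$ in the sense used for the DQN.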

The variables involved in DQN are defined:

A state $s$ is defined as a two-tuple of a link and the set of congestion states of the paths passing through that link, i.e. $s = s_k = (l_k, Y_k)$; the state set is $S = \{s_1, s_2, s_3, \ldots, s_k, \ldots, s_m\}$. For state $s = s_k$, the set of actions that can be taken is $A$, where action $a = 0$ represents guessing that link $l_k$ is a normal link and action $a = 1$ represents guessing that $l_k$ is a congested link. When the guessed link congestion state is the same as the true link congestion state, the reward $R(s, a) = 1$ is obtained; otherwise, the penalty $R(s, a) = -2$ is obtained.

DQN diagnostic schematic:

Referring to FIG. 6, the initial state is $s_1 = (l_1, [1,1,1,1,1])$; when 0 is guessed, the state transitions;

state $s_2 = (l_2, [1,1]), (l_5, [1,1,1])$: the two states are learned in parallel and transition to the next states;

state $s_3 = (l_6, [1,1]), (l_9, [1])$: when 1 is guessed, the process enters the absorbing state and ends.

Referring to FIG. 7, for $s_1 = (l_1, [1,1,1,1,1])$, if 1 is guessed, the process transitions directly to the absorbing state E and ends.
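The transitions described for FIG. 6 and FIG. 7 can be sketched as a small function. Since the figures are not reproduced here, the parent-to-child link table in the usage note is hypothetical, and an empty list is used as an assumed marker for "absorbing state reached".

```python
def transition(link, guess, children, observations):
    """Next states after guessing `guess` (0 normal, 1 congested) for `link`.

    children     : {link: [downstream links]}  hypothetical topology table
    observations : {link: Y_k}                 path congestion states per link
    Guessing 1 ends in the absorbing state (returned as an empty list);
    guessing 0 moves on to the states of the links directly downstream.
    """
    if guess == 1:
        return []
    return [(child, observations[child]) for child in children.get(link, [])]
```

For example, with `children = {"l1": ["l2", "l5"]}` and the observation vectors quoted above, guessing 0 at `l1` yields the two states `("l2", [1, 1])` and `("l5", [1, 1, 1])`, which are then processed in parallel.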

In the network congestion link diagnosis method based on deep learning, the strategy is continuously optimized starting from the existing state. The system as a whole has a state set $S$, a behavior set $A$ and a policy $\pi$; the behavior at the next moment, $a = \pi(s)$, is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$. An attenuation coefficient $\gamma$ is applied to each subsequent state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi]$. This expression satisfies the Bellman equation and can be written as $V^{\pi}(s_0) = E[R(s_0) + \gamma V^{\pi}(s_1)]$.
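For a single observed state sequence, the discounted sum inside the expectation can be computed with the same recursion as the Bellman form. The sketch below is illustrative only; the function name and the example rewards are assumptions.

```python
def discounted_return(rewards, gamma):
    """R(s0) + gamma*R(s1) + gamma^2*R(s2) + ... for one observed sequence."""
    total = 0.0
    for r in reversed(rewards):
        # Same shape as the Bellman form: V(s_t) = R(s_t) + gamma * V(s_{t+1})
        total = r + gamma * total
    return total
```

For instance, rewards `[1, 1, -2]` with `gamma = 0.5` give `1 + 0.5 - 0.5 = 1.0`.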

Deep neural networks (DNNs) are neural networks composed of a stack of layers, each layer consisting of nodes. Computation takes place in the nodes, which operate roughly like human neurons: a node activates and emits a signal when it receives sufficient stimulus. A node combines the input data with a set of coefficients (weights) that amplify or suppress the input, thereby assigning the input its importance for the task the algorithm is learning. The sum of the products of inputs and weights is passed to the node's activation function, which determines whether and how far the signal continues to propagate through the network, and hence how it affects the final result of the network, such as a classification. Deep learning networks differ from the more common single-hidden-layer neural networks in their depth, that is, the number of node layers that data passes through in the multi-step process of pattern recognition. A network with more than three layers (including the input and output layers) may be called "deep". Depth is therefore a well-defined term meaning more than one hidden layer.

In a deep learning network, each layer of nodes learns to identify a particular set of features based on the output of the previous layer. As the depth of the neural network increases, the features that the nodes can recognize become more complex, because each layer integrates and recombines the features of the previous layer. The form and size of deep neural networks vary with the application, and popular forms and sizes are evolving rapidly to improve model accuracy and efficiency. There are two main forms of network for processing input: feedforward and recurrent. In a feedforward network, such as a CNN, all computation is a sequence of operations on the output of the previous layer. A recurrent network, such as an LSTM, has internal memory, which allows long-term dependencies to affect the output.

According to the Q-learning update formula $Q^*(s, a) = Q(s, a) + \alpha \left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)$, the loss function of the DQN is $L(\theta) = E[(\mathrm{TargetQ} - Q(s, a; \theta))^2]$, where $\theta$ denotes the network parameters and the target is $\mathrm{TargetQ} = r + \gamma \max_{a'} Q(s', a'; \theta)$. Experience replay mainly serves to address the problems of correlated samples and non-stationary distribution. Specifically, the transition sample $(s_t, a_t, r_t, s_{t+1})$ obtained from the interaction between the agent and the environment at each time step is stored in a replay memory, and during training a minibatch of samples is drawn at random. The target network generates the TargetQ value: $Q(s, a; \theta_i)$ represents the output of the current network MainNet and is used to evaluate the value function of the current state-action pair, while the output of TargetNet is substituted into the formula for TargetQ above to obtain the target Q value. The parameters of MainNet are updated according to the loss function above, and after every $N$ iterations the parameters of MainNet are copied to TargetNet.
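A minimal sketch of the replay memory and the periodic MainNet-to-TargetNet copy described above is given below. The class name, the dictionary representation of network parameters and the `period` argument are assumptions made for illustration; a real implementation would copy the weights of an actual neural network.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s_next) transition samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest samples are discarded first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up temporal correlation
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

def maybe_sync(iteration, period, main_params, target_params):
    """Copy MainNet parameters into TargetNet every `period` iterations."""
    if iteration % period == 0:
        target_params.clear()
        target_params.update(main_params)
        return True
    return False
```

Sampling uniformly from the memory, rather than training on consecutive transitions, is what reduces the correlation between training samples.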

Network tomography treats the target network as a black box. In general, all measurements are taken at the end nodes of the network, a measurement strategy called end-to-end measurement. Apart from passively monitoring packet transmissions between end-to-end pairs, end-to-end measurement is currently divided, according to the routing protocol used, into two main types: multicast measurement and unicast measurement. Nguyen et al. proved that, under the framework of Boolean network tomography based on multi-slot measurements, the prior congestion probability of a link can be uniquely determined from the data of multiple measurement time slots, and proposed a matrix-inversion-based method, CLINK (coordinated Link identification), to solve the problem. That method is more accurate than the SCFS algorithm and has a higher detection rate, particularly when the number of congested links in the network is large.

An experiment was carried out on a simulated network containing 15 paths, with the prior congestion probability of each link generated randomly, and the simulation was repeated 100 times. Link congestion was diagnosed with the method proposed in this patent, and the relationship between the number of training cycles and the degree of difference between the value network and the target network is shown in FIG. 8, where the horizontal axis is the number of training cycles and the vertical axis is the degree of difference between the value network and the target network. As the number of training cycles increases, the difference between the value network and the target network decreases. A comparative experiment between the SCFS algorithm and the algorithm herein produced line graphs of the detection rate (DR) and the false positive rate (FPR) against the congestion probability ρ, shown in FIG. 9 and FIG. 10 respectively. The detection rate is the ratio of the number of detected positive samples to the number of all positive samples, and the false positive rate is the proportion of actual negative samples among the samples detected as positive. As can be seen from FIG. 9, the detection rate of the SCFS algorithm falls as the congestion probability increases, whereas the detection rate of the method herein is less affected by the congestion probability and stays at a higher level, clearly outperforming the SCFS algorithm. As can be seen from FIG. 10, when the congestion probability is less than or equal to 0.7, the false positive rate of the method herein is only slightly higher than that of the SCFS algorithm. Taken together, the method herein performs better.
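The two metrics can be computed as in the sketch below, which follows the definitions given in the paragraph above: DR is the share of truly congested links that are detected, and FPR is the share of links flagged as congested that are actually normal. Note that this FPR definition differs from the more common one that divides by all actual negatives. The function name and the 0/1 list encoding are assumptions for illustration.

```python
def detection_metrics(true_states, predicted_states):
    """Return (DR, FPR) for 0/1 link states, 1 meaning congested."""
    pairs = list(zip(true_states, predicted_states))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    positives = sum(1 for t, _ in pairs if t == 1)
    detected = true_pos + false_pos
    dr = true_pos / positives if positives else 0.0
    fpr = false_pos / detected if detected else 0.0   # definition used in this text
    return dr, fpr
```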

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A network congestion link diagnosis method based on deep learning is characterized by comprising the following steps:

S2, performing decision state modeling on each link congestion state network directed acyclic graph respectively, and jointly generating a state set $S$, wherein a state $s$ is defined as a two-tuple of a link and the set of congestion states of the paths passing through the link, i.e. $s = s_k = (l_k, Y_k)$, the state set is $S = \{s_1, s_2, s_3, \ldots, s_k, \ldots, s_m\}$, and for state $s = s_k$ the set of actions that can be taken is $A$, where action $a = 0$ represents guessing that link $l_k$ is a normal link and action $a = 1$ represents guessing that $l_k$ is a congested link; when the guessed link congestion state is the same as the true link congestion state, a reward is obtained; otherwise, a penalty is obtained;

S4, performing decision state modeling, by the same method as in step S2, on the network directed acyclic graph on which network congestion link diagnosis is to be performed, generating an initial task state $s_0$, feeding state $s_0$ into the neural network trained in step S3, and automatically learning the association between congested paths and congested links through continuous iteration to predict the link congestion states.

2. The deep learning-based network congestion link diagnosis method according to claim 1, wherein the objective function construction method of DQN in step S3 is as follows:

A1, using a deep neural network as the Q-value network with parameters $\omega$, i.e. updating $\omega$ so that the Q function approaches the optimal Q value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$, where $s$ represents a state, $a$ represents a decision, and $\pi$ is a policy;

A2, defining the objective function as the mean square error of the Q value:

$L(\omega) = E\left[\left(r + \gamma \cdot \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$;

wherein $s'$ represents the next state, $a'$ represents the next decision, $E$ represents the expectation operation, $r$ represents the reward, and $\gamma$ represents the attenuation (discount) coefficient;

A4, using stochastic gradient descent (SGD) to optimize the objective end to end.

3. The deep learning based network congestion link diagnosis method according to claim 2,

in DQN training, the main steps include:

B3, initializing the neural network of the target action-value function;

B4, setting the total number of episodes $M$;

B5, initializing the network input state $s_0$ and computing the network output;

4. The deep learning based network congestion link diagnosis method according to claim 3, wherein the step of recursively updating the network in step B6 comprises:

5. The deep learning based network congestion link diagnosis method of claim 4, wherein the step of performing action guessing for each state to perform network update and get the next set of states in step B61 comprises:

C2, executing $a_t$ to obtain the feedback $r_t$ and the next state $s_{t+1}$;

C6, updating the parameters $\theta$ by SGD;

C7, updating the target action-value function network after every $C$ iterations.

6. The deep learning-based network congestion link diagnosis method according to claim 1, wherein the method as a whole has a state set $S$, a policy set $C$ and a policy $\pi$; the action at the next moment, $a = \pi(s)$, is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$; an attenuation coefficient $\gamma$ is applied to each subsequent state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = E[R(s_0) + \gamma V^{\pi}(s_1)]$.

7. A deep learning based network congestion link diagnostic system, characterized by: network congestion link diagnosis is performed by using the deep learning based network congestion link diagnosis method according to any one of claims 1 to 6.