CN109194583B - Network congestion link diagnosis method and system based on deep reinforcement learning - Google Patents

Network congestion link diagnosis method and system based on deep reinforcement learning

Info

Publication number
CN109194583B
Authority
CN
China
Prior art keywords
state
network
link
congestion
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810890267.0A
Other languages
Chinese (zh)
Other versions
CN109194583A (en)
Inventor
潘胜利
曾德泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN201810890267.0A priority Critical patent/CN109194583B/en
Publication of CN109194583A publication Critical patent/CN109194583A/en
Application granted granted Critical
Publication of CN109194583B publication Critical patent/CN109194583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/127 Avoiding congestion; Recovering from congestion by using congestion prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network congestion link diagnosis method and system based on deep reinforcement learning. The method combines reinforcement learning and deep learning through DQN, exploits the strength of DQN in handling high-dimensional states, and uses a "state-action-reward" strategy to construct labels in the manner of Q-learning in order to diagnose congested links. In the reinforcement learning part of the DQN, the state is defined as a 2-tuple consisting of a link and the set of congestion states of all paths passing through that link; an action is a guess as to whether the link is congested, based on the congestion state set of the paths; and the reward is positive when the guess is right and negative when it is wrong. The deep learning part of the DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, the DQN automatically learns the association between congested network paths and congested network links, so that congested links can be diagnosed accurately, and the invention achieves excellent diagnosis performance.

Description

Network congestion link diagnosis method and system based on deep reinforcement learning
Technical Field
The invention relates to the field of network congestion link diagnosis, in particular to a network congestion link diagnosis method and system based on deep reinforcement learning.
Background
External measurement techniques that rely on router cooperation initiate the measurement at the edge of the network and obtain the parameters of interest from the responses of internal nodes to probe data. Common tools include ping, for diagnosing network connectivity; traceroute, for obtaining the network topology; and pathchar, for measuring performance parameters such as link bandwidth and delay. Such approaches fail when internal nodes do not cooperate, for example for reasons of network security. In addition, these methods mostly use ICMP (Internet Control Message Protocol) messages as probe data, and because ICMP messages have low priority in real networks, the measured performance parameters may not accurately reflect the actual state of the network. End-to-end measurement, by contrast, obtains end-to-end performance parameters by sending and receiving data between network edge nodes. It relies only on the basic store-and-forward function of routers, so its dependence on the network is minimal. Network tomography (NT) is a method for inferring internal network parameters, such as link performance parameters and topology, from end-to-end measurement data. Because it can obtain internal performance parameters without the cooperation of internal nodes, and thus fits the non-cooperative, heterogeneous, edge-controlled character of today's Internet, the present invention builds on tomography of network link performance parameters and addresses the diagnosis of congested network links through deep learning, so as to obtain the running state of links in the network more accurately and quickly.
Disclosure of Invention
In order to solve the technical problems, the application provides a network congestion link diagnosis method and system based on deep reinforcement learning.
The invention solves the technical problem and adopts a network congestion link diagnosis method based on deep learning, which comprises the following steps:
S1, collecting M sets of actual link congestion state data from the network to be diagnosed, to obtain M link congestion state network directed acyclic graphs serving as a sample pool, where M is an integer greater than 1. The link congestion state network is represented by a directed acyclic graph $G = (V, E)$, where $V = \{0, 1, 2, \dots, k, \dots, m\}$ is the set of network nodes and $E = \{l_1, l_2, \dots, l_k, \dots, l_m\}$ is the set of links, link $l_k$ denoting the link whose end node is $k$. The set of all paths in the network is defined as $P = \{p_1, p_2, \dots, p_i, \dots, p_n\}$, and the corresponding set of path congestion state observations as $Y = \{y_1, y_2, \dots, y_i, \dots, y_n\}$, where $y_i$ is the congestion state of the $i$-th path $p_i$: $y_i = 1$ indicates that path $p_i$ is congested, and $y_i = 0$ that it is in a normal state. $\Phi_k$ denotes the set of paths that traverse link $l_k$, and $Y_k$ is the set of congestion state observations of the paths in $\Phi_k$. The set of path states in the network is $X = \{x_1, x_2, \dots, x_k, \dots, x_m\}$;
S2, performing decision state modeling on each link congestion state network directed acyclic graph, and together generating a state set $S$. A state $s$ is defined as a 2-tuple of a link and the set of congestion states of the paths through that link, i.e. $s = s_k = (l_k, Y_k)$, and the state set is $S = \{s_1, s_2, s_3, \dots, s_k, \dots, s_m\}$. In state $s = s_k$, an action $a$ is taken from the action set $A$, where $a = 0$ represents guessing that link $l_k$ is a normal link, i.e.
Figure GDA0002965432180000021
and $a = 1$ represents guessing that $l_k$ is a congested link, i.e.
Figure GDA0002965432180000022
When the true link congestion state is the same as the guessed link congestion state, i.e. when
Figure GDA0002965432180000023
a reward is obtained; otherwise, a penalty is obtained;
S3, according to the DQN training method for neural networks, training the neural network with the state set $S$ and the corresponding decision set $B$ as the training data set, where for each group of training data a state of a directed acyclic graph is the input and the corresponding decision is the output;
S4, using the same method as in step S2, performing decision state modeling on the network directed acyclic graph on which congested link diagnosis is to be performed, generating an initial task state $s_0$, and feeding state $s_0$ into the neural network trained in step S3 to predict the link congestion state.
Further, in the method for diagnosing a network congestion link based on deep learning of the present invention, the method for constructing an objective function of DQN in step S3 is as follows:
A1, using a deep neural network with parameter $\omega$ as the Q-value network, i.e. updating $\omega$ so that the Q function approaches the optimal value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$, where $s$ represents a state and $a$ represents a decision;
A2, defining the objective function as the mean squared error of the Q value:
$L(\omega) = E[(r + \gamma \cdot \max_{a'} Q(s', a', \omega) - Q(s, a, \omega))^2]$;
where $s'$ represents the next state, $a'$ represents the next decision, $E$ represents the expectation, $r$ represents the reward, and $\gamma$ represents the attenuation coefficient;
A3, calculating the gradient of the objective function with respect to the parameter $\omega$:
Figure GDA0002965432180000031
A4, using SGD to achieve the end-to-end optimization objective.
Furthermore, in the deep learning-based network congestion link diagnosis method of the present invention, the DQN training mainly comprises the following steps:
B1, initializing an experience pool $D$ with capacity $N$ for storing training samples;
B2, initializing the neural network $Q$ of the action-value function, with the weight parameter $\theta$ set to random values;
B3, initializing the target action-value function
Figure GDA0002965432180000032
as a neural network with the same structure as $Q$ and weight parameter $\theta^- = \theta$;
B4, setting the total number of episodes $M$;
B5, initializing the network input state $s_0$ and computing the network output;
B6, taking the state set $S_{next} = \{s_0\}$ as the input set, and recursively updating the network parameters.
Further, in the method for diagnosing a network congestion link based on deep learning of the present invention, the step B6 of recursively updating the network includes:
B61, performing an action guess for each state in the input set, updating the network, and obtaining the next state; if the next state is non-absorbing, adding it to the next-state set;
B62, if the next-state set is not empty, taking it as the input of the recursive network update and continuing the recursion; otherwise, ending the process.
Further, in the deep learning based network congestion link diagnosis method of the present invention, the step of performing action guessing on each state in step B61 to perform network update and obtain the next set of states includes:
C1, selecting an action with an ε-greedy strategy: with probability $\epsilon$, randomly selecting an action from the action set $A$ as $a_t$; otherwise, feeding the current state into the current network, computing the Q value of every action with a single CNN pass, and selecting the action with the largest Q value as $a_t$;
C2, executing $a_t$ to obtain the feedback $r_t$ of executing $a_t$ and the next state $s_{t+1}$;
C3, storing the four parameters $(s_t, a_t, r_t, s_{t+1})$ in $D$, where $D$ stores the states of $N$ time steps;
C4, randomly drawing a minibatch of state parameter tuples $(s_j, a_j, r_j, s_{j+1})$ from $D$;
C5, calculating the target value of each state, specifically by using the reward after executing $a_t$ to update the Q value as the target value: if the next state is the absorbing state, $y_j = r_j$; otherwise
Figure GDA0002965432180000033
C6, updating the parameter $\theta$ through SGD;
C7, after every $C$ iterations, updating the target action-value function network
Figure GDA0002965432180000041
by setting its parameter $\theta^-$ to the parameter $\theta$ of the current action-value function network $Q$, where $C$ is a positive integer greater than 1.
Further, in the network congestion link diagnosis method based on deep learning of the present invention, the overall method has a state set $S$, a policy set $C$, and a policy $\pi$; the action at the next moment, $a = \pi(s)$, is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$. An attenuation coefficient $\gamma$ is set for each subsequent state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = E[R(s_0) + \gamma V^{\pi}(s_1)]$.
The invention also provides a network congestion link diagnosis system based on deep learning, which adopts any one of the network congestion link diagnosis methods based on deep learning to diagnose the network congestion link.
The invention has the following beneficial effects: it builds on tomography of network link performance parameters and combines deep learning and reinforcement learning through Deep Q-Learning, so that the running state of links in the network can be obtained more accurately and quickly. DQN is used for network training, and for the present problem a recursive-update scheme is proposed to handle the multiple next states that may arise in a state transition. Compared with the SCFS algorithm, the method has higher inference accuracy and robustness.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of an embodiment of the deep learning based network congestion link diagnosis method of the present invention;
FIG. 2 is a schematic diagram of neural network training in accordance with the present invention;
FIG. 3 is a flowchart of the DQN training of the present invention;
FIG. 4 is a flowchart of the recursive update in the DQN training of the present invention;
FIG. 5 is a flowchart of the action-guess network update in the DQN training of the present invention;
FIG. 6 is a schematic diagram of the state transition when guessing 0;
FIG. 7 is a schematic diagram of the state transition when guessing 1;
FIG. 8 is a graph of the number of training cycles versus the degree of difference between the value network and the target network in the present invention;
FIG. 9 is a comparison of the detection rate (DR) of the present invention and of the SCFS algorithm;
FIG. 10 is a comparison of the false alarm rate (FPR) of the present invention and of the SCFS algorithm.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
The invention discloses a method and a system for diagnosing network congestion links based on Deep Q-Learning (DQN). The method combines reinforcement learning and deep learning through DQN, exploits the strength of DQN in handling high-dimensional states, and uses a "state-action-reward" strategy to construct labels in the manner of Q-learning in order to diagnose congested links. In the reinforcement learning part of the DQN, the "state" is defined as a 2-tuple consisting of a link and the set of congestion states of all paths passing through that link; the "action" is a guess as to whether the link is congested, based on the congestion state set of the paths; and the "reward" is positive when the guess is right and negative when it is wrong. The deep learning part of the DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, the DQN automatically learns the association between congested network paths and congested network links, and thereby diagnoses congested links accurately. Simulation results in a variety of network congestion scenarios show that the DQN method diagnoses congested links better than the traditional SCFS method.
Referring to fig. 1, the method for diagnosing a network congestion link based on deep learning according to the present embodiment includes the following steps:
S1, collecting M sets of actual link congestion state data from the network to be diagnosed, to obtain M link congestion state network directed acyclic graphs serving as a sample pool, where M is a positive integer greater than 1;
S2, performing decision state modeling on each link congestion state network directed acyclic graph, and together generating a state set $S$;
S3, according to the DQN training method for neural networks, taking the state set $S$ generated from the M link congestion state network directed acyclic graphs and the corresponding decision set $B$ as inputs to train the neural network, where for each group of training data a state of a directed acyclic graph is the input and the corresponding decision is the output;
S4, using the same method as in step S2, performing decision state modeling on the network directed acyclic graph on which congested link diagnosis is to be performed, generating an initial task state $s_0$, and feeding state $s_0$ into the neural network obtained by DQN training to predict the link congestion state.
Referring to fig. 2, the DQN objective function construction method is as follows:
A1, using a deep neural network with parameter $\omega$ as the Q-value network, i.e. updating $\omega$ so that the Q function approaches the optimal value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$, where $s$ represents a state and $a$ represents a decision (action);
A2, using the mean squared error of the Q value to define the objective function, i.e. the loss function: $L(\omega) = E[(r + \gamma \cdot \max_{a'} Q(s', a', \omega) - Q(s, a, \omega))^2]$;
where $L(\omega)$ is the objective function, $s'$ represents the next state, $a'$ represents the next decision, $E$ represents the expectation, $r$ represents the reward, and $\gamma$ represents the attenuation coefficient;
A3, calculating the gradient of the loss function with respect to the parameter $\omega$:
Figure GDA0002965432180000061
A4, using SGD to achieve the end-to-end optimization objective.
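For reference, the gradient in step A3 can be worked out from the objective of step A2. The form below is a sketch under one assumption that the text does not state: the target term $r + \gamma \max_{a'} Q(s', a', \omega)$ is treated as a constant when differentiating, which is the usual DQN convention. It is not a transcription of the patent figure, and the learning rate $\alpha$ in the second line is a symbol introduced here for illustration.

```latex
\nabla_{\omega} L(\omega)
  = -2\, E\Big[\big(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\big)\,
      \nabla_{\omega} Q(s, a, \omega)\Big]

% One SGD step on a single sample, with the factor 2 absorbed into \alpha:
\omega \leftarrow \omega
  + \alpha \big(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\big)\,
    \nabla_{\omega} Q(s, a, \omega)
```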
Referring to FIG. 3, the main steps of DQN training include:
B1, initializing an experience pool $D$ with capacity $N$ for storing training samples;
B2, initializing the neural network $Q$ of the action-value function, with the weight parameter $\theta$ set to random values;
B3, initializing the target action-value function
Figure GDA0002965432180000062
as a neural network with the same structure as $Q$ and weight parameter $\theta^- = \theta$;
B4, setting the total number of episodes $M$;
B5, initializing the network input state $s_0$ and computing the network output;
B6, taking the state set $S_{next} = \{s_0\}$ as the input set, and recursively updating the network parameters.
The step B6 of recursively updating the network includes:
B61, performing an action guess for each state in the input set, updating the network, and obtaining the next state; if the next state is non-absorbing, adding it to the next-state set;
B62, if the next-state set is not empty, taking it as the input of the recursive network update and continuing the recursion; otherwise, ending the process.
More specific steps are shown in FIG. 4. The step in B61 of performing an action guess for each state to update the network and obtain the next-state set is shown in FIG. 5 and includes:
C1, selecting an action with an ε-greedy strategy: with a (small) probability $\epsilon$, randomly selecting an action from the action set $A$ as $a_t$; otherwise, feeding the current state into the current network (a single CNN pass) to compute the Q value of each action, and selecting the action with the largest Q value (the optimal action) as $a_t$;
C2, executing $a_t$ to obtain the feedback $r_t$ of executing $a_t$ and the next state $s_{t+1}$;
C3, storing the four parameters $(s_t, a_t, r_t, s_{t+1})$ in $D$ ($D$ stores the states of $N$ time steps);
C4, randomly drawing a minibatch of state parameter tuples $(s_j, a_j, r_j, s_{j+1})$ from $D$;
C5, calculating the target value of each state (using the reward after executing $a_t$ to update the Q value as the target value): if the next state is the absorbing state, $y_j = r_j$; otherwise
Figure GDA0002965432180000071
C6, updating the parameter $\theta$ through SGD;
C7, after every $C$ iterations, updating the target action-value function network
Figure GDA0002965432180000072
by setting its parameter $\theta^-$ to the parameter $\theta$ of the current action-value function network $Q$.
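The building blocks of steps C1 to C7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: Q values are plain lists instead of CNN outputs, a `done` flag marking an absorbing next state is added to each stored transition, and every function and class name is hypothetical.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """C1: with probability epsilon pick a random action, else the argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayPool:
    """B1/C3/C4: fixed-capacity experience pool D of (s, a, r, s_next, done)."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform random minibatch, never larger than the pool itself.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def td_target(r, next_q_target, gamma, done):
    """C5: y_j = r_j for an absorbing next state, else r_j + gamma * max Q_hat."""
    return r if done else r + gamma * max(next_q_target)


def maybe_sync(step, c, theta, theta_target):
    """C7: copy the online parameters into the target network every c steps."""
    return list(theta) if step % c == 0 else theta_target
```

With `epsilon = 0` the policy is purely greedy, which is convenient for checking the argmax branch; during training a small positive value keeps some exploration.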
The network congestion link diagnosis method based on deep reinforcement learning mainly comprises two parts: the Deep Q Network (DQN) and congestion link diagnosis. Deep Q-learning comprises reinforcement learning and a deep neural network, and congestion link diagnosis obtains the congestion state of end-to-end links by the network tomography method.
Variable definition of the network:
The network is represented by a directed acyclic graph $G = (V, E)$, where $V = \{0, 1, 2, \dots, k, \dots, m\}$ is the set of nodes and $E = \{l_1, l_2, \dots, l_k, \dots, l_m\}$ is the set of links, link $l_k$ denoting the link whose end node is $k$. The set of all paths in the network is defined as $P = \{p_1, p_2, \dots, p_i, \dots, p_n\}$, and the corresponding set of path congestion state observations as $Y = \{y_1, y_2, \dots, y_i, \dots, y_n\}$, where $y_i$ is the congestion state of the $i$-th path $p_i$: $y_i = 1$ indicates that path $p_i$ is congested, and $y_i = 0$ that it is in a normal state. $\Phi_k$ denotes the set of paths that traverse link $l_k$, and $Y_k$ is the set of congestion state observations of the paths in $\Phi_k$. The set of path states in the network is $X = \{x_1, x_2, \dots, x_k, \dots, x_m\}$.
The variables involved in DQN are defined:
A state $s$ is defined as a 2-tuple of a link and the set of congestion states of the paths through that link, i.e. $s = s_k = (l_k, Y_k)$; the state set is $S = \{s_1, s_2, s_3, \dots, s_k, \dots, s_m\}$. In state $s = s_k$, an action $a$ can be taken from the action set $A$, where $a = 0$ represents guessing that link $l_k$ is a normal link, i.e.
Figure GDA0002965432180000073
and $a = 1$ represents guessing that $l_k$ is a congested link, i.e.
Figure GDA0002965432180000074
When the true link congestion state is the same as the guessed link congestion state, i.e. when
Figure GDA0002965432180000075
the reward $R(s, a) = 1$ is obtained; otherwise, the penalty $R(s, a) = -2$ is obtained.
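The state and reward definitions above can be illustrated with a short Python sketch. The two-path topology, the observation values, and the names `PATHS`, `Y`, `build_state` and `reward` are all hypothetical, and the penalty value of -2 is an assumption about the sign and magnitude intended by the text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: each path is the list of link indices it traverses.
PATHS: Dict[str, List[int]] = {
    "p1": [1, 2],
    "p2": [1, 3],
}
# Hypothetical observed path congestion states y_i (1 = congested, 0 = normal).
Y: Dict[str, int] = {"p1": 1, "p2": 0}

State = Tuple[int, Tuple[int, ...]]


def build_state(k: int) -> State:
    """Return s_k = (l_k, Y_k): the link index and the states of paths crossing it."""
    phi_k = [p for p, links in PATHS.items() if k in links]  # Phi_k
    return (k, tuple(Y[p] for p in phi_k))                   # Y_k


def reward(true_state: int, guess: int) -> int:
    """+1 for a correct guess; -2 (assumed penalty value) for a wrong one."""
    return 1 if guess == true_state else -2
```

Link 1 lies on both paths, so its state carries both observations, while link 2 sees only the observation of the single path that crosses it.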
DQN diagnostic schematic:
Referring to FIG. 6, the initial state set is $s_1 = (l_1, [1,1,1,1,1])$; when 0 is guessed, the state transitions;
state set $s_2 = \{(l_2, [1,1]), (l_5, [1,1,1])\}$: the two states are learned in parallel and transition to the next state;
state set $s_3 = \{(l_6, [1,1]), (l_9, [1])\}$: when 1 is guessed, the process transitions to the absorbing state and ends.
Referring to FIG. 7, for $s_1 = (l_1, [1,1,1,1,1])$, if 1 is guessed, the process transitions directly to the absorbing state E and ends.
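The transitions just described, together with the next-state-set update of steps B61 and B62, can be sketched as follows. This is only an illustration: the three-link tree, its observations, and the names `CHILDREN`, `Y_K`, `transition` and `explore` are hypothetical, the sketch assumes that a guess of 0 moves on to the child links of the current link, and the recursion is written as an equivalent loop over successive next-state sets.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, Tuple[int, ...]]  # (link index k, path observations Y_k)

# Hypothetical tree: link 1 feeds links 2 and 3, which are leaves.
CHILDREN: Dict[int, List[int]] = {1: [2, 3], 2: [], 3: []}
# Hypothetical path observations Y_k for each link.
Y_K: Dict[int, Tuple[int, ...]] = {1: (1, 1, 1), 2: (1,), 3: (1, 1)}


def transition(state: State, guess: int) -> List[State]:
    """Guess 1 (congested) is absorbing; guess 0 (normal) moves to the child links."""
    link, _ = state
    if guess == 1:
        return []  # absorbing state: this branch is finished
    return [(child, Y_K[child]) for child in CHILDREN[link]]


def explore(initial: State, policy: Callable[[State], int]) -> List[Tuple[State, int]]:
    """Process next-state sets level by level until the set is empty (B61-B62)."""
    visited: List[Tuple[State, int]] = []
    frontier = [initial]
    while frontier:
        next_states: List[State] = []
        for s in frontier:
            a = policy(s)                       # action guess for this state
            visited.append((s, a))
            next_states.extend(transition(s, a))
        frontier = next_states                  # becomes the next input set
    return visited
```

In a full training loop the network update of steps C1 to C7 would run where `policy(s)` is called; here a fixed policy stands in for it.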
In the deep learning based network congestion link diagnosis method, the policy is continuously optimized starting from the existing state. The overall system has a state set $S$, a behavior set $A$, and a policy $\pi$; the behavior $a$ at the next moment is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$. An attenuation coefficient $\gamma$ is set for each subsequent state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi]$. This expression satisfies the Bellman equation and can be written as $V^{\pi}(s_0) = E[R(s_0) + \gamma V^{\pi}(s_1)]$.
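For a single sampled state sequence, the sum inside the expectation can be computed directly, and the backward recursion used below is the Bellman form of the same quantity. The function name `discounted_return` and the sample rewards are hypothetical; the expectation over sequences is left out of this sketch.

```python
def discounted_return(rewards, gamma):
    """R(s0) + gamma*R(s1) + gamma^2*R(s2) + ... for one state sequence."""
    total = 0.0
    for r in reversed(rewards):  # backward pass applies V = R + gamma * V_next
        total = r + gamma * total
    return total


rewards = [1, 1, -2]  # hypothetical returns R(s0), R(s1), R(s2)
value = discounted_return(rewards, 0.5)  # 1 + 0.5*1 + 0.25*(-2) = 1.0
```

Averaging this quantity over many sequences generated by a policy gives an estimate of its value function.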
Deep neural networks (DNNs) are neural networks composed of a stack of layers, each layer being composed of nodes. Computation takes place in the nodes, which operate roughly like human neurons: when a node receives enough stimulation, it activates and emits a signal. A node combines its input data with a set of coefficients (weights) that amplify or suppress each input, thereby assigning the inputs their importance for the task the algorithm is learning. The sum of the products of inputs and weights is passed to the node's activation function, which determines whether and how far the signal propagates through the network, and hence how it affects the final outcome, such as a classification. Deep learning networks differ from the more common single-hidden-layer neural networks in their depth, that is, the number of node layers through which data passes in the multi-step process of pattern recognition. A network with more than three layers (including the input and output layers) may be called "deep". Depth is therefore a well-defined term: it means more than one hidden layer.
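The node computation described above (weighted sum followed by an activation) can be written out in a few lines of Python. The sigmoid activation and the names `node_output` and `layer_output` are assumptions chosen for illustration; the text does not fix a particular activation function.

```python
import math


def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weight_rows, biases):
    """One layer: every node sees the same inputs but has its own weights."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

Stacking several calls to `layer_output`, each fed with the previous layer's result, gives the layered structure that makes a network "deep".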
In a deep learning network, each node layer learns to identify a particular set of features based on the output of the previous layer. As the depth of the neural network increases, the features that the nodes can recognize become more complex, because each layer integrates and recombines the features of the previous layer. The form and size of deep neural networks vary with the application, and popular forms and sizes are evolving rapidly to improve model accuracy and efficiency. There are two main forms of network for processing input: feedforward and recurrent. In a feedforward network such as a CNN, every computation is a sequence of operations on the output of the previous layer. A recurrent network such as an LSTM has built-in memory, which allows long-term dependencies to affect the output.
According to the Q-learning update formula $Q^*(s, a) = Q(s, a) + \alpha (r + \gamma \max_{a'} Q(s', a') - Q(s, a))$, the loss function of the DQN is $L(\theta) = E[(\mathrm{TargetQ} - Q(s, a; \theta))^2]$, where $\theta$ is the network parameter and the target is $\mathrm{TargetQ} = r + \gamma \max_{a'} Q(s', a'; \theta)$. Experience replay mainly serves to address the problems of correlated samples and non-stationary distributions. Concretely, the transition sample $(s_t, a_t, r_t, s_{t+1})$ obtained from the interaction between the agent and the environment at each time step is stored in a replay memory, and during training a random minibatch is drawn from it. The target network generates the TargetQ value: $Q(s, a; \theta_i)$ denotes the output of the current network MainNet, which is used to evaluate the value function of the current state-action pair;
Figure GDA0002965432180000091
denotes the output of TargetNet, which is substituted into the formula for TargetQ above to obtain the target Q value. The parameters of MainNet are updated according to the loss function above, and after every N iterations the parameters of MainNet are copied to TargetNet.
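As a worked example of the Q-learning update formula, the following sketch applies one update to a tabular Q function stored in a dictionary. The tabular representation, the function name `q_learning_update`, and the numeric values are assumptions for illustration; in the DQN the table is replaced by the network output.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]


# Hypothetical values: Q(s1,0) = 0, max over Q(s2, .) = 4, r = 1.
q = {("s1", 0): 0.0, ("s2", 0): 2.0, ("s2", 1): 4.0}
new_value = q_learning_update(q, "s1", 0, 1.0, "s2", [0, 1], alpha=0.5, gamma=0.5)
# new_value = 0 + 0.5 * (1 + 0.5 * 4 - 0) = 1.5
```

The bracketed term is the same temporal-difference error that the DQN loss squares and averages over a minibatch.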
Network tomography treats the target network as a black box. In general, all measurements are taken at the end nodes of the network, a strategy called end-to-end measurement. Apart from passive monitoring of packet transmissions between end-node pairs, end-to-end measurement is currently divided into two main types according to the routing protocol used: multicast measurement and unicast measurement, for security and similar reasons. Nguyen et al. proved that, under the framework of Boolean network tomography based on multi-time-slot measurement, the prior congestion probability of a link can be uniquely determined from the data of multiple measurement time slots, and proposed CLINK (coordinated Link identification), a method based on matrix inversion, to solve the problem. The method is more accurate than the SCFS algorithm and has a higher detection rate, particularly when the number of congested links in the network is large.
An experiment was carried out on a simulated network containing 15 paths, with the prior congestion probability of each link randomly generated; the simulation was repeated 100 times. The method proposed in this patent was used to diagnose link congestion, and FIG. 8 shows the resulting relationship between the number of training cycles and the degree of difference between the value network and the target network. In the figure, the horizontal axis is the number of training cycles and the vertical axis is the difference between the value network and the target network. As the number of training cycles increases, the difference between the value network and the target network decreases. A comparative experiment between the SCFS algorithm and the algorithm herein yields line graphs of the detection rate (DR) and the false alarm rate (FPR) against the congestion probability ρ, shown in FIG. 9 and FIG. 10 respectively. The detection rate is the ratio of the number of detected positive samples to the number of all positive samples, and the false alarm rate is the proportion of actual negative samples among the samples detected as positive. As FIG. 9 shows, the detection rate of the SCFS algorithm falls as the congestion probability increases, whereas the detection rate of the method herein is little affected by the congestion probability and stays at a consistently higher level, far outperforming the SCFS algorithm. As FIG. 10 shows, when the congestion probability is less than or equal to 0.7, the false alarm rate of the method herein is only slightly higher than that of the SCFS algorithm. Taken together, the method herein performs better.
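The two metrics can be computed from true and predicted link states as sketched below. The function name `detection_metrics` and the sample vectors are hypothetical. The second metric follows the definition given in the text (actual negatives among the links flagged as congested), which differs from the more common false positive rate taken over all actual negatives.

```python
def detection_metrics(true_states, predicted_states):
    """Return (detection rate, false alarm rate) as defined in the text.

    States are 1 for a congested link and 0 for a normal link.
    """
    pairs = list(zip(true_states, predicted_states))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # congested, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    positives = sum(1 for t, _ in pairs if t == 1)      # all congested links
    flagged = tp + fp                                   # all links flagged
    dr = tp / positives if positives else 0.0
    far = fp / flagged if flagged else 0.0
    return dr, far


dr, far = detection_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

In this hypothetical sample one of the two congested links is found and one of the two flagged links is actually normal, so both ratios come out at one half.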
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A network congestion link diagnosis method based on deep learning is characterized by comprising the following steps:
S1, collecting M sets of actual link congestion state data from the network to be diagnosed, to obtain M link congestion state network directed acyclic graphs serving as a sample pool; M is an integer greater than 1; the link congestion state network is represented by a directed acyclic graph $G = (V, E)$, where $V = \{0, 1, 2, \ldots, k, \ldots, m\}$ is the set of network nodes and $E = \{l_1, l_2, \ldots, l_k, \ldots, l_m\}$ is the set of links, link $l_k$ denoting the link whose end node is $k$; the set of all paths in the network is defined as $P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$, and the corresponding path congestion state observation set is defined as $Y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$, where $y_i$ is the congestion state of the $i$-th path $p_i$: $y_i = 1$ indicates that path $p_i$ is in a congested state, and $y_i = 0$ indicates that path $p_i$ is in a normal state; $\Phi_k$ denotes the set of paths passing through link $l_k$, and $Y_k$ is the congestion state observation set of the paths in $\Phi_k$; the congestion state set of the links in the network is $X = \{x_1, x_2, \ldots, x_k, \ldots, x_m\}$;
S2, performing decision state modeling on each link congestion state network directed acyclic graph respectively, and jointly generating a state set $S$; a state $s$ is defined as a 2-tuple of a link and the congestion state set of the paths passing through that link, i.e., $s = s_k = (l_k, Y_k)$, and the state set is $S = \{s_1, s_2, s_3, \ldots, s_k, \ldots, s_m\}$; the set of actions that can be taken in state $s = s_k$ is $A$, where $a = 0$ represents guessing that link $l_k$ is a normal link, i.e.
Figure FDA0002953209810000011
and $a = 1$ represents guessing that link $l_k$ is a congested link, i.e.
Figure FDA0002953209810000012
when the true link congestion state is the same as the guessed link congestion state, i.e., when
Figure FDA0002953209810000013
a reward is obtained; otherwise, a penalty is obtained;
S3, training a neural network with the DQN training method, taking the state set S and the corresponding decision set B as the training data set, wherein during training the input of each set of training data is the state of one directed acyclic graph and the output is the corresponding decision;
S4, performing decision state modeling, in the same way as in step S2, on the network directed acyclic graph on which network congestion link diagnosis is to be performed, to generate an initial task state $s_0$; substituting state $s_0$ into the neural network trained in step S3, and automatically learning the association between network congested paths and network congested links through continuous iteration, so as to predict the link congestion state.
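The state and reward modeling of step S2 might be sketched in Python as follows. This is only an illustration under stated assumptions: the three-link tree, the names `PATHS`, `LINKS`, `path_states`, `build_states` and `reward`, the reward values of +1 and -1, and the Boolean rule that a path is congested when any of its links is congested are all hypothetical and are not fixed by the claim.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tree: link k ends at node k; each path lists the links it crosses.
PATHS: Dict[int, List[int]] = {1: [1, 2], 2: [1, 3]}
LINKS: List[int] = [1, 2, 3]


def path_states(link_states: Dict[int, int]) -> Dict[int, int]:
    # Assumed Boolean model: a path is congested (1) if any link on it is congested.
    return {i: int(any(link_states[k] for k in links)) for i, links in PATHS.items()}


def build_states(y: Dict[int, int]) -> List[Tuple[int, Tuple[int, ...]]]:
    # One state s_k = (l_k, Y_k) per link, where Y_k holds the observed
    # congestion states of the paths in Phi_k (the paths crossing link k).
    states = []
    for k in LINKS:
        phi_k = [i for i, links in PATHS.items() if k in links]
        states.append((k, tuple(y[i] for i in phi_k)))
    return states


def reward(true_state: int, action: int, r_hit: float = 1.0, r_miss: float = -1.0) -> float:
    # Reward when the guess matches the true link state, penalty otherwise.
    return r_hit if action == true_state else r_miss


x = {1: 0, 2: 1, 3: 0}      # true link states X (only link 2 congested)
y = path_states(x)          # observed path states Y
states = build_states(y)
print(y, states)
```

Each state pairs a link with the observations of the paths through it, which is the only information the agent sees before guessing that link's state.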
2. The deep learning-based network congestion link diagnosis method according to claim 1, wherein the objective function construction method of DQN in step S3 is as follows:
A1, using a deep neural network as the Q-value network with parameter $\omega$, i.e., updating $\omega$ so that the Q function approaches the optimal Q value: $Q(s, a, \omega) \approx Q^{\pi}(s, a)$; where $s$ denotes a state, $a$ denotes a decision, and $\pi$ is a policy;
A2, defining the objective function as the mean square error of the Q value:
$L(\omega) = E\left[\left(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$;
where $s'$ denotes the next state, $a'$ denotes the next decision, $E$ denotes the expectation operator, $r$ denotes the reward, and $\gamma$ denotes the attenuation coefficient;
A3, calculating the gradient of the objective function with respect to the parameter $\omega$:
Figure FDA0002953209810000021
A4, using stochastic gradient descent (SGD) to optimize the objective end to end.
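As an illustration of steps A2 to A4, the following Python sketch evaluates the squared temporal-difference error for a single transition and takes one SGD step. It assumes a linear Q function in place of the deep network, and it holds the target term constant when differentiating (a common DQN practice); the gradient formula of the claim itself is an image and is not reproduced here. All names and numbers are hypothetical.

```python
import numpy as np


def q_values(omega, features):
    # Linear stand-in for the Q network: one weight row per action.
    return omega @ features


def td_loss_and_grad(omega, s, a, r, s_next, gamma, terminal=False):
    # Target r + gamma * max_a' Q(s', a', omega), or r alone at a terminal state.
    target = r if terminal else r + gamma * np.max(q_values(omega, s_next))
    td_error = target - q_values(omega, s)[a]
    loss = td_error ** 2
    grad = np.zeros_like(omega)
    # Semi-gradient: the target is treated as a constant.
    grad[a] = -2.0 * td_error * s
    return loss, grad


def sgd_step(omega, grad, lr=0.1):
    return omega - lr * grad


omega = np.zeros((2, 3))                 # 2 actions, 3 state features
s = np.array([1.0, 0.0, 1.0])
s_next = np.array([0.0, 1.0, 0.0])
loss_before, grad = td_loss_and_grad(omega, s, a=1, r=1.0, s_next=s_next, gamma=0.9)
omega = sgd_step(omega, grad)
loss_after, _ = td_loss_and_grad(omega, s, a=1, r=1.0, s_next=s_next, gamma=0.9)
print(loss_before, loss_after)
```

A single step moves Q(s, a) toward the target, so the loss on the same transition drops.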
3. The deep learning based network congestion link diagnosis method according to claim 2,
in DQN training, the main steps include:
B1, initializing an experience pool D with capacity N, used to store training samples;
B2, initializing the neural network Q of the action-value function, with the weight parameter $\theta$ set to random values;
B3, initializing the neural network of the target action-value function
Figure FDA0002953209810000022
which has the same structure as Q, with weight parameter $\theta^{-} = \theta$;
B4, setting the total number of episodes M;
B5, initializing the network input state $s_0$ and computing the network output;
B6, taking the state set $S_{next} = \{s_0\}$ as the input set, and recursively updating the network parameters.
4. The deep learning based network congestion link diagnosis method according to claim 3, wherein the step of recursively updating the network in step B6 comprises:
B61, performing action guessing for each state in the input set, updating the network, and obtaining the next state; if the next state is non-absorbing, adding it to the next state set;
B62, if the next state set is not empty, taking the next state set as the input of the recursive network update and continuing the recursive process; otherwise, ending the process.
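The control flow of steps B61 and B62 might look like the following Python sketch. It is written as a loop rather than as literal recursion, and the callbacks `guess_and_update` and `is_absorbing` are hypothetical placeholders for the action guess with network update and for the absorbing-state test; the integer states are a toy stand-in.

```python
def recursive_update(state_set, guess_and_update, is_absorbing):
    """Process state sets until no non-absorbing next state remains.

    guess_and_update(state) -> next state; assumed to also update the network.
    Returns the number of rounds performed.
    """
    rounds = 0
    while state_set:                       # B62: continue while the set is non-empty
        next_set = []
        for s in state_set:                # B61: act on every state in the input set
            s_next = guess_and_update(s)
            if not is_absorbing(s_next):
                next_set.append(s_next)
        state_set = next_set
        rounds += 1
    return rounds


# Toy stand-in: states are integers that count down, and 0 is absorbing.
rounds = recursive_update([3, 1], lambda s: s - 1, lambda s: s <= 0)
print(rounds)
```

The loop form avoids deep call stacks on large networks while visiting the same sequence of state sets.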
5. The deep learning based network congestion link diagnosis method of claim 4, wherein the step of performing action guessing for each state to perform network update and get the next set of states in step B61 comprises:
C1, selecting an action with an ε-greedy strategy: with probability ε, randomly selecting an action from the action set A as $a_t$; otherwise, inputting the current state into the current network, computing the Q value of each action with a single CNN forward pass, and selecting the action with the largest Q value as $a_t$;
C2, executing $a_t$ to obtain the feedback $r_t$ of executing $a_t$ and the next state $s_{t+1}$;
C3, storing the four-tuple $(s_t, a_t, r_t, s_{t+1})$ into D, where D stores the states of N time steps;
C4, randomly sampling a minibatch of state parameter tuples $(s_j, a_j, r_j, s_{j+1})$ from D;
C5, calculating the target value of each state, specifically by updating the Q value with the reward obtained after executing $a_t$ as the target value: if the next state is the absorbing state, $y_j = r_j$; otherwise
Figure FDA0002953209810000031
C6, updating the parameter $\theta$ through SGD;
C7, after every C iterations, updating the parameter $\theta^{-}$ of the target action-value function network
Figure FDA0002953209810000032
to the parameter $\theta$ of the current action-value function network Q, where C is a positive integer greater than 1.
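Steps C1 to C7 can be illustrated with a small Python sketch. It makes several assumptions that are not in the claim: a table of Q values stands in for the CNN, the non-absorbing target is taken to be the standard DQN form (reward plus discounted maximum of the target network), and the class name `TinyDQN`, its parameters, and the toy transition are hypothetical.

```python
import random
from collections import deque

import numpy as np


class TinyDQN:
    def __init__(self, n_states, n_actions, capacity=100, gamma=0.9,
                 lr=0.5, sync_every=2, seed=0):
        self.rng = random.Random(seed)
        self.pool = deque(maxlen=capacity)         # experience pool D with capacity N
        self.q = np.zeros((n_states, n_actions))   # tabular stand-in for Q (theta)
        self.q_target = self.q.copy()              # target network, theta^- = theta
        self.gamma, self.lr, self.sync_every = gamma, lr, sync_every
        self.n_actions = n_actions
        self.updates = 0

    def select_action(self, s, epsilon):
        # C1: explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.n_actions)
        return int(np.argmax(self.q[s]))

    def store(self, s, a, r, s_next, absorbing):
        # C3: keep the transition; the oldest entry is dropped when D is full.
        self.pool.append((s, a, r, s_next, absorbing))

    def train_step(self, batch_size):
        # C4: sample a minibatch uniformly from D.
        batch = self.rng.sample(list(self.pool), min(batch_size, len(self.pool)))
        for s, a, r, s_next, absorbing in batch:
            # C5: y_j = r_j at an absorbing state, otherwise the assumed
            # standard target r_j + gamma * max_a' Q_target(s_{j+1}, a').
            y = r if absorbing else r + self.gamma * np.max(self.q_target[s_next])
            # C6: gradient step on (y - Q)^2, with the factor 2 folded into lr.
            self.q[s, a] += self.lr * (y - self.q[s, a])
        self.updates += 1
        # C7: copy theta into theta^- every sync_every updates.
        if self.updates % self.sync_every == 0:
            self.q_target = self.q.copy()


agent = TinyDQN(n_states=2, n_actions=2)
agent.store(0, 1, 1.0, 1, True)      # hypothetical transition ending in an absorbing state
agent.train_step(batch_size=1)
agent.train_step(batch_size=1)
print(agent.q[0], agent.select_action(0, epsilon=0.0))
```

Keeping a separate, periodically synchronized copy of the parameters means the targets change only every few updates, which is the purpose of step C7.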
6. The deep learning-based network congestion link diagnosis method according to claim 1, wherein the whole deep learning-based network congestion link diagnosis method has a state set S, a policy set C, and a policy $\pi$; the action at the next moment, $a = \pi(s)$, is selected according to the current state, and each state $s$ in the state set has a corresponding return value $R(s)$; an attenuation coefficient $\gamma$ is applied to each successive state in the state sequence, and each policy $\pi$ has a corresponding value function $V^{\pi}(s_0) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = E[R(s_0) + \gamma V^{\pi}(s_1)]$.
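The recursion in the value function above can be checked numerically with a short Python sketch. It uses a single deterministic sequence of return values, so the expectation reduces to a plain sum; the reward sequence and the value of γ are hypothetical.

```python
def discounted_return(rewards, gamma):
    # Work backwards using V(s_t) = R(s_t) + gamma * V(s_{t+1}).
    value = 0.0
    for r in reversed(rewards):
        value = r + gamma * value
    return value


rewards = [1.0, -1.0, 1.0]     # R(s_0), R(s_1), R(s_2) along one trajectory
gamma = 0.5
v0 = discounted_return(rewards, gamma)        # R(s_0) + gamma R(s_1) + gamma^2 R(s_2)
v1 = discounted_return(rewards[1:], gamma)    # value from s_1 onward
print(v0, rewards[0] + gamma * v1)
```

Both printed values agree, which is the one-step form of the value function written at the end of the claim.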
7. A deep learning based network congestion link diagnostic system, characterized by: network congestion link diagnosis is performed by using the deep learning based network congestion link diagnosis method according to any one of claims 1 to 6.
CN201810890267.0A 2018-08-07 2018-08-07 Network congestion link diagnosis method and system based on deep reinforcement learning Active CN109194583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810890267.0A CN109194583B (en) 2018-08-07 2018-08-07 Network congestion link diagnosis method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810890267.0A CN109194583B (en) 2018-08-07 2018-08-07 Network congestion link diagnosis method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN109194583A CN109194583A (en) 2019-01-11
CN109194583B true CN109194583B (en) 2021-05-14

Family

ID=64920861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810890267.0A Active CN109194583B (en) 2018-08-07 2018-08-07 Network congestion link diagnosis method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109194583B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135482B (en) * 2019-04-30 2021-07-13 中国地质大学(武汉) Network topology inference method and system based on convolutional neural network
CN110213025A (en) * 2019-05-22 2019-09-06 浙江大学 Dedicated ad hoc network anti-interference method based on deeply study
EP3977783B1 (en) 2019-06-03 2023-07-26 Nokia Solutions and Networks Oy Uplink power control using deep q-learning
CN110225019B (en) * 2019-06-04 2021-08-31 腾讯科技(深圳)有限公司 Network security processing method and device
CN112242959B (en) * 2019-07-16 2022-10-14 中国移动通信集团浙江有限公司 Micro-service current-limiting control method, device, equipment and computer storage medium
CN110458466B (en) * 2019-08-16 2023-09-26 内蒙古大学 Patent estimation method and system based on data mining and heterogeneous knowledge association
CN110581808B (en) * 2019-08-22 2021-06-15 武汉大学 Congestion control method and system based on deep reinforcement learning
CN110809274B (en) * 2019-10-28 2023-04-21 南京邮电大学 Unmanned aerial vehicle base station enhanced network optimization method for narrowband Internet of things
CN110768906B (en) * 2019-11-05 2022-08-30 重庆邮电大学 SDN-oriented energy-saving routing method based on Q learning
CN111416774B (en) * 2020-03-17 2023-03-21 深圳市赛为智能股份有限公司 Network congestion control method and device, computer equipment and storage medium
CN112714074B (en) * 2020-12-29 2023-03-31 西安交通大学 Intelligent TCP congestion control method, system, equipment and storage medium
CN114567597B (en) * 2022-02-21 2023-12-19 深圳市亦青藤电子科技有限公司 Congestion control method and device based on deep reinforcement learning in Internet of things
CN115208821B (en) * 2022-07-18 2023-08-08 广东电网有限责任公司 Cross-network route forwarding method and device based on BP neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012131424A1 (en) * 2011-02-25 2012-10-04 Telefonaktiebolaget L M Ericsson (Publ) Method for introducing network congestion predictions in policy decision
CN107396204A (en) * 2017-06-12 2017-11-24 江苏大学 A kind of P2P video request program node selecting methods based on linear programming and intensified learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"An artificial neural network based fault detection and diagnosis for wireless mesh networks";Akmal Yaqini 等;《2018 Wireless Days (WD)》;20180405;全文 *
"基于神经网络预测算法的网络拥塞控制";陈炜;《计算机信息工程》;20060731(第2期);全文 *
"无线网络中基于强化学习的拥塞控制算法改进";罗颖 等;《自动化仪表》;20140805(第6期);全文 *

Also Published As

Publication number Publication date
CN109194583A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant