CN111988225A - Multi-path routing method based on reinforcement learning and transfer learning - Google Patents

Multi-path routing method based on reinforcement learning and transfer learning Download PDF

Info

Publication number
CN111988225A
CN111988225A (application number CN202010840208.XA)
Authority
CN
China
Prior art keywords
network
neural network
path
learning
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010840208.XA
Other languages
Chinese (zh)
Other versions
CN111988225B (en)
Inventor
魏雯婷
张瑞卿
伏丽莹
顾华玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010840208.XA priority Critical patent/CN111988225B/en
Publication of CN111988225A publication Critical patent/CN111988225A/en
Application granted granted Critical
Publication of CN111988225B publication Critical patent/CN111988225B/en
Active legal status
Anticipated expiration legal status

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/02Topology update or discovery
    • H04L45/08Learning-based routing, e.g. using neural networks or artificial intelligence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/12Shortest path evaluation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • H04L47/125Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering

Abstract

The invention provides a multi-path routing method based on reinforcement learning and transfer learning, which addresses the technical problem of poor load balance among equivalent paths in network environments with scarce traffic data. The implementation steps are: construct a real network Z and an experimental network G whose topology is identical to that of Z; establish a two-dimensional array H; construct a multi-path routing model based on reinforcement learning; initialize a traffic matrix DM and an equivalent-path traffic proportion matrix PM; iteratively train the reinforcement-learning-based multi-path routing model in the experimental network G; migrate the global neural network weight parameters of the routing decision model trained in G to the real network Z using transfer learning; and adaptively train the initialized global neural network in the real network Z to obtain a multi-path routing result that matches the characteristics of the real network environment. The method can be used in data center networks and similar scenarios.

Description

Multi-path routing method based on reinforcement learning and transfer learning
Technical Field
The invention belongs to the technical field of computer networks, and relates to a multipath routing method based on reinforcement learning and transfer learning, which can be used in the fields of data center networks and the like.
Background
In a network, a routing decision specifies how data traffic travels from a designated node to another node, and by scheduling traffic it determines the load balance of the different transmission paths in the network. The metric used to measure network load balance is the difference in bandwidth utilization among all equivalent paths between a pair of communicating nodes: the smaller the difference, the better the equivalent-path load balance. Routing decision methods can be divided into traditional methods and reinforcement-learning-based methods. Traditional routing algorithms design routing rules manually in advance but lack perception of the network state, so some equivalent paths easily become heavily loaded and traffic on high-load paths cannot be shifted to low-load paths, causing load imbalance. Reinforcement learning is a branch of machine learning; a routing algorithm based on reinforcement learning has stronger perception of the network traffic state and can dynamically adjust the amount of data sent on different transmission paths as the network traffic changes. When the load on an equivalent path changes, the algorithm can quickly sense the change, adjust its policy, and move data from high-load paths to low-load paths. However, reinforcement-learning routing based on the Q-learning algorithm cannot be applied to complex network environments, and reinforcement-learning routing based on the DDPG algorithm converges slowly and cannot react effectively to different network scenarios when network traffic data is scarce, so the network load remains unbalanced. Although reinforcement-learning-based routing effectively addresses the difficulty of designing routing algorithms for large-scale complex networks, its training process depends heavily on data, the trained routing decision model generalizes poorly, and the model must be retrained whenever the system changes slightly. Transfer learning is another important branch of machine learning. Unlike traditional machine learning algorithms, in which a machine automatically extracts information from a large amount of data, the core idea of transfer learning is to find similarities between problems, apply past experience and knowledge to similar problems, and use data from other datasets to achieve the training objective.
For example, the patent application with publication number CN109361601A, entitled "SDN route planning method based on reinforcement learning", builds a reinforcement-learning routing decision model using the Q-learning algorithm; network topology information, the traffic matrix, and the QoS levels of flows are used as inputs of the model, which outputs the shortest path that satisfies the requirements. The designed reward function contains the QoS level of the traffic, link bandwidth utilization information, and so on. The reinforcement-learning agent interacts continuously with the network model to try and adjust routing decisions. The method finds the shortest forwarding path for each flow, improves link bandwidth utilization in the network, and reduces congestion. Its shortcoming is that each flow is forwarded only along its selected fixed shortest path, which easily leads to higher load on some paths, larger differences in equivalent-path bandwidth utilization, and unbalanced equivalent-path load.
As another example, the patent application with publication number CN110611619A, entitled "Intelligent routing decision method based on DDPG reinforcement learning", builds a reinforcement-learning routing decision model with the DDPG algorithm. Network traffic matrix information is the input of the algorithm, the optimization target is minimizing the absolute difference between the maximum and minimum bandwidth utilization among the equivalent paths of the network, and load balancing is achieved by dynamically adjusting the amount of data sent on different transmission paths. The method makes full use of the bandwidth resources of different transmission paths and balances the network load, but it judges the quality of the load-balancing state using only the maximum and minimum bandwidth utilization within a group of equivalent paths; the values of the other transmission paths in the group are not used, so the bandwidth of those paths cannot be adjusted effectively and their load remains unbalanced. In addition, traffic data in the network is difficult to collect and the resulting dataset is small, which cannot meet the training requirements of the reinforcement-learning algorithm, so the model performs well only in a small number of network scenarios, and in many cases the load distribution in the network remains unbalanced.
Disclosure of Invention
The present invention aims to overcome the defects of the prior art by providing a multi-path routing method based on reinforcement learning and transfer learning, which solves the technical problem of poor equivalent-path load balance in network environments with scarce traffic data.
The technical idea of the invention is as follows: first, an experimental network with the same topology as the real network is constructed, traffic information in the experimental network is collected and fed into a reinforcement-learning algorithm to compute a routing strategy, and an algorithm model that meets the requirements is obtained through repeated training. Then, the model trained in the experimental network is transferred to the real network, and a network model that matches the characteristics of the real network environment is trained. The specific steps are as follows:
(1) constructing a real network Z and an experimental network G whose topology is identical to that of Z:
constructing a real network Z comprising a server nodes and m switch nodes, and an experimental network G whose topology is identical to that of Z, where each server node is a source node and a destination node for the other server nodes; the n equivalent paths formed by connecting each source node to every other destination node through one or more switch nodes are numbered from 1 to n, where a ≥ 16 and m ≥ 16;
(2) establishing a two-dimensional array H:
establishing a two-dimensional array H with the a source nodes as the abscissa and the a destination nodes as the ordinate, and storing the equivalent-path numbers between each source-destination node pair at the corresponding position of H;
(3) constructing a multi-path routing model based on reinforcement learning:
constructing a multi-path routing model based on the reinforcement-learning algorithm A3C, comprising a global neural network and num_a independent local agents; the global neural network and the local agents all adopt an Actor-Critic neural network structure with L fully connected layers; the weight parameter sets of the Actor and Critic neural networks in the global neural network are θ_g and ω_g respectively, and the weight parameter sets of the Actor and Critic neural networks in each local agent are θ and ω respectively, where num_a ≥ 10 and L ≥ 15;
(4) Initializing traffic matrix DM and equivalent path traffic proportion matrix PM:
initializing a traffic matrix DM and an equivalent-path traffic proportion matrix PM, both of size a × a; each element DM_ij of DM is randomly assigned a value with DM_ij ≥ 0, and every element of PM is assigned an equal value in (0, 1);
(5) performing iterative training on a multi-path routing model based on reinforcement learning in an experimental network G:
(5a) initializing the weight parameter sets θ and ω of the local agents and the weight parameter sets θ_g and ω_g of the global neural network according to the standard normal distribution; initializing an experience replay set D of length N, N > 0; initializing the iteration counter k, the maximum number of iterations K with K ≥ 10^6, and the initial sampling state S_0 of the network environment, with k = 0 and S_0 = 0;
(5b) synchronizing the global neural network weight parameter sets to the num_a local agents, i.e. θ = θ_g, ω = ω_g;
(5c) sending traffic of size DM_ij × PM_ij onto the equivalent paths of G according to the numbers in H, measuring, via the SDN controller, the bandwidth utilization of the bottleneck link of the equivalent path corresponding to each number in H, taking the bottleneck-link bandwidth utilization as the bandwidth utilization of that equivalent path, and taking the bandwidth utilizations S_t of the n equivalent paths as the current sampling state of G;
(5d) using a state-gain algorithm, computing a state-gain vector Φ(ΔS) from the difference between S_t and S_(t-1), while converting S_t into a feature vector Φ(S_t); then taking Φ(S_t) and Φ(ΔS) as inputs of the Actor neural networks of the num_a local agents and computing the routing decision behavior vector A_t;
(5e) obtaining, by the method of step (5c), the bandwidth utilizations S_(t+1) of the n equivalent paths after G executes A_t, taking S_(t+1) as the sampling state of G after the state transition, and computing the reward value R_t of the network environment from S_(t+1);
(5f) combining S_t, A_t, R_t and S_(t+1) into the experience tuple {S_t, A_t, R_t, S_(t+1)} and storing it in the experience replay set D, realizing the state transition of G;
(5g) randomly sampling M samples from D, where {S_k, A_k, R_k, S_(k+1)} denotes the k-th sample, computing the parameter update gradient dω of ω and the parameter update gradient dθ of θ, and updating ω with dω and θ with dθ;
(5h) updating the global neural network weight parameters using the updated ω and θ;
(5i) updating the equivalent-path traffic proportion matrix PM with the behavior value corresponding to each path in the routing decision vector A_t, and judging whether k = K; if so, the routing decision model trained in the experimental network G is obtained; otherwise, let k = k + 1 and return to step (5b);
(6) migrating the global neural network weight parameters in the routing decision model obtained by training in the experimental network G to a real network Z based on a migration learning method:
keeping the weight parameters of the first l layers (l < L) of the global neural network in the routing decision model unchanged, and randomly re-initializing the weight parameters of the remaining L − l layers to serve as the global neural network initialized in the real network Z, thereby completing the migration;
(7) carrying out adaptive training on the global neural network initialized in the real network Z:
initializing the maximum number of iterations K_T with K_T ≥ 10^4, and adaptively training the global neural network initialized in Z within the real network Z according to the method of steps (5b)-(5h), obtaining a routing decision model that matches the characteristics of Z.
Compared with the prior art, the invention has the following advantages:
1. because the method calculates the reinforcement learning reward function by using the variance of the bandwidth utilization rate of the equivalent transmission paths in each group, the algorithm finally aims at minimizing the bandwidth utilization rate difference of all equivalent transmission paths in each group, and continuously adjusts the parameters of the algorithm, thereby obtaining a final routing decision model, being capable of accurately routing data on paths with higher load to paths with lower load.
2. Because the invention adopts the A3C reinforcement-learning algorithm, multiple local agents are trained simultaneously, which breaks the correlation of the traffic data, improves the convergence of the routing decision model, and allows the obtained model to adjust the traffic proportion of each equivalent path more accurately.
3. The invention uses transfer learning so that the reinforcement-learning-based multi-path routing algorithm can be applied more effectively in network environments with scarce traffic data, which increases the practical value of the reinforcement-learning routing algorithm and prevents the routing decision model from over-fitting a small number of network states while performing poorly in others. Compared with the prior art, transfer learning gives the model stronger generalization ability, improving the performance of the routing decision model in different network scenarios and ensuring load balance of the network; compared with routing algorithms that use reinforcement learning alone, the invention improves the adaptability of the routing decision model to the network environment and ensures load balance in different network scenarios.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a flow chart of the iterative training of a single agent in the routing method based on reinforcement learning and transfer learning.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the implementation steps of the invention are as follows:
step 1), constructing a real network Z and an experimental network G whose topology is identical to that of Z:
constructing a real network Z comprising a server nodes and m switch nodes, and an experimental network G whose topology is identical to that of Z, where each server node is a source node and a destination node for the other server nodes; the n equivalent paths formed by connecting each source node to every other destination node through one or more switch nodes are numbered from 1 to n, where a ≥ 16 and m ≥ 16. In this example a fat-tree topology comprising 16 server nodes is selected, in which a = 16 and m = 20;
step 2), establishing a two-dimensional array H:
establishing a two-dimensional array H with the a source nodes as the abscissa and the a destination nodes as the ordinate, and storing the equivalent-path numbers between each source-destination node pair at the corresponding position of H;
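For illustration, a minimal Python sketch of this step is given below; the helper structure `equal_cost_paths` (a mapping from a source-destination pair to its equivalent-path numbers) is an assumption, since how the paths are enumerated depends on the concrete topology:

```python
import numpy as np

def build_path_table(a, equal_cost_paths):
    """Build the two-dimensional array H described in step 2.

    `equal_cost_paths` is assumed to map a (source, destination) server pair
    to the list of path numbers (1..n) of its equivalent paths.
    """
    # H[src, dst] holds the equivalent-path numbers for that source-destination pair.
    H = np.empty((a, a), dtype=object)
    for src in range(a):
        for dst in range(a):
            H[src, dst] = [] if src == dst else list(equal_cost_paths[(src, dst)])
    return H

# Toy usage with 4 servers and two equivalent paths per pair (path numbers illustrative only):
toy_paths = {(s, d): [1, 2] for s in range(4) for d in range(4) if s != d}
H = build_path_table(4, toy_paths)
print(H[0, 1])  # -> [1, 2]
```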
step 3), constructing a multi-path routing model based on reinforcement learning:
constructing a multi-path routing model based on the reinforcement-learning algorithm A3C, comprising a global neural network and num_a independent local agents; the global neural network and the local agents all adopt an Actor-Critic neural network structure with L fully connected layers; the weight parameter sets of the Actor and Critic neural networks in the global neural network are θ_g and ω_g respectively, and the weight parameter sets of the Actor and Critic neural networks in each local agent are θ and ω respectively, where num_a ≥ 10 and L ≥ 15. The A3C algorithm uses a distributed scheme to improve the convergence of the Actor-Critic neural networks and has low complexity. In this example num_a = 10 agents perform the training and L = 15. The Actor neural network computes the routing behavior, and the Critic neural network evaluates the routing result computed by the Actor neural network from the difference between two adjacent network-state value estimates; this evaluation increases the probability that a good decision of the Actor neural network is selected;
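A minimal PyTorch sketch of such an Actor-Critic pair with L fully connected layers is shown below; the hidden width, the softmax output of the Actor (yielding per-path traffic proportions), and the scalar Critic output are assumptions not fixed by the patent:

```python
import torch
import torch.nn as nn

class ActorNet(nn.Module):
    """Policy network: maps the state features to a routing behavior vector."""
    def __init__(self, state_dim, n_paths, n_layers=15, hidden=128):
        super().__init__()
        dims = [state_dim] + [hidden] * (n_layers - 1)
        self.body = nn.Sequential(*[
            layer for i in range(n_layers - 1)
            for layer in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
        ])
        self.head = nn.Linear(hidden, n_paths)

    def forward(self, x):
        # Softmax keeps the per-path traffic proportions positive and normalized.
        return torch.softmax(self.head(self.body(x)), dim=-1)

class CriticNet(nn.Module):
    """Value network: estimates the value of the current network state."""
    def __init__(self, state_dim, n_layers=15, hidden=128):
        super().__init__()
        dims = [state_dim] + [hidden] * (n_layers - 1)
        self.body = nn.Sequential(*[
            layer for i in range(n_layers - 1)
            for layer in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
        ])
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.head(self.body(x))
```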
step 4), initializing a flow matrix DM and an equivalent path flow ratio matrix PM:
initializing a traffic matrix DM and an equivalent-path traffic proportion matrix PM, both of size a × a; each element DM_ij of DM is randomly assigned a value with DM_ij ≥ 0, and every element of PM is assigned an equal value in (0, 1);
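A NumPy sketch of this initialization follows; the uniform demand range and the choice of the equal initial share 1/n_paths_per_pair are assumptions, since the patent only requires DM_ij ≥ 0 and an equal value in (0, 1):

```python
import numpy as np

def init_demand_and_proportions(a, n_paths_per_pair, max_demand=100.0, seed=0):
    """Initialize the traffic matrix DM and the traffic-proportion matrix PM (step 4)."""
    rng = np.random.default_rng(seed)
    DM = rng.uniform(0.0, max_demand, size=(a, a))   # random non-negative demands
    np.fill_diagonal(DM, 0.0)                        # no traffic from a node to itself
    PM = np.full((a, a), 1.0 / n_paths_per_pair)     # equal split over equivalent paths
    return DM, PM

DM, PM = init_demand_and_proportions(a=16, n_paths_per_pair=4)
```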
step 5), performing iterative training on the multi-path routing model based on reinforcement learning in the experimental network G:
(5a) initializing the weight parameter sets θ and ω of the local agents and the weight parameter sets θ_g and ω_g of the global neural network according to the standard normal distribution, completing the random initialization of the neural networks; initializing an experience replay set D of length N, N ≥ 10^4; initializing the iteration counter k, the maximum number of iterations K with K ≥ 10^6, and the initial sampling state S_0 of the network environment, with k = 0 and S_0 = 0. The experience replay set in this example stores the network states and change information collected over a period of time, with N = 10^4.
Referring to FIG. 2, the specific steps for training the routing decision model of a single agent are described in further detail:
(5b) synchronizing the global neural network weight parameter sets to the num_a local agents, i.e. θ = θ_g, ω = ω_g;
(5c) sending traffic of size DM_ij × PM_ij onto the equivalent paths of G according to the numbers in H, measuring, via the SDN controller, the bandwidth utilization of the bottleneck link of the equivalent path corresponding to each number in H, taking the bottleneck-link bandwidth utilization as the bandwidth utilization of that equivalent path, and taking the bandwidth utilizations S_t of the n equivalent paths as the current sampling state of G. Here the bottleneck link is the link with the maximum bandwidth utilization within an equivalent path whose source node and destination node are the same; each element of the traffic demand matrix DM represents the amount of traffic to be sent from one source node to a destination node, and each entry of the equivalent-path traffic proportion matrix PM represents the share of that traffic to be carried by each path of the corresponding group of equivalent paths;
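A small sketch of how the per-path state S_t might be derived from per-link measurements is given below; the data structures `path_links` and `link_utilization` are assumptions standing in for the information reported by the SDN controller:

```python
import numpy as np

def path_utilizations(path_links, link_utilization):
    """Step (5c): derive per-path bandwidth utilization from per-link measurements.

    `path_links` maps each equivalent-path number (1..n) to the list of link ids
    it traverses; `link_utilization` maps a link id to its measured utilization.
    A path's utilization is that of its bottleneck link, i.e. the maximum over
    the links it traverses.
    """
    ordered_paths = sorted(path_links)  # path numbers 1..n in order
    return np.array([max(link_utilization[l] for l in path_links[p])
                     for p in ordered_paths])
```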
(5d) using a state-gain algorithm, computing a state-gain vector Φ(ΔS) from the difference between S_t and S_(t-1), while converting S_t into a feature vector Φ(S_t); then taking Φ(S_t) and Φ(ΔS) as inputs of the Actor neural networks of the num_a local agents and computing the routing decision behavior vector A_t according to:
A_t = π((Φ(S_t) + Φ(ΔS)), θ)
where π denotes the behavior decision algorithm, whose input is the sum of the two feature vectors Φ(S_t) and Φ(ΔS);
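For illustration, the action computation can be sketched with the ActorNet assumed earlier; treating the feature maps Φ as the identity on the utilization vectors is an assumption, since the patent leaves them unspecified:

```python
import torch

def select_action(actor, S_t, S_prev):
    """Step (5d): a minimal sketch of A_t = pi(Phi(S_t) + Phi(dS), theta)."""
    phi_S = torch.as_tensor(S_t, dtype=torch.float32)
    phi_dS = torch.as_tensor(S_t - S_prev, dtype=torch.float32)  # state-gain vector
    with torch.no_grad():
        A_t = actor(phi_S + phi_dS)  # routing behavior vector over the n paths
    return A_t
```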
(5e) obtaining, by the method of step (5c), the bandwidth utilizations S_(t+1) of the n equivalent paths after G executes A_t, taking S_(t+1) as the sampling state of G after the state transition, and computing the reward value R_t of the network environment from S_(t+1). The calculation groups together the equivalent paths that share the same source node and destination node, computes the variance of the bandwidth utilization within each group, and sums the variances of all groups to obtain the reward value R_t of the reinforcement-learning algorithm. Because the degree of load balance and the network throughput are unified, in a multi-path network structure balancing the load of the equivalent paths improves the overall throughput of the network, so minimizing the difference in bandwidth utilization among the equivalent paths is taken as the optimization target;
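A sketch of this reward calculation follows; `path_groups` is an assumed mapping from a source-destination pair to the indices of its equivalent paths, and whether the sum of variances is used directly or negated so that better balance yields a larger reward is left as an implementation choice:

```python
import numpy as np

def reward(S_next, path_groups):
    """Step (5e): sum over groups of the within-group variance of bandwidth utilization."""
    total_variance = 0.0
    for indices in path_groups.values():
        group = np.asarray([S_next[i] for i in indices])
        total_variance += group.var()
    return total_variance

# Toy usage: two groups of two equivalent paths each.
S_next = np.array([0.30, 0.50, 0.40, 0.40])
groups = {("h0", "h1"): [0, 1], ("h0", "h2"): [2, 3]}
print(reward(S_next, groups))  # 0.01 + 0.0 = 0.01
```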
(5f) combining S_t, A_t, R_t and S_(t+1) into the experience tuple {S_t, A_t, R_t, S_(t+1)} and storing it in the experience replay set D, realizing the state transition of G;
(5g) randomly sampling M samples from D, M ≥ 128, where {S_k, A_k, R_k, S_(k+1)} denotes the k-th sample; computing the parameter update gradient dω of ω and updating the local Critic neural network weight parameters ω, and computing the parameter update gradient dθ of θ and updating the local Actor neural network weight parameters θ. In this example M = 256, and the calculation formulas are:
dθ ← dθ + ∇_θ log π(A_k | S_k; θ)(R − V(S_k; ω))
dω ← dω + ∇_ω (R − V(S_k; ω))^2
θ = θ − α·dθ
ω = ω − β·dω
where V denotes the behavior value algorithm and π denotes the behavior decision algorithm. Random noise values are added to θ and ω; adding noise to the neural network parameters instead of the action space is more reasonable. The noise values are random numbers in the range 0 to 0.3, and their amplitude decreases as the iteration count k increases, so that the influence of the noise on the agent's decisions gradually diminishes and the probability that the agent's Actor neural network outputs the optimal routing result increases. α and β are the learning rates of the Actor and Critic neural networks in the local agent, respectively, both set to the constant 0.01;
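A PyTorch sketch of the local update follows, assuming the ActorNet/CriticNet sketched earlier and a batch of sampled transitions (S_k, A_k, R_k, S_(k+1)); taking the return R to be the immediate reward R_k and approximating log π by the mean log of the output proportions are simplifications of this sketch, not the patent's definition:

```python
import torch

def local_update(actor, critic, batch, alpha=0.01, beta=0.01):
    """Step (5g): accumulate dtheta/domega from a sampled batch and apply them."""
    S = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s, _, _, _ in batch])
    R = torch.tensor([r for _, _, r, _ in batch], dtype=torch.float32).unsqueeze(-1)

    V = critic(S)                               # V(S_k; omega)
    advantage = (R - V).detach()                # (R - V), held constant for the actor

    # log pi(A_k | S_k; theta), approximated by the mean log output proportion.
    log_pi = torch.log(actor(S) + 1e-8).mean(dim=-1, keepdim=True)

    actor_loss = -(log_pi * advantage).mean()   # dtheta ~ grad of log pi * (R - V)
    critic_loss = ((R - V) ** 2).mean()         # domega ~ grad of (R - V)^2

    actor.zero_grad(); critic.zero_grad()
    (actor_loss + critic_loss).backward()

    with torch.no_grad():                       # theta -= alpha*dtheta, omega -= beta*domega
        for p in actor.parameters():
            p -= alpha * p.grad
        for p in critic.parameters():
            p -= beta * p.grad
```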
(5h) updating the global neural network weight parameters using the updated ω and θ, as follows:
θg←τθg+(1-τ)θ
ωg←τωg+(1-τ)ω
where τ is the learning efficiency, with τ = 0.8;
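This soft update can be sketched as below for the PyTorch networks assumed earlier; the same routine would be applied to the Actor pair (θ_g, θ) and the Critic pair (ω_g, ω):

```python
import torch

def update_global(global_net, local_net, tau=0.8):
    """Step (5h): theta_g <- tau*theta_g + (1-tau)*theta (and likewise omega_g)."""
    with torch.no_grad():
        for g_param, l_param in zip(global_net.parameters(), local_net.parameters()):
            g_param.mul_(tau).add_((1.0 - tau) * l_param)

# update_global(global_actor, local_actor); update_global(global_critic, local_critic)
```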
(5i) updating the equivalent-path traffic proportion matrix PM with the behavior value corresponding to each path in the routing decision vector A_t, and judging whether k = K; if so, the trained routing decision model is obtained; otherwise let k = k + 1 and return to step (5b); in this example K = 10^6;
Step 6), migrating the global neural network weight parameters in the routing decision model trained in the experimental network G to a real network Z based on a migration learning method:
keeping the weight parameters of the first l layers of the global neural network in the routing decision model trained in the experimental network G unchanged, and randomly re-initializing the weight parameters of the remaining L − l layers, with the result serving as the initialized global neural network in the real network Z, which completes the migration. Because a local change of the network environment has little influence on the state distribution of the network traffic data, the experience and knowledge of the routing decision model in the original network can be retained, which improves the convergence of the model in the real network environment and accelerates its training; the lower-layer neural network parameters are mainly used to extract and perceive the general characteristics of the communication network, so the weight parameters of the first l layers are kept unchanged. In this example L = 15 and l = 10;
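A sketch of this layer-wise transfer is given below, assuming the networks are built from nn.Linear layers as in the earlier sketch; the first `keep_layers` linear layers are copied from the model trained in G, while the remaining layers keep their fresh random initialization for adaptation in Z:

```python
import torch

def transfer_weights(trained_net, fresh_net, keep_layers=10):
    """Step 6: copy the first l layers from the trained model, keep the rest random."""
    trained_linears = [m for m in trained_net.modules() if isinstance(m, torch.nn.Linear)]
    fresh_linears = [m for m in fresh_net.modules() if isinstance(m, torch.nn.Linear)]
    with torch.no_grad():
        for t_layer, f_layer in zip(trained_linears[:keep_layers], fresh_linears[:keep_layers]):
            f_layer.weight.copy_(t_layer.weight)
            f_layer.bias.copy_(t_layer.bias)
    return fresh_net
```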
step 7), performing adaptive training on the global neural network initialized in the real network Z:
initializing the maximum number of iterations K_T with K_T ≥ 10^4, and adaptively training the global neural network initialized in Z within the real network Z according to the method of steps (5b)-(5h), obtaining a routing decision model that matches the characteristics of Z. In this example K_T = 10^4, far fewer iterations than in the experimental network, which shows that the training of the model is accelerated.

Claims (6)

1. A multipath routing method based on reinforcement learning and transfer learning is characterized by comprising the following steps:
(1) constructing a real network Z and an experimental network G whose topology is identical to that of Z:
constructing a real network Z comprising a server nodes and m switch nodes, and an experimental network G whose topology is identical to that of Z, where each server node is a source node and a destination node for the other server nodes; the n equivalent paths formed by connecting each source node to every other destination node through one or more switch nodes are numbered from 1 to n, where a ≥ 16 and m ≥ 16;
(2) establishing a two-dimensional array H:
establishing a two-dimensional array H with the a source nodes as the abscissa and the a destination nodes as the ordinate, and storing the equivalent-path numbers between each source-destination node pair at the corresponding position of H;
(3) constructing a multi-path routing model based on reinforcement learning:
constructing a multi-path routing model based on the reinforcement-learning algorithm A3C, comprising a global neural network and num_a independent local agents; the global neural network and the local agents all adopt an Actor-Critic neural network structure with L fully connected layers; the weight parameter sets of the Actor and Critic neural networks in the global neural network are θ_g and ω_g respectively, and the weight parameter sets of the Actor and Critic neural networks in each local agent are θ and ω respectively, where num_a ≥ 10 and L ≥ 15;
(4) Initializing traffic matrix DM and equivalent path traffic proportion matrix PM:
initializing a traffic matrix DM and an equivalent-path traffic proportion matrix PM, both of size a × a; each element DM_ij of DM is randomly assigned a value with DM_ij ≥ 0, and every element of PM is assigned an equal value in (0, 1);
(5) performing iterative training on a multi-path routing model based on reinforcement learning in an experimental network G:
(5a) initializing the weight parameter sets θ and ω of the local agents and the weight parameter sets θ_g and ω_g of the global neural network according to the standard normal distribution; initializing an experience replay set D of length N, N ≥ 10^4; initializing the iteration counter k, the maximum number of iterations K with K ≥ 10^6, and the initial sampling state S_0 of the network environment, with k = 0 and S_0 = 0;
(5b) synchronizing the global neural network weight parameters θ_g and ω_g to the num_a local agents, i.e. θ = θ_g, ω = ω_g;
(5c) sending traffic of size DM_ij × PM_ij onto the equivalent paths of G according to the numbers in H, measuring, via the SDN controller, the bandwidth utilization of the bottleneck link of the equivalent path corresponding to each number in H, taking the bottleneck-link bandwidth utilization as the bandwidth utilization of that equivalent path, and taking the bandwidth utilizations S_t of the n equivalent paths as the current sampling state of G;
(5d) using a state-gain algorithm, computing a state-gain vector Φ(ΔS) from the difference between S_t and S_(t-1), while converting S_t into a feature vector Φ(S_t); then taking Φ(S_t) and Φ(ΔS) as inputs of the Actor neural networks of the num_a local agents and computing the routing decision behavior vector A_t;
(5e) obtaining, by the method of step (5c), the bandwidth utilizations S_(t+1) of the n equivalent paths after G executes A_t, taking S_(t+1) as the sampling state of G after the state transition, and computing the reward value R_t of the network environment from S_(t+1);
(5f) combining S_t, A_t, R_t and S_(t+1) into the experience tuple {S_t, A_t, R_t, S_(t+1)} and storing it in the experience replay set D, realizing the state transition of G;
(5g) randomly sampling M samples from D, where M ≥ 128 and {S_k, A_k, R_k, S_(k+1)} denotes the k-th sample, computing the parameter update gradient dω of ω and the parameter update gradient dθ of θ, and updating ω with dω and θ with dθ;
(5h) updating the global neural network weight parameters using the updated ω and θ;
(5i) updating the equivalent-path traffic proportion matrix PM with the behavior value corresponding to each path in the routing decision vector A_t, and judging whether k = K; if so, the routing decision model trained in the experimental network G is obtained; otherwise, let k = k + 1 and return to step (5b);
(6) migrating global neural network weight parameters in a routing decision model trained in an experimental network G to a real network Z based on a migration learning method:
keeping the weight parameters of the first l layers (l < L) of the global neural network in the routing decision model trained in the experimental network G unchanged, and randomly re-initializing the weight parameters of the remaining L − l layers to serve as the global neural network initialized in the real network Z, thereby completing the migration;
(7) carrying out adaptive training on the global neural network initialized in the real network Z:
initializing the maximum number of iterations K_T with K_T ≥ 10^4, and adaptively training the global neural network initialized in Z within the real network Z according to the method of steps (5b)-(5h), obtaining a routing decision model that matches the characteristics of Z.
2. The multi-path routing method based on reinforcement learning and transfer learning of claim 1, wherein the bottleneck link in step (5c) is the link with the maximum bandwidth utilization within an equivalent path whose source node and destination node are the same.
3. The multi-path routing method based on reinforcement learning and transfer learning of claim 1, wherein the routing decision behavior vector A_t in step (5d) is calculated by the formula:
A_t = π((Φ(S_t) + Φ(ΔS)), θ)
where π denotes the behavior decision algorithm.
4. The multi-path routing method based on reinforcement learning and transfer learning of claim 1, wherein the reward value R_t in step (5e) is calculated by grouping together the equivalent paths that share the same source node and destination node, computing the variance of the bandwidth utilization within each group, and summing the variances of all groups to obtain the reward value R_t of the reinforcement-learning algorithm.
5. The multi-path routing method based on reinforcement learning and transfer learning of claim 1, wherein in step (5g) the parameter update gradient dω of ω and the parameter update gradient dθ of θ are calculated, ω is updated with dω and θ is updated with dθ, according to:
dθ ← dθ + ∇_θ log π(A_k | S_k; θ)(R − V(S_k; ω))
dω ← dω + ∇_ω (R − V(S_k; ω))^2
θ = θ − α·dθ
ω = ω − β·dω
where V denotes the behavior value algorithm, π denotes the behavior decision algorithm, the random noise values added to θ and ω are random numbers in the range 0 to 0.3, and α and β are the learning rates of the Actor and Critic neural networks in the local agent, respectively, both set to the constant 0.01.
6. The multi-path routing method based on reinforcement learning and transfer learning of claim 1, wherein in step (5h) the global neural network weight parameters are updated using the updated ω and θ, as follows:
θg←τθg+(1-τ)θ
ωg←τωg+(1-τ)ω
where τ is the learning efficiency, with τ = 0.8.
CN202010840208.XA 2020-08-19 2020-08-19 Multi-path routing method based on reinforcement learning and transfer learning Active CN111988225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010840208.XA CN111988225B (en) 2020-08-19 2020-08-19 Multi-path routing method based on reinforcement learning and transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010840208.XA CN111988225B (en) 2020-08-19 2020-08-19 Multi-path routing method based on reinforcement learning and transfer learning

Publications (2)

Publication Number Publication Date
CN111988225A (en) 2020-11-24
CN111988225B (en) 2022-03-04

Family

ID=73435767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010840208.XA Active CN111988225B (en) 2020-08-19 2020-08-19 Multi-path routing method based on reinforcement learning and transfer learning

Country Status (1)

Country Link
CN (1) CN111988225B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783199A (en) * 2020-12-25 2021-05-11 北京航空航天大学 Unmanned aerial vehicle autonomous navigation method based on transfer learning
CN112822109A (en) * 2020-12-31 2021-05-18 上海缔安科技股份有限公司 SDN core network QoS route optimization algorithm based on reinforcement learning
CN112866015A (en) * 2021-01-07 2021-05-28 华东师范大学 Intelligent energy-saving control method based on data center network flow prediction and learning
CN113518039A (en) * 2021-03-03 2021-10-19 山东大学 Deep reinforcement learning-based resource optimization method and system under SDN architecture
CN114221691A (en) * 2021-12-17 2022-03-22 南京工业大学 Software-defined air-space-ground integrated network route optimization method based on deep reinforcement learning
CN114866494A (en) * 2022-07-05 2022-08-05 之江实验室 Reinforced learning intelligent agent training method, modal bandwidth resource scheduling method and device
CN115022231A (en) * 2022-06-30 2022-09-06 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN117202239A (en) * 2023-11-06 2023-12-08 深圳市四海伽蓝电子科技有限公司 Method and system for unified management of wireless network bridge network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101835239A (en) * 2010-03-09 2010-09-15 西安电子科技大学 Multi-path delay sensing optimal route selecting method for cognitive network
CN110611619A (en) * 2019-09-12 2019-12-24 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
US20200195577A1 (en) * 2018-12-17 2020-06-18 Electronics And Telecommunications Research Institute System and method for selecting optimal path in multi-media multi-path network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101835239A (en) * 2010-03-09 2010-09-15 西安电子科技大学 Multi-path delay sensing optimal route selecting method for cognitive network
US20200195577A1 (en) * 2018-12-17 2020-06-18 Electronics And Telecommunications Research Institute System and method for selecting optimal path in multi-media multi-path network
CN110611619A (en) * 2019-09-12 2019-12-24 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINGYI CHENG, ET AL.: "DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting", IEEE *
LIU BO ET AL.: "Knowledge transfer reinforcement learning between heterogeneous agents", Sciencepaper Online *
WANG GUIZHI ET AL.: "A survey on the application of machine learning in SDN routing optimization", Journal of Computer Research and Development *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783199B (en) * 2020-12-25 2022-05-13 北京航空航天大学 Unmanned aerial vehicle autonomous navigation method based on transfer learning
CN112783199A (en) * 2020-12-25 2021-05-11 北京航空航天大学 Unmanned aerial vehicle autonomous navigation method based on transfer learning
CN112822109A (en) * 2020-12-31 2021-05-18 上海缔安科技股份有限公司 SDN core network QoS route optimization algorithm based on reinforcement learning
CN112822109B (en) * 2020-12-31 2023-04-07 上海缔安科技股份有限公司 SDN core network QoS route optimization method based on reinforcement learning
CN112866015A (en) * 2021-01-07 2021-05-28 华东师范大学 Intelligent energy-saving control method based on data center network flow prediction and learning
CN113518039B (en) * 2021-03-03 2023-03-24 山东大学 Deep reinforcement learning-based resource optimization method and system under SDN architecture
CN113518039A (en) * 2021-03-03 2021-10-19 山东大学 Deep reinforcement learning-based resource optimization method and system under SDN architecture
CN114221691A (en) * 2021-12-17 2022-03-22 南京工业大学 Software-defined air-space-ground integrated network route optimization method based on deep reinforcement learning
CN115022231A (en) * 2022-06-30 2022-09-06 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN115022231B (en) * 2022-06-30 2023-11-03 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN114866494A (en) * 2022-07-05 2022-08-05 之江实验室 Reinforced learning intelligent agent training method, modal bandwidth resource scheduling method and device
CN114866494B (en) * 2022-07-05 2022-09-20 之江实验室 Reinforced learning intelligent agent training method, modal bandwidth resource scheduling method and device
CN117202239A (en) * 2023-11-06 2023-12-08 深圳市四海伽蓝电子科技有限公司 Method and system for unified management of wireless network bridge network
CN117202239B (en) * 2023-11-06 2024-02-20 深圳市四海伽蓝电子科技有限公司 Method and system for unified management of wireless network bridge network

Also Published As

Publication number Publication date
CN111988225B (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN111988225B (en) Multi-path routing method based on reinforcement learning and transfer learning
CN110611619B (en) Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN109818865B (en) SDN enhanced path boxing device and method
CN112437020B (en) Data center network load balancing method based on deep reinforcement learning
CN113328938B (en) Network autonomous intelligent management and control method based on deep reinforcement learning
CN109039942B (en) Network load balancing system and balancing method based on deep reinforcement learning
CN112486690B (en) Edge computing resource allocation method suitable for industrial Internet of things
CN108566659B (en) 5G network slice online mapping method based on reliability
CN110784366B (en) Switch migration method based on IMMAC algorithm in SDN
CN105515987B (en) A kind of mapping method based on SDN framework Virtual optical-fiber networks
CN114567598B (en) Load balancing method and device based on deep learning and cross-domain cooperation
CN110891019B (en) Data center flow scheduling method based on load balancing
CN111917642B (en) SDN intelligent routing data transmission method for distributed deep reinforcement learning
CN116527567B (en) Intelligent network path optimization method and system based on deep reinforcement learning
CN114143264B (en) Flow scheduling method based on reinforcement learning under SRv network
CN108964746B (en) Time-varying satellite network multi-topology searching shortest routing method
CN108400940B (en) A kind of multicast virtual network function dispositions method based on Estimation of Distribution Algorithm
CN114707575B (en) SDN multi-controller deployment method based on AP clustering
Hu et al. EARS: Intelligence-driven experiential network architecture for automatic routing in software-defined networking
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN113612692B (en) Centralized optical on-chip network self-adaptive route planning method based on DQN algorithm
CN111885493B (en) Micro-cloud deployment method based on improved cuckoo search algorithm
CN115225561A (en) Route optimization method and system based on graph structure characteristics
CN114828146A (en) Routing method for geographical position of unmanned cluster based on neural network and iterative learning
CN115225512B (en) Multi-domain service chain active reconfiguration mechanism based on node load prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant