CN111917642A - SDN intelligent routing data transmission method for distributed deep reinforcement learning - Google Patents
SDN intelligent routing data transmission method for distributed deep reinforcement learning
- Publication number
- CN111917642A (application CN202010673851.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/124—Shortest path evaluation using a combination of metrics
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/121—Shortest path evaluation by minimising delays
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/125—Shortest path evaluation based on throughput or bandwidth
Abstract
The invention discloses an SDN intelligent routing data transmission method based on distributed deep reinforcement learning, which computes routing paths quickly, maximizes throughput while guaranteeing delay, and overcomes the low speed and low throughput of traditional algorithms. The reinforcement learning algorithm reduces route calculation to a simple input-output mapping and avoids the repeated iterations of conventional computation, so routing paths are computed quickly; the routing algorithm is faster, forwarding delay is reduced, packets that would previously have been dropped when their TTL expired are more likely to survive and be forwarded successfully, and network throughput increases. The invention has an off-line training stage and an on-line training stage, and updates its parameters in a dynamic environment to select the optimal path, giving it topology adaptivity.
Description
Technical Field
The invention belongs to the field of data transmission, and particularly relates to an SDN intelligent routing data transmission method for distributed deep reinforcement learning.
Background
Information technology is now in a mature stage. In an SDN (Software Defined Network) architecture, data flows are flexible and controllable: the controller has a full network view and can sense network state changes (such as traffic distribution, congestion and link utilization) in real time. In practice, however, the routing problem is usually solved with a shortest path algorithm, taking simple network parameters (such as path hop count or delay) as the optimization index, so that finding the path with the fewest hops or the lowest delay becomes the algorithm's final goal. Such a single metric and optimization target easily congests some critical links and thus unbalances the network load. Although a shortest-path routing algorithm based on Lagrangian relaxation can find an optimal path under multiple compound constraints when distributing paths for multiple services, such a heuristic routing algorithm reaches the optimal path only after many iterations, and therefore converges slowly, responds poorly in real time, and achieves low throughput.
Disclosure of Invention
To address the defects of the prior art, the present invention provides an SDN intelligent routing data transmission method for distributed deep reinforcement learning that solves the problems described above.
To achieve the purpose of the invention, the invention adopts the following technical scheme: an SDN intelligent routing data transmission method for distributed deep reinforcement learning, comprising the following steps:
S1, construct a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and deploy the deep reinforcement learning model in the application layer of the SDN network;
S2, randomly initialize the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initialize the local actor parameter θ'_a of the actor network and the local evaluator parameter θ'_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ'_a and the local evaluator parameter θ'_c, train the deep reinforcement learning model on the i-th local GPU_i off-line with the A3C algorithm, and update the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, apply the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmit data with the SDN network after the parameter update;
S6, periodically detect whether the topology of the SDN network has changed; if so, go to step S7, otherwise repeat step S6;
S7, train the deep reinforcement learning model on-line, update the actor network parameter θ_a and the evaluator network parameter θ_c with an adaptive optimization algorithm, apply the updated parameters to the whole SDN network, and transmit data with the SDN network after the parameter update;
where i = 1, 2, …, L, and L denotes the total number of local GPUs.
Further, the actor network in step S1 is a fully connected neural network, and the evaluator network in step S1 is a combination of a fully connected neural network and a CNN convolutional neural network. The inputs of both the actor network and the evaluator network comprise the network state of the SDN network, the network state comprising current node information, destination node information, bandwidth requirement and delay requirement; the input of the evaluator network further comprises network features of the SDN network processed by the CNN convolutional neural network. The CNN convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer connected in sequence.
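As a rough illustration of this two-network architecture, the following Python/NumPy sketch (layer sizes and the feature-vector width are illustrative assumptions, not specified by the patent) shows an actor as a fully connected network emitting a probability distribution over next-hop actions, and an evaluator that appends a CNN-extracted topology feature vector to the same four-field state and emits a single value:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ActorNetwork:
    """Fully connected actor: maps the four-field network state
    (current node, destination node, bandwidth requirement, delay
    requirement) to a probability distribution over next-hop actions."""
    def __init__(self, state_dim=4, hidden=16, n_actions=8):
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))

    def policy(self, state):
        h = np.tanh(state @ self.w1)      # weighted sum + activation
        return softmax(h @ self.w2)       # pi(a | s; theta_a)

class EvaluatorNetwork:
    """Evaluator: the same four-field state concatenated with a
    CNN-extracted topology feature vector; single scalar output
    V(s; theta_c)."""
    def __init__(self, state_dim=4, feat_dim=6, hidden=16):
        self.w1 = rng.normal(0.0, 0.1, (state_dim + feat_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))

    def value(self, state, topo_features):
        x = np.concatenate([state, topo_features])
        return (np.tanh(x @ self.w1) @ self.w2).item()

state = np.array([1.0, 5.0, 0.3, 0.2])   # current node, dest, bw, delay
probs = ActorNetwork().policy(state)      # distribution over next hops
v = EvaluatorNetwork().value(state, np.zeros(6))
```

The actor is multi-output (one probability per candidate next hop) while the evaluator is single-output, matching the description of the two networks below.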
Further, the reward function in step S1 yields a reward value r_n^m(s_n, a_n), which denotes the reward obtained after the n-th routing node in the SDN network, in state s_n, takes action a_n toward the m-th routing node. Here g denotes the action penalty, a_1 the first weight, a_2 the second weight, c(n) the remaining capacity of the n-th routing node, c(m) the remaining capacity of the m-th routing node, c(l) the remaining capacity of the l-th link in the SDN network, d(n) the traffic-load difference between the n-th routing node and its neighbouring nodes, and d(m) the traffic-load difference between the m-th routing node and its neighbouring nodes. The state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet. The action a_n denotes all forwarding operations that may be taken in state s_n.
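The reward formula itself appears as an image in the original publication and is not recoverable here. The sketch below therefore only illustrates one plausible weighting consistent with the surrounding description: reward rising with the remaining capacities c(n), c(m), c(l), falling with the load-difference terms d(n), d(m), minus the action penalty g. The linear combination and default weights are assumptions, not the patent's exact formula.

```python
def reward(c_n, c_m, c_l, d_n, d_m, a1=0.5, a2=0.5, g=1.0):
    """Assumed shape: reward rises with remaining capacities c(n), c(m),
    c(l), falls with load-difference terms d(n), d(m), minus the fixed
    action penalty g. Weights a1, a2 and the linear combination are
    illustrative assumptions, not the patent's published formula."""
    return a1 * (c_n + c_m + c_l) - a2 * (d_n + d_m) - g
```

Under this shape, forwarding toward well-provisioned, evenly loaded nodes is rewarded, while every extra hop costs the penalty g, which is what the description's constraint on node/link load and hop count requires.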
Further, step S4 comprises the following sub-steps:
S41, set a first counter t = 0 and a second counter T = 0, and set the maximum iteration count T_max and the routing hop-count limit t_max;
S42, let dθ_a = 0 and dθ_c = 0, and synchronize the local parameters with the global parameters: set the local actor parameter θ'_a to the value of the actor network parameter θ_a, and the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S43, let the first intermediate count value t_start = t, and read the state s_t at the current moment through the local GPU_i;
S44, obtain the policy π(a_t|s_t; θ'_a) through the actor network and execute action a_t according to the policy π(a_t|s_t; θ'_a), where π(a_t|s_t; θ'_a) denotes that a_t is the action to perform in state s_t under the local actor parameter θ'_a on the local GPU_i;
S45, obtain the reward value r_t and the new state s_{t+1} after executing action a_t, and increment the first counter t by one;
S46, judge whether the new state s_t satisfies the condition defined by the final state; if so, set the updated reward value R = 0 and go to step S48, otherwise go to step S47;
S47, judge whether t − t_start is greater than the routing hop-count limit t_max; if so, set the updated reward value R = V(s_t, θ'_c) and go to step S48, otherwise return to step S44, where V(s_t, θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_t under the local evaluator parameter θ'_c;
S48, set a third counter z = t − 1 and the gradient-update reward value R_update = r_z + γR, and initialize the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S49, according to the gradient-update reward value R_update, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtain the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c:
Δθ_a_update = Δθ_a + ∇_θ'_a log π(a_z|s_z; θ'_a) · (R_update − V(s_z; θ'_c))
Δθ_c_update = Δθ_c + ∂(R_update − V(s_z; θ'_c))² / ∂θ'_c
where Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ'_a denotes the derivative with respect to the local actor parameter θ'_a, log π(a_z|s_z; θ'_a) denotes the logarithm of the probability of executing action a_z under parameter θ'_a in state s_z, r_z denotes the reward value after executing action a_z, γ denotes the reward discount rate, V(s_z; θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_z under the local evaluator parameter θ'_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_z; θ'_c))²/∂θ'_c denotes the partial derivative of (R_update − V(s_z; θ'_c))² with respect to θ'_c;
S410, let Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update and R = R_update, and judge whether the third counter z equals the first intermediate count value t_start; if so, go to step S411, otherwise decrement the third counter z by one, update the gradient-update reward value R_update to r_z + γR, and return to step S49;
S411, judge whether the second counter T is greater than or equal to the maximum iteration count T_max; if so, update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively and end the update process; otherwise increment the second counter T by one and return to step S42.
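The reverse sweep of steps S48–S410 can be sketched as follows. The function starts from R = 0 at a terminal state (or R = V(s_t; θ'_c) when the hop limit was hit), walks the trajectory backwards with R ← r_z + γR, and records the advantage term R − V(s_z; θ'_c) that weights both gradient updates; the rewards and value estimates are supplied by the caller:

```python
def accumulate_advantages(rewards, values, gamma=0.9, bootstrap=0.0):
    """Reverse sweep of steps S48-S410: start from R = bootstrap (0 at a
    terminal state, V(s_t; theta'_c) when the hop limit was hit), walk the
    trajectory backwards with R <- r_z + gamma * R, and record the
    advantage R - V(s_z; theta'_c) for each step (latest step first)."""
    R = bootstrap
    advantages = []
    for r_z, v_z in zip(reversed(rewards), reversed(values)):
        R = r_z + gamma * R
        advantages.append(R - v_z)
    return advantages
```

Each returned advantage multiplies ∇ log π for the actor gradient and enters the squared value loss for the evaluator gradient, as in the S49 formulas.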
Further, the formula used in step S411 to update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c is:
θ_a_update = θ_a + βΔθ_a
θ_c_update = θ_c + βΔθ_c
where θ_a_update denotes the updated actor network parameter θ_a, θ_c_update denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
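A minimal sketch of this global update (β = 0.1 is an illustrative value; the patent leaves each GPU's weight unspecified):

```python
import numpy as np

def apply_local_gradient(theta, delta_theta, beta=0.1):
    """Step S411 global update: theta <- theta + beta * delta_theta,
    where beta weights the contribution of this local GPU_i (0.1 is an
    illustrative value)."""
    return theta + beta * delta_theta
```

Each local GPU_i contributes its accumulated gradient scaled by its own β, which is how the A3C workers' asynchronous updates combine into the global parameters.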
Further, step S7 comprises the following sub-steps:
S71, set a fourth counter j = 1 and collect a routing request task f;
S72, assign the routing request task f to an idle GPU in the SDN network, denoted GPU_idle;
S73, set dθ_a = 0 and dθ_c = 0, and synchronize the local actor parameter θ'_a of GPU_idle to the value of the actor network parameter θ_a and the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S74, let the second intermediate count value j_start = j and read the initial state s_j at the current moment;
S75, obtain through the actor network the policy π(a_j|s_j; θ'_a) for executing action a_j in state s_j under the local actor parameter θ'_a, and execute the policy π(a_j|s_j; θ'_a);
S76, obtain the reward value r_j and the new state s_{j+1} after executing action a_j, increment the fourth counter j by one, and add action a_j to the action set A;
S77, judge whether the new state s_j reaches the condition defined by the final state of the routing request task f; if so, go to step S78, otherwise return to step S75;
S78, obtain the routing path p from the action set A and judge whether the routing request task f matches the routing path p; if so, set the updated reward value R = 0 and go to step S79, otherwise set the updated reward value R = V(s_j, θ'_c) and go to step S79;
S79, set a fifth counter k = j − 1 and the gradient-update reward value R_update = r_k + γR, and initialize the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S710, according to the gradient-update reward value R_update, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtain the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c:
Δθ_a_update = Δθ_a + ∇_θ'_a log π(a_k|s_k; θ'_a) · (R_update − V(s_k; θ'_c))
Δθ_c_update = Δθ_c + ∂(R_update − V(s_k; θ'_c))² / ∂θ'_c
where Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ'_a denotes the derivative with respect to the local actor parameter θ'_a, log π(a_k|s_k; θ'_a) denotes the logarithm of the probability of executing action a_k under parameter θ'_a in state s_k, r_k denotes the reward value after executing action a_k, γ denotes the reward discount rate, V(s_k; θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_k under the local evaluator parameter θ'_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_k; θ'_c))²/∂θ'_c denotes the partial derivative of (R_update − V(s_k; θ'_c))² with respect to θ'_c;
S711, let Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update and R = R_update, and judge whether the fifth counter k equals the second intermediate count value j_start; if so, go to step S712, otherwise decrement the fifth counter k by one, update the gradient-update reward value R_update to r_k + γR, and return to step S710;
S712, update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, apply the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmit data with the SDN network after the parameter update.
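Steps S76–S78 assemble the routing path p from the action set A and check it against the request f. A minimal sketch, assuming each action names the next-hop node (an encoding the patent does not fix) and that matching a request means starting at its source and ending at its destination:

```python
def path_from_actions(start_node, action_set):
    """Step S78 sketch: replay the forwarding actions collected in the
    action set A to obtain the routing path p. Each action is assumed
    to name the next-hop node (an illustrative encoding)."""
    path = [start_node]
    for next_hop in action_set:
        path.append(next_hop)
    return path

def matches_request(path, src, dst):
    """Accept path p for request f only if it starts at the request's
    source node and ends at its destination node (assumed criterion)."""
    return path[0] == src and path[-1] == dst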
The beneficial effects of the invention are:
(1) The invention computes routing paths quickly, maximizes throughput while guaranteeing delay, and overcomes the low speed and low throughput of traditional algorithms.
(2) The reinforcement learning algorithm reduces route calculation to a simple input-output mapping and avoids repeated iterations, so routing paths are computed quickly; the routing algorithm is faster, forwarding delay is reduced, packets that would previously have been dropped when their TTL expired are more likely to survive and be forwarded successfully, and network throughput increases.
(3) The invention has an off-line training stage and an on-line training stage, and updates its parameters in a dynamic environment to select the optimal path, giving it topology adaptivity.
(4) The invention sets a reward function so that node and link load, routing requirements and network topology information better constrain the reinforcement learning training process, allowing the trained deep reinforcement learning model to execute routing tasks more accurately.
Drawings
Fig. 1 is a flowchart of a distributed deep reinforcement learning SDN network intelligent routing data transmission method according to the present invention;
FIG. 2 is a schematic diagram of a CNN convolutional neural network according to the present invention;
FIG. 3 is a diagram of a deep reinforcement learning model according to the present invention.
Detailed Description
The following description of specific embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be clear that the invention is not limited to the scope of these specific embodiments. To those of ordinary skill in the art, various changes are apparent so long as they remain within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations making use of the inventive concept are protected.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a distributed deep reinforcement learning SDN network intelligent routing data transmission method comprises the following steps:
S1, construct a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and deploy the deep reinforcement learning model in the application layer of the SDN network;
S2, randomly initialize the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initialize the local actor parameter θ'_a of the actor network and the local evaluator parameter θ'_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ'_a and the local evaluator parameter θ'_c, train the deep reinforcement learning model on the i-th local GPU_i off-line with the A3C algorithm, and update the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, apply the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmit data with the SDN network after the parameter update;
S6, periodically detect whether the topology of the SDN network has changed; if so, go to step S7, otherwise repeat step S6;
S7, train the deep reinforcement learning model on-line, update the actor network parameter θ_a and the evaluator network parameter θ_c with an adaptive optimization algorithm, apply the updated parameters to the whole SDN network, and transmit data with the SDN network after the parameter update;
where i = 1, 2, …, L, and L denotes the total number of local GPUs.
The actor network in step S1 is a fully connected neural network, and the evaluator network in step S1 is a combination of a fully connected neural network and a CNN convolutional neural network. The inputs of both the actor network and the evaluator network comprise the network state of the SDN network, the network state comprising current node information, destination node information, bandwidth requirement and delay requirement; the input of the evaluator network further comprises network features of the SDN network processed by the CNN convolutional neural network.
As shown in fig. 2, the CNN convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer connected in sequence.
In step S1, the reward function yields a reward value r_n^m(s_n, a_n), which denotes the reward obtained after the n-th routing node in the SDN network, in state s_n, takes action a_n toward the m-th routing node. Here g denotes the action penalty, a_1 the first weight, a_2 the second weight, c(n) the remaining capacity of the n-th routing node, c(m) the remaining capacity of the m-th routing node, c(l) the remaining capacity of the l-th link in the SDN network, d(n) the traffic-load difference between the n-th routing node and its neighbouring nodes, and d(m) the traffic-load difference between the m-th routing node and its neighbouring nodes. The state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet. The action a_n denotes all forwarding operations that may be taken in state s_n.
Step S4 comprises the following sub-steps:
S41, set a first counter t = 0 and a second counter T = 0, and set the maximum iteration count T_max and the routing hop-count limit t_max;
S42, let dθ_a = 0 and dθ_c = 0, and synchronize the local parameters with the global parameters: set the local actor parameter θ'_a to the value of the actor network parameter θ_a, and the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S43, let the first intermediate count value t_start = t, and read the state s_t at the current moment through the local GPU_i;
S44, obtain the policy π(a_t|s_t; θ'_a) through the actor network and execute action a_t according to the policy π(a_t|s_t; θ'_a), where π(a_t|s_t; θ'_a) denotes that a_t is the action to perform in state s_t under the local actor parameter θ'_a on the local GPU_i;
S45, obtain the reward value r_t and the new state s_{t+1} after executing action a_t, and increment the first counter t by one;
S46, judge whether the new state s_t satisfies the condition defined by the final state; if so, set the updated reward value R = 0 and go to step S48, otherwise go to step S47;
S47, judge whether t − t_start is greater than the routing hop-count limit t_max; if so, set the updated reward value R = V(s_t, θ'_c) and go to step S48, otherwise return to step S44, where V(s_t, θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_t under the local evaluator parameter θ'_c;
S48, set a third counter z = t − 1 and the gradient-update reward value R_update = r_z + γR, and initialize the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S49, according to the gradient-update reward value R_update, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtain the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c:
Δθ_a_update = Δθ_a + ∇_θ'_a log π(a_z|s_z; θ'_a) · (R_update − V(s_z; θ'_c))
Δθ_c_update = Δθ_c + ∂(R_update − V(s_z; θ'_c))² / ∂θ'_c
where Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ'_a denotes the derivative with respect to the local actor parameter θ'_a, log π(a_z|s_z; θ'_a) denotes the logarithm of the probability of executing action a_z under parameter θ'_a in state s_z, r_z denotes the reward value after executing action a_z, γ denotes the reward discount rate, V(s_z; θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_z under the local evaluator parameter θ'_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_z; θ'_c))²/∂θ'_c denotes the partial derivative of (R_update − V(s_z; θ'_c))² with respect to θ'_c;
S410, let Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update and R = R_update, and judge whether the third counter z equals the first intermediate count value t_start; if so, go to step S411, otherwise decrement the third counter z by one, update the gradient-update reward value R_update to r_z + γR, and return to step S49;
S411, judge whether the second counter T is greater than or equal to the maximum iteration count T_max; if so, update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively and end the update process; otherwise increment the second counter T by one and return to step S42.
The formula used in step S411 to update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c is:
θ_a_update = θ_a + βΔθ_a
θ_c_update = θ_c + βΔθ_c
where θ_a_update denotes the updated actor network parameter θ_a, θ_c_update denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
Step S7 comprises the following sub-steps:
S71, set a fourth counter j = 1 and collect a routing request task f;
S72, assign the routing request task f to an idle GPU in the SDN network, denoted GPU_idle;
S73, set dθ_a = 0 and dθ_c = 0, and synchronize the local actor parameter θ'_a of GPU_idle to the value of the actor network parameter θ_a and the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S74, let the second intermediate count value j_start = j and read the initial state s_j at the current moment;
S75, obtain through the actor network the policy π(a_j|s_j; θ'_a) for executing action a_j in state s_j under the local actor parameter θ'_a, and execute the policy π(a_j|s_j; θ'_a);
S76, obtain the reward value r_j and the new state s_{j+1} after executing action a_j, increment the fourth counter j by one, and add action a_j to the action set A;
S77, judge whether the new state s_j reaches the condition defined by the final state of the routing request task f; if so, go to step S78, otherwise return to step S75;
S78, obtain the routing path p from the action set A and judge whether the routing request task f matches the routing path p; if so, set the updated reward value R = 0 and go to step S79, otherwise set the updated reward value R = V(s_j, θ'_c) and go to step S79;
S79, set a fifth counter k = j − 1 and the gradient-update reward value R_update = r_k + γR, and initialize the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S710, according to the gradient-update reward value R_update, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtain the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c:
Δθ_a_update = Δθ_a + ∇_θ'_a log π(a_k|s_k; θ'_a) · (R_update − V(s_k; θ'_c))
Δθ_c_update = Δθ_c + ∂(R_update − V(s_k; θ'_c))² / ∂θ'_c
where Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ'_a denotes the derivative with respect to the local actor parameter θ'_a, log π(a_k|s_k; θ'_a) denotes the logarithm of the probability of executing action a_k under parameter θ'_a in state s_k, r_k denotes the reward value after executing action a_k, γ denotes the reward discount rate, V(s_k; θ'_c) denotes the evaluator network's evaluation of the routing policy for reaching state s_k under the local evaluator parameter θ'_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_k; θ'_c))²/∂θ'_c denotes the partial derivative of (R_update − V(s_k; θ'_c))² with respect to θ'_c;
S711, let Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update and R = R_update, and judge whether the fifth counter k equals the second intermediate count value j_start; if so, go to step S712, otherwise decrement the fifth counter k by one, update the gradient-update reward value R_update to r_k + γR, and return to step S710;
S712, update the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, apply the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmit data with the SDN network after the parameter update.
As shown in fig. 3, in this embodiment the deep reinforcement learning model comprises actor–reviewer (critic) pairs constructed with neural networks (NN). The actor network outputs, for a given state, a probability distribution over all actions — the routing policy — and is therefore a multi-output neural network. The reviewer network evaluates the actor's policy using the temporal-difference error and is a single-output neural network. The actor network is a fully connected neural network: after data such as the current node, destination node information, bandwidth requirement and delay requirement are input, each neural network node computes a weighted sum followed by an activation function, and multiple results are output — the probabilities of the candidate routing choices, from which the actor gives the next action for the current state. The evaluator network takes the same four items of network information plus an additional network-feature input, and its output is a single evaluation of the actor network's policy. The extra network-feature input carries the change information of the network, so the real-time network state change is taken into account when the actor network's policy is evaluated, making the intelligent routing self-adaptive.
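The reviewer's temporal-difference evaluation mentioned above can be sketched as follows (the standard TD-error form; the patent describes it in words without giving the formula explicitly):

```python
def td_error(r, v_s, v_s_next, gamma=0.9):
    """Temporal-difference error delta = r + gamma * V(s') - V(s),
    with which the reviewer scores the actor's action: a positive delta
    means the action outperformed the reviewer's expectation."""
    return r + gamma * v_s_next - v_s
```

A positive δ pushes the actor toward the chosen action; a negative δ pushes it away, which is how the single-output reviewer steers the multi-output actor.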
Claims (6)
1. An SDN intelligent routing data transmission method based on distributed deep reinforcement learning, characterized by comprising the following steps:
S1, constructing a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and deploying the deep reinforcement learning model in the application layer of the SDN network;
S2, randomly initializing the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initializing the local actor parameter θ′_a of the actor network and the local evaluator parameter θ′_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ′_a, and the local evaluator parameter θ′_c, performing offline training of the deep reinforcement learning model on the i-th local GPU_i using the A3C algorithm, and updating the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, applying the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
s6, regularly detecting whether the topological structure of the SDN network changes, if so, entering the step S7, otherwise, repeating the step S6;
S7, performing online training of the deep reinforcement learning model, updating the actor network parameter θ_a and the evaluator network parameter θ_c using the self-adaptive operation algorithm, applying the updated actor network parameter θ_a and evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
wherein i = 1, 2, …, L, and L represents the total number of local GPUs.
2. The SDN network smart routing data transmission method of claim 1, wherein the actor network in step S1 is a fully-connected neural network, and the evaluator network in step S1 is a combination network of the fully-connected neural network and a CNN convolutional neural network; the input of the actor network and the evaluator network comprise network states of the SDN network, the network states comprise current node information, destination node information, bandwidth requirements and delay requirements, and the input of the evaluator network further comprises network characteristics of the SDN network processed by the CNN convolutional neural network; the CNN convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a full-connection layer and an output layer which are sequentially connected.
3. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 1, wherein the reward function in step S1 is:
wherein the reward value denotes the reward obtained after the n-th routing node in the SDN network takes action a_n toward the m-th routing node in state s_n; g denotes an action penalty, a_1 denotes a first weight, a_2 denotes a second weight, c(n) denotes the remaining capacity of the n-th routing node, c(m) denotes the remaining capacity of the m-th routing node, c(l) denotes the remaining capacity of the l-th link in the SDN network, d(n) denotes the degree of difference in traffic load between the n-th routing node and its neighboring nodes, and d(m) denotes the degree of difference in traffic load between the m-th routing node and its neighboring nodes; the state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet; the action a_n denotes any of the forwarding operations that may be taken in state s_n.
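The reward formula itself is not reproduced in this text. The sketch below merely wires together the quantities claim 3 names (action penalty g, weights a_1 and a_2, remaining capacities c(n), c(m), c(l), load-difference degrees d(n), d(m)); the way they are combined here is an assumption for illustration, not the patent's formula.

```python
def reward(g, a1, a2, c_n, c_m, c_l, d_n, d_m):
    """Illustrative reward shape (assumed, not the patent's formula):
    favor spare node/link capacity, penalize traffic-load imbalance
    between a node and its neighbors, and charge a flat action penalty g."""
    capacity_term = a1 * min(c_n, c_m, c_l)   # bottleneck remaining capacity
    balance_term = a2 * (d_n + d_m)           # load-difference degrees
    return capacity_term - balance_term - g

r = reward(g=0.1, a1=1.0, a2=0.5, c_n=8.0, c_m=6.0, c_l=4.0, d_n=0.2, d_m=0.4)
```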
4. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 1, wherein the step S4 includes the following sub-steps:
S41, setting the first counter t = 0 and the second counter T = 0, and setting the maximum iteration number T_max and the routing hop-count limit t_max;
S42, letting dθ_a = 0 and dθ_c = 0, and synchronizing the local parameters with the global parameters: the local actor parameter θ′_a is synchronized to the value of the actor network parameter θ_a, and the local evaluator parameter θ′_c is synchronized to the value of the evaluator network parameter θ_c;
S43, letting the first intermediate count value t_start = t, and reading the state s_t at the current moment through the local GPU_i;
S44, obtaining the policy π(a_t|s_t; θ′_a) through the actor network and performing action a_t according to the policy π(a_t|s_t; θ′_a), where π(a_t|s_t; θ′_a) denotes the action a_t to be performed in state s_t under the local actor parameter θ′_a of the local GPU_i;
S45, acquiring the reward value r_t and the new state s_{t+1} after performing action a_t, and increasing the count value of the first counter t by one;
S46, judging whether the new state s_t satisfies the condition defined by the final state; if yes, setting the updated reward value R = 0 and proceeding to step S48; otherwise, proceeding to step S47;
S47, judging whether t − t_start is greater than the routing hop-count limit t_max; if yes, setting the updated reward value R = V(s_t, θ′_c) and proceeding to step S48; otherwise, returning to step S44, where V(s_t, θ′_c) denotes the routing-policy evaluation value of the evaluator network for reaching state s_t under the local evaluator parameter θ′_c;
S48, setting the third counter z = t − 1 and the gradient-update reward value R_update = r_z + γR, and initializing the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S49, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c according to the gradient-update reward value R_update, the local actor parameter θ′_a, and the local evaluator parameter θ′_c:

Δθ_a_update = Δθ_a + ∇_θ′_a log π(a_z|s_z; θ′_a)·(R_update − V(s_z; θ′_c))

Δθ_c_update = Δθ_c + ∂(R_update − V(s_z; θ′_c))²/∂θ′_c
wherein Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ′_a denotes the gradient with respect to the local actor parameter θ′_a, log π(a_z|s_z; θ′_a) denotes the logarithm of the probability of performing action a_z under parameter θ′_a in state s_z, r_z denotes the reward value after performing action a_z, γ denotes the discount rate of the reward, V(s_z; θ′_c) denotes the routing-policy evaluation value of the evaluator network for reaching state s_z under the local evaluator parameter θ′_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_z; θ′_c))²/∂θ′_c denotes the partial derivative of (R_update − V(s_z; θ′_c))² with respect to θ′_c;
S410, letting Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update, and R = R_update, and determining whether the third counter z is equal to the first intermediate count value t_start; if yes, going to step S411; otherwise, decreasing the count value of the third counter z by one, updating the gradient-update reward value R_update to r_z + γR, and returning to step S49;
S411, judging whether the second counter T is greater than or equal to the maximum iteration number T_max; if yes, updating the actor network parameter θ_a and the evaluator network parameter θ_c using the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, and ending the updating process; otherwise, increasing the count value of the second counter T by one and returning to step S42.
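Steps S46 to S410 amount to walking the rollout backwards, bootstrapping R_update = r_z + γR, and accumulating the A3C actor and evaluator gradients. A sketch under the assumption that the per-step gradients of log π and of V with respect to the local parameters have already been computed elsewhere:

```python
import numpy as np

def accumulate_gradients(rewards, values, grad_logps, grad_vs, R, gamma):
    """Walk the rollout backwards (z = t-1 down to t_start), updating
    R <- r_z + gamma * R, and accumulate:
      d_theta_a += grad(log pi(a_z|s_z)) * (R - V(s_z))        # policy gradient
      d_theta_c += d/dtheta'_c (R - V(s_z))^2 = 2*(V(s_z)-R)*grad V(s_z)
    rewards/values are per-step scalars; grad_logps/grad_vs are per-step
    gradient vectors (assumed precomputed)."""
    d_theta_a = np.zeros_like(grad_logps[0])
    d_theta_c = np.zeros_like(grad_vs[0])
    for z in reversed(range(len(rewards))):
        R = rewards[z] + gamma * R                       # R_update
        advantage = R - values[z]
        d_theta_a += grad_logps[z] * advantage
        d_theta_c += 2.0 * (values[z] - R) * grad_vs[z]
    return d_theta_a, d_theta_c

# Terminal rollout (R starts at 0 per step S46), two steps:
dA, dC = accumulate_gradients(
    rewards=[1.0, 1.0], values=[0.0, 0.0],
    grad_logps=[np.ones(3), np.ones(3)], grad_vs=[np.ones(3), np.ones(3)],
    R=0.0, gamma=0.9)
```

The accumulated `dA` and `dC` play the roles of Δθ_a and Δθ_c, which step S411 applies to the global parameters once the iteration limit is reached.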
5. The SDN network smart routing data transmission method of claim 4, wherein in step S411 the actor network parameter θ_a and the evaluator network parameter θ_c are updated using the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively by the formulas:
θ_a_update = θ_a + β·Δθ_a
θ_c_update = θ_c + β·Δθ_c
wherein θ_a_update denotes the updated actor network parameter θ_a, θ_c_update denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
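The two update formulas in claim 5 are a plain weighted gradient step. A minimal sketch; the array shapes and the β value are illustrative:

```python
import numpy as np

def global_update(theta_a, theta_c, d_theta_a, d_theta_c, beta):
    """theta <- theta + beta * delta_theta, where beta is the weight of
    the contributing local GPU_i in the SDN network."""
    return theta_a + beta * d_theta_a, theta_c + beta * d_theta_c

theta_a, theta_c = np.zeros(4), np.zeros(2)
theta_a, theta_c = global_update(theta_a, theta_c,
                                 d_theta_a=np.full(4, 2.0),
                                 d_theta_c=np.full(2, -1.0),
                                 beta=0.5)
```

Weighting by β lets asynchronous local GPUs contribute to the shared parameters in proportion to their role in the network, in the spirit of A3C's asynchronous updates.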
6. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 4, wherein the step S7 includes the following sub-steps:
s71, setting a fourth counter j to be 1, and collecting a routing request task f;
S72, distributing the routing request task f to an idle GPU in the SDN network, the idle GPU being denoted GPU_idle;
S73, setting dθ_a = 0 and dθ_c = 0, synchronizing the local actor parameter θ′_a of GPU_idle to the value of the actor network parameter θ_a, and synchronizing the local evaluator parameter θ′_c to the value of the evaluator network parameter θ_c;
S74, letting the second intermediate count value j_start = j, and reading the initial state s_j at the current time;
S75, obtaining through the actor network the policy π(a_j|s_j; θ′_a) for performing action a_j in state s_j under the local actor parameter θ′_a, and implementing the policy π(a_j|s_j; θ′_a);
S76, acquiring the reward value r_j and the new state s_{j+1} after performing action a_j, increasing the count value of the fourth counter j by one, and adding action a_j to the action set A;
S77, judging whether the new state s_j reaches the condition defined by the final state of the routing request task f; if yes, proceeding to step S78; otherwise, returning to step S75;
S78, obtaining the routing path p according to the action set A, and judging whether the routing request task f matches the routing path p; if yes, setting the updated reward value R = 0 and proceeding to step S79; otherwise, setting the updated reward value R = V(s_j, θ′_c) and proceeding to step S79;
S79, setting the fifth counter k = j − 1 and the gradient-update reward value R_update = r_k + γR, and initializing the gradient Δθ_a of the actor network parameters and the gradient Δθ_c of the evaluator network parameters to 0;
S710, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c according to the gradient-update reward value R_update, the local actor parameter θ′_a, and the local evaluator parameter θ′_c:

Δθ_a_update = Δθ_a + ∇_θ′_a log π(a_k|s_k; θ′_a)·(R_update − V(s_k; θ′_c))

Δθ_c_update = Δθ_c + ∂(R_update − V(s_k; θ′_c))²/∂θ′_c
wherein Δθ_a_update denotes the update value of the gradient Δθ_a, ∇_θ′_a denotes the gradient with respect to the local actor parameter θ′_a, log π(a_k|s_k; θ′_a) denotes the logarithm of the probability of performing action a_k under parameter θ′_a in state s_k, r_k denotes the reward value after performing action a_k, γ denotes the discount rate of the reward, V(s_k; θ′_c) denotes the routing-policy evaluation value of the evaluator network for reaching state s_k under the local evaluator parameter θ′_c, Δθ_c_update denotes the update value of the gradient Δθ_c, and ∂(R_update − V(s_k; θ′_c))²/∂θ′_c denotes the partial derivative of (R_update − V(s_k; θ′_c))² with respect to θ′_c;
S711, letting Δθ_a = Δθ_a_update, Δθ_c = Δθ_c_update, and R = R_update, and determining whether the fifth counter k is equal to the second intermediate count value j_start; if yes, going to step S712; otherwise, decreasing the count value of the fifth counter k by one, updating the gradient-update reward value R_update to r_k + γR, and returning to step S710;
S712, updating the actor network parameter θ_a and the evaluator network parameter θ_c using the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, applying the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters.
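In the online phase, steps S75 to S78 collect the actor's actions into the set A, assemble the routing path p from them, and check p against the request f. A sketch; the interpretation of each action as a next-hop node and the matching rule (the path must terminate at the request's destination) are assumptions, since claim 6 does not spell them out:

```python
def path_from_actions(start_node, actions):
    """Assemble routing path p from the rollout's action set A, assuming
    each action is the next-hop node chosen by the actor network."""
    path = [start_node]
    for next_hop in actions:
        path.append(next_hop)
    return path

def task_matches_path(task, path):
    """Assumed matching rule: the path must end at the task's destination
    node; a real check might also verify bandwidth and delay budgets."""
    return path[-1] == task["dst"]

p = path_from_actions(1, [4, 6, 9])
ok = task_matches_path({"src": 1, "dst": 9}, p)
```

When the match fails, step S78 bootstraps the return from the evaluator's estimate V(s_j, θ′_c) instead of zero, so the online update still has a learning signal.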
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010673851.8A CN111917642B (en) | 2020-07-14 | 2020-07-14 | SDN intelligent routing data transmission method for distributed deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010673851.8A CN111917642B (en) | 2020-07-14 | 2020-07-14 | SDN intelligent routing data transmission method for distributed deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111917642A true CN111917642A (en) | 2020-11-10 |
CN111917642B CN111917642B (en) | 2021-04-27 |
Family
ID=73280083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010673851.8A Expired - Fee Related CN111917642B (en) | 2020-07-14 | 2020-07-14 | SDN intelligent routing data transmission method for distributed deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111917642B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818788A (en) * | 2021-01-25 | 2021-05-18 | 电子科技大学 | Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster |
CN113316216A (en) * | 2021-05-26 | 2021-08-27 | 电子科技大学 | Routing method for micro-nano satellite network |
CN113537628A (en) * | 2021-08-04 | 2021-10-22 | 郭宏亮 | General reliable shortest path algorithm based on distributed reinforcement learning |
CN114051272A (en) * | 2021-10-30 | 2022-02-15 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent routing method for dynamic topological network |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150269479A1 (en) * | 2014-03-24 | 2015-09-24 | Qualcomm Incorporated | Conversion of neuron types to hardware |
CN106873585A (en) * | 2017-01-18 | 2017-06-20 | 无锡辰星机器人科技有限公司 | A navigation path-searching method, robot, and system |
CN108600104A (en) * | 2018-04-28 | 2018-09-28 | 电子科技大学 | An SDN Internet-of-Things traffic aggregation method based on tree-shaped routing |
CN108803615A (en) * | 2018-07-03 | 2018-11-13 | 东南大学 | A virtual human navigation algorithm for unknown environments based on deep reinforcement learning |
US20190014488A1 (en) * | 2017-07-06 | 2019-01-10 | Futurewei Technologies, Inc. | System and method for deep learning and wireless network optimization using deep learning |
CN109343341A (en) * | 2018-11-21 | 2019-02-15 | 北京航天自动控制研究所 | An intelligent control method for vertical recovery of a carrier rocket based on deep reinforcement learning |
CN109803344A (en) * | 2018-12-28 | 2019-05-24 | 北京邮电大学 | A joint mapping method for unmanned aerial vehicle network topology and routing |
US10396919B1 (en) * | 2017-05-12 | 2019-08-27 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
CN110472880A (en) * | 2019-08-20 | 2019-11-19 | 李峰 | Method, apparatus and storage medium for evaluating collaborative problem-solving ability |
CN110472738A (en) * | 2019-08-16 | 2019-11-19 | 北京理工大学 | A real-time obstacle avoidance algorithm for unmanned surface vehicles based on deep reinforcement learning |
US20190354859A1 (en) * | 2018-05-18 | 2019-11-21 | Deepmind Technologies Limited | Meta-gradient updates for training return functions for reinforcement learning systems |
CN110515303A (en) * | 2019-09-17 | 2019-11-29 | 余姚市浙江大学机器人研究中心 | An adaptive dynamic path planning method based on DDQN |
CN110611619A (en) * | 2019-09-12 | 2019-12-24 | 西安电子科技大学 | Intelligent routing decision method based on DDPG reinforcement learning algorithm |
CN110770761A (en) * | 2017-07-06 | 2020-02-07 | 华为技术有限公司 | Deep learning system and method and wireless network optimization using deep learning |
CN111010294A (en) * | 2019-11-28 | 2020-04-14 | 国网甘肃省电力公司电力科学研究院 | Electric power communication network routing method based on deep reinforcement learning |
US20200139973A1 (en) * | 2018-11-01 | 2020-05-07 | GM Global Technology Operations LLC | Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle |
CN111316295A (en) * | 2017-10-27 | 2020-06-19 | 渊慧科技有限公司 | Reinforcement learning using distributed prioritized replay |
-
2020
- 2020-07-14 CN CN202010673851.8A patent/CN111917642B/en not_active Expired - Fee Related
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150269479A1 (en) * | 2014-03-24 | 2015-09-24 | Qualcomm Incorporated | Conversion of neuron types to hardware |
CN106873585A (en) * | 2017-01-18 | 2017-06-20 | 无锡辰星机器人科技有限公司 | A navigation path-searching method, robot, and system |
US10396919B1 (en) * | 2017-05-12 | 2019-08-27 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
US20190014488A1 (en) * | 2017-07-06 | 2019-01-10 | Futurewei Technologies, Inc. | System and method for deep learning and wireless network optimization using deep learning |
CN110770761A (en) * | 2017-07-06 | 2020-02-07 | 华为技术有限公司 | Deep learning system and method and wireless network optimization using deep learning |
CN111316295A (en) * | 2017-10-27 | 2020-06-19 | 渊慧科技有限公司 | Reinforcement learning using distributed prioritized replay |
CN108600104A (en) * | 2018-04-28 | 2018-09-28 | 电子科技大学 | An SDN Internet-of-Things traffic aggregation method based on tree-shaped routing |
US20190354859A1 (en) * | 2018-05-18 | 2019-11-21 | Deepmind Technologies Limited | Meta-gradient updates for training return functions for reinforcement learning systems |
CN108803615A (en) * | 2018-07-03 | 2018-11-13 | 东南大学 | A virtual human navigation algorithm for unknown environments based on deep reinforcement learning |
US20200139973A1 (en) * | 2018-11-01 | 2020-05-07 | GM Global Technology Operations LLC | Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle |
CN109343341A (en) * | 2018-11-21 | 2019-02-15 | 北京航天自动控制研究所 | An intelligent control method for vertical recovery of a carrier rocket based on deep reinforcement learning |
CN109803344A (en) * | 2018-12-28 | 2019-05-24 | 北京邮电大学 | A joint mapping method for unmanned aerial vehicle network topology and routing |
CN110472738A (en) * | 2019-08-16 | 2019-11-19 | 北京理工大学 | A real-time obstacle avoidance algorithm for unmanned surface vehicles based on deep reinforcement learning |
CN110472880A (en) * | 2019-08-20 | 2019-11-19 | 李峰 | Method, apparatus and storage medium for evaluating collaborative problem-solving ability |
CN110611619A (en) * | 2019-09-12 | 2019-12-24 | 西安电子科技大学 | Intelligent routing decision method based on DDPG reinforcement learning algorithm |
CN110515303A (en) * | 2019-09-17 | 2019-11-29 | 余姚市浙江大学机器人研究中心 | An adaptive dynamic path planning method based on DDQN |
CN111010294A (en) * | 2019-11-28 | 2020-04-14 | 国网甘肃省电力公司电力科学研究院 | Electric power communication network routing method based on deep reinforcement learning |
Non-Patent Citations (3)
Title |
---|
LINGXIN ZHANG: "Multi-task Deep Reinforcement Learning for Scalable Parallel Task Scheduling", 《2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)》 * |
LAN JULONG: "A routing optimization mechanism for software-defined networks based on deep reinforcement learning", 《Journal of Electronics &amp; Information Technology》 *
ZHANG XIAONING: "Research on a new two-layer mapping system in identifier/locator split networks", 《Journal of Electronics &amp; Information Technology》 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112818788A (en) * | 2021-01-25 | 2021-05-18 | 电子科技大学 | Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster |
CN112818788B (en) * | 2021-01-25 | 2022-05-03 | 电子科技大学 | Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster |
CN113316216A (en) * | 2021-05-26 | 2021-08-27 | 电子科技大学 | Routing method for micro-nano satellite network |
CN113316216B (en) * | 2021-05-26 | 2022-04-08 | 电子科技大学 | Routing method for micro-nano satellite network |
CN113537628A (en) * | 2021-08-04 | 2021-10-22 | 郭宏亮 | General reliable shortest path algorithm based on distributed reinforcement learning |
CN113537628B (en) * | 2021-08-04 | 2023-08-22 | 郭宏亮 | Universal reliable shortest path method based on distributed reinforcement learning |
CN114051272A (en) * | 2021-10-30 | 2022-02-15 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent routing method for dynamic topological network |
Also Published As
Publication number | Publication date |
---|---|
CN111917642B (en) | 2021-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111917642B (en) | SDN intelligent routing data transmission method for distributed deep reinforcement learning | |
CN110611619B (en) | Intelligent routing decision method based on DDPG reinforcement learning algorithm | |
CN112437020B (en) | Data center network load balancing method based on deep reinforcement learning | |
CN114697229B (en) | Construction method and application of distributed routing planning model | |
CN111988225A (en) | Multi-path routing method based on reinforcement learning and transfer learning | |
CN103971160B (en) | particle swarm optimization method based on complex network | |
WO2020172825A1 (en) | Method and apparatus for determining transmission policy | |
CN116527567B (en) | Intelligent network path optimization method and system based on deep reinforcement learning | |
CN113570039B (en) | Block chain system based on reinforcement learning optimization consensus | |
CN113395207B (en) | Deep reinforcement learning-based route optimization framework and method under SDN framework | |
CN112486690A (en) | Edge computing resource allocation method suitable for industrial Internet of things | |
CN113784410B (en) | Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm | |
CN110328668B (en) | Mechanical arm path planning method based on speed smooth deterministic strategy gradient | |
CN114415735B (en) | Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method | |
CN108111335A (en) | A kind of method and system dispatched and link virtual network function | |
CN113938415B (en) | Network route forwarding method and system based on link state estimation | |
CN113821041A (en) | Multi-robot collaborative navigation and obstacle avoidance method | |
CN117041129A (en) | Low-orbit satellite network flow routing method based on multi-agent reinforcement learning | |
CN111340192B (en) | Network path allocation model training method, path allocation method and device | |
Fuji et al. | Deep multi-agent reinforcement learning using dnn-weight evolution to optimize supply chain performance | |
CN114205251B (en) | Switch link resource prediction method based on space-time characteristics | |
CN115714741A (en) | Routing decision method and system based on collaborative multi-agent reinforcement learning | |
CN115225561A (en) | Route optimization method and system based on graph structure characteristics | |
CN117014355A (en) | TSSDN dynamic route decision method based on DDPG deep reinforcement learning algorithm | |
CN115150335A (en) | Optimal flow segmentation method and system based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210427 |