CN111917642A - SDN intelligent routing data transmission method for distributed deep reinforcement learning - Google Patents


Info

Publication number
CN111917642A
Authority
CN
China
Prior art keywords
network
parameter
actor
local
evaluator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010673851.8A
Other languages
Chinese (zh)
Other versions
CN111917642B (en)
Inventor
刘宇涛
崔金鹏
章小宁
贺元林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010673851.8A
Publication of CN111917642A
Application granted
Publication of CN111917642B
Expired - Fee Related
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/124 Shortest path evaluation using a combination of metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/121 Shortest path evaluation by minimising delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/125 Shortest path evaluation based on throughput or bandwidth

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an SDN network intelligent routing data transmission method based on distributed deep reinforcement learning, which realizes fast routing path calculation, maximizes throughput while guaranteeing delay, and solves the problems of low speed and low throughput of traditional algorithms. The invention uses a reinforcement learning algorithm that simplifies the route calculation process into simple input and output and avoids multiple iterations during calculation, so that routing paths are calculated quickly, the routing algorithm is accelerated, forwarding delay is reduced, data packets that would otherwise be discarded due to TTL expiry are more likely to survive and be forwarded successfully, and network throughput is increased. The invention provides two stages, off-line training and on-line training, and updates the parameters in a dynamic environment to select the optimal path, so the invention is topology-adaptive.

Description

SDN intelligent routing data transmission method for distributed deep reinforcement learning
Technical Field
The invention belongs to the field of data transmission, and particularly relates to an SDN intelligent routing data transmission method for distributed deep reinforcement learning.
Background
Information technology is now at a mature stage. In an SDN (Software Defined Network) architecture, data flow is flexible and controllable, and the controller has a full network view and can sense network state changes (such as traffic distribution, congestion and link utilization) in real time. In practice, the routing problem is usually solved with a shortest path algorithm, with simple network parameters (such as path hop count or delay) used as the optimization index, so that finding the path with the fewest hops or the lowest delay becomes the final goal of the algorithm. Such a single metric and optimization target easily causes congestion on some critical links and therefore an unbalanced network load. Although a shortest path algorithm based on Lagrangian relaxation can find an optimal path under multiple compound constraints when distributing multi-service paths, such heuristic routing algorithms need many iterations to compute the optimal path and therefore suffer from slow convergence, poor timeliness and low throughput.
Disclosure of Invention
Aiming at the above defects in the prior art, the present invention provides an SDN intelligent routing data transmission method based on distributed deep reinforcement learning that solves the problems in the prior art.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme: an SDN intelligent routing data transmission method for distributed deep reinforcement learning, comprising the following steps:
s1, constructing a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and arranging the deep reinforcement learning model in an application layer of the SDN network;
S2, randomly initializing the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initializing the local actor parameter θ'_a of the actor network and the local evaluator parameter θ'_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ'_a and the local evaluator parameter θ'_c, performing off-line training of the deep reinforcement learning model on the i-th local GPU_i using the A3C algorithm, and updating the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, applying the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
S6, periodically detecting whether the topological structure of the SDN network changes; if so, proceeding to step S7, otherwise repeating step S6;
S7, carrying out on-line training of the deep reinforcement learning model, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the self-adaptive algorithm, applying the updated actor network parameter θ_a and evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
where i = 1, 2, ..., L, and L denotes the total number of local GPUs.
Further, the actor network in step S1 is a fully connected neural network, and the evaluator network in step S1 is a combined network of a fully connected neural network and a CNN convolutional neural network; the inputs of the actor network and the evaluator network both comprise the network state of the SDN network, where the network state comprises current node information, destination node information, bandwidth requirement and delay requirement, and the input of the evaluator network further comprises network characteristics of the SDN network processed by the CNN convolutional neural network; the CNN convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer which are sequentially connected.
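To make this architecture concrete, the following is a minimal PyTorch sketch of the two networks described above; the layer widths, the 4-dimensional state encoding, the number of candidate next hops and the size of the topology feature map are illustrative assumptions rather than values specified in the patent.

```python
import torch
import torch.nn as nn

class ActorNet(nn.Module):
    """Fully connected actor: state (current node, destination node,
    bandwidth requirement, delay requirement) -> probabilities over next hops."""
    def __init__(self, state_dim=4, hidden=64, n_actions=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return torch.softmax(self.mlp(state), dim=-1)  # policy pi(a | s)

class EvaluatorNet(nn.Module):
    """Critic: a fully connected branch for the state plus a small CNN
    (input -> conv -> pool -> fully connected) for the topology feature map."""
    def __init__(self, state_dim=4, hidden=64, feat_channels=1, feat_size=16):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(feat_channels, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(8 * (feat_size // 2) ** 2, hidden), nn.ReLU(),
        )
        self.mlp = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(2 * hidden, 1)  # single-output evaluation V(s)

    def forward(self, state, topo_features):
        h = torch.cat([self.mlp(state), self.cnn(topo_features)], dim=-1)
        return self.value(h)
```

The critic concatenates its fully connected state branch with the CNN feature branch before producing the single value output, mirroring the combined network described above.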
Further, the reward function in step S1 is:

r(s_n, a_n) = [reward function equation, given as an image in the original publication]

wherein r(s_n, a_n) denotes the reward value obtained after the n-th routing node in the SDN network takes action a_n toward the m-th routing node in state s_n; g denotes an action penalty, a_1 denotes a first weight, a_2 denotes a second weight, c(n) denotes the remaining capacity of the n-th routing node, c(m) denotes the remaining capacity of the m-th routing node, c(l) denotes the remaining capacity of the l-th link in the SDN network, d(n) denotes the degree of difference in traffic load between the n-th routing node and its neighboring nodes, and d(m) denotes the degree of difference in traffic load between the m-th routing node and its neighboring nodes; the state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet; the action a_n denotes all forwarding operations that may be taken in state s_n.
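The closed form of the reward appears only as an equation image in the original publication; purely for illustration, the snippet below combines the named quantities (action penalty g, weights a_1 and a_2, residual capacities c(·) and load-difference degrees d(·)) in one plausible way. The actual weighting in CN111917642A may differ, so this is an assumption, not the patent's formula.

```python
def reward(c_n, c_m, c_l_min, d_n, d_m, a1=1.0, a2=1.0, g=1.0, valid=True):
    """Hypothetical reward: pay the action penalty g when the forwarding
    action is invalid, otherwise favour residual capacity and penalise
    load imbalance between the node and its neighbours."""
    if not valid:
        return -g
    return a1 * min(c_n, c_m, c_l_min) - a2 * (d_n + d_m)
```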
Further, the step S4 includes the following sub-steps:
S41, setting the first counter t = 0, the second counter T = 0, the maximum iteration number T_max and the routing hop count limit t_max;
S42, setting dθ_a = 0 and dθ_c = 0, and synchronizing the local parameters with the global parameters: the value of the local actor parameter θ'_a is synchronized to the value of the actor network parameter θ_a, and the value of the local evaluator parameter θ'_c is synchronized to the value of the evaluator network parameter θ_c;
S43, setting the first intermediate count value t_start = t, and reading the state s_t at the current moment through the local GPU_i;
S44, obtaining the policy π(a_t | s_t; θ'_a) through the actor network and executing action a_t according to the policy π(a_t | s_t; θ'_a), where π(a_t | s_t; θ'_a) denotes that the action to be executed is a_t in state s_t when the local actor parameter of the local GPU_i is θ'_a;
S45, obtaining the reward value r_t and the new state s_{t+1} after executing action a_t, and incrementing the count value of the first counter t by one;
S46, judging whether the new state s_t satisfies the condition defined by the final state; if so, setting the updated reward value R = 0 and proceeding to step S48, otherwise proceeding to step S47;
S47, judging whether t − t_start is greater than the routing hop count limit t_max; if so, setting the updated reward value R = V(s_t, θ'_c) and proceeding to step S48, otherwise returning to step S44, where V(s_t, θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_t;
S48, setting the third counter z = t − 1, setting the gradient-update reward value R_updata = r_z + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S49, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_z | s_z; θ'_a) · (R_updata − V(s_z; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_z | s_z; θ'_a) denotes the logarithm of the probability of executing action a_z under parameter θ'_a and state s_z, r_z denotes the reward for executing action a_z, γ denotes the reward discount rate, V(s_z; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_z, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_z; θ'_c))² with respect to θ'_c;
S410, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the third counter z is equal to the first intermediate count value t_start; if so, proceeding to step S411, otherwise decrementing the count value of the third counter z by one, updating the gradient-update reward value R_updata to r_z + γR, and returning to step S49;
S411, judging whether the second counter T is greater than or equal to the maximum iteration number T_max; if so, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, and ending the update process; otherwise, incrementing the count value of the second counter T by one and returning to step S42.
Further, in step S411 the formulas for updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively are:
θ_{a_updata} = θ_a + βΔθ_a
θ_{c_updata} = θ_c + βΔθ_c
wherein θ_{a_updata} denotes the updated actor network parameter θ_a, θ_{c_updata} denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
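For readers who prefer code, the following is a compressed sketch of one worker's pass through steps S42–S411 (the inner loop of the A3C off-line training). It assumes `actor` maps a state tensor to action probabilities, `critic` maps a state tensor to a scalar value estimate, `env` is a placeholder environment with `reset()` and `step()`, and the hyper-parameters (t_max, γ, β) are illustrative; none of these names come from the patent.

```python
import torch

def offline_worker_pass(actor, critic, global_actor, global_critic,
                        env, t_max=20, gamma=0.99, beta=1.0):
    """One S42-S411 style rollout on a local GPU: synchronise the local
    parameters with the global ones, roll out at most t_max hops, then
    accumulate A3C gradients backwards through the trajectory and apply
    them to the global networks."""
    # S42: synchronise local parameters with the global parameters
    actor.load_state_dict(global_actor.state_dict())
    critic.load_state_dict(global_critic.state_dict())

    states, actions, rewards = [], [], []
    state = env.reset()                        # S43: read the current state s_t
    done = False
    while not done and len(states) < t_max:    # S44-S47: roll out the policy
        probs = actor(state)                   # pi(a_t | s_t; theta'_a)
        action = torch.multinomial(probs, 1)   # sample a_t from the policy
        state_next, r, done = env.step(action)
        states.append(state); actions.append(action); rewards.append(r)
        state = state_next

    # S46/S47: R = 0 at a final state, otherwise bootstrap with V(s_t; theta'_c)
    R = torch.zeros(1) if done else critic(state).detach()

    actor_loss, critic_loss = 0.0, 0.0
    for s, a, r in zip(reversed(states), reversed(actions), reversed(rewards)):
        R = r + gamma * R                      # S48/S410: R_updata = r_z + gamma * R
        advantage = R - critic(s)              # R_updata - V(s_z; theta'_c)
        actor_loss += -torch.log(actor(s).gather(-1, a)) * advantage.detach()
        critic_loss += advantage.pow(2)
    (actor_loss + critic_loss).backward()      # S49: accumulate local gradients

    # S411: theta <- theta + beta * delta_theta; the loss above is minimised,
    # so the global parameters move against its gradient.
    with torch.no_grad():
        for gp, lp in zip(global_actor.parameters(), actor.parameters()):
            if lp.grad is not None:
                gp -= beta * lp.grad
        for gp, lp in zip(global_critic.parameters(), critic.parameters()):
            if lp.grad is not None:
                gp -= beta * lp.grad
```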
Further, the step S7 includes the following sub-steps:
S71, setting the fourth counter j = 1, and collecting a routing request task f;
S72, distributing the routing request task f to an idle GPU in the SDN network, the idle GPU being denoted GPU_idle;
S73, setting dθ_a = 0 and dθ_c = 0, synchronizing the local actor parameter θ'_a of GPU_idle to the value of the actor network parameter θ_a, and synchronizing the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S74, setting the second intermediate count value j_start = j, and reading the initial state s_j at the current moment;
S75, obtaining through the actor network the policy π(a_j | s_j; θ'_a) for executing action a_j in state s_j with local actor parameter θ'_a, and executing the policy π(a_j | s_j; θ'_a);
S76, obtaining the reward value r_j and the new state s_{j+1} after executing action a_j, incrementing the count value of the fourth counter j by one, and adding action a_j to the action set A;
S77, judging whether the new state s_j satisfies the condition defined by the final state of the routing request task f; if so, proceeding to step S78, otherwise returning to step S75;
S78, obtaining the routing path p from the action set A, and judging whether the routing request task f matches the routing path p; if so, setting the updated reward value R = 0 and proceeding to step S79, otherwise setting the updated reward value R = V(s_j, θ'_c) and proceeding to step S79;
S79, setting the fifth counter k = j − 1, setting the gradient-update reward value R_updata = r_k + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S710, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_k | s_k; θ'_a) · (R_updata − V(s_k; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_k | s_k; θ'_a) denotes the logarithm of the probability of executing action a_k under parameter θ'_a and state s_k, r_k denotes the reward for executing action a_k, γ denotes the reward discount rate, V(s_k; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_k, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_k; θ'_c))² with respect to θ'_c;
S711, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the fifth counter k is equal to the second intermediate count value j_start; if so, proceeding to step S712, otherwise decrementing the count value of the fifth counter k by one, updating the gradient-update reward value R_updata to r_k + γR, and returning to step S710;
S712, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, applying the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters.
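The on-line stage (steps S6–S7) amounts to a small control loop: periodically compare the current topology with the last known one and, when it changes, hand the pending routing request to an idle GPU for an on-line update pass. The sketch below is one possible way a controller could schedule this; `get_topology`, `collect_routing_request`, `find_idle_gpu`, `online_update` and `apply_parameters` are hypothetical helpers, not APIs defined by the patent.

```python
import time

def online_adaptation_loop(controller, poll_interval=5.0):
    """S6/S7 style loop: detect topology changes, then run one on-line
    training pass on an idle GPU and push the new parameters network-wide."""
    last_topology = controller.get_topology()
    while True:
        time.sleep(poll_interval)                      # S6: periodic detection
        topology = controller.get_topology()
        if topology == last_topology:
            continue                                   # no change: keep polling
        last_topology = topology
        task = controller.collect_routing_request()    # S71: routing request task f
        gpu = controller.find_idle_gpu()               # S72: GPU_idle
        theta_a, theta_c = gpu.online_update(task)     # S73-S712: rollout and update
        controller.apply_parameters(theta_a, theta_c)  # act on the whole SDN network
```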
The invention has the beneficial effects that:
(1) The invention realizes fast routing path calculation, maximizes throughput while guaranteeing delay, and solves the problems of low speed and low throughput of traditional algorithms.
(2) The invention uses a reinforcement learning algorithm that simplifies the route calculation process into simple input and output and avoids multiple iterations during calculation, so that routing paths are calculated quickly, the routing algorithm is accelerated, forwarding delay is reduced, data packets that would otherwise be discarded due to TTL expiry are more likely to survive and be forwarded successfully, and network throughput is increased.
(3) The invention provides two stages, off-line training and on-line training, and updates the parameters in a dynamic environment to select the optimal path, so the invention is topology-adaptive.
(4) The invention sets the reward function so that the node and link loads, the routing requirements and the network topology information better constrain the reinforcement learning training process, and the trained deep reinforcement learning model can execute routing tasks more accurately.
Drawings
Fig. 1 is a flowchart of a distributed deep reinforcement learning SDN network intelligent routing data transmission method according to the present invention;
FIG. 2 is a schematic diagram of a CNN convolutional neural network according to the present invention;
FIG. 3 is a diagram of a deep reinforcement learning model according to the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art. It should be understood, however, that the present invention is not limited to the scope of these embodiments; for those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined by the appended claims, and all matters produced using the inventive concept are protected.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a method for transmitting smart routing data in a distributed deep reinforcement learning SDN network includes the following steps:
s1, constructing a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and arranging the deep reinforcement learning model in an application layer of the SDN network;
S2, randomly initializing the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initializing the local actor parameter θ'_a of the actor network and the local evaluator parameter θ'_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ'_a and the local evaluator parameter θ'_c, performing off-line training of the deep reinforcement learning model on the i-th local GPU_i using the A3C algorithm, and updating the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, applying the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
S6, periodically detecting whether the topological structure of the SDN network changes; if so, proceeding to step S7, otherwise repeating step S6;
S7, carrying out on-line training of the deep reinforcement learning model, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the self-adaptive algorithm, applying the updated actor network parameter θ_a and evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
where i = 1, 2, ..., L, and L denotes the total number of local GPUs.
The actor network in step S1 is a fully connected neural network, and the evaluator network in step S1 is a combined network of a fully connected neural network and a CNN convolutional neural network; the inputs of the actor network and the evaluator network both comprise the network state of the SDN network, where the network state comprises current node information, destination node information, bandwidth requirement and delay requirement, and the input of the evaluator network further comprises network characteristics of the SDN network processed by the CNN convolutional neural network.
As shown in fig. 2, the CNN convolutional neural network includes an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer, which are connected in sequence.
In step S1, the reward function is:

r(s_n, a_n) = [reward function equation, given as an image in the original publication]

wherein r(s_n, a_n) denotes the reward value obtained after the n-th routing node in the SDN network takes action a_n toward the m-th routing node in state s_n; g denotes an action penalty, a_1 denotes a first weight, a_2 denotes a second weight, c(n) denotes the remaining capacity of the n-th routing node, c(m) denotes the remaining capacity of the m-th routing node, c(l) denotes the remaining capacity of the l-th link in the SDN network, d(n) denotes the degree of difference in traffic load between the n-th routing node and its neighboring nodes, and d(m) denotes the degree of difference in traffic load between the m-th routing node and its neighboring nodes; the state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet; the action a_n denotes all forwarding operations that may be taken in state s_n.
The step S4 includes the following sub-steps:
S41, setting the first counter t = 0, the second counter T = 0, the maximum iteration number T_max and the routing hop count limit t_max;
S42, setting dθ_a = 0 and dθ_c = 0, and synchronizing the local parameters with the global parameters: the value of the local actor parameter θ'_a is synchronized to the value of the actor network parameter θ_a, and the value of the local evaluator parameter θ'_c is synchronized to the value of the evaluator network parameter θ_c;
S43, setting the first intermediate count value t_start = t, and reading the state s_t at the current moment through the local GPU_i;
S44, obtaining the policy π(a_t | s_t; θ'_a) through the actor network and executing action a_t according to the policy π(a_t | s_t; θ'_a), where π(a_t | s_t; θ'_a) denotes that the action to be executed is a_t in state s_t when the local actor parameter of the local GPU_i is θ'_a;
S45, obtaining the reward value r_t and the new state s_{t+1} after executing action a_t, and incrementing the count value of the first counter t by one;
S46, judging whether the new state s_t satisfies the condition defined by the final state; if so, setting the updated reward value R = 0 and proceeding to step S48, otherwise proceeding to step S47;
S47, judging whether t − t_start is greater than the routing hop count limit t_max; if so, setting the updated reward value R = V(s_t, θ'_c) and proceeding to step S48, otherwise returning to step S44, where V(s_t, θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_t;
S48, setting the third counter z = t − 1, setting the gradient-update reward value R_updata = r_z + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S49, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_z | s_z; θ'_a) · (R_updata − V(s_z; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_z | s_z; θ'_a) denotes the logarithm of the probability of executing action a_z under parameter θ'_a and state s_z, r_z denotes the reward for executing action a_z, γ denotes the reward discount rate, V(s_z; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_z, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_z; θ'_c))² with respect to θ'_c;
S410, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the third counter z is equal to the first intermediate count value t_start; if so, proceeding to step S411, otherwise decrementing the count value of the third counter z by one, updating the gradient-update reward value R_updata to r_z + γR, and returning to step S49;
S411, judging whether the second counter T is greater than or equal to the maximum iteration number T_max; if so, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, and ending the update process; otherwise, incrementing the count value of the second counter T by one and returning to step S42.
In step S411, the formulas for updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively are:
θ_{a_updata} = θ_a + βΔθ_a
θ_{c_updata} = θ_c + βΔθ_c
wherein θ_{a_updata} denotes the updated actor network parameter θ_a, θ_{c_updata} denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
The step S7 includes the following sub-steps:
S71, setting the fourth counter j = 1, and collecting a routing request task f;
S72, distributing the routing request task f to an idle GPU in the SDN network, the idle GPU being denoted GPU_idle;
S73, setting dθ_a = 0 and dθ_c = 0, synchronizing the local actor parameter θ'_a of GPU_idle to the value of the actor network parameter θ_a, and synchronizing the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S74, setting the second intermediate count value j_start = j, and reading the initial state s_j at the current moment;
S75, obtaining through the actor network the policy π(a_j | s_j; θ'_a) for executing action a_j in state s_j with local actor parameter θ'_a, and executing the policy π(a_j | s_j; θ'_a);
S76, obtaining the reward value r_j and the new state s_{j+1} after executing action a_j, incrementing the count value of the fourth counter j by one, and adding action a_j to the action set A;
S77, judging whether the new state s_j satisfies the condition defined by the final state of the routing request task f; if so, proceeding to step S78, otherwise returning to step S75;
S78, obtaining the routing path p from the action set A, and judging whether the routing request task f matches the routing path p; if so, setting the updated reward value R = 0 and proceeding to step S79, otherwise setting the updated reward value R = V(s_j, θ'_c) and proceeding to step S79;
S79, setting the fifth counter k = j − 1, setting the gradient-update reward value R_updata = r_k + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S710, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_k | s_k; θ'_a) · (R_updata − V(s_k; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_k | s_k; θ'_a) denotes the logarithm of the probability of executing action a_k under parameter θ'_a and state s_k, r_k denotes the reward for executing action a_k, γ denotes the reward discount rate, V(s_k; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_k, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_k; θ'_c))² with respect to θ'_c;
S711, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the fifth counter k is equal to the second intermediate count value j_start; if so, proceeding to step S712, otherwise decrementing the count value of the fifth counter k by one, updating the gradient-update reward value R_updata to r_k + γR, and returning to step S710;
S712, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, applying the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters.
As shown in fig. 3, in the present embodiment, the deep reinforcement learning model comprises an actor-evaluator pair constructed with neural networks (NN). The actor network outputs a probability distribution over all actions in a given state, i.e. the routing policy, and is therefore a multi-output neural network. The evaluator network uses the temporal-difference error to evaluate the actor's policy and is a single-output neural network. The actor network is a fully connected neural network: after the current node, destination node information, bandwidth requirement, delay requirement and similar data are input, each neural network node computes a weighted sum followed by an activation function, and several results are output. The actor network gives the next action according to the current state; since the action has several choices, the network has multiple outputs, namely the probabilities of the several routing choices. The evaluator network takes the same four items of network information as input plus an additional network-characteristics input, and its output is the evaluation of the actor network's policy, so it has a single output. The extra network-characteristics input carries the change information of the network, so real-time network state changes are taken into account when evaluating the actor network's policy, which makes the intelligent routing adaptive.
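To make the two roles concrete, the following small self-contained example (with assumed numeric values, not values from the patent) shows how the actor's probability output can be sampled to pick the next hop and how the evaluator's temporal-difference error scores that choice.

```python
import torch

# Actor output: probability distribution over candidate next hops (assumed values).
probs = torch.tensor([0.10, 0.55, 0.25, 0.10])
next_hop = torch.multinomial(probs, 1).item()   # routing choice sampled from pi(a|s)

# Evaluator output: value estimates before and after the hop (assumed values).
v_s, v_s_next, r, gamma = 0.40, 0.70, 0.20, 0.99
td_error = r + gamma * v_s_next - v_s           # temporal-difference error
# A positive td_error means the chosen hop did better than the evaluator
# expected, so the log-probability of that action would be reinforced.
print(next_hop, round(td_error, 3))
```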

Claims (6)

1. A SDN intelligent routing data transmission method for distributed deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a reward function and a deep reinforcement learning model comprising an actor network and an evaluator network, and arranging the deep reinforcement learning model in an application layer of the SDN network;
S2, randomly initializing the actor network parameter θ_a and the evaluator network parameter θ_c of the deep reinforcement learning model;
S3, randomly initializing the local actor parameter θ'_a of the actor network and the local evaluator parameter θ'_c of the evaluator network on the i-th local GPU_i in the control layer of the SDN network;
S4, according to the reward function, the actor network parameter θ_a, the evaluator network parameter θ_c, the local actor parameter θ'_a and the local evaluator parameter θ'_c, performing off-line training of the deep reinforcement learning model on the i-th local GPU_i using the A3C algorithm, and updating the actor network parameter θ_a and the evaluator network parameter θ_c;
S5, applying the updated actor network parameter θ_a and the updated evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
S6, periodically detecting whether the topological structure of the SDN network changes; if so, proceeding to step S7, otherwise repeating step S6;
S7, carrying out on-line training of the deep reinforcement learning model, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the self-adaptive algorithm, applying the updated actor network parameter θ_a and evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters;
where i = 1, 2, ..., L, and L denotes the total number of local GPUs.
2. The SDN network smart routing data transmission method of claim 1, wherein the actor network in step S1 is a fully connected neural network, and the evaluator network in step S1 is a combined network of a fully connected neural network and a CNN convolutional neural network; the inputs of the actor network and the evaluator network both comprise the network state of the SDN network, where the network state comprises current node information, destination node information, bandwidth requirement and delay requirement, and the input of the evaluator network further comprises network characteristics of the SDN network processed by the CNN convolutional neural network; the CNN convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer which are sequentially connected.
3. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 1, wherein the reward function in step S1 is:

r(s_n, a_n) = [reward function equation, given as an image in the original publication]

wherein r(s_n, a_n) denotes the reward value obtained after the n-th routing node in the SDN network takes action a_n toward the m-th routing node in state s_n; g denotes an action penalty, a_1 denotes a first weight, a_2 denotes a second weight, c(n) denotes the remaining capacity of the n-th routing node, c(m) denotes the remaining capacity of the m-th routing node, c(l) denotes the remaining capacity of the l-th link in the SDN network, d(n) denotes the degree of difference in traffic load between the n-th routing node and its neighboring nodes, and d(m) denotes the degree of difference in traffic load between the m-th routing node and its neighboring nodes; the state s_n comprises: the node where the data packet currently resides (the n-th routing node), the final destination node of the data packet, the forwarding bandwidth requirement of the data packet, and the delay requirement of the data packet; the action a_n denotes all forwarding operations that may be taken in state s_n.
4. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 1, wherein the step S4 includes the following sub-steps:
S41, setting the first counter t = 0, the second counter T = 0, the maximum iteration number T_max and the routing hop count limit t_max;
S42, setting dθ_a = 0 and dθ_c = 0, and synchronizing the local parameters with the global parameters: the value of the local actor parameter θ'_a is synchronized to the value of the actor network parameter θ_a, and the value of the local evaluator parameter θ'_c is synchronized to the value of the evaluator network parameter θ_c;
S43, setting the first intermediate count value t_start = t, and reading the state s_t at the current moment through the local GPU_i;
S44, obtaining the policy π(a_t | s_t; θ'_a) through the actor network and executing action a_t according to the policy π(a_t | s_t; θ'_a), where π(a_t | s_t; θ'_a) denotes that the action to be executed is a_t in state s_t when the local actor parameter of the local GPU_i is θ'_a;
S45, obtaining the reward value r_t and the new state s_{t+1} after executing action a_t, and incrementing the count value of the first counter t by one;
S46, judging whether the new state s_t satisfies the condition defined by the final state; if so, setting the updated reward value R = 0 and proceeding to step S48, otherwise proceeding to step S47;
S47, judging whether t − t_start is greater than the routing hop count limit t_max; if so, setting the updated reward value R = V(s_t, θ'_c) and proceeding to step S48, otherwise returning to step S44, where V(s_t, θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_t;
S48, setting the third counter z = t − 1, setting the gradient-update reward value R_updata = r_z + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S49, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_z | s_z; θ'_a) · (R_updata − V(s_z; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_z | s_z; θ'_a) denotes the logarithm of the probability of executing action a_z under parameter θ'_a and state s_z, r_z denotes the reward for executing action a_z, γ denotes the reward discount rate, V(s_z; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_z, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_z; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_z; θ'_c))² with respect to θ'_c;
S410, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the third counter z is equal to the first intermediate count value t_start; if so, proceeding to step S411, otherwise decrementing the count value of the third counter z by one, updating the gradient-update reward value R_updata to r_z + γR, and returning to step S49;
S411, judging whether the second counter T is greater than or equal to the maximum iteration number T_max; if so, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, and ending the update process; otherwise, incrementing the count value of the second counter T by one and returning to step S42.
5. The SDN network smart routing data transmission method of claim 4, wherein in step S411 the formulas for updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively are:
θ_{a_updata} = θ_a + βΔθ_a
θ_{c_updata} = θ_c + βΔθ_c
wherein θ_{a_updata} denotes the updated actor network parameter θ_a, θ_{c_updata} denotes the updated evaluator network parameter θ_c, and β denotes the weight of the local GPU_i in the SDN network.
6. The SDN network smart routing data transmission method of distributed deep reinforcement learning according to claim 4, wherein the step S7 includes the following sub-steps:
S71, setting the fourth counter j = 1, and collecting a routing request task f;
S72, distributing the routing request task f to an idle GPU in the SDN network, the idle GPU being denoted GPU_idle;
S73, setting dθ_a = 0 and dθ_c = 0, synchronizing the local actor parameter θ'_a of GPU_idle to the value of the actor network parameter θ_a, and synchronizing the local evaluator parameter θ'_c to the value of the evaluator network parameter θ_c;
S74, setting the second intermediate count value j_start = j, and reading the initial state s_j at the current moment;
S75, obtaining through the actor network the policy π(a_j | s_j; θ'_a) for executing action a_j in state s_j with local actor parameter θ'_a, and executing the policy π(a_j | s_j; θ'_a);
S76, obtaining the reward value r_j and the new state s_{j+1} after executing action a_j, incrementing the count value of the fourth counter j by one, and adding action a_j to the action set A;
S77, judging whether the new state s_j satisfies the condition defined by the final state of the routing request task f; if so, proceeding to step S78, otherwise returning to step S75;
S78, obtaining the routing path p from the action set A, and judging whether the routing request task f matches the routing path p; if so, setting the updated reward value R = 0 and proceeding to step S79, otherwise setting the updated reward value R = V(s_j, θ'_c) and proceeding to step S79;
S79, setting the fifth counter k = j − 1, setting the gradient-update reward value R_updata = r_k + γR, and initializing the gradient Δθ_a of the actor network parameter and the gradient Δθ_c of the evaluator network parameter to 0;
S710, according to the gradient-update reward value R_updata, the local actor parameter θ'_a and the local evaluator parameter θ'_c, obtaining the update values of the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c as:
Δθ_{a_updata} = Δθ_a + ∇_{θ'_a} log π(a_k | s_k; θ'_a) · (R_updata − V(s_k; θ'_c))
Δθ_{c_updata} = Δθ_c + ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c
wherein Δθ_{a_updata} denotes the update value of the gradient Δθ_a, ∇_{θ'_a} denotes the derivative with respect to the local actor parameter θ'_a, log π(a_k | s_k; θ'_a) denotes the logarithm of the probability of executing action a_k under parameter θ'_a and state s_k, r_k denotes the reward for executing action a_k, γ denotes the reward discount rate, V(s_k; θ'_c) denotes the routing policy evaluation value of the evaluator network, with local evaluator parameter θ'_c, for reaching state s_k, Δθ_{c_updata} denotes the update value of the gradient Δθ_c, and ∂(R_updata − V(s_k; θ'_c))² / ∂θ'_c denotes the partial derivative of (R_updata − V(s_k; θ'_c))² with respect to θ'_c;
S711, setting Δθ_a = Δθ_{a_updata}, Δθ_c = Δθ_{c_updata} and R = R_updata, and judging whether the fifth counter k is equal to the second intermediate count value j_start; if so, proceeding to step S712, otherwise decrementing the count value of the fifth counter k by one, updating the gradient-update reward value R_updata to r_k + γR, and returning to step S710;
S712, updating the actor network parameter θ_a and the evaluator network parameter θ_c with the local actor parameter gradient Δθ_a and the local evaluator parameter gradient Δθ_c respectively, applying the actor network parameter θ_a and the evaluator network parameter θ_c to the whole SDN network, and transmitting data using the SDN network with the updated parameters.
CN202010673851.8A 2020-07-14 2020-07-14 SDN intelligent routing data transmission method for distributed deep reinforcement learning Expired - Fee Related CN111917642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010673851.8A CN111917642B (en) 2020-07-14 2020-07-14 SDN intelligent routing data transmission method for distributed deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010673851.8A CN111917642B (en) 2020-07-14 2020-07-14 SDN intelligent routing data transmission method for distributed deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111917642A true CN111917642A (en) 2020-11-10
CN111917642B CN111917642B (en) 2021-04-27

Family

ID=73280083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010673851.8A Expired - Fee Related CN111917642B (en) 2020-07-14 2020-07-14 SDN intelligent routing data transmission method for distributed deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111917642B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818788A (en) * 2021-01-25 2021-05-18 电子科技大学 Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster
CN113316216A (en) * 2021-05-26 2021-08-27 电子科技大学 Routing method for micro-nano satellite network
CN113537628A (en) * 2021-08-04 2021-10-22 郭宏亮 General reliable shortest path algorithm based on distributed reinforcement learning
CN114051272A (en) * 2021-10-30 2022-02-15 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent routing method for dynamic topological network

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150269479A1 (en) * 2014-03-24 2015-09-24 Qualcomm Incorporated Conversion of neuron types to hardware
CN106873585A (en) * 2017-01-18 2017-06-20 无锡辰星机器人科技有限公司 One kind navigation method for searching, robot and system
CN108600104A (en) * 2018-04-28 2018-09-28 电子科技大学 A kind of SDN Internet of Things flow polymerizations based on tree-shaped routing
CN108803615A (en) * 2018-07-03 2018-11-13 东南大学 A kind of visual human's circumstances not known navigation algorithm based on deeply study
US20190014488A1 (en) * 2017-07-06 2019-01-10 Futurewei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN109343341A (en) * 2018-11-21 2019-02-15 北京航天自动控制研究所 It is a kind of based on deeply study carrier rocket vertically recycle intelligent control method
CN109803344A (en) * 2018-12-28 2019-05-24 北京邮电大学 A kind of unmanned plane network topology and routing joint mapping method
US10396919B1 (en) * 2017-05-12 2019-08-27 Virginia Tech Intellectual Properties, Inc. Processing of communications signals using machine learning
CN110472880A (en) * 2019-08-20 2019-11-19 李峰 Evaluate the method, apparatus and storage medium of collaborative problem resolution ability
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
US20190354859A1 (en) * 2018-05-18 2019-11-21 Deepmind Technologies Limited Meta-gradient updates for training return functions for reinforcement learning systems
CN110515303A (en) * 2019-09-17 2019-11-29 余姚市浙江大学机器人研究中心 A kind of adaptive dynamic path planning method based on DDQN
CN110611619A (en) * 2019-09-12 2019-12-24 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110770761A (en) * 2017-07-06 2020-02-07 华为技术有限公司 Deep learning system and method and wireless network optimization using deep learning
CN111010294A (en) * 2019-11-28 2020-04-14 国网甘肃省电力公司电力科学研究院 Electric power communication network routing method based on deep reinforcement learning
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
CN111316295A (en) * 2017-10-27 2020-06-19 渊慧科技有限公司 Reinforcement learning using distributed prioritized playback

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150269479A1 (en) * 2014-03-24 2015-09-24 Qualcomm Incorporated Conversion of neuron types to hardware
CN106873585A (en) * 2017-01-18 2017-06-20 无锡辰星机器人科技有限公司 One kind navigation method for searching, robot and system
US10396919B1 (en) * 2017-05-12 2019-08-27 Virginia Tech Intellectual Properties, Inc. Processing of communications signals using machine learning
US20190014488A1 (en) * 2017-07-06 2019-01-10 Futurewei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN110770761A (en) * 2017-07-06 2020-02-07 华为技术有限公司 Deep learning system and method and wireless network optimization using deep learning
CN111316295A (en) * 2017-10-27 2020-06-19 渊慧科技有限公司 Reinforcement learning using distributed prioritized playback
CN108600104A (en) * 2018-04-28 2018-09-28 电子科技大学 A kind of SDN Internet of Things flow polymerizations based on tree-shaped routing
US20190354859A1 (en) * 2018-05-18 2019-11-21 Deepmind Technologies Limited Meta-gradient updates for training return functions for reinforcement learning systems
CN108803615A (en) * 2018-07-03 2018-11-13 东南大学 A kind of visual human's circumstances not known navigation algorithm based on deeply study
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
CN109343341A (en) * 2018-11-21 2019-02-15 北京航天自动控制研究所 It is a kind of based on deeply study carrier rocket vertically recycle intelligent control method
CN109803344A (en) * 2018-12-28 2019-05-24 北京邮电大学 A kind of unmanned plane network topology and routing joint mapping method
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110472880A (en) * 2019-08-20 2019-11-19 李峰 Evaluate the method, apparatus and storage medium of collaborative problem resolution ability
CN110611619A (en) * 2019-09-12 2019-12-24 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110515303A (en) * 2019-09-17 2019-11-29 余姚市浙江大学机器人研究中心 A kind of adaptive dynamic path planning method based on DDQN
CN111010294A (en) * 2019-11-28 2020-04-14 国网甘肃省电力公司电力科学研究院 Electric power communication network routing method based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LINGXIN ZHANG: "Multi-task Deep Reinforcement Learning for Scalable Parallel Task Scheduling", 《2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)》 *
兰巨龙: "Routing optimization mechanism for software-defined networks based on deep reinforcement learning", 《电子与信息学报》 (Journal of Electronics & Information Technology) *
章小宁: "Research on a new two-level mapping system in name-address separated networks", 《电子与信息学报》 (Journal of Electronics & Information Technology) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818788A (en) * 2021-01-25 2021-05-18 电子科技大学 Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster
CN112818788B (en) * 2021-01-25 2022-05-03 电子科技大学 Distributed convolutional neural network hierarchical matching method based on unmanned aerial vehicle cluster
CN113316216A (en) * 2021-05-26 2021-08-27 电子科技大学 Routing method for micro-nano satellite network
CN113316216B (en) * 2021-05-26 2022-04-08 电子科技大学 Routing method for micro-nano satellite network
CN113537628A (en) * 2021-08-04 2021-10-22 郭宏亮 General reliable shortest path algorithm based on distributed reinforcement learning
CN113537628B (en) * 2021-08-04 2023-08-22 郭宏亮 Universal reliable shortest path method based on distributed reinforcement learning
CN114051272A (en) * 2021-10-30 2022-02-15 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent routing method for dynamic topological network

Also Published As

Publication number Publication date
CN111917642B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN111917642B (en) SDN intelligent routing data transmission method for distributed deep reinforcement learning
CN110611619B (en) Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN112437020B (en) Data center network load balancing method based on deep reinforcement learning
CN114697229B (en) Construction method and application of distributed routing planning model
CN111988225A (en) Multi-path routing method based on reinforcement learning and transfer learning
CN103971160B (en) particle swarm optimization method based on complex network
WO2020172825A1 (en) Method and apparatus for determining transmission policy
CN116527567B (en) Intelligent network path optimization method and system based on deep reinforcement learning
CN113570039B (en) Block chain system based on reinforcement learning optimization consensus
CN113395207B (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
CN112486690A (en) Edge computing resource allocation method suitable for industrial Internet of things
CN113784410B (en) Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm
CN110328668B (en) Mechanical arm path planning method based on speed smooth deterministic strategy gradient
CN114415735B (en) Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN113938415B (en) Network route forwarding method and system based on link state estimation
CN113821041A (en) Multi-robot collaborative navigation and obstacle avoidance method
CN117041129A (en) Low-orbit satellite network flow routing method based on multi-agent reinforcement learning
CN111340192B (en) Network path allocation model training method, path allocation method and device
Fuji et al. Deep multi-agent reinforcement learning using dnn-weight evolution to optimize supply chain performance
CN114205251B (en) Switch link resource prediction method based on space-time characteristics
CN115714741A (en) Routing decision method and system based on collaborative multi-agent reinforcement learning
CN115225561A (en) Route optimization method and system based on graph structure characteristics
CN117014355A (en) TSSDN dynamic route decision method based on DDPG deep reinforcement learning algorithm
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210427