CN112968834A - SDN route convergence method under reinforcement learning based on network characteristics - Google Patents
SDN route convergence method under reinforcement learning based on network characteristics
- Publication number
- CN112968834A (application CN202110145046.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- network
- theta
- reinforcement learning
- agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/122—Shortest path evaluation by minimising distances, e.g. by selecting a route with minimum of number of hops
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/22—Alternate routing
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses an SDN route convergence method under reinforcement learning based on network characteristics. The method applies reinforcement learning to SDN route convergence, uses the Q-learning algorithm as the reinforcement learning model, and defines a direction factor theta that describes the direction of each transfer in a path according to the input network topology. During path transfer, the theta value guides the exploration of the reinforcement learning agent. In early episodes, the agent is allowed to select actions whose theta value is negative during the exploration phase; as the episodes iterate, the probability of exploring actions with a negative theta value is reduced. The agent therefore gains sufficient experience from the environment while exploration efficiency improves and loop formation during training is reduced. By exploiting reinforcement learning's continuous interaction with the network environment and its strategy adjustment, the method can find the optimal path during route convergence, which traditional route convergence algorithms cannot guarantee.
Description
Technical Field
The invention relates to the fields of network communication technology and reinforcement learning, and in particular to an SDN route convergence method under reinforcement learning based on network characteristics.
Background
The reinforcement learning process can be summarized as an agent learning a mapping from environmental states to actions so that the accumulated reward is maximized. In route planning, the agent receives the current state and reward information from the routing system; the action it selects is the input the routing system receives from the agent, and the action and reward in the current routing system influence the agent's action selection for a long time afterwards. In the whole route planning system, the agent must learn the optimal actions to maximize the accumulated reward, and the actions it selects form the optimal path for the traffic. Among reinforcement learning methods, Q-learning does not depend on an environment model, and in a finite Markov decision process it can find the optimal policy; what the agent needs to do is keep trying in the system in order to learn a policy. A policy is evaluated by the accumulated reward obtained after executing it, and the best policy selects the action with the maximum Q value in each state. Exploration means the agent selects actions it has not performed before; exploitation means the agent takes the current optimal action based on previously learned experience. In the invention, exploration selects links that were not selected before, searching for more possibilities, while exploitation refines the known plan by selecting the currently preferred link.
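As a concrete illustration of the tabular Q-learning described above, the following minimal sketch shows the Bellman update that drives the Q value table; the function name q_update and the toy two-state table are illustrative assumptions, not part of the patent. The learning rate 0.8 and discount rate 0.6 follow the values stated later in the embodiment.

```python
# Tabular Q-learning update sketch. q_update and the toy table are
# illustrative; alpha/gamma defaults follow the embodiment (0.8, 0.6).
def q_update(Q, s, a, r, s_next, alpha=0.8, gamma=0.6):
    """One Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Toy two-state example: state 0 has one action (go to state 1).
Q = {0: {1: 0.0}, 1: {}}
q_update(Q, s=0, a=1, r=1.0, s_next=1)  # Q[0][1] becomes 0.8 * 1.0
```

With an empty row for the successor state, one update moves Q[0][1] from 0 to alpha * r = 0.8, illustrating how reward propagates into the table.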
If shortest path planning is realized with reinforcement learning, the agent's exploration tends toward nodes of smaller and smaller depth. In a network, besides link length, the link bandwidth, delay, hop count, and so on can serve as dominant characteristics, and several characteristics can be combined by weighting into a new one. Through the dominant characteristics of the network topology, the agent can be guided from highly random behavior in the exploration phase toward efficient exploration, so that the learning network converges more quickly. To avoid convergence to sub-optimal solutions, the agent's exploration behavior is allowed to be highly random in the initial phase of training. As training steps increase, increasing the gradient difference of each link's dominant characteristics lets the agent transition from highly random exploration to efficient exploration, improving convergence speed while still guaranteeing convergence to the optimal solution.
Disclosure of Invention
The invention combines reinforcement learning to provide an SDN route convergence method under reinforcement learning based on network characteristics, solving problems of current route convergence algorithms such as easily forming loops and failing to find the optimal path.
The technical scheme adopted by the invention for solving the technical problem is as follows: an SDN route convergence method under reinforcement learning based on network characteristics comprises the following steps:
step 1: establishing an SDN network area topological graph, and dividing network areas in a fine-grained manner;
step 2: defining a direction factor theta and setting a source node, wherein theta ∈ {-1, 0, 1}: a transfer from one node to another with theta = -1 moves closer to the source node, a transfer with theta = 1 moves away from the source node, and theta = 0 means the two nodes have the same shortest distance to the source node; a network topology hierarchy diagram is constructed according to the theta values between different nodes of the topology and their relationship to the source node, and all nodes in each layer have the same shortest distance to the source node;
and step 3: using the Q-learning algorithm as the reinforcement learning model, inputting the network topology hierarchy diagram obtained in step 2 into the model to guide agent exploration; when the height difference tends to 0, the nodes are in the same layer, transitions between nodes in the same layer are little affected by the layering, and the agent appears in a random exploration state; as the height difference between layers keeps increasing, the agent is guided to explore the lower layer and appears in an efficient exploration state.
The following formula is set:
h(θ)=f(θt)
wherein t represents the episode iteration number and step is a set threshold. f takes the absolute value of theta according to the episode's iteration progress, and h(theta) determines the specific value of theta according to the state of the corresponding action. As the episodes iterate, among the selectable actions, the theta value corresponding to an action moving closer to the source node becomes smaller and the theta value corresponding to an action moving away from the source node becomes larger, so the interval D is defined:
The interval D ranges from 0 to the sum of the theta values corresponding to all currently selectable actions; it is divided into n subintervals, where n is the number of currently selectable actions, and the length of each subinterval is the theta value of its corresponding action.
η=random(D)
The random number eta is obtained by sampling the interval D with equal probability.
Calculated by the following formula:
the function g (D, η) is the action corresponding to the interval D where the random number η is located, i.e. the agent searches for the action a to be selected.
The strategy formula of reinforcement learning based on the network area characteristics is as follows:
the epsilon-greedy policy balances exploration and utilization based on an exploration factor epsilon (epsilon 0, 1). Generating a random number sigma (sigma belongs to [0,1]), and when sigma is less than or equal to epsilon, using a random strategy by agent to explore the environment through randomly selecting actions to obtain experience; when σ > ε, agent uses a greedy strategy to leverage the experience that has been gained.
When the agent makes a state transition, the current state s and the selected action a are input into a function R, which generates a reward to evaluate the transition; the reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
R is the reward obtained at node i when selecting a link to node j. α, β, γ and δ are four positive parameters weighing the four parts of the reward. B is the residual bandwidth of the link corresponding to the selected action, and t is the delay of the corresponding link. δ(s_ - d) is the stimulus function, where s_ denotes the state reached after selecting action a in state s.
The Q value table is trained according to the set reward function, and a path Routing is obtained from the trained table; this path is the converged optimal path after the link failure.
Further, the fine-grained network area division specifically comprises: constructing a network connection matrix according to the SDN network topology, the matrix containing the adjacency relations among the network's nodes; inputting the connection matrix and the number of nodes n in the topology into a hub node election algorithm, and recording each node's number of link connections, expressed as:
wherein node_link[i] is the link connection count of node i and T[i][j] indicates a link of node i; the node with the highest link connection count is elected as the hub node. The hub node and its adjacent nodes form one divided network area.
Further, the process of training the Q value table by the Q-learning algorithm is specifically as follows:
The maximum number of steps for a single training episode is set.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) judging whether s_ is the destination node; if not, set s = s_ and return to (2); if s_ is the destination node, the training ends.
Further, link bandwidth performance and link delay are the main concerns when planning the backup path, so α = 0.4, β = 0.3, γ = 0.1 and δ = 0.2 are set.
The invention has the beneficial effects that: the present invention defines a direction factor theta to describe the direction of each transition in the path. And guiding the reinforcement learning agent to explore according to the theta value in the path transfer process. Therefore, the exploration efficiency is improved while the agent obtains sufficient experience from the environment, and the generation of loops in the training phase is reduced. Compared with the traditional route convergence algorithm, the method can find the optimal path in the route convergence process by utilizing the characteristics of continuous interaction and strategy adjustment between reinforcement learning and the network environment.
Drawings
Figure 1 is a SDN network topology diagram;
fig. 2 is a network topology hierarchy diagram.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
Aiming at the problem that existing SDN controllers adopt the Dijkstra algorithm as the shortest-path route convergence algorithm, the invention applies reinforcement learning to SDN route convergence. Exploiting SDN's separation of forwarding and control, the network topology environment is used directly for training the Q value table. Considering that the residual bandwidth of each link in the topology changes dynamically as different flows are forwarded, the invention introduces reinforcement learning and uses its ability to explore the environment by itself to cope with the dynamics of the network environment, finding an optimal route convergence path while guaranteeing route convergence speed.
The invention provides an SDN route convergence method under reinforcement learning based on network characteristics, which comprises the following steps:
step 1: establishing an SDN network area topology diagram and dividing network areas in a fine-grained manner, specifically: constructing a network connection matrix according to the SDN network topology, the matrix containing the adjacency relations among the network's nodes; inputting the connection matrix and the number of nodes n in the topology into a hub node election algorithm, and recording each node's number of link connections, expressed as:
wherein node_link[i] is the link connection count of node i and T[i][j] indicates a link of node i; the node with the highest link connection count is elected as the hub node. The hub node and its adjacent nodes form one divided network area, as shown in fig. 1.
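The hub-node election just described (count each node's links in the connection matrix T, then pick the node with the most) can be sketched as follows; the function name elect_hub and the 4-node toy matrix are illustrative assumptions, not from the patent:

```python
# Hub-node election sketch: T[i][j] == 1 iff nodes i and j are adjacent.
# node_link[i] is node i's link connection count; the node with the highest
# count is elected as the hub. Names are illustrative.
def elect_hub(T, n):
    node_link = [sum(T[i][j] for j in range(n)) for i in range(n)]
    hub = max(range(n), key=lambda i: node_link[i])
    return hub, node_link

# 4-node toy example: node 0 connects to all others, so it becomes the hub.
T = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
hub, node_link = elect_hub(T, 4)  # hub == 0, node_link == [3, 1, 1, 1]
```

The hub (node 0 here) together with its adjacent nodes would then form one divided network area.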
Step 2: defining a direction factor theta and setting a source node, wherein theta ∈ {-1, 0, 1}: a transfer from one node to another with theta = -1 moves closer to the source node, a transfer with theta = 1 moves away from the source node, and theta = 0 means the two nodes have the same shortest distance to the source node; a network topology hierarchy diagram is constructed according to the theta values between different nodes of the topology and their relationship to the source node, as shown in fig. 2, and all nodes in each layer have the same shortest distance to the source node;
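A minimal sketch of how the direction factor theta could be derived from a topology: compute each node's shortest hop distance to the source by breadth-first search, then compare depths. The function names bfs_depth and theta and the 4-node topology are assumptions for illustration:

```python
from collections import deque

# Direction-factor sketch: depth[v] is v's shortest hop distance to the
# source; theta(u, v) is +1 for moving away from the source, -1 for moving
# closer, 0 when both nodes sit in the same layer. Names are illustrative.
def bfs_depth(adj, source):
    depth = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                q.append(v)
    return depth

def theta(depth, u, v):
    d = depth[v] - depth[u]
    return (d > 0) - (d < 0)  # sign of the depth change

# Small topology: 0 - 1 - 3 and 0 - 2 - 3; nodes 1 and 2 share a layer.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
depth = bfs_depth(adj, source=0)
```

Here theta(depth, 0, 1) is 1 (away from the source), theta(depth, 1, 0) is -1 (toward it), and theta(depth, 1, 2) is 0 (same layer), matching the hierarchy diagram idea of step 2.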
and step 3: using the Q-learning algorithm as the reinforcement learning model, inputting the network topology hierarchy diagram obtained in step 2 into the model to guide agent exploration; when the height difference tends to 0, each node is almost in the same layer, transitions between nodes in the same layer are little affected by the layering, and the agent appears in a random exploration state; as the height difference between layers keeps increasing, the agent is guided to explore the lower layer and appears in an efficient exploration state.
The following formula is set:
h(θ)=f(θt)
wherein t represents the episode iteration number and step is a set threshold. f takes the absolute value of theta according to the episode's iteration progress, and h(theta) determines the specific value of theta according to the state of the corresponding action. As the episodes iterate, among the selectable actions, the theta value corresponding to an action moving closer to the source node becomes smaller and the theta value corresponding to an action moving away from the source node becomes larger, so the interval D is defined:
The interval D ranges from 0 to the sum of the theta values corresponding to all currently selectable actions; it is divided into n subintervals, where n is the number of currently selectable actions, and the length of each subinterval is the theta value of its corresponding action.
η=random(D)
The random number eta is obtained by sampling the interval D with equal probability.
Calculated by the following formula:
the function g (D, η) is the action corresponding to the interval D where the random number η is located, i.e. the agent searches for the action a to be selected.
The strategy formula of reinforcement learning based on the network area characteristics is as follows:
the epsilon-greedy policy balances exploration and utilization based on an exploration factor epsilon (epsilon 0, 1). Generating a random number sigma (sigma belongs to [0,1]), and when sigma is less than or equal to epsilon, using a random strategy by agent to explore the environment through randomly selecting actions to obtain experience; when σ > ε, agent uses a greedy strategy to leverage the experience that has been gained.
When the agent makes a state transition, the current state s and the selected action a are input into a function R, which generates a reward to evaluate the transition; the reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
R is the reward obtained at node i when selecting a link to node j. α, β, γ and δ are four positive parameters weighing the four parts of the reward. B is the residual bandwidth of the link corresponding to the selected action, and t is the delay of the corresponding link. δ(s_ - d) is the stimulus function, where s_ denotes the state reached after selecting action a in state s.
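A sketch of the reward Rt(s, a) = αB - βt + γδ(s_ - d) - δ with the weights the embodiment sets (α = 0.4, β = 0.3, γ = 0.1, δ = 0.2). Treating the stimulus δ(·) as 1 on arrival at the destination and 0 otherwise is an assumption, as are the function and argument names:

```python
# Reward sketch: R = alpha*B - beta*t + gamma*stimulus - delta, where the
# stimulus is assumed to be 1 when the next state is the destination and 0
# otherwise. Weight defaults follow the embodiment; names are illustrative.
def reward(bandwidth, delay, s_next, dest,
           alpha=0.4, beta=0.3, gamma=0.1, delta=0.2):
    stimulus = 1.0 if s_next == dest else 0.0
    return alpha * bandwidth - beta * delay + gamma * stimulus - delta

r_mid  = reward(bandwidth=1.0, delay=0.5, s_next=3, dest=7)  # intermediate hop
r_dest = reward(bandwidth=1.0, delay=0.5, s_next=7, dest=7)  # arrival bonus added
```

The constant -delta term acts as the per-hop forwarding cost, so longer paths accumulate more penalty, while the arrival stimulus rewards reaching the destination.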
The Q value table is trained according to the set reward function, with the following specific steps:
The maximum number of steps for a single training episode is set.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) judging whether s_ is the destination node; if not, set s = s_ and return to (2); if s_ is the destination node, the training ends.
A path Routing is obtained from the trained Q value table; this path is the converged optimal path after the link failure.
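Steps (1)-(4) above, followed by reading the converged path greedily from the trained table, can be sketched end to end. The toy topology, the per-hop reward of -0.1, and all names are illustrative assumptions; the learning rate 0.8, discount rate 0.6 and ε = 0.3 follow the embodiment:

```python
import random

# End-to-end training-loop sketch for steps (1)-(4): initialize Q, then per
# episode select an action (epsilon-greedy), transition, reward, and update,
# until the destination or the per-episode step cap is reached. The topology
# and reward values are toy placeholders, not the patent's 16-switch network.
def train(adj, source, dest, episodes=200, max_steps=50,
          alpha=0.8, gamma=0.6, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    Q = {u: {v: 0.0 for v in nbrs} for u, nbrs in adj.items()}    # (1) init
    for _ in range(episodes):
        s = source
        for _ in range(max_steps):
            if rng.random() <= epsilon:                           # (2) explore
                a = rng.choice(list(Q[s]))
            else:                                                 # (2) exploit
                a = max(Q[s], key=Q[s].get)
            s_next = a                                            # (3) transition
            r = 1.0 if s_next == dest else -0.1                   # toy reward
            best = max(Q[s_next].values()) if Q[s_next] else 0.0
            Q[s][a] += alpha * (r + gamma * best - Q[s][a])       # (3) update
            if s_next == dest:                                    # (4) done?
                break
            s = s_next
    # Greedy walk over the trained table yields the converged path.
    path, s = [source], source
    while s != dest and len(path) <= max_steps:
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

# Toy topology: 0 - 1 - 3 and 0 - 2 - 3; either two-hop route is optimal.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
path = train(adj, source=0, dest=3)
```

After training, the greedy walk recovers a two-hop path from node 0 to node 3, which in this symmetric toy topology is an optimal converged route.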
One specific application example of the present invention is as follows:
step 1: an SDN network area topology diagram is constructed; Mininet is used to build the network topology shown in fig. 1, comprising 16 OpenFlow switches and 5 hosts.
step 2: the Q-learning algorithm is used as the reinforcement learning model; the route convergence method of the invention is modeled as a Markov decision process, and the MDP quadruple of the model is defined as follows:
(1) state set: in the network topology, each switch represents a state; therefore, according to the network topology, the invention defines the network state set as follows:
S = [s1, s2, s3, …, s16]
wherein s1 to s16 represent the 16 OpenFlow switches in the network. The source node information of a data packet indicates its initial state, and the destination node information indicates its termination state. When a data packet reaches the destination node, it reaches the termination state. Once the current data packet reaches the termination state, one round of training ends, and the packet returns to the initial state for the next round of training.
(2) An action space: in an SDN network, the transmission path of a data packet is determined by the network state, i.e. the data packet can only be transmitted at connected network nodes. According to the network topological graph, the invention defines the network connection state as shown in the following formula:
Since data packets can only be transmitted between connected network nodes, the invention defines the action set of each state si ∈ S, according to the network state set and the network connection state, as follows:
A(si)={sj|T[si][sj]=1}
This indicates that when the current state is si, the selectable action set consists of the nodes sj directly connected to si in the network topology, i.e. the current state si will only select a state sj connected to it. For example, the action set of state s1 is A(s1) = {s2, s4}.
(3) state transition: in each round of training, when the data packet is in state si, if the selected action is not the termination state of the round, the data packet moves to the next state.
(4) The reward function:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
the present invention focuses on the link bandwidth performance and the link delay when planning the backup path, and therefore, α is set to 0.4, β is set to 0.3, γ is set to 0.1, and δ is set to 0.2.
In the system model, each time a data packet passes through a switch it receives a negative reward representing its forwarding cost; the more switches it traverses, the more negative reward accumulates and the higher the cost. To increase link bandwidth utilization, packets are encouraged to select links with high bandwidth utilization: each time a packet passes through a switch, it receives a reward equal in size to the link utilization rate. To push the packet to reach the destination node as soon as possible, an extra reward of size 1 is obtained upon arrival at the destination node, expressed by the formula:
in the formula siIndicating the current state, i.e. the current packet is on switch number i, ajIndicating that the switch numbered j is selected.
In the invention, a network region characteristic strategy is adopted to carry out reinforcement learning model training.
After determining the MDP quadruple, when a link fails, a new path is sought from the source node to the destination node, and the Q value table is trained using the Q-learning algorithm:
The maximum number of steps for a single training episode is set.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) judging whether s_ is the destination node; if not, set s = s_ and return to (2); if s_ is the destination node, the training ends.
In the reinforcement learning based route convergence planning process, the learning rate α is set to 0.8, the discount rate γ is set to 0.6, and the ε-greedy action policy uses ε = 0.3.
And obtaining a path Routing according to the trained Q value table, wherein the path is the converged optimal path after the link fails.
The above-described embodiments are intended to illustrate rather than to limit the invention; any modifications and variations within the spirit of the invention and the scope of the appended claims are included.
Claims (4)
1. An SDN route convergence method under reinforcement learning based on network characteristics is characterized by comprising the following steps:
step 1: establishing an SDN network area topological graph, and dividing network areas in a fine-grained manner;
step 2: defining a direction factor theta and setting a source node, wherein theta ∈ {-1, 0, 1}: a transfer from one node to another with theta = -1 moves closer to the source node, a transfer with theta = 1 moves away from the source node, and theta = 0 means the two nodes have the same shortest distance to the source node; a network topology hierarchy diagram is constructed according to the theta values between different nodes of the topology and their relationship to the source node, and all nodes in each layer have the same shortest distance to the source node;
and step 3: using the Q-learning algorithm as the reinforcement learning model, inputting the network topology hierarchy diagram obtained in step 2 into the model to guide agent exploration; when the height difference tends to 0, the nodes are in the same layer, transitions between nodes in the same layer are little affected by the layering, and the agent appears in a random exploration state; as the height difference between layers keeps increasing, the agent is guided to explore the lower layer and appears in an efficient exploration state.
The following formula is set:
h(θ)=f(θt)
wherein t represents the episode iteration number and step is a set threshold. f takes the absolute value of theta according to the episode's iteration progress, and h(theta) determines the specific value of theta according to the state of the corresponding action. As the episodes iterate, among the selectable actions, the theta value corresponding to an action moving closer to the source node becomes smaller and the theta value corresponding to an action moving away from the source node becomes larger, so the interval D is defined:
The interval D ranges from 0 to the sum of the theta values corresponding to all currently selectable actions; it is divided into n subintervals, where n is the number of currently selectable actions, and the length of each subinterval is the theta value of its corresponding action.
η=random(D)
The random number eta is obtained by sampling the interval D with equal probability.
Calculated by the following formula:
the function g (D, η) is the action corresponding to the interval D where the random number η is located, i.e. the agent searches for the action a to be selected.
The strategy formula of reinforcement learning based on the network area characteristics is as follows:
the epsilon-greedy strategy balances exploration and utilization based on an exploration factor epsilon (epsilon 0, 1). Generating a random number sigma (sigma belongs to [0,1]), and when sigma is less than or equal to epsilon, using a random strategy by agent to explore the environment through randomly selecting actions to obtain experience; when σ > ε, agent uses a greedy strategy to leverage the experience that has been gained.
When the agent makes a state transition, the current state s and the selected action a are input into a function R, which generates a reward to evaluate the transition; the reward function is set as:
Rt(s,a)=αB-βt+γδ(s_-d)-δ
R is the reward obtained at node i when selecting a link to node j. α, β, γ and δ are four positive parameters weighing the four parts of the reward. B is the residual bandwidth of the link corresponding to the selected action, and t is the delay of the corresponding link. δ(s_ - d) is the stimulus function, where s_ denotes the state reached after selecting action a in state s.
The Q value table is trained according to the set reward function, and a path Routing is obtained from the trained table; this path is the converged optimal path after the link failure.
2. The SDN route convergence method under reinforcement learning based on network characteristics according to claim 1, wherein the fine-grained network area division specifically comprises: constructing a network connection matrix according to the SDN network topology, the matrix containing the adjacency relations among the network's nodes; inputting the connection matrix and the number of nodes n in the topology into a hub node election algorithm, and recording each node's number of link connections, expressed as:
wherein node_link[i] is the link connection count of node i and T[i][j] indicates a link of node i; the node with the highest link connection count is elected as the hub node. The hub node and its adjacent nodes form one divided network area.
3. The SDN route convergence method under reinforcement learning based on network characteristics of claim 1, wherein the process of training the Q value table by the Q-learning algorithm is specifically as follows:
The maximum number of steps for a single training episode is set.
(1) Initializing a Q value table and a reward function R;
(2) adopting a strategy based on network area characteristics, and selecting an action a;
(3) executing action a, transferring to a state s _, calculating a reward value by using a reward function R, and updating a Q value table;
(4) judging whether s_ is the destination node; if not, set s = s_ and return to (2); if s_ is the destination node, the training ends.
4. The SDN route convergence method under reinforcement learning based on network characteristics of claim 1, wherein link bandwidth performance and link delay are considered when planning the backup path, so α = 0.4, β = 0.3, γ = 0.1 and δ = 0.2 are set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110145046.2A CN112968834B (en) | 2021-02-02 | 2021-02-02 | SDN route convergence method under reinforcement learning based on network characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110145046.2A CN112968834B (en) | 2021-02-02 | 2021-02-02 | SDN route convergence method under reinforcement learning based on network characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112968834A true CN112968834A (en) | 2021-06-15 |
CN112968834B CN112968834B (en) | 2022-05-24 |
Family
ID=76271994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110145046.2A Active CN112968834B (en) | 2021-02-02 | 2021-02-02 | SDN route convergence method under reinforcement learning based on network characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112968834B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106411749A (en) * | 2016-10-12 | 2017-02-15 | State Grid Jiangsu Electric Power Co. Suzhou Power Supply Company | Path selection method for software defined network based on Q learning |
CN108667734A (en) * | 2018-05-18 | 2018-10-16 | Nanjing University of Posts and Telecommunications | A fast routing decision algorithm based on Q-learning and LSTM neural networks |
US20190138948A1 (en) * | 2017-11-09 | 2019-05-09 | Ciena Corporation | Reinforcement learning for autonomous telecommunications networks |
CN111770019A (en) * | 2020-05-13 | 2020-10-13 | Xidian University | Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm |
Non-Patent Citations (1)
Title |
---|
TRUNG V. PHAN et al.: "Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning", 2019 15th International Conference on Network and Service Management (CNSM) * |
Also Published As
Publication number | Publication date |
---|---|
CN112968834B (en) | 2022-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Qi et al. | Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach | |
CN109039942B (en) | Network load balancing system and balancing method based on deep reinforcement learning | |
CN112437020B (en) | Data center network load balancing method based on deep reinforcement learning | |
CN112491714B (en) | Intelligent QoS route optimization method and system based on deep reinforcement learning in SDN environment | |
CN114697229B (en) | Construction method and application of distributed routing planning model | |
CN110488861A (en) | Unmanned plane track optimizing method, device and unmanned plane based on deeply study | |
CN108521375A (en) | The transmission of the network multi-service flow QoS based on SDN a kind of and dispatching method | |
Liu et al. | Drl-or: Deep reinforcement learning-based online routing for multi-type service requirements | |
CN108075975B (en) | Method and system for determining route transmission path in Internet of things environment | |
Singh et al. | OANTALG: an orientation based ant colony algorithm for mobile ad hoc networks | |
CN113194034A (en) | Route optimization method and system based on graph neural network and deep reinforcement learning | |
CN114500360A (en) | Network traffic scheduling method and system based on deep reinforcement learning | |
Quan et al. | Cybertwin-driven DRL-based adaptive transmission scheduling for software defined vehicular networks | |
Zhang et al. | IFS-RL: An intelligent forwarding strategy based on reinforcement learning in named-data networking | |
CN114143264A (en) | Traffic scheduling method based on reinforcement learning in SRv6 network | |
Oužecki et al. | Reinforcement learning as adaptive network routing of mobile agents | |
Dai et al. | Routing optimization meets Machine Intelligence: A perspective for the future network | |
CN110225493A (en) | Based on D2D route selection method, system, equipment and the medium for improving ant colony | |
CN114710439B (en) | Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning | |
Cárdenas et al. | A multimetric predictive ANN-based routing protocol for vehicular ad hoc networks | |
Mani Kandan et al. | Fuzzy hierarchical ant colony optimization routing for weighted cluster in MANET | |
Zhou et al. | Multi-task deep learning based dynamic service function chains routing in SDN/NFV-enabled networks | |
Zhang et al. | A service migration method based on dynamic awareness in mobile edge computing | |
Wei et al. | GRL-PS: Graph embedding-based DRL approach for adaptive path selection | |
CN112968834B (en) | SDN route convergence method under reinforcement learning based on network characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||