CN113194034A - Route optimization method and system based on graph neural network and deep reinforcement learning - Google Patents
Route optimization method and system based on graph neural network and deep reinforcement learning
- Publication number
- CN113194034A (application CN202110435964.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- neural network
- reinforcement learning
- deep reinforcement
- route optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/12—Shortest path evaluation
- H04L45/125—Shortest path evaluation based on throughput or bandwidth
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/38—Flow based routing
Abstract
The invention discloses a route optimization method and system based on a graph neural network and deep reinforcement learning, belonging to the field of network route optimization. The current network state s is measured, and according to the traffic demand that the current network state requests to allocate, the k shortest paths from the source node to the destination node are selected as the action set a; the action set a is input into a graph neural network, which aggregates link characteristics and iteratively updates them, and the estimated Q value of the network state s and the action set a is obtained through a Q function; deep reinforcement learning is performed according to the estimated Q value to obtain the routing strategy in the current network state, which is fed back to the network topology to execute the corresponding routing action. The invention provides a network route optimization architecture based on a graph neural network and deep reinforcement learning, which aims to use the graph neural network to learn the relationships between graph elements in the topology and the rules for composing them, and to use a deep reinforcement learning algorithm to make decisions, thereby optimizing network routing.
Description
Technical Field
The invention belongs to the field of network route optimization, and particularly relates to a route optimization method and system based on a graph neural network and deep reinforcement learning.
Background
In the network field, finding the optimal routing configuration for a given traffic matrix is a fundamental problem and a non-deterministic polynomial (NP) problem. Existing solutions based on deep reinforcement learning (DRL) usually preprocess the network state data, present it as a fixed-size matrix, and then process it with a conventional neural network (e.g., a fully connected or convolutional neural network) to solve the routing optimization problem. Deep reinforcement learning is studied as a key technology for network routing optimization, with the goal of building a self-driven Software Defined Network (SDN). However, the technique does not generalize when applied to different network scenarios, because most existing deep reinforcement learning methods can only be trained on a fixed network topology and cannot generalize to, or operate effectively on, dynamic network topologies. The main reason for this limitation is that computer networks are essentially expressed as graph structures (such as network topologies and routing strategies), while current DRL-based schemes almost entirely use traditional neural network architectures, which are not suited to learning, generalizing or modeling graph-structured information. Even the most advanced deep reinforcement learning methods (such as SoA-DRL) perform poorly when trained on dynamic network topologies and cannot generalize to new network topologies.
In recent years, Graph Neural Networks (GNNs) have been proposed to model and operate on graphs, facilitating relational reasoning and structural generalization. In other words, graph neural networks help learn the relationships between graph elements and the rules that compose them, and have shown unprecedented generalization ability in the field of network modeling and optimization. Network topologies and routing strategies are precisely the kind of graph-structured objects that such algorithms need to learn and optimize.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a route optimization method and system based on a graph neural network and deep reinforcement learning, so as to solve the traffic allocation problem on dynamic and unknown network topologies, which existing deep reinforcement learning methods struggle to handle.
In an SDN-based network topology scenario, the SDN controller has a global view of the current network state and must make routing decisions at the arrival of each traffic demand. This problem can be described as a typical network resource allocation problem: in a network topology, each link has a fixed channel capacity, and the controller is responsible for receiving traffic requests and allocating different bandwidths to each link in real time as needed. Therefore, the key issue is how to allocate bandwidth in a network topology to maximize the traffic through the network topology. In this case, the route optimization problem is defined as: an optimal routing strategy is found for each incoming traffic demand from source to destination.
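The allocation model described above can be sketched in a few lines of Python. The class and names below are illustrative stand-ins, not the patent's implementation: each link has a fixed capacity, and routing a traffic demand over a path consumes bandwidth on every link of that path.

```python
# Minimal sketch of the bandwidth-allocation model (illustrative, assumed names).

class Network:
    def __init__(self, capacities):
        # capacities: dict mapping a link (u, v) -> available capacity
        self.capacity = dict(capacities)

    def can_route(self, path, demand):
        # A path is feasible only if every link on it can carry the demand.
        links = list(zip(path, path[1:]))
        return all(self.capacity.get(l, 0) >= demand for l in links)

    def route(self, path, demand):
        # Allocate the demand on each link of the path; return the
        # traffic actually admitted (0 if the path is infeasible).
        if not self.can_route(path, demand):
            return 0
        for l in zip(path, path[1:]):
            self.capacity[l] -= demand
        return demand

net = Network({("A", "B"): 10, ("B", "C"): 5})
admitted = net.route(["A", "B", "C"], 4)   # feasible: both links have >= 4 free
rejected = net.route(["A", "B", "C"], 3)   # link ("B", "C") now has only 1 left
```

The controller's objective is then to choose, for each arriving demand, the path that keeps as much future traffic admissible as possible.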
To achieve the above object, according to a first aspect of the present invention, there is provided a route optimization method based on a graph neural network and deep reinforcement learning, including the following steps:
S0, measuring the current network state s, and taking the traffic demand that the current network state requests to allocate as the target traffic demand;
S1, selecting the k shortest paths from the source node to the destination node according to the target traffic demand, the set of these shortest paths being called the action set a, where k is a positive integer;
S2, inputting the action set a into a graph neural network to calculate link characteristics, performing aggregation and iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function;
S3, performing deep reinforcement learning according to the estimated Q value to obtain the routing strategy in the current network state, feeding the routing strategy back to the network topology to execute the corresponding routing action, and obtaining a new network state s';
S4, judging, in combination with the new network state s', whether a new traffic demand exists; if so, taking the traffic demand that the new network state s' requests to allocate as the target traffic demand and returning to S1; if not, waiting for the next traffic demand and returning to S0.
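The S0-S4 loop above can be organized roughly as follows. Every function here is a stand-in stub with assumed names (the real state measurement, GNN Q-estimator and path computation are detailed later in the description); only the control flow mirrors the steps.

```python
# Hedged sketch of the S0-S4 decision loop; all components are stubs.
import random

def measure_state():                    # S0: measure current network state s (stub)
    return {"load": random.random()}

def k_shortest_paths(demand, k=4):      # S1: candidate action set a (stub)
    return [f"path_{i}" for i in range(k)]

def q_values(state, actions):           # S2: GNN-based Q estimates (stub)
    return {a: random.random() for a in actions}

def select_action(q, epsilon=0.1):      # S3: epsilon-greedy choice over Q values
    if random.random() < epsilon:
        return random.choice(list(q))   # explore
    return max(q, key=q.get)            # exploit

def optimize_route(demand):
    state = measure_state()             # S0
    actions = k_shortest_paths(demand)  # S1
    q = q_values(state, actions)        # S2
    return select_action(q)             # S3; S4 would loop on the next demand

chosen = optimize_route(demand=8)
```

In the full system, applying the chosen action yields the new state s' and a reward used for training, as described below.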
Preferably, step S2 specifically includes:
calculating link characteristics of each link and adjacent links on the shortest path, aggregating the link characteristics connected with the same node, and updating the link characteristics of each link;
iterating the steps for T times, wherein T is a preset value;
and aggregating the link characteristics after iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function.
Preferably, the graph neural network is a neural network model consisting of a fully connected network and a recurrent neural network RNN:
calculating link characteristics by using a message passing algorithm;
completing aggregation of link characteristics by a fully connected neural network;
the updating of the link characteristics is carried out by the recurrent neural network RNN.
Preferably, the method further comprises: and regularly acquiring the reward r after the routing action is executed, feeding the reward r back to the deep reinforcement learning for accumulation, and training the deep reinforcement learning.
Preferably, the method further comprises: after the reward r for each executed routing action is obtained, forming a tuple {s, a, r, s'} from the current network state s, the action set a, the reward r and the new network state s', storing the tuple in an experience replay buffer, training the graph neural network by randomly sampling from the experience replay buffer, and updating the parameters of the graph neural network.
Preferably, after deep reinforcement learning obtains the estimated Q values, an ε-greedy exploration strategy is used: with probability ε an action is selected at random, and with probability (1−ε) the action with the maximum estimated Q value is selected; the final selection is used as the routing strategy in the current network state.
Preferably, the network state is defined by the characteristics of the topology links, including the link capacity, the link betweenness and the current traffic demand. The link capacity represents the available capacity on the link; the link betweenness is a centrality measure inherited from graph theory, indicating how many paths may traverse the link. Specifically, the link betweenness may be calculated as follows: for each pair of nodes in the topology, k candidate paths (e.g., the k shortest paths) are computed, and a counter on each link is updated to record how many paths pass through that link. The betweenness of each link is then the number of end-to-end paths through the link divided by the total number of paths. For data processing purposes, the link state features are arranged in a vector {x1, x2, ..., xN}, where x1 is the link available capacity, x2 is the link betweenness, x3 is the bandwidth demand allocated according to the current traffic request (the bandwidth allocated on the link after the routing action is applied), and x4-xN are zero-padded values.
Preferably, in a realistic large-scale network, the number of possible route combinations for each source-destination node pair results in a high-dimensional data space. This greatly complicates the routing problem, since the controller would have to estimate a Q value for every possible routing action. To reduce the dimensionality, the action set of each source-destination pair is limited to k candidate paths. In the experimental environment adopted by the invention, to keep a good balance between routing flexibility and evaluation cost, k is set to 4 shortest paths (measured in hops). The action set may vary with the source node, the destination node and the traffic demand being routed.
Aiming at the model generalization problem in network route optimization, the invention innovatively provides a DRL + GNN network route optimization architecture by incorporating a Graph Neural Network (GNN) for modeling and operating on graphs. The architecture uses the GNN to learn the relationships between the graph elements in the topology and the rules that compose them, and uses a DRL algorithm to make decisions, thereby optimizing network routing, with the potential to generalize to dynamic and unknown network topologies. The specific objective of the architecture is to allocate traffic as traffic demands arrive, maximizing the traffic carried by the network, thereby implementing a route optimization method that can both generalize and optimize.
According to a second aspect of the present invention, there is provided a route optimization system based on a graph neural network and deep reinforcement learning, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading the executable instructions stored in the computer readable storage medium and executing the route optimization method based on the graph neural network and the deep reinforcement learning.
Through the above technical scheme, compared with the prior art, the invention innovatively provides a network route optimization architecture based on a graph neural network and deep reinforcement learning, addressing the model generalization problem in network route optimization by incorporating a graph neural network for modeling and operating on graphs. The architecture uses the graph neural network to learn the relationships between the graph elements in the topology and the rules that compose them, and uses a deep reinforcement learning algorithm to make decisions, thereby optimizing network routing, with the potential to generalize to dynamic and unknown network topologies. Compared with deep reinforcement learning methods built on traditional neural network architectures, this architecture can optimize routing performance to a greater degree and, in particular, generalizes more strongly to dynamic and unknown network topologies.
Drawings
FIG. 1 is a block diagram of a network architecture to which a routing optimization method based on a graph neural network and deep reinforcement learning is applied, according to the present invention;
FIG. 2 is a schematic flow chart of a route optimization method based on a graph neural network and deep reinforcement learning according to the present invention;
fig. 3 is a schematic diagram of a message passing algorithm-based graph neural network architecture provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a route optimization method based on a graph neural network and deep reinforcement learning, which is applied to a network architecture shown in figure 1, wherein the flow schematic diagram of the method is shown in figure 2, and the method comprises the following steps:
S0, measuring the current network state s, and taking the traffic demand that the current network state requests to allocate as the target traffic demand;
S1, selecting the k shortest paths from the source node to the destination node according to the target traffic demand, the set of these shortest paths being called the action set a, where k is a positive integer;
S2, inputting the action set a into a graph neural network to calculate link characteristics, performing aggregation and iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function;
S3, performing deep reinforcement learning according to the estimated Q value to obtain the routing strategy in the current network state, feeding the routing strategy back to the network topology to execute the corresponding routing action, and obtaining a new network state s';
S4, judging, in combination with the new network state s', whether a new traffic demand exists; if so, taking the traffic demand that the new network state s' requests to allocate as the target traffic demand and returning to S1; if not, waiting for the next traffic demand and returning to S0.
Specifically, step S2 specifically includes:
calculating link characteristics of each link and adjacent links on the shortest path, aggregating the link characteristics connected with the same node, and updating the link characteristics of each link;
iterating the steps for T times, wherein T is a preset value;
and aggregating the link characteristics after iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function.
Specifically, the graph neural network is a neural network model composed of a fully-connected network and a recurrent neural network RNN:
calculating link characteristics by using a message passing algorithm;
completing aggregation of link characteristics by a fully connected neural network;
the updating of the link characteristics is carried out by the recurrent neural network RNN.
As shown in fig. 3, the specific steps are:
(1) calculating the link characteristics of each link and its adjacent links on the path through a message function M;
(2) aggregating the link characteristics connected to the same node with a fully connected neural network (sum);
(3) updating the link state of each link with a recurrent neural network (RNN); steps (1) to (3) are iterated T times;
(4) inputting the iteratively updated link states into a fully connected neural network (sum) for aggregation, and obtaining the estimated Q value of the network state s and the action set a through the Q function.
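The message-passing loop just described can be illustrated numerically. In this sketch the learned components are replaced by fixed arithmetic stand-ins: a scalar per link instead of a feature vector, halving instead of the message function, plain summation instead of the fully connected aggregator, and an averaging step instead of the RNN cell. Only the control flow matches the patent's steps (1)-(4).

```python
# Toy message-passing sketch; all operations are arithmetic stand-ins,
# not the patent's trained networks.

# Each link carries a scalar hidden state; links sharing a node are neighbors.
links = ["AB", "BC", "CD"]
neighbors = {"AB": ["BC"], "BC": ["AB", "CD"], "CD": ["BC"]}
h = {"AB": 1.0, "BC": 2.0, "CD": 3.0}

T = 2  # number of message-passing iterations (a preset value, as in the text)
for _ in range(T):
    # (1) message: each link receives a transformed state from its neighbors
    msgs = {l: [0.5 * h[n] for n in neighbors[l]] for l in links}
    # (2) aggregation: sum incoming messages (fully connected net in the patent)
    agg = {l: sum(m) for l, m in msgs.items()}
    # (3) update: combine old state and aggregate (RNN cell in the patent)
    h = {l: 0.5 * h[l] + 0.5 * agg[l] for l in links}

# (4) readout: aggregate final link states and map them to a Q-value estimate
q_value = sum(h.values())
```

A real implementation would use vector-valued link states and trainable parameters throughout, but the information flow between links is the same.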
Specifically, the method further comprises: and regularly acquiring the reward r after the routing action is executed, feeding the reward r back to the deep reinforcement learning for accumulation, and training the deep reinforcement learning.
Specifically, the method further comprises: after the reward r for each executed routing action is obtained, forming a tuple {s, a, r, s'} from the current network state s, the action set a, the reward r and the new network state s'; each tuple is stored in an experience replay buffer, the graph neural network is trained by random sampling from the buffer, and the parameters of the graph neural network are updated.
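The experience replay mechanism described above can be sketched as follows; the class name and capacity value are illustrative assumptions.

```python
# Minimal experience replay sketch: store (s, a, r, s') tuples and
# sample random minibatches for training.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)   # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buf, min(batch_size, len(self.buf)))

buf = ReplayBuffer(capacity=3)
for step in range(5):                       # store 5 transitions; only 3 kept
    buf.store(f"s{step}", "a", 1.0, f"s{step + 1}")
batch = buf.sample(2)
```

Training on randomly sampled tuples rather than the most recent transition is what stabilizes Q-value estimation here.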
Specifically, after deep reinforcement learning obtains the estimated Q values, an ε-greedy exploration strategy is used: with probability ε an action is selected at random, and with probability (1−ε) the action with the maximum estimated Q value is selected; the final selection is used as the routing strategy in the current network state.
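The ε-greedy selection just described can be sketched in a few lines; the action names and Q values below are illustrative.

```python
# Sketch of epsilon-greedy action selection over estimated Q values.
import random

def epsilon_greedy(q_estimates, epsilon):
    # q_estimates: dict mapping action -> estimated Q value
    if random.random() < epsilon:
        return random.choice(list(q_estimates))    # explore with probability eps
    return max(q_estimates, key=q_estimates.get)   # exploit otherwise

q = {"path_0": 0.2, "path_1": 0.9, "path_2": 0.5}
greedy = epsilon_greedy(q, epsilon=0.0)   # epsilon = 0 always exploits
```

In practice ε is usually decayed over training so the agent explores early and exploits once the Q estimates are reliable.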
Specifically, the network state is defined by the characteristics of the topology links, including the link capacity, the link betweenness and the current traffic demand. The link capacity represents the available capacity on the link; the link betweenness is a centrality measure inherited from graph theory, indicating how many paths may traverse the link. The link betweenness may be calculated as follows: for each pair of nodes in the topology, k candidate paths (e.g., the k shortest paths) are computed, and a counter on each link is updated to record how many paths pass through that link. The betweenness of each link is then the number of end-to-end paths through the link divided by the total number of paths. For data processing purposes, the link state features are arranged in a vector {x1, x2, ..., xN}, where x1 is the link available capacity, x2 is the link betweenness, x3 is the bandwidth demand allocated according to the current traffic request (the bandwidth allocated on the link after the routing action is applied), and x4-xN are zero-padded values.
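A minimal pure-Python sketch of the betweenness computation just described follows. The 4-node topology is a toy example, and for brevity a single BFS shortest path per node pair is used (i.e. k = 1), whereas the patent allows k candidate paths per pair.

```python
# Link betweenness: fraction of end-to-end candidate paths traversing each link.
from collections import deque

adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

def bfs_path(src, dst):
    # Standard BFS shortest path on an unweighted, connected graph.
    prev, seen, q = {}, {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in seen:
                seen.add(v); prev[v] = u; q.append(v)
    path, u = [dst], dst
    while u != src:
        u = prev[u]; path.append(u)
    return path[::-1]

counter, total = {}, 0
nodes = sorted(adj)
for i, s in enumerate(nodes):
    for d in nodes[i + 1:]:
        total += 1
        p = bfs_path(s, d)
        for link in zip(p, p[1:]):
            key = tuple(sorted(link))            # undirected link identifier
            counter[key] = counter.get(key, 0) + 1

betweenness = {l: c / total for l, c in counter.items()}
```

With k > 1 the inner loop would simply iterate over k candidate paths per pair, and `total` would count every candidate path.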
In particular, in a realistic large-scale network, the number of possible route combinations for each source-destination node pair results in a high-dimensional data space. This greatly complicates the routing problem, since the controller would have to estimate a Q value for every possible routing action. To reduce the dimensionality, the action set of each source-destination pair is limited to k candidate paths. In the experimental environment adopted by the invention, to keep a good balance between routing flexibility and evaluation cost, k is set to 4 shortest paths (measured in hops). The action set may vary with the source node, the destination node and the traffic demand being routed.
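Restricting the action set to the k hop-shortest candidate paths, as described above, can be sketched as follows (toy directed topology; the function names are illustrative). Enumerating all simple paths is fine for small graphs; a real deployment would use a dedicated k-shortest-paths algorithm such as Yen's.

```python
# Build the k hop-shortest candidate paths for one source-destination pair.
adj = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}

def simple_paths(src, dst, path=None):
    # Depth-first enumeration of loop-free paths from src to dst.
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in adj[src]:
        if nxt not in path:              # keep paths simple (no revisited nodes)
            yield from simple_paths(nxt, dst, path)

def action_set(src, dst, k=4):
    # The k hop-shortest candidate paths form the action set for this pair.
    return sorted(simple_paths(src, dst), key=len)[:k]

actions = action_set("A", "D")           # only 3 simple A->D paths exist here
```

Here the pair A-D has only three simple paths, so the action set holds all of them; in a denser topology the cap at k = 4 would take effect.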
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (8)
1. A route optimization method based on a graph neural network and deep reinforcement learning is characterized by comprising the following steps:
S0, measuring the current network state s, and taking the traffic demand that the current network state requests to allocate as the target traffic demand;
S1, selecting the k shortest paths from the source node to the destination node according to the target traffic demand, the set of these shortest paths being called the action set a, where k is a positive integer;
S2, inputting the action set a into a graph neural network to calculate link characteristics, performing aggregation and iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function;
S3, performing deep reinforcement learning according to the estimated Q value to obtain the routing strategy in the current network state, feeding the routing strategy back to the network topology to execute the corresponding routing action, and obtaining a new network state s';
S4, judging, in combination with the new network state s', whether a new traffic demand exists; if so, taking the traffic demand that the new network state s' requests to allocate as the target traffic demand and returning to S1; if not, waiting for the next traffic demand and returning to S0.
2. The route optimization method according to claim 1, wherein the step S2 specifically includes:
calculating link characteristics of each link and adjacent links on the shortest path, aggregating the link characteristics connected with the same node, and updating the link characteristics of each link;
iterating the steps for T times, wherein T is a preset value;
and aggregating the link characteristics after iterative updating, and obtaining the estimated Q value of the network state s and the action set a through a Q function.
3. The route optimization method according to claim 2, wherein the graph neural network is a neural network model composed of a fully connected network and a recurrent neural network RNN:
calculating link characteristics by using a message passing algorithm;
completing aggregation of link characteristics by a fully connected neural network;
the updating of the link characteristics is carried out by the recurrent neural network RNN.
4. The route optimization method of claim 1, further comprising: and regularly acquiring the reward r after the routing action is executed, feeding the reward r back to the deep reinforcement learning for accumulation, and training the deep reinforcement learning.
5. The route optimization method of claim 4, further comprising: after obtaining the reward r after executing the routing action each time, forming a tuple { s, a, r, s '} by the current network state s, the action set a, the reward r and the new network state s', and accumulating the tuple;
and training the graph neural network by randomly sampling from the accumulated tuples, and updating the parameters of the graph structure network.
6. The route optimization method of claim 3, wherein after the deep reinforcement learning obtains the estimated Q values, an ε-greedy exploration strategy is used: with probability ε an action is selected at random, and with probability (1−ε) the action with the maximum estimated Q value is selected; the final selection is used as the routing strategy in the current network state.
7. The route optimization method according to claim 1, wherein the network state is represented by a vector {x1, x2, ..., xN}, where x1 is the link available capacity, x2 is the link betweenness, x3 is the current traffic demand, x4-xN are zero-padded values, and N is the dimension of the network state vector.
8. A route optimization system based on a graph neural network and deep reinforcement learning is characterized by comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the method for route optimization based on the graph neural network and the deep reinforcement learning according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110435964.9A CN113194034A (en) | 2021-04-22 | 2021-04-22 | Route optimization method and system based on graph neural network and deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110435964.9A CN113194034A (en) | 2021-04-22 | 2021-04-22 | Route optimization method and system based on graph neural network and deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113194034A true CN113194034A (en) | 2021-07-30 |
Family
ID=76978560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110435964.9A Pending CN113194034A (en) | 2021-04-22 | 2021-04-22 | Route optimization method and system based on graph neural network and deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113194034A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114051272A (en) * | 2021-10-30 | 2022-02-15 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Intelligent routing method for dynamic topological network |
CN114221897A (en) * | 2021-12-09 | 2022-03-22 | 网络通信与安全紫金山实验室 | Routing method, device, equipment and medium based on multi-attribute decision |
CN114389990A (en) * | 2022-01-07 | 2022-04-22 | 中国人民解放军国防科技大学 | Shortest path blocking method and device based on deep reinforcement learning |
CN114697229A (en) * | 2022-03-11 | 2022-07-01 | 华中科技大学 | Construction method and application of distributed routing planning model |
CN115022231A (en) * | 2022-06-30 | 2022-09-06 | 武汉烽火技术服务有限公司 | Optimal path planning method and system based on deep reinforcement learning |
CN115173923A (en) * | 2022-07-04 | 2022-10-11 | 重庆邮电大学 | Energy efficiency perception route optimization method and system for low-orbit satellite network |
WO2023020502A1 (en) * | 2021-08-17 | 2023-02-23 | 华为技术有限公司 | Data processing method and apparatus |
CN116366529A (en) * | 2023-04-20 | 2023-06-30 | 哈尔滨工业大学 | Adaptive routing method based on deep reinforcement learning in SDN (software defined network) background |
WO2024037136A1 (en) * | 2022-08-15 | 2024-02-22 | 南京邮电大学 | Graph structure feature-based routing optimization method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110611619A (en) * | 2019-09-12 | 2019-12-24 | 西安电子科技大学 | Intelligent routing decision method based on DDPG reinforcement learning algorithm |
CN110896556A (en) * | 2019-04-04 | 2020-03-20 | 中国电子科技集团公司第五十四研究所 | Time synchronization method and device for post-5G forward transmission network based on deep reinforcement learning |
CN112116129A (en) * | 2020-08-24 | 2020-12-22 | 中山大学 | Dynamic path optimization problem solving method based on deep reinforcement learning |
US20210089868A1 (en) * | 2019-09-23 | 2021-03-25 | Adobe Inc. | Reinforcement learning with a stochastic action set |
- 2021-04-22: application CN202110435964.9A filed (publication CN113194034A, status Pending)
Non-Patent Citations (2)
Title |
---|
Paul Almasan et al.: "Deep reinforcement learning meets graph neural networks: an optical network routing use case", arXiv preprint arXiv:1910.07421 * |
Liu Chenyi et al.: "A survey of intelligent routing algorithms based on machine learning", Journal of Computer Research and Development * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023020502A1 (en) * | 2021-08-17 | 2023-02-23 | Huawei Technologies Co., Ltd. | Data processing method and apparatus |
CN114051272A (en) * | 2021-10-30 | 2022-02-15 | Southwest China Institute of Electronic Technology (The 10th Research Institute of China Electronics Technology Group Corporation) | Intelligent routing method for dynamic topology networks |
CN114221897A (en) * | 2021-12-09 | 2022-03-22 | Purple Mountain Laboratories | Routing method, device, equipment and medium based on multi-attribute decision-making |
CN114389990A (en) * | 2022-01-07 | 2022-04-22 | National University of Defense Technology | Shortest-path interdiction method and device based on deep reinforcement learning |
CN114697229A (en) * | 2022-03-11 | 2022-07-01 | Huazhong University of Science and Technology | Construction method and application of a distributed route planning model |
CN114697229B (en) * | 2022-03-11 | 2023-04-07 | Huazhong University of Science and Technology | Construction method and application of a distributed route planning model |
CN115022231A (en) * | 2022-06-30 | 2022-09-06 | Wuhan FiberHome Technical Services Co., Ltd. | Optimal path planning method and system based on deep reinforcement learning |
CN115022231B (en) * | 2022-06-30 | 2023-11-03 | Wuhan FiberHome Technical Services Co., Ltd. | Optimal path planning method and system based on deep reinforcement learning |
CN115173923A (en) * | 2022-07-04 | 2022-10-11 | Chongqing University of Posts and Telecommunications | Energy-efficiency-aware route optimization method and system for low-Earth-orbit satellite networks |
CN115173923B (en) * | 2022-07-04 | 2023-07-04 | Chongqing University of Posts and Telecommunications | Energy-efficiency-aware route optimization method and system for low-Earth-orbit satellite networks |
WO2024037136A1 (en) * | 2022-08-15 | 2024-02-22 | Nanjing University of Posts and Telecommunications | Routing optimization method and system based on graph structure features |
CN116366529A (en) * | 2023-04-20 | 2023-06-30 | Harbin Institute of Technology | Adaptive routing method based on deep reinforcement learning in an SDN (software-defined network) context |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113194034A (en) | Route optimization method and system based on graph neural network and deep reinforcement learning | |
CN110601973B (en) | Route planning method, system, server and storage medium | |
CN108076158B (en) | Minimum load route selection method and system based on naive Bayes classifier | |
CN106411749A (en) | Path selection method for software defined network based on Q learning | |
CN108075975B (en) | Method and system for determining route transmission path in Internet of things environment | |
Bernárdez et al. | Is machine learning ready for traffic engineering optimization? | |
CN111770019A (en) | Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm | |
CN113098714A (en) | Low-delay network slicing method based on deep reinforcement learning | |
CN111130853A (en) | Future route prediction method of software defined vehicle network based on time information | |
CN111641557A (en) | Minimum cost backup path method for delay tolerant network | |
van Leeuwen et al. | CoCoA: A non-iterative approach to a local search (A) DCOP solver | |
Qin et al. | Traffic optimization in satellites communications: A multi-agent reinforcement learning approach | |
CN113645589B (en) | Unmanned aerial vehicle cluster route calculation method based on inverse fact policy gradient | |
Zhang et al. | A service migration method based on dynamic awareness in mobile edge computing | |
Wang et al. | GRouting: dynamic routing for LEO satellite networks with graph-based deep reinforcement learning | |
CN116963225B (en) | Wireless mesh network routing method for streaming media transmission | |
CN111404595B (en) | Method for evaluating health degree of space-based network communication satellite | |
Meng et al. | Intelligent routing orchestration for ultra-low latency transport networks | |
CN116155805A (en) | Distributed intelligent routing method, system, electronic equipment and storage medium | |
Dana et al. | Backup path set selection in ad hoc wireless network using link expiration time | |
Singh et al. | Multi-agent reinforcement learning based efficient routing in opportunistic networks | |
Almasan et al. | Enero: Efficient real-time routing optimization | |
KR102308799B1 (en) | Method for selecting forwarding path based on learning medium access control layer collisions in internet of things networks, recording medium and device for performing the method | |
CN116418492A (en) | Route establishment method, system and quantum cryptography network | |
Xu et al. | A graph reinforcement learning based SDN routing path selection for optimizing long-term revenue |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 2021-07-30 |