WO2024037136A1 - Graph structure feature-based routing optimization method and system - Google Patents

Graph structure feature-based routing optimization method and system Download PDF

Info

Publication number
WO2024037136A1
WO2024037136A1 (PCT/CN2023/098735)
Authority
WO
WIPO (PCT)
Prior art keywords
network
graph
target
policy
routing
Prior art date
Application number
PCT/CN2023/098735
Other languages
French (fr)
Chinese (zh)
Inventor
郭永安
吴庆鹏
张啸
佘昊
钱琪杰
Original Assignee
南京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学
Publication of WO2024037136A1 publication Critical patent/WO2024037136A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/02 Topology update or discovery
    • H04L 45/08 Learning-based routing, e.g. using neural networks or artificial intelligence

Definitions

  • the invention relates to the field of computer network technology, and in particular to a routing optimization method and system based on graph structure characteristics.
  • the purpose of the present invention is to provide a routing optimization method and system based on graph structure characteristics that is suitable for SDN network environments in which switches or routing devices support traditional layer-2 network protocols, optimizes global routing overhead across multiple network attributes, adapts to dynamic and complex SDN networks, and guarantees SDN network performance.
  • the present invention designs a routing optimization method based on graph structure characteristics.
  • the following steps S1 to S3 are performed to obtain the routing overhead of each link in the target SDN network and adjust the weight of each link, completing routing optimization of the target SDN network.
  • Step S1 For the target SDN network, based on the southbound interface protocol, obtain the network topology diagram of the target SDN network, and construct a graph adjacency matrix according to the connection relationship between the nodes on each link of the target SDN network in the network topology diagram, respectively. For each node on each link of the target SDN network, construct the information feature vector of each node based on the link bandwidth, traffic, packet loss rate, and transmission delay of each node, and build the target SDN based on the information feature vector of each node Network information feature matrix of the network.
  • Step S2 Taking the graph adjacency matrix and the network information feature matrix as the state of the target SDN network, a graph learning neural network takes the graph adjacency matrix and the network information feature matrix as input and, through the deep graph learning method, outputs the routing policy and routing cost of the target SDN network in the current state.
  • based on the gradient backpropagation method, the network parameters of the graph learning neural network are updated, and after a preset number of iterations the graph learning neural network is trained to obtain a deep graph learning model that minimizes the routing overhead of the target SDN network and maximizes link utilization.
  • Step S3 Based on the trained deep graph learning model and the status of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network, deploy the routing policy to the target SDN network, and change the route weight of each link of the target SDN network according to the routing policy to complete routing optimization of the target SDN network.
  • the specific steps of step S1 are as follows:
  • Step S1.1 For the target SDN network, based on the southbound interface protocol, obtain the network topology of the target SDN network, where the network topology includes M routers and N links.
  • Step S1.2 Based on the network topology of the target SDN network, each router corresponds to a real node and each link corresponds to an edge; insert a virtual node on the edge corresponding to each link, thereby converting the network topology of the target SDN network into a graph of real nodes, virtual nodes, and edges.
  • V real = {v s1 , v s2 , ..., v sM } represents the set of real nodes, where v s1 , v s2 , ..., v sM are the M real nodes
  • V virtual = {v x1 , v x2 , ..., v xN } represents the set of N virtual nodes
  • e 1 , e 2 , ..., e 2N represent the 2N edges (inserting one virtual node splits each of the N links into two edges).
  • the elements a ij in the graph adjacency matrix A are as follows: a ij = 1 if node i and node j are connected by an edge, and a ij = 0 otherwise
  • B wi is the link bandwidth of node i
  • T hi is the traffic of node i
  • L pi is the packet loss rate of node i
  • D ti is the transmission delay of node i
  • the network information feature matrix H of the target SDN network is constructed by stacking the node feature vectors, H = [h 1 , h 2 , ..., h x ] T
  • h 1 , h 2 , ..., h i , ..., h x are the information feature vectors of each node, with h i = [B wi , T hi , L pi , D ti ] and x = M + N the total number of nodes.
  • for node i described in step S1.4: if node i is a virtual node, then the traffic T hi , packet loss rate L pi , and transmission delay D ti of node i are 0; if node i is a real node, the link bandwidth B wi of node i is 0.
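As a non-limiting sketch of the construction in steps S1.2 to S1.4, the following Python snippet builds the graph adjacency matrix A and the network information feature matrix H for a toy topology; the helper names (`build_graph`, `build_features`) and all numeric values are illustrative assumptions, not taken from the patent:

```python
# Illustrative sketch (not the patent's code): build the graph adjacency
# matrix A and the network information feature matrix H for a toy topology.
def build_graph(num_routers, links):
    """links: list of (router_i, router_j) pairs, 0-indexed.
    Inserting one virtual node per link yields x = M + N nodes and 2N edges."""
    M, N = num_routers, len(links)
    x = M + N
    A = [[0] * x for _ in range(x)]
    edges = []
    for k, (i, j) in enumerate(links):
        v = M + k                      # virtual node inserted on link k
        for a, b in ((i, v), (v, j)):  # the link becomes two edges
            A[a][b] = A[b][a] = 1
            edges.append((a, b))
    return A, edges

def build_features(num_routers, link_bandwidths, traffic, loss, delay):
    """Feature vector per node: [B_w, T_h, L_p, D_t].
    Virtual nodes carry only bandwidth; real nodes carry the other three."""
    rows = [[0.0, traffic[i], loss[i], delay[i]] for i in range(num_routers)]
    rows += [[bw, 0.0, 0.0, 0.0] for bw in link_bandwidths]
    return rows

# Toy example: 3 routers in a line, 2 links.
A, E = build_graph(3, [(0, 1), (1, 2)])
H = build_features(3, link_bandwidths=[100.0, 50.0],
                   traffic=[10.0, 25.0, 5.0],
                   loss=[0.01, 0.02, 0.0],
                   delay=[1.0, 2.0, 1.5])
```

Note that two routers are never directly adjacent in A; a virtual (link) node always sits between them, which is how link attributes enter the node feature matrix.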
  • the deep graph learning method in step S2 includes four graph learning neural networks and an experience pool.
  • the four graph learning neural networks are respectively an online graph policy network, an online graph value network, a target graph policy network, and a target graph value network; each of the four graph learning neural networks includes an input layer, two hidden layers, and an output layer.
  • the input layer of the online graph policy network and the target graph policy network takes the graph adjacency matrix A and the network information feature matrix H as input
  • the outputs of the online graph policy network and the target graph policy network are used as the inputs of the online graph value network and the target graph value network respectively.
  • the propagation formula from the input layer to the first hidden layer and between hidden layers is the same for each graph learning neural network; if the input layer is recorded as layer 0, the first hidden layer as layer 1, and the second hidden layer as layer 2, the propagation formula is as follows:
  • H l+1 = σ( D~ -1/2 (A + I) D~ -1/2 H l W l+1 )
  • σ( · ) means normalizing the formula inside the brackets
  • H l is the network information feature matrix of the l-th layer
  • W l+1 is the weight matrix of the (l+1)-th layer
  • H 0 = H
  • I is the x-order unit matrix; D~ is the degree matrix of A + I, whose diagonal elements are D~ ii = Σ j (A + I) ij .
  • W 1 is a 4 × 4 matrix
  • W 2 is a 4 × 1 matrix
  • the output layer is a fully connected layer
  • its output value is an x × 1 matrix
  • K is the weight matrix of the output layer of the online graph policy network and the target graph policy network
  • H 2 is the network information feature matrix of the second layer
  • W 1 and W 2 are both 1 × 1 matrices
  • the output layer is the aggregation layer
  • its output value is a 1 × 1 matrix, recorded as Value, specifically as follows:
  • Value = Q · Σ i H 2 i , where Q is the weight value of the output layer and H 2 i is the i-th value in the layer-2 network information feature matrix H 2 ; according to the routing policy Policy output by the online graph policy network, the routing cost of each link in the target SDN network is updated.
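The layer propagation and the two output heads described above can be sketched in pure Python as follows; this is a hedged illustration in which ReLU stands in for the normalization σ, and the tiny fixed weight matrices are assumptions chosen only to make the example run:

```python
# Minimal sketch of H_{l+1} = sigma(D~^-1/2 (A+I) D~^-1/2 H_l W_{l+1}).
# ReLU is used as a stand-in for sigma; weights are illustrative.
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def gcn_layer(A, H, W):
    x = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(x)] for i in range(x)]
    d = [sum(row) for row in A_hat]                    # degree of A + I
    # symmetric normalization D~^-1/2 (A+I) D~^-1/2
    S = [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(x)] for i in range(x)]
    Z = matmul(matmul(S, H), W)
    return [[max(0.0, v) for v in row] for row in Z]   # stand-in nonlinearity

# Two nodes, 4 features; W1 is 4x4 and W2 is 4x1, as in the policy networks.
A = [[0, 1], [1, 0]]
H0 = [[1.0, 0.0, 2.0, 0.5], [0.0, 3.0, 1.0, 0.0]]
W1 = [[0.1 if i == j else 0.0 for j in range(4)] for i in range(4)]  # 0.1 * I
W2 = [[0.5], [0.5], [0.5], [0.5]]
H1 = gcn_layer(A, H0, W1)
H2 = gcn_layer(A, H1, W2)   # x-by-1 output, one value per node
```

The policy networks would then pass the x × 1 result through their fully connected output layer, while the value networks would aggregate it into a single scalar Value.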
  • the specific steps of step S2 are as follows:
  • Step S2.1 Initialize the weight matrices of the online graph policy network, online graph value network, target graph policy network, and target graph value network.
  • the weight matrix of the online graph policy network is W ⁇ and the weight matrix of the online graph value network is W ⁇ ′
  • the weight matrix of the target graph policy network is W ⁇
  • the weight matrix of the target graph value network is W ⁇ ′ .
  • Step S2.2 Initialize the experience pool. The specific steps are as follows:
  • Step S2.2.2 Define the outputs of the output layers of the online graph policy network, target graph policy network, online graph value network, and target graph value network at time t respectively; calculate the routing policy output by the online graph policy network according to the following formula
  • U(B w , Th , L p , D t ) is the link utilization rate
  • B w , T h , L p , and D t are respectively the link bandwidth, traffic, packet loss rate, and transmission delay of the target SDN network; K f is the proportional coefficient
  • the objective function for maximizing the link utilization of the target SDN network is constructed as U max (B w , T h , L p , D t ).
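The exact expression of U(B w , T h , L p , D t ) is not reproduced in this text extract; purely for illustration, one plausible shape that rewards throughput relative to bandwidth and penalizes loss and delay, scaled by the proportional coefficient K f , might look like the following (a hypothetical stand-in, not the patent's formula):

```python
# Hypothetical illustration only: the exact utilization formula is elided
# in this extract. This toy version combines throughput over bandwidth,
# penalized by loss and delay, scaled by the coefficient K_f.
def utilization(B_w, T_h, L_p, D_t, K_f=1.0):
    if B_w <= 0 or D_t <= 0:
        return 0.0
    return K_f * (T_h / B_w) * (1.0 - L_p) / D_t

u = utilization(B_w=100.0, T_h=40.0, L_p=0.05, D_t=2.0)
```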
  • Step S2.2.3 Define the experience pool R as follows:
  • s t+1 represents the status of the target SDN network at time t+1, that is, the status of the target SDN network obtained after the routing policy output by the online graph policy network is applied.
  • Step S2.3 For the target SDN network, perform a preset number of iterations, where the preset number of iterations is T.
  • the specific steps are as follows:
  • Step S2.3.2 Based on the status s t of the target SDN network at time t, the online graph policy network outputs the routing policy.
  • this process is recorded as π(s t | θ), where θ is the network parameter of the online graph policy network;
  • Step S2.3.3 According to the routing policy, update the routing cost of each link in the target SDN network;
  • Step S2.3.4 Obtain the target SDN network state s t+1 updated by the routing policy, and obtain the environmental feedback f t at the same time;
  • Step S2.3.5 Store (s t , π(s t | θ), f t , s t+1 ) in the experience pool R as a set of historical records;
  • Step S2.3.6 Randomly select Y groups of historical records from the experience pool R, where the subscript m denotes any set of historical records in the experience pool R;
  • Step S2.3.7 Based on the historical records extracted in step S2.3.6, calculate the corresponding output of the target graph value network, as follows:
  • ⁇ ′ is the network parameter of the target graph policy network
  • ⁇ ′ is the network parameter of the target graph value network
  • γ is the discount factor, a constant with γ ∈ (0,1);
  • Step S2.3.8 Calculate the loss Loss ogvn of the online graph value network output value according to the following formula:
  • this term denotes the output of the online graph value network with network parameter ω, in the state s m of the target SDN network, when the routing policy output by the online graph policy network is π(s m | θ);
  • Step S2.3.9 According to the loss Loss ogvn of the output value of the online graph value network, based on the gradient backpropagation method, update the network parameters ω of the online graph value network;
  • Step S2.3.10 Calculate the gradient value; according to this gradient value and based on the gradient backpropagation method, update the network parameters θ of the online graph policy network, where ∇( · ) indicates taking the gradient of the formula in parentheses;
  • this coefficient is a constant, and lies in (0,1);
  • Step S2.3.12 Repeat steps S2.3.2 to S2.3.11 until the number of iterations reaches the preset number T, obtaining the routing policy that minimizes the routing cost of the target SDN network.
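The iteration loop of steps S2.3.2 to S2.3.11 follows a familiar actor-critic pattern with an experience pool. The sketch below is a toy, runnable outline in which the four networks, the environment, the reward shape, and all hyperparameters are stand-in assumptions, not the patent's actual models:

```python
# Schematic of one training iteration (steps S2.3.2-S2.3.11), with toy
# scalar stand-ins for the four networks and the environment.
import random

random.seed(0)
replay = []                        # experience pool R
theta, omega = 0.0, 0.0            # online policy / value "parameters"
theta_t, omega_t = theta, omega    # target network copies
gamma, tau, Y = 0.9, 0.05, 4       # discount, soft-update rate, batch size

def policy(s, p):   return p + 0.1 * s            # stand-in for pi(s | theta)
def value(s, a, w): return w + s + a              # stand-in for the value net
def env_step(s, a): return s + 0.5, 1.0 - abs(a)  # next state, feedback f_t

s = 0.0
for t in range(50):
    a = policy(s, theta)                       # S2.3.2: output routing policy
    s_next, f = env_step(s, a)                 # S2.3.3-S2.3.4: apply, observe
    replay.append((s, a, f, s_next))           # S2.3.5: store record
    if len(replay) >= Y:
        batch = random.sample(replay, Y)       # S2.3.6: sample Y records
        for sm, am, fm, sm1 in batch:
            target = fm + gamma * value(sm1, policy(sm1, theta_t), omega_t)
            td = target - value(sm, am, omega)  # S2.3.7-S2.3.8: loss term
            omega += 0.01 * td                  # S2.3.9: value update (toy)
            theta += 0.01 * td * sm             # S2.3.10: policy update (toy)
        theta_t = tau * theta + (1 - tau) * theta_t  # S2.3.11: target tracking
        omega_t = tau * omega + (1 - tau) * omega_t
    s = s_next
```

In the patent's setting the scalar stand-ins would be the graph learning neural networks of Figure 3, and the environment step would be the deployed routing policy acting on the target SDN network.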
  • the specific steps of step S3 are as follows:
  • Step S31 Obtain the graph adjacency matrix A and network information feature matrix H of the target SDN network;
  • Step S32 Based on the trained deep graph learning model and according to the status [A, H] of the target SDN network, obtain the routing strategy that minimizes the routing cost of the target SDN network;
  • Step S33 Deploy to the target SDN network according to the routing policy obtained in step S32, and change the link weights of the target SDN network according to the routing policy;
  • Step S34 During the traffic transmission process, the updated weight of each link is used for traffic transmission according to the shortest path scheme.
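Step S34's shortest-path forwarding over the updated link weights can be illustrated with a minimal Dijkstra routine; the topology and the weight values below are invented purely for the example:

```python
# After deployment (step S33), traffic follows shortest paths under the
# updated link weights (step S34). Minimal Dijkstra sketch with toy values.
import heapq

def shortest_path(weights, src, dst):
    """weights: {node: {neighbor: link_weight}} after the optimizer's updates."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    seen = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in weights.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    return [src] + path[::-1], dist[dst]

# Toy network: the optimizer has raised the weight of edge A-C,
# steering traffic through B instead.
W = {"A": {"B": 1.0, "C": 5.0}, "B": {"C": 1.0}, "C": {}}
path, cost = shortest_path(W, "A", "C")
```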
  • the present invention also provides a system for the routing optimization method based on graph structure characteristics.
  • the target SDN network includes a control plane and a data plane, where the control plane includes an information acquisition module, a policy deployment module, and a DGL module, such that the system implements the routing optimization method based on graph structure characteristics.
  • Each link and node of the target SDN network is deployed on the data plane.
  • the information acquisition module on the control plane is used to obtain the network topology diagram of the target SDN network, generate a graph adjacency matrix and a network information feature matrix, and send them to the DGL module.
  • the DGL module is based on the graph learning neural network: it takes the graph adjacency matrix and the network information feature matrix as inputs and, through the deep graph learning method, outputs the routing cost of the target SDN network in the current state; based on the gradient backpropagation method, it updates the network parameters of the graph learning neural network, and after a preset number of iterations the graph learning neural network is trained to obtain a deep graph learning model that minimizes the routing overhead of the target SDN network and maximizes link utilization.
  • the policy deployment module on the control plane is used to obtain, from the trained deep graph learning model produced by the DGL module and the status of the target SDN network, the routing policy that minimizes the routing cost of the target SDN network, and to send the routing policy and the routing overhead of the target SDN network to the data plane.
  • the advantages of the present invention include:
  • the deep graph learning model has strong generalization ability.
  • the trained deep graph learning model is still effective when the network topology changes, and can adapt to large-scale dynamic and complex networks.
  • Figure 1 is an overall block diagram of the system for the routing optimization method based on graph structure features, provided according to an embodiment of the present invention
  • Figure 2 is a DGL algorithm framework diagram provided according to an embodiment of the present invention.
  • Figure 3 is a structural diagram of a graph learning neural network provided according to an embodiment of the present invention.
  • the embodiment of the present invention provides a routing optimization method based on graph structure characteristics. For the target SDN network, the following steps S1 to S3 are performed to obtain the routing overhead of each link in the target SDN network, adjust the weight of each link, and complete the routing optimization of the target SDN network.
  • Step S1 Referring to Figure 1, for the target SDN network, obtain the network topology diagram of the target SDN network based on the southbound interface protocol, and construct a graph adjacency matrix based on the connection relationships between the nodes on each link of the target SDN network in the network topology diagram.
  • for each node on each link of the target SDN network, construct the information feature vector of each node based on its link bandwidth, traffic, packet loss rate, and transmission delay, and based on the information feature vectors of the nodes, construct the network information feature matrix of the target SDN network.
  • the specific steps of step S1 are as follows:
  • Step S1.1 For the target SDN network, based on the southbound interface protocol, obtain the network topology of the target SDN network, where the network topology includes M routers and N links.
  • Step S1.2 Based on the network topology of the target SDN network, each router corresponds to a real node and each link corresponds to an edge; insert a virtual node on the edge corresponding to each link, thereby converting the network topology of the target SDN network into a graph of real nodes, virtual nodes, and edges.
  • V real = {v s1 , v s2 , ..., v sM } represents the set of real nodes, where v s1 , v s2 , ..., v sM are the M real nodes
  • V virtual = {v x1 , v x2 , ..., v xN } represents the set of N virtual nodes
  • e 1 , e 2 , ..., e 2N represent the 2N edges (inserting one virtual node splits each of the N links into two edges).
  • the elements a ij in the graph adjacency matrix A are as follows: a ij = 1 if node i and node j are connected by an edge, and a ij = 0 otherwise
  • B wi is the link bandwidth of node i
  • T hi is the traffic of node i
  • L pi is the packet loss rate of node i
  • D ti is the transmission delay of node i.
  • for the node i: if the node i is a virtual node, the traffic T hi , the packet loss rate L pi , and the transmission delay D ti of the node i are 0; if the node i is a real node, the link bandwidth B wi of the node i is 0.
  • the network information feature matrix H of the target SDN network is constructed by stacking the node feature vectors, H = [h 1 , h 2 , ..., h x ] T
  • h 1 , h 2 , ..., h i , ..., h x are the information feature vectors of each node, with h i = [B wi , T hi , L pi , D ti ] and x = M + N the total number of nodes.
  • Step S2 Take the graph adjacency matrix and the network information feature matrix as the target SDN network state. Based on the graph learning neural network, with the graph adjacency matrix and the network information feature matrix as input, the deep graph learning (Deep Graph Learning, DGL) method outputs the routing strategy and routing cost of the target SDN network in the current state. Based on the gradient backpropagation method, the network parameters of the graph learning neural network are updated, and after a preset number of iterations the graph learning neural network is trained to obtain a deep graph learning model that minimizes routing overhead and maximizes link utilization in the target SDN network.
  • the deep graph learning method described in step S2 includes four graph learning neural networks and an experience pool.
  • the four graph learning neural networks are the Online Graph Strategy Network (OGSN), the Online Graph Value Network (OGVN), the Target Graph Strategy Network (TGSN), and the Target Graph Value Network (TGVN); referring to Figure 3, each of the four graph learning neural networks includes an input layer, two hidden layers, and an output layer.
  • the input layer of the online graph policy network and the target graph policy network takes the graph adjacency matrix A and the network information feature matrix H as inputs, and the outputs of the online graph policy network and the target graph policy network serve as the inputs of the online graph value network and the target graph value network respectively.
  • the propagation formulas from the input layer to the hidden layer and between hidden layers of each graph learning neural network are the same.
  • the input layer is recorded as layer 0
  • the first hidden layer is recorded as layer 1
  • the second hidden layer is recorded as layer 2
  • the propagation formula is as follows:
  • H l+1 = σ( D~ -1/2 (A + I) D~ -1/2 H l W l+1 )
  • σ( · ) means normalizing the formula inside the brackets
  • H l is the network information feature matrix of the l-th layer
  • W l+1 is the weight matrix of the (l+1)-th layer
  • H 0 = H
  • I is the x-order unit matrix; D~ is the degree matrix of A + I, whose diagonal elements are D~ ii = Σ j (A + I) ij .
  • W 1 is a 4 × 4 matrix
  • W 2 is a 4 × 1 matrix
  • the output layer is a fully connected layer
  • its output value is an x × 1 matrix
  • K is the weight matrix of the output layer of the online graph policy network and the target graph policy network
  • H 2 is the network information feature matrix of the second layer.
  • W 1 and W 2 are both 1 × 1 matrices
  • the output layer is the aggregation layer
  • its output value is a 1 × 1 matrix, recorded as Value, as follows:
  • Value = Q · Σ i H 2 i , where Q is the weight value of the output layer and H 2 i is the i-th value in the layer-2 network information feature matrix H 2 ; according to the routing policy Policy output by the online graph policy network, the routing cost of each link in the target SDN network is updated.
  • the specific steps of step S2 are as follows:
  • Step S2.1 Initialize the weight matrices of the online graph policy network, online graph value network, target graph policy network, and target graph value network.
  • the weight matrix of the online graph policy network is W ⁇ and the weight matrix of the online graph value network is W ⁇ ′
  • the weight matrix of the goal graph policy network is W ⁇
  • the weight matrix of the goal graph value network is W ⁇ ′ .
  • the network parameters of the online graph policy network and the target graph policy network are initialized to be consistent, and the network parameters of the online graph value network and the target graph value network are initialized to be consistent.
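After this consistent initialization, the target networks typically track the online networks rather than being retrained from scratch; a common mechanism is a soft update. The patent's exact update rule is not shown in this extract, so the rule below is an assumption for illustration only:

```python
# Assumed soft update: blend each target weight toward its online
# counterpart by a small factor tau (illustrative, not the patent's rule).
def soft_update(online, target, tau):
    return [tau * o + (1 - tau) * t for o, t in zip(online, target)]

W_online = [0.8, -0.2, 0.5]   # online network weights after some training
W_target = [0.1, 0.1, 0.1]    # target network weights drifting behind
W_target = soft_update(W_online, W_target, tau=0.1)
```

A small tau keeps the target networks slowly moving, which stabilizes the value targets computed in step S2.3.7.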
  • Step S2.2 Initialize the experience pool. The specific steps are as follows:
  • Step S2.2.2 Define the outputs of the output layers of the online graph policy network, target graph policy network, online graph value network, and target graph value network at time t respectively; calculate the routing policy output by the online graph policy network according to the following formula
  • U(B w , Th , L p , D t ) is the link utilization rate
  • B w , T h , L p , and D t are respectively the link bandwidth, traffic, packet loss rate, and transmission delay of the target SDN network; K f is the proportional coefficient.
  • the objective function for maximizing the link utilization of the target SDN network is constructed as U max (B w , T h , L p , D t ).
  • Step S2.2.3 Define the experience pool R as follows:
  • s t+1 represents the status of the target SDN network at time t+1, that is, the status of the target SDN network obtained after the routing policy output by the online graph policy network is applied.
  • Step S2.3 For the target SDN network, perform a preset number of iterations, where the preset number of iterations is T.
  • the specific steps are as follows:
  • Step S2.3.2 Based on the status s t of the target SDN network at time t, the online graph policy network outputs the routing policy.
  • this process is recorded as π(s t | θ), where θ is the network parameter of the online graph policy network;
  • Step S2.3.3 According to the routing policy, update the routing cost of each link in the target SDN network;
  • Step S2.3.4 Obtain the target SDN network state s t+1 updated by the routing policy, and obtain the environmental feedback f t at the same time;
  • Step S2.3.5 Store (s t , π(s t | θ), f t , s t+1 ) in the experience pool R as a set of historical records;
  • Step S2.3.6 Randomly select Y groups of historical records from the experience pool R, where the subscript m denotes any set of historical records in the experience pool R;
  • Step S2.3.7 Based on the historical records extracted in step S2.3.6, calculate the corresponding output of the target graph value network, as follows:
  • ⁇ ′ is the network parameter of the target graph policy network
  • ⁇ ′ is the network parameter of the target graph value network
  • γ is the discount factor, a constant with γ ∈ (0,1).
  • Step S2.3.8 Calculate the loss Loss ogvn of the online graph value network output value according to the following formula:
  • this term denotes the output of the online graph value network with network parameter ω, in the state s m of the target SDN network, when the routing policy output by the online graph policy network is π(s m | θ);
  • Step S2.3.9 According to the loss Loss ogvn of the output value of the online graph value network, based on the gradient backpropagation method, update the network parameters ω of the online graph value network.
  • Step S2.3.10 Calculate the gradient value; according to this gradient value and based on the gradient backpropagation method, update the network parameters θ of the online graph policy network, where ∇( · ) indicates taking the gradient of the formula in parentheses.
  • this coefficient is a constant, and lies in (0,1).
  • Step S2.3.12 Repeat S2.3.2 to Step S2.3.11 until the number of iterations reaches the preset number T, and obtain the routing strategy that minimizes the routing cost of the target SDN network.
  • Step S3 Based on the trained deep graph learning model and the status of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network, deploy the routing policy to the target SDN network, and change the route weight of each link of the target SDN network according to the routing policy to complete routing optimization of the target SDN network.
  • the specific steps of step S3 are as follows:
  • Step S31 Obtain the graph adjacency matrix A and network information feature matrix H of the target SDN network;
  • Step S32 Based on the trained deep graph learning model and according to the status [A, H] of the target SDN network, obtain the routing strategy that minimizes the routing cost of the target SDN network;
  • Step S33 Deploy to the target SDN network according to the routing policy obtained in step S32, and change the link weights of the target SDN network according to the routing policy;
  • Step S34 During the traffic transmission process, the updated weight of each link is used for traffic transmission according to the shortest path scheme.
  • Embodiments of the present invention also provide a system for routing optimization methods based on graph structure characteristics.
  • the target SDN network includes a control plane and a data plane, where the control plane includes an information acquisition module, a policy deployment module, and a DGL module, such that the system implements the routing optimization method based on graph structure characteristics.
  • Each link and node of the target SDN network is deployed on the data plane.
  • the information acquisition module on the control plane is used to obtain the network topology diagram of the target SDN network, generate a graph adjacency matrix and a network information feature matrix, and send them to the DGL module.
  • the DGL module is based on the graph learning neural network: it takes the graph adjacency matrix and the network information feature matrix as inputs and, through the deep graph learning method, outputs the routing cost of the target SDN network in the current state; based on the gradient backpropagation method, it updates the network parameters of the graph learning neural network, and after a preset number of iterations the graph learning neural network is trained to obtain a deep graph learning model that minimizes the routing overhead of the target SDN network and maximizes link utilization.
  • the policy deployment module on the control plane is used to obtain, from the trained deep graph learning model produced by the DGL module and the status of the target SDN network, the routing policy that minimizes the routing cost of the target SDN network, and to send the routing policy and the routing overhead of the target SDN network to the data plane.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed in the present invention are a graph structure feature-based routing optimization method and system. The system is used for an SDN environment and comprises a control plane and a data plane, wherein the control plane comprises an information acquisition module, a strategy deployment module, and a deep graph learning (DGL) module. The method comprises: acquiring network topology structure information and information in a network, and generating a corresponding graph adjacency matrix and a network information feature matrix; training a graph learning neural network according to the graph adjacency matrix and the network information feature matrix, to obtain a DGL model allowing for the minimum SDN routing overhead and the maximum link utilization rate; and using the DGL model and deploying same to an SDN. According to the method, a dynamic and complex network topology is learned from a spatial dimension, the difficulty in optimizing a dynamic topology is overcome, and a better routing scheme is provided for the SDN.

Description

一种基于图结构特征的路由优化方法与系统A route optimization method and system based on graph structure characteristics 技术领域Technical field
本发明涉及计算机网络技术领域,具体涉及一种基于图结构特征的路由优化方法与系统。The invention relates to the field of computer network technology, and in particular to a routing optimization method and system based on graph structure characteristics.
背景技术Background technique
近年来,随着网络环境的复杂化,业务流量的多样化,路由路径优化问题成为一个研究热点。在传统网络中,路由选择采用尽力而为(Best-Effort)模型,利用OSPF技术来提供最短路径,无法适应动态、复杂的网络环境。软件定义网络(Software Defined Network,SDN)架构的提出将传统网络的控制平面和数据平面进行解耦,大大增加了路由优化问题解决方案的空间。在SDN环境下,深度强化学习与神经网络的结合能够为路由决策提供极大地帮助。但CNN、RNN、LSTM等算法本质上适用于欧式空间,例如图像、网格等。网络拓扑通常是一个复杂的模型,链路与链路、节点与节点之间有很强的空间相关性,传统神经网络很难将这一特征表现出来,并且基于深度强化学习的路由优化模型在网络拓扑发生变化时需要重新训练,不具有对动态拓扑的泛化能力。因此,需要有一种方法能够对网络拓扑的空间特征进行提取,从空间维度上学习动态、复杂的网络拓扑,并且能够克服动态拓扑的优化问题,提供更加优质的路由方案。In recent years, with the complexity of the network environment and the diversification of business traffic, the routing path optimization problem has become a research hotspot. In traditional networks, routing selection adopts the best-effort model and uses OSPF technology to provide the shortest path, which cannot adapt to dynamic and complex network environments. The proposal of Software Defined Network (SDN) architecture decouples the control plane and data plane of traditional networks, greatly increasing the space for solutions to routing optimization problems. In an SDN environment, the combination of deep reinforcement learning and neural networks can greatly help routing decisions. However, algorithms such as CNN, RNN, and LSTM are essentially suitable for Euclidean spaces, such as images, grids, etc. Network topology is usually a complex model with strong spatial correlation between links and nodes. It is difficult for traditional neural networks to express this feature, and routing optimization models based on deep reinforcement learning are When the network topology changes, it needs to be retrained, and it does not have the ability to generalize to dynamic topology. Therefore, there is a need for a method that can extract the spatial characteristics of network topology, learn dynamic and complex network topology from the spatial dimension, and be able to overcome the optimization problems of dynamic topology and provide better routing solutions.
发明内容Contents of the invention
本发明目的:在于提供一种基于图结构特征的路由优化方法与系统,适用于SDN网络环境下,交换机或路由设备支持传统的二层网络协议,实现从多个网络属性上优化全局的路由开销,适应动态、复杂的SDN网络,保障SDN网络性能。The purpose of the present invention is to provide a routing optimization method and system based on graph structure characteristics, which is suitable for SDN network environments. Switches or routing devices support traditional layer 2 network protocols to optimize global routing overhead from multiple network attributes. , adapt to dynamic and complex SDN networks and ensure SDN network performance.
为实现以上功能,本发明设计一种基于图结构特征的路由优化方法,针对目标SDN网络,执行以下步骤S1-步骤S3,获得目标SDN网络中各条链路的路由开销,调整各条链路的权重,完成目标SDN网络的路由优化。In order to realize the above functions, the present invention designs a routing optimization method based on graph structure characteristics. For the target SDN network, the following steps S1 to S3 are performed to obtain the routing overhead of each link in the target SDN network and adjust each link. weight to complete routing optimization of the target SDN network.
Step S1: For the target SDN network, obtain its network topology graph via the southbound interface protocol; construct a graph adjacency matrix from the connection relationships between the nodes on each link of the target SDN network in the topology graph; for each node on each link, construct the node's information feature vector from its link bandwidth, traffic, packet loss rate, and transmission delay; and build the network information feature matrix of the target SDN network from the information feature vectors of all nodes.
Step S2: Take the graph adjacency matrix and the network information feature matrix as the state of the target SDN network. Based on graph learning neural networks, with the graph adjacency matrix and the network information feature matrix as input, the deep graph learning method outputs the routing policy and routing cost of the target SDN network in the current state. The network parameters of the graph learning neural networks are updated by gradient back-propagation and, after a preset number of iterations, the networks are trained to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network.
Step S3: Using the trained deep graph learning model and the state of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network, deploy the policy to the target SDN network, and change the link weights of the target SDN network according to the policy, thereby completing routing optimization of the target SDN network.
In a preferred technical solution of the present invention, step S1 comprises the following specific steps:
Step S1.1: For the target SDN network, obtain its network topology via the southbound interface protocol, the topology comprising M routers and N links.
Step S1.2: In the network topology of the target SDN network, each router corresponds to a real node and each link corresponds to an edge. A virtual node is inserted on the edge corresponding to each link, so that the topology is represented as a network topology graph G(V, E) with M real nodes, N virtual nodes, and 2N edges, where V is the node set and E is the edge set:

V = {V_real, V_virtual}

where V_real is the set of real nodes and V_virtual is the set of virtual nodes;

V_real = {v_s1, v_s2, ..., v_sM}

where v_s1, v_s2, ..., v_sM are the M real nodes;

V_virtual = {v_x1, v_x2, ..., v_xN}

where v_x1, v_x2, ..., v_xN are the N virtual nodes;

E = {e_1, e_2, ..., e_2N}

where e_1, e_2, ..., e_2N are the 2N edges.
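The virtual-node construction of step S1.2 can be sketched as follows. This is an illustrative sketch, not taken from the patent text; the function name, the 0-indexed node numbering, and the `(router_a, router_b)` link representation are assumptions.

```python
# Sketch (not from the patent text): represent the topology of step S1.2 as
# M real nodes plus one virtual node per link, giving 2N edges.
def build_topology(num_routers, links):
    """links: list of (router_a, router_b) pairs, 0-indexed.
    Returns (real_nodes, virtual_nodes, edges); each link (a, b) becomes
    two edges a--x and x--b through its virtual node x."""
    real_nodes = list(range(num_routers))          # v_s1 ... v_sM
    virtual_nodes = []                             # v_x1 ... v_xN
    edges = []
    for k, (a, b) in enumerate(links):
        x = num_routers + k                        # index of the k-th virtual node
        virtual_nodes.append(x)
        edges.append((a, x))
        edges.append((x, b))
    return real_nodes, virtual_nodes, edges
```

For a 3-router chain with links (0, 1) and (1, 2), this yields virtual nodes 3 and 4 and the four edges 0-3, 3-1, 1-4, 4-2, matching the M + N node count and 2N edge count above.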
Step S1.3: Let x = M + N, where x is the total number of nodes (M real nodes and N virtual nodes). Based on the network topology graph of the target SDN network, construct the x-order graph adjacency matrix A = (a_ij), i, j = 1, ..., x, where the element a_ij is

a_ij = 1 if node i and node j are connected by an edge, and a_ij = 0 otherwise.
Step S1.4: For any node i of the target SDN network, construct the information feature vector h_i of node i from its link bandwidth, traffic, packet loss rate, and transmission delay:

h_i = [B_wi, T_hi, L_pi, D_ti]

where B_wi is the link bandwidth of node i, T_hi is the traffic of node i, L_pi is the packet loss rate of node i, and D_ti is the transmission delay of node i.

Based on the information feature vectors of all nodes, the network information feature matrix H of the target SDN network is constructed as the x×4 matrix whose rows are the node feature vectors:

H = [h_1; h_2; ...; h_i; ...; h_x]

where h_1, h_2, ..., h_i, ..., h_x are the information feature vectors of the nodes.
In a preferred technical solution of the present invention, for the node i described in step S1.4: if node i is a virtual node, its traffic T_hi, packet loss rate L_pi, and transmission delay D_ti are 0; if node i is a real node, its link bandwidth B_wi is 0.
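Steps S1.3 and S1.4, together with the real/virtual zeroing rule above, can be sketched as a state constructor. The `feats` dictionary layout and function name are illustrative assumptions; only the A/H structure and the zeroing rule come from the text.

```python
import numpy as np

# Sketch of steps S1.3-S1.4: x-order adjacency matrix A and x-by-4 feature
# matrix H with rows [Bw, Th, Lp, Dt]; virtual nodes carry only bandwidth,
# real nodes only traffic/loss/delay, per the rule stated above.
def build_state(num_real, num_virtual, edges, feats):
    """feats: dict node -> (Bw, Th, Lp, Dt) raw measurements."""
    x = num_real + num_virtual
    A = np.zeros((x, x))
    for i, j in edges:               # a_ij = 1 iff nodes i and j share an edge
        A[i, j] = A[j, i] = 1.0
    H = np.zeros((x, 4))
    for n, (bw, th, lp, dt) in feats.items():
        if n < num_real:             # real node: link bandwidth forced to 0
            H[n] = [0.0, th, lp, dt]
        else:                        # virtual node: only bandwidth is kept
            H[n] = [bw, 0.0, 0.0, 0.0]
    return A, H
```

The pair (A, H) is exactly the state S = [A, H] consumed by the networks of step S2.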
In a preferred technical solution of the present invention, the deep graph learning method in step S2 comprises four graph learning neural networks and one experience pool. The four networks are the online graph policy network, the online graph value network, the target graph policy network, and the target graph value network; each comprises one input layer, two hidden layers, and one output layer.
The input layers of the online graph policy network and the target graph policy network take the graph adjacency matrix A and the network information feature matrix H as input, and the outputs of the online and target graph policy networks serve as the inputs of the online and target graph value networks, respectively. In each graph learning neural network the propagation formula from the input layer to the hidden layers, and between the hidden layers, is the same. Denoting the input layer as layer 0, the first hidden layer as layer 1, and the second hidden layer as layer 2, the propagation formula is

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^l W^(l+1) )

where σ(·) normalizes the expression in parentheses, H^l is the network information feature matrix of layer l, W^(l+1) is the weight matrix of layer l+1, H^0 = H, Ã = A + I with I the x-order identity matrix, and D̃ is the degree matrix of Ã, given by

D̃_ii = Σ_j ã_ij

where ã_ij is the element of Ã.

In the online graph policy network and the target graph policy network, W^1 is a 4×4 matrix and W^2 is a 4×1 matrix; the output layer is a fully connected layer whose output is an x×1 matrix, denoted the routing policy Policy:

Policy = H^2 × K

where K is the weight matrix of the output layer of the online and target graph policy networks and H^2 is the network information feature matrix of layer 2. In the online graph value network and the target graph value network, W^1 and W^2 are both 1×1 matrices; the output layer is an aggregation layer whose output is a 1×1 matrix, denoted Value:

Value = Q × Σ_{i=1..x} h_i^(2)

where Q is the weight value of the output layer and h_i^(2) is the i-th value in the layer-2 network information feature matrix H^2. According to the routing policy Policy output by the online graph policy network, the routing cost of each link in the target SDN network is updated.
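The layer propagation rule can be sketched numerically. The patent only states that σ(·) "normalizes" the bracketed expression, so the max-abs normalization used here is an assumption standing in for the unspecified σ; the rest follows the D̃^(−1/2) Ã D̃^(−1/2) H W form above.

```python
import numpy as np

# Sketch of the propagation rule H^(l+1) = sigma(D~^-1/2 (A+I) D~^-1/2 H^l W),
# with D~ the row-sum degree matrix of A+I. sigma is an assumed normalization
# stand-in, since the patent does not define it concretely.
def gcn_layer(A, H, W):
    x = A.shape[0]
    A_tilde = A + np.eye(x)                       # add self-loops: A~ = A + I
    d = A_tilde.sum(axis=1)                       # D~_ii = sum_j a~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    Z = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W
    return Z / (np.abs(Z).max() + 1e-8)           # assumed normalization sigma
```

Stacking two such layers with W^1 of shape 4×4 and W^2 of shape 4×1 turns the x×4 feature matrix H into the x×1 matrix H^2 from which Policy = H^2 × K is read out.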
In a preferred technical solution of the present invention, step S2 comprises the following specific steps:
Step S2.1: Initialize the weight matrices of the online graph policy network, the online graph value network, the target graph policy network, and the target graph value network, where the weight matrix of the online graph policy network is W_θ, that of the online graph value network is W_ω, that of the target graph policy network is W_θ′, and that of the target graph value network is W_ω′.
Step S2.2: Initialize the experience pool, as follows:
Step S2.2.1: Take the graph adjacency matrix A and the network information feature matrix H as the state S of the target SDN network, defining S = [A, H]. Let s_t denote the state of the target SDN network at time t, s_t = [A_t, H_t], where A_t is the graph adjacency matrix and H_t the network information feature matrix of the target SDN network at time t.
Step S2.2.2: Define π(s_t|θ), π′(s_t|θ′), Q(s_t, π(s_t|θ)|ω), and Q′(s_t, π′(s_t|θ′)|ω′) as the outputs of the output layers of the online graph policy network, the target graph policy network, the online graph value network, and the target graph value network at time t, respectively. The environment feedback f_t obtained when the online graph policy network outputs the routing policy π(s_t|θ) is computed as

f_t = U(B_w, T_h, L_p, D_t) × K_f

where U(B_w, T_h, L_p, D_t) is the link utilization, B_w, T_h, L_p, and D_t are the link bandwidth, traffic, packet loss rate, and transmission delay of the target SDN network, respectively, and K_f is a proportionality coefficient. The objective function maximizing the link utilization of the target SDN network is U_max(B_w, T_h, L_p, D_t).
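The feedback f_t can be illustrated with a concrete utilization function. The patent leaves U(B_w, T_h, L_p, D_t) unspecified, so the formula below (throughput over bandwidth, discounted by loss and delay) is an assumption for illustration only, not the patented definition.

```python
# Sketch of the feedback f_t = U(Bw, Th, Lp, Dt) * K_f. The patent does not
# spell out U; utilization is illustrated here as throughput over bandwidth,
# penalized by packet loss and delay -- an assumed form, not the patent's.
def feedback(bw, th, lp, dt, k_f=1.0):
    utilization = (th / bw) * (1.0 - lp) / (1.0 + dt)
    return utilization * k_f
```

Any monotone function of the four link attributes could play the same role; the learning loop only requires that higher utilization yield higher feedback.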
Step S2.2.3: Define the experience pool R as

R = {(s_t, π(s_t|θ), f_t, s_{t+1})}

where s_{t+1} is the state of the target SDN network at time t+1, i.e. the state obtained after the online graph policy network outputs the routing policy π(s_t|θ).
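The experience pool R and the uniform sampling of step S2.3.6 can be sketched as a small replay buffer. The fixed capacity is an assumed implementation detail not stated in the patent.

```python
import random
from collections import deque

# Sketch of the experience pool R: stores (s_t, policy_t, f_t, s_t1) tuples
# and supports uniform random sampling of Y records (step S2.3.6).
class ExperiencePool:
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)   # oldest records are evicted first

    def store(self, s_t, policy_t, f_t, s_t1):
        self.buf.append((s_t, policy_t, f_t, s_t1))

    def sample(self, y):
        return random.sample(self.buf, min(y, len(self.buf)))
```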
Step S2.3: For the target SDN network, perform a preset number of iterations, the preset number being T, as follows:
Step S2.3.1: Let t = 1 and obtain the initial state s_1 of the target SDN network;
Step S2.3.2: The online graph policy network outputs a routing policy according to the state s_t of the target SDN network at time t; this process is denoted π(s_t|θ), where θ is the network parameter of the online graph policy network;
Step S2.3.3: Update the routing cost of each link in the target SDN network according to the routing policy π(s_t|θ);
Step S2.3.4: Obtain the state s_{t+1} of the target SDN network after the update by the routing policy π(s_t|θ), and obtain the environment feedback f_t;
Step S2.3.5: Store (s_t, π(s_t|θ), f_t, s_{t+1}) as one history record in the experience pool R;
Step S2.3.6: Randomly sample Y history records (s_m, π(s_m|θ), f_m, s_{m+1}) from the experience pool R, where the subscript m denotes any one record in the experience pool R;
Step S2.3.7: From the history records (s_m, π(s_m|θ), f_m, s_{m+1}) sampled in step S2.3.6, compute the corresponding output y_m of the target graph value network as

y_m = f_m + γ Q′(s_{m+1}, π′(s_{m+1}|θ′)|ω′)

where π′(s_{m+1}|θ′) is the routing policy selected by the target graph policy network according to the state s_{m+1} of the target SDN network, θ′ is the network parameter of the target graph policy network, ω′ is the network parameter of the target graph value network, Q′(s_{m+1}, π′(s_{m+1}|θ′)|ω′) is the expected value, given by the target graph value network with network parameter ω′ in state s_{m+1}, of the routing policy π′(s_{m+1}|θ′) selected by the target graph policy network, and γ is the discount factor, a constant with γ ∈ (0, 1);
Step S2.3.8: Compute the loss Loss_ogvn of the online graph value network output as

Loss_ogvn = (1/Y) Σ_{m=1..Y} ( y_m − Q(s_m, π(s_m|θ)|ω) )²

where Q(s_m, π(s_m|θ)|ω) is the value output by the online graph value network with network parameter ω in state s_m when the routing policy output by the online graph policy network is π(s_m|θ);
Step S2.3.9: According to the loss Loss_ogvn of the online graph value network output, update the network parameter ω of the online graph value network by gradient back-propagation;
Step S2.3.10: Compute the gradient value ∇_θ Q(s_m, π(s_m|θ)|ω) and, according to this gradient value, update the network parameter θ of the online graph policy network by gradient back-propagation, where ∇_θ(·) denotes the gradient of the bracketed expression with respect to θ;
Step S2.3.11: Update the network parameter θ′ of the target graph policy network and the network parameter ω′ of the target graph value network according to

θ′ = τθ + (1−τ)θ′
ω′ = τω + (1−τ)ω′

where τ is a constant with τ ∈ (0, 1);
Step S2.3.12: Repeat steps S2.3.2 to S2.3.11 until the number of iterations reaches the preset number T, obtaining the routing policy that minimizes the routing cost of the target SDN network.
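The core arithmetic of steps S2.3.7, S2.3.8, and S2.3.11 can be sketched in a few lines. Network evaluations are passed in as plain arrays for illustration; the helper names are assumptions.

```python
import numpy as np

# Numeric sketch of steps S2.3.7, S2.3.8 and S2.3.11: discounted target
# value y_m, mean-squared value-network loss over Y samples, and the soft
# update theta' = tau*theta + (1-tau)*theta' (likewise for omega).
def target_values(f, q_target_next, gamma=0.9):
    return f + gamma * q_target_next              # y_m per sampled record

def critic_loss(y, q_online):
    return float(np.mean((y - q_online) ** 2))    # Loss_ogvn

def soft_update(online, target, tau=0.01):
    return tau * online + (1.0 - tau) * target    # blends slowly toward online
```

With τ ∈ (0, 1) small, the target parameters trail the online parameters, which stabilizes the bootstrapped targets y_m during training.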
In a preferred technical solution of the present invention, step S3 comprises the following specific steps:

Step S3.1: Obtain the graph adjacency matrix A and the network information feature matrix H of the target SDN network;

Step S3.2: Based on the trained deep graph learning model and the state [A, H] of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network;

Step S3.3: Deploy the routing policy obtained in step S3.2 to the target SDN network, and change the link weights of the target SDN network according to the routing policy;

Step S3.4: During traffic transmission, transmit traffic according to the shortest-path scheme using the updated link weights.
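Forwarding under the updated weights reduces to a standard shortest-path computation. The sketch below uses Dijkstra's algorithm over a weight map; the data layout is an illustrative assumption.

```python
import heapq

# Sketch of step S3.4: once the policy has rewritten the link weights,
# forwarding follows the shortest path under those weights (Dijkstra).
def shortest_path(weights, src, dst):
    """weights: dict node -> {neighbor: link_weight}."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry
        for v, w in weights.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:                            # walk predecessors back
        path.append(node)
        node = prev[node]
    return [src] + path[::-1]
```

When the policy lowers the weight of an underused detour below that of a congested direct link, the same shortest-path rule automatically shifts traffic onto the detour.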
The present invention further provides a system for the routing optimization method based on graph structure features. The target SDN network comprises a control plane and a data plane, the control plane comprising an information acquisition module, a policy deployment module, and a DGL module, such that the system implements the routing optimization method based on graph structure features.
The links and nodes of the target SDN network are deployed on the data plane. The information acquisition module on the control plane obtains the network topology graph of the target SDN network, generates the graph adjacency matrix and the network information feature matrix, and sends them to the DGL module.
Based on graph learning neural networks, the DGL module takes the graph adjacency matrix and the network information feature matrix as input and, through the deep graph learning method, outputs the routing cost of the target SDN network in the current state. It updates the network parameters of the graph learning neural networks by gradient back-propagation and, after a preset number of iterations, trains them to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network.
The policy deployment module on the control plane uses the trained deep graph learning model obtained by the DGL module and the state of the target SDN network to obtain the routing policy that minimizes the routing cost of the target SDN network, and sends the routing policy and the routing cost of the target SDN network to the data plane.
Beneficial effects: Compared with the prior art, the advantages of the present invention include:

1. A graph learning neural network is used to capture the spatial relationships between nodes and links in the network topology.

2. The policy-network and value-network scheme performs unsupervised learning of the algorithm, making its learning capability more fine-grained.

3. An intelligent algorithm optimizes the routing cost in an SDN network environment and improves link utilization, thereby optimizing the average end-to-end delay, packet loss rate, throughput, and the like.

4. The deep graph learning model has strong generalization ability: once trained, it remains effective when the network topology changes and can adapt to large-scale dynamic, complex networks.
Description of the Drawings
Figure 1 is an overall block diagram of a system for the routing optimization method based on graph structure features according to an embodiment of the present invention;

Figure 2 is a framework diagram of the DGL algorithm according to an embodiment of the present invention;

Figure 3 is a structural diagram of a graph learning neural network according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. The following embodiments are only intended to illustrate the technical solutions of the present invention more clearly and are not intended to limit the scope of protection of the present invention.
An embodiment of the present invention provides a routing optimization method based on graph structure features. For a target SDN network, the following steps S1 to S3 are performed to obtain the routing cost of each link in the target SDN network and adjust the weight of each link, thereby completing routing optimization of the target SDN network.
Step S1: Referring to Figure 1, for the target SDN network, obtain its network topology graph via the southbound interface protocol; construct a graph adjacency matrix from the connection relationships between the nodes on each link of the target SDN network in the topology graph; for each node on each link, construct the node's information feature vector from its link bandwidth, traffic, packet loss rate, and transmission delay; and build the network information feature matrix of the target SDN network from the information feature vectors of all nodes.
Step S1 comprises the following specific steps:
Step S1.1: For the target SDN network, obtain its network topology via the southbound interface protocol, the topology comprising M routers and N links.
Step S1.2: In the network topology of the target SDN network, each router corresponds to a real node and each link corresponds to an edge. A virtual node is inserted on the edge corresponding to each link, so that the topology is represented as a network topology graph G(V, E) with M real nodes, N virtual nodes, and 2N edges, where V is the node set and E is the edge set:

V = {V_real, V_virtual}

where V_real is the set of real nodes and V_virtual is the set of virtual nodes;

V_real = {v_s1, v_s2, ..., v_sM}

where v_s1, v_s2, ..., v_sM are the M real nodes;

V_virtual = {v_x1, v_x2, ..., v_xN}

where v_x1, v_x2, ..., v_xN are the N virtual nodes;

E = {e_1, e_2, ..., e_2N}

where e_1, e_2, ..., e_2N are the 2N edges.
Step S1.3: Let x = M + N, where x is the total number of nodes (M real nodes and N virtual nodes). Based on the network topology graph of the target SDN network, construct the x-order graph adjacency matrix A = (a_ij), i, j = 1, ..., x, where the element a_ij is

a_ij = 1 if node i and node j are connected by an edge, and a_ij = 0 otherwise.
Step S1.4: For any node i of the target SDN network, construct the information feature vector h_i of node i from its link bandwidth, traffic, packet loss rate, and transmission delay:

h_i = [B_wi, T_hi, L_pi, D_ti]

where B_wi is the link bandwidth of node i, T_hi is the traffic of node i, L_pi is the packet loss rate of node i, and D_ti is the transmission delay of node i.
For the node i: if node i is a virtual node, its traffic T_hi, packet loss rate L_pi, and transmission delay D_ti are 0; if node i is a real node, its link bandwidth B_wi is 0.
Based on the information feature vectors of all nodes, the network information feature matrix H of the target SDN network is constructed as the x×4 matrix whose rows are the node feature vectors:

H = [h_1; h_2; ...; h_i; ...; h_x]

where h_1, h_2, ..., h_i, ..., h_x are the information feature vectors of the nodes.
Step S2: Take the graph adjacency matrix and the network information feature matrix as the state of the target SDN network. Based on graph learning neural networks, with the graph adjacency matrix and the network information feature matrix as input, the Deep Graph Learning (DGL) method outputs the routing policy and routing cost of the target SDN network in the current state. The network parameters of the graph learning neural networks are updated by gradient back-propagation and, after a preset number of iterations, the networks are trained to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network.
The deep graph learning method in step S2 comprises four graph learning neural networks and one experience pool. Referring to Figure 2, the four networks are the Online Graph Strategy Network (OGSN), the Online Graph Value Network (OGVN), the Target Graph Strategy Network (TGSN), and the Target Graph Value Network (TGVN). Referring to Figure 3, each of the four networks comprises one input layer, two hidden layers, and one output layer.
The input layers of the online graph policy network and the target graph policy network take the graph adjacency matrix A and the network information feature matrix H as input, and the outputs of the online and target graph policy networks serve as the inputs of the online and target graph value networks, respectively. In each graph learning neural network the propagation formula from the input layer to the hidden layers, and between the hidden layers, is the same. Denoting the input layer as layer 0, the first hidden layer as layer 1, and the second hidden layer as layer 2, the propagation formula is

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^l W^(l+1) )

where σ(·) normalizes the expression in parentheses, H^l is the network information feature matrix of layer l, W^(l+1) is the weight matrix of layer l+1, H^0 = H, Ã = A + I with I the x-order identity matrix, and D̃ is the degree matrix of Ã, given by

D̃_ii = Σ_j ã_ij

where ã_ij is the element of Ã.

In the online graph policy network and the target graph policy network, W^1 is a 4×4 matrix and W^2 is a 4×1 matrix; the output layer is a fully connected layer whose output is an x×1 matrix, denoted the routing policy Policy:

Policy = H^2 × K

where K is the weight matrix of the output layer of the online and target graph policy networks and H^2 is the network information feature matrix of layer 2.
In the online graph value network and the target graph value network, W^1 and W^2 are both 1×1 matrices; the output layer is an aggregation layer whose output is a 1×1 matrix, denoted Value:

Value = Q × Σ_{i=1..x} h_i^(2)

where Q is the weight value of the output layer and h_i^(2) is the i-th value in the layer-2 network information feature matrix H^2. According to the routing policy Policy output by the online graph policy network, the routing cost of each link in the target SDN network is updated.
Referring to Figure 2, step S2 comprises the following specific steps:
Step S2.1: Initialize the weight matrices of the online graph policy network, the online graph value network, the target graph policy network, and the target graph value network, where the weight matrix of the online graph policy network is W_θ, that of the online graph value network is W_ω, that of the target graph policy network is W_θ′, and that of the target graph value network is W_ω′. At initialization, the network parameters of the online graph policy network and the target graph policy network are identical, and the network parameters of the online graph value network and the target graph value network are identical.
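The initialization rule above (target networks start as exact copies of the online networks) can be sketched as follows; the dictionary layout and random initializer are illustrative assumptions.

```python
import numpy as np

# Sketch of step S2.1: the target networks copy the online networks' weights
# at initialization, so theta' = theta and omega' = omega before training.
def init_networks(rng, shapes):
    """shapes: dict weight name -> shape, e.g. {"W1": (4, 4), "W2": (4, 1)}."""
    online = {name: rng.standard_normal(shape) for name, shape in shapes.items()}
    target = {name: w.copy() for name, w in online.items()}  # exact, independent copies
    return online, target
```

Using `.copy()` rather than sharing references matters: the target parameters must drift away from the online ones only through the soft updates of step S2.3.11.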
Step S2.2: Initialize the experience pool, as follows:
Step S2.2.1: Take the graph adjacency matrix A and the network information feature matrix H as the state S of the target SDN network, defining S = [A, H]. Let s_t denote the state of the target SDN network at time t, s_t = [A_t, H_t], where A_t is the graph adjacency matrix and H_t the network information feature matrix of the target SDN network at time t.
Step S2.2.2: Define π(s_t|θ), π′(s_t|θ′), Q(s_t, π(s_t|θ)|ω), and Q′(s_t, π′(s_t|θ′)|ω′) as the outputs of the output layers of the online graph policy network, the target graph policy network, the online graph value network, and the target graph value network at time t, respectively. The environment feedback f_t obtained when the online graph policy network outputs the routing policy π(s_t|θ) is computed as

f_t = U(B_w, T_h, L_p, D_t) × K_f

where U(B_w, T_h, L_p, D_t) is the link utilization, B_w, T_h, L_p, and D_t are the link bandwidth, traffic, packet loss rate, and transmission delay of the target SDN network, respectively, and K_f is a proportionality coefficient.

The objective function maximizing the link utilization of the target SDN network is U_max(B_w, T_h, L_p, D_t).
Step S2.2.3: Define the experience pool R as

R = {(s_t, π(s_t|θ), f_t, s_{t+1})}

where s_{t+1} is the state of the target SDN network at time t+1, i.e. the state obtained after the online graph policy network outputs the routing policy π(s_t|θ).
Step S2.3: For the target SDN network, perform a preset number of iterations, the preset number being T, as follows:
Step S2.3.1: Let t = 1 and obtain the initial state s_1 of the target SDN network;
步骤S2.3.2:在线图策略网络根据t时刻目标SDN网络的状态st,输出路由策略过程记为其中,θ为在线图策略网络的网络参数;Step S2.3.2: Based on the status s t of the target SDN network at time t, the online graph policy network outputs the routing policy. The process is recorded as Among them, θ is the network parameter of the online graph policy network;
步骤S2.3.3:根据路由策略更新目标SDN网络中各条链路的路由开销;Step S2.3.3: According to routing policy Update the routing costs of each link in the target SDN network;
步骤S2.3.4:获取根据路由策略更新后的目标SDN网络的状态st+1,同时获取环境反馈ftStep S2.3.4: Obtain the routing policy Updated target SDN network state s t+1 , and obtain environmental feedback f t at the same time;
步骤S2.3.5:将作为一组历史记录存入经验池R中;Step S2.3.5: Place Stored in the experience pool R as a set of historical records;
步骤S2.3.6:从经验池R中随机抽取Y组历史记录其中,下标m表示经验池R中任意一组历史记录;Step S2.3.6: Randomly select Y groups of historical records from the experience pool R Among them, the subscript m represents any set of historical records in the experience pool R;
Step S2.3.7: From the history records (s_m, π_m, f_m, s_{m+1}) drawn in step S2.3.6, compute the corresponding output y_m of the target graph value network as:

y_m = f_m + γQ′(s_{m+1}, π′(s_{m+1}|θ′)|ω′)

where π′(s_{m+1}|θ′) denotes the routing policy selected by the target graph policy network according to the state s_{m+1} of the target SDN network, θ′ is the network parameter of the target graph policy network, ω′ is the network parameter of the target graph value network, Q′(s_{m+1}, π′(s_{m+1}|θ′)|ω′) denotes the expected value assigned by the target graph value network, with network parameter ω′, to the routing policy π′(s_{m+1}|θ′) in state s_{m+1}, and γ is the discount factor, a constant with γ ∈ (0, 1).
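The target output of step S2.3.7 is a temporal-difference target: the sampled feedback f_m plus the discounted estimate of the target graph value network for the next state. Once that estimate has been evaluated, the computation is one line; the numeric values below are made up for illustration:

```python
def td_target(f_m, q_target_next, gamma=0.99):
    # y_m = f_m + gamma * (target value network's estimate for s_{m+1})
    # gamma is the discount factor, a constant in (0, 1).
    assert 0.0 < gamma < 1.0
    return f_m + gamma * q_target_next

# Illustrative sample: feedback 1.0, target network estimate 2.0, gamma 0.9.
y_m = td_target(f_m=1.0, q_target_next=2.0, gamma=0.9)
```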
Step S2.3.8: Compute the loss Loss_ogvn of the online graph value network output value as:

Loss_ogvn = (1/Y) Σ_{m=1}^{Y} (y_m − Q(s_m, π(s_m|θ)|ω))²

where Q(s_m, π(s_m|θ)|ω) denotes the value output by the online graph value network, with network parameter ω, in state s_m of the target SDN network when the routing policy output by the online graph policy network is π(s_m|θ).
Step S2.3.9: According to the loss Loss_ogvn of the online graph value network output value, update the network parameter ω of the online graph value network by gradient backpropagation.

Step S2.3.10: Compute the gradient value ∇_θ J ≈ (1/Y) Σ_{m=1}^{Y} ∇_θ Q(s_m, π(s_m|θ)|ω) and, according to this gradient value, update the network parameter θ of the online graph policy network by gradient backpropagation, where ∇_θ(·) denotes taking the gradient of the bracketed expression with respect to θ.
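The loss of step S2.3.8 is a mean squared error between the sampled targets y_m and the online graph value network's outputs. A minimal sketch of that loss; the actual parameter updates of steps S2.3.9 and S2.3.10 would be carried out by an autodiff framework and are omitted, and the sample values are illustrative:

```python
def critic_loss(y, q):
    # Loss_ogvn = (1/Y) * sum_m (y_m - Q(s_m, pi(s_m|theta) | omega))^2
    assert len(y) == len(q) and len(y) > 0
    return sum((ym - qm) ** 2 for ym, qm in zip(y, q)) / len(y)

# Two sampled records: targets y_m vs. online value network outputs.
loss = critic_loss([2.8, 1.0], [2.3, 1.5])
```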
Step S2.3.11: Update the network parameter θ′ of the target graph policy network and the network parameter ω′ of the target graph value network according to:

θ′ = τθ + (1−τ)θ′

ω′ = τω + (1−τ)ω′

where τ is a constant and τ ∈ (0, 1).
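The soft update of step S2.3.11 can be sketched directly; the parameters are shown as flat lists of floats for illustration:

```python
def soft_update(target, online, tau=0.01):
    # theta' = tau*theta + (1 - tau)*theta'   (and likewise omega' from omega)
    # Small tau makes the target network track the online network slowly.
    assert 0.0 < tau < 1.0
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]

# Target parameters start at zero; online parameters are [1.0, 2.0].
theta_t = soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.1)
```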
Step S2.3.12: Repeat steps S2.3.2 to S2.3.11 until the number of iterations reaches the preset number T, obtaining the routing policy that minimizes the routing cost of the target SDN network.
Step S3: Using the trained deep graph learning model and the state of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network, deploy the routing policy to the target SDN network, and change the weight of each link of the target SDN network according to the routing policy, completing the routing optimization of the target SDN network.

The specific steps of step S3 are as follows:

Step S31: Obtain the graph adjacency matrix A and the network information feature matrix H of the target SDN network;

Step S32: Based on the trained deep graph learning model and the state [A, H] of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network;

Step S33: Deploy the routing policy obtained in step S32 to the target SDN network, and change the weight of each link of the target SDN network according to the routing policy;

Step S34: During traffic transmission, transmit traffic over shortest paths computed with the updated link weights.
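Forwarding in step S34 is a standard shortest-path computation over the updated link weights. A self-contained Dijkstra sketch using only the standard library; the sample weights stand in for those set by the deployed routing policy and are illustrative:

```python
import heapq

def shortest_path(weights, src, dst):
    """Dijkstra over directed link weights: weights[u][v] = cost of link u->v."""
    dist, prev = {src: 0.0}, {}
    heap, seen = [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in weights.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from dst to src.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

# Illustrative weights after policy deployment: A->B->C (cost 2) beats A->C (cost 4).
w = {"A": {"B": 1.0, "C": 4.0}, "B": {"C": 1.0}, "C": {}}
path = shortest_path(w, "A", "C")
```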
An embodiment of the present invention further provides a system for the routing optimization method based on graph structure features. Referring to Figure 1, the target SDN network comprises a control plane and a data plane, where the control plane comprises an information acquisition module, a policy deployment module, and a DGL module, such that the system implements the routing optimization method based on graph structure features.

Each link and each node of the target SDN network are deployed on the data plane. The information acquisition module on the control plane obtains the network topology graph of the target SDN network, generates the graph adjacency matrix and the network information feature matrix, and sends them to the DGL module.

The DGL module, based on graph learning neural networks, takes the graph adjacency matrix and the network information feature matrix as input, uses the deep graph learning method to output the routing cost of the target SDN network in the current state, updates the network parameters of the graph learning neural networks by gradient backpropagation, and trains the graph learning neural networks through a preset number of iterations to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network.

The policy deployment module on the control plane obtains, from the trained deep graph learning model produced by the DGL module and the state of the target SDN network, the routing policy that minimizes the routing cost of the target SDN network, and sends the routing policy and the routing cost of the target SDN network to the data plane.

The embodiments of the present invention have been described in detail above with reference to the accompanying drawings; however, the present invention is not limited to the above embodiments, and various changes can be made within the scope of knowledge of those of ordinary skill in the art without departing from the spirit of the present invention.

Claims (7)

  1. A routing optimization method based on graph structure features, characterized in that, for a target SDN network, the following steps S1 to S3 are executed to obtain the routing cost of each link in the target SDN network, adjust the weight of each link, and complete the routing optimization of the target SDN network:
    Step S1: for the target SDN network, obtain the network topology graph of the target SDN network based on the southbound interface protocol; construct a graph adjacency matrix according to the connection relationships between the nodes on each link of the target SDN network in the network topology graph; for each node on each link of the target SDN network, construct the information feature vector of the node from its link bandwidth, traffic, packet loss rate, and transmission delay; and construct the network information feature matrix of the target SDN network from the information feature vectors of the nodes;
    Step S2: take the graph adjacency matrix and the network information feature matrix as the state of the target SDN network; based on graph learning neural networks, with the graph adjacency matrix and the network information feature matrix as input, use a deep graph learning method to output the routing policy and routing cost of the target SDN network in the current state; update the network parameters of the graph learning neural networks by gradient backpropagation; and train the graph learning neural networks through a preset number of iterations to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network;
    Step S3: using the trained deep graph learning model and the state of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network, deploy the routing policy to the target SDN network, and change the weight of each link of the target SDN network according to the routing policy, completing the routing optimization of the target SDN network.
  2. The routing optimization method based on graph structure features according to claim 1, characterized in that the specific steps of step S1 are as follows:
    Step S1.1: for the target SDN network, obtain the network topology of the target SDN network based on the southbound interface protocol, the network topology comprising M routers and N links;
    Step S1.2: in the network topology of the target SDN network, each router corresponds to a real node and each link corresponds to an edge; insert a virtual node on the edge corresponding to each link, and express the network topology of the target SDN network as a network topology graph G(V, E) with M real nodes, N virtual nodes, and 2N edges, where V denotes the node set and E the edge set, specifically:
    V = {V_real, V_virtual}
    where V_real denotes the set of real nodes and V_virtual the set of virtual nodes;
    V_real = {v_s1, v_s2, ..., v_sM}
    where v_s1, v_s2, ..., v_sM denote the M real nodes;
    V_virtual = {v_x1, v_x2, ..., v_xN}
    where v_x1, v_x2, ..., v_xN denote the N virtual nodes;
    E = {e_1, e_2, ..., e_2N}
    where e_1, e_2, ..., e_2N denote the 2N edges;
    Step S1.3: let x = M + N, where x denotes the total number of nodes, the nodes comprising the M real nodes and the N virtual nodes; based on the network topology graph of the target SDN network, construct the x-order graph adjacency matrix A = (a_ij)_{x×x}, where the element a_ij is:
    a_ij = 1 if an edge connects node i and node j, otherwise a_ij = 0;
    Step S1.4: for any node i of the target SDN network, construct the information feature vector h_i of node i from its link bandwidth, traffic, packet loss rate, and transmission delay:
    h_i = [B_wi, T_hi, L_pi, D_ti]
    where B_wi is the link bandwidth of node i, T_hi the traffic of node i, L_pi the packet loss rate of node i, and D_ti the transmission delay of node i;
    based on the information feature vectors of the nodes, construct the network information feature matrix H of the target SDN network as:
    H = [h_1, h_2, ..., h_i, ..., h_x]^T
    where h_1, h_2, ..., h_i, ..., h_x are the information feature vectors of the nodes.
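The graph construction of steps S1.2 to S1.4, together with the zero-filling convention of claim 3, can be sketched as follows: one virtual node is inserted per link, giving x = M + N nodes, and each original link contributes two edges through its virtual node. The function and variable names and the sample values are illustrative assumptions:

```python
def build_graph(links, router_feats, link_bw):
    """Build adjacency A and feature matrix H for M routers and N links.

    links: list of (u, v) router-index pairs, one per link;
    router_feats: per-router (traffic Th, loss Lp, delay Dt);
    link_bw: per-link bandwidth Bw.
    """
    m, n = len(router_feats), len(links)
    x = m + n                               # total nodes: M real + N virtual
    A = [[0] * x for _ in range(x)]
    H = []
    for th, lp, dt in router_feats:
        H.append([0.0, th, lp, dt])         # real node: Bw = 0 (claim 3)
    for k, (u, v) in enumerate(links):
        vk = m + k                          # index of the link's virtual node
        A[u][vk] = A[vk][u] = 1             # each link becomes two edges
        A[v][vk] = A[vk][v] = 1             # through its virtual node
        H.append([link_bw[k], 0.0, 0.0, 0.0])  # virtual node: Th = Lp = Dt = 0
    return A, H

# Two routers joined by one 100 Mbps link -> 3 nodes, 2 edges.
A, H = build_graph(links=[(0, 1)],
                   router_feats=[(10.0, 0.01, 2.0), (8.0, 0.02, 3.0)],
                   link_bw=[100.0])
```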
  3. The routing optimization method based on graph structure features according to claim 2, characterized in that, for the node i in step S1.4, if node i is a virtual node, its traffic T_hi, packet loss rate L_pi, and transmission delay D_ti are 0; if node i is a real node, its link bandwidth B_wi is 0.
  4. The routing optimization method based on graph structure features according to claim 2, characterized in that the deep graph learning method in step S2 comprises four graph learning neural networks and one experience pool, the four graph learning neural networks being an online graph policy network, an online graph value network, a target graph policy network, and a target graph value network, each comprising one input layer, two hidden layers, and one output layer;
    the input layers of the online graph policy network and the target graph policy network take the graph adjacency matrix A and the network information feature matrix H as input, and the outputs of the online graph policy network and the target graph policy network serve as the inputs of the online graph value network and the target graph value network, respectively; the propagation formula from the input layer to the hidden layers, and between hidden layers, is the same for each graph learning neural network; denoting the input layer as layer 0, the first hidden layer as layer 1, and the second hidden layer as layer 2, the propagation formula is:
    H_{l+1} = σ(D̃^{-1/2} Ã D̃^{-1/2} H_l W_{l+1})
    where σ(·) normalizes the expression inside the brackets, H_l is the network information feature matrix of layer l, W_{l+1} is the weight matrix of layer l+1, H_0 = H, Ã = A + I, I is the x-order identity matrix, and D̃ is the degree matrix of Ã, with:
    D̃_ii = Σ_j ã_ij
    where ã_ij is the element of Ã;
    in the online graph policy network and the target graph policy network, W_1 is a 4×4 matrix and W_2 is a 4×1 matrix; the output layer is a fully connected layer, and its output value is an x×1 matrix, denoted the routing policy Policy:
    Policy = H_2 × K
    where K is the weight matrix of the output layer of the online graph policy network and the target graph policy network, and H_2 is the network information feature matrix of layer 2;
    in the online graph value network and the target graph value network, W_1 and W_2 are both 1×1 matrices; the output layer is an aggregation layer, and its output value is a 1×1 matrix, denoted Value:
    Value = Q × Σ_{i=1}^{x} h_2,i
    where Q is the weight value of the output layer and h_2,i is the i-th value in the network information feature matrix H_2 of layer 2;
    the routing cost of each link in the target SDN network is updated according to the routing policy Policy output by the online graph policy network.
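The propagation rule of claim 4 matches the standard graph-convolution form H_{l+1} = σ(D̃^{-1/2}ÃD̃^{-1/2}H_lW_{l+1}). A pure-Python sketch of one layer; since the claim only says that σ(·) "normalizes", the ReLU used here is an assumption, and the two-node example with 1-dimensional features is illustrative:

```python
import math

def matmul(P, Q):
    # Naive matrix product, adequate for small examples.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def gcn_layer(A, H, W):
    """One propagation step: H_{l+1} = sigma(D~^{-1/2} (A + I) D~^{-1/2} H_l W_{l+1})."""
    x = len(A)
    A_t = [[A[i][j] + (1 if i == j else 0) for j in range(x)]
           for i in range(x)]                       # A~ = A + I (self-loops)
    d = [sum(row) for row in A_t]                   # diagonal of degree matrix D~
    a_hat = [[A_t[i][j] / math.sqrt(d[i] * d[j])    # symmetric normalization
              for j in range(x)] for i in range(x)]
    z = matmul(matmul(a_hat, H), W)
    return [[max(0.0, v) for v in row] for row in z]  # sigma: ReLU (assumption)

# Two connected nodes with scalar features and a 1x1 weight matrix.
H1 = gcn_layer([[0, 1], [1, 0]], [[1.0], [2.0]], [[1.0]])
```

Stacking two such layers and applying the output layer (Policy = H_2 × K) would give the x×1 policy vector described in the claim.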
  5. The routing optimization method based on graph structure features according to claim 4, characterized in that the specific steps of step S2 are as follows:
    Step S2.1: initialize the weight matrices of the online graph policy network, the online graph value network, the target graph policy network, and the target graph value network, where the weight matrix of the online graph policy network is W_θ, the weight matrix of the target graph policy network is W_θ′, the weight matrix of the online graph value network is W_ω, and the weight matrix of the target graph value network is W_ω′;
    Step S2.2: initialize the experience pool; the specific steps are as follows:
    Step S2.2.1: take the graph adjacency matrix A and the network information feature matrix H as the state S of the target SDN network, defining S = [A, H]; s_t denotes the state of the target SDN network at time t, s_t = [A_t, H_t], where A_t denotes the graph adjacency matrix and H_t the network information feature matrix of the target SDN network at time t;
    Step S2.2.2: define π_t, π′_t, Q_t, Q′_t as the outputs of the output layers of the online graph policy network, the target graph policy network, the online graph value network, and the target graph value network at time t, respectively; compute the environmental feedback f_t obtained when the online graph policy network outputs the routing policy π_t as:
    f_t = U(B_w, T_h, L_p, D_t) × K_f
    where U(B_w, T_h, L_p, D_t) is the link utilization, B_w, T_h, L_p, and D_t are respectively the link bandwidth, traffic, packet loss rate, and transmission delay of the target SDN network, and K_f is a proportional coefficient;
    the objective function for maximizing the link utilization of the target SDN network is constructed as U_max(B_w, T_h, L_p, D_t);
    Step S2.2.3: define the experience pool R as:
    R = {(s_t, π_t, f_t, s_{t+1})}
    where s_{t+1} denotes the state of the target SDN network at time t+1, i.e., the state obtained after the online graph policy network outputs the routing policy π_t;
    Step S2.3: perform a preset number of iterations for the target SDN network, where the preset number of iterations is T; the specific steps are as follows:
    Step S2.3.1: let t = 1 and obtain the initial state s_1 of the target SDN network;
    Step S2.3.2: the online graph policy network outputs the routing policy π_t according to the state s_t of the target SDN network at time t, the process being denoted π_t = π(s_t|θ), where θ is the network parameter of the online graph policy network;
    Step S2.3.3: update the routing cost of each link in the target SDN network according to the routing policy π_t;
    Step S2.3.4: obtain the state s_{t+1} of the target SDN network after it has been updated according to the routing policy π_t, and obtain the environmental feedback f_t;
    Step S2.3.5: store (s_t, π_t, f_t, s_{t+1}) in the experience pool R as one history record;
    Step S2.3.6: randomly draw Y history records (s_m, π_m, f_m, s_{m+1}) from the experience pool R, where the subscript m denotes any history record in R;
    Step S2.3.7: from the history records (s_m, π_m, f_m, s_{m+1}) drawn in step S2.3.6, compute the corresponding output y_m of the target graph value network as:
    y_m = f_m + γQ′(s_{m+1}, π′(s_{m+1}|θ′)|ω′)
    where π′(s_{m+1}|θ′) denotes the routing policy selected by the target graph policy network according to the state s_{m+1} of the target SDN network, θ′ is the network parameter of the target graph policy network, ω′ is the network parameter of the target graph value network, Q′(s_{m+1}, π′(s_{m+1}|θ′)|ω′) denotes the expected value assigned by the target graph value network, with network parameter ω′, to the routing policy π′(s_{m+1}|θ′) in state s_{m+1}, and γ is the discount factor, a constant with γ ∈ (0, 1);
    Step S2.3.8: compute the loss Loss_ogvn of the online graph value network output value as:
    Loss_ogvn = (1/Y) Σ_{m=1}^{Y} (y_m − Q(s_m, π(s_m|θ)|ω))²
    where Q(s_m, π(s_m|θ)|ω) denotes the value output by the online graph value network, with network parameter ω, in state s_m of the target SDN network when the routing policy output by the online graph policy network is π(s_m|θ);
    Step S2.3.9: according to the loss Loss_ogvn of the online graph value network output value, update the network parameter ω of the online graph value network by gradient backpropagation;
    Step S2.3.10: compute the gradient value ∇_θ J ≈ (1/Y) Σ_{m=1}^{Y} ∇_θ Q(s_m, π(s_m|θ)|ω) and, according to this gradient value, update the network parameter θ of the online graph policy network by gradient backpropagation, where ∇_θ(·) denotes taking the gradient of the bracketed expression with respect to θ;
    Step S2.3.11: update the network parameter θ′ of the target graph policy network and the network parameter ω′ of the target graph value network according to:
    θ′ = τθ + (1−τ)θ′
    ω′ = τω + (1−τ)ω′
    where τ is a constant and τ ∈ (0, 1);
    Step S2.3.12: repeat steps S2.3.2 to S2.3.11 until the number of iterations reaches the preset number T, obtaining the routing policy that minimizes the routing cost of the target SDN network.
  6. The routing optimization method based on graph structure features according to claim 5, characterized in that the specific steps of step S3 are as follows:
    Step S31: obtain the graph adjacency matrix A and the network information feature matrix H of the target SDN network;
    Step S32: based on the trained deep graph learning model and the state [A, H] of the target SDN network, obtain the routing policy that minimizes the routing cost of the target SDN network;
    Step S33: deploy the routing policy obtained in step S32 to the target SDN network, and change the weight of each link of the target SDN network according to the routing policy;
    Step S34: during traffic transmission, transmit traffic over shortest paths computed with the updated link weights.
  7. A system for the routing optimization method based on graph structure features, characterized in that a target SDN network comprises a control plane and a data plane, the control plane comprising an information acquisition module, a policy deployment module, and a DGL module, such that the system implements the routing optimization method based on graph structure features according to any one of claims 1-6;
    each link and each node of the target SDN network are deployed on the data plane; the information acquisition module on the control plane obtains the network topology graph of the target SDN network, generates the graph adjacency matrix and the network information feature matrix, and sends them to the DGL module;
    the DGL module, based on graph learning neural networks, takes the graph adjacency matrix and the network information feature matrix as input, uses the deep graph learning method to output the routing cost of the target SDN network in the current state, updates the network parameters of the graph learning neural networks by gradient backpropagation, and trains the graph learning neural networks through a preset number of iterations to obtain a deep graph learning model that minimizes the routing cost and maximizes the link utilization of the target SDN network;
    the policy deployment module on the control plane obtains, from the trained deep graph learning model produced by the DGL module and the state of the target SDN network, the routing policy that minimizes the routing cost of the target SDN network, and sends the routing policy and the routing cost of the target SDN network to the data plane.
PCT/CN2023/098735 2022-08-15 2023-06-07 Graph structure feature-based routing optimization method and system WO2024037136A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210974378.6A CN115225561B (en) 2022-08-15 2022-08-15 Route optimization method and system based on graph structure characteristics
CN202210974378.6 2022-08-15

Publications (1)

Publication Number Publication Date
WO2024037136A1 true WO2024037136A1 (en) 2024-02-22

Family

ID=83615692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/098735 WO2024037136A1 (en) 2022-08-15 2023-06-07 Graph structure feature-based routing optimization method and system

Country Status (2)

Country Link
CN (1) CN115225561B (en)
WO (1) WO2024037136A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115225561B (en) * 2022-08-15 2022-12-06 南京邮电大学 Route optimization method and system based on graph structure characteristics
CN116055378B (en) * 2023-01-10 2024-05-28 中国联合网络通信集团有限公司 Training method and device for traffic scheduling strategy generation model

Citations (6)

Publication number Priority date Publication date Assignee Title
CN111245718A (en) * 2019-12-30 2020-06-05 浙江工商大学 Routing optimization method based on SDN context awareness
CN111314171A (en) * 2020-01-17 2020-06-19 深圳供电局有限公司 Method, device and medium for predicting and optimizing SDN routing performance
CN113194034A (en) * 2021-04-22 2021-07-30 华中科技大学 Route optimization method and system based on graph neural network and deep reinforcement learning
WO2022116957A1 (en) * 2020-12-02 2022-06-09 中兴通讯股份有限公司 Algorithm model determining method, path determining method, electronic device, sdn controller, and medium
CN114697229A (en) * 2022-03-11 2022-07-01 华中科技大学 Construction method and application of distributed routing planning model
CN115225561A (en) * 2022-08-15 2022-10-21 南京邮电大学 Route optimization method and system based on graph structure characteristics

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
CN103281247B (en) * 2013-05-09 2016-06-15 北京交通大学 The general method for routing of a kind of data center network and system
US20190184561A1 (en) * 2017-12-15 2019-06-20 The Regents Of The University Of California Machine Learning based Fixed-Time Optimal Path Generation
CN110275437B (en) * 2019-06-06 2022-11-15 江苏大学 SDN network flow dominance monitoring node dynamic selection system and method thereof
CN110611619B (en) * 2019-09-12 2020-10-09 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN113556281A (en) * 2020-04-23 2021-10-26 中兴通讯股份有限公司 Rerouting method and device, electronic equipment and computer readable medium
CN111862579B (en) * 2020-06-10 2021-07-13 深圳大学 Taxi scheduling method and system based on deep reinforcement learning
CN113036772B (en) * 2021-05-11 2022-07-19 国网江苏省电力有限公司南京供电分公司 Power distribution network topology voltage adjusting method based on deep reinforcement learning
CN113285831B (en) * 2021-05-24 2022-08-02 广州大学 Network behavior knowledge intelligent learning method and device, computer equipment and storage medium
CN114286413B (en) * 2021-11-02 2023-09-19 北京邮电大学 TSN network joint routing and stream distribution method and related equipment
CN114500360B (en) * 2022-01-27 2022-11-11 河海大学 Network traffic scheduling method and system based on deep reinforcement learning
CN114859719A (en) * 2022-05-05 2022-08-05 电子科技大学长三角研究院(衢州) Graph neural network-based reinforcement learning cluster bee-congestion control method

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN111245718A (en) * 2019-12-30 2020-06-05 浙江工商大学 Routing optimization method based on SDN context awareness
CN111314171A (en) * 2020-01-17 2020-06-19 深圳供电局有限公司 Method, device and medium for predicting and optimizing SDN routing performance
WO2022116957A1 (en) * 2020-12-02 2022-06-09 中兴通讯股份有限公司 Algorithm model determining method, path determining method, electronic device, sdn controller, and medium
CN113194034A (en) * 2021-04-22 2021-07-30 华中科技大学 Route optimization method and system based on graph neural network and deep reinforcement learning
CN114697229A (en) * 2022-03-11 2022-07-01 华中科技大学 Construction method and application of distributed routing planning model
CN115225561A (en) * 2022-08-15 2022-10-21 南京邮电大学 Route optimization method and system based on graph structure characteristics

Non-Patent Citations (1)

Title
CHE XIANG-BEI, KANG WEN-QIAN, DENG BING, YANG KE-HAN, LI JIAN: "A Prediction Model of SDN Routing Performance Based on Graph Neural Network", ACTA ELECTRONICA SINICA, vol. 49, no. 3, 1 March 2021 (2021-03-01), pages 484 - 491, XP093140194, DOI: 10.12263/DZXB.20200120 *

Also Published As

Publication number Publication date
CN115225561B (en) 2022-12-06
CN115225561A (en) 2022-10-21

Similar Documents

Publication Publication Date Title
WO2024037136A1 (en) Graph structure feature-based routing optimization method and system
CN110012516B (en) Low-orbit satellite routing strategy method based on deep reinforcement learning architecture
CN112437020B (en) Data center network load balancing method based on deep reinforcement learning
CN112491714B (en) Intelligent QoS route optimization method and system based on deep reinforcement learning in SDN environment
Mao et al. An intelligent route computation approach based on real-time deep learning strategy for software defined communication systems
CN112491712B (en) Data packet routing algorithm based on multi-agent deep reinforcement learning
CN111988225B (en) Multi-path routing method based on reinforcement learning and transfer learning
CN110611619A (en) Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN114697229B (en) Construction method and application of distributed routing planning model
CN116527567B (en) Intelligent network path optimization method and system based on deep reinforcement learning
CN113395207B (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
Lei et al. Congestion control in SDN-based networks via multi-task deep reinforcement learning
CN115396366B (en) Distributed intelligent routing method based on graph attention network
CN114143264A (en) Traffic scheduling method based on reinforcement learning in SRv6 network
CN111917642B (en) SDN intelligent routing data transmission method for distributed deep reinforcement learning
Mai et al. Packet routing with graph attention multi-agent reinforcement learning
WO2023109699A1 (en) Multi-agent communication learning method
CN115714741A (en) Routing decision method and system based on collaborative multi-agent reinforcement learning
CN112529148B (en) Intelligent QoS inference method based on graph neural network
Wei et al. GRL-PS: Graph embedding-based DRL approach for adaptive path selection
Bhavanasi et al. Dealing with changes: Resilient routing via graph neural networks and multi-agent deep reinforcement learning
CN116055324B (en) Digital twin method for self-optimization of data center network
CN117061360A (en) SDN network flow prediction method and system based on space-time information
CN107169561A (en) Power-consumption-oriented hybrid particle swarm spiking neural network mapping method
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23854042

Country of ref document: EP

Kind code of ref document: A1