CN114051272A - Intelligent routing method for dynamic topological network - Google Patents

Intelligent routing method for dynamic topological network

Info

Publication number
CN114051272A
CN114051272A CN202111278176.XA
Authority
CN
China
Prior art keywords
network
routing
graph
attribute
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111278176.XA
Other languages
Chinese (zh)
Inventor
伍元胜
杜俊逸
倪大冬
肖磊
杨佩彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Electronic Technology Institute No 10 Institute of Cetc
Original Assignee
Southwest Electronic Technology Institute No 10 Institute of Cetc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Electronic Technology Institute No 10 Institute of Cetc filed Critical Southwest Electronic Technology Institute No 10 Institute of Cetc
Priority to CN202111278176.XA
Publication of CN114051272A
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 84/00 Network topologies
    • H04W 84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The intelligent routing method for a dynamic topological network disclosed by the invention has good adaptability, supports complex routing constraints and achieves high throughput. The invention is realized by the following technical scheme: according to the dynamic routing process of the dynamic topological network, the dynamic routing problem of the dynamic topological network is modeled as a reinforcement learning routing problem in which the network topology is part of the environment state; a graph neural network approximates the policy function and the value function of the PPO reinforcement learning algorithm, the policy function outputs the weights of all links, and the network environment computes a constrained minimum-weight path a based on these link weights, so that the routing agent can handle complex routing constraints; through interactive learning with the network environment, the routing agent learns an optimal routing policy that maximizes network throughput and generalizes over different network topologies; and the routing agent can automatically extract network topology features and automatically adapt to the dynamically changing topology.

Description

Intelligent routing method for dynamic topological network
Technical Field
The invention belongs to the field of communication network routing and relates to an intelligent routing method for dynamic topological networks, such as wireless ad hoc networks whose topology changes dynamically.
Background
Networks of many different forms exist in modern human society and in the virtual network society, such as communication networks based on telephone and mail, social networks based on media, urban transportation networks, scientific collaboration networks, financial transaction networks, information communication networks, early-warning detection networks, navigation and positioning networks, integrated information networks and the like. Whether for daily life, work, study or cultural activities, networks are ubiquitous, and human dependence on them is evident everywhere. Routing is the process of selecting a transmission path for a data packet in a network and is one of the core functions of a communication network.
Routing is generally divided into two categories: static routing and dynamic routing. Static routes are configured manually and cannot change in response to network changes without manual intervention, so static routing is not suitable for large or variable networks. Dynamic routing means that routers automatically build routing tables from routing information exchanged among themselves and adjust them in time as links and nodes change; when a node or link fails and other routes are available, dynamic routing automatically selects the best available route and continues forwarding packets. It is therefore suitable for dynamic topology networks (such as vehicular networks and unmanned aerial vehicle networks) whose topology changes dynamically due to node movement and link state changes. However, conventional dynamic routing techniques are usually based on a fixed routing policy and do not adapt well to dynamic changes of the network. For example, the DSDV routing protocol computes shortest paths using hop count as the link weight and cannot adapt to the bottleneck-link changes caused by topology changes, which leads to network congestion.
In recent years, with the growth of computing power, artificial intelligence technology has developed rapidly. Deep learning, one of its most representative techniques, has been applied successfully in the field of image processing. Deep learning can automatically extract features, extracting low-dimensional features from high-dimensional data and thereby alleviating the curse of dimensionality. Modern deep learning approaches generally follow an "end-to-end" design philosophy, minimizing prior representational and computational assumptions and avoiding explicit structure and hand-crafted features. Traditional deep learning based on conventional neural networks is mainly used to process regular grid-like data such as images and has difficulty extracting features from structured data. The graph neural network, a popular technique in deep learning in recent years, is mainly used to learn from structured data and can automatically extract graph data features.
Reinforcement learning is another representative artificial intelligence technique and is mainly used to solve intelligent decision problems. Reinforcement learning is an iterative learning process: in each iteration, an agent explores the environment state space and its own action space under the guidance of a reward function. The environment state space is denoted by a state set S and the agent's action space by an action set A. The interaction between the agent and the environment proceeds as follows: given a state s of the environment, the agent performs an action a, the environment transitions from state s to a new state, and the agent receives a reward r from the environment. The goal of reinforcement learning is to learn an optimal policy that maximizes the long-term cumulative return. Reinforcement learning algorithms can be broadly divided into three types: value-function methods, policy-search methods and hybrid AC (Actor-Critic) algorithms. Value-function methods are mainly used for reinforcement learning problems with discrete action spaces; for continuous action spaces, policy search or an AC algorithm is usually adopted. The AC algorithm combines the value-function method and the policy-search method: the actor and the critic correspond to a policy function and a value function respectively, the policy function maps the environment state to the agent's action, the value function evaluates the agent's current action, and the policy function learns from the feedback of the value function. The essence of reinforcement learning is the continual interaction between the agent and the environment, which is usually modeled as a Markov decision process, in which the agent repeatedly observes the state, selects actions, obtains rewards from environmental feedback, and learns an action policy that maximizes the cumulative reward through trial and error. As the state space and the action space grow, traditional reinforcement learning suffers a severe curse of dimensionality and cannot handle problems with large state and action spaces.
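As an illustration of the interaction loop described above, the following minimal Python sketch runs a generic agent-environment episode; the env and agent objects and their methods are hypothetical placeholders rather than part of the invention.

```python
# Minimal sketch of the reinforcement-learning interaction loop described above.
# `env` and `agent` are hypothetical placeholders; any concrete environment/agent
# exposing these methods would fit the same loop.

def run_episode(env, agent, max_steps=1000):
    s = env.reset()                      # initial environment state
    cumulative_reward = 0.0
    for _ in range(max_steps):
        a = agent.act(s)                 # policy: map state s to action a
        s_next, r, done = env.step(a)    # environment transitions and returns reward r
        agent.observe(s, a, r, s_next)   # store experience for learning
        cumulative_reward += r
        s = s_next
        if done:
            break
    return cumulative_reward             # the quantity the agent tries to maximize
```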
Deep reinforcement learning combines reinforcement learning with deep learning; it inherits the ability of deep learning to overcome the curse of dimensionality and can solve decision problems with high-dimensional state and action spaces that traditional reinforcement learning cannot handle. The key problem faced by deep reinforcement learning is the instability of the algorithm after deep neural networks are introduced. The TRPO (Trust Region Policy Optimization) algorithm uses a trust-region approach to prevent policy updates that move too far from the previous policy, allowing monotonic improvement of policy performance and preventing catastrophically bad policy updates. The PPO (Proximal Policy Optimization) algorithm belongs to the AC family and is an improvement on TRPO: it limits the policy update by means of a clipped surrogate objective function, achieving an effect similar to the policy-update constraint that TRPO enforces with a complex conjugate-gradient algorithm, while being much simpler than TRPO and more broadly applicable.
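For reference, a minimal sketch of the clipped surrogate objective that PPO uses to limit policy updates, as described above; the use of PyTorch and the tensor names are assumptions made for illustration.

```python
import torch

def ppo_clipped_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective of PPO (returned negated, as a loss to minimize).

    log_prob_new: log pi_theta(a|s) under the current policy
    log_prob_old: log pi_theta_old(a|s) under the policy that collected the data
    advantage:    advantage estimate of the sampled action
    """
    ratio = torch.exp(log_prob_new - log_prob_old)          # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()            # negative for gradient descent
```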
In recent years, deep reinforcement learning has been used to solve network routing problems. However, most existing deep-reinforcement-learning routing methods use traditional neural networks (such as multilayer perceptrons, convolutional neural networks and long short-term memory networks), which are not suited to learning graph-structured information and cannot extract features of the network topology graph; as a result, the algorithm must be modified and retrained for each network topology and cannot adapt to dynamic topology changes. Deep reinforcement learning based on a graph neural network has been used to solve the routing problem of optical transport networks: the method routes over K candidate paths, uses a Message Passing Neural Network (MPNN) to approximate the Q-value function of the DQN reinforcement learning algorithm, and, after DQN training is complete, selects the candidate path with the maximum Q value among the K candidates as the service path. However, routing over K candidate paths requires computing the K candidate paths for every node pair in advance, and a topology change invalidates the precomputed candidates, so the method is not suitable for dynamic topology networks. Moreover, computing K candidate paths under multiple constraints is in general NP-hard; even with heuristic algorithms the time complexity is very high, and the number of candidate paths (i.e., the value of K) cannot be made large, which severely limits the routing solution space.
The dynamic routing process in a dynamic topology network can be expressed as follows: services arrive at the network one by one; a route computation entity of the network must compute a path for each service; if the route computation succeeds, the network accepts the service and allocates bandwidth resources to it, and if it fails, the network rejects the service; this service routing process is repeated until m consecutive services have been rejected, which indicates that the network resources are exhausted, and the network then stops the dynamic routing process. According to this process, the dynamic routing problem in a dynamic topology network can be stated as: given the current network topology, the available bandwidth of each link, and the source node, destination node, bandwidth and routing constraints of the current service, how should the routing entity of the network compute an appropriate path for the current service so that, when the dynamic routing process stops, the total throughput of the network (i.e., the total bandwidth of successfully routed services) is maximized. In a dynamic topology network, the topology may change because of node movement or node or link failure, and the routing entity must be able to adapt to such topology changes, i.e., to keep operating normally after the topology changes.
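The dynamic routing process described above can be summarized by the following sketch; the helper methods next_service, compute_path and allocate are hypothetical placeholders used only to make the control flow explicit.

```python
# Sketch of the dynamic routing process described above, under the assumption of
# hypothetical helpers: next_service() yields (src, dst, bandwidth, constraints),
# compute_path() returns a feasible path or None, and allocate() reserves bandwidth.

def dynamic_routing(network, m):
    total_throughput = 0.0        # total bandwidth of successfully routed services
    consecutive_rejections = 0
    while consecutive_rejections < m:
        src, dst, bw, constraints = network.next_service()
        path = network.compute_path(src, dst, bw, constraints)
        if path is not None:
            network.allocate(path, bw)       # accept the service
            total_throughput += bw
            consecutive_rejections = 0
        else:
            consecutive_rejections += 1      # reject the service
    return total_throughput       # the quantity to be maximized
```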
Disclosure of Invention
Aiming at the defect that existing deep-reinforcement-learning routing methods cannot be used for dynamic topological networks, the invention provides a deep-reinforcement-learning routing method for dynamic topological networks that has good adaptability, supports complex routing constraints and achieves high throughput, and that can realize dynamic route computation for mobile wireless ad hoc networks whose topology changes dynamically.
The invention achieves the above purpose with an intelligent routing method for dynamic topological networks that has the following technical characteristics:
according to the dynamic routing process of the dynamic topology network, a network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; an action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the agent action a through two processes, routing policy mapping and constrained route computation: in the routing policy mapping process, the network environment state s is taken as input and a routing policy w, defined as the weights of all links in the network topology, is produced as output, realizing the mapping from the network environment state s to the routing policy w; in the constrained route computation process, the network environment state s and the routing policy w are taken as input, the service path, i.e., the agent action a, is produced as output, and a constrained shortest-path algorithm computes, for the current service in network environment state s, a minimum-weight path a that satisfies the complex routing constraints, thereby enabling the routing agent to handle complex routing constraints;
the routing agent learns the mapping from the network environment state s to the routing policy w in the routing policy mapping process using the Proximal Policy Optimization (PPO) algorithm, and, by approximating the policy function and the value function of the PPO algorithm with a graph neural network, automatically extracts network topology features and automatically adapts to the dynamically changing topology.
Compared with the prior art, the invention has the following beneficial effects:
the adaptability is good. According to the dynamic routing process of the dynamic topological network, the dynamic routing problem of the dynamic topological network is modeled into a deep reinforcement learning routing problem of the mapping from a network environment state s, which can maximize the network throughput, of a routing agent to a routing agent action a through interactive learning with a network environment by the routing agent; the routing agent realizes the mapping from the network environment state s to the intelligent action a and comprises two processes of routing strategy mapping and constraint route calculation; the routing agent learns the mapping from the network environment state s to the routing strategy w in the routing strategy mapping process by using a PPO algorithm, and explicitly takes the network topology as the environment state of deep reinforcement learning by approximating a strategy function and a value function of the PPO algorithm through a neural network, so that the routing agent can automatically extract the network topology characteristics by using the neural network, realize the generalization of the routing strategy to different topologies and adapt to the dynamic change of the network topology.
Complex routing constraints are supported. The mapping from the network environment state s to the agent action a comprises the two processes of routing policy mapping and constrained route computation: in the routing policy mapping process the agent learns the mapping from the network environment state s to the routing policy w using the PPO algorithm whose value function and policy function are approximated by a graph neural network, and in the constrained route computation process a constrained shortest-path algorithm computes, for the current service and according to the routing policy w, a minimum-weight path a that satisfies the complex routing constraints. These two processes organically combine deep learning with the traditional constrained shortest-path algorithm: once the routing policy is determined, the constrained minimum-weight path is also uniquely determined, so the control of deep reinforcement learning over routing is not weakened, complex routing constraints are handled well, and support for complex routing constraints by the routing agent is realized.
High throughput. The invention models the dynamic routing problem of a dynamic topological network as a deep reinforcement learning routing problem, defines the network environment state s as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service, defines the action a of the routing agent as the service path, and defines the reward fed back by the environment as the bandwidth of a successfully routed service, so that the cumulative reward equals the current throughput of the network; maximizing the cumulative reward therefore directly maximizes network throughput.
High efficiency. The invention generates a routing policy w from the network environment state s and then computes in real time, on that basis, the minimum-weight path a satisfying the complex routing constraints. This avoids the problem that precomputed K candidate paths become invalid after a topology change, and it also avoids the NP-hard computation of K multi-constrained paths and the loss of routing quality caused by restricting the routing solution space to K candidate paths.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the dynamic topology network intelligent routing of the present invention;
FIG. 2 is a schematic diagram of the graph network structure used for the dynamic topology network intelligent routing of the present invention;
FIG. 3 is an example of representing the network environment state as the input graph Ginp;
FIG. 4 is a schematic diagram of the structure of the input graph network block GNinp in FIG. 2;
FIG. 5 is a schematic diagram of the structure of the core graph network block GNcore in FIG. 2;
FIG. 6 is a schematic diagram of the structure of the output graph network block GNout in FIG. 2;
FIG. 7 is a graph of the episode reward versus training time step during routing agent training;
FIG. 8 is a graph of the loss function versus training time step.
In order to make the technical problems to be solved, the technical solutions and the main points of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Detailed Description
See fig. 1. According to the invention, following the dynamic routing process of the dynamic topology network, the network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; the action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the intelligent action a, and comprises two processes of routing strategy mapping and constraint route calculation: in the routing strategy mapping process, a network environment state s is used as input, a routing strategy w defined as the weight of all links in a network topology is used as output, and the mapping from the network environment state s to the routing strategy w is realized; in the constraint route calculation process, a network environment state s and a routing strategy w are used as input, a service path, namely the action a of the intelligent agent is used as output, a constraint shortest-path algorithm is used as the current service in the network environment state s, and a minimum weight path a meeting complex route constraint is calculated, so that the complex route constraint is processed by the routing intelligent agent; the routing agent learns the mapping relation from the network environment state s to the routing strategy w in the routing strategy mapping process by using a near-end strategy Optimization (PPO) algorithm, and realizes automatic extraction of network topology characteristics and automatic adaptation to dynamically-changed topology by using a strategy function and a value function of a neural network approximate PPO algorithm.
See fig. 2. The graph neural network used by the routing agent to approximate the policy function and the value function of the PPO algorithm is implemented in this embodiment as a graph network formed by connecting in series an input graph network block GNinp, a core graph network block GNcore and an output graph network block GNout. The input graph Ginp is fed to the input graph network block GNinp; GNinp produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout. The graph network is a general graph model obtained by DeepMind as a further generalization summarizing a large body of graph neural networks. The basic building unit of a graph network is the graph network block, which takes a graph as input and produces a graph as output, transforming the node, edge and global attributes of the input graph. A graph network block contains 3 update functions and 3 aggregation functions: the 3 update functions φe, φv, φu update the edge attributes, the node attributes and the global attribute respectively, and the 3 aggregation functions ρe→v, ρe→u, ρv→u aggregate, respectively, the attributes of the edges adjacent to a node, the attributes of all edges in the graph, and the attributes of all nodes in the graph. An aggregation function must be permutation invariant, i.e., the order of the aggregated edges or nodes must not affect the aggregation result; common aggregation functions include element-wise summation, averaging and maximization. The computation of a graph network block proceeds as follows: first, the edge update function φe updates the attribute of every edge in the graph; then, the adjacent-edge aggregation function ρe→v aggregates the attributes of the edges adjacent to each node, after which the node update function φv updates every node in the graph; finally, the edge-attribute aggregation function ρe→u and the node-attribute aggregation function ρv→u aggregate all edge attributes and all node attributes in the graph respectively, after which the global-attribute update function φu updates the global attribute. A graph network is composed of one or more graph network blocks; each block corresponds to a layer of a traditional neural network, and multiple blocks can be composed sequentially (analogous to a traditional multilayer perceptron) or recursively (analogous to a traditional recurrent neural network). The graph network is highly flexible: the update functions in a graph network block can be arbitrary functions, including traditional neural networks, with configurable parameters, and the aggregation functions can be any permutation-invariant functions and are likewise configurable; the configurations of the graph network blocks in a graph network may be shared or may differ. This flexibility gives the graph network strong representation power, and many types of graph neural networks can be expressed with it, such as MPNN, NLNN, relation networks, deep sets and belief-propagation embeddings.
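The computation order of a graph network block described above can be made concrete with the following minimal sketch; the plain Python functions phi_e, phi_v, phi_u stand in for the update networks of the embodiment, and element-wise summation is assumed as the aggregation function.

```python
# Minimal sketch of one graph network block, assuming element-wise summation as
# the aggregation functions. Graph = (u, V, E) with E = [(e_k, r_k, s_k)], where
# r_k / s_k are the destination / source node indices of edge k.

import numpy as np

def graph_network_block(u, V, E, phi_e, phi_v, phi_u):
    # 1) edge update: e'_k = phi_e(e_k, v_rk, v_sk, u)
    E_new = [(phi_e(e, V[r], V[s], u), r, s) for (e, r, s) in E]

    # 2) per-node aggregation of adjacent edges, then node update
    V_new = []
    for i, v in enumerate(V):
        incident = [e for (e, r, _s) in E_new if r == i]
        eps_i = np.sum(incident, axis=0) if incident else np.zeros_like(E_new[0][0])
        V_new.append(phi_v(v, eps_i, u))                     # v'_i = phi_v(v_i, eps'_i, u)

    # 3) global aggregation over all edges and all nodes, then global update
    eps_bar = np.sum([e for (e, _r, _s) in E_new], axis=0)   # rho_{e->u}
    nu_bar = np.sum(V_new, axis=0)                           # rho_{v->u}
    u_new = phi_u(u, nu_bar, eps_bar)                        # u' = phi_u(u, nu', eps')

    return u_new, V_new, E_new
```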
The routing agent represents the network environment state s as the input graph Ginp = (u, V, E) of the input graph network block GNinp, where u is the global attribute of the graph and represents the sum of the available bandwidths of all links in the network;
V = {v_i} (i = 1, ..., N_v) is the set of network nodes, N_v is the number of network nodes, and v_i is the attribute of the i-th node, a 2-dimensional vector whose 1st element is the network-entry (ingress) bandwidth and whose 2nd element is the network-exit (egress) bandwidth; the node attributes are used to represent the source node, destination node and bandwidth of the service, as follows: the ingress bandwidth of the service source node and the egress bandwidth of the service destination node are set to the service bandwidth, and all other node attributes are 0;
E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges, N_e is the number of edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k is the attribute of the k-th edge, representing the available bandwidth of link k.
See fig. 3. In the example of fig. 3, the network environment state s contains a network topology consisting of 6 nodes and 8 edges, the available bandwidth of every edge is 5 Mbps, the service source node is node 1, the service destination node is node 6, and the service bandwidth is 2 Mbps. The routing agent represents this network environment state s as the input graph Ginp of the input graph network block GNinp: the global attribute u is 40, the sum of the available bandwidths of all links in the network; V = {v_i} (i = 1, ..., N_v) is the set of network nodes with N_v = 6 nodes, and v_i, the attribute of the i-th node, is a 2-dimensional vector whose 1st element is the ingress bandwidth and whose 2nd element is the egress bandwidth; the ingress bandwidth of service source node 1 and the egress bandwidth of service destination node 6 represent the service bandwidth and are both 2, and all other node attributes are 0; E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges with N_e = 8 edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k, the attribute of the k-th edge, represents the available bandwidth of link k and equals 5 for every edge.
The input graph Ginp is fed to the input graph network block GNinp; GNinp produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout.
See fig. 4. The input graph network block GNinp used by the PPO algorithm consists of an edge update function φe, a node update function φv and a global attribute update function φu. φe, φv, φu are three single-layer neural networks without activation functions, MLPe(e_k), MLPv(v_i) and MLPu(u), which transform the edge, node and global attributes e_k, v_i, u of the input graph Ginp respectively, so that the transformed attributes e'_k, v'_i, u' all have the same dimension d for subsequent processing by the core graph network block GNcore. The functions in the input graph network block GNinp are defined as:
e'_k = φe(e_k) = MLPe(e_k)
v'_i = φv(v_i) = MLPv(v_i)
u' = φu(u) = MLPu(u)
See fig. 5. The core graph network block GNcore used by the PPO algorithm consists of an edge update function φe, a node update function φv, a global attribute update function φu, an adjacent-edge aggregation function ρe→v, an edge aggregation function ρe→u and a node aggregation function ρv→u. The edge update function φe is a multilayer perceptron MLPe(e_k, v_rk, v_sk, u) whose input parameters are the edge attribute e_k, the attributes v_rk and v_sk of its two endpoints, and the global attribute u. The adjacent-edge aggregation function ρe→v is the element-wise summation function and aggregates the attributes E'_i of all edges adjacent to node v_i. The node update function φv is also a multilayer perceptron MLPv(v_i, ε'_i, u) whose input parameters are the node attribute v_i, the aggregated adjacent-edge attribute ε'_i and the global attribute u. The node aggregation function ρv→u is likewise the element-wise summation function and aggregates the attributes of all nodes V'; the edge aggregation function ρe→u is likewise the element-wise summation function and aggregates the attributes of all edges E' in the graph. The global attribute update function φu is a multilayer perceptron MLPu(u, ν', ε') whose input parameters are the global attribute u, the aggregated node attribute ν' and the aggregated edge attribute ε'. The functions in the core graph network block GNcore are defined as:
e'_k = φe(e_k, v_rk, v_sk, u) = MLPe(e_k, v_rk, v_sk, u)
v'_i = φv(v_i, ε'_i, u) = MLPv(v_i, ε'_i, u)
u' = φu(u, ν', ε') = MLPu(u, ν', ε')
ε'_i = ρe→v(E'_i) = Σ_{k ∈ E'_i} e'_k
ε' = ρe→u(E') = Σ_k e'_k
ν' = ρv→u(V') = Σ_i v'_i
See fig. 6. The output graph network block GNout used by the PPO algorithm consists of an edge update function φe and a global attribute update function φu, which transform the edge attributes e_k and the global attribute u of the graph GM respectively so as to fit the PPO algorithm framework. The edge update function φe is one single-layer neural network MLPe(e_k) without an activation function; its input layer has d neurons and its output layer has 2 neurons, which represent the mean and the logarithmic standard deviation of the weight of the link corresponding to the edge. The global update function φu is parameterized as one single-layer neural network MLPu(u) without an activation function; its input layer has d neurons and its output layer has only 1 neuron, which represents the value of the value function. The functions in the output graph network block GNout are defined as:
e'_k = φe(e_k) = MLPe(e_k)
u' = φu(u) = MLPu(u)
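The following sketch illustrates, assuming a PyTorch implementation, how the two outputs of GNout can parameterize the PPO policy and value function: the per-edge (mean, log-standard-deviation) pair defines a Gaussian distribution from which the link weights w are sampled, and the transformed global attribute gives the state value. The module and tensor names are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

class GNOutHead(nn.Module):
    """Output-block heads: per-edge Gaussian link-weight policy plus global value."""

    def __init__(self, d):
        super().__init__()
        self.edge_head = nn.Linear(d, 2)    # -> (weight mean, weight log-std) per edge
        self.global_head = nn.Linear(d, 1)  # -> state value

    def forward(self, edge_attrs, global_attr):
        # edge_attrs: [num_edges, d] attributes of G_M; global_attr: [d]
        mean, log_std = self.edge_head(edge_attrs).unbind(dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        w = dist.sample()                        # routing policy: one weight per link
        log_prob = dist.log_prob(w).sum()        # log pi(w | s), used by the PPO loss
        value = self.global_head(global_attr).squeeze(-1)   # V(s)
        return w, log_prob, value
```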
see fig. 7 and 8. The intelligent agent trains on a randomly generated network topology containing 15 nodes and 30 edges, in a network environment, services arrive at a network one by one, source and destination node pairs of the services are generated by a gravity model, and the bandwidth of the services is 1 Mbps. In a graph network, an input graph network block GNinpThe number d of neurons in the output layer of 3 MLPs is 16, and the core map block GNcoreUsing the ReLU activation function, the number of repetitions M is taken to be 3, GNcoreThe 3 update functions of (1) were all 3-layer MLPs, the number of neurons per layer was 16, and the ReLU activation function was used. And (2) the routing agent interacts with 4 network environments simultaneously during PPO algorithm training, executes 128 steps in each network environment to obtain 512 samples in total, repeatedly uses the samples to train for 4 times, randomly breaks up all the samples during each training, then divides the samples into 4 mini batches of 128 samples in each batch, optimizes a loss function by using a random gradient descent method, and repeats the sampling and training processes, and exits when 70 ten thousand steps are reached. Fig. 7 is a graph of the change of the curtailment response with time step when the routing agent trains, fig. 8 is a graph of the change of the loss function with time step, it can be seen from the graph that the training process runs for 70 ten thousand steps in total, and when the training is performed for 30 ten thousand steps (i.e. 23 minutes), the routing agent converges.
The foregoing is merely a preferred embodiment of the invention and is intended to be illustrative rather than limiting. It will be understood by those skilled in the art that many variations, modifications and equivalents may be made within the spirit and scope of the invention as defined in the claims, all of which fall within the scope of the invention.

Claims (10)

1. An intelligent routing method for a dynamic topological network, having the following technical characteristics:
according to the dynamic routing process of the dynamic topology network, a network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; an action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the agent action a through two processes, routing policy mapping and constrained route computation: in the routing policy mapping process, the network environment state s is taken as input and a routing policy w, defined as the weights of all links in the network topology, is produced as output, realizing the mapping from the network environment state s to the routing policy w; in the constrained route computation process, the network environment state s and the routing policy w are taken as input, the service path, i.e., the agent action a, is produced as output, and a constrained shortest-path algorithm computes, for the current service in network environment state s, a minimum-weight path a that satisfies the complex routing constraints, thereby enabling the routing agent to handle complex routing constraints;
the routing agent learns the mapping from the network environment state s to the routing policy w in the routing policy mapping process using the Proximal Policy Optimization (PPO) algorithm, and, by approximating the policy function and the value function of the PPO algorithm with a graph neural network, automatically extracts network topology features and automatically adapts to the dynamically changing topology.
2. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph neural network is formed by connecting in series an input graph network block GNinp, a core graph network block GNcore and an output graph network block GNout; the graph network block is the basic building unit of the graph network and takes a graph as input and output, realizing the transformation of the node, edge and global attributes of the input graph.
3. The intelligent routing method for dynamic topology networks of claim 2, wherein: the input graph Ginp is fed to the input graph network block GNinp, which produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout.
4. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph network block contains 3 update functions, namely an edge update function φe, a node update function φv and a global attribute update function φu, and 3 aggregation functions, namely an adjacent-edge attribute aggregation function ρe→v, an edge attribute aggregation function ρe→u and a node attribute aggregation function ρv→u; the 3 update functions φe, φv, φu respectively update the edge attributes, the node attributes and the global attribute, and the 3 aggregation functions ρe→v, ρe→u, ρv→u respectively aggregate the attributes of the edges adjacent to a node, the attributes of all edges in the graph and the attributes of all nodes in the graph.
5. The intelligent routing method for dynamic topology networks of claim 4, wherein: the computation of the graph network block proceeds as follows: first, the edge update function φe updates the attribute of every edge in the graph; then the adjacent-edge attribute aggregation function ρe→v aggregates the attributes of the edges adjacent to each node, after which the node update function φv updates every node in the graph; finally, the edge attribute aggregation function ρe→u and the node attribute aggregation function ρv→u aggregate all edge attributes and all node attributes in the graph respectively, after which the global attribute update function φu updates the global attribute.
6. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph network is formed by combining one or more graph network blocks, each graph network block being equivalent to a layer of a traditional neural network; multiple graph network blocks are combined either sequentially, corresponding to a traditional multilayer perceptron, or recursively, corresponding to a traditional recurrent neural network.
7. The intelligent routing method for dynamic topology networks of claim 1, wherein: the routing agent represents the network environment state s as the input graph Ginp = (u, V, E) of the input graph network block GNinp, where u is the global attribute of the graph and represents the sum of the available bandwidths of all links in the network;
V = {v_i} (i = 1, ..., N_v) is the set of network nodes, N_v is the number of network nodes, and v_i is the attribute of the i-th node, a 2-dimensional vector whose 1st element is the network-entry (ingress) bandwidth and whose 2nd element is the network-exit (egress) bandwidth; the node attributes are used to represent the source node, destination node and bandwidth of the service, as follows: the ingress bandwidth of the service source node and the egress bandwidth of the service destination node are set to the service bandwidth, and all other node attributes are 0;
E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges, N_e is the number of edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k is the attribute of the k-th edge, representing the available bandwidth of link k.
8. The intelligent routing method for dynamic topology networks of claim 1, wherein: the input graph network block GNinp used by the PPO algorithm consists of an edge update function φe, a node update function φv and a global attribute update function φu; φe, φv, φu are three single-layer neural networks without activation functions, MLPe(e_k), MLPv(v_i) and MLPu(u), which transform the edge, node and global attributes e_k, v_i, u of the input graph Ginp respectively, so that the transformed attributes e'_k, v'_i, u' all have the same dimension d for subsequent processing by the core graph network block GNcore.
9. The intelligent routing method for dynamic topology networks of claim 8, wherein: the core graph network block GNcore used by the PPO algorithm consists of an edge update function φe, a node update function φv, a global attribute update function φu, an adjacent-edge aggregation function ρe→v, an edge aggregation function ρe→u and a node aggregation function ρv→u; the edge update function φe is a multilayer perceptron MLPe(e_k, v_rk, v_sk, u) whose input parameters are the edge attribute e_k, the attributes v_rk and v_sk of its two endpoints, and the global attribute u; the adjacent-edge aggregation function ρe→v is the element-wise summation function and aggregates the attributes E'_i of all edges adjacent to node v_i; the node update function φv is also a multilayer perceptron MLPv(v_i, ε'_i, u) whose input parameters are the node attribute v_i, the aggregated adjacent-edge attribute ε'_i and the global attribute u; the node aggregation function ρv→u is likewise the element-wise summation function and aggregates the attributes of all nodes V'; the edge aggregation function ρe→u is likewise the element-wise summation function and aggregates the attributes of all edges E' in the input graph; the global attribute update function φu is a multilayer perceptron MLPu(u, ν', ε') whose input parameters are the global attribute u, the aggregated node attribute ν' and the aggregated edge attribute ε'.
10. The intelligent routing method for dynamic topology networks of claim 1, wherein: the output graph network block GNout used by the PPO algorithm consists of an edge update function φe and a global attribute update function φu, which transform the edge attributes e_k and the global attribute u of the graph GM respectively so as to fit the PPO algorithm framework; the edge update function φe is one single-layer neural network MLPe(e_k) without an activation function, whose input layer has d neurons and whose output layer has 2 neurons, representing the mean and the logarithmic standard deviation of the weight of the link corresponding to the edge; the global update function φu is parameterized as one single-layer neural network MLPu(u) without an activation function, whose input layer has d neurons and whose output layer has only 1 neuron, representing the value of the value function.
CN202111278176.XA 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network Pending CN114051272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111278176.XA CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111278176.XA CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Publications (1)

Publication Number Publication Date
CN114051272A true CN114051272A (en) 2022-02-15

Family

ID=80206439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111278176.XA Pending CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Country Status (1)

Country Link
CN (1) CN114051272A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114980185A (en) * 2022-05-12 2022-08-30 Chongqing University of Posts and Telecommunications Vehicle-mounted self-organizing network routing method based on topological evolution
WO2024057391A1 (en) * 2022-09-13 2024-03-21 日本電信電話株式会社 Learning device, inference device, learning method, inference method, learning program, and inference program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020134507A1 (en) * 2018-12-28 2020-07-02 Beijing University of Posts and Telecommunications Routing construction method for unmanned aerial vehicle network, unmanned aerial vehicle, and storage medium
CN111917642A (en) * 2020-07-14 2020-11-10 University of Electronic Science and Technology of China SDN intelligent routing data transmission method for distributed deep reinforcement learning
CN112887156A (en) * 2021-02-23 2021-06-01 Chongqing University of Posts and Telecommunications Dynamic virtual network function arrangement method based on deep reinforcement learning
CN113285872A (en) * 2021-03-09 2021-08-20 Tsinghua University Time-sensitive network communication flow scheduling method based on deep reinforcement learning
CN113194034A (en) * 2021-04-22 2021-07-30 Huazhong University of Science and Technology Route optimization method and system based on graph neural network and deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
伍元胜: "Deep reinforcement learning routing technology for dynamic topology networks" *

Similar Documents

Publication Title
CN112202672B (en) Network route forwarding method and system based on service quality requirement
CN110365514B (en) SDN multistage virtual network mapping method and device based on reinforcement learning
CN111770019B (en) Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm
CN114051272A (en) Intelligent routing method for dynamic topological network
CN114697229B (en) Construction method and application of distributed routing planning model
CN112437020A (en) Data center network load balancing method based on deep reinforcement learning
CN114499648B (en) Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation
CN113395207A (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
CN115396366B (en) Distributed intelligent routing method based on graph attention network
Du et al. GAQ-EBkSP: a DRL-based urban traffic dynamic rerouting framework using fog-cloud architecture
CN112311608A (en) Multilayer heterogeneous network space node characterization method
CN111246320B (en) Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network
CN116527565A (en) Internet route optimization method and device based on graph convolution neural network
CN115842768A (en) SDN route optimization method based on time-space feature fusion of graph neural network
CN115225561A (en) Route optimization method and system based on graph structure characteristics
CN114710439A (en) Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning
Bhavanasi et al. Dealing with changes: Resilient routing via graph neural networks and multi-agent deep reinforcement learning
Abou El Houda et al. Cost-efficient federated reinforcement learning-based network routing for wireless networks
Sooda et al. A comparative analysis for determining the optimal path using PSO and GA
CN116938810A (en) Deep reinforcement learning SDN intelligent route optimization method based on graph neural network
CN112333102B (en) Software defined network routing method and system based on knowledge graph
CN111865793B (en) IPv6 network service customized reliable routing system and method based on function learning
CN110781352B (en) Method for optimizing topological structure to realize network structure controllability at lowest cost
CN113177636A (en) Network dynamic routing method and system based on multiple constraint conditions
Lent Dynamic Routing in Challenged Networks with Graph Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20220215