CN114051272A - Intelligent routing method for dynamic topological network - Google Patents

Intelligent routing method for dynamic topological network

Info

Publication number
CN114051272A
CN114051272A CN202111278176.XA
Authority
CN
China
Prior art keywords
network
routing
graph
attribute
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111278176.XA
Other languages
Chinese (zh)
Inventor
伍元胜
杜俊逸
倪大冬
肖磊
杨佩彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Electronic Technology Institute No 10 Institute of Cetc
Original Assignee
Southwest Electronic Technology Institute No 10 Institute of Cetc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Electronic Technology Institute No 10 Institute of Cetc filed Critical Southwest Electronic Technology Institute No 10 Institute of Cetc
Priority to CN202111278176.XA
Publication of CN114051272A
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 84/00 Network topologies
    • H04W 84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The intelligent routing method for a dynamic topological network disclosed by the invention has good adaptability, supports complex routing constraints and achieves high throughput. The invention is realized by the following technical scheme: according to the dynamic routing process of the dynamic topological network, the dynamic routing problem of the dynamic topological network is modeled as a reinforcement learning routing problem in which the network topology is part of the environment state; a graph neural network approximates the policy function and the value function of the PPO reinforcement learning algorithm, the policy function outputs the weights of all links, and the network environment computes a constrained minimum-weight path a based on these link weights, so that the routing agent can handle complex routing constraints; through interactive learning with the network environment, the routing agent learns an optimal routing policy that maximizes network throughput and generalizes over different network topologies; and the routing agent can automatically extract network topology features and automatically adapt to the dynamically changing topology.

Description

Intelligent routing method for dynamic topological network
Technical Field
The invention belongs to the field of communication network routing and relates to an intelligent routing method for dynamic topological networks, such as wireless ad hoc networks whose topology changes dynamically.
Background
Networks of many different forms exist in modern human society and in the virtual network society, such as communication networks based on telephone and mail, social networks based on media, urban transportation networks, scientific collaboration networks, financial transaction networks, information communication networks, early-warning detection networks, navigation and positioning networks, integrated information networks and the like. Whether for daily life, work, study or cultural activities, networks are ubiquitous, and human dependence on them is evident everywhere. Routing is the process of selecting a transmission path for a data packet in a network and is one of the core functions of a communication network.
Routing is generally divided into two categories: static routing and dynamic routing. Static routes are configured manually and cannot change in response to network changes without manual intervention, so static routing is not suitable for large or variable networks. Dynamic routing means that routers automatically build routing tables from routing information exchanged among themselves and adjust them in time as links and nodes change; when a node or link fails and other routes are available, dynamic routing automatically selects the best available route and continues forwarding packets. It is therefore suitable for dynamic topology networks (such as vehicular networks and unmanned aerial vehicle networks) whose topology changes dynamically due to node movement and link state changes. However, conventional dynamic routing techniques are usually based on a fixed routing policy and do not adapt well to dynamic changes of the network. For example, the DSDV routing protocol computes shortest paths using hop count as the link weight and cannot adapt to the bottleneck-link changes caused by topology changes, which leads to network congestion.
In recent years, with the growth of computing power, artificial intelligence technology has developed rapidly. Deep learning, one of its most representative techniques, has been applied successfully in the field of image processing. Deep learning can automatically extract features, extracting low-dimensional features from high-dimensional data and thereby alleviating the curse of dimensionality. Modern deep learning approaches generally follow an "end-to-end" design philosophy, minimizing prior representational and computational assumptions and avoiding explicit structure and hand-crafted features. Traditional deep learning based on conventional neural networks is mainly used to process regular grid-like data such as images and has difficulty extracting features from structured data. The graph neural network, a popular technique in deep learning in recent years, is mainly used to learn from structured data and can automatically extract graph data features.
Reinforcement learning is another representative artificial intelligence technique and is mainly used to solve intelligent decision problems. Reinforcement learning is an iterative learning process: in each iteration, an agent explores the environment state space and its own action space under the guidance of a reward function. The environment state space is denoted by a state set S and the agent's action space by an action set A. The interaction between the agent and the environment proceeds as follows: given a state s of the environment, the agent performs an action a, the environment transitions from state s to a new state, and the agent receives a reward r from the environment. The goal of reinforcement learning is to learn an optimal policy that maximizes the long-term cumulative return. Reinforcement learning algorithms can be broadly divided into three types: value-function methods, policy-search methods and hybrid AC (Actor-Critic) algorithms. Value-function methods are mainly used for reinforcement learning problems with discrete action spaces; for continuous action spaces, policy search or an AC algorithm is usually adopted. The AC algorithm combines the value-function method and the policy-search method: the actor and the critic correspond to a policy function and a value function respectively, the policy function maps the environment state to the agent's action, the value function evaluates the agent's current action, and the policy function learns from the feedback of the value function. The essence of reinforcement learning is the continual interaction between the agent and the environment, which is usually modeled as a Markov decision process, in which the agent repeatedly observes the state, selects actions, obtains rewards from environmental feedback, and learns an action policy that maximizes the cumulative reward through trial and error. As the state space and the action space grow, traditional reinforcement learning suffers a severe curse of dimensionality and cannot handle problems with large state and action spaces.
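As an illustration of the interaction loop described above, the following minimal Python sketch runs a generic agent-environment episode; the env and agent objects and their methods are hypothetical placeholders rather than part of the invention.

```python
# Minimal sketch of the reinforcement-learning interaction loop described above.
# `env` and `agent` are hypothetical placeholders; any concrete environment/agent
# exposing these methods would fit the same loop.

def run_episode(env, agent, max_steps=1000):
    s = env.reset()                      # initial environment state
    cumulative_reward = 0.0
    for _ in range(max_steps):
        a = agent.act(s)                 # policy: map state s to action a
        s_next, r, done = env.step(a)    # environment transitions and returns reward r
        agent.observe(s, a, r, s_next)   # store experience for learning
        cumulative_reward += r
        s = s_next
        if done:
            break
    return cumulative_reward             # the quantity the agent tries to maximize
```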
Deep reinforcement learning combines reinforcement learning with deep learning; it inherits the ability of deep learning to overcome the curse of dimensionality and can solve decision problems with high-dimensional state and action spaces that traditional reinforcement learning cannot handle. The key problem faced by deep reinforcement learning is the instability of the algorithm after deep neural networks are introduced. The TRPO (Trust Region Policy Optimization) algorithm uses a trust-region approach to prevent policy updates that move too far from the previous policy, allowing monotonic improvement of policy performance and preventing catastrophically bad policy updates. The PPO (Proximal Policy Optimization) algorithm belongs to the AC family and is an improvement on TRPO: it limits the policy update by means of a clipped surrogate objective function, achieving an effect similar to the policy-update constraint that TRPO enforces with a complex conjugate-gradient algorithm, while being much simpler than TRPO and more broadly applicable.
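For reference, a minimal sketch of the clipped surrogate objective that PPO uses to limit policy updates, as described above; the use of PyTorch and the tensor names are assumptions made for illustration.

```python
import torch

def ppo_clipped_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective of PPO (returned negated, as a loss to minimize).

    log_prob_new: log pi_theta(a|s) under the current policy
    log_prob_old: log pi_theta_old(a|s) under the policy that collected the data
    advantage:    advantage estimate of the sampled action
    """
    ratio = torch.exp(log_prob_new - log_prob_old)          # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()            # negative for gradient descent
```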
In recent years, deep reinforcement learning has been used to solve network routing problems. However, most existing deep-reinforcement-learning routing methods use traditional neural networks (such as multilayer perceptrons, convolutional neural networks and long short-term memory networks), which are not suited to learning graph-structured information and cannot extract features of the network topology graph; as a result, the algorithm must be modified and retrained for each network topology and cannot adapt to dynamic topology changes. Deep reinforcement learning based on a graph neural network has been used to solve the routing problem of optical transport networks: the method routes over K candidate paths, uses a Message Passing Neural Network (MPNN) to approximate the Q-value function of the DQN reinforcement learning algorithm, and, after DQN training is complete, selects the candidate path with the maximum Q value among the K candidates as the service path. However, routing over K candidate paths requires computing the K candidate paths for every node pair in advance, and a topology change invalidates the precomputed candidates, so the method is not suitable for dynamic topology networks. Moreover, computing K candidate paths under multiple constraints is in general NP-hard; even with heuristic algorithms the time complexity is very high, and the number of candidate paths (i.e., the value of K) cannot be made large, which severely limits the routing solution space.
The dynamic routing process in a dynamic topology network can be expressed as follows: services arrive at the network one by one; a route computation entity of the network must compute a path for each service; if the route computation succeeds, the network accepts the service and allocates bandwidth resources to it, and if it fails, the network rejects the service; this service routing process is repeated until m consecutive services have been rejected, which indicates that the network resources are exhausted, and the network then stops the dynamic routing process. According to this process, the dynamic routing problem in a dynamic topology network can be stated as: given the current network topology, the available bandwidth of each link, and the source node, destination node, bandwidth and routing constraints of the current service, how should the routing entity of the network compute an appropriate path for the current service so that, when the dynamic routing process stops, the total throughput of the network (i.e., the total bandwidth of successfully routed services) is maximized. In a dynamic topology network, the topology may change because of node movement or node or link failure, and the routing entity must be able to adapt to such topology changes, i.e., to keep operating normally after the topology changes.
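The dynamic routing process described above can be summarized by the following sketch; the helper methods next_service, compute_path and allocate are hypothetical placeholders used only to make the control flow explicit.

```python
# Sketch of the dynamic routing process described above, under the assumption of
# hypothetical helpers: next_service() yields (src, dst, bandwidth, constraints),
# compute_path() returns a feasible path or None, and allocate() reserves bandwidth.

def dynamic_routing(network, m):
    total_throughput = 0.0        # total bandwidth of successfully routed services
    consecutive_rejections = 0
    while consecutive_rejections < m:
        src, dst, bw, constraints = network.next_service()
        path = network.compute_path(src, dst, bw, constraints)
        if path is not None:
            network.allocate(path, bw)       # accept the service
            total_throughput += bw
            consecutive_rejections = 0
        else:
            consecutive_rejections += 1      # reject the service
    return total_throughput       # the quantity to be maximized
```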
Disclosure of Invention
Aiming at the defect that existing deep-reinforcement-learning routing methods cannot be used for dynamic topological networks, the invention provides a deep-reinforcement-learning routing method for dynamic topological networks that has good adaptability, supports complex routing constraints and achieves high throughput, and that can realize dynamic route computation for mobile wireless ad hoc networks whose topology changes dynamically.
The invention achieves the above purpose with an intelligent routing method for dynamic topological networks that has the following technical characteristics:
according to the dynamic routing process of the dynamic topology network, a network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; an action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the agent action a through two processes, routing policy mapping and constrained route computation: in the routing policy mapping process, the network environment state s is taken as input and a routing policy w, defined as the weights of all links in the network topology, is produced as output, realizing the mapping from the network environment state s to the routing policy w; in the constrained route computation process, the network environment state s and the routing policy w are taken as input, the service path, i.e., the agent action a, is produced as output, and a constrained shortest-path algorithm computes, for the current service in network environment state s, a minimum-weight path a that satisfies the complex routing constraints, thereby enabling the routing agent to handle complex routing constraints;
the routing agent learns the mapping from the network environment state s to the routing policy w in the routing policy mapping process using the Proximal Policy Optimization (PPO) algorithm, and, by approximating the policy function and the value function of the PPO algorithm with a graph neural network, automatically extracts network topology features and automatically adapts to the dynamically changing topology.
Compared with the prior art, the invention has the following beneficial effects:
the adaptability is good. According to the dynamic routing process of the dynamic topological network, the dynamic routing problem of the dynamic topological network is modeled into a deep reinforcement learning routing problem of the mapping from a network environment state s, which can maximize the network throughput, of a routing agent to a routing agent action a through interactive learning with a network environment by the routing agent; the routing agent realizes the mapping from the network environment state s to the intelligent action a and comprises two processes of routing strategy mapping and constraint route calculation; the routing agent learns the mapping from the network environment state s to the routing strategy w in the routing strategy mapping process by using a PPO algorithm, and explicitly takes the network topology as the environment state of deep reinforcement learning by approximating a strategy function and a value function of the PPO algorithm through a neural network, so that the routing agent can automatically extract the network topology characteristics by using the neural network, realize the generalization of the routing strategy to different topologies and adapt to the dynamic change of the network topology.
Complex routing constraints are supported. The mapping from the network environment state s to the agent action a comprises the two processes of routing policy mapping and constrained route computation: in the routing policy mapping process the agent learns the mapping from the network environment state s to the routing policy w using the PPO algorithm whose value function and policy function are approximated by a graph neural network, and in the constrained route computation process a constrained shortest-path algorithm computes, for the current service and according to the routing policy w, a minimum-weight path a that satisfies the complex routing constraints. These two processes organically combine deep learning with the traditional constrained shortest-path algorithm: once the routing policy is determined, the constrained minimum-weight path is also uniquely determined, so the control of deep reinforcement learning over routing is not weakened, complex routing constraints are handled well, and support for complex routing constraints by the routing agent is realized.
High throughput. The invention models the dynamic routing problem of a dynamic topological network as a deep reinforcement learning routing problem, defines the network environment state s as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service, defines the action a of the routing agent as the service path, and defines the reward fed back by the environment as the bandwidth of a successfully routed service, so that the cumulative reward equals the current throughput of the network; maximizing the cumulative reward therefore directly maximizes network throughput.
High efficiency. The invention generates a routing policy w from the network environment state s and then computes in real time, on that basis, the minimum-weight path a satisfying the complex routing constraints. This avoids the problem that precomputed K candidate paths become invalid after a topology change, and it also avoids the NP-hard computation of K multi-constrained paths and the loss of routing quality caused by restricting the routing solution space to K candidate paths.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the dynamic topology network intelligent routing of the present invention;
FIG. 2 is a schematic diagram of the graph network structure used for the dynamic topology network intelligent routing of the present invention;
FIG. 3 is an example of representing the network environment state as the input graph Ginp;
FIG. 4 is a schematic diagram of the structure of the input graph network block GNinp in FIG. 2;
FIG. 5 is a schematic diagram of the structure of the core graph network block GNcore in FIG. 2;
FIG. 6 is a schematic diagram of the structure of the output graph network block GNout in FIG. 2;
FIG. 7 is a graph of the episode reward versus training time step during routing agent training;
FIG. 8 is a graph of the loss function versus training time step.
In order to make the technical problems to be solved, the technical solutions and the main points of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Detailed Description
See fig. 1. According to the invention, following the dynamic routing process of the dynamic topology network, the network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; the action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the intelligent action a, and comprises two processes of routing strategy mapping and constraint route calculation: in the routing strategy mapping process, a network environment state s is used as input, a routing strategy w defined as the weight of all links in a network topology is used as output, and the mapping from the network environment state s to the routing strategy w is realized; in the constraint route calculation process, a network environment state s and a routing strategy w are used as input, a service path, namely the action a of the intelligent agent is used as output, a constraint shortest-path algorithm is used as the current service in the network environment state s, and a minimum weight path a meeting complex route constraint is calculated, so that the complex route constraint is processed by the routing intelligent agent; the routing agent learns the mapping relation from the network environment state s to the routing strategy w in the routing strategy mapping process by using a near-end strategy Optimization (PPO) algorithm, and realizes automatic extraction of network topology characteristics and automatic adaptation to dynamically-changed topology by using a strategy function and a value function of a neural network approximate PPO algorithm.
See fig. 2. The graph neural network used by the routing agent to approximate the policy function and the value function of the PPO algorithm is implemented in this embodiment as a graph network formed by connecting in series an input graph network block GNinp, a core graph network block GNcore and an output graph network block GNout. The input graph Ginp is fed to the input graph network block GNinp; GNinp produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout. The graph network is a general graph model obtained by DeepMind as a further generalization summarizing a large body of graph neural networks. The basic building unit of a graph network is the graph network block, which takes a graph as input and produces a graph as output, transforming the node, edge and global attributes of the input graph. A graph network block contains 3 update functions and 3 aggregation functions: the 3 update functions φe, φv, φu update the edge attributes, the node attributes and the global attribute respectively, and the 3 aggregation functions ρe→v, ρe→u, ρv→u aggregate, respectively, the attributes of the edges adjacent to a node, the attributes of all edges in the graph, and the attributes of all nodes in the graph. An aggregation function must be permutation invariant, i.e., the order of the aggregated edges or nodes must not affect the aggregation result; common aggregation functions include element-wise summation, averaging and maximization. The computation of a graph network block proceeds as follows: first, the edge update function φe updates the attribute of every edge in the graph; then, the adjacent-edge aggregation function ρe→v aggregates the attributes of the edges adjacent to each node, after which the node update function φv updates every node in the graph; finally, the edge-attribute aggregation function ρe→u and the node-attribute aggregation function ρv→u aggregate all edge attributes and all node attributes in the graph respectively, after which the global-attribute update function φu updates the global attribute. A graph network is composed of one or more graph network blocks; each block corresponds to a layer of a traditional neural network, and multiple blocks can be composed sequentially (analogous to a traditional multilayer perceptron) or recursively (analogous to a traditional recurrent neural network). The graph network is highly flexible: the update functions in a graph network block can be arbitrary functions, including traditional neural networks, with configurable parameters, and the aggregation functions can be any permutation-invariant functions and are likewise configurable; the configurations of the graph network blocks in a graph network may be shared or may differ. This flexibility gives the graph network strong representation power, and many types of graph neural networks can be expressed with it, such as MPNN, NLNN, relation networks, deep sets and belief-propagation embeddings.
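The computation order of a graph network block described above can be made concrete with the following minimal sketch; the plain Python functions phi_e, phi_v, phi_u stand in for the update networks of the embodiment, and element-wise summation is assumed as the aggregation function.

```python
# Minimal sketch of one graph network block, assuming element-wise summation as
# the aggregation functions. Graph = (u, V, E) with E = [(e_k, r_k, s_k)], where
# r_k / s_k are the destination / source node indices of edge k.

import numpy as np

def graph_network_block(u, V, E, phi_e, phi_v, phi_u):
    # 1) edge update: e'_k = phi_e(e_k, v_rk, v_sk, u)
    E_new = [(phi_e(e, V[r], V[s], u), r, s) for (e, r, s) in E]

    # 2) per-node aggregation of adjacent edges, then node update
    V_new = []
    for i, v in enumerate(V):
        incident = [e for (e, r, _s) in E_new if r == i]
        eps_i = np.sum(incident, axis=0) if incident else np.zeros_like(E_new[0][0])
        V_new.append(phi_v(v, eps_i, u))                     # v'_i = phi_v(v_i, eps'_i, u)

    # 3) global aggregation over all edges and all nodes, then global update
    eps_bar = np.sum([e for (e, _r, _s) in E_new], axis=0)   # rho_{e->u}
    nu_bar = np.sum(V_new, axis=0)                           # rho_{v->u}
    u_new = phi_u(u, nu_bar, eps_bar)                        # u' = phi_u(u, nu', eps')

    return u_new, V_new, E_new
```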
The routing agent represents the network environment state s as the input graph Ginp = (u, V, E) of the input graph network block GNinp, where u is the global attribute of the graph and represents the sum of the available bandwidths of all links in the network;
V = {v_i} (i = 1, ..., N_v) is the set of network nodes, N_v is the number of network nodes, and v_i is the attribute of the i-th node, a 2-dimensional vector whose 1st element is the network-entry (ingress) bandwidth and whose 2nd element is the network-exit (egress) bandwidth; the node attributes are used to represent the source node, destination node and bandwidth of the service, as follows: the ingress bandwidth of the service source node and the egress bandwidth of the service destination node are set to the service bandwidth, and all other node attributes are 0;
E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges, N_e is the number of edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k is the attribute of the k-th edge, representing the available bandwidth of link k.
See fig. 3. In the example of fig. 3, the network environment state s contains a network topology consisting of 6 nodes and 8 edges, the available bandwidth of every edge is 5 Mbps, the service source node is node 1, the service destination node is node 6, and the service bandwidth is 2 Mbps. The routing agent represents this network environment state s as the input graph Ginp of the input graph network block GNinp: the global attribute u is 40, the sum of the available bandwidths of all links in the network; V = {v_i} (i = 1, ..., N_v) is the set of network nodes with N_v = 6 nodes, and v_i, the attribute of the i-th node, is a 2-dimensional vector whose 1st element is the ingress bandwidth and whose 2nd element is the egress bandwidth; the ingress bandwidth of service source node 1 and the egress bandwidth of service destination node 6 represent the service bandwidth and are both 2, and all other node attributes are 0; E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges with N_e = 8 edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k, the attribute of the k-th edge, represents the available bandwidth of link k and equals 5 for every edge.
The input graph Ginp is fed to the input graph network block GNinp; GNinp produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout.
See fig. 4. The input graph network block GNinp used by the PPO algorithm consists of an edge update function φe, a node update function φv and a global attribute update function φu. φe, φv, φu are three single-layer neural networks without activation functions, MLPe(e_k), MLPv(v_i) and MLPu(u), which transform the edge, node and global attributes e_k, v_i, u of the input graph Ginp respectively, so that the transformed attributes e'_k, v'_i, u' all have the same dimension d for subsequent processing by the core graph network block GNcore. The functions in the input graph network block GNinp are defined as:
e'_k = φe(e_k) = MLPe(e_k)
v'_i = φv(v_i) = MLPv(v_i)
u' = φu(u) = MLPu(u)
See fig. 5. The core graph network block GNcore used by the PPO algorithm consists of an edge update function φe, a node update function φv, a global attribute update function φu, an adjacent-edge aggregation function ρe→v, an edge aggregation function ρe→u and a node aggregation function ρv→u. The edge update function φe is a multilayer perceptron MLPe(e_k, v_rk, v_sk, u) whose input parameters are the edge attribute e_k, the attributes v_rk and v_sk of its two endpoints, and the global attribute u. The adjacent-edge aggregation function ρe→v is the element-wise summation function and aggregates the attributes E'_i of all edges adjacent to node v_i. The node update function φv is also a multilayer perceptron MLPv(v_i, ε'_i, u) whose input parameters are the node attribute v_i, the aggregated adjacent-edge attribute ε'_i and the global attribute u. The node aggregation function ρv→u is likewise the element-wise summation function and aggregates the attributes of all nodes V'; the edge aggregation function ρe→u is likewise the element-wise summation function and aggregates the attributes of all edges E' in the graph. The global attribute update function φu is a multilayer perceptron MLPu(u, ν', ε') whose input parameters are the global attribute u, the aggregated node attribute ν' and the aggregated edge attribute ε'. The functions in the core graph network block GNcore are defined as:
e'_k = φe(e_k, v_rk, v_sk, u) = MLPe(e_k, v_rk, v_sk, u)
v'_i = φv(v_i, ε'_i, u) = MLPv(v_i, ε'_i, u)
u' = φu(u, ν', ε') = MLPu(u, ν', ε')
ε'_i = ρe→v(E'_i) = Σ_{k ∈ E'_i} e'_k
ε' = ρe→u(E') = Σ_k e'_k
ν' = ρv→u(V') = Σ_i v'_i
See fig. 6. The output graph network block GNout used by the PPO algorithm consists of an edge update function φe and a global attribute update function φu, which transform the edge attributes e_k and the global attribute u of the graph GM respectively so as to fit the PPO algorithm framework. The edge update function φe is one single-layer neural network MLPe(e_k) without an activation function; its input layer has d neurons and its output layer has 2 neurons, which represent the mean and the logarithmic standard deviation of the weight of the link corresponding to the edge. The global update function φu is parameterized as one single-layer neural network MLPu(u) without an activation function; its input layer has d neurons and its output layer has only 1 neuron, which represents the value of the value function. The functions in the output graph network block GNout are defined as:
e'_k = φe(e_k) = MLPe(e_k)
u' = φu(u) = MLPu(u)
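The following sketch illustrates, assuming a PyTorch implementation, how the two outputs of GNout can parameterize the PPO policy and value function: the per-edge (mean, log-standard-deviation) pair defines a Gaussian distribution from which the link weights w are sampled, and the transformed global attribute gives the state value. The module and tensor names are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

class GNOutHead(nn.Module):
    """Output-block heads: per-edge Gaussian link-weight policy plus global value."""

    def __init__(self, d):
        super().__init__()
        self.edge_head = nn.Linear(d, 2)    # -> (weight mean, weight log-std) per edge
        self.global_head = nn.Linear(d, 1)  # -> state value

    def forward(self, edge_attrs, global_attr):
        # edge_attrs: [num_edges, d] attributes of G_M; global_attr: [d]
        mean, log_std = self.edge_head(edge_attrs).unbind(dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        w = dist.sample()                        # routing policy: one weight per link
        log_prob = dist.log_prob(w).sum()        # log pi(w | s), used by the PPO loss
        value = self.global_head(global_attr).squeeze(-1)   # V(s)
        return w, log_prob, value
```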
see fig. 7 and 8. The intelligent agent trains on a randomly generated network topology containing 15 nodes and 30 edges, in a network environment, services arrive at a network one by one, source and destination node pairs of the services are generated by a gravity model, and the bandwidth of the services is 1 Mbps. In a graph network, an input graph network block GNinpThe number d of neurons in the output layer of 3 MLPs is 16, and the core map block GNcoreUsing the ReLU activation function, the number of repetitions M is taken to be 3, GNcoreThe 3 update functions of (1) were all 3-layer MLPs, the number of neurons per layer was 16, and the ReLU activation function was used. And (2) the routing agent interacts with 4 network environments simultaneously during PPO algorithm training, executes 128 steps in each network environment to obtain 512 samples in total, repeatedly uses the samples to train for 4 times, randomly breaks up all the samples during each training, then divides the samples into 4 mini batches of 128 samples in each batch, optimizes a loss function by using a random gradient descent method, and repeats the sampling and training processes, and exits when 70 ten thousand steps are reached. Fig. 7 is a graph of the change of the curtailment response with time step when the routing agent trains, fig. 8 is a graph of the change of the loss function with time step, it can be seen from the graph that the training process runs for 70 ten thousand steps in total, and when the training is performed for 30 ten thousand steps (i.e. 23 minutes), the routing agent converges.
The foregoing is merely a preferred embodiment of the invention and is intended to be illustrative rather than limiting. It will be understood by those skilled in the art that many variations, modifications and equivalents may be made within the spirit and scope of the invention as defined in the claims, all of which fall within the scope of the invention.

Claims (10)

1. An intelligent routing method for a dynamic topological network, having the following technical characteristics:
according to the dynamic routing process of the dynamic topology network, a network environment state s is defined as the current network topology, the available bandwidth of each link, and the source node, destination node and bandwidth of the current service; an action a of the routing agent is defined as the service path; the reward fed back by the environment is defined as the bandwidth of a successfully routed service; and the dynamic routing problem of the dynamic topology network is modeled as a deep reinforcement learning routing problem in which the routing agent, through interactive learning with the network environment, learns the mapping from the network environment state s to the routing agent action a that maximizes network throughput;
the routing agent realizes the mapping from the network environment state s to the agent action a through two processes, routing policy mapping and constrained route computation: in the routing policy mapping process, the network environment state s is taken as input and a routing policy w, defined as the weights of all links in the network topology, is produced as output, realizing the mapping from the network environment state s to the routing policy w; in the constrained route computation process, the network environment state s and the routing policy w are taken as input, the service path, i.e., the agent action a, is produced as output, and a constrained shortest-path algorithm computes, for the current service in network environment state s, a minimum-weight path a that satisfies the complex routing constraints, thereby enabling the routing agent to handle complex routing constraints;
the routing agent learns the mapping from the network environment state s to the routing policy w in the routing policy mapping process using the Proximal Policy Optimization (PPO) algorithm, and, by approximating the policy function and the value function of the PPO algorithm with a graph neural network, automatically extracts network topology features and automatically adapts to the dynamically changing topology.
2. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph neural network is formed by connecting in series an input graph network block GNinp, a core graph network block GNcore and an output graph network block GNout; the graph network block is the basic building unit of the graph network and takes a graph as input and output, realizing the transformation of the node, edge and global attributes of the input graph.
3. The intelligent routing method for dynamic topology networks of claim 2, wherein: the input graph Ginp is fed to the input graph network block GNinp, which produces the graph G0; G0 is fed to the core graph network block GNcore, which is applied M times to produce the graph GM; GM is fed to the output graph network block GNout, which produces the output graph Gout.
4. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph network block contains 3 update functions, namely an edge update function φe, a node update function φv and a global attribute update function φu, and 3 aggregation functions, namely an adjacent-edge attribute aggregation function ρe→v, an edge attribute aggregation function ρe→u and a node attribute aggregation function ρv→u; the 3 update functions φe, φv, φu respectively update the edge attributes, the node attributes and the global attribute, and the 3 aggregation functions ρe→v, ρe→u, ρv→u respectively aggregate the attributes of the edges adjacent to a node, the attributes of all edges in the graph and the attributes of all nodes in the graph.
5. The intelligent routing method for dynamic topology networks of claim 4, wherein: the computation of the graph network block proceeds as follows: first, the edge update function φe updates the attribute of every edge in the graph; then the adjacent-edge attribute aggregation function ρe→v aggregates the attributes of the edges adjacent to each node, after which the node update function φv updates every node in the graph; finally, the edge attribute aggregation function ρe→u and the node attribute aggregation function ρv→u aggregate all edge attributes and all node attributes in the graph respectively, after which the global attribute update function φu updates the global attribute.
6. The intelligent routing method for dynamic topology networks of claim 1, wherein: the graph network is formed by combining one or more graph network blocks, each graph network block being equivalent to a layer of a traditional neural network; multiple graph network blocks are combined either sequentially, corresponding to a traditional multilayer perceptron, or recursively, corresponding to a traditional recurrent neural network.
7. The intelligent routing method for dynamic topology networks of claim 1, wherein: the routing agent represents the network environment state s as the input graph Ginp = (u, V, E) of the input graph network block GNinp, where u is the global attribute of the graph and represents the sum of the available bandwidths of all links in the network;
V = {v_i} (i = 1, ..., N_v) is the set of network nodes, N_v is the number of network nodes, and v_i is the attribute of the i-th node, a 2-dimensional vector whose 1st element is the network-entry (ingress) bandwidth and whose 2nd element is the network-exit (egress) bandwidth; the node attributes are used to represent the source node, destination node and bandwidth of the service, as follows: the ingress bandwidth of the service source node and the egress bandwidth of the service destination node are set to the service bandwidth, and all other node attributes are 0;
E = {(e_k, r_k, s_k)} (k = 1, ..., N_e) is the set of network edges, N_e is the number of edges, r_k is the destination node index of link k, s_k is the source node index of link k, and e_k is the attribute of the k-th edge, representing the available bandwidth of link k.
8. The intelligent routing method for dynamic topology networks of claim 1, wherein: the input graph network block GNinp used by the PPO algorithm consists of an edge update function φe, a node update function φv and a global attribute update function φu; φe, φv, φu are three single-layer neural networks without activation functions, MLPe(e_k), MLPv(v_i) and MLPu(u), which transform the edge, node and global attributes e_k, v_i, u of the input graph Ginp respectively, so that the transformed attributes e'_k, v'_i, u' all have the same dimension d for subsequent processing by the core graph network block GNcore.
9. The intelligent routing method for dynamic topology networks of claim 8, wherein: the core graph network block GNcore used by the PPO algorithm consists of an edge update function φe, a node update function φv, a global attribute update function φu, an adjacent-edge aggregation function ρe→v, an edge aggregation function ρe→u and a node aggregation function ρv→u; the edge update function φe is a multilayer perceptron MLPe(e_k, v_rk, v_sk, u) whose input parameters are the edge attribute e_k, the attributes v_rk and v_sk of its two endpoints, and the global attribute u; the adjacent-edge aggregation function ρe→v is the element-wise summation function and aggregates the attributes E'_i of all edges adjacent to node v_i; the node update function φv is also a multilayer perceptron MLPv(v_i, ε'_i, u) whose input parameters are the node attribute v_i, the aggregated adjacent-edge attribute ε'_i and the global attribute u; the node aggregation function ρv→u is likewise the element-wise summation function and aggregates the attributes of all nodes V'; the edge aggregation function ρe→u is likewise the element-wise summation function and aggregates the attributes of all edges E' in the input graph; the global attribute update function φu is a multilayer perceptron MLPu(u, ν', ε') whose input parameters are the global attribute u, the aggregated node attribute ν' and the aggregated edge attribute ε'.
10. The intelligent routing method for dynamic topology networks of claim 1, wherein: the output graph network block GNout used by the PPO algorithm consists of an edge update function φe and a global attribute update function φu, which transform the edge attributes e_k and the global attribute u of the graph GM respectively so as to fit the PPO algorithm framework; the edge update function φe is one single-layer neural network MLPe(e_k) without an activation function, whose input layer has d neurons and whose output layer has 2 neurons, representing the mean and the logarithmic standard deviation of the weight of the link corresponding to the edge; the global update function φu is parameterized as one single-layer neural network MLPu(u) without an activation function, whose input layer has d neurons and whose output layer has only 1 neuron, representing the value of the value function.
CN202111278176.XA 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network Pending CN114051272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111278176.XA CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111278176.XA CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Publications (1)

Publication Number Publication Date
CN114051272A true CN114051272A (en) 2022-02-15

Family

ID=80206439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111278176.XA Pending CN114051272A (en) 2021-10-30 2021-10-30 Intelligent routing method for dynamic topological network

Country Status (1)

Country Link
CN (1) CN114051272A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114980185A (en) * 2022-05-12 2022-08-30 Chongqing University of Posts and Telecommunications Vehicle-mounted self-organizing network routing method based on topological evolution
WO2024057391A1 (en) * 2022-09-13 2024-03-21 日本電信電話株式会社 Learning device, inference device, learning method, inference method, learning program, and inference program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020134507A1 (en) * 2018-12-28 2020-07-02 Beijing University of Posts and Telecommunications Routing construction method for unmanned aerial vehicle network, unmanned aerial vehicle, and storage medium
CN111917642A (en) * 2020-07-14 2020-11-10 University of Electronic Science and Technology of China SDN intelligent routing data transmission method for distributed deep reinforcement learning
CN112887156A (en) * 2021-02-23 2021-06-01 Chongqing University of Posts and Telecommunications Dynamic virtual network function arrangement method based on deep reinforcement learning
CN113285872A (en) * 2021-03-09 2021-08-20 Tsinghua University Time-sensitive network communication flow scheduling method based on deep reinforcement learning
CN113194034A (en) * 2021-04-22 2021-07-30 Huazhong University of Science and Technology Route optimization method and system based on graph neural network and deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
伍元胜: "Deep reinforcement learning routing technology for dynamic topology networks" *

Similar Documents

Publication Title
CN112202672B (en) Network route forwarding method and system based on service quality requirement
CN110365514B (en) SDN multistage virtual network mapping method and device based on reinforcement learning
CN111770019B (en) Q-learning optical network-on-chip self-adaptive route planning method based on Dijkstra algorithm
CN114051272A (en) Intelligent routing method for dynamic topological network
CN114697229B (en) Construction method and application of distributed routing planning model
CN112437020A (en) Data center network load balancing method based on deep reinforcement learning
CN114499648B (en) Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation
CN113395207A (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
CN115396366B (en) Distributed intelligent routing method based on graph attention network
Du et al. GAQ-EBkSP: a DRL-based urban traffic dynamic rerouting framework using fog-cloud architecture
CN112311608A (en) Multilayer heterogeneous network space node characterization method
CN111246320B (en) Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network
CN116527565A (en) Internet route optimization method and device based on graph convolution neural network
CN115842768A (en) SDN route optimization method based on time-space feature fusion of graph neural network
CN115225561A (en) Route optimization method and system based on graph structure characteristics
CN114710439A (en) Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning
Bhavanasi et al. Dealing with changes: Resilient routing via graph neural networks and multi-agent deep reinforcement learning
Abou El Houda et al. Cost-efficient federated reinforcement learning-based network routing for wireless networks
Sooda et al. A comparative analysis for determining the optimal path using PSO and GA
CN116938810A (en) Deep reinforcement learning SDN intelligent route optimization method based on graph neural network
CN112333102B (en) Software defined network routing method and system based on knowledge graph
CN111865793B (en) IPv6 network service customized reliable routing system and method based on function learning
CN110781352B (en) Method for optimizing topological structure to realize network structure controllability at lowest cost
CN113177636A (en) Network dynamic routing method and system based on multiple constraint conditions
Lent Dynamic Routing in Challenged Networks with Graph Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20220215