CN116418730A - Distributed extensible intelligent routing method based on key nodes

Distributed extensible intelligent routing method based on key nodes

Info

Publication number
CN116418730A
Authority
CN
China
Prior art keywords
network
node
agent
service flow
nodes
Prior art date
Legal status
Pending
Application number
CN202310361356.7A
Other languages
Chinese (zh)
Inventor
肖哲
刘晓东
潘宁
刘丽哲
焦利彬
许萌签
甘瑞蒙
李金�
贾泽坤
Current Assignee
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 54 Research Institute
Priority to CN202310361356.7A
Publication of CN116418730A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/14: Routing performance; Theoretical aspects
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0894: Policy-based network configuration management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks


Abstract

The invention provides a distributed extensible intelligent routing method based on key nodes, belonging to the field of network resource scheduling optimization. The method computes the network average aggregation coefficient, the key node proportion, and the node criticality from the network topology scale, the network density, and the centrality of the nodes, and determines the deployment positions of the agents by combining the key node proportion with the node criticality. Each agent generates a routing strategy according to the service flow requirements and the network state; after the network executes the strategy, a utility function value is fed back, and the agent optimizes the strategy by adjusting its parameters according to this feedback. After training, the agents continuously adjust the routing strategy as the network state changes, continuously optimizing network efficiency. The invention realizes complex network control through a small number of agent nodes, effectively improves network throughput, and reduces average service delay and deployment cost without substantially changing the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.

Description

Distributed extensible intelligent routing method based on key nodes
Technical Field
The invention relates to the field of distributed intelligent route optimization, in particular to a distributed extensible intelligent routing method based on key nodes.
Background
With the development of Internet technology, the complexity and dynamics of communication networks keep increasing. Breakthroughs in materials science and manufacturing continue to push up the processing capacity of terminal equipment, while network equipment and transmission channels develop relatively slowly. A series of new services such as live video, e-sports, telemedicine, and financial data place higher demands on communication network performance, making high reliability, low delay, and large bandwidth the development targets of future network control technology.
Traditional routing methods generally plan paths with a shortest-path algorithm: a fixed network flow model is designed and the route is obtained by solving an objective function. Common examples include distance-vector routing algorithms such as RIP, IGRP, and EIGRP, and link-state routing algorithms such as OSPF and IS-IS. These are widely deployed and mature, but their best-effort forwarding cannot formulate differentiated routing strategies for network data flows with different characteristics, and thus cannot meet differentiated service requirements in a complex network. Finding an efficient, adaptive network service routing control scheme that guarantees network QoS, reduces unnecessary network resource overhead, improves network resource utilization, and provides efficient and stable service support for upper-layer applications under limited computing, storage, and communication resources is therefore a pressing problem for current communication networks.
To address the slow convergence of traditional routing methods in complex environments and their inability to adapt to dynamic network environments, new technologies and architectures such as machine learning and SDN have in recent years been applied to route planning, with progress in areas such as congestion control and load balancing. By the type of machine learning applied, these routing algorithms fall roughly into two classes: intelligent routing based on supervised learning and intelligent routing based on reinforcement learning (RL). Supervised-learning-based intelligent routing can compute an appropriate routing scheme fairly accurately from network state and topology information, and offers advantages over traditional schemes in convergence speed and signaling overhead. However, supervised learning faces a persistent problem: deep neural networks may use millions or even tens of millions of parameters and can only be treated as black boxes, which makes the intelligent algorithm difficult to debug; the deployed model is also large, making fine adjustment difficult.
Intelligent routing based on deep reinforcement learning (DRL) can usually obtain a near-optimal network configuration scheme in a single computation; it learns from actual network data through continuous interaction with the environment, requires no simplification of the environment, and can adapt to nonlinear complex systems by acting on real information. However, the convergence of a DRL model is strongly tied to its output dimension, so to avoid this problem most algorithms compute routes indirectly, for example computing link weights with a DRL algorithm and making routing decisions with other traditional algorithms, which falls short of truly intelligent route selection.
In recent years, intelligent routing research has focused on performance improvement in specific scenarios. In practical application scenarios, factors such as large network scale and changeable environments mean that the robustness and reliability of existing methods are insufficient, and the algorithms are far from meeting the needs of daily network management and control. How to design a simple, effective, low-cost routing scheme has therefore become a research difficulty and hotspot in intelligent route planning.
Disclosure of Invention
In view of the above, the invention provides a distributed scalable intelligent routing method based on key nodes. It constructs distributed agents with an RL algorithm and determines agent deployment positions from the network topology scale and node criticality, realizing complex network control through a small number of agent nodes. It effectively improves network throughput and reduces average service delay and deployment cost in the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a distributed scalable intelligent routing method based on key nodes comprises the following steps:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V = {v_1, v_2, ..., v_N} denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
Further, the specific mode of the step (1) is as follows:
(101) The degree centrality DC_i, betweenness centrality BC_i, and closeness centrality CC_i of each node in the network are calculated:

DC_i = \frac{k_i}{N-1}

BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}

CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein k_i denotes the number of edges connected to node v_i, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j;
(102) The centrality criticality ρ_i of all nodes is calculated by combining DC_i, BC_i, and CC_i according to the node centrality criticality formula:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]
further, the specific mode of the step (2) is as follows:
(201) The network aggregation coefficient is calculated:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient;
(202) The key node proportion m is determined from the network scale and the aggregation coefficient:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
(203) The nodes are sorted by centrality criticality in descending order, and the key nodes are determined according to the proportion m.
Further, the specific mode of the step (3) is as follows:
(301) All nodes diffuse their own topological connection relations and establish a topology table;
(302) Each node calculates a routing table from the topology table using Dijkstra's method, and the nodes exchange network-state sensing data.
Further, the specific mode of the step (4) is as follows:
(401) The service type, service flow demand, and service flow size are set; service flow data are randomly generated, and the service flow duration is adjusted according to the network load level to simulate real service flow states;
(402) When an agent exists on a service flow's routing path, the agent decides the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. Each agent makes a routing action decision according to the current network situation and service flow information while improving the global reward;
(403) A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i will not be used to plan the next-hop route. The decision of the agent at node i is represented as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the agent at node i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the agent's decision at node i, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ):

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s;
(404) The policy parameter gradient of the agent at node i is computed and used to update the reinforcement learning network parameters:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters;
(405) With the partial state s_i and condition state c_i of the agent at node i as conditions, an action is output through the reinforcement learning network, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators;
(406) After a path decision involving an agent is complete, the path is checked for unsafe conditions, i.e., loops or link states that do not meet the service flow requirement; if any exist, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
The beneficial effects of the invention are as follows:
1. Compared with traditional routing protocols, the invention significantly improves network throughput and average service delay.
2. The method adopts a distributed routing scheme, determining the key nodes at which agents are deployed based on network scale and aggregation degree; it offers good scalability, can be flexibly deployed at different network scales, and is easy to implement in engineering.
Drawings
Fig. 1 is an overall flowchart of a distributed scalable intelligent routing method based on key nodes in an embodiment of the present invention.
FIG. 2 is a flow chart of key node computation in an embodiment of the invention.
Fig. 3 is a flow chart of a distributed multi-agent routing scheme in an embodiment of the present invention.
Fig. 4 is a network topology diagram of a simulation experiment in an embodiment of the present invention.
Fig. 5 is a schematic comparison of average service delay for different numbers of agents in the simulation experiment of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings and a simulation experiment.
A distributed extensible intelligent routing method based on key nodes is shown in figure 1, and comprises the following steps:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
By introducing network key nodes, the method reduces the number of agents required to control the network; deploying agents at key node positions improves the control efficiency achievable by each individual agent, and the whole network is driven by local intelligence, forming an intelligent network capable of self-evolution.
Further, as shown in Fig. 2, in step (1) the centrality criticality of a node is computed from its degree centrality, betweenness centrality, and closeness centrality. Degree centrality judges node importance by the number of a node's neighbors. Betweenness centrality judges the dependency relationships among nodes: the greater a node's betweenness centrality, the greater its influence on the other nodes of the whole network. Closeness centrality, like betweenness centrality, judges a node from whole-network characteristics, measuring a node's centrality by the reciprocal of the average distance from the node to all other nodes of the network. The three centrality measures are computed as follows:

DC_i = \frac{k_i}{N-1}, \qquad BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}, \qquad CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein DC_i is the degree centrality of node v_i, BC_i its betweenness centrality, and CC_i its closeness centrality; k_i denotes the number of edges connected to node v_i, N is the total number of nodes, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j.
Different centrality measures gauge a node's influence on the whole network from different angles. The method considers all three centrality parameters and provides a node centrality criticality formula, by which the centrality criticality of every node is computed:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]

wherein ρ_i denotes the centrality criticality of node v_i.
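For illustration, a minimal Python sketch of this computation using networkx follows. Since the exact formula combining the three measures into ρ_i appears only as an equation image in the original, the unweighted mean used below is an assumption, not the patent's formula:

```python
import networkx as nx

def centrality_criticality(G: nx.Graph) -> dict:
    """Per-node centrality criticality rho_i from the three centrality measures."""
    dc = nx.degree_centrality(G)       # DC_i = k_i / (N - 1)
    bc = nx.betweenness_centrality(G)  # fraction of shortest paths through v_i
    cc = nx.closeness_centrality(G)    # (N - 1) / sum_j d_ij
    # ASSUMPTION: unweighted mean; the patent's combining formula is an image.
    return {v: (dc[v] + bc[v] + cc[v]) / 3.0 for v in G.nodes}
```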
Further, in step (2) the parameter m must be determined from the network scale and the aggregation coefficient. The aggregation coefficient measures network density: of two networks of the same scale, the one with the higher aggregation coefficient can be controlled through fewer agent nodes. The network aggregation coefficient is computed as:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient.
The key node selection proportion m is computed as:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
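A companion sketch of key-node selection under the same assumptions; because the m-formula is likewise an image in the original, m is passed in as a given value here:

```python
import networkx as nx

def select_key_nodes(G: nx.Graph, m: float) -> list:
    """Rank nodes by (assumed) centrality criticality; keep the top proportion m."""
    # The patent derives m from N and C = nx.average_clustering(G); that
    # formula is an image in the original, so m is taken as an input here.
    dc = nx.degree_centrality(G)
    bc = nx.betweenness_centrality(G)
    cc = nx.closeness_centrality(G)
    rho = {v: (dc[v] + bc[v] + cc[v]) / 3.0 for v in G.nodes}  # assumed mean
    ranked = sorted(rho, key=rho.get, reverse=True)
    k = max(1, round(m * G.number_of_nodes()))
    return ranked[:k]
```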
further, as shown in fig. 3, the flow of the distributed agent routing scheme is shown in fig. 3, firstly, all nodes in step (3) firstly diffuse their own topological connection relationship, establish a topology table, after the network converges and stabilizes, each node calculates the distance to other target nodes and the next hop neighbor number through the topology table by using Dijkstra method to form a routing table, and meanwhile, the nodes interact with network state sensing data, and in the network operation, each node can sense and calculate the current network bandwidth, link load and traffic flow information according to the default value, the set value or the current real data.
Further, in step (4), during the agent pre-training stage, the service flow's service type, demand, and size are set, service flow data are randomly generated, and the service flow duration is adjusted according to the network load level. For a given service flow, while the starting point (current node) is not an agent, the next-hop node is determined directly from the routing table, until an agent node is reached.
When an agent exists on a service flow's routing path, the agent's decision determines the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. The agent at each node makes a routing action decision according to the current network situation and service flow information while improving the global reward.
A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i is not used to plan the next-hop route. The decision of the agent at node i can be expressed as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the current agent i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the current agent's decision, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ) as follows:

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s.
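For illustration, a minimal PyTorch sketch of one such gradient step; the feature dimension, action dimension, and precomputed advantage estimates are placeholders, not values from the patent:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 16 state features, 8 candidate next-hop neighbors.
policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def policy_gradient_step(states, actions, advantages):
    """states: (B, 16) floats; actions: (B,) chosen next-hop indices (long);
    advantages: (B,) precomputed A(s, a) estimates."""
    log_probs = torch.log_softmax(policy(states), dim=-1)
    log_pi_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Ascend E[log pi(a|s) * A(s, a)] by descending its negative.
    loss = -(log_pi_a * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```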
Further, the policy parameter gradient function of the agent at node i is:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters.
With the partial state s_i and condition state c_i of the agent at node i as conditions, the reinforcement learning network outputs an action, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, can be expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators.
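A small sketch of this reward computation; the weighted throughput-minus-delay-minus-loss form of r_t reconstructs an equation that is only an image in the original, so it should be read as an assumption:

```python
def utility(thr, delay, loss, thr_avg, delay_avg, loss_avg,
            a1=1.0, a2=1.0, a3=1.0):
    """r_t for one service flow; each raw indicator is normalized by the
    average over same-type, same source/destination flows."""
    return a1 * (thr / thr_avg) - a2 * (delay / delay_avg) - a3 * (loss / loss_avg)

def discounted_return(rewards, gamma=0.95):
    """R_t = sum_k gamma^k * r_{t+k}, computed backwards over an episode."""
    ret, out = 0.0, []
    for r in reversed(rewards):
        ret = r + gamma * ret
        out.append(ret)
    return out[::-1]
```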
Meanwhile, to reduce the unsafe-routing problems caused by random exploration during agent learning, the method introduces an intelligent decision path detection mechanism: after a path decision involving an agent is complete, it checks whether the path is unsafe, i.e., contains a loop or a link state that does not meet the service flow requirement; if so, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
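A sketch of this detection mechanism; the "bandwidth" edge attribute and the penalty value of -1.0 are illustrative assumptions:

```python
import networkx as nx

def check_and_repair_path(G: nx.Graph, path: list, demand: float):
    """Return (path, penalty); fall back to Dijkstra if the decided path is unsafe."""
    has_loop = len(path) != len(set(path))
    links_ok = all(G[u][v].get("bandwidth", 0.0) >= demand
                   for u, v in zip(path, path[1:]))
    if has_loop or not links_ok:
        repaired = nx.dijkstra_path(G, path[0], path[-1], weight="weight")
        return repaired, -1.0  # negative scalar reward as the penalty
    return path, 0.0
```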
The following example further illustrates the above method by simulating the network shown in Fig. 4.
(1) Reading the simulated network topology information and the parameters of each node, calculating the degree centrality, betweenness centrality, and closeness centrality of each node, and determining the centrality criticality of each node using the centrality criticality formula above; the calculation results are shown in Table 1:
Table 1: centrality criticality of all nodes in the network example
[table given as an image in the original]
As can be seen from Table 1, ordering the nodes in the network by centrality criticality from largest to smallest gives: {V4, V6, V8, V7, V1, V3, V5, V9, V2, V10}.
(2) Calculating the network aggregation coefficient from the network topology information (node count and edge count); the aggregation coefficient of each node in the network is shown in Table 2:

Table 2: network node aggregation coefficients
[table given as an image in the original]
The table shows that the network average aggregation coefficient is 0.42. Using the key node proportion formula, the whole-network key node proportion is 0.6; since the whole network has 10 nodes, the number of key nodes is 6. By the centrality criticality ordering, the 6 most critical nodes {V4, V6, V8, V7, V1, V3} are taken as the key nodes, i.e., the deployment positions of the agents in the network.
(3) Deploying the distributed agents according to the key node positions, and letting the network randomly generate whole-network service flows. In this embodiment the network load is set to 0.8, i.e., the total network service traffic is 0.8 times the total network bandwidth. The average delay of network service transmission is calculated under three conditions: no agents deployed, agents deployed at the key nodes, and agents deployed across the whole network, taking an average once every 3000 service flows; the delay curves of the three conditions are shown in Fig. 5. As Fig. 5 shows, in the initial stage the agents are still learning and the delay of the networks with deployed agents is larger than that of the network without agents, but the delay decreases gradually as learning proceeds; the final delay stabilizes at an average below that of the network without agents, and deploying agents across the whole network and deploying them only at the key nodes achieve similar effects. The embodiment shows that deploying agents at key nodes can effectively reduce average network delay and improve transmission performance; the method offers good scalability, can be flexibly deployed at different network scales, is easy to implement in engineering, and reduces cost, achieving the expected purpose.
In short, the invention realizes complex network control through a small number of agent nodes, effectively improves network throughput, and reduces average service delay and deployment cost without substantially changing the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.

Claims (5)

1. A distributed scalable intelligent routing method based on key nodes, comprising the steps of:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V = {v_1, v_2, ..., v_N} denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
2. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (1) is:
(101) The degree centrality DC_i, betweenness centrality BC_i, and closeness centrality CC_i of each node in the network are calculated:

DC_i = \frac{k_i}{N-1}

BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}

CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein k_i denotes the number of edges connected to node v_i, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j;
(102) The centrality criticality ρ_i of all nodes is calculated by combining DC_i, BC_i, and CC_i according to the node centrality criticality formula:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]
3. the key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (2) is:
(201) The network aggregation coefficient is calculated:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient;
(202) The key node proportion m is determined from the network scale and the aggregation coefficient:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
(203) The nodes are sorted by centrality criticality in descending order, and the key nodes are determined according to the proportion m.
4. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (3) is:
(301) All nodes diffuse their own topological connection relations and establish a topology table;
(302) Each node calculates a routing table from the topology table using Dijkstra's method, and the nodes exchange network-state sensing data.
5. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (4) is:
(401) The service type, service flow demand, and service flow size are set; service flow data are randomly generated, and the service flow duration is adjusted according to the network load level to simulate real service flow states;
(402) When an agent exists on a service flow's routing path, the agent decides the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. Each agent makes a routing action decision according to the current network situation and service flow information while improving the global reward;
(403) A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i will not be used to plan the next-hop route. The decision of the agent at node i is represented as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the agent at node i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the agent's decision at node i, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ):

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s;
(404) The policy parameter gradient of the agent at node i is computed and used to update the reinforcement learning network parameters:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters;
(405) With the partial state s_i and condition state c_i of the agent at node i as conditions, an action is output through the reinforcement learning network, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators;
(406) After a path decision involving an agent is complete, the path is checked for unsafe conditions, i.e., loops or link states that do not meet the service flow requirement; if any exist, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
CN202310361356.7A 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes Pending CN116418730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310361356.7A CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310361356.7A CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Publications (1)

Publication Number Publication Date
CN116418730A 2023-07-11

Family

ID=87052670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310361356.7A Pending CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Country Status (1)

Country Link
CN (1) CN116418730A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737511A (en) * 2023-08-10 2023-09-12 山景智能(北京)科技有限公司 Graph-based scheduling job monitoring method and device
CN117319287A (en) * 2023-11-27 2023-12-29 之江实验室 Network extensible routing method and system based on multi-agent reinforcement learning
CN117319287B (en) * 2023-11-27 2024-04-05 之江实验室 Network extensible routing method and system based on multi-agent reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination