CN116418730A - Distributed extensible intelligent routing method based on key nodes

Distributed extensible intelligent routing method based on key nodes

Info

Publication number
CN116418730A
Authority
CN
China
Prior art keywords
network
node
agent
service flow
nodes
Prior art date
Legal status
Pending
Application number
CN202310361356.7A
Other languages
Chinese (zh)
Inventor
肖哲
刘晓东
潘宁
刘丽哲
焦利彬
许萌签
甘瑞蒙
李金�
贾泽坤
Current Assignee
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 54 Research Institute
Priority to CN202310361356.7A
Publication of CN116418730A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/14: Routing performance; Theoretical aspects
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0894: Policy-based network configuration management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks


Abstract

The invention provides a distributed extensible intelligent routing method based on key nodes, belonging to the field of network resource scheduling optimization. The method computes the network average aggregation coefficient, the key node proportion, and the node criticality from the network topology scale, the network density, and the centrality of the nodes, and determines the deployment positions of the agents by combining the key node proportion with the node criticality. Each agent generates a routing strategy according to the service flow requirements and the network state; after the network executes the strategy, a utility function value is fed back, and the agent optimizes the strategy by adjusting its parameters according to this feedback. After training, the agents continuously adjust the routing strategy as the network state changes, continuously optimizing network efficiency. The invention realizes complex network control through a small number of agent nodes, effectively improves network throughput, and reduces average service delay and deployment cost without substantially changing the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.

Description

Distributed extensible intelligent routing method based on key nodes
Technical Field
The invention relates to the field of distributed intelligent route optimization, in particular to a distributed extensible intelligent routing method based on key nodes.
Background
With the development of Internet technology, the complexity and dynamics of communication networks keep increasing. Breakthroughs in materials science and manufacturing continue to push up the processing capacity of terminal equipment, while network equipment and transmission channels develop relatively slowly. A series of new services such as live video, e-sports, telemedicine, and financial data place higher demands on communication network performance, making high reliability, low delay, and large bandwidth the development targets of future network control technology.
Traditional routing methods generally plan paths with a shortest-path algorithm: a fixed network flow model is designed and the route is obtained by solving an objective function. Common examples include distance-vector routing algorithms such as RIP, IGRP, and EIGRP, and link-state routing algorithms such as OSPF and IS-IS. These are widely deployed and mature, but their best-effort forwarding cannot formulate differentiated routing strategies for network data flows with different characteristics, and thus cannot meet differentiated service requirements in a complex network. Finding an efficient, adaptive network service routing control scheme that guarantees network QoS, reduces unnecessary network resource overhead, improves network resource utilization, and provides efficient and stable service support for upper-layer applications under limited computing, storage, and communication resources is therefore a pressing problem for current communication networks.
To address the slow convergence of traditional routing methods in complex environments and their inability to adapt to dynamic network environments, new technologies and architectures such as machine learning and SDN have in recent years been applied to route planning, with progress in areas such as congestion control and load balancing. By the type of machine learning applied, these routing algorithms fall roughly into two classes: intelligent routing based on supervised learning and intelligent routing based on reinforcement learning (RL). Supervised-learning-based intelligent routing can compute an appropriate routing scheme fairly accurately from network state and topology information, and offers advantages over traditional schemes in convergence speed and signaling overhead. However, supervised learning faces a persistent problem: deep neural networks may use millions or even tens of millions of parameters and can only be treated as black boxes, which makes the intelligent algorithm difficult to debug; the deployed model is also large, making fine adjustment difficult.
Intelligent routing based on deep reinforcement learning (DRL) can usually obtain a near-optimal network configuration scheme in a single computation; it learns from actual network data through continuous interaction with the environment, requires no simplification of the environment, and can adapt to nonlinear complex systems by acting on real information. However, the convergence of a DRL model is strongly tied to its output dimension, so to avoid this problem most algorithms compute routes indirectly, for example computing link weights with a DRL algorithm and making routing decisions with other traditional algorithms, which falls short of truly intelligent route selection.
In recent years, intelligent routing research has focused on performance improvement in specific scenarios. In practical application scenarios, factors such as large network scale and changeable environments mean that the robustness and reliability of existing methods are insufficient, and the algorithms are far from meeting the needs of daily network management and control. How to design a simple, effective, low-cost routing scheme has therefore become a research difficulty and hotspot in intelligent route planning.
Disclosure of Invention
In view of the above, the invention provides a distributed scalable intelligent routing method based on key nodes. It constructs distributed agents with an RL algorithm and determines agent deployment positions from the network topology scale and node criticality, realizing complex network control through a small number of agent nodes. It effectively improves network throughput and reduces average service delay and deployment cost in the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a distributed scalable intelligent routing method based on key nodes comprises the following steps:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V = {v_1, v_2, ..., v_N} denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
Further, the specific mode of the step (1) is as follows:
(101) The degree centrality DC_i, betweenness centrality BC_i, and closeness centrality CC_i of each node in the network are calculated:

DC_i = \frac{k_i}{N-1}

BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}

CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein k_i denotes the number of edges connected to node v_i, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j;
(102) The centrality criticality ρ_i of all nodes is calculated by combining DC_i, BC_i, and CC_i according to the node centrality criticality formula:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]
further, the specific mode of the step (2) is as follows:
(201) The network aggregation coefficient is calculated:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient;
(202) The key node proportion m is determined from the network scale and the aggregation coefficient:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
(203) The nodes are sorted by centrality criticality in descending order, and the key nodes are determined according to the proportion m.
Further, the specific mode of the step (3) is as follows:
(301) All nodes diffuse their own topological connection relations and establish a topology table;
(302) Each node calculates a routing table from the topology table using Dijkstra's method, and the nodes exchange network-state sensing data.
Further, the specific mode of the step (4) is as follows:
(401) The service type, service flow demand, and service flow size are set; service flow data are randomly generated, and the service flow duration is adjusted according to the network load level to simulate real service flow states;
(402) When an agent exists on a service flow's routing path, the agent decides the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. Each agent makes a routing action decision according to the current network situation and service flow information while improving the global reward;
(403) A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i will not be used to plan the next-hop route. The decision of the agent at node i is represented as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the agent at node i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the agent's decision at node i, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ):

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s;
(404) The policy parameter gradient of the agent at node i is computed and used to update the reinforcement learning network parameters:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters;
(405) With the partial state s_i and condition state c_i of the agent at node i as conditions, an action is output through the reinforcement learning network, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators;
(406) After a path decision involving an agent is complete, the path is checked for unsafe conditions, i.e., loops or link states that do not meet the service flow requirement; if any exist, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
The beneficial effects of the invention are as follows:
1. Compared with traditional routing protocols, the invention significantly improves network throughput and average service delay.
2. The method adopts a distributed routing scheme, determining the key nodes at which agents are deployed based on network scale and aggregation degree; it offers good scalability, can be flexibly deployed at different network scales, and is easy to implement in engineering.
Drawings
Fig. 1 is an overall flowchart of a distributed scalable intelligent routing method based on key nodes in an embodiment of the present invention.
FIG. 2 is a flow chart of key node computation in an embodiment of the invention.
Fig. 3 is a flow chart of a distributed multi-agent routing scheme in an embodiment of the present invention.
Fig. 4 is a network topology diagram of a simulation experiment in an embodiment of the present invention.
Fig. 5 is a schematic comparison of average service delay for different numbers of agents in the simulation experiment of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings and a simulation experiment.
A distributed extensible intelligent routing method based on key nodes is shown in figure 1, and comprises the following steps:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
By introducing network key nodes, the method reduces the number of agents required to control the network; deploying agents at key node positions improves the control efficiency achievable by each individual agent, and the whole network is driven by local intelligence, forming an intelligent network capable of self-evolution.
Further, as shown in Fig. 2, in step (1) the centrality criticality of a node is computed from its degree centrality, betweenness centrality, and closeness centrality. Degree centrality judges node importance by the number of a node's neighbors. Betweenness centrality judges the dependency relationships among nodes: the greater a node's betweenness centrality, the greater its influence on the other nodes of the whole network. Closeness centrality, like betweenness centrality, judges a node from whole-network characteristics, measuring a node's centrality by the reciprocal of the average distance from the node to all other nodes of the network. The three centrality measures are computed as follows:

DC_i = \frac{k_i}{N-1}, \qquad BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}, \qquad CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein DC_i is the degree centrality of node v_i, BC_i its betweenness centrality, and CC_i its closeness centrality; k_i denotes the number of edges connected to node v_i, N is the total number of nodes, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j.
Different centrality measures gauge a node's influence on the whole network from different angles. The method considers all three centrality parameters and provides a node centrality criticality formula, by which the centrality criticality of every node is computed:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]

wherein ρ_i denotes the centrality criticality of node v_i.
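For illustration, a minimal Python sketch of this computation using networkx follows. Since the exact formula combining the three measures into ρ_i appears only as an equation image in the original, the unweighted mean used below is an assumption, not the patent's formula:

```python
import networkx as nx

def centrality_criticality(G: nx.Graph) -> dict:
    """Per-node centrality criticality rho_i from the three centrality measures."""
    dc = nx.degree_centrality(G)       # DC_i = k_i / (N - 1)
    bc = nx.betweenness_centrality(G)  # fraction of shortest paths through v_i
    cc = nx.closeness_centrality(G)    # (N - 1) / sum_j d_ij
    # ASSUMPTION: unweighted mean; the patent's combining formula is an image.
    return {v: (dc[v] + bc[v] + cc[v]) / 3.0 for v in G.nodes}
```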
Further, in step (2) the parameter m must be determined from the network scale and the aggregation coefficient. The aggregation coefficient measures network density: of two networks of the same scale, the one with the higher aggregation coefficient can be controlled through fewer agent nodes. The network aggregation coefficient is computed as:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient.
The key node selection proportion m is computed as:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
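A companion sketch of key-node selection under the same assumptions; because the m-formula is likewise an image in the original, m is passed in as a given value here:

```python
import networkx as nx

def select_key_nodes(G: nx.Graph, m: float) -> list:
    """Rank nodes by (assumed) centrality criticality; keep the top proportion m."""
    # The patent derives m from N and C = nx.average_clustering(G); that
    # formula is an image in the original, so m is taken as an input here.
    dc = nx.degree_centrality(G)
    bc = nx.betweenness_centrality(G)
    cc = nx.closeness_centrality(G)
    rho = {v: (dc[v] + bc[v] + cc[v]) / 3.0 for v in G.nodes}  # assumed mean
    ranked = sorted(rho, key=rho.get, reverse=True)
    k = max(1, round(m * G.number_of_nodes()))
    return ranked[:k]
```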
further, as shown in fig. 3, the flow of the distributed agent routing scheme is shown in fig. 3, firstly, all nodes in step (3) firstly diffuse their own topological connection relationship, establish a topology table, after the network converges and stabilizes, each node calculates the distance to other target nodes and the next hop neighbor number through the topology table by using Dijkstra method to form a routing table, and meanwhile, the nodes interact with network state sensing data, and in the network operation, each node can sense and calculate the current network bandwidth, link load and traffic flow information according to the default value, the set value or the current real data.
Further, in step (4), during the agent pre-training stage, the service flow's service type, demand, and size are set, service flow data are randomly generated, and the service flow duration is adjusted according to the network load level. For a given service flow, while the starting point (current node) is not an agent, the next-hop node is determined directly from the routing table, until an agent node is reached.
When an agent exists on a service flow's routing path, the agent's decision determines the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. The agent at each node makes a routing action decision according to the current network situation and service flow information while improving the global reward.
A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i is not used to plan the next-hop route. The decision of the agent at node i can be expressed as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the current agent i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the current agent's decision, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ) as follows:

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s.
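For illustration, a minimal PyTorch sketch of one such gradient step; the feature dimension, action dimension, and precomputed advantage estimates are placeholders, not values from the patent:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 16 state features, 8 candidate next-hop neighbors.
policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def policy_gradient_step(states, actions, advantages):
    """states: (B, 16) floats; actions: (B,) chosen next-hop indices (long);
    advantages: (B,) precomputed A(s, a) estimates."""
    log_probs = torch.log_softmax(policy(states), dim=-1)
    log_pi_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Ascend E[log pi(a|s) * A(s, a)] by descending its negative.
    loss = -(log_pi_a * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```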
Further, the policy parameter gradient function of the agent at node i is:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters.
With the partial state s_i and condition state c_i of the agent at node i as conditions, the reinforcement learning network outputs an action, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, can be expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators.
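A small sketch of this reward computation; the weighted throughput-minus-delay-minus-loss form of r_t reconstructs an equation that is only an image in the original, so it should be read as an assumption:

```python
def utility(thr, delay, loss, thr_avg, delay_avg, loss_avg,
            a1=1.0, a2=1.0, a3=1.0):
    """r_t for one service flow; each raw indicator is normalized by the
    average over same-type, same source/destination flows."""
    return a1 * (thr / thr_avg) - a2 * (delay / delay_avg) - a3 * (loss / loss_avg)

def discounted_return(rewards, gamma=0.95):
    """R_t = sum_k gamma^k * r_{t+k}, computed backwards over an episode."""
    ret, out = 0.0, []
    for r in reversed(rewards):
        ret = r + gamma * ret
        out.append(ret)
    return out[::-1]
```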
Meanwhile, to reduce the unsafe-routing problems caused by random exploration during agent learning, the method introduces an intelligent decision path detection mechanism: after a path decision involving an agent is complete, it checks whether the path is unsafe, i.e., contains a loop or a link state that does not meet the service flow requirement; if so, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
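A sketch of this detection mechanism; the "bandwidth" edge attribute and the penalty value of -1.0 are illustrative assumptions:

```python
import networkx as nx

def check_and_repair_path(G: nx.Graph, path: list, demand: float):
    """Return (path, penalty); fall back to Dijkstra if the decided path is unsafe."""
    has_loop = len(path) != len(set(path))
    links_ok = all(G[u][v].get("bandwidth", 0.0) >= demand
                   for u, v in zip(path, path[1:]))
    if has_loop or not links_ok:
        repaired = nx.dijkstra_path(G, path[0], path[-1], weight="weight")
        return repaired, -1.0  # negative scalar reward as the penalty
    return path, 0.0
```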
The following example further illustrates the above method by simulating the network shown in Fig. 4.
(1) Reading the simulated network topology information and the parameters of each node, calculating the degree centrality, betweenness centrality, and closeness centrality of each node, and determining the centrality criticality of each node using the centrality criticality formula above; the calculation results are shown in Table 1:
Table 1: centrality criticality of all nodes in the network example
[table given as an image in the original]
As can be seen from Table 1, ordering the nodes in the network by centrality criticality from largest to smallest gives: {V4, V6, V8, V7, V1, V3, V5, V9, V2, V10}.
(2) Calculating the network aggregation coefficient from the network topology information (node count and edge count); the aggregation coefficient of each node in the network is shown in Table 2:

Table 2: network node aggregation coefficients
[table given as an image in the original]
The table shows that the network average aggregation coefficient is 0.42. Using the key node proportion formula, the whole-network key node proportion is 0.6; since the whole network has 10 nodes, the number of key nodes is 6. By the centrality criticality ordering, the 6 most critical nodes {V4, V6, V8, V7, V1, V3} are taken as the key nodes, i.e., the deployment positions of the agents in the network.
(3) Deploying the distributed agents according to the key node positions, and letting the network randomly generate whole-network service flows. In this embodiment the network load is set to 0.8, i.e., the total network service traffic is 0.8 times the total network bandwidth. The average delay of network service transmission is calculated under three conditions: no agents deployed, agents deployed at the key nodes, and agents deployed across the whole network, taking an average once every 3000 service flows; the delay curves of the three conditions are shown in Fig. 5. As Fig. 5 shows, in the initial stage the agents are still learning and the delay of the networks with deployed agents is larger than that of the network without agents, but the delay decreases gradually as learning proceeds; the final delay stabilizes at an average below that of the network without agents, and deploying agents across the whole network and deploying them only at the key nodes achieve similar effects. The embodiment shows that deploying agents at key nodes can effectively reduce average network delay and improve transmission performance; the method offers good scalability, can be flexibly deployed at different network scales, is easy to implement in engineering, and reduces cost, achieving the expected purpose.
In short, the invention realizes complex network control through a small number of agent nodes, effectively improves network throughput, and reduces average service delay and deployment cost without substantially changing the existing network environment; it offers good scalability, can be flexibly deployed at different network scales, and is suitable for engineering implementation.

Claims (5)

1. A distributed scalable intelligent routing method based on key nodes, comprising the steps of:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using the centrality criticality method; wherein V = {v_1, v_2, ..., v_N} denotes the node set, N denotes the total number of nodes, and E denotes the set of all edges in the network topology;
(2) The nodes are sorted by centrality criticality in descending order, and the top-ranked nodes are taken as the key nodes of the network topology;
(3) At each node, the shortest paths from the current node to all other target nodes are calculated using Dijkstra's method to obtain the next-hop neighbor nodes and form a routing table;
(4) A reinforcement-learning-based agent is deployed at each selected key node and agent pre-training is started: service flow demands are randomly generated; at each key node a service flow traverses, the agent recalculates the next-hop route; performance indicators such as service flow delay and packet loss are normalized, the strategy reward value is determined from a utility function, and the agent model parameters are adjusted;
(5) For real service flows, the trained agents perform route calculation at each key node and continue to learn during routing, thereby realizing intelligent routing.
2. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (1) is:
(101) The degree centrality DC_i, betweenness centrality BC_i, and closeness centrality CC_i of each node in the network are calculated:

DC_i = \frac{k_i}{N-1}

BC_i = \sum_{s \neq i \neq t} \frac{g_{st}^{i}}{g_{st}}

CC_i = \frac{N-1}{\sum_{j \neq i} d_{ij}}

wherein k_i denotes the number of edges connected to node v_i, g_st denotes the number of shortest paths connecting v_s and v_t, g_st^i denotes the number of those shortest paths that pass through node v_i, and d_ij denotes the distance from node v_i to node v_j;
(102) The centrality criticality ρ_i of all nodes is calculated by combining DC_i, BC_i, and CC_i according to the node centrality criticality formula:

[equation given as an image in the original: ρ_i as a function of DC_i, BC_i, and CC_i]
3. the key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (2) is:
(201) The network aggregation coefficient is calculated:

C_i = \frac{2 E_i}{k_i (k_i - 1)}, \qquad C = \frac{1}{N} \sum_{i=1}^{N} C_i

wherein k_i denotes the number of neighbors of node v_i, E_i denotes the number of edges among the neighbors of node v_i, C_i denotes the aggregation coefficient of node v_i, and C denotes the network average aggregation coefficient;
(202) The key node proportion m is determined from the network scale and the aggregation coefficient:

[equation given as an image in the original: m as a function of the node count N and the average aggregation coefficient C]
(203) The nodes are sorted by centrality criticality in descending order, and the key nodes are determined according to the proportion m.
4. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (3) is:
(301) All nodes diffuse their own topological connection relations and establish a topology table;
(302) Each node calculates a routing table from the topology table using Dijkstra's method, and the nodes exchange network-state sensing data.
5. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (4) is:
(401) The service type, service flow demand, and service flow size are set; service flow data are randomly generated, and the service flow duration is adjusted according to the network load level to simulate real service flow states;
(402) When an agent exists on a service flow's routing path, the agent decides the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = \prod_{i \in N} A_i denotes the joint action space of all agents, {A_i}_{i \in N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → \mathbb{R} is the reward function. Each agent makes a routing action decision according to the current network situation and service flow information while improving the global reward;
(403) A state indicator variable K of each node relative to a service flow is constructed: if the service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i will not be used to plan the next-hop route. The decision of the agent at node i is represented as

a_i = \pi_{\theta_i}(s_i, c_i)

wherein θ_i denotes the reinforcement learning network parameters, s_i is the partial state of the agent at node i, comprising service flow information, part of the neighbor routing table, and the available bandwidth of links, and c_i is the condition state of the agent's decision at node i, composed of the state indicator variable K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ collects the reinforcement learning network parameters θ_i of all agents, and θ is updated by computing the gradient of J(θ):

\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau \sim p_{\theta}(\tau)} \left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s) \, A(s, a) \right]

wherein τ ~ p_θ(τ) denotes a trajectory sampled under the current policy, from which state-action pairs (s, a) are drawn; the estimator A(s, a) estimates how advantageous taking action a in state s under policy π_θ is compared with a randomly taken action; when A(s, a) > 0, action a yields a better result than a random action; π_θ(a|s) denotes the probability of taking action a given the network parameters θ and agent state s;
(404) The policy parameter gradient of the agent at node i is computed and used to update the reinforcement learning network parameters:

\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i \mid s_i, c_i) \, A_i(s_i, a) \right]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters;
(405) With the partial state s_i and condition state c_i of the agent at node i as conditions, an action is output through the reinforcement learning network, i.e., a probability distribution over the service flow's next-hop neighbors, and the reward of the action is computed as:

R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as:

r_t = \alpha_1 \hat{T}_t - \alpha_2 \hat{D}_t - \alpha_3 \hat{L}_t

wherein \hat{T}_t, \hat{D}_t, and \hat{L}_t respectively denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides the t-th service flow's indicators by the average of the same indicator over service flows of the same type with the same source/destination nodes; α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay, and packet loss indicators;
(406) After a path decision involving an agent is complete, the path is checked for unsafe conditions, i.e., loops or link states that do not meet the service flow requirement; if any exist, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using Dijkstra's method.
CN202310361356.7A 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes Pending CN116418730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310361356.7A CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310361356.7A CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Publications (1)

Publication Number Publication Date
CN116418730A 2023-07-11

Family

ID=87052670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310361356.7A Pending CN116418730A (en) 2023-04-06 2023-04-06 Distributed extensible intelligent routing method based on key nodes

Country Status (1)

Country Link
CN (1) CN116418730A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737511A (en) * 2023-08-10 2023-09-12 山景智能(北京)科技有限公司 Graph-based scheduling job monitoring method and device
CN117319287A (en) * 2023-11-27 2023-12-29 之江实验室 Network extensible routing method and system based on multi-agent reinforcement learning
CN117319287B (en) * 2023-11-27 2024-04-05 之江实验室 Network extensible routing method and system based on multi-agent reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination