CN116418730A - Distributed extensible intelligent routing method based on key nodes - Google Patents
- Publication number
- CN116418730A (application CN202310361356.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- node
- agent
- service flow
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/14—Routing performance; Theoretical aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0894—Policy-based network configuration management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention provides a distributed extensible intelligent routing method based on key nodes, belonging to the field of network resource scheduling optimization. The method computes the network average aggregation coefficient, the key-node proportion, and the criticality of each node from the network topology scale, the network density, and several node centrality measures, and determines the agent deployment positions by combining the key-node proportion with node criticality. Each agent generates a routing policy from the service-flow requirements and the network state; after the network executes the policy, a utility function is fed back, and the agent optimizes the policy by adjusting its parameters according to this feedback. Once trained, the agents continuously adapt the routing policy as the network state changes, continuously optimizing network efficiency. The invention controls a complex network through a small number of agent nodes, effectively improves network throughput, and reduces average service delay and deployment cost without substantially changing the existing network environment; it scales well, can be deployed flexibly at different network sizes, and is suitable for engineering implementation.
Description
Technical Field
The invention relates to the field of distributed intelligent routing optimization, and in particular to a distributed extensible intelligent routing method based on key nodes.
Background
With the development of Internet technology, the complexity and dynamics of communication networks keep increasing. Breakthroughs in materials science and manufacturing have pushed the processing capacity of terminal devices steadily upward, while network equipment and transmission channels have developed comparatively slowly. A series of new services, such as live video, e-sports, telemedicine, and financial data, place higher demands on communication-network performance, and "high reliability, low delay, and large bandwidth" have become the goals of future network control technology.
Traditional routing methods generally plan paths with a shortest-path algorithm over a fixed network flow model obtained by solving an objective function. Common examples include distance-vector routing algorithms such as RIP, IGRP, and EIGRP, and link-state routing algorithms such as OSPF and IS-IS, which are widely deployed in various environments and mature in use. However, their best-effort forwarding cannot formulate differentiated routing strategies for network data flows with different characteristics, and so cannot meet differentiated service requirements in a complex network. Finding an efficient, adaptive service routing control scheme that guarantees network QoS, reduces unnecessary resource overhead, improves resource utilization, and provides efficient, stable support for upper-layer applications under limited computation, storage, and communication resources is therefore a pressing problem for current communication networks.
To address the slow convergence of traditional routing methods in complex environments and their inability to adapt to dynamic network environments, new technologies and architectures such as machine learning and SDN have in recent years been researched and applied to route planning, with progress in fields such as congestion control and load balancing. By the type of machine learning employed, these routing algorithms fall roughly into two classes: intelligent routing based on supervised learning and intelligent routing based on reinforcement learning (RL). Supervised-learning approaches can compute an appropriate routing scheme fairly accurately from network state and topology information, with advantages over traditional schemes in convergence speed and signaling overhead. However, supervised learning always faces one problem: deep neural networks may carry millions or even tens of millions of parameters and can only be treated as black boxes, which makes the algorithms hard to debug, large to deploy, and difficult to fine-tune.
Intelligent routing based on deep reinforcement learning (DRL) can usually obtain a near-optimal network configuration in a single inference. By learning from actual network data it interacts continuously with the environment without simplifying it, acts on real information, and can adapt to nonlinear complex systems. However, the convergence of a DRL model is strongly tied to its output dimension; to sidestep this problem, most algorithms compute routes indirectly, e.g., using a DRL algorithm to set link weights and leaving the routing decision to a traditional algorithm, so true intelligent route selection is not achieved.
In recent years, intelligent-routing research has focused on improving network performance in specific scenarios. In practical application scenarios, factors such as large network scale and changeable environments mean that existing methods cannot guarantee robustness and reliability, and the algorithms fall far short of daily network management and control needs. How to design a simple, effective, low-cost routing scheme has therefore become a research difficulty and hotspot in intelligent route planning.
Disclosure of Invention
In view of the above, the invention provides a distributed scalable intelligent routing method based on key nodes. It builds distributed agents with an RL algorithm and determines agent deployment positions from the network topology scale and node criticality, so that a small number of agent nodes control a complex network. The method effectively improves network throughput and reduces average service delay and deployment cost in the existing network environment; it scales well, can be deployed flexibly at different network sizes, and is suitable for engineering implementation.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a distributed scalable intelligent routing method based on key nodes comprises the following steps:
(1) Acquire network link-state information in real time, and use the centrality-criticality method to compute, from the network state, the centrality criticality ρ_i of each node v_i in the network G = (V, E); where V = {v_1, v_2, ..., v_N} denotes the node set, N the total number of nodes, and E the set of all edges in the network topology;
(2) Sort the nodes in descending order of centrality criticality and take the top-ranked nodes as the key nodes of the network topology;
(3) At each node, compute the shortest paths from the current node to all other destination nodes with Dijkstra's method, obtain the next-hop neighbor for each destination, and form a routing table;
(4) Deploy a reinforcement-learning agent at each selected key node and start agent pre-training: randomly generate service-flow demands, have the agents recompute the next-hop route for service flows passing through the key nodes, normalize performance metrics such as service-flow delay and packet loss, determine the policy reward from a utility function, and adjust the agent model parameters;
(5) For real service flows, use the trained agents to perform route computation at each key node, and keep learning during the routing process, thereby realizing intelligent routing.
Further, the specific mode of the step (1) is as follows:
(101) Calculate the degree centrality DC_i, betweenness centrality BC_i, and closeness centrality CC_i of each node in the network:

DC_i = k_i / (N − 1),  BC_i = Σ_{s≠i≠t} (g_st^i / g_st),  CC_i = (N − 1) / Σ_{j≠i} d_ij

where k_i is the number of edges incident to node v_i, g_st is the number of shortest paths connecting v_s and v_t, g_st^i is the number of those shortest paths that pass through node v_i, and d_ij is the distance from node v_i to node v_j;
(102) Calculating the centrality criticality of all nodes according to a node centrality criticality calculation formula:
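The centrality computations in steps (101)–(102) can be sketched in plain Python. This is an illustrative sketch only: the toy chain graph is not from the patent, and Brandes' algorithm is one standard way (not necessarily the patent's) to compute the betweenness sum over shortest-path counts.

```python
from collections import deque

def degree_centrality(adj):
    """DC_i = k_i / (N - 1): fraction of other nodes adjacent to i."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """CC_i = (N - 1) / sum_j d_ij, with d_ij from BFS (unweighted graph)."""
    n = len(adj)
    cc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        cc[s] = (n - 1) / sum(dist.values())  # dist[s] = 0 adds nothing
    return cc

def betweenness_centrality(adj):
    """BC_i = sum over pairs (s,t) of g_st^i / g_st, via Brandes' algorithm."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest s-v path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                 # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}         # undirected pairs counted twice

# Toy topology: a chain a - b - c (b is the only cut vertex)
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(degree_centrality(adj))       # b has DC = 1.0, the endpoints 0.5
print(closeness_centrality(adj))
print(betweenness_centrality(adj))  # only b lies on the a-c shortest path
```

On the chain, the cut vertex b scores highest on all three measures, matching the intuition that the criticality ρ_i should single out such nodes.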
further, the specific mode of the step (2) is as follows:
(201) Calculate the network aggregation coefficient:

C_i = 2E_i / (k_i (k_i − 1)),  C = (1/N) Σ_{i=1}^{N} C_i

where k_i is the number of neighbors of node v_i, E_i is the number of edges among the neighbors of v_i, C_i is the aggregation coefficient of node v_i, and C is the network average aggregation coefficient;
(202) Determining a key node proportion m according to the network scale and the aggregation coefficient:
(203) Sort the nodes in descending order of centrality criticality and determine the key nodes according to the proportion m.
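Steps (201)–(203) can be sketched as follows. The graph is illustrative, and since the patent's exact formula for m (derived from network scale and the aggregation coefficient) is not reproduced in this text, m is passed in directly rather than computed:

```python
import math
from itertools import combinations

def local_clustering(adj, v):
    """C_v = 2 E_v / (k_v (k_v - 1)): fraction of neighbour pairs that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Network average aggregation coefficient C."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def key_node_count(adj, m):
    """Number of key nodes for a given proportion m (m itself is an input here)."""
    return math.ceil(m * len(adj))

# Triangle a-b-c plus a pendant node d hanging off c
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(local_clustering(adj, "a"))   # neighbours b and c are linked -> 1.0
print(local_clustering(adj, "c"))   # only 1 of 3 neighbour pairs linked
print(average_clustering(adj))
print(key_node_count(adj, 0.6))     # ceil(0.6 * 4) = 3
```

A denser network (higher C) would, per step (202), justify a smaller proportion m, i.e., fewer agents for the same scale.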
Further, the specific mode of the step (3) is as follows:
(301) All nodes flood their own topological connection relations and build a topology table;
(302) Each node computes a routing table from the topology table with Dijkstra's method, and nodes exchange network-state sensing data.
Further, the specific mode of the step (4) is as follows:
(401) Set the service type, service-flow demand, and service-flow size; randomly generate service-flow data and adjust the service-flow duration according to the network load level, so as to simulate real service-flow conditions;
(402) When an agent lies on a service flow's routing path, that agent decides the next-hop action. All agents share a global state space (S, A, P, R), where S is the set of current states of all agents, A = Π_{i∈N} A_i is the joint action space of all agents with {A_i}_{i∈N} the action space of agent i, P: S × A × S → [0, 1] is the state-transition probability, and R: S × A → ℝ is the reward function. Each agent makes a routing-action decision from the current network situation and service-flow information while improving the global reward;
(403) Construct a state indicator K of each node with respect to a service flow: K_i = 1 if the service flow passes through node i, otherwise K_i = 0; when K_i = 1, node i is not used to plan the next hop. The decision of the agent at node i is written π_{θ_i}(a_i | s_i, c_i), where θ_i is its reinforcement-learning network parameter, s_i is the agent's partial state at node i (comprising service-flow information, a partial neighbor routing table, and available link bandwidth), and c_i is the condition state for the decision, composed of the state indicator K. To maximize the network transmission capacity, the global reward of the network should be maximized; with J(θ) the global objective function and θ the collection of per-agent reinforcement-learning parameters θ_i, θ is updated by computing the gradient of J(θ):

∇_θ J(θ) = E_{τ ~ p_θ(τ)} [ A(s, a) ∇_θ log π_θ(a | s) ]

where τ denotes state-action pairs (s, a) sampled from p_θ(τ) under π_θ, and the estimator A(s, a) estimates how much better action a is, in state s under policy π_θ, than a randomly taken action: when A(s, a) > 0, taking action a yields better results than a random action; π_θ(a | s) denotes the probability of taking action a given network parameters θ and agent state s;
(404) Compute the policy-parameter gradient function of the agent at node i to update its reinforcement-learning network parameters:

∇_{θ_i} J(θ) = E [ A_i(s_i, a) ∇_{θ_i} log π_{θ_i}(a_i | s_i, c_i) ]

where A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters;
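The per-agent update in step (404) can be illustrated with a deliberately minimal sketch: one state, a softmax policy over two next-hop candidates, and masking of agents that are not on the routing path. The real method uses a deep network and multi-agent state; the single-state tabular policy, learning rate, and advantage values here are all illustrative assumptions:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def grad_log_pi(theta, a):
    """For a softmax policy, grad_theta log pi_theta(a) = one_hot(a) - pi_theta."""
    probs = softmax(theta)
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(probs)]

def update(theta, action, advantage, on_path, lr=0.5):
    """One policy-gradient step; an agent not on the routing path has zero
    advantage and leaves its parameters unchanged."""
    if not on_path:
        return theta
    g = grad_log_pi(theta, action)
    return [t + lr * advantage * gi for t, gi in zip(theta, g)]

# Two next-hop candidates; pretend hop 0 earned positive advantage, hop 1 negative.
theta = [0.0, 0.0]
theta = update(theta, action=0, advantage=+1.0, on_path=True)
theta = update(theta, action=1, advantage=-1.0, on_path=True)
probs = softmax(theta)
print(probs)  # probability mass shifts toward hop 0
```

After two updates the policy prefers the advantaged next hop, while an off-path agent's parameters stay untouched, mirroring the A_i = 0 masking.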
(405) Given its partial state s_i and condition state c_i, the agent at node i outputs an action through the reinforcement-learning network, i.e., a probability distribution over next-hop neighbors for the service flow. The return of the action is computed as

R_t = Σ_{k=0}^{∞} γ^k r_{t+k}

where γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as

r_t = α_1 T̂_t − α_2 D̂_t − α_3 L̂_t

where T̂_t, D̂_t, and L̂_t denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides each metric of the t-th flow by the average of that metric over flows of the same type and the same source/destination nodes; and α_1, α_2, α_3 are non-negative scalars weighting the importance of the throughput, delay, and packet-loss metrics;
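Step (405) can be sketched as follows. The patent's utility formula is not reproduced in this text, so the sign structure here (reward throughput, penalize delay and loss after peer-group normalization) is an assumption consistent with maximizing transmission capacity, and all numbers are illustrative:

```python
def normalize(value, peer_values):
    """Divide a flow's metric by the mean of the same metric over flows of the
    same type and the same source/destination pair (the peer group)."""
    mean = sum(peer_values) / len(peer_values)
    return value / mean if mean else 0.0

def utility(thr, delay, loss, peers, a1=1.0, a2=1.0, a3=1.0):
    """Hypothetical utility r_t = a1*T_hat - a2*D_hat - a3*L_hat."""
    t_hat = normalize(thr, peers["thr"])
    d_hat = normalize(delay, peers["delay"])
    l_hat = normalize(loss, peers["loss"])
    return a1 * t_hat - a2 * d_hat - a3 * l_hat

def discounted_return(rewards, gamma=0.9):
    """R_t = sum_k gamma^k r_{t+k}, accumulated back-to-front."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

peers = {"thr": [8.0, 10.0, 12.0], "delay": [20.0, 20.0], "loss": [0.01, 0.03]}
print(utility(thr=10.0, delay=10.0, loss=0.02, peers=peers))
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Dividing by the peer-group mean keeps the three metrics on comparable scales before the α weights trade them off.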
(406) After a path decision involving agents is completed, check whether the path is unsafe, i.e., contains a loop or a link state that does not meet the service-flow requirements; if so, the agents receive a negative scalar reward for the decision as a penalty, and the path is recomputed with Dijkstra's method.
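The safety check in step (406) reduces to two tests: a repeated node means a loop, and any traversed link must satisfy the flow's requirement. A minimal sketch, with bandwidth as the stand-in link-state requirement and illustrative link data:

```python
def is_safe_path(path, links, min_bw):
    """A path is unsafe if it revisits a node (loop) or traverses a link whose
    available bandwidth is below the flow's requirement."""
    if len(set(path)) != len(path):          # loop detected
        return False
    for u, v in zip(path, path[1:]):
        bw = links.get((u, v)) or links.get((v, u))   # undirected lookup
        if bw is None or bw < min_bw:        # missing or underprovisioned link
            return False
    return True

links = {("A", "B"): 100, ("B", "C"): 40, ("C", "D"): 100}
print(is_safe_path(["A", "B", "C", "D"], links, min_bw=50))   # B-C too thin
print(is_safe_path(["A", "B", "A", "B"], links, min_bw=10))   # loop
print(is_safe_path(["A", "B", "C"], links, min_bw=30))        # ok
```

When the check fails, the flow would fall back to the Dijkstra route and the deciding agents would receive the negative reward.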
The beneficial effects of the invention are as follows:
1. Compared with traditional routing protocols, the invention significantly improves network throughput and average service delay.
2. The method adopts a distributed routing scheme and determines the key nodes at which agents are deployed from the network scale and aggregation degree; it scales well, can be deployed flexibly at different network sizes, and is easy to realize in engineering.
Drawings
Fig. 1 is an overall flowchart of a distributed scalable intelligent routing method based on key nodes in an embodiment of the present invention.
FIG. 2 is a flow chart of key node computation in an embodiment of the invention.
Fig. 3 is a flow chart of a distributed multi-agent routing scheme in an embodiment of the present invention.
Fig. 4 is a network topology diagram of a simulation experiment in an embodiment of the present invention.
Fig. 5 is a schematic diagram showing the comparison of service average time delays under the condition of different intelligent agent numbers in the simulation experiment of the present invention.
Detailed Description
The technical scheme of the invention is described in further detail below with reference to the accompanying drawings and a simulation experiment.
A distributed extensible intelligent routing method based on key nodes is shown in figure 1, and comprises the following steps:
(1) Acquire network link-state information in real time, and use the centrality-criticality method to compute, from the network state, the centrality criticality ρ_i of each node v_i in the network G = (V, E); where V denotes the node set, N the total number of nodes, and E the set of all edges in the network topology;
(2) Sort the nodes in descending order of centrality criticality and take the top-ranked nodes as the key nodes of the network topology;
(3) At each node, compute the shortest paths from the current node to all other destination nodes with Dijkstra's method, compute the next-hop neighbor for each destination, and form a routing table;
(4) Deploy a reinforcement-learning agent at each selected key node and start agent pre-training: randomly generate service-flow demands, have the agents recompute the next-hop route for service flows passing through the key nodes, normalize performance metrics such as service-flow delay and packet loss, determine the policy reward from a utility function, and adjust the agent model parameters;
(5) In the real service flow, the trained agent is utilized to perform route calculation at each key node, and the route is continuously learned in the route process, so that the intelligent route is realized.
By introducing network key nodes, the method reduces the number of agents needed to control the network; deploying agents at key-node positions improves the control efficiency of each agent, and local intelligence drives the whole network, forming an intelligent network capable of self-evolution.
Further, as shown in fig. 2, in step (1) the centrality criticality of a node is computed from its degree centrality, betweenness centrality, and closeness centrality. Degree centrality judges a node's importance by the number of its neighbors. Betweenness centrality captures the dependency among nodes: the larger a node's betweenness centrality, the greater its influence on the rest of the network. Closeness centrality, like betweenness, judges a node using whole-network characteristics: it measures a node's centrality as the reciprocal of the average distance from that node to all other nodes. The three centralities are computed as:

DC_i = k_i / (N − 1),  BC_i = Σ_{s≠i≠t} (g_st^i / g_st),  CC_i = (N − 1) / Σ_{j≠i} d_ij

where DC_i, BC_i, and CC_i are the degree, betweenness, and closeness centrality of node v_i, k_i is the number of edges incident to v_i, N is the total number of nodes, g_st is the number of shortest paths connecting v_s and v_t, g_st^i is the number of those shortest paths passing through v_i, and d_ij is the distance from v_i to v_j.
Different centrality measures gauge a node's influence on the whole network from different angles. The method considers all three centrality parameters and proposes a node centrality-criticality formula, computed for every node, where ρ_i denotes the centrality criticality of node v_i.
Further, in step (2) the parameter m is determined from the network scale and the aggregation coefficient. The aggregation coefficient measures network density: of two networks of the same scale, the one with the higher aggregation coefficient can be controlled through fewer intelligent nodes. The network aggregation coefficient is computed as:

C_i = 2E_i / (k_i (k_i − 1)),  C = (1/N) Σ_{i=1}^{N} C_i

where k_i is the number of neighbors of node v_i, E_i is the number of edges among the neighbors of v_i, C_i is the aggregation coefficient of node v_i, and C is the network average aggregation coefficient.
The calculation mode of the key node selection proportion parameter m is as follows:
further, as shown in fig. 3, the flow of the distributed agent routing scheme is shown in fig. 3, firstly, all nodes in step (3) firstly diffuse their own topological connection relationship, establish a topology table, after the network converges and stabilizes, each node calculates the distance to other target nodes and the next hop neighbor number through the topology table by using Dijkstra method to form a routing table, and meanwhile, the nodes interact with network state sensing data, and in the network operation, each node can sense and calculate the current network bandwidth, link load and traffic flow information according to the default value, the set value or the current real data.
Further, in step (4), during the agent pre-training stage, set the service type, service-flow demand, and service-flow size; randomly generate service-flow data and adjust the service-flow duration according to the network load level. For a given service flow, when its starting point (the current node) is not an agent, the next-hop node is determined directly from the routing table until an agent node is reached.
When an agent lies on a service flow's routing path, that agent decides the next-hop action. All agents share a global state space (S, A, P, R), where S is the set of current states of all agents, A = Π_{i∈N} A_i is the joint action space with {A_i}_{i∈N} the action space of agent i, P: S × A × S → [0, 1] is the state-transition probability, and R: S × A → ℝ is the reward function. The agent at each node makes a routing-action decision from the current network situation and service-flow information while improving the global reward.
Construct a state indicator K of each node with respect to a service flow: K_i = 1 if the service flow passes through node i, otherwise K_i = 0; when K_i = 1, node i is not used to plan the next hop. The decision of the agent at node i can be expressed as π_{θ_i}(a_i | s_i, c_i), where θ_i is its reinforcement-learning network parameter, s_i is the partial state of agent i (comprising service-flow information, a partial neighbor routing table, and available link bandwidth), and c_i is the condition state for the decision, composed of the state indicator K. To maximize the network transmission capacity, the global reward of the network should be maximized; with J(θ) the global objective function and θ the collection of per-agent reinforcement-learning parameters θ_i, θ is updated by computing the gradient of J(θ) as follows:

∇_θ J(θ) = E_{τ ~ p_θ(τ)} [ A(s, a) ∇_θ log π_θ(a | s) ]

where τ denotes state-action pairs (s, a) sampled from p_θ(τ) under π_θ, and the estimator A(s, a) estimates how much better action a is, in state s under policy π_θ, than a randomly taken action: when A(s, a) > 0, taking action a yields better results than a random action; π_θ(a | s) denotes the probability of taking action a given network parameters θ and agent state s.
Further, the policy-parameter gradient function of the agent at node i is:

∇_{θ_i} J(θ) = E [ A_i(s_i, a) ∇_{θ_i} log π_{θ_i}(a_i | s_i, c_i) ]

where A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when agent i is not on the routing path, A_i(s_i, a) = 0, and the agent does not update its policy parameters.
Given its partial state s_i and condition state c_i, the agent at node i outputs an action through the reinforcement-learning network, i.e., a probability distribution over next-hop neighbors for the service flow. The return of the action is computed as

R_t = Σ_{k=0}^{∞} γ^k r_{t+k}

where γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, can be expressed as

r_t = α_1 T̂_t − α_2 D̂_t − α_3 L̂_t

where T̂_t, D̂_t, and L̂_t denote the normalized throughput, delay, and packet loss of the t-th service flow; normalization divides each metric of the t-th flow by the average of that metric over flows of the same type and the same source/destination nodes; and α_1, α_2, α_3 are non-negative scalars weighting the importance of the throughput, delay, and packet-loss metrics.
Meanwhile, to reduce unsafe routing caused by random exploration during agent learning, the method introduces a decision-path detection mechanism: after a path decision involving agents is completed, it checks whether the path is unsafe, i.e., contains a loop or a link state that does not meet the service-flow requirements; if so, the agents receive a negative scalar reward for the decision as a penalty, and the path is recomputed with Dijkstra's method.
The following embodiment further illustrates the above method on a simulated network as shown in fig. 4.
(1) Read the simulated network topology information and the parameters of each node; compute each node's degree centrality, betweenness centrality, and closeness centrality; and determine each node's centrality criticality with the centrality-criticality formula. The results are shown in Table 1:
table 1 all nodes centrality calculation table in network example
As Table 1 shows, the nodes ordered by descending centrality criticality are: {V4, V6, V8, V7, V1, V3, V5, V9, V2, V10}.
(2) Compute the network aggregation coefficient from the network topology information (numbers of nodes and edges); the aggregation coefficient of each node in the network is shown in Table 2:
table 2 network node aggregation coefficients
From Table 2, the network average aggregation coefficient is 0.42. Applying the key-node-proportion formula gives a whole-network key-node proportion of 0.6; with 10 nodes in total, the number of key nodes is 6. Following the centrality-criticality ordering, the six most critical nodes {V4, V6, V8, V7, V1, V3} are selected as key nodes, i.e., as the deployment positions of the agents in the network.
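The key-node selection in this embodiment can be sketched as a top-⌈mN⌉ cut over the criticality ranking. The ρ values below are hypothetical placeholders (Table 1's actual values are not reproduced in this text), chosen only to be consistent with the ordering reported above:

```python
import math

def select_key_nodes(criticality, m):
    """Top ceil(m * N) nodes by centrality criticality."""
    k = math.ceil(m * len(criticality))
    ranked = sorted(criticality, key=criticality.get, reverse=True)
    return ranked[:k]

# Hypothetical criticality scores consistent with the ordering reported above
rho = {"V4": 0.95, "V6": 0.90, "V8": 0.85, "V7": 0.80, "V1": 0.75,
       "V3": 0.70, "V5": 0.60, "V9": 0.55, "V2": 0.50, "V10": 0.40}
print(select_key_nodes(rho, m=0.6))  # -> ['V4', 'V6', 'V8', 'V7', 'V1', 'V3']
```

With m = 0.6 and N = 10 this reproduces the six agent-deployment positions named in the embodiment.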
(3) Deploy the distributed agents at the key-node positions and let the network randomly generate whole-network service flows. In this embodiment the network load is set to 0.8, i.e., the total service flow is 0.8 times the total network bandwidth. The average delay of network service transmission is computed under three conditions: no agents, agents deployed at the key nodes, and agents deployed across the whole network, averaging once every 3000 service-flow delays; the three delay curves are shown in fig. 5. As fig. 5 shows, in the initial stage the agents are still learning, so the delay of the agent-deployed networks is larger than that of the network without agents; the delay then decreases gradually as learning proceeds, eventually stabilizing at an average delay below that of the network without agents, with key-node deployment performing similarly to whole-network deployment. The embodiment shows that deploying agents at key nodes effectively reduces the average network delay and improves transmission performance; the scheme scales well, can be deployed flexibly at different network sizes, is easy to realize in engineering, reduces cost, and achieves the expected purpose.
In summary, the invention realizes control of a complex network through a small number of agent nodes, effectively improves network throughput, and reduces the average service delay and the deployment cost without greatly changing the existing network environment; it has good scalability, can be deployed flexibly at different network scales, and is suitable for engineering implementation.
Claims (5)
1. A distributed scalable intelligent routing method based on key nodes, comprising the steps of:
(1) Network link state information is acquired in real time, and the centrality criticality ρ_i of each node v_i in the network G = (V, E) is calculated from the network state using a centrality criticality method, where V = {v_1, v_2, ..., v_N} represents the set of nodes, N represents the total number of nodes, and E represents the set of all edges in the network topology;
(2) According to the value of the centrality criticality, the nodes are arranged from big to small, and the nodes ranked at the front are used as network topology key nodes;
(3) Calculating the shortest paths from the current node to all other target nodes at the node by using a Dijkstra method to obtain next-hop neighbor nodes, and forming a routing table;
(4) Deploying an agent based on reinforcement learning at each selected key node and starting pre-training of the agents: randomly generating service flow demands, recalculating the next-hop route for service flows arriving at a key node through the agent, normalizing performance indexes such as service flow delay and packet loss, determining the policy reward value according to a utility function, and adjusting the agent model parameters;
(5) For real service flows, performing route calculation with the trained agents at each key node and continuing to learn during routing, thereby realizing intelligent routing.
2. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (1) is:
(101) Calculating the degree centrality DC_i, betweenness centrality BC_i and closeness centrality CC_i of each node in the network:

DC_i = k_i / (N - 1), BC_i = Σ_{s≠i≠t} g_st^i / g_st, CC_i = (N - 1) / Σ_j d_ij

wherein k_i represents the number of edges connected to node v_i, g_st represents the number of shortest paths connecting v_s and v_t, g_st^i represents the number of shortest paths that connect v_s and v_t and pass through node v_i, and d_ij represents the distance from node v_i to node v_j;
(102) Calculating the centrality criticality of all nodes according to a node centrality criticality calculation formula:
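As a hedged illustration of steps (101)-(102), the sketch below computes the three centralities from their standard definitions, which match the symbols defined in the claim. The combining formula for the centrality criticality appears only as an image in the original and is not reproduced here, so the equal-weight sum used below is an assumed stand-in, not the patented combination.

```python
# Sketch of steps (101)-(102): DC_i, BC_i, CC_i per their standard definitions,
# combined into a criticality score. The equal-weight sum is an illustrative
# assumption; the patented combining formula is not reproduced in the text.
from collections import deque
from itertools import combinations

def bfs(G, s):
    """Distances and shortest-path counts from source s in an unweighted graph."""
    dist, cnt, q = {s: 0}, {s: 1}, deque([s])
    while q:
        u = q.popleft()
        for v in G[u]:
            if v not in dist:
                dist[v], cnt[v] = dist[u] + 1, 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                cnt[v] += cnt[u]
    return dist, cnt

def centrality_criticality(G):
    N = len(G)
    info = {s: bfs(G, s) for s in G}
    rho = {}
    for i in G:
        dc = len(G[i]) / (N - 1)                 # DC_i = k_i / (N - 1)
        cc = (N - 1) / sum(info[i][0].values())  # CC_i = (N - 1) / sum_j d_ij
        bc = 0.0                                 # BC_i = sum g_st^i / g_st
        for s, t in combinations(G, 2):
            if i in (s, t):
                continue
            ds, ns = info[s]
            # count shortest s-t paths through i when d(s,i)+d(i,t) = d(s,t)
            if t in ns and ds.get(i, N) + info[i][0].get(t, N) == ds[t]:
                bc += ns[i] * info[i][1][t] / ns[t]
        rho[i] = dc + bc + cc  # assumed equal-weight combination
    return rho

# 5-node chain: the middle node (2) ranks most critical.
G = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rho = centrality_criticality(G)
```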
3. the key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (2) is:
(201) Calculating the network aggregation coefficient:

C_i = 2E_i / (k_i(k_i - 1)), C = (1/N) Σ_i C_i

wherein k_i represents the number of neighbors of node v_i, E_i represents the number of edges existing among the neighbors of node v_i, C_i represents the aggregation coefficient of node v_i, and C represents the average aggregation coefficient of the network;
(202) Determining a key node proportion m according to the network scale and the aggregation coefficient:
(203) And arranging the nodes according to the value of the centrality criticality from large to small, and determining the critical nodes according to the proportion m.
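A minimal sketch of step (201), assuming the standard local aggregation (clustering) coefficient C_i = 2E_i / (k_i(k_i - 1)). The formula of step (202) mapping the network scale and C to the proportion m is not reproduced in the text, so m would be supplied externally to the selection of step (203).

```python
# Sketch of step (201) with the standard local aggregation coefficient.
# E_i is taken as the number of edges actually present between the neighbors
# of node i, which is an assumption consistent with C_i = 2*E_i/(k_i*(k_i-1)).
def aggregation_coefficient(G, i):
    nbrs = G[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for a in nbrs for b in nbrs if a < b and b in G[a])  # E_i
    return 2 * e / (k * (k - 1))

def average_aggregation(G):
    return sum(aggregation_coefficient(G, i) for i in G) / len(G)

# Triangle 0-1-2 with a pendant node 3: C = (1 + 1 + 1/3 + 0) / 4
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```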
4. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (3) is:
(301) All nodes flood their own topology connection relations and establish a topology table;
(302) A routing table is formed by calculation on the topology table using the Dijkstra method, and network state perception data are exchanged.
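Steps (301)-(302) can be sketched as follows: a standard Dijkstra run over the topology table that records only the first hop toward each destination, which is exactly what the claimed routing table needs.

```python
# Sketch of step (3): Dijkstra from one node, keeping only the next-hop
# neighbor toward every destination, as the routing table requires.
import heapq

def next_hop_table(G, src):
    """G: {u: {v: weight}}. Returns {dst: first hop on a shortest src->dst path}."""
    dist, table = {src: 0}, {}
    pq = [(0, src, None)]  # (distance, node, first hop taken from src)
    while pq:
        d, u, hop = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u != src:
            table.setdefault(u, hop)
        for v, w in G[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v, v if u == src else hop))
    return table

# Three-node example: A reaches C more cheaply via B than directly.
G = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"A": 5, "B": 1}}
print(next_hop_table(G, "A"))  # {'B': 'B', 'C': 'B'}
```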
5. The key node-based distributed scalable intelligent routing method according to claim 1, wherein the specific manner of step (4) is:
(401) Setting service type, service flow demand and service flow size, randomly generating service flow data, adjusting service flow duration according to network load level, and simulating real service flow state;
(402) When an agent exists on the routing path of a service flow, the agent's decision determines the next-hop action. All agents share a global state space (S, A, P, R), wherein S is the set of current states of all agents, A = Π_{i∈N} A_i represents the joint action space of all agents, {A_i}_{i∈N} is the action space of agent i, P: S × A × S → [0, 1] is the state transition probability, and R: S × A → ℝ is the reward function; each agent makes a routing action decision according to the current network situation and the service flow information while improving the global reward;
(403) Constructing a state indicator variable K of each node relative to the service flow: if a service flow passes through node i, then K_i = 1, otherwise K_i = 0; when K_i is 1, node i will not be used to plan the next-hop route. The decision of the agent at node i is represented as a_i = π_{θ_i}(a_i | s_i, c_i), wherein θ_i is the reinforcement learning network parameter, s_i is the partial state of the agent at node i, comprising the service flow information, part of the neighbor routing table and the available link bandwidth, and c_i is the condition state for the agent decision at node i, composed of the state indicator variables K. To maximize the network transmission capacity, the global reward of the network should be maximized; J(θ) is the global objective function, θ denotes the reinforcement learning network parameters θ_i of all agents, and the gradient of J(θ) used to update θ is calculated by:

∇_θ J(θ) = E_{τ∼p_θ(τ)} [ A(s, a) ∇_θ log π_θ(a|s) ]
wherein τ represents state-action pairs (s, a) sampled from the trajectory distribution p_θ(τ) under parameters θ; the estimator A(s, a) is used to estimate, in state s, the extent to which the action a taken by policy π_θ is better than a randomly taken action; when A(s, a) > 0, taking action a under policy π_θ gives better results than a random action; π_θ(a|s) represents the policy of taking action a under network parameters θ and agent state s;
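A toy sketch of this estimator: a tabular softmax policy over next-hop neighbors updated with the advantage-weighted score function ∇J = E[A(s, a) ∇ log π_θ(a|s)]. The patent's agents are reinforcement learning networks; the tabular form and the class and parameter names here are illustrative assumptions showing only the shape of the update.

```python
# Illustrative tabular form of the advantage-weighted policy-gradient update
# in step (403). Class/parameter names are assumptions; the patented agents
# use reinforcement learning networks rather than a logit table.
import math, random

class SoftmaxPolicy:
    def __init__(self, n_actions):
        self.theta = [0.0] * n_actions  # one logit per next-hop neighbor

    def probs(self):
        m = max(self.theta)
        e = [math.exp(t - m) for t in self.theta]
        z = sum(e)
        return [x / z for x in e]

    def sample(self):
        return random.choices(range(len(self.theta)), weights=self.probs())[0]

    def update(self, a, advantage, lr=0.1):
        # grad log pi(a) wrt theta_k is (1[k==a] - pi_k), scaled by A(s, a)
        p = self.probs()
        for k in range(len(self.theta)):
            self.theta[k] += lr * advantage * ((1.0 if k == a else 0.0) - p[k])

pol = SoftmaxPolicy(3)
for _ in range(100):
    pol.update(0, advantage=1.0)  # repeated positive advantage for action 0
# action 0's probability rises toward 1 as the update accumulates
```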
(404) Computing the policy parameter gradient function of the agent at node i to update the reinforcement learning network parameters, the calculation formula being:

∇_{θ_i} J(θ_i) = E[ A_i(s_i, a) ∇_{θ_i} log π_{θ_i}(a_i | s_i, c_i) ]

wherein A_i(s_i, a) is a local unbiased estimate of the generalized advantage value A(s, a); when the agent at node i is not on the routing path, A_i(s_i, a) = 0, and the agent then does not update its policy parameters;
(405) Under the partial state s_i and the condition state c_i, the agent at node i outputs an action, namely the probability distribution over the next-hop neighbors of the service flow, through the reinforcement learning network; the reward function of the action is calculated as:

R_t = Σ_{k≥0} γ^k r_{t+k}

wherein γ ∈ [0, 1] is the reward discount factor and r_t, the utility value of the t-th service flow, is expressed as:
r_t = α_1·thpt_t − α_2·delay_t − α_3·loss_t

wherein thpt_t, delay_t and loss_t respectively represent the normalized throughput, delay and packet loss of the t-th service flow (normalization is computed by dividing each parameter of the t-th service flow by the average of that parameter over service flows of the same type and the same source/destination nodes), and α_1, α_2, α_3 are non-negative scalars representing the importance weights of the throughput, delay and packet loss performance indexes;
(406) After the decision on a path in which agents participate is finished, detecting whether the path is unsafe, i.e. contains loops or link states that do not meet the service flow requirements; if so, the agent receives a negative scalar reward as a penalty for the decision, and the path is recalculated using the Dijkstra method.
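The reward structure of step (405) can be sketched as below. The sign convention (rewarding throughput, penalizing delay and packet loss) and the function names are assumptions, since the exact utility formula appears only as an image in the original.

```python
# Hedged sketch of the utility and discounted reward of step (405). The sign
# structure (reward throughput, penalize delay and loss) is an assumption;
# normalization divides by the mean over same-class peer flows, per the claim.
def utility(flow, peers, a1=1.0, a2=1.0, a3=1.0):
    """flow: {'thpt','delay','loss'}; peers: flows of the same type and
    source/destination nodes used for normalization (assumed structure)."""
    def norm(key):
        mean = sum(p[key] for p in peers) / len(peers)
        return flow[key] / mean if mean else 0.0
    return a1 * norm("thpt") - a2 * norm("delay") - a3 * norm("loss")

def discounted_return(rewards, gamma=0.9):
    """R_t = sum_k gamma^k * r_{t+k}, the discounted reward of step (405)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```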
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310361356.7A CN116418730A (en) | 2023-04-06 | 2023-04-06 | Distributed extensible intelligent routing method based on key nodes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116418730A true CN116418730A (en) | 2023-07-11 |
Family
ID=87052670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310361356.7A Pending CN116418730A (en) | 2023-04-06 | 2023-04-06 | Distributed extensible intelligent routing method based on key nodes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116418730A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116737511A (en) * | 2023-08-10 | 2023-09-12 | 山景智能(北京)科技有限公司 | Graph-based scheduling job monitoring method and device
CN117319287A (en) * | 2023-11-27 | 2023-12-29 | 之江实验室 | Network extensible routing method and system based on multi-agent reinforcement learning
CN117319287B (en) * | 2023-11-27 | 2024-04-05 | 之江实验室 | Network extensible routing method and system based on multi-agent reinforcement learning
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||