CN117041129A - Low-orbit satellite network flow routing method based on multi-agent reinforcement learning - Google Patents

Low-orbit satellite network flow routing method based on multi-agent reinforcement learning

Info

Publication number
CN117041129A
CN117041129A
Authority
CN
China
Prior art keywords
satellite
delay
routing
network
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311071886.4A
Other languages
Chinese (zh)
Inventor
赖俊宇
刘华烁
徐国尧
朱俊宏
甘炼强
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202311071886.4A
Publication of CN117041129A
Legal status: Pending


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/02Topology update or discovery
    • H04L45/08Learning-based routing, e.g. using neural networks or artificial intelligence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/18521Systems of inter linked satellites, i.e. inter satellite service
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/38Flow based routing


Abstract

The invention discloses a low-orbit satellite network flow routing method based on multi-agent reinforcement learning, belonging to the technical field of computer networks and communication. The method combines reinforcement learning with data-flow-based routing. It effectively reduces the number of deep neural network model inferences performed on data packets in a low-orbit satellite broadband network, significantly reducing the cumulative time spent on model inference. It also effectively improves the routing performance of large-scale low-orbit satellite broadband networks, better meeting their network performance requirements.

Description

Low-orbit satellite network flow routing method based on multi-agent reinforcement learning
Technical Field
The invention belongs to the technical field of computer networks and communication, and particularly relates to a flow-based routing method based on Multi-Agent Deep Reinforcement Learning (MADRL) in low-orbit satellite networks.
Background
In recent years, with the rapid growth of ubiquitous human communication demands and the continuous emergence of innovative applications, large-scale Low Earth Orbit (LEO) satellite networks, such as the Starlink constellation proposed by SpaceX, have become research hotspots in industry and academia. The Low-orbit Satellite Broadband Network (LEO Satellite Broadband Network, LSBN) is widely regarded as an important complement to future terrestrial networks and will play a key role in the upcoming sixth-generation (6G) mobile communication system. Compared with traditional high-orbit satellite networks, low-orbit satellite broadband networks offer seamless coverage of the earth's surface, low point-to-point communication delay, and low communication transmission power consumption. However, the high dynamics and mobility of low-orbit satellites result in intermittent link connections and a dynamic network topology, which makes conventional routing algorithms designed for terrestrial networks unsuitable for direct use in large-scale low-orbit satellite broadband networks.
On the other hand, artificial intelligence techniques based on Deep Reinforcement Learning (DRL) are finding increasing application in many scientific fields. Researchers have used deep reinforcement learning to implement packet routing and switch forwarding in conventional terrestrial networks, and academia has recently begun to study low-orbit satellite broadband network routing methods based on deep reinforcement learning. Preliminary experimental evaluations show that routing methods based on deep reinforcement learning can outperform traditional routing algorithms in low-orbit satellite broadband networks. However, most existing studies simply assume that the routing decision for a packet can be made and completed immediately after a router receives it. This assumption is over-idealized: it ignores the time required for Deep Neural Network (DNN) model inference when making a packet routing decision in an actual network environment. Given the limited computational resources on low-orbit satellites, the time required for deep neural network model inference cannot be neglected. It increases the transmission delay of data packets in the network, raises the packet loss rate, and ultimately limits the throughput of network traffic in the low-orbit satellite broadband network. Ignoring the deep neural network model inference time therefore threatens the correctness of the conclusions drawn by these prior research efforts.
Disclosure of Invention
In order to eliminate the negative effect of deep neural network model inference time on routing performance, the invention provides a flow-based routing method based on Multi-Agent Deep Reinforcement Learning (MADRL), which makes routing decisions for network data flows rather than for each individual data packet. Flow routing is formalized as a multi-agent decision problem based on a Partially Observable Markov Decision Process (POMDP). Each low-orbit satellite acts as an agent that can forward a network data flow to one of its neighboring satellites according to its own policy. It is emphasized that the deep neural network model on an agent performs inference only when it routes the first packet of a particular data flow; subsequent packets of that flow are forwarded according to the same routing decision as the first packet. Because the topology dynamics of a low-orbit satellite broadband network can cause routing failures and thus degrade flow routing performance, the invention further provides an adaptive data flow route updating method that automatically updates routing decisions and adapts to the dynamically changing network topology, enhancing the performance of the proposed flow routing method.
The technical scheme adopted by the invention is as follows: a low-orbit satellite network flow routing method based on multi-agent reinforcement learning, the method comprising:
a1, constructing a low-orbit constellation broadband network distributed inter-satellite routing model;
Firstly, a low-orbit constellation network routing model is constructed; the model covers key elements such as inter-satellite communication links, satellite motion trajectories, constellation network topology, and user distribution; through in-depth analysis of the architecture and characteristics of the target system, an accurate low-orbit constellation network routing model is constructed;
A low-orbit satellite is denoted $Sat_i$, $i \in \{1, 2, \ldots, Total\}$, where $Total$ represents the total number of low-orbit satellites. Each satellite is assumed to establish n inter-satellite links to communicate with its neighboring satellites: links to the satellites in front of and behind it in the same orbit, and to the satellites in the adjacent orbits on its left and right. $link_{i,j}$ represents the inter-satellite link from $Sat_i$ to $Sat_j$, where i denotes the number of the transmitting satellite and j the number of the receiving satellite;
When a low-orbit satellite receives a data packet, it selects a next-hop satellite according to the routing algorithm running on it, and forwards the data packet to the next-hop satellite through an inter-satellite link. This process introduces delays of two kinds: decision delay and forwarding delay. Decision delay refers to the time from receiving a data packet to making a routing decision; forwarding delay refers to the time from making a routing decision to the next-hop satellite receiving the data packet. Specifically, for data packet k, the decision delay $D^{dec}_{i,k}$ comprises two parts: the decision queuing delay $D^{dq}_{i,k}$ and the decision-making delay $D^{dm}_{i,k}$. Decision queuing delay is the time a packet queues on a low-orbit satellite waiting for a routing decision, while decision-making delay is the time the satellite needs to make the routing decision. In the packet forwarding process, the forwarding delay $D^{fwd}_{i,j,k}$ comprises several parts: forwarding queuing delay $D^{fq}_{i,j,k}$, transmission delay $D^{tr}_{i,j,k}$, and propagation delay $D^{pp}_{i,j}$. Forwarding queuing delay is the time a data packet waits in a low-orbit satellite's forwarding queue; transmission delay is the time needed to transmit the data packet over the inter-satellite link; propagation delay is the time the signal needs to travel along the inter-satellite link from one satellite to the other.

If bandwidth $B_{i,j,k}$ is allocated on $link_{i,j}$ for transmitting data packet k, the transmission delay on the link is

$$D^{tr}_{i,j,k} = \frac{S_k}{B_{i,j,k}}$$

where $S_k$ is the size of packet k. If $link_{i,j}$ temporarily has no free bandwidth, packet k is buffered in the forwarding queue cache of $link_{i,j}$, which introduces a forwarding queuing delay $D^{fq}_{i,j,k}$; when the cache reaches its maximum capacity, subsequently arriving data packets are discarded. On the other hand, assume that at time t the coordinates of $Sat_i$ and $Sat_j$ are $(x_{i,t}, y_{i,t}, z_{i,t})$ and $(x_{j,t}, y_{j,t}, z_{j,t})$. The spatial distance between the two satellites is

$$Dist_{i,j,t} = \sqrt{(x_{i,t}-x_{j,t})^2 + (y_{i,t}-y_{j,t})^2 + (z_{i,t}-z_{j,t})^2}$$

Taking $Dist_{i,j,t}$ as the propagation distance of $link_{i,j}$, the signal propagation delay is

$$D^{pp}_{i,j} = \frac{Dist_{i,j,t}}{c}$$

where c is the speed of light in vacuum. The total delay $D_{i,k}$ of routing packet k on low-orbit satellite $Sat_i$ is therefore

$$D_{i,k} = D^{dec}_{i,k} + D^{fwd}_{i,j,k} = D^{dq}_{i,k} + D^{dm}_{i,k} + D^{fq}_{i,j,k} + D^{tr}_{i,j,k} + D^{pp}_{i,j}$$
If the next-hop low-orbit satellite is not the target node, the above procedure will be performed again on the next-hop low-orbit satellite;
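The per-hop delay model above can be sketched in a few lines of Python (an illustrative sketch, not part of the claimed method; the function names and unit choices are assumptions):

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def transmission_delay(packet_bits: float, bandwidth_bps: float) -> float:
    # D_tr = S_k / B_{i,j,k}: time to push the packet onto the inter-satellite link
    return packet_bits / bandwidth_bps

def propagation_delay(pos_i, pos_j) -> float:
    # D_pp = Dist_{i,j,t} / c, with Dist the Euclidean distance (in metres)
    # between the two satellites' coordinates at time t
    return math.dist(pos_i, pos_j) / C

def total_hop_delay(dec_queue, dec_make, fwd_queue, t_trans, t_prop) -> float:
    # D_{i,k} = decision delay (queuing + making)
    #         + forwarding delay (queuing + transmission + propagation)
    return dec_queue + dec_make + fwd_queue + t_trans + t_prop
```

For example, an 8 Mbit packet on a 1 Gbps link costs 8 ms of transmission delay, and a hop spanning one light-second of distance costs 1 s of propagation delay; the total per-hop delay is just the sum of the five components.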
A2: modeling the routing problem as a partially observable Markov decision process;
The routing performance optimization problem of the low-orbit constellation broadband network is converted into a partially observable Markov decision process, which better describes the uncertainty and randomness of the system and effectively handles the complex decision problem; the process P is described by the following 6-tuple:
P = (S, A, T, R, O, γ)
where S is the global state space of the environment, A is the action set shared by the agents, T is the state transition function of the environment, $R: S \times A \to \mathbb{R}$ is the global reward function shared by the agents, O is the local observation state space of the agents, and $\gamma \in [0,1]$ is the discount factor balancing long-term rewards; the local observation state, actions and reward function are defined more specifically as follows:
Actions: after an agent receives a data packet, it must make a routing decision for it; the agent selects an action from the action space $A_i = \{a^{up}_i, a^{down}_i, a^{left}_i, a^{right}_i\}$ to route the data packet, where $a^{up}_i$, $a^{down}_i$, $a^{left}_i$ and $a^{right}_i$ each represent delivering the data packet to one of its four adjacent satellites;
Reward function: the goal of each agent is to learn an optimal routing strategy to improve its routing performance; to ensure that each agent learns optimal routing decisions, the reward $r_i(t)$ for $Sat_i$ routing data packet k at time t is

$$r_i(t) = \begin{cases} -\psi, & \text{if packet } k \text{ is lost} \\ -\left(\kappa_1 \cdot dis_{j,k} + \kappa_2 \cdot \hat{D}^{fwd}_k + \kappa_3 \cdot \hat{D}^{dec}_{j,k}\right), & \text{otherwise} \end{cases}$$

where ψ is the penalty imposed on the agent when the data packet is lost, $dis_{j,k}$ is the normalized spatial distance between the next-hop satellite $Sat_j$ and the target satellite, $\hat{D}^{fwd}_k$ is the normalized forwarding delay of packet k, and $\hat{D}^{dec}_{j,k}$ is the normalized decision delay of routing packet k on $Sat_j$; $\kappa_1$, $\kappa_2$ and $\kappa_3$ are weights balancing these factors, and the cumulative discounted reward is computed as $\sum_t \gamma^t r_i(t)$, with $\gamma \in [0,1]$ the discount factor;
Local observation state: in a low-orbit satellite broadband network, each low-orbit satellite acts as an agent; each satellite can communicate with the four adjacent satellites above, below, to its left and to its right, and its local observation state space is defined as $o_i = \{Dis_i, B_i, L^{fwd}_i, L^{dec}_i\}$, where $Dis_i$ contains the spatial distances from satellite $Sat_i$'s four adjacent satellites to the target satellite of the current data packet k; the invention uses the Simplified General Perturbations (SGP4) model to estimate the spatial positions of the adjacent satellites and the target satellite; $B_i$ is the available network bandwidth of the four inter-satellite links connecting $Sat_i$; $L^{fwd}_i$ is the current traffic load of $Sat_i$'s four forwarding queues, and $L^{dec}_i$ is the load of the decision queues on the four adjacent satellites; because these elements have different value ranges, they are normalized before use;
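The reward structure above can be sketched as follows (an illustrative sketch; the exact weighting form is reconstructed from the surrounding definitions, and the default weight values are assumptions, not taken from the patent):

```python
def reward(lost: bool, dis_next: float, fwd_delay: float, dec_delay: float,
           k1: float = 1.0, k2: float = 1.0, k3: float = 1.0,
           psi: float = 10.0) -> float:
    # Packet loss incurs the fixed penalty psi; otherwise the agent pays a
    # weighted cost over the normalized next-hop distance-to-target,
    # forwarding delay and decision delay (inputs assumed already in [0, 1]).
    if lost:
        return -psi
    return -(k1 * dis_next + k2 * fwd_delay + k3 * dec_delay)
```

A next hop that is closer to the target and less congested yields a reward closer to zero, so maximizing cumulative discounted reward pushes each agent toward short, low-delay paths.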
a3: designing a routing method based on multi-agent deep reinforcement learning;
By utilizing deep reinforcement learning, the agents cooperate and learn, continuously obtaining feedback and rewards from the low-orbit constellation network environment, and the inter-satellite routing strategy is optimized to improve the routing performance and throughput of the whole network. Each satellite contains two deep neural networks: an estimated Q network $Q_i(o_i, a_i; \mu_i)$ and a target Q network $Q'_i(o_i, a_i; \mu'_i)$, parameterized by $\mu_i$ and $\mu'_i$ respectively. At each decision time t, satellite $Sat_i$ considers its local observation $o_i(t)$ and selects an action $a_i(t)$ from the action space $A_i$ based on an ε-greedy policy:

$$a_i(t) = \begin{cases} \text{a random action from } A_i, & \text{with probability } \varepsilon \\ \arg\max_{a} Q_i(o_i(t), a; \mu_i), & \text{with probability } 1-\varepsilon \end{cases}$$

When the agent selects an action based on its current observation and interacts with the environment, the current state transitions to the next state $o_i(t+1)$, and $Sat_i$ receives a reward $r_i(t)$. The experience tuple $\{o_i(t), o_i(t+1), a_i(t), r_i(t)\}$ is recorded in an experience replay pool RB, a mechanism that breaks the correlation of training data and thereby improves the reinforcement learning training process. The agent randomly samples a batch of experience tuples from RB and uses them to train and update the estimated Q network. In each iteration, the target Q network is used to compute, for each state-action pair $(o_i(t), a_i(t))$, a fixed target Q value $y_i(t)$, obtained from the maximum Q value over all actions in the next state $o_i(t+1)$ using the target network parameters $\mu'_i$:

$$y_i(t) = r_i(t) + \gamma \max_{a'} Q'_i(o_i(t+1), a'; \mu'_i)$$

where γ is the discount factor determining the importance of future rewards and $r_i(t)$ is the immediate reward for the state-action pair $(o_i(t), a_i(t))$. The loss function is:

$$Loss_i(t) = \left(y_i(t) - Q_i(o_i(t), a_i(t); \mu_i)\right)^2$$
The parameters $\mu_i$ of the estimated Q network are updated by minimizing, via stochastic gradient descent, the mean squared error between the estimated Q value and the target Q value; at the end of each training iteration, the parameters $\mu'_i$ of the target Q network are soft-updated from the estimated Q network:

$$\mu'_i \leftarrow \alpha \mu_i + (1-\alpha)\mu'_i$$

where α is the update rate. After each iteration the target Q network parameters thus slowly track the estimated Q network, and gradually the estimated Q network evaluates the agent's data packet routing decisions more accurately;
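The core update rules of this training loop (TD target, squared loss, soft target update) can be sketched with scalar parameters (an illustrative sketch with hypothetical function names; real Q networks operate on full parameter tensors):

```python
def td_target(r: float, next_q_values, gamma: float = 0.9,
              terminal: bool = False) -> float:
    # y_i(t) = r_i(t) + gamma * max_a' Q'_i(o_i(t+1), a'; mu'_i)
    return r if terminal else r + gamma * max(next_q_values)

def td_loss(y: float, q: float) -> float:
    # Loss_i(t) = (y_i(t) - Q_i(o_i(t), a_i(t); mu_i))^2
    return (y - q) ** 2

def soft_update(target_params, online_params, alpha: float = 0.01):
    # mu' <- alpha * mu + (1 - alpha) * mu', applied element-wise
    return [alpha * mu + (1.0 - alpha) * mu_t
            for mu, mu_t in zip(online_params, target_params)]
```

With a small α the target network moves only a fraction of the way toward the online network each iteration, which keeps the fixed target $y_i(t)$ stable between updates.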
a4: defining a data stream in a low orbit satellite constellation network distributed routing scene;
With the accelerating construction of new-generation low-orbit satellite broadband networks and the rapid growth in the number of users, satellite network communication traffic keeps increasing, and networks continue to develop toward high throughput and broadband; the throughput requirement of a single satellite exceeds hundreds of Gbps. When a MADRL algorithm is deployed on low-orbit satellites for distributed routing and forwarding, the inherent inference delay of the neural network model severely limits single-satellite throughput and greatly increases the packet loss rate, so the high-bandwidth, low-delay transmission requirements of new-generation low-orbit constellation broadband networks cannot be met; in order to fully optimize the MADRL-based distributed routing scheme, the invention proposes a data-flow-based routing scheme;
A data flow (Flow) refers to an ordered sequence of data packets with the same source node and destination node. In a low-orbit satellite broadband network scenario, the present invention focuses on inter-satellite routing of data packets. Thus, low-orbit satellite nodes are taken as the start and end points of a data flow, regardless of which ground user nodes transmit and receive the packets in the sequence. Since MADRL is a distributed algorithm, each low-orbit satellite node is an independent agent; after receiving a data packet, the satellite node must determine the data flow to which the packet belongs from the network port on which the packet was received and the destination address to which the packet is being sent. The definition of a data flow in the low-orbit satellite network scenario is therefore: an ordered set of packets received from the same ingress port on a low-orbit satellite and destined for the same satellite is defined as the same data flow.
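The flow definition above (same ingress port, same destination satellite) can be sketched as a classification key (an illustrative sketch; the packet field names are assumptions, not from the patent):

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    # Per the definition: packets arriving on the same ingress port and
    # destined for the same satellite belong to the same data flow.
    ingress_port: int
    dst_satellite: int

def classify(packet: dict) -> FlowKey:
    # Map a received packet onto its flow; all packets sharing this key
    # reuse a single routing decision.
    return FlowKey(packet["ingress_port"], packet["dst_satellite"])
```

Packets that differ only in payload map to the same key, while a packet arriving on a different port starts a different flow.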
A5: providing a data-flow-based routing policy sharing mechanism
The invention provides a flow routing method that effectively reduces the negative influence of deep neural network inference time on the routing performance of a low-orbit satellite network. The method organizes data packets into flows, considers factors such as the characteristics, delay requirements and bandwidth requirements of the data flows, and selects optimal routing paths and resource allocation strategies between low-orbit satellites through learning and decision-making on the agents, so as to optimize network routing performance to the greatest extent.
Considering that all data packets in the same data flow have the same destination address, each low-orbit satellite independently maintains a "flow routing table" for all data flows passing through it. As in the traditional MADRL packet routing method, the low-orbit satellite uses a deep neural network model to make the routing decision for the 1st data packet of a data flow. The resulting routing information is stored as a corresponding entry in the flow routing table and reused for subsequent data packets of the same data flow, eliminating the need to perform deep neural network model inference for those packets and significantly reducing the cumulative time spent on deep neural network model inference in the low-orbit satellite broadband network. The routing performance (including end-to-end transmission delay, packet loss rate, and network throughput) is thereby significantly optimized to meet the performance requirements of large-scale low-orbit satellite broadband networks.
A6, designing an adaptive flow route updating method based on delay jitter
The invention introduces an adaptive data flow route updating method, which automatically updates data flow routing decisions by monitoring the variation of inter-satellite packet transmission delay in real time. Specifically, the delay difference between two consecutive successfully transmitted data packets is computed and compared with a preset threshold. If the delay difference (delay jitter) exceeds the threshold, the multi-agent reinforcement learning algorithm is triggered to re-route the flow and update the routing decision. The implementation of this mechanism is completely model-free, requiring no complex network model. When a satellite finishes forwarding a data packet, it perceives and records the delay of that packet over this hop, takes the difference between the delays $Delay_{i+1}$ and $Delay_i$ of two consecutive data packets belonging to the same data flow, and computes the delay variation $\Delta Delay_{i+1}$:

$$\Delta Delay_{i+1} = |Delay_{i+1} - Delay_i|$$

Based on the delay jitter $\Delta Delay_{i+1}$, the applicability of the routing strategy stored in the routing table for this data flow under the current network state is judged. If $\Delta Delay_{i+1}$ exceeds a set threshold $\theta_{thr}$, the current routing path may have a performance problem or abnormal condition; the deep neural network model is then invoked again when forwarding the next data packet of this data flow, the routing strategy output by inference is executed to forward the packet, and the old strategy in the routing table is replaced, thereby completing both the forwarding of the data packet and the update of the data flow's routing strategy.
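The jitter-triggered update can be sketched as a per-flow monitor (an illustrative sketch; class and method names are assumptions):

```python
class FlowDelayMonitor:
    """Tracks per-hop delays of consecutive packets of one data flow and
    signals when delay jitter exceeds the threshold theta_thr."""

    def __init__(self, theta_thr: float):
        self.theta_thr = theta_thr
        self.last_delay = None

    def observe(self, delay: float) -> bool:
        # Return True when |Delay_{i+1} - Delay_i| > theta_thr, i.e. the
        # flow's cached route should be re-computed by the DNN model.
        trigger = (self.last_delay is not None
                   and abs(delay - self.last_delay) > self.theta_thr)
        self.last_delay = delay
        return trigger
```

Small fluctuations below $\theta_{thr}$ leave the cached route untouched; a jump in per-hop delay flags the flow for re-inference on its next packet.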
Compared with prior network routing technologies, the invention combines reinforcement learning with data-flow-based routing. It effectively reduces the number of deep neural network model inferences performed on data packets in a low-orbit satellite broadband network and significantly reduces the cumulative time spent on model inference, thereby effectively improving the routing performance of large-scale low-orbit satellite broadband networks and better meeting their network performance requirements.
Drawings
Fig. 1 is a schematic diagram of a fully distributed routing framework in a low-orbit constellation network in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a flow routing method based on multi-agent reinforcement learning in an example of the present invention;
FIG. 3 is a graph of end-to-end delay test results for different numbers of ground users in an example of the present invention;
fig. 4 is a graph of packet loss rate test results for different numbers of ground users in an example of the present invention;
fig. 5 is a graph of network throughput test results for different numbers of ground users in an example of the invention.
Description of the embodiments
The following is a detailed description of specific embodiments of the invention in connection with the accompanying drawings and specific examples. The following specific examples are given for the purpose of illustration only and are not intended to limit the scope of the invention. The specific implementation of the invention is divided into two stages: 1) stage one: generating data on a simulation platform to train the deep reinforcement learning model; 2) stage two: deploying the trained deep reinforcement learning model in a real system to execute routing decisions.
Stage one: training phase
Step 1: constructing low-orbit constellation broadband network distributed inter-satellite routing model
The invention adopts the classical Iridium constellation configuration as the target network topology: the low-orbit satellite network comprises $N_{orbit} = 6$ orbits, each containing $N_{Sat\_orbit} = 11$ evenly distributed low-orbit satellites. Specific parameter values for the Iridium network topology are shown in Table 1.
Table 1: Iridium network topology parameter values

| Parameter name | Symbol | Value |
| --- | --- | --- |
| Number of orbits | N_orbit | 6 |
| Number of satellites per orbit | N_Sat_orbit | 11 |
| Orbit height | h_orbit | 780 km |
| Satellite velocity | v_sat | 7.46 km/s |
| Longitude difference between co-rotating orbits | β | 31.6° |
| Longitude difference between counter-rotating orbits | α | 22° |
| Orbit semi-major axis | r_a | 7185 km |
| Orbit eccentricity | e | 0 |
| Argument of perigee | ω | — |
| Orbit inclination | i | 86.4° |
In this network, a low-orbit satellite is denoted $Sat_i$, $i \in \{1, 2, \ldots, Total\}$ ($Total$ represents the total number of LEO satellites). Each satellite may establish four Inter-Satellite Links (ISLs) for communication with the four low-orbit satellites adjacent to it. These links connect, respectively, the satellites on the upper and lower sides in the same orbit, and the satellites on the left and right sides in the two adjacent orbits. $link_{i,j}$ represents the inter-satellite link from $Sat_i$ to $Sat_j$, where i denotes the number of the transmitting satellite and j the number of the receiving satellite.
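The four-neighbour grid topology described above can be sketched as follows (an illustrative sketch using the Table 1 dimensions; the treatment of the outermost orbits as having no cross-seam link is a simplifying assumption, not a claim of the patent):

```python
N_ORBIT, N_SAT_ORBIT = 6, 11  # Iridium-like dimensions from Table 1

def neighbors(orbit: int, slot: int) -> dict:
    # Front/back neighbours wrap around the same orbital ring; left/right
    # are the same slot in the adjacent orbits. Edge orbits get no
    # cross-seam link in this simplified sketch.
    nbrs = {
        "front": (orbit, (slot + 1) % N_SAT_ORBIT),
        "back": (orbit, (slot - 1) % N_SAT_ORBIT),
    }
    if orbit > 0:
        nbrs["left"] = (orbit - 1, slot)
    if orbit < N_ORBIT - 1:
        nbrs["right"] = (orbit + 1, slot)
    return nbrs
```

Interior satellites thus have exactly four ISL neighbours, matching the action space $A_i$ of the MADRL agents.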
Step 2: constructing a ground user distribution model and generating a communication request according to a user behavior model
The present invention uses GPW4 (Gridded Population of the World, version 4) as the data source for building the ground user distribution model. GPW4 is a global gridded population dataset developed by the Center for International Earth Science Information Network (CIESIN) at Columbia University. The dataset provides gridded data on global population count, density and distribution, based on a variety of sources including national censuses, remote sensing data, and land-use data. The invention selects an appropriate resolution and data format to process the GPW4 dataset, then divides the ground into M contiguous regions with non-uniform user distribution according to the global population count and distribution information provided by GPW4; the user positions within each region are uniformly distributed, with probability density function

$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$

where a and b are the boundaries of the region.
The invention adopts a probabilistic statistical model to represent user communication requests over a period of time: all users are assumed to behave independently and periodically send data packets to their access satellite, and the interval between two adjacent tasks of a single user follows a negative exponential distribution with probability density function

$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0$$

where $1/\lambda$ is the expected value of a single user's request interval.
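Exponentially distributed inter-arrival times of this kind can be generated directly with the standard library (an illustrative sketch; the function name and seed are assumptions):

```python
import random

def request_times(n: int, mean_interval: float, seed: int = 0) -> list:
    # Inter-arrival times drawn from the negative exponential distribution
    # f(t) = lambda * exp(-lambda * t) with lambda = 1 / mean_interval,
    # yielding a Poisson stream of communication requests for one user.
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_interval)
        times.append(t)
    return times
```

Over many requests the empirical mean interval converges to the configured expectation, as the law of large numbers predicts.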
Step 3: the data packet is sent to a satellite, and the satellite acquires the local state information observed quantity
In a low-orbit satellite constellation network with large spatial scale, a centralized control node can hardly obtain the global network state in time to make real-time routing decisions; the satellites are therefore defined as mutually independent agents that make data packet routing decisions based only on local observation information. For each satellite, after receiving a data packet, its local observation state space is defined as $o_i = \{Dis_i, B_i, L^{fwd}_i, L^{dec}_i\}$, where $Dis_i$ contains the spatial distances from satellite $Sat_i$'s four adjacent satellites to the target satellite of the current data packet k. The present invention uses the existing SGP4 model to estimate the spatial positions of the adjacent satellites and the target satellite. $B_i$ is the available network bandwidth of the four inter-satellite links connecting $Sat_i$. $L^{fwd}_i$ is the current traffic load of $Sat_i$'s four forwarding queues, and $L^{dec}_i$ is the load of the decision queues on the four adjacent satellites. Since these elements have different value ranges, they are normalized using min-max normalization:

$$\hat{x} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
step 4: satellite relies on the local state observed quantity obtained in the step 3, utilizes a strategy network in a deep reinforcement learning model to select the optimal action, and executes the routing decision of the data packet
After each satellite agent receives a data packet and obtains its local observation, it must make a routing decision for the packet. The agent selects one action from the action space $A_i = \{a^{up}_i, a^{down}_i, a^{left}_i, a^{right}_i\}$ to route the data packet, where $a^{up}_i$, $a^{down}_i$, $a^{left}_i$ and $a^{right}_i$ each represent transferring the data packet to one of the four adjacent satellites as the next hop.
In the training stage, each routing-strategy selection by the agent falls into one of two cases, exploration or exploitation, traded off probabilistically with an ε-greedy algorithm: the agent explores randomly with probability ε and uses the current optimal strategy with probability 1−ε, which to some extent allows training samples to be collected more broadly.
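The ε-greedy trade-off can be sketched in a few lines (an illustrative sketch; the function name is an assumption):

```python
import random

def epsilon_greedy(q_values, epsilon: float, rng=None) -> int:
    # With probability epsilon explore (uniform random action index),
    # otherwise exploit the action with the highest estimated Q value.
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With ε = 0 the choice is purely greedy; with ε = 1 it is purely random, and training typically anneals ε between those extremes.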
Step 5: calculating low-orbit constellation broadband network node data packet routing delay
In this step, when a low-orbit satellite receives a data packet, it selects the next-hop satellite according to the routing decision obtained in step 4 and forwards the data packet to that satellite through an inter-satellite link. This routing process incurs a certain delay, comprising a decision delay and a forwarding delay. The decision delay refers to the time from receiving a data packet to making a routing decision, and the forwarding delay refers to the time from making the routing decision to the next-hop satellite receiving the data packet.
Specifically, for a data packet k routed over the low-orbit satellite broadband network, the decision delay d_dec^k comprises two parts: the decision queuing delay d_dq^k and the decision-making delay d_dm^k. The decision queuing delay refers to the time a packet waits for a routing decision in the satellite, while the decision-making delay refers to the time the satellite needs to compute the routing decision. In the packet forwarding process, the forwarding delay d_fwd^k likewise comprises several parts: the forwarding queuing delay d_fq^k, the transmission delay d_trans^k, and the propagation delay d_prop^k. The forwarding queuing delay refers to the time the packet waits to be forwarded in the satellite, the transmission delay refers to the time required for the packet to be transmitted over the inter-satellite link, and the propagation delay refers to the time the packet takes to travel along the inter-satellite link from one satellite to the other.
If bandwidth B_{i,j}^k is allocated on link_{i,j} for transmitting data packet k, the transmission delay on the link is calculated by the formula

d_trans^k = S_k / B_{i,j}^k

where S_k is the size of packet k. If link_{i,j} temporarily has no free bandwidth, packet k is buffered in link_{i,j}'s forwarding queue buffer, which introduces a forwarding queuing delay d_fq^k; when the buffer reaches maximum capacity, subsequent packets are discarded. On the other hand, assume that at time t the positions of Sat_i and Sat_j are (x_{i,t}, y_{i,t}, z_{i,t}) and (x_{j,t}, y_{j,t}, z_{j,t}). The spatial distance between these two satellites is calculated by the formula

D_{i,j} = sqrt((x_{i,t} − x_{j,t})² + (y_{i,t} − y_{j,t})² + (z_{i,t} − z_{j,t})²)
Given the propagation distance D_{i,j} of link_{i,j}, the signal propagation delay is calculated by the formula

d_prop = D_{i,j} / c

where c is the speed of light in vacuum.
In summary, the total delay of packet k at Sat_i is

d_total^k = d_dq^k + d_dm^k + d_fq^k + d_trans^k + d_prop^k

If the next-hop satellite is not the target node, the above procedure is performed again on the next-hop satellite.
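The per-hop delay components of this step can be computed as below; the function names are illustrative, and the formulas follow the definitions above (d_trans = S_k / B, d_prop = D / c, total delay as the sum of the five parts):

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def transmission_delay(packet_bits, bandwidth_bps):
    """d_trans = S_k / B_{i,j}: time to push the packet onto the link."""
    return packet_bits / bandwidth_bps

def propagation_delay(pos_i, pos_j):
    """Euclidean distance D_{i,j} between the two satellites, then D / c."""
    return math.dist(pos_i, pos_j) / C_LIGHT

def total_delay(d_dec_queue, d_dec_make, d_fwd_queue, d_trans, d_prop):
    """Per-hop total: decision delay (queuing + making) plus forwarding delay
    (queuing + transmission + propagation)."""
    return d_dec_queue + d_dec_make + d_fwd_queue + d_trans + d_prop
```

For example, an 8000-bit packet on a 1 Mbit/s link takes 8 ms of transmission delay, and one light-second of inter-satellite distance adds exactly 1 s of propagation delay.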
Step 6: calculating rewards value of intelligent agent for route decision
In this step, if the current data packet is forwarded to a neighboring satellite, a corresponding reward is given to the agent according to the delay calculated in step 5. The goal of each agent is to learn an optimal routing strategy that improves routing performance. To ensure that each agent (i.e., satellite) learns an optimal routing decision, the reward function of Sat_i for routing data packet k at time t is defined as

r_{i,k}(t) = −ψ if packet k is lost, and r_{i,k}(t) = −(κ_1·dis_{j,k} + κ_2·d_fwd^k + κ_3·d_dec^k) otherwise

where ψ is the penalty applied to the agent when the data packet is lost, dis_{j,k} represents the normalized spatial distance between the next-hop satellite Sat_j and the target satellite, d_fwd^k is the normalized forwarding delay of packet k, and d_dec^k is the normalized decision delay of routing packet k at Sat_j. κ_1, κ_2 and κ_3 are weights used to balance the above factors. The cumulative discounted reward is calculated as R_i = Σ_t γ^t·r_i(t), where γ ∈ [0,1] is the discount factor.
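A sketch of the reward computation, assuming the reward is the negative weighted sum of the listed factors with a fixed loss penalty ψ (the exact combination in the patent's figure is not reproduced, so the weights and structure here are assumptions); the cumulative discounted reward follows the stated Σ γ^t·r(t):

```python
def reward(dropped, dis_next_to_target, fwd_delay_norm, dec_delay_norm,
           psi=10.0, k1=1.0, k2=1.0, k3=1.0):
    """Per-hop reward: penalty -psi on packet loss, otherwise a negative
    weighted sum of distance-to-target and normalised delays."""
    if dropped:
        return -psi
    return -(k1 * dis_next_to_target + k2 * fwd_delay_norm + k3 * dec_delay_norm)

def discounted_return(rewards, gamma):
    """Cumulative discounted reward: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

Lower distance and delay yield a reward closer to zero, so maximizing return pushes the agent toward short, fast paths.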
Step 7: training a strategic network of reinforcement learning models for each agent
The multi-agent deep reinforcement learning algorithm is a method for optimizing each low-orbit satellite's packet routing strategy so as to maximize the overall cumulative discounted return. Each satellite has two deep neural networks: an estimated Q network Q_i(o_i, a_i; μ_i) and a target Q network Q_i′(o_i, a_i; μ_i′), parameterized by μ_i and μ_i′ respectively. At each decision time t, satellite Sat_i considers its local observation o_i(t) and selects action a_i(t) from action space A_i based on the ε-greedy policy: a_i(t) = argmax_a Q_i(o_i(t), a; μ_i) with probability 1−ε, and a random action from A_i with probability ε.
When the agent selects an action according to its current observation, it interacts with the environment: the current state transitions to the next state o_i(t+1), and Sat_i receives reward r_i(t). In this algorithm, the experience tuple {o_i(t), o_i(t+1), a_i(t), r_i(t)} is recorded into an experience replay pool RB. The agent then trains by randomly sampling a batch of experience tuples from RB and using them to update the parameters of the estimated Q network. In each iteration, the target Q network is used to calculate a fixed target Q value y_i(t) for each state-action pair (o_i(t), a_i(t)). The target Q value is calculated with the Bellman equation, taking the maximum Q value over all actions in the next state o_i(t+1) under the target network parameters μ_i′:

y_i(t) = r_i(t) + γ·max_{a′} Q_i′(o_i(t+1), a′; μ_i′)
Wherein γ is a discount factor that determines the importance of future rewards, and r_i(t) is the immediate reward for the state-action pair (o_i(t), a_i(t)). The loss function is defined as:
Loss i (t)=(y i (t)-Q i (o i (t),a i (t);μ i )) 2
The parameters of the estimated Q network are updated by stochastic gradient descent to minimize the mean square error between the estimated Q value and the target Q value. At the end of each training iteration, the parameters μ_i′ of the target Q network are soft-updated from the estimated Q network parameters μ_i:

μ_i′ ← α·μ_i + (1−α)·μ_i′

where α is the learning rate used for the soft update. As training proceeds, the Q network estimates the agents' packet routing decisions with increasing accuracy.
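The training machinery of this step (replay pool, Bellman target, soft target update) can be sketched as follows; this is a minimal illustration under the definitions above, not the patent's implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool RB storing {o, o', a, r} tuples."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, obs, next_obs, action, reward):
        self.buf.append((obs, next_obs, action, reward))

    def sample(self, batch_size):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def td_target(r, next_q_values, gamma):
    """Bellman target: y = r + gamma * max_a' Q'(o', a'; mu')."""
    return r + gamma * max(next_q_values)

def soft_update(target_params, online_params, alpha):
    """mu' <- alpha * mu + (1 - alpha) * mu', applied element-wise."""
    return [alpha * p + (1 - alpha) * tp
            for p, tp in zip(online_params, target_params)]
```

The squared difference between `td_target(...)` and the estimated Q value gives the loss Loss_i(t) minimized by gradient descent.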
Stage two: execution phase
Step 1: deploying the deep reinforcement learning model with stage one training completion into a real low orbit satellite network
In this embodiment, the real Iridium constellation network is used as the application scenario for the execution stage, and the DRL model trained in stage one is deployed on each satellite of the Iridium constellation, forming a fully distributed low-orbit satellite network routing architecture for executing routing decisions. Specific parameter values for the Iridium-based network topology are shown in Table 2.
Table 2 Iridium network topology parameter values
In the Iridium-based network, the low-orbit satellites are denoted Sat_i, i ∈ {1, 2, …, total} (total represents the total number of LEO satellites). The present invention assumes that each satellite in the Iridium-based network can establish four inter-satellite links (ISLs) for communication with its neighboring satellites. These links connect to the satellites above and below in the same orbit, and to the satellites to the left and right in the two adjacent orbits, respectively. Each of the four links is denoted link_{i,j}, representing the link from Sat_i to Sat_j, where i denotes the number of the transmitting satellite and j the number of the receiving satellite. Meanwhile, the invention uses GPWv4 (Gridded Population of the World, version 4) to simulate the global distribution of real low-orbit satellite network users. GPWv4 is a global gridded population dataset developed by the Center for International Earth Science Information Network (CIESIN) at Columbia University. The dataset provides gridded data on global population count, density and distribution based on a variety of sources, including national censuses, remote sensing data and land use data. The invention selects a suitable resolution and data format to process the GPWv4 dataset, then divides the ground into M contiguous regions with uneven user distribution according to the global population count and distribution information provided by GPWv4; the user positions within each region are uniformly distributed.
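A minimal sketch of population-weighted user placement in the spirit of the GPW-based model: a region is chosen with probability proportional to its population count, then the user position is drawn uniformly inside that region (the region representation and function names are assumptions for illustration):

```python
import random

def sample_users(regions, num_users, rng=None):
    """Place ground users. Each region is (population, (lat0, lat1, lon0, lon1));
    regions are picked proportionally to population (e.g., GPW grid counts),
    and positions are uniform within the chosen region's bounding box."""
    rng = rng or random.Random()
    total_pop = float(sum(pop for pop, _ in regions))
    users = []
    for _ in range(num_users):
        r = rng.uniform(0.0, total_pop)
        acc = 0.0
        for pop, (lat0, lat1, lon0, lon1) in regions:
            acc += pop
            if r <= acc:  # cumulative-weight region selection
                users.append((rng.uniform(lat0, lat1), rng.uniform(lon0, lon1)))
                break
    return users
```

Regions with larger populations therefore receive proportionally more users, reproducing an uneven global distribution.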
When a low-orbit satellite (e.g., LEO #23 in fig. 3) receives a data packet, it will make a data stream routing decision using the deep reinforcement learning model trained in stage one and forward the data packet to the next hop satellite over the inter-satellite link. If the next-hop satellite is not the target node, the above procedure will be performed again on the next-hop satellite.
Step 2: stream level routing decision making based on data stream and trained MADRL model
In this step, to mitigate the negative effect of deep neural network inference time on network routing performance, the present invention proposes a flow routing method targeting flow-level optimization requirements. The method organizes data packets into flows, considers flow characteristics, delay requirements and bandwidth requirements, and selects the optimal routing path and inter-satellite resource allocation through agent learning and decision making, so as to satisfy user traffic demands to the greatest extent and optimize overall network performance.
Considering that all packets in a flow have the same destination address, each satellite is specified to independently maintain a flow routing table for all traffic flows passing through it. Similar to the traditional MADRL packet routing method, the MADRL flow routing method requires the low-orbit satellite to use the DNN model to make a routing decision only for the first packet in a traffic flow. The routing information is then stored as a corresponding entry in the satellite's flow routing table and is directly available to subsequent packets of the same flow. This eliminates the need for subsequent packets to run DNN model inference, significantly reducing the cumulative time spent on inference in low-orbit satellite broadband networks. As a result, routing performance metrics such as end-to-end transmission delay, packet loss and network throughput are expected to be significantly optimized, meeting the needs of large-scale low-orbit satellite broadband networks. The present invention illustrates the data-flow routing table by example: assuming N satellite nodes in a low-orbit constellation network, each satellite can only adopt four forwarding strategies (up, down, left, right), and the routing table maintained by the kth (k ∈ {1, 2, …, N}) satellite at a certain moment is shown in Table 3. For a data flow that has not yet been received, or whose target node is this satellite, the forwarding decision stored in the routing table is set to None; in all other cases, subsequently received packets are forwarded according to the existing strategy in the routing table.
Table 3 Example of a data-flow routing table
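The flow-table mechanism described above (DNN inference only for the first packet of a flow, None meaning no cached decision) can be sketched as:

```python
class FlowRoutingTable:
    """Per-satellite flow routing table: the first packet of a flow triggers
    one policy-network inference; later packets of the same flow reuse the
    cached next-hop action."""
    def __init__(self, infer_next_hop):
        self.table = {}              # flow_id -> action in {'up','down','left','right'} or None
        self.infer = infer_next_hop  # stands for the trained DNN policy (assumption)

    def route(self, flow_id, observation):
        if self.table.get(flow_id) is None:   # None = no decision cached yet
            self.table[flow_id] = self.infer(observation)
        return self.table[flow_id]
```

Only the first call per flow pays the inference cost; every later packet is a dictionary lookup.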
Step 3: adaptive update of flow routes based on delay jitter
In this step, an adaptive flow routing update method is introduced to monitor and adjust inter-satellite packet transmission delay in real time, thereby ensuring the routing performance of the network. The method computes the delay difference between two successive successful transmissions of packets of the same flow and compares it with a preset threshold. If the delay difference exceeds the threshold, the multi-agent reinforcement learning algorithm is triggered to recompute the route.
The present invention therefore proposes a delay-jitter-based adaptive policy update mechanism that is entirely model-free, i.e., implemented independently of any complex network model. When a satellite finishes forwarding a data packet, it senses and records the per-hop transmission delay of that packet, and computes the delay jitter ΔDelay_{i+1} as the difference between the delays Delay_{i+1} and Delay_i of two consecutive packets belonging to the same data flow:

ΔDelay_{i+1} = |Delay_{i+1} − Delay_i|

The delay jitter ΔDelay_{i+1} is then used to judge whether the routing strategy stored for this data flow in the routing table is still applicable under the current network state. If ΔDelay_{i+1} exceeds a set threshold θ_thr, the current routing path may have a performance problem or an abnormal condition; when the next packet of this flow is forwarded, the deep neural network model is used to infer a routing strategy, the packet is forwarded accordingly, and the old strategy in the routing table is replaced, completing both the packet forwarding and the flow's policy update.
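The jitter-triggered update check can be sketched as follows; θ_thr and the per-flow delay bookkeeping follow the description above (class and attribute names are illustrative):

```python
class JitterMonitor:
    """Trigger a fresh DNN routing inference for a flow when the per-hop
    delay jitter |Delay_{i+1} - Delay_i| exceeds the threshold theta_thr."""
    def __init__(self, theta_thr):
        self.theta = theta_thr
        self.last_delay = {}   # flow_id -> previously observed per-hop delay

    def needs_update(self, flow_id, delay):
        prev = self.last_delay.get(flow_id)
        self.last_delay[flow_id] = delay
        if prev is None:       # first packet of the flow: no jitter yet
            return False
        return abs(delay - prev) > self.theta
```

A satellite would call `needs_update` after each forwarded packet and, on True, replace the flow's cached routing entry with a new model inference.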
Step 4: developing low orbit satellite stream level routing strategy performance assessment
Performance evaluation is carried out on the proposed low-orbit satellite network flow routing strategy; the evaluation indicators of interest are end-to-end delay, packet loss rate and throughput. The baseline algorithms against which the proposed strategy is compared include:
1) OSPF (Open Shortest Path First): periodically computes the shortest paths from the current router to the different target nodes and stores them in the routing table. When a packet arrives at the router, the routing decision is made by a routing table lookup, so the routing decision delay is negligible.
2) ELB (Offloading to Access Satellite): a load-balancing algorithm that effectively avoids unbalanced load in the satellite network. Since its routing table is calculated in a similar manner to the OSPF algorithm, its routing decision delay is likewise negligible.
3) MADRL-packet: one satellite would need to make DNN model inferences for each data packet, which would introduce a non-negligible cumulative decision delay.
The invention tests algorithm performance by varying the number of users of the low-orbit satellite constellation network; the experimental results for the three performance indicators of end-to-end delay, packet loss rate and throughput are shown in Fig. 3, Fig. 4 and Fig. 5. Notably, the proposed MADRL-flow routing method consistently outperforms all baseline algorithms on all three metrics. Furthermore, the invention investigates the performance of the proposed MADRL-flow routing method under three different θ_thr settings. As shown in Fig. 5, among the three threshold configurations θ_thr = 0.001, 0.005 and 0.008, MADRL-flow (0.005) is superior to MADRL-flow (0.001) and MADRL-flow (0.008). The intuitive reason is that when θ_thr is too large, updates to the flow routing table may lag behind the real-time network state; conversely, too small a θ_thr value increases the demand for DNN model inference, degrading the performance of the flow routing method.
The invention further studies how much the proposed MADRL-flow routing method can reduce the average decision delay per data packet. The evaluation results are shown in Table 4. Since routing table lookup is typically near-instantaneous, the average decision delays of OSPF and ELB can be ignored. When the number of ground users is 27000, the average decision delay of the MADRL-packet method reaches 42.9 milliseconds, accounting for more than half of the total end-to-end delay, whereas with θ_thr set to 0.008 the MADRL-flow method reduces the average decision delay to around 1 millisecond. In addition, the proportion of decision delay under the MADRL-flow routing method is also greatly reduced: when θ_thr is 0.008, it is only about 2% of the total end-to-end delay.
Table 4 Decision delay test results for different numbers of ground users in the embodiment of the present invention

Claims (2)

1. A low orbit satellite network flow routing method based on multi-agent reinforcement learning comprises a training stage and an execution stage;
the training phase comprises:
step A1: constructing a low-orbit constellation broadband network distributed inter-satellite routing model;
the low orbit satellite network comprises N orbit Tracks, each track having N Sat_orbit Uniformly distributed low-orbit satellites, denoted as Sat i I epsilon {1,2, …, total }, total representing the total number of LEO satellites, each satellite may establish four inter-satellite links for communication with its neighboring four low-orbit satellites. These links are respectively connected to the satellites on the upper and lower sides of the same orbit, and to the satellites on the left and right sides of the adjacent two orbits. By link i,j To represent Sat i To Sat j Wherein i represents the number of the satellite at the transmitting end and j represents the number of the satellite at the receiving end;
step A2: constructing a ground user distribution model, and generating a communication request according to a user behavior model;
dividing the ground into M continuous areas with uneven user distribution, wherein the user positions in each area are uniformly distributed; setting all user behaviors to independently and periodically send data packets to an access satellite;
step A3: the data packet is sent to a satellite, and the satellite acquires the local state information observed quantity;
defining the satellites as mutually independent agents that determine packet routing decisions based on local observations; for each satellite Sat_i, after receiving data packet k, its local observation state space is defined as o_i^k = {dis_i^k, B_i, load_i^fwd, load_i^dec}, wherein dis_i^k is the spatial distances from Sat_i's four adjacent satellites to the target satellite of the current data packet k, B_i represents the available bandwidth of the four inter-satellite links connected to Sat_i, load_i^fwd is the current traffic load of Sat_i's four forwarding queues, and load_i^dec is the load of the decision queues on the four adjacent satellites; normalizing the element values in step A3;
step A4: c, the satellite relies on the local state observed quantity obtained in the step A3, a strategy network in the deep reinforcement learning model is utilized to select the optimal action, and the routing decision of the data packet is executed;
after each satellite agent receives a data packet and obtains its local observations, it makes a routing decision for the packet; the agent selects one action from the action space A_i = {a_up, a_down, a_left, a_right} to route the packet, wherein a_up, a_down, a_left and a_right each represent transferring the data packet to one of the four adjacent satellites as the next hop;
step A5: calculating the routing delay of the low-orbit constellation broadband network node data packet;
when a low orbit satellite receives a data packet, it will select the next hop satellite to process the data packet through the routing decision obtained in step A4, and forward the data packet to the next hop satellite through the inter-satellite link; this routing process requires a certain time delay, including decision delays and forwarding delays; decision delay refers to the time delay from receiving a data packet to making a routing decision, while forwarding delay refers to the time delay from making a routing decision to receiving a data packet by the next hop satellite;
the decision delay d_dec^k of a data packet k routed over the low-orbit satellite broadband network comprises two parts: the decision queuing delay d_dq^k and the decision-making delay d_dm^k; the decision queuing delay refers to the time the packet waits for a routing decision in the satellite, while the decision-making delay refers to the time the satellite needs to make the routing decision; in the packet forwarding process, the forwarding delay d_fwd^k likewise comprises several parts: the forwarding queuing delay d_fq^k, the transmission delay d_trans^k and the propagation delay d_prop^k; the forwarding queuing delay refers to the time the packet waits to be forwarded in the satellite, the transmission delay refers to the time required for the packet to be transmitted over the inter-satellite link, and the propagation delay refers to the time required for the packet to travel from one satellite to another along the inter-satellite link;
if bandwidth B_{i,j}^k is allocated on link_{i,j} for transmitting data packet k, the transmission delay on the link is calculated by the formula d_trans^k = S_k / B_{i,j}^k, wherein S_k is the size of data packet k; if link_{i,j} temporarily has no free bandwidth, packet k is buffered in link_{i,j}'s forwarding queue buffer, which introduces a forwarding queuing delay d_fq^k; when the buffer reaches maximum capacity, subsequent data packets are discarded; on the other hand, assuming that at time t the positions of Sat_i and Sat_j are (x_{i,t}, y_{i,t}, z_{i,t}) and (x_{j,t}, y_{j,t}, z_{j,t}), the spatial distance between these two satellites is calculated by the formula D_{i,j} = sqrt((x_{i,t} − x_{j,t})² + (y_{i,t} − y_{j,t})² + (z_{i,t} − z_{j,t})²);
given the propagation distance D_{i,j} of link_{i,j}, the signal propagation delay is calculated by the formula d_prop = D_{i,j} / c, wherein c is the speed of light in vacuum;
in summary, the total delay of packet k at Sat_i is d_total^k = d_dq^k + d_dm^k + d_fq^k + d_trans^k + d_prop^k; if the next-hop satellite is not the target node, the above procedure is performed again on the next-hop satellite;
step A6: calculating a reward value of the intelligent agent for routing decision;
if the current data packet is forwarded to a neighboring satellite, a corresponding reward is given to the agent according to the delay calculated in step A5; the goal of each agent is to learn an optimal routing strategy that improves routing performance; to ensure that each agent learns an optimal routing decision, the reward function of Sat_i for routing data packet k at time t is defined as r_{i,k}(t) = −ψ if the packet is lost, and r_{i,k}(t) = −(κ_1·dis_{j,k} + κ_2·d_fwd^k + κ_3·d_dec^k) otherwise; wherein ψ is the penalty applied to the agent when the data packet is lost, dis_{j,k} represents the normalized spatial distance between the next-hop satellite Sat_j and the target satellite, d_fwd^k is the normalized forwarding delay of packet k, d_dec^k is the normalized decision delay of routing packet k at Sat_j, and κ_1, κ_2 and κ_3 are weights for balancing the above factors; the cumulative discounted reward is calculated as R_i = Σ_t γ^t·r_i(t), with γ ∈ [0,1] representing the discount factor;
step A7: training the policy network of each agent's reinforcement learning model;
each satellite contains two deep neural networks: an estimated Q network Q_i(o_i, a_i; μ_i) and a target Q network Q′_i(o_i, a_i; μ′_i), parameterized by μ_i and μ′_i respectively; at each decision time t, satellite Sat_i considers its local observation o_i(t) and selects action a_i(t) from action space A_i based on the ε-greedy policy; when the agent selects an action according to the current observation, it interacts with the environment, the current state transitions to the next state o_i(t+1), and Sat_i receives reward r_i(t); the experience tuple {o_i(t), o_i(t+1), a_i(t), r_i(t)} is recorded into an experience replay pool RB, from which the agent randomly extracts a batch of experience tuples and uses them to update the parameter values of the estimated Q network; in each iteration the target Q network is used to calculate a fixed target Q value y_i(t) for each state-action pair (o_i(t), a_i(t)), taking the maximum Q value over all actions in the next state o_i(t+1) under the target network parameters μ′_i, wherein y_i(t) is calculated as y_i(t) = r_i(t) + γ·max_{a′} Q′_i(o_i(t+1), a′; μ′_i);
wherein γ is a discount factor for determining the importance of future rewards, and r_i(t) is the immediate reward for the state-action pair (o_i(t), a_i(t)); the loss function is:
Loss i (t)=(y i (t)-Q i (o i (t),a i (t);μ i )) 2
the parameter values of the estimated Q network are updated by stochastic gradient descent to minimize the mean square error between the estimated Q value and the target Q value; at the end of each training iteration, the target Q network parameters μ′_i are soft-updated from the estimated Q network parameters μ_i as μ′_i ← α·μ_i + (1−α)·μ′_i, wherein α is the learning rate used for the soft update;
the execution phase comprises:
step B1: deploying the deep reinforcement learning model completed in the training stage into a real low-orbit satellite network;
when a low-orbit satellite receives a data packet, it makes a data-flow routing decision using the trained deep reinforcement learning model and forwards the packet to the next-hop satellite over the inter-satellite link; if the next-hop satellite is not the target node, the above procedure is performed again on the next-hop satellite;
step B2: making a flow-level routing decision based on the data flow and the trained MADRL model;
the data packets are organized into streams, the characteristics, the time delay requirement and the bandwidth requirement of the streams are considered, and the optimal routing path and resource allocation among satellites are selected through the learning and decision of an intelligent agent so as to meet the requirement of user traffic to the greatest extent and optimize the overall performance of the network;
providing that each satellite independently maintains a flow routing table for all traffic flows passing through it, considering that all packets in the flow have the same destination address; the low orbit satellite uses DNN model to make route decision for the first data packet in the traffic flow; the routing information is then stored as a corresponding entry in the satellite flow routing table and can be directly used for subsequent packets in the same data flow;
step B3: adaptively updating the flow route based on the delay jitter;
when the satellite finishes forwarding a data packet, the agent senses and records the per-hop transmission delay of the packet, and calculates the delay jitter ΔDelay_{i+1} as the difference between the delays Delay_{i+1} and Delay_i of two consecutive data packets belonging to the same data flow:
ΔDelay i+1 =|Delay i+1 -Delay i |
then, based on the delay jitter ΔDelay_{i+1}, judging the applicability, under the current network state, of the routing strategy stored for the data flow in the routing table; if ΔDelay_{i+1} is higher than the set threshold θ_thr, the current routing path may have a performance problem or an abnormal condition; when the next data packet of the data flow is forwarded, the deep neural network model is used to infer a routing strategy, the packet is forwarded accordingly, and the old strategy in the routing table is replaced, thereby completing the packet forwarding and the policy update for the data flow.
2. The low-orbit satellite network flow routing method based on multi-agent reinforcement learning according to claim 1, wherein in step A4 of the training stage, the agent's selection of a routing strategy is divided into two cases, exploration and exploitation, each time: the agent performs random exploration with probability ε and exploits the current optimal strategy with probability 1−ε.
CN202311071886.4A 2023-08-24 2023-08-24 Low-orbit satellite network flow routing method based on multi-agent reinforcement learning Pending CN117041129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311071886.4A CN117041129A (en) 2023-08-24 2023-08-24 Low-orbit satellite network flow routing method based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311071886.4A CN117041129A (en) 2023-08-24 2023-08-24 Low-orbit satellite network flow routing method based on multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN117041129A true CN117041129A (en) 2023-11-10

Family

ID=88641005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311071886.4A Pending CN117041129A (en) 2023-08-24 2023-08-24 Low-orbit satellite network flow routing method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN117041129A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117395188A (en) * 2023-12-07 2024-01-12 南京信息工程大学 Deep reinforcement learning-based heaven-earth integrated load balancing routing method
CN117395188B (en) * 2023-12-07 2024-03-12 南京信息工程大学 Deep reinforcement learning-based heaven-earth integrated load balancing routing method
CN117692052A (en) * 2024-02-04 2024-03-12 北京邮电大学 Access selection method and device for multiple ground users in low-orbit satellite network
CN117692052B (en) * 2024-02-04 2024-04-19 北京邮电大学 Access selection method and device for multiple ground users in low-orbit satellite network
CN117939520A (en) * 2024-03-22 2024-04-26 银河航天(西安)科技有限公司 Satellite link-based adaptation degree determining method, device and storage medium
CN117939520B (en) * 2024-03-22 2024-05-24 银河航天(西安)科技有限公司 Satellite link-based adaptation degree determining method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination