CN117614507A

CN117614507A - Adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology

Info

Publication number: CN117614507A
Application number: CN202311484841.XA
Authority: CN
Inventors: 石金明; 辛宁; 陈特; 李殷乔
Original assignee: China Academy of Space Technology CAST
Current assignee: China Academy of Space Technology CAST
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2024-02-27

Abstract

The invention relates to an adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology. The method comprises collecting information such as the network topology, link states, and satellite traffic loads; an adaptive traffic offloading procedure executed by LEO satellites; and reinforcement learning model aggregation performed by GEO satellites. Based on a federated reinforcement learning architecture and algorithm for a hybrid GEO/LEO constellation, the method makes intelligent decisions on traffic offloading requests from ground user terminals. It thereby optimizes the average traffic offloading delay of ground user terminals and addresses the unbalanced LEO satellite traffic load and excessive offloading delay caused by heterogeneity and high dynamics in the space-ground integrated network.

Description

Adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology

Technical Field

The invention relates to a traffic offloading method for low-Earth-orbit constellation networks, applied in the field of space information, and belongs to the technical field of traffic distribution in wireless communication networks.

Background

In a conventional satellite communication system, GEO satellites form a static-topology network and maintain uninterrupted links with ground stations, so wide coverage can be achieved with even a single satellite. However, because of their long satellite-to-ground propagation delay, GEO satellites are better suited to delay-insensitive communication services. Large LEO constellations can provide high throughput, high bandwidth, and low service delay with global coverage, but the high speed of LEO satellites relative to the ground shortens the duration of their links with the ground and makes the satellite-to-ground network topology change quickly. Moreover, neither GEO satellites nor LEO satellites alone can satisfy diverse service demands. GEO satellites have ample communication and buffering resources, but their long satellite-to-ground propagation delay cannot meet the requirements of certain delay-sensitive services. LEO satellites have relatively few on-board resources, which may increase the offloading delay and packet loss rate during traffic offloading. Therefore, a space-ground integrated network based on a hybrid constellation of GEO and LEO satellites plays an important role in meeting the diversified demands of ground users.

In a typical space-ground integrated network, GEO satellites are generally part of the space-based backbone network, while LEO satellites are part of the space-based access network and are responsible for establishing inter-satellite links, exchanging data, and relaying between the space-based and ground backbone networks. In addition, ground users are unevenly distributed across satellite coverage areas, so satellites covering hotspot areas lack sufficient resources while other satellites have resources to spare. Research on inter-satellite cooperative transmission is therefore important for improving the resource utilization of the space-ground integrated network.

Long-distance transmission between different network segments in a space-ground integrated network can make communication links unstable. Most existing traffic offloading methods target static terrestrial network topologies and cannot be applied directly to a space-ground integrated network with highly dynamic topology. Because of the heterogeneity and high dynamics of such a network, traffic transmission strategies must be designed specifically to adapt to resource imbalance and environmental change. An adaptive traffic offloading method for space-ground integrated networks with highly dynamic topology is therefore needed. Reinforcement learning is a policy-learning method that collects network states and, based on them, dynamically produces near-optimal traffic offloading decisions. However, because the nodes in a space-ground integrated network are highly dynamic, centralized control based on global information is difficult to achieve, so centralized reinforcement learning is not suitable for such a network.

Federated learning, as an enhanced decentralized approach, can be deployed on multiple LEO satellites to learn and make decisions in a distributed fashion. It makes effective use of the local data and on-board computing resources of multiple LEO satellites and avoids the signaling overhead of frequently transmitting raw training data. An adaptive traffic offloading method for highly dynamic space-ground integrated networks can therefore be designed on the basis of federated reinforcement learning.

Disclosure of Invention

The technical problem solved by the invention is as follows: an adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology is provided, which addresses the unbalanced LEO satellite traffic load and excessive offloading delay caused by heterogeneity and high dynamics in the network.

The technical scheme of the invention is as follows:

An adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology, comprising:

(1) A ground user terminal generates a traffic offloading task and sends a traffic offloading request to the LEO satellite whose coverage area it is located in;

(2) After receiving the traffic offloading request, the LEO satellite receives the offloaded traffic from the ground user terminal;

(3) The LEO satellite makes a traffic offloading decision on how to forward the offloaded traffic;

(4) If the offloaded traffic is forwarded to an adjacent LEO satellite, the adjacent LEO satellite in turn executes the traffic offloading decision process of step (3);

(5) If the offloaded traffic is forwarded to the GEO satellite, the GEO satellite checks whether its local buffer queue is full; if so, the data packet is discarded directly; otherwise, the data packet is added to the buffer queue and then forwarded to an earth station near the target terminal, which forwards it to the target terminal.

Further, the traffic offloading decision of step (3) is made as follows:

a) If the target terminal is located in the coverage area of the LEO satellite, the LEO satellite forwards the data to the target terminal in one hop;

b) If the target terminal is not in the coverage area of the LEO satellite and other satellites are required to forward the traffic, the LEO satellite constructs an observation state vector $s_i$ from the collected network state information, inputs $s_i$ into its locally trained reinforcement learning model to obtain a traffic offloading decision $a_i$, and forwards the traffic to an adjacent LEO satellite or the GEO satellite according to that decision.

Further, the observation state vector $s_i$ is constructed as follows:

$$s_i = \{l_0, l_1, \ldots, l_k, \ldots, l_K, l_G,\; d_0, d_1, \ldots, d_k, \ldots, d_K, d_G,\; q_0, q_1, \ldots, q_k, \ldots, q_K, q_G,\; B_0, B_1, \ldots, B_k, \ldots, B_K, B_G\}$$

where $l_0$ denotes the link duration between the LEO satellite and the ground user terminal, $l_k$ denotes the link duration between the LEO satellite and adjacent LEO satellite $k$, and $l_G$ denotes the link duration between the LEO satellite and the GEO satellite; $d_0$ denotes the distance between the LEO satellite and the ground user terminal, $d_k$ denotes the distance between the LEO satellite and adjacent LEO satellite $k$, and $d_G$ denotes the distance between the LEO satellite and the GEO satellite; $q_0$ denotes the buffer queue length of the LEO satellite, $q_k$ denotes the buffer queue length of adjacent LEO satellite $k$, and $q_G$ denotes the buffer queue length of the GEO satellite; $B_0$ denotes the available communication bandwidth of the LEO satellite, $B_k$ denotes the available communication bandwidth of adjacent LEO satellite $k$, and $B_G$ denotes the available communication bandwidth of the GEO satellite.
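As an illustration, the observation vector can be assembled by concatenating the four groups of measurements in the order listed in the definition. The following Python sketch assumes a hypothetical `LinkInfo` record and `build_state` helper; neither name appears in the specification, and the units are left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkInfo:
    """Hypothetical per-peer measurement record (not named in the source)."""
    duration: float   # l: link duration with the peer
    distance: float   # d: distance to the peer
    queue_len: float  # q: buffer queue length (the satellite's own for index 0)
    bandwidth: float  # B: available bandwidth (the satellite's own for index 0)


def build_state(ground: LinkInfo, neighbors: List[LinkInfo], geo: LinkInfo) -> List[float]:
    """Concatenate the l, d, q, B groups, each ordered as [0, 1..K, G]."""
    links = [ground, *neighbors, geo]
    state: List[float] = []
    for field in ("duration", "distance", "queue_len", "bandwidth"):
        state.extend(getattr(link, field) for link in links)
    return state
```

With K adjacent LEO satellites the vector has 4(K + 2) entries, so the input size of the learning model depends on how many neighbors are tracked.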

Further, in each period, the LEO satellite broadcasts a Hello packet to its adjacent LEO satellites; the packet contains the LEO satellite's local buffer queue length, position parameters, motion parameters, and available communication bandwidth.

Further, the reinforcement learning model is trained as follows:

(b1) At the beginning of each period, the LEO satellite calculates a reward $R_i$ from the actual offloading delay of the traffic and stores the network state $s_i$, the traffic offloading decision $a_i$, and the traffic offloading reward $R_i$ of the previous period in an experience data set as historical data;

(b2) Within each period, the LEO satellite randomly draws a number of samples from its local experience data set, trains the reinforcement learning model on them, and updates the model parameters;

(b3) The GEO satellite periodically aggregates the reinforcement learning models;

(b4) The GEO satellite broadcasts the aggregated reinforcement learning network model parameters to the LEO constellation;

(b5) After receiving the aggregated model parameters from the GEO satellite, the LEO satellite updates its local reinforcement learning model parameters.

Further, the LEO satellite calculates the reward $R_i$ from the actual offloading delay of the traffic as follows:

the reward function obtained by the LEO satellite for a traffic offloading decision is defined as

where $\kappa_i$ denotes the traffic type, $\kappa_S$ denotes delay-sensitive traffic, $\kappa_N$ denotes non-delay-sensitive traffic, and the two cases are selected by an indicator function;

for delay-sensitive traffic, the reward function for traffic offloading by the LEO satellite is defined as

where $t_i$ denotes the traffic offloading delay, $t_{\max,i}$ denotes the maximum tolerable traffic offloading delay, and $-\Gamma$ is a constant representing the penalty incurred when the traffic offloading delay exceeds the maximum tolerable delay;

for non-delay-sensitive traffic, the reward function for traffic offloading by the LEO satellite is defined as

Further, the traffic offloading decision $a_i$ is:

$$a_i = \{x_0, x_1, \ldots, x_k, \ldots, x_K,\; b_0, b_1, \ldots, b_k, \ldots, b_K\}$$

where $x_0$ denotes transmitting the data to the GEO satellite, $x_k$ denotes offloading the data to adjacent LEO satellite $k$ of the LEO satellite, $b_0$ denotes the communication bandwidth allocated to the link between the LEO satellite and the GEO satellite, and $b_k$ denotes the communication bandwidth allocated to the link between the LEO satellite and adjacent LEO satellite $k$.

Further, the one-hop transmission delay in the space-ground integrated network is calculated by the following formula:

where $d_{ij}$ denotes the distance between node $i$ and node $j$, $c$ denotes the speed of light, $l_i$ denotes the size of the data packet, $R_{ij}$ denotes the transmission rate of the link between node $i$ and node $j$, and $q_i$ denotes the buffer queue length of node $i$.

Further, if a data packet from a ground user terminal can be forwarded to the target node in one hop through the LEO satellite, the traffic offloading delay is

$$t_i = t_{ul} + t_{lu}$$

where $t_{ul}$ denotes the one-hop transmission delay from the user terminal to the LEO satellite, and $t_{lu}$ denotes the one-hop transmission delay from the LEO satellite to the target user terminal;

if the data packet cannot be forwarded to the target node in one hop through the LEO satellite, but can be forwarded to the target node in two hops through an adjacent LEO satellite, the traffic offloading delay is

$$t_i = t_{ul} + t_{ln} + t_{nu}$$

where $t_{ln}$ denotes the one-hop transmission delay from the LEO satellite to its adjacent LEO satellite, and $t_{nu}$ denotes the one-hop transmission delay from the adjacent LEO satellite to the target user terminal;

if the data packet cannot be forwarded to the target node through the LEO satellite or its adjacent LEO satellites, the LEO satellite forwards the data packet to the GEO satellite, and the traffic offloading delay of this process is

where $t_{lg}$ denotes the one-hop transmission delay from the LEO satellite to the GEO satellite, $t_{gs}$ denotes the one-hop transmission delay from the GEO satellite to the earth station, $t_{su}$ denotes the transmission delay from the earth station to the user terminal, and $q_G$ denotes the buffer queue length of the GEO satellite, which is compared against the maximum buffer queue length of the GEO satellite.

Further, the reinforcement learning model is based on a Q-learning model, and the model parameters are updated as follows:

where $w$ denotes the parameters of the Q-network, $\eta$ denotes the learning rate, and $L_Q(w)$ is the loss function, calculated as follows:

where $\lambda$ is the discount factor and $Q(s, a; w)$ is the Q-network.
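The update and loss formulas are not reproduced in this text. A form consistent with the symbols defined above is the usual gradient step $w \leftarrow w - \eta \nabla_w L_Q(w)$ on a squared temporal-difference error. The Python sketch below illustrates that step under two assumptions that do not come from the specification: a linear Q-function (one weight row per action) stands in for the Q-network, and the bootstrap target is treated as a constant. The function names are hypothetical.

```python
import numpy as np


def q_values(w: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Linear Q-function (an assumption): Q(s, a; w) = w[a] . s for every action a."""
    return w @ s


def td_update(w, s, a, r, s_next, eta=0.01, lam=0.9):
    """One semi-gradient step w <- w - eta * grad L_Q(w) on a single sample.

    Assumed loss: L_Q(w) = 0.5 * (r + lam * max_a' Q(s', a'; w) - Q(s, a; w))**2.
    Returns the updated weights and the loss value before the update.
    """
    target = r + lam * np.max(q_values(w, s_next))  # bootstrap target, held constant
    td_error = target - q_values(w, s)[a]
    w_new = w.copy()
    w_new[a] += eta * td_error * s  # minus the gradient w.r.t. w[a] is td_error * s
    return w_new, float(0.5 * td_error ** 2)
```

In practice the step would be averaged over the batch sampled from the experience data set rather than applied to one sample.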

Further, in step (b3) the GEO satellite periodically aggregates the reinforcement learning models as follows:

at regular intervals, the GEO satellite randomly selects $N_C$ LEO satellites from the LEO constellation and sends a model transmission request to each selected LEO satellite;

after receiving the request from the GEO satellite, each selected LEO satellite uploads its local reinforcement learning model parameters $w_c$ and the number of samples $n_c$ in its training sample set to the GEO satellite;

the GEO satellite performs model aggregation based on the received LEO satellite reinforcement learning model parameters and the numbers of samples in the training sample sets.

Further, model aggregation is performed in the following manner:

where $w_{agg}$ denotes the parameters of the aggregated reinforcement learning network model.
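The aggregation formula itself is not reproduced in this text. Since the GEO satellite receives both the parameters $w_c$ and the sample counts $n_c$, a sample-weighted average in the style of federated averaging is a natural reading. The Python sketch below shows that reading only; the function name `aggregate` is hypothetical and the weighting is an assumption.

```python
from typing import Sequence

import numpy as np


def aggregate(params: Sequence[np.ndarray], sample_counts: Sequence[int]) -> np.ndarray:
    """Assumed rule: w_agg = sum_c (n_c / sum_c' n_c') * w_c."""
    if not params or len(params) != len(sample_counts):
        raise ValueError("need one sample count per uploaded model")
    total = float(sum(sample_counts))
    if total <= 0:
        raise ValueError("total sample count must be positive")
    # Each uploaded model contributes in proportion to its training-set size.
    return sum((n / total) * w for w, n in zip(params, sample_counts))
```

For example, two models trained on 1 and 3 samples receive weights 0.25 and 0.75, so a satellite with more local experience has more influence on the broadcast model.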

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention adopts an adaptive traffic offloading method based on federated reinforcement learning. By collecting information such as the topology of the space-ground integrated network, the states of inter-satellite and satellite-to-ground links, and the traffic loads and available communication bandwidths of adjacent satellites, the method uses reinforcement learning to make real-time intelligent decisions on traffic offloading requests from the ground, and is therefore applicable to space-ground integrated networks with highly dynamic topology;

(2) In the invention, LEO satellites upload only their locally trained model parameters to the GEO satellite for aggregation, which updates the global reinforcement learning model. The GEO satellite then sends the updated model back to the LEO satellites, so that each LEO satellite can quickly make traffic offloading decisions with the updated model. This process avoids the cost of exchanging redundant raw data, and the global model converges to one that integrates the traffic offloading characteristics of all LEO satellites;

(3) By deploying the federated reinforcement learning algorithm on the LEO satellites, the invention enables multiple LEO satellites to make cooperative, intelligent traffic offloading decisions, optimizes the average traffic offloading delay of user terminals in the space-ground integrated network, and effectively avoids congestion caused by unbalanced load in the network.

Drawings

FIG. 1 is a schematic diagram of the adaptive traffic offloading architecture of the space-ground integrated network according to the invention;

FIG. 2 is a schematic diagram of the traffic offloading flow of the space-ground integrated network in an embodiment of the invention.

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the accompanying drawings.

The invention provides an adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology. In this method, the main function of an LEO satellite is to receive traffic offloading requests from ground user terminals and decide whether to forward the traffic to an adjacent LEO satellite or to a GEO satellite; the forwarding decision is adjusted adaptively to the network state by a reinforcement learning model local to the LEO satellite. The main function of a GEO satellite is to forward traffic from LEO satellites to an earth station near the target user terminal; it is also responsible for collecting reinforcement learning model parameters from LEO satellites and performing periodic model aggregation. FIG. 1 shows the adaptive traffic offloading architecture of the space-ground integrated network according to the invention.

In the traffic offloading method for a space-ground integrated network with highly dynamic topology, the observation state space of an LEO satellite is defined as follows:

$$s_i = \{l_0, l_1, \ldots, l_k, \ldots, l_K, l_G,\; d_0, d_1, \ldots, d_k, \ldots, d_K, d_G,\; q_0, q_1, \ldots, q_k, \ldots, q_K, q_G,\; B_0, B_1, \ldots, B_k, \ldots, B_K, B_G\}$$

In addition, the distance between the LEO satellite and the ground user terminal, and between the LEO satellite and an adjacent satellite, can be calculated by the following formula:

where $x_i$, $y_i$, $z_i$ denote the position coordinates of the satellite or ground terminal, which are calculated as follows:

where $\theta_k$ and the accompanying angle denote the latitude and longitude of the satellite or ground terminal, respectively. In particular, when $h = 0$, $x_i$, $y_i$, $z_i$ give the position of a ground terminal.
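The two formulas are not reproduced in this text. One common way to realize them, shown in the Python sketch below, is to convert latitude, longitude, and altitude $h$ to Cartesian coordinates on a spherical Earth and take the Euclidean distance. The spherical-Earth model, the radius constant, and the function names are assumptions made for illustration, not details from the specification.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def to_cartesian(lat_deg: float, lon_deg: float, h_km: float = 0.0):
    """Convert latitude/longitude/altitude to Earth-centered (x, y, z) in km.

    h_km = 0 corresponds to a terminal on the ground.
    """
    r = EARTH_RADIUS_KM + h_km
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        r * math.cos(lat) * math.cos(lon),
        r * math.cos(lat) * math.sin(lon),
        r * math.sin(lat),
    )


def distance_km(p, q) -> float:
    """Euclidean distance between two Cartesian points."""
    return math.dist(p, q)
```

A satellite at 550 km altitude directly above a ground terminal is, as expected, 550 km away under this model.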

The traffic offloading decision space of an LEO satellite is defined as follows:

$$a_i = \{x_0, x_1, \ldots, x_k, \ldots, x_K,\; b_0, b_1, \ldots, b_k, \ldots, b_K\}$$

In a space-ground integrated network with highly dynamic topology, the main optimization objective of the traffic offloading algorithm is to minimize the average traffic offloading delay of the user terminals. The traffic offloading decisions of an LEO satellite are therefore evaluated on the basis of traffic offloading delay. For the traffic of a single user terminal, the reward is defined as follows:

1. For delay-sensitive traffic, the reward function for traffic offloading by the LEO satellite is defined as

where $t_i$ denotes the traffic offloading delay, $t_{\max,i}$ denotes the maximum tolerable traffic offloading delay, and $-\Gamma$ is a constant representing the penalty incurred when the traffic offloading delay exceeds the maximum tolerable delay.

2. For non-delay-sensitive traffic, the reward function for traffic offloading by the LEO satellite is defined as

In summary, the reward function obtained by the LEO satellite for a traffic offloading decision is defined as

where $\kappa_i$ denotes the traffic type, $\kappa_S$ denotes delay-sensitive traffic, $\kappa_N$ denotes non-delay-sensitive traffic, and the two cases are selected by an indicator function.
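The reward formulas are not reproduced in this text, so their exact shape is unknown. The Python sketch below only illustrates the structure the definitions imply: the reward depends on the traffic type, and delay-sensitive traffic receives the constant $-\Gamma$ when $t_i$ exceeds $t_{\max,i}$. Using the negative delay as the reward in all other cases is an assumption, as are the names `TrafficType` and `reward` and the default penalty value.

```python
from enum import Enum


class TrafficType(Enum):
    DELAY_SENSITIVE = "kappa_S"
    NON_DELAY_SENSITIVE = "kappa_N"


def reward(traffic_type: TrafficType, delay: float, max_delay: float, penalty: float = 10.0) -> float:
    """Assumed reward shape, not the specification's exact formula.

    Delay-sensitive: -delay while within max_delay, otherwise the constant -penalty.
    Non-delay-sensitive: -delay (max_delay is ignored).
    """
    if traffic_type is TrafficType.DELAY_SENSITIVE:
        return -delay if delay <= max_delay else -penalty
    return -delay
```

The branch on `traffic_type` plays the role of the indicator function that selects between the two reward definitions.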

The one-hop transmission delay in the space-ground integrated network is calculated by the following formula:

where $d_{ij}$ denotes the distance between node $i$ and node $j$, $c$ denotes the speed of light, $l_i$ denotes the size of the data packet, $R_{ij}$ denotes the transmission rate of the link between node $i$ and node $j$, and $q_i$ denotes the buffer queue length of node $i$.

Specifically, for a given LEO satellite, if a data packet from a ground user terminal can be forwarded to the target node in one hop through the LEO satellite, the traffic offloading delay is

$$t_i = t_{ul} + t_{lu}$$

where $t_{ul}$ denotes the one-hop transmission delay from the user terminal to the LEO satellite, and $t_{lu}$ denotes the one-hop transmission delay from the LEO satellite to the target user terminal.

If the data packet cannot be forwarded to the target node in one hop through the LEO satellite, but can be forwarded to the target node in two hops through an adjacent LEO satellite, the traffic offloading delay is

$$t_i = t_{ul} + t_{ln} + t_{nu}$$

where $t_{ln}$ denotes the one-hop transmission delay from the LEO satellite to its adjacent LEO satellite, and $t_{nu}$ denotes the one-hop transmission delay from the adjacent LEO satellite to the target user terminal.

If the data packet cannot be forwarded to the target node through the LEO satellite or its adjacent LEO satellites, the LEO satellite forwards the data packet to the GEO satellite. If the buffer queue of the GEO satellite is full, the data packet is discarded directly; otherwise, the data packet is forwarded to the earth station associated with the target node, and the earth station sends it to the target user terminal. The traffic offloading delay of this process is

where $t_{lg}$ denotes the one-hop transmission delay from the LEO satellite to the GEO satellite, $t_{gs}$ denotes the one-hop transmission delay from the GEO satellite to the earth station, $t_{su}$ denotes the transmission delay from the earth station to the user terminal, and the GEO satellite's buffer queue length is compared against its maximum buffer queue length.
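The one-hop delay formula and the GEO-path delay formula are not reproduced in this text. The Python sketch below shows one plausible composition of the listed quantities: a propagation term $d_{ij}/c$ plus the time to drain the queued data and the packet at rate $R_{ij}$, with the per-path delay given by the sum of its hops. Both the queueing term and the choice to signal a dropped packet with `None` are assumptions, and all function names are hypothetical.

```python
from typing import Optional, Sequence

C = 299_792_458.0  # speed of light in m/s


def hop_delay(distance_m: float, packet_bits: float, rate_bps: float, queue_bits: float = 0.0) -> float:
    """Assumed one-hop delay: propagation + (queued + packet bits) / link rate."""
    return distance_m / C + (queue_bits + packet_bits) / rate_bps


def path_delay(hop_delays: Sequence[float]) -> float:
    """One-hop (t_ul, t_lu) and two-hop (t_ul, t_ln, t_nu) cases: sum of the hops."""
    return sum(hop_delays)


def geo_path_delay(t_ul: float, t_lg: float, t_gs: float, t_su: float,
                   q_geo: float, q_geo_max: float) -> Optional[float]:
    """GEO relay case: None if the GEO buffer is full (packet dropped), else the hop sum."""
    if q_geo >= q_geo_max:
        return None
    return t_ul + t_lg + t_gs + t_su
```

A caller that trains on these delays would need to map the dropped-packet case to a penalty, since no finite delay is defined for it here.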

Calculating the traffic offloading delay requires the transmission rates of the links between nodes, namely the satellite-to-ground link rate and the inter-satellite link rate. The achievable transmission rate of an inter-satellite link can be calculated by the following formula:

where $P_{ts}$ denotes the transmit power of the LEO satellite, $G_{ts}$ denotes the gain of the LEO satellite's transmitting antenna, $G_{rs}$ denotes the gain of the LEO satellite's receiving antenna, $k$ denotes the Boltzmann constant, $N_s$ denotes the noise temperature of the overall communication system, $E_b/N_0$ denotes the ratio of received energy per bit to noise spectral density, and $L_f$ denotes the free-space attenuation of the signal power on the inter-satellite link, which is calculated as follows:

where $c$ denotes the speed of light, $d$ denotes the distance between the two LEO satellites, and $f$ is the center frequency of the transmitted signal.
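Neither formula is reproduced in this text. The Python sketch below uses the textbook link-budget form that matches the listed symbols: a free-space factor $L_f = (c / (4\pi d f))^2$ and a rate $P_{ts} G_{ts} G_{rs} L_f / (k N_s \cdot E_b/N_0)$, with all quantities in linear units rather than decibels. Whether the specification uses exactly this form, for example with an additional link margin, is not known; the function names are hypothetical.

```python
import math

C = 299_792_458.0          # speed of light in m/s
BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K


def free_space_gain(distance_m: float, freq_hz: float) -> float:
    """Free-space factor L_f = (c / (4*pi*d*f))**2, a value below 1 for realistic links."""
    return (C / (4.0 * math.pi * distance_m * freq_hz)) ** 2


def isl_rate_bps(p_tx_w: float, g_tx: float, g_rx: float, distance_m: float,
                 freq_hz: float, noise_temp_k: float, ebn0: float) -> float:
    """Assumed achievable inter-satellite rate from a linear-unit link budget."""
    l_f = free_space_gain(distance_m, freq_hz)
    return p_tx_w * g_tx * g_rx * l_f / (BOLTZMANN * noise_temp_k * ebn0)
```

Because $L_f$ falls with the square of the distance, doubling the inter-satellite distance quarters the achievable rate under this model.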

The achievable transmission rate of a satellite-to-ground link is calculated as follows:

$$R_{SG} = B \log_2\left(1 + \mathrm{SNR}_{SG}\right)$$

where $\mathrm{SNR}_{SG}$ denotes the signal-to-noise ratio of the satellite-to-ground link, which can be calculated by the following formula:

where $P_t$ denotes the transmit power of the transmitting end, $G_t$ denotes the antenna gain of the transmitting end, $G_r$ denotes the antenna gain of the receiving end, $L_r$ denotes the atmospheric attenuation of the satellite-to-ground link, $\sum P_{t'} G_{t'} G_{r'} L_{f'} L_{r'}$ denotes the interference that transmissions on other satellite-to-ground links cause on this link, and $N$ denotes the noise power.
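The signal-to-noise ratio formula is not reproduced in this text. Given the listed terms, a reasonable reading is the received signal power $P_t G_t G_r L_f L_r$ divided by the sum of the interference and the noise power $N$. The Python sketch below computes the rate under that reading; the reading itself and the function names are assumptions, and all quantities are in linear units.

```python
import math
from typing import Iterable


def received_power(p_t: float, g_t: float, g_r: float, l_f: float, l_r: float) -> float:
    """Received power as the product P_t * G_t * G_r * L_f * L_r (linear units)."""
    return p_t * g_t * g_r * l_f * l_r


def sg_rate_bps(bandwidth_hz: float, signal_w: float,
                interference_w: Iterable[float], noise_w: float) -> float:
    """R_SG = B * log2(1 + SNR_SG), with SNR_SG = signal / (sum of interference + noise)."""
    snr = signal_w / (sum(interference_w) + noise_w)
    return bandwidth_hz * math.log2(1.0 + snr)
```

The interference list would hold one `received_power` value for each other satellite-to-ground link that is active on the same band.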

As shown in FIG. 2, the flow of the LEO satellite traffic offloading method based on federated reinforcement learning according to the invention is as follows:

1. The ground user terminal sends a traffic offloading request to the LEO satellite whose coverage area it is located in;

2. After receiving the traffic offloading request, the LEO satellite receives the offloaded traffic from the ground user terminal;

3. The LEO satellite decides how to forward the offloaded traffic:

a) If the target node of the offloaded data is in the coverage area of the LEO satellite, the data is forwarded directly to the target node;

b) If the target node of the offloaded data is not in the coverage area of the LEO satellite, the LEO satellite constructs an observation state vector $s_i$ from the collected network state information, inputs $s_i$ into its locally trained reinforcement learning model to obtain a traffic offloading decision $a_i$, and forwards the traffic to an adjacent LEO satellite or the GEO satellite according to that decision;

4. If the offloaded traffic is forwarded to an adjacent LEO satellite, the adjacent LEO satellite in turn executes the traffic offloading decision process of step 3;

5. If the offloaded traffic is forwarded to the GEO satellite and the GEO satellite's buffer queue is full, the data packet is discarded directly; otherwise, the data is forwarded to an earth station near the target terminal, which forwards it to the target terminal.
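The forwarding flow above can be sketched as two small functions: one for the per-satellite decision in step 3 and one for the GEO buffer check in step 5. In this Python sketch the `policy` callable is a hypothetical stand-in for the locally trained reinforcement learning model, and the convention that index 0 means the GEO satellite while index k >= 1 means adjacent LEO satellite k is an assumption modeled on the ordering of the decision vector.

```python
from typing import Callable, List, Sequence


def forward_decision(target_in_coverage: bool, state: Sequence[float],
                     policy: Callable[[Sequence[float]], int]) -> str:
    """Step 3: return 'direct', 'geo', or 'leo:<k>' for the next hop."""
    if target_in_coverage:
        return "direct"          # step 3a: one-hop delivery to the target node
    choice = policy(state)       # step 3b: query the local model with s_i
    return "geo" if choice == 0 else f"leo:{choice}"


def geo_enqueue(queue: List[bytes], packet: bytes, capacity: int) -> bool:
    """Step 5: drop the packet if the GEO buffer is full, otherwise enqueue it."""
    if len(queue) >= capacity:
        return False
    queue.append(packet)
    return True
```

A satellite that receives `leo:<k>` traffic would call `forward_decision` again with its own state, which corresponds to step 4.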

The reinforcement learning model is trained as follows:

(1) The LEO satellite calculates a reward $R_i$ from the actual offloading delay of the traffic and stores the tuple $\langle s_i, a_i, R_i, s_{i+1} \rangle$ in the experience data set;

(2) Within each period, the LEO satellite randomly draws a number of samples from its local experience data set, trains the Q-learning-based reinforcement learning model on them, and updates the model parameters by the following method:

where $w$ denotes the parameters of the Q-network, $\eta$ denotes the learning rate, and $L_Q(w)$ is the loss function, calculated as follows:

(3) The GEO satellite periodically aggregates the reinforcement learning models:

a) The GEO satellite randomly selects $N_C$ LEO satellites;

b) Each selected LEO satellite uploads its locally trained model parameters $w_c$ and the number of samples $n_c$ in its training sample set to the GEO satellite;

c) The GEO satellite performs model aggregation based on the received LEO satellite training models and sample counts, in the following manner:

(4) The GEO satellite broadcasts the aggregated reinforcement learning network model parameters to the LEO constellation;

(5) After receiving the aggregated model parameters from the GEO satellite, the LEO satellite updates its local reinforcement learning model parameters.

Matters not described in detail in this specification are common knowledge to those skilled in the art.

Claims

1. An adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology, characterized by comprising the following steps:

2. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 1, wherein the traffic offloading decision of step (3) is made as follows:

3. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 2, wherein the observation state vector $s_i$ is constructed as follows:

where $l_0$ denotes the link duration between the LEO satellite and the ground user terminal, $l_k$ denotes the link duration between the LEO satellite and adjacent LEO satellite $k$, and $l_G$ denotes the link duration between the LEO satellite and the GEO satellite; $d_0$ denotes the distance between the LEO satellite and the ground user terminal, $d_k$ denotes the distance between the LEO satellite and adjacent LEO satellite $k$, and $d_G$ denotes the distance between the LEO satellite and the GEO satellite; $q_0$ denotes the buffer queue length of the LEO satellite, $q_k$ denotes the buffer queue length of adjacent LEO satellite $k$, and $q_G$ denotes the buffer queue length of the GEO satellite; $B_0$ denotes the available communication bandwidth of the LEO satellite, $B_k$ denotes the available communication bandwidth of adjacent LEO satellite $k$, and $B_G$ denotes the available communication bandwidth of the GEO satellite.

4. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 3, wherein, in each period, the LEO satellite broadcasts a Hello packet to its adjacent LEO satellites, the packet containing the LEO satellite's local buffer queue length, position parameters, motion parameters, and available communication bandwidth.

5. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 3, wherein the reinforcement learning model is trained as follows:

6. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 5, wherein the LEO satellite calculates the reward $R_i$ from the actual offloading delay of the traffic as follows:

7. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 6, wherein the traffic offloading decision $a_i$ is:

$$a_i = \{x_0, x_1, \ldots, x_k, \ldots, x_K,\; b_0, b_1, \ldots, b_k, \ldots, b_K\}$$

8. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 6, wherein the one-hop transmission delay in the space-ground integrated network is calculated by the following formula:

9. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 8, wherein, if a data packet from a ground user terminal can be forwarded to the target node in one hop through the LEO satellite, the traffic offloading delay is

$$t_i = t_{ul} + t_{lu}$$

$$t_i = t_{ul} + t_{ln} + t_{nu}$$

10. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 9, wherein the reinforcement learning model is based on a Q-learning model, and the model parameters are updated as follows:

where $\lambda$ is the discount factor and $Q(s, a; w)$ is the Q-network.

11. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 5, wherein, in step (b3), the GEO satellite periodically aggregates the reinforcement learning models as follows:

12. The adaptive traffic offloading method for a space-ground integrated network with highly dynamic topology of claim 11, wherein model aggregation is performed in the following manner: