CN114629547B

CN114629547B - High-throughput beam hopping scheduling method for differentiated services

Info

Publication number: CN114629547B
Application number: CN202210273871.5A
Authority: CN
Inventors: 白卫岗; 刘聪俐; 李建东; 史琰; 周笛; 李浩然; 朱彦
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-03-19
Filing date: 2022-03-19
Publication date: 2023-01-20
Anticipated expiration: 2042-03-19
Also published as: CN114629547A

Abstract

The invention provides a high-throughput beam hopping scheduling method for differentiated services, which mainly addresses the low utilization of on-board satellite resources and the high computational complexity of the prior art. The implementation is as follows: divide the wave position cells within the satellite coverage area into different clusters; establish an inter-cluster load-balancing clustering model according to the principle of load balancing among clusters and the principle of geographical proximity of the wave position cells within a cluster; solve this clustering model with an immune algorithm to obtain the clustering result; determine, for each service type, a delay tolerance measured in hopping time slots according to its delay requirement; establish a beam hopping scheduling model based on the delay tolerance; and set up a beam dynamic scheduling model in each cluster, treat it as a Markov decision process, and solve it with a deep reinforcement learning algorithm to obtain the beam scheduling result. The invention improves resource utilization and reduces computational complexity while guaranteeing throughput, and can be used for satellite resource allocation.

Description

High-throughput beam hopping scheduling method for differentiated services

Technical Field

The invention relates to the field of satellite communication, in particular to a beam clustering scheduling method for beam hopping satellites, which can be used to allocate satellite resources reasonably to cells with different service requirements in scenarios where the service demand within the satellite coverage area changes rapidly.

Background

Early single-beam satellites mostly used global or regional beams, which have a wide beam width and low antenna gain. To cope with rapidly growing traffic demand, multi-beam technology based on spot beams was adopted. Because spot beams are narrow, their gain is high, and combined with frequency reuse they significantly increase satellite system capacity and allow more services to be served. However, with the rapid development of communications and the Internet of Things, terrestrial traffic is unevenly distributed in space and time, and the high mobility of low-orbit satellites exposes them to even more uneven traffic. To address this problem, improve resource utilization, and avoid as far as possible the situation in which some beams are overloaded while others sit idle, beam hopping technology has been proposed, further increasing system capacity. In terms of resource allocation, however, existing beam hopping schemes are usually designed around the heterogeneity of ground traffic demand and focus on maximizing throughput to meet the capacity demands of different areas, without considering factors such as service type and delay. In practice, different users have different delay requirements, so both throughput and service delay must be considered to guarantee user experience. In terms of clustering, to simplify the system and improve resource utilization, uniform clustering, uniform power allocation, and full frequency reuse are usually adopted; that is, every cluster is allocated the same, non-adjustable power. When inter-cluster resources are identical but uniform clustering ignores the non-uniformity of traffic across clusters, some clusters become overloaded and others underloaded.

The patent document "Beam hopping pattern optimization method and device based on a time slot allocation algorithm, and storage medium" (patent application No. 201910675600.0, application publication No. CN 110518956A), filed by the Army Engineering University of PLA, discloses an improved beam hopping time slot allocation method for a clustering scenario. The method first pre-allocates a number of time slots to each cell and then reallocates them using a co-channel interference distance threshold, which effectively suppresses the impact of interference on signal quality while increasing system capacity. However, the method does not consider service delay, so real-time services may fail because they time out while waiting.

A resource scheduling method based on beam hopping is disclosed in a patent document filed by Shanghai Yuanxin Satellite Science and Technology Co., Ltd. (patent application No. 201811070246.0, application publication No. CN 109121147A). In this method, the satellite coverage area is represented by two three-dimensional matrices. The first is the user actual demand matrix, made three-dimensional by adding a time dimension; the second is the beam hopping service matrix, to which a time dimension is likewise added. The target matrix to be optimized is obtained by multiplying the two matrices and is then solved. The method matches satellite capacity to ground demand in fast-moving low-orbit satellite scenarios. However, as the numbers of satellite beams and wave position cells increase, the search space of an optimization algorithm performing global scheduling grows sharply, and so does the algorithm complexity.

Disclosure of Invention

The invention aims to overcome the above shortcomings of the prior art by providing a high-throughput beam hopping scheduling method for differentiated services that guarantees throughput, improves resource utilization, and reduces computational complexity.

The specific idea for achieving this purpose is as follows: a load-balancing clustering model solved by an immune algorithm divides the wave position cells within the satellite coverage area into different clusters, which reduces computational complexity; then a beam resource scheduling model subject to delay tolerance constraints is constructed, and beam resource scheduling is completed with a deep reinforcement learning method based on prioritized replay.

According to the above thought, the technical scheme of the invention comprises the following steps:

(1) Generate the clustering result within the satellite coverage area:

1a) Divide the wave position cells within the satellite coverage area into different clusters;

1b) Establish an inter-cluster load-balancing clustering model according to the principle of load balancing among clusters and the principle of geographical proximity of the wave position cells within a cluster:

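$$\min\ P=\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^{2}$$

$$\text{s.t.}\quad C_1:\ d_{mn}\le s,\quad m\in M_i\ \text{and}\ n\in M_i$$

$$C_2:\ \sum_{i=1}^{K}X_{ij}=1,\quad X_{ij}\in\{0,1\},\quad j=1,\dots,N$$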

where P is the objective function minimizing the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, R̄ is the mean load of the K clusters, C₁ is the distance constraint, C₂ is the constraint ensuring that each wave position cell belongs to exactly one cluster, m and n are two wave position cells in the same cluster, d_mn is the distance between the centers of wave positions m and n, s is the upper limit on the distance between two wave position centers in a cluster, and M_i is the set of wave position cells belonging to cluster i;

1c) Solve the inter-cluster load-balancing clustering model with an immune algorithm to obtain the clustering result;

(2) Establish the intra-cluster beam dynamic scheduling model:

2a) For each service type, determine a delay tolerance measured in hopping time slots according to its delay requirement;

2b) Model the process in which packets of different types wait in the on-board buffer queue to be delivered: each time a hopping time slot passes, the delay tolerance of a packet decreases by one, and a packet whose on-board waiting delay exceeds its delay tolerance is discarded and counted as a timeout failure;

2c) Following the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate, establish a beam hopping scheduling model based on the delay tolerance:

where P₁ is the objective of maximizing the service guarantee rate of intra-cluster traffic, P₂ is the objective of minimizing the intra-cluster service timeout failure rate, T is the set of all decision moments of the beam hopping satellite during the time it covers the same area, and N is the total number of wave position cells in the cluster. The model further uses the following quantities: the number of packets delivered to cell n after slot t_j ends; the total number of packets destined for cell n that the satellite has received up to slot t_j; the number of packets destined for wave position cell n that have failed because of timeout waiting; an indicator of whether wave position cell n is illuminated by the working beam in slot t_j, equal to 1 if the cell is illuminated and 0 otherwise; and the number of packets stored in the on-board memory for wave position cell n after slot t_j ends. C₁ is the beam constraint, which ensures that exactly one cell in the cluster receives the beam in each hopping time slot; C₂ is the buffer constraint, which ensures that the packets on board at the current moment do not exceed the maximum limit, i.e., the number of packets stored for each cell does not exceed L, where L is the maximum capacity of the on-board buffer queue of each wave position cell;

(3) Establish the beam dynamic scheduling model of step (2) in each cluster obtained in step (1), treat the scheduling problem as a Markov decision process, and solve it with deep reinforcement learning to obtain the beam scheduling result.

Compared with the prior art, the invention has the following advantages:

First, computational complexity is reduced: for a satellite system with uniform power allocation and full frequency reuse, as the numbers of satellite beams and wave position cells grow, the search space of an optimization algorithm performing global scheduling grows sharply, and so does the algorithm complexity; the load-balancing clustering model established by the invention divides the complex task into several subtasks, which shrinks the search space and thus reduces computational complexity.

Second, the system service guarantee rate is improved: unlike existing beam hopping scheduling algorithms, the invention takes service differences into account during scheduling and provides a beam scheduling model based on delay tolerance. Concretely, an optimization problem is formulated with maximizing the service guarantee rate and minimizing the service timeout failure rate as the objectives, which increases system throughput, improves the service guarantee rate, and lowers the service timeout failure rate.

Description of the drawings:

FIG. 1 is a general flow chart of an implementation of the present invention;

FIG. 2 is a sub-flowchart of solving the load-balancing clustering model with the immune algorithm in the present invention;

FIG. 3 is the wave position cell layout of the present invention;

FIG. 4 is a diagram of the clustering result in the present invention;

FIG. 5 is a state reconstruction diagram in the present invention;

FIG. 6 compares the deep reinforcement learning convergence speed of the present invention with that of an existing global beam hopping algorithm;

FIG. 7 compares the normalized throughput of the present invention with that of different existing beam hopping algorithms;

FIG. 8 compares the service guarantee rate of the present invention with that of different existing beam hopping algorithms.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with examples are described in further detail below.

This example consists of two main parts: the first uses an immune algorithm to generate a load-balanced clustering result, and the second uses a deep reinforcement learning algorithm to obtain the beam dynamic scheduling result.

Referring to FIG. 1, the implementation steps of this example are as follows:

Part 1: use an immune algorithm to produce a load-balanced clustering result.

Step 1, divide the wave position cells within the satellite coverage area into different clusters.

Initialize the service request volumes and geographic positions of all cells in the coverage area of the beam hopping satellite; as shown in FIG. 3, these comprise the relative geographic positions of the 19 wave position cells in the coverage area, the service request volume of each cell, and the wave position cell numbers;

Determine the number of clusters from the number K of working beams owned by a single satellite, with working beams and clusters in one-to-one correspondence, so that each working beam is responsible for dynamic beam scheduling within one cluster.

In this embodiment, the satellite has 3 working beams, so 3 clusters are formed according to the one-to-one correspondence between working beams and clusters, and each working beam performs dynamic beam scheduling within one cluster.

Step 2, establish the inter-cluster load-balancing clustering model according to the principle of load balancing among clusters and the principle of geographical proximity of the wave position cells within a cluster.

2.1) Determine the service request volume of each wave position cell and calculate the sum S of the service request volumes of all cells in the satellite coverage area;

2.2) Calculate the mean load of the K clusters:

$$\bar{R}=\frac{S}{K}=\frac{1}{K}\sum_{j=1}^{N}R_j$$

and establish the load-balancing optimization objective:

$$\min\ P=\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^{2}$$

where P is the objective function minimizing the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, and R̄ is the mean load of the K clusters;
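The load-balancing objective of 2.2) can be made concrete with a short Python sketch. The original formula image is not preserved, so the 1/K averaging of the squared deviations is an assumption. X_ij is encoded as an assignment list in which assignment[j] = i means X_ij = 1, so the one-cluster-per-cell condition holds by construction; the demand values are illustrative, not taken from the embodiment.

```python
from typing import List, Sequence


def cluster_loads(demand: Sequence[float], assignment: Sequence[int], k: int) -> List[float]:
    """Load of each cluster i: the sum of R_j over the cells j with X_ij = 1."""
    loads = [0.0] * k
    for j, cluster in enumerate(assignment):
        loads[cluster] += demand[j]
    return loads


def load_mse(demand: Sequence[float], assignment: Sequence[int], k: int) -> float:
    """Objective P: mean squared deviation of the K cluster loads from R_bar = S / K."""
    r_bar = sum(demand) / k  # S / K
    return sum((load - r_bar) ** 2 for load in cluster_loads(demand, assignment, k)) / k


demand = [4, 2, 3, 3, 6, 0]  # illustrative R_j values
print(load_mse(demand, [0, 0, 1, 1, 2, 2], 3))  # every cluster carries 6
print(load_mse(demand, [0, 1, 2, 0, 1, 2], 3))  # loads 7, 8 and 3
```

A lower value means the clusters carry more equal traffic, which is what the immune algorithm of Step 3 searches for.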

2.3) Determine the upper limit s on the distance between two wave position centers in a cluster, and establish the distance constraint:

$$d_{mn}\le s,\quad m\in M_i\ \text{and}\ n\in M_i$$

where m and n are two wave position cells in the same cluster, d_mn is the distance between the centers of wave positions m and n, and M_i is the set of wave position cells belonging to cluster i.

2.4) Combine the load-balancing optimization objective of 2.2) with the distance constraint of 2.3) to obtain the inter-cluster load-balancing clustering model:

$$\min\ P=\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^{2}$$

$$\text{s.t.}\quad C_1:\ d_{mn}\le s,\quad m\in M_i\ \text{and}\ n\in M_i$$

$$C_2:\ \sum_{i=1}^{K}X_{ij}=1,\quad X_{ij}\in\{0,1\},\quad j=1,\dots,N$$

where P is the objective function minimizing the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, R̄ is the mean load of the K clusters, C₁ is the distance constraint, C₂ is the constraint ensuring that each wave position cell belongs to exactly one cluster, m and n are two wave position cells in the same cluster, d_mn is the distance between the centers of wave positions m and n, s is the upper limit on the distance between two wave position centers in a cluster, and M_i is the set of wave position cells belonging to cluster i.

Step 3, solve the inter-cluster load-balancing clustering model with an immune algorithm to obtain the clustering result.

Referring to FIG. 2, this step is implemented as follows:

3.1) Initialize the cluster-center wave position population and the memory bank: according to the number of wave position cells and the number of clusters, set the number of iterations N_e, the population size S, the memory bank capacity O, the crossover probability P_c, the mutation probability P_m, and the diversity evaluation parameter P_s, and set the current iteration number n = 0; randomly generate an initial population of M cluster-center antibodies, with M = 35 in this embodiment;

3.2) Design the affinity function A_v of the clustering model from the following quantities: P, the objective function minimizing the mean square error of the load among clusters; C, a penalty constant applied to solutions that do not satisfy the distance requirement; and Y, the number of wave positions that do not satisfy the distance constraint C₁;
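The affinity of 3.2) combines P with a penalty C·Y, but its exact expression is not preserved. The Python sketch below assumes A_v = 1 / (1 + P + C·Y), so that balanced, compact clusterings score higher. It also assumes that an antibody is a list of K distinct cell indices used as cluster centers (3.1 speaks of a cluster-center population) and that every cell joins its nearest center. The names decode, count_violations and affinity are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def decode(antibody: Sequence[int], centers: Sequence[Point]) -> List[int]:
    """Hypothetical decoding: each cell joins the cluster whose center cell is nearest."""
    def dist(a: Point, b: Point) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return [min(range(len(antibody)), key=lambda i: dist(c, centers[antibody[i]]))
            for c in centers]


def load_mse(demand: Sequence[float], assignment: Sequence[int], k: int) -> float:
    """Objective P: mean squared deviation of the cluster loads from S / K."""
    loads = [0.0] * k
    for j, i in enumerate(assignment):
        loads[i] += demand[j]
    r_bar = sum(demand) / k
    return sum((x - r_bar) ** 2 for x in loads) / k


def count_violations(centers: Sequence[Point], assignment: Sequence[int], s: float) -> int:
    """Y: number of cells with at least one same-cluster partner farther than s (C1)."""
    y = 0
    for m, cm in enumerate(centers):
        if any(n != m and assignment[n] == assignment[m]
               and math.hypot(cm[0] - cn[0], cm[1] - cn[1]) > s
               for n, cn in enumerate(centers)):
            y += 1
    return y


def affinity(antibody: Sequence[int], centers: Sequence[Point],
             demand: Sequence[float], s: float, penalty: float) -> float:
    """Assumed form A_v = 1 / (1 + P + C * Y)."""
    assignment = decode(antibody, centers)
    p = load_mse(demand, assignment, len(antibody))
    return 1.0 / (1.0 + p + penalty * count_violations(centers, assignment, s))
```

The added 1 in the denominator keeps the affinity finite when a clustering is perfectly balanced and violates nothing.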

3.3) Calculate the affinity (fitness) values of all individuals using the expression in 3.2);

3.4) Calculate the reproduction rate and antibody concentration of every individual:

3.4.1) Calculate the affinity between two antibodies:

$$S_{v,s}=\frac{k_{v,s}}{L}$$

where S_{v,s} is the affinity between antibodies v and s, k_{v,s} is the number of identical bits in antibodies s and v, and L is the length of an antibody;

3.4.2) Calculate the antibody concentration from the result of 3.4.1), where C_v is the concentration of antibody v, N is the total number of antibodies, and T is a preset threshold;

3.4.3) Calculate the reproduction rate from the affinity function value and the antibody concentration, where P is the reproduction rate, α is a constant, A_v is the affinity (fitness) function value, and C_v is the antibody concentration;
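Steps 3.4.1) to 3.4.3) can be sketched in Python as follows. Similarity follows S_{v,s} = k_{v,s}/L position by position. The concentration counts, for each antibody, the share of the population (itself included) whose similarity exceeds T; this assumes T < 1, so that every C_v is positive. The reproduction-rate formula is not preserved, and the blend used here, α·A_v/ΣA + (1 − α)·(1/C_v)/Σ(1/C), is an assumption chosen so that high affinity and low concentration both raise the rate, matching the preference stated in 3.5).

```python
from typing import List, Sequence


def similarity(v: Sequence[int], s: Sequence[int]) -> float:
    """S_{v,s} = k_{v,s} / L: share of gene positions on which antibodies v and s agree."""
    k = sum(1 for a, b in zip(v, s) if a == b)
    return k / len(v)


def concentrations(pop: Sequence[Sequence[int]], threshold: float) -> List[float]:
    """C_v: fraction of the N antibodies (v included) whose similarity to v exceeds T."""
    n = len(pop)
    return [sum(1 for s in pop if similarity(v, s) > threshold) / n for v in pop]


def reproduction_rates(affinities: Sequence[float], conc: Sequence[float],
                       alpha: float) -> List[float]:
    """Assumed blend: alpha * A_v / sum(A) + (1 - alpha) * (1 / C_v) / sum(1 / C)."""
    inv = [1.0 / c for c in conc]
    sum_a, sum_inv = sum(affinities), sum(inv)
    return [alpha * a / sum_a + (1 - alpha) * i / sum_inv
            for a, i in zip(affinities, inv)]


pop = [[0, 2, 5], [0, 2, 6], [7, 8, 9]]   # illustrative cluster-center antibodies
conc = concentrations(pop, 0.5)
print(conc, reproduction_rates([1.0, 1.0, 1.0], conc, 0.5))
```

With equal affinities, the third antibody, which resembles no other, gets the highest rate; the rates always sum to 1, so they can feed roulette-wheel selection directly.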

3.5) According to the reproduction rate, add individuals with high fitness and low antibody concentration to the memory bank, and take the first S individuals to form the parent population;

3.6) Select individuals by roulette-wheel selection and apply crossover and mutation to obtain a new population, then take some individuals from the memory bank and combine them with it to form the next population;
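Step 3.6) can be sketched as follows, again treating an antibody as a list of distinct center-cell indices (an assumption). Selection is roulette-wheel via random.Random.choices, crossover is single-point with probability P_c, and mutation replaces a gene by an unused cell index with probability P_m. Offspring whose crossover would repeat a center are discarded in favor of the parents. Merging in individuals from the memory bank is left out.

```python
import random
from typing import List, Sequence, Tuple


def roulette_pick(pop: Sequence[List[int]], rates: Sequence[float],
                  rng: random.Random) -> List[int]:
    """Pick one antibody with probability proportional to its reproduction rate."""
    return list(rng.choices(pop, weights=rates, k=1)[0])


def crossover(a: List[int], b: List[int], pc: float,
              rng: random.Random) -> Tuple[List[int], List[int]]:
    """Single-point crossover with probability pc; children repeating a center are rejected."""
    if len(a) > 1 and rng.random() < pc:
        cut = rng.randrange(1, len(a))
        c1, c2 = a[:cut] + b[cut:], b[:cut] + a[cut:]
        if len(set(c1)) == len(c1) and len(set(c2)) == len(c2):
            return c1, c2
    return list(a), list(b)


def mutate(ab: List[int], n_cells: int, pm: float, rng: random.Random) -> List[int]:
    """With probability pm per gene, replace it by a cell index the antibody does not use."""
    out = list(ab)
    for g in range(len(out)):
        if rng.random() < pm:
            free = [c for c in range(n_cells) if c not in out]
            if free:
                out[g] = rng.choice(free)
    return out


def next_generation(pop: Sequence[List[int]], rates: Sequence[float], n_cells: int,
                    pc: float, pm: float, rng: random.Random) -> List[List[int]]:
    """Roulette selection, crossover and mutation until the population size is restored."""
    children: List[List[int]] = []
    while len(children) < len(pop):
        a, b = roulette_pick(pop, rates, rng), roulette_pick(pop, rates, rng)
        for child in crossover(a, b, pc, rng):
            children.append(mutate(child, n_cells, pm, rng))
    return children[:len(pop)]


rng = random.Random(0)
pop = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# 19 cells, P_c = 0.5 and P_m = 0.4 as in the simulation settings
print(next_generation(pop, [0.2, 0.3, 0.5], 19, 0.5, 0.4, rng))
```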

3.7) Check whether the maximum number of iterations N_e has been reached:

if N_e has been reached, output the optimal clustering result A;

otherwise, repeat 3.3) to 3.6) until N_e iterations have been performed, and then output the optimal clustering result A.

The clustering result obtained in this embodiment is shown in FIG. 4, where wave position cells of the same color belong to one cluster.

Part 2: use a deep reinforcement learning algorithm to obtain the beam dynamic scheduling result.

Step 4, determine the delay tolerance, in units of hopping time slots, according to the delay requirements of the different service types.

4.1) Calculate the delay a packet has already experienced in transmission before reaching the destination satellite:

$$T_{delay\_1}=T_{prop}+T_{trans}$$

where T_delay_1 is the total delay experienced before reaching the destination satellite, T_prop is the propagation delay, and T_trans is the transmission delay;

4.2) Estimate the transmission delay T_delay_2 of the packet from the destination satellite to the user terminal;

4.3) Determine the delay limit T_limit in the QoS guarantee of the packet's service type and the length BH_slot of a beam hopping time slot, and calculate the remaining delay tolerance D_tole of the packet from T_limit, T_delay_1, T_delay_2, and BH_slot, where D_tole is the remaining delay tolerance and T_delay_1 is the total delay experienced in transmission before reaching the destination satellite. In this embodiment, three service types are configured, with remaining delay tolerances of 2, 4, and 20 slots, respectively.
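The remaining-tolerance calculation of 4.1) to 4.3) can be sketched as below. Only the inputs of the formula are listed in the text, so the expression D_tole = ⌊(T_limit − T_delay_1 − T_delay_2) / BH_slot⌋, clipped at 0, is an assumption. The delay values in the example are illustrative; only the 20 ms slot length comes from the simulation settings.

```python
import math


def remaining_tolerance(t_limit_ms: float, t_prop_ms: float, t_trans_ms: float,
                        t_delay2_ms: float, slot_ms: float) -> int:
    """Remaining delay tolerance D_tole in whole hopping slots (assumed floor rounding)."""
    t_delay1 = t_prop_ms + t_trans_ms             # T_delay_1 = T_prop + T_trans
    budget = t_limit_ms - t_delay1 - t_delay2_ms  # time left for on-board waiting
    return max(0, math.floor(budget / slot_ms))


# Illustrative values only: 100 ms QoS limit, 5 ms + 2 ms before the satellite,
# 5 ms down to the terminal, 20 ms hopping slots, i.e. 88 ms of waiting budget.
print(remaining_tolerance(100, 5, 2, 5, 20))
```

Rounding down is the conservative choice: a packet is never promised a slot that would push its end-to-end delay past T_limit.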

Step 5, decrease the remaining delay tolerance as time passes.

Model the process in which packets of different types wait in the on-board buffer queue to be delivered: each time a hopping time slot passes, a packet's delay tolerance decreases by one, and a packet whose on-board waiting delay exceeds its tolerance is discarded and counted as a timeout failure.
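Step 5 can be sketched with each cell's queue held as a list of remaining tolerances, a representation chosen here for illustration. The drop rule is one reading of the text: a packet is discarded once its counter reaches zero, so a packet that arrives with tolerance D gets D slots in which it can still be served.

```python
from typing import List, Tuple


def age_queue(queue: List[int]) -> Tuple[List[int], int]:
    """Advance one hopping slot for one cell's queue of remaining tolerances.

    Every waiting packet loses one slot of tolerance; packets whose counter reaches
    zero are discarded and returned as the number of timeout failures.
    """
    aged = [d - 1 for d in queue]
    kept = [d for d in aged if d > 0]
    return kept, len(aged) - len(kept)


# One packet of each service type, with tolerances 2, 4 and 20 slots as in the embodiment.
queue = [2, 4, 20]
for slot in range(1, 5):
    queue, dropped = age_queue(queue)
    print(slot, queue, dropped)
```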

Step 6, establish a beam hopping scheduling model based on the delay tolerance, following the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate.

6.1) Characterize the intra-cluster beam hopping system scenario:

Let the cluster served by one working beam contain N wave position cells. The service request volume of each cell is expressed in packets, and packet arrivals to cell i follow a Poisson distribution with arrival rate λ_i, i = 1, 2, ..., N;

The data stored in the on-board memory for the wave position cells are represented as an N-element vector whose nth element is the number of packets stored in the on-board memory for the nth wave position cell after slot t_j ends;
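To generate the traffic of 6.1) in a simulation, per-cell arrival counts can be drawn from Poisson(λ_i) once per slot. This sketch uses Knuth's multiplication method so that it needs only the Python standard library; the rates in the example are illustrative and do not come from the patent.

```python
import math
import random
from typing import List, Sequence


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product drops to e^(-lam) or below."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def arrivals(rates: Sequence[float], rng: random.Random) -> List[int]:
    """Packets arriving for each of the N cells in one slot; cell i ~ Poisson(lambda_i)."""
    return [poisson(lam, rng) for lam in rates]


rng = random.Random(42)
print(arrivals([0.5, 1.0, 3.0], rng))  # illustrative lambda_i values
```

Knuth's method takes about λ + 1 uniform draws per sample and becomes unusable once e^(−λ) underflows, so it suits the small per-slot rates of a single cell rather than aggregate traffic.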

6.2) According to the packet arrivals of each cell, establish the optimization objective functions that maximize the service guarantee rate and minimize the service timeout failure rate, where P₁ is the objective of maximizing the service guarantee rate of intra-cluster traffic, P₂ is the objective of minimizing the intra-cluster service timeout failure rate, T is the set of all decision moments of the beam hopping satellite during the time it covers the same area, and N is the total number of wave position cells in the cluster; the objectives are expressed in terms of the total number of packets destined for cell n that the satellite has received up to slot t_j and the number of packets destined for wave position cell n that have failed because of timeout waiting;
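The exact normalization of P₁ and P₂ is not preserved. The sketch below assumes both are aggregate ratios over the cluster: delivered packets divided by received packets, and timed-out packets divided by received packets. A per-cell average would be an equally plausible reading; the per-cell counts in the example are illustrative.

```python
from typing import Sequence, Tuple


def cluster_rates(delivered: Sequence[int], received: Sequence[int],
                  timed_out: Sequence[int]) -> Tuple[float, float]:
    """Assumed aggregate (guarantee rate, timeout failure rate) over the N cells of a cluster."""
    total = sum(received)
    if total == 0:
        raise ValueError("no packets received yet; both rates are undefined")
    return sum(delivered) / total, sum(timed_out) / total


# Two cells: 8 of 10 and 5 of 10 packets delivered, 1 and 3 packets timed out.
print(cluster_rates([8, 5], [10, 10], [1, 3]))
```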

6.3) According to the one-to-one correspondence between working beams and clusters, establish the beam constraint: exactly one wave position cell in the cluster is illuminated by the working beam in each slot, where the indicator of whether wave position cell n is illuminated by the working beam in slot t_j equals 1 if the cell is illuminated and 0 otherwise;

6.4) Determine the maximum capacity L of the on-board buffer queue of each wave position cell, and establish the buffer constraint: after each slot t_j, the number of packets stored in the on-board memory for each wave position cell must not exceed L;

6.5) Combine the optimization objectives of 6.2) with the beam constraint of 6.3) and the buffer constraint of 6.4) to obtain the beam hopping scheduling model based on the delay tolerance, where C₁ is the beam constraint, ensuring that exactly one cell in the cluster receives beam scheduling in each hopping time slot, and C₂ is the buffer constraint, ensuring that the packets on board at the current moment do not exceed the maximum limit.
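The two constraints of 6.5) reduce to simple checks on one slot's decision. In this sketch, illuminated is the 0/1 indicator vector over the N cells and buffer holds the per-cell packet counts after the slot; both names, and the function feasible, are hypothetical.

```python
from typing import Sequence


def feasible(illuminated: Sequence[int], buffer: Sequence[int], capacity: int) -> bool:
    """C1: 0/1 indicators summing to exactly 1; C2: no cell's buffer holds more than L."""
    c1 = all(x in (0, 1) for x in illuminated) and sum(illuminated) == 1
    c2 = all(q <= capacity for q in buffer)
    return c1 and c2


print(feasible([0, 1, 0], [3, 5, 0], 5))  # one cell lit, buffers within L = 5
```

In the Markov decision process of Step 7, C₁ is enforced by construction (the action picks exactly one cell), so only C₂ needs checking during simulation.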

Step 7, in each cluster obtained in Part 1, establish the beam scheduling model of 6.5), and treat each scheduling problem as a Markov decision process.

The Markov decision process consists of the design of the state, the action, and the reward: the optimization objectives of the beam scheduling model (maximizing the service guarantee rate and minimizing the service timeout failure rate) are converted into the reward, the beam constraint of the model is converted into the action, and the arrivals of packets with delay tolerances in each wave position cell are converted into the state. This is implemented as follows:

7.1) Design the state as the matrix of the numbers of packets awaiting transmission, grouped by remaining delay tolerance, in each wave position queue in the current time slot: the state at slot t_j is a two-dimensional state matrix obtained by reconstructing the packet arrivals in each wave position buffer queue of the cluster in the current slot. The state reconstruction process is shown in FIG. 5, where wave position n denotes the cell numbered n in the cluster, T_th is the maximum delay tolerance over all service types, t_j is the time slot a packet has passed through while waiting, and "x", "o" and "Δ" denote the arrivals of packets of three different service types. The left side of FIG. 5 shows the packets in each wave position queue of the cluster in the current slot; dividing them by type yields the matrix on the right, whose row number is the remaining delay tolerance and whose column number is the wave position cell number, so that the value in row a, column b is the number of packets in wave position cell b with remaining delay tolerance a;
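The state reconstruction of 7.1) amounts to a per-column histogram. In this sketch each queue is a list of remaining tolerances in 1..T_th (an assumed representation), and because Python indexes from 0, row a − 1 holds tolerance a and column b is the 0-based cell index.

```python
from typing import List, Sequence


def build_state(queues: Sequence[Sequence[int]], t_th: int) -> List[List[int]]:
    """T_th x N matrix: entry [a - 1][b] counts packets in cell b with remaining tolerance a."""
    state = [[0] * len(queues) for _ in range(t_th)]
    for b, queue in enumerate(queues):
        for a in queue:
            if not 1 <= a <= t_th:
                raise ValueError(f"tolerance {a} outside 1..{t_th}")
            state[a - 1][b] += 1
    return state


# Three cells with illustrative queues; T_th = 4.
for row in build_state([[1, 2, 2], [4], []], 4):
    print(row)
```

Because the matrix has a fixed T_th x N shape regardless of how many packets are waiting, it can be fed directly to the Q network as input.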

7.2) Design the action as the selection of the wave position cell to be illuminated in the current slot: the action selected in slot t_j consists of the elements x_n, n = 1, ..., N, where x_n indicates whether wave position n is illuminated by the working beam in that slot and N is the total number of wave position cells in the cluster;

7.3) Design the reward as the difference between the number of packets processed and the number of packets that fail in the current slot: the reward obtained after the action is executed in slot t_j equals the total number of packets the system processes after the action is selected in the current slot minus the total number of packets that fail by timeout in the current slot.
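One slot of the environment behind 7.2) and 7.3) can be sketched as below. The action is given as the index of the illuminated cell, which is equivalent to a one-hot vector x_1, ..., x_N. The per-slot service capacity per_slot, the most-urgent-first service order, and the zero-counter drop rule are assumptions, and new arrivals are not modeled here.

```python
from typing import List, Sequence, Tuple


def step(queues: Sequence[Sequence[int]], action: int,
         per_slot: int) -> Tuple[List[List[int]], int]:
    """Serve cell `action` for one slot, age every queue, and return (new queues, reward)."""
    queues = [sorted(q) for q in queues]         # most urgent (smallest tolerance) first
    served = queues[action][:per_slot]           # packets sent by the working beam
    queues[action] = queues[action][per_slot:]
    failed = 0
    new_queues = []
    for q in queues:
        aged = [d - 1 for d in q]                # one hopping slot has passed
        kept = [d for d in aged if d > 0]        # counter at zero: timeout failure
        failed += len(aged) - len(kept)
        new_queues.append(kept)
    return new_queues, len(served) - failed      # reward = processed - failed


# Illustrative queues of remaining tolerances for two cells.
print(step([[1, 3, 2], [1, 1]], action=1, per_slot=2))
```

Serving cell 1 in the example rescues two urgent packets at the cost of one expiring in cell 0, which the reward scores higher than the alternative; this is exactly the trade-off the reward of 7.3) asks the agent to learn.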

Step 8, solve with deep reinforcement learning based on the Markov decision process designed in Step 7 to obtain the beam scheduling result.

8.1) Initialize parameters:

8.1.1) Initialize the scene parameters within a cluster of the beam hopping satellite:

Confirm the wave position cell parameters and beam parameters of each cluster according to the clustering result obtained in Step 3; the wave position cell parameters include the cell numbers, service request volumes, service types, and packet size of the cluster's cells; the beam parameters include the single-beam working bandwidth and the single-beam power;

8.1.2) Initialize the deep reinforcement learning parameters:

Taking the working beam as the agent and the packet arrivals within the cluster as the environment, set the number of training episodes M, the number of time slots T per episode, the learning rate α, the experience pool capacity N_ep, the batch size N_b, the discount factor γ, the network update frequency C, the current network Q, the target network, and the greedy factor ε;

8.2) Initialize the current environment state s_t and update the greedy factor ε;

8.3) Feed s_t into the Q network to obtain the Q value of every action, and select action a_t with the ε-greedy method;
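The ε update of 8.2) and the action selection of 8.3) can be sketched as below. The simulation gives only the initial and final exploration rates (1 and 0.01); the linear decay over n_decay episodes is an assumed schedule, and q_values stands for the Q network's output for the current state.

```python
import random
from typing import Sequence


def epsilon_at(episode: int, n_decay: int, eps_start: float = 1.0,
               eps_end: float = 0.01) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over n_decay episodes."""
    frac = min(episode / n_decay, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: a random cell with probability epsilon, else the arg-max Q cell."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
eps = epsilon_at(300, 600)  # halfway through 600 episodes
print(eps, select_action([0.1, 0.9, 0.3], eps, rng))
```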

8.4) Execute action a_t in state s_t to obtain the new state s_{t+1} and the reward r_t;

8.5) Store (s_t, a_t, r_t, s_{t+1}) from 8.4) in the experience pool, and update the current environment state to s_{t+1};

8.6) Sample N_b transitions from the experience pool, train the Q network on them, and update the Q network by gradient descent;
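The experience pool and the regression targets used in 8.5) and 8.6) can be sketched as below. Sampling here is uniform; the prioritized replay mentioned in the summary of the invention would instead weight transitions, typically by their TD error. Episodes have a fixed length T, so no terminal flag is stored, and the network itself and its gradient step are omitted.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Transition = Tuple[object, int, float, object]  # (s_t, a_t, r_t, s_{t+1})


class ReplayBuffer:
    """Experience pool of capacity N_ep; once full, the oldest transition is evicted."""

    def __init__(self, capacity: int) -> None:
        self.buf: Deque[Transition] = deque(maxlen=capacity)

    def push(self, s: object, a: int, r: float, s_next: object) -> None:
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Uniformly sample N_b stored transitions without replacement."""
        return rng.sample(list(self.buf), batch_size)

    def __len__(self) -> int:
        return len(self.buf)


def td_targets(rewards: Sequence[float], next_q_target: Sequence[Sequence[float]],
               gamma: float) -> List[float]:
    """DQN targets y = r + gamma * max_a' Q_target(s', a') for a sampled batch."""
    return [r + gamma * max(q) for r, q in zip(rewards, next_q_target)]


pool = ReplayBuffer(100000)                  # N_ep from the simulation settings
pool.push("s0", 2, 1.0, "s1")                # placeholder states for illustration
print(len(pool), td_targets([1.0], [[0.5, 2.0]], 0.9))
```

The Q network is then regressed onto these targets, while the target network that produces next_q_target is refreshed only periodically, as in 8.7).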

8.7) Check whether the current time slot t has reached the network update frequency:

if t mod C = 1, i.e., the remainder of t divided by C is 1, update the target network; otherwise, do not update the target network;

8.8) Check whether the current time slot t has reached the number of slots T per episode:

if t = T, check whether the current episode number m has reached the number of training episodes M:

if m = M, terminate the iteration and output the trained scheduling model;

otherwise, repeat 8.2) to 8.7) and continue training;

if t ≠ T, repeat 8.3) to 8.7) and continue training.
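The loop structure of 8.2) to 8.8) can be hard to follow in prose. The sketch below keeps only the control flow; run_slot and sync_target are hypothetical callbacks standing in for 8.3) to 8.6) and for copying the Q network into the target network.

```python
from typing import Callable


def train(n_episodes: int, slots_per_episode: int, sync_every: int,
          run_slot: Callable[[int, int], None],
          sync_target: Callable[[], None]) -> int:
    """Loop structure of 8.2) to 8.8); returns how many target-network syncs happened."""
    syncs = 0
    for episode in range(1, n_episodes + 1):       # episodes m = 1..M
        # 8.2) resetting the environment state and updating epsilon would go here
        for t in range(1, slots_per_episode + 1):  # slots t = 1..T
            run_slot(episode, t)                   # 8.3) to 8.6): act, store, learn
            # 8.7) "t mod C = 1", written as (t - 1) % C == 0 so that C = 1 syncs every slot
            if (t - 1) % sync_every == 0:
                sync_target()
                syncs += 1
    return syncs                                   # 8.8) stop after M episodes


print(train(2, 1000, 20, lambda m, t: None, lambda: None))
```

Writing the condition as (t − 1) mod C = 0 matches t mod C = 1 for every C > 1 and still syncs every slot when C = 1. With the simulated C = 20 and T = 1000, it syncs 50 times per episode.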

The effect of the present invention can be further illustrated by the following simulation results:

1. Simulation conditions:

Simulation parameters: 3 satellite working beams; 19 wave position cells in the satellite coverage area; single-beam working bandwidth 100 MHz; single-beam power 70 W; beam hopping time slot 20 ms; packet size 20 kbits; delay tolerances of 2, 4, and 20 slots for the three services; population size 20; 100 immune algorithm iterations; diversity evaluation parameter 0.95; crossover probability 0.5; mutation probability 0.4; memory bank capacity 15; 600 deep reinforcement learning training episodes; 1000 time slots per episode; learning rate 0.00001; experience pool capacity 100000; batch size 32; discount factor 0.9; update step 20; initial exploration rate 1; and final exploration rate 0.01.

Simulation environment: MATLAB R2018b and Python 3.6.

2. Simulation content and results

Simulation 1: the deep reinforcement learning convergence speed of the delay-tolerance-based beam hopping clustering scheduling method and of the existing global beam hopping algorithm was simulated; the results are shown in FIG. 6. As FIG. 6 shows, the method of the invention begins to converge at iteration 400, whereas the global scheduling method begins to converge only at iteration 1200; the scheduling method of the invention therefore converges about three times as fast as global scheduling and has lower computational complexity.

Simulation 2: the normalized throughput of the delay-tolerance-based beam hopping clustering scheduling method and of the existing longest-queue-first, polling, and random scheduling beam hopping algorithms was simulated; the results are shown in FIG. 7. As FIG. 7 shows, at a supply-demand ratio of 110%, the throughput of the method is 6%, 10%, and 15% higher than that of the longest-queue-first, polling, and random scheduling algorithms, respectively.

Simulation 3: the service guarantee rate of the delay-tolerance-based beam hopping clustering scheduling method and of the existing longest-queue-first, polling, and random allocation beam hopping algorithms was simulated; the results are shown in FIG. 8. As FIG. 8 shows, at a supply-demand ratio of 110%, the service demand guarantee rate of the method is 7%, 11%, and 15% higher than that of the longest-queue-first, polling, and random allocation algorithms, respectively.

The foregoing description is only an example of the present invention and is not intended to limit it; it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims

1. A high-throughput beam hopping scheduling method for differentiated services, characterized by comprising the following steps:

(1) Generate the clustering result within the satellite coverage area:

1a) Divide the wave position cells within the satellite coverage area into different clusters;

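$$\min\ P=\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^{2}$$

$$\text{s.t.}\quad C_1:\ d_{mn}\le s,\quad m\in M_i\ \text{and}\ n\in M_i$$

$$C_2:\ \sum_{i=1}^{K}X_{ij}=1,\quad X_{ij}\in\{0,1\},\quad j=1,\dots,N$$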

where P is the objective function minimizing the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, R̄ is the mean load of the K clusters, C₁ is the distance constraint, C₂ is the constraint ensuring that each wave position cell belongs to exactly one cluster, m and n are two wave position cells in the same cluster, d_mn is the distance between the centers of wave positions m and n, s is the upper limit on the distance between two wave position centers in a cluster, and M_i is the set of wave position cells belonging to cluster i;

(2) Establishing a cluster beam dynamic scheduling model:

s.t. $C_1$:

$C_2$:

wherein the packet-failure quantity denotes the amount of data packets destined for wave position cell n that fail due to timeout waiting; $C_1$ ensures that exactly one cell in the cluster obtains beam scheduling in each hopping time slot; $C_2$ ensures that the data packets stored on the satellite at the current time do not exceed the maximum limit; the beam indicator of cell n indicates whether the cell is illuminated; the queue length of cell n is the number of data packets stored in the on-board buffer for wave position cell n after time slot $t_j$ ends; and L is the maximum capacity of each wave position cell buffer queue on the satellite;

(3) Setting up the beam dynamic scheduling model of step (2) in each cluster obtained in step (1), treating the scheduling problem as a Markov decision process, and solving it by deep reinforcement learning to obtain the beam scheduling result.

2. The method of claim 1, wherein in 1a), the wave position cells within the satellite coverage area are divided into different clusters, the number of clusters being determined by the number K of working beams owned by a single satellite; the working beams correspond one-to-one to the clusters, and each working beam performs dynamic beam scheduling within one cluster.

3. The method according to claim 1, wherein in 1b), the clustering model for load balancing among clusters is established according to the principle of inter-cluster load balancing and the principle of geographical proximity of the wave position cells within a cluster, as follows:

1b1) Determining the service request quantity of each wave position cell, and calculating the sum S of the service request quantities of all cells in the coverage area of the satellite;

1b2) Calculating the load mean of the K clusters:

$$\bar{R}=\frac{S}{K}$$

and establishing the load balancing optimization objective:

$$P=\min\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^2$$

wherein $P$ is the objective function that minimizes the mean square error of the load among clusters, $K$ is the number of clusters, $N$ is the number of wave position cells, $R_j$ is the traffic demand of cell $j$, $X_{ij}$ indicates whether wave position cell $j$ belongs to cluster $i$, and $\bar{R}$ is the load mean of the $K$ clusters;

1b3) Determining the upper limit $s$ of the distance between two wave position centers in a cluster, and establishing the distance constraint:

$$d_{mn}\le s,\quad m\in M_i \text{ and } n\in M_i$$

wherein $m$ and $n$ denote two wave position cells in the same cluster, $d_{mn}$ is the distance between the wave position centers of cells $m$ and $n$, and $M_i$ is the set of wave position cells belonging to cluster $i$;

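1b4) Combining the load balancing optimization objective of 1b2) with the distance constraint of 1b3) to obtain the clustering model for load balancing among clusters:

$$P=\min\frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}R_jX_{ij}-\bar{R}\Big)^2$$

s.t. $C_1$: $d_{mn}\le s,\ m\in M_i \text{ and } n\in M_i$

$C_2$: $\sum_{i=1}^{K}X_{ij}=1,\ j=1,2,\dots,N$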
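The clustering model can be checked numerically for a candidate assignment. The minimal Python sketch below evaluates the objective $P$ (with $\bar{R}=S/K$) for an assignment matrix $X$ and tests constraints $C_1$ and $C_2$. It assumes that $d_{mn}$ is the Euclidean distance between planar wave position center coordinates; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cluster_loads(R, X):
    """Load of cluster i: sum over j of R_j * X_ij."""
    return [sum(r * x for r, x in zip(R, row)) for row in X]


def objective_P(R, X):
    """Mean square deviation of the cluster loads from R_bar = S / K."""
    K = len(X)
    r_bar = sum(R) / K
    return sum((load - r_bar) ** 2 for load in cluster_loads(R, X)) / K


def is_feasible(X, centers, s):
    """C2: every cell in exactly one cluster; C1: pairwise center distance <= s.

    Assumption: d_mn is the Euclidean distance between (x, y) center coordinates.
    """
    N = len(centers)
    if any(sum(row[j] for row in X) != 1 for j in range(N)):
        return False
    for row in X:
        members = [j for j in range(N) if row[j] == 1]
        for m, n in combinations(members, 2):
            if math.dist(centers[m], centers[n]) > s:
                return False
    return True


R = [4, 2, 3, 3]                        # traffic demand R_j of four cells
centers = [(0, 0), (1, 0), (5, 0), (6, 0)]
X_good = [[1, 1, 0, 0], [0, 0, 1, 1]]   # loads 6 and 6, neighbours grouped
X_bad = [[1, 0, 1, 0], [0, 1, 0, 1]]    # loads 7 and 5, far-apart members
print(objective_P(R, X_good), is_feasible(X_good, centers, s=2.0))  # 0.0 True
print(objective_P(R, X_bad), is_feasible(X_bad, centers, s=2.0))    # 1.0 False
```

The second assignment is worse on both counts: its loads deviate from $\bar{R}=6$ and it places cells 5 distance units apart in one cluster, violating $C_1$ for $s=2$.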

4. The method according to claim 1, wherein in 1c), the clustering model for load balancing among clusters is solved using an immune algorithm as follows:

1c1) Initializing the population of cluster-center wave positions and the memory bank of each cluster;

1c2) For the clustering model, an affinity function is designed:

1c3) Sorting the population in descending order of the affinity function of 1c2), selecting the first H individuals to form the parent population, applying selection, crossover and mutation in turn to obtain a new population, and combining it with some individuals taken from the memory bank to form the next-generation population;

1c4) Repeating 1c3) until the maximum number of iterations $N_e$ is reached, and obtaining the optimal solution.
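Steps 1c1) to 1c4) can be sketched in Python as follows. The affinity function of 1c2) is not recoverable from this text, so the sketch assumes affinity $=1/(1+P+\text{penalty}\cdot v)$, where $v$ counts pairs that violate $C_1$. It also assumes that an individual is a set of $K$ distinct center cells and that every cell joins its nearest center. $H$ and $N_e$ follow the claim; the helper names, pop_size, mem_size and the penalty weight are hypothetical.

```python
import math
import random


def decode(centers, coords):
    """Hypothetical decoding: each cell joins the cluster of its nearest center cell."""
    return [min(range(len(centers)), key=lambda k: math.dist(c, coords[centers[k]]))
            for c in coords]


def affinity(ind, coords, R, s, penalty=1e3):
    """Assumed affinity: 1 / (1 + P + penalty * number of C1 violations)."""
    labels = decode(ind, coords)
    K = len(ind)
    loads = [0.0] * K
    for j, k in enumerate(labels):
        loads[k] += R[j]
    r_bar = sum(R) / K
    P = sum((load - r_bar) ** 2 for load in loads) / K
    viol = sum(1 for m in range(len(coords)) for n in range(m + 1, len(coords))
               if labels[m] == labels[n] and math.dist(coords[m], coords[n]) > s)
    return 1.0 / (1.0 + P + penalty * viol)


def crossover(a, b, K, rng):
    # child draws K distinct centers from the union of both parents' centers
    return rng.sample(sorted(set(a) | set(b)), K)


def mutate(ind, N, rng, rate=0.2):
    # with probability `rate`, swap one center for a cell not already used
    ind = list(ind)
    unused = [c for c in range(N) if c not in ind]
    if unused and rng.random() < rate:
        ind[rng.randrange(len(ind))] = rng.choice(unused)
    return ind


def immune_clustering(coords, R, K, s, pop_size=20, H=10, mem_size=4, N_e=50, seed=0):
    rng = random.Random(seed)
    N = len(coords)

    def fit(ind):
        return affinity(ind, coords, R, s)

    pop = [rng.sample(range(N), K) for _ in range(pop_size)]  # 1c1: initial centers
    memory = sorted(pop, key=fit, reverse=True)[:mem_size]     # 1c1: memory bank
    for _ in range(N_e):                                       # 1c4: N_e iterations
        pop.sort(key=fit, reverse=True)                        # 1c3: descending affinity
        parents = pop[:H]
        children = []
        while len(children) < pop_size - mem_size:
            a, b = rng.sample(parents, 2)                      # selection
            children.append(mutate(crossover(a, b, K, rng), N, rng))
        memory = sorted(memory + parents, key=fit, reverse=True)[:mem_size]
        pop = children + [list(m) for m in memory]             # next generation
    best = max(pop + memory, key=fit)
    return best, decode(best, coords)


coords = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
R = [3, 3, 3, 3, 3, 3]
best, labels = immune_clustering(coords, R, K=2, s=3.0)
print(best, labels)
```

Keeping the best individuals in the memory bank means the best affinity found never decreases between generations, which is the role the memory bank plays in 1c3).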

5. The method of claim 1, wherein in 2a), the delay tolerance in units of hopping time slots is determined according to the delay requirements of different service types, as follows:

2a1) Calculating the delay experienced by the data packet during transmission before it reaches the destination satellite:

$$T_{delay\_1}=T_{prop}+T_{trans}$$

wherein $T_{delay\_1}$ is the total delay experienced before reaching the destination satellite, $T_{prop}$ is the propagation delay, and $T_{trans}$ is the transmission delay;

2a2) Estimating the transmission delay $T_{delay\_2}$ of the data packet from the destination satellite to the user terminal;

2a3) Determining the delay limit $T_{limit}$ specified in the QoS guarantee for the service type of the data packet, determining the beam hopping slot length $BH_{slot}$, and calculating the residual delay tolerance of the data packet:

wherein $D_{tole}$ is the residual delay tolerance and $T_{delay\_1}$ is the total delay experienced during transmission before reaching the destination satellite.
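The residual-tolerance formula of 2a3) is not recoverable from this text. One plausible reading, assumed in the sketch below, counts the remaining QoS budget in whole hopping slots, $D_{tole}=\lfloor (T_{limit}-T_{delay\_1}-T_{delay\_2})/BH_{slot}\rfloor$, clamped at zero. The sketch also assumes that propagation delay is path length divided by the speed of light and that transmission delay is packet size divided by link rate; the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def delay_before_destination(path_length_m, packet_bits, link_rate_bps):
    """Step 2a1: T_delay_1 = T_prop + T_trans, in seconds."""
    t_prop = path_length_m / SPEED_OF_LIGHT
    t_trans = packet_bits / link_rate_bps
    return t_prop + t_trans


def residual_delay_tolerance(t_limit, t_delay_1, t_delay_2, bh_slot):
    """Assumed form of step 2a3: whole beam hopping slots left in the QoS budget."""
    remaining = t_limit - t_delay_1 - t_delay_2
    return max(0, math.floor(remaining / bh_slot))


# e.g. a 100 ms budget, 20 ms already spent, 5 ms downlink, 10 ms slots
print(residual_delay_tolerance(0.100, 0.020, 0.005, 0.010))  # 7
```

Rounding down is the conservative choice: a packet credited with 7.5 remaining slots is treated as having 7, so it is never scheduled after its deadline.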

6. The method of claim 1, wherein in 2c), the delay tolerance-based beam hopping scheduling model is established according to the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate, as follows:

2c1) Characterizing the intra-cluster beam hopping system scenario:

A working beam serves a cluster of N wave position cells; the service requests of each cell are expressed as data packets whose arrivals follow a Poisson distribution with arrival rate $\lambda_i$, $i=1,2,\dots,N$;
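The arrival model of 2c1) can be simulated slot by slot. This standard-library-only sketch assumes that $\lambda_i$ is the mean number of packets arriving for cell $i$ per hopping slot and draws Poisson counts with Knuth's multiplication method; the function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small per-slot rates."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def arrivals(lams, n_slots, seed=0):
    """Packets arriving for each of the N cells in each hopping slot: result[t][i]."""
    rng = random.Random(seed)
    return [[poisson(lam, rng) for lam in lams] for _ in range(n_slots)]


a = arrivals([0.5, 2.0, 4.0], n_slots=10_000)
means = [sum(row[i] for row in a) / len(a) for i in range(3)]
print(means)  # roughly [0.5, 2.0, 4.0]
```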

2c2) Establishing, from the packet arrivals of each cell, the optimization objective of maximizing the service guarantee rate and minimizing the service timeout failure rate:

wherein $P_1$ is the objective of maximizing the intra-cluster service guarantee rate, $P_2$ is the objective of minimizing the intra-cluster service timeout failure rate, $T$ is the set of all decision moments of the beam hopping satellite within the coverage time of the same area, $N$ is the total number of wave position cells in the cluster, and the packet-failure quantity denotes the amount of data packets destined for wave position cell n that fail due to timeout waiting;
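The exact expressions for $P_1$ and $P_2$ are not recoverable from this text. Assuming the service guarantee rate is the fraction of arrived packets that are delivered and the timeout failure rate is the fraction that expire, both summed over all decision moments in $T$ and all $N$ cells, they could be computed as in this sketch (function and variable names hypothetical):

```python
def service_rates(served, failed, arrived):
    """Return (P1, P2) from per-slot, per-cell packet counts indexed [t][n].

    Assumed definitions: P1 = delivered / arrived, P2 = timed out / arrived,
    with all counts summed over decision moments t and cells n.
    """
    def total(counts):
        return sum(sum(row) for row in counts)

    n_arrived = total(arrived)
    if n_arrived == 0:  # no traffic: treat as fully served (a convention)
        return 1.0, 0.0
    return total(served) / n_arrived, total(failed) / n_arrived


arrived = [[2, 1], [1, 0]]  # two slots, two cells
served = [[1, 1], [1, 0]]
failed = [[0, 0], [1, 0]]
print(service_rates(served, failed, arrived))  # (0.75, 0.25)
```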

2c3) Establishing the beam constraint according to the one-to-one correspondence between working beams and clusters:

wherein the beam indicator of cell n indicates that the cell is illuminated; otherwise, it is not illuminated;

2c4) Determining the maximum capacity L of each wave position cell buffer queue on the satellite, and establishing the buffer constraint;
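How the beam constraint and the buffer constraint act on the on-board queues can be illustrated with a one-slot queue update. This sketch makes several assumptions not stated in the claims: each queue stores the residual delay tolerance (in slots) of every packet, the single illuminated cell serves up to a fixed number of packets per slot with the most urgent first, unserved packets lose one slot of tolerance and fail once it becomes negative, and arrivals beyond $L$ are discarded. The function name and the serve_per_slot parameter are hypothetical.

```python
def step_queues(queues, illuminated, arrivals, L, serve_per_slot):
    """Advance every cell's on-board queue by one hopping slot (hypothetical model).

    queues[n]: residual delay tolerances (slots) of packets buffered for cell n
    illuminated: the single cell served in this slot (beam constraint C1)
    arrivals[n]: tolerances of packets arriving for cell n during the slot
    Returns (served, expired, dropped); afterwards len(queues[n]) <= L (C2).
    """
    # the illuminated cell sends its most urgent packets first
    q = sorted(queues[illuminated])
    served = min(serve_per_slot, len(q))
    queues[illuminated] = q[served:]
    # unserved packets lose one slot of tolerance; negative tolerance = timeout failure
    expired = 0
    for n in range(len(queues)):
        aged = [d - 1 for d in queues[n]]
        expired += sum(1 for d in aged if d < 0)
        queues[n] = [d for d in aged if d >= 0]
    # new arrivals join the buffer; the newest packets beyond capacity L are discarded
    dropped = 0
    for n, new in enumerate(arrivals):
        queues[n].extend(new)
        overflow = len(queues[n]) - L
        if overflow > 0:
            del queues[n][-overflow:]
            dropped += overflow
    return served, expired, dropped


queues = [[0, 2], [0, 1]]
print(step_queues(queues, 0, [[3], [2, 2, 2]], L=3, serve_per_slot=1))  # (1, 1, 1)
print(queues)  # [[1, 3], [0, 2, 2]]
```

In the example, cell 1 is not illuminated, so its packet with zero tolerance times out, and its fourth arrival exceeds $L=3$ and is dropped.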

2c5) Combining the optimization objective of maximizing the service guarantee rate and minimizing the service timeout failure rate in 2c2), the beam constraint in 2c3) and the buffer constraint in 2c4) to obtain the delay tolerance-based beam hopping scheduling model:

s.t. $C_1$:

$C_2$:

7. The method according to claim 1, wherein in (3), the beam dynamic scheduling model established in each cluster is treated as a Markov decision process and solved by deep reinforcement learning to obtain the beam scheduling result, as follows:

3a) The state in the deep reinforcement learning algorithm is designed as the matrix, at the current time slot, of the numbers of data packets awaiting transmission in each wave position queue, grouped by residual delay tolerance:

wherein the state at time slot $t_j$ is represented by a two-dimensional state matrix;
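A possible construction of the state matrix of 3a) from per-cell queues is sketched below. It assumes that rows index wave position cells, columns index residual delay tolerance in slots, and tolerances above a chosen maximum are folded into the last column; d_max and the function name are hypothetical.

```python
def state_matrix(queues, d_max):
    """Rows: wave position cells; columns: residual delay tolerance 0..d_max (slots).

    Entry [n][d] = number of queued packets for cell n with tolerance d.
    Assumption: tolerances above d_max are clipped into the last column.
    """
    S = [[0] * (d_max + 1) for _ in queues]
    for n, q in enumerate(queues):
        for d in q:
            S[n][min(d, d_max)] += 1
    return S


print(state_matrix([[0, 2, 2], [1, 5]], d_max=3))  # [[1, 0, 2, 0], [0, 1, 0, 1]]
```

A fixed number of columns keeps the state a constant-size input for a neural network, regardless of how many packets are queued.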

3b) The action in the deep reinforcement learning algorithm is designed as selecting the wave position cell to be illuminated in the current time slot:

3c) The reward in the deep reinforcement learning algorithm is designed as the difference between the number of packets processed and the number of packets failed in the current time slot:

wherein the reward is the reward obtained after the action is performed in time slot $t_j$, and the failure term is the total number of packets that fail due to timeout in the current time slot;
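The reward of 3c) can be computed directly from the state matrix and the chosen action under two assumptions: the illuminated cell serves its most urgent packets first, up to a fixed per-slot service capacity, and every packet with zero remaining tolerance that is not served times out at the end of the slot. The function name and the serve_per_slot parameter are hypothetical.

```python
def slot_reward(S, action, serve_per_slot):
    """Reward = processed packets - timed-out packets for one slot (hypothetical model).

    S: state matrix, S[n][d] = packets of cell n with residual tolerance d.
    action: index of the illuminated wave position cell.
    """
    processed = min(serve_per_slot, sum(S[action]))
    urgent_served = min(processed, S[action][0])  # zero-tolerance packets rescued
    failed = sum(row[0] for row in S) - urgent_served
    return processed - failed


S = [[1, 0, 2, 0], [0, 1, 0, 1]]
print(slot_reward(S, 0, serve_per_slot=2))  # 2
print(slot_reward(S, 1, serve_per_slot=2))  # 1
```

Both actions deliver two packets, but only illuminating cell 0 rescues its zero-tolerance packet. The reward therefore steers the agent toward cells with urgent traffic.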

3d) Taking the working beam as the agent and the packet arrivals within the cluster as the environment, and executing the deep reinforcement learning algorithm to obtain the optimization result.
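Step 3d) can be illustrated end to end with a toy environment and agent. To stay short and dependency-free, this sketch replaces the deep network with a linear Q-function trained by epsilon-greedy Q-learning. It therefore shows the MDP loop (state matrix, one-cell action, processed-minus-failed reward), not the deep reinforcement learning algorithm of the method. The environment dynamics, arrival rates, buffer size, service capacity and learning parameters are all hypothetical.

```python
import math
import random


class ClusterEnv:
    """Toy single-cluster beam hopping environment; every parameter is hypothetical."""

    def __init__(self, lams, d_max=4, L=20, serve=3, seed=0):
        self.lams, self.d_max, self.L, self.serve = lams, d_max, L, serve
        self.rng = random.Random(seed)
        self.queues = [[] for _ in lams]

    def _poisson(self, lam):
        # Knuth's method, adequate for small per-slot arrival rates
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= limit:
                return k
            k += 1

    def state(self):
        # flattened state matrix S[n][d] scaled by L, plus a constant bias feature
        S = [[0] * (self.d_max + 1) for _ in self.queues]
        for n, q in enumerate(self.queues):
            for d in q:
                S[n][min(d, self.d_max)] += 1
        return [x / self.L for row in S for x in row] + [1.0]

    def step(self, action):
        # serve the illuminated cell's most urgent packets
        q = sorted(self.queues[action])
        processed = min(self.serve, len(q))
        self.queues[action] = q[processed:]
        # age the remaining packets; those whose tolerance runs out fail
        failed = 0
        for n in range(len(self.queues)):
            aged = [d - 1 for d in self.queues[n]]
            failed += sum(1 for d in aged if d < 0)
            self.queues[n] = [d for d in aged if d >= 0]
        # Poisson arrivals with random tolerances; buffer truncated at L
        for n, lam in enumerate(self.lams):
            new = [self.rng.randint(0, self.d_max) for _ in range(self._poisson(lam))]
            self.queues[n] = (self.queues[n] + new)[: self.L]
        return self.state(), processed - failed  # reward as in step 3c)


def train(env, steps=3000, alpha=0.005, gamma=0.9, eps=0.1, seed=1):
    """Epsilon-greedy Q-learning with a linear Q-function (stand-in for a deep network)."""
    rng = random.Random(seed)
    n_actions = len(env.lams)
    s = env.state()
    w = [[0.0] * len(s) for _ in range(n_actions)]

    def q(state, a):
        return sum(wi * xi for wi, xi in zip(w[a], state))

    rewards = []
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: q(s, b))
        s2, r = env.step(a)
        td = r + gamma * max(q(s2, b) for b in range(n_actions)) - q(s, a)
        w[a] = [wi + alpha * td * xi for wi, xi in zip(w[a], s)]
        s = s2
        rewards.append(r)
    return w, rewards


env = ClusterEnv(lams=[0.8, 1.5, 0.5])
w, rewards = train(env)
print(sum(rewards[-500:]) / 500)  # average reward over the last 500 slots
```

Replacing the linear Q-function with a neural network and adding a replay buffer and a target network would turn this loop into a standard DQN-style agent.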