CN114629547A

CN114629547A - High-throughput beam hopping scheduling method for differentiated services

Info

Publication number: CN114629547A
Application number: CN202210273871.5A
Authority: CN
Inventors: 白卫岗; 刘聪俐; 李建东; 史琰; 周笛; 李浩然; 朱彦
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-03-19
Filing date: 2022-03-19
Publication date: 2022-06-14
Anticipated expiration: 2042-03-19
Also published as: CN114629547B

Abstract

The invention provides a high-throughput beam hopping scheduling method for differentiated services, which mainly addresses the low utilization of on-board satellite resources and the high computational complexity of the prior art. The scheme is implemented as follows: the wave position cells within the satellite coverage range are divided into different clusters; a clustering model for inter-cluster load balancing is established according to the principle of load balancing among clusters and the principle that the wave position cells within a cluster are geographically close; the clustering model is solved with an immune algorithm to obtain a clustering result; a delay tolerance, in units of hopping time slots, is determined for each service type according to its delay requirement; a beam hopping scheduling model based on delay tolerance is established; and the beam dynamic scheduling model is set up in each cluster, treated as a Markov decision process, and solved with a deep reinforcement learning algorithm to obtain the beam scheduling result. The invention improves resource utilization and reduces computational complexity while guaranteeing throughput, and can be used for satellite resource allocation.

Description

High-throughput beam hopping scheduling method for differentiated services

Technical Field

The invention relates to the field of satellite communication, and in particular to a clustered beam scheduling method for beam hopping satellites, which can be used to allocate satellite resources reasonably to cells with different service requirements in scenarios where the service demand within the satellite coverage area changes rapidly.

Background

Early single-beam satellites mostly used global or regional beams, which have a wide beam width and low antenna gain. To cope with the rapid growth of traffic demand, multi-beam technology based on spot beams was adopted. Because spot beams are narrow, their gain is high, and combined with frequency reuse they significantly increase the system capacity of the satellite so that more services can be carried. However, with the rapid development of communications and the Internet of Things, terrestrial traffic is unevenly distributed in space and time, and the high dynamics of low-orbit satellites make the traffic they face even more uneven. To solve this problem, improve resource utilization, and avoid as far as possible the situation in which some beams are overloaded while others sit idle, beam hopping technology was proposed, further increasing system capacity.

In terms of resource allocation, however, existing beam hopping techniques are usually designed around the heterogeneity of terrestrial service demand and focus on finding the optimal throughput to meet the capacity requirements of different areas, without considering factors such as service type and delay. In practice, different users have different service delay requirements, so throughput and service delay must be considered together to guarantee user experience. In terms of clustering, to simplify the system and improve resource utilization, uniform clustering, uniform power allocation, and full frequency reuse are usually adopted; that is, the power allocated to each cluster is equal and not adjustable. When every cluster has the same resources but the non-uniformity of traffic among clusters is ignored and the clusters are formed uniformly, clusters can become overloaded or underloaded.

The patent document "Beam hopping pattern optimization method and device based on a time slot allocation algorithm, and storage medium" (patent application number 201910675600.0, application publication number CN 110518956A), filed by the Chinese People's Liberation Army Engineering University, discloses an improved beam hopping time slot allocation method for clustered scenarios. The method first pre-allocates a number of time slots to each cell and then reallocates them using a co-channel interference distance threshold, which increases system capacity while effectively eliminating the influence of interference on signal quality. However, the method does not consider service delay performance, so real-time services may fail because they time out while waiting.

A patent document filed by Shanghai Yuanxin Satellite Science and Technology Co., Ltd. (patent application No. 201811070246.0, application publication No. CN 109121147A) discloses a resource scheduling method based on beam hopping. The method represents the satellite coverage area with two three-dimensional matrices. The first is the actual user demand matrix, made three-dimensional by adding a time dimension; the second is the beam hopping service matrix, to which a time dimension is also added. The target matrix to be optimized is obtained by multiplying the two matrices and is then solved. The method addresses the matching of satellite capacity to ground demand in the fast-moving scenario of low-orbit satellites. However, as the number of satellite beams and wave position cells increases, the search space of a globally scheduled optimization algorithm grows sharply and the complexity of the algorithm rises.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a high-throughput beam hopping scheduling method for differentiated services, so that the throughput is guaranteed, the resource utilization rate is improved, and the calculation complexity is reduced.

The specific idea of the invention for achieving the above purpose is as follows: a load balancing clustering model solved by an immune algorithm divides the wave position cells within the satellite coverage range into different clusters, which reduces the computational complexity; a beam resource scheduling model with delay tolerance constraints is then constructed, and beam resource scheduling is completed with a deep reinforcement learning method based on prioritized experience replay.

According to the above idea, the technical scheme of the invention comprises the following steps:

(1) generating a clustering result within the satellite coverage range:

1a) dividing wave position cells in a satellite coverage range into different clusters;

1b) establishing a clustering model for inter-cluster load balancing according to the principle of load balancing among clusters and the principle that the wave position cells within a cluster are geographically close:

$$P:\ \min\ \frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}X_{ij}R_j-\bar{R}\Big)^{2}$$

$$\text{s.t.}\quad C_1:\ d_{mn}\le s,\quad m\in M_i \text{ and } n\in M_i$$

$$C_2:\ \sum_{i=1}^{K}X_{ij}=1,\quad X_{ij}\in\{0,1\},\quad j=1,\dots,N$$

wherein P is the objective function that minimizes the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, \bar{R} is the load mean of the K clusters, C₁ is the distance constraint, C₂ is the constraint ensuring that each cell belongs to exactly one cluster, m and n denote two cells in the same cluster, d_mn is the distance between the wave position centers of cells m and n, s is the upper limit of the distance between two wave position centers within a cluster, and M_i is the set of wave position cells belonging to cluster i;

1c) solving the clustering model with balanced load among clusters by using an immune algorithm to obtain a clustering result;

(2) establishing a cluster beam dynamic scheduling model:

2a) determining the delay tolerance taking a hopping time slot as a unit for different service types according to the requirements of the service types on the delay;

2b) letting the different types of data packets wait in the on-board buffer queue to be delivered, wherein the corresponding delay tolerance is reduced by one each time a hopping time slot passes, and a data packet is discarded and regarded as a timeout failure when its on-board waiting delay exceeds its delay tolerance;

2c) establishing a beam hopping scheduling model based on delay tolerance according to the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate, the model consisting of two objectives P₁ and P₂ and two constraints C₁ and C₂,

wherein P₁ is the objective of maximizing the service guarantee rate of the intra-cluster services, P₂ is the objective of minimizing the intra-cluster service timeout failure rate, T is the set of all decision moments of the beam hopping satellite within the coverage time of the same area, and N is the total number of wave position cells in the cluster;

the objectives are expressed in terms of the amount of data packets sent to cell n by the end of slot t_j, the total amount of data packets destined for cell n that the satellite has received up to slot t_j, and the amount of data packets destined for wave position cell n that have failed because they timed out while waiting;

C₁ requires that exactly one cell in the cluster obtains beam scheduling in each hopping time slot, using an indicator of whether wave position cell n is illuminated by the working beam in slot t_j, where the value 1 means illuminated and 0 means not illuminated;

C₂ ensures that the data packets held on the satellite at the current time do not exceed the maximum limit, that is, after slot t_j ends the number of data packets stored in the satellite memory for wave position cell n does not exceed L, where L is the maximum capacity of the on-board buffer queue of each wave position cell;

(3) establishing the beam dynamic scheduling model of step (2) in each cluster obtained in step (1), treating the scheduling problem as a Markov decision process, and solving it by deep reinforcement learning to obtain the beam scheduling result.

Compared with the prior art, the invention has the following advantages:

First, computational complexity is reduced: for a satellite system with uniform power distribution and full frequency reuse, the search space of a globally scheduled optimization algorithm grows sharply as the number of satellite beams and wave position cells increases, and the algorithm complexity rises; the load balancing clustering model established by the invention divides the complex task into several subtasks, which shrinks the search space and reduces the computational complexity.

Second, the service guarantee rate of the system is improved: unlike existing beam hopping scheduling algorithms, the invention takes the differences between services into account in the scheduling process and provides a beam scheduling model based on delay tolerance; in operation, an optimization problem is established with maximizing the service guarantee rate and minimizing the service timeout failure rate as the objective functions, so that the service guarantee rate is improved and the service timeout failure rate is reduced while the system throughput is increased.

Description of the drawings:

FIG. 1 is a general flow chart of an implementation of the present invention;

FIG. 2 is a sub-flowchart of the present invention for solving a load balancing clustering model using an immune algorithm;

FIG. 3 is a wave position cell layout of the present invention;

FIG. 4 is a graph of clustering results in the present invention;

FIG. 5 is a state reconstruction diagram in the present invention;

FIG. 6 is a graph comparing the convergence rate of deep reinforcement learning of the present invention with the conventional global hopping algorithm;

FIG. 7 is a graph comparing normalized throughput of the present invention with different hopping algorithms in the prior art;

FIG. 8 is a graph comparing the service guarantee rate of the present invention with different beam hopping algorithms.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with examples are described in further detail below.

This example includes two major parts: the first part is to use an immune algorithm to generate a clustering result of load balancing, and the second part is to use a deep reinforcement learning algorithm to obtain a beam dynamic scheduling result.

Referring to fig. 1, the implementation steps of this example are as follows:

Part one: using an immune algorithm to generate a load-balanced clustering result.

Step 1, dividing the wave position cells within the satellite coverage range into different clusters.

Initializing the service request volumes and geographic positions of all cells in the coverage area of the beam hopping satellite, including the relative geographic positions of the 19 wave position cells in the coverage area, the service request volume of each cell, and the wave position cell numbers, as shown in FIG. 3;

determining the number of clusters according to the number K of working beams owned by a single satellite, so that working beams and clusters correspond one to one and each working beam is responsible for dynamic beam scheduling within one cluster.

In this embodiment, the satellite has 3 working beams, so the coverage area is divided into 3 clusters, one per working beam.

Step 2, establishing a clustering model for inter-cluster load balancing according to the principle of load balancing among clusters and the principle that the wave position cells within a cluster are geographically close.

2.1) determining the service request quantity of each wave position cell, and calculating the sum S of the service request quantities of all cells in the coverage area of the satellite;

2.2) calculating the load mean of the K clusters

$$\bar{R}=\frac{S}{K}$$

and establishing the load balancing optimization target:

$$P:\ \min\ \frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}X_{ij}R_j-\bar{R}\Big)^{2}$$

wherein P is the objective function that minimizes the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, and \bar{R} is the load mean of the K clusters;

2.3) determining the distance upper limit s of the two wave position centers in the cluster, and establishing a distance constraint condition:

d_mn≤s,m∈M_i and n∈M_i

where m and n denote two wave position cells in the same cluster, d_mn is the distance between the wave position centers of cells m and n, and M_i is the set of wave position cells belonging to cluster i.

2.4) combining the load balancing optimization target in 2.2) with the distance constraint condition in 2.3) to obtain the clustering model for inter-cluster load balancing:

$$P:\ \min\ \frac{1}{K}\sum_{i=1}^{K}\Big(\sum_{j=1}^{N}X_{ij}R_j-\bar{R}\Big)^{2}$$

$$\text{s.t.}\quad C_1:\ d_{mn}\le s,\quad m\in M_i \text{ and } n\in M_i$$

$$C_2:\ \sum_{i=1}^{K}X_{ij}=1,\quad X_{ij}\in\{0,1\},\quad j=1,\dots,N$$

wherein P is the objective function that minimizes the mean square error of the load among clusters, K is the number of clusters, N is the number of wave position cells, R_j is the traffic demand of cell j, X_ij indicates whether wave position j belongs to cluster i, \bar{R} is the load mean of the K clusters, C₁ is the distance constraint, C₂ is the constraint ensuring that each cell belongs to exactly one cluster, m and n denote two cells in the same cluster, d_mn is the distance between the wave position centers of cells m and n, s is the upper limit of the distance between two wave position centers within a cluster, and M_i is the set of wave position cells belonging to cluster i.
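As a minimal sketch of how a candidate clustering could be scored against this model, the Python below evaluates the load mean-square-error objective and counts the cells that violate the distance constraint. The function names, the toy coordinates and demands, and the 1/K normalisation are illustrative assumptions rather than details taken from the patent.

```python
import math
from itertools import combinations

def load_mse(assign, demand, k):
    """Mean square error of the per-cluster load around the mean load S/K.

    assign[j] is the cluster index (0..k-1) of cell j, so constraint C2
    (each cell in exactly one cluster) holds by construction.
    """
    loads = [0.0] * k
    for cell, cluster in enumerate(assign):
        loads[cluster] += demand[cell]
    mean = sum(demand) / k
    return sum((load - mean) ** 2 for load in loads) / k

def distance_violations(assign, centers, s):
    """Number of cells in at least one same-cluster pair with d_mn > s."""
    bad = set()
    for m, n in combinations(range(len(assign)), 2):
        if assign[m] != assign[n]:
            continue
        d_mn = math.hypot(centers[m][0] - centers[n][0],
                          centers[m][1] - centers[n][1])
        if d_mn > s:
            bad.update((m, n))
    return len(bad)

# Toy data (hypothetical): 4 cells on a line, 2 clusters.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
demand = [4.0, 2.0, 3.0, 3.0]
balanced = [0, 0, 1, 1]      # cluster loads 6 and 6
unbalanced = [0, 1, 1, 1]    # cluster loads 4 and 8
print(load_mse(balanced, demand, 2), load_mse(unbalanced, demand, 2))
print(distance_violations(unbalanced, centers, s=1.5))
```

Encoding the assignment as one cluster index per cell keeps C₂ satisfied automatically, so only the distance constraint needs an explicit check.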

Step 3, solving the clustering model for inter-cluster load balancing with an immune algorithm to obtain the clustering result.

Referring to fig. 2, the specific implementation of this step is as follows:

3.1) initializing the population of cluster-center wave positions and the memory bank: according to the number of wave position cells and the number of clusters, setting the number of iterations N_e, the population size S, the memory bank capacity O, the crossover probability P_c, the mutation probability P_m, and the diversity evaluation parameter P_s, and setting the current iteration number n to 0; randomly generating M initial cluster-center antibody populations, where M is 35 in this embodiment;

3.2) designing an affinity function A_v for the clustering model, wherein P is the objective function that minimizes the mean square error of the load among clusters, C is a penalty constant applied to solutions that do not satisfy the distance requirement, and Y is the number of wave positions that do not satisfy the distance constraint C₁;

3.3) calculating the fitness (affinity) function values of all individuals according to the function in 3.2);

3.4) calculating the reproduction rate and antibody concentration of all individuals:

3.4.1) calculating the affinity between two antibodies:

$$S_{v,s}=\frac{k_{v,s}}{L}$$

wherein S_v,s is the affinity between antibody v and antibody s, k_v,s is the number of positions at which antibody v and antibody s are identical, and L is the length of an antibody;

3.4.2) calculating the antibody concentration C_v from the results of 3.4.1), wherein C_v is the concentration of antibody v, N is the total number of antibodies, and T is a preset threshold;

3.4.3) calculating the reproduction rate P from the fitness function value and the antibody concentration, wherein P is the reproduction rate, α is a constant, A_v is the fitness function value, and C_v is the antibody concentration;

3.5) according to the reproduction rate, adding individuals with high fitness and low antibody concentration to the memory bank, and taking the first S individuals to form the parent population;

3.6) selecting individuals by roulette wheel selection and applying crossover and mutation to obtain a new population, then taking some individuals from the memory bank and combining them with the new population to form the next generation;

3.7) judging whether the maximum number of iterations N_e has been reached:

if the number of iterations has reached N_e, outputting the optimal clustering result A;

otherwise, repeating 3.3)-3.6) until the number of iterations reaches N_e, and then outputting the optimal clustering result A.

The clustering result obtained in this embodiment is shown in FIG. 4, where wave position cells of the same color form one cluster.
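The quantities used in steps 3.2) to 3.5) can be sketched in Python as follows. Only the antibody similarity k/L follows directly from the definitions above; the affinity form 1/(P + C·Y), the concentration as the fraction of antibodies more similar than the threshold T, and the way the reproduction score combines affinity and concentration are assumptions made for illustration, as are all function names and the antibody encoding.

```python
def affinity(p_value, violations, penalty):
    """Assumed form A_v = 1 / (P + C * Y); larger means a better antibody.
    The tiny constant only guards against division by zero."""
    return 1.0 / (p_value + penalty * violations + 1e-12)

def similarity(a, b):
    """S_{v,s} = k_{v,s} / L: share of positions where two antibodies agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def concentrations(population, threshold):
    """Assumed form of C_v: fraction of antibodies (v itself included)
    whose similarity to v exceeds the threshold T."""
    n = len(population)
    return [sum(1 for s in population if similarity(v, s) > threshold) / n
            for v in population]

def reproduction_scores(affinities, concs, alpha):
    """Assumed combination: reward relative affinity and penalise
    relative concentration, weighted by the constant alpha."""
    total_a, total_c = sum(affinities), sum(concs)
    return [alpha * a / total_a + (1 - alpha) * (1 - c / total_c)
            for a, c in zip(affinities, concs)]

# Hypothetical encoding: antibody[j] is the cluster index of cell j.
population = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
concs = concentrations(population, threshold=0.7)
scores = reproduction_scores([0.5, 0.5, 0.5], concs, alpha=0.9)
best_first = sorted(range(len(population)), key=lambda v: scores[v], reverse=True)
print(concs, best_first)
```

With equal affinities, the antibody that has no near-duplicates in the population ranks first, which is the diversity-preserving behaviour that step 3.5) describes.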

Part two: using a deep reinforcement learning algorithm to obtain the beam dynamic scheduling result.

Step 4, determining, for each service type, the delay tolerance in units of hopping time slots according to its delay requirement.

4.1) calculating the delay experienced by the data packet in transmission before it reaches the destination satellite:

T_{delay_1}=T_prop+T_trans

wherein T_{delay_1} is the total delay experienced before reaching the destination satellite, T_prop is the propagation delay, and T_trans is the transmission delay;

4.2) estimating the delay T_{delay_2} of transmitting the data packet from the destination satellite to the user terminal;

4.3) determining the delay limit T_limit in the QoS guarantee of the service type of the data packet, determining the length BH_slot of a beam hopping time slot, and calculating the residual delay tolerance D_tole of the data packet in units of hopping time slots, wherein T_{delay_1} is the total delay experienced in transmission before reaching the destination satellite. In this embodiment, three service types are set, with residual delay tolerances of 2, 4, and 20 respectively.
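One plausible way to turn these delays into a slot count is sketched below in Python. The expression floor((T_limit − T_{delay_1} − T_{delay_2}) / BH_slot), the clamping at zero, the function name, and the delay values in the example are all assumptions for illustration; only the 20 ms slot length comes from the simulation parameters.

```python
import math

def residual_delay_tolerance(t_limit, t_delay_1, t_delay_2, bh_slot):
    """Whole hopping slots a packet may still wait on board (times in ms).

    Assumed form: floor((T_limit - T_delay_1 - T_delay_2) / BH_slot),
    clamped at zero when the delay budget is already exhausted.
    """
    budget = t_limit - t_delay_1 - t_delay_2
    return max(0, math.floor(budget / bh_slot))

# Hypothetical delays; 85 ms of remaining budget spans 4 full 20 ms slots.
print(residual_delay_tolerance(t_limit=150, t_delay_1=40, t_delay_2=25, bh_slot=20))  # 4
```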

Step 5, reducing the residual delay tolerance as time passes.

Different types of data packets wait in the on-board buffer queue to be delivered; each time a hopping time slot passes, the corresponding delay tolerance is reduced by one, and when the on-board waiting delay of a data packet exceeds its tolerance, the packet is discarded and regarded as a timeout failure.
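The ageing rule can be sketched in Python as below. The queue representation (a deque of residual tolerances) and the choice to drop a packet once its tolerance would fall below zero are assumptions; the text fixes only that tolerance decreases by one per slot and that packets waiting longer than their tolerance are discarded.

```python
from collections import deque

def advance_slot(queue):
    """Age every waiting packet by one hopping slot.

    queue holds residual delay tolerances in slots, oldest packet first.
    Returns (surviving_queue, number_of_packets_dropped_as_timed_out).
    """
    survivors = deque()
    dropped = 0
    for tolerance in queue:
        remaining = tolerance - 1
        if remaining < 0:          # waited longer than its tolerance
            dropped += 1
        else:
            survivors.append(remaining)
    return survivors, dropped

q = deque([0, 2, 4, 20])
q, lost = advance_slot(q)
print(list(q), lost)  # [1, 3, 19] 1
```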

Step 6, establishing a beam hopping scheduling model based on delay tolerance according to the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate.

6.1) characterizing intra-cluster beam hopping system scenarios:

letting one working beam serve the N wave position cells of a cluster, with the service request volume of each cell expressed in data packets whose arrivals follow a Poisson distribution with arrival rate λ_i, i = 1, 2, ..., N;

representing the amount of data stored in the satellite memory for each wave position cell as a set of per-cell queue lengths, wherein the n-th entry is the number of data packets stored in the satellite memory for the n-th wave position cell after slot t_j ends;

6.2) establishing, according to the data packet arrivals of each cell, the optimization objective functions that maximize the service guarantee rate and minimize the service timeout failure rate, wherein P₁ is the objective of maximizing the service guarantee rate of the intra-cluster services, P₂ is the objective of minimizing the intra-cluster service timeout failure rate, T is the set of all decision moments of the beam hopping satellite within the coverage time of the same area, and N is the total number of wave position cells in the cluster; the objectives are expressed in terms of the amount of data packets sent to cell n by the end of slot t_j, the total amount of data packets destined for cell n that the satellite has received up to slot t_j, and the amount of data packets destined for wave position cell n that have failed because they timed out while waiting;

6.3) establishing a beam constraint condition according to the one-to-one correspondence between working beams and clusters: in each hopping time slot exactly one wave position cell in the cluster is illuminated, where the illumination indicator of cell n takes the value 1 when the cell is illuminated and 0 otherwise;

6.4) determining the maximum capacity L of the on-board buffer queue of each wave position cell, and establishing a buffer constraint condition: after slot t_j ends, the number of data packets stored in the satellite memory for wave position cell n must not exceed L;

6.5) combining the optimization objective function of maximizing the service guarantee rate and minimizing the service overtime failure rate in 6.2) with the beam constraint condition in 6.3) and the buffer constraint condition in 6.4) to obtain a hopping beam scheduling model based on the delay tolerance:

wherein C₁ is the beam constraint condition, ensuring that exactly one cell in the cluster obtains beam scheduling in each hopping time slot, and C₂ is the buffer constraint condition, ensuring that the data packets held on the satellite at the current time do not exceed the maximum limit.
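A small Python sketch of the two constraints and the two rates is given below. The ratio-of-totals definitions of the service guarantee rate (sent / received) and of the timeout failure rate (failed / received), along with the function names and sample counts, are assumptions for illustration.

```python
def check_constraints(illumination, queue_lengths, max_queue):
    """Return (C1 holds, C2 holds) for one hopping slot.

    C1: exactly one cell of the cluster is illuminated (0/1 indicators).
    C2: no per-cell buffer queue exceeds the maximum capacity L.
    """
    c1 = all(x in (0, 1) for x in illumination) and sum(illumination) == 1
    c2 = all(length <= max_queue for length in queue_lengths)
    return c1, c2

def service_rates(sent, received, failed):
    """Cluster-level (guarantee rate, timeout failure rate).

    sent[n], received[n], failed[n] are cumulative packet counts of cell n.
    Assumed definitions: totals of sent and failed over total received.
    """
    total = sum(received)
    if total == 0:
        return 1.0, 0.0
    return sum(sent) / total, sum(failed) / total

# Hypothetical counters for a 3-cell cluster.
print(check_constraints([0, 1, 0], [2, 0, 1], max_queue=5))
print(service_rates(sent=[8, 5, 3], received=[10, 6, 4], failed=[1, 1, 0]))
```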

Step 7, in each cluster of the first part, establishing the beam scheduling model in 6.5), and regarding each scheduling model problem as a Markov decision process.

The Markov decision process comprises the design of the state, the action, and the reward: the optimization objective functions of maximizing the service guarantee rate and minimizing the service timeout failure rate in the beam scheduling model are converted into the reward, the beam constraint condition is converted into the action, and the data packet arrivals of each wave position cell, together with their delay tolerances, are converted into the state. The specific implementation is as follows:

7.1) designing the state as the matrix of the numbers of data packets awaiting transmission, broken down by residual delay tolerance, in each wave position queue in the current time slot; the state at slot t_j is a two-dimensional matrix.

The state matrix is obtained by reconstructing the data packet arrivals in each wave position buffer queue of the cluster in the current time slot. The state reconstruction process is shown in FIG. 5, wherein wave position n denotes the cell numbered n in the cluster, T_th is the maximum delay tolerance over all service types, t_j is the number of time slots the data packet has spent waiting, and "x", "o" and "Δ" mark the arrivals of data packets of three different service types. The data packets in each wave position queue of the cluster in the current time slot, shown in the left diagram of FIG. 5, are divided by type to obtain the matrix in the right diagram. In this matrix the row number represents the residual delay tolerance and the column number represents the wave position cell number, so that the value in row a, column b is the number of data packets in wave position cell b whose residual delay tolerance is a;

7.2) designing the action as the selection of the wave position cell to be illuminated in the current time slot, wherein the action selected in slot t_j consists of the indicators x_n, n = 1, ..., N, x_n indicates whether wave position n is illuminated by the working beam in that slot, and N is the total number of wave position cells in the cluster;

7.3) designing the reward as the difference between the number of data packets processed and the number of data packets that fail in the current time slot, wherein the reward is the benefit obtained after the action is executed in slot t_j, the first term is the total number of data packets processed by the system after the action selected in the current time slot, and the second term is the total number of data packets that fail by timeout in the current time slot.
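The state, action, and reward designs above can be illustrated with the Python sketch below. The list-of-lists queue representation, the per-slot beam capacity, serving the most urgent packets first, and indexing state rows from tolerance 0 are assumptions introduced for the example; the reward follows the stated rule of processed packets minus timed-out packets.

```python
def build_state(queues, t_th):
    """State matrix: rows = residual tolerance 0..t_th, columns = cells.

    queues[b] lists the residual tolerances of the packets waiting for
    cell b; tolerances are assumed to lie within 0..t_th.
    """
    state = [[0] * len(queues) for _ in range(t_th + 1)]
    for cell, queue in enumerate(queues):
        for tolerance in queue:
            state[tolerance][cell] += 1
    return state

def step(queues, cell, capacity):
    """Illuminate one cell for a slot, then age every queue by one slot.

    cell: index of the illuminated cell (a one-hot action over N cells).
    capacity: packets the beam can deliver per slot (hypothetical).
    Returns (new_queues, reward) with reward = served - timed_out.
    """
    action = [1 if n == cell else 0 for n in range(len(queues))]
    served, timed_out, new_queues = 0, 0, []
    for n, queue in enumerate(queues):
        pending = sorted(queue)                 # most urgent packets first
        if action[n]:
            served = min(capacity, len(pending))
            pending = pending[served:]
        aged = [t - 1 for t in pending]
        timed_out += sum(1 for t in aged if t < 0)
        new_queues.append([t for t in aged if t >= 0])
    return new_queues, served - timed_out

queues = [[0, 3], [0, 1, 5], []]
state = build_state(queues, t_th=5)
next_queues, reward = step(queues, cell=1, capacity=2)
print(state[0], next_queues, reward)  # [1, 1, 0] [[2], [4], []] 1
```

In the example, serving cell 1 delivers its two most urgent packets, while the tolerance-0 packet of the unserved cell 0 times out, giving a reward of 2 − 1 = 1.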

Step 8, solving the Markov decision process designed in step 7 by deep reinforcement learning to obtain the beam scheduling result.

8.1) initialization parameters:

8.1.1) initializing scene parameters in the beam hopping satellite cluster:

confirming wave position cell parameters and wave beam parameters in each cluster according to the clustering result obtained in the step 3; the cluster wave position cell parameters comprise the serial number of the cluster cell, the service request quantity of the cluster cell, the service type and the data packet size; the beam parameters comprise single beam working bandwidth and single beam power;

8.1.2) initializing deep reinforcement learning parameters:

taking the working beam as the agent and the data packet arrivals within the cluster as the environment, setting the number of training periods M, the number of time slots T per period, the learning rate α, the experience pool capacity N_ep, the batch size N_b, the discount factor γ, the network update frequency C, the current network Q, the target network, and the greedy factor ε;

8.2) initializing the state of the current environment to s_t, and updating the greedy factor ε;

8.3) taking s_t as the input of the Q network, obtaining the Q values output by the Q network for all actions, and selecting an action a_t by the ε-greedy method;

8.4) executing action a_t in state s_t to obtain the new state s_t+1 and the reward r_t;

8.5) storing the tuple (s_t, a_t, r_t, s_t+1) obtained in 8.4) in the experience pool, and updating the current environment state to s_t+1;

8.6) sampling N_b samples from the experience pool to train the Q network, and updating the Q network by gradient descent;

8.7) judging whether the current time slot t has reached the network update frequency:

if t mod C = 1, i.e. the remainder of dividing t by C is 1, updating the target network; otherwise, not updating the target network;

8.8) judging whether the current time slot t has reached the number of time slots T per period:

if t = T, judging whether the current training round m has reached the number of training periods M:

if m = M, terminating the iteration and outputting the trained scheduling model;

otherwise, repeating 8.2)-8.7) and continuing training;

if t ≠ T, repeating 8.3)-8.7) and continuing training.
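The control flow of steps 8.2) to 8.8) can be sketched in pure Python as follows. This is a structural illustration only: a one-layer linear function stands in for the deep Q network, samples are drawn uniformly from the experience pool rather than by priority, the two-cell environment is a toy, and the epsilon decay rule and all names and default values are assumptions.

```python
import random
from collections import deque

class LinearQ:
    """Stand-in for the Q network: Q(s) = W s + b, one row of W per action."""
    def __init__(self, n_state, n_action):
        self.w = [[0.0] * n_state for _ in range(n_action)]
        self.b = [0.0] * n_action

    def values(self, s):
        return [sum(wi * si for wi, si in zip(row, s)) + b
                for row, b in zip(self.w, self.b)]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]
        self.b = other.b[:]

    def sgd_step(self, s, a, target, lr):
        err = self.values(s)[a] - target        # gradient of 0.5 * err**2
        self.w[a] = [wi - lr * err * si for wi, si in zip(self.w[a], s)]
        self.b[a] -= lr * err

def toy_env_step(state, action, rng):
    """Hypothetical 2-cell cluster: state = queue lengths, action = served cell."""
    queues = list(state)
    served = min(queues[action], 2)
    queues[action] -= served
    overflow = 0
    for n in range(len(queues)):
        queues[n] += rng.randint(0, 1)          # random arrivals
        if queues[n] > 5:                       # buffer limit L = 5
            overflow += queues[n] - 5
            queues[n] = 5
    return queues, served - overflow

def train(periods=5, slots=50, lr=0.01, gamma=0.9, pool=1000, batch=8,
          sync_every=20, seed=0):
    rng = random.Random(seed)
    q, q_target = LinearQ(2, 2), LinearQ(2, 2)
    q_target.copy_from(q)
    memory = deque(maxlen=pool)
    eps, steps = 1.0, 0
    for _ in range(periods):                    # 8.8) loop over training periods
        state = [0, 0]                          # 8.2) reset the environment
        eps = max(0.01, eps * 0.8)              # 8.2) update the greedy factor
        for t in range(1, slots + 1):
            if rng.random() < eps:              # 8.3) epsilon-greedy selection
                action = rng.randrange(2)
            else:
                vals = q.values(state)
                action = vals.index(max(vals))
            nxt, reward = toy_env_step(state, action, rng)    # 8.4)
            memory.append((state, action, reward, nxt))       # 8.5)
            state = nxt
            sample = rng.sample(list(memory), min(batch, len(memory)))
            for s, a, r, s2 in sample:          # 8.6) train on a mini-batch
                target = r + gamma * max(q_target.values(s2))
                q.sgd_step(s, a, target, lr)
            if t % sync_every == 1:             # 8.7) t mod C == 1
                q_target.copy_from(q)
            steps += 1
    return q, steps

q_net, total_steps = train()
print(total_steps, q_net.values([1, 2]))
```

Keeping a separate target network that is refreshed only when t mod C = 1 means the bootstrap targets change slowly, which is the purpose of step 8.7).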

The effect of the present invention can be further illustrated by the following simulation results:

First, simulation conditions:

Simulation parameters: 3 satellite working beams; 19 wave positions in the satellite coverage; single-beam working bandwidth 100 MHz; single-beam power 70 W; beam hopping time slot 20 ms; data packet size 20 kbits; delay tolerances of the three services 2, 4, and 20; immune algorithm population size 20, 100 iterations, diversity evaluation parameter 0.95, crossover probability 0.5, mutation probability 0.4, memory bank capacity 15; deep reinforcement learning training over 600 periods of 1000 time slots each, learning rate 0.00001, experience pool capacity 100000, batch size 32, discount factor 0.9, update step 20, initial exploration rate 1, final exploration rate 0.01.

Simulation environment: MATLAB R2018b and Python 3.6.
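For reference, the parameters above could be collected in a single configuration object, as in the Python sketch below. The class and field names are hypothetical, and the linear annealing of the exploration rate from 1 to 0.01 is an assumed schedule, since the text gives only the initial and final values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    # Scenario parameters
    beams: int = 3
    wave_positions: int = 19
    bandwidth_mhz: float = 100.0
    beam_power_w: float = 70.0
    slot_ms: float = 20.0
    packet_kbits: float = 20.0
    delay_tolerances: tuple = (2, 4, 20)
    # Immune algorithm parameters
    population_size: int = 20
    immune_iterations: int = 100
    diversity_param: float = 0.95
    crossover_prob: float = 0.5
    mutation_prob: float = 0.4
    memory_capacity: int = 15
    # Deep reinforcement learning parameters
    training_periods: int = 600
    slots_per_period: int = 1000
    learning_rate: float = 1e-5
    replay_capacity: int = 100_000
    batch_size: int = 32
    discount: float = 0.9
    target_update_step: int = 20
    eps_start: float = 1.0
    eps_end: float = 0.01

def epsilon(cfg, period):
    """Exploration rate for a 0-based training period (assumed linear decay)."""
    frac = min(1.0, period / (cfg.training_periods - 1))
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)

cfg = SimConfig()
print(epsilon(cfg, 0), round(epsilon(cfg, cfg.training_periods - 1), 4))
```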

Second, simulation content and results

Simulation 1: the deep reinforcement learning convergence rate of the delay-tolerance-based beam hopping clustering scheduling method of the invention is compared with that of the existing global beam hopping algorithm. The result is shown in FIG. 6: the method of the invention begins to converge at about iteration 400, whereas the global scheduling method begins to converge only at about iteration 1200. The invention therefore converges in roughly one third as many iterations, and the computational complexity is reduced.

Simulation 2: the normalized throughput of the delay-tolerance-based beam hopping clustering scheduling method of the invention is compared with that of the existing longest-queue-first, polling, and random scheduling beam hopping algorithms. The result is shown in FIG. 7: at a supply-demand ratio of 110%, the throughput of the invention is 6%, 10%, and 15% higher than that of the longest-queue-first, polling, and random scheduling algorithms, respectively.

Simulation 3: the service guarantee rate of the delay-tolerance-based beam hopping clustering scheduling method of the invention is compared with that of the existing longest-queue-first, polling, and random allocation beam hopping algorithms. The result is shown in FIG. 8: at a supply-demand ratio of 110%, the service guarantee rate of the invention is 7%, 11%, and 15% higher than that of the longest-queue-first, polling, and random allocation algorithms, respectively.

The foregoing description is only an example of the present invention and is not intended to limit the invention; it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims

1. A high throughput beam hopping scheduling method for differentiated services is characterized by comprising the following steps:

(1) generating a clustering result within the satellite coverage range:

1a) dividing wave position cells in a satellite coverage range into different clusters;

s.t.C₁:d_mn≤s,m∈M_i and n∈M_i

(2) establishing a cluster beam dynamic scheduling model:

2c) according to the principles of maximizing the service guarantee rate and minimizing the service timeout failure rate, establishing a beam hopping scheduling model based on delay tolerance:

denotes the number of packets sent to cell n after the end of time slot t_j,

denotes the total number of packets received by the satellite for cell n up to time slot t_j; P₂ is the objective of minimizing the intra-cluster service timeout failure rate,

indicates that the cell is illuminated, and otherwise that it is not illuminated,

2. The method of claim 1, wherein in 1a) the wave position cells within the satellite coverage area are divided into different clusters, the number of clusters is determined by the number K of working beams owned by a single satellite, the working beams correspond one-to-one to the clusters, and each working beam is responsible for dynamic beam scheduling within one cluster.

3. The method according to claim 1, wherein in 1b) the clustering model for load balancing among clusters is established according to the principle of load balancing among clusters and the principle of geographical proximity of the wave position cells within a cluster, implemented as follows:

1b1) determining the service request quantity of each wave position cell, and calculating the sum S of the service request quantities of all cells in the coverage area of the satellite;

1b2) calculating the load mean of K clusters

Establishing a load balancing optimization target:

1b3) determining the upper limit s on the distance between the centers of any two wave positions in a cluster, and establishing a distance constraint condition:

d_mn ≤ s, m ∈ M_i and n ∈ M_i

1b4) Combining the load balancing optimization goal in 1b2) with the distance constraint condition in 1b3) to obtain a clustering model of load balancing among clusters:

s.t. C₁: d_mn ≤ s, m ∈ M_i and n ∈ M_i.
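
For illustration only, the load balancing objective of 1b2) and the distance constraint of 1b3) can be sketched in Python as follows. The formula images are not reproduced in this text, so the sketch assumes that the imbalance is the sum of absolute deviations of each cluster load from the mean load S/K. The function names, the `demand` and `centers` dictionaries, and the use of Euclidean distance are hypothetical choices and are not taken from the claims.

```python
from itertools import combinations
from math import dist


def cluster_load_imbalance(demand, clusters):
    """Sum of absolute deviations of each cluster load from the mean load S/K.

    demand:   {cell_id: service request quantity}
    clusters: list of K lists of cell ids (assumed form of the objective).
    """
    total = sum(demand.values())        # S, sum of all service requests
    mean_load = total / len(clusters)   # S / K
    return sum(
        abs(sum(demand[cell] for cell in members) - mean_load)
        for members in clusters
    )


def satisfies_distance_limit(centers, clusters, s):
    """Constraint C1: d_mn <= s for every pair of cells m, n in the same cluster."""
    return all(
        dist(centers[m], centers[n]) <= s
        for members in clusters
        for m, n in combinations(members, 2)
    )


# Example: four cells split into K = 2 clusters.
demand = {0: 4, 1: 6, 2: 5, 3: 5}
centers = {0: (0, 0), 1: (1, 0), 2: (5, 0), 3: (6, 0)}
balanced = [[0, 1], [2, 3]]
print(cluster_load_imbalance(demand, balanced))
print(satisfies_distance_limit(centers, balanced, s=2))
```

A candidate clustering would be accepted only if the distance check passes, and candidates would then be ranked by the imbalance value, smaller being better.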

4. The method according to claim 1, wherein in 1c) the clustering model for load balancing among clusters is solved by using an immune algorithm, implemented as follows:

1c1) initializing a central wave position population and a memory bank for each cluster;

1c2) for the clustering model, an affinity function is designed:

1c3) sorting the population in descending order of the affinity function in 1c2), selecting the first H individuals to form a parent population, sequentially performing selection, crossover and mutation operations to obtain a new population, and taking some individuals out of the memory bank to form, together with the new population, the next-generation population;

1c4) repeating 1c3) until the maximum number of iterations N_e is reached, and obtaining the optimal solution.
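
The iteration in 1c1) to 1c4) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the claimed implementation: an individual is assumed to be a tuple of K distinct cell indices used as cluster centers, every cell is assigned to its nearest center, and the affinity is assumed to be 1/(1 + imbalance) because the affinity formula is not reproduced in this text. Antibody concentration, which immune algorithms commonly use, is omitted, and all names and default parameters are hypothetical.

```python
import random
from math import dist


def assign(centers_idx, positions):
    """Assign every cell to its nearest cluster center (ties go to the first)."""
    clusters = [[] for _ in centers_idx]
    for cell, pos in enumerate(positions):
        nearest = min(range(len(centers_idx)),
                      key=lambda i: dist(pos, positions[centers_idx[i]]))
        clusters[nearest].append(cell)
    return clusters


def affinity(centers_idx, positions, demand):
    """Assumed affinity: higher when cluster loads are closer to the mean load."""
    clusters = assign(centers_idx, positions)
    mean_load = sum(demand) / len(clusters)
    imbalance = sum(abs(sum(demand[c] for c in m) - mean_load) for m in clusters)
    return 1.0 / (1.0 + imbalance)


def immune_search(positions, demand, k, pop_size=20, parents=10,
                  memory_size=4, iterations=30, seed=0):
    rng = random.Random(seed)
    cells = range(len(positions))

    def score(individual):
        return affinity(individual, positions, demand)

    # 1c1) initial population and memory bank
    population = [tuple(rng.sample(cells, k)) for _ in range(pop_size)]
    memory = sorted(population, key=score, reverse=True)[:memory_size]
    for _ in range(iterations):                      # 1c4) up to N_e iterations
        # 1c3) keep the H best individuals as the parent population
        elite = sorted(population, key=score, reverse=True)[:parents]
        children = []
        while len(children) < pop_size - memory_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(dict.fromkeys(a[:cut] + b[cut:]))   # crossover
            if rng.random() < 0.2:                           # mutation
                child[rng.randrange(len(child))] = rng.choice(cells)
                child = list(dict.fromkeys(child))
            while len(child) < k:                            # repair duplicates
                extra = rng.choice(cells)
                if extra not in child:
                    child.append(extra)
            children.append(tuple(child))
        # new generation = offspring plus individuals from the memory bank
        population = children + memory
        memory = sorted(population, key=score, reverse=True)[:memory_size]
    return memory[0]


positions = [(0, 0), (1, 0), (10, 0), (11, 0)]
demand = [5, 5, 5, 5]
best = immune_search(positions, demand, k=2)
print(best, assign(best, positions))
```

Because the memory bank is always carried into the next generation, the best affinity found so far never decreases from one iteration to the next.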

5. The method of claim 1, wherein in 2a), the delay tolerance in units of hopping slots is determined according to the requirements of different service types on delay, and is implemented as follows:

2a1) calculating the time delay of the data packet in the transmission process before reaching the target satellite:

T_{delay_1} = T_prop + T_trans

2a2) estimating the transmission delay T_{delay_2} of the data packet from the destination satellite to the user terminal;

2a3) determining the delay limit T_limit in the QoS guarantee of the service type of the data packet, determining the length BH_slot of a beam hopping slot, and calculating the residual delay tolerance of the data packet:

wherein D_tole is the residual delay tolerance, and T_{delay_1} is the total delay experienced during transmission before reaching the destination satellite.
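
As a worked illustration of 2a1) to 2a3), the following Python sketch converts the remaining delay budget into a number of beam hopping slots. The formula image is not reproduced in this text, so the sketch assumes D_tole = floor((T_limit − T_{delay_1} − T_{delay_2}) / BH_slot), clamped at zero when the budget is already exhausted. The function name, the clamping, and the choice of milliseconds as the unit are assumptions.

```python
def residual_delay_tolerance(t_limit, t_prop, t_trans, t_delay_2, bh_slot):
    """Residual delay tolerance in whole beam hopping slots (assumed formula).

    All arguments share one time unit, for example milliseconds.
    """
    if bh_slot <= 0:
        raise ValueError("bh_slot must be positive")
    t_delay_1 = t_prop + t_trans                      # 2a1) delay before the satellite
    remaining = t_limit - t_delay_1 - t_delay_2       # budget left for on-board waiting
    return max(0, int(remaining // bh_slot))          # whole slots, never negative


# Example: 150 ms QoS limit, 25 ms already spent, 15 ms downlink, 2 ms slots.
print(residual_delay_tolerance(150, 20, 5, 15, 2))
```

A packet whose result is 0 has no slack left and, under this assumption, would count toward the timeout failure rate unless it is served immediately.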

6. The method of claim 1, wherein 2c) according to the principles of maximizing service guarantee rate and minimizing service timeout failure rate, a time delay tolerance-based beam hopping scheduling model is established as follows:

2c1) characterizing the intra-cluster beam hopping system scenario:

Wherein

2c2) establishing an optimization objective function for maximizing service guarantee rate and minimizing service overtime failure rate according to the data packet arrival condition of each cell:

denotes the number of packets sent to cell n after the end of time slot t_j,

denotes the number of packets destined for wave position cell n that become invalid because of waiting timeout;

2c3) according to the one-to-one correspondence relationship between the working beams and the clusters, beam constraint conditions are established:

wherein

indicates that the cell is illuminated, and otherwise that it is not illuminated;

2c4) determining the maximum capacity L of the on-board buffer queue of each wave position cell, and establishing a buffer constraint condition:

wherein

2c5) combining the objective function of maximizing the service guarantee rate and minimizing the service timeout failure rate in 2c2) with the beam constraint condition in 2c3) and the buffer constraint condition in 2c4), and obtaining the beam hopping scheduling model based on delay tolerance:

7. The method according to claim 1, wherein in (3) the beam dynamic scheduling model established in each cluster is regarded as a Markov decision process and solved by deep reinforcement learning to obtain the beam scheduling result, implemented as follows:

3a) designing the state in the deep reinforcement learning algorithm as the matrix of the numbers of data packets awaiting transmission with different residual delay tolerances in each wave position queue at the current time slot:

wherein

is the state at time slot t_j,

is a two-dimensional state matrix;

3b) designing the action in the deep reinforcement learning algorithm as selecting the wave position cell to be illuminated in the current time slot:

wherein

3c) designing the reward in the deep reinforcement learning algorithm as the difference between the number of packets processed and the number of packets that become invalid in the current time slot:

wherein

is the reward obtained after the action is executed at time slot t_j,

is the total number of packets that become invalid due to timeout in the current time slot;

3d) taking the working beam as the agent and the packet arrivals in the cluster as the environment, and executing the deep reinforcement learning algorithm to obtain the optimization result.
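
The state, action and reward of 3a) to 3c) can be illustrated with a single environment transition, sketched below in Python. This is a minimal sketch under assumptions not stated in the claims: the per-slot beam capacity, the rule of serving the most urgent packets first, and the rule that unsent packets with one remaining slot become invalid are all hypothetical, as are the function and parameter names. No learning algorithm is shown; a deep reinforcement learning agent would choose `action` from `state`.

```python
def step(state, action, capacity, arrivals):
    """One beam hopping slot for a single cluster.

    state[n][d]:    packets queued for cell n with residual tolerance d + 1 slots
    action:         index of the cell illuminated in this slot
    capacity:       packets the beam can send in one slot (assumed parameter)
    arrivals[n][d]: new packets arriving for cell n with tolerance d + 1
    Returns (next_state, reward) with reward = sent - expired.
    """
    queues = [row[:] for row in state]

    # Serve the illuminated cell, most urgent packets first.
    budget, sent = capacity, 0
    for d in range(len(queues[action])):
        served = min(queues[action][d], budget)
        queues[action][d] -= served
        budget -= served
        sent += served

    # Packets that had one slot left and were not sent become invalid.
    expired = sum(row[0] for row in queues)

    # Age the remaining packets by one slot, then add the new arrivals.
    next_state = [row[1:] + [0] for row in queues]
    for n, row in enumerate(arrivals):
        for d, count in enumerate(row):
            next_state[n][d] += count

    return next_state, sent - expired


state = [[2, 1, 0], [1, 0, 3]]          # 2 cells, tolerances of 1 to 3 slots
arrivals = [[0, 0, 1], [0, 0, 0]]
next_state, reward = step(state, action=0, capacity=2, arrivals=arrivals)
print(next_state, reward)
```

In this example the beam sends two urgent packets to cell 0 while one urgent packet for cell 1 expires, so the reward is the difference between the two counts.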