CN115001611B

CN115001611B - Resource allocation method of beam hopping satellite spectrum sharing system based on reinforcement learning

Info

Publication number: CN115001611B
Application number: CN202210542515.9A
Authority: CN
Inventors: 任品毅; 吴镇国; 徐东阳; 鲁磊
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2023-09-26
Anticipated expiration: 2042-05-18
Also published as: CN115001611A

Abstract

The invention discloses a reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system, comprising the following steps: solving, based on reinforcement learning, an optimization problem constructed with maximizing the throughput of the LEO satellite system as the objective, to obtain a Q value table; and performing resource allocation for each LEO user according to the Q value table, thereby completing the resource allocation of the beam hopping satellite spectrum sharing system. The method can improve the throughput of the LEO satellite system.

Description

Resource allocation method of beam hopping satellite spectrum sharing system based on reinforcement learning

Technical Field

The invention belongs to the field of beam hopping satellite systems, and relates to a resource allocation method of a beam hopping satellite spectrum sharing system based on reinforcement learning.

Background

With the development of satellite communications, competition among countries for spectrum resources has become inevitable. To meet the demand of satellite communication for non-renewable spectrum resources, spectrum sharing has become an important area of academic research: satellites coordinate with one another by exchanging ephemeris information and adopt a suitable spectrum sharing method to improve spectrum utilization. Most existing spectrum sharing methods either exploit spatial isolation, setting separation angles and switching beams to reduce co-channel interference, or limit the transmit power of satellite beams through adaptive power control to reduce co-channel interference. The former requires angle adjustment through optimized phased-array antennas, which is costly and cannot be applied to satellite systems at large scale; the latter, by limiting beam transmit power, reduces satellite throughput to some extent.

To use spectrum resources effectively and realize spectrum sharing, beam hopping has become an applicable technique. Beam hopping covers the service area in a time-shared manner with a small number of beams: based on time slicing, only some of the spot beams on a satellite are active at any given moment, so co-channel interference can be reduced through time isolation, and the active spot beams can make full use of the available resources.

In a beam hopping spectrum sharing system, resource allocation must jointly consider the capacity of the satellite beam channels and the service demand of the users, which makes it a complex and dynamic problem. Current resource allocation algorithms cannot adapt to such complex environments and lack flexibility, which limits system throughput.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system that can improve the throughput of the LEO satellite system.

To achieve the above object, the reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system according to the present invention comprises the following steps:

solving, based on reinforcement learning, an optimization problem constructed with maximizing the throughput of the LEO satellite system as the objective, to obtain a Q value table;

and performing resource allocation for each LEO user according to the Q value table, thereby completing the reinforcement learning-based resource allocation of the beam hopping satellite spectrum sharing system.

The method specifically comprises the following steps:

establishing an optimization problem with maximizing the throughput of the LEO satellite system as the objective;

calculating the coordinates of the GEO satellite, the LEO satellite and the users in the Earth-centered, Earth-fixed (ECEF) coordinate system;

calculating off-axis angle, antenna gain, channel gain and interference gain;

setting the signal-to-interference-and-noise ratios of LEO users and GEO users in different time slots when the LEO satellite operates its beams;

introducing reinforcement learning to solve the optimization problem: initializing the Q values, the reward function and the state information; selecting LEO users for resource allocation according to a greedy strategy; calculating the reward, recording the signal-to-interference-and-noise ratios and observing the next state; updating the Q value according to the Q value update formula; and repeating for the next time slot until the Q value table converges;

and performing resource allocation for each LEO user according to the obtained Q value table, thereby completing the reinforcement learning-based resource allocation of the beam hopping satellite spectrum sharing system.

With the aim of maximizing the throughput of the LEO satellite system, the established optimization problem is:

$\max R_{LEO}$

where $\gamma_{th}$ is the GEO user signal-quality threshold, $K$ is the number of clusters, and $P_{tot}$ is the total satellite power.

The antenna gain is:

where $G_{max}$ is the maximum antenna gain, $J_1(\cdot)$ and $J_3(\cdot)$ are the first-order and third-order Bessel functions of the first kind, $\theta_{3dB}$ is the 3 dB angle of the antenna (entering the expression through $\sin\theta_{3dB}$), and $\theta$ is the transmission angle between the satellite transmitting antenna and the interfered user.
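The antenna gain expression itself did not survive extraction. The symbols listed ($G_{max}$, $J_1$, $J_3$, $\theta_{3dB}$) match the Bessel-function beam pattern widely used for multibeam satellite antennas, $G(\theta)=G_{max}\left[\frac{J_1(u)}{2u}+36\frac{J_3(u)}{u^3}\right]^2$ with $u=2.07123\,\sin\theta/\sin\theta_{3dB}$. The sketch below assumes that pattern; the source does not confirm it. The Bessel functions are evaluated with Bessel's integral so that only the standard library is needed, and the function names are illustrative.

```python
import math


def bessel_j(n: int, x: float, steps: int = 1024) -> float:
    """J_n(x), Bessel function of the first kind, from Bessel's integral
    J_n(x) = (1 / 2pi) * integral over [0, 2pi] of cos(n*t - x*sin(t)) dt.
    The trapezoidal rule converges very fast for this periodic integrand."""
    h = 2.0 * math.pi / steps
    total = sum(math.cos(n * k * h - x * math.sin(k * h)) for k in range(steps))
    return total / steps


def antenna_gain(theta: float, theta_3db: float, g_max: float) -> float:
    """Linear gain at off-axis angle theta (radians), assuming the Bessel
    pattern G = G_max * (J1(u)/(2u) + 36*J3(u)/u^3)^2."""
    u = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(u) < 1e-3:
        # small-argument limit: the bracket tends to 1 (error below 1e-7 here)
        return g_max
    bracket = bessel_j(1, u) / (2.0 * u) + 36.0 * bessel_j(3, u) / u**3
    return g_max * bracket**2


g_max = 10 ** (45 / 10)          # example: 45 dBi peak gain (assumed value)
theta_3db = math.radians(0.4)    # example 3 dB angle (assumed value)
ratio = antenna_gain(theta_3db, theta_3db, g_max) / g_max  # roughly 0.5 (-3 dB)
```

At $\theta=\theta_{3dB}$ the pattern drops by about 3 dB, which is a quick sanity check on the constant 2.07123.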

The channel gain is:

where L is the path loss.

The interference gain is:

where $L$ is the path loss, $\theta$ is the transmission angle between the satellite transmitting antenna and the interfered user, and $\gamma$ is the reception angle of the user in the direction of the interfering satellite.

The signal-to-interference-and-noise ratios of the LEO user and the GEO user are respectively as follows:

where the symbols denote, respectively, the transmit powers of the LEO satellite and the GEO satellite; the bandwidth overlap of the LEO user with the GEO user; the bandwidth overlap of the GEO user with the LEO user; the element of the LEO beam hopping working matrix indicating whether LEO beam v is activated in time slot j; the additive white Gaussian noise powers; the channel gains; and the interference gains.

The Q value update formula is:

where $s_j$ is the environmental state information, $a_j$ is the cell selected for beam hopping service by the LEO satellite in the current time slot, and $R_{j+1}$ is the reward function.
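The update formula is missing from the extracted text. Because the claims name a learning rate $\alpha$ and a discount factor $\gamma$, the standard tabular Q-learning update is assumed here: $Q(s_j,a_j) \leftarrow Q(s_j,a_j) + \alpha\,[R_{j+1} + \gamma \max_{a} Q(s_{j+1},a) - Q(s_j,a_j)]$. The dictionary-based table, the function name `q_update` and the state labels are illustrative choices, not taken from the source.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, reward: float,
             s_next: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step (assumed standard form).

    q       : table mapping (state, action) -> value; missing entries read
              as 0, matching the all-zero initialisation in the text
    s, a    : current state s_j and chosen cell a_j
    reward  : R_{j+1}, observed after serving cell a_j
    s_next  : observed next state s_{j+1}
    actions : candidate cells that could be chosen in s_next
    """
    best_next = max(q[(s_next, b)] for b in actions)
    td_target = reward + gamma * best_next
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)
q_update(q, s="slot0", a=3, reward=10.0, s_next="slot1", actions=[0, 1, 2, 3])
```

With all entries initialised to 0, the first update moves $Q(\text{slot0}, 3)$ by $\alpha$ times the reward, here to 1.0.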

A resource allocation decision scheme is obtained from the Q value table, and resources are then allocated to each LEO user according to this scheme.

The invention has the following beneficial effects:

In operation, the reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system formulates an optimization problem that maximizes the throughput of the LEO satellite system and solves it with reinforcement learning. The solution jointly accounts for the current traffic distribution, the on-board resources and the interdependence of allocation decisions, so the throughput of the LEO satellite system is increased while the communication quality of service of GEO users is guaranteed, resources are fully used, and resource utilization is improved.

Drawings

FIG. 1 is a schematic diagram of the system architecture of the present invention;

FIG. 2 is a graph of the GEO user carrier-to-interference-and-noise ratio under the present invention;

FIG. 3 is a graph of the GEO user carrier-to-interference-and-noise ratio under a conventional resource allocation scheme;

FIG. 4 is a graph comparing the throughput of the present invention with that of a conventional resource allocation scheme;

FIG. 5 is a graph showing the influence of the number of reinforcement learning training iterations on throughput fluctuation in the present invention.

Detailed Description

To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention and are not intended to limit the scope of the present disclosure. In the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.

The accompanying drawings show schematic structural diagrams according to the disclosed embodiments of the invention. The figures are not drawn to scale; some details are exaggerated for clarity and some may be omitted. The shapes of the regions and layers, and their relative sizes and positions shown in the drawings, are merely exemplary; in practice they may deviate due to manufacturing tolerances or technical limitations, and those skilled in the art may design regions/layers with different shapes, sizes and relative positions as actually required.

Referring to FIG. 1, the system contains three main entities: a GEO satellite, an LEO satellite and cells. The GEO satellite continuously transmits data to cells 1 to 7. While moving, the LEO satellite serves 49 cells using staring beams, 7 of which overlap the GEO satellite's cells. The LEO satellite exchanges ephemeris information with the GEO satellite and, once it has obtained the spectrum arrangement of the GEO satellite's beams, calls the beam hopping resource allocation algorithm to design a suitable beam hopping schedule. The terminals in each cell then receive their requested service data according to the beam hopping schedule delivered by the satellite.

The reinforcement learning-based resource allocation method for the beam hopping spectrum sharing system comprises the following steps:

1) With the aim of maximizing LEO satellite system throughput, the following optimization problem is established:

$\max R_{LEO}$

where the remaining quantities are: the signal quality of GEO user n in time slot j; $\gamma_{th}$, the GEO user signal-quality threshold; $T_{ij}$, the time-slot allocation indicator, with $T_{ij}=1$ meaning that user i is allocated resources in time slot j; $K$, the number of clusters; $P_i^j$, the power of beam i in time slot j; and $P_{tot}$, the total satellite power.
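The constraint expressions were lost in extraction, and the text does not define how $R_{LEO}$ is computed. The sketch below checks a candidate schedule under stated assumptions: throughput is taken as a Shannon-rate sum $\sum_j\sum_i T_{ij}\,B\log_2(1+\mathrm{SINR}_i^j)$, $K$ is treated as the maximum number of LEO beams active per slot, the per-slot beam powers must not exceed $P_{tot}$, and every GEO user must stay at or above $\gamma_{th}$. These assumptions, the bandwidth $B$ and the function name `evaluate_schedule` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # indexed [user][slot]


def evaluate_schedule(T: List[List[int]], P: Matrix, sinr_leo: Matrix,
                      sinr_geo: Matrix, gamma_th: float, K: int,
                      P_tot: float, bandwidth_hz: float) -> float:
    """Return the (assumed) LEO throughput in bit/s summed over slots,
    or raise ValueError if an (assumed) constraint is violated.

    T[i][j]        : 1 if LEO user i is served in slot j, else 0
    P[i][j]        : power given to LEO user i in slot j (W)
    sinr_leo[i][j] : linear SINR of LEO user i in slot j
    sinr_geo[n][j] : linear SINR of GEO user n in slot j
    """
    n_users, n_slots = len(T), len(T[0])
    for j in range(n_slots):
        if any(T[i][j] not in (0, 1) for i in range(n_users)):
            raise ValueError(f"T must be binary (slot {j})")
        if sum(T[i][j] for i in range(n_users)) > K:
            raise ValueError(f"more than K beams active in slot {j}")
        if sum(P[i][j] for i in range(n_users)) > P_tot:
            raise ValueError(f"power budget exceeded in slot {j}")
        if any(sinr_geo[n][j] < gamma_th for n in range(len(sinr_geo))):
            raise ValueError(f"GEO user below gamma_th in slot {j}")
    return sum(T[i][j] * bandwidth_hz * math.log2(1.0 + sinr_leo[i][j])
               for i in range(n_users) for j in range(n_slots))


T = [[1, 0], [0, 1]]                # user 0 in slot 0, user 1 in slot 1
P = [[10.0, 0.0], [0.0, 10.0]]
sinr_l = [[3.0, 0.0], [0.0, 1.0]]
sinr_g = [[20.0, 20.0]]             # one GEO user, above threshold in both slots
print(evaluate_schedule(T, P, sinr_l, sinr_g, gamma_th=10.0, K=1,
                        P_tot=10.0, bandwidth_hz=1e6))  # 3000000.0
```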

2) Let $c_{geo}^j$ and $c_{LEO}^j$ be the coordinates of the GEO satellite and the LEO satellite, respectively, in the Earth-centered, Earth-fixed (ECEF) coordinate system in time slot j, and define the ECEF coordinates of GEO user u and LEO user v in time slot j in the same way.
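The text does not say how the ECEF coordinates are obtained. One common route for ground users, assumed here, is converting geodetic latitude, longitude and altitude on the WGS-84 ellipsoid to ECEF; satellite positions would more typically come from ephemeris or orbit propagation. The function name and the example GEO longitude are hypothetical.

```python
import math
from typing import Tuple

WGS84_A = 6378137.0                   # semi-major axis (m)
WGS84_F = 1.0 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float,
                     alt_m: float) -> Tuple[float, float, float]:
    """Convert WGS-84 geodetic coordinates to ECEF (x, y, z) in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


# example: a GEO satellite 35 786 km above the equator at 100.5 E (assumed)
c_geo = geodetic_to_ecef(0.0, 100.5, 35_786_000.0)
```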

3) For time slot j, define the off-axis angle of the GEO satellite in the direction of the LEO user, the off-axis angle of the LEO user in the direction of the GEO satellite, the off-axis angle of the LEO satellite in the direction of the GEO user, and the off-axis angle of the GEO user in the direction of the LEO satellite.

As the antenna gain expression shows, the off-axis angle determines the antenna gain, where $G_{max}$ is the maximum antenna gain, $J_1(\cdot)$ and $J_3(\cdot)$ are the first-order and third-order Bessel functions of the first kind, respectively, and $\theta_{3dB}$ is the 3 dB angle of the antenna.
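The off-axis angle formulas are missing as well. A standard way to compute such an angle, assumed here, is the angle between the antenna boresight vector and the vector from the antenna to the other end of the link, both expressed in ECEF. The boresight target (for example the centre of the served cell) and the function name are hypothetical inputs.

```python
import math
from typing import Sequence


def off_axis_angle(origin: Sequence[float], boresight_target: Sequence[float],
                   other: Sequence[float]) -> float:
    """Angle (radians) at `origin` between the boresight direction
    (origin -> boresight_target) and the direction origin -> other."""
    b = [t - o for t, o in zip(boresight_target, origin)]
    d = [t - o for t, o in zip(other, origin)]
    dot = sum(x * y for x, y in zip(b, d))
    norm = math.sqrt(sum(x * x for x in b)) * math.sqrt(sum(x * x for x in d))
    # clamp to [-1, 1] to protect acos against rounding error
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# toy geometry on the x axis (spherical Earth, illustrative numbers only)
geo = (42_164_137.0, 0.0, 0.0)            # GEO satellite
cell = (6_378_137.0, 0.0, 0.0)            # cell the GEO beam points at
leo_user = (6_378_137.0, 100_000.0, 0.0)  # LEO user 100 km away
theta = off_axis_angle(geo, cell, leo_user)
```

The resulting angle is what the antenna pattern takes as $\theta$ when evaluating the GEO satellite's gain toward the LEO user.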

For time slot j, define the interference gain from the GEO satellite to LEO user v, the interference gain from the LEO satellite to GEO user u, the channel gain from the GEO satellite to GEO user u, and the channel gain from the LEO satellite to LEO user v,

where $L_{sg\to el}$, $L_{sl\to eg}$, $L_{sg\to eg}$ and $L_{sl\to el}$ are the path losses from the GEO satellite to the LEO user, from the LEO satellite to the GEO user, from the GEO satellite to the GEO user, and from the LEO satellite to the LEO user, respectively; for the GEO link, the relevant angle is that between the pointing direction of the GEO transmitting antenna and the line connecting the transmitter to the corresponding GEO user receiver; and $\beta^j$ is the angle between the pointing direction of the LEO transmitting antenna and the line connecting the transmitter to the corresponding LEO user receiver.
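Only the ingredients of the channel and interference gains survive: path loss, a transmit-side angle and, for interference, a receive-side angle. A common link-budget form, assumed here and not confirmed by the text, is gain = transmit antenna gain at the transmit angle × receive antenna gain at the receive angle ÷ path loss, with free-space path loss $L=(4\pi d f/c)^2$. The antenna patterns are caller-supplied functions so that any pattern can be plugged in; the function names and the 20 GHz example frequency are hypothetical.

```python
import math
from typing import Callable

C = 299_792_458.0                   # speed of light (m/s)
Pattern = Callable[[float], float]  # off-axis angle (rad) -> linear gain


def free_space_path_loss(distance_m: float, freq_hz: float) -> float:
    """Linear free-space path loss (4*pi*d*f/c)^2 (assumed loss model)."""
    return (4.0 * math.pi * distance_m * freq_hz / C) ** 2


def link_gain(tx_pattern: Pattern, tx_angle: float,
              rx_pattern: Pattern, rx_angle: float,
              distance_m: float, freq_hz: float) -> float:
    """Linear power gain of one link (assumed form G_tx * G_rx / L).

    Wanted link (channel gain): tx_angle is the angle between the transmit
    antenna pointing and the line to the served user (beta^j for LEO).
    Interfering link (interference gain): tx_angle is the transmit angle
    toward the interfered user and rx_angle is that user's receive angle
    toward the interfering satellite.
    """
    loss = free_space_path_loss(distance_m, freq_hz)
    return tx_pattern(tx_angle) * rx_pattern(rx_angle) / loss


def isotropic(_angle: float) -> float:
    return 1.0  # placeholder pattern for the example


g = link_gain(isotropic, 0.0, isotropic, 0.0, 35_786_000.0, 20e9)
```

For a GEO link at 20 GHz this free-space loss is about 209.5 dB, a figure that helps sanity-check the units.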

4) Define the signal-to-interference-and-noise ratios of LEO users and GEO users in the different time slots when the LEO satellite operates on all beams as follows,

where the symbols denote, respectively, the LEO satellite transmit power and the GEO satellite transmit power; the bandwidth overlap of the LEO user with the GEO user; the bandwidth overlap of the GEO user with the LEO user; the element of the LEO beam hopping working matrix indicating whether LEO beam v is activated in time slot j; and the additive white Gaussian noise powers.
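The SINR expressions did not survive extraction. Using only the quantities listed above, one plausible form, assumed here, gates the LEO user's wanted signal by the hopping-matrix element and scales each interference term by the bandwidth overlap: $\mathrm{SINR}_{LEO} = T\,P_L\,h_L / (\delta_G\,P_G\,g_{G\to L} + N)$, and for a GEO user only the active LEO beams interfere. The function and parameter names are hypothetical.

```python
from typing import Sequence


def sinr_leo(t_active: int, p_leo: float, h_leo: float,
             overlap_geo: float, p_geo: float, g_geo_to_leo_user: float,
             noise: float) -> float:
    """Assumed SINR of an LEO user in one slot.

    t_active          : hopping-matrix element (1 if this LEO beam is on)
    h_leo             : channel gain LEO satellite -> LEO user
    overlap_geo       : fraction of the LEO user's band overlapped by GEO signal
    g_geo_to_leo_user : interference gain GEO satellite -> LEO user
    """
    signal = t_active * p_leo * h_leo
    interference = overlap_geo * p_geo * g_geo_to_leo_user
    return signal / (interference + noise)


def sinr_geo(p_geo: float, h_geo: float, t_active: Sequence[int],
             overlap_leo: Sequence[float], p_leo: Sequence[float],
             g_leo_to_geo_user: Sequence[float], noise: float) -> float:
    """Assumed SINR of a GEO user in one slot; each LEO beam interferes
    only when its hopping-matrix element is 1."""
    interference = sum(t * d * p * g for t, d, p, g in
                       zip(t_active, overlap_leo, p_leo, g_leo_to_geo_user))
    return p_geo * h_geo / (interference + noise)
```

Switching an LEO beam off ($T=0$) removes its interference from the GEO user's denominator, which is the lever the beam hopping schedule uses to protect GEO users.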

5) Let $s_j$ be the environmental state information, and let the action $a_j$ be the cell selected for beam hopping service by the LEO satellite in time slot j. Let $Q(s_j, a_j)$ denote the Q value of selecting action $a_j$ in state $s_j$; it is updated by the Q value update rule, where $R_j$ is the reward function;

where $r_a$ and $r_b$ are a positive number and a negative number of large absolute value, respectively, and $\gamma_0$ is the signal-to-interference-and-noise ratio threshold of the GEO user. At initialization, the Q values, the reward value and the state information are all set to 0.
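The reward expression is missing, but the definitions ($r_a>0$, $r_b<0$ with large magnitude, GEO threshold $\gamma_0$) suggest a piecewise reward. The sketch below assumes the reward is $r_a$ when every GEO user meets $\gamma_0$ in the slot and $r_b$ otherwise; a reading in which the positive branch also reflects LEO throughput would change only the first return. The default values of $r_a$ and $r_b$ are examples, not from the source.

```python
from typing import Iterable


def reward(geo_sinrs: Iterable[float], gamma_0: float,
           r_a: float = 1.0, r_b: float = -100.0) -> float:
    """Assumed piecewise reward R_j for one time slot.

    geo_sinrs : linear SINRs of the GEO users in this slot
    gamma_0   : GEO SINR threshold
    r_a, r_b  : positive reward / large negative penalty (example values)
    """
    if all(s >= gamma_0 for s in geo_sinrs):
        return r_a
    return r_b
```

The large negative $r_b$ makes any action that breaks the GEO protection threshold look much worse than any compliant action, so the learned Q table steers away from it.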

6) Selecting action a according to a greedy policy _j The greedy policy is:

wherein epsilon is the probability of random selection action, and LEO satellite serves the user to obtain a corresponding return value R _j Updating state information s _j+1 And obtaining a Q value according to a Q value updating formula, and continuing to repeatedly operate the next time slot until the system convergence is obtained to obtain a complete Q value table, so as to obtain a resource allocation decision scheme under all time slots.
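The greedy policy expression is also missing. Since $\epsilon$ is defined as the probability of a random action, the usual ε-greedy rule is assumed: with probability $\epsilon$ pick a random cell, otherwise pick the cell with the largest Q value in the current state. The function name and the explicit random generator argument are illustrative.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple


def epsilon_greedy(q: Dict[Tuple[Hashable, Hashable], float], state: Hashable,
                   cells: Sequence[Hashable], epsilon: float,
                   rng: random.Random) -> Hashable:
    """Choose the cell a_j to serve in `state` (assumed epsilon-greedy rule)."""
    if rng.random() < epsilon:
        return rng.choice(list(cells))  # explore: uniformly random cell
    # exploit: unseen (state, cell) pairs count as Q = 0
    return max(cells, key=lambda c: q.get((state, c), 0.0))


q_example = {("s", 1): 1.0, ("s", 2): 5.0}
choice = epsilon_greedy(q_example, "s", [0, 1, 2], epsilon=0.0,
                        rng=random.Random(0))
```

Passing a seeded `random.Random` keeps training runs reproducible, which matters when comparing throughput across numbers of training iterations.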

In summary, the specific process of the invention is as follows:

1) Establish an optimization problem with maximizing the throughput of the LEO satellite system as the objective;

2) Calculate the coordinates of the GEO satellite, the LEO satellite and the users in the ECEF coordinate system, and determine the off-axis angles, antenna gains, channel gains and interference gains of the satellites and users;

3) Set the signal-to-interference-and-noise ratios of LEO users and GEO users in different time slots when the LEO satellite operates its beams;

4) Initialize the Q values, the reward function and the state information; select LEO users for resource allocation according to the greedy strategy; calculate the reward, record the signal-to-interference-and-noise ratios, observe the next state, and update the Q value according to the Q value update formula;

5) Repeat the operation for the next time slot until the Q value table converges;

6) Obtain the resource allocation decision for each time slot from the Q value table and allocate resources to each LEO user, completing the reinforcement learning-based resource allocation of the beam hopping spectrum sharing system.
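The six steps can be tied together in a toy training loop. Everything environment-specific here is hypothetical: the state is simply the slot index within a hopping frame, a single beam serves one cell per slot, the per-cell LEO rates and the set of cells whose activation would push a GEO user below $\gamma_0$ are made-up inputs, and the reward is assumed to be the served cell's rate or a large penalty when the GEO threshold would be violated. The Q update is the standard Q-learning form and the action rule is ε-greedy, both assumptions. The last line reads the decision for each slot as the argmax of the learned Q table.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set


def train_schedule(rates: List[List[float]], blocked: List[Set[int]],
                   episodes: int = 2000, alpha: float = 0.1,
                   gamma: float = 0.9, epsilon: float = 0.2,
                   penalty: float = -100.0, seed: int = 0) -> List[int]:
    """Toy Q-learning for a single-beam hopping frame (hypothetical setup).

    rates[j][c] : assumed LEO rate if cell c is served in slot j
    blocked[j]  : cells whose service in slot j would violate the GEO threshold
    Returns the chosen cell for each slot (greedy w.r.t. the final Q table).
    """
    rng = random.Random(seed)
    n_slots, n_cells = len(rates), len(rates[0])
    q: Dict[tuple, float] = defaultdict(float)  # all Q values start at 0
    for _ in range(episodes):
        for j in range(n_slots):  # state s_j = slot index
            if rng.random() < epsilon:
                a = rng.randrange(n_cells)  # explore
            else:
                a = max(range(n_cells), key=lambda c: q[(j, c)])  # exploit
            r = penalty if a in blocked[j] else rates[j][a]
            s_next = (j + 1) % n_slots
            best_next = max(q[(s_next, c)] for c in range(n_cells))
            q[(j, a)] += alpha * (r + gamma * best_next - q[(j, a)])
    # step 6: the decision for each slot is the argmax of the Q table
    return [max(range(n_cells), key=lambda c: q[(j, c)]) for j in range(n_slots)]


rates = [[1.0, 9.0, 6.0], [8.0, 1.0, 2.0]]
blocked = [{1}, set()]  # serving cell 1 in slot 0 would harm the GEO user
schedule = train_schedule(rates, blocked)
print(schedule)
```

In this toy setup the learned schedule should avoid cell 1 in slot 0 despite its high rate, illustrating how the penalty term protects GEO users while the rate term drives LEO throughput.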

FIG. 2 shows the carrier-to-interference-and-noise ratio of the GEO user under the present invention. It shows that the invention maintains the information transmission quality of the GEO user during frequency sharing, so the GEO user's transmission is not affected.

FIG. 3 shows the carrier-to-interference-and-noise ratio of the GEO user under a multi-beam satellite fixed allocation method. Because this method does not guarantee the information transmission quality of the GEO user during spectrum sharing, the GEO user suffers co-channel interference from the LEO satellite.

FIG. 4 compares the LEO satellite throughput of the present invention with that of the multi-beam satellite fixed allocation method. It shows that the present invention achieves higher overall satellite system throughput than the fixed allocation method and improves resource utilization.

FIG. 5 shows the influence of the number of training iterations on throughput fluctuation in the present invention. As the number of training iterations increases, the variance of the throughput fluctuation stabilizes, so an appropriate number of training iterations can be chosen to reduce the influence of randomness.

Claims

1. A reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system, characterized by comprising the following steps:

performing resource allocation for each LEO user according to the Q value table, to complete the reinforcement learning-based resource allocation of the beam hopping satellite spectrum sharing system;

the method specifically comprising the following steps:

calculating the off-axis angle, antenna gain, channel gain and interference gain;

2. The reinforcement learning-based resource allocation method for a beam hopping satellite spectrum sharing system according to claim 1, wherein, with the aim of maximizing the throughput of the LEO satellite system, the established optimization problem is:

$\max R_{LEO}$

where $R_{LEO}$ is the LEO satellite system throughput; the remaining terms are the signal-to-interference-and-noise ratio of GEO user n in time slot j and $\gamma_{th}$, the GEO user signal-quality threshold; $T_{ij}$ is an element of the time-slot allocation indicator matrix, where $T_{ij}=1$ means that LEO user i obtains an LEO satellite beam for information transmission in time slot j and $T_{ij}=0$ means that it does not; $K$ is the number of clusters; $P_i^j$ is the satellite power obtained by LEO user i in time slot j; $P_{tot}$ is the total satellite power; $N_{LEO}$ is the number of LEO users, with $i \in [1, N_{LEO}]$ indexing LEO users; j indexes time slots; and $n \in [1, N_{GEO}]$ indexes GEO users, $N_{GEO}$ being the number of GEO users.

3. The resource allocation method of the reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 1, wherein the antenna gain is:

4. The method for resource allocation of a reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 3, wherein the channel gain is:

where the two expressions give the channel gain from the GEO satellite to GEO user n in time slot j and the channel gain from the LEO satellite to LEO user i, respectively; $L_{sg\to eg}$ is the path loss from the GEO satellite to the GEO user; $L_{sl\to el}$ is the path loss from the LEO satellite to the LEO user; the GEO-side angle is that, in time slot j, between the pointing direction of the GEO transmitting antenna and the line connecting the transmitter to the corresponding GEO user receiver; and $\beta^j$ is the angle, in time slot j, between the pointing direction of the LEO transmitting antenna and the line connecting the transmitter to the corresponding LEO user receiver.

5. The method for resource allocation of a reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 3, wherein the interference gain is:

where the two expressions give the interference gain from the GEO satellite to LEO user i in time slot j and the interference gain from the LEO satellite to GEO user n in time slot j, respectively; $L_{sg\to el}$ is the path loss from the GEO satellite to the LEO user; $L_{sl\to eg}$ is the path loss from the LEO satellite to the GEO user; and the angles involved are the transmission angle between the GEO satellite and LEO user i in time slot j, the reception angle of LEO user i in the direction of the GEO satellite in time slot j, the transmission angle between the LEO satellite and GEO user n in time slot j, and the reception angle of GEO user n in the direction of the LEO satellite in time slot j.

6. The resource allocation method of the reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 1, wherein the signal-to-interference-and-noise ratios of LEO user i and GEO user n in time slot j are respectively:

where the symbols denote, respectively, the power obtained by LEO user i in time slot j and the power obtained by GEO user n in time slot j; the bandwidth overlap of the GEO satellite interfering signal in time slot j; $\delta_i^j$, the bandwidth overlap of the LEO satellite interfering signal in time slot j; $T_i^j$, the element of the LEO beam hopping working matrix indicating whether LEO user i is activated in time slot j; the additive white Gaussian noise powers; the channel gains; and the interference gains.

7. The method for resource allocation of reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 1, wherein the Q value update formula is:

where $s_j$ is the environmental state information, $a_j$ is the cell selected for beam hopping service by the LEO satellite in the current time slot, $R_{j+1}$ is the reward function, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.

8. The resource allocation method of the reinforcement learning-based beam hopping satellite spectrum sharing system according to claim 1, wherein a resource allocation decision scheme is obtained according to the Q value table, and then resource allocation is performed to each LEO user according to the resource allocation decision scheme.