CN115001611A

CN115001611A - Resource allocation method of hopping beam satellite spectrum sharing system based on reinforcement learning

Info

Publication number: CN115001611A
Application number: CN202210542515.9A
Authority: CN
Inventors: 任品毅 (Ren Pinyi); 吴镇国 (Wu Zhenguo); 徐东阳 (Xu Dongyang); 鲁磊 (Lu Lei)
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2022-09-02
Anticipated expiration: 2042-05-18
Also published as: CN115001611B

Abstract

The invention discloses a reinforcement-learning-based resource allocation method for a beam-hopping satellite spectrum sharing system. The method comprises the following steps: using reinforcement learning to solve an optimization problem whose objective is to maximize the throughput of the LEO satellite system, which yields a Q value table; and allocating resources to each LEO user according to the Q value table, thereby completing the reinforcement-learning-based resource allocation of the beam-hopping satellite spectrum sharing system.

Description

Resource allocation method of hopping beam satellite spectrum sharing system based on reinforcement learning

Technical Field

The invention belongs to the field of beam-hopping satellite systems and relates to a reinforcement-learning-based resource allocation method for a beam-hopping satellite spectrum sharing system.

Background

With the development of satellite communication, competition among countries for spectrum resources has become inevitable. Because spectrum is a non-renewable resource, spectrum sharing has become a key area of academic research: satellites exchange ephemeris information, negotiate with each other, and adopt a suitable spectrum sharing method to improve spectrum utilization. Most existing spectrum sharing methods follow one of two approaches. The first relies on spatial isolation, setting separation angles and switching beams to reduce co-channel interference. The second uses adaptive power control to limit the transmit power of the satellite beams, which further reduces co-channel interference. The first approach requires angle adjustment through optimized phased-array antennas, which is costly and cannot be deployed at scale in satellite systems. The second approach limits beam transmit power and therefore reduces satellite throughput to some extent.

Beam hopping has become a practical technique for using spectrum resources effectively and realizing spectrum sharing. Beam hopping covers the service area in a time-shared manner with a small number of beams. Based on time slicing, only some of the spot beams on a satellite are active at any given moment. This reduces co-channel interference through time isolation, while the active spot beams can make full use of the available resources.

In a beam-hopping spectrum sharing system, resource allocation must jointly consider the capacity of the satellite beam channels and the service demand of the users. This makes resource allocation in such systems a complex, time-varying problem. Existing resource allocation algorithms, however, cannot adapt to this complex environment and lack flexibility, which limits system throughput.

Disclosure of Invention

The object of the invention is to overcome the shortcomings of the prior art. To that end, it provides a reinforcement-learning-based resource allocation method for a beam-hopping satellite spectrum sharing system that improves the throughput of the LEO satellite system.

To achieve this object, the reinforcement-learning-based resource allocation method for a beam-hopping satellite spectrum sharing system according to the present invention comprises the following steps:

using reinforcement learning to solve an optimization problem whose objective is to maximize the throughput of the LEO satellite system, which yields a Q value table;

and allocating resources to each LEO user according to the Q value table, thereby completing the reinforcement-learning-based resource allocation of the beam-hopping satellite spectrum sharing system.

The method specifically comprises the following steps:

establishing an optimization problem with the aim of maximizing the throughput of the LEO satellite system;

calculating the coordinates of the GEO satellite, the LEO satellite and the users in the Earth-centered, Earth-fixed (ECEF) coordinate system;

calculating the off-axis angles, antenna gains, channel gains and interference gains;

formulating the signal-to-interference-plus-noise ratios (SINRs) of LEO users and GEO users in each time slot according to which LEO beams are active;

solving the optimization problem by reinforcement learning. This step proceeds as follows:
- initialize the Q values, the reward function and the state information;
- select an LEO user for resource allocation according to an ε-greedy strategy;
- calculate the reward, compute the SINRs and observe the next state;
- update the Q value according to the Q value update formula;
- repeat these operations for the next time slot until the Q value table converges;

and allocating resources to each LEO user according to the resulting Q value table, thereby completing the reinforcement-learning-based resource allocation of the beam-hopping satellite spectrum sharing system.

With the objective of maximizing the throughput of the LEO satellite system, the optimization problem is established as follows:

$$\max R_{\mathrm{LEO}}$$

subject to constraints in which γ_th is the GEO user signal quality threshold, K is the number of clusters and P_tot is the total satellite power.
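The following sketch illustrates the objective and two of the constraints. The exact expressions are not given in this text, so the sketch makes three assumptions. First, R_LEO is taken to be the sum of the Shannon rates B·log2(1+SINR) of the LEO users over all time slots. Second, every GEO user must meet γ_th in every slot. Third, the beam powers in each slot must not exceed P_tot. The cluster constraint involving K is not modeled, and the function names are hypothetical.

```python
import math
from typing import Sequence

def leo_throughput(bandwidth_hz: float, leo_sinr: Sequence[Sequence[float]]) -> float:
    """Assumed R_LEO: sum of Shannon rates over time slots j and LEO users v.

    leo_sinr[j][v] is the linear SINR of LEO user v in slot j (0 if not served).
    """
    return sum(bandwidth_hz * math.log2(1.0 + s) for slot in leo_sinr for s in slot)

def is_feasible(geo_sinr: Sequence[Sequence[float]], gamma_th: float,
                beam_power: Sequence[Sequence[float]], p_tot: float) -> bool:
    """Hypothetical check of two constraints: GEO SINR >= gamma_th in every slot,
    and total beam power <= P_tot in every slot."""
    geo_ok = all(s >= gamma_th for slot in geo_sinr for s in slot)
    power_ok = all(sum(slot) <= p_tot for slot in beam_power)
    return geo_ok and power_ok

# Two slots, two LEO users; SINRs of 1 and 3 give 1 and 2 bit/s/Hz respectively.
print(leo_throughput(1.0, [[1.0, 3.0], [0.0, 1.0]]))  # 4.0
print(is_feasible([[2.0], [3.0]], 1.5, [[1.0, 2.0], [3.0]], 3.0))  # True
```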

The antenna gain is:

where G_max is the maximum antenna gain, J_1(·) and J_3(·) are the first-order and third-order Bessel functions of the first kind, θ_3dB is the 3 dB angle of the antenna (entering the expression through sin θ_3dB), and θ is the angle between the satellite transmitting antenna and the interfered user.
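These symbols match a Bessel-type beam pattern that is widely used in the multibeam satellite literature: G(θ) = G_max [J_1(u)/(2u) + 36 J_3(u)/u^3]^2 with u = 2.07123 sin θ / sin θ_3dB. The sketch below assumes the patent uses this form and relies only on the standard library. It evaluates the Bessel functions with Bessel's integral and the trapezoidal rule rather than a special-function library. All names are hypothetical.

```python
import math

def bessel_j(n: int, x: float, steps: int = 2048) -> float:
    """Integer-order Bessel function of the first kind via Bessel's integral
    J_n(x) = (1/2pi) * integral over [0, 2pi] of cos(n*t - x*sin(t)) dt,
    evaluated with the trapezoidal rule (highly accurate for periodic integrands)."""
    h = 2.0 * math.pi / steps
    total = sum(math.cos(n * k * h - x * math.sin(k * h)) for k in range(steps))
    return total / steps

def antenna_gain(theta: float, theta_3db: float, g_max: float) -> float:
    """Assumed Bessel beam pattern; angles in radians, gains linear."""
    u = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(u) < 1e-3:
        # Near boresight the bracket tends to 1/4 + 36/48 = 1 (relative error < 1e-7 here).
        return g_max
    bracket = bessel_j(1, u) / (2.0 * u) + 36.0 * bessel_j(3, u) / u ** 3
    return g_max * bracket ** 2

g_max = 10 ** (40.0 / 10)  # hypothetical 40 dBi peak gain
ratio = antenna_gain(math.radians(0.5), math.radians(0.5), g_max) / g_max
print(f"{10 * math.log10(ratio):.2f} dB")  # close to -3 dB at the 3 dB angle
```

The constant 2.07123 is chosen so that the gain drops to half of G_max at θ = θ_3dB, which is why the example prints a value close to -3 dB.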

The channel gain is:

where L is the path loss.

The interference gain is:

where L is the path loss, θ is the angle between the satellite transmitting antenna and the interfered user, and γ is the receiving angle of the user in the direction of the interfering satellite.
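The sketch below shows how the gains combine, assuming the common link-budget form in which a channel or interference gain equals the transmit antenna gain times the receive antenna gain divided by the path loss L. For a channel gain, both antenna gains are evaluated toward the intended user. For an interference gain, the transmit gain is evaluated at θ and the receive gain at γ. The function names and numbers are hypothetical.

```python
import math

def db_to_linear(value_db: float) -> float:
    """Convert a value in dB to a linear ratio."""
    return 10.0 ** (value_db / 10.0)

def link_gain(tx_gain_dbi: float, rx_gain_dbi: float, path_loss_db: float) -> float:
    """Assumed form G_t * G_r / L, returned as a linear ratio.

    Channel gain: antenna gains evaluated towards the intended user.
    Interference gain: tx gain at off-axis angle theta, rx gain at angle gamma.
    """
    return db_to_linear(tx_gain_dbi + rx_gain_dbi - path_loss_db)

# Hypothetical numbers: a main-lobe link versus a side-lobe interference link.
channel = link_gain(40.0, 35.0, 190.0)
interference = link_gain(10.0, 5.0, 195.0)
print(f"channel/interference = {10 * math.log10(channel / interference):.1f} dB")  # 65.0 dB
```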

The signal-to-interference-and-noise ratios of the LEO users and the GEO users are respectively as follows:

where the quantities appearing in these expressions are:
- the transmit powers of the LEO satellite and of the GEO satellite;
- the bandwidth overlap ratio of LEO users with GEO users;
- the bandwidth overlap ratio of GEO users with LEO users;
- an element of the LEO beam-hopping working matrix, indicating whether LEO beam v is activated in time slot j;
- the additive white Gaussian noise powers;
- the channel gains;
- the interference gains.
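The SINR expressions themselves are not reproduced here. The sketch below shows one plausible form consistent with the quantities listed. It assumes that an active LEO user receives interference from the GEO satellite, scaled by the LEO/GEO bandwidth overlap ratio. It also assumes that a GEO user receives the summed interference of all active LEO beams, scaled by its own overlap ratio. All names are hypothetical, and gains and powers are linear.

```python
from typing import Sequence

def leo_sinr(active: int, p_leo: float, g_leo: float, eta_lg: float,
             p_geo: float, h_geo_to_leo: float, noise: float) -> float:
    """Assumed SINR of LEO user v in slot j.

    active is the working-matrix element for beam v in slot j (1 if lit),
    eta_lg the LEO/GEO bandwidth overlap ratio, h_geo_to_leo the interference gain.
    """
    if not active:
        return 0.0  # an unlit beam delivers no signal to its user
    return p_leo * g_leo / (noise + eta_lg * p_geo * h_geo_to_leo)

def geo_sinr(p_geo: float, g_geo: float, eta_gl: float, active_row: Sequence[int],
             p_leo: float, h_leo_to_geo: Sequence[float], noise: float) -> float:
    """Assumed SINR of GEO user u in slot j: only active LEO beams interfere."""
    interference = eta_gl * sum(x * p_leo * h for x, h in zip(active_row, h_leo_to_geo))
    return p_geo * g_geo / (noise + interference)

print(leo_sinr(1, 10.0, 0.1, 0.5, 20.0, 0.01, 0.9))
print(geo_sinr(2.0, 0.5, 0.5, [1, 0], 1.0, [0.2, 0.3], 0.9))
```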

The Q value updating formula is as follows:

where s_j is the environment state information, a_j is the cell selected for LEO beam-hopping service in the current time slot, and R_{j+1} is the reward function.
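The update formula itself is not reproduced here. The sketch below assumes it is the standard tabular Q-learning rule:

Q(s_j, a_j) ← Q(s_j, a_j) + α[R_{j+1} + λ·max_a Q(s_{j+1}, a) − Q(s_j, a_j)]

The learning rate α and discount factor λ are not named in the text. The symbol λ is used here to avoid a clash with the thresholds γ_th and γ_0.

```python
from collections import defaultdict
from typing import Sequence

def q_update(q, state, action, reward: float, next_state, actions: Sequence,
             alpha: float = 0.1, discount: float = 0.9) -> float:
    """One tabular Q-learning step:
    Q(s,a) += alpha * (R + discount * max_a' Q(s',a') - Q(s,a)).
    q maps (state, action) pairs to values; a defaultdict(float) treats unseen pairs as 0."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)  # all Q values start at 0, as in the initialization step
print(q_update(q, "s0", "cell1", 10.0, "s1", ["cell1", "cell2"], alpha=0.5))  # 5.0
```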

A resource allocation decision scheme is obtained from the Q value table, and resources are then allocated to each LEO user according to this scheme.
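Once the table has converged, a decision scheme can be read off by taking, in each state, the action with the largest Q value. The sketch below assumes the Q table is stored as a dictionary keyed by (state, action) pairs. The function name is hypothetical.

```python
def greedy_schedule(q, states, actions):
    """Pick, for every state, the action with the largest Q value (unseen pairs count as 0)."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}

q_table = {("slot0", "cell1"): 1.0, ("slot0", "cell2"): 3.0, ("slot1", "cell1"): 2.0}
print(greedy_schedule(q_table, ["slot0", "slot1"], ["cell1", "cell2"]))
# {'slot0': 'cell2', 'slot1': 'cell1'}
```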

The invention has the following beneficial effects:

In operation, the reinforcement-learning-based resource allocation method for the beam-hopping satellite spectrum sharing system first sets up an optimization problem whose objective is to maximize LEO satellite system throughput. It then solves this problem by reinforcement learning. The method jointly considers the current traffic distribution, the available satellite resources, and the interdependence of resource allocation decisions. As a result, LEO satellite system throughput increases while the communication quality of GEO users is guaranteed, resources are fully used, and resource utilization improves.

Drawings

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a GEO user carrier-to-interference-plus-noise ratio diagram of the present invention;

FIG. 3 is a GEO user carrier-to-interference-plus-noise ratio diagram of a conventional resource allocation scheme;

FIG. 4 is a graph comparing throughput of the present invention with conventional resource allocation;

FIG. 5 is a graph showing the effect of the number of reinforcement learning training iterations on throughput fluctuation according to the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments, and are not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

There is shown in the drawings a schematic block diagram of a disclosed embodiment in accordance with the invention. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.

Referring to Fig. 1, the system contains three main entities: a GEO satellite, a LEO satellite and the ground cells. The GEO satellite continuously transmits data to cells 1 to 7. Meanwhile, the LEO satellite moves along its orbit and serves 49 cells in beam-staring mode, 7 of which are also covered by the GEO satellite. The LEO satellite exchanges ephemeris information with the GEO satellite and thereby obtains the spectrum arrangement of the GEO satellite's beams. It then runs the beam-hopping resource allocation algorithm to design a suitable beam-hopping schedule. Terminals in each cell receive their requested service data according to the beam-hopping schedule received from the satellite.

The reinforcement-learning-based resource allocation method for the beam-hopping spectrum sharing system comprises the following steps:

1) With the objective of maximizing the throughput of the LEO satellite system, the following optimization problem is established:

$$\max R_{\mathrm{LEO}}$$

The constraints involve the following quantities:
- the signal quality of GEO user n in time slot j;
- the GEO user signal quality threshold γ_th;
- the time-slot allocation indicator T_ij, which indicates whether user i is allocated resources in time slot j;
- the number of clusters K;
- the power P_i^j of beam i in time slot j;
- the total satellite power P_tot.

2) Let c_GEO^j and c_LEO^j denote the coordinates of the GEO satellite and the LEO satellite, respectively, in the Earth-centered, Earth-fixed (ECEF) coordinate system in time slot j. The ECEF coordinates of GEO user u and of LEO user v in time slot j are defined in the same way.
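ECEF coordinates such as c_GEO^j and c_LEO^j can be obtained from latitude, longitude and altitude. The sketch below assumes a spherical Earth with a mean radius of 6371 km; a precise implementation would use the WGS-84 ellipsoid instead. It also uses the nominal GEO altitude of 35786 km. Names are hypothetical, and distances are in kilometres.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0     # mean radius; spherical-Earth assumption
GEO_ALTITUDE_KM = 35786.0    # nominal geostationary altitude

def ecef_spherical(lat_deg: float, lon_deg: float, alt_km: float) -> Tuple[float, float, float]:
    """ECEF (x, y, z) in km for a point at the given latitude, longitude and altitude."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    r = EARTH_RADIUS_KM + alt_km
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

# Hypothetical positions: a GEO satellite at 110.5 deg E and a ground user near Xi'an.
c_geo = ecef_spherical(0.0, 110.5, GEO_ALTITUDE_KM)
user = ecef_spherical(34.3, 108.9, 0.0)
print(math.dist(c_geo, user))  # GEO-to-user slant range in km
```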

3) In time slot j, four off-axis angles are defined:
- the off-axis angle of the GEO satellite in the direction of LEO user v;
- the off-axis angle of LEO user v in the direction of the GEO satellite;
- the off-axis angle of the LEO satellite in the direction of GEO user u;
- the off-axis angle of GEO user u in the direction of the LEO satellite.

The antenna gain expression shows that the off-axis angle determines the antenna gain. In that expression, G_max is the maximum antenna gain, J_1(·) and J_3(·) are the first-order and third-order Bessel functions of the first kind, and θ_3dB is the 3 dB angle of the antenna (entering the expression through sin θ_3dB).
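Each off-axis angle is the angle, seen from a transmitter or receiver, between its antenna pointing direction and the direction to the other terminal. Assume the antenna points at a known ECEF point, such as the center of the served cell. The angle then follows from the dot product of the two direction vectors, as in this hypothetical helper:

```python
import math
from typing import Sequence

def off_axis_angle(origin: Sequence[float], boresight_target: Sequence[float],
                   other: Sequence[float]) -> float:
    """Angle (radians) at `origin` between the antenna pointing direction
    (towards `boresight_target`) and the direction towards `other`."""
    u = [b - o for b, o in zip(boresight_target, origin)]
    w = [p - o for p, o in zip(other, origin)]
    dot = sum(a * b for a, b in zip(u, w))
    cos_angle = dot / (math.hypot(*u) * math.hypot(*w))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding errors

# Off-axis angle of the LEO satellite towards GEO user u, with the LEO beam
# pointing at its own served cell (all coordinates hypothetical, ECEF km).
leo = (7000.0, 0.0, 0.0)
served_cell = (6371.0, 0.0, 0.0)
geo_user = (6371.0, 300.0, 0.0)
print(math.degrees(off_axis_angle(leo, served_cell, geo_user)))
```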

For time slot j, four gains are defined:
- the interference gain from the GEO satellite to LEO user v;
- the interference gain from the LEO satellite to GEO user u;
- the channel gain from the GEO satellite to GEO user u;
- the channel gain from the LEO satellite to LEO user v.

In these expressions, L_sg→el, L_sl→eg, L_sg→eg and L_sl→el are, respectively, the path losses from the GEO satellite to the LEO user, from the LEO satellite to the GEO user, from the GEO satellite to the GEO user, and from the LEO satellite to the LEO user. The GEO-side angle is the angle between the pointing direction of the GEO transmitting antenna and the line connecting the transmitter to the receiver of the corresponding GEO user. Similarly, β^j is the angle between the pointing direction of the LEO transmitting antenna and the line connecting the transmitter to the receiver of the corresponding LEO user.
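The path losses L_sg→el, L_sl→eg, L_sg→eg and L_sl→el are not specified further. The sketch below assumes free-space propagation, L = (4πdf/c)^2, where d is the distance between the two ECEF positions, f the carrier frequency and c the speed of light. Positions are in kilometres, and names are hypothetical.

```python
import math
from typing import Sequence

SPEED_OF_LIGHT_M_S = 299_792_458.0

def free_space_path_loss(tx_km: Sequence[float], rx_km: Sequence[float], freq_hz: float) -> float:
    """Assumed free-space path loss (4*pi*d*f/c)^2 as a linear ratio; positions in ECEF km."""
    d_m = math.dist(tx_km, rx_km) * 1e3  # km -> m
    return (4.0 * math.pi * d_m * freq_hz / SPEED_OF_LIGHT_M_S) ** 2

def to_db(ratio: float) -> float:
    """Convert a linear ratio to dB."""
    return 10.0 * math.log10(ratio)

# 1 km at 1 GHz
print(round(to_db(free_space_path_loss((0, 0, 0), (1, 0, 0), 1e9)), 2))  # 92.45
```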

4) The SINRs of LEO user v and GEO user u in each time slot j are defined as follows. These SINRs depend on which beams the LEO satellite activates in that slot. The quantities involved are:
- the transmit powers of the LEO satellite and of the GEO satellite;
- the bandwidth overlap ratio of LEO users with GEO users;
- the bandwidth overlap ratio of GEO users with LEO users;
- an element of the LEO beam-hopping working matrix, indicating whether LEO beam v is activated in time slot j;
- the additive white Gaussian noise powers.

5) Let s_j be the environment state information, and let the action a_j be the cell selected for LEO beam-hopping service in the current time slot. Let Q(s_j, a_j) denote the value of selecting action a_j in state s_j. Q is updated according to the Q value update rule, where R_j is the reward function.

The reward function is defined in terms of two constants and one threshold. The constant r_a is a positive number of large absolute value, r_b is a negative number of large absolute value, and γ_0 is the GEO user SINR threshold. At initialization, the Q values, the reward values and the state information are all set to 0.
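The exact reward expression is not reproduced here. The hypothetical sketch below uses one plausible reading. When every GEO user meets γ_0, the agent receives the large positive value r_a plus the LEO rate obtained in the slot. Otherwise, it receives the large negative value r_b. The default values of r_a and r_b are illustrative assumptions.

```python
from typing import Sequence

def reward(geo_sinr_slot: Sequence[float], gamma_0: float, leo_rate_slot: float,
           r_a: float = 100.0, r_b: float = -100.0) -> float:
    """Hypothetical reward R_j: r_a plus the LEO rate if every GEO user meets gamma_0,
    otherwise the large penalty r_b."""
    if all(s >= gamma_0 for s in geo_sinr_slot):
        return r_a + leo_rate_slot
    return r_b

print(reward([2.0, 3.0], 1.5, 4.0))  # 104.0
print(reward([1.0, 3.0], 1.5, 4.0))  # -100.0
```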

6) Selecting action a according to a greedy policy _j The greedy policy is:

wherein epsilon is the probability of randomly selecting an action, and the LEO satellite serves the user to obtain a corresponding return value R _j And updating the status information s _j+1 And obtaining a Q value according to a Q value updating formula, and continuously repeating the operation of the next time slot until a complete Q value table is obtained by system convergence, thereby obtaining a resource allocation decision scheme under all time slots.
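The sketch below shows ε-greedy selection. It assumes the Q table is a dictionary keyed by (state, action) and that unseen entries count as 0, consistent with the zero initialization described above. The function name is hypothetical.

```python
import random
from typing import Sequence

def epsilon_greedy(q: dict, state, actions: Sequence, epsilon: float,
                   rng: random.Random):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the action with the largest Q value (exploitation)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

rng = random.Random(0)  # seeded for reproducibility
q = {("s0", "cell1"): 1.0, ("s0", "cell2"): 3.0}
print(epsilon_greedy(q, "s0", ["cell1", "cell2"], 0.0, rng))  # cell2 (pure exploitation)
```

A larger ε explores more cells early in training; in practice ε is often decayed over time, although the text does not say whether the patent does so.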

In summary, the specific process of the present invention is as follows:

1) establishing an optimization problem with the objective of maximizing the throughput of the LEO satellite system;

2) calculating the coordinates of the GEO satellite, the LEO satellite and the users in the ECEF coordinate system, and determining the off-axis angles, antenna gains, channel gains and interference gains between the satellites and the users;

3) formulating the SINRs of LEO users and GEO users in each time slot according to which LEO beams are active;

4) initializing the Q values, the reward function and the state information; selecting an LEO user for resource allocation according to the ε-greedy strategy; calculating the reward, computing the SINRs and observing the next state; and updating the Q value according to the Q value update formula;

5) repeating these operations for the next time slot until the Q value table converges;

6) obtaining the resource allocation decision scheme for each time slot from the Q value table and allocating resources to each LEO user, thereby completing the reinforcement-learning-based resource allocation of the beam-hopping spectrum sharing system.
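The whole procedure can be exercised end to end on a toy problem. The scenario below is entirely hypothetical and is not taken from the patent. It has four LEO cells with assumed per-slot rates, and cell 0 overlaps the GEO footprint, so illuminating it violates γ_0 and earns r_b. Each frame has three slots. The state consists of the slot index and the set of already served cells. Actions are chosen with ε-greedy exploration, and values are updated with the standard tabular Q-learning rule.

```python
import random
from collections import defaultdict

# Hypothetical toy scenario, not taken from the patent.
RATES = [5.0, 3.0, 2.0, 1.0]   # assumed LEO rate when a not-yet-served cell is illuminated
GEO_OVERLAP = {0}              # illuminating cell 0 pushes GEO SINR below gamma_0
R_B = -100.0                   # large negative reward r_b
SLOTS = 3                      # time slots per beam-hopping frame
CELLS = list(range(len(RATES)))

def step(state, cell):
    """Return (reward, next_state); state = (slot index, frozenset of served cells)."""
    slot, served = state
    if cell in GEO_OVERLAP:
        r = R_B
    elif cell in served:
        r = 0.0                # demand already met, nothing left to deliver
    else:
        r = RATES[cell]
    return r, (slot + 1, served | {cell})

def train(episodes=2000, alpha=0.1, discount=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q values initialized to 0
    for _ in range(episodes):
        state = (0, frozenset())
        for _slot in range(SLOTS):
            if rng.random() < epsilon:               # epsilon-greedy selection
                action = rng.choice(CELLS)
            else:
                action = max(CELLS, key=lambda a: q[(state, a)])
            r, nxt = step(state, action)
            terminal = nxt[0] == SLOTS               # end of the frame
            best_next = 0.0 if terminal else max(q[(nxt, a)] for a in CELLS)
            q[(state, action)] += alpha * (r + discount * best_next - q[(state, action)])
            state = nxt
    return q

def greedy_plan(q):
    """Read the beam-hopping plan (one cell per slot) off the learned Q table."""
    state, plan = (0, frozenset()), []
    for _slot in range(SLOTS):
        action = max(CELLS, key=lambda a: q[(state, a)])
        plan.append(action)
        _, state = step(state, action)
    return plan

plan = greedy_plan(train())
print(plan)
```

On this toy instance, the learned plan should avoid cell 0 and serve three different cells. The learning rate, discount factor, exploration rate and number of episodes used in the patent are not specified; the values above are placeholders.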

Fig. 2 shows the carrier-to-interference-plus-noise ratio of GEO users under the present invention. The results show that the invention maintains the transmission quality of GEO users during spectrum sharing, so GEO user transmission is not affected.

Fig. 3 shows the carrier-to-interference-plus-noise ratio of GEO users under the multi-beam satellite fixed allocation method. Because that method does not take GEO user transmission quality into account during spectrum sharing, co-channel interference from the LEO satellite degrades the GEO users.

Fig. 4 compares the LEO satellite throughput of the present invention with that of the multi-beam satellite fixed allocation method. The present invention achieves higher overall throughput for the satellite system and improves resource utilization.

Fig. 5 shows the effect of the number of training iterations on throughput fluctuation in the present invention. As the number of training iterations increases, the variance of the throughput fluctuation stabilizes. An appropriate number of training iterations can therefore be chosen to reduce the effect of randomness.

Claims

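1. A resource allocation method of a hopping beam satellite spectrum sharing system based on reinforcement learning, characterized by comprising the following steps:

solving an optimization problem based on reinforcement learning to obtain a Q value table, wherein the optimization problem is constructed with maximizing the throughput of the LEO satellite system as the objective;

and performing resource allocation on each LEO user according to the Q value table, thereby completing the resource allocation of the hopping beam satellite spectrum sharing system based on reinforcement learning.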

2. The resource allocation algorithm of the reinforcement learning-based beam-hopping satellite spectrum sharing system according to claim 1, specifically comprising the following steps:

3. The resource allocation algorithm of the reinforcement learning-based beam-hopping satellite spectrum sharing system according to claim 1, wherein the optimization problem is established with the goal of maximizing the throughput of the LEO satellite system as follows:

$$\max R_{\mathrm{LEO}}$$

wherein γ_th is the GEO user signal quality threshold, K is the number of clusters, and P_tot is the total satellite power.

4. The resource allocation algorithm of the reinforcement-learning-based beam-hopping satellite spectrum sharing system according to claim 1, wherein the antenna gain is:

wherein G_max is the maximum gain of the antenna, J_1(·) and J_3(·) are the first-order and third-order Bessel functions, θ_3dB is the 3 dB angle of the antenna (appearing as sin θ_3dB), and θ is the transmission angle between the satellite transmitting antenna and the interfered user.

5. The resource allocation algorithm of the reinforcement-learning-based beam-hopping satellite spectrum sharing system according to claim 4, wherein the channel gain is:

where L is the path loss.

6. The resource allocation algorithm of the reinforcement learning-based beam-hopping satellite spectrum sharing system according to claim 4, wherein the interference gain is:

7. The resource allocation algorithm of the reinforcement learning-based beam-hopping satellite spectrum sharing system according to claim 1, wherein the signal-to-interference-and-noise ratios of the LEO users and the GEO users are respectively:

wherein the remaining quantities are, respectively:
- the bandwidth overlap ratio of the LEO users with the GEO users;
- the bandwidth overlap ratio of the GEO users with the LEO users;
- the additive white Gaussian noise powers;
- the channel gains;
- the interference gains.

8. The resource allocation algorithm of the reinforcement learning-based beam-hopping satellite spectrum sharing system according to claim 1, wherein the Q value is updated according to the formula:

wherein ,

9. The resource allocation algorithm of the reinforcement-learning-based beam-hopping satellite spectrum sharing system according to claim 1, wherein a resource allocation decision scheme is obtained from the Q value table, and resources are then allocated to each LEO user according to the resource allocation decision scheme.