CN113258988A

CN113258988A - DQN-based multi-service low-orbit satellite resource allocation method

Info

Publication number: CN113258988A
Application number: CN202110523792.0A
Authority: CN
Inventors: 唐伦; 李子煜; 宋艾遥; 孙移星; 朱丹青; 陈前斌
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2021-05-13
Filing date: 2021-05-13
Publication date: 2021-08-13
Anticipated expiration: 2041-05-13
Also published as: CN113258988B

Abstract

The invention relates to a DQN-based multi-service low-orbit satellite resource allocation method in the field of satellite communication, comprising the following steps: S1: establishing a joint power and channel allocation model for multi-service low earth orbit satellites; S2: mapping resource allocation in the multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with its environment to maximize long-term return; S3: solving the problem of S2 through state reconstruction and a DQN algorithm. The invention can improve system throughput while meeting the requirements of multi-service users and keeping the service queues stable.

Description

DQN-based multi-service low-orbit satellite resource allocation method

Technical Field

The invention belongs to the field of satellite communication, and relates to a DQN-based multi-service low-orbit satellite resource allocation method.

Background

The low earth orbit (LEO) satellite communication system, as a supplement to the terrestrial communication system, offers advantages such as lower propagation delay and higher throughput, and is regarded as an important component of 5G communication. Because of social and economic factors, the satellite services required differ from region to region, which leads to uneven traffic distribution among beams. When network demand deviates from the preset capacity, the satellite configuration has difficulty adapting to the change, and the network easily becomes congested. Because satellite bandwidth is limited and calls do not end immediately, the effect of the current resource allocation result on the future environment becomes more pronounced as traffic grows. With the diversification of user terminal service types and the rapid growth of traffic volume, the need to accommodate more users and improve service quality makes the resource allocation problem of the low earth orbit satellite system more complicated.

Much work has studied flexible resource allocation strategies for satellite communication systems in depth and has obtained good results, but existing research and technology have the following shortcomings:

1) GEO satellites are stationary relative to the ground, while low earth orbit satellites move at high speed and cover a given area for only about 5 to 12 minutes, so resource allocation algorithms designed for GEO satellites are difficult to apply directly to the low earth orbit satellite network.

2) Most satellite resource allocation algorithms still use traditional iterative methods that depend heavily on human intervention. In a complex network environment with sudden changes, they cannot converge quickly or respond efficiently.

3) Some studies apply reinforcement learning techniques to satellite system performance optimization, but still optimize only a single resource.

Disclosure of Invention

In view of this, the present invention provides a DQN-based multi-service low-earth-orbit satellite resource allocation method, which aims to improve system throughput while meeting the requirements of multi-service users and keeping the service queues stable.

In order to achieve this purpose, the invention provides the following technical solution:

A DQN-based multi-service low-orbit satellite resource allocation method comprises the following steps:

S1: establishing a joint power and channel allocation model for multi-service low earth orbit satellites;

S2: mapping resource allocation in the multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with its environment to maximize long-term return;

S3: solving the problem of S2 through state reconstruction and a DQN algorithm.

Further, step S1 specifically includes: to guarantee the communication quality of handover users, a joint power and channel allocation model for multi-service low earth orbit satellites is established. The model aims to maximize system throughput, subject to the coverage time of the low earth orbit satellite and the stability of the service queues, and is built as follows:

S11: the satellite network provides S = {1, 2, ..., S} different application services for users U, and the priority weight of each service is set as W = [ω₁, ω₂, ..., ω_S]. The channel allocation state of beam n at time slot t is represented as

where K is the number of calls being served in beam n,

denotes the service type,

denotes the call type,

denotes a new call, and

denotes a handover call. The channel allocation states of all beams constitute the channel allocation matrix of the satellite, denoted V(t) = {υ₁(t), υ₂(t), ..., υ_n(t)};

S12: for each new call, its state is represented as

where i is the current number of new calls,

denotes the service type, and

denotes the call type. At different times, V(t) changes with the arrival or departure of users U(t), and the corresponding resources are allocated or released accordingly;

S13: the end-to-end delay between the user and the satellite satisfies the coverage-time constraint of a single beam of the low-orbit satellite, i.e. the total average end-to-end delay of service s

and

respectively denote the average queuing delay and the downlink propagation delay of service s; T = L/v_sat is the beam coverage duration, v_sat is the speed of the low-orbit satellite, and L is the known diameter of the satellite coverage area;

S14: queue stability: the satellite system maintains a queue Q_s(t) for each service; if Q_s(t) satisfies

the queue is stable, where Q_s(t) is the length of the buffer queue for service s in the satellite at the beginning of time slot t, and E denotes expectation.
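The constraints in S13 and S14 can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so the sketch makes two loudly labelled assumptions: the total delay is taken as the sum of the queuing and propagation delays and compared with T = L/v_sat using "less than or equal", and queue behaviour is summarised by an empirical time-averaged backlog over a finite trace. All function names and numeric values are hypothetical.

```python
def beam_coverage_duration(footprint_diameter_m: float, sat_speed_mps: float) -> float:
    """Beam coverage duration T = L / v_sat, in seconds."""
    if sat_speed_mps <= 0:
        raise ValueError("satellite speed must be positive")
    return footprint_diameter_m / sat_speed_mps


def delay_within_coverage(queuing_delay_s: float, propagation_delay_s: float,
                          coverage_s: float) -> bool:
    """ASSUMED form of the S13 constraint: queuing + propagation delay <= T."""
    return queuing_delay_s + propagation_delay_s <= coverage_s


def mean_backlog(queue_lengths: list[float]) -> float:
    """Empirical time-averaged queue length over a finite trace of Q_s(t).

    This is only a finite-horizon proxy (an assumption of this sketch),
    not the stability criterion stated in the source.
    """
    if not queue_lengths:
        raise ValueError("trace must not be empty")
    return sum(queue_lengths) / len(queue_lengths)


# Hypothetical numbers, chosen only for illustration.
T = beam_coverage_duration(footprint_diameter_m=3_000_000.0, sat_speed_mps=7_500.0)
print(T)                                        # 400.0 (seconds)
print(delay_within_coverage(0.050, 0.004, T))   # True
print(mean_backlog([3.0, 5.0, 4.0]))            # 4.0
```

With these illustrative values the coverage duration is 400 s, which falls inside the 5 to 12 minute range mentioned in the Background.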

Further, step S2 specifically includes: resource allocation in the multi-beam low-orbit satellite communication system is mapped to the problem of an agent learning through interaction with its environment to maximize long-term return. Deep reinforcement learning uses a neural network as a nonlinear function approximator, and the state, action and reward function of the DQN model are formulated as follows:

S21: the state space is defined as s_t = {V(t), P(t), Q_s(t), U(t)}, where V(t) is the channel allocation information of the satellite at time slot t, P(t) is the power allocation information, Q_s(t) is the queue length of the services at time slot t, and U(t) is the information of the users newly requesting service at time slot t;

S22: the action space is defined as a_t = {x_nc(t), p(t)}, where x_nc(t) indicates whether channel c in beam n is allocated to the user at time slot t: x_nc(t) = 1 means that channel c in beam n is allocated to the user at time slot t, and x_nc(t) = 0 means that it is not; p(t) is the power allocated to the user;

S23: the reward function is defined as

The instantaneous system reward is the sum of the instantaneous rewards of all users newly requesting service in the network, which is equivalent to

where ω_s is the weight when the user's service type is s, and κ reflects the priority of the user, i.e. handover users have higher priority than newly accessing users. When a new user makes a request, the reward value is set to a value related to the transmission rate, and the system throughput is expressed as

where R_unc is the transmission rate allocated to the user and R_th is the minimum transmission rate the user needs for normal transmission. When the transmission rate allocated to a user is lower than R_th, the allocation policy is performing poorly and the feedback

(in the simulation, this value

is set to −1) is given; otherwise the feedback
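The reward rule of S23 can be sketched in Python as follows. Only two things are taken from the text: a user whose allocated rate falls below R_th receives the penalty (−1 in the simulation), and the system reward is the sum over all newly requesting users. The positive branch used here, ω_s · κ · R/R_th, is an assumed "rate-related" form chosen for illustration, and the dictionary keys are hypothetical.

```python
def instant_reward(rate: float, min_rate: float, service_weight: float,
                   user_priority: float, penalty: float = -1.0) -> float:
    """Reward for one newly requesting user.

    rate            transmission rate allocated to the user
    min_rate        minimum rate R_th needed for normal transmission
    service_weight  omega_s, weight of the user's service type
    user_priority   kappa, larger for handover users than for new users
    """
    if min_rate <= 0:
        raise ValueError("minimum rate must be positive")
    if rate < min_rate:
        return penalty  # poor allocation: -1 in the source's simulation
    # ASSUMED rate-related form; the source formula is not reproduced here.
    return service_weight * user_priority * rate / min_rate


def system_reward(users: list[dict]) -> float:
    """Instantaneous system reward: sum over all newly requesting users."""
    return sum(
        instant_reward(u["rate"], u["min_rate"], u["weight"], u["priority"])
        for u in users
    )


users = [
    {"rate": 4.0, "min_rate": 2.0, "weight": 0.5, "priority": 2.0},  # served well
    {"rate": 1.0, "min_rate": 2.0, "weight": 1.0, "priority": 1.0},  # below R_th
]
print(system_reward(users))  # 1.0  (2.0 for the first user, -1.0 for the second)
```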

Further, step S3 specifically includes:

S31: state reconstruction process:

S311: the beams associated with a new user are reduced to the ring of beams immediately surrounding the source beam; the compressed beam set is

where

denotes the departure angle between the source beam of the new service request u_t and its surrounding beam n,

h is the satellite altitude, and θ_3dB is the 3 dB beamwidth;

S312: the compressed power allocation information and satellite channel allocation information are expressed as

and

S313: the compressed satellite channel allocation information V^*(t) and the user information U(t) are further processed by one-hot encoding (categorical variables are represented as binary vectors, which converts the state variables into a form that machine learning algorithms can readily use) into

The reconstructed state space is φ(s_t) = {U^*(t), P^*(t), Q_s(t)};
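The one-hot step of S313 and the assembly of φ(s_t) can be sketched as below. This is a minimal illustration under stated assumptions, not the encoding defined in the source: the call-type codes (0 for a new call, 1 for a handover call), the flat-list layout and every function name are hypothetical, and only the user's service type and call type are encoded here, whereas the source also folds the compressed channel information into U^*(t).

```python
def one_hot(index: int, size: int) -> list[float]:
    """Binary vector of length `size` with a single 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


NEW_CALL, HANDOVER_CALL = 0, 1  # hypothetical call-type codes


def encode_user(service_type: int, call_type: int, num_services: int) -> list[float]:
    """One-hot encode a request; service types are numbered 1..S as in S11."""
    return one_hot(service_type - 1, num_services) + one_hot(call_type, 2)


def reconstruct_state(user_vec: list[float], power_vec: list[float],
                      queue_vec: list[float]) -> list[float]:
    """Flatten phi(s_t) = {U*(t), P*(t), Q_s(t)} into one input vector."""
    return list(user_vec) + list(power_vec) + list(queue_vec)


phi = reconstruct_state(
    encode_user(service_type=2, call_type=HANDOVER_CALL, num_services=3),
    power_vec=[0.2, 0.1],        # illustrative compressed power values
    queue_vec=[4.0, 0.0, 7.0],   # illustrative queue lengths per service
)
print(phi)  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.2, 0.1, 4.0, 0.0, 7.0]
```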

S32: DQN algorithm solving process:

S321: an experience replay pool and a target Q network are used to update the Q network, which makes network training more stable;

S322: to optimize the approximation of the action-value function, the loss function should be driven as close to 0 as possible; the Q network is updated by backpropagation with gradient descent, and an adaptive moment estimation (Adam) optimizer is used to accelerate convergence.
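The two stabilising mechanisms named in S321 can be sketched with the Python standard library. This is a hedged illustration: the class and function names are hypothetical, the network weights are stood in for by a flat list, and the capacity and synchronisation period are arbitrary.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool of (phi, a, r, phi_next) tuples."""

    def __init__(self, capacity: int, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def store(self, phi, action, reward, phi_next) -> None:
        self._buffer.append((phi, action, reward, phi_next))

    def sample(self, batch_size: int) -> list:
        """Uniform random mini-batch, which breaks temporal correlation."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


def maybe_sync_target(step: int, period: int,
                      q_weights: list, target_weights: list) -> bool:
    """Copy the Q-network weights into the target network every `period` steps."""
    if step % period == 0:
        target_weights[:] = q_weights  # in-place copy of the values
        return True
    return False


pool = ReplayPool(capacity=3, seed=0)
for t in range(5):
    pool.store([float(t)], t % 2, 1.0, [float(t + 1)])
print(len(pool))  # 3 (the two oldest transitions were evicted)

batch = pool.sample(2)
theta, theta_target = [0.5, -0.2], [0.0, 0.0]
maybe_sync_target(step=10, period=5, q_weights=theta, target_weights=theta_target)
print(theta_target)  # [0.5, -0.2]
```

Keeping the target weights fixed between synchronisations means the regression target does not move at every gradient step, which is the usual reason given for the improved training stability.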

The beneficial effect of the invention is that it can improve system throughput while meeting the requirements of multi-service users and keeping the service queues stable.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a view of a multi-beam low earth orbit satellite communication system scenario;

FIG. 2 is a schematic diagram of a data traffic queuing model;

FIG. 3 is a DQN-based multi-service low-earth-orbit satellite resource allocation algorithm framework diagram;

FIG. 4 is a schematic diagram of a state reconstruction process.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

The drawings are for illustrative purposes only; they are schematic rather than physical representations and are not to be construed as limiting the invention. To better illustrate the embodiments of the invention, some parts in the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted.

The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms indicating an orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientation or positional relationship shown in the drawings. They are used only for convenience and simplification of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and are not to be construed as limiting the invention; their specific meaning can be understood by those skilled in the art according to the specific situation.

Please refer to FIGS. 1 to 4:

In FIG. 1, a handover user crosses from the source beam into an adjacent beam. T denotes the maximum dwell time of a terrestrial user in the coverage area of the satellite, so the time available for communication between the user and the satellite is T = L/v_sat, where v_sat is the speed of the low earth orbit satellite and L is the known diameter of the satellite footprint.

The satellite network provides S = {1, 2, ..., S} different application services for users U, and the priority weight of each service is set as W = [ω₁, ω₂, ..., ω_S]. The channel allocation state of beam n can be expressed as

where K is the number of calls being served in beam n,

denotes the service type, and

denotes the call type, in which

denotes a new call and

denotes a handover call. The channel allocation states of all beams constitute the channel allocation matrix of the satellite, denoted V(t) = {υ₁(t), υ₂(t), ..., υ_n(t)}. For each new call, its state can be represented as

where

denotes the service type and

denotes the call type. At different times, V(t) changes with the arrival or departure of users U(t), and the corresponding resources are allocated or released accordingly.

To ensure the service quality and efficient transmission of each service, the end-to-end delay between the user and the satellite should satisfy the coverage-time constraint of a single beam of the low-orbit satellite, i.e. the total average end-to-end delay of service s

where

and

respectively denote the average queuing delay and the downlink propagation delay of service s, and T_u is the beam coverage time. FIG. 2 is a schematic diagram of the data traffic queuing model. Queue stability means that the satellite system maintains a queue Q_s(t) for each service; if Q_s(t) satisfies

the queue is stable.

FIG. 3 is a framework diagram of the DQN-based multi-service low-earth-orbit satellite resource allocation algorithm. The state space is defined as s_t = {V(t), P(t), Q_s(t), U(t)}, where V(t) is the channel allocation information of the satellite at time slot t, P(t) is the power allocation information, Q_s(t) is the queue length of the services at time slot t, and U(t) is the information of the users newly requesting service at time slot t. The action space is defined as a_t = {x_nc(t), p(t)}, where x_nc(t) indicates whether a channel is allocated to the user and p(t) is the power allocated to the user. The reward function is defined as

where ω_s is the weight when the user's service type is s, and κ reflects the priority of the user, i.e. handover users have higher priority than newly accessing users. When a new user makes a request, the reward value is set to a value related to the transmission rate, and the system throughput is expressed as

where R_unc is the transmission rate allocated to the user and R_th is the minimum transmission rate the user needs for normal transmission.
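The state s_t = {V(t), P(t), Q_s(t), U(t)} and the action a_t = {x_nc(t), p(t)} described for FIG. 3 can be pictured as plain data containers. The sketch below is illustrative only: the field layouts (a beam-by-channel occupancy matrix, one power value per beam, a two-element user tuple) and the `apply_action` helper are assumptions made for the example and are not specified in the source.

```python
from dataclasses import dataclass


@dataclass
class State:
    channel_alloc: list[list[int]]  # V(t): [n][c] is 1 if channel c of beam n is busy
    power_alloc: list[float]        # P(t): power currently assigned in each beam (assumed layout)
    queue_len: list[float]          # Q_s(t): backlog of each service
    new_user: tuple[int, int]       # U(t): (service type, call type) of the new request


@dataclass
class Action:
    beam: int      # n
    channel: int   # c
    assign: int    # x_nc(t): 1 allocates channel c of beam n, 0 does not
    power: float   # p(t): power allocated to the user


def apply_action(state: State, action: Action) -> bool:
    """Apply a_t to V(t) and P(t); return False if the request is not served."""
    if action.assign != 1:
        return False
    if state.channel_alloc[action.beam][action.channel] == 1:
        return False  # channel already occupied by an ongoing call
    state.channel_alloc[action.beam][action.channel] = 1
    state.power_alloc[action.beam] += action.power
    return True


state = State(channel_alloc=[[0, 1], [0, 0]], power_alloc=[0.5, 0.0],
              queue_len=[2.0, 0.0], new_user=(1, 0))
served = apply_action(state, Action(beam=0, channel=0, assign=1, power=0.25))
print(served, state.channel_alloc, state.power_alloc)
# True [[1, 1], [0, 0]] [0.75, 0.0]
```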

FIG. 4 is a schematic diagram of the state reconstruction process. To avoid the additional complexity introduced by user location, the beams associated with a new user are reduced to the ring of beams immediately surrounding the source beam; the compressed beam set is

where

h is the satellite altitude and θ_3dB is the 3 dB beamwidth. The compressed power allocation information and satellite channel allocation information are expressed as

and

The compressed satellite channel allocation information V^*(t) and the user information U(t) are further processed by one-hot encoding into

The reconstructed state space is φ(s_t) = {U^*(t), P^*(t), Q_s(t)}.

An experience replay pool and a target Q network are used to update the Q network, which makes network training more stable. In addition, to optimize the approximation of the action-value function, the loss function must be driven as close to 0 as possible; the Q network is updated by backpropagation with gradient descent, and an adaptive moment estimation optimizer is used to accelerate convergence. The specific steps in FIG. 3 are as follows.

1) Initialize the parameters related to the low-orbit satellite scenario, the Q network, and the target Q network parameters and weights θ^-, and initialize the experience replay pool;

2) Acquire the channel allocation information V, power allocation information P, service queue information Q and the information u of the users newly requesting service in the low earth orbit satellite system;

3) Randomly initialize a state s₀;

4) Process the state according to the state reconstruction process shown in FIG. 4: φ₀ = φ(s₀);

5) At each time t, draw a random probability p for the ε-greedy policy;

6) When p ≤ ε, randomly select an action a_t ∈ A; otherwise, select the action a_t = argmax_a Q(φ(s_t), a; θ);

7) Execute action a_t, which changes the environment state, obtain the reward r_t, and observe the next state s_{t+1};

8) Process s_{t+1} as φ_{t+1} = φ(s_{t+1}), and store <φ(s_t), a_t, r_t, φ(s_{t+1})> in the experience replay pool;

9) Randomly sample a batch of transitions <φ(s_t), a_t, r_t, φ(s_{t+1})> from the experience replay pool;

10) Compute the loss function Loss(θ) = E[(y_t − Q(φ(s_t), a_t; θ))²];

11) Compute the bias-corrected first-moment and second-moment estimates using the Adam algorithm;

12) Update the weight parameters θ of the network through the backpropagation algorithm of the neural network;

13) Every fixed number of steps G, update the target Q network parameters θ^- with the Q network parameters θ, and output the weight parameters θ of the DQN network and the policy for allocating the corresponding resources to each newly requesting user.
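Steps 5), 6), 10), 11) and 12) above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions rather than the patented implementation: the target y_t is not defined in the text, so the usual DQN form y_t = r_t + γ · max over a' of Q(φ(s_{t+1}), a'; θ^-) is assumed, with the discount factor γ and all Adam hyperparameters being illustrative values; Q-values are passed in as plain lists instead of coming from a neural network.

```python
import math
import random


def epsilon_greedy(q_values: list[float], epsilon: float, rng=random) -> int:
    """Steps 5-6: draw p; explore if p <= epsilon, otherwise act greedily."""
    if rng.random() <= epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward: float, next_q_target: list[float], gamma: float) -> float:
    """ASSUMED standard DQN target: r_t + gamma * max_a' Q(phi(s_{t+1}), a'; theta^-)."""
    return reward + gamma * max(next_q_target)


def squared_td_loss(targets: list[float], predictions: list[float]) -> float:
    """Step 10: batch mean of (y_t - Q(phi(s_t), a_t; theta))^2."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


def adam_step(theta, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Steps 11-12: one Adam update with bias-corrected moments (step counts from 1)."""
    new_theta, new_m, new_v = [], [], []
    for th, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta1 ** step)          # bias correction
        v_hat = v_i / (1 - beta2 ** step)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


rng = random.Random(0)
q_now = [0.1, 0.9, 0.3]                              # illustrative Q(phi(s_t), a; theta)
a_t = epsilon_greedy(q_now, epsilon=0.0, rng=rng)    # epsilon = 0: greedy choice
y_t = td_target(reward=1.0, next_q_target=[0.5, 2.0], gamma=0.9)
loss = squared_td_loss([y_t], [q_now[a_t]])
theta, m, v = adam_step([1.0], [2.0], [0.0], [0.0], step=1)
print(a_t)  # 1
```

On the first Adam step the bias-corrected moments equal the gradient and its square, so each weight moves by approximately the learning rate in the direction opposite to its gradient.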

Low-orbit satellite communication systems face a time-varying downlink transmission scenario and large differences in traffic volume among beams, which lower the total system throughput. To address this problem, the invention provides a DQN-based multi-service low-orbit satellite resource allocation method. With the coverage time of each low-orbit satellite and the stability of the service queues fully taken into account, channel bandwidth and power are allocated jointly and dynamically according to the service type priority and the priority of different users. The method can effectively improve the total system throughput.

Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims

1. A DQN-based multi-service low-orbit satellite resource allocation method is characterized in that: the method comprises the following steps:

S3: solving the problem in step S2 through state reconstruction and a DQN algorithm.

2. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the step S1 specifically includes:

S11: the satellite network provides users U with S = {1, 2, ..., S} different application services, with the priority weight of each service set to W = [ω₁, ω₂, ..., ω_S]. The channel allocation state of beam n at time slot t is represented as

where K is the number of calls being served in beam n,

denotes the service type,

denotes the call type, and

denotes a new call,

S12: for each new call, its state is represented as

where i is the current number of new calls,

denotes the service type,

and

3. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the step S2 specifically includes:

S23: the reward function is defined as

where R_unc is the transmission rate allocated to the user and R_th is the minimum transmission rate the user needs for normal transmission; when the transmission rate allocated to the user is lower than R_th, the allocation policy is performing poorly and the feedback

(in the simulation, this value

is set to −1) is given; otherwise the feedback

4. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the state reconstruction process described in step S3 includes:

where

h is the satellite altitude and θ_3dB is the 3 dB beamwidth;

and

S313: the compressed satellite channel allocation information V^*(t) and the user information U(t) are further processed by one-hot encoding into

The reconstructed state space is φ(s_t) = {U^*(t), P^*(t), Q_s(t)}.

5. The DQN-based multi-service low-earth-orbit satellite resource allocation method according to claim 4, wherein: the DQN algorithm solving process described in step S3 includes:

S321: using the experience replay pool and the target Q network to update the Q network;

S322: updating the Q network by backpropagation with gradient descent, and using an adaptive moment estimation optimizer to accelerate convergence.

6. The DQN-based multi-service low-earth-orbit satellite resource allocation method according to claim 5, wherein: the DQN algorithm solving process specifically comprises the following steps:

3) Randomly initialize a state s₀;

4) Process the state by state reconstruction: φ₀ = φ(s₀);

10) Compute the loss function Loss(θ) = E[(y_t − Q(φ(s_t), a_t; θ))²];