CN113258988A - DQN-based multi-service low-orbit satellite resource allocation method - Google Patents

Info

Publication number
CN113258988A
CN113258988A CN202110523792.0A CN202110523792A CN113258988A CN 113258988 A CN113258988 A CN 113258988A CN 202110523792 A CN202110523792 A CN 202110523792A CN 113258988 A CN113258988 A CN 113258988A
Authority
CN
China
Prior art keywords
service
user
satellite
dqn
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110523792.0A
Other languages
Chinese (zh)
Other versions
CN113258988B (en)
Inventor
唐伦
李子煜
宋艾遥
孙移星
朱丹青
陈前斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202110523792.0A
Publication of CN113258988A
Application granted
Publication of CN113258988B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851 Systems using a satellite or space-based relay
    • H04B7/18513 Transmission in a satellite or space-based system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473 Wireless resource allocation based on the type of the allocated resource the resource being transmission power
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention relates to a DQN-based multi-service low-orbit satellite resource allocation method in the field of satellite communication, comprising the following steps: S1: establishing a joint power and channel allocation model based on low-earth-orbit satellite multi-service; S2: mapping resource allocation in a multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with the environment to maximize its long-term return; S3: solving the problem of S2 through state reconstruction and a DQN algorithm. The invention can improve system throughput while meeting the requirements of multi-service users and keeping the service queues stable.

Description

DQN-based multi-service low-orbit satellite resource allocation method
Technical Field
The invention belongs to the field of satellite communication, and relates to a DQN-based multi-service low-orbit satellite resource allocation method.
Background
As a supplement to terrestrial communication systems, low-earth-orbit (LEO) satellite communication systems offer advantages such as lower propagation delay and higher throughput, and are regarded as an important component of 5G communication. Because social and economic activity is unevenly distributed, the satellite services required differ from region to region, which results in uneven traffic distribution among beams. When network demand deviates from the preset capacity, the satellite configuration adapts poorly and the network easily becomes congested. Because satellite bandwidth is limited and ongoing calls do not end immediately, the current resource allocation result has a growing influence on the future environment as traffic increases. With the diversification of user terminal service types and the rapid growth of traffic volume, the need to accommodate more users and improve service quality makes resource allocation in LEO satellite systems more complicated.
Much work has studied flexible resource allocation strategies for satellite communication systems and obtained good results, but existing research and technology have the following shortcomings:
1) GEO satellites are stationary relative to the ground, whereas low-earth-orbit satellites move at high speed and cover an area for only about 5 to 12 minutes, so resource allocation algorithms designed for GEO satellites are difficult to apply directly to low-earth-orbit satellite networks.
2) Most satellite resource allocation algorithms still use traditional iterative methods that rely heavily on manual intervention. In complex network environments with sudden changes, they cannot converge quickly or respond efficiently.
3) Some studies apply reinforcement learning to satellite system performance optimization, but they optimize only a single resource.
Disclosure of Invention
In view of this, the present invention provides a DQN-based multi-service low-orbit satellite resource allocation method that aims to meet the requirements of multi-service users, keep the service queues stable, and improve system throughput.
To achieve this purpose, the invention provides the following technical solution:
A DQN-based multi-service low-orbit satellite resource allocation method comprises the following steps:
S1: establishing a joint power and channel allocation model based on low-earth-orbit satellite multi-service;
S2: mapping resource allocation in a multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with the environment to maximize its long-term return;
S3: solving the problem of S2 through state reconstruction and a DQN algorithm.
Further, step S1 specifically includes: to guarantee the communication quality of handover users, a joint power and channel allocation model based on low-earth-orbit satellite multi-service is established with the objective of maximizing system throughput, subject to constraints on the coverage time of the low-orbit satellite and the stability of the service queues. The model is established as follows:
S11: The satellite network provides $S = \{1, 2, \ldots, S\}$ different application services to users $U$, and the priority weight of each service is set as $W = [\omega_1, \omega_2, \ldots, \omega_S]$. The channel allocation state of beam $n$ at time slot $t$ is represented as
Figure BDA0003065040590000021
where $K$ is the number of calls being served in beam $n$,
Figure BDA0003065040590000022
indicates the service type,
Figure BDA0003065040590000023
indicates the call type,
Figure BDA0003065040590000024
denotes a new call, and
Figure BDA0003065040590000025
denotes a handover call. The channel allocation states of all beams constitute the channel allocation matrix of the satellite, denoted $V(t) = \{\upsilon_1(t), \upsilon_2(t), \ldots, \upsilon_n(t)\}$;
S12: For each new call, its state is represented as
Figure BDA0003065040590000026
where $i$ is the current number of new calls,
Figure BDA0003065040590000027
indicates the service type, and
Figure BDA0003065040590000028
indicates the call type. At different times, $V(t)$ changes with the arrival or departure of users $u(t)$, and the corresponding resources are allocated or released accordingly;
S13: The end-to-end delay between the user and the satellite satisfies the coverage time constraint of a single beam of the low-orbit satellite, i.e. the total average end-to-end delay of service $s$ satisfies
Figure BDA0003065040590000029
where
Figure BDA00030650405900000210
and
Figure BDA00030650405900000211
denote the average queuing delay and the downlink propagation delay of service $s$, respectively; $T = L / v_{sat}$ is the beam coverage duration, $v_{sat}$ is the operating speed of the low-orbit satellite, and $L$ is the known diameter of the satellite coverage area;
S14: Queue stability: the satellite system constructs a corresponding queue $Q_s(t)$ for each service; if it satisfies
Figure BDA00030650405900000212
the queue is stable, where $Q_s(t)$ is the length of the buffer queue of service $s$ in the satellite at the beginning of time slot $t$, and $E[\cdot]$ denotes expectation.
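The coverage time constraint of S13 can be illustrated with a minimal Python sketch. It computes $T = L / v_{sat}$ and checks a delay budget against it. The function names, the numeric values (a 3000 km coverage diameter, a 7.5 km/s satellite speed, the example delays) and the inequality form used here (queuing delay plus propagation delay not exceeding $T$) are assumptions made for illustration and are not taken from the patent, whose exact constraint is given in the figure above.

```python
def beam_coverage_duration(diameter_km: float, sat_speed_km_s: float) -> float:
    """Return the beam coverage duration T = L / v_sat in seconds."""
    if sat_speed_km_s <= 0:
        raise ValueError("satellite speed must be positive")
    return diameter_km / sat_speed_km_s


def meets_coverage_constraint(queue_delay_s: float,
                              prop_delay_s: float,
                              coverage_s: float) -> bool:
    """Assumed form of the S13 constraint: total delay must not exceed T."""
    return queue_delay_s + prop_delay_s <= coverage_s


# Hypothetical values: 3000 km coverage diameter, 7.5 km/s satellite speed.
T = beam_coverage_duration(diameter_km=3000.0, sat_speed_km_s=7.5)
# Hypothetical per-service delays: 50 ms queuing, 4 ms downlink propagation.
ok = meets_coverage_constraint(queue_delay_s=0.050, prop_delay_s=0.004, coverage_s=T)
```

With these assumed values the coverage duration is 400 s, roughly 6.7 minutes, which lies within the 5 to 12 minute range mentioned in the Background.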
Further, step S2 specifically includes: mapping resource allocation in the multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with the environment to maximize its long-term return, using a neural network as the nonlinear function approximator for deep reinforcement learning, and formulating the state, action and reward function of the DQN model:
S21: The state space is defined as $s_t = \{V(t), P(t), Q_s(t), u(t)\}$, where $V(t)$ is the channel allocation information of the satellite at time slot $t$, $P(t)$ is the power allocation information, $Q_s(t)$ is the queue length of each service at time slot $t$, and $u(t)$ is the user information of the newly requested services at time slot $t$;
S22: The action space is defined as $a_t = \{x_{nc}(t), p(t)\}$, where $x_{nc}(t)$ indicates whether channel $c$ in beam $n$ is allocated to the user at time slot $t$: $x_{nc}(t) = 1$ means that channel $c$ in beam $n$ is allocated to the user, and $x_{nc}(t) = 0$ means that it is not; $p(t)$ is the power allocated to the user;
S23: The reward function is defined as
Figure BDA0003065040590000031
The instantaneous system reward is the sum of the instantaneous rewards of all users newly requesting service in the network, which is equivalent to
Figure BDA0003065040590000032
where $\omega_s$ is the weight when the user's service type is $s$, and $\kappa$ reflects the priority of the user, i.e. a handover user has higher priority than a newly accessing user. When a new user makes a request, the reward value is set to a value related to the transmission rate, and the system throughput is expressed as
Figure BDA0003065040590000033
where $R_{unc}$ is the transmission rate allocated to the user and $R_{th}$ is the minimum transmission rate the user requires for normal transmission. When the transmission rate allocated to a user is lower than $R_{th}$, the allocation strategy performs poorly and the feedback
Figure BDA0003065040590000034
is given (in the simulation,
Figure BDA0003065040590000035
is set to -1); otherwise the feedback
Figure BDA0003065040590000036
is given.
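The reward logic of S23 can be sketched in Python as follows. The patent gives the exact reward expressions only in the figures, so the multiplicative form $\omega_s \cdot \kappa \cdot R / R_{th}$ used here for the positive feedback, the two priority values for $\kappa$, and all function and parameter names are assumptions chosen for illustration. Only the penalty of -1 for a rate below $R_{th}$ and the summation over newly requesting users follow the text.

```python
def instant_reward(rate: float, rate_min: float, service_weight: float,
                   is_handover: bool, handover_priority: float = 2.0,
                   new_call_priority: float = 1.0, penalty: float = -1.0) -> float:
    """Per-user feedback: a penalty below R_th, otherwise a weighted rate term.

    The positive branch (weight * kappa * rate / rate_min) is an assumed form.
    """
    if rate < rate_min:
        return penalty  # the text sets this feedback to -1 in simulation
    # kappa: handover users get a higher priority than newly accessing users.
    kappa = handover_priority if is_handover else new_call_priority
    return service_weight * kappa * rate / rate_min


def system_reward(users: list) -> float:
    """Instantaneous system reward: sum of the per-user instantaneous rewards."""
    return sum(instant_reward(**user) for user in users)


# Hypothetical users: one handover call served well, one new call under-served.
users = [
    dict(rate=2.0, rate_min=1.0, service_weight=0.5, is_handover=True),
    dict(rate=0.5, rate_min=1.0, service_weight=1.0, is_handover=False),
]
total = system_reward(users)
```

In this example the handover user contributes 0.5 × 2.0 × 2.0 = 2.0 and the under-served new user contributes -1, so the system reward is 1.0.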
Further, step S3 specifically includes:
s31: and (3) state reconstruction process:
s311: simplifying the beam associated with the new user to a beam of one turn around the source beam, the compressed beam being
Figure BDA0003065040590000037
Wherein the content of the first and second substances,
Figure BDA0003065040590000038
indicating a new request service utThe angle of departure between the source beam of (a) and its surrounding beam n,
Figure BDA0003065040590000039
h is the satellite altitude, θ3dBIs 3dB beamwidth;
s312: the compressed power distribution information and satellite channel distribution information are expressed as
Figure BDA00030650405900000310
And
Figure BDA00030650405900000311
s313: further compressing the satellite channel distribution information V*The information in (t) and user u (t) is processed into the information by one-hot one-hot coding (the information is represented by classification variables as binary vectors, and the state information variables are converted into a form which is easy to use by a machine learning algorithm)
Figure BDA00030650405900000312
The reconstructed state space is phi(s)t)={U*(t),P*(t),Qs(t)};
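The one-hot encoding mentioned in S313 can be illustrated with a short Python sketch. The helper names and the choice of encoding each call as a service-type vector concatenated with a call-type vector (new call or handover call) are assumptions made for illustration; the patent's actual encoded form is given in the figure above.

```python
import numpy as np


def one_hot(index: int, size: int) -> np.ndarray:
    """Represent a categorical value as a binary vector with a single 1."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def encode_call(service_type: int, call_type: int,
                num_services: int, num_call_types: int = 2) -> np.ndarray:
    """Hypothetical per-call encoding: one-hot service type + one-hot call type."""
    return np.concatenate([one_hot(service_type, num_services),
                           one_hot(call_type, num_call_types)])


# Service 1 out of 3 services, call type 0 (assumed to denote a new call).
example = encode_call(service_type=1, call_type=0, num_services=3)
```

Here `example` has three entries for the service type followed by two for the call type, with exactly one entry set in each group.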
S32: DQN algorithm solving process:
S321: An experience replay pool and a target Q network are used in updating the Q network, which makes network training more stable;
S322: To optimize the approximation of the action-value function, the loss function should approach 0 as closely as possible; the Q network is updated by backpropagation with gradient descent, and an adaptive moment estimation (Adam) optimizer is used to accelerate convergence.
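The two stabilizing components named in S321 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the class and function names, the pool capacity, and the dictionary representation of network parameters are hypothetical and do not come from the patent.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool; the oldest samples are dropped first."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def maybe_sync_target(q_params: dict, target_params: dict,
                      step: int, sync_every: int) -> dict:
    """Copy the Q-network parameters into the target network every sync_every steps."""
    if step % sync_every == 0:
        return dict(q_params)
    return target_params


pool = ReplayPool(capacity=3)
for t in range(5):
    pool.store(state=t, action=0, reward=1.0, next_state=t + 1)
batch = pool.sample(2)
```

Keeping the target parameters fixed between synchronizations means the regression target does not move with every gradient step, which is the usual reason a target network stabilizes training.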
The invention has the beneficial effects that: the invention can improve the system throughput under the conditions of meeting the requirements of multi-service users and maintaining the stability of the service queue.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a scenario diagram of a multi-beam low-earth-orbit satellite communication system;
FIG. 2 is a schematic diagram of a data traffic queuing model;
FIG. 3 is a framework diagram of the DQN-based multi-service low-earth-orbit satellite resource allocation algorithm;
FIG. 4 is a schematic diagram of the state reconstruction process.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustration only and are not intended to limit the invention. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, terms indicating an orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientation or positional relationship shown in the drawings. They are used only for convenience and simplification of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and are not to be construed as limiting the present invention; their specific meaning can be understood by those skilled in the art according to the specific situation.
Please refer to FIGS. 1 to 4.
In FIG. 1, a handover user crosses from the source beam into an adjacent beam. The time $T$ denotes the maximum time a terrestrial user stays within the coverage area of the satellite, so the time available for communication between the user and the satellite is $T = L / v_{sat}$, where $v_{sat}$ is the speed of the low-earth-orbit satellite and $L$ is the known diameter of the satellite coverage area.
The satellite network provides $S = \{1, 2, \ldots, S\}$ different application services to users $U$, and the priority weight of each service is set as $W = [\omega_1, \omega_2, \ldots, \omega_S]$. The channel allocation state of beam $n$ can be expressed as
Figure BDA0003065040590000051
where $K$ is the number of calls being served in beam $n$,
Figure BDA0003065040590000052
indicates the service type, and
Figure BDA0003065040590000053
indicates the call type, where
Figure BDA0003065040590000054
denotes a new call and
Figure BDA0003065040590000055
denotes a handover call. The channel allocation states of all beams constitute the channel allocation matrix of the satellite, denoted $V(t) = \{\upsilon_1(t), \upsilon_2(t), \ldots, \upsilon_n(t)\}$. For each new call, its state can be represented as
Figure BDA0003065040590000056
where
Figure BDA0003065040590000057
indicates the service type and
Figure BDA0003065040590000058
indicates the call type. At different times, $V(t)$ changes with the arrival or departure of users $u(t)$, and the corresponding resources are allocated or released accordingly.
To ensure the service quality and efficient transmission of each service, the end-to-end delay between the user and the satellite should satisfy the coverage time constraint of a single beam of the low-orbit satellite, i.e. the total average end-to-end delay of service $s$ satisfies
Figure BDA0003065040590000059
where
Figure BDA00030650405900000510
and
Figure BDA00030650405900000511
denote the average queuing delay and the downlink propagation delay of service $s$, respectively, and $T_u$ is the beam coverage time. FIG. 2 is a schematic diagram of the data traffic queuing model. For queue stability, the satellite system constructs a corresponding queue $Q_s(t)$ for each service; if it satisfies
Figure BDA00030650405900000512
the queue is stable.
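To make the notion of a stable service queue concrete, the following Python sketch simulates one service queue and reports its time-averaged backlog. The update rule $Q(t+1) = \max(Q(t) - \text{served}, 0) + \text{arrivals}$ is a common queuing model assumed here, and a bounded time-averaged backlog is used only as a finite-horizon proxy for stability; the patent's own stability condition is given in the figure above and is not reproduced. All names and traffic values are hypothetical.

```python
def step_queue(backlog: float, arrivals: float, served: float) -> float:
    """Assumed queue dynamics: serve what is possible, then add new arrivals."""
    return max(backlog - served, 0.0) + arrivals


def time_averaged_backlog(arrivals: list, service: list) -> float:
    """Average of Q(t) at the beginning of each slot, starting from an empty queue."""
    backlog, total = 0.0, 0.0
    for a, s in zip(arrivals, service):
        total += backlog  # Q(t) is measured at the beginning of slot t
        backlog = step_queue(backlog, a, s)
    return total / len(arrivals)


# Service rate above arrival rate: the backlog stays bounded.
stable_avg = time_averaged_backlog(arrivals=[1, 1, 1, 1], service=[2, 2, 2, 2])
# Arrival rate above service rate: the backlog keeps growing with the horizon.
growing_avg = time_averaged_backlog(arrivals=[2, 2, 2], service=[1, 1, 1])
```

In the first case the backlog settles at 1 and the average over four slots is 0.75; in the second case the backlog grows by 1 per slot, which is the behaviour a stability constraint is meant to rule out.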
FIG. 3 is a framework diagram of the DQN-based multi-service low-earth-orbit satellite resource allocation algorithm. The state space is defined as $s_t = \{V(t), P(t), Q_s(t), u(t)\}$, where $V(t)$ is the channel allocation information of the satellite at time slot $t$, $P(t)$ is the power allocation information, $Q_s(t)$ is the queue length of each service at time slot $t$, and $u(t)$ is the user information of the newly requested services at time slot $t$; the action space is defined as $a_t = \{x_{nc}(t), p(t)\}$, where $x_{nc}(t)$ indicates whether a channel is allocated to the user and $p(t)$ is the power allocated to the user; the reward function is defined as
Figure BDA00030650405900000513
The instantaneous system reward is the sum of the instantaneous rewards of all users newly requesting service in the network, which is equivalent to
Figure BDA00030650405900000514
where $\omega_s$ is the weight when the user's service type is $s$, and $\kappa$ reflects the priority of the user, i.e. a handover user has higher priority than a newly accessing user. When a new user makes a request, the reward value is set to a value related to the transmission rate, and the system throughput is expressed as
Figure BDA00030650405900000515
where $R_{unc}$ is the transmission rate allocated to the user and $R_{th}$ is the minimum transmission rate the user requires for normal transmission.
FIG. 4 is a schematic diagram of the state reconstruction process. To avoid the additional complexity introduced by the user's location, the beams associated with a new user are reduced to the ring of beams surrounding the source beam; the compressed beam set is
Figure BDA00030650405900000516
where
Figure BDA00030650405900000517
denotes the departure angle between the source beam of the newly requested service $u_t$ and its surrounding beam $n$,
Figure BDA0003065040590000061
$h$ is the satellite altitude, and $\theta_{3dB}$ is the 3 dB beamwidth. The compressed power allocation information and satellite channel allocation information are expressed as
Figure BDA0003065040590000062
and
Figure BDA0003065040590000063
The information in the compressed satellite channel allocation information $V^*(t)$ and the user information $u(t)$ is further processed by one-hot encoding into
Figure BDA0003065040590000064
The reconstructed state space is $\phi(s_t) = \{U^*(t), P^*(t), Q_s(t)\}$.
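The state reconstruction described above can be approximated in Python as a two-stage process: keep only the entries of the source beam and its surrounding ring, then flatten the pieces into a single network input. This is a sketch under stated assumptions: the neighbour list is supplied by the caller rather than derived from the departure-angle formula in the figures, and the function names, array shapes and example values are hypothetical.

```python
import numpy as np


def compress_to_ring(per_beam_info, source_beam: int, neighbour_beams) -> np.ndarray:
    """Keep only the entries of the source beam and the ring of beams around it."""
    ring = [source_beam] + list(neighbour_beams)
    return np.asarray(per_beam_info)[ring]


def reconstruct_state(u_star, p_star, queue_lengths) -> np.ndarray:
    """Flatten and concatenate U*(t), P*(t) and Q_s(t) into one input vector."""
    parts = (u_star, p_star, queue_lengths)
    return np.concatenate([np.asarray(p, dtype=np.float32).ravel() for p in parts])


# Hypothetical per-beam power levels for 10 beams; beam 3 is the source beam.
power = np.arange(10, dtype=np.float32)
p_star = compress_to_ring(power, source_beam=3, neighbour_beams=[2, 4])
# Hypothetical one-hot encoded channel and user information for two calls.
u_star = np.array([[0, 1, 0], [1, 0, 0]])
phi = reconstruct_state(u_star, p_star, queue_lengths=[5, 6])
```

The resulting vector has 6 + 3 + 2 = 11 entries, so the size of the network input depends on the ring size rather than on the total number of beams.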
The experience replay pool and the target Q network are used in updating the Q network, which makes network training more stable. To optimize the approximation of the action-value function, the loss function should approach 0 as closely as possible; the Q network is updated by backpropagation with gradient descent, and an adaptive moment estimation optimizer is used to accelerate convergence. The specific steps in FIG. 3 are as follows.
1) Initialize the parameters of the low-orbit satellite scenario, the Q network, and the target Q network with weights $\theta^-$; initialize the experience replay pool;
2) Acquire the channel allocation information $V$, the power allocation information $P$, the service queue information $Q$ and the information $u$ of users newly requesting service in the low-earth-orbit satellite system;
3) Randomly initialize a state $s_0$;
4) Process the state according to the state reconstruction process shown in FIG. 4: $\phi_0 = \phi(s_0)$;
5) At each time $t$, draw a random probability $p$ for the ε-greedy strategy;
6) When $p \le \varepsilon$, randomly select an action $a_t \in A$; otherwise, select the action
Figure BDA0003065040590000065
7) Execute action $a_t$, which changes the environment state; obtain the reward value $r_t$ and observe the next state $s_{t+1}$;
8) Process $s_{t+1}$ as $\phi_{t+1} = \phi(s_{t+1})$ and store $\langle \phi(s_t), a_t, r_t, \phi(s_{t+1}) \rangle$ in the experience replay pool;
9) Randomly sample a batch of samples $\langle \phi(s_t), a_t, r_t, \phi(s_{t+1}) \rangle$ from the experience replay pool;
10) Calculate the loss function $Loss(\theta) = E[(y_t - Q(\phi(s_t), a_t; \theta))^2]$;
11) Calculate the bias-corrected first and second moment estimates using the Adam algorithm;
12) Update the weight parameters $\theta$ of the network through the backpropagation algorithm of the neural network;
13) Every fixed number of steps $G$, update the target Q network parameters $\theta^-$ with the Q network parameters $\theta$; output the weight parameters $\theta$ of the DQN network and the policy for allocating the corresponding resources to each newly requesting user.
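Steps 5 to 12 can be sketched in NumPy as follows. This is an illustrative sketch and not the patented implementation: a linear model stands in for the Q network, the target $y_t = r_t + \gamma \max_a Q(\phi(s_{t+1}), a; \theta^-)$ is the standard DQN target and is assumed here because the text does not define $y_t$, and the discount factor, learning rate, dimensions and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class LinearQ:
    """Linear stand-in for the Q network: Q(phi) = W @ phi + b."""

    def __init__(self, state_dim: int, num_actions: int):
        self.W = rng.normal(0.0, 0.1, size=(num_actions, state_dim))
        self.b = np.zeros(num_actions)

    def q_values(self, phi):
        return self.W @ phi + self.b

    def copy_from(self, other):
        self.W, self.b = other.W.copy(), other.b.copy()


def epsilon_greedy(q_net, phi, epsilon: float) -> int:
    """Steps 5-6: explore when p <= epsilon, otherwise take the greedy action."""
    if rng.random() <= epsilon:
        return int(rng.integers(len(q_net.b)))
    return int(np.argmax(q_net.q_values(phi)))


class Adam:
    """Adam update with bias-corrected first and second moments (step 11)."""

    def __init__(self, shape, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m, self.v, self.t = np.zeros(shape), np.zeros(shape), 0
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def train_step(q_net, target_net, opt_w, opt_b, batch, gamma=0.9) -> float:
    """Steps 10-12: mean squared TD error and one Adam update of theta."""
    grad_w, grad_b, loss = np.zeros_like(q_net.W), np.zeros_like(q_net.b), 0.0
    for phi, action, reward, phi_next in batch:
        y = reward + gamma * np.max(target_net.q_values(phi_next))  # assumed target
        td = q_net.q_values(phi)[action] - y
        loss += td ** 2
        grad_w[action] += 2 * td * phi  # gradient of td^2 w.r.t. the action's row
        grad_b[action] += 2 * td
    n = len(batch)
    q_net.W = opt_w.step(q_net.W, grad_w / n)
    q_net.b = opt_b.step(q_net.b, grad_b / n)
    return loss / n


q_net, target_net = LinearQ(4, 3), LinearQ(4, 3)
target_net.copy_from(q_net)  # step 13 would repeat this every G steps
opt_w, opt_b = Adam(q_net.W.shape), Adam(q_net.b.shape)
phi = np.ones(4)
batch = [(phi, 0, 1.0, phi)] * 8  # hypothetical replayed transitions
losses = [train_step(q_net, target_net, opt_w, opt_b, batch) for _ in range(200)]
greedy_action = epsilon_greedy(q_net, phi, epsilon=0.0)
```

In a full implementation the batch would come from the experience replay pool, the target network would be re-synchronized every $G$ steps as in step 13, and the linear model would be replaced by the neural network of FIG. 3.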
To address the low total system throughput caused by the time-varying downlink transmission scenario of low-orbit satellite communication systems and the large differences in traffic volume among beams, the invention provides a DQN-based multi-service low-orbit satellite resource allocation method. Taking full account of the coverage time of each low-orbit satellite and the stability of the service queues, the method jointly allocates channel bandwidth and power dynamically according to service type priority and user priority. The method can effectively improve the total system throughput.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A DQN-based multi-service low-orbit satellite resource allocation method, characterized in that the method comprises the following steps:
S1: establishing a joint power and channel allocation model based on low-earth-orbit satellite multi-service;
S2: mapping resource allocation in a multi-beam low-orbit satellite communication system to the problem of an agent learning through interaction with the environment to maximize its long-term return;
S3: solving the problem of step S2 through state reconstruction and a DQN algorithm.
2. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the step S1 specifically includes:
S11: The satellite network provides $S = \{1, 2, \ldots, S\}$ different application services to users $U$, and the priority weight of each service is set as $W = [\omega_1, \omega_2, \ldots, \omega_S]$. The channel allocation state of beam $n$ at time slot $t$ is represented as
Figure FDA0003065040580000011
where $K$ is the number of calls being served in beam $n$,
Figure FDA0003065040580000012
indicates the service type,
Figure FDA0003065040580000013
indicates the call type,
Figure FDA0003065040580000014
denotes a new call, and
Figure FDA0003065040580000015
denotes a handover call. The channel allocation states of all beams constitute the channel allocation matrix of the satellite, denoted $V(t) = \{\upsilon_1(t), \upsilon_2(t), \ldots, \upsilon_n(t)\}$;
S12: For each new call, its state is represented as
Figure FDA0003065040580000016
where $i$ is the current number of new calls,
Figure FDA0003065040580000017
indicates the service type, and
Figure FDA0003065040580000018
indicates the call type. At different times, $V(t)$ changes with the arrival or departure of users $u(t)$, and the corresponding resources are allocated or released accordingly;
S13: The end-to-end delay between the user and the satellite satisfies the coverage time constraint of a single beam of the low-orbit satellite, i.e. the total average end-to-end delay of service $s$ satisfies
Figure FDA0003065040580000019
where
Figure FDA00030650405800000110
and
Figure FDA00030650405800000111
denote the average queuing delay and the downlink propagation delay of service $s$, respectively; $T = L / v_{sat}$ is the beam coverage duration, $v_{sat}$ is the operating speed of the low-orbit satellite, and $L$ is the known diameter of the satellite coverage area;
S14: Queue stability: the satellite system constructs a corresponding queue $Q_s(t)$ for each service; if it satisfies
Figure FDA00030650405800000112
the queue is stable, where $Q_s(t)$ is the length of the buffer queue of service $s$ in the satellite at the beginning of time slot $t$, and $E[\cdot]$ denotes expectation.
3. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the step S2 specifically includes:
S21: The state space is defined as $s_t = \{V(t), P(t), Q_s(t), u(t)\}$, where $V(t)$ is the channel allocation information of the satellite at time slot $t$, $P(t)$ is the power allocation information, $Q_s(t)$ is the queue length of each service at time slot $t$, and $u(t)$ is the user information of the newly requested services at time slot $t$;
S22: The action space is defined as $a_t = \{x_{nc}(t), p(t)\}$, where $x_{nc}(t)$ indicates whether channel $c$ in beam $n$ is allocated to the user at time slot $t$: $x_{nc}(t) = 1$ means that channel $c$ in beam $n$ is allocated to the user, and $x_{nc}(t) = 0$ means that it is not; $p(t)$ is the power allocated to the user;
S23: The reward function is defined as
Figure FDA0003065040580000021
The instantaneous system reward is the sum of the instantaneous rewards of all users newly requesting service in the network, which is equivalent to
Figure FDA0003065040580000022
where $\omega_s$ is the weight when the user's service type is $s$, and $\kappa$ reflects the priority of the user, i.e. a handover user has higher priority than a newly accessing user. When a new user makes a request, the reward value is set to a value related to the transmission rate, and the system throughput is expressed as
Figure FDA0003065040580000023
where $R_{unc}$ is the transmission rate allocated to the user and $R_{th}$ is the minimum transmission rate the user requires for normal transmission; when the transmission rate allocated to a user is lower than $R_{th}$, the allocation strategy performs poorly and the feedback
Figure FDA0003065040580000024
is given (in the simulation,
Figure FDA0003065040580000025
is set to -1); otherwise the feedback
Figure FDA0003065040580000026
is given.
4. The method for resource allocation of DQN-based multi-service low-orbit satellites according to claim 1, wherein: the state reconstruction process described in step S3 includes:
s311: simplifying the beam associated with the new user to a beam of one turn around the source beam, the compressed beam being
Figure FDA0003065040580000027
Wherein the content of the first and second substances,
Figure FDA0003065040580000028
indicating a new request service utThe angle of departure between the source beam of (a) and its surrounding beam n,
Figure FDA0003065040580000029
h is the satellite altitude, θ3dBIs 3dB beamwidth;
S312: the compressed power allocation information and satellite channel allocation information are expressed as
Figure FDA00030650405800000210
and
Figure FDA00030650405800000211
Figure FDA00030650405800000211
S313: the compressed satellite channel allocation information V*(t) and the user information u(t) are further processed by one-hot encoding into
Figure FDA00030650405800000212
The reconstructed state space is φ(s_t) = {U*(t), P*(t), Qs(t)}.
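The one-hot step of S313 and the assembly of φ(s_t) might look like the Python sketch below. The claim does not say how the entries of V*(t) and u(t) are represented before encoding, so this assumes each compressed beam carries an integer channel label and the user is identified by an integer beam index. The function names and parameters are hypothetical, and the beam compression of S311 is taken as already done because its geometry is given only in the figures.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[int]:
    """Return a length-`size` vector with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} outside [0, {size})")
    return [1 if i == index else 0 for i in range(size)]


def reconstruct_state(
    v_star: Sequence[int],      # ASSUMPTION: one integer channel label per compressed beam
    user_beam: int,             # ASSUMPTION: integer index identifying the user's beam
    p_star: Sequence[float],    # compressed power allocation information P*(t)
    queue: Sequence[float],     # service queue information
    num_channel_labels: int,
    num_beams: int,
) -> List[float]:
    """Build phi(s_t) as the concatenation of U*(t), P*(t) and the queue info."""
    u_star: List[float] = []
    for label in v_star:
        # One-hot encode each compressed beam's channel label.
        u_star.extend(one_hot(label, num_channel_labels))
    # One-hot encode the user information and append it to U*(t).
    u_star.extend(one_hot(user_beam, num_beams))
    return u_star + list(p_star) + list(queue)
```

Flattening everything into one list is a design choice made here so the result can be fed directly to a fully connected Q network; the claim itself only states which components the reconstructed state contains.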
5. The DQN-based multi-service low-earth-orbit satellite resource allocation method according to claim 4, wherein: the DQN algorithm solving process described in step S3 includes:
S321: using the experience replay pool and the target Q network to update the Q network;
S322: updating the Q network by backpropagation training with gradient descent, and accelerating convergence by using an adaptive moment estimation (Adam) optimizer.
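The two ingredients named in S321 and S322 can be illustrated with a small, dependency-free Python sketch: a fixed-capacity experience replay pool and a single Adam update with bias-corrected moment estimates. The class and function names, the capacity handling, and the default hyperparameters (the commonly used lr = 1e-3, β1 = 0.9, β2 = 0.999) are assumptions. The claim specifies none of them, and a real implementation would normally rely on a deep learning framework's optimizer.

```python
import math
import random
from collections import deque
from typing import List, Tuple


class ReplayPool:
    """Fixed-capacity experience replay pool; oldest transitions are dropped."""

    def __init__(self, capacity: int) -> None:
        self.buffer = deque(maxlen=capacity)

    def store(self, phi, action, reward, phi_next) -> None:
        # One transition <phi(s_t), a_t, r_t, phi(s_{t+1})>.
        self.buffer.append((phi, action, reward, phi_next))

    def sample(self, batch_size: int) -> list:
        # Uniform random minibatch, capped at the current pool size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self) -> int:
        return len(self.buffer)


def adam_step(
    theta: List[float], grad: List[float],
    m: List[float], v: List[float], t: int,
    lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; `t` is the 1-based step count."""
    # Exponential moving averages of the gradient and squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias-corrected first- and second-moment estimates.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v
```

Sampling uniformly from the pool breaks the correlation between consecutive transitions, which is the usual motivation for experience replay in DQN training.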
6. The DQN-based multi-service low-earth-orbit satellite resource allocation method according to claim 5, wherein: the DQN algorithm solving process specifically comprises the following steps:
1) initializing the parameters of the low-orbit satellite scenario, the Q network, and the target Q network parameters and weights θ^-, and initializing the experience replay pool;
2) acquiring channel allocation information V, power allocation information P, service queue information Q and information u of a newly requested service user of a low earth orbit satellite system;
3) randomly initializing a state s_0;
4) processing the state by state reconstruction: φ_0 = φ(s_0);
5) at each time t, drawing a random probability p for the ε-greedy strategy;
6) when p ≤ ε, randomly selecting an action a_t ∈ A; otherwise, selecting the action
Figure FDA0003065040580000031
7) performing action a_t, which changes the environment state, obtaining the reward value r_t, and observing the next state s_{t+1};
8) processing s_{t+1} into φ_{t+1} = φ(s_{t+1}), and storing <φ(s_t), a_t, r_t, φ(s_{t+1})> in the experience replay pool;
9) randomly sampling a batch of samples <φ(s_t), a_t, r_t, φ(s_{t+1})> from the experience replay pool;
10) calculating the loss function Loss(θ) = E[(y_t - Q(φ(s_t), a_t; θ))^2];
11) calculating the bias-corrected first-moment and second-moment estimates using the Adam algorithm;
12) updating the weight parameter θ of the network through the backpropagation algorithm of the neural network;
13) every fixed number of steps G, updating the target Q network parameters θ^- with the Q network parameters θ, and outputting the weight parameters θ of the DQN network and the policy that allocates the corresponding resources to each newly requesting user.
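Steps 5), 6), 10) and 13) above can be sketched in plain Python as follows. The claim does not define the target y_t, so the sketch assumes the standard DQN target y_t = r_t + γ · max_a Q(φ(s_{t+1}), a; θ^-). The discount factor γ, the function names, and the representation of the networks as callables returning a list of Q values are likewise illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

QFunction = Callable[[Sequence[float]], List[float]]
Transition = Tuple[Sequence[float], int, float, Sequence[float]]


def select_action(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """Epsilon-greedy choice: explore when p <= epsilon, else take the argmax."""
    p = rng.random()
    if p <= epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_target: Sequence[float], gamma: float) -> float:
    """ASSUMED standard DQN target, computed with the target network's values."""
    return reward + gamma * max(next_q_target)


def batch_loss(batch: Sequence[Transition], q_net: QFunction,
               target_net: QFunction, gamma: float) -> float:
    """Mean of (y_t - Q(phi(s_t), a_t; theta))^2 over a sampled batch."""
    errors = []
    for phi, action, reward, phi_next in batch:
        y = td_target(reward, target_net(phi_next), gamma)
        errors.append((y - q_net(phi)[action]) ** 2)
    return sum(errors) / len(errors)


def maybe_sync_target(step: int, sync_every: int,
                      theta: List[float], theta_target: List[float]) -> List[float]:
    """Copy theta into the target parameters every `sync_every` (G) steps."""
    if step % sync_every == 0:
        return list(theta)
    return theta_target
```

Keeping the target parameters frozen between synchronizations means y_t does not shift with every gradient step, which is the usual reason DQN maintains a separate target network.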
CN202110523792.0A 2021-05-13 2021-05-13 DQN-based multi-service low-orbit satellite resource allocation method Active CN113258988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110523792.0A CN113258988B (en) 2021-05-13 2021-05-13 DQN-based multi-service low-orbit satellite resource allocation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110523792.0A CN113258988B (en) 2021-05-13 2021-05-13 DQN-based multi-service low-orbit satellite resource allocation method

Publications (2)

Publication Number Publication Date
CN113258988A true CN113258988A (en) 2021-08-13
CN113258988B CN113258988B (en) 2022-05-20

Family

ID=77181770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110523792.0A Active CN113258988B (en) 2021-05-13 2021-05-13 DQN-based multi-service low-orbit satellite resource allocation method

Country Status (1)

Country Link
CN (1) CN113258988B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113726374A (en) * 2021-09-18 2021-11-30 北方工业大学 Multi-beam satellite bandwidth allocation method with complementary long and short periods
CN113872675A (en) * 2021-09-28 2021-12-31 东方红卫星移动通信有限公司 Low-earth orbit satellite multi-user service method and system
CN114362810A (en) * 2022-01-11 2022-04-15 重庆邮电大学 Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning
KR102391927B1 (en) * 2021-11-01 2022-04-28 한화시스템 주식회사 System and method for allocating low-latency traffic resource in low-orbit satellite network
CN114553299A (en) * 2022-02-17 2022-05-27 重庆邮电大学 Satellite system beam scheduling and resource allocation method
CN114665952A (en) * 2022-03-24 2022-06-24 重庆邮电大学 Low-orbit satellite network beam hopping optimization method based on satellite-ground fusion architecture
CN114698045A (en) * 2022-03-30 2022-07-01 西安交通大学 Serial Q learning distributed switching method and system under large-scale LEO satellite network
CN114710200A (en) * 2022-04-07 2022-07-05 中国科学院计算机网络信息中心 Satellite network resource arrangement method and system based on reinforcement learning
CN114710195A (en) * 2022-03-24 2022-07-05 重庆邮电大学 Low-orbit satellite energy-efficient resource allocation method based on beam hopping technology
CN114928401A (en) * 2022-05-17 2022-08-19 重庆邮电大学 Dynamic planning method for LEO inter-satellite link based on multi-agent reinforcement learning
CN115021799A (en) * 2022-07-11 2022-09-06 北京理工大学 Low-orbit satellite switching method based on multi-agent cooperation
CN115276754A (en) * 2022-06-20 2022-11-01 南京邮电大学 Grid time delay prediction-based satellite transmission optimization method
CN115758707A (en) * 2022-11-10 2023-03-07 北京航天驭星科技有限公司 Modeling method, model and acquisition method of east-west retention strategy model of satellite
CN116170062A (en) * 2023-02-15 2023-05-26 中国人民解放军61096部队 Resource preemption method, device and server of multi-beam satellite mobile communication system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2555446A1 (en) * 2011-08-04 2013-02-06 Centre National D'etudes Spatiales System and method for multiple management of transmission resources of a space system for multi-cell radio communication
US20180181971A1 (en) * 2009-11-24 2018-06-28 Visa U.S.A. Inc. Systems and Methods for Multi-Channel Offer Redemption
CN108966352A (en) * 2018-07-06 2018-12-07 北京邮电大学 Dynamic beam dispatching method based on depth enhancing study
CN109743735A (en) * 2018-12-18 2019-05-10 北京邮电大学 A kind of dynamic channel assignment method based on depth enhancing study in satellite communication system
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
CN111262619A (en) * 2020-01-20 2020-06-09 中国科学院计算技术研究所 Multi-beam satellite resource allocation method and system
CN111867104A (en) * 2020-07-15 2020-10-30 中国科学院上海微系统与信息技术研究所 Power distribution method and power distribution device for low earth orbit satellite downlink
CN111970047A (en) * 2020-08-25 2020-11-20 桂林电子科技大学 LEO satellite channel allocation method based on reinforcement learning
CN112019260A (en) * 2020-09-14 2020-12-01 西安交通大学 Low-orbit heterogeneous satellite network routing method and system
CN112039580A (en) * 2020-09-17 2020-12-04 中国人民解放军32039部队 Downlink beam resource allocation method and device for multi-beam communication satellite
CN112312581A (en) * 2020-05-11 2021-02-02 北京邮电大学 Aloha enhanced access method for low-orbit constellation system
CN112749729A (en) * 2019-10-31 2021-05-04 辉达公司 Processor and system for training machine learning model based on precision of comparison model parameters

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181971A1 (en) * 2009-11-24 2018-06-28 Visa U.S.A. Inc. Systems and Methods for Multi-Channel Offer Redemption
US20180225678A1 (en) * 2009-11-24 2018-08-09 Visa U.S.A. Inc. Systems and Methods for Multi-Channel Offer Redemption
EP2555446A1 (en) * 2011-08-04 2013-02-06 Centre National D'etudes Spatiales System and method for multiple management of transmission resources of a space system for multi-cell radio communication
CN108966352A (en) * 2018-07-06 2018-12-07 北京邮电大学 Dynamic beam dispatching method based on depth enhancing study
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
CN109743735A (en) * 2018-12-18 2019-05-10 北京邮电大学 A kind of dynamic channel assignment method based on depth enhancing study in satellite communication system
CN112749729A (en) * 2019-10-31 2021-05-04 辉达公司 Processor and system for training machine learning model based on precision of comparison model parameters
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
CN111262619A (en) * 2020-01-20 2020-06-09 中国科学院计算技术研究所 Multi-beam satellite resource allocation method and system
CN112312581A (en) * 2020-05-11 2021-02-02 北京邮电大学 Aloha enhanced access method for low-orbit constellation system
CN111867104A (en) * 2020-07-15 2020-10-30 中国科学院上海微系统与信息技术研究所 Power distribution method and power distribution device for low earth orbit satellite downlink
CN111970047A (en) * 2020-08-25 2020-11-20 桂林电子科技大学 LEO satellite channel allocation method based on reinforcement learning
CN112019260A (en) * 2020-09-14 2020-12-01 西安交通大学 Low-orbit heterogeneous satellite network routing method and system
CN112039580A (en) * 2020-09-17 2020-12-04 中国人民解放军32039部队 Downlink beam resource allocation method and device for multi-beam communication satellite

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
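YANMIN WANG et al.: "Coordinated resource allocation for satellite-terrestrial coexistence based on radio maps", 《COMMUNICATIONS TECHNOLOGIES & APPLICATIONS》 *
韩永锋 (Han Yongfeng): "基于深度强化学习的卫星动态资源管理研究综述" [A survey of satellite dynamic resource management based on deep reinforcement learning], 《第十六届卫星通信学术年会论文集》 [Proceedings of the 16th Annual Academic Conference on Satellite Communications] *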

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113726374B (en) * 2021-09-18 2022-07-12 北方工业大学 Long-short period complementary multi-beam satellite bandwidth allocation method
CN113726374A (en) * 2021-09-18 2021-11-30 北方工业大学 Multi-beam satellite bandwidth allocation method with complementary long and short periods
CN113872675A (en) * 2021-09-28 2021-12-31 东方红卫星移动通信有限公司 Low-earth orbit satellite multi-user service method and system
CN113872675B (en) * 2021-09-28 2023-09-01 东方红卫星移动通信有限公司 Multi-user service method and system for low orbit satellite
KR102391927B1 (en) * 2021-11-01 2022-04-28 한화시스템 주식회사 System and method for allocating low-latency traffic resource in low-orbit satellite network
CN114362810A (en) * 2022-01-11 2022-04-15 重庆邮电大学 Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning
CN114553299A (en) * 2022-02-17 2022-05-27 重庆邮电大学 Satellite system beam scheduling and resource allocation method
CN114553299B (en) * 2022-02-17 2024-03-29 深圳泓越信息科技有限公司 Method for scheduling beam and distributing resource of satellite system
CN114665952A (en) * 2022-03-24 2022-06-24 重庆邮电大学 Low-orbit satellite network beam hopping optimization method based on satellite-ground fusion architecture
CN114710195A (en) * 2022-03-24 2022-07-05 重庆邮电大学 Low-orbit satellite energy-efficient resource allocation method based on beam hopping technology
CN114710195B (en) * 2022-03-24 2023-07-25 重庆邮电大学 Low-orbit satellite energy-efficient resource allocation method based on beam hopping technology
CN114698045A (en) * 2022-03-30 2022-07-01 西安交通大学 Serial Q learning distributed switching method and system under large-scale LEO satellite network
CN114698045B (en) * 2022-03-30 2023-08-29 西安交通大学 Serial Q learning distributed switching method and system under large-scale LEO satellite network
CN114710200A (en) * 2022-04-07 2022-07-05 中国科学院计算机网络信息中心 Satellite network resource arrangement method and system based on reinforcement learning
CN114710200B (en) * 2022-04-07 2023-06-23 中国科学院计算机网络信息中心 Satellite network resource arrangement method and system based on reinforcement learning
CN114928401B (en) * 2022-05-17 2023-07-07 重庆邮电大学 LEO inter-satellite link dynamic planning method based on multi-agent reinforcement learning
CN114928401A (en) * 2022-05-17 2022-08-19 重庆邮电大学 Dynamic planning method for LEO inter-satellite link based on multi-agent reinforcement learning
CN115276754B (en) * 2022-06-20 2023-06-16 南京邮电大学 Satellite transmission optimization method based on grid time delay prediction
CN115276754A (en) * 2022-06-20 2022-11-01 南京邮电大学 Grid time delay prediction-based satellite transmission optimization method
CN115021799B (en) * 2022-07-11 2023-03-10 北京理工大学 Low-orbit satellite switching method based on multi-agent cooperation
CN115021799A (en) * 2022-07-11 2022-09-06 北京理工大学 Low-orbit satellite switching method based on multi-agent cooperation
CN115758707A (en) * 2022-11-10 2023-03-07 北京航天驭星科技有限公司 Modeling method, model and acquisition method of east-west retention strategy model of satellite
CN116170062A (en) * 2023-02-15 2023-05-26 中国人民解放军61096部队 Resource preemption method, device and server of multi-beam satellite mobile communication system
CN116170062B (en) * 2023-02-15 2024-02-27 中国人民解放军61096部队 Resource preemption method, device and server of multi-beam satellite mobile communication system

Also Published As

Publication number Publication date
CN113258988B (en) 2022-05-20

Similar Documents

Publication Publication Date Title
CN113258988B (en) DQN-based multi-service low-orbit satellite resource allocation method
CN111970047B (en) LEO satellite channel allocation method based on reinforcement learning
CN114362810B (en) Low orbit satellite beam jump optimization method based on migration depth reinforcement learning
CN114499629B (en) Dynamic allocation method for jumping beam satellite system resources based on deep reinforcement learning
CN115021799B (en) Low-orbit satellite switching method based on multi-agent cooperation
CN114665952B (en) Low-orbit satellite network beam-jumping optimization method based on star-ground fusion architecture
CN113452432A (en) Dynamic allocation method for downlink resources of multi-beam low-orbit satellite communication
CN106792451A (en) A kind of D2D communication resource optimization methods based on Multiple-population Genetic Algorithm
CN115173922B (en) Multi-beam satellite communication system resource allocation method based on CMADDQN network
Chen et al. Learning-based computation offloading for IoRT through Ka/Q-band satellite–terrestrial integrated networks
CN115866788A (en) 3C resource scheduling method of heaven and earth fusion network for active migration of MEC tasks
Park et al. Trends in LEO satellite handover algorithms
CN113453358B (en) Joint resource allocation method of wireless energy-carrying D2D network
Qi et al. Enhanced 5G mobile broadcasting service with shape-adaptive RIS
Zhang et al. Dynamic beam hopping for DVB-S2X satellite: A multi-objective deep reinforcement learning approach
CN115065390B (en) Fair multi-group multicast precoding method based on flow demand
CN114710200B (en) Satellite network resource arrangement method and system based on reinforcement learning
CN114845310A (en) Artificial bee colony algorithm-based LEO satellite channel allocation method
CN113316239B (en) Unmanned aerial vehicle network transmission power distribution method and device based on reinforcement learning
CN112887314B (en) Time delay perception cloud and mist cooperative video distribution method
Liu et al. Research on Handover Strategy of LEO Satellite Network
CN115103449B (en) Multi-beam low-orbit satellite space energy distribution method and device and electronic equipment
CN114614878B (en) Coding calculation distribution method based on matrix-vector multiplication task in star-to-ground network
CN114364007B (en) Subcarrier power control method of low-orbit satellite and unmanned aerial vehicle cellular fusion network
CN114374742B (en) Dynamic cache updating and cooperative transmission method for low-orbit satellite network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant