CN114499636B - End-to-end time delay optimization method for uplink and downlink users of multi-beam satellite - Google Patents

Info

Publication number
CN114499636B
Authority
CN
China
Prior art keywords
user
downlink
uplink
channel
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210056077.5A
Other languages
Chinese (zh)
Other versions
CN114499636A (en)
Inventor
崔高峰
王亚楠
王力男
胡东伟
刘丽哲
徐媛媛
段鹏飞
王卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
CETC 54 Research Institute
Original Assignee
Beijing University of Posts and Telecommunications
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, CETC 54 Research Institute filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210056077.5A priority Critical patent/CN114499636B/en
Publication of CN114499636A publication Critical patent/CN114499636A/en
Application granted granted Critical
Publication of CN114499636B publication Critical patent/CN114499636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851 Systems using a satellite or space-based relay
    • H04B7/18519 Operations control, administration or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0408 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas using two or more beams, i.e. beam diversity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/1853 Satellite systems for providing telephony service to a mobile station, i.e. mobile satellite service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18578 Satellite systems for providing broadband data service to individual earth stations
    • H04B7/18595 Arrangements for adapting broadband applications to satellite systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Astronomy & Astrophysics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Environmental & Geological Engineering (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Radio Relay Systems (AREA)

Abstract

The invention provides an end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite, belonging to the field of satellite communication. First, an end-to-end communication scenario for uplink and downlink users in a co-frequency network is built. The uplink and downlink users initiate communication requests, the uplink and downlink channel capacities are calculated separately, and the actual achievable information transmission rate and the total communication delay of each user pair are then calculated. In each resource allocation time slot, a deep reinforcement learning network schedules the users that need to transmit data and allocates uplink and downlink channel resources. Convex optimization is then used to solve for the downlink channel power allocation that maximizes the average data transmission rate of the downlink users, thereby minimizing delay. The resulting reward value is used to update the deep reinforcement learning network, scheduling and resource allocation are performed for the user pairs at the next time step, and the downlink power allocation is again optimized by convex optimization; this process repeats until all user pairs have been scheduled and have completed transmission. The invention reduces the total system delay and maximizes the long-term gain of the system.

Description

End-to-end time delay optimization method for uplink and downlink users of multi-beam satellite
Technical Field
The invention belongs to the field of satellite communication, and particularly relates to an end-to-end time delay optimization method for uplink and downlink users of a multi-beam satellite.
Background
Compared with terrestrial communication systems, satellite communication systems have several advantages. They are highly survivable and not easily affected by natural disasters, and their wide coverage effectively solves the connectivity problem for users in remote areas that terrestrial networks cannot reach, so they play an important role in fields such as maritime and aviation services. Satellite communication systems also have disadvantages. For example, signal attenuation at the edge of a satellite beam is smaller than at the edge of a terrestrial cell, so interference is severe in multi-beam satellite systems with a small frequency reuse factor.
To meet rapidly growing terrestrial traffic demand, system capacity can be increased by full frequency reuse among the satellite's beams, but the resulting co-channel interference is more severe.
In current research on multi-beam satellite resource allocation, most work considers only downlink resource allocation from the satellite to the user or only uplink resource allocation from the user to the satellite. For co-frequency networks, joint user resource allocation that accounts for both uplink and downlink interference has received little attention.
Studying an effective joint resource allocation algorithm for uplink and downlink users to reduce the end-to-end communication delay of the system is therefore of great significance for improving the performance of satellite communication systems.
Disclosure of Invention
In view of the above problems, the invention provides an end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite. Drawing on deep reinforcement learning and convex optimization, it models the time-correlated, continuous resource allocation process as a Markov process, uses a proximal policy optimization (PPO) neural network to make the uplink and downlink user scheduling and bandwidth allocation decisions, and uses convex optimization to optimize the power allocation among downlink users, thereby reducing the total system delay.
The end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite comprises the following specific steps:
Step one: build an end-to-end communication scenario for uplink and downlink users in a multi-beam satellite co-frequency network.
Consider a single multi-beam GEO satellite with $K$ beams, with $N_b$ users randomly distributed in each beam. Under the satellite's coverage, $N_{pair}$ user pairs transmit data, so the total number of users is $2N_{pair}$. The set of users sending data is denoted $U_t = \{u_i \mid 1 \le i \le N_{pair}\}$, and the set of users receiving data is denoted $U_r = \{u_j \mid 1 \le j \le N_{pair}\}$.
Step two: for the data sent by uplink user $u_i$ as the transmitting end, calculate the capacity of the uplink channel that carries it to the satellite receiver through subchannel $n$ in beam $k$, denoted
Figure BDA0003476528560000011
The calculation formula is as follows:
Figure BDA0003476528560000012
Figure BDA0003476528560000021
where $n = 1, 2, \ldots, CH_{\max}$; $CH_{\max}$ is the number of channel resource blocks into which the uplink or downlink bandwidth is equally divided, and $B_c$ is the bandwidth of each channel;
Figure BDA0003476528560000022
is the signal-to-interference-plus-noise ratio (SINR) received by the satellite antenna on subchannel $n$ in beam $k$, calculated as follows:
Figure BDA0003476528560000023
where $p_{k,n}$ is the power with which user $u_i$ sends data over subchannel $n$ of beam $k$; $g_{i\to k,n}$ is the channel gain experienced by the data sent by user $u_i$ over subchannel $n$ in beam $k$; $N_0$ is the power spectral density of the Gaussian white noise; $\phi_{k,n}$ is the set of subchannels in beams other than beam $k$ that share the same frequency as subchannel $n$; $u_\gamma$ is the user occupying subchannel $\gamma \in \phi_{k,n}$; and $p_\gamma$ is the transmit power of user $u_\gamma$ on subchannel $\gamma$.
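The equation images for this step are not preserved in the text. From the symbol definitions above and the Shannon formula named in the detailed description, a plausible reconstruction is sketched below. The symbols $C^{up}_{i,k,n}$, $\mathrm{SINR}^{up}_{k,n}$ and the interference gain $g_{\gamma\to k,n}$ (from interfering user $u_\gamma$ to the receive antenna of beam $k$) are assumed notation, as is the use of $N_0 B_c$ for the noise power.

```latex
\begin{align}
C^{up}_{i,k,n} &= B_c \log_2\!\left(1 + \mathrm{SINR}^{up}_{k,n}\right), \\
\mathrm{SINR}^{up}_{k,n} &= \frac{p_{k,n}\, g_{i\to k,n}}
  {N_0 B_c + \sum_{\gamma \in \phi_{k,n}} p_\gamma\, g_{\gamma\to k,n}} .
\end{align}
```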
Step three: for downlink user $u_j$ in beam $k$, calculate the capacity of the downlink channel over which the satellite transmits to the user's receiver through subchannel $m$ in the beam, denoted
Figure BDA0003476528560000024
The calculation formula is as follows:
Figure BDA0003476528560000025
$m = 1, 2, \ldots, CH_{\max}$;
Figure BDA0003476528560000026
is the SINR received by downlink user $u_j$ in satellite beam $k$ on subchannel $m$, calculated as follows:
Figure BDA0003476528560000027
where $p_{k,m}$ is the transmit power the satellite allocates to subchannel $m$ of beam $k$; $g_{k,m\to j}$ is the channel gain of user $u_j$ in satellite beam $k$ on subchannel $m$ during downlink transmission; $\phi_{k,m}$ is the set of channels in beams other than beam $k$ that share the same frequency as subchannel $m$; $p_\sigma$ is the transmit power allocated to co-frequency channel $\sigma$; and $b_\sigma$ is the number of the beam containing subchannel $\sigma$.
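The downlink equation images are also missing. Under the definitions above, one plausible form is the following sketch; $C^{down}_{j,k,m}$, $\mathrm{SINR}^{down}_{j,k,m}$ and the interference gain $g_{b_\sigma,\sigma\to j}$ (from beam $b_\sigma$ on channel $\sigma$ to user $u_j$) are assumed notation, chosen to mirror $g_{k,m\to j}$.

```latex
\begin{align}
C^{down}_{j,k,m} &= B_c \log_2\!\left(1 + \mathrm{SINR}^{down}_{j,k,m}\right), \\
\mathrm{SINR}^{down}_{j,k,m} &= \frac{p_{k,m}\, g_{k,m\to j}}
  {N_0 B_c + \sum_{\sigma \in \phi_{k,m}} p_\sigma\, g_{b_\sigma,\sigma\to j}} .
\end{align}
```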
Step four: while user $u_i$ communicates with user $u_j$, use their respective uplink and downlink channel capacities to calculate the actual achievable information transmission rate of user pair $u_i$-$u_j$.
First, calculate the communication transmission rate $R_i$ of user $u_i$:
Figure BDA0003476528560000028
where $\psi_i$ is the set of subchannels occupied by user $u_i$.
Then, calculate the communication transmission rate $R_j$ of user $u_j$:
Figure BDA0003476528560000029
where $\psi_j$ is the set of subchannels occupied by user $u_j$.
Finally, the actual rate $V_{i,j}$ at which user $u_i$ communicates with user $u_j$ is the smaller of the two:
$V_{i,j} = \min(R_i, R_j)$
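The images for $R_i$ and $R_j$ are not preserved. Since $\psi_i$ and $\psi_j$ are the sets of occupied subchannels, a natural reading is that each rate is the sum of the per-subchannel capacities. In the sketch below, $C^{up}_{i,k,n}$ and $C^{down}_{j,k,m}$ stand for the uplink and downlink capacities of steps two and three (assumed symbols).

```latex
\begin{align}
R_i &= \sum_{n \in \psi_i} C^{up}_{i,k,n}, &
R_j &= \sum_{m \in \psi_j} C^{down}_{j,k,m}, &
V_{i,j} &= \min(R_i, R_j) .
\end{align}
```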
Step five: using the actual achievable information transmission rate of user pair $u_i$-$u_j$, calculate the total delay $T_{i,j}$ that the pair spends on communication.
The calculation formula is as follows:
Figure BDA0003476528560000031
where $t$ is the current system time slot, $\rho$ is the length of a single time slot, $\rho(t-1)$ is the waiting delay incurred when user pair $u_i$-$u_j$ is scheduled at time $t$, $d_h$ is the satellite altitude, $s$ is the propagation speed of electromagnetic waves in vacuum,
Figure BDA0003476528560000032
is the propagation delay of the signal over the uplink and downlink, and $D_{i,j}$ is the communication data volume of user pair $u_i$-$u_j$.
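The delay formula itself is an image that did not survive extraction. The three terms described above (waiting, propagation, transmission) suggest the sum sketched below. The propagation term $2d_h/s$ is an assumption, taking one hop of length $d_h$ for the uplink and one for the downlink.

```latex
\begin{equation}
T_{i,j} = \rho\,(t-1) + \frac{2 d_h}{s} + \frac{D_{i,j}}{V_{i,j}} .
\end{equation}
```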
Step six: in each resource allocation time slot, input the environment state of the current time slot into the deep reinforcement learning network, schedule the users that need to transmit data, and allocate uplink and downlink channel resources.
First, for the scheduled user pairs $u_i$-$u_j$, model the system's resource allocation process as a Markov process $(S, A, R)$ and make the scheduling and bandwidth allocation decisions for user pairs over multiple time slots.
The state $S$, action $A$, and reward $R$ are defined as follows:
1) The state vector is $S_t = \{U_t, L_t, D_t, W_t\}$,
where $U_t$ is the set of user pairs not yet served, $L_t$ is the channel gain of each user with respect to each beam, $D_t$ is the data volume of each user pair, and $W_t$ is the bandwidth resource occupancy matrix of the system.
2) For each time slot t, the action is defined as:
Figure BDA0003476528560000033
where
Figure BDA0003476528560000034
is the number of the user selected for service on uplink subchannel $n$ in beam $k$, and
Figure BDA0003476528560000035
is the number of the user selected for service on downlink subchannel $m$ in beam $k$; a value of 0 indicates that the channel is idle in this time slot and serves no user.
3) After each resource allocation, the environment determines the system reward from the current state, the action taken in that state, and the next state: $R_t = -t_{\min}$.
$t_{\min}$ is the minimum transmission delay still required among the communicating user pairs after the current resource allocation.
When user pairs are successfully allocated channels for communication at the current time, $t_{\min}$ is calculated as:
Figure BDA0003476528560000036
where $D_{i,j}$ is the amount of data that user pair $u_i$-$u_j$ has to transmit at this time;
when no user pair is successfully allocated a channel for communication at the current time, $t_{\min} = \rho$.
Step seven: with the user pairs $u_i$-$u_j$ selected for service by the satellite and their allocated uplink and downlink channel bandwidth resources held fixed, use convex optimization to solve for the downlink channel power allocation that maximizes the average data transmission rate of the downlink users, thereby minimizing delay.
Once the channel resources of an uplink user are fixed, the uplink transmit power is known (it is the user's maximum transmit power), so the uplink rate $R_i$ is determined. When power is allocated to the downlink users, a result with $R_j > R_i$ is not optimal for maximizing total system capacity, because the pair rate is $\min(R_i, R_j)$ and the extra downlink power adds no throughput; this case is therefore excluded.
When $R_j \le R_i$, power is allocated to the downlink users as follows:
First, the power optimization problem is formulated as maximizing the average data transmission rate of the downlink users:
Pro:
Figure BDA0003476528560000041
Figure BDA0003476528560000042
Figure BDA0003476528560000043
Figure BDA0003476528560000044
Constraint C1 states that the total power allocated to all downlink channels must not exceed the maximum available satellite power $P_{tot}$.
Constraint C2 states that the power allocated to each downlink channel is non-negative.
Constraint C3 states that the data rate of each downlink user must be less than or equal to the data rate of its corresponding uplink user.
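The objective and the three constraints appear only as images in the source. Based on their verbal descriptions, the problem can be sketched as follows. The set $\mathcal{U}_s$ of downlink users currently being served and its size $N_s$ are assumed symbols; $p_{k,m}$, $P_{tot}$, $R_i$ and $R_j$ are as defined in the text.

```latex
\begin{align}
\max_{\{p_{k,m}\}} \quad & \frac{1}{N_s} \sum_{j \in \mathcal{U}_s} R_j \\
\text{s.t.} \quad
& \text{C1: } \sum_{k=1}^{K} \sum_{m=1}^{CH_{\max}} p_{k,m} \le P_{tot}, \\
& \text{C2: } p_{k,m} \ge 0, \quad \forall k, m, \\
& \text{C3: } R_j \le R_i, \quad \forall j \in \mathcal{U}_s .
\end{align}
```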
Then, because co-channel interference between beams makes this problem non-convex, the objective function is rewritten in a DC structure (a difference of two concave functions), namely:
$f(P) = g(P) - h(P)$
where $g(P)$ and $h(P)$ are both concave functions, expressed respectively as:
Figure BDA0003476528560000045
Figure BDA0003476528560000046
Constraint C3 is likewise expressed in DC form:
$R_j = W(P_j) - V(P_j)$
where $W(P_j)$ and $V(P_j)$ are both concave functions, expressed respectively as:
Figure BDA0003476528560000047
Figure BDA0003476528560000048
Finally, a first-order Taylor expansion converts the original problem into a convex optimization problem, which is solved iteratively with the sequential convex approximation (SCA) method to obtain the optimal downlink channel power allocation.
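The linearization step is not written out in the surviving text. A standard way to apply a first-order Taylor expansion to a DC structure is sketched below, as an illustration rather than the patent's exact expressions. At iterate $P^{(l)}$, the concave function $h$ lies below its tangent, which gives a concave lower bound $\tilde{f}^{(l)}$ on the objective. Linearizing the concave $W$ in the same way gives a convex, conservative replacement for constraint C3. The iterate superscript $(l)$ and the name $\tilde{f}^{(l)}$ are assumed notation.

```latex
\begin{align}
h(P) &\le h(P^{(l)}) + \nabla h(P^{(l)})^{\top} (P - P^{(l)}), \\
f(P) &\ge g(P) - h(P^{(l)}) - \nabla h(P^{(l)})^{\top} (P - P^{(l)}) =: \tilde{f}^{(l)}(P), \\
R_j &\le W(P_j^{(l)}) + \nabla W(P_j^{(l)})^{\top} (P_j - P_j^{(l)}) - V(P_j) \le R_i .
\end{align}
```

Each SCA iteration would then maximize $\tilde{f}^{(l)}$ subject to C1, C2 and the last inequality, and use the solution as $P^{(l+1)}$.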
Step eight: after the downlink user transmission rate $R_j$ is obtained, calculate the delay of user pair $u_i$-$u_j$, derive from it the reward value for the deep reinforcement learning network's decision, and update the network. Then make the scheduling and uplink and downlink channel resource allocation decisions for the user pairs at the next time step, followed by convex optimization of the downlink power allocation, until all user pairs have been scheduled and have completed transmission.
The advantages of the invention are:
1) In a multi-beam full-frequency-reuse system, the method jointly considers the channel gains of uplink and downlink users together with the data volume of each user pair, and in each time slot intelligently selects which users to schedule and which channel resources to allocate. The co-channel interference experienced by each user pair can thus be kept low under full frequency reuse, the end-to-end communication rate of the users is maximized, and the total system delay is ultimately reduced.
2) By using deep reinforcement learning, the method handles the temporal correlation in the resource allocation process and maximizes the long-term benefit of the system.
Drawings
Fig. 1 is a schematic diagram of the end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite according to the invention;
Fig. 2 is a flow chart of the end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite according to the invention;
Fig. 3 shows the end-to-end communication scenario for uplink and downlink users of a multi-beam satellite constructed by the invention;
Fig. 4 compares the system delay performance under different system bandwidths according to the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments.
In the end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite, uplink and downlink users within the satellite's coverage first initiate communication requests. After the satellite collects the channel gain and communication data volume of each user pair, in each time slot it uses a PPO (proximal policy optimization) network to make the user pair scheduling decision and the uplink and downlink bandwidth allocation decision, and then uses the sequential convex approximation (SCA) method to allocate power to the channels occupied by each downlink user, as shown in Fig. 1. The signal-to-interference-plus-noise ratio (SINR) of each uplink and downlink user under this resource allocation is then calculated, the achievable channel capacity of each user is obtained from the Shannon formula, and each user pair communicates at the smaller of its uplink and downlink information transmission rates. In the next time slot, user pairs to be scheduled are again selected and allocated resources, the transmission rate of each user pair is recalculated, and the user pairs communicate at the updated rates; this repeats until all user pairs have completed communication.
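The slot-by-slot loop described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `schedule` callback is a hypothetical stand-in for the PPO decision, `rate_of` is a hypothetical stand-in for the SCA power allocation plus Shannon rate computation, and one channel per pair is assumed for simplicity.

```python
from typing import Callable, Dict, List


def run_slots(data_bits: Dict[int, float],
              schedule: Callable[[List[int], int], List[int]],
              rate_of: Callable[[int], float],
              n_channels: int,
              rho: float) -> Dict[int, int]:
    """Return, for each pair, the 1-based slot in which it finishes."""
    remaining = dict(data_bits)   # data still to send per pair
    active: List[int] = []        # pairs currently holding a channel
    finished: Dict[int, int] = {}
    t = 0
    while remaining:
        t += 1
        waiting = [p for p in remaining if p not in active]
        free = n_channels - len(active)
        # Stand-in for the PPO decision: choose pairs for the free channels.
        active += schedule(waiting, free)[:free]
        # Stand-in for SCA power allocation and Shannon rate computation.
        for p in list(active):
            remaining[p] -= rate_of(p) * rho
            if remaining[p] <= 0:
                finished[p] = t
                active.remove(p)      # channel is released only on completion
                del remaining[p]
    return finished


# Example: three pairs, two channels, lowest pair number scheduled first.
result = run_slots({1: 10.0, 2: 30.0, 3: 10.0},
                   schedule=lambda waiting, free: sorted(waiting),
                   rate_of=lambda p: 10.0,
                   n_channels=2,
                   rho=1.0)
```

The loop assumes that `schedule` eventually serves every waiting pair and that `n_channels` is at least 1; otherwise it would not terminate.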
As shown in Fig. 2, the end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite comprises the following specific steps:
Step one: build an end-to-end communication scenario for uplink and downlink users in a multi-beam satellite co-frequency network.
As shown in Fig. 3, consider a single multi-beam GEO satellite with $K$ beams, with $N_b$ users randomly distributed in each beam. Under the satellite's coverage, $N_{pair}$ user pairs transmit data, so the total number of users is $2N_{pair}$. The set of users sending data is denoted $U_t = \{u_i \mid 1 \le i \le N_{pair}\}$, and the set of users receiving data is denoted $U_r = \{u_j \mid 1 \le j \le N_{pair}\}$.
The total power resource of the system is $P_{tot}$. The uplink transmit power of each user is fixed at the user's maximum transmit power $p_{umax}$; when $n_i$ subchannels are allocated to an uplink user, the user's transmit power on each subchannel is $p_{umax}/n_i$. When the system starts running, the $N_{pair}$ user pairs request communication simultaneously, and the downlink user corresponding to uplink user $u_i \in U_t$ is defined as $u_j \in U_r$.
Because the resources of a real satellite system are limited and co-channel interference is strong, the user pairs must be scheduled in a coordinated way. To ensure continuity of user service, while user pair $u_i$-$u_j$ is communicating, the frequency resources it uses remain occupied until the communication completes and cannot be reallocated to another user before they are released.
Step two: for the data sent by uplink user $u_i$ as the transmitting end, calculate the capacity of the uplink channel that carries it to the satellite receiver through subchannel $n$ in beam $k$, denoted
Figure BDA0003476528560000061
During uplink transmission, the channel gain from the transmitter of user $u_i$ to the satellite receiver through subchannel $n$ in beam $k$ is defined as $g_{i\to k,n}$; it comprises the path loss $PL$, the user transmit antenna gain
Figure BDA0003476528560000062
and the satellite receive antenna gain
Figure BDA0003476528560000063
Figure BDA0003476528560000064
After user $u_i$ sends data over subchannel $n$ of beam $k$ with power $p_{k,n}$, the SINR received by the satellite antenna on subchannel $n$ in beam $k$ is:
Figure BDA0003476528560000065
where
Figure BDA0003476528560000066
is the total interference received by the satellite on subchannel $n$ in beam $k$; $N_0$ is the power spectral density of the Gaussian white noise; $\phi_{k,n}$ is the set of subchannels in beams other than beam $k$ that share the same frequency as subchannel $n$; $u_\gamma$ is the user occupying subchannel $\gamma \in \phi_{k,n}$; and $p_\gamma$ is the transmit power of user $u_\gamma$ on subchannel $\gamma$.
After frequency resources are allocated to the uplink users, the set of subchannels obtained by user $u_i$ is
Figure BDA0003476528560000067
with bandwidth $B_i$. By the Shannon formula, the uplink channel capacity of user $u_i$ on subchannel $n \in \psi_i$ in beam $k$ is:
Figure BDA0003476528560000068
Figure BDA0003476528560000069
where $n = 1, 2, \ldots, CH_{\max}$; $CH_{\max}$ is the number of channel resource blocks into which the uplink or downlink bandwidth is equally divided, and $B_c$ is the bandwidth of each channel. The system uplink bandwidth $B_{up}$ and downlink bandwidth $B_{down}$ are each divided into $CH_{\max}$ channel resource blocks, with each channel having bandwidth
Figure BDA00034765285600000610
The set of uplink channels is denoted $\psi_{up} = \{n \mid n = 1, 2, \ldots, CH_{\max}\}$, and the set of downlink channels is denoted $\psi_{down} = \{m \mid m = 1, 2, \ldots, CH_{\max}\}$.
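The uplink computation of step two can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the noise power is taken as $N_0 B_c$, interference is the sum of power times gain over the co-frequency users, and the function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def uplink_capacity(p_umax: float,
                    n_i: int,
                    gain: float,
                    interferers: Sequence[Tuple[float, float]],
                    n0: float,
                    b_c: float) -> float:
    """Shannon capacity (bit/s) of one uplink subchannel.

    interferers holds (transmit power, gain toward this beam) for each
    co-frequency user in another beam; all values are in linear units.
    """
    if n_i <= 0:
        raise ValueError("n_i must be a positive number of subchannels")
    p = p_umax / n_i                      # equal power split over n_i subchannels
    interference = sum(pw * g for pw, g in interferers)
    sinr = p * gain / (n0 * b_c + interference)
    return b_c * math.log2(1.0 + sinr)


# Example with round numbers: SINR = (1 * 3) / (1 * 2 + 1) = 1.
cap = uplink_capacity(p_umax=2.0, n_i=2, gain=3.0,
                      interferers=[(1.0, 1.0)], n0=1.0, b_c=2.0)
```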
Step three: for downlink user $u_j$ in beam $k$, calculate the capacity of the downlink channel over which the satellite transmits to the user's receiver through subchannel $m$ in the beam, denoted
Figure BDA00034765285600000611
During downlink transmission, the channel gain of user $u_j$ in satellite beam $k$ on downlink subchannel $m$ of that beam is
Figure BDA00034765285600000612
where $PL$ is the path loss,
Figure BDA00034765285600000613
is the satellite transmit antenna gain, and
Figure BDA00034765285600000614
is the user receive antenna gain. The SINR at the user's receiver is:
Figure BDA00034765285600000615
where $p_{k,m}$ is the transmit power the satellite allocates to subchannel $m$ of beam $k$; $\phi_{k,m}$ is the set of channels in beams other than beam $k$ that share the same frequency as subchannel $m$; $p_\sigma$ is the transmit power allocated to co-frequency channel $\sigma$; $b_\sigma$ is the number of the beam containing subchannel $\sigma$; and
Figure BDA00034765285600000616
is the total interference received by user $u_j$ on subchannel $m$ of beam $k$.
After frequency resources are allocated to user $u_j$, the set of subchannels obtained by user $u_j$ is
Figure BDA00034765285600000617
with bandwidth $B_j$. By the Shannon formula, the downlink channel capacity of user $u_j$ on subchannel $m \in \psi_j$ is:
Figure BDA00034765285600000618
$m = 1, 2, \ldots, CH_{\max}$
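A matching sketch for the downlink is given below. It assumes full frequency reuse, so the co-frequency set $\phi_{k,m}$ is subchannel $m$ in every other beam. The matrix layout (`power[k][m]`, `gain_j[k][m]`) and the function name are hypothetical, and noise power is again taken as $N_0 B_c$.

```python
import math
from typing import Sequence


def downlink_capacity(power: Sequence[Sequence[float]],
                      gain_j: Sequence[Sequence[float]],
                      k: int,
                      m: int,
                      n0: float,
                      b_c: float) -> float:
    """Shannon capacity (bit/s) for user j on subchannel m of beam k.

    power[b][m]  : satellite transmit power on subchannel m of beam b
    gain_j[b][m] : channel gain from beam b, subchannel m, to user j
    """
    signal = power[k][m] * gain_j[k][m]
    # Interference comes from the same subchannel index in all other beams.
    interference = sum(power[b][m] * gain_j[b][m]
                       for b in range(len(power)) if b != k)
    sinr = signal / (n0 * b_c + interference)
    return b_c * math.log2(1.0 + sinr)


# Example: two beams, one subchannel; SINR = 4 / (1 + 1) = 2.
cap_down = downlink_capacity(power=[[4.0], [2.0]],
                             gain_j=[[1.0], [0.5]],
                             k=0, m=0, n0=1.0, b_c=1.0)
```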
Step four: while user $u_i$ communicates with user $u_j$, use their respective uplink and downlink channel capacities to calculate the actual achievable information transmission rate of user pair $u_i$-$u_j$.
First, calculate the communication transmission rate $R_i$ of user $u_i$:
Figure BDA0003476528560000071
where $\psi_i$ is the set of subchannels occupied by user $u_i$.
Then, calculate the communication transmission rate $R_j$ of user $u_j$:
Figure BDA0003476528560000072
where $\psi_j$ is the set of subchannels occupied by user $u_j$.
Finally, the actual rate $V_{i,j}$ at which user $u_i$ communicates with user $u_j$ is the smaller of the two:
$V_{i,j} = \min(R_i, R_j)$
Step five: using the actual achievable information transmission rate of user pair $u_i$-$u_j$, calculate the total delay $T_{i,j}$ that the pair spends on communication.
The calculation formula is as follows:
Figure BDA0003476528560000073
where $t$ is the current system time slot, $\rho$ is the length of a single time slot, $\rho(t-1)$ is the waiting delay incurred when user pair $u_i$-$u_j$ is scheduled at time $t$, $d_h$ is the satellite altitude, $s$ is the propagation speed of electromagnetic waves in vacuum,
Figure BDA0003476528560000074
is the propagation delay of the signal over the uplink and downlink, and $D_{i,j}$ is the communication data volume of user pair $u_i$-$u_j$.
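Steps four and five can be combined in a short Python sketch. It assumes that the total delay is the sum of the waiting, propagation and transmission terms described above, with a propagation term of $2d_h/s$ for the two hops; the function name and argument layout are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, propagation speed in vacuum


def total_delay(t: int, rho: float, d_h: float, data_bits: float,
                r_up: float, r_down: float,
                s: float = SPEED_OF_LIGHT) -> float:
    """Total delay T_ij (seconds) for a pair scheduled in slot t."""
    v = min(r_up, r_down)          # V_ij: the slower link limits the pair
    if v <= 0:
        raise ValueError("pair rate must be positive")
    waiting = rho * (t - 1)        # slots spent waiting before being scheduled
    propagation = 2.0 * d_h / s    # uplink hop plus downlink hop (assumed)
    transmission = data_bits / v
    return waiting + propagation + transmission


# Example with round numbers: 1.0 + 4.0 + 5.0 seconds.
delay = total_delay(t=3, rho=0.5, d_h=100.0, data_bits=20.0,
                    r_up=10.0, r_down=4.0, s=50.0)
```

For a GEO satellite at an altitude of roughly 35,786 km, the propagation term alone is about 0.24 s, so it is a fixed floor that resource allocation cannot reduce.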
Step six: in each resource allocation time slot, input the environment state of the current time slot into the deep reinforcement learning network, schedule the users that need to transmit data, and allocate uplink and downlink channel bandwidth resources.
Because the system's user resource allocation decision at time $t$ is affected by the allocation at time $t-1$, the time-correlated resource allocation process for the scheduled user pairs $u_i$-$u_j$ is modeled as a Markov process $(S, A, R)$, and a deep reinforcement learning network makes the scheduling and bandwidth allocation decisions for users over multiple time slots, with the aim of reducing the total system delay.
The state $S$, action $A$, and reward $R$ are defined as follows:
1) State design
When the multi-beam satellite, acting as the agent, serves users and allocates bandwidth and power resources to them, it needs the main information about the satellite system's environment: the user pairs not yet served, the channel gain of each user with respect to each beam, the amount of data to transmit, and the occupancy of the system's bandwidth resources. The state vector is therefore designed as $S_t = \{U_t, L_t, D_t, W_t\}$,
where $U_t$ is the set of user pairs not yet served, $L_t$ is the channel gain of each user with respect to each beam, $D_t$ is the data volume of each user pair, and $W_t$ is the bandwidth resource occupancy matrix of the system.
2) Action design
In each time slot $t$, the system selects the user pairs to serve according to the environment state and allocates channel resources to them. The action is defined as:
Figure BDA0003476528560000081
where
Figure BDA0003476528560000082
is the number of the user selected for service on uplink subchannel $n$ in beam $k$, and
Figure BDA0003476528560000083
is the number of the user selected for service on downlink subchannel $m$ in beam $k$; a value of 0 indicates that the channel is idle in this time slot and serves no user.
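One way to hold the state and action in code is sketched below. The class and field names are hypothetical, and the sketch assumes that an uplink user and its paired downlink user share the same pair number, which the text implies but does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class State:
    unserved: Set[int]              # U_t: pair numbers not yet served
    gains: Dict[int, List[float]]   # L_t: gain of each user toward each beam
    data: Dict[int, float]          # D_t: data volume of each pair
    occupancy: List[List[int]]      # W_t: K x CH_max matrix, 0 = free


@dataclass
class Action:
    uplink: List[List[int]]     # K x CH_max user numbers, 0 = idle channel
    downlink: List[List[int]]   # K x CH_max user numbers, 0 = idle channel


def served_pairs(action: Action) -> Set[int]:
    """Pairs that received both an uplink and a downlink subchannel."""
    up = {u for row in action.uplink for u in row if u != 0}
    down = {u for row in action.downlink for u in row if u != 0}
    return up & down


# Example: two beams, two subchannels each; only pair 1 has both links.
action = Action(uplink=[[1, 0], [0, 2]], downlink=[[0, 1], [3, 0]])
pairs = served_pairs(action)
```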
3) Reward design
After each resource allocation, the environment computes a system reward from the current state, the action taken in that state, and the next state. Once channel resources have been allocated to the users, power is allocated to each downlink user by the convex power optimization procedure, and the data transmission rate V_{i,j} of the user pair u_i-u_j is then calculated. The reward after each action of the network is set to the negative of t_min, the minimum remaining transmission delay among the communicating user pairs, i.e. the reward for each resource allocation is: R_t = -t_min
When user pairs are successfully allocated channels for communication at the current time, t_min is the minimum remaining transmission delay among the communicating user pairs:
Figure BDA0003476528560000084
D_{i,j} is the amount of data to be transmitted by user pair u_i-u_j at this time;
when no user pair is successfully allocated a channel for communication at the current time, t_min = ρ.
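A minimal sketch of the reward rule follows. It assumes that the remaining transmission delay of a pair is its data volume divided by its pair rate, D_{i,j} / V_{i,j}; the original formula is an image that is not reproduced here, so this form is an assumption, as are the function and parameter names.

```python
def reward(scheduled_pairs, slot_length):
    """Return R_t = -t_min for one resource allocation.

    scheduled_pairs: list of (data_volume, pair_rate) tuples for the user
        pairs that were allocated channels in this time slot.
    slot_length: rho, used as t_min when no pair obtained a channel.
    Assumption: remaining transmission delay = data_volume / pair_rate.
    """
    delays = [d / v for d, v in scheduled_pairs if v > 0]
    t_min = min(delays) if delays else slot_length
    return -t_min
```

With two scheduled pairs needing 5 and 3 time units respectively, the reward is -3; with no pair scheduled and ρ = 0.5, it is -0.5.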
Step seven: with the user pairs u_i-u_j selected for service by the satellite and the allocated uplink and downlink channel bandwidth resources held fixed, convex optimization is used to solve for the downlink channel power allocation that maximizes the average data transmission rate of the downlink users, thereby minimizing the delay.
In each resource allocation time slot, the user pairs selected by the satellite and the channel bandwidth resources allocated to the uplink users are first fixed, and the optimal power allocation among the downlink users is then solved. Because the scheduling time and position of each user are known, the waiting delay and the propagation delay are determined, and the power allocation result affects only the transmission delay. To minimize the total time the system takes to complete communication, the average data transmission rate of the user pairs in the system must be maximized. When the frequency resources of an uplink user are fixed, its data rate R_i is also fixed, so two cases arise when allocating power to the downlink users:
① R_j ≤ R_i: the data transmission rate of the user pair is determined by the downlink rate R_j.
② R_j > R_i: let P* = {p*_{k,m} | 1 ≤ k ≤ K, 1 ≤ m ≤ CH_max} be the set of optimal power values that maximizes the average data transmission rate of the system, and suppose there is a user pair u_i-u_j whose uplink and downlink rates satisfy R_j > R_i.
Now reduce the power p_{k,m} allocated to downlink user u_j on subchannel m so that its new downlink rate becomes
Figure BDA0003476528560000085
and
Figure BDA0003476528560000086
At this point, the data transmission rate of the user pair u_i-u_j is still determined by R_i. But for any other user u_h that uses the same downlink subchannel m as user u_j, the reduced transmit power p_{k,m} on subchannel m means less interference from the signal intended for u_j, so the SINR at the receiver of u_h increases, the data rate of u_h increases, and the overall data transmission rate of the system ultimately increases.
This contradicts the assumed optimality of P* = {p*_{k,m} | 1 ≤ k ≤ K, 1 ≤ m ≤ CH_max}, so the case R_j > R_i cannot hold at the optimum.
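The contradiction argument can be checked numerically on a toy case with two beams sharing one downlink subchannel. All numbers, the normalized noise power and the symmetric gains below are made-up values for illustration; only the qualitative behaviour matters.

```python
import math

NOISE = 1.0    # normalized noise power (assumed)
G_OWN = 1.0    # gain from a beam to its own user (assumed)
G_CROSS = 0.2  # gain from a beam to the co-channel user of the other beam (assumed)


def rate(p_own, p_other):
    """Shannon rate, unit bandwidth, of one downlink user on the shared subchannel."""
    sinr = p_own * G_OWN / (NOISE + p_other * G_CROSS)
    return math.log2(1.0 + sinr)


R_i = 1.0            # fixed uplink rate of the pair u_i-u_j
p_j, p_h = 4.0, 4.0  # downlink powers for u_j and for the co-channel user u_h

rate_j_before = rate(p_j, p_h)  # larger than R_i, so the surplus is wasted
rate_h_before = rate(p_h, p_j)

# Lower p_j until the downlink rate of u_j just equals R_i.
p_j_new = (2.0 ** R_i - 1.0) * (NOISE + p_h * G_CROSS) / G_OWN
rate_j_after = rate(p_j_new, p_h)
rate_h_after = rate(p_h, p_j_new)

pair_rate_before = min(R_i, rate_j_before)
pair_rate_after = min(R_i, rate_j_after)
```

In this toy setting the pair rate of u_i-u_j stays at R_i while the rate of u_h rises, which is the improvement that contradicts the assumed optimality.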
The power allocation is now solved for the first case, R_j ≤ R_i:
firstly, modeling the power optimization problem as maximizing the average data transmission rate of downlink users:
Figure BDA0003476528560000091
constraint C1 requires that the total power allocated to all downlink channels not exceed the maximum available power of the satellite;
constraint condition C2 indicates that the power value allocated to each downlink channel is a non-negative value;
the constraint condition C3 indicates that the data rate of each downlink user needs to be less than or equal to the data rate of its corresponding uplink user.
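Read together with the three constraint descriptions, the optimization problem could be written roughly as follows. This is a reconstruction from the surrounding text, not a transcription of the original figure: the set \mathcal{U}_s of scheduled downlink users used for averaging is an assumed symbol, and P_{tot} denotes the maximum available power of the satellite.

```latex
\begin{aligned}
\max_{P}\quad & \frac{1}{|\mathcal{U}_s|}\sum_{u_j \in \mathcal{U}_s} R_j \\
\text{s.t.}\quad
& \text{C1: } \sum_{k=1}^{K}\sum_{m=1}^{CH_{max}} p_{k,m} \le P_{tot}, \\
& \text{C2: } p_{k,m} \ge 0, \quad 1 \le k \le K,\ 1 \le m \le CH_{max}, \\
& \text{C3: } R_j \le R_i \quad \text{for every scheduled pair } u_i\text{-}u_j .
\end{aligned}
```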
Then, because of co-channel interference between beams, the objective is non-convex and constraint C3 defines a non-convex set, so the problem is non-convex. The objective function is rewritten in the special DC (difference of convex functions) structure, namely:
f(P) = g(P) - h(P) (12)
where g(P) and h(P) are both concave functions, expressed respectively as:
Figure BDA0003476528560000092
Figure BDA0003476528560000093
Constraint C3 is likewise expressed in DC form:
R_j = W(P_j) - V(P_j) (15)
where W(P_j) and V(P_j) are both concave functions, expressed respectively as:
Figure BDA0003476528560000094
Figure BDA0003476528560000095
Finally, a first-order Taylor expansion is applied to one of the two functions in each DC expression, which converts the original problem into a convex optimization problem; this is solved iteratively with the successive convex approximation (SCA) method to obtain the optimal downlink channel power allocation result.
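The linearization step can be written out as below. The text does not say which function of each DC pair is expanded, so the choice here (h in the objective, W in constraint C3) is an assumption, picked because it yields a concave objective and a convex constraint; P^{(t)} denotes the iterate from the previous SCA round.

```latex
% Concavity of h gives h(P) <= h(P^{(t)}) + \nabla h(P^{(t)})^{T}(P - P^{(t)}),
% so the surrogate below is a concave lower bound on f that is tight at P^{(t)}.
\hat{f}\big(P; P^{(t)}\big) = g(P) - h\big(P^{(t)}\big)
  - \nabla h\big(P^{(t)}\big)^{T}\big(P - P^{(t)}\big) \le f(P)

% Concavity of W gives W(P_j) <= its linearization, so the convex constraint
% below implies the original constraint R_j = W(P_j) - V(P_j) <= R_i.
W\big(P_j^{(t)}\big) + \nabla W\big(P_j^{(t)}\big)^{T}\big(P_j - P_j^{(t)}\big)
  - V(P_j) \le R_i

% One SCA round: maximize the surrogate subject to C1, C2 and the convexified C3.
P^{(t+1)} = \arg\max_{P}\ \hat{f}\big(P; P^{(t)}\big)
```

Each round solves a convex problem whose feasible set lies inside the original one, so the original objective does not decrease from one iterate to the next. In general such schemes are guaranteed to reach a stationary point rather than a global optimum.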
Step eight: after the transmission rate R_j of the downlink user is obtained, the delay of user pair u_i-u_j is calculated, the reward for the deep reinforcement learning network's decision is obtained from the delay, and the network is updated. The network then makes scheduling and uplink and downlink channel resource allocation decisions for the user pairs at the next time instant, followed by convex optimization of the downlink power allocation, and the cycle repeats until all user pairs have been scheduled and their transmissions completed.
The total frequency resources of the system, namely the channel resources, can be reused across all beams; frequency reuse here means that all CH_max channels are available in every beam. The power resources of the satellite are used for downlink user power allocation.
The resource allocation action and user selection decision made by deep reinforcement learning is
Figure BDA0003476528560000101
Based on the state vector s_t given at the current time, the network chooses which users are scheduled and which channels are allocated to them. Every channel in every beam can schedule a user and carry that user's data;
Taking into account factors such as the channel gains of the uplink and downlink users and the communication data volume, the invention intelligently selects the user pairs to be served in each time slot and allocates bandwidth and power resources to them. By jointly scheduling the uplink and downlink users and allocating their frequency and power resources appropriately, it maximizes the actual data transmission rate between user pairs and thereby minimizes the system delay. With user scheduling and frequency resource allocation fixed, the delay minimization problem reduces to maximizing the average transmission rate of the downlink users. For a user pair u_i-u_j to communicate, the channel and power resources used by the uplink user and by the downlink user must both be known; since the uplink user is the data sender and its transmit power is fixed and known, the invention optimizes the following: (1) scheduling selection (which users transmit data in the current time slot), (2) channel resource allocation for the scheduled uplink users, and (3) channel and power resource allocation for the scheduled downlink users, so as to minimize the delay. The deep reinforcement learning network performs the scheduling and the uplink and downlink channel resource allocation, and convex optimization performs the downlink power allocation.
After all unknowns have been solved, the actual transmission rate of each user pair is calculated, from which the delay of each user pair follows. The reward for the deep reinforcement learning network's decision is then obtained from the delay and used to update the network, so that the network makes better decisions subsequently. The network then makes scheduling and uplink and downlink channel resource allocation decisions at the next time instant, followed by convex optimization of the downlink power allocation, and the cycle repeats until all user pairs have been scheduled and their transmissions completed.
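The alternation between scheduling, power allocation and reward computation can be sketched as a simple loop. Everything here is a stand-in: `policy` replaces the deep reinforcement learning network, `allocate_power` replaces the convex power optimization, the network update is omitted, the propagation delay is left out, and the transmission delay is assumed to be data volume divided by pair rate.

```python
def run_episode(data_volume, policy, allocate_power, slot_length, max_slots=1000):
    """Toy version of the schedule -> power allocation -> reward cycle.

    data_volume: dict pair_id -> data volume D of that user pair.
    policy(unserved): hypothetical stand-in for the DRL network; returns
        the pair_ids to schedule in the current slot.
    allocate_power(scheduled): hypothetical stand-in for the convex
        optimization; returns dict pair_id -> achieved pair rate V (> 0).
    Returns (delays, rewards): per-pair delay without propagation delay,
    and the reward of each slot.
    """
    unserved = set(data_volume)
    delays, rewards = {}, []
    t = 0
    while unserved and t < max_slots:
        t += 1
        scheduled = policy(sorted(unserved))
        rates = allocate_power(scheduled)
        # Assumed transmission delay of each scheduled pair: D / V.
        tx = {p: data_volume[p] / rates[p] for p in scheduled}
        rewards.append(-min(tx.values()) if tx else -slot_length)
        for p, tx_delay in tx.items():
            # Waiting delay rho * (t - 1) plus transmission delay.
            delays[p] = slot_length * (t - 1) + tx_delay
        unserved -= set(scheduled)
        # A real implementation would update the network with the reward here.
    return delays, rewards
```

A quick way to exercise the loop is a policy that always takes the first two unserved pairs, `lambda unserved: unserved[:2]`, together with a constant-rate allocator such as `lambda scheduled: {p: 2.0 for p in scheduled}`.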
The invention applies to the end-to-end communication scenario of uplink and downlink users in a multi-beam satellite communication system. Within the satellite coverage area, users are randomly distributed in different beams, and data is transmitted between users in pairs. On the sending side, the system allocates frequency resources to each uplink user that has a data transmission request, and the user accesses the uplink with fixed transmit power to send data; on the receiving side, the system allocates both frequency and power resources to the corresponding receiving user, who receives the data over the downlink. To improve frequency utilization, the total frequency resources of the system can be reused across all beams, and the power resources of the satellite are used for downlink user power allocation.
Compared with three-color and four-color frequency reuse systems, users in a full-frequency-reuse multi-beam satellite system suffer more co-channel interference from other users; given the limited satellite resources and the strength of co-channel interference in the system, the scheduling of each user is constrained. In end-to-end communication between uplink and downlink users, the actual data transmission rate of each user pair is the minimum of the uplink and downlink rates and is affected by the channel states of both users, so the scheduling of user pairs is even more constrained. Taking into account factors such as the channel gains of the uplink and downlink users and the communication data volume, the invention intelligently selects the user pairs to be served in each time slot and allocates bandwidth and power resources to them; by jointly scheduling the uplink and downlink users and allocating their frequency and power resources appropriately, it maximizes the actual data transmission rate between user pairs and thereby minimizes the system delay.
As shown in Fig. 4, the PPO-based bandwidth-optimized power-optimized (PPO-BOPO) algorithm provided by the invention is compared with three other algorithms: bandwidth-average power-average (BAPA), bandwidth-average power-convex-optimization (BAPO), and PPO-based bandwidth-optimized power-average (PPO-BOPA). The results show that PPO-BOPO achieves the lowest system delay among the compared schemes, effectively reducing the delay of communication between uplink and downlink users in the multi-beam satellite system.
The PPO-based bandwidth-optimized power-optimized (PPO-BOPO) algorithm provided by the invention applies to the end-to-end communication scenario of uplink and downlink user pairs in a multi-beam satellite co-frequency networking system. By jointly considering the channel conditions and the communication data volume of the uplink and downlink users, it coordinates the scheduling of user pairs in each time slot and allocates frequency and power resources to them appropriately, so that the actual communication rate between user pairs is maximized and the system delay is reduced. The performance comparison and analysis show that the proposed algorithm effectively reduces the total time all users in the system need to complete communication and improves the overall delay performance of the satellite system.

Claims (5)

1. An end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite, characterized by comprising the following specific steps:
first, building an end-to-end communication scenario for uplink and downlink users in multi-beam satellite co-frequency networking;
for the data sent by uplink user u_i as the transmitting end, calculating the uplink channel capacity for transmission to the satellite receiving end through subchannel n in beam k
Figure FDA0003879576840000011
at the same time, for downlink user u_j in beam k, calculating the downlink channel capacity for transmission from the satellite to the user receiving end through subchannel m in the beam
Figure FDA0003879576840000012
then, during communication between user u_i and user u_j, calculating the actual achievable information transmission rate of user pair u_i-u_j from the respective uplink and downlink channel capacities, and calculating the total delay T_{i,j} spent by the user pair on communication;
the actual achievable information transmission rate of user pair u_i-u_j is calculated as follows:
first, the communication transmission rate R_i of user u_i is calculated as:
Figure FDA0003879576840000013
ψ_i is the set of subchannels occupied by user u_i;
then, the communication transmission rate R_j of user u_j is calculated as:
Figure FDA0003879576840000014
ψ_j is the set of subchannels occupied by user u_j;
finally, the actual rate V_{i,j} at which user u_i communicates with user u_j is the minimum of the two, namely:
V_{i,j} = min(R_i, R_j)
second, in each resource allocation time slot, inputting the environment state of the current time slot into a deep reinforcement learning network, scheduling the users that need to transmit data, and allocating uplink and downlink channel resources; with the scheduled user pairs u_i-u_j selected and the allocated uplink and downlink channel bandwidth resources fixed, using convex optimization to solve for the downlink channel power allocation that maximizes the average data transmission rate of the downlink users, so as to minimize the delay;
the specific process by which convex optimization solves the optimal downlink channel power allocation is as follows:
after the channel resources of the uplink user are fixed, the uplink transmit power is known to be the maximum transmit power of the user, so the rate R_i of the uplink user can be calculated; when allocating power resources to downlink users, a power allocation with R_j > R_i is not the optimal value that maximizes the total system capacity, so this case does not hold;
when R_j ≤ R_i, the power allocation process for the downlink users is as follows:
firstly, modeling the power optimization problem as maximizing the average data transmission rate of downlink users:
Pro:
Figure FDA0003879576840000015
Figure FDA0003879576840000016
Figure FDA0003879576840000017
Figure FDA0003879576840000018
constraint C1 requires that the total power allocated to all downlink channels not exceed the maximum available power P_tot of the satellite;
constraint C2 requires that the power allocated to each downlink channel be non-negative;
constraint C3 requires that the data rate of each downlink user be less than or equal to the data rate of its corresponding uplink user;
then, because co-channel interference between beams makes the problem non-convex, the objective function is rewritten in the special DC structure, namely:
f(P) = g(P) - h(P)
where g(P) and h(P) are both concave functions, expressed respectively as:
Figure FDA0003879576840000021
Figure FDA0003879576840000022
constraint C3 is likewise expressed in DC form:
R_j = W(P_j) - V(P_j)
where W(P_j) and V(P_j) are both concave functions, expressed respectively as:
Figure FDA0003879576840000023
Figure FDA0003879576840000024
finally, converting the original problem into a convex optimization problem by first-order Taylor expansion, and solving it iteratively with the successive convex approximation (SCA) method to obtain the optimal downlink channel power allocation result; after the transmission rate R_j of the downlink user is obtained, calculating the delay of user pair u_i-u_j, obtaining the reward for the deep reinforcement learning network's decision from the delay, updating the network, making scheduling and uplink and downlink channel resource allocation decisions for the user pairs at the next time instant, and then performing convex optimization of the downlink power allocation, until all user pairs have been scheduled and their transmissions completed.
2. The end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite according to claim 1, characterized in that the end-to-end communication scenario of the uplink and downlink users is as follows:
for a single multibeam GEO satellite comprising K beams, N_b users are randomly distributed in each beam; N_pair pairs of users transmit data under the satellite coverage, so the total number of users is 2N_pair; the set of users sending data is denoted U_t = {u_i | 1 ≤ i ≤ N_pair}, and the set of users receiving data is denoted U_r = {u_j | 1 ≤ j ≤ N_pair}.
3. The method according to claim 1, wherein the uplink channel capacity at the satellite receiving end is calculated by the following formula:
Figure FDA0003879576840000025
Figure FDA0003879576840000026
n = 1, 2, ..., CH_max; CH_max is the number of channel resource blocks into which the uplink or downlink bandwidth is equally divided, and B_c is the bandwidth of each channel;
Figure FDA0003879576840000027
is the signal-to-interference-plus-noise ratio received by the satellite antenna on subchannel n in beam k, calculated as:
Figure FDA0003879576840000031
p_{k,n} is the power with which user u_i sends data over subchannel n of beam k; g_{i→k,n} is the channel gain experienced by the data sent by user u_i over subchannel n in beam k; N_0 is the power spectral density of the Gaussian white noise; φ_{k,n} is the set of subchannels in beams other than beam k that share the frequency of subchannel n; u_γ is the user using subchannel γ ∈ φ_{k,n}; and p_γ is the transmit power of user u_γ on subchannel γ;
the downlink channel capacity at the user receiving end
Figure FDA0003879576840000032
is calculated as:
Figure FDA0003879576840000033
m = 1, 2, ..., CH_max;
Figure FDA0003879576840000034
is the signal-to-interference-plus-noise ratio received by downlink user u_j in satellite beam k on subchannel m, calculated as:
Figure FDA0003879576840000035
where p_{k,m} is the transmit power the satellite allocates to subchannel m of beam k; g_{k,m→j} is the channel gain of user u_j in satellite beam k on subchannel m during downlink transmission; φ_{k,m} is the set of channels in beams other than beam k that share the frequency of subchannel m; p_σ is the transmit power allocated to co-frequency channel σ; and b_σ is the number of the beam to which subchannel σ belongs.
4. The end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite according to claim 1, characterized in that the total delay T_{i,j} spent by user pair u_i-u_j on communication is calculated as:
Figure FDA0003879576840000036
where t is the current system time slot, ρ is the length of a single time slot, ρ(t-1) is the waiting delay when user pair u_i-u_j is scheduled at time t, d_h is the satellite altitude, s is the propagation speed of electromagnetic waves in vacuum,
Figure FDA0003879576840000037
is the propagation delay of the signal on the uplink and downlink, and D_{i,j} is the communication data volume of user pair u_i-u_j.
5. The end-to-end delay optimization method for uplink and downlink users of a multi-beam satellite according to claim 1, characterized in that the deep reinforcement learning network schedules users and allocates channel resources as follows:
first, for the scheduled user pairs u_i-u_j, the resource allocation process of the system is modeled as a Markov process (S, A, R), and scheduling and bandwidth resource allocation decisions are made for the user pairs over multiple time slots;
the states S, actions A and rewards R are defined as follows:
1) the state vector is: s_t = {U_t, L_t, D_t, W_t};
U_t is the set of unserved user pairs, L_t is the channel gain of each user with respect to each beam, D_t is the data size of each user pair, and W_t is the bandwidth resource occupancy matrix of the system;
2) for each time slot t, the action is defined as:
Figure FDA0003879576840000038
where
Figure FDA0003879576840000039
is the number of the user selected for service on uplink subchannel n in beam k, and
Figure FDA00038795768400000310
is the number of the user selected for service on downlink subchannel m in beam k; a value of 0 indicates that the channel is idle in this time slot and serves no user;
3) after each resource allocation, the environment computes a system reward from the current state, the action in the current state, and the next state: R_t = -t_min;
t_min is the minimum transmission delay required among the communicating user pairs after the current resource allocation;
when user pairs are successfully allocated channels for communication at the current time, t_min is calculated as:
Figure FDA0003879576840000041
D_{i,j} is the amount of data to be transmitted by user pair u_i-u_j at this time;
when no user pair is successfully allocated a channel for communication at the current time, t_min = ρ.
CN202210056077.5A 2022-01-18 2022-01-18 End-to-end time delay optimization method for uplink and downlink users of multi-beam satellite Active CN114499636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210056077.5A CN114499636B (en) 2022-01-18 2022-01-18 End-to-end time delay optimization method for uplink and downlink users of multi-beam satellite

Publications (2)

Publication Number Publication Date
CN114499636A CN114499636A (en) 2022-05-13
CN114499636B true CN114499636B (en) 2022-11-29

Family

ID=81471970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210056077.5A Active CN114499636B (en) 2022-01-18 2022-01-18 End-to-end time delay optimization method for uplink and downlink users of multi-beam satellite

Country Status (1)

Country Link
CN (1) CN114499636B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant