CN117596681A - Method and related device for downlink spectrum sharing and beam power dynamic allocation - Google Patents

Method and related device for downlink spectrum sharing and beam power dynamic allocation

Info

Publication number
CN117596681A
CN117596681A CN202311540998.XA CN202311540998A CN117596681A CN 117596681 A CN117596681 A CN 117596681A CN 202311540998 A CN202311540998 A CN 202311540998A CN 117596681 A CN117596681 A CN 117596681A
Authority
CN
China
Prior art keywords
leo
satellite
user
beam power
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311540998.XA
Other languages
Chinese (zh)
Inventor
徐静
樊思萌
赵中天
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202311540998.XA
Publication of CN117596681A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/046 Wireless resource allocation based on the type of the allocated resource the resource being in the space domain, e.g. beams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473 Wireless resource allocation based on the type of the allocated resource the resource being transmission power
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/535 Allocation or scheduling criteria for wireless resources based on resource usage policies
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Radio Relay Systems (AREA)

Abstract

The invention discloses a downlink spectrum sharing and beam power dynamic allocation method and a related device, which relate to the technical field of satellite communication. The method first acquires the data queue to be served for the low Earth orbit (LEO) satellite ground users, the channel state information of the geostationary Earth orbit (GEO) satellite ground users, and the matching relation between LEO ground users and LEO satellite beams; it then inputs the data queue, the channel information and the matching relation into a neural network model to obtain the LEO satellite beam power allocation result. During model training, the parameters of the neural network model are updated with the proximal policy optimization (PPO) algorithm; fast allocation of the beam power of the LEO satellite constellation is realized on a deep reinforcement learning framework, which ensures the accuracy and timeliness of the resource allocation result. The invention ensures that the LEO satellite communication system does not cause harmful interference to the GEO satellite system when sharing its spectrum, while maximizing the service quality of the LEO satellites and maintaining fairness among LEO ground users.

Description

Method and related device for downlink spectrum sharing and beam power dynamic allocation
Technical Field
The invention belongs to the field of communication, and particularly relates to a downlink spectrum sharing and beam power dynamic allocation method and a related device.
Background
The low Earth orbit (LEO) communication system features low transmission delay, low signal loss and low transmission cost, and brings new development opportunities to satellite communication. As more and more LEO satellite constellations are deployed, spectrum resources become increasingly strained. In a system where geostationary Earth orbit (GEO) satellites and LEO satellites coexist, letting LEO satellites share the GEO system spectrum is one way to address spectrum scarcity. However, spectrum sharing inevitably causes severe interference to GEO systems, and the interference level rises sharply as the size, number and heterogeneity of LEO constellations increase. Therefore, to improve the throughput of LEO systems under GEO interference constraints, an interference mitigation technique for spectrum sharing in GEO-LEO coexistence satellite systems is urgently needed.
To address the interference problem, researchers have investigated strategies based on isolation regions, in which interference is mitigated by determining appropriate exclusion angles and applying techniques such as beam switch-off, side-looking and progressive pitch. Wang et al., in "Coexistence Downlink Interference Analysis Between LEO System and GEO System in Ka Band", analyzed the impact of the isolation angle strategy in LEO and GEO coexistence systems. Hills et al., in "Feasibility of Using Beam Steering to Mitigate Ku-Band LEO-to-GEO Interference", proposed an improved strategy that mitigates interference by using beam steering of the LEO constellation to form exclusion angles of different sizes. In this prior art, the switch-off technique completely eliminates interference with the GEO system but sacrifices the spectral efficiency of the LEO system.
However, the above interference mitigation strategies do not adapt well to time-varying and increasingly complex wireless communication scenarios. Researchers have also proposed beam hopping (BH) techniques, adaptive power control (APC) techniques, and hybrid techniques to suppress interference and improve satellite system throughput. For interference with GEO satellite networks, P.-Y. Chen et al. proposed eight BH schemes in "Coordinative Spectrum Sharing for GEO and LEO Satellite Networks". For GEO and non-geostationary (NGEO) satellite systems, S. K. Sharma et al., in "Inline Interference Mitigation Techniques for Spectral Coexistence of GEO and NGEO Satellites", proposed an APC technique to mitigate in-line interference. Hybrid BH and APC techniques were studied for interference management in dual-satellite systems and LEO-GEO coexistence systems in "Cognitive beamhopping for spectral coexistence of multibeam satellites" and "A Novel Cognitive Satellite Network With GEO and LEO Broadband Systems in the Downlink Case", respectively; the latter considers maximizing the overall spectral efficiency when LEO is the primary communication system and GEO is the secondary one. In a GEO-LEO coexistence system, the more common assumption is that GEO is the primary communication system and LEO is the secondary communication system. Li et al., in "Optimal Beam Power Control for Co-Existing Multibeam GEO and LEO Satellite System", considered the case where the LEO satellite serves only one user and proposed an APC scheme that jointly optimizes GEO and LEO satellite multi-beam power to maximize LEO satellite throughput.
Jia et al., in "Multi-beam Power Control for LEO and GEO Spectrum-sharing Network", considered the case of multiple LEO ground users and proposed an APC scheme that jointly optimizes the multi-beam power of GEO and LEO satellites. In underlay mode, Y. Wang et al., in "A Novel Dynamic Spectrum-Sharing Method for GEO and LEO Satellite Networks", developed a spectrum-sensing-assisted spectrum sharing scheme to ensure that LEO satellites transmitting data do not cause harmful interference to the GEO system. U.S. Khan et al., in "Rate Splitting Multiple Access for Cognitive Radio GEO-LEO Co-Existing Satellite Networks", applied cognitive radio and rate-splitting multiple access to GEO and LEO coexistence systems while optimizing LEO satellite power and subcarrier allocation. Gu et al., in "Dynamic Cooperative Spectrum Sharing in a Multi-beam LEO-GEO Co-Existing Satellite System", proposed a flexible dynamic spectrum sharing scheme that takes into account beam power allocation and LEO ground user scheduling in the LEO satellite constellation. However, in GEO-LEO coexistence scenarios, these APC or BH schemes focus only on the transient interference that occurs when the LEO satellite is collinear with the GEO satellite and its ground users, rather than on the sustained harmful interference caused while the LEO satellite passes through the GEO beam coverage area. Therefore, it is necessary to study an APC interference mitigation scheme over consecutive time slots in the GEO-LEO coexistence system.
Resource allocation problems in satellite communication systems, such as beam power allocation, are often non-convex and NP-hard and involve many optimization parameters. In the above documents, the non-convex NP-hard problem is approximately converted into a convex problem and then solved iteratively, which cannot satisfy the timeliness required in satellite communication systems. LEO satellites have small payloads and move at high speed, so optimized parameters quickly become outdated, which makes real-time interference management more complex. In contrast, using a machine-learning-based approach to approximate the optimal solution of the non-convex problem is very promising. However, existing machine-learning-based resource allocation work is all conducted in a single satellite communication system, and little research addresses resource allocation and interference management, particularly beam power allocation, in heterogeneous satellite networks.
In summary, for multi-layer heterogeneous satellite systems, there are few studies on machine-learning-based beam power allocation and interference management. Existing beam power allocation schemes also do not consider long-term system throughput and focus only on the optimal solution for the current moment. The complexity of a long-term system increases with dynamic changes in the wireless environment and the random arrival of data. Therefore, it is necessary to explore a long-term interference mitigation technique based on deep reinforcement learning (DRL) to improve the long-term spectral efficiency of GEO and LEO coexistence systems.
Disclosure of Invention
The invention provides a downlink spectrum sharing and beam power dynamic allocation method and a related device, which are used for meeting the demands of spectrum sharing, interference suppression, fairness of ground users, spectrum efficiency improvement and the like of a GEO-LEO coexistence system, and rapidly realizing the allocation of the beam power of a low-orbit satellite while ensuring the service quality of high-orbit satellite communication so as to maximize the service quality of the low-orbit satellite and maintain the fairness among users. The invention is based on a deep reinforcement learning framework, can rapidly realize beam power distribution of the low-orbit satellite, and greatly improves timeliness of resource distribution.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the present invention provides a method for downlink spectrum sharing and beam power dynamic allocation, including the following steps:
constructing a maximized long-term weighted sum rate model under the beam power constraint and the interference threshold constraint for a communication scenario in which GEO and LEO satellites coexist;
constructing a deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem;
training the neural network in the deep reinforcement learning framework with the proximal policy optimization (PPO) algorithm;
and taking, as the state of each time slot, the data queue to be transmitted for the LEO satellite ground users, the channel information, and the matching relation between LEO ground users and LEO satellite beams, and inputting the states in sequence into the trained neural network model to obtain the LEO satellite beam power allocation scheme for each time slot.
In a second aspect, the present invention provides a downlink spectrum sharing and beam power dynamic allocation system, including:
a model building module, used for constructing a maximized long-term weighted sum rate model under the beam power constraint and the interference threshold constraint for a communication scenario in which GEO and LEO satellites coexist;
a framework construction module, used for constructing a deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem;
a model training module, used for training the neural network in the deep reinforcement learning framework with the proximal policy optimization (PPO) algorithm;
a power allocation scheme calculation module, used for taking, as the state of each time slot, the data queue to be transmitted for the LEO satellite ground users, the channel information, and the matching relation between LEO ground users and LEO satellite beams, and inputting the states in sequence into the trained neural network model to obtain the LEO satellite beam power allocation scheme for each time slot.
In a third aspect, the invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which when executed by a processor performs the steps of a method as described above.
Compared with the prior art, the invention has the following beneficial effects:
The scheme provided by the invention is based on a deep reinforcement learning framework. It maximizes the long-term weighted sum rate of the LEO satellite system while ensuring that the GEO satellite system does not suffer harmful interference, and it can make LEO satellite beam power allocation decisions in real time. Compared with the reference schemes, it greatly improves the timeliness of resource allocation, effectively improves the spectrum efficiency of the LEO satellite system, and reduces computational complexity.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the system of the present invention.
Fig. 3 is the deep reinforcement learning framework for the long-term weighted sum rate problem under the beam power constraint and the interference constraint of the GEO-LEO coexistence communication system according to the present invention.
Fig. 4 is a diagram of a training process of the neural network model of the present invention.
Fig. 5 shows the generalization performance of the present invention with respect to the parameter λ.
Fig. 6 is the cumulative distribution of the sum rate received by LEO ground users over 500 consecutive time slots for the inventive scheme and three comparison schemes, where (a) is λ=10 and (b) is λ=50.
Fig. 7 is a graph of the average LEO user queue backlog at 50 slots (λ=10) for the inventive scheme versus the three comparison schemes.
Fig. 8 is a graph of the average LEO user queue backlog at 50 slots (λ=50) for the inventive scheme versus the three comparison schemes.
Fig. 9 is a graph comparing the average sum rate obtained by the present invention and the three comparison schemes at different λ values.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the embodiments of the present invention, it should be noted that, if the terms "upper," "lower," "horizontal," "inner," and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Furthermore, the term "horizontal" if present does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" should be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
Explanation of terms:
The low Earth orbit (LEO) satellite communication system aims to maximize the long-term weighted sum rate and the inter-user fairness of LEO ground users, and a deep reinforcement learning framework is used to realize fast allocation of the beam power of the LEO satellite constellation. The invention ensures that the geostationary Earth orbit (GEO) satellite communication system does not suffer harmful interference and that the LEO satellites communicate normally in the collinear vicinity.
Collinear vicinity: in a communication system in which GEO and LEO satellites coexist, the GEO satellite serves fixed-position GEO ground users within its beam coverage in real time, while the LEO satellites move periodically along their orbits. As a LEO satellite approaches the line connecting the GEO satellite and its ground user, the interference suffered by the GEO ground user gradually increases; when the interference exceeds the threshold, the LEO satellite communication system causes harmful interference to the GEO satellite communication system. The region in which such harmful interference occurs is defined as the collinear vicinity; when a LEO satellite moves into the collinear vicinity, its communication seriously degrades the service quality of the GEO satellite.
The invention is described in further detail below with reference to the attached drawing figures:
Referring to fig. 1, the embodiment of the invention discloses a method for downlink spectrum sharing and LEO satellite resource allocation in a coexistence system, which comprises the following steps:
S1. Construct a maximized long-term weighted sum rate model under the beam power constraint and the interference threshold constraint for a communication scenario in which GEO and LEO satellites coexist. Specifically:
The maximized long-term weighted sum rate model is as follows:
where T represents the number of time slots during which the LEO satellite constellation passes through the collinear vicinity; the optimization variable is the LEO beam power allocation scheme over the T slots; P(t) represents the LEO beam power allocation scheme for time slot t; the model also uses the number of LEO ground users and the number of GEO ground users; N_S represents the number of LEO satellites in the LEO satellite constellation; N_B represents the number of beams per LEO satellite; t represents a time slot; k represents the index of the LEO ground user; Q_k(t) represents the virtual data queue that the k-th LEO ground user needs to transmit; C_k(t) represents the channel capacity of the k-th LEO ground user; R_k(t) represents the actual transmission rate received by the k-th LEO ground user, R_k(t) = min(Q_k(t), C_k(t)); an interference factor describes the interference of the LEO system on the GEO system; g_{i,j,g}(t) represents the channel gain from the j-th beam of the i-th LEO satellite to the g-th GEO ground user; P_{i,j}(t) represents the power of the j-th beam of the i-th LEO satellite; I_th represents the tolerable interference threshold of the GEO system; m_{i,j,k}(t) represents the matching relation between the j-th beam of the i-th LEO satellite and the k-th LEO ground user; and the beam power of each LEO satellite is limited by a maximum beam power.
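A plausible form of the model, reconstructed only from the symbol definitions above, can be sketched as follows. The queue-weighted objective, the symbol $\mathcal{P}$ for the allocation scheme over all slots, $K$ for the number of LEO ground users, and $P_{\max}$ for the maximum beam power are assumptions introduced for illustration and are not confirmed notation of the patent.

```latex
\begin{aligned}
\max_{\mathcal{P}} \quad
  & \frac{1}{T}\sum_{t=1}^{T}\sum_{k=1}^{K} Q_k(t)\,R_k(t) \\
\text{s.t.} \quad
  & \sum_{i=1}^{N_S}\sum_{j=1}^{N_B} g_{i,j,g}(t)\,P_{i,j}(t) \le I_{th},
    \quad \forall g,\ \forall t, \\
  & 0 \le P_{i,j}(t) \le P_{\max}, \quad \forall i,\ j,\ t, \\
  & R_k(t) = \min\bigl(Q_k(t),\, C_k(t)\bigr), \quad \forall k,\ t .
\end{aligned}
```

Under this reading, users with longer queues receive larger weights, which is one way the fairness goal stated in the abstract could enter the objective.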
The queuing dynamics of the virtual data queue Q_k(t) satisfy the following condition:
where S_k(t) represents the transmission rate that can be provided to the user in time slot t; A_k(t) represents the data that randomly arrives for the k-th LEO ground user in time slot t, and A_k(t) follows a Poisson distribution with parameter λ.
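The queue dynamics can be illustrated with a short simulation. This is a minimal sketch that assumes the standard virtual-queue update Q_k(t+1) = max(Q_k(t) - S_k(t), 0) + A_k(t); the exact update used by the patent is not shown in this text, and the function names `sample_poisson` and `update_queue` are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def update_queue(q: float, s: float, a: float) -> float:
    """Assumed update: Q_k(t+1) = max(Q_k(t) - S_k(t), 0) + A_k(t)."""
    return max(q - s, 0.0) + a


# Simulate one user's backlog for a few slots with lambda = 10.
rng = random.Random(0)
backlog = 5.0
for _ in range(3):
    arrivals = sample_poisson(10.0, rng)
    backlog = update_queue(backlog, s=8.0, a=arrivals)
```

The `max(..., 0)` term keeps the backlog non-negative when the offered rate exceeds the queued data.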
S2. Construct a deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem. Specifically:
Agent: the agent corresponds to the network management center (NMC).
Environment: the environment comprises all factors in the wireless communication environment.
State: the state s_t is the input of the neural network and the basis on which the agent acts. The data queue q(t) of the LEO ground users in time slot t, the channel information, and the matching relation between LEO satellite beams and LEO ground users are modeled as the state:
The data queue q(t) of the LEO ground users is expressed as:
The channel information is expressed as:
where the channel information includes the channel gain vector vec(H_i(t))^T from the i-th LEO satellite to the LEO ground users, with vec(·)^T denoting the matrix vectorization function that converts a matrix into a row vector; the interference channel gain vector vec(G_i(t))^T from the i-th LEO satellite to the GEO ground users; and the interference channel gain vector g_L(t) from the GEO satellite to the LEO ground users.
H_i(t) is expressed as follows:
where h_{i,j,k}(t) represents the channel gain from the j-th beam of the i-th LEO satellite to the k-th LEO ground user. G_i(t) is expressed as follows:
where g_{i,j,g}(t) represents the channel gain from the j-th beam of the i-th LEO satellite to the g-th GEO ground user. The interference channel gain vector g_L(t) is expressed as follows:
A sign function is used to indicate the matching relation between the j-th beam of the i-th LEO satellite and the k-th LEO ground user, where h_th represents a channel gain threshold: m_{i,j,k} = 1 means that the j-th beam of the i-th LEO satellite covers and serves the k-th LEO ground user; otherwise, m_{i,j,k} = 0 means that the main lobe of the LEO beam does not cover that ground user. Thus, vec(M_i(t))^T represents the matching relation between the beams of the i-th LEO satellite and the LEO ground users, where M_i(t) is expressed as follows:
Thus, the matching relation between the beams of the LEO satellite constellation and the LEO ground users is expressed as follows:
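The state construction described above can be sketched in a few lines. This is an illustrative sketch only: the comparison `gain >= h_th`, the row-major flattening order, the ordering of the concatenated parts, and the names `matching_matrix`, `vec_row` and `build_state` are all assumptions, since the defining equations are not shown in this text.

```python
from typing import List

Matrix = List[List[float]]


def matching_matrix(h_i: Matrix, h_th: float) -> List[List[int]]:
    """m[j][k] = 1 if beam j's gain to user k reaches h_th, else 0 (assumed rule)."""
    return [[1 if gain >= h_th else 0 for gain in row] for row in h_i]


def vec_row(matrix) -> list:
    """Flatten a matrix into a row vector (row-major order is an assumption)."""
    return [x for row in matrix for x in row]


def build_state(q, H, G, g_L, h_th) -> list:
    """Concatenate queue, channel information and matching into one state vector."""
    state = list(q)
    for h_i, g_i in zip(H, G):  # one (H_i, G_i) pair per LEO satellite
        state += vec_row(h_i) + vec_row(g_i)
    state += list(g_L)
    for h_i in H:
        state += vec_row(matching_matrix(h_i, h_th))
    return state


# Toy scenario: 1 LEO satellite, 2 beams, 2 LEO users, 1 GEO user.
H = [[[0.9, 0.1], [0.2, 0.8]]]
G = [[[0.05], [0.02]]]
state = build_state(q=[4.0, 7.0], H=H, G=G, g_L=[0.01, 0.03], h_th=0.5)
```

In the toy scenario each beam exceeds the threshold for exactly one user, so the matching part of the state is the flattened identity matrix.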
Action: the action a_t is the output of the neural network, and the agent changes the environment by executing actions. The action is modeled as the LEO beam power allocation vector, as follows:
where, in the current state s_t, after action a_t is selected, the environment returns a reward and the state changes from s_t to s_{t+1}.
Reward: the reward r_t is the feedback that the environment gives the agent after the agent observes state s_t and outputs action a_t. After taking action a_t, the agent needs to know whether the action meets or approaches the optimization objective; the reward r_t, as the environment's feedback on a_t, is the index for evaluating the quality of a_t. A negative reward is given for actions that do not satisfy the beam power constraint or the interference constraint. The reward r_t is expressed as follows:
where c_1 represents a preset weight; c_2 and c_3 respectively represent the penalty factor for violating the interference constraint and the penalty factor for violating the beam power constraint; I_max(t) and I_th respectively represent the maximum interference received by the GEO ground users in time slot t and the interference threshold; n_p(t) represents the number of beams in action a_t that do not satisfy the beam power constraint; R_k(t) represents the actual transmission rate received by the k-th LEO ground user; and Q_k(t) represents the data queue that the k-th LEO ground user needs to transmit.
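A reward of this shape can be sketched as follows. The exact expression is not shown in this text, so the combination below (a c_1-weighted sum of Q_k(t)·R_k(t), a flat c_2 penalty when I_max(t) exceeds I_th, and a c_3 penalty per violating beam) is an assumption, as are the function name `reward` and the default values.

```python
def reward(q, r, i_max, i_th, n_p, c1=1.0, c2=10.0, c3=1.0):
    """Assumed reward: weighted sum rate minus constraint penalties."""
    weighted_sum_rate = sum(qk * rk for qk, rk in zip(q, r))
    # Flat penalty when the worst-case GEO interference exceeds the threshold.
    interference_penalty = c2 if i_max > i_th else 0.0
    # Penalty proportional to the number of beams violating the power limits.
    power_penalty = c3 * n_p
    return c1 * weighted_sum_rate - interference_penalty - power_penalty


feasible = reward(q=[2.0, 1.0], r=[3.0, 4.0], i_max=0.5, i_th=1.0, n_p=0)
infeasible = reward(q=[2.0, 1.0], r=[3.0, 4.0], i_max=2.0, i_th=1.0, n_p=2)
```

With these toy numbers the feasible action earns a positive reward and the infeasible one a negative reward, matching the description that constraint violations are punished.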
The channel capacity C_k(t) of the k-th LEO ground user in time slot t is as follows:
where b_L represents the LEO satellite beam bandwidth and ψ_k(t) represents the received signal-to-noise ratio of the k-th LEO ground user;
The received signal-to-noise ratio ψ_k(t) of the k-th LEO ground user is as follows:
where, in time slot t, x_{i,j,k}(t) indicates whether the j-th beam of the i-th LEO satellite covers and serves the k-th LEO ground user: x_{i,j,k}(t) = 1 represents coverage and service, and x_{i,j,k}(t) = 0 represents no coverage; h_{i,j,k}(t) represents the channel gain from the j-th beam of the i-th LEO satellite to the k-th LEO ground user; g_k(t) represents the channel gain from the beam of the GEO satellite to the k-th LEO ground user; and the remaining term is the noise power.
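The rate computation can be sketched under explicit assumptions. The sketch uses the Shannon form C_k = b_L·log2(1 + ψ_k) and treats non-serving LEO beams and the GEO beam as interference; both choices, the GEO transmit power argument `p_geo`, and the function names are assumptions, because the patent's expressions are not shown in this text.

```python
import math


def sinr(x_k, h_k, p, g_k, p_geo, noise):
    """Assumed ratio: serving-beam power over LEO + GEO interference + noise.

    x_k, h_k and p are indexed [satellite][beam] for one fixed user k.
    """
    signal = 0.0
    leo_interference = 0.0
    for x_row, h_row, p_row in zip(x_k, h_k, p):
        for x, h, power in zip(x_row, h_row, p_row):
            if x == 1:
                signal += h * power
            else:
                leo_interference += h * power
    return signal / (leo_interference + g_k * p_geo + noise)


def capacity(b_l, psi):
    """Shannon capacity b_L * log2(1 + psi) (assumed form)."""
    return b_l * math.log2(1.0 + psi)


def actual_rate(q_k, c_k):
    """R_k(t) = min(Q_k(t), C_k(t)), as defined in the model."""
    return min(q_k, c_k)


psi = sinr(x_k=[[1, 0]], h_k=[[2.0, 1.0]], p=[[3.0, 1.0]],
           g_k=0.5, p_geo=2.0, noise=1.0)
```

The `actual_rate` helper is the only part taken directly from the source: a user cannot receive more data than its queue holds.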
The satellite channel model h is as follows:
where G_T, G_R, L and A_R are the antenna gain of the satellite (transmitting end), the antenna gain of the end user (receiving end), the free-space loss and the rain fade, respectively. The antenna gain at the transmitting end or the receiving end can be expressed as:
where G_0 is the maximum antenna gain, obtained when the off-axis angle is 0. J_1(v) and J_3(v) represent the first-order and third-order Bessel functions, respectively. Their argument depends on the off-axis angle, which is determined by the antenna directions and positions of the satellite and the end user, and on the off-axis angle corresponding to the 3 dB beamwidth. Since the LEO user antenna continuously tracks the LEO satellite, the off-axis angle at the receiving end is 0 and G_R = G_0. The free-space loss is L = (4πd)^2 f^2 / c^2, where d, f and c represent the distance between the satellite and the ground user, the frequency, and the speed of light, respectively. The rain fade is A_R = ζ^{-1/2}, where ζ represents the amplitude of the rain fade; A_R follows a log-normal distribution.
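The gain and loss terms can be checked numerically with a small sketch. The free-space loss follows the formula stated above; the antenna pattern G_0·[J_1(u)/(2u) + 36·J_3(u)/u^3]^2 with u = 2.07123·sin(θ)/sin(θ_3dB) is the pattern commonly used for multibeam satellite antennas and is assumed here, since the patent's own expression is not shown in this text. The function names are hypothetical.

```python
import math


def bessel_j(n: int, x: float, terms: int = 40) -> float:
    """Power-series J_n(x); adequate only for the small |x| near the main lobe."""
    return sum(
        (-1) ** m / (math.factorial(m) * math.factorial(m + n))
        * (x / 2) ** (2 * m + n)
        for m in range(terms)
    )


def antenna_gain(g0: float, theta: float, theta_3db: float) -> float:
    """Assumed multibeam pattern; theta is the off-axis angle in radians."""
    u = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(u) < 1e-9:
        return g0  # limit of the bracket is 1 at boresight
    shape = bessel_j(1, u) / (2 * u) + 36 * bessel_j(3, u) / u ** 3
    return g0 * shape ** 2


def free_space_loss(d: float, f: float, c: float = 299_792_458.0) -> float:
    """L = (4*pi*d)^2 * f^2 / c^2, with d in metres and f in hertz."""
    return (4 * math.pi * d) ** 2 * f ** 2 / c ** 2


boresight = antenna_gain(1.0, 0.0, 0.02)
half_power = antenna_gain(1.0, 0.02, 0.02)
```

At the 3 dB angle the assumed pattern returns roughly half of G_0, which is the property that motivates the constant 2.07123.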
S3. Train the neural network in the deep reinforcement learning framework with the proximal policy optimization (PPO) algorithm. Specifically:
Step 1: initialize the relevant parameters of the satellite communication system and the deep reinforcement learning framework.
Step 2: the agent interacts with the environment and stores batch_size transitions [s_t, a_t, r_t, s_{t+1}], where batch_size represents the size of a batch. Specifically, in time slot t, the agent observes the environment to obtain the current state s_t and inputs s_t into the actor network and the critic network. The actor network outputs the mean of a normal distribution over beam actions; a normal distribution is built from this mean and sampled to obtain the action a_t. The agent executes a_t, computes the reward r_t, and obtains the state s_{t+1} of the next time slot. In this way, batch_size transitions [s_t, a_t, r_t, s_{t+1}] are stored.
Step 3: train the actor and critic neural networks.
Step 4: repeat Step 2 and Step 3 until the set number of iterations is reached or the loss function converges, at which point the training of the actor and critic networks ends.
S3-1 extraction of batch_size stripe data [ S ] t ,a t ,r t ,s t+1 ]The dominance estimation function is calculated according to the following equation:
$A_t = -V(s_t) + r_t + \gamma r_{t+1} + \dots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T), \quad t \in \{1, 2, \dots, T\}$
where $V(s_t)$ is the value output by the critic network when the state $s_t$ is input, and $\gamma$ is the discount factor;
S3-2, after obtaining the advantage estimate, calculate the loss functions of the actor network and the critic network:
where $L_{actor}(\theta)$ represents the loss function of the actor network, $\theta$ represents the parameters of the actor network, $p_\theta(a_t|s_t)$ represents the probability of selecting action $a_t$ in state $s_t$, $\epsilon$ is the parameter controlling the upper and lower bounds of the clip function, and the averaging operator represents the mean over a batch of data;
where the critic loss function is defined over the parameters of the critic network, and $r_i$ represents the reward for time slot i.
S3-3, update the parameters $\theta$ of the actor network and the parameters of the critic network using the Adam optimizer, then assign $\theta_{old} \leftarrow \theta$, and likewise for the critic parameters.
S3-4, repeat S3-2 and S3-3 a plurality of times.
S4, inputting the data queues that the LEO satellites need to transmit, the channel information, and the matching relation between LEO ground users and LEO satellite beams for a plurality of time slots into the trained actor network model to obtain the beam power allocation scheme for each time slot.
As shown in Fig. 2, an embodiment of the present invention provides a downlink spectrum sharing and beam power dynamic allocation system, comprising:
a model building module, configured to build a model that maximizes the long-term weighted sum rate under the beam power constraint and the interference threshold constraint in a communication scenario where high-orbit (GEO) and low-orbit (LEO) satellites coexist;
a framework construction module, configured to construct a deep reinforcement learning framework for solving the long-term weighted sum rate maximization problem;
a model training module, configured to train the neural network in the deep reinforcement learning framework using the proximal policy optimization (PPO) algorithm;
a power allocation scheme calculation module, configured to treat the data queues that the LEO satellites need to transmit, the channel information, and the matching relation between LEO ground users and LEO satellite beams in each time slot as states, and to input the states sequentially into the trained neural network model to obtain the LEO satellite beam power allocation scheme for each time slot.
The principle of the invention is as follows:
A neural network model is used to allocate beam power rapidly in real time, with all factors in the satellite communication system modeled as the environment. The network management center that manages the satellites is regarded as the agent, and the agent produces the beam power allocation scheme. During the interaction between the agent and the environment, the environment first collects the data queues to be transmitted to the LEO ground users, the channel state information, and the matching relation between LEO ground users and LEO satellite beams. All collected information is input as the state into the neural network module of the agent to obtain the beam power allocation result. The environment then computes the reward for this allocation scheme and collects the state of the next time slot, and the cycle repeats. The resource allocation model is trained on different channel samples and randomly arriving data, and the model parameters are updated with the proximal policy optimization algorithm during training.
In the deep reinforcement learning model, the management center that manages the satellite network is regarded as the agent, and the data queues that the LEO satellites need to transmit in the current time slot, the channel information, and the matching relation between LEO ground users and LEO satellite beams are jointly modeled as the state. When making a decision, the agent considers not only the current channel conditions but also each user's queue of data awaiting transmission, which links the beam power allocations of multiple time slots and resolves the coupling across time slots in the long-term weighted sum rate problem. The beam power allocation scheme of the LEO satellite group is regarded as the action, and the output of the neural network is a row vector of dimension $1 \times N_S N_B$, where $N_S$ is the number of LEO satellites in the LEO satellite group and $N_B$ is the number of beams of one LEO satellite. The reward obtained after taking an action in each time slot is defined as the weighted sum rate of the current time slot plus penalty terms. The penalty consists of two parts: a negative reward is given for actions that violate the beam power constraint and for actions that violate the interference threshold constraint, respectively.
The framework adopts the proximal policy optimization (PPO) algorithm. The network outputs the mean of each dimension of the action, a multivariate normal distribution is built from these means, and the action to be executed is sampled from that distribution, which handles the continuous action space.
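The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes a diagonal normal distribution with a single fixed standard deviation `std` (the text only says that the network outputs the means), and the function names are hypothetical.

```python
import math
import random

def sample_action(means, std, rng):
    """Draw one beam power action from a diagonal normal distribution.

    `means` stands for the actor output (one mean per LEO beam); `std` is an
    assumed fixed standard deviation, which the source does not specify.
    """
    return [rng.gauss(mu, std) for mu in means]

def log_prob(action, means, std):
    """Log-density of `action` under the same diagonal normal distribution."""
    return sum(
        -0.5 * ((a - mu) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)
        for a, mu in zip(action, means)
    )

# Hypothetical usage: 4 satellites x 7 beams gives a 28-dimensional action.
rng = random.Random(0)
action = sample_action([1.0] * 28, 0.1, rng)
```

The log-probability is what a PPO update later needs in order to form the ratio between the new and the old policy.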
T represents the length of a trajectory. Since the long-term weighted sum rate problem can be split into the sum of the weighted sum rates of multiple time slots, the power allocation over T consecutive time slots is treated as one trajectory in network training. At the initial moment of training, the initial position of the LEO satellite group, the position of the GEO satellite, and the positions of all users are randomly generated. Within one trajectory, the GEO satellite, its ground users, and the LEO ground users are fixed, and the LEO satellite group passes through the beam coverage area of the GEO satellite at uniform speed. In any time slot of a trajectory, the amount of data randomly arriving for each LEO ground user follows a Poisson distribution with parameter λ.
During training, the sample data of different trajectories differ in the positions of the LEO ground users and in the amount of data that randomly arrives for the LEO ground users in each time slot. The LEO user positions of different trajectories are randomly generated. After extensive training, the network learns the mapping from states to actions; the resulting model solves the long-term weighted sum rate maximization problem under the beam power constraint and the interference constraint in the GEO-LEO coexistence scenario well, and generalizes well to different user data transmission demands.
The specific steps are as follows:
First step: model the long-term weighted sum rate maximization problem under the beam power constraint and the interference threshold constraint in the GEO-LEO coexistence communication scenario.
In the GEO and LEO satellite coexistence system, the GEO satellite communication system is the primary communication system, and the LEO satellite communication system is the secondary communication system that shares the frequency band of the GEO system. LEO satellites can interfere with the GEO communication system when traversing the GEO satellite beam coverage area, particularly when they pass close to the line between the GEO satellite and its ground users. In the primary-secondary communication system model, the secondary system must keep its interference to the primary system below a given threshold when transmitting.
The present invention considers the following scenario: within the coverage of a single-beam GEO satellite, the GEO satellite simultaneously serves multiple GEO ground users, while $N_S$ multi-beam LEO satellites cooperatively serve multiple LEO ground users. $N_S$ represents the number of LEO satellites in the LEO satellite constellation, and $N_B$ represents the number of beams per LEO satellite.
At time slot t, $h_{i,j,k}(t)$ represents the channel gain from the j-th beam of the i-th LEO satellite to the k-th LEO ground user, and $g_{i,j,g}(t)$ represents the channel gain from the j-th beam of the i-th LEO satellite to the g-th GEO ground user. Similarly, the channel gain from the beam of the GEO satellite to the k-th LEO ground user is denoted $g_k(t)$. Specifically, the channel between a satellite and a ground user is modeled as follows:
where $G_T$ and $G_R$ are the antenna gains of the satellite (transmitting end) and the end user (receiving end), respectively, $L$ is the free-space loss, and $A_R$ is the rain fade. The antenna gain at the transmitting end or the receiving end can be expressed as:
where $G_0$ is the maximum antenna gain, attained when the off-axis angle is 0, and $J_1(\cdot)$ and $J_3(\cdot)$ represent the first-order and third-order Bessel functions, respectively. Their argument depends on the off-axis angle, which is determined by the antenna direction and the positions of the satellite and the end user, and on the off-axis angle corresponding to the 3 dB beamwidth. Since the LEO user antenna continuously tracks the LEO satellite, the off-axis angle at the receiving end is 0 and $G_R = G_0$. The free-space loss is $L = (4\pi d)^2 f^2 / c^2$, where $d$, $f$, and $c$ represent the distance between the satellite and the ground user, the frequency, and the speed of light, respectively. The rain fade is $A_R = \zeta^{-1/2}$, where $\zeta$ represents the amplitude of the rain fade, and $A_R$ follows a log-normal distribution.
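To make the channel terms concrete, the following Python sketch evaluates an antenna gain pattern and the free-space loss. The antenna gain equation itself did not survive in the text, so the pattern used here, $G = G_0\,[J_1(v)/(2v) + 36 J_3(v)/v^3]^2$ with $v = 2.07123 \sin\theta / \sin\theta_{3dB}$, is an assumption: it is a pattern commonly used for multibeam satellite antennas and is consistent with the Bessel functions named above. The function names are hypothetical.

```python
import math

def bessel_j(n, x, terms=40):
    """Bessel function of the first kind J_n(x) via its power series.

    Adequate for the moderate arguments used here; not a general-purpose routine.
    """
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + n))
        total += coeff * (x / 2.0) ** (2 * m + n)
    return total

def antenna_gain(theta, theta_3db, g0):
    """Assumed gain pattern G0 * (J1(v)/(2v) + 36*J3(v)/v^3)^2 (angles in radians)."""
    v = 2.07123 * math.sin(theta) / math.sin(theta_3db)
    if abs(v) < 1e-9:
        return g0  # limit for v -> 0: 1/4 + 3/4 = 1, so the gain equals G0
    return g0 * (bessel_j(1, v) / (2.0 * v) + 36.0 * bessel_j(3, v) / v ** 3) ** 2

def free_space_loss(d, f, c=299_792_458.0):
    """Free-space loss L = (4*pi*d)^2 * f^2 / c^2 on a linear scale, as in the text."""
    return (4.0 * math.pi * d) ** 2 * f ** 2 / c ** 2
```

Under this assumed pattern the gain falls to about half of $G_0$ at the 3 dB off-axis angle, which matches the definition of that angle, and it equals $G_0$ on boresight, in line with $G_R = G_0$ for a tracking receiver.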
In the LEO satellite communication system, the signal-to-interference-plus-noise ratio $\psi_k(t)$ received by the k-th LEO ground user at time slot t is expressed as follows:
where $x_{i,j,k}(t)$ indicates whether the j-th beam of the i-th LEO satellite covers and serves the user, with $x_{i,j,k}(t)=1$ representing coverage and service and $x_{i,j,k}(t)=0$ representing no coverage. The expression also contains an interference factor acting on the LEO system and the noise power received by the ground user. $P_{i,j}(t)$ represents the power of the j-th beam of the i-th LEO satellite; $P_G(t)$ represents the beam power of the GEO satellite, which is assumed here to be fixed at its maximum beam power.
Thus, at time slot t, the channel capacity $C_k(t)$ of the k-th LEO ground user can be expressed as:
$C_k(t) = b_L \log_2\left(1 + \psi_k(t)\right)$
where $b_L$ represents the LEO satellite beam bandwidth.
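As a rough illustration of how the SINR and capacity fit together, the sketch below computes both for one LEO ground user. The exact SINR expression is not preserved in the text, so the form used here (power of the serving LEO beams over scaled GEO interference plus noise) is an assumption, as are the function names and the parameter `eta`, which stands in for the interference factor on the LEO system.

```python
import math

def sinr(p_leo, h, x, k, p_geo, g_k, noise_power, eta=1.0):
    """Assumed SINR of LEO user k.

    p_leo[i][j]  : power of beam j of LEO satellite i
    h[i][j][k]   : channel gain from that beam to user k
    x[i][j][k]   : 1 if the beam covers and serves user k, else 0
    p_geo, g_k   : GEO beam power and GEO-to-user-k channel gain
    eta          : hypothetical interference factor on the LEO system
    """
    signal = sum(
        x[i][j][k] * p_leo[i][j] * h[i][j][k]
        for i in range(len(p_leo))
        for j in range(len(p_leo[i]))
    )
    return signal / (eta * p_geo * g_k + noise_power)

def channel_capacity(b_l, psi):
    """Shannon capacity b_L * log2(1 + psi) for beam bandwidth b_L in Hz."""
    return b_l * math.log2(1.0 + psi)

# One satellite, one beam, one user with made-up numbers.
psi = sinr([[6.0]], [[[0.5]]], [[[1]]], 0, p_geo=1.0, g_k=0.75, noise_power=0.25)
capacity = channel_capacity(1e6, psi)
```

In this toy case the SINR is 3, so a 1 MHz beam yields a capacity of 2 Mbit/s.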
The interference that a GEO ground user receives from the LEO system can be expressed as:
where the expression contains an interference factor of the LEO system on the GEO system. This interference should satisfy the following constraint:
where $I_{th}$ represents the interference threshold that the GEO system can tolerate.
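A feasibility check for this constraint might look like the following sketch. The interference expression is not preserved in the text, so the sum of LEO beam powers weighted by the channel gains $g_{i,j,g}(t)$ and scaled by a factor `eta` is an assumed form; the function names and `eta` are hypothetical.

```python
def geo_interference(p_leo, g, geo_user, eta=1.0):
    """Assumed aggregate interference at one GEO ground user.

    p_leo[i][j]       : power of beam j of LEO satellite i
    g[i][j][geo_user] : channel gain from that beam to the GEO user
    eta               : hypothetical LEO-to-GEO interference factor
    """
    return eta * sum(
        p_leo[i][j] * g[i][j][geo_user]
        for i in range(len(p_leo))
        for j in range(len(p_leo[i]))
    )

def interference_feasible(p_leo, g, num_geo_users, i_th, eta=1.0):
    """True if every GEO ground user stays at or below the threshold I_th."""
    return all(
        geo_interference(p_leo, g, u, eta) <= i_th for u in range(num_geo_users)
    )
```

For example, with two beams of power 2 and 1 and gains 0.25 and 0.5 toward a single GEO user, the aggregate interference is 1.0, which is feasible for a threshold of 1.0 and infeasible for 0.9.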
Simply maximizing the sum rate received by the ground users in the LEO communication system can lead to unfairness among the served users. Therefore, to reflect inter-user fairness, the present invention introduces a virtual data queue $Q_k(t)$ for each LEO ground user in the optimization problem, whose queuing dynamics satisfy the following condition:
where $S_k(t)$ represents the transmission rate that can be provided to the user during time slot t, and $A_k(t)$ represents the data rate that randomly arrives for the user at time slot t; $A_k(t)$ follows a Poisson distribution with parameter λ.
The invention considers the period during which the LEO satellite group passes near the line between the GEO satellite and its ground users and causes interference, and dynamically allocates the LEO satellite beam power over this period. The goal is to ensure that the GEO ground users always meet the interference requirement during this period while maximizing the data transmission rate of the LEO satellites. The long-term weighted sum rate maximization problem under the beam power constraint and the interference constraint in the GEO-LEO coexistence scenario is therefore modeled as follows:
where T represents the number of time slots during which the LEO satellite group passes through the near-in-line region; the optimization variable is the LEO beam power allocation scheme over the T time slots; P(t) represents the LEO beam power allocation scheme for time slot t; t represents a time slot; k represents the index of the LEO ground user; $Q_k(t)$ represents the virtual data queue that the k-th LEO ground user needs to transmit; $C_k(t)$ represents the channel capacity of the k-th LEO ground user; $R_k(t)$ represents the actual transmission rate received by the k-th LEO ground user, $R_k(t) = \min(Q_k(t), C_k(t))$; the interference factor of the LEO system on the GEO system also appears in the constraints; $g_{i,j,g}(t)$ represents the channel gain from the j-th beam of the i-th LEO satellite to the g-th GEO ground user; $P_{i,j}(t)$ represents the power of the j-th beam of the i-th LEO satellite; $m_{i,j,k}(t)$ represents the matching relation between the j-th beam of the i-th satellite and the k-th LEO ground user; and $I_{th}$ represents the interference threshold that the GEO system can tolerate.
Second step: construct a deep reinforcement learning framework that solves the long-term weighted sum rate maximization problem.
The beam power of the LEO satellites is allocated over successive time slots to maximize the long-term weighted sum rate of the LEO satellites. The present invention uses the proximal policy optimization (PPO) algorithm, which is suited to continuous variables, to obtain the beam power allocation. The Markov decision process (MDP) is a general framework for modeling sequential decision problems. The key elements of the reinforcement learning framework, namely the agent, environment, state, action, and reward, are defined as follows:
Agent: the agent is the core of the reinforcement learning framework, analogous to the human brain; it observes the environment to obtain the state and makes decisions. Since all satellites are managed by a network management center (NMC), the NMC is regarded as the agent in the present invention.
Environment: in reinforcement learning, everything that interacts with the agent is called the environment. The agent obtains the current state from the environment and performs actions that change the environment. In the present invention, all factors in the satellite communication system belong to the environment.
State: the state $s_t$ is the input of the neural network and the basis on which the agent acts. To solve the long-term weighted sum rate maximization problem, the agent must consider not only the channel conditions of the current time slot but also each user's queue of data awaiting transmission when making a decision, which links the beam power allocations of multiple time slots and resolves the coupling across time slots. Therefore, for time slot t, the data queue q(t) of the LEO ground users, the channel information, and the matching relation between LEO satellite beams and LEO ground users are modeled as the state:
where the data queue q(t) of the LEO ground users is expressed as:
the channel information is expressed as:
where the channel information includes, for the i-th LEO satellite, the channel gain vector $\mathrm{vec}(H_i(t))^T$ from that satellite to the LEO ground users, where $\mathrm{vec}(\cdot)^T$ is the matrix vectorization function that converts a matrix into a row vector; the interference channel gain vector $\mathrm{vec}(G_i(t))^T$ from the i-th LEO satellite to the GEO ground users; and the interference channel gain vector $g_L(t)$ from the GEO satellite to the LEO ground users.
$H_i(t)$ is expressed as follows:
where $h_{i,j,k}(t)$ represents the channel gain from the j-th beam of the i-th LEO satellite to the k-th LEO ground user; $G_i(t)$ is expressed as follows:
where $g_{i,j,g}(t)$ represents the channel gain from the j-th beam of the i-th LEO satellite to the g-th GEO ground user; the interference channel gain vector $g_L(t)$ is expressed as follows:
A sign function is used to indicate the matching relation $m_{i,j,k}(t)$ between the j-th beam of the i-th LEO satellite and the k-th LEO ground user, where $h_{th}$ represents a channel gain threshold; $m_{i,j,k}(t) = 1$ means that the j-th beam of the i-th LEO satellite covers and serves the k-th LEO ground user; otherwise, $m_{i,j,k}(t) = 0$ means that the main lobe of the LEO beam does not cover that ground user. Thus, $\mathrm{vec}(M_i(t))^T$ represents the matching relation between the beams of the i-th LEO satellite and the LEO ground users, where $M_i(t)$ is expressed as follows:
Thus, the matching relation between the beams of the LEO satellite group and the LEO ground users is expressed as follows:
Action: the action $a_t$ is the output of the neural network, and the agent changes the environment by executing actions. The action is modeled as the LEO beam power allocation vector as follows:
where, in the current state $s_t$, after action $a_t$ is selected, the environment returns a reward and the state changes from $s_t$ to $s_{t+1}$.
Reward: the reward $r_t$ is the feedback that the environment gives the agent after the agent observes state $s_t$ and outputs action $a_t$. After taking action $a_t$, the agent needs to know whether the action meets or approaches the optimization objective; the reward $r_t$ is the environment's feedback on $a_t$ and serves as the measure of its quality. A negative reward is given for actions that violate the beam power constraint or the interference constraint. The reward $r_t$ is expressed as follows:
where $c_1$ represents a preset weight; $c_2$ and $c_3$ represent the penalty factors for violating the interference constraint and the beam power constraint, respectively; $I_{max}(t)$ represents the maximum interference received by the GEO ground users at time slot t; $I_{th}$ represents the interference threshold of the GEO system; $n_p(t)$ represents the number of beams in action $a_t$ that do not satisfy the beam power constraint; $R_k(t)$ represents the actual transmission rate received by the k-th LEO ground user; and $Q_k(t)$ represents the virtual data queue that the k-th LEO ground user needs to transmit.
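The state and reward definitions above can be illustrated with the following Python sketch. Several details are assumptions because the corresponding equations are not preserved: the ordering of the blocks inside the state vector, the strict comparison against $h_{th}$ in the matching rule, and the reward form $c_1 \sum_k Q_k R_k$ minus a fixed penalty $c_2$ when $I_{max} > I_{th}$ minus $c_3 n_p$. All function names are hypothetical.

```python
def flatten(matrix):
    """Row-wise vectorisation of a 2-D list, standing in for vec(.)^T."""
    return [value for row in matrix for value in row]

def matching(h, h_th):
    """m[i][j][k] = 1 if gain h[i][j][k] exceeds the threshold h_th, else 0."""
    return [[[1 if gain > h_th else 0 for gain in beam] for beam in sat] for sat in h]

def build_state(q, h, g, g_l, h_th):
    """Concatenate queues, channel gains, and matching flags into one flat state.

    q   : queue length per LEO user
    h   : h[i][j][k], LEO beam to LEO user gains
    g   : g[i][j][u], LEO beam to GEO user gains
    g_l : GEO beam to LEO user gains
    """
    state = list(q)
    for h_i, g_i in zip(h, g):
        state += flatten(h_i) + flatten(g_i)
    state += list(g_l)
    for m_i in matching(h, h_th):
        state += flatten(m_i)
    return state

def reward(q, r, i_max, i_th, n_p, c1, c2, c3):
    """Assumed reward: weighted sum rate minus constraint-violation penalties."""
    weighted_sum_rate = sum(q_k * r_k for q_k, r_k in zip(q, r))
    interference_penalty = c2 if i_max > i_th else 0.0
    return c1 * weighted_sum_rate - interference_penalty - c3 * n_p
```

Assuming, for instance, 4 LEO satellites with 7 beams each, 10 LEO ground users, and 5 GEO ground users, this layout gives a state of 10 + 4·(70 + 35) + 10 + 280 = 720 entries, and the action has 28 entries.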
Third step: train the neural network in the reinforcement learning framework that solves the long-term weighted sum rate problem under the beam power constraint and the interference constraint in the GEO-LEO coexistence scenario.
The optimization problem shown in equation (6) is a mixed-integer nonlinear programming problem, for which an optimal solution is difficult to obtain in polynomial time. The problem also involves continuous beam power decisions over multiple consecutive time slots, which makes it harder to solve. A deep reinforcement learning framework can approximate the optimal solution quickly and accurately. The proximal policy optimization (PPO) algorithm is regarded as one of the most advanced methods in deep reinforcement learning; built on an actor-critic network architecture, it can handle both continuous and discrete action spaces. In the PPO algorithm, the probability distribution of the policy is parameterized, so the agent learns the policy directly. In terms of convergence, PPO limits the range of policy change through the clip function, and it converges faster and better than natural policy gradient and trust region policy optimization.
Therefore, the invention provides a deep reinforcement learning architecture based on the PPO algorithm to solve the optimization problem shown in equation (6); the architecture provides an intelligent, real-time beam power allocation scheme. The main flow of the proposed DRL-based beam power allocation (drlBPA) algorithm is shown in Algorithm 1:
Algorithm 1 DRL-based beam power allocation algorithm
The specific implementation process is as follows:
Step 1: initialize the relevant parameters of the satellite communication system and the PPO algorithm.
Table 1 Satellite channel system parameters
The relevant parameters of the satellite system are initialized, including but not limited to the frequencies, receiver noise temperature, noise bandwidth, antenna efficiency, GEO satellite antenna diameter, LEO satellite antenna diameter, 3 dB angle, mean and standard deviation of the rain fade, LEO satellite operating altitude, number of LEO satellite beams, and maximum power of the LEO satellites listed in Table 1 and Table 2. For the rain fade $A_R = \zeta^{-1/2}$, $\zeta_{dB} = 20\log(\zeta)$; μ and σ represent the mean and variance of the rain fade, respectively, and depend on the position of the receiver, the frequency and polarization direction of the receiving terminal, and the angle of the satellite.
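A rain fade sample could be drawn as in the sketch below. The distribution statement is incomplete in the text, so the sketch assumes that $\ln(\zeta_{dB})$ is normally distributed with parameters μ and σ and that the logarithm in $\zeta_{dB} = 20\log(\zeta)$ is base 10; here σ is passed to the sampler as a standard deviation. The function name and the numeric values are hypothetical and are not taken from Table 1.

```python
import random

def sample_rain_fade(mu, sigma, rng):
    """Draw one rain fade value A_R = zeta^(-1/2).

    Assumes ln(zeta_dB) ~ Normal(mu, sigma^2) and zeta_dB = 20*log10(zeta).
    """
    zeta_db = rng.lognormvariate(mu, sigma)  # zeta_dB > 0 by construction
    zeta = 10.0 ** (zeta_db / 20.0)          # invert zeta_dB = 20*log10(zeta)
    return zeta ** -0.5

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
fades = [sample_rain_fade(-2.6, 1.63, rng) for _ in range(1000)]
```

Under these assumptions every sample lies between 0 and 1, so the rain fade only ever attenuates the channel gain.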
Table 2 Simulation parameters of the GEO-LEO satellite system
The relevant parameters of the DRL model are initialized. Table 3 lists the hyperparameters for training the actor and critic networks, which include at least one of the following: discount factor, learning rate, number of training rounds, trajectory length, update interval, number of updates, and clipping factor. Table 4 lists the layer structure of the actor and critic neural networks and the number of nodes in each layer. Both networks consist of an input layer, fully connected layers, and an output layer, and use the tanh function as the activation function.
Table 3 DRL algorithm simulation parameter settings
Table 4 network architecture parameters
Step 2, the agent interacts with the environment and stores batch_size data tuples $[s_t, a_t, r_t, s_{t+1}]$ in the experience replay buffer, where batch_size represents the size of one batch. Specifically, at time slot t, the agent observes the environment to obtain the current state $s_t$ and inputs $s_t$ into the actor network and the critic network. The actor network outputs the mean of the normal distribution of the beam action; a normal distribution is built from this mean and sampled to obtain the action $a_t$. The agent executes $a_t$, computes the reward $r_t$, and obtains the state $s_{t+1}$ of the next time slot. In this way, batch_size tuples $[s_t, a_t, r_t, s_{t+1}]$ are stored.
Step 3, train the actor and critic neural networks based on the state data of at least one trajectory (i.e. batch_size=t);
Step 4, repeat Step 2 and Step 3 until the set number of iterations is reached or the loss function converges, at which point the training of the actor and critic networks ends.
S3-1, extract the batch_size data tuples $[s_t, a_t, r_t, s_{t+1}]$ from the experience replay buffer and calculate the advantage estimate according to the following equation:
$A_t = -V(s_t) + r_t + \gamma r_{t+1} + \dots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T), \quad t \in \{1, 2, \dots, T\}$ (10)
where $V(s_t)$ is the value output by the critic network when the state $s_t$ is input, and $\gamma$ is the discount factor;
S3-2, after obtaining the advantage estimate, calculate the loss functions of the actor network and the critic network:
where $L_{actor}(\theta)$ represents the loss function of the actor network, $\theta$ represents the parameters of the actor network, $p_\theta(a_t|s_t)$ represents the probability of selecting action $a_t$ in state $s_t$, $\epsilon$ is the parameter controlling the upper and lower bounds of the clip function, and the averaging operator represents the mean over a batch of data;
where the critic loss function is defined over the parameters of the critic network, and $r_i$ represents the reward for time slot i.
S3-3, update the parameters $\theta$ of the actor network and the parameters of the critic network using the Adam optimizer, then assign $\theta_{old} \leftarrow \theta$, and likewise for the critic parameters.
S3-4, repeat S3-2 and S3-3 a plurality of times.
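Steps S3-1 and S3-2 can be sketched in plain Python as follows. The advantage function implements equation (10) directly. The two loss equations are not preserved in the text, so the actor loss below is the standard PPO clipped surrogate and the critic loss is a mean squared error against discounted returns; both are assumptions consistent with the symbols described above, and the function names are hypothetical.

```python
import math

def advantages(rewards, values, last_value, gamma):
    """A_t = -V(s_t) + r_t + gamma*r_{t+1} + ... + gamma^(T-t) * V(s_T)."""
    ret = last_value                      # bootstrap from V(s_T)
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret    # discounted return from slot t
        out[t] = ret - values[t]
    return out

def actor_loss(logp_new, logp_old, adv, eps):
    """Assumed PPO clipped surrogate loss, averaged over the batch."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(ln - lo)         # p_theta(a|s) / p_theta_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)
    return -total / len(adv)              # negated because the loss is minimised

def critic_loss(values, returns):
    """Assumed mean squared error between V(s_t) and the discounted return."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

In a full training loop these losses would be minimised with an optimizer such as Adam over several passes on the same batch, after which the old policy parameters are overwritten as in S3-3.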
Fourth step: in the GEO-LEO coexistence communication scenario, the deep reinforcement learning framework works as follows. The data queue that each user needs to transmit in time slot t, the channel information, and the matching relation between LEO satellite beams and LEO ground users are input as the state into the actor network, and the action output by the actor network is the beam power allocation scheme for time slot t. The consecutive states of one trajectory are input into the actor network in sequence, and the actor network outputs the optimal beam power allocation scheme of each time slot in real time, ultimately achieving the goal of optimizing the long-term weighted sum rate. Note that in practical application the critic network is no longer used.
Examples
The following analysis is based on this example. To simplify the evaluation scenario, this embodiment considers a GEO-LEO coexistence communication system consisting of one GEO satellite and four LEO satellites, in which the LEO satellite group shares the spectrum of the GEO satellite communication system to transmit information. The GEO satellite has a single beam and simultaneously serves 5 GEO ground users within its beam coverage area; the LEO satellite constellation consists of 4 LEO satellites with 7 beams, which cooperatively serve 10 LEO ground users.
The method provided by the embodiment of the invention is used to analyze the LEO beam power optimization problem. To verify its benefits, conventional iteration-based methods are compared with the DRL algorithm of the actor-critic structure in the same scenario.
The system parameters were set as follows:
The relevant parameters of the satellite system are initialized, including but not limited to the frequencies, receiver noise temperature, noise bandwidth, antenna efficiency, GEO satellite antenna diameter, LEO satellite antenna diameter, 3 dB angle, mean and standard deviation of the rain fade, LEO satellite operating altitude, number of LEO satellite beams, and maximum power of the LEO satellites listed in Table 5 and Table 6.
Table 5 satellite channel system parameters
Table 6 Simulation parameters of the GEO-LEO satellite system
The relevant parameters of the DRL model are initialized. Table 7 lists the hyperparameters for training the actor and critic networks, which include at least one of the following: the number of layers of the actor and critic neural networks, the number of nodes in each layer, discount factor, learning rate, number of training rounds, trajectory length, update interval, number of updates, and clipping factor. When training the network, the state consists of the users' channel state information and the queues of user demand data; T = 50 sets of data are randomly generated in each round, and a complete training process covers 6000 batches. After multiple experiments, the learning rate of the network was set to $10^{-5}$.
Table 7 DRL algorithm simulation parameter settings
The actor and critic neural networks are constructed; each consists of an input layer, fully connected layers, and an output layer, with the network structure given in Table 8: <input dimension, 256, 128, output dimension>. Each hidden layer uses the tanh function as the activation function to introduce nonlinearity.
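The layer structure above corresponds to a small fully connected network. The following dependency-free Python sketch shows one forward pass through such a network; the weight initialisation range, the zero biases, and the linear (non-activated) output layer are assumptions, since the text only specifies the hidden sizes 256 and 128 and the tanh activation. The function names are hypothetical.

```python
import math
import random

def init_mlp(sizes, rng):
    """Create (weights, biases) per layer for the layer widths in `sizes`."""
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer (assumed)."""
    for layer_index, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if layer_index < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x

# Small hypothetical input/output sizes keep the example fast; the hidden
# widths 256 and 128 follow the structure stated in the text.
actor = init_mlp([6, 256, 128, 3], random.Random(0))
means = forward(actor, [0.5] * 6)
```

For the actor the output would be the per-beam action means, while a critic with the same hidden layers would use an output dimension of 1 for the state value.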
Table 8 network configuration parameters
As comparison schemes, three baseline schemes were considered.
(1) Fractional optimization-based scheme (FO): the FO scheme is an improvement on the CBSS-MQR scheme proposed by P. Gu et al. in "Dynamic Cooperative Spectrum Sharing in a Multi-beam LEO-GEO Co-Existing Satellite System". In that scheme, the weighted sum rate maximization problem is converted into three sub-problems: beam and user allocation (BUA), beam power control (BPC), and time proportional allocation (TPA). That scheme can only handle instantaneous beam power allocation, whereas optimization problem (6) requires beam power allocation over multiple successive time slots. Thus, optimization problem (6) is decomposed into a series of sub-optimization problems:
The FO scheme decomposes the optimization problem shown in equation (6) into a series of static sub-optimization problems and then solves the sub-problems (13) using the CBSS-MQR scheme.
(2) Improved fractional optimization-based scheme (IFO): IFO is an improved version of FO. When the remaining time proportion is not zero and some users are still not adequately served, the FO scheme wastes time resources and yields low spectral efficiency. The IFO scheme was therefore developed to make full use of time resources. Specifically, in IFO, as long as there is a remaining time proportion and there are users who are not fully served, the BUA algorithm (which groups all users not fully served) and the BPC algorithm are executed repeatedly in sequence. The IFO algorithm is described in detail in Algorithm 2.
Algorithm 2 Improved fractional optimization-based algorithm
(3) DNN approximation IFO scheme (The DNN Approximate IFO, dnneafo): the IFO scheme fully utilizes time resources and improves the throughput of the system. However, the IFO scheme has a high computational complexity. Therefore, in order to better meet the real-time requirements of satellite communication, dnniafo schemes are proposed, and the BPC-DNN based on the DNN model is used to replace the BPC algorithm based on iterative optimization to solve the beam power allocation sub-problem, wherein the DNN model is trained by using the solution of the BPC algorithm as a tag.
Compared with algorithm 2, the dnneafo algorithm is identical to the rest of the steps except for the 7 th line of algorithm 2, replacing the 7 th line "BPC algorithm" step in algorithm 2 with a "BPC-DNN" step: obtain the d i Channel gain vector for individual user groupsThe input to the DNN network is +.>The output is user group d i Power allocation scheme->". Wherein (1)>The expression is as follows:
wherein d i Represents the d i An indication vector for each user group. If k epsilon d i ,d i [k]=1; otherwise, d i [k]=0。
Specifically, the DNN model consists of one input layer, three hidden layers, and one output layer. Each hidden layer has 400 neurons. The activation functions of the hidden layers and the output layer are the tanh function and the clipped ReLU function, respectively. The training data set is preprocessed and split into training, validation, and test sets in an 18:1:1 ratio. The optimizer is RMSProp, the maximum number of iterations is 600, the mini-batch size is 128, the initial learning rate is 0.01, and the learning rate is halved every 70 iterations. The DNNAIFO scheme obtains a power allocation result quickly, with rate performance close to that of the IFO scheme.
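The network shape and learning-rate schedule described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the input and output sizes, the clipping ceiling, and the random (untrained) weights are hypothetical, and the schedule reads the text as "halve the learning rate every 70 iterations".

```python
import math
import random


def clipped_relu(x, ceiling):
    """ReLU whose output is capped at `ceiling` (the ceiling value is an assumption)."""
    return min(max(x, 0.0), ceiling)


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained placeholder)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, layers, ceiling):
    """Three tanh hidden layers followed by a clipped-ReLU output layer."""
    for layer in layers[:-1]:
        x = [math.tanh(z) for z in dense(x, layer)]
    return [clipped_relu(z, ceiling) for z in dense(x, layers[-1])]


def learning_rate(iteration, initial=0.01, drop_every=70):
    """Halve the learning rate every `drop_every` iterations."""
    return initial * 0.5 ** (iteration // drop_every)


rng = random.Random(0)
n_in, n_hidden, n_out = 8, 400, 4          # n_in and n_out are hypothetical sizes
sizes = [n_in, n_hidden, n_hidden, n_hidden, n_out]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
powers = forward([rng.random() for _ in range(n_in)], layers, ceiling=1.0)
```

The clipped output keeps every predicted beam power inside a fixed interval, which is one natural reason to pick a clipped ReLU for a power allocation output.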
Fig. 4 depicts the training process, in which the cumulative reward increases and converges for λ = 10, 20, and 50. Here λ is the mean of the Poisson distribution followed by the data arriving for each LEO ground user in each time slot. Each point on the curve is the average cumulative reward over 4 trajectories (one batch of training data). Fig. 4 shows that after extensive training the cumulative reward increases and converges, which means that the neural network model has been trained and indicates the convergence of the proposed DRLBPA algorithm. The cumulative reward exhibits cliff-like drops during learning because, in the initial stage of training, the model is still exploring and has not yet adapted to newly encountered training data, which causes sharp declines in the reward. The model converges after approximately 5000 iterations, indicating that it has learned to maximize the cumulative return. The trained network model was then tested on 10000 sets of scenarios, and its feasibility was 100%.
Fig. 5 shows the generalization performance of the present invention with respect to the parameter λ. To evaluate how well the model of the DRLBPA scheme generalizes with respect to λ, the average sum rates of 6 trained models were tested and compared at different values of λ, as shown in Fig. 5. In the training data sets used to train these 6 models, the amount of data randomly arriving for LEO ground users follows Poisson distributions with parameters λ = 10, 20, ..., 60 Mbps, respectively. In Fig. 5, the horizontal axis λ ranges from 10 Mbps to 90 Mbps in steps of 10 Mbps; the 9 points on the horizontal axis correspond to 9 test data sets that are identical except for the amount of data randomly arriving for LEO ground users. As Fig. 5 shows, when the 6 models are tested on the different test data sets, the resulting average sum rates are close to one another, with the model trained at λ = 20 Mbps achieving the highest average sum rate. Because each model is trained on arrivals that follow a Poisson distribution with one particular λ yet remains applicable to test sets whose arrivals follow Poisson distributions with other values of λ, the model of the DRLBPA scheme generalizes well and is robust to λ. This generalization capability shows that the agent has learned a good policy that can be applied when λ changes dynamically.
Therefore, the following simulation experiments mainly test the performance of the network model trained at λ = 20. The sum rates obtained by the different algorithms are compared for scenarios with smaller and larger λ, respectively.
Fig. 6 shows the cumulative distribution functions of the downlink sum rate of DRLBPA and the three comparison methods when dynamically solving the beam power allocation problem in the scenarios λ = 10 Mbps and λ = 50 Mbps, after the DRLBPA method has converged. Each curve in Fig. 6 is plotted from the average sum rates obtained over 100 realizations. As Fig. 6 shows, when λ is small, the three schemes other than FO achieve essentially the same sum rate, which is higher than that of the FO scheme. When λ is larger, the proposed DRLBPA scheme achieves higher throughput. Compared with the comparison schemes, the DRLBPA scheme provided by the invention effectively improves the throughput of the LEO satellite system, and the improvement is more pronounced when larger amounts of data arrive for LEO users.
When λ is small, the rate of data randomly arriving for LEO satellite ground users is small; the receive channel capacity obtained by the DRLBPA and IFO schemes exceeds this arrival rate, so the data of all users can be transmitted within one time slot. Thus, with smaller λ, the actual transmission rates obtained by the DRLBPA and IFO schemes are the same (equal to the data queue Q). The DNNAIFO scheme obtains its data labels from the IFO scheme, so once its network model has been trained it quickly achieves a sum rate close to that of the IFO scheme. When λ is larger, the rate of data randomly arriving for LEO satellite ground users is larger, and the actual transmission rate of each scheme depends on the receive channel capacity: the larger the channel capacity, the higher the actual transmission rate.
Figs. 7 and 8 compare the LEO user queue backlogs, averaged over 50 consecutive time slots, of the proposed DRLBPA scheme and the three comparison schemes. As shown in Fig. 7, when λ = 10 Mbps, only the FO scheme has a high average user queue backlog and exhibits instability. In Fig. 8, for λ = 50 Mbps, the average user backlogs of all schemes are large and exhibit instability. However, the numbers of LEO users with unstable queue backlogs for the proposed DRLBPA scheme and the three comparison schemes are 4, 9, 6 and 6, respectively. The maximum LEO user queue backlog for the DRLBPA scheme is approximately 1200, which is smaller than those of the three comparison schemes. The proposed DRLBPA scheme therefore offers better system stability.
Fig. 9 compares the average sum rates of the proposed DRLBPA scheme and the three comparison schemes as the Poisson distribution parameter increases. As Fig. 9 shows, the average sum rate of the proposed DRLBPA scheme is better than those of the three comparison schemes. This is because extensive training enables the agent to fully explore the beam power space and better learn the mapping between states and beam power policies.
In Fig. 9, when λ ≤ 20 Mbps, the schemes other than FO achieve the same average sum rate, because when λ is small the channel capacity received by each LEO ground user is much larger than the amount of randomly arriving data, so all methods except FO can finish transmitting the backlogged queue data. The FO scheme allocates only one time proportion to each user group in a time slot. When λ is small, each user group is allocated only the small share of time needed for at least one LEO user to finish transmission, so time resources are wasted and there is no guarantee that the backlogged data in all user queues will be transmitted. The FO scheme therefore yields a lower average sum rate than the other three schemes. Compared with the FO scheme, which does not fully utilize time resources, the IFO scheme trades higher computational complexity for higher spectral efficiency: when the remaining time proportion of a time slot is sufficient, the IFO scheme keeps allocating time to the user group with unfinished transmission that has the largest weighted sum rate until all users complete transmission. The DNNAIFO scheme, in turn, only reduces the computational complexity relative to the IFO scheme.
Furthermore, in Fig. 9, when λ ≥ 130 Mbps, the average sum rates of the three comparison schemes are the same, because when λ is large the amount of randomly arriving data $A_k(t)$ of a LEO user is much greater than that user's channel capacity $C_k(t)$. Each of the three comparison schemes can then complete only one round of time proportion allocation for the user groups in a time slot, after which the remaining time proportion is zero. The actual transmission rates of the three comparison schemes are therefore all limited by the channel capacity and are the same.
Table 9. Execution time (ms) of resource allocation for the four schemes
λ (Mbps) | DRLBPA | FO | IFO | DNNAIFO
10 | 0.133460 | 125.210 | 640.330 | 290.420
50 | 0.130034 | 132.081 | 387.562 | 209.812
90 | 0.123437 | 120.470 | 205.570 | 138.821
130 | 0.133476 | 93.5250 | 148.801 | 86.0400
The computational complexity of the proposed framework is then evaluated. Table 9 lists the average computation time of the proposed DRLBPA scheme and the three comparison methods over 10000 experimental scenarios. The results show that the proposed method has the shortest execution time (about 0.130 ms), on the order of one thousandth of the execution times of the three comparison schemes, because the neural network is trained offline. DRLBPA thus has a significant run-time advantage and can meet the real-time BPA requirement of a satellite communication system.
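The "order of one thousandth" statement can be checked directly against the Table 9 figures. The short Python sketch below divides each comparison scheme's execution time by that of the proposed scheme; the dictionary layout and the spelling of the scheme names are choices made for this illustration, while the numbers are copied from Table 9.

```python
# Execution times in ms, copied from Table 9 (keys are the values of λ in Mbps).
times_ms = {
    10:  {"DRLBPA": 0.133460, "FO": 125.210, "IFO": 640.330, "DNNAIFO": 290.420},
    50:  {"DRLBPA": 0.130034, "FO": 132.081, "IFO": 387.562, "DNNAIFO": 209.812},
    90:  {"DRLBPA": 0.123437, "FO": 120.470, "IFO": 205.570, "DNNAIFO": 138.821},
    130: {"DRLBPA": 0.133476, "FO": 93.5250, "IFO": 148.801, "DNNAIFO": 86.0400},
}

# Ratio of each comparison scheme's time to the proposed scheme's time.
speedups = {
    lam: {name: t / row["DRLBPA"] for name, t in row.items() if name != "DRLBPA"}
    for lam, row in times_ms.items()
}

for lam, row in speedups.items():
    print(lam, {name: round(ratio) for name, ratio in row.items()})
```

The ratios range from several hundred to several thousand, so "one thousandth" is best read as an order-of-magnitude statement rather than an exact factor.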
Furthermore, Table 9 shows that the IFO scheme has the longest average run time. The DNNAIFO scheme has a shorter computation time than the IFO scheme, but the improvement is limited, because DNNAIFO only replaces the iterative solution process of the BPC algorithm with a DNN model of lower computational complexity. In addition, compared with FO, the DNNAIFO scheme has a longer computation time when λ ≤ 90 Mbps and a shorter computation time when λ = 130 Mbps. This is because, in time slot t, when $S_k(t) \ge Q_k(t)$ the time resources are not fully utilized: in the DNNAIFO scheme BUA and BPC-DNN are performed multiple times (for all user groups), whereas in the FO scheme BUA and BPC are performed only once (for all user groups). The DNNAIFO scheme thus sacrifices computation time to achieve a higher LEO system rate. When $S_k(t) < Q_k(t)$, BUA and BPC-DNN (for all user groups) are performed only once in the DNNAIFO scheme and the remaining time proportion is zero. In this case, the run-time advantage of the network-model-based DNNAIFO scheme becomes apparent.
An embodiment of the invention provides a computer device. The computer device of this embodiment includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method embodiments described above are implemented. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the device embodiments described above.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention.
The computer device may be a desktop computer, a notebook computer, a handheld computer, a cloud server, or another computing device. The computer device may include, but is not limited to, a processor and a memory.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The memory may be used to store the computer program and/or modules, and the processor may implement various functions of the computer device by running or executing the computer program and/or modules stored in the memory, and invoking data stored in the memory.
If the modules/units integrated in the computer device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by means of a computer program that instructs related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for downlink spectrum sharing and beam power dynamic allocation, characterized by comprising the following steps:
constructing, for a communication scenario in which high-orbit and low-orbit satellites coexist, a maximized long-term weighted sum rate model under a beam power constraint and an interference threshold constraint;
constructing a deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem;
training a neural network in the deep reinforcement learning framework using a proximal policy optimization (PPO) algorithm;
and taking, in each time slot, the data queues that the LEO satellite ground users need to transmit, the channel information, and the matching relationship between LEO ground users and LEO satellite beams as the state, and sequentially inputting the states into the trained neural network model to obtain the LEO satellite beam power allocation scheme for each time slot.
2. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 1, wherein constructing, for a communication scenario in which high-orbit and low-orbit satellites coexist, the maximized long-term weighted sum rate model under the beam power constraint and the interference threshold constraint comprises:
the maximized long-term weighted sum rate model is as follows:
where $T$ denotes the number of time slots during which the low-orbit satellite constellation passes through the region near the collinear position; the optimization variable is the LEO beam power allocation scheme over the $T$ time slots; $P(t)$ denotes the LEO beam power allocation scheme in time slot $t$; the numbers of ground users of the low-orbit and high-orbit satellites are also parameters of the model; $N_S$ denotes the number of LEO satellites in the LEO satellite constellation; $N_B$ denotes the number of beams per LEO satellite; $t$ denotes a time slot; $k$ denotes the index of a LEO ground user; $Q_k(t)$ denotes the virtual data queue that the $k$-th LEO ground user needs to transmit; $C_k(t)$ denotes the channel capacity of the $k$-th LEO ground user; $R_k(t)$ denotes the actual transmission rate received by the $k$-th LEO ground user, with $R_k(t) = \min(Q_k(t), C_k(t))$; the model further includes an interference factor of the LEO system on the GEO system; $g_{i,j,g}(t)$ denotes the channel gain from the $j$-th beam of the $i$-th LEO satellite to the $g$-th GEO ground user; $P_{i,j}(t)$ denotes the power of the $j$-th beam of the $i$-th LEO satellite; $I_{th}$ denotes the tolerable interference threshold of the GEO system; $m_{i,j,k}(t)$ denotes the matching relationship between the $j$-th beam of the $i$-th satellite and the $k$-th LEO ground user; and the beam power is bounded by the maximum beam power of the LEO satellite.
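As an illustration of the quantities defined in claim 2, the following Python sketch computes the actual transmission rate $R_k(t) = \min(Q_k(t), C_k(t))$ and checks one candidate beam power allocation against the two constraints. Because the model equation itself is not reproduced in the text, the exact constraint forms used here are assumptions: the interference at each GEO ground user is taken as the sum of $g_{i,j,g}(t) P_{i,j}(t)$ over all LEO beams, and the beam power constraint is taken as a per-beam upper bound. All function and variable names are hypothetical.

```python
def actual_rates(queues, capacities):
    """R_k(t) = min(Q_k(t), C_k(t)) for every LEO ground user k."""
    return [min(q, c) for q, c in zip(queues, capacities)]


def interference_at_geo_user(gains_g, powers):
    """Aggregate LEO interference at one GEO user (assumed form: sum of gain * power).

    gains_g[i][j] -- channel gain from beam j of LEO satellite i to this GEO user
    powers[i][j]  -- power of beam j of LEO satellite i
    """
    return sum(
        g * p
        for gain_row, power_row in zip(gains_g, powers)
        for g, p in zip(gain_row, power_row)
    )


def is_feasible(powers, gains, i_th, p_max):
    """Check the assumed per-beam power bound and per-GEO-user interference bound."""
    power_ok = all(0.0 <= p <= p_max for row in powers for p in row)
    interference_ok = all(
        interference_at_geo_user(gains_g, powers) <= i_th for gains_g in gains
    )
    return power_ok and interference_ok


# Hypothetical toy scenario: 1 LEO satellite with 2 beams, 2 GEO ground users.
powers = [[1.0, 2.0]]
gains = [[[0.1, 0.2]], [[0.3, 0.1]]]   # gains[g][i][j]
print(actual_rates([5.0, 1.0], [3.0, 4.0]))
print(is_feasible(powers, gains, i_th=0.6, p_max=2.0))
```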
3. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 2, wherein the queue dynamics of the virtual data queue $Q_k(t)$ satisfy the following condition:
where $S_k(t)$ denotes the transmission rate that can be provided to the user in time slot $t$; $A_k(t)$ denotes the data rate randomly arriving for the $k$-th LEO ground user in time slot $t$, and $A_k(t)$ follows a Poisson distribution with parameter λ.
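The queue equation of claim 3 is not reproduced in the text. A commonly used form of such dynamics, assumed here purely for illustration, is $Q_k(t+1) = \max(Q_k(t) - S_k(t), 0) + A_k(t)$. The Python sketch below simulates that assumed update with Poisson arrivals drawn by Knuth's multiplication method; the function names, the seed, and the constant service rate are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def queue_step(queue, service, arrival):
    """Assumed queue update: serve up to `service`, then add the new arrivals."""
    return max(queue - service, 0.0) + arrival


rng = random.Random(1)
lam = 10            # Poisson mean of the arrivals, as in the λ = 10 scenario
service = 12.0      # hypothetical constant service rate S_k(t)
queue = 0.0
for t in range(50):
    queue = queue_step(queue, service, poisson_sample(lam, rng))
print(queue)
```

When the service rate stays above the mean arrival rate the backlog remains bounded; when it stays below, the backlog grows over time, which is the kind of instability discussed for Figs. 7 and 8.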
4. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 1, wherein the deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem comprises:
agent: the agent corresponds to the network management center (NMC);
environment: the environment comprises all factors in the satellite communication system;
state: the state $s_t$ is the input of the neural network and the basis on which the agent executes actions; the data queue $q(t)$ of the LEO satellite ground users in time slot $t$, the channel information, and the matching relationship between LEO satellite beams and LEO ground users are modeled as the state:
the data queue $q(t)$ of the LEO satellite ground users is expressed as:
the channel information is expressed as:
where the channel information includes the channel gain vector $\mathrm{vec}(H_i(t))^T$ from the $i$-th LEO satellite to the LEO ground users, in which $\mathrm{vec}(\cdot)^T$ denotes the matrix vectorization function that converts a matrix into a row vector; the interference channel gain vector $\mathrm{vec}(G_i(t))^T$ from the $i$-th LEO satellite to the GEO ground users; and the interference channel gain vector $g_L(t)$ from the GEO satellite to the LEO ground users;
where $H_i(t)$ is expressed as follows:
where $h_{i,j,k}(t)$ denotes the channel gain from the $j$-th beam of the $i$-th LEO satellite to the $k$-th LEO ground user; $G_i(t)$ is expressed as follows:
where $g_{i,j,g}(t)$ denotes the channel gain from the $j$-th beam of the $i$-th LEO satellite to the $g$-th GEO ground user; the interference channel gain vector $g_L(t)$ is expressed as follows:
an indicator function is used to express the matching relationship between the $j$-th beam of the $i$-th LEO satellite and the $k$-th LEO ground user, where $h_{th}$ denotes the channel gain threshold: $m_{i,j,k} = 1$ means that the $j$-th beam of the $i$-th LEO satellite covers and serves the $k$-th LEO ground user; otherwise, $m_{i,j,k} = 0$ means that the main lobe of the LEO beam does not cover that ground user. Thus, $\mathrm{vec}(M_i(t))^T$ represents the matching relationship between the beams of the $i$-th LEO satellite and the LEO ground users, where $M_i(t)$ is expressed as follows:
thus, the matching relationship between the beams of the LEO satellite constellation and the LEO ground users is expressed as follows:
action: the action $a_t$ is the output of the neural network, and the agent changes the environment by executing actions; the action is modeled as the LEO beam power allocation vector, as follows:
where, in the current state $s_t$, after action $a_t$ is selected, the environment returns a reward and the state changes from $s_t$ to $s_{t+1}$;
reward: the reward $r_t$ is the feedback that the environment gives the agent after the agent observes state $s_t$ and outputs action $a_t$; after taking action $a_t$, the agent needs to know whether this action meets or approaches the optimization objective; the reward $r_t$, as the environment's feedback on action $a_t$, is the index for evaluating the quality of action $a_t$; a negative reward is given for actions that do not satisfy the beam power constraint or the interference constraint; the reward $r_t$ is expressed as follows:
where $c_1$ denotes a preset weight; $c_2$ and $c_3$ denote the penalty factor for violating the interference constraint and the penalty factor for violating the beam power constraint, respectively; $I_{max}(t)$ and $I_{th}$ denote the maximum interference received by GEO ground users in time slot $t$ and the interference threshold, respectively; $n_p(t)$ denotes the number of beams in action $a_t$ that do not satisfy the beam power constraint; $R_k(t)$ denotes the actual transmission rate received by the $k$-th LEO ground user; and $Q_k(t)$ denotes the virtual data queue that the $k$-th LEO ground user needs to transmit.
5. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 4, wherein the channel capacity $C_k(t)$ of the LEO ground user in time slot $t$ is as follows:
where the user in question is the $k$-th LEO ground user, $b_L$ denotes the LEO satellite beam bandwidth, and $\psi_k(t)$ denotes the received signal-to-noise ratio of that user;
the received signal-to-noise ratio $\psi_k(t)$ of the LEO ground user is as follows:
where, in time slot $t$, $x_{i,j,k}(t)$ indicates whether the $j$-th beam of the $i$-th LEO satellite covers and serves the $k$-th LEO ground user: $x_{i,j,k}(t) = 1$ denotes coverage and service, and $x_{i,j,k}(t) = 0$ denotes no coverage; $h$ denotes the satellite channel model; $h_{i,j,k}(t)$ denotes the channel gain from the $j$-th beam of the $i$-th LEO satellite to the $k$-th LEO ground user; $g_k(t)$ denotes the channel gain from the GEO satellite beam to the $k$-th LEO ground user; and the noise power is also included in the expression;
the satellite channel model $h$ is as follows:
where $G_T$, $G_R$, $L$ and $A_R$ denote the satellite antenna gain, the user terminal antenna gain, the free-space loss and the rain fade, respectively; the antenna gain at the transmitting end or the receiving end is expressed as:
where $G_0$ is the maximum antenna gain, attained when the off-axis angle is 0; $J_1(v)$ and $J_3(v)$ denote the first-order and third-order Bessel functions, respectively; θ denotes the off-axis angle, which depends on the antenna directions and positions of the satellite and the end user; $\theta_{3dB}$ denotes the off-axis angle corresponding to the 3 dB beamwidth; since the LEO user antenna continuously tracks the LEO satellite, the off-axis angle at the receiving end is 0 and $G_R = G_0$; the free-space loss is $L = (4\pi d)^2 f^2 / c^2$, where $d$, $f$ and $c$ denote the distance between the satellite and the ground user, the frequency, and the speed of light, respectively; the rain fade is $A_R = \zeta^{-1/2}$, where ζ denotes the amplitude of the rain fade and $A_R$ follows a log-normal distribution.
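Two of the quantities in claim 5 are easy to evaluate numerically. The Python sketch below computes the free-space loss $L = (4\pi d)^2 f^2 / c^2$ exactly as stated in the claim, and a channel capacity in the Shannon form $C = b_L \log_2(1 + \psi)$. The capacity equation is not reproduced in the text, so the Shannon form is an assumption; the function names and the example link parameters are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_loss(distance_m, frequency_hz):
    """Free-space loss L = (4*pi*d)^2 * f^2 / c^2 as a linear (not dB) factor."""
    return (4.0 * math.pi * distance_m) ** 2 * frequency_hz ** 2 / SPEED_OF_LIGHT ** 2


def channel_capacity(bandwidth_hz, sinr):
    """Assumed Shannon form: C = b_L * log2(1 + psi), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical link: 1000 km slant range, 20 GHz carrier, 100 MHz beam bandwidth.
loss = free_space_loss(1.0e6, 20.0e9)
print(10.0 * math.log10(loss))          # loss expressed in dB
print(channel_capacity(100.0e6, 15.0))  # capacity at a linear SINR of 15
```

Because the loss grows with the square of both distance and frequency, doubling either one raises it by a factor of four (about 6 dB).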
6. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 5, wherein training the neural network in the deep reinforcement learning framework using the proximal policy optimization (PPO) algorithm comprises:
step 1: initializing the relevant parameters of the satellite communication system and the deep reinforcement learning framework;
step 2: the agent interacts with the environment and stores batch_size data tuples $[s_t, a_t, r_t, s_{t+1}]$, where batch_size denotes the size of a batch; specifically, in time slot $t$, the agent observes the environment to obtain the current state $s_t$ and inputs $s_t$ into the actor network and the critic network; the actor network outputs the mean of a normal distribution over the beam actions, a normal distribution is built from this mean, and action $a_t$ is obtained by sampling from it; action $a_t$ is executed, the reward function $r_t$ is computed, and the state $s_{t+1}$ of the next time slot is obtained; in this way, batch_size tuples $[s_t, a_t, r_t, s_{t+1}]$ are stored;
step 3: training the actor and critic neural networks;
step 4: repeating step 2 and step 3 until the set number of iterations is reached or the loss function converges, at which point the training of the actor and critic networks ends.
7. The method for downlink spectrum sharing and beam power dynamic allocation according to claim 6, wherein step 3 is specifically as follows:
S3-1: extracting the batch_size data tuples $[s_t, a_t, r_t, s_{t+1}]$ and computing the advantage estimate according to the following equation:
$A_t = -V(s_t) + r_t + \gamma r_{t+1} + \dots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T), \quad t \in \{1, 2, \dots, T\}$
where $V(s_t)$ is the value function output by the critic network for input state $s_t$, and γ is the discount factor;
S3-2: after the advantage estimate is obtained, computing the loss functions of the actor network and the critic network:
where $L_{actor}(\theta)$ denotes the loss function of the actor network; θ denotes the parameters of the actor network; $p_\theta(a_t|s_t)$ denotes the probability of selecting action $a_t$ in state $s_t$; ε is the parameter controlling the upper and lower bounds of the clip function; and the averaging operator denotes the average over a batch of data;
where the remaining symbols denote the loss function of the critic network and the parameters of the critic network, respectively, and $r_i$ denotes the reward in time slot $i$;
S3-3: updating the parameters θ of the actor network and the parameters of the critic network using the Adam optimizer, and setting $\theta_{old} \leftarrow \theta$;
S3-4: repeating S3-2 and S3-3 a plurality of times.
8. A downlink spectrum sharing and beam power dynamic allocation system, comprising:
a model building module, configured to construct, for a communication scenario in which high-orbit and low-orbit satellites coexist, a maximized long-term weighted sum rate model under a beam power constraint and an interference threshold constraint;
a framework construction module, configured to construct a deep reinforcement learning framework for solving the maximized long-term weighted sum rate problem;
a model training module, configured to train a neural network in the deep reinforcement learning framework using a proximal policy optimization (PPO) algorithm;
a power allocation scheme calculation module, configured to take, in each time slot, the data queues that the LEO satellite ground users need to transmit, the channel information, and the matching relationship between LEO ground users and LEO satellite beams as the state, and to sequentially input the states into the trained neural network model to obtain the LEO satellite beam power allocation scheme for each time slot.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1-7.
Application CN202311540998.XA: Method and related device for downlink spectrum sharing and beam power dynamic allocation. Priority date: 2023-11-17. Filing date: 2023-11-17. Status: Pending. Publication: CN117596681A (en), published 2024-02-23. Family ID: 89921193.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination