CN114499629A - Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning - Google Patents
- Publication number: CN114499629A
- Application number: CN202111609439.0A
- Authority
- CN
- China
- Prior art keywords
- hopping
- time
- reinforcement learning
- data packet
- deep reinforcement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18513—Transmission in a satellite or space-based system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18519—Operations control, administration or maintenance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/24—Cell structures
- H04W16/28—Cell structures using beam steering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a dynamic resource allocation method for a beam-hopping satellite system based on deep reinforcement learning, which comprises the following steps: step 1, establishing a service model of the forward link of a beam-hopping GEO satellite system; step 2, storing the data packets of the services arriving at the ground wave positions in each time slot in data packet buffer queues; step 3, introducing a deep reinforcement learning algorithm, modeling the resource allocation module of the satellite as an agent, and designing the state input of the agent, the output decision actions of the agent, and the reward that evaluates the actions; step 4, simulating the deep reinforcement learning algorithm of step 3 and continuously training the weight parameters of its decision neural network; and step 5, completing the dynamic resource allocation of the beam-hopping satellite system with the decision neural network trained in step 4, thereby solving for the optimal resource allocation scheme. The invention reduces the data packet transmission delay and improves the throughput of the beam-hopping satellite system.
Description
Technical Field
The invention relates to the field of satellite communication, in particular to a dynamic resource allocation method for a beam hopping satellite system based on deep reinforcement learning.
Background
In conventional multi-beam satellite systems, the power and frequency resources allocated to each beam are relatively fixed. However, since service requests are non-uniform across beams and time-varying, conventional allocation algorithms cannot satisfy them. The Beam Hopping (BH) technique is based on time slicing: only a subset of the beams is activated in any given time slot. Because beam hopping is driven by service requests, it can greatly improve the utilization of system resources. Existing resource allocation algorithms for the forward link of beam-hopping satellite systems are mainly heuristic algorithms, iterative algorithms, and convex optimization algorithms. Heuristic and iterative algorithms are computationally expensive and therefore ill-suited to tracking dynamically changing ground traffic in real time. Convex optimization algorithms are only suitable for scenarios in which co-channel interference between beams has little influence.
On the other hand, Deep Reinforcement Learning (DRL) has been one of the most prominent directions in artificial intelligence in recent years. It combines the perception of deep learning with the decision-making of reinforcement learning, directly controlling an agent's behavior by learning from high-dimensional perceptual input, and offers a way to solve the perception-decision problem of complex systems. Studies have shown that deep reinforcement learning algorithms achieve good performance in satellite dynamic resource allocation, mainly in inter-beam channel allocation for multi-beam satellite systems, multi-objective resource allocation optimization for multi-beam satellites, and transmission delay optimization for beam-hopping satellites.
However, existing beam-hopping resource allocation algorithms based on deep reinforcement learning do not consider the problem of co-channel interference between beams. When working beams are adjacent, interference is inevitable. To alleviate inter-beam co-channel interference, a dynamic resource allocation method for beam-hopping satellite systems based on deep reinforcement learning needs to be designed around an interference-avoidance criterion.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a dynamic resource allocation method of a beam-hopping satellite system based on deep reinforcement learning.
The invention adopts the following technical scheme for solving the technical problems:
the invention provides a dynamic resource allocation method of a beam-hopping satellite system based on deep reinforcement learning, which comprises the following steps:
step 1, establishing a service model of the forward link of the beam-hopping satellite system according to the uneven space-time distribution of services in the beam-hopping GEO satellite system;
step 2, according to the service model of the forward link established in step 1, storing the data packets of the services arriving at the ground wave positions in each time slot in data packet buffer queues on a first-come-first-served basis and, in combination with the capacity the satellite can provide, establishing the optimization problem of minimizing the data packet transmission delay;
step 3, introducing a deep reinforcement learning algorithm, modeling the resource allocation module of the satellite as an agent, and designing the state input of the agent, the output decision actions of the agent, and the reward that evaluates the actions;
step 4, simulating the deep reinforcement learning algorithm of step 3, initializing the satellite scenario, setting the parameters of the algorithm, and continuously training the weight parameters of its decision neural network;
and step 5, completing the dynamic resource allocation of the beam-hopping satellite system with the decision neural network trained in step 4, thereby solving for the optimal resource allocation scheme.
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, a service model of a forward link of the beam hopping satellite system is established in step 1, and the method specifically comprises the following steps:
in a beam-hopping satellite system, the set of ground wave positions is defined as Ψ = {c_n | n = 1, 2, 3, ..., N}, where N is the total number of ground wave positions and c_n is the nth ground wave position; the maximum number of working beams is K, with K ≤ N; the beam-hopping period is defined as T = {t_1, t_2, ..., t_j, ..., t_J}, where t_j is the jth beam-hopping time slot, 1 ≤ j ≤ J, and J is the total number of beam-hopping time slots;
the hopping beam pattern at t_j is X_j = [x_{j,1}, ..., x_{j,n}, ..., x_{j,N}], where x_{j,n} ∈ {0, 1} indicates whether c_n is illuminated by a working beam at t_j: x_{j,n} = 1 means that a working beam illuminates c_n at t_j, and x_{j,n} = 0 means that no working beam illuminates c_n at t_j;
the signal-to-interference-plus-noise ratio of c_n at t_j is

SINR_{j,n} = (g_{j,n,n} p_{j,n}) / (N_0 W + Σ_{i≠n} g_{j,i,n} p_{j,i} x_{j,i})

where c_i is the ith ground wave position; g_{j,i,n} is the power gain toward c_n when a working beam illuminates c_i at t_j, comprising the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; g_{j,n,n} is the power gain toward c_n when a working beam illuminates c_n at t_j; p_{j,n} is the satellite transmit power of the working beam toward c_n at t_j; p_{j,i} is the satellite transmit power of the working beam toward c_i at t_j; N_0 is the noise power spectral density; W is the satellite spectrum bandwidth; and x_{j,i} indicates whether a working beam illuminates c_i at t_j;
Wherein f isDVB-S2() is a piecewise function of the european telecommunications standards institute standard on signal to interference plus noise ratio and spectral efficiency.
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the specific process of establishing the optimization problem of minimizing the transmission delay of the data packet in the step 2 is as follows:
the traffic newly arriving at c_n at t_j is defined as d_{j,n}; the data packets are stored in the packet buffer queue B_{j,n} = {d_{j−q,n} | 0 ≤ q ≤ T_th}, where B_{j,n} is the data packet buffer queue of c_n at t_j, d_{j−q,n} is the traffic that arrived at c_n in the (j−q)th beam-hopping time slot t_{j−q}, and T_th is the maximum transmission delay of a data packet;
if the transmission delay of a data packet, τ = t_j − t_k, exceeds T_th, the data packet is discarded, where t_j is the time slot in which the data packet is transmitted and t_k is the time slot in which the data packet arrived at the ground wave position;
in summary, the following optimization problem P for minimizing the data packet transmission delay is established:

P:  min  Σ_{t_j ∈ T} Σ_{c_n ∈ Ψ} Σ_{d_{k,n} ∈ B_{j,n}} (t_j − t_k)    (4)
s.t.  Σ_{n=1}^{N} x_{j,n} ≤ K                                         (5)
      Σ_{n=1}^{N} x_{j,n} p_{j,n} ≤ P_tot                             (6)
      0 ≤ p_{j,n} ≤ P_b                                               (7)
      t_j − t_k ≤ T_th                                                (8)

where d_{k,n} is the data packet that arrived at c_n at t_k. Constraint (5) states that the number of working beams in a single slot cannot exceed K, x_{j,n} indicating whether a working beam illuminates c_n at t_j; constraint (6) states that the sum of the working-beam powers in any time slot of the beam-hopping period cannot exceed the total satellite power P_tot; constraint (7) states that the power of a single working beam in any time slot cannot exceed the per-beam maximum power P_b; and constraint (8) states that the transmission delay of a data packet cannot exceed T_th.
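The per-slot constraints on beam count and power can be checked with a small helper; this is a sketch against the constraints named above (at most K beams, total power, per-beam power), with the function name and argument layout chosen for illustration:

```python
def feasible(pattern, power, K, P_tot, P_b):
    """Check one slot against constraints (5)-(7): at most K active beams,
    total active power <= P_tot, per-active-beam power <= P_b."""
    if sum(pattern) > K:                       # constraint (5)
        return False
    used = [p * x for p, x in zip(power, pattern)]
    if sum(used) > P_tot:                      # constraint (6)
        return False
    if any(p > P_b for p in used):             # constraint (7)
        return False
    return True
```

Constraint (8), the delay bound, is enforced separately at the queue level by discarding packets older than T_th.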
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the step 3 is as follows:
step 301, state design of a deep reinforcement learning algorithm:
the state s_t is defined by two attributes, the number of data packets and the average transmission delay of the data packets, and is expressed by formula (9):

s_t = [u_{j,1}, ..., u_{j,N}, v_{j,1}, ..., v_{j,N}]    (9)

where u_{j,n} is the total number of data packets in the buffer queue B_{j,n} of c_n at t_j, and v_{j,n} is the average transmission delay of the data packets in B_{j,n} at t_j;
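The state construction above (per-cell packet count plus per-cell mean waiting delay, concatenated into one vector) can be sketched as follows; the queue layout, in which each cell's buffer is a list of arrival slot indices, is an illustrative assumption:

```python
import numpy as np

def build_state(queues, t_j):
    """State s_t: for each ground cell's buffer queue, (i) the packet
    count and (ii) the mean waiting delay of queued packets.

    queues[n] is a list of arrival slots t_k for cell n (assumed layout).
    """
    counts = np.array([len(q) for q in queues], dtype=float)
    delays = np.array([np.mean([t_j - t_k for t_k in q]) if q else 0.0
                       for q in queues])
    return np.concatenate([counts, delays])
```

The resulting vector has length 2N and feeds directly into the decision neural network.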
step 302, action design of a deep reinforcement learning algorithm:
the hopping beam pattern X_j serves as the agent's output action a_j at t_j; a set of hopping beam patterns, Set = {X_1, X_2, ..., X_k, ..., X_num}, obtained through an iterative algorithm, serves as the action space of the deep reinforcement learning algorithm,
where X_k is the kth hopping beam pattern in the set, 1 ≤ k ≤ num, and num is the number of hopping beam patterns in the set;
the iterative algorithm specifically comprises the following steps:
(1) initializing the ground wave position numbering, dividing the ground wave positions into M clusters, setting the co-channel interference threshold, and letting i = 1 and k = 1;
(2) if all ground wave positions in the ith cluster are already contained in Set, selecting one ground wave position from the ith cluster and lighting it; otherwise, selecting from the ith cluster one ground wave position not contained in Set and lighting it;
(3) calculating the co-channel reuse distance of the working beams; if it is greater than the co-channel interference threshold, adding the ground wave position selected in step (2) to the hopping beam pattern X_k; otherwise, selecting the ground wave position in the ith cluster that maximizes the co-channel reuse distance and adding it to X_k;
(4) if i ≠ M, letting i = i + 1 and returning to step (2); if i = M, adding X_k to Set;
(5) if the elements of Set satisfy X_1 ∪ X_2 ∪ ... ∪ X_k = Ψ, terminating the iteration; otherwise letting k = k + 1 and i = 1 and returning to step (2);
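The iterative pattern construction can be sketched in Python. This is a minimal reading of the steps above under stated assumptions: wave positions are points in a plane, the co-channel reuse distance is the minimum Euclidean distance to cells already lit in the current pattern, and the tie-breaking order within a cluster is arbitrary:

```python
def build_pattern_set(cells, clusters, d_min):
    """Iteratively build hopping patterns: one cell per cluster per pattern,
    preferring uncovered cells whose reuse distance to cells already in the
    pattern exceeds d_min, until every cell appears in some pattern.

    cells:    dict cell_id -> (x, y) position (illustrative stand-in)
    clusters: list of lists of cell ids partitioning the cells
    d_min:    co-channel interference (reuse-distance) threshold
    """
    covered, patterns = set(), []
    while covered != set(cells):
        pattern = []
        for cluster in clusters:
            remaining = [c for c in cluster if c not in covered] or list(cluster)
            def reuse_dist(c):
                if not pattern:
                    return float("inf")
                return min(((cells[c][0] - cells[p][0]) ** 2 +
                            (cells[c][1] - cells[p][1]) ** 2) ** 0.5
                           for p in pattern)
            pick = next((c for c in remaining if reuse_dist(c) > d_min), None)
            if pick is None:  # fall back: farthest cell in the cluster
                pick = max(cluster, key=reuse_dist)
            pattern.append(pick)
        patterns.append(pattern)
        covered |= set(pattern)
    return patterns
```

With well-separated clusters this yields patterns in which simultaneously lit cells keep at least the reuse distance apart, which is the interference-avoidance criterion the method is built on.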
step 303, reward design of the deep reinforcement learning algorithm:
since the optimization objective (4) minimizes the total data packet transmission delay, the reward is set so that a smaller total transmission delay of the transmitted data packets yields a larger reward; the reward can be set as

r_j = −||V_j ∘ X_j||    (10)

where V_j = [v'_{j,1}, ..., v'_{j,N}], v'_{j,n} being the sum of the transmission delays of the data packets of c_n transmitted at t_j; ∘ denotes the Hadamard (element-wise) product, and ||·|| denotes the sum of all elements of the matrix.
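As a hedged illustration of the reward computation (a Hadamard product with the hopping pattern followed by a sum over all elements), the sketch below adopts a negative sign so that smaller served delay gives larger reward; that sign convention, and treating `delays` as the per-cell delay sums, are assumptions rather than the patent's exact formula:

```python
import numpy as np

def reward(delays, pattern):
    """Reward r_j = -|| delays o pattern ||: element-wise product of the
    per-cell delay sums with the 0/1 hopping pattern, summed over cells,
    negated so that lower total delay means higher reward (assumed sign)."""
    return -float(np.sum(np.asarray(delays) * np.asarray(pattern)))
```

An agent maximizing this reward is pushed toward patterns that serve queues before their packets accumulate delay, aligning the reward with objective (4).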
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the step 4 is specifically as follows:
step I, initializing a satellite scene, and initializing a data packet buffer queue;
step II, initializing a satellite agent, initializing weight parameters of a decision neural network and a target network, initializing the training step number step of the decision neural network to be 0, and setting the updating step length of the target network to be G;
step III, initializing the capacity of the experience pool, setting the number of training episodes E and the number J of beam-hopping time slots per episode, and initializing the training episode index e = 1 and the time-slot index j = 1;
step IV, at t_j the data packets arrive at the ground wave positions, and the satellite environment state information at this time is observed and extracted as s_j;
step V, with probability ε, randomly selecting one hopping beam pattern X_k from the Set obtained in step 302 as the action a_j; otherwise, with probability 1 − ε, selecting as a_j the action corresponding to the maximum action value output by the decision neural network;
step VI, executing the action a_j selected in step V, whereupon the environment transitions to the next state s_{j+1} and the reward r_j is obtained;
step VII, storing the experience tuple (s_j, a_j, r_j, s_{j+1}) in the experience pool;
step VIII, randomly sampling several pieces of experience from the experience pool, calculating the loss function, and training the decision neural network with the Adam algorithm, with step = step + 1;
step IX, if the training step count step is a multiple of G, updating the weight parameters of the target network to those of the decision neural network and then executing step X;
if step is not a multiple of G, executing step X directly;
step X, first judging whether j equals J: if j ≠ J, letting j = j + 1 and returning to step IV;
if j = J, further judging whether e equals E: if e ≠ E, letting e = e + 1, decreasing the exploration probability ε, and returning to step IV; if e = E, terminating the training.
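The training procedure of steps I-X can be sketched as a compact loop. This is structure only, under stated simplifications: a linear Q-function stands in for the decision neural network, plain SGD replaces the Adam optimizer, and the environment is abstracted as a callable; all hyperparameter defaults are illustrative:

```python
import random
import numpy as np

class ReplayBuffer:
    """Experience pool (step III / VII / VIII)."""
    def __init__(self, capacity=5000):
        self.buf, self.cap = [], capacity
    def push(self, item):
        self.buf.append(item)
        if len(self.buf) > self.cap:
            self.buf.pop(0)
    def sample(self, k):
        return random.sample(self.buf, min(k, len(self.buf)))

def train(env_step, init_state, n_actions, state_dim,
          episodes=3, slots=20, gamma=0.9, lr=1e-3, eps=0.5, G=10):
    """Epsilon-greedy action over the pattern set, replay sampling, and
    periodic target sync, mirroring steps IV-X of the text."""
    W = np.zeros((n_actions, state_dim))   # decision "network" (linear)
    W_target = W.copy()                    # target network
    buf, step = ReplayBuffer(), 0
    for e in range(episodes):
        s = init_state
        for j in range(slots):
            a = (random.randrange(n_actions) if random.random() < eps
                 else int(np.argmax(W @ s)))              # step V
            s2, r = env_step(s, a)                        # step VI
            buf.push((s, a, r, s2))                       # step VII
            for bs, ba, br, bs2 in buf.sample(8):         # step VIII
                target = br + gamma * np.max(W_target @ bs2)
                W[ba] += lr * (target - W[ba] @ bs) * bs
            step += 1
            if step % G == 0:                             # step IX
                W_target = W.copy()
            s = s2
        eps *= 0.9                                        # step X: anneal epsilon
    return W
```

In the patent's setting `env_step` would advance the packet queues under the chosen hopping pattern and return the new state plus the delay-based reward.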
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
the invention provides a feasible scheme for resource allocation of a satellite communication system based on beam hopping, a deep reinforcement learning algorithm is introduced, a satellite is modeled into an intelligent body, a decision neural network of the algorithm is continuously trained by designing the state, action and reward of the deep reinforcement learning algorithm, and finally the decision neural network obtained by training is used for completing the resource allocation of the beam hopping satellite system.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a forward link traffic model of a beam-hopping satellite system;
FIG. 3 is a deep reinforcement learning algorithm framework diagram;
FIG. 4 is a flow chart of an iterative algorithm for beam hopping pattern design;
FIG. 5 is a graph comparing the data packet transmission delay of the deep reinforcement learning algorithm, the random allocation algorithm, and the fixed allocation algorithm;
FIG. 6 is a graph comparing the average system throughput of the deep reinforcement learning algorithm, the random allocation algorithm, and the fixed allocation algorithm.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
referring to fig. 1, the specific implementation steps of the present invention are as follows:
The forward link traffic model of a beam-hopping satellite system is shown in fig. 2. In a beam-hopping satellite system, the set of ground wave positions is defined as Ψ = {c_n | n = 1, 2, 3, ..., N}, where N is the total number of ground wave positions and c_n is the nth ground wave position; the maximum number of working beams is K, with K ≤ N; the beam-hopping period is defined as T = {t_1, t_2, ..., t_j, ..., t_J}, where t_j is the jth beam-hopping time slot, 1 ≤ j ≤ J, and J is the total number of beam-hopping time slots;
the hopping beam pattern at t_j is X_j = [x_{j,1}, ..., x_{j,n}, ..., x_{j,N}], where x_{j,n} ∈ {0, 1} indicates whether c_n is illuminated by a working beam at t_j: x_{j,n} = 1 means that a working beam illuminates c_n at t_j, and x_{j,n} = 0 means that no working beam illuminates c_n at t_j;
the signal-to-interference-plus-noise ratio of c_n at t_j is

SINR_{j,n} = (g_{j,n,n} p_{j,n}) / (N_0 W + Σ_{i≠n} g_{j,i,n} p_{j,i} x_{j,i})

where c_i is the ith ground wave position; g_{j,i,n} is the power gain toward c_n when a working beam illuminates c_i at t_j, comprising the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; g_{j,n,n} is the power gain toward c_n when a working beam illuminates c_n at t_j; p_{j,n} is the satellite transmit power of the working beam toward c_n at t_j; p_{j,i} is the satellite transmit power of the working beam toward c_i at t_j; N_0 is the noise power spectral density; W is the satellite spectrum bandwidth; and x_{j,i} indicates whether a working beam illuminates c_i at t_j;
Wherein f isDVB-S2() is a piecewise function of the european telecommunications standards institute standard on signal to interference plus noise ratio and spectral efficiency.
Step 2, establishing the optimization problem of minimizing the transmission delay and solving for the optimal scheme of hopping-beam resource allocation.
The traffic newly arriving at c_n at t_j is defined as d_{j,n}; the data packets are stored in the packet buffer queue B_{j,n} = {d_{j−q,n} | 0 ≤ q ≤ T_th}, where B_{j,n} is the data packet buffer queue of c_n at t_j, d_{j−q,n} is the traffic that arrived at c_n in the (j−q)th beam-hopping time slot t_{j−q}, and T_th is the maximum transmission delay of a data packet;
if the transmission delay of a data packet, τ = t_j − t_k, exceeds T_th, the data packet is discarded, where t_j is the time slot in which the data packet is transmitted and t_k is the time slot in which the data packet arrived at the ground wave position.
In summary, the following optimization problem P for minimizing the data packet transmission delay is established:

P:  min  Σ_{t_j ∈ T} Σ_{c_n ∈ Ψ} Σ_{d_{k,n} ∈ B_{j,n}} (t_j − t_k)    (4)
s.t.  Σ_{n=1}^{N} x_{j,n} ≤ K                                         (5)
      Σ_{n=1}^{N} x_{j,n} p_{j,n} ≤ P_tot                             (6)
      0 ≤ p_{j,n} ≤ P_b                                               (7)
      t_j − t_k ≤ T_th                                                (8)

where d_{k,n} is the data packet that arrived at c_n at t_k. Constraint (5) states that the number of working beams in a single slot cannot exceed K, x_{j,n} indicating whether a working beam illuminates c_n at t_j; constraint (6) states that the sum of the working-beam powers in any time slot of the beam-hopping period cannot exceed the total satellite power P_tot; constraint (7) states that the power of a single working beam in any time slot cannot exceed the per-beam maximum power P_b; and constraint (8) states that the transmission delay of a data packet cannot exceed T_th.
Step 3, using a deep reinforcement learning algorithm, modeling the satellite as an agent and designing the state input of the agent, the output decision actions of the agent, and the reward that evaluates the quality of the actions.
The deep reinforcement learning algorithm framework is shown in fig. 3. The decision neural network is a mapping function from states to action values and decides the behavior of the agent. In addition, to improve the performance of the decision neural network, a target network and an experience pool are added to the deep reinforcement learning framework. The specific design of the algorithm is as follows:
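The role of the target network is to stabilize training by freezing the bootstrap target between periodic syncs. A hedged sketch of the resulting TD loss is given below; `q_net` and `target_net` are illustrative callables from state to a vector of action values, not the patent's network definitions:

```python
import numpy as np

def dqn_loss(q_net, target_net, batch, gamma=0.9):
    """Mean-squared TD error with a frozen target network: the target
    y = r + gamma * max_a' Q_target(s', a') uses target_net, while the
    prediction uses q_net, so the target does not move every update."""
    errs = []
    for s, a, r, s2 in batch:
        y = r + gamma * np.max(target_net(s2))
        errs.append((y - q_net(s)[a]) ** 2)
    return float(np.mean(errs))
```

In the text's procedure this loss is computed on minibatches sampled from the experience pool and minimized with Adam, and the target network's weights are overwritten with the decision network's every G steps.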
step 301, state design of a deep reinforcement learning algorithm:
the state s_t is defined by two attributes, the number of data packets and the average transmission delay of the data packets, and is expressed by formula (9):

s_t = [u_{j,1}, ..., u_{j,N}, v_{j,1}, ..., v_{j,N}]    (9)

where u_{j,n} is the total number of data packets in the buffer queue B_{j,n} of c_n at t_j, and v_{j,n} is the average transmission delay of the data packets in B_{j,n} at t_j;
step 302, designing the action of the deep reinforcement learning algorithm:
the hopping beam pattern X_j serves as the agent's output action a_j at t_j; a set of hopping beam patterns, Set = {X_1, X_2, ..., X_k, ..., X_num}, obtained through an iterative algorithm, serves as the action space of the deep reinforcement learning algorithm,
where X_k is the kth hopping beam pattern in the set, 1 ≤ k ≤ num, and num is the number of hopping beam patterns in the set;
the beam hopping pattern design flow is shown in fig. 4, and the iterative algorithm specifically includes the following processes:
(1) initializing the ground wave position numbering, dividing the ground wave positions into M clusters, setting the co-channel interference threshold, and letting i = 1 and k = 1;
(2) if all ground wave positions in the ith cluster are already contained in Set, selecting one ground wave position from the ith cluster and lighting it; otherwise, selecting from the ith cluster one ground wave position not contained in Set and lighting it;
(3) calculating the co-channel reuse distance of the working beams; if it is greater than the co-channel interference threshold, adding the ground wave position selected in step (2) to the hopping beam pattern X_k; otherwise, selecting the ground wave position in the ith cluster that maximizes the co-channel reuse distance and adding it to X_k;
(4) if i ≠ M, letting i = i + 1 and returning to step (2); if i = M, adding X_k to Set;
(5) if the elements of Set satisfy X_1 ∪ X_2 ∪ ... ∪ X_k = Ψ, terminating the iteration; otherwise letting k = k + 1 and i = 1 and returning to step (2);
step 303, reward design of the deep reinforcement learning algorithm:
The optimization problem (4) aims at minimizing the transmission delay of the data packets, and from the state design the delay attribute reflects, for each ground wave position c_n, the transmission delay accumulated by the packets queued in time slot t_j. Therefore, the smaller the total transmission delay of the transmitted data packets, the larger the reward is set; the reward of the deep reinforcement learning algorithm can be set as

r_j = −||V_j ∘ X_j||    (10)

where V_j = [v'_{j,1}, ..., v'_{j,N}], v'_{j,n} being the sum of the transmission delays of the data packets of c_n transmitted at t_j; ∘ denotes the Hadamard (element-wise) product, and ||·|| denotes the sum of all elements of the matrix.
Step 4, setting the parameters of the deep reinforcement learning algorithm and continuously training and optimizing the weight parameters of the decision neural network.
The algorithm comprises the following specific steps:
step I, initializing a satellite scene, and initializing a data packet buffer queue;
step II, initializing a satellite agent, initializing weight parameters of a decision neural network and a target network, initializing the training step number step of the decision neural network to be 0, and setting the updating step length of the target network to be G;
step III, initializing the capacity of the experience pool, setting the number of training episodes E and the number J of beam-hopping time slots per episode, and initializing the training episode index e = 1 and the time-slot index j = 1;
step IV, at t_j the data packets arrive at the ground wave positions, and the satellite environment state information at this time is observed and extracted as s_j;
step V, with probability ε, randomly selecting one hopping beam pattern X_k from the Set obtained in step 302 as the action a_j; otherwise, with probability 1 − ε, selecting as a_j the action corresponding to the maximum action value output by the decision neural network;
step VI, executing the action a_j selected in step V, whereupon the environment transitions to the next state s_{j+1} and the reward r_j is obtained;
step VII, storing the experience tuple (s_j, a_j, r_j, s_{j+1}) in the experience pool;
step VIII, randomly sampling several pieces of experience from the experience pool, calculating the loss function, and training the decision neural network with the Adam algorithm, with step = step + 1;
step IX, if the training step count step is a multiple of G, updating the weight parameters of the target network to those of the decision neural network and then executing step X;
if step is not a multiple of G, executing step X directly;
step X, first judging whether j equals J: if j ≠ J, letting j = j + 1 and returning to step IV;
if j = J, further judging whether e equals E: if e ≠ E, letting e = e + 1, decreasing the exploration probability ε, and returning to step IV; if e = E, terminating the training.
Step 5, finally, the decision neural network obtained from training completes the dynamic allocation of the beam-hopping resources.
The normalized traffic is first defined as the total data packet traffic of the ground wave positions divided by the maximum available capacity of the satellite. Next, the decision neural network trained in step 4 is used for the dynamic resource allocation of the beam-hopping satellite system. Finally, the data packet transmission delay and system throughput of three algorithms, namely the deep-reinforcement-learning-based resource allocation, the random allocation algorithm, and the fixed allocation algorithm, are compared under different normalized traffic conditions. The random allocation algorithm selects the working beams at random in each time slot, and the fixed allocation algorithm allocates a fixed number of time slots to each beam. The simulation results are shown in fig. 5 and fig. 6.
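The two baselines described above are simple to state precisely; the sketch below is one plausible reading (the fixed scheme is implemented as a round-robin over cells, which is an assumption, since the text only says each beam gets a fixed number of slots):

```python
import random

def random_allocation(n_cells, K):
    """Baseline 1: K working beams chosen uniformly at random each slot."""
    lit = set(random.sample(range(n_cells), K))
    return [1 if n in lit else 0 for n in range(n_cells)]

def fixed_allocation(n_cells, K, slot):
    """Baseline 2 (assumed round-robin): beams visit cells in a fixed
    cyclic order, K cells at a time, independent of traffic."""
    start = (slot * K) % n_cells
    lit = {(start + i) % n_cells for i in range(K)}
    return [1 if n in lit else 0 for n in range(n_cells)]
```

Neither baseline reacts to queue state, which is why the traffic-driven DRL policy can outperform both on delay and throughput in the reported comparison.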
The effects of the present invention can be further verified by the following simulation.
1. An experimental scene is as follows:
to illustrate the effect of the method, comparison experiment results are given by simulating a GEO satellite system model with 36 ground wave positions and 6 beams.
2. Experimental contents and results:
to verify the performance of the method, a beam-hopping GEO system model with 36 ground wave positions and 6 beams is adopted; the maximum data packet transmission delay threshold in the satellite scenario parameters is set to 4 s, the beam-hopping time slot to 100 ms, and the number of time slots per beam-hopping period to 256. In the deep reinforcement learning algorithm, the number of training episodes is set to 1000, the experience pool size to 5000, the activation function of the decision neural network to ReLU, the initial exploration probability to 0.5, and the final exploration probability to 0.01.
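Collected in one place, the simulation parameters stated above can be written as a configuration dictionary; the dictionary layout and key names are illustrative, the values are taken from the text:

```python
# Simulation parameters as stated in the experiment description;
# the dict layout and key names are illustrative.
sim_config = {
    "ground_cells": 36,          # ground wave positions
    "beams": 6,                  # working beams
    "max_delay_s": 4.0,          # packet discard threshold T_th
    "slot_ms": 100,              # beam-hopping slot length
    "slots_per_period": 256,     # slots per beam-hopping period
    "episodes": 1000,            # training episodes E
    "replay_capacity": 5000,     # experience pool size
    "activation": "relu",        # decision-network activation
    "eps_start": 0.5,            # initial exploration probability
    "eps_end": 0.01,             # final exploration probability
}
```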
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (5)
1. A dynamic resource allocation method for a beam-hopping satellite system based on deep reinforcement learning is characterized by comprising the following steps:
step 1, establishing a service model of a forward link of a beam-hopping satellite system according to the characteristic of uneven time-space distribution of services of the beam-hopping GEO satellite system;
step 2, storing a data packet of a service reaching a ground wave position in each time slot in a data packet buffer queue according to the service model of the forward link of the beam hopping satellite system established in the step 1, wherein the data packet obeys the principle of first-come first-serve and establishes an optimization problem of minimizing the transmission delay of the data packet by combining the capacity which can be provided by a satellite;
step 3, introducing a deep reinforcement learning algorithm, modeling a resource allocation module of the satellite into an intelligent agent, and designing state input of the intelligent agent, output decision-making action of the intelligent agent and reward of evaluation action;
step 4, simulating the deep reinforcement learning algorithm in the step 3, initializing a satellite scene, setting parameters of the deep reinforcement learning algorithm, and continuously training decision neural network weight parameters of the deep reinforcement learning algorithm;
and 5, completing dynamic resource allocation of the beam hopping satellite system by the decision neural network obtained by training in the step 4, and solving an optimal scheme of resource allocation of the beam hopping satellite system.
2. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 1, wherein the service model of the forward link of the beam-hopping satellite system established in step 1 is specifically as follows:
in the beam-hopping satellite system, the set of ground wave positions is defined as Ψ = {c_n, n = 1, 2, 3, ..., N}, where N represents the total number of ground wave positions and c_n is the n-th ground wave position; the maximum number of working beams is K, with K ≤ N; the beam-hopping period is defined as T = {t_1, t_2, ..., t_j, ..., t_J}, where t_j is the j-th beam-hopping time slot, 1 ≤ j ≤ J, and J is the total number of beam-hopping time slots;
the beam-hopping pattern at time t_j is X^j = [x_1^j, x_2^j, ..., x_N^j], where x_n^j ∈ {0, 1} indicates whether c_n is illuminated by a working beam at t_j: x_n^j = 1 represents that a working beam illuminates c_n at t_j, and x_n^j = 0 represents that no working beam illuminates c_n at t_j;
the signal-to-interference-plus-noise ratio of c_n at t_j is

SINR_n^j = (p_n^j · g_{n,n}^j) / (N_0 · W + Σ_{i≠n} x_i^j · p_i^j · g_{i,n}^j)

wherein c_i is the i-th ground wave position, taken as a reference ground wave position; g_{i,n}^j represents the power gain towards c_n when a working beam illuminates c_i at t_j, and includes the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; g_{n,n}^j represents the power gain towards c_n when a working beam illuminates c_n at t_j; p_n^j represents the satellite transmit power of the working beam serving c_n at t_j; p_i^j represents the satellite transmit power of the working beam serving c_i at t_j; N_0 is the noise power spectral density; W is the satellite spectrum bandwidth; and x_i^j = 1 represents that a working beam illuminates c_i at t_j;
the capacity that the satellite can offer to c_n at t_j is C_n^j = W · f_DVB-S2(SINR_n^j), wherein f_DVB-S2(·) is the piecewise function relating signal-to-interference-plus-noise ratio to spectral efficiency in the European Telecommunications Standards Institute DVB-S2 standard.
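The link calculation of claim 2 can be sketched as follows; the threshold table is an illustrative placeholder rather than the official ETSI DVB-S2 MODCOD table, and all function and variable names are assumptions:

```python
import math

# Illustrative piecewise mapping from SINR (dB) to spectral efficiency
# (bit/s/Hz); the thresholds are placeholders, not the ETSI MODCOD table.
DVB_S2_TABLE = [(-2.0, 0.5), (1.0, 1.0), (4.0, 1.5), (7.0, 2.0),
                (10.0, 2.5), (13.0, 3.0), (16.0, 3.5)]

def f_dvb_s2(sinr_db):
    """Spectral efficiency of the highest threshold not exceeding sinr_db."""
    eff = 0.0
    for threshold, se in DVB_S2_TABLE:
        if sinr_db >= threshold:
            eff = se
    return eff

def offered_capacity(x, p, g, n, n0, w):
    """Capacity offered to wave position c_n in one beam-hopping slot.

    x  : 0/1 list, x[i] = 1 if a working beam illuminates c_i
    p  : per-beam transmit power (W)
    g  : g[i][n], power gain from the beam serving c_i towards c_n
         (transmit gain, free-space loss, rain attenuation, receive gain)
    n0 : noise power spectral density; w : bandwidth (Hz)
    """
    signal = p[n] * g[n][n]
    interference = sum(x[i] * p[i] * g[i][n]
                       for i in range(len(x)) if i != n)
    sinr = signal / (n0 * w + interference)
    return w * f_dvb_s2(10 * math.log10(sinr))
```

With two simultaneously lit wave positions, the cross-gain term g[i][n] directly reduces the spectral efficiency the served position can achieve, which is why the pattern-construction step later enforces a co-channel reuse distance.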
3. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 1, wherein the specific process of establishing the optimization problem of minimizing the data packet transmission delay in step 2 is as follows:
the data packets newly arriving at c_n at time t_j are denoted A_n^j; the data packets are stored in the data packet buffer queue B_n^j = {A_n^{j−q}, 0 ≤ q ≤ T_th}, wherein B_n^j is the data packet buffer queue of c_n at t_j, A_n^{j−q} represents the data packets that arrived at c_n in the (j−q)-th beam-hopping time slot t_{j−q}, and T_th is the maximum transmission delay of a data packet;
if the transmission delay τ_n of a data packet exceeds T_th, the data packet is discarded; the transmission delay is defined as τ_n = t_j − t_k, wherein t_j is the time slot in which the data packet is transmitted and t_k is the time slot in which the data packet arrived at the ground wave position;
in summary, the following optimization problem P of minimizing the data packet transmission delay is established:

P: min Σ_{t_j ∈ T} Σ_{c_n ∈ Ψ} (t_j − t_k) · D_n^k    (4)

s.t. Σ_{n=1}^{N} x_n^j ≤ K, ∀ t_j ∈ T    (5)

Σ_{n=1}^{N} x_n^j · p_n^j ≤ P_tot, ∀ t_j ∈ T    (6)

0 ≤ p_n^j ≤ P_b, ∀ t_j ∈ T, ∀ c_n ∈ Ψ    (7)

t_j − t_k ≤ T_th    (8)

wherein D_n^k represents the data packets arriving at c_n at t_k. Equation (5) indicates that the number of working beams in a single time slot cannot exceed K, where x_n^j = 1 represents that a working beam illuminates c_n at t_j; equation (6) indicates that the sum of the working-beam powers in any time slot of the beam-hopping period cannot exceed the total satellite power P_tot; equation (7) indicates that the power of a single working beam in any time slot of the beam-hopping period cannot exceed the maximum single-beam power P_b; and equation (8) indicates that the transmission delay of a data packet cannot exceed T_th.
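The queueing discipline of claim 3 — a first-come-first-served buffer per ground wave position, with packets discarded once their delay exceeds T_th — can be sketched as follows; the class and method names are illustrative assumptions:

```python
from collections import deque

class WavePositionBuffer:
    """FCFS data packet buffer for one ground wave position.

    Each entry is (arrival_slot, size_bits). A packet whose delay
    t_j - t_k exceeds T_th is dropped, as in claim 3.
    """
    def __init__(self, t_th):
        self.t_th = t_th
        self.queue = deque()
        self.dropped = 0

    def arrive(self, slot, sizes):
        """Enqueue the packets arriving in this slot (first come, first served)."""
        for size in sizes:
            self.queue.append((slot, size))

    def drop_expired(self, slot):
        """Discard head-of-line packets whose delay already exceeds T_th."""
        while self.queue and slot - self.queue[0][0] > self.t_th:
            self.queue.popleft()
            self.dropped += 1

    def serve(self, slot, capacity_bits):
        """Transmit FCFS until the slot capacity is exhausted.

        Returns the summed transmission delay (t_j - t_k) of the
        packets actually transmitted, the quantity problem P minimizes.
        """
        self.drop_expired(slot)
        total_delay = 0
        while self.queue and self.queue[0][1] <= capacity_bits:
            t_k, size = self.queue.popleft()
            capacity_bits -= size
            total_delay += slot - t_k
        return total_delay
```

For example, a buffer with T_th = 3 holding two 100-bit packets from slot 0 can transmit only one of them with 150 bits of capacity at slot 2; by slot 5 the remaining packet has exceeded T_th and is dropped.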
4. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 3, wherein the step 3 is as follows:
step 301, state design of a deep reinforcement learning algorithm:
the state s_j is defined by two attributes of each data packet buffer queue, the number of data packets and the average transmission delay of the data packets, and is expressed by formula (9):

s_j = [Len(B_1^j), ..., Len(B_N^j), Delay(B_1^j), ..., Delay(B_N^j)]    (9)

wherein Len(B_n^j) represents the total number of data packets in B_n^j at t_j, defined as Len(B_n^j) = Σ_{q=0}^{T_th} |A_n^{j−q}|, with |A_n^{j−q}| the number of data packets in A_n^{j−q}; Delay(B_n^j) represents the average transmission delay of the data packets in B_n^j at t_j;
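The state construction of step 301 — one packet count and one mean delay per buffer queue, concatenated into a single vector — can be sketched as follows; the function name and the queue representation are assumptions:

```python
def build_state(buffers, slot):
    """Assemble the state of formula (9) in step 301.

    buffers: one queue per ground wave position, each a list of
             (arrival_slot, size_bits) tuples.
    Returns [count_1, ..., count_N, mean_delay_1, ..., mean_delay_N].
    """
    counts, mean_delays = [], []
    for q in buffers:
        counts.append(len(q))
        if q:
            # delay of a waiting packet is current slot minus arrival slot
            mean_delays.append(sum(slot - t_k for t_k, _ in q) / len(q))
        else:
            mean_delays.append(0.0)
    return counts + mean_delays
```

Concatenating counts and delays keeps the state dimension fixed at 2N regardless of how many packets are buffered, which is what lets a fixed-size decision neural network consume it.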
step 302, designing the action of the deep reinforcement learning algorithm:
the beam-hopping pattern X^j serves as the output action a_j of the agent at t_j; a set of beam-hopping patterns is obtained through an iterative algorithm and used as the action space of the deep reinforcement learning algorithm: Set = {X_1, X_2, ..., X_num};
wherein X_k is the k-th beam-hopping pattern in the beam-hopping pattern set, 1 ≤ k ≤ num, and num is the number of beam-hopping patterns in the set;
the iterative algorithm specifically comprises the following steps:
(1) initializing the ground wave position numbering, dividing the ground wave positions into M clusters, and setting a co-channel interference threshold, with i = 1 and k = 1;
(2) if all the ground wave positions in the i-th cluster are contained in Set, selecting one ground wave position from the i-th cluster and illuminating it; otherwise, selecting from the i-th cluster one ground wave position not contained in Set and illuminating it;
(3) calculating the co-channel reuse distance of the working beams; if the co-channel reuse distance is greater than the co-channel interference threshold, adding the ground wave position selected in step (2) to the beam-hopping pattern X_k; otherwise, selecting the ground wave position in the i-th cluster that maximizes the co-channel reuse distance and adding it to X_k;
(4) if i ≠ M, setting i = i + 1 and returning to step (2); if i = M, adding X_k to Set;
(5) if the elements in Set satisfy X_1 ∪ X_2 ∪ ... ∪ X_k = Ψ, terminating the iteration; otherwise setting k = k + 1, i = 1, and returning to step (2);
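A simplified sketch of the iterative pattern construction of step 302; the planar geometry, the Euclidean reuse distance, and the tie-breaking are illustrative simplifications of the claimed steps, not the claimed method itself:

```python
import math

def build_patterns(positions, clusters, d_min):
    """Greedy sketch of step 302: build beam-hopping patterns with one
    illuminated wave position per cluster until every position is covered.

    positions: {name: (x, y)} illustrative plane coordinates
    clusters : the M clusters, as lists of position names
    d_min    : co-channel interference threshold on the reuse distance
    (A sketch only; no termination guarantee for pathological geometries.)
    """
    covered, patterns = set(), []
    while covered != set(positions):
        pattern = []
        for cluster in clusters:
            # step (2): prefer a position not yet contained in any pattern
            fresh = [c for c in cluster if c not in covered] or cluster

            def reuse_dist(c):
                return min((math.dist(positions[c], positions[o])
                            for o in pattern), default=float("inf"))

            best = max(fresh, key=reuse_dist)
            if reuse_dist(best) < d_min:
                # step (3): fall back to the in-cluster position that
                # maximizes the co-channel reuse distance
                best = max(cluster, key=reuse_dist)
            pattern.append(best)
        patterns.append(pattern)      # step (4): pattern X_k complete
        covered.update(pattern)       # step (5): repeat until Ψ is covered
    return patterns
```

Each pattern lights exactly one position per cluster, so spatially separated clusters supply the frequency-reuse distance while the outer loop guarantees every position eventually appears in some pattern.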
step 303, reward design of the deep reinforcement learning algorithm:
wherein |·| represents the sum of all the elements in the matrix.
5. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 4, wherein the step 4 is as follows:
step I, initializing a satellite scene, and initializing a data packet buffer queue;
step II, initializing a satellite agent, initializing weight parameters of a decision neural network and a target network, initializing the training step number step of the decision neural network to be 0, and setting the updating step length of the target network to be G;
step III, initializing the capacity of the experience pool, and setting the number of training episodes E and the number of beam-hopping time slots J per episode, with the training episode index initialized to e = 1 and the time slot index initialized to j = 1;
step IV, at t_j the data packets arrive at the ground wave positions, and the satellite environment state information at that time is observed and extracted as the state s_j;
step V, with probability ε, randomly selecting one beam-hopping pattern X_k from the Set obtained in step 302 as the action a_j; or, with probability 1 − ε, selecting as the action a_j the action corresponding to the maximum action value output by the decision neural network;
step VI, executing the action a_j selected in step V, whereupon the environment transitions to the next state s_{j+1} and the reward r_j at that time is obtained;
step VII, storing the experience tuple (s_j, a_j, r_j, s_{j+1}) in the experience pool;
step VIII, randomly sampling a batch of experience tuples from the experience pool, calculating the loss function, and training the decision neural network with the Adam algorithm, with step = step + 1;
step IX, if step number step of training the decision neural network is a multiple of G, updating the weight parameter of the target network to be the weight parameter of the decision neural network, and executing step X;
if step number step of training decision neural network is not multiple of G, executing step X;
step X, first judging whether j is equal to J: if j ≠ J, setting j = j + 1 and returning to step IV;
if j = J, further judging whether e is equal to E: if e ≠ E, setting e = e + 1 and j = 1, reducing the probability ε, and returning to step IV; if e = E, terminating the training.
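The training procedure of claim 5 — ε-greedy selection over the pattern Set, an experience pool, and a target network synchronized every G steps — can be sketched as follows; a linear Q-function trained by plain SGD stands in for the decision neural network and the Adam optimizer of the claim, and all hyperparameter values are illustrative:

```python
import random
from collections import deque
import numpy as np

def train_dqn(env, num_actions, state_dim, episodes=50, slots=20,
              g_sync=10, batch=16, gamma=0.9, lr=0.01, eps=1.0):
    """Control-flow sketch of steps I-X of claim 5.

    env must provide reset() -> state and step(action) -> (state, reward);
    both the environment interface and the linear Q-function are
    assumptions made to keep the sketch self-contained.
    """
    w = np.zeros((num_actions, state_dim))   # decision "network" weights
    w_target = w.copy()                      # target network weights
    pool = deque(maxlen=1000)                # experience pool
    step = 0
    for e in range(episodes):
        s = env.reset()
        for j in range(slots):
            # step V: epsilon-greedy choice over the action (pattern) set
            if random.random() < eps:
                a = random.randrange(num_actions)
            else:
                a = int(np.argmax(w @ s))
            # step VI: execute the action, observe next state and reward
            s2, r = env.step(a)
            pool.append((s, a, r, s2))       # store in the experience pool
            if len(pool) >= batch:
                # step VIII: sample a batch and take one SGD pass
                for bs, ba, br, bs2 in random.sample(list(pool), batch):
                    target = br + gamma * np.max(w_target @ bs2)
                    td = target - w[ba] @ bs
                    w[ba] += lr * td * bs
                step += 1
                # step IX: sync the target network every g_sync steps
                if step % g_sync == 0:
                    w_target = w.copy()
            s = s2
        # step X: next episode with a reduced exploration probability
        eps = max(0.05, eps * 0.95)
    return w
```

The frozen target network in the TD target and the random sampling from the pool are what decorrelate consecutive slots of the beam-hopping period during training.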
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111609439.0A CN114499629B (en) | 2021-12-24 | 2021-12-24 | Dynamic allocation method for jumping beam satellite system resources based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114499629A true CN114499629A (en) | 2022-05-13 |
CN114499629B CN114499629B (en) | 2023-07-25 |
Family
ID=81495303
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114499629B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114900897A (en) * | 2022-05-17 | 2022-08-12 | 中国人民解放军国防科技大学 | Multi-beam satellite resource allocation method and system |
CN114928401A (en) * | 2022-05-17 | 2022-08-19 | 重庆邮电大学 | Dynamic planning method for LEO inter-satellite link based on multi-agent reinforcement learning |
CN115001611A (en) * | 2022-05-18 | 2022-09-02 | 西安交通大学 | Resource allocation method of hopping beam satellite spectrum sharing system based on reinforcement learning |
CN115118331A (en) * | 2022-06-28 | 2022-09-27 | 北京理工大学 | Dynamic low-orbit double-satellite beam hopping technology based on DPP algorithm |
CN115173923A (en) * | 2022-07-04 | 2022-10-11 | 重庆邮电大学 | Energy efficiency perception route optimization method and system for low-orbit satellite network |
CN115483960A (en) * | 2022-08-23 | 2022-12-16 | 爱浦路网络技术(南京)有限公司 | Beam hopping scheduling method, system, device and storage medium for low-earth-orbit satellite |
CN116260506A (en) * | 2023-05-09 | 2023-06-13 | 红珊科技有限公司 | Satellite communication transmission delay prediction system and method |
CN116346202A (en) * | 2023-03-15 | 2023-06-27 | 南京融星智联信息技术有限公司 | Wave beam hopping scheduling method based on maximum weighting group |
CN116546624A (en) * | 2023-05-24 | 2023-08-04 | 华能伊敏煤电有限责任公司 | Method and device for predicting wave-hopping satellite service and distributing multidimensional link dynamic resources |
CN116938323A (en) * | 2023-09-18 | 2023-10-24 | 中国电子科技集团公司第五十四研究所 | Satellite transponder resource allocation method based on reinforcement learning |
CN117014061A (en) * | 2023-09-27 | 2023-11-07 | 银河航天(北京)通信技术有限公司 | Method, device and storage medium for determining satellite communication frequency band |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113541770A (en) * | 2021-07-12 | 2021-10-22 | 军事科学院系统工程研究院网络信息研究所 | Space-time-frequency refined resource management method for multi-beam satellite communication system |
CN113572517A (en) * | 2021-07-30 | 2021-10-29 | 哈尔滨工业大学 | Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning |
CN113692051A (en) * | 2021-07-23 | 2021-11-23 | 西安空间无线电技术研究所 | Cross-wave-bit resource allocation method for beam-hopping satellite |
Non-Patent Citations (2)
Title |
---|
FANG Yingyong; HE Hui: "Overview of Beam-Hopping Techniques for Broadband Satellite Communication Systems", Satellite TV and Broadband Multimedia, no. 12 *
HAN Yongfeng; ZHANG Chen; ZHANG Gengxin: "A Survey of Satellite Dynamic Resource Management Based on Deep Reinforcement Learning", Proceedings of the 16th Annual Satellite Communications Academic Conference *
Also Published As
Publication number | Publication date |
---|---|
CN114499629B (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114499629A (en) | Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning | |
CN109639377B (en) | Spectrum resource management method based on deep reinforcement learning | |
CN111586720B (en) | Task unloading and resource allocation combined optimization method in multi-cell scene | |
CN113572517B (en) | Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning | |
CN111628855B (en) | Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning | |
CN113644964B (en) | Multi-dimensional resource joint allocation method of multi-beam satellite same-frequency networking system | |
CN114389678A (en) | Multi-beam satellite resource allocation method based on decision performance evaluation | |
CN110753319B (en) | Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles | |
CN110233755B (en) | Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things | |
CN113709701B (en) | Millimeter wave vehicle networking combined beam distribution and relay selection method, system and equipment | |
CN112583453A (en) | Downlink NOMA power distribution method of multi-beam LEO satellite communication system | |
CN107682935B (en) | Wireless self-return resource scheduling method based on system stability | |
CN114071528A (en) | Service demand prediction-based multi-beam satellite beam resource adaptation method | |
CN110290542B (en) | Communication coverage optimization method and system for offshore unmanned aerial vehicle | |
CN114900225B (en) | Civil aviation Internet service management and access resource allocation method based on low-orbit giant star base | |
CN115441939B (en) | MADDPG algorithm-based multi-beam satellite communication system resource allocation method | |
CN112788605A (en) | Edge computing resource scheduling method and system based on double-delay depth certainty strategy | |
CN113115344B (en) | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization | |
CN105873214A (en) | Resource allocation method of D2D communication system based on genetic algorithm | |
CN106792451A (en) | A kind of D2D communication resource optimization methods based on Multiple-population Genetic Algorithm | |
CN114885420A (en) | User grouping and resource allocation method and device in NOMA-MEC system | |
CN117412391A (en) | Enhanced dual-depth Q network-based Internet of vehicles wireless resource allocation method | |
CN116634450A (en) | Dynamic air-ground heterogeneous network user association enhancement method based on reinforcement learning | |
CN116963034A (en) | Emergency scene-oriented air-ground network distributed resource scheduling method | |
CN115811788A (en) | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||