CN114499629A - Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning - Google Patents

Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning

Info

Publication number
CN114499629A
CN114499629A
Authority
CN
China
Prior art keywords
hopping
time
reinforcement learning
data packet
deep reinforcement
Prior art date
Legal status
Granted
Application number
CN202111609439.0A
Other languages
Chinese (zh)
Other versions
CN114499629B (en)
Inventor
张晨
韩永锋
张更新
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111609439.0A
Publication of CN114499629A
Application granted
Publication of CN114499629B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851Systems using a satellite or space-based relay
    • H04B7/18513Transmission in a satellite or space-based system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851Systems using a satellite or space-based relay
    • H04B7/18519Operations control, administration or maintenance
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10Dynamic resource partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/24Cell structures
    • H04W16/28Cell structures using beam steering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Radio Relay Systems (AREA)

Abstract

The invention discloses a dynamic resource allocation method for a beam-hopping satellite system based on deep reinforcement learning, which comprises the following steps: step 1, establishing a service model of the forward link of a beam-hopping GEO satellite system; step 2, storing the packets of the services arriving at each ground wave position in each time slot in a packet buffer queue; step 3, introducing a deep reinforcement learning algorithm, modeling the resource allocation module of the satellite as an agent, and designing the agent's state input, the agent's output decision actions, and the reward evaluating the actions; step 4, simulating the deep reinforcement learning algorithm of step 3 and continuously training the weight parameters of its decision neural network; and step 5, completing the dynamic resource allocation of the beam-hopping satellite system with the decision neural network trained in step 4, and solving for an optimal resource allocation scheme. The invention reduces packet transmission delay and improves the throughput of the beam-hopping satellite system.

Description

Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning
Technical Field
The invention relates to the field of satellite communication, in particular to a dynamic resource allocation method for a beam hopping satellite system based on deep reinforcement learning.
Background
In conventional multi-beam satellite systems, the power and frequency resources allocated to each beam are relatively fixed. However, since the service requests between beams are non-uniform and time-varying, conventional allocation algorithms cannot satisfy them. The beam hopping (BH) technique is based on time slicing: only a portion of the beams is activated in any given time slot. Because beam hopping is driven by service requests, it can greatly improve the utilization of system resources. The resource allocation algorithms for the forward link of current beam-hopping satellite systems are mainly heuristic algorithms, iterative algorithms, and convex optimization algorithms. Heuristic and iterative algorithms both require heavy computation and cannot track dynamically changing ground traffic in real time. Convex optimization is suitable only for scenarios in which co-channel interference between beams has little influence.
On the other hand, deep reinforcement learning (DRL) is one of the most closely watched directions in artificial intelligence in recent years. It combines the perception of deep learning with the decision-making of reinforcement learning, directly controls the agent's behavior by learning from high-dimensional perceptual input, and offers a way to solve the perception-decision problem of complex systems. Studies have shown that deep reinforcement learning algorithms can achieve good performance in satellite dynamic resource allocation, chiefly in inter-beam channel allocation for multi-beam satellite systems, multi-objective resource allocation for multi-beam satellites, and transmission-delay optimization for beam-hopping satellites.
However, existing beam-hopping resource allocation algorithms based on deep reinforcement learning do not consider co-channel interference between beams. When the working beams are adjacent, interference is inevitable. To alleviate inter-beam co-channel interference, a dynamic resource allocation method for beam-hopping satellite systems based on deep reinforcement learning needs to be designed around a criterion of interference avoidance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a dynamic resource allocation method of a beam-hopping satellite system based on deep reinforcement learning.
The invention adopts the following technical scheme for solving the technical problems:
the invention provides a dynamic resource allocation method of a beam hopping satellite system based on deep reinforcement learning, which comprises the following steps:
step 1, establishing a service model of the forward link of the beam-hopping satellite system according to the uneven spatio-temporal distribution of services in the beam-hopping GEO satellite system;
step 2, according to the service model of the forward link established in step 1, storing the packets of the services arriving at each ground wave position in each time slot in a packet buffer queue, the packets being served on a first-come-first-served basis, and establishing an optimization problem minimizing the packet transmission delay in combination with the capacity the satellite can provide;
step 3, introducing a deep reinforcement learning algorithm, modeling the resource allocation module of the satellite as an agent, and designing the agent's state input, the agent's output decision actions, and the reward evaluating the actions;
step 4, simulating the deep reinforcement learning algorithm of step 3, initializing the satellite scenario, setting the parameters of the deep reinforcement learning algorithm, and continuously training the weight parameters of its decision neural network;
and step 5, completing the dynamic resource allocation of the beam-hopping satellite system with the decision neural network trained in step 4, and solving for an optimal resource allocation scheme of the beam-hopping satellite system.
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, a service model of a forward link of the beam hopping satellite system is established in step 1, and the method specifically comprises the following steps:
In the beam-hopping satellite system, the set of ground wave positions is defined as $\Psi = \{c_n \mid n = 1, 2, 3, \ldots, N\}$, where $N$ is the total number of ground wave positions and $c_n$ is the $n$-th ground wave position; the maximum number of working beams is $K \le N$. The beam-hopping period is defined as $T = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$, where $t_j$ is the $j$-th beam-hopping time slot, $1 \le j \le J$, and $J$ is the total number of beam-hopping time slots.

The hopping beam pattern at $t_j$ is $X^{t_j} = \{x_{c_1}^{t_j}, x_{c_2}^{t_j}, \ldots, x_{c_N}^{t_j}\}$, where $x_{c_n}^{t_j} \in \{0, 1\}$ indicates whether $c_n$ is illuminated by a working beam at $t_j$: $x_{c_n}^{t_j} = 1$ means a working beam illuminates $c_n$ at $t_j$, and $x_{c_n}^{t_j} = 0$ means no working beam illuminates $c_n$ at $t_j$.

According to the hopping beam pattern $X^{t_j}$, the signal-to-interference-plus-noise ratio of $c_n$ at $t_j$ is calculated as

$$\mathrm{SINR}_{c_n}^{t_j} = \frac{g_{c_n,c_n}^{t_j} \, p_{c_n}^{t_j}}{N_0 W + \sum_{i=1, i \ne n}^{N} x_{c_i}^{t_j} \, g_{c_i,c_n}^{t_j} \, p_{c_i}^{t_j}} \qquad (1)$$

where $c_i$ is the $i$-th ground wave position; $g_{c_i,c_n}^{t_j}$ is the power gain toward $c_n$ when a working beam illuminates $c_i$ at $t_j$, comprising the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; $g_{c_n,c_n}^{t_j}$ is the power gain toward $c_n$ when the working beam illuminates $c_n$ at $t_j$; $p_{c_n}^{t_j}$ is the satellite transmit power of the working beam serving $c_n$ at $t_j$; $p_{c_i}^{t_j}$ is the satellite transmit power of the working beam serving $c_i$ at $t_j$; $N_0$ is the noise power spectral density; $W$ is the satellite spectrum bandwidth; and $x_{c_i}^{t_j} = 1$ indicates that a working beam illuminates $c_i$ at $t_j$.

The satellite beam transmission capacity when the working beam illuminates $c_n$ at $t_j$ is

$$C_{c_n}^{t_j} = W \cdot f_{\mathrm{DVB\text{-}S2}}\!\left(\mathrm{SINR}_{c_n}^{t_j}\right) \qquad (2)$$

where $f_{\mathrm{DVB\text{-}S2}}(\cdot)$ is the piecewise function of the European Telecommunications Standards Institute DVB-S2 standard mapping signal-to-interference-plus-noise ratio to spectral efficiency.
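For illustration, equations (1) and (2) can be sketched in Python as follows; the DVB-S2 threshold table is an abridged, approximate subset of the standard's MODCOD table and, like the array shapes, is an assumption of this sketch rather than a value taken from the patent:

```python
import numpy as np

def sinr(n, x, g, p, N0, W):
    """SINR of ground wave position c_n in one time slot, equation (1).

    x : (N,) 0/1 hopping beam pattern for the slot
    g : (N, N) array, g[i, n] = power gain toward position n when beam i is lit
    p : (N,) per-beam satellite transmit power
    """
    signal = g[n, n] * p[n]
    interference = sum(x[i] * g[i, n] * p[i] for i in range(len(x)) if i != n)
    return signal / (N0 * W + interference)

def f_dvbs2(sinr_linear):
    """Piecewise DVB-S2 map from SINR to spectral efficiency (illustrative subset)."""
    snr_db = 10.0 * np.log10(sinr_linear)
    table = [(-2.35, 0.49), (1.00, 1.19), (5.18, 2.23), (8.97, 3.30), (12.73, 4.40)]
    eff = 0.0
    for threshold_db, spectral_eff in table:  # pick the highest satisfied MODCOD
        if snr_db >= threshold_db:
            eff = spectral_eff
    return eff

def capacity(n, x, g, p, N0, W):
    """Offered capacity of c_n, equation (2): C = W * f_DVB-S2(SINR)."""
    return W * f_dvbs2(sinr(n, x, g, p, N0, W)) if x[n] else 0.0
```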
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the specific process of establishing the optimization problem of minimizing the transmission delay of the data packet in the step 2 is as follows:
The packets newly arriving at $c_n$ at $t_j$ are denoted $d_{c_n}^{t_j}$. Packets are stored in the packet buffer queue $B_{c_n}^{t_j} = \{d_{c_n}^{t_{j-q}}, \ldots, d_{c_n}^{t_{j-1}}, d_{c_n}^{t_j}\}$, where $B_{c_n}^{t_j}$ is the packet buffer queue of $c_n$ at $t_j$, $d_{c_n}^{t_{j-q}}$ denotes the packets that arrived at $c_n$ in the $(j-q)$-th beam-hopping time slot $t_{j-q}$, $0 \le q \le T_{th}$, and $T_{th}$ is the maximum transmission delay of a packet.

If the transmission delay $\tau$ of a packet exceeds $T_{th}$, the packet is discarded, where

$$\tau = t_j - t_k \qquad (3)$$

in which $t_j$ is the time slot in which the packet is transmitted and $t_k$ is the time slot in which the packet arrived at the ground wave position.

In summary, the following optimization problem $\mathbf{P}$ minimizing the packet transmission delay is established:

$$\mathbf{P}: \ \min \sum_{t_j \in T} \sum_{c_n \in \Psi} \sum_{d_{c_n}^{t_k} \in B_{c_n}^{t_j}} \tau \qquad (4)$$

$$\text{s.t.} \quad \sum_{n=1}^{N} x_{c_n}^{t_j} \le K, \quad \forall t_j \in T \qquad (5)$$

$$\sum_{n=1}^{N} x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_{tot}, \quad \forall t_j \in T \qquad (6)$$

$$x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_b, \quad \forall c_n \in \Psi, \ \forall t_j \in T \qquad (7)$$

$$\tau \le T_{th} \qquad (8)$$

where $d_{c_n}^{t_k}$ denotes a packet that arrived at $c_n$ at $t_k$. Constraint (5) states that the number of working beams in a single time slot cannot exceed $K$, with $x_{c_n}^{t_j} = 1$ indicating that a working beam illuminates $c_n$ at $t_j$; constraint (6) states that the total power of the working beams in any time slot of the beam-hopping period cannot exceed the total satellite power $P_{tot}$; constraint (7) states that the power of a single working beam in any time slot of the beam-hopping period cannot exceed the maximum single-beam power $P_b$; and constraint (8) states that the maximum transmission delay of a packet cannot exceed $T_{th}$.
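For illustration, the per-position buffer bookkeeping of step 2, including the discard rule of constraint (8), can be sketched as follows; the list-of-arrival-slots representation and the value of T_TH are assumptions of this sketch:

```python
from collections import deque

T_TH = 40  # maximum transmission delay in slots (illustrative value)

class WavePositionBuffer:
    """First-come-first-served buffer; each packet stores its arrival slot t_k."""

    def __init__(self):
        self.queue = deque()  # each entry: arrival slot index t_k

    def arrive(self, t_k, n_packets):
        """New packets d arriving at this wave position in slot t_k."""
        self.queue.extend([t_k] * n_packets)

    def drop_expired(self, t_j):
        """Equation (3): delay = t_j - t_k; discard packets exceeding T_th."""
        while self.queue and (t_j - self.queue[0]) > T_TH:
            self.queue.popleft()

    def serve(self, t_j, n_served):
        """Transmit the n_served oldest packets; return their total delay."""
        total_delay = 0
        for _ in range(min(n_served, len(self.queue))):
            total_delay += t_j - self.queue.popleft()
        return total_delay
```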
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the step 3 is as follows:
step 301, state design of the deep reinforcement learning algorithm:

The state $s_{t_j}$ is defined by two attributes, the number of packets and the average packet transmission delay, and is expressed by formula (9):

$$s_{t_j} = \left\{ \boldsymbol{num}^{t_j},\ \bar{\boldsymbol{\tau}}^{t_j} \right\} \qquad (9)$$

where $\boldsymbol{num}^{t_j} = \{num_{c_1}^{t_j}, num_{c_2}^{t_j}, \ldots, num_{c_N}^{t_j}\}$ collects the total numbers of packets in the packet buffer queues at $t_j$, with $num_{c_n}^{t_j}$ the number of packets in $B_{c_n}^{t_j}$; and $\bar{\boldsymbol{\tau}}^{t_j} = \{\bar{\tau}_{c_1}^{t_j}, \bar{\tau}_{c_2}^{t_j}, \ldots, \bar{\tau}_{c_N}^{t_j}\}$ collects the average packet transmission delays of the packet buffer queues at $t_j$, with $\bar{\tau}_{c_n}^{t_j}$ the average transmission delay of the packets in $B_{c_n}^{t_j}$;

step 302, action design of the deep reinforcement learning algorithm:

The hopping beam pattern $X^{t_j}$ serves as the agent's output action $a_{t_j}$ at $t_j$. A set of hopping beam patterns obtained through an iterative algorithm serves as the action space of the deep reinforcement learning algorithm:

$$Set = \{X_1, X_2, \ldots, X_k, \ldots, X_{num}\}$$

where $X_k$ is the $k$-th hopping beam pattern in the set, $1 \le k \le num$, and $num$ is the number of hopping beam patterns in the set.

The iterative algorithm proceeds as follows (an illustrative sketch is given after the list):

(1) Initialize the ground wave position numbering, divide the ground wave positions into $M$ clusters, set the co-channel interference threshold, and let $i = 1$, $k = 1$;

(2) If all ground wave positions of the $i$-th cluster are already contained in $Set$, select any one ground wave position from the $i$-th cluster and light it; otherwise, select a ground wave position of the $i$-th cluster that is not contained in $Set$ and light it;

(3) Calculate the co-frequency reuse distance of the working beams. If it is larger than the co-channel interference threshold, add the ground wave position selected in step (2) to the hopping beam pattern $X_k$; otherwise, select the ground wave position of the $i$-th cluster that maximizes the co-frequency reuse distance and add it to $X_k$;

(4) If $i \ne M$, set $i = i + 1$ and return to step (2); if $i = M$, add $X_k$ to $Set$;

(5) If the elements of $Set$ satisfy $X_1 \cup X_2 \cup \ldots \cup X_k = \Psi$, the iteration terminates; otherwise set $k = k + 1$, $i = 1$, and return to step (2).
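For illustration, the iterative pattern construction can be sketched as follows; treating positions as 2-D coordinates, clusters as lists of position indices, and the minimum pairwise distance between lit positions as the co-frequency reuse distance are modelling assumptions, and the sketch restricts the fallback choice in step (3) to not-yet-covered positions so the loop is guaranteed to terminate:

```python
import numpy as np

def build_pattern_set(positions, clusters, d_threshold):
    """Return a list of hopping beam patterns whose union covers all positions.

    positions  : (N, 2) array of ground wave position coordinates
    clusters   : list of lists of position indices (the M clusters)
    d_threshold: co-channel interference threshold on the reuse distance
    """
    covered = set()   # positions already placed in some pattern ("Set")
    patterns = []
    while len(covered) < len(positions):
        pattern = []  # the pattern X_k under construction
        for cluster in clusters:
            def reuse_dist(c):
                # co-frequency reuse distance: min distance to already-lit positions
                if not pattern:
                    return float("inf")
                return min(np.linalg.norm(positions[c] - positions[q])
                           for q in pattern)
            fresh = [c for c in cluster if c not in covered]
            pick = (fresh or list(cluster))[0]                  # step (2)
            if reuse_dist(pick) <= d_threshold:                 # step (3)
                pick = max(fresh or list(cluster), key=reuse_dist)
            pattern.append(pick)
        covered.update(pattern)                                 # step (4)
        patterns.append(pattern)                                # step (5): loop until covered
    return patterns
```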
step 303, reward design of the deep reinforcement learning algorithm:

The reward of the deep reinforcement learning algorithm at $t_j$ is set by the following formula:

$$r_{t_j} = -\left\| \boldsymbol{num}^{t_j} \circ \bar{\boldsymbol{\tau}}^{t_j} \right\| \qquad (10)$$

where $\circ$ denotes the Hadamard (element-wise) matrix product and $\|\cdot\|$ denotes the sum of all elements of the matrix.
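For illustration, reward (10) can be computed as follows; representing the two state attributes as length-N arrays over the ground wave positions is an assumption of this sketch:

```python
import numpy as np

def reward(num_packets, avg_delay):
    """Reward of equation (10): negated sum of the Hadamard product of the
    per-position packet counts and per-position average delays, so a smaller
    total transmission delay yields a larger reward.

    num_packets, avg_delay : arrays of shape (N,)
    """
    total_delay = np.sum(num_packets * avg_delay)  # Hadamard product, then sum
    return -float(total_delay)
```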
As a further optimization scheme of the dynamic resource allocation method of the beam hopping satellite system based on deep reinforcement learning, the step 4 is specifically as follows:
Step I: initialize the satellite scenario and the packet buffer queues;

Step II: initialize the satellite agent, initialize the weight parameters of the decision neural network and of the target network, initialize the decision-network training step counter step = 0, and set the target-network update interval to G;

Step III: initialize the experience pool capacity, set the number of training episodes E and the number of beam-hopping time slots J per episode, and initialize the training episode counter e = 1 and the time slot counter j = 1;

Step IV: at $t_j$, packets arrive at the ground wave positions; observe and extract the satellite environment state information at that time as $s_{t_j}$;

Step V: with probability ε, randomly select one hopping beam pattern $X_k$ from the $Set$ obtained in step 302 as $a_{t_j}$; otherwise, with probability 1 − ε, select the action corresponding to the maximum action value output by the decision neural network as $a_{t_j}$;

Step VI: execute the $a_{t_j}$ selected in step V, upon which the environment transitions to the next state $s_{t_{j+1}}$ and the reward $r_{t_j}$ is obtained;

Step VII: store the experience tuple $(s_{t_j}, a_{t_j}, r_{t_j}, s_{t_{j+1}})$ in the experience pool;

Step VIII: randomly sample a batch of experience tuples from the experience pool, calculate the loss function, train the decision neural network with the Adam algorithm, and set step = step + 1;

Step IX: if the training step counter step is a multiple of G, update the weight parameters of the target network to those of the decision neural network, then execute step X; if step is not a multiple of G, execute step X directly;

Step X: first judge whether j equals J: if j ≠ J, set j = j + 1 and return to step IV; if j = J, judge whether e equals E: if e ≠ E, set e = e + 1, reduce the probability ε, and return to step IV; if e = E, training terminates.
Compared with the prior art, the above technical scheme of the invention has the following technical effects:
The invention provides a feasible scheme for resource allocation in a beam-hopping satellite communication system: a deep reinforcement learning algorithm is introduced, the satellite is modeled as an agent, the decision neural network of the algorithm is continuously trained by designing the state, action, and reward of the deep reinforcement learning algorithm, and the trained decision neural network finally completes the resource allocation of the beam-hopping satellite system.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a forward link traffic model of a beam-hopping satellite system;
FIG. 3 is a deep reinforcement learning algorithm framework diagram;
FIG. 4 is a flow chart of an iterative algorithm for beam hopping pattern design;
FIG. 5 is a graph comparing the packet transmission delays of the deep reinforcement learning algorithm, the random allocation algorithm, and the fixed allocation algorithm;
FIG. 6 is a graph comparing the average system throughput of the deep reinforcement learning algorithm, the random allocation algorithm, and the fixed allocation algorithm.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
referring to fig. 1, the specific implementation steps of the present invention are as follows:
Step 1, establishing a service model of the forward link of the beam-hopping satellite system according to the uneven spatio-temporal distribution of services in the beam-hopping GEO satellite system.
The forward link traffic model of the beam-hopping satellite system is shown in fig. 2. In the beam-hopping satellite system, the set of ground wave positions is defined as $\Psi = \{c_n \mid n = 1, 2, 3, \ldots, N\}$, where $N$ is the total number of ground wave positions and $c_n$ is the $n$-th ground wave position; the maximum number of working beams is $K \le N$. The beam-hopping period is defined as $T = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$, where $t_j$ is the $j$-th beam-hopping time slot, $1 \le j \le J$, and $J$ is the total number of beam-hopping time slots.

The hopping beam pattern at $t_j$ is $X^{t_j} = \{x_{c_1}^{t_j}, x_{c_2}^{t_j}, \ldots, x_{c_N}^{t_j}\}$, where $x_{c_n}^{t_j} \in \{0, 1\}$ indicates whether $c_n$ is illuminated by a working beam at $t_j$: $x_{c_n}^{t_j} = 1$ means a working beam illuminates $c_n$ at $t_j$, and $x_{c_n}^{t_j} = 0$ means no working beam illuminates $c_n$ at $t_j$.

According to the hopping beam pattern $X^{t_j}$, the signal-to-interference-plus-noise ratio of $c_n$ at $t_j$ is calculated as

$$\mathrm{SINR}_{c_n}^{t_j} = \frac{g_{c_n,c_n}^{t_j} \, p_{c_n}^{t_j}}{N_0 W + \sum_{i=1, i \ne n}^{N} x_{c_i}^{t_j} \, g_{c_i,c_n}^{t_j} \, p_{c_i}^{t_j}} \qquad (1)$$

where $c_i$ is the $i$-th ground wave position; $g_{c_i,c_n}^{t_j}$ is the power gain toward $c_n$ when a working beam illuminates $c_i$ at $t_j$, comprising the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; $g_{c_n,c_n}^{t_j}$ is the power gain toward $c_n$ when the working beam illuminates $c_n$ at $t_j$; $p_{c_n}^{t_j}$ is the satellite transmit power of the working beam serving $c_n$ at $t_j$; $p_{c_i}^{t_j}$ is the satellite transmit power of the working beam serving $c_i$ at $t_j$; $N_0$ is the noise power spectral density; $W$ is the satellite spectrum bandwidth; and $x_{c_i}^{t_j} = 1$ indicates that a working beam illuminates $c_i$ at $t_j$.

The satellite beam transmission capacity when the working beam illuminates $c_n$ at $t_j$ is

$$C_{c_n}^{t_j} = W \cdot f_{\mathrm{DVB\text{-}S2}}\!\left(\mathrm{SINR}_{c_n}^{t_j}\right) \qquad (2)$$

where $f_{\mathrm{DVB\text{-}S2}}(\cdot)$ is the piecewise function of the European Telecommunications Standards Institute DVB-S2 standard mapping signal-to-interference-plus-noise ratio to spectral efficiency.
Step 2, establishing the optimization problem of minimizing the transmission delay, whose solution yields the optimal beam-hopping resource allocation scheme.
The packets newly arriving at $c_n$ at $t_j$ are denoted $d_{c_n}^{t_j}$. Packets are stored in the packet buffer queue $B_{c_n}^{t_j} = \{d_{c_n}^{t_{j-q}}, \ldots, d_{c_n}^{t_{j-1}}, d_{c_n}^{t_j}\}$, where $B_{c_n}^{t_j}$ is the packet buffer queue of $c_n$ at $t_j$, $d_{c_n}^{t_{j-q}}$ denotes the packets that arrived at $c_n$ in the $(j-q)$-th beam-hopping time slot $t_{j-q}$, $0 \le q \le T_{th}$, and $T_{th}$ is the maximum transmission delay of a packet.

If the transmission delay $\tau$ of a packet exceeds $T_{th}$, the packet is discarded, where

$$\tau = t_j - t_k \qquad (3)$$

in which $t_j$ is the time slot in which the packet is transmitted and $t_k$ is the time slot in which the packet arrived at the ground wave position.

In summary, the following optimization problem $\mathbf{P}$ minimizing the packet transmission delay is established:

$$\mathbf{P}: \ \min \sum_{t_j \in T} \sum_{c_n \in \Psi} \sum_{d_{c_n}^{t_k} \in B_{c_n}^{t_j}} \tau \qquad (4)$$

$$\text{s.t.} \quad \sum_{n=1}^{N} x_{c_n}^{t_j} \le K, \quad \forall t_j \in T \qquad (5)$$

$$\sum_{n=1}^{N} x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_{tot}, \quad \forall t_j \in T \qquad (6)$$

$$x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_b, \quad \forall c_n \in \Psi, \ \forall t_j \in T \qquad (7)$$

$$\tau \le T_{th} \qquad (8)$$

where $d_{c_n}^{t_k}$ denotes a packet that arrived at $c_n$ at $t_k$. Constraint (5) states that the number of working beams in a single time slot cannot exceed $K$, with $x_{c_n}^{t_j} = 1$ indicating that a working beam illuminates $c_n$ at $t_j$; constraint (6) states that the total power of the working beams in any time slot of the beam-hopping period cannot exceed the total satellite power $P_{tot}$; constraint (7) states that the power of a single working beam in any time slot of the beam-hopping period cannot exceed the maximum single-beam power $P_b$; and constraint (8) states that the maximum transmission delay of a packet cannot exceed $T_{th}$.
Step 3, modeling the satellite as an agent by means of a deep reinforcement learning algorithm, and designing the agent's state input, the agent's output decision actions, and the reward evaluating the quality of the actions.
The deep reinforcement learning algorithm framework is shown in fig. 3. The decision neural network is a mapping function from states to action values and decides the agent's behavior. In addition, to improve the performance of the decision neural network, a target network and an experience pool are added to the algorithm framework; a sketch of these two components is given below, followed by the specific design of the state, action, and reward.
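For illustration, the decision network, target network, and experience pool of fig. 3 can be sketched in PyTorch as follows; the two-hidden-layer architecture and the hidden width are assumptions of this sketch, since the patent only specifies the ReLU activation:

```python
import random
from collections import deque

import torch
import torch.nn as nn

class DecisionNet(nn.Module):
    """Maps a state vector to one action value per hopping beam pattern."""

    def __init__(self, state_dim, n_patterns, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_patterns),
        )

    def forward(self, state):
        return self.net(state)

class ExperiencePool:
    """Fixed-capacity replay buffer of (s, a, r, s') transitions."""

    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

The separate target network serves to keep the temporal-difference targets stable between synchronizations, which is the purpose of the update interval G in step II below.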
Step 301, state design of the deep reinforcement learning algorithm:

The state $s_{t_j}$ is defined by two attributes, the number of packets and the average packet transmission delay, and is expressed by formula (9):

$$s_{t_j} = \left\{ \boldsymbol{num}^{t_j},\ \bar{\boldsymbol{\tau}}^{t_j} \right\} \qquad (9)$$

where $\boldsymbol{num}^{t_j} = \{num_{c_1}^{t_j}, num_{c_2}^{t_j}, \ldots, num_{c_N}^{t_j}\}$ collects the total numbers of packets in the packet buffer queues at $t_j$, with $num_{c_n}^{t_j}$ the number of packets in $B_{c_n}^{t_j}$; and $\bar{\boldsymbol{\tau}}^{t_j} = \{\bar{\tau}_{c_1}^{t_j}, \bar{\tau}_{c_2}^{t_j}, \ldots, \bar{\tau}_{c_N}^{t_j}\}$ collects the average packet transmission delays of the packet buffer queues at $t_j$, with $\bar{\tau}_{c_n}^{t_j}$ the average transmission delay of the packets in $B_{c_n}^{t_j}$.
Step 302, action design of the deep reinforcement learning algorithm:

The hopping beam pattern $X^{t_j}$ serves as the agent's output action $a_{t_j}$ at $t_j$. A set of hopping beam patterns obtained through an iterative algorithm serves as the action space of the deep reinforcement learning algorithm:

$$Set = \{X_1, X_2, \ldots, X_k, \ldots, X_{num}\}$$

where $X_k$ is the $k$-th hopping beam pattern in the set, $1 \le k \le num$, and $num$ is the number of hopping beam patterns in the set.

The beam-hopping pattern design flow is shown in fig. 4, and the iterative algorithm proceeds as follows:

(1) Initialize the ground wave position numbering, divide the ground wave positions into $M$ clusters, set the co-channel interference threshold, and let $i = 1$, $k = 1$;

(2) If all ground wave positions of the $i$-th cluster are already contained in $Set$, select any one ground wave position from the $i$-th cluster and light it; otherwise, select a ground wave position of the $i$-th cluster that is not contained in $Set$ and light it;

(3) Calculate the co-frequency reuse distance of the working beams. If it is larger than the co-channel interference threshold, add the ground wave position selected in step (2) to the hopping beam pattern $X_k$; otherwise, select the ground wave position of the $i$-th cluster that maximizes the co-frequency reuse distance and add it to $X_k$;

(4) If $i \ne M$, set $i = i + 1$ and return to step (2); if $i = M$, add $X_k$ to $Set$;

(5) If the elements of $Set$ satisfy $X_1 \cup X_2 \cup \ldots \cup X_k = \Psi$, the iteration terminates; otherwise set $k = k + 1$, $i = 1$, and return to step (2).
Step 303, reward design of the deep reinforcement learning algorithm:

The optimization problem (4) aims to minimize the packet transmission delay, and from the state design, $num_{c_n}^{t_j} \cdot \bar{\tau}_{c_n}^{t_j}$ is the sum of the packet transmission delays at wave position $c_n$ in time slot $t_j$. The smaller the total packet transmission delay, the larger the reward should be, so the reward of the deep reinforcement learning algorithm can be set as

$$r_{t_j} = -\left\| \boldsymbol{num}^{t_j} \circ \bar{\boldsymbol{\tau}}^{t_j} \right\| \qquad (10)$$

where $\circ$ denotes the Hadamard (element-wise) matrix product and $\|\cdot\|$ denotes the sum of all elements of the matrix.
Step 4, setting the parameters of the deep reinforcement learning algorithm and continuously training and optimizing the weight parameters of the decision neural network.
The specific steps of the algorithm are as follows (a sketch of the training loop is given after the list):
Step I: initialize the satellite scenario and the packet buffer queues;

Step II: initialize the satellite agent, initialize the weight parameters of the decision neural network and of the target network, initialize the decision-network training step counter step = 0, and set the target-network update interval to G;

Step III: initialize the experience pool capacity, set the number of training episodes E and the number of beam-hopping time slots J per episode, and initialize the training episode counter e = 1 and the time slot counter j = 1;

Step IV: at $t_j$, packets arrive at the ground wave positions; observe and extract the satellite environment state information at that time as $s_{t_j}$;

Step V: with probability ε, randomly select one hopping beam pattern $X_k$ from the $Set$ obtained in step 302 as $a_{t_j}$; otherwise, with probability 1 − ε, select the action corresponding to the maximum action value output by the decision neural network as $a_{t_j}$;

Step VI: execute the $a_{t_j}$ selected in step V, upon which the environment transitions to the next state $s_{t_{j+1}}$ and the reward $r_{t_j}$ is obtained;

Step VII: store the experience tuple $(s_{t_j}, a_{t_j}, r_{t_j}, s_{t_{j+1}})$ in the experience pool;

Step VIII: randomly sample a batch of experience tuples from the experience pool, calculate the loss function, train the decision neural network with the Adam algorithm, and set step = step + 1;

Step IX: if the training step counter step is a multiple of G, update the weight parameters of the target network to those of the decision neural network, then execute step X; if step is not a multiple of G, execute step X directly;

Step X: first judge whether j equals J: if j ≠ J, set j = j + 1 and return to step IV; if j = J, judge whether e equals E: if e ≠ E, set e = e + 1, reduce the probability ε, and return to step IV; if e = E, training terminates.
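For illustration, steps I to X can be sketched as the following training loop, assuming the DecisionNet and ExperiencePool sketched earlier and a hypothetical `env` object standing in for the satellite simulator (exposing reset() and step()); the learning rate, batch size, discount factor, and the per-episode ε-decay schedule are assumptions of this sketch, as the patent does not specify them:

```python
import numpy as np
import torch
import torch.nn.functional as F

def train(env, q_net, target_net, pool, patterns, episodes=1000,
          slots_per_period=256, batch_size=32, gamma=0.99,
          eps=0.5, eps_end=0.01, sync_every=100):
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # step VIII uses Adam
    step = 0
    for episode in range(episodes):                      # step X: e = 1..E
        state = env.reset()
        for _ in range(slots_per_period):                # step X: j = 1..J
            # Step V: epsilon-greedy choice of a hopping beam pattern index.
            if np.random.rand() < eps:
                action = np.random.randint(len(patterns))
            else:
                with torch.no_grad():
                    q = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = int(q.argmax())
            next_state, reward = env.step(patterns[action])      # step VI
            pool.store((state, action, reward, next_state))      # step VII
            state = next_state
            if len(pool.buffer) >= batch_size:                   # step VIII
                s, a, r, s2 = zip(*pool.sample(batch_size))
                s = torch.as_tensor(np.stack(s), dtype=torch.float32)
                s2 = torch.as_tensor(np.stack(s2), dtype=torch.float32)
                a = torch.as_tensor(a, dtype=torch.int64)
                r = torch.as_tensor(r, dtype=torch.float32)
                with torch.no_grad():                            # TD target
                    target = r + gamma * target_net(s2).max(dim=1).values
                pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = F.mse_loss(pred, target)
                opt.zero_grad()
                loss.backward()
                opt.step()
                step += 1
                if step % sync_every == 0:                       # step IX
                    target_net.load_state_dict(q_net.state_dict())
        eps = max(eps_end, eps * 0.995)  # step X: reduce exploration each episode
```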
Step 5, finally, completing the dynamic allocation of beam-hopping resources with the trained decision neural network.
The normalized traffic is first defined as the total packet traffic of the ground wave positions divided by the maximum available capacity of the satellite. Next, the decision neural network trained in step 4 is used for the dynamic resource allocation of the beam-hopping satellite system. Finally, the packet transmission delay and the system throughput of three algorithms (the deep-reinforcement-learning-based resource allocation, a random allocation algorithm, and a fixed allocation algorithm) are compared under different normalized traffic conditions. The random allocation algorithm selects the working beams randomly in each time slot, and the fixed allocation algorithm allocates a fixed number of time slots to each beam; sketches of the two baselines are given below. The simulation results are shown in fig. 5 and fig. 6.
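For illustration, the two baselines can be sketched as follows; the round-robin schedule used for the fixed allocation is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_allocation(N, K):
    """Random baseline: light K of the N positions, chosen anew each slot."""
    pattern = np.zeros(N, dtype=int)
    pattern[rng.choice(N, size=K, replace=False)] = 1
    return pattern

def fixed_allocation(N, K, slot_index):
    """Fixed baseline: each position receives a fixed share of slots (round-robin)."""
    pattern = np.zeros(N, dtype=int)
    start = (slot_index * K) % N
    pattern[[(start + i) % N for i in range(K)]] = 1
    return pattern
```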
The effects of the invention can be further verified by the following simulation.
1. Experimental scenario:
To illustrate the effect of the method, comparison experiments are run on a simulated GEO satellite system model with 36 ground wave positions and 6 beams.
2. Experimental contents and results:
To verify the performance of the method, the 36-ground-wave-position, 6-beam GEO beam-hopping system model is adopted. In the satellite scenario parameters, the maximum packet transmission delay threshold is set to 4 s, the beam-hopping time slot to 100 ms, and the number of time slots per beam-hopping period to 256. In the deep reinforcement learning algorithm, the number of training episodes is set to 1000, the experience pool size to 5000, the activation function of the decision neural network to ReLU, the initial exploration probability to 0.5, and the terminal exploration probability to 0.01. These constants are collected in the configuration sketch below.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (5)

1. A dynamic resource allocation method for a beam-hopping satellite system based on deep reinforcement learning, characterized by comprising the following steps:
step 1, establishing a service model of the forward link of the beam-hopping satellite system according to the uneven spatio-temporal distribution of services in the beam-hopping GEO satellite system;
step 2, according to the service model of the forward link established in step 1, storing the packets of the services arriving at each ground wave position in each time slot in a packet buffer queue, the packets being served on a first-come-first-served basis, and establishing an optimization problem minimizing the packet transmission delay in combination with the capacity the satellite can provide;
step 3, introducing a deep reinforcement learning algorithm, modeling the resource allocation module of the satellite as an agent, and designing the agent's state input, the agent's output decision actions, and the reward evaluating the actions;
step 4, simulating the deep reinforcement learning algorithm of step 3, initializing the satellite scenario, setting the parameters of the deep reinforcement learning algorithm, and continuously training the weight parameters of its decision neural network;
and step 5, completing the dynamic resource allocation of the beam-hopping satellite system with the decision neural network trained in step 4, and solving for an optimal resource allocation scheme of the beam-hopping satellite system.
2. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 1, wherein a service model of a forward link of the beam-hopping satellite system is established in step 1, and specifically the following steps are performed:
In the beam-hopping satellite system, the set of ground wave positions is defined as $\Psi = \{c_n \mid n = 1, 2, 3, \ldots, N\}$, where $N$ is the total number of ground wave positions and $c_n$ is the $n$-th ground wave position; the maximum number of working beams is $K \le N$. The beam-hopping period is defined as $T = \{t_1, t_2, \ldots, t_j, \ldots, t_J\}$, where $t_j$ is the $j$-th beam-hopping time slot, $1 \le j \le J$, and $J$ is the total number of beam-hopping time slots;
the hopping beam pattern at $t_j$ is $X^{t_j} = \{x_{c_1}^{t_j}, x_{c_2}^{t_j}, \ldots, x_{c_N}^{t_j}\}$, where $x_{c_n}^{t_j} \in \{0, 1\}$ indicates whether $c_n$ is illuminated by a working beam at $t_j$: $x_{c_n}^{t_j} = 1$ means a working beam illuminates $c_n$ at $t_j$, and $x_{c_n}^{t_j} = 0$ means no working beam illuminates $c_n$ at $t_j$;
according to the hopping beam pattern $X^{t_j}$, the signal-to-interference-plus-noise ratio of $c_n$ at $t_j$ is calculated as

$$\mathrm{SINR}_{c_n}^{t_j} = \frac{g_{c_n,c_n}^{t_j} \, p_{c_n}^{t_j}}{N_0 W + \sum_{i=1, i \ne n}^{N} x_{c_i}^{t_j} \, g_{c_i,c_n}^{t_j} \, p_{c_i}^{t_j}} \qquad (1)$$

where $c_i$ is the $i$-th ground wave position; $g_{c_i,c_n}^{t_j}$ is the power gain toward $c_n$ when a working beam illuminates $c_i$ at $t_j$, comprising the satellite antenna transmit gain, free-space loss, rain attenuation, and antenna receive gain; $g_{c_n,c_n}^{t_j}$ is the power gain toward $c_n$ when the working beam illuminates $c_n$ at $t_j$; $p_{c_n}^{t_j}$ is the satellite transmit power of the working beam serving $c_n$ at $t_j$; $p_{c_i}^{t_j}$ is the satellite transmit power of the working beam serving $c_i$ at $t_j$; $N_0$ is the noise power spectral density; $W$ is the satellite spectrum bandwidth; and $x_{c_i}^{t_j} = 1$ indicates that a working beam illuminates $c_i$ at $t_j$;
the satellite beam transmission capacity when the working beam illuminates $c_n$ at $t_j$ is

$$C_{c_n}^{t_j} = W \cdot f_{\mathrm{DVB\text{-}S2}}\!\left(\mathrm{SINR}_{c_n}^{t_j}\right) \qquad (2)$$

where $f_{\mathrm{DVB\text{-}S2}}(\cdot)$ is the piecewise function of the European Telecommunications Standards Institute DVB-S2 standard mapping signal-to-interference-plus-noise ratio to spectral efficiency.
3. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 1, wherein the specific process of establishing the optimization problem for minimizing the transmission delay of the data packet in the step 2 is as follows:
The packets newly arriving at $c_n$ at $t_j$ are denoted $d_{c_n}^{t_j}$; packets are stored in the packet buffer queue $B_{c_n}^{t_j} = \{d_{c_n}^{t_{j-q}}, \ldots, d_{c_n}^{t_{j-1}}, d_{c_n}^{t_j}\}$, where $B_{c_n}^{t_j}$ is the packet buffer queue of $c_n$ at $t_j$, $d_{c_n}^{t_{j-q}}$ denotes the packets that arrived at $c_n$ in the $(j-q)$-th beam-hopping time slot $t_{j-q}$, $0 \le q \le T_{th}$, and $T_{th}$ is the maximum transmission delay of a packet;
if the transmission delay $\tau$ of a packet exceeds $T_{th}$, the packet is discarded, where

$$\tau = t_j - t_k \qquad (3)$$

in which $t_j$ is the time slot in which the packet is transmitted and $t_k$ is the time slot in which the packet arrived at the ground wave position;
in summary, the following optimization problem $\mathbf{P}$ minimizing the packet transmission delay is established:

$$\mathbf{P}: \ \min \sum_{t_j \in T} \sum_{c_n \in \Psi} \sum_{d_{c_n}^{t_k} \in B_{c_n}^{t_j}} \tau \qquad (4)$$

$$\text{s.t.} \quad \sum_{n=1}^{N} x_{c_n}^{t_j} \le K, \quad \forall t_j \in T \qquad (5)$$

$$\sum_{n=1}^{N} x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_{tot}, \quad \forall t_j \in T \qquad (6)$$

$$x_{c_n}^{t_j} \, p_{c_n}^{t_j} \le P_b, \quad \forall c_n \in \Psi, \ \forall t_j \in T \qquad (7)$$

$$\tau \le T_{th} \qquad (8)$$

where $d_{c_n}^{t_k}$ denotes a packet that arrived at $c_n$ at $t_k$; constraint (5) states that the number of working beams in a single time slot cannot exceed $K$, with $x_{c_n}^{t_j} = 1$ indicating that a working beam illuminates $c_n$ at $t_j$; constraint (6) states that the total power of the working beams in any time slot of the beam-hopping period cannot exceed the total satellite power $P_{tot}$; constraint (7) states that the power of a single working beam in any time slot of the beam-hopping period cannot exceed the maximum single-beam power $P_b$; and constraint (8) states that the maximum transmission delay of a packet cannot exceed $T_{th}$.
4. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 3, wherein the step 3 is as follows:
step 301, state design of the deep reinforcement learning algorithm:

the state $s_{t_j}$ is defined by two attributes, the number of packets and the average packet transmission delay, and is expressed by formula (9):

$$s_{t_j} = \left\{ \boldsymbol{num}^{t_j},\ \bar{\boldsymbol{\tau}}^{t_j} \right\} \qquad (9)$$

where $\boldsymbol{num}^{t_j} = \{num_{c_1}^{t_j}, num_{c_2}^{t_j}, \ldots, num_{c_N}^{t_j}\}$ collects the total numbers of packets in the packet buffer queues at $t_j$, with $num_{c_n}^{t_j}$ the number of packets in $B_{c_n}^{t_j}$; and $\bar{\boldsymbol{\tau}}^{t_j} = \{\bar{\tau}_{c_1}^{t_j}, \bar{\tau}_{c_2}^{t_j}, \ldots, \bar{\tau}_{c_N}^{t_j}\}$ collects the average packet transmission delays of the packet buffer queues at $t_j$, with $\bar{\tau}_{c_n}^{t_j}$ the average transmission delay of the packets in $B_{c_n}^{t_j}$;

step 302, action design of the deep reinforcement learning algorithm:

the hopping beam pattern $X^{t_j}$ serves as the agent's output action $a_{t_j}$ at $t_j$; a set of hopping beam patterns obtained through an iterative algorithm serves as the action space of the deep reinforcement learning algorithm:

$$Set = \{X_1, X_2, \ldots, X_k, \ldots, X_{num}\}$$

where $X_k$ is the $k$-th hopping beam pattern in the set, $1 \le k \le num$, and $num$ is the number of hopping beam patterns in the set;

the iterative algorithm proceeds as follows:

(1) initialize the ground wave position numbering, divide the ground wave positions into $M$ clusters, set the co-channel interference threshold, and let $i = 1$, $k = 1$;

(2) if all ground wave positions of the $i$-th cluster are already contained in $Set$, select any one ground wave position from the $i$-th cluster and light it; otherwise, select a ground wave position of the $i$-th cluster that is not contained in $Set$ and light it;

(3) calculate the co-frequency reuse distance of the working beams; if it is larger than the co-channel interference threshold, add the ground wave position selected in step (2) to the hopping beam pattern $X_k$; otherwise, select the ground wave position of the $i$-th cluster that maximizes the co-frequency reuse distance and add it to $X_k$;

(4) if $i \ne M$, set $i = i + 1$ and return to step (2); if $i = M$, add $X_k$ to $Set$;

(5) if the elements of $Set$ satisfy $X_1 \cup X_2 \cup \ldots \cup X_k = \Psi$, the iteration terminates; otherwise set $k = k + 1$, $i = 1$, and return to step (2);

step 303, reward design of the deep reinforcement learning algorithm:

the reward of the deep reinforcement learning algorithm at $t_j$ is set by the following formula:

$$r_{t_j} = -\left\| \boldsymbol{num}^{t_j} \circ \bar{\boldsymbol{\tau}}^{t_j} \right\| \qquad (10)$$

where $\circ$ denotes the Hadamard (element-wise) matrix product and $\|\cdot\|$ denotes the sum of all elements of the matrix.
5. The method for dynamically allocating resources of a beam-hopping satellite system based on deep reinforcement learning according to claim 4, wherein the step 4 is as follows:
step I: initialize the satellite scenario and the packet buffer queues;

step II: initialize the satellite agent, initialize the weight parameters of the decision neural network and of the target network, initialize the decision-network training step counter step = 0, and set the target-network update interval to G;

step III: initialize the experience pool capacity, set the number of training episodes E and the number of beam-hopping time slots J per episode, and initialize the training episode counter e = 1 and the time slot counter j = 1;

step IV: at $t_j$, packets arrive at the ground wave positions; observe and extract the satellite environment state information at that time as $s_{t_j}$;

step V: with probability ε, randomly select one hopping beam pattern $X_k$ from the $Set$ obtained in step 302 as $a_{t_j}$; otherwise, with probability 1 − ε, select the action corresponding to the maximum action value output by the decision neural network as $a_{t_j}$;

step VI: execute the $a_{t_j}$ selected in step V, upon which the environment transitions to the next state $s_{t_{j+1}}$ and the reward $r_{t_j}$ is obtained;

step VII: store the experience tuple $(s_{t_j}, a_{t_j}, r_{t_j}, s_{t_{j+1}})$ in the experience pool;

step VIII: randomly sample a batch of experience tuples from the experience pool, calculate the loss function, train the decision neural network with the Adam algorithm, and set step = step + 1;

step IX: if the training step counter step is a multiple of G, update the weight parameters of the target network to those of the decision neural network, then execute step X; if step is not a multiple of G, execute step X directly;

step X: first judge whether j equals J: if j ≠ J, set j = j + 1 and return to step IV; if j = J, judge whether e equals E: if e ≠ E, set e = e + 1, reduce the probability ε, and return to step IV; if e = E, training terminates.
CN202111609439.0A 2021-12-24 2021-12-24 Dynamic allocation method for jumping beam satellite system resources based on deep reinforcement learning Active CN114499629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111609439.0A CN114499629B (en) 2021-12-24 2021-12-24 Dynamic allocation method for jumping beam satellite system resources based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114499629A true CN114499629A (en) 2022-05-13
CN114499629B CN114499629B (en) 2023-07-25

Family

ID=81495303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111609439.0A Active CN114499629B (en) 2021-12-24 2021-12-24 Dynamic allocation method for jumping beam satellite system resources based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114499629B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113541770A (en) * 2021-07-12 2021-10-22 军事科学院系统工程研究院网络信息研究所 Space-time-frequency refined resource management method for multi-beam satellite communication system
CN113692051A (en) * 2021-07-23 2021-11-23 西安空间无线电技术研究所 Cross-wave-bit resource allocation method for beam-hopping satellite
CN113572517A (en) * 2021-07-30 2021-10-29 哈尔滨工业大学 Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
方应勇; 何辉: "Overview of beam-hopping techniques for broadband satellite communication systems" (宽带卫星通信系统的跳波束技术综述), 卫星电视与宽带多媒体 (Satellite TV & Broadband Multimedia), no. 12
韩永锋; 张晨; 张更新: "A survey of satellite dynamic resource management based on deep reinforcement learning" (基于深度强化学习的卫星动态资源管理研究综述), Proceedings of the 16th Annual Conference on Satellite Communications (第十六届卫星通信学术年会论文集)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900897A (en) * 2022-05-17 2022-08-12 中国人民解放军国防科技大学 Multi-beam satellite resource allocation method and system
CN114928401A (en) * 2022-05-17 2022-08-19 重庆邮电大学 Dynamic planning method for LEO inter-satellite link based on multi-agent reinforcement learning
CN114900897B (en) * 2022-05-17 2023-04-07 中国人民解放军国防科技大学 Multi-beam satellite resource allocation method and system
CN114928401B (en) * 2022-05-17 2023-07-07 重庆邮电大学 LEO inter-satellite link dynamic planning method based on multi-agent reinforcement learning
CN115001611A (en) * 2022-05-18 2022-09-02 西安交通大学 Resource allocation method of hopping beam satellite spectrum sharing system based on reinforcement learning
CN115001611B (en) * 2022-05-18 2023-09-26 西安交通大学 Resource allocation method of beam hopping satellite spectrum sharing system based on reinforcement learning
CN115118331A (en) * 2022-06-28 2022-09-27 北京理工大学 Dynamic low-orbit double-satellite beam hopping technology based on DPP algorithm
CN115118331B (en) * 2022-06-28 2023-09-19 北京理工大学 Dynamic low-orbit double-star-jump beam method based on DPP algorithm
CN115173923B (en) * 2022-07-04 2023-07-04 重庆邮电大学 Low-orbit satellite network energy efficiency perception route optimization method and system
CN115173923A (en) * 2022-07-04 2022-10-11 重庆邮电大学 Energy efficiency perception route optimization method and system for low-orbit satellite network
CN115483960B (en) * 2022-08-23 2023-08-29 爱浦路网络技术(南京)有限公司 Wave beam jumping scheduling method, system and device for low orbit satellite and storage medium
CN115483960A (en) * 2022-08-23 2022-12-16 爱浦路网络技术(南京)有限公司 Beam hopping scheduling method, system, device and storage medium for low-earth-orbit satellite
CN116346202B (en) * 2023-03-15 2024-02-09 南京融星智联信息技术有限公司 Wave beam hopping scheduling method based on maximum weighting group
CN116346202A (en) * 2023-03-15 2023-06-27 南京融星智联信息技术有限公司 Wave beam hopping scheduling method based on maximum weighting group
CN116260506A (en) * 2023-05-09 2023-06-13 红珊科技有限公司 Satellite communication transmission delay prediction system and method
CN116260506B (en) * 2023-05-09 2023-07-04 红珊科技有限公司 Satellite communication transmission delay prediction system and method
CN116546624A (en) * 2023-05-24 2023-08-04 华能伊敏煤电有限责任公司 Method and device for predicting wave-hopping satellite service and distributing multidimensional link dynamic resources
CN116546624B (en) * 2023-05-24 2024-05-14 华能伊敏煤电有限责任公司 Method and device for predicting wave-hopping satellite service and distributing multidimensional link dynamic resources
CN116938323B (en) * 2023-09-18 2023-11-21 中国电子科技集团公司第五十四研究所 Satellite transponder resource allocation method based on reinforcement learning
CN116938323A (en) * 2023-09-18 2023-10-24 中国电子科技集团公司第五十四研究所 Satellite transponder resource allocation method based on reinforcement learning
CN117014061A (en) * 2023-09-27 2023-11-07 银河航天(北京)通信技术有限公司 Method, device and storage medium for determining satellite communication frequency band
CN117014061B (en) * 2023-09-27 2023-12-08 银河航天(北京)通信技术有限公司 Method, device and storage medium for determining satellite communication frequency band

Also Published As

Publication number Publication date
CN114499629B (en) 2023-07-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant