CN113364495A - Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system - Google Patents
- Publication number
- CN113364495A (application CN202110573024.6A)
- Authority
- CN
- China
- Prior art keywords
- reflecting surface
- unmanned aerial
- intelligent
- intelligent reflecting
- aerial vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/01—Reducing phase shift
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a method and a system for jointly optimizing the trajectories of multiple unmanned aerial vehicles and the phase shifts of intelligent reflecting surfaces. A wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is established, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface mounted on an unmanned aerial vehicle; the channel model of the wireless communication system model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface are determined, and the energy efficiency of the wireless communication system model is calculated. Ground users are clustered using a K-means clustering algorithm, and a prioritized experience replay MATD3 method is then used to determine, for each cluster, the position of the unmanned aerial vehicle, the user who communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, the activated reflecting elements of the intelligent reflecting surface, and the phase shifts of the activated reflecting elements, thereby completing the joint optimization of the multi-unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts. The invention solves the problems of high communication delay and high power consumption of existing offline optimization methods.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a method and a system for joint optimization of multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift.
Background
With the development of Internet of Things technology, more and more devices need to access communication networks, and these devices are sometimes distributed over a very large area. If a single unmanned aerial vehicle carrying a single intelligent reflecting surface serves a large number of communication devices, the unmanned aerial vehicle bears a heavy communication load. In addition, long-distance flight consumes considerable time and energy, which causes serious communication delay and makes the power consumption of the unmanned aerial vehicle a challenge.
To provide user equipment with low-latency, high-reliability service, multiple unmanned aerial vehicles and multiple intelligent reflecting surfaces can be used for assisted communication. A K-means clustering algorithm divides the ground users into several areas, and each unmanned aerial vehicle carrying an intelligent reflecting surface serves the users in one area. While good communication quality is ensured, a multi-agent reinforcement learning algorithm jointly optimizes the trajectories of the unmanned aerial vehicles and the phase shifts of the intelligent reflecting surfaces so as to maximize the energy efficiency of the wireless communication system.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects in the prior art, a method and a system for jointly optimizing multi-unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts, which solve the problems of high communication delay and high power consumption of existing offline methods for optimizing unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts.
The invention adopts the following technical scheme:
A method for jointly optimizing multi-unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts comprises the following steps:
S1, establishing a wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface mounted on an unmanned aerial vehicle; determining the channel model of the wireless communication system model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface; and calculating the energy efficiency of the wireless communication system model;
S2, based on the channel model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface determined in step S1, clustering the ground users using a K-means clustering algorithm; then, with energy efficiency as the optimization target, using a prioritized experience replay MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user who communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, the activated reflecting elements of the intelligent reflecting surface, and the phase shifts of the activated reflecting elements, thereby completing the joint optimization of the multi-unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts.
Specifically, in step S1, the wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is as follows. The number of randomly distributed users is $U$; the users are divided into $K$ areas, and the number of users in area $k$ is $u_k$, with $u_1 + \dots + u_k + \dots + u_K = U$. The number of intelligent reflecting surfaces and the number of unmanned aerial vehicles are both $K$, and each unmanned aerial vehicle equipped with an intelligent reflecting surface serves the users in one area. The intelligent reflecting surface carried on the unmanned aerial vehicle adjusts the phase shifts of its $M$ reflecting elements through an integrated controller. The base station receives the signals reflected by all intelligent reflecting surfaces simultaneously. The base station has $N$ antennas, each intelligent reflecting surface has $M$ reflecting elements, and each user has a single antenna. The coordinates of the base station are $(x_{BS}, y_{BS}, z_{BS})$; intelligent reflecting surface $p$ and user $q$ likewise each have three-dimensional coordinates. Only one user in each area sends a signal at a time. The signal sent by each user is reflected to the base station both by the intelligent reflecting surface serving that user's area and by the intelligent reflecting surfaces serving the other areas, so the number of users participating in communication at the same time and the number of intelligent reflecting surfaces are both $K$. Each reflecting element of the intelligent reflecting surface independently adjusts the phase shift of the incident signal while keeping its amplitude unchanged. The phase shift matrix of intelligent reflecting surface $p$ is the diagonal matrix $\Theta_p = \mathrm{diag}(\nu_p)$ with diagonal elements $\nu_p = (e^{j\theta_{p1}}, \dots, e^{j\theta_{pm}}, \dots, e^{j\theta_{pM}})$, where $\theta_{pm}$ represents the phase shift of the $m$-th reflecting element of intelligent reflecting surface $p$. The matrix of activated reflecting elements of the intelligent reflecting surface is the diagonal matrix $\Delta_p = \mathrm{diag}(\upsilon_p)$ with diagonal elements $\upsilon_p = (\delta_{p1}, \dots, \delta_{pm}, \dots, \delta_{pM})$, where $\delta_{pm}$ indicates whether the $m$-th reflecting element of intelligent reflecting surface $p$ is activated.
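The two diagonal matrices of the system model can be illustrated with a minimal NumPy sketch. It assumes unit-modulus diagonal entries of the form exp(j·θ), which matches the statement that each element changes only the phase and not the amplitude; the function name `irs_matrices` and the example values are hypothetical.

```python
import numpy as np

def irs_matrices(phases, active):
    """Build the diagonal phase-shift matrix Theta_p and activation matrix Delta_p.

    phases: length-M sequence of phase shifts theta_pm in radians.
    active: length-M sequence of 0/1 flags delta_pm.
    """
    phases = np.asarray(phases, dtype=float)
    active = np.asarray(active, dtype=int)
    if phases.shape != active.shape:
        raise ValueError("phases and active must have the same length M")
    theta = np.diag(np.exp(1j * phases))   # unit-modulus entries: amplitude unchanged
    delta = np.diag(active.astype(float))  # 1 on the diagonal for activated elements
    return theta, delta

# Example with M = 3 elements, the middle one switched off (illustrative values).
theta_p, delta_p = irs_matrices([0.0, np.pi / 2, np.pi], [1, 0, 1])
effective = theta_p @ delta_p  # reflection coefficients applied by activated elements only
```

Multiplying the two matrices zeroes the contribution of every deactivated element, which is how the activation matrix enters the cascaded channel.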
Specifically, in step S1, the process by which a signal sent by a user is reflected to the base station by the intelligent reflecting surface on the unmanned aerial vehicle comprises a decision phase, a flight phase, and an information transmission phase. Decision phase: the unmanned aerial vehicle selects which user to communicate with and the position for information transmission, and the intelligent reflecting surface selects the reflecting elements to activate and their phase shifts. Flight phase: the unmanned aerial vehicle flies in a straight line at speed v to the information transmission position selected in the decision phase. Information transmission phase: the unmanned aerial vehicle hovers after reaching the specified position, the user selected in the decision phase sends a signal to the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface reflect the signal to the base station with the corresponding phase shifts.
Specifically, in step S1, the channels between the user and the intelligent reflecting surface and between the intelligent reflecting surface and the base station are modeled as Rician channels. In the expression for the channel $G_{pq}$ from user $q$ to intelligent reflecting surface $p$, $\rho$ represents the path loss at the reference distance $d_0 = 1$ m, $k_1$ is the path loss exponent, $\beta$ is the Rician fading factor, and $d_1$ is the Euclidean distance between user $q$ and intelligent reflecting surface $p$. The remaining quantities are a non-line-of-sight propagation component, an array response vector, the cosine of the angle of arrival of the signal from user $q$ at intelligent reflecting surface $p$, the carrier wavelength $\lambda$, and the antenna spacing $d$.
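For orientation, a Rician channel built from the quantities listed above is commonly written in the following form. This is the textbook expression, not necessarily the exact formula of the original filing; the superscripts LoS and NLoS and the symbol $\phi_{pq}$ for the cosine of the angle of arrival are notation assumed here.

```latex
G_{pq} = \sqrt{\rho\, d_1^{-k_1}}
         \left( \sqrt{\frac{\beta}{1+\beta}}\; G_{pq}^{\mathrm{LoS}}
              + \sqrt{\frac{1}{1+\beta}}\; G_{pq}^{\mathrm{NLoS}} \right),
\qquad
G_{pq}^{\mathrm{LoS}} =
  \left[\, 1,\; e^{-j \frac{2\pi d}{\lambda} \phi_{pq}},\; \dots,\;
         e^{-j \frac{2\pi d}{\lambda} (M-1) \phi_{pq}} \,\right]^{T}
```

The first factor is the distance-dependent path loss, and $\beta$ sets how the power is split between the deterministic line-of-sight term (the array response vector) and the random non-line-of-sight term.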
In the expression for the channel $F_p$ from intelligent reflecting surface $p$ to the base station, $d_2$ represents the Euclidean distance between intelligent reflecting surface $p$ and the base station, and the remaining terms are a non-line-of-sight propagation component and an array response vector.
The received signal at the base station is $y = HS + n = \sum_{k=1}^{K} h_k s_k + n$, where $S$ is the transmit signal matrix, $H$ is the channel matrix, $h_k$ is the $k$-th column of $H$, $s_k$ is the $k$-th row of $S$, and $n$ is the additive white Gaussian noise at the base station, a circularly symmetric complex Gaussian variable with variance $\sigma^2$.
Treating the interference from other users as noise, the signal-to-interference-plus-noise ratio $\mathrm{SINR}_k$ and the information transmission rate $R_k$ of the $k$-th user are computed from the following quantities: $K$ is the number of users communicating with the base station at the same time, $w_k$ is the $k$-th row of the zero-forcing detection filter matrix, $F_p^H$ is the conjugate transpose of the channel matrix between intelligent reflecting surface $p$ and the base station, $\Theta_p$ is the phase shift matrix of intelligent reflecting surface $p$, $\Delta_p$ is the matrix of activated reflecting elements of intelligent reflecting surface $p$, $G_{pq}$ is the channel between user $q$ and intelligent reflecting surface $p$, $G_{pk}$ is the channel between user $k$ and intelligent reflecting surface $p$, and $\sigma^2$ is the noise variance.
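The zero-forcing detection step can be sketched as follows. The sketch assumes that column k of the effective channel matrix H already contains the cascaded channel of user k (the sum over all intelligent reflecting surfaces of the surface-to-base-station channel, the phase shift matrix, the activation matrix, and the user-to-surface channel), that all users transmit with the same power, and that the rate is measured per unit bandwidth. The function name and the random test channel are hypothetical.

```python
import numpy as np

def zf_sinr_and_rate(H, tx_power, noise_var):
    """Per-user SINR and rate at an N-antenna base station with zero-forcing detection.

    H: (N, K) effective channel matrix; column k is the cascaded channel of user k.
    tx_power: transmit power of each user (assumed equal for all users).
    noise_var: noise variance sigma^2.
    """
    W = np.linalg.pinv(H)                  # (K, N) zero-forcing filter; row k is w_k
    gains = np.abs(W @ H) ** 2             # gains[k, j] = |w_k h_j|^2
    signal = tx_power * np.diag(gains)
    interference = tx_power * (gains.sum(axis=1) - np.diag(gains))
    noise = noise_var * np.sum(np.abs(W) ** 2, axis=1)   # sigma^2 * ||w_k||^2
    sinr = signal / (interference + noise)
    rate = np.log2(1.0 + sinr)             # bits/s/Hz, unit bandwidth assumed
    return sinr, rate

# Illustrative random channel: N = 8 base-station antennas, K = 3 simultaneous users.
rng = np.random.default_rng(0)
N, K = 8, 3
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
sinr, rate = zf_sinr_and_rate(H, tx_power=0.1, noise_var=1e-3)
```

With N at least K and a full-rank H, the zero-forcing filter cancels inter-user interference almost exactly, so the interference term is kept in the code mainly to mirror the general SINR definition.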
Specifically, in step S1, the energy efficiency $EE_p$ is the amount of transmitted data divided by the total energy consumed by unmanned aerial vehicle $p$ and intelligent reflecting surface $p$. The quantities involved are the energy consumed by the unmanned aerial vehicle in flying to the designated position, the amount of data $G_p$ transmitted to the base station by the user with the assistance of unmanned aerial vehicle $p$ and intelligent reflecting surface $p$, the energy consumed by intelligent reflecting surface $p$, the propulsion power of unmanned aerial vehicle $p$, and the time $T$ required for the unmanned aerial vehicle to fly to the designated position.
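The ratio described above can be written as a small helper. The sketch assumes that the flight energy equals the propulsion power multiplied by the flight time T, which is the natural reading of the listed quantities but is not confirmed by a surviving formula; the function name, argument names, and numbers are hypothetical.

```python
def energy_efficiency(data_bits, propulsion_power_w, flight_time_s, irs_energy_j):
    """Energy efficiency EE_p in bits per joule.

    Assumes flight energy = propulsion power * flight time (an assumption of this sketch).
    """
    flight_energy_j = propulsion_power_w * flight_time_s
    total_energy_j = flight_energy_j + irs_energy_j
    if total_energy_j <= 0:
        raise ValueError("total energy must be positive")
    return data_bits / total_energy_j

# Illustrative numbers only: 2 Mbit delivered, 100 W propulsion for 10 s, 5 J for the surface.
ee = energy_efficiency(data_bits=2.0e6, propulsion_power_w=100.0,
                       flight_time_s=10.0, irs_energy_j=5.0)
```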
Specifically, in step S2, the users are clustered using a K-means clustering algorithm. If the cluster centers of all clusters are exactly the same as those obtained in the previous calculation, the clustering criterion function has converged and all users have been assigned to their clusters.
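The clustering step and its stopping rule can be sketched with the standard K-means iteration. The initialization (random users as initial centers), the iteration cap, and the two-dimensional example coordinates are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster user positions into k groups; stop when the centers no longer change."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: pick k distinct users as the initial cluster centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance from every user to every center, shape (U, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its users; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.array_equal(new_centers, centers):   # same centers as last round: converged
            break
        centers = new_centers
    return labels, centers

# Two well-separated groups of ground users (illustrative coordinates).
users = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(users, k=2)
```

Each resulting cluster would then be assigned one unmanned aerial vehicle carrying an intelligent reflecting surface.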
Specifically, in step S2, using the prioritized experience replay MATD3 method to determine, in each cluster, the position of the unmanned aerial vehicle, the position of the user communicating with the base station, the activated reflecting elements of the intelligent reflecting surface, and the phase shifts of the activated elements, and thereby completing the joint optimization of the multi-unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts, specifically comprises:
modeling the optimization problem of unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts in the wireless communication system assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces as a Markov game, in which each unmanned aerial vehicle equipped with an intelligent reflecting surface acts as an agent. The $k$-th agent observes the current environment state $s_k$, selects an action $a_k$ according to its policy $\pi_k$, and obtains a reward $r_k$ after the action acts on the environment; the environment then transitions to a new state $s'_k$ with transition probability $P(s'_k \mid s_k, a_1, \dots, a_K)$;
At each time step, the $k$-th agent observes, as its state $s_k$, the position of unmanned aerial vehicle $k$ at the previous time step and the position of the user communicating with the base station in the $k$-th cluster. The training policy network, with parameters $\theta_k$, takes the state $s_k$ as input and outputs as the behavior $a_k$ the position of the $k$-th unmanned aerial vehicle at the current time step, the activated user vector for communication with the base station in the $k$-th cluster, and the activated element vector and phase shift vector of the $k$-th intelligent reflecting surface. The first and second training value networks, with parameters $\omega_{k1}$ and $\omega_{k2}$ respectively, take the joint state $s = (s_1, s_2, \dots, s_K)$ observed by all agents and the joint action $a = (a_1, a_2, \dots, a_K)$ as input and output the joint state-behavior value functions $Q_{k1}(s, a_1, a_2, \dots, a_K, \omega_{k1})$ and $Q_{k2}(s, a_1, a_2, \dots, a_K, \omega_{k2})$ respectively. The target policy network takes the next state $s'_k$ as input and outputs the next action $a'_k$; its parameters $\theta'_k$ are updated from the parameters $\theta_k$ of the training policy network by soft update. The first and second target value networks take the next state-behavior pair $(s', a')$ as input and output $Q'_{k1}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k1})$ and $Q'_{k2}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k2})$ respectively; their parameters $\omega'_{k1}$ and $\omega'_{k2}$ are updated from the parameters $\omega_{k1}$ and $\omega_{k2}$ of the first and second training value networks by soft update;
The tuple $(s, a_1, a_2, \dots, a_K, r_1, r_2, \dots, r_K, s')$ is stored in the experience memory as an experience of the agents. When the experience memory reaches its maximum storage capacity, a mini-batch of experiences is sampled from it using the prioritized experience replay method for training, and the parameters of the policy networks and the value networks are updated.
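The experience memory with prioritized sampling can be sketched as below. This is a simplified proportional scheme: the exponent `alpha`, the use of the absolute temporal-difference error as the priority, and the class and method names are assumptions, and importance-sampling weights are omitted for brevity.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (simplified sketch)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (assumed value)
        self.data = []
        self.priorities = []
        self.pos = 0                # next slot to overwrite once the buffer is full
        self.rng = np.random.default_rng(seed)

    def add(self, experience):
        # New experiences receive the current maximum priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(experience)
            self.priorities.append(p)
        else:
            self.data[self.pos] = experience
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probability is proportional to priority ** alpha.
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Larger temporal-difference error means the experience is replayed more often.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + eps

# Usage: store joint experiences, then draw a mini-batch and refresh its priorities.
buf = PrioritizedReplayBuffer(capacity=4)
for t in range(6):
    buf.add({"step": t})            # placeholder for (s, a_1..a_K, r_1..r_K, s')
idx, batch = buf.sample(batch_size=2)
buf.update_priorities(idx, td_errors=[0.5, 2.0])
```

Once the buffer is full, the oldest entries are overwritten in a ring, so the memory always holds the most recent experiences up to its capacity.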
Further, the state $s_k$ observed by each unmanned aerial vehicle comprises two parts: the position of unmanned aerial vehicle $k$ ($k = 1, 2, \dots, K$) at the previous time step, and the position of the user in the $k$-th cluster who communicates with the base station with the assistance of the $k$-th unmanned aerial vehicle and intelligent reflecting surface. The state $s_k$ is therefore six-dimensional. The behavior $a_k$ comprises the following four parts:
i: the position of the $k$-th unmanned aerial vehicle at the current time step;
ii: the activated user vector for communication with the base station in the $k$-th cluster at the current time step. Each element indicates whether the corresponding user is activated: a value of 0 means not activated and a value of 1 means activated. The elements of the vector must sum to 1, meaning that exactly one user in a cluster is activated at any time;
iii: the activated element vector of the $k$-th intelligent reflecting surface at the current time step. Each element indicates whether the corresponding reflecting element is activated: a value of 0 means not activated and a value of 1 means activated. The number of activated elements of each intelligent reflecting surface must lie between 1 and $M$;
iv: the phase shift vector of the $k$-th intelligent reflecting surface at the current time step, each element of which represents the phase shift of the corresponding reflecting element.
The reward is defined as the energy efficiency $EE_k$: $r_k(s_k, a_k) = EE_k$.
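One way to turn a raw policy-network output into the four parts of the behavior, while enforcing the two constraints above, is sketched here. The output layout, the [-1, 1] output range, the phase range of 0 to 2π, and the function name `decode_action` are all assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def decode_action(raw, num_users, num_elements):
    """Split a raw actor output in [-1, 1] into the four parts of the behavior a_k.

    Assumed layout: [3 position values | num_users scores | M activation scores | M phases].
    """
    raw = np.asarray(raw, dtype=float)
    expected = 3 + num_users + 2 * num_elements
    if raw.shape != (expected,):
        raise ValueError(f"expected a vector of length {expected}")
    a, b = 3 + num_users, 3 + num_users + num_elements
    position = raw[:3]                      # left unscaled here; map to the flight area as needed
    user_scores, act_scores, phase_raw = raw[3:a], raw[a:b], raw[b:]

    user_vec = np.zeros(num_users, dtype=int)
    user_vec[np.argmax(user_scores)] = 1    # exactly one activated user per cluster
    active = (act_scores > 0).astype(int)
    if active.sum() == 0:                   # at least one reflecting element must be active
        active[np.argmax(act_scores)] = 1
    phases = (phase_raw + 1.0) * np.pi      # map [-1, 1] onto [0, 2*pi] (assumed range)
    return position, user_vec, active, phases

# Example with 3 users in the cluster and M = 2 reflecting elements.
raw = [0.1, -0.2, 0.3, -0.5, 0.9, 0.0, -0.3, -0.7, -1.0, 1.0]
position, user_vec, active, phases = decode_action(raw, num_users=3, num_elements=2)
```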
Further, the parameters $\theta_k$ of the training policy network of the $k$-th agent are updated using the policy gradient method.
In the policy gradient expression, $J(\theta_k)$ is the policy objective function, $F$ denotes the mini-batch size, and $\nabla$ denotes the gradient operator; the remaining symbols denote the policy learned by the $k$-th agent, the state of the $k$-th agent in the $j$-th experience sampled using the prioritized experience replay method, and the behavior of the $k$-th agent in the $j$-th experience;
The parameters $\omega_{k1}$ of training value network 1 and $\omega_{k2}$ of training value network 2 of the $k$-th agent are updated by gradient backpropagation through the neural network, each with its own loss function.
The parameters $\theta'_k$ of the target policy network, $\omega'_{k1}$ of target value network 1, and $\omega'_{k2}$ of target value network 2 are each updated by soft update:
$\theta'_k \leftarrow \alpha\theta_k + (1-\alpha)\theta'_k$
$\omega'_{k1} \leftarrow \alpha\omega_{k1} + (1-\alpha)\omega'_{k1}$
$\omega'_{k2} \leftarrow \alpha\omega_{k2} + (1-\alpha)\omega'_{k2}$
where $\alpha$ is the update coefficient.
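The soft update rule, together with the way the two target value networks are typically combined in TD3, can be sketched as follows. The clipped double-Q target shown in `td3_target` is the standard TD3 construction and is an assumption about this patent's loss functions, whose formulas did not survive; the discount factor `gamma` and all example values are likewise hypothetical.

```python
import numpy as np

def soft_update(target_params, train_params, alpha):
    """Apply theta' <- alpha * theta + (1 - alpha) * theta' to every parameter array."""
    return [alpha * w + (1.0 - alpha) * w_t
            for w, w_t in zip(train_params, target_params)]

def td3_target(reward, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target of standard TD3: use the smaller of the two target values.

    gamma is an assumed discount factor; the patent's own loss is not reproduced here.
    """
    return reward + gamma * np.minimum(q1_next, q2_next)

# Soft update of a single 2x2 weight matrix with update coefficient alpha = 0.01.
theta_train = [np.ones((2, 2))]
theta_target = [np.zeros((2, 2))]
theta_target = soft_update(theta_target, theta_train, alpha=0.01)

# Target value for one sampled experience (illustrative numbers).
y = td3_target(np.array([1.0]), np.array([2.0]), np.array([3.0]), gamma=0.9)
```

Taking the minimum of the two target value estimates is what counteracts Q-value overestimation, and the small update coefficient keeps the target networks changing slowly, which stabilizes training.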
Another technical solution of the present invention is a system for jointly optimizing multi-unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts, comprising:
an energy module, configured to establish a wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface mounted on an unmanned aerial vehicle, to determine the channel model of the wireless communication system model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface, and to calculate the energy efficiency $EE_p$ of the wireless communication system model;
and an optimization module, configured to cluster the ground users using a K-means clustering algorithm based on the channel model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface determined by the energy module, and then, with energy efficiency as the optimization target, to use a prioritized experience replay MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user who communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, the activated reflecting elements of the intelligent reflecting surface, and the phase shifts of the activated reflecting elements, thereby completing the joint optimization of the multi-unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts.
Compared with the prior art, the invention has at least the following beneficial effects:
In the method for jointly optimizing multi-unmanned aerial vehicle trajectories and intelligent reflecting surface phase shifts, the channel model and the energy consumption model are established to calculate the energy efficiency, the neural network is trained with energy efficiency maximization as the optimization target, and the neural network finally learns a policy that gives the wireless communication system its maximum energy efficiency. Optimizing the unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts with the prioritized experience replay MATD3 method lets the unmanned aerial vehicles and the intelligent reflecting surfaces adapt their own policies to changes in the environment, which gives strong robustness.
Furthermore, the users are divided into a plurality of areas, and an unmanned aerial vehicle equipped with an intelligent reflecting surface is deployed in each area to serve its users, which avoids the high power consumption and high communication delay caused by long-distance flight of the unmanned aerial vehicle.
Further, in the decision phase, the unmanned aerial vehicle in each area selects which user to communicate with and the position for information transmission, and the intelligent reflecting surface selects the reflecting elements to activate and determines their phase shifts. In the flight phase, the unmanned aerial vehicle flies in a straight line to the information transmission position determined in the decision phase. In the information transmission phase, the user selected in the decision phase sends a signal, and the intelligent reflecting surface reflects the signal to the base station.
Furthermore, establishing a proper channel model is the basis for accurately calculating the information transmission rate, and the energy efficiency of the system can be further calculated after the information transmission rate is obtained.
Furthermore, the energy efficiency is used as an optimization target to design the track of the unmanned aerial vehicle and the phase shift of the intelligent reflecting surface, so that the aim of maximizing the energy efficiency of the system can be achieved.
Further, when the area that the unmanned aerial vehicles need to serve is very large, the users need to be clustered in order to improve communication quality and save the energy of the unmanned aerial vehicles. Each unmanned aerial vehicle equipped with an intelligent reflecting surface serves the users in one cluster and flies within the coverage of that cluster.
Furthermore, the unmanned aerial vehicle and the intelligent reflecting surface in each cluster act together as one agent, and the agents learn by centralized training with distributed execution, so that experience can be shared and a policy that maximizes the energy efficiency of the system can be learned quickly. Sampling from the experience memory with the prioritized experience replay method lets experiences with higher learning value be learned more frequently, which improves learning efficiency. The TD3 algorithm mitigates the overestimation of Q values, so that the value network can evaluate the value of state-behavior pairs accurately.
Further, the channel state depends on the positions of the user and the unmanned aerial vehicle, and it is an important basis for determining the optimal position of the unmanned aerial vehicle for information transmission and the phase shifts of the intelligent reflecting surface. Setting the state $s_k$ to the position of the unmanned aerial vehicle at the previous time step together with the position of the user communicating with the base station lets the agent learn the implicit relationship between the unmanned aerial vehicle position, the user position, and the channel state, so that it can map the state $s_k$ directly to the behavior $a_k$ that maximizes energy efficiency without obtaining accurate channel state information. Taking the position of the unmanned aerial vehicle, the activated element matrix of the intelligent reflecting surface, and the phase shift matrix as the behavior $a_k$ lets the intelligent reflecting surface establish a high-quality line-of-sight propagation link between the user and the base station and reflect the signal sent by the user to the base station.
Furthermore, by computing the gradient of the policy objective function and adjusting the parameters of the training policy network to maximize the Q value, a policy that maps the state to the optimal behavior can be found. Updating the parameters of the training value networks by gradient descent to minimize the loss function lets the value networks evaluate the value of state-behavior pairs accurately. Updating the parameters of the target policy network and the target value networks by soft update improves the stability of the algorithm.
In summary, the invention uses a plurality of unmanned aerial vehicles and a plurality of intelligent reflecting surfaces for assisted communication and uses a K-means clustering algorithm to cluster the users, with each unmanned aerial vehicle and intelligent reflecting surface serving the users in one cluster. Through centralized training, the prioritized experience replay MATD3 method lets each agent learn the policies adopted by the other agents and share the experience of all agents, so that the joint optimization of the multi-unmanned aerial vehicle trajectories and the intelligent reflecting surface phase shifts is achieved quickly and the energy efficiency of the system is maximized.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a diagram of a system model of the present invention;
FIG. 2 is a diagram illustrating a process of transmitting information from a user to a base station according to the present invention;
FIG. 3 is a flow chart of a K-means clustering algorithm;
FIG. 4 is a block diagram of the prioritized experience replay MATD3 method;
FIG. 5 is a diagram illustrating the effect of user transmit power on energy efficiency.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method. First, a wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is established. Second, because the track and phase shift optimization problem is non-convex, a priority experience playback (prioritized experience replay) multi-agent twin delayed deep deterministic policy gradient (MATD3) method is provided to realize the joint optimization of the multi-unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
The invention discloses a multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method, which comprises the following steps:
S1, establishing a wireless communication system model based on the assistance of multiple unmanned aerial vehicles and intelligent reflecting surfaces, and then describing the channels and the energy consumed by the unmanned aerial vehicles and the intelligent reflecting surfaces respectively;
As shown in FIG. 1, in the communication model U users are randomly distributed within a certain range and are divided into K regions. The number of users in region k is $u_k$, with $u_1 + \dots + u_k + \dots + u_K = U$. There are K intelligent reflecting surfaces and K unmanned aerial vehicles, and each unmanned aerial vehicle carrying an intelligent reflecting surface serves the users in one region. The intelligent reflecting surface carried on the unmanned aerial vehicle adjusts the phase shifts of its M reflecting elements through an integrated controller, and the base station receives the signals reflected by all intelligent reflecting surfaces at the same time. The base station has N antennas, each intelligent reflecting surface has M reflecting elements, and each user has a single antenna. The coordinates of the base station are $(x_{BS}, y_{BS}, z_{BS})$; intelligent reflecting surface p and user q are likewise located by three-dimensional coordinates.
At any given moment only one user in each region sends a signal. The signal sent by each user can be reflected to the base station by the intelligent reflecting surface serving its own region and also by the intelligent reflecting surfaces serving other regions, so the number of users participating in communication and the number of intelligent reflecting surfaces are both K.
Each reflecting element of the intelligent reflecting surface can independently adjust the phase shift of an incident signal while keeping its amplitude unchanged. The phase shift matrix of intelligent reflecting surface p is the diagonal matrix $\Theta_p = \mathrm{diag}(\nu_p)$, whose diagonal elements are
$\nu_p = (e^{j\theta_{p1}}, \dots, e^{j\theta_{pm}}, \dots, e^{j\theta_{pM}})$ (1)
where $\theta_{pm}$ represents the phase shift of the mth reflecting element of intelligent reflecting surface p.
The matrix of activated reflecting elements of the intelligent reflecting surface is also a diagonal matrix, $\Delta_p = \mathrm{diag}(\upsilon_p)$, whose diagonal elements are
$\upsilon_p = (\delta_{p1}, \dots, \delta_{pm}, \dots, \delta_{pM})$ (2)
where $\delta_{pm} \in \{0, 1\}$ indicates whether the mth reflecting element of intelligent reflecting surface p is activated: 1 if it is activated and 0 otherwise.
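The two diagonal matrices defined in equations (1) and (2) can be illustrated with a minimal NumPy sketch. The function name `irs_matrices` and the example phase and activation values are hypothetical and chosen only for illustration.

```python
import numpy as np

def irs_matrices(theta, delta):
    """Build the diagonal phase shift matrix and activation matrix of one surface."""
    theta = np.asarray(theta, dtype=float)
    delta = np.asarray(delta)
    if theta.shape != delta.shape:
        raise ValueError("theta and delta must have one entry per reflecting element")
    if not np.isin(delta, (0, 1)).all():
        raise ValueError("activation flags must be 0 or 1")
    Theta = np.diag(np.exp(1j * theta))   # unit-modulus entries: amplitude unchanged
    Delta = np.diag(delta.astype(float))  # 1 = element activated, 0 = switched off
    return Theta, Delta

# Hypothetical surface with M = 4 elements, the second one switched off.
Theta, Delta = irs_matrices([0.0, np.pi / 2, np.pi, 3 * np.pi / 2], [1, 0, 1, 1])
effective = Theta @ Delta  # reflection actually applied to the incident signal
```

The product `Theta @ Delta` shows how a deactivated element contributes nothing to the reflected signal while an activated element only rotates the phase.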
Referring to FIG. 2, the process of transmitting information from the user to the base station is divided into three stages, specifically:
1) Decision stage: the unmanned aerial vehicle selects which user to communicate with and the position from which to transmit information, and the intelligent reflecting surface selects the reflecting elements to activate and their phase shifts.
2) Flight stage: the unmanned aerial vehicle flies in a straight line at speed v towards the information transmission position selected in the decision stage.
3) Information transmission stage: after the unmanned aerial vehicle reaches the specified position, it hovers there; the user selected in the decision stage sends a signal to the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface reflect the signal to the base station with a certain phase shift.
The channels between the user and the intelligent reflecting surface and between the intelligent reflecting surface and the base station are modeled as Rician channels. The channel $G_{pq}$ from user q to intelligent reflecting surface p is characterized by the following quantities: ρ is the path loss at the reference distance $d_0 = 1$ m, $k_1$ is the path loss exponent, β is the Rician factor, and the distance term is the Euclidean distance between user q and intelligent reflecting surface p. The channel has a non-line-of-sight propagation component, each element of which is modeled as a circularly symmetric complex Gaussian variable with zero mean and unit variance, and a line-of-sight component given by an array response vector that depends on the cosine of the angle of arrival of the signal from user q at intelligent reflecting surface p.
The channel from intelligent reflecting surface p to the base station is modeled in the same way. It depends on the Euclidean distance between intelligent reflecting surface p and the base station, on a non-line-of-sight propagation component whose elements are modeled as circularly symmetric complex Gaussian variables with zero mean and unit variance, and on array response vectors that depend on the cosines of the departure angle and the arrival angle of the signal.
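As an illustration of the user-to-surface channel described above, the following sketch uses the commonly used Rician form, in which the large-scale gain multiplies a weighted sum of a line-of-sight array response and a random non-line-of-sight term. The exact expressions of the patent are not reproduced in this text, so the formula, the uniform linear array with half-wavelength spacing, the sign convention of the array response, and all numeric values are assumptions.

```python
import numpy as np

def ula_response(num_elements, cos_angle, spacing_over_wavelength=0.5):
    """Array response of a uniform linear array (assumed geometry and sign convention)."""
    n = np.arange(num_elements)
    return np.exp(-2j * np.pi * spacing_over_wavelength * n * cos_angle)

def rician_channel(los, distance, rho, path_loss_exp, beta, rng):
    """Assumed Rician model: gain * (sqrt(b/(1+b)) * LoS + sqrt(1/(1+b)) * NLoS)."""
    los = np.asarray(los, dtype=complex)
    # NLoS entries: circularly symmetric complex Gaussian, zero mean, unit variance.
    nlos = (rng.standard_normal(los.shape)
            + 1j * rng.standard_normal(los.shape)) / np.sqrt(2)
    gain = np.sqrt(rho * distance ** (-path_loss_exp))
    return gain * (np.sqrt(beta / (1 + beta)) * los
                   + np.sqrt(1 / (1 + beta)) * nlos)

rng = np.random.default_rng(0)
user = np.array([10.0, 5.0, 0.0])      # hypothetical user position (m)
surface = np.array([0.0, 0.0, 50.0])   # hypothetical surface position (m)
d = np.linalg.norm(surface - user)     # Euclidean distance
cos_aoa = (surface[0] - user[0]) / d   # one possible definition of the arrival-angle cosine
G = rician_channel(ula_response(8, cos_aoa), d,
                   rho=1e-3, path_loss_exp=2.2, beta=10.0, rng=rng)
```

With a very large Rician factor the random term vanishes and every entry of the channel has the magnitude of the large-scale gain, which is a quick sanity check on the model.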
Let S be the transmitted signal matrix, H the channel matrix, and $s_k$ the signal transmitted by user k ($k \in \{1, \dots, k, \dots, K\}$). The received signal at the base station is then:
$y = HS + n = \sum_{k=1}^{K} h_k s_k + n$
where $h_k$ is the kth column of the matrix H, $s_k$ is the kth row of the matrix S, and n represents the additive white Gaussian noise at the base station, a circularly symmetric complex Gaussian variable with mean 0 and variance $\sigma^2$, i.e. $n \sim \mathcal{CN}(0, \sigma^2 I)$.
In an uplink multi-user communication system, since multiple users transmit signals on the same frequency band at the same time, co-channel interference exists. In order to suppress co-channel interference between users and successfully detect signals transmitted by each user, the base station may use a zero-forcing detection algorithm to eliminate interference between signals transmitted by different antennas at the signal receiving end through linear transformation.
To recover $s_k$ at the base station while excluding interference from the signals transmitted by other users, the received signal y is multiplied by the matrix $W_{ZF}$ to obtain an equalized signal, i.e.
$W_{ZF}\, y = W_{ZF} H S + W_{ZF}\, n$ (10)
Let $w_k$ be the kth row of the matrix $W_{ZF}$; it should satisfy:
$w_k h_j = 1$ if $j = k$, and $w_k h_j = 0$ if $j \neq k$ (11)
That is, $W_{ZF} H$ should be the identity matrix, which gives:
$W_{ZF} = (H^H H)^{-1} H^H$ (12)
Assuming that the channel matrix H has full column rank, the estimate $\hat{S}$ of the transmitted signal can be expressed as:
$\hat{S} = W_{ZF}\, y = S + W_{ZF}\, n$ (13)
In this estimate the zero-forcing detector has completely eliminated the interference between the signals transmitted by different users.
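The zero-forcing detection of equations (10) to (13) can be checked numerically with a short sketch. The antenna count, user count, symbol alphabet and noise level below are hypothetical; only the filter $W_{ZF} = (H^H H)^{-1} H^H$ is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 8, 3, 5  # base station antennas, users, symbols per user (hypothetical)

# Random full-column-rank channel and +/-1 symbols, used only for illustration.
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
S = rng.choice([-1.0, 1.0], size=(K, T)).astype(complex)
sigma = 0.05
noise = sigma * (rng.standard_normal((N, T))
                 + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
y = H @ S + noise  # received signal y = HS + n

# Equation (12): solve (H^H H) W = H^H instead of forming the inverse explicitly.
W_zf = np.linalg.solve(H.conj().T @ H, H.conj().T)
S_hat = W_zf @ y   # equation (13): equals S + W_zf @ noise
```

Because `W_zf @ H` is the identity, each row of `S_hat` contains only the symbols of one user plus filtered noise, which is the interference-free estimate described above.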
Regarding the interference of other users as noise, the signal-to-interference-plus-noise ratio of the kth user and the information transmission rate $R_k$ of the kth user are expressed in terms of the following quantities: K is the number of users communicating with the base station at the same time; $w_k$ is the kth row of the zero-forcing detection filter matrix; the conjugate transpose of the channel matrix between intelligent reflecting surface p and the base station carries the reflected signal to the base station antennas; $\Theta_p$ is the phase shift matrix of intelligent reflecting surface p; $\Delta_p$ is the matrix of activated reflecting elements of intelligent reflecting surface p; $G_{pq}$ is the channel between user q and intelligent reflecting surface p; $G_{pk}$ is the channel between user k and intelligent reflecting surface p; and $\sigma^2$ is the variance of the noise.
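A hedged sketch of how these quantities could be combined is given below. It assumes that the cascaded channel of user q is the sum over surfaces of $H_p^H \Theta_p \Delta_p G_{pq}$, that all users transmit with the same power, and that the rate is the Shannon rate per unit bandwidth; the function names, the transmit power and the array sizes are hypothetical and the patent's exact expressions may differ.

```python
import numpy as np

def effective_channel(H_list, Theta_list, Delta_list, G):
    """Column q is sum_p H_p^H Theta_p Delta_p G[p][q] (assumed cascaded channel)."""
    P, Q = len(H_list), len(G[0])
    cols = [sum(H_list[p].conj().T @ Theta_list[p] @ Delta_list[p] @ G[p][q]
                for p in range(P)) for q in range(Q)]
    return np.stack(cols, axis=1)

def sinr_and_rate(W, H_eff, tx_power, noise_var):
    """Per-user SINR and log2(1 + SINR) for a linear receiver W (K x N)."""
    A = np.abs(W @ H_eff) ** 2                 # A[k, q]: gain of user q in stream k
    signal = tx_power * np.diag(A)
    interference = tx_power * (A.sum(axis=1) - np.diag(A))
    noise = noise_var * np.sum(np.abs(W) ** 2, axis=1)
    sinr = signal / (interference + noise)
    return sinr, np.log2(1.0 + sinr)

rng = np.random.default_rng(2)
N, M, K = 6, 4, 2  # antennas, reflecting elements, users/surfaces (hypothetical)

def crandn(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_list = [crandn((M, N)) for _ in range(K)]    # surface-to-base-station channels
Theta_list = [np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, M))) for _ in range(K)]
Delta_list = [np.eye(M) for _ in range(K)]     # all elements activated
G = [[crandn((M,)) for _ in range(K)] for _ in range(K)]  # G[p][q]: user q to surface p

H_eff = effective_channel(H_list, Theta_list, Delta_list, G)
W = np.linalg.pinv(H_eff)                      # zero-forcing filter
sinr, rate = sinr_and_rate(W, H_eff, tx_power=0.1, noise_var=1e-3)
```

With a zero-forcing filter the interference term is numerically zero, so the SINR reduces to the transmit power divided by the noise variance scaled by the squared norm of $w_k$.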
Energy consumption in the multi-user uplink transmission system assisted by unmanned aerial vehicles and intelligent reflecting surfaces comprises two parts: the energy consumed by the flight of the unmanned aerial vehicle and the energy consumed by the activated reflecting elements of the intelligent reflecting surface. The propulsion power of the pth unmanned aerial vehicle is a function of its speed $v_p$ and of the following parameters: $U_{tip}$ is the tip speed of the rotor blade, $v_0$ is the mean rotor induced velocity in hover, χ is the fuselage drag ratio, κ is the air density, u is the rotor solidity, A is the rotor disc area, Ω is the blade angular velocity, γ is the rotor radius, ψ is the incremental correction factor to induced power, and W is the weight of the unmanned aerial vehicle; the profile drag coefficient also enters the expression. The speed $v_p$ of the pth unmanned aerial vehicle is calculated as follows:
At time (t-1) the position of unmanned aerial vehicle p is $(x_p^{t-1}, y_p^{t-1}, z_p^{t-1})$ and at time t it is $(x_p^{t}, y_p^{t}, z_p^{t})$. The distance that unmanned aerial vehicle p flies from its position at time (t-1) to its position at time t is:
$d_p = \sqrt{(x_p^{t}-x_p^{t-1})^2 + (y_p^{t}-y_p^{t-1})^2 + (z_p^{t}-z_p^{t-1})^2}$ (19)
If the time spent by the flight of the unmanned aerial vehicle is T, the speed $v_p$ of the pth unmanned aerial vehicle is:
$v_p = d_p / T$ (20)
The energy consumed when unmanned aerial vehicle p flies to the specified position is its propulsion power at speed $v_p$, written $P(v_p)$, multiplied by the flight time:
$E_p^{UAV} = P(v_p)\, T$ (21)
Let $\delta_{pm}$ indicate whether the mth reflecting element of intelligent reflecting surface p is activated and let $p_{IRS}$ represent the power consumed by each reflecting element. The power consumed by the entire intelligent reflecting surface p is:
$P_p^{IRS} = \sum_{m=1}^{M} \delta_{pm}\, p_{IRS}$ (22)
The duration of the information transmission stage is τ, and the energy consumed by intelligent reflecting surface p in this period is:
$E_p^{IRS} = P_p^{IRS}\, \tau$ (23)
In the information transmission stage, the user in the pth cluster that is assisted by unmanned aerial vehicle p and intelligent reflecting surface p transmits to the base station a data volume of:
$G_p = R_p \tau$ (24)
Energy efficiency is the amount of data transmitted divided by the total energy consumed by unmanned aerial vehicle p and intelligent reflecting surface p:
$EE_p = \dfrac{G_p}{E_p^{UAV} + E_p^{IRS}}$ (25)
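The energy model above can be sketched in a few lines. The propulsion power function below follows the widely used rotary-wing model with a blade profile term, an induced term and a parasite term; whether the patent's own expression matches it term for term is an assumption, and the hover powers `P0` and `Pi` are passed in directly rather than derived from the rotor parameters. All numeric values are hypothetical.

```python
import numpy as np

def propulsion_power(v, P0, Pi, U_tip, v0, chi, kappa, u, A):
    """Assumed rotary-wing propulsion power at speed v (blade + induced + parasite)."""
    blade = P0 * (1 + 3 * v**2 / U_tip**2)
    induced = Pi * np.sqrt(np.sqrt(1 + v**4 / (4 * v0**4)) - v**2 / (2 * v0**2))
    parasite = 0.5 * chi * kappa * u * A * v**3
    return blade + induced + parasite

def energy_efficiency(pos_prev, pos_next, T, delta, p_irs, tau, rate, power_fn):
    """Data volume divided by flight energy plus reflecting-surface energy."""
    d = np.linalg.norm(np.asarray(pos_next, float) - np.asarray(pos_prev, float))
    v = d / T                               # speed over the flight stage
    e_uav = power_fn(v) * T                 # flight energy
    e_irs = np.sum(delta) * p_irs * tau     # energy of the activated elements
    data = rate * tau                       # data volume G_p = R_p * tau
    return data / (e_uav + e_irs)

# Hypothetical parameter values, for illustration only.
power = lambda v: propulsion_power(v, P0=80.0, Pi=88.0, U_tip=120.0, v0=4.0,
                                   chi=0.6, kappa=1.225, u=0.05, A=0.5)
ee = energy_efficiency(pos_prev=(0, 0, 50), pos_next=(30, 40, 50), T=5.0,
                       delta=[1, 0, 1, 1], p_irs=0.01, tau=2.0,
                       rate=1e6, power_fn=power)
```

At zero speed the assumed power model reduces to `P0 + Pi`, the hover power, which gives a simple consistency check.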
S2, based on the channel model and the energy consumption model in step S1, clustering the ground users by using a K-means clustering algorithm, and then using the priority experience playback MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface and their phase shifts in the information transmission stage, thereby completing the joint optimization of the unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
Referring to FIG. 3, the basic idea of the K-means clustering algorithm is as follows. First, a value of K is specified and K users are drawn at random from all users as the initial cluster centers. Then the distances between each remaining user and the K cluster centers are calculated, and each user is assigned to the cluster whose center is nearest. For each newly formed cluster, the cluster center is recomputed as the mean of the samples in the cluster. If the cluster centers of all clusters are the same as those obtained in the previous iteration, the clustering criterion function has converged and the assignment of users to clusters is final; otherwise the assignment and update steps are repeated.
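The clustering procedure can be sketched with plain NumPy as follows. The function name `k_means`, the iteration limit, the handling of an empty cluster and the synthetic user positions are assumptions made for illustration.

```python
import numpy as np

def k_means(points, K, rng, max_iter=100):
    """Cluster user positions; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    # Draw K distinct users at random as the initial cluster centers.
    centers = points[rng.choice(len(points), size=K, replace=False)]
    for _ in range(max_iter):
        # Distance from every user to every center, then nearest-center assignment.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New center = mean of the users in the cluster (keep old center if empty).
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):  # centers unchanged: converged
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(0)
# Two hypothetical, well-separated groups of ground users (x, y in metres).
users = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(100.0, 1.0, (20, 2))])
labels, centers = k_means(users, K=2, rng=rng)
```

Each resulting cluster would then be served by one unmanned aerial vehicle carrying an intelligent reflecting surface.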
After the ground users are divided into a plurality of clusters by the K-means clustering algorithm, an unmanned aerial vehicle carrying an intelligent reflecting surface is placed in each cluster and flies within the coverage range of the cluster to serve the users in it. The tracks of the multiple unmanned aerial vehicles and the phase shifts of the intelligent reflecting surfaces are jointly optimized using the priority experience playback MATD3 algorithm to maximize the energy efficiency of the system; the algorithm framework is shown in FIG. 4. The optimization problem of unmanned aerial vehicle tracks and intelligent reflecting surface phase shifts in the wireless communication system assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is modeled as a Markov game in which each unmanned aerial vehicle carrying an intelligent reflecting surface acts as an agent. The kth agent observes the current environment state $s_k$, selects an action $a_k$ according to its strategy $\pi_k$, and obtains the reward $r_k$ after the action acts on the environment; the environment then transitions to the new state $s'_k$ with transition probability $P(s'_k \mid s_k, a_1, \dots, a_K)$.
The state $s_k$ observed by each unmanned aerial vehicle consists of two parts: the position of unmanned aerial vehicle k (k = 1, 2, …, K) at the previous moment, and the position of the user in the kth cluster that communicates with the base station with the assistance of the kth unmanned aerial vehicle and intelligent reflecting surface. The state $s_k$ is therefore six-dimensional, formed by concatenating these two three-dimensional positions.
The behavior $a_k$ of the kth agent is a vector of dimension $(3 + u_k + 2M)$, where $u_k$ is the number of users in the kth cluster. The behavior $a_k$ comprises the following four parts:
i: the position of the kth unmanned aerial vehicle at the current moment, which has three components;
ii: the activated user vector for communication with the base station in the kth cluster at the current moment, which has $u_k$ elements. Each element indicates whether the corresponding user is activated: the value 0 means that the user is not activated and the value 1 means that the user is activated. The elements of the vector should sum to 1, indicating that exactly one user in a cluster is activated at any moment;
iii: the activated element vector of the kth intelligent reflecting surface at the current moment, which has M elements. Each element indicates whether the corresponding reflecting element is activated: the value 0 means that it is not activated and the value 1 means that it is activated. The sum of the elements should lie between 1 and M, indicating that the number of activated elements of each intelligent reflecting surface is between 1 and M;
iv: the phase shift vector of the kth intelligent reflecting surface at the current moment, which has M elements, each representing the phase shift of the corresponding reflecting element.
The reward $r_k(s_k, a_k)$ is defined as the energy efficiency $EE_k$, calculated from equation (25).
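The layout of the behavior vector can be made concrete with a small sketch that splits a raw vector of length $3 + u_k + 2M$ into its four parts and enforces the stated constraints. The projection rules used here (argmax for the single active user, a 0.5 threshold for the reflecting elements, and wrapping phases into $[0, 2\pi)$) are assumptions; the text only states the constraints themselves.

```python
import numpy as np

def split_action(a, u_k, M):
    """Split a raw behavior vector into position, user flags, element flags, phases."""
    a = np.asarray(a, dtype=float)
    if a.shape != (3 + u_k + 2 * M,):
        raise ValueError("behavior vector must have length 3 + u_k + 2M")
    position = a[:3]
    user_scores = a[3:3 + u_k]
    element_scores = a[3 + u_k:3 + u_k + M]
    phase_raw = a[3 + u_k + M:]

    active_user = np.zeros(u_k, dtype=int)
    active_user[np.argmax(user_scores)] = 1               # exactly one active user
    active_elements = (element_scores > 0.5).astype(int)  # assumed threshold
    if active_elements.sum() == 0:
        active_elements[np.argmax(element_scores)] = 1    # keep at least one element on
    phases = np.mod(phase_raw, 2 * np.pi)                 # assumed range [0, 2*pi)
    return position, active_user, active_elements, phases

# Hypothetical raw network output for u_k = 3 users and M = 4 reflecting elements.
raw = np.concatenate([[10.0, 20.0, 50.0], [0.1, 0.9, 0.3],
                      [0.2, 0.1, 0.0, 0.3], [7.0, -1.0, 0.5, 3.0]])
position, active_user, active_elements, phases = split_action(raw, u_k=3, M=4)
```

In this example no element score exceeds the threshold, so the fallback activates the element with the highest score, which keeps the number of activated elements within the required range of 1 to M.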
For the multi-agent system, each agent has six neural networks: a training strategy network, a target strategy network, a first training value network, a second training value network, a first target value network and a second target value network. At each moment, the kth agent takes the position of unmanned aerial vehicle k at the previous moment and the position of the user communicating with the base station in the kth cluster as its state $s_k$. The training strategy network, with parameters $\theta_k$, takes the state $s_k$ as input and outputs as the behavior $a_k$ the position of the kth unmanned aerial vehicle at the current moment, the activated user vector for communication with the base station in the kth cluster, and the activated element vector and phase shift vector of the kth intelligent reflecting surface. The first and second training value networks, with parameters $\omega_{k1}$ and $\omega_{k2}$ respectively, take as input the joint state $s = (s_1, s_2, \dots, s_K)$ observed by all agents and the joint action $a = (a_1, a_2, \dots, a_K)$ taken by them, and output the joint state-behavior value functions $Q_{k1}(s, a_1, a_2, \dots, a_K, \omega_{k1})$ and $Q_{k2}(s, a_1, a_2, \dots, a_K, \omega_{k2})$ respectively. The target strategy network takes the next state $s'_k$ as input and outputs the next action $a'_k$; its parameters $\theta'_k$ are updated from the parameters $\theta_k$ of the training strategy network in a soft updating mode. The first and second target value networks take the next joint state-behavior pair $(s', a')$ as input and output $Q'_{k1}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k1})$ and $Q'_{k2}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k2})$ respectively; their parameters $\omega'_{k1}$ and $\omega'_{k2}$ are updated from the parameters $\omega_{k1}$ of the first training value network and $\omega_{k2}$ of the second training value network in a soft updating mode.
The tuple $(s, a_1, a_2, \dots, a_K, r_1, r_2, \dots, r_K, s')$ is stored in the experience memory as one experience of the agents. When the experience memory reaches its maximum storage capacity, a small batch of experiences is sampled from it using the priority experience playback method and used for training to update the parameters of the strategy network and of the value networks.
The probability that experience j is sampled is determined by its priority $D_j = 1/\mathrm{rank}(j)$, where rank(j) is the rank of the jth experience by learning value, γ represents the importance given to the priority, and F represents the number of experiences drawn in a small batch.
The importance sampling weight of each sampled experience depends on its sampling probability, on E, the number of experiences stored in the experience memory, and on ξ, the sampling weight coefficient.
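A minimal sketch of rank-based prioritized sampling is shown below. It assumes the common form in which the sampling probability is $D_j^{\gamma}$ normalized over all stored experiences and the importance sampling weight is $(E \cdot P(j))^{-\xi}$ scaled by the largest weight in the batch; the patent's exact normalization is not reproduced in this text, and the learning values used here are hypothetical.

```python
import numpy as np

def rank_based_sample(learning_values, batch_size, gamma, xi, rng):
    """Sample indices by rank-based priority and return importance weights."""
    learning_values = np.asarray(learning_values, dtype=float)
    E = len(learning_values)                 # number of stored experiences
    order = np.argsort(-learning_values)     # indices from most to least valuable
    rank = np.empty(E, dtype=int)
    rank[order] = np.arange(1, E + 1)        # rank 1 = highest learning value
    D = 1.0 / rank                           # priority D_j = 1 / rank(j)
    prob = D ** gamma / np.sum(D ** gamma)   # assumed normalization over all E
    idx = rng.choice(E, size=batch_size, p=prob)
    weights = (E * prob[idx]) ** (-xi)       # assumed importance sampling weight
    return idx, weights / weights.max(), prob

# Hypothetical learning values, e.g. absolute temporal-difference errors.
idx, weights, prob = rank_based_sample([0.1, 2.0, 0.5, 1.2], batch_size=3,
                                       gamma=0.6, xi=0.4,
                                       rng=np.random.default_rng(3))
```

Setting `gamma` to 0 recovers uniform sampling, which makes the role of γ as the weight given to the priority explicit.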
The parameters $\theta_k$ of the training strategy network of the kth agent are updated using the strategy gradient method, that is, by following the gradient $\nabla_{\theta_k} J(\theta_k)$ of the strategy objective function estimated over the sampled experiences. Here $J(\theta_k)$ is the strategy objective function, ∇ denotes the gradient operator, and $\pi_k$ is the strategy learned by the kth agent; the estimate uses the state and the behavior of the kth agent in the jth experience sampled by the priority experience playback method.
The parameters $\omega_{k1}$ of the first training value network and $\omega_{k2}$ of the second training value network of the kth agent are updated by gradient back propagation through the neural networks, each with its own loss function.
The loss function represents the difference between the Q value output by the training value network and the target Q value. By updating the parameters of the training value network with the gradient descent method to minimize the loss function, the Q value output by the training value network is brought very close to the target Q value, so that the training value network can accurately evaluate the value of each state-behavior pair.
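The parameters $\theta'_k$ of the target strategy network, $\omega'_{k1}$ of the first target value network and $\omega'_{k2}$ of the second target value network are each updated in a soft updating mode:
$\theta'_k \leftarrow \alpha\theta_k + (1-\alpha)\theta'_k$ (34)
$\omega'_{k1} \leftarrow \alpha\omega_{k1} + (1-\alpha)\omega'_{k1}$ (35)
$\omega'_{k2} \leftarrow \alpha\omega_{k2} + (1-\alpha)\omega'_{k2}$ (36)
where α represents the update coefficient.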
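The soft update of equations (34) to (36) can be sketched with NumPy arrays standing in for network parameters. The helper `td3_target` shows the usual twin-network target in which the smaller of the two target Q values is used, and `weighted_value_loss` shows a squared-error loss weighted by importance sampling weights; both are standard forms assumed here for illustration, since the patent's loss expressions are not reproduced in this text, and the discount factor argument is hypothetical.

```python
import numpy as np

def soft_update(target_params, train_params, alpha):
    """Equations (34)-(36): target <- alpha * train + (1 - alpha) * target."""
    return [alpha * w + (1.0 - alpha) * w_t
            for w, w_t in zip(train_params, target_params)]

def td3_target(reward, q1_next, q2_next, discount):
    """Assumed twin-network target: reward + discount * min(Q1', Q2')."""
    return reward + discount * np.minimum(q1_next, q2_next)

def weighted_value_loss(q_pred, target, is_weights):
    """Assumed loss: importance-weighted mean squared error to the target Q value."""
    return np.mean(is_weights * (target - q_pred) ** 2)

# Hypothetical parameters and Q values for one small batch.
target_params = soft_update([np.zeros(2)], [np.ones(2)], alpha=0.1)
y = td3_target(1.0, np.array([2.0, 5.0]), np.array([3.0, 4.0]), discount=0.9)
loss = weighted_value_loss(np.array([2.5, 4.0]), y, np.array([1.0, 0.5]))
```

A small α makes the target networks track the training networks slowly, which is the mechanism behind the stability benefit attributed to the soft update.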
In another embodiment of the present invention, a multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization system is provided, which can be used to implement the joint optimization method described above. Specifically, the system includes an energy module and an optimization module.
The energy module establishes a wireless communication system model based on multiple unmanned aerial vehicles and intelligent reflecting surface assistance, in which signals sent by a user are reflected to a base station by the intelligent reflecting surface installed on an unmanned aerial vehicle, determines a channel model in the wireless communication system model and energy consumption models of the unmanned aerial vehicles and the intelligent reflecting surfaces, and calculates the energy efficiency $EE_p$ of the wireless communication system model;
The optimization module clusters the ground users by using a K-means clustering algorithm based on the channel model and the energy consumption models determined by the energy module, and then uses the priority experience playback MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface and their phase shifts, thereby completing the joint optimization of the multiple unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
In yet another embodiment of the present invention, a terminal device is provided that includes a processor and a memory, the memory being used to store a computer program comprising program instructions and the processor being used to execute the program instructions stored in the memory. The processor may be a Central Processing Unit (CPU), or another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. It is the computing core and control core of the terminal and is adapted to load and execute one or more instructions to implement a corresponding method flow or function. The processor provided by the embodiment of the invention can be used to carry out the multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method, which comprises the following steps:
establishing a wireless communication system model based on multiple unmanned aerial vehicles and intelligent reflecting surface assistance, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface installed on an unmanned aerial vehicle; determining a channel model in the wireless communication system model and energy consumption models of the unmanned aerial vehicles and the intelligent reflecting surfaces, and calculating the energy efficiency of the wireless communication system model; based on the determined channel model and energy consumption models, clustering the ground users by using a K-means clustering algorithm; and, taking energy efficiency as the optimization target, using the priority experience playback MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface and their phase shifts, thereby completing the joint optimization of the multiple unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
In still another embodiment of the present invention, the present invention further provides a storage medium, specifically a computer-readable storage medium (Memory), which is a Memory device in a terminal device and is used for storing programs and data. It is understood that the computer readable storage medium herein may include a built-in storage medium in the terminal device, and may also include an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also, one or more instructions, which may be one or more computer programs (including program code), are stored in the memory space and are adapted to be loaded and executed by the processor. It should be noted that the computer-readable storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor can load and execute one or more instructions stored in the computer readable storage medium to realize the corresponding steps of the method and the system for joint optimization of the multi-unmanned aerial vehicle track and the intelligent reflecting surface phase shift in the embodiment; one or more instructions in the computer-readable storage medium are loaded by the processor and perform the steps of:
establishing a wireless communication system model based on multiple unmanned aerial vehicles and intelligent reflecting surface assistance, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface installed on an unmanned aerial vehicle; determining a channel model in the wireless communication system model and energy consumption models of the unmanned aerial vehicles and the intelligent reflecting surfaces, and calculating the energy efficiency of the wireless communication system model; based on the determined channel model and energy consumption models, clustering the ground users by using a K-means clustering algorithm; and, taking energy efficiency as the optimization target, using the priority experience playback MATD3 method to determine, for each cluster, the position of the unmanned aerial vehicle, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface and their phase shifts, thereby completing the joint optimization of the multiple unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The joint optimization algorithm for multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift based on the priority experience playback MATD3 method is summarized as follows:
the simulation parameters are set as follows:
Referring to FIG. 5, the energy efficiency of the system is shown as a function of the user transmission power for the MADDPG method, the MATD3 method, the priority experience playback MADDPG method, and the priority experience playback MATD3 method. As can be seen from the figure, the energy efficiency of the system is higher with priority experience playback than without it, and higher with the MATD3 method than with the MADDPG method. The reason is that priority experience playback increases the probability that experiences with higher learning value in the experience memory are sampled, and learning from these experiences increases the learning efficiency, while the MATD3 method overcomes the overestimation of the Q value, so that the value network evaluates the value of each state-behavior pair accurately. In addition, when the transmission power of the user increases, the amount of data transmitted increases, and thus the energy efficiency of the system increases.
In summary, the method and the system for joint optimization of multiple unmanned aerial vehicle tracks and intelligent reflecting surface phase shifts consider an uplink wireless communication system based on multiple unmanned aerial vehicles and intelligent reflecting surface assistance, firstly, ground users are clustered, an unmanned aerial vehicle provided with an intelligent reflecting surface is distributed to each cluster to provide service for users in the cluster, and then joint optimization of unmanned aerial vehicle tracks and intelligent reflecting surface phase shifts in each cluster is completed by using a priority experience playback MATD3 method, so that the purpose of maximum energy efficiency of the system is achieved.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (10)
1. A multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method, characterized by comprising the following steps:
S1, establishing a wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface installed on an unmanned aerial vehicle; determining the channel model in the wireless communication system model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface; and calculating the energy efficiency of the wireless communication system model;
S2, based on the channel model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface determined in step S1, clustering the ground users using a K-means clustering algorithm; then, with energy efficiency as the optimization target, using the prioritized experience replay MATD3 method to determine the position of the unmanned aerial vehicle in each cluster, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, the activated reflecting elements, and the phase shifts of the activated reflecting elements, thereby completing the joint optimization of the multi-unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
2. The method according to claim 1, wherein in step S1, the wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is specifically: the number of randomly distributed users is $U$; the $U$ users are divided into $K$ areas, and the number of users in area $k$ is $u_k$, with $u_1+\dots+u_k+\dots+u_K=U$; the number of intelligent reflecting surfaces and the number of unmanned aerial vehicles are both $K$, and each unmanned aerial vehicle equipped with an intelligent reflecting surface serves the users in one area; the intelligent reflecting surface carried on the unmanned aerial vehicle adjusts the phase shifts of its $M$ reflecting elements through an integrated controller; the base station receives the signals reflected by all the intelligent reflecting surfaces at the same time; the base station has $N$ antennas, each intelligent reflecting surface has $M$ reflecting elements, and each user has a single antenna; the coordinates of the base station are $(x_{BS}, y_{BS}, z_{BS})$, the coordinates of the intelligent reflecting surface $p$ are [formula not shown], and the coordinates of the user $q$ are [formula not shown]; only one user in each area transmits a signal at a time; the signal transmitted by each user is reflected to the base station both by the intelligent reflecting surface serving its own area and by the intelligent reflecting surfaces serving the other areas; at any time, the number of users participating in communication and the number of intelligent reflecting surfaces are both $K$; each reflecting element of the intelligent reflecting surface independently adjusts the phase shift of the incident signal while keeping its amplitude unchanged; the phase shift matrix of the intelligent reflecting surface $p$ is a diagonal matrix $\Theta_p=\mathrm{diag}(\nu_p)$ with diagonal elements [formula not shown], where $\theta_{pm}$ represents the phase shift of the $m$-th reflecting element of the intelligent reflecting surface $p$; the matrix of activated reflecting elements of the intelligent reflecting surface $p$ is a diagonal matrix $\Delta_p=\mathrm{diag}(\upsilon_p)$ with diagonal elements $\upsilon_p=(\delta_{p1},\dots,\delta_{pm},\dots,\delta_{pM})$, where $\delta_{pm}$ indicates whether the $m$-th reflecting element of the intelligent reflecting surface $p$ is activated.
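As a non-limiting illustration, the two diagonal matrices of one intelligent reflecting surface can be sketched as follows. The sketch assumes that each diagonal entry of the phase shift matrix has the unit-modulus form $e^{j\theta_{pm}}$, which is consistent with the amplitude being kept unchanged but is not stated explicitly in the text; the function name `irs_matrices` and the example values are hypothetical.

```python
import numpy as np

def irs_matrices(theta, delta):
    """Build the diagonal phase-shift and activation matrices of one surface.

    theta: phase shifts in radians, one per reflecting element.
    delta: 0/1 flags, 1 meaning the element is activated.
    Assumption: each phase-shift entry is exp(j * theta), i.e. unit amplitude.
    """
    theta = np.asarray(theta, dtype=float)
    delta = np.asarray(delta, dtype=float)
    Theta = np.diag(np.exp(1j * theta))  # phase shift matrix
    Delta = np.diag(delta)               # matrix of activated elements
    return Theta, Delta

# Hypothetical surface with M = 4 elements, the second one switched off.
theta = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
delta = np.array([1, 0, 1, 1])
Theta, Delta = irs_matrices(theta, delta)

# Combined reflection: deactivated elements contribute nothing.
effective = Theta @ Delta
```

Multiplying the two matrices zeroes the contribution of every deactivated element while leaving the phase of the activated ones untouched.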
3. The method according to claim 1, wherein in step S1, the process in which the signal sent by the user is reflected to the base station by the intelligent reflecting surface of the unmanned aerial vehicle comprises a decision phase, a flight phase and an information transmission phase; decision phase: the unmanned aerial vehicle selects the user to communicate with and the position for information transmission, and the intelligent reflecting surface selects the reflecting elements to activate and their phase shifts; flight phase: the unmanned aerial vehicle flies in a straight line at speed $v$ to the information transmission position selected in the decision phase; information transmission phase: the unmanned aerial vehicle hovers after reaching the designated position, the user selected in the decision phase sends a signal to the intelligent reflecting surface, and the activated reflecting elements of the intelligent reflecting surface reflect the signal sent by the user to the base station with the corresponding phase shifts.
4. The method of claim 1, wherein in step S1, the channels between the user and the intelligent reflecting surface and between the intelligent reflecting surface and the base station are modeled as Rician channels, and the channel $G_{pq}$ from the user $q$ to the intelligent reflecting surface $p$ is: [formula not shown]
where $\rho$ represents the path loss at the reference distance $d_0 = 1$ m, $k_1$ is the path loss exponent, $\beta$ is the Rician factor, $d_1$ is the Euclidean distance between the user $q$ and the intelligent reflecting surface $p$, and the remaining terms are the non-line-of-sight propagation component of $G_{pq}$, the array response vector, and the cosine of the angle of arrival of the signal from the user $q$ at the intelligent reflecting surface $p$; $\lambda$ represents the carrier wavelength and $d$ represents the antenna spacing;
the channel $F_p$ from the intelligent reflecting surface $p$ to the base station is: [formula not shown]
where $d_2$ represents the Euclidean distance between the intelligent reflecting surface $p$ and the base station, and the remaining terms are the non-line-of-sight propagation component of $F_p$ and two array response vectors;
the received signal $y$ of the base station is: [formula not shown]
where $S$ is the transmit signal matrix, $H$ is the channel matrix, $H_k$ is the $k$-th column of the matrix $H$, $s_k$ is the $k$-th row of the matrix $S$, and $n$ represents the additive white Gaussian noise at the base station, a circularly symmetric complex Gaussian variable with variance $\sigma^2$;
treating the interference from other users as noise, the signal-to-interference-plus-noise ratio $\mathrm{SINR}_k$ of the $k$-th user is: [formula not shown]
the information transmission rate $R_k$ of the $k$-th user is: [formula not shown]
where $K$ is the number of users communicating with the base station at the same time, $w_k$ is the $k$-th row of the zero-forcing detection filter matrix, $F_p^H$ is the conjugate transpose of the channel matrix between the intelligent reflecting surface $p$ and the base station, $\Theta_p$ is the phase shift matrix of the intelligent reflecting surface $p$, $\Delta_p$ is the matrix of activated reflecting elements of the intelligent reflecting surface $p$, $G_{pq}$ is the channel between the user $q$ and the intelligent reflecting surface $p$, $G_{pk}$ is the channel between the user $k$ and the intelligent reflecting surface $p$, and $\sigma^2$ is the noise variance.
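As a non-limiting illustration of zero-forcing detection with the quantities named above, the following sketch builds a cascaded channel and evaluates the per-user ratio and rate. It rests on several assumptions that the text does not confirm: the channels are random placeholders rather than Rician channels, column $k$ of $H$ is taken to be $\sum_p F_p^H \Theta_p \Delta_p G_{pk}$, every user transmits with unit power, and the rate is taken as $\log_2(1+\mathrm{SINR}_k)$ per unit bandwidth. The names `crandn` and `zf_sinr` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 8, 16, 3   # base-station antennas, elements per surface, users/surfaces
sigma2 = 0.1         # noise variance

def crandn(*shape):
    """Placeholder complex Gaussian channel (not the Rician model of the claim)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

F = crandn(K, M, N)  # F[p]: surface p to base station, so F[p]^H is N x M
G = crandn(K, M, K)  # G[p][:, k]: user k to surface p
theta = rng.uniform(0.0, 2.0 * np.pi, size=(K, M))  # phase shifts
delta = np.ones((K, M))                             # all elements activated

# Assumed cascaded channel: column k of H sums the paths via every surface.
H = np.zeros((N, K), dtype=complex)
for p in range(K):
    Theta_p = np.diag(np.exp(1j * theta[p]))
    Delta_p = np.diag(delta[p])
    H += F[p].conj().T @ Theta_p @ Delta_p @ G[p]

# Zero-forcing filter: pseudo-inverse of H, one row w_k per user.
W = np.linalg.pinv(H)

def zf_sinr(H, W, sigma2):
    """Per-user SINR with other users treated as noise (unit transmit power)."""
    gains = np.abs(W @ H) ** 2
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    noise = sigma2 * np.sum(np.abs(W) ** 2, axis=1)
    return signal / (interference + noise)

sinr = zf_sinr(H, W, sigma2)
rate = np.log2(1.0 + sinr)  # assumed rate expression, bit/s/Hz
```

Because the zero-forcing filter inverts the channel, the interference term is numerically zero here and each ratio reduces to $1/(\sigma^2 \lVert w_k \rVert^2)$ under the stated assumptions.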
5. The method according to claim 1, wherein in step S1, the energy efficiency $EE_p$ is the amount of transmitted data divided by the total energy consumed by the unmanned aerial vehicle $p$ and the intelligent reflecting surface $p$, specifically: [formula not shown]
wherein the quantities involved are the energy consumed by the unmanned aerial vehicle flying to the designated position; $G_p$, the amount of data transmitted to the base station by the user $p$ with the assistance of the unmanned aerial vehicle $p$ and the intelligent reflecting surface $p$; the energy consumed by the intelligent reflecting surface $p$; the propulsion power of the unmanned aerial vehicle $p$; and $T$, the time required for the unmanned aerial vehicle to fly to the designated position.
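As a non-limiting illustration, the ratio described above can be sketched as follows. The sketch assumes that the flight energy equals the propulsion power multiplied by the flight time $T$, which the text suggests but does not state as a formula; the function name, parameter names and numeric values are hypothetical.

```python
def energy_efficiency(data_bits, propulsion_power_w, flight_time_s, irs_energy_j):
    """Transmitted data divided by the total energy of the vehicle and its surface.

    Assumption: flight energy = propulsion power * flight time.
    """
    flight_energy_j = propulsion_power_w * flight_time_s
    total_energy_j = flight_energy_j + irs_energy_j
    if total_energy_j <= 0:
        raise ValueError("total energy must be positive")
    return data_bits / total_energy_j

# Hypothetical numbers: 2 Mbit delivered, 100 W propulsion for 10 s, 50 J for the surface.
ee = energy_efficiency(2.0e6, 100.0, 10.0, 50.0)
```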
6. The method according to claim 1, wherein in step S2, the users are clustered using a K-means clustering algorithm, specifically:
if the cluster centers of all the clusters are identical to those obtained in the previous iteration, the clustering criterion function has converged and all the users have been assigned to the correct clusters.
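As a non-limiting illustration, the convergence criterion above (cluster centers unchanged between two consecutive iterations) can be sketched with a plain K-means loop. The initialization by random selection of users, the Euclidean distance, and all names and coordinates are assumptions made for this sketch.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means that stops when the centers no longer change."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct users chosen at random as centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Assign every user to the nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its users (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Convergence: centers identical to the previous iteration.
        if np.array_equal(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical ground-user coordinates forming two groups.
users = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(users, k=2)
```

Each resulting cluster would then be served by one unmanned aerial vehicle carrying an intelligent reflecting surface.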
7. The method of claim 1, wherein in step S2, the position of the unmanned aerial vehicle in each cluster, the position of the user communicating with the base station, the activated reflecting elements of the intelligent reflecting surface, and the phase shifts of the activated elements are determined using the prioritized experience replay MATD3 method, and the joint optimization of the multi-unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts is specifically performed as follows:
the optimization problem of the unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts in the wireless communication system assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces is modeled as a Markov game, in which each unmanned aerial vehicle equipped with an intelligent reflecting surface acts as an agent; the $k$-th agent observes the current environment state $s_k$, selects an action $a_k$ based on its policy $\pi_k$, and obtains a reward $r_k$ after the action acts on the environment; the environment then transitions to a new state $s'_k$ with transition probability $P(s'_k \mid s_k, a_1, \dots, a_K)$;
at each moment, the $k$-th agent observes, as the state $s_k$, the position of the unmanned aerial vehicle $k$ at the previous moment and the position of the user in the $k$-th cluster that communicates with the base station; the training policy network, with parameter $\theta_k$, takes the state $s_k$ as input and outputs, as the behavior $a_k$, the position of the $k$-th unmanned aerial vehicle at the current moment, the activated user vector of the $k$-th cluster for communicating with the base station, and the activated element vector and phase shift vector of the $k$-th intelligent reflecting surface; the first and second training value networks, with parameters $\omega_{k1}$ and $\omega_{k2}$ respectively, take the joint state $s = (s_1, s_2, \dots, s_K)$ observed by the agents and the joint action $a = (a_1, a_2, \dots, a_K)$ as inputs and output the joint state-behavior value functions $Q_{k1}(s, a_1, a_2, \dots, a_K, \omega_{k1})$ and $Q_{k2}(s, a_1, a_2, \dots, a_K, \omega_{k2})$ respectively; the target policy network takes the next state $s'_k$ as input and outputs the next action $a'_k$, and its parameter $\theta'_k$ is updated from the parameter $\theta_k$ of the training policy network in a soft-update manner; the first and second target value networks take the next state-behavior pair $(s', a')$ as input and output $Q'_{k1}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k1})$ and $Q'_{k2}(s', a'_1, a'_2, \dots, a'_K, \omega'_{k2})$ respectively, and their parameters $\omega'_{k1}$ and $\omega'_{k2}$ are updated from the parameters $\omega_{k1}$ and $\omega_{k2}$ of the first and second training value networks in a soft-update manner;
the tuple $(s, a_1, a_2, \dots, a_K, r_1, r_2, \dots, r_K, s')$ is stored in an experience memory as an experience of the agents; when the experience memory reaches its maximum storage capacity, a mini-batch of experiences is sampled from it using the prioritized experience replay method for training, and the parameters of the policy network and of the value networks are updated.
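As a non-limiting illustration, an experience memory with prioritized sampling can be sketched as follows. The text does not specify which prioritization rule is used; this sketch assumes the common proportional variant, in which an experience is drawn with probability proportional to its priority raised to a power `alpha` and importance weights with exponent `beta` correct the resulting bias. The class name, method names and default values are hypothetical.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Fixed-capacity experience memory with proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priorities skew sampling
        self.data = []
        self.priorities = np.zeros(capacity)
        self.pos = 0                          # next slot to write (circular)
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.data)

    def add(self, experience):
        # New experiences get the current maximum priority so they are replayed soon.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(experience)
        else:
            self.data[self.pos] = experience  # overwrite the oldest entry
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance weights, normalized so the largest weight equals 1.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority follows the magnitude of the temporal-difference error.
        self.priorities[idx] = np.abs(td_errors) + eps

# Hypothetical usage: fill the memory, then sample once it is full.
buf = PrioritizedReplayBuffer(capacity=4)
for step in range(6):
    buf.add(("s", "a", float(step), "s_next"))
if len(buf) == buf.capacity:
    idx, batch, weights = buf.sample(batch_size=3)
```

In line with the claim, the usage above starts sampling only once the memory has reached its maximum storage capacity.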
8. The method of claim 7, wherein the state $s_k$ observed by each unmanned aerial vehicle comprises two parts: the position of the unmanned aerial vehicle $k$ ($k = 1, 2, \dots, K$) at the previous moment, and the position of the user in the $k$-th cluster that communicates with the base station with the assistance of the $k$-th unmanned aerial vehicle and intelligent reflecting surface; the dimension of the state $s_k$ is six; the behavior $a_k$ comprises the following four parts:
i: the position of the $k$-th unmanned aerial vehicle at the current moment;
ii: the activated user vector of the $k$-th cluster at the current moment for communicating with the base station, in which each element indicates whether the corresponding user is activated (0 for not activated, 1 for activated); the vector satisfies the constraint that at any moment only one user in a cluster is activated;
iii: the activated element vector of the $k$-th intelligent reflecting surface at the current moment, in which each element indicates whether the corresponding reflecting element is activated (0 for not activated, 1 for activated); the vector satisfies the constraint that the number of activated elements of each intelligent reflecting surface is between 1 and $M$;
iv: the phase shift vector of the intelligent reflecting surface at the current moment, in which each element represents the phase shift of the corresponding reflecting element;
the reward is defined as the energy efficiency $EE_k$: $r_k(s_k, a_k) = EE_k$.
9. The method of claim 7, wherein the parameter $\theta_k$ of the training policy network of the $k$-th agent is updated using a policy gradient method: [formula not shown]
wherein $J(\theta_k)$ is the policy objective function, $F$ denotes the mini-batch size, $\nabla$ denotes the gradient operator, and the remaining quantities are the policy learned by the $k$-th agent, the state of the $k$-th agent in the $j$-th experience sampled using the prioritized experience replay method, and the behavior of the $k$-th agent in the $j$-th experience;
the parameter $\omega_{k1}$ of training value network 1 and the parameter $\omega_{k2}$ of training value network 2 of the $k$-th agent are updated by gradient back-propagation through the neural network, with the respective loss functions: [formula not shown]
the parameter $\theta'_k$ of the target policy network, the parameter $\omega'_{k1}$ of target value network 1 and the parameter $\omega'_{k2}$ of target value network 2 are each updated in a soft-update manner:
$\theta'_k \leftarrow \alpha\theta_k + (1-\alpha)\theta'_k$
$\omega'_{k1} \leftarrow \alpha\omega_{k1} + (1-\alpha)\omega'_{k1}$
$\omega'_{k2} \leftarrow \alpha\omega_{k2} + (1-\alpha)\omega'_{k2}$
where $\alpha$ represents the update coefficient.
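As a non-limiting illustration, the soft update of the target network parameters can be sketched as follows. Representing network parameters as a dictionary of NumPy arrays, as well as the function name `soft_update` and the value of the update coefficient, are assumptions made for this sketch.

```python
import numpy as np

def soft_update(target_params, train_params, alpha):
    """Return target <- alpha * train + (1 - alpha) * target for every parameter."""
    return {
        name: alpha * train_params[name] + (1.0 - alpha) * target_params[name]
        for name in target_params
    }

# Hypothetical parameters of one training network and its target copy.
train = {"w": np.ones(3), "b": np.array([2.0])}
target = {"w": np.zeros(3), "b": np.array([0.0])}

new_target = soft_update(target, train, alpha=0.01)
```

The same rule applies unchanged to the target policy network and to both target value networks; a small update coefficient makes the target networks track the training networks slowly.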
10. A multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization system, characterized by comprising:
an energy module, configured to establish a wireless communication system model assisted by multiple unmanned aerial vehicles and intelligent reflecting surfaces, in which a signal sent by a user is reflected to a base station by the intelligent reflecting surface installed on an unmanned aerial vehicle, to determine the channel model in the wireless communication system model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface, and to calculate the energy efficiency $EE_p$ of the wireless communication system model;
an optimization module, configured to cluster the ground users using a K-means clustering algorithm based on the channel model and the energy consumption models of the unmanned aerial vehicle and the intelligent reflecting surface determined by the energy module, and then, with energy efficiency as the optimization target, to use the prioritized experience replay MATD3 method to determine the position of the unmanned aerial vehicle in each cluster, the user that communicates with the base station with the assistance of the unmanned aerial vehicle and the intelligent reflecting surface, the activated reflecting elements, and the phase shifts of the activated reflecting elements, thereby completing the joint optimization of the multi-unmanned aerial vehicle tracks and the intelligent reflecting surface phase shifts.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110573024.6A CN113364495B (en) | 2021-05-25 | 2021-05-25 | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113364495A (en) | 2021-09-07 |
CN113364495B (en) | 2022-08-05 |
Family
ID=77527508
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110573024.6A Active CN113364495B (en) | 2021-05-25 | 2021-05-25 | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113364495B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130337822A1 (en) * | 2012-06-13 | 2013-12-19 | All Purpose Networks LLC | Locating and tracking user equipment in the rf beam areas of an lte wireless system employing agile beam forming techniques |
CN111193536A (en) * | 2019-12-11 | 2020-05-22 | 西北工业大学 | Multi-unmanned aerial vehicle base station track optimization and power distribution method |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
CN112118556A (en) * | 2020-03-02 | 2020-12-22 | 湖北工业大学 | Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning |
US20210003412A1 (en) * | 2019-01-16 | 2021-01-07 | Beijing University Of Posts And Telecommunications | Method and Device of Path Optimization for UAV, and Storage Medium thereof |
CN112532300A (en) * | 2020-11-25 | 2021-03-19 | 北京邮电大学 | Trajectory optimization and resource allocation method for single unmanned aerial vehicle backscatter communication network |
CN112769464A (en) * | 2020-12-29 | 2021-05-07 | 北京邮电大学 | Wireless communication method and device |
Non-Patent Citations (6)
Title |
---|
CHENGCHENG FENG ET AL: "Trajectory and Beamforming Vector Optimization for Multi-UAV Multicast Network", 《2019 11TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING》 * |
LINGHUI GE ET AL: "Joint Beamforming and Trajectory Optimization for Intelligent Reflecting Surfaces-Assisted UAV Communications", 《IEEE ACCESS》 * |
SHENGJUN WU: "Illegal Radio Station Localization with UAV-Based Q-Learning", 《中国通信》 * |
SHIYU JIAO ET AL: "Joint Beamforming and Phase Shift Design in Downlink UAV Networks with IRS-Assisted NOMA", 《JOURNAL OF COMMUNICATIONS AND INFORMATION NETWORKS》 * |
ZINA MOHAMED ET AL: "Resource Allocation for Energy-Efficient Cellular Communications via Aerial IRS", 《2021 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE》 * |
郝立元: "Research on Trajectory and Power Optimization Strategies for UAV Relay Communication", 《电子制作》 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113949474B (en) * | 2021-09-27 | 2023-08-22 | 江苏科技大学 | Unmanned aerial vehicle geometric model building method based on intelligent reflecting surface assistance |
CN113949474A (en) * | 2021-09-27 | 2022-01-18 | 江苏科技大学 | Unmanned aerial vehicle geometric model establishing method based on assistance of intelligent reflecting surface |
CN114051204B (en) * | 2021-11-08 | 2022-08-09 | 南京大学 | Unmanned aerial vehicle auxiliary communication method based on intelligent reflecting surface |
CN114051204A (en) * | 2021-11-08 | 2022-02-15 | 南京大学 | Unmanned aerial vehicle auxiliary communication method based on intelligent reflecting surface |
CN114422056B (en) * | 2021-12-03 | 2023-05-23 | 北京航空航天大学 | Space-to-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface |
CN114422056A (en) * | 2021-12-03 | 2022-04-29 | 北京航空航天大学 | Air-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface |
CN114142898B (en) * | 2021-12-03 | 2022-09-20 | 深圳市大数据研究院 | Intelligent reflecting surface phase shift control method and device and storage medium |
CN114142898A (en) * | 2021-12-03 | 2022-03-04 | 深圳市大数据研究院 | Intelligent reflecting surface phase shift control method and related product |
CN114257298A (en) * | 2022-01-17 | 2022-03-29 | 电子科技大学 | Intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method |
CN114257298B (en) * | 2022-01-17 | 2022-09-27 | 电子科技大学 | Intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method |
CN114124266A (en) * | 2022-01-24 | 2022-03-01 | 南京中网卫星通信股份有限公司 | Channel modeling method based on IRS (intelligent resilient system) for assisting communication between unmanned aerial vehicle and unmanned ship |
CN114124266B (en) * | 2022-01-24 | 2022-04-12 | 南京中网卫星通信股份有限公司 | Channel modeling method based on IRS (intelligent resilient system) for assisting communication between unmanned aerial vehicle and unmanned ship |
CN114422060A (en) * | 2022-03-29 | 2022-04-29 | 军事科学院系统工程研究院网络信息研究所 | Method and system for constructing unmanned aerial vehicle communication channel model |
CN114980132A (en) * | 2022-04-12 | 2022-08-30 | 合肥工业大学 | Position deployment method and system of intelligent reflecting surface |
CN115334519A (en) * | 2022-06-30 | 2022-11-11 | 北京科技大学 | User association and phase shift optimization method and system in unmanned aerial vehicle IRS network |
CN115334519B (en) * | 2022-06-30 | 2024-01-26 | 北京科技大学 | User association and phase shift optimization method and system in unmanned aerial vehicle IRS network |
CN115801157A (en) * | 2023-02-09 | 2023-03-14 | 中国人民解放军军事科学院系统工程研究院 | Construction method of multi-unmanned aerial vehicle cooperative communication channel model |
CN115801157B (en) * | 2023-02-09 | 2023-05-05 | 中国人民解放军军事科学院系统工程研究院 | Construction method of multi-unmanned aerial vehicle cooperative communication channel model |
CN117103282A (en) * | 2023-10-20 | 2023-11-24 | 南京航空航天大学 | Double-arm robot cooperative motion control method based on MATD3 algorithm |
CN117103282B (en) * | 2023-10-20 | 2024-02-13 | 南京航空航天大学 | Double-arm robot cooperative motion control method based on MATD3 algorithm |
CN117858105A (en) * | 2024-03-07 | 2024-04-09 | 中国电子科技集团公司第十研究所 | Multi-unmanned aerial vehicle cooperation set dividing and deploying method in complex electromagnetic environment |
CN117858105B (en) * | 2024-03-07 | 2024-05-24 | 中国电子科技集团公司第十研究所 | Multi-unmanned aerial vehicle cooperation set dividing and deploying method in complex electromagnetic environment |
Also Published As
Publication number | Publication date |
---|---|
CN113364495B (en) | 2022-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113364495B (en) | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system | |
CN113194488B (en) | Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system | |
CN113162679B (en) | DDPG algorithm-based IRS (intelligent resilient software) assisted unmanned aerial vehicle communication joint optimization method | |
Wang et al. | Deep reinforcement learning based dynamic trajectory control for UAV-assisted mobile edge computing | |
Li et al. | Path planning for cellular-connected UAV: A DRL solution with quantum-inspired experience replay | |
Fan et al. | RIS-assisted UAV for fresh data collection in 3D urban environments: A deep reinforcement learning approach | |
CN114422363B (en) | Capacity optimization method and device for unmanned aerial vehicle-mounted RIS auxiliary communication system | |
CN115499921A (en) | Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network | |
CN116684925B (en) | Unmanned aerial vehicle-mounted intelligent reflecting surface safe movement edge calculation method | |
CN110312265B (en) | Power distribution method and system for unmanned aerial vehicle formation communication coverage | |
CN113382060B (en) | Unmanned aerial vehicle track optimization method and system in Internet of things data collection | |
CN113784314B (en) | Unmanned aerial vehicle data and energy transmission method assisted by intelligent reflection surface | |
CN115314904B (en) | Communication coverage method based on multi-agent maximum entropy reinforcement learning and related equipment | |
Luo et al. | A two-step environment-learning-based method for optimal UAV deployment | |
Xu et al. | Joint power and trajectory optimization for IRS-aided master-auxiliary-UAV-powered IoT networks | |
Wei et al. | Differential game-based deep reinforcement learning in underwater target hunting task | |
Wang et al. | Trajectory optimization and power allocation scheme based on DRL in energy efficient UAV‐aided communication networks | |
CN114372612B (en) | Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene | |
Shi et al. | Age of information optimization with heterogeneous uavs based on deep reinforcement learning | |
Hu et al. | Digital twins-based multi-agent deep reinforcement learning for UAV-assisted vehicle edge computing | |
Yang et al. | Path planning of UAV base station based on deep reinforcement learning | |
Wang et al. | Energy Efficiency Optimization of IRS and UAV-Assisted Wireless Powered Edge Networks | |
CN116009590B (en) | Unmanned aerial vehicle network distributed track planning method, system, equipment and medium | |
CN114257298B (en) | Intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method | |
Gao et al. | MO-AVC: Deep Reinforcement Learning Based Trajectory Control and Task Offloading in Multi-UAV Enabled MEC Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||