CN115802313A - Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface - Google Patents


Info

Publication number
CN115802313A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
representing
reflecting surface
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211472603.2A
Other languages
Chinese (zh)
Other versions
CN115802313B (en)
Inventor
周毅
晋占齐
石华光
吴金月
宁念文
张延宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN202211472603.2A
Publication of CN115802313A
Application granted
Publication of CN115802313B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an air-ground mobile network energy-carrying fair communication method based on an intelligent reflecting surface, which comprises the following steps: establishing an air-ground mobile network architecture based on multiple unmanned aerial vehicles and an intelligent reflecting surface; establishing a wireless power transmission model; establishing an energy consumption model of the unmanned aerial vehicle according to the dynamic model and the communication model of the unmanned aerial vehicle; reconstructing the channel state between the unmanned aerial vehicle and a ground user by using the intelligent reflecting surface, and establishing a wireless communication model; establishing a fair communication model; constructing a judgment matrix about fair throughput and energy consumption, and determining the weight coefficients of the two sub-targets of fair weighted throughput and energy consumption; modeling the problem as a multi-objective integer non-convex optimization problem that maximizes fair throughput and the residual energy of the unmanned aerial vehicles, and solving this complex multi-objective optimization problem through multi-agent deep reinforcement learning. The invention optimizes the position of the unmanned aerial vehicle and the phase of the intelligent reflecting surface based on multi-agent deep reinforcement learning, provides fair communication for ground users and wirelessly charges the unmanned aerial vehicle.

Description

Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface
Technical Field
The invention relates to the technical field of air-ground cooperative auxiliary communication, in particular to an air-ground mobile network energy-carrying fair communication method based on an intelligent reflecting surface, which is used for carrying out fair energy-carrying communication based on an unmanned aerial vehicle and the intelligent reflecting surface under limited communication resources.
Background
At present, the establishment and implementation of communication networks mainly rely on terrestrial base stations or other fixed communication devices, and the flexibility thereof is greatly limited. To address this problem, drone-assisted wireless communication has gained widespread attention as a new communication means in both academic and industrial sectors. The unmanned aerial vehicle has the advantages of high maneuverability, strong universality, rapid deployment and the like, and is widely applied to the fields of intelligent traffic, post-disaster reconstruction, emergency communication, remote area communication range expansion and the like.
In unmanned aerial vehicle auxiliary communication, the unmanned aerial vehicle mainly serves as a mobile base station that provides communication service for ground users. Thanks to its high mobility and flexibility, the drone can quickly establish a communication connection and significantly improve data transmission efficiency. For example, when ground communication infrastructure is damaged by a natural disaster, the drone can act as a temporary base station and provide emergency communication service for ground users. Drone-assisted communication still faces the following challenges. First, the position optimization of the unmanned aerial vehicle mobile base station is a typical optimal sequential decision problem; it usually involves many decision variables and is non-convex, so it is difficult to solve directly with traditional convex optimization methods. Second, when traditional methods are used for trajectory optimization, the computational complexity grows exponentially with the number of unmanned aerial vehicles and ground users. Third, the channel state between the drone and the user is easily affected by the external environment, and existing methods that optimize system communication efficiency do not take into account the impact of fairness among users on system performance.
Intelligent reflecting surfaces have been extensively studied because they can improve the propagation environment and increase signal strength. An intelligent reflecting surface generally consists of energy-efficient, cost-effective, reconfigurable passive elements, each of which can apply a phase shift to the incident signal under the control of an intelligent controller. Thus, with the help of the intelligent reflecting surface, signals from different communication links can be superimposed constructively at the desired receivers to increase the received signal energy, or combined destructively at undesired receivers to avoid information leakage. Because urban environments are complex and changeable and links are often blocked by buildings, reconstructing the transmission link with an intelligent reflecting surface will play a very important role in future smart cities.
Disclosure of Invention
Aiming at the technical problems that the existing unmanned aerial vehicle auxiliary communication method is easily influenced by the external environment and fairness among users is not considered, the invention provides an air-ground mobile network energy-carrying fair communication method based on an intelligent reflector, the position of the unmanned aerial vehicle and the phase of the intelligent reflector are optimized based on a multi-agent deep reinforcement learning optimization algorithm, and the wireless charging of the unmanned aerial vehicle is realized while fair communication is provided for ground users.
In order to achieve the purpose, the technical scheme of the invention is realized as follows: an air-ground mobile network energy-carrying fair communication method based on an intelligent reflecting surface comprises the following steps:
S1: establishing an air-ground mobile network architecture based on a plurality of unmanned aerial vehicles and an intelligent reflecting surface, wherein the air-ground mobile network architecture comprises K ground mobile users and D unmanned aerial vehicles;
S2: establishing a wireless power transmission model according to the wireless power transmission technology: an intelligent lamp pole is used as the energy source, and two transmission paths, namely the direct intelligent lamp pole-unmanned aerial vehicle path and the indirect intelligent lamp pole-intelligent reflecting surface-unmanned aerial vehicle path, are adopted to wirelessly charge the unmanned aerial vehicle;
S3: establishing an energy consumption model of the unmanned aerial vehicle according to the dynamic model and the communication model of the unmanned aerial vehicle;
S4: reconstructing the channel state between the unmanned aerial vehicle and a ground user by using the intelligent reflecting surface, and establishing a wireless communication model;
S5: establishing a fair communication model: the fair communication model considers both communication efficiency and fairness among users, and maximizes system throughput on the premise of guaranteeing user fairness;
S6: constructing a judgment matrix about fair throughput and energy consumption according to the user quality-of-service level, solving and normalizing the eigenvalues and eigenvectors of the judgment matrix, and determining the weight coefficients of the two sub-targets of fair weighted throughput and energy consumption;
S7: modeling the unmanned aerial vehicle energy-carrying communication problem as a multi-objective integer non-convex optimization problem that maximizes fair throughput and the residual energy of the unmanned aerial vehicle, restating the problem as a Markov game process, solving the complex multi-objective optimization problem through multi-agent deep reinforcement learning, and updating the position of the unmanned aerial vehicle and the phase of the intelligent reflecting surface.
Preferably, the set of terrestrial mobile users is represented as
Figure BDA0003954413880000021
The set of drones is denoted as
Figure BDA0003954413880000022
The intelligent reflecting surface is a reflecting surface responsible for charging, the unmanned aerial vehicle serves as a mobile base station to provide communication service for ground mobile users, and the intelligent lamp pole serves as an energy source to provide energy transmission for the unmanned aerial vehicle;
the method for constructing the wireless power transmission model in step S2 comprises the following steps: an intelligent reflecting surface is adopted to reconstruct the energy transmission path, so that the energy beam emitted by the power transmitter of the intelligent lamp pole reaches the optical receiver of the unmanned aerial vehicle through two transmission paths, namely intelligent lamp pole-unmanned aerial vehicle and intelligent lamp pole-intelligent reflecting surface-unmanned aerial vehicle, and the energy harvested by the unmanned aerial vehicle is as follows:
Figure BDA0003954413880000023
wherein,
Figure BDA0003954413880000024
represents the direct channel gain between the drone and the terrestrial mobile users, $H_s(t)$ represents the channel gain between the energy source and the drone,
Figure BDA0003954413880000025
represents the channel gain between the intelligent reflecting surface and the unmanned aerial vehicle, $\Theta(t)$ represents the phase of the intelligent reflecting surface, and $\eta$ represents the energy conversion coefficient.
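The two-path charging idea can be illustrated with a minimal sketch. It assumes a linear harvesting model in which the harvested energy is the conversion coefficient times the transmit power times the squared magnitude of the combined (direct plus reflected) channel; this form, the transmit power `p_tx`, the slot length `delta_t`, and the source-to-surface channel `h_sr` are illustrative assumptions and are not taken from the patent's formula.

```python
import cmath

def harvested_energy(eta, p_tx, h_direct, h_sr, h_ru, phases, delta_t=1.0):
    """Energy harvested over one slot under an assumed linear model."""
    # Reflected path: each element applies its own phase shift to the signal.
    reflected = sum(
        ru.conjugate() * cmath.exp(1j * theta) * sr
        for ru, theta, sr in zip(h_ru, phases, h_sr)
    )
    combined = h_direct + reflected
    return eta * p_tx * abs(combined) ** 2 * delta_t

def aligned_phases(h_direct, h_sr, h_ru):
    """Phases that rotate every reflected term onto the direct path's phase."""
    target = cmath.phase(h_direct)
    return [target - cmath.phase(ru.conjugate() * sr) for ru, sr in zip(h_ru, h_sr)]

# Hypothetical channel values, chosen only for illustration.
h_direct = 0.1 + 0j
h_sr = [0.2j, -0.1 + 0j]
h_ru = [0.3 + 0j, 0.2j]
e_zero = harvested_energy(0.5, 10.0, h_direct, h_sr, h_ru, [0.0, 0.0])
e_best = harvested_energy(0.5, 10.0, h_direct, h_sr, h_ru,
                          aligned_phases(h_direct, h_sr, h_ru))
```

With these values the phase-aligned surface harvests more energy than the unconfigured one, which is the effect the path reconstruction is meant to achieve.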
Preferably, the method for establishing the energy consumption model of the unmanned aerial vehicle in S3 includes:
while performing the auxiliary communication task, the unmanned aerial vehicle consumes energy for movement, for communication, and in its internal circuits, so the total energy consumption of the unmanned aerial vehicle is:
Figure BDA0003954413880000031
wherein $P_t(t)$ denotes the transmit power of the drone, $P_c(t)$ denotes the constant circuit power of the drone, $P_d(t)$ represents the propulsion power of the drone at time t, and the propulsion power $P_d(t)$ is expressed as:
Figure BDA0003954413880000032
wherein,
Figure BDA0003954413880000033
represents the flying speed of the drone at time t, and $V_{max}$ represents the maximum flying speed of the unmanned aerial vehicle;
Figure BDA0003954413880000034
respectively represent the positions of the unmanned aerial vehicle at time t+1 and time t and the time difference between the two moments; $P_o$ represents the blade profile power, $P_i$ represents the hovering power, $v_0$ denotes the average induced velocity of the rotor, $d_0$ denotes the fuselage drag coefficient, $\rho$ the air density, $s$ the motor volume, $A$ the rotor area, and $U_{tip}$ the linear speed of the blade tip;
the remaining energy of the drone is:
Figure BDA0003954413880000035
wherein $E_{max}$ represents the maximum energy value after the unmanned aerial vehicle is fully charged, $E_d(t)$ represents the battery energy remaining at the drone at time t,
Figure BDA0003954413880000036
representing the energy harvested by the drone.
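The energy bookkeeping above can be sketched as follows. The symbol list ($P_o$, $P_i$, $v_0$, $d_0$, $\rho$, $s$, $A$, $U_{tip}$) matches the widely used rotary-wing propulsion power model, in which $s$ is usually called the rotor solidity, so the sketch assumes that model; the patent's own expression may differ. The per-slot battery update, including clipping to $[0, E_{max}]$, is likewise an assumption.

```python
import math

def propulsion_power(v, p_o, p_i, u_tip, v0, d0, rho, s, area):
    """Rotary-wing propulsion power at speed v (assumed standard model)."""
    blade = p_o * (1.0 + 3.0 * v ** 2 / u_tip ** 2)          # blade profile term
    induced = p_i * math.sqrt(
        math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2)
    )                                                         # induced term
    parasite = 0.5 * d0 * rho * s * area * v ** 3             # fuselage drag term
    return blade + induced + parasite

def remaining_energy(e_prev, e_harvest, p_t, p_c, p_d, delta_t, e_max):
    """Battery level after one slot: add harvested energy, subtract consumption."""
    e_next = e_prev + e_harvest - (p_t + p_c + p_d) * delta_t
    return max(0.0, min(e_next, e_max))

# Hypothetical parameter values, for illustration only.
p_hover = propulsion_power(0.0, 80.0, 88.0, 120.0, 4.0, 0.6, 1.225, 0.05, 0.5)
p_cruise = propulsion_power(10.0, 80.0, 88.0, 120.0, 4.0, 0.6, 1.225, 0.05, 0.5)
e_next = remaining_energy(100.0, 5.0, 1.0, 1.0, 8.0, 1.0, 102.0)
```

At zero speed the model reduces to the sum of the blade profile power and the hovering power, which is a quick sanity check on the parameters.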
Preferably, the method for establishing the wireless communication model in step S4 includes: the phase and amplitude of the intelligent reflecting surface are adjusted to reconstruct the channel conditions and thereby improve the transmission rate between the unmanned aerial vehicle and the user, wherein the transmission rate received by the ground mobile user is as follows:
Figure BDA0003954413880000037
wherein,
Figure BDA0003954413880000038
a phase coefficient matrix representing the anti-jamming intelligent reflecting surface,
Figure BDA0003954413880000039
represents the phase of the intelligent reflecting surface in wireless information transmission, $M_r$ and $M_c$ respectively represent the number of reflecting elements in the rows and columns of the intelligent reflecting surface,
Figure BDA00039544138800000310
represents the phase of the $(m_r, m_c)$-th reflecting element of the intelligent reflecting surface; $h_{UG}(t)$ denotes the channel gain of the transmission link between the unmanned aerial vehicle and the ground mobile user, $B_k(t)$ denotes the bandwidth allocated to ground mobile user k by the drone, $\alpha_{d,k}(t)$ indicates whether unmanned aerial vehicle d is serving ground mobile user k, $P_t$ denotes the transmit power, $h_{RG}(t)$ represents the channel gain between the intelligent reflecting surface and the ground mobile user, $h_{UR}(t)$ represents the channel gain between the drone and the intelligent reflecting surface, and $\sigma^2$ represents the Gaussian white noise power.
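A minimal sketch of how these quantities could combine into a rate is given below. It assumes the common Shannon-rate form in which the direct channel and the cascaded channel through the reflecting surface add coherently; this form and the function name `achievable_rate` are assumptions, since the patent's expression is only available as an image.

```python
import cmath
import math

def achievable_rate(bandwidth, p_t, sigma2, h_ug, h_ur, h_rg, phase_grid, served=1):
    """Rate of one user under an assumed direct-plus-reflected channel model."""
    # Flatten the M_r x M_c phase grid row by row so that element (m_r, m_c)
    # lines up with entry m_r * M_c + m_c of the channel vectors.
    phases = [theta for row in phase_grid for theta in row]
    cascaded = sum(
        rg.conjugate() * cmath.exp(1j * theta) * ur
        for rg, theta, ur in zip(h_rg, phases, h_ur)
    )
    snr = p_t * abs(h_ug + cascaded) ** 2 / sigma2
    # served plays the role of the association indicator alpha_{d,k}(t).
    return served * bandwidth * math.log2(1.0 + snr)

# Hypothetical 1 x 2 surface with unit channels.
h_ur = [1 + 0j, 1 + 0j]
h_rg = [1 + 0j, 1 + 0j]
rate_aligned = achievable_rate(1.0, 1.0, 3.0, 1 + 0j, h_ur, h_rg, [[0.0, 0.0]])
rate_opposed = achievable_rate(1.0, 1.0, 3.0, 1 + 0j, h_ur, h_rg, [[math.pi, math.pi]])
```

The comparison between the two phase settings shows why the phase of the surface is an optimization variable: the same channels yield a higher or lower rate depending on whether the reflected signal adds to or cancels the direct one.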
Preferably, when there is no line-of-sight link between the unmanned aerial vehicle and the ground mobile user, the channel gains $h_{UR}(t)$, $h_{RG}(t)$ and $h_{UG}(t)$ of the unmanned aerial vehicle U-intelligent reflecting surface R, intelligent reflecting surface R-ground mobile user G and unmanned aerial vehicle U-ground mobile user G links are respectively:
Figure BDA00039544138800000311
Figure BDA0003954413880000041
Figure BDA0003954413880000042
wherein $\beta_0$ denotes the channel power gain at unit distance; $D_{UR}(t)$, $D_{RG}(t)$ and $D_{UG}(t)$ respectively represent the distances of the unmanned aerial vehicle U-intelligent reflecting surface R, intelligent reflecting surface R-ground mobile user G and unmanned aerial vehicle U-ground mobile user G links at time t; 2, $\alpha$ and $\beta$ respectively represent the path loss exponents of these three links;
Figure BDA0003954413880000043
represents the line-of-sight part of the unmanned aerial vehicle U-intelligent reflecting surface R link, which depends on the flight trajectory of the drone at time slot n and is given by:
Figure BDA0003954413880000044
wherein,
Figure BDA0003954413880000045
Figure BDA0003954413880000046
and
Figure BDA0003954413880000047
and
Figure BDA0003954413880000048
respectively represent the cosine and sine of the horizontal angle of arrival of the signal at the intelligent reflecting surface;
Figure BDA0003954413880000049
represents the sine of the vertical angle of arrival of the signal at the intelligent reflecting surface; $\lambda$ represents the carrier wavelength, $(x_r, y_r, z_r)$ represents the position of the intelligent reflecting surface, $x_d(t)$, $y_d(t)$ and $z_d(t)$ respectively represent the horizontal coordinates and the flying height of the drone,
Figure BDA00039544138800000410
represents the kronecker product;
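How a Kronecker product builds the line-of-sight response of a rectangular surface can be sketched as below. The sketch assumes a uniform planar array whose row and column responses depend on the sine of the vertical angle multiplied by the cosine or sine of the horizontal angle; the element spacing `spacing`, the function names, and the exact phase convention are assumptions rather than the patent's formula, and the unmanned aerial vehicle is assumed not to coincide with the surface.

```python
import cmath
import math

def kron(a, b):
    """Kronecker product of two vectors given as flat lists."""
    return [x * y for x in a for y in b]

def steering(n, phase_step):
    """Uniform linear array response [1, e^{-j*step}, ..., e^{-j*(n-1)*step}]."""
    return [cmath.exp(-1j * i * phase_step) for i in range(n)]

def los_array_response(uav, irs, m_r, m_c, spacing, wavelength):
    """Line-of-sight response of an m_r x m_c surface seen from the UAV position."""
    dx, dy, dz = (u - r for u, r in zip(uav, irs))
    horiz = math.hypot(dx, dy)
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_az = dx / horiz if horiz > 0 else 1.0   # cosine of horizontal angle
    sin_az = dy / horiz if horiz > 0 else 0.0   # sine of horizontal angle
    sin_el = dz / dist                          # sine of vertical angle
    k = 2.0 * math.pi * spacing / wavelength
    row = steering(m_r, k * sin_el * cos_az)
    col = steering(m_c, k * sin_el * sin_az)
    return kron(row, col)

# Hypothetical geometry: UAV at (10, 5, 50), surface at (0, 0, 20), 2 x 3 elements.
response = los_array_response((10.0, 5.0, 50.0), (0.0, 0.0, 20.0), 2, 3, 0.05, 0.1)
```

Every entry has unit magnitude, so the line-of-sight term only changes the phase seen at each element; the path loss is applied separately through the distance-dependent gain.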
The line-of-sight part of the intelligent reflecting surface R-ground mobile user G link,
Figure BDA00039544138800000411
is:
Figure BDA00039544138800000412
wherein
Figure BDA00039544138800000413
Figure BDA00039544138800000414
wherein
Figure BDA00039544138800000415
and
Figure BDA00039544138800000416
represent the cosine and sine of the horizontal angle of departure of the signal towards the k-th user;
Figure BDA00039544138800000417
represents the sine of the vertical angle of departure of the signal towards the k-th user; $x_k(t)$ and $y_k(t)$ represent the horizontal coordinates of the ground user;
Figure BDA00039544138800000418
representing the non line-of-sight portion of the R-G link,
Figure BDA00039544138800000419
representing the random scattering index.
Preferably, the method for establishing the fair communication model in step S5 includes: a fairness index based on the throughput ratio is used to balance maximizing system throughput against fairness; for each ground mobile user
Figure BDA0003954413880000051
the throughput ratio $f_k(t)$ is defined to measure the importance of the ground mobile user:
Figure BDA0003954413880000052
wherein,
Figure BDA0003954413880000053
represents the throughput of ground mobile user k during the time period [0, t],
Figure BDA0003954413880000054
represents the throughput of all terrestrial mobile users;
fairness among users is measured with Jain's fairness index, and the new evaluation index that balances communication efficiency and fairness is:
Figure BDA0003954413880000055
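The two ingredients named above can be computed as in the following sketch. Jain's fairness index is the standard quantity $(\sum_k x_k)^2 / (K \sum_k x_k^2)$; how the patent combines it with throughput into its new evaluation index is not reproduced here, and the function names are hypothetical.

```python
def throughput_ratios(throughputs):
    """Share of the total throughput obtained by each user."""
    total = sum(throughputs)
    if total == 0:
        return [0.0] * len(throughputs)
    return [x / total for x in throughputs]

def jain_index(values):
    """Jain's fairness index: 1 when all values are equal, 1/K in the worst case."""
    n = len(values)
    square_sum = sum(x * x for x in values)
    if n == 0 or square_sum == 0:
        return 0.0
    return sum(values) ** 2 / (n * square_sum)

# Hypothetical per-user throughputs accumulated over [0, t].
ratios = throughput_ratios([2.0, 2.0, 4.0])
fair = jain_index([1.0, 1.0, 1.0, 1.0])
unfair = jain_index([1.0, 0.0, 0.0, 0.0])
```

The index moves between 1/K (one user receives everything) and 1 (all users receive the same), which makes it convenient as a multiplicative or additive fairness term in a reward.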
Preferably, the method for determining the weight coefficients in step S6 is: according to the attributes of the tasks and taking the user quality of service as the standard, the energy consumption sub-target and the fair throughput sub-target are quantized into grades, giving the following grade quantization table:

Application             Energy consumption   Throughput
Real-time data          $y_2$                $y_3$
Image data              $y_1$                $y_2$
Audio data              $y_2$                $y_4$
Non-compressed video    $y_4$                $y_3$
Compressed video        $y_1$                $y_2$

wherein the grades $[y_1, y_2, y_3, y_4]$ represent levels of importance; a judgment matrix about energy consumption and fair throughput is constructed according to the grade quantization table:
Figure BDA0003954413880000056
the eigenvalues and eigenvectors of the judgment matrix are solved by the Jacobi method and normalized to obtain the weight coefficients $[w_1, w_2]$ corresponding to the two sub-targets.
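The weight extraction can be illustrated with a small sketch. The patent names the Jacobi method; the sketch swaps in plain power iteration, which also converges to the principal eigenvector of a positive judgment matrix and keeps the example free of external libraries. The matrix entries are hypothetical and do not come from the grade quantization table.

```python
def principal_weights(matrix, iterations=100):
    """Normalized principal eigenvector of a positive matrix via power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [x / total for x in product]   # normalize so the weights sum to 1
    return weights

# Hypothetical judgment matrix: fair throughput judged 3 times as important
# as energy consumption (reciprocal entry 1/3 in the opposite position).
judgment = [[1.0, 3.0],
            [1.0 / 3.0, 1.0]]
w1, w2 = principal_weights(judgment)
```

For this matrix the normalized principal eigenvector is (0.75, 0.25), so the two sub-targets would be weighted 3 to 1.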
Preferably, the multi-objective integer non-convex optimization problem in step S7 is:
Figure BDA0003954413880000061
s.t. C1: $E_d(0) = E_{max}$, $E_d(T_t) = E_{min}$,
C2:
Figure BDA0003954413880000062
C3:
Figure BDA0003954413880000063
C4: $R_{GU}(t) \ge \gamma_{dk}$,
C5:
Figure BDA0003954413880000064
C6:
Figure BDA0003954413880000065
C7:
Figure BDA0003954413880000066
wherein $u_d(t)$ represents the position of the drone;
Figure BDA0003954413880000067
represents the utility function composed of the weighted throughput and the residual energy; $T_t$ represents the task execution time; $E_d(0)$ represents the energy of the unmanned aerial vehicle at the initial time; $E_{max}$ represents the maximum battery capacity of the unmanned aerial vehicle when fully charged; $E_d(T_t)$ represents the remaining energy when the unmanned aerial vehicle task is finished; $E_{min}$ represents the minimum energy required for a safe return after the unmanned aerial vehicle has executed the task; $\gamma_{dk}$ represents the minimum transmission rate threshold; $u_i(t)$ and $u_j(t)$ respectively represent the positions of unmanned aerial vehicles i and j at time t; $x_d(t)$, $x_k(t)$, $y_d(t)$ and $y_k(t)$ represent the coordinates of the unmanned aerial vehicle and the ground mobile user, and $X_{min}$, $X_{max}$, $Y_{min}$ and $Y_{max}$ are the boundary values of the whole rectangular task area;
the whole task execution time is divided into $N_t$ time slots, each of length
Figure BDA0003954413880000068
and the continuous problem
Figure BDA0003954413880000069
is converted into a discrete problem:
Figure BDA00039544138800000610
s.t. C1~C7
the discrete problem
Figure BDA00039544138800000611
is described as a multi-agent Markov game process <S, A, P, R, γ>, where S is the state set, A is the action set, R is the reward function, P is the state transition probability function, and γ is the reward discount factor;
the multi-agent deep reinforcement learning method comprises the following steps:
the state in time slot $n \in [0, N_t]$ is
Figure BDA00039544138800000612
wherein
Figure BDA00039544138800000613
the coordinates of the drone at time slot n are indicated,
Figure BDA00039544138800000614
the coordinates representing the terrestrial mobile user at time slot n,
Figure BDA00039544138800000615
represents the remaining energy of the unmanned aerial vehicle, and $\theta(n)$ represents the phase of the intelligent reflecting surface;
the action in time slot n is
Figure BDA00039544138800000618
wherein $dist_d(n) \in [0, V_d(t)\delta_t]$ represents the distance flown by the drone base station within time slot n;
Figure BDA00039544138800000619
represents the flight direction of the drone base station in time slot n; $\Delta\theta$ is the change in the phase of the intelligent reflecting surface; $V_d(t)$ is the flight speed of the drone;
the reward function is $r = r_1 + r_2 - \xi_1 p_1 - \xi_2 p_2 - \xi_3 p_3$,
wherein the fair throughput reward is
Figure BDA00039544138800000616
and the coverage reward is
Figure BDA00039544138800000617
where $e_{d,k} = 1$ indicates that user k can be covered by drone d, and $e_{d,k} = 0$ otherwise;
Penalties: the drone base station is penalized when any of the following conditions occurs: (1) the drone flies out of the boundary of the mission area, i.e.
Figure BDA0003954413880000071
where $X_{min}$, $X_{max}$, $Y_{min}$ and $Y_{max}$ are the limits of the abscissa and the ordinate of the task area; (2) unmanned aerial vehicle i collides with unmanned aerial vehicle j, i.e. $\|u_i(n) - u_j(n)\|_2 < d_{min}$, where $d_{min}$ represents the safe distance threshold; (3) the energy of the unmanned aerial vehicle falls to or below the set value, i.e. $E_d(t) \le E_{min}$. A binary variable $\xi_l \in \{0, 1\}$ indicates whether condition l above has occurred: $\xi_l = 1$, $l \in \{1, 2, 3\}$, means that condition l has occurred, and the unmanned aerial vehicle is then given a fixed penalty $p_l$, $l \in \{1, 2, 3\}$;
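The reward shaping can be sketched as follows, treating a collision as two unmanned aerial vehicles being closer than the safe distance $d_{min}$. The sketch evaluates the three penalty conditions for the whole team at once, which is a simplifying assumption; the function names and numerical values are hypothetical, and the two reward terms are passed in as already computed numbers.

```python
import math

def penalty_flags(positions, energies, bounds, d_min, e_min):
    """Return [xi_1, xi_2, xi_3] for boundary, collision and low-energy events."""
    x_min, x_max, y_min, y_max = bounds
    out_of_area = any(
        not (x_min <= x <= x_max and y_min <= y <= y_max) for x, y in positions
    )
    collision = any(
        math.dist(positions[i], positions[j]) < d_min
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
    )
    low_energy = any(e <= e_min for e in energies)
    return [int(out_of_area), int(collision), int(low_energy)]

def reward(r_fair, r_cover, flags, penalties):
    """r = r_1 + r_2 - sum_l xi_l * p_l."""
    return r_fair + r_cover - sum(f * p for f, p in zip(flags, penalties))

# Hypothetical scenario: two UAVs inside a 20 x 20 area, 1 unit apart.
flags_safe = penalty_flags([(0.0, 0.0), (10.0, 10.0)], [5.0, 5.0], (0, 20, 0, 20), 2.0, 1.0)
flags_close = penalty_flags([(0.0, 0.0), (1.0, 0.0)], [5.0, 5.0], (0, 20, 0, 20), 2.0, 1.0)
r_close = reward(1.0, 2.0, flags_close, [5.0, 7.0, 9.0])
```

Fixed penalties keep the reward scale predictable; their relative sizes decide which violation the learned policy avoids most strongly.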
In the Markov game process, each agent maximizes the reward function through its own optimal policy π, and the discrete problem
Figure BDA0003954413880000072
is restated as
Figure BDA0003954413880000073
s.t. C1~C7
wherein
Figure BDA0003954413880000074
represents the expectation operation, and s and a are the concatenations of the state spaces and the action spaces of all agents.
Updating the state of the unmanned aerial vehicle based on an information sharing mechanism of a gate control unit, and inputting a strategy network to obtain the action to be executed by the unmanned aerial vehicle; and constructing an Actor network of state decomposition-expansion-aggregation to decompose and reduce the dimension of state information, and then performing state aggregation on the processed sub-states according to different correlation degrees by using a multi-head attention mechanism.
Preferably, the implementation method of the information sharing mechanism based on the gate control unit is as follows:
state information sharing is established by means of a memory with capacity M,
Figure BDA0003954413880000075
where the memory is used to store the collective state information $m \in \mathbb{R}^M$ of the unmanned aerial vehicles; the policy of each unmanned aerial vehicle becomes
Figure BDA0003954413880000076
each unmanned aerial vehicle maps its own state $s_d$ to an embedding vector representing the current state:
Figure BDA0003954413880000077
wherein,
Figure BDA0003954413880000078
is a neural network whose network parameters are
Figure BDA0003954413880000079
the unmanned aerial vehicle performs a read operation to retrieve the information stored in the memory
Figure BDA00039544138800000710
by generating a context vector $h_d$ that captures the spatio-temporal information of the embedding vector $e_d$:
Figure BDA00039544138800000711
wherein,
Figure BDA00039544138800000712
represents the parameters of a linear mapping network, and H and E respectively represent the dimensions of the context vector $h_d$ and the embedding vector $e_d$;
the embedding vector $e_d$ of the agent's observation, the context vector $h_d$ and the current memory
Figure BDA00039544138800000713
are jointly taken as input to learn a gating mechanism:
Figure BDA00039544138800000714
where σ(·) is the sigmoid function, $[e_d, h_d, m]$ represents the concatenation of the three vectors, and $k_d$ is the weighting factor;
the information read from the memory is regulated by the gating mechanism: $r_d = m \odot k_d$, where $\odot$ represents the Hadamard product.
The agent generates a candidate memory content through nonlinear mapping according to the coding of the state value of the agent and the information of the current shared memory:
Figure BDA0003954413880000081
wherein,
Figure BDA0003954413880000082
is a network parameter; the input gate $g_d$ adjusts the content of the candidate memory, and $f_d$ determines which information is retained and which is discarded:
Figure BDA0003954413880000083
Figure BDA0003954413880000084
where σ represents the sigmoid activation function, and
Figure BDA0003954413880000085
respectively represent the neural network parameters to be trained;
then, unmanned aerial vehicle d generates the updated memory by a weighted combination of the new and old information: $m' = g_d \odot c_d + f_d \odot m$;
the unmanned aerial vehicle takes the encoding of its current state and the information read from the memory as the input of the policy network, and the policy network outputs the action to be executed by the unmanned aerial vehicle
Figure BDA0003954413880000086
where $r_d$ represents the information read from the memory, and
Figure BDA0003954413880000087
represents the policy function;
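The read and write steps of the shared memory can be sketched as below. The relations $r_d = m \odot k_d$ and $m' = g_d \odot c_d + f_d \odot m$ come from the text; the exact inputs of each gate, the use of tanh for the candidate content, the absence of bias terms, and all weight matrices (`w_k`, `w_c`, `w_g`, `w_f`) are assumptions made only to obtain a runnable example.

```python
import math

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-x)) for x in values]

def matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]

def read_memory(e_d, h_d, m, w_k):
    """r_d = m (.) k_d, with the gate k_d computed from the concatenation [e_d, h_d, m]."""
    k_d = sigmoid(matvec(w_k, e_d + h_d + m))
    return hadamard(m, k_d)

def write_memory(e_d, m, w_c, w_g, w_f):
    """m' = g_d (.) c_d + f_d (.) m, all gates driven by [e_d, m] (assumed)."""
    joint = e_d + m
    c_d = [math.tanh(z) for z in matvec(w_c, joint)]   # candidate memory content
    g_d = sigmoid(matvec(w_g, joint))                  # input gate
    f_d = sigmoid(matvec(w_f, joint))                  # retain/discard gate
    return [g * c + f * old for g, c, f, old in zip(g_d, c_d, f_d, m)]

# Hypothetical sizes: 1-dim embedding, 1-dim context, 2-slot memory, zero weights.
memory = [2.0, 4.0]
zeros_read = [[0.0] * 4 for _ in range(2)]
zeros_write = [[0.0] * 3 for _ in range(2)]
r_d = read_memory([1.0], [0.0], memory, zeros_read)
new_memory = write_memory([1.0], memory, zeros_write, zeros_write, zeros_write)
```

With all weights at zero every gate equals 0.5 and the candidate content is zero, so both the read vector and the updated memory are half of the stored memory, which makes the gating arithmetic easy to check by hand.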
the state decomposition of the Actor network decomposes different types of state information and adopts a dimension expansion technology to expand all the state information to the same dimension; the aggregation is to carry out aggregation and linear mapping on each sub-state after decomposition into a low-dimensional input vector according to different correlation degrees;
the state information is selected by a selection strategy based on the self-attention mechanism: the position state information, the remaining energy state information and the state information read from the memory are, after state decomposition, dimension expansion and linear mapping, treated as three vectors $a_1$, $a_2$ and $a_3$, and the self-attention mechanism is calculated as follows:
$q_i = W_q I$, $Q = (q_1, q_2, q_3)$, $I = [s_1, s_2, m_d]$
$k_i = W_k I$, $K = (k_1, k_2, k_3)$, $I = [s_1, s_2, m_d]$
$v_i = W_v I$, $V = (v_1, v_2, v_3)$, $I = [s_1, s_2, m_d]$
wherein $W_q$, $W_k$ and $W_v$ respectively represent the weight parameters of fully connected neural network layers; $q_i$, $k_i$ and $v_i$ respectively represent the queries, keys and values of the attention mechanism; Q, K and V respectively represent the query matrix, the key matrix and the value matrix; I is the matrix composed of the three vectors $a_1$, $a_2$ and $a_3$;
the attention score is expressed as: $\alpha_{score} = \mathrm{Softmax}(K^T Q)$;
the output of the attention mechanism is: $B = \alpha_{score} \cdot V$, $B = \{b_1, b_2, b_3\}$;
the output of the attention mechanism is processed through a linear mapping: $S_{input} = \mathrm{FC}(B)$;
wherein $S_{input}$ represents the input of the policy network and FC represents the linear mapping implemented by a fully connected neural network layer.
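The aggregation step can be sketched at the level of individual vectors, which avoids committing to a row or column layout for Q, K and V. For each query the scores against all keys are passed through a softmax and used to average the values, matching $\alpha_{score} = \mathrm{Softmax}(K^T Q)$ without any scaling factor. The weight matrices and input vectors are hypothetical, and the final fully connected mapping is omitted.

```python
import math

def matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs, w_q, w_k, w_v):
    """Return one output vector b_j per input vector a_j."""
    queries = [matvec(w_q, a) for a in inputs]
    keys = [matvec(w_k, a) for a in inputs]
    values = [matvec(w_v, a) for a in inputs]
    outputs = []
    for q_j in queries:
        alpha = softmax([dot(k_i, q_j) for k_i in keys])
        outputs.append([
            sum(a * v_i[dim] for a, v_i in zip(alpha, values))
            for dim in range(len(values[0]))
        ])
    return outputs

# Hypothetical 2-dim sub-states standing in for a_1, a_2, a_3.
sub_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
uniform_out = self_attention(sub_states, zero, identity, identity)
```

With zero query weights all scores are equal, so every output is the plain average of the three value vectors; trained weights would instead emphasize the sub-states most relevant to each query.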
The invention has the following beneficial effects: an air-ground cooperative network composed of unmanned aerial vehicles and an intelligent reflecting surface provides communication service for a ground mobile network; the system communication efficiency and the fairness among ground mobile users are considered together, and an evaluation index based on the throughput priority and the fairness index is designed to balance communication efficiency and user fairness. Taking the maximization of the system fair throughput as the target, the method accounts for the energy consumption of the unmanned aerial vehicle, provides energy support for the unmanned aerial vehicle through wireless power transmission, and optimizes the flight trajectory of the unmanned aerial vehicle and the phase of the intelligent reflecting surface with a multi-agent deep reinforcement learning algorithm, so that the unmanned aerial vehicle is at the optimal position in each time slot and the system fair throughput is maximized.
By using a plurality of unmanned aerial vehicles and intelligent reflecting surfaces fixed on building surfaces, the invention provides fair communication service for ground mobile users while satisfying the energy consumption and network connectivity requirements, and completes the wireless charging of the unmanned aerial vehicles.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flow chart of the auxiliary communication of the drone according to the present invention.
Fig. 2 is an architecture diagram of the auxiliary communication of the unmanned aerial vehicle according to the present invention.
Fig. 3 is a schematic diagram of the information sharing mechanism based on the gating function according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
An energy-carrying fair communication method for an air-ground mobile network based on an intelligent reflecting surface is mainly divided into three parts, as shown in Fig. 2. As shown in Fig. 1, the method comprises the following steps:
S1: Establish an air-ground mobile network architecture based on multiple unmanned aerial vehicles and intelligent reflecting surfaces. The architecture comprises K ground mobile users, whose set is denoted as
Figure BDA0003954413880000091
D drones are deployed, and their set is denoted as
Figure BDA0003954413880000092
The air-ground mobile network architecture is a scenario of wireless power transmission and fair communication: the intelligent reflecting surface is the reflecting surface responsible for charging, the unmanned aerial vehicle serves as a mobile base station providing communication service for ground mobile users (mobile vehicles, mobile terminals, mobile robots, and the like), and the intelligent lamp pole serves as the energy source supplying energy to the unmanned aerial vehicle.
S2: Establish a wireless power transmission model based on wireless power transmission technology. The intelligent lamp pole serves as the energy source, and two transmission paths are used: direct transmission from the intelligent lamp pole to the unmanned aerial vehicle, and indirect transmission from the intelligent lamp pole via the intelligent reflecting surface to the unmanned aerial vehicle. Wireless charging of the unmanned aerial vehicle is achieved through wireless power transmission and the path reconstruction capability of the intelligent reflecting surface, which extends the endurance of the unmanned aerial vehicle.
In step S2, the wireless power transmission technology uses an intelligent lamp pole, a piece of urban infrastructure, as the energy source. The power transmitter of the intelligent lamp pole sends an energy beam through the transmission medium to the receiving end of the unmanned aerial vehicle. Because the beam suffers loss during transmission and is easily blocked by buildings, the intelligent reflecting surface is used to reconstruct the energy transmission path, so that the energy beam emitted by the power transmitter can reach the optical receiver of the unmanned aerial vehicle through two paths: intelligent lamp pole to unmanned aerial vehicle, and intelligent lamp pole to intelligent reflecting surface to unmanned aerial vehicle. Thus, the energy harvested at the unmanned aerial vehicle can be expressed as:
Figure BDA0003954413880000101
wherein,
Figure BDA0003954413880000102
represents the direct channel gain between the drone and the terrestrial mobile users, $H_s(t)$ represents the channel gain between the energy source and the drone,
Figure BDA0003954413880000103
represents the channel gain between the intelligent reflecting surface and the drone, $\Theta(t)$ represents the phase of the intelligent reflecting surface, and $\eta$ represents the energy conversion factor. In the energy expression
Figure BDA0003954413880000104
the first term represents direct transmission from the intelligent lamp pole to the unmanned aerial vehicle, and the second term represents the transmission path reconstructed by the intelligent reflecting surface, which effectively avoids the influence of obstructions on power transmission. Adding the intelligent reflecting surface to reconstruct the transmission path both increases the received power at the unmanned aerial vehicle and avoids the influence of obstacles on transmission.
S3: Establish an energy consumption model of the unmanned aerial vehicle according to its dynamic model and communication model. The energy consumption of the unmanned aerial vehicle consists mainly of movement consumption during flight and communication consumption for serving users. To guarantee normal execution of the communication task, the residual energy constraint of the unmanned aerial vehicle must be satisfied.
While performing the auxiliary communication task, the unmanned aerial vehicle consumes energy mainly through movement, communication, and its internal circuitry, so its total energy consumption can be expressed as:
Figure BDA0003954413880000105
where $P_t(t)$ denotes the transmit power of the drone, $P_c(t)$ denotes the constant circuit power of the drone, and $P_d(t)$ denotes the propulsion power of the drone at time $t$, which can be expressed as:
Figure BDA0003954413880000106
Wherein,
Figure BDA0003954413880000107
represents the flying speed of the drone at time $t$, where $V_{\max}$ represents the maximum flying speed of the drone.
Figure BDA0003954413880000108
respectively represent the drone positions at times $t+1$ and $t$ and the time difference between the two moments. $P_o$ represents the blade profile power, $P_i$ represents the induced power in hover, $v_0$ denotes the mean rotor induced velocity, $d_0$ denotes the fuselage drag coefficient, $\rho$ the air density, $s$ the rotor solidity, $A$ the rotor disc area, and $U_{tip}$ the rotor blade tip speed.
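The propulsion power expression itself is only available as an image here. As a minimal sketch, the symbols listed above match the rotary-wing propulsion model that is common in the UAV communication literature, so the code below assumes that form; it is an assumption, not a transcription of the patent's equation. The function names, the speed clipping at the maximum speed, and all numeric values are hypothetical.

```python
import math


def flight_speed(pos_next, pos_now, dt, v_max):
    """Speed implied by two consecutive positions, clipped to v_max (assumed clipping)."""
    return min(math.dist(pos_next, pos_now) / dt, v_max)


def propulsion_power(v, p_o, p_i, v_0, d_0, rho, s, area, u_tip):
    """Assumed rotary-wing propulsion power (W) at forward speed v (m/s).

    blade profile term + induced term + parasite (fuselage drag) term.
    """
    blade_profile = p_o * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    induced = p_i * math.sqrt(
        math.sqrt(1.0 + v ** 4 / (4.0 * v_0 ** 4)) - v ** 2 / (2.0 * v_0 ** 2)
    )
    parasite = 0.5 * d_0 * rho * s * area * v ** 3
    return blade_profile + induced + parasite


# Hypothetical parameter values, for illustration only.
params = dict(p_o=80.0, p_i=90.0, v_0=4.0, d_0=0.6, rho=1.2, s=0.05, area=0.5, u_tip=120.0)
hover_power = propulsion_power(0.0, **params)  # reduces to p_o + p_i when hovering
cruise_speed = flight_speed((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), dt=1.0, v_max=10.0)
```

Under this assumed form, the hovering case reduces to the sum of the blade profile power and the induced power, which is a quick sanity check on any parameter set.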
The remaining energy of the drone is expressed as
Figure BDA0003954413880000111
where $E_{\max}$ represents the maximum energy of the unmanned aerial vehicle when fully charged, $E_d(t)$ represents the battery energy remaining at time $t$, and $E_c(t)$ represents the total energy consumption of the drone.
S4: Establish a wireless communication model. The intelligent reflecting surface is used to reconstruct the channel state between the unmanned aerial vehicle and the ground user, improving the communication quality between them.
In step S4, obstacles and other environmental factors degrade the wireless channel between the unmanned aerial vehicle and the ground mobile user. Adjusting the phase and amplitude of the intelligent reflecting surface reconstructs the channel condition and increases the transmission rate between the unmanned aerial vehicle and the ground mobile user, so the transmission rate received by the ground mobile user is expressed as:
Figure BDA0003954413880000112
wherein,
Figure BDA0003954413880000113
represents the phase coefficient matrix of the intelligent reflecting surface,
Figure BDA0003954413880000114
represents the phase of the intelligent reflecting surface during wireless information transmission; $M_r$, $M_c$, and
Figure BDA0003954413880000115
respectively represent the number of reflective elements in each row and each column of the rectangular reflecting surface and the phase of the $(m_r, m_c)$-th reflective element. $h_{UG}(t)$ denotes the channel gain of the link between the unmanned aerial vehicle and the ground mobile user, $B_k(t)$ denotes the bandwidth allocated by the drone to ground mobile user $k$, $\alpha_{d,k}(t)$ indicates whether drone $d$ serves ground mobile user $k$, $P_t$ denotes the transmit power, $h_{RG}(t)$ represents the channel gain between the intelligent reflecting surface and the ground mobile user, and $h_{UR}(t)$ represents the channel gain between the drone and the intelligent reflecting surface. $\sigma^2$ represents the Gaussian white noise power.
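The rate expression is only available as an image, so the following is a hedged sketch rather than the patent's formula. It assumes the usual Shannon-type form, in which the direct gain and the phase-shifted reflected gains add coherently at the user. The function names and the per-element conjugate convention are assumptions made for illustration.

```python
import cmath
import math


def cascaded_channel(h_ug, h_rg, h_ur, phases):
    """Direct gain plus the reflected paths conj(h_RG[m]) * exp(j*theta_m) * h_UR[m]."""
    reflected = sum(
        g.conjugate() * cmath.exp(1j * theta) * u
        for g, theta, u in zip(h_rg, phases, h_ur)
    )
    return h_ug + reflected


def user_rate(alpha, bandwidth, p_t, h_ug, h_rg, h_ur, phases, noise_power):
    """Assumed rate alpha * B * log2(1 + P_t * |h|^2 / sigma^2) in bit/s."""
    h = cascaded_channel(h_ug, h_rg, h_ur, phases)
    snr = p_t * abs(h) ** 2 / noise_power
    return alpha * bandwidth * math.log2(1.0 + snr)


# One reflecting element: an aligned phase adds to the direct path,
# an opposite phase cancels it.
direct_only = user_rate(1, 1e6, 1.0, 1 + 0j, [], [], [], 1.0)
aligned = user_rate(1, 1e6, 1.0, 1 + 0j, [1 + 0j], [1 + 0j], [0.0], 1.0)
cancelled = user_rate(1, 1e6, 1.0, 1 + 0j, [1 + 0j], [1 + 0j], [math.pi], 1.0)
```

The comparison between `aligned` and `cancelled` illustrates why the phase of the reflecting surface is treated as an optimization variable: the same surface can either raise or lower the received rate.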
Assuming that there is no line-of-sight link between the drone and the ground mobile user, the channel gains $h_{UR}(t)$, $h_{RG}(t)$, and $h_{UG}(t)$ of the drone (U) to intelligent reflecting surface (R) link, the intelligent reflecting surface (R) to ground mobile user (G) link, and the drone (U) to ground mobile user (G) link can be expressed as:
Figure BDA0003954413880000116
Figure BDA0003954413880000117
Figure BDA0003954413880000118
where $\beta_0$ represents the channel power gain at unit distance, and the U-R, R-G, and U-G distances at time $t$ are denoted $D_{UR}(t)$, $D_{RG}(t)$, and $D_{UG}(t)$, respectively. $2$, $\alpha$, and $\beta$ represent the path loss exponents of the U-R, R-G, and U-G links, respectively, and
Figure BDA0003954413880000119
represents the line-of-sight component of the U-R link, which depends on the flight trajectory of the drone in time slot $n$ and can be expressed as:
Figure BDA00039544138800001110
wherein,
Figure BDA0003954413880000121
Figure BDA0003954413880000122
where
Figure BDA0003954413880000123
and
Figure BDA0003954413880000124
respectively represent the cosine and sine of the horizontal angle of arrival (AoA) of the signal at the intelligent reflecting surface;
Figure BDA0003954413880000125
represents the sine of the vertical AoA of the signal at the intelligent reflecting surface; $\lambda$ represents the carrier wavelength; $(x_r, y_r, z_r)$ indicates the position of the intelligent reflecting surface; $x_d(t)$, $y_d(t)$, and $z_d(t)$ respectively represent the horizontal coordinates and the flying height of the drone; and
Figure BDA0003954413880000126
represents the Kronecker product.
Figure BDA0003954413880000127
represents the line-of-sight component of the R-G link:
Figure BDA0003954413880000128
wherein
Figure BDA0003954413880000129
Figure BDA00039544138800001210
where
Figure BDA00039544138800001211
and
Figure BDA00039544138800001212
respectively represent the cosine and sine of the horizontal angle of departure (AoD) of the signal toward the k-th user;
Figure BDA00039544138800001213
represents the sine of the vertical AoD of the signal toward the k-th user. $x_k(t)$ and $y_k(t)$ represent the horizontal coordinates of the ground user.
Figure BDA00039544138800001214
represents the non-line-of-sight component of the R-G link, and
Figure BDA00039544138800001215
represents the random scattering index.
The transmission rate expression shows that, without the intelligent reflecting surface, the channel between the unmanned aerial vehicle and the user contains only the direct channel. With the reflecting surface, the channel consists of the direct channel and the reflected channel formed by the beams of the intelligent reflecting surface. The two components superpose at the user, which improves the propagation environment between the unmanned aerial vehicle and the user and thereby raises the transmission rate.
S5: Establish a fair communication model. The model considers both communication efficiency (maximizing system throughput) and fairness among users, and maximizes system throughput on the premise of guaranteeing user fairness.
In step S5, to maximize system throughput, the unmanned aerial vehicle adjusts its flight trajectory so that it stays at positions with good channel quality. However, throughput maximization and user fairness conflict with each other. The invention therefore proposes a fairness index based on the throughput ratio to capture the balance between system throughput and fairness. For each user
Figure BDA00039544138800001216
the throughput ratio $f_k(t)$ is defined to measure the importance of the user:
Figure BDA0003954413880000131
wherein,
Figure BDA0003954413880000132
represents the throughput of user $k$ in the time period $[0, t]$, and
Figure BDA0003954413880000133
representing the throughput of all users.
Serving users according to their importance can achieve higher throughput but causes unfairness among ground mobile users. Jain's fairness index is therefore used to measure fairness among users, and throughput is maximized on the premise of ensuring that fairness. A new evaluation index is designed to balance communication efficiency and fairness:
Figure BDA0003954413880000134
To measure the importance of ground users, a priority based on the throughput ratio is designed. Considering the priority alone leads to unfairness, so the priority and the fairness index are combined into an evaluation index that accounts for both.
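The two ingredients named in step S5 can be sketched as follows. The throughput ratio follows the definition given in the text (a user's throughput divided by the throughput of all users), and Jain's fairness index uses its standard definition. The combined evaluation index is only available as an image and is deliberately not reproduced; the function names and the handling of the all-zero case are assumptions.

```python
def throughput_ratios(throughputs):
    """Throughput ratio f_k: each user's throughput over the total throughput."""
    total = sum(throughputs)
    if total == 0:
        return [0.0 for _ in throughputs]  # assumed convention when nothing was sent
    return [x / total for x in throughputs]


def jain_index(throughputs):
    """Jain's fairness index (sum x)^2 / (n * sum x^2), between 1/n and 1."""
    n = len(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        return 1.0  # assumed convention: an idle system is treated as fair
    return sum(throughputs) ** 2 / (n * squares)


equal_share = jain_index([2.0, 2.0, 2.0, 2.0])   # all users served equally
single_user = jain_index([4.0, 0.0, 0.0, 0.0])   # one user takes everything
ratios = throughput_ratios([1.0, 3.0])
```

The index equals 1 when all users receive the same throughput and falls to 1/K when a single user out of K receives everything, which is why it complements a purely throughput-driven priority.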
S6: Construct a judgment matrix for fair throughput and energy consumption according to the user quality of service (QoS) grade, solve for and normalize the eigenvalues and eigenvectors of the judgment matrix, and determine the weight coefficients of the two sub-objectives, fair weighted throughput and energy consumption.
In step S6, the overall optimization objective is a weighted sum of two conflicting sub-objectives: the remaining energy of the drone and the system fair throughput. To determine the weight coefficients of the sub-objectives reasonably, the invention designs a weight coefficient determination method based on user QoS. First, to establish the multi-objective optimization problem for communication and energy transmission, the energy consumption and fair throughput sub-objectives are quantized into grades on the basis of user QoS criteria according to the attributes of the task (task volume, transmission delay, computing power, and the like), as shown in Table 1. The grades $[y_1, y_2, y_3, y_4]$ denote importance levels 1, 2, 3, and 4, and a higher grade represents a higher user QoS. According to the grade quantization table, and taking a real-time data task as an example, a judgment matrix for energy consumption and fair throughput is constructed:
Figure BDA0003954413880000135
The eigenvalues and eigenvectors of the judgment matrix are solved by the Jacobi method and normalized, which yields the weight coefficients $[w_1, w_2]$ of the two sub-objectives.
Table 1. QoS grade quantization table for different tasks with respect to energy consumption and throughput
Figure BDA0003954413880000136
Figure BDA0003954413880000141
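The weight determination in step S6 can be illustrated with a small sketch. Because the judgment matrix and Table 1 are only available as images, the code assumes a pairwise-comparison matrix whose entries are ratios of the grades, and it swaps in power iteration for the Jacobi method named in the text, since only the normalized principal eigenvector is needed. The grade values and function names are hypothetical.

```python
def judgment_matrix(grades):
    """Assumed pairwise-comparison matrix with entries grade_i / grade_j."""
    return [[gi / gj for gj in grades] for gi in grades]


def principal_weights(matrix, iterations=100):
    """Normalized principal eigenvector by power iteration (entries sum to 1)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]
    return w


# Hypothetical grades: fair throughput graded 3, energy consumption graded 1.
weights = principal_weights(judgment_matrix([3.0, 1.0]))
```

With these hypothetical grades the sub-objective graded 3 receives three times the weight of the one graded 1, and the weights sum to one so they can be used directly in a weighted-sum objective.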
S7: Construct the multi-objective optimization problem. The unmanned aerial vehicle energy-carrying communication problem is modeled as a multi-objective integer non-convex optimization problem that maximizes fair throughput and the residual energy of the unmanned aerial vehicle. The problem is then re-described as a Markov game process, and the complex multi-objective optimization problem is solved through multi-agent deep reinforcement learning.
In step S7, energy-carrying fair communication of the unmanned aerial vehicle is modeled as the following multi-objective optimization problem:
Figure BDA0003954413880000142
s.t. C1: $E_d(0) = E_{\max}$, $E_d(T_t) = E_{\min}$,
C2:
Figure BDA0003954413880000143
C3:
Figure BDA0003954413880000144
C4: $R_{GU}(t) \ge \gamma_{dk}$,
C5:
Figure BDA0003954413880000145
C6:
Figure BDA0003954413880000146
C7:
Figure BDA0003954413880000147
where $u_d(t)$ represents the position of the drone;
Figure BDA0003954413880000148
represents the utility function composed of the weighted throughput and the residual energy; $T_t$ represents the task execution time; $E_d(0)$ represents the initial energy of the unmanned aerial vehicle; $E_{\max}$ represents the maximum battery capacity when the drone is fully charged; $E_d(T_t)$ represents the remaining energy when the unmanned aerial vehicle task is finished; $E_{\min}$ represents the minimum energy required for a safe return after the unmanned aerial vehicle executes the task; $\gamma_{dk}$ represents the minimum transmission rate threshold; $u_i(t)$ and $u_j(t)$ respectively represent the positions of unmanned aerial vehicles $i$ and $j$ at time $t$; $x_d(t)$, $x_k(t)$, $y_d(t)$, and $y_k(t)$ represent the coordinates of the drone and the user; and $X_{\min}$, $X_{\max}$, $Y_{\min}$, and $Y_{\max}$ represent the boundary values of the rectangular task area.
Since the position of the drone changes continuously, the optimization variables are continuous and nonlinearly coupled. To make the continuous problem
Figure BDA0003954413880000149
easier to solve, the invention divides the whole task execution time into $N_t$ time slots, each of length
Figure BDA00039544138800001410
Thus, the continuous problem
Figure BDA00039544138800001411
can be converted into the discrete problem
Figure BDA00039544138800001412
Figure BDA00039544138800001413
s.t. C1~C7
The above problem is still a non-convex integer optimization problem, which conventional optimization algorithms solve only with high computational complexity. Thus, the discrete problem
Figure BDA00039544138800001414
is re-described as a multi-agent Markov game process $\langle S, A, P, R, \gamma \rangle$. The Markov game process comprises five parts: the state set $S$, the action set $A$, the reward function $R$, the state transition probability function $P$, and the reward discount factor $\gamma$. The state, action, and reward are defined as follows:
In time slot $n \in [0, N_t]$, the state
Figure BDA0003954413880000151
consists of four parts, where
Figure BDA0003954413880000152
denotes the coordinates of the drone base station in time slot $n$,
Figure BDA0003954413880000153
denotes the coordinates of the ground mobile user in time slot $n$,
Figure BDA0003954413880000154
denotes the remaining energy of the drone, and $\Theta(n)$ denotes the phase of the intelligent reflecting surface.
In time slot $n$, the action
Figure BDA00039544138800001510
mainly consists of the following parts: $dist_d(n) \in [0, V_d(t)\delta_t]$ denotes the distance the drone base station flies during time slot $n$;
Figure BDA00039544138800001511
denotes the flight direction of the drone base station in time slot $n$; and $\Delta\Theta$ is the change in the phase of the intelligent reflecting surface. $V_d(t)$ is the flight speed of the drone.
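How a movement action changes the drone position can be sketched as below. The direction is assumed to be a heading angle in the horizontal plane, measured in radians, because its exact range is only given in an image. The function name, the clipping of the distance to the per-slot maximum, and the bounds tuple are hypothetical.

```python
import math


def apply_action(position, dist, direction, speed, slot_len, bounds):
    """Move a drone by `dist` along heading `direction` (radians, assumed).

    Returns the new (x, y) position and a flag telling whether the drone
    has left the rectangular task area (x_min, x_max, y_min, y_max).
    """
    max_dist = speed * slot_len               # farthest reachable in one slot
    dist = min(max(dist, 0.0), max_dist)      # keep dist inside [0, V * delta_t]
    x = position[0] + dist * math.cos(direction)
    y = position[1] + dist * math.sin(direction)
    x_min, x_max, y_min, y_max = bounds
    out_of_area = not (x_min <= x <= x_max and y_min <= y <= y_max)
    return (x, y), out_of_area


area = (0.0, 100.0, 0.0, 100.0)
inside_pos, inside_flag = apply_action((10.0, 10.0), 50.0, 0.0, 10.0, 1.0, area)
outside_pos, outside_flag = apply_action((1.0, 1.0), 5.0, math.pi, 10.0, 1.0, area)
```

The out-of-area flag is the kind of signal the boundary penalty in the reward function would consume.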
The agent takes actions to maximize the system reward, so the design of the reward function plays an important role in multi-agent reinforcement learning. The reward of the unmanned aerial vehicle base station mainly comprises the following parts:
Fair throughput
Figure BDA0003954413880000155
In the multi-drone base station assisted fair communication problem, each drone base station has the same objective, namely maximizing the global fair throughput and the remaining energy of the drones.
Coverage rewards
Figure BDA0003954413880000156
To accelerate the convergence of the algorithm, a coverage reward for the unmanned aerial vehicle is included in the reward function; the coverage reward is proportional to the number of users covered by the unmanned aerial vehicle. Here $e_{d,k} = 1$ indicates that user $k$ is covered by drone $d$; otherwise $e_{d,k} = 0$.
Penalty: the unmanned aerial vehicle base station is penalized when any of the following conditions holds. (1) The unmanned aerial vehicle leaves the task boundary region, e.g.
Figure BDA00039544138800001512
where $X_{\min}$, $X_{\max}$, $Y_{\min}$, and $Y_{\max}$ denote the abscissa and ordinate limits of the task area. (2) Drone $i$ collides with drone $j$, e.g. $\|u_i(n) - u_j(n)\|_2 < d_{\min}$, where $d_{\min}$ represents the safe distance threshold. (3) The remaining energy of the drone falls below a set value, e.g. $E_d(t) \le E_{\min}$. A binary variable $\xi_l \in \{0, 1\}$ indicates whether condition $l$ is violated. If $\xi_l = 1$, $l \in \{1, 2, 3\}$, the condition is violated and the drone receives a fixed penalty $p_l$, $l \in \{1, 2, 3\}$.
In summary, the reward function can be expressed as
$r = r_1 + r_2 - \xi_1 p_1 - \xi_2 p_2 - \xi_3 p_3$
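A minimal sketch of the reward composition is given below, assuming the fair-throughput reward and the coverage reward are added and each triggered penalty is subtracted. The function names, the penalty values, and the collision test (a distance below the safe threshold) are assumptions for illustration.

```python
import math


def is_collision(pos_i, pos_j, d_min):
    """Two drones are treated as colliding when closer than the safe distance."""
    return math.dist(pos_i, pos_j) < d_min


def reward(fair_throughput, coverage, violations, penalties=(1.0, 1.0, 1.0)):
    """r = r1 + r2 - sum_l xi_l * p_l.

    `violations` holds three booleans: left the task area, collided,
    remaining energy too low. Each True entry plays the role of xi_l = 1.
    """
    penalty = sum(p for hit, p in zip(violations, penalties) if hit)
    return fair_throughput + coverage - penalty


clean_step = reward(2.0, 3.0, (False, False, False))
bad_step = reward(2.0, 3.0, (True, False, True), penalties=(1.0, 2.0, 4.0))
too_close = is_collision((0.0, 0.0), (3.0, 4.0), 6.0)
```

Keeping the penalties as fixed constants, as the text describes, makes the reward scale easy to tune against the throughput and coverage terms.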
In the Markov game process, each agent aims to maximize the reward function through its optimal policy $\pi$, so the discrete problem
Figure BDA0003954413880000157
can be re-described as
Figure BDA0003954413880000158
s.t. C1~C7
where
Figure BDA0003954413880000159
denotes the expectation operator, and $s$ and $a$ denote the state space and the action space, respectively; $s$ and $a$ are the concatenations of the state spaces and action spaces of all agents described above.
In deep reinforcement learning, the agent's goal is to maximize the reward it obtains, so the problem can be cast as a maximization problem. The agent maximizes the reward through its policy, that is, through the actions it executes. The optimization objective can therefore serve as the agent's reward and the optimization variables as its actions; with a reasonably designed state space, the unmanned aerial vehicle interacts with the environment to search for the optimal policy that maximizes the reward.
The energy sub-objective in Fig. 1 is to maximize the remaining energy of the drone, and the throughput sub-objective is to maximize fair throughput. The optimization variables of the whole problem are mainly the phase of the intelligent reflecting surface and the position of the unmanned aerial vehicle, both of which are updated according to the reward values in reinforcement learning.
S8: Update the state of the unmanned aerial vehicle through the information sharing mechanism based on the gating unit, and input it to the policy network to obtain the action to be executed by the unmanned aerial vehicle. The state information of all unmanned aerial vehicles is stored in a memory, and each unmanned aerial vehicle can access the memory to read the state information of the other agents when executing actions.
In step S8, an information sharing mechanism based on a gating unit is designed to address the policy uncertainty caused by partial observability, as shown in Fig. 3. The mechanism uses a central memory with a storage capacity of $M$,
Figure BDA0003954413880000161
to share state information; the memory stores the collective state information $m_M$ of the unmanned aerial vehicles. After the information sharing mechanism is added, the policy of each drone becomes
Figure BDA0003954413880000162
That is, the policy of the unmanned aerial vehicle depends not only on its own current observation $s_d$ but also on the information in the memory. The information sharing mechanism mainly comprises an encoding operation, a read operation, a write operation, and action selection. Each unmanned aerial vehicle maps its own state $s_d$ to an embedding vector representing the current state:
Figure BDA0003954413880000163
where
Figure BDA0003954413880000164
is a neural network whose network parameters are
Figure BDA0003954413880000165
After encoding the current information, the drone performs a read operation to extract the relevant information stored in the central memory
Figure BDA0003954413880000166
A context vector $h_d$ is generated to capture the temporal and spatial information of the embedding vector $e_d$:
Figure BDA0003954413880000167
where
Figure BDA0003954413880000168
represents the network parameters of the linear mapping, and $H$ and $E$ respectively represent the dimensions of the context vector $h_d$ and the embedding vector $e_d$. The embedding vector $e_d$ of the agent's observation, the context vector $h_d$, and the current central memory
Figure BDA0003954413880000169
each contain different information, and they are combined as the input for learning a gating mechanism:
Figure BDA00039544138800001610
where $\sigma(\cdot)$ is the sigmoid function, $[e_d, h_d, m]$ represents the concatenation of the three vectors, and $M$ represents the memory size. $k_d$ acts as a weighting factor that adjusts the information read from the central memory:
$r_d = m \odot k_d$
where $\odot$ represents the Hadamard product.
The agent generates a candidate memory content through nonlinear mapping according to the coding of the state value of the agent and the information of the current shared memory:
Figure BDA0003954413880000171
where
Figure BDA0003954413880000172
is a network parameter. The input gate $g_d$ adjusts the candidate memory content, and $f_d$ decides which information is retained and which is discarded. These operations can be expressed as:
Figure BDA0003954413880000173
Figure BDA0003954413880000174
where $\sigma$ represents the sigmoid activation function and
Figure BDA0003954413880000175
respectively represent the neural network parameters to be trained.
Then, the unmanned aerial vehicle d finally generates new updated information through the weighted combination of the new and old information:
$m' = g_d \odot c_d + f_d \odot m$
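The element-wise part of the read and write operations can be sketched as follows. The read uses the Hadamard product of the memory and the read weights, and the write is the gated combination of candidate content and old memory, both as given in the text. How the gates are produced is only available in images, so the sketch assumes each gate is a sigmoid applied to some pre-activation vector; the function names and sample values are hypothetical.

```python
import math


def gates(pre_activations):
    """Assumed gate form: element-wise sigmoid of a pre-activation vector."""
    return [1.0 / (1.0 + math.exp(-x)) for x in pre_activations]


def read_memory(memory, k):
    """Read operation r = m (Hadamard) k: scale each memory cell by its read weight."""
    return [m * w for m, w in zip(memory, k)]


def write_memory(memory, candidate, g, f):
    """Write operation m' = g (Hadamard) c + f (Hadamard) m."""
    return [gi * ci + fi * mi for gi, ci, fi, mi in zip(g, candidate, f, memory)]


memory = [1.0, 2.0]
read_out = read_memory(memory, [0.5, 0.0])
# g = [1, 0] admits the first candidate cell; f = [0, 1] keeps the second old cell.
updated = write_memory(memory, [10.0, 20.0], [1.0, 0.0], [0.0, 1.0])
half_open = gates([0.0])
```

Because every gate value lies between 0 and 1, each memory cell ends up as a blend of new candidate content and previously stored content.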
After the read and write operations are finished, the unmanned aerial vehicle takes the encoding of its current state and the information read from the memory as the input of a policy network, and the policy network outputs the action to be executed by the unmanned aerial vehicle:
Figure BDA0003954413880000176
where $e_d$ represents the embedding vector of the observation $s_d$, $r_d$ represents the information read from the memory, and
Figure BDA0003954413880000177
represents the policy function. The policy network is part of the Actor network.
S9: Construct a state decomposition-expansion-aggregation Actor network to decompose the state information and reduce its dimension. The state of the unmanned aerial vehicle is decoupled by category, sub-states of smaller dimension are expanded, and a multi-head attention mechanism then aggregates the processed sub-states according to their degrees of correlation.
In step S9, the final action of the drone depends on the position information of the drone and the ground mobile users, the energy information of the drone, and the phase of the intelligent reflecting surface. If all the state information $s_d$ (the position of the drone, the position of the user, and the remaining energy of the drone) and the shared information (the content the agent reads from the memory through the read operation) are input directly to the Actor network, the unbalanced dimensions of the state information make it difficult to output an ideal policy. The invention therefore designs a new state decomposition-expansion-aggregation Actor network architecture. The network structure mainly comprises three parts: state decomposition and expansion, dimension reduction of the states by linear mapping, and state aggregation by a multi-head attention mechanism. State decomposition separates the different types of state information (position information, energy information, and phase information); because their dimensions differ, dimension expansion is used to bring all state information to the same dimension. Aggregation combines the decomposed sub-states according to their degrees of correlation and linearly maps them into a low-dimensional input vector. An attention mechanism is adopted so that the unmanned aerial vehicle learns the policy, and the sub-states are then spliced into a single vector according to their importance (essentially a weight coefficient).
In addition, the output action of the unmanned aerial vehicle is closely related to the state information. For example, when the remaining energy of the unmanned aerial vehicle falls below a safety threshold, its action is biased toward maximizing the charged energy rather than the throughput. Different types of state information thus influence the action differently, and a state information selection strategy based on a self-attention mechanism is designed accordingly. The position state information, the remaining energy state information, and the state information read from the memory, after state decomposition, dimension expansion, and linear mapping, are taken as three vectors $a_1$, $a_2$, and $a_3$. The self-attention mechanism is computed as follows:
$q_i = W_q I,\ Q = (q_1, q_2, q_3),\ I = [s_1, s_2, m_d]$
$k_i = W_k I,\ K = (k_1, k_2, k_3),\ I = [s_1, s_2, m_d]$
$v_i = W_v I,\ V = (v_1, v_2, v_3),\ I = [s_1, s_2, m_d]$
where $W_q$, $W_k$, and $W_v$ respectively represent the weight parameters of fully connected neural network layers; $q_i$, $k_i$, and $v_i$ respectively represent the queries, keys, and values in the attention mechanism; and $Q$, $K$, and $V$ represent the query matrix, key matrix, and value matrix. $I$ is the matrix formed by the three vectors $a_1$, $a_2$, and $a_3$. Thus, the attention score can be expressed as
$\alpha_{score} = \mathrm{Softmax}(K^{T} Q)$
The attention score may reflect the correlation between any two vectors. The output of the attention mechanism can be expressed as:
$B = \alpha_{score} \cdot V,\ B = \{b_1, b_2, b_3\}$
To provide a better input to the policy network (a fully connected neural network), the invention processes the output of the attention mechanism through a linear mapping:
$S_{input} = FC(B)$
where $S_{input}$ represents the input to the policy network and $FC$ represents the linear mapping implemented by a fully connected neural network layer. Applying decoupling, expansion, and attention-based aggregation to the input of the policy network alleviates the convergence difficulty caused by dimension imbalance and dimension explosion, and the attention mechanism captures the correlation among the sub-states well. The drone then focuses on the more relevant sub-states when acting, which helps it find the optimal policy.
First, an air-ground mobile network architecture based on unmanned aerial vehicles and an intelligent reflecting surface is designed, and an energy transmission model and a communication model based on the intelligent reflecting surface are established. To balance communication efficiency and user fairness, a fairness index based on the throughput ratio is designed to describe user-level fairness. Finally, a multi-agent deep reinforcement learning optimization algorithm is provided, which maximizes the total fair throughput of the system and the residual energy of the unmanned aerial vehicle by optimizing the position of the unmanned aerial vehicle and the phase of the intelligent reflecting surface. In addition, to address the policy uncertainty and training difficulty caused by partial observability and state dimension imbalance, the invention also designs an information sharing mechanism based on a gating function and a novel Actor network architecture of state decoupling, state expansion, and state aggregation; the self-attention mechanism used during aggregation lets the unmanned aerial vehicle attend to the more important state information on its own.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (10)

1. An air-ground mobile network energy-carrying fair communication method based on an intelligent reflecting surface is characterized by comprising the following steps:
S1: establishing an air-ground mobile network architecture based on a plurality of unmanned aerial vehicles and an intelligent reflecting surface, wherein the air-ground mobile network architecture comprises K ground mobile users and D unmanned aerial vehicles;
S2: establishing a wireless power transmission model according to a wireless power transmission technology: the intelligent lamp post is used as an energy source, and two transmission paths of direct transmission of the intelligent lamp post-the unmanned aerial vehicle and indirect transmission of the intelligent lamp post-the intelligent reflecting surface-the unmanned aerial vehicle are adopted, so that the unmanned aerial vehicle is wirelessly charged;
S3: establishing an energy consumption model of the unmanned aerial vehicle according to the dynamic model and the communication model of the unmanned aerial vehicle;
S4: reconstructing a channel state between the unmanned aerial vehicle and a ground user by using the intelligent reflecting surface, and establishing a wireless communication model;
S5: establishing a fair communication model: a fair communication model is established by considering communication efficiency and fairness among users, and system throughput is maximized on the premise of guaranteeing user fairness;
S6: constructing a judgment matrix about fair throughput and energy consumption according to the user service quality grade, solving and normalizing eigenvalues and eigenvectors of the judgment matrix, and determining weight coefficients of two sub-targets of fair weighted throughput and energy consumption;
S7: modeling the unmanned aerial vehicle energy-carrying communication problem into a multi-target integer non-convex optimization problem with fair throughput and maximized unmanned aerial vehicle residual energy, and re-describing the problem into a Markov game process, solving the complex multi-target optimization problem through multi-agent deep reinforcement learning, and updating the position of the unmanned aerial vehicle and the phase of an intelligent reflecting surface.
2. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 1, wherein the set of ground mobile users is represented as
Figure FDA0003954413870000011
The set of drones is denoted as
Figure FDA0003954413870000012
The intelligent reflecting surface is a reflecting surface responsible for charging, the unmanned aerial vehicle serves as a mobile base station providing communication service for ground mobile users, and the intelligent lamp post serves as an energy source providing energy transmission for the unmanned aerial vehicle;
the method for constructing the wireless power transmission model in step S2 is: an intelligent reflecting surface is adopted to reconstruct the energy transmission path; an energy beam emitted by the power emitter of the intelligent lamp post reaches the light receiver of the unmanned aerial vehicle through two transmission paths, namely intelligent lamp post to unmanned aerial vehicle and intelligent lamp post to intelligent reflecting surface to unmanned aerial vehicle, and the energy harvested by the unmanned aerial vehicle is:
Figure FDA0003954413870000013
wherein,
Figure FDA0003954413870000014
represents the direct channel gain between the drone and the ground mobile users, $H_s(t)$ represents the channel gain between the energy source and the drone,
Figure FDA0003954413870000015
represents the channel gain between the intelligent reflecting surface and the unmanned aerial vehicle, $\Theta(t)$ represents the phase of the intelligent reflecting surface, and $\eta$ represents the energy conversion coefficient.
3. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 1, wherein the method for establishing the energy consumption model of the unmanned aerial vehicle in S3 is as follows:
in the process of performing the auxiliary communication task, the unmanned aerial vehicle incurs energy consumption from movement, communication and its internal circuits, so the total energy consumption of the unmanned aerial vehicle is:
Figure FDA0003954413870000021
wherein $P_t(t)$ denotes the transmit power of the drone, $P_c(t)$ denotes the constant circuit power of the drone, $P_d(t)$ represents the propulsion power of the drone at time t, and the propulsion power $P_d(t)$ is expressed as:
Figure FDA0003954413870000022
wherein,
Figure FDA0003954413870000023
represents the flying speed of the drone at time t, and $V_{\max}$ represents the maximum flight speed of the drone;
Figure FDA0003954413870000024
respectively represent the positions of the unmanned aerial vehicle at time t+1 and time t and the time difference between the two moments; $P_o$ represents the blade section power, $P_i$ the hovering power, $v_0$ the average induced speed of the rotor, $d_0$ the fuselage drag coefficient, $\rho$ the air density, $s$ the motor volume, $A$ the rotor area, and $U_{tip}$ the linear speed of the blade tip;
the remaining energy of the drone is:
Figure FDA0003954413870000025
wherein $E_{\max}$ represents the maximum energy value after the unmanned aerial vehicle is fully charged, $E_d(t)$ represents the remaining battery energy of the drone at time t,
Figure FDA0003954413870000026
representing the energy harvested by the drone.
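The propulsion and residual-energy formulas of claim 3 survive only as figure placeholders in this text. As a hedged illustration, the sketch below assumes the widely used rotary-wing propulsion power model, which uses the same symbols ($P_o$, $P_i$, $v_0$, $d_0$, $\rho$, $s$, $A$, $U_{tip}$), and assumes that the per-slot consumption is the sum of the three powers multiplied by the slot length. The function names and the clipping at $E_{\max}$ are assumptions and are not taken from the patent.

```python
import math

def propulsion_power(v, P_o, P_i, v_0, d_0, rho, s, A, U_tip):
    """Assumed rotary-wing propulsion power (W) at flight speed v (m/s)."""
    blade = P_o * (1.0 + 3.0 * v ** 2 / U_tip ** 2)
    # The inner term is non-negative in exact arithmetic; max() guards rounding.
    inner = math.sqrt(1.0 + v ** 4 / (4.0 * v_0 ** 4)) - v ** 2 / (2.0 * v_0 ** 2)
    induced = P_i * math.sqrt(max(inner, 0.0))
    parasite = 0.5 * d_0 * rho * s * A * v ** 3
    return blade + induced + parasite

def remaining_energy(E_d, E_harvested, P_t, P_c, P_d, dt, E_max):
    """Assumed battery update over one slot of length dt (s), capped at E_max."""
    consumed = (P_t + P_c + P_d) * dt
    return min(E_max, E_d - consumed + E_harvested)
```

At zero speed the assumed model reduces to $P_o + P_i$, which is a quick sanity check on any parameter set.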
4. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 2 or 3, wherein the method for establishing the wireless communication model in the step S4 is as follows: the phase position and the amplitude value of the intelligent reflecting surface are adjusted to reconstruct the channel condition, so that the transmission rate between the unmanned aerial vehicle and the user is improved, and the transmission rate received by the ground mobile user is as follows:
Figure FDA0003954413870000027
wherein,
Figure FDA0003954413870000028
a phase coefficient matrix representing the anti-jamming intelligent reflecting surface,
Figure FDA0003954413870000029
represents the phase of the intelligent reflecting surface in wireless information transmission, $M_r$ and $M_c$ respectively represent the number of reflecting elements in the rows and columns of the intelligent reflecting surface,
Figure FDA00039544138700000210
represents the phase of the $(m_r, m_c)$-th reflecting element of the intelligent reflecting surface; $h_{UG}(t)$ denotes the channel gain of the transmission link between the unmanned aerial vehicle and the ground mobile user, $B_k(t)$ denotes the bandwidth allocated to ground mobile user $k$ by the drone, $\alpha_{d,k}(t)$ indicates whether drone $d$ is serving ground mobile user $k$, $P_t$ denotes the transmit power, $h_{RG}(t)$ represents the channel gain between the intelligent reflecting surface and the ground mobile user, $h_{UR}(t)$ represents the channel gain between the drone and the intelligent reflecting surface, and $\sigma^2$ represents the Gaussian white noise power.
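The rate expression of claim 4 is a figure placeholder here, so the following is only a sketch under a stated assumption: a Shannon-type rate in which the direct gain $h_{UG}$ and the cascaded gain through the reflecting elements add coherently. The function name `irs_rate`, the flattening of the $M_r \times M_c$ elements into one list, and the conjugate convention are assumptions.

```python
import cmath
import math

def irs_rate(B_k, P_t, h_UG, h_UR, h_RG, phases, sigma2, alpha_dk=1):
    """Assumed achievable rate (bit/s) of one user served via direct + IRS paths.

    h_UR, h_RG and phases are equal-length lists, one entry per reflecting element.
    """
    cascaded = sum(g.conjugate() * cmath.exp(1j * th) * u
                   for g, th, u in zip(h_RG, phases, h_UR))
    gain = abs(h_UG + cascaded) ** 2
    return alpha_dk * B_k * math.log2(1.0 + P_t * gain / sigma2)

# One-element example: aligning the reflected path with the direct path helps.
unaligned = irs_rate(1e6, 1.0, 1 + 0j, [1j], [1 + 0j], [0.0], 1.0)
aligned = irs_rate(1e6, 1.0, 1 + 0j, [1j], [1 + 0j], [-math.pi / 2], 1.0)
```

The example shows why the phase of the reflecting surface is an optimization variable: the same channels give a higher rate once the reflected component is phase-aligned with the direct one.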
5. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 4, wherein, assuming that no line-of-sight link exists between the unmanned aerial vehicle and the ground mobile user, the channel gains $h_{UR}(t)$, $h_{RG}(t)$ and $h_{UG}(t)$ between unmanned aerial vehicle U and intelligent reflecting surface R, between intelligent reflecting surface R and ground mobile user G, and between unmanned aerial vehicle U and ground mobile user G are respectively:
Figure FDA0003954413870000031
Figure FDA0003954413870000032
Figure FDA0003954413870000033
wherein $\beta_0$ denotes the channel power gain at unit distance; $D_{UR}(t)$, $D_{RG}(t)$ and $D_{UG}(t)$ respectively represent the distances at time t between unmanned aerial vehicle U and intelligent reflecting surface R, between intelligent reflecting surface R and ground mobile user G, and between unmanned aerial vehicle U and ground mobile user G; 2, $\alpha$ and $\beta$ respectively represent the path loss exponents on the U-R, R-G and U-G links;
Figure FDA0003954413870000034
represents the line-of-sight part of the unmanned aerial vehicle U-intelligent reflecting surface R link, which depends on the flight trajectory of the drone in time slot n and is expressed as:
Figure FDA0003954413870000035
wherein,
Figure FDA0003954413870000036
Figure FDA0003954413870000037
and
Figure FDA0003954413870000038
and
Figure FDA0003954413870000039
respectively represent the cosine and sine of the horizontal angle of arrival of the signal at the intelligent reflecting surface;
Figure FDA00039544138700000310
represents the sine of the vertical angle of arrival of the signal at the intelligent reflecting surface; $\lambda$ represents the carrier wavelength, $(x_r, y_r, z_r)$ represents the position of the intelligent reflecting surface, $x_d(t)$, $y_d(t)$ and $z_d(t)$ respectively represent the horizontal coordinates and the flying height of the drone,
Figure FDA00039544138700000311
represents the Kronecker product;
the line-of-sight part of the intelligent reflecting surface R-ground mobile user G link
Figure FDA00039544138700000312
is:
Figure FDA00039544138700000313
wherein
Figure FDA00039544138700000314
Figure FDA00039544138700000315
Wherein,
Figure FDA00039544138700000316
and
Figure FDA00039544138700000317
respectively represent the cosine and sine of the horizontal angle of departure of the signal towards the k-th user;
Figure FDA0003954413870000041
represents the sine of the vertical angle of departure of the signal towards the k-th user; $x_k(t)$ and $y_k(t)$ respectively represent the horizontal coordinates of the ground user;
Figure FDA0003954413870000042
represents the non-line-of-sight part of the R-G link,
Figure FDA0003954413870000043
represents the random scattering index.
6. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface according to any one of claims 2, 3 and 5, wherein the method for establishing the fair communication model in step S5 is: a fairness index based on the throughput ratio is used to balance the maximization of system throughput against fairness; for each ground mobile user
Figure FDA0003954413870000044
the throughput ratio $f_k(t)$ is defined to measure the importance of the ground mobile user:
Figure FDA0003954413870000045
wherein,
Figure FDA0003954413870000046
represents the throughput of ground mobile user $k$ during the time period $[0, t]$,
Figure FDA0003954413870000047
represents the throughput of all ground mobile users;
fairness among users is measured using Jain's fairness index, and the new evaluation index for balancing communication efficiency and fairness is:
Figure FDA0003954413870000048
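As a small illustration of the quantities named in claim 6, the sketch below computes per-user throughput ratios and the standard Jain's fairness index. How the patent combines the index with throughput into its new evaluation index is contained in a figure that is not reproduced here, so that combination is deliberately left out; the function names are assumptions.

```python
def throughput_ratios(throughputs):
    """Share of the total accumulated throughput obtained by each user."""
    total = sum(throughputs)
    if total == 0:
        return [0.0] * len(throughputs)
    return [c / total for c in throughputs]

def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(values)
    squares = sum(v * v for v in values)
    if n == 0 or squares == 0:
        return 0.0
    return sum(values) ** 2 / (n * squares)

ratios = throughput_ratios([4.0, 2.0, 1.0, 1.0])
fairness = jain_index(ratios)
```

The index equals 1 when every user receives the same throughput and drops to 1/n when a single user receives everything. It is scale invariant, so applying it to the ratios or to the raw throughputs gives the same value.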
7. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 6, wherein the method for determining the weight coefficients in step S6 is: according to the attributes of the tasks and taking the user service quality as the standard, the energy consumption sub-target and the fair throughput sub-target are quantized into grades, giving the following grade quantization table:

Application           Energy consumption   Throughput
Real-time data        $y_2$                $y_3$
Image data            $y_1$                $y_2$
Audio data            $y_2$                $y_4$
Uncompressed video    $y_4$                $y_3$
Compressed video      $y_1$                $y_2$

wherein the grades $[y_1, y_2, y_3, y_4]$ represent levels of importance; a judgment matrix of energy consumption and fair throughput is constructed according to the grade quantization table:
Figure FDA0003954413870000049
the eigenvalues and eigenvectors of the judgment matrix are solved by the Jacobi method and normalized to obtain the weight coefficients $[w_1, w_2]$ corresponding to the two sub-targets.
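The weighting step of claim 7 can be illustrated with a short sketch. The judgment matrix itself is a figure placeholder in this text, so the 2x2 matrix below is hypothetical. The sketch also swaps the Jacobi method named in the claim for plain power iteration, which converges to the same normalized principal eigenvector for a positive pairwise-comparison matrix.

```python
def ahp_weights(matrix, iterations=100):
    """Normalized principal eigenvector of a positive judgment matrix.

    Power iteration is used here as a stand-in for the Jacobi method.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [x / total for x in nxt]
    return w

# Hypothetical judgment: sub-target 1 is three times as important as sub-target 2.
judgment = [[1.0, 3.0],
            [1.0 / 3.0, 1.0]]
w1, w2 = ahp_weights(judgment)
```

For a consistent reciprocal matrix such as this one the weights are simply proportional to the first column, so the result can be checked by hand.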
8. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 7, wherein the multi-objective integer non-convex optimization problem in the step S7 is as follows:
Figure FDA0003954413870000051
Figure FDA0003954413870000052
s.t. C1: $E_d(0) = E_{\max}$, $E_d(T_t) = E_{\min}$,
C2:
Figure FDA0003954413870000053
C3:
Figure FDA0003954413870000054
C4: $R_{GU}(t) \ge \gamma_{dk}$,
C5:
Figure FDA0003954413870000055
C6:
Figure FDA0003954413870000056
C7:
Figure FDA0003954413870000057
wherein $u_d(t)$ represents the position of the drone;
Figure FDA0003954413870000058
represents the utility function composed of the weighted throughput and the residual energy; $T_t$ represents the task execution time; $E_d(0)$ represents the energy of the unmanned aerial vehicle at the initial time; $E_{\max}$ represents the maximum battery capacity of the unmanned aerial vehicle when fully charged; $E_d(T_t)$ represents the remaining energy when the unmanned aerial vehicle task is finished; $E_{\min}$ represents the minimum energy required for a safe return after the unmanned aerial vehicle executes the task; $\gamma_{dk}$ represents the minimum transmission rate threshold; $u_i(t)$ and $u_j(t)$ respectively represent the positions of unmanned aerial vehicles $i$ and $j$ at time t; $x_d(t)$, $x_k(t)$, $y_d(t)$ and $y_k(t)$ respectively represent the coordinates of the unmanned aerial vehicle and the ground mobile user, and $X_{\min}$, $X_{\max}$, $Y_{\min}$ and $Y_{\max}$ are the boundary values of the whole rectangular task area;
the whole task execution time is divided into $N_t$ time slots, each of length
Figure FDA0003954413870000059
and the continuous problem
Figure FDA00039544138700000510
is converted into the discrete problem:
Figure FDA00039544138700000511
Figure FDA00039544138700000512
s.t.C1~C7
the discrete problem
Figure FDA00039544138700000513
is re-described as a multi-agent Markov game process $\langle S, A, P, R, \gamma \rangle$, wherein $S$ is the state set, $A$ is the action set, $R$ is the reward function, $P$ is the state transition probability function, and $\gamma$ is the reward discount factor;
the multi-agent deep reinforcement learning method comprises the following steps:
the state in time slot $n \in [0, N_t]$ is
Figure FDA00039544138700000514
wherein
Figure FDA00039544138700000515
represents the coordinates of the drone in time slot n,
Figure FDA00039544138700000516
represents the coordinates of the ground mobile user in time slot n,
Figure FDA00039544138700000517
represents the residual energy of the unmanned aerial vehicle, and $\Theta(n)$ represents the phase of the intelligent reflecting surface;
the action in time slot n is $a_d = \{dist_d(n),$
Figure FDA00039544138700000520
$\Delta\Theta\}$, wherein $dist_d(n) \in [0, V_d(t)\delta_t]$ represents the flying distance of the unmanned aerial vehicle base station in time slot n;
Figure FDA00039544138700000521
represents the flight direction of the unmanned aerial vehicle base station in time slot n; $\Delta\Theta$ is the change in the phase of the intelligent reflecting surface; and $V_d(t)$ is the flight speed of the drone;
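To make the action definition concrete, the sketch below applies one action (flying distance, flight direction, phase change) to a drone position and to the phases of the reflecting elements. It assumes planar motion at fixed height, a direction angle measured from the x axis, and phases wrapped to $[0, 2\pi)$; the names `apply_action`, `phi` and `d_theta` are hypothetical.

```python
import math

def apply_action(pos, dist, phi, phases, d_theta, v_d, delta_t):
    """Return the new (x, y) position and IRS phases after one time slot.

    dist is clipped to [0, v_d * delta_t], the range stated for dist_d(n).
    """
    dist = max(0.0, min(dist, v_d * delta_t))
    x, y = pos
    new_pos = (x + dist * math.cos(phi), y + dist * math.sin(phi))
    new_phases = [(theta + d_theta) % (2.0 * math.pi) for theta in phases]
    return new_pos, new_phases

new_pos, new_phases = apply_action((0.0, 0.0), 50.0, math.pi / 2,
                                   [0.0, math.pi], math.pi / 2, 10.0, 1.0)
```

In the example the requested distance of 50 m exceeds what the drone can fly in one slot, so it is clipped to 10 m before the position update.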
the reward function is $r = r_1 + r_2 - \xi_1 p_1 - \xi_2 p_2 - \xi_3 p_3$,
Wherein the fair throughput
Figure FDA00039544138700000518
Coverage rewards
Figure FDA00039544138700000519
$e_{d,k} = 1$ indicates that user $k$ can be covered by drone $d$; otherwise $e_{d,k} = 0$;
Penalties: the unmanned aerial vehicle base station is penalized in any of the following cases: (1) the unmanned aerial vehicle flies out of the task boundary region, for example
Figure FDA0003954413870000061
wherein $X_{\min}$, $X_{\max}$, $Y_{\min}$ and $Y_{\max}$ represent the boundary values of the abscissa and ordinate of the task area; (2) unmanned aerial vehicle i collides with unmanned aerial vehicle j, i.e. the safe-distance condition $\|u_i(n) - u_j(n)\|_2 \ge d_{\min}$ does not hold, wherein $d_{\min}$ represents the safe distance threshold; (3) the remaining energy of the unmanned aerial vehicle falls to or below the set value, i.e. $E_d(t) \le E_{\min}$. A binary variable $\xi_l \in \{0, 1\}$ indicates whether case $l$ occurs; $\xi_l = 1$, $l \in \{1, 2, 3\}$, indicates that the corresponding constraint is violated, and the unmanned aerial vehicle is then given a fixed penalty $p_l$, $l \in \{1, 2, 3\}$;
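The penalty logic can be sketched as follows, assuming that the fixed penalties are subtracted from the sum of the fair-throughput reward and the coverage reward. The two reward terms are passed in as plain numbers because their formulas are figure placeholders here; the function names and the argument layout are assumptions.

```python
import math

def penalty_flags(positions, d, energy, bounds, d_min, E_min):
    """Binary indicators (xi_1, xi_2, xi_3) for drone d in the current slot."""
    x_min, x_max, y_min, y_max = bounds
    x, y = positions[d]
    out_of_area = not (x_min <= x <= x_max and y_min <= y <= y_max)
    collision = any(math.dist(positions[d], p) < d_min
                    for j, p in enumerate(positions) if j != d)
    low_energy = energy <= E_min
    return [int(out_of_area), int(collision), int(low_energy)]

def reward(r1, r2, flags, penalties):
    """Assumed reward: r1 + r2 minus the fixed penalty of each violated rule."""
    return r1 + r2 - sum(xi * p for xi, p in zip(flags, penalties))

flags = penalty_flags([(0.0, 0.0), (1.0, 0.0)], 0, 50.0,
                      (-10.0, 10.0, -10.0, 10.0), 2.0, 10.0)
r = reward(3.0, 1.0, flags, [5.0, 4.0, 6.0])
```

Here the two drones are 1 m apart with a 2 m safe distance, so only the collision indicator is set and only its penalty is applied.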
In the Markov game process, each agent maximizes the reward function through its own optimal policy $\pi$, and the discrete problem
Figure FDA00039544138700000614
is re-described as
Figure FDA0003954413870000062
s.t. C1~C7
wherein
Figure FDA0003954413870000063
represents the expectation operation, and $s$ and $a$ are the concatenations of the state spaces and action spaces of all agents.
9. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 8, wherein the state of the unmanned aerial vehicle is updated based on an information sharing mechanism of a gating unit and input into a policy network to obtain the action to be executed by the unmanned aerial vehicle; an Actor network of state decomposition-expansion-aggregation is constructed to decompose and reduce the dimension of the state information, and a multi-head attention mechanism is then used to aggregate the processed sub-states according to their different degrees of correlation.
10. The air-ground mobile network energy-carrying fair communication method based on the intelligent reflecting surface as claimed in claim 9, wherein the gating-unit-based information sharing mechanism is implemented by:
state information sharing is established by means of a memory with a storage capacity of M,
Figure FDA0003954413870000064
wherein the memory is used for storing the collective state information $m \in \mathbb{R}^M$ of the unmanned aerial vehicles; the policy of each unmanned aerial vehicle becomes
Figure FDA0003954413870000065
each unmanned aerial vehicle maps its own state $s_d$ to an embedding vector representing the current state:
Figure FDA0003954413870000066
wherein,
Figure FDA0003954413870000067
is a neural network with network parameters
Figure FDA0003954413870000068
the unmanned aerial vehicle performs a read operation to extract the information stored in the memory
Figure FDA0003954413870000069
by generating a context vector $h_d$ that captures the spatio-temporal information of the embedding vector $e_d$:
Figure FDA00039544138700000610
wherein,
Figure FDA00039544138700000611
represents the parameters of the linear mapping network, and $H$ and $E$ respectively represent the dimensions of the context vector $h_d$ and the embedding vector $e_d$;
the embedding vector $e_d$ of the agent's observation, the context vector $h_d$ and the current memory
Figure FDA00039544138700000612
are jointly taken as input to learn a gating mechanism:
Figure FDA00039544138700000613
where $\sigma(\cdot)$ is the sigmoid function, $[e_d, h_d, m]$ represents the concatenation of the three vectors, and $k_d$ is the weighting factor;
the information read from the memory is regulated by the gating mechanism: $r_d = m \odot k_d$, wherein $\odot$ represents the Hadamard product.
The agent generates candidate memory content through a nonlinear mapping of the encoding of its own state value and the information in the current shared memory:
Figure FDA0003954413870000071
wherein,
Figure FDA0003954413870000072
is a network parameter; the input gate $g_d$ is used for adjusting the candidate memory content $c_d$, and $f_d$ decides which information is retained and which is discarded:
Figure FDA0003954413870000073
Figure FDA0003954413870000074
where $\sigma$ represents the sigmoid activation function, and
Figure FDA0003954413870000075
respectively represent the neural network parameters to be trained;
then, unmanned aerial vehicle $d$ generates the updated memory as a weighted combination of the new and old information: $m' = g_d \odot c_d + f_d \odot m$;
The unmanned aerial vehicle takes the encoding of its current state and the information read from the memory as the input of the policy network, and the policy network outputs the action to be executed by the unmanned aerial vehicle
Figure FDA0003954413870000076
wherein $r_d$ represents the information read from the memory and
Figure FDA0003954413870000077
represents the policy function;
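The gated read and write operations can be sketched in a few lines. Only $r_d = m \odot k_d$ and $m' = g_d \odot c_d + f_d \odot m$ are stated in the text; the exact gate equations are figure placeholders, so the sketch assumes that each gate is a sigmoid of a single linear map, that the candidate uses tanh, and that the write gates see the concatenation $[e_d, m]$. All weight names and shapes are hypothetical.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _linear(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def read_memory(e_d, h_d, m, W_k):
    """k_d = sigmoid(W_k [e_d, h_d, m]) (assumed form); r_d = m (.) k_d."""
    k_d = [_sigmoid(z) for z in _linear(W_k, e_d + h_d + m)]
    return [mi * ki for mi, ki in zip(m, k_d)]

def write_memory(e_d, m, W_c, W_g, W_f):
    """m' = g_d (.) c_d + f_d (.) m with assumed tanh candidate and sigmoid gates."""
    x = e_d + m
    c_d = [math.tanh(z) for z in _linear(W_c, x)]
    g_d = [_sigmoid(z) for z in _linear(W_g, x)]
    f_d = [_sigmoid(z) for z in _linear(W_f, x)]
    return [g * c + f * mi for g, c, f, mi in zip(g_d, c_d, f_d, m)]

e_d, h_d, m = [1.0], [2.0], [4.0, -2.0]
zeros_read = [[0.0] * 4 for _ in range(2)]   # untrained read gate
zeros_write = [[0.0] * 3 for _ in range(2)]  # untrained write gates
r_d = read_memory(e_d, h_d, m, zeros_read)
m_new = write_memory(e_d, m, zeros_write, zeros_write, zeros_write)
```

With all-zero weights every gate sits at 0.5 and the candidate is zero, so both the read vector and the updated memory are half of the old memory, which makes the data flow easy to verify by hand.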
the state decomposition of the Actor network decomposes the different types of state information and uses a dimension expansion technique to expand all state information to the same dimension; the aggregation aggregates the decomposed sub-states according to their different degrees of correlation and linearly maps them into a low-dimensional input vector;
the state information is selected by a state information selection strategy based on the self-attention mechanism: the position state information, the residual energy state information and the state information read from the memory are taken, after state decomposition, dimension expansion and linear mapping, as three vectors $a_1$, $a_2$ and $a_3$, and the self-attention mechanism is computed as follows:
$q_i = W_q I,\quad Q = (q_1, q_2, q_3),\quad I = [s_1, s_2, m_d]$
$k_i = W_k I,\quad K = (k_1, k_2, k_3),\quad I = [s_1, s_2, m_d]$
$v_i = W_v I,\quad V = (v_1, v_2, v_3),\quad I = [s_1, s_2, m_d]$
wherein $W_q$, $W_k$ and $W_v$ respectively represent the weight parameters of the fully connected layer neural network; $q_i$, $k_i$ and $v_i$ respectively represent the queries, keys and values in the attention mechanism; $Q$, $K$ and $V$ respectively represent the query matrix, key matrix and value matrix; $I$ is the matrix composed of the three vectors $a_1$, $a_2$ and $a_3$;
the attention score is expressed as: $\alpha_{score} = \mathrm{Softmax}(K^T Q)$;
the output of the attention mechanism is: $B = \alpha_{score} \cdot V$, $B = \{b_1, b_2, b_3\}$;
the output of the attention mechanism is processed through a linear mapping: $S_{input} = FC(B)$;
wherein $S_{input}$ represents the input to the policy network and $FC$ represents the linear mapping implemented by the fully connected layer neural network.
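The aggregation step can be illustrated with a single-head sketch over the three sub-state vectors. It follows the stated order of operations (queries, keys and values from linear maps, softmax over key-query products, weighted sum of values) without any scaling factor, since none is stated. The softmax axis (over keys, separately for each query) is an assumption, and the final fully connected mapping $FC$ is omitted. Helper names are hypothetical.

```python
import math

def _matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def _softmax(xs):
    peak = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs, W_q, W_k, W_v):
    """Return one aggregated vector b_j per input vector a_j."""
    queries = [_matvec(W_q, a) for a in inputs]
    keys = [_matvec(W_k, a) for a in inputs]
    values = [_matvec(W_v, a) for a in inputs]
    outputs = []
    for q in queries:
        scores = _softmax([sum(x * y for x, y in zip(k, q)) for k in keys])
        outputs.append([sum(s * v[t] for s, v in zip(scores, values))
                        for t in range(len(values[0]))])
    return outputs

eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
out = self_attention([[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]], zero, eye, eye)
```

With zero query weights every score is equal, so each output is the plain average of the three value vectors; trained weights would instead let the drone emphasize whichever sub-state is most relevant.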
CN202211472603.2A 2022-11-16 2022-11-16 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface Active CN115802313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211472603.2A CN115802313B (en) 2022-11-16 2022-11-16 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface

Publications (2)

Publication Number Publication Date
CN115802313A true CN115802313A (en) 2023-03-14
CN115802313B CN115802313B (en) 2024-06-28

Family

ID=85440416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211472603.2A Active CN115802313B (en) 2022-11-16 2022-11-16 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface

Country Status (1)

Country Link
CN (1) CN115802313B (en)

Also Published As

Publication number Publication date
CN115802313B (en) 2024-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant