CN114257298B

CN114257298B - Intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method

Info

Publication number: CN114257298B
Application number: CN202210050792.8A
Authority: CN
Inventors: 梅海波; 蔡勇; 车畅; 张子歌; 庞宇; 梁楚雄; 孙小博
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-01-17
Filing date: 2022-01-17
Publication date: 2022-09-27
Anticipated expiration: 2042-01-17
Also published as: CN114257298A

Abstract

The invention discloses an intelligent reflection surface phase shift and unmanned aerial vehicle path planning method, and relates to the field of unmanned aerial vehicle air-ground communication, intelligent reflection surface auxiliary communication and deep learning.

Description

Intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method

Technical Field

The invention relates to the fields of unmanned aerial vehicle air-ground communication, intelligent reflection surface auxiliary communication and deep learning, in particular to an intelligent reflection surface phase shift and unmanned aerial vehicle path planning method.

Background

Wireless communication supported by unmanned aerial vehicles has been a research hotspot in recent years. Because of its high mobility, an unmanned aerial vehicle can operate flexibly in three-dimensional space and improve air-ground wireless communication by adjusting its horizontal and vertical positions simultaneously. This mobility makes drone-assisted wireless networks particularly suitable for on-demand deployment in emergencies, where the drone mainly serves as a temporary aerial base station or access point that transmits data to, or receives data from, ground terminals. In practice, however, the communication link between the unmanned aerial vehicle and a terminal is likely to be blocked by obstacles in the area, which attenuates the signal and reduces the data transmission rate. The intelligent reflecting surface was proposed to address this problem. It is a metasurface equipped with integrated electronic circuits that can be programmed to alter the incident electromagnetic field in a customizable manner, with each surface element realized by a reflectarray. A blocked link between the unmanned aerial vehicle and a ground terminal can be redirected and re-established through an intelligent reflecting surface mounted on a building, which makes effective use of the energy and spectral efficiency of the cellular system and helps the unmanned aerial vehicle overcome signal blockage in air-ground wireless communication.

Despite these advantages, there are three technical problems that have yet to be resolved.

First, with the assistance of an intelligent reflecting surface, both segments of the communication link between a ground terminal and the unmanned aerial vehicle are affected by the movement of the unmanned aerial vehicle, which makes the three-dimensional trajectory difficult to design;

Second, the phase shift of the intelligent reflecting surface must be calculated in real time from the current condition of the communication link, and traditional algorithms require a large amount of computation and have poor real-time performance, which degrades the quality of the communication link;

Finally, the energy of the drone is limited, so its overall propulsion energy should be kept to a minimum while the system maintains a high energy efficiency.

In general, these problems affect one another in an unmanned aerial vehicle air-ground communication system, so solving them as a joint optimization problem is particularly important for improving the energy efficiency of unmanned aerial vehicle air-ground communication.

Disclosure of Invention

The invention aims to solve the above problems by providing an intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method that is based on deep reinforcement learning and that minimizes the energy consumption of the unmanned aerial vehicle while maximizing the gain of the wireless communication network assisted by the intelligent reflecting surface.

The invention realizes the purpose through the following technical scheme:

An intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method, characterized by comprising the following steps:

S1, establishing an unmanned aerial vehicle-terminal communication model assisted by the intelligent reflecting surface;

S2, collecting information on the unmanned aerial vehicle, the intelligent reflecting surface and the ground terminals in the current area, and importing it into the communication model;

S3, establishing a deep reinforcement learning network, and initializing the initial and target network parameters;

S4, initializing, in the deep reinforcement learning network, the state of the communication scene between the unmanned aerial vehicle assisted by the intelligent reflecting surface and the terminals;

S5, executing a behavior according to the state and the reward;

S6, judging whether the unmanned aerial vehicle is out of bounds or over the speed limit, and if so, applying a penalty and canceling the behavior;

S7, applying the intelligent reflecting surface phase shift parameters and executing the behavior;

S8, saving the behavior, the reward, the current state and the next state as a sample;

S9, if the task is not completed, repeating steps S5 to S8 until the task is completed or a fixed number of repetitions is reached;

S10, randomly selecting a mini-batch from the samples obtained in S8 to calculate target values;

S11, updating the network parameters by minimizing a loss function and by gradient descent, respectively;

S12, repeating steps S4 to S11 a fixed number of times to obtain a stable intelligent reflecting surface phase shift and unmanned aerial vehicle path plan.

The invention has the following beneficial effects: the design is based on a deep reinforcement learning framework and achieves a better trade-off between computational complexity and accuracy than traditional convex optimization algorithms. Combined with intelligent reflecting surface technology, it jointly optimizes the three-dimensional trajectory of the unmanned aerial vehicle and the phase shift of the intelligent reflecting surface, so that the gain of the wireless communication network is maximized while the energy consumption of the unmanned aerial vehicle is minimized, which improves the communication efficiency between the unmanned aerial vehicle and the ground terminals.

Drawings

FIG. 1 is a flow chart of the intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method of the present invention;

FIG. 2 is a scene model diagram of the intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present invention, it is to be understood that the terms "upper", "lower", "inner", "outer", "left", "right", and the like indicate orientations or positional relationships based on orientations or positional relationships shown in the drawings, or orientations or positional relationships conventionally placed when the product of the present invention is used, or orientations or positional relationships conventionally understood by those skilled in the art, which are merely for convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it is also to be noted that, unless otherwise explicitly stated or limited, the terms "disposed" and "connected" are to be interpreted broadly, and for example, "connected" may be a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect through an intermediate medium, and the connection may be internal to the two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

The following detailed description of embodiments of the invention refers to the accompanying drawings.

The invention provides an intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method that effectively balances computational complexity against computational accuracy and maximizes the wireless communication network gain while minimizing the energy consumption of the unmanned aerial vehicle. The method consists of three parts: system model establishment, model transformation, and solution. As shown in FIG. 1, it specifically comprises the following steps:

S1, establishing an unmanned aerial vehicle-terminal communication model assisted by the intelligent reflecting surface, specifically:

The three-dimensional area in which the unmanned aerial vehicle communicates with the terminals with the assistance of the intelligent reflecting surface is evenly divided into a plurality of cells, and the horizontal coordinate of the center of the i-th cell is

In the formula,

refers to the set of horizontal center coordinates of all cells, and x_s and y_s refer to the horizontal distances between two adjacent cells in the x and y directions, respectively.

refers to the horizontal position of the unmanned aerial vehicle in the n-th time slot, where

and N refers to all time slots. As shown in FIG. 2, let

and

be the horizontal take-off and landing positions of the unmanned aerial vehicle, which are set in advance.

refers to the vertical position of the unmanned aerial vehicle in the n-th time slot. Hence the spatial coordinates

and the time slot durations

characterize the path plan of the unmanned aerial vehicle.

An energy consumption model is established. According to the horizontal flight speed of the unmanned aerial vehicle

together with the constant blade profile power P₀, the hovering induced power P₁, the constant descending or ascending power P₂, the rotor blade tip speed U_tip, the mean rotor induced velocity in hover v₀, the fuselage drag ratio d₀, the rotor solidity S, the air density ρ and the rotor disc area G, the propulsion energy of the rotary-wing unmanned aerial vehicle is calculated.
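
As a rough illustration of how the listed parameters could combine, the following sketch uses the rotary-wing propulsion power model that is common in the unmanned aerial vehicle communication literature. The formula of the invention is not reproduced in this text, so the exact expression, the function names `propulsion_power` and `propulsion_energy`, the vertical speed argument `v_z`, and the way P₂ is charged whenever the vehicle climbs or descends are all assumptions.

```python
import math

def propulsion_power(v_h, v_z, P0, P1, P2, U_tip, v0, d0, S, rho, G):
    """Assumed rotary-wing propulsion power for one time slot.

    v_h is the horizontal speed and v_z the vertical speed (hypothetical argument).
    """
    # Blade profile power grows with the square of the horizontal speed.
    blade = P0 * (1.0 + 3.0 * v_h ** 2 / U_tip ** 2)
    # Induced power shrinks as the horizontal speed increases.
    ratio = v_h ** 2 / (2.0 * v0 ** 2)
    induced = P1 * math.sqrt(math.sqrt(1.0 + ratio ** 2) - ratio)
    # Parasite power from fuselage drag grows with the cube of the speed.
    parasite = 0.5 * d0 * rho * S * G * v_h ** 3
    # Assumption: the constant power P2 is spent whenever the vehicle moves vertically.
    vertical = P2 if v_z != 0.0 else 0.0
    return blade + induced + parasite + vertical

def propulsion_energy(speeds, durations, **params):
    """Sum power times slot duration over all time slots."""
    return sum(
        propulsion_power(v_h, v_z, **params) * t
        for (v_h, v_z), t in zip(speeds, durations)
    )
```

Under this assumed model, a hovering vehicle (zero horizontal and vertical speed) consumes P₀ + P₁ per unit time, which is a convenient sanity check on the parameter values.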

A communication model between the intelligent reflecting surface and the unmanned aerial vehicle is established. According to the number of reflecting units M_c × M_r of each uniform planar array on the intelligent reflecting surface, and the column spacing of d_c meters and row spacing of d_r meters of the uniform planar array, the channel gain between the unmanned aerial vehicle and the intelligent reflecting surface in the n-th time slot is calculated as

In the formula, ξ refers to the channel loss at a distance of 1 meter, and the distance between the intelligent reflecting surface and the unmanned aerial vehicle in the n-th time slot is expressed as

z_R and w_R respectively indicate the position of the first element of the intelligent reflecting surface in the vertical and horizontal directions, λ refers to the carrier wavelength,

and

respectively denote the cosine and sine of the horizontal angle of arrival of the signal at the intelligent reflecting surface, and

refers to the sine of the vertical angle of arrival of the signal at the intelligent reflecting surface.

A communication model between the intelligent reflecting surface and the ground terminals is established. The channel gain between the k-th terminal and the intelligent reflecting surface is calculated as

where the distance between the k-th terminal and the intelligent reflecting surface is

and

respectively refer to the cosine and sine of the horizontal angle of departure of the signal from the k-th terminal, and

refers to the sine of the vertical angle of departure of the signal from the k-th terminal. Further, the overall channel gain of the k-th terminal can be expressed as

In the formula,

is the reflection phase coefficient matrix of the intelligent reflecting surface, and

A communication link model between the unmanned aerial vehicle and the ground terminals assisted by the intelligent reflecting surface is established. The blocking probability of the link between the unmanned aerial vehicle and the k-th ground terminal in the n-th time slot is calculated as

In the formula,

a and b are parameters that vary with the communication environment. Further, the average achievable channel gain of the k-th terminal is expressed as

and the channel rate is

In the formula, P is the fixed transmit power of the unmanned aerial vehicle, B is the bandwidth, σ is the noise variable, and c_k,n ∈ {0, 1} indicates whether the k-th terminal is scheduled (in any one time slot the intelligent reflecting surface serves at most one terminal).
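
To make the role of these symbols concrete, the following minimal sketch assumes a Shannon-type rate of the form c·B·log2(1 + P·g/σ²), where g stands for the average channel gain and σ² is treated as the noise power. The rate formula itself is not reproduced in this text, so this form and the names `channel_rate` and `valid_schedule` are assumptions.

```python
import math

def channel_rate(c_kn, g_kn, P, B, sigma2):
    """Assumed Shannon-type rate of terminal k in slot n (bits per second)."""
    if c_kn not in (0, 1):
        raise ValueError("scheduling indicator must be 0 or 1")
    # An unscheduled terminal (c_kn = 0) gets zero rate.
    return c_kn * B * math.log2(1.0 + P * g_kn / sigma2)

def valid_schedule(c_n):
    """Check that at most one terminal is served in a given time slot."""
    return all(c in (0, 1) for c in c_n) and sum(c_n) <= 1
```

For instance, with a signal-to-noise ratio of 3 and a bandwidth of 2 Hz the assumed rate is 2·log2(4) = 4 bits per second.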

S2, collecting information on the unmanned aerial vehicle, the intelligent reflecting surface and the ground terminals in the current area, and importing it into the communication model:

The information L, H and T of the unmanned aerial vehicle, the information Θ of the intelligent reflecting surface and the information C of the ground terminals in the current area are collected and imported into the communication model, where

indicates the set of horizontal positions of the unmanned aerial vehicle,

indicates the set of vertical positions of the unmanned aerial vehicle,

indicates the duration of each flight time slot of the unmanned aerial vehicle,

indicates the reflection phase coefficient matrix of the intelligent reflecting surface, and

indicates the ground terminal scheduling scheme;

S3, establishing a deep reinforcement learning network, and initializing the initial and target network parameters:

A deep reinforcement learning network is established, and an experience replay buffer F and the number of time slots N are initialized. The parameters θ^π and θ^π′ of the initial and target policy networks π are initialized so that θ^π′ = θ^π. The parameters θ^Q and θ^Q′ of the initial and target Q networks are initialized so that θ^Q′ = θ^Q;
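
The initialization in S3 can be sketched as follows. A single linear layer stands in for each deep network, and the names `init_params` and `init_learner`, the dictionary layout and the weight range are hypothetical choices made only for illustration.

```python
import copy
import random
from collections import deque

def init_params(n_in, n_out, rng):
    """One linear layer as a stand-in for a deep network (assumption)."""
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }

def init_learner(state_dim, action_dim, buffer_size, seed=0):
    rng = random.Random(seed)
    theta_pi = init_params(state_dim, action_dim, rng)     # policy network
    theta_q = init_params(state_dim + action_dim, 1, rng)  # Q network
    return {
        "theta_pi": theta_pi,
        # Target networks start as exact but independent copies.
        "theta_pi_target": copy.deepcopy(theta_pi),
        "theta_q": theta_q,
        "theta_q_target": copy.deepcopy(theta_q),
        # Experience replay buffer F with a fixed capacity.
        "F": deque(maxlen=buffer_size),
    }
```

Deep copies are used so that later updates to the initial networks do not silently change the target networks.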

S4, initializing, in the deep reinforcement learning network, the state of the communication scene between the unmanned aerial vehicle assisted by the intelligent reflecting surface and the terminals to the state s(1);

S5, executing a behavior according to the state and the reward:

A behavior is randomly selected as

and executed. In the formula,

is a random process, and π(s(n)|θ^π) denotes the policy selected in state s(n) under the network parameters θ^π;
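
A minimal sketch of this kind of exploratory selection is given below, assuming the behavior is the policy output plus a random perturbation. The Gaussian noise, the clipping range and the name `select_action` are assumptions, since the selection formula is not reproduced in this text.

```python
import random

def select_action(policy, state, noise_std, rng, low=-1.0, high=1.0):
    """Return the policy output perturbed by random exploration noise."""
    mean = policy(state)  # deterministic policy output for this state
    # Assumption: independent zero-mean Gaussian noise on each action component.
    noisy = [m + rng.gauss(0.0, noise_std) for m in mean]
    # Keep every component inside the allowed action range.
    return [min(high, max(low, x)) for x in noisy]
```

Setting `noise_std` to zero recovers the deterministic policy output, which is how a trained policy would typically be evaluated.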

S6, judging whether the unmanned aerial vehicle has flown out of bounds or exceeded the speed limit, and if so, applying a penalty and canceling the behavior a(n);

S7, applying the intelligent reflecting surface phase shift parameters and executing the behavior a(n) to obtain the state s(n+1) and the reward r(s(n), a(n));

S8, saving the behavior, the reward, the current state and the next state as a sample, namely storing the sample (s(n), a(n), r(·), s(n+1)) in the experience replay buffer F;

S9, if the task D_k is not completed, repeating steps S5 to S8 until the task is completed or N repetitions are reached;
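
Steps S5 to S9 can be sketched as one episode loop. In this simplified illustration the state is only the position of the unmanned aerial vehicle and the behavior is only a displacement, so the phase shift part of S7 is omitted; the names `violates_limits` and `run_episode`, the callback arguments and the fixed penalty are hypothetical.

```python
from collections import deque

def violates_limits(pos, step, bounds, v_max, dt):
    """S6: True if the move leaves the area or exceeds the speed limit."""
    new_pos = [p + d for p, d in zip(pos, step)]
    speed = sum(d * d for d in step) ** 0.5 / dt
    out = any(not (lo <= x <= hi) for x, (lo, hi) in zip(new_pos, bounds))
    return out or speed > v_max

def run_episode(s1, choose, reward_fn, done_fn, F, N, bounds, v_max, dt, penalty):
    s = list(s1)
    for _ in range(N):                      # S9: at most N repetitions
        a = choose(s)                       # S5: select a behavior
        if violates_limits(s, a, bounds, v_max, dt):
            r, s_next = -penalty, list(s)   # S6: penalize and cancel the move
        else:
            s_next = [p + d for p, d in zip(s, a)]  # S7: execute the behavior
            r = reward_fn(s, a)
        F.append((s, a, r, s_next))         # S8: store the sample in F
        s = s_next
        if done_fn(s):                      # S9: stop when the task is completed
            break
    return s
```

Canceling a violating move keeps the state unchanged while still storing the penalized sample, so the learner sees the consequence of the rejected behavior.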

S10, randomly selecting a mini-batch from the samples obtained in S8 to calculate the target values:

A random mini-batch of samples (s(j), a(j), r(j), s(j+1)) is selected from the M samples in the experience replay buffer F, where s(j) and s(j+1) refer to the states at times j and j+1, respectively, a(j) refers to the behavior at time j, and r(j) refers to the reward at time j. The target value y(j) = r(j) + γQ′(s(j+1), π′(s(j+1)|θ^π′)|θ^Q′) is calculated, where γ ∈ (0, 1] is the discount factor and Q′(s(j+1), π′(s(j+1)|θ^π′)|θ^Q′) is the Q value output by the target network Q′(·) given the state, the policy and the network parameters at time j+1.
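
The target value computation can be sketched directly from the expression above. The target policy and target Q network are passed in as plain callables, and the name `target_values` is a hypothetical choice for this illustration.

```python
import random

def target_values(F, batch_size, target_policy, target_q, gamma, rng):
    """Sample a mini-batch from F and compute y(j) for each sample."""
    batch = rng.sample(list(F), min(batch_size, len(F)))
    ys = [
        # y(j) = r(j) + gamma * Q'(s(j+1), pi'(s(j+1)))
        r + gamma * target_q(s_next, target_policy(s_next))
        for (_s, _a, r, s_next) in batch
    ]
    return batch, ys
```

Only the target networks appear on the right-hand side, so the regression target stays fixed while the Q network weights are being adjusted.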

S11, updating the network parameters by minimizing a loss function and by gradient descent, respectively:

The weights θ^Q of the Q(·) network are updated by minimizing the loss function

where Q(s(j), a(j)|θ^Q) is the Q value obtained with the network weights θ^Q when the state is s(j) and the behavior is a(j), and the target network parameters are updated as θ^Q′ = δθ^Q + (1-δ)θ^Q′, where δ is a proportionality coefficient. The weights θ^π of the π(·) network are updated by gradient descent on the gradient

where J(π) is the output value of the policy network, a refers to the behavior parameters, s refers to the state parameters and π refers to the policy parameters, and the target network parameters are updated as θ^π′ = δθ^π + (1-δ)θ^π′.
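
The two update rules can be illustrated with a short sketch. The target update follows the expression θ′ = δθ + (1-δ)θ′ given above; the mean squared error form of the loss is an assumption, because the loss formula is not reproduced in this text, and the names `critic_loss` and `soft_update` are hypothetical.

```python
def critic_loss(ys, qs):
    """Assumed loss: mean squared error between targets y(j) and Q values."""
    return sum((y - q) ** 2 for y, q in zip(ys, qs)) / len(ys)

def soft_update(theta, theta_target, delta):
    """Blend the current weights into the target weights with coefficient delta."""
    return [delta * w + (1.0 - delta) * wt for w, wt in zip(theta, theta_target)]
```

A small δ makes the target networks follow the learned networks slowly, which keeps the target values from changing abruptly between updates.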

The technical solution of the present invention is not limited to the limitations of the above specific embodiments, and all technical modifications made according to the technical solution of the present invention fall within the protection scope of the present invention.

Claims

1. An intelligent reflecting surface phase shift and unmanned aerial vehicle path planning method, characterized by comprising the following steps:

S5, executing a behavior according to the state and the reward;

S8, saving the behavior, the reward, the current state and the next state as a sample;

S12, repeating steps S4 to S11 a fixed number of times to obtain a stable intelligent reflecting surface phase shift and unmanned aerial vehicle path plan;

in S1, the method includes:

S101, using the spatial coordinates

and the time slot durations

to characterize the path of the unmanned aerial vehicle;

wherein,

where N refers to all time slots;

indicates the vertical position of the unmanned aerial vehicle in the n-th time slot;

the three-dimensional area in which the unmanned aerial vehicle communicates with the terminals with the assistance of the intelligent reflecting surface is uniformly divided into a plurality of cells, and the horizontal coordinate of the center of the i-th cell is

wherein

refers to the set of horizontal center coordinates of all cells;

let

and

be the horizontal take-off and landing positions of the unmanned aerial vehicle, which are set in advance;

S102, the propulsion energy of the rotary-wing unmanned aerial vehicle is:

in the above formula,

is the horizontal flight speed of the unmanned aerial vehicle; P₀ is the constant blade profile power; P₁ is the hovering induced power; P₂ is the constant descending or ascending power; U_tip is the rotor blade tip speed; v₀ is the mean rotor induced velocity in hover; d₀ is the fuselage drag ratio; S is the rotor solidity; ρ is the air density; G is the rotor disc area;

S103, according to the number of reflecting units M_c × M_r of each uniform planar array on the intelligent reflecting surface, and the column spacing of d_c meters and row spacing of d_r meters of the uniform planar array, calculating the channel gain between the unmanned aerial vehicle and the intelligent reflecting surface in the n-th time slot:

in the above formula, ξ is the channel loss at a distance of 1 meter;

is the distance between the intelligent reflecting surface and the unmanned aerial vehicle in the n-th time slot; z_R and w_R respectively represent the position of the first element of the intelligent reflecting surface in the vertical and horizontal directions; λ refers to the carrier wavelength;

and

respectively denote the cosine and sine of the horizontal angle of arrival of the signal at the intelligent reflecting surface;

is the sine of the vertical angle of arrival of the signal at the intelligent reflecting surface;

S104, similarly, calculating the channel gain between the k-th terminal and the intelligent reflecting surface:

and

respectively denote the cosine and sine of the horizontal angle of departure of the signal from the k-th terminal,

refers to the sine of the vertical angle of departure of the signal from the k-th terminal; further, the overall channel gain of the k-th terminal is expressed as

In the formula,

is the reflection phase coefficient matrix of the intelligent reflecting surface, and

S105, calculating the blocking probability of the link between the unmanned aerial vehicle and the k-th ground terminal in the n-th time slot

In the formula,

a and b are parameters that vary with the communication environment; further, the average achievable channel gain of the k-th terminal is expressed as

and the channel rate is

In the formula, P is the fixed transmit power of the unmanned aerial vehicle, B is the bandwidth, σ is the noise variable, and c_k,n ∈ {0, 1} represents whether or not the k-th terminal is scheduled;

in S2, the information L, H and T of the unmanned aerial vehicle, the information Θ of the intelligent reflecting surface and the information C of the ground terminals in the current area are collected and imported into the communication model; wherein

indicates the set of horizontal positions of the unmanned aerial vehicle,

indicates the set of vertical positions of the unmanned aerial vehicle,

indicates the duration of each flight time slot of the unmanned aerial vehicle,

indicates the ground terminal scheduling scheme;

in S3, a deep reinforcement learning network is established, and an experience replay buffer F and the number of time slots N are initialized; the parameters θ^π and θ^π′ of the initial and target policy networks π are initialized so that θ^π′ = θ^π; the parameters θ^Q and θ^Q′ of the initial and target Q networks are initialized so that θ^Q′ = θ^Q;

in S5, a behavior is randomly selected as

and executed, where in the formula

is a random process, and π(s(n)|θ^π) denotes the policy selected in state s(n) under the network parameters θ^π;

in S6, if the unmanned aerial vehicle flies beyond the boundary or its speed exceeds the upper limit, a penalty is applied and the behavior a(n) is canceled;

in S7, the intelligent reflecting surface phase shift parameters are applied and the behavior a(n) is executed to obtain the state s(n+1) and the reward r(s(n), a(n));

in S8, the sample (s(n), a(n), r(·), s(n+1)) is stored in the experience replay buffer F;

in S10, a random mini-batch of samples (s(j), a(j), r(j), s(j+1)) is selected from the M samples in the experience replay buffer F, where s(j) and s(j+1) refer to the states at times j and j+1, respectively, a(j) refers to the behavior at time j, and r(j) refers to the reward at time j; the target value y(j) = r(j) + γQ′(s(j+1), π′(s(j+1)|θ^π′)|θ^Q′) is calculated, where γ ∈ (0, 1] is the discount factor and Q′(s(j+1), π′(s(j+1)|θ^π′)|θ^Q′) is the Q value output by the target network Q′(·) given the state, the policy and the network parameters at time j+1;

in S11, by minimizing the loss function