CN114222251A

CN114222251A - Adaptive network formation and trajectory optimization method for multiple unmanned aerial vehicles

Info

Publication number: CN114222251A
Application number: CN202111439489.9A
Authority: CN
Inventors: 龚世民; 王猛; 王海东; 龙钰斯
Original assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Current assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-03-22
Anticipated expiration: 2041-11-30

Abstract

The invention discloses an adaptive network formation and trajectory optimization method for multiple unmanned aerial vehicles (UAVs), comprising the following steps: using a heuristic method, adjusting the network formation according to the energy consumption and data buffer state of the UAVs to obtain an adaptive network formation strategy; and using a multi-agent reinforcement learning method, jointly optimizing the trajectories of the multiple UAVs in combination with the adaptive network formation strategy. The method cooperatively optimizes the trajectory strategy and the network formation strategy, makes full use of the advantages of a cooperative multi-UAV network, and addresses the difficulty of transmitting data from ground user equipment. The method can be widely applied in the field of wireless communication.

Description

Adaptive network formation and trajectory optimization method for multiple unmanned aerial vehicles

Technical Field

The invention relates to the field of wireless communication, and in particular to an adaptive network formation and trajectory optimization method for multiple unmanned aerial vehicles.

Background

UAV-assisted wireless communication networks are considered a promising solution to many of the problems that arise as the Internet of Things (IoT) develops. Deploying flying UAVs in a wireless network to assist IoT users with data transmission can overcome obstacles to scaling up the IoT, such as limited energy supply, remote locations and non-line-of-sight obstructions. However, existing work on single-UAV and multi-UAV assisted systems mainly plans UAV flight paths or treats UAV control separately, and does not consider the coupling between the UAV paths and the network connections among multiple UAVs.

Disclosure of Invention

The invention aims to provide an adaptive network formation and trajectory optimization method for multiple UAVs that cooperatively optimizes the trajectory strategy and the network formation strategy, makes full use of the advantages of a cooperative multi-UAV network, and addresses the difficulty of transmitting data from ground user equipment.

The first technical solution adopted by the invention is an adaptive network formation and trajectory optimization method for multiple UAVs, comprising the following steps:

using a heuristic method, adjusting the network formation according to the energy consumption and data buffer state of the UAVs to obtain an adaptive network formation strategy;

and using a multi-agent reinforcement learning method, jointly optimizing the trajectories of the multiple UAVs in combination with the adaptive network formation strategy.

Further, the step of adjusting the network formation according to the energy consumption and data buffer state of the UAVs using a heuristic method to obtain the adaptive network formation strategy specifically includes:

in each transmission sub-slot $t$, each UAV reports its current state to the base station;

the current state comprises the location $l_i(t)$, the network formation strategy $(\Phi(t), \Psi^k(t))$, the energy consumption and the data buffer information;

after collecting the state information of all UAVs, the base station adjusts the network formation matrices $(\Phi(t), \Psi^k(t))$ with the goal of balancing the UAVs' energy consumption and queue sizes;

the base station evaluates the cost function $c_j(t)$ of each UAV in each time slot $t$, and when the cost function of the $i$-th UAV keeps increasing beyond a threshold, the $i$-th UAV is allowed to connect to the nearby UAV with the minimum cost $c_j(t)$.

Further, the method also includes:

prohibiting U2U connections between certain UAVs and a given UAV when such connections are judged to be unfavorable for data collection at the base station.

Further, the step of jointly optimizing the trajectories of the multiple UAVs using the multi-agent reinforcement learning method in combination with the adaptive network formation strategy specifically includes:

for the multi-UAV system, defining the joint observation $s(t)$ of the states $s_i(t)$ of all UAVs and their actions $a_i(t)$;

in time slot $t$, the $i$-th UAV takes action $a_i(t)$ in state $s(t)$ and obtains a reward $R_i(s(t), a_i(t))$;

performing trajectory optimization according to the reward $R_i(s(t), a_i(t))$.

Further, the reward includes an energy reward $R_{i,e}(t)$, a transmission reward $R_{i,d}(t)$ and a sensing reward $R_{i,c}(t)$:

the energy reward $R_{i,e}(t)$ is defined as the negative of the energy consumed, which drives the $i$-th UAV to reduce its energy consumption in each time slot;

the transmission reward $R_{i,d}(t)$ represents the amount of data transmitted from the $i$-th UAV to the base station;

the sensing reward $R_{i,c}(t)$ represents the data transmitted back by the sensors of the IoT users within the coverage area of the $i$-th UAV.

Further, the method also includes:

performing trajectory optimization in combination with a penalty function.

Beneficial effects: the invention accounts for the coupling among multiple UAVs and seeks the best transmission performance of the wireless network by jointly planning the scheduling of all UAVs. Because adaptive network formation is considered together with the trajectories, the performance of the multi-UAV-assisted wireless network system can be greatly improved, making the system more flexible and widening its range of application scenarios.

Drawings

Fig. 1 is a block diagram of a multi-UAV-assisted wireless network system according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the adaptive network formation and trajectory optimization method for multiple UAVs according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.

Referring to Fig. 1, the multi-UAV-assisted wireless network system has one base station (BS) and a plurality of UAVs. One set denotes the fleet of drones (indexed by $i$), and another set denotes the sensors or IoT users deployed on the ground (indexed by $m$), which lie beyond the direct communication range of the base station. Multiple drones are deployed to fly over a designated area and collect the users' sensing data for the BS. Each UAV may connect directly to the BS or relay its information back to the BS through other UAVs. Each drone is assumed to be equipped with an antenna and to support direct UAV-to-UAV (U2U) communication. Allocating channels on different links between the drones forms different network topologies; this is the adaptive network formation referred to in this scheme, and it can potentially reduce the overall transmission delay and energy consumption of multi-hop relay transmission. Furthermore, as each drone optimizes and follows its own trajectory, the network structure formed by the drones also changes over time. This scheme therefore optimizes the network formation and the drone trajectories jointly.

Referring to Fig. 2, the invention provides an adaptive network formation and trajectory optimization method for multiple drones, comprising the following steps:

S1. Using a heuristic method, adjust the network formation according to the energy consumption and buffer state of the drones to obtain an adaptive network formation strategy;

Given the trajectories of the multiple drones, adapting the drone topology through network formation makes the problem a nonlinear integer programming problem. Although it can be solved with existing branch-and-bound methods, its computational complexity is very high because the data buffers and energy consumption of the drones and IoT users evolve dynamically across time slots. A simple heuristic, the energy- and delay-aware network formation (EDA-NF) algorithm, is therefore proposed to adjust the network formation according to the energy consumption and data buffer state of the drones. The basic idea of EDA-NF is to balance the energy consumption and data queue sizes of the different drones. Specifically, in each transmission sub-slot $t$, the $i$-th UAV (UAV-i) reports its current state to the BS, including its current location $l_i(t)$, the network formation strategy $(\Phi(t), \Psi^k(t))$, its energy consumption and its data buffer information. After collecting the status of all drones, the BS adjusts the network formation strategy $(\Phi(t), \Psi^k(t))$ to balance the drones' energy consumption and queue sizes. In each time slot $t$, the BS evaluates the cost function $c_j(t)$ of each UAV. When the cost function of UAV-i keeps increasing beyond a certain threshold, UAV-i attempts to open a U2U channel to the neighboring drone with the minimum cost $c_j(t)$ (or to send directly to the BS). Meanwhile, the BS can forbid other drones from establishing U2U connections to UAV-i. In this way the information transmission capability of the drones can be greatly improved and the transmission delay reduced.
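The EDA-NF rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not define $c_j(t)$, so `uav_cost` is a hypothetical weighted sum of energy consumption and buffered data; "keeps increasing" is interpreted as a strict rise over the last `patience` slots; the topology is held in a hypothetical `parent` array (`parent[i] = j` for a U2U link, `BS` for direct U2B); and drones cut off by the BS are assumed to fall back to U2B.

```python
import numpy as np

BS = -1  # hypothetical marker: the UAV offloads directly to the base station (U2B)


def uav_cost(energy, buffer, w_e=1.0, w_d=1.0):
    """Hypothetical cost c_j(t): weighted energy consumption plus buffered data."""
    return w_e * energy + w_d * buffer


def eda_nf_step(parent, cost_hist, pos, threshold, comm_range, patience=2):
    """One illustrative EDA-NF update performed at the BS.

    parent:    (N,) int array; parent[i] = j for a U2U link i -> j, or BS.
    cost_hist: (T, N) array of costs c_j(t); needs at least patience + 1 rows,
               and the last row is the current slot.
    pos:       (N, 2) reported UAV positions l_i(t).
    """
    parent = parent.copy()
    cost = cost_hist[-1]
    # "Keeps increasing": strictly rising over the last `patience` slots.
    rising = np.all(np.diff(cost_hist[-(patience + 1):], axis=0) > 0, axis=0)
    overloaded = rising & (cost > threshold)

    # The BS forbids U2U links into overloaded UAVs; their children fall back to U2B.
    for j in range(len(parent)):
        if parent[j] != BS and overloaded[parent[j]]:
            parent[j] = BS

    # Each overloaded UAV relays through the in-range, non-overloaded neighbour
    # with the smallest cost; if none exists it keeps its current link.
    for i in np.flatnonzero(overloaded):
        dist = np.linalg.norm(pos - pos[i], axis=1)
        candidates = [j for j in range(len(parent))
                      if j != i and not overloaded[j] and dist[j] <= comm_range]
        if candidates:
            parent[i] = min(candidates, key=lambda j: cost[j])
    return parent
```

Because overloaded drones are never chosen as relays and lose all incoming links, this update adds no cycles to a relay topology that was acyclic before the step.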

S2. Using a multi-agent deep reinforcement learning (MADRL) method, jointly optimize the trajectories of the multiple drones in combination with the adaptive network formation strategy.

Given the network formation policy $(\Phi(t), \Psi^k(t))$, the remaining task is to update the drone trajectories for the remaining period. Because of its dynamic nature, trajectory optimization is very complex, so this scheme reformulates path planning as a Markov decision process (MDP) and approximates its solution with a model-free deep reinforcement learning (DRL) method. The MDP is characterized by a tuple that includes the state space, the action space and the reward $R$, a function of the state-action pair $(s_t, a_t)$. For the multi-drone system, the joint observation of all UAV states is denoted by $s(t) = (s_1(t), s_2(t), \ldots, s_N(t))$; similarly, the joint action is $a(t) = (a_1(t), a_2(t), \ldots, a_N(t))$. The state $s_i(t)$ of each drone includes its position $l_i(t)$, the network formation strategy $(\Phi(t), \Psi^k(t))$, its energy state $E_i(t)$ and its buffer size $D_i(t)$. The action $a_i(t)$ of each drone includes its flight direction $d_i(t)$ and velocity $v_i(t)$. When UAV-i takes action $a_i(t)$ in state $s(t)$ in time slot $t$, it obtains its own reward $R_i(s(t), a_i(t))$. In a multi-drone system, the reward of UAV-i also depends on the actions of the other drones, denoted $a_{-i}(t)$. The reward of UAV-i consists of three parts: the energy reward $R_{i,e}(t)$, the transmission reward $R_{i,d}(t)$ and the sensing reward $R_{i,c}(t)$.

The energy reward $R_{i,e}(t)$ is defined as the negative of the energy consumed by UAV-i, which forces UAV-i to reduce its energy consumption in each slot. To reduce transmission delay, each drone is rewarded for forwarding as much data as possible: the transmission reward $R_{i,d}(t)$ is the amount of data transmitted from UAV-i to the BS or to a relay drone. The sensing reward $R_{i,c}(t)$ is determined by the data transmitted back from the sensors of the IoT users within the coverage area of UAV-i. These reward definitions are then used to approximate the original design objective. In addition, a penalty term $R_{i,p}(t)$ is needed to ensure a minimum safe distance between UAV-i and the other drones: if the constraint $\|l_i(t) - l_j(t)\| \ge d_{\min}$ does not hold, $R_{i,p}(t)$ is simply assigned a large penalty value.
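A per-UAV reward along these lines might be assembled as below. The text does not say how the parts are combined, so the weighted sum, the weights `w` and the `penalty` magnitude are assumptions; the penalty is applied as a negative term whenever any other drone is closer than $d_{\min}$.

```python
import numpy as np


def uav_reward(i, energy_used, data_out, data_sensed, pos, d_min,
               w=(1.0, 1.0, 1.0), penalty=100.0):
    """Illustrative reward R_i for UAV-i in one slot (weighted sum is an assumption).

    energy_used, data_out, data_sensed: (N,) per-UAV quantities for the slot.
    pos: (N, 2) UAV positions l_i(t).
    """
    r_e = -energy_used[i]   # energy reward R_{i,e}: negative energy consumed
    r_d = data_out[i]       # transmission reward R_{i,d}: data sent to BS or relay
    r_c = data_sensed[i]    # sensing reward R_{i,c}: data from covered IoT users
    gaps = np.linalg.norm(pos - pos[i], axis=1)
    gaps[i] = np.inf        # ignore UAV-i's distance to itself
    r_p = -penalty if np.any(gaps < d_min) else 0.0  # safety penalty R_{i,p}
    return w[0] * r_e + w[1] * r_d + w[2] * r_c + r_p
```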

Given the network formation policy $(\Phi(t), \Psi^k(t))$, each drone searches for the optimal flight direction $d_i(t)$ and moving speed $v_i(t)$ based on its local observations to update its trajectory. Because the system contains multiple drones, the observation of each drone depends not only on its own action but also on the actions of the other drones. The drone trajectories can therefore be learned with the multi-agent deep deterministic policy gradient (MADDPG) algorithm from multi-agent deep reinforcement learning. Combined with the EDA-NF algorithm, training proceeds as follows: in an offline training stage, the BS collects the state updates of all drones and trains the Critic and Actor networks of the drones simultaneously in a centralized manner. After offline training, the Critic and Actor networks can be distributed to the individual drones to guide each drone's decisions in a decentralized manner.
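The centralized-training, decentralized-execution structure described above can be illustrated with a deliberately tiny MADDPG-style update. To stay short and self-contained, this sketch swaps the deep Actor and Critic networks for hypothetical linear models (a `tanh`-linear actor on each drone's local observation and a linear critic on the joint observations and actions); `LinearAgent`, `update_agent` and all hyperparameters are illustrative assumptions. It shows the data flow of one update, not the training code of the scheme.

```python
import numpy as np


class LinearAgent:
    """Hypothetical stand-in for one drone's Actor/Critic pair (linear models)."""

    def __init__(self, obs_dim, act_dim, joint_dim, rng):
        self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))  # actor weights
        self.w = rng.normal(scale=0.1, size=joint_dim)           # critic weights
        self.W_targ, self.w_targ = self.W.copy(), self.w.copy()  # target networks

    def act(self, obs, target=False):
        """Decentralized actor: local observation (B, obs_dim) -> action in [-1, 1]."""
        return np.tanh(obs @ (self.W_targ if target else self.W).T)

    def q(self, joint, target=False):
        """Centralized critic on the joint observations and actions (B, joint_dim)."""
        return joint @ (self.w_targ if target else self.w)


def joint_input(obs, acts):
    return np.concatenate(obs + acts, axis=1)  # [o_1..o_N, a_1..a_N]


def td_target(agents, i, rew, next_obs, gamma):
    next_acts = [ag.act(o, target=True) for ag, o in zip(agents, next_obs)]
    return rew + gamma * agents[i].q(joint_input(next_obs, next_acts), target=True)


def update_agent(agents, i, obs, acts, rew, next_obs,
                 gamma=0.95, lr_c=0.01, lr_a=0.01, tau=0.01):
    """One MADDPG-style update of agent i on a batch (lists hold one array per UAV)."""
    ag, batch = agents[i], len(rew)
    # Critic: one gradient step on the mean squared TD error.
    y = td_target(agents, i, rew, next_obs, gamma)
    X = joint_input(obs, acts)
    ag.w -= lr_c * 2.0 * X.T @ (X @ ag.w - y) / batch
    # Actor: ascend Q w.r.t. its own action a_i = tanh(W o_i). For a linear critic,
    # dQ/da_i is the slice of w that multiplies a_i in the joint input.
    a_i = ag.act(obs[i])
    start = sum(o.shape[1] for o in obs) + sum(a.shape[1] for a in acts[:i])
    w_a = ag.w[start:start + a_i.shape[1]]
    ag.W += lr_a * ((w_a * (1.0 - a_i ** 2)).T @ obs[i]) / batch
    # Soft update of the target networks.
    ag.W_targ = tau * ag.W + (1 - tau) * ag.W_targ
    ag.w_targ = tau * ag.w + (1 - tau) * ag.w_targ
```

At execution time each drone only needs its own `act`, which is what makes execution decentralized; the critic, which sees every drone's observation and action, is used only during training.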

Following the trajectories learned by the MADDPG algorithm, each UAV receives IoT users' data along its trajectory and forwards it to the BS or to other drones in the next slot. Once the BS receives the data or status updates forwarded by the drones, it evaluates the cost function $c_j(t)$ of each drone. This result can be used to initialize the network formation policy of the drones, as shown in lines 8-10 of Algorithm 1. The network formation policy matrices $(\Phi(t), \Psi^k(t))$ serve as input to the MADDPG algorithm, which outputs the drone trajectories through training.

This scheme is mainly based on the following UAV communication principles:

1) Network formation and sub-channel allocation

We consider a slotted frame structure. In each time slot $t$, each UAV may fly to a location, receive data from IoT users, buffer the data, and then offload it to the BS. Each drone is assumed to have a maximum buffer capacity $D_{\max}$ for data caching. Status information of the drone (e.g., its location, data buffer size and network status) can also be reported to the BS during the offloading phase. The drone channels are described as follows:

IoT user-to-UAV (I2U communication): each drone uses the I2U channel to collect sensing data from the IoT devices within its signal coverage. A direct channel from the IoT devices to the BS is assumed to be unavailable, so the drones collect data from the ground sensors along their planned trajectories.

UAV-to-BS (U2B communication): each drone may report its data to the BS over the U2B channel. We assume that the U2B transmission relies on a dedicated cellular channel shared by all UAVs. The data rate on the U2B channel depends on the drone's location and channel conditions.

UAV-to-UAV (U2U communication): drones far away from the BS are allowed to connect with nearby drones through U2U channels. Through multi-hop relaying, the sensing data of all IoT users can be forwarded to the base station. The network formation of the drones therefore also affects the overall delay performance.

Two sets denote the drones that forward sensing data over the U2B channel and over the U2U channel in slot $t$, respectively. Together they contain all drones, meaning that each UAV is connected either to the BS or to other UAVs. For drones far away from the base station, the direct link may have a low signal-to-noise ratio (SNR) and a large transmission delay, so continuing with this strategy may lead to longer hovering and higher energy consumption. In this case, the drone may instead switch to the U2U channel and connect with other drones in the network.

Given the limited channel resources of cellular systems, all drones are assumed to share a set of orthogonal sub-channels, indexed by $k$. Let the binary matrix $\Phi(t)$ denote the U2B sub-channel allocation strategy, in which an entry equal to 1 indicates that UAV-i uses the $k$-th sub-channel of the U2B channel to offload its data. Similarly, define the binary matrix $\Psi^k(t)$ as the U2U sub-channel allocation strategy, in which an entry equal to 1 represents a U2U connection between UAV-i and UAV-j on the $k$-th sub-channel. The sub-channel allocation is subject to resource constraints on these matrices.

For each sub-channel $k$, the path planning algorithm of this scheme adjusts the two matrices $(\Phi(t), \Psi^k(t))$ to determine the drone network formation in each time slot $t$.
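The two allocation matrices can be held as binary arrays. The layout below (`phi[i, k]` for U2B, `psi[k, i, j]` for U2U) and the checks in `check_formation` are assumptions, since the resource constraints themselves are not reproduced here: every UAV uses exactly one of U2B or U2U, no UAV relays to itself, and each orthogonal sub-channel carries at most one U2B link.

```python
import numpy as np


def link_sets(phi, psi):
    """Return the sets of UAVs transmitting over U2B and over U2U.

    phi: (N, K) binary, phi[i, k] = 1 if UAV-i offloads to the BS on sub-channel k.
    psi: (K, N, N) binary, psi[k, i, j] = 1 if UAV-i relays to UAV-j on sub-channel k.
    """
    u2b = {int(i) for i in np.flatnonzero(phi.any(axis=1))}
    u2u = {int(i) for i in np.flatnonzero(psi.any(axis=(0, 2)))}
    return u2b, u2u


def check_formation(phi, psi):
    """Illustrative feasibility check under the assumed constraints."""
    n_uav = phi.shape[0]
    u2b, u2u = link_sets(phi, psi)
    if u2b & u2u or (u2b | u2u) != set(range(n_uav)):
        return False                            # each UAV: exactly one of U2B / U2U
    if np.diagonal(psi, axis1=1, axis2=2).any():
        return False                            # no self-links
    return bool((phi.sum(axis=0) <= 1).all())   # at most one U2B link per sub-channel
```

For example, with three UAVs and two sub-channels, UAV-0 and UAV-2 offloading on sub-channels 0 and 1 and UAV-1 relaying to UAV-0 forms a valid topology under these assumptions.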

2) Channel models for I2U, U2U and U2B

All UAVs are assumed to fly at a fixed height $H$ while collecting sensing data from the IoT users; the problem formulation and solution can be extended to the case where the flight height varies over time. The trajectory $L_i$ of each UAV-i can be defined as the set of its locations $l_i(t)$ in the different time slots.

Each position is specified by two-dimensional coordinates, i.e., $l_i(t) = (x_i(t), y_i(t))$. The BS is fixed at the coordinate origin. Suppose UAV-i moves in direction $d_i(t)$ with a limited velocity $v_i(t) \le v_{\max}$. The position of UAV-i in the next time slot $t+1$ is then $l_i(t+1) = l_i(t) + v_i(t) d_i(t)$. The distance between UAV-i and UAV-j is

$$d_{i,j}(t) = \|l_i(t) - l_j(t)\|.$$
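The motion model and the pairwise distance $d_{i,j}(t)$ can be sketched as below. Parameterizing the direction $d_i(t)$ by a heading angle and including a flight duration `dt` (defaulting to 1, which recovers $l_i(t+1) = l_i(t) + v_i(t) d_i(t)$ exactly) are assumptions; the speed is clipped to $v_{\max}$.

```python
import numpy as np


def step_positions(pos, speed, heading, v_max, dt=1.0):
    """l_i(t+1) = l_i(t) + v_i(t) d_i(t) * dt, with v_i(t) clipped to [0, v_max].

    pos: (N, 2) positions; speed: (N,); heading: (N,) angles in radians
    (the angle parameterization of the unit vector d_i(t) is an assumption).
    """
    v = np.clip(speed, 0.0, v_max)
    d = np.stack([np.cos(heading), np.sin(heading)], axis=1)  # unit vectors d_i(t)
    return pos + (v * dt)[:, None] * d


def pairwise_distances(pos):
    """d_{i,j}(t) = ||l_i(t) - l_j(t)|| for all pairs; returns an (N, N) array."""
    return np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
```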

Letting $H_b$ denote the height of the BS antenna, the distance between UAV-i and the BS is

$$d_{i,0}(t) = \sqrt{\|l_i(t)\|^2 + (H - H_b)^2}.$$

Given the position of IoT device $m$ on the ground, its distance to UAV-i is obtained in the same way from their horizontal separation and the flight height $H$.
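With the BS at the origin, the UAVs at height $H$ and the BS antenna at $H_b$, the three-dimensional distances follow directly from the horizontal positions. The sketch assumes the IoT devices sit at ground level (height 0); the function and array names are illustrative.

```python
import numpy as np


def dist_to_bs(pos, H, H_b):
    """d_{i,0}: distance from each UAV (at height H) to the BS antenna at the origin."""
    return np.sqrt(np.sum(pos ** 2, axis=1) + (H - H_b) ** 2)


def dist_to_users(pos, users, H):
    """(N, M) distances from each UAV to each ground IoT device (height 0)."""
    horiz = np.linalg.norm(pos[:, None, :] - users[None, :, :], axis=-1)
    return np.sqrt(horiz ** 2 + H ** 2)
```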

The UAVs and the BS typically communicate over line-of-sight links, so the U2U and U2B channels use a simplified exponential channel fading model. When UAV-i transmits information to UAV-j on sub-channel $k$, the received power at UAV-j on that sub-channel is determined by the transmit power of UAV-i on the $k$-th sub-channel, a constant power gain $\beta_{i,j}$ caused by the transceiver amplifiers and antennas, and a path loss that depends on the distance between the transceivers, where $\alpha_u$ denotes the path-loss constant. If another UAV-m ($m \neq i$) also transmits on the same sub-channel $k$, its received power at UAV-j contributes to the interference power of UAV-j. The transmission data rate from UAV-i to UAV-j over all sub-channels is then determined by the received signal power, the interference power and the noise power on each sub-channel $k$. The U2B data rate can be defined similarly. Each drone collects sensing data while flying over the ground IoT devices, so I2U communication can also be treated as line-of-sight transmission, and the I2U channel can therefore be characterized approximately in the same way as the U2U and U2B channels.
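The following sketch computes a U2U rate under one common reading of the model above. The exact expressions are not reproduced in this text, so treat them as assumptions: received power $p_i^k \beta_{i,j} d_{i,j}^{-\alpha_u}$ with a hypothetical per-sub-channel transmit power $p_i^k$, interference summed over the other transmitters on sub-channel $k$, and a rate of $B \log_2(1 + \mathrm{SINR})$ summed over the sub-channels UAV-i uses, with a hypothetical bandwidth $B$ per sub-channel. The receiver UAV-j is excluded from the interferers on the assumption of half-duplex operation.

```python
import numpy as np


def u2u_rate(i, j, pos, power, beta, alpha_u, noise, bandwidth=1.0):
    """Assumed U2U rate from UAV-i to UAV-j summed over sub-channels.

    pos:   (N, 2) positions (all UAVs share height H, so these distances are 3D).
    power: (N, K) transmit power of each UAV on each sub-channel (0 = idle).
    beta:  (N, N) constant gains beta_{i,j}; noise: (K,) noise power per sub-channel.
    """
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    n_uav, n_sub = power.shape
    rate = 0.0
    for k in range(n_sub):
        if power[i, k] == 0:
            continue                                   # UAV-i does not use sub-channel k
        signal = power[i, k] * beta[i, j] * d[i, j] ** (-alpha_u)
        interference = sum(power[m, k] * beta[m, j] * d[m, j] ** (-alpha_u)
                           for m in range(n_uav) if m not in (i, j))
        rate += bandwidth * np.log2(1.0 + signal / (interference + noise[k]))
    return rate
```

For example, two UAVs 10 m apart with unit power and gain, $\alpha_u = 2$ and noise power 0.01 give an SINR of 1 and hence a rate of 1 bit/s/Hz per unit bandwidth.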

The scheme models the problem as follows. For each UAV-i, a time slot is further divided into sensing, transmission and flight sub-slots, denoted by $t_{i,s}$, $t_{i,o}$ and $t_{i,f}$, respectively. The data $s_i(t)$ received by UAV-i during sensing depends on its coverage and on the I2U transmission rate. Let $W_m(t)$ denote the data remaining at IoT user $m$. The data queue of IoT user $m$ is updated as

$$W_m(t+1) = \Big[W_m(t) - \sum_{i} x_{i,m}(t)\, s_{i,m}(t)\Big]^{+},$$

where $[X]^{+} = \max\{0, X\}$, $x_{i,m}(t) \in \{0,1\}$ indicates whether IoT user $m$ communicates with UAV-i, and $s_{i,m}(t) \le D_m$ is the amount of sensing data collected from user $m$ by UAV-i. Summing $s_{i,m}(t)$ over the set of users under the coverage of UAV-i gives $s_i(t)$, and the buffer dynamics of the UAV are modeled in terms of its output data

$$O_i(t) = o_{i,0}(t) + \sum_{j \neq i} o_{i,j}(t),$$

whose first term $o_{i,0}(t)$ is the data transmitted to the BS and whose second term is the data sent to other drones. The updated buffer $D_i(t+1)$ accounts for the current buffer, the output $O_i(t)$ and the newly sensed data $s_i(t)$, and its third term is the data received from other drones.
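The two queue updates can be sketched together. The IoT-user update follows the $[\cdot]^{+}$ form above; the UAV buffer update uses an assumed three-term form, $D_i(t+1) = [D_i(t) - O_i(t)]^{+} + s_i(t) + \sum_{j} o_{j,i}(t)$, which is consistent with the description (output subtracted, sensed data added, relayed data as the third term) but is not confirmed by it. The array layouts are illustrative.

```python
import numpy as np


def update_user_queues(W, x, s):
    """W_m(t+1) = [W_m(t) - sum_i x_{i,m}(t) s_{i,m}(t)]^+.

    W: (M,) remaining data per IoT user; x: (N, M) binary association;
    s: (N, M) amount collected by UAV-i from user m.
    """
    return np.maximum(0.0, W - (x * s).sum(axis=0))


def update_uav_buffers(D, sensed, out_bs, out_uav):
    """Assumed buffer update D_i(t+1) = [D_i(t) - O_i(t)]^+ + s_i(t) + sum_j o_{j,i}(t).

    out_bs: (N,) data o_{i,0}(t) sent to the BS; out_uav[i, j]: data UAV-i sends UAV-j.
    """
    O = out_bs + out_uav.sum(axis=1)      # O_i(t) = o_{i,0}(t) + sum_j o_{i,j}(t)
    received = out_uav.sum(axis=0)        # data relayed into UAV-i by other drones
    return np.maximum(0.0, D - O) + sensed + received
```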

Given a task completion time $T$, the total energy consumption of the drones can be expressed in terms of the energy they consume on the different sub-channels over the $T$ slots, and the problem of minimizing this total energy is then formulated as an optimization problem.

The solution optimizes the network formation strategy $(\Phi(t), \Psi^k(t))$ under the constraints above, together with a binary matrix $x(t)$ that specifies the I2U connection policy in each time slot. All these matrix variables should be optimized jointly with the trajectories $L_i$ of the drones. The total number of slots $T$ needed to finish offloading all user data is also optimized. The problem can be simplified by fixing the sensing strategy: given the drone positions, the I2U association matrix $x(t)$ can be determined. The constraints $D_i(t) \le D_{\max}$, $D_i(T) = 0$, $W_m(0) = D_m$ and $W_m(T) = 0$ ensure that the sensing data of all IoT users are successfully offloaded to the BS after $T$ slots. The inequalities $\|l_i(t+1) - l_i(t)\| \le v_{\max} t_{i,f}$ and $\|l_i(t) - l_j(t)\| \ge d_{\min}$ limit the flight speed of each drone and the spacing between drones. In practice, the transmission power $p_i$ of the drones is much smaller than the power they consume for hovering and flying, so it can be omitted from the optimization problem.
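Checking a candidate solution against the constraints listed above is straightforward. The sketch below assumes trajectories, buffers and user queues are stored as arrays over $T+1$ time instants (the layout and the tolerance `tol` are illustrative); `t_f` may be a scalar or a per-UAV array of flight sub-slot durations, and the initial condition $W_m(0) = D_m$ is taken as given rather than checked.

```python
import numpy as np


def is_feasible(traj, D, W, D_max, v_max, t_f, d_min, tol=1e-9):
    """Check the constraints of the energy-minimization problem.

    traj: (T+1, N, 2) UAV positions; D: (T+1, N) UAV buffers; W: (T+1, M) user queues.
    """
    if np.any(D > D_max + tol):
        return False                                 # D_i(t) <= D_max
    if not (np.allclose(D[-1], 0.0) and np.allclose(W[-1], 0.0)):
        return False                                 # D_i(T) = 0 and W_m(T) = 0
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=-1)
    if np.any(steps > v_max * t_f + tol):
        return False                                 # ||l_i(t+1) - l_i(t)|| <= v_max t_f
    off_diag = ~np.eye(traj.shape[1], dtype=bool)
    for frame in traj:
        gaps = np.linalg.norm(frame[:, None, :] - frame[None, :, :], axis=-1)
        if np.any(gaps[off_diag] < d_min):
            return False                             # ||l_i(t) - l_j(t)|| >= d_min
    return True
```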

The above describes the modeling details, complexity and simplification of the problem; the two algorithms proposed in this scheme, iterated with each other, handle the problem with excellent performance.

This scheme adds adaptive network formation to multi-UAV path planning and provides a two-stage algorithm that iterates between adaptive network formation and trajectory optimization, so that the two cooperate. The adaptive network formation is based on the heuristic EDA-NF algorithm, which balances the UAVs' energy consumption and data buffer queue sizes. Compared with traditional strategies, the algorithm designed in this scheme has low computational complexity and is more efficient. Combining trajectory optimization with network formation also makes the system more flexible, allowing multiple UAVs to adaptively form a new network structure as their positions change. Finally, the algorithm optimizes both energy consumption and delay, reducing the energy consumption and latency of the system and making its practical use possible.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An adaptive network formation and trajectory optimization method for multiple unmanned aerial vehicles, characterized by comprising the following steps:

using a heuristic method, adjusting the network formation according to the energy consumption and data buffer state of the UAVs to obtain an adaptive network formation strategy;

2. The method of claim 1, wherein the step of adjusting the network formation according to the energy consumption and data buffer state of the UAVs using a heuristic method to obtain the adaptive network formation strategy comprises:

the current state includes the location $l_i(t)$, the network formation strategy $(\Phi(t), \Psi^k(t))$, the energy consumption and the data buffer information;

after collecting the state information of all UAVs, the base station adjusts the network formation strategy $(\Phi(t), \Psi^k(t))$ with the goal of balancing the UAVs' energy consumption and queue sizes;

the base station evaluates the cost function of each UAV in each time slot $t$, and when the cost function of the $i$-th UAV keeps increasing beyond a threshold, the $i$-th UAV is allowed to connect to the nearby UAV with the minimum cost $c_j(t)$.

3. The method of claim 2, further comprising:

prohibiting U2U connections between certain UAVs and a given UAV when such connections are judged to be unfavorable for data collection at the base station.

4. The method of claim 3, wherein the step of jointly optimizing the trajectories of the multiple UAVs using the multi-agent reinforcement learning method in combination with the adaptive network formation strategy specifically includes:

performing trajectory optimization according to the reward $R_i(s(t), a_i(t))$.

5. The method of claim 4, wherein the reward comprises an energy reward $R_{i,e}(t)$, a transmission reward $R_{i,d}(t)$ and a sensing reward $R_{i,c}(t)$.

6. The method of claim 5, further comprising:

performing trajectory optimization in combination with a penalty function.