CN113268077B - Unmanned aerial vehicle energy consumption minimization design method and device - Google Patents


Info

Publication number
CN113268077B
CN113268077B (application CN202110397120.XA)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
information
energy
flight
Prior art date
Legal status
Active
Application number
CN202110397120.XA
Other languages
Chinese (zh)
Other versions
CN113268077A (en)
Inventor
张煜
熊轲
吴鹏
单葆国
谭显东
唐伟
王成洁
谭清坤
刘小聪
贾跃龙
马捷
张玉琢
吴姗姗
张成龙
王向
张莉莉
刘青
姚力
汲国强
Current Assignee
Beijing Jiaotong University
State Grid Energy Research Institute Co Ltd
Original Assignee
Beijing Jiaotong University
State Grid Energy Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University, State Grid Energy Research Institute Co Ltd filed Critical Beijing Jiaotong University
Priority to CN202110397120.XA priority Critical patent/CN113268077B/en
Publication of CN113268077A publication Critical patent/CN113268077A/en
Application granted granted Critical
Publication of CN113268077B publication Critical patent/CN113268077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106: Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a method and device for minimizing the energy consumption of an unmanned aerial vehicle (UAV). The method comprises the following steps: calculating the channel gain between the UAV and each sensor node; calculating, from that channel gain, the energy consumed by sensor node communication; calculating, from that channel gain, the energy harvested by each sensor node; calculating the energy consumed by the UAV; calculating the age of information (AoI) of the sensor data from the energy consumed by sensor node communication and the remaining battery capacity of the sensor nodes; establishing an optimization problem model that minimizes UAV energy consumption by jointly optimizing the UAV's flight trajectory, flight time, and information acquisition and energy harvesting strategies; establishing a DQN-based UAV control framework; and planning the UAV flight strategy with the DQN according to that framework.

Description

Unmanned aerial vehicle energy consumption minimization design method and device
Technical Field
The invention relates to the technical field of network optimization design, and in particular to a method and device for minimizing the energy consumption of an unmanned aerial vehicle.
Background
With the rapid deployment of 5G, applications such as virtual reality (VR), augmented reality (AR), autonomous driving, and smart healthcare are developing rapidly. These applications require ultra-reliable, low-latency communication and are also very sensitive to information freshness, which conventional network metrics such as throughput and delay cannot accurately characterize.
In order to accurately characterize freshness, academia has proposed the concept of age of information (AoI), defined as the time elapsed since the generation of the most recent data packet received by the destination node from the source node. AoI is one of the key performance indicators of a sensor network. To keep the information in the network fresh, the sensor nodes must frequently transmit their latest measurements, which consumes a great deal of energy; frequently replacing or recharging batteries is usually inconvenient and expensive, and even harder in harsh environments.
Wireless power transfer (WPT) technology can provide a stable power source for sensor nodes and thereby extend system run time, and there have been many studies of AoI systems based on radio-frequency (RF) energy harvesting. However, most of this work uses a linear energy harvesting model, in which the harvested energy grows linearly with the input power of the received RF signal. In practice, the nonlinearity that diodes and other elements introduce into the circuit makes the input/output relationship of the energy harvesting circuit highly nonlinear, so a practical system should adopt a nonlinear energy harvesting model.
In addition, introducing unmanned aerial vehicles (UAVs) into wireless networks to extend the useful life of sensor nodes has received widespread attention. In a wireless network, a UAV equipped with an energy transmitter can move flexibly to the vicinity of a sensor node, establish a line-of-sight (LoS) link with it, and provide a stable energy service. However, most UAVs are battery powered and have limited endurance, so energy saving has come to be regarded as one of the important metrics of future wireless network design. Much current work studies the problem of minimizing UAV energy consumption; optimizing the UAV's flight trajectory plays an important role in reducing flight energy consumption, and optimizing the UAV's acceleration can reduce it further.
During flight, the channel between the UAV and the sensor nodes varies over time, while the transfer of energy and information changes each node's stored energy and the AoI of its information. The state space therefore grows rapidly with the number of sensor nodes, making the UAV energy-consumption optimization problem difficult to solve with traditional algorithms such as dynamic programming.
In recent years, deep reinforcement learning (DRL) algorithms have attracted considerable attention in industry and academia. Compared with Markov decision process (MDP) and reinforcement learning (RL) algorithms, deep reinforcement learning can cope with huge state and action spaces and solve more complex optimization problems. It has achieved significant success in areas such as games, and has begun to be applied to optimization problems in UAV-assisted networks. However, existing systems simply discretize the UAV's flight area and cannot avoid large-angle turns during flight, so the optimization result is inconsistent with the UAV's actual flight. They also focus only on saving energy for the UAV and the sensor nodes, without introducing RF energy harvesting technology to power low-power sensor nodes.
Disclosure of Invention
The invention aims to provide a method and device for minimizing the energy consumption of an unmanned aerial vehicle, so as to solve the problems in the prior art.
The invention provides an unmanned aerial vehicle energy consumption minimization design method, which comprises the following steps:
calculating channel gains of the unmanned aerial vehicle and the sensor nodes;
According to the channel gain of the unmanned aerial vehicle and the sensor node, calculating the energy consumed by the communication of the sensor node;
according to the channel gains of the unmanned aerial vehicle and the sensor node, calculating the energy collected by the sensor node;
calculating energy consumed by the unmanned aerial vehicle;
calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor nodes and the residual capacity of the battery of the sensor nodes;
the flight track, the flight time and the information acquisition and energy collection strategies of the unmanned aerial vehicle are jointly optimized to minimize the energy consumption of the unmanned aerial vehicle, and an optimization problem model is established;
establishing an unmanned aerial vehicle control frame based on DQN;
and planning the unmanned aerial vehicle flight strategy based on the DQN according to the unmanned aerial vehicle control framework based on the DQN.
The embodiment of the invention also provides an unmanned aerial vehicle energy consumption minimization design device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above unmanned aerial vehicle energy consumption minimization design method.
By adopting the embodiment of the invention, an energy consumption minimization design is provided for a UAV-assisted wireless sensor network under an age-of-information constraint: UAV energy consumption is minimized by jointly optimizing the sensor nodes' information uploading and energy harvesting scheduling strategies together with the UAV's flight time and trajectory, under the constraint that the information freshness (AoI, age of information) requirement is met.
The foregoing is only an overview of the technical solutions of the present invention. In order that the technical means of the invention may be understood more clearly and implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the unmanned aerial vehicle energy consumption minimization design method of an embodiment of the present invention;
FIG. 2 is a system architecture diagram of the unmanned aerial vehicle energy consumption minimization design method of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the DQN-based unmanned aerial vehicle control framework of an embodiment of the present invention;
FIG. 4 is a first unmanned aerial vehicle flight trajectory diagram for energy-consumption-minimizing optimization under an information-age constraint, in accordance with an embodiment of the present invention;
FIG. 5 is a second unmanned aerial vehicle flight trajectory diagram for energy-consumption-minimizing optimization under an information-age constraint, in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of an unmanned aerial vehicle energy consumption minimization design device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an unmanned aerial vehicle energy consumption minimization design method, which designs a UAV-assisted wirelessly powered information collection network with an age-of-information constraint, taking UAV energy consumption as the design index. To solve this complex problem, a DRL algorithm is adopted.
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise. Furthermore, the terms "mounted," "connected," "coupled," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Method embodiment
According to an embodiment of the present invention, there is provided a method for designing energy consumption minimization of an unmanned aerial vehicle, and fig. 1 is a flowchart of the method for designing energy consumption minimization of an unmanned aerial vehicle according to the embodiment of the present invention, as shown in fig. 1, the method for designing energy consumption minimization of an unmanned aerial vehicle according to the embodiment of the present invention specifically includes:
Step 101, calculating channel gains of the unmanned aerial vehicle and the sensor node;
step 102, calculating the energy consumed by the communication of the sensor node according to the channel gains of the unmanned aerial vehicle and the sensor node;
step 103, calculating energy collected by the sensor node according to the channel gains of the unmanned aerial vehicle and the sensor node;
step 104, calculating energy consumed by the unmanned aerial vehicle;
step 105, calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor node and the residual capacity of the battery of the sensor node;
step 106, establishing an optimization problem model by jointly optimizing the flight trajectory, flight time and information acquisition and energy collection strategies of the unmanned aerial vehicle to minimize the energy consumption of the unmanned aerial vehicle;
step 107, establishing an unmanned aerial vehicle control frame based on DQN;
and step 108, planning the unmanned aerial vehicle flight strategy based on the DQN according to the DQN-based unmanned aerial vehicle control framework.
The above processing steps are described in detail below with reference to the drawings.
As shown in fig. 2, in the UAV-assisted wireless network scenario, the UAV takes off from a departure point; within a task time T, it transfers energy via WPT to K sensor nodes randomly distributed on the ground, after which the K sensors use the harvested energy to collect information and upload it to the UAV; once the information of all nodes has been collected, the UAV flies to its destination.
The sensor nodes are denoted by the set V = {v_1, ..., v_K}. For convenience, the time T is divided into N time slices, i.e. T = N·δ, where δ is a sufficiently small time slot. Assume the UAV flies at a fixed altitude H; the horizontal coordinate of its departure point is q_0 = [x_0, y_0], that of its end point is q_F = [x_F, y_F], and at time n (n ∈ {1, 2, ..., N}) its horizontal flight coordinate is q(n) = [x(n), y(n)].
The unmanned aerial vehicle energy consumption minimization design method provided by the embodiment of the invention specifically comprises the following steps:
The first step: calculating the channel gain between the unmanned aerial vehicle and the sensor nodes.
Because the UAV flies at a certain altitude, a LoS link can be established with the sensor nodes; small-scale fading is also considered, and a Rician fading channel model is adopted to describe the channel. At time n, the channel gain between the UAV and the k-th sensor node v_k can be expressed as

g_{u,k}(n) = β_0 |g_k|^2 / d_{u,k}^2(n),

where d_{u,k}(n) = sqrt(H^2 + ||q(n) − w_k||^2) is the distance from the UAV to sensor node v_k (w_k being the node's horizontal coordinate); β_0 is the channel gain at the reference distance d_0 = 1 meter; ||·|| is the Euclidean norm; and g_k is the small-scale fading coefficient,

g_k = sqrt(K_R / (K_R + 1)) · ĝ + sqrt(1 / (K_R + 1)) · g̃,

where K_R is the Rician factor of the UAV-to-node channel, ĝ is the line-of-sight component, and g̃ is the scattering component, a circularly symmetric complex Gaussian (CSCG) random variable with zero mean and unit variance.
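As an illustration, the channel-gain computation of this first step can be sketched in Python; the numeric values of β_0 and K_R below are illustrative assumptions, not the patent's parameters:

```python
import math
import random

def channel_gain(q_uav, w_k, H, beta0=1e-3, K_R=10.0):
    """Rician-fading channel gain between the UAV at horizontal position
    q_uav and sensor node k at ground position w_k.

    beta0 (gain at 1 m) and K_R (Rician factor) are illustrative values."""
    # squared 3-D distance: d_{u,k}^2 = H^2 + ||q(n) - w_k||^2
    d2 = H**2 + (q_uav[0] - w_k[0])**2 + (q_uav[1] - w_k[1])**2
    # small-scale fading: deterministic LoS part plus CSCG scattering part
    los = math.sqrt(K_R / (K_R + 1))
    scatter = math.sqrt(1 / (K_R + 1)) * complex(
        random.gauss(0, math.sqrt(0.5)), random.gauss(0, math.sqrt(0.5)))
    g_k = abs(los + scatter) ** 2
    return beta0 * g_k / d2
```

For a fixed fading realization, the gain falls off with the squared UAV-to-node distance, which is what drives the trajectory optimization in the later steps.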
The second step: calculating the energy consumed by sensor node communication.
In order to avoid interference, the UAV exchanges energy and information with the sensor nodes by time division multiple access (TDMA): at each time slot a sensor node can either harvest energy or transmit information, and at most one sensor node can upload updated information per slot. Therefore, according to Shannon's formula, at time n the energy sensor node v_k requires to transmit its information can be calculated as

E_k(n) = δ · (σ^2 / g_{u,k}(n)) · (2^{S/(Bδ)} − 1),

where S and B are the packet size and the channel bandwidth, respectively, and σ^2 is the noise power.
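A minimal sketch of this step, inverting Shannon's formula for the power needed to deliver S bits in one slot; the default packet size, bandwidth, slot length, and noise power are assumed values:

```python
def upload_energy(gain, S=1e5, B=1e6, delta=0.1, noise=1e-10):
    """Energy (J) node v_k spends to push an S-bit packet through a
    B-Hz channel in one slot of length delta, given channel gain `gain`.
    All default parameter values are illustrative assumptions."""
    # required transmit power, solved from S = delta*B*log2(1 + p*gain/noise)
    p = (noise / gain) * (2 ** (S / (B * delta)) - 1)
    return p * delta
```

Note the exponential dependence on the packet size: doubling S more than doubles the required energy, which is why scheduling uploads when the channel gain is high matters.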
The third step: calculating the energy harvested by the sensor nodes.
While flying from q_0, the UAV charges every sensor node in broadcast mode. Sensor node v_k is equipped with an energy harvester, and the input power received by its RF circuit is

e_k(n) = p_u · g_{u,k}(n),

where p_u is the transmit power of the UAV.

A nonlinear energy harvesting model is used to characterize the RF-to-DC conversion. At time n, the power harvested by v_k is

E_k^h(n) = [ M / (1 + e^{−a(e_k(n) − b)}) − M·Ω ] / (1 − Ω), with Ω = 1 / (1 + e^{ab}),

where M is the maximum harvested power of the receiver when the energy harvesting circuit saturates, and a and b are constants related to the actual circuit's sensitivity, resistance, and so on. The battery storage of v_k then evolves as

B_k(n+1) = min{ B_k(n) + E_k^h(n)·δ·1[c(n) = 0] − E_k(n)·1[c(n) = k], B_max },

where 1[·] is the indicator function; B_max is the maximum battery capacity of a sensor node; and c(n) ∈ {0, 1, 2, ..., K} is the information-transmission/energy-harvesting policy at time n: c(n) = 0 means all nodes only harvest energy, while c(n) = k (k ∈ {1, 2, ..., K}) means the k-th sensor node v_k uploads its information, and the upload succeeds only if v_k's stored energy exceeds the energy consumed by transmitting it.
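The sigmoid-style nonlinear harvesting model above can be sketched as follows; the values of M, a, and b are illustrative circuit constants, not the patent's:

```python
import math

def harvested_power(p_in, M=0.02, a=150.0, b=0.014):
    """Nonlinear RF-to-DC conversion: harvested power for RF input power
    p_in, saturating at M when the circuit saturates. The offset term
    M*omega makes zero input yield zero output."""
    omega = 1.0 / (1.0 + math.exp(a * b))
    logistic = M / (1.0 + math.exp(-a * (p_in - b)))
    return (logistic - M * omega) / (1.0 - omega)
```

Unlike a linear model, the output here flattens near M for large inputs, which is exactly the nonlinearity the background section argues a practical system must account for.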
Fourth step: and calculating the energy consumed by the unmanned aerial vehicle.
The energy consumption of the UAV system consists mainly of two parts: the UAV's propulsion energy and the energy related to UAV communication, such as signal processing and radiation. Since the communication-related energy is far smaller than the propulsion energy, it can be ignored. The flight energy consumption of a fixed-wing UAV can be calculated as

E(N) = Σ_{n=1}^{N} δ · [ c_1 ||v(n)||^3 + (c_2 / ||v(n)||) · (1 + ||a(n)||^2 / g^2) ] + (1/2) m ( ||v(N)||^2 − ||v(0)||^2 ),

where c_1 and c_2 are two constants related to the aircraft's weight, wing area, air density, and so on; g is the gravitational acceleration; m is the mass of the UAV; and v(n) and a(n) are the UAV's velocity and acceleration, respectively.

The propulsion energy of the UAV is thus mainly related to its speed and acceleration. In addition, the flight trajectory of the UAV can be calculated as

x(n+1) = x(n) + ||v(n)||·δ·cos p(n),
y(n+1) = y(n) + ||v(n)||·δ·sin p(n),
p(n+1) = p(n) + o(n),

where p(n) and o(n) are the flight (heading) angle and the turn angle of the UAV at time n, respectively.
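The per-slot propulsion power and the heading/turn-angle trajectory update can be sketched together; c1 and c2 are illustrative airframe constants in the style of standard fixed-wing models, and the kinematics assume the heading-angle parameterization described above:

```python
import math

def propulsion_power(speed, accel_norm, c1=9.26e-4, c2=2250.0, g=9.8):
    """Instantaneous propulsion power of a fixed-wing UAV at the given
    speed (m/s) and acceleration magnitude (m/s^2)."""
    return c1 * speed ** 3 + (c2 / speed) * (1.0 + accel_norm ** 2 / g ** 2)

def fly_step(q, speed, heading, accel, turn, delta=0.5):
    """One-slot kinematic update from flight angle p(n) and turn angle o(n):
    returns the new position, speed, and heading."""
    x = q[0] + speed * delta * math.cos(heading)
    y = q[1] + speed * delta * math.sin(heading)
    return (x, y), speed + accel * delta, heading + turn
```

Because the c2/speed term blows up at low speed and the c1·speed^3 term dominates at high speed, power is minimized at an interior cruise speed, which is one reason the optimization bounds v(n) to [v_min, v_max].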
Fifth step: the sensor information age is calculated AoI.
The information age AoI is an important index describing the freshness of the collected information, defined as the time elapsed since the most recent information collected by the UAV was generated. Let U_k(n) denote the last time at which the UAV collected information from v_k; then by this definition the AoI of that information at time n can be calculated as

A_k(n) = (n − U_k(n))·δ.

Without loss of generality, the AoI at the moment information is generated is 1 normalized unit of time, and it increases by 1 each time slot. Once new information is generated, the old information is overwritten and the AoI drops back to 1. The evolution of v_k's AoI over T can be calculated as

A_k(n+1) = δ, if c(n) = k and the upload succeeds; A_k(n) + δ, otherwise.
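The AoI bookkeeping of this step is a short recurrence; a sketch, where the `battery_ok` flag stands in for the successful-upload condition defined in the third step:

```python
def aoi_step(age, c, k, battery_ok, delta=1.0):
    """One-slot AoI update for node k: reset to one unit when the UAV
    schedules node k (c == k) and the node can afford the transmission;
    otherwise age by one slot."""
    if c == k and battery_ok:
        return delta
    return age + delta
```

This recurrence is what the AoI constraints in the optimization problem are written over.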
Sixth step: and establishing an optimization problem model.
The UAV's energy consumption is minimized by jointly optimizing the UAV's flight trajectory, flight time, and the information acquisition and energy harvesting strategy. The mathematical description of the problem P_0 is:

P_0: min_{a(n), o(n), c(n), N} Σ_{n=1}^{N} δ·P(n)

s.t. A_k(N) < (N+1)δ, k ∈ {1, 2, ..., K},
     A_k(n) ≤ A_r, k ∈ {1, 2, ..., K}, n ∈ {1, 2, ..., N},
     q(0) = q_0, q(N) = q_F,
     a(n) ∈ [a_min, a_max],
     o(n) ∈ [o_min, o_max],
     v(n) ∈ [v_min, v_max].

The constraint A_k(N) < (N+1)δ, k ∈ {1, 2, ..., K}, ensures that the UAV collects each sensor node's information at least once for updating. The constraint A_k(n) ≤ A_r ensures that the information collected by the UAV satisfies the maximum information-age limit, A_r being the largest AoI the system allows. The constraint q(0) = q_0, q(N) = q_F makes the UAV start from the initial location, complete the task within T, and reach the destination. The constraints a(n) ∈ [a_min, a_max], o(n) ∈ [o_min, o_max], and v(n) ∈ [v_min, v_max] limit the UAV's acceleration, flight angle, and speed, respectively, to ensure normal flight.
P_0 is a non-convex nonlinear integer optimization problem and cannot be solved directly. Because the UAV's acceleration and turn-angle control, the energy stored by the sensor nodes, and the AoI of the collected information are mutually independent, P_0 can be modeled as a Markov decision problem with finite state and action spaces. However, the vast state space makes the problem difficult to solve with a traditional standard Markov algorithm, whereas the neural networks in deep reinforcement learning are good at extracting high-dimensional data features, so a deep reinforcement learning method is adopted to solve the problem.
Seventh step: DQN-based unmanned aerial vehicle control framework.
As shown in fig. 3, the DQN (deep Q network)-based UAV control framework comprises an environment and an agent. In the DQN algorithm of this patent, the UAV is regarded as the agent, while flight in the air and interaction with the sensor nodes, such as information and energy transfer, are regarded as the environment. At time n of each training episode, the agent observes the environment state s_n, which includes information such as the sensor nodes' stored energy and the AoI of the uploaded information, so as to decide the next action a_n according to the current environment. After the action is executed, the agent obtains the corresponding reward r_n fed back by the environment and observes the next state s_{n+1}. The agent therefore trains by continuously interacting with the environment, adjusting its action policy from the observed state transitions and the rewards the environment feeds back, and iterating this learning so that the accumulated return is maximized and a better action policy is obtained.
In each iteration, the agent must evaluate the value function of a given policy and derive a policy from that value function. The DQN algorithm approximates the value function nonlinearly with an artificial neural network (ANN): Q(s_n, a_n; θ_n) evaluates the value of taking action a_n in state s_n, where θ_n are the network weights. The update rule of the value function is:

Q(s_n, a_n; θ_n) ← Q(s_n, a_n; θ_n) + β·[ r_n + γ·max_a Q(s_{n+1}, a; θ_n) − Q(s_n, a_n; θ_n) ],

where β is the learning rate, γ is the discount factor, and a ranges over the selectable actions.
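The update rule above is the standard temporal-difference target; its tabular analogue, with a dict in place of the ANN purely for illustration, looks like this:

```python
def q_update(Q, s, a, r, s_next, actions, beta=0.1, gamma=0.9):
    """One TD update: Q(s,a) += beta*(r + gamma*max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); missing entries default to 0."""
    target = r + gamma * max(Q.get((s_next, ap), 0.0) for ap in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + beta * (target - Q.get((s, a), 0.0))
    return Q[(s, a)]
```

The DQN replaces the table with a neural network trained on the same target, which is what makes the huge state space of this problem tractable.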
The DQN algorithm consists of two neural networks, an online network and a target network, whose initial weights are the same; the online network is updated every iteration, while the target network is updated at fixed intervals. In addition, to break the correlation between data, the DQN algorithm uses an experience pool that stores historical data, from which data are drawn at random each time training is performed. Because the energy, AoI, and position dimensions of the state space have different scales, the historical data are normalized before being stored:

x_s = (x_o − x̄) / σ_x,

where x_o is the original data, x_s the normalized data, x̄ the mean of the original data, and σ_x its standard deviation. For any set of historical data (s_n, a_n, r_n, s_{n+1}), the online network can be trained by minimizing the loss function

L(θ_j) = E[ ( r_n + γ·max_a Q(s_{n+1}, a; θ_j^−) − Q(s_n, a_n; θ_j) )^2 ],

where j is the round number of the weight update and θ_j^− are the target-network weights. The online network is then updated by gradient descent, with gradient

∇_{θ_j} L(θ_j) = E[ ( r_n + γ·max_a Q(s_{n+1}, a; θ_j^−) − Q(s_n, a_n; θ_j) ) · ∇_{θ_j} Q(s_n, a_n; θ_j) ].
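The per-dimension normalization used before storing experience can be sketched as a plain z-score; the zero-variance guard is an added assumption:

```python
def normalize(values):
    """Z-score normalization x_s = (x_o - mean)/std applied to one
    dimension (energy, AoI, or position) of a batch of transitions."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    std = std if std > 0 else 1.0  # guard: a constant dimension maps to zero
    return [(x - mean) / std for x in values]
```

Putting the heterogeneous state dimensions on one scale keeps no single dimension from dominating the network's loss.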
Eighth step: and planning the unmanned aerial vehicle flight strategy based on the DQN.
The DQN-based UAV control algorithm converts the UAV energy-consumption-minimization problem into a Markov decision process described by the triple {s, a, r}, where s is the agent's state, a is the action the agent executes, and r is the reward fed back by the environment after the agent executes the action.
1) State of agent
At time n, the agent state s_n comprises two parts: the state information of the sensor nodes and of the UAV at time n, including the nodes' currently stored energy Β, the freshness Α of the information collected by the UAV, and the UAV's geographic position, flight speed, flight angle, energy consumption, elapsed flight time, and distance from the end point. Thus at time n the agent state can be expressed as

s_n = {Β, Α, q(n), v(n), p(n), E(n), n, d(n)},

where Β = [B_1, B_2, ..., B_K] and Α = [A_1, A_2, ..., A_K], with B_k and A_k denoting the energy stored by the k-th sensor node and the AoI of its collected information, respectively; d(n) is the UAV's distance from the end point at time n, defined as d(n) = ||q(n) − q_F||. The state space of the agent is denoted S, with s_n ∈ S.
2) The agent performs an action
At time n, the action a_n executed by the agent comprises three parts: the UAV's acceleration a(n), turn angle o(n), and the node information-uploading/energy-harvesting policy c(n). Thus the action can be expressed as

a_n = {a(n), o(n), c(n)}.
3) Reward function
The design of the reward function mainly comprises two parts: the distance between the unmanned aerial vehicle and the end point, and the energy consumed by the unmanned aerial vehicle. If the current state satisfies the constraints of problem P_0, a certain reward is given; if a constraint is violated, a penalty is applied. Thus, at the nth time, the reward function of the environmental feedback after the agent performs an action can be designed as:
Figure BDA0003018962220000121
wherein E(n) represents the energy consumed by the unmanned aerial vehicle from departure to the nth time, and k_1, k_2, k_3 are positive constants.
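The exact reward equation above appears only as an image, so the following Python sketch merely mirrors the ingredients the text describes (distance to the end point, energy consumed, and a penalty for violating a constraint of problem P_0), with illustrative values for the positive constants k_1, k_2, k_3; the shape of the expression is an assumption, not the patent's formula:

```python
def reward(d_prev, d_curr, delta_E, violated,
           k1=1.0, k2=0.1, k3=10.0):
    """Illustrative reward with the described ingredients: reward progress
    toward the end point, charge for energy use, punish constraint
    violations. k1, k2, k3 are the positive constants mentioned in the text,
    with assumed values."""
    if violated:
        return -k3                                   # penalty for violating a constraint
    return k1 * (d_prev - d_curr) - k2 * delta_E     # progress minus energy cost
```

For example, moving 10 distance units closer while spending 5 energy units yields a positive reward, while any constraint violation yields the fixed penalty.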
The flow of the DQN-based unmanned aerial vehicle control algorithm is shown in Table 1.
First, an experience pool is initialized to store the training data, the online-network parameters are randomly initialized, and a target network with the same network structure as the online network is introduced and assigned the same neural-network parameters.
Then, in each of the Episodes training periods, the initial state of the unmanned aerial vehicle is initialized and normalized, and within the maximum allowed task time T of the unmanned aerial vehicle the environment is explored with an ε-greedy strategy, i.e., when selecting an action, a random action is chosen with a certain exploration probability ε.
After the action is performed, the agent obtains the reward fed back by the environment and the state s_{n+1} at the next time; meanwhile, the training tuple {s_n, a_n, r_n, s_{n+1}} is normalized and stored in the experience pool D.
If the unmanned aerial vehicle completes its task, the current flight task is exited and the next training round begins. After each flight task ends, b groups of training data are randomly selected from the experience pool as a mini-batch to break the correlation between the data, and the gradient is updated according to (1).
Finally, the weights of the online network are synchronized to the target network every J steps.
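The Table 1 flow described above (experience pool, ε-greedy exploration, mini-batch sampling, and periodic target-network synchronization) can be sketched as follows; ToyEnv and ToyQ are stand-ins for the UAV environment and the Q-networks, and all hyperparameters are placeholders rather than the patent's settings:

```python
import random
from collections import deque

class ToyEnv:
    """Stand-in environment; the real one is the UAV/sensor-node field."""
    def reset(self):
        self.s = 0
        return self.s                       # normalized initial state
    def sample_action(self):
        return random.choice([0, 1])
    def step(self, a):
        self.s += 1
        return self.s, -1.0, self.s >= 10   # next state, reward, done

class ToyQ:
    """Stand-in Q-network exposing the three hooks the loop needs."""
    def __init__(self):
        self.w = 0.0
    def act(self, s):
        return 0                            # greedy action (placeholder)
    def copy_from(self, other):
        self.w = other.w                    # weight synchronization
    def update(self, batch, target_net):
        self.w += 1e-3 * len(batch)         # placeholder gradient step

def train_dqn(env, q_net, target_net, episodes=10, T=50,
              eps=0.1, batch=8, sync_every=5, pool_size=10_000):
    D = deque(maxlen=pool_size)             # experience pool
    step = 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(T):                  # within max allowed task time T
            # epsilon-greedy: explore with probability eps
            a = env.sample_action() if random.random() < eps else q_net.act(s)
            s_next, r, done = env.step(a)
            D.append((s, a, r, s_next))     # store normalized transition
            s = s_next
            step += 1
            if step % sync_every == 0:
                target_net.copy_from(q_net) # sync target every J steps
            if done:
                break                       # task finished, next episode
        if len(D) >= batch:                 # mini-batch update breaks correlation
            q_net.update(random.sample(D, batch), target_net)
    return D

pool = train_dqn(ToyEnv(), ToyQ(), ToyQ())
```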
TABLE 1
Figure BDA0003018962220000131
Examples of applications of embodiments of the present invention are shown in fig. 4 and 5.
Device embodiment
The embodiment of the invention provides an unmanned aerial vehicle energy consumption minimization design device, as shown in fig. 6, comprising: a memory 60, a processor 62 and a computer program stored on the memory 60 and executable on the processor 62, which when executed by the processor 62 carries out the steps as described in the method embodiments.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a program for carrying out information transmission, which when executed by the processor 62, carries out the steps as described in the method embodiments.
The computer readable storage medium of the present embodiment includes, but is not limited to: ROM, RAM, magnetic or optical disks, etc.
It should be noted that the storage-medium embodiments in this specification and the unmanned aerial vehicle energy-consumption minimization design method embodiments in this specification are based on the same inventive concept, so for the specific implementation of this embodiment reference may be made to the implementation of the corresponding method; repeated description is omitted.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement of a technology could clearly be distinguished as an improvement in hardware (for example, an improvement of a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement of a method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD) (e.g., a field-programmable gate array, FPGA) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without requiring the chip manufacturer to design and fabricate an application-specific integrated-circuit chip. Moreover, nowadays, instead of manually manufacturing integrated-circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a given logic method flow can be readily obtained merely by briefly programming the method flow in one of the several hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer-readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a kind of hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each unit may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present specification.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random-access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact-disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is by way of example only and is not intended to limit the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present document are intended to be included within the scope of the claims of the present document.

Claims (10)

1. An unmanned aerial vehicle energy consumption minimization design method, characterized by comprising:
calculating channel gains of the unmanned aerial vehicle and the sensor nodes;
according to the channel gain of the unmanned aerial vehicle and the sensor node, calculating the energy consumed by the communication of the sensor node;
according to the channel gains of the unmanned aerial vehicle and the sensor node, calculating the energy collected by the sensor node;
calculating energy consumed by the unmanned aerial vehicle;
Calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor nodes and the residual capacity of the battery of the sensor nodes;
jointly optimizing the flight trajectory, the flight time, and the information-acquisition and energy-collection strategies of the unmanned aerial vehicle to minimize the energy consumption of the unmanned aerial vehicle, and establishing an optimization problem model; specifically comprising: describing the optimization problem model mathematically, and modeling the problem P_0 by a deep reinforcement learning method to solve a Markov decision problem with a finite state and action space;
establishing a DQN-based unmanned aerial vehicle control framework; specifically comprising: establishing the environment and the agent of the DQN-based unmanned aerial vehicle control framework, regarding the unmanned aerial vehicle as the agent, and regarding the flight motion in the air and the interaction with the sensor nodes, such as information and energy transmission, as the environment;
at the nth time in each training period, the agent needs to sense the state s_n of the surrounding environment, including the electric quantity of the sensor nodes and the AoI of the uploaded information, and determines the action a_n at the next time according to the current environment; after executing the action, the agent obtains the corresponding reward r_n fed back by the environment and continues to observe the state s_{n+1} at the next time;
during training, the agent continuously interacts with the environment, observes the change of state after an action is executed and the reward fed back by the environment so as to adjust its action strategy, and repeats this iterative learning, so that the accumulated return is maximized and a better action strategy is obtained;
calculating the value function of a given strategy in each iteration of the agent and giving the strategy according to the value function; the DQN algorithm adopts a neural network to nonlinearly approximate the value function Q(s_n, a_n; θ), which evaluates the cost of taking action a_n in state s_n, where θ is the weight of the artificial neural network;
normalizing the historical data before storing them, i.e., x_s = (x_o − μ)/σ, wherein x_o is the original data, x_s is the normalized data, μ is the mean of the original data, and σ² is the variance of the original data;
training the online network by minimizing a loss function, and updating the online network using a gradient-descent method;
planning the unmanned aerial vehicle flight strategy based on the DQN according to the DQN-based unmanned aerial vehicle control framework; specifically comprising: describing the DQN-based unmanned aerial vehicle control algorithm by a triplet {s, a, r} and converting the unmanned aerial vehicle energy-consumption minimization optimization problem into a Markov decision process, wherein s indicates the state of the agent, a represents the action performed by the agent, and r represents the reward of the environmental feedback after the agent performs the action; at the nth time, the state s_n of the agent comprises two parts, namely the state information of the sensor nodes and of the unmanned aerial vehicle at the nth time, including the energy currently stored by the sensor nodes, the freshness of the information collected by the unmanned aerial vehicle, and the geographic position, flight speed, flight angle, energy consumption, flight time, and distance to the end point of the unmanned aerial vehicle; thus, at the nth time, the state of the agent is denoted s_n = {B, A, q(n), v(n), p(n), E(n), n, d(n)}, wherein B = [B_1, B_2, ..., B_K], A = [A_1, A_2, ..., A_K], and B_k, A_k respectively denote the residual electric quantity of the k-th sensor node and the AoI of the information acquired from it; d(n) is the distance of the unmanned aerial vehicle from the end point at the nth time, defined as d(n) = ||q(n) − q_F||; q_F is the horizontal coordinate of the end point, q(n) is the flight horizontal coordinate of the unmanned aerial vehicle, v(n) is the flight speed, p(n) is the flight angle of the unmanned aerial vehicle at the nth time, and E(n) is the energy consumption; the state space of the agent is denoted S, wherein s_n ∈ S; at the nth time, the action a_n performed by the agent comprises three parts, namely the acceleration a(n) of the unmanned aerial vehicle, the turning angle o(n), and the node information-uploading and energy-collection strategy c(n); thus, the action is denoted a_n = {a(n), o(n), c(n)}; the reward function includes the distance between the unmanned aerial vehicle and the end point and the energy consumed by the unmanned aerial vehicle; if the current state satisfies the constraints of the problem P_0, a certain reward is given, and if a constraint is violated, a penalty is applied.
2. The method of claim 1, wherein calculating the channel gains of the unmanned aerial vehicle and the sensor nodes specifically comprises:
assuming that the sensor nodes are recorded as a set, the task time T is divided into time slices of a sufficiently small duration, the unmanned aerial vehicle flies at a fixed altitude, the horizontal coordinate of its departure point and the horizontal coordinate q_F of its end point are given, and at the nth time the flight horizontal coordinate of the unmanned aerial vehicle is q(n);
according to Equation 1, at the nth time, the channel gain of the unmanned aerial vehicle and the k-th sensor node is calculated:
Figure QLYQS_48
equation 1;
wherein Figure QLYQS_50 is the distance from the unmanned aerial vehicle to the k-th sensor node, Figure QLYQS_56 denotes the channel gain at the reference distance of 1 meter, Figure QLYQS_52 is the Euclidean distance, and Figure QLYQS_54 is the small-scale fading coefficient, given by Figure QLYQS_57, wherein Figure QLYQS_49 is the Rician factor of the channel between the unmanned aerial vehicle and the sensor node, Figure QLYQS_55 is the line-of-sight component, and Figure QLYQS_58 is the scattering component, a zero-mean unit-variance circularly symmetric complex Gaussian (CSCG) random variable; u denotes the unmanned aerial vehicle.
3. The method of claim 2, wherein calculating the energy consumed by sensor-node communication according to the channel gains of the unmanned aerial vehicle and the sensor nodes specifically comprises:
according to the Shannon formula, at the nth time, the energy required by the k-th sensor node to transmit its information is calculated based on Equation 2:
Figure QLYQS_61
equation 2;
wherein Figure QLYQS_62 and Figure QLYQS_63 are the packet size and the channel bandwidth, respectively.
4. The method according to claim 3, wherein calculating the energy collected by the sensor nodes according to the channel gains of the unmanned aerial vehicle and the sensor nodes specifically comprises:
assuming that the unmanned aerial vehicle starts from the departure point and, during flight, charges each sensor node by broadcasting; the k-th sensor node is equipped with an energy harvester, and the input power received by its radio-frequency circuit is calculated according to Equation 3:
Figure QLYQS_66
equation 3;
wherein Figure QLYQS_67 represents the transmitting power of the unmanned aerial vehicle;
a nonlinear energy-harvesting model is used to characterize the RF-to-DC conversion, and the energy collected by the k-th sensor node at the nth time is calculated by Equation 4:
Figure QLYQS_69
equation 4;
wherein M represents the maximum harvested power at the energy-harvesting receiver when the energy-harvesting circuit is saturated, and a and b respectively denote constants related to the actual circuit sensitivity and resistance;
the battery-capacity change process of the k-th sensor node is calculated according to Equation 5:
Figure QLYQS_71
equation 5;
wherein Figure QLYQS_72 denotes the maximum battery capacity of the sensor node, and c(n) represents the strategy for information transmission and energy harvesting at the nth time; when c(n) = 0, all nodes only collect energy; when c(n) = k, the k-th sensor node uploads its information, and only when the remaining battery energy of the node is greater than the energy consumed by the information transmission can the information be successfully uploaded.
5. The method of claim 4, wherein calculating the energy consumed by the unmanned aerial vehicle specifically comprises:
calculating the flight energy consumption of the fixed-wing unmanned aerial vehicle according to Equation 6:
Figure QLYQS_79
equation 6;
wherein Figure QLYQS_80 and Figure QLYQS_81 are two constants, g indicates the gravitational acceleration, m is the mass of the unmanned aerial vehicle, v(n) and a(n) respectively represent the speed and acceleration of the unmanned aerial vehicle, and T is the task time;
and calculating the flight trajectory of the unmanned aerial vehicle according to a formula 7:
Figure QLYQS_85
equation 7;
wherein p(n) and o(n) respectively represent the flight angle and the turning angle of the unmanned aerial vehicle at the nth time.
6. The method of claim 5, wherein calculating the age AoI of the sensor information based on the energy consumed by the sensor node communication and the remaining power of the sensor node battery comprises:
assuming that Figure QLYQS_89 denotes, at the nth time, the time elapsed since the unmanned aerial vehicle last collected the information, the information age AoI is calculated, according to its definition, by Equation 8:
Figure QLYQS_92
equation 8;
assuming that the information age AoI at the moment of information generation is 1 normalized unit of time, the information age increases by 1 each time one unit of time passes; once new information is generated, the original information is overwritten and the information age is correspondingly reduced; the AoI of the information of the k-th sensor node is calculated according to Equation 9:
Figure QLYQS_95
equation 9;
wherein Figure QLYQS_96 represents a sufficiently small time gap.
7. The method of claim 6, wherein establishing the optimization problem model by jointly optimizing the flight trajectory, the flight time, and the information-acquisition and energy-collection strategies of the unmanned aerial vehicle to minimize the energy consumption of the unmanned aerial vehicle specifically comprises:
describing the optimization problem model mathematically according to Equation 10, and modeling the problem P_0, by a deep reinforcement learning method, as a Markov decision problem with a finite state and action space:
Figure QLYQS_98
equation 10;
wherein A_r represents the maximum information age AoI allowed by the system; constraint Figure QLYQS_99 ensures that the unmanned aerial vehicle acquires the information of each sensor node at least once for updating; constraint Figure QLYQS_100 ensures that the AoI of the information collected by the unmanned aerial vehicle satisfies the maximum information-age limit; constraint Figure QLYQS_101 indicates that the unmanned aerial vehicle starts from the initial position, completes the task within the task time, and reaches the destination; constraint Figure QLYQS_103 limits the acceleration of the unmanned aerial vehicle to ensure normal flight; constraint Figure QLYQS_104 limits the flight angle of the unmanned aerial vehicle to ensure normal flight; and constraint Figure QLYQS_105 limits the speed of the unmanned aerial vehicle to ensure normal flight.
8. The method of claim 7, wherein establishing a DQN-based unmanned aerial vehicle control framework comprises:
establishing the environment and the agent of the DQN-based unmanned aerial vehicle control framework, regarding the unmanned aerial vehicle as the agent, and regarding the flight motion in the air and the interaction with the sensor nodes, such as information and energy transmission, as the environment;
at the nth time in each training period, the agent needs to sense the state s_n of the surrounding environment, including the electric quantity of the sensor nodes and the AoI of the uploaded information, and determines the action a_n at the next time according to the current environment; after executing the action, the agent obtains the corresponding reward r_n fed back by the environment and continues to observe the state s_{n+1} at the next time;
during training, the agent continuously interacts with the environment, observes the change of state after an action is executed and the reward fed back by the environment so as to adjust its action strategy, and repeats this iterative learning, so that the accumulated return is maximized and a better action strategy is obtained;
calculating the value function of a given strategy in each iteration of the agent and giving the strategy according to the value function; the DQN algorithm adopts a neural network to nonlinearly approximate the value function Q(s_n, a_n; θ), which evaluates the cost of taking action a_n in state s_n, where θ is the weight of the artificial neural network; the update rule of the value function Q is as shown in Equation 11:
Figure QLYQS_112
Equation 11;
wherein Figure QLYQS_113 is the learning rate, Figure QLYQS_114 is the discount factor, and a represents the action selected for execution;
normalizing the historical data before storing them, i.e., x_s = (x_o − μ)/σ, wherein x_o is the original data, x_s is the normalized data, μ is the mean of the original data, and σ² is the variance of the original data;
the online network is trained by minimizing a loss function, the definition of which is shown in Equation 12:
Figure QLYQS_119
equation 12;
wherein Figure QLYQS_120 is the number of rounds of updating the weights;
the online network is updated using a gradient-descent method, the gradient of which is shown in Equation 13:
Figure QLYQS_121
equation 13.
9. The method of claim 8, wherein planning an unmanned aerial vehicle flight strategy based on DQN in accordance with the DQN-based unmanned aerial vehicle control framework comprises:
describing the DQN-based unmanned aerial vehicle control algorithm by a triplet {s, a, r} and converting the unmanned aerial vehicle energy-consumption minimization optimization problem into a Markov decision process, wherein s indicates the state of the agent, a represents the action performed by the agent, and r represents the reward of the environmental feedback after the agent performs the action; at the nth time, the state s_n of the agent comprises two parts, namely the state information of the sensor nodes and of the unmanned aerial vehicle at the nth time, including the energy currently stored by the sensor nodes, the freshness of the information collected by the unmanned aerial vehicle, and the geographic position, flight speed, flight angle, energy consumption, flight time, and distance to the end point of the unmanned aerial vehicle; thus, at the nth time, the state of the agent is denoted s_n = {B, A, q(n), v(n), p(n), E(n), n, d(n)}, wherein B = [B_1, B_2, ..., B_K], A = [A_1, A_2, ..., A_K], and B_k, A_k respectively denote the residual electric quantity of the k-th sensor node and the AoI of the information acquired from it; d(n) is the distance of the unmanned aerial vehicle from the end point at the nth time, defined as d(n) = ||q(n) − q_F||; q_F is the horizontal coordinate of the end point, q(n) is the flight horizontal coordinate of the unmanned aerial vehicle, v(n) is the flight speed, p(n) is the flight angle of the unmanned aerial vehicle at the nth time, and E(n) is the energy consumption; the state space of the agent is denoted S, wherein s_n ∈ S; at the nth time, the action a_n performed by the agent comprises three parts, namely the acceleration a(n) of the unmanned aerial vehicle, the turning angle o(n), and the node information-uploading and energy-collection strategy c(n); thus, the action is denoted a_n = {a(n), o(n), c(n)}; the reward function includes the distance between the unmanned aerial vehicle and the end point and the energy consumed by the unmanned aerial vehicle; if the current state satisfies the constraints of the problem P_0, a certain reward is given, and if a constraint is violated, a penalty is applied; therefore, at the nth time, the reward function of the environmental feedback after the agent performs the action is as shown in Equation 14:
Figure QLYQS_149
equation 14;
wherein ,
Figure QLYQS_150
indicating the departure to the +.>
Figure QLYQS_151
Energy consumption from time of day->
Figure QLYQS_152
Is a positive constant;
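The state, distance and reward definitions above can be expressed as a short sketch. `OMEGA`, `PENALTY`, the node count, and the exact penalty form are hypothetical, since the claim only states that the weight is a positive constant and that constraint violations are penalized:

```python
import numpy as np

# Hypothetical constants (the claim does not specify their values)
OMEGA = 0.1          # positive weight on energy consumption
PENALTY = 100.0      # assumed additive penalty for constraint violation

def reward(q_n, q_F, E_n, violated):
    """r(n) = -(d(n) + omega * E(n)), penalized on constraint violation
    (reconstruction of equation 14)."""
    d_n = np.linalg.norm(q_F - q_n)        # d(n) = ||q_F - q(n)||
    r = -(d_n + OMEGA * E_n)
    return r - PENALTY if violated else r

def state(e, aoi, q_n, v_n, p_n, E_n, n, q_F):
    """s(n) = {e(n), A(n), q(n), v(n), p(n), E(n), n, d(n)} as a flat vector."""
    d_n = np.linalg.norm(q_F - q_n)
    return np.concatenate([e, aoi, q_n, [v_n, p_n, E_n, n, d_n]])

q_n, q_F = np.array([0.0, 0.0]), np.array([3.0, 4.0])
r = reward(q_n, q_F, E_n=10.0, violated=False)   # d(n) = 5 -> r = -(5 + 1) = -6.0
s = state(np.ones(3), np.zeros(3), q_n, 5.0, 0.5, 10.0, 0, q_F)
```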
the DQN-based unmanned aerial vehicle control algorithm specifically comprises:
initializing an experience pool for storing the training data, randomly initializing the online network parameters, introducing a target network having the same network structure as the online network, and assigning to it the same neural network parameters as the online network;
in each training episode, initializing the initial state of the unmanned aerial vehicle and normalizing the state, and, within the maximum allowable task time of the unmanned aerial vehicle, exploring the environment with an ε-greedy strategy, i.e. selecting an action at random with exploration probability ε when selecting actions;
after the action is executed, the agent obtains the reward fed back by the environment and the state s(n+1) at the next moment; at the same time, the transition (s(n), a(n), r(n), s(n+1)) is normalized and stored in the experience pool D;
if the unmanned aerial vehicle task is completed, exiting the current flight task and entering the next round of training; after each flight task is finished, randomly selecting a mini-batch of training data from the experience pool so as to break the correlation between the data, and updating the gradient according to the agent state;
synchronously updating the weights of the online network to the target network at a fixed interval of training rounds.
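The claimed training procedure (experience pool, ε-greedy exploration, mini-batch sampling, periodic target-network synchronization) can be sketched as follows. The linear Q-function, the toy `env_step` environment, and all hyperparameter values are placeholder assumptions, not the patent's UAV model:

```python
import random
from collections import deque
import numpy as np

random.seed(1)
rng = np.random.default_rng(1)

# Toy stand-ins: the real UAV dynamics and deep network are outside this sketch.
N_ACTIONS, STATE_DIM = 4, 6

def q_values(theta, s):                    # linear "network": Q(s, .; theta) = theta @ s
    return theta @ s

def env_step(s, a):                        # hypothetical environment transition
    s_next = np.tanh(s + 0.1 * rng.normal(size=STATE_DIM))
    r = -float(np.abs(s_next).sum())       # stand-in for -(distance + energy) reward
    done = rng.random() < 0.05             # stand-in for "task completed"
    return s_next, r, done

D = deque(maxlen=10_000)                   # experience pool
theta = 0.1 * rng.normal(size=(N_ACTIONS, STATE_DIM))   # online parameters
theta_target = theta.copy()                # target network with the same weights
EPS, GAMMA, LR, BATCH, SYNC_EVERY = 0.1, 0.9, 1e-3, 32, 20

for episode in range(100):                 # training episodes
    s = rng.normal(size=STATE_DIM)         # initial (already normalized) state
    for n in range(50):                    # maximum allowable task time
        # epsilon-greedy: explore with probability EPS, otherwise act greedily
        if rng.random() < EPS:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(q_values(theta, s)))
        s_next, r, done = env_step(s, a)
        D.append((s, a, r, s_next, done))  # store the (normalized) transition
        s = s_next
        if done:
            break                          # task finished: enter the next episode
    if len(D) >= BATCH:                    # mini-batch sampling breaks data correlation
        for s_b, a_b, r_b, sn_b, d_b in random.sample(list(D), BATCH):
            y = r_b if d_b else r_b + GAMMA * float(np.max(q_values(theta_target, sn_b)))
            err = y - float(theta[a_b] @ s_b)
            theta[a_b] += LR * 2.0 * err * s_b    # gradient step on the online net
    if episode % SYNC_EVERY == 0:
        theta_target = theta.copy()        # sync online weights to the target network
```

Sampling past transitions uniformly from the pool, rather than learning from consecutive steps, is what breaks the temporal correlation the claim refers to; the delayed target copy keeps the TD target stable between synchronizations.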
10. An unmanned aerial vehicle energy consumption minimization design device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the unmanned aerial vehicle energy consumption minimization design method according to any one of claims 1 to 9.
CN202110397120.XA 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device Active CN113268077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110397120.XA CN113268077B (en) 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device


Publications (2)

Publication Number Publication Date
CN113268077A CN113268077A (en) 2021-08-17
CN113268077B true CN113268077B (en) 2023-06-09

Family

ID=77228836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110397120.XA Active CN113268077B (en) 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device

Country Status (1)

Country Link
CN (1) CN113268077B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406974B (en) * 2021-08-19 2021-11-02 南京航空航天大学 Learning and resource joint optimization method for unmanned aerial vehicle cluster federal learning
CN113938830B (en) * 2021-09-24 2023-03-24 北京邮电大学 Unmanned aerial vehicle base station deployment method and device
CN113973281B (en) * 2021-10-26 2022-09-27 深圳大学 Unmanned aerial vehicle Internet of things system and method for balancing energy consumption and service life of sensor
CN114071482B (en) * 2021-11-11 2024-03-26 浙江工业大学 Network throughput optimization method under AoI constraint in cognitive radio network
CN114630335B (en) * 2022-03-11 2023-09-08 西安电子科技大学 Low-energy-consumption high-dynamic air network coverage method for guaranteeing timeliness
CN114697975B (en) * 2022-04-11 2024-01-05 东南大学 Unmanned aerial vehicle cluster distributed deployment method for enhancing land wireless coverage
CN114783215B (en) * 2022-04-18 2023-05-26 中国人民解放军战略支援部队信息工程大学 Unmanned aerial vehicle clustering method and device and electronic equipment
CN115037638B (en) * 2022-06-14 2023-10-20 北京邮电大学 Unmanned aerial vehicle network data acquisition and transmission control method with low energy consumption and high timeliness
CN115277770B (en) * 2022-07-20 2023-04-25 华北电力大学(保定) Unmanned aerial vehicle information collection method based on joint optimization of node access and flight strategy
CN115113639B (en) * 2022-07-25 2023-05-05 中国人民解放军32370部队 Unmanned aerial vehicle flight control and simulation training method and device
CN115265549B (en) * 2022-09-27 2022-12-27 季华实验室 Unmanned aerial vehicle path planning method and device and electronic equipment
CN115278849B (en) * 2022-09-29 2022-12-20 香港中文大学(深圳) Transmission opportunity and power control method for dynamic topology of unmanned aerial vehicle
CN115857556B (en) * 2023-01-30 2023-07-14 中国人民解放军96901部队 Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN111045443B (en) * 2018-10-11 2021-07-02 北京航空航天大学 Unmanned aerial vehicle communication network movement control method, device, equipment and storage medium
CN110364031B (en) * 2019-07-11 2020-12-15 北京交通大学 Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network
CN110417456B (en) * 2019-07-24 2020-06-16 北京交通大学 Information transmission method based on unmanned aerial vehicle
CN111277320B (en) * 2020-01-21 2021-06-11 北京大学 Method and device for track design and interference management of cellular network connection unmanned aerial vehicle
CN111479239B (en) * 2020-04-29 2022-07-12 南京邮电大学 Sensor emission energy consumption optimization method of multi-antenna unmanned aerial vehicle data acquisition system


Similar Documents

Publication Publication Date Title
CN113268077B (en) Unmanned aerial vehicle energy consumption minimization design method and device
Bouhamed et al. A UAV-assisted data collection for wireless sensor networks: Autonomous navigation and scheduling
Bellemare et al. Autonomous navigation of stratospheric balloons using reinforcement learning
Chen et al. Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks
US11062235B1 (en) Predictive power management in a wireless sensor network
Prasetia et al. Mission-based energy consumption prediction of multirotor UAV
US10691133B1 (en) Adaptive and interchangeable neural networks
CN116011511A (en) Machine learning model scaling system for power aware hardware
CN113377131B (en) Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning
Li et al. Online velocity control and data capture of drones for the internet of things: An onboard deep reinforcement learning approach
US11562653B1 (en) Systems and methods for in-flight re-routing of an electric aircraft
CN112671451A (en) Unmanned aerial vehicle data collection method and device, electronic device and storage medium
Song et al. ADP-based optimal sensor scheduling for target tracking in energy harvesting wireless sensor networks
Sommer et al. Information Bang for the Energy Buck: Towards Energy-and Mobility-Aware Tracking.
CN112752357A (en) Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
Chhikara et al. Federated learning for air quality index prediction using UAV swarm networks
CN117369026B (en) Real-time high-precision cloud cluster residence time prediction method
Boubin et al. Programming and deployment of autonomous swarms using multi-agent reinforcement learning
Wang et al. Joint scheduling and trajectory design for UAV-aided wireless power transfer system
Wang et al. Multi-objective path planning algorithm for mobile charger in wireless rechargeable sensor networks
US11532186B1 (en) Systems and methods for communicating data of a vehicle
CN113807501A (en) Data prediction method and device based on improved particle swarm optimization
CN115277770B (en) Unmanned aerial vehicle information collection method based on joint optimization of node access and flight strategy
CN116772811B (en) Mapping method based on unmanned aerial vehicle network topology optimization
CN113253763B (en) Unmanned aerial vehicle data collection track determination method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant