CN113268077B - Unmanned aerial vehicle energy consumption minimization design method and device - Google Patents

Unmanned aerial vehicle energy consumption minimization design method and device Download PDF

Info

Publication number
CN113268077B
CN113268077B CN202110397120.XA CN202110397120A CN113268077B CN 113268077 B CN113268077 B CN 113268077B CN 202110397120 A CN202110397120 A CN 202110397120A CN 113268077 B CN113268077 B CN 113268077B
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
information
energy
flight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110397120.XA
Other languages
Chinese (zh)
Other versions
CN113268077A (en
Inventor
张煜
熊轲
吴鹏
单葆国
谭显东
唐伟
王成洁
谭清坤
刘小聪
贾跃龙
马捷
张玉琢
吴姗姗
张成龙
王向
张莉莉
刘青
姚力
汲国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
State Grid Energy Research Institute Co Ltd
Original Assignee
Beijing Jiaotong University
State Grid Energy Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University, State Grid Energy Research Institute Co Ltd filed Critical Beijing Jiaotong University
Priority to CN202110397120.XA priority Critical patent/CN113268077B/en
Publication of CN113268077A publication Critical patent/CN113268077A/en
Application granted granted Critical
Publication of CN113268077B publication Critical patent/CN113268077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle energy consumption minimization design method and device, wherein the method comprises the following steps: calculating channel gains of the unmanned aerial vehicle and the sensor nodes; according to the channel gain of the unmanned aerial vehicle and the sensor node, calculating the energy consumed by the communication of the sensor node; according to the channel gains of the unmanned aerial vehicle and the sensor node, calculating the energy collected by the sensor node; calculating energy consumed by the unmanned aerial vehicle; calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor nodes and the residual capacity of the battery of the sensor nodes; the flight track, the flight time and the information acquisition and energy collection strategies of the unmanned aerial vehicle are jointly optimized to minimize the energy consumption of the unmanned aerial vehicle, and an optimization problem model is established; establishing an unmanned aerial vehicle control frame based on DQN; and planning the unmanned aerial vehicle flight strategy based on the DQN according to the unmanned aerial vehicle control framework based on the DQN.

Description

Unmanned aerial vehicle energy consumption minimization design method and device
Technical Field
The invention relates to the technical field of network optimization design, in particular to an energy consumption minimization design method and device for a unmanned aerial vehicle.
Background
With rapid deployment of 5G, applications such as Virtual Reality (VR), augmented Reality (AR, augmented Reality), unmanned, smart medicine, etc. are being developed, and these applications require ultra-high reliability, low latency communication, are also very sensitive to information freshness, and conventional network indexes such as throughput, delay, etc. cannot accurately characterize information freshness in a network.
In order to accurately characterize the freshness of information, the academia proposes the concept of information age (AoI, age of information), which is defined as the time elapsed since the generation of the data packet at the source node, which is received most recently by the destination node. AoI is one of the key performance indicators in a sensor network, in order to ensure the freshness of information in the network, the sensor nodes need to frequently transmit the latest collected information, and therefore a lot of energy is consumed, and frequent battery replacement or charging is generally inconvenient and expensive, and is more difficult to realize in severe environments.
Wireless power transfer (WPT, wireless power transfer) technology can provide a stable power source for sensor nodes, thereby extending system run time. Therefore, there have been many related studies on AoI systems based on rf energy harvesting. However, most of these works use a linear energy harvesting model, i.e. the harvested energy increases linearly with the input power of the received rf signal, whereas in practice, the nonlinear characteristics of the circuit caused by the diode and other elements cause the input/output of the energy harvesting circuit to exhibit a high degree of nonlinear characteristics, so that the nonlinear energy harvesting model should be considered by a practical system.
In addition, the introduction of unmanned aerial vehicles (UAV, unmanned aerial vehicle) in wireless networks has received widespread attention in order to extend the useful life of sensor nodes in wireless networks. In a wireless network environment, an unmanned aerial vehicle equipped with an energy transmitter can flexibly move to the vicinity of a sensor node to establish a line-of-sight (LoS) connection therewith while providing stable energy service. However, most unmanned aerial vehicles are powered by batteries, and have limited endurance, so energy conservation has been considered as one of the important indicators of future wireless network designs. At present, many works are researching the optimization problem of minimizing unmanned aerial vehicle energy consumption, and unmanned aerial vehicle flight trajectory optimization plays an important role in reducing unmanned aerial vehicle flight energy consumption. In addition, the energy consumption of the unmanned aerial vehicle can be reduced by optimizing the acceleration of the unmanned aerial vehicle.
The time of the channel between the unmanned aerial vehicle and the sensor nodes changes in the flight process, and meanwhile, the energy and information are transmitted to cause the change of the energy storage capacity and the information AoI of the sensor nodes, so that the state space is easy to increase rapidly along with the increase of the number of the sensor nodes, and the problem of energy consumption optimization of the unmanned aerial vehicle is difficult to solve by using traditional algorithms such as dynamic planning and the like.
In recent years, deep reinforcement learning (DRL, deep reinforcement learning) algorithms have attracted considerable attention in industry and academia. Deep reinforcement learning can overcome huge state and action space and solve more complex optimization problems than markov decision process (MDP, markov decision process) and reinforcement learning (RL, reinforcement learning) algorithms. In some areas, such as games, significant success has been achieved and optimization problems in Jie Juemo human-machine assisted networks have begun to apply. However, the existing system simply discretizes the flight area of the unmanned aerial vehicle, and the condition that the unmanned aerial vehicle turns at a large angle in the flight process cannot be avoided, so that the optimization result is inconsistent with the actual flight of the unmanned aerial vehicle. And only pay attention to how to save energy for unmanned aerial vehicles and sensor nodes, a wireless radio frequency energy collection technology is not introduced yet to supply power for low-power consumption sensor nodes.
Disclosure of Invention
The invention aims to provide a method and a device for designing energy consumption minimization of a unmanned aerial vehicle, and aims to solve the problems in the prior art.
The invention provides an unmanned aerial vehicle energy consumption minimization design method, which comprises the following steps:
calculating channel gains of the unmanned aerial vehicle and the sensor nodes;
According to the channel gain of the unmanned aerial vehicle and the sensor node, calculating the energy consumed by the communication of the sensor node;
according to the channel gains of the unmanned aerial vehicle and the sensor node, calculating the energy collected by the sensor node;
calculating energy consumed by the unmanned aerial vehicle;
calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor nodes and the residual capacity of the battery of the sensor nodes;
the flight track, the flight time and the information acquisition and energy collection strategies of the unmanned aerial vehicle are jointly optimized to minimize the energy consumption of the unmanned aerial vehicle, and an optimization problem model is established;
establishing an unmanned aerial vehicle control frame based on DQN;
and planning the unmanned aerial vehicle flight strategy based on the DQN according to the unmanned aerial vehicle control framework based on the DQN.
The embodiment of the invention also provides an unmanned aerial vehicle energy consumption minimization design device, which comprises: the unmanned aerial vehicle energy consumption minimization design method comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the steps of the unmanned aerial vehicle energy consumption minimization design method.
By adopting the embodiment of the invention, a design method is provided, unmanned aerial vehicle energy consumption minimization design of the unmanned aerial vehicle auxiliary wireless sensor network based on information age limitation is provided, and unmanned aerial vehicle energy consumption minimization is achieved by jointly optimizing sensor node information uploading and energy collection scheduling strategies and unmanned aerial vehicle flight time and trajectories under the constraint of meeting information freshness (AoI, age of information).
The foregoing description is only an overview of the present invention, and is intended to be implemented in accordance with the teachings of the present invention in order that the same may be more clearly understood and to make the same and other objects, features and advantages of the present invention more readily apparent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an unmanned energy conservation minimization design method of an embodiment of the present invention;
FIG. 2 is a system architecture diagram of an unmanned energy conservation minimization design method of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a DQN-based drone control framework of an embodiment of the present invention;
FIG. 4 is a first unmanned aerial vehicle flight trajectory diagram based on information age-constrained minimum unmanned aerial vehicle energy consumption optimization in accordance with an embodiment of the present invention;
FIG. 5 is a second unmanned aerial vehicle flight trajectory graph based on information age-constrained minimum unmanned aerial vehicle energy consumption optimization in accordance with an embodiment of the present invention;
fig. 6 is a schematic diagram of an unmanned energy consumption minimization design device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a unmanned aerial vehicle energy consumption minimization design method, which is characterized in that an unmanned aerial vehicle auxiliary energy information collection network with limited information age is used for designing an unmanned aerial vehicle auxiliary wireless energy supply network with limited information age by taking unmanned aerial vehicle energy consumption as an index. In order to solve the complex problem, a DRL algorithm is adopted for solving.
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise. Furthermore, the terms "mounted," "connected," "coupled," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Method embodiment
According to an embodiment of the present invention, there is provided a method for designing energy consumption minimization of an unmanned aerial vehicle, and fig. 1 is a flowchart of the method for designing energy consumption minimization of an unmanned aerial vehicle according to the embodiment of the present invention, as shown in fig. 1, the method for designing energy consumption minimization of an unmanned aerial vehicle according to the embodiment of the present invention specifically includes:
Step 101, calculating channel gains of the unmanned aerial vehicle and the sensor node;
step 102, calculating the energy consumed by the communication of the sensor node according to the channel gains of the unmanned aerial vehicle and the sensor node;
step 103, calculating energy collected by the sensor node according to the channel gains of the unmanned aerial vehicle and the sensor node;
104, calculating energy consumed by the unmanned aerial vehicle;
step 105, calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor node and the residual capacity of the battery of the sensor node;
step 106, establishing an optimization problem model by jointly optimizing the flight trajectory, flight time and information acquisition and energy collection strategies of the unmanned aerial vehicle to minimize the energy consumption of the unmanned aerial vehicle;
step 107, establishing an unmanned aerial vehicle control frame based on DQN;
and step 108, planning the unmanned aerial vehicle flight strategy based on the DQN according to the DQN-based unmanned aerial vehicle control framework.
The above processing steps are described in detail below with reference to the drawings.
As shown in fig. 2, in the unmanned aerial vehicle auxiliary wireless network scene, the unmanned aerial vehicle takes off from a departure point, in a task time T, energy is transmitted to K sensor nodes randomly distributed on the ground by using a WPT technology, then the K sensors collect information by using the collected energy and upload the collected information to the unmanned aerial vehicle, and the unmanned aerial vehicle flies to a destination after collecting the information of all the nodes.
Wherein the sensor nodes are denoted as set v= { V 1 ,...,v K }. For convenience, the time T is divided into N time slices, i.e., t=n·δ, where δ represents a sufficiently small time gap. Assuming that the fixed flying height of the unmanned aerial vehicle is H, the horizontal coordinate of the departure point of the unmanned aerial vehicle is q 0 =[x 0 ,y 0 ]The horizontal coordinate of the end point is q F =[x F ,y F ]In N (N e [1,2 ], n.]) At the moment, the flying horizontal coordinate of the unmanned plane is q (n) = [ x (n), y (n)]。
The unmanned aerial vehicle energy consumption minimization design method provided by the embodiment of the invention specifically comprises the following steps:
the first step: and calculating the channel gains of the unmanned aerial vehicle and the sensor nodes.
Because the unmanned aerial vehicle flies on a certain height, a LoS link can be established with the sensor node, meanwhile, small-scale fading is considered, and a Lais fading channel model is adopted to describe the channel. At the nth time, the unmanned plane and the kth sensor node v k The channel gain of (2) can be expressed as
Figure BDA0003018962220000061
wherein du,k Is the unmanned plane to the sensor node v k Is a distance of (2);
β 0 expressed at a reference distance d 0 Channel gain=1 meter;
the I & I is the Euclidean distance;
g k for a small scale fading coefficient,
Figure BDA0003018962220000071
wherein KR Refers to the rice factor of the unmanned aerial vehicle and sensor node channels;
Figure BDA0003018962220000072
is the line of sight component, is>
Figure BDA0003018962220000073
Is a scattering component representing a zero-mean unit variance Circularly Symmetric Complex Gaussian (CSCG) random variable.
And a second step of: the energy consumed by the sensor node communication is calculated.
In order to avoid interference, the unmanned aerial vehicle performs energy propagation and information transmission with the sensor nodes through time division multiple access (TDMA, time division multiple access), so that at each moment, the sensor nodes can only perform energy collection or information transmission, and at most, only one sensor node can upload updated information at a time during information transmission. Therefore, according to shannon's formula, at the nth time, the sensor node v k The energy required to transmit information can be calculated by
Figure BDA0003018962220000074
Wherein S and B are the packet size and channel bandwidth, respectively.
And a third step of: the energy collected by the sensor nodes is calculated.
Unmanned aerial vehicle is from q 0 During the flying process, each sensor node is charged in a broadcasting mode. Sensor node v k Is equipped with an energy collector, the input power received by the radio frequency circuit of which can be calculated by the following formula
e k (n)=p u g u,k (n)
wherein pu Representing the transmit power of the drone.
A nonlinear energy harvesting model is used to characterize the rf to dc conversion. Thus, at the nth time v k Is calculated by the following formula
Figure BDA0003018962220000075
Where M represents the maximum harvested power at the energy harvesting receiver when the energy harvesting circuit is saturated, a and b represent constants related to the actual circuit sensitivity, resistance, and the like, respectively. Then v k The battery storage capacity change process of (2) can be calculated by the following formula
Figure BDA0003018962220000081
wherein Bmax Expressed as sensor node battery maximum capacity;
c (n) represents a policy of n-th time information transmission and energy collection, i.e., c (n) = {0,1,2,.. The term, K }, when c (n) = 0, it means that all nodes perform energy collection only; when c (n) =k (K e {1,2,..once., K }), the kth sensor node v is represented k Uploading information, only v k The battery stores more energy than is consumed by transmitting information to successfully upload the information.
Fourth step: and calculating the energy consumed by the unmanned aerial vehicle.
The energy consumption of the unmanned aerial vehicle system mainly comprises two parts of propulsion energy of the unmanned aerial vehicle and energy related to unmanned aerial vehicle communication, such as signal processing, radiation and the like, wherein the energy related to communication is far smaller than the propulsion energy, so that the energy consumption related to communication can be ignored. The flight energy consumption of a fixed wing unmanned aerial vehicle can be calculated by the following formula:
Figure BDA0003018962220000082
wherein c1 and c2 Are two constants that relate to aircraft weight, wing area, air density, etc.;
g represents gravitational acceleration;
m is the mass of the unmanned aerial vehicle;
v (n) and a (n) represent the speed and acceleration of the drone, respectively.
The propulsion energy consumption of an unmanned aerial vehicle is mainly related to the speed and acceleration of the unmanned aerial vehicle. Additionally, the flight trajectory of the drone may be calculated by:
Figure BDA0003018962220000083
wherein p (n) and o (n) respectively represent the flight angle and the change angle of the unmanned plane at the nth moment.
Fifth step: the sensor information age is calculated AoI.
The information age AoI is an important index for describing the freshness of the collected information, and is defined as the time elapsed since the last information collected by the unmanned aerial vehicle was generated. Suppose U k (n) represents the last slave v of the unmanned plane at the nth time k The time of collecting the information, the information age AoI of the information can be calculated by the following formula according to the definition of the information age AoI:
A k (n)=(n-U k (n))δ。
without loss of generality, it is assumed that the information age AoI at the information generation time is 1 normalized unit of time, with the information age increasing by 1 every time a time passes. Once new information is generated, the original information will be overridden and the information age will also be reduced to 1.v k The change in the information AoI at T can be calculated by the following formula
Figure BDA0003018962220000091
Sixth step: and establishing an optimization problem model.
The unmanned aerial vehicle energy consumption is minimized by jointly optimizing the flight trajectory, the flight time and the strategies of information acquisition and energy collection of the unmanned aerial vehicle. The mathematical description of the problem is as follows:
Figure BDA0003018962220000092
s.t.A k (N)<(N+1)δ,k∈{1,2,...,K},
Figure BDA0003018962220000093
q(0)=q 0 ,q(N)=q F ,
a(n)∈[a min ,a max ],
o(n)∈[o min ,o max ],
v(n)∈[v min ,v max ].
Wherein constraint A k (N) < (n+1) δ, K e {1, 2..once, K }, ensuring that the drone gathers each sensor node information at least once for updating;
constraint
Figure BDA0003018962220000094
Ensuring that information AoI collected by unmanned aerial vehicle meets maximum information age limit, A r AoI, which represents the maximum information age allowed by the system;
constraint q (0) =q 0 ,q(N)=q F The unmanned aerial vehicle starts from an initial place, completes tasks through T and reaches a destination;
constraint a (n) ∈a min ,a max ]The acceleration of the unmanned aerial vehicle is limited, so that normal flight is ensured;
constraint o (n) ∈o min ,o max ]The flight angle of the unmanned aerial vehicle is limited, so that normal flight is ensured;
constraint v (n) ∈v min ,v max ]And the speed of the unmanned aerial vehicle is limited, so that normal flight is ensured.
wherein ,P0 The non-convex problem of nonlinear integer optimization cannot be directly solved. Due to the independence between controlling acceleration, rotation angle, electric quantity stored by sensor nodes and AoI of collected information of unmanned aerial vehicle, the problem P can be solved 0 Modeling is a markov decision problem with finite states and action space. However, the vast state space makes the problem difficult to solve using the traditional standard markov algorithm, whereas neural networks in deep reinforcement learning are good at extracting high-dimensional data features, and thus deep reinforcement learning methods are employed to solve the problem.
Seventh step: DQN-based unmanned aerial vehicle control framework.
As shown in fig. 3, the unmanned plane control frame based on DQN (deep Q network) comprises an environment and an intelligent agent, and is formed in the methodIn the patent DQN algorithm, the unmanned aerial vehicle is regarded as an intelligent agent, and flying motion in the air, interaction with sensor nodes such as information energy transmission and the like are regarded as environments. At the nth time in each training period, the agent needs to sense the ambient state s n Information including the power of the sensor node, aoI of the uploaded information, etc., so as to determine the action a at the next moment according to the current environment n After the action is executed, the intelligent agent can obtain the feedback rewards r corresponding to the environment n And continue to observe the next time state s n+1 . Therefore, the training process of the intelligent agent continuously interacts with the environment, the change of the state after the action is executed and the rewards of the environment feedback are observed to adjust the action strategy, and the iterative learning is repeated, so that the accumulated return is maximized, and a better action strategy is obtained.
In each iteration process of the agent, the value function of the agent needs to be calculated for a given strategy, the strategy is given according to the value function, and the DQN algorithm is a method for non-linearly approximating the value function Q (s n ,a nn ) Assessment state s n Take action a n Is the cost of the artificial neural network (ANN, artificial neural networks), the value function Q (s n ,a nn ) The update rule of (2) is as follows:
Figure BDA0003018962220000101
where β is the learning rate, γ is the discount factor, and a represents the action selected for execution.
The DQN algorithm consists of two neural networks, an online network and a target network, the initial weights of which are the same, the online network being updated every iteration, and the target network being updated at intervals. In addition, to break the correlation between data, the DQN algorithm uses an empirical pool to store historical data from which data is randomly extracted each time training is performed. Taking into account the influence of energy, aoI and position dimensions in the state space, the historical data needs to be standardized before storing the data, i
Figure BDA0003018962220000111
wherein xo Is the original data, x s Is normalized data, < >>
Figure BDA0003018962220000112
Sigma is the mean value of the original data x Is the raw data variance. For any set of history data (s n ,a n ,r n ,s n+1 ) An online network may be trained by minimizing a loss function defined as follows:
Figure BDA0003018962220000113
where j is the number of rounds of updating the weights. Thus, the online network is updated using a gradient descent method, the gradient of which is
Figure BDA0003018962220000114
Eighth step: and planning the unmanned aerial vehicle flight strategy based on the DQN.
The unmanned aerial vehicle control algorithm based on the DQN is used for converting the unmanned aerial vehicle energy consumption minimization optimization problem into a Markov decision process and is described by triplets { s, a, r }, wherein s represents the state of an intelligent agent, a represents the action executed by the intelligent agent, and r represents the rewards of environmental feedback after the intelligent agent executes the action.
1) State of agent
At the nth time, agent state s n Comprises two parts: the state information of the sensor node and the unmanned aerial vehicle at the nth moment comprises the current stored energy beta of the sensor node, the freshness A of information collected by the unmanned aerial vehicle, the geographic position, the flying speed, the flying angle, the energy consumption, the flying time and the distance from the terminal point of the unmanned aerial vehicle. Thus, at the nth time, agent state s n Can be expressed as
s n ={Β,Α,q(n),v(n),p(n),E(n),n,d(n)},
wherein Β=[B1 ,B 2 ,...,B K ],Α=[A 1 ,A 2 ,...,A K ],B k ,A k AoI, which respectively represent the electric quantity stored by the kth sensor node and the acquired information;
d (n) is the position of the unmanned aerial vehicle at the nth moment from the end point, and is defined as d (n) = |q (n) -q F ||;
The state space of the intelligent agent is
Figure BDA0003018962220000115
wherein sn ∈S。
2) The agent performs an action
At the nth time, the agent performs action a n Comprises three parts: acceleration a (n), rotation angle o (n) and node information uploading and energy collecting strategy c (n) of unmanned aerial vehicle, thus action a n Can be expressed as
a n ={a(n),o(n),c(n)}。
3) Reward function
The design of the reward function mainly comprises two parts, namely the distance between the unmanned aerial vehicle and the terminal point and the energy consumed by the unmanned aerial vehicle. If the current state satisfies the problem P 0 A certain reward is given if the constraint condition is violated, and the constraint condition is punished, so that at the nth moment, the reward function of the environmental feedback after the intelligent agent performs the action can be designed as follows:
Figure BDA0003018962220000121
wherein
Figure BDA0003018962220000122
Represents the energy consumption, k, of the unmanned plane from the departure to the nth time 1 ,k 2 ,k 3 Is a positive constant.
The flow of the DQN-based unmanned control algorithm is shown in the algorithm shown in table 1.
First, an experience pool is initialized to store training data and randomly initialize online network parameters, and a target network is introduced in the same network structure as the online network, and the same neural network parameters as the online network are assigned.
Then, initializing an initial state of the unmanned aerial vehicle in an Episodes training period, normalizing the processing state, and exploring the environment by using an epsilon-greedy strategy within the maximum allowable task time T of the unmanned aerial vehicle, namely randomly selecting actions with a certain exploration probability epsilon when selecting the actions.
After the action is performed, the agent can obtain the rewards of environmental feedback and the state s at the next moment n+1 At the same time { s } n ,a n ,r n ,s n+1 The training data is normalized and stored in the experience pool D.
If the unmanned aerial vehicle task is completed, the current flight task is exited, and the next training is performed. And (3) randomly selecting b groups of training data from the experience pool by adopting a small batch method to break the correlation between the data and update the gradient according to (1) after each flight task is finished.
And finally, synchronously updating the weight of the online network to the target network every J steps.
TABLE 1
Figure BDA0003018962220000131
Examples of applications of embodiments of the present invention are shown in fig. 4 and 5.
Device embodiment
The embodiment of the invention provides an unmanned aerial vehicle energy consumption minimization design device, as shown in fig. 6, comprising: a memory 60, a processor 62 and a computer program stored on the memory 60 and executable on the processor 62, which when executed by the processor 62 carries out the steps as described in the method embodiments.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a program for carrying out information transmission, which when executed by the processor 62, carries out the steps as described in the method embodiments.
The computer readable storage medium of the present embodiment includes, but is not limited to: ROM, RAM, magnetic or optical disks, etc.
It should be noted that, the embodiments related to the storage medium in the present specification and the embodiments related to the energy consumption minimization design method of the drone in the present specification are based on the same inventive concept, so the specific implementation of this embodiment may refer to the implementation of the corresponding energy consumption minimization design method of the drone, and the repetition is not repeated.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 30 s of the 20 th century, improvements to one technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software (improvements to the process flow). However, with the development of technology, many improvements of the current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, an improvement of a method flow cannot be said to be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before the compiling is also written in a specific programming language, which is called hardware description language (Hardware Description Language, HDL), but not just one of the hdds, but a plurality of kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), lava, lola, myHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each unit may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present specification.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is by way of example only and is not intended to limit the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present document are intended to be included within the scope of the claims of the present document.

Claims (10)

1. An unmanned aerial vehicle energy consumption minimization design method, characterized by comprising:
calculating channel gains of the unmanned aerial vehicle and the sensor nodes;
according to the channel gain of the unmanned aerial vehicle and the sensor node, calculating the energy consumed by the communication of the sensor node;
according to the channel gains of the unmanned aerial vehicle and the sensor node, calculating the energy collected by the sensor node;
calculating energy consumed by the unmanned aerial vehicle;
Calculating the age AoI of the sensor information according to the energy consumed by the communication of the sensor nodes and the residual capacity of the battery of the sensor nodes;
the flight track, the flight time and the information acquisition and energy collection strategies of the unmanned aerial vehicle are jointly optimized to minimize the energy consumption of the unmanned aerial vehicle, and an optimization problem model is established; the method specifically comprises the following steps: mathematical description is carried out on an optimized problem model, and a deep reinforcement learning method is adopted to carry out problem
Figure QLYQS_1
Modeling to solve a Markov decision problem with a finite state and an action space;
establishing an unmanned aerial vehicle control frame based on DQN; the method specifically comprises the following steps: establishing an environment and an intelligent body of an unmanned aerial vehicle control frame based on DQN, regarding the unmanned aerial vehicle as an intelligent body, regarding the flying motion in the air and the interaction with the sensor nodes as the environment, such as information and energy transmission;
the first in each training periodnAt any time, the agent needs to sense the state of the surrounding environments n The information including the electric quantity of the sensor node and the uploading information AoI is used for determining the action at the next moment according to the current environment
Figure QLYQS_2
After the action is executed, the agent obtains the environmentCorresponding feedback rewards->
Figure QLYQS_3
And continue to observe the next moment state +.>
Figure QLYQS_4
The training process of the intelligent agent continuously interacts with the environment, changes of states after the action is executed and rewards of environment feedback are observed to adjust the action strategy, and iterative learning is repeated, so that accumulated returns are maximized, and a better action strategy is obtained;
Calculating a value function of a given strategy in each iteration process of the intelligent agent, giving the strategy according to the value function, and adopting a DQN algorithm to approach the value function through a neural network in a nonlinear manner
Figure QLYQS_5
Assessment state s n Take action a n At the expense of->
Figure QLYQS_6
Is the weight of the artificial neural network;
normalizing historical data prior to storing the data, i.e.
Figure QLYQS_7
, wherein x o Is the original data,x s Is normalized data, < >>
Figure QLYQS_8
For the original data mean>
Figure QLYQS_9
For the original data variance +.>
Figure QLYQS_10
On-line network training by minimizing loss function
Updating the online network by using a gradient descent method;
according to the DQN-based unmanned aerial vehicle control frame, an unmanned aerial vehicle flight strategy is planned based on the DQN; the method specifically comprises the following steps: by making a triplet
Figure QLYQS_13
To describe an unmanned aerial vehicle control algorithm based on DQN, to convert unmanned aerial vehicle energy minimization optimization problem into a Markov decision process, wherein +.>
Figure QLYQS_22
Indicating the status of the agent->
Figure QLYQS_25
Representing actions performed by the agent,/->
Figure QLYQS_14
Rewards representing environmental feedback after the agent performs the action; wherein in->
Figure QLYQS_19
At each moment, the state of the agent->
Figure QLYQS_27
Comprises two parts: in the first placenTime sensor node and status information of unmanned aerial vehicle, including the current stored energy of the sensor node +. >
Figure QLYQS_34
Freshness of unmanned aerial vehicle collection information +.>
Figure QLYQS_15
The geographic position, the flight speed, the flight angle, the energy consumption, the flight time and the distance from the end point of the unmanned aerial vehicle; thus, at the first
Figure QLYQS_23
At each moment, the state of the agent->
Figure QLYQS_24
Denoted as->
Figure QLYQS_31
, wherein />
Figure QLYQS_17
Figure QLYQS_18
,/>
Figure QLYQS_29
,/>
Figure QLYQS_33
Respectively denoted as->
Figure QLYQS_12
AoI the residual electric quantity of each sensor node and the acquired information; />
Figure QLYQS_21
Is unmanned plane in the firstnThe position of the moment from the end point is defined as +.>
Figure QLYQS_26
;q F Is the horizontal coordinate of the end point, q (n) is the flight horizontal coordinate of the unmanned plane, v (n) is the flight speed, and p (n) is the +.>
Figure QLYQS_30
The flying angle of the unmanned plane at moment, E (n) is energy consumption, and the state space of the intelligent body is +.>
Figure QLYQS_11
, wherein />
Figure QLYQS_32
The method comprises the steps of carrying out a first treatment on the surface of the In->
Figure QLYQS_35
At the moment of time of day,the intelligent agent performs action +.>
Figure QLYQS_36
Comprises three parts: acceleration of unmanned aerial vehiclea(n) Corner of the vehicleo(n) Node information upload and energy collection policiesc(n) Thus, act->
Figure QLYQS_16
Denoted as->
Figure QLYQS_20
The method comprises the steps of carrying out a first treatment on the surface of the The reward function includes the distance between the unmanned aerial vehicle and the terminal point and the energy consumed by the unmanned aerial vehicle, if the current state satisfies the question +.>
Figure QLYQS_28
A certain prize is given, and if the constraint condition is violated, the constraint condition is punished.
2. The method of claim 1, wherein calculating channel gains of the drone and the sensor node specifically comprises:
Assume that the sensor nodes are recorded as a set
Figure QLYQS_39
Time +.>
Figure QLYQS_42
Is divided into->
Figure QLYQS_45
Time slices, i.e.)>
Figure QLYQS_38
, wherein />
Figure QLYQS_40
Representing a sufficiently small time gap, the fixed altitude of flight of the unmanned aerial vehicle is +.>
Figure QLYQS_43
The horizontal coordinates of the departure point of the unmanned plane are
Figure QLYQS_46
The horizontal coordinate of the end point is +.>
Figure QLYQS_37
In->
Figure QLYQS_41
At the moment, the flying horizontal coordinate of the unmanned plane is +.>
Figure QLYQS_44
According to equation 1, atnAt moment, unmanned aerial vehicle and first unmanned aerial vehicle are calculatedkIndividual sensor nodes
Figure QLYQS_47
Channel gain of (c):
Figure QLYQS_48
equation 1;
wherein ,
Figure QLYQS_50
is unmanned plane to sensor node +.>
Figure QLYQS_53
Distance of->
Figure QLYQS_56
Expressed in reference distance->
Figure QLYQS_51
Channel gain in meters; />
Figure QLYQS_52
Is European distance, ++>
Figure QLYQS_54
For small scale fading coefficients +.>
Figure QLYQS_57
, wherein />
Figure QLYQS_49
Is the Lesi factor of the unmanned plane and sensor node channel, +.>
Figure QLYQS_55
Is the line of sight component, is>
Figure QLYQS_58
Is a scattering component, represents a zero-mean unit variance circularly symmetric complex Gaussian CSCG random variable, and u represents an unmanned aerial vehicle.
3. The method of claim 2, wherein calculating the energy consumed by the sensor node in communication based on the channel gains of the drone and the sensor node comprises:
according to shannon's formula, at
Figure QLYQS_59
At the moment, the sensor node is calculated based on equation 2 +.>
Figure QLYQS_60
The energy required to transmit information:
Figure QLYQS_61
equation 2;
wherein ,
Figure QLYQS_62
and />
Figure QLYQS_63
Packet size and channel bandwidth, respectively。
4. A method according to claim 3, characterized in that calculating the energy collected by a sensor node from the channel gains of the drone and the sensor node comprises in particular:
suppose that unmanned aerial vehicle is from
Figure QLYQS_64
Starting from, during the flight, each sensor node is charged by broadcasting, sensor node +.>
Figure QLYQS_65
Equipped with an energy harvester, the input power received by its radio frequency circuit is calculated according to equation 3:
Figure QLYQS_66
equation 3;
wherein ,
Figure QLYQS_67
representing the transmitting power of the unmanned aerial vehicle;
using a nonlinear energy harvesting model to characterize the RF-to-DC conversion, the method is calculated at equation 4nAt a moment of
Figure QLYQS_68
Collected energy:
Figure QLYQS_69
equation 4; />
wherein ,Mrepresents the maximum harvested power at the energy harvesting receiver when the energy harvesting circuit is saturated,aandbconstants related to the actual circuit sensitivity and resistance are shown respectively;
calculated according to equation 5
Figure QLYQS_70
Is a battery capacity of (a)The change process comprises the following steps:
Figure QLYQS_71
equation 5;
wherein ,
Figure QLYQS_72
expressed as sensor node battery maximum capacity, < >>
Figure QLYQS_73
Represent the firstnStrategies for time information transmission and energy harvesting, i.e. +.>
Figure QLYQS_74
When->
Figure QLYQS_75
When the time indicates that all nodes only collect energy; when->
Figure QLYQS_76
At the time, represent the first kIndividual sensor nodes->
Figure QLYQS_77
Upload information, only->
Figure QLYQS_78
The energy remained by the battery is larger than the energy consumed by the information transmission to successfully upload the information.
5. The method of claim 4, wherein calculating the energy consumed by the drone comprises:
and calculating the flight energy consumption of the fixed-wing unmanned aerial vehicle according to a formula 6:
Figure QLYQS_79
equation 6;
wherein ,
Figure QLYQS_80
and />
Figure QLYQS_81
Is a function of two constants, namely,gindicating the acceleration of gravity and,mis the mass of unmanned plane, < >>
Figure QLYQS_82
and />
Figure QLYQS_83
Respectively representing the speed and acceleration of the unmanned aerial vehicle, < ->
Figure QLYQS_84
Is the task time;
and calculating the flight trajectory of the unmanned aerial vehicle according to a formula 7:
Figure QLYQS_85
equation 7;
wherein ,
Figure QLYQS_86
and />
Figure QLYQS_87
Respectively represent +.>
Figure QLYQS_88
The flying angle and the changing angle of the unmanned plane at any moment.
6. The method of claim 5, wherein calculating the age AoI of the sensor information based on the energy consumed by the sensor node communication and the remaining power of the sensor node battery comprises:
assume that
Figure QLYQS_89
Is indicated at +.>
Figure QLYQS_90
The unmanned plane is from the last time +.>
Figure QLYQS_91
The information age AoI of the information is calculated according to the formula 8 according to the definition of the information age AoI when the information is collected:
Figure QLYQS_92
equation 8;
assuming that the information age AoI at the information generation time is 1 normalized unit time, the information age is increased by 1 every time a time passes, and once new information is generated, the original information is covered, the information age is also reduced by 1, and the information is calculated according to the formula 9
Figure QLYQS_93
AoI of information in->
Figure QLYQS_94
Is characterized by comprising the following steps:
Figure QLYQS_95
equation 9;
wherein ,
Figure QLYQS_96
representing a sufficiently small time gap.
7. The method of claim 6, wherein establishing the optimization problem model by jointly optimizing the unmanned aerial vehicle's flight trajectory, time of flight, and strategies for information acquisition and energy collection to minimize unmanned aerial vehicle energy consumption specifically comprises:
mathematical description is carried out on the optimized problem model according to the formula 10, and the problem is subjected to a deep reinforcement learning method
Figure QLYQS_97
Modeling as a Markov decision problem with finite state and action space:
Figure QLYQS_98
equation 10;
wherein ,A r AoI, which represents the maximum information age allowed by the system; constraint
Figure QLYQS_99
The unmanned aerial vehicle is ensured to acquire the information of each sensor node at least once for updating; constraint->
Figure QLYQS_100
Ensuring that the information AoI collected by the unmanned aerial vehicle meets the maximum information age limit; constraint->
Figure QLYQS_101
Indicating that the unmanned plane starts from the initial place and passes +.>
Figure QLYQS_102
Completing the task and reaching the destination; constraint->
Figure QLYQS_103
The acceleration of the unmanned aerial vehicle is limited, so that normal flight is ensured; constraint->
Figure QLYQS_104
The flight angle of the unmanned aerial vehicle is limited, so that normal flight is ensured; constraint->
Figure QLYQS_105
And the speed of the unmanned aerial vehicle is limited, so that normal flight is ensured.
8. The method of claim 7, wherein establishing a DQN-based unmanned aerial vehicle control framework comprises:
Establishing an environment and an intelligent body of an unmanned aerial vehicle control frame based on DQN, regarding the unmanned aerial vehicle as an intelligent body, regarding the flying motion in the air and the interaction with the sensor nodes as the environment, such as information and energy transmission;
the first in each training periodnAt any time, the agent needs to sense the state of the surrounding environments n The information including the electric quantity of the sensor node and the uploading information AoI is used for determining the action at the next moment according to the current environment
Figure QLYQS_106
After executing the action, the agent gets the feedback rewards corresponding to the environment +.>
Figure QLYQS_107
And continue to observe the next moment state +.>
Figure QLYQS_108
The training process of the intelligent agent continuously interacts with the environment, changes of states after the action is executed and rewards of environment feedback are observed to adjust the action strategy, and iterative learning is repeated, so that accumulated returns are maximized, and a better action strategy is obtained;
calculating a value function of a given strategy in each iteration process of the intelligent agent, giving the strategy according to the value function, and adopting a DQN algorithm to approach the value function through a neural network in a nonlinear manner
Figure QLYQS_109
Assessment state s n Take action a n At the expense of->
Figure QLYQS_110
Is the weight of the artificial neural network, the value function +.>
Figure QLYQS_111
The update rule of (2) is as shown in formula 11:
Figure QLYQS_112
Equation 11;
wherein ,
Figure QLYQS_113
for learning rate->
Figure QLYQS_114
Is a discount factor that is used to determine the discount,arepresenting the action selected for execution;
normalizing historical data prior to storing the data, i.e.
Figure QLYQS_115
, wherein x o Is the original data,x s Is normalized data, < >>
Figure QLYQS_116
For the original data mean>
Figure QLYQS_117
For the original data variance +.>
Figure QLYQS_118
The online network is trained by minimizing a loss function whose definition is shown in equation 12:
Figure QLYQS_119
equation 12;
wherein ,
Figure QLYQS_120
the number of rounds for updating the weights;
the online network is updated using a gradient descent method, the gradient of which is shown in equation 13:
Figure QLYQS_121
equation 13.
9. The method of claim 8, wherein planning an unmanned aerial vehicle flight strategy based on DQN in accordance with the DQN-based unmanned aerial vehicle control framework comprises:
by making a triplet
Figure QLYQS_127
To describe an unmanned aerial vehicle control algorithm based on DQN, to convert unmanned aerial vehicle energy minimization optimization problem into a Markov decision process, wherein +.>
Figure QLYQS_134
Indicating the status of the agent->
Figure QLYQS_139
Representing actions performed by the agent,/->
Figure QLYQS_128
Rewards representing environmental feedback after the agent performs the action; wherein in->
Figure QLYQS_133
At each moment, the state of the agent->
Figure QLYQS_140
Comprises two parts: in the first placenTime sensor node and status information of unmanned aerial vehicle, including the current stored energy of the sensor node +. >
Figure QLYQS_146
Freshness of unmanned aerial vehicle collection information +.>
Figure QLYQS_123
The geographic position, the flight speed, the flight angle, the energy consumption, the flight time and the distance from the end point of the unmanned aerial vehicle; thus, in->
Figure QLYQS_132
At each moment, the state of the agent->
Figure QLYQS_137
Denoted as->
Figure QLYQS_143
, wherein
Figure QLYQS_126
,/>
Figure QLYQS_130
,/>
Figure QLYQS_136
,/>
Figure QLYQS_142
Respectively denoted as->
Figure QLYQS_125
AoI the residual electric quantity of each sensor node and the acquired information; />
Figure QLYQS_131
Is unmanned plane in the firstnThe position of the moment from the end point is defined as +.>
Figure QLYQS_144
, q F Is the horizontal coordinate of the end point, q (n) is the flight horizontal coordinate of the unmanned plane, v (n) is the flight speed, and p (n) is the +.>
Figure QLYQS_148
The flying angle of the unmanned plane at moment, E (n) is energy consumption; the state space of the intelligent agent is->
Figure QLYQS_122
, wherein />
Figure QLYQS_129
The method comprises the steps of carrying out a first treatment on the surface of the In->
Figure QLYQS_135
At each moment, the agent performs an action +.>
Figure QLYQS_141
Comprises three parts: acceleration of unmanned aerial vehiclea(n) Corner of the vehicleo(n) Node information upload and energy collection policiesc(n) Thus, act->
Figure QLYQS_124
Denoted as->
Figure QLYQS_138
The method comprises the steps of carrying out a first treatment on the surface of the The reward function includes the distance between the unmanned aerial vehicle and the terminal point and the energy consumed by the unmanned aerial vehicle, if the current state satisfies the question +.>
Figure QLYQS_145
If the constraint condition is violated, the constraint condition is penalized, and therefore, the constraint condition is +.>
Figure QLYQS_147
At each moment, the rewarding function of the environmental feedback after the agent performs the action is as shown in formula 14:
Figure QLYQS_149
equation 14;
wherein ,
Figure QLYQS_150
indicating the departure to the +.>
Figure QLYQS_151
Energy consumption from time of day->
Figure QLYQS_152
Is a positive constant;
the unmanned aerial vehicle control algorithm based on DQN specifically comprises:
the training data is stored through an initialization experience pool, online network parameters are initialized randomly, a target network is introduced in the same network structure as the online network, and the neural network parameters which are the same as the online network are assigned;
at the position of
Figure QLYQS_153
In the training period, initializing the initial state of the unmanned aerial vehicle, normalizing the processing state, and enabling the maximum allowable task time of the unmanned aerial vehicle to be +.>
Figure QLYQS_154
Internal use->
Figure QLYQS_155
Greedy strategy exploration environment, i.e. with a certain exploration probability when selecting actions>
Figure QLYQS_156
Randomly selecting actions;
after the action is executed, the agent gets the rewards of environmental feedback and the state at the next moment
Figure QLYQS_157
At the same time will->
Figure QLYQS_158
The training data is normalized and then stored in an experience pool D; />
If the unmanned aerial vehicle task is completed, exiting the current flight task, entering the next round of training, finishing each flight task, and randomly selecting from an experience pool by adopting a small batch method
Figure QLYQS_159
Training data to break the correlation between the data and update the gradient according to the agent state;
every other interval
Figure QLYQS_160
And synchronously updating the weight of the online network to the target network.
10. An unmanned aerial vehicle energy minimisation design apparatus, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the unmanned aerial vehicle energy minimisation design method as claimed in any one of claims 1 to 9.
CN202110397120.XA 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device Active CN113268077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110397120.XA CN113268077B (en) 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110397120.XA CN113268077B (en) 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device

Publications (2)

Publication Number Publication Date
CN113268077A CN113268077A (en) 2021-08-17
CN113268077B true CN113268077B (en) 2023-06-09

Family

ID=77228836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110397120.XA Active CN113268077B (en) 2021-04-13 2021-04-13 Unmanned aerial vehicle energy consumption minimization design method and device

Country Status (1)

Country Link
CN (1) CN113268077B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406974B (en) * 2021-08-19 2021-11-02 南京航空航天大学 Learning and resource joint optimization method for unmanned aerial vehicle cluster federal learning
CN113938830B (en) * 2021-09-24 2023-03-24 北京邮电大学 Unmanned aerial vehicle base station deployment method and device
CN113973281B (en) * 2021-10-26 2022-09-27 深圳大学 Unmanned aerial vehicle Internet of things system and method for balancing energy consumption and service life of sensor
CN114071482B (en) * 2021-11-11 2024-03-26 浙江工业大学 Network throughput optimization method under AoI constraint in cognitive radio network
CN114630335B (en) * 2022-03-11 2023-09-08 西安电子科技大学 Low-energy-consumption high-dynamic air network coverage method for guaranteeing timeliness
CN114697975B (en) * 2022-04-11 2024-01-05 东南大学 Unmanned aerial vehicle cluster distributed deployment method for enhancing land wireless coverage
CN114783215B (en) * 2022-04-18 2023-05-26 中国人民解放军战略支援部队信息工程大学 Unmanned aerial vehicle clustering method and device and electronic equipment
CN115037638B (en) * 2022-06-14 2023-10-20 北京邮电大学 Unmanned aerial vehicle network data acquisition and transmission control method with low energy consumption and high timeliness
CN115277770B (en) * 2022-07-20 2023-04-25 华北电力大学(保定) Unmanned aerial vehicle information collection method based on joint optimization of node access and flight strategy
CN115113639B (en) * 2022-07-25 2023-05-05 中国人民解放军32370部队 Unmanned aerial vehicle flight control and simulation training method and device
CN115265549B (en) * 2022-09-27 2022-12-27 季华实验室 Unmanned aerial vehicle path planning method and device and electronic equipment
CN115278849B (en) * 2022-09-29 2022-12-20 香港中文大学(深圳) Transmission opportunity and power control method for dynamic topology of unmanned aerial vehicle
CN115857556B (en) * 2023-01-30 2023-07-14 中国人民解放军96901部队 Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045443B (en) * 2018-10-11 2021-07-02 北京航空航天大学 Unmanned aerial vehicle communication network movement control method, device, equipment and storage medium
CN110364031B (en) * 2019-07-11 2020-12-15 北京交通大学 Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network
CN110417456B (en) * 2019-07-24 2020-06-16 北京交通大学 Information transmission method based on unmanned aerial vehicle
CN111277320B (en) * 2020-01-21 2021-06-11 北京大学 Method and device for track design and interference management of cellular network connection unmanned aerial vehicle
CN111479239B (en) * 2020-04-29 2022-07-12 南京邮电大学 Sensor emission energy consumption optimization method of multi-antenna unmanned aerial vehicle data acquisition system

Also Published As

Publication number Publication date
CN113268077A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN113268077B (en) Unmanned aerial vehicle energy consumption minimization design method and device
Bouhamed et al. A UAV-assisted data collection for wireless sensor networks: Autonomous navigation and scheduling
Bellemare et al. Autonomous navigation of stratospheric balloons using reinforcement learning
Chen et al. Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks
US11062235B1 (en) Predictive power management in a wireless sensor network
Prasetia et al. Mission-based energy consumption prediction of multirotor UAV
US10691133B1 (en) Adaptive and interchangeable neural networks
CN116011511A (en) Machine learning model scaling system for power aware hardware
CN113377131B (en) Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning
Li et al. Online velocity control and data capture of drones for the internet of things: An onboard deep reinforcement learning approach
US11562653B1 (en) Systems and methods for in-flight re-routing of an electric aircraft
CN112671451A (en) Unmanned aerial vehicle data collection method and device, electronic device and storage medium
Song et al. ADP-based optimal sensor scheduling for target tracking in energy harvesting wireless sensor networks
Sommer et al. Information Bang for the Energy Buck: Towards Energy-and Mobility-Aware Tracking.
CN112752357A (en) Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
Chhikara et al. Federated learning for air quality index prediction using UAV swarm networks
CN117369026B (en) Real-time high-precision cloud cluster residence time prediction method
Boubin et al. Programming and deployment of autonomous swarms using multi-agent reinforcement learning
Wang et al. Joint scheduling and trajectory design for UAV-aided wireless power transfer system
Wang et al. Multi-objective path planning algorithm for mobile charger in wireless rechargeable sensor networks
US11532186B1 (en) Systems and methods for communicating data of a vehicle
CN113807501A (en) Data prediction method and device based on improved particle swarm optimization
CN115277770B (en) Unmanned aerial vehicle information collection method based on joint optimization of node access and flight strategy
CN116772811B (en) Mapping method based on unmanned aerial vehicle network topology optimization
CN113253763B (en) Unmanned aerial vehicle data collection track determination method, system and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant