CN111752304B - Unmanned aerial vehicle data acquisition method and related equipment - Google Patents

Unmanned aerial vehicle data acquisition method and related equipment

Info

Publication number
CN111752304B
CN111752304B (application CN202010584082.4A)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
option
neural network
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010584082.4A
Other languages
Chinese (zh)
Other versions
CN111752304A (en)
Inventor
牟治宇
张煜
高飞飞
郭文秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Shenzhen Research Institute Tsinghua University
Original Assignee
Tsinghua University
Shenzhen Research Institute Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Shenzhen Research Institute Tsinghua University
Priority to CN202010584082.4A
Publication of CN111752304A
Application granted
Publication of CN111752304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides an unmanned aerial vehicle data acquisition method based on the Option-DQN algorithm and related equipment. The method comprises the following steps: (a) acquiring current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data already collected from each sensor in a sensor network, the current position of the unmanned aerial vehicle and the remaining power of the unmanned aerial vehicle; (b) inputting the state information into a value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option in an option set, wherein the options in the option set comprise collecting the data of each sensor in the sensor network, returning for charging and ending the task; (c) determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set; (d) obtaining the strategy corresponding to the optimal option and controlling the unmanned aerial vehicle to execute the strategy; (e) judging whether the optimal option is ending the task, and if it is not, returning to step (a). The invention ensures that the data acquisition time of the unmanned aerial vehicle is shortest while ensuring that the unmanned aerial vehicle can be charged in time.

Description

Unmanned aerial vehicle data acquisition method and related equipment
Technical Field
The invention relates to communication technology, and in particular to an unmanned aerial vehicle data acquisition method and related equipment.
Background
Unmanned aerial vehicle technology has been applied more and more widely in wireless communication in recent years. With its high flexibility and strong maneuverability, an unmanned aerial vehicle can serve as a mobile aerial base station to assist ground base stations, for example to provide communication coverage in remote areas. Furthermore, information transmission between the unmanned aerial vehicle and ground users is almost free of obstruction and can be assumed to take place over a line-of-sight channel. Therefore, the throughput and coverage of a communication network served by an unmanned aerial vehicle base station can be effectively improved.
The unmanned aerial vehicle can also assist a sensor network in data acquisition. In a traditional sensor network, data acquisition among nodes is realized in a multi-hop manner: one node transmits its data to another node, and in this way the data of all nodes converge to a single node called the fusion center. This acquisition mode has the problems that each sensor must not only transmit its own data but also relay the data of other nodes, so the power of the nodes is drained too quickly, and the multi-hop communication links are unstable. When an unmanned aerial vehicle is adopted to assist the sensor network in data acquisition, these problems are avoided: a ground sensor can transmit its data directly to a nearby unmanned aerial vehicle, which greatly improves transmission efficiency.
However, the power of the unmanned aerial vehicle is limited, and it must be charged in time if its power becomes insufficient during a task. At present, the problems arising in unmanned aerial vehicle data acquisition scenarios are mostly solved with conventional mathematical methods under the assumption that the power of the unmanned aerial vehicle is unlimited, which obviously does not match reality. There is currently no method that considers path planning and charging simultaneously during the data acquisition process of an unmanned aerial vehicle.
Disclosure of Invention
In view of the above, there is a need to provide an unmanned aerial vehicle data acquisition method, device, computer device and storage medium, which can ensure that the time for the unmanned aerial vehicle to acquire data from a sensor network is shortest while ensuring that the unmanned aerial vehicle can be charged in time.
A first aspect of the application provides a method for data acquisition by an unmanned aerial vehicle, the method comprising:
(a) acquiring current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data already collected from each sensor in a sensor network, the current position of the unmanned aerial vehicle and the remaining power of the unmanned aerial vehicle;
(b) inputting the state information into a value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option in an option set, wherein the options in the option set comprise collecting the data of each sensor in the sensor network, returning for charging and ending the task;
(c) determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
(d) obtaining the strategy corresponding to the optimal option, and controlling the unmanned aerial vehicle to execute the strategy;
(e) judging whether the optimal option is ending the task, and if the optimal option is not ending the task, returning to step (a).
In another possible implementation manner, the value function neural network includes an input layer, a hidden layer and an output layer, the hidden layer includes a first fully connected layer and a second fully connected layer, and the output of the first fully connected layer is:
h_1 = ReLU(W_1 s_t + b_1)
wherein s_t is the input state information, W_1 and b_1 are respectively the weight parameter and bias parameter of the first fully connected layer, and ReLU is the linear rectification function;
the output of the second fully connected layer is:
h_2 = ReLU(W_2 h_1 + b_2)
wherein W_2 and b_2 are respectively the weight parameter and bias parameter of the second fully connected layer;
the output of the output layer is:
p_t = softmax(W_3 h_2 + b_3)
wherein W_3 and b_3 are respectively the weight parameter and bias parameter of the output layer, and softmax is the normalized exponential function.
In another possible implementation manner, before the inputting of the state information into the value function neural network, the method further includes:
randomly extracting training samples from a training sample set D to train the value function neural network, wherein the k-th training sample in the training sample set D is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the optimal option under the condition of s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k;
the loss function for training the value function neural network with the training sample d_k is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2]
wherein E[·] denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network.
In another possible implementation manner, the update rule of θ is:
θ_new = θ_old - α ∇_θ L(θ)
where α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)]
in another possible implementation, the overall instant prize is
Figure BDA00025534671400000318
Including electric power rewards
Figure BDA00025534671400000319
Collecting rewards
Figure BDA00025534671400000320
And path rewards
Figure BDA00025534671400000321
In another possible implementation, the electric quantity is rewarded
Figure BDA00025534671400000322
Calculated according to the following formula:
Figure BDA00025534671400000323
the collection reward
Figure BDA00025534671400000324
Calculated according to the following formula:
Figure BDA0002553467140000041
the path reward
Figure BDA0002553467140000042
Calculated according to the following formula:
Figure BDA0002553467140000043
wherein N is e 、N c And N l Are all negative constants,. L k For unmanned aerial vehicles at
Figure BDA0002553467140000044
The distance of inner flight.
In another possible implementation manner, the determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set includes:
generating a random number between 0 and 1;
judging whether the random number is smaller than ε, wherein ε is a constant between 0 and 1;
if the random number is smaller than ε, randomly selecting one option from the option set as the optimal option;
if the random number is not smaller than ε, selecting the option with the highest probability from the option set as the optimal option.
A second aspect of the application provides an unmanned aerial vehicle data acquisition device, the device includes:
the acquisition module is used for acquiring the current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data acquired by each sensor in the sensor network, the current position of the unmanned aerial vehicle and the residual electric quantity of the unmanned aerial vehicle;
the planning module is used for inputting the state information into a value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option in an option set, wherein the options in the option set comprise collecting the data of each sensor in the sensor network, returning for charging and ending the task;
the determining module is used for determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
the executing module is used for acquiring the strategy corresponding to the optimal option and controlling the unmanned aerial vehicle to execute the strategy;
and the judging module is used for judging whether the optimal option is ending the task.
In another possible implementation manner, the value function neural network includes an input layer, a hidden layer and an output layer, the hidden layer includes a first fully connected layer and a second fully connected layer, and the output of the first fully connected layer is:
h_1 = ReLU(W_1 s_t + b_1)
wherein s_t is the input state information, W_1 and b_1 are respectively the weight parameter and bias parameter of the first fully connected layer, and ReLU is the linear rectification function;
the output of the second fully connected layer is:
h_2 = ReLU(W_2 h_1 + b_2)
wherein W_2 and b_2 are respectively the weight parameter and bias parameter of the second fully connected layer;
the output of the output layer is:
p_t = softmax(W_3 h_2 + b_3)
wherein W_3 and b_3 are respectively the weight parameter and bias parameter of the output layer, and softmax is the normalized exponential function.
In another possible implementation manner, the apparatus further includes:
a training module, configured to, before the state information is input into the value function neural network, randomly extract training samples from a training sample set D to train the value function neural network, wherein the k-th training sample in the training sample set D is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the optimal option under the condition of s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k;
the loss function for training the value function neural network with the training sample d_k is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2]
wherein E[·] denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network.
In another possible implementation manner, the update rule of θ is:
θ_new = θ_old - α ∇_θ L(θ)
where α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)]
in another possible implementation, the overall instant prize is
Figure BDA0002553467140000064
Including electric power rewards
Figure BDA0002553467140000065
Collecting rewards
Figure BDA0002553467140000066
And path rewards
Figure BDA0002553467140000067
In another possible implementation, the electric quantity is rewarded
Figure BDA0002553467140000068
Calculated according to the following formula:
Figure BDA0002553467140000069
the collection reward
Figure BDA00025534671400000610
Calculated according to the following formula:
Figure BDA00025534671400000611
the path reward
Figure BDA00025534671400000612
Calculated according to the following formula:
Figure BDA00025534671400000613
wherein N is e 、N c And N l Are all negative constants,. L k For unmanned aerial vehicle at
Figure BDA00025534671400000614
The distance flown in.
In another possible implementation manner, the determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set includes:
generating a random number between 0 and 1;
judging whether the random number is smaller than ε, wherein ε is a constant between 0 and 1;
if the random number is smaller than ε, randomly selecting one option from the option set as the optimal option;
if the random number is not smaller than ε, selecting the option with the highest probability from the option set as the optimal option.
A third aspect of the application provides a computer device comprising a processor for implementing the drone data acquisition method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the drone data acquisition method.
According to the technical scheme, the invention provides a data acquisition solution with autonomous charging and path planning for the unmanned aerial vehicle. The invention adopts a hierarchical reinforcement learning algorithm based on Option-DQN to ensure that the unmanned aerial vehicle finds the optimal path selection, so that the data acquisition time is shortest, while the unmanned aerial vehicle can judge, according to its own state, when to charge and complete the charging action.
Different from traditional methods, this scheme can cope with complex scene changes, for example an increase in the number of ground sensors or the limited power of the unmanned aerial vehicle. The method is simple to implement, has low complexity, and has obvious practical application value.
Drawings
Fig. 1 is a flowchart of a data acquisition method for an unmanned aerial vehicle according to an embodiment of the present invention.
Fig. 2 is a structural diagram of the data acquisition device of the unmanned aerial vehicle provided by the embodiment of the invention.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present invention.
Fig. 4 is a comparison graph of the period return of the Option-DQN algorithm proposed by the present invention and the conventional DQN algorithm.
Fig. 5 is a diagram of a flight path of an unmanned aerial vehicle for data acquisition of the unmanned aerial vehicle according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, but not all embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the unmanned aerial vehicle data acquisition method is applied to one or more computer devices. The computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of a data acquisition method for an unmanned aerial vehicle according to an embodiment of the present invention. The unmanned aerial vehicle data acquisition method is applied to computer equipment. The data acquisition method of the unmanned aerial vehicle controls the rechargeable unmanned aerial vehicle to acquire data of the sensor network, so that the shortest data acquisition time of the unmanned aerial vehicle is ensured, and meanwhile, the unmanned aerial vehicle can be charged in time.
As shown in fig. 1, the data acquisition method for the unmanned aerial vehicle includes:
101, acquiring current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data collected by each sensor in the sensor network, the current position of the unmanned aerial vehicle and the residual capacity of the unmanned aerial vehicle.
The current state information of the unmanned aerial vehicle can be recorded as s_t = (cr_t, p_t, e_t), wherein cr_t = (cr_t^1, cr_t^2, …, cr_t^N) is the percentage of data that has already been collected from each sensor in the sensor network, cr_t^i ∈ [0, 1], N is the number of sensors in the sensor network, p_t = (x_t, y_t, z_t) is the current position of the unmanned aerial vehicle, and e_t is the remaining power of the unmanned aerial vehicle.
In one embodiment, the flight height z_t of the unmanned aerial vehicle is a constant H, i.e. z_t = H.
The unmanned aerial vehicle starts from the initial position, and data acquisition is carried out on the sensors in the sensor network one by one. And in the data acquisition process, if the electric quantity is insufficient, the unmanned aerial vehicle returns to the charging station to be fully charged and continues to acquire data, and when the data acquisition of all the sensors in the sensor network is finished, the unmanned aerial vehicle returns to the initial position.
In one embodiment, the starting position of the drone is a charging station. The unmanned aerial vehicle starts from the charging station, if the electric quantity is insufficient in the data acquisition process, the unmanned aerial vehicle returns to the charging station to be fully charged and then continues to acquire data, and when the data acquisition of all the sensors in the sensor network is finished, the unmanned aerial vehicle returns to the charging station.
The sensor network comprises a plurality of sensors deployed on the ground, the positions of the sensors are randomly distributed, and the data volume carried by each sensor is different. Therefore, the time that the unmanned aerial vehicle stays on each sensor in the data acquisition process is different.
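Purely as an illustration, the state s_t = (cr_t, p_t, e_t) described above can be flattened into a single input vector for the value function neural network; the helper below and its argument names are assumptions made for this sketch, not part of the patent.

```python
import numpy as np

def build_state(collected_ratio, uav_position, remaining_energy):
    """Flatten s_t = (cr_t, p_t, e_t) into one input vector.

    collected_ratio : length-N array, fraction of data already collected
                      from each sensor (values in [0, 1])
    uav_position    : (x_t, y_t, z_t), current position of the UAV
    remaining_energy: scalar, remaining power of the UAV
    """
    return np.concatenate([
        np.asarray(collected_ratio, dtype=np.float32),
        np.asarray(uav_position, dtype=np.float32),
        np.asarray([remaining_energy], dtype=np.float32),
    ])

# Example with N = 4 sensors, the UAV at (10, 20, H = 50) and 80% power left
s_t = build_state([0.0, 0.5, 1.0, 0.2], (10.0, 20.0, 50.0), 0.8)
print(s_t.shape)  # (8,) = N + 3 + 1
```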
102, inputting the state information into a value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option (Option) in an option set, wherein the options in the option set comprise collecting the data of each sensor in the sensor network, returning for charging and ending the task.
The option set can be recorded as O = {o_s,1, o_s,2, …, o_s,N, o_c, o_p}, wherein o_s,1 is collecting the data of the first sensor in the sensor network, o_s,2 is collecting the data of the second sensor in the sensor network, …, o_s,N is collecting the data of the N-th sensor in the sensor network, o_c is returning for charging, and o_p is ending the task. The number of options included in the option set O is N + 2.
Each option in the option set O is a triplet <I_o, π_o, β_o>, wherein I_o is the state information set corresponding to the option, indicating the states of the unmanned aerial vehicle in which the option can be selected. In an embodiment, the options selectable by the unmanned aerial vehicle in any state (any state information) are the whole option set O, so I_o covers the entire state space. π_o is the strategy corresponding to the option, and β_o is the termination condition of the option.
In one embodiment, the strategy π_o corresponding to each option is a predefined strategy, and the termination condition β_o of each option is the completion of all actions defined by π_o. Specifically, for the option o_s,i of collecting sensor data, i = 1, 2, …, N, the strategy is to fly in a straight line from the current position to the i-th sensor and collect its data, and o_s,i terminates when the collection is finished. For returning for charging o_c, the strategy is to fly in a straight line to the charging station and charge, and o_c terminates when the battery is full. For ending the task o_p, the strategy is to fly in a straight line back to the charging station and notify the user that the task is finished, after which o_p terminates. It is understood that each option may correspond to other strategies.
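A minimal sketch of how the option set and its <I_o, π_o, β_o> triplets might be represented in code; the class, field and option names below are illustrative assumptions, not terminology from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]  # e.g. {"collected_ratio": [...], "energy": 0.8, "at_station": False}

@dataclass
class Option:
    """One option <I_o, pi_o, beta_o>; I_o is the whole state space here."""
    name: str
    policy: Callable[[State], str]        # pi_o: maps the current state to a low-level action
    terminated: Callable[[State], bool]   # beta_o: termination condition of the option

def make_collect_option(i: int) -> Option:
    """o_{s,i}: fly straight to sensor i and collect its data until finished."""
    return Option(
        name=f"collect_sensor_{i}",
        policy=lambda s, i=i: f"fly_to_sensor_{i}_and_collect",
        terminated=lambda s, i=i: s["collected_ratio"][i] >= 1.0,
    )

N = 4  # illustrative number of ground sensors
options = [make_collect_option(i) for i in range(N)]
options.append(Option("return_for_charging",
                      policy=lambda s: "fly_to_charging_station_and_charge",
                      terminated=lambda s: s["energy"] >= 1.0))
options.append(Option("end_task",
                      policy=lambda s: "fly_back_to_charging_station",
                      terminated=lambda s: s["at_station"]))
print(len(options))  # N + 2 options, matching the option set O
```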
The value function neural network is a pre-trained neural network.
In one embodiment, the value function neural network comprises an input layer, a hidden layer and an output layer, the hidden layer comprises a first fully connected layer and a second fully connected layer, and the output of the first fully connected layer is:
h_1 = ReLU(W_1 s_t + b_1)
wherein W_1 and b_1 are respectively the weight parameter and bias parameter of the first fully connected layer, and ReLU is the linear rectification function;
the output of the second fully connected layer is:
h_2 = ReLU(W_2 h_1 + b_2)
wherein W_2 and b_2 are respectively the weight parameter and bias parameter of the second fully connected layer;
the output of the output layer is:
p_t = softmax(W_3 h_2 + b_3)
wherein W_3 and b_3 are respectively the weight parameter and bias parameter of the output layer, and softmax is the normalized exponential function.
The input of the second fully connected layer is the output of the first fully connected layer; the first fully connected layer may consist of 1024 neurons and the second fully connected layer of 300 neurons. The input of the output layer is the output of the second fully connected layer, and the output of the output layer is an (N + 2)-dimensional vector comprising the probability of the unmanned aerial vehicle selecting each option o_j in the option set.
It will be appreciated that other network architectures may be used for the value function neural network.
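A minimal PyTorch sketch of the value function neural network just described (two fully connected hidden layers of 1024 and 300 neurons, ReLU activations, and a softmax output over the N + 2 options). The layer sizes follow the text; the class name and example dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptionValueNet(nn.Module):
    """Value function neural network Q_op: state -> one score per option."""

    def __init__(self, state_dim: int, num_options: int):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 1024)   # first fully connected layer (W_1, b_1)
        self.fc2 = nn.Linear(1024, 300)         # second fully connected layer (W_2, b_2)
        self.out = nn.Linear(300, num_options)  # output layer (W_3, b_3), N + 2 options

    def forward(self, s_t: torch.Tensor) -> torch.Tensor:
        h1 = F.relu(self.fc1(s_t))              # h_1 = ReLU(W_1 s_t + b_1)
        h2 = F.relu(self.fc2(h1))               # h_2 = ReLU(W_2 h_1 + b_2)
        return F.softmax(self.out(h2), dim=-1)  # probability of selecting each option

# Example: N = 4 sensors -> state dimension 4 + 3 + 1 = 8, option set size 4 + 2 = 6
net = OptionValueNet(state_dim=8, num_options=6)
probs = net(torch.zeros(1, 8))
print(probs.shape)  # torch.Size([1, 6])
```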
In an embodiment, before the inputting of the state information into the value function neural network, the method further comprises:
randomly extracting training samples from a training sample set D to train the value function neural network, wherein the k-th training sample in the training sample set D is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the optimal option under the condition of s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k;
the loss function for training the value function neural network with the training sample d_k is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2]
wherein E[·] denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network.
In one embodiment, the update rule for θ is:
θ_new = θ_old - α ∇_θ L(θ)
where α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)]
in an embodiment, the target network is updated in a soft update manner, that is, after a certain period, the target network is updated by using the parameter synthesis of the original target network and the current function neural network, and the update rule is as follows:
θ target,new =αθ target,old +(1-α)θ
where α is the update rate and α ∈ [0,1],θ target,new And theta target,old Respectively representing target networks
Figure BDA0002553467140000117
Updated parameters and parameters before updating. The robustness of neural network training can be increased by adopting a soft update mode for the target network.
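For illustration, a sketch of one training step on a mini-batch of samples d_k = (s_k, o_k, r_k, s_{k+1}), combining the loss, the gradient update and the soft target update described above; the optimizer choice and the hyperparameter values are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(q_net, target_net, optimizer, batch, gamma=0.99, soft_alpha=0.9):
    """One update on a mini-batch of samples d_k = (s_k, o_k, r_k, s_{k+1})."""
    s_k, o_k, r_k, s_next = batch  # o_k: indices of the executed options

    # Q_op(s_k, o_k; theta) for the option that was actually executed
    q_pred = q_net(s_k).gather(1, o_k.unsqueeze(1)).squeeze(1)

    # Target value r_k + gamma * max_o' Q_op^-(s_{k+1}, o') from the target network
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        q_target = r_k + gamma * q_next

    # L(theta) = E[(target - Q_op(s_k, o_k; theta))^2]
    loss = F.mse_loss(q_pred, q_target)

    # theta_new = theta_old - alpha * grad L(theta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Soft update: theta_target_new = soft_alpha * theta_target_old + (1 - soft_alpha) * theta
    with torch.no_grad():
        for p_tgt, p in zip(target_net.parameters(), q_net.parameters()):
            p_tgt.mul_(soft_alpha).add_((1.0 - soft_alpha) * p)

    return loss.item()
```

The optimizer would be created once outside the loop, for example optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4); the learning rate here is an arbitrary example value.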
In one embodiment, the total instant reward r_k includes a power reward r_k^e, a collection reward r_k^c and a path reward r_k^l.
The power reward r_k^e is used to punish the case where the power of the unmanned aerial vehicle becomes insufficient while it is executing the strategy corresponding to the selected option.
The collection reward r_k^c is used to punish the unmanned aerial vehicle for repeatedly selecting the option of a sensor whose data has already been completely collected.
The path reward r_k^l is used to guide the unmanned aerial vehicle to learn to fly along the shortest possible path while collecting the data of the sensors.
In one embodiment, the power reward r_k^e is calculated according to the following formula:
r_k^e = N_e if the power of the unmanned aerial vehicle becomes insufficient while executing o_k, and r_k^e = 0 otherwise;
the collection reward r_k^c is calculated according to the following formula:
r_k^c = N_c if o_k selects a sensor whose data has already been completely collected, and r_k^c = 0 otherwise;
the path reward r_k^l is calculated according to the following formula:
r_k^l = N_l · l_k
wherein N_e, N_c and N_l are all negative constants, and l_k is the distance flown by the unmanned aerial vehicle while executing o_k.
The total instant reward r_k is the sum of the power reward r_k^e, the collection reward r_k^c and the path reward r_k^l:
r_k = r_k^e + r_k^c + r_k^l
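A hedged sketch of how the three reward terms might be combined in code. The triggering conditions and default constant values below are illustrative assumptions consistent with the surrounding description (N_e, N_c and N_l are negative constants, l_k is the distance flown while executing the option); they are not the exact formulas of the patent.

```python
def total_instant_reward(ran_out_of_power, reselected_finished_sensor, distance_flown,
                         N_e=-100.0, N_c=-50.0, N_l=-1.0):
    """Illustrative composition of the total instant reward r_k.

    r_e: penalty N_e if the UAV power became insufficient during the option
    r_c: penalty N_c if an already fully collected sensor was selected again
    r_l: N_l * l_k, penalising long flight distances (N_l < 0)
    """
    r_e = N_e if ran_out_of_power else 0.0
    r_c = N_c if reselected_finished_sensor else 0.0
    r_l = N_l * distance_flown
    return r_e + r_c + r_l

# Example: the option flew 12.5 distance units without triggering any penalty
print(total_instant_reward(False, False, 12.5))  # -12.5
```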
103, determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set.
In an embodiment, the determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set comprises:
determining the optimal option from the option set through an ε-greedy algorithm according to the probability of the unmanned aerial vehicle selecting each option in the option set.
Specifically, determining the optimal option from the option set through the ε-greedy algorithm comprises:
generating a random number between 0 and 1;
judging whether the random number is smaller than ε, wherein ε is a constant between 0 and 1;
if the random number is smaller than ε, randomly selecting one option from the option set as the optimal option;
if the random number is not smaller than ε, selecting the option with the highest probability from the option set as the optimal option.
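A minimal sketch of the ε-greedy selection just described; `probs` stands for the output of the value function neural network, and the function name is an assumption.

```python
import random

def select_option(probs, epsilon=0.1):
    """Epsilon-greedy choice of the optimal option index from the network output."""
    if random.random() < epsilon:
        # random number below epsilon: explore, pick any option uniformly at random
        return random.randrange(len(probs))
    # otherwise exploit: pick the option with the highest probability
    return max(range(len(probs)), key=lambda i: probs[i])
```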
104, acquiring the strategy corresponding to the optimal option, and controlling the unmanned aerial vehicle to execute the strategy.
Controlling the unmanned aerial vehicle to execute the strategy means controlling the unmanned aerial vehicle to execute the sequence of actions specified by the strategy.
For example, if the optimal option is collecting the data of the i-th sensor, the strategy for collecting the data of the i-th sensor is obtained, and the unmanned aerial vehicle is controlled to fly in a straight line from its current position to the i-th sensor and collect its data; o_s,i terminates when the collection is finished.
As another example, if the optimal option is returning for charging o_c, the strategy corresponding to o_c is obtained, and according to this strategy the unmanned aerial vehicle is controlled to fly in a straight line to the charging station and charge; o_c terminates when the battery is full.
As another example, if the optimal option is ending the task o_p, the strategy corresponding to o_p is obtained, and according to this strategy the unmanned aerial vehicle is controlled to fly in a straight line back to the charging station and notify the user that the task is finished, after which o_p terminates.
105, judging whether the optimal option is ending the task; if the optimal option is not ending the task, returning to 101.
For example, if the optimal option is collecting the data of the i-th sensor, the optimal option is not ending the task, and the process returns to 101.
If the optimal option is ending the task, the process ends.
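Putting steps 101 to 105 together, the outer control loop could look like the sketch below. It reuses the earlier sketches (select_option and the Option class); every other name is a placeholder for the corresponding step above rather than an interface defined by the patent.

```python
def run_data_collection(get_state, q_net, option_set, execute_option, epsilon=0.1):
    """Outer loop of the unmanned aerial vehicle data acquisition method (steps 101-105)."""
    while True:
        s_t = get_state()                    # 101: current state (cr_t, p_t, e_t)
        probs = q_net(s_t)                   # 102: probability of each option in the option set
        idx = select_option(probs, epsilon)  # 103: epsilon-greedy optimal option
        option = option_set[idx]
        execute_option(option)               # 104: run the option's strategy until it terminates
        if option.name == "end_task":        # 105: stop once "end task" was the optimal option
            break
```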
This embodiment provides a data acquisition solution with autonomous charging and path planning for the unmanned aerial vehicle. The scheme adopts a hierarchical reinforcement learning algorithm based on Option-Deep Q-Network (Option-DQN) to ensure that the unmanned aerial vehicle finds the optimal path selection, so that the data acquisition time is shortest, while the unmanned aerial vehicle can judge, according to its own state, when to charge and complete the charging action.
Fig. 4 is a comparison graph of the period return of the proposed Option-DQN algorithm and the conventional DQN algorithm.
In fig. 4, the abscissa is the number of training periods and the ordinate is the cumulative total instant reward. Compared with the traditional DQN algorithm, the period return of the Option-DQN algorithm provided by the invention rises more rapidly and converges quickly. The period return of the DQN algorithm oscillates noticeably and has a large variance, and its final value is clearly lower than that of the Option-DQN algorithm. The Option-DQN algorithm provided by the invention directly learns a "high-level" strategy, so it can grasp the meaning of the scene more quickly than the traditional DQN algorithm and is therefore more effective; the traditional DQN algorithm only selects a basic action at each step and lacks overall consideration, for example it often turns aside to collect another sensor on its way, resulting in lower collection efficiency.
Fig. 5 is a diagram of the flight trajectory of the unmanned aerial vehicle during data acquisition according to the present invention. The unmanned aerial vehicle starts from the starting point, traverses each sensor once, and finally returns to the end point; on the way it returns to the charging station once to charge. Over the whole trajectory the unmanned aerial vehicle selects 22 options in total and uses 162 time units.
Example two
Fig. 2 is a structural diagram of the unmanned aerial vehicle data acquisition device provided in the second embodiment of the present invention. The unmanned aerial vehicle data acquisition device 20 is applied to computer equipment. The unmanned aerial vehicle data acquisition device 20 controls a rechargeable unmanned aerial vehicle to acquire data from the sensor network, ensuring that the data acquisition time of the unmanned aerial vehicle is shortest while ensuring that the unmanned aerial vehicle can be charged in time.
As shown in fig. 2, the unmanned aerial vehicle data acquisition device 20 may include an acquisition module 201, a planning module 202, a determining module 203, an executing module 204 and a judging module 205.
The acquisition module 201 is configured to acquire current state information of the unmanned aerial vehicle, where the state information includes a percentage of data acquired by each sensor in the sensor network, a current position of the unmanned aerial vehicle, and a remaining power of the unmanned aerial vehicle.
The current state information of the unmanned aerial vehicle can be recorded as s_t = (cr_t, p_t, e_t), wherein cr_t = (cr_t^1, cr_t^2, …, cr_t^N) is the percentage of data that has already been collected from each sensor in the sensor network, cr_t^i ∈ [0, 1], N is the number of sensors in the sensor network, p_t = (x_t, y_t, z_t) is the current position of the unmanned aerial vehicle, and e_t is the remaining power of the unmanned aerial vehicle.
In one embodiment, the flight height z_t of the unmanned aerial vehicle is a constant H, i.e. z_t = H.
The unmanned aerial vehicle starts from the initial position, and data acquisition is carried out on the sensors in the sensor network one by one. And in the data acquisition process, if the electric quantity is insufficient, the unmanned aerial vehicle returns to the charging station to be fully charged and continues to acquire data, and when the data acquisition of all the sensors in the sensor network is finished, the unmanned aerial vehicle returns to the initial position.
In one embodiment, the starting position of the drone is a charging station. The unmanned aerial vehicle starts from the charging station, if the electric quantity is insufficient in the data acquisition process, the unmanned aerial vehicle returns to the charging station to be fully charged and then continues to acquire data, and when the data acquisition of all the sensors in the sensor network is finished, the unmanned aerial vehicle returns to the charging station.
The sensor network comprises a plurality of sensors deployed on the ground, the positions of the sensors are randomly distributed, and the data volume carried by each sensor is different. Therefore, the time that the unmanned aerial vehicle stays on each sensor in the data acquisition process is different.
The planning module 202 is configured to input the state information into a value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option (Option) in an option set, wherein the options in the option set comprise collecting the data of each sensor in the sensor network, returning for charging and ending the task.
The option set can be recorded as O = {o_s,1, o_s,2, …, o_s,N, o_c, o_p}, wherein o_s,1 is collecting the data of the first sensor in the sensor network, o_s,2 is collecting the data of the second sensor in the sensor network, …, o_s,N is collecting the data of the N-th sensor in the sensor network, o_c is returning for charging, and o_p is ending the task. The number of options included in the option set O is N + 2.
Each option in the option set O is a triplet <I_o, π_o, β_o>, wherein I_o is the state information set corresponding to the option, indicating the states of the unmanned aerial vehicle in which the option can be selected. In an embodiment, the options selectable by the unmanned aerial vehicle in any state (any state information) are the whole option set O, so I_o covers the entire state space. π_o is the strategy corresponding to the option, and β_o is the termination condition of the option.
In one embodiment, the strategy π_o corresponding to each option is a predefined strategy, and the termination condition β_o of each option is the completion of all actions defined by π_o. Specifically, for the option o_s,i of collecting sensor data, i = 1, 2, …, N, the strategy is to fly in a straight line from the current position to the i-th sensor and collect its data, and o_s,i terminates when the collection is finished. For returning for charging o_c, the strategy is to fly in a straight line to the charging station and charge, and o_c terminates when the battery is full. For ending the task o_p, the strategy is to fly in a straight line back to the charging station and notify the user that the task is finished, after which o_p terminates. It is understood that each option may correspond to other strategies.
The value function neural network is a pre-trained neural network.
In one embodiment, the value function neural network comprises an input layer, a hidden layer and an output layer, the hidden layer comprises a first fully connected layer and a second fully connected layer, and the output of the first fully connected layer is:
h_1 = ReLU(W_1 s_t + b_1)
wherein W_1 and b_1 are respectively the weight parameter and bias parameter of the first fully connected layer, and ReLU is the linear rectification function;
the output of the second fully connected layer is:
h_2 = ReLU(W_2 h_1 + b_2)
wherein W_2 and b_2 are respectively the weight parameter and bias parameter of the second fully connected layer;
the output of the output layer is:
p_t = softmax(W_3 h_2 + b_3)
wherein W_3 and b_3 are respectively the weight parameter and bias parameter of the output layer, and softmax is the normalized exponential function.
The input of the second fully connected layer is the output of the first fully connected layer; the first fully connected layer may consist of 1024 neurons and the second fully connected layer of 300 neurons. The input of the output layer is the output of the second fully connected layer, and the output of the output layer is an (N + 2)-dimensional vector comprising the probability of the unmanned aerial vehicle selecting each option o_j in the option set.
It will be appreciated that other network architectures may be used for the value function neural network.
In an embodiment, the unmanned aerial vehicle data acquisition device 20 further includes:
a training module, configured to, before the state information is input into the value function neural network, randomly extract training samples from a training sample set D to train the value function neural network, wherein the k-th training sample in the training sample set D is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the optimal option under the condition of s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k;
the loss function for training the value function neural network with the training sample d_k is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2]
wherein E[·] denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network.
In one embodiment, the update rule for θ is:
θ_new = θ_old - α ∇_θ L(θ)
where α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)]
in one embodiment, the target network is updated in a "soft update" manner, that is, after every certain period, the target network is updated by using the parameter synthesis of the original target network and the current function neural network, and the update rule is as follows:
θ target,new =αθ target,old +(1-α)θ
where α is the update rate and α ∈ [0,1 ]],θ target,new And theta target,old Respectively representing target networks
Figure BDA0002553467140000173
Updated parameters and parameters before updating. The robustness of neural network training can be increased by adopting a soft update mode for the target network.
In one embodiment, the total instant reward r_k includes a power reward r_k^e, a collection reward r_k^c and a path reward r_k^l.
The power reward r_k^e is used to punish the case where the power of the unmanned aerial vehicle becomes insufficient while it is executing the strategy corresponding to the selected option.
The collection reward r_k^c is used to punish the unmanned aerial vehicle for repeatedly selecting the option of a sensor whose data has already been completely collected.
The path reward r_k^l is used to guide the unmanned aerial vehicle to learn to fly along the shortest possible path while collecting the data of the sensors.
In one embodiment, the power reward r_k^e is calculated according to the following formula:
r_k^e = N_e if the power of the unmanned aerial vehicle becomes insufficient while executing o_k, and r_k^e = 0 otherwise;
the collection reward r_k^c is calculated according to the following formula:
r_k^c = N_c if o_k selects a sensor whose data has already been completely collected, and r_k^c = 0 otherwise;
the path reward r_k^l is calculated according to the following formula:
r_k^l = N_l · l_k
wherein N_e, N_c and N_l are all negative constants, and l_k is the distance flown by the unmanned aerial vehicle while executing o_k.
The total instant reward r_k is the sum of the power reward r_k^e, the collection reward r_k^c and the path reward r_k^l:
r_k = r_k^e + r_k^c + r_k^l
The determining module 203 is configured to determine an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set.
In an embodiment, the determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set comprises:
determining the optimal option from the option set through an ε-greedy algorithm according to the probability of the unmanned aerial vehicle selecting each option in the option set.
Specifically, determining the optimal option from the option set through the ε-greedy algorithm comprises:
generating a random number between 0 and 1;
judging whether the random number is smaller than ε, wherein ε is a constant between 0 and 1;
if the random number is smaller than ε, randomly selecting one option from the option set as the optimal option;
if the random number is not smaller than ε, selecting the option with the highest probability from the option set as the optimal option.
The executing module 204 is configured to acquire the strategy corresponding to the optimal option and control the unmanned aerial vehicle to execute the strategy.
Controlling the unmanned aerial vehicle to execute the strategy means controlling the unmanned aerial vehicle to execute the sequence of actions specified by the strategy.
For example, if the optimal option is collecting the data of the i-th sensor, the strategy for collecting the data of the i-th sensor is obtained, and the unmanned aerial vehicle is controlled to fly in a straight line from its current position to the i-th sensor and collect its data; o_s,i terminates when the collection is finished.
As another example, if the optimal option is returning for charging o_c, the strategy corresponding to o_c is obtained, and according to this strategy the unmanned aerial vehicle is controlled to fly in a straight line to the charging station and charge; o_c terminates when the battery is full.
As another example, if the optimal option is ending the task o_p, the strategy corresponding to o_p is obtained, and according to this strategy the unmanned aerial vehicle is controlled to fly in a straight line back to the charging station and notify the user that the task is finished, after which o_p terminates.
The judging module 205 is configured to judge whether the optimal option is ending the task; if the optimal option is not ending the task, the acquisition module 201 acquires the current state information of the unmanned aerial vehicle again.
For example, if the optimal option is collecting the data of the i-th sensor, the optimal option is not ending the task, and the acquisition module 201 acquires the current state information of the unmanned aerial vehicle again.
If the optimal option is ending the task, the data acquisition is finished.
The second embodiment provides a data acquisition solution for unmanned aerial vehicle autonomous charging and path planning. According to the scheme, the layered reinforcement learning algorithm based on the Option-DQN is adopted to ensure that the unmanned aerial vehicle finds the optimal path selection, so that the data acquisition time is shortest, and meanwhile, the unmanned aerial vehicle can judge when to charge according to the state of the unmanned aerial vehicle and complete the charging action.
EXAMPLE III
The present embodiment provides a storage medium, where a computer program is stored on the storage medium, and when being executed by a processor, the computer program implements the steps in the above-mentioned unmanned aerial vehicle data acquisition method embodiment, for example, 101 to 105 shown in fig. 1:
101, acquiring current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data acquired by each sensor in a sensor network, the current position of the unmanned aerial vehicle and the residual electric quantity of the unmanned aerial vehicle;
102, inputting the state information into a value function neural network to obtain the probability of each option in an option set selected by the unmanned aerial vehicle, wherein the options in the option set comprise data acquisition of each sensor in a sensor network, return journey charging and task ending;
103, determining an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
104, acquiring the strategy corresponding to the optimal option, and controlling the unmanned aerial vehicle to execute the strategy;
and 105, judging whether the optimal option is ending the task, and if the optimal option is not ending the task, returning to 101.
Alternatively, the computer program, when executed by a processor, implements the functionality of the modules in the above-described apparatus embodiments, such as modules 201-205 in fig. 2:
an obtaining module 201, configured to obtain current state information of the unmanned aerial vehicle, where the state information includes a percentage of data acquired by each sensor in the sensor network, a current position of the unmanned aerial vehicle, and a remaining power of the unmanned aerial vehicle;
the planning module 202 is configured to input the state information into a value function neural network to obtain a probability that the unmanned aerial vehicle selects each option in an option set, where the options in the option set include data acquisition of each sensor in the sensor network, return journey charging, and task completion;
a determining module 203, configured to determine an optimal option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
an executing module 204, configured to acquire the strategy corresponding to the optimal option and control the unmanned aerial vehicle to execute the strategy;
a judging module 205, configured to judge whether the optimal option is ending the task.
Example four
Fig. 3 is a schematic diagram of a computer device according to a fourth embodiment of the present invention. The computer device 30 comprises a memory 301, a processor 302 and a computer program 303, such as an unmanned aerial vehicle data acquisition program, stored in the memory 301 and executable on the processor 302. The processor 302, when executing the computer program 303, implements the steps in the above-described unmanned aerial vehicle data acquisition method embodiments, such as 101-105 shown in fig. 1. Alternatively, the computer program, when executed by a processor, implements the functionality of the modules in the above-described apparatus embodiments, such as modules 201-205 in fig. 2.
Illustratively, the computer program 303 may be partitioned into one or more modules that are stored in the memory 301 and executed by the processor 302 to perform the present method. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 303 in the computer device 30.
The computer device 30 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. Those skilled in the art will appreciate that the schematic diagram 3 is merely an example of the computer device 30 and does not constitute a limitation of the computer device 30, and may include more or less components than those shown, or combine certain components, or different components, for example, the computer device 30 may also include input and output devices, network access devices, buses, etc.
The processor 302 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 302 is the control center of the computer device 30 and connects the various parts of the computer device 30 using various interfaces and lines.
The memory 301 may be used to store the computer program 303, and the processor 302 implements various functions of the computer device 30 by running or executing the computer program or modules stored in the memory 301 and calling data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the computer device 30. In addition, the memory 301 may include a non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the modules integrated in the computer device 30 are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a storage medium. Based on such understanding, all or part of the flow of the methods according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a storage medium and, when executed by a processor, instructs the related hardware to implement the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other division manners may be used in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The integrated module, if implemented in the form of a software functional module, may be stored in a storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute part of the steps of the methods according to the embodiments of the present invention.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is to be understood that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. A plurality of modules or means recited in the system claims may also be implemented by one module or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (6)

1. A data acquisition method for an unmanned aerial vehicle, the method comprising:
randomly extracting training samples from a training sample set to train a value function neural network, wherein the k-th training sample in the training sample set is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the most preferred option under s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k; the total instant reward r_k comprises an electric quantity reward r_k^e, a collection reward r_k^c and a path reward r_k^l;
the electric quantity reward r_k^e is calculated according to the following formula: [formula image not reproduced];
the collection reward r_k^c is calculated according to the following formula: [formula image not reproduced];
the path reward r_k^l is calculated according to the following formula: [formula image not reproduced];
wherein N_e, N_c and N_l are all negative constants, and l_k is the distance flown by the unmanned aerial vehicle while executing o_k;
using the training sample d_k, the loss function for training the value function neural network is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2],
wherein E denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network; the update rule of θ is:
θ_new = θ_old - α ∇_θ L(θ),
wherein α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)];
(a) acquiring current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data acquired by each sensor in a sensor network, the current position of the unmanned aerial vehicle, and the remaining electric quantity of the unmanned aerial vehicle;
(b) inputting the state information into the value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option in an option set, wherein the options in the option set comprise data acquisition of each sensor in the sensor network, return journey charging, and task ending;
(c) determining a most preferred option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
(d) acquiring a strategy corresponding to the most preferred option, and controlling the unmanned aerial vehicle to execute the strategy;
(e) judging whether the most preferred option is task ending; if it is not, returning to step (a).
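In substance, the training recited in claim 1 is a DQN-style temporal-difference update over options. The sketch below illustrates one such update step under common assumptions: uniform sampling from a replay buffer of tuples d_k = (s_k, o_k, r_k, s_{k+1}), a target network refreshed elsewhere, and a library optimizer standing in for the plain θ_new = θ_old - α∇_θL(θ) step. The helper names and hyperparameters are illustrative, not taken from the filing.

```python
import random
import torch
import torch.nn.functional as F

def train_step(value_net, target_net, optimizer, replay_buffer,
               batch_size=32, gamma=0.95):
    """One update of the option-value network from randomly drawn samples d_k."""
    batch = random.sample(replay_buffer, batch_size)
    states, options, rewards, next_states = zip(*batch)
    s = torch.stack(states)
    o = torch.tensor(options, dtype=torch.int64)
    r = torch.tensor(rewards, dtype=torch.float32)
    s_next = torch.stack(next_states)

    # Q_op(s_k, o_k; θ): value of the option that was actually executed.
    q = value_net(s).gather(1, o.unsqueeze(1)).squeeze(1)

    # TD target r_k + γ max_o' Q_op^-(s_{k+1}, o') from the target network.
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values

    # L(θ) = E[(target - Q_op(s_k, o_k; θ))^2]
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```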
2. The unmanned aerial vehicle data acquisition method of claim 1, wherein the value function neural network comprises an input layer, a hidden layer and an output layer, the hidden layer comprises a first fully-connected layer and a second fully-connected layer, and the output of the first fully-connected layer is:
h_1 = ReLU(W_1 s + b_1),
wherein s is the state information input to the network, W_1 and b_1 are respectively the weight parameter and the bias parameter of the first fully-connected layer, and ReLU is the linear rectification function;
the output of the second fully-connected layer is:
h_2 = ReLU(W_2 h_1 + b_2),
wherein W_2 and b_2 are respectively the weight parameter and the bias parameter of the second fully-connected layer;
the output of the output layer is:
softmax(W_3 h_2 + b_3),
wherein W_3 and b_3 are respectively the weight parameter and the bias parameter of the output layer, and softmax is the normalized exponential function.
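A network with the layout of claim 2 can be written in a few lines. The sketch below is one possible PyTorch realization; the hidden width of 128 is an arbitrary assumption, and the symbols h1 and h2 follow the reconstruction above rather than the original figures.

```python
import torch
import torch.nn as nn

class OptionValueNet(nn.Module):
    """Input layer -> two fully-connected ReLU layers -> softmax output."""

    def __init__(self, state_dim, num_options, hidden=128):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)     # W1, b1
        self.fc2 = nn.Linear(hidden, hidden)        # W2, b2
        self.out = nn.Linear(hidden, num_options)   # W3, b3

    def forward(self, s):
        h1 = torch.relu(self.fc1(s))                # h1 = ReLU(W1 s + b1)
        h2 = torch.relu(self.fc2(h1))               # h2 = ReLU(W2 h1 + b2)
        return torch.softmax(self.out(h2), dim=-1)  # softmax(W3 h2 + b3)
```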
3. The unmanned aerial vehicle data acquisition method of claim 1, wherein the determining a most preferred option according to the probability of the unmanned aerial vehicle selecting each option in the option set comprises:
generating a random number between 0 and 1;
judging whether the random number is smaller than epsilon, wherein epsilon is a constant between 0 and 1;
if the random number is smaller than epsilon, randomly selecting one option from the option set as the most preferred option;
and if the random number is not less than epsilon, selecting the option with the highest probability from the option set as the most preferred option.
4. An unmanned aerial vehicle data acquisition device, characterized in that the device comprises:
a training module, configured to randomly extract training samples from a training sample set to train a value function neural network, wherein the k-th training sample in the training sample set is d_k = (s_k, o_k, r_k, s_{k+1}), s_k is the state information of the unmanned aerial vehicle before the training, o_k is the most preferred option under s_k, r_k is the total instant reward obtained after the unmanned aerial vehicle executes o_k, and s_{k+1} is the state information after the unmanned aerial vehicle executes o_k; the total instant reward r_k comprises an electric quantity reward r_k^e, a collection reward r_k^c and a path reward r_k^l;
the electric quantity reward r_k^e is calculated according to the following formula: [formula image not reproduced];
the collection reward r_k^c is calculated according to the following formula: [formula image not reproduced];
the path reward r_k^l is calculated according to the following formula: [formula image not reproduced];
wherein N_e, N_c and N_l are all negative constants, and l_k is the distance flown by the unmanned aerial vehicle while executing o_k;
using the training sample d_k, the loss function for training the value function neural network is:
L(θ) = E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ))^2],
wherein E denotes the expectation, γ is a discount factor, θ denotes all parameters of the value function neural network, Q_op denotes the value function neural network, and Q_op^- denotes the target network of the value function neural network; the update rule of θ is:
θ_new = θ_old - α ∇_θ L(θ),
wherein α is the learning rate, θ_new and θ_old respectively denote the parameters of the value function neural network after and before the update, and the gradient ∇_θ L(θ) of the loss function L(θ) is:
∇_θ L(θ) = -2 E[(r_k + γ max_{o'} Q_op^-(s_{k+1}, o') - Q_op(s_k, o_k; θ)) ∇_θ Q_op(s_k, o_k; θ)];
an acquisition module, configured to acquire the current state information of the unmanned aerial vehicle, wherein the state information comprises the percentage of data acquired by each sensor in the sensor network, the current position of the unmanned aerial vehicle, and the remaining electric quantity of the unmanned aerial vehicle;
a planning module, configured to input the state information into the value function neural network to obtain the probability of the unmanned aerial vehicle selecting each option in an option set, wherein the options in the option set comprise data acquisition of each sensor in the sensor network, return journey charging, and task ending;
a determining module, configured to determine a most preferred option according to the probability of the unmanned aerial vehicle selecting each option in the option set;
an execution module, configured to acquire a strategy corresponding to the most preferred option and control the unmanned aerial vehicle to execute the strategy;
and a judging module, configured to judge whether the most preferred option is task ending.
5. A computer device, characterized in that the computer device comprises a processor for executing a computer program stored in a memory to implement the unmanned aerial vehicle data acquisition method of any one of claims 1 to 3.
6. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the unmanned aerial vehicle data acquisition method of any one of claims 1 to 3.
CN202010584082.4A 2020-06-23 2020-06-23 Unmanned aerial vehicle data acquisition method and related equipment Active CN111752304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010584082.4A CN111752304B (en) 2020-06-23 2020-06-23 Unmanned aerial vehicle data acquisition method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010584082.4A CN111752304B (en) 2020-06-23 2020-06-23 Unmanned aerial vehicle data acquisition method and related equipment

Publications (2)

Publication Number Publication Date
CN111752304A CN111752304A (en) 2020-10-09
CN111752304B true CN111752304B (en) 2022-10-14

Family

ID=72676678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010584082.4A Active CN111752304B (en) 2020-06-23 2020-06-23 Unmanned aerial vehicle data acquisition method and related equipment

Country Status (1)

Country Link
CN (1) CN111752304B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360276B (en) * 2021-04-15 2022-09-27 北京航空航天大学 Unmanned aerial vehicle system task planning method and device based on health state
CN113433967B (en) * 2021-06-07 2022-11-25 北京邮电大学 Chargeable unmanned aerial vehicle path planning method and system
CN113283013B (en) * 2021-06-10 2022-07-19 北京邮电大学 Multi-unmanned aerial vehicle charging and task scheduling method based on deep reinforcement learning
CN114237281B (en) * 2021-11-26 2023-11-21 国网北京市电力公司 Unmanned aerial vehicle inspection control method, unmanned aerial vehicle inspection control device and inspection system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650201A (en) * 2008-08-13 2010-02-17 中国科学院自动化研究所 System and method for ground information acquisition
CN109583665A (en) * 2018-12-26 2019-04-05 武汉烽火凯卓科技有限公司 A kind of unmanned plane charging tasks dispatching method in wireless sensor network
CN110324805A (en) * 2019-07-03 2019-10-11 东南大学 A kind of radio sensor network data collection method of unmanned plane auxiliary
CN110329101A (en) * 2019-05-30 2019-10-15 成都尚德铁科智能科技有限公司 A kind of wireless sensing system based on integrated wireless electrical transmission and unmanned plane
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110856134A (en) * 2019-10-16 2020-02-28 东南大学 Large-scale wireless sensor network data collection method based on unmanned aerial vehicle

Also Published As

Publication number Publication date
CN111752304A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
CN111752304B (en) Unmanned aerial vehicle data acquisition method and related equipment
Liu et al. Distributed and energy-efficient mobile crowdsensing with charging stations by deep reinforcement learning
Liu et al. Energy-efficient UAV crowdsensing with multiple charging stations by deep learning
CN108361927A (en) A kind of air-conditioner control method, device and air conditioner based on machine learning
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN112580801B (en) Reinforced learning training method and decision-making method based on reinforced learning
CN108458716A (en) A kind of electric vehicle charging air navigation aid based on the prediction of charging pile dynamic occupancy
WO2019071909A1 (en) Automatic driving system and method based on relative-entropy deep inverse reinforcement learning
CN110736478A (en) unmanned aerial vehicle assisted mobile cloud-aware path planning and task allocation scheme
CN108803609B (en) Partially observable automatic driving decision method based on constraint online planning
CN111090899B (en) Spatial layout design method for urban building
CN109726676B (en) Planning method for automatic driving system
CN116345578B (en) Micro-grid operation optimization scheduling method based on depth deterministic strategy gradient
CN113852994B (en) High-altitude base station cluster auxiliary edge calculation method used in emergency communication
CN114261400A (en) Automatic driving decision-making method, device, equipment and storage medium
Zhang et al. cgail: Conditional generative adversarial imitation learning—an application in taxi drivers’ strategy learning
CN108106624A A kind of more people's Dispatch by appointment paths planning methods and relevant apparatus
Tagliaferri et al. A real-time strategy-decision program for sailing yacht races
CN114519433A (en) Multi-agent reinforcement learning and strategy execution method and computer equipment
Song et al. Generalized Model and Deep Reinforcement Learning-Based Evolutionary Method for Multitype Satellite Observation Scheduling
CN113619604A (en) Integrated decision and control method and device for automatic driving automobile and storage medium
Luo et al. Fleet rebalancing for expanding shared e-Mobility systems: A multi-agent deep reinforcement learning approach
CN110032437A (en) A kind of calculating task processing method and processing device based on information timeliness
CN116167254A (en) Multidimensional city simulation deduction method and system based on city big data
CN116259175A (en) Vehicle speed recommendation method and device for diversified dynamic signal lamp modes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant