CN112148008A - Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning - Google Patents
- Publication number
- CN112148008A (application CN202010988055.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G05D1/0253: control of position or course in two dimensions using optical position detecting means, with a video camera and image processing extracting relative motion from successive images (visual odometry, optical flow)
- G05D1/0221: control with means for defining a desired trajectory, involving a learning process
- G05D1/0223: control with means for defining a desired trajectory, involving speed control of the vehicle
- G05D1/0276: control using signals provided by a source external to the vehicle
- G01C21/00: navigation; navigational instruments
- G01C21/20: instruments for performing navigational calculations
- G06N20/00: machine learning
- Y02T10/40: engine management systems
Abstract
The invention belongs to the field of route planning and relates to a real-time unmanned aerial vehicle (UAV) path prediction method based on deep reinforcement learning. The method comprises the following steps. Step 101: acquire the threat matrix within the UAV's current detection range; the center of the detection range is the UAV's current position, and the threat matrix contains the threat coefficient of each position. Step 102: determine the distance from the destination to each point in the current detection range, forming the current distance matrix. Step 103: obtain the UAV's current flight direction and the position it will fly to at the next moment from the threat matrix, the current distance matrix, and the trained A3C network. Step 104: fly to the next position along the current flight direction, and at the same time judge whether that position reaches the destination. Step 105: if not, return to step 101.
Description
Technical Field
The invention belongs to the field of route planning, and relates to a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning.
Background
The basic goal of UAV route planning is to autonomously obtain a flight path that safely reaches the target while avoiding threats. In recent years, techniques for the route-planning problem have developed rapidly, and many publications have proposed modeling and solution methods from different perspectives. These techniques fall into two broad categories according to how threat information is acquired. The first is static route planning, i.e., planning based on complete a priori environment information: the UAV constructs a safe, feasible, and satisfactory path between the start point and the target point from globally complete threat information. The second is real-time route planning, which assumes that the threat environment is completely or partially unknown a priori. In this case the UAV can only obtain threat information within a limited range (usually the sensor detection range) and must plan its route in real time during flight in order to reach the target safely. When real-time planning must run on the on-board computer, it is called online real-time route planning. Static and real-time route planning are discussed in turn below.
A first aspect concerns static path planning. Its key question is how to compute a globally optimized path when the entire threat environment is known. Common planning methods include: searching for feasible paths in a Voronoi diagram and then optimizing them; describing the threat-region probability as a graph in a learning stage and constructing a feasible path between two nodes in a query stage, or constructing the path with a probabilistic roadmap method; and visibility-graph and silhouette methods. Given the overall threat environment, these methods can compute a safe, feasible, or optimal flight path for it. In practice, however, the flight area is large, the UAV's detection range is limited, threat sources are diverse, and threat information changes dynamically and is hard to describe accurately, so the UAV cannot directly acquire complete information about the flight area and must sense in real time during flight. Static route planning is therefore of limited use in practical applications.
A second aspect concerns real-time route planning. Its key question is how to plan a global route from the start point to the target point from the limited environment information detected so far. Current research mainly draws on robot path planning, combined with the UAV's performance characteristics and the particularities of the flight environment. The proposed methods can be classified by modeling approach as follows:
(1) Probability-based methods. Klasing et al. replan paths in real time using a cell-based probabilistic roadmap method; Jun and D'Andrea propose a route-planning algorithm based on a threat probability map; Zengin and Dogan develop a probabilistic modeling framework for dynamic environments and give a relatively complete solution for path planning.
(2) Mathematical programming methods. Many recent publications solve paths in real time with mixed-integer programming; Shi and Wang combine Bayesian decision theory with a dynamic programming algorithm to solve for the optimal path. There are also real-time planning methods based on artificial potential fields, stream functions, global dynamic window approaches, evolutionary computation, and boundary tracking; Lan, Wen, et al. analyze and compare the strengths and weaknesses of these planning methods.
(3) Methods combining global path planning with real-time path adjustment. Xiao, Gao, et al. first generate an initial path on an improved Voronoi diagram with Dijkstra's algorithm, then replan the path with a transformed linear dynamic system based on a hybrid dynamic Bayesian network when threat information changes; Yan, Ding, et al. search for a feasible path in real time with a hybrid path-replanning method based on a roadmap, starting from a given initial path; Tarjan also gives a general method based on directed graphs that can solve most path problems, showing that constructing path expressions is in a certain sense the most general path problem, although its efficiency and feasibility are limited on specific problems.
Beyond these, some real-time methods are obtained by improving static methods (the A* algorithm, Voronoi diagrams, etc.): Beard et al. dynamically generate feasible paths from an improved Voronoi diagram; Bernhard et al. give a local-iteration variant of Dijkstra's algorithm that determines the optimal trajectory for each step; Chen et al. propose a D*-based routing method for unmanned combat aircraft in unknown environments that also takes sudden threats into account.
In practical applications, however, the UAV cannot acquire all environment information at once, so static route planning has inherent limitations; at the same time, the complexity and locality of the environment description and the heavy computational load of real-time algorithms also limit dynamic route planning.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: real-time UAV route planning is a sequential decision problem, and traditional route-planning methods such as genetic algorithms and rapidly-exploring random trees suffer from heavy real-time computation and complex, limited environment descriptions, making them hard to apply in real UAV systems. Deep learning copes well with the complexity and real-time demands of practical problems, and deep reinforcement learning in particular is well suited to sequential decision problems, which makes it a good match for real-time UAV route planning in complex environments. The invention provides a deep-reinforcement-learning-based UAV path prediction method that overcomes complex unknown environments and complex real-time planning models, autonomously predicts the route in real time from detected environment information, and provides real-time navigation and obstacle avoidance for the UAV.
The technical scheme of the invention is as follows:
a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning comprises the following steps:
101, acquiring a threat matrix in a current detection range of the unmanned aerial vehicle;
step 102, determining the distances from the destination of the unmanned aerial vehicle to each point in the current detection range, and taking the distances as a current distance matrix;
103, obtaining the current flight direction of the unmanned aerial vehicle and the position of the unmanned aerial vehicle flying to the next moment according to the threat matrix, the current distance matrix and the trained A3C network;
104, the unmanned aerial vehicle flies to the position of the next moment along the current flight direction; meanwhile, judging whether the position of the next moment reaches the destination or not;
and 105, if not, executing the step 101.
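The loop of steps 101-105 can be sketched as follows. The sensing and network interfaces (`get_threat_matrix`, `get_distance_matrix`, `a3c_predict`) are hypothetical placeholders, not part of the patent; the sketch only shows the control flow.

```python
def fly_to_destination(pos, destination, get_threat_matrix,
                       get_distance_matrix, a3c_predict, max_steps=500):
    """Sketch of steps 101-105: sense, predict a direction, fly, repeat.

    `get_threat_matrix`, `get_distance_matrix` and `a3c_predict` are
    hypothetical stand-ins for the on-board sensors and the trained
    A3C network; `max_steps` plays the role of the preset threshold.
    """
    for _ in range(max_steps):
        threat = get_threat_matrix(pos)            # step 101
        dist = get_distance_matrix(pos)            # step 102
        dx, dy = a3c_predict(threat, dist)         # step 103: flight direction
        pos = (pos[0] + dx, pos[1] + dy)           # step 104: fly one step
        if pos == destination:                     # reached the destination?
            return pos, True
    return pos, False                              # threshold exceeded: return
```

With stub sensors and a network that always predicts "forward", the loop terminates as soon as the grid position equals the destination.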
Further, "if not, executing step 101" includes:
if not, judging whether the current number of executions is greater than or equal to a preset threshold;
if so, the UAV no longer flies according to the A3C network's predictions and returns to base; if not, executing step 101.
Further, obtaining the UAV's current flight direction and the position it flies to at the next moment includes:
inputting the threat matrix and the current distance matrix into the trained A3C network and predicting the probability of the UAV flying in each direction;
taking the flight direction with the maximum probability as the current flight direction;
acquiring the position matrix of the current detection range corresponding to the current distance matrix;
and, from the UAV's position in the position matrix, skipping M points along the current flight direction and taking the (M+1)-th point as the position at the next moment.
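A minimal sketch of the direction selection and M-point skip described above; the ordering of the four directions and the grid representation are assumed conventions, since the patent does not fix them.

```python
def next_position(direction_probs, pos, m):
    """Pick the flight direction with the highest predicted probability
    and skip M points along it, taking the (M+1)-th point as the next
    position. The direction ordering (forward, backward, left, right)
    is an assumed convention."""
    directions = [(1, 0), (-1, 0), (0, -1), (0, 1)]
    best = max(range(len(directions)), key=lambda i: direction_probs[i])
    dx, dy = directions[best]
    return (pos[0] + dx * (m + 1), pos[1] + dy * (m + 1))
```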
Further, the method further comprises:
the untrained A3C network is trained with multiple scene samples, updating the A3C network parameters.
Further, training the untrained A3C network with a plurality of scene samples includes, for each scene sample:
discretizing a scene sample comprising an origin and a destination to obtain a scene matrix;
obtaining a scene distance matrix and a scene threat matrix according to the scene matrix; the scene distance matrix is a matrix formed by the distances from each point in the scene matrix to the destination; the scene threat matrix comprises threat coefficients of all points in the scene matrix;
acquiring, as the distance matrix, the sub-matrix of the scene distance matrix centered on the initial point, and acquiring, as the threat matrix, the sub-matrix at the corresponding position of the scene threat matrix;
inputting the two sub-matrices into the untrained A3C network to obtain the current flight direction, the position flown to at the next moment, a reward value, and a value estimate, until the UAV reaches the destination or fails to reach it within the preset number of decisions;
parameters of the A3C network are updated based on the reward and value estimates.
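The sub-matrix extraction in the training steps above (a window of the scene matrix centered on the current point) can be sketched as follows; the padding value for cells outside the scene is an assumption, since the patent does not specify boundary handling.

```python
def detection_window(scene, center, k, pad=1.0):
    """Crop a (2k+1) x (2k+1) window of the scene matrix centered on the
    UAV's current point, padding out-of-scene cells with `pad` (treated
    here as maximal threat; an assumption)."""
    rows, cols = len(scene), len(scene[0])
    win = [[pad] * (2 * k + 1) for _ in range(2 * k + 1)]
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            r, c = center[0] + i, center[1] + j
            if 0 <= r < rows and 0 <= c < cols:
                win[i + k][j + k] = scene[r][c]
    return win
```

The same crop serves both the scene threat matrix and the scene distance matrix, yielding the two sub-matrices fed to the network.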
Further, the central point of the current detection range is located at the position point where the unmanned aerial vehicle is located.
Further, the threat matrix includes threat coefficients for each location.
A computer-readable storage medium having stored thereon a computer program comprising instructions for carrying out the method of any of the above.
The beneficial effects of the invention are as follows: through the A3C algorithm, the invention learns what flight action the UAV should take in its current local environment, helping the UAV predict the next position of its path in a completely unknown complex environment and thereby guiding its flight. The UAV can make flight decisions entirely within an unknown complex obstacle environment, determining its next position from the environment information at its current position, which breaks through the computational limitations and complexity of existing real-time route-planning algorithms.
Drawings
FIG. 1 is a technical scheme;
FIG. 2 is an A3C frame;
FIG. 3 is a schematic diagram of Actor-Critic;
fig. 4 is a schematic diagram of gradient update of the A3C algorithm.
Detailed Description
Embodiments of the invention are explained below with reference to the drawings.
With the development of intelligent technology, deep learning has proved effective for the complexity and real-time demands of practical problems; deep reinforcement learning in particular handles sequential decision problems well, which makes it a good match for real-time UAV route planning in complex environments.
The invention provides a real-time UAV path prediction method based on deep reinforcement learning. Its design idea is as follows: the route-planning scene is abstracted into a multi-dimensional matrix, by analogy with image pixels. The environment information within the UAV's detection range (i.e., the scene information abstracted into the matrix) is the input of the deep reinforcement learning network; through neural-network training, the network outputs the UAV's flight action in the current environment and a value estimate for that environment. The UAV flies to the next position according to the flight action, obtains the corresponding reward value, and the environment information detected at the next position becomes the next input to the network. These steps repeat in a loop: on the one hand, the network model is trained through the corresponding criterion function; on the other hand, once the model is trained, it can guide the UAV's flight from the environment information detected at its current position, so that the UAV plans its route in real time during flight. The deep reinforcement learning method adopted by the invention is the A3C (asynchronous advantage actor-critic) algorithm; the overall flow is shown in figure 1.
(1) A3C algorithm
The A3C algorithm is a deep reinforcement learning algorithm that improves on the AC (actor-critic) algorithm, whose architecture is shown in fig. 3. The actor selects an action according to the output probability of the policy, and the critic scores that action. The critic network judges the potential value of the current state, and the TD error it produces is used to update the actor network.
The AC algorithm combines the advantages of policy-based and value-estimation-based reinforcement learning, which makes it effective in high-dimensional, continuous action spaces. As shown in fig. 4, the A3C algorithm parallelizes the AC algorithm: the AC learner is replicated across multiple threads (or processes) that train asynchronously, making effective use of computing resources and improving training efficiency.
(2) Real-time unmanned aerial vehicle route planning based on deep reinforcement learning
The reinforcement learning method adopted by the invention is the A3C algorithm. A deep reinforcement learning problem comprises three main concepts: the environment state (State), the action (Action), and the reward (Reward). For route planning in a two-dimensional scene, the A3C-based UAV route planner abstracts the scene into two two-dimensional matrices, a threat matrix and a distance matrix, by analogy with image pixels. Suppose the scene (the route-planning map) covers an area of K × K (m²). Through sampling, the matrix scale is set to N × N; if the sampling period of the route planner is T, the UAV's flight distance D within one period should satisfy D = K/N. In this way the K × K (m²) scene is mapped to two N × N matrices: an N × N threat matrix Threat_matrix and an N × N distance matrix Distance_matrix. The threat matrix is the threat-degree matrix abstracted from the scene, and the distance matrix holds the Euclidean distance from each sampling point to the target point. Together, the extracted threat and distance matrices represent the UAV's current flight scene and correspond to the Environment concept in reinforcement learning.
The environment information within the UAV's detection range can be expressed as S = (M_ij)_{k×k}, where M_ij = (α_ij, χ_ij): α_ij is the threat degree abstracted from the scene, χ_ij is the Euclidean distance to the target point, and k × k is the UAV's detection range. S can be understood as the threat matrix and distance matrix detected in the scene at the UAV's current position; it therefore corresponds to the current environment state (State) in reinforcement learning and serves as the input of the deep reinforcement learning algorithm A3C. For route planning in a two-dimensional scene, the UAV's flight control can be discretized into a 4-dimensional vector representing flight forward, backward, left, and right (i.e., velocity in the X and Y directions): in the current environment the UAV chooses one of these four directions, so the flight direction is discretized into 4 directions that form the UAV's action space. After the UAV selects a flight direction, it flies a default distance along that direction to reach the next position. Thus the 4 discretized flight directions together with the default flight distance correspond to the Action concept in reinforcement learning. The UAV route-planning problem in a two-dimensional scene is thereby converted into a reinforcement learning problem; to make the planned path reasonable and robust, the reward value must be defined appropriately.
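A sketch of the scene-to-matrix mapping described above: each of the N × N sampling points stores its Euclidean distance to the target, and one sampling period corresponds to a flight distance of D = K/N. The grid-unit convention is an assumption.

```python
import math

def distance_matrix(n, target):
    """N x N distance matrix of the discretized scene: Euclidean
    distance from each sampling point (i, j) to the target point,
    in grid units (multiply by D = K/N for metres)."""
    return [[math.hypot(i - target[0], j - target[1]) for j in range(n)]
            for i in range(n)]
```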
The A3C-based UAV route planner discretizes the UAV's flight control into a 4-dimensional vector, and the task objective is to accumulate as much return as possible while reaching the target point. In each time period, the learning unit determines the next action from the UAV's current environment state, establishing a probabilistic mapping from the state set to the action set; by this quantitative standard, the more return an execution accumulates, the better it is.
The A3C-based UAV route planner reduces the UAV's action space to the 4 flight actions described above, takes whether the UAV advances toward the target as the return value (Reward), and takes the scene environment information (threat matrix and distance matrix) as the state. Under these assumptions, the algorithm flow is as follows:
the mode of change between the states of the drone is determined by the action of the drone for its next state, which in turn affects the next action.
Where gamma is the discount factor.
The network framework of A3C is shown in FIG. 2. Gradient-update training is performed on the whole network from the policy net π(s) and the evaluation net V(s). The update gradient of the policy net takes the standard A3C form
∇_θ' log π(a_t | s_t; θ') · A(s_t, a_t), with the advantage A(s_t, a_t) = R_t - V(s_t; θ_v),
and the update gradient of the evaluation net is
∂(R_t - V(s_t; θ_v'))² / ∂θ_v'.
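The discounted return fed into both gradients can be computed backwards over the collected rewards, bootstrapping from the critic's value estimate of the final state. This is the standard A3C return, sketched here as an illustration rather than taken verbatim from the patent text.

```python
def n_step_returns(rewards, bootstrap_value, gamma):
    """R_t = r_t + gamma * R_{t+1}, seeded with the critic's value
    estimate of the state after the last reward (standard A3C form)."""
    returns, ret = [], bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.append(ret)
    return returns[::-1]
```

The advantage for each step is then the return minus the critic's value estimate for that state.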
the pseudo code for a specific algorithm flow is shown in table 1 below.
TABLE 1
(3) Reward value design
The design of the reward value is an important part of the A3C algorithm, and designing it reasonably is one of the tasks the invention must attend to. The invention designs the reward values as follows:
For the sampling point i where the UAV is located (i.e., the position of a point in the N × N sampled scene, which carries both a threat-matrix entry and a distance-matrix entry), let d_i denote the distance between the UAV and the target and t_i the threat degree of the scene at that point. The normalized distance r_i between sampling point i and the target, where i = 1, 2, …, N × N (the number of sampling points), is defined as follows:
r_i = d_i × exp(t_i) (5)
For the UAV's current sampling-point position i, a certain action is selected from the action space according to the environment information S within the detection range, yielding the UAV's next sampling-point position i+1. The normalized distance r_i between the current position i and the target and the normalized distance r_{i+1} between the next position i+1 and the target are calculated, and the reward of the action selected in the current state S is determined by comparing the two distances:
In addition, the reward should also depend on whether the target point is reached and on the length of the flight path required to reach it. By default, the UAV is considered to have reached the target point when its distance to the target is small (specifically, no greater than the 10th value of the distance matrix sorted in ascending order). When the UAV reaches the target point after a series of consecutive decisions, the reward is 100; when the number of consecutive decisions exceeds a threshold t_max without reaching the target, the reward is -100.
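The reward design above can be sketched as follows. Only the r_i = d_i·exp(t_i) definition and the +/-100 terminal rewards come from the text; the +/-1 magnitude of the intermediate reward is an assumed placeholder.

```python
import math

def normalized_distance(d, t):
    """Equation (5): r_i = d_i * exp(t_i), distance scaled by threat."""
    return d * math.exp(t)

def reward(r_now, r_next, reached, budget_exceeded):
    """+100 on reaching the target, -100 once t_max decisions pass
    without reaching it; otherwise reward moving closer in normalized
    distance (the +/-1 magnitude is an assumed placeholder)."""
    if reached:
        return 100.0
    if budget_exceeded:
        return -100.0
    return 1.0 if r_next < r_now else -1.0
```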
The invention mainly concerns real-time path prediction for unmanned aerial vehicles and proposes a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning, so as to realize real-time autonomous flight decisions. The main innovations of the invention are as follows:
The invention provides an unmanned aerial vehicle real-time autonomous flight decision method based on the A3C model. A multi-dimensional detection information matrix is established from the threat information, position information, and other data detected by the unmanned aerial vehicle's sensors, and the trained A3C network then determines the unmanned aerial vehicle's next flight position.
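A single decision step of this method might look as follows. This is a hypothetical sketch: the eight-direction action space, the stand-in `policy` callable, and the stacking of the two matrices into one input are assumptions for illustration, not details taken from the patent.

```python
import numpy as np

# Eight discrete flight directions as (row, col) steps: N, NE, E, SE, S, SW, W, NW.
# The patent does not enumerate the action space; this discretization is assumed.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]

def decide_next_position(threat, dist, pos, policy, m=2):
    """One decision step: feed the threat and distance matrices to the
    (trained) A3C policy, take the most probable flight direction, and move
    M+1 points along it.  `policy` stands in for the trained network and is
    expected to map the stacked matrices to direction probabilities."""
    state = np.stack([threat, dist])          # multi-dimensional detection info
    probs = policy(state)                     # probability of each direction
    direction = DIRECTIONS[int(np.argmax(probs))]
    row = pos[0] + (m + 1) * direction[0]     # skip M points, take the (M+1)-th
    col = pos[1] + (m + 1) * direction[1]
    return (row, col), direction
```

With a dummy policy that favors the eastward direction, a drone at (2, 2) with M = 2 would move to (2, 5).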
Claims (8)
1. A real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning is characterized by comprising the following steps:
step 101, acquiring a threat matrix in a current detection range of the unmanned aerial vehicle;
step 102, determining the distances from the destination of the unmanned aerial vehicle to each point in the current detection range, and taking the distances as a current distance matrix;
step 103, obtaining the current flight direction of the unmanned aerial vehicle and the position of the unmanned aerial vehicle flying to the next moment according to the threat matrix, the current distance matrix and the trained A3C network;
step 104, the unmanned aerial vehicle flies to the position of the next moment along the current flight direction; meanwhile, judging whether the position of the next moment reaches the destination or not;
and step 105, if not, executing step 101.
2. The method of claim 1, wherein if not, performing step 101 comprises:
if not, judging whether the current execution times is greater than or equal to a preset threshold or not;
if so, the unmanned aerial vehicle no longer flies according to the prediction of the A3C network and returns; if not, executing step 101.
3. The method of claim 1, wherein obtaining the current flight direction of the drone and the location of the drone to the next time comprises:
inputting the threat matrix and the current distance matrix into a trained A3C network, and predicting the probability of the airplane flying to each direction;
taking the flying direction of the airplane with the maximum probability as the current flying direction;
acquiring a position matrix of a current detection range corresponding to the current distance matrix;
and starting from the position of the unmanned aerial vehicle in the position matrix, skipping M points along the current flight direction and taking the (M+1)-th point as the position point of the next moment.
4. The method of claim 3, further comprising:
the untrained A3C network is trained with multiple scene samples, updating the A3C network parameters.
5. The method of claim 4, wherein training the untrained A3C network with multiple scene samples for one scene sample comprises:
discretizing a scene sample comprising an origin and a destination to obtain a scene matrix;
obtaining a scene distance matrix and a scene threat matrix according to the scene matrix; the scene distance matrix is a matrix formed by the distances from each point in the scene matrix to the destination; the scene threat matrix comprises threat coefficients of all points in the scene matrix;
acquiring a sub-matrix taking an initial point as a center in the scene distance matrix as a distance matrix, and acquiring a sub-matrix at a corresponding position in the scene threat matrix as a threat matrix;
inputting the two sub-matrices into an untrained A3C network to obtain the current flight direction, the position at the next moment, a reward value and a value estimate, until the unmanned aerial vehicle reaches the destination or fails to reach it within a preset number of decisions;
parameters of the A3C network are updated based on the reward and value estimates.
6. The method of claim 1, wherein a center point of the current detection range is located at a location point where the drone is located.
7. The method of claim 1, wherein the threat matrix comprises threat coefficients for each location.
8. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program comprising instructions for carrying out the method according to any one of claims 1-7.
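The per-scene training rollout described in claims 4 and 5 could be sketched as follows. Everything here is illustrative: `detection_window`'s edge-replicating border handling, the `step_fn` stand-in for the A3C actor, and the termination test `scene_dist[pos] == 0` are assumptions the patent does not specify.

```python
import numpy as np

def detection_window(scene, center, n=5):
    """Extract the N×N sub-matrix of `scene` centred on `center`, replicating
    edge values at the border (border handling is an assumption)."""
    half = n // 2
    padded = np.pad(scene, half, mode="edge")
    r, c = center[0] + half, center[1] + half
    return padded[r - half:r + half + 1, c - half:c + half + 1]

def episode(scene_threat, scene_dist, start, step_fn, max_steps=50, n=5):
    """Roll out one training episode: at each position, take the N×N threat
    and distance sub-matrices centred on the vehicle, let `step_fn` (a
    stand-in for the A3C actor) choose the next position, and stop at the
    destination (distance 0) or after `max_steps` decisions."""
    pos, trajectory = start, [start]
    for _ in range(max_steps):
        threat = detection_window(scene_threat, pos, n)
        dist = detection_window(scene_dist, pos, n)
        pos = step_fn(threat, dist, pos)
        trajectory.append(pos)
        if scene_dist[pos] == 0:       # reached the destination
            break
    return trajectory
```

The trajectory and the per-step rewards collected along it would then drive the A3C parameter update of claim 5's final step.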
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010988055.3A CN112148008B (en) | 2020-09-18 | 2020-09-18 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148008A true CN112148008A (en) | 2020-12-29 |
CN112148008B CN112148008B (en) | 2023-05-02 |
Family
ID=73893992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010988055.3A Active CN112148008B (en) | 2020-09-18 | 2020-09-18 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148008B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113283827A (en) * | 2021-04-16 | 2021-08-20 | 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) | Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning |
CN113743605A (en) * | 2021-06-16 | 2021-12-03 | 温州大学 | Method for searching smoke and fire detection network architecture based on evolution method |
CN114089752A (en) * | 2021-11-11 | 2022-02-25 | 深圳市杉川机器人有限公司 | Autonomous exploration method for robot, and computer-readable storage medium |
CN114355980A (en) * | 2022-01-06 | 2022-04-15 | 上海交通大学宁波人工智能研究院 | Four-rotor unmanned aerial vehicle autonomous navigation method and system based on deep reinforcement learning |
CN116148862A (en) * | 2023-01-16 | 2023-05-23 | 无锡市雷华科技有限公司 | Comprehensive early warning and evaluating method for bird detection radar flying birds |
CN116627181A (en) * | 2023-07-25 | 2023-08-22 | 吉林农业大学 | Intelligent obstacle avoidance method for plant protection unmanned aerial vehicle based on spatial reasoning |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011009011A1 (en) * | 2009-07-15 | 2011-01-20 | Massachusetts Institute Of Technology | An integrated framework for vehicle operator assistance based on a trajectory prediction and threat assessment |
US20170146991A1 (en) * | 2015-11-24 | 2017-05-25 | Northrop Grumman Systems Corporation | Spatial-temporal forecasting for predictive situational awareness |
CN106873628A (en) * | 2017-04-12 | 2017-06-20 | 北京理工大学 | A kind of multiple no-manned plane tracks the collaboration paths planning method of many maneuvering targets |
CN108731684A (en) * | 2018-05-07 | 2018-11-02 | 西安电子科技大学 | A kind of Route planner of multiple no-manned plane Cooperative Area monitoring |
CN109254591A (en) * | 2018-09-17 | 2019-01-22 | 北京理工大学 | The dynamic route planning method of formula sparse A* and Kalman filtering are repaired based on Anytime |
CN109871031A (en) * | 2019-02-27 | 2019-06-11 | 中科院成都信息技术股份有限公司 | A kind of method for planning track of fixed-wing unmanned plane |
CN109933086A (en) * | 2019-03-14 | 2019-06-25 | 天津大学 | Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study |
CN109992000A (en) * | 2019-04-04 | 2019-07-09 | 北京航空航天大学 | A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning |
CN110866887A (en) * | 2019-11-04 | 2020-03-06 | 深圳市唯特视科技有限公司 | Target situation fusion sensing method and system based on multiple sensors |
CN110874578A (en) * | 2019-11-15 | 2020-03-10 | 北京航空航天大学青岛研究院 | Unmanned aerial vehicle visual angle vehicle identification and tracking method based on reinforcement learning |
CN111444786A (en) * | 2020-03-12 | 2020-07-24 | 五邑大学 | Crowd evacuation method, device and system based on unmanned aerial vehicle group and storage medium |
Non-Patent Citations (1)
Title |
---|
GAO, XIAOJING et al.: "Research on Environment and Threat Models in UAV Path Planning", Aeronautical Computing Technique *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113283827A (en) * | 2021-04-16 | 2021-08-20 | 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) | Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning |
CN113283827B (en) * | 2021-04-16 | 2024-03-12 | 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) | Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning |
CN113743605A (en) * | 2021-06-16 | 2021-12-03 | 温州大学 | Method for searching smoke and fire detection network architecture based on evolution method |
CN114089752A (en) * | 2021-11-11 | 2022-02-25 | 深圳市杉川机器人有限公司 | Autonomous exploration method for robot, and computer-readable storage medium |
CN114355980A (en) * | 2022-01-06 | 2022-04-15 | 上海交通大学宁波人工智能研究院 | Four-rotor unmanned aerial vehicle autonomous navigation method and system based on deep reinforcement learning |
CN114355980B (en) * | 2022-01-06 | 2024-03-08 | 上海交通大学宁波人工智能研究院 | Four-rotor unmanned aerial vehicle autonomous navigation method and system based on deep reinforcement learning |
CN116148862A (en) * | 2023-01-16 | 2023-05-23 | 无锡市雷华科技有限公司 | Comprehensive early warning and evaluating method for bird detection radar flying birds |
CN116148862B (en) * | 2023-01-16 | 2024-04-02 | 无锡市雷华科技有限公司 | Comprehensive early warning and evaluating method for bird detection radar flying birds |
CN116627181A (en) * | 2023-07-25 | 2023-08-22 | 吉林农业大学 | Intelligent obstacle avoidance method for plant protection unmanned aerial vehicle based on spatial reasoning |
CN116627181B (en) * | 2023-07-25 | 2023-10-13 | 吉林农业大学 | Intelligent obstacle avoidance method for plant protection unmanned aerial vehicle based on spatial reasoning |
Also Published As
Publication number | Publication date |
---|---|
CN112148008B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112148008B (en) | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning | |
CN113110592B (en) | Unmanned aerial vehicle obstacle avoidance and path planning method | |
Yijing et al. | Q learning algorithm based UAV path learning and obstacle avoidence approach | |
Chai et al. | Design and experimental validation of deep reinforcement learning-based fast trajectory planning and control for mobile robot in unknown environment | |
CN112650237B (en) | Ship path planning method and device based on clustering processing and artificial potential field | |
Wu | A survey on population-based meta-heuristic algorithms for motion planning of aircraft | |
CN109597425B (en) | Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning | |
Ivanovic et al. | Mats: An interpretable trajectory forecasting representation for planning and control | |
CN110514206B (en) | Unmanned aerial vehicle flight path prediction method based on deep learning | |
Dong et al. | A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures | |
Grigorescu et al. | Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles | |
CN110926477A (en) | Unmanned aerial vehicle route planning and obstacle avoidance method | |
Xue et al. | Multi-agent deep reinforcement learning for uavs navigation in unknown complex environment | |
Xue et al. | A uav navigation approach based on deep reinforcement learning in large cluttered 3d environments | |
Sonny et al. | Q-learning-based unmanned aerial vehicle path planning with dynamic obstacle avoidance | |
Othman et al. | Deep reinforcement learning for path planning by cooperative robots: Existing approaches and challenges | |
Doshi et al. | Energy–time optimal path planning in dynamic flows: Theory and schemes | |
CN110779526B (en) | Path planning method, device and storage medium | |
Chen et al. | A study of unmanned path planning based on a double-twin RBM-BP deep neural network | |
Cui | Multi-target points path planning for fixed-wing unmanned aerial vehicle performing reconnaissance missions | |
Thomas et al. | Inverse Reinforcement Learning for Generalized Labeled Multi-Bernoulli Multi-Target Tracking | |
Saeed et al. | Domain-aware multiagent reinforcement learning in navigation | |
Prathyusha et al. | Dynamic constraint based multi-route planning and multi-obstacle avoidance model for unmanned aerial vehicles | |
Xie et al. | Hybrid AI-based Dynamic Re-routing Method for Dense Low-Altitude Air Traffic Operations | |
CN113741416B (en) | Multi-robot full-coverage path planning method based on improved predator prey model and DMPC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||