CN112148008B - Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning - Google Patents
- Publication number: CN112148008B (application CN202010988055.3A)
- Authority: CN (China)
- Prior art keywords: matrix, unmanned aerial vehicle, scene, threat
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G05D1/0253 — Control of position or course in two dimensions using optical position detecting means, extracting relative motion information from a plurality of successive images (e.g. visual odometry, optical flow)
- G01C21/00 — Navigation; navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20 — Instruments for performing navigational calculations
- G05D1/0221 — Control with means for defining a desired trajectory involving a learning process
- G05D1/0223 — Control with means for defining a desired trajectory involving speed control of the vehicle
- G05D1/0276 — Control using signals provided by a source external to the vehicle
- G06N20/00 — Machine learning
- Y02T10/40 — Engine management systems
Abstract
The invention belongs to the field of route planning and relates to a real-time unmanned aerial vehicle (UAV) path prediction method based on deep reinforcement learning. The method comprises the following steps. Step 101: acquire the threat matrix within the UAV's current detection range, where the center of the detection range is the UAV's current position and the threat matrix contains the threat coefficient of each location. Step 102: determine the distance from each point in the current detection range to the UAV's destination, forming the current distance matrix. Step 103: obtain the UAV's current flight direction and its position at the next moment from the threat matrix, the current distance matrix, and a trained A3C network. Step 104: the UAV flies along the current flight direction to the next-moment position; meanwhile, judge whether the next-moment position has reached the destination. Step 105: if not, go to step 101.
Description
Technical Field
The invention belongs to the field of route planning, and relates to a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning.
Background
The basic goal of unmanned aerial vehicle (UAV) route planning is to autonomously obtain a flight path that avoids threats and safely reaches the target. In recent years, technical approaches to the route planning problem have developed rapidly, and many documents propose modeling and solution methods from different aspects. These techniques fall into two main categories according to how threat information is acquired. One is static route planning, i.e., planning based on complete prior environmental information: the UAV constructs a safe, feasible, and satisfactory path between the starting point and the target point from complete global knowledge of the environmental threats. The other is real-time route planning, in which the threat environment is assumed to be completely or partially unknown in advance. In this case the UAV can only acquire threat information within a limited range (usually the sensor detection range) and must plan its route in real time during flight in order to reach the target safely. If the real-time planning must run on an onboard computer, it is called online real-time route planning. Static and real-time route planning are discussed separately below.
The first aspect concerns static path planning. The focus of static UAV path planning is how to compute a globally optimized path when the entire threat environment is known. Common planning methods include: searching for a feasible path by constructing a Voronoi diagram and then optimizing that path; describing threat-region probabilities in graph form during a learning stage and constructing a feasible path between two nodes during a query stage, or constructing a path with a probabilistic roadmap method; and visibility-graph methods, silhouette methods, and the like. Given a fully known threat environment, these methods can compute a safe, feasible, or optimal flight path for the global threat environment. However, because the flight area is large, the UAV's detection range is limited, threat sources are of many types, and threat information changes dynamically and is difficult to describe accurately, the UAV usually cannot directly acquire complete information about the flight area and must perform detection in real time during flight, so static route planning methods have limitations in practical applications.
The second aspect concerns real-time route planning. The key issue in real-time UAV route planning is how to plan a global route from the starting point to the target point using only the limited environmental information that has been detected. Current research mainly draws on robot path planning methods, combined with the UAV's performance characteristics and the particularities of the flight environment. The proposed methods can be classified by modeling idea as follows:
(1) Probability-based methods. Klasing et al. re-plan paths in real time using cell-based probabilistic roadmaps; Jun and D'Andrea propose a route planning algorithm based on threat probability maps; Zengin and Dogan developed a probabilistic modeling framework for dynamic environments that provides a more complete solution for path planning.
(2) Mathematical programming methods. Recently, many documents have presented methods that solve for paths in real time using mixed-integer programming; Shi and Wang combine Bayesian decision theory with a dynamic programming algorithm to solve for the optimal path. In addition, there are artificial potential field approaches based on stream functions, global dynamic window approaches, evolutionary computation methods, bouncing-based (boundary-tracking) methods, and the like for real-time path planning; Lan, Wen, et al. analyze and compare the advantages and disadvantages of path planning with different planning methods.
(3) Methods combining global path planning with real-time path adjustment. One approach first generates an initial path with Dijkstra's algorithm on an improved Voronoi diagram and then, when threat information changes, re-plans the path with a switching linear dynamic system based on a hybrid dynamic Bayesian network; Yan, Ding, et al. search for feasible paths in real time with a hybrid path re-planning algorithm based on a roadmap, starting from a given initial path; Tarjan also presents a general method for solving most path problems based on directed graphs and shows that constructing path expressions is, in a certain sense, the most general path problem, but the method has efficiency and feasibility limitations when solving specific problems.
In addition to the above, some real-time methods are modified from static methods (the A* algorithm, Voronoi diagram methods, etc.). For example, Beard et al. dynamically generate feasible paths based on a modified Voronoi diagram; Bernhard et al. use Dijkstra's algorithm in an iterative scheme of local operations to determine the optimal trajectory at each step; Chen et al. propose a D*-based method for unmanned fighter route planning in an unknown environment that also considers pop-up threats.
However, in practical applications the UAV cannot acquire all environmental information from a bird's-eye view, so static route planning has limitations; at the same time, because the environment description is complex and local, real-time route planning methods face problems such as the large computational cost of real-time algorithms, so dynamic route planning also has limitations.
Disclosure of Invention
The technical problems to be solved by the invention are as follows. Real-time UAV route planning is a continuous decision-making problem, and traditional route planning methods such as genetic algorithms and rapidly-exploring random tree (RRT) algorithms face large real-time computational costs and complex, local environment descriptions, making them difficult to apply to actual UAV systems. Deep learning methods handle the complexity and real-time requirements of practical problems very well, and deep reinforcement learning in particular is well suited to continuous decision-making, which is exactly what real-time UAV route planning in a complex environment requires. The invention provides a UAV path prediction method based on deep reinforcement learning that overcomes the difficulties of complex unknown environments and complex real-time path planning models, autonomously predicts the route in real time from detected environmental information, and provides real-time navigation and obstacle avoidance for the UAV.
The technical scheme of the invention is as follows:
a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning comprises the following steps:
step 101, acquiring the threat matrix within the current detection range of the unmanned aerial vehicle;
step 102, determining the distance from each point in the current detection range to the destination of the unmanned aerial vehicle, forming the current distance matrix;
step 103, obtaining the current flight direction of the unmanned aerial vehicle and its position at the next moment from the threat matrix, the current distance matrix, and the trained A3C network;
step 104, the unmanned aerial vehicle flies along the current flight direction to the next-moment position; meanwhile, judging whether the next-moment position has reached the destination;
step 105, if not, go to step 101.
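The loop in steps 101-105 can be sketched as follows. This is a minimal illustration: `get_threat_matrix`, `get_distance_matrix`, and `policy` are hypothetical stand-ins for the UAV's onboard sensing and the trained A3C network, not interfaces defined by the patent.

```python
def predict_path(get_threat_matrix, get_distance_matrix, policy, start, dest,
                 max_steps=500):
    """Real-time prediction loop corresponding to steps 101-105 (sketch)."""
    pos, path = start, [start]
    for _ in range(max_steps):                  # bounded retries (step 105)
        threat = get_threat_matrix(pos)         # step 101: local threat matrix
        dist = get_distance_matrix(pos, dest)   # step 102: current distance matrix
        dx, dy = policy(threat, dist)           # step 103: direction from the net
        pos = (pos[0] + dx, pos[1] + dy)        # step 104: fly one step
        path.append(pos)
        if pos == dest:                         # destination reached?
            return path
    return None                                 # not reached within the limit
```

With a toy greedy policy standing in for the network, the loop walks the UAV from the start point to the destination one grid step at a time.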
Further, executing step 101 when the destination has not been reached includes:
if not, judging whether the current number of executions is greater than or equal to a preset threshold;
if so, the unmanned aerial vehicle stops flying according to the A3C network's predictions and returns; if not, go to step 101.
Further, obtaining the current flight direction and the position at the next moment includes:
inputting the threat matrix and the current distance matrix into the trained A3C network and predicting the probability of the unmanned aerial vehicle flying in each direction;
taking the direction with the highest probability as the current flight direction;
acquiring the position matrix of the current detection range corresponding to the current distance matrix;
and, starting from the position of the unmanned aerial vehicle in the position matrix, skipping M points along the current flight direction and taking the (M+1)-th point as the position at the next moment.
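The direction selection and M-point skip described above can be sketched as follows. The (row, col) layout of the 4 directions is an assumption of this sketch; the ordering of the network's output probabilities is not specified in the text.

```python
import numpy as np

# One possible layout of the 4 discrete flight directions as (row, col) offsets.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # forward, backward, left, right

def next_position(direction_probs, current_pos, m):
    """Pick the most probable direction, skip M points along it, and take
    the (M+1)-th point as the position at the next moment."""
    dr, dc = DIRECTIONS[int(np.argmax(direction_probs))]
    return (current_pos[0] + (m + 1) * dr, current_pos[1] + (m + 1) * dc)
```

For example, with probabilities favoring the fourth direction and M = 1, the UAV lands two grid cells to the right of its current position.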
Further, the method further comprises:
training the untrained A3C network through a plurality of scene samples, and updating the A3C network parameters.
Further, training the untrained A3C network with a plurality of scene samples comprises, for each scene sample:
discretizing the scene sample, which comprises an origin and a destination, to obtain a scene matrix;
obtaining a scene distance matrix and a scene threat matrix from the scene matrix; the scene distance matrix holds the distance from each point of the scene matrix to the destination, and the scene threat matrix holds the threat coefficient of each point of the scene matrix;
taking the submatrix of the scene distance matrix centered on the origin as the distance matrix, and the submatrix at the corresponding position of the scene threat matrix as the threat matrix;
inputting the two submatrices into the untrained A3C network to obtain the current flight direction, the unmanned aerial vehicle's position at the next moment, the reward value, and the value estimate, until the unmanned aerial vehicle reaches the destination or fails to reach it within a preset number of steps;
and updating the parameters of the A3C network according to the reward values and value estimates.
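One training episode over a scene sample, following the steps above, can be sketched as follows. The `net` interface (returning a direction, reward, and value estimate per step) is a placeholder for this illustration, not the patent's implementation.

```python
import numpy as np

def run_training_episode(scene_threat, scene_dist, origin, dest, net,
                         k=3, max_steps=200):
    """One episode: extract the k-by-k threat and distance submatrices
    centered on the UAV, query the (untrained) A3C net, and collect the
    reward values and value estimates for the later parameter update."""
    pos, rewards, values = origin, [], []
    h = k // 2
    for _ in range(max_steps):
        t_sub = scene_threat[pos[0]-h:pos[0]+h+1, pos[1]-h:pos[1]+h+1]
        d_sub = scene_dist[pos[0]-h:pos[0]+h+1, pos[1]-h:pos[1]+h+1]
        (dr, dc), reward, value = net(t_sub, d_sub)
        rewards.append(reward)
        values.append(value)
        pos = (pos[0] + dr, pos[1] + dc)
        if pos == dest:          # reached the destination: episode ends
            break
    return rewards, values, pos
```

The collected `rewards` and `values` are what the A3C update in the Description later consumes.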
Further, the center point of the current detection range is located at the position of the unmanned plane.
Further, the threat matrix includes threat coefficients for each location.
A computer readable storage medium having stored thereon a computer program comprising instructions for performing the method of any of the above claims.
The beneficial effects of the invention are as follows. Through the A3C algorithm, the invention learns what flight actions the unmanned aerial vehicle should take in its current local environment, helping it predict the path to its next position in a completely unknown complex environment and thereby guiding its flight. The unmanned aerial vehicle can make flight decisions entirely within an unknown complex obstacle environment, determining its next-moment flight position from the environmental information at its current position, which overcomes the real-time limitations and computational complexity of conventional route planning algorithms.
Drawings
FIG. 1 is a technical route;
FIG. 2 is an A3C frame;
FIG. 3 is an Actor-Critic schematic;
FIG. 4 is a schematic diagram of gradient update of the A3C algorithm.
Detailed Description
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
With the development of intelligent technology, deep learning methods have proven very effective at handling the complexity and real-time requirements of practical problems; deep reinforcement learning in particular solves continuous decision-making problems well, which is exactly what real-time UAV route planning in a complex environment requires.
The invention provides a real-time UAV path prediction method based on deep reinforcement learning, whose design idea is as follows. In the manner of image pixels, the route planning scene is abstracted into multidimensional matrices. The environmental information within the UAV's detection range (i.e., the scene information abstracted into these matrices) is taken as the input of a deep reinforcement learning network, which, through neural network training, outputs the UAV's flight action in the current environment together with a value estimate for that environment. The UAV flies to its next position according to the flight action, obtains the corresponding reward value, and feeds the environmental information detected at the next position back into the network as its new input. This cycle repeats: on one hand, the network model is trained with the corresponding criterion functions; on the other hand, once the model is trained, it can guide the UAV's flight from the environmental information detected at the current position, so the UAV can plan its route in real time during flight. The deep reinforcement learning method adopted by the invention is the A3C algorithm (Asynchronous Advantage Actor-Critic); the specific flow is shown in FIG. 1.
(1) A3C algorithm
The A3C algorithm is a deep reinforcement learning algorithm that improves on the AC (Actor-Critic) algorithm; the AC architecture is shown in FIG. 3. The Actor selects an action according to the output probability of the policy, and the Critic scores that action according to the Actor's behavior. The Critic network judges the potential value of the current state, and the TD error it produces is used to update the Actor network.
The AC algorithm combines the advantages of policy-based and value-network-based reinforcement learning algorithms, making it more effective in high-dimensional and continuous action spaces. As shown in FIG. 4, the A3C algorithm parallelizes the AC algorithm: copies of the AC algorithm are trained asynchronously in multiple threads (or processes), which uses computing resources effectively and improves training efficiency.
(2) Real-time unmanned aerial vehicle route planning based on deep reinforcement learning
The reinforcement learning method adopted by the invention is the A3C algorithm. A deep reinforcement learning problem involves three main concepts: the environment state (State), actions (Action), and rewards (Reward). For route planning in a two-dimensional scene, A3C-based UAV route planning abstracts the scene, in the manner of image pixels, into two two-dimensional matrices: a threat matrix and a distance matrix. Suppose the scene covers an area of K×K (m²) and the matrix scale is N×N; if the UAV's route planning sampling period is T, the flight distance D of the UAV within that period should satisfy D = K/N. In this way the K×K (m²) scene is mapped into two N×N two-dimensional matrices: an N×N threat matrix (Threat matrix) and an N×N distance matrix (Distance matrix). The threat matrix is the threat-degree matrix abstracted from the scene, and the distance matrix holds the Euclidean distance from each sampling point in the scene to the target point. The extracted threat and distance matrices represent the UAV's current flight scenario and correspond to the Environment concept in reinforcement learning. The environment information within the UAV's detection range can be expressed as S = (M_ij)_{k×k}, where M_ij = (α_ij, χ_ij); here α_ij is the threat degree abstracted from the scene, χ_ij is the Euclidean distance to the target point, and k×k is the UAV's detection range. S can be understood as the threat matrix and distance matrix detected in the scene at the UAV's current position. Thus the environment information S within the detection range of the current position corresponds to the current environment state (State) in reinforcement learning and serves as the input of the deep reinforcement learning algorithm A3C.
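The mapping above — a K×K (m²) scene to N×N threat and distance matrices, with a k×k detection window as the state S — can be sketched as follows. The `threat_fn` stand-in for the scene's threat model is an assumption of this illustration.

```python
import numpy as np

def build_scene_matrices(K, N, target, threat_fn):
    """Discretize a K-by-K (m^2) scene into N-by-N threat and distance
    matrices; the per-period flight distance satisfies D = K / N."""
    D = K / N
    threat = np.array([[threat_fn(i, j) for j in range(N)] for i in range(N)],
                      dtype=float)
    ii, jj = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    dist = np.hypot(ii - target[0], jj - target[1])  # Euclidean distance to target
    return threat, dist, D

def detection_state(threat, dist, pos, k):
    """Environment state S = (M_ij)_{k x k}: the k-by-k threat and distance
    windows centered on the UAV's current position."""
    h = k // 2
    rows = slice(pos[0] - h, pos[0] + h + 1)
    cols = slice(pos[1] - h, pos[1] + h + 1)
    return threat[rows, cols], dist[rows, cols]
```

With K = 100 m and N = 10, the flight distance per sampling period is D = 10 m, and the distance matrix is zero exactly at the target cell.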
In a two-dimensional scene, the flight control of the UAV can be discretized into a 4-dimensional vector representing forward, backward, leftward, and rightward flight (i.e., velocities in the X and Y directions): in the current environment the UAV chooses to fly forward, backward, left, or right, so the flight direction is discretized into 4 directions that form the UAV's flight action space. After the UAV has chosen a flight direction, it is assumed by default to fly a certain distance along that direction to reach the next position. The 4 discrete flight directions, together with the default flight distance, correspond to actions (Action) in reinforcement learning. The UAV route planning problem in a two-dimensional scene is thereby converted into a reinforcement learning problem; for the planned path to be reasonable and robust, the reward value must be defined reasonably.
A3C-based UAV route planning discretizes the UAV's flight control into a 4-dimensional vector, and the task objective is to obtain as large a return as possible while still reaching the target point. In each time period, the reinforcement learning unit decides the next action from the UAV's current environment state, establishing a probabilistic mapping from the state set to the action set; the larger the return value used as the quantitative standard, the better the execution.
A3C-based UAV route planning takes whether the UAV advances in a direction closer to the target as the reward, takes the scene environment information (the threat matrix and distance matrix) as the state information, and simplifies the UAV's action space to the 4 flight actions described above. Based on these assumptions, the specific algorithm flow is as follows.
the mode of change between the unmanned aerial vehicle states is that the next state is determined by the unmanned aerial vehicle's actions, which in turn affects the next action.
The return function for the current action is then:

R_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + … = Σ_{k≥0} γ^k · r_{t+k}

where γ is the discount factor and r_{t+k} is the reward received k steps after time t.
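Under this definition, the return is most easily computed backward over an episode's rewards, since R_t = r_t + γ·R_{t+1}:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} from the end of the episode,
    which expands to R_t = sum_k gamma^k * r_{t+k}."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

For instance, three unit rewards with γ = 0.5 give returns 1.75, 1.5, and 1.0 for the first, second, and third steps.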
As shown in FIG. 2, the A3C network framework performs gradient-update training of the whole network from the policy network's π(s) and the estimation network's V(s). The update gradient of the policy network π(s) is:

∇_θ log π(a_t | s_t; θ) · (R_t − V(s_t; θ_v))

and the update gradient of the estimation network is:

∇_{θ_v} (R_t − V(s_t; θ_v))²
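The two update terms can be sketched as scalar per-step losses (plain Python; the actual network, optimizer, and any entropy regularization are outside the scope of this sketch):

```python
import math

def a3c_step_losses(log_prob, R, V):
    """Per-step A3C loss terms: the policy network is updated along
    grad log pi(a|s) scaled by the advantage (R - V), while the
    estimation (value) network minimizes the squared advantage."""
    advantage = R - V
    policy_loss = -log_prob * advantage  # minimizing this ascends the policy gradient
    value_loss = advantage ** 2          # (R - V)^2, the estimation-network term
    return policy_loss, value_loss
```

Minimizing `policy_loss` with a gradient optimizer reproduces the policy update direction above, and `value_loss` reproduces the estimation-network term.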
the pseudo code for a particular algorithm flow is shown in table 1 below.
TABLE 1
(3) Prize value design
The design of the reward value is an important part of the A3C algorithm, and designing it reasonably is one of the key tasks of the invention. The invention designs the reward value as follows.
For a sampling point of the UAV (i.e., the position of a point in the N×N scene after sampling, where the sampled scene comprises the threat matrix and the distance matrix), let d_i denote the distance from the UAV to the target and t_i the threat degree of the corresponding scene point. The normalized distance r_i between sampling point i and the target, where i = 1, 2, ..., N×N, is defined as:

r_i = d_i × exp(t_i) (5)

For the UAV's current sampling-point position i, a certain action is selected from the action space according to the environment information S within the detection range, giving the UAV's next sampling-point position i+1. The normalized distance r_i between the current position i and the target and the normalized distance r_{i+1} between the next position i+1 and the target are computed, and the reward value of selecting that action in the current state S is determined by comparing the two distances: the reward is positive when the action brings the UAV closer to the target (r_{i+1} < r_i) and negative otherwise.
Furthermore, the reward value should also be related to whether the target point is reached and to the length of the flight path required to reach it. The invention considers the unmanned aerial vehicle to have reached the target point when the distance between them is sufficiently small (in the invention, this threshold is taken as the 10th value of the distance matrix sorted in ascending order). When the unmanned aerial vehicle reaches the target point after a series of consecutive decisions, the reward value is 100; when the target point still has not been reached after the number of consecutive decisions exceeds a threshold t_max, the reward value is −100.
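The reward design described above can be sketched as follows. The terminal values ±100 and Eq. (5) follow the text; the per-step values of +1/−1 and all function names are assumptions for illustration:

```python
import numpy as np

def normalized_distance(d, t):
    """Eq. (5): r_i = d_i * exp(t_i), distance inflated by threat degree."""
    return d * np.exp(t)

def step_reward(r_curr, r_next, reached=False, steps=0, t_max=200):
    """Reward sketch: +100 on reaching the target, -100 after exceeding t_max
    decisions, otherwise +1/-1 (assumed magnitudes) for moving closer/farther."""
    if reached:
        return 100.0
    if steps > t_max:
        return -100.0
    return 1.0 if r_next < r_curr else -1.0
```

The threat term exp(t_i) makes a point that is both near and safe score better than one that is equally near but threatened, which is what lets the distance comparison alone drive the per-step reward.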
The invention focuses on real-time path prediction for unmanned aerial vehicles and provides a real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning, realizing real-time autonomous flight decisions for the unmanned aerial vehicle. The main innovations of the invention are as follows:
the invention provides an unmanned aerial vehicle real-time autonomous flight decision method based on an A3C model. And establishing a multidimensional detection information matrix by using threat information, position information and the like detected by the unmanned aerial vehicle sensor information, and then determining the next flight position of the unmanned aerial vehicle by using a trained A3C network.
Claims (5)
1. The real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning is characterized by comprising the following steps of:
step 101, acquiring a threat matrix in a current detection range of the unmanned aerial vehicle;
step 102, determining distances from a destination of the unmanned aerial vehicle to points in the current detection range, and taking the distances as a current distance matrix;
step 103, obtaining the current flight direction of the unmanned aerial vehicle and the position of the unmanned aerial vehicle to the next moment according to the threat matrix, the current distance matrix and the trained A3C network;
step 104, the unmanned aerial vehicle flies to the position of the next moment along the current flight direction; meanwhile, judging whether the position at the next moment reaches the destination;
step 105, if not, executing step 101;
the method for obtaining the current flight direction and the position of the unmanned aerial vehicle to the next moment comprises the following steps:
inputting the threat matrix and the current distance matrix into the trained A3C network, and predicting the probability of the unmanned aerial vehicle flying in each direction;
taking the direction with the highest probability as the current flight direction;
acquiring a position matrix of a current detection range corresponding to the current distance matrix;
according to the position of the unmanned aerial vehicle in the position matrix, skipping M points along the current flight direction, and taking the (M+1)th point as the position point of the next moment;
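The direction selection and M-point skip described in the steps above can be sketched as follows. The grid encoding of the four flight actions is an assumption; the claims do not specify it:

```python
import numpy as np

# Hypothetical encoding of the four flight actions as grid offsets:
# up, down, left, right (an assumption for illustration only).
DIRECTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

def next_position(probs, pos, M, grid_shape):
    """Pick the most probable direction, skip M points along it, and take
    the (M+1)th point as the next position, clamped to the grid."""
    d = int(np.argmax(probs))               # highest-probability direction
    dr, dc = DIRECTIONS[d]
    r = pos[0] + dr * (M + 1)
    c = pos[1] + dc * (M + 1)
    r = max(0, min(grid_shape[0] - 1, r))   # stay inside the detection grid
    c = max(0, min(grid_shape[1] - 1, c))
    return d, (r, c)
```

Skipping M intermediate points trades path granularity for decision frequency: larger M means fewer network evaluations per unit of flight distance.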
the method further comprises the steps of:
training an untrained A3C network through a plurality of scene samples, and updating A3C network parameters;
wherein, for one scene sample, training the untrained A3C network comprises:
discretizing a scene sample comprising an origin and a destination to obtain a scene matrix;
obtaining a scene distance matrix and a scene threat matrix according to the scene matrix;
the scene distance matrix is a matrix formed by the distances from each point to the destination in the scene matrix;
the scene threat matrix comprises threat coefficients of all points in the scene matrix;
acquiring a submatrix taking an originating point as a center in a scene distance matrix as a distance matrix, and acquiring a submatrix at a corresponding position in a scene threat matrix as a threat matrix;
inputting the two submatrices into the untrained A3C network to obtain the current flight direction, the position of the unmanned aerial vehicle at the next moment, the reward value and the value estimate, until the unmanned aerial vehicle reaches the destination or has failed to reach the destination after more than a preset number of attempts;
and updating the parameters of the A3C network according to the reward value and the value estimate.
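The centered submatrix extraction used in training above can be sketched as follows; the zero-padding at scene borders is an assumption, since the claims do not state how the window behaves near the edge of the scene matrix:

```python
import numpy as np

def centered_submatrix(scene, center, k):
    """Extract the (2k+1)x(2k+1) window of `scene` centered at `center`.

    Cells falling outside the scene are filled with zeros (assumed scheme).
    Applied once to the scene distance matrix and once, at the same
    position, to the scene threat matrix.
    """
    n = 2 * k + 1
    out = np.zeros((n, n), dtype=scene.dtype)
    r0, c0 = center[0] - k, center[1] - k   # top-left corner of the window
    for i in range(n):
        for j in range(n):
            r, c = r0 + i, c0 + j
            if 0 <= r < scene.shape[0] and 0 <= c < scene.shape[1]:
                out[i, j] = scene[r, c]
    return out
```

During an episode the window is re-centered on the unmanned aerial vehicle's current position after every decision, so the A3C network always sees a fixed-size local view of the scene.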
2. The method of claim 1, wherein step 105, namely executing step 101 if the destination is not reached, comprises:
if the destination is not reached, judging whether the current number of executions is greater than or equal to a preset threshold;
if yes, the unmanned aerial vehicle stops flying according to the prediction of the A3C network and returns; if not, executing step 101.
3. The method of claim 1, wherein the center point of the current detection range is at a location point of the drone.
4. The method of claim 1, wherein the threat matrix comprises threat coefficients for each location.
5. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program comprising instructions for performing the method according to any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010988055.3A CN112148008B (en) | 2020-09-18 | 2020-09-18 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148008A CN112148008A (en) | 2020-12-29 |
CN112148008B true CN112148008B (en) | 2023-05-02 |
Family
ID=73893992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010988055.3A Active CN112148008B (en) | 2020-09-18 | 2020-09-18 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148008B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113283827B (en) * | 2021-04-16 | 2024-03-12 | 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) | Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning |
CN113743605A (en) * | 2021-06-16 | 2021-12-03 | 温州大学 | Method for searching smoke and fire detection network architecture based on evolution method |
CN114089752A (en) * | 2021-11-11 | 2022-02-25 | 深圳市杉川机器人有限公司 | Autonomous exploration method for robot, and computer-readable storage medium |
CN114139791A (en) * | 2021-11-24 | 2022-03-04 | 北京华能新锐控制技术有限公司 | Wind generating set power prediction method, system, terminal and storage medium |
CN114355980B (en) * | 2022-01-06 | 2024-03-08 | 上海交通大学宁波人工智能研究院 | Four-rotor unmanned aerial vehicle autonomous navigation method and system based on deep reinforcement learning |
CN116148862B (en) * | 2023-01-16 | 2024-04-02 | 无锡市雷华科技有限公司 | Comprehensive early warning and evaluating method for bird detection radar flying birds |
CN116627181B (en) * | 2023-07-25 | 2023-10-13 | 吉林农业大学 | Intelligent obstacle avoidance method for plant protection unmanned aerial vehicle based on spatial reasoning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444786A (en) * | 2020-03-12 | 2020-07-24 | 五邑大学 | Crowd evacuation method, device and system based on unmanned aerial vehicle group and storage medium |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011009009A1 (en) * | 2009-07-15 | 2011-01-20 | Massachusetts Institute Of Technology | Methods and apparati for predicting and quantifying threat being experienced by a modeled system |
US9952591B2 (en) * | 2015-11-24 | 2018-04-24 | Northrop Grumman Systems Corporation | Spatial-temporal forecasting for predictive situational awareness |
CN106873628B (en) * | 2017-04-12 | 2019-09-20 | 北京理工大学 | A kind of collaboration paths planning method of multiple no-manned plane tracking multimachine moving-target |
CN108731684B (en) * | 2018-05-07 | 2021-08-03 | 西安电子科技大学 | Multi-unmanned aerial vehicle cooperative area monitoring airway planning method |
CN109254591B (en) * | 2018-09-17 | 2021-02-12 | 北京理工大学 | Dynamic track planning method based on Anytime restoration type sparse A and Kalman filtering |
CN109871031B (en) * | 2019-02-27 | 2022-02-22 | 中科院成都信息技术股份有限公司 | Trajectory planning method for fixed-wing unmanned aerial vehicle |
CN109933086B (en) * | 2019-03-14 | 2022-08-30 | 天津大学 | Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning |
CN109992000B (en) * | 2019-04-04 | 2020-07-03 | 北京航空航天大学 | Multi-unmanned aerial vehicle path collaborative planning method and device based on hierarchical reinforcement learning |
CN110866887A (en) * | 2019-11-04 | 2020-03-06 | 深圳市唯特视科技有限公司 | Target situation fusion sensing method and system based on multiple sensors |
CN110874578B (en) * | 2019-11-15 | 2023-06-20 | 北京航空航天大学青岛研究院 | Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112148008B (en) | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning | |
Yijing et al. | Q learning algorithm based UAV path learning and obstacle avoidence approach | |
Xia et al. | Neural inverse reinforcement learning in autonomous navigation | |
CN110514206B (en) | Unmanned aerial vehicle flight path prediction method based on deep learning | |
CN109597425B (en) | Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning | |
JP2022516383A (en) | Autonomous vehicle planning | |
Grigorescu et al. | Neurotrajectory: A neuroevolutionary approach to local state trajectory learning for autonomous vehicles | |
CN111142522A (en) | Intelligent agent control method for layered reinforcement learning | |
CN112650237A (en) | Ship path planning method and device based on clustering processing and artificial potential field | |
Xu et al. | A brief review of the intelligent algorithm for traveling salesman problem in UAV route planning | |
Yue et al. | A new searching approach using improved multi-ant colony scheme for multi-UAVs in unknown environments | |
Guizilini et al. | Dynamic hilbert maps: Real-time occupancy predictions in changing environments | |
Shkurti et al. | Model-based probabilistic pursuit via inverse reinforcement learning | |
Sonny et al. | Q-learning-based unmanned aerial vehicle path planning with dynamic obstacle avoidance | |
CN114386599A (en) | Method and device for training trajectory prediction model and trajectory planning | |
Wang et al. | Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning | |
CN116448134B (en) | Vehicle path planning method and device based on risk field and uncertain analysis | |
CN110779526B (en) | Path planning method, device and storage medium | |
Yang et al. | Learning graph-enhanced commander-executor for multi-agent navigation | |
Liu et al. | TD3 Based Collision Free Motion Planning for Robot Navigation | |
Zhang et al. | Enhancing Multi-UAV Reconnaissance and Search Through Double Critic DDPG With Belief Probability Maps | |
Ajani et al. | Dynamic path planning approaches based on artificial intelligence and machine learning | |
Thomas et al. | Inverse Reinforcement Learning for Generalized Labeled Multi-Bernoulli Multi-Target Tracking | |
Prathyusha et al. | Dynamic constraint based multi-route planning and multi-obstacle avoidance model for unmanned aerial vehicles | |
Saeed et al. | Domain-aware multiagent reinforcement learning in navigation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||