CN109933086B - Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning - Google Patents

Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Info

Publication number
CN109933086B
CN109933086B CN201910195250.8A CN201910195250A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
action
state
obstacle avoidance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910195250.8A
Other languages
Chinese (zh)
Other versions
CN109933086A (en)
Inventor
田栢苓
刘丽红
崔婕
宗群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910195250.8A priority Critical patent/CN109933086B/en
Publication of CN109933086A publication Critical patent/CN109933086A/en
Application granted granted Critical
Publication of CN109933086B publication Critical patent/CN109933086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the field of environment perception and autonomous obstacle avoidance for quad-rotor unmanned aerial vehicles and aims to reduce resource loss and cost. The technical scheme adopted is an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle, and the distances to obstacles and to the target point are obtained and used as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance. The method is mainly applied to unmanned aerial vehicle environment perception and autonomous obstacle avoidance control.

Description

Unmanned aerial vehicle environment sensing and autonomous obstacle avoidance method based on deep Q learning
Technical Field
The invention relates to the field of environment perception and autonomous obstacle avoidance for quad-rotor unmanned aerial vehicles, and in particular to intelligent path planning for unmanned aerial vehicles, more specifically to an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning.
Background
In recent years, unmanned aerial vehicles (UAVs) have gradually entered the public's field of view and have attracted great interest in commercial, agricultural, entertainment and even military applications. In the last decade, UAVs in China have developed from almost nothing into a flourishing industry. Data show that by 2018 the consumption of civil UAVs alone in China was close to the billion level and continuing to rise rapidly. The prosperity of the UAV market places higher demands on the safety and development of UAV control technology. At the present stage, China has not yet formed complete airspace management regulations for UAVs, UAVs are applied in many fields, and phenomena such as unauthorized "black flights" occur, so that safety hazards easily arise during flight, causing unnecessary property loss and casualties. Therefore, UAV perception and obstacle avoidance technology has become a subject of common attention among scholars at home and abroad. A UAV collision generally means that, during flight, the distance between the UAV and buildings, mountains, birds or other flying objects in its path falls below a safety threshold, or even that a direct collision occurs. Unlike piloted aircraft, a UAV cannot rely on a pilot to change its flight speed and heading during navigation in order to avoid obstacles. Therefore, the perception and obstacle avoidance devices in an unmanned system are an essential component of that system. At present, UAV perception and autonomous obstacle avoidance technology mainly comprises the following approaches:
1. Vision-based obstacle avoidance: this technology mainly uses images of the environment along the path ahead acquired by the UAV during flight, applies image processing to predict potential collisions, and plans the path in real time to achieve safe flight. The scheme relies heavily on mature image perception and processing technology and is easily affected by environmental factors such as weather and haze.
2. Obstacle avoidance based on distance detection: this technology covers a wide range of approaches; it mainly uses radar, ultrasonic and infrared sensing devices mounted on the UAV to measure the distance between the UAV and obstacles, and modifies the UAV's path on that basis to achieve obstacle avoidance. Its drawbacks are that ultrasonic and similar ranging techniques place high requirements on the reflecting surface of the object and are easily affected by environmental factors.
3. Obstacle avoidance based on an electronic map: this technology mainly uses an electronic map built into the UAV together with the UAV's GPS positioning to accurately determine the UAV's position and select a path. Its drawback is that it cannot handle emergencies such as unknown maps or moving obstacles in the airspace, so its robustness is poor.
4. Obstacle avoidance based on the artificial potential field method: this technology is mainly applied at the route-planning level; using the principle that like charges repel and opposite charges attract in an electric field, suitable charge attributes are assigned to the UAV, the obstacles and the target point, so that the UAV avoids the obstacles and reaches the specified target point.
5. An autonomous obstacle avoidance technology based on genetic algorithm, neural network, fuzzy control and the like: the technology is mainly applied to the path planning level of the unmanned aerial vehicle, and a nonlinear optimization model or a fuzzy controller is designed according to the detected information such as the distance, so that the flying speed and the course of the unmanned aerial vehicle are controlled.
According to the current state of research on UAV environment perception and autonomous obstacle avoidance, most UAV obstacle avoidance technologies adopt a scheme in which perception and path planning are separated; that is, the perception technology and the path planning technology act as two modules in the system, and obstacle avoidance is achieved through data transmission between them. The drawbacks of this scheme are: 1) data transmission between the two modules may be delayed, so that the safe path computed by the path planning algorithm lags behind, affecting the safe navigation of the UAV; 2) data may be lost or distorted in transmission, so that the path planning part loses reliable data support and cannot react to obstacles in time; 3) most path planning algorithms easily fall into local optimal solutions and have difficulty solving the path planning problem efficiently in complex flight environments; 4) distance-sensing technology is easily affected by environmental factors such as weather, and when weather conditions are severe or adverse interference occurs, accurate obstacle distance detection cannot be performed. In short, most conventional UAV autonomous obstacle avoidance schemes connect perception and path planning in series and require both the maturity of each technology and efficient data transmission between them; when affected by external disturbances or uncertainty, the algorithm may fail, and robustness is poor.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an environment perception and autonomous obstacle avoidance method for quad-rotor unmanned aerial vehicles based on the deep Q learning algorithm. On the one hand, existing autonomous obstacle avoidance path planning schemes easily fall into local optimal solutions, causing unnecessary resource loss and cost during the execution of unmanned aerial vehicle tasks; on the other hand, the operating environment of unmanned aerial vehicles is increasingly changeable and complex, and the various uncertainties during flight place higher demands on the real-time performance, robustness and safety of autonomous obstacle avoidance. The technical scheme adopted is an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle, and the distances to obstacles and to the target point are obtained and used as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance.
Specifically, through the drone's perception of the environment, the distance between the drone and the destination and the distances between the drone and the obstacles are acquired and used as the state information of the deep Q learning algorithm;
the neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted;
the action selection module is responsible for selecting the action executed by the drone: using a greedy algorithm, with probability ε the drone executes the optimal action, i.e. the action whose Q value is largest, and with probability 1 - ε an action is selected at random; after the drone receives the action information, it executes the corresponding action and reaches a new position;
the drone thus steps toward the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition.
The concrete steps are detailed as follows:
establishing a Markov model of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, and modeling a quintuple (s, a, r, p, gamma) of an MDP (Markov decision process) according to an action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle:
(1) A state set s: the position coordinates (x, y) and the heading angle θ of the drone in the flight scene determine the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone to the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (1)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (2) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where i = 1, ..., 16, j = 1, ..., n, and (obs_x_j, obs_y_j) are the coordinate positions of the n obstacles; "detected" indicates that a radar detection line of the drone has detected an obstacle. Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i, as follows:
(Equation (3) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) An action set a: the set of all possible actions the drone may take, given its position, after receiving the feedback value from the external environment. In the drone environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the drone is given, and the selectable action set is defined as
(Equation (5) is published as an image: it lists the selectable heading-change actions.)
The unmanned plane always flies forwards at a speed v, and a course angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of a flight path is realized;
(3) An immediate return function r: the instantaneous feedback obtained after the drone selects a certain action in a certain state, representing the reward for that state-action pair. Δdis is defined to measure, at time t, the distance the drone has traveled toward the target point since the previous time t - 1:
(Equation (6) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (7) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is:
(Equation (8) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (9) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle reaches the target point;
(4) the state transition probability function is used for describing the probability that the quad-rotor unmanned aerial vehicle selects a certain action to be transferred to the state of the next moment from the state of the current moment in a flight scene;
(5) the discount factor gamma is used for describing the 'degree of importance' of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle;
secondly, selecting a deep Q learning algorithm according to a modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance;
and thirdly, designing a complex flight scene of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, wherein the complex flight scene comprises constructing an unmanned aerial vehicle model, designing an unmanned aerial vehicle perception model for the surrounding environment, and then applying the first step and the second step to unmanned aerial vehicle control to realize unmanned aerial vehicle environment perception and autonomous obstacle avoidance.
The deep Q learning algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained after every certain number of steps during the process;
the training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
The invention has the characteristics and beneficial effects that:
in order to verify the effectiveness of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on the deep Q learning algorithm, the unmanned aerial vehicle autonomous obstacle avoidance virtual simulation system is designed, and simulation experiments are carried out on the system. In the virtual simulation environment, the following simulation parameters are set:
(1) Drone flight scene: the square flight range has side length l = 20 m, as shown in fig. 6. The ratio of the total area of all obstacles to the area of the square flight range is d = 0.01; obstacle radii are generated randomly and satisfy 0.1 m ≤ radius ≤ 0.3 m. In order to increase the complexity of the flight environment, the ratio of moving obstacles among all obstacles is r = 0.2, and the moving speed v_obs is generated randomly but satisfies -3.0 m/s ≤ v_obs ≤ 3.0 m/s; the refresh frequency of the flight scene is 30 Hz.
(2) Neural network parameters: the learning rate of the neural network's gradient-descent optimizer is 0.01. The neural network training model is shown in fig. 3 and comprises an input layer of 19 neurons, a hidden layer of 10 neurons and an output layer of 3 neurons; the activation functions of the input layer and the hidden layer are both rectified linear units (ReLU).
(3) Deep Q learning algorithm: the exploration rate ε is 0.9, the discount factor γ is 0.9, the storage capacity of the deep Q learning experience pool is 500, and the network is updated every 300 steps.
(4) Radar detector: a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone, 16 lines in total.
(5) Drone model: the flight speed v of the drone is 2.5 m/s, and the image rendering data come from the 3D printing model 3DBuilder; part of the data are shown in table 2.
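For reference, the 19-10-3 fitting network and gradient-descent optimizer described in (2) can be written down compactly. The sketch below uses PyTorch, which the patent does not specify; it only mirrors the stated layer sizes, ReLU activations and learning rate.
import torch.nn as nn
import torch.optim as optim

# 19 state inputs (Δx, Δy, θ and the 16 normalised radar distances),
# one hidden layer of 10 neurons, 3 outputs (one Q value per selectable action)
q_network = nn.Sequential(
    nn.Linear(19, 10),
    nn.ReLU(),
    nn.Linear(10, 3),
)
optimizer = optim.SGD(q_network.parameters(), lr=0.01)  # gradient descent, learning rate 0.01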
The unmanned aerial vehicle environment perception and autonomous obstacle avoidance method is developed based on a deep Q learning algorithm, and due to the fitting capability of deep learning and the decision-making capability of reinforcement learning, the method still has good robustness under the condition that the flight scene of the unmanned aerial vehicle is extremely complex. In order to further prove the effectiveness of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on the deep Q learning algorithm, simulation verification is carried out on a flight scene, wherein the position, the radius and the moving speed of an obstacle, the starting position of the unmanned aerial vehicle and the position of a target point are randomly set.
The flow chart of the drone's autonomous obstacle avoidance is shown in fig. 4. In each flight round the drone must fly toward the target point; when the drone reaches the destination, the position of the target point is updated and the drone continues tracking; when the drone collides with an obstacle, the positions of the drone and the target point are updated simultaneously; and, to improve efficiency, when the drone has neither reached the target point nor collided with an obstacle for a long time in a flight round, the positions of the drone and the target point are likewise updated simultaneously, as sketched below.
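The round-update logic of fig. 4 can be summarised in a few lines. This is a sketch only; MAX_STEPS and the dictionary keys are illustrative names introduced here, not values or identifiers from the patent.
MAX_STEPS = 2000  # illustrative timeout per flight round; the patent does not give a value

def round_transition(at_target, hit_obstacle, step_count):
    """Decide what to refresh at the end of a step, following fig. 4:
    reaching the destination refreshes only the target point, while a
    collision or a timeout refreshes both the drone and the target point."""
    if at_target:
        return {"refresh_target": True, "refresh_drone": False}
    if hit_obstacle or step_count >= MAX_STEPS:
        return {"refresh_target": True, "refresh_drone": True}
    return {"refresh_target": False, "refresh_drone": False}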
The simulation results are shown in fig. 5. In each flight round the loss function converges from a high value to a low one; after convergence the drone moves faster and reaches the target point quickly. Once the drone reaches the destination, the target point is immediately updated; the loss function then jumps to a higher value until the neural network converges again and the new destination is reached, and so on.
The simulated motion of the drone during obstacle avoidance is shown in fig. 6. As the upper and lower groups of images show, the drone can safely reach the destination point in a complex environment. The results show that the autonomous obstacle avoidance algorithm can complete obstacle avoidance flight from the starting point to the target point in a complex flight scene.
Table 2 Drone model 3D printing data (part)
(The table content is published as an image in the original document.)
In the designed complex flight scene, the proposed environment perception and autonomous obstacle avoidance algorithm based on deep Q learning was used to carry out obstacle avoidance tests under different obstacle distributions; the following performance aspects are analyzed from different angles in combination with the test results to further clarify the effectiveness of the algorithm.
(1) Robustness analysis: the method of erecting radar detection lines in the range from -45 degrees to 45 degrees ahead of the drone's heading eliminates the influence of factors such as weather and climate and can effectively detect obstacles, flight boundaries and other information ahead of the drone, providing reliable information for autonomous obstacle avoidance; meanwhile, the adopted deep Q learning algorithm can make optimal decisions based on the Q values of the drone's different flight states and provide obstacle avoidance commands for the drone. In summary, the drone has strong robustness against different flight scenes, climates, weather and other influencing factors during obstacle avoidance flight.
(2) Real-time analysis: in the proposed algorithm, the path information ahead detected by the radar is used directly as the decision basis, and the optimal obstacle avoidance command for the drone is generated directly through the deep neural network and the Q learning algorithm.
(3) Safety analysis: as can be seen from fig. 6, the proposed algorithm can accurately and effectively identify obstacles in the flight scene and make optimal action decisions, so that collisions between the drone and obstacles or the motion boundary are avoided and the safety of the drone flying in a complex scene is ensured.
In conclusion, the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm based on deep learning, which is proposed by the research, has high applicability to the obstacle avoidance problem of the unmanned aerial vehicle in the complex flight scene.
Description of the drawings:
Figure 1 is a structure diagram of the quad-rotor unmanned aerial vehicle environment perception and autonomous obstacle avoidance system.
FIG. 2 is a block diagram of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm design idea.
FIG. 3 is a schematic diagram of a neural network training model.
Fig. 4 is a flow chart of autonomous obstacle avoidance of the unmanned aerial vehicle.
Figure 5 shows a graph of the loss function of a neural network.
FIG. 6 is a schematic diagram of an environment sensing and autonomous obstacle avoidance simulation process.
Detailed Description
In order to overcome the poor robustness of traditional autonomous obstacle avoidance algorithms for unmanned aerial vehicles, this research uses deep reinforcement learning, a branch of artificial intelligence currently attracting wide attention, to establish a mapping between the distances sensed from the drone to obstacles and the drone's obstacle avoidance strategy, and proposes a quad-rotor drone environment perception and obstacle avoidance method based on the deep Q learning algorithm. The method uses the radar detector in front of the drone to detect the flight environment within a certain range ahead, so that the influence of factors such as climate and distance is avoided to the greatest extent and the robustness of the algorithm is improved; meanwhile, using the detection information as raw data, a deep Q learning network directly generates the drone's obstacle avoidance strategy, which markedly improves the real-time performance of obstacle avoidance; in addition, during training of the obstacle avoidance strategy based on deep Q learning, each state-action pair of the drone can be effectively fitted to its corresponding Q value, and the strategy generated with the greedy algorithm effectively guarantees flight safety. The deep Q learning based perception and obstacle avoidance strategy is used for path planning of the drone in complex environments, has important theoretical significance for research on autonomous obstacle avoidance of drones, and has high strategic value.
Aiming at the defects of the traditional drone autonomous obstacle avoidance scheme based on separate environment perception and path planning, the invention provides a drone autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the drone, and the distances to obstacles and to the target point are obtained and used as the drone's current state; secondly, during training, a neural network is used to fit the Q value corresponding to each state-action pair of the drone; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the drone in each specific state, thereby realizing autonomous obstacle avoidance.
Therefore, the unmanned aerial vehicle environment sensing and autonomous obstacle avoidance method based on the deep Q learning algorithm is a closed-loop intelligent real-time control scheme, and is high in safety and rapidness; the method can solve the problem of autonomous obstacle avoidance of the quad-rotor unmanned aerial vehicle in a complex scene, and has strong robustness; the scheme has high effectiveness and reliability, is beneficial to improving the autonomous decision-making capability of the unmanned aerial vehicle in the task execution process, and can be applied to various civil and military fields; the intelligent path planning scheme can be applied to autonomous obstacle avoidance of an actual unmanned aerial vehicle, action instructions are generated rapidly on line, and safe obstacle avoidance flight is achieved.
The invention discloses a method for sensing the environment of a quad-rotor unmanned aerial vehicle and automatically avoiding obstacles based on deep Q learning, which is mainly researched by combining a control theory method and a virtual simulation technology, and a simulation experiment is carried out in a python2.7 environment, so that the effectiveness of the method is verified.
Firstly, establishing a Markov model of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. And modeling the quintuple (s, a, r, p, gamma) of the Markov Decision Process (MDP) according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle.
(1) A set of states s. The position coordinates (x, y) of the drone in the flight scene and the heading angle θ are defined to represent the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone to the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (1)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (2) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where (obs_x_j, obs_y_j) (j = 1, ..., n) are the coordinate positions of the n obstacles and "detected" indicates that a radar detection line of the drone has detected an obstacle (as shown in module 1 of fig. 2). Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i (i = 1, ..., 16), as follows:
(Equation (3) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) The set of actions a. The action set refers to a set of all possible actions taken by the unmanned aerial vehicle according to the position of the unmanned aerial vehicle after receiving the feedback value of the external environment. In the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the unmanned aerial vehicle is given, and the selectable action set is defined as
(Equation (5) is published as an image: it lists the selectable heading-change actions.)
Namely, the unmanned plane always flies forwards at the speed v, and the heading angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of the flight path is realized.
(3) The immediate return function r. The immediate return function refers to the instantaneous feedback obtained after the drone selects a certain action in a certain state, representing the reward for that state-action pair. Δdis is defined to measure, at time t, the distance the drone has traveled toward the target point since the previous time t - 1:
(Equation (6) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (7) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is:
(Equation (8) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (9) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle arrives at the target point.
(4) The state transition probability function p. In this work, the state transition probability function is used to describe the probability that the quad-rotor drone, in a flight scene, transitions from the state at the current moment to the state at the next moment when a certain action is selected.
The flight environment considered here is complex, so the problem is modeled as a Markov process with unknown state transition probability p. In reinforcement learning, effective solution algorithms exist both for the model-based case, in which the state transition probability is known, and for the model-free case, in which it is not. The deep Q learning algorithm is a model-free reinforcement learning algorithm and can therefore effectively solve the problem when p is unknown.
(5) The discount factor gamma. The discount factor is used for describing the attention degree of the flight decision at the current moment to the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle.
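Because equations (1)-(9) appear only as images in the published text, the following sketch merely illustrates the structure of the state and immediate-return computation defined above. The normalisation by the 4 m range, the treatment of rays that detect nothing, and the weights w1-w3 in the safe-flight case are assumptions for illustration, not the patent's exact formulas; the fixed reward of 15 and penalty of -20 are the values stated later in the description.
import numpy as np

RADAR_RANGE = 4.0   # length of each radar detection line (m)

def build_state(x, y, theta, x_g, y_g, ray_distances):
    """State s = [Δx, Δy, θ, norm_dis_1..16] of eqs. (1), (3) and (4).
    ray_distances: 16 detected distances; a ray that detects nothing is
    assumed here to report the full 4 m range."""
    dx, dy = x - x_g, y - y_g
    norm_dis = np.clip(np.asarray(ray_distances, dtype=float), 0.0, RADAR_RANGE) / RADAR_RANGE
    return np.concatenate(([dx, dy, theta], norm_dis))

def immediate_return(hit, at_target, delta_dis, delta_theta, norm_dis_8,
                     w1=1.0, w2=0.5, w3=1.0):
    """Immediate return r of eq. (9): fixed values for the collision and
    arrival cases, a weighted combination in the safe-flight case
    (the weights w1, w2, w3 are assumed, not taken from the patent)."""
    if hit:                      # collision with an obstacle
        return -20.0
    if at_target:                # target point reached
        return 15.0
    # progress toward the target is rewarded, heading error is penalised, and
    # (norm_dis_8 - 1) <= 0 penalises an obstacle detected by the forward ray
    return w1 * delta_dis - w2 * abs(delta_theta) + w3 * (norm_dis_8 - 1.0)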
And secondly, selecting a deep Q learning algorithm according to the modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance. The algorithm flow is determined as shown in table 1:
table 1: unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm
(The algorithm pseudocode of Table 1 is published as an image in the original document.)
The algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained at regular intervals during the process.
The training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next-moment state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
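The training step just described can be sketched as one gradient-descent update. PyTorch is assumed here (the patent specifies only the layer sizes, the 0.01 learning rate and the discount factor γ = 0.9), and the standard Q-learning target, i.e. the feedback value plus γ times the maximum next-state Q value, is assumed for the back-propagated squared error.
import random
import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.9
q_net = nn.Sequential(nn.Linear(19, 10), nn.ReLU(), nn.Linear(10, 3))
optimizer = torch.optim.SGD(q_net.parameters(), lr=0.01)

def train_step(experience_pool, batch_size=32):
    """One update on a randomly extracted batch of
    (state, action, feedback, next_state) experience fragments."""
    batch = random.sample(experience_pool, min(batch_size, len(experience_pool)))
    states, actions, feedbacks, next_states = zip(*batch)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    r = torch.as_tensor(feedbacks, dtype=torch.float32)
    s_next = torch.as_tensor(np.array(next_states), dtype=torch.float32)

    q_current = q_net(s).gather(1, a).squeeze(1)                 # Q(s, a) of the chosen actions
    with torch.no_grad():
        q_target = r + GAMMA * q_net(s_next).max(dim=1).values  # feedback + γ · max_a' Q(s', a')
    loss = ((q_target - q_current) ** 2).mean()                  # squared error, propagated backwards

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())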
And thirdly, setting an environment for unmanned aerial vehicle environment sensing and autonomous obstacle avoidance. In the process of unmanned aerial vehicle environment perception and autonomous obstacle avoidance, the unmanned aerial vehicle as an intelligent agent needs to continuously interact with the environment with obstacles around to obtain enough data, and then enough information can be collected as a basis for decision making. Meanwhile, the unmanned aerial vehicle is used as a controlled object, and a model of the unmanned aerial vehicle is also an indispensable part in the simulation verification process.
The drone flight environment is assumed to be a square range in which cylinders of varying sizes are distributed as obstacles, and a green marker represents the destination of the drone's flight. The quad-rotor drone model is obtained from 3D printing data; inputting the 3D printing data into the environment Director reproduces the quad-rotor drone model.
Based on the above three steps, the drone can detect obstacles with its radar detection device in a complex motion scene, achieve autonomous obstacle avoidance, and reach the destination.
The structure diagram of the environment sensing and autonomous obstacle avoidance system of the quad-rotor unmanned aerial vehicle is shown in fig. 1. By acquiring state information such as obstacles and target points in the flight environment and selecting the optimal action in the current state, the quad-rotor unmanned aerial vehicle can be controlled, and the target requirement for reaching the destination is met. The fitting of the Q value is a core link of the algorithm, and only through the accurate fitting of the Q value, the appropriate action can be selected for the unmanned aerial vehicle to complete the set flight task. If the fitting part of the Q value does not exist, the unmanned aerial vehicle cannot obtain a flight instruction, and cannot complete a flight task in a complex environment.
Fig. 2 is a block diagram of the design idea of the drone environment perception and autonomous obstacle avoidance algorithm provided by the invention. The state detection module is responsible for acquiring information: through the drone's perception of the environment, the distance between the drone and the destination and the distances between the drone and the obstacles are acquired and used as the state information of the deep Q learning algorithm. The neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted. The action selection module is responsible for selecting the action executed by the drone: using a greedy algorithm, among the Q values corresponding to the current state it selects the action with the largest Q value with probability ε (0 < ε < 1) and selects an action at random with probability 1 - ε. The action execution module is responsible for executing the specific action: after the drone receives the action information, it executes the corresponding action and reaches a new position. The drone thus gradually reaches the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition.
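A single pass through the four modules of fig. 2 can be sketched as below. The forward speed (2.5 m/s) and the 30 Hz refresh rate are the simulation values given earlier; the three heading increments are an assumption made here for illustration, since the action set itself is published only as an image.
import numpy as np

V = 2.5                     # forward flight speed of the drone (m/s)
DT = 1.0 / 30.0             # scene refresh period (30 Hz)
HEADING_STEP = np.pi / 12   # assumed heading increment per turning action

def control_cycle(x, y, theta, q_values):
    """One 'state acquisition - Q fitting - action selection - action execution' step.
    q_values are the three Q values fitted by the network for the current state."""
    action = int(np.argmax(q_values))       # greedy choice among the fitted Q values
    theta += (action - 1) * HEADING_STEP    # actions 0/1/2 shift the heading angle by -Δ, 0, +Δ (assumed)
    x += V * np.cos(theta) * DT             # constant forward speed v, decomposed
    y += V * np.sin(theta) * DT             # into the x and y velocity components
    return x, y, theta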
Firstly, modeling an unmanned aerial vehicle environment perception and Markov process of an autonomous obstacle avoidance algorithm. And modeling the quintuple (s, a, r, p, gamma) of the Markov Decision Process (MDP) according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle.
(1) And a state set s, wherein the state set refers to a state quantity capable of determining the current flight information of the unmanned aerial vehicle.
The current position (x, y) of the drone in the flight scene and the heading angle θ are defined to represent the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone from the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (10)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (11) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where (obs_x_j, obs_y_j) (j = 1, ..., n) are the coordinate positions of the n obstacles and "detected" indicates that a radar detection line of the drone has detected an obstacle (as shown in module 1 of fig. 2). Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i (i = 1, ..., 16), as follows:
(Equation (12) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (13)
This state information represents both the distance between the drone's current flight position and the destination and the distances between the drone and the obstacles in the flight scene, on which basis it is decided whether an obstacle avoidance operation is needed.
(2) Action set a, the action set refers to a set of all actions that the unmanned aerial vehicle may take for the position where the unmanned aerial vehicle is located after receiving the feedback value of the external environment.
In the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the unmanned aerial vehicle is given, and the selectable action set is defined as
(Equation (14) is published as an image: it lists the selectable heading-change actions.)
Namely, the unmanned plane always flies forwards at the speed v, and the heading angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of the flight path is realized. Therefore, before the unmanned aerial vehicle reaches the terminal, the unmanned aerial vehicle always moves along the track under the action of the heading angle theta at the speed v, and the track of the unmanned aerial vehicle changes along with the change of the heading angle until the unmanned aerial vehicle reaches the destination.
(3) And reporting a function r immediately, wherein the function for reporting immediately refers to instantaneous feedback obtained after the unmanned aerial vehicle selects a certain action in a certain state, and represents a reward for a certain state-action pair.
The state-action pairs during the drone's flight mainly fall into three cases: reaching the target point, colliding with an obstacle, and safe flight. The immediate return function must be designed reasonably for each case. The target-reached and obstacle-collision cases are simple, and their immediate returns are defined as a reward value of 15 and a penalty value of -20, respectively; the safe flight case is more complex and must comprehensively consider the distance traveled by the drone since the previous moment, the angle difference toward the target point, and the distance between the drone and the obstacle.
Δdis is defined to measure the distance traveled toward the target point from the previous state to the current state at time t:
(Equation (15) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (16) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is.
(Equation (17) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (18) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle arrives at the target point.
(4) The state transition probability function p. In this work, the state transition probability function is used to describe the probability that the quad-rotor drone, in a flight scene, transitions from the state at the current moment to the state at the next moment when a certain action is selected.
The flight environment considered here is complex, so the problem is modeled as a Markov process with unknown state transition probability p. In reinforcement learning, effective solution algorithms exist both for the model-based case, in which the state transition probability is known, and for the model-free case, in which it is not. The deep Q learning algorithm is a model-free reinforcement learning algorithm and can therefore effectively solve the problem when p is unknown.
(5) And the discount factor gamma is used for describing the attention degree of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle.
During the drone's environment perception and autonomous obstacle avoidance flight, in order for the drone to avoid obstacles intelligently, the cumulative return from the current state to the future terminal state,
R_t = r_t + γ·r_(t+1) + γ²·r_(t+2) + … = Σ_(k≥0) γ^k·r_(t+k),
must be maximized.
When this cumulative return function is maximized, the drone can find the optimal path. Here γ expresses how much "importance" the drone, in the current state s_t, attaches to future rewards: γ = 1 means the drone is fully "far-sighted" and treats current and future immediate returns equally; γ = 0 means the drone is extremely short-sighted and considers only the current immediate return while ignoring future effects.
And secondly, building a depth Q learning algorithm of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. In order to enable the neural network to accurately fit the Q value of each state-action pair, the neural network is trained by using a deep Q learning algorithm, and the purpose is to adjust the weight and deviation in each neural network layer by using a gradient descent algorithm.
Meanwhile, in the process of fitting the Q value with the neural network, the flight command in each state is selected using the deep Q learning algorithm. In the selection of flight actions, to prevent the algorithm from falling into a local optimal solution, the trade-off between exploitation and exploration of the flight scene must be considered: a greedy algorithm is adopted, in which the drone exploits the collected flight-scene data with probability ε (0 < ε < 1) and explores the flight scene with probability 1 - ε.
Finally, the deep Q learning algorithm for drone environment perception and autonomous obstacle avoidance is shown in Table 2.
Table 2: unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm
(The algorithm pseudocode of Table 2 is published as an image in the original document.)
The algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained after every certain number of steps during the process.
The training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next-moment state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
And thirdly, designing a complex flight scene of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. And a complex flight scene is built to carry out experimental verification on the effectiveness of the unmanned aerial vehicle autonomous obstacle avoidance algorithm. In the process of sensing and obstacle avoidance of the unmanned aerial vehicle, the unmanned aerial vehicle needs to continuously interact with a flight scene, collect data as much as possible as a decision basis, and can fully train the neural network and make the most correct decision behavior in the process of obstacle avoidance. Meanwhile, the unmanned aerial vehicle is used as a controlled object, and a model of the unmanned aerial vehicle is also an indispensable part in the simulation verification process.
The unmanned aerial vehicle flight scene is assumed to be in a square flight range, and cylinders with different sizes are distributed in the boundary to serve as obstacles. To enhance the complexity of the flight scenario, the destination for the drone flight is generated randomly within each flight round. Meanwhile, the positions and the radii of all the obstacles in the boundary and the moving speed of the obstacles are randomly generated, and an algorithm for setting the obstacles in the flight scene of the unmanned aerial vehicle is shown in table 3
TABLE 3 unmanned aerial vehicle flight scene setting algorithm
(The scene-setting algorithm of Table 3 is published as an image in the original document.)
The algorithm flow is as follows: first, the total obstacle area in the flight environment and the ratio of moving obstacles to all obstacles are determined; secondly, the radius and position of each obstacle are generated randomly (both within the allowable range), and, taking the moving-obstacle ratio as the probability, its moving speed is either set to 0 or generated randomly (within the allowable range); finally, obstacles are placed in the flight environment according to their radius, position and moving speed until the accumulated obstacle area reaches the specified total area, as sketched below.
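Using the simulation parameters listed earlier (l = 20 m, d = 0.01, radii between 0.1 m and 0.3 m, moving-obstacle ratio r = 0.2, speeds between -3.0 m/s and 3.0 m/s), the scene-setting algorithm of Table 3 can be sketched as follows; treating the obstacle velocity as two independent components is an assumption made here.
import math
import random

L_SIDE = 20.0              # side length of the square flight range (m)
D_AREA = 0.01              # ratio of total obstacle area to flight-range area
R_MIN, R_MAX = 0.1, 0.3    # allowed obstacle radii (m)
MOVING_RATIO = 0.2         # share of obstacles that move
V_MAX = 3.0                # maximum obstacle speed magnitude (m/s)

def generate_obstacles():
    """Randomly place cylindrical obstacles until their total area reaches d·l²."""
    target_area, area, obstacles = D_AREA * L_SIDE ** 2, 0.0, []
    while area < target_area:
        radius = random.uniform(R_MIN, R_MAX)
        x = random.uniform(radius, L_SIDE - radius)   # keep the obstacle inside the boundary
        y = random.uniform(radius, L_SIDE - radius)
        if random.random() < MOVING_RATIO:            # moving obstacle with probability r
            vx = random.uniform(-V_MAX, V_MAX)
            vy = random.uniform(-V_MAX, V_MAX)
        else:                                         # otherwise a static obstacle
            vx, vy = 0.0, 0.0
        obstacles.append({"x": x, "y": y, "radius": radius, "vx": vx, "vy": vy})
        area += math.pi * radius ** 2
    return obstacles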
Meanwhile, the quad-rotor drone model in the flight scene is obtained from 3D printing data; inputting the 3D printing data into the open-source environment Director reproduces the quad-rotor drone's flight scene.
Based on the above three steps, the drone can detect obstacles with its radar detection device in a complex flight scene, achieve autonomous obstacle avoidance, and reach the destination.

Claims (2)

1. An unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning, characterized in that, first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle to obtain the distances to obstacles and to the target point as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance of the unmanned aerial vehicle;
specifically, through perception of the environment by the unmanned aerial vehicle, the distance between the unmanned aerial vehicle and the destination and the distances between the unmanned aerial vehicle and the obstacles are acquired and used as the state information of the deep Q learning algorithm;
the neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted;
the action selection module is responsible for selecting the action executed by the unmanned aerial vehicle: using a greedy algorithm, with probability ε the unmanned aerial vehicle executes the optimal action, i.e. the action whose Q value is largest, and with probability 1 - ε an action is selected at random; after the unmanned aerial vehicle receives the action information, it executes the corresponding action and reaches a new position;
the unmanned aerial vehicle gradually reaches the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition;
the concrete steps are detailed as follows:
firstly, establishing a Markov model of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm: according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled, where p is the state transition probability function:
(1) the state set s: the position coordinates (x, y) of the unmanned aerial vehicle in the flight scene and the heading angle θ represent the determined position of the unmanned aerial vehicle, and (x_g, y_g) represents the destination of the flight mission; the distance from the unmanned aerial vehicle to the destination is defined as follows:
Δx = x − x_g, Δy = y − y_g    (1)
in order to survey the environment along the path ahead of the unmanned aerial vehicle, radar detection lines of length 4 m are erected every 5 degrees over the range from −45 degrees to +45 degrees in front of the unmanned aerial vehicle, 16 lines in total; the detection distance of each radar detection line is defined as follows:
[Equation (2), the definition of the detection distance dis_i of each radar detection line, is rendered as an image in the original.]
where i = 1, …, 16 and j = 1, …, n; (obs_x_j, obs_y_j) denote the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the unmanned aerial vehicle has detected an obstacle; for convenience of data processing, the detection distance dis_i (i = 1, …, 16) of each radar detection line is simultaneously normalized to norm_dis_i, as follows:
[Equation (3), the normalization of dis_i to norm_dis_i, is rendered as an image in the original.]
finally, the state of the unmanned aerial vehicle is determined as
s = [Δx, Δy, θ, norm_dis_i]    (4)
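A minimal Python sketch of how the state vector of equation (4) could be assembled is given below. Because equations (2) and (3) are only available as images, the ray-casting detection and the dis_i / 4 normalization are assumptions, as are the helper names (radar_distances, build_state) and the obstacle dictionary layout from the earlier scene sketch.

```python
import math
import numpy as np

N_LINES, MAX_RANGE = 16, 4.0

def radar_distances(x, y, theta, obstacles):
    """Cast 16 detection lines of length 4 m spread over [-45 deg, +45 deg]
    around the heading (the patent spaces them every 5 deg; linspace is an
    approximation) and return the normalized hit distance of each line,
    1.0 when nothing is detected."""
    norm_dis = np.ones(N_LINES)
    angles = np.linspace(-math.pi / 4, math.pi / 4, N_LINES)
    for i, da in enumerate(angles):
        a = theta + da
        dx, dy = math.cos(a), math.sin(a)
        hit = MAX_RANGE
        for obs in obstacles:                      # obs = {"x", "y", "r", ...}
            ox, oy = obs["x"] - x, obs["y"] - y
            proj = ox * dx + oy * dy               # projection of the obstacle centre on the ray
            if proj <= 0:
                continue
            perp2 = ox * ox + oy * oy - proj * proj
            if perp2 > obs["r"] ** 2:
                continue                           # ray misses this cylinder
            d = proj - math.sqrt(obs["r"] ** 2 - perp2)
            if 0 <= d < hit:
                hit = d                            # nearest intersection so far
        norm_dis[i] = hit / MAX_RANGE              # assumed normalization dis_i / 4 (eq. 3)
    return norm_dis

def build_state(x, y, theta, x_g, y_g, obstacles):
    """State of equation (4): [dx, dy, theta, norm_dis_1..16]."""
    dx, dy = x - x_g, y - y_g                      # equation (1)
    return np.concatenate(([dx, dy, theta], radar_distances(x, y, theta, obstacles)))
```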
(2) the action set a refers to the set of all actions that the unmanned aerial vehicle can possibly take, according to its position, after receiving the feedback value from the external environment; the motion speed v of the unmanned aerial vehicle is given in the environment perception and autonomous obstacle avoidance algorithm, and the selectable action set is defined as
[Equation (5), the selectable action set, is rendered as an image in the original.]
The unmanned aerial vehicle always flies forward at the speed v, and the heading angle θ is changed by selecting different actions, thereby changing the velocity components in the x and y directions and realizing the planning of the flight path;
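Since the action set of equation (5) is only available as an image, the sketch below assumes a small set of discrete heading-angle increments; the specific increments, the speed V, the time step DT and the function name step are illustrative, not the patent's actual values.

```python
import math

V = 1.0                                   # constant forward speed v (assumed)
DT = 0.1                                  # control period (assumed)
# discrete heading-angle changes per step; the exact set of equation (5)
# is not reproduced here, so these increments are illustrative
ACTIONS = [-math.pi / 6, -math.pi / 12, 0.0, math.pi / 12, math.pi / 6]

def step(x, y, theta, action_index):
    """Apply one action: change the heading angle, then advance at speed v,
    which changes the velocity components in the x and y directions."""
    theta = theta + ACTIONS[action_index]
    x += V * math.cos(theta) * DT
    y += V * math.sin(theta) * DT
    return x, y, theta
```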
(3) the immediate return function r refers to the instantaneous feedback obtained after the unmanned aerial vehicle selects a certain action in a certain state, and represents the reward for a certain state-action pair; Δdis is defined to measure, at time t, how much farther the unmanned aerial vehicle has travelled towards the target point in the current state compared with the previous time t − 1:
[Equation (6), the definition of Δdis, is rendered as an image in the original.]
Δθ is used to measure the difference between the current heading angle of the unmanned aerial vehicle and the angle from the unmanned aerial vehicle towards the target point:
[Equation (7), the definition of Δθ, is rendered as an image in the original.]
(norm_dis_8 − 1) indicates whether the 8th radar detection line, ahead of the heading of the unmanned aerial vehicle, has detected an obstacle, and the distance to the detected obstacle:
[Equation (8), the definition of the (norm_dis_8 − 1) term, is rendered as an image in the original.]
wherein radius represents the radius of the obstacle, and the immediate return function is defined as follows
[Equation (9), the immediate return function r, is rendered as an image in the original.]
where hit indicates that the unmanned aerial vehicle has collided with an obstacle, and at_target indicates that the unmanned aerial vehicle has reached the target point;
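Equations (6)-(9) are only available as images, so the following reward sketch is purely illustrative: it combines progress towards the target (Δdis), heading error (Δθ), the clearance term (norm_dis_8 − 1) and terminal rewards for hit / at_target. The weights, terminal values and functional form are assumptions, not the patent's actual definition.

```python
import math

def immediate_reward(state, prev_state, hit, at_target,
                     w_dis=1.0, w_theta=0.5, w_obs=0.5,
                     r_hit=-10.0, r_goal=10.0):
    """Illustrative shaping of the immediate return r: terminal rewards for
    collision / arrival, otherwise a weighted sum of the three terms above.
    state = [dx, dy, theta, norm_dis_1..16] as built earlier."""
    if hit:
        return r_hit
    if at_target:
        return r_goal
    dx, dy, theta = state[0], state[1], state[2]
    pdx, pdy = prev_state[0], prev_state[1]
    delta_dis = math.hypot(pdx, pdy) - math.hypot(dx, dy)   # progress at time t vs t-1 (eq. 6)
    angle_to_goal = math.atan2(-dy, -dx)                    # direction from the UAV towards the target
    delta_theta = abs(angle_to_goal - theta)                # heading error (eq. 7)
    obstacle_term = state[3 + 7] - 1.0                      # norm_dis_8 - 1 (eq. 8), 0 when the line is clear
    return w_dis * delta_dis - w_theta * delta_theta + w_obs * obstacle_term
```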
(4) the state transition probability function p is used to describe the probability that, in the flight scene, the quad-rotor unmanned aerial vehicle transitions from the state at the current moment to the state at the next moment by selecting a certain action;
(5) the discount factor gamma is used for describing the 'degree of importance' of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle;
secondly, selecting a deep Q learning algorithm according to a modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance;
and thirdly, designing a complex flight scene for the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, which includes constructing the unmanned aerial vehicle model and designing the unmanned aerial vehicle's perception model of the surrounding environment, and then applying the first and second steps to unmanned aerial vehicle control to realize unmanned aerial vehicle environment perception and autonomous obstacle avoidance.
2. The unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning according to claim 1, characterized in that the deep Q learning algorithm flow is as follows: first, the state of the unmanned aerial vehicle and the parameters of the neural network are initialized randomly; second, among the Q values fitted by the neural network for the current state, the action with the largest Q value is selected with probability ε, and an action is selected at random with probability 1 − ε; after the action is completed, a feedback value is obtained and a new state is reached, and the experience segment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the unmanned aerial vehicle reaches the destination, the neural network being trained after every certain number of steps during the process;
the training process of the neural network is as follows: first, the neural network randomly extracts experience segments from the experience pool and, according to the next-moment state of each experience segment, selects the action that maximizes its Q value; second, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize the back-propagated error, the parameters of the neural network are adjusted using a gradient descent algorithm.
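A compact sketch of the claimed training flow, written in Python with PyTorch, is given below. The network architecture, hyper-parameters, replay-buffer handling and the env wrapper (reset()/step()) are assumptions; the ε-greedy convention follows the claim (optimal action with probability ε, random action with probability 1 − ε), so ε is set large by default.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Fits Q(s, a) for every action given the 19-dimensional state
    [dx, dy, theta, norm_dis_1..16]; layer sizes are assumptions."""
    def __init__(self, state_dim=19, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def train_episode(env, qnet, optimizer, replay, epsilon=0.9, gamma=0.95,
                  batch_size=32, train_every=10):
    """One flight round: select actions, store (s, a, r, s', done) in the
    experience pool, and every train_every steps minimise the squared
    TD error by gradient descent on a random mini-batch."""
    s, done, t = env.reset(), False, 0
    while not done:
        if random.random() < epsilon:                     # optimal action with probability epsilon
            a = int(torch.argmax(qnet(torch.tensor(s, dtype=torch.float32))))
        else:                                             # random action with probability 1 - epsilon
            a = random.randrange(qnet.net[-1].out_features)
        s_next, r, done = env.step(a)
        replay.append((s, a, r, s_next, done))
        s, t = s_next, t + 1
        if t % train_every == 0 and len(replay) >= batch_size:
            batch = random.sample(list(replay), batch_size)
            sb, ab, rb, sn, db = map(np.array, zip(*batch))
            sb = torch.tensor(sb, dtype=torch.float32)
            sn = torch.tensor(sn, dtype=torch.float32)
            q = qnet(sb).gather(1, torch.tensor(ab).long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():                         # bootstrapped target from the next state
                target = torch.tensor(rb, dtype=torch.float32) + \
                         gamma * qnet(sn).max(1).values * \
                         (1 - torch.tensor(db, dtype=torch.float32))
            loss = ((target - q) ** 2).mean()             # squared back-propagated error
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# typical setup (assumed): replay = deque(maxlen=10000); qnet = QNet();
# optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
```

Drawing mini-batches at random from the experience pool decorrelates consecutive flight steps, which is the usual motivation for the experience-replay step described in the claim.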
CN201910195250.8A 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning Active CN109933086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Publications (2)

Publication Number Publication Date
CN109933086A CN109933086A (en) 2019-06-25
CN109933086B true CN109933086B (en) 2022-08-30

Family

ID=66987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195250.8A Active CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN109933086B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488859B (en) * 2019-07-15 2020-08-21 北京航空航天大学 Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
CN110488861B (en) * 2019-07-30 2020-08-28 北京邮电大学 Unmanned aerial vehicle track optimization method and device based on deep reinforcement learning and unmanned aerial vehicle
CN110378439B (en) * 2019-08-09 2021-03-30 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110806756B (en) * 2019-09-10 2022-08-02 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110596734B (en) * 2019-09-17 2020-12-01 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110554707B (en) * 2019-10-17 2022-09-30 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110879610B (en) * 2019-10-24 2021-08-13 北京航空航天大学 Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle
CN112764423A (en) * 2019-11-05 2021-05-07 上海为彪汽配制造有限公司 Method and system for constructing flight path of multi-rotor unmanned aerial vehicle
CN110703766B (en) * 2019-11-07 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN112937564B (en) * 2019-11-27 2022-09-02 魔门塔(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN111123963B (en) * 2019-12-19 2021-06-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN110968102B (en) * 2019-12-27 2022-08-26 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111260658B (en) * 2020-01-10 2023-10-17 厦门大学 Deep reinforcement learning method for image segmentation
CN111473794B (en) * 2020-04-01 2022-02-11 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
CN111667513B (en) * 2020-06-01 2022-02-18 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008B (en) * 2020-09-18 2023-05-02 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device
CN112947562B (en) * 2021-02-10 2021-11-30 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113110547B (en) * 2021-04-21 2022-06-07 吉林大学 Flight control method, device and equipment of miniature aviation aircraft
CN113298368B (en) * 2021-05-14 2023-11-10 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN114371720B (en) * 2021-12-29 2023-09-29 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for realizing tracking target of unmanned aerial vehicle
CN114578834B (en) * 2022-05-09 2022-07-26 北京大学 Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816B (en) * 2022-11-24 2023-03-14 东南大学 Bionic vision multi-source information intelligent perception unmanned platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109348707A (en) * 2016-04-27 2019-02-15 纽拉拉股份有限公司 For the method and apparatus of the Q study trimming experience memory based on deep neural network
CN106970648B (en) * 2017-04-19 2019-05-14 北京航空航天大学 Unmanned plane multi-goal path plans combined method for searching under the environment of city low latitude

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107065890A (en) * 2017-06-02 2017-08-18 北京航空航天大学 A kind of unmanned vehicle intelligent barrier avoiding method and system
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN108388270A (en) * 2018-03-21 2018-08-10 天津大学 Cluster unmanned plane track posture cooperative control method towards security domain
CN109032168A (en) * 2018-05-07 2018-12-18 西安电子科技大学 A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN
CN109443366A (en) * 2018-12-20 2019-03-08 北京航空航天大学 A kind of unmanned aerial vehicle group paths planning method based on improvement Q learning algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic cart control method based on deep Q-value network; Wang Liqun et al.; Electronic Measurement Technology; 2017-11-30; Vol. 40, No. 11; pp. 226-229 *
Elevator traffic modeling and application based on Markov network queueing theory; Zong Qun et al.; Journal of Tianjin University; 2005-01-31; Vol. 38, No. 1; pp. 9-13 *
Wang Liqun et al. Automatic cart control method based on deep Q-value network. Electronic Measurement Technology. 2017, Vol. 40, No. 11 *
Research on deep reinforcement learning for intelligent obstacle avoidance scenarios; Liu Qingjie et al.; Intelligent Internet of Things Technology; 2018-09-30; Vol. 1, No. 2; pp. 18-22 *

Also Published As

Publication number Publication date
CN109933086A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109933086B (en) Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning
CN110673637B (en) Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN105892489B (en) A kind of automatic obstacle avoiding UAV system and control method based on Multi-sensor Fusion
CN108897312B (en) Method for planning continuous monitoring path of multiple unmanned aerial vehicles to large-scale environment
CN112684807A (en) Unmanned aerial vehicle cluster three-dimensional formation method
CN109521794A (en) A kind of multiple no-manned plane routeing and dynamic obstacle avoidance method
CN111950873B (en) Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN106094569A (en) Multi-sensor Fusion unmanned plane perception with evade analogue system and emulation mode thereof
CN110362083A (en) It is a kind of based on multiple target tracking prediction space-time map under autonomous navigation method
CN109358638A (en) Unmanned plane vision barrier-avoiding method based on distributed maps
CN107065929A (en) A kind of unmanned plane is around flying method and system
CN112378397B (en) Unmanned aerial vehicle target tracking method and device and unmanned aerial vehicle
US20210325891A1 (en) Graph construction and execution ml techniques
CN111665508B (en) Helicopter-mounted terrain following and avoiding visual navigation system and navigation method
CN112596071A (en) Unmanned aerial vehicle autonomous positioning method and device and unmanned aerial vehicle
CN112379681A (en) Unmanned aerial vehicle obstacle avoidance flight method and device and unmanned aerial vehicle
Lawrance et al. Long endurance autonomous flight for unmanned aerial vehicles
CN110793522B (en) Flight path planning method based on ant colony algorithm
Fragoso et al. Dynamically feasible motion planning for micro air vehicles using an egocylinder
CN112380933B (en) Unmanned aerial vehicle target recognition method and device and unmanned aerial vehicle
Zhao et al. Autonomous exploration method for fast unknown environment mapping by using UAV equipped with limited FOV sensor
Yang et al. Optimization of dynamic obstacle avoidance path of multirotor UAV based on ant colony algorithm
Fei et al. Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments
Kamat et al. A survey on autonomous navigation techniques
Shen et al. Research on real-time flight path planning of UAV based on grey prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant