CN109933086B - Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning - Google Patents

Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Info

Publication number
CN109933086B
CN109933086B CN201910195250.8A CN201910195250A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
action
state
obstacle avoidance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910195250.8A
Other languages
Chinese (zh)
Other versions
CN109933086A (en)
Inventor
田栢苓
刘丽红
崔婕
宗群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910195250.8A priority Critical patent/CN109933086B/en
Publication of CN109933086A publication Critical patent/CN109933086A/en
Application granted granted Critical
Publication of CN109933086B publication Critical patent/CN109933086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the field of environment perception and autonomous obstacle avoidance for quad-rotor unmanned aerial vehicles and aims to reduce resource loss and cost. The technical scheme adopted is an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle, and the distances to obstacles and to the target point are obtained and used as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance. The method is mainly applied to unmanned aerial vehicle environment perception and autonomous obstacle avoidance control.

Description

Unmanned aerial vehicle environment sensing and autonomous obstacle avoidance method based on deep Q learning
Technical Field
The invention relates to the field of environment perception and autonomous obstacle avoidance for quad-rotor unmanned aerial vehicles, and in particular to intelligent path planning for unmanned aerial vehicles, more specifically to an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning.
Background
In recent years, unmanned aerial vehicles (UAVs) have gradually entered the public's field of view and have attracted great interest in commercial, agricultural, entertainment and even military applications. In the last decade, UAVs in China have developed from almost nothing into a flourishing industry. Data show that by 2018 the consumption of civil UAVs alone in China was close to the billion level and continuing to rise rapidly. The prosperity of the UAV market places higher demands on the safety and development of UAV control technology. At the present stage, China has not yet formed complete airspace management regulations for UAVs, UAVs are applied in many fields, and phenomena such as unauthorized "black flights" occur, so that safety hazards easily arise during flight, causing unnecessary property loss and casualties. Therefore, UAV perception and obstacle avoidance technology has become a subject of common attention among scholars at home and abroad. A UAV collision generally means that, during flight, the distance between the UAV and buildings, mountains, birds or other flying objects in its path falls below a safety threshold, or even that a direct collision occurs. Unlike piloted aircraft, a UAV cannot rely on a pilot to change its flight speed and heading during navigation in order to avoid obstacles. Therefore, the perception and obstacle avoidance devices in an unmanned system are an essential component of that system. At present, UAV perception and autonomous obstacle avoidance technology mainly comprises the following approaches:
1. Vision-based obstacle avoidance: this technology mainly uses images of the environment along the path ahead acquired by the UAV during flight, applies image processing to predict potential collisions, and plans the path in real time to achieve safe flight. The scheme relies heavily on mature image perception and processing technology and is easily affected by environmental factors such as weather and haze.
2. Obstacle avoidance based on distance detection: this technology covers a wide range of approaches; it mainly uses radar, ultrasonic and infrared sensing devices mounted on the UAV to measure the distance between the UAV and obstacles, and modifies the UAV's path on that basis to achieve obstacle avoidance. Its drawbacks are that ultrasonic and similar ranging techniques place high requirements on the reflecting surface of the object and are easily affected by environmental factors.
3. Obstacle avoidance based on an electronic map: this technology mainly uses an electronic map built into the UAV together with the UAV's GPS positioning to accurately determine the UAV's position and select a path. Its drawback is that it cannot handle emergencies such as unknown maps or moving obstacles in the airspace, so its robustness is poor.
4. Obstacle avoidance based on the artificial potential field method: this technology is mainly applied at the route-planning level; using the principle that like charges repel and opposite charges attract in an electric field, suitable charge attributes are assigned to the UAV, the obstacles and the target point, so that the UAV avoids the obstacles and reaches the specified target point.
5. An autonomous obstacle avoidance technology based on genetic algorithm, neural network, fuzzy control and the like: the technology is mainly applied to the path planning level of the unmanned aerial vehicle, and a nonlinear optimization model or a fuzzy controller is designed according to the detected information such as the distance, so that the flying speed and the course of the unmanned aerial vehicle are controlled.
According to the current state of research on UAV environment perception and autonomous obstacle avoidance, most UAV obstacle avoidance technologies adopt a scheme in which perception and path planning are separated; that is, the perception technology and the path planning technology act as two modules in the system, and obstacle avoidance is achieved through data transmission between them. The drawbacks of this scheme are: 1) data transmission between the two modules may be delayed, so that the safe path computed by the path planning algorithm lags behind, affecting the safe navigation of the UAV; 2) data may be lost or distorted in transmission, so that the path planning part loses reliable data support and cannot react to obstacles in time; 3) most path planning algorithms easily fall into local optimal solutions and have difficulty solving the path planning problem efficiently in complex flight environments; 4) distance-sensing technology is easily affected by environmental factors such as weather, and when weather conditions are severe or adverse interference occurs, accurate obstacle distance detection cannot be performed. In short, most conventional UAV autonomous obstacle avoidance schemes connect perception and path planning in series and require both the maturity of each technology and efficient data transmission between them; when affected by external disturbances or uncertainty, the algorithm may fail, and robustness is poor.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an environment perception and autonomous obstacle avoidance method for quad-rotor unmanned aerial vehicles based on the deep Q learning algorithm. On the one hand, existing autonomous obstacle avoidance path planning schemes easily fall into local optimal solutions, causing unnecessary resource loss and cost during the execution of unmanned aerial vehicle tasks; on the other hand, the operating environment of unmanned aerial vehicles is increasingly changeable and complex, and the various uncertainties during flight place higher demands on the real-time performance, robustness and safety of autonomous obstacle avoidance. The technical scheme adopted is an unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle, and the distances to obstacles and to the target point are obtained and used as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance.
Specifically, through the drone's perception of the environment, the distance between the drone and the destination and the distances between the drone and the obstacles are acquired and used as the state information of the deep Q learning algorithm;
the neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted;
the action selection module is responsible for selecting the action executed by the drone: using a greedy algorithm, with probability ε the drone executes the optimal action, i.e. the action whose Q value is largest, and with probability 1 - ε an action is selected at random; after the drone receives the action information, it executes the corresponding action and reaches a new position;
the drone thus steps toward the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition.
The concrete steps are detailed as follows:
establishing a Markov model of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, and modeling a quintuple (s, a, r, p, gamma) of an MDP (Markov decision process) according to an action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle:
(1) A state set s: the position coordinates (x, y) and the heading angle θ of the drone in the flight scene determine the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone to the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (1)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (2) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where i = 1, ..., 16, j = 1, ..., n, and (obs_x_j, obs_y_j) are the coordinate positions of the n obstacles; "detected" indicates that a radar detection line of the drone has detected an obstacle. Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i, as follows:
(Equation (3) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) An action set a: the set of all possible actions the drone may take, given its position, after receiving the feedback value from the external environment. In the drone environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the drone is given, and the selectable action set is defined as
(Equation (5) is published as an image: it lists the selectable heading-change actions.)
The unmanned plane always flies forwards at a speed v, and a course angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of a flight path is realized;
(3) An immediate return function r: the instantaneous feedback obtained after the drone selects a certain action in a certain state, representing the reward for that state-action pair. Δdis is defined to measure, at time t, the distance the drone has traveled toward the target point since the previous time t - 1:
(Equation (6) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (7) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is:
(Equation (8) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (9) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle reaches the target point;
(4) the state transition probability function is used for describing the probability that the quad-rotor unmanned aerial vehicle selects a certain action to be transferred to the state of the next moment from the state of the current moment in a flight scene;
(5) the discount factor gamma is used for describing the 'degree of importance' of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle;
secondly, selecting a deep Q learning algorithm according to a modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance;
and thirdly, designing a complex flight scene of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, wherein the complex flight scene comprises constructing an unmanned aerial vehicle model, designing an unmanned aerial vehicle perception model for the surrounding environment, and then applying the first step and the second step to unmanned aerial vehicle control to realize unmanned aerial vehicle environment perception and autonomous obstacle avoidance.
The deep Q learning algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained after every certain number of steps during the process;
the training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
The invention has the characteristics and beneficial effects that:
in order to verify the effectiveness of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on the deep Q learning algorithm, the unmanned aerial vehicle autonomous obstacle avoidance virtual simulation system is designed, and simulation experiments are carried out on the system. In the virtual simulation environment, the following simulation parameters are set:
(1) Drone flight scene: the square flight range has side length l = 20 m, as shown in fig. 6. The ratio of the total area of all obstacles to the area of the square flight range is d = 0.01; obstacle radii are generated randomly and satisfy 0.1 m ≤ radius ≤ 0.3 m. In order to increase the complexity of the flight environment, the ratio of moving obstacles among all obstacles is r = 0.2, and the moving speed v_obs is generated randomly but satisfies -3.0 m/s ≤ v_obs ≤ 3.0 m/s; the refresh frequency of the flight scene is 30 Hz.
(2) Neural network parameters: the learning rate of the neural network's gradient-descent optimizer is 0.01. The neural network training model is shown in fig. 3 and comprises an input layer of 19 neurons, a hidden layer of 10 neurons and an output layer of 3 neurons; the activation functions of the input layer and the hidden layer are both rectified linear units (ReLU).
(3) Deep Q learning algorithm: the exploration rate ε is 0.9, the discount factor γ is 0.9, the storage capacity of the deep Q learning experience pool is 500, and the network is updated every 300 steps.
(4) Radar detector: a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone, 16 lines in total.
(5) Drone model: the flight speed v of the drone is 2.5 m/s, and the image rendering data come from the 3D printing model 3DBuilder; part of the data are shown in table 2.
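For reference, the 19-10-3 fitting network and gradient-descent optimizer described in (2) can be written down compactly. The sketch below uses PyTorch, which the patent does not specify; it only mirrors the stated layer sizes, ReLU activations and learning rate.
import torch.nn as nn
import torch.optim as optim

# 19 state inputs (Δx, Δy, θ and the 16 normalised radar distances),
# one hidden layer of 10 neurons, 3 outputs (one Q value per selectable action)
q_network = nn.Sequential(
    nn.Linear(19, 10),
    nn.ReLU(),
    nn.Linear(10, 3),
)
optimizer = optim.SGD(q_network.parameters(), lr=0.01)  # gradient descent, learning rate 0.01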
The unmanned aerial vehicle environment perception and autonomous obstacle avoidance method is developed based on a deep Q learning algorithm, and due to the fitting capability of deep learning and the decision-making capability of reinforcement learning, the method still has good robustness under the condition that the flight scene of the unmanned aerial vehicle is extremely complex. In order to further prove the effectiveness of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on the deep Q learning algorithm, simulation verification is carried out on a flight scene, wherein the position, the radius and the moving speed of an obstacle, the starting position of the unmanned aerial vehicle and the position of a target point are randomly set.
The flow chart of the drone's autonomous obstacle avoidance is shown in fig. 4. In each flight round the drone must fly toward the target point; when the drone reaches the destination, the position of the target point is updated and the drone continues tracking; when the drone collides with an obstacle, the positions of the drone and the target point are updated simultaneously; and, to improve efficiency, when the drone has neither reached the target point nor collided with an obstacle for a long time in a flight round, the positions of the drone and the target point are likewise updated simultaneously, as sketched below.
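The round-update logic of fig. 4 can be summarised in a few lines. This is a sketch only; MAX_STEPS and the dictionary keys are illustrative names introduced here, not values or identifiers from the patent.
MAX_STEPS = 2000  # illustrative timeout per flight round; the patent does not give a value

def round_transition(at_target, hit_obstacle, step_count):
    """Decide what to refresh at the end of a step, following fig. 4:
    reaching the destination refreshes only the target point, while a
    collision or a timeout refreshes both the drone and the target point."""
    if at_target:
        return {"refresh_target": True, "refresh_drone": False}
    if hit_obstacle or step_count >= MAX_STEPS:
        return {"refresh_target": True, "refresh_drone": True}
    return {"refresh_target": False, "refresh_drone": False}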
The simulation results are shown in fig. 5. In each flight round the loss function converges from a high value to a low one; after convergence the drone moves faster and reaches the target point quickly. Once the drone reaches the destination, the target point is immediately updated; the loss function then jumps to a higher value until the neural network converges again and the new destination is reached, and so on.
The simulated motion of the drone during obstacle avoidance is shown in fig. 6. As the upper and lower groups of images show, the drone can safely reach the destination point in a complex environment. The results show that the autonomous obstacle avoidance algorithm can complete obstacle avoidance flight from the starting point to the target point in a complex flight scene.
Table 2 Drone model 3D printing data (part)
(The table content is published as an image in the original document.)
In the designed complex flight scene, the proposed environment perception and autonomous obstacle avoidance algorithm based on deep Q learning was used to carry out obstacle avoidance tests under different obstacle distributions; the following performance aspects are analyzed from different angles in combination with the test results to further clarify the effectiveness of the algorithm.
(1) Robustness analysis: the method of erecting radar detection lines in the range from -45 degrees to 45 degrees ahead of the drone's heading eliminates the influence of factors such as weather and climate and can effectively detect obstacles, flight boundaries and other information ahead of the drone, providing reliable information for autonomous obstacle avoidance; meanwhile, the adopted deep Q learning algorithm can make optimal decisions based on the Q values of the drone's different flight states and provide obstacle avoidance commands for the drone. In summary, the drone has strong robustness against different flight scenes, climates, weather and other influencing factors during obstacle avoidance flight.
(2) Real-time analysis: in the proposed algorithm, the path information ahead detected by the radar is used directly as the decision basis, and the optimal obstacle avoidance command for the drone is generated directly through the deep neural network and the Q learning algorithm.
(3) Safety analysis: as can be seen from fig. 6, the proposed algorithm can accurately and effectively identify obstacles in the flight scene and make optimal action decisions, so that collisions between the drone and obstacles or the motion boundary are avoided and the safety of the drone flying in a complex scene is ensured.
In conclusion, the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm based on deep learning, which is proposed by the research, has high applicability to the obstacle avoidance problem of the unmanned aerial vehicle in the complex flight scene.
Description of the drawings:
Figure 1 is a structure diagram of the quad-rotor unmanned aerial vehicle environment perception and autonomous obstacle avoidance system.
FIG. 2 is a block diagram of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm design idea.
FIG. 3 is a schematic diagram of a neural network training model.
Fig. 4 is a flow chart of autonomous obstacle avoidance of the unmanned aerial vehicle.
Figure 5 shows a graph of the loss function of a neural network.
FIG. 6 is a schematic diagram of an environment sensing and autonomous obstacle avoidance simulation process.
Detailed Description
In order to overcome the poor robustness of traditional autonomous obstacle avoidance algorithms for unmanned aerial vehicles, this research uses deep reinforcement learning, a branch of artificial intelligence currently attracting wide attention, to establish a mapping between the distances sensed from the drone to obstacles and the drone's obstacle avoidance strategy, and proposes a quad-rotor drone environment perception and obstacle avoidance method based on the deep Q learning algorithm. The method uses the radar detector in front of the drone to detect the flight environment within a certain range ahead, so that the influence of factors such as climate and distance is avoided to the greatest extent and the robustness of the algorithm is improved; meanwhile, using the detection information as raw data, a deep Q learning network directly generates the drone's obstacle avoidance strategy, which markedly improves the real-time performance of obstacle avoidance; in addition, during training of the obstacle avoidance strategy based on deep Q learning, each state-action pair of the drone can be effectively fitted to its corresponding Q value, and the strategy generated with the greedy algorithm effectively guarantees flight safety. The deep Q learning based perception and obstacle avoidance strategy is used for path planning of the drone in complex environments, has important theoretical significance for research on autonomous obstacle avoidance of drones, and has high strategic value.
Aiming at the defects of the traditional drone autonomous obstacle avoidance scheme based on separate environment perception and path planning, the invention provides a drone autonomous obstacle avoidance method based on deep Q learning: first, a radar detects the path within a certain distance ahead of the drone, and the distances to obstacles and to the target point are obtained and used as the drone's current state; secondly, during training, a neural network is used to fit the Q value corresponding to each state-action pair of the drone; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the drone in each specific state, thereby realizing autonomous obstacle avoidance.
Therefore, the unmanned aerial vehicle environment sensing and autonomous obstacle avoidance method based on the deep Q learning algorithm is a closed-loop intelligent real-time control scheme, and is high in safety and rapidness; the method can solve the problem of autonomous obstacle avoidance of the quad-rotor unmanned aerial vehicle in a complex scene, and has strong robustness; the scheme has high effectiveness and reliability, is beneficial to improving the autonomous decision-making capability of the unmanned aerial vehicle in the task execution process, and can be applied to various civil and military fields; the intelligent path planning scheme can be applied to autonomous obstacle avoidance of an actual unmanned aerial vehicle, action instructions are generated rapidly on line, and safe obstacle avoidance flight is achieved.
The invention discloses a method for sensing the environment of a quad-rotor unmanned aerial vehicle and automatically avoiding obstacles based on deep Q learning, which is mainly researched by combining a control theory method and a virtual simulation technology, and a simulation experiment is carried out in a python2.7 environment, so that the effectiveness of the method is verified.
Firstly, establishing a Markov model of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. And modeling the quintuple (s, a, r, p, gamma) of the Markov Decision Process (MDP) according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle.
(1) A set of states s. The position coordinates (x, y) of the drone in the flight scene and the heading angle θ are defined to represent the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone to the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (1)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (2) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where (obs_x_j, obs_y_j) (j = 1, ..., n) are the coordinate positions of the n obstacles and "detected" indicates that a radar detection line of the drone has detected an obstacle (as shown in module 1 of fig. 2). Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i (i = 1, ..., 16), as follows:
(Equation (3) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) The set of actions a. The action set refers to a set of all possible actions taken by the unmanned aerial vehicle according to the position of the unmanned aerial vehicle after receiving the feedback value of the external environment. In the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the unmanned aerial vehicle is given, and the selectable action set is defined as
(Equation (5) is published as an image: it lists the selectable heading-change actions.)
Namely, the unmanned plane always flies forwards at the speed v, and the heading angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of the flight path is realized.
(3) The immediate return function r. The immediate return function refers to the instantaneous feedback obtained after the drone selects a certain action in a certain state, representing the reward for that state-action pair. Δdis is defined to measure, at time t, the distance the drone has traveled toward the target point since the previous time t - 1:
(Equation (6) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (7) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is:
(Equation (8) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (9) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle arrives at the target point.
(4) The state transition probability function p. In this work, the state transition probability function is used to describe the probability that the quad-rotor drone, in a flight scene, transitions from the state at the current moment to the state at the next moment when a certain action is selected.
The flight environment considered here is complex, so the problem is modeled as a Markov process with unknown state transition probability p. In reinforcement learning, effective solution algorithms exist both for the model-based case, in which the state transition probability is known, and for the model-free case, in which it is not. The deep Q learning algorithm is a model-free reinforcement learning algorithm and can therefore effectively solve the problem when p is unknown.
(5) The discount factor gamma. The discount factor is used for describing the attention degree of the flight decision at the current moment to the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle.
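Because equations (1)-(9) appear only as images in the published text, the following sketch merely illustrates the structure of the state and immediate-return computation defined above. The normalisation by the 4 m range, the treatment of rays that detect nothing, and the weights w1-w3 in the safe-flight case are assumptions for illustration, not the patent's exact formulas; the fixed reward of 15 and penalty of -20 are the values stated later in the description.
import numpy as np

RADAR_RANGE = 4.0   # length of each radar detection line (m)

def build_state(x, y, theta, x_g, y_g, ray_distances):
    """State s = [Δx, Δy, θ, norm_dis_1..16] of eqs. (1), (3) and (4).
    ray_distances: 16 detected distances; a ray that detects nothing is
    assumed here to report the full 4 m range."""
    dx, dy = x - x_g, y - y_g
    norm_dis = np.clip(np.asarray(ray_distances, dtype=float), 0.0, RADAR_RANGE) / RADAR_RANGE
    return np.concatenate(([dx, dy, theta], norm_dis))

def immediate_return(hit, at_target, delta_dis, delta_theta, norm_dis_8,
                     w1=1.0, w2=0.5, w3=1.0):
    """Immediate return r of eq. (9): fixed values for the collision and
    arrival cases, a weighted combination in the safe-flight case
    (the weights w1, w2, w3 are assumed, not taken from the patent)."""
    if hit:                      # collision with an obstacle
        return -20.0
    if at_target:                # target point reached
        return 15.0
    # progress toward the target is rewarded, heading error is penalised, and
    # (norm_dis_8 - 1) <= 0 penalises an obstacle detected by the forward ray
    return w1 * delta_dis - w2 * abs(delta_theta) + w3 * (norm_dis_8 - 1.0)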
And secondly, selecting a deep Q learning algorithm according to the modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance. The algorithm flow is determined as shown in table 1:
table 1: unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm
(The algorithm pseudocode of Table 1 is published as an image in the original document.)
The algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained at regular intervals during the process.
The training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next-moment state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
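The training step just described can be sketched as one gradient-descent update. PyTorch is assumed here (the patent specifies only the layer sizes, the 0.01 learning rate and the discount factor γ = 0.9), and the standard Q-learning target, i.e. the feedback value plus γ times the maximum next-state Q value, is assumed for the back-propagated squared error.
import random
import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.9
q_net = nn.Sequential(nn.Linear(19, 10), nn.ReLU(), nn.Linear(10, 3))
optimizer = torch.optim.SGD(q_net.parameters(), lr=0.01)

def train_step(experience_pool, batch_size=32):
    """One update on a randomly extracted batch of
    (state, action, feedback, next_state) experience fragments."""
    batch = random.sample(experience_pool, min(batch_size, len(experience_pool)))
    states, actions, feedbacks, next_states = zip(*batch)
    s = torch.as_tensor(np.array(states), dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    r = torch.as_tensor(feedbacks, dtype=torch.float32)
    s_next = torch.as_tensor(np.array(next_states), dtype=torch.float32)

    q_current = q_net(s).gather(1, a).squeeze(1)                 # Q(s, a) of the chosen actions
    with torch.no_grad():
        q_target = r + GAMMA * q_net(s_next).max(dim=1).values  # feedback + γ · max_a' Q(s', a')
    loss = ((q_target - q_current) ** 2).mean()                  # squared error, propagated backwards

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())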
And thirdly, setting an environment for unmanned aerial vehicle environment sensing and autonomous obstacle avoidance. In the process of unmanned aerial vehicle environment perception and autonomous obstacle avoidance, the unmanned aerial vehicle as an intelligent agent needs to continuously interact with the environment with obstacles around to obtain enough data, and then enough information can be collected as a basis for decision making. Meanwhile, the unmanned aerial vehicle is used as a controlled object, and a model of the unmanned aerial vehicle is also an indispensable part in the simulation verification process.
The drone flight environment is assumed to be a square range in which cylinders of varying sizes are distributed as obstacles, and a green marker represents the destination of the drone's flight. The quad-rotor drone model is obtained from 3D printing data; inputting the 3D printing data into the environment Director reproduces the quad-rotor drone model.
Based on the above three steps, the drone can detect obstacles with its radar detection device in a complex motion scene, achieve autonomous obstacle avoidance, and reach the destination.
The structure diagram of the environment sensing and autonomous obstacle avoidance system of the quad-rotor unmanned aerial vehicle is shown in fig. 1. By acquiring state information such as obstacles and target points in the flight environment and selecting the optimal action in the current state, the quad-rotor unmanned aerial vehicle can be controlled, and the target requirement for reaching the destination is met. The fitting of the Q value is a core link of the algorithm, and only through the accurate fitting of the Q value, the appropriate action can be selected for the unmanned aerial vehicle to complete the set flight task. If the fitting part of the Q value does not exist, the unmanned aerial vehicle cannot obtain a flight instruction, and cannot complete a flight task in a complex environment.
Fig. 2 is a block diagram of the design idea of the drone environment perception and autonomous obstacle avoidance algorithm provided by the invention. The state detection module is responsible for acquiring information: through the drone's perception of the environment, the distance between the drone and the destination and the distances between the drone and the obstacles are acquired and used as the state information of the deep Q learning algorithm. The neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted. The action selection module is responsible for selecting the action executed by the drone: using a greedy algorithm, among the Q values corresponding to the current state it selects the action with the largest Q value with probability ε (0 < ε < 1) and selects an action at random with probability 1 - ε. The action execution module is responsible for executing the specific action: after the drone receives the action information, it executes the corresponding action and reaches a new position. The drone thus gradually reaches the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition.
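A single pass through the four modules of fig. 2 can be sketched as below. The forward speed (2.5 m/s) and the 30 Hz refresh rate are the simulation values given earlier; the three heading increments are an assumption made here for illustration, since the action set itself is published only as an image.
import numpy as np

V = 2.5                     # forward flight speed of the drone (m/s)
DT = 1.0 / 30.0             # scene refresh period (30 Hz)
HEADING_STEP = np.pi / 12   # assumed heading increment per turning action

def control_cycle(x, y, theta, q_values):
    """One 'state acquisition - Q fitting - action selection - action execution' step.
    q_values are the three Q values fitted by the network for the current state."""
    action = int(np.argmax(q_values))       # greedy choice among the fitted Q values
    theta += (action - 1) * HEADING_STEP    # actions 0/1/2 shift the heading angle by -Δ, 0, +Δ (assumed)
    x += V * np.cos(theta) * DT             # constant forward speed v, decomposed
    y += V * np.sin(theta) * DT             # into the x and y velocity components
    return x, y, theta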
Firstly, modeling an unmanned aerial vehicle environment perception and Markov process of an autonomous obstacle avoidance algorithm. And modeling the quintuple (s, a, r, p, gamma) of the Markov Decision Process (MDP) according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle.
(1) And a state set s, wherein the state set refers to a state quantity capable of determining the current flight information of the unmanned aerial vehicle.
The current position (x, y) of the drone in the flight scene and the heading angle θ are defined to represent the drone's position, and (x_g, y_g) denotes the destination of the drone's flight mission; the distance of the drone from the destination is defined as follows:
Δx = x - x_g, Δy = y - y_g (10)
In order to sense the environment along the path ahead of the drone, a radar detection line of length 4 m is erected every 5 degrees in the sector from -45 degrees to 45 degrees ahead of the drone's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
(Equation (11) is published as an image: it defines the detection distance dis_i of the i-th radar detection line.)
where (obs_x_j, obs_y_j) (j = 1, ..., n) are the coordinate positions of the n obstacles and "detected" indicates that a radar detection line of the drone has detected an obstacle (as shown in module 1 of fig. 2). Meanwhile, to facilitate data processing, the distance dis_i (i = 1, ..., 16) detected by each radar detection line of the drone is normalized to norm_dis_i (i = 1, ..., 16), as follows:
(Equation (12) is published as an image: it defines the normalization of dis_i to norm_dis_i.)
Finally, the state of the drone is determined as
s = [Δx, Δy, θ, norm_dis_i] (13)
This state information represents both the distance between the drone's current flight position and the destination and the distances between the drone and the obstacles in the flight scene, on which basis it is decided whether an obstacle avoidance operation is needed.
(2) Action set a, the action set refers to a set of all actions that the unmanned aerial vehicle may take for the position where the unmanned aerial vehicle is located after receiving the feedback value of the external environment.
In the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, the motion speed v of the unmanned aerial vehicle is given, and the selectable action set is defined as
(Equation (14) is published as an image: it lists the selectable heading-change actions.)
Namely, the unmanned plane always flies forwards at the speed v, and the heading angle theta is changed by selecting different actions, so that the speed components in the x and y directions are changed, and the planning of the flight path is realized. Therefore, before the unmanned aerial vehicle reaches the terminal, the unmanned aerial vehicle always moves along the track under the action of the heading angle theta at the speed v, and the track of the unmanned aerial vehicle changes along with the change of the heading angle until the unmanned aerial vehicle reaches the destination.
(3) And reporting a function r immediately, wherein the function for reporting immediately refers to instantaneous feedback obtained after the unmanned aerial vehicle selects a certain action in a certain state, and represents a reward for a certain state-action pair.
The state-action pairs during the drone's flight mainly fall into three cases: reaching the target point, colliding with an obstacle, and safe flight. The immediate return function must be designed reasonably for each case. The target-reached and obstacle-collision cases are simple, and their immediate returns are defined as a reward value of 15 and a penalty value of -20, respectively; the safe flight case is more complex and must comprehensively consider the distance traveled by the drone since the previous moment, the angle difference toward the target point, and the distance between the drone and the obstacle.
Δdis is defined to measure the distance traveled toward the target point from the previous state to the current state at time t:
(Equation (15) is published as an image: it defines Δdis.)
Δθ is used to measure the difference between the drone's current heading angle and the angle from the drone toward the target point:
(Equation (16) is published as an image: it defines Δθ.)
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the drone's heading, has detected an obstacle and how far away that obstacle is.
(Equation (17) is published as an image: it defines the obstacle-proximity term based on norm_dis_8.)
In summary, the immediate return function is defined as follows
(Equation (18) is published as an image: it combines the collision penalty, arrival reward, Δdis, Δθ and the norm_dis_8 term.)
Wherein hit represents that the unmanned aerial vehicle collides with the obstacle, and at target represents that the unmanned aerial vehicle arrives at the target point.
(4) The state transition probability function p. In this work, the state transition probability function is used to describe the probability that the quad-rotor drone, in a flight scene, transitions from the state at the current moment to the state at the next moment when a certain action is selected.
The flight environment considered here is complex, so the problem is modeled as a Markov process with unknown state transition probability p. In reinforcement learning, effective solution algorithms exist both for the model-based case, in which the state transition probability is known, and for the model-free case, in which it is not. The deep Q learning algorithm is a model-free reinforcement learning algorithm and can therefore effectively solve the problem when p is unknown.
(5) And the discount factor gamma is used for describing the attention degree of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle.
During the drone's environment perception and autonomous obstacle avoidance flight, in order for the drone to avoid obstacles intelligently, the cumulative return from the current state to the future terminal state,
R_t = r_t + γ·r_(t+1) + γ²·r_(t+2) + … = Σ_(k≥0) γ^k·r_(t+k),
must be maximized.
When this cumulative return function is maximized, the drone can find the optimal path. Here γ expresses how much "importance" the drone, in the current state s_t, attaches to future rewards: γ = 1 means the drone is fully "far-sighted" and treats current and future immediate returns equally; γ = 0 means the drone is extremely short-sighted and considers only the current immediate return while ignoring future effects.
And secondly, building a depth Q learning algorithm of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. In order to enable the neural network to accurately fit the Q value of each state-action pair, the neural network is trained by using a deep Q learning algorithm, and the purpose is to adjust the weight and deviation in each neural network layer by using a gradient descent algorithm.
Meanwhile, in the process of fitting the Q value with the neural network, the flight command in each state is selected using the deep Q learning algorithm. In the selection of flight actions, to prevent the algorithm from falling into a local optimal solution, the trade-off between exploitation and exploration of the flight scene must be considered: a greedy algorithm is adopted, in which the drone exploits the collected flight-scene data with probability ε (0 < ε < 1) and explores the flight scene with probability 1 - ε.
Finally, the deep Q learning algorithm for drone environment perception and autonomous obstacle avoidance is shown in Table 2.
Table 2: unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm
(The algorithm pseudocode of Table 2 is published as an image in the original document.)
The algorithm flow is as follows: first, the state of the drone and the parameters of the neural network are initialized randomly; secondly, with probability ε (0 < ε < 1) the action that maximizes the Q value among the Q values fitted by the neural network for the current state is selected, and with probability 1 - ε an action is selected at random; after the action is completed a feedback value is obtained and a new state is reached, and the experience fragment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the drone reaches the destination, and the neural network is trained after every certain number of steps during the process.
The training process of the neural network is as follows: first, the neural network randomly extracts experience fragments from the experience pool and, for the next-moment state of each fragment, selects the action that maximizes its Q value; secondly, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next-moment state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize this back-propagated error, the neural network adjusts its parameters using a gradient descent algorithm.
And thirdly, designing a complex flight scene of an unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm. And a complex flight scene is built to carry out experimental verification on the effectiveness of the unmanned aerial vehicle autonomous obstacle avoidance algorithm. In the process of sensing and obstacle avoidance of the unmanned aerial vehicle, the unmanned aerial vehicle needs to continuously interact with a flight scene, collect data as much as possible as a decision basis, and can fully train the neural network and make the most correct decision behavior in the process of obstacle avoidance. Meanwhile, the unmanned aerial vehicle is used as a controlled object, and a model of the unmanned aerial vehicle is also an indispensable part in the simulation verification process.
The unmanned aerial vehicle flight scene is assumed to be in a square flight range, and cylinders with different sizes are distributed in the boundary to serve as obstacles. To enhance the complexity of the flight scenario, the destination for the drone flight is generated randomly within each flight round. Meanwhile, the positions and the radii of all the obstacles in the boundary and the moving speed of the obstacles are randomly generated, and an algorithm for setting the obstacles in the flight scene of the unmanned aerial vehicle is shown in table 3
TABLE 3 unmanned aerial vehicle flight scene setting algorithm
(The scene-setting algorithm of Table 3 is published as an image in the original document.)
The algorithm flow is as follows: first, the total obstacle area in the flight environment and the ratio of moving obstacles to all obstacles are determined; secondly, the radius and position of each obstacle are generated randomly (both within the allowable range), and, taking the moving-obstacle ratio as the probability, its moving speed is either set to 0 or generated randomly (within the allowable range); finally, obstacles are placed in the flight environment according to their radius, position and moving speed until the accumulated obstacle area reaches the specified total area, as sketched below.
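Using the simulation parameters listed earlier (l = 20 m, d = 0.01, radii between 0.1 m and 0.3 m, moving-obstacle ratio r = 0.2, speeds between -3.0 m/s and 3.0 m/s), the scene-setting algorithm of Table 3 can be sketched as follows; treating the obstacle velocity as two independent components is an assumption made here.
import math
import random

L_SIDE = 20.0              # side length of the square flight range (m)
D_AREA = 0.01              # ratio of total obstacle area to flight-range area
R_MIN, R_MAX = 0.1, 0.3    # allowed obstacle radii (m)
MOVING_RATIO = 0.2         # share of obstacles that move
V_MAX = 3.0                # maximum obstacle speed magnitude (m/s)

def generate_obstacles():
    """Randomly place cylindrical obstacles until their total area reaches d·l²."""
    target_area, area, obstacles = D_AREA * L_SIDE ** 2, 0.0, []
    while area < target_area:
        radius = random.uniform(R_MIN, R_MAX)
        x = random.uniform(radius, L_SIDE - radius)   # keep the obstacle inside the boundary
        y = random.uniform(radius, L_SIDE - radius)
        if random.random() < MOVING_RATIO:            # moving obstacle with probability r
            vx = random.uniform(-V_MAX, V_MAX)
            vy = random.uniform(-V_MAX, V_MAX)
        else:                                         # otherwise a static obstacle
            vx, vy = 0.0, 0.0
        obstacles.append({"x": x, "y": y, "radius": radius, "vx": vx, "vy": vy})
        area += math.pi * radius ** 2
    return obstacles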
Meanwhile, the quad-rotor drone model in the flight scene is obtained from 3D printing data; inputting the 3D printing data into the open-source environment Director reproduces the quad-rotor drone's flight scene.
Based on the above three steps, the drone can detect obstacles with its radar detection device in a complex flight scene, achieve autonomous obstacle avoidance, and reach the destination.

Claims (2)

1. An unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning, characterized in that, first, a radar detects the path within a certain distance ahead of the unmanned aerial vehicle to obtain the distances to obstacles and to the target point as the current state of the unmanned aerial vehicle; secondly, during training, a neural network is used to fit the deep Q learning Q value corresponding to each state-action pair of the unmanned aerial vehicle; finally, when the training result gradually converges, a greedy algorithm selects the optimal action for the unmanned aerial vehicle in each specific state, thereby realizing autonomous obstacle avoidance of the unmanned aerial vehicle;
specifically, through perception of the environment by the unmanned aerial vehicle, the distance between the unmanned aerial vehicle and the destination and the distances between the unmanned aerial vehicle and the obstacles are acquired and used as the state information of the deep Q learning algorithm;
the neural network fitting module is responsible for calculating the Q value: using the approximation capability of the neural network, the Q values of all possible state-action pairs for a given state are fitted;
the action selection module is responsible for selecting the action executed by the unmanned aerial vehicle: using a greedy algorithm, with probability ε the unmanned aerial vehicle executes the optimal action, i.e. the action whose Q value is largest, and with probability 1 - ε an action is selected at random; after the unmanned aerial vehicle receives the action information, it executes the corresponding action and reaches a new position;
the unmanned aerial vehicle gradually reaches the specified destination through the cycle of state acquisition, Q-value fitting, action selection, action execution and new-state acquisition;
the concrete steps are detailed as follows:
firstly, establishing a Markov model of the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm: according to the action decision process of autonomous obstacle avoidance of the unmanned aerial vehicle, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled, where p is the state transition probability function:
(1) the state set s: the position coordinates (x, y) of the unmanned aerial vehicle in the flight scene and the heading angle θ represent the determined position of the unmanned aerial vehicle, and (x_g, y_g) represents the destination of the flight mission; the distance from the unmanned aerial vehicle to the destination is defined as follows:
Δx = x − x_g, Δy = y − y_g    (1)
in order to survey the environment along the path ahead of the unmanned aerial vehicle, radar detection lines of length 4 m are erected every 5 degrees over the range from −45 degrees to +45 degrees in front of the unmanned aerial vehicle, 16 lines in total; the detection distance of each radar detection line is defined as follows:
[Equation (2), the definition of the detection distance dis_i of each radar detection line, is rendered as an image in the original.]
where i = 1, …, 16 and j = 1, …, n; (obs_x_j, obs_y_j) denote the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the unmanned aerial vehicle has detected an obstacle; for convenience of data processing, the detection distance dis_i (i = 1, …, 16) of each radar detection line is simultaneously normalized to norm_dis_i, as follows:
[Equation (3), the normalization of dis_i to norm_dis_i, is rendered as an image in the original.]
finally, the state of the unmanned aerial vehicle is determined as
s = [Δx, Δy, θ, norm_dis_i]    (4)
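A minimal Python sketch of how the state vector of equation (4) could be assembled is given below. Because equations (2) and (3) are only available as images, the ray-casting detection and the dis_i / 4 normalization are assumptions, as are the helper names (radar_distances, build_state) and the obstacle dictionary layout from the earlier scene sketch.

```python
import math
import numpy as np

N_LINES, MAX_RANGE = 16, 4.0

def radar_distances(x, y, theta, obstacles):
    """Cast 16 detection lines of length 4 m spread over [-45 deg, +45 deg]
    around the heading (the patent spaces them every 5 deg; linspace is an
    approximation) and return the normalized hit distance of each line,
    1.0 when nothing is detected."""
    norm_dis = np.ones(N_LINES)
    angles = np.linspace(-math.pi / 4, math.pi / 4, N_LINES)
    for i, da in enumerate(angles):
        a = theta + da
        dx, dy = math.cos(a), math.sin(a)
        hit = MAX_RANGE
        for obs in obstacles:                      # obs = {"x", "y", "r", ...}
            ox, oy = obs["x"] - x, obs["y"] - y
            proj = ox * dx + oy * dy               # projection of the obstacle centre on the ray
            if proj <= 0:
                continue
            perp2 = ox * ox + oy * oy - proj * proj
            if perp2 > obs["r"] ** 2:
                continue                           # ray misses this cylinder
            d = proj - math.sqrt(obs["r"] ** 2 - perp2)
            if 0 <= d < hit:
                hit = d                            # nearest intersection so far
        norm_dis[i] = hit / MAX_RANGE              # assumed normalization dis_i / 4 (eq. 3)
    return norm_dis

def build_state(x, y, theta, x_g, y_g, obstacles):
    """State of equation (4): [dx, dy, theta, norm_dis_1..16]."""
    dx, dy = x - x_g, y - y_g                      # equation (1)
    return np.concatenate(([dx, dy, theta], radar_distances(x, y, theta, obstacles)))
```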
(2) the action set a refers to the set of all actions that the unmanned aerial vehicle can possibly take, according to its position, after receiving the feedback value from the external environment; the motion speed v of the unmanned aerial vehicle is given in the environment perception and autonomous obstacle avoidance algorithm, and the selectable action set is defined as
[Equation (5), the selectable action set, is rendered as an image in the original.]
The unmanned aerial vehicle always flies forward at the speed v, and the heading angle θ is changed by selecting different actions, thereby changing the velocity components in the x and y directions and realizing the planning of the flight path;
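Since the action set of equation (5) is only available as an image, the sketch below assumes a small set of discrete heading-angle increments; the specific increments, the speed V, the time step DT and the function name step are illustrative, not the patent's actual values.

```python
import math

V = 1.0                                   # constant forward speed v (assumed)
DT = 0.1                                  # control period (assumed)
# discrete heading-angle changes per step; the exact set of equation (5)
# is not reproduced here, so these increments are illustrative
ACTIONS = [-math.pi / 6, -math.pi / 12, 0.0, math.pi / 12, math.pi / 6]

def step(x, y, theta, action_index):
    """Apply one action: change the heading angle, then advance at speed v,
    which changes the velocity components in the x and y directions."""
    theta = theta + ACTIONS[action_index]
    x += V * math.cos(theta) * DT
    y += V * math.sin(theta) * DT
    return x, y, theta
```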
(3) the immediate return function r refers to the instantaneous feedback obtained after the unmanned aerial vehicle selects a certain action in a certain state, and represents the reward for a certain state-action pair; Δdis is defined to measure, at time t, how much farther the unmanned aerial vehicle has travelled towards the target point in the current state compared with the previous time t − 1:
[Equation (6), the definition of Δdis, is rendered as an image in the original.]
Δθ is used to measure the difference between the current heading angle of the unmanned aerial vehicle and the angle from the unmanned aerial vehicle towards the target point:
[Equation (7), the definition of Δθ, is rendered as an image in the original.]
(norm_dis_8 − 1) indicates whether the 8th radar detection line, ahead of the heading of the unmanned aerial vehicle, has detected an obstacle, and the distance to the detected obstacle:
[Equation (8), the definition of the (norm_dis_8 − 1) term, is rendered as an image in the original.]
wherein radius represents the radius of the obstacle, and the immediate return function is defined as follows
[Equation (9), the immediate return function r, is rendered as an image in the original.]
where hit indicates that the unmanned aerial vehicle has collided with an obstacle, and at_target indicates that the unmanned aerial vehicle has reached the target point;
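Equations (6)-(9) are only available as images, so the following reward sketch is purely illustrative: it combines progress towards the target (Δdis), heading error (Δθ), the clearance term (norm_dis_8 − 1) and terminal rewards for hit / at_target. The weights, terminal values and functional form are assumptions, not the patent's actual definition.

```python
import math

def immediate_reward(state, prev_state, hit, at_target,
                     w_dis=1.0, w_theta=0.5, w_obs=0.5,
                     r_hit=-10.0, r_goal=10.0):
    """Illustrative shaping of the immediate return r: terminal rewards for
    collision / arrival, otherwise a weighted sum of the three terms above.
    state = [dx, dy, theta, norm_dis_1..16] as built earlier."""
    if hit:
        return r_hit
    if at_target:
        return r_goal
    dx, dy, theta = state[0], state[1], state[2]
    pdx, pdy = prev_state[0], prev_state[1]
    delta_dis = math.hypot(pdx, pdy) - math.hypot(dx, dy)   # progress at time t vs t-1 (eq. 6)
    angle_to_goal = math.atan2(-dy, -dx)                    # direction from the UAV towards the target
    delta_theta = abs(angle_to_goal - theta)                # heading error (eq. 7)
    obstacle_term = state[3 + 7] - 1.0                      # norm_dis_8 - 1 (eq. 8), 0 when the line is clear
    return w_dis * delta_dis - w_theta * delta_theta + w_obs * obstacle_term
```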
(4) the state transition probability function p is used to describe the probability that, in the flight scene, the quad-rotor unmanned aerial vehicle transitions from the state at the current moment to the state at the next moment by selecting a certain action;
(5) the discount factor gamma is used for describing the 'degree of importance' of the current flight decision on the future immediate return function in the autonomous obstacle avoidance decision process of the unmanned aerial vehicle;
secondly, selecting a deep Q learning algorithm according to a modeled Markov decision process, determining an algorithm flow, and finding out an optimal solution of unmanned aerial vehicle environment perception and autonomous obstacle avoidance;
and thirdly, designing a complex flight scene for the unmanned aerial vehicle environment perception and autonomous obstacle avoidance algorithm, which includes constructing the unmanned aerial vehicle model and designing the unmanned aerial vehicle's perception model of the surrounding environment, and then applying the first and second steps to unmanned aerial vehicle control to realize unmanned aerial vehicle environment perception and autonomous obstacle avoidance.
2. The unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning according to claim 1, characterized in that the deep Q learning algorithm flow is as follows: first, the state of the unmanned aerial vehicle and the parameters of the neural network are initialized randomly; second, among the Q values fitted by the neural network for the current state, the action with the largest Q value is selected with probability ε, and an action is selected at random with probability 1 − ε; after the action is completed, a feedback value is obtained and a new state is reached, and the experience segment "current state - action - feedback value - next-moment state" is stored in the experience pool; finally, this process is repeated until the unmanned aerial vehicle reaches the destination, the neural network being trained after every certain number of steps during the process;
the training process of the neural network is as follows: first, the neural network randomly extracts experience segments from the experience pool and, according to the next-moment state of each experience segment, selects the action that maximizes its Q value; second, the feedback value is calculated, and the square of the difference between the maximum Q value corresponding to the next state and the Q value of the current state is taken as the back-propagated error of the neural network; finally, to minimize the back-propagated error, the parameters of the neural network are adjusted using a gradient descent algorithm.
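A compact sketch of the claimed training flow, written in Python with PyTorch, is given below. The network architecture, hyper-parameters, replay-buffer handling and the env wrapper (reset()/step()) are assumptions; the ε-greedy convention follows the claim (optimal action with probability ε, random action with probability 1 − ε), so ε is set large by default.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Fits Q(s, a) for every action given the 19-dimensional state
    [dx, dy, theta, norm_dis_1..16]; layer sizes are assumptions."""
    def __init__(self, state_dim=19, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def train_episode(env, qnet, optimizer, replay, epsilon=0.9, gamma=0.95,
                  batch_size=32, train_every=10):
    """One flight round: select actions, store (s, a, r, s', done) in the
    experience pool, and every train_every steps minimise the squared
    TD error by gradient descent on a random mini-batch."""
    s, done, t = env.reset(), False, 0
    while not done:
        if random.random() < epsilon:                     # optimal action with probability epsilon
            a = int(torch.argmax(qnet(torch.tensor(s, dtype=torch.float32))))
        else:                                             # random action with probability 1 - epsilon
            a = random.randrange(qnet.net[-1].out_features)
        s_next, r, done = env.step(a)
        replay.append((s, a, r, s_next, done))
        s, t = s_next, t + 1
        if t % train_every == 0 and len(replay) >= batch_size:
            batch = random.sample(list(replay), batch_size)
            sb, ab, rb, sn, db = map(np.array, zip(*batch))
            sb = torch.tensor(sb, dtype=torch.float32)
            sn = torch.tensor(sn, dtype=torch.float32)
            q = qnet(sb).gather(1, torch.tensor(ab).long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():                         # bootstrapped target from the next state
                target = torch.tensor(rb, dtype=torch.float32) + \
                         gamma * qnet(sn).max(1).values * \
                         (1 - torch.tensor(db, dtype=torch.float32))
            loss = ((target - q) ** 2).mean()             # squared back-propagated error
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# typical setup (assumed): replay = deque(maxlen=10000); qnet = QNet();
# optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
```

Drawing mini-batches at random from the experience pool decorrelates consecutive flight steps, which is the usual motivation for the experience-replay step described in the claim.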
CN201910195250.8A 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning Active CN109933086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Publications (2)

Publication Number Publication Date
CN109933086A CN109933086A (en) 2019-06-25
CN109933086B true CN109933086B (en) 2022-08-30

Family

ID=66987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195250.8A Active CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN109933086B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488859B (en) * 2019-07-15 2020-08-21 北京航空航天大学 Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
CN110488861B (en) * 2019-07-30 2020-08-28 北京邮电大学 Unmanned aerial vehicle track optimization method and device based on deep reinforcement learning and unmanned aerial vehicle
CN110378439B (en) * 2019-08-09 2021-03-30 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110806756B (en) * 2019-09-10 2022-08-02 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110596734B (en) * 2019-09-17 2020-12-01 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110554707B (en) * 2019-10-17 2022-09-30 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110879610B (en) * 2019-10-24 2021-08-13 北京航空航天大学 Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle
CN112764423A (en) * 2019-11-05 2021-05-07 上海为彪汽配制造有限公司 Method and system for constructing flight path of multi-rotor unmanned aerial vehicle
CN110703766B (en) * 2019-11-07 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN112937564B (en) * 2019-11-27 2022-09-02 魔门塔(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN111123963B (en) * 2019-12-19 2021-06-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN110968102B (en) * 2019-12-27 2022-08-26 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111260658B (en) * 2020-01-10 2023-10-17 厦门大学 Deep reinforcement learning method for image segmentation
CN111473794B (en) * 2020-04-01 2022-02-11 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
CN111667513B (en) * 2020-06-01 2022-02-18 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008B (en) * 2020-09-18 2023-05-02 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device
CN112947562B (en) * 2021-02-10 2021-11-30 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113110547B (en) * 2021-04-21 2022-06-07 吉林大学 Flight control method, device and equipment of miniature aviation aircraft
CN113298368B (en) * 2021-05-14 2023-11-10 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN114371720B (en) * 2021-12-29 2023-09-29 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for realizing tracking target of unmanned aerial vehicle
CN114578834B (en) * 2022-05-09 2022-07-26 北京大学 Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816B (en) * 2022-11-24 2023-03-14 东南大学 Bionic vision multi-source information intelligent perception unmanned platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109348707A (en) * 2016-04-27 2019-02-15 纽拉拉股份有限公司 For the method and apparatus of the Q study trimming experience memory based on deep neural network
CN106970648B (en) * 2017-04-19 2019-05-14 北京航空航天大学 Unmanned plane multi-goal path plans combined method for searching under the environment of city low latitude

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107065890A (en) * 2017-06-02 2017-08-18 北京航空航天大学 A kind of unmanned vehicle intelligent barrier avoiding method and system
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN108388270A (en) * 2018-03-21 2018-08-10 天津大学 Cluster unmanned plane track posture cooperative control method towards security domain
CN109032168A (en) * 2018-05-07 2018-12-18 西安电子科技大学 A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN
CN109443366A (en) * 2018-12-20 2019-03-08 北京航空航天大学 A kind of unmanned aerial vehicle group paths planning method based on improvement Q learning algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Automatic cart control method based on deep Q-value network; Wang Liqun et al.; Electronic Measurement Technology; 2017-11-30; Vol. 40, No. 11; pp. 226-229 *
Elevator traffic modeling and application based on Markov network queueing theory; Zong Qun et al.; Journal of Tianjin University; 2005-01-31; Vol. 38, No. 1; pp. 9-13 *
Wang Liqun et al. Automatic cart control method based on deep Q-value network. Electronic Measurement Technology. 2017, Vol. 40, No. 11 *
Research on deep reinforcement learning for intelligent obstacle avoidance scenarios; Liu Qingjie et al.; Intelligent Internet of Things Technology; 2018-09-30; Vol. 1, No. 2; pp. 18-22 *

Also Published As

Publication number Publication date
CN109933086A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109933086B (en) Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning
CN110673637B (en) Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN105892489B (en) A kind of automatic obstacle avoiding UAV system and control method based on Multi-sensor Fusion
CN108897312B (en) Method for planning continuous monitoring path of multiple unmanned aerial vehicles to large-scale environment
CN112684807A (en) Unmanned aerial vehicle cluster three-dimensional formation method
CN109521794A (en) A kind of multiple no-manned plane routeing and dynamic obstacle avoidance method
CN111950873B (en) Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN106094569A (en) Multi-sensor Fusion unmanned plane perception with evade analogue system and emulation mode thereof
CN110362083A (en) It is a kind of based on multiple target tracking prediction space-time map under autonomous navigation method
CN109358638A (en) Unmanned plane vision barrier-avoiding method based on distributed maps
CN107065929A (en) A kind of unmanned plane is around flying method and system
CN112378397B (en) Unmanned aerial vehicle target tracking method and device and unmanned aerial vehicle
US20210325891A1 (en) Graph construction and execution ml techniques
CN111665508B (en) Helicopter-mounted terrain following and avoiding visual navigation system and navigation method
CN112596071A (en) Unmanned aerial vehicle autonomous positioning method and device and unmanned aerial vehicle
CN112379681A (en) Unmanned aerial vehicle obstacle avoidance flight method and device and unmanned aerial vehicle
Lawrance et al. Long endurance autonomous flight for unmanned aerial vehicles
CN110793522B (en) Flight path planning method based on ant colony algorithm
Fragoso et al. Dynamically feasible motion planning for micro air vehicles using an egocylinder
CN112380933B (en) Unmanned aerial vehicle target recognition method and device and unmanned aerial vehicle
Zhao et al. Autonomous exploration method for fast unknown environment mapping by using UAV equipped with limited FOV sensor
Yang et al. Optimization of dynamic obstacle avoidance path of multirotor UAV based on ant colony algorithm
Fei et al. Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments
Kamat et al. A survey on autonomous navigation techniques
Shen et al. Research on real-time flight path planning of UAV based on grey prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant