CN109933086A - UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning - Google Patents


Info

Publication number
CN109933086A
Authority
CN
China
Prior art keywords
UAV
state
action
automatic obstacle avoidance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910195250.8A
Other languages
Chinese (zh)
Other versions
CN109933086B (en)
Inventor
田栢苓
刘丽红
崔婕
宗群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910195250.8A
Publication of CN109933086A
Application granted
Publication of CN109933086B
Legal status: Active
Anticipated expiration: pending


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours

Abstract

The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance. To reduce resource loss and cost, and to meet the real-time, robustness, and safety requirements of UAV automatic obstacle avoidance, the technical solution adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning. First, radar is used to detect the path within a certain distance in front of the UAV, and the distances between the UAV, the obstacles, and the target point are obtained as the UAV's current state. Second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV. Finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance. The present invention is mainly applied to UAV environment sensing and automatic obstacle avoidance control.

Description

UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning
Technical field
The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance, and in particular to intelligent UAV path planning. More particularly, it relates to a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning.
Background technique
In recent years, unmanned aerial vehicles (UAVs) have gradually entered the public eye, producing remarkable results in commercial, agricultural, entertainment, and even military fields. Over the past decade, the number of UAVs in China has grown from nothing into a flourishing industry. Data show that by the end of 2018, consumer spending on civilian UAVs in China alone had approached 10 billion yuan, and consumption continues to rise rapidly. The prosperity of the UAV market places higher demands on the safety and development of UAV control technology. At this stage, China has not yet formed complete airspace management rules for UAVs, and "black flying" (unauthorized flight) is common across application fields, easily creating safety hazards during flight and causing unnecessary property loss and casualties. UAV sensing and obstacle avoidance technology has therefore become a topic of common concern for scholars at home and abroad. A UAV collision refers to the situation in which, during flight, the distance between the UAV and buildings, mountains, birds, or other aircraft along its path falls below a safety threshold, or an actual collision occurs. Unlike manned aircraft, a UAV cannot rely on a pilot to change its speed and heading during flight in order to avoid obstacles. Sensing and obstacle-avoidance devices are thus essential components of an unmanned system. At present, UAV sensing and automatic obstacle avoidance technologies mainly include the following types:
1. Vision-based obstacle avoidance: this technology mainly uses images of the environment along the forward path acquired by the UAV during flight, predicts potential collisions using image-processing techniques, and performs real-time path planning to achieve safe flight. The scheme depends on mature image sensing and processing technology and is vulnerable to environmental factors such as weather and haze.
2. Obstacle avoidance based on object detection: this technology covers a wide range of approaches, mainly using sensing devices installed on the UAV, such as radar, ultrasound, and infrared, to measure the distance between the UAV and obstacles, and correcting the UAV's path on this basis to achieve obstacle avoidance. Its disadvantages are that distance-detection techniques such as ultrasound place excessive requirements on the reflecting surface of the object and are vulnerable to environmental influences.
3. Obstacle avoidance based on electronic maps: this technology mainly uses the electronic map built into the UAV together with its GPS positioning to accurately determine the UAV's location and select a path. Its defect is that it cannot handle emergencies such as unknown maps or moving obstacles in the airspace, so its robustness is poor.
4. Obstacle avoidance based on the artificial potential field method: this technology is mainly applied at the path-planning level. Following the principle that like charges repel and unlike charges attract in an electric field, suitable charge attributes are assigned to the UAV, the obstacles, and the target point, so that the UAV can finally avoid obstacles and reach the specified target point.
5. Automatic obstacle avoidance based on genetic algorithms, neural networks, fuzzy control, etc.: this technology is also mainly applied at the path-planning level. A nonlinear optimization model or a fuzzy controller is designed around information such as the detected distances to control the UAV's flight speed and heading.
From the research status of UAV environment sensing and automatic obstacle avoidance above, it can be seen that the vast majority of current UAV obstacle-avoidance technologies separate sensing from path planning: sensing and path planning are treated as two modules in the system, and obstacle avoidance is realized through data transfer between them. The defects of this scheme are: 1) data transfer between the two modules may be delayed, introducing lag into the "safe" path computed by the planning algorithm and affecting the UAV's safe navigation; 2) the transferred data may be distorted or lost, depriving the path-planning module of reliable data support so that it cannot react to obstacles in time; 3) most path-planning algorithms easily fall into local optima and struggle to solve path planning in complex flight environments efficiently; 4) distance-sensing technology is vulnerable to environmental factors such as weather, and cannot perform accurate obstacle-distance detection in bad conditions or under countermeasures and interference. In short, traditional UAV automatic obstacle-avoidance schemes mostly chain sensing to path planning, which requires both technologies to be mature and data to be transferred efficiently between them; under external interference or uncertainty, the algorithm may fail, and robustness is poor.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to propose a quadrotor UAV environment sensing and automatic obstacle avoidance method based on the deep Q-learning algorithm. On the one hand, existing UAV obstacle-avoidance path-planning schemes easily fall into local optima, causing unnecessary resource loss and cost while the UAV executes its task; on the other hand, the UAV's operating environment is changeable and complex, and the various uncertainties in flight place higher requirements on the real-time performance, robustness, and safety of automatic obstacle avoidance. For this reason, the technical scheme adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning: first, radar is used to detect the path within a certain distance in front of the UAV, and the distances between the UAV, the obstacles, and the target point are obtained as the UAV's current state; second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
Specifically, through the UAV's sensing of the environment, the distances to the destination and to the obstacles are obtained as the state information of the deep Q-learning algorithm;
The neural-network fitting module is responsible for computing Q values: using the approximation capability of a neural network, it fits the Q values of all possible state-action pairs for a given state;
The action-selection module is responsible for selecting the action the UAV executes: using a greedy algorithm, with probability ε it selects the optimal action (the action whose Q value is maximal) and with probability 1-ε it selects an action at random; after receiving the action message, the UAV executes the corresponding action and reaches a new position;
Through the cycle of state acquisition, Q-value fitting, action selection, action execution, and new state acquisition, the UAV gradually reaches the specified destination.
The detailed steps are as follows:
In the first step, the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm is established. According to the UAV's autonomous obstacle-avoidance decision process, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled:
(1) State set s. The position coordinates (x, y) and heading angle θ of the UAV in the flight scene define the UAV's position, and (x_g, y_g) denotes the destination of the flight task; the distance from the UAV to the destination is then defined as
Δx = x - x_g, Δy = y - y_g   (1)
To detect the environment along the UAV's forward path, 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel; the detection distance of each line is defined as follows:
where i = 1, …, 16 and j = 1, …, n, (obs_x_j, obs_y_j) denotes the coordinates of the n obstacles, and detected indicates that one of the UAV's radar detection lines has detected an obstacle. For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar line is normalized to norm_dis_i, as follows:
The state of the UAV is finally determined as
s = [Δx, Δy, θ, norm_dis_i]   (4)
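The 19-dimensional state above (Δx, Δy, θ, and the 16 normalized ray distances) can be sketched as follows. The helper name and the capping of undetected rays at the full 4 m line length are assumptions for illustration; the original equation images are not reproduced here.

```python
import math

NUM_RAYS = 16      # radar detection lines, every 5 degrees from -45 to 45 degrees
RAY_LENGTH = 4.0   # each detection line is 4 m long

def build_state(x, y, theta, goal, ray_distances):
    """Assemble the UAV state s = [dx, dy, theta, norm_dis_1..16].

    Hypothetical helper: ray_distances[i] is the raw distance (metres)
    returned by detection line i, capped at RAY_LENGTH when nothing is hit.
    """
    gx, gy = goal
    dx, dy = x - gx, y - gy                                          # eq. (1)
    norm = [min(d, RAY_LENGTH) / RAY_LENGTH for d in ray_distances]  # scale to [0, 1]
    return [dx, dy, theta] + norm

# Example: UAV at (1, 2) heading 45 degrees, goal at (5, 6), no obstacles detected
s = build_state(1.0, 2.0, math.pi / 4, (5.0, 6.0), [4.0] * NUM_RAYS)
```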
(2) Action set a. The action set is the set of all actions the UAV may take at its current position after receiving the feedback value from the external environment. In the UAV environment sensing and automatic obstacle avoidance algorithm, the UAV is given a movement speed v, and the selectable action set is defined as follows:
That is, the UAV always flies forward at speed v and, by selecting different actions, changes its heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning;
(3) Immediate reward function r. The immediate reward function gives the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward for that state-action pair. Δdis measures, at time t, the distance the UAV has advanced toward the target point relative to the previous moment t-1:
Δθ measures the difference between the UAV's current heading angle and the direction from the UAV to the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, pointing straight ahead along the UAV's course, has detected an obstacle and the distance to that obstacle:
In summary, the immediate reward function is defined as follows:
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point;
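A minimal sketch of this reward, built from the Δdis, Δθ, and (norm_dis_8 - 1) terms defined above. The weights and the terminal values for collision and arrival are illustrative assumptions, since the exact coefficients in the original equation image are not reproduced here.

```python
def reward(delta_dis, delta_theta, norm_dis_8, hit=False, at_target=False):
    """Sketch of the immediate reward r; coefficients are illustrative."""
    if hit:          # collision with an obstacle: large penalty (assumed value)
        return -100.0
    if at_target:    # destination reached: large bonus (assumed value)
        return 100.0
    # Shaped reward: progress toward the goal, heading alignment, and a
    # penalty growing as the forward ray (no. 8) nears an obstacle.
    return 1.0 * delta_dis - 0.5 * abs(delta_theta) + 1.0 * (norm_dis_8 - 1.0)
```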
(4) State transition probability function p. The state transition probability function describes the probability that the quadrotor UAV in the flight scene transfers to a given state at the next moment after selecting an action in the current state;
(5) Discount factor γ. The discount factor describes, in the UAV's automatic obstacle-avoidance decision process, how much the current flight decision "cares about" future immediate rewards;
In the second step, according to the modeled Markov decision process, the deep Q-learning algorithm is selected and its flow is determined, in order to find the optimal solution for UAV environment sensing and automatic obstacle avoidance;
In the third step, a complex flight scene for the UAV environment sensing and automatic obstacle avoidance algorithm is designed, including building the UAV model and designing the UAV's model for sensing the surrounding environment; steps one and two are then applied to UAV control to realize environment sensing and automatic obstacle avoidance.
The deep Q-learning algorithm flow is as follows: first, the UAV state and the neural-network parameters are randomly initialized; second, according to the multiple Q values the neural network fits for the current state, the action with the maximum Q value is taken with probability ε (0 < ε < 1) and a random action with probability 1-ε; after the action is executed, a feedback value is obtained, a new state is reached, and the experience fragment "current state-action-feedback value-next state" is stored in the experience pool; finally, this process is repeated until the UAV reaches the destination, with the neural network trained every certain number of steps along the way;
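The acting loop described above, ε-greedy selection plus an experience pool, can be sketched as follows. The capacity of 500 and ε = 0.9 come from the simulation parameters given later; the function names are assumptions.

```python
import random
from collections import deque

MEMORY_CAPACITY = 500   # experience-pool storage capacity used in the simulation
EPSILON = 0.9           # greedy action taken with probability ε, as described

memory = deque(maxlen=MEMORY_CAPACITY)  # old fragments are evicted automatically

def select_action(q_values, epsilon=EPSILON, rng=random):
    """ε-greedy selection: with probability ε take the Q-maximising action,
    otherwise act at random (illustrative helper)."""
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))

def store(state, action, reward, next_state):
    """Append one 'current state-action-feedback value-next state' fragment."""
    memory.append((state, action, reward, next_state))
```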
The neural-network training process is as follows: first, the network randomly draws experience fragments from the experience pool and, for the next-moment state in each, selects the action that maximizes its Q value; second, the squared difference between the sum of the feedback value and the maximum Q value of the next state, on the one hand, and the current-state Q value, on the other, is computed as the network's back-propagated error; finally, to minimize this error, the network parameters are adjusted by gradient descent.
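The training step just described can be sketched as a temporal-difference error plus one gradient-descent update. A linear Q approximation stands in for the neural network here purely for brevity; γ = 0.9 and the learning rate 0.01 come from the simulation parameters given later.

```python
GAMMA = 0.9          # discount factor used in the simulation
LEARNING_RATE = 0.01 # gradient-descent learning rate used in the simulation

def td_error(q_current, reward, q_next_max, gamma=GAMMA):
    """Target minus prediction for one replayed transition; the network is
    trained to minimise the square of this quantity."""
    return reward + gamma * q_next_max - q_current

def sgd_step(weights, features, q_target, lr=LEARNING_RATE):
    """One gradient-descent step for a linear Q(s,a) = w.x approximation,
    standing in for the patent's neural network (assumption for brevity)."""
    q_pred = sum(w * f for w, f in zip(weights, features))
    err = q_pred - q_target                       # derivative of 0.5*err^2
    return [w - lr * err * f for w, f in zip(weights, features)]
```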
Features and beneficial effects of the present invention:
To verify the effectiveness of the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance method, a virtual UAV obstacle-avoidance simulation system was designed and simulation experiments were carried out on it. In the virtual simulation environment, the following simulation parameters are set:
(1) UAV flight scene: a square flight range of side l = 20 m, as shown in Figure 6, in which the total area of all obstacles accounts for a fraction d = 0.01 of the square flight range; obstacle radii are generated at random, satisfying 0.1 m ≤ radius ≤ 0.3 m. To increase the complexity of the flight environment, moving obstacles make up a proportion r = 0.2 of all obstacles, with speeds v_obs generated at random such that -3.0 m/s ≤ v_obs ≤ 3.0 m/s; the refresh frequency of the flight scene is 30 Hz.
(2) Neural network parameters: the learning rate of the gradient-descent optimizer is 0.01. The training model, shown in Figure 3, consists of an input layer of 19 neurons, a hidden layer of 10 neurons, and an output layer of 3 neurons; the input and hidden layers use the rectified linear unit (ReLU) as the activation function.
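For illustration, this 19-10-3 network can be sketched as a plain NumPy forward pass. The weights below are random placeholders, not trained values, and applying ReLU only at the hidden layer is one interpretation of the description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 19 inputs (the state vector), 10 hidden units,
# 3 outputs (one Q value per selectable action).
W1 = rng.standard_normal((19, 10)) * 0.1
b1 = np.zeros(10)
W2 = rng.standard_normal((10, 3)) * 0.1
b2 = np.zeros(3)

def q_network(state):
    """Forward pass of the fitting network (placeholder weights)."""
    h = np.maximum(0.0, state @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2                     # linear output: Q(s, a) for 3 actions

q = q_network(np.zeros(19))
```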
(3) Deep Q-learning: exploration rate ε = 0.9, discount factor γ = 0.9; the memory pool of deep Q-learning has a storage capacity of 500 and is updated every 300 runs.
(4) Radar detector: 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel.
(5) UAV model: the UAV's flight speed is v = 2.5 m/s; the image-rendering data come from the 3D printing model 3DBuilder, part of which is shown in Table 2.
The proposed UAV environment sensing and automatic obstacle avoidance method is built on the deep Q-learning algorithm. Thanks to the fitting capability of deep learning and the decision-making capability of reinforcement learning, the method retains good robustness even when the flight scene is extremely complex. To further demonstrate its effectiveness, simulations were carried out in flight scenes where the obstacle positions, radii, and speeds, the UAV's initial position, and the target position were all set at random.
The UAV automatic obstacle-avoidance flow is shown in Figure 4. In each flight round, the UAV must fly toward the target point; upon arrival, the position of the target point is updated and the UAV continues tracking. When the UAV collides with an obstacle, the positions of the UAV and the target point are updated simultaneously. To improve efficiency, if within a round the UAV neither reaches the target point nor collides with an obstacle for a long period, the positions of the UAV and the target point are likewise updated.
The simulation results are shown in Figure 5. In each flight round, the loss function converges from high to low; after the neural-network training value converges, the UAV's speed increases and it reaches the target point quickly. After the UAV reaches the end point, the target point is updated immediately, so the loss function jumps up again until the network converges once more and the new end point is reached, and so on in cycles.
The evolution of the UAV's obstacle avoidance is shown in Figure 6; the upper and lower groups of images show the UAV arriving safely at the end point in fairly complex environments. The results show that the automatic obstacle avoidance algorithm can complete obstacle-avoiding flight from the starting point to the target point in complex flight scenes.
Table 2: UAV model 3D printing data (part)
In the designed complex flight scene, the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance algorithm was tested under different obstacle distributions. Below, combining the test results, the control performance is analyzed from several angles to further establish the effectiveness of this guidance algorithm.
(1) Robustness analysis: the proposed method of setting up radar detection lines in the angular region from -45 to 45 degrees in front of the UAV's course excludes the influence of factors such as weather, effectively detects obstacles and flight boundaries ahead of the UAV, and provides reliable information for automatic obstacle avoidance. Meanwhile, for different flight states the deep Q-learning algorithm makes the optimal decision according to the Q values and issues avoidance commands to the UAV. In summary, during obstacle-avoiding flight, the method is strongly robust to factors such as different flight scenes and weather.
(2) Real-time analysis: the proposed algorithm takes the forward-path information detected by radar as the basis for decisions and, through the deep neural network and Q-learning processing, directly generates the optimal obstacle-avoidance command for the UAV. Compared with traditional methods, it avoids the integration and transfer of data between separate environment-sensing and obstacle-avoidance modules, significantly improving the real-time performance of the algorithm.
(3) Safety analysis: as can be seen in Figure 6, the proposed algorithm accurately and effectively identifies the obstacles in the flight scene and makes optimal action decisions, preventing the UAV from colliding with obstacles or the boundaries of the flight range and ensuring flight safety in complex scenes.
In conclusion, the proposed deep-learning-based UAV environment sensing and automatic obstacle avoidance algorithm has high applicability to the obstacle-avoidance problem of UAVs in complex flight scenes.
Description of the drawings:
Figure 1: quadrotor UAV environment sensing and automatic obstacle avoidance system structure diagram.
Figure 2: UAV environment sensing and automatic obstacle avoidance algorithm design block diagram.
Figure 3: neural-network training model schematic.
Figure 4: UAV automatic obstacle avoidance flow chart.
Figure 5: neural-network loss function curve.
Figure 6: environment sensing and automatic obstacle avoidance simulation process schematic.
Specific embodiment
To overcome the poor robustness of traditional UAV automatic obstacle-avoidance algorithms, the present invention draws on the deep reinforcement learning algorithms that currently attract wide attention in the artificial-intelligence field, establishes, through a deep reinforcement learning network, a mapping between the UAV's sensed distances to obstacles and its avoidance strategy, and proposes a quadrotor UAV sensing and obstacle-avoidance method based on the deep Q-learning algorithm. The method uses the radar detector in front of the UAV to detect the flight environment within a certain range ahead, avoiding the influence of factors such as weather and distance to the greatest extent and improving the robustness of the algorithm. At the same time, by taking the detection information as raw data, the deep Q-learning network can directly generate the UAV's avoidance strategy, significantly improving the real-time performance of obstacle avoidance. Moreover, during training the deep-Q-learning-based avoidance strategy effectively fits the Q value of each state-action pair, so the strategy generated by the greedy algorithm effectively guarantees flight safety. Applying this deep-Q-learning-based sensing and avoidance strategy to UAV path planning in complex environments has both important theoretical significance and high strategic value for research on UAV automatic obstacle avoidance.
In view of the shortcomings of traditional UAV obstacle-avoidance schemes based on separate environment sensing and path planning, the present invention proposes a UAV automatic obstacle avoidance method based on deep Q-learning: first, radar is used to detect the path within a certain distance in front of the UAV, and the distances to the obstacles and the target point are obtained as the UAV's current state; second, during training, a neural network is used to fit the Q value corresponding to each state-action pair; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
It follows that the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance method is a closed-loop intelligent real-time control scheme that is safe and fast; it solves the automatic obstacle-avoidance problem of quadrotor UAVs in complex scenes with strong robustness; its effectiveness and reliability help improve the UAV's autonomous decision-making ability during task execution, and it can be applied in many civil and military fields. The intelligent path-planning scheme can be applied to the automatic obstacle avoidance of real UAVs, generating action commands online and realizing safe obstacle-avoiding flight.
Taking control-theoretic methods integrated with virtual simulation technology as the main research means, the present invention devises a quadrotor UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning; simulation experiments carried out in a Python 2.7 environment verify the effectiveness of the method.
In the first step, the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm is established. According to the UAV's autonomous obstacle-avoidance decision process, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled.
(1) State set s. The position coordinates (x, y) and heading angle θ of the UAV in the flight scene define the UAV's position, and (x_g, y_g) denotes the destination of the flight task; the distance from the UAV to the destination is then defined as
Δx = x - x_g, Δy = y - y_g   (1)
To detect the environment along the UAV's forward path, 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel; the detection distance of each line is defined as follows:
where (obs_x_j, obs_y_j) (j = 1, …, n) denotes the coordinates of the n obstacles, and detected indicates that one of the UAV's radar detection lines has detected an obstacle (as shown in module 1 of Figure 2). For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar line is normalized to norm_dis_i (i = 1, …, 16), as follows:
The state of the UAV is finally determined as
s = [Δx, Δy, θ, norm_dis_i]   (4)
(2) Action set a. The action set is the set of all actions the UAV may take at its current position after receiving the feedback value from the external environment. In the UAV environment sensing and automatic obstacle avoidance algorithm, the UAV is given a movement speed v, and the selectable action set is defined as follows:
That is, the UAV always flies forward at speed v and, by selecting different actions, changes its heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning.
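The kinematics described here, a fixed forward speed with actions that only change the heading, can be sketched as one simulation tick. Tying the time step to the 30 Hz scene refresh rate is an assumption for illustration.

```python
import math

V = 2.5          # forward flight speed from the simulation parameters (m/s)
DT = 1.0 / 30.0  # one tick at the 30 Hz scene refresh rate (assumption)

def step_pose(x, y, theta, dtheta, v=V, dt=DT):
    """Advance the UAV one tick: the chosen action only turns the heading
    by dtheta, while the forward speed stays fixed."""
    theta = theta + dtheta
    x += v * math.cos(theta) * dt   # velocity component in the x direction
    y += v * math.sin(theta) * dt   # velocity component in the y direction
    return x, y, theta

# Example: one tick flying straight ahead from the origin
x, y, th = step_pose(0.0, 0.0, 0.0, 0.0)
```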
(3) Immediate reward function r. The immediate reward function gives the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward for that state-action pair. Δdis measures, at time t, the distance the UAV has advanced toward the target point relative to the previous moment t-1:
Δθ measures the difference between the UAV's current heading angle and the direction from the UAV to the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, pointing straight ahead along the UAV's course, has detected an obstacle and the distance to that obstacle:
In summary, the immediate reward function is defined as follows:
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point.
(4) state transition probability function p.In this project, state transition probability function is to describe quadrotor drone In flying scene, subsequent time shape probability of state is transferred to by a certain movement of current time state selection.
Flight environment of vehicle is complicated in this project, therefore is modeled as the unknown markoff process of state transition probability p.By force Aiming at the problem that changing learning areas and be divided into based on environmental model for whether state transition probability is known with environmental model is not based on, each There is effective solution algorithm in the case of.The one kind of depth Q learning algorithm as nitrification enhancement, can be unknown in p In the case where effectively solve the problems, such as to be not based on environmental model.
(5) Discount factor γ. The discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards.
Second step: according to the modelled Markov decision process, select the deep Q-learning algorithm and determine the algorithm flow, to find the optimal solution for UAV environment perception and autonomous obstacle avoidance. The algorithm flow is shown in Table 1:
Table 1: UAV environment-perception and autonomous obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialize the UAV state and the neural network parameters. Second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1), and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way.
The neural network training process is as follows. First, the network randomly samples experience tuples from the replay buffer and, according to the next state in each tuple, selects the action that maximizes its Q value. Second, it computes the squared difference between the sum of the current feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error. Finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
Third step: set up the environment for UAV environment perception and autonomous obstacle avoidance. During environment perception and obstacle avoidance, the UAV, as the agent, needs to interact continuously with the obstacle-laden surroundings to obtain enough data and collect enough information as the basis for decision-making. Meanwhile, the UAV is the controlled object, so a model of the UAV is an indispensable part of simulation verification.
The UAV flight environment is assumed to be a square region in which cylinders of varying sizes are distributed as obstacles, while a green marker indicates the destination of the flight. The quadrotor UAV model is obtained from 3D-printing data; importing the 3D-printing data into the environment's Director reproduces the quadrotor model.
Based on the above three steps, the UAV can, in a complex motion scene, perform obstacle detection with its own radar detection apparatus, achieve autonomous obstacle avoidance, and reach the destination.
The quadrotor UAV environment-perception and autonomous obstacle-avoidance system architecture is shown in Figure 1. By acquiring state information such as the obstacles and the target point in the flight environment, and selecting the optimal action for the current state, the quadrotor UAV can be controlled to meet the goal of reaching the destination. Q-value fitting is the core link of the algorithm: only through accurate fitting of the Q values can a suitable action be selected for the UAV and the flight mission completed. Without the Q-value fitting part, the UAV cannot obtain flight instructions and cannot complete the flight mission in a complex environment.
Figure 2 shows the design block diagram of the proposed UAV environment-perception and autonomous obstacle-avoidance algorithm. The state detection module is responsible for acquiring information: through the UAV's perception of the environment, it obtains the distances to the destination and to obstacles as the state information for the deep Q-learning algorithm. The neural network fitting module is responsible for computing Q values: using the approximation capability of the neural network, it fits the Q values of all possible state-action pairs for a given state. The action selection module is responsible for selecting the action the UAV executes: among the multiple Q values for the current state, using a greedy algorithm, it selects the action with the maximum Q value with probability ε (0 < ε < 1), and a random action with probability 1-ε. The action execution module is responsible for executing the specific action: after receiving the action information, the UAV executes the corresponding action and reaches a new position. Cycling through state acquisition - Q-value fitting - action selection - action execution - new state acquisition, the UAV gradually reaches the designated destination.
First step: Markov process modelling of the UAV environment-perception and autonomous obstacle-avoidance algorithm. According to the UAV's autonomous obstacle-avoidance action decision process, the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP) is modelled.
(1) State set s. The state set refers to the state quantities that can determine and represent the UAV's current flight information.
Define the UAV's current position (x, y) and heading angle θ in the flight scene to represent the UAV's exact position, and (xg, yg) as the destination of the flight mission; then the UAV's distance from the destination is defined as follows:
Δx = x - xg, Δy = y - yg (10)
To detect the environment on the UAV's forward path, a radar detection line of length 4 m is set up every 5 degrees between -45 and 45 degrees ahead of the UAV's direction of travel, 16 lines in total. The detection distance of each radar detection line is defined as follows:
where (obs_xj, obs_yj), j = 1, ..., n, denotes the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the UAV has detected an obstacle (as shown in module 1 of Figure 2). Meanwhile, for ease of data processing, the distance dis_i, i = 1, ..., 16, detected by each radar line is normalized to norm_dis_i, i = 1, ..., 16, as follows:
Finally, the state of the UAV is determined as
s = [Δx, Δy, θ, norm_dis_i] (13)
This state information both represents the distance between the UAV's current flight position and the destination, and represents the distance between the UAV and the obstacles present in the flight scene, so that the UAV can decide whether an avoidance manoeuvre is needed.
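As an illustrative sketch (not from the patent itself), the state vector of equation (13) could be assembled as follows. The helper name `build_state` is hypothetical, and normalizing by the 4 m line length is an assumption based on the text above.

```python
RAY_LEN = 4.0          # length of each radar detection line (4 m, per the text)
NUM_RAYS = 16          # number of detection lines

def build_state(x, y, theta, xg, yg, ray_dists):
    """Assemble the state s = [dx, dy, theta, norm_dis_1..16] of eq. (13).

    ray_dists: raw distances dis_i returned by the 16 radar lines;
    a line that detects nothing is assumed to return RAY_LEN.
    """
    dx, dy = x - xg, y - yg                     # eq. (10)
    norm = [d / RAY_LEN for d in ray_dists]     # assumed normalization to [0, 1]
    return [dx, dy, theta] + norm

# example: no obstacles detected, all 16 rays at full length
s = build_state(1.0, 2.0, 0.0, 4.0, 6.0, [4.0] * NUM_RAYS)
```

The resulting vector has 3 + 16 = 19 components, matching the state definition above.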
(2) Action set a. The action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value from the external environment.
In the UAV environment-perception and autonomous obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
That is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning. In other words, until it reaches the destination the UAV always moves along its trajectory at speed v under the action of the heading angle θ; as the heading changes, the UAV's trajectory changes with it, until the destination is reached.
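The action-set formula itself is not reproduced in this text. As a hedged sketch, assume a small set of discrete heading-angle increments; the increment values, the time step, and the helper name `step` below are all illustrative assumptions, not the patent's definitions.

```python
import math

V = 1.0                                # given forward speed v
DT = 0.1                               # assumed integration time step
ACTIONS = [-0.2, -0.1, 0.0, 0.1, 0.2]  # assumed heading increments (rad)

def step(x, y, theta, action_idx):
    """Apply one action: change the heading angle, then advance at speed v."""
    theta = theta + ACTIONS[action_idx]
    x += V * math.cos(theta) * DT      # velocity component along x
    y += V * math.sin(theta) * DT      # velocity component along y
    return x, y, theta

x, y, th = step(0.0, 0.0, 0.0, 2)      # the middle action keeps the heading
```

This matches the text's description: the speed v is fixed, and only the heading angle θ (hence the x and y velocity components) is changed by the chosen action.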
(3) Immediate reward function r. The immediate reward function is the instantaneous feedback the UAV obtains after selecting an action in a given state; it represents the reward assigned to a state-action pair.
The state-action pairs during UAV flight mainly fall into three cases: reaching the target point, striking an obstacle, and safe flight. An immediate reward must be designed reasonably for each case. The reach-target and hit-obstacle scenarios are simple: their immediate rewards are defined as a reward value of 15 and a penalty value of -20, respectively. The safe-flight state is more complex and must jointly consider the distance the UAV has travelled compared with the previous moment, the angle difference towards the target point, and the distance to obstacles.
Δdis is defined to measure, at time t, the distance the current state has advanced towards the target point compared with the previous state:
Δθ measures the difference between the current UAV heading angle and the angle from the UAV towards the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line ahead of the UAV's heading has detected an obstacle, and the distance to that obstacle.
In summary, the immediate reward function is defined as follows
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point.
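The three-case structure above can be sketched as follows. The terminal values +15 and -20 come from the text; the safe-flight shaping weights w1..w3 are assumptions (the patent's exact formula is not reproduced here), as is the sign convention that progress and clearance are rewarded while heading error is penalized.

```python
def reward(hit, at_target, d_dis, d_theta, norm_dis8,
           w1=1.0, w2=1.0, w3=1.0):
    """Immediate reward: +15 on reaching the target, -20 on collision
    (values from the text); the shaping weights w1..w3 are assumed."""
    if at_target:
        return 15.0
    if hit:
        return -20.0
    # safe flight: progress toward target, heading alignment, obstacle clearance
    return w1 * d_dis - w2 * d_theta + w3 * (norm_dis8 - 1.0)
```

Note that (norm_dis8 - 1) is non-positive, so with these assumed signs an obstacle straight ahead can only reduce the safe-flight reward.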
(4) State transition probability function p. In this work, the state transition probability function describes the probability that the quadrotor UAV in the flight scene, after selecting an action in the current state, transfers to a given next state.
The flight environment in this work is complex, so it is modelled as a Markov process with unknown state transition probability p. The field of reinforcement learning divides problems into model-based (state transition probability known) and model-free (unknown) cases, each of which has effective solution algorithms. Deep Q-learning, as one class of reinforcement learning algorithm, can effectively solve the model-free problem when p is unknown.
(5) Discount factor γ. The discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards.
During UAV environment-perception and autonomous obstacle-avoidance flight, to enable intelligent avoidance, the UAV must maximize, from its current state, the cumulative return value up to the future terminal state.
When the cumulative reward is maximal, the UAV can find the optimal path. Here γ represents the degree of attention the UAV, in the current state st at time t, pays to future returns: γ = 1 means the UAV is fully "far-sighted", weighing current and future immediate rewards equally; γ = 0 means the UAV is "short-sighted", valuing only the current immediate reward and ignoring future influence.
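The cumulative return formula is not reproduced in this text; the standard discounted-return definition consistent with the description of γ above would be (this reconstruction follows the usual MDP convention and is an assumption):

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
    = \sum_{k=0}^{T-t-1} \gamma^k \, r_{t+k+1}
```

With γ = 1 all terms are weighted equally ("far-sighted"); with γ = 0 only the immediate reward r_{t+1} survives ("short-sighted"), matching the description above.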
Second step: construction of the deep Q-learning algorithm for UAV environment perception and autonomous obstacle avoidance. To enable the neural network to accurately fit the Q value of each state-action pair, the network is trained with the deep Q-learning algorithm; the purpose is to adjust the weights and biases in each network layer using gradient descent.
Meanwhile during neural network is fitted Q value, the flight under each state is selected using depth Q learning algorithm Instruction.In the selection course of flare maneuver, in order to avoid algorithm falls into locally optimal solution, need to consider unmanned plane in flight field Relationship in scape between " utilization " and " exploration ".Using greedy algorithm, unmanned plane is utilized with the probability (0 < ε < 1) of ε and has been collected The data of obtained flying scene explore flying scene with the probability of 1- ε.
Finally, the deep Q-learning algorithm for UAV environment perception and autonomous obstacle avoidance is shown in Table 2.
Table 2: UAV environment-perception and autonomous obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialize the UAV state and the neural network parameters. Second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1), and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way.
The neural network training process is as follows. First, the network randomly samples experience tuples from the replay buffer and, according to the next state in each tuple, selects the action that maximizes its Q value. Second, it computes the squared difference between the sum of the feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error. Finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
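The target computation at the heart of this training step can be sketched as follows. The inclusion of the discount factor γ and the no-bootstrap handling of terminal states are assumptions following the MDP defined in the first step, and `q_next_fn` is a hypothetical stand-in for the network's forward pass.

```python
def td_targets(batch, q_next_fn, gamma=0.9):
    """Compute training targets for sampled experience tuples
    (state, action, reward, next_state, done):
    target = r + gamma * max_a' Q(s', a'), or just r at terminal states.
    The squared gap between each target and the current Q(s, a) is the
    error minimized by gradient descent, as described above."""
    targets = []
    for s, a, r, s_next, done in batch:
        if done:
            targets.append(r)
        else:
            targets.append(r + gamma * max(q_next_fn(s_next)))
    return targets

# toy check with a constant Q-function returning [2.0, 5.0] for any state
batch = [((0,), 0, 1.0, (1,), False), ((1,), 1, -20.0, (2,), True)]
t = td_targets(batch, lambda s: [2.0, 5.0], gamma=0.5)
# t == [1.0 + 0.5 * 5.0, -20.0] == [3.5, -20.0]
```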
Third step: design of the complex flight scene for the UAV environment-perception and autonomous obstacle-avoidance algorithm. A complex flight scene is built to experimentally verify the effectiveness of the autonomous obstacle-avoidance algorithm. During perception and avoidance, the UAV needs to interact continuously with the flight scene and collect as much data as possible as the basis for its decisions, so that the neural network can be fully trained and the most correct decisions can be made during avoidance. Meanwhile, the UAV is the controlled object, so a model of the UAV is also an indispensable part of simulation verification.
The hypothetical flight scene is a square flight range within whose boundary cylinders of varying sizes are distributed as obstacles. To increase the complexity of the scene, the destination of the flight is generated randomly in each episode. Meanwhile, the positions and radii of all obstacles within the boundary, and the movement speeds of the obstacles, are generated randomly. The algorithm for setting up the obstacles in the flight scene is shown in Table 3.
Table 3: UAV flight-scene setup algorithm
The algorithm flow is as follows. First, determine the total area of obstacles in the flight environment and the proportion of that total area accounted for by moving obstacles. Second, randomly generate each obstacle's radius and position (within the allowed range) and, with the moving-obstacle area proportion as the probability, set its movement speed either to 0 or to a random value (within the allowed range). Finally, draw the obstacles in the flight environment according to their radii, positions and movement speeds, until the accumulated area reaches the total obstacle area.
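The Table 3 flow can be sketched as a loop that keeps generating random cylinders until their accumulated area reaches the target. All numeric ranges (boundary size, radius limits, maximum speed) and the function name are illustrative assumptions, not values from the patent.

```python
import math
import random

def generate_obstacles(area_total, moving_frac, bound=50.0,
                       r_min=0.5, r_max=3.0, v_max=1.0):
    """Generate random cylinders (x, y, radius, speed) until their
    accumulated area reaches area_total; each cylinder is a moving
    obstacle with probability moving_frac, otherwise static."""
    obstacles, area = [], 0.0
    while area < area_total:
        r = random.uniform(r_min, r_max)
        x = random.uniform(r, bound - r)          # keep cylinder inside bounds
        y = random.uniform(r, bound - r)
        if random.random() < moving_frac:
            v = random.uniform(0.0, v_max)        # moving obstacle
        else:
            v = 0.0                               # static obstacle
        obstacles.append((x, y, r, v))
        area += math.pi * r * r
    return obstacles

obs = generate_obstacles(area_total=100.0, moving_frac=0.3)
```

Drawing each cylinder at its generated position and speed then reproduces the randomized flight scene described above.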
Meanwhile the model of quadrotor drone is obtained by 3D printing data in flying scene, and 3D printing data are inputted Into open source environment Director, the flying scene of quadrotor drone can be reproduced.
Based on the above three steps, the UAV can, in a complex flight scene, perform obstacle detection with its own radar detection apparatus, achieve autonomous obstacle avoidance, and reach the destination.

Claims (4)

1. A UAV environment-perception and autonomous obstacle-avoidance method based on deep Q-learning, characterized in that: first, radar detects the path within a certain distance ahead of the UAV, and the distances to obstacles and to the target point are obtained as the UAV's current state; second, during training, a neural network fits the deep-learning Q value corresponding to each state-action pair of the UAV; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing the UAV's autonomous obstacle avoidance.
2. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that: specifically, through the UAV's perception of the environment, the distances to the destination and to obstacles are obtained as the state information for the deep Q-learning algorithm;
a neural network fitting module is responsible for computing Q values: using the approximation capability of the neural network, it fits the Q values of all possible state-action pairs for a given state;
an action selection module is responsible for selecting the action the UAV executes: using a greedy algorithm, it selects the optimal action for the UAV with probability ε, the optimal action being the one with the maximum Q value, and selects a random action with probability 1-ε; after receiving the action information, the UAV executes the corresponding action and reaches a new position;
cycling through state acquisition - Q-value fitting - action selection - action execution - new state acquisition, the UAV gradually reaches the designated destination.
3. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that the specific steps are refined as follows:
First step: establish the Markov model of the UAV environment-perception and autonomous obstacle-avoidance algorithm; according to the UAV's autonomous obstacle-avoidance action decision process, model the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP):
(1) State set s: define the UAV's position coordinates (x, y) and heading angle θ in the flight scene to represent the UAV's exact position, and (xg, yg) as the destination of the flight mission; then the UAV's distance to the destination is defined as follows:
Δx = x - xg, Δy = y - yg (1)
to detect the environment on the UAV's forward path, a radar detection line of length 4 m is set up every 5 degrees between -45 and 45 degrees ahead of the UAV's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
where i = 1, ..., 16, j = 1, ..., n, and (obs_xj, obs_yj) denotes the coordinate positions of the n obstacles, detected indicating that a radar detection line of the UAV has detected an obstacle; meanwhile, for ease of data processing, the distance dis_i, i = 1, ..., 16, detected by each radar line is normalized to norm_dis_i as follows:
finally, the state of the UAV is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) Action set a: the action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value from the external environment; in the UAV environment-perception and autonomous obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
that is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning;
(3) Immediate reward function r: the immediate reward function is the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward assigned to a state-action pair; Δdis is defined to measure, at time t, the distance the UAV has advanced towards the target point relative to the previous time t-1:
Δθ measures the difference between the current UAV heading angle and the angle from the UAV towards the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line ahead of the UAV's heading has detected an obstacle, and the distance to that obstacle:
in summary, the immediate reward function is defined as follows
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point;
(4) State transition probability function p: the state transition probability function describes the probability that the quadrotor UAV in the flight scene, after selecting an action in the current state, transfers to a given next state;
(5) Discount factor γ: the discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards;
Second step: according to the modelled Markov decision process, select the deep Q-learning algorithm and determine the algorithm flow, to find the optimal solution for UAV environment perception and autonomous obstacle avoidance;
Third step: design the complex flight scene for the environment-perception and autonomous obstacle-avoidance algorithm, including building the UAV model and designing the UAV's model for sensing the surrounding environment; then apply steps one and two to UAV control, realizing UAV environment perception and autonomous obstacle avoidance.
4. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that the deep Q-learning algorithm flow is as follows: first, randomly initialize the UAV state and the neural network parameters; second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε, where 0 < ε < 1, and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer; finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way;
the neural network training process is as follows: first, the network randomly samples experience tuples from the replay buffer and, according to the next state therein, selects the action that maximizes its Q value; second, it computes the squared difference between the sum of the feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error; finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
CN201910195250.8A 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning Active CN109933086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Publications (2)

Publication Number Publication Date
CN109933086A true CN109933086A (en) 2019-06-25
CN109933086B CN109933086B (en) 2022-08-30

Family

ID=66987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195250.8A Active CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN109933086B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110378439A (en) * 2019-08-09 2019-10-25 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110488859A (en) * 2019-07-15 2019-11-22 北京航空航天大学 A kind of Path Planning for UAV based on improvement Q-learning algorithm
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110554707A (en) * 2019-10-17 2019-12-10 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110596734A (en) * 2019-09-17 2019-12-20 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110879610A (en) * 2019-10-24 2020-03-13 北京航空航天大学 Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle
CN110968102A (en) * 2019-12-27 2020-04-07 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN111260658A (en) * 2020-01-10 2020-06-09 厦门大学 Novel depth reinforcement learning algorithm for image segmentation
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
WO2021088133A1 (en) * 2019-11-05 2021-05-14 上海为彪汽配制造有限公司 Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle
CN112947562A (en) * 2021-02-10 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN112937564A (en) * 2019-11-27 2021-06-11 初速度(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN113110547A (en) * 2021-04-21 2021-07-13 吉林大学 Flight control method, device and equipment of miniature aviation aircraft
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113298368A (en) * 2021-05-14 2021-08-24 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
CN114371720A (en) * 2021-12-29 2022-04-19 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for unmanned aerial vehicle to track target
CN114578834A (en) * 2022-05-09 2022-06-03 北京大学 Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816A (en) * 2022-11-24 2023-01-06 东南大学 Bionic vision multi-source information intelligent perception unmanned platform
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107065890A (en) * 2017-06-02 2017-08-18 北京航空航天大学 A kind of unmanned vehicle intelligent barrier avoiding method and system
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN108388270A (en) * 2018-03-21 2018-08-10 天津大学 Cluster unmanned plane track posture cooperative control method towards security domain
US20180308371A1 (en) * 2017-04-19 2018-10-25 Beihang University Joint search method for uav multiobjective path planning in urban low altitude environment
CN109032168A (en) * 2018-05-07 2018-12-18 西安电子科技大学 A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN
US20190061147A1 (en) * 2016-04-27 2019-02-28 Neurala, Inc. Methods and Apparatus for Pruning Experience Memories for Deep Neural Network-Based Q-Learning
CN109443366A (en) * 2018-12-20 2019-03-08 北京航空航天大学 A kind of unmanned aerial vehicle group paths planning method based on improvement Q learning algorithm


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘庆杰 等: "面向智能避障场景的深度强化学习研究", 《智能物联技术》 *
宗群 等: "基于马尔可夫网络排队论的电梯交通建模及应用", 《天津大学学报》 *
王立群 等: "基于深度Q值网络的自动小车控制方法", 《电子测量技术》 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488859A (en) * 2019-07-15 2019-11-22 北京航空航天大学 A kind of Path Planning for UAV based on improvement Q-learning algorithm
CN110488859B (en) * 2019-07-15 2020-08-21 北京航空航天大学 Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110378439A (en) * 2019-08-09 2019-10-25 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110806756B (en) * 2019-09-10 2022-08-02 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110596734A (en) * 2019-09-17 2019-12-20 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110554707B (en) * 2019-10-17 2022-09-30 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110554707A (en) * 2019-10-17 2019-12-10 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110879610A (en) * 2019-10-24 2020-03-13 北京航空航天大学 Reinforcement learning method for autonomous optimal trajectory planning of a solar-powered unmanned aerial vehicle
WO2021088133A1 (en) * 2019-11-05 2021-05-14 上海为彪汽配制造有限公司 Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN110703766B (en) * 2019-11-07 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN112937564A (en) * 2019-11-27 2021-06-11 初速度(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN110968102A (en) * 2019-12-27 2020-04-07 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN110968102B (en) * 2019-12-27 2022-08-26 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111260658A (en) * 2020-01-10 2020-06-09 厦门大学 Novel deep reinforcement learning algorithm for image segmentation
CN111260658B (en) * 2020-01-10 2023-10-17 厦门大学 Deep reinforcement learning method for image segmentation
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111473794B (en) * 2020-04-01 2022-02-11 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
CN111667513B (en) * 2020-06-01 2022-02-18 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device
CN112947562A (en) * 2021-02-10 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN112947562B (en) * 2021-02-10 2021-11-30 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113110547A (en) * 2021-04-21 2021-07-13 吉林大学 Flight control method, device and equipment for a micro aerial vehicle
CN113298368A (en) * 2021-05-14 2021-08-24 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN113298368B (en) * 2021-05-14 2023-11-10 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN114371720A (en) * 2021-12-29 2022-04-19 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for unmanned aerial vehicle to track target
CN114371720B (en) * 2021-12-29 2023-09-29 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for enabling an unmanned aerial vehicle to track a target
CN114578834B (en) * 2022-05-09 2022-07-26 北京大学 Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN114578834A (en) * 2022-05-09 2022-06-03 北京大学 Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816B (en) * 2022-11-24 2023-03-14 东南大学 Bionic vision multi-source information intelligent perception unmanned platform
CN115574816A (en) * 2022-11-24 2023-01-06 东南大学 Bionic vision multi-source information intelligent perception unmanned platform

Also Published As

Publication number Publication date
CN109933086B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN109933086A (en) Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study
Xie et al. Unmanned aerial vehicle path planning algorithm based on deep reinforcement learning in large-scale and dynamic environments
CN114384920B (en) Dynamic obstacle avoidance method based on real-time construction of local grid map
Zhang et al. A novel real-time penetration path planning algorithm for stealth UAV in 3D complex dynamic environment
CN105549597B (en) Unmanned vehicle dynamic path planning method based on environmental uncertainty
CN109870162A (en) UAV flight path planning method based on a competitive deep learning network
CN111780777A (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN111399541B (en) Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN107886750B (en) Unmanned automobile control method and system based on beyond-visual-range cooperative cognition
Wang et al. A deep reinforcement learning approach to flocking and navigation of uavs in large-scale complex environments
CN109445456A (en) Multi-UAV cluster navigation method
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
Wei et al. Recurrent MADDPG for object detection and assignment in combat tasks
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN112114592B (en) Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112631134A (en) Intelligent trolley obstacle avoidance method based on fuzzy neural network
Makantasis et al. A deep reinforcement learning driving policy for autonomous road vehicles
Chen et al. Parallel motion planning: Learning a deep planning model against emergencies
Ke et al. Cooperative path planning for air–sea heterogeneous unmanned vehicles using search-and-tracking mission
CN116540784A (en) Unmanned system air-ground collaborative navigation and obstacle avoidance method based on vision
Zijian et al. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
Zhang et al. A bionic dynamic path planning algorithm of the micro UAV based on the fusion of deep neural network optimization/filtering and hawk-eye vision
Xie et al. Long and short term maneuver trajectory prediction of UCAV based on deep learning
Li et al. Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning
Fu et al. UAV mission path planning based on reinforcement learning in Dynamic Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant