CN109933086A - UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning - Google Patents


Info

Publication number
CN109933086A
Authority
CN
China
Prior art keywords
UAV
state
action
automatic obstacle avoidance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910195250.8A
Other languages
Chinese (zh)
Other versions
CN109933086B (en)
Inventor
田栢苓
刘丽红
崔婕
宗群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910195250.8A
Publication of CN109933086A
Application granted
Publication of CN109933086B
Legal status: Active
Anticipated expiration: pending


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours

Abstract

The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance. To reduce resource loss and cost, and to meet the real-time, robustness, and safety requirements of UAV automatic obstacle avoidance, the technical solution adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning. First, radar is used to detect the path within a certain distance in front of the UAV, and the distances between the UAV, the obstacles, and the target point are obtained as the UAV's current state. Second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV. Finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance. The present invention is mainly applied to UAV environment sensing and automatic obstacle avoidance control.

Description

UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning
Technical field
The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance, and in particular to intelligent UAV path planning. More particularly, it relates to a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning.
Background technique
In recent years, unmanned aerial vehicles (UAVs) have gradually entered the public eye, producing remarkable results in commercial, agricultural, entertainment, and even military fields. Over the past decade, the number of UAVs in China has grown from nothing into a flourishing industry. Data show that by the end of 2018, consumer spending on civilian UAVs in China alone had approached 10 billion yuan, and consumption continues to rise rapidly. The prosperity of the UAV market places higher demands on the safety and development of UAV control technology. At this stage, China has not yet formed complete airspace management rules for UAVs, and "black flying" (unauthorized flight) is common across application fields, easily creating safety hazards during flight and causing unnecessary property loss and casualties. UAV sensing and obstacle avoidance technology has therefore become a topic of common concern for scholars at home and abroad. A UAV collision refers to the situation in which, during flight, the distance between the UAV and buildings, mountains, birds, or other aircraft along its path falls below a safety threshold, or an actual collision occurs. Unlike manned aircraft, a UAV cannot rely on a pilot to change its speed and heading during flight in order to avoid obstacles. Sensing and obstacle-avoidance devices are thus essential components of an unmanned system. At present, UAV sensing and automatic obstacle avoidance technologies mainly include the following types:
1. Vision-based obstacle avoidance: this technology mainly uses images of the environment along the forward path acquired by the UAV during flight, predicts potential collisions using image-processing techniques, and performs real-time path planning to achieve safe flight. The scheme depends on mature image sensing and processing technology and is vulnerable to environmental factors such as weather and haze.
2. Obstacle avoidance based on object detection: this technology covers a wide range of approaches, mainly using sensing devices installed on the UAV, such as radar, ultrasound, and infrared, to measure the distance between the UAV and obstacles, and correcting the UAV's path on this basis to achieve obstacle avoidance. Its disadvantages are that distance-detection techniques such as ultrasound place excessive requirements on the reflecting surface of the object and are vulnerable to environmental influences.
3. Obstacle avoidance based on electronic maps: this technology mainly uses the electronic map built into the UAV together with its GPS positioning to accurately determine the UAV's location and select a path. Its defect is that it cannot handle emergencies such as unknown maps or moving obstacles in the airspace, so its robustness is poor.
4. Obstacle avoidance based on the artificial potential field method: this technology is mainly applied at the path-planning level. Following the principle that like charges repel and unlike charges attract in an electric field, suitable charge attributes are assigned to the UAV, the obstacles, and the target point, so that the UAV can finally avoid obstacles and reach the specified target point.
5. Automatic obstacle avoidance based on genetic algorithms, neural networks, fuzzy control, etc.: this technology is also mainly applied at the path-planning level. A nonlinear optimization model or a fuzzy controller is designed around information such as the detected distances to control the UAV's flight speed and heading.
From the research status of UAV environment sensing and automatic obstacle avoidance above, it can be seen that the vast majority of current UAV obstacle-avoidance technologies separate sensing from path planning: sensing and path planning are treated as two modules in the system, and obstacle avoidance is realized through data transfer between them. The defects of this scheme are: 1) data transfer between the two modules may be delayed, introducing lag into the "safe" path computed by the planning algorithm and affecting the UAV's safe navigation; 2) the transferred data may be distorted or lost, depriving the path-planning module of reliable data support so that it cannot react to obstacles in time; 3) most path-planning algorithms easily fall into local optima and struggle to solve path planning in complex flight environments efficiently; 4) distance-sensing technology is vulnerable to environmental factors such as weather, and cannot perform accurate obstacle-distance detection in bad conditions or under countermeasures and interference. In short, traditional UAV automatic obstacle-avoidance schemes mostly chain sensing to path planning, which requires both technologies to be mature and data to be transferred efficiently between them; under external interference or uncertainty, the algorithm may fail, and robustness is poor.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to propose a quadrotor UAV environment sensing and automatic obstacle avoidance method based on the deep Q-learning algorithm. On the one hand, existing UAV obstacle-avoidance path-planning schemes easily fall into local optima, causing unnecessary resource loss and cost while the UAV executes its task; on the other hand, the UAV's operating environment is changeable and complex, and the various uncertainties in flight place higher requirements on the real-time performance, robustness, and safety of automatic obstacle avoidance. For this reason, the technical scheme adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning: first, radar is used to detect the path within a certain distance in front of the UAV, and the distances between the UAV, the obstacles, and the target point are obtained as the UAV's current state; second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
Specifically, through the UAV's sensing of the environment, the distances to the destination and to the obstacles are obtained as the state information of the deep Q-learning algorithm;
The neural-network fitting module is responsible for computing Q values: using the approximation capability of a neural network, it fits the Q values of all possible state-action pairs for a given state;
The action-selection module is responsible for selecting the action the UAV executes: using a greedy algorithm, with probability ε it selects the optimal action (the action whose Q value is maximal) and with probability 1-ε it selects an action at random; after receiving the action message, the UAV executes the corresponding action and reaches a new position;
Through the cycle of state acquisition, Q-value fitting, action selection, action execution, and new state acquisition, the UAV gradually reaches the specified destination.
The detailed steps are as follows:
In the first step, the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm is established. According to the UAV's autonomous obstacle-avoidance decision process, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled:
(1) State set s. The position coordinates (x, y) and heading angle θ of the UAV in the flight scene define the UAV's position, and (x_g, y_g) denotes the destination of the flight task; the distance from the UAV to the destination is then defined as
Δx = x - x_g, Δy = y - y_g   (1)
To detect the environment along the UAV's forward path, 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel; the detection distance of each line is defined as follows:
where i = 1, …, 16 and j = 1, …, n, (obs_x_j, obs_y_j) denotes the coordinates of the n obstacles, and detected indicates that one of the UAV's radar detection lines has detected an obstacle. For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar line is normalized to norm_dis_i, as follows:
The state of the UAV is finally determined as
s = [Δx, Δy, θ, norm_dis_i]   (4)
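The 19-dimensional state above (Δx, Δy, θ, and the 16 normalized ray distances) can be sketched as follows. The helper name and the capping of undetected rays at the full 4 m line length are assumptions for illustration; the original equation images are not reproduced here.

```python
import math

NUM_RAYS = 16      # radar detection lines, every 5 degrees from -45 to 45 degrees
RAY_LENGTH = 4.0   # each detection line is 4 m long

def build_state(x, y, theta, goal, ray_distances):
    """Assemble the UAV state s = [dx, dy, theta, norm_dis_1..16].

    Hypothetical helper: ray_distances[i] is the raw distance (metres)
    returned by detection line i, capped at RAY_LENGTH when nothing is hit.
    """
    gx, gy = goal
    dx, dy = x - gx, y - gy                                          # eq. (1)
    norm = [min(d, RAY_LENGTH) / RAY_LENGTH for d in ray_distances]  # scale to [0, 1]
    return [dx, dy, theta] + norm

# Example: UAV at (1, 2) heading 45 degrees, goal at (5, 6), no obstacles detected
s = build_state(1.0, 2.0, math.pi / 4, (5.0, 6.0), [4.0] * NUM_RAYS)
```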
(2) Action set a. The action set is the set of all actions the UAV may take at its current position after receiving the feedback value from the external environment. In the UAV environment sensing and automatic obstacle avoidance algorithm, the UAV is given a movement speed v, and the selectable action set is defined as follows:
That is, the UAV always flies forward at speed v and, by selecting different actions, changes its heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning;
(3) Immediate reward function r. The immediate reward function gives the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward for that state-action pair. Δdis measures, at time t, the distance the UAV has advanced toward the target point relative to the previous moment t-1:
Δθ measures the difference between the UAV's current heading angle and the direction from the UAV to the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, pointing straight ahead along the UAV's course, has detected an obstacle and the distance to that obstacle:
In summary, the immediate reward function is defined as follows:
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point;
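A minimal sketch of this reward, built from the Δdis, Δθ, and (norm_dis_8 - 1) terms defined above. The weights and the terminal values for collision and arrival are illustrative assumptions, since the exact coefficients in the original equation image are not reproduced here.

```python
def reward(delta_dis, delta_theta, norm_dis_8, hit=False, at_target=False):
    """Sketch of the immediate reward r; coefficients are illustrative."""
    if hit:          # collision with an obstacle: large penalty (assumed value)
        return -100.0
    if at_target:    # destination reached: large bonus (assumed value)
        return 100.0
    # Shaped reward: progress toward the goal, heading alignment, and a
    # penalty growing as the forward ray (no. 8) nears an obstacle.
    return 1.0 * delta_dis - 0.5 * abs(delta_theta) + 1.0 * (norm_dis_8 - 1.0)
```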
(4) State transition probability function p. The state transition probability function describes the probability that the quadrotor UAV in the flight scene transfers to a given state at the next moment after selecting an action in the current state;
(5) Discount factor γ. The discount factor describes, in the UAV's automatic obstacle-avoidance decision process, how much the current flight decision "cares about" future immediate rewards;
In the second step, according to the modeled Markov decision process, the deep Q-learning algorithm is selected and its flow is determined, in order to find the optimal solution for UAV environment sensing and automatic obstacle avoidance;
In the third step, a complex flight scene for the UAV environment sensing and automatic obstacle avoidance algorithm is designed, including building the UAV model and designing the UAV's model for sensing the surrounding environment; steps one and two are then applied to UAV control to realize environment sensing and automatic obstacle avoidance.
The deep Q-learning algorithm flow is as follows: first, the UAV state and the neural-network parameters are randomly initialized; second, according to the multiple Q values the neural network fits for the current state, the action with the maximum Q value is taken with probability ε (0 < ε < 1) and a random action with probability 1-ε; after the action is executed, a feedback value is obtained, a new state is reached, and the experience fragment "current state-action-feedback value-next state" is stored in the experience pool; finally, this process is repeated until the UAV reaches the destination, with the neural network trained every certain number of steps along the way;
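The acting loop described above, ε-greedy selection plus an experience pool, can be sketched as follows. The capacity of 500 and ε = 0.9 come from the simulation parameters given later; the function names are assumptions.

```python
import random
from collections import deque

MEMORY_CAPACITY = 500   # experience-pool storage capacity used in the simulation
EPSILON = 0.9           # greedy action taken with probability ε, as described

memory = deque(maxlen=MEMORY_CAPACITY)  # old fragments are evicted automatically

def select_action(q_values, epsilon=EPSILON, rng=random):
    """ε-greedy selection: with probability ε take the Q-maximising action,
    otherwise act at random (illustrative helper)."""
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))

def store(state, action, reward, next_state):
    """Append one 'current state-action-feedback value-next state' fragment."""
    memory.append((state, action, reward, next_state))
```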
The neural-network training process is as follows: first, the network randomly draws experience fragments from the experience pool and, for the next-moment state in each, selects the action that maximizes its Q value; second, the squared difference between the sum of the feedback value and the maximum Q value of the next state, on the one hand, and the current-state Q value, on the other, is computed as the network's back-propagated error; finally, to minimize this error, the network parameters are adjusted by gradient descent.
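The training step just described can be sketched as a temporal-difference error plus one gradient-descent update. A linear Q approximation stands in for the neural network here purely for brevity; γ = 0.9 and the learning rate 0.01 come from the simulation parameters given later.

```python
GAMMA = 0.9          # discount factor used in the simulation
LEARNING_RATE = 0.01 # gradient-descent learning rate used in the simulation

def td_error(q_current, reward, q_next_max, gamma=GAMMA):
    """Target minus prediction for one replayed transition; the network is
    trained to minimise the square of this quantity."""
    return reward + gamma * q_next_max - q_current

def sgd_step(weights, features, q_target, lr=LEARNING_RATE):
    """One gradient-descent step for a linear Q(s,a) = w.x approximation,
    standing in for the patent's neural network (assumption for brevity)."""
    q_pred = sum(w * f for w, f in zip(weights, features))
    err = q_pred - q_target                       # derivative of 0.5*err^2
    return [w - lr * err * f for w, f in zip(weights, features)]
```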
Features and beneficial effects of the present invention:
To verify the effectiveness of the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance method, a virtual UAV obstacle-avoidance simulation system was designed and simulation experiments were carried out on it. In the virtual simulation environment, the following simulation parameters are set:
(1) UAV flight scene: a square flight range of side l = 20 m, as shown in Figure 6, in which the total area of all obstacles accounts for a fraction d = 0.01 of the square flight range; obstacle radii are generated at random, satisfying 0.1 m ≤ radius ≤ 0.3 m. To increase the complexity of the flight environment, moving obstacles make up a proportion r = 0.2 of all obstacles, with speeds v_obs generated at random such that -3.0 m/s ≤ v_obs ≤ 3.0 m/s; the refresh frequency of the flight scene is 30 Hz.
(2) Neural network parameters: the learning rate of the gradient-descent optimizer is 0.01. The training model, shown in Figure 3, consists of an input layer of 19 neurons, a hidden layer of 10 neurons, and an output layer of 3 neurons; the input and hidden layers use the rectified linear unit (ReLU) as the activation function.
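For illustration, this 19-10-3 network can be sketched as a plain NumPy forward pass. The weights below are random placeholders, not trained values, and applying ReLU only at the hidden layer is one interpretation of the description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 19 inputs (the state vector), 10 hidden units,
# 3 outputs (one Q value per selectable action).
W1 = rng.standard_normal((19, 10)) * 0.1
b1 = np.zeros(10)
W2 = rng.standard_normal((10, 3)) * 0.1
b2 = np.zeros(3)

def q_network(state):
    """Forward pass of the fitting network (placeholder weights)."""
    h = np.maximum(0.0, state @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2                     # linear output: Q(s, a) for 3 actions

q = q_network(np.zeros(19))
```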
(3) Deep Q-learning: exploration rate ε = 0.9, discount factor γ = 0.9; the memory pool of deep Q-learning has a storage capacity of 500 and is updated every 300 runs.
(4) Radar detector: 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel.
(5) UAV model: the UAV's flight speed is v = 2.5 m/s; the image-rendering data come from the 3D printing model 3DBuilder, part of which is shown in Table 2.
The proposed UAV environment sensing and automatic obstacle avoidance method is built on the deep Q-learning algorithm. Thanks to the fitting capability of deep learning and the decision-making capability of reinforcement learning, the method retains good robustness even when the flight scene is extremely complex. To further demonstrate its effectiveness, simulations were carried out in flight scenes where the obstacle positions, radii, and speeds, the UAV's initial position, and the target position were all set at random.
The UAV automatic obstacle-avoidance flow is shown in Figure 4. In each flight round, the UAV must fly toward the target point; upon arrival, the position of the target point is updated and the UAV continues tracking. When the UAV collides with an obstacle, the positions of the UAV and the target point are updated simultaneously. To improve efficiency, if within a round the UAV neither reaches the target point nor collides with an obstacle for a long period, the positions of the UAV and the target point are likewise updated.
The simulation results are shown in Figure 5. In each flight round, the loss function converges from high to low; after the neural-network training value converges, the UAV's speed increases and it reaches the target point quickly. After the UAV reaches the end point, the target point is updated immediately, so the loss function jumps up again until the network converges once more and the new end point is reached, and so on in cycles.
The evolution of the UAV's obstacle avoidance is shown in Figure 6; the upper and lower groups of images show the UAV arriving safely at the end point in fairly complex environments. The results show that the automatic obstacle avoidance algorithm can complete obstacle-avoiding flight from the starting point to the target point in complex flight scenes.
Table 2: UAV model 3D printing data (part)
In the designed complex flight scene, the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance algorithm was tested under different obstacle distributions. Below, combining the test results, the control performance is analyzed from several angles to further establish the effectiveness of this guidance algorithm.
(1) Robustness analysis: the proposed method of setting up radar detection lines in the angular region from -45 to 45 degrees in front of the UAV's course excludes the influence of factors such as weather, effectively detects obstacles and flight boundaries ahead of the UAV, and provides reliable information for automatic obstacle avoidance. Meanwhile, for different flight states the deep Q-learning algorithm makes the optimal decision according to the Q values and issues avoidance commands to the UAV. In summary, during obstacle-avoiding flight, the method is strongly robust to factors such as different flight scenes and weather.
(2) Real-time analysis: the proposed algorithm takes the forward-path information detected by radar as the basis for decisions and, through the deep neural network and Q-learning processing, directly generates the optimal obstacle-avoidance command for the UAV. Compared with traditional methods, it avoids the integration and transfer of data between separate environment-sensing and obstacle-avoidance modules, significantly improving the real-time performance of the algorithm.
(3) Safety analysis: as can be seen in Figure 6, the proposed algorithm accurately and effectively identifies the obstacles in the flight scene and makes optimal action decisions, preventing the UAV from colliding with obstacles or the boundaries of the flight range and ensuring flight safety in complex scenes.
In conclusion, the proposed deep-learning-based UAV environment sensing and automatic obstacle avoidance algorithm has high applicability to the obstacle-avoidance problem of UAVs in complex flight scenes.
Description of the drawings:
Figure 1: quadrotor UAV environment sensing and automatic obstacle avoidance system structure diagram.
Figure 2: UAV environment sensing and automatic obstacle avoidance algorithm design block diagram.
Figure 3: neural-network training model schematic.
Figure 4: UAV automatic obstacle avoidance flow chart.
Figure 5: neural-network loss function curve.
Figure 6: environment sensing and automatic obstacle avoidance simulation process schematic.
Specific embodiment
To overcome the poor robustness of traditional UAV automatic obstacle-avoidance algorithms, the present invention draws on the deep reinforcement learning algorithms that currently attract wide attention in the artificial-intelligence field, establishes, through a deep reinforcement learning network, a mapping between the UAV's sensed distances to obstacles and its avoidance strategy, and proposes a quadrotor UAV sensing and obstacle-avoidance method based on the deep Q-learning algorithm. The method uses the radar detector in front of the UAV to detect the flight environment within a certain range ahead, avoiding the influence of factors such as weather and distance to the greatest extent and improving the robustness of the algorithm. At the same time, by taking the detection information as raw data, the deep Q-learning network can directly generate the UAV's avoidance strategy, significantly improving the real-time performance of obstacle avoidance. Moreover, during training the deep-Q-learning-based avoidance strategy effectively fits the Q value of each state-action pair, so the strategy generated by the greedy algorithm effectively guarantees flight safety. Applying this deep-Q-learning-based sensing and avoidance strategy to UAV path planning in complex environments has both important theoretical significance and high strategic value for research on UAV automatic obstacle avoidance.
In view of the shortcomings of traditional UAV obstacle-avoidance schemes based on separate environment sensing and path planning, the present invention proposes a UAV automatic obstacle avoidance method based on deep Q-learning: first, radar is used to detect the path within a certain distance in front of the UAV, and the distances to the obstacles and the target point are obtained as the UAV's current state; second, during training, a neural network is used to fit the Q value corresponding to each state-action pair; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
It follows that the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance method is a closed-loop intelligent real-time control scheme that is safe and fast; it solves the automatic obstacle-avoidance problem of quadrotor UAVs in complex scenes with strong robustness; its effectiveness and reliability help improve the UAV's autonomous decision-making ability during task execution, and it can be applied in many civil and military fields. The intelligent path-planning scheme can be applied to the automatic obstacle avoidance of real UAVs, generating action commands online and realizing safe obstacle-avoiding flight.
Taking control-theoretic methods integrated with virtual simulation technology as the main research means, the present invention devises a quadrotor UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning; simulation experiments carried out in a Python 2.7 environment verify the effectiveness of the method.
In the first step, the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm is established. According to the UAV's autonomous obstacle-avoidance decision process, the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP) is modeled.
(1) State set s. The position coordinates (x, y) and heading angle θ of the UAV in the flight scene define the UAV's position, and (x_g, y_g) denotes the destination of the flight task; the distance from the UAV to the destination is then defined as
Δx = x - x_g, Δy = y - y_g   (1)
To detect the environment along the UAV's forward path, 16 radar detection lines of length 4 m are set up every 5 degrees between -45 and 45 degrees in front of the UAV's direction of travel; the detection distance of each line is defined as follows:
where (obs_x_j, obs_y_j) (j = 1, …, n) denotes the coordinates of the n obstacles, and detected indicates that one of the UAV's radar detection lines has detected an obstacle (as shown in module 1 of Figure 2). For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar line is normalized to norm_dis_i (i = 1, …, 16), as follows:
The state of the UAV is finally determined as
s = [Δx, Δy, θ, norm_dis_i]   (4)
(2) Action set a. The action set is the set of all actions the UAV may take at its current position after receiving the feedback value from the external environment. In the UAV environment sensing and automatic obstacle avoidance algorithm, the UAV is given a movement speed v, and the selectable action set is defined as follows:
That is, the UAV always flies forward at speed v and, by selecting different actions, changes its heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning.
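The kinematics described here, a fixed forward speed with actions that only change the heading, can be sketched as one simulation tick. Tying the time step to the 30 Hz scene refresh rate is an assumption for illustration.

```python
import math

V = 2.5          # forward flight speed from the simulation parameters (m/s)
DT = 1.0 / 30.0  # one tick at the 30 Hz scene refresh rate (assumption)

def step_pose(x, y, theta, dtheta, v=V, dt=DT):
    """Advance the UAV one tick: the chosen action only turns the heading
    by dtheta, while the forward speed stays fixed."""
    theta = theta + dtheta
    x += v * math.cos(theta) * dt   # velocity component in the x direction
    y += v * math.sin(theta) * dt   # velocity component in the y direction
    return x, y, theta

# Example: one tick flying straight ahead from the origin
x, y, th = step_pose(0.0, 0.0, 0.0, 0.0)
```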
(3) Immediate reward function r. The immediate reward function gives the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward for that state-action pair. Δdis measures, at time t, the distance the UAV has advanced toward the target point relative to the previous moment t-1:
Δθ measures the difference between the UAV's current heading angle and the direction from the UAV to the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, pointing straight ahead along the UAV's course, has detected an obstacle and the distance to that obstacle:
In summary, the immediate reward function is defined as follows:
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point.
(4) state transition probability function p.In this project, state transition probability function is to describe quadrotor drone In flying scene, subsequent time shape probability of state is transferred to by a certain movement of current time state selection.
Flight environment of vehicle is complicated in this project, therefore is modeled as the unknown markoff process of state transition probability p.By force Aiming at the problem that changing learning areas and be divided into based on environmental model for whether state transition probability is known with environmental model is not based on, each There is effective solution algorithm in the case of.The one kind of depth Q learning algorithm as nitrification enhancement, can be unknown in p In the case where effectively solve the problems, such as to be not based on environmental model.
(5) Discount factor γ. The discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards.
Second step: according to the modelled Markov decision process, select the deep Q-learning algorithm and determine the algorithm flow, to find the optimal solution for UAV environment perception and autonomous obstacle avoidance. The algorithm flow is shown in Table 1:
Table 1: UAV environment-perception and autonomous obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialize the UAV state and the neural network parameters. Second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1), and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way.
The neural network training process is as follows. First, the network randomly samples experience tuples from the replay buffer and, according to the next state in each tuple, selects the action that maximizes its Q value. Second, it computes the squared difference between the sum of the current feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error. Finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
Third step: set up the environment for UAV environment perception and autonomous obstacle avoidance. During environment perception and obstacle avoidance, the UAV, as the agent, needs to interact continuously with the obstacle-laden surroundings to obtain enough data and collect enough information as the basis for decision-making. Meanwhile, the UAV is the controlled object, so a model of the UAV is an indispensable part of simulation verification.
The UAV flight environment is assumed to be a square region in which cylinders of varying sizes are distributed as obstacles, while a green marker indicates the destination of the flight. The quadrotor UAV model is obtained from 3D-printing data; importing the 3D-printing data into the environment's Director reproduces the quadrotor model.
Based on the above three steps, the UAV can, in a complex motion scene, perform obstacle detection with its own radar detection apparatus, achieve autonomous obstacle avoidance, and reach the destination.
The quadrotor UAV environment-perception and autonomous obstacle-avoidance system architecture is shown in Figure 1. By acquiring state information such as the obstacles and the target point in the flight environment, and selecting the optimal action for the current state, the quadrotor UAV can be controlled to meet the goal of reaching the destination. Q-value fitting is the core link of the algorithm: only through accurate fitting of the Q values can a suitable action be selected for the UAV and the flight mission completed. Without the Q-value fitting part, the UAV cannot obtain flight instructions and cannot complete the flight mission in a complex environment.
Figure 2 shows the design block diagram of the proposed UAV environment-perception and autonomous obstacle-avoidance algorithm. The state detection module is responsible for acquiring information: through the UAV's perception of the environment, it obtains the distances to the destination and to obstacles as the state information for the deep Q-learning algorithm. The neural network fitting module is responsible for computing Q values: using the approximation capability of the neural network, it fits the Q values of all possible state-action pairs for a given state. The action selection module is responsible for selecting the action the UAV executes: among the multiple Q values for the current state, using a greedy algorithm, it selects the action with the maximum Q value with probability ε (0 < ε < 1), and a random action with probability 1-ε. The action execution module is responsible for executing the specific action: after receiving the action information, the UAV executes the corresponding action and reaches a new position. Cycling through state acquisition - Q-value fitting - action selection - action execution - new state acquisition, the UAV gradually reaches the designated destination.
First step: Markov process modelling of the UAV environment-perception and autonomous obstacle-avoidance algorithm. According to the UAV's autonomous obstacle-avoidance action decision process, the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP) is modelled.
(1) State set s. The state set refers to the state quantities that can determine and represent the UAV's current flight information.
Define the UAV's current position (x, y) and heading angle θ in the flight scene to represent the UAV's exact position, and (xg, yg) as the destination of the flight mission; then the UAV's distance from the destination is defined as follows:
Δx = x - xg, Δy = y - yg (10)
To detect the environment on the UAV's forward path, a radar detection line of length 4 m is set up every 5 degrees between -45 and 45 degrees ahead of the UAV's direction of travel, 16 lines in total. The detection distance of each radar detection line is defined as follows:
where (obs_xj, obs_yj), j = 1, ..., n, denotes the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the UAV has detected an obstacle (as shown in module 1 of Figure 2). Meanwhile, for ease of data processing, the distance dis_i, i = 1, ..., 16, detected by each radar line is normalized to norm_dis_i, i = 1, ..., 16, as follows:
Finally, the state of the UAV is determined as
s = [Δx, Δy, θ, norm_dis_i] (13)
This state information both represents the distance between the UAV's current flight position and the destination, and represents the distance between the UAV and the obstacles present in the flight scene, so that the UAV can decide whether an avoidance manoeuvre is needed.
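As an illustrative sketch (not from the patent itself), the state vector of equation (13) could be assembled as follows. The helper name `build_state` is hypothetical, and normalizing by the 4 m line length is an assumption based on the text above.

```python
RAY_LEN = 4.0          # length of each radar detection line (4 m, per the text)
NUM_RAYS = 16          # number of detection lines

def build_state(x, y, theta, xg, yg, ray_dists):
    """Assemble the state s = [dx, dy, theta, norm_dis_1..16] of eq. (13).

    ray_dists: raw distances dis_i returned by the 16 radar lines;
    a line that detects nothing is assumed to return RAY_LEN.
    """
    dx, dy = x - xg, y - yg                     # eq. (10)
    norm = [d / RAY_LEN for d in ray_dists]     # assumed normalization to [0, 1]
    return [dx, dy, theta] + norm

# example: no obstacles detected, all 16 rays at full length
s = build_state(1.0, 2.0, 0.0, 4.0, 6.0, [4.0] * NUM_RAYS)
```

The resulting vector has 3 + 16 = 19 components, matching the state definition above.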
(2) Action set a. The action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value from the external environment.
In the UAV environment-perception and autonomous obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
That is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning. In other words, until it reaches the destination the UAV always moves along its trajectory at speed v under the action of the heading angle θ; as the heading changes, the UAV's trajectory changes with it, until the destination is reached.
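The action-set formula itself is not reproduced in this text. As a hedged sketch, assume a small set of discrete heading-angle increments; the increment values, the time step, and the helper name `step` below are all illustrative assumptions, not the patent's definitions.

```python
import math

V = 1.0                                # given forward speed v
DT = 0.1                               # assumed integration time step
ACTIONS = [-0.2, -0.1, 0.0, 0.1, 0.2]  # assumed heading increments (rad)

def step(x, y, theta, action_idx):
    """Apply one action: change the heading angle, then advance at speed v."""
    theta = theta + ACTIONS[action_idx]
    x += V * math.cos(theta) * DT      # velocity component along x
    y += V * math.sin(theta) * DT      # velocity component along y
    return x, y, theta

x, y, th = step(0.0, 0.0, 0.0, 2)      # the middle action keeps the heading
```

This matches the text's description: the speed v is fixed, and only the heading angle θ (hence the x and y velocity components) is changed by the chosen action.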
(3) Immediate reward function r. The immediate reward function is the instantaneous feedback the UAV obtains after selecting an action in a given state; it represents the reward assigned to a state-action pair.
The state-action pairs during UAV flight mainly fall into three cases: reaching the target point, striking an obstacle, and safe flight. An immediate reward must be designed reasonably for each case. The reach-target and hit-obstacle scenarios are simple: their immediate rewards are defined as a reward value of 15 and a penalty value of -20, respectively. The safe-flight state is more complex and must jointly consider the distance the UAV has travelled compared with the previous moment, the angle difference towards the target point, and the distance to obstacles.
Δdis is defined to measure, at time t, the distance the current state has advanced towards the target point compared with the previous state:
Δθ measures the difference between the current UAV heading angle and the angle from the UAV towards the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line ahead of the UAV's heading has detected an obstacle, and the distance to that obstacle.
In summary, the immediate reward function is defined as follows
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point.
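The three-case structure above can be sketched as follows. The terminal values +15 and -20 come from the text; the safe-flight shaping weights w1..w3 are assumptions (the patent's exact formula is not reproduced here), as is the sign convention that progress and clearance are rewarded while heading error is penalized.

```python
def reward(hit, at_target, d_dis, d_theta, norm_dis8,
           w1=1.0, w2=1.0, w3=1.0):
    """Immediate reward: +15 on reaching the target, -20 on collision
    (values from the text); the shaping weights w1..w3 are assumed."""
    if at_target:
        return 15.0
    if hit:
        return -20.0
    # safe flight: progress toward target, heading alignment, obstacle clearance
    return w1 * d_dis - w2 * d_theta + w3 * (norm_dis8 - 1.0)
```

Note that (norm_dis8 - 1) is non-positive, so with these assumed signs an obstacle straight ahead can only reduce the safe-flight reward.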
(4) State transition probability function p. In this work, the state transition probability function describes the probability that the quadrotor UAV in the flight scene, after selecting an action in the current state, transfers to a given next state.
The flight environment in this work is complex, so it is modelled as a Markov process with unknown state transition probability p. The field of reinforcement learning divides problems into model-based (state transition probability known) and model-free (unknown) cases, each of which has effective solution algorithms. Deep Q-learning, as one class of reinforcement learning algorithm, can effectively solve the model-free problem when p is unknown.
(5) Discount factor γ. The discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards.
During UAV environment-perception and autonomous obstacle-avoidance flight, to enable intelligent avoidance, the UAV must maximize, from its current state, the cumulative return value up to the future terminal state.
When the cumulative reward is maximal, the UAV can find the optimal path. Here γ represents the degree of attention the UAV, in the current state st at time t, pays to future returns: γ = 1 means the UAV is fully "far-sighted", weighing current and future immediate rewards equally; γ = 0 means the UAV is "short-sighted", valuing only the current immediate reward and ignoring future influence.
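The cumulative return formula is not reproduced in this text; the standard discounted-return definition consistent with the description of γ above would be (this reconstruction follows the usual MDP convention and is an assumption):

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
    = \sum_{k=0}^{T-t-1} \gamma^k \, r_{t+k+1}
```

With γ = 1 all terms are weighted equally ("far-sighted"); with γ = 0 only the immediate reward r_{t+1} survives ("short-sighted"), matching the description above.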
Second step: construction of the deep Q-learning algorithm for UAV environment perception and autonomous obstacle avoidance. To enable the neural network to accurately fit the Q value of each state-action pair, the network is trained with the deep Q-learning algorithm; the purpose is to adjust the weights and biases in each network layer using gradient descent.
Meanwhile during neural network is fitted Q value, the flight under each state is selected using depth Q learning algorithm Instruction.In the selection course of flare maneuver, in order to avoid algorithm falls into locally optimal solution, need to consider unmanned plane in flight field Relationship in scape between " utilization " and " exploration ".Using greedy algorithm, unmanned plane is utilized with the probability (0 < ε < 1) of ε and has been collected The data of obtained flying scene explore flying scene with the probability of 1- ε.
Finally, the deep Q-learning algorithm for UAV environment perception and autonomous obstacle avoidance is shown in Table 2.
Table 2: UAV environment-perception and autonomous obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialize the UAV state and the neural network parameters. Second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1), and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way.
The neural network training process is as follows. First, the network randomly samples experience tuples from the replay buffer and, according to the next state in each tuple, selects the action that maximizes its Q value. Second, it computes the squared difference between the sum of the feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error. Finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
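The target computation at the heart of this training step can be sketched as follows. The inclusion of the discount factor γ and the no-bootstrap handling of terminal states are assumptions following the MDP defined in the first step, and `q_next_fn` is a hypothetical stand-in for the network's forward pass.

```python
def td_targets(batch, q_next_fn, gamma=0.9):
    """Compute training targets for sampled experience tuples
    (state, action, reward, next_state, done):
    target = r + gamma * max_a' Q(s', a'), or just r at terminal states.
    The squared gap between each target and the current Q(s, a) is the
    error minimized by gradient descent, as described above."""
    targets = []
    for s, a, r, s_next, done in batch:
        if done:
            targets.append(r)
        else:
            targets.append(r + gamma * max(q_next_fn(s_next)))
    return targets

# toy check with a constant Q-function returning [2.0, 5.0] for any state
batch = [((0,), 0, 1.0, (1,), False), ((1,), 1, -20.0, (2,), True)]
t = td_targets(batch, lambda s: [2.0, 5.0], gamma=0.5)
# t == [1.0 + 0.5 * 5.0, -20.0] == [3.5, -20.0]
```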
Third step: design of the complex flight scene for the UAV environment-perception and autonomous obstacle-avoidance algorithm. A complex flight scene is built to experimentally verify the effectiveness of the autonomous obstacle-avoidance algorithm. During perception and avoidance, the UAV needs to interact continuously with the flight scene and collect as much data as possible as the basis for its decisions, so that the neural network can be fully trained and the most correct decisions can be made during avoidance. Meanwhile, the UAV is the controlled object, so a model of the UAV is also an indispensable part of simulation verification.
The hypothetical flight scene is a square flight range within whose boundary cylinders of varying sizes are distributed as obstacles. To increase the complexity of the scene, the destination of the flight is generated randomly in each episode. Meanwhile, the positions and radii of all obstacles within the boundary, and the movement speeds of the obstacles, are generated randomly. The algorithm for setting up the obstacles in the flight scene is shown in Table 3.
Table 3: UAV flight-scene setup algorithm
The algorithm flow is as follows. First, determine the total area of obstacles in the flight environment and the proportion of that total area accounted for by moving obstacles. Second, randomly generate each obstacle's radius and position (within the allowed range) and, with the moving-obstacle area proportion as the probability, set its movement speed either to 0 or to a random value (within the allowed range). Finally, draw the obstacles in the flight environment according to their radii, positions and movement speeds, until the accumulated area reaches the total obstacle area.
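The Table 3 flow can be sketched as a loop that keeps generating random cylinders until their accumulated area reaches the target. All numeric ranges (boundary size, radius limits, maximum speed) and the function name are illustrative assumptions, not values from the patent.

```python
import math
import random

def generate_obstacles(area_total, moving_frac, bound=50.0,
                       r_min=0.5, r_max=3.0, v_max=1.0):
    """Generate random cylinders (x, y, radius, speed) until their
    accumulated area reaches area_total; each cylinder is a moving
    obstacle with probability moving_frac, otherwise static."""
    obstacles, area = [], 0.0
    while area < area_total:
        r = random.uniform(r_min, r_max)
        x = random.uniform(r, bound - r)          # keep cylinder inside bounds
        y = random.uniform(r, bound - r)
        if random.random() < moving_frac:
            v = random.uniform(0.0, v_max)        # moving obstacle
        else:
            v = 0.0                               # static obstacle
        obstacles.append((x, y, r, v))
        area += math.pi * r * r
    return obstacles

obs = generate_obstacles(area_total=100.0, moving_frac=0.3)
```

Drawing each cylinder at its generated position and speed then reproduces the randomized flight scene described above.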
Meanwhile the model of quadrotor drone is obtained by 3D printing data in flying scene, and 3D printing data are inputted Into open source environment Director, the flying scene of quadrotor drone can be reproduced.
Based on the above three steps, the UAV can, in a complex flight scene, perform obstacle detection with its own radar detection apparatus, achieve autonomous obstacle avoidance, and reach the destination.

Claims (4)

1. A UAV environment-perception and autonomous obstacle-avoidance method based on deep Q-learning, characterized in that: first, radar detects the path within a certain distance ahead of the UAV, and the distances to obstacles and to the target point are obtained as the UAV's current state; second, during training, a neural network fits the deep-learning Q value corresponding to each state-action pair of the UAV; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing the UAV's autonomous obstacle avoidance.
2. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that: specifically, through the UAV's perception of the environment, the distances to the destination and to obstacles are obtained as the state information for the deep Q-learning algorithm;
a neural network fitting module is responsible for computing Q values: using the approximation capability of the neural network, it fits the Q values of all possible state-action pairs for a given state;
an action selection module is responsible for selecting the action the UAV executes: using a greedy algorithm, it selects the optimal action for the UAV with probability ε, the optimal action being the one with the maximum Q value, and selects a random action with probability 1-ε; after receiving the action information, the UAV executes the corresponding action and reaches a new position;
cycling through state acquisition - Q-value fitting - action selection - action execution - new state acquisition, the UAV gradually reaches the designated destination.
3. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that the specific steps are refined as follows:
First step: establish the Markov model of the UAV environment-perception and autonomous obstacle-avoidance algorithm; according to the UAV's autonomous obstacle-avoidance action decision process, model the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP):
(1) State set s: define the UAV's position coordinates (x, y) and heading angle θ in the flight scene to represent the UAV's exact position, and (xg, yg) as the destination of the flight mission; then the UAV's distance to the destination is defined as follows:
Δx = x - xg, Δy = y - yg (1)
to detect the environment on the UAV's forward path, a radar detection line of length 4 m is set up every 5 degrees between -45 and 45 degrees ahead of the UAV's direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
where i = 1, ..., 16, j = 1, ..., n, and (obs_xj, obs_yj) denotes the coordinate positions of the n obstacles, detected indicating that a radar detection line of the UAV has detected an obstacle; meanwhile, for ease of data processing, the distance dis_i, i = 1, ..., 16, detected by each radar line is normalized to norm_dis_i as follows:
finally, the state of the UAV is determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
(2) Action set a: the action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value from the external environment; in the UAV environment-perception and autonomous obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
that is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realizing trajectory planning;
(3) Immediate reward function r: the immediate reward function is the instantaneous feedback the UAV obtains after selecting an action in a given state, representing the reward assigned to a state-action pair; Δdis is defined to measure, at time t, the distance the UAV has advanced towards the target point relative to the previous time t-1:
Δθ measures the difference between the current UAV heading angle and the angle from the UAV towards the target point:
(norm_dis_8 - 1) indicates whether the 8th radar detection line ahead of the UAV's heading has detected an obstacle, and the distance to that obstacle:
in summary, the immediate reward function is defined as follows
where hit indicates that the UAV has collided with an obstacle, and at target indicates that the UAV has reached the target point;
(4) State transition probability function p: the state transition probability function describes the probability that the quadrotor UAV in the flight scene, after selecting an action in the current state, transfers to a given next state;
(5) Discount factor γ: the discount factor describes, in the UAV's autonomous obstacle-avoidance decision process, the degree of attention the current flight decision pays to future immediate rewards;
Second step: according to the modelled Markov decision process, select the deep Q-learning algorithm and determine the algorithm flow, to find the optimal solution for UAV environment perception and autonomous obstacle avoidance;
Third step: design the complex flight scene for the environment-perception and autonomous obstacle-avoidance algorithm, including building the UAV model and designing the UAV's model for sensing the surrounding environment; then apply steps one and two to UAV control, realizing UAV environment perception and autonomous obstacle avoidance.
4. The deep-Q-learning-based UAV environment-perception and autonomous obstacle-avoidance method of claim 1, characterized in that the deep Q-learning algorithm flow is as follows: first, randomly initialize the UAV state and the neural network parameters; second, from the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε, where 0 < ε < 1, and select a random action with probability 1-ε; after executing the action, obtain a feedback value, reach a new state, and store the "current state - action - feedback value - next state" experience tuple into the experience replay buffer; finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps along the way;
the neural network training process is as follows: first, the network randomly samples experience tuples from the replay buffer and, according to the next state therein, selects the action that maximizes its Q value; second, it computes the squared difference between the sum of the feedback value and the next state's maximum Q value, and the Q value of the current state, as the network's back-propagated error; finally, to minimize the back-propagated error, the network adjusts its parameters using gradient descent.
CN201910195250.8A 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning Active CN109933086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910195250.8A CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Publications (2)

Publication Number Publication Date
CN109933086A true CN109933086A (en) 2019-06-25
CN109933086B CN109933086B (en) 2022-08-30

Family

ID=66987310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910195250.8A Active CN109933086B (en) 2019-03-14 2019-03-14 Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN109933086B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110378439A (en) * 2019-08-09 2019-10-25 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110488859A (en) * 2019-07-15 2019-11-22 北京航空航天大学 A kind of Path Planning for UAV based on improvement Q-learning algorithm
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110554707A (en) * 2019-10-17 2019-12-10 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110596734A (en) * 2019-09-17 2019-12-20 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110879610A (en) * 2019-10-24 2020-03-13 北京航空航天大学 Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle
CN110968102A (en) * 2019-12-27 2020-04-07 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN111260658A (en) * 2020-01-10 2020-06-09 厦门大学 Novel depth reinforcement learning algorithm for image segmentation
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
WO2021088133A1 (en) * 2019-11-05 2021-05-14 上海为彪汽配制造有限公司 Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle
CN112947562A (en) * 2021-02-10 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN112937564A (en) * 2019-11-27 2021-06-11 初速度(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN113110547A (en) * 2021-04-21 2021-07-13 吉林大学 Flight control method, device and equipment of miniature aviation aircraft
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113298368A (en) * 2021-05-14 2021-08-24 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
CN114371720A (en) * 2021-12-29 2022-04-19 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for unmanned aerial vehicle to track target
CN114578834A (en) * 2022-05-09 2022-06-03 北京大学 Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816A (en) * 2022-11-24 2023-01-06 东南大学 Bionic vision multi-source information intelligent perception unmanned platform
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107065890A (en) * 2017-06-02 2017-08-18 北京航空航天大学 A kind of unmanned vehicle intelligent barrier avoiding method and system
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN108388270A (en) * 2018-03-21 2018-08-10 天津大学 Cluster unmanned plane track posture cooperative control method towards security domain
US20180308371A1 (en) * 2017-04-19 2018-10-25 Beihang University Joint search method for uav multiobjective path planning in urban low altitude environment
CN109032168A (en) * 2018-05-07 2018-12-18 西安电子科技大学 A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN
US20190061147A1 (en) * 2016-04-27 2019-02-28 Neurala, Inc. Methods and Apparatus for Pruning Experience Memories for Deep Neural Network-Based Q-Learning
CN109443366A (en) * 2018-12-20 2019-03-08 北京航空航天大学 A kind of unmanned aerial vehicle group paths planning method based on improvement Q learning algorithm


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘庆杰 等: "面向智能避障场景的深度强化学习研究", 《智能物联技术》 *
宗群 等: "基于马尔可夫网络排队论的电梯交通建模及应用", 《天津大学学报》 *
王立群 等: "基于深度Q值网络的自动小车控制方法", 《电子测量技术》 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488859A (en) * 2019-07-15 2019-11-22 北京航空航天大学 A kind of Path Planning for UAV based on improvement Q-learning algorithm
CN110488859B (en) * 2019-07-15 2020-08-21 北京航空航天大学 Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110378439A (en) * 2019-08-09 2019-10-25 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110345948A (en) * 2019-08-16 2019-10-18 重庆邮智机器人研究院有限公司 Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm
CN110806756B (en) * 2019-09-10 2022-08-02 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110596734A (en) * 2019-09-17 2019-12-20 南京航空航天大学 Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method
CN110716575A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning
CN110554707B (en) * 2019-10-17 2022-09-30 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110554707A (en) * 2019-10-17 2019-12-10 陕西师范大学 Q learning automatic parameter adjusting method for aircraft attitude control loop
CN110879610A (en) * 2019-10-24 2020-03-13 北京航空航天大学 Reinforcement learning method for autonomous optimal trajectory planning of a solar-powered unmanned aerial vehicle
WO2021088133A1 (en) * 2019-11-05 2021-05-14 上海为彪汽配制造有限公司 Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN110703766B (en) * 2019-11-07 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN112937564A (en) * 2019-11-27 2021-06-11 初速度(苏州)科技有限公司 Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN111123963A (en) * 2019-12-19 2020-05-08 南京航空航天大学 Unknown environment autonomous navigation system and method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN110968102A (en) * 2019-12-27 2020-04-07 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN110968102B (en) * 2019-12-27 2022-08-26 东南大学 Multi-agent collision avoidance method based on deep reinforcement learning
CN111260658A (en) * 2020-01-10 2020-06-09 厦门大学 Novel deep reinforcement learning algorithm for image segmentation
CN111260658B (en) * 2020-01-10 2023-10-17 厦门大学 Deep reinforcement learning method for image segmentation
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111473794B (en) * 2020-04-01 2022-02-11 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111487992A (en) * 2020-04-22 2020-08-04 北京航空航天大学 Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
WO2021220467A1 (en) * 2020-04-30 2021-11-04 楽天株式会社 Learning device, information processing device, and learned control model
JP6950117B1 (en) * 2020-04-30 2021-10-13 楽天グループ株式会社 Learning device, information processing device, and trained control model
CN111667513B (en) * 2020-06-01 2022-02-18 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN111667513A (en) * 2020-06-01 2020-09-15 西北工业大学 Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN112036261A (en) * 2020-08-11 2020-12-04 海尔优家智能科技(北京)有限公司 Gesture recognition method and device, storage medium and electronic device
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
US11866070B2 (en) 2020-09-28 2024-01-09 Guangzhou Automobile Group Co., Ltd. Vehicle control method and apparatus, storage medium, and electronic device
CN112947562A (en) * 2021-02-10 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN112947562B (en) * 2021-02-10 2021-11-30 西北工业大学 Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG
CN113232016A (en) * 2021-04-13 2021-08-10 哈尔滨工业大学(威海) Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance
CN113110547A (en) * 2021-04-21 2021-07-13 吉林大学 Flight control method, device and equipment for a micro aerial vehicle
CN113298368A (en) * 2021-05-14 2021-08-24 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN113298368B (en) * 2021-05-14 2023-11-10 南京航空航天大学 Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning
CN114371720A (en) * 2021-12-29 2022-04-19 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for unmanned aerial vehicle to track target
CN114371720B (en) * 2021-12-29 2023-09-29 国家电投集团贵州金元威宁能源股份有限公司 Control method and control device for enabling an unmanned aerial vehicle to track a target
CN114578834B (en) * 2022-05-09 2022-07-26 北京大学 Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN114578834A (en) * 2022-05-09 2022-06-03 北京大学 Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115574816B (en) * 2022-11-24 2023-03-14 东南大学 Bionic vision multi-source information intelligent perception unmanned platform
CN115574816A (en) * 2022-11-24 2023-01-06 东南大学 Bionic vision multi-source information intelligent perception unmanned platform

Also Published As

Publication number Publication date
CN109933086B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN109933086A (en) Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study
Xie et al. Unmanned aerial vehicle path planning algorithm based on deep reinforcement learning in large-scale and dynamic environments
CN114384920B (en) Dynamic obstacle avoidance method based on real-time construction of local grid map
Zhang et al. A novel real-time penetration path planning algorithm for stealth UAV in 3D complex dynamic environment
CN105549597B (en) Unmanned vehicle dynamic path planning method based on environmental uncertainty
CN109870162A (en) UAV flight path planning method based on a competitive deep learning network
CN111780777A (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN111399541B (en) Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network
CN107886750B (en) Unmanned automobile control method and system based on beyond-visual-range cooperative cognition
Wang et al. A deep reinforcement learning approach to flocking and navigation of uavs in large-scale complex environments
CN109445456A (en) Multi-UAV cluster navigation method
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
Wei et al. Recurrent MADDPG for object detection and assignment in combat tasks
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN112114592B (en) Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112631134A (en) Intelligent trolley obstacle avoidance method based on fuzzy neural network
Makantasis et al. A deep reinforcement learning driving policy for autonomous road vehicles
Chen et al. Parallel motion planning: Learning a deep planning model against emergencies
Ke et al. Cooperative path planning for air–sea heterogeneous unmanned vehicles using search-and-tracking mission
CN116540784A (en) Unmanned system air-ground collaborative navigation and obstacle avoidance method based on vision
Zijian et al. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
Zhang et al. A bionic dynamic path planning algorithm of the micro UAV based on the fusion of deep neural network optimization/filtering and hawk-eye vision
Xie et al. Long and short term maneuver trajectory prediction of UCAV based on deep learning
Li et al. Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning
Fu et al. UAV mission path planning based on reinforcement learning in Dynamic Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant