CN109933086A - UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning - Google Patents
- Publication number: CN109933086A
- Application number: CN201910195250.8A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/0088—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
Abstract
The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance. To reduce resource loss and cost, and to meet the real-time, robustness, and safety requirements of UAV automatic obstacle avoidance, the technical solution adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning. First, radar detects the path within a certain distance ahead of the UAV, and the distances between the UAV, the obstacles, and the target point are taken as the UAV's current state. Second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV. Finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance. The present invention is mainly applicable to UAV environment sensing and automatic obstacle avoidance control.
Description
Technical field
The present invention relates to the field of quadrotor UAV environment sensing and automatic obstacle avoidance, and in particular to the field of intelligent UAV path-planning research. More particularly, it relates to a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning.
Background technique
In recent years, unmanned aerial vehicles (UAVs) have gradually entered the public eye, achieving remarkable results in commercial, agricultural, entertainment, and even military fields. Over the past decade, the number of UAVs in China has grown from nothing into a flourishing industry. Data show that by the end of 2018, China's consumer civilian UAV market alone had approached 10 billion, and the number of consumers continues to rise rapidly. The prosperity of the UAV market places higher demands on the safety and development of UAV control technology. At present, China has not yet formed complete rules and regulations for UAV airspace management, and phenomena such as unauthorized ("black") flights across the various fields of application easily create safety hazards during flight, causing unnecessary property losses and casualties. Therefore, UAV sensing and obstacle-avoidance technology has become a subject of common concern for scholars at home and abroad. A UAV collision refers to the situation in which, during flight, the distance between the UAV and buildings, mountains, birds, or even other aircraft along its path falls below a safety threshold, or an actual collision occurs. Unlike manned aircraft, a UAV in flight cannot rely on a pilot to change flight speed and heading in order to avoid obstacles. Therefore, sensing and obstacle-avoidance devices have become essential components of unmanned systems. At present, UAV sensing and automatic obstacle-avoidance technologies mainly include the following types:
1. Vision-based obstacle avoidance: this technique mainly uses images of the environment along the forward path acquired during flight, predicts potential collisions with image-processing techniques, and performs real-time path planning to achieve safe flight. The scheme depends on mature image sensing and processing technology and is vulnerable to environmental factors such as weather and haze.
2. Obstacle avoidance based on object detection: this technique covers a wide range of approaches, mainly using sensing devices installed on the UAV — radar, ultrasound, infrared — to detect the distance between the UAV and obstacles, and correcting the UAV's path on that basis. Its disadvantages are that distance-detection techniques such as ultrasound place excessive requirements on the object's reflecting surface and are vulnerable to environmental influences.
3. Obstacle avoidance based on electronic maps: this technique mainly uses an electronic map built into the UAV together with its GPS positioning to accurately determine the UAV's own location and select a path. Its defect is that it cannot handle emergencies such as unknown maps or moving obstacles in the airspace, so its robustness is poor.
4. Obstacle avoidance based on the artificial potential field method: this technique is mainly applied at the path-planning level. Following the principle that like charges repel and opposite charges attract in an electric field, suitable charge attributes are assigned to the UAV, the obstacles, and the target point, so that the UAV avoids the obstacles and reaches the designated target point.
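The attract/repel principle described above can be sketched as a single gradient step; the gains k_att, k_rep and the influence radius d0 are illustrative values of my choosing, not parameters from any cited scheme:

```python
import math

def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """One step of a basic artificial potential field: an attractive force
    pulls toward the goal, and repulsive forces push away from obstacles
    closer than the influence radius d0 (gains are illustrative)."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < d0:
            # Repulsion grows sharply as the UAV nears the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 3
            fx += mag * dx
            fy += mag * dy
    return fx, fy
```

The well-known weakness noted below — local optima — shows up here when the attractive and repulsive terms cancel before the goal is reached.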
5. Automatic obstacle avoidance based on genetic algorithms, neural networks, fuzzy control, and the like: these techniques are mainly applied at the path-planning level, designing a nonlinear optimization model or a fuzzy controller around the detected distance information to control the UAV's flight speed and heading.
From the research status of UAV environment sensing and automatic obstacle avoidance summarized above, the vast majority of current UAV obstacle-avoidance schemes separate sensing from path planning: sensing and path planning are implemented as two modules of the system, and obstacle avoidance is realized through data transfer between them. The defects of such schemes are: 1) data transfer between the two modules may be delayed, so the "safe" path produced by the path-planning algorithm lags behind reality and endangers safe navigation; 2) data may be lost or distorted in transit, depriving the path-planning module of reliable data support and preventing a timely reaction to obstacles; 3) most path-planning algorithms easily fall into local optima and struggle to solve path-planning problems in relatively complex flight environments; 4) distance-sensing techniques are vulnerable to environmental factors — in bad weather, or in the presence of radar countermeasures, accurate obstacle-distance detection becomes impossible. In short, traditional UAV automatic obstacle-avoidance schemes mostly chain sensing and path planning together, which requires both technologies to be mature and the data transfer between them to be efficient; under external interference or other uncertainties the algorithm may fail, so robustness is poor.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention aims to propose a quadrotor UAV environment sensing and automatic obstacle avoidance method based on the deep Q-learning algorithm. On the one hand, existing automatic obstacle-avoidance path-planning schemes easily fall into local optima, causing unnecessary resource loss and cost while the UAV executes its task; on the other hand, UAV operating environments are changeable and complex, and the many uncertainties encountered in flight place higher demands on the real-time performance, robustness, and safety of automatic obstacle avoidance. To this end, the technical solution adopted by the present invention is a UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning: first, radar detects the path within a certain distance ahead of the UAV, and the distances between the UAV, the obstacles, and the target point are taken as the UAV's current state; second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV; finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
Specifically, through its sensing of the environment, the UAV acquires the distances to the destination and to the obstacles, which serve as the state information of the deep Q-learning algorithm.

A neural-network fitting module is responsible for computing the Q values: using the approximation capability of a neural network, it fits the Q values of all possible state-action pairs for a given state.

An action-selection module is responsible for choosing the action the UAV executes: using a greedy algorithm, with probability ε it selects the optimal action — the action whose Q value is largest — and with probability 1 − ε it selects an action at random. After receiving the action message, the UAV executes the corresponding action and reaches a new position.

Through the cycle of state acquisition, Q-value fitting, action selection, action execution, and new-state acquisition, the UAV gradually reaches the designated destination.
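The module cycle above can be sketched as a minimal closed loop; the callbacks (get_state, q_net, step, at_destination) are placeholders for the sensing, fitting, and execution modules, and the convention that the greedy branch is taken with probability ε follows the text:

```python
import random

def select_action(q_values, eps=0.9):
    """With probability eps take the action with the largest fitted Q value,
    otherwise pick an action at random (the text's greedy-selection convention)."""
    if random.random() < eps:
        return max(range(len(q_values)), key=lambda i: q_values[i])
    return random.randrange(len(q_values))

def fly_to_destination(get_state, q_net, step, at_destination, max_steps=1000):
    """Closed sensing-decision-action loop: state acquisition -> Q-value
    fitting -> action selection -> execution -> new state, until arrival."""
    s = get_state()
    for _ in range(max_steps):
        if at_destination(s):
            return s
        a = select_action(q_net(s))   # Q-value fitting + action selection
        s = step(a)                   # execute the action, reach a new state
    return s
```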
The specific steps are refined as follows:
Step 1: establish the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm. According to the UAV's autonomous obstacle-avoidance decision process, model the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP):

(1) State set s. The UAV's position coordinates (x, y) in the flight scene and its heading angle θ determine its position, and (x_g, y_g) denotes the destination of the flight task. The UAV's distance to the destination is then defined as:
Δx = x − x_g, Δy = y − y_g (1)
To detect the environment along the UAV's forward path, radar detection rays of length 4 m are set up every 5 degrees between −45 and 45 degrees ahead of the direction of travel, 16 rays in total. The detection range of each ray is defined with respect to the coordinate positions (obs_x_j, obs_y_j) of the n obstacles, with i = 1, …, 16 and j = 1, …, n, where detected indicates that a radar ray of the UAV has detected an obstacle. For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar ray is normalized to norm_dis_i.
The state of the UAV is finally determined as
s = [Δx, Δy, θ, norm_dis_i] (4)
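A sketch of assembling this 19-dimensional state (Δx, Δy, θ plus 16 normalized ray distances). Since the source does not reproduce equations (2)–(3), dividing by the 4 m ray length is an assumed normalization, and evenly spacing the 16 rays over −45…45 degrees is an assumption as well (16 rays at exactly 5-degree spacing would not span that range):

```python
import math

def build_state(x, y, theta, xg, yg, obstacles, n_rays=16, ray_len=4.0):
    """State s = [dx, dy, theta, norm_dis_1..16]; obstacles are circles
    given as (centre_x, centre_y, radius). Normalization by ray_len is
    an assumption, since the patent's equations (2)-(3) are not shown."""
    dx, dy = x - xg, y - yg
    state = [dx, dy, theta]
    for i in range(n_rays):
        ang = theta + math.radians(-45 + 90 * i / (n_rays - 1))
        dis = ray_len
        for ox, oy, r in obstacles:
            d = _ray_circle_hit(x, y, ang, ox, oy, r, ray_len)
            if d is not None:
                dis = min(dis, d)
        state.append(dis / ray_len)      # norm_dis_i in [0, 1]
    return state

def _ray_circle_hit(x, y, ang, ox, oy, r, max_len):
    """Distance along a ray to a circular obstacle, or None if missed."""
    cx, cy = ox - x, oy - y
    proj = cx * math.cos(ang) + cy * math.sin(ang)   # projection on the ray
    if proj < 0:
        return None
    perp2 = cx * cx + cy * cy - proj * proj
    if perp2 > r * r:
        return None
    d = proj - math.sqrt(r * r - perp2)
    return d if 0 <= d <= max_len else None
```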
(2) Action set a. The action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value from the external environment. In the environment sensing and automatic obstacle avoidance algorithm, the UAV is given a movement speed v, and the selectable action set is defined such that the UAV always flies forward at speed v and, by selecting different actions, changes its heading angle θ, thereby changing the velocity components along the x and y directions and realizing trajectory planning.
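A sketch of this action set and its kinematics. The network's 3-neuron output layer (see the simulation parameters) suggests three discrete actions — turn left, keep heading, turn right — and the turn increment DELTA is my assumption, since the action-set equation is not reproduced in the text; v = 2.5 m/s and the 30 Hz refresh rate come from the simulation parameters:

```python
import math

DELTA = math.radians(5)          # assumed heading increment per tick
ACTIONS = [-DELTA, 0.0, DELTA]   # turn left / keep heading / turn right

def apply_action(x, y, theta, action_idx, v=2.5, dt=1.0 / 30):
    """Advance one simulation tick: the chosen action changes the heading
    angle theta, which changes the velocity components along x and y."""
    theta = theta + ACTIONS[action_idx]
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    return x, y, theta
```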
(3) Immediate reward function r. The reward function is the instantaneous feedback the UAV obtains after selecting an action in a given state; it represents the reward assigned to a state-action pair. Δdis measures the distance the UAV has advanced toward the target point between the previous time step t − 1 and the current time step t; Δθ measures the difference between the UAV's current heading angle and the bearing from the UAV to the target point; and (norm_dis_8 − 1) indicates whether the 8th radar ray, directly ahead of the UAV, has detected an obstacle, and the distance to that obstacle. In summary, the immediate reward function is defined in terms of these quantities, where hit indicates that the UAV has collided with an obstacle and at target indicates that the UAV has reached the target point.
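A hedged sketch of this reward. The shaping weights w1–w3 and the terminal values r_hit and r_target are illustrative, since the patent's reward equations are not reproduced in the text; only the three shaping terms and the two terminal events are taken from it:

```python
def reward(d_dis, d_theta, norm_dis8, hit=False, at_target=False,
           w1=1.0, w2=0.5, w3=0.5, r_hit=-1.0, r_target=1.0):
    """Immediate reward sketch: progress toward the target (d_dis) is
    rewarded, heading error (d_theta) is penalized, and (norm_dis8 - 1)
    is negative when the forward ray sees an obstacle inside its range.
    All weights and terminal values are assumptions."""
    if hit:
        return r_hit          # collision with an obstacle
    if at_target:
        return r_target       # destination reached
    return w1 * d_dis - w2 * abs(d_theta) + w3 * (norm_dis8 - 1.0)
```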
(4) State-transition probability function p. The state-transition probability function describes the probability that the quadrotor UAV in the flight scene, by selecting a given action in the current state, transfers to a given state at the next time step.

(5) Discount factor γ. The discount factor describes, in the UAV's automatic obstacle-avoidance decision process, how much the current flight decision "cares about" future immediate rewards.
Step 2: according to the Markov decision process modeled above, select the deep Q-learning algorithm and determine the algorithm flow, to find the optimal solution for UAV environment sensing and automatic obstacle avoidance.

Step 3: design the complex flight scene for the environment sensing and automatic obstacle avoidance algorithm, including building the UAV model and designing the UAV's sensor model of its surroundings; then apply Steps 1 and 2 to UAV control to realize environment sensing and automatic obstacle avoidance.
The deep Q-learning algorithm proceeds as follows. First, the UAV state and the neural-network parameters are randomly initialized. Second, from the multiple Q values the neural network fits for the current state, the action with the largest Q value is selected with probability ε (0 < ε < 1), and a random action with probability 1 − ε; after executing the action, a feedback value is obtained, a new state is reached, and the experience fragment "current state – action – feedback value – next state" is stored in the experience pool. Finally, this process is repeated until the UAV reaches the destination, with the neural network trained every fixed number of steps along the way.
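The episode flow just described can be sketched as follows; env_reset, env_step, q_net, and train_fn are placeholder callbacks, the pool capacity of 500 and the 300-step training cadence come from the simulation parameters, and the greedy branch is taken with probability ε per the text's convention:

```python
import random
from collections import deque

def run_episode(env_reset, env_step, q_net, memory_capacity=500,
                eps=0.9, train_every=300, train_fn=None, max_steps=10000):
    """Epsilon-greedy selection, experience tuples stored in a bounded
    pool, and training triggered every fixed number of steps."""
    memory = deque(maxlen=memory_capacity)   # experience pool
    s = env_reset()
    for t in range(1, max_steps + 1):
        qs = q_net(s)
        if random.random() < eps:
            a = max(range(len(qs)), key=lambda i: qs[i])
        else:
            a = random.randrange(len(qs))
        s2, r, done = env_step(a)
        memory.append((s, a, r, s2, done))   # state-action-feedback-next state
        if train_fn is not None and t % train_every == 0:
            train_fn(memory)
        if done:
            break
        s = s2
    return memory
```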
The neural network is trained as follows. First, the network randomly samples experience fragments from the experience pool and, based on the next-time-step state in each fragment, selects the action that maximizes its Q value. Second, the difference between the target value — the feedback value plus the discounted maximum Q value of the next state — and the current-state Q value is taken as the squared error to be back-propagated through the network. Finally, to minimize the back-propagated error, the network parameters are adjusted by gradient descent.
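One training step of this procedure, written out with a linear Q function Q(s, a) = W[a] · s standing in for the patent's neural network — a simplification of mine so the gradient can be written by hand; γ = 0.9 and the learning rate 0.01 come from the simulation parameters:

```python
import random

def dqn_train_step(memory, W, batch_size=4, gamma=0.9, lr=0.01):
    """Sample experience fragments, form the TD target
    r + gamma * max_b Q(s', b), and take a gradient-descent step on the
    squared TD error. memory holds (s, a, r, s', done) tuples."""
    batch = random.sample(memory, min(batch_size, len(memory)))
    for s, a, r, s2, done in batch:
        q_next = 0.0 if done else max(sum(w * x for w, x in zip(W[b], s2))
                                      for b in range(len(W)))
        target = r + gamma * q_next                  # feedback + discounted max Q
        q = sum(w * x for w, x in zip(W[a], s))
        err = q - target                             # TD error to be minimized
        # Gradient descent on err^2: d(err^2)/dW[a] = 2 * err * s
        W[a] = [w - lr * 2 * err * x for w, x in zip(W[a], s)]
    return W
```

Repeated updates on a fixed transition drive Q(s, a) toward its target, which is the convergence behavior the text relies on.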
Features and beneficial effects of the present invention:
To verify the validity of the proposed deep-Q-learning-based UAV environment sensing and automatic obstacle avoidance method, a UAV automatic-obstacle-avoidance virtual simulation system was designed and simulation experiments were carried out on it. In the virtual simulation environment, the following simulation parameters are set:

(1) UAV flight scene: a square flight range with side l = 20 m, as shown in Fig. 6. The total area of all obstacles accounts for d = 0.01 of the square flight range, and obstacle radii are randomly generated subject to 0.1 m ≤ radius ≤ 0.3 m. To increase the complexity of the flight environment, moving obstacles account for r = 0.2 of all obstacles, with movement speeds v_obs randomly generated subject to −3.0 m/s ≤ v_obs ≤ 3.0 m/s. The refresh frequency of the flight scene is 30 Hz.
(2) Neural-network parameters: the learning rate of the gradient-descent optimizer is 0.01. The training model, shown in Fig. 3, comprises an input layer of 19 neurons, a hidden layer of 10 neurons, and an output layer of 3 neurons; the activation functions of the input and hidden layers are rectified linear units (ReLU).
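A minimal forward pass matching this 19-10-3 architecture, with ReLU on the hidden side as stated; the weight-initialization range is my assumption:

```python
import random

def relu(v):
    return [x if x > 0 else 0.0 for x in v]

def matvec(W, x, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def init_layer(n_out, n_in):
    """Small random weights, zero biases (initialization range assumed)."""
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)], [0.0] * n_out)

def q_forward(state, params):
    """19 inputs (dx, dy, theta and 16 normalized ray distances),
    a 10-neuron ReLU hidden layer, 3 outputs (one Q value per action)."""
    (W1, b1), (W2, b2) = params
    h = relu(matvec(W1, state, b1))
    return matvec(W2, h, b2)        # raw Q values, one per action
```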
(3) Deep Q-learning: exploration rate ε = 0.9, discount factor γ = 0.9; the memory pool of deep Q-learning has a storage capacity of 500 and is updated every 300 runs.
(4) Radar detector: between −45 and 45 degrees ahead of the UAV's direction of travel, a radar detection ray of length 4 m is set up every 5 degrees, 16 rays in total.
(5) UAV model: the UAV's flight speed is v = 2.5 m/s; the image-rendering data come from the 3D-printing model 3DBuilder, part of which is shown in Table 2.
The proposed environment sensing and automatic obstacle avoidance method is built on the deep Q-learning algorithm; thanks to the fitting capability of deep learning and the decision-making capability of reinforcement learning, the method retains good robustness even when the flight scene is extremely complex. To further demonstrate its validity, simulation verification is carried out for flight scenes in which the obstacle positions, radii, and movement speeds, the UAV's initial position, and the target position are all set at random.
The automatic obstacle-avoidance flow is shown in Fig. 4. In each flight round the UAV flies toward the target point; upon arrival, the position of the target point is updated and the UAV continues tracking. When the UAV collides with an obstacle, the positions of the UAV and the target point are both updated. To improve efficiency, if in a given flight round the UAV neither reaches the target point nor hits an obstacle over a long period, the positions of the UAV and the target point are likewise both updated.
The simulation results are shown in Fig. 5. In each flight round, the loss function converges from high to low; once the neural-network training values converge, the UAV's movement speeds up markedly and it reaches the target point. After the UAV reaches the end point, the target point is immediately updated, so the loss function jumps up again until the network converges once more and the new end point is reached, and so the cycle continues.
The evolution of the obstacle-avoidance process is shown in Fig. 6; the upper and lower groups of images show that the UAV arrives safely at the end point even in a relatively complex environment. The results show that the automatic obstacle-avoidance algorithm can complete obstacle-avoiding flight from the starting point to the target point in a complex flight scene.
Table 2: UAV model 3D printing data (partial)
In the designed complex flight scene, the proposed deep-Q-learning-based environment sensing and automatic obstacle avoidance algorithm was used to carry out obstacle-avoidance tests under different obstacle distributions. Below, the control performance is analyzed from several angles in light of the test results, to further establish the validity of the guidance algorithm.
(1) Robustness analysis: the proposed method of setting up radar detection rays in the angular region from −45 to 45 degrees ahead of the UAV's heading excludes the influence of factors such as weather and climate, effectively detecting obstacles, flight boundaries, and other information ahead of the UAV and providing reliable input for automatic obstacle avoidance. At the same time, the deep Q-learning algorithm makes the optimal decision according to the Q values for each flight state, providing obstacle-avoidance commands for the UAV. In summary, during obstacle-avoiding flight the method is strongly robust to influences such as differing flight scenes, weather, and climate.
(2) Real-time analysis: the proposed algorithm takes the forward-path information detected by radar as the basis for decisions and, through the processing of the deep neural network and the Q-learning algorithm, directly generates the optimal obstacle-avoidance command. Compared with traditional obstacle-avoidance methods, it avoids the integration and transfer of data between separate environment-sensing and obstacle-avoidance modules, significantly improving the real-time performance of the automatic obstacle-avoidance algorithm.
(3) Safety analysis: as shown in Fig. 6, the proposed algorithm accurately and effectively identifies the obstacles in the flight scene and makes optimal action decisions, preventing the UAV from colliding with obstacles or flight boundaries and ensuring safe flight in complex scenes.
In conclusion the unmanned plane environment sensing and automatic obstacle avoiding algorithm based on deep learning for the proposition originally researched and proposed
There is quite high applicability for avoidance problem of the unmanned plane in complicated flying scene.
Detailed description of the drawings:
Fig. 1: quadrotor UAV environment sensing and automatic obstacle avoidance system structure diagram.
Fig. 2: environment sensing and automatic obstacle avoidance algorithm design block diagram.
Fig. 3: neural-network training model schematic diagram.
Fig. 4: UAV automatic obstacle avoidance flow chart.
Fig. 5: neural-network loss-function change curve.
Fig. 6: environment sensing and automatic obstacle avoidance simulation process schematic diagram.
Specific embodiment
To overcome the poor robustness of traditional automatic obstacle-avoidance algorithms, the present research draws on deep reinforcement learning, an area of artificial intelligence currently attracting wide attention, to establish — through a deep reinforcement-learning network — the mapping between the UAV's sensed distances to obstacles and its obstacle-avoidance strategy, and proposes a quadrotor UAV sensing and obstacle-avoidance method based on the deep Q-learning algorithm. The method uses the radar detector ahead of the UAV to probe the flight environment within a certain forward range, which largely excludes the influence of factors such as weather and distance and improves the algorithm's robustness. Meanwhile, taking the detection information as raw data, a deep Q-learning network directly generates the UAV's obstacle-avoidance strategy, which can markedly improve the real-time performance of obstacle avoidance. Moreover, during training, the deep-Q-learning-based strategy effectively fits the Q value corresponding to each state-action pair of the UAV, and the policy generated by the greedy algorithm effectively guarantees flight safety. Applying this deep-Q-learning-based sensing and obstacle-avoidance strategy to UAV path planning in complex environments has important theoretical significance, as well as high strategic value for the field of UAV automatic obstacle-avoidance research.
Addressing the shortcomings of traditional automatic obstacle-avoidance schemes based on separate environment sensing and path planning, the present invention proposes a UAV automatic obstacle avoidance method based on deep Q-learning. First, radar detects the path within a certain distance ahead of the UAV, and the distances between the UAV, the obstacles, and the target point are taken as the UAV's current state. Second, during training, a neural network is used to fit the Q value corresponding to each state-action pair of the UAV. Finally, as the training results gradually converge, a greedy algorithm selects the optimal action for the UAV in each particular state, thereby realizing automatic obstacle avoidance.
It follows that the proposed deep-Q-learning-based environment sensing and automatic obstacle avoidance method is a closed-loop intelligent real-time control scheme that is highly safe and fast. The method solves the quadrotor automatic obstacle-avoidance problem in complex scenes with strong robustness; its validity and high reliability help improve the UAV's capacity for autonomous decision-making during task execution, and it can be applied in many civil and military fields. The intelligent path-planning scheme can be applied to the automatic obstacle avoidance of real UAVs, generating action commands rapidly online to realize safe obstacle-avoiding flight.
Taking control-theoretic methods integrated with virtual simulation technology as the main research means, the present invention provides a quadrotor UAV environment sensing and automatic obstacle avoidance method based on deep Q-learning; simulation experiments carried out under a python2.7 environment verify the validity of the method.
Step 1: establish the Markov model of the UAV environment sensing and automatic obstacle avoidance algorithm. According to the UAV's autonomous obstacle-avoidance decision process, model the five-tuple (s, a, r, p, γ) of the Markov decision process (MDP).

(1) State set s. The UAV's position coordinates (x, y) in the flight scene and its heading angle θ determine its position, and (x_g, y_g) denotes the destination of the flight task. The UAV's distance to the destination is then defined as:
Δx = x − x_g, Δy = y − y_g (1)
To detect the environment along the UAV's forward path, radar detection rays of length 4 m are set up every 5 degrees between −45 and 45 degrees ahead of the direction of travel, 16 rays in total. The detection range of each ray is defined with respect to the coordinate positions (obs_x_j, obs_y_j) (j = 1, …, n) of the n obstacles, where detected indicates that a radar ray of the UAV has detected an obstacle (as shown in module 1 of Fig. 2). For ease of data processing, the distance dis_i (i = 1, …, 16) detected by each radar ray is normalized to norm_dis_i (i = 1, …, 16).
The state of the UAV is finally determined as
S = [Δx, Δy, θ, norm_dis_i] (4)
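As a hedged illustration, the state vector of Eq. (4) can be assembled in Python as follows. All names here are assumptions, and since the patent's exact normalisation formula is not reproduced in this text, the sketch simply divides each ray reading by the 4 m ray range:

```python
# Illustrative sketch of the state vector S = [dx, dy, theta, norm_dis_i].
# Assumption: norm_dis_i = min(dis_i, 4) / 4, so 1.0 means "nothing detected".
RAY_COUNT = 16   # radar detection lines over the forward arc
RAY_RANGE = 4.0  # metres

def normalise(dis):
    """Map a raw ray distance to norm_dis_i in [0, 1]."""
    return min(dis, RAY_RANGE) / RAY_RANGE

def build_state(x, y, theta, goal, ray_distances):
    """Concatenate goal offset, heading, and normalised ray readings."""
    gx, gy = goal
    return [x - gx, y - gy, theta] + [normalise(d) for d in ray_distances]

# A UAV at (1, 2) heading 0, goal at (4, 6), no obstacles within range:
state = build_state(1.0, 2.0, 0.0, (4.0, 6.0), [RAY_RANGE] * RAY_COUNT)
```

The resulting 19-dimensional vector (two goal offsets, the heading, and 16 ray readings) is what the fitting network would receive as input.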
(2) Action set a. The action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value of the external environment. In the UAV environment-perception and automatic obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
That is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realising trajectory planning.
(3) Immediate reward function r. The reward function is the instantaneous feedback the UAV obtains after selecting a given action in a given state; it represents the reward assigned to a state-action pair. Δdis measures the distance the UAV has advanced toward the target point at the current time t compared with the previous time t-1:
Δθ measures the difference between the UAV's current heading angle and the bearing of the target point from the UAV:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the UAV's course, has detected an obstacle, and the distance to that obstacle:
In summary, the immediate reward function is defined as follows
where hit denotes that the UAV has collided with an obstacle and at target denotes that the UAV has reached the target point.
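A minimal Python sketch of this reward follows. Only the +15 reward for reaching the target and the -20 collision penalty are stated in this text; the coefficients w1, w2, w3 weighting the safe-flight terms are placeholders, not the patented values:

```python
def immediate_reward(hit, at_target, d_dis, d_theta, norm_dis_8,
                     w1=1.0, w2=1.0, w3=1.0):
    """Immediate reward for one state-action pair (weights are assumptions)."""
    if hit:                      # collision with an obstacle
        return -20.0
    if at_target:                # destination reached
        return 15.0
    # Safe flight: progress toward the goal, heading alignment toward the
    # goal, and a penalty growing as the forward ray nears an obstacle.
    return w1 * d_dis - w2 * abs(d_theta) + w3 * (norm_dis_8 - 1.0)
```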
(4) State-transition probability function p. In this work, the state-transition probability function describes the probability that the quadrotor UAV, flying in the scene, transfers to a given next-time state after selecting a given action in the current state. Because the flight environment here is complex, it is modelled as a Markov process whose state-transition probability p is unknown. According to whether the state-transition probability is known, problems in the reinforcement-learning field divide into model-based and model-free, and effective solution algorithms exist for each case. Deep Q-learning, as a kind of reinforcement-learning algorithm, can effectively solve the model-free problem when p is unknown.
(5) Discount factor γ. The discount factor describes, in the UAV's automatic obstacle-avoidance decision process, the "degree of attention" the current flight decision pays to future immediate rewards.
Step 2: according to the Markov decision process modelled above, select the deep Q-learning algorithm, determine the algorithm flow, and find the optimal solution of UAV environment perception and automatic obstacle avoidance. The algorithm flow is determined as shown in Table 1:
Table 1: UAV environment-perception and automatic obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialise the UAV state and the neural-network parameters. Second, among the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1) and a random action with probability 1-ε; after executing the action, obtain a feedback value and reach a new state, and store the "current state - action - feedback value - next state" experience fragment in the experience pool. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps during the process.
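The interaction loop of Table 1 can be sketched as follows. Note the patent's convention: ε is the probability of taking the greedy (maximum-Q) action and 1-ε the probability of exploring, the reverse of the usual ε-greedy convention. The `q_values` callable stands in for the neural network, and all names are illustrative:

```python
import random

def select_action(q_values, state, n_actions, eps, rng=random):
    """With probability eps take the greedy action, else a random one.

    Follows the patent's convention: eps is the *greedy* probability.
    """
    if rng.random() < eps:
        qs = q_values(state)
        return max(range(n_actions), key=lambda a: qs[a])
    return rng.randrange(n_actions)

replay_pool = []  # the "experience pool" of the text

def store(state, action, reward, next_state):
    """Store one 'state - action - feedback - next state' fragment."""
    replay_pool.append((state, action, reward, next_state))

# One illustrative step with a stand-in Q function (eps=1.0 forces greedy):
greedy = select_action(lambda s: [0.1, 0.9, 0.2], None, 3, eps=1.0)
```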
The training process of the neural network is as follows. First, the network randomly samples experience fragments from the experience pool and, according to the next-time state in each fragment, selects the action that maximises its Q value. Second, it computes the squared difference between the sum of the feedback value and the maximum Q value of the next state, and the Q value of the current state, as the network's back-propagated error. Finally, the network adjusts its parameters by gradient descent so as to minimise this back-propagated error.
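A self-contained numeric sketch of one such training update follows, with a plain lookup table standing in for the neural network and a scalar step standing in for gradient descent. GAMMA and ALPHA are assumed values, not from the patent:

```python
GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # learning-rate stand-in for the gradient-descent step

def train_step(q_table, transition):
    """One replay update; q_table maps state -> list of per-action Q values."""
    s, a, r, s_next = transition
    # Discounted Bellman target: feedback value plus max Q of the next state.
    target = r + GAMMA * max(q_table[s_next])
    error = target - q_table[s][a]
    q_table[s][a] += ALPHA * error   # step that reduces the squared error
    return error ** 2                # the squared back-propagated error

# Two toy states with two actions each; replay one stored transition:
q = {0: [0.0, 0.0], 1: [1.0, 0.0]}
loss = train_step(q, (0, 0, 0.5, 1))
```

The same target computation is what a deep Q network would regress on, with the table replaced by network weights and ALPHA by a real optimiser step.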
Step 3: set up the environment for UAV environment perception and automatic obstacle avoidance. During environment perception and automatic obstacle avoidance, the UAV, as an agent, must continuously interact with the surrounding obstacle-filled environment to obtain enough data and collect enough information as the basis for its decisions. Meanwhile, the UAV is the controlled object, so a model of the UAV is an indispensable part of simulation verification.
The UAV flight environment is assumed to be a square region in which cylinders of varying size are distributed as obstacles, while a green marker indicates the destination of the flight. The model of the quadrotor UAV is obtained from 3D-printing data; feeding these data into the environment builder Director reproduces the quadrotor model.
Based on the above three steps, the UAV can, in a complex motion scene, carry out obstacle detection through its own radar detection device, realise automatic obstacle avoidance, and reach the destination.
The architecture of the quadrotor environment-perception and automatic obstacle-avoidance system is shown in Figure 1. By acquiring state information such as the obstacles and the target point in the flight environment and selecting the optimal action in the current state, the quadrotor can be controlled to meet the requirement of reaching the destination. Fitting the Q values is the core link of the algorithm: only through an accurate Q-value fit can suitable actions be selected for the UAV and the flight mission be completed. Without the Q-value fitting part, the UAV cannot obtain flight commands and cannot complete flight missions in a complex environment.
Figure 2 shows the design block diagram of the proposed UAV environment-perception and automatic obstacle-avoidance algorithm. The state-detection module is responsible for information acquisition: through the UAV's perception of the environment, it obtains the distances to the destination and to the obstacles, which serve as the state information of the deep Q-learning algorithm. The neural-network fitting module is responsible for computing the Q values, using the approximation capability of a neural network to fit the Q values of all possible state-action pairs for a given state. The action-selection module is responsible for choosing the action the UAV executes: among the multiple Q values of the current state, a greedy algorithm selects the maximum-Q action with probability ε (0 < ε < 1) and a random action with probability 1-ε. The action-execution module is responsible for carrying out the specific action: after receiving the action information, the UAV executes the corresponding action and reaches a new position. Cycling through state acquisition, Q-value fitting, action selection, action execution, and new-state acquisition, the UAV gradually reaches the designated destination.
Step 1: Markov-process modelling of the UAV environment-perception and automatic obstacle-avoidance algorithm. According to the action decision process of autonomous UAV obstacle avoidance, model the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP).
(1) State set s. The state set is the set of state quantities that determine and represent the UAV's current flight information.
The current position (x, y) and heading angle θ of the UAV in the flight scene define the exact position of the UAV, and (xg, yg) denotes the destination of the flight mission; the distance of the UAV from the destination is then defined as follows:
Δx = x - xg, Δy = y - yg (10)
To detect the environment along the UAV's forward path, radar detection lines of length 4 m are set up every 5 degrees between -45 and +45 degrees ahead of the direction of travel, 16 lines in total. The detection distance of each radar detection line is defined as follows:
where (obs_xj, obs_yj), j = 1, ..., n, are the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the UAV has detected an obstacle (as shown in module 1 of Fig. 2). For ease of data processing, the distance dis_i, i = 1, ..., 16, measured by each radar detection line is normalised to norm_dis_i, i = 1, ..., 16, as follows:
The state of the UAV is finally determined as
S = [Δx, Δy, θ, norm_dis_i] (13)
This state information expresses both the distance between the UAV's current flight position and the destination, and the distances between the UAV and the obstacles present in the flight scene, from which the UAV can decide whether an avoidance manoeuvre is needed.
(2) Action set a. The action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value of the external environment.
In the UAV environment-perception and automatic obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
That is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realising trajectory planning. In other words, until it reaches the destination the UAV always moves along its trajectory at speed v under the action of the heading angle θ; as the heading changes, the trajectory of the UAV changes with it, until the destination is reached.
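This motion model can be sketched in a few lines; the time step dt and all names are assumptions for illustration:

```python
import math

def step(x, y, theta, v, d_theta, dt=0.1):
    """Advance the UAV one time step: the selected action changes only
    the heading angle, and the x/y velocity components follow from it."""
    theta += d_theta                # action: change of course angle
    vx = v * math.cos(theta)        # velocity component along x
    vy = v * math.sin(theta)        # velocity component along y
    return x + vx * dt, y + vy * dt, theta

# With zero heading change, the UAV simply advances along its heading:
x1, y1, th1 = step(0.0, 0.0, 0.0, v=1.0, d_theta=0.0)
```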
(3) Immediate reward function r. The reward function is the instantaneous feedback the UAV obtains after selecting a given action in a given state; it represents the reward assigned to a state-action pair.
The state-action pairs during UAV flight mainly fall into three cases: reaching the target point, striking an obstacle, and safe flight. Each case requires a reasonably designed immediate reward. Reaching the target point and striking an obstacle are simple scenes, whose immediate rewards are defined as a reward value of 15 and a penalty value of -20, respectively. The safe-flight state is more complex and must jointly consider the distance the UAV has advanced compared with the previous moment, the angle difference toward the target point, and the distance to the obstacles.
Δdis measures the distance the current state has advanced toward the target point at time t compared with the previous-moment state:
Δθ measures the difference between the UAV's current heading angle and the bearing of the target point from the UAV:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the UAV's course, has detected an obstacle, and the distance to that obstacle.
In summary, the immediate reward function is defined as follows
where hit denotes that the UAV has collided with an obstacle and at target denotes that the UAV has reached the target point.
(4) State-transition probability function p. In this work, the state-transition probability function describes the probability that the quadrotor UAV, flying in the scene, transfers to a given next-time state after selecting a given action in the current state. Because the flight environment here is complex, it is modelled as a Markov process whose state-transition probability p is unknown. According to whether the state-transition probability is known, problems in the reinforcement-learning field divide into model-based and model-free, and effective solution algorithms exist for each case. Deep Q-learning, as a kind of reinforcement-learning algorithm, can effectively solve the model-free problem when p is unknown.
(5) Discount factor γ. The discount factor describes, in the UAV's automatic obstacle-avoidance decision process, the "degree of attention" the current flight decision pays to future immediate rewards.
During environment-perception and automatic obstacle-avoidance flight, to enable intelligent avoidance the UAV must, from its current state, maximise the cumulative return value up to the future terminal state.
When the cumulative reward is maximal, the UAV can find the optimal path. Here γ expresses the "degree of attention" the UAV, in the current state s_t, pays to future returns: γ = 1 means the UAV is fully "far-sighted" and weighs current and future immediate return values equally; γ = 0 means the UAV is entirely "short-sighted", valuing only the current immediate return value and ignoring future influence.
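For a finite episode, this cumulative return is the γ-discounted sum of the immediate rewards, which makes the two extremes of γ easy to check:

```python
def discounted_return(rewards, gamma):
    """Cumulative return: sum over k of gamma**k * r_{t+k}."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))
```

With γ = 0 only the first reward survives; with γ = 1 all rewards count equally, matching the "short-sighted" and "far-sighted" descriptions above.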
Step 2: building the deep Q-learning algorithm for UAV environment perception and automatic obstacle avoidance. To let the neural network accurately fit the Q value of every state-action pair, the network is trained with the deep Q-learning algorithm; the purpose is to adjust the weights and biases in each neural-network layer by the gradient-descent algorithm.
Meanwhile, while the neural network fits the Q values, the deep Q-learning algorithm selects the flight command in each state. In the selection of flight actions, to prevent the algorithm from falling into a locally optimal solution, the relationship between "exploitation" and "exploration" of the flight scene must be considered. Using a greedy algorithm, the UAV exploits the already collected flight-scene data with probability ε (0 < ε < 1) and explores the flight scene with probability 1-ε.
Finally, the deep Q-learning algorithm of UAV environment perception and automatic obstacle avoidance is shown in Table 2.
Table 2: UAV environment-perception and automatic obstacle-avoidance algorithm
The algorithm flow is as follows. First, randomly initialise the UAV state and the neural-network parameters. Second, among the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε (0 < ε < 1) and a random action with probability 1-ε; after executing the action, obtain a feedback value and reach a new state, and store the "current state - action - feedback value - next state" experience fragment in the experience pool. Finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps during the process.
The training process of the neural network is as follows. First, the network randomly samples experience fragments from the experience pool and, according to the next-time state in each fragment, selects the action that maximises its Q value. Second, it computes the squared difference between the sum of the feedback value and the maximum Q value of the next state, and the Q value of the current state, as the network's back-propagated error. Finally, the network adjusts its parameters by the gradient-descent algorithm so as to minimise the back-propagated error.
Step 3: design of the complex flight scene for the UAV environment-perception and automatic obstacle-avoidance algorithm. A complex flight scene is built to experimentally verify the validity of the UAV automatic obstacle-avoidance algorithm. During perception and avoidance, the UAV must continuously interact with the flight scene and collect as much data as possible as the basis for its decisions, so that the neural network can be fully trained and the most correct decision behaviour is produced during avoidance. Meanwhile, as the controlled object, the UAV model is also an indispensable part of simulation verification.
The UAV flight scene is assumed to be a square flight range within whose boundary cylinders of varying size are distributed as obstacles. To increase the complexity of the flight scene, the destination of the flight is generated at random in each flight round. Likewise, the positions and radii of all obstacles inside the boundary and their moving speeds are generated at random. The obstacle-setup algorithm of the UAV flight scene is shown in Table 3.
Table 3: UAV flight-scene setup algorithm
The algorithm flow is as follows. First, determine the total obstacle area in the flight environment and the ratio of moving-obstacle area to that total. Second, randomly generate the radius and position of each obstacle (within the allowed range) and, taking the moving-obstacle area ratio as the probability, set its moving speed either to 0 or to a randomly generated value (within the allowed range). Finally, draw the obstacles in the flight environment according to their radii, positions, and moving speeds, until the accumulated area reaches the total obstacle area.
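The flow of Table 3 can be sketched as follows; the field size, radius range, and speed bound are illustrative assumptions, not the patent's values:

```python
import math
import random

def generate_obstacles(total_area, moving_ratio, r_range=(0.2, 1.0),
                       field=10.0, v_max=0.5, rng=random):
    """Add random cylinders until their summed area reaches total_area.

    Each obstacle is moving with probability moving_ratio (the
    moving-obstacle area ratio of the text), else its speed is 0.
    """
    obstacles, area = [], 0.0
    while area < total_area:
        r = rng.uniform(*r_range)                      # random radius
        x = rng.uniform(r, field - r)                  # position inside field
        y = rng.uniform(r, field - r)
        moving = rng.random() < moving_ratio
        speed = rng.uniform(0.0, v_max) if moving else 0.0
        obstacles.append((x, y, r, speed))
        area += math.pi * r * r                        # accumulated area
    return obstacles

obs = generate_obstacles(total_area=5.0, moving_ratio=0.3)
```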
Meanwhile the model of quadrotor drone is obtained by 3D printing data in flying scene, and 3D printing data are inputted
Into open source environment Director, the flying scene of quadrotor drone can be reproduced.
Based on above three step, it can be achieved that unmanned plane is under complicated flying scene, by the radar detection apparatus of itself, carry out
Detection of obstacles simultaneously realizes automatic obstacle avoiding, arrives at the destination.
Claims (4)
1. A UAV environment-perception and automatic obstacle-avoidance method based on deep Q-learning, characterised in that: first, radar is used to detect the path within a certain distance ahead of the UAV, and the distances between the UAV and the obstacles and the target point are obtained as the state the UAV is currently in; second, during training, a neural network is used to fit the deep-learning Q value corresponding to each state-action pair of the UAV; finally, when the training result gradually converges, a greedy algorithm is used to select the optimal action for the UAV in each particular state, thereby realising automatic obstacle avoidance of the UAV.
2. The UAV environment-perception and automatic obstacle-avoidance method based on deep Q-learning as described in claim 1, characterised in that:
specifically, through the UAV's perception of the environment, the distances to the destination and to the obstacles are obtained as the state information of the deep Q-learning algorithm;
a neural-network fitting module is responsible for computing the Q values: using the approximation capability of a neural network, it fits the Q values of all possible state-action pairs of a given state;
an action-selection module is responsible for choosing the action the UAV executes, using a greedy algorithm: with probability ε the UAV executes the optimal action, i.e. the action whose Q value is maximal, and with probability 1-ε a random action; after receiving the action information, the UAV executes the corresponding action and reaches a new position;
cycling through state acquisition, Q-value fitting, action selection, action execution, and new-state acquisition, the UAV gradually reaches the designated destination.
3. The UAV environment-perception and automatic obstacle-avoidance method based on deep Q-learning as described in claim 1, characterised in that the specific steps are refined as follows:
Step 1: establish the Markov model of the UAV environment-perception and automatic obstacle-avoidance algorithm; according to the action decision process of UAV automatic obstacle avoidance, model the five-tuple (s, a, r, p, γ) of a Markov decision process (MDP):
(1) State set s: the position coordinates (x, y) and heading angle θ of the UAV in the flight scene define the exact position of the UAV, and (xg, yg) denotes the destination of the flight mission; the offset of the UAV from the destination is then defined as follows:
Δx = x - xg, Δy = y - yg (1)
To detect the environment along the UAV's forward path, radar detection lines of length 4 m are set up every 5 degrees between -45 and +45 degrees ahead of the direction of travel, 16 lines in total; the detection distance of each radar detection line is defined as follows:
where i = 1, ..., 16, j = 1, ..., n, (obs_xj, obs_yj) are the coordinate positions of the n obstacles, and detected indicates that a radar detection line of the UAV has detected an obstacle; for ease of data processing, the distance dis_i, i = 1, ..., 16, measured by each radar detection line is normalised to norm_dis_i as follows:
The state of the UAV is finally determined as
S = [Δx, Δy, θ, norm_dis_i] (4)
(2) Action set a: the action set is the set of all actions the UAV may take, given its current position, after receiving the feedback value of the external environment; in the UAV environment-perception and automatic obstacle-avoidance algorithm, the flight speed v of the UAV is given, and the selectable action set is defined as
that is, the UAV always flies forward at speed v and, by selecting different actions, changes the heading angle θ, thereby changing the velocity components in the x and y directions and realising trajectory planning;
(3) Immediate reward function r: the immediate reward function is the instantaneous feedback the UAV obtains after selecting a given action in a given state, representing the reward assigned to a state-action pair; Δdis measures the distance the UAV has advanced toward the target point at the current time t compared with the previous time t-1:
Δθ measures the difference between the UAV's current heading angle and the bearing of the target point from the UAV:
(norm_dis_8 - 1) indicates whether the 8th radar detection line, directly ahead of the UAV's course, has detected an obstacle, and the distance to that obstacle:
in summary, the immediate reward function is defined as follows
where hit denotes that the UAV has collided with an obstacle and at target denotes that the UAV has reached the target point;
(4) State-transition probability function p: the state-transition probability function describes the probability that the quadrotor UAV, in the flight scene, transfers to a given next-time state after selecting a given action in the current state;
(5) Discount factor γ: the discount factor describes, in the UAV's automatic obstacle-avoidance decision process, the "degree of attention" the current flight decision pays to future immediate rewards;
Step 2: according to the Markov decision process modelled above, select the deep Q-learning algorithm, determine the algorithm flow, and find the optimal solution of UAV environment perception and automatic obstacle avoidance;
Step 3: design the complex flight scene of the UAV environment-perception and automatic obstacle-avoidance algorithm, including building the UAV model and the model of the UAV's sensing of the surrounding environment; then apply Steps 1 and 2 to UAV control, realising UAV environment perception and automatic obstacle avoidance.
4. The UAV environment-perception and automatic obstacle-avoidance method based on deep Q-learning as described in claim 1, characterised in that the deep Q-learning algorithm flow is as follows: first, randomly initialise the UAV state and the neural-network parameters; second, among the multiple Q values the neural network fits for the current state, select the action with the maximum Q value with probability ε, where 0 < ε < 1, and a random action with probability 1-ε; after executing the action, obtain a feedback value and reach a new state, and store the "current state - action - feedback value - next state" experience fragment in the experience pool; finally, repeat this process until the UAV reaches the destination, training the neural network every fixed number of steps during the process;
the training process of the neural network is as follows: first, the network randomly samples experience fragments from the experience pool and, according to the next-time state in each fragment, selects the action that maximises its Q value; second, it computes the squared difference between the sum of the feedback value and the maximum Q value of the next state, and the Q value of the current state, as the network's back-propagated error; finally, the network adjusts its parameters by the gradient-descent algorithm so as to minimise the back-propagated error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910195250.8A CN109933086B (en) | 2019-03-14 | 2019-03-14 | Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910195250.8A CN109933086B (en) | 2019-03-14 | 2019-03-14 | Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933086A true CN109933086A (en) | 2019-06-25 |
CN109933086B CN109933086B (en) | 2022-08-30 |
Family
ID=66987310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910195250.8A Active CN109933086B (en) | 2019-03-14 | 2019-03-14 | Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933086B (en) |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
CN110378439A (en) * | 2019-08-09 | 2019-10-25 | 重庆理工大学 | Single robot path planning method based on Q-Learning algorithm |
CN110488859A (en) * | 2019-07-15 | 2019-11-22 | 北京航空航天大学 | A kind of Path Planning for UAV based on improvement Q-learning algorithm |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned plane track optimizing method, device and unmanned plane based on deeply study |
CN110554707A (en) * | 2019-10-17 | 2019-12-10 | 陕西师范大学 | Q learning automatic parameter adjusting method for aircraft attitude control loop |
CN110596734A (en) * | 2019-09-17 | 2019-12-20 | 南京航空航天大学 | Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method |
CN110703766A (en) * | 2019-11-07 | 2020-01-17 | 南京航空航天大学 | Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network |
CN110716575A (en) * | 2019-09-29 | 2020-01-21 | 哈尔滨工程大学 | UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning |
CN110806756A (en) * | 2019-09-10 | 2020-02-18 | 西北工业大学 | Unmanned aerial vehicle autonomous guidance control method based on DDPG |
CN110879610A (en) * | 2019-10-24 | 2020-03-13 | 北京航空航天大学 | Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle |
CN110968102A (en) * | 2019-12-27 | 2020-04-07 | 东南大学 | Multi-agent collision avoidance method based on deep reinforcement learning |
CN111123963A (en) * | 2019-12-19 | 2020-05-08 | 南京航空航天大学 | Unknown environment autonomous navigation system and method based on reinforcement learning |
CN111198568A (en) * | 2019-12-23 | 2020-05-26 | 燕山大学 | Underwater robot obstacle avoidance control method based on Q learning |
CN111260658A (en) * | 2020-01-10 | 2020-06-09 | 厦门大学 | Novel depth reinforcement learning algorithm for image segmentation |
CN111473794A (en) * | 2020-04-01 | 2020-07-31 | 北京理工大学 | Structural road unmanned decision planning method based on reinforcement learning |
CN111487992A (en) * | 2020-04-22 | 2020-08-04 | 北京航空航天大学 | Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN112036261A (en) * | 2020-08-11 | 2020-12-04 | 海尔优家智能科技(北京)有限公司 | Gesture recognition method and device, storage medium and electronic device |
CN112148008A (en) * | 2020-09-18 | 2020-12-29 | 中国航空无线电电子研究所 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
WO2021088133A1 (en) * | 2019-11-05 | 2021-05-14 | 上海为彪汽配制造有限公司 | Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle |
CN112947562A (en) * | 2021-02-10 | 2021-06-11 | 西北工业大学 | Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG |
CN112937564A (en) * | 2019-11-27 | 2021-06-11 | 初速度(苏州)科技有限公司 | Lane change decision model generation method and unmanned vehicle lane change decision method and device |
CN113110547A (en) * | 2021-04-21 | 2021-07-13 | 吉林大学 | Flight control method, device and equipment of miniature aviation aircraft |
CN113232016A (en) * | 2021-04-13 | 2021-08-10 | 哈尔滨工业大学(威海) | Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance |
CN113298368A (en) * | 2021-05-14 | 2021-08-24 | 南京航空航天大学 | Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning |
JP6950117B1 (en) * | 2020-04-30 | 2021-10-13 | 楽天グループ株式会社 | Learning device, information processing device, and trained control model |
WO2021220467A1 (en) * | 2020-04-30 | 2021-11-04 | 楽天株式会社 | Learning device, information processing device, and learned control model |
CN114371720A (en) * | 2021-12-29 | 2022-04-19 | 国家电投集团贵州金元威宁能源股份有限公司 | Control method and control device for unmanned aerial vehicle to track target |
CN114578834A (en) * | 2022-05-09 | 2022-06-03 | 北京大学 | Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method |
CN115574816A (en) * | 2022-11-24 | 2023-01-06 | 东南大学 | Bionic vision multi-source information intelligent perception unmanned platform |
US11866070B2 (en) | 2020-09-28 | 2024-01-09 | Guangzhou Automobile Group Co., Ltd. | Vehicle control method and apparatus, storage medium, and electronic device |
2019
- 2019-03-14: CN application CN201910195250.8A filed; granted as patent CN109933086B (status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190061147A1 (en) * | 2016-04-27 | 2019-02-28 | Neurala, Inc. | Methods and Apparatus for Pruning Experience Memories for Deep Neural Network-Based Q-Learning |
CN106595671A (en) * | 2017-02-22 | 2017-04-26 | 南方科技大学 | Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning |
US20180308371A1 (en) * | 2017-04-19 | 2018-10-25 | Beihang University | Joint search method for uav multiobjective path planning in urban low altitude environment |
CN107065890A (en) * | 2017-06-02 | 2017-08-18 | 北京航空航天大学 | UAV intelligent obstacle avoidance method and system |
CN108255182A (en) * | 2018-01-30 | 2018-07-06 | 上海交通大学 | Pedestrian-aware obstacle avoidance method for service robots based on deep reinforcement learning |
CN108388270A (en) * | 2018-03-21 | 2018-08-10 | 天津大学 | Security-domain-oriented cooperative trajectory and attitude control method for UAV swarms |
CN109032168A (en) * | 2018-05-07 | 2018-12-18 | 西安电子科技大学 | DQN-based path planning method for multi-UAV cooperative area monitoring |
CN109443366A (en) * | 2018-12-20 | 2019-03-08 | 北京航空航天大学 | UAV swarm path planning method based on an improved Q-learning algorithm |
Non-Patent Citations (3)
Title |
---|
刘庆杰 et al.: "Research on deep reinforcement learning for intelligent obstacle avoidance scenarios", 《智能物联技术》 * |
宗群 et al.: "Elevator traffic modeling and application based on Markov network queueing theory", 《天津大学学报》 * |
王立群 et al.: "Automatic vehicle control method based on deep Q-value networks", 《电子测量技术》 * |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110488859A (en) * | 2019-07-15 | 2019-11-22 | 北京航空航天大学 | UAV path planning method based on an improved Q-learning algorithm |
CN110488859B (en) * | 2019-07-15 | 2020-08-21 | 北京航空航天大学 | Unmanned aerial vehicle route planning method based on improved Q-learning algorithm |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Deep-reinforcement-learning-based UAV trajectory optimization method, device, and UAV |
CN110378439A (en) * | 2019-08-09 | 2019-10-25 | 重庆理工大学 | Single robot path planning method based on Q-Learning algorithm |
CN110345948A (en) * | 2019-08-16 | 2019-10-18 | 重庆邮智机器人研究院有限公司 | Dynamic obstacle avoidance method based on neural network in conjunction with Q learning algorithm |
CN110806756B (en) * | 2019-09-10 | 2022-08-02 | 西北工业大学 | Unmanned aerial vehicle autonomous guidance control method based on DDPG |
CN110806756A (en) * | 2019-09-10 | 2020-02-18 | 西北工业大学 | Unmanned aerial vehicle autonomous guidance control method based on DDPG |
CN110596734A (en) * | 2019-09-17 | 2019-12-20 | 南京航空航天大学 | Multi-mode Q learning-based unmanned aerial vehicle positioning interference source system and method |
CN110716575A (en) * | 2019-09-29 | 2020-01-21 | 哈尔滨工程大学 | UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning |
CN110554707B (en) * | 2019-10-17 | 2022-09-30 | 陕西师范大学 | Q learning automatic parameter adjusting method for aircraft attitude control loop |
CN110554707A (en) * | 2019-10-17 | 2019-12-10 | 陕西师范大学 | Q learning automatic parameter adjusting method for aircraft attitude control loop |
CN110879610A (en) * | 2019-10-24 | 2020-03-13 | 北京航空航天大学 | Reinforcement learning method for autonomous optimal trajectory planning of solar-powered UAVs |
WO2021088133A1 (en) * | 2019-11-05 | 2021-05-14 | 上海为彪汽配制造有限公司 | Method and system for constructing flight trajectory of multi-rotor unmanned aerial vehicle |
CN110703766A (en) * | 2019-11-07 | 2020-01-17 | 南京航空航天大学 | Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network |
CN110703766B (en) * | 2019-11-07 | 2022-01-11 | 南京航空航天大学 | Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network |
CN112937564A (en) * | 2019-11-27 | 2021-06-11 | 初速度(苏州)科技有限公司 | Lane change decision model generation method and unmanned vehicle lane change decision method and device |
CN111123963A (en) * | 2019-12-19 | 2020-05-08 | 南京航空航天大学 | Unknown environment autonomous navigation system and method based on reinforcement learning |
CN111198568A (en) * | 2019-12-23 | 2020-05-26 | 燕山大学 | Underwater robot obstacle avoidance control method based on Q learning |
CN110968102A (en) * | 2019-12-27 | 2020-04-07 | 东南大学 | Multi-agent collision avoidance method based on deep reinforcement learning |
CN110968102B (en) * | 2019-12-27 | 2022-08-26 | 东南大学 | Multi-agent collision avoidance method based on deep reinforcement learning |
CN111260658A (en) * | 2020-01-10 | 2020-06-09 | 厦门大学 | Novel deep reinforcement learning algorithm for image segmentation |
CN111260658B (en) * | 2020-01-10 | 2023-10-17 | 厦门大学 | Deep reinforcement learning method for image segmentation |
CN111473794A (en) * | 2020-04-01 | 2020-07-31 | 北京理工大学 | Structural road unmanned decision planning method based on reinforcement learning |
CN111473794B (en) * | 2020-04-01 | 2022-02-11 | 北京理工大学 | Structural road unmanned decision planning method based on reinforcement learning |
CN111487992A (en) * | 2020-04-22 | 2020-08-04 | 北京航空航天大学 | Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning |
WO2021220467A1 (en) * | 2020-04-30 | 2021-11-04 | 楽天株式会社 | Learning device, information processing device, and learned control model |
JP6950117B1 (en) * | 2020-04-30 | 2021-10-13 | 楽天グループ株式会社 | Learning device, information processing device, and trained control model |
CN111667513B (en) * | 2020-06-01 | 2022-02-18 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN111667513A (en) * | 2020-06-01 | 2020-09-15 | 西北工业大学 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
CN112036261A (en) * | 2020-08-11 | 2020-12-04 | 海尔优家智能科技(北京)有限公司 | Gesture recognition method and device, storage medium and electronic device |
CN112148008A (en) * | 2020-09-18 | 2020-12-29 | 中国航空无线电电子研究所 | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning |
US11866070B2 (en) | 2020-09-28 | 2024-01-09 | Guangzhou Automobile Group Co., Ltd. | Vehicle control method and apparatus, storage medium, and electronic device |
CN112947562A (en) * | 2021-02-10 | 2021-06-11 | 西北工业大学 | Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG |
CN112947562B (en) * | 2021-02-10 | 2021-11-30 | 西北工业大学 | Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG |
CN113232016A (en) * | 2021-04-13 | 2021-08-10 | 哈尔滨工业大学(威海) | Mechanical arm path planning method integrating reinforcement learning and fuzzy obstacle avoidance |
CN113110547A (en) * | 2021-04-21 | 2021-07-13 | 吉林大学 | Flight control method, device and equipment of miniature aviation aircraft |
CN113298368A (en) * | 2021-05-14 | 2021-08-24 | 南京航空航天大学 | Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning |
CN113298368B (en) * | 2021-05-14 | 2023-11-10 | 南京航空航天大学 | Multi-unmanned aerial vehicle task planning method based on deep reinforcement learning |
CN114371720A (en) * | 2021-12-29 | 2022-04-19 | 国家电投集团贵州金元威宁能源股份有限公司 | Control method and control device for unmanned aerial vehicle to track target |
CN114371720B (en) * | 2021-12-29 | 2023-09-29 | 国家电投集团贵州金元威宁能源股份有限公司 | Control method and control device for realizing tracking target of unmanned aerial vehicle |
CN114578834B (en) * | 2022-05-09 | 2022-07-26 | 北京大学 | Target layering double-perception domain-based reinforcement learning unmanned vehicle path planning method |
CN114578834A (en) * | 2022-05-09 | 2022-06-03 | 北京大学 | Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method |
CN115574816B (en) * | 2022-11-24 | 2023-03-14 | 东南大学 | Bionic vision multi-source information intelligent perception unmanned platform |
CN115574816A (en) * | 2022-11-24 | 2023-01-06 | 东南大学 | Bionic vision multi-source information intelligent perception unmanned platform |
Also Published As
Publication number | Publication date |
---|---|
CN109933086B (en) | 2022-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109933086A (en) | UAV environment perception and automatic obstacle avoidance method based on deep Q-learning | |
Xie et al. | Unmanned aerial vehicle path planning algorithm based on deep reinforcement learning in large-scale and dynamic environments | |
CN114384920B (en) | Dynamic obstacle avoidance method based on real-time construction of local grid map | |
Zhang et al. | A novel real-time penetration path planning algorithm for stealth UAV in 3D complex dynamic environment | |
CN105549597B (en) | Dynamic path planning method for unmanned vehicles based on environmental uncertainty | |
CN109870162A (en) | UAV flight path planning method based on a dueling deep learning network | |
CN111780777A (en) | Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning | |
CN111399541B (en) | Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network | |
CN107886750B (en) | Unmanned automobile control method and system based on beyond-visual-range cooperative cognition | |
Wang et al. | A deep reinforcement learning approach to flocking and navigation of uavs in large-scale complex environments | |
CN109445456A (en) | Multi-UAV cluster navigation method | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
Wei et al. | Recurrent MADDPG for object detection and assignment in combat tasks | |
CN114967721B (en) | UAV autonomous path planning and obstacle avoidance strategy method based on DQ-CapsNet | |
CN112114592B (en) | Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle | |
CN112631134A (en) | Intelligent trolley obstacle avoidance method based on fuzzy neural network | |
Makantasis et al. | A deep reinforcement learning driving policy for autonomous road vehicles | |
Chen et al. | Parallel motion planning: Learning a deep planning model against emergencies | |
Ke et al. | Cooperative path planning for air–sea heterogeneous unmanned vehicles using search-and-tracking mission | |
CN116540784A (en) | Unmanned system air-ground collaborative navigation and obstacle avoidance method based on vision | |
Zijian et al. | Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments | |
Zhang et al. | A bionic dynamic path planning algorithm of the micro UAV based on the fusion of deep neural network optimization/filtering and hawk-eye vision | |
Xie et al. | Long and short term maneuver trajectory prediction of UCAV based on deep learning | |
Li et al. | Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning | |
Fu et al. | UAV mission path planning based on reinforcement learning in Dynamic Environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||