CN115903880A - Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning - Google Patents

Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning

Info

Publication number: CN115903880A
Authority: CN (China)
Prior art keywords: experience, unmanned aerial vehicle, obstacle, obs
Application number: CN202211002222.8A
Other languages: Chinese (zh)
Inventors: 祝小平, 王飞, 祝宁华
Current Assignee: Xian Aisheng Technology Group Co Ltd
Original Assignee: Xian Aisheng Technology Group Co Ltd
Application filed by Xian Aisheng Technology Group Co Ltd
Priority to CN202211002222.8A
Publication of CN115903880A
Legal status: Pending

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to an unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning, and provides an autonomous image navigation and obstacle avoidance method based on images and an experience-pool storage mechanism, namely the FRDDM-DQN method. In the invention, an agent meeting the requirements is trained with the FRDDM-DQN method; when a task is executed, the trained agent controls the unmanned aerial vehicle to realize autonomous image navigation and obstacle avoidance. Advantages: by introducing the Faster R-CNN model into the DQN algorithm and converting the recognition results of the Faster R-CNN model, the unmanned aerial vehicle obtains autonomous image navigation and obstacle avoidance capability in complex environments; by introducing the proposed experience-pool storage mechanism into the DQN algorithm, this capability is further improved; and by training in separate stages, the retraining time when the application scenario changes is reduced.

Description

Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning
Technical Field
The invention belongs to the field of unmanned aerial vehicle applications, and relates to an unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning.
Background
An unmanned aerial vehicle is an autonomous or semi-autonomous aircraft with high maneuverability, good concealment, and good adaptability, and can replace humans in some dangerous military tasks. For example, drones may perform land and sea search, rescue, and reconnaissance tasks instead of humans. In some military scenarios, communication between the drone and the ground station may be jammed, so the drone cannot execute tasks under manual control and must have autonomous navigation and obstacle avoidance capability. In addition, to cope with military scenarios in which airborne radar cannot be used or fails, the drone should be able to avoid obstacles using various other sensors. Airborne optoelectronic equipment (such as an onboard camera) is small and light and is therefore widely used on drones, in particular reconnaissance drones and reconnaissance-strike drones. The drone should therefore be able to navigate and avoid obstacles autonomously using images captured by the onboard camera.
Currently, some studies implement image-based navigation and obstacle avoidance. In "An unmanned aerial vehicle visual image algorithm, an obstacle avoidance step and an information fusion processing system thereof" (patent, publication number CN112286230A, publication date 2020.11.13), navigation and obstacle avoidance of a drone are realized through images acquired by an onboard camera. When an obstacle is encountered, the algorithm handles the obstacle avoidance task and the navigation task separately: the obstacle avoidance algorithm computes how to avoid the obstacle, and the navigation task returns to the original route to continue the mission after the obstacle has been avoided. As a result, the action selected by the algorithm may avoid the obstacle but is not necessarily optimal for the navigation task, i.e., the selected action is not globally optimal. In "Deep Learning-Based Monocular Obstacle Avoidance for Unmanned Aerial Vehicle Navigation in Tree Plantations: Faster Region-Based Convolutional Neural Network Approach" (Journal of Intelligent & Robotic Systems (2021) 101:5), a Faster R-CNN model is used to extract obstacles from images and realize obstacle avoidance for unmanned aerial vehicles. However, the obstacle avoidance strategy in that algorithm is built on human experience, and a limited strategy built from human experience is not necessarily optimal in all cases. Therefore, current image-based navigation and obstacle avoidance algorithms suffer from action selection strategies that are not globally optimal.
In addition, the autonomous navigation and obstacle avoidance problem can also be solved with reinforcement-learning-based methods. Such methods need no prior knowledge, and the agent can gradually find a globally optimal strategy suited to the current rules (reward function) during training. In "The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method" (Knowledge-Based Systems (2020) 196), autonomous navigation and obstacle avoidance of an unmanned surface vehicle are realized in a simple simulation environment with a reinforcement learning algorithm, namely a DQN method with a convolutional neural network. In "A robot obstacle avoidance method based on a Double DQN network and deep reinforcement learning" (patent, publication number CN109407676A, publication date 2019.03.01), obstacle avoidance of a robot is realized with a reinforcement learning algorithm, namely the Double DQN algorithm. Because the agent is optimized with experience randomly sampled from the experience pool, when the size of the scene to which these two algorithms are applied changes, for example when the scene in which the unmanned aerial vehicle executes its task is large while the navigation speed of the unmanned aerial vehicle is low relative to the scene size, the proportions of the various types of experience in the experience pool change. This limits further improvement of the agent's training effect in the later stage of training and may even cause the training of the agent to fail.
Therefore, it is of great significance to design a method that can accomplish the autonomous image navigation and obstacle avoidance task of the unmanned aerial vehicle in complex scenes and has a good application effect.
Disclosure of Invention
Technical problem to be solved
In order to overcome the defects of the prior art, the invention provides an unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning. It is mainly intended for unmanned aerial vehicles executing low-altitude flight tasks such as reconnaissance and search and rescue, enabling them to avoid obstacles while executing the task; here, obstacle avoidance mainly means preventing the unmanned aerial vehicle from entering the no-fly zones generated by ground obstacles. Currently, image-based autonomous navigation and obstacle avoidance algorithms have the problem that the output action is not globally optimal, and reinforcement-learning-based autonomous navigation and obstacle avoidance algorithms have the problem of poor training performance in the scenario of the invention. Therefore, the invention provides a method based on improved reinforcement learning; with this method an agent with stronger autonomous image navigation and obstacle avoidance capability can be obtained, and this agent controls the unmanned aerial vehicle to execute the task.
Technical scheme
An unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning is characterized by comprising the following steps:
step 1: unmanned aerial vehicle autonomous image navigation and obstacle avoidance problem modeling;
1. setting a kinematic model of the unmanned aerial vehicle;
$$\dot{x}_u(t) = V\cos\gamma(t)\cos\chi(t),\quad \dot{y}_u(t) = V\cos\gamma(t)\sin\chi(t),\quad \dot{z}_u(t) = V\sin\gamma(t),\quad \dot{\gamma}(t) = u_\gamma,\quad \dot{\chi}(t) = u_\chi$$
wherein P_u = [x_u(t), y_u(t), z_u(t)] is the position of the unmanned aerial vehicle, V is its speed, χ(t) and γ(t) are its heading angle and climb angle respectively, and [u_γ, u_χ] is the control quantity of the unmanned aerial vehicle;
2. arrival definition;
the location of the destination is P_g = [x_g(t), y_g(t), z_g(t)]^T, and the radius of the destination influence area is R_g; the distance D_g between the unmanned aerial vehicle and the destination is defined as
$$D_g = \sqrt{(x_u(t)-x_g(t))^2+(y_u(t)-y_g(t))^2+(z_u(t)-z_g(t))^2}$$
when D_g ≤ R_g, the unmanned aerial vehicle has arrived at the destination;
3. defining collision;
position of the obstacle is P obs =[x obs (t)y obs (t)z obs (t)] T The radius of a no-fly zone generated by the barrier is R obs (ii) a Distance D between unmanned aerial vehicle and obstacle obs Is defined as
Figure BDA0003807897280000032
When D is present obs <R obs When the unmanned aerial vehicle enters a no-fly zone generated by the barrier, the unmanned aerial vehicle collides with the barrier;
4. out-of-bound definition;
when the unmanned aerial vehicle executes the task, the flying range is
P_range = {(x, y, z) | X_min ≤ x ≤ X_max, Y_min ≤ y ≤ Y_max, H_min ≤ z ≤ H_max}
when
$$(x_u(t), y_u(t), z_u(t)) \notin P_{range}$$
the unmanned aerial vehicle is out of bounds;
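The following Python sketch illustrates these definitions: the kinematic update and the arrival, collision, and out-of-bounds checks. The 3-DOF kinematic form, the Euler integration step dt, and all helper names are assumptions made for illustration, not part of the patent text.

```python
import numpy as np

V = 42.0    # speed in m/s (value taken from the embodiment below)
DT = 1.0    # assumed integration step in seconds

def step_kinematics(state, u_gamma, u_chi, dt=DT):
    """One Euler step of the kinematic model; state = (x, y, z, gamma, chi)."""
    x, y, z, gamma, chi = state
    x += V * np.cos(gamma) * np.cos(chi) * dt
    y += V * np.cos(gamma) * np.sin(chi) * dt
    z += V * np.sin(gamma) * dt
    gamma += u_gamma * dt
    chi += u_chi * dt
    return (x, y, z, gamma, chi)

def arrived(p_u, p_g, r_g=2000.0):
    """Arrival: distance to the destination within its influence radius R_g."""
    return np.linalg.norm(np.asarray(p_u) - np.asarray(p_g)) <= r_g

def collided(p_u, p_obs, r_obs=1500.0):
    """Collision: the UAV has entered the no-fly zone of radius R_obs."""
    return np.linalg.norm(np.asarray(p_u) - np.asarray(p_obs)) < r_obs

def out_of_bounds(p_u, x_lim, y_lim, h_lim):
    """Out of bounds: the UAV position leaves the flight range P_range."""
    x, y, z = p_u
    return not (x_lim[0] <= x <= x_lim[1]
                and y_lim[0] <= y <= y_lim[1]
                and h_lim[0] <= z <= h_lim[1])
```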
step 2: image s acquired from onboard camera o Extracting position information of the obstacle;
1. identification of images s by the Faster R-CNN model o The obstacle in (1);
Figure BDA0003807897280000034
wherein obs posImage Is the recognition result of the Faster R-CNN model, and the subscript i represents the ith obstacle recognized by the Faster R-CNN model; x is the number of i,1 ,y i,1 And x i,2 ,y i,2 Respectively representing the coordinates of the upper left corner and the lower right corner of the obstacle;
2. processing the recognition result of the Faster R-CNN model;
obs'_pos = (x'_o, y'_o) = (τ_1 × x_oInImage, -1 × τ_1 × y_oInImage)
U'_pos = (x'_U, y'_U) = (τ_1 × x_image/2, -(τ_1 × y_image + d_c))
wherein x_image and y_image are the dimensions of the image, τ_1 is the scale of the image, and d_c is the distance between the unmanned aerial vehicle and the field-of-view frame;
3. the position information of the obstacle is
$$\theta'_o = \chi' - \arctan\frac{y'_o - y'_U}{x'_o - x'_U}$$
$$D'_{OtoU} = \sqrt{(x'_o - x'_U)^2 + (y'_o - y'_U)^2}$$
wherein θ'_o is the obstacle-unmanned aerial vehicle forward angle, D'_OtoU is the distance between the unmanned aerial vehicle and the obstacle, and χ' is the relative heading angle of the unmanned aerial vehicle in the field-of-view frame;
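As an illustration of this conversion, the Python sketch below maps one Faster R-CNN bounding box to the relative distance and forward angle; the sign convention for θ'_o and the function names are assumptions.

```python
import math

def obstacle_from_bbox(bbox, x_image, y_image, tau1, d_c, chi_prime=math.pi / 2):
    """Convert a detection box (x1, y1, x2, y2) in image pixels into the
    (distance, forward angle) of the obstacle relative to the UAV (step 2)."""
    x1, y1, x2, y2 = bbox
    # centre of the bounding box = relative position of the obstacle in the image
    x_in_img, y_in_img = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # scaled obstacle and UAV positions in the field-of-view frame
    xo, yo = tau1 * x_in_img, -tau1 * y_in_img
    xu, yu = tau1 * x_image / 2.0, -(tau1 * y_image + d_c)
    d_oto_u = math.hypot(xo - xu, yo - yu)
    theta_o = chi_prime - math.atan2(yo - yu, xo - xu)   # assumed sign convention
    return d_oto_u, theta_o

# example with the embodiment's tau1 = 2.5 and d_c = 624 (image size assumed 640x480):
# obstacle_from_bbox((300, 200, 340, 260), 640, 480, 2.5, 624)
```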
and step 3: experience logging mechanism in formulating training agents
1. The agent
The structure of the agent decision network is 29 × 512 × 128 × 6, where 29 is the number of input nodes and 6 is the number of output nodes;
2. agent input s'(t)
Suppose the position of the drone is P_u = [x_u(t), y_u(t), z_u(t)] and the pre-specified destination position is P_g = [x_g(t), y_g(t), z_g(t)]^T; the distance D_g between the drone and the destination and the lead angle θ_g_XOY of the destination relative to the drone in the XOY plane are defined as:
$$D_g = \sqrt{(x_u(t)-x_g(t))^2+(y_u(t)-y_g(t))^2+(z_u(t)-z_g(t))^2}$$
$$\theta_{g\_XOY} = \arctan\frac{y_g(t)-y_u(t)}{x_g(t)-x_u(t)}$$
the input to the agent is: s'(t) = [z_u(t), H_UtoG(t), D_g(t), θ_g_XOY(t), χ(t), s'_o(t)]
wherein H_UtoG(t) is the altitude difference between the unmanned aerial vehicle and the destination, and s'_o = [D'_OtoU, θ'_o];
3. defining the reward function r_U
the arrival reward of the unmanned aerial vehicle is defined as
$$r_{arrived} = \begin{cases} +1, & D_g \le R_g \\ 0, & \text{otherwise} \end{cases}$$
the collision reward of the unmanned aerial vehicle is defined as
$$r_{collision} = \begin{cases} -1, & D_{obs} < R_{obs} \\ 0, & \text{otherwise} \end{cases}$$
the out-of-bounds reward of the unmanned aerial vehicle is defined as
$$r_{out} = \begin{cases} -1, & (x_u, y_u, z_u) \notin P_{range} \\ 0, & \text{otherwise} \end{cases}$$
Thus, the reward function r_U is r_U(s(t+1), a_U) = r_arrived + r_collision + r_out
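A minimal sketch of this reward, reusing the check helpers from the kinematics sketch above (helper names are assumptions):

```python
def reward(p_u_next, p_g, obstacles, x_lim, y_lim, h_lim):
    """r_U(s(t+1), a_U) = r_arrived + r_collision + r_out, evaluated on the new state."""
    r_arrived = 1.0 if arrived(p_u_next, p_g) else 0.0
    r_collision = -1.0 if any(collided(p_u_next, p) for p in obstacles) else 0.0
    r_out = -1.0 if out_of_bounds(p_u_next, x_lim, y_lim, h_lim) else 0.0
    return r_arrived + r_collision + r_out
```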
4. Classification of experiences
In the training process of the agent, the experience RM stored in the experience pool is
RM = {RM(i) | RM(i) = (s_i(a-), a_U, r_U, s_i(a+)), i < RM_Capacity}
wherein i is the index of the current experience in the experience pool; s_i(a-) and s_i(a+) represent the state before and after executing action a_U, respectively; RM_Capacity is the capacity of the experience pool;
in a single experience, the task condition represented by the state s_i(t) is defined as the tuple
[e_o, e_c, e_out, e_g]
obtained from the specified state, where e_o, e_c, e_out, and e_g are parameters used to describe the task condition: e_o describes whether the agent has detected an obstacle; e_c describes whether the agent has collided; e_out describes whether the agent is out of bounds; e_g describes whether the agent has reached the destination;
during the training of the unmanned aerial vehicle, the state s_i(t) of the agent can be divided into the following categories: the state s_safe in which the drone has not detected an obstacle, has not collided, is not out of bounds, and has not reached the destination; the state s_obs in which the drone has detected an obstacle but has not collided, is not out of bounds, and has not reached the destination; the state s_collision in which the drone has collided with an obstacle; the state s_out in which the drone is out of bounds; and the state s_arrival in which the drone has reached the destination; namely:
s_i(t) ∈ {s_safe, s_obs, s_collision, s_out, s_arrival}
$$s_{safe} = \{s_i(t) \mid e_o = 0,\ e_c = 0,\ e_{out} = 0,\ e_g = 0\}$$
$$s_{obs} = \{s_i(t) \mid e_o = 1,\ e_c = 0,\ e_{out} = 0,\ e_g = 0\}$$
$$s_{collision} = \{s_i(t) \mid e_c = 1\}$$
$$s_{out} = \{s_i(t) \mid e_{out} = 1\}$$
$$s_{arrival} = \{s_i(t) \mid e_g = 1\}$$
Thus, any experience RM(i) = (s_i(a-), a_U, r_U, s_i(a+)) is divided into the following categories:
(1) Result experience RE: divided into the arrival experience RE_arrival, the collision experience RE_collision, and the out-of-bounds experience RE_out, namely:
RE = {RE_arrival, RE_collision, RE_out}, RE ∈ RM
RE_arrival = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_arrival} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_arrival}}
RE_collision = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_collision} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_collision}}
RE_out = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_out} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_out}}
(2) Danger experience DE: indicates that the agent has detected an obstacle, namely:
DE = {RM(i) | {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_safe} ∪ {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_obs} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_obs}}
(3) Safety experience SE: intermediate states in which the unmanned aerial vehicle is far from obstacles while navigating to the destination, namely:
SE = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_safe}}
5. treatment of experiences
Set storage ratios p_RE, p_DE, and p_SE for the RE-type, DE-type, and SE-type experiences, respectively;
during training, the generated experiences are classified according to the experience-type definitions and randomly screened according to the storage ratio of their type: part of the experiences are stored in the experience pool and the rest are discarded; after adjustment by the experience-pool storage mechanism, the numbers of the various types of experience in the experience pool RM' satisfy:
|RM'| = p_RE × |RE| + p_DE × |DE| + p_SE × |SE|
wherein |·| is the number of experiences of the specified type in the experience pool;
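The sketch below shows one way to implement this classification and ratio-based storage in Python. The storage ratios are the values given in the embodiment; the state labels, function names, and data layout are assumptions.

```python
import random
from collections import deque

P_STORE = {"SE": 0.16, "DE": 0.05, "RE": 1.0}   # storage ratios from the embodiment

def classify(label_before: str, label_after: str) -> str:
    """Labels are 'safe', 'obs', 'collision', 'out' or 'arrival' (see step 3)."""
    if label_after in ("arrival", "collision", "out"):
        return "RE"   # result experience: last step of an episode
    if "obs" in (label_before, label_after):
        return "DE"   # danger experience: an obstacle is detected
    return "SE"       # safety experience: safe -> safe

def maybe_store(exp, label_before, label_after, pool: deque) -> bool:
    """Store exp with the probability assigned to its type; return True if stored."""
    if random.random() < P_STORE[classify(label_before, label_after)]:
        pool.append(exp)
        return True
    return False

# pool = deque(maxlen=300_000)   # RM_Capacity from the embodiment
```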
and 4, step 4: training agent according to FRDDM-DQN algorithm
1. train a Faster R-CNN model to identify the specified obstacles;
initialize the Faster R-CNN model with the pre-trained VGG16 model;
set the initial learning rate, decay coefficient, and weight decay of the Faster R-CNN model;
acquire images containing obstacles with the unmanned aerial vehicle, and label the positions and types of the obstacles in the images;
train the Faster R-CNN model with the obstacle images and the corresponding labels;
after training, a Faster R-CNN model that identifies the obstacles is obtained;
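For orientation only, this is a minimal fine-tuning sketch using torchvision's Faster R-CNN implementation. It uses the ResNet-50-FPN backbone provided by torchvision rather than the VGG16 initialization described here, and the data loader is a placeholder.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# two classes: background + "obstacle"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

det_optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """loader yields (images, targets); targets = [{'boxes': Tensor[N,4], 'labels': Tensor[N]}]."""
    model.train().to(device)
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss = sum(model(images, targets).values())   # sum of the detection losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```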
2. train the agent based on the output of the Faster R-CNN model;
Step 2.1: initializing relevant parameters
set the reward function r_U, the experience-pool storage mechanism, and the experience storage ratio p = p_SE : p_DE : p_RE;
initialize the experience pool capacity RM_Capacity, the discount factor γ, the maximum number of steps per episode T_e, the maximum number of effective training steps T_t, and the network update frequency C;
initialize the initial exploration rate ε, the minimum exploration rate ε_min, the exploration-rate reset period N, and the exploration-rate reset value ε_reset;
initialize the initial learning rate α, the segmented learning rates [α_1, α_2, α_3, α_4], and the boundaries of the segmented learning rate (n_1, n_2, n_3, n_4);
the decision network of the agent is divided into a prediction network and a target network; initialize the prediction network Q and the target network Q̂ with parameters θ and θ⁻;
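A sketch of the 29 × 512 × 128 × 6 decision network and its target copy in PyTorch; the layer sizes come from the text above, while the activation function and variable names are assumptions.

```python
import copy
import torch.nn as nn

class DecisionNet(nn.Module):
    """29 x 512 x 128 x 6 fully connected Q-network (ReLU activations assumed)."""
    def __init__(self, n_in=29, n_out=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, n_out),
        )

    def forward(self, s):
        return self.net(s)   # one Q-value per discrete action

q_net = DecisionNet()                 # prediction network Q with parameters theta
target_net = copy.deepcopy(q_net)     # target network with parameters theta^-
```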
Step 2.2: initializing a training scene;
initializing the starting point and the destination position of the unmanned aerial vehicle, and initializing the position of the obstacle;
reset the number of steps executed in the episode t_e and the number of effective training steps t_t to 0;
acquiring an initial state s' (t);
Step 2.3: select action a_U according to the state s'(t);
draw a random number p ∈ [0, 1]; if p > ε, select the action according to the prediction network Q; otherwise, select a random action;
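A one-function sketch of this ε-greedy selection (PyTorch; names reused from the network sketch above):

```python
import random
import torch

def select_action(q_net, state, epsilon, n_actions=6):
    """Select a random action with probability epsilon, otherwise the greedy action."""
    if random.random() > epsilon:
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax())
    return random.randrange(n_actions)
```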
step 2.4: performing action a U Then, the prize r is acquired U And the new state s '(t + 1) and obtaining the experience RM = (s' (t), a) generated at the current time step U ,r U ,s′(t+1));
Step 2.5: processing the experience RM;
storing the experience RM into an experience pool or discarding the experience RM according to an experience pool storing mechanism;
step 2.6: updating the learning rate:
the learning rate adopts a segmented fixed learning rate, and the learning rate is updated according to an adjustment strategy;
step 2.7: updating the exploration rate;
updating the exploration rate according to the exploration rate updating strategy;
step 2.8: performing network optimization;
if the network optimization is not executed, the step 2.9 is carried out; otherwise, executing network optimization:
randomly sampling m groups of experiences from an experience pool;
if the experience is an end experience, the predicted Q value y = r of the target network U (ii) a If the experience is not over, the predicted Q value of the target network is set as
Figure BDA0003807897280000071
Calculation of loss L (θ) = E (y-Q (s (t), a) U (t),θ));
Optimizing a parameter theta of the prediction network according to the loss value L (theta) through a gradient descent algorithm;
covering the target network with the parameters of the prediction network every C significant steps, i.e. theta - =θ;
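A sketch of this optimization step in PyTorch, following the target and loss formulas above and reusing q_net/target_net from the network sketch; the optimizer choice (Adam) and the batch layout are assumptions.

```python
import torch
import torch.nn.functional as F

GAMMA = 0.95                                                   # discount factor from the embodiment
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)      # optimizer choice assumed

def optimize(batch):
    """batch: list of (s, a, r, s_next, done) tuples sampled from the experience pool."""
    s, a, r, s_next, done = map(torch.as_tensor, zip(*batch))
    s, s_next = s.float(), s_next.float()
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)   # Q(s(t), a_U(t); theta)
    with torch.no_grad():
        # for end experiences done = 1, so y = r_U
        y = r.float() + GAMMA * target_net(s_next).max(dim=1).values * (1.0 - done.float())
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Called every C effective steps: theta^- <- theta."""
    target_net.load_state_dict(q_net.state_dict())
```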
Step 2.9: update the state s'(t) ← s'(t+1) and the step counter t_e ← t_e + 1;
Step 2.10: judging the training state;
if the number of effective steps t_t ≥ T_t, end the training and save the agent obtained at this moment; otherwise, judge whether at the current time step the unmanned aerial vehicle has reached the destination, has collided, is out of bounds, or has reached the maximum number of steps per episode T_e; if so, end the current episode and go to step 2.2; otherwise, go to step 2.3;
Step 5: control the unmanned aerial vehicle to perform autonomous image navigation and obstacle avoidance through the agent saved in step 4.
In step 2: during the execution of a task, x_image, y_image, and d_c are fixed values, and χ' is constant.
The values and meanings of e_g in step 3, which describes whether the agent has reached the destination, are as follows:
$$e_g = \begin{cases} 1, & \text{the agent has reached the destination} \\ 0, & \text{otherwise} \end{cases}$$
The experience-pool storage mechanism in step 3 is described as follows:
(Algorithm table, shown as an image in the original: each newly generated experience is classified as RE, DE, or SE and stored with probability p_RE, p_DE, or p_SE respectively, otherwise it is discarded.)
The adjustment strategy for updating the learning rate in step 3 is as follows:
(Algorithm table, shown as an image in the original: the learning rate α is switched among the segment values [α_1, α_2, α_3, α_4] according to which interval of the boundaries (n_1, n_2, n_3, n_4) contains the current number of effective training steps.)
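A sketch of a segmented fixed learning-rate schedule of this kind, using the segment values and boundaries listed in the embodiment; the exact switching rule is an assumption, since the original presents it only as an image.

```python
LR_VALUES = [0.001, 0.0005, 0.0001, 0.00005]           # alpha_1 .. alpha_4 (embodiment)
LR_BOUNDS = [0, 100_000, 200_000, 300_000, 350_000]    # n boundaries (embodiment)

def learning_rate(t_effective: int) -> float:
    """Return the fixed learning rate of the segment containing t_effective."""
    for alpha, lo, hi in zip(LR_VALUES, LR_BOUNDS[:-1], LR_BOUNDS[1:]):
        if lo <= t_effective < hi:
            return alpha
    return LR_VALUES[-1]
```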
The exploration-rate update strategy in step 3 is as follows:
(Algorithm table, shown as an image in the original: the exploration rate ε is decayed from its initial value towards the minimum ε_min as the number of effective training steps increases, and is reset to ε_reset every N effective steps.)
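A possible implementation of such an exploration-rate schedule; the constants come from the embodiment, while the multiplicative decay form is an assumption.

```python
EPS_INIT, EPS_MIN, EPS_RESET = 1.0, 0.001, 0.5
RESET_PERIOD = 100_000      # reset period N, in effective steps
DECAY = 0.99995             # assumed decay factor per effective step

def update_epsilon(epsilon: float, t_effective: int) -> float:
    """Decay epsilon on every effective step and reset it every RESET_PERIOD effective steps."""
    if t_effective > 0 and t_effective % RESET_PERIOD == 0:
        return EPS_RESET
    return max(EPS_MIN, epsilon * DECAY)
```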
advantageous effects
The invention provides an unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning, namely the FRDDM-DQN method, which combines image-based perception with an experience-pool storage mechanism. An agent meeting the requirements is trained with the FRDDM-DQN method; during task execution, the trained agent controls the unmanned aerial vehicle to realize autonomous image navigation and obstacle avoidance.
In the FRDDM-DQN method, the images acquired by the onboard camera are first processed with a Faster R-CNN model: obstacle information is extracted from the images and converted according to the kinematic characteristics of the unmanned aerial vehicle. The obstacle information is then added to the input state and training of the agent begins. During training it was found that the numbers of the different types of experience in the experience pool differ widely, which affects the convergence of the agent. Based on the way experience is generated when the unmanned aerial vehicle executes its task, an experience-pool storage mechanism is proposed. During training, the mechanism sets a storage ratio for each type of experience, i.e., only part of each type of experience is stored and the rest is discarded. Small batches of experience are then randomly sampled from the optimized experience pool to optimize the network. Finally, an agent with strong autonomous image navigation and obstacle avoidance capability in complex, unknown scenes is trained.
Advantages:
1. The autonomous image navigation and obstacle avoidance capability of the unmanned aerial vehicle in complex environments is obtained by introducing the Faster R-CNN model into the DQN algorithm and converting the recognition results of the Faster R-CNN model.
In the invention, a Faster R-CNN model is added to the DQN algorithm. Because the Faster R-CNN model has strong image recognition capability, the DQN algorithm combined with it can preliminarily realize image-based autonomous navigation and obstacle avoidance for the unmanned aerial vehicle. In addition, the output of the Faster R-CNN model is converted: the obstacle coordinate information output by the model is converted into angle and distance information according to the kinematic characteristics of the unmanned aerial vehicle, so that the agent trained by the algorithm can control the unmanned aerial vehicle better.
2. The autonomous image navigation and obstacle avoidance capability of the unmanned aerial vehicle in complex environments is improved by introducing the proposed experience-pool storage mechanism into the DQN algorithm.
Because the proposed experience-pool storage mechanism, which classifies the experiences generated during training and assigns a corresponding storage ratio to each experience type, is added to the DQN algorithm, the proposed method trains an agent with stronger autonomous image navigation and obstacle avoidance capability than the plain DQN algorithm.
3. The retraining time when the application scenario changes is reduced by the two-stage training method.
The training process is executed in two separate parts: training the Faster R-CNN model for image recognition, and training FRDDM-DQN based on the output of the Faster R-CNN model. This reduces the retraining time when the application scenario changes: only the Faster R-CNN model needs to be retrained to recognize the specified obstacles, and FRDDM-DQN does not need to be retrained.
Drawings
FIG. 1: three-dimensional task scene of the unmanned aerial vehicle.
FIG. 2: two-dimensional task scene of the unmanned aerial vehicle.
FIG. 3: relative positional relationship of the unmanned aerial vehicle, the obstacle, and the onboard camera field-of-view frame.
FIG. 4: flow chart of the FRDDM-DQN method provided by the invention.
FIG. 5: obstacle recognition results of the Faster R-CNN model.
FIG. 6: arrival-rate curves of the proposed FRDDM-DQN method and the FR-DQN method during training.
FIG. 7: test results, at different training stages, of agents trained with the proposed FRDDM-DQN method and the FR-DQN method.
FIG. 8: trajectories of agents trained with the FRDDM-DQN method and the FR-DQN method in an environment containing multiple static obstacles.
FIG. 9: trajectories of agents trained with the FRDDM-DQN method and the FR-DQN method in an environment containing multiple dynamic obstacles.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
step 1: modeling autonomous image navigation and obstacle avoidance problems of the unmanned aerial vehicle;
in order to realize the autonomous image navigation and obstacle avoidance functions of the unmanned aerial vehicle, the problem is defined first. Because the method provided by the invention is a reinforcement learning algorithm, the core elements of reinforcement learning (state, action, and reward function) are also defined;
step 1-1: the unmanned aerial vehicle autonomous image navigation and obstacle avoidance problem is defined, and the scene is shown in fig. 1 and fig. 2;
in the invention, the task executed by the unmanned aerial vehicle is to navigate from a starting point to a specified destination as quickly as possible while avoiding the no-fly zones generated by ground obstacles during the task;
the position of the unmanned aerial vehicle is P_u = [x_u(t), y_u(t), z_u(t)]^T, its speed is fixed at V, and its heading angle and climb angle are χ(t) and γ(t), respectively. To complete the task well, the unmanned aerial vehicle should fly within the altitude range (H_min, H_max), where H_min and H_max are the minimum and maximum flight altitudes of the unmanned aerial vehicle;
the location of the destination is P_g = [x_g(t), y_g(t), z_g(t)]^T; the distance D_g between the unmanned aerial vehicle and the destination and the lead angle θ_g_XOY of the destination relative to the drone in the XOY plane are defined as:
$$D_g = \sqrt{(x_u(t)-x_g(t))^2+(y_u(t)-y_g(t))^2+(z_u(t)-z_g(t))^2}$$
$$\theta_{g\_XOY} = \arctan\frac{y_g(t)-y_u(t)}{x_g(t)-x_u(t)}$$
the influence area of the destination is defined by the radius R_g; when D_g ≤ R_g, the unmanned aerial vehicle is considered to have arrived at the destination;
the position of the obstacle is P_obs = [x_obs(t), y_obs(t), z_obs(t)]^T, and the radius of the no-fly zone generated by the obstacle is R_obs. The distance D_obs between the unmanned aerial vehicle and the obstacle is defined as
$$D_{obs} = \sqrt{(x_u(t)-x_{obs}(t))^2+(y_u(t)-y_{obs}(t))^2+(z_u(t)-z_{obs}(t))^2}$$
when D_obs < R_obs, the unmanned aerial vehicle has entered the no-fly zone generated by the obstacle and is considered to have collided with the obstacle;
step 1-2: setting a kinematics model of the unmanned aerial vehicle;
$$\dot{x}_u(t) = V\cos\gamma(t)\cos\chi(t),\quad \dot{y}_u(t) = V\cos\gamma(t)\sin\chi(t),\quad \dot{z}_u(t) = V\sin\gamma(t),\quad \dot{\gamma}(t) = u_\gamma,\quad \dot{\chi}(t) = u_\chi$$
where a_U = [u_γ, u_χ] is the control quantity of the agent;
Step 1-3: setting the state s (t) of the agent;
the available state information of the unmanned aerial vehicle is: the state of the drone s_U, the destination information s_g, and the image information s_o. The state of the drone is acquired by GPS and a gyroscope, s_U = [x_u(t), y_u(t), z_u(t), V, γ(t), χ(t)]; the destination information s_g is pre-specified before the task is executed, s_g = [x_g(t), y_g(t), z_g(t)]; the image information s_o is acquired by the onboard camera and is mainly used to guide the unmanned aerial vehicle to avoid obstacles. Considering the kinematics of the drone, the input state s(t) is defined as
s(t) = [z_u(t), H_UtoG(t), D_g(t), θ_g_XOY(t), χ(t), s_o(t)]
wherein H_UtoG(t) is the altitude difference between the drone and the destination;
step 1-4: setting the action a_U(t) of the agent;
according to the input state s(t) defined in the previous step, the action is defined as a_U(t) = [u_γ, u_χ], where u_χ and u_γ control the heading angle and the climb angle, respectively;
step 1-5: setting the reward function r_U of the agent;
during training, the agent selects action a_U according to the current state s(t); after executing a_U, it reaches a new state s(t+1) and obtains a reward value from the environment. The reward value is the output of the reward function r_U.
In order for the agent to know that it should navigate to the destination, the reward concerning whether the drone has arrived at the destination is set to
$$r_{arrived} = \begin{cases} +1, & D_g \le R_g \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone arrives at the destination, the reward value is set to +1, otherwise 0. During navigation, in order for the agent to know that a collision should be avoided when approaching an obstacle, the reward concerning collisions is set to
$$r_{collision} = \begin{cases} -1, & D_{obs} < R_{obs} \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone collides with an obstacle, the reward is set to -1, otherwise 0. In addition, to guarantee the mission effect, the unmanned aerial vehicle should fly within the altitude range (H_min, H_max); to improve the training efficiency of the agent and to prevent it from detouring around the outermost side of the obstacle area, a horizontal boundary is set in the XOY plane. In order for the drone to know that it should navigate within the boundary, the reward for going out of bounds is set to
$$r_{out} = \begin{cases} -1, & (x_u, y_u, z_u) \notin P_{range} \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone is out of bounds, the reward value is set to -1, otherwise 0. The method provided by the invention is a reinforcement learning algorithm, whose essence is to let the agent find the optimal strategy in the process of interacting with the environment. Therefore, no other reward is set for intermediate states. In summary, the reward function is set to
r_U(s(t+1), a_U) = r_arrived + r_collision + r_out
Step 2: extracting obstacle information from image information acquired by a airborne camera;
in the state s (t) defined in steps 1-3, s o Is the image information collected by the onboard camera. To get s o The method is used for guiding the intelligent body to avoid the obstacle, and obstacle information in the image needs to be extracted through a Faster R-CNN model. For more efficient training, further processing of the output information of the Faster R-CNN model is required.
Step 2-1: identifying an obstacle in the image through a Faster R-CNN model;
in the input state s(t) of the agent, the only information about obstacles is the image information s_o collected by the onboard camera of the unmanned aerial vehicle. The obstacles possibly present in the image are identified with a Faster R-CNN model, whose output is
$$\{(b_i,\ obs_{posImage,i})\},\quad b_i = (x_{i,1}, y_{i,1}, x_{i,2}, y_{i,2}),\quad obs_{posImage,i} = \left(\frac{x_{i,1}+x_{i,2}}{2},\ \frac{y_{i,1}+y_{i,2}}{2}\right)$$
wherein the subscript i denotes the i-th anchor identified by the Faster R-CNN model; b_i is the bounding box (rectangular box) of the i-th anchor, and x_{i,1}, y_{i,1} and x_{i,2}, y_{i,2} are the coordinates of the upper-left and lower-right corners of the anchor bounding box, as shown in the figure; obs_posImage is the coordinate of the centre of the anchor bounding box, i.e., the relative coordinates of the centre of the obstacle in the image, and is also the recognition result of the Faster R-CNN model.
Step 2-2: processing the recognition result of the Faster R-CNN model, as shown in FIG. 3;
Because the recognition result obs_posImage of the Faster R-CNN model gives the relative coordinates of the obstacle in the image, the agent cannot directly use this information to avoid the obstacle, so the information is processed further. obs_posImage is converted to the relative coordinates obs'_pos between the drone and the obstacle, i.e.
obs'_pos = (x'_o, y'_o) = (τ_1 × x_oInImage, -1 × τ_1 × y_oInImage)
wherein x_image and y_image are the dimensions of the image and τ_1 is the scale of the image; after the onboard camera is mounted and fixed to the drone, these can be regarded as fixed values during the execution of the mission. In addition, the positional relationship between the drone and the onboard camera field-of-view frame is fixed. Therefore, in the relative positional relationship between the drone and the field-of-view frame, the position of the drone can be regarded as fixed, with coordinates
U'_pos = (x'_U, y'_U) = (τ_1 × x_image/2, -(τ_1 × y_image + d_c))
wherein d_c is the distance between the drone and the field-of-view frame. After the camera is fixed to the drone, x_image, y_image, and d_c can be regarded as fixed values during the execution of the mission. Because the agent controls the unmanned aerial vehicle in the current task scenario, inputting coordinate information directly is not conducive to the convergence of the agent. Therefore, the positional relationship between the drone and the obstacle in the image is converted into the obstacle-drone forward angle θ'_o and the distance D'_OtoU between the drone and the obstacle:
$$\theta'_o = \chi' - \arctan\frac{y'_o - y'_U}{x'_o - x'_U}$$
$$D'_{OtoU} = \sqrt{(x'_o - x'_U)^2 + (y'_o - y'_U)^2}$$
where χ' is the relative heading angle of the drone in the field-of-view frame; this value is constant after the onboard camera is fixed to the drone.
Finally, the new state obtained after this processing is
s'(t) = [z_u(t), H_UtoG(t), D_g(t), θ_g_XOY(t), χ(t), s'_o(t)]
wherein s'_o = [D'_OtoU, θ'_o].
Step 3: screening the experience stored in the experience pool through the experience-pool storage mechanism;
the agent is trained with the new state s'(t) containing the obstacle information obtained in step 2. During training, although the number of experiences in the experience pool is large, the various types of experience are not evenly distributed. The experience-pool storage mechanism provided by the invention analyses each type of experience in the experience pool and sets an appropriate storage ratio for it; only part of each type of experience is stored in the experience pool and the rest is discarded. In this way, the number of each type of experience in the experience pool is rebalanced, which helps improve the training effect of the agent.
Step 3-1: defining a single experience;
in the training process of the agent, the experience RM stored in the experience pool is
RM = {RM(i) | RM(i) = (s_i(a-), a_U, r_U, s_i(a+)), i < RM_Capacity}
wherein i is the index of the current experience in the experience pool; s_i(a-) and s_i(a+) represent the state before and after executing action a_U, respectively; RM_Capacity is the capacity of the experience pool. The state s_i(a+) of the i-th experience is the state s_{i+1}(a-) of the (i+1)-th experience.
In a single experience, the task condition represented by the state s_i(t) can be defined as the tuple
[e_o, e_c, e_out, e_g]
obtained from the specified state, where e_o, e_c, e_out, and e_g are parameters used to describe the task condition: e_o describes whether the agent has detected an obstacle; e_c describes whether the agent has collided; e_out describes whether the agent is out of bounds; e_g describes whether the agent has reached the destination. Each parameter takes the value 1 when the corresponding condition holds and 0 otherwise.
during the training of the unmanned aerial vehicle, the state s_i(t) of the agent can be divided into the following categories: the state s_safe in which the drone has not detected an obstacle, has not collided, is not out of bounds, and has not reached the destination; the state s_obs in which the drone has detected an obstacle but has not collided, is not out of bounds, and has not reached the destination; the state s_collision in which the drone has collided with an obstacle; the state s_out in which the drone is out of bounds; and the state s_arrival in which the drone has reached the destination. Namely:
s_i(t) ∈ {s_safe, s_obs, s_collision, s_out, s_arrival}
$$s_{safe} = \{s_i(t) \mid e_o = 0,\ e_c = 0,\ e_{out} = 0,\ e_g = 0\}$$
$$s_{obs} = \{s_i(t) \mid e_o = 1,\ e_c = 0,\ e_{out} = 0,\ e_g = 0\}$$
$$s_{collision} = \{s_i(t) \mid e_c = 1\}$$
$$s_{out} = \{s_i(t) \mid e_{out} = 1\}$$
$$s_{arrival} = \{s_i(t) \mid e_g = 1\}$$
step 3-2: classifying the experience RM (i) and setting an experience storing ratio;
Any experience RM(i) = (s_i(a-), a_U, r_U, s_i(a+)) can be classified into the following categories:
(1) The result experience RE: this type of experience is generated in the last step of each episode. It can be further divided into the arrival experience RE_arrival, the collision experience RE_collision, and the out-of-bounds experience RE_out, namely:
RE = {RE_arrival, RE_collision, RE_out}, RE ∈ RM
RE_arrival = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_arrival} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_arrival}}
RE_collision = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_collision} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_collision}}
RE_out = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_out} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_out}}
All three end experiences in the result experience RE have a non-zero reward value, through which the agent can learn an effective policy: through the positive reward the agent learns that it should navigate to the destination, and through the negative rewards it learns that collisions and going out of bounds should be avoided. In addition, since end experiences are generated only at the end of each episode, the number of result experiences RE in the experience pool is small. Therefore, a higher storage ratio p_RE should be set for RE-type experiences.
(2) The danger experience DE: this type of experience indicates that the agent has detected an obstacle, i.e., the unmanned aerial vehicle is approaching the obstacle. At this moment the agent must decide, according to the current state, whether to avoid the obstacle and, if so, which obstacle avoidance action to execute after taking the destination information into account. The danger experience DE is defined as
DE = {RM(i) | {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_safe} ∪ {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_obs} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_obs}}
where the experience {RM(i) | s_i(a-) ∈ s_obs, s_i(a+) ∈ s_safe} indicates that an obstacle was detected before the action and not after it; the experience {RM(i) | s_i(a-) ∈ s_safe, s_i(a+) ∈ s_obs} indicates that no obstacle was detected before the action and an obstacle was detected after it; and the experience {RM(i) | s_i(a-) ∈ s_obs, s_i(a+) ∈ s_obs} indicates that obstacles were detected both before and after the action;
Through RE-type experience the agent learns that collisions with obstacles should be avoided when approaching them; through DE-type experience the agent learns how to avoid collisions when approaching an obstacle. This type of experience is generated whenever the drone is near an obstacle during training. As training progresses and the agent understands the rules better, it navigates between obstacles more often, so the number of this type of experience in the final experience pool is also large. Therefore, a lower storage ratio p_DE is set for DE-type experiences.
(3) The safety experience SE: this type of experience is an intermediate state in which the drone is far from any obstacle during its journey to the destination, i.e., SE = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_safe}}
This type of experience helps the agent learn how to navigate to the destination so that it eventually reaches it (and obtains a positive reward). Because the scene in which the drone performs the task is large, this type of experience is the most numerous. Therefore, a lower storage ratio p_SE should be set for the safety experience SE.
step 3-3: processing the experience generated in the training process with the set storage ratios to obtain the optimized experience pool RM';
based on the above analysis, different storage ratios are set for the various types of experience. During training, part of each type of experience is stored in the experience pool according to its storage ratio and the rest is discarded. After adjustment by the experience-pool storage mechanism, the numbers of the various types of experience in the experience pool satisfy:
|RM'| = p_RE × |RE| + p_DE × |DE| + p_SE × |SE|
where | is the number of specified experiences in the experience pool.
Step 4: training the agent that realizes autonomous image navigation and obstacle avoidance of the unmanned aerial vehicle with the FRDDM-DQN method;
in step 2, the image s_o used for obstacle avoidance in the agent's input state was processed and a new state s'(t) that can help the agent avoid obstacles was obtained; in step 3, the experience-pool storage mechanism was proposed and the storage rules for the experiences generated in training were specified. In this step, the agent is trained with the optimized DQN algorithm, namely the FRDDM-DQN method, whose structure is shown in FIG. 4.
The training of the FRDDM-DQN method is divided into two parts: training a Faster R-CNN model to identify the specified obstacles, and training the FRDDM-DQN method based on the output of the Faster R-CNN model;
step 4-1: training the Faster R-CNN model to identify the specified obstacles;
initialize the Faster R-CNN model with the pre-trained VGG16 model;
set the initial learning rate, decay coefficient, and weight decay of the Faster R-CNN model;
acquire images containing obstacles with the unmanned aerial vehicle, and label the positions and types of the obstacles in the acquired images;
train the Faster R-CNN model with the obstacle images and the corresponding labels;
after training, a Faster R-CNN model capable of identifying the obstacles is obtained; the recognition result is shown in FIG. 5;
step 4-2: training FRDDM-DQN based on the output of the Faster R-CNN model;
step 4-2-1: establishing the decision network of the agent in the FRDDM-DQN method and initializing the relevant parameters;
build the decision network of the agent, set the reward function r_U, the experience-pool storage mechanism, and the experience storage ratio p = p_SE : p_DE : p_RE;
initialize the experience pool capacity RM_Capacity, the learning rate α, the discount factor γ, the exploration rate ε, the minimum exploration rate ε_min, the exploration-rate reset period N, the exploration-rate reset value ε_reset, the maximum number of steps per episode T_e, the maximum number of effective training steps T_t, and the network update frequency C;
in the training of the FRDDM-DQN method, the decision network of the agent is divided into a prediction network and a target network; initialize the prediction network Q and the target network Q̂ with parameters θ and θ⁻;
Step 4-2-2: initializing a training scene;
initializing the starting point and the destination position of the unmanned aerial vehicle, and initializing the position of an obstacle;
reset the number of steps executed in the episode t_e and the number of effective training steps t_t to 0;
acquiring an initial state s' (t);
step 4-2-3: select action a_U according to the action selection strategy and the state s'(t);
step 4-2-4: after executing action a_U, obtain the reward r_U and the new state s'(t+1), and form the experience RM = (s'(t), a_U, r_U, s'(t+1)) generated at the current time step;
step 4-2-5: screen the experience according to the experience storage mechanism proposed in step 3;
if the experience is stored in the experience pool, the time step is an effective step, t_t ← t_t + 1; if the experience is not stored in the experience pool, it is discarded, the time step is not an effective step, and t_t ← t_t;
Step 4-2-6: changing the learning rate alpha according to a learning rate adjusting strategy;
step 4-2-7: changing the exploration rate epsilon according to an exploration rate adjusting strategy;
step 4-2-8: judging whether the current time step executes network optimization or not;
if the network optimization is not executed, the step 4-2-10 is carried out; if network optimization is performed:
randomly sample m groups of experiences from the experience pool RM' and compute on them:
if the experience is an end experience, the target Q value is y = r_U; if the experience is not an end experience, the target Q value is
$$y = r_U + \gamma \max_{a'} \hat{Q}(s'(t+1), a', \theta^-)$$
calculate the loss L(θ) = E[(y - Q(s(t), a_U(t), θ))²];
optimize the parameters θ of the prediction network by gradient descent according to the loss value L(θ);
every C effective steps, overwrite the target network with the parameters of the prediction network, i.e. θ⁻ ← θ;
Step 4-2-9: update the state and the step counter: s'(t) ← s'(t+1), t_e ← t_e + 1;
Step 4-2-10: judging the training state;
if the number of effective steps t_t ≥ T_t, go to step 4-2-11; otherwise, judge whether at the current time step the unmanned aerial vehicle has reached the destination, has collided, or is out of bounds; if so, end the current episode and go to step 4-2-2; otherwise, judge whether the number of steps executed in the episode t_e has reached the maximum number of steps per episode T_e; if t_e ≥ T_e, go to step 4-2-2; otherwise, go to step 4-2-3 and continue training;
Step 4-2-11: after training is finished, save the currently trained network; this network is the decision network of the trained agent; go to step 5;
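Putting the pieces together, a skeleton of this training loop might look as follows; it reuses the helpers sketched earlier (select_action, maybe_store, learning_rate, update_epsilon, optimize, sync_target, q_net, optimizer) and assumes an environment interface env.reset()/env.step() that also returns the state label used for experience classification.

```python
import random

def set_lr(opt, lr):
    for group in opt.param_groups:
        group["lr"] = lr

def train(env, pool, T_t=350_000, T_e=600, C=3000, m=64):
    epsilon, t_t = EPS_INIT, 0
    while t_t < T_t:
        state, label = env.reset()                          # step 4-2-2: new random scene
        for _ in range(T_e):
            a = select_action(q_net, state, epsilon)        # step 4-2-3
            next_state, r, done, next_label = env.step(a)   # step 4-2-4
            exp = (state, a, r, next_state, done)
            if maybe_store(exp, label, next_label, pool):   # step 4-2-5: effective step
                t_t += 1
                set_lr(optimizer, learning_rate(t_t))       # step 4-2-6
                epsilon = update_epsilon(epsilon, t_t)      # step 4-2-7
                if len(pool) >= m:
                    optimize(random.sample(list(pool), m))  # step 4-2-8
                if t_t % C == 0:
                    sync_target()
            state, label = next_state, next_label           # step 4-2-9
            if done or t_t >= T_t:
                break
    return q_net
```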
Step 5: the agent trained and saved in step 4-2-11 controls the unmanned aerial vehicle to perform autonomous image navigation and obstacle avoidance;
the trained agent is tested in different scenarios: the starting point and destination of the unmanned aerial vehicle are generated randomly, the initial positions of the obstacles are generated randomly, and the motion directions of the dynamic obstacles are generated randomly;
the specific embodiment is as follows:
step 1: unmanned aerial vehicle autonomous image navigation and obstacle avoidance problem modeling;
step 1-1: defining the unmanned aerial vehicle autonomous image navigation and obstacle avoidance problem;
the position of the unmanned aerial vehicle is P_u = [x_u(t), y_u(t), z_u(t)]^T, its heading angle and climb angle are χ(t) and γ(t), respectively, and its speed is fixed at V = 42 m/s;
the location of the destination is P_g = [x_g(t), y_g(t), z_g(t)]^T, the radius of the destination influence area is R_g = 2000 m, the lead angle of the destination relative to the drone in the XOY plane is θ_g_XOY, and the distance between the unmanned aerial vehicle and the destination is D_g; when D_g ≤ R_g, the unmanned aerial vehicle is considered to have arrived at the destination;
the position of the obstacle is P_obs = [x_obs(t), y_obs(t), z_obs(t)]^T, and the radius of the no-fly zone generated by the obstacle is R_obs = 1500 m; the distance between the unmanned aerial vehicle and the obstacle is D_obs; when D_obs < R_obs, the unmanned aerial vehicle has entered the no-fly zone generated by the obstacle and is considered to have collided with the obstacle;
step 1-2: setting a kinematic model of the unmanned aerial vehicle;
$$\dot{x}_u(t) = V\cos\gamma(t)\cos\chi(t),\quad \dot{y}_u(t) = V\cos\gamma(t)\sin\chi(t),\quad \dot{z}_u(t) = V\sin\gamma(t),\quad \dot{\gamma}(t) = u_\gamma,\quad \dot{\chi}(t) = u_\chi$$
where a_U = [u_γ, u_χ] is the control quantity of the agent;
Step 1-3: setting the state s (t) of the intelligent agent;
the available state information of the unmanned aerial vehicle is as follows: state s of the drone U Destination information s g Image information s o . Wherein the state of the drone is acquired by a GPS and a gyroscope, s U =[x u (t),y u (t),z u (t),V,γ(t),χ(t)](ii) a Destination information s g Is pre-specified before execution of the task, s g =[x g (t),y g (t),z g (t)](ii) a Image information s o The image is acquired by an airborne camera and is mainly used for guiding the unmanned aerial vehicle to avoid obstacles;
considering the kinematics of the drone, the input state s (t) is defined as
s(t)=[z u (t),H UtoG (t),D g (t),θ g_XOY (t),χ(t),s o (t)]
Wherein H UtoG (t) is the altitude difference of the drone and the destination;
step 1-4: setting the action a_U(t) of the agent;
according to the input state s(t) defined in the previous step, the action is defined as a_U(t) = [u_γ, u_χ], where u_χ and u_γ control the heading angle and the climb angle, respectively;
step 1-5: setting the reward function r_U of the agent;
in order for the agent to know that it should navigate to the destination, the reward concerning whether the drone has arrived at the destination is set to
$$r_{arrived} = \begin{cases} +1, & D_g \le R_g \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone arrives at the destination, the reward value is set to +1, otherwise 0.
During navigation, in order for the agent to know that a collision should be avoided when approaching an obstacle, the reward concerning collisions is set to
$$r_{collision} = \begin{cases} -1, & D_{obs} < R_{obs} \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone collides with an obstacle, the reward is set to -1, otherwise 0.
In order for the drone to know that it should navigate within the boundary, the reward for going out of bounds is set to
$$r_{out} = \begin{cases} -1, & (x_u, y_u, z_u) \notin P_{range} \\ 0, & \text{otherwise} \end{cases}$$
That is, when the drone is out of bounds, the reward value is set to -1, otherwise 0.
The method provided by the invention is a reinforcement learning algorithm, whose essence is to let the agent independently find the optimal strategy while interacting with the environment. Therefore, no other reward is set for intermediate states. In summary, the reward function is set to
r_U(s(t+1), a_U) = r_arrived + r_collision + r_out
Step 2: training an intelligent body for realizing unmanned aerial vehicle autonomous image navigation and obstacle avoidance through an FRDDM-DQN method;
step 2-1: for image information s in input state s (t) o Carrying out treatment;
step 2-1-1: training fast-R CNN model recognition image information s o The obstacle specified in (1);
initializing a Faster-R CNN model through a pre-training model VGG 16;
setting the initial learning rate of the Faster-R CNN model to 0.001, the delay coefficient to 0.1 and the delay weight to 0.0005;
acquiring an image containing a barrier by an unmanned aerial vehicle, and marking the position of the barrier and the type of the barrier in the acquired image;
training a Faster-R CNN model through an image containing an obstacle and corresponding labeling information;
obtaining a fast-R CNN model capable of identifying the obstacle after the training is finished, wherein the output of the model is obs posImage . As a result of recognition, as shown in FIG. 5, the gray dots are the obstacles recognized by the Faster-R CNN model, the letters and numbers above the gray dots mean that the type of the currently recognized object is an obstacle, and the object is an obstacleThe probability of an object;
step 2-1-2: converting the output obs_posImage of the Faster R-CNN model;
obs_posImage is converted to the relative coordinates obs'_pos between the drone and the obstacle, i.e.
obs'_pos = (x'_o, y'_o) = (τ_1 × x_oInImage, -1 × τ_1 × y_oInImage)
wherein τ_1 is the scale of the image; in this embodiment τ_1 = 2.5.
The relative position U'_pos of the unmanned aerial vehicle is calculated from the positional relationship between the unmanned aerial vehicle and the onboard camera field-of-view frame:
U'_pos = (x'_U, y'_U) = (τ_1 × x_image/2, -(τ_1 × y_image + d_c))
wherein d_c is the distance between the unmanned aerial vehicle and the field-of-view frame; in this embodiment d_c = 624.
The positional relationship between the unmanned aerial vehicle and the obstacle in the image is converted into the obstacle-drone forward angle θ'_o and the distance D'_OtoU between the drone and the obstacle:
$$\theta'_o = \chi' - \arctan\frac{y'_o - y'_U}{x'_o - x'_U}$$
$$D'_{OtoU} = \sqrt{(x'_o - x'_U)^2 + (y'_o - y'_U)^2}$$
wherein χ' is the relative heading angle of the unmanned aerial vehicle in the field-of-view frame; in this embodiment χ' = 90°, i.e., the onboard camera is mounted facing the front of the unmanned aerial vehicle;
after this processing, s'_o = [D'_OtoU, θ'_o] and the new state is
s'(t) = [z_u(t), H_UtoG(t), D_g(t), θ_g_XOY(t), χ(t), s'_o(t)]
Step 2-2: training FRDDM-DQN based on the output value of the Faster-R CNN model;
step 2-2-1: establishing a decision network of an agent in an FRDDM-DQN method and initializing related parameters;
in this embodiment, the decision network structure of the agent is 29 × 512 × 128 × 6, where 29 is the number of input nodes and 6 is the number of output nodes; the reward function is set to r_U(s(t+1), a_U) = r_arrived + r_collision + r_out; the relevant parameters are set as follows:
hyper-parameter Value of
Number of samples m: 64
Experience pool capacity RM_Capacity: 300,000
Attenuation coefficient γ: 0.95
Segmented learning rate α = (α_1, α_2, α_3, α_4): 0.001, 0.0005, 0.0001, 0.00005
Boundaries of the segmented learning rate [n_1, n_2, n_3, n_4]: [0, 100000, 200000, 300000, 350000]
Initial exploration rate ε_0: 1.0
Minimum exploration rate ε_min: 0.001
Exploration rate reset period N: 100000
Exploration rate reset value ε_reset: 0.5
Maximum number of steps per episode T_e: 600
Maximum number of effective steps T_t: 350,000
Target network update frequency C: 3000
Data storage ratio p = p_SE : p_DE : p_RE: [0.16 : 0.05 : 1]
In the training of the FRDDM-DQN method, the decision network of the agent is divided into a prediction network and a target network. Initialize the prediction network Q with parameters θ and the target network Q̂ with parameters θ⁻.
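A minimal sketch of these two networks follows, assuming a PyTorch implementation (the framework is not specified here) and ReLU activations (an assumption); the layer sizes follow the 29 × 512 × 128 × 6 structure given above.

```python
import copy
import torch.nn as nn

class DecisionNet(nn.Module):
    """29 x 512 x 128 x 6 fully connected decision network."""
    def __init__(self, n_in: int = 29, n_out: int = 6):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, n_out),          # one Q value per discrete action
        )

    def forward(self, s):
        return self.layers(s)

prediction_net = DecisionNet()               # parameters theta
target_net = copy.deepcopy(prediction_net)   # parameters theta^-, initially equal to theta
```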
Step 2-2-2: initializing a training scene;
initialize the start point and destination position of the unmanned aerial vehicle, and initialize the positions of the obstacles;
reset the number of executed steps of the current episode t_e and the number of effective training steps t_t to 0;
acquire the initial state s′(t);
Step 2-2-3: select an action a_U according to the action selection policy and the state s′(t); in this embodiment, the ε-greedy algorithm is adopted as the action selection strategy;
Step 2-2-4: execute the action a_U, acquire the reward r_U and the new state s′(t+1), and obtain the experience RM = (s′(t), a_U, r_U, s′(t+1)) generated at the current time step;
Step 2-2-5: screen the experience with the experience-storing mechanism provided by the invention; the screening process is as follows:
[Algorithm image: the experience-storing mechanism classifies the newly generated experience as a safety (SE), danger (DE), or result (RE) experience and keeps it with probability equal to the corresponding storage ratio p_SE, p_DE, or p_RE, returning the flag F_storage; not reproduced]
If the experience is stored in the experience pool (i.e. F_storage = True), the time step is an effective step and t_t ← t_t + 1; if the experience is not stored in the experience pool (i.e. F_storage = False), it is discarded, the time step is not an effective step, and t_t ← t_t.
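The following is a minimal, illustrative sketch of this experience-storing mechanism (the patent gives no code). The storage ratios come from the parameter table; the state flags used by classify (obstacle_seen, collided, out, arrived) are hypothetical stand-ins for the state categories s_safe, s_obs, s_collision, s_out and s_arrival defined later in the claims.

```python
import random
from collections import deque

STORAGE_RATIO = {"SE": 0.16, "DE": 0.05, "RE": 1.0}   # p_SE : p_DE : p_RE from the table
replay_memory = deque(maxlen=300_000)                 # RM_Capacity

def classify(experience) -> str:
    """Map an experience to SE / DE / RE using simplified state flags."""
    s_before, _, _, s_after = experience[:4]
    if s_after.get("collided") or s_after.get("out") or s_after.get("arrived"):
        return "RE"                                   # result experience
    if s_before.get("obstacle_seen") or s_after.get("obstacle_seen"):
        return "DE"                                   # danger experience
    return "SE"                                       # safety experience

def maybe_store(experience) -> bool:
    """Return F_storage: True if the experience was kept (an effective step)."""
    if random.random() < STORAGE_RATIO[classify(experience)]:
        replay_memory.append(experience)
        return True
    return False
```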
Step 2-2-6: changing the learning rate alpha according to a learning rate adjusting strategy;
in this embodiment, the learning rate is a fixed learning rate in segments, and the adjustment strategy is as follows:
during training, the learning rate is taken as α_1 while t_t ≤ n_1, as α_2 while n_1 < t_t ≤ n_2, as α_3 while n_2 < t_t ≤ n_3, and as α_4 while n_3 < t_t ≤ n_4.
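A minimal sketch of this segmented fixed learning rate, using the values from the parameter table; reading the five listed boundary values as a lower bound 0 followed by n_1..n_4 is an assumption.

```python
ALPHAS = [0.001, 0.0005, 0.0001, 0.00005]          # alpha_1 .. alpha_4
BOUNDARIES = [100_000, 200_000, 300_000, 350_000]  # n_1 .. n_4 (assumed reading of the table)

def learning_rate(t_t: int, is_training: bool, alpha: float) -> float:
    """Piecewise-constant learning rate as a function of effective steps t_t."""
    if is_training:
        for a, n in zip(ALPHAS, BOUNDARIES):
            if t_t <= n:
                return a
    return alpha   # unchanged outside training or beyond the last boundary
```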
step 2-2-7: changing the exploration rate epsilon according to an exploration rate adjusting strategy;
because the invention provides the concept of effective steps, the search rate epsilon changing mode in the epsilon-greedy algorithm is optimized as follows:
[Algorithm image: the exploration rate ε decays as the number of effective steps t_t increases, is reset to ε_reset every N effective steps, and is never allowed to fall below ε_min; not reproduced]
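A sketch of the exploration-rate adjustment under stated assumptions: the exact schedule is given above only as an algorithm image, so a linear decay over effective steps is assumed here; the reset period, reset value, and minimum come from the parameter table.

```python
EPS_0, EPS_MIN, EPS_RESET = 1.0, 0.001, 0.5   # initial, minimum, and reset values
RESET_PERIOD_N = 100_000                      # reset period in effective steps

def exploration_rate(t_t: int) -> float:
    """Epsilon as a function of effective steps t_t (linear decay assumed)."""
    if t_t > 0 and t_t % RESET_PERIOD_N == 0:
        return EPS_RESET                                      # periodic reset
    start = EPS_0 if t_t < RESET_PERIOD_N else EPS_RESET      # restart point after a reset
    frac = (t_t % RESET_PERIOD_N) / RESET_PERIOD_N
    return max(start - (start - EPS_MIN) * frac, EPS_MIN)     # never below eps_min
```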
step 2-2-8: judging whether the current time step executes network optimization or not;
if network optimization is not executed, go to step 2-2-10; if network optimization is executed:
randomly sample m groups of experiences from the experience pool and compute, for each of the m experiences:
if the experience is a terminal experience, the target Q value of the target network is y = r_U; if the experience is not a terminal experience, the target Q value of the target network is
y = r_U + γ · max_a′ Q̂(s′(t+1), a′; θ⁻)
compute the loss L(θ) = E[(y - Q(s(t), a_U(t); θ))²];
optimize the parameters θ of the prediction network with a gradient descent algorithm according to the loss value L(θ);
every C effective steps, overwrite the target network with the parameters of the prediction network, i.e. θ⁻ = θ;
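A sketch of one such network-optimization step follows, assuming PyTorch, reusing prediction_net and target_net from the network sketch above and the replay buffer from the experience-storing sketch, with the additional assumption that experiences are stored as (s, a, r, s_next, done) tuples of tensors (the done flag marking terminal experiences is an illustrative addition).

```python
import random
import torch
import torch.nn.functional as F

GAMMA, BATCH_SIZE, TARGET_SYNC_C = 0.95, 64, 3000
optimizer = torch.optim.SGD(prediction_net.parameters(), lr=0.001)  # plain gradient descent on theta

def optimize(replay_memory, effective_step: int) -> None:
    """One optimization step of the prediction network, with periodic target sync."""
    if len(replay_memory) < BATCH_SIZE:
        return
    batch = random.sample(list(replay_memory), BATCH_SIZE)
    s, a, r, s_next, done = (torch.stack(x) for x in zip(*batch))

    # Target Q value: y = r for terminal experiences, otherwise
    # y = r + gamma * max_a' Q_target(s', a'; theta^-)
    with torch.no_grad():
        y = r + GAMMA * target_net(s_next).max(dim=1).values * (1.0 - done)

    q = prediction_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)                  # squared TD error as the loss L(theta)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if effective_step % TARGET_SYNC_C == 0:  # theta^- <- theta every C effective steps
        target_net.load_state_dict(prediction_net.state_dict())
```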
Step 2-2-9: update the state and the step count: s′(t) ← s′(t+1), t_e ← t_e + 1;
Step 2-2-10: judging a training state;
if the number of effective steps t_t ≥ T_t, training is finished, the prediction network (or the target network) is the decision network of the trained agent, and go to step 2-2-11; otherwise, judge whether the drone has reached the destination, collided, or gone out of bounds at the current time step; if so, the current episode ends, and go to step 2-2-2; otherwise, compare the number of executed steps of the episode t_e with the maximum number of steps per episode T_e; if t_e ≥ T_e, go to step 2-2-2; otherwise, go to step 2-2-3 and continue training;
step 2-2-11: and after the training is finished, saving the currently trained network. The network at this time is the decision network of the trained agent.
To demonstrate the superiority of the proposed FRDDM-DQN method during training, its training curve is shown in FIG. 6. For comparison, the training curve of the FR-DQN method in the same training environment is also shown, where the FR-DQN method is the DQN method combined with the Faster R-CNN model. In FIG. 6, during the first 5000 episodes the performance of the agent trained by the FRDDM-DQN method does not improve significantly, while the performance of the FR-DQN agent improves gradually; after 5000 episodes, the performance of the agent trained by the FRDDM-DQN method begins to improve steadily and its final arrival rate rises to 83%, whereas the performance of the agent trained by the FR-DQN method shows no obvious further improvement and its arrival rate keeps fluctuating around 75%. The agent trained by the FRDDM-DQN method therefore reaches a higher final arrival rate. In addition, because the ε-greedy strategy is used for action selection during training, part of the agent's actions are chosen at random, so the training curve cannot fully reflect the training effect of the agent. Therefore, the agent's decision network is saved every 5000 episodes during training. Each saved agent is then tested for 500 episodes, and the test results are shown in FIG. 7; the same comparative experiment is performed for the FR-DQN method. As shown in FIG. 7, during the first 15000 episodes of training the FR-DQN method, which performs network optimization at every step, obtains a better training effect, while the training effect of the FRDDM-DQN method improves more slowly; after 15000 episodes, the arrival rate of the agent trained by the FR-DQN method fluctuates around 70%, whereas the arrival rate of the agent trained by the FRDDM-DQN method keeps increasing and finally rises to 93%. Combining FIG. 6 and FIG. 7, it can be concluded that the FRDDM-DQN method provided by the invention performs better during training.
After training, go to step 3 to test the trained agent;
Step 3: test the agent saved in step 2-2-11 in a three-dimensional environment containing multiple obstacles;
To verify the performance of the agent trained by the proposed FRDDM-DQN method, it is tested in the three-dimensional scenes containing multiple static obstacles and dynamic obstacles shown in FIG. 8 and FIG. 9. For comparison, the agent trained by the FR-DQN method is also tested. In FIG. 8 and FIG. 9, sub-figures (a)-(f) show the scene at different times. The small spheres represent the no-fly zones generated by the obstacles, and the black dot at the center of each small sphere is the position of the obstacle; the large sphere represents the destination of the drone; the black lines represent the navigation path of the agent trained by the FRDDM-DQN method, and the gray lines represent the navigation path of the agent trained by the FR-DQN method; the black lines on the obstacles in FIG. 9 represent the trajectories travelled by the obstacles.
In FIG. 8, during the initial stage of the test (FIG. 8 (a) and (b)), the agents trained by the FRDDM-DQN method and the FR-DQN method perform roughly the same. From FIG. 8 (c) onwards, once an obstacle is detected, the two exhibit different obstacle avoidance strategies. After detecting an obstacle, the agent trained by the FR-DQN method does not jointly consider how to avoid the obstacle and reach the destination quickly: it selects an avoidance strategy that avoids the obstacle but moves away from the destination (a detour), and finally reaches the destination in 672 seconds (FIG. 8 (f)). After detecting an obstacle, the agent trained by the FRDDM-DQN method jointly considers obstacle avoidance and rapid arrival, selects a strategy that satisfies both, and reaches the destination in 366 seconds (FIG. 8 (e)), far less time than the agent trained by the FR-DQN method.
In FIG. 9, during the initial stage of the test (FIG. 9 (a) and (b)), the agents trained by the FRDDM-DQN method and the FR-DQN method already show different strategies while navigating towards the destination. The navigation direction of the agent trained by the FRDDM-DQN method points directly at the destination, which is the fastest way to reach it when no obstacle information is available; the agent trained by the FR-DQN method selects a flight direction that is close to, but not the fastest towards, the destination. When the no-fly zones generated by two obstacles come close to each other, the agent trained by the FRDDM-DQN method accurately steers the drone between the no-fly zones (FIG. 9 (c)-(e)) and safely reaches the destination in 321 seconds (FIG. 9 (f)); the obstacle avoidance performance of the agent trained by the FR-DQN method is worse, and it collides with an obstacle at the 213th second (FIG. 9 (e)). Therefore, in scenes containing multiple static and dynamic obstacles, the agent trained by the proposed FRDDM-DQN method performs better.
The FRDDM-DQN method provided by the invention realizes autonomous image navigation and obstacle avoidance of the drone in complex environments. In the FRDDM-DQN method, obstacles in the images acquired by the onboard camera are first identified by the Faster R-CNN model, and the recognition results are converted according to the kinematic characteristics of the drone; second, during training, the numbers of the various experience types in the experience pool are adjusted by the proposed experience-pool storing mechanism, which alleviates the imbalance among experience types in the DQN algorithm. During training, the agent trained by the FRDDM-DQN method finds a better strategy; in the tests, the agent trained by the FRDDM-DQN method performs better. In summary, compared with the FR-DQN method, the FRDDM-DQN method provided by the invention improves the autonomous image navigation and obstacle avoidance capability of the drone in complex environments.

Claims (6)

1. An unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning is characterized by comprising the following steps:
step 1: unmanned aerial vehicle autonomous image navigation and obstacle avoidance problem modeling;
1. setting a kinematics model of the unmanned aerial vehicle;
[Equation image: the kinematic model of the unmanned aerial vehicle; not reproduced]
wherein P_u = [x_u(t), y_u(t), z_u(t)] is the position of the unmanned aerial vehicle, V is the speed of the unmanned aerial vehicle, χ(t) and γ(t) are respectively the heading angle and the climb angle of the unmanned aerial vehicle, and [u_γ, u_χ] is the control quantity of the unmanned aerial vehicle;
2. arrival definition;
the location of the destination is P g =[x g (t) y g (t) z g (t)] T The radius of the destination area of influence is R g (ii) a Distance D between unmanned aerial vehicle and destination g Is defined as
[Equation image: definition of the distance D_g between the unmanned aerial vehicle and the destination; not reproduced]
when D_g ≤ R_g, the unmanned aerial vehicle has arrived at the destination;
3. defining collision;
the position of the obstacle is P_obs = [x_obs(t), y_obs(t), z_obs(t)]^T, and the radius of the no-fly zone generated by the obstacle is R_obs; the distance D_obs between the unmanned aerial vehicle and the obstacle is defined as
[Equation image: definition of the distance D_obs between the unmanned aerial vehicle and the obstacle; not reproduced]
when D_obs < R_obs, the unmanned aerial vehicle enters the no-fly zone generated by the obstacle and collides with the obstacle;
4. out-of-bound definition;
when the unmanned aerial vehicle executes the task, the flight range is
P_range = {(x, y, z) | X_min ≤ x ≤ X_max, Y_min ≤ y ≤ Y_max, H_min ≤ z ≤ H_max}
when the position of the unmanned aerial vehicle lies outside P_range, the unmanned aerial vehicle is out of bounds;
step 2: extract the position information of the obstacles from the image s_o acquired by the onboard camera;
1. identify the obstacles in the image s_o with the Faster R-CNN model;
[Equation image: definition of the recognition result obs_posImage; not reproduced]
wherein obs_posImage is the recognition result of the Faster R-CNN model, and the subscript i represents the i-th obstacle recognized by the Faster R-CNN model; x_i,1, y_i,1 and x_i,2, y_i,2 respectively represent the coordinates of the upper-left corner and the lower-right corner of the obstacle;
2. process the recognition result of the Faster R-CNN model;
obs′_pos = (x′_o, y′_o) = (τ_1 × x_oInImage, (-1 × τ_1 × y_oInImage))
U′_pos = (x′_U, y′_U) = (τ_1 × x_image / 2, (-(τ_1 × y_image + d_c)))
wherein x_image and y_image are the dimensions of the image, τ_1 is the scale of the image, and d_c is the distance between the unmanned aerial vehicle and the field-of-view frame;
3. the position information of the obstacle is
[Equation images: definitions of θ′_o and D′_OtoU; not reproduced]
wherein θ′_o is the obstacle-unmanned aerial vehicle lead angle, D′_OtoU is the distance between the unmanned aerial vehicle and the obstacle, and χ′ is the relative heading angle of the unmanned aerial vehicle in the field-of-view frame;
step 3: formulate the experience-storing mechanism used when training the agent;
1. the agent:
the structure of the agent decision network is 29 × 512 × 128 × 6, where 29 is the number of input nodes and 6 is the number of output nodes;
2. the agent input s′(t):
suppose the position of the unmanned aerial vehicle is P_u = [x_u(t), y_u(t), z_u(t)] and the pre-designated destination position is P_g = [x_g(t), y_g(t), z_g(t)]^T; the distance D_g between the unmanned aerial vehicle and the destination and the lead angle θ_g_XOY of the unmanned aerial vehicle in the XOY plane are defined as:
[Equation images: definitions of D_g and θ_g_XOY; not reproduced]
the input of the agent is
s′(t) = [z_u(t), H_UtoG(t), D_g(t), θ_g_XOY(t), χ(t), s′_o(t)]
wherein H_UtoG(t) is the altitude difference between the unmanned aerial vehicle and the destination, and s′_o = [D′_OtoU, θ′_o];
3. define the reward function r_U;
define the arrival reward of the unmanned aerial vehicle as
[Equation image: definition of the arrival reward r_arrived; not reproduced]
define the collision reward of the unmanned aerial vehicle as
r_collision = -1 when the unmanned aerial vehicle collides with an obstacle, and r_collision = 0 otherwise;
define the out-of-bounds reward of the unmanned aerial vehicle as
r_out = -1 when the unmanned aerial vehicle goes out of bounds, and r_out = 0 otherwise;
thus, the reward function r_U is
r_U(s(t+1), a_U) = r_arrived + r_collision + r_out
4. classification of experiences;
during the training of the agent, the experiences RM stored in the experience pool are
RM = {RM(i) | RM(i) = (s_i(a-), a_U, r_U, s_i(a+)), i < RM_Capacity}
wherein the index i represents the number of the current experience in the experience pool; s_i(a-) and s_i(a+) respectively represent the state before executing action a_U and the state after executing action a_U, and RM_Capacity is the capacity of the experience pool;
in a single experience, the task condition of the state s_i(t) is defined as
[[s_i(t)]] = (e_o, e_c, e_out, e_g)
wherein [[·]] acquires the task condition represented by the specified state; e_o, e_c, e_out, e_g are parameters used to describe the task status: e_o describes whether the agent has detected an obstacle; e_c describes whether the agent has collided; e_out describes whether the agent is out of bounds; e_g describes whether the agent has reached the destination;
during the training of the unmanned aerial vehicle, the state s_i(t) of the agent can be divided into the following categories: the state s_safe in which the unmanned aerial vehicle has not detected an obstacle, has not collided, is not out of bounds, and has not reached the destination; the state s_obs in which the unmanned aerial vehicle has detected an obstacle, has not collided, is not out of bounds, and has not reached the destination; the state s_collision in which the unmanned aerial vehicle collides with an obstacle; the state s_out in which the unmanned aerial vehicle is out of bounds; and the state s_arrival in which the unmanned aerial vehicle reaches the destination; namely:
s_i(t) ∈ {s_safe, s_obs, s_collision, s_out, s_arrival}
s_safe = {s_i(t) | [[s_i(t)]] = (e_o = 0, e_c = 0, e_out = 0, e_g = 0)}
s_obs = {s_i(t) | [[s_i(t)]] = (e_o = 1, e_c = 0, e_out = 0, e_g = 0)}
s_collision = {s_i(t) | [[s_i(t)]] = (e_o ∈ {0,1}, e_c = 1, e_out = 0, e_g = 0)}
s_out = {s_i(t) | [[s_i(t)]] = (e_o ∈ {0,1}, e_c = 0, e_out = 1, e_g = 0)}
s_arrival = {s_i(t) | [[s_i(t)]] = (e_o ∈ {0,1}, e_c = 0, e_out = 0, e_g = 1)}
thus, any experience RM(i) = (s_i(a-), a_U, r_U, s_i(a+)) is divided into the following categories:
(1) result experience RE, divided into arrival experience RE_arrival, collision experience RE_collision, and out-of-bounds experience RE_out, namely:
RE = {RE_arrival, RE_collision, RE_out}, RE ∈ RM
RE_arrival = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_arrival} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_arrival}}
RE_collision = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_collision} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_collision}}
RE_out = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_out} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_out}}
(2) danger experience DE, representing that the agent has detected an obstacle, namely:
DE = {RM(i) | {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_safe} ∪ {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_obs} ∪ {s_i(a-) ∈ s_obs, s_i(a+) ∈ s_obs}}
(3) safety experience SE, the intermediate states in which the unmanned aerial vehicle navigates towards the destination away from obstacles, namely:
SE = {RM(i) | {s_i(a-) ∈ s_safe, s_i(a+) ∈ s_safe}}
5. processing of experiences;
set the storage ratios p_RE, p_DE, p_SE for the RE-type, DE-type, and SE-type experiences;
during training, classify each generated experience according to the definitions of the experience types, randomly screen the experiences according to the storage ratio of their type, store part of the experiences in the experience pool, and discard the rest; after adjustment by the experience-pool storing mechanism, the numbers of the various experience types in the experience pool RM′ satisfy
|RM′| = p_RE × |RE| + p_DE × |DE| + p_SE × |SE|
wherein |·| is the number of the specified experiences in the experience pool;
step 4: train the agent according to the FRDDM-DQN algorithm;
1. train the Faster R-CNN model to identify the specified obstacles;
initialize the Faster R-CNN model with the pre-trained VGG16 model;
set the initial learning rate, the decay coefficient, and the weight decay of the Faster R-CNN model;
acquire images containing obstacles with the unmanned aerial vehicle, and annotate the obstacle positions and obstacle types in the images;
train the Faster R-CNN model with the images containing obstacles and the corresponding annotation information;
after training, a Faster R-CNN model that recognizes the obstacles is obtained;
2. train the agent based on the output of the Faster R-CNN model;
step 2.1: initialize the relevant parameters;
set the reward function r_U, the experience-pool storing mechanism, and the experience storage ratio p = p_SE : p_DE : p_RE;
initialize the experience pool capacity RM_Capacity, the attenuation coefficient γ, the maximum number of steps per episode T_e, the maximum number of effective training steps T_t, and the target network update frequency C;
initialize the initial exploration rate ε, the minimum exploration rate ε_min, the exploration rate reset period N, and the exploration rate reset value ε_reset;
initialize the initial learning rate α, the segmented learning rates [α_1, α_2, α_3, α_4], and the boundaries of the segmented learning rate (n_1, n_2, n_3, n_4);
the decision network of the agent is divided into a prediction network and a target network; initialize the prediction network Q with parameters θ and the target network Q̂ with parameters θ⁻;
Step 2.2: initializing a training scene;
initializing the starting point and the destination position of the unmanned aerial vehicle, and initializing the position of an obstacle;
number of steps t executed in a single screen e Effective training step number t t Reset to 0;
acquiring an initial state s' (t);
step 2.3: select an action a_U according to the state s′(t);
draw a random number p from [0,1]; if p is larger than ε, select the action according to the prediction network Q; otherwise, select a random action;
step 2.4: execute the action a_U, acquire the reward r_U and the new state s′(t+1), and obtain the experience RM = (s′(t), a_U, r_U, s′(t+1)) generated at the current time step;
Step 2.5: processing the experience RM;
storing the experience RM into an experience pool or discarding the experience RM according to an experience pool storing mechanism;
step 2.6: updating the learning rate:
the learning rate adopts a segmented fixed learning rate, and the learning rate is updated according to an adjustment strategy;
step 2.7: updating the exploration rate;
updating the exploration rate according to the exploration rate updating strategy;
step 2.8: performing network optimization;
if the network optimization is not executed, the step 2.9 is carried out; otherwise, executing network optimization:
randomly sampling m groups of experiences from an experience pool;
if the experience is a terminal experience, the target Q value of the target network is y = r_U; if the experience is not a terminal experience, the target Q value of the target network is
y = r_U + γ · max_a′ Q̂(s′(t+1), a′; θ⁻)
compute the loss L(θ) = E[(y - Q(s(t), a_U(t); θ))²];
optimize the parameters θ of the prediction network according to the loss value L(θ) with a gradient descent algorithm;
every C effective steps, overwrite the target network with the parameters of the prediction network, i.e. θ⁻ = θ;
Step 2.9: update status s '(t) ← s' (t + 1), t e ←t e +1;
Step 2.10: judging a training state;
if the number of effective steps t_t ≥ T_t, end the training and save the agent at this moment; otherwise, continue to judge whether the unmanned aerial vehicle has reached the destination, collided, gone out of bounds, or reached the maximum number of steps per episode T_e at the current time step; if so, end the current episode and go to step 2.2; otherwise, go to step 2.3;
step 5: control the unmanned aerial vehicle to perform autonomous image navigation and obstacle avoidance with the agent saved in step 4.
2. The unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning of claim 1, wherein: in step 2, during the execution of a task, x_image, y_image, and d_c are fixed values, and χ′ is a constant.
3. The unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning of claim 1, wherein: the parameter e_g in step 3, which describes whether the agent reaches the destination, takes the value e_g = 0 when the unmanned aerial vehicle has not reached the destination and e_g = 1 when the unmanned aerial vehicle has reached the destination.
4. The unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning of claim 1, wherein the experience-pool storing mechanism is described as follows:
[Algorithm image: each newly generated experience is classified as an SE, DE, or RE experience and is stored in the experience pool with probability equal to the corresponding storage ratio p_SE, p_DE, or p_RE, otherwise it is discarded, returning the flag F_storage; not reproduced]
5. The unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning of claim 1, wherein the adjustment strategy for updating the learning rate is as follows:
Input: current total number of effective steps t_t; training flag isTraining
Output: learning rate α
Initialization: segmented learning rates [α_1, α_2, α_3, α_4]; boundaries of the segmented learning rate (n_1, n_2, n_3, n_4)
if isTraining and t_t ≤ n_1 do α ← α_1
else if isTraining and n_1 < t_t ≤ n_2 do α ← α_2
else if isTraining and n_2 < t_t ≤ n_3 do α ← α_3
else if isTraining and n_3 < t_t ≤ n_4 do α ← α_4
end if
Return α
6. The unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning of claim 1, wherein the exploration rate update strategy is as follows:
[Algorithm image: the exploration rate ε decays as the number of effective steps t_t increases, is reset to ε_reset every N effective steps, and is bounded below by ε_min; not reproduced]
CN202211002222.8A 2022-08-21 2022-08-21 Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning Pending CN115903880A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211002222.8A CN115903880A (en) 2022-08-21 2022-08-21 Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211002222.8A CN115903880A (en) 2022-08-21 2022-08-21 Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning

Publications (1)

Publication Number Publication Date
CN115903880A true CN115903880A (en) 2023-04-04

Family

ID=86479292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211002222.8A Pending CN115903880A (en) 2022-08-21 2022-08-21 Unmanned aerial vehicle autonomous image navigation and obstacle avoidance method based on improved reinforcement learning

Country Status (1)

Country Link
CN (1) CN115903880A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116412831A (en) * 2023-06-12 2023-07-11 中国电子科技集团公司信息科学研究院 Multi-unmanned aerial vehicle dynamic obstacle avoidance route planning method for recall and anti-dive
CN116412831B (en) * 2023-06-12 2023-09-19 中国电子科技集团公司信息科学研究院 Multi-unmanned aerial vehicle dynamic obstacle avoidance route planning method for recall and anti-dive
CN116449874A (en) * 2023-06-13 2023-07-18 北京瀚科智翔科技发展有限公司 Modularized unmanned control refitting kit of piloted plane and construction method
CN116449874B (en) * 2023-06-13 2023-08-18 北京瀚科智翔科技发展有限公司 Modularized unmanned control refitting kit of piloted plane and construction method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination