CN114326821A - Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning

Info

Publication number
CN114326821A
CN114326821A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
obstacle avoidance
reinforcement learning
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210195266.0A
Other languages
Chinese (zh)
Other versions
CN114326821B (en)
Inventor
王钦辉
陈志龙
魏军儒
何昌其
王云宪
焦萍
闫茜茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARMY COMMAND INST CPLA
Original Assignee
ARMY COMMAND INST CPLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARMY COMMAND INST CPLA filed Critical ARMY COMMAND INST CPLA
Priority to CN202210195266.0A priority Critical patent/CN114326821B/en
Publication of CN114326821A publication Critical patent/CN114326821A/en
Application granted granted Critical
Publication of CN114326821B publication Critical patent/CN114326821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses an unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning. A novel system architecture separates training from decision-making, which greatly reduces training time consumption and improves the decision timeliness of the aircraft. The autonomous obstacle avoidance method adopts a deep reinforcement learning model based on policy iteration, taking the raw RGB images shot by the unmanned aerial vehicle's monocular camera as training data without requiring other 3D information such as complex point clouds; the raw RGB images are processed by a fully convolutional neural network to obtain depth image information, the images are then analyzed and predicted by a policy-iteration-based reinforcement learning method, and the flight action of the unmanned aerial vehicle at the next moment is pre-judged in advance to realize autonomous obstacle avoidance. Compared with the existing typical value-iteration-based methods, the obstacle avoidance method provided by the invention trains more efficiently, consumes less time, and avoids obstacles flexibly and autonomously, making it suitable for demanding autonomous obstacle avoidance scenarios such as automatic substation inspection and unmanned aerial vehicle cruising.

Description

Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
Technical Field
The invention relates to an unmanned aerial vehicle obstacle avoidance system and method, in particular to an unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning, belonging to the technical field of unmanned aerial vehicle flight control.
Background
Obstacle avoidance is one of the core problems for unmanned aerial vehicles. Its aim is to enable the unmanned aerial vehicle to autonomously explore an unknown environment while avoiding collision with other objects, so as to obtain a flight path that avoids threats and safely reaches the target. Traditional obstacle avoidance techniques plan a path by detecting traversable space and obstacles, using data captured by RGB-D cameras, light detection and ranging (LIDAR) sensors, or even sonar. These traditional techniques work well for the autonomous obstacle avoidance of ground robots, but are difficult to apply to aerial vehicles such as unmanned aerial vehicles: ranging sensors capture only limited information, and for unmanned aerial vehicles they are too heavy, consume too much power, and are expensive. In contrast, a monocular camera captures rich information about the environment, is low cost and lightweight, and is suitable for a variety of platforms. However, when distance perception must be recovered from a monocular camera (i.e., from RGB images), the 3-D world is flattened into a 2-D image, eliminating the direct correspondence between pixels and distances, and the obstacle avoidance problem becomes extremely difficult.
With the wide application of deep learning in robotics and computer vision, applying deep learning to obstacle avoidance path planning is becoming more and more popular. The prior art uses convolutional neural network (CNN) training methods to enable aircraft to cruise in complex forest environments. Some techniques label trajectory types by training a convolutional neural network on 3D point cloud data. These methods can be divided into two categories, supervised learning and semi-supervised learning; the former requires considerable manpower for type labeling, while the learning strategy of the latter is limited to some extent by the label generation strategy.
Deep reinforcement learning (DRL) methods have recently been shown to achieve superhuman performance in games while making full use of raw images. In recent years, therefore, much attention has been paid to using DRL to realize vision-based autonomous obstacle avoidance; a common point of this work is that the data used for model training are not raw images. Some approaches use laser scanners and depth image data for network training, and some propose training the network entirely in a 3D CAD model simulator to predict collisions. While these efforts can extend the trained network to the real world, significant computing resources are still required to generate and train a large data set. For the above reasons, it is necessary to provide a more practical and convenient unmanned aerial vehicle autonomous obstacle avoidance technology.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning, which can realize flexible and efficient autonomous obstacle avoidance from raw RGB images acquired by a monocular camera.
In order to achieve the above object, the present invention adopts the following technical solutions:
the invention firstly discloses an unmanned aerial vehicle autonomous obstacle avoidance system based on deep reinforcement learning, which comprises:
the server is used for finishing data training and calculation;
the base station is connected with the server;
the aircraft communicates with the base station, receives the training result of the server fed back by the base station, and makes flight decisions;
the server comprises a local server and a cloud server which are connected through the Internet.
Preferably, the aforesaid aircraft is a drone, equipped with a monocular camera for taking raw RGB images.
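For illustration only, the following is a minimal Python sketch of the training/decision separation described above; the class and method names (GroundServer, OnboardAgent, set_velocity) are hypothetical placeholders rather than elements of the disclosed system.

```python
# Minimal sketch of the train-on-server / decide-on-aircraft split.
# All names here are hypothetical, not taken from the patent.

class GroundServer:
    """Local/cloud server side: all training and heavy computation happens here."""
    def __init__(self, policy):
        self.policy = policy                          # DRL policy trained on the server

    def decide(self, rgb_image):
        """Run the trained policy on an image relayed by the base station."""
        return self.policy.select_action(rgb_image)   # -> (linear velocity, angular velocity)

class OnboardAgent:
    """Aircraft side: only senses and executes; no training runs on board."""
    def __init__(self, server):
        self.server = server                          # reached through the base-station link

    def step(self, rgb_image, flight_controller):
        v, w = self.server.decide(rgb_image)          # training result fed back via base station
        flight_controller.set_velocity(v, w)          # execute the flight decision
```

Keeping only the lightweight inference call on the aircraft is what allows the training time to stay on the server while the aircraft retains decision timeliness.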
The invention also discloses an obstacle avoidance method of the unmanned aerial vehicle autonomous obstacle avoidance system based on the deep reinforcement learning, which comprises the following steps:
S1, acquiring the original RGB images captured by the unmanned aerial vehicle's monocular camera;
S2, training a fully convolutional neural network on the original RGB images to obtain depth information;
S3, based on the preset discrete unmanned aerial vehicle flight actions (described by linear velocity and angular velocity), training on the depth images with a policy-iteration-based reinforcement learning method;
S4, the server obtains the flight action to be pre-taken by the unmanned aerial vehicle, namely the linear velocity and the angular velocity, and feeds it back to the unmanned aerial vehicle; the unmanned aerial vehicle selects its flight action based on the linear velocity and angular velocity to realize autonomous obstacle avoidance.
Preferably, the specific process of the foregoing step S2 is: the weighted sum of pixel values in an observation region is acquired, and after the convolution operation a feature value is output through a nonlinear activation function, preferably the sigmoid function:

σ(x) = 1 / (1 + e^(-x)).

Specifically, a fully convolutional neural network (FCNN) learning scheme is adopted for depth information perception: the network accepts an input image of arbitrary size, and a deconvolution layer up-samples the feature map of the last convolution layer to restore it to the same size as the input image, so that a prediction is generated for each pixel and depth image information is obtained.
More preferably, the operation of each phase of the aforementioned FCNN comprises the following three steps: convolution, nonlinear activation, pooling.
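For illustration, a minimal sketch of such a fully convolutional depth-perception network in PyTorch is shown below. The layer widths, the two-stage encoder, and the use of sigmoid activations are assumptions consistent with the description above (convolution, nonlinear activation, pooling, then deconvolution back to the input resolution); they are not the patent's exact architecture.

```python
# Minimal FCNN sketch: per-pixel depth prediction from a raw RGB image.
# Layer widths and depth are illustrative assumptions, not the claimed architecture.
import torch
import torch.nn as nn

class DepthFCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Each stage: convolution -> nonlinear activation (sigmoid) -> pooling
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool2d(2),
        )
        # Deconvolution layers up-sample the last feature map back to the input size,
        # producing one depth prediction per pixel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.Sigmoid(),
            nn.ConvTranspose2d(32, 1, kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))   # (N, 1, H, W) depth map

# Example: an 84x84 RGB image (the size used in Example 1) yields an 84x84 depth map.
depth = DepthFCNN()(torch.randn(1, 3, 84, 84))
print(depth.shape)   # torch.Size([1, 1, 84, 84])
```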
Still preferably, the policy-based reinforcement learning in step S3 iterates directly on the policy, using a function π_θ(a|s) to approximately represent the policy, where s represents the state of the unmanned aerial vehicle (the state description can be represented by a multi-dimensional vector including the flight status, flight position, environment information (environment image) and the like of the unmanned aerial vehicle); a represents the action of the unmanned aerial vehicle, including the flight angular velocity and flight speed; θ denotes the adjustable parameters, and π_θ is the policy approximated with the parameters θ; π_θ(a|s) denotes the probability of taking action a in state s. The goal of the algorithm is to maximize the expected return of the policy,

J(θ) = E_{π_θ}[ Σ_t r(s_t, a_t) ],

where r(s_t, a_t) denotes the reward obtained by performing action a_t in the current state s_t.
Still preferably, in the foregoing step, the update of the parameters θ derived from the expected return is computed as:

θ ← θ + α ∇_θ J(θ),

where ∇ is the gradient (differential) operator and α is the learning rate. Based on this idea, the Actor-Critic method adds a value function to evaluate the selected action on top of the direct iteration of the policy: Actor denotes the policy structure in the algorithm, which is used for action selection, and Critic denotes a value function, which evaluates the action selected by the Actor.
More preferably, in the foregoing step S3, a clipped surrogate (proxy) method is adopted when updating the Actor network, so as to maximize

L(θ) = E_t[ (π_θ(a_t|s_t) / π_θ_old(a_t|s_t)) · Â_t ] - β · E_t[ KL(π_θ_old(·|s_t) || π_θ(·|s_t)) ],

where θ is the parameter of the Actor function, and π_θ_old and π_θ represent the old policy and the new policy respectively. The first half of the formula is the gradient update: starting from the old policy, the Actor modifies the new policy according to the potential Â_t; if the potential is larger, the modification amplitude is larger, so that the new policy becomes more likely. The second half of the formula contains a penalty term, namely the KL divergence, with the parameter β representing the influence factor of the divergence term: if the difference between the old policy and the new policy is large, the KL divergence is also large, which is unfavorable for convergence.
Further preferably, the clipped proxy method is as follows: denote the probability ratio

r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t),

and denote the proxy objective as r_t(θ) · Â_t; clipping the proxy objective limits the amplitude of the proxy's change. The final optimization objective becomes:

L_CLIP(θ) = E_t[ min( r_t(θ) · Â_t, clip(r_t(θ), 1-ε, 1+ε) · Â_t ) ],

where clip(·) represents the clipping function and ε represents a tuning parameter. The Critic update minimizes

L(φ) = E_t[ ( Σ_{t'=t}^{T} γ^(t'-t) r_{t'} - V_φ(s_t) )² ],

where φ represents the parameters of the Critic function; the update of the Critic network is no different from the general Actor-Critic framework and minimizes the error of the advantage function; V_φ represents the state value function with parameters φ.
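As a hedged illustration of the update rules above, the following PyTorch sketch computes the clipped surrogate loss for the Actor and the squared advantage error for the Critic; the tensor shapes, the clip range of 0.2, and the 0.5 critic-loss weight are assumptions for the example, not values taken from the patent.

```python
# Sketch of one PPO-style update step (clipped surrogate + Critic regression).
# Hyperparameter values below are illustrative assumptions.
import torch

def ppo_losses(new_log_probs, old_log_probs, advantages, returns, values, clip_eps=0.2):
    """new_log_probs, old_log_probs: log pi(a_t|s_t) under new/old policy, shape (T,)
       advantages: estimated potential (advantage) A_t, shape (T,)
       returns:    discounted returns sum_{t'>=t} gamma^(t'-t) r_t', shape (T,)
       values:     Critic estimates V_phi(s_t), shape (T,)"""
    ratio = torch.exp(new_log_probs - old_log_probs)            # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    actor_loss = -torch.min(unclipped, clipped).mean()          # maximize L_CLIP
    critic_loss = ((returns - values) ** 2).mean()               # minimize advantage error
    return actor_loss, critic_loss

# Usage: the two losses are combined and back-propagated through Actor and Critic.
T = 8
new_lp = torch.randn(T, requires_grad=True)   # stands in for the Actor network output
values = torch.randn(T, requires_grad=True)   # stands in for the Critic network output
actor_loss, critic_loss = ppo_losses(new_lp, torch.randn(T), torch.randn(T),
                                     torch.randn(T), values)
(actor_loss + 0.5 * critic_loss).backward()
```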
The invention has the advantages that:
(1) In the obstacle avoidance system based on deep reinforcement learning, training and decision-making are separated through a novel system architecture, which greatly reduces training time consumption and improves the decision timeliness of the aircraft;
(2) The unmanned aerial vehicle autonomous obstacle avoidance method is based on a policy-iteration deep reinforcement learning technique (DPPO): RGB images shot by the unmanned aerial vehicle's monocular camera are used as the raw training data to obtain depth image information, without requiring other 3D information such as complex point clouds; the images are analyzed and predicted by the deep reinforcement learning method, and the flight speed and flight angle to be adopted at the next moment are pre-judged in advance, realizing autonomous obstacle avoidance. Experimental comparison of several methods shows that the policy-iteration-based method trains more efficiently and consumes less time than the typical value-iteration-based method DQN;
(3) The obstacle avoidance learning model based on deep reinforcement learning differs from simple flight-control actions: the aircraft actions output by the model are more flexible, the discrete angular velocities and linear velocities offered to the aircraft can be set arbitrarily, and efficient and flexible obstacle avoidance can be realized, making the method suitable for demanding autonomous obstacle avoidance scenarios such as automatic substation inspection and unmanned aerial vehicle cruising.
Drawings
FIG. 1 is a block flow diagram of an obstacle avoidance method of the present invention;
FIG. 2 is a graph comparing the performance of the method of the invention (DPPO) with a value-based iterative model (DQN) of the prior art;
fig. 3 is a graph comparing the training time consumption of the method of the present invention and the A3C, DQN method of the prior art.
Detailed Description
The autonomous obstacle avoidance system addresses the obstacle avoidance problem based on monocular vision: the unmanned aerial vehicle observes the state of the environment through a monocular camera; the state is fed back through the base station to the server, where data training and calculation are completed; the result is then sent to the unmanned aerial vehicle, which selects the corresponding action, linear velocity and angular velocity according to the result to complete autonomous obstacle avoidance.
For a better understanding and appreciation of the invention, reference will now be made in detail to the following description taken in conjunction with the accompanying drawings and specific examples.
Example 1
In this embodiment, the discount coefficient is set to 0.95, the learning rate is set to 0.0001, and the original image size is set to 84 × 84. The instantaneous reward function is defined by an expression (given as an image in the original filing) in which the time of each training cycle is set to 0.5 seconds; the reward is intended to make the robot run as fast as possible and penalizes simple in-place rotation. If a collision is detected, the training episode is immediately terminated with a penalty of -10. Otherwise, the episode continues up to the maximum number of steps, which is set to 500, with no penalty. In this embodiment, 10000 original pictures are learned.
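Purely to illustrate how these settings fit together, the sketch below wires the stated hyperparameters (discount 0.95, learning rate 0.0001, 84 × 84 images, 0.5 s training cycle, collision penalty of -10, 500-step episode cap) into a generic episode loop; the environment interface and the forward_reward placeholder are assumptions, not the patent's simulator or reward expression.

```python
# Illustrative episode loop using the hyperparameters stated in Example 1.
# The Environment interface and forward_reward() are hypothetical placeholders.
GAMMA = 0.95            # discount coefficient
LEARNING_RATE = 1e-4    # learning rate
IMAGE_SIZE = (84, 84)   # original image size
CYCLE_TIME = 0.5        # seconds per training cycle
MAX_STEPS = 500         # maximum steps per episode
COLLISION_PENALTY = -10.0

def run_episode(env, policy):
    state = env.reset()                        # raw 84x84 RGB observation
    transitions, total_reward = [], 0.0
    for step in range(MAX_STEPS):
        action = policy.select_action(state)   # discrete (linear, angular) velocity pair
        next_state, collided = env.step(action, CYCLE_TIME)
        if collided:
            # Collision terminates the episode immediately with the -10 penalty.
            transitions.append((state, action, COLLISION_PENALTY, next_state, True))
            break
        reward = forward_reward(action, CYCLE_TIME)   # fast forward motion rewarded,
        transitions.append((state, action, reward, next_state, False))  # rotation penalized
        total_reward += reward
        state = next_state
    return transitions, total_reward
```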
This embodiment is an unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning; its obstacle avoidance process is shown in Fig. 1 and specifically comprises the following steps:
S1, acquiring the original RGB images captured by the unmanned aerial vehicle's monocular camera.
S2, training a fully convolutional neural network on the original RGB images to obtain depth information.
The specific process of this step is: the weighted sum of pixel values in an observation region is acquired, and after the convolution operation a feature value is output through a nonlinear activation function. Specifically, a fully convolutional neural network (FCNN) learning scheme is adopted for depth information perception, and the operation of each stage of the FCNN comprises three steps: convolution, nonlinear activation, and pooling. The network accepts an input image of arbitrary size, and a deconvolution layer up-samples the feature map of the last convolution layer to restore it to the same size as the input image, thereby generating a prediction for each pixel and obtaining depth image information.
S3, based on the preset discrete unmanned aerial vehicle flight actions, linear velocities and angular velocities, training on the depth images with a policy-iteration-based reinforcement learning method.
In this step, policy-based reinforcement learning is adopted to iterate directly on the policy, using a function π_θ(a|s) to approximately represent the policy, where s represents the state of the unmanned aerial vehicle (the state description can be represented by a multi-dimensional vector including the flight status, flight position, environment information (environment image) and the like of the unmanned aerial vehicle); a represents the action of the unmanned aerial vehicle, including the flight angular velocity and flight speed; θ denotes the adjustable parameters, and π_θ is the policy approximated with the parameters θ; π_θ(a|s) denotes the probability of taking action a in state s. The goal of the algorithm is to maximize the expected return of the policy,

J(θ) = E_{π_θ}[ Σ_t r(s_t, a_t) ],

where r(s_t, a_t) denotes the reward obtained by performing action a_t in the current state s_t.
The update of the parameters θ derived from the expected return is computed as:

θ ← θ + α ∇_θ J(θ),

where ∇ is the gradient (differential) operator and α is the learning rate. Based on this idea, the Actor-Critic method adds a value function to evaluate the selected action on top of the direct iteration of the policy: Actor denotes the policy structure in the algorithm, which is used for action selection, and Critic denotes a value function, which evaluates the action selected by the Actor.
When updating the Actor network, the clipped surrogate (proxy) method is adopted, so as to maximize

L(θ) = E_t[ (π_θ(a_t|s_t) / π_θ_old(a_t|s_t)) · Â_t ] - β · E_t[ KL(π_θ_old(·|s_t) || π_θ(·|s_t)) ],

where θ is the parameter of the Actor function, and π_θ_old and π_θ represent the old policy and the new policy respectively. The first half of the formula is the gradient update: starting from the old policy, the Actor modifies the new policy according to the potential Â_t; if the potential is larger, the modification amplitude is larger, so that the new policy becomes more likely. The second half of the formula contains a penalty term, namely the KL divergence, and the parameter β represents the influence factor of the divergence term: if the difference between the old policy and the new policy is large, the KL divergence is also large, which is unfavorable for convergence.
Specifically, the clipped proxy method is as follows: denote the probability ratio

r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t),

and denote the proxy objective as r_t(θ) · Â_t; clipping the proxy objective limits the amplitude of the proxy's change. The final optimization objective becomes:

L_CLIP(θ) = E_t[ min( r_t(θ) · Â_t, clip(r_t(θ), 1-ε, 1+ε) · Â_t ) ],

where clip(·) represents the clipping function and ε represents a tuning parameter. The Critic update minimizes

L(φ) = E_t[ ( Σ_{t'=t}^{T} γ^(t'-t) r_{t'} - V_φ(s_t) )² ],

where φ represents the parameters of the Critic function; the update of the Critic network is no different from the general Actor-Critic framework and minimizes the error of the advantage function; V_φ represents the state value function with parameters φ.
S4, the server obtains the flight action to be pre-taken by the unmanned aerial vehicle, namely the linear velocity and the angular velocity, and feeds it back to the unmanned aerial vehicle; the unmanned aerial vehicle selects its flight action based on the linear velocity and angular velocity to realize autonomous obstacle avoidance.
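As a small illustration of this decision step, the sketch below defines a possible discrete action set of (linear velocity, angular velocity) pairs and selects one according to the policy's action probabilities; the specific velocity values and probabilities are assumptions chosen only for the example, not values from the patent.

```python
# Sketch of discrete action selection on the UAV side.
# The velocity values in ACTIONS are illustrative assumptions.
import numpy as np

ACTIONS = [
    (0.6,  0.0),   # fly straight (linear velocity m/s, angular velocity rad/s)
    (0.4,  0.5),   # gentle left turn
    (0.4, -0.5),   # gentle right turn
    (0.2,  1.0),   # sharp left turn
    (0.2, -1.0),   # sharp right turn
]

def select_flight_action(action_probs, rng=np.random.default_rng()):
    """Sample an action index from the policy distribution pi_theta(a|s)
    and return the corresponding (linear, angular) velocity command."""
    idx = rng.choice(len(ACTIONS), p=action_probs)
    return ACTIONS[idx]

# Example: probabilities produced by the trained policy for the current image.
probs = np.array([0.5, 0.2, 0.2, 0.05, 0.05])
v, w = select_flight_action(probs)
print(f"command: linear {v} m/s, angular {w} rad/s")
```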
The training efficiency and performance of the DPPO model of the present application were compared with the value-iteration-based method (DQN) of the prior art, with the results shown in Fig. 2, where the abscissa is the learned episode (the maximum number of steps per episode is 500) and the ordinate is the average reward obtained by the robot. As can be seen from the figure, the method of the present invention exhibits good performance, whereas the DQN model of the prior art does not perform well. The applicant's analysis suggests a possible reason: for the obstacle avoidance problem, overestimation of the Q value is not a problem that can be alleviated by more exploration, and the longer the training, the more serious it may become, further hindering DQN from reaching high performance. Fig. 3 compares the training time consumption of the method of the present invention with that of two prior-art methods, with the ordinate in unit time; it can be seen that the proposed DPPO-based method trains more efficiently than the existing A3C method and the typical value-iteration-based method DQN. Therefore, the unmanned aerial vehicle autonomous obstacle avoidance method based on DPPO deep learning obtains depth information by training on the collected RGB images, analyzes and predicts the images with the reinforcement learning method, and pre-judges the flight speed and flight angle to be taken at the next moment in advance, thereby realizing autonomous obstacle avoidance; it has good application prospects.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.

Claims (9)

1. An unmanned aerial vehicle autonomous obstacle avoidance system based on deep reinforcement learning, characterized by comprising:
the server is used for finishing data training and calculation;
the base station is in communication connection with the server;
the aircraft communicates with the base station, receives the training result of the server fed back by the base station, and makes flight decisions;
the server comprises a local server and a cloud server which are connected through the Internet.
2. The unmanned aerial vehicle autonomous obstacle avoidance system based on depth reinforcement learning of claim 1, wherein the aircraft is an unmanned aerial vehicle equipped with a monocular camera for capturing original RGB images.
3. The obstacle avoidance method of the unmanned aerial vehicle autonomous obstacle avoidance system based on the deep reinforcement learning of claim 1 is characterized by comprising the following steps:
S1, acquiring the original RGB images captured by the unmanned aerial vehicle's monocular camera;
S2, training a fully convolutional neural network on the original RGB images to obtain depth information;
S3, based on the preset discrete unmanned aerial vehicle flight actions, training on the depth images with a reinforcement learning method based on policy iteration to obtain the optimal flight action to be taken by the unmanned aerial vehicle at the next moment;
S4, the server obtains the flight action to be pre-taken by the unmanned aerial vehicle, namely the linear velocity and the angular velocity, and feeds it back to the unmanned aerial vehicle; the unmanned aerial vehicle selects its flight action based on the linear velocity and angular velocity to realize autonomous obstacle avoidance.
4. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning of claim 3, wherein the specific process of step S2 is as follows: the weighted sum of pixel values in an observation region is acquired, and after the convolution operation a feature value is output through a nonlinear activation function; specifically, a fully convolutional neural network (FCNN) learning scheme is adopted for depth information perception, the network accepts an input image of arbitrary size, and a deconvolution layer up-samples the feature map of the last convolution layer to restore it to the same size as the input image, thereby generating a prediction for each pixel and obtaining depth image information.
5. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning according to claim 4, wherein the operation of each phase of the FCNN comprises the following three steps: convolution, nonlinear activation, pooling.
6. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning of claim 3, wherein the reinforcement learning in step S3 iterates directly on the policy, using a function π_θ(a|s) to approximately represent the policy, where s represents the state of the unmanned aerial vehicle, the state description being represented by a multi-dimensional vector comprising the flight status, flight position and environment information of the unmanned aerial vehicle; a represents the action of the unmanned aerial vehicle, including the flight angular velocity and flight linear velocity; θ denotes the adjustable parameters, and π_θ is the policy approximated with the parameters θ; π_θ(a|s) denotes the probability of taking action a in state s; the goal of the algorithm is to maximize the expected return of the policy,

J(θ) = E_{π_θ}[ Σ_t r(s_t, a_t) ],

where r(s_t, a_t) denotes the reward obtained by performing action a_t in the current state s_t.
7. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning of claim 6, wherein the update of the parameters θ derived from the expected return is computed as:

θ ← θ + α ∇_θ J(θ),

where ∇ is the gradient (differential) operator and α is the learning rate.
8. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning of claim 6, wherein in step S3 a clipped surrogate (proxy) method is adopted when the Actor network is updated, so as to maximize

L(θ) = E_t[ (π_θ(a_t|s_t) / π_θ_old(a_t|s_t)) · Â_t ] - β · E_t[ KL(π_θ_old(·|s_t) || π_θ(·|s_t)) ],

where θ is the parameter of the Actor function, and π_θ_old and π_θ represent the old policy and the new policy respectively; the first half of the formula is the gradient update: starting from the old policy, the Actor modifies the new policy according to the potential Â_t, and if the potential is larger, the modification amplitude is larger, so that the new policy becomes more likely; the second half of the formula contains a penalty term, namely the KL divergence, with the parameter β representing the influence factor of the divergence term; if the difference between the old policy and the new policy is large, the KL divergence is also large, which is unfavorable for convergence.
9. The unmanned aerial vehicle autonomous obstacle avoidance method based on deep reinforcement learning of claim 6, wherein the clipped proxy method is as follows: denote the probability ratio

r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t),

and denote the proxy objective as r_t(θ) · Â_t; clipping the proxy objective limits the amplitude of the proxy's change; the final optimization objective becomes:

L_CLIP(θ) = E_t[ min( r_t(θ) · Â_t, clip(r_t(θ), 1-ε, 1+ε) · Â_t ) ],

where clip(·) represents the clipping function, ε represents a tuning parameter, and Â_t represents the potential;
the Critic update minimizes

L(φ) = E_t[ ( Σ_{t'=t}^{T} γ^(t'-t) r_{t'} - V_φ(s_t) )² ],

where φ represents the parameters of the Critic function, V_φ represents the state value function with parameters, T represents the time period, t indicates the current time, and t', the time index starting from the current time, is a variable parameter.
CN202210195266.0A 2022-03-02 2022-03-02 Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning Active CN114326821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210195266.0A CN114326821B (en) 2022-03-02 2022-03-02 Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210195266.0A CN114326821B (en) 2022-03-02 2022-03-02 Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114326821A true CN114326821A (en) 2022-04-12
CN114326821B CN114326821B (en) 2022-06-03

Family

ID=81031485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210195266.0A Active CN114326821B (en) 2022-03-02 2022-03-02 Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114326821B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117707204A (en) * 2024-01-30 2024-03-15 清华大学 Unmanned aerial vehicle high-speed obstacle avoidance system and method based on photoelectric end-to-end network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 A kind of monocular vision barrier-avoiding method based on deep learning
CN112034887A (en) * 2020-09-10 2020-12-04 南京大学 Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN112766499A (en) * 2021-02-02 2021-05-07 电子科技大学 Method for realizing autonomous flight of unmanned aerial vehicle through reinforcement learning technology

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 A kind of monocular vision barrier-avoiding method based on deep learning
CN112034887A (en) * 2020-09-10 2020-12-04 南京大学 Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN112766499A (en) * 2021-02-02 2021-05-07 电子科技大学 Method for realizing autonomous flight of unmanned aerial vehicle through reinforcement learning technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG XIANGZHU et al.: "Monocular vision obstacle avoidance algorithm for UAVs based on deep learning", Journal of South China University of Technology (Natural Science Edition) *
JIA JUNLIANG et al.: "Research on autonomous obstacle avoidance methods for UAVs based on deep learning", Development & Innovation of Machinery & Electrical Products *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117707204A (en) * 2024-01-30 2024-03-15 清华大学 Unmanned aerial vehicle high-speed obstacle avoidance system and method based on photoelectric end-to-end network

Also Published As

Publication number Publication date
CN114326821B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
US20230043931A1 (en) Multi-Task Multi-Sensor Fusion for Three-Dimensional Object Detection
US10845815B2 (en) Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents
EP3405845B1 (en) Object-focused active three-dimensional reconstruction
CN110007675B (en) Vehicle automatic driving decision-making system based on driving situation map and training set preparation method based on unmanned aerial vehicle
US20200033869A1 (en) Systems, methods and controllers that implement autonomous driver agents and a policy server for serving policies to autonomous driver agents for controlling an autonomous vehicle
CN111428765B (en) Target detection method based on global convolution and local depth convolution fusion
WO2022100107A1 (en) Methods and systems for predicting dynamic object behavior
CN111340868B (en) Unmanned underwater vehicle autonomous decision control method based on visual depth estimation
CN107481292A (en) The attitude error method of estimation and device of vehicle-mounted camera
CN111176309B (en) Multi-unmanned aerial vehicle self-group mutual inductance understanding method based on spherical imaging
CN111967373B (en) Self-adaptive enhanced fusion real-time instance segmentation method based on camera and laser radar
CN114326821B (en) Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
CN110222822B (en) Construction method of black box prediction model internal characteristic causal graph
Sun et al. RobNet: real-time road-object 3D point cloud segmentation based on SqueezeNet and cyclic CRF
CN111611869B (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network
WO2023155903A1 (en) Systems and methods for generating road surface semantic segmentation map from sequence of point clouds
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
Rahmania et al. Exploration of the impact of kernel size for yolov5-based object detection on quadcopter
CN114326826A (en) Multi-unmanned aerial vehicle formation transformation method and system
Zheng et al. Policy-based monocular vision autonomous quadrotor obstacle avoidance method
Wen et al. A Hybrid Technique for Active SLAM Based on RPPO Model with Transfer Learning
KR102399047B1 (en) Method and system for visual properties estimation in autonomous driving
CN116863430B (en) Point cloud fusion method for automatic driving
CN111695403B (en) Depth perception convolutional neural network-based 2D and 3D image synchronous detection method
CN116902003B (en) Unmanned method based on laser radar and camera mixed mode

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant