CN115469663A - End-to-end navigation obstacle avoidance method facing automatic driving and based on deep reinforcement learning - Google Patents
End-to-end navigation obstacle avoidance method facing automatic driving and based on deep reinforcement learning
- Publication number
- CN115469663A (application CN202211119904.7A)
- Authority
- CN
- China
- Prior art keywords
- agent
- environment
- model
- obstacle
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0214—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to an end-to-end navigation obstacle avoidance method for automatic driving based on deep reinforcement learning. A lidar samples the laser point cloud to generate a one-dimensional laser point cloud of the environment, which is converted by a cost-map conversion algorithm into an obstacle map representing the dynamic and static obstacles of the environment; environment obstacle maps are collected through closed-loop interaction with a simulation environment, generating a navigation environment data set. A heterogeneous-agent self-game obstacle avoidance model is constructed, the heterogeneous multi-agent self-game model comprising: an observation space, an action space, a reward function and a neural network. Multi-stage parallel curriculum learning is performed so that the neural network reaches a better local optimum faster and the learning process is accelerated, finally yielding a trained heterogeneous-agent self-game obstacle avoidance model. The trained model is then deployed on an actual vehicle, realizing navigation and obstacle avoidance in the real world.
Description
Technical Field
The invention relates to an end-to-end navigation obstacle avoidance method based on deep reinforcement learning and oriented to automatic driving, and belongs to the technical field of automatic driving.
Background
In recent years, with the wide application of automatic driving, navigation and obstacle avoidance methods have emerged one after another. Deep reinforcement learning (DRL) methods have been applied to intelligent navigation and achieve good results, where the autonomous vehicle follows an Ackermann dynamics model. However, developing a deep-reinforcement-learning-based collision avoidance method for Ackermann-steered vehicles in scenarios with unknown moving obstacles remains challenging. The trained network must not only generate collision-free trajectories that satisfy the Ackermann kinematic constraints, but also handle multiple moving obstacles in various environments, which forces deep reinforcement learning to search for feasible, collision-free and dynamically constrained trajectories in a larger search space. Enabling Ackermann-steered vehicles to find collision-free and kinematically feasible paths to their targets in environments with both static and moving obstacles is one of the major challenges of autonomous driving [1]. In an autonomous driving scenario, common static and dynamic obstacles mainly include static scene elements, moving vehicles, pedestrians, bicycles, and the like. The problem of realizing autonomous driving in static and dynamic obstacle scenarios can therefore be described as a multi-agent navigation and obstacle avoidance problem under heterogeneous dynamics models.
Although many collision avoidance approaches have been proposed, they share some limitations in practice. For example, the assumptions underlying such approaches may not hold in all environments [2].
Furthermore, collision avoidance among multiple agents with different shapes and different kinematic constraints in distributed, non-communicating scenarios is an even more challenging task. Existing methods such as optimal reciprocal collision avoidance (ORCA) [3] and bicycle reciprocal collision avoidance (B-ORCA) [4] provide sufficient conditions for agents to avoid mutual collisions, but they require all agents to obey the same kinematic constraints. Generalized reciprocal collision avoidance extends the approach to agents with different kinematic constraints, such as differential drive and Ackermann steering.
To overcome these limitations, deep reinforcement learning (DRL) methods have been applied to the collision avoidance problem and achieve good results. However, common deep reinforcement learning approaches rarely consider multi-agent collision avoidance under different kinematic constraints, which limits their application. For example, in a complex scene containing both cars and pedestrians, conventional deep reinforcement learning methods cannot achieve satisfactory navigation.
[1] Dong Y, Zhang Y, Ai J. Experimental test of unmanned ground vehicle delivering goods using RRT path planning algorithm[J]. Unmanned Systems, 2017, 5(01): 45-57.
[2] Zhang W, Wei S, Teng Y, et al. Dynamic obstacle avoidance for unmanned underwater vehicles based on an improved velocity obstacle method[J]. Sensors, 2017, 17(12): 2742.
[3] van den Berg J, Guy S J, Lin M, et al. Reciprocal n-body collision avoidance[M]// Robotics Research. Springer, Berlin, Heidelberg, 2011: 3-19.
[4] Alonso-Mora J, Breitenmoser A, Beardsley P, et al. Reciprocal collision avoidance for multiple car-like robots[C]// 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012: 360-366.
Disclosure of Invention
The invention addresses these problems: it overcomes the shortcomings of the prior art, reduces the consumption of computing resources, increases the inference speed of autonomous navigation, and strengthens the obstacle avoidance capability of the autonomous vehicle in complex multi-agent scenes, achieving a higher arrival rate and better safety when dealing with hard-to-predict pedestrian scenes. In tests, a 98% task completion rate is achieved in real-world scenes.
The technical scheme of the invention is as follows: an end-to-end navigation obstacle avoidance method facing automatic driving and based on deep reinforcement learning comprises the following implementation steps:
step 1: sampling the lidar point cloud to generate a one-dimensional laser point cloud of the environment, and converting it with a cost-map conversion algorithm to obtain an obstacle map representing the dynamic and static obstacles of the environment; then continuously collecting environment obstacle maps through closed-loop interaction with a simulation environment to generate a navigation environment data set, the data set comprising a training set and a test set;
step 2: constructing a heterogeneous-agent self-game obstacle avoidance model, the heterogeneous multi-agent self-game model comprising: an observation space, an action space, a reward function and a neural network;
the observation space is a complete description of the state of the world; apart from this state the world carries no other information, and a particular value of the observation space represents the observation data of the current agent;
an action space defining a range of actions that an agent can take;
reward function, which constrains and guides the behavior of the agent to realize safe navigation; different reward functions r_r and r_v are set for pedestrians and vehicles respectively, a reward shaping method is used, and r_w and r_d are introduced into r_v as penalties for the warning and danger zones;
the neural network is used for learning the navigation strategy;
the neural network, combined with the reward function, plans a target behavior that lies in the action space; the planned target behavior is given to the agent, the agent executes this behavior command, and the obstacle map of the current environment, which lies in the observation space, is output; the behavior executed by the agent and the resulting obstacle map are fed back into the neural network, forming closed-loop training;
the observation space, action space, reward function and neural network are combined to learn the obstacle avoidance strategy in a simulation environment, the simulation environment being an abstract description of the real world; the obstacles comprise vehicle agents based on the Ackermann model and pedestrian agents based on the differential model; following the behavior patterns of dynamic obstacles in real environments, the action space of an Ackermann vehicle is described as its desired driving trajectory, and the action space of a pedestrian is described as its direction and speed of motion; based on the simulation environment and the heterogeneous-agent self-game obstacle avoidance model, game learning between Ackermann vehicles, between Ackermann vehicles and differential pedestrians, and between differential pedestrians is finally realized with the same network structure and parameters, so that the multiple agents avoid obstacles flexibly in a dynamic environment;
step 3: based on the training set of step 1, performing multi-stage parallel curriculum learning on the heterogeneous-agent self-game obstacle avoidance model constructed in step 2, so that the neural network reaches a better local optimum faster and the learning process is accelerated, finally obtaining the trained heterogeneous-agent self-game obstacle avoidance model;
step 4: deploying the trained heterogeneous-agent self-game obstacle avoidance model on an actual vehicle to realize navigation and obstacle avoidance in the real world.
In the heterogeneous multi-agent self-gaming model constructed in the step 2:
observation space: the observation consists of three parts, namely the obstacle map of the environment in which the agent is located, the agent's current state, and the target point; the obstacle map is specified as an egocentric local grid map; it represents the environmental information surrounding the agent, including the shape of the agent and the appearance of the observable obstacles;
an action space: first, a continuous action space is set for the pedestrian agent based on the differential model, namely v_t ∈ [0, 0.6] and ω_t ∈ [-0.9, 0.9], where v_t is the speed adopted at time t, with 0 and 0.6 the minimum and maximum speeds of the agent (0 m/s and 0.6 m/s, unit: m/s), and ω_t is the rotation angle adopted at time t (unit: rad), with -0.9 and 0.9 bounding the agent's rotation range; second, a continuous action space is set for the vehicle agent based on the Ackermann model, namely c_t ∈ [-1.43, 1.43] and δ_t ∈ [-11.25, 11.25], where c_t is the curvature of the trajectory the agent desires to drive at time t and δ_t is the acceleration corresponding to that desired trajectory;
the reward function: different reward functions r_r and r_v are set for pedestrians and vehicles respectively, a reward shaping method is used, and r_w and r_d are introduced into r_v as the penalty terms for the warning and danger zones, specifically as follows:
reward function of the pedestrian:
r_r = r_g + r_c + r_s,
reward function of the vehicle, where r_g, r_c, r_s are the same as in the pedestrian reward:
r_v = r_g + r_c + r_s + r_w + r_d
in the formula for r_r, r_arr > 0, p_t denotes the position of the agent at the current time t, ε is a hyperparameter of the neural network, and r_g denotes the reward for reaching the goal and the penalty for moving away from it; r_col < 0 and r_c denotes the penalty for a collision; finally, a constant negative penalty r_s < 0 is applied over the whole learning period to encourage shortest paths;
in the formula for r_v, r_warn, r_danger < 0, r_w denotes the penalty when there is an obstacle within the warning zone of the vehicle, and r_d the penalty when there is an obstacle within the danger zone of the vehicle; in the implementation, r_arr = 500, ε = 10, r_col = -500, r_s = -5, r_warn = -20 and r_danger = -10 are set;
A neural network: the outputs of the policy network are interpreted respectively as the behaviors of the Ackermann-model vehicle agent and of the differential-model pedestrian agent, realizing self-game learning among the heterogeneous agents; the input is the observation data of each heterogeneous agent, including the agent's obstacle map, current state and target point. The final output layers of the network are modified so that the network can simultaneously output the behavior strategies of the heterogeneous agents, where the modified output layers use Gaussian sampling to obtain the linear and angular velocities as the pedestrian's direction and speed of motion, and the curvature and acceleration of the vehicle's desired driving trajectory.
This representation allows a single network to predict the behaviors of different heterogeneous agents and of several agents at once. After training, the neural network attains high navigation performance while simulating the pedestrian-vehicle game process of real environments as closely as possible, which strengthens the safety of the navigation strategy.
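The concrete architecture is given only in FIG. 2, so the following PyTorch sketch is merely illustrative of the idea described above: one shared encoder over the egocentric obstacle map and the agent state, with two Gaussian output heads, one producing the pedestrian command (v, ω) and one the vehicle command (curvature, acceleration). The class name, layer sizes and state dimension are assumptions.

```python
import torch
import torch.nn as nn

class SelfGamePolicy(nn.Module):
    """Shared policy for heterogeneous agents: one encoder, two Gaussian heads."""

    def __init__(self, map_channels=3, state_dim=6):
        super().__init__()
        # Convolutional encoder over the egocentric obstacle map
        self.encoder = nn.Sequential(
            nn.Conv2d(map_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.state_fc = nn.Linear(state_dim, 64)        # own state + goal point
        self.trunk = nn.Sequential(nn.LazyLinear(256), nn.ReLU())
        self.ped_head = nn.Linear(256, 2)               # mean of (v, omega)
        self.veh_head = nn.Linear(256, 2)               # mean of (curvature, acceleration)
        self.log_std = nn.Parameter(torch.zeros(2))     # learned exploration noise

    def forward(self, obstacle_map, state, agent_type):
        feat = torch.cat([self.encoder(obstacle_map),
                          torch.relu(self.state_fc(state))], dim=-1)
        h = self.trunk(feat)
        mean = self.ped_head(h) if agent_type == "pedestrian" else self.veh_head(h)
        # Gaussian action distribution; sample() yields the raw continuous action
        return torch.distributions.Normal(mean, self.log_std.exp())
```

Selecting one head per agent type is what lets a single set of parameters drive every heterogeneous agent in the scene.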
Compared with the prior art, the invention has the advantages that:
(1) The invention accomplishes fast, collision-free navigation in scenes containing moving obstacles with several heterogeneous dynamics models and several kinds of static obstacles. By collecting data and iterating the strategy thousands of times in the simulation environment with an end-to-end deep reinforcement learning method, it can cover most scenes met in practical applications. Compared with traditional navigation algorithms, training is faster and inference is faster. A navigation and obstacle avoidance algorithm built on deep reinforcement learning exploits both autonomous learning and the high learning efficiency available in simulation, so a near-optimal navigation and obstacle avoidance strategy can be learned quickly.
(2) Self-game cooperative training improves the adaptive capacity of the autonomous vehicle in such scenes. In traditional navigation strategies, the behavior strategies of dynamic obstacles and the shape characteristics of static obstacles in the environment are simple, so the learned strategy is not robust and adapts poorly to new scenes. After training, the invention can adapt to static scenes ranging from simple to complex, to various dynamic scenes, and to mixed scenes. To achieve this, the heterogeneous multi-agent self-game method lets the agents improve each other's obstacle avoidance and adaptation abilities through self-game cooperative training under multiple dynamics models and multiple static obstacle scenes.
(3) The invention outputs the trajectory of the autonomous vehicle and the motion of the pedestrian through the deep reinforcement learning algorithm, which improves deployment efficiency and obstacle avoidance capability during actual deployment. Compared with the traditional practice of outputting low-level vehicle control commands, high-level trajectory output better guarantees the safety and navigation capability of the vehicle. Introducing the trajectory strategy into the heterogeneous multi-agent self-game reduces the jitter problem, and because the trajectory output carries temporal information, possible future dangers can be anticipated and avoided in time.
(4) Because a dynamic environment with many agents under complex heterogeneous dynamics models is too complex, the model cannot converge when trained on it directly. The agents are therefore placed in several environments of progressive difficulty, realizing multi-environment parallel learning and simple-to-complex curriculum learning; compared with single-environment serial training, multi-environment parallelism accelerates learning, and the progressive multi-environment curriculum reduces training difficulty.
Drawings
FIG. 1 is a flow chart of a method implementation of the present invention;
FIG. 2 is a diagram of a heterogeneous agent self-gaming obstacle avoidance neural network structure in the present invention;
FIG. 3 shows the multi-stage parallel curriculum learning of the present invention: the left panel is parallel curriculum learning in a random static scene, and the right panel is parallel curriculum learning in a circular scene;
FIG. 4 shows the simulated training environments: the left panel is the random training environment, and the right panel is the circular training environment;
FIG. 5 is a physical vehicle deployment and test chart.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
Autonomous driving scenes are complex and changeable. For a vehicle to navigate and avoid obstacles efficiently and safely in such scenes, the navigation strategy must flexibly perceive dynamic obstacles with different dynamics models as well as various static obstacles, and plan a reasonable collision-free trajectory. The invention provides an end-to-end navigation obstacle avoidance method for automatic driving based on deep reinforcement learning, used for collision avoidance of Ackermann-steered vehicles; the vehicle can handle multiple moving obstacles with heterogeneous dynamics and multiple kinds of static obstacles. Specifically, a convolutional neural network is introduced that maps the vehicle's observations to kinematically feasible trajectories, and the network is trained in a simulated environment in which the moving obstacles follow the same strategy as the Ackermann-steered vehicle. Dedicated reward shaping and multi-stage parallel curriculum learning strategies are introduced to accelerate and stabilize learning. The trained model is then deployed on an actual vehicle to realize navigation and obstacle avoidance in the real world. The method is evaluated in various simulated and real-world scenes, showing that the vehicle avoids various moving obstacles with a high success rate.
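Because the vehicle action is a desired trajectory rather than a raw control command, the (curvature, acceleration) output has to be unrolled into way-points before it can be tracked. The patent does not spell out this step, so the sketch below assumes a simple constant-curvature, constant-acceleration kinematic rollout; the horizon and time step are illustrative values.

```python
import math

def rollout_trajectory(x, y, yaw, v, curvature, accel, horizon=2.0, dt=0.1):
    """Integrate a constant-curvature, constant-acceleration arc forward in time.

    Returns a list of (x, y, yaw, v) way-points that a low-level controller
    could track; the yaw rate follows the kinematic relation yaw_dot = v * curvature.
    """
    trajectory = []
    for _ in range(int(horizon / dt)):
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v * curvature * dt
        v = max(0.0, v + accel * dt)      # speed never goes negative
        trajectory.append((x, y, yaw, v))
    return trajectory
```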
Finding a trajectory that is collision-free and satisfies the constraints of the dynamics model in a complex autonomous driving scene is a challenging task. Existing methods generally perceive and model the environment with a perception module, generate the navigation map required for navigation, and combine it with a navigation obstacle avoidance algorithm to navigate in complex scenes. They also generally assume that the surrounding dynamic obstacles move according to a specific behavior strategy, for example that a dynamic Ackermann vehicle uses bicycle reciprocal collision avoidance (B-ORCA) and that pedestrians use optimal reciprocal collision avoidance (ORCA). In real autonomous driving scenes, however, the behavior strategies of dynamic obstacles are complex and changeable, so dynamic scenes simulated with such traditional algorithms are highly repetitive, and further exploration is needed to improve performance in dynamic scenes.
As shown in fig. 1, the method of the present invention comprises the following steps:
step 1: construct the heterogeneous-agent self-game obstacle avoidance model, collect data through interaction with the simulation environment module, preprocess the data and convert it into the input required by the model, for use in the subsequent training and evaluation process. The simulation environment is a strategy-level simulator designed for automatic driving, customized on the basis of OpenCV; the self-developed simulation environment is extended with pedestrians and vehicles that can be driven by the neural network. The simulation environment is shown in fig. 4: the left panel is an example scene with several kinds of static obstacles and several dynamic obstacles, in which squares represent the autonomous vehicles and circles represent pedestrians; attributes such as the sizes of pedestrians and vehicles can be adjusted through the environment settings. The corresponding static obstacles are squares, rectangles or circles whose sizes can also be adjusted. The right panel is the dynamic obstacle scene used, with progressive difficulty, in the subsequent steps; its elements are as described above.
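A minimal sketch of the closed-loop data collection described in this step is given below; `env`, `policy.act` and the (obs, action, reward, next_obs, done) sample layout are assumed placeholders rather than the simulator's actual interface.

```python
import random

def collect_navigation_dataset(env, policy, episodes=1000, train_ratio=0.8):
    """Closed-loop interaction with the simulator: observe, act, record.

    Each recorded sample is (obs, action, reward, next_obs, done), where obs
    bundles the obstacle map, the agent state and the goal point.
    """
    samples = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)                    # behavior planned by the network
            next_obs, reward, done, info = env.step(action)
            samples.append((obs, action, reward, next_obs, done))
            obs = next_obs
    random.shuffle(samples)
    split = int(train_ratio * len(samples))
    return samples[:split], samples[split:]             # training set, test set
```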
step 2: the dynamics models of the different agents are modeled as a heterogeneous-agent self-game model; the same network outputs the actions of agents with different dynamics models, and when several agents are present in the environment the network predicts the behaviors of all of them simultaneously.
Heterogeneous agent self-gaming module: fig. 2 shows a heterogeneous agent self-gaming obstacle avoidance model structure. The heterogeneous multi-agent self-gaming module mainly comprises: observation space, action space, reward function, and neural network structure. The observation space is responsible for carrying out complete perception and preprocessing on the environment, and the action space defines the effective control command types and the size and the range of the control quantity of the heterogeneous multi-agent. The reward function is used to define a quantitative assessment formula for how well the heterogeneous agent takes action. The neural network structure is used for learning navigation and obstacle avoidance strategies in complex and changeable scenes. The contents involved in the heterogeneous agent self-gaming model are respectively described as follows:
observation space: the observation consists of three parts, namely an obstacle map of the environment where the intelligent agent is located, the current self state and a target point; the obstacle map is specified by a self-centered local grid map; as shown in fig. 4, the obstacle map represents environmental information around the agent, including the shape of the agent and the appearance of observable obstacles; the obstacle map is constructed from cost maps generated from various sensor data (e.g., output of a 2D laser, a 3D laser, or a depth camera).
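The text does not detail how a scan becomes the egocentric grid, so the following sketch shows one straightforward rasterization of a single 2D scan; the grid size, resolution and maximum range are illustrative values, not parameters from the patent.

```python
import numpy as np

def scan_to_obstacle_map(ranges, angle_min, angle_increment,
                         map_size=60, resolution=0.1, max_range=6.0):
    """Project a 1-D lidar scan into an egocentric occupancy grid.

    ranges          : measured distances (m), one per beam
    angle_min       : angle of the first beam (rad) in the robot frame
    angle_increment : angular step between beams (rad)
    map_size        : grid is map_size x map_size cells, robot at the center
    resolution      : cell edge length (m)
    """
    grid = np.zeros((map_size, map_size), dtype=np.uint8)
    center = map_size // 2
    angles = angle_min + angle_increment * np.arange(len(ranges))
    for r, a in zip(ranges, angles):
        if not np.isfinite(r) or r >= max_range:
            continue  # no return within range: leave the cells free
        x, y = r * np.cos(a), r * np.sin(a)        # beam endpoint, robot frame
        col = center + int(round(x / resolution))
        row = center - int(round(y / resolution))
        if 0 <= row < map_size and 0 <= col < map_size:
            grid[row, col] = 255                   # mark the hit cell as occupied
    return grid
```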
An action space: the self-game process of the heterogeneous agents is realized by giving each type of agent its own behavior strategy. As shown in fig. 2, a continuous action is first set for the pedestrian agent based on the differential model, namely v_t ∈ [0, 0.6] and ω_t ∈ [-0.9, 0.9]; a continuous action is then set for the vehicle agent based on the Ackermann model, namely c_t ∈ [-1.43, 1.43] and δ_t ∈ [-11.25, 11.25]. Through this design of the action space, the heterogeneous agents can learn and train with the same set of networks.
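The bounds quoted above can be collected into a small helper; the dictionary layout and the `clip_action` function are illustrative, only the numeric ranges come from the text.

```python
import numpy as np

PEDESTRIAN_BOUNDS = {          # differential-drive pedestrian agent
    "v":     (0.0, 0.6),       # linear speed, m/s
    "omega": (-0.9, 0.9),      # rotation command, rad (unit as given in the text)
}
VEHICLE_BOUNDS = {             # Ackermann vehicle agent
    "curvature":    (-1.43, 1.43),    # curvature of the desired trajectory
    "acceleration": (-11.25, 11.25),  # acceleration along that trajectory
}

def clip_action(raw, bounds):
    """Clamp a raw network output into the legal continuous action range."""
    return {k: float(np.clip(raw[k], lo, hi)) for k, (lo, hi) in bounds.items()}
```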
The reward function: the self-game of the heterogeneous agents is realized by modeling the behavior of vehicles and pedestrians in complex dynamic scenes; at the same time, to realize safe navigation, the navigation strategy aims to minimize the number of collisions between vehicles and pedestrians and to minimize the agents' collision-free arrival time. In the invention, a dedicated judgment zone is introduced and combined with a reward shaping strategy, and different reward functions r_r and r_v are set for pedestrians and vehicles respectively.
Reward function of the pedestrian:
r_r = r_g + r_c + r_s,
Reward function of the vehicle, where r_g, r_c, r_s are the same as in the pedestrian reward:
r_v = r_g + r_c + r_s + r_w + r_d
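A hedged sketch of these reward terms follows, using the constants listed above (r_arr = 500, ε = 10, r_col = -500, r_s = -5, r_warn = -20, r_danger = -10); how r_g uses ε for distance shaping is not spelled out here, so that line is an assumption.

```python
# Constants from the text above; the per-term shaping (how epsilon enters r_g)
# is not spelled out, so that part is an assumption.
R_ARR, EPSILON = 500.0, 10.0
R_COL, R_STEP = -500.0, -5.0
R_WARN, R_DANGER = -20.0, -10.0

def pedestrian_reward(reached_goal, collided, dist_prev, dist_now):
    """r_r = r_g + r_c + r_s for the differential-model pedestrian agent."""
    r_g = R_ARR if reached_goal else EPSILON * (dist_prev - dist_now)  # assumed shaping
    r_c = R_COL if collided else 0.0
    r_s = R_STEP                     # constant step penalty -> prefer shortest paths
    return r_g + r_c + r_s

def vehicle_reward(reached_goal, collided, dist_prev, dist_now,
                   in_warning_zone, in_danger_zone):
    """r_v = r_g + r_c + r_s + r_w + r_d for the Ackermann vehicle agent."""
    r_w = R_WARN if in_warning_zone else 0.0
    r_d = R_DANGER if in_danger_zone else 0.0
    return pedestrian_reward(reached_goal, collided, dist_prev, dist_now) + r_w + r_d
```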
a neural network: the convolutional network structure used for the navigation strategy is shown in fig. 2. The input of the network is the agent's observation data. The network output is sampled from a Gaussian distribution, a weighted average of the sampled actions is computed, and this average is output as the agent's behavior strategy: the pedestrian agent based on the differential model is controlled with a linear velocity and an angular velocity, and the vehicle agent based on the Ackermann model is controlled with a trajectory.
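The "weighted average of the sampled actions" is not defined precisely in the text; the sketch below shows one plausible reading in which several samples drawn from the Gaussian head are blended with weights proportional to their probability density.

```python
import torch

def weighted_action(dist, num_samples=8):
    """Blend several samples from the Gaussian policy head into one action.

    `dist` is a torch.distributions.Normal over the action dimensions; the
    density-weighted average below is one plausible reading of the
    'weighted average of the actions' described in the text.
    """
    samples = dist.sample((num_samples,))                 # [K, action_dim]
    log_w = dist.log_prob(samples).sum(dim=-1)            # per-sample log density
    weights = torch.softmax(log_w, dim=0).unsqueeze(-1)   # normalized weights, [K, 1]
    return (weights * samples).sum(dim=0)                 # blended action
```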
step 3: multi-stage curriculum learning and parallel learning are combined to accelerate the training and convergence of the model in the simulation environment. Curriculum learning has been shown to be an effective strategy that helps find a better local minimum and speeds up training; its main idea is to decompose a hard learning task into several simple tasks and to increase the difficulty gradually, starting from the simplest task. In the invention, learning efficiency is low when the scene difficulty is high in both the static and the dynamic scenes, so parallel learning is introduced to accelerate and guide training; since the static and dynamic scenes form a progression of difficulty, a multi-stage curriculum learning strategy is introduced between them to guide the agents, forming a complete multi-stage parallel curriculum learning strategy that trains the network more efficiently. After the model converges it is evaluated and tested to verify its effectiveness, and the target model is finally deployed in the real world to complete the target navigation task.
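A minimal sketch of the staged, parallel training loop follows; `make_env`, `collect_rollout` and `policy.update` are assumed placeholders for the simulator and the reinforcement-learning update, not interfaces defined by the patent.

```python
def train_with_curriculum(make_env, policy, stage_builders,
                          envs_per_stage=8, iterations=500):
    """Multi-stage parallel curriculum (schematic).

    `stage_builders` is an ordered list of scene generators, easiest first,
    e.g. [random_scene, circular_scene].
    """
    for stage, build_scene in enumerate(stage_builders):
        # Several copies of the same difficulty level run in parallel
        envs = [make_env(build_scene) for _ in range(envs_per_stage)]
        for _ in range(iterations):
            rollouts = [collect_rollout(env, policy) for env in envs]
            policy.update(rollouts)          # one policy-gradient style update
        print(f"curriculum stage {stage} finished")
    return policy
```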
In this work, scenes with heterogeneous multi-agents and static obstacles are built in the simulation environment, and the Ackermann-model vehicle agents, the differential-model pedestrian agents and the obstacles are trained together, improving the ability of the vehicle agents to cope with complex scenes. FIG. 4 shows two example training scenes in the simulation environment, and FIG. 3 shows the multi-stage parallel curriculum learning: the left panel is parallel curriculum learning in a random static scene and the right panel is parallel curriculum learning in a circular scene. Parallel curriculum learning is first carried out in the random obstacle scene (also called the random scene), which helps the Ackermann-model vehicle agents avoid random obstacles. The trained model then continues parallel curriculum learning in the circular scene, which improves the vehicle agents' obstacle avoidance in complex dynamic obstacle scenes. Self-game training in the random scene first enables the policy network to cope with complex static obstacle scenes and simple dynamic pedestrian scenes. Specifically, the environment in the random scene contains 3 Ackermann-model vehicle agents, 3 differential-model pedestrian agents and 5 static obstacles; the obstacle positions are chosen at random, and the start and target positions of the vehicle and pedestrian agents are also random within a certain range. Both the vehicle agents and the pedestrian agents in the environment are driven by the policy network; several heterogeneous agents sharing the same navigation strategy coexist and must also avoid each other, which helps the network learn multi-agent obstacle avoidance. Combined with the reward shaping strategy, the Ackermann-model vehicle agents learn to actively avoid other dynamic and static obstacles and to keep a certain safety distance. The environment in the circular scene contains 3 Ackermann-model vehicle agents and 3 differential-model pedestrian agents placed at random on circles of random radius; training in the circular scene lets the vehicle and pedestrian agents effectively learn navigation in complex dynamic scenes.
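The two scene configurations can be sketched as follows; the arena size, circle radii and the choice of goals on the opposite side of the circle are assumptions for illustration, while the agent and obstacle counts come from the text.

```python
import math
import random

def random_scene(area=20.0):
    """Random scene: 3 vehicle agents, 3 pedestrian agents, 5 static obstacles,
    all positions drawn uniformly; the arena size is an assumed parameter."""
    def place():
        return (random.uniform(0.0, area), random.uniform(0.0, area))
    return {
        "vehicles":    [{"start": place(), "goal": place()} for _ in range(3)],
        "pedestrians": [{"start": place(), "goal": place()} for _ in range(3)],
        "obstacles":   [place() for _ in range(5)],
    }

def circular_scene(min_radius=4.0, max_radius=10.0):
    """Circular scene: 3 vehicle and 3 pedestrian agents on a circle of random
    radius; sending each agent to the opposite side is an assumed choice that
    makes their paths cross near the center."""
    radius = random.uniform(min_radius, max_radius)
    agents = []
    for i in range(6):
        theta = 2.0 * math.pi * i / 6 + random.uniform(-0.2, 0.2)  # jittered spacing
        start = (radius * math.cos(theta), radius * math.sin(theta))
        agents.append({
            "type": "vehicle" if i < 3 else "pedestrian",
            "start": start,
            "goal": (-start[0], -start[1]),
        })
    return {"agents": agents, "obstacles": []}
```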
Meanwhile, as the scene complexity increases during training, the multi-stage parallel curriculum learning framework is introduced. This accelerates the training of the heterogeneous-agent self-game obstacle avoidance model, effectively guides the learning of the Ackermann-model vehicle agents in complex dynamic scenes, and increases the training speed.
As shown in fig. 4, the simulated training environments are as follows. The first is the random training environment: it contains several Ackermann-model vehicle agents and differential-model pedestrian agents with random initial positions, together with static obstacles of arbitrary shape and several dynamic obstacles, and the figure also shows the observation data of each agent in that environment. Every agent in the simulation is driven by the neural network, and through a long self-game learning process each agent acquires strong obstacle avoidance in static scenes. The second is the circular training environment: it contains several Ackermann-model vehicle agents and differential-model pedestrian agents whose initial positions are random within a specified circular region, with no static obstacles, and the figure again shows the observation data of each agent. Every agent is driven by the neural network, and through a long self-game learning process each agent acquires strong obstacle avoidance in dynamic scenes.
The strategy model trained in the simulation environment is deployed on a physical Ackermann-model vehicle platform to test its obstacle avoidance capability in complex real-world scenes.
As shown in FIG. 5, for the physical vehicle deployment and tests, the Ackermann-model vehicle is built on a Songling mobile robot chassis, carries a Tengling 16-line lidar for environment perception, and includes an RTK-GPS inertial navigation system for positioning. The vehicle is equipped with an industrial computer with an RTX 2080Ti as the core computing unit, which is mainly used to generate the environment perception data and to run the trained policy network. A random pedestrian scene and a circular scene corresponding to the training scenes are reproduced in the real environment, and physical tests are also carried out in a mixed static-dynamic scene.
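A schematic of the on-vehicle inference loop follows; `lidar.read`, `localization.pose`, `policy.act` and `controller.track` are assumed interfaces, not the names of any real driver, and the map and rollout helpers refer to the earlier sketches.

```python
import time

def run_on_vehicle(lidar, localization, policy, controller, rate_hz=10):
    """Schematic on-vehicle loop: scan -> obstacle map -> policy -> trajectory.

    All hardware interfaces here are placeholders; scan_to_obstacle_map and
    rollout_trajectory are the helper sketches shown earlier.
    """
    while True:
        ranges, angle_min, angle_inc = lidar.read()
        obstacle_map = scan_to_obstacle_map(ranges, angle_min, angle_inc)
        x, y, yaw, v = localization.pose()                 # RTK-GPS / INS pose
        curvature, accel = policy.act(obstacle_map, (x, y, yaw, v),
                                      agent_type="vehicle")
        trajectory = rollout_trajectory(x, y, yaw, v, curvature, accel)
        controller.track(trajectory)                       # low-level tracking
        time.sleep(1.0 / rate_hz)
```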
In complex static scenes, mixed static-dynamic scenes and dynamic pedestrian scenes, the Ackermann-model vehicle reaches the target end point quickly and without collision, completing the navigation task safely. The map-based, self-game deep reinforcement learning algorithm effectively improves the obstacle avoidance capability of the Ackermann-model vehicle, so that it performs well in complex static and dynamic scenes. Using the trajectory as the strategy's action also makes physical deployment faster, and the effect can be improved by fine-tuning a few simple parameters.
The method is used for navigation and obstacle avoidance in complex autonomous driving scenes. The observation space expresses the environment information around the agent. The dynamics and shape characteristics of the heterogeneous agents are modeled, their behavior strategies are trained with the neural network, and complex dynamic scenes are explored through the self-game process, improving the Ackermann vehicle's ability to avoid obstacles in complex dynamic scenes. The method is evaluated in a variety of scenes, in both simulated environments and real scenes; the results show that with this method the Ackermann-model vehicle copes well with navigation in complex static and complex dynamic pedestrian scenes.
Although particular embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that these are merely examples and that many variations or modifications may be made to these embodiments without departing from the principles and implementations of the invention, the scope of which is therefore defined by the appended claims.
Claims (2)
1. An end-to-end navigation obstacle avoidance method facing automatic driving and based on deep reinforcement learning is characterized by comprising the following implementation steps:
step 1: sampling the lidar point cloud to generate a one-dimensional laser point cloud of the environment, and converting it with a cost-map conversion algorithm to obtain an obstacle map representing the dynamic and static obstacles of the environment; performing closed-loop interaction with a simulation environment, continuously collecting environment obstacle maps, and generating a navigation environment data set, the data set comprising a training set and a test set;
step 2: constructing a heterogeneous-agent self-game obstacle avoidance model, the heterogeneous multi-agent self-game model comprising: an observation space, an action space, a reward function and a neural network;
an observation space representing observation data of a current agent;
an action space that defines the scope of actions taken by the agent;
a reward function for constraining and guiding the behavior of the agent to realize safe navigation, with different reward functions r_r and r_v set for pedestrians and vehicles respectively, a reward shaping method being used, and r_w and r_d being introduced into r_v as penalties for the warning and danger zones;
a neural network for learning of a navigation strategy;
the neural network, combined with the reward function, plans a target behavior that lies in the action space; the planned target behavior is given to the agent, the agent executes this behavior command, and the obstacle map of the current environment, which lies in the observation space, is output; the behavior executed by the agent and the resulting obstacle map are fed back into the neural network, forming closed-loop training;
combining the observation space, action space, reward function and neural network to learn the obstacle avoidance strategy in a simulation environment, the simulation environment being an abstract description of the real world, the obstacles comprising vehicle agents based on the Ackermann model and pedestrian agents based on the differential model; following the behavior patterns of dynamic obstacles in real environments, the action space of an Ackermann vehicle is described as its desired driving trajectory and the action space of a pedestrian is described as its direction and speed of motion; based on the simulation environment and the heterogeneous-agent self-game obstacle avoidance model, the same network structure and parameters are finally used to realize game learning between Ackermann vehicles, between Ackermann vehicles and differential pedestrians, and between differential pedestrians, so that the multiple agents avoid obstacles flexibly in a dynamic environment;
step 3: based on the training set of step 1, performing multi-stage parallel curriculum learning on the heterogeneous-agent self-game obstacle avoidance model constructed in step 2, so that the neural network reaches a better local optimum faster and the learning process is accelerated, finally obtaining the trained heterogeneous-agent self-game obstacle avoidance model;
step 4: deploying the trained heterogeneous-agent self-game obstacle avoidance model on an actual vehicle to realize navigation and obstacle avoidance in the real world.
2. The end-to-end navigation obstacle avoidance method facing automatic driving and based on deep reinforcement learning as recited in claim 1, wherein: in the heterogeneous multi-agent self-gaming model constructed in the step 2:
observation space: the observation consists of three parts, namely the obstacle map of the environment in which the agent is located, the agent's current state, and the target point; the obstacle map is specified as an egocentric local grid map; it represents the environmental information surrounding the agent, including the shape of the agent and the appearance of the observable obstacles;
an action space: first, a continuous action space is set for the pedestrian agent based on the differential model, namely v_t ∈ [0, 0.6] and ω_t ∈ [-0.9, 0.9], where v_t is the speed adopted at time t, with 0 and 0.6 the minimum and maximum speeds of the agent (0 m/s and 0.6 m/s, unit: m/s), and ω_t is the rotation angle adopted at time t (unit: rad), with -0.9 and 0.9 bounding the agent's rotation range; second, a continuous action space is set for the vehicle agent based on the Ackermann model, namely c_t ∈ [-1.43, 1.43] and δ_t ∈ [-11.25, 11.25], where c_t is the curvature of the trajectory the agent desires to drive at time t and δ_t is the acceleration corresponding to that desired trajectory;
the reward function: different reward functions r_r and r_v are set for pedestrians and vehicles respectively, a reward shaping method is used, and r_w and r_d are introduced into r_v as the penalty mechanisms for the warning and danger zones, specifically as follows:
reward function of the pedestrian:
r_r = r_g + r_c + r_s,
reward function of the vehicle, where r_g, r_c, r_s are the same as in the pedestrian reward:
r_v = r_g + r_c + r_s + r_w + r_d
in the formula for r_r, r_arr > 0, p_t denotes the position of the agent at the current time t, ε is a hyperparameter of the neural network, and r_g denotes the reward for reaching the goal and the penalty for leaving it; r_col < 0 and r_c denotes the penalty for a collision; finally, a constant negative penalty r_s < 0 is applied over the whole learning period to encourage shortest paths;
in the formula for r_v, r_warn, r_danger < 0, r_w denotes the penalty when there is an obstacle within the warning zone of the vehicle, and r_d the penalty when there is an obstacle within the danger zone of the vehicle; in the implementation, r_arr = 500, ε = 10, r_col = -500, r_s = -5, r_warn = -20 and r_danger = -10 are set;
a neural network: the outputs of the policy network are interpreted respectively as the behaviors of the Ackermann-model vehicle agent and of the differential-model pedestrian agent, realizing self-game learning among the heterogeneous agents; the input is the observation data of each heterogeneous agent, including the agent's obstacle map, current state and target point; the final output layers of the network are modified so that the network can simultaneously output the behavior strategies of the heterogeneous agents, where the modified output layers use Gaussian sampling to obtain the linear and angular velocities as the pedestrian's direction and speed of motion, and the curvature and acceleration of the vehicle's desired driving trajectory;
this representation allows a single network to predict the behaviors of different heterogeneous agents and of several agents at once; after training, the neural network attains high navigation performance while simulating the pedestrian-vehicle game process of real environments as closely as possible, strengthening the safety of the navigation strategy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211119904.7A CN115469663B (en) | 2022-09-15 | 2022-09-15 | End-to-end navigation obstacle avoidance method based on deep reinforcement learning and oriented to automatic driving |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211119904.7A CN115469663B (en) | 2022-09-15 | 2022-09-15 | End-to-end navigation obstacle avoidance method based on deep reinforcement learning and oriented to automatic driving |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115469663A true CN115469663A (en) | 2022-12-13 |
CN115469663B CN115469663B (en) | 2024-06-14 |
Family
ID=84333111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211119904.7A Active CN115469663B (en) | 2022-09-15 | 2022-09-15 | End-to-end navigation obstacle avoidance method based on deep reinforcement learning and oriented to automatic driving |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115469663B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116501086A (en) * | 2023-04-27 | 2023-07-28 | 天津大学 | Aircraft autonomous avoidance decision method based on reinforcement learning |
CN116540602A (en) * | 2023-04-28 | 2023-08-04 | 金陵科技学院 | Vehicle unmanned method based on road section safety level DQN |
CN116755329A (en) * | 2023-05-12 | 2023-09-15 | 江南大学 | Multi-agent danger avoiding and escaping method and device based on deep reinforcement learning |
CN117227763A (en) * | 2023-11-10 | 2023-12-15 | 新石器慧通(北京)科技有限公司 | Automatic driving behavior decision method and device based on game theory and reinforcement learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102019120880A1 (en) * | 2018-08-03 | 2020-02-06 | Ford Global Technologies, Llc | END-TO-END-DEEP-GENERATIVE MODEL FOR SIMULTANEOUS LOCALIZATION AND IMAGE |
CN113341958A (en) * | 2021-05-21 | 2021-09-03 | 西北工业大学 | Multi-agent reinforcement learning movement planning method with mixed experience |
CN114626505A (en) * | 2022-03-04 | 2022-06-14 | 北京理工大学 | Mobile robot deep reinforcement learning control method |
- 2022-09-15 CN CN202211119904.7A patent/CN115469663B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102019120880A1 (en) * | 2018-08-03 | 2020-02-06 | Ford Global Technologies, Llc | END-TO-END-DEEP-GENERATIVE MODEL FOR SIMULTANEOUS LOCALIZATION AND IMAGE |
CN113341958A (en) * | 2021-05-21 | 2021-09-03 | 西北工业大学 | Multi-agent reinforcement learning movement planning method with mixed experience |
CN114626505A (en) * | 2022-03-04 | 2022-06-14 | 北京理工大学 | Mobile robot deep reinforcement learning control method |
Non-Patent Citations (1)
Title |
---|
张海霞; 李腆腆; 李东阳; 刘文杰: "Research on key technologies of intelligent Internet of Vehicles based on vehicle behavior analysis" (基于车辆行为分析的智能车联网关键技术研究), Journal of Electronics & Information Technology (电子与信息学报), no. 01, 15 January 2020 (2020-01-15) *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116501086A (en) * | 2023-04-27 | 2023-07-28 | 天津大学 | Aircraft autonomous avoidance decision method based on reinforcement learning |
CN116501086B (en) * | 2023-04-27 | 2024-03-26 | 天津大学 | Aircraft autonomous avoidance decision method based on reinforcement learning |
CN116540602A (en) * | 2023-04-28 | 2023-08-04 | 金陵科技学院 | Vehicle unmanned method based on road section safety level DQN |
CN116540602B (en) * | 2023-04-28 | 2024-02-23 | 金陵科技学院 | Vehicle unmanned method based on road section safety level DQN |
CN116755329A (en) * | 2023-05-12 | 2023-09-15 | 江南大学 | Multi-agent danger avoiding and escaping method and device based on deep reinforcement learning |
CN116755329B (en) * | 2023-05-12 | 2024-05-24 | 江南大学 | Multi-agent danger avoiding and escaping method and device based on deep reinforcement learning |
CN117227763A (en) * | 2023-11-10 | 2023-12-15 | 新石器慧通(北京)科技有限公司 | Automatic driving behavior decision method and device based on game theory and reinforcement learning |
CN117227763B (en) * | 2023-11-10 | 2024-02-20 | 新石器慧通(北京)科技有限公司 | Automatic driving behavior decision method and device based on game theory and reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115469663B (en) | 2024-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110136481B (en) | Parking strategy based on deep reinforcement learning | |
CN115469663B (en) | End-to-end navigation obstacle avoidance method based on deep reinforcement learning and oriented to automatic driving | |
Zhang et al. | Human-like autonomous vehicle speed control by deep reinforcement learning with double Q-learning | |
Choi et al. | Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view | |
CN112433525A (en) | Mobile robot navigation method based on simulation learning and deep reinforcement learning | |
CN114846425A (en) | Prediction and planning of mobile robots | |
Mouhagir et al. | A markov decision process-based approach for trajectory planning with clothoid tentacles | |
Grando et al. | Deep reinforcement learning for mapless navigation of unmanned aerial vehicles | |
CN114792148A (en) | Method and device for predicting motion trail | |
Schütt et al. | An application of scenario exploration to find new scenarios for the development and testing of automated driving systems in urban scenarios | |
Khaitan et al. | State dropout-based curriculum reinforcement learning for self-driving at unsignalized intersections | |
Chen et al. | Deep reinforcement learning of map-based obstacle avoidance for mobile robot navigation | |
Elallid et al. | Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation | |
Fang et al. | Ontology-based reasoning approach for long-term behavior prediction of road users | |
Gutiérrez-Moreno et al. | Hybrid decision making for autonomous driving in complex urban scenarios | |
CN117553798A (en) | Safe navigation method, equipment and medium for mobile robot in complex crowd scene | |
CN117371895A (en) | Multi-ground unmanned vehicle path planning method, system and medium in unknown environment | |
Li et al. | DDPG-Based Path Planning Approach for Autonomous Driving | |
CN116127853A (en) | Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused | |
Cutrone et al. | A framework for identifying and simulating worst-case animal-vehicle interactions | |
Guo et al. | Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control | |
Zhao et al. | Fluids: A first-order local urban intersection driving simulator | |
Imam et al. | Autonomous driving system using proximal policy optimization in deep reinforcement learning | |
CN114167856A (en) | Service robot local path planning method based on artificial emotion | |
Ha et al. | Vehicle control with prediction model based Monte-Carlo tree search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |