WO2024087654A1 - Autonomous driving vehicle navigation control method and system - Google Patents

Autonomous driving vehicle navigation control method and system

Info

Publication number
WO2024087654A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
navigation
drl
path
vehicle control
Prior art date
Application number
PCT/CN2023/100154
Other languages
English (en)
French (fr)
Inventor
吴艳
高龙飞
王丽芳
苟晋芳
Original Assignee
中国科学院电工研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院电工研究所 filed Critical 中国科学院电工研究所
Publication of WO2024087654A1 publication Critical patent/WO2024087654A1/zh

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0257Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present invention relates to the field of autonomous driving, and in particular to a navigation control method and system for an autonomous driving vehicle.
  • the deep reinforcement learning (DRL) method can directly convert road network data and various sensor information into decision values and control commands through unsupervised learning.
  • This feature makes the DRL method widely used in key scenarios such as autonomous driving path planning systems, behavioral decision systems, and navigation control systems.
  • the DRL vehicle navigation control system requires the intelligent agent to output vehicle control commands directly from the data of the perception system; at the same time, the agent's decision values must meet the requirements of vehicle dynamics, and the agent must be able to respond completely independently to the various situations in the operating scenario. This is an end-to-end vehicle control method in which the deep neural network acts as the "brain" of the intelligent vehicle and controls the vehicle's motion according to the state of the environment.
  • a reward and punishment mechanism is used to optimize the neural network. That is, the agent observes a multidimensional state S and calculates an action value A from the weights and biases of the neural network; after the agent performs this action it enters the next state S′. If S′ is a relatively desirable state, a "reward" is received, otherwise a "punishment" is received. Whether the neural network parameters mapping S to A are appropriate is judged from the reward or punishment, and if they are not, the parameters are adjusted according to the reward and punishment values. In the early stages of training, since the mapping from S to A is almost random, the probability of obtaining a "reward" while the agent is running is very small. This sparse reward largely causes the training of the neural network to be slow and difficult to converge to a desirable state.
  • the agent uses sensors to observe the environment and obtain a multi-dimensional state S.
  • Common sensors are cameras and lidar. A camera obtains planar data and cannot observe depth information; a binocular camera can be used to calculate depth information, but this approach has a large error; a lidar can directly observe accurate three-dimensional information about the environment. Whether it is the image data obtained by the camera or the point cloud obtained by the lidar, the data volume is too large and needs to be preprocessed to minimize the dimension of the state S.
  • the neural network used for this task has many neuron parameters that need to be adjusted, so the training of the neural network is a very time-consuming task, usually requiring hundreds of thousands or even millions of trainings.
  • the optimization schemes at this stage are mainly divided into two categories.
  • the first category is to design different decision or control tasks by combining multiple reinforcement learning methods. For example, in an end-to-end vehicle navigation control task, Y. Xiao et al. use a network with a discrete action space to make behavioral decisions for the vehicle (turn left, turn right, follow, etc.) and then switch between different continuous-action networks according to the selected behavior to output vehicle control commands.
  • L. Chen et al. also used a similar scheme. This method of combining neural networks is borrowed from hierarchical reinforcement learning. This processing method can often effectively improve the comprehensive performance of the DRL model, but the running speed will be slower.
  • the second category is to combine traditional control theory to correct the output of the decision value during the DRL training process. For example, Xie L et al. use a stochastic switch to randomly select the output action among a PID controller, an obstacle-avoidance (OA) controller and a DDPG decision maker, thereby alleviating the high-variance problem that troubles DDPG in complex real-world environments. This approach can often effectively increase the training speed of the neural network, but the neural network is a black-box system, and how to effectively integrate traditional control methods into the DRL method so as to improve the accuracy of autonomous vehicle navigation control remains a key problem to be solved.
  • the object of the present invention is to provide a method and system for controlling navigation of an autonomous driving vehicle, which can improve the accuracy of navigation control of an autonomous driving vehicle.
  • the present invention provides the following solutions:
  • a navigation control method for an autonomous driving vehicle comprising:
  • the vehicle and environment status data includes: point cloud data of obstacles within a set distance of the vehicle acquired by laser radar and position information of the vehicle acquired by inertial navigation system;
  • Navigation is performed using an optimized navigation model based on the vehicle and environment status data; the training process of the optimized navigation model is as follows:
  • a navigation control algorithm is used to determine the vehicle control amount;
  • the vehicle control amount includes: throttle action value, steering action value and brake action value;
  • the navigation control algorithm includes: DWA algorithm;
  • an Actor-Critic type DRL decision network is used to construct a navigation model;
  • the Actor-Critic type DRL decision network includes: a DRL Actor network and a DRL Critic network;
  • the DRL Actor network is used to output a first vehicle control quantity according to the vehicle and environment status data and the vehicle control quantity of the optimal path;
  • the DRL Critic network is used to output the vehicle control quantity determined by the navigation control algorithm and the expected benefit corresponding to the first vehicle control quantity according to the first vehicle control quantity, the vehicle control quantity determined by the navigation control algorithm, and the vehicle and environment status data;
  • the final control quantity is determined according to the expected benefits corresponding to the two groups of vehicle control quantities, and is output;
  • the navigation model is optimized by using a reward and punishment mechanism to determine the optimized navigation model.
  • the acquiring of vehicle and environment status data specifically includes:
  • the point cloud data of the obstacles is reprojected and encoded, with the vehicle as the center of the cylindrical surface, to obtain a two-dimensional surround view image;
  • the two-dimensional surround view image is subjected to lateral mean pooling and longitudinal max pooling to obtain a 1×60 state matrix; the state matrix is used to represent the distance from each angle of the vehicle body to the obstacles;
  • the determining the vehicle control amount by using a navigation control algorithm according to the point cloud data of the obstacle specifically includes:
  • the path planning algorithm is used to determine multiple candidate paths according to the current position and target position of the vehicle;
  • the optimal path is determined according to the position of the target point and the point cloud data of the obstacles;
  • the vehicle control amount is determined from the optimal path using the path tracking algorithm.
  • optimizing the navigation model by using a reward and punishment mechanism to determine the optimized navigation model specifically includes:
  • the reward function R is determined using a formula in which: r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed); r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise); V_angular_z is the z-axis angular velocity of the vehicle; λ1, λ2, λ3, λ4 are proportional coefficients; speed is the forward linear velocity of the vehicle; dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position; and dis_a is the angle between the global path and the vehicle heading.
  • An automatic driving vehicle navigation control system comprising:
  • a data acquisition module is used to acquire vehicle and environment status data;
  • the vehicle and environment status data includes: point cloud data of obstacles within a set distance of the vehicle acquired by using a laser radar and position information of the vehicle acquired by using an inertial navigation system;
  • the navigation module is used to perform navigation using an optimized navigation model according to vehicle and environment status data; the training process of the optimized navigation model is as follows:
  • a navigation control algorithm is used to determine the vehicle control amount;
  • the navigation control algorithm first uses a local path planning algorithm to determine the route to be driven, and then uses a path tracking algorithm to determine the vehicle control amount;
  • the vehicle control amount includes: throttle action value, steering action value and brake action value;
  • an Actor-Critic type DRL decision network is used to build a navigation model;
  • the Actor-Critic type DRL decision network includes: a DRL Actor network and a DRL Critic network;
  • the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path;
  • the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm, and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts, and is output;
  • the navigation model is optimized by using a reward and punishment mechanism to determine the optimized navigation model.
  • the data acquisition module specifically includes:
  • a two-dimensional surround view image determination unit is used to reproject and encode the point cloud data of obstacles with the vehicle as the center of the cylindrical surface to obtain a two-dimensional surround view image;
  • a state matrix determination unit is used to perform lateral mean pooling and longitudinal maximum value pooling on the two-dimensional surround view image to obtain a 1*60 state matrix; the state matrix is used to represent the distance from each angle of the vehicle body to the obstacle;
  • the angle determination unit is used to obtain the global path of the vehicle, and obtain the path point with the smallest distance value from the current position of the vehicle in the global path and the angle with the vehicle heading.
  • the determining the vehicle control amount by using a navigation control algorithm according to the point cloud data of the obstacle specifically includes:
  • the path planning algorithm is used to determine multiple paths to be driven according to the current position and target position of the vehicle;
  • the vehicle control amount is determined using the path tracking algorithm.
  • optimizing the navigation model by using a reward and punishment mechanism to determine the optimized navigation model specifically includes:
  • the present invention discloses the following technical effects:
  • the present invention provides an autonomous driving vehicle navigation control method and system, which integrates a traditional vehicle navigation control algorithm with an Actor-Critic type DRL decision network. Since the parameter adjustment method (algorithm) and network structure of the DRL algorithm are not involved in the process of guiding the training of the Actor-Critic type DRL decision network by the traditional vehicle navigation control algorithm, the characteristics and mechanisms of various Actor-Critic types of DRL algorithms will not be affected. That is, the convergence speed of the neural network can be greatly improved while retaining the original DRL algorithm characteristics, while reducing the convergence difficulty, and at the same time improving the actual performance of the DRL model, thereby improving the accuracy of the autonomous driving vehicle navigation control.
  • FIG1 is a schematic flow chart of a navigation control method for an autonomous driving vehicle provided by the present invention.
  • FIG2 is a schematic diagram of the overall flow of an autonomous driving vehicle navigation control method provided by the present invention.
  • Figure 3 is a schematic diagram of the global path (Global Path) that guides vehicle travel;
  • Figure 4 is a schematic diagram of the DRL Actor network structure;
  • Figure 5 is a schematic diagram of the DRL Critic network structure.
  • the purpose of the present invention is to provide an autonomous driving vehicle navigation control method and system, which can improve the accuracy of autonomous driving navigation control in DRL (Actor-Critic type) mode, increase the convergence speed of neural network training, and reduce the difficulty of neural network training convergence.
  • DRL (Actor-Critic type)
  • FIG. 1 is a schematic flow chart of a navigation control method for an autonomous driving vehicle provided by the present invention
  • FIG. 2 is a schematic flow chart of an overall navigation control method for an autonomous driving vehicle provided by the present invention.
  • a navigation control method for an autonomous driving vehicle provided by the present invention includes:
  • the vehicle and environment status data includes: point cloud data of obstacles within a set distance of the vehicle obtained by using a laser radar, and pose information of the vehicle obtained by using an inertial navigation system; the ground is filtered out of the obstacle point cloud output by the laser radar to obtain a point cloud matrix Φ containing only obstacles;
  • S101 specifically includes:
  • the point cloud data of the obstacles is reprojected and encoded, with the vehicle as the center of the cylindrical surface, to obtain a two-dimensional surround view image with depth information.
  • the two-dimensional surround view image is subjected to lateral mean pooling and longitudinal max pooling to obtain a 1×60 state matrix φ_d; the state matrix is used to represent the distance from each angle of the vehicle body to the obstacles.
  • the global path is planned in advance.
  • the global path consists of multiple equidistant path points.
  • the position deviation between the vehicle and the global path is determined based on the path point closest to the vehicle in the global path and the current position of the vehicle.
  • the angle between the vehicle's current orientation and the direction of the global path is determined based on the orientation of the line connecting the second and third path points after the path point closest to the vehicle in the global path.
  • the global path is the route from the starting point to the end point.
  • the method of obtaining it is: use Unity or the corresponding high-precision map drawing software to draw a route map of a certain area; after obtaining the regional map, according to the position of the starting point and the end point, according to the principle of shortest distance or shortest driving time, obtain an optimal route, which is the global path.
  • the point in the global path closest to the current position of the vehicle is c_0, the current waypoint.
  • the Euclidean distance between c_0 and the origin of the vehicle coordinate system is defined as dis_l.
  • the path points 2 m and 3 m after c_0 are defined as c_2 and c_3.
  • the angle between the vector direction from c_2 to c_3 and the vehicle heading is dis_a.
  • dis_l is positive when the vehicle is on the left side of the global path and negative when it is on the right side; dis_a is positive when the yaw deviates to the left and negative when it deviates to the right.
  • the data volume of each frame of LiDAR is as high as 100,000 to 1 million data points, which will generate a huge amount of calculation when used directly in neural networks.
  • the unnecessary point cloud is filtered out using the ground removal algorithm, and then the remaining point cloud (obstacle point cloud) is remapped into a one-dimensional array of 60 numbers, which can greatly reduce the amount of calculation.
  • each number represents the distance to the obstacle in every 3° interval around the vehicle body, which can effectively retain the necessary information.
  • the relationship between the vehicle and the global path is represented by only two values, dis_l and dis_a, and changes in these two values can be effectively mapped to the vehicle's action values: when the vehicle deviates to the left relative to the global path, the dis_l and dis_a values increase, and the fully connected layers of the DRL neural network then more easily increase the steer value, making the vehicle turn right, and vice versa.
  • This can effectively reduce the number of input parameters, reduce the number of neurons, and thus reduce the training time.
  • a navigation control algorithm is used to determine the vehicle control amount O_dwa of the optimal path; the vehicle control amount O_dwa includes: throttle action value, steering action value and brake action value; the navigation control algorithm is an algorithm that obtains the vehicle control amount in a non-self-learning manner, according to the current position, target position and surrounding environment of the vehicle, through local path planning and path tracking computations; the navigation control algorithm includes but is not limited to the DWA algorithm.
  • an Actor-Critic type DRL decision network is used to build a navigation model;
  • the Actor-Critic type DRL decision network includes: a DRL Actor network and a DRL Critic network;
  • the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path.
  • the DRL Critic network is used to output the expected benefits corresponding to the vehicle control amount determined by the traditional navigation control algorithm and the first vehicle control amount according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm, and the vehicle and environment state data; determine the final control amount according to the expected benefits corresponding to the two groups of vehicle control amounts, and output it;
  • the navigation model is optimized by using a reward and punishment mechanism to determine the optimized navigation model.
  • the method of determining the vehicle control amount by using a navigation control algorithm based on the point cloud data of the obstacle specifically includes:
  • the path planning algorithm is used to determine multiple paths to be driven according to the current position and target position of the vehicle;
  • α is the maximum steering angle of the vehicle; δ_i is negative for a left turn and positive for a right turn; n is the total number of drivable paths planned by the navigation control algorithm; i is an index value ranging from 1 to n.
  • the best path is selected as follows: a path point a certain distance after the current waypoint c_0 is chosen as the target point, Target = c_0 + R + V × k_target  (2), where R is the minimum turning radius of the vehicle, V is the forward speed of the vehicle and k_target is a proportional coefficient.
  • for each candidate trajectory the last trajectory point is defined as P_i and the front-center point of the vehicle as P_0; the angle θ_i = ∠P_i P_0 Target is computed, and the optimal path is Path_best = Path_j if θ_j = min(θ_i, i ∈ [1, n])  (3).
  • the vehicle control amount O_dwa is then determined from the optimal path; steer_dwa, throttle_dwa and brake_dwa respectively represent the steering value, throttle value and brake value of the vehicle control quantity output by the DWA algorithm.
  • the Actor-Critic type of DRL algorithm includes two types of neural networks.
  • One is the DRL Actor network, whose input parameter is the observation value S_drl used to characterize the current environment and state of the vehicle; the output is the action value set O_drl, where the action values include throttle, steering and brake, and the range of the action values is:
  • steer_drl, throttle_drl and brake_drl respectively represent the steering value, throttle value and brake value of the vehicle control quantity output by the DRL algorithm.
  • the other type is the DRL Critic network, whose input values include the observation value s representing the current environment and state of the vehicle and the action value a corresponding to the observation value s; the output value is the expected benefit J of the corresponding relationship [s, a].
  • the two types of neural network structures are shown in Figures 4 and 5 respectively.
  • the Actor network is used to calculate the action value a of vehicle control, so the goal of Actor network training is to select appropriate parameters for each neuron in the Actor network to complete the navigation control task of autonomous driving.
  • the Critic network is used to calculate the benefits of executing a in the current environment s, that is, to evaluate this set of actions, so the training goal of the Critic network is to correctly evaluate the mapping between s and a.
  • the basis for the correctness of the evaluation comes from the artificially set reward function, and the evaluation results will be used to optimize the parameters of the neural network.
  • the reward function is the "reward" (in line with expectations) or "punishment" (not in line with expectations or causing harm) for the action a performed by the vehicle in the current environment s.
  • the reward function R is determined using a formula in which: r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed); r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise); V_angular_z is the z-axis angular velocity of the vehicle; λ1, λ2, λ3, λ4 are proportional coefficients; speed is the forward linear velocity of the vehicle; dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position; and dis_a is the angle between the global path and the vehicle heading.
  • the vehicle action values O_dwa and O_drl are obtained from the DWA algorithm and the DRL module respectively. These two action values, together with the current state s, are each input into the Critic network to obtain two expected values J_dwa and J_drl. The selector then compares J_dwa and J_drl and outputs, as the vehicle action a, whichever of O_dwa and O_drl has the larger expected value. After action a is executed, the vehicle enters a new state S′_drl. The reward value R obtained by the action is then calculated, and the parameters (S_td3, a, S′_td3, R) recorded before and after each action is executed are stored for adjustment of the neural network parameters.
  • the vehicle actions are almost random due to the initial parameters of the Actor network, and the vehicle control effect is bound to be very poor.
  • the actions output by the DWA algorithm are manually set, and the control effect is much better than the DRL algorithm in the early stage.
  • the initial parameters of the Critic network are random, so the action selection for the DWA algorithm and the DRL module is almost random, so O dwa and O drl will be randomly selected for action output.
  • the control effect of the whole algorithm is better. Better control effect means that the vehicle is easier to obtain rewards during the training process, thereby solving the problem that the DRL algorithm is difficult to converge and converges slowly in the early stage due to sparse rewards.
  • in the later stage of training, the probability that the result of the DWA algorithm is selected becomes smaller and smaller.
  • the DWA algorithm will not affect the final effect of the DRL module, but only provide assistance to DRL in the early stage.
  • the parameter adjustment method (algorithm) and network structure of the DRL algorithm are not involved in the process of guiding the training of DRL by the DWA algorithm, it will not affect the characteristics and mechanisms of various Actor-Critic DRL algorithms. That is, it can greatly improve the convergence speed of the neural network while retaining the characteristics of the original DRL algorithm, while reducing the difficulty of convergence.
  • the present invention further provides an automatic driving vehicle navigation control system, comprising:
  • a data acquisition module is used to acquire vehicle and environment status data;
  • the vehicle and environment status data includes: point cloud data of obstacles within a set distance of the vehicle acquired by using a laser radar and position information of the vehicle acquired by using an inertial navigation system;
  • the navigation module is used to perform navigation using an optimized navigation model according to vehicle and environment status data; the training process of the optimized navigation model is as follows:
  • a navigation control algorithm is used to determine the vehicle control amount;
  • the vehicle control amount includes: throttle action value, steering action value and brake action value;
  • the navigation control algorithm includes: DWA algorithm;
  • an Actor-Critic type DRL decision network is used to construct a navigation model;
  • the Actor-Critic type DRL decision network includes: a DRL Actor network and a DRL Critic network;
  • the DRL Actor network is used to output a first vehicle control quantity according to the vehicle and environment status data and the vehicle control quantity of the optimal path;
  • the DRL Critic network is used to output the vehicle control quantity determined by the navigation control algorithm and the expected benefit corresponding to the first vehicle control quantity according to the first vehicle control quantity, the vehicle control quantity determined by the navigation control algorithm, and the vehicle and environment status data;
  • the final control quantity is determined according to the expected benefits corresponding to the two groups of vehicle control quantities, and is output;
  • the navigation model is optimized by using a reward and punishment mechanism to determine the optimized navigation model.
  • the data acquisition module specifically includes:
  • a two-dimensional surround view image determination unit is used to reproject and encode the point cloud data of obstacles with the vehicle as the center of the cylindrical surface to obtain a two-dimensional surround view image;
  • a state matrix determination unit is used to perform lateral mean pooling and longitudinal maximum value pooling on the two-dimensional surround view image to obtain a 1*60 state matrix; the state matrix is used to represent the distance from each angle of the vehicle body to the obstacle;
  • the angle determination unit is used to obtain the global path of the vehicle, and obtain the path point with the smallest distance value from the current position of the vehicle in the global path and the angle with the vehicle heading.
  • the method of determining the vehicle control amount of the optimal path by using a navigation control algorithm based on the point cloud data of the obstacle specifically includes:
  • the path planning algorithm determines multiple paths to be driven
  • the vehicle control amount is determined using the path tracking algorithm.
  • the optimizing of the navigation model by using the reward and punishment mechanism to determine the optimized navigation model specifically includes: determining the reward function R using a formula in which r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed); r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise); V_angular_z is the z-axis angular velocity of the vehicle; λ1, λ2, λ3, λ4 are proportional coefficients; speed is the forward linear velocity of the vehicle; dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position; and dis_a is the angle between the global path and the vehicle heading.
  • each embodiment is described in a progressive manner, and each embodiment focuses on the differences from other embodiments.
  • the same or similar parts between the embodiments can be referred to each other.
  • since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and for the relevant parts reference can be made to the description of the method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Navigation (AREA)

Abstract

The present invention relates to an autonomous driving vehicle navigation control method and system. The method comprises performing navigation with an optimized navigation model according to vehicle and environment status data. The training process is: determining a vehicle control amount with a navigation control algorithm according to the point cloud data of obstacles; constructing a navigation model with a DRL decision network according to the vehicle control amount and the vehicle and environment status data, the DRL decision network comprising a DRL Actor network that outputs a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount, and a DRL Critic network that outputs the expected benefits corresponding to the two sets of vehicle control amounts according to the first vehicle control amount, the vehicle control amount and the vehicle and environment status data, determines the final control amount and outputs it; and optimizing the navigation model by means of a reward and punishment mechanism. The present invention can improve the accuracy of DRL-based autonomous driving navigation control, increase the convergence speed of neural network training, and reduce the difficulty of training convergence.

Description

Autonomous driving vehicle navigation control method and system
This application claims priority to Chinese patent application No. 202211322372.7, filed with the Chinese Patent Office on October 27, 2022 and entitled "Autonomous driving vehicle navigation control method and system", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of autonomous driving, and in particular to an autonomous driving vehicle navigation control method and system.
Background Art
With the development of autonomous vehicles, researchers have found that deep reinforcement learning (DRL) methods can, through unsupervised learning, directly convert road network data and various kinds of sensor information into decision values and control commands. This property has led to DRL being widely applied in key autonomous driving scenarios such as path planning systems, behavioral decision systems and navigation control systems. A DRL-based vehicle navigation control system requires the agent to output vehicle control commands directly from the data of the perception system; at the same time, the agent's decision values must satisfy vehicle dynamics requirements, and the agent must be able to handle all situations in the operating scenario completely independently. It is an end-to-end vehicle control method in which a deep neural network acts as the "brain" of the intelligent vehicle and controls the vehicle's motion according to the state of the environment. Researchers expect that, through extensive training and learning of the deep neural network, the intelligent vehicle can achieve human-like driving performance.
In the traditional DRL neural network training process, a reward and punishment mechanism is used to optimize the neural network. That is, the agent observes a multi-dimensional state S and computes an action value A from the weights and biases of the neural network; after the agent executes this action it enters the next state S′. If S′ is a relatively desirable state, a "reward" is obtained; otherwise a "punishment" is received. Whether the neural network parameters mapping S to A are appropriate is judged from the reward or punishment, and if they are not, the parameters are adjusted according to the reward and punishment values. In the early stage of training, because the mapping from S to A is almost random, the probability of obtaining a "reward" while the agent is running is very small. This sparse reward largely causes the training of the neural network to be slow and makes it difficult to converge to a desirable state.
The agent observes the environment with sensors to obtain the multi-dimensional state S; the commonly used sensors are cameras and lidar. A camera acquires planar data and cannot observe depth information; depth can be computed with a binocular camera, but this approach has a large error; a lidar, by contrast, can directly observe accurate three-dimensional information about the environment. Whether it is image data obtained by a camera or point cloud data obtained by a lidar, the data volume is too large, and preprocessing is needed to reduce the dimension of the state S as much as possible.
Because of the complexity of the autonomous vehicle navigation control task, the neural network used for this task has many neuron parameters to adjust, so training the network is very time-consuming, usually requiring hundreds of thousands or even millions of training iterations. To address the low exploration efficiency of such traditional DRL methods and their difficulty in converging quickly to a desirable state, current optimization schemes fall into two main categories. The first category designs different decision or control tasks by combining several reinforcement learning methods. For example, in an end-to-end vehicle navigation control task, Y. Xiao et al. use a network with a discrete action space to make behavioral decisions for the vehicle (turn left, turn right, follow, etc.) and then switch between different continuous-action networks according to the selected behavior to output vehicle control commands. L. Chen et al. use a similar scheme. This way of combining neural networks is borrowed from hierarchical reinforcement learning and can often effectively improve the overall performance of a DRL model, but the running speed becomes slower. The second category combines traditional control theory to correct the output decision values during DRL training. For example, Xie L et al. use a stochastic switch to randomly select the output action among a PID controller, an obstacle-avoidance (OA) controller and a DDPG decision maker, thereby alleviating the high-variance problem that troubles DDPG when applied to complex real-world environments. This approach can often effectively increase the training speed of the neural network, but the neural network is a black-box system, and how to effectively integrate traditional control methods into the DRL method so as to improve the accuracy of autonomous vehicle navigation control remains a key problem to be solved.
Summary of the Invention
The object of the present invention is to provide an autonomous driving vehicle navigation control method and system that can improve the accuracy of autonomous vehicle navigation control.
To achieve the above object, the present invention provides the following solutions:
An autonomous driving vehicle navigation control method, comprising:
acquiring vehicle and environment status data, the vehicle and environment status data comprising: point cloud data of obstacles within a set distance of the vehicle acquired with a lidar, and pose information of the vehicle acquired with an inertial navigation unit;
performing navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
determining a vehicle control amount with a navigation control algorithm according to the point cloud data of the obstacles, the vehicle control amount comprising a throttle action value, a steering action value and a brake action value, and the navigation control algorithm comprising the DWA algorithm;
constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the Actor-Critic type DRL decision network comprising a DRL Actor network and a DRL Critic network, wherein the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path, and the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts and is output;
optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
Optionally, acquiring the vehicle and environment status data specifically comprises:
re-projecting and encoding the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image;
applying lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix, the state matrix being used to represent the distance from each angle of the vehicle body to the obstacles;
acquiring the global path of the vehicle, and obtaining the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
Optionally, determining the vehicle control amount with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
determining the optimal path according to the position of the target point and the point cloud data of the obstacles;
determining the vehicle control amount with a path tracking algorithm according to the optimal path.
Optionally, optimizing the navigation model with the reward and punishment mechanism to obtain the optimized navigation model specifically comprises:
determining the reward function R using the formula, wherein:
r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
An autonomous driving vehicle navigation control system, comprising:
a data acquisition module, configured to acquire vehicle and environment status data, the vehicle and environment status data comprising: point cloud data of obstacles within a set distance of the vehicle acquired with a lidar, and pose information of the vehicle acquired with an inertial navigation unit;
a navigation module, configured to perform navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
determining a vehicle control amount with a navigation control algorithm according to the point cloud data of the obstacles, the navigation control algorithm first determining the route to be driven with a local path planning algorithm and then determining the vehicle control amount with a path tracking algorithm, and the vehicle control amount comprising a throttle action value, a steering action value and a brake action value;
constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the Actor-Critic type DRL decision network comprising a DRL Actor network and a DRL Critic network, wherein the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path, and the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts and is output;
optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
Optionally, the data acquisition module specifically comprises:
a two-dimensional surround-view image determination unit, configured to re-project and encode the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image;
a state matrix determination unit, configured to apply lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix, the state matrix being used to represent the distance from each angle of the vehicle body to the obstacles;
an angle determination unit, configured to acquire the global path of the vehicle, and to obtain the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
Optionally, determining the vehicle control amount with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
determining the optimal path according to the position of the target point and the point cloud data of the obstacles;
determining the vehicle control amount with a path tracking algorithm according to the optimal path.
Optionally, optimizing the navigation model with the reward and punishment mechanism to obtain the optimized navigation model specifically comprises:
determining the reward function R using the formula, wherein:
r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:
The autonomous driving vehicle navigation control method and system provided by the present invention fuse a traditional vehicle navigation control algorithm with an Actor-Critic type DRL decision network. Because the process in which the traditional vehicle navigation control algorithm guides the training of the Actor-Critic type DRL decision network does not involve the parameter adjustment method (algorithm) or the network structure of the DRL algorithm, it does not affect the characteristics or mechanisms of the various Actor-Critic DRL algorithms; that is, the convergence speed of the neural network can be greatly increased and the convergence difficulty reduced while the characteristics of the original DRL algorithm are retained, and the actual performance of the DRL model is improved at the same time, thereby improving the accuracy of autonomous vehicle navigation control.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the autonomous driving vehicle navigation control method provided by the present invention;
FIG. 2 is a schematic diagram of the overall flow of the autonomous driving vehicle navigation control method provided by the present invention;
FIG. 3 is a schematic diagram of the global path (Global Path) that guides vehicle travel;
FIG. 4 is a schematic diagram of the DRL Actor network structure;
FIG. 5 is a schematic diagram of the DRL Critic network structure.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The object of the present invention is to provide an autonomous driving vehicle navigation control method and system that can improve the accuracy of DRL-based (Actor-Critic type) autonomous driving navigation control, increase the convergence speed of neural network training, and reduce the difficulty of training convergence.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
FIG. 1 is a schematic flow chart of the autonomous driving vehicle navigation control method provided by the present invention, and FIG. 2 is a schematic diagram of its overall flow. As shown in FIG. 1 and FIG. 2, the autonomous driving vehicle navigation control method provided by the present invention comprises:
S101: acquiring vehicle and environment status data, the vehicle and environment status data comprising point cloud data of obstacles within a set distance of the vehicle acquired with a lidar and pose information of the vehicle acquired with an inertial navigation unit; the ground is filtered out of the obstacle point cloud output by the lidar to obtain a point cloud matrix Φ containing only obstacles.
S101 specifically comprises:
re-projecting and encoding the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image carrying depth information;
applying lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix φ_d, the state matrix representing the distance from each angle of the vehicle body to the obstacles;
acquiring the global path of the vehicle, and obtaining the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
As shown in FIG. 3, the global path (Global Path) is planned in advance and consists of multiple equidistant path points. The position deviation between the vehicle and the global path is determined from the path point in the global path closest to the vehicle and the current position of the vehicle; the angle between the vehicle's current heading and the direction of the global path is determined from the orientation of the line connecting the second and third path points after the path point closest to the vehicle. The global path is the route from the start point to the end point; it is obtained as follows: a route map of a certain area is drawn with Unity or corresponding high-precision map drawing software, and after the regional map is obtained, an optimal route is selected from the start point to the end point according to the shortest-distance or shortest-travel-time principle; this route is the global path.
Among the global path points, the point closest to the current position of the vehicle is c_0, the current waypoint; the Euclidean distance between c_0 and the origin of the vehicle coordinate system is defined as dis_l. The path points 2 m and 3 m after c_0 are defined as c_2 and c_3, and the angle between the vector direction from c_2 to c_3 and the vehicle heading is dis_a. dis_l is positive when the vehicle is on the left side of the global path and negative when it is on the right side; dis_a is positive when the yaw deviates to the left and negative when it deviates to the right. The advantage of this design is that when the vehicle drifts to the left relative to the global path, the values of dis_l and dis_a increase, and the fully connected layers of the DRL network then more easily increase the steer value, making the vehicle turn right. A sketch of this computation is given below.
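For illustration, a minimal Python sketch of the dis_l / dis_a computation follows. The waypoint-spacing parameter, the angle-wrapping convention and the function name are assumptions made for this sketch, not details from the original.

```python
import numpy as np

def path_deviation(global_path, veh_xy, veh_yaw, spacing=1.0):
    """Compute (dis_l, dis_a) from an (M, 2) array of equally spaced global path
    waypoints, the vehicle position (2,) and the vehicle yaw, all in the world frame.
    Sign conventions follow the description: dis_l > 0 when the vehicle is on the
    left side of the path, dis_a > 0 when the yaw deviates to the left."""
    diffs = global_path - veh_xy
    i0 = int(np.argmin(np.linalg.norm(diffs, axis=1)))      # current waypoint c0
    dis_l = float(np.linalg.norm(diffs[i0]))
    # Waypoints roughly 2 m and 3 m after c0 (waypoint spacing assumed known).
    i2 = min(i0 + int(round(2.0 / spacing)), len(global_path) - 1)
    i3 = min(i0 + int(round(3.0 / spacing)), len(global_path) - 1)
    d = global_path[i3] - global_path[i2]
    path_dir = np.arctan2(d[1], d[0])                       # direction of c2 -> c3
    # Signed heading difference, wrapped to [-pi, pi); positive = yawed to the left.
    dis_a = (veh_yaw - path_dir + np.pi) % (2.0 * np.pi) - np.pi
    # dis_l sign: positive if the vehicle lies to the left of the path direction.
    to_vehicle = veh_xy - global_path[i0]
    side = np.sign(d[0] * to_vehicle[1] - d[1] * to_vehicle[0])
    return side * dis_l, dis_a
```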
Each lidar frame contains as many as 100,000 to 1,000,000 data points, which would create an enormous amount of computation if used directly in a neural network. The unnecessary points are first removed with a ground-removal algorithm, and the remaining (obstacle) point cloud is then remapped into a one-dimensional array of 60 numbers, which greatly reduces the amount of computation. At the same time, each number represents the distance to the obstacles in a 3° interval around the vehicle body, which effectively retains the necessary information. A sketch of this reduction is given below.
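For illustration, a minimal numpy sketch of this reduction follows. It collapses the obstacle point cloud directly into per-direction nearest-obstacle distances rather than building the intermediate surround-view image, and the maximum range and full-circle binning are assumptions of this sketch.

```python
import numpy as np

def point_cloud_to_state(obstacle_points, n_bins=60, max_range=30.0):
    """Reduce a ground-filtered obstacle point cloud (N x 3, vehicle frame)
    to a 1 x 60 vector of per-direction obstacle distances."""
    state = np.full(n_bins, max_range, dtype=np.float32)
    if len(obstacle_points) == 0:
        return state                                   # no obstacle in range
    angles = np.degrees(np.arctan2(obstacle_points[:, 1],
                                   obstacle_points[:, 0])) % 360.0
    dists = np.linalg.norm(obstacle_points[:, :2], axis=1)
    bins = (angles / (360.0 / n_bins)).astype(int) % n_bins
    for b, dist in zip(bins, dists):
        state[b] = min(state[b], dist)                 # nearest obstacle per angular bin
    return state
```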
The relationship between the vehicle and the global path is represented by only two values, dis_l and dis_a, and changes in these two values map effectively onto the vehicle's action values: when the vehicle drifts to the left relative to the global path, dis_l and dis_a increase, and the fully connected layers of the DRL neural network then more easily increase the steer value, making the vehicle turn right, and vice versa. This effectively reduces the number of input parameters and the number of neurons, and thus shortens the training time.
S102: performing navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
determining the vehicle control amount O_dwa of the optimal path with a navigation control algorithm according to the point cloud data of the obstacles, O_dwa comprising a throttle action value, a steering action value and a brake action value. The navigation control algorithm is an algorithm that obtains the vehicle control amount in a non-self-learning manner from the current position, the target position and the surrounding environment of the vehicle, through local path planning and path tracking computations; it includes but is not limited to the DWA algorithm.
constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data; the Actor-Critic type DRL decision network comprises a DRL Actor network and a DRL Critic network; the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path; the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the traditional navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts and is output;
optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
Taking the improved DWA algorithm as an example, determining the vehicle control amount with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
determining the steering angle δ_i of each candidate path using the formula,
where α is the maximum steering angle of the vehicle, δ_i is negative for a left turn and positive for a right turn, n is the total number of drivable paths planned by the navigation control algorithm, and i is an index from 1 to n;
determining the optimal path Path_best according to the position of the target point and the point cloud data of the obstacles.
Path_best is selected as follows:
a path point a certain distance after the vehicle's current waypoint c_0 is selected as the target point Target:
Target = c_0 + R + V × k_target      (2)
where R is the minimum turning radius of the vehicle, V is the forward speed of the vehicle, and k_target is a proportional coefficient.
For each candidate trajectory, the last trajectory point is defined as P_i, and the front-center point of the vehicle is selected as P_0;
the angle θ_i = ∠P_i P_0 Target is then computed, and the optimal path Path_best is selected as
Path_best = Path_j  if  θ_j = min(θ_i, i ∈ [1, n])      (3)
The vehicle control amount O_dwa is determined from the optimal path using the following formula.
The vehicle control amount O_dwa is:
steer_dwa, throttle_dwa and brake_dwa respectively denote the steering value, throttle value and brake value of the vehicle control amount output by the DWA algorithm. A minimal sketch of this target selection and best-path choice is given below.
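For illustration, a minimal Python sketch of the target selection of formula (2) and the best-path choice of formula (3) follows. The interpretation of formula (2) as an arc-length look-ahead along the global path, as well as the function and parameter names, are assumptions of this sketch.

```python
import numpy as np

def choose_target(global_path, i0, r_min, speed, k_target, spacing=1.0):
    """Interpret formula (2) as a look-ahead of (R + V * k_target) metres along the
    global path from the current waypoint c0 and return that waypoint as Target."""
    steps = int(round((r_min + speed * k_target) / spacing))
    return global_path[min(i0 + steps, len(global_path) - 1)]

def select_best_path(candidate_paths, p0, target):
    """Formula (3): return the index of the candidate trajectory whose last point P_i
    subtends the smallest angle theta_i = angle(P_i, P0, Target)."""
    def angle_to_target(p_end):
        v1, v2 = p_end - p0, target - p0
        cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return int(np.argmin([angle_to_target(path[-1]) for path in candidate_paths]))
```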
All states required by the DWA algorithm are denoted S_dwa, and all states required by the Actor-Critic type DRL decision network are denoted S_drl, where speed is the forward linear velocity of the vehicle.
The Actor-Critic type DRL algorithm contains two kinds of neural networks. One is the DRL Actor network, whose input is the observation S_drl characterizing the current environment and state of the vehicle and whose output is the action value set O_drl; the action values here comprise throttle, steering and brake, with the ranges of the action values being:
where steer_drl, throttle_drl and brake_drl respectively denote the steering value, throttle value and brake value of the vehicle control amount output by the DRL algorithm.
The other kind is the DRL Critic network, whose inputs are the observation s characterizing the current environment and state of the vehicle and the action value a corresponding to s, and whose output is the expected benefit J of the pair [s, a]. The structures of the two kinds of neural network are shown in FIG. 4 and FIG. 5 respectively.
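For illustration, a minimal PyTorch sketch of the two network interfaces described here follows. The hidden-layer sizes and the output ranges ([-1, 1] for steering, [0, 1] for throttle and brake) are assumptions of this sketch; the actual layer structure is defined by FIG. 4 and FIG. 5.

```python
import torch
import torch.nn as nn

class DRLActor(nn.Module):
    """Maps the observation S_drl to (steer, throttle, brake)."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, s):
        steer, throttle, brake = self.body(s).unbind(dim=-1)
        # Squash each action into its (assumed) range.
        return torch.stack(
            [torch.tanh(steer), torch.sigmoid(throttle), torch.sigmoid(brake)], dim=-1)

class DRLCritic(nn.Module):
    """Maps a state-action pair (s, a) to the expected benefit J."""
    def __init__(self, obs_dim, act_dim=3, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.body(torch.cat([s, a], dim=-1)).squeeze(-1)
```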
In an Actor-Critic type DRL algorithm, the Actor network computes the vehicle-control action value a, so the goal of Actor training is to choose suitable parameters for each neuron of the Actor network so that the navigation control task of autonomous driving can be completed. The Critic network computes the benefit of executing a in the current environment s, that is, it evaluates this set of actions, so the training goal of the Critic network is to evaluate the mapping between s and a correctly. The basis for judging whether an evaluation is correct comes from the manually designed reward function, and the evaluation results are used to optimize the parameters of the neural network.
The reward function rewards (meets expectations) or punishes (does not meet expectations or causes harm) the action a executed by the vehicle in the current environment s.
The reward function R is determined using the formula, wherein:
r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
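For illustration only, one plausible realization of this reward in Python is sketched below. The exact combination of the terms is given by the formula referred to above, so the signs and numeric weights used here are assumptions of this sketch rather than the values of the original.

```python
def reward(done, over, speed, v_angular_z, dis_l, dis_a,
           r_done=100.0, r_over=-100.0, lambdas=(0.5, 0.1, 0.2, 0.2)):
    """A plausible reward R built from the terms listed above (assumed weighting):
    reward forward speed, penalize yaw rate and deviation from the global path."""
    lam1, lam2, lam3, lam4 = lambdas
    r = lam1 * speed - lam2 * abs(v_angular_z) - lam3 * abs(dis_l) - lam4 * abs(dis_a)
    if done:      # task completed
        r += r_done
    if over:      # collision, or deviation from the global path beyond the set value
        r += r_over
    return r
```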
As shown in FIG. 2, the vehicle action values O_dwa and O_drl are obtained from the DWA algorithm and the DRL module respectively. These two action values, together with the current state s, are each fed into the Critic network, yielding two expected values J_dwa and J_drl. A selector then compares J_dwa and J_drl and outputs, as the vehicle action a, whichever of O_dwa and O_drl has the larger expected value. After action a is executed, the vehicle enters a new state S′_drl. The reward value R obtained by the action is then computed, and the tuple (S_td3, a, S′_td3, R) recorded before and after each action is stored for adjusting the neural network parameters.
Training is repeated many times until the vehicle's behaviour converges. A minimal sketch of this selector is given below.
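For illustration, a minimal sketch of the selector follows. It assumes the Critic interface sketched earlier and unbatched torch tensors for the state and the two candidate actions; these interface details are assumptions of the sketch.

```python
import torch

def select_action(critic, s, o_dwa, o_drl):
    """Selector of FIG. 2: evaluate both candidate actions with the Critic and
    execute whichever has the larger expected benefit."""
    with torch.no_grad():
        j_dwa = critic(s, o_dwa)
        j_drl = critic(s, o_drl)
    return o_dwa if j_dwa >= j_drl else o_drl

# After executing the chosen action a and observing the new state and reward,
# the transition is stored for later parameter updates, e.g.:
# replay_buffer.append((s, a, s_next, r))
```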
In the early stage of DRL training, the vehicle's actions are almost random because of the initial parameters of the Actor network, so the vehicle control performance is bound to be poor. The actions output by the DWA algorithm are hand-designed, and in this early stage their control performance is far better than that of the DRL algorithm. Since the initial parameters of the Critic network are also random, the choice between the DWA action and the DRL action is almost random, so O_dwa and O_drl are selected essentially at random for output. Compared with the DRL algorithm alone, the control performance of the whole scheme is therefore better. Better control performance means that the vehicle obtains rewards more easily during training, which resolves the slow and difficult early convergence of the DRL algorithm caused by sparse rewards.
In the later stage of DRL training, as the network parameters are gradually refined, the probability that the result of the DWA algorithm is selected becomes smaller and smaller; in this mode the DWA algorithm does not affect the final performance of the DRL module and only assists the DRL in the early stage.
Because the process in which the DWA algorithm guides the training of the DRL does not involve the parameter adjustment method (algorithm) or the network structure of the DRL algorithm, it does not affect the characteristics or mechanisms of the various Actor-Critic DRL algorithms; that is, the convergence speed of the neural network can be greatly increased, and the convergence difficulty reduced, while the characteristics of the original DRL algorithm are retained.
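Putting the pieces together, one training episode of the combined scheme might look like the sketch below. It reuses the selector and network sketches above; the environment interface (reset/step/point_cloud/pose), the dwa_controller callable and the update routine (any Actor-Critic update, e.g. TD3 or DDPG) are assumptions made for illustration.

```python
import torch

def train_episode(env, actor, critic, dwa_controller, replay_buffer, update_fn):
    """One training episode: DWA-guided action selection plus DRL parameter updates."""
    s = env.reset()
    done = False
    while not done:
        s_t = torch.as_tensor(s, dtype=torch.float32)
        o_drl = actor(s_t)                                           # DRL proposal
        o_dwa = torch.as_tensor(dwa_controller(env.point_cloud(), env.pose()),
                                dtype=torch.float32)                 # classical proposal
        a = select_action(critic, s_t, o_dwa, o_drl)                 # Critic-based selector
        s_next, r, done, _ = env.step(a.detach().numpy())
        replay_buffer.append((s, a.detach().numpy(), s_next, r))     # (S, a, S', R)
        update_fn(actor, critic, replay_buffer)                      # DRL parameter update
        s = s_next
```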
Corresponding to the above embodiment, the present invention further provides an autonomous driving vehicle navigation control system, comprising:
a data acquisition module, configured to acquire vehicle and environment status data, the vehicle and environment status data comprising: point cloud data of obstacles within a set distance of the vehicle acquired with a lidar, and pose information of the vehicle acquired with an inertial navigation unit;
a navigation module, configured to perform navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
determining a vehicle control amount with a navigation control algorithm according to the point cloud data of the obstacles, the vehicle control amount comprising a throttle action value, a steering action value and a brake action value, and the navigation control algorithm comprising the DWA algorithm;
constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the Actor-Critic type DRL decision network comprising a DRL Actor network and a DRL Critic network, wherein the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path, and the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts and is output;
optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
The data acquisition module specifically comprises:
a two-dimensional surround-view image determination unit, configured to re-project and encode the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image;
a state matrix determination unit, configured to apply lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix, the state matrix being used to represent the distance from each angle of the vehicle body to the obstacles;
an angle determination unit, configured to acquire the global path of the vehicle, and to obtain the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
Determining the vehicle control amount of the optimal path with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
determining the optimal path according to the position of the target point and the point cloud data of the obstacles;
determining the vehicle control amount with a path tracking algorithm according to the optimal path.
Optimizing the navigation model with the reward and punishment mechanism to obtain the optimized navigation model specifically comprises:
determining the reward function R using the formula, wherein:
r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. For the system disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple, and for relevant parts reference may be made to the description of the method.
Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In conclusion, the content of this specification should not be construed as limiting the present invention.

Claims (8)

  1. An autonomous driving vehicle navigation control method, characterized by comprising:
    acquiring vehicle and environment status data, the vehicle and environment status data comprising: point cloud data of obstacles within a set distance of the vehicle acquired with a lidar, and pose information of the vehicle acquired with an inertial navigation unit;
    performing navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
    determining a vehicle control amount with a navigation control algorithm according to the point cloud data of the obstacles, the vehicle control amount comprising a throttle action value, a steering action value and a brake action value;
    constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the Actor-Critic type DRL decision network comprising a DRL Actor network and a DRL Critic network, wherein the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path, and the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; determining the final control amount according to the expected benefits corresponding to the two sets of vehicle control amounts, and outputting it;
    optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
  2. The autonomous driving vehicle navigation control method according to claim 1, characterized in that acquiring the vehicle and environment status data specifically comprises:
    re-projecting and encoding the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image;
    applying lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix, the state matrix being used to represent the distance from each angle of the vehicle body to the obstacles;
    acquiring the global path of the vehicle, and obtaining the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
  3. The autonomous driving vehicle navigation control method according to claim 1, characterized in that determining the vehicle control amount with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
    with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
    determining the optimal path according to the position of the target point and the point cloud data of the obstacles;
    determining the vehicle control amount with a path tracking algorithm according to the optimal path.
  4. The autonomous driving vehicle navigation control method according to claim 2, characterized in that optimizing the navigation model with the reward and punishment mechanism to obtain the optimized navigation model specifically comprises:
    determining the reward function R using the formula, wherein:
    r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
  5. An autonomous driving vehicle navigation control system, characterized by comprising:
    a data acquisition module, configured to acquire vehicle and environment status data, the vehicle and environment status data comprising: point cloud data of obstacles within a set distance of the vehicle acquired with a lidar, and pose information of the vehicle acquired with an inertial navigation unit;
    a navigation module, configured to perform navigation with an optimized navigation model according to the vehicle and environment status data, the training process of the optimized navigation model being as follows:
    determining a vehicle control amount with a navigation control algorithm according to the point cloud data of the obstacles, the vehicle control amount comprising a throttle action value, a steering action value and a brake action value;
    constructing a navigation model with an Actor-Critic type DRL decision network according to the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the Actor-Critic type DRL decision network comprising a DRL Actor network and a DRL Critic network, wherein the DRL Actor network is used to output a first vehicle control amount according to the vehicle and environment status data and the vehicle control amount of the optimal path, and the DRL Critic network is used to output, according to the first vehicle control amount, the vehicle control amount determined by the navigation control algorithm and the vehicle and environment status data, the expected benefits corresponding to the vehicle control amount determined by the navigation control algorithm and to the first vehicle control amount; the final control amount is determined according to the expected benefits corresponding to the two sets of vehicle control amounts and is output;
    optimizing the navigation model with a reward and punishment mechanism to obtain the optimized navigation model.
  6. The autonomous driving vehicle navigation control system according to claim 5, characterized in that the data acquisition module specifically comprises:
    a two-dimensional surround-view image determination unit, configured to re-project and encode the point cloud data of the obstacles with the vehicle as the center of a cylindrical surface to obtain a two-dimensional surround-view image;
    a state matrix determination unit, configured to apply lateral mean pooling and longitudinal max pooling to the two-dimensional surround-view image to obtain a 1×60 state matrix, the state matrix being used to represent the distance from each angle of the vehicle body to the obstacles;
    an angle determination unit, configured to acquire the global path of the vehicle, and to obtain the path point in the global path with the smallest distance to the current position of the vehicle as well as the angle to the vehicle heading.
  7. The autonomous driving vehicle navigation control system according to claim 5, characterized in that determining the vehicle control amount with the navigation control algorithm according to the point cloud data of the obstacles specifically comprises:
    with the current position of the vehicle as the origin, determining a plurality of candidate paths with a path planning algorithm according to the current pose and the target pose of the vehicle;
    determining the optimal path according to the position of the target point and the point cloud data of the obstacles;
    determining the vehicle control amount with a path tracking algorithm according to the optimal path.
  8. The autonomous driving vehicle navigation control system according to claim 6, characterized in that optimizing the navigation model with the reward and punishment mechanism to obtain the optimized navigation model specifically comprises:
    determining the reward function R using the formula, wherein:
    r_done is the reward value obtained for completing the task (r_done = 0 while the task is not completed), r_over is the penalty value when a collision occurs or the deviation from the global path exceeds the set value (r_over = 0 otherwise), V_angular_z is the z-axis angular velocity of the vehicle, λ1, λ2, λ3, λ4 are proportional coefficients, speed is the forward linear velocity of the vehicle, dis_l is the Euclidean distance between the vehicle and the path point in the global path with the smallest distance to the vehicle's current position, and dis_a is the angle between the global path and the vehicle heading.
PCT/CN2023/100154 2022-10-27 2023-06-14 Autonomous driving vehicle navigation control method and system WO2024087654A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211322372.7 2022-10-27
CN202211322372.7A CN115494849A (zh) 2022-10-27 2022-10-27 Autonomous driving vehicle navigation control method and system

Publications (1)

Publication Number Publication Date
WO2024087654A1 true WO2024087654A1 (zh) 2024-05-02

Family

ID=85114976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100154 WO2024087654A1 (zh) 2022-10-27 2023-06-14 Autonomous driving vehicle navigation control method and system

Country Status (2)

Country Link
CN (1) CN115494849A (zh)
WO (1) WO2024087654A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115494849A (zh) * 2022-10-27 2022-12-20 中国科学院电工研究所 Autonomous driving vehicle navigation control method and system
CN116279457B * 2023-05-15 2023-08-01 北京斯年智驾科技有限公司 Anti-collision method, apparatus, device and storage medium based on radar point cloud

Citations (5)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190113919A1 (en) * 2017-10-18 2019-04-18 Luminar Technologies, Inc. Controlling an autonomous vehicle using smart control architecture selection
CN109991987A (zh) * 2019-04-29 2019-07-09 北京智行者科技有限公司 Automatic driving decision-making method and device
CN113335291A (zh) * 2021-07-27 2021-09-03 燕山大学 Human-machine shared driving control authority decision method based on human-vehicle risk state
CN114355897A (zh) * 2021-12-15 2022-04-15 同济大学 Vehicle path tracking control method based on hybrid switching between a model and reinforcement learning
CN115494849A (zh) * 2022-10-27 2022-12-20 中国科学院电工研究所 Autonomous driving vehicle navigation control method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LINHAI XIE ET AL.: "Learning With Stochastic Guidance for Robot Navigation", IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, vol. 32, no. 1, 31 January 2021 (2021-01-31), XP011829589, ISSN: 2162-237X, DOI: 10.1109/TNNLS.2020.2977924 *

Also Published As

Publication number Publication date
CN115494849A (zh) 2022-12-20

Similar Documents

Publication Publication Date Title
WO2024087654A1 (zh) Autonomous driving vehicle navigation control method and system
CN110745136A (zh) Adaptive driving control method
CN111681452B (zh) Dynamic lane-change trajectory planning method for driverless vehicles based on the Frenet coordinate system
CN108445750A (zh) Method and system for vehicle motion planning
JP2023502797A (ja) Adaptive control of autonomous or semi-autonomous vehicles
CN111679660B (zh) Driverless deep reinforcement learning method incorporating human-like driving behavior
JP2023504223A (ja) Adaptive control of automatic or semi-automatic driving vehicles
Lopez et al. Game-theoretic lane-changing decision making and payoff learning for autonomous vehicles
CN110764507A (zh) Artificial intelligence autonomous driving system combining reinforcement learning and information fusion
CN112731925A (zh) Cone recognition, path planning and control method for a driverless formula racing car
CN108791290A (zh) Two-vehicle cooperative adaptive cruise control method based on online incremental DHP
Chen et al. Automatic overtaking on two-way roads with vehicle interactions based on proximal policy optimization
Wang et al. Vision-Based Autonomous Driving: A Hierarchical Reinforcement Learning Approach
CN110723207B (zh) Model predictive steering controller for intelligent vehicles based on model reconstruction and control method thereof
Björnberg Shared control for vehicle teleoperation with a virtual environment interface
CN113778080B (zh) Control method and apparatus for a single-track two-wheeled robot, electronic device and storage medium
CN115373415A (zh) UAV intelligent navigation method based on deep reinforcement learning
CN111221338B (zh) Path tracking method, apparatus, device and storage medium
CN111857112B (zh) Vehicle local path planning method and electronic device
Shour et al. Shared Decision-Making Forward an Autonomous Navigation for Intelligent Vehicles
Cai et al. Implementation of the Human‐Like Lane Changing Driver Model Based on Bi‐LSTM
Liu et al. Reinforcement Learning-Based High-Speed Path Following Control for Autonomous Vehicles
LI Adaptive Sampling Control in Motion Planning of Autonomous Vehicle
Chen et al. Lateral Control for Autonomous Parking System with Fractional Order Controller.
CN116578088B (zh) Continuous trajectory generation method and system for outdoor autonomous mobile robots